API Documentation
Wauldo is a retrieval-augmented generation (RAG) API that returns verified answers with source citations and confidence scores. It is OpenAI SDK compatible, and every answer is grounded in your uploaded documents to prevent hallucinations.
https://api.wauldo.com
REST + SSE Streaming
RapidAPI Key or JWT
Authentication
Two authentication methods are supported:
Option 1 — RapidAPI (recommended)
Get your API key from RapidAPI and include it in every request:
// Headers
X-RapidAPI-Key: your_api_key
X-RapidAPI-Host: smart-rag-api.p.rapidapi.com
Option 2 — JWT Token
Authenticate with username/password to get a Bearer token:
curl -X POST https://api.wauldo.com/api/auth/login \
-H "Content-Type: application/json" \
-d '{"username": "demo", "password": "demo_password"}'
# Response
{ "token": "eyJhbGciOiJIUzI1NiIs..." }
# Then use in all requests
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
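The JWT flow can be scripted end to end with the standard library. A minimal sketch (the endpoint and response shape are as documented above; `bearer_headers` and `login` are illustrative helper names, not part of any SDK):

```python
import json
import urllib.request

API = "https://api.wauldo.com"

def bearer_headers(token):
    """Headers expected by every authenticated endpoint."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

def login(username, password):
    """POST credentials to /api/auth/login and return the JWT."""
    req = urllib.request.Request(
        f"{API}/api/auth/login",
        data=json.dumps({"username": username, "password": password}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["token"]

# Usage (live call):
#   token = login("demo", "demo_password")
#   headers = bearer_headers(token)
```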
Quick Start
Upload a document and get a verified answer in 2 API calls:
Upload your document
curl -X POST https://api.wauldo.com/v1/upload \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"content": "Section 4.2: Late payments incur a 2% monthly fee...",
"filename": "contract.txt"
}'
Ask a question
curl -X POST https://api.wauldo.com/v1/query \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"query": "What is the late payment fee?", "top_k": 5}'
Get a verified answer
{
  "answer": "The contract specifies a 2% monthly late payment fee (Section 4.2).",
  "sources": [
    { "content": "Section 4.2: Late payments incur a 2% monthly fee...", "score": 0.92 }
  ],
  "audit": {
    "confidence": 0.92,
    "grounded": true,
    "model": "qwen/qwen3.5-flash"
  }
}
OpenAI SDK Compatibility
Wauldo is a drop-in replacement for the OpenAI API. Just change the base_url — your existing code works as-is.
from openai import OpenAI

# Just swap the base_url — everything else is the same
client = OpenAI(
    base_url="https://api.wauldo.com/v1",
    api_key="your_jwt_token"
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.wauldo.com/v1',
  apiKey: 'your_jwt_token',
});

const stream = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
curl https://api.wauldo.com/v1/chat/completions \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "Explain quantum computing"}],
"stream": true
}'
Upload Document (text)
/v1/upload
Upload text content to be chunked, indexed, and available for queries.
Request Body
| Parameter | Type | Description |
|---|---|---|
| content required | string | Document text content (max 10MB) |
| filename optional | string | Filename for source tracking (e.g. report.txt) |
Response 200
{
"status": "success",
"chunks_count": 12,
"source": "report.txt"
}
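The same call from Python, using only the standard library (`upload_payload` and `upload_text` are illustrative names, not part of any SDK):

```python
import json
import urllib.request

def upload_payload(content, filename=None):
    """Build the /v1/upload request body; filename is optional."""
    body = {"content": content}
    if filename:
        body["filename"] = filename
    return body

def upload_text(base_url, token, content, filename=None):
    req = urllib.request.Request(
        f"{base_url}/v1/upload",
        data=json.dumps(upload_payload(content, filename)).encode(),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # {"status": "success", "chunks_count": ..., "source": ...}

# Usage (live call):
#   upload_text("https://api.wauldo.com", token, "Report text...", "report.txt")
```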
Upload File
/v1/upload/file
Upload a file directly using multipart form data.
curl -X POST https://api.wauldo.com/v1/upload/file \
-H "Authorization: Bearer $TOKEN" \
-F "file=@contract.txt"
Response 200
{
"status": "success",
"chunks_count": 24,
"source": "contract.txt",
"file_size": 15234
}
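If you prefer no dependencies, the multipart body can be built by hand. A sketch using only the standard library; the field name `file` matches the curl example above, while the per-part `application/octet-stream` content type is an assumption:

```python
import json
import urllib.request
import uuid

def multipart_body(filename, data, boundary):
    """Encode a single file field named "file" as multipart/form-data."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail

def upload_file(base_url, token, filename, data):
    boundary = uuid.uuid4().hex
    req = urllib.request.Request(
        f"{base_url}/v1/upload/file",
        data=multipart_body(filename, data, boundary),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (live call):
#   with open("contract.txt", "rb") as f:
#       upload_file("https://api.wauldo.com", token, "contract.txt", f.read())
```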
Query
/v1/query
Ask a question against your uploaded documents. Returns a verified answer with sources, confidence score, and full audit trail.
Request Body
| Parameter | Type | Description |
|---|---|---|
| query required | string | Your question |
| top_k optional | integer | Number of source chunks to retrieve (default: 5, max: 20) |
| stream optional | boolean | Enable SSE streaming — see Streaming guide |
| debug optional | boolean | Include retrieval funnel diagnostics — see Audit Trail |
| quality_mode optional | string | fast, balanced, or premium — see Quality Modes |
Response 200
{
  "answer": "The contract specifies a 2% monthly late payment fee (Section 4.2).",
  "sources": [
    {
      "content": "Section 4.2: Late payments incur a 2% monthly fee...",
      "score": 0.92,
      "source": "contract.txt"
    }
  ],
  "audit": {
    "confidence": 0.92,
    "confidence_label": "high",
    "grounded": true,
    "retrieval_path": "BM25Reranked",
    "model": "qwen/qwen3.5-flash",
    "latency_ms": 1420,
    "sources_used": 2,
    "sources_evaluated": 5
  }
}
Chat Completions
/v1/chat/completions
OpenAI-compatible chat endpoint. Works with any OpenAI SDK. Supports streaming.
Request Body
| Parameter | Type | Description |
|---|---|---|
| messages required | array | Array of {"role": "user" \| "system" \| "assistant", "content": "..."} objects |
| model optional | string | Model name or "auto" (default: auto-selected) |
| stream optional | boolean | Enable SSE streaming (recommended for UX) |
| temperature optional | number | Sampling temperature, 0.0 to 2.0 (default: 0.7) |
| max_tokens optional | integer | Maximum tokens in the response |
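A small helper that validates these parameters client-side before sending (`chat_body` is a hypothetical utility; the ranges mirror the table above):

```python
def chat_body(messages, model="auto", stream=False, temperature=0.7, max_tokens=None):
    """Assemble a /v1/chat/completions body, enforcing the documented ranges."""
    if not messages:
        raise ValueError("messages must be a non-empty array")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    body = {"model": model, "messages": messages, "stream": stream, "temperature": temperature}
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return body

body = chat_body([{"role": "user", "content": "Explain quantum computing"}], stream=True)
```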
List Models
/v1/models
Returns available models. OpenAI SDK compatible.
curl https://api.wauldo.com/v1/models \
-H "Authorization: Bearer $TOKEN"
Collections
/v1/collections
List all document collections for the authenticated tenant.
/v1/collections/{name}
Delete a collection and all its chunks. Useful for re-uploading updated documents.
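A sketch of both collection calls with the standard library. Collection names go in the URL path, so they should be percent-encoded; whether DELETE returns a JSON body is an assumption here:

```python
import json
import urllib.parse
import urllib.request

def collection_url(base_url, name=""):
    """/v1/collections, or /v1/collections/{name} with the name percent-encoded."""
    path = "/v1/collections"
    if name:
        path += "/" + urllib.parse.quote(name, safe="")
    return base_url + path

def delete_collection(base_url, token, name):
    req = urllib.request.Request(
        collection_url(base_url, name),
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# collection_url("https://api.wauldo.com", "q1 reports")
#   -> "https://api.wauldo.com/v1/collections/q1%20reports"
```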
Health
/health
Returns API health, RAG chunk count, Redis status, active provider, and uptime. No auth required.
{
"status": "ok",
"rag_chunks": 142,
"redis": "connected",
"provider": "openrouter",
"uptime_seconds": 86400
}
SSE Streaming
When stream: true is set on /v1/query, the response is delivered as Server-Sent Events (SSE). This lets you show sources and stream the answer token-by-token for a great UX.
Event sequence
The stream emits one sources event, then a series of token events as the answer is generated, then a single audit event, and finally data: [DONE].
Example: consume the stream
import requests, json

resp = requests.post(
    "https://api.wauldo.com/v1/query",
    headers={"Authorization": f"Bearer {token}"},
    json={"query": "What is the late fee?", "stream": True},
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue
    data = line.decode().removeprefix("data: ")
    if data == "[DONE]":
        break
    event = json.loads(data)
    if "sources" in event:
        print(f"Found {len(event['sources'])} sources")
    elif "token" in event:
        print(event["token"], end="")
    elif "audit" in event:
        print(f"\nConfidence: {event['audit']['confidence']}")
const resp = await fetch('https://api.wauldo.com/v1/query', {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${token}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'What is the late fee?', stream: true })
});

const reader = resp.body.getReader();
const decoder = new TextDecoder();
outer: while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  for (const line of decoder.decode(value, { stream: true }).split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6);
    if (data === '[DONE]') break outer;
    const event = JSON.parse(data);
    if (event.token) document.getElementById('answer').textContent += event.token;
  }
}
curl -N https://api.wauldo.com/v1/query \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"query": "What is the late fee?", "stream": true}'
# Output:
# data: {"sources": [...]}
# data: {"token": "The"}
# data: {"token": " contract"}
# data: {"token": " specifies"}
# ...
# data: {"audit": {"confidence": 0.92, "grounded": true, ...}}
# data: [DONE]
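The data: lines shown above can be classified with a small parser (a sketch; `parse_sse_line` is not part of any SDK):

```python
import json

def parse_sse_line(line):
    """Classify one SSE line: ("done", None), (kind, payload), or None to skip."""
    if not line.startswith("data: "):
        return None                      # blank lines, comments, keep-alives
    data = line[len("data: "):]
    if data == "[DONE]":
        return ("done", None)
    event = json.loads(data)
    for kind in ("sources", "token", "audit"):
        if kind in event:
            return (kind, event[kind])
    return ("unknown", event)

print(parse_sse_line('data: {"token": "The"}'))  # ('token', 'The')
```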
Audit Trail
Every query response includes an audit object that makes the answer self-verifiable. Use it to build trust indicators in your UI, flag low-confidence answers, or debug retrieval issues.
Audit fields
| Field | Type | Description |
|---|---|---|
| confidence | number | Overall answer confidence, 0.0 to 1.0 |
| confidence_label | string | high, medium, or low |
| grounded | boolean | Whether the answer is fully supported by the retrieved sources |
| retrieval_path | string | Retrieval strategy used (see Retrieval Paths) |
| model | string | LLM that generated the answer |
| latency_ms | integer | End-to-end query latency in milliseconds |
| sources_used | integer | Number of chunks cited in the answer |
| sources_evaluated | integer | Number of chunks retrieved and scored |
Using audit in your app
# Show a trust badge based on confidence
audit = response["audit"]
if audit["grounded"] and audit["confidence_label"] == "high":
    show_badge("Verified", color="green")         # Safe to display
elif audit["confidence_label"] == "medium":
    show_badge("Likely correct", color="yellow")  # Show with caveat
else:
    show_badge("Low confidence", color="red")     # Warn the user

# Log for monitoring
log(model=audit["model"], latency=audit["latency_ms"], path=audit["retrieval_path"])
Quality Modes
Control the speed/quality tradeoff with the quality_mode parameter. If omitted, Wauldo auto-selects the best tier based on your query complexity and RAG confidence.
# Explicitly set quality mode
curl -X POST https://api.wauldo.com/v1/query \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"query": "Analyze the financial implications",
"quality_mode": "premium"
}'
Retrieval Paths
Wauldo uses cost-aware routing to choose the best retrieval strategy for each query. The retrieval_path in the audit trail tells you which was used.
Fast keyword matching. Used when the query closely matches document terms. Fastest path (~10ms retrieval).
Example: "What is the late payment fee?" against a contract with those exact terms.
BM25 retrieval + BGE neural reranking. Best balance of speed and accuracy. Catches semantic matches that keyword search might miss.
Example: "How much extra do I pay if I'm late?" — paraphrased query, similar meaning.
Full dense vector search with Reciprocal Rank Fusion (BM25 + cosine similarity). Most thorough but slowest path. Used when the query is conceptually related but uses different vocabulary.
Example: "financial penalties for overdue invoices" against a doc that says "late payment fee".
Use Cases
Wauldo works best when you need verified, source-cited answers from your own documents.
Legal & Compliance
Upload contracts, policies, or regulations. Ask about specific clauses, obligations, or deadlines. Every answer cites the exact section.
Q: "What is the termination notice period?"
A: "60 days written notice (Section 12.3)"
confidence: 0.95 | grounded: true
Knowledge Base / Support
Upload product docs, FAQs, or runbooks. Build a support bot that gives accurate answers instead of hallucinating.
Q: "How do I reset my password?"
A: "Go to Settings > Security > Reset..."
confidence: 0.88 | grounded: true
Financial Analysis
Upload earnings reports, balance sheets, or market research. Extract specific numbers with source verification.
Q: "What was Q3 revenue?"
A: "$4.2M, up 23% YoY (page 3)"
confidence: 0.91 | grounded: true
Technical Documentation
Upload API specs, architecture docs, or code. Get precise technical answers grounded in your actual documentation.
Q: "What's the max payload size?"
A: "10MB per request (API limits doc)"
confidence: 0.93 | grounded: true
Python SDK
pip install wauldo
from wauldo import WauldoClient
client = WauldoClient("https://api.wauldo.com")
client.login("demo", "demo_password")
# Upload a document
client.rag_upload("Your document text...")
# Query with debug info
result = client.rag_query("What are the key points?", debug=True)
print(result.answer)
print(result.audit.confidence) # 0.92
print(result.audit.grounded) # True
print(result.audit.retrieval_path) # "BM25Reranked"
print(result.sources) # [Source(...)]
# Chat (OpenAI-compatible)
reply = client.chat_simple("Explain quantum computing")
# Conversation with memory
conv = client.conversation(system="You are a helpful assistant")
conv.say("Hello!")
conv.say("What did I just say?") # remembers context
TypeScript SDK
npm install wauldo
import { WauldoClient } from 'wauldo';
const client = new WauldoClient('https://api.wauldo.com');
await client.login('demo', 'demo_password');
// Upload & query
await client.ragUpload('Your document text...');
const result = await client.ragQuery('What are the key points?', 5, { debug: true });
console.log(result.answer);
console.log(result.audit.confidence); // 0.92
console.log(result.audit.grounded); // true
console.log(result.audit.retrievalPath); // "BM25Reranked"
// Streaming chat
await client.chatStream(
[{ role: 'user', content: 'Hello' }],
{ onToken: (t) => process.stdout.write(t) }
);
Rust SDK
cargo add wauldo
use wauldo::WauldoClient;
let client = WauldoClient::new("https://api.wauldo.com");
client.login("demo", "demo_password").await?;
// Upload & query with debug
client.rag_upload("Your document text...").await?;
let result = client.rag_query_debug("What are the key points?").await?;
println!("{}", result.answer);
println!("Confidence: {}", result.confidence()); // 0.92
println!("Grounded: {}", result.grounded()); // true
// Streaming chat
client.chat_stream(messages, |token| {
    print!("{}", token);
}).await?;
Error Codes
| Code | Meaning | Action |
|---|---|---|
| 400 | Bad request | Check required parameters |
| 401 | Unauthorized | Check your API key or token |
| 413 | Payload too large | Body exceeds 10MB — split your document |
| 429 | Rate limited | Wait and retry, or upgrade plan |
| 500 | Internal error | Retry once. Contact support if persistent |
| 502 | LLM provider error | Retryable — auto-retried 2x internally |
| 503 | Service starting up | Retry after 10-15s (cold start) |
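A retry policy matching this table might look like the following sketch; the backoff values are illustrative, not prescribed by the API:

```python
import time

RETRYABLE = {429, 502, 503}

def backoff_schedule(status, attempts=3):
    """Seconds to wait before each retry; 503 cold starts need a longer first wait."""
    base = 10.0 if status == 503 else 1.0
    return [base * (2 ** i) for i in range(attempts)]

def call_with_retries(send, attempts=3):
    """send() -> (status, body); retry on retryable statuses, then give up."""
    status, body = send()
    attempt = 0
    while status in RETRYABLE and attempt < attempts:
        time.sleep(backoff_schedule(status, attempts)[attempt])
        status, body = send()
        attempt += 1
    return status, body
```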
Rate Limits
| Plan | Requests/month | Premium AI calls | Price |
|---|---|---|---|
| Basic | 300 | 50 | Free |
| Pro | 1,000 | 500 | $9/mo |
| Ultra | 10,000 | 5,000 | $29/mo |
| Mega | Unlimited | Unlimited | $0.002/req |
Rate limits are per API key. Manage your subscription on RapidAPI.
Limits & Quotas
| Resource | Limit |
|---|---|
| Request body | 10 MB |
| Max chunks per upload | 5,000 |
| Embedding dimensions | 1 – 4,096 |
| Streaming response | 256 KB |
| SSE timeout | 1,800s (30 min) |
| Standard API timeout | 180s (3 min) |
| Source chunks used per answer | Max 3 (each with score ≥ 0.20) |
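To stay under the 10 MB request-body limit, large documents can be split client-side before calling /v1/upload. A sketch that splits on paragraph boundaries (`split_for_upload` is illustrative; the 9 MB default leaves headroom for the JSON envelope):

```python
def split_for_upload(text, max_bytes=9_000_000):
    """Split on paragraph boundaries so each piece fits in one request body.

    A single paragraph larger than max_bytes is emitted whole; each piece
    keeps its trailing blank-line separators.
    """
    pieces, current, size = [], [], 0
    for para in text.split("\n\n"):
        blob = (para + "\n\n").encode("utf-8")
        if size + len(blob) > max_bytes and current:
            pieces.append("".join(current))
            current, size = [], 0
        current.append(para + "\n\n")
        size += len(blob)
    if current:
        pieces.append("".join(current))
    return pieces
```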
Interactive Explorer
Try the API directly, without writing code, in the interactive explorer.