API Documentation

Wauldo is a RAG API that returns verified answers with source citations and confidence scores. OpenAI SDK compatible. Built for zero hallucinations: every answer is checked for grounding against your sources.

Base URL: https://api.wauldo.com
Protocol: REST + SSE Streaming
Auth: RapidAPI Key or JWT
New here? Get a free API key on RapidAPI (300 requests/month, no credit card), then follow the Quick Start below.

Authentication

Two authentication methods are supported:

Option 1 — RapidAPI key (recommended)

Get your API key from RapidAPI and include it in every request:

// Headers
X-RapidAPI-Key: your_api_key
X-RapidAPI-Host: smart-rag-api.p.rapidapi.com

Option 2 — JWT Token

Authenticate with username/password to get a Bearer token:

curl -X POST https://api.wauldo.com/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username": "demo", "password": "demo_password"}'

# Response
{ "token": "eyJhbGciOiJIUzI1NiIs..." }

# Then use in all requests
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
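The same flow in Python, as a minimal sketch (assumes the `requests` library; the helper names are ours, not part of an SDK):

```python
import requests

BASE_URL = "https://api.wauldo.com"

def login(username, password):
    """POST /api/auth/login and return the JWT from the response."""
    resp = requests.post(
        f"{BASE_URL}/api/auth/login",
        json={"username": username, "password": password},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["token"]

def auth_headers(token):
    """Authorization header to attach to every subsequent request."""
    return {"Authorization": f"Bearer {token}"}

# Usage (live call):
#   token = login("demo", "demo_password")
#   requests.get(f"{BASE_URL}/v1/models", headers=auth_headers(token))
```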

Quick Start

Upload a document and get a verified answer in 2 API calls:

1. Upload your document

curl -X POST https://api.wauldo.com/v1/upload \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Section 4.2: Late payments incur a 2% monthly fee...",
    "filename": "contract.txt"
  }'

2. Ask a question

curl -X POST https://api.wauldo.com/v1/query \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the late payment fee?", "top_k": 5}'

3. Get a verified answer

{
  "answer": "The contract specifies a 2% monthly late payment fee (Section 4.2).",
  "sources": [
    { "content": "Section 4.2: Late payments incur a 2% monthly fee...", "score": 0.92 }
  ],
  "audit": {
    "confidence": 0.92,
    "grounded": true,
    "model": "qwen/qwen3.5-flash"
  }
}

OpenAI SDK Compatibility

Wauldo is a drop-in replacement for the OpenAI API. Just change the base_url — your existing code works as-is.

Python:

from openai import OpenAI

# Just swap the base_url — everything else is the same
client = OpenAI(
    base_url="https://api.wauldo.com/v1",
    api_key="your_jwt_token"
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True
)

for chunk in response:
    # delta.content is None on role/finish chunks
    print(chunk.choices[0].delta.content or "", end="")

TypeScript:

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.wauldo.com/v1',
  apiKey: 'your_jwt_token',
});

const stream = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

cURL:

curl https://api.wauldo.com/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "stream": true
  }'
Supported endpoints: /v1/chat/completions, /v1/models — both work identically to OpenAI. Wauldo auto-selects the best model for each request unless you specify one.

Upload Document (text)

POST /v1/upload

Upload text content to be chunked, indexed, and made available for queries.

Request Body

Parameter | Type | Description
content (required) | string | Document text content (max 10MB)
filename (optional) | string | Filename for source tracking (e.g. report.txt)

Response 200

{
  "status": "success",
  "chunks_count": 12,
  "source": "report.txt"
}
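
A Python sketch of the same call, with the body assembled from the table above (assumes `requests`; the helper names are illustrative):

```python
import requests

def build_upload_payload(content, filename=None):
    """Assemble the /v1/upload body; filename is optional."""
    payload = {"content": content}
    if filename is not None:
        payload["filename"] = filename
    return payload

def upload_text(base_url, token, content, filename=None):
    """POST /v1/upload and return the parsed JSON response."""
    resp = requests.post(
        f"{base_url}/v1/upload",
        headers={"Authorization": f"Bearer {token}"},
        json=build_upload_payload(content, filename),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"status": "success", "chunks_count": 12, ...}
```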

Upload File

POST /v1/upload/file

Upload a file directly using multipart form data.

Supported formats

.txt .md .csv .json .yaml .xml .html .rtf .py .js .ts .rs .java .go .cpp .sql .sh .css .toml .log

curl -X POST https://api.wauldo.com/v1/upload/file \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@contract.txt"

Response 200

{
  "status": "success",
  "chunks_count": 24,
  "source": "contract.txt",
  "file_size": 15234
}

Query

POST /v1/query

Ask a question against your uploaded documents. Returns a verified answer with sources, confidence score, and full audit trail.

Request Body

Parameter | Type | Description
query (required) | string | Your question
top_k (optional) | integer | Number of source chunks to retrieve (default: 5, max: 20)
stream (optional) | boolean | Enable SSE streaming — see Streaming guide
debug (optional) | boolean | Include retrieval funnel diagnostics — see Audit Trail
quality_mode (optional) | string | fast, balanced, or premium — see Quality Modes

Response 200

{
  "answer": "The contract specifies a 2% monthly late payment fee (Section 4.2).",
  "sources": [
    {
      "content": "Section 4.2: Late payments incur a 2% monthly fee...",
      "score": 0.92,
      "source": "contract.txt"
    }
  ],
  "audit": {
    "confidence": 0.92,
    "confidence_label": "high",
    "grounded": true,
    "retrieval_path": "BM25Reranked",
    "model": "qwen/qwen3.5-flash",
    "latency_ms": 1420,
    "sources_used": 2,
    "sources_evaluated": 5
  }
}
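
A Python sketch of the call, assembling the body from the parameters above (assumes `requests`; helper names are illustrative):

```python
import requests

def build_query_body(question, top_k=5, quality_mode=None, debug=False):
    """Assemble the /v1/query body from the documented parameters."""
    body = {"query": question, "top_k": top_k}
    if quality_mode is not None:
        body["quality_mode"] = quality_mode
    if debug:
        body["debug"] = True
    return body

def query(base_url, token, question, **opts):
    """POST /v1/query; returns answer, sources, and audit trail."""
    resp = requests.post(
        f"{base_url}/v1/query",
        headers={"Authorization": f"Bearer {token}"},
        json=build_query_body(question, **opts),
        timeout=180,  # standard (non-streaming) API timeout is 180s
    )
    resp.raise_for_status()
    return resp.json()
```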

Chat Completions

POST /v1/chat/completions

OpenAI-compatible chat endpoint. Works with any OpenAI SDK. Supports streaming.

Request Body

Parameter | Type | Description
messages (required) | array | Array of {"role": "user"|"system"|"assistant", "content": "..."}
model (optional) | string | Model name or "auto" (default: auto-selected)
stream (optional) | boolean | Enable SSE streaming (recommended for UX)
temperature (optional) | number | Sampling temperature, 0.0 to 2.0 (default: 0.7)
max_tokens (optional) | integer | Maximum tokens in the response

List Models

GET /v1/models

Returns available models. OpenAI SDK compatible.

curl https://api.wauldo.com/v1/models \
  -H "Authorization: Bearer $TOKEN"

Collections

GET /v1/collections

List all document collections for the authenticated tenant.

DELETE /v1/collections/{name}

Delete a collection and all its chunks. Useful for re-uploading updated documents.
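
A Python sketch covering both endpoints (assumes `requests`; helper names are illustrative):

```python
import requests

def collection_url(base_url, name=None):
    """Build the collections endpoint URL; with a name, targets one collection."""
    url = f"{base_url}/v1/collections"
    return f"{url}/{name}" if name else url

def list_collections(base_url, token):
    """GET /v1/collections for the authenticated tenant."""
    resp = requests.get(
        collection_url(base_url),
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def delete_collection(base_url, token, name):
    """DELETE /v1/collections/{name}, e.g. before re-uploading updated docs."""
    resp = requests.delete(
        collection_url(base_url, name),
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
```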

Health

GET /health

Returns API health, RAG chunk count, Redis status, active provider, and uptime. No auth required.

{
  "status": "ok",
  "rag_chunks": 142,
  "redis": "connected",
  "provider": "openrouter",
  "uptime_seconds": 86400
}

SSE Streaming

When stream: true is set on /v1/query, the response is delivered as Server-Sent Events (SSE). This lets you show sources and stream the answer token-by-token for a great UX.

Event sequence

sources: sent first. Contains the retrieved source chunks with scores; display these immediately while the answer generates.
token: sent repeatedly. Each event carries one token of the answer; append to your UI in real time.
audit: sent once after all tokens. Contains the full audit trail (confidence, grounded, model, latency).
[DONE]: stream complete. Close the connection.

Example: consume the stream

Python:

import requests, json

resp = requests.post(
    "https://api.wauldo.com/v1/query",
    headers={"Authorization": f"Bearer {token}"},
    json={"query": "What is the late fee?", "stream": True},
    stream=True
)

for line in resp.iter_lines():
    if not line:
        continue
    data = line.decode().removeprefix("data: ")
    if data == "[DONE]":
        break
    event = json.loads(data)

    if "sources" in event:
        print(f"Found {len(event['sources'])} sources")
    elif "token" in event:
        print(event["token"], end="")
    elif "audit" in event:
        print(f"\nConfidence: {event['audit']['confidence']}")

TypeScript:

const resp = await fetch('https://api.wauldo.com/v1/query', {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${token}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'What is the late fee?', stream: true })
});

const reader = resp.body.getReader();
const decoder = new TextDecoder();

read: while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  for (const line of decoder.decode(value, { stream: true }).split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6);
    if (data === '[DONE]') break read; // top-level return is invalid; exit both loops
    const event = JSON.parse(data);
    if (event.token) document.getElementById('answer').textContent += event.token;
  }
}

cURL:

curl -N https://api.wauldo.com/v1/query \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the late fee?", "stream": true}'

# Output:
# data: {"sources": [...]}
# data: {"token": "The"}
# data: {"token": " contract"}
# data: {"token": " specifies"}
# ...
# data: {"audit": {"confidence": 0.92, "grounded": true, ...}}
# data: [DONE]

Audit Trail

Every query response includes an audit object that makes the answer self-verifiable. Use it to build trust indicators in your UI, flag low-confidence answers, or debug retrieval issues.

Audit fields

confidence: 0.0 to 1.0. How confident the system is in the answer, based on source relevance scores and fact-checking. Display as a percentage in your UI.
confidence_label: high (≥0.45), medium (≥0.20), or low (<0.20). Use this to color-code answers: green, yellow, red.
grounded: true or false. Whether the answer is fully supported by the retrieved sources. If false, the answer may contain information not in your documents.
retrieval_path: Which retrieval strategy was used: BM25Only, BM25Reranked, or DenseFull. See Retrieval Paths.
model: Which LLM generated the answer (e.g. qwen/qwen3.5-flash, openai/gpt-4.1-mini).
latency_ms: Total processing time in milliseconds (retrieval + LLM generation).
sources_used: Number of source chunks included in the LLM context.
sources_evaluated: Total chunks considered before filtering. Compare with sources_used to see filtering effectiveness.
Debug mode: Add "debug": true to your query to get the full retrieval funnel: candidates_found → candidates_after_tenant → candidates_after_score → sources_used. Useful for diagnosing "I uploaded a doc but the answer seems wrong" issues.
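
The confidence_label thresholds above can also be reproduced client-side, e.g. to color-code answers from the raw confidence value (illustrative sketch; the server computes the label for you):

```python
def confidence_label(confidence):
    """Map a confidence score to the documented label thresholds:
    high >= 0.45, medium >= 0.20, otherwise low."""
    if confidence >= 0.45:
        return "high"
    if confidence >= 0.20:
        return "medium"
    return "low"
```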

Using audit in your app

# Show a trust badge based on confidence
audit = response["audit"]

if audit["grounded"] and audit["confidence_label"] == "high":
    show_badge("Verified", color="green")      # Safe to display
elif audit["confidence_label"] == "medium":
    show_badge("Likely correct", color="yellow") # Show with caveat
else:
    show_badge("Low confidence", color="red")    # Warn the user

# Log for monitoring
log(model=audit["model"], latency=audit["latency_ms"], path=audit["retrieval_path"])

Quality Modes

Control the speed/quality tradeoff with the quality_mode parameter. If omitted, Wauldo auto-selects the best tier based on your query complexity and RAG confidence.

Fast: Gemini 2.0 Flash, ~2-4s latency, $0.10 / 1M tokens. Best for simple questions, chat, and summaries.

Premium: GPT-4.1, ~5-8s latency, $2.00 / 1M tokens. Best for complex analysis and critical accuracy.

RAG Quality tier: When RAG confidence is high (≥0.60), Wauldo automatically uses Qwen 3.5 Flash, optimized for document-grounded answers at $0.065/1M tokens. This tier is what drives our 100% RAG retrieval score.

# Explicitly set quality mode
curl -X POST https://api.wauldo.com/v1/query \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Analyze the financial implications",
    "quality_mode": "premium"
  }'

Plan limits: the Basic (free) plan caps at the balanced tier. Upgrade to Pro or higher for premium access.

Retrieval Paths

Wauldo uses cost-aware routing to choose the best retrieval strategy for each query. The retrieval_path in the audit trail tells you which was used.

BM25Only (BM25 score ≥ 0.45)

Fast keyword matching. Used when the query closely matches document terms. Fastest path (~10ms retrieval).

Example: "What is the late payment fee?" against a contract with those exact terms.

BM25Reranked (BM25 score ≥ 0.20)

BM25 retrieval + BGE neural reranking. Best balance of speed and accuracy. Catches semantic matches that keyword search might miss.

Example: "How much extra do I pay if I'm late?" — paraphrased query, similar meaning.

DenseFull (BM25 score < 0.20)

Full dense vector search with Reciprocal Rank Fusion (BM25 + cosine similarity). Most thorough but slowest path. Used when the query is conceptually related but uses different vocabulary.

Example: "financial penalties for overdue invoices" against a doc that says "late payment fee".

Multi-source merge: Regardless of path, all chunks scoring ≥0.20 are included (max 3 sources). Sources are labeled by relevance so the LLM can resolve conflicts deterministically: Source 1 always wins.
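
The routing thresholds above can be summarized in a small sketch (illustrative only; the actual routing happens server-side and is reported back via retrieval_path):

```python
def route_for_bm25_score(score):
    """Map a BM25 score to the documented retrieval path:
    >= 0.45 keyword-only, >= 0.20 reranked, otherwise full dense search."""
    if score >= 0.45:
        return "BM25Only"
    if score >= 0.20:
        return "BM25Reranked"
    return "DenseFull"
```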

Use Cases

Wauldo works best when you need verified, source-cited answers from your own documents.

Legal & Compliance

Upload contracts, policies, or regulations. Ask about specific clauses, obligations, or deadlines. Every answer cites the exact section.

Q: "What is the termination notice period?"
A: "60 days written notice (Section 12.3)"
   confidence: 0.95 | grounded: true
📚 Knowledge Base / Support

Upload product docs, FAQs, or runbooks. Build a support bot that gives accurate answers instead of hallucinating.

Q: "How do I reset my password?"
A: "Go to Settings > Security > Reset..."
   confidence: 0.88 | grounded: true
📈 Financial Analysis

Upload earnings reports, balance sheets, or market research. Extract specific numbers with source verification.

Q: "What was Q3 revenue?"
A: "$4.2M, up 23% YoY (page 3)"
   confidence: 0.91 | grounded: true
🛠 Technical Documentation

Upload API specs, architecture docs, or code. Get precise technical answers grounded in your actual documentation.

Q: "What's the max payload size?"
A: "10MB per request (API limits doc)"
   confidence: 0.93 | grounded: true

Python SDK

pip install wauldo

from wauldo import WauldoClient

client = WauldoClient("https://api.wauldo.com")
client.login("demo", "demo_password")

# Upload a document
client.rag_upload("Your document text...")

# Query with debug info
result = client.rag_query("What are the key points?", debug=True)

print(result.answer)
print(result.audit.confidence)        # 0.92
print(result.audit.grounded)          # True
print(result.audit.retrieval_path)    # "BM25Reranked"
print(result.sources)                 # [Source(...)]

# Chat (OpenAI-compatible)
reply = client.chat_simple("Explain quantum computing")

# Conversation with memory
conv = client.conversation(system="You are a helpful assistant")
conv.say("Hello!")
conv.say("What did I just say?")  # remembers context

TypeScript SDK

npm install wauldo

import { WauldoClient } from 'wauldo';

const client = new WauldoClient('https://api.wauldo.com');
await client.login('demo', 'demo_password');

// Upload & query
await client.ragUpload('Your document text...');
const result = await client.ragQuery('What are the key points?', 5, { debug: true });

console.log(result.answer);
console.log(result.audit.confidence);      // 0.92
console.log(result.audit.grounded);        // true
console.log(result.audit.retrievalPath);   // "BM25Reranked"

// Streaming chat
await client.chatStream(
  [{ role: 'user', content: 'Hello' }],
  { onToken: (t) => process.stdout.write(t) }
);

Rust SDK

cargo add wauldo

use wauldo::WauldoClient;

let client = WauldoClient::new("https://api.wauldo.com");
client.login("demo", "demo_password").await?;

// Upload & query with debug
client.rag_upload("Your document text...").await?;
let result = client.rag_query_debug("What are the key points?").await?;

println!("{}", result.answer);
println!("Confidence: {}", result.confidence());  // 0.92
println!("Grounded: {}", result.grounded());      // true

// Streaming chat
client.chat_stream(messages, |token| {
    print!("{}", token);
}).await?;

Error Codes

Code | Meaning | Action
400 | Bad request | Check required parameters
401 | Unauthorized | Check your API key or token
413 | Payload too large | Body exceeds 10MB — split your document
429 | Rate limited | Wait and retry, or upgrade plan
500 | Internal error | Retry once. Contact support if persistent
502 | LLM provider error | Retryable — auto-retried 2x internally
503 | Service starting up | Retry after 10-15s (cold start)
Auto-retry: The SDKs automatically retry 502/503/500 errors with exponential backoff. If you're using raw HTTP, retry these status codes up to 2 times with a 2s delay.
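
A minimal Python sketch of that raw-HTTP retry policy (assumes `requests`; the fixed 2s delay follows the note above, while the SDKs use exponential backoff):

```python
import time
import requests

RETRYABLE = {500, 502, 503}

def should_retry(status_code, attempt, max_retries=2):
    """Retry 500/502/503 while attempts remain, per the note above."""
    return status_code in RETRYABLE and attempt < max_retries

def post_with_retry(url, max_retries=2, delay=2.0, **kwargs):
    """POST, re-sending on retryable status codes up to max_retries times."""
    attempt = 0
    while True:
        resp = requests.post(url, **kwargs)
        if not should_retry(resp.status_code, attempt, max_retries):
            return resp
        attempt += 1
        time.sleep(delay)
```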

Rate Limits

Plan | Requests/month | Premium AI calls | Price
Basic | 300 | 50 | Free
Pro | 1,000 | 500 | $9/mo
Ultra | 10,000 | 5,000 | $29/mo
Mega | Unlimited | Unlimited | $0.002/req

Rate limits are per API key. Manage your subscription on RapidAPI.

Limits & Quotas

Resource | Limit
Request body | 10 MB
Max chunks per upload | 5,000
Embedding dimensions | 1 – 4,096
Streaming response | 256 KB
SSE timeout | 1,800s (30 min)
Standard API timeout | 180s (3 min)
Source chunks per query | Max 3 (all ≥0.20 score)

Interactive Explorer

Try the API directly, without writing code, in the interactive explorer.