Large language models hallucinate. This is not a bug that will be fixed in the next release. It is a fundamental property of how these models work: they predict the next probable token, not the next true token.
ChatGPT will confidently tell you that a contract clause says something it does not say. Copilot will generate an API call to an endpoint that does not exist. Claude will cite a paper that was never written. For toy projects, this is amusing. For applications serving real users — legal tools, financial dashboards, healthcare systems, internal knowledge bases — this is unacceptable.
You need answers that are verified against your actual documents, with proof. Here is how to get there in five minutes.
The Traditional Approach
Building a reliable document Q&A system the traditional way requires assembling multiple pieces:
- A vector database (Pinecone, Weaviate, Chroma) to store document embeddings
- An embedding model (OpenAI ada-002, BGE, Cohere) to convert text to vectors
- A chunking strategy to split documents into retrievable segments
- An LLM (GPT-4, Claude, Llama) to generate answers from retrieved context
- A fact-checking layer to verify the answer against the source material
That is five services, three APIs, and weeks of integration work before you even start thinking about confidence scoring, multi-document synthesis, or tenant isolation. Most teams either skip the fact-checking entirely (and ship hallucinations) or spend months building a brittle pipeline that breaks every time a provider changes their API.
The Wauldo Way
Wauldo collapses the entire pipeline into two API calls: upload, then query. Everything else — chunking, embedding, hybrid retrieval, LLM generation, fact-checking, confidence calibration — happens server-side in a single request.
Every response comes with an audit trail: which sources were used, how confident the system is, whether the answer is grounded in the documents, and which retrieval path was taken. You do not have to trust the answer. You can verify it.
Step by Step
Step 1: Get your API key. Sign up on RapidAPI and grab your key. The free tier gives you 300 requests per month — enough to build and test your integration.
Step 2: Upload a document. Send your file to the upload endpoint. Wauldo will chunk it, build a BM25 index, compute embeddings, and store everything — all in one call.
```bash
curl -X POST "https://api.wauldo.com/v1/upload" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@quarterly-report.pdf" \
  -F "collection=finance"
```
The response confirms how many chunks were created and indexed. A typical 10-page PDF produces 30-50 chunks and takes under 2 seconds to process.
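If you are integrating from Python rather than the shell, the same upload can be made with the `requests` library. This is a minimal sketch that mirrors the curl call above; the endpoint, header, and form fields come from that example, while the function names (`build_upload_request`, `upload_document`) are just illustrative helpers, not part of any SDK.

```python
import requests

BASE_URL = "https://api.wauldo.com/v1"  # endpoint shown in the curl example above

def build_upload_request(api_key: str, collection: str) -> dict:
    """Assemble the URL, auth header, and form fields for the upload call."""
    return {
        "url": f"{BASE_URL}/upload",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": {"collection": collection},
    }

def upload_document(api_key: str, path: str, collection: str) -> dict:
    """POST the file as multipart/form-data; returns the parsed JSON response."""
    req = build_upload_request(api_key, collection)
    with open(path, "rb") as f:
        # passing `files` makes requests set the multipart/form-data header itself
        resp = requests.post(req["url"], headers=req["headers"],
                             data=req["data"], files={"file": f})
    resp.raise_for_status()
    return resp.json()

# upload_document("YOUR_API_KEY", "quarterly-report.pdf", "finance")
```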
Step 3: Query your documents. Ask a question in natural language. The system retrieves relevant chunks, generates an answer, and fact-checks it — all in a single request.
```bash
curl -X POST "https://api.wauldo.com/v1/query" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What was the revenue growth in Q3?",
    "k": 3,
    "collection": "finance"
  }'
```
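The same query in Python, as a sketch: the request body fields (`query`, `k`, `collection`) are taken from the curl example, while the helper names are illustrative and not part of any SDK.

```python
import requests

BASE_URL = "https://api.wauldo.com/v1"  # endpoint shown in the curl example above

def build_query_payload(query: str, collection: str, k: int = 3) -> dict:
    """Request body fields shown in the curl example."""
    return {"query": query, "k": k, "collection": collection}

def ask(api_key: str, query: str, collection: str, k: int = 3) -> dict:
    """POST a natural-language question; returns the parsed JSON response."""
    resp = requests.post(
        f"{BASE_URL}/query",
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_query_payload(query, collection, k),  # sets Content-Type: application/json
    )
    resp.raise_for_status()
    return resp.json()

# ask("YOUR_API_KEY", "What was the revenue growth in Q3?", "finance")
```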
Step 4: Read the audit trail. This is what makes Wauldo different. Every response includes structured metadata you can use programmatically:
```json
{
  "answer": "Revenue grew 23% year-over-year in Q3...",
  "sources": [
    { "title": "quarterly-report.pdf", "chunk": "Q3 results show..." }
  ],
  "audit": {
    "confidence": 0.87,
    "confidence_label": "high",
    "grounded": true,
    "retrieval_path": "BM25Only",
    "sources_used": 2,
    "model": "qwen/qwen3.5-flash-02-23",
    "latency_ms": 1243
  }
}
```
The confidence score tells you how strongly the answer is supported by the retrieved documents. The grounded boolean tells you whether the fact-checker verified the claims. The retrieval_path field shows which retrieval strategy was used (BM25, reranked, or full dense retrieval). This is not a black box — every answer explains itself.
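In application code, the audit block is what lets you act on answers automatically. Here is a minimal routing sketch: the field names (`grounded`, `confidence`) come from the response above, but the thresholds are illustrative choices for your application, not part of the API.

```python
def route_answer(response: dict) -> str:
    """Decide what to do with an answer based on its audit trail.

    Returns one of: "show", "show_with_caveat", "human_review".
    The 0.8 / 0.5 thresholds are illustrative; tune them for your use case.
    """
    audit = response.get("audit", {})
    grounded = audit.get("grounded", False)
    confidence = audit.get("confidence", 0.0)

    if not grounded:
        return "human_review"      # fact-checker could not verify the claims
    if confidence >= 0.8:
        return "show"              # grounded and strongly supported
    if confidence >= 0.5:
        return "show_with_caveat"  # display, but surface the sources to the user
    return "human_review"          # grounded but only weakly supported

# With the response shown above (grounded, confidence 0.87):
# route_answer(response) -> "show"
```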
What Makes It Different
Most RAG APIs stop at "retrieve chunks and pass them to an LLM." Wauldo goes further:
- Confidence scoring — Every answer includes a calibrated confidence score (high/medium/low) so your application can decide what to show users versus what to flag for human review
- Grounded verification — A fact-checker compares the generated answer against the source chunks using token overlap and semantic similarity. If the answer contains claims not supported by the documents, grounded is false
- Cost-aware retrieval — The system automatically picks the fastest retrieval path that will produce good results. Simple keyword matches use BM25 (fast, cheap). Complex semantic queries use dense embeddings with BGE reranking (slower, better)
- Quality modes — Pass "quality_mode": "premium" to use GPT-4.1 for complex questions, or let the system auto-select the best model based on query complexity
- Tenant isolation — Documents are scoped by tenant. Your users' data never leaks across accounts. BM25 scoring is tenant-scoped, meaning search results reflect only that tenant's documents
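Quality mode is just one more field on the query body from Step 3. A small sketch, assuming the field name from this post; the question text and `k` value here are illustrative:

```python
# Same request shape as Step 3, with quality_mode added to force the premium model.
payload = {
    "query": "Summarize the risks discussed in the quarterly report",
    "k": 5,
    "collection": "finance",
    "quality_mode": "premium",  # use GPT-4.1 instead of letting the system auto-select
}
```

Omit the field and the system auto-selects a model based on query complexity.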
Benchmark results: In our live evaluation suite, Wauldo scores 100% on RAG retrieval (9/9 queries correct) with zero hallucinations across all test categories. The full benchmark methodology and results are public.
Try It Now
You can test the full pipeline right now without writing any code. The live demo lets you upload a file, ask questions, and see the audit trail in real time.
When you are ready to integrate:
- Grab your API key on RapidAPI (free tier: 300 requests/month)
- Browse the API documentation on Postman
- Use the SDKs: Python, TypeScript, or Rust
Five minutes. Verified answers. No hallucinations.