Large language models hallucinate. This is not a bug that will be fixed in the next release. It is a fundamental property of how these models work: they predict the next probable token, not the next true token. Our deep dive into the zero-hallucination RAG pipeline explains the technical reasons in detail.

ChatGPT will confidently tell you that a contract clause says something it does not say. Copilot will generate an API call to an endpoint that does not exist. Claude will cite a paper that was never written. For toy projects, this is amusing. For applications serving real users — legal tools, financial dashboards, healthcare systems, internal knowledge bases — this is unacceptable. That is why we built Wauldo Guard, a hallucination firewall that catches wrong answers before they reach your users.

You need answers that are verified against your actual documents, with proof. Here is how to get there in five minutes.

The traditional approach

Building a reliable document Q&A system the traditional way requires assembling multiple pieces:

  • A vector database (Pinecone, Weaviate, Chroma) to store document embeddings.
  • An embedding model (OpenAI text-embedding, sentence-transformers, Cohere) to convert text to vectors.
  • A chunking strategy to split documents into retrievable segments.
  • An LLM (GPT-4, Claude, Llama) to generate answers from retrieved context.
  • A fact-checking layer to verify the answer against the source material.

That is five services, three APIs, and weeks of integration work before you even start thinking about confidence scoring, multi-document synthesis, or tenant isolation. Most teams either skip the fact-checking entirely (and ship hallucinations) or spend months building a brittle pipeline that breaks every time a provider changes their API. We cover the 5 most common RAG mistakes in a separate post — nearly all of them stem from this complexity.

The Wauldo way

Wauldo collapses the entire pipeline into two API calls: upload, then query. Everything else — chunking, embedding, hybrid retrieval, LLM generation, fact-checking, confidence calibration — happens server-side in a single request.

Every response comes with an audit trail: which sources were used, how confident the system is, whether the answer is grounded in the documents, and which retrieval path was taken. You do not have to trust the answer. You can verify it.

Step by step

Step 1: Get your API key. Sign up on RapidAPI and grab your key. The free tier gives you 300 requests per month — enough to build and test your integration.

Step 2: Upload a document. Send your file to the upload endpoint. Wauldo will chunk it, build a BM25 index, compute embeddings, and store everything — all in one call.

curl -X POST "https://api.wauldo.com/v1/upload" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@quarterly-report.pdf" \
  -F "collection=finance"

The response confirms how many chunks were created and indexed. A typical 10-page PDF produces 30-50 chunks and takes under 2 seconds to process.
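If you want to make the upload call from code instead of curl, the multipart body can be assembled with nothing but the standard library. This is an illustrative sketch, not an official client; the field names (`file`, `collection`) come from the curl example above, and the helper name is my own.

```python
import uuid

def build_multipart(file_name: str, file_bytes: bytes, collection: str):
    """Build a multipart/form-data body matching the curl upload call (sketch).

    Returns (body, content_type) ready to send with any HTTP client.
    """
    boundary = uuid.uuid4().hex
    parts = [
        # The document itself, under the "file" field.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
            f"Content-Type: application/pdf\r\n\r\n"
        ).encode() + file_bytes + b"\r\n",
        # The target collection, as a plain form field.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="collection"\r\n\r\n'
            f"{collection}\r\n"
        ).encode(),
        f"--{boundary}--\r\n".encode(),
    ]
    body = b"".join(parts)
    content_type = f"multipart/form-data; boundary={boundary}"
    return body, content_type
```

Pass the returned body and content type to your HTTP client of choice along with the Authorization header, exactly as in the curl call.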

Step 3: Query your documents. Ask a question in natural language. The system retrieves relevant chunks, generates an answer, and fact-checks it — all in a single request.

curl -X POST "https://api.wauldo.com/v1/query" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What was the revenue growth in Q3?",
    "k": 3,
    "collection": "finance"
  }'
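The same query can be issued from Python with only the standard library. A minimal sketch: the endpoint, headers, and JSON fields mirror the curl example; the function names are my own, and real code would add error handling and retries.

```python
import json
import urllib.request

API_BASE = "https://api.wauldo.com/v1"
API_KEY = "YOUR_API_KEY"  # from RapidAPI

def build_query_request(question: str, collection: str, k: int = 3) -> urllib.request.Request:
    """Build the same HTTP request the curl example sends."""
    body = json.dumps({"query": question, "k": k, "collection": collection}).encode()
    return urllib.request.Request(
        f"{API_BASE}/query",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

def query(question: str, collection: str, k: int = 3) -> dict:
    """Send the query and return the parsed JSON (answer, sources, audit)."""
    req = build_query_request(question, collection, k)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `query("What was the revenue growth in Q3?", "finance")` returns the full response shown in the next step, audit trail included.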

Step 4: Read the audit trail. This is what makes Wauldo different. Every response includes structured metadata you can use programmatically:

{
  "answer": "Revenue grew 23% year-over-year in Q3...",
  "sources": [
    { "title": "quarterly-report.pdf", "chunk": "Q3 results show..." }
  ],
  "audit": {
    "support_score": 0.87,
    "verdict": "SAFE",
    "grounded": true,
    "retrieval_path": "BM25Only",
    "sources_used": 2,
    "model": "auto",
    "latency_ms": 1243
  }
}

The support_score tells you how strongly the answer is supported by the retrieved documents. The grounded boolean tells you whether the fact-checker verified the claims. The retrieval_path shows which retrieval strategy was used (BM25, reranked, or full dense retrieval). This is not a black box: every answer explains itself.
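In practice, the audit block lets your application gate what reaches users. Here is one way to do it; the thresholds and the three-way routing are illustrative choices, not Wauldo recommendations.

```python
def route_answer(response: dict, min_support: float = 0.8) -> str:
    """Decide how to handle a response based on its audit trail.

    Returns "show", "flag", or "reject". The 0.8 threshold is illustrative.
    """
    audit = response["audit"]
    if not audit["grounded"]:
        return "reject"  # fact-checker found claims unsupported by the documents
    if audit["support_score"] >= min_support:
        return "show"    # strongly supported: safe to display directly
    return "flag"        # grounded but weakly supported: route to human review

# Using the sample response from above:
sample = {
    "answer": "Revenue grew 23% year-over-year in Q3...",
    "audit": {"support_score": 0.87, "verdict": "SAFE", "grounded": True},
}
print(route_answer(sample))  # -> show
```

The point is that the decision is mechanical: your code never has to guess whether an answer is trustworthy.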

What makes it different

Most RAG APIs stop at "retrieve chunks and pass them to an LLM." Wauldo goes further:

  • Support scoring — Every answer includes a calibrated number in [0,1] so your application can decide what to show users versus what to flag for human review.
  • Grounded verification — A fact-checker compares the generated answer against the source chunks using token overlap and semantic similarity. If the answer contains claims not supported by the documents, grounded is false.
  • Cost-aware retrieval — The system automatically picks the fastest retrieval path that will produce good results. Simple keyword matches use BM25 (fast, cheap). Complex semantic queries use dense embeddings with reranking (slower, better).
  • Quality modes — Pass "quality_mode": "premium" to route complex questions to a stronger model, or let the system auto-select based on query complexity.
  • Tenant isolation — Documents are scoped by tenant. Your users' data never leaks across accounts. BM25 scoring is tenant-scoped, meaning search results reflect only that tenant's documents.
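To make the grounded check concrete, here is a toy version of the token-overlap half of verification. This is not Wauldo's actual implementation (which also uses semantic similarity and per-claim checks); it only illustrates the underlying idea that answer tokens should be traceable to source chunks.

```python
import re

def token_overlap(answer: str, sources: list) -> float:
    """Fraction of answer tokens that appear in the source chunks (toy metric)."""
    def tokens(text: str) -> set:
        return set(re.findall(r"[a-z0-9%]+", text.lower()))

    answer_tokens = tokens(answer)
    source_tokens = set().union(*(tokens(s) for s in sources))
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & source_tokens) / len(answer_tokens)

chunks = ["Q3 results show revenue grew 23% year-over-year."]
print(token_overlap("Revenue grew 23% in Q3.", chunks))        # 0.8 - well grounded
print(token_overlap("Revenue declined sharply in Q4.", chunks))  # 0.2 - poorly grounded
```

A real checker would also catch paraphrased but unsupported claims, which pure token overlap misses; that is where the semantic-similarity pass comes in.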

In our live evaluation suite, Wauldo scores above 97% on adversarial RAG retrieval categories. See how we compare to ChatGPT and basic RAG solutions. The full benchmark methodology and results are public.

Try it now

You can test the full pipeline right now without writing any code. The live demo lets you upload a file, ask questions, and see the audit trail in real time.

When you are ready to integrate, the two calls above are all you need.

Five minutes. Verified answers. No hallucinations.

Try it free: the free tier also includes 500 standalone verifications a month. Paste any answer with its sources and get the support score back. Verify now, or get a key and start building.