The verification & orchestration runtime for AI agents.
Paste any AI answer for a numeric support_score (0–1), or orchestrate a multi-step pipeline as a state machine. Every claim grounded, every transition audited.
Three steps. No model guessing.
Wauldo extracts atomic claims from the answer, matches each claim against your sources, and returns a grounded score. You see exactly what is supported and what is not.
Send answer + source
Any LLM output. Any source text or RAG context. One POST.
Claims extracted
Each factual assertion is isolated — dates, entities, numbers, relationships. No summarization.
Support score returned
Every claim checked against sources. Output: support_score ∈ [0,1] + per-claim verdict.
```shell
# POST /v1/fact-check : returns support_score + per-claim verdicts
curl -X POST https://api.wauldo.com/v1/fact-check \
  -H "Authorization: Bearer $WAULDO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Paris has 12 million inhabitants.",
    "source_context": "Paris population is 2.1 million (2024).",
    "mode": "lexical"
  }'
# → { "support_score": 0.0, "verdict": "UNVERIFIED", "claims": [...] }
```
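The same call takes a few lines in any language. Here is a minimal Python sketch using only the standard library; the endpoint, headers, and fields come from the curl example above, while the `is_grounded` threshold helper is an illustrative addition, not part of the API:

```python
import json
import urllib.request

WAULDO_URL = "https://api.wauldo.com/v1/fact-check"

def fact_check(text, source_context, api_key, mode="lexical"):
    """POST one answer plus its source; returns the parsed verdict payload."""
    body = json.dumps(
        {"text": text, "source_context": source_context, "mode": mode}
    ).encode()
    req = urllib.request.Request(
        WAULDO_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # {"support_score": ..., "verdict": ..., "claims": [...]}

def is_grounded(result, threshold=0.8):
    # Illustrative gate: treat an answer as safe only above a support threshold.
    return result["support_score"] >= threshold
```

The threshold is yours to pick per use case; a support bot might demand 0.9 while an internal research tool tolerates less.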
Reproducible adversarial benchmark.
70 cases × 4 runs against five frameworks. Factual retrieval, prompt injection, out-of-scope. The command to re-run it is printed on this page — no signup, no cached numbers.
⚖ Reading the table fairly. LangChain, LlamaIndex, Haystack, and CrewAI are orchestration frameworks, not verification layers, so comparing them to Wauldo on adversarial inputs is intentionally apples-to-oranges. The numbers measure what a developer gets out of the box from each stack, not a framework's intrinsic quality. The honest follow-up question is "does adding Wauldo to LangChain close the gap?" The ablation answers it: still 44% on injection; verification has to live inside the loop. See the ablation →
| Framework | Factual | Injection | Out-of-scope | Total |
|---|---|---|---|---|
| Wauldo | 100% | 92% | 100% | 91% |
| LlamaIndex | 81% | 48% | 72% | 68% |
| LangChain | 78% | 44% | 70% | 66% |
| Haystack | 73% | 41% | 65% | 60% |
| CrewAI | 71% | 38% | 63% | 58% |
Reproduce: git clone https://github.com/wauldoai/wauldo-leaderboard && cd wauldo-leaderboard && cargo run · full methodology →
Three ways teams use it today.
Drop Wauldo between your LLM and your user. Or around your agent. Or in front of your support bot. Same primitive — support_score on every response.
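The "drop it between" pattern reduces to one decision per response. A minimal sketch, assuming a response ships only above a support threshold; the fallback message and threshold value are illustrative, not prescribed by the API:

```python
FALLBACK = "I couldn't verify that answer against our sources."  # illustrative copy

def guard_response(answer: str, support_score: float, threshold: float = 0.8) -> str:
    """Sit between the model and the user: ship the answer only when the
    Wauldo support_score clears the threshold, otherwise fall back."""
    return answer if support_score >= threshold else FALLBACK
```

The same gate wraps a RAG pipeline, an agent step, or a support bot; only the threshold and fallback behavior change.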
Your RAG is confidently wrong.
Retrieves, answers, cites nothing. No audit trail. Prod hallucinates while eval passes.
Measure it →

AI agents
Multi-step agents drift.
Step 3 invents a fact. Step 5 commits to it. By step 8, the reasoning is decorative.
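That failure mode suggests checking every intermediate output before it can feed the next step. A sketch of the loop, assuming `verify` wraps a Wauldo fact-check call and returns a support_score in [0, 1]; the halt-on-failure policy is illustrative:

```python
def run_agent(steps, verify, threshold=0.8):
    """Run agent steps in order, verifying each output before it joins the
    context. `verify(output, context)` stands in for a Wauldo call and must
    return a support_score in [0, 1]."""
    context = []
    for i, step in enumerate(steps, start=1):
        output = step(context)
        score = verify(output, context)
        if score < threshold:
            # Halt before an unsupported claim contaminates later steps.
            raise RuntimeError(f"step {i} unverified (support_score={score:.2f})")
        context.append(output)
    return context
```

Halting is the simplest policy; retrying the step or routing to a human are the obvious alternatives.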
Verify each step →

AI support
Your bot invents refund policies.
Confident tone, fabricated terms, real customer. Reputation bleeds faster than you can patch prompts.
Ground it →

Start free. Pay for scale.
All tiers via RapidAPI. Same endpoints, same verification, same SDKs. No credit card for BASIC.
Verify your first answer in 30 seconds.
Free tier. No credit card. 500 verifications per month on the house.