// verification & orchestration runtime · ClaudeCode · Cursor · Continue ready

The verification & orchestration runtime for AI agents.

Paste any AI answer to get a numeric support_score (0–1), or orchestrate a multi-step pipeline as a state machine. Every claim grounded, every transition audited.

No signup · Free · widget on the right · sandbox in a new tab
/.well-known/agent-manifest.json · /v1/agents/schema · agent-mode auto-detected via User-Agent
POST /v1/fact-check live

Demo runs in lexical mode (~1s, fast). API also supports hybrid (multilingual embedding) and semantic (LLM-judge) for paraphrases — see /docs#fact-check-modes.

Add source document (optional — for stricter grounding)
mode: lexical · ~1s
// median adversarial · 4 runs
91%
On 70 hand-crafted adversarial cases. Range 86–97. +48pt vs LangChain on prompt injection.
// runs · 2026-04-10 → 2026-04-15 86 · 91 · 93 · 97
MIT open source · 5ms p50 fast path · 1.566s avg end-to-end agent run · Reproduce the bench →
// how it works

Three steps. No model guessing.

Wauldo extracts atomic claims from the answer, matches each claim against your sources, and returns a grounded score. You see exactly what is supported and what is not.

01 · INPUT

Send answer + source

Any LLM output. Any source text or RAG context. One POST.

02 · EXTRACT

Claims extracted

Each factual assertion is isolated — dates, entities, numbers, relationships. No summarization.

03 · SCORE

Support score returned

Every claim checked against sources. Output: support_score ∈ [0,1] + per-claim verdict.
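The three steps can be sketched as a toy lexical scorer. This is not Wauldo's implementation — just an illustration of claim-level grounding under a naive bag-of-words overlap assumption (one claim per sentence, a 0.6 overlap threshold picked arbitrarily for the sketch):

```python
# Toy illustration of claim-level lexical grounding.
# NOT Wauldo's algorithm: a naive sketch that splits an answer into
# sentence-level "claims" and scores token overlap against a source.
import re

def claims(answer: str) -> list[str]:
    # Naive claim extraction: one claim per sentence.
    return [s.strip() for s in re.split(r"[.!?]", answer) if s.strip()]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def support_score(answer: str, source: str) -> tuple[float, list[dict]]:
    src = tokens(source)
    verdicts = []
    for c in claims(answer):
        ct = tokens(c)
        overlap = len(ct & src) / len(ct) if ct else 0.0
        verdicts.append({"claim": c,
                         "supported": overlap >= 0.6,  # arbitrary threshold
                         "overlap": round(overlap, 2)})
    supported = sum(v["supported"] for v in verdicts)
    score = supported / len(verdicts) if verdicts else 0.0
    return score, verdicts

score, verdicts = support_score(
    "Paris has 12 million inhabitants.",
    "Paris population is 2.1 million (2024).")
print(score)  # 0.0: the population claim is not grounded in the source
```

Even this crude overlap check flags the Paris example; the real service layers entity, number, and relationship matching on top of this idea.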

curl · verify any answer
# POST /v1/fact-check — returns support_score + per-claim verdicts
curl -X POST https://api.wauldo.com/v1/fact-check \
  -H "Authorization: Bearer $WAULDO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Paris has 12 million inhabitants.",
    "source_context": "Paris population is 2.1 million (2024).",
    "mode": "lexical"
  }'
# → { "support_score": 0.0, "verdict": "UNVERIFIED", "claims": [...] }
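A typical consumer gates an agent's answer on the returned score. A minimal sketch, assuming only the response shape shown above (`support_score`, `verdict`, `claims`); the 0.8 threshold is an illustrative choice, not an API parameter:

```python
# Gate an agent's answer on the fact-check response.
# Assumes only the documented response fields: support_score, verdict, claims.

def gate(response: dict, threshold: float = 0.8) -> str:
    """Return 'pass' if the answer is well grounded, else 'retry'."""
    if response.get("verdict") == "UNVERIFIED":
        return "retry"
    if response.get("support_score", 0.0) < threshold:
        return "retry"
    return "pass"

resp = {"support_score": 0.0, "verdict": "UNVERIFIED", "claims": []}
print(gate(resp))  # retry: the Paris example above fails grounding
```

In a pipeline, "retry" would typically trigger re-generation with the per-claim verdicts fed back as context.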

// benchmark · v2026-04-17

Reproducible adversarial benchmark.

70 cases × 4 runs against five frameworks. Factual retrieval, prompt injection, out-of-scope. The command to re-run it is printed on this page — no signup, no cached numbers.

⚖ Reading the table fairly. LangChain, LlamaIndex, Haystack and CrewAI are orchestration frameworks, not verification layers, so comparing them to Wauldo on adversarial inputs is intentionally apples-to-oranges. The numbers measure what a developer gets out of the box from each stack, not a framework's intrinsic quality. The honest follow-up question is "does adding Wauldo to LangChain close the gap?" The ablation answers it: still 44% on injection; verification has to live inside the loop. See the ablation →

70-case adversarial · 4 runs · 5 frameworks · API live

Framework  | Factual | Injection | Out-of-scope | Total
Wauldo     | 100%    | 92%       | 100%         | 91%
LlamaIndex | 81%     | 48%       | 72%          | 68%
LangChain  | 78%     | 44%       | 70%          | 66%
Haystack   | 73%     | 41%       | 65%          | 60%
CrewAI     | 71%     | 38%       | 63%          | 58%

Reproduce: git clone https://github.com/wauldoai/wauldo-leaderboard && cargo run · full methodology →



// pricing

Start free. Pay for scale.

All tiers via RapidAPI. Same endpoints, same verification, same SDKs. No credit card for BASIC.

BASIC
$0/mo
500 requests/mo
  • All endpoints
  • Community support
  • No credit card
Start free
PRO
$19/mo
10,000 requests/mo
  • All endpoints
  • Priority queue
  • Email support
Subscribe
MEGA
$0.008/req
Pay-per-use
  • Unlimited volume
  • No commitment
  • Scales to millions
Go pay-per-use

Full pricing, FAQ, calculator →
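The tiers above imply a simple break-even. A sketch of the calculator, using only the prices listed on this page (BASIC $0 up to 500 req/mo, PRO $19 up to 10,000 req/mo, MEGA $0.008 per request):

```python
# Cheapest tier for a given monthly request volume,
# using only the prices listed on this page.

def cheapest_tier(requests_per_month: int) -> tuple[str, float]:
    options = []
    if requests_per_month <= 500:
        options.append(("BASIC", 0.0))
    if requests_per_month <= 10_000:
        options.append(("PRO", 19.0))
    options.append(("MEGA", 0.008 * requests_per_month))  # pay-per-use
    return min(options, key=lambda t: t[1])

print(cheapest_tier(500))        # ('BASIC', 0.0)
print(cheapest_tier(10_000))     # ('PRO', 19.0)
print(cheapest_tier(1_000_000))  # ('MEGA', 8000.0)
```

Break-even between PRO and MEGA sits at 19 / 0.008 = 2,375 requests per month; above that (and up to the PRO cap) the flat tier wins.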


Reproducible build · MIT · SDKs: PyPI · npm · crates.io · Open-source leaderboard · View changelog

Verify your first answer in 30 seconds.

Free tier. No credit card. 500 verifications per month on the house.

$ curl api.wauldo.com/v1/fact-check