The gap in NeMo's self-check
NeMo Guardrails ships a self check facts output rail. It works by prompting your generation model — the one that just produced the answer — with a variation of "is this answer grounded in the context?". If a model hallucinated a fact, the same model is now the judge of whether it hallucinated. That is circular: same parametric memory, same failure modes, no external ground truth, and a single yes/no with no granularity.
For a support bot, a compliance assistant, or any RAG product, that is not a safety net. You need a check that does not share the generator's blind spots.
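The circularity can be illustrated with a deliberately simplified toy. Nothing here is NeMo or Wauldo code: `MODEL_MEMORY`, `generate`, `self_check`, and `independent_check` are hypothetical stand-ins, and a real self-check prompt does see the context. The sketch only shows why a judge that leans on the generator's own beliefs tends to agree with the generator.

```python
from typing import Dict

# Hypothetical stand-in for a model's parametric memory, with one wrong belief.
MODEL_MEMORY: Dict[str, str] = {"refund window": "90 days"}  # the policy says 30

# Retrieved context: the text the answer is supposed to be grounded in.
CONTEXT = "Our refund window is 30 days from the date of purchase."


def generate(topic: str) -> str:
    """Generator answers from its own memory and ignores the context."""
    return f"The {topic} is {MODEL_MEMORY[topic]}."


def self_check(answer: str, topic: str) -> bool:
    """Same-model judge: consults the same memory, so it agrees with itself."""
    return MODEL_MEMORY[topic] in answer


def independent_check(answer: str, context: str) -> bool:
    """Independent judge: accepts the claimed value only if the context has it."""
    claimed = answer.rsplit(" is ", 1)[1].rstrip(".")
    return claimed in context


answer = generate("refund window")
self_ok = self_check(answer, "refund window")    # the hallucination passes
indep_ok = independent_check(answer, CONTEXT)    # the hallucination is caught
```

The self-check passes the wrong answer because it shares the generator's belief; the independent check rejects it because it only trusts the source text.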
An independent output rail
wauldo-nemo registers a NeMo output rail backed by Wauldo's verification API. After your model answers, the rail sends the answer and its retrieved context to an independent service and gets back a structured verdict. Unsupported answers are refused and weakly supported ones are annotated, each with the source evidence.
pip install wauldo-nemo

Register the rail on your LLMRails instance. Wauldo stays the verifier; your existing model (OpenAI / Anthropic / local) stays the generator:
from wauldo_nemo import register, RailConfig, RailDecision

register(
    rails,
    config=RailConfig(
        on_error=RailDecision.PASS,  # Wauldo unreachable → fail-open (flagged)
        on_missing_context=RailDecision.ANNOTATE,
        max_retries=1,  # fail fast: no latency tax on an outage
    ),
)

Then reference it from a Colang output flow. $relevant_chunks is populated by NeMo's retriever; if it is empty the rail annotates ("no context") instead of crashing:
define flow wauldo verify output
  $result = execute wauldo_fact_check(bot_message=$bot_message, source_context=$relevant_chunks)
  if $result.decision == "refuse"
    bot refuse unsupported answer
    stop
  if $result.decision == "annotate"
    bot inform answer is weakly supported

What the rail returns
Unlike a single true/false, each verification comes back structured — so your flow can refuse, annotate, or pass with full context:
- Claim-level verdicts — which specific claim failed, not a blanket pass/fail.
- Evidence — the passage in your own sources behind each verdict.
- A hallucination rate and per-claim confidence.
- Modes: a fast rule-based lexical mode, plus hybrid / semantic.
The rail's decision mapping can tighten a verdict, for example escalating a review to a refusal, but can never override the verification service into a pass. The mapping lives in one framework-agnostic module, unit-tested without the NeMo runtime.
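A minimal sketch of what such a tighten-only mapping might look like, assuming decisions can be ordered by strictness. Every name here (`Decision`, `ClaimVerdict`, `hallucination_rate`, `final_decision`) is hypothetical and is not taken from the wauldo SDK or wauldo-nemo; the definition of the hallucination rate as the share of unsupported claims is likewise an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Decision(IntEnum):
    """Hypothetical decisions, ordered so that max() picks the stricter one."""
    PASS = 0
    ANNOTATE = 1
    REFUSE = 2


@dataclass
class ClaimVerdict:
    """Hypothetical shape of one claim-level verdict."""
    claim: str
    supported: bool
    evidence: str      # passage from the source context, empty if none found
    confidence: float  # 0.0 to 1.0


def hallucination_rate(verdicts: List[ClaimVerdict]) -> float:
    """Share of claims without support (assumed definition)."""
    if not verdicts:
        return 0.0
    return sum(not v.supported for v in verdicts) / len(verdicts)


def final_decision(service: Decision, local_floor: Decision) -> Decision:
    """Local policy may tighten the service's decision but never loosen it."""
    return max(service, local_floor)


verdicts = [
    ClaimVerdict("Refunds take 30 days", True, "Our refund window is 30 days.", 0.93),
    ClaimVerdict("Shipping is free worldwide", False, "", 0.88),
]
rate = hallucination_rate(verdicts)

# A strict local floor escalates an annotation to a refusal...
tightened = final_decision(Decision.ANNOTATE, Decision.REFUSE)
# ...but a lenient local floor cannot turn a refusal into a pass.
not_loosened = final_decision(Decision.REFUSE, Decision.PASS)
```

Taking the maximum over an ordered enum keeps the rule in one place: whatever the local setting, the outcome is never less strict than what the verification service returned.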
A verification layer, not an orchestrator
wauldo-nemo does not own flow control — NeMo does. Wauldo is the verification layer underneath. It is a thin adapter on top of the published wauldo SDK; all verdict logic lives in the SDK, never re-implemented in the rail. MIT-licensed, tested on Python 3.9 to 3.12.
We won't call it "deterministic" — the pipeline uses LLMs. The honest differentiator is independence from the generation model plus claim-level evidence.
Install with pip install wauldo-nemo. The source is on GitHub, alongside the API docs. 500 verifications/month are free; see pricing for details.