INTEGRATION · NVIDIA NeMo Guardrails · pip install wauldo-nemo

Fact-check NeMo Guardrails answers — with an independent verifier.

NeMo's built-in self check facts rail asks the same LLM that wrote the answer whether the answer is true. Same model, same blind spots. wauldo-nemo is an output rail that checks each answer against its retrieved context with a separate verification service — claim by claim, with the evidence behind every verdict.

The gap in NeMo's self-check

NeMo Guardrails ships a self check facts output rail. It works by prompting your generation model — the one that just produced the answer — with a variation of "is this answer grounded in the context?". If a model hallucinated a fact, the same model is now the judge of whether it hallucinated. That is circular: same parametric memory, same failure modes, no external ground truth, and a single yes/no with no granularity.

For a support bot, a compliance assistant, or any RAG product, that is not a safety net. You need a check that does not share the generator's blind spots.

An independent output rail

wauldo-nemo registers a NeMo output rail backed by Wauldo's verification API. After your model answers, the rail sends the answer and its retrieved context to an independent service and gets back a structured verdict. Unsupported answers are refused and weakly supported ones are annotated, each with the source evidence.

pip install wauldo-nemo

Register the rail on your LLMRails instance — Wauldo stays the verifier, your existing model (OpenAI / Anthropic / local) stays the generator:

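from nemoguardrails import LLMRails, RailsConfig
from wauldo_nemo import register, RailConfig, RailDecision

# Assumption: your NeMo config (YAML + Colang) lives in ./config
rails = LLMRails(RailsConfig.from_path("./config"))

register(
    rails,
    config=RailConfig(
        on_error=RailDecision.PASS,         # Wauldo unreachable → fail-open (flagged)
        on_missing_context=RailDecision.ANNOTATE,
        max_retries=1,                      # fail fast: no latency tax on an outage
    ),
)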

Then reference it from a Colang output flow. $relevant_chunks is populated by NeMo's retriever; if it is empty the rail annotates ("no context") instead of crashing:

define flow wauldo verify output
  $result = execute wauldo_fact_check(bot_message=$bot_message, source_context=$relevant_chunks)

  if $result.decision == "refuse"
    bot refuse unsupported answer
    stop

  if $result.decision == "annotate"
    bot inform answer is weakly supported

What the rail returns

Unlike a single true/false, each verification comes back structured — so your flow can refuse, annotate, or pass with full context:

Claim-level verdicts — which specific claim failed, not a blanket pass/fail.
Evidence — the passage in your own sources behind each verdict.
A hallucination rate and per-claim confidence.
A relevance verdict — whether the answer actually addresses the user's question, scored separately from factual support.
Modes — a fast rule-based lexical mode, plus hybrid / semantic.

Honesty-bound by design: the rail's policy can only make a verdict stricter, never more lenient. Your thresholds can escalate a review to a refusal, but can never override the verification service into a pass. The mapping lives in one framework-agnostic module, unit-tested without the NeMo runtime.
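
The escalate-only rule can be pictured as taking the stricter of two decisions on an ordered scale. The sketch below is illustrative only: the Decision enum and apply_policy function are hypothetical names, not the wauldo-nemo API, and the real mapping module may be organized differently.

```python
from enum import IntEnum


class Decision(IntEnum):
    """Hypothetical decision scale, ordered from most lenient to strictest."""
    PASS = 0
    ANNOTATE = 1
    REFUSE = 2


def apply_policy(service_decision: Decision, policy_decision: Decision) -> Decision:
    """Combine the two decisions by keeping the stricter one.

    Because the result is a max over an ordered scale, the policy can raise
    the severity of the service's decision but can never lower it.
    """
    return max(service_decision, policy_decision)


# A local threshold escalates an annotation to a refusal.
escalated = apply_policy(Decision.ANNOTATE, Decision.REFUSE)

# A lenient policy cannot turn the service's refusal into a pass.
not_softened = apply_policy(Decision.REFUSE, Decision.PASS)
```

Expressing the rule as a max over an ordered scale means there is no code path on which a local setting can produce a softer outcome than the verification service returned.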

Catch answers that are true — but off-topic

A fact-check alone has a blind spot: an answer can be perfectly grounded in your sources and still not answer the question. "What's your refund window?" → "Our headquarters are in Paris." — verified, supported, useless. NeMo's self-check can't see this either: the answer is consistent with the context.

The rail scores relevance to the user's question as a separate axis, decoupled from factual support. The question is picked up automatically from NeMo's $last_user_message — no flow changes needed. Set a floor and choose what happens below it:

from wauldo_nemo import register, RailConfig, RailDecision, PolicyThresholds

register(
    rails,
    config=RailConfig(
        thresholds=PolicyThresholds(
            min_relevance_score=0.75,
            on_low_relevance=RailDecision.ANNOTATE,  # or REFUSE
        ),
    ),
)

A verified-but-off-topic answer comes back as verdict=verified with a low relevance score — your flow reads it from $wauldo_relevance. Same honesty rule as everything else: the relevance gate can only escalate a decision, never soften one.
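
To make the two-axis idea concrete, here is a minimal sketch of a relevance floor applied on top of a factual decision. Everything in it is an assumption for illustration: the Verification shape, the relevance_gate function, the string decision values and the 0.0 to 1.0 score range are hypothetical and are not taken from the wauldo SDK.

```python
from dataclasses import dataclass

# Hypothetical severity order, most lenient first.
SEVERITY = {"pass": 0, "annotate": 1, "refuse": 2}


@dataclass
class Verification:
    """Illustrative result shape: factual support and relevance are separate."""
    decision: str           # decision derived from factual support alone
    relevance_score: float  # assumed range 0.0 to 1.0


def relevance_gate(result: Verification, min_relevance_score: float,
                   on_low_relevance: str) -> str:
    """Return the final decision after applying the relevance floor."""
    if result.relevance_score >= min_relevance_score:
        return result.decision
    # Below the floor: keep whichever decision is stricter.
    return max(result.decision, on_low_relevance, key=SEVERITY.__getitem__)


# "Our headquarters are in Paris." is supported by the sources but off-topic.
off_topic = Verification(decision="pass", relevance_score=0.10)
gated = relevance_gate(off_topic, min_relevance_score=0.75,
                       on_low_relevance="annotate")
```

In this sketch a grounded but off-topic answer ends up annotated, while an answer that was already refused on factual grounds stays refused regardless of the relevance setting.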

Roll it out in shadow mode

Not ready to let a rail refuse answers in production? Set shadow=True: the rail runs the full verification, logs what it would have decided (structured logs with a request_id per call, optional OpenTelemetry spans via the otel extra), and lets everything through. Watch the would-be refusals for a week, then flip enforcement on. The evidence behind each verdict is also exposed to your flows as $wauldo_evidence, and refusal wording is yours to own via refuse_template.
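
Conceptually, shadow mode separates what the rail decides from what it enforces. The sketch below shows that split with a hypothetical enforce function; the log format, the logger name and the way the request_id is generated are assumptions for illustration, not the rail's actual implementation.

```python
import logging
import uuid

logger = logging.getLogger("shadow_rail_sketch")


def enforce(decision: str, shadow: bool) -> str:
    """Return the decision to act on; in shadow mode, log it and pass."""
    if not shadow:
        return decision
    request_id = uuid.uuid4().hex  # hypothetical per-call correlation id
    logger.info("shadow would_decide=%s request_id=%s", decision, request_id)
    return "pass"


shadow_result = enforce("refuse", shadow=True)     # logged, answer goes through
enforced_result = enforce("refuse", shadow=False)  # refusal is applied
```

Because the verification itself runs the same way in both modes, the would-be refusals you review during the shadow period should match what enforcement applies once you switch it on.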

A verification layer, not an orchestrator

wauldo-nemo does not own flow control — NeMo does. Wauldo is the verification layer underneath. It is a thin adapter on top of the published wauldo SDK; all verdict logic lives in the SDK, never re-implemented in the rail. MIT-licensed, tested on Python 3.9 to 3.12.

We won't call it "deterministic" — the pipeline uses LLMs. The honest differentiator is independence from the generation model plus claim-level evidence.

Get started: pip install wauldo-nemo · source on GitHub ↗ · API docs ↗. 500 verifications/month free. See pricing →
