The two halves of agent lock-in

LangChain recently shipped Deep Agents Deploy — and they're right about one thing: an agent harness is intimately tied to memory. As Sarah Wooders put it: the harness is context management. When the harness is closed and the memory APIs are proprietary, you don't own your agent. You rent it.

We agree. Wauldo Deploy is open-first by design:

  • AGENTS.md — your instructions, in a file you can grep, diff, and version.
  • Skills — markdown knowledge + executable actions, scanned from a skills/ directory.
  • mcp.json — external tools via the standard MCP format used by Claude Desktop, Cursor, Cline.
  • Agent Protocol — conformant endpoints at /ap/v1/agent/tasks so AutoGPT, SuperAGI, and any AP client work out of the box.
  • Model-agnostic — OpenAI, Anthropic, Google, and aggregators. Swap providers without rewriting a line.
  • Self-hostable — single binary, local storage, no mandatory cloud.
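Every piece above is a plain file you can read and version. For instance, an mcp.json in the standard MCP client format (the same shape Claude Desktop and Cursor read) might wire in a filesystem server; the server name and path here are illustrative:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "./data"]
    }
  }
}
```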

That covers half of the lock-in problem. The other half is the one nobody on the managed-or-open spectrum has solved: your agent is still an LLM, and LLMs still lie.

The half everyone skips: verification

Take an SDR agent. It pulls a lead profile from your CRM, asks the LLM to draft outreach. The LLM "helpfully" invents a recent funding round that never happened. You send the email. The prospect replies with a screenshot of your hallucination. Your brand takes the hit, not OpenAI's.

Deep Agents Deploy doesn't solve this. Claude Managed Agents doesn't either. Neither does CrewAI, AutoGen, or LangGraph. They ship the LLM's output verbatim. If the model hallucinates, you ship the hallucination.

Wauldo Deploy runs every agent response through the Verified Execution Engine before it leaves the server. The pipeline:

  • Pre-LLM: source instruction filtering, cross-source contradiction detection, normalized value comparison (currency, time, sizes, %), 35 anti-injection patterns across 4 languages.
  • Post-LLM: claim extraction and source-grounded fact-checking (hybrid embedding + token overlap), phantom citation detection, structured output validation, injection regurgitation check.
  • Output: a verdict (SAFE, UNCERTAIN, PARTIAL, CONFLICT, BLOCK, UNVERIFIED) and a support_score ∈ [0,1] — a single number you can gate on.

When a claim is ungrounded, the Response Rewriter replaces it with [could not be verified: ...] rather than letting a confident lie slip through. The client picks the mode:

verification_mode = "strict"      # UNVERIFIED → BLOCK
verification_mode = "balanced"    # UNVERIFIED → PARTIAL (default)
verification_mode = "permissive"  # UNVERIFIED → pass-through with warning
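The three modes amount to a small decision function plus the rewrite rule. A hedged sketch of the client-visible behavior (function names are illustrative, not the SDK's API):

```python
def resolve_verdict(verdict: str, mode: str) -> str:
    """Map an UNVERIFIED verdict to its final form per verification mode.
    All other verdicts pass through unchanged."""
    if verdict != "UNVERIFIED":
        return verdict
    return {
        "strict": "BLOCK",
        "balanced": "PARTIAL",       # default
        "permissive": "UNVERIFIED",  # pass-through, warning attached
    }[mode]

def rewrite_ungrounded(text: str, claim: str) -> str:
    """Replace an ungrounded claim with a visible marker instead of shipping it."""
    return text.replace(claim, f"[could not be verified: {claim}]")
```

So `resolve_verdict("UNVERIFIED", "strict")` yields `"BLOCK"`, while a `"SAFE"` verdict is untouched in every mode.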

It's exposed on the Agent Protocol

Wauldo Deploy speaks the open Agent Protocol. Every step response carries an additional_output.verification block — a non-standard extension that surfaces the support score without breaking conformance:

POST /ap/v1/agent/tasks/{task_id}/steps HTTP/1.1

{
  "task_id": "tk_abc123",
  "artifacts": [{"file_name": "answer.txt", "agent_created": true}],
  "additional_output": {
    "verification": {
      "verdict": "SAFE",
      "support_score": 0.91,
      "claims": [
        { "text": "The return policy is 30 days", "grounded": true, "confidence": 0.94, "sources": [1] },
        { "text": "Free shipping over $50", "grounded": true, "confidence": 0.88, "sources": [2] }
      ],
      "sources_cited": [1, 2]
    }
  }
}

AP clients ignore what they don't understand — so conformance tools keep passing. But if you gate on support_score < 0.6, you get grounded agents for free. That's not a config option you can add to a managed platform after the fact.
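Client-side, that gate is a few lines. A sketch assuming the response shape shown above (the 0.6 threshold mirrors the example; tune it per use case):

```python
def gate(step_response: dict, threshold: float = 0.6) -> tuple[bool, str]:
    """Accept or reject an Agent Protocol step based on the
    additional_output.verification extension. A missing block is
    treated as unverified and rejected."""
    verification = step_response.get("additional_output", {}).get("verification")
    if verification is None:
        return False, "no verification block"
    if verification.get("verdict") == "BLOCK":
        return False, "verdict BLOCK"
    score = verification.get("support_score", 0.0)
    if score < threshold:
        return False, f"support_score {score} below {threshold}"
    return True, "ok"
```

Against the example response above, `gate(...)` returns `(True, "ok")`; a response from an agent that never heard of the extension fails closed.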

The benchmarks that matter

We run an adversarial suite of 70 tasks on every commit, split across:

  • 10 factual — straight retrieval from grounded sources.
  • 15 out-of-scope — questions the source can't answer; correct behavior is refusal, not confabulation.
  • 25 injection — 5 attack types (direct, indirect via document, zero-width Unicode, multilingual, fragmented).
  • 10 contradiction — sources disagree; the agent must detect the conflict, not pick a side.
  • 10 semantic / multilingual — hedge words, antonyms, negation across EN/FR/ES/DE.
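For a flavor of what the injection tier exercises: the zero-width Unicode attack hides instructions between visible characters so naive pattern matching misses them. A simplified pre-filter sketch (the real engine runs 35 patterns across 4 languages; this handles only this one attack class):

```python
# Zero-width code points commonly abused to smuggle hidden instructions.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def strip_zero_width(text: str) -> str:
    """Remove zero-width characters so hidden payloads become visible
    to downstream injection pattern matching."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)
```

After stripping, "ig\u200bnore previous instructions" reads as the plain injection string and gets caught by the ordinary direct-injection patterns.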

Current score: 97% on the adversarial suite, with a 0% hallucination rate. The numbers are stable across models, which tells you the pipeline itself is doing the work, not the model.

The gap between "our agent works on happy-path demos" and "our agent survives an adversarial bench" is where most framework comparisons quietly disappear. We publish the harness, the dataset, and the numbers.

What you get in the box

  • Production-tested Rust engine, MIT license, thousands of tests.
  • Modular architecture: orchestrator, hybrid RAG, multi-provider routing with fallback, MCP support, verified Task API, tools, CLI, SDKs for Python / TypeScript / Rust.
  • Sandboxes: hardened Docker containers or local dev mode.
  • Memory API: tenant-scoped, semantic search, exportable. Your memory, your database — we don't hold it hostage.
  • Human-in-the-loop: mark tools as requiring approval in your config, get pause/approve/reject endpoints for free.
  • A2A: agents can call other agents as tools, with cycle detection and depth limit.
  • Observability: Prometheus metrics for every stage of the verification pipeline.
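The A2A cycle detection and depth limit above amount to threading a call path through nested agent-as-tool calls. A minimal sketch of the idea (names and routing are illustrative, not the engine's API):

```python
class AgentCallError(Exception):
    pass

def call_agent(name: str, path: tuple[str, ...] = (), max_depth: int = 3) -> str:
    """Invoke an agent as a tool, refusing cycles and over-deep chains."""
    if name in path:
        raise AgentCallError(f"cycle: {' -> '.join(path + (name,))}")
    if len(path) >= max_depth:
        raise AgentCallError(f"depth limit {max_depth} exceeded")
    path = path + (name,)
    # Illustrative routing: 'sdr' delegates to 'researcher', which answers.
    if name == "sdr":
        return call_agent("researcher", path)
    return f"{name} answered (chain: {' -> '.join(path)})"
```

Carrying the path as an immutable tuple means each branch of a fan-out sees only its own ancestry, so parallel delegations never trip each other's cycle check.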

Try it

cargo install wauldo-cli
wauldo init my-agent && cd my-agent
wauldo deploy --target local

60 seconds from zero to a running, verified agent. No credit card. No managed platform. No walled garden.

The core bet: you shouldn't have to choose between open and grounded. With Wauldo Deploy, you don't.

OpenAI is a trademark of OpenAI, Inc. Anthropic and Claude are trademarks of Anthropic, PBC. Google and Gemini are trademarks of Google LLC. Meta and Llama are trademarks of Meta Platforms, Inc. All other trademarks are the property of their respective owners.


Try it free: paste any AI answer into our home widget to get a numeric support_score. No signup. 300 verifications/month free on RapidAPI.