// engineering notes

We write when we have something to measure.

No thought leadership. No "state of AI" posts. Essays with code, benchmarks, and repro steps. If a post has a number in the title, the number is in the post.

// featured · 2026-04-22

Wauldo Deploy — Open verified alt to managed agents

We built Wauldo Deploy: open standards (AGENTS.md, Skills, mcp.json, Agent Protocol) plus verified execution. Every agent response is grounded and comes with a support score.

Published 2026-04-22 · /blog/wauldo-deploy

// all posts

12 essays, newest first.

2026-04-22 · deploy

Wauldo Deploy — Open verified alt to managed agents

We built Wauldo Deploy: open standards (AGENTS.md, Skills, mcp.json, Agent Protocol) plus verified execution. Every agent response is grounded and comes with a support score.

2026-04-17 · benchmark

14-LLM hallucination benchmark

We tested 14 LLMs on 61 adversarial tasks. Every model hallucinated. The cheaper model with verification beat the premium model without. Here's the data.

2026-04-10 · verification

We wrapped LangChain with our own verifier. It changed nothing.

Ablation study on 70 adversarial tests: LangChain + Wauldo Guard scored 45/70. Bare LangChain scored 46/70. Post-hoc verification doesn't close the gap. Robustness is system-level.

2026-03-25 · positioning

"Works most of the time" isn't good enough

Users trust AI that sounds confident, even when it is wrong 8% of the time. They don't report the wrong answers. They trust them. Do the math for your own AI.

2026-03-20 · business

The Real Cost of Shipping Unverified AI to Users

One wrong AI answer triggers churn, support tickets, and compliance risk. Here's how to quantify the cost — and fix it with verification.

2026-03-15 · verification

How to fact-check LLM outputs automatically

A developer's guide to adding automated verification to your LLM pipeline: token overlap, semantic similarity, hybrid scoring, and numerical mismatch detection.

2026-03-10 · langchain

LangChain Hallucinations: Why Retrieval Alone Doesn't Fix Them

LangChain retrieves context but doesn't verify answers. Here's why RAG hallucinations persist and how to add a verification layer that actually works.

2026-03-05 · production

Your LLM Is Lying in Production

Most LLM apps ship unverified answers to users. Here's how to audit your AI outputs, measure hallucination rates, and catch wrong answers before users do.

2026-03-01 · openai

How to Verify OpenAI Responses Before Users See Them

GPT-4 is powerful but not infallible. Here's how to add a verification layer between OpenAI and your users. Code examples for Python and TypeScript.

2026-02-25 · rag

5 RAG mistakes to avoid

Five pitfalls every RAG team hits: tenant isolation, confidence signals, hybrid retrieval, source attribution, model routing. With pseudocode.

2026-02-20 · verification

How to Get Verified AI Answers in 5 Minutes

Upload documents, ask questions, get answers with sources and a numeric support score. One API, no hallucinations, full audit trail.

2026-02-15 · hallucination

Zero Hallucinations: How Our RAG Pipeline Works

3-path retrieval, multi-source merge, post-generation fact-checking, confidence calibration, and an audit trail that proves every answer came from source.

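The 2026-03-25 post invites readers to do the math for their own AI. Here is a minimal Python sketch of that arithmetic. It takes the 8% error rate from the post's excerpt. The daily answer volume and the answers-per-user figure are assumptions. The second function also assumes that each answer is wrong independently of the others:

```python
# Only the 8% error rate comes from the post excerpt; the other inputs are assumptions.
ERROR_RATE = 0.08
DAILY_ANSWERS = 10_000   # assumed daily answer volume
ANSWERS_PER_USER = 10    # assumed number of answers a typical user reads


def expected_wrong_answers(answers: int, error_rate: float) -> float:
    """Expected number of wrong answers served across `answers` responses."""
    return answers * error_rate


def prob_at_least_one_wrong(answers: int, error_rate: float) -> float:
    """Chance that a user reading `answers` responses sees at least one wrong one,
    assuming each answer is wrong independently with probability `error_rate`."""
    return 1 - (1 - error_rate) ** answers


print(expected_wrong_answers(DAILY_ANSWERS, ERROR_RATE))                 # → 800.0
print(round(prob_at_least_one_wrong(ANSWERS_PER_USER, ERROR_RATE), 3))  # → 0.566
```

With these assumed numbers, 10,000 answers a day means about 800 wrong ones. A user who reads 10 answers has roughly a 57% chance of seeing at least one. Substitute your own volume and your measured error rate.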
// subscribe

Posts ship sporadically — when we have a number to report or a pattern to share. Subscribe via RSS to catch every one. No email capture.

// adjacent reading

The numbers, not the essays.

Essays are fine. Measuring is better.

Paste an AI answer into the widget on our home page to see its support_score live.
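This page does not explain how support_score is computed. The following is an illustration only. It is a minimal Python sketch of two checks named in the 2026-03-15 post, token overlap and numerical mismatch detection, combined into a toy score. The function names are hypothetical, and so is the rule that any unsupported number drops the score to zero. None of this is Wauldo's implementation:

```python
import re

# Hypothetical helpers: not Wauldo's support_score, just two checks the post names.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:\.[0-9]+)?")
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def tokens(text: str) -> set[str]:
    """Lowercased word and number tokens."""
    return set(TOKEN_RE.findall(text.lower()))


def token_overlap(answer: str, source: str) -> float:
    """Share of answer tokens that also appear in the source (0.0 to 1.0)."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & tokens(source)) / len(answer_tokens)


def numeric_mismatches(answer: str, source: str) -> set[str]:
    """Numbers stated in the answer that never appear in the source."""
    return set(NUMBER_RE.findall(answer)) - set(NUMBER_RE.findall(source))


def support_score(answer: str, source: str) -> float:
    """Toy score: token overlap, forced to 0.0 if any number is unsupported."""
    if numeric_mismatches(answer, source):
        return 0.0
    return round(token_overlap(answer, source), 2)


source = "Bare LangChain scored 46/70."
print(support_score("LangChain scored 46/70", source))  # → 1.0
print(support_score("LangChain scored 52/70", source))  # → 0.0 (52 is not in the source)
```

Token overlap is cheap, but it misses paraphrases. The same post also lists semantic similarity, which this sketch leaves out.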