// engineering notes

We write when we have something to measure.

No thought leadership. No "state of AI" posts. Just essays with code, benchmarks, and repro steps. If a post has a number in the title, the number is in the post.

// featured · 2026-04-12

System-level robustness vs bolt-on layer.

We ran the obvious experiment: take LangChain, add Wauldo Guard as a post-hoc check, and see if the +48pt injection gap closes. Spoiler: it doesn't. Here's why verification inside the loop is not the same as verification around the loop.
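
For concreteness, here is roughly what "around the loop" means, as a minimal Python sketch; run_chain() and guard_check() are hypothetical stand-ins for a LangChain-style agent and Wauldo Guard, not real APIs.

    def run_chain(user_input: str) -> str:
        # placeholder: imagine a multi-step agent with tool calls in here
        return f"final answer for: {user_input}"

    def guard_check(text: str) -> bool:
        # placeholder: flags text that looks injected or unsupported
        return "ignore previous instructions" in text.lower()

    def around_the_loop(user_input: str) -> str:
        # Bolt-on guard: the chain runs unchecked; the guard only sees the final text.
        answer = run_chain(user_input)
        return "[blocked]" if guard_check(answer) else answer

    # "Inside the loop" would instead check every intermediate tool result and
    # model step before it feeds the next one -- the part a post-hoc layer never sees.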

Read time ~9 min · Published 2026-04-12 · /blog/ablation-system-vs-layer

KEY TAKEAWAY

Guard around LangChain: injection 44%. LangChain alone: injection 44%. Wauldo: injection 92%. The gap lives in the reasoning path, not at its boundary.

// all posts

12 essays, newest first.

2026-04-12 · benchmark

System-level robustness vs bolt-on layer: why LangChain + Guard didn't close the gap

We bolted Wauldo Guard onto LangChain, re-ran the 70-test suite, and watched injection accuracy stay flat at 44%.

Read post →
2026-04-11 · deep-dive

Wauldo Deploy — shipping the verification primitive to prod

How we turned a research repo into a production API — and the plumbing nobody writes about.

Read post →
2026-04-05 · tutorial

How to get verified AI answers in 5 minutes

Copy-paste a curl, read back a support_score, ship a verification gate before your next deploy. Three steps, five minutes, no framework lock-in.
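
As a rough Python equivalent of that curl (the endpoint URL, payload shape, and 0.9 threshold below are illustrative, not the real API; support_score is the field the gate reads):

    import requests

    def verify(answer: str, sources: list[str]) -> float:
        # placeholder endpoint and payload; swap in the real ones from the post
        resp = requests.post(
            "https://api.example.com/v1/verify",
            json={"answer": answer, "sources": sources},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()["support_score"]

    score = verify("Our SLA is 99.95% uptime.", ["SLA doc: uptime target is 99.95%."])
    if score < 0.9:  # pick a threshold that matches your risk tolerance
        raise RuntimeError(f"verification gate failed (support_score={score})")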

Read post →
2026-04-02 · benchmark

We tested 14 LLMs for hallucination on Claude 3.5 Sonnet queries — here are the results

Same 61 eval tasks, 14 models, one honest scoreboard. The hallucination rates are worse than the vendors' release notes suggest.

Read post →
2026-03-29 · essay

Your LLM is lying in production — here's how to prove it

You can't fix what you don't measure. A practical playbook for catching silent hallucinations in the wild, without waiting for a user to tweet screenshots.

Read post →
2026-03-25 · deep-dive

LangChain hallucinations: why retrieval alone doesn't fix them

RAG retrieves the right chunks, then the model ignores them and writes fiction. We trace the failure mode to the generation step, not the index.

Read post →
2026-03-20 · tutorial

How to verify OpenAI responses before users see them

Drop-in middleware pattern: OpenAI call, verification call, reject or annotate before the user loads the page. Same latency budget, fewer apologies.
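
A minimal sketch of that middleware shape, assuming the official openai Python SDK for the generation call; verify() stands in for the verification call and is a hypothetical helper, not a real client.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def verify(answer: str, sources: list[str]) -> float:
        # stand-in for the verification call; see the gate sketch above
        return 1.0

    def answer_with_gate(question: str, sources: list[str]) -> dict:
        completion = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": question}],
        )
        answer = completion.choices[0].message.content
        score = verify(answer, sources)
        if score < 0.9:
            # reject (or annotate) before the user ever sees the text
            return {"answer": None, "error": "failed verification", "support_score": score}
        return {"answer": answer, "support_score": score}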

Read post →
2026-03-15 · deep-dive

Zero hallucinations: how our RAG pipeline works

Hybrid BM25 + vector, tenant-scoped chunks, claim-level extraction, per-claim grounding. The architecture behind 0% measured hallucination on 61 eval tasks.
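
A naive sketch of the per-claim grounding gate, to make the idea concrete; the claim splitter and grounded() check below are crude placeholders, not the pipeline from the post.

    def split_claims(draft: str) -> list[str]:
        # crude stand-in for claim-level extraction
        return [s.strip() for s in draft.split(".") if s.strip()]

    def grounded(claim: str, chunks: list[str]) -> bool:
        # crude stand-in for per-claim grounding (the real check is model-based)
        return any(claim.lower() in chunk.lower() for chunk in chunks)

    def passes_gate(draft: str, chunks: list[str]) -> bool:
        # every claim must be supported by at least one retrieved chunk
        return all(grounded(claim, chunks) for claim in split_claims(draft))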

Read post →
2026-03-10 · opinion

The real cost of shipping unverified AI to users

The $0.002 LLM call is not the expensive part. Support tickets, legal exposure, and churn from one bad answer dwarf the entire inference bill.

Read post →
2026-03-05 · tutorial

How to fact-check LLM outputs automatically

Claim extraction, source grounding, numerical-mismatch detection, a support_score you can gate on. Wire it into your pipeline in one afternoon.
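
The numerical-mismatch piece is the easiest to show in a few lines; this toy version flags any number in the answer that appears in no source, and is illustrative only.

    import re

    def numbers(text: str) -> set[str]:
        return set(re.findall(r"\d+(?:\.\d+)?%?", text))

    def numeric_mismatches(answer: str, sources: list[str]) -> set[str]:
        source_nums = set().union(*(numbers(s) for s in sources)) if sources else set()
        return numbers(answer) - source_nums

    print(numeric_mismatches("Uptime was 99.99% in Q3.", ["Report: uptime was 99.95% in Q3."]))
    # {'99.99%'} -- the answer's figure appears in no source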

Read post →
2026-02-28 · opinion

'It works most of the time' is not good enough for AI

91% accuracy sounds great until you run a million requests a month. Why "most of the time" is the new "dropped every tenth packet" of AI infrastructure.
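
The arithmetic behind that line, spelled out:

    requests_per_month = 1_000_000
    error_rate = 0.09                      # i.e. "91% accuracy"
    bad_answers_per_month = round(requests_per_month * error_rate)
    print(bad_answers_per_month)           # 90000 -- roughly 3,000 wrong answers a day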

Read post →
2026-02-22 · tutorial

5 mistakes to avoid when building RAG systems

Global BM25 across tenants, naive chunking, no claim grounding, no verification gate, no evals. We've made four of these. Don't repeat them.
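
The first mistake on that list is the cheapest to show; a sketch of the fix, assuming a hypothetical index object whose search() accepts a metadata filter.

    def retrieve(index, query: str, tenant_id: str, k: int = 5):
        # WRONG: index.search(query, k) -- a global index ranks every tenant's chunks
        # RIGHT: scope candidates to the requesting tenant before ranking
        return index.search(query, k, filter={"tenant_id": tenant_id})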

Read post →
SUBSCRIBE

Posts ship sporadically — when we have a number to report or a pattern to share. Subscribe via RSS to catch every one. No email capture.

// adjacent reading

The numbers, not the essays.

Essays are fine. Measuring is better.

Paste an AI answer into the widget on our home page to see its support_score live.