Blog

Patterns in AI agent engineering.

From the Refine AI team — practical writing on CI gates, behavioral testing, and shipping reliable AI agents.

Featured post
CI · Behavioral testing · Regression gates

Why your agent needs a regression gate, not just a debugger

A developer ships a "minor" prompt update. Evals pass. PR merges. Two days later: the agent now takes 22 steps instead of 6. Token cost tripled. Everything looked correct on the surface. Here's why output evals can't catch this — and what a behavioral regression gate does differently.

April 11, 2026 · 8 min read

More posts
LangChain · Instrumentation

Instrumenting any LangChain agent in under 10 lines

A step-by-step walkthrough of wrapping your LangChain AgentExecutor with @trace so Refine AI can capture a behavioral baseline.

Coming soon · 5 min read
CI · Best practices

The 8 behavioral checks every agent CI pipeline needs

Step count, tool calls, loop risk, latency, cost, and more: a deep dive into each check, what it catches, and how to tune thresholds for your agent.

Coming soon · 6 min read
Model updates · Behavioral testing

Model swaps are the silent killer of agent reliability

Upgrading from gpt-4o to a new model version seems safe — but behavioral profiles can shift dramatically. How to gate model upgrades the right way.

Coming soon · 7 min read