Pricing — Refine AI

Free

$0 / forever

Local debugging — no cloud required.

Unlimited local runs
agentdbg view timeline viewer
Loop & guardrail detection
Step-through trace inspection
LangChain, LlamaIndex, AutoGen, raw Python
Community Slack

Common questions

What counts as a "behavioral regression"?

Any of the following: step count outside the configured threshold, new or removed tool calls, elevated loop risk score, cost above budget, latency spike beyond the allowed delta, guardrail fires, or a stop condition being missed. You configure thresholds — Refine AI measures deltas against your baseline.

Do I need to define what "correct" looks like?

No. There are no golden datasets to curate, no rubrics to write, and no LLM judges to prompt. Refine AI compares the execution structure of your agent before and after the code change. If the structure changed outside your thresholds, the check fails.

How does baseline management work?

Run agentdbg baseline capture on your main branch. The baseline is stored as a JSON file and versioned in your repo. The GitHub Action compares every PR against it automatically. When you intentionally change behavior, run baseline capture again to update it.

Is there a usage limit on the Free tier?

No limits on local runs — ever. The Free tier is local-only: you can run as many agent traces as you like and use the full timeline viewer. CI gate features (agentdbg assert and the GitHub Action) require the Team plan.

Do traces leave my environment?

No. Refine AI runs entirely on your CI runner. Traces are generated, compared, and discarded within your GitHub Actions runner environment. Nothing is sent to our servers. Enterprise customers on on-prem deployments have full control over data residency.

Can I use Refine AI with TypeScript / Node?

The Python SDK is available today and supports LangChain, LlamaIndex, AutoGen, and raw Python agents. A TypeScript / Node.js SDK is in active development. Sign up for early access and we'll notify you when it ships.

How is this different from Braintrust or LangSmith?

Braintrust and LangSmith evaluate output quality using LLM-as-judge — they answer "was the answer good?" Refine AI checks whether the behavioral structure of execution changed — it answers "did this code change alter how the agent behaves?" No LLM calls in our check path. It's a different question. Both tools are complementary.

What's the Enterprise pricing model?

Per-seat or volume-based depending on team size and deployment model. We work with engineering teams to scope the right package. Reach out at founders@refinehq.ai and we'll get back to you within one business day.

Feature	Codecov	Snyk	SonarQube
Free local tool
Paid CI gate
Per-seat pricing
Specific to AI agents	–	–	–

Free to debug.
Pay to gate your PRs.

The proven devtools model,
applied to AI agents.

Common questions

Ready to gate
your first PR?

Free to debug.Pay to gate your PRs.

The proven devtools model,applied to AI agents.

Common questions

Ready to gateyour first PR?

Free to debug.
Pay to gate your PRs.

The proven devtools model,
applied to AI agents.

Ready to gate
your first PR?