The platform for Self-improving Agents.
Polarity monitors every agent decision in production, surfaces failure patterns before users hit them, and turns trajectories into evals that compound your agent’s reliability over time.
Platform. Monitor, triage, and improve your agents in production.
FIG.1
why did the agent fail for user u_8af2?
trace tr_8af2c1 — tool loop @ step 17
behavior: tool-loop-detector
Found 7 similar failures in the last 24h.
Investigate misbehaving agents the moment they fail.
FIG.2
Cluster decisions into behaviors and surface the patterns behind failures.
FIG.3
Watch every agent decision land in production — live.
FIG.4
Lock every detected failure into a guardrail so reliability compounds.
“Switching to Polarity has been an incredible experience. It is fast, accurate and does more than the competitors. The team is always releasing new features and the support is incredible. I always hear back within the hour.”
Anton
CTO at Clover Labs
“Our engineering team is very lean, and the Polarity product is instrumental to us shipping fast and maintaining a high bar of code quality and security.”
Colin
CTO at Ohm AI
Behaviors
Polarity analyzes every decision your agents make in production and detects recurring failure behaviors the moment they emerge. Each detection becomes a guardrail — so the same regression never reaches a user again.
tool-loop-detector
Catches agents that re-call the same tool with identical args…
stale-context-drift
Flags decisions made against context older than the last user turn
refusal-rubric
Meta-behavior for grading false refusals against intent…
swe-agent-escape
Detects SWE agents that escape their workspace sandbox at edit-time…
hallucinated-citation
Catches citations that don’t appear in the agent’s retrieved sources
hud-prompt-injection
Detects agents that follow injected instructions inside tool output
will/early-stop
Catches agents that finalize before all required tool calls succeed
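To make two of the behaviors above concrete, here is a minimal sketch of what a tool-loop check and a hallucinated-citation check could look like. It is an illustration only, not Polarity's implementation: the step shape (`tool` and `args` keys), the function names, and the `max_repeats` threshold are all assumptions.

```python
import json
from typing import Any, Dict, List, Optional, Set


def detect_tool_loop(steps: List[Dict[str, Any]], max_repeats: int = 3) -> Optional[int]:
    """Return the index of the step at which the same tool has been called
    `max_repeats` times in a row with identical args, or None if no loop.

    Assumes each step is a dict with hypothetical "tool" and "args" keys.
    """
    run_length = 0
    previous_key = None
    for index, step in enumerate(steps):
        # Serialize args with sorted keys so key order cannot hide a repeat.
        key = (step["tool"], json.dumps(step["args"], sort_keys=True))
        run_length = run_length + 1 if key == previous_key else 1
        previous_key = key
        if run_length >= max_repeats:
            return index
    return None


def unsupported_citations(cited_ids: List[str], retrieved_ids: Set[str]) -> List[str]:
    """Return cited source ids that are absent from the retrieved sources."""
    return [cited for cited in cited_ids if cited not in retrieved_ids]


# Hypothetical trajectory: one search, then the same lookup three times.
steps = [
    {"tool": "search", "args": {"q": "refund policy"}},
    {"tool": "lookup", "args": {"id": 7, "lang": "en"}},
    {"tool": "lookup", "args": {"lang": "en", "id": 7}},
    {"tool": "lookup", "args": {"id": 7, "lang": "en"}},
]
loop_index = detect_tool_loop(steps)
missing = unsupported_citations(["doc-1", "doc-9"], {"doc-1", "doc-2"})
```

Comparing only consecutive calls keeps the check cheap; a production detector would likely also need to catch loops that alternate between two or more calls.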
import polarity as pl

agent = pl.instrument(
    agent=my_agent,
    workspace="prod",
    capture="decisions",
    sample_rate=1.0,
)
Instrument any agent in a few lines and stream every decision to Polarity for monitoring and replay.
uv run plr replay \
  --trace tr_8af2c1 \
  --agent @examples/agent/agent.toml \
  --diff inline \
  --promote-to-behavior
Polarity Live Replay — re-run any production trajectory locally and promote the failure into a behavior that guards every future run.
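One way to picture "promoting a failure into a behavior" is as registering a named check that every later trajectory is evaluated against. The sketch below is a conceptual illustration under that assumption. `BehaviorRegistry`, its methods, and the step fields (`type`, `status`) are hypothetical names, not part of the Polarity SDK or the `plr` CLI.

```python
from typing import Callable, Dict, List

Trajectory = List[dict]
# A check returns True when the trajectory exhibits the failure.
Check = Callable[[Trajectory], bool]


class BehaviorRegistry:
    """Hypothetical in-memory store of promoted behaviors."""

    def __init__(self) -> None:
        self._checks: Dict[str, Check] = {}

    def promote(self, name: str, check: Check) -> None:
        # Register (or overwrite) a named check.
        self._checks[name] = check

    def evaluate(self, trajectory: Trajectory) -> List[str]:
        # Names of every promoted behavior this trajectory triggers.
        return [name for name, check in self._checks.items() if check(trajectory)]


def finalized_with_failed_tool(trajectory: Trajectory) -> bool:
    """Flag runs that end in a final answer although a tool call failed."""
    finalized = bool(trajectory) and trajectory[-1].get("type") == "final"
    any_failed = any(
        step.get("type") == "tool" and step.get("status") != "ok"
        for step in trajectory
    )
    return finalized and any_failed


registry = BehaviorRegistry()
registry.promote("early-stop", finalized_with_failed_tool)

bad_run = [{"type": "tool", "status": "error"}, {"type": "final"}]
good_run = [{"type": "tool", "status": "ok"}, {"type": "final"}]
flagged = registry.evaluate(bad_run)
clean = registry.evaluate(good_run)
```

Keeping each behavior as a pure function of the trajectory means the same check can run against replayed traces and against live runs.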
prod-workspace: python:3.11-slim
staging-workspace: python:3.11-slim
experiments: python:3.11-slim
Organize agents, projects, and teammates with shared dashboards, alerts, and access controls.
Production. Watch every agent decision in flight and act on regressions before users see them.
Always-on monitoring
Behavior-level visibility into every run.
FIG.5
support-agent: 99.6%
research-agent: 99.9%
code-agent: 99.4%
billing-agent: 99.8%
support-agent: 2.4%
research-agent: 0.1%
code-agent: 1.2%
billing-agent: 0.2%
retrieval-agent: 99.7%
summarizer-agent: 99.5%
onboarding-agent: 100%
moderation-agent: 99.9%
FIG.7
Find a behavior to promote...
stale-context-drift
512 traces
PASS 498 / 512
Reliability uplift: +2.7 pts
Ship with confidence
Replay failures, then gate them in CI.
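As a rough sketch of what gating in CI could mean, the snippet below turns replay results into a pass rate and a process exit code. It uses the 498 / 512 figure shown above as sample input. The 95% threshold and the function names are assumptions chosen for illustration, not Polarity defaults.

```python
def pass_rate(passed: int, total: int) -> float:
    """Fraction of replayed traces that passed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total


def gate(passed: int, total: int, threshold: float) -> int:
    """Return a process exit code: 0 if the pass rate meets the threshold, else 1."""
    rate = pass_rate(passed, total)
    print(f"pass rate {rate:.1%} ({passed}/{total}), threshold {threshold:.1%}")
    return 0 if rate >= threshold else 1


# Sample input from the figure above; the threshold is hypothetical.
exit_code = gate(passed=498, total=512, threshold=0.95)
```

A CI job would pass the returned value to `sys.exit`, so a pass rate below the threshold fails the build.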
Research. We’re an applied-research lab solving last-mile agent reliability.
Discover. Continuously improve your own agents.
