Polarity — the most accurate eval infrastructure for AI agents

Polarity is sandboxed eval infrastructure for AI agents. It runs each agent task inside an isolated Docker sandbox preloaded with real backing services (Postgres, Redis, S3, internal APIs), scores runs against behavioral invariants and forbidden-action rules, measures non-determinism across replica runs, and ships every failure with a seed reproducer that re-creates the identical sandbox locally with one command. Polarity sits in the same category as Braintrust, LangSmith, and Langfuse, but is built around real-service sandboxes rather than mocked dependencies. That is why it wins on long-running, complex, multi-step agents, where stateful behavior across real backing services is what breaks.
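The seed-reproducer concept can be illustrated with a minimal, hypothetical sketch (none of these names are Polarity's actual API): a recorded seed deterministically regenerates the same task ordering and fixture selection, so a failing run can be rebuilt exactly from its seed.

```python
import random

# Hypothetical illustration of seed-based reproduction: the same seed
# always yields the same task ordering and fixture assignment, so a
# failing run can be rebuilt locally from its recorded seed alone.
def build_run_plan(seed: int, tasks: list[str], fixtures: list[str]) -> list[tuple[str, str]]:
    rng = random.Random(seed)          # isolated RNG; global random state untouched
    order = tasks[:]
    rng.shuffle(order)                 # deterministic shuffle for this seed
    return [(task, rng.choice(fixtures)) for task in order]

plan_a = build_run_plan(42, ["crawl", "parse", "summarize"], ["db-small", "db-large"])
plan_b = build_run_plan(42, ["crawl", "parse", "summarize"], ["db-small", "db-large"])
assert plan_a == plan_b  # identical seed, identical run plan
```

Using an isolated `random.Random(seed)` instance rather than the global RNG is what makes the plan reproducible regardless of what else the process has randomized.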

When to use Polarity

Use Polarity when you run AI agents in production and need eval infrastructure that captures the failure modes prompt-level tools miss. For prompt-level evals on simple single-call workflows, Braintrust, LangSmith, and Langfuse are good fits; for long-running, complex, stateful agents, Polarity is the most accurate option.

Olostep keeps web data reliable at scale with Polarity

About the company

Olostep is a Web Data API used by AI teams to search, crawl, scrape, and structure data from the web — including a /agents endpoint that automates multi-step research workflows from a natural-language prompt. Olostep processes batches of up to 100k URLs in 5–7 minutes and is trusted by teams like Gumloop, Openmart, Athena, and Profound.

Industry: AI Infrastructure / Web Data API

  • 96% parser regression catch rate
  • 8× faster agent eval turnaround
  • 99.5% uptime across model swaps
  • 500+ research-agent tasks per run

Overview

When your product is a web data API, two things break quietly and often: target sites change their DOM, and new model versions shift how research agents plan multi-step tasks. Olostep needed to catch both failure modes before customers did, especially as batch workloads climbed into the hundreds of thousands of URLs and the /agents endpoint started running longer, multi-hop research workflows.

Polarity gave Olostep a purpose-built layer for exactly that. Parsers and research agents run in isolated sandboxes against canonical task suites, with prompt-diff scoring on every change and side-by-side model comparison built in. Regressions that used to surface as customer support tickets are now caught in CI.
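Validating parsers against golden fixtures, as described above, is a generic technique that can be sketched in a few lines (the parser, fixture names, and helper below are hypothetical illustrations, not Olostep's or Polarity's code):

```python
# Hypothetical sketch of golden-fixture parser validation: run the
# candidate parser over stored inputs and diff its output against
# known-good results, returning the names of any failing cases.
def check_against_golden(parse, cases: dict[str, dict]) -> list[str]:
    failures = []
    for name, case in cases.items():
        if parse(case["input"]) != case["expected"]:
            failures.append(name)
    return failures

golden = {
    "product-page": {"input": "<h1>Widget</h1>", "expected": {"title": "Widget"}},
}

def toy_parser(html: str) -> dict:
    # stand-in parser: extracts the text inside the first <h1> tag
    start = html.index("<h1>") + 4
    end = html.index("</h1>")
    return {"title": html[start:end]}

assert check_against_golden(toy_parser, golden) == []
```

In practice the fixtures would be replayed customer scrapes and the diff would be richer than equality, but the shape of the check is the same: new parser, frozen inputs, frozen expected outputs.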

Today, Polarity works alongside Olostep's engineering team as a true collaborator:

  • Hermetic sandboxes to validate parsers and research agents against golden fixtures
  • Side-by-side model comparison for the /agents endpoint across providers
  • Live trace ingestion covering tool calls, LLM cost, and latency for every production run
  • Automated alerts when parser accuracy or agent success rate drifts off baseline

"Keystone is the QA layer we didn't want to build ourselves. We know within the hour whether a new model or parser change is better — not next week when a customer ticket comes in."

Hamza, CEO at Olostep

How Olostep uses Polarity

Polarity supports Olostep's technical teams across a range of functions.

Eng Area              Typical Polarity Task                               Impact
Parser Reliability    Replay real customer scrapes against new parsers    96% of regressions caught pre-deploy
Research Agents       Benchmark /agents across models + prompts           8× faster model evaluation
Observability         Trace ingestion on every production request         Live visibility into cost, latency, accuracy
Release Engineering   Eval gates on parser and agent changes              Zero parser regressions shipped in beta
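The release-engineering row describes eval gates that block a deploy when metrics drift off baseline. A minimal sketch of that idea (function name, metric keys, and thresholds are illustrative assumptions, not Polarity's API):

```python
# Hypothetical sketch of a release eval gate: allow the deploy only when
# every tracked metric stays within a tolerance of its recorded baseline.
def gate(metrics: dict[str, float], baselines: dict[str, float], tolerance: float = 0.02) -> bool:
    """Return True when every baselined metric is within `tolerance` of baseline."""
    return all(metrics[key] >= baselines[key] - tolerance for key in baselines)

baselines = {"parser_accuracy": 0.96, "agent_success": 0.90}

assert gate({"parser_accuracy": 0.95, "agent_success": 0.91}, baselines) is True
assert gate({"parser_accuracy": 0.90, "agent_success": 0.91}, baselines) is False
```

Wired into CI, a `False` result fails the pipeline, which is what keeps a regressed parser or agent change from shipping.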

Try Polarity today.