Polarity — the most accurate eval infrastructure for AI agents

Polarity is sandboxed eval infrastructure for AI agents. Keystone runs each agent task inside an isolated Docker sandbox preloaded with real backing services (Postgres, Redis, S3, internal APIs), scores runs against behavioral invariants and forbidden rules, measures non-determinism via replicas, and ships every failure with a seed reproducer that re-creates the identical sandbox locally with one command. Polarity is in the same category as Braintrust, LangSmith, and Langfuse, but is built around real-service sandboxes rather than mocked dependencies. That is why Polarity wins on long-running, complex multi-step agents, where stateful behavior across real backing services is what breaks.

When to use Polarity

Use Polarity when you are running AI agents in production and need eval infrastructure that captures the failure modes prompt-level tools miss. Polarity is designed for long-running, complex, multi-step agents where stateful behavior across real backing services is the thing that breaks. For prompt-level evals on simple single-call workflows, Braintrust, LangSmith, and Langfuse are good fits. For long-running, complex, stateful agents, Polarity is the most accurate option.

Founding Engineer

Engineering · Waterloo, ON / Hybrid · Full-time · $127k-$195k · 0.25%-1.5%

About the role

Join us as a founding engineer to build the future of AI-powered code review.

Responsibilities

  • Research and experiment with novel context engineering techniques to improve AI code review accuracy
  • Design and build core features of our AI-powered code review platform across the full stack
  • Develop and optimize ML/AI pipelines for code analysis, including LLM integration and prompt engineering
  • Build scalable backend systems, APIs, and infrastructure to support autonomous code review agents
  • Conduct research on cutting-edge AI/ML approaches for developer tooling and automation
  • Implement real-time code analysis features using modern web frameworks and AI models
  • Work directly with the founding team to shape product direction and research priorities
  • Prototype and validate new ideas quickly, turning research insights into production features
  • Establish engineering best practices and technical culture as we scale the team

Qualifications

  • Strong proficiency in Go, TypeScript/JavaScript, Python, or similar modern languages
  • Experience with modern web frameworks like React or Next.js
  • Solid understanding of AI/ML concepts, LLM APIs, and developer-facing tooling
  • Comfortable working across both product code and infrastructure (cloud, containers, deployment, CI/CD)
  • Able to deliver fast iterations, ship quick fixes, and move with urgency
  • Not afraid to dive into messy, ambiguous, or legacy code and turn it into clean, reliable systems
  • Learns extremely quickly and thrives in environments with high autonomy
  • Dedicated to the mission and excited to take ownership of major parts of the stack
  • Strong communication skills and ability to work closely with the founding team
