About the company
Olostep is a Web Data API used by AI teams to search, crawl, scrape, and structure data from the web — including a /agents endpoint that automates multi-step research workflows from a natural-language prompt. Olostep processes batches of up to 100k URLs in 5–7 minutes and is trusted by teams like Gumloop, Openmart, Athena, and Profound.
Industry: AI Infrastructure / Web Data API
96%
Parser regression catch rate
8×
Faster agent eval turnaround
99.5%
Uptime across model swaps
500+
Research-agent tasks per run
Overview
When your product is a web data API, two things break quietly and often: target sites change their DOM, and new model versions shift how research agents plan multi-step tasks. Olostep needed a way to catch both failure modes before customers did — especially as batch workloads climbed into the hundreds of thousands of URLs and the /agents endpoint started running longer, multi-hop research workflows.
Keystone gave Olostep a purpose-built layer for exactly that. Parsers and research agents run in isolated sandboxes against canonical task suites, with prompt-diff scoring on every change and side-by-side model comparison baked in. Regressions that used to surface as customer support tickets now get caught in CI.
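The parser side of that workflow can be sketched as a golden-fixture replay: stored page snapshots with known-good expected output, replayed against a candidate parser before deploy. This is an illustrative sketch, not Keystone's actual API — `parse_product`, the fixture shape, and `replay` are all hypothetical names.

```python
# Hypothetical sketch of a golden-fixture replay for parser changes.
# A fixture pairs a captured page with its expected structured output;
# replay() reports every field that drifts from the golden answer.

def parse_product(html: str) -> dict:
    # Stand-in parser: extracts the title between <h1> markers.
    start = html.find("<h1>") + 4
    end = html.find("</h1>")
    return {"title": html[start:end]}

def replay(fixtures: list[dict], parser) -> list[str]:
    """Return a list of regression descriptions (empty = all green)."""
    regressions = []
    for fx in fixtures:
        got = parser(fx["html"])
        for field, expected in fx["expected"].items():
            if got.get(field) != expected:
                regressions.append(
                    f"{fx['url']}: {field!r} expected "
                    f"{expected!r}, got {got.get(field)!r}"
                )
    return regressions

fixtures = [
    {"url": "https://example.com/a",
     "html": "<h1>Widget</h1>",
     "expected": {"title": "Widget"}},
]
print(replay(fixtures, parse_product))  # empty list means no regressions
```

Gating a deploy then reduces to failing the build whenever `replay()` returns a non-empty list.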
Today, Polarity works alongside Olostep's engineering team as a true collaborator:
- Hermetic sandboxes to validate parsers and research agents against golden fixtures
- Side-by-side model comparison for the /agents endpoint across providers
- Live trace ingestion covering tool calls, LLM cost, and latency for every production run
- Automated alerts when parser accuracy or agent success rate drifts off baseline
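The last two bullets — side-by-side comparison plus drift alerts — amount to running the same task suite under two model configurations and comparing success rates against a baseline. A minimal sketch, assuming a fixed historical baseline and tolerance (both values and all names here are illustrative, not Keystone's):

```python
# Hypothetical drift check: compare a candidate model's success rate
# on a shared task suite against the incumbent, and alert when the
# candidate falls below an assumed historical baseline.

BASELINE_SUCCESS = 0.95   # assumed historical success rate
TOLERANCE = 0.02          # allowed drop before alerting

def success_rate(results: list[bool]) -> float:
    return sum(results) / len(results)

def compare(candidate: list[bool], incumbent: list[bool]) -> dict:
    cand, inc = success_rate(candidate), success_rate(incumbent)
    return {
        "candidate": cand,
        "incumbent": inc,
        "drift_alert": cand < BASELINE_SUCCESS - TOLERANCE,
        "prefer_candidate": cand >= inc,
    }

# Illustrative results for a 10-task suite (True = task succeeded).
incumbent_runs = [True] * 9 + [False]   # 9/10 succeeded
candidate_runs = [True] * 10            # 10/10 succeeded
report = compare(candidate_runs, incumbent_runs)
print(report["prefer_candidate"], report["drift_alert"])
```

In practice each boolean would come from a scored agent run; the comparison and alert logic stay this simple.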
“Keystone is the QA layer we didn't want to build ourselves. We know within the hour whether a new model or parser change is better — not next week when a customer ticket comes in.”
Hamza, CEO at Olostep
How Olostep uses Polarity
Polarity supports Olostep's engineering team across parser reliability, agent evaluation, observability, and release engineering.
| Eng Area | Typical Polarity Task | Impact |
|---|---|---|
| Parser Reliability | Replay real customer scrapes against new parsers | 96% of regressions caught pre-deploy |
| Research Agents | Benchmark /agents across models + prompts | 8× faster model evaluation |
| Observability | Trace ingestion on every production request | Live visibility into cost, latency, accuracy |
| Release Engineering | Eval gates on parser and agent changes | Zero parser regressions shipped in beta |
