The improvement loop for voice AI agents.
Roark simulates your agent against hundreds of scenarios before launch, then scores every production call on 64+ audio-native metrics — so you catch what breaks, prove the fix in simulation, and ship with evidence.
Works with Vapi, Retell, LiveKit, Pipecat + your custom stack.


§01 · How Roark works
One loop. Always improving.
Catch it in production, prove the fix in simulation, ship with evidence — then Roark watches the next call.
Roark scores every live call and files what breaks.
Caller: I was charged twice — I need a refund.
Agent: No problem, I’ve refunded it to your card.
Compliance · Identity not verified: refund issued to an unconfirmed caller.
Your fix, replayed against realistic simulated callers.
Every change explicit and diffed — you apply it.
You ship — Roark confirms the metric moved on live calls.
You ship it, and Roark scores every call from the first minute.
…and the loop runs again on the next call.
Customers
Teams ship faster when they can hear what breaks.

Production voice AI scored on every conversation — pronunciation, empathy and resolution across their support and sales calls.

Healthcare calls evaluated for disclosures and identity checks — compliance scoring on every conversation, automatically.
Client voice agents validated in simulation before they go live — evidence that a build is ready, not a hunch.
§02 · Simulate before launch
Break it in staging, not in production.
Run your agent against hundreds of simulated callers — realistic personas, accents, background noise and edge cases — and get every conversation scored before a customer ever dials in.
Scenarios & personas
Hundreds of simulated callers — the angry one, the rambler, the interrupter — built from your real call types.
45 languages & accents
Native accents, code-switching and background noise — in every market your agent answers.
Load & health tests
Peak-volume concurrency and always-on health checks, so the agent that passed in staging survives launch day.
Run it in CI
Every prompt or model change runs the suite before it merges — quality gates for conversations, not just code.
1 failure filed as an issue — fix it before launch, not after
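As a rough illustration of a conversation quality gate in CI, the sketch below turns per-scenario results into a pass/fail decision. It is not Roark's API: the `ScenarioResult` shape, the `evaluateGate` helper and the `minPassRate` threshold are all hypothetical names chosen for this example.

```typescript
// Hypothetical shape for one simulated conversation's outcome (an assumption,
// not a Roark type).
interface ScenarioResult {
  scenario: string;
  passed: boolean;
}

interface GateDecision {
  passRate: number;   // fraction of scenarios that passed, 0..1
  failures: string[]; // names of the scenarios that failed
  ok: boolean;        // true when the change may merge
}

// Decide whether a prompt or model change clears the gate.
// An empty suite is treated as a failure so a misconfigured run cannot pass silently.
function evaluateGate(results: ScenarioResult[], minPassRate: number): GateDecision {
  const failures = results.filter((r) => !r.passed).map((r) => r.scenario);
  const passRate =
    results.length === 0 ? 0 : (results.length - failures.length) / results.length;
  return { passRate, failures, ok: results.length > 0 && passRate >= minPassRate };
}
```

In a pipeline, the calling script would exit with a non-zero status when `ok` is false, which is what blocks the merge, and would print `failures` so each one can be filed and fixed before launch.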
§03 · Evals & observability
64+ metrics. Your models, not just an LLM.
Every production call scored as it lands — issues filed, alerts fired, dashboards and OTEL traces on tap, for voice calls and chat threads alike. And where most tools grade a transcript with an LLM, Roark runs purpose-built audio models on the call itself, measuring what your customer actually heard.
+1 (415) 555-0134 → support_v2
Today 14:32 · 3m 42s · Vapi
Caller: I was told the refund would arrive by Friday…
Agent: Let me check that for you — one moment.
Dead air: 3.8s before the agent responded. Issue #482 filed.
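To make the dead-air finding concrete, here is a minimal sketch of how such a gap could be flagged from turn timestamps. It assumes the call has already been split into speaker turns with millisecond timings. The `Turn` type, the `findDeadAir` helper and the 2-second threshold are hypothetical, and Roark's own audio models may measure this quite differently.

```typescript
// Hypothetical diarized turn with millisecond timestamps (an assumption for
// this example, not a Roark type).
interface Turn {
  speaker: "caller" | "agent";
  startMs: number;
  endMs: number;
}

interface DeadAir {
  afterTurnIndex: number; // index of the caller turn that preceded the silence
  gapMs: number;          // silence before the agent started speaking
}

// Flag every caller-to-agent handoff where the agent took longer than
// thresholdMs to start responding.
function findDeadAir(turns: Turn[], thresholdMs: number): DeadAir[] {
  const gaps: DeadAir[] = [];
  for (let i = 0; i + 1 < turns.length; i++) {
    const current = turns[i];
    const next = turns[i + 1];
    // Only silence the agent is responsible for: caller stops, agent speaks next.
    if (current.speaker !== "caller" || next.speaker !== "agent") continue;
    const gapMs = next.startMs - current.endMs;
    if (gapMs > thresholdMs) gaps.push({ afterTurnIndex: i, gapMs });
  }
  return gaps;
}
```

Integer milliseconds are used instead of fractional seconds so that gap arithmetic stays exact and threshold comparisons are not affected by floating-point rounding.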
Metrics
Everyone else
LLM reads the transcript
“The agent said the right words.” Misses how it sounded — the mispronounced drug name, the flat apology, the rushed close.
Audio models hear the call
Pronunciation, accent, emotion and vocal stress measured from the waveform — the signal an LLM grading text can never see.
Audio-native · custom models
- Pronunciation
- Accent clarity
- Emotion
- Vocal stress
- Pace & pauses
- Interruptions
Conversational · LLM + rules
- Resolution
- Empathy
- Task success
- Hallucination
- Repetition
- Tone
Compliance · policy
- Disclosures
- PII exposure
- Identity check
- Script adherence
Performance · latency
- Time-to-first-word
- Turn latency
- ASR WER
- Barge-in handling
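For readers unfamiliar with ASR WER in the list above: word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the recognizer's transcript into the reference transcript, divided by the reference word count. The sketch below shows that standard definition. How Roark computes the metric is not stated here, and the `tokenize` normalization (lowercasing, ASCII-only) is a simplifying assumption.

```typescript
// Split text into lowercase word tokens. ASCII-only normalization is a
// simplification for this example; real pipelines normalize more carefully.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// Word error rate = (substitutions + deletions + insertions) / reference words,
// computed as a word-level edit distance.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    // Undefined in the strict sense; report 0 for two empty inputs, Infinity otherwise.
    return hyp.length === 0 ? 0 : Infinity;
  }
  // prev[j] holds the edit distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion: reference word missing from hypothesis
        curr[j - 1] + 1,    // insertion: extra word in hypothesis
        prev[j - 1] + cost, // match or substitution
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.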
§04 · Get started
First call scored in under a minute.
One click on any platform below and production calls stream in on their own — or send any recording with three lines of code.
Read the quickstart

import Roark from '@roarkhq/sdk'
const roark = new Roark({ apiKey })
await roark.calls.evaluate({
  recordingUrl,
  agent: 'support_v2',
}) // scored in seconds
Industries
Built for the calls you actually take.
The same audio-native scoring — tuned to the failures, scripts and stakes of your industry.
Enterprise-grade from day one — annual pen tests, SSO/SAML, role-based access, configurable retention.
Bring a recording. We’ll score it live.
See your own agent measured on the audio it actually produced — in the demo, in real time. Stop guessing whether your voice AI works.
founders@roark.ai · we reply fast