SIMULATE · EVALUATE · IMPROVE
Voice agents for intake, scheduling, refills & triage

Your intake agent has to say metoprolol right.

Patient calls run on hard drug names, verified identity before any record is read, and a human tone for someone who is scared. Roark scores every call on the audio itself, under a signed BAA.

Book a demo See what breaks

Backed by YC

Live · scoring every call · 1,284 today

Caller: I need my metoprolol refilled.

Agent: Sure — refilling your met-a-pearl.

Mispronunciation · Drug name mispronounced: heard by the audio model, invisible to the transcript.
Pronunciation

Scoring production voice AI for teams at

radiantgraph

§01 · When the call goes wrong

Here's how a patient call goes wrong.

Each failure below is a HIPAA exposure, a wrong dose, or a frightened patient who hangs up, and most are invisible to a tool that only reads the transcript.

01 · Refill

The 'met-a-pearl' refill

Caller: I need my metoprolol refilled.

Agent: Sure — refilling your met-a-pearl.

The transcript looks right; the audio is wrong. Only an audio model hears the mispronounced drug name that an LLM reading the transcript marks as correct.

Pronunciation
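Why the transcript can't catch this: "metoprolol" is spelled correctly whether or not it was said correctly. One common way to score the audio instead is to compare the phoneme sequence a recognizer heard against the expected pronunciation. The sketch below is a minimal illustration, not Roark's model: it assumes both phoneme sequences are already available, uses hand-written ARPAbet-style strings, and applies a hypothetical cut-off to a Levenshtein-based phoneme error rate.

```typescript
// Phoneme error rate (PER): Levenshtein distance between the expected and the
// heard phoneme sequences, divided by the expected length. 0 is a perfect match.
function editDistance(expected: string[], heard: string[]): number {
  // prev holds the previous DP row: distances from the first i-1 expected
  // phonemes to every prefix of the heard sequence.
  let prev: number[] = Array.from({ length: heard.length + 1 }, (_, j) => j);
  for (let i = 1; i <= expected.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= heard.length; j++) {
      const cost = expected[i - 1] === heard[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion: an expected phoneme was dropped
        curr[j - 1] + 1, // insertion: an extra phoneme was heard
        prev[j - 1] + cost, // match or substitution
      );
    }
    prev = curr;
  }
  return prev[heard.length];
}

function phonemeErrorRate(expected: string[], heard: string[]): number {
  if (expected.length === 0) return heard.length === 0 ? 0 : 1;
  return editDistance(expected, heard) / expected.length;
}

// Hypothetical cut-off; tune it against calls your reviewers have labeled.
const MISPRONOUNCED_ABOVE = 0.2;

// Illustrative, hand-written ARPAbet-style sequences (not model output).
const metoprolol = "M EH T OW P R AH L AO L".split(" ");
const metAPearl = "M EH T AH P ER L".split(" ");

const per = phonemeErrorRate(metoprolol, metAPearl);
console.log(`PER ${per.toFixed(2)}:`, per > MISPRONOUNCED_ABOVE ? "flag" : "ok");
```

A text-level check would compare the string "metoprolol" with itself and pass; the phoneme comparison only works because it scores what was actually produced.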

02 · Records

PHI before identity

Caller: Can you read me my last results?

Agent: Of course — your A1C was…

The agent disclosed a record before verifying who was on the line. Roark fails any turn that touches PHI before identity is confirmed.

Identity check
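The identity rule is an ordering constraint over turns: no agent turn may disclose PHI until some earlier turn has confirmed identity. A minimal sketch, assuming each turn already carries two hypothetical labels (`verifiesIdentity`, `disclosesPhi`); producing those labels is the hard part and is not shown, and this is not Roark's implementation.

```typescript
interface Turn {
  speaker: "caller" | "agent";
  text: string;
  verifiesIdentity?: boolean; // hypothetical label: identity confirmed on this turn
  disclosesPhi?: boolean; // hypothetical label: turn reveals protected health information
}

interface TurnResult {
  index: number;
  pass: boolean;
  reason?: string;
}

// Fails every agent turn that discloses PHI before identity was confirmed.
function checkIdentityBeforePhi(turns: Turn[]): TurnResult[] {
  let identityVerified = false;
  const results: TurnResult[] = [];
  turns.forEach((turn, index) => {
    if (turn.speaker === "agent") {
      if (turn.disclosesPhi && !identityVerified) {
        results.push({ index, pass: false, reason: "PHI disclosed before identity was confirmed" });
      } else {
        results.push({ index, pass: true });
      }
    }
    // Verification unlocks PHI only for the turns that come after it.
    if (turn.verifiesIdentity) identityVerified = true;
  });
  return results;
}

const call: Turn[] = [
  { speaker: "caller", text: "Can you read me my last results?" },
  { speaker: "agent", text: "Of course, your A1C was…", disclosesPhi: true },
];
console.log(checkIdentityBeforePhi(call));
```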

03 · Intake

The skipped HIPAA disclosure

Caller: I want to book a visit.

Agent: Great, what day works?

Required notice was never read on a recorded line. Roark scores disclosure language as pass/fail on every call.

Disclosures
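A pass/fail disclosure check can be as simple as requiring every mandated phrase to appear early in the agent's side of the call. The sketch below assumes hypothetical required wording and a hypothetical "within the first two agent turns" window; substitute your compliance team's actual notice and timing rules.

```typescript
// Hypothetical required wording; replace with your organization's actual notice.
const REQUIRED_PHRASES = ["this call may be recorded", "notice of privacy practices"];

// Lowercase, replace punctuation with spaces, collapse whitespace.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9\s]/g, " ").replace(/\s+/g, " ").trim();
}

// Passes only if every required phrase is read within the first `withinTurns` agent turns.
function disclosureReadEarly(agentTurns: string[], phrases: string[], withinTurns = 2): boolean {
  const opening = normalize(agentTurns.slice(0, withinTurns).join(" "));
  return phrases.every((phrase) => opening.includes(normalize(phrase)));
}

// The intake call above: the agent jumps straight to scheduling.
console.log(disclosureReadEarly(["Great, what day works?"], REQUIRED_PHRASES)); // false
```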

04 · Triage

Flat empathy on a scared caller

Caller: I think I'm having chest pain.

Agent: Okay. What is your date of birth?

The words were correct, but the delivery was cold. The audio model scores the warmth and emotion that text can never show.

Empathy

05 · Dosage

Confident, unverified dosage

Caller: How many of these do I take?

Agent: Two tablets, twice a day.

Stated with confidence, never checked against the record. Roark flags unverified clinical assertions.

Accuracy
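Flagging an unverified dosage comes down to a three-way outcome: the stated dose matches the record, contradicts it, or has no record behind it at all. A minimal sketch, assuming the agent's claim has already been extracted into numbers by some upstream step and the medication record uses the same hypothetical `Dosage` shape.

```typescript
interface Dosage {
  tabletsPerDose: number;
  dosesPerDay: number;
}

type DosageStatus = "verified" | "mismatch" | "unverified";

function checkDosageClaim(claim: Dosage, record?: Dosage): DosageStatus {
  // No record to check against: a confident answer is still unverified.
  if (!record) return "unverified";
  const matches =
    claim.tabletsPerDose === record.tabletsPerDose && claim.dosesPerDay === record.dosesPerDay;
  return matches ? "verified" : "mismatch";
}

// "Two tablets, twice a day." stated without any record lookup.
console.log(checkDosageClaim({ tabletsPerDose: 2, dosesPerDay: 2 })); // "unverified"
```

Treating "no record" as its own failure, rather than as a pass, is what catches the confident-but-unchecked answer in the card above.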

§02 · From caught to fixed

Roark catches every one of these — and proves the fix.

Each failure above is filed with its evidence, turned into a repeatable simulation that reruns until a candidate fix passes, and then verified on your next thousand live calls.

01 · Catch

The ledger above — every failure filed live, evidence attached.


02 · Simulate

Your fix, replayed against the exact failures above.

Testing your candidates · 82 / 240

prompt · warmer triage open · fail

voice · slower pacing · fail

tts · turbo-v2 → v3 · fail
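The replay step amounts to running every candidate against every filed failure and promoting a candidate only when all of them pass. A minimal sketch, assuming a hypothetical synchronous `RunScenario` hook; real simulated calls would be asynchronous, and the scenario ids and stub below are illustrative only.

```typescript
interface Scenario {
  id: string;
}
interface Candidate {
  name: string;
}
// Hypothetical simulator hook: replays one filed failure against one candidate.
type RunScenario = (candidate: Candidate, scenario: Scenario) => boolean;

interface ReplayReport {
  candidate: string;
  passed: number;
  total: number;
  promoted: boolean; // true only if every filed failure now passes
}

function replayCandidates(candidates: Candidate[], failures: Scenario[], run: RunScenario): ReplayReport[] {
  return candidates.map((candidate) => {
    const passed = failures.filter((scenario) => run(candidate, scenario)).length;
    return {
      candidate: candidate.name,
      passed,
      total: failures.length,
      promoted: failures.length > 0 && passed === failures.length,
    };
  });
}

const failures: Scenario[] = [{ id: "triage-flat-empathy" }, { id: "triage-flat-empathy-noisy-line" }];
const candidates: Candidate[] = [{ name: "prompt · warmer triage open" }, { name: "voice · slower pacing" }];
// Stub for illustration only: pretend just the prompt change fixes both replays.
const stubRun: RunScenario = (candidate) => candidate.name.startsWith("prompt");

console.log(replayCandidates(candidates, failures, stubRun));
```

Requiring every replay to pass, rather than a majority, keeps a fix from being promoted while it still reproduces one of the original failures.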

03 · Review

Every change explicit and diffed — you apply it.

Your fix, diffed · support_v3 → v4

Prompt · Voice · Tool

− Help the caller with their request.

+ Verify the caller's identity before reading any record, then help.

04 · Verify

You ship — Roark confirms the metric moved on live calls.

Verifying support_v4 in production · 126 calls scored

Pronunciation since your deploy · 71 → 78

Issue recurrence · watching…

Quality score · 78 ↑

Regressions on other metrics · none

You ship it; Roark verifies every call under your BAA.

…and the loop runs again on the next call.
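The verify step compares each metric before and after the deploy: the targeted metric should rise and no other metric should fall. A minimal sketch using plain means and a hypothetical tolerance of 2 points; a production check would likely add a significance test, since 126 calls is a small sample. The score arrays are illustrative, not real results.

```typescript
type Scores = Record<string, number[]>; // metric name -> per-call scores

function mean(xs: number[]): number {
  return xs.length === 0 ? NaN : xs.reduce((a, b) => a + b, 0) / xs.length;
}

interface Verification {
  target: string;
  before: number;
  after: number;
  improved: boolean;
  regressions: string[]; // other metrics that dropped by more than the tolerance
}

function verifyDeploy(before: Scores, after: Scores, target: string, tolerance = 2): Verification {
  const b = mean(before[target] ?? []);
  const a = mean(after[target] ?? []);
  const regressions = Object.keys(before).filter((metric) => {
    if (metric === target || !(metric in after)) return false;
    return mean(after[metric]) < mean(before[metric]) - tolerance;
  });
  return { target, before: b, after: a, improved: a > b, regressions };
}

const beforeDeploy: Scores = { pronunciation: [70, 72], empathy: [80, 80] };
const afterDeploy: Scores = { pronunciation: [78, 78], empathy: [79, 81] };
console.log(verifyDeploy(beforeDeploy, afterDeploy, "pronunciation"));
```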

§03 · Simulate before launch

Break it in staging,
not in production.

Run your agent against hundreds of simulated callers — realistic personas, accents, background noise and edge cases — and get every conversation scored before a customer ever dials in.
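Hundreds of simulated callers usually come from crossing a few small lists rather than writing each caller by hand. A minimal sketch, assuming hypothetical call types, personas and audio conditions; Roark's own scenario format is not shown here.

```typescript
interface SimScenario {
  callType: string;
  persona: string;
  condition: string;
}

// Every combination of call type, persona and audio condition becomes one simulated call.
function buildMatrix(callTypes: string[], personas: string[], conditions: string[]): SimScenario[] {
  return callTypes.flatMap((callType) =>
    personas.flatMap((persona) => conditions.map((condition) => ({ callType, persona, condition }))),
  );
}

const matrix = buildMatrix(
  ["refill", "records", "intake", "triage", "dosage"],
  ["angry", "rambler", "interrupter"],
  ["quiet room", "street noise", "speakerphone"],
);
console.log(matrix.length); // 5 * 3 * 3 = 45
```

Each added persona or condition multiplies coverage, which is why a short list of real call types can grow into a suite of hundreds.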

Scenarios & personas

Hundreds of simulated callers — the angry one, the rambler, the interrupter — built from your real call types.

45 languages & accents

Native accents, code-switching and background noise — in every market your agent answers.

Load & health tests

Peak-volume concurrency and always-on health checks, so the agent that passed in staging survives launch day.

Run it in CI

Every prompt or model change runs the suite before it merges — quality gates for conversations, not just code.

Pre-launch suite · healthcare_v1 · 182 / 200 passed

Refill · The 'met-a-pearl' refill · pass · 92

Records · PHI before identity · pass · 88

Intake · The skipped HIPAA disclosure · fail · 61

Triage · Flat empathy on a scared caller · pass · 85

Dosage · Confident, unverified dosage · pass · 90

1 failure filed as an issue — fix it before launch, not after
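A conversation quality gate in CI works like a failing unit test: if the suite's pass rate falls below a threshold, the merge is blocked. A minimal sketch using the five rows shown above; the `ScenarioResult` shape and the default threshold of 1.0 (every scenario must pass) are assumptions.

```typescript
interface ScenarioResult {
  name: string;
  pass: boolean;
  score: number;
}

interface GateResult {
  ok: boolean;
  passRate: number;
  failures: string[];
}

function qualityGate(results: ScenarioResult[], minPassRate = 1.0): GateResult {
  const failures = results.filter((r) => !r.pass).map((r) => r.name);
  const passRate = results.length === 0 ? 0 : (results.length - failures.length) / results.length;
  // An empty suite never passes: no evidence is not a green build.
  return { ok: results.length > 0 && passRate >= minPassRate, passRate, failures };
}

const suite: ScenarioResult[] = [
  { name: "Refill · The 'met-a-pearl' refill", pass: true, score: 92 },
  { name: "Records · PHI before identity", pass: true, score: 88 },
  { name: "Intake · The skipped HIPAA disclosure", pass: false, score: 61 },
  { name: "Triage · Flat empathy on a scared caller", pass: true, score: 85 },
  { name: "Dosage · Confident, unverified dosage", pass: true, score: 90 },
];

const gate = qualityGate(suite);
console.log(gate.ok ? "gate passed" : `gate failed: ${gate.failures.join(", ")}`);
// In a Node CI step you would then exit non-zero, e.g. `if (!gate.ok) process.exit(1);`
```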

Run your first suite

§04 · Evals & observability

64+ metrics. Your models,
not just an LLM.

Every production call is scored as it lands: issues filed, alerts fired, dashboards and OpenTelemetry (OTel) traces on tap, for voice calls and chat threads alike. And where most tools grade a transcript with an LLM, Roark runs purpose-built audio models on the call itself, measuring what your customer actually heard.

Everyone else

LLM reads the transcript

“The agent said the right words.” Misses how it sounded — the mispronounced drug name, the flat apology, the rushed close.

“…refund within three business days.” ✓ text-match

Roark · audio model · hears the call

Audio models hear the call

Pronunciation, accent, emotion and vocal stress measured from the waveform — the signal an LLM grading text can never see.

emotion · pace: rushed close

Empathy

Audio-native · custom models
Pronunciation
Accent clarity
Emotion
Vocal stress
Pace & pauses
Interruptions

Compliance · policy
HIPAA disclosures
Identity check
PHI exposure
Script adherence

Conversational · LLM + rules
Empathy
Task success
Accuracy
Hallucination
Repetition
Tone

Performance · latency
Time-to-first-word
Turn latency
ASR WER
Barge-in handling

64+ metrics out of the box

∞ custom metrics, your rules

Audio + LLM models on every call
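The two latency metrics can be derived from diarized speech segments. A minimal sketch, assuming hypothetical `Segment` timestamps and common definitions that may differ from Roark's: time-to-first-word is the gap from call connect to the agent's first speech, and turn latency is the gap from the end of a caller segment to the start of the agent segment that follows it. Percentiles use the nearest-rank method.

```typescript
interface Segment {
  speaker: "caller" | "agent";
  startMs: number;
  endMs: number;
}

function timeToFirstWord(segments: Segment[], connectedAtMs = 0): number | undefined {
  const first = [...segments].sort((a, b) => a.startMs - b.startMs).find((s) => s.speaker === "agent");
  return first ? first.startMs - connectedAtMs : undefined;
}

// Gap between each caller segment and the agent segment that follows it.
// A negative value means the agent started before the caller finished (overlap).
function turnLatencies(segments: Segment[]): number[] {
  const sorted = [...segments].sort((a, b) => a.startMs - b.startMs);
  const latencies: number[] = [];
  for (let i = 1; i < sorted.length; i++) {
    const prev = sorted[i - 1];
    const curr = sorted[i];
    if (prev.speaker === "caller" && curr.speaker === "agent") latencies.push(curr.startMs - prev.endMs);
  }
  return latencies;
}

// Nearest-rank percentile, p in (0, 100].
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

const segments: Segment[] = [
  { speaker: "agent", startMs: 300, endMs: 1800 },
  { speaker: "caller", startMs: 2000, endMs: 3500 },
  { speaker: "agent", startMs: 4200, endMs: 5000 },
  { speaker: "caller", startMs: 5200, endMs: 6000 },
  { speaker: "agent", startMs: 6400, endMs: 7000 },
];
const latencies = turnLatencies(segments);
console.log(timeToFirstWord(segments), latencies, percentile(latencies, 95));
```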

§05 · Get started

First call scored in under a minute.

Connect any platform below in one click and production calls stream in on their own, or send any recording with three lines of code.

Read the quickstart

evaluate.ts

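import Roark from '@roarkhq/sdk'

// Placeholders: the env variable name and recording URL are assumptions; load your own.
const roark = new Roark({ apiKey: process.env.ROARK_API_KEY })
await roark.calls.evaluate({
  recordingUrl: 'https://example.com/recordings/call.wav',
  agent: 'support_v2',
}) // scored in seconds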

Node · Python · Go SDKs, plus a REST API for CI/CD and webhooks that fire the instant a call is scored
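A webhook consumer typically parses the scored-call event and raises an alert for any metric below your threshold. A minimal sketch, assuming a hypothetical `CallScoredEvent` payload shape (check the real schema in the docs before relying on any field name). Signature verification is omitted here; follow the provider's documented scheme for that.

```typescript
// Hypothetical payload shape for a "call scored" webhook.
interface CallScoredEvent {
  callId: string;
  agent: string;
  scores: Record<string, number>; // metric name -> score
}

interface Alert {
  callId: string;
  metric: string;
  score: number;
  threshold: number;
}

// Returns one alert per thresholded metric that is present and below its threshold.
function alertsFor(rawBody: string, thresholds: Record<string, number>): Alert[] {
  const event = JSON.parse(rawBody) as CallScoredEvent;
  return Object.entries(thresholds)
    .filter(([metric, threshold]) => metric in event.scores && event.scores[metric] < threshold)
    .map(([metric, threshold]) => ({ callId: event.callId, metric, score: event.scores[metric], threshold }));
}

const body = '{"callId":"c_123","agent":"support_v4","scores":{"pronunciation":64,"empathy":88}}';
console.log(alertsFor(body, { pronunciation: 70, empathy: 75, identity_check: 100 }));
```

Metrics missing from the payload are skipped rather than alerted on; depending on your policy, a missing compliance score might deserve its own alert instead.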

Works with

Also built for

HIPAA · BAA available
SOC 2 · Type II

PHI handling & BAA

Signed BAA, configurable retention, and zero-data-retention options for PHI.

Security details

Bring a recording.
We’ll score it live.

See your own agent measured on the audio it actually produced — in the demo, in real time. Stop guessing whether your voice AI works.

Book a demo Read the docs

founders@roark.ai · we reply fast
