SIMULATE · EVALUATE · IMPROVE

Voice agents for FNOL, claims, quotes & renewals

Your FNOL agent quoted a deductible that does not exist.

Claims, FNOL, quotes and renewals run on exact coverage facts, state-required disclosures, and a human voice for someone who just totaled their car. Roark scores every call on the audio — and fails the ones that get the policy wrong.

Book a demo See what breaks

Backed by YC

Live · scoring every call · 1,284 today

Caller: My house flooded, everything is gone.

Agent: Okay. What is your policy number?

Flat empathy: correct words, cold delivery to a claimant in crisis. Heard by the audio model, invisible to the transcript.

Empathy

Scoring production voice AI for teams at

radiantgraph

§01 · When the call goes wrong

Here's how a policy call goes wrong.

Each one is a bad-faith exposure, a misquoted premium, or a missed fraud cue — and most are inaudible to a tool that only reads the transcript.

01 · Quote

The deductible that does not exist

Caller: What's my deductible if I file?

Agent: Looks like $250 on this policy.

Stated with total confidence, never checked against the policy — the real number is $1,000. Roark flags coverage assertions that were never verified against the record.

Coverage accuracy
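
As a rough illustration of what checking a coverage assertion against the record could look like, the sketch below pulls the dollar amount out of the agent's reply and compares it with the deductible on file. The `PolicyRecord` shape and the function names are hypothetical, chosen for this example only; they are not Roark's API.

```typescript
// Hypothetical policy record; the field names are assumptions for this sketch.
interface PolicyRecord {
  policyId: string;
  deductibleUsd: number;
}

type Verdict = "pass" | "fail" | "no_amount_stated";

// Pull the first dollar amount (e.g. "$250" or "$1,000") out of an utterance.
function extractDollarAmount(utterance: string): number | null {
  const match = utterance.match(/\$\s?(\d[\d,]*)/);
  if (!match) return null;
  return Number(match[1].replace(/,/g, ""));
}

// Fail when the amount the agent stated differs from the deductible on record.
function checkQuotedDeductible(utterance: string, policy: PolicyRecord): Verdict {
  const stated = extractDollarAmount(utterance);
  if (stated === null) return "no_amount_stated";
  return stated === policy.deductibleUsd ? "pass" : "fail";
}

// The example from the card above: the agent said $250, the record says $1,000.
const policy: PolicyRecord = { policyId: "demo-policy", deductibleUsd: 1000 };
console.log(checkQuotedDeductible("Looks like $250 on this policy.", policy)); // fail
```

A production check would also need to confirm that the agent actually performed a policy lookup before speaking, which a text comparison alone cannot show.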

02 · FNOL

The skipped recorded-line disclosure

Caller: I need to report an accident.

Agent: Sure — let's get the details.

The required recorded-line and claims-handling notice was never read on a recorded call. Roark scores disclosure language as pass/fail on every call.

Disclosures
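
One way to picture a pass/fail disclosure check is a phrase match over everything the agent said. The sketch below is a simplified assumption of how such a rule could work: the phrases are illustrative placeholders (real wording varies by state and carrier), and the `Turn` type and function names are hypothetical.

```typescript
interface Turn {
  speaker: "agent" | "caller";
  text: string;
}

// Illustrative phrases only; real disclosure wording varies by state and carrier.
const requiredDisclosures: Record<string, string> = {
  recordedLine: "this call is being recorded",
  claimsHandling: "claims handling notice",
};

// Lowercase and collapse punctuation so matching tolerates transcription noise.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Return the names of disclosures the agent never spoke; an empty list is a pass.
function missingDisclosures(turns: Turn[]): string[] {
  const agentSpeech = normalize(
    turns
      .filter((t) => t.speaker === "agent")
      .map((t) => t.text)
      .join(" ")
  );
  return Object.entries(requiredDisclosures)
    .filter(([, phrase]) => !agentSpeech.includes(normalize(phrase)))
    .map(([name]) => name);
}

// The FNOL exchange from the card above: no notice was read.
const call: Turn[] = [
  { speaker: "caller", text: "I need to report an accident." },
  { speaker: "agent", text: "Sure, let's get the details." },
];
const missing = missingDisclosures(call);
console.log(missing.length === 0 ? "pass" : `fail: ${missing.join(", ")}`);
```

Exact phrase matching is brittle against paraphrase, so treat this as the simplest possible baseline rather than a complete rule.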

03 · Claim

Cold voice on a claimant in crisis

Caller: My house flooded, everything is gone.

Agent: Okay. What is your policy number?

The words were procedurally correct and the delivery was flat. The audio model scores the agent's warmth and the claimant's vocal stress, neither of which the transcript can show.

Empathy

04 · Claim

The missed fraud cue

Caller: The flood was Tuesday — wait, no, before I bought the policy.

Agent: No problem, I will start the claim.

The caller contradicted their own loss date and the agent moved on. Roark flags inconsistency and fraud-signal cues the agent should have escalated, not processed.

Fraud signals
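
The loss-date contradiction above reduces to two simple checks once the dates are extracted: did the caller give more than one loss date, and does any stated date fall before coverage began? The sketch below assumes the dates have already been parsed into ISO strings; the function name, result shape, and example dates are all hypothetical.

```typescript
interface LossDateCheck {
  conflictingDates: boolean;
  lossBeforeInception: boolean;
  shouldEscalate: boolean;
}

// statedLossDates: every loss date the caller gave during the call (YYYY-MM-DD).
// policyEffectiveDate: the date coverage began (YYYY-MM-DD).
function checkLossDates(
  statedLossDates: string[],
  policyEffectiveDate: string
): LossDateCheck {
  const inception = Date.parse(policyEffectiveDate);
  // More than one distinct date means the caller contradicted themselves.
  const conflictingDates = new Set(statedLossDates).size > 1;
  // Any stated loss date earlier than inception is outside the policy period.
  const lossBeforeInception = statedLossDates.some(
    (d) => Date.parse(d) < inception
  );
  return {
    conflictingDates,
    lossBeforeInception,
    shouldEscalate: conflictingDates || lossBeforeInception,
  };
}

// Made-up dates: the caller first gives a date after inception, then corrects
// it to one before the policy started.
const result = checkLossDates(["2024-03-12", "2024-02-20"], "2024-03-01");
console.log(result.shouldEscalate);
```

Either signal alone is a reason to route the call to a human adjuster instead of opening the claim automatically.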

05 · Renewal

The wrong coverage on renewal

Caller: Does this still cover water backup?

Agent: Yes, that is included on your plan.

It was dropped at last renewal. A wrong coverage answer becomes a denied claim and a bad-faith complaint months later. Roark fails unverified coverage confirmations.

Coverage accuracy

§02 · From caught to fixed

Roark catches every one of these — and proves the fix.

Each failure above is filed with its evidence, becomes a repeatable simulation until a candidate passes, and is verified on your next thousand live calls.

01 · Catch

The ledger above — every failure filed live, evidence attached.

See what breaks

02 · Simulate

Your fix, replayed against the exact failures above.

Testing your candidates · 82 / 240

prompt · warmer claim open · fail

voice · slower pacing · fail

tts · turbo-v2 → v3 · fail

03 · Review

Every change explicit and diffed — you apply it.

Your fix, diffed · support_v3 → v4

Prompt · Tool · Voice

− Help the caller with their request.

+ Quote coverage only from a verified policy lookup; lead a claim with acknowledgement, then details.

04 · Verify

You ship — Roark confirms the metric moved on live calls.

Verifying support_v4 in production · 126 calls scored

Coverage accuracy, since your deploy · 71 → 78

Issue recurrence · watching…

Quality score · 78 ↑

Regressions on other metrics · none

You ship it. Roark verifies every call, with state disclosure scripts enforced.

…and the loop runs again on the next call.

§03 · Simulate before launch

Break it in staging, not in production.

Run your agent against hundreds of simulated callers — realistic personas, accents, background noise and edge cases — and get every conversation scored before a customer ever dials in.

Scenarios & personas

Hundreds of simulated callers — the angry one, the rambler, the interrupter — built from your real call types.

45 languages & accents

Native accents, code-switching and background noise — in every market your agent answers.

Load & health tests

Peak-volume concurrency and always-on health checks, so the agent that passed in staging survives launch day.

Run it in CI

Every prompt or model change runs the suite before it merges — quality gates for conversations, not just code.

Pre-launch suite · insurance_v1 · 182 / 200 passed

Quote · The deductible that does not exist · pass · 92

FNOL · The skipped recorded-line disclosure · pass · 88

Claim · Cold voice on a claimant in crisis · fail · 61

Claim · The missed fraud cue · pass · 85

Renewal · The wrong coverage on renewal · pass · 90

1 failure filed as an issue — fix it before launch, not after
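
A quality gate of the kind described under "Run it in CI" can be as small as a threshold check over the suite's scores. The sketch below uses the five scenario scores listed above; the threshold of 70 is an assumption (the page does not state the real cutoff), and the types and function name are hypothetical.

```typescript
interface ScenarioResult {
  name: string;
  score: number; // 0 to 100
}

// Assumed pass threshold for this sketch; the real cutoff is not stated.
const PASS_THRESHOLD = 70;

// Return every scenario that scored below the threshold.
function failingScenarios(
  results: ScenarioResult[],
  threshold: number
): ScenarioResult[] {
  return results.filter((r) => r.score < threshold);
}

const results: ScenarioResult[] = [
  { name: "Quote · The deductible that does not exist", score: 92 },
  { name: "FNOL · The skipped recorded-line disclosure", score: 88 },
  { name: "Claim · Cold voice on a claimant in crisis", score: 61 },
  { name: "Claim · The missed fraud cue", score: 85 },
  { name: "Renewal · The wrong coverage on renewal", score: 90 },
];

const failures = failingScenarios(results, PASS_THRESHOLD);
for (const f of failures) {
  console.error(`FAIL ${f.name} (${f.score})`);
}

// In a CI job you would exit with this code; non-zero blocks the merge.
const exitCode = failures.length > 0 ? 1 : 0;
console.log(`exit code: ${exitCode}`);
```

With these scores only the cold-voice scenario falls below the assumed threshold, so the gate would block the change until that one issue is fixed.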

Run your first suite

§04 · Evals & observability

64+ metrics. Your models, not just an LLM.

Every production call scored as it lands — issues filed, alerts fired, dashboards and OTEL traces on tap, for voice calls and chat threads alike. And where most tools grade a transcript with an LLM, Roark runs purpose-built audio models on the call itself, measuring what your customer actually heard.

Everyone else

LLM reads the transcript

“The agent said the right words.” Misses how it sounded — the mispronounced drug name, the flat apology, the rushed close.

“…refund within three business days.” ✓ text-match

Roark · audio model · hears the call

Audio models hear the call

Pronunciation, accent, emotion and vocal stress measured from the waveform — the signal an LLM grading text can never see.

emotion · pace: rushed close

Empathy

Accuracy & compliance

policy

Coverage accuracy
Disclosures
Quote accuracy
Fraud signals
Script adherence

Audio-native

custom models

Empathy
Vocal stress
Pronunciation
Accent clarity
Pace & pauses
Interruptions

Conversational

LLM + rules

Task success
Hallucination
Repetition
Tone
De-escalation

Performance

latency

Time-to-first-word
Turn latency
ASR WER
Barge-in handling
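
For readers unfamiliar with the ASR WER metric listed above: word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The sketch below is a standard word-level edit-distance implementation of that definition, not Roark's own scoring code; the tokenization rule is an assumption.

```typescript
// Split into lowercase words, dropping punctuation (a simplifying assumption).
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, "")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// WER = (substitutions + deletions + insertions) / reference word count,
// computed as the word-level Levenshtein distance.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  // WER is undefined for an empty reference; this convention returns 0 or 1.
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;

  // prev[j] = distance between the reference prefix so far and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // match or substitution
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// One of five reference words was dropped by the recognizer.
console.log(wordErrorRate("What is your policy number?", "what is your policy")); // 0.2
```

Because insertions count as errors, WER can exceed 1 when the hypothesis is much longer than the reference.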

64+ metrics out of the box

∞ custom metrics, your rules

Audio + LLM models on every call

§05 · Get started

First call scored in under a minute.

One click on any platform below and production calls stream in on their own — or send any recording with three lines of code.

Read the quickstart

evaluate.ts

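import Roark from '@roarkhq/sdk'

// Placeholders (assumed): supply your own key and a reachable recording URL.
const apiKey = process.env.ROARK_API_KEY
const recordingUrl = 'https://example.com/call.wav'

const roark = new Roark({ apiKey })
await roark.calls.evaluate({
  recordingUrl, agent: 'support_v2',
}) // scored in seconds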

Node · Python · Go — plus a REST API for CI/CD and webhooks the instant a call is scored

Works with

Also built for

Finance

Verify before you disclose, never expose an account number, and read the disclosure your regulator wrote.

Explore

Healthcare

Say the drug name right, verify before PHI, and never sound like a robot to a scared patient.

Explore

Customer Support

Resolve it for real, escalate cleanly, and hear the frustration the transcript hides.

Explore

SOC 2 Type II

HIPAA · BAA available

Disclosure & TCPA scripts

Roark checks state-required claims disclosures, recorded-line notices and TCPA consent language as pass/fail on every call, with configurable retention and redaction for claimant PII.

Security details

Bring a recording. We’ll score it live.

See your own agent measured on the audio it actually produced — in the demo, in real time. Stop guessing whether your voice AI works.

Book a demo Read the docs

founders@roark.ai · we reply fast
