SIMULATE · EVALUATE · IMPROVE

Voice agents for banking, payments, cards & collections

Your agent read an account balance before it knew who was calling.

Every banking call rides on identity verified before account access, no account number read back on an open line, and the disclosures your examiners expect. Roark scores all of it — on the audio, on every call.

Book a demo · See what breaks

Backed by YC

Live · scoring every call · 1,284 today

Caller: What did my account close at this month?

Agent: Sure — you're at $4,212.08.

Disclosure before identity: account balance read before the caller was verified.

Identity check

Scoring production voice AI for teams at

radiantgraph

§01 · When the call goes wrong

Here's how a banking call goes wrong.

Each one is an account takeover, an exam finding, or a UDAAP complaint — and the worst of them are inaudible to a tool that only reads the transcript.

01 · Account access

Balance before identity

Caller: What did my account close at this month?

Agent: Sure — you're at $4,212.08.

The agent disclosed an account balance before verifying who was on the line. Roark fails any turn that touches account data before identity is confirmed — the exact gap an account-takeover caller probes for.

Identity check
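As a rough illustration of the rule above, a transcript-level version of the check might look like the sketch below. The `Turn` shape, the `verifiedAt` index and the `ACCOUNT_DATA` pattern are hypothetical names invented for this example; they are not Roark's API, and a production check would presumably rely on a real verification event and the audio rather than a regex.

```typescript
// Hypothetical turn shape for illustration; not Roark's data model.
type Turn = { speaker: "caller" | "agent"; text: string };

// Crude, assumed signal for "touches account data": a dollar amount or the word "balance".
const ACCOUNT_DATA = /\$\s?\d[\d,]*(\.\d{2})?|\bbalance\b/i;

// Returns the index of the first agent turn that discloses account data
// before identity was verified, or -1 if there is none.
// `verifiedAt` is the index of the turn where verification completed (null = never).
function firstUnverifiedDisclosure(turns: Turn[], verifiedAt: number | null): number {
  return turns.findIndex((turn, i) => {
    const verified = verifiedAt !== null && i > verifiedAt;
    return turn.speaker === "agent" && !verified && ACCOUNT_DATA.test(turn.text);
  });
}

// The exchange from the card above: no verification ever happens.
const call: Turn[] = [
  { speaker: "caller", text: "What did my account close at this month?" },
  { speaker: "agent", text: "Sure, you're at $4,212.08." },
];

console.log(firstUnverifiedDisclosure(call, null)); // logs 1 (the agent's turn)
```

A result of -1 would mean the call passes this particular check; any other value points at the turn to review.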

02 · Card services

The full account number read aloud

Caller: Which card is the payment going on?

Agent: The one ending — actually the full number is 4147 8392…

A full PAN spoken on a recorded line is a PCI exposure and a permanent liability in your call archive. Roark flags account-number and PII read-back as a hard fail.

PII exposure
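To make the read-back check concrete, here is a minimal sketch of one common way to spot a card number in transcribed text: find digit runs of card length and keep those that pass the Luhn checksum. This is an assumed approach for illustration, not a description of how Roark detects PANs. The function names are hypothetical, and the sample number is the widely used 4111 1111 1111 1111 test value rather than anything from the call above.

```typescript
// Luhn checksum: double every second digit from the right, subtract 9 from
// any result above 9, and require the total to be a multiple of 10.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits.charAt(digits.length - 1 - i));
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Finds runs of 13 to 19 digits (optionally separated by single spaces or
// hyphens) and keeps only those that pass the Luhn check.
function findPans(text: string): string[] {
  const candidates = text.match(/\b\d(?:[ -]?\d){12,18}\b/g) ?? [];
  return candidates.map((c) => c.replace(/[ -]/g, "")).filter(passesLuhn);
}

// Masks everything except the last four digits.
function maskPan(pan: string): string {
  return "*".repeat(pan.length - 4) + pan.slice(-4);
}

const spoken = "The full number is 4111 1111 1111 1111, okay?";
for (const pan of findPans(spoken)) {
  console.log(maskPan(pan));
}
```

A text-only pass like this would miss digits that the transcript spells out as words, which is one reason a check on the audio itself is the stronger signal.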

03 · Collections

The skipped mini-Miranda

Caller: Why are you calling me?

Agent: It's about your past-due balance — how much can you pay today?

The required debt-collection disclosure was never read before the agent pursued payment. Roark checks your Reg F / mini-Miranda script as pass/fail on every collections call.

Disclosures
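A pass/fail disclosure check of this kind comes down to ordering: the disclosure has to appear before the first payment ask. The sketch below shows that idea on text only. The `Utterance` shape and both patterns are assumptions made for this example (the disclosure marker is a phrase from the commonly used mini-Miranda wording); your own script and Roark's actual check will differ.

```typescript
// Hypothetical utterance shape for illustration only.
type Utterance = { speaker: "caller" | "agent"; text: string };

// Assumed markers: a phrase from the common mini-Miranda wording, and a crude payment ask.
const DISCLOSURE = /attempt to collect a debt/i;
const PAYMENT_ASK = /\b(pay|payment)\b/i;

// Passes only if the agent reads the disclosure no later than the turn in
// which it first asks for payment. Same-turn counts as a pass in this sketch.
function disclosureBeforePayment(utterances: Utterance[]): "pass" | "fail" {
  const agentTurns = utterances.filter((u) => u.speaker === "agent");
  const ask = agentTurns.findIndex((u) => PAYMENT_ASK.test(u.text));
  if (ask === -1) return "pass"; // no payment pursued, so nothing to gate here
  const disclosed = agentTurns.findIndex((u) => DISCLOSURE.test(u.text));
  return disclosed !== -1 && disclosed <= ask ? "pass" : "fail";
}

// The exchange from the card above.
const collectionsCall: Utterance[] = [
  { speaker: "caller", text: "Why are you calling me?" },
  { speaker: "agent", text: "It's about your past-due balance, how much can you pay today?" },
];

console.log(disclosureBeforePayment(collectionsCall)); // logs "fail"
```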

04 · Disputes

The mishandled Reg E dispute

Caller: There's a charge I didn't make.

Agent: I can remove it, but you should really watch your card more carefully.

The agent both prejudged the claim and skipped the error-resolution rights the caller is owed. Roark checks the dispute script and flags blame-shifting language a transcript-only tool reads as 'resolved.'

Dispute handling

05 · Fraud

The stressed caller, missed signal

Caller: Someone called me and I— I gave them the code they texted.

Agent: Okay. Is there anything else I can help you with today?

The caller's voice was shaking — a textbook scam-in-progress — and the agent moved on. The audio model hears the stress and confusion the transcript renders as a calm sentence, so the fraud signal never gets missed.

Vocal stress

§02 · From caught to fixed

Roark catches every one of these — and proves the fix.

Each failure above is filed with its evidence, replayed as a repeatable simulation until a candidate fix passes, and then verified on your next thousand live calls.

01 · Catch

The ledger above — every failure filed live, evidence attached.

See what breaks

02 · Simulate

Your fix, replayed against the exact failures above.

Testing your candidates · 82 / 240

prompt · verify-first opener · fail

voice · read digits slower · fail

model · gpt-4.1 · fail

03 · Review

Every change explicit and diffed — you apply it.

Your fix, diffed · support_v3 → v4

Prompt · Tool · Model

− Answer the caller's account questions.

+ Verify identity before reading any account data; read the required disclosure before pursuing payment.

04 · Verify

You ship — Roark confirms the metric moved on live calls.

Verifying support_v4 in production · 126 calls scored

Identity check, since your deploy · 71 → 78

Issue recurrence · watching…

Quality score · 78 ↑

Regressions on other metrics · none

you ship it — Roark verifies every call, with PAN masking on by default

…and the loop runs again on the next call.

§03 · Simulate before launch

Break it in staging, not in production.

Run your agent against hundreds of simulated callers — realistic personas, accents, background noise and edge cases — and get every conversation scored before a customer ever dials in.

Scenarios & personas

Hundreds of simulated callers — the angry one, the rambler, the interrupter — built from your real call types.

45 languages & accents

Native accents, code-switching and background noise — in every market your agent answers.

Load & health tests

Peak-volume concurrency and always-on health checks, so the agent that passed in staging survives launch day.

Run it in CI

Every prompt or model change runs the suite before it merges — quality gates for conversations, not just code.

Pre-launch suite · finance_v1 · 182 / 200 passed

Account access · Balance before identity · pass · 92

Card services · The full account number read aloud · pass · 88

Collections · The skipped mini-Miranda · fail · 61

Disputes · The mishandled Reg E dispute · pass · 85

Fraud · The stressed caller, missed signal · pass · 90

1 failure filed as an issue — fix it before launch, not after
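For the CI case, the gate itself can be very small: collect the scenario scores and block the merge if any falls below a threshold. The sketch below uses the five scores shown in the suite above. The `ScenarioResult` shape and the threshold of 70 are assumptions for illustration; the page does not state the real pass mark or the format in which results are returned.

```typescript
// Hypothetical result shape; the real suite output may look different.
type ScenarioResult = { name: string; score: number };

// Assumed pass mark, chosen only so that it agrees with the pass/fail labels above.
const PASS_THRESHOLD = 70;

const results: ScenarioResult[] = [
  { name: "Account access · Balance before identity", score: 92 },
  { name: "Card services · The full account number read aloud", score: 88 },
  { name: "Collections · The skipped mini-Miranda", score: 61 },
  { name: "Disputes · The mishandled Reg E dispute", score: 85 },
  { name: "Fraud · The stressed caller, missed signal", score: 90 },
];

// Returns every scenario that scored below the threshold.
function failedScenarios(rs: ScenarioResult[], threshold: number): ScenarioResult[] {
  return rs.filter((r) => r.score < threshold);
}

const failures = failedScenarios(results, PASS_THRESHOLD);
for (const f of failures) {
  console.log(`FAIL ${f.name} (${f.score})`);
}
// In a CI job, a non-empty `failures` list would be turned into a non-zero
// exit code so the merge is blocked until the scenario passes.
```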

Run your first suite

§04 · Evals & observability

64+ metrics. Your models, not just an LLM.

Every production call scored as it lands — issues filed, alerts fired, dashboards and OTEL traces on tap, for voice calls and chat threads alike. And where most tools grade a transcript with an LLM, Roark runs purpose-built audio models on the call itself, measuring what your customer actually heard.

Everyone else

LLM reads the transcript

“The agent said the right words.” Misses how it sounded — the mispronounced drug name, the flat apology, the rushed close.

“…refund within three business days.” ✓ text-match

Roark · audio model · hears the call

Audio models hear the call

Pronunciation, accent, emotion and vocal stress measured from the waveform — the signal an LLM grading text can never see.

emotion · pace: rushed close

Empathy

Compliance · policy

Identity check
PII exposure
Disclosures
Dispute handling
Script adherence

Audio-native · custom models

Vocal stress
Pronunciation
Accent clarity
Emotion
Pace & pauses
Interruptions

Conversational · LLM + rules

Task success
Accuracy
Hallucination
De-escalation
Repetition
Tone

Performance · latency

Time-to-first-word
Turn latency
ASR WER
Barge-in handling

64+ metrics out of the box

∞ custom metrics, your rules

Audio + LLM models on every call

§05 · Get started

First call scored in under a minute.

One click on any platform below and production calls stream in on their own — or send any recording with three lines of code.

Read the quickstart

evaluate.ts

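import Roark from '@roarkhq/sdk'

// Placeholders: supply your own API key and a reachable recording URL.
const apiKey = process.env.ROARK_API_KEY // assumed env var name
const recordingUrl = process.env.RECORDING_URL // assumed env var name
const roark = new Roark({ apiKey })
await roark.calls.evaluate({
  recordingUrl, agent: 'support_v2',
}) // scored in seconds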

Node · Python · Go — plus a REST API for CI/CD and webhooks the instant a call is scored

Works with

Also built for

Insurance

Quote the coverage right, read every required disclosure, and never sound cold to a claimant in crisis.

Explore

Customer Support

Resolve it for real, escalate cleanly, and hear the frustration the transcript hides.

Explore

Healthcare

Say the drug name right, verify before PHI, and never sound like a robot to a scared patient.

Explore

SOC 2 Type II

HIPAA · BAA available

Account data & PII handling

PAN/PII masking, configurable retention, and zero-data-retention options — plus scripted checks for your Reg E, Reg F and disclosure requirements on every call.

Security details

Bring a recording. We’ll score it live.

See your own agent measured on the audio it actually produced — in the demo, in real time. Stop guessing whether your voice AI works.

Book a demo · Read the docs

founders@roark.ai · we reply fast
