SIMULATE · EVALUATE · IMPROVE
Voice agents for reservations, concierge & guest services

Your concierge takes a booking in 45 languages.

Guests call from a loud lobby, in Gulf Arabic, Brazilian Portuguese, or South Indian English — and they want a king suite, late checkout, and to feel welcome. Roark scores every call on the audio: accent clarity, warmth, the booking detail, the upsell.

Backed by Y Combinator
Live · scoring every call · 1,284 today

Caller: I want a room for the thirtieth of Ramadan.

Agent: I'm sorry, did you say the thirteenth?

Accent breakdown
Gulf Arabic caller misheard: heard by the audio model, invisible to the transcript.
Accent clarity

Scoring production voice AI for teams at

Google
AT&T
BCG
Spectrum
Aircall
Podium
radiantgraph

§01 · When the call goes wrong

Here's how a guest call goes wrong.

Each one is a botched booking, a guest who felt unwelcome, or revenue left on the table — and most are inaudible to a tool that only reads the transcript.

01 · Reservation

The accent it could not parse

Caller: I want a room for the thirtieth of Ramadan.

Agent: I'm sorry, did you say the thirteenth?

A Gulf Arabic or South Indian English caller is misheard, the date flips, and the booking is wrong from the first turn. Accent breakdown is an audio failure a transcript-only tool reads as a clean exchange.

Accent clarity
02 · Booking

The wrong room, confidently confirmed

Caller: Two queens, non-smoking, two nights.

Agent: Confirmed — one king, smoking, for two nights.

The agent read back the wrong details with total confidence. Roark checks the confirmed booking against what the guest actually asked for, every call.

Booking accuracy
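A booking-accuracy check of this kind can be sketched as a field-by-field comparison between what the guest requested and what the agent read back. This is only an illustration: the `Booking` shape, the field names, and `diffBooking` are hypothetical and are not Roark's schema or API.

```typescript
// Hypothetical booking shape; field names are illustrative, not Roark's schema.
interface Booking {
  roomType: string;
  smoking: boolean;
  nights: number;
}

interface Mismatch {
  field: keyof Booking;
  requested: string | boolean | number;
  confirmed: string | boolean | number;
}

// Compare what the guest asked for with what the agent read back,
// returning one entry per field that differs.
function diffBooking(requested: Booking, confirmed: Booking): Mismatch[] {
  const fields: (keyof Booking)[] = ["roomType", "smoking", "nights"];
  return fields
    .filter((f) => requested[f] !== confirmed[f])
    .map((f) => ({ field: f, requested: requested[f], confirmed: confirmed[f] }));
}

// The exchange above: two queens, non-smoking, two nights
// versus one king, smoking, two nights.
const mismatches = diffBooking(
  { roomType: "two queens", smoking: false, nights: 2 },
  { roomType: "one king", smoking: true, nights: 2 },
);
console.log(mismatches.map((m) => m.field)); // [ 'roomType', 'smoking' ]
```

In practice the hard part is extracting both structured bookings from the call; the comparison itself stays this simple.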
03 · Concierge

Warm request, robotic reply

Caller: It's our anniversary — can you do anything special?

Agent: Anniversary noted. Is there anything else.

The words were fine; the delivery was cold and flat on the one call that should have felt personal. The audio model scores warmth and emotion the text can never show.

Warmth
04 · Upsell

The suite it never offered

Caller: Do you have anything with a view?

Agent: Yes, our standard rooms face the courtyard.

A clear buying signal, and the agent never offered the ocean-view suite or late checkout. Roark scores whether the agent surfaced the upgrade the guest was reaching for.

Upsell capture
05 · Front desk

Drowned out by the lobby

Caller: ...checking in under Okafor, party of four...

Agent: Sorry, could you repeat the name three more times?

Background noise from a packed lobby buried the guest, and the agent collapsed into repeat-loops instead of recovering. Roark scores noise robustness and ASR accuracy on the real audio, not the cleaned-up transcript.

Noise robustness

§02 · From caught to fixed

Roark catches every one of these — and proves the fix.

Each failure above is filed with its evidence, becomes a repeatable simulation until a candidate passes, and is verified on your next thousand live calls.

01 · Catch

The ledger above — every failure filed live, evidence attached.

See what breaks

02 · Simulate

Your fix, replayed against the exact failures above.

Testing your candidates · 82 / 240
prompt · warmer concierge open · fail
voice · slower pacing · fail
asr · pt-BR + ar-AE accents · fail
03 · Review

Every change explicit and diffed — you apply it.

Your fix, diffed · support_v3 → v4
Prompt · Model · Tool
- Take the booking and confirm the details.
+ Read the booking details back to the guest and confirm before saving — and offer the upgrade when the guest signals intent.
04 · Verify

You ship — Roark confirms the metric moved on live calls.

Verifying support_v4 in production · 126 calls scored
Accent clarity, since your deploy · 71 → 78
Issue recurrence · watching…
Quality score · 78 ↑
Regressions on other metrics · none

you ship it — Roark verifies every call, in 45 languages

…and the loop runs again on the next call.

§03 · Simulate before launch

Break it in staging, not in production.

Run your agent against hundreds of simulated callers — realistic personas, accents, background noise and edge cases — and get every conversation scored before a customer ever dials in.

Scenarios & personas

Hundreds of simulated callers — the angry one, the rambler, the interrupter — built from your real call types.

45 languages & accents

Native accents, code-switching and background noise — in every market your agent answers.

Load & health tests

Peak-volume concurrency and always-on health checks, so the agent that passed in staging survives launch day.

Run it in CI

Every prompt or model change runs the suite before it merges — quality gates for conversations, not just code.

Pre-launch suite · hospitality_v1 · 182 / 200 passed
Reservation · The accent it could not parse · pass · 92
Booking · The wrong room, confidently confirmed · pass · 88
Concierge · Warm request, robotic reply · fail · 61
Upsell · The suite it never offered · pass · 85
Front desk · Drowned out by the lobby · pass · 90

1 failure filed as an issue — fix it before launch, not after
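A quality gate over results like the five rows above can be sketched as a threshold check that a CI job turns into an exit code. The `ScenarioResult` shape, the threshold of 70, and both function names are assumptions made for illustration; the page does not state how Roark decides pass or fail.

```typescript
// Hypothetical result shape; not Roark's actual report format.
interface ScenarioResult {
  scenario: string;
  score: number; // 0-100
}

// Assumed pass threshold, chosen only for illustration.
const PASS_THRESHOLD = 70;

// Return the scenarios whose score falls below the threshold.
function failingScenarios(
  results: ScenarioResult[],
  threshold: number = PASS_THRESHOLD,
): ScenarioResult[] {
  return results.filter((r) => r.score < threshold);
}

// Map failures to the exit code a CI step would return (non-zero blocks the merge).
function gateExitCode(failures: ScenarioResult[]): number {
  return failures.length > 0 ? 1 : 0;
}

// Scores copied from the five scenario rows shown above.
const suite: ScenarioResult[] = [
  { scenario: "Reservation", score: 92 },
  { scenario: "Booking", score: 88 },
  { scenario: "Concierge", score: 61 },
  { scenario: "Upsell", score: 85 },
  { scenario: "Front desk", score: 90 },
];

const failures = failingScenarios(suite);
for (const f of failures) {
  console.log(`FAIL ${f.scenario} (${f.score})`); // FAIL Concierge (61)
}
const exitCode = gateExitCode(failures); // 1, so the change would not merge
```

A real pipeline would pass `exitCode` to the process exit so the CI runner marks the step as failed.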

§04 · Evals & observability

64+ metrics. Your models, not just an LLM.

Every production call scored as it lands — issues filed, alerts fired, dashboards and OTEL traces on tap, for voice calls and chat threads alike. And where most tools grade a transcript with an LLM, Roark runs purpose-built audio models on the call itself, measuring what your customer actually heard.

Everyone else

LLM reads the transcript

“The agent said the right words.” Misses how it sounded — the mispronounced drug name, the flat apology, the rushed close.

“…refund within three business days.” ✓ text-match
Roark · audio model · hears the call

Audio models hear the call

Pronunciation, accent, emotion and vocal stress measured from the waveform — the signal an LLM grading text can never see.

emotion · pace: rushed close
Empathy
84

Built for 45 languages

The way the world actually calls.

Native accents, code-switching and local noise — not English with a filter. Every audio-native metric runs in each one.

45 languages & dialects, growing every release
Latin American Spanish · Gulf Arabic · Brazilian Portuguese · South Indian English · UK English · Australian English · Parisian French · Mandarin · Cantonese · High German · Japanese · Tagalog · Hindi · + 32 more

Audio-native

custom models

  • Accent clarity
  • Warmth
  • Emotion
  • Noise robustness
  • Pronunciation
  • Pace & pauses

Languages

45 languages

  • Multilingual ASR
  • Language match
  • Dialect handling
  • Code-switching

Conversational

LLM + rules

  • Booking accuracy
  • Upsell capture
  • Task success
  • Hallucination
  • Repetition
  • Tone

Performance

latency

  • Time-to-first-word
  • Turn latency
  • ASR WER
  • Barge-in handling
64+ metrics out of the box
custom metrics, your rules
Audio + LLM models on every call
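For readers unfamiliar with the ASR WER metric in the Performance list: word error rate is conventionally the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below is a generic implementation of that standard definition, not Roark's; the function names are illustrative, and the hypothesis string assumes the recognizer heard "thirteenth" in the reservation example.

```typescript
// Lowercase, strip basic punctuation, and split into words.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[.,!?;:]/g, "")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// Word error rate: word-level Levenshtein distance
// (substitutions + deletions + insertions) divided by reference length.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference transcript must contain at least one word");
  }
  // prev[j] holds the distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

const wer = wordErrorRate(
  "I want a room for the thirtieth of Ramadan.",
  "I want a room for the thirteenth of Ramadan.",
);
console.log(wer.toFixed(3)); // 0.111
```

One wrong word out of nine is a low error rate, yet it is the word that carries the booking date, which is one reason WER sits alongside booking accuracy rather than replacing it.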

§05 · Get started

First call scored in under a minute.

One click on any platform below and production calls stream in on their own — or send any recording with three lines of code.

Read the quickstart
evaluate.ts
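import Roark from '@roarkhq/sdk'
// Assumes the API key is in ROARK_API_KEY (variable name is illustrative).
const roark = new Roark({ apiKey: process.env.ROARK_API_KEY })
// Placeholder; use the URL of your own call recording.
const recordingUrl = 'https://example.com/recordings/call.wav'
await roark.calls.evaluate({
  recordingUrl, agent: 'support_v2',
}) // scored in seconds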
Node · Python · Go — plus a REST API for CI/CD and webhooks the instant a call is scored

Works with

Vapi
Bland
Retell
LiveKit
Pipecat
ElevenLabs
Kore.ai
Google
SOC 2 Type II · HIPAA · BAA available

Guest data & payment handling

Roark scores PCI-DSS card-handling scripts as pass/fail, redacts captured card numbers, and runs with configurable retention for guest PII.
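To make the redaction idea concrete, here is a minimal sketch of masking card numbers in transcript text: find digit runs of card length and replace only those that pass the Luhn checksum. It is a generic technique and not a description of how Roark redacts; `passesLuhn`, `redactCardNumbers`, and the placeholder text are hypothetical, and pattern matching on text is not by itself sufficient for PCI-DSS compliance.

```typescript
// Luhn checksum: true when the digit string is a plausible card number.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  // Walk right to left, doubling every second digit.
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Replace runs of 13 to 19 digits (optionally separated by single spaces
// or hyphens) that pass the Luhn check; leave other numbers untouched.
function redactCardNumbers(text: string, placeholder: string = "[REDACTED CARD]"): string {
  return text.replace(/\b\d(?:[ -]?\d){12,18}\b/g, (match) => {
    const digits = match.replace(/[ -]/g, "");
    return passesLuhn(digits) ? placeholder : match;
  });
}

// 4111 1111 1111 1111 is a widely used test card number.
console.log(redactCardNumbers("My card is 4111 1111 1111 1111, thanks."));
// My card is [REDACTED CARD], thanks.
```

The Luhn check keeps long non-card numbers, such as confirmation codes, from being masked by accident.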

Security details

Bring a recording.
We’ll score it live.

See your own agent measured on the audio it actually produced — in the demo, in real time. Stop guessing whether your voice AI works.

founders@roark.ai · we reply fast