The Science of Call Grading

Not vibes. A pipeline.

Phone Genius turns a front-desk phone call into a structured, scored, coachable artifact. That requires more than a transcript. It requires speech recognition, speaker separation, intent tagging, a specialty-tuned rubric, and an inter-rater framework that makes the score reproducible — not the opinion of whoever happened to listen.

ASR + diarization · Rubric-based grading · Specialty-tuned scenarios · Coaching-ready output

Research foundations

Four disciplines converge in one score.

Phone Genius isn't one model behind a chat interface. It's a pipeline that borrows its best ideas from four mature research fields — then translates them into something a specialty practice owner can actually use on a Monday morning.

01

Mystery shopping methodology

Fifty years of service-quality measurement — banking, retail, hospitality, healthcare. The simulated-patient variant (CRiSP reporting checklist) is an accepted research methodology. Phone Genius uses the same core: a structured, repeatable scenario under controlled conditions so the score reflects the practice's behavior, not the caller's mood.

Structured scenario · Controlled conditions · Standardized report · Aggregate-first analysis
02

Speech analytics & conversational AI

ASR, speaker diarization, and intent classification: the standard contact-center pipeline. Whisper-class models target a word error rate (WER) under 5% on clean English audio. Diarization separates caller from staff so the pipeline can measure turn-taking, interruptions, talk-time ratio, and silence. These are not marketing claims; they are standard tooling from the research literature.

Word-level ASR · Speaker diarization · Intent tagging · Prosody cues
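For readers new to the metric: word error rate is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is a generic illustration, not Phone Genius's evaluation code. The function name and the sample sentences are made up for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word dropped)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the ASR output drops one "at" from a nine-word sentence.
reference = "i have tuesday at ten or thursday at two"
hypothesis = "i have tuesday at ten or thursday two"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}")  # WER: 11.1%
```

One dropped word in nine is already above a 5% target, which is why short telephony utterances are a demanding test for any ASR model.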
03

Rubric-based human evaluation

Same transcript, two graders, same score: that's inter-rater reliability, measured with agreement statistics such as Cohen's kappa and Krippendorff's alpha. Our rubric uses concrete, observable anchors, such as "did staff ask for the appointment within 90 seconds of the pain question?" rather than "did it feel warm?" Observable anchors make the grade reproducible between the AI and a human auditor.

Observable anchors · Banded scores · Human-calibrated · Agreement stats
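As a rough illustration of the agreement statistic named above, Cohen's kappa compares how often two graders agree with how often they would agree by chance. The sketch below is a textbook unweighted kappa, not the product's audit code. The two score lists are invented, and the band values (0/40/70/100) are borrowed from the rubric-scoring step.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    n = len(rater_a)
    # Share of items where both raters chose the same band.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical data: one dimension scored on ten calls by the AI and by a human auditor.
ai_bands = [100, 70, 70, 40, 100, 0, 70, 40, 100, 70]
human_bands = [100, 70, 40, 40, 100, 0, 70, 70, 100, 70]
kappa = cohens_kappa(ai_bands, human_bands)
print(round(kappa, 3))  # 0.714
```

Unweighted kappa treats a 70-versus-40 disagreement the same as a 100-versus-0 one. For ordered bands, a weighted kappa or Krippendorff's alpha with an ordinal distance may be a better fit.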
04

Dental specialty field research

A perio call has different stakes, objections, and conversion ceilings than a pizza order. The rubric is built from 1,000+ hours of specialty coaching, 350+ practices on Referral Lab, and a million-plus referral records. That field data supplies the benchmarks: a ~85% target schedule rate, time-to-conversion as the forward indicator, and the 45–90 second appointment-ask window.

Specialty scenarios · Insurance realism · Schedule-rate benchmarks · Perio/OS/ortho/endo/prosth

The rubric

Seven dimensions. One score. Zero guesswork.

Every call is scored on the same seven dimensions. Each dimension has concrete behavioral anchors — not mood words — so two graders looking at the same transcript reach the same number. The composite is a weighted 0–100 "Call Score," but the dimension breakdown is what drives the coaching.

01

Greeting & Warmth

Did the call open inside three rings with a clean brand greeting, a named staff member, and a warmth signal the caller's ear actually hears?

Observable anchors
  • Picked up within target ring count
  • Brand + staff name stated in opener
  • Tone: no dead air or flat monotone
  • No audible hold before the human
02

Control of the Call

Did the staff lead, or did the caller lead? A controlled call moves through a discovery arc on purpose — not a Q&A volley dictated by the caller's anxiety.

Observable anchors
  • Staff asks more open questions than caller
  • Talk-time ratio inside healthy band
  • Interruptions trend low
  • Silence tolerated after a question
03

Discovery & Listening

Did the staff uncover the reason for the call — pain, referral source, timing, insurance, prior experience — before jumping to logistics?

Observable anchors
  • Open-ended "what brings you in" moment
  • Referral source captured verbatim
  • Pain / urgency acknowledged explicitly
  • Active-listening markers ("got it," "mm-hm")
04

Clinical & Administrative Accuracy

Were the factual answers correct? Insurance networks, specialty scope, referral requirements, visit length, what happens at the consult — the boring stuff patients fact-check after they hang up.

Observable anchors
  • Insurance answer matches practice payer list
  • Specialty scope stated correctly
  • Visit length expectation set accurately
  • No fabricated clinical promises
05

Objection Handling

When the caller pushed back on price, timing, insurance, or distance, did the staff have a move — or did the call collapse into "let me take your number"?

Observable anchors
  • Objection named, not deflected
  • Reframe offered before alternative
  • Value anchored, not just price defended
  • No premature "we'll call you back"
06

Urgency & Timing

Specialty new-patient calls have a window — usually 45 to 90 seconds — where the appointment ask lands. Blow past the window and conversion falls off a cliff.

Observable anchors
  • Appointment offered inside the window
  • Specific dates/times offered, not "we have openings"
  • Same-week option surfaced when scope allows
  • Pain-first callers fast-tracked
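One way to picture the first anchor is as a check on the timestamp of the first appointment ask. This is a minimal sketch under stated assumptions: the `Moment` type and the label names are hypothetical, and the window is measured from the start of the call, which the text does not specify.

```python
from dataclasses import dataclass
from typing import List

# Window bounds taken from the text; the reference point (call start) is an assumption.
WINDOW_START_S = 45.0
WINDOW_END_S = 90.0


@dataclass
class Moment:
    label: str      # hypothetical tag, e.g. "greeting", "discovery", "appointment_ask"
    start_s: float  # seconds from the start of the call


def classify_appointment_ask(moments: List[Moment]) -> str:
    """Return 'missing', 'early', 'in_window', or 'late' for the first appointment ask."""
    ask_times = [m.start_s for m in moments if m.label == "appointment_ask"]
    if not ask_times:
        return "missing"
    first_ask = min(ask_times)
    if first_ask < WINDOW_START_S:
        return "early"
    if first_ask <= WINDOW_END_S:
        return "in_window"
    return "late"


# Hypothetical call where the ask only arrives at 1:42.
call = [
    Moment("greeting", 2.0),
    Moment("discovery", 18.5),
    Moment("appointment_ask", 102.0),
]
print(classify_appointment_ask(call))  # late
```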
07

Appointment Conversion

Did the call end with a scheduled appointment — not a "we'll call you back," not a "think about it," not a verbal maybe? This is the dimension every other dimension is trying to serve. It's weighted accordingly.

Observable anchors
  • Firm appointment booked
  • Confirmation details read back
  • Next step communicated (forms, insurance, directions)
  • Warm close, not transactional goodbye

Composite score is a weighted sum with Appointment Conversion (07) as the dominant weight, Urgency (06) and Control (02) as secondary weights, and the remaining dimensions as texture.
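The weighting described above can be sketched as a plain weighted sum. The actual weights are not published here, so the numbers below are illustrative assumptions that only preserve the stated ordering: Conversion dominant, Urgency and Control secondary, the rest smaller. The dimension scores are made up.

```python
# Hypothetical weights (assumed, not the product's real values). They sum to 1.0.
WEIGHTS = {
    "greeting": 0.08,
    "control": 0.15,
    "discovery": 0.08,
    "accuracy": 0.08,
    "objections": 0.08,
    "urgency": 0.15,
    "conversion": 0.38,
}


def composite_score(dimension_scores: dict) -> float:
    """Weighted sum of seven 0-100 dimension scores, rounded to one decimal."""
    missing = set(WEIGHTS) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return round(total, 1)


# Made-up dimension scores for one call.
scores = {
    "greeting": 100,
    "control": 70,
    "discovery": 70,
    "accuracy": 100,
    "objections": 40,
    "urgency": 70,
    "conversion": 100,
}
call_score = composite_score(scores)
print(call_score)  # 83.8
```

Because the weights here are assumptions, the output will not reproduce any published Call Score. The point is only that a booked appointment moves the composite more than any other single dimension.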

90+ Books the appointment and does it well.
60 Books it poorly.
40 Didn't book it.

72 Call Score
Graded Mystery call #047 · Apr 18 · 2m 14s
01 Greeting · 81
02 Control · 64
03 Discovery · 73
04 Accuracy · 88
05 Objections · 55
06 Urgency · 60
07 Conversion · 58
Coaching moment · 1:42

“I see you have some openings” — try: “I have Tuesday at 10 or Thursday at 2 — which works better for you?”

Methodology

How a phone call becomes a score.

The pipeline runs the same way every time so the output is comparable across calls, across staff, across practices, and across months. Nothing is "judged." Every score maps back to a specific observable event in the transcript with a timestamp.

1

Scenario & voice selection

Each mystery call gets a specialty-specific scenario: perio consult, OS extraction with pain, ortho teen, endo emergency, prosth implant. The AI caller uses a realistic persona (age, insurance posture, referral source) with scripted flexibility — not a hard read.

2

Call placement & recording

Placed at a realistic time and day. Ring count, hold time, and transfers are all captured. Recording follows applicable consent rules — two-party disclosure is built into the opener when required.

3

ASR with word-level timestamps

Whisper-class ASR model. Word-level timestamps, punctuation, confidence. WER under 5% on clean English telephony. Low-confidence regions are unscored, not guessed.
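"Unscored, not guessed" can be pictured as a filter over word-level ASR output. The sketch below assumes a hypothetical `Word` record and an arbitrary confidence floor of 0.60; neither comes from the product, and real ASR libraries expose confidence in different ways.

```python
from dataclasses import dataclass
from typing import List, Tuple

CONFIDENCE_FLOOR = 0.60  # assumed threshold, for illustration only


@dataclass
class Word:
    text: str
    start_s: float
    end_s: float
    confidence: float


def split_by_confidence(
    words: List[Word], floor: float = CONFIDENCE_FLOOR
) -> Tuple[List[Word], List[Tuple[float, float]]]:
    """Return (scorable words, unscored time spans) for one transcript."""
    scorable = []
    unscored_spans = []
    span_start = None
    span_end = None
    for word in words:
        if word.confidence >= floor:
            scorable.append(word)
            if span_start is not None:
                # A low-confidence run just ended; record it as one span.
                unscored_spans.append((span_start, span_end))
                span_start = None
        else:
            if span_start is None:
                span_start = word.start_s
            span_end = word.end_s
    if span_start is not None:
        unscored_spans.append((span_start, span_end))
    return scorable, unscored_spans


# Hypothetical ASR output with a muddy insurance name in the middle.
words = [
    Word("we", 10.0, 10.2, 0.95),
    Word("accept", 10.2, 10.6, 0.91),
    Word("delta", 10.6, 11.0, 0.42),
    Word("dental", 11.0, 11.4, 0.38),
    Word("ppo", 11.4, 11.8, 0.88),
]
scorable, unscored = split_by_confidence(words)
print([w.text for w in scorable])  # ['we', 'accept', 'ppo']
print(unscored)                    # [(10.6, 11.4)]
```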

4

Speaker diarization

Caller turns vs. staff turns. Unlocks talk-time ratio, interruption counts, silence durations, and turn-taking patterns — all feeding Control (02) and Discovery (03).
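Once turns are labelled by speaker, the metrics listed above reduce to arithmetic over start and end times. The sketch below assumes a hypothetical `Turn` record with "staff" and "caller" labels, and it counts an interruption whenever a staff turn starts before the caller's previous turn has ended. That definition is a simplification chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    speaker: str   # "staff" or "caller" (hypothetical labels)
    start_s: float
    end_s: float


def turn_metrics(turns: List[Turn]) -> dict:
    """Talk-time ratio, staff interruptions, and longest silence from diarized turns."""
    ordered = sorted(turns, key=lambda t: t.start_s)
    talk = {"staff": 0.0, "caller": 0.0}
    for turn in ordered:
        talk[turn.speaker] += turn.end_s - turn.start_s

    staff_interruptions = 0
    silences = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt.start_s - prev.end_s
        if gap < 0:
            # Overlap: count it only when staff talks over the caller.
            if nxt.speaker == "staff" and prev.speaker == "caller":
                staff_interruptions += 1
        elif gap > 0:
            silences.append(gap)

    total = talk["staff"] + talk["caller"]
    return {
        "staff_talk_ratio": talk["staff"] / total if total else 0.0,
        "staff_interruptions": staff_interruptions,
        "longest_silence_s": max(silences, default=0.0),
    }


# Hypothetical four-turn call.
turns = [
    Turn("staff", 0.0, 4.0),
    Turn("caller", 4.5, 12.5),
    Turn("staff", 12.0, 20.0),   # starts before the caller finishes
    Turn("caller", 22.0, 26.0),
]
metrics = turn_metrics(turns)
print(metrics)
```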

5

Moment detection & intent tagging

Each staff turn is labeled: greeting, discovery, clinical explanation, objection handling, appointment ask, confirmation, close. Missing moments are scored — a call that never reaches "appointment ask" is a detectable event.
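Treating a missing moment as a detectable event amounts to comparing the labels found in a call against the labels every call is expected to contain. The sketch below covers only that comparison, not the tagging model itself. The label strings and the choice of which moments count as required are assumptions; objection handling, for example, is left optional because not every caller objects.

```python
from typing import List, Tuple

# Assumed set of moments every new-patient call should reach, in expected order.
REQUIRED_MOMENTS = ["greeting", "discovery", "appointment_ask", "confirmation", "close"]


def missing_moments(tagged_turns: List[Tuple[float, str]]) -> List[str]:
    """Return the required moments that never appear in the tagged staff turns."""
    seen = {label for _, label in tagged_turns}
    return [moment for moment in REQUIRED_MOMENTS if moment not in seen]


# Hypothetical tagged call as (start time in seconds, label) pairs.
tagged = [
    (1.0, "greeting"),
    (9.0, "discovery"),
    (48.0, "clinical_explanation"),
    (95.0, "close"),
]
print(missing_moments(tagged))  # ['appointment_ask', 'confirmation']
```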

6

Rubric scoring

Seven dimensions scored against observable anchors. Banded (0/40/70/100), not free-text opinions. Composite Call Score is a weighted sum with Appointment Conversion dominant.
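The text gives the four bands but not the rule that maps anchors to a band, so the mapping below is purely an assumption for illustration: all anchors met scores 100, at least three quarters scores 70, anything above zero scores 40, and none scores 0.

```python
from typing import Dict

BANDS = (0, 40, 70, 100)  # band values from the text


def band_for(anchors_met: Dict[str, bool]) -> int:
    """Map a dimension's anchor checklist to one band (assumed thresholds)."""
    if not anchors_met:
        raise ValueError("a dimension needs at least one anchor")
    fraction = sum(anchors_met.values()) / len(anchors_met)
    if fraction == 1.0:
        return 100
    if fraction >= 0.75:
        return 70
    if fraction > 0.0:
        return 40
    return 0


# Hypothetical Urgency & Timing checklist for one call.
urgency_anchors = {
    "appointment offered inside the window": False,
    "specific dates/times offered": False,
    "same-week option surfaced": True,
    "pain-first caller fast-tracked": True,
}
urgency_band = band_for(urgency_anchors)
print(urgency_band)  # 40
```

Whatever the real thresholds are, the design point stands: a grader picks from four values backed by a checklist rather than writing a free-text opinion.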

7

Coaching extraction

For each missed anchor: the quote, the timestamp, the staff member — plus a suggested replacement line grounded in specialty best practice. That 30-second clip is what the manager gets.
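A coaching item can be thought of as a small record plus a clip window around the timestamp. In the sketch below, the `MissedAnchor` type, its field names, and the choice to center the 30-second clip on the quote are all assumptions. The quote, suggestion, and call length reuse the sample scorecard's coaching moment, and the suggested line is supplied as input rather than generated.

```python
from dataclasses import dataclass

CLIP_LENGTH_S = 30.0  # clip length from the text


@dataclass
class MissedAnchor:
    dimension: str
    anchor: str
    quote: str
    timestamp_s: float
    staff_member: str
    suggestion: str


def coaching_clip(miss: MissedAnchor, call_length_s: float) -> dict:
    """Build a coaching item with a clip window clamped to the call's duration."""
    start = max(0.0, miss.timestamp_s - CLIP_LENGTH_S / 2)
    end = min(call_length_s, start + CLIP_LENGTH_S)
    start = max(0.0, end - CLIP_LENGTH_S)  # pull back if the clip hit the end of the call
    minutes, seconds = divmod(int(miss.timestamp_s), 60)
    return {
        "staff_member": miss.staff_member,
        "dimension": miss.dimension,
        "timestamp": f"{minutes}:{seconds:02d}",
        "clip_start_s": start,
        "clip_end_s": end,
        "quote": miss.quote,
        "try_instead": miss.suggestion,
    }


miss = MissedAnchor(
    dimension="Urgency & Timing",
    anchor="Specific dates/times offered",
    quote="I see you have some openings",
    timestamp_s=102.0,
    staff_member="front desk (placeholder name)",
    suggestion="I have Tuesday at 10 or Thursday at 2. Which works better for you?",
)
clip = coaching_clip(miss, call_length_s=134.0)
print(clip["timestamp"], clip["clip_start_s"], clip["clip_end_s"])  # 1:42 87.0 117.0
```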

8

Calibration & audit

A rolling sample is re-graded by a human auditor. Agreement statistics (Cohen's kappa per dimension) are tracked. When agreement drifts, the rubric or model is adjusted — not the practice's score.
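Tracking drift could be as simple as watching each dimension's recent kappa values against a floor. The sketch below assumes the kappa values have already been computed per audit batch. The 0.70 floor, the three-batch window, and the history numbers are illustrative assumptions, not the product's thresholds or data.

```python
from statistics import mean
from typing import Dict, List

KAPPA_FLOOR = 0.70  # assumed acceptable agreement level
WINDOW = 3          # assumed number of recent audit batches to average


def drifting_dimensions(
    kappa_history: Dict[str, List[float]],
    floor: float = KAPPA_FLOOR,
    window: int = WINDOW,
) -> List[str]:
    """Return dimensions whose recent average AI-vs-human kappa is below the floor."""
    flagged = []
    for dimension, values in kappa_history.items():
        recent = values[-window:]
        if recent and mean(recent) < floor:
            flagged.append(dimension)
    return sorted(flagged)


# Hypothetical per-batch kappa values, oldest first.
history = {
    "control": [0.82, 0.80, 0.79, 0.81],
    "objections": [0.78, 0.71, 0.66, 0.62],
    "conversion": [0.93, 0.94, 0.92, 0.95],
}
print(drifting_dimensions(history))  # ['objections']
```

A flagged dimension is a prompt to review the rubric anchors or the model for that dimension, in line with the rule that drift is corrected in the grader rather than in the practice's score.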

Why specialty-specific

A perio call isn't a pizza order.

Specialty new-patient calls carry more information, more friction, and more dollars than almost any other inbound conversation in healthcare. Generic call scoring misses the nuances that decide whether the appointment gets booked.

Generic call scoring

Surface-level
  • Warmth and professionalism in the abstract
  • "Did they say hello" style checklists
  • Flat 1–5 star manager scores
  • No model of the referral path
  • No benchmark for schedule rate or LTV
  • No concept of the appointment-ask window

Phone Genius

Specialty-tuned
  • Rubric anchors drawn from real specialty calls
  • Referral-source capture scored explicitly
  • Insurance posture: PPO, fee-for-service, medical billing
  • Urgency window (45–90 sec) measured, not guessed
  • Schedule-rate benchmarks: ~85% target in perio
  • Appointment conversion as the dominant weight
$8,500+ average lifetime value of one specialty new patient
~85% healthy target schedule rate for a perio practice
45–90 sec window to land the appointment ask on most new-patient calls
< 5% target ASR Word Error Rate on clean English telephony

What Phone Genius is and is not

Strong claims, clean boundaries.

A grading & coaching system

It measures how your front desk handles specialty new-patient calls against a consistent rubric, and surfaces the specific moments a manager can coach on this week.

Not a call tracking replacement

It does not replace CallRail, Dialpad, or your PBX. It grades the quality of the conversation. You keep your call tracking for routing and attribution.

Not a clinical decision tool

It does not diagnose, recommend treatment, or advise on clinical scope. Clinical accuracy means “did staff state the practice's scope correctly” — not “was the medicine right.”

Not a lie detector

It does not listen in on your real patient calls. It grades only the calls it places itself, as an AI mystery shopper, using specialty-tuned scenarios. The sample is controlled on purpose.

Not a legal recording service

We comply with applicable consent laws for the calls we place. We don't record your real inbound traffic. Your own call-recording policy is yours.

Not a manager replacement

The output makes a manager twice as effective — not obsolete. A real person decides whether Monday's coaching moment is a one-to-one, a stand-up, or a hiring signal.

Field-tested, research-informed

Built inside specialty, not outside it.

Phone Genius was built by Cameron Full, DBA — founder of Referral Lab, fractional COO for specialty practices through Dentek, and author of a doctoral dissertation on how entrepreneurial operators recombine existing expertise to solve novel problems.

The rubric was pressure-tested against a decade of specialty operational data: treatment coordinator performance, front-desk variation, referral-source behavior, scheduling deficiencies, and the operational patterns that separate a 60% schedule rate from an 85% schedule rate.

Every dimension in the rubric exists because a specialty practice has paid a real cost for missing it — and because a trained TC coach has already coached against it.

350+ specialty practices in the operational dataset
1M+ referral records informing the benchmarks
1,000+ hours of specialty coaching feeding the rubric
10+ yrs inside perio, oral surgery, ortho, endo, prosth ops

Founding pilot

See your practice on the rubric.

We're onboarding a small cohort of specialty practices. Founding pilots lock in a discounted long-term rate, get direct input on the rubric, and receive their first graded calls inside two weeks.