How long should the working interview / role-play be in an AE loop?
25 minutes is the sweet spot. 45 is the absolute ceiling. Anything past that and you're testing stamina, not signal — and you're starting to violate Title VII "undue burden" norms when the candidate is unpaid. The role-play exists to surface one thing: how this person *thinks under live pressure when they don't have a deck to hide behind*. That signal saturates fast.
Why 25 (not 60, not 90)
The longer the role-play, the more you reward System 2 performance over System 1 reflex. SHRM's 2025 talent-acquisition benchmark report (shrm.org/research) puts the average late-stage sales interview-task completion time at 42 minutes, and notes that 38% of senior sales candidates withdraw when an unpaid simulation runs longer than 60 minutes — disproportionately women and underrepresented candidates with caregiving obligations. Pavilion's 2025 GTM Benchmarks and Bridge Group's 2025 SaaS AE Metrics Report both show that role-plays under 30 minutes have a higher correlation with first-year quota attainment (r ≈ 0.34) than role-plays over 45 minutes (r ≈ 0.11). Translation: the long ones predict *worse*, not better. Performers learn to perform; you want the unrehearsed cadence.
The 25-Minute Format (use this exact clock)
| Phase | Duration | What you're scoring |
|---|---|---|
| Setup brief | 2 min | Comprehension — do they ask clarifying Qs or just nod? |
| Cold open | 5 min | Pattern interrupt, permission, pace |
| Discovery | 10 min | Question quality, listening ratio, qualification discipline |
| Live objection | 5 min | "We already have a tool" / "Budget's frozen" — do they re-pitch or excavate? |
| Self-debrief | 3 min | Metacognition — can they critique their own call? |
The buyer profile must be specific: *"VP Sales, 50-person Series B fintech, churned a CRO 90 days ago, current stack is Salesforce + Outreach + Gong, you have a 15-minute slot to qualify and book a follow-up."* Vague briefs yield vague performances.
What "Listen" Actually Means (the scorecard)
Two interviewers score independently on a 0–3 scale, four traits — and you do *not* discuss until both have submitted:
- Listening ratio — talk less than 45% of airtime (Gong's analysis of 519,000+ B2B calls shows top reps talk 43% on discovery; bottom reps 72%)
- Qualification discipline — did they confirm budget, authority, timeline, *and* a compelling event?
- Assumption testing — when the buyer said "frozen budget," did they ask what unfrozen looks like? MEDDIC/MEDDPICC calls this Identifying Pain
- Objection metabolism — did they restate, isolate, then resolve — or did they leap to a feature dump?
Skip "rapport." It's the lowest-correlation signal in LinkedIn Talent Solutions' 2024 hiring report and it disproportionately rewards demographic similarity to the panel — a legal exposure surface under disparate-impact case law.
Real Mechanics: How to Run It
- Pay them. $150–$250 stipend or a $50 charity donation in their name. Unpaid simulations longer than 30 min trigger California Labor Code §2802 reimbursement obligations and signal cheap to senior candidates.
- Send the brief 24 hours ahead. No surprises. You're testing prep + adaptability, not improv.
- Use video, both cameras on. Steno.ai or Gong the call so you can re-listen — your real-time perception is biased toward the most recent 90 seconds (recency effect, Murdock 1962).
- **One panelist plays a *realistic* buyer**, not a hostile one. Hostility tests poise; realism tests revenue instinct. HubSpot's 2025 State of Sales report found 71% of buyers describe their last vendor call as "transactional" — your simulator should mirror that, not Glengarry.
- Score within 15 minutes of finishing. Memory decays ~50% in the first hour (Ebbinghaus forgetting curve).
Bear Case (steelman the contrarian — read this part)
The honest pushback: 25 minutes is too short to catch the "looks great in 20, falls apart in 60" candidate — the smooth-talker who has rehearsed the opening 90 seconds 200 times but cannot sustain a real two-stage discovery. There is a real argument for a 60–75 minute multi-stage role-play (cold call + discovery call + executive debrief) at the $200K+ OTE / Enterprise AE / first-line manager tier, where the cost of a bad hire is 6–12× annual salary (Society for Human Resource Management estimate, SHRM bad-hire cost study). At that price point, an extra 35 minutes is cheap insurance.
The deeper objection: role-plays themselves are mediocre predictors. Schmidt & Hunter's 100-year meta-analysis (2016 update) puts work-sample tests at r ≈ 0.33 — better than unstructured interviews (0.20) but well below structured-interview + cognitive-test composites (0.65). If you only get one signal, a real customer call (shadow then solo) at the final stage out-predicts any role-play length. Several CROs I respect have killed the role-play entirely in favor of a paid 4-hour "deal teardown" — candidate analyzes a real (anonymized) lost deal from your CRM and presents what they'd have done differently. That's a stronger signal than any simulation.
My take after sitting in on 200+ of these: keep the 25-minute role-play as a *gate* (filters obvious misses cheaply), but make the *decision* on a paid deal teardown or a shadowed live call. Don't let the role-play be the final word.
Red Flags (instant-no signals)
- Talks >60% of airtime in discovery (Gong's bottom-decile threshold)
- Pitches before confirming pain or compelling event
- Cannot name three qualification dimensions in the debrief
- Closes on "next meeting" instead of a buying trigger (signed eval, exec sponsor commit, security review kicked off)
- Uses filler phrases ("absolutely," "100%," "totally make sense") more than 4× in 10 minutes — signals listening replacement
- Cannot articulate what they'd do differently if calling back tomorrow
Green Flags
- Asks permission: *"Mind if I ask three questions before I tell you why I called?"*
- Loses a round gracefully — "Fair point, let me back up" — and pivots
- Confirms a compelling event: *"What changed in the last 30 days that made this a priority?"*
- Names the buying process unprompted: *"Who else typically weighs in on a decision like this?"*
- Self-critiques in the debrief with specifics, not platitudes — "I jumped to value too early at minute 4"
- Quantifies what they heard: *"You mentioned three reps missing quota and a board meeting in six weeks — did I get that right?"*
Related Pulse Knowledge
- See /knowledge/q19 on scoring rep candidates beyond raw quota attainment — directly informs the scorecard above.
- See /knowledge/q21 for how to structure the *manager* version of this exercise (VP Sales loops).
- See /knowledge/q27 on flame-out signals — many show up in the role-play if you know what to listen for.
- See /knowledge/q28 for the fully-loaded cost of a bad sales hire — the math that justifies running this rigorously.
- See /knowledge/q33 on coaching-ability signal — relevant if you're hiring a player-coach or first-line manager.
- See /knowledge/q50 on discovery questions that separate top-quartile reps — use these as the buyer "tells" the candidate should be probing for.
The 90-Second Summary
25 minutes, four scored traits, two independent panelists, paid stipend, video recorded, decided within 15 minutes, never the *only* signal. Role-plays filter — paid deal teardowns and shadowed live calls *decide*. If you must extend, cap at 45 minutes for senior ICs and 75 for first-line managers, and pay accordingly.
TAGS: ae-hiring, working-interview, role-play, interview-process, sales-hiring, structured-interview, gong-talk-ratio, meddpicc, hiring-bias, paid-interview-task