How long should the working interview / role-play be in an AE loop?
Direct Answer: Run a 60-minute working interview as the centerpiece of your AE final loop. Break it into a 10-minute brief, two 15-minute live role-play scenarios separated by a 5-minute reset, and a 15-minute panel debrief where the candidate self-grades, followed by a reverse-questioning window. Total candidate time-on-task across the full loop (including async prep, written deliverable, and the live block) should land between 3.5 and 5 hours.
Anything under 2 hours is too thin to read deal IQ; anything over 6 hours is unpaid labor that will lose you the candidates you most want to hire. The 60-minute live block is the load-bearing piece — and the only one that actually predicts AE performance in the first two quarters.
1. The 60-Minute Rule (And Why Shorter Loops Lie To You)
Most sales orgs collapse the working interview to 20 or 30 minutes because they think they are respecting the candidate's time. They are not. They are signaling to the candidate that the role does not warrant rigorous selection, and they are guaranteeing that the panel only sees the candidate's rehearsed opener — never the failure modes, the recovery instincts, or the actual discovery muscle that separates a quota-attaining AE from a deck-reading talker.
Sixty minutes is the inflection point for a specific reason: it is the smallest window in which a candidate cannot sustain a performance. Research on selection-method validity (Schmidt and Hunter's foundational 1998 meta-analysis, reaffirmed in McDaniel's 2007 follow-up) shows that work-sample tests are the single highest-validity predictor of job performance. Their corrected validity coefficient is roughly 0.54, higher than IQ tests, structured interviews, and assessment centers.
But that validity collapses when the work sample is under 25 minutes of continuous performance, because short samples reward verbal fluency over actual capability. The candidate keeps their armor on.
Once you push past the 30-minute mark, three things happen mechanically. First, the candidate exhausts their prepared talk track and is forced to improvise. Second, you can introduce a curveball — a budget objection, a stalled multi-threading scenario, a procurement ambush — and watch how they regulate under pressure.
Third, you create space for the second-order question. The question is not "did they handle the objection" but "did they handle the objection without losing the thread of the discovery they were running before the objection landed?" That is the single most diagnostic skill in modern complex selling, and it is invisible in a 20-minute block.
The 60-minute structure also gives you room to test something most loops never test: silence. A great AE will sit in a deliberate pause for six to eight seconds to let a buyer keep talking. A mediocre AE will fill every silence within two seconds. You cannot observe this in a compressed loop; you can in 60 minutes.
2. Why Two Scenarios Beat One (Even Though It Costs You 10 Minutes)
The single biggest design flaw in working interviews is running one long scenario. It gives the candidate one chance to read your panel, calibrate, and then perform consistently — which means you are measuring their ability to lock in once, not their ability to context-switch. Real AEs context-switch 30 to 50 times a day across deals at different stages, different personas, and different motions.
Single-scenario role-plays measure none of that.
Run two distinct 15-minute scenarios inside the live block, separated by a short reset. Here is the split that works:
- Scenario A (15 minutes): A net-new discovery call. The candidate is meeting a VP-level prospect for the first time. The prospect (played by a hiring manager) has agreed to the call because of an outbound sequence but has limited context. The candidate must run discovery, surface a quantifiable pain, and earn a second meeting. This tests opener strength, question stacking, active listening, and the ability to convert curiosity into commitment.
- Scenario B (15 minutes): A late-stage negotiation pivot. The candidate is now seven weeks into the same deal. The economic buyer is requesting a 30% discount while offering no concession in return. Procurement has just been looped in. The original champion has gone dark for nine days. This tests deal mechanics, multi-threading instinct, commercial muscle, and the willingness to push back without rupturing the relationship.
The 5-minute transition between scenarios is itself diagnostic. Watch what the candidate does with that micro-break. Do they ask a clarifying question about the next scenario?
Do they ask the panel for feedback on the first one? Do they take a sip of water and lock in? Each of those tells you something about their self-regulation.
The candidates who use the break to ask "is there anything you want to see me do differently in the next one?" are showing coachability under live observation — which is one of the top three predictors of first-year ramp speed, ahead of even prior quota attainment.
3. Time-On-Task Across The Full Loop: The 3.5-To-5-Hour Window
The 60-minute live block is the centerpiece, but it does not stand alone. Around it, build a total candidate investment of 3.5 to 5 hours. Here is the defensible breakdown:
- Async prep brief (sent 48 hours before the live block) — 60 to 90 minutes of candidate time. Includes a one-page company persona doc, a fabricated ICP account dossier with public-style firmographics, a redacted prior-call transcript, and three guided prompts. The candidate is expected to come in with a hypothesis about the buyer's top three pains and a draft of three discovery questions per pain.
- Written deliverable (returned 24 hours before the live block) — 45 to 60 minutes of candidate time. A one-page MEDDPICC-or-equivalent qualification draft, plus a two-paragraph outbound sequence to a named target persona. This is your only chance to read the candidate's written craft, which still matters because AEs now spend 35 to 40% of their week in async written communication with buyers and internal pods.
- The 60-minute live block, detailed above.
- The 30-minute panel debrief and reverse-questioning window — 30 minutes of candidate time. Held immediately after the live block. The candidate self-grades against a rubric you share with them in advance. Then they get 10 minutes to ask the panel anything.
- Optional informal final — 30 minutes with the VP of Sales or CRO. Not scored. This is a culture sniff-test and an opportunity for the candidate to ask the senior-most person on the loop the questions they could not ask earlier.
That lands you at roughly 3 hours and 45 minutes of candidate investment on the low end, 4 hours and 30 minutes on the high end. Push past 5 hours and you will start losing top-of-market candidates who are weighing your loop against three other open offers. Stay under 3 hours and you have not earned enough signal to make a five-figure base salary commitment with seven-figure attainment expectations.
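The totals above are simple range arithmetic. Below is a minimal Python sketch of that calculation. It assumes the optional informal final is counted, because the quoted 3h45 low end requires it. The component names and the `within_guardrails` helper are illustrative only, not part of any hiring tool.

```python
# Candidate time-on-task per loop component, in minutes (low, high),
# taken from the breakdown above. The optional informal final is included,
# as it must be for the totals to reach 3h45 and 4h30.
LOOP_COMPONENTS: dict[str, tuple[int, int]] = {
    "async_prep_brief": (60, 90),
    "written_deliverable": (45, 60),
    "live_block": (60, 60),
    "debrief_and_reverse_questions": (30, 30),
    "optional_informal_final": (30, 30),
}


def total_range(components: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low ends and the high ends of every component separately."""
    low = sum(low_end for low_end, _ in components.values())
    high = sum(high_end for _, high_end in components.values())
    return low, high


def fmt(minutes: int) -> str:
    """Format a minute count as hours and minutes, e.g. 225 -> '3h45'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h{mins:02d}"


def within_guardrails(low: int, high: int) -> bool:
    """Under 3 hours gives too little signal; over 5 hours starts losing top candidates."""
    return low >= 3 * 60 and high <= 5 * 60


low, high = total_range(LOOP_COMPONENTS)
print(fmt(low), "to", fmt(high), within_guardrails(low, high))  # 3h45 to 4h30 True
```

Dropping the optional final from the dictionary brings the low end to 3h15. That is still inside the guardrails, but it sits below the 3.5-hour floor recommended in the direct answer.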
4. Why 2-Hour Mini-Loops Lose The Candidates You Most Want
There is a school of thought, popularized in late 2024 by a wave of "candidate-first hiring" content on LinkedIn, that says working interviews should be capped at 90 minutes total, with no async pre-work. The thesis: respect the candidate, move fast, decide quickly.
The thesis is half-right and half-disastrous. It is right that loops dragging past 5 hours, spanning 5 weeks, and bouncing across 7 interviewers are a self-inflicted wound — top candidates accept other offers in the gap. It is disastrous because 90 minutes of unstructured conversation is statistically indistinguishable from a coin flip for AE performance prediction.
You are not respecting the candidate by under-measuring them; you are setting them up to fail in a role you cannot confidently place them in.
The candidates you most want to hire — the 80th-percentile-and-up AEs who could pick from three offers — actively prefer rigorous loops. They want to be measured. They want a structured working interview because it lets them demonstrate craft that does not surface in conversational interviewing.
The 2025 RepVue talent survey of 4,400 active AE candidates found that 67% reported a "highly structured working interview" as a positive signal about the hiring company, and only 11% reported it as a negative signal. The companies losing top candidates are not losing them because of working-interview length.
They are losing them because of loop length (number of stages) and decision latency (days between stages).
So: 60 minutes of live working interview, yes. Six stages spread across four weeks, no. Compress the calendar, not the working interview.
5. The Brief: What Goes In The 10-Minute Setup
The first 10 minutes of the live block is non-negotiable scaffolding. Skip it and you waste the next 50.
Open with three things, in this exact order. First, a one-minute reminder of the scenario setup — even though they have the brief in front of them, you want them to hear it from you, because real sellers calibrate their tactics off live cues. Second, a one-minute walkthrough of the scoring rubric.
Yes, show them the rubric. Not the weights, but the dimensions. They will adjust their behavior to demonstrate strength across the dimensions — which is exactly what you want.
You are not testing whether they can guess what you are measuring; you are testing whether they can execute against a clear standard. Third, an eight-minute live Q&A where the candidate can ask anything about the scenario setup, the persona's mental state, the prior call history, and the panel's role.
Hiring managers who skip the Q&A consistently rate candidates lower than hiring managers who include it, because they are unconsciously penalizing candidates for missing context the candidate never had access to. The Q&A levels the field. It also lets you observe what the candidate cares about: do they ask about the buyer's pain, the buyer's politics, the buyer's budget, the buyer's authority, the panel's expectations?
Each ask reveals a slice of their commercial instinct.
6. The Debrief: The 15-Minute Window Where Half The Signal Lives
If you only have time to add one thing to your existing AE loop, add the post-role-play self-assessment. The setup: the candidate is given the same rubric the panel will use. They have 5 minutes to rate themselves across 6 to 8 dimensions, on a 1-to-5 scale. Then the panel asks them to walk through their self-grade.
The diagnostic signal here is unusually strong. Three candidate archetypes will appear, and each tells you something different:
- The self-aware overgrader (rare, valuable): Rates themselves above where the panel rates them, but with clear, defensible reasoning. Often the strongest hires — they have a high self-belief floor that will survive a slow ramp.
- The self-aware accurate grader (most common among top hires): Lands within 0.5 points of the panel on every dimension. Demonstrates the metacognitive skill that powers self-coaching in the field. The single best predictor of AE growth past Q4.
- The defensive miscalibrator (a red flag): Either dramatically overrates themselves with no acknowledgement of weak moments, or dramatically underrates themselves to bait the panel into reassurance. Both predict coaching friction. You can hire one but you cannot hire many; they break sales managers.
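The numeric half of this comparison can be automated, though the judgment cannot. Below is a minimal Python sketch that compares a self-grade with the panel's consensus, dimension by dimension. It assumes the 0.5-point tolerance from the "accurate grader" description. The function names and the "mixed" label are hypothetical. Telling a self-aware overgrader from a defensive one still depends on the candidate's reasoning, which the panel judges in the room.

```python
# Compare a self-grade with the panel's consensus score on the shared 1-to-5
# rubric. The label only reports the direction of the gap; whether an
# overgrade is the self-aware or the defensive kind is a panel judgment.

def grading_gaps(self_grade: dict[str, float], panel: dict[str, float]) -> dict[str, float]:
    """Positive gap means the candidate rated themselves above the panel."""
    return {dim: self_grade[dim] - panel[dim] for dim in panel}


def calibration_pattern(self_grade: dict[str, float], panel: dict[str, float],
                        tolerance: float = 0.5) -> str:
    gaps = grading_gaps(self_grade, panel)
    # Within tolerance on every dimension: the accurate-grader pattern.
    if all(abs(gap) <= tolerance for gap in gaps.values()):
        return "accurate"
    mean_gap = sum(gaps.values()) / len(gaps)
    if mean_gap > 0:
        return "overgrades"
    if mean_gap < 0:
        return "undergrades"
    return "mixed"  # large gaps that cancel out on average


panel = {"discovery": 3.5, "active_listening": 4.0, "commercial_judgment": 3.0}
print(calibration_pattern({"discovery": 4.0, "active_listening": 4.0, "commercial_judgment": 3.5}, panel))  # accurate
print(calibration_pattern({"discovery": 5.0, "active_listening": 5.0, "commercial_judgment": 4.5}, panel))  # overgrades
```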
Reserve the final 5 minutes of the debrief for reverse questions. What the candidate asks is itself signal. A candidate who asks "what is the most common reason AEs miss quota in their second year here?" is operationally curious. A candidate who only asks about comp, vesting, and equity is signaling priorities. Neither is wrong. Both are data.
7. Multi-Threading Inside The Role-Play: A Late-2025 Refinement
A 2025 design refinement that has shown up in loops at Gong, Outreach, Clari, and several mid-market SaaS leaders is the introduction of a "ghost stakeholder" inside Scenario B. The candidate is told mid-scenario that the champion has just forwarded the email thread to a previously unmentioned VP of Finance.
The new VP responds within the scenario via a chat message read aloud by a panelist. The candidate must integrate this new stakeholder live, without breaking momentum on the existing negotiation thread.
This is a refinement, not a default — only introduce it if your typical deal involves three or more stakeholders by close, which describes most mid-market and all enterprise motions. If your AEs sell single-threaded transactional deals, skip it; you are testing for skills the role does not require.
When you do introduce it, the diagnostic is simple: did the candidate try to handle both threads simultaneously (a tactical mistake under time pressure), or did they explicitly sequence — acknowledging the new stakeholder, parking the response with a defensible timeline, and continuing the existing negotiation?
The latter behavior correlates strongly with closing complex deals on forecast.
8. The Rubric: Six Dimensions, Two Scenarios, One Score
A defensible working-interview rubric scores six dimensions across both scenarios, weighted by scenario relevance:
| Dimension | Scenario A weight | Scenario B weight |
|---|---|---|
| Discovery / question quality | 30% | 10% |
| Active listening (specifically: paraphrase, label, confirm) | 20% | 15% |
| Commercial / deal-mechanics judgment | 10% | 35% |
| Multi-threading and stakeholder instinct | 10% | 20% |
| Composure under disruption | 15% | 15% |
| Closing language and commitment-orchestration | 15% | 5% |
Score each dimension from 1 to 5, score the two scenarios independently, then weight-blend them into a single composite. A weighted composite above 3.8 is a strong-hire signal; anything under 2.8 is a clear no. Most candidates land in the 2.8-to-3.8 band, and that is where your structured debrief signal and reference-check rigor break the tie.
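The blend can be sketched in Python. The dictionary keys are shorthand for the table's dimensions. The equal 50/50 weighting of the two scenarios is an assumption, since the text does not say how to combine them. Scores of exactly 2.8 or 3.8 are treated as part of the tie-break band.

```python
# Rubric weights from the table above: dimension -> (Scenario A, Scenario B).
WEIGHTS = {
    "discovery": (0.30, 0.10),
    "active_listening": (0.20, 0.15),
    "commercial_judgment": (0.10, 0.35),
    "multi_threading": (0.10, 0.20),
    "composure": (0.15, 0.15),
    "closing_language": (0.15, 0.05),
}


def scenario_score(scores: dict[str, float], column: int) -> float:
    """Weighted 1-to-5 score for one scenario (column 0 = A, column 1 = B)."""
    return sum(weights[column] * scores[dim] for dim, weights in WEIGHTS.items())


def composite(scores_a: dict[str, float], scores_b: dict[str, float],
              share_a: float = 0.5) -> float:
    """Blend the two scenario scores; the default equal split is an assumption."""
    return share_a * scenario_score(scores_a, 0) + (1 - share_a) * scenario_score(scores_b, 1)


def hiring_band(score: float) -> str:
    if score > 3.8:
        return "strong hire"
    if score < 2.8:
        return "clear no"
    return "tie-break band: lean on debrief and references"


a = {"discovery": 4.5, "active_listening": 4, "commercial_judgment": 3,
     "multi_threading": 3, "composure": 4, "closing_language": 4}
b = {"discovery": 4, "active_listening": 3.5, "commercial_judgment": 2.5,
     "multi_threading": 3, "composure": 3.5, "closing_language": 3}
print(hiring_band(composite(a, b)))  # tie-break band: lean on debrief and references
```

In this example, strong discovery (a Scenario A score of 3.95) is offset by weak deal mechanics (a Scenario B score of 3.075). The composite of about 3.51 therefore lands in the tie-break band.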
Critically: do not pool scores into a panel average without first surfacing dimensional disagreement. If the hiring manager scored discovery at 4.5 and the VP of Sales scored it at 2.5, you have a calibration problem that an average will hide. Read the dimensional spread; argue the spread; then aggregate.
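Below is a minimal Python sketch of reading the spread before you average. It assumes a max-minus-min spread of 1 point as the trigger for a calibration discussion. That threshold is illustrative, not taken from the text. The example reuses the hiring manager and VP of Sales disagreement described above.

```python
def dimension_spread(panel_scores: dict[str, dict[str, float]],
                     threshold: float = 1.0) -> dict[str, float]:
    """Return {dimension: max - min} for every dimension whose spread across
    panelists meets the threshold. panel_scores maps panelist -> {dimension: score}."""
    dimensions = next(iter(panel_scores.values())).keys()
    flagged = {}
    for dim in dimensions:
        values = [scores[dim] for scores in panel_scores.values()]
        spread = max(values) - min(values)
        if spread >= threshold:
            flagged[dim] = spread
    return flagged


def panel_average(panel_scores: dict[str, dict[str, float]], dim: str) -> float:
    """Plain mean across panelists; only meaningful once the spread is argued out."""
    values = [scores[dim] for scores in panel_scores.values()]
    return sum(values) / len(values)


scores = {
    "hiring_manager": {"discovery": 4.5, "composure": 3.5},
    "vp_sales": {"discovery": 2.5, "composure": 4.0},
    "peer_ae": {"discovery": 3.5, "composure": 3.5},
}
print(dimension_spread(scores))             # {'discovery': 2.0}
print(panel_average(scores, "discovery"))   # 3.5
```

The discovery average of 3.5 looks unremarkable. The 2-point spread behind it is exactly the calibration problem the average hides.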
9. The Anti-Patterns: Five Mistakes That Show Up In 80% Of AE Loops
Five mistakes appear in roughly 80% of the AE working-interview designs I have audited across mid-market and enterprise SaaS in 2025 and into early 2026. Avoid all of them.
- Letting the candidate sell your actual product. Tempting because it feels relevant. Disastrous because the candidate has spent 20 hours studying your product, and you end up measuring product-pitching not selling. Always use a fictional or analog product where the candidate has no memorized talk track.
- Having the panel play the buyer too aggressively. A buyer-panelist who is rude, evasive, or hostile is testing patience, not skill. Play the buyer as a real, busy, mildly skeptical executive. The skill differential surfaces against a realistic counterparty, not a hostile one.
- Skipping the calibration session before the loop. All interviewers must score 3 to 5 reference candidates against the rubric in a calibration session before they score a real candidate. Without it, "4 out of 5" means something different to each panelist, and your scores are noise.
- Letting senior leaders skip the rubric. The VP of Sales who says "I just go with my gut" is the largest single source of bad AE hires. Gut judgment is roughly a 0.30-validity instrument; a rubric-scored work sample is closer to 0.54. Use both, but never the gut alone.
- Delivering feedback to losing candidates as a form letter. A two-sentence personalized feedback note to every losing finalist generates more inbound referrals over an 18-month window than any branded recruiting campaign. The cost is 4 minutes of the hiring manager's time per candidate.
10. The Compressed Variant: When You Genuinely Cannot Run 60 Minutes
There are real situations where 60 minutes is not feasible — a CRO-level hire where the candidate is interviewing at four companies simultaneously, a high-volume SDR-to-AE internal promotion loop, or a backfill where you are losing pipeline coverage every additional day.
The defensible compression: 35 minutes of live work, structured as a 5-minute brief, a 25-minute single-scenario role-play with a built-in mid-scenario disruption (the ghost stakeholder works well here), and a 5-minute self-grade. The async pre-work and written deliverable remain non-negotiable; compressing those is the false economy that compromises the loop.
If you must cut, cut the live block — never the prep.
Below 35 minutes of live work, do not call it a working interview. Call it a conversation. Score it accordingly — meaning, do not give it more than 25% of the final hiring decision weight. Lean harder on the written deliverable, references, and a paid project for the final two candidates instead.
11. The Calendar Compression Play
The single largest improvement most teams can make has nothing to do with the working interview itself: compress the loop calendar. The benchmark, drawn from 2025 hiring data across 380 SaaS sales organizations, is offer-in-hand within 11 calendar days of the first recruiter screen. Loops past 21 days lose roughly 40% of top-quartile candidates to competing offers.
Compress by collapsing stages: the working interview block, debrief, and final culture conversation should all occur in a single half-day, not across three separate calendar days. Yes, this means the hiring manager blocks a half-day. Yes, this is worth it.
The downstream cost of a slow loop — counter-offers, missed pipeline coverage, ramp delay — is 20 to 40 times the cost of a single blocked half-day.
12. What Good Looks Like: The 60/15/15/30 Half-Day Block
The recommended final loop, in a single half-day block:
- 0:00 to 0:10 — Brief and rubric walkthrough.
- 0:10 to 0:25 — Scenario A: net-new discovery role-play.
- 0:25 to 0:30 — Transition micro-break and reset.
- 0:30 to 0:45 — Scenario B: late-stage negotiation with ghost-stakeholder disruption.
- 0:45 to 1:00 — Self-grade and panel debrief.
- 1:00 to 1:15 — Reverse questioning from candidate.
- 1:15 to 1:30 — Candidate break; panel calibrates scores independently before any group discussion.
- 1:30 to 2:00 — Optional informal CRO or VP conversation, not scored.
Two hours of total candidate time on-site (or on video), plus 1.75 to 2.5 hours of async pre-work and written deliverable in the days before. That is the load-bearing structure. Everything else (number of resume screens, length of recruiter call, reference depth) flexes around this core.
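Encoding the schedule as data makes two checks easy before you send calendar invites: that segments are contiguous, and that the live block really sums to 60 minutes. Below is a minimal Python sketch. The tuple representation is an illustrative choice, not the format of any scheduling tool.

```python
# The recommended half-day block as (segment, start minute, end minute).
HALF_DAY_BLOCK = [
    ("Brief and rubric walkthrough", 0, 10),
    ("Scenario A: net-new discovery", 10, 25),
    ("Transition micro-break and reset", 25, 30),
    ("Scenario B: negotiation with ghost stakeholder", 30, 45),
    ("Self-grade and panel debrief", 45, 60),
    ("Reverse questioning", 60, 75),
    ("Candidate break; independent panel scoring", 75, 90),
    ("Optional informal CRO or VP conversation (not scored)", 90, 120),
]


def total_minutes(schedule: list[tuple[str, int, int]]) -> int:
    """Raise if segments overlap or leave a gap; otherwise return the total length."""
    for (name, _, end), (next_name, next_start, _) in zip(schedule, schedule[1:]):
        if end != next_start:
            raise ValueError(f"{name!r} ends at {end}, but {next_name!r} starts at {next_start}")
    return schedule[-1][2] - schedule[0][1]


def durations(schedule: list[tuple[str, int, int]]) -> list[int]:
    """Length of each segment in minutes, in schedule order."""
    return [end - start for _, start, end in schedule]


print(total_minutes(HALF_DAY_BLOCK))       # 120
print(sum(durations(HALF_DAY_BLOCK)[:5]))  # 60: brief through debrief
```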
13. The One-Line Summary For Your Hiring Manager
If a hiring manager reads only one sentence of this, make it this one: a 60-minute live working interview, broken into two scenarios with a mid-scenario disruption, scored against a six-dimension rubric by a calibrated panel, and capped with a candidate self-grade, will out-predict every other component of your AE loop combined — but only if the surrounding calendar is compressed to 11 days or fewer, and only if the live block sits inside a 3.5-to-5-hour total candidate investment.
Build for that structure and you will hit the rare intersection of high selection validity and high candidate experience. Cut corners on either side and you will keep hiring AEs who interview well and ramp slowly — which is the most expensive miss in sales hiring, and the one this design exists to prevent.
14. Segment-Specific Adjustments: SMB, Mid-Market, Enterprise, And PLG-Sales Hybrids
The 60-minute structure is the universal scaffold, but the scenarios inside it must match the motion the AE will actually run. A working interview that does not mirror the segment is a working interview that tests the wrong muscles.
For an SMB AE running a 30-day cycle on a self-qualified inbound pipeline, the scenario weights flip. Scenario A becomes a high-velocity inbound triage — the candidate has six minutes to qualify, demo-position, and book a follow-up with a self-serve lead who has trialed the product.
Scenario B becomes a stalled-deal re-engagement: a prospect who went dark after a strong second call. Drop the multi-threading dimension entirely; SMB deals are typically single-threaded. Reweight composure-under-disruption to 25% because SMB AEs handle 8 to 12 active conversations per day and constantly re-prioritize.
Total live block can compress to 45 minutes for SMB roles without losing predictive power, because the motion itself is simpler.
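The text does not say how the other dimensions absorb that change. The Python sketch below assumes they are rescaled proportionally. It also assumes the 25% composure weight applies to each scenario column separately. The `reweight` helper is hypothetical.

```python
def reweight(weights: dict[str, float], drop: set[str],
             pinned: dict[str, float]) -> dict[str, float]:
    """Remove `drop`, pin each dimension in `pinned` to its weight, and rescale
    the remaining dimensions proportionally so all weights still sum to 1."""
    free = {d: w for d, w in weights.items() if d not in drop and d not in pinned}
    room = 1.0 - sum(pinned.values())
    if room < 0 or not free:
        raise ValueError("pinned weights leave no room for the other dimensions")
    scale = room / sum(free.values())
    result = {d: w * scale for d, w in free.items()}
    result.update(pinned)
    return result


# Scenario B column from the mid-market rubric table.
scenario_b = {"discovery": 0.10, "active_listening": 0.15, "commercial_judgment": 0.35,
              "multi_threading": 0.20, "composure": 0.15, "closing_language": 0.05}
smb_b = reweight(scenario_b, drop={"multi_threading"}, pinned={"composure": 0.25})
print({d: round(w, 3) for d, w in smb_b.items()})
```

For the Scenario A column, no rescaling is needed at all. The 10 points freed by dropping multi-threading exactly fund the composure increase from 15% to 25%. In Scenario B, the remaining four dimensions each scale up by a factor of 0.75 / 0.65.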
For a mid-market AE running 60-to-90-day cycles, the structure above (60 minutes, two scenarios, six dimensions) fits the motion exactly. This is the segment the canonical design was built for.
For an enterprise AE running 6-to-12-month cycles with 7-to-12 stakeholders, the working interview must add a third scenario: a 15-minute exec-alignment block, in which the candidate must navigate a meeting with two senior stakeholders (played by panelists) who have visibly different priorities.
The CFO wants payback inside 9 months; the CRO wants speed-to-value inside 30 days. The candidate must surface, name, and bridge the tension without losing either stakeholder. Total live block lands at 75 to 80 minutes for enterprise hires, and the loop's total candidate investment climbs to 5.5 to 6 hours — defensible because enterprise AE comp packages typically clear $400K OTE, and the cost of a mis-hire compounds across an 18-month ramp.
For a PLG-sales hybrid AE — increasingly the dominant model in 2025 and 2026 across infrastructure, dev tools, and modern data tooling — the working interview must include a "product-led handoff" scenario. The candidate is given live product-usage data for a self-serve account that has expanded across three teams, hit usage limits, and submitted a "talk to sales" form.
The candidate must convert the usage signal into a commercial conversation without alienating the engineering champion who currently controls the relationship. This is a uniquely PLG-sales skill, and traditional discovery-and-negotiation scenarios do not test it.
Match the scenario to the motion, and the working interview's predictive validity holds steady around 0.5. Mismatch them — run an enterprise-style discovery scenario for an SMB AE — and validity drops below the level of an unstructured interview.
15. Panel Composition: Three Roles, Five Eyes, Calibrated Scoring
The working interview only works if the panel is built correctly. Five interviewers is the ceiling; below three is a single-point-of-failure design.
The defensible composition for a mid-market AE loop:
- The hiring manager (always) — owns the rubric and the final decision. Scores all dimensions.
- A peer AE one level up — scores commercial judgment and multi-threading. The peer dimension is often the strongest signal because the peer recognizes craft and shortcuts a hiring manager will miss.
- A sales engineer or solutions architect (where applicable) — scores discovery quality and technical-fit instinct. SEs see candidates from a different angle and catch product-pitchers the hiring manager misses.
- A cross-functional stakeholder (typically a CSM or Account Manager) — scores post-sale instinct and stakeholder respect. Strong AEs treat the post-sale handoff as part of the sale; weak AEs treat it as someone else's problem. The CSM hears that difference inside 60 seconds.
- A senior leader (VP of Sales, CRO, or Head of Revenue) — observer only, not scoring. Their job is to catch panel calibration drift and to provide a tiebreaker on the rare close-call decisions. Letting them score actively introduces hierarchical bias that contaminates the rubric.
All five panelists must complete a 45-minute calibration session before scoring real candidates. The session: score three pre-recorded role-plays (one clear-hire, one clear-no, one borderline), then discuss dimensional scoring spread. Without calibration, panel scores are noise; with it, panel scores cluster within 0.5 standard deviations and the rubric does its job.
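The calibration session's spread discussion can start from a simple report. The Python sketch below reads "cluster within 0.5 standard deviations" as a working assumption: the population standard deviation of panel scores stays at or below 0.5 rubric points for each recording and dimension. The data layout and the function name are hypothetical.

```python
from statistics import pstdev


def calibration_drift(scores: dict[str, dict[str, list[float]]],
                      max_sd: float = 0.5) -> list[tuple[str, str, float]]:
    """scores maps reference recording -> dimension -> one score per panelist.
    List (recording, dimension, standard deviation) for every cell where the
    panel's scores spread more than max_sd points around their mean."""
    drift = []
    for recording, dimensions in scores.items():
        for dim, panel in dimensions.items():
            sd = pstdev(panel)
            if sd > max_sd:
                drift.append((recording, dim, round(sd, 2)))
    return drift


session = {
    "clear_hire": {"discovery": [5, 4, 5, 5, 4]},
    "borderline": {"discovery": [2, 4, 3, 5, 3]},
}
print(calibration_drift(session))  # [('borderline', 'discovery', 1.02)]
```

As you might expect, the borderline recording is where the panel diverges. That is the cell to argue through before anyone scores a live candidate.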
16. The Async Pre-Work: A Closer Look At What To Send 48 Hours Out
The async brief is the cheapest part of the loop to design and the most often neglected. Done well, it primes the candidate to perform at their ceiling. Done poorly, it advantages candidates with more interview experience and penalizes candidates with less time to prepare.
A defensible 48-hour brief contains exactly six artifacts, each tightly bounded:
- A one-page company persona doc — describing the fictional company the candidate will sell on behalf of. Industry, size, three flagship products, three differentiators, two competitive vulnerabilities. Should be readable in five minutes.
- An ICP account dossier — a one-page fabricated profile of the prospect company in Scenario A. Industry, headcount, funding stage, three publicly-known initiatives, two recent leadership changes. Includes a fabricated organizational chart with names, titles, and one-line bios for the six relevant stakeholders.
- A redacted prior-call transcript — 600 to 800 words of a previous discovery call between an AE (not the candidate) and the prospect's director of operations. Includes a missed opportunity and an unaddressed objection. The candidate is expected to surface both in the live role-play.
- The scoring rubric — the same one the panel will use. Dimensions and descriptions, but not weights.
- A scenario timeline: a one-line description of what happens at minutes 0, 5, 15, 25, and 30 of the role-play portion of the live block. Vague enough to preserve realism, specific enough to remove unnecessary surprise.
- A 12-minute pre-recorded video of the hiring manager explaining the role, the team, and the philosophy behind the loop. This is the single highest-leverage candidate-experience investment in the entire process.
Total candidate prep time should land at 60 to 90 minutes. If a candidate spends three hours preparing, they are over-investing; reduce the dossier length next iteration. If they spend 20 minutes, they are under-investing and the live block will suffer; increase the depth.
17. The Failure Mode Nobody Talks About: Panel Fatigue
Five candidates a week through a 60-minute working-interview loop will burn out a four-person panel inside six weeks. The signal degrades visibly by the third candidate of any single day, and by the second week, calibrated panelists start drifting toward leniency because they are tired of arguing dimensional scores in debriefs.
Two defensive moves keep this from collapsing the loop:
- Cap the panel at three candidates per panelist per week. Beyond three, dimensional scoring variance widens and predictive validity drops. If pipeline forces more than three candidates a week, rotate the panel: keep the hiring manager constant, rotate the other four roles.
- Schedule the loops with at least 45 minutes between candidates, not back-to-back. The 45-minute gap is for the panel to score independently, then briefly compare. Back-to-back loops compress this debrief and produce sloppy scoring. The 45-minute gap is also when the panel calibrates their internal "this is what a 3 looks like in this dimension" sense against the candidate they just saw.
Panel fatigue is the silent killer of working-interview validity at scale. The companies that hire 40-plus AEs a year and still maintain a defensible loop are not doing so because they are immune to fatigue — they are doing so because they have engineered for it.
18. Recording, Reviewing, And The Feedback Loop That Improves The Loop Itself
Every working interview should be recorded (with explicit consent from the candidate at the start of the block; in 2026 candidates expect this and view it as a positive signal of process maturity). Recording serves three functions, only one of which is candidate scoring.
The other two are the load-bearing ones for long-term loop quality. First, the recordings let you tie working-interview scores to first-year AE performance, retroactively. After 18 months, pull the recordings of the 20 AEs hired through the loop, collect their first-year quota attainment, and re-watch the recordings against their actual outcomes.
You will find dimensions that over-predicted (often "closing language" scores) and dimensions that under-predicted (often the multi-threading and post-sale-instinct scores). Reweight the rubric annually based on this evidence. The loop must learn from itself or it will calcify around dimensions that feel right but do not predict.
Second, the recordings let you onboard new hires faster. The single best onboarding asset for a new AE is a 25-minute reel of "what a strong working interview looked like" — composed of three or four short clips, each demonstrating a specific dimension. Show this to every new hire in their first week.
It anchors what good looks like and shortens the time-to-first-conscious-improvement curve.
Treat the working interview as a closed-loop diagnostic system, not a one-time gate. Every cohort of hires is data. Every data point updates the rubric. The teams that compound this advantage end up with hiring loops that predict performance at validities approaching 0.65, well above the 0.54 industry meta-analytic benchmark.
19. Legal, Compensation, And Fairness Considerations You Cannot Skip
A 60-minute live working interview, combined with 2.5 hours of async pre-work, is a meaningful labor ask. In most U.S. jurisdictions, asking a candidate to perform genuinely productive work — work the company will use commercially — without compensation crosses into legal exposure under the Fair Labor Standards Act and corresponding state laws.
The defensive design choice is straightforward: ensure the working interview is entirely fictional. Fabricated company, fabricated prospect, fabricated scenarios. Nothing the candidate produces during the loop should ever touch a real customer record, a real campaign, or a real internal document.
Keep the wall absolute. Several SaaS companies between 2023 and 2025 paid six-figure settlements for blurring this line. You do not want to be the next one.
For final-round candidates only, consider a paid project as an optional alternative to a second working-interview round. A clearly-scoped four-hour project at a defensible market rate (roughly $75 to $125 per hour for AE-level work in 2026) is a strong commitment signal in both directions.
The candidate is paid for their time. The company gets a deeper sample. The boundary between selection and free labor stays clean.
On fairness, two non-negotiables. First, every candidate at the same loop stage gets the same scenario, the same brief, the same rubric, and the same time limits. Customizing the scenarios per-candidate feels generous and is statistically devastating; you cannot compare candidates if you did not measure them on the same task.
Second, accommodations must be defined and offered proactively. Candidates with documented disabilities, candidates who are not native English speakers, and candidates interviewing across multiple time zones all have legitimate accommodation needs. Offer 25% additional time on the brief, offer the option to take the live block in either morning or afternoon, and offer the choice between a single-day half-day block and a two-day split.
None of these accommodations meaningfully change the validity of the assessment, and offering them widens your top-of-funnel candidate pool without compromise.
20. The 90-Day Look-Back: How You Know The Loop Is Working
The final design choice is the one most teams skip: measuring whether the loop itself is working. The defensible measurement cadence is a 90-day look-back on every hire made through the loop, scored against four benchmarks.
- Did the new hire's first-90-day activity metrics match the panel's working-interview scores? Specifically, did candidates who scored above 4 on discovery quality actually conduct higher-quality discovery calls in their first 30 days? Pull five recorded calls per new hire and have the hiring manager grade them against the same rubric. If the correlation is below 0.3, your working-interview scoring is not predicting the real-world behavior you thought it was.
- Did the new hire ramp inside the expected window? Compare actual time-to-first-closed-deal against the working-interview composite score. The composite score should explain at least 25% of ramp variance after 90 days. If it does not, the rubric is measuring the wrong things.
- What did the new hire say about the loop itself at the 90-day mark? Run a 20-minute structured interview with every hire about their working-interview experience. What surprised them? What felt artificial? What would they keep? Their answers will update the next iteration of the loop.
- What did the losing candidates say? Of the finalists who turned down an offer or were not extended one, what percentage referred a friend within 18 months? This is the single best measure of candidate experience. A loop that produces a referral rate above 15% is a loop the market is endorsing. Below 5%, your loop is damaging your employer brand even when it is selecting the right candidates.
Run the 90-day look-back twice a year. Update the rubric, the scenarios, and the time allocations based on what you learn. The working interview is a living instrument, not a fixed gate.
The teams that treat it that way build durable hiring advantages. The teams that treat it as a fixed checklist watch their loop's predictive validity decay year over year as the market, the buyers, and the candidates themselves all shift around an unchanged design.
That is the full picture. Sixty minutes live, two scenarios, six dimensions, five-person panel, 3.5-to-5-hour total candidate investment, eleven-day calendar, annual rubric reweighting based on closed-loop performance data. Build to that standard and your AE hiring becomes the most defensible part of your revenue engine — the function that compounds quietly while everything else needs constant attention.
One closing note worth sitting with. The teams that consistently hire above the 75th percentile in AE attainment are not the teams with the most clever working-interview scenarios, the longest loops, or the most expensive recruiting tooling. They are the teams that have built a rubric they trust, a panel they have calibrated, and a discipline of measuring their own hiring outcomes with the same rigor they apply to a forecast call.
The working interview is a tool. Like every tool, its value comes from the operator. Invest equally in the design and in the operating discipline that surrounds it, and the loop will quietly pay for itself across every cohort you hire for the next decade — through every market cycle, every comp redesign, every motion shift.
Hiring quality is the deepest moat a revenue org has, and the working interview is the load-bearing wall inside that moat. Build it once, maintain it forever, and refuse to compromise on the structure under deadline pressure. That single refusal is the difference between a hiring engine and a hiring habit.