How do you measure whether sales coaching is actually changing rep behavior versus just feeling good in the moment?
The Pulse 4-Quadrant Coaching Diagnostic — every claim that "our coaching program works" must survive all four orthogonal tests. Most programs don't survive even one.
Quadrant 1 — Behavior Specificity. Can you name the *exact* observable behavior at call-tag granularity? Not "discovery skills" but *discovery-question count per first call, target 4–6 from baseline 1–2*. Not "objection handling" but *time-to-first-objection-acknowledgment, target <8s from baseline 22s*.
Instrumentation: Gong call-tags (gong.io/labs/coaching-velocity-2024) or Chorus.ai. Failure mode: programs that can't pass Q1 are *unfalsifiable* — they cannot be wrong, therefore they cannot be right. Cross-ref /knowledge/q88 on instrumentation cost and /knowledge/q08 on activity-vs-outcome metrics.
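Both example metrics reduce to simple aggregations over tagged call events. The following is a minimal sketch, assuming a hypothetical event schema (`CallEvent`, with tag names `discovery_question`, `objection`, and `objection_ack`). It is not the export format of Gong or Chorus.ai, so the field and tag names would need mapping onto whatever your tool emits.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class CallEvent:
    """One tagged moment in a call. Hypothetical schema, not a vendor format."""
    call_id: str
    t_seconds: float  # offset from the start of the call
    tag: str          # hypothetical tag name


def discovery_questions_per_call(events: list[CallEvent]) -> float:
    """Mean number of 'discovery_question' tags per call (calls with zero count too)."""
    counts: dict[str, int] = {}
    for e in events:
        counts.setdefault(e.call_id, 0)
        if e.tag == "discovery_question":
            counts[e.call_id] += 1
    return mean(list(counts.values())) if counts else 0.0


def time_to_first_ack(events: list[CallEvent], call_id: str) -> Optional[float]:
    """Seconds from the first 'objection' to the first 'objection_ack' at or after it."""
    call = sorted((e for e in events if e.call_id == call_id),
                  key=lambda e: e.t_seconds)
    first_obj = next((e.t_seconds for e in call if e.tag == "objection"), None)
    if first_obj is None:
        return None  # no objection raised on this call
    ack = next((e.t_seconds for e in call
                if e.tag == "objection_ack" and e.t_seconds >= first_obj), None)
    return None if ack is None else ack - first_obj


# Illustrative data only.
events = [
    CallEvent("c1", 30.0, "discovery_question"),
    CallEvent("c1", 95.0, "discovery_question"),
    CallEvent("c1", 200.0, "objection"),
    CallEvent("c1", 207.0, "objection_ack"),
    CallEvent("c2", 40.0, "discovery_question"),
    CallEvent("c2", 300.0, "objection"),
]
```

On this sample, `discovery_questions_per_call(events)` averages 2 and 1 questions across the two calls, and `time_to_first_ack(events, "c2")` returns `None` because the objection was never acknowledged, which is itself worth counting as a miss.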
Quadrant 2 — Counterfactual Identification. Match each coached rep to an uncoached rep on (a) trailing-90-day attainment quartile, (b) tenure bucket, (c) territory ACV decile, (d) ICP overlap. Minimum cell size: 30 reps per arm for behavior, 80 for revenue. Pre-register hypothesis.
Bonferroni-correct when testing >3 metrics. Report Cohen's d, not just p-values. HBR 2024 meta-analysis (hbr.org/2024/11/the-coaching-illusion, n=43 studies) found 60% of published coaching ROI numbers are statistically meaningless — the dominant flaw is letting managers *choose* who to coach (they choose their best reps, then claim the lift).
See /knowledge/q156 on causal inference.
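The matching and reporting rules for this quadrant can be sketched as follows. This is an illustrative exact match on the four strata, not a full propensity-score design. The dict keys (`attainment_quartile`, `tenure_bucket`, `acv_decile`, `icp_segment`) are hypothetical, and treating ICP overlap as a single categorical segment is a simplifying assumption.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

# Minimum reps per arm, as quoted in the text.
MIN_REPS_PER_ARM = {"behavior": 30, "revenue": 80}


def match_key(rep: dict) -> tuple:
    """Stratum used for exact matching (hypothetical field names)."""
    return (rep["attainment_quartile"], rep["tenure_bucket"],
            rep["acv_decile"], rep["icp_segment"])


def exact_match(coached: list[dict], uncoached: list[dict]) -> list[tuple]:
    """Pair each coached rep with one unused uncoached rep from the same stratum."""
    pool = defaultdict(list)
    for rep in uncoached:
        pool[match_key(rep)].append(rep)
    pairs = []
    for rep in coached:
        candidates = pool[match_key(rep)]
        if candidates:  # coached reps without a match are dropped, not force-paired
            pairs.append((rep, candidates.pop()))
    return pairs


def cohens_d(treated: list[float], control: list[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled = sqrt(((n1 - 1) * variance(treated) + (n2 - 1) * variance(control))
                  / (n1 + n2 - 2))
    return (mean(treated) - mean(control)) / pooled


def bonferroni_alpha(alpha: float, n_metrics: int) -> float:
    """Per-metric significance threshold after Bonferroni correction."""
    return alpha / n_metrics


def arm_is_large_enough(n_pairs: int, outcome: str) -> bool:
    """Check the matched sample against the minimum cell size for the outcome type."""
    return n_pairs >= MIN_REPS_PER_ARM[outcome]
```

Dropping unmatched coached reps shrinks the sample, so `arm_is_large_enough` should be checked on the number of pairs that survive matching, not on the original headcount.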
Quadrant 3 — Stage-3 Deployment. Behavior must show up in *late-stage, high-pressure* calls — not just role-play and discovery. Sales Management Association 2025 (salesmanagement.org/research/2025-coaching-roi, n=1,103 reps) found r=0.71 between stage-3 deployment and revenue, vs r=0.09 for role-play deployment.
Target: 35%+ of late-stage calls by day 60. Most programs never measure this; they stop at "the rep can do it in practice." See /knowledge/q201 on attribution stacks.
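One way to compute the deployment rate against the 35% target is sketched below. The call record fields (`stage`, `date`, `behavior_present`) are hypothetical, and two readings are assumed rather than taken from the text: "late-stage" means stage 3 or later, and "by day 60" means the cumulative window from the coaching start through day 60.

```python
from datetime import date
from typing import Optional

DEPLOYMENT_TARGET = 0.35  # 35%+ of late-stage calls, as quoted in the text


def late_stage_deployment_rate(calls: list[dict], coaching_start: date,
                               by_day: int = 60, min_stage: int = 3) -> Optional[float]:
    """Share of late-stage calls in days 0..by_day where the behavior was tagged."""
    eligible = [
        c for c in calls
        if c["stage"] >= min_stage
        and 0 <= (c["date"] - coaching_start).days <= by_day
    ]
    if not eligible:
        return None  # nothing to measure yet; not the same as a 0% rate
    hits = sum(1 for c in eligible if c["behavior_present"])
    return hits / len(eligible)


# Illustrative data only.
start = date(2025, 1, 6)
calls = [
    {"stage": 1, "date": date(2025, 1, 10), "behavior_present": True},   # early stage: excluded
    {"stage": 3, "date": date(2025, 1, 20), "behavior_present": False},
    {"stage": 3, "date": date(2025, 2, 3), "behavior_present": True},
    {"stage": 4, "date": date(2025, 2, 20), "behavior_present": True},
    {"stage": 3, "date": date(2025, 3, 1), "behavior_present": False},
    {"stage": 3, "date": date(2025, 4, 15), "behavior_present": True},   # after day 60: excluded
]
rate = late_stage_deployment_rate(calls, start)
```

Returning `None` for an empty window keeps "no late-stage calls happened" distinct from "the behavior never showed up", which matters for reps with thin pipelines.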
Quadrant 4 — Durability Stress Test. Behavior must survive (a) the coaching manager rotating out, (b) a comp-plan change, (c) end-of-quarter pressure. Gong 2024 baseline (gong.io/research/coaching-effectiveness, n=519k calls): 28% industry-wide durability.
Target: 70%+. RAIN Group 2024 (rainsalestraining.com/research/2024-coaching-effectiveness, n=287 programs): 72% of programs that hit Tier 1 leading metrics fail durability. Reference /knowledge/q142 on Goodhart's Law.
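The text does not define how the durability percentage is calculated, so the sketch below assumes one plausible definition: the fraction of the coached lift over baseline that is still present after a stress event. The program is scored on its weakest event, since the behavior must survive all three. The function names and event labels are hypothetical.

```python
DURABILITY_TARGET = 0.70  # 70%+, as quoted in the text


def lift_retained(baseline: float, coached_peak: float, post_stress: float) -> float:
    """Fraction of the coached lift over baseline that remains after a stress event."""
    lift = coached_peak - baseline
    if lift <= 0:
        raise ValueError("no coached lift to retain")
    return (post_stress - baseline) / lift


def durability_report(baseline: float, coached_peak: float,
                      post_stress_by_event: dict[str, float]) -> dict:
    """Score every stress event and judge the program on the weakest one."""
    retained = {event: lift_retained(baseline, coached_peak, value)
                for event, value in post_stress_by_event.items()}
    weakest = min(retained, key=retained.get)
    return {
        "retained": retained,
        "weakest_event": weakest,
        "passes": retained[weakest] >= DURABILITY_TARGET,
    }


# Illustrative numbers only: discovery questions per call.
report = durability_report(
    baseline=2.0,
    coached_peak=5.0,
    post_stress_by_event={
        "manager_rotation": 4.25,
        "comp_plan_change": 3.5,
        "quarter_end": 5.0,
    },
)
```

In this example the behavior holds up through manager rotation and quarter-end but keeps only half its lift after the comp-plan change, so the program fails the stress test despite two strong results.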
The Pulse Coaching Attribution Equation.
*True Coaching Lift* = (Coached cohort behavior delta) − (Matched control cohort behavior delta) − (Hawthorne adjustment) − (Selection-bias residual)
When all four terms are honestly computed, industry-average True Coaching Lift drops from the *claimed* 23–31% revenue impact to a *measured* 4–7% (Pulse Attribution Equation applied to the Bridge Group 2024 dataset, bridgegrouppinc.com/sales-coaching-roi, n=412 orgs). That 4–7% is still worth the spend at $1,600/seat for Gong, but only if the program clears day 90 of the negative-then-positive ROI curve.
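A worked instance of the attribution equation, as a minimal sketch. The inputs are hypothetical percentage-point values chosen only to illustrate the subtraction; they are not taken from the Bridge Group dataset.

```python
def true_coaching_lift(coached_delta: float, control_delta: float,
                       hawthorne_adjustment: float,
                       selection_bias_residual: float) -> float:
    """Coached delta minus control delta, Hawthorne adjustment, and selection-bias residual."""
    return (coached_delta - control_delta
            - hawthorne_adjustment - selection_bias_residual)


# Hypothetical inputs, all in percentage points.
lift = true_coaching_lift(
    coached_delta=27.0,           # raw change in the coached cohort
    control_delta=10.0,           # change the matched control saw anyway
    hawthorne_adjustment=6.0,     # share attributed to being observed
    selection_bias_residual=6.0,  # share attributed to who was picked for coaching
)
print(f"True coaching lift: {lift:.1f} points")
```

With these inputs, a 27-point raw delta shrinks to 5 points once the three corrections are subtracted. All four terms must be expressed in the same unit before subtracting.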
Vendor benchmark (2026 verified). Monthly prices are also shown annualized so all five compare on a per-seat, per-year basis.
- Gong: $1,600/seat/yr (best Q1).
- Chorus.ai: $1,200/seat/yr (better CRM sync, weaker tagging).
- Atrium (atriumhq.com): $89/seat/mo, or $1,068/seat/yr (best Q2: cohort matching built-in).
- Salesloft Rhythm: $165/seat/mo, or $1,980/seat/yr (weakest Q3).
- Clari Copilot: $1,800/seat/yr (strongest Q4 longitudinal tracking).
90-day implementation playbook.
- *Days 0–7:* Pre-register hypothesis. Pick ONE behavior per Quadrant. Build matched cohort with revops.
- *Days 7–30:* Weekly Gong call-review. Tier 1 leading targets: discovery questions +150% off baseline, MEDDIC completion 22%→78%, call-prep doc 40%→90%.
- *Days 30–60:* Stage-3 deployment tracking begins. Manager 1:1 notes in CRM (free, brutally underused — see /knowledge/q03).
- *Days 60–90:* Tier 2 lagging: stage-2-to-3 conversion +12 pts, cycle -18 days, discount -4 pts.
- *Days 90–180 (follow-on, past the 90-day core):* Durability stress tests. Manager rotation simulation. Comp-plan shock test. Hawthorne control: blind audit week.
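The Tier 1 leading targets in the playbook can be checked mechanically at the day-30 review. The sketch below assumes "+150% off baseline" means 2.5 times the baseline value. The metric keys and the `tier1_status` helper are hypothetical names.

```python
# Absolute Tier 1 targets quoted in the playbook (percent of calls or reps).
TIER1_ABSOLUTE_TARGETS = {
    "meddic_completion_pct": 78.0,
    "call_prep_doc_pct": 90.0,
}
DISCOVERY_QUESTION_INCREASE_PCT = 150.0  # "+150% off baseline"


def relative_target(baseline: float, pct_increase: float) -> float:
    """Target value for a metric that must rise by pct_increase percent of baseline."""
    return baseline * (1 + pct_increase / 100)


def tier1_status(baseline_discovery_questions: float, observed: dict) -> dict:
    """Return, per metric, whether the observed value meets its Tier 1 target."""
    targets = dict(TIER1_ABSOLUTE_TARGETS)
    targets["discovery_questions_per_call"] = relative_target(
        baseline_discovery_questions, DISCOVERY_QUESTION_INCREASE_PCT)
    return {name: observed[name] >= target for name, target in targets.items()}


# Illustrative day-30 readout.
status = tier1_status(
    baseline_discovery_questions=2.0,
    observed={
        "discovery_questions_per_call": 5.2,
        "meddic_completion_pct": 71.0,
        "call_prep_doc_pct": 92.0,
    },
)
```

Here the discovery-question and call-prep targets are met but MEDDIC completion is not, so the readout reports a per-metric result instead of a single pass or fail.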
Bear Case — five named, quantified failures (one per failure mode).
*Failure 1 — Premature termination (Outreach 2024, ~$410M ARR).* Killed program day 45 after 4-point win-rate dip. Bridge Group's negative-then-positive curve predicted day-90 recovery. Estimated $18M in 2025 expansion bookings lost to under-coached reps.
Reinstated Q3 2025 after CRO turnover. See /knowledge/q47.
*Failure 2 — Manager NPS trap (Salesloft 2023).* Measured how reps *felt* about coaching, not what they *did*. Manager NPS 32→61. Pipeline coverage flat. Classic Goodhart's Law: /knowledge/q142.
*Failure 3 — Activity-vs-outcome confusion (mid-market SaaS, Gartner 2025).* Tracked coaching session count against 4-per-rep-per-month target. 102% target attainment. Discovery questions 1.8→1.9 (noise). $2.1M spent, zero behavior delta. /knowledge/q08.
*Failure 4 — Selection bias (Forrester 2024 case).* Reported 31% revenue lift from coached reps. Matched-pairs re-analysis: actual lift 4%, indistinguishable from noise. Managers coached their best reps. /knowledge/q201.
*Failure 5 — Hawthorne effect (anonymous F500, 2025).* Behaviors moved 40% during the 6-week study window *with observers present*. At the six-week post-study check, behaviors were back at baseline, having regressed within 14 days of the observers leaving. The coaching wasn't working; the *observation* was. See /knowledge/q156 on observer effects.
Verified numbers.
- 60% of studies statistically meaningless: HBR 2024, n=43.
- 72% durability failure rate among programs that hit Tier 1 leading metrics: RAIN 2024, n=287.
- 41% stage-2 revert rate: RAIN 2024, n=287.
- 28% durability baseline: Gong 2024, n=519k calls.
- r=0.71 stage-3-to-revenue, r=0.09 role-play-to-revenue: SMA 2025, n=1,103 reps.
- 60/90-day ROI inflection: Bridge Group 2024, n=412.
- True coaching lift 4–7% (vs claimed 23–31%): Pulse Attribution Equation applied to Bridge Group dataset.
One-line answer. Coaching is working only when matched-cohort behavior deltas show Cohen's d>0.5 across all four quadrants — Behavior Specificity, Counterfactual Identification, Stage-3 Deployment, Durability Stress Test — with pre-registered hypotheses and Hawthorne controls.
Anything less is selection bias, Goodhart's Law, or premature termination dressed up as ROI.