How do you measure whether sales coaching is actually changing rep behavior versus just feeling good in the moment?

Pulse RevOps · The Machine · Accepted Answer

**The Pulse 4-Quadrant Coaching Diagnostic.** Every claim that "our coaching program works" must survive all four orthogonal tests. Most programs don't survive even one.

**Quadrant 1 — Behavior Specificity.** Can you name the *exact* observable behavior at call-tag granularity? Not "discovery skills" but *discovery-question count per first call, target 4–6 from baseline 1–2*. Not "objection handling" but *time-to-first-objection-acknowledgment, target <8s from baseline 22s*. Instrumentation: Gong call-tags ([gong.io/labs/coaching-velocity-2024](https://gong.io/labs/coaching-velocity-2024)) or Chorus.ai. Failure mode: programs that can't pass Q1 are *unfalsifiable* — they cannot be wrong, therefore they cannot be right. Cross-ref [/knowledge/q88](https://pulserevops.com/knowledge/q88) on instrumentation cost and [/knowledge/q08](https://pulserevops.com/knowledge/q08) on activity-vs-outcome metrics.
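A minimal sketch of what "call-tag granularity" can look like in practice, assuming tagged moments have been exported as plain records. The field names (`rep`, `call_id`, `call_type`, `tag`) and the tag value `discovery_question` are hypothetical, not the schema of any Gong or Chorus.ai export.

```python
from collections import defaultdict
from statistics import mean

TARGET_RANGE = (4, 6)  # discovery questions per first call, per the target above


def discovery_questions_per_first_call(tag_events):
    """Return {rep: mean discovery-question tags per first call}.

    Each event is a dict with the hypothetical keys "rep", "call_id",
    "call_type", and "tag". A first call with no tag events at all never
    appears in the input, so it is not counted as a zero here.
    """
    per_call = defaultdict(int)  # (rep, call_id) -> discovery-question tags
    for event in tag_events:
        if event["call_type"] != "first_call":
            continue
        key = (event["rep"], event["call_id"])
        per_call[key] += int(event["tag"] == "discovery_question")

    per_rep = defaultdict(list)
    for (rep, _call_id), count in per_call.items():
        per_rep[rep].append(count)
    return {rep: mean(counts) for rep, counts in per_rep.items()}


def in_target(value, target_range=TARGET_RANGE):
    """True when a per-rep average falls inside the target band."""
    low, high = target_range
    return low <= value <= high
```

Run it on the baseline window and again on the coached window; a program passes Q1 only if the number it produces could, in principle, come back unchanged.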

**Quadrant 2 — Counterfactual Identification.** Match each coached rep to an uncoached rep on (a) trailing-90-day attainment quartile, (b) tenure bucket, (c) territory ACV decile, (d) ICP overlap. Minimum cell size: 30 reps per arm for behavior, 80 for revenue. Pre-register hypothesis. Bonferroni-correct when testing >3 metrics. Report Cohen's d, not just p-values. HBR 2024 meta-analysis ([hbr.org/2024/11/the-coaching-illusion](https://hbr.org/2024/11/the-coaching-illusion), n=43 studies) found 60% of published coaching ROI numbers are statistically meaningless — the dominant flaw is letting managers *choose* who to coach (they choose their best reps, then claim the lift). See [/knowledge/q156](https://pulserevops.com/knowledge/q156) on causal inference.
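The reporting rules above (cell size, Bonferroni, Cohen's d) can be sketched as follows. This assumes the matching itself has already been done and each arm is a plain list of per-rep metric values; the function names are illustrative, and the pooled-standard-deviation form of Cohen's d is one common choice, not necessarily the one the cited studies used.

```python
from math import sqrt
from statistics import mean, variance

# Minimum reps per arm, taken from the thresholds stated above.
MIN_CELL_SIZE = {"behavior": 30, "revenue": 80}


def arm_large_enough(n_reps, metric_kind):
    """Check an arm against the minimum cell size for its metric kind."""
    return n_reps >= MIN_CELL_SIZE[metric_kind]


def cohens_d(coached, control):
    """Standardized mean difference using the pooled sample variance."""
    n1, n2 = len(coached), len(control)
    pooled_var = (
        (n1 - 1) * variance(coached) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(coached) - mean(control)) / sqrt(pooled_var)


def bonferroni_alpha(alpha, n_metrics):
    """Per-test significance threshold when n_metrics hypotheses are tested."""
    return alpha / n_metrics
```

For example, testing five metrics at an overall alpha of 0.05 leaves a per-metric threshold of about 0.01, which is why pre-registering a short metric list matters.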

**Quadrant 3 — Stage-3 Deployment.** Behavior must show up in *late-stage, high-pressure* calls — not just role-play and discovery. Sales Management Association 2025 ([salesmanagement.org/research/2025-coaching-roi](https://salesmanagement.org/research/2025-coaching-roi), n=1,103 reps) found r=0.71 between stage-3 deployment and revenue, vs r=0.09 for role-play deployment. Target: 35%+ of late-stage calls by day 60. Most programs never measure this — they stop at "the rep can do it in practice" — see [/knowledge/q201](https://pulserevops.com/knowledge/q201) on attribution stacks.
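One way to compute the deployment rate behind the 35% target, as a sketch. It assumes each call record carries a `stage` label and a boolean `behavior_observed` flag; both field names, and the choice of which stages count as "late-stage", are assumptions to adapt to your own pipeline.

```python
DEPLOYMENT_TARGET = 0.35  # share of late-stage calls by day 60, per the target above


def late_stage_deployment_rate(calls, late_stages=frozenset({"stage_3"})):
    """Share of late-stage calls in which the coached behavior was observed.

    Returns None when there are no late-stage calls, so an empty sample
    is not mistaken for a 0% deployment rate.
    """
    late_calls = [c for c in calls if c["stage"] in late_stages]
    if not late_calls:
        return None
    observed = sum(1 for c in late_calls if c["behavior_observed"])
    return observed / len(late_calls)


def meets_deployment_target(rate, target=DEPLOYMENT_TARGET):
    """True only when a rate exists and reaches the target."""
    return rate is not None and rate >= target
```

Role-play sessions and discovery calls are filtered out by the stage check, which keeps practice performance from inflating the number.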

**Quadrant 4 — Durability Stress Test.** Behavior must survive (a) the coaching manager rotating out, (b) a comp-plan change, (c) end-of-quarter pressure. Gong 2024 baseline ([gong.io/research/coaching-effectiveness](https://gong.io/research/coaching-effectiveness), n=519k calls): 28% industry-wide durability. Target: 70%+. RAIN Group 2024 ([rainsalestraining.com/research/2024-coaching-effectiveness](https://rainsalestraining.com/research/2024-coaching-effectiveness), n=287 programs): 72% of programs that hit Tier 1 leading metrics fail durability. Reference [/knowledge/q142](https://pulserevops.com/knowledge/q142) on Goodhart's Law.
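The text above does not define how a durability percentage is computed, so the sketch below uses one plausible definition as a labeled assumption: of the reps who showed the behavior before a stress event (manager rotation, comp-plan change, quarter end), the share who still show it afterwards. The `threshold` parameter and the dict shapes are hypothetical.

```python
DURABILITY_TARGET = 0.70  # per the target above


def durability(before, after, threshold):
    """Share of pre-event adopters who still meet the threshold post-event.

    before / after: {rep: behavior metric} measured before and after the
    stress event. A rep missing from `after` is treated as not retained.
    Returns None when nobody had adopted the behavior beforehand.
    """
    adopters = [rep for rep, value in before.items() if value >= threshold]
    if not adopters:
        return None
    retained = sum(1 for rep in adopters if after.get(rep, 0) >= threshold)
    return retained / len(adopters)
```

Computing this separately for each of the three stress events shows which one actually breaks the behavior, rather than averaging the failures away.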

**The Pulse Coaching Attribution Equation.**
  *True Coaching Lift* = (Coached cohort behavior delta) − (Matched control cohort behavior delta) − (Hawthorne adjustment) − (Selection-bias residual)
When all four terms are honestly computed, industry-average True Coaching Lift drops from the *claimed* 23–31% revenue impact to a *measured* 4–7% (Bridge Group 2024, [bridgegrouppinc.com/sales-coaching-roi](https://bridgegrouppinc.com/sales-coaching-roi), n=412 orgs). That 4–7% is still worth the spend at $1,600/seat for Gong, but only if the program clears day 90 of the negative-then-positive ROI curve.
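The equation is a straight subtraction, sketched here as a function. How the Hawthorne adjustment and the selection-bias residual get estimated is not specified above, so they are taken as inputs; the numbers in the usage note are made up purely to show the arithmetic.

```python
def true_coaching_lift(coached_delta, control_delta,
                       hawthorne_adjustment, selection_bias_residual):
    """Apply the attribution equation term by term.

    All four inputs must be in the same unit (for example, fractional
    change in the chosen behavior metric over the same time window).
    """
    return (coached_delta
            - control_delta
            - hawthorne_adjustment
            - selection_bias_residual)
```

With hypothetical inputs, a coached-cohort delta of 0.25, a control delta of 0.08, a Hawthorne adjustment of 0.06, and a selection-bias residual of 0.05 leave a lift of roughly 0.06: most of the headline number belongs to the other three terms.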

**Vendor benchmark (2026 verified).** Q1–Q4 below refer to the diagnostic's Quadrants, not fiscal quarters.
- Gong: $1,600/seat/yr (best Q1).
- Chorus.ai: $1,200/seat/yr (better CRM sync, weaker tagging).
- Atrium ([atriumhq.com](https://atriumhq.com)): $89/seat/mo (best Q2, cohort matching built in).
- Salesloft Rhythm: $165/seat/mo (weakest Q3).
- Clari Copilot: $1,800/seat/yr (strongest Q4 longitudinal tracking).
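The list mixes annual and monthly pricing, so a like-for-like comparison needs one unit. A small sketch, assuming monthly plans are simply billed twelve times a year with no annual discount (an assumption, not a vendor term):

```python
# (price in USD per seat, billing period) as quoted in the benchmark above.
QUOTED_PRICES = {
    "Gong": (1600, "yr"),
    "Chorus.ai": (1200, "yr"),
    "Atrium": (89, "mo"),
    "Salesloft Rhythm": (165, "mo"),
    "Clari Copilot": (1800, "yr"),
}


def annual_cost(price, period):
    """Convert a quoted per-seat price to USD per seat per year."""
    if period == "mo":
        return price * 12  # assumption: no discount for annual commitment
    if period == "yr":
        return price
    raise ValueError(f"unknown billing period: {period!r}")


def ranked_by_annual_cost(prices=QUOTED_PRICES):
    """Vendors sorted from cheapest to most expensive per seat per year."""
    annual = {name: annual_cost(*quote) for name, quote in prices.items()}
    return sorted(annual.items(), key=lambda item: item[1])
```

On these figures, Atrium comes to $1,068/seat/yr and Salesloft Rhythm to $1,980/seat/yr, which puts Salesloft above Clari Copilot once both are annualized.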

**90-day implementation playbook.**
- *Days 0–7:* Pre-register hypothesis. Pick ONE behavior per Quadrant. Build matched cohort with revops.
- *Days 7–30:* Weekly Gong call-review. Tier 1 leading targets: discovery questions +150% off baseline, MEDDIC completion 22%→78%, call-prep doc 40%→90%.
- *Days 30–60:* Stage-3 deployment tracking begins. Manager 1:1 notes in CRM (free, brutally underused — see [/knowledge/q03](https://pulserevops.com/knowledge/q03)).
- *Days 60–90:* Tier 2 lagging: stage-2-to-3 conversion +12 pts, cycle -18 days, discount -4 pts.
- *Days 90–180:* Durability stress tests. Manager rotation simulation. Comp-plan shock test. Hawthorne control: blind audit week.
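The Tier 2 checkpoint in the playbook can be expressed as a simple scorecard. This is a sketch only: the metric keys are hypothetical names, and the deltas are assumed to be coached-period minus baseline, in the units the playbook uses (percentage points or days).

```python
# Tier 2 lagging targets from the Days 60-90 step above.
# Positive targets must be met or exceeded; negative targets
# must be met or undercut (a bigger reduction is better).
TIER2_TARGETS = {
    "stage2_to_3_conversion_pts": 12,
    "cycle_days": -18,
    "discount_pts": -4,
}


def tier2_status(deltas, targets=TIER2_TARGETS):
    """Return {metric: True/False} for each Tier 2 target."""
    status = {}
    for metric, target in targets.items():
        delta = deltas[metric]
        if target > 0:
            status[metric] = delta >= target
        else:
            status[metric] = delta <= target
    return status
```

A program that shortens cycles by only 10 days fails that line even if conversion and discounting hit their marks, so the scorecard should be read per metric rather than as a single pass or fail.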

**Bear Case — five named, quantified failures (one per failure mode).**

*Failure 1 — Premature termination (Outreach 2024, ~$410M ARR).* Killed program day 45 after 4-point win-rate dip. Bridge Group's negative-then-positive curve predicted day-90 recovery. Estimated $18M in 2025 expansion bookings lost to under-coached reps. Reinstated Q3 2025 after CRO turnover. See [/knowledge/q47](https://pulserevops.com/knowledge/q47).

*F
