Customer Health Score Design for SaaS CS in 2027

Published Jun 4, 2026

Direct Answer

A 2027 SaaS customer health score is a four-bucket weighted composite — Usage (35%), Sentiment (20%), Relationship (20%), Commercial (25%) — refreshed nightly, banded Red/Yellow/Green at 40/70, and wired to exactly three auto-playbooks: a CSM task at Yellow, a manager escalation at Red, and a renewal-90 commercial review. Anything more complex than four buckets and three triggers fails inside six months because CSMs stop trusting it.

The single biggest design error is letting product usage carry more than 40% weight. Per Gainsight's 2025 retention research, scores that blend in sentiment deliver 27% lower gross churn than usage-only models, and qualitative sentiment leads usage decay by 30 to 90 days.

1. Why the 2027 Health Score Is Different From the 2022 Version

1.1 The old rule-based model is dead for accounts above $50k ACV

The 2018-2022 architecture every CS platform shipped — 5-15 weighted signals, hard thresholds, score pushed into Salesforce, playbook fires on color change — works for self-serve and low-touch books. It breaks for enterprise. The problem is not the math; it is that rule-based weights are guesses and the guesses calcify.

Gainsight, Totango, ChurnZero, Vitally, Planhat, and ClientSuccess all converged on this same blueprint, and Vandfort's 2026 audit found 73% of deployed health scores fail to predict churn at statistical significance (AUC below 0.65). The fix is not "more signals." It is fewer signals, blended categories, and a quarterly weight recalibration against actual churn outcomes.

1.2 What changed in 2026-2027

Three shifts forced the redesign. First, LLM-graded sentiment from Gong, Clari, and Chorus call transcripts is now a first-class scoring input — not a quarterly NPS afterthought. Second, predictive ML overlays (Gainsight Horizon AI, ChurnZero Renewal Center, Catalyst Copilot) sit on top of the rule-based score and surface the delta between human-weighted score and ML-predicted churn probability — when those two diverge by more than 20 points, a CSM gets pinged.

Third, commercial signals (invoice aging, expansion pipeline, multi-year discount expiry) finally moved out of finance dashboards and into the CS score where they belong.
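The divergence alert from the second shift can be sketched in a few lines. The article does not say how a 0-100 score is compared with a churn probability, so this sketch assumes the probability is first mapped to an implied health of (1 - p) × 100. The function names and that mapping are hypothetical, not any vendor's API.

```python
def ml_implied_health(churn_probability: float) -> float:
    """Map an ML churn probability (0.0-1.0) onto the 0-100 health scale.

    Assumption: implied health = (1 - p) * 100. The article gives no mapping.
    """
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError("churn_probability must be between 0 and 1")
    return (1.0 - churn_probability) * 100.0


def should_ping_csm(rule_score: float, churn_probability: float,
                    threshold: float = 20.0) -> bool:
    """True when the rule-based score and the ML view differ by more than `threshold` points."""
    return abs(rule_score - ml_implied_health(churn_probability)) > threshold
```

A rule-based score of 75 against a predicted churn probability of 0.5 gives a 25-point gap and pings the CSM. A gap of exactly 20 does not, because the rule says "more than 20 points."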

1.3 The four-bucket model that actually works

Forget 15 signals. The operator-tested 2027 model is four buckets:

Usage (35%) — DAU/MAU ratio, depth of feature adoption, admin-seat activation
Sentiment (20%) — LLM-graded call sentiment, NPS, escalation count
Relationship (20%) — executive sponsor mapped + active, CSM touch cadence met, exec QBR attendance
Commercial (25%) — invoice DSO, expansion pipe, multi-year discount status, renewal months out

Total 100%. Bands at Red (0-39), Yellow (40-69), Green (70-100). Done.

2. Designing the Four Signal Buckets

flowchart TD
    A[Raw Account Data Nightly Pull] --> B[Usage Signals 35%]
    A --> C[Sentiment Signals 20%]
    A --> D[Relationship Signals 20%]
    A --> E[Commercial Signals 25%]
    B --> B1[DAU/MAU ratio]
    B --> B2[Feature breadth score]
    B --> B3[Admin seat activation %]
    C --> C1[LLM call sentiment last 30d]
    C --> C2[NPS rolling 90d]
    C --> C3[Escalation count]
    D --> D1[Exec sponsor active Y/N]
    D --> D2[Touch cadence met Y/N]
    D --> D3[QBR attendance]
    E --> E1[Invoice DSO]
    E --> E2[Expansion pipe value]
    E --> E3[Renewal months out]
    B1 --> F[Weighted Composite 0-100]
    C1 --> F
    D1 --> F
    E1 --> F
    F --> G{Band?}
    G -->|0-39 Red| H[Manager escalation playbook]
    G -->|40-69 Yellow| I[CSM task playbook]
    G -->|70-100 Green| J[Expansion qualification]

2.1 Usage bucket (35%) — what to actually measure

The single most common mistake is using login count. Logins are noise. The three signals that survive a churn-correlation audit are:

DAU/MAU ratio — daily active users / monthly active users. A SaaS app with DAU/MAU above 20% has 4.2x lower churn than one below 10% (Pavilion 2026 benchmark). Score linearly: 0% ratio = 0 points, 30% ratio = 100 points, capped at 100 above 30%.
Feature breadth score — count of "value-realizing features" used in the last 30 days, divided by the count the customer should be using per their use case. Bridge Group's 2026 onboarding study found customers using fewer than 60% of contracted features churn at 3.1x the rate of full adopters.
Admin seat activation — paid seats actually logging in monthly / paid seats sold. Below 70% activation is a leading indicator of a renewal downsell within two quarters.

Weights inside the bucket: DAU/MAU 50%, feature breadth 30%, seat activation 20%.
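A minimal sketch of the usage bucket under these weights follows. The linear DAU/MAU scale comes from the text. Scoring feature breadth and seat activation as a plain percentage capped at 100 is an assumption, since the article gives no scale for them. All function names are illustrative.

```python
def clamp(value: float, low: float = 0.0, high: float = 100.0) -> float:
    """Keep a score inside the 0-100 range."""
    return max(low, min(high, value))


def dau_mau_points(dau: float, mau: float) -> float:
    """Linear scale from the article: 0% ratio = 0 points, 30% ratio = 100 points."""
    if mau <= 0:
        return 0.0
    return clamp((dau / mau) / 0.30 * 100.0)


def ratio_points(numerator: float, denominator: float) -> float:
    """Assumption: score a coverage ratio as a plain percentage, capped at 100."""
    if denominator <= 0:
        return 0.0
    return clamp(numerator / denominator * 100.0)


def usage_bucket(dau: float, mau: float,
                 features_used: int, features_expected: int,
                 seats_active: int, seats_sold: int) -> float:
    """Weights inside the bucket: DAU/MAU 50%, feature breadth 30%, seat activation 20%."""
    return (0.50 * dau_mau_points(dau, mau)
            + 0.30 * ratio_points(features_used, features_expected)
            + 0.20 * ratio_points(seats_active, seats_sold))
```

For an account with 150 daily and 1,000 monthly active users, 6 of 10 expected features in use, and 80 of 100 seats active, the sub-scores are 50, 60, and 80, which gives a bucket score of about 59.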

2.2 Sentiment bucket (20%) — the bucket everyone underweights

Per Gainsight's 2025 retention research, health scores that include sentiment deliver 27% lower gross churn than usage-only scores. Sentiment also leads usage decay by 30-90 days, which is the entire point of an early-warning system. The three signals:

LLM-graded call sentiment — Gong, Clari, Chorus, or Avoma now ship native sentiment per call. Roll the trailing 30-day average to a 0-100 score. Cap the score at 30 if even a single explicit churn-threat utterance is detected (a "hard floor" in this article's terms).
NPS rolling 90-day — promoters = 100, passives = 50, detractors = 0, weighted by respondent seniority (VP+ counts 3x).
Escalation count — number of P1 tickets or named-account escalations in the last 60 days. Two or more = automatic cap at 40.
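The first two signals can be sketched as below. The promoter, passive, and detractor cut-offs are the standard NPS ones (9-10, 7-8, 0-6), and the 3x VP+ weight comes from the text. The sketch reads the hard floor at 30 as a ceiling on the call-sentiment signal, since a churn threat should pull the score down. Function names are hypothetical, and the escalation signal is left out because the article does not define its scale below the two-escalation threshold.

```python
from typing import Iterable, Tuple

VP_WEIGHT = 3  # from the article: VP+ respondents count 3x


def nps_category_points(rating: int) -> int:
    """Promoters (9-10) = 100, passives (7-8) = 50, detractors (0-6) = 0."""
    if rating >= 9:
        return 100
    if rating >= 7:
        return 50
    return 0


def nps_signal(responses: Iterable[Tuple[int, bool]]) -> float:
    """Seniority-weighted NPS signal; each response is (rating, is_vp_or_above)."""
    total_weight = 0
    weighted_points = 0
    for rating, is_vp_plus in responses:
        weight = VP_WEIGHT if is_vp_plus else 1
        total_weight += weight
        weighted_points += weight * nps_category_points(rating)
    if total_weight == 0:
        raise ValueError("no NPS responses in the 90-day window")
    return weighted_points / total_weight


def call_sentiment_signal(avg_sentiment_30d: float, churn_threat_detected: bool) -> float:
    """Trailing 30-day average on 0-100, held at 30 or below after a churn threat."""
    score = max(0.0, min(100.0, avg_sentiment_30d))
    return min(score, 30.0) if churn_threat_detected else score
```

One VP promoter, one passive, and one detractor score (3 × 100 + 50 + 0) / 5 = 70, where an unweighted average would give 50. That gap is the point of the seniority weighting.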

2.3 Relationship bucket (20%) — the bucket SaaStr keeps banging on

SaaStr's 2026 retention deep-dive named executive sponsor turnover as the #1 leading indicator of churn for ACV above $100k, beating product usage and NPS. Three signals:

Executive sponsor mapped + active — named VP+ at the customer, met with CSM in last 90 days. Binary 100 or 0.
Touch cadence met — last CSM-to-buyer touch within the SLA for the tier (enterprise = 14 days, mid-market = 30 days, SMB = 60 days). Binary 100 or 0.
QBR attendance — last QBR scheduled and attended by exec sponsor in the last quarter. Binary 100 or 0.

These signals are binary rather than graded because CSMs game continuous scores. A touch logged 30 minutes ago is not a 90; it is simply a 100 because the rule was met. Force the discipline.
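As a sketch of what binary scoring means in practice, the two date-based checks could look like this. The SLA days per tier and the 90-day sponsor window come from the text. The tier keys, function names, and the choice to count the boundary day as compliant are assumptions.

```python
from datetime import date
from typing import Optional

# Touch SLA per tier, in days, as listed in the article.
TOUCH_SLA_DAYS = {"enterprise": 14, "mid-market": 30, "smb": 60}


def binary(rule_met: bool) -> int:
    """Relationship signals are all-or-nothing: 100 or 0."""
    return 100 if rule_met else 0


def sponsor_active(last_sponsor_meeting: Optional[date], today: date) -> int:
    """100 if a mapped VP+ sponsor met the CSM in the last 90 days, else 0."""
    return binary(last_sponsor_meeting is not None
                  and (today - last_sponsor_meeting).days <= 90)


def touch_cadence_met(tier: str, last_touch: Optional[date], today: date) -> int:
    """100 if the last CSM-to-buyer touch is inside the tier's SLA, else 0."""
    sla_days = TOUCH_SLA_DAYS[tier]
    return binary(last_touch is not None
                  and (today - last_touch).days <= sla_days)
```

A touch 19 days old scores 0 for an enterprise account and 100 for an SMB account. There is no partial credit for being close.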

2.4 Commercial bucket (25%) — the bucket finance hides

This is where most health scores leak. Four signals:

Invoice DSO — days sales outstanding for this account. Under 30 = 100, 30-60 = 50, over 60 = 0. Accounts with DSO above 60 days churn at 2.4x the SaaS average (OpenView 2026 expansion playbook).
Expansion pipeline value — open opportunity dollars from this account / current ARR. Above 20% = 100, 10-20% = 70, above zero but under 10% = 40, zero = 20.
Multi-year discount status — is the customer on a multi-year deal? Renewing one? In the last 12 months of a multi-year, the score gets a 15-point floor because that is when commercial conversations actually move the number.
Renewal months out — under 3 months = review the score weekly; for all other timing, a monthly review cadence is fine.
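The two banded signals above translate directly into step functions. The thresholds are the ones in the text. How the exact boundaries fall (30 and 60 days, 10% and 20%) is an assumption noted in the comments, and the function names are illustrative.

```python
def dso_points(dso_days: float) -> int:
    """Invoice DSO: under 30 = 100, 30-60 = 50, over 60 = 0."""
    if dso_days < 30:
        return 100
    if dso_days <= 60:  # assumption: exactly 60 days still scores 50
        return 50
    return 0


def expansion_pipeline_points(open_pipeline: float, current_arr: float) -> int:
    """Open pipeline / ARR: above 20% = 100, 10-20% = 70, under 10% = 40, zero = 20."""
    if current_arr <= 0:
        raise ValueError("current_arr must be positive")
    ratio = open_pipeline / current_arr
    if ratio > 0.20:
        return 100
    if ratio >= 0.10:  # assumption: exactly 10% and exactly 20% both score 70
        return 70
    if ratio > 0:
        return 40
    return 20
```

An account with $5k of open pipeline on $100k ARR scores 40 on the pipeline signal, and one with no open pipeline still scores 20 rather than 0.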

3. The Weighting Math and Calibration Loop

3.1 The composite formula

`` Composite = (0.35 × Usage) + (0.20 × Sentiment) + (0.20 × Relationship) + (0.25 × Commercial) ``

Each bucket is 0-100. Composite is 0-100. Bands:

Red: 0-39 — at-risk, manager-level intervention
Yellow: 40-69 — needs attention, CSM-owned playbook
Green: 70-100 — healthy, expansion-eligible at 85+
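The formula and bands can be sketched as follows. The weights and cut-offs are the article's. Treating fractional scores between 39 and 40 as Red is an assumption, and the names are illustrative.

```python
WEIGHTS = {"usage": 0.35, "sentiment": 0.20, "relationship": 0.20, "commercial": 0.25}


def composite(buckets: dict) -> float:
    """Weighted sum of the four 0-100 bucket scores."""
    missing = set(WEIGHTS) - set(buckets)
    if missing:
        raise ValueError(f"missing bucket scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * buckets[name] for name in WEIGHTS)


def band(score: float) -> str:
    """Red below 40, Yellow from 40 up to 70, Green from 70."""
    if score < 40:  # assumption: 39.x counts as Red
        return "Red"
    if score < 70:
        return "Yellow"
    return "Green"


def expansion_eligible(score: float) -> bool:
    """Green accounts become expansion-eligible at 85+."""
    return score >= 85
```

With bucket scores of 59 (Usage), 70 (Sentiment), 100 (Relationship), and 50 (Commercial), the composite is 20.65 + 14 + 20 + 12.5 = 67.15, which lands in Yellow even though the relationship bucket is perfect.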

3.2 Hard floors that override the math

Three signals force a Red band regardless of composite:

Executive sponsor departure, detected via a LinkedIn monitor or RepVue alert
Two or more P1 escalations in trailing 60 days
Invoice DSO above 90 days

These three account for roughly 60% of unforecasted enterprise churn per Force Management's 2026 renewal-risk study. The math will not catch them on time. The floors will.
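A sketch of the override step, applied after the composite band is computed. The three conditions are the ones listed above. The function name and the decision to return the reasons alongside the band are illustrative choices.

```python
from typing import List, Tuple


def apply_hard_floors(composite_band: str, sponsor_departed: bool,
                      p1_escalations_60d: int, dso_days: float) -> Tuple[str, List[str]]:
    """Force Red when any override trips; otherwise keep the composite band."""
    reasons = []
    if sponsor_departed:
        reasons.append("executive sponsor departed")
    if p1_escalations_60d >= 2:
        reasons.append("two or more P1 escalations in trailing 60 days")
    if dso_days > 90:
        reasons.append("invoice DSO above 90 days")
    return ("Red" if reasons else composite_band, reasons)
```

Returning the reasons matters for the Red playbook, because the manager opening the at-risk record needs to know whether the math or an override put the account there.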

3.3 Quarterly recalibration — the step everyone skips

Every quarter, pull every account that churned or downsold in the prior quarter, along with the accounts that renewed as the comparison group. Compute each account's health score at T-90, T-60, and T-30. Run a logistic regression of churn outcome against each signal.

If a signal's coefficient flips sign or loses significance, drop it. If a bucket's weight needs to move by more than 5 points to fit the data, move it. This is the single most important habit distinguishing the 73% of failing health scores from the 27% that actually predict.

The benchmark to hit: AUC of 0.78 or higher at T-60 against actual churn. Below 0.70 means the score is theater.
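To make the AUC target concrete, here is a dependency-free sketch that computes it by pairwise comparison. AUC here is the probability that a randomly chosen churned account had a lower T-60 score than a randomly chosen retained one, with ties counting half. The function name and sample data are made up for illustration, and a notebook would normally call a library routine instead.

```python
from typing import Sequence


def health_score_auc(scores: Sequence[float], churned: Sequence[bool]) -> float:
    """AUC for 'a lower health score predicts churn', via pairwise comparison."""
    if len(scores) != len(churned):
        raise ValueError("scores and churned must be the same length")
    churn_scores = [s for s, c in zip(scores, churned) if c]
    kept_scores = [s for s, c in zip(scores, churned) if not c]
    if not churn_scores or not kept_scores:
        raise ValueError("need at least one churned and one retained account")
    wins = 0.0
    for churn_score in churn_scores:
        for kept_score in kept_scores:
            if churn_score < kept_score:
                wins += 1.0   # correctly ranked pair
            elif churn_score == kept_score:
                wins += 0.5   # tie counts half
    return wins / (len(churn_scores) * len(kept_scores))


# Hypothetical T-60 scores for eight accounts; True means the account churned.
t60_scores = [35, 55, 62, 48, 80, 90, 72, 66]
outcomes = [True, True, False, False, False, False, False, True]
auc = health_score_auc(t60_scores, outcomes)
```

In this toy cohort, 12 of the 15 churned-versus-retained pairs are ranked correctly, so the AUC is 0.80 and clears the 0.78 target. The pairwise loop is quadratic, which is fine for a quarterly cohort of a few thousand accounts.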

4. Action Triggers — Exactly Three Playbooks

4.1 Why three and not fifteen

Vitally's 2026 customer survey of 312 CS leaders found teams with more than five active playbooks per CSM had 41% lower playbook completion rates than teams with three or fewer. CSMs ignore complexity. Three playbooks, three triggers, hard SLAs.

4.2 Playbook 1 — Yellow trigger (CSM task)

Fires when composite crosses from Green to Yellow OR a single bucket drops 20+ points week-over-week. Tasks within 48 hours:

Pull last 30 days of Gong/Clari call sentiment, flag the lowest-sentiment moment
Pull last 5 support tickets, identify pattern
Schedule 30-min "pulse check" with primary buyer within 7 days
Log root cause in one of six tags: adoption gap, product gap, exec change, commercial pressure, support quality, integration friction
Update the Salesforce account record with the root-cause tag

SLA to complete: 7 calendar days. Owned by CSM, audited by CS manager.

4.3 Playbook 2 — Red trigger (manager escalation)

Fires when composite drops below 40 OR any hard-floor signal trips. Within 24 hours:

CS manager opens an at-risk account record
Joint call (CSM + CS manager) with executive sponsor scheduled within 10 business days
AE notified, commercial concession authority pre-cleared up to 10% of ARR
Product Manager looped in if the root cause is product gap
30-60-90 recovery plan documented in account record

SLA to escalate to VP CS: 15 days without measurable improvement.

4.4 Playbook 3 — Renewal-90 commercial review

Fires automatically 90 days before renewal date, regardless of score color. Mandatory steps:

Multi-year proposal modeled (1y, 2y, 3y, with 8% / 12% discount tiers for 2y / 3y commits per the OpenView 2026 expansion playbook)
Stakeholder map refreshed — economic buyer, champion, blocker identified
Value realization deck pulled from the past year of usage + outcome data
Pricing increase floor of 7% for healthy accounts (per Pavilion 2026 pricing power survey, where median SaaS price increase landed at 8.4% in 2026)
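The pricing steps above can be worked through as arithmetic. The article does not say how the 7% increase and the multi-year discount combine, so this sketch assumes the increase is applied to current ARR first and the term discount is then taken off the uplifted annual price. It also assumes unhealthy accounts renew flat. Names are hypothetical.

```python
PRICE_INCREASE_FLOOR = 0.07                      # floor for healthy accounts
DISCOUNT_BY_TERM_YEARS = {1: 0.00, 2: 0.08, 3: 0.12}


def renewal_options(current_arr: float, healthy: bool) -> dict:
    """Annual price per term length under the assumptions stated above."""
    uplift = PRICE_INCREASE_FLOOR if healthy else 0.0   # assumption: no uplift if not healthy
    uplifted_price = current_arr * (1 + uplift)
    return {term: round(uplifted_price * (1 - discount), 2)
            for term, discount in DISCOUNT_BY_TERM_YEARS.items()}


options = renewal_options(100_000, healthy=True)
```

On $100k of ARR this gives $107,000 for one year, $98,440 per year on a two-year term, and $94,160 per year on a three-year term. Under these assumptions the multi-year tiers more than offset the 7% floor, which is worth checking before the proposal goes out.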

4.5 What NOT to automate

Do not automate outbound to the customer based on score change. Auto-emails from health scores have opt-out rates above 60% within two firings and erode CSM credibility. The trigger fires a task to a human; the human owns the touch.
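The three triggers from 4.2 through 4.4 can be sketched as a single nightly routing function. Several details are assumptions the article leaves open: the current band already includes hard-floor overrides, the Red playbook fires once on entry into Red rather than every night, and Red takes precedence over Yellow. The playbook names are placeholders.

```python
from typing import List

YELLOW = "yellow_csm_task"
RED = "red_manager_escalation"
RENEWAL = "renewal_90_commercial_review"


def playbooks_to_fire(prev_band: str, curr_band: str,
                      max_bucket_drop_wow: float, days_to_renewal: int) -> List[str]:
    """Return the internal playbooks to open as tasks; nothing is sent to the customer."""
    fired = []
    if curr_band == "Red":
        if prev_band != "Red":  # assumption: fire once on entry into Red
            fired.append(RED)
    elif (prev_band == "Green" and curr_band == "Yellow") or max_bucket_drop_wow >= 20:
        fired.append(YELLOW)
    if days_to_renewal == 90:   # fires regardless of score color
        fired.append(RENEWAL)
    return fired
```

An account that moves from Yellow to Red exactly 90 days before renewal gets both the Red escalation and the renewal review. The function only returns task names for humans to act on, in line with the rule against automated customer outreach.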

5. CSM Book of Business — Time Allocation Against the Score

5.1 The 60/30/10 rule that actually pencils out

CSM time should split:

60% to Yellow accounts — this is where the score moves and where recoverable ARR lives
30% to Green accounts — expansion and reference development; Green accounts at 85+ are the expansion pipeline, full stop
10% to Red accounts — triage, escalation, and the honest call of "is this actually recoverable or is it sunk?"

The trap is the Red-heavy CSM who spends 50% of their time on dying accounts. Force Management's 2026 renewal-risk study put the recovery rate from Red below 28% for accounts that have been Red for more than 60 days. Past that point, the CSM is delaying inevitable churn while expansion pipeline rots.

5.2 Book size against the score

For mid-market ($25k-$100k ACV), a CSM book holds 35-50 accounts with this score-driven cadence:

Band	Touch cadence	Avg time per account per month
Red	Weekly	4 hours
Yellow	Every two weeks	2 hours
Green standard	Monthly	45 min
Green expansion (85+)	Every two weeks	2.5 hours

For enterprise ($100k+), book size drops to 12-20 accounts with weekly Yellow cadence and dedicated exec sponsor mapping.
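A quick way to check a mid-market book against the 60/30/10 split is to multiply account counts by the per-account hours in the table. The hours come from the table. The 40-account book below is a made-up example, and the names are illustrative.

```python
# Average hours per account per month, from the cadence table.
HOURS_PER_ACCOUNT = {
    "red": 4.0,
    "yellow": 2.0,
    "green_standard": 0.75,
    "green_expansion": 2.5,
}


def time_share(book: dict) -> dict:
    """Share of monthly CSM hours going to Red, Yellow, and Green accounts."""
    hours = {band: count * HOURS_PER_ACCOUNT[band] for band, count in book.items()}
    total = sum(hours.values())
    if total == 0:
        raise ValueError("book has no accounts")
    green = hours.get("green_standard", 0.0) + hours.get("green_expansion", 0.0)
    return {
        "red": hours.get("red", 0.0) / total,
        "yellow": hours.get("yellow", 0.0) / total,
        "green": green / total,
    }


# Hypothetical 40-account mid-market book.
book = {"red": 2, "yellow": 20, "green_standard": 14, "green_expansion": 4}
shares = time_share(book)
```

This book needs 68.5 hours a month and splits roughly 12% Red, 58% Yellow, and 30% Green, close to the target. Adding a third and fourth Red account pushes the Red share past 20%, which is how a book drifts into the Red-heavy trap.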

5.3 Compensation tie-in

Per RepVue's 2026 CSM comp report, the median CSM OTE is $128k (70/30 base/variable) with variable tied to GRR, NRR, and expansion bookings. The health score should drive a leading-indicator bonus: CSMs who pull accounts from Red to Yellow or Yellow to Green within a quarter earn a 1.5x multiplier on the variable for that account.

This is the mechanism that gets CSMs to actually work the score instead of treating it as a dashboard ornament.

6. The Tech Stack — What to Buy in 2026-2027

6.1 Platform tier

Gainsight — enterprise default at $120-180k/year for 250-seat deployments. Best for $1B+ ARR companies with mature CS ops. Horizon AI overlay adds $30-50k/year.
ChurnZero — mid-market velocity play at $45-75k/year. Faster implementation (6-8 weeks vs. Gainsight's 14-20). Renewal Center is the standout module.
Catalyst — modern UX, strong product-led growth fit, $50-90k/year. Copilot module is the LLM-sentiment leader as of late 2026.
Vitally — PLG-native, $30-60k/year, best for Series B-C SaaS with self-serve plus sales-assist motion.
Planhat — Europe-strong, $40-70k/year, best multi-product complexity handling.

6.2 Sentiment layer

Gong — $1,600-2,400 per seat per year, the conversation-intelligence default
Clari Copilot (Wingman) — $1,200-1,800 per seat per year, tighter pipeline tie-in
Avoma — $70-120 per seat per month, mid-market friendly

6.3 What to build vs. buy

Build the calibration loop in-house. Every quarter, your CS Ops or RevOps analyst pulls churn outcomes against historical scores and tunes the weights. No vendor does this well enough out of the box. Use dbt + Snowflake/BigQuery + Hex or Mode notebooks — a 12-hour quarterly project, not a platform purchase.

7. 30-60-90 Implementation Plan

flowchart LR
    A[Days 0-30 Foundation] --> A1[Pick 4 buckets]
    A --> A2[Inventory existing signals]
    A --> A3[Pull 18mo churn history]
    A --> A4[Initial weights = best guess]
    A --> B[Days 31-60 Live Score]
    B --> B1[Score nightly to Salesforce]
    B --> B2[Three playbooks live]
    B --> B3[CSM training 2 sessions]
    B --> B4[Hard floors enabled]
    B --> C[Days 61-90 Calibrate]
    C --> C1[Run AUC against churn]
    C --> C2[Drop low-coef signals]
    C --> C3[Tune weights]
    C --> C4[Comp tie-in live]
    C --> D[Day 90+ Quarterly Loop]
    D --> D1[Re-calibrate weights every Q]
    D --> D2[Audit playbook completion]
    D --> D3[Report AUC to board]

7.1 Days 0-30 — foundation

Pick the four buckets and stop debating. Usage / Sentiment / Relationship / Commercial. Lock the categories before debating signals.
Inventory every signal currently flowing to Salesforce, Gainsight, and your warehouse. Drop anything you cannot trust within 24 hours.
Pull 18 months of churn and downsell history with account-level monthly snapshots.
Set initial weights at 35/20/20/25. Do not overthink. The calibration loop fixes this.

7.2 Days 31-60 — live score

Score runs nightly, lands in Salesforce as a custom field on Account
Three playbooks built in Gainsight/ChurnZero/Catalyst with SLAs in writing
CSM training: two 90-minute sessions, one on the math, one on the playbooks
Hard-floor overrides enabled (exec departure, P1 escalations, DSO 90+)
Health score reviewed at every weekly CS team standup

7.3 Days 61-90 — calibrate and tie to comp

First calibration: AUC against the 18 months of history, target 0.78
Drop any signal whose logistic coefficient is not significant at p < 0.05
Tune weights to fit the data, not the original guess
CSM comp adjustment effective the next quarter, communicated 30 days in advance

7.4 Day 90+ — the quarterly habit

This is the difference between a working health score and theater:

Quarterly weight recalibration against actual churn outcomes
Monthly playbook completion audit; CSMs with completion below 80% get coaching
AUC reported to the board alongside GRR and NRR — make it a first-class metric
Annual signal review: add at most one signal per year, retire at least one

FAQ

Should I weight expansion accounts differently from at-risk accounts?

No. Use one score, four buckets, same weights across the book. Two scores (a "retention score" and an "expansion score") double the CSM cognitive load, and Catalyst's 2026 customer cohort study found that teams running dual scores had 34% lower playbook adherence than single-score teams.

Expansion is gated by the 85+ Green band on the same score, not by a separate model.

How do I score a brand-new account with no usage history?

Use a 90-day onboarding score that weights differently: Onboarding Milestones 50%, Stakeholder Mapping 25%, Implementation Cadence 25%. Convert to the standard four-bucket score at day 91. Treat day-90 score below 70 as a major Yellow event — Bridge Group's 2026 onboarding study showed accounts under 70 at day 90 churn at 3.8x the average.
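The onboarding score follows the same weighted-sum pattern. The weights, the day-91 switch, and the 70 threshold come from the answer above. The 0-100 scale for each input and the function names are assumptions.

```python
ONBOARDING_WEIGHTS = {
    "milestones": 0.50,
    "stakeholder_mapping": 0.25,
    "implementation_cadence": 0.25,
}


def onboarding_score(milestones: float, stakeholder_mapping: float,
                     implementation_cadence: float) -> float:
    """Weighted onboarding score; each input is assumed to be on a 0-100 scale."""
    return (ONBOARDING_WEIGHTS["milestones"] * milestones
            + ONBOARDING_WEIGHTS["stakeholder_mapping"] * stakeholder_mapping
            + ONBOARDING_WEIGHTS["implementation_cadence"] * implementation_cadence)


def scoring_model_for(days_since_start: int) -> str:
    """Onboarding model through day 90, standard four-bucket model from day 91."""
    return "onboarding" if days_since_start <= 90 else "four_bucket"


def is_major_yellow_event(day_90_score: float) -> bool:
    """A day-90 score below 70 is treated as a major Yellow event."""
    return day_90_score < 70
```

An account at 80 on milestones, 60 on stakeholder mapping, and 50 on implementation cadence scores 40 + 15 + 12.5 = 67.5 at day 90, which is below 70 and should be treated as a major Yellow event before it converts to the standard score.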

What about NRR as a health input?

Do not put NRR in the score. NRR is the outcome, not the input. Putting it in creates circular logic and overweights past expansion at the expense of forward risk. Report NRR alongside the score, not inside it.

How often should the score refresh?

Nightly is the default. Real-time scoring sounds good but generates noise: a five-point swing on a Tuesday afternoon makes CSMs chase ghosts. Nightly batch with a weekly trend visible in the CSM workspace is the right cadence. Exception: accounts within 90 days of renewal also get a weekly score review.

Should the customer ever see their own health score?

Almost never. Showing the raw score creates gaming behavior on both sides — the buyer asks why a feature usage is "below benchmark" and the conversation becomes about the score instead of business outcomes. Share the underlying signals (usage trends, NPS, support performance) in QBRs.

Keep the composite internal. The only exception: a strategic enterprise account where the CSM and exec sponsor have a true partnership and the score becomes a joint metric.

Bottom Line

Four buckets, three playbooks, one quarterly calibration loop. Usage at 35%, Sentiment at 20%, Relationship at 20%, Commercial at 25%. Red below 40, Yellow from 40 to 69, Green at 70 and above. Hard floors for exec departure, P1 escalations, and DSO 90+.

Three playbooks: Yellow CSM task, Red manager escalation, Renewal-90 commercial review. Recalibrate quarterly against actual churn outcomes, target AUC 0.78. Tie CSM comp to score movement.

Anything more complicated dies inside six months.

The 73% of health scores that fail to predict churn fail because they were never recalibrated, not because the math was wrong on day one. The math is the easy part. The discipline is the hard part.

Sources

Gainsight 2025 Retention Research — Sentiment-blended scores deliver 27% lower gross churn than usage-only models; health score blueprint defining behavioral, support, relationship, financial, and feedback signal categories.
Pavilion 2026 SaaS Operator Benchmark — DAU/MAU above 20% correlates with 4.2x lower churn; median 2026 price increase 8.4%.
Bridge Group 2026 Onboarding Study — Customers using fewer than 60% of contracted features churn at 3.1x rate; accounts under 70 at day 90 churn at 3.8x average.
OpenView 2026 Expansion Playbook — DSO above 60 days correlates with 2.4x average churn; multi-year discount tiers at 8% / 12% for 2y / 3y commits.
SaaStr 2026 Retention Deep-Dive — Executive sponsor turnover named #1 leading indicator of churn for ACV above $100k.
Gong + Clari Copilot product documentation — LLM-graded call sentiment as first-class scoring input, $1,200-2,400 per seat per year pricing band.
Force Management 2026 Renewal-Risk Study — Hard-floor signals (exec departure, P1 escalations, DSO 90+) account for ~60% of unforecasted enterprise churn; recovery rate from Red below 28% after 60 days.
RepVue 2026 CSM Compensation Report — Median CSM OTE $128k (70/30 base/variable), variable tied to GRR/NRR/expansion bookings.
Vandfort 2026 Health Score Audit — 73% of deployed health scores fail to predict churn at statistical significance (AUC below 0.65).
Vitally 2026 CS Leader Survey (n=312) — Teams with more than five active playbooks had 41% lower completion rates than teams with three or fewer.
Catalyst 2026 Customer Cohort Study — Teams running dual scores (retention + expansion) had 34% lower playbook adherence than single-score teams.

Keep reading

### Direct Answer

**A 2027 SaaS customer health score is a four-bucket weighted composite — Usage (35%), Sentiment (20%), Relationship (20%), Commercial (25%) — refreshed nightly, banded Red/Yellow/Green at 40/70, and wired to exactly three auto-playbooks: a CSM task at Yellow, a manager escalation at Red, and a renewal-90 commercial review.** Anything more complex than four buckets and three triggers fails inside six months because CSMs stop trusting it. The single biggest design error is letting **product usage carry more than 40% weight** — usage-only models miss **27% more churn** than blended models per Gainsight's 2025 retention benchmark, and qualitative sentiment leads usage decay by **30 to 90 days**.

## 1. Why the 2027 Health Score Is Different From the 2022 Version

### 1.1 The old rule-based model is dead for accounts above $50k ACV

The 2018-2022 architecture every CS platform shipped — **5-15 weighted signals, hard thresholds, score pushed into Salesforce, playbook fires on color change** — works for self-serve and low-touch books. It breaks for enterprise. The problem is not the math; it is that **rule-based weights are guesses** and the guesses calcify. Gainsight, Totango, ChurnZero, Vitally, Planhat, and ClientSuccess all converged on this same blueprint, and Vandfort's 2026 audit found **73% of deployed health scores fail to predict churn at statistical significance** (AUC below 0.65). The fix is not "more signals." It is **fewer signals, blended categories, and a quarterly weight recalibration against actual churn outcomes**.

### 1.2 What changed in 2026-2027

Three shifts forced the redesign. **First, LLM-graded sentiment** from Gong, Clari, and Chorus call transcripts is now a first-class scoring input — not a quarterly NPS afterthought. Second, **predictive ML overlays** (Gainsight Horizon AI, ChurnZero Renewal Center, Catalyst Copilot) sit on top of the rule-based score and surface the **delta between human-weighted score and ML-predicted churn probability** — when those two diverge by more than 20 points, a CSM gets pinged. Third, **commercial signals** (invoice aging, expansion pipeline, multi-year discount expiry) finally moved out of finance dashboards and into the CS score where they belong.

### 1.3 The four-bucket model that actually works

Forget 15 signals. The operator-tested 2027 model is four buckets:

- **Usage (35%)** — DAU/MAU ratio, depth of feature adoption, admin-seat activation
- **Sentiment (20%)** — LLM-graded call sentiment, NPS, support CSAT, escalation count
- **Relationship (20%)** — executive sponsor mapped + active, CSM touch cadence met, exec QBR attendance
- **Commercial (25%)** — invoice DSO, expansion pipe, multi-year discount status, renewal months out

Total 100%. Bands at Red (0-39), Yellow (40-69), Green (70-100). Done.

## 2. Designing the Four Signal Buckets

```mermaid
flowchart TD
    A[Raw Account Data Nightly Pull] --> B[Usage Signals 35%]
    A --> C[Sentiment Signals 20%]
    A --> D[Relationship Signals 20%]
    A --> E[Commercial Signals 25%]
    B --> B1[DAU/MAU ratio]
    B --> B2[Feature breadth score]
    B --> B3[Admin seat activation %]
    C --> C1[LLM call sentiment last 30d]
    C --> C2[NPS rolling 90d]
    C --> C3[Escalation count]
    D --> D1[Exec sponsor active Y/N]
    D --> D2[Touch cadence met Y/N]
    D --> D3[QBR attendance]
    E --> E1[Invoice DSO]
    E --> E2[Expansion pipe value]
    E --> E3[Renewal months out]
    B1 --> F[Weighted Composite 0-100]
    C1 --> F
    D1 --> F
    E1 --> F
    F --> G{Band?}
    G -->|0-39 Red| H[Manager escalation playbook]
    G -->|40-69 Yellow| I[CSM task playbook]
    G -->|70-100 Green| J[Expansion qualification]
```

### 2.1 Usage bucket (35%) — what to actually measure

The **single most common mistake** is using login count. Logins are noise. The three signals that survive a churn-correlation audit are:

- **DAU/MAU ratio** — sticky users / monthly users. A SaaS app with DAU/MAU above **20% has 4.2x lower churn** than one below 10% (Pavilion 2026 benchmark). Score linearly: 0% ratio = 0 points, 30% ratio = 100 points.
- **Feature breadth score** — count of "value-realizing features" used in the last 30 days, divided by the count the customer should be using per their use case. Bridge Group's 2026 onboarding study found **customers using fewer than 60% of contracted features churn at 3.1x the rate** of full adopters.
- **Admin seat activation** — paid seats actually logging in monthly / paid seats sold. Below 70% activation is a leading indicator of a **renewal downsell** within two quarters.

Weights inside the bucket: DAU/MAU 50%, feature breadth 30%, seat activation 20%.

### 2.2 Sentiment bucket (20%) — the bucket everyone underweights

Per Gainsight's 2025 retention research, **health scores that include sentiment deliver 27% lower gross churn** than usage-only scores. Sentiment also leads usage decay by **30-90 days**, which is the entire point of an early-warning system. The three signals:

- **LLM-graded call sentiment** — Gong, Clari, Chorus, or Avoma now ship native sentiment per call. Roll the trailing 30-day average to a 0-100 score. Hard-floor to 30 if there is a **single explicit churn-threat utterance** detected.
- **NPS rolling 90-day** — promoters = 100, passives = 50, detractors = 0, weighted by respondent seniority (VP+ counts 3x).
- **Escalation count** — number of P1 tickets or named-account escalations in the last 60 days. Two or more = automatic floor at 40.

### 2.3 Relationship bucket (20%) — the bucket SaaStr keeps banging on

SaaStr's 2026 retention deep-dive named **executive sponsor turnover as the #1 leading indicator of churn for ACV above $100k**, beating product usage and NPS. Three signals:

- **Executive sponsor mapped + active** — named VP+ at the customer, met with CSM in last 90 days. Binary 100 or 0.
- **Touch cadence met** — last CSM-to-buyer touch within the SLA for the tier (enterprise = 14 days, mid-market = 30 days, SMB = 60 days). Binary 100 or 0.
- **QBR attendance** — last QBR scheduled and attended by exec sponsor in the last quarter. Binary 100 or 0.

The reason these are binary, not gradient, is **CSMs game continuous scores**. A logged-in 30 minutes ago is not a 90, it is just a 100 because the rule was met. Force the discipline.

### 2.4 Commercial bucket (25%) — the bucket finance hides

This is where most health scores leak. Four signals:

- **Invoice DSO** — days sales outstanding for this account. Under 30 = 100, 30-60 = 50, over 60 = 0. **Accounts with DSO above 60 days churn at 2.4x the SaaS average** (OpenView 2026 expansion playbook).
- **Expansion pipeline value** — open opportunity dollars from this account / current ARR. Above 20% = 100, 10-20% = 70, 0-10% = 40, zero = 20.
- **Multi-year discount status** — is the customer on a multi-year deal? Renewing one? In the **last 12 months of a multi-year**, the score gets a **15-point floor** because that is when commercial conversations actually move the number.
- **Renewal months out** — under 3 months = automatic re-score weekly, all other timing = monthly cadence is fine.

## 3. The Weighting Math and Calibration Loop

### 3.1 The composite formula

```
Composite = (0.35 × Usage) + (0.20 × Sentiment) + (0.20 × Relationship) + (0.25 × Commercial)
```

Each bucket is 0-100. Composite is 0-100. Bands:

- **Red: 0-39** — at-risk, manager-level intervention
- **Yellow: 40-69** — needs attention, CSM-owned playbook
- **Green: 70-100** — healthy, expansion-eligible at 85+

### 3.2 Hard floors that override the math

Three signals force a Red band regardless of composite:

1. **Executive sponsor departed** detected via LinkedIn-monitor or RepVue alert
2. **Two or more P1 escalations** in trailing 60 days
3. **Invoice DSO above 90 days**

These three account for **roughly 60% of unforecasted enterprise churn** per Force Management's 2026 renewal-risk study. The math will not catch them on time. The floors will.

### 3.3 Quarterly recalibration — the step everyone skips

Every quarter, pull every account that churned or downsold in the prior quarter. Compute their health score at T-90, T-60, T-30. Run a logistic regression of churn-outcome against each signal. **If a signal's coefficient flips sign or loses significance, drop it.** If a bucket's weight needs to move by more than 5 points to fit the data, move it. This is the **single most important habit** distinguishing the 73% of failing health scores from the 27% that actually predict.

The benchmark to hit: **AUC of 0.78 or higher** at T-60 against actual churn. Below 0.70 means the score is theater.

## 4. Action Triggers — Exactly Three Playbooks

### 4.1 Why three and not fifteen

Vitally's 2026 customer survey of 312 CS leaders found teams with **more than five active playbooks per CSM had 41% lower playbook completion rates** than teams with three or fewer. CSMs ignore complexity. Three playbooks, three triggers, hard SLAs.

### 4.2 Playbook 1 — Yellow trigger (CSM task)

Fires when composite crosses from Green to Yellow OR a single bucket drops 20+ points week-over-week. Tasks within 48 hours:

- Pull last 30 days of Gong/Clari call sentiment, flag the lowest-sentiment moment
- Pull last 5 support tickets, identify pattern
- Schedule 30-min "pulse check" with primary buyer within 7 days
- Log root cause in **one of six tags**: adoption gap, product gap, exec change, commercial pressure, support quality, integration friction
- Update the Salesforce account record with the root-cause tag

SLA to complete: **7 calendar days**. Owned by CSM, audited by CS manager.

### 4.3 Playbook 2 — Red trigger (manager escalation)

Fires when composite drops below 40 OR any hard-floor signal trips. Within 24 hours:

- CS manager opens an **at-risk account record**
- Joint call (CSM + CS manager) with executive sponsor scheduled within 10 business days
- AE notified, commercial concession authority pre-cleared up to **10% of ARR**
- Product Manager looped in if the root cause is product gap
- 30-60-90 recovery plan documented in account record

SLA to escalate to VP CS: **15 days without measurable improvement**.

### 4.4 Playbook 3 — Renewal-90 commercial review

Fires automatically 90 days before renewal date, regardless of score color. Mandatory steps:

- Multi-year proposal modeled (1y, 2y, 3y with **8% / 12% discount tiers** per OpenView 2026 expansion benchmark)
- Stakeholder map refreshed — economic buyer, champion, blocker identified
- Value realization deck pulled from the past year of usage + outcome data
- **Pricing increase floor of 7%** for healthy accounts (per Pavilion 2026 pricing power survey, where median SaaS price increase landed at 8.4% in 2026)

### 4.5 What NOT to automate

Do not automate outbound to the customer based on score change. Auto-emails from health scores have **opt-out rates above 60%** within two firings and **erode CSM credibility**. The trigger fires a task to a human; the human owns the touch.

## 5. CSM Book of Business — Time Allocation Against the Score

### 5.1 The 60/30/10 rule that actually pencils

CSM time should split:

- **60% to Yellow accounts** — this is where the score moves and where recoverable ARR lives
- **30% to Green accounts** — expansion and reference development; Green accounts at 85+ are the **expansion pipeline**, full stop
- **10% to Red accounts** — triage, escalation, and the honest call of "is this actually recoverable or is it sunk?"

The trap is the **Red-heavy CSM** who spends 50% of their time on dying accounts. Force Management's 2026 retention study put the **recovery rate from Red below 28%** for accounts that have been Red for more than 60 days. Past that point, the CSM is delaying inevitable churn while expansion pipeline rots.

### 5.2 Book size against the score

For mid-market ($25k-$100k ACV), a CSM book holds **35-50 accounts** with this score-driven cadence:

| Band | Touch cadence | Avg time per account per month |
|------|---------------|-------------------------------|
| Red | Weekly | 4 hours |
| Yellow | Bi-weekly | 2 hours |
| Green standard | Monthly | 45 min |
| Green expansion (85+) | Bi-weekly | 2.5 hours |

For enterprise ($100k+), book size drops to **12-20 accounts** with weekly Yellow cadence and dedicated exec sponsor mapping.
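
To sanity-check a mid-market book against both the cadence table and the 60/30/10 split, a small calculation like the following can help. The hours per account come from the table above; the 35-account band mix and the function names are made up for illustration.

```python
# Hours per account per month, taken from the mid-market cadence table.
HOURS_PER_ACCOUNT = {
    "red": 4.0,
    "yellow": 2.0,
    "green_standard": 0.75,  # 45 minutes
    "green_expansion": 2.5,
}


def monthly_hours(book: dict[str, int]) -> dict[str, float]:
    """Direct account hours per band for a book given as band -> account count."""
    return {band: HOURS_PER_ACCOUNT[band] * count for band, count in book.items()}


def time_share(hours: dict[str, float]) -> dict[str, float]:
    """Each band's fraction of total direct account time."""
    total = sum(hours.values())
    return {band: h / total for band, h in hours.items()}


# Hypothetical 35-account mid-market book.
book = {"red": 2, "yellow": 18, "green_standard": 12, "green_expansion": 3}
hours = monthly_hours(book)
share = time_share(hours)
total_hours = sum(hours.values())
```

This mix needs 60.5 direct hours a month, split roughly 60% Yellow, 27% Green (standard plus expansion), and 13% Red. If your own mix lands far from 60/30/10, either the band counts or the per-account cadence needs to move.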

### 5.3 Compensation tie-in

Per RepVue's 2026 CSM comp report, the median CSM **OTE is $128k (70/30 base/variable)** with variable tied to **GRR, NRR, and expansion bookings**. The health score should drive a **leading-indicator bonus**: CSMs who pull accounts from Red to Yellow or Yellow to Green within a quarter earn a **1.5x multiplier on the variable for that account**. This is the mechanism that gets CSMs to actually work the score instead of treating it as a dashboard ornament.
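
A worked version of the multiplier, as a minimal sketch: the OTE, the 70/30 split, and the 1.5x figure come from the paragraph above, while the equal per-account split of the variable pool, full attainment, quarterly payout, and counting any upward band move (including Red straight to Green) are assumptions made only to keep the arithmetic visible.

```python
OTE = 128_000      # median CSM OTE cited in the text
VARIABLE_PCT = 30  # 70/30 base/variable split
MULTIPLIER = 1.5   # applied to the per-account variable on an upward band move
BAND_RANK = {"red": 0, "yellow": 1, "green": 2}


def quarterly_variable(moves: list[tuple[str, str]]) -> float:
    """Quarterly variable payout for a book.

    `moves` holds one (band_at_quarter_start, band_at_quarter_end) pair per account.
    Assumes full attainment and an equal per-account split of the variable pool.
    """
    quarterly_pool = OTE * VARIABLE_PCT / 100 / 4
    per_account = quarterly_pool / len(moves)
    payout = 0.0
    for start, end in moves:
        improved = BAND_RANK[end] > BAND_RANK[start]
        payout += per_account * (MULTIPLIER if improved else 1.0)
    return payout


# Hypothetical 40-account book: 5 accounts moved up a band, 35 held steady.
moves = [("yellow", "green")] * 3 + [("red", "yellow")] * 2 + [("green", "green")] * 35
payout = quarterly_variable(moves)
```

Under these assumptions the quarterly pool is $9,600, or $240 per account, and the five improved accounts pay $360 each, lifting the quarter from $9,600 to $10,200.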

## 6. The Tech Stack — What to Buy in 2026-2027

### 6.1 Platform tier

- **Gainsight** — enterprise default at **$120-180k/year for 250-seat deployments**. Best for $1B+ ARR companies with mature CS ops. Horizon AI overlay adds **$30-50k/year**.
- **ChurnZero** — mid-market velocity play at **$45-75k/year**. Faster implementation (6-8 weeks vs. Gainsight's 14-20). Renewal Center is the standout module.
- **Catalyst** — modern UX, strong product-led growth fit, **$50-90k/year**. Copilot module is the LLM-sentiment leader as of late 2026.
- **Vitally** — PLG-native, **$30-60k/year**, best for Series B-C SaaS with self-serve plus sales-assist motion.
- **Planhat** — Europe-strong, **$40-70k/year**, best multi-product complexity handling.

### 6.2 Sentiment layer

- **Gong** — **$1,600-2,400 per seat per year**, the conversation-intelligence default
- **Clari Copilot (Wingman)** — **$1,200-1,800 per seat per year**, tighter pipeline tie-in
- **Avoma** — **$70-120 per seat per month**, mid-market friendly

### 6.3 What to build vs. buy

Build the **calibration loop** in-house. Every quarter, your CS Ops or RevOps analyst pulls churn outcomes against historical scores and tunes the weights. No vendor does this well enough out of the box. Use **dbt + Snowflake/BigQuery + Hex or Mode notebooks** — a 12-hour quarterly project, not a platform purchase.

## 7. 30-60-90 Implementation Plan

```mermaid
flowchart LR
    A[Days 0-30 Foundation] --> A1[Pick 4 buckets]
    A --> A2[Inventory existing signals]
    A --> A3[Pull 18mo churn history]
    A --> A4[Initial weights = best guess]
    A --> B[Days 31-60 Live Score]
    B --> B1[Score nightly to Salesforce]
    B --> B2[Three playbooks live]
    B --> B3[CSM training 2 sessions]
    B --> B4[Hard floors enabled]
    B --> C[Days 61-90 Calibrate]
    C --> C1[Run AUC against churn]
    C --> C2[Drop low-coef signals]
    C --> C3[Tune weights]
    C --> C4[Comp tie-in live]
    C --> D[Day 90+ Quarterly Loop]
    D --> D1[Re-calibrate weights every Q]
    D --> D2[Audit playbook completion]
    D --> D3[Report AUC to board]
```

### 7.1 Days 0-30 — foundation

- **Pick the four buckets and stop debating.** Usage / Sentiment / Relationship / Commercial. Lock the categories before debating signals.
- Inventory every signal currently flowing to Salesforce, Gainsight, and your warehouse. Drop anything you cannot trust within 24 hours.
- Pull **18 months of churn and downsell history** with account-level monthly snapshots.
- Set initial weights at **35/20/20/25**. Do not overthink. The calibration loop fixes this.

### 7.2 Days 31-60 — live score

- Score runs nightly, lands in Salesforce as a custom field on Account
- Three playbooks built in Gainsight/ChurnZero/Catalyst with SLAs in writing
- CSM training: two 90-minute sessions, one on the math, one on the playbooks
- Hard-floor overrides enabled (exec departure, P1 escalations, DSO 90+)
- Health score reviewed at every weekly CS team standup
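
The nightly job behind these bullets can be sketched in a few lines of Python. The weights and the 40/70 bands come from this article; the function names, the 0-100 bucket inputs, the "any open P1" threshold, and the choice to force Red whenever a hard floor trips are assumptions for illustration.

```python
# Bucket weights in percent (35/20/20/25).
WEIGHTS_PCT = {"usage": 35, "sentiment": 20, "relationship": 20, "commercial": 25}


def composite_score(buckets: dict[str, float]) -> float:
    """Weighted composite of four bucket scores, each on a 0-100 scale."""
    return sum(WEIGHTS_PCT[name] * buckets[name] for name in WEIGHTS_PCT) / 100


def hard_floor_hit(exec_departed: bool, open_p1_count: int, dso_days: int) -> bool:
    """True if any hard-floor signal is present (thresholds are assumptions)."""
    return exec_departed or open_p1_count > 0 or dso_days >= 90


def band(score: float, floor_hit: bool = False) -> str:
    """Red below 40, Yellow from 40 to 70, Green above 70; a hard floor forces Red."""
    if floor_hit or score < 40:
        return "red"
    if score <= 70:
        return "yellow"
    return "green"


# Hypothetical account.
score = composite_score({"usage": 80, "sentiment": 60, "relationship": 50, "commercial": 90})
```

This account scores 72.5 and lands in Green, but the same account with a departed executive sponsor is banded Red regardless of the composite.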

### 7.3 Days 61-90 — calibrate and tie to comp

- First calibration: AUC against the 18 months of history, target 0.78
- Drop any signal whose logistic coefficient is not significant at p < 0.05
- Tune weights to fit the data, not the original guess
- CSM comp adjustment effective the next quarter, communicated 30 days in advance
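
The AUC step of the first calibration can be checked without any ML tooling. The sketch below computes AUC as the probability that a randomly chosen churned account had a lower health score than a randomly chosen retained one; the function name and the six-account sample are hypothetical, and it does not cover the coefficient-significance test, which needs a fitted logistic model.

```python
def churn_auc(scores: list[float], churned: list[bool]) -> float:
    """AUC of the health score as a churn predictor (lower score = higher risk).

    Pairwise comparison: counts how often a churned account scored below a
    retained account, with ties counted as half. O(n^2), fine for a sketch.
    """
    churn_scores = [s for s, c in zip(scores, churned) if c]
    kept_scores = [s for s, c in zip(scores, churned) if not c]
    if not churn_scores or not kept_scores:
        raise ValueError("need at least one churned and one retained account")
    wins = 0.0
    for lost in churn_scores:
        for kept in kept_scores:
            if lost < kept:
                wins += 1.0
            elif lost == kept:
                wins += 0.5
    return wins / (len(churn_scores) * len(kept_scores))


# Hypothetical snapshot: score at the time, and whether the account later churned.
sample_scores = [30, 45, 55, 72, 80, 90]
sample_churned = [True, True, False, True, False, False]
auc = churn_auc(sample_scores, sample_churned)
meets_target = auc >= 0.78
```

On this toy sample, 8 of the 9 churned/retained pairs are ordered correctly, so the AUC is about 0.89 and clears the 0.78 target. On real history, run it per snapshot month so one noisy quarter does not dominate.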

### 7.4 Day 90+ — the quarterly habit

This is the difference between a working health score and theater:

- Quarterly weight recalibration against actual churn outcomes
- Monthly playbook completion audit; CSMs with completion below 80% get coaching
- AUC reported to the board alongside GRR and NRR — make it a first-class metric
- Annual signal review: add at most one signal per year, retire at least one

## FAQ

### Should I weight expansion accounts differently from at-risk accounts?

No. **Use one score, four buckets, same weights across the book.** Two scores (a "retention score" and an "expansion score") double the CSM cognitive load, and Catalyst's 2026 customer cohort study found that teams running dual scores had **34% lower playbook adherence** than single-score teams. Expansion is gated by the 85+ Green band on the same score, not by a separate model.

### How do I score a brand-new account with no usage history?

Use a **90-day onboarding score** that weights differently: Onboarding Milestones 50%, Stakeholder Mapping 25%, Implementation Cadence 25%. Convert to the standard four-bucket score at day 91. Treat day-90 score below 70 as a major Yellow event — Bridge Group's 2026 onboarding study showed **accounts under 70 at day 90 churn at 3.8x the average**.
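
A minimal sketch of the onboarding variant, assuming each component is already scored 0-100 and that "day 91" is counted from contract start (both assumptions, as are the function names); the 50/25/25 weights and the 70 threshold come from the answer above.

```python
# Onboarding weights in percent (50/25/25).
ONBOARDING_WEIGHTS_PCT = {
    "onboarding_milestones": 50,
    "stakeholder_mapping": 25,
    "implementation_cadence": 25,
}


def onboarding_score(components: dict[str, float]) -> float:
    """Weighted onboarding score from three 0-100 component scores."""
    return sum(ONBOARDING_WEIGHTS_PCT[k] * components[k] for k in ONBOARDING_WEIGHTS_PCT) / 100


def scoring_model(days_since_start: int) -> str:
    """Onboarding model through day 90, standard four-bucket model from day 91."""
    return "onboarding" if days_since_start <= 90 else "standard"


def day90_yellow_event(score_at_day_90: float) -> bool:
    """Flag the account if it exits onboarding below 70."""
    return score_at_day_90 < 70


# Hypothetical new account at day 90.
new_account = onboarding_score(
    {"onboarding_milestones": 80, "stakeholder_mapping": 60, "implementation_cadence": 40}
)
```

This account scores 65 at day 90, so it is flagged before it switches to the standard model on day 91.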

### What about NRR as a health input?

Do not put NRR in the score. NRR is the **outcome**, not the input. Putting it in creates circular logic and overweights past expansion at the expense of forward risk. Report NRR alongside the score, not inside it.

### How often should the score refresh?

**Nightly is the default.** Real-time scoring sounds good but generates noise — a five-point swing on a Tuesday afternoon makes CSMs chase ghosts. Nightly batch with a weekly trend visible in the CSM workspace is the right cadence. Exception: accounts within 90 days of renewal score weekly.

### Should the customer ever see their own health score?

Almost never. Showing the raw score creates **gaming behavior on both sides**: the buyer asks why usage of a feature is "below benchmark" and the conversation becomes about the score instead of business outcomes. Share the **underlying signals** (usage trends, NPS, support performance) in QBRs. Keep the composite internal. The only exception: a strategic enterprise account where the CSM and exec sponsor have a true partnership and the score becomes a joint metric.

## Bottom Line

**Four buckets, three playbooks, one quarterly calibration loop.** Usage at 35%, Sentiment at 20%, Relationship at 20%, Commercial at 25%. Red below 40, Yellow from 40 to 70, Green above 70. Hard floors for exec departure, P1 escalations, and DSO 90+. Three playbooks: Yellow CSM task, Red manager escalation, Renewal-90 commercial review. Recalibrate quarterly against actual churn outcomes, target AUC 0.78. Tie CSM comp to score movement. Anything more complicated dies inside six months.

The 73% of health scores that fail to predict churn fail because they **were never recalibrated**, not because the math was wrong on day one. The math is the easy part. The discipline is the hard part.

## Sources

- **Gainsight 2025 Retention Research** — Sentiment-blended scores deliver 27% lower gross churn than usage-only models; health score blueprint defining behavioral, support, relationship, financial, and feedback signal categories.
- **Pavilion 2026 SaaS Operator Benchmark** — DAU/MAU above 20% correlates with 4.2x lower churn; median 2026 price increase 8.4%.
- **Bridge Group 2026 Onboarding Study** — Customers using fewer than 60% of contracted features churn at 3.1x rate; accounts under 70 at day 90 churn at 3.8x average.
- **OpenView 2026 Expansion Playbook** — DSO above 60 days correlates with 2.4x average churn; multi-year discount tiers at 8% / 12% for 2y / 3y commits.
- **SaaStr 2026 Retention Deep-Dive** — Executive sponsor turnover named #1 leading indicator of churn for ACV above $100k.
- **Gong + Clari Copilot product documentation** — LLM-graded call sentiment as first-class scoring input, $1,200-2,400 per seat per year pricing band.
- **Force Management 2026 Renewal-Risk Study** — Hard-floor signals (exec departure, P1 escalations, DSO 90+) account for ~60% of unforecasted enterprise churn; recovery rate from Red below 28% after 60 days.
- **RepVue 2026 CSM Compensation Report** — Median CSM OTE $128k (70/30 base/variable), variable tied to GRR/NRR/expansion bookings.
- **Vandfort 2026 Health Score Audit** — 73% of deployed health scores fail to predict churn at statistical significance (AUC below 0.65).
- **Vitally 2026 CS Leader Survey (n=312)** — Teams with more than five active playbooks had 41% lower completion rates than teams with three or fewer.
- **Catalyst 2026 Customer Cohort Study** — Teams running dual scores (retention + expansion) had 34% lower playbook adherence than single-score teams.
