How do I clean a CRM that has 5 years of bad data?

4/29/2024

Declare CRM bankruptcy. Archive deals older than 12 months as read-only, run a one-time 2-week cleanup on accounts with deals touched in the last 90 days, then adopt a forward-only data contract enforced by validation rules + enrichment webhooks. Total project: 4 weeks elapsed, $20K-$40K (mostly Apollo or ZoomInfo enrichment + 1 RevOps lead + 1 analyst, each at 0.5 FTE). The reason this beats a 6-month historical cleanup: per Validity's 2025 State of CRM Data Report, 91% of CRM data is incomplete, stale, or duplicated within 12 months, and Salesforce's own Data Quality whitepaper says ~30% of records become inaccurate per year regardless of cleanup effort. You cannot out-clean entropy. You can only out-govern it. When these projects fail, roughly two in three fail because teams skip the forward contract and just deduplicate; the rest fail because management refuses to enforce it. Plan accordingly.

Why traditional cleanup fails (the math)

Most RevOps teams attack the symptom: "let's deduplicate, fill empty fields, standardize names." The brutal arithmetic:

ROI is not just negative; it is *structurally* negative. The fix is not more cleaning. It is a new contract.
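A rough way to see the arithmetic, assuming the ~30% annual inaccuracy rate quoted above compounds evenly through the year. The compounding model and the function name are illustrative assumptions, not something taken from the cited reports.

```python
def fraction_still_accurate(months: float, annual_decay: float = 0.30) -> float:
    """Share of records cleaned at month 0 that are still accurate after `months`.

    Assumption: decay compounds smoothly at `annual_decay` per year.
    """
    return (1.0 - annual_decay) ** (months / 12.0)


if __name__ == "__main__":
    for months in (6, 12, 24, 36):
        share = fraction_still_accurate(months)
        print(f"{months:>2} months after cleanup: {share:.0%} still accurate")
```

Under this model, roughly a sixth of what you cleaned on day one is inaccurate again six months later, and about half is gone after two years. A one-time cleanup does not hold without a forward contract.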

The bankruptcy approach (real mechanics)

Step 1 - Define the active set (Day 1-2)

In Salesforce, create a custom field Active_Cohort__c (checkbox) and a scheduled flow that sets it = TRUE for any Account that meets the active-set criteria, at minimum a deal touched in the last 90 days.

Everything else is the Archive cohort. The Archive cohort moves to a separate Salesforce record type (Account_Archive) with read-only page layouts. You are not deleting anything; you are removing it from list views, reports, and dashboards. This single move usually shrinks the "active" surface by 70-85%.

Important Salesforce governor caveat: a single transaction can retrieve at most 50,000 rows through SOQL, so any job that tries to process more than 50K Account records in one transaction will hit that governor limit. Process the records in chunks of 200 instead, for example with Batch Apex (batch size 200) or a chain of Queueable Apex jobs. Do not learn this in production; learn it in a full-copy sandbox.
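The cohort split and the chunking idea can be sketched outside Salesforce. This is a minimal Python illustration, not Apex or Flow: the 90-day window and the chunk size of 200 come from the text, while the function names and the shape of the input data are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")

ACTIVE_WINDOW_DAYS = 90  # window taken from the text
CHUNK_SIZE = 200         # chunk size taken from the governor caveat


def is_active(last_deal_touch: Optional[date], today: date) -> bool:
    """True if the account had a deal touched within the active window."""
    if last_deal_touch is None:
        return False
    return (today - last_deal_touch) <= timedelta(days=ACTIVE_WINDOW_DAYS)


def chunks(items: Sequence[T], size: int = CHUNK_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    today = date(2024, 4, 29)
    # Hypothetical export: account id -> date of the last deal touch.
    last_touch = {"001A": date(2024, 4, 1), "001B": date(2022, 7, 15), "001C": None}
    active = [acc for acc, touched in last_touch.items() if is_active(touched, today)]
    print("active cohort:", active)
    print("chunks needed for 50,000 records:", len(list(chunks(range(50_000)))))
```

Each chunk would map to one unit of work (one batch execution) so that no single transaction touches the whole Account table.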

Step 2 - Deduplicate the active cohort (Week 1)

Use Salesforce Duplicate Management (setup docs). Create a Matching Rule with:

Then a Duplicate Rule that runs the matching rule on insert/update and fires a "Merge Required" alert. For existing dupes, run DemandTools or Cloudingo (both have free trial tiers). Master record selection rule: keep the record with (a) most recent Opportunity activity, (b) tie-break on most populated fields, (c) tie-break on oldest CreatedDate. Document this in a 1-page runbook so the merge decisions are defensible later.
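The master record selection rule is a three-level sort, which is easy to make explicit in the runbook. A minimal sketch in Python, assuming each duplicate has been exported with the three attributes below. The `AccountRecord` class and its field names are hypothetical; they are not DemandTools or Cloudingo settings.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass
class AccountRecord:
    record_id: str
    last_opportunity_activity: Optional[date]
    populated_field_count: int
    created_date: date


def pick_master(duplicates: Sequence[AccountRecord]) -> AccountRecord:
    """Return the surviving record for one duplicate group.

    Raises ValueError if the group is empty.
    """
    def sort_key(rec: AccountRecord):
        # Records with no opportunity activity sort behind any dated record.
        activity = rec.last_opportunity_activity or date.min
        return (
            -activity.toordinal(),          # (a) most recent Opportunity activity
            -rec.populated_field_count,     # (b) then most populated fields
            rec.created_date.toordinal(),   # (c) then oldest CreatedDate
        )

    return min(duplicates, key=sort_key)
```

Writing the rule as a single sort key keeps merge decisions reproducible: the same duplicate group always yields the same master, which is what makes them defensible later.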

Step 3 - Enrich, don't fill (Week 2)

Every active Account gets pushed through Apollo or ZoomInfo bulk enrichment API. Apollo's 2026 list price is roughly $0.10-$0.20/record at volume; ZoomInfo runs $0.40-$1.00. For 5,000 active accounts, budget $500-$5,000.
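The budget range follows directly from the per-record prices quoted above. A small sketch of the arithmetic, with prices held in cents to avoid floating-point noise; the function and constant names are illustrative.

```python
from typing import Tuple

# Per-record price ranges quoted in the text, in cents.
APOLLO_CENTS = (10, 20)
ZOOMINFO_CENTS = (40, 100)


def enrichment_budget_usd(records: int, price_cents: Tuple[int, int]) -> Tuple[float, float]:
    """Return the (low, high) enrichment cost in dollars for `records` accounts."""
    low, high = price_cents
    return (records * low / 100, records * high / 100)


if __name__ == "__main__":
    for vendor, price in (("Apollo", APOLLO_CENTS), ("ZoomInfo", ZOOMINFO_CENTS)):
        low, high = enrichment_budget_usd(5_000, price)
        print(f"{vendor}: ${low:,.0f}-${high:,.0f}")
```

For 5,000 accounts the Apollo range is the low end of the budget and the ZoomInfo range is the high end.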

Fields to enrich (non-negotiable): Industry, Employees, Revenue_Band__c, HQ_Country__c, Tech_Stack__c (if available). Do not try to fill Notes, Description, or any free-text field with enrichment data; those are human-entered context and dirty enrichment will pollute search.
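One way to keep enrichment away from free-text fields is an explicit allowlist applied before anything is written back. A minimal sketch, assuming accounts and vendor payloads are plain dictionaries keyed by field API name. Letting vendor data overwrite existing structured values is a design assumption here, not something the text specifies.

```python
from typing import Any, Dict

# Only the structured fields named in the text may be written by enrichment.
ENRICHABLE_FIELDS = {
    "Industry",
    "Employees",
    "Revenue_Band__c",
    "HQ_Country__c",
    "Tech_Stack__c",
}


def apply_enrichment(account: Dict[str, Any], enrichment: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `account` updated with allowlisted, non-empty vendor values."""
    updated = dict(account)
    for field, value in enrichment.items():
        if field in ENRICHABLE_FIELDS and value not in (None, ""):
            updated[field] = value
    return updated
```

Anything outside the allowlist, such as Notes or Description, is silently dropped from the payload, so human-entered context is never overwritten.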

Step 4 - Forward data contract (Week 3-4)

This is the only step that compounds. Implement four hard validation rules in Salesforce (or HubSpot equivalent):

  1. Lead/Contact creation: the record saves only if (Email != null AND Email NOT LIKE '%@gmail.com' AND Email NOT LIKE '%@yahoo.com') OR Source = 'Inbound Demo'. This blocks free webmail on B2B lead intake while still letting inbound demo requests through.
  2. Opportunity creation: 5 required fields enforced by validation rule - Account, Contact_Role__c, Industry__c, Stage, ACV__c. No exceptions.
  3. Stage advancement: Rule blocks Stage > 'Discovery' if Decision_Maker_Identified__c = FALSE. Same for Stage > 'Proposal' if Mutual_Action_Plan__c = NULL.
  4. Closed-Won handoff: Apex trigger (OpportunityHandoffTrigger) requires CSM checkbox CS_Handoff_Verified__c = TRUE before stage flips to 'Closed Won'. The trigger throws addError('CS handoff verification required before close') if false. CSM has 5 fields to verify; if wrong, they un-check and email the rep. Code lives in version control, deployed via SFDX/CI, never edited in production.

Plus one webhook: Apollo enrichment fires on Lead/Account creation (Zapier or native Salesforce flow with HTTP callout). New record gets industry/size in <2 minutes. Zero rep effort.
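The four rules above can be expressed as plain predicates to agree on the logic before anyone builds it in Salesforce. This is a Python sketch of the intended behavior, not validation-rule formula syntax or Apex. The stage order is hypothetical (only 'Discovery', 'Proposal' and 'Closed Won' are named in the text), and records are assumed to be dictionaries keyed by field API name.

```python
from typing import Dict, List, Optional

FREE_WEBMAIL_DOMAINS = ("@gmail.com", "@yahoo.com")
# Hypothetical stage order; replace with the org's real pipeline.
STAGE_ORDER = ["Prospecting", "Discovery", "Proposal", "Negotiation", "Closed Won"]
REQUIRED_OPP_FIELDS = ["Account", "Contact_Role__c", "Industry__c", "Stage", "ACV__c"]


def lead_errors(email: Optional[str], source: Optional[str]) -> List[str]:
    """Rule 1: block missing or free-webmail emails unless the lead is an inbound demo."""
    if source == "Inbound Demo":
        return []
    if not email:
        return ["Email is required"]
    if email.lower().endswith(FREE_WEBMAIL_DOMAINS):
        return ["Free webmail addresses are blocked on B2B lead intake"]
    return []


def opportunity_errors(opp: Dict[str, object]) -> List[str]:
    """Rules 2-4: required fields, stage gates, and the Closed Won handoff check."""
    errors = [f"{name} is required" for name in REQUIRED_OPP_FIELDS
              if opp.get(name) in (None, "")]
    stage = opp.get("Stage")
    rank = STAGE_ORDER.index(stage) if stage in STAGE_ORDER else -1
    if rank > STAGE_ORDER.index("Discovery") and not opp.get("Decision_Maker_Identified__c"):
        errors.append("Decision maker must be identified before advancing past Discovery")
    if rank > STAGE_ORDER.index("Proposal") and not opp.get("Mutual_Action_Plan__c"):
        errors.append("Mutual action plan is required before advancing past Proposal")
    if stage == "Closed Won" and not opp.get("CS_Handoff_Verified__c"):
        errors.append("CS handoff verification required before close")
    return errors
```

A table of sample records run through these functions doubles as the sandbox test plan: every row that should be blocked must return at least one error.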

Cleanup project scope (the full 4-week plan)

| Week | Workstream | Owner | Effort | Deliverable |
|---|---|---|---|---|
| 1 | Define Active cohort, archive rest | RevOps Lead | 5 days | Active_Cohort__c flag populated; Archive record type live |
| 1-2 | Deduplicate active accounts (DemandTools/Cloudingo) | Data Analyst | 8 days | <2% dupe rate on active cohort |
| 2 | Enrich active cohort (Apollo/ZoomInfo bulk) | Data Analyst | 3 days | 95%+ fill on Industry/Employees/Revenue |
| 3 | Build 4 validation rules + 1 enrichment webhook | RevOps + Salesforce Admin | 5 days | Rules in production, tested in sandbox first |
| 3-4 | Train reps + managers on new contract | RevOps Lead | 2 days | 30-min training, written 1-pager, manager sign-off |
| 4 | Run first monthly audit, calibrate rules | RevOps Lead | 2 days | Audit report, 3-5 rule tweaks |

Budget: $20K-$40K all-in. Apollo enrichment $5K. DemandTools/Cloudingo $3K-$8K (3-month license). 0.5 FTE RevOps lead for 4 weeks (~$15K loaded). 0.5 FTE data analyst (~$8K). No external vendor required.
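As a sanity check, the itemized figures can be summed against the headline range. The line items and amounts are the ones listed above; only the helper name is made up.

```python
from typing import Dict, Tuple

# (low, high) in USD, as itemized in the budget line.
LINE_ITEMS_USD: Dict[str, Tuple[int, int]] = {
    "Apollo enrichment": (5_000, 5_000),
    "DemandTools/Cloudingo (3-month license)": (3_000, 8_000),
    "RevOps lead, 0.5 FTE for 4 weeks": (15_000, 15_000),
    "Data analyst, 0.5 FTE": (8_000, 8_000),
}


def total_range(items: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the low and high ends of every line item."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


if __name__ == "__main__":
    low, high = total_range(LINE_ITEMS_USD)
    print(f"Itemized total: ${low:,}-${high:,}")
```

The itemized total lands at $31K-$36K, inside the quoted $20K-$40K range.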

Governance rules (the part that compounds)

  1. Inbound prospect: Auto-enriched within 2 hours via Apollo/ZoomInfo webhook. Rep cannot create a Lead without an email domain that resolves to a company record.
  2. Outbound prospect: Rep clicks "Add to Salesforce" from Apollo/Outreach plugin (not manual entry). The plugin fills 80% of fields; rep verifies in 30 seconds.
  3. Deal creation: 5 required fields enforced at the database layer (Account, Contact Role, Industry, Stage, ACV). Validation rule fires on save. No "I'll fix it later."
  4. Deal close: Apex trigger requires CS handoff checkbox. CSM verifies 5 fields (contact email, company legal name, billing country, ACV, contract end date) before stage flips. If wrong, CSM unchecks; rep gets a Slack ping.
  5. Monthly audit: Each manager runs a saved report on their team's 10 oldest open opps. Wrong/missing fields = coaching. Two violations in 90 days = comp accelerator suspended for the quarter. Public, predictable, applied uniformly.

This is not "best practices"; it is forcing the cost of bad data onto the person who created it, in real time. That is the only mechanism that works.
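The consequence in rule 5 only reads as "public, predictable, applied uniformly" if it is computed the same way for everyone. A minimal sketch of the two-violations-in-90-days check, assuming audit violations are logged as dates per rep. The function name and the rolling-window interpretation are assumptions.

```python
from datetime import date, timedelta
from typing import Iterable


def accelerator_suspended(
    violation_dates: Iterable[date],
    as_of: date,
    window_days: int = 90,
    threshold: int = 2,
) -> bool:
    """True if the rep logged `threshold` or more violations in the trailing window."""
    window_start = as_of - timedelta(days=window_days)
    recent = [d for d in violation_dates if window_start <= d <= as_of]
    return len(recent) >= threshold
```

Run it from the monthly audit report rather than by hand, so the outcome never depends on which manager is doing the review.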

Tools (with real prices)

*HubSpot equivalents in parentheses where applicable. Most of this approach maps 1:1 to HubSpot Operations Hub Enterprise.*

Bear Case (when bankruptcy fails)

This approach is wrong in three scenarios:

  1. You are in a regulated industry where archived records still need active monitoring - e.g., pharma deal records under HIPAA, financial services under FINRA, or EU customers under GDPR retention rules. Archiving without active governance can violate retention policy. Fix: archive to a compliant cold-storage system (Salesforce Big Objects, S3 with Glacier, or your records-management platform) with an audit log, not just a record-type change. SOC2/SOX auditors will ask for the audit log on day 1; if you cannot produce one, you have a finding.
  2. Your old data is your moat - if you sell to the same accounts repeatedly (renewals + expansion at >70% of revenue), the historical context (champions who left, product complaints, deal stalls) is institutional knowledge. Archiving it kills the AE's first-call advantage. Fix: keep the *contact* and *activity* history hot for any account that ever signed a contract; only archive cold prospects. Quick math: if 70% of next year's revenue comes from existing accounts and the average AE saves 30 minutes of research per renewal call by having the history hot, on a 200-rep org with 10 renewal calls per rep per quarter that is 200 × 10 × 30 = 60,000 minutes per quarter, or 240,000 minutes a year (~$480K of fully-loaded rep time at roughly $120/hour) - far more than the cleanup budget.
  3. Your reps will revolt and game the validation rules - if the org has weak management, reps will enter placeholder values to get past the rules. Common gaming patterns: Industry = "Other" on 60% of new accounts, ACV = 1 placeholder records that get fixed "later" (never), copy-pasted Decision_Maker_Identified flags with no actual person named, and Mutual_Action_Plan = "TBD" text. The rules become noise, the data gets worse, trust collapses. Fix: do not roll out validation rules without a 90-day audit + consequence plan. If your sales leaders won't enforce, do not start. Rules without enforcement are negative-value.

If any of those three apply, do not declare bankruptcy. Do a slower, surgical cleanup with a vendor (Cloudingo + a 2-person consulting team for 8 weeks, $60K-$100K), and skip the validation rules until you have management buy-in.
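For the "old data is your moat" scenario, it helps to multiply out the inputs quoted in scenario 2 (200 reps, 10 renewal calls per rep per quarter, 30 minutes saved per call) before deciding. The loaded hourly rate below is a hypothetical placeholder; substitute your own.

```python
REPS = 200
RENEWAL_CALLS_PER_REP_PER_QUARTER = 10
MINUTES_SAVED_PER_CALL = 30
LOADED_RATE_USD_PER_HOUR = 120  # hypothetical fully-loaded rep cost


def minutes_saved_per_quarter() -> int:
    """Research minutes saved per quarter by keeping account history hot."""
    return REPS * RENEWAL_CALLS_PER_REP_PER_QUARTER * MINUTES_SAVED_PER_CALL


def annual_value_usd() -> float:
    """Dollar value of the saved time over four quarters."""
    hours_per_year = minutes_saved_per_quarter() * 4 / 60
    return hours_per_year * LOADED_RATE_USD_PER_HOUR


if __name__ == "__main__":
    print(f"{minutes_saved_per_quarter():,} minutes per quarter")
    print(f"${annual_value_usd():,.0f} per year")
```

If the annual figure exceeds the cleanup budget, keep contact and activity history hot for every account that has ever signed, and archive only cold prospects.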

Success metrics (90 days post-cleanup)

What NOT to do

What 10/10 success looks like at month 6

Action (this week)

  1. Pull a count of accounts where LastActivity > TODAY() - 90 (in SOQL: SELECT COUNT() FROM Account WHERE LastActivityDate = LAST_N_DAYS:90). That is your active cohort size.
  2. Run a duplicate report: SELECT Name, Website, COUNT(Id) FROM Account GROUP BY Name, Website HAVING COUNT(Id) > 1. That is your dupe baseline.
  3. Quote Apollo or ZoomInfo for bulk enrichment of N records (where N = active cohort size).
  4. Block 4 weeks on the calendar. Get CRO sponsor sign-off in writing. Start Monday of week 1.
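If the account list is easier to export than to query, the dupe baseline from step 2 can be reproduced offline. A minimal sketch that groups an export by Name and Website. The lowercase-and-trim normalization is an added assumption and may merge records that should stay separate; the dictionary keys mirror the SOQL field names.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional, Tuple

Key = Tuple[str, str]


def duplicate_groups(accounts: Iterable[Mapping[str, Optional[str]]]) -> Dict[Key, int]:
    """Return {(name, website): count} for every key that appears more than once."""
    counts: Counter = Counter()
    for acc in accounts:
        name = (acc.get("Name") or "").strip().lower()
        website = (acc.get("Website") or "").strip().lower()
        counts[(name, website)] += 1
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    export = [
        {"Name": "Acme Corp", "Website": "acme.com"},
        {"Name": "acme corp ", "Website": "ACME.com"},
        {"Name": "Globex", "Website": "globex.com"},
    ]
    print(duplicate_groups(export))
```

The number of groups returned, together with their sizes, is the baseline to compare against the <2% dupe-rate target after week 2.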

Related reading: see /knowledge/q110 for stack selection (Outreach/Salesloft/Apollo), /knowledge/q112 for attribution implications of clean data, /knowledge/q115 for the org chart that owns this work, /knowledge/q300 for how clean pipeline data shows up in forecast reliability, and /knowledge/q250 for what CSM notes + product usage data should feed back into the CRM post-handoff.

flowchart LR
    A[5 Years Bad Data] --> B[Define Active Cohort]
    B --> C[Archive Rest Read-Only]
    C --> D[Deduplicate Active]
    D --> E[Bulk Enrich via Apollo]
    E --> F[4 Validation Rules]
    F --> G[Enrichment Webhook]
    G --> H{30 Days In}
    H -->|Data Quality| I[95%+ Complete]
    H -->|Rep Hours| J[CRM Time Halved]
    H -->|Forecast| K[Variance -200bps]
    I --> L[Trust Restored]
    J --> L
    K --> L

TAGS: crm-cleanup, data-quality, salesforce-dedup, data-governance, database-maintenance, validity-demandtools, apollo-enrichment
