# How should Datadog rethink its observability thesis for AI buyers?
## Direct Answer
Datadog's classic thesis — "unified observability for cloud-native apps" — was the right answer for the 2015-2024 buyer who lived inside Kubernetes, microservices, and SRE-led incident workflows. The 2026 AI buyer is a different persona entirely: an AI/ML platform lead or Chief AI Officer who measures success in cost-per-token, hallucination rate, agent task-success, and time-to-resolution by an autonomous agent — none of which fit a host-priced infrastructure-observability SKU. Datadog needs to evolve from "unified observability platform" to "AI workload observability + agent telemetry platform," with four mental shifts: (1) the unit of observability is the agent task or LLM call, not the host or container; (2) the buyer is the AI platform lead or CAIO, not the SRE manager; (3) the pricing primitive is per-token, per-trace, per-resolved-task, not per-host; and (4) Bits AI is repositioned from "AI assistant inside Datadog" to "the agent layer that observes other agents." The one narrative bet that risks the moat is whether Pomel can credibly claim "AI-native observability platform" before Arize, LangSmith, Helicone, or Honeycomb consolidates the AI-buyer cohort and Datadog becomes the trailing incumbent — the same pattern that hit New Relic in cloud-native observability circa 2017.
## The Old Observability Thesis
- "Unified observability for cloud-native apps" — metrics, logs, traces, RUM, synthetics in one agent, one bill, one UI. Won the 2015-2024 cycle decisively.
- Cloud-native data model — every signal tagged with host, container, service, env. The mental model was: an SRE pages, opens a dashboard, drills to a trace, finds the bad deploy.
- Named legacy customers — Samsung, Nasdaq, Peloton, Whole Foods, Comcast, Citrix, Deliveroo; buyers who adopted Datadog to consolidate New Relic + Splunk + bespoke Prometheus stacks.
- Per-host + ingest-volume pricing — roughly $15/host/mo for infrastructure up to $31/host/mo for APM, plus ~$0.10/GB log ingest. Anchored the entire SaaS observability category for a decade.
- SRE-manager buying center — the typical deal sponsor was a Director of SRE or VP Platform Engineering with a $500K-$5M observability budget line.
## What AI Buyers Actually Want
- Agent-aware observability — visibility into multi-step agent traces (plan → tool call → reflection → tool call → response), not just HTTP spans. Tool calls, retries, agent handoffs, sub-agent spawns must be first-class spans.
- LLM monitoring — per-call token usage, latency, model version drift, prompt/response logging, embedding-quality drift, cost attribution by team/feature/customer.
- Cost-per-token attribution — "who burned $400K on GPT-5 last month" answered by team, feature, customer cohort, agent — without the AI team having to hand-build it in BigQuery.
- Hallucination rate tracking — automated evals on production traffic (faithfulness, groundedness, factuality) with alerts when scores drift, plus human-in-the-loop sampling for high-stakes calls.
- Agent task-success telemetry — was the agent's task actually completed? Did the user re-prompt? Did the human override? This is the real SLO of the AI era.
- Prompt + RAG observability — versioned prompts, retrieval hit-rate, retrieval relevance scores, vector-store latency, chunking-strategy A/B telemetry.
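To ground the wish list above, here is a minimal sketch of what agent-aware spans with per-call token-cost attribution could look like. All names, the `Span` shape, and the per-1K-token prices are illustrative assumptions for this sketch, not Datadog's actual data model or any vendor's published pricing:

```python
from dataclasses import dataclass, field

# Hypothetical per-1K-token prices; real prices vary by model and vendor.
PRICE_PER_1K = {"gpt-5": {"in": 0.01, "out": 0.03}}

@dataclass
class Span:
    name: str          # e.g. "plan", "tool_call:search_orders", "respond"
    kind: str          # "agent_step" or "llm_call" -- first-class, not a generic HTTP span
    tags: dict = field(default_factory=dict)
    tokens_in: int = 0
    tokens_out: int = 0
    model: str = ""
    children: list = field(default_factory=list)

    def cost(self) -> float:
        """Dollar cost of this span plus all child spans."""
        own = 0.0
        if self.kind == "llm_call":
            p = PRICE_PER_1K[self.model]
            own = self.tokens_in / 1000 * p["in"] + self.tokens_out / 1000 * p["out"]
        return own + sum(c.cost() for c in self.children)

# One agent task: plan -> tool call -> reflection -> response,
# tagged so cost rolls up by team and feature without a hand-built BigQuery job.
task = Span(
    "agent_task", "agent_step",
    tags={"team": "support-bot", "feature": "refunds"},
    children=[
        Span("plan", "llm_call", model="gpt-5", tokens_in=800, tokens_out=200),
        Span("tool_call:search_orders", "agent_step"),
        Span("reflect", "llm_call", model="gpt-5", tokens_in=1200, tokens_out=150),
        Span("respond", "llm_call", model="gpt-5", tokens_in=2000, tokens_out=400),
    ],
)

print(f"task cost: ${task.cost():.4f}")  # -> task cost: $0.0625
```

The point of the sketch is the shape, not the numbers: tool calls and agent steps are peer spans of LLM calls, and cost attribution falls out of the trace tree plus tags rather than a separate billing pipeline.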
## The 4 Mental Shifts Datadog Needs
- Unit of observability — from host/container to agent task + LLM call. Spans must be agent-aware. The flame graph of 2027 is a multi-step agent trace, not a microservice request trace.
- Buyer persona — from SRE Director to AI Platform Lead, Head of ML Platform, or Chief AI Officer. Different budget, different vocabulary, different procurement cycle (faster, more experimental).
- Pricing primitive — from per-host + per-GB to per-token, per-trace, per-resolved-task. Per-host is psychologically dead in serverless agent runtimes where there is no host.
- Bits AI repositioning — from "AI assistant that helps SREs query Datadog" to "the meta-agent that observes other agents" — the resolver layer that watches Cursor agents, GitHub Copilot agents, internal LangGraph agents, and resolves their failures.
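The pricing-primitive shift can be made concrete with a back-of-envelope comparison. Every rate below is an illustrative assumption, not Datadog list pricing; the scenario is a serverless agent workload where the per-host meter literally reads zero:

```python
# Illustrative rates -- assumptions for this sketch, not published Datadog pricing.
PER_HOST_MO = 31.0          # classic APM-style per-host rate
PER_M_TOKENS = 2.50         # observability fee per million tokens observed
PER_RESOLVED_TASK = 0.02    # fee per agent task monitored/resolved

# A serverless agent runtime: no durable hosts to meter.
hosts = 0
tokens_mo = 1_200_000_000   # 1.2B tokens/month across all agents
resolved_tasks_mo = 500_000

host_bill = hosts * PER_HOST_MO  # $0 -- the old primitive captures nothing
usage_bill = (tokens_mo / 1e6) * PER_M_TOKENS + resolved_tasks_mo * PER_RESOLVED_TASK

print(f"per-host bill:  ${host_bill:,.2f}")   # -> per-host bill:  $0.00
print(f"per-usage bill: ${usage_bill:,.2f}")  # -> per-usage bill: $13,000.00
```

A workload that is invisible to per-host pricing still represents five figures a month under usage-based primitives, which is the whole argument for the SKU change.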
## The 1 Narrative Bet That Risks the Moat
- The bet: Pomel publicly claims "Datadog is the AI-native observability platform" at the next analyst day and reorgs the company around it (LLM Obs as flagship, Bits AI as primary GTM motion, AI buyer as named target persona).
- The risk: if the claim isn't backed by 12-18 months of credible AI-native product depth (per-token pricing, agent-trace primitives, evals platform, RAG observability), Arize, LangSmith, Helicone, or Honeycomb sticks the "AI-native" label first and Datadog becomes the trailing incumbent.
- Historical precedent: New Relic claimed the cloud-native mantle around 2017 but didn't reorg fast enough, and Datadog ate the category. Datadog now risks being the New Relic of the AI cycle.
- Why the moat is at risk: the AI buyer cohort is forming right now (2026) and forms loyalty in 12-18 months. Whoever owns the AI/ML platform team's first observability tool owns the next decade of expansion — the same dynamic that won Datadog the Kubernetes cohort in 2017.
- The asymmetry: making the bet and being wrong costs $200-400M in misallocated R&D + a confused installed base. Not making the bet and being wrong costs the next $20B of TAM. The expected value is overwhelmingly to make the bet.
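The asymmetry argument above reduces to a simple expected-value calculation. The probability below is an assumption chosen for illustration (the text itself assigns no probability), and the dollar figures are the ranges cited above:

```python
# Illustrative EV sketch; p_ai_shift is an assumed probability, not from the text.
p_ai_shift = 0.6            # assumed chance the AI-buyer shift is real and durable
cost_bet_wrong = 300e6      # midpoint of the $200-400M misallocated-R&D range
tam_lost_if_skip = 20e9     # the next $20B of TAM cited above

# Downside of betting lands only if the shift is NOT real;
# downside of skipping lands only if the shift IS real.
ev_make_bet = -(1 - p_ai_shift) * cost_bet_wrong
ev_skip_bet = -p_ai_shift * tam_lost_if_skip

print(f"EV(make bet): -${-ev_make_bet/1e6:,.0f}M")  # -> EV(make bet): -$120M
print(f"EV(skip bet): -${-ev_skip_bet/1e9:,.1f}B")  # -> EV(skip bet): -$12.0B
```

Under almost any plausible probability, the expected downside of hesitating is two orders of magnitude larger than the expected downside of betting, which is the asymmetry the bullet describes.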
## What Pomel's Pitch Should Sound Like
- Open with: "Observability was built for humans reading dashboards. In 2026, the consumer of observability data is an agent, not a human — and the workload being observed is an agent, not a microservice."
- Reframe the category: "We are no longer in the APM business. We are in the AI workload reliability business — and the AI workload is the agent."
- Name the persona: "Our buyer used to be the VP of SRE. In 2027 it is the Chief AI Officer — and we are building the platform the CAIO trusts to keep their agents from hallucinating, overspending, or silently failing."
- Make Bits AI the hero, not the sidekick: "Bits AI is not a feature inside Datadog. Bits AI is Datadog's interface to the agent era — the meta-agent that observes, debugs, and resolves the agents your company runs."
- Close with a number: "By FY28, 40% of Datadog revenue will come from AI workload observability — LLM Obs, agent telemetry, evals, and RAG observability — and it will be growing 60%+ YoY."
## What Has to Change Operationally
- LLM Observability becomes the flagship product — not a sub-page under APM. Top-nav placement, dedicated PMM, dedicated sales overlay, dedicated analyst-day session, dedicated pricing page.
- Bits AI becomes the primary GTM motion — every Datadog demo opens with Bits resolving an agent failure, not with a flame graph. Bits is the wedge, not the upsell.
- Named acquisition target — Helicone or Arize — Helicone ($30-80M range) gives proxy-layer LLM observability + dev-loved brand; Arize ($300-600M range) gives ML/LLM evals depth + enterprise AI cohort. Pick one, close it in FY26.
- Reorg around the AI buyer — create an AI Workload Observability business unit with its own GM, P&L, and quota carriers. Stop selling LLM Obs as a Datadog APM cross-sell — sell it as a standalone wedge to the CAIO.
- New pricing page — per-trace, per-token, per-resolved-task SKUs published publicly, with per-host pricing buried under "infrastructure." This reverses the visual hierarchy of who Datadog sells to.
- Hire 200+ AI-native field engineers + AI solutions architects in FY26-FY27 — the SRE-trained Datadog SE cannot credibly demo agent observability to a CAIO. Rebuild the field for the new buyer.
## Old Thesis vs New Thesis
| Dimension | Old Thesis (2015-2024) | New Thesis (2026-2030) | Mental Shift Required | Implementation Cost | Risk | Timeline |
|---|---|---|---|---|---|---|
| Tagline | Unified observability for cloud-native apps | AI workload observability + agent telemetry platform | Category redefinition | $80-120M marketing + analyst | Confuses installed base | FY26 H2 |
| Unit | Host, container, service | Agent task, LLM call, tool invocation | Data-model rebuild | $150-250M R&D | Backwards-compat debt | FY26-FY27 |
| Buyer | VP SRE, Director Platform | Chief AI Officer, Head of ML Platform | Field reorg + PMM rebuild | $60-100M field hiring | SE skill gap | FY26 H1 |
| Pricing | Per-host, per-GB | Per-token, per-trace, per-resolved-task | Pricing-page rewrite | $20-40M packaging | Discount cannibalization | FY26 Q3 |
| Hero product | APM | LLM Observability + Bits AI | Top-nav reorder | $40-80M product polish | APM team morale | FY26 H2 |
| Wedge | Free trial → APM expand | Bits AI agent resolution → LLM Obs land | GTM motion rewrite | $30-60M sales enablement | Ramp gap | FY27 H1 |
| M&A | Tuck-in security/CI tools | Helicone or Arize | CEO + Corp Dev focus | $30-600M deal | Integration debt | FY26 |
| Narrative bet | Consolidation play | AI-native observability platform | Pomel public reframe | Reputational | Trailing-incumbent label | Analyst Day FY26 |
## Bottom Line
The classic Datadog thesis won the cloud-native decade and is still printing money — but the AI buyer is a different persona with a different unit of work, a different budget owner, and a different pricing instinct. Datadog has roughly 12-18 months to make four mental shifts (agent-task as the unit, CAIO as the buyer, per-token as the pricing primitive, Bits AI as the meta-agent) and one narrative bet (Pomel publicly reframes Datadog as the AI-native observability platform, backed by LLM Obs as flagship + a Helicone or Arize acquisition). Make the bet and Datadog owns the next $20B of TAM. Hesitate and Datadog becomes the New Relic of the AI cycle — still big, still profitable, but no longer the default answer when the next category-defining buyer signs their first observability contract.
Related: [q1674](/lab/cheap-100/q1674) (Datadog AI strategy) · [q1675](/lab/cheap-100/q1675) (Datadog growth thesis) · [q1683](/lab/cheap-100/q1683) (Datadog APM stagnation)