How does Snowflake handle the cost of Anthropic + OpenAI inference at scale?
Direct Answer
Based on public list pricing as of Q2 2026, Snowflake Cortex passes roughly 80-90% of partner-model inference cost straight through to customer credit consumption, retaining an estimated 10-20% margin on the orchestration, governance, and serverless compute layer that wraps the call. The model providers (Anthropic, OpenAI, Mistral, Meta, plus Snowflake's own Arctic) are paid per token via either direct contract or AWS Bedrock passthrough; Snowflake then converts that token cost into a credit charge billed at the customer's negotiated credit rate (typically $2-4/credit depending on edition). As inference volume scales, Snowflake protects margin through four levers: (1) negotiated enterprise volume tiers with Anthropic and OpenAI that beat published list pricing, (2) a Cortex routing layer that steers low-stakes calls to cheaper models when latency and quality allow, (3) the Snowflake Arctic SLM for high-volume low-stakes workloads, where the model cost is essentially internal compute, and (4) customer-side budget guardrails that throttle runaway spend before it becomes a margin event. Actual contract pricing varies materially by customer, and Bedrock passthrough fees are not always itemized publicly, so all figures below are approximations from list pricing.
The Inference Cost Stack
- Model API cost (the raw token bill) — Anthropic Claude Opus 4 at ~$15 input / $75 output per 1M tokens list, Sonnet 4 at ~$3/$15, Haiku 4.5 at ~$1/$5. OpenAI GPT-5 in a similar Opus-class band, o3 priced as a reasoning premium tier, o4-mini as the cheap workhorse. This is the floor — Snowflake cannot price below it without subsidizing.
- AWS Bedrock passthrough fee — when Cortex routes Anthropic models via Bedrock, AWS takes a cut on top of Anthropic's wholesale rate. Not itemized on Snowflake invoices; bundled into the Cortex per-token credit conversion. Estimated low single-digit % uplift based on Bedrock list vs. Anthropic direct list.
- Snowflake compute fee on top — the serverless warehouse that runs the Cortex function (CORTEX.COMPLETE, CORTEX.SEARCH, etc.) burns standard credits for orchestration, embedding lookup, result assembly. Small per-call but compounds on agent loops.
- Customer-side credit consumption rate — the customer's effective $/credit (Standard ~$2, Enterprise ~$3, Business Critical ~$4) determines the final invoice line. Same query, different credit rate, different revenue to Snowflake on identical underlying token cost.
- Governance + observability tax — RBAC checks, prompt logging, PII masking, Horizon catalog hits all add micro-credits per call. Negligible per query, real at billions of queries/month.
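The credit-rate effect above, same call, same credit burn, different invoice, can be sketched in a few lines. The token-to-credit ratio below is a hypothetical placeholder chosen to land near the worked example in the next section, not Snowflake's published Cortex conversion table:

```python
# Sketch: a Cortex call burns a fixed number of credits, but the invoice
# line scales with the customer's negotiated $/credit by edition.
# CREDITS_PER_1M_TOKENS is a placeholder assumption, not a published ratio.

CREDITS_PER_1M_TOKENS = 11.0   # hypothetical Opus-class conversion
EDITION_RATE = {               # approximate $/credit from the bullet above
    "standard": 2.0,
    "enterprise": 3.0,
    "business_critical": 4.0,
}

def invoice_line(tokens, edition):
    """USD invoice line for one call at a given edition's credit rate."""
    credits = tokens / 1e6 * CREDITS_PER_1M_TOKENS
    return round(credits * EDITION_RATE[edition], 2)

for edition in EDITION_RATE:
    print(edition, invoice_line(1_000_000, edition))
```

Identical underlying token cost, three different revenue outcomes: $22, $33, and $44 for the same 1M-token call.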
The Margin Math On A 1M-Token Cortex Query (Claude Opus 4 example)
*All figures approximations from public list pricing — actual contract pricing varies.*
- Raw Anthropic cost — ~$15/1M input + ~$75/1M output; a query with 1M tokens in and 1M tokens out would cost ~$90 at list, while a more realistic 800k input / 200k output mix lands around ~$27 (0.8 × $15 + 0.2 × $75 = $12 + $15).
- Bedrock passthrough uplift — estimated low single-digit %, call it ~$0.50-$1.50 added on top.
- Snowflake serverless compute — sub-$1 of credits on the orchestration warehouse for a single large call.
- Customer credit charge — Cortex per-token credit conversion priced to recover the above plus margin; on Enterprise edition ($3/credit), the customer invoice line for that query lands in the ~$32-38 range based on published Cortex token-to-credit ratios.
- Effective Snowflake gross margin on the orchestration layer — estimated ~10-20% of the line item, with the bulk going to Anthropic/AWS. The margin profile is structurally thinner than Snowflake's traditional ~75% storage/compute gross margin, which is the central Cortex unit-economics tension.
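The stack above can be reproduced as a small per-call cost function. Every rate, the Bedrock uplift, the orchestration charge, and the margin figure are the approximations from this section, not contract or published numbers:

```python
# Sketch of the per-call cost stack worked above. All default parameters
# are the approximations from this section, not contract pricing.

def cortex_call_cost(input_tokens, output_tokens,
                     in_rate=15.0, out_rate=75.0,  # Opus-class list, $/1M tokens
                     bedrock_uplift=0.03,          # assumed low single-digit %
                     orchestration_usd=0.50,       # assumed serverless credit burn
                     margin=0.15):                 # assumed orchestration margin
    """Return (provider_cost, customer_charge) in USD for one call."""
    raw = (input_tokens * in_rate + output_tokens * out_rate) / 1e6
    provider = raw * (1 + bedrock_uplift)          # Anthropic + Bedrock cut
    customer = (provider + orchestration_usd) / (1 - margin)
    return round(provider, 2), round(customer, 2)

# The 800k-in / 200k-out mix from above: ~$27 raw, invoice in the low $30s.
print(cortex_call_cost(800_000, 200_000))  # → (27.81, 33.31)
```

On Enterprise edition at ~$3/credit, that ~$33 invoice line is ~11 credits, consistent with the ~$32-38 range quoted above.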
Where The Margin Pressure Lives
- Long-context queries on Opus 4 — 200k-token context windows mean a single document-Q&A call can burn $3-5 in raw token cost (200k input tokens at ~$15/1M is ~$3 before any output tokens are billed). Margin-thin even before the Bedrock uplift.
- Multi-step agent loops — Cortex Agents that chain 5-15 model calls per user request multiply the token bill linearly. One "agent question" can equal 20 chat completions.
- RAG over large warehouses — Cortex Search retrieves chunks, then stuffs them into the prompt; large k-values (top-50 retrieval) inflate input tokens 10-50x vs. naive prompting.
- The customer learning curve — new Cortex customers over-prompt for the first 60-90 days (no system prompts, no caching, max-context dumps), then optimize. Snowflake eats variable margin during the un-optimized window.
- Prompt caching adoption gap — Anthropic prompt caching cuts repeat-input cost ~90%, but customer adoption inside Cortex lags direct-API users; until caching is on by default, repeat RAG prompts are paid full freight.
- Reasoning-model premium tiers — o3 / Opus-extended-thinking calls bill output tokens at the reasoning rate, which can 3-5x a normal output bill on a single complex query.
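Two of the pressure points above, agent-loop multiplication and the caching adoption gap, compound on the same bill. A rough sketch under assumed Sonnet-class list rates and the ~90% cached-input discount cited above:

```python
# Sketch: how agent loops and prompt caching move the token bill.
# Rates are Sonnet-class list approximations; the 90% discount on repeat
# input follows the caching figure cited above. Illustrative only.

IN_RATE, OUT_RATE = 3.0, 15.0   # $/1M tokens, approximate list
CACHE_DISCOUNT = 0.90           # repeat input billed at ~10% of list

def agent_request_cost(steps, in_tok, out_tok, cached_frac=0.0):
    """Raw token bill (USD) for one request chaining `steps` model calls."""
    cost = 0.0
    for step in range(steps):
        # The first call always pays full freight; later calls can reuse
        # cached_frac of their input at the discounted cache-read rate.
        cached = in_tok * cached_frac if step > 0 else 0
        fresh = in_tok - cached
        cost += (fresh * IN_RATE
                 + cached * IN_RATE * (1 - CACHE_DISCOUNT)
                 + out_tok * OUT_RATE) / 1e6
    return round(cost, 2)

# A 10-step RAG agent, 50k input / 2k output per step:
print(agent_request_cost(10, 50_000, 2_000))                   # uncached
print(agent_request_cost(10, 50_000, 2_000, cached_frac=0.8))  # 80% cached
```

The uncached loop costs 10x a single call; caching 80% of the repeat prompt cuts the bill by more than half, which is why the adoption gap is a real margin line.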
The 4 Margin-Protection Levers
- Negotiated volume pricing with Anthropic / OpenAI — Snowflake is among the largest enterprise buyers of both Anthropic and OpenAI inference; enterprise tier discounts off published list are widely assumed (not publicly disclosed). The delta between list and Snowflake's wholesale is the structural margin floor.
- Cortex routing layer (cheap-model-first then escalate) — CORTEX.COMPLETE with model selection, plus newer router functions, default low-stakes calls to Haiku 4.5 / Mistral Large / Llama 3.x rather than Opus 4. Same answer quality on classification / summarization / extraction at 5-15x lower token cost.
- Snowflake Arctic SLM for high-volume low-stakes queries — Arctic and Arctic-Embed run on Snowflake's own infrastructure; no per-token payment to a third party. Margin on Arctic calls approaches Snowflake's traditional compute margin profile. Pushing embedding generation, simple classification, and SQL-gen to Arctic is the single biggest margin lever.
- Customer-side credit guardrails + budget alerts — Resource Monitors on Cortex functions, per-warehouse spend caps, Snowsight budget alerts. These don't directly improve Snowflake margin, but they prevent the customer-churn / billing-dispute outcome that follows a $400k surprise Cortex invoice — protecting long-term consumption revenue.
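The cheap-model-first lever can be sketched as a static routing table. The policy and the model identifiers here are illustrative assumptions, not Snowflake's actual router; the rates mirror the approximations used throughout this section:

```python
# Sketch of cheap-model-first routing: map the task class to the cheapest
# adequate model, escalating to the expensive tier only when unsure.
# Model names, tiers, and rates are illustrative assumptions.

RATES = {                       # (input, output) $/1M tokens, approx list
    "claude-haiku-4.5": (1.0, 5.0),
    "claude-sonnet-4":  (3.0, 15.0),
    "claude-opus-4":    (15.0, 75.0),
}
TIER = {
    "classification": "claude-haiku-4.5",
    "summarization":  "claude-haiku-4.5",
    "rag_chat":       "claude-sonnet-4",
    "complex_agent":  "claude-opus-4",
}

def route_and_price(task, in_tok, out_tok):
    """Pick a model for the task and return (model, raw token cost in USD)."""
    model = TIER.get(task, "claude-opus-4")  # unknown tasks escalate
    in_rate, out_rate = RATES[model]
    return model, round((in_tok * in_rate + out_tok * out_rate) / 1e6, 4)

print(route_and_price("classification", 20_000, 500))  # Haiku: $0.0225
print(route_and_price("complex_agent", 20_000, 500))   # Opus:  $0.3375
```

The same 20k-in / 500-out call is 15x cheaper on Haiku than Opus, which is the upper end of the 5-15x savings band claimed above.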
What Customers Are Actually Paying In 2026
- Bayer — public Cortex case study around clinical / research workflows; consumption commentary in Snowflake earnings has cited life-sciences as a leading Cortex vertical (specific $ not disclosed).
- Siemens — referenced by Snowflake as an enterprise Cortex adopter for industrial / IoT analytics; spend not itemized publicly.
- NYSE / ICE — long-standing Snowflake customer; Cortex usage patterns in financial-services workloads emphasized in CFO commentary as a credit-consumption tailwind.
- Capital One — referenced across Snowflake materials as a flagship financial-services account; specific Cortex spend not public, but Cortex Analyst / Search / Agents are the named expansion vectors.
- General market signal — Snowflake Q4 FY26 earnings commentary positioned Cortex as a contributor to product-revenue growth without breaking out the specific $ contribution; CFO framing emphasized that Cortex consumption is additive to (not cannibalizing) traditional warehouse spend.
Cost-Stack Reference Table
| Model | List $/1M tokens (in/out) | Relative Cortex credit burn | Estimated Snowflake margin band | Use case fit |
|---|---|---|---|---|
| Claude Opus 4 | ~$15 / ~$75 | High credit burn per call | ~10-15% (thinnest) | Long-context reasoning, complex agents |
| Claude Sonnet 4 | ~$3 / ~$15 | Moderate | ~15-20% | Default chat, RAG, mid-complexity agents |
| Claude Haiku 4.5 | ~$1 / ~$5 | Low | ~20-25% | Classification, extraction, routing |
| OpenAI GPT-5 | Opus-class band | High | ~10-15% | Premium reasoning, code, multimodal |
| OpenAI o3 | Reasoning premium | Highest per output | ~10% | Hard math, planning, niche reasoning |
| OpenAI o4-mini | Cheap workhorse | Low | ~20-25% | Bulk completions, agent sub-steps |
| Mistral Large 2 | Mid-tier | Moderate | ~15-20% | EU-data-residency, multilingual |
| Snowflake Arctic / Arctic-Embed | Internal compute | Lowest | ~50-70% (traditional Snowflake margin) | Embeddings, SQL-gen, high-volume low-stakes |
*All $ figures are approximations from public list pricing as of Q2 2026. Actual customer pricing varies; Bedrock passthrough fees may not be itemized publicly.*
Bottom Line
Snowflake Cortex is structurally a thinner-margin business than Snowflake's traditional storage-and-compute line — the model providers take the bulk of every partner-model dollar. The path to defending overall gross margin runs through (a) volume-negotiated wholesale rates with Anthropic / OpenAI, (b) aggressive routing to cheap models and Arctic, and (c) keeping customer consumption growing fast enough that the 10-20% orchestration margin compounds into a meaningful product-revenue line. Watch the Arctic mix-shift in future earnings — that is the single cleanest signal of whether Cortex margin is converging on the rest of the platform. *(see also: q1564, q1594, q1597, q1602)*
Sources: Anthropic pricing page, OpenAI pricing page, Snowflake Cortex pricing documentation, AWS Bedrock pricing page, Snowflake Q4 FY26 earnings commentary, Bessemer State of the Cloud, A16z AI infrastructure economics analysis.