How does Snowflake handle the cost of Anthropic + OpenAI inference at scale?
Direct Answer
Based on public list pricing as of Q2 2026, Snowflake Cortex passes roughly 80-90% of partner-model inference cost straight through to customer credit consumption, retaining an estimated 10-20% margin on the orchestration, governance, and serverless compute layer that wraps the call. The model providers (Anthropic, OpenAI, Mistral, Meta, plus Snowflake's own Arctic) are paid per token via either direct contract or AWS Bedrock passthrough; Snowflake then converts that token cost into a credit charge billed at the customer's negotiated credit rate (typically $2-4/credit depending on edition). As inference volume scales, Snowflake protects margin through four levers: (1) negotiated enterprise volume tiers with Anthropic and OpenAI that beat published list pricing, (2) a Cortex routing layer that steers low-stakes calls to cheaper models when latency and quality allow, (3) the Snowflake Arctic SLM for high-volume low-stakes workloads, where the model cost is essentially internal compute, and (4) customer-side budget guardrails that throttle runaway spend before it becomes a margin event. Actual contract pricing varies materially by customer, and Bedrock passthrough fees are not always itemized publicly, so all figures below are approximations from list pricing.
The Inference Cost Stack
- Model API cost (the raw token bill) — Anthropic Claude Opus 4 at ~$15 input / $75 output per 1M tokens list, Sonnet 4 at ~$3/$15, Haiku 4.5 at ~$1/$5. OpenAI GPT-5 in a similar Opus-class band, o3 priced as a reasoning premium tier, o4-mini as the cheap workhorse. This is the floor — Snowflake cannot price below it without subsidizing.
- AWS Bedrock passthrough fee — when Cortex routes Anthropic models via Bedrock, AWS takes a cut on top of Anthropic's wholesale rate. Not itemized on Snowflake invoices; bundled into the Cortex per-token credit conversion. Estimated low single-digit % uplift based on Bedrock list vs. Anthropic direct list.
- Snowflake compute fee on top — the serverless warehouse that runs the Cortex function (CORTEX.COMPLETE, CORTEX.SEARCH, etc.) burns standard credits for orchestration, embedding lookup, result assembly. Small per-call but compounds on agent loops.
- Customer-side credit consumption rate — the customer's effective $/credit (Standard ~$2, Enterprise ~$3, Business Critical ~$4) determines the final invoice line. Same query, different credit rate, different revenue to Snowflake on identical underlying token cost.
- Governance + observability tax — RBAC checks, prompt logging, PII masking, Horizon catalog hits all add micro-credits per call. Negligible per query, real at billions of queries/month.
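The credit-rate effect above, same call, same credit burn, different invoice, can be sketched in a few lines. The token-to-credit ratio below is a hypothetical placeholder chosen to land near the worked example in the next section, not Snowflake's published Cortex conversion table:

```python
# Sketch: a Cortex call burns a fixed number of credits, but the invoice
# line scales with the customer's negotiated $/credit by edition.
# CREDITS_PER_1M_TOKENS is a placeholder assumption, not a published ratio.

CREDITS_PER_1M_TOKENS = 11.0   # hypothetical Opus-class conversion
EDITION_RATE = {               # approximate $/credit from the bullet above
    "standard": 2.0,
    "enterprise": 3.0,
    "business_critical": 4.0,
}

def invoice_line(tokens, edition):
    """USD invoice line for one call at a given edition's credit rate."""
    credits = tokens / 1e6 * CREDITS_PER_1M_TOKENS
    return round(credits * EDITION_RATE[edition], 2)

for edition in EDITION_RATE:
    print(edition, invoice_line(1_000_000, edition))
```

Identical underlying token cost, three different revenue outcomes: $22, $33, and $44 for the same 1M-token call.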
The Margin Math On A 1M-Token Cortex Query (Claude Opus 4 example)
*All figures approximations from public list pricing — actual contract pricing varies.*
- Raw Anthropic cost — ~$15/1M input + ~$75/1M output; a query with 1M tokens in and 1M tokens out would cost ~$90 at list, while a more realistic 800k input / 200k output mix lands around ~$27 (0.8 × $15 + 0.2 × $75 = $12 + $15).
- Bedrock passthrough uplift — estimated low single-digit %, call it ~$0.50-$1.50 added on top.
- Snowflake serverless compute — sub-$1 of credits on the orchestration warehouse for a single large call.
- Customer credit charge — Cortex per-token credit conversion priced to recover the above plus margin; on Enterprise edition ($3/credit), the customer invoice line for that query lands in the ~$32-38 range based on published Cortex token-to-credit ratios.
- Effective Snowflake gross margin on the orchestration layer — estimated ~10-20% of the line item, with the bulk going to Anthropic/AWS. The margin profile is structurally thinner than Snowflake's traditional ~75% storage/compute gross margin, which is the central Cortex unit-economics tension.
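The stack above can be reproduced as a small per-call cost function. Every rate, the Bedrock uplift, the orchestration charge, and the margin figure are the approximations from this section, not contract or published numbers:

```python
# Sketch of the per-call cost stack worked above. All default parameters
# are the approximations from this section, not contract pricing.

def cortex_call_cost(input_tokens, output_tokens,
                     in_rate=15.0, out_rate=75.0,  # Opus-class list, $/1M tokens
                     bedrock_uplift=0.03,          # assumed low single-digit %
                     orchestration_usd=0.50,       # assumed serverless credit burn
                     margin=0.15):                 # assumed orchestration margin
    """Return (provider_cost, customer_charge) in USD for one call."""
    raw = (input_tokens * in_rate + output_tokens * out_rate) / 1e6
    provider = raw * (1 + bedrock_uplift)          # Anthropic + Bedrock cut
    customer = (provider + orchestration_usd) / (1 - margin)
    return round(provider, 2), round(customer, 2)

# The 800k-in / 200k-out mix from above: ~$27 raw, invoice in the low $30s.
print(cortex_call_cost(800_000, 200_000))  # → (27.81, 33.31)
```

On Enterprise edition at ~$3/credit, that ~$33 invoice line is ~11 credits, consistent with the ~$32-38 range quoted above.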
Where The Margin Pressure Lives
- Long-context queries on Opus 4 — 200k-token context windows mean a single document-Q&A call can burn $3-5 in raw token cost (200k input tokens at ~$15/1M is ~$3 before any output tokens are billed). Margin-thin even before the Bedrock uplift.
- Multi-step agent loops — Cortex Agents that chain 5-15 model calls per user request multiply the token bill linearly. One "agent question" can equal 20 chat completions.
- RAG over large warehouses — Cortex Search retrieves chunks, then stuffs them into the prompt; large k-values (top-50 retrieval) inflate input tokens 10-50x vs. naive prompting.
- The customer learning curve — new Cortex customers over-prompt for the first 60-90 days (no system prompts, no caching, max-context dumps), then optimize. Snowflake eats variable margin during the un-optimized window.
- Prompt caching adoption gap — Anthropic prompt caching cuts repeat-input cost ~90%, but customer adoption inside Cortex lags direct-API users; until caching is on by default, repeat RAG prompts are paid full freight.
- Reasoning-model premium tiers — o3 / Opus-extended-thinking calls bill output tokens at the reasoning rate, which can 3-5x a normal output bill on a single complex query.
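Two of the pressure points above, agent-loop multiplication and the caching adoption gap, compound on the same bill. A rough sketch under assumed Sonnet-class list rates and the ~90% cached-input discount cited above:

```python
# Sketch: how agent loops and prompt caching move the token bill.
# Rates are Sonnet-class list approximations; the 90% discount on repeat
# input follows the caching figure cited above. Illustrative only.

IN_RATE, OUT_RATE = 3.0, 15.0   # $/1M tokens, approximate list
CACHE_DISCOUNT = 0.90           # repeat input billed at ~10% of list

def agent_request_cost(steps, in_tok, out_tok, cached_frac=0.0):
    """Raw token bill (USD) for one request chaining `steps` model calls."""
    cost = 0.0
    for step in range(steps):
        # The first call always pays full freight; later calls can reuse
        # cached_frac of their input at the discounted cache-read rate.
        cached = in_tok * cached_frac if step > 0 else 0
        fresh = in_tok - cached
        cost += (fresh * IN_RATE
                 + cached * IN_RATE * (1 - CACHE_DISCOUNT)
                 + out_tok * OUT_RATE) / 1e6
    return round(cost, 2)

# A 10-step RAG agent, 50k input / 2k output per step:
print(agent_request_cost(10, 50_000, 2_000))                   # uncached
print(agent_request_cost(10, 50_000, 2_000, cached_frac=0.8))  # 80% cached
```

The uncached loop costs 10x a single call; caching 80% of the repeat prompt cuts the bill by more than half, which is why the adoption gap is a real margin line.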
The 4 Margin-Protection Levers
- Negotiated volume pricing with Anthropic / OpenAI — Snowflake is among the largest enterprise buyers of both Anthropic and OpenAI inference; enterprise tier discounts off published list are widely assumed (not publicly disclosed). The delta between list and Snowflake's wholesale is the structural margin floor.
- Cortex routing layer (cheap-model-first then escalate) — CORTEX.COMPLETE with model selection, plus newer router functions, default low-stakes calls to Haiku 4.5 / Mistral Large / Llama 3.x rather than Opus 4. Same answer quality on classification / summarization / extraction at 5-15x lower token cost.
- Snowflake Arctic SLM for high-volume low-stakes queries — Arctic and Arctic-Embed run on Snowflake's own infrastructure; no per-token payment to a third party. Margin on Arctic calls approaches Snowflake's traditional compute margin profile. Pushing embedding generation, simple classification, and SQL-gen to Arctic is the single biggest margin lever.
- Customer-side credit guardrails + budget alerts — Resource Monitors on Cortex functions, per-warehouse spend caps, Snowsight budget alerts. These don't directly improve Snowflake margin, but they prevent the customer-churn / billing-dispute outcome that follows a $400k surprise Cortex invoice — protecting long-term consumption revenue.
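The cheap-model-first lever can be sketched as a static routing table. The policy and the model identifiers here are illustrative assumptions, not Snowflake's actual router; the rates mirror the approximations used throughout this section:

```python
# Sketch of cheap-model-first routing: map the task class to the cheapest
# adequate model, escalating to the expensive tier only when unsure.
# Model names, tiers, and rates are illustrative assumptions.

RATES = {                       # (input, output) $/1M tokens, approx list
    "claude-haiku-4.5": (1.0, 5.0),
    "claude-sonnet-4":  (3.0, 15.0),
    "claude-opus-4":    (15.0, 75.0),
}
TIER = {
    "classification": "claude-haiku-4.5",
    "summarization":  "claude-haiku-4.5",
    "rag_chat":       "claude-sonnet-4",
    "complex_agent":  "claude-opus-4",
}

def route_and_price(task, in_tok, out_tok):
    """Pick a model for the task and return (model, raw token cost in USD)."""
    model = TIER.get(task, "claude-opus-4")  # unknown tasks escalate
    in_rate, out_rate = RATES[model]
    return model, round((in_tok * in_rate + out_tok * out_rate) / 1e6, 4)

print(route_and_price("classification", 20_000, 500))  # Haiku: $0.0225
print(route_and_price("complex_agent", 20_000, 500))   # Opus:  $0.3375
```

The same 20k-in / 500-out call is 15x cheaper on Haiku than Opus, which is the upper end of the 5-15x savings band claimed above.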
What Customers Are Actually Paying In 2026
- Bayer — public Cortex case study around clinical / research workflows; consumption commentary in Snowflake earnings has cited life-sciences as a leading Cortex vertical (specific $ not disclosed).
- Siemens — referenced by Snowflake as an enterprise Cortex adopter for industrial / IoT analytics; spend not itemized publicly.
- NYSE / ICE — long-standing Snowflake customer; Cortex usage patterns in financial-services workloads emphasized in CFO commentary as a credit-consumption tailwind.
- Capital One — referenced across Snowflake materials as a flagship financial-services account; specific Cortex spend not public, but Cortex Analyst / Search / Agents are the named expansion vectors.
- General market signal — Snowflake Q4 FY26 earnings commentary positioned Cortex as a contributor to product-revenue growth without breaking out the specific $ contribution; CFO framing emphasized that Cortex consumption is additive to (not cannibalizing) traditional warehouse spend.
Cost-Stack Reference Table
| Model | List $/1M tokens (in/out) | Relative Cortex credit burn | Estimated Snowflake margin band | Use case fit |
|---|---|---|---|---|
| Claude Opus 4 | ~$15 / ~$75 | High credit burn per call | ~10-15% (thinnest) | Long-context reasoning, complex agents |
| Claude Sonnet 4 | ~$3 / ~$15 | Moderate | ~15-20% | Default chat, RAG, mid-complexity agents |
| Claude Haiku 4.5 | ~$1 / ~$5 | Low | ~20-25% | Classification, extraction, routing |
| OpenAI GPT-5 | Opus-class band | High | ~10-15% | Premium reasoning, code, multimodal |
| OpenAI o3 | Reasoning premium | Highest per output | ~10% | Hard math, planning, niche reasoning |
| OpenAI o4-mini | Cheap workhorse | Low | ~20-25% | Bulk completions, agent sub-steps |
| Mistral Large 2 | Mid-tier | Moderate | ~15-20% | EU-data-residency, multilingual |
| Snowflake Arctic / Arctic-Embed | Internal compute | Lowest | ~50-70% (traditional Snowflake margin) | Embeddings, SQL-gen, high-volume low-stakes |
*All $ figures are approximations from public list pricing as of Q2 2026. Actual customer pricing varies; Bedrock passthrough fees may not be itemized publicly.*
Bottom Line
Snowflake Cortex is structurally a thinner-margin business than Snowflake's traditional storage-and-compute line — the model providers take the bulk of every partner-model dollar. The path to defending overall gross margin runs through (a) volume-negotiated wholesale rates with Anthropic / OpenAI, (b) aggressive routing to cheap models and Arctic, and (c) keeping customer consumption growing fast enough that the 10-20% orchestration margin compounds into a meaningful product-revenue line. Watch the Arctic mix-shift in future earnings — that is the single cleanest signal of whether Cortex margin is converging on the rest of the platform. *(see also: q1564, q1594, q1597, q1602)*
Sources: Anthropic pricing page, OpenAI pricing page, Snowflake Cortex pricing documentation, AWS Bedrock pricing page, Snowflake Q4 FY26 earnings commentary, Bessemer State of the Cloud, A16z AI infrastructure economics analysis.