How do you manage secrets and API keys for LLM applications?
Direct Answer
You manage secrets for LLM applications the same disciplined way you manage any production secret — never hardcode keys, store them in a dedicated secrets manager, inject them at runtime, scope them tightly, rotate them regularly, and audit every access — but LLM apps add three twists you must handle deliberately.
First, provider API keys (OpenAI, Anthropic, etc.) are high-spend credentials, so a leaked key is not just a data risk but a financial one; put them behind an AI gateway with per-key budgets and rate limits rather than handing the raw key to every service. Second, prompts and logs are a major leak vector — secrets can end up in prompt context, traces, or model outputs — so you must scrub them.
Third, agents and tools that the LLM can invoke need their own scoped, short-lived credentials so a prompt injection cannot exfiltrate a powerful key. The practical stack is a real secrets manager (HashiCorp Vault, AWS/GCP/Azure secret stores, or Doppler/Infisical), short-lived dynamic credentials where possible, an AI gateway issuing virtual keys with budgets, and secret-scanning plus log redaction to keep keys out of prompts and traces.
Why LLM apps need more than a .env file
Storing OPENAI_API_KEY=sk-... in a .env file is fine on your laptop and dangerous in production. The risks compound for AI apps:
- Provider keys are spend-bearing. A leaked OpenAI or Anthropic key can run up thousands of dollars before you notice, whereas a leaked read-only database credential exposes data but does not generate a bill.
- Keys end up in unusual places. AI apps pass lots of text around — prompts, RAG context, traces, eval datasets — and secrets leak into all of them if you are not careful.
- Agents act autonomously. When an LLM can call tools, the credentials those tools use become reachable by anything that can manipulate the model's behavior, including prompt injection.
The fix is a layered approach: store secrets properly, inject them narrowly, gate the expensive ones, and keep them out of text.
Step 1: Use a real secrets manager, never source control
The foundation is a dedicated, encrypted secrets store with access control and audit logging. Strong choices:
- HashiCorp Vault: the gold standard for dynamic secrets, encryption-as-a-service, and fine-grained policies; it can issue short-lived dynamic credentials for databases and clouds.
- Cloud-native stores (AWS Secrets Manager, GCP Secret Manager, Azure Key Vault): tightly integrated with their platforms' IAM and rotation.
- Developer-friendly platforms such as Doppler and Infisical (open-source): sync secrets across environments and inject them at runtime with a clean DX.
- Kubernetes: use the External Secrets Operator or the Secrets Store CSI Driver to pull secrets from one of the above into pods, rather than checking in Kubernetes Secret manifests (which are only base64-encoded, not encrypted).
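To make the base64 point concrete, here is a minimal sketch showing that a value of the kind stored in a Secret manifest can be decoded without any key. The sample value and the helper name `decode_manifest_value` are made up for illustration.

```python
import base64

# Hypothetical value, encoded the way a Secret manifest's `data:` field stores it.
encoded_value = base64.b64encode(b"sk-example-not-a-real-key").decode("ascii")


def decode_manifest_value(value: str) -> str:
    """Recover the plaintext from a base64-encoded manifest field."""
    return base64.b64decode(value).decode("utf-8")


# Anyone who can read the manifest can do this; no key or password is involved.
print(encoded_value)
print(decode_manifest_value(encoded_value))
```

Because decoding needs nothing but the manifest itself, a committed Secret manifest should be treated as a committed plaintext secret.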
Whatever you choose, the rule is the same: secrets live in the manager, not in code, container images, or config files. Add secret scanning (GitHub secret scanning, Gitleaks, TruffleHog) in CI so a key never reaches a repo in the first place.
Step 2: Inject at runtime and scope tightly
Pull secrets at startup or on demand; never bake them into images:
- Inject as environment variables or mounted files at runtime from the secrets manager, so the same image runs in every environment with different injected secrets.
- Apply least privilege. A service that only calls one model provider should hold only that provider's key. A read-only RAG service should not hold a write-capable database credential.
- Prefer short-lived, dynamic credentials. Vault and cloud IAM can issue credentials that expire in minutes, so a leak has a short blast radius. Use workload identity (IRSA on AWS, Workload Identity on GKE, managed identities on Azure) so workloads authenticate without a long-lived static key at all.
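As a rough sketch of runtime injection, the helper below reads a secret from a mounted file first and falls back to an environment variable, failing fast if neither exists. The function name `load_secret`, the default directory `/run/secrets`, and the lookup order are assumptions for this example, not a convention required by any of the tools above.

```python
import os
from pathlib import Path


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not injected at runtime."""


def load_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Return a secret from a mounted file, falling back to an env var.

    The default directory and the lookup order are assumptions made
    for this sketch.
    """
    mounted = Path(secrets_dir) / name
    if mounted.is_file():
        return mounted.read_text(encoding="utf-8").strip()
    value = os.environ.get(name)
    if value:
        return value
    # Fail fast and name the secret, but never include a value in the error.
    raise MissingSecretError(f"secret {name!r} was not injected")
```

A service would call something like `load_secret("OPENAI_API_KEY")` once at startup, so the same image can run in every environment with different injected values.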

Step 3: Put provider keys behind an AI gateway
Because LLM provider keys are expensive and shared, do not distribute the raw provider key to every microservice. Instead, hold the real key in one place and issue virtual keys through an AI gateway:
- LiteLLM, Portkey, Kong AI Gateway, or Cloudflare AI Gateway sit between your services and the model providers. They store the real provider key once and mint virtual keys per team, tenant, or service.
- Each virtual key gets its own budget, rate limit, and allowed-models policy, so a leaked virtual key can be revoked instantly and cannot run up unlimited spend.
- The gateway gives you central rotation (rotate the real key once, no service changes), spend visibility, and audit logs of who called which model.
This pattern turns a catastrophic raw-key leak into a contained, revocable virtual-key incident.
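The bookkeeping behind virtual keys can be sketched with a toy in-memory model. This is not the API of LiteLLM, Portkey, or any other gateway; `ToyGateway`, `VirtualKey`, and their methods are hypothetical names used only to show how budgets, revocation, and upstream rotation fit together.

```python
import secrets
from dataclasses import dataclass


@dataclass
class VirtualKey:
    owner: str
    budget_usd: float
    spent_usd: float = 0.0
    revoked: bool = False


class ToyGateway:
    """In-memory stand-in for a gateway's virtual-key bookkeeping."""

    def __init__(self, upstream_key: str) -> None:
        # The real provider key lives only here, never in calling services.
        self._upstream_key = upstream_key
        self._keys: dict[str, VirtualKey] = {}

    def mint(self, owner: str, budget_usd: float) -> str:
        """Create a virtual key with its own spend budget."""
        token = "vk-" + secrets.token_urlsafe(16)
        self._keys[token] = VirtualKey(owner, budget_usd)
        return token

    def revoke(self, token: str) -> None:
        self._keys[token].revoked = True

    def rotate_upstream(self, new_key: str) -> None:
        # Virtual keys are untouched, so callers need no redeploy.
        self._upstream_key = new_key

    def authorize(self, token: str, estimated_cost_usd: float) -> bool:
        """Record the spend and return True only if the call is allowed."""
        key = self._keys.get(token)
        if key is None or key.revoked:
            return False
        if key.spent_usd + estimated_cost_usd > key.budget_usd:
            return False
        key.spent_usd += estimated_cost_usd
        return True
```

In this model a leaked virtual key can spend at most its remaining budget, and revoking it affects one caller rather than every service that shares the provider account.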
Step 4: Keep secrets out of prompts, logs, and traces
This is the AI-specific hazard. Secrets leak into text channels in ways classic apps avoid:
- Never put secrets in prompts or system messages. If a tool needs a credential, the *application* should hold it and call the tool; the model should never see the raw key.
- Redact before logging and tracing. Observability tools like Langfuse, Arize Phoenix, and LangSmith capture full prompts and responses by default. Apply redaction/PII filters so keys, tokens, and credentials are masked before storage.
- Scrub model outputs. An LLM can repeat back a secret that appeared in its context. Filter outputs and avoid feeding secrets into context in the first place.
- Guard against prompt injection reaching credentials. Give agent tools scoped, short-lived credentials and require human approval for high-risk actions, so an injected instruction cannot make a tool exfiltrate a powerful key.
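Redaction before logging can be sketched with the standard library alone. The patterns below are illustrative and deliberately narrow; `redact` and `RedactingFilter` are names chosen for this example, and a real deployment would rely on a broader, tested pattern set or the masking features of its observability tool.

```python
import logging
import re

# Illustrative patterns only; real deployments need a broader, tested set.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),               # provider-style API keys
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key IDs
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/-]+=*"),   # bearer tokens
]


def redact(text: str) -> str:
    """Replace anything matching a known secret pattern with a marker."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


class RedactingFilter(logging.Filter):
    """Mask secrets in a log record before any handler formats it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so secrets passed as args are covered too.
        record.msg = redact(record.getMessage())
        record.args = None  # the message is already fully formatted
        return True
```

Attaching the filter with `handler.addFilter(RedactingFilter())` covers every record that reaches that handler. The same `redact` function could also be applied to prompts and model outputs before they are stored in traces.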
Step 5: Rotate, monitor, and audit
Secrets management is continuous, not one-time:
- Rotate on a schedule and on suspicion. Automate rotation through the secrets manager or gateway; with a gateway, rotating the upstream provider key is a single operation.
- Set spend and anomaly alerts. Per-virtual-key budgets and alerts catch a leaked key by its usage spike before the bill does.
- Audit every access. Vault and cloud stores log who read which secret and when; keep these logs for incident response and compliance (SOC 2, ISO 27001).
- Have a revocation runbook. Know exactly how to revoke a virtual key, rotate the upstream key, and invalidate sessions the moment a leak is suspected.
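Two of these checks are simple enough to sketch: whether a key has outlived its rotation window, and whether the latest hour of spend is far above the recent average. The function names, the 3x multiplier, and the 1 USD floor are assumptions for illustration; real thresholds depend on your traffic.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import Optional


def rotation_due(created_at: datetime, max_age_days: int = 90,
                 now: Optional[datetime] = None) -> bool:
    """Return True once a key is at least max_age_days old."""
    now = now or datetime.now(timezone.utc)
    return now - created_at >= timedelta(days=max_age_days)


def spend_spike(hourly_spend_usd: list[float], latest_usd: float,
                multiplier: float = 3.0, floor_usd: float = 1.0) -> bool:
    """Flag the latest hour if it far exceeds the recent average.

    The floor avoids alerting on tiny absolute amounts when the
    baseline is near zero.
    """
    baseline = mean(hourly_spend_usd) if hourly_spend_usd else 0.0
    return latest_usd > max(baseline * multiplier, floor_usd)
```

A scheduled job could run both checks per virtual key and open an incident, or trigger an out-of-cycle rotation, when either returns True.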
Putting it together
A solid LLM-app secrets architecture looks like this: secrets live in Vault or a cloud secret store; workloads authenticate with workload identity and receive short-lived credentials; provider keys sit behind an AI gateway that issues budgeted virtual keys; secret scanning blocks keys from ever reaching source control; log/trace redaction keeps keys out of observability; and scheduled rotation, spend alerts, and audit logs close the loop.
That layering means no single mistake — a committed file, a leaked virtual key, a verbose trace — turns into an unbounded breach.
Frequently Asked Questions
Is it safe to store my OpenAI key in an environment variable?
As a runtime injection mechanism, yes: environment variables populated from a secrets manager are a common, acceptable pattern. What is unsafe is hardcoding the key in a .env file committed to source control, baking it into a container image, or distributing the same raw provider key to many services. Inject it at runtime, scope it, and ideally front it with an AI gateway that issues per-service virtual keys.
What is an AI gateway and how does it help with key security?
An AI gateway (LiteLLM, Portkey, Kong AI Gateway, Cloudflare AI Gateway) sits between your app and model providers, holds the real provider key once, and issues virtual keys with per-key budgets and rate limits. This centralizes rotation, contains the blast radius of a leak (revoke one virtual key instead of rotating everywhere), and gives you spend visibility and audit logs.
How do I keep secrets out of LLM prompts and logs?
Never pass raw credentials into prompts; have the application hold the credential and call tools on the model's behalf. Apply redaction in your observability layer (Langfuse, Phoenix, LangSmith all support masking) so keys are scrubbed before traces are stored, and filter model outputs in case a secret was echoed back. The safest design keeps secrets entirely out of any text the model can see.
Should I use HashiCorp Vault or a cloud secrets manager?
Both are good. Cloud-native stores (AWS Secrets Manager, GCP Secret Manager, Azure Key Vault) integrate seamlessly with their platform's IAM and are the easy choice if you are all-in on one cloud. Vault is the stronger pick for multi-cloud, dynamic short-lived credentials, and advanced policy control. Developer-friendly tools like Doppler or open-source Infisical layer a clean workflow on top of either.
How do agents and tools change secrets management?
Agents can autonomously invoke tools, so any credential a tool uses becomes reachable through the model, including via prompt injection. Give each tool a scoped, short-lived credential with the minimum permissions it needs, require human approval for high-impact actions, and never expose powerful keys to tools an attacker could trigger. Treat the agent as an untrusted caller of every credential it can reach.
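One way to picture this is a dispatcher in which the model supplies only a tool name and an argument, while the application attaches the credential and enforces approval. This is a sketch under stated assumptions: `Tool`, `dispatch`, and `ApprovalRequired` are hypothetical names, and credentials are assumed to be non-empty strings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str, str], str]   # (credential, argument) -> result
    credential: str                  # scoped, ideally short-lived
    needs_approval: bool = False


class ApprovalRequired(PermissionError):
    """Raised when a high-impact tool is invoked without human sign-off."""


def dispatch(tools: dict[str, Tool], name: str, argument: str,
             approved: bool = False) -> str:
    """Run a model-requested tool; the model supplies only name and argument."""
    tool = tools.get(name)
    if tool is None:
        raise KeyError(f"unknown tool {name!r}")
    if tool.needs_approval and not approved:
        raise ApprovalRequired(f"{name} needs human approval")
    # The credential is attached here, so it never appears in the prompt.
    result = tool.run(tool.credential, argument)
    # Defensive scrub in case a tool echoes its own credential back.
    return result.replace(tool.credential, "[REDACTED]")
```

Since the credential is never part of the model's context, an injected instruction can at most request a tool call that the dispatcher is already willing to make.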
How often should I rotate LLM provider API keys?
Rotate on a regular schedule (many teams do 30–90 days) and immediately on any suspicion of compromise. The friction of rotation drops dramatically when provider keys sit behind an AI gateway: you rotate the single upstream key and every virtual key keeps working, with no redeploys. Pair rotation with per-key spend alerts so anomalous usage triggers an out-of-cycle rotation.
