What are the LLM fine-tuning compute requirements in 2027?
Direct Answer
In 2027, LLM fine-tuning compute requirements depend on model size and method:
- Full fine-tuning Llama 4 8B: 4 NVIDIA H100 GPUs for 8–24 hours on 10K examples (~$300–$1,000).
- LoRA on Llama 4 70B: 4 H100 GPUs for 6–18 hours (~$300–$1,000).
- QLoRA on Llama 4 70B: 1 H100 80GB for 8–24 hours (~$100–$400).
- Full fine-tuning Llama 4 405B: 128–256 H100 GPUs for days (~$50K–$200K+).
- Fine-tuning via OpenAI API on GPT-5o-mini: ~$3 per 1M training tokens, or about $45 in training charges for 10K examples of 500 tokens over 3 epochs.
The 2027 default is LoRA / QLoRA on Llama 4 70B with the unsloth or Hugging Face PEFT library, which gives the best cost/quality trade-off for most domain adaptations.
1. Method Selection
- Full fine-tuning updates all model weights. Best quality; highest cost.
- LoRA (Low-Rank Adaptation) freezes the base weights and trains small low-rank adapter matrices. Roughly 90% of full-FT quality at 5–10% of the cost.
- QLoRA quantizes the base model to 4-bit and applies LoRA on top. Lowest VRAM requirement; fits Llama 4 70B fine-tuning on a single H100.
- Adapters and prefix tuning are older techniques, largely superseded by LoRA.
The 2027 default: QLoRA on Llama 4 70B with unsloth (2x speedup) or Hugging Face PEFT.
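To make the LoRA idea concrete, here is a minimal sketch of the parameter arithmetic. For a frozen weight matrix of shape d_out × d_in, LoRA trains two factors, B (d_out × r) and A (r × d_in). The layer dimensions and rank below are illustrative assumptions, not Llama 4's actual configuration, and the function names are hypothetical.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix."""
    # B is d_out x rank and A is rank x d_in; the frozen weight W is not trained.
    return rank * (d_out + d_in)


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Adapter size relative to the full weight matrix."""
    return lora_trainable_params(d_out, d_in, rank) / (d_out * d_in)


# Hypothetical 8192 x 8192 projection with rank 16 (illustrative values only).
full = 8192 * 8192
adapter = lora_trainable_params(8192, 8192, rank=16)
print(f"full matrix:  {full:,} params")
print(f"LoRA adapter: {adapter:,} params ({trainable_fraction(8192, 8192, 16):.2%})")
```

Under these assumptions the adapter is well under 1% of the matrix. The cost saving is smaller than that ratio suggests, because the forward and backward passes still run through the frozen base weights.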
2. Compute Requirements by Model Size
Llama 4 8B fine-tuning:
- Full FT: 4× H100 (320 GB VRAM total), 8–24 hours, ~$300–$1,000.
- LoRA: 1× H100, 2–6 hours, ~$50–$200.
- QLoRA: 1× H100 or even a single RTX 4090 (24 GB), in about the same time as LoRA.
Llama 4 70B fine-tuning:
- Full FT: 16× H100, 24–72 hours, ~$3K–$10K.
- LoRA: 4× H100, 6–18 hours, ~$300–$1,000.
- QLoRA: 1× H100 80GB, 8–24 hours, ~$100–$400.
Llama 4 405B fine-tuning:
- Full FT: 128–256× H100, days, ~$50K–$200K+. Reserved for serious commercial efforts.
- LoRA: 32× H100, 1–3 days, ~$5K–$20K.
- QLoRA: 8× H100, 2–5 days, ~$2K–$10K.
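The GPU counts above can be sanity-checked against a common rule of thumb for weight-related memory. This is a rough sketch, assuming about 16 bytes per parameter for full fine-tuning with mixed-precision Adam, 2 bytes per parameter for a frozen bf16 LoRA base, and 0.5 bytes per parameter for a 4-bit QLoRA base. Activations, adapter state, and framework overhead are ignored, so the results are lower bounds rather than sizing recommendations.

```python
import math

H100_GB = 80  # memory per H100 80GB card

# Assumed bytes per parameter (rule-of-thumb values, not measurements).
BYTES_PER_PARAM = {
    "full_ft": 16.0,  # 16-bit weights and grads + fp32 master copy and Adam moments
    "lora": 2.0,      # frozen 16-bit base weights; adapter state is comparatively tiny
    "qlora": 0.5,     # frozen 4-bit base weights
}


def weight_memory_gb(params_billion: float, method: str) -> float:
    """Approximate memory for weights and optimizer state, in GB."""
    # (params_billion * 1e9 params) * bytes / 1e9 bytes-per-GB
    return params_billion * BYTES_PER_PARAM[method]


def min_h100s(params_billion: float, method: str) -> int:
    """Lower bound on H100 count from weight memory alone."""
    return math.ceil(weight_memory_gb(params_billion, method) / H100_GB)


for size in (8, 70, 405):
    for method in ("full_ft", "lora", "qlora"):
        gb = weight_memory_gb(size, method)
        print(f"{size}B {method:8s} ~{gb:7.1f} GB -> at least {min_h100s(size, method)} x H100")
```

Every lower bound from this sketch sits at or below the GPU count listed above for the same model and method, which is what you would expect once activations and parallelism overhead are added back in.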
2.1 OpenAI API Fine-Tuning Costs
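GPT-4o-mini / GPT-5o-mini: ~$3 per 1M training tokens. 10K examples × 500 tokens average = 5M tokens; × 3 epochs = 15M training tokens; 15M × $3 per 1M = $45 for the full three-epoch run. Total: typically $1K–$20K for a production fine-tune, which at this rate corresponds to far larger token volumes or many more repeated runs than this single example.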
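The token arithmetic above can be checked with a short sketch. The $3 per 1M tokens rate is the figure quoted in this section; the function names are illustrative, not part of any SDK.

```python
def training_tokens(num_examples: int, avg_tokens: int, epochs: int) -> int:
    """Billed training tokens: every example is seen once per epoch."""
    return num_examples * avg_tokens * epochs


def training_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Training charge at a flat per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


tokens = training_tokens(num_examples=10_000, avg_tokens=500, epochs=3)
cost = training_cost_usd(tokens, usd_per_million=3.0)
print(f"{tokens:,} training tokens -> ${cost:,.2f}")
```

With these inputs the three-epoch run bills 15M tokens and costs $45 in total. Swapping in your own example count, average length, and epoch count gives a quick estimate before you launch a job.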
3. Data Requirements
- 10K+ examples for meaningful gains.
- Quality > quantity: 10K high-quality examples beat 100K noisy ones.
- Stratified by use case to avoid overfitting on common patterns.
- Holdout test set (5–10% of data) for unbiased evaluation.
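One way to combine the stratification and holdout points above is to sample the holdout set separately within each use case. The sketch below is a minimal standard-library version; the `use_case` field, the example counts, and the function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_holdout(examples, key, holdout_frac=0.1, seed=0):
    """Split examples into (train, holdout), sampling holdout_frac from each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for ex in examples:
        by_stratum[ex[key]].append(ex)

    train, holdout = [], []
    for _, group in sorted(by_stratum.items()):
        rng.shuffle(group)
        # Keep at least one holdout example per stratum so every use case is evaluated.
        n_holdout = max(1, round(len(group) * holdout_frac))
        holdout.extend(group[:n_holdout])
        train.extend(group[n_holdout:])
    return train, holdout


# Hypothetical dataset: 900 "billing" examples and 100 "refund" examples.
data = [{"use_case": "billing", "id": i} for i in range(900)]
data += [{"use_case": "refund", "id": i} for i in range(900, 1000)]

train, holdout = stratified_holdout(data, key="use_case", holdout_frac=0.1)
print(len(train), len(holdout))
```

Sampling per stratum keeps rare use cases represented in the test set in the same proportion as in the training data, which a plain random split does not guarantee on small datasets.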
3.1 Synthetic Data Augmentation
See [[synthetic-data-generation]] for augmenting small real-data seeds with synthetic examples.
4. Toolchain
- unsloth: optimized fine-tuning library that works with the Hugging Face stack; advertises a 2x training speedup; QLoRA-first.
- Hugging Face PEFT: production-grade parameter-efficient fine-tuning library.
- Axolotl: config-driven fine-tuning framework.
- OpenAI fine-tuning API: managed service for GPT-5o-mini and GPT-4o-mini.
- Anthropic fine-tuning: limited availability; enterprise-tier.
- Together AI fine-tuning: managed service for Llama and Mistral.
- Fireworks AI fine-tuning: managed service with strong inference integration.
- Modal: serverless GPU compute for custom training pipelines.
5. Cloud Compute Sourcing
For self-managed fine-tuning:
- CoreWeave — best AI-first pricing on H100 capacity.
- Lambda Labs — research-friendly; transparent pricing.
- AWS P5 (H100) — enterprise integration; higher prices.
- GCP A3 (H100) — strong Vertex AI integration.
- Modal — serverless pay-per-second.
- Runpod — community-cloud aggressive pricing.
6. The Three-Phase Workflow
Phase 1: Eval baseline. Score base model on golden eval set. This is the bar to beat.
Phase 2: Fine-tune + eval. Run fine-tuning. Score fine-tuned model on the same eval set. Compare.
Phase 3: Production rollout. Canary deploy at 5%; monitor metrics; scale if metrics hold.
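The three phases can be sketched as a pair of simple gates. This is an illustrative outline only: exact-match scoring, the 2-point minimum gain, the error-rate tolerance, and the toy "models" are all assumptions standing in for your own metric, thresholds, and inference calls.

```python
from statistics import mean


def eval_score(predict, eval_set):
    """Fraction of golden examples where the model output matches exactly."""
    return mean(1.0 if predict(ex["input"]) == ex["expected"] else 0.0 for ex in eval_set)


def beats_baseline(base_score, tuned_score, min_gain=0.02):
    """Phase 2 gate: the fine-tuned model must clear the baseline by min_gain."""
    return tuned_score - base_score >= min_gain


def canary_decision(canary_error_rate, control_error_rate, tolerance=0.005):
    """Phase 3 gate: scale up only if the canary is not measurably worse."""
    return "scale" if canary_error_rate <= control_error_rate + tolerance else "rollback"


# Toy stand-ins for real models (hypothetical; replace with actual inference calls).
def base_model(text):
    return text.upper() if text in ("a", "b") else text  # gets 2 of 4 right


def tuned_model(text):
    return text.upper()  # gets 4 of 4 right


golden = [{"input": c, "expected": c.upper()} for c in "abcd"]

base = eval_score(base_model, golden)    # Phase 1: the bar to beat
tuned = eval_score(tuned_model, golden)  # Phase 2: same eval set
print(base, tuned, beats_baseline(base, tuned))
print(canary_decision(canary_error_rate=0.011, control_error_rate=0.010))
```

Scoring both models with the same function on the same golden set is the point of the design: any difference in score then reflects the fine-tune rather than a change in the evaluation.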
FAQ
LoRA or full fine-tuning? LoRA in nearly all cases. Full FT only when you can't get the quality you need from LoRA.
Should we use unsloth? Yes. The 2x speedup is real and easy to adopt.
OpenAI fine-tuning API or self-host? OpenAI for fast time-to-value under 100M training tokens; self-host above.
How many examples do we need? 10K minimum for consistent gains. Under 1K, prompt engineering wins.
Should we fine-tune a small model or use a bigger base? Often a fine-tuned small model (such as Llama 4 8B) beats a prompted large model on a specific task at 10x lower inference cost.
Bottom Line
LLM fine-tuning compute in 2027 is accessible: QLoRA on a single H100 can fine-tune Llama 4 70B in a day for about $200. The hard part is data quality, eval rigor, and rollout discipline, not raw compute. OpenAI's managed fine-tuning API is the fast path for GPT-5o-mini; self-host Llama 4 with unsloth for cost-sensitive scale.
Sources
- Meta — Llama 4 Open-Source Release Documentation
- Hugging Face — PEFT (Parameter-Efficient Fine-Tuning) Library Reference
- Unsloth — Fine-Tuning Acceleration Library Documentation
- Axolotl — Config-Driven Fine-Tuning Framework
- OpenAI — Fine-Tuning API Documentation and Pricing
- Together AI — Fine-Tuning Reference
- Fireworks AI — Fine-Tuning Documentation
- CoreWeave — GPU Cloud Pricing
- NVIDIA — H100 Datasheet
- Modal — Serverless GPU Training Reference