AI Engineering Experts · Free Consultation

Fine-Tuning Cost Calculator

Calculate training cost and see exactly when fine-tuning pays off vs prompt engineering.

Work With Us

Training Configuration

[Interactive sliders: training dataset size, tokens per example, and training epochs]

Monthly inference settings

[Interactive sliders: monthly request volume, fine-tuned prompt length (shorter since behavior is baked in), base-model system-prompt length (extra tokens needed every call without fine-tuning), and output tokens]

Training Cost (one-time): $27.00 (total training tokens × rate)

FT Monthly Inference: $27.00 (fine-tuned model inference)

Base Model Monthly: $600.00 (with full system prompt)

Monthly Savings: $573.00 (what fine-tuning saves every month)

Break-even: Month 0.0 ($27.00 training cost ÷ $573.00/month savings ≈ 0.05 months). After break-even, fine-tuning saves $573.00/month vs prompt engineering.
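The calculator's arithmetic reduces to a few lines. This is a minimal sketch, not the calculator's actual source; all rates and parameter names below are illustrative placeholders, not current OpenAI prices:

```python
# Sketch of the break-even arithmetic behind the calculator.
# All rates are illustrative placeholders, not current OpenAI prices.

def breakeven(
    training_tokens: int,     # examples x tokens per example x epochs
    train_rate: float,        # $ per 1M training tokens (assumed)
    requests_per_month: int,
    base_prompt_tokens: int,  # full system prompt sent on every base-model call
    ft_prompt_tokens: int,    # shorter prompt once behavior is baked in
    output_tokens: int,       # output tokens per request
    in_rate: float,           # $ per 1M input tokens (same rate assumed for both models)
    out_rate: float,          # $ per 1M output tokens
) -> tuple[float, float, float]:
    M = 1_000_000
    training_cost = training_tokens / M * train_rate
    base_monthly = requests_per_month * (base_prompt_tokens * in_rate + output_tokens * out_rate) / M
    ft_monthly = requests_per_month * (ft_prompt_tokens * in_rate + output_tokens * out_rate) / M
    savings = base_monthly - ft_monthly
    months = training_cost / savings if savings > 0 else float("inf")
    return training_cost, savings, months
```

With the example figures shown above ($27.00 training cost, $573.00/month savings), break-even is 27 ÷ 573 ≈ 0.05 months, which the widget rounds down to "Month 0.0".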

Building a custom AI model for your business?

We handle everything from data preparation and fine-tuning to deployment and monitoring.

Talk to an Expert

* OpenAI fine-tuning prices as of March 2026. Verify against OpenAI pricing page before production use.

Fine-Tuning Cost — FAQ

What is fine-tuning?

Fine-tuning trains a pretrained model on your own examples to customize its behavior. Unlike prompt engineering, the learned behavior is baked into the model weights, allowing shorter prompts and more consistent outputs.

When does fine-tuning pay off?

Fine-tuning pays off when: (1) you need a very long system prompt (3,000+ tokens) repeated on every call, (2) you have high request volume (50,000+ requests/month), and (3) you have quality training data (1,000+ examples). Break-even is typically 1–3 months.

How many training examples do I need?

OpenAI recommends starting with 50–100 examples and iterating. For production-quality fine-tuning, 1,000–10,000 examples typically give the best results. Quality matters more than quantity: diverse, high-quality examples outperform large low-quality datasets.
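For chat models, OpenAI's fine-tuning API expects training data as chat-format JSONL, one example per line. A minimal sketch (the persona, messages, and file name are hypothetical):

```python
import json

# One fine-tuning example in OpenAI's chat-format JSONL.
# The persona and message texts are hypothetical placeholders.
example = {
    "messages": [
        {"role": "system", "content": "You are a concise support bot for Acme."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Open Settings > Security and choose 'Reset password'."},
    ]
}

# A training file is just one such JSON object per line, e.g. train.jsonl.
line = json.dumps(example)
```

Each example should look exactly like the conversations you want the model to produce; the assistant turn is what the model learns to imitate.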

How does fine-tuning compare to prompt engineering, RAG, and prompt caching?

Prompt engineering (adding examples and instructions to your system prompt) has no upfront cost but makes every call more expensive. RAG retrieves relevant context dynamically instead of baking it in. Prompt caching cuts the cost of long repeated prompts by up to 90%. Each approach has a different break-even point.
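A rough per-call input-cost comparison of the three strategies. The token rate, token counts, and the ~90% cache discount below are assumptions for illustration, not quoted prices:

```python
# Per-call input cost of three strategies for a long repeated system prompt.
# RATE, token counts, and the cache discount are illustrative assumptions.
RATE = 2.50 / 1_000_000        # $ per input token (assumed)
PROMPT, QUERY = 3_000, 200     # repeated system-prompt tokens + per-call user tokens

plain  = (PROMPT + QUERY) * RATE          # prompt engineering: full prompt every call
cached = (PROMPT * 0.10 + QUERY) * RATE   # prompt caching: cached prefix billed at ~10%
tuned  = QUERY * RATE                     # fine-tuned: behavior baked into the weights

print(plain, cached, tuned)
```

Note this sketch ignores output tokens and any per-token premium charged for fine-tuned inference, both of which shift the real break-even point.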

Can I fine-tune open-source models instead?

Yes. Fine-tuning Llama 3 8B or Mistral 7B on services like Modal, RunPod, or Replicate typically costs $10–200 for training, and self-hosted inference has no per-token fees (you pay for GPU time instead). At high volume this can be 10× cheaper than OpenAI fine-tuning.

Our Offices

Ahmedabad

B-714, K P Epitome, near Dav International School, Makarba, Ahmedabad, Gujarat 380051

+91 99747 29554

Mumbai

C-20, G Block, WeWork, Enam Sambhav, Bandra-Kurla Complex, Mumbai, Maharashtra 400051

+91 99747 29554

Stockholm

Bäverbäcksgränd 10, 124 62 Bandhagen, Stockholm, Sweden

+46 72789 9039

Malaysia

Level 23-1, Premier Suite One Mont Kiara, No 1, Jalan Kiara, Mont Kiara, 50480 Kuala Lumpur


Call us

Career: +91 90165 81674

Sales: +91 99747 29554

Email us

Career: hr@digiqt.com

Sales: hitul@digiqt.com

© Digiqt 2026, All Rights Reserved