The Fine-Tuning Trap: Why Context Engineering Beats Custom Models Pre-PMF
Fine-tuning is easier than ever. That's the problem. Here's why we use 100KB prompts instead.

The Temptation
Every team building AI hits the same question: "Why not just train our own model?"
The pitch is compelling:
- Specialized behavior
- Consistent outputs
- Potentially lower inference costs
- "We'll have our own model"
And fine-tuning has never been easier. OpenAI, Anthropic, and Google all offer fine-tuning APIs. LoRA makes it cheap. The tools are mature.
So why do we at QuantumFabrics rarely fine-tune?
The Trap
Model iteration limits product iteration.
Fine-tuning bakes behavior into weights. Weights are expensive to change. The typical cycle:
- Collect training data (days)
- Fine-tune model (hours to days)
- Evaluate results (days)
- Iterate (repeat from step 1)
Total: weeks to months per iteration.
Meanwhile, your product requirements change weekly. Your users discover edge cases. Your business pivots.
Pre-PMF, you need to iterate fast. Fine-tuning locks you in.
The Alternative: Context Engineering
Our entire AI strategy: Don't change model weights. Change the context.
Here's how it works in production:
export function buildLiraSystemPrompt(context: PromptContext): string {
  // Dynamic context injection: every request gets a customized prompt
  let promptText = basePromptTemplate
    .replace(/{candidateId}/g, context.candidateId || "{uuid}")
    .replace(/{positionId}/g, context.positionId || "{uuid}")
    .replace(/{currentChatId}/g, context.chatId);

  // Email-specific behavior modification
  if (context.requestSource === "email" && context.emailMetadata) {
    promptText += `
## Email Request Context
- Email From: ${context.emailMetadata.fromName}
- Email Subject: ${context.emailMetadata.subject}
**Formatting**: Use professional tone, avoid markdown`;
  }

  return contextHeader + promptText;
}
Our harness — system prompt, skills, and tools — is comprehensive. It handles:
- Dynamic context injection (userId, timezone, email metadata)
- Task-specific behavior modification
- Policy enforcement
- Output formatting rules
All without training a single model.
Why This Wins
1. Fast Iteration
Prompt changes take seconds. Deploy and test immediately. Found an edge case? Fix it in 5 minutes.
Fine-tuning: days to weeks per change.
2. Multi-Tenant Scale
Same base model serves 1M users with different contexts. Each request gets personalized prompts based on:
- User role
- Tenant settings
- Request type (chat vs email)
- Current task
With fine-tuning, you'd need separate models for each variation.
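As a concrete illustration of serving many variations from one base model, here is a minimal sketch of per-request prompt personalization. The role names, tenant settings, and the `personalize` helper are illustrative assumptions, not our production code:

```typescript
// Hypothetical sketch: one base model, per-request prompt variation.
type Role = "recruiter" | "candidate" | "admin";

interface TenantSettings {
  name: string;
  tone: "formal" | "casual";
}

interface RequestContext {
  role: Role;
  tenant: TenantSettings;
  source: "chat" | "email";
}

const basePrompt = "You are a hiring assistant.";

function personalize(ctx: RequestContext): string {
  const parts = [basePrompt];
  // Tenant settings shape every request for that customer.
  parts.push(`Tenant: ${ctx.tenant.name}. Preferred tone: ${ctx.tenant.tone}.`);
  // Role gates capabilities without touching weights.
  if (ctx.role === "admin") parts.push("You may discuss pipeline analytics.");
  // Request type (chat vs email) changes formatting rules.
  if (ctx.source === "email") parts.push("Reply in plain text, no markdown.");
  return parts.join("\n");
}
```

Every variation is a string-building branch, not a separate set of weights.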
3. Provider Agnostic
Our fallback chain:
const fallbackModel1 = new ChatAnthropic({ model: "claude-sonnet-4-5" });
const fallbackModel2 = new ChatOpenAI({ model: "gpt-5.2" });

// ...inside the agent configuration:
middleware: [
  modelFallbackMiddleware(fallbackModel1, fallbackModel2),
  modelRetryMiddleware({ maxRetries: 2 }),
],
If Anthropic is down, we fall back to OpenAI. No retraining required.
Fine-tuning locks you to one provider.
4. Cost Optimized
Use Sonnet for simple tasks, Opus only when needed. Dynamic routing based on task complexity.
With fine-tuning, you commit to one model's pricing.
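Dynamic routing can be as simple as a heuristic over task features. This is a sketch only: the complexity thresholds and the Opus model ID are assumptions, not our exact routing logic.

```typescript
// Illustrative routing sketch: send simple tasks to a cheaper model,
// reserve the expensive one for genuinely complex requests.
type ModelId = "claude-sonnet-4-5" | "claude-opus-4-1";

interface TaskProfile {
  toolCount: number;    // how many tools the task may invoke
  inputTokens: number;  // estimated context size
}

function routeModel(task: TaskProfile): ModelId {
  // Assumed heuristic: many tools or a large context = complex task.
  const complex = task.toolCount > 3 || task.inputTokens > 20_000;
  return complex ? "claude-opus-4-1" : "claude-sonnet-4-5";
}
```

The point is that the routing decision lives in application code, so the thresholds can be tuned in minutes, not retraining cycles.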
The Cost Objection (Solved)
"But context engineering is expensive with long prompts!"
Not anymore. Prompt caching changes the math:
Anthropic:
- Cached tokens are 10x cheaper
- Up to 90% cost reduction
- Up to 85% latency reduction
OpenAI:
- Automatic caching for prompts ≥1,024 tokens
- No extra charge for cache writes
- 24hr retention for GPT-5.1/4.1 series
Best practice: Put static content (system prompts) at the top. Dynamic content at the bottom. Maximize cache hits.
Our 100KB prompt? Most of it hits cache. The cost is negligible.
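The static-first ordering is easy to enforce mechanically. Here is a minimal sketch of cache-friendly prompt assembly; the part names (`systemRules`, `skills`, and so on) are illustrative, not our actual prompt sections:

```typescript
// Sketch of cache-friendly prompt assembly: everything stable goes first,
// so the provider can reuse the cached prefix across requests.
interface PromptParts {
  systemRules: string;  // static: changes rarely, identical across requests
  skills: string;       // static
  userContext: string;  // dynamic: varies per request
  taskInput: string;    // dynamic
}

function assemble(parts: PromptParts): string {
  // Static prefix first -> byte-identical across requests -> cache hit.
  // Dynamic content last -> only the uncached tail varies.
  return [
    parts.systemRules,
    parts.skills,
    parts.userContext,
    parts.taskInput,
  ].join("\n\n");
}
```

If a dynamic value were interpolated near the top, every request would produce a different prefix and the cache would never hit.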
When Fine-Tuning Makes Sense
Post-PMF, fine-tuning can make sense:
ROI Timeline:
- 10M+ queries/month: ROI in 3-6 months
- 1-10M queries/month: ROI in 6-12 months
- Smaller language models: Often profitable from day one
Valid Use Cases:
- Highly specialized output formats that never change
- Strict latency requirements (smaller fine-tuned model faster than large base)
- Compliance requirements needing model-level guarantees
- Post-PMF with stable requirements and high volume
Invalid Use Cases:
- Early-stage projects
- Evolving knowledge (use RAG instead)
- Open-ended or multi-task applications
- Limited resources
The rule: "If your LLM has relevant facts but needs different style/tone/format, first try prompt engineering. If that doesn't work, THEN consider fine-tuning."
Platform Implementation
AWS: Bedrock + Prompt Management
- Bedrock Prompt Management for versioned system prompts
- Bedrock Guardrails for policy enforcement
- Knowledge Bases for RAG
- Multi-model support via Bedrock
GCP: Vertex AI + Context Caching
- Vertex AI context caching (similar to Anthropic's prompt caching)
- Grounding with Google Search for real-time knowledge
- Prompt templates with variable injection
- Model Garden for multi-provider
Open Source: LangChain + LangGraph
What we use:
- buildSystemPrompt() for dynamic context injection
- TOKEN_BUDGET for smart context window management
- modelFallbackMiddleware for provider resilience
- Vector DB for RAG retrieval
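The token-budget idea can be sketched in a few lines. This is a simplified stand-in for TOKEN_BUDGET-style management, and the 4-characters-per-token estimate is a rough assumption:

```typescript
// Hypothetical sketch of token-budget context trimming: drop the oldest
// messages until the estimated total fits the window.
const TOKEN_BUDGET = 8_000;

// Rough heuristic: ~4 characters per token for English text (an assumption).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function trimToBudget(messages: string[], budget = TOKEN_BUDGET): string[] {
  const kept: string[] = [];
  let used = 0;
  // Walk newest-first so the most recent turns survive trimming.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (used + cost > budget) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}
```

A production version would use the provider's tokenizer and might summarize dropped turns instead of discarding them, but the budget-enforcement shape is the same.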
The Hierarchy
When to use each approach:
- Prompt engineering - Instant iteration (seconds)
- RAG/retrieval - Fresh knowledge, no retraining
- Prompt caching - Cost reduction for long contexts
- Multi-provider fallback - Resilience without lock-in
- Fine-tuning - Only after PMF, stable requirements, high volume
Key Takeaways
- Fine-tuning is easier than ever. That doesn't mean you should do it.
- Model iteration limits product iteration
- Pre-PMF, optimize for iteration speed
- Prompt caching makes context engineering cost-effective
- Save fine-tuning for post-PMF, high-volume, narrow tasks
- A 100KB prompt can do what you think requires training
Sources
- Context Engineering vs Fine-Tuning - When to use each approach
- Anthropic Prompt Caching - 90% cost reduction
- When Fine-Tuning Is Worth It - ROI analysis
- Why Not Fine-Tune - Pre-PMF considerations