Model Roles
Loom uses different models for different purposes:

Default Model
The primary reasoning engine. Handles complex planning, code understanding, and tool orchestration.
Weak Model
Fast, cheap model for simple tasks like commit messages, summaries, and sub-agent searches.
Architect Model
Strong model for planning in Architect Mode. Creates detailed edit plans.
Editor Model
Fast model for executing plans in Architect Mode. Follows instructions precisely.
Switching Models
You can change models in several ways:

In Configuration
Set defaults in `.loom.toml`:
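A minimal sketch of what such a file might contain; the section and key names are assumptions based on the model roles described above, not Loom's confirmed schema, and the Haiku model ID is inferred from the cost table below:

```toml
# Hypothetical .loom.toml fragment — key names are assumptions.
[models]
default = "anthropic:claude-sonnet-4-6"  # primary reasoning engine
weak = "anthropic:claude-haiku-4-5"      # commit messages, summaries, sub-agents
```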
During a Session (CLI)
Switch models mid-conversation with the `/model` command, e.g. `/model anthropic:claude-sonnet-4-6`.

During a Session (Web UI)
Use the model selector dropdown in the chat header.

Via API
Update the model programmatically via the API.

Model Format
All models use the format `<provider>:<model-id>`:
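For example (the Anthropic IDs appear verbatim later in this guide; the OpenAI and Groq IDs are illustrative assumptions):

```text
anthropic:claude-sonnet-4-6
anthropic:claude-opus-4-6
openai:gpt-4-turbo
groq:llama3-70b
```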
Recommended Configurations
Balanced (Default)
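A sketch of a balanced setup mapping each of the four roles from the Model Roles section; key names and the Haiku/editor model IDs are assumptions:

```toml
[models]
default = "anthropic:claude-sonnet-4-6"    # general coding tasks
weak = "anthropic:claude-haiku-4-5"        # summaries, commit messages, sub-agents
architect = "anthropic:claude-opus-4-6"    # detailed edit plans
editor = "anthropic:claude-haiku-4-5"      # fast, precise plan execution
```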
Good mix of performance and cost.

Performance-First
Best models, higher cost.

Budget-Conscious
Fast, cheap models.

OpenAI-Only
Using OpenAI models.

Local Models
Using Groq or local providers.

Model Characteristics
Anthropic Claude
Claude Opus 4-6 (most capable)
- Best reasoning and planning
- Excellent code understanding
- High cost ($15 per million input tokens)
- Use for: Complex refactoring, architecture decisions, architect mode

Claude Sonnet 4-6
- Strong code understanding
- Good tool use
- Moderate cost ($3 per million input tokens)
- Use for: Default agent model, general coding tasks

Claude Haiku 4-5
- Good for simple tasks
- Fast response times
- Low cost ($0.25 per million input tokens)
- Use for: Weak model, editor mode, sub-agents
OpenAI GPT
GPT-4 Turbo
- Strong reasoning
- Good code generation
- Moderate cost (~$10 per million input tokens)
- Use for: Default model, architect mode

GPT-3.5 Turbo
- Fast and cheap
- Good for simple tasks
- Low cost (~$0.50 per million input tokens)
- Use for: Weak model, editor mode
Google Gemini
Gemini 2.0 Flash
- Very fast
- Experimental features
- Free tier available
- Use for: Exploration, testing
Groq
LLaMA 3 70B
- Fast inference
- Open source
- Free tier available
- Use for: Budget-conscious workflows
Model Selection by Task
Code Review
Recommended: Claude Opus or GPT-4 Turbo
Why: Requires deep understanding, pattern recognition, and security awareness

Refactoring
Recommended: Claude Sonnet or GPT-4 Turbo
Why: Needs code understanding and planning, but not necessarily the most expensive model

Documentation Writing
Recommended: Claude Sonnet or GPT-3.5 Turbo
Why: Simpler task; doesn’t require the strongest reasoning

Test Generation
Recommended: Claude Sonnet or GPT-4 Turbo
Why: Needs to understand edge cases and failure modes

Codebase Exploration
Recommended: Claude Haiku or GPT-3.5 Turbo
Why: Mostly reading, summarizing, and pattern matching

Performance Considerations
Context Window Size
Different models have different context window limits:
- Claude Opus 4-6: 200K tokens
- Claude Sonnet 4-6: 200K tokens
- GPT-4 Turbo: 128K tokens
- GPT-3.5 Turbo: 16K tokens
Response Speed
Faster models mean less waiting:
- Fastest: Claude Haiku, Groq models (~1-2 seconds)
- Fast: Claude Sonnet, GPT-4 Turbo (~3-5 seconds)
- Slower: Claude Opus (~5-10 seconds)
Token Costs
Approximate costs per million tokens (as of 2024):

| Model | Input Cost | Output Cost |
|---|---|---|
| Claude Opus 4-6 | $15 | $75 |
| Claude Sonnet 4-6 | $3 | $15 |
| Claude Haiku 4-5 | $0.25 | $1.25 |
| GPT-4 Turbo | $10 | $30 |
| GPT-3.5 Turbo | $0.50 | $1.50 |
| Gemini 2.0 Flash | Free tier | Free tier |
| Groq LLaMA 3 70B | Free tier | Free tier |
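These rates make session costs easy to estimate from token counts. A quick sketch in Python using the Claude Sonnet rates from the table:

```python
# Estimate a session's cost from token counts and the table's
# per-million-token rates (Claude Sonnet 4-6: $3 input, $15 output).
def session_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Rates are dollars per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A session with 100K input and 20K output tokens on Sonnet:
cost = session_cost(100_000, 20_000, input_rate=3.0, output_rate=15.0)
print(f"${cost:.2f}")  # → $0.60
```

Note that output tokens dominate: at Sonnet's rates, 20K output tokens cost as much as 100K input tokens.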
Multi-Model Workflows
Loom’s architecture allows sophisticated multi-model patterns:

Architect → Editor Pattern
Use a strong model to plan, then a fast model to execute.

Main Agent + Sub-Agents
Use a strong default model with fast sub-agents for exploration.

Progressive Escalation
Start with a cheap model and escalate to more expensive models when needed:
- Start the session with Claude Haiku
- If the task is too complex, run `/model anthropic:claude-sonnet-4-6`
- If it is still struggling, run `/model anthropic:claude-opus-4-6`
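The architect → editor split can also be fixed ahead of time in configuration; a sketch, with assumed key names and an assumed Haiku model ID:

```toml
[models]
architect = "anthropic:claude-opus-4-6"  # strong model writes the plan
editor = "anthropic:claude-haiku-4-5"    # fast model applies the edits
```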
Monitoring Costs
Loom tracks token usage and costs for each session:

Via Web UI
Visit `/dashboard` to see:
- Per-session token usage
- Model usage breakdown
- Cumulative costs
Via CLI
Check session history from the CLI.

Via Database
Query the SQLite session database directly.

Best Practices
Start with the Default
The default configuration is carefully chosen to balance capability and cost.

Use Weak Models Aggressively
Sub-agents, commit messages, and summaries don’t need expensive models.

Match Model to Task Complexity
Don’t use Opus for simple tasks. Don’t use Haiku for complex refactoring.

Monitor Your Costs
Check `/dashboard` regularly to understand usage patterns and optimize model selection.
Test New Models
The LLM landscape evolves quickly. Try new models as they’re released.

Troubleshooting
Model Not Found
Error: `Model 'xyz' not found`
Solution: Check the model ID format. It must be `<provider>:<model-id>`:
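For example, switching with a correctly formatted ID (this exact command appears in the escalation steps above):

```text
/model anthropic:claude-sonnet-4-6
```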
API Key Missing
Error: `Provider 'anthropic' requires ANTHROPIC_API_KEY`
Solution: Set the environment variable:
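For example, in your shell or shell profile (the key value here is a placeholder; use your real key):

```shell
# Make the Anthropic API key available to Loom in this shell session.
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
```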
Rate Limit Errors
Error: `Rate limit exceeded`
Solutions:
- Switch to a different provider temporarily
- Use a slower model (fewer requests)
- Wait and retry
Poor Code Quality
If the model produces low-quality code:
- Try a stronger model: Switch from Haiku → Sonnet or Sonnet → Opus
- Check your prompts: Vague requests get vague results
- Use Architect Mode: Let a strong model plan, then execute
Next Steps
Architect Mode
Use two-model workflows for complex changes
Sub-Agents
Spawn lightweight agents for parallel exploration