Messages 1-10 are dropped (not deleted, just excluded from this request).
Evicted messages are invisible to the AI for that turn. If only messages #11-20 fit, the AI cannot reference anything said in message #5.

Future enhancement: Summarization will condense old messages instead of dropping them.
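The eviction pass can be pictured as a newest-first scan that keeps messages until the token budget runs out. This is an illustrative sketch, not Loom's actual implementation; the module name and the `count_tokens` callback are assumptions:

```elixir
defmodule EvictionSketch do
  # Keep the newest messages that fit within `budget_tokens`; everything
  # older is dropped from this request (not deleted from storage).
  def window(messages, budget_tokens, count_tokens) do
    messages
    |> Enum.reverse()
    |> Enum.reduce_while({[], 0}, fn msg, {kept, used} ->
      cost = count_tokens.(msg)

      if used + cost <= budget_tokens do
        {:cont, {[msg | kept], used + cost}}
      else
        {:halt, {kept, used}}
      end
    end)
    |> elem(0)
  end
end

# With a flat 10-token cost and a 25-token budget, only the two newest fit:
EvictionSketch.window(Enum.to_list(1..5), 25, fn _ -> 10 end)
# => [4, 5]
```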
```elixir
defp inject_decision_context(system_parts, session_id) do
  case Loom.Decisions.ContextBuilder.build(session_id) do
    {:ok, context} when is_binary(context) and context != "" ->
      system_parts ++ [context]

    _ ->
      system_parts
  end
end
```
What gets injected:
```
## Active Goals
- Add authentication to the API (confidence: 85%)
- Improve error handling (confidence: 90%)

## Recent Decisions
- [decision] Use JWT tokens for auth
- [action] Implement JWT middleware

## Session Context
[2026-02-28 10:30] goal: Add authentication to the API
[2026-02-28 10:35] decision: Use JWT tokens for auth (confidence: 85%)
```
Budget: Up to 1,024 tokens (truncated if exceeded).
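One way to picture the "truncate if exceeded" rule is below, approximating tokens as bytes divided by four. The real `ContextBuilder` counting and cut-off strategy are not shown in this document, so treat the module name and the 4-bytes-per-token ratio as assumptions:

```elixir
defmodule BudgetSketch do
  @chars_per_token 4

  # Clip `context` to roughly `max_tokens`, using a crude bytes/4 estimate.
  def truncate(context, max_tokens) do
    max_bytes = max_tokens * @chars_per_token

    if byte_size(context) <= max_bytes do
      context
    else
      binary_part(context, 0, max_bytes)
    end
  end
end
```

A production version would cut on line boundaries and avoid splitting multi-byte UTF-8 codepoints; this sketch only shows the budget check.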
```elixir
defp inject_repo_map(system_parts, project_path, opts) do
  case Loom.RepoIntel.RepoMap.generate(project_path, opts) do
    {:ok, repo_map} when is_binary(repo_map) and repo_map != "" ->
      system_parts ++ [repo_map]

    _ ->
      system_parts
  end
end
```
```elixir
def model_limit(model_string) when is_binary(model_string) do
  case LLMDB.model(model_string) do
    {:ok, %{limits: %{context: context}}} when is_integer(context) and context > 0 ->
      context

    _ ->
      @default_context_limit  # 128,000
  end
end
```
```elixir
build_messages(messages, system_prompt,
  max_repo_map_tokens: 1024  # Half the default
)
```
Trade-off: Less repository context, faster generation.
Use smaller models for simple tasks
Smaller context windows = faster, cheaper:
Haiku 4.5: 200K tokens
GPT-4o-mini: 128K tokens
Both are 10x cheaper than flagship models.
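Acting on this can be as simple as a routing table that maps task complexity to a model id. The model strings and task atoms below are purely illustrative assumptions, not part of Loom:

```elixir
defmodule ModelRouterSketch do
  # Cheap, smaller-window models for mechanical edits; a flagship model
  # only when the task genuinely needs deep reasoning.
  def model_for(task) when task in [:fix_typo, :rename_symbol, :format],
    do: "gpt-4o-mini"

  def model_for(_complex_task), do: "claude-sonnet-4-5"
end
```

Paired with `model_limit/1` above, the window budget then follows automatically from whichever model the router picks.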
Archive old sessions
Long-running sessions eventually hit the window limit. Start fresh:
```elixir
# Mark old session as archived
Loom.Session.Persistence.update_session(session, %{status: :archived})

# Start new session for the same project
Loom.Session.Manager.start_session(
  project_path: old_session.project_path,
  title: "Continue: #{old_session.title}"
)
```
Monitor token usage stats
Check session stats to see if you’re hitting limits:
```elixir
avg_tokens_per_message = 125
max_messages = div(budget.history, avg_tokens_per_message)
IO.puts("Can fit ~#{max_messages} messages")
# => Can fit ~1509 messages
```
If you set `max_repo_map_tokens: 10000`, you steal those tokens from conversation history. Balance wisely:
```elixir
# Good: Modest increases
max_repo_map_tokens: 3072   # +1024 from default
max_decision_tokens: 2048   # +1024 from default
# History: Still ~185K tokens

# Bad: Excessive allocation
max_repo_map_tokens: 50000  # Massive map
max_decision_tokens: 10000  # Huge decision context
# History: Only ~127K tokens left (lost 60K!)
```
Use architect mode for context-heavy tasks
Architect mode splits context across two calls:
Architect call: Full context (200K tokens)
Editor calls: Focused context per step (~10K tokens)
Total tokens used can be less than a single normal mode call.
Keep system prompts concise
The base system prompt counts against the 2,048-token system budget. Keep it under 1,000 tokens:
```elixir
# Good: ~500 tokens
system_prompt = """
You are Loom, an AI coding assistant.
Project: #{project_path}
Model: #{model}

Guidelines:
- Read files before editing
- Explain your reasoning
- Make minimal, focused changes
"""

# Bad: ~3000 tokens (steals from other zones!)
system_prompt = """
[20 paragraphs of detailed instructions...]
"""
```
Monitor evicted message count
If users complain that "the AI forgot something," check how many messages are being evicted:
```elixir
total_messages = length(messages)
windowed_messages = length(build_messages(...)) - 1  # -1 for system msg
evicted = total_messages - windowed_messages

if evicted > 50 do
  IO.warn("#{evicted} messages evicted—consider summarization or archiving")
end
```