Context Strategies

Context strategies determine how conversation history is managed when it exceeds the model’s context window. They keep each prompt within the model’s token limit while preserving the most relevant conversation context.

Why Context Strategies?

LLMs have a maximum context length (e.g., 2048, 4096, or 8192 tokens). When your conversation grows beyond this limit, you need a strategy to:

Trim old messages while keeping recent context
Preserve important messages (like system prompts)
Maintain conversation coherence
Prevent context overflow errors

Available Strategies

React Native ExecuTorch provides three built-in context strategies:

NoopContextStrategy

No filtering or trimming - uses the entire message history as-is.

import { NoopContextStrategy } from 'react-native-executorch/utils';

llm.configure({
  chatConfig: {
    contextStrategy: new NoopContextStrategy(),
  },
});

Use when:

You manually manage conversation length
You have very short conversations
You’re certain the context won’t exceed limits

Behavior:

Prepends system prompt to the message history
No message removal or filtering
Ignores maxContextLength and token counts

Example:

// Input history: [msg1, msg2, msg3]
// System prompt: "You are helpful"
// Output: [system_prompt, msg1, msg2, msg3]

MessageCountContextStrategy

Retains a fixed number of the most recent messages.

import { MessageCountContextStrategy } from 'react-native-executorch/utils';

llm.configure({
  chatConfig: {
    contextStrategy: new MessageCountContextStrategy(10), // Keep last 10 messages
  },
});

Constructor:

new MessageCountContextStrategy(windowLength: number = 5)

Parameters:

windowLength: Maximum number of recent messages to keep (default: 5)

Use when:

You want simple, predictable context management
Message length is relatively uniform
You need fast, token-count-free trimming

Behavior:

Keeps the last windowLength messages
Removes older messages beyond the window
System prompt is always included
Does not consider actual token count

Example:

const strategy = new MessageCountContextStrategy(3);

// Input history: [msg1, msg2, msg3, msg4, msg5]
// Output: [system_prompt, msg3, msg4, msg5] (last 3 messages)
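
The windowing behavior can be sketched as a standalone function. This is a minimal illustration of the documented behavior, not the library’s source: keepLastMessages is a hypothetical name, and Message is a local copy of the type listed under Type Definitions.

```typescript
// Local copy of the Message shape from the Type Definitions section.
type Message = { role: 'user' | 'assistant' | 'system'; content: string };

// Hypothetical helper mirroring the documented behavior: keep the last
// `windowLength` messages and always put the system prompt first.
function keepLastMessages(
  systemPrompt: string,
  history: Message[],
  windowLength: number = 5
): Message[] {
  // slice(-0) would return the whole array, so handle a zero window explicitly.
  const recent = windowLength > 0 ? history.slice(-windowLength) : [];
  return [{ role: 'system', content: systemPrompt }, ...recent];
}

const chatHistory: Message[] = ['msg1', 'msg2', 'msg3', 'msg4', 'msg5'].map(
  (content, i): Message => ({ role: i % 2 === 0 ? 'user' : 'assistant', content })
);

const trimmed = keepLastMessages('You are helpful', chatHistory, 3);
const trimmedContents = trimmed.map((m) => m.content);
// trimmedContents: ['You are helpful', 'msg3', 'msg4', 'msg5']
```

Because the window counts messages rather than tokens, three very long messages can still exceed the context window. That is the trade-off for skipping token counting.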

SlidingWindowContextStrategy (Recommended)

Dynamically trims messages based on actual token count to fit within the model’s context window.

import { SlidingWindowContextStrategy } from 'react-native-executorch/utils';

llm.configure({
  chatConfig: {
    contextStrategy: new SlidingWindowContextStrategy(
      1000,  // Buffer tokens for generation
      false  // Don't allow orphaned assistant messages
    ),
  },
});

Constructor:

new SlidingWindowContextStrategy(
  bufferTokens: number,
  allowOrphanedAssistantMessages: boolean = false
)

Parameters:

bufferTokens: Number of tokens to reserve for model generation (e.g., 1000)
allowOrphanedAssistantMessages: Whether to allow assistant responses without their preceding user message

Use when:

You want optimal context utilization (recommended for most cases)
Messages vary in length
You want to prevent context overflow errors
You need to maximize context usage while leaving room for generation

Behavior:

Calculates exact token count of formatted messages
Removes oldest messages until tokens fit within: maxContextLength - bufferTokens
Optionally preserves user-assistant message pairs
System prompt is always included

Example:

const strategy = new SlidingWindowContextStrategy(
  1000, // Reserve 1000 tokens for generation
  false // Keep user-assistant pairs together
);

// Assume maxContextLength = 4096
// Token budget = 4096 - 1000 = 3096 tokens
// 
// The strategy will:
// 1. Start with full history
// 2. Calculate token count of [system_prompt, ...history]
// 3. If > 3096 tokens, remove oldest message
// 4. If orphaned assistant message, remove it too (when allowOrphaned=false)
// 5. Repeat until tokens <= 3096
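
The five steps listed in the comments above can be sketched as a plain function. This is an illustration under stated assumptions rather than the library’s implementation: slidingWindow and countWords are hypothetical names, Message is a local copy of the type from Type Definitions, and the word-based counter stands in for a real tokenizer.

```typescript
// Local copy of the Message shape from the Type Definitions section.
type Message = { role: 'user' | 'assistant' | 'system'; content: string };

// Hypothetical helper that follows the five steps listed above.
function slidingWindow(
  systemPrompt: string,
  history: Message[],
  maxContextLength: number,
  getTokenCount: (messages: Message[]) => number,
  bufferTokens: number,
  allowOrphanedAssistantMessages: boolean = false
): Message[] {
  const system: Message = { role: 'system', content: systemPrompt };
  const budget = maxContextLength - bufferTokens;
  const kept = [...history]; // copy so the caller's history is not mutated

  while (kept.length > 0 && getTokenCount([system, ...kept]) > budget) {
    kept.shift(); // drop the oldest message
    // If the new oldest message is an assistant reply whose user message
    // was just removed, drop it as well.
    while (!allowOrphanedAssistantMessages && kept[0]?.role === 'assistant') {
      kept.shift();
    }
  }
  return [system, ...kept];
}

// Stand-in token counter for the demo: one "token" per whitespace-separated word.
const countWords = (messages: Message[]): number =>
  messages.reduce(
    (sum, m) => sum + m.content.split(/\s+/).filter(Boolean).length,
    0
  );

const chatHistory: Message[] = [
  { role: 'user', content: 'one two three four' },
  { role: 'assistant', content: 'five six seven eight' },
  { role: 'user', content: 'nine ten' },
  { role: 'assistant', content: 'eleven twelve' },
];

// Budget = 12 - 4 = 8 "tokens"; the system prompt uses 2 of them.
const context = slidingWindow('Be brief', chatHistory, 12, countWords, 4);
const contextContents = context.map((m) => m.content);
// contextContents: ['Be brief', 'nine ten', 'eleven twelve']
```

In this sketch the loop stops once the history is empty, so a system prompt that alone exceeds the budget is still returned. How the library handles that case is not stated here.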

Comparison

Strategy	Token-Aware	Preserves Pairs	Complexity	Best For
NoopContextStrategy	No	N/A	O(1)	Manual management, short conversations
MessageCountContextStrategy	No	No	O(1)	Simple apps, uniform messages
SlidingWindowContextStrategy	Yes	Optional	O(n)	Production apps, optimal context usage

Implementation Details

Context Strategy Interface

All strategies implement this interface:

interface ContextStrategy {
  buildContext(
    systemPrompt: string,
    history: Message[],
    maxContextLength: number,
    getTokenCount: (messages: Message[]) => number
  ): Message[];
}

Parameters:

systemPrompt: The system instructions for the model
history: Complete conversation history
maxContextLength: Maximum tokens the model can handle
getTokenCount: Callback to calculate token count of messages

Returns:

Array of messages optimized for the context window
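
To show how the four parameters fit together, here is a small sketch that calls buildContext directly. The types are local copies of the definitions on this page. LatestOnlyWhenFull and approxTokenCount are hypothetical, and the rule of roughly 4 characters per token is an assumption standing in for the tokenizer-backed callback the library would pass.

```typescript
// Local copies of the types from the Type Definitions section.
type Message = { role: 'user' | 'assistant' | 'system'; content: string };

interface ContextStrategy {
  buildContext(
    systemPrompt: string,
    history: Message[],
    maxContextLength: number,
    getTokenCount: (messages: Message[]) => number
  ): Message[];
}

// Hypothetical strategy, for illustration only: send the full history when
// it fits, otherwise fall back to the single most recent message.
class LatestOnlyWhenFull implements ContextStrategy {
  buildContext(
    systemPrompt: string,
    history: Message[],
    maxContextLength: number,
    getTokenCount: (messages: Message[]) => number
  ): Message[] {
    const system: Message = { role: 'system', content: systemPrompt };
    const full = [system, ...history];
    if (getTokenCount(full) <= maxContextLength) return full;
    return [system, ...history.slice(-1)];
  }
}

// Stand-in counter (assumption): roughly 4 characters per token.
const approxTokenCount = (messages: Message[]): number =>
  messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);

const chatHistory: Message[] = [
  { role: 'user', content: 'Hello there!' },
  { role: 'assistant', content: 'Hi! How can I help?' },
  { role: 'user', content: 'Summarize our chat.' },
];

const strategy = new LatestOnlyWhenFull();
// A generous limit keeps everything; a tight one keeps only the last message.
const fits = strategy.buildContext('You are helpful', chatHistory, 100, approxTokenCount);
const overflows = strategy.buildContext('You are helpful', chatHistory, 10, approxTokenCount);
```

Calling a strategy this way, with a stub counter, is also a convenient way to unit-test trimming logic without loading a model.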

Orphaned Assistant Messages

When allowOrphanedAssistantMessages is false in SlidingWindowContextStrategy, the strategy ensures that the trimmed history never begins with an assistant message whose user message was removed:

BAD (orphaned):
[system_prompt, assistant_message, user_message, assistant_message]

GOOD (paired):
[system_prompt, user_message, assistant_message]

This prevents the model from seeing an assistant response without understanding what user question it was answering.
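
The check can be pictured as dropping assistant messages from the front of an already-trimmed history. This is a minimal sketch, not the library’s code: dropLeadingAssistantMessages is a hypothetical helper, and Message is a local copy of the type from Type Definitions.

```typescript
// Local copy of the Message shape from the Type Definitions section.
type Message = { role: 'user' | 'assistant' | 'system'; content: string };

// Hypothetical helper: drop assistant messages from the front of a trimmed
// history until it starts with a non-assistant message (or is empty).
function dropLeadingAssistantMessages(history: Message[]): Message[] {
  const firstNonAssistant = history.findIndex((m) => m.role !== 'assistant');
  return firstNonAssistant === -1 ? [] : history.slice(firstNonAssistant);
}

// The user message that prompted 'a1' has already been trimmed away.
const trimmedHistory: Message[] = [
  { role: 'assistant', content: 'a1' },
  { role: 'user', content: 'u2' },
  { role: 'assistant', content: 'a2' },
];

const paired = dropLeadingAssistantMessages(trimmedHistory);
const pairedContents = paired.map((m) => m.content);
// pairedContents: ['u2', 'a2']
```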

Practical Examples

Short Conversations

import { NoopContextStrategy } from 'react-native-executorch/utils';

// No context management needed
llm.configure({
  chatConfig: {
    systemPrompt: 'You are a helpful assistant.',
    contextStrategy: new NoopContextStrategy(),
  },
});

Simple Chat App

import { MessageCountContextStrategy } from 'react-native-executorch/utils';

// Keep last 15 messages
llm.configure({
  chatConfig: {
    systemPrompt: 'You are a friendly chatbot.',
    contextStrategy: new MessageCountContextStrategy(15),
  },
});

Production App (Recommended)

import { SlidingWindowContextStrategy } from 'react-native-executorch/utils';

// Token-aware strategy with optimal settings
llm.configure({
  chatConfig: {
    systemPrompt: 'You are an AI assistant.',
    contextStrategy: new SlidingWindowContextStrategy(
      2000, // Reserve 2000 tokens for response
      false // Keep conversation pairs together
    ),
  },
});

Long-Context Model

import { SlidingWindowContextStrategy } from 'react-native-executorch/utils';

// For models with large context windows (e.g., 8192 tokens)
llm.configure({
  chatConfig: {
    systemPrompt: 'You are an AI assistant with access to long context.',
    contextStrategy: new SlidingWindowContextStrategy(
      4000, // Reserve more tokens for longer responses
      false
    ),
  },
});

Custom Context Strategy

You can implement your own strategy by implementing the ContextStrategy interface:

import { ContextStrategy, Message } from 'react-native-executorch';

class CustomContextStrategy implements ContextStrategy {
  buildContext(
    systemPrompt: string,
    history: Message[],
    maxContextLength: number,
    getTokenCount: (messages: Message[]) => number
  ): Message[] {
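    // Example policy: keep the first 3 and the last 5 messages.
    const keepFirst = 3;
    const keepLast = 5;

    // Short histories are returned whole; otherwise the two slices would
    // overlap and the same messages would appear twice.
    if (history.length <= keepFirst + keepLast) {
      return [{ content: systemPrompt, role: 'system' }, ...history];
    }

    const beginning = history.slice(0, keepFirst);
    const end = history.slice(-keepLast);

    return [
      { content: systemPrompt, role: 'system' },
      ...beginning,
      ...end,
    ];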
  }
}

// Use it
llm.configure({
  chatConfig: {
    contextStrategy: new CustomContextStrategy(),
  },
});

Best Practices

Use SlidingWindowContextStrategy for production - It provides the most reliable context management
Set appropriate buffer tokens - Reserve enough tokens for the model’s response (1000-2000 is typical)
Consider conversation patterns - Set allowOrphanedAssistantMessages: false to preserve Q&A pairs
Monitor token usage - Use getTotalTokenCount() to understand your token consumption
Test with long conversations - Ensure your strategy handles extended conversations gracefully
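
When choosing a buffer, it helps to work out how many tokens remain for the system prompt and history. The sketch below uses the budget formula from the SlidingWindowContextStrategy section (maxContextLength - bufferTokens). historyTokenBudget is a hypothetical helper, and the validation rule is an assumption added for illustration.

```typescript
// Hypothetical helper: tokens left for the system prompt and history after
// reserving `bufferTokens` for the model's response.
function historyTokenBudget(
  maxContextLength: number,
  bufferTokens: number
): number {
  if (bufferTokens < 0 || bufferTokens >= maxContextLength) {
    throw new RangeError(
      'bufferTokens must be non-negative and smaller than maxContextLength'
    );
  }
  return maxContextLength - bufferTokens;
}

const roomy = historyTokenBudget(4096, 1000); // 3096 tokens for prompt + history
const tight = historyTokenBudget(2048, 2000); // 48 tokens: almost no history survives
```

A buffer that suits an 8192-token model can leave a 2048-token model with almost no room for history, so size the buffer relative to the model in use.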

Debugging Context Issues

If you encounter context-related errors, log token usage after each generation to see how close the conversation is to the context limit:

useEffect(() => {
  if (!llm.isGenerating && llm.response) {
    const promptTokens = llm.getPromptTokenCount();
    const generatedTokens = llm.getGeneratedTokenCount();
    const totalTokens = llm.getTotalTokenCount();
    
    console.log('Token usage:', {
      prompt: promptTokens,
      generated: generatedTokens,
      total: totalTokens,
      historyLength: llm.messageHistory.length,
    });
  }
}, [llm.isGenerating]);
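
The logged totals are easier to interpret as a fraction of the context window. The helpers below are a hypothetical sketch: contextUtilization and isNearLimit are not part of the library, the 0.9 threshold is an arbitrary assumption, and you must supply the model’s maximum context length yourself.

```typescript
// Hypothetical helper: fraction of the context window currently in use.
function contextUtilization(
  totalTokens: number,
  maxContextLength: number
): number {
  if (maxContextLength <= 0) {
    throw new RangeError('maxContextLength must be positive');
  }
  return totalTokens / maxContextLength;
}

// Hypothetical helper: flag conversations that are close to overflowing.
function isNearLimit(
  totalTokens: number,
  maxContextLength: number,
  threshold: number = 0.9
): boolean {
  return contextUtilization(totalTokens, maxContextLength) >= threshold;
}

const usage = contextUtilization(3072, 4096); // 0.75
const nearLimit = isNearLimit(3900, 4096); // true (about 95% used)
```

You could pass the value returned by llm.getTotalTokenCount() as totalTokens and log a warning once isNearLimit returns true.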

Type Definitions

interface ContextStrategy {
  buildContext(
    systemPrompt: string,
    history: Message[],
    maxContextLength: number,
    getTokenCount: (messages: Message[]) => number
  ): Message[];
}

interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}
