Overview
The StatisticsCalculator class provides statistical analysis for tokenized text: token, character, and word counts, cost estimates, context utilization, and model comparison.
This calculator works with data from 48 AI models and provides accurate cost estimates based on current pricing.
Constructor
Creates a new StatisticsCalculator instance.
Methods
calculateStatistics()
Calculates comprehensive statistics for the given text and model.
The original input text
Result object from TokenizationService.tokenizeText()
Model identifier (e.g., “gpt-4o”, “claude-3.5-sonnet”)
Comprehensive statistics object
Total number of tokens
Total number of characters
Total number of words
Estimated cost in USD for input tokens
Percentage of context window used (0-100)
Average tokens per word ratio
Cost per 1M input tokens in USD
Cost per 1M output tokens in USD
countWords()
Counts words in text by splitting on whitespace.
Text to analyze
Number of words (0 for empty text)
- Trims whitespace from text
- Splits on whitespace characters (\s+)
- Filters out empty strings
- Returns count
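The steps above can be sketched as a standalone function. This is a minimal illustration rather than the class's actual source; the string type guard is an added assumption.

```javascript
// Hypothetical standalone version of the word-counting steps listed above.
function countWords(text) {
  // Assumption: non-string input is treated as zero words.
  if (typeof text !== "string") return 0;
  const trimmed = text.trim();
  if (trimmed === "") return 0;
  // Split on runs of whitespace and drop any empty fragments.
  return trimmed.split(/\s+/).filter((word) => word.length > 0).length;
}

console.log(countWords("  Hello   tokenized world \n")); // 3
```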
calculateCost()
Calculates estimated cost based on token count and model pricing.
Number of tokens
Model information object from MODELS_DATA
Estimated cost in USD
Cost estimates are based on input token pricing. Output tokens typically cost more.
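As a rough sketch of the arithmetic, the estimate is the token count scaled by the price per 1M input tokens. The field name `inputCostPerMillion` and the price used here are assumptions for illustration, not the actual shape of MODELS_DATA.

```javascript
// Sketch: cost = (tokens / 1,000,000) * price per 1M input tokens.
// `inputCostPerMillion` is a hypothetical field name.
function calculateCost(tokenCount, modelInfo) {
  // Assumption: missing pricing data yields a cost of 0.
  if (!modelInfo || typeof modelInfo.inputCostPerMillion !== "number") return 0;
  return (tokenCount / 1e6) * modelInfo.inputCostPerMillion;
}

// Illustrative price of 2.50 USD per 1M input tokens.
const exampleModel = { inputCostPerMillion: 2.5 };
console.log(calculateCost(1000, exampleModel).toFixed(4)); // "0.0025"
```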
calculateContextUtilization()
Calculates the percentage of the model’s context window being used.
Number of tokens in the text
Maximum context window size for the model
Percentage from 0 to 100 (capped at 100)
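A minimal sketch of the capped percentage, assuming both arguments are plain numbers. The guard for a non-positive limit is an added assumption.

```javascript
// Sketch: percentage of the context window in use, capped at 100.
function calculateContextUtilization(tokenCount, contextLimit) {
  // Assumption: an unknown or non-positive limit reports 0% usage.
  if (!(contextLimit > 0)) return 0;
  return Math.min((tokenCount * 100) / contextLimit, 100);
}

console.log(calculateContextUtilization(32000, 128000)); // 25
console.log(calculateContextUtilization(200000, 128000)); // 100 (capped)
```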
exceedsContextLimit()
Checks if token count exceeds the model’s context limit.
Number of tokens
Model identifier
True if exceeds limit, false otherwise
getContextWarning()
Returns a warning message if context usage is high or exceeded.
Number of tokens
Model identifier
Warning message or null if no warning needed
- 100%+ (Exceeded)
- 90-99% (Near Limit)
- 75-89% (High Usage)
- Under 75% (No Warning)
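The thresholds above map onto a simple cascade of comparisons. The sketch below takes the context limit directly instead of a model identifier and returns only the level label; the real method's message text and model lookup are not shown in this page, so `contextWarningLevel` is a hypothetical helper.

```javascript
// Sketch: classify context usage using the thresholds listed above.
// Returns a level label, or null when usage is under 75%.
function contextWarningLevel(tokenCount, contextLimit) {
  // Uncapped percentage, so usage beyond the limit is still detectable.
  const percent = (tokenCount * 100) / contextLimit;
  if (percent >= 100) return "Exceeded";
  if (percent >= 90) return "Near Limit";
  if (percent >= 75) return "High Usage";
  return null;
}

console.log(contextWarningLevel(120000, 128000)); // "Near Limit"
```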
formatStatistics()
Formats statistics for display with proper localization and units.
Raw statistics object from calculateStatistics()
Formatted statistics with string values
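One plausible way to produce such strings is the standard `Intl.NumberFormat` API. The sketch below fixes the locale to en-US and uses hypothetical field names; the actual locale handling and output fields of formatStatistics() may differ.

```javascript
// Sketch only: field names are hypothetical and the locale is fixed to en-US.
function formatStatistics(stats) {
  const integer = new Intl.NumberFormat("en-US");
  // Four decimal places so that small per-request costs stay visible.
  const currency = new Intl.NumberFormat("en-US", {
    style: "currency",
    currency: "USD",
    minimumFractionDigits: 4,
    maximumFractionDigits: 4,
  });
  return {
    tokenCount: integer.format(stats.tokenCount),
    characterCount: integer.format(stats.characterCount),
    wordCount: integer.format(stats.wordCount),
    costEstimate: currency.format(stats.costEstimate),
    contextUtilization: `${stats.contextUtilization.toFixed(1)}%`,
  };
}

console.log(formatStatistics({
  tokenCount: 12345,
  characterCount: 50000,
  wordCount: 9000,
  costEstimate: 0.25,
  contextUtilization: 9.5,
}));
```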
compareModels()
Compares tokenization statistics across multiple models.
Text to analyze
Array of model IDs to compare
Tokenization service instance
Array of comparison objects sorted by cost (cheapest first)
Model identifier
Model provider (e.g., “OpenAI”, “Anthropic”)
Raw statistics object
Formatted statistics for display
Comparison results are automatically sorted by cost estimate, making it easy to find the most economical model for your text.
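The cheapest-first ordering can be sketched with a numeric comparator. The model IDs, providers, costs, and the `statistics.costEstimate` field path below are made up for illustration.

```javascript
// Sketch: order comparison entries by estimated cost, cheapest first.
function sortByCost(comparisons) {
  // Copy before sorting so the caller's array is left untouched.
  return [...comparisons].sort(
    (a, b) => a.statistics.costEstimate - b.statistics.costEstimate
  );
}

// Hypothetical comparison entries; values are not real pricing.
const results = sortByCost([
  { modelId: "model-a", provider: "Provider X", statistics: { costEstimate: 0.03 } },
  { modelId: "model-b", provider: "Provider Y", statistics: { costEstimate: 0.002 } },
  { modelId: "model-c", provider: "Provider X", statistics: { costEstimate: 0.01 } },
]);

console.log(results.map((entry) => entry.modelId)); // [ 'model-b', 'model-c', 'model-a' ]
```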
getEfficiencyMetrics()
Calculates efficiency metrics for tokenization analysis.
Statistics object from calculateStatistics()
Efficiency metrics object
Cost per thousand tokens (lower is better)
Tokens per character (lower = better compression)
Tokens per word (lower = more efficient encoding)
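These three ratios follow directly from the raw statistics. In the sketch below the property names are hypothetical, and the zero guards are an added assumption to avoid dividing by zero on empty text.

```javascript
// Sketch: derive efficiency ratios from a raw statistics object.
function getEfficiencyMetrics(stats) {
  const { tokenCount, characterCount, wordCount, costEstimate } = stats;
  return {
    // Cost scaled to a 1,000-token unit.
    costPerThousandTokens: tokenCount > 0 ? (costEstimate / tokenCount) * 1000 : 0,
    tokensPerCharacter: characterCount > 0 ? tokenCount / characterCount : 0,
    tokensPerWord: wordCount > 0 ? tokenCount / wordCount : 0,
  };
}

const metrics = getEfficiencyMetrics({
  tokenCount: 200,
  characterCount: 800,
  wordCount: 160,
  costEstimate: 0.0005,
});
console.log(metrics.tokensPerCharacter, metrics.tokensPerWord); // 0.25 1.25
```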
Usage Examples
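A minimal end-to-end sketch of how a statistics object with the fields described for calculateStatistics() might be assembled. It does not call the real class: `buildStatistics`, the model entry, its field names, and the token count are all assumptions made for illustration.

```javascript
// Hypothetical model entry; field names and values are illustrative only.
const model = {
  id: "example-model",
  contextLimit: 128000,
  inputCostPerMillion: 2.5,
  outputCostPerMillion: 10,
};

// Sketch: assemble a statistics object from text, a token count, and a model.
function buildStatistics(text, tokenCount, modelInfo) {
  const trimmed = text.trim();
  const wordCount = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  return {
    tokenCount,
    characterCount: text.length, // UTF-16 code units, which equals characters for ASCII text
    wordCount,
    costEstimate: (tokenCount / 1e6) * modelInfo.inputCostPerMillion,
    contextUtilization: Math.min((tokenCount * 100) / modelInfo.contextLimit, 100),
    tokensPerWord: wordCount > 0 ? tokenCount / wordCount : 0,
    inputCostPerMillion: modelInfo.inputCostPerMillion,
    outputCostPerMillion: modelInfo.outputCostPerMillion,
  };
}

const sampleText = "The quick brown fox jumps over the lazy dog.";
// 10 is an assumed token count; a real tokenizer would supply this value.
const stats = buildStatistics(sampleText, 10, model);
console.log(stats);
```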
Statistics Interpretation
Token Count
The total number of tokens the text is divided into. This directly impacts:
- API costs (priced per token)
- Processing time
- Context window usage
Typical sizes:
- Short prompt: 10-100 tokens
- Medium text: 100-1,000 tokens
- Long document: 1,000-10,000+ tokens
Character Count
Total number of characters, including spaces and punctuation.
Rule of thumb: English text averages ~4 characters per token.
Word Count
Number of words (whitespace-separated).
Rule of thumb: English text averages ~0.75 words per token (roughly 1.3 tokens per word).
Cost Estimate
Estimated API cost for processing the text.
Note: Based on input pricing. Output tokens cost more.
Cost ranges (GPT-4o):
- 1K tokens: ~$0.0025
- 10K tokens: ~$0.025
- 100K tokens: ~$0.25
Context Utilization
Percentage of the model’s context window being used.
Guidelines:
- Less than 50%: Comfortable usage
- 50-75%: Moderate usage
- 75-90%: High usage
- 90-100%: Near limit
- Greater than 100%: Exceeds limit (will fail)
Tokens Per Word
Average number of tokens per word.
Typical values:
- English: 1.3-1.5
- Code: 1.5-2.0
- Non-English: varies by language
Cost Optimization Tips
Choose Efficient Models
Compare models to find the best token-to-cost ratio for your use case
Minimize Prompt Length
Remove unnecessary context and instructions to reduce token count
Use Smaller Models
Consider mini variants (e.g., gpt-4o-mini) for simpler tasks
Batch Requests
Process multiple items in one request to reduce per-request overhead
See Also
TokenAnalyzer
Main application orchestrator
TokenizationService
Tokenization engine
UIController
UI management
Supported Models
View all model pricing