
Learn how to configure AI models, set up OpenRouter API access, and choose the best model for your document processing needs.

OpenRouter Overview

The Meta-Data Tag Generator uses OpenRouter to access multiple AI models through a single API. OpenRouter provides:
  • Unified API: Access 200+ AI models with one API key
  • Flexible Pricing: Pay only for what you use, choose models by cost/performance
  • Model Fallbacks: Automatic failover if a model is unavailable
  • Rate Limiting: Built-in rate limit management

Getting Started with OpenRouter

1. Create an Account: Sign up at openrouter.ai
2. Generate API Key: Navigate to API Keys and create a new key
3. Add Credits (Optional): A free tier is available, or add credits at Billing for higher rate limits
4. Use API Key: Include your API key in the config.api_key parameter when processing documents

API Key Format

OpenRouter API keys start with sk-or-v1-:
sk-or-v1-1234567890abcdef1234567890abcdef1234567890abcdef1234567890ab
Keep your API key secure. Never commit it to version control or expose it in client-side code.
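Since malformed keys are a common cause of "Invalid API key" errors, a quick client-side sanity check can catch them before any request is sent. This is an illustrative sketch, not part of the shipped client; the helper name is hypothetical:

```python
def validate_api_key(key: str) -> str:
    """Catch obviously malformed keys (wrong prefix, stray
    whitespace from copy-paste) before any request is sent."""
    key = key.strip()
    if not key.startswith("sk-or-v1-"):
        raise ValueError("Expected an OpenRouter key starting with 'sk-or-v1-'")
    return key
```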

Supported Models

The system supports all OpenRouter models, but the following are recommended for tag generation:

Best for Speed and Cost

OpenAI GPT-4o Mini

Model ID: openai/gpt-4o-mini
Best for: General-purpose tagging, English documents
Speed: Very fast (2-4 seconds)
Cost: $0.15 per 1M input tokens
Strengths:
  • Excellent balance of speed and quality
  • Low cost for high-volume processing
  • Reliable and well-supported
Default model - Recommended for most use cases

Google Gemini Flash 1.5

Model ID: google/gemini-flash-1.5
Best for: Multilingual documents, Indian languages
Speed: Very fast (2-3 seconds)
Cost: $0.075 per 1M input tokens
Strengths:
  • Excellent with Hindi, Tamil, Telugu, and other Indian languages
  • Fastest processing speed
  • Lowest cost option
  • Great for scanned/OCR documents

Best for Quality

Anthropic Claude 3 Haiku

Model ID: anthropic/claude-3-haiku
Best for: Complex documents, legal texts
Speed: Fast (3-5 seconds)
Cost: $0.25 per 1M input tokens
Strengths:
  • Highest quality tag generation
  • Excellent understanding of context
  • Great for technical/legal documents
  • Superior at avoiding generic tags

Anthropic Claude 3.5 Sonnet

Model ID: anthropic/claude-3.5-sonnet
Best for: Premium quality, complex analysis
Speed: Medium (5-8 seconds)
Cost: $3.00 per 1M input tokens
Strengths:
  • Best-in-class quality
  • Deep contextual understanding
  • Ideal for critical documents
  • Most sophisticated tag selection

Model Configuration

Specify the model in your processing configuration:
import requests
import json

config = {
    "api_key": "sk-or-v1-...",
    "model_name": "google/gemini-flash-1.5",  # Choose your model
    "num_pages": 3,
    "num_tags": 8
}

data = {"config": json.dumps(config)}
headers = {"Authorization": f"Bearer {access_token}"}

# Open the PDF in a context manager so the file handle is closed after upload
with open("document.pdf", "rb") as pdf:
    response = requests.post(
        "http://localhost:8000/api/single/process",
        files={"pdf_file": pdf},
        data=data,
        headers=headers
    )

Model Selection Guide

Choose the right model based on your needs:
Recommended: openai/gpt-4o-mini
Best balance of speed, cost, and quality for English documents:
  • Business reports
  • Training manuals
  • Policy documents
  • General correspondence
config = {
    "model_name": "openai/gpt-4o-mini",
    "num_pages": 3,
    "num_tags": 8
}
Recommended: google/gemini-flash-1.5
Excellent support for Hindi, Tamil, Telugu, Bengali, and other Indian languages:
  • Government documents in regional languages
  • Multilingual reports
  • OCR-extracted text from scanned documents
config = {
    "model_name": "google/gemini-flash-1.5",
    "num_pages": 5,  # More pages for better context
    "num_tags": 10
}
Recommended: google/gemini-flash-1.5
Lowest cost for processing thousands of documents:
  • Batch processing large archives
  • Daily automated processing
  • Cost-sensitive deployments
config = {
    "model_name": "google/gemini-flash-1.5",
    "num_pages": 2,  # Reduce pages to save costs
    "num_tags": 6
}
Recommended: anthropic/claude-3.5-sonnet
Best quality regardless of cost:
  • Critical business documents
  • Executive summaries
  • High-stakes legal documents
  • Knowledge base curation
config = {
    "model_name": "anthropic/claude-3.5-sonnet",
    "num_pages": 10,
    "num_tags": 15
}

Unsupported Models

The following model types are NOT compatible with tag generation:
  • Reasoning models: deepseek-r1, deepseek-reasoner, o1-preview, o1-mini
  • Vision models: qwen-vl, qwen-2.5-vl, image analysis models
These models use different response formats incompatible with tag generation. Stick to chat/completion models.
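A pre-flight check that mirrors this incompatibility list can fail fast before any tokens are spent. The helper and marker list below are illustrative, derived from the model names above, not part of the system:

```python
# Substrings identifying reasoning/vision model families whose response
# formats the tag parser cannot handle (mirrors the list above).
INCOMPATIBLE_MARKERS = (
    "deepseek-r1", "deepseek-reasoner",
    "o1-preview", "o1-mini",
    "qwen-vl", "qwen-2.5-vl",
)

def is_supported_for_tagging(model_id: str) -> bool:
    """Return False for model IDs matching a known-incompatible family."""
    return not any(marker in model_id for marker in INCOMPATIBLE_MARKERS)
```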

Rate Limits and Pricing

Free Tier

OpenRouter provides a free tier with rate limits:
  • Free credits: Small amount for testing
  • Rate limits: 10-20 requests per minute (model-dependent)
  • Best for: Development, testing, small-scale use
If you hit rate limits frequently, the system automatically implements exponential backoff. Consider adding credits for production use.
Add credits for higher limits and better performance:

Higher Rate Limits

100+ requests per minute depending on model

Pay-as-you-go

Only pay for tokens used, no subscription

Priority Access

Faster processing during peak times

Cost Calculation

Estimate costs based on your usage:
Cost Estimator
def estimate_cost(num_documents, pages_per_doc, avg_words_per_page, model="openai/gpt-4o-mini"):
    """
    Estimate OpenRouter API cost for batch processing
    
    Pricing (per 1M input tokens):
    - gpt-4o-mini: $0.15
    - gemini-flash-1.5: $0.075
    - claude-3-haiku: $0.25
    - claude-3.5-sonnet: $3.00
    """
    # Rough estimate: 750 words = 1000 tokens
    tokens_per_doc = (pages_per_doc * avg_words_per_page) / 0.75
    total_tokens = num_documents * tokens_per_doc
    
    # Model pricing per 1M tokens (input)
    prices = {
        "openai/gpt-4o-mini": 0.15,
        "google/gemini-flash-1.5": 0.075,
        "anthropic/claude-3-haiku": 0.25,
        "anthropic/claude-3.5-sonnet": 3.00
    }
    
    cost_per_million = prices.get(model, 0.15)
    total_cost = (total_tokens / 1_000_000) * cost_per_million
    
    return {
        "documents": num_documents,
        "total_tokens": int(total_tokens),
        "estimated_cost": round(total_cost, 4),
        "cost_per_document": round(total_cost / num_documents, 6)
    }

# Example: 1000 documents, 3 pages each, 300 words/page
estimate = estimate_cost(
    num_documents=1000,
    pages_per_doc=3,
    avg_words_per_page=300,
    model="openai/gpt-4o-mini"
)

print(f"Total cost: ${estimate['estimated_cost']}")
print(f"Cost per document: ${estimate['cost_per_document']}")
# Output:
# Total cost: $0.18
# Cost per document: $0.00018

API Configuration

The system uses these OpenRouter API settings:
Backend Configuration
# OpenRouter endpoint
BASE_URL = "https://openrouter.ai/api/v1"

# Timeouts
CONNECT_TIMEOUT = 10  # seconds to establish connection
READ_TIMEOUT = 120    # seconds to wait for response

# Retry settings
MAX_RETRIES = 3       # retry failed requests
RETRY_DELAY = 2       # initial delay between retries

# Rate limiting
RETRY_DELAY_MULTIPLIER = 1.5  # exponential backoff
MAX_DELAY_BETWEEN_REQUESTS = 120  # cap delay at 2 minutes
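Put together, these settings imply a retry loop along the following lines. This is a minimal sketch, not the actual backend code; `send` stands in for one HTTP attempt (e.g. a requests.post call using timeout=(CONNECT_TIMEOUT, READ_TIMEOUT)) and returns the status code:

```python
import time

# Settings from the configuration block above
CONNECT_TIMEOUT, READ_TIMEOUT = 10, 120   # seconds
MAX_RETRIES = 3
RETRY_DELAY = 2
RETRY_DELAY_MULTIPLIER = 1.5

def call_with_retries(send, sleep=time.sleep):
    """Retry rate-limited attempts with exponential backoff.
    `send` performs one HTTP attempt and returns the status code;
    `sleep` is injectable for testing."""
    delay = RETRY_DELAY
    for attempt in range(MAX_RETRIES + 1):
        status = send()
        if status != 429:              # success, or an error a retry won't fix
            return status
        if attempt < MAX_RETRIES:
            sleep(delay)               # back off before the next attempt
            delay *= RETRY_DELAY_MULTIPLIER
    return status                      # still rate-limited after all retries
```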

Request Format

The system sends requests in OpenAI-compatible format:
OpenRouter Request
{
  "model": "openai/gpt-4o-mini",
  "messages": [
    {
      "role": "system",
      "content": "You are a document search-tagging expert..."
    },
    {
      "role": "user",
      "content": "Analyze the document below and return exactly 8 search tags..."
    }
  ],
  "max_tokens": 700,
  "temperature": 0.2
}
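A payload like the one above can be assembled programmatically. The sketch below is illustrative: the prompt wording is abbreviated with the same ellipses as the sample, and build_tagging_request is a hypothetical helper, not part of the system:

```python
def build_tagging_request(model: str, document_text: str, num_tags: int = 8) -> dict:
    """Assemble an OpenAI-compatible chat payload like the sample above."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "You are a document search-tagging expert..."},
            {"role": "user",
             "content": (f"Analyze the document below and return exactly "
                         f"{num_tags} search tags...\n\n{document_text}")},
        ],
        "max_tokens": 700,
        "temperature": 0.2,
    }
```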

Error Handling

Error: "Invalid API key"
Cause: API key is incorrect or not from OpenRouter
Solution:
  • Verify the API key starts with sk-or-v1-
  • Generate a new key at OpenRouter Keys
  • Check for typos or extra spaces
Error: "RATE_LIMITED: OpenRouter free tier limit hit"
Cause: Too many requests too quickly
Solution:
  • System automatically implements backoff
  • Add credits at OpenRouter Billing
  • Reduce batch size or add delays between requests
The system will retry automatically with exponential backoff.
Error: "Model not found: incorrect-model-name"
Cause: Model ID is incorrect or model is unavailable
Solution:
  • Check model ID at OpenRouter Models
  • Ensure model is currently available
  • Use exact model ID (case-sensitive)
Error: "Request timed out"
Cause: Model is slow or API is congested
Solution:
  • Reduce num_pages to decrease content size
  • Try a faster model like gemini-flash-1.5
  • System will retry automatically (up to 3 times)
Warning: "Model 'deepseek-r1' is likely incompatible for tagging tasks"
Cause: Using a reasoning or vision model
Solution:
  • Switch to a chat/completion model
  • Use recommended models: gpt-4o-mini, gemini-flash-1.5, claude-3-haiku
  • Reasoning models return empty/incompatible responses

Best Practices

Start with GPT-4o Mini

Begin with openai/gpt-4o-mini for testing and general use. Switch to Claude for higher quality or to Gemini for multilingual documents.

Optimize Page Count

More pages = better context but higher cost. Start with 3 pages, adjust based on document complexity.

Monitor API Usage

Track token usage and costs in the OpenRouter Dashboard.

Handle Rate Limits

The system auto-retries with backoff. For high-volume, add credits or implement request queuing.
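For high-volume clients, a simple pacing helper can keep outbound requests under the free-tier rate limit instead of relying on retries. This is an illustrative sketch; the 10 requests/minute default is an assumption taken from the free-tier range quoted above, so check your account's actual limits:

```python
import time

class Throttle:
    """Space outgoing requests evenly, e.g. 10 per minute on the
    free tier. Call wait() before each request."""

    def __init__(self, requests_per_minute=10, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / requests_per_minute  # seconds between requests
        self.clock = clock      # injectable for testing
        self.sleep = sleep      # injectable for testing
        self.last = None        # time of the previous request, if any

    def wait(self):
        """Block until at least `interval` seconds have passed since
        the previous call, then record the new send time."""
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()
```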

Model Comparison

| Model | Speed | Cost (per 1M tokens) | Quality | Best For |
|---|---|---|---|---|
| gpt-4o-mini | ⚡⚡⚡ Very Fast | $0.15 | ⭐⭐⭐⭐ Excellent | General purpose, English |
| gemini-flash-1.5 | ⚡⚡⚡ Very Fast | $0.075 | ⭐⭐⭐⭐ Excellent | Multilingual, high-volume |
| claude-3-haiku | ⚡⚡ Fast | $0.25 | ⭐⭐⭐⭐⭐ Superior | Legal, technical docs |
| claude-3.5-sonnet | ⚡ Medium | $3.00 | ⭐⭐⭐⭐⭐ Best | Premium quality, complex |
For most users, gpt-4o-mini offers the best balance. Use gemini-flash-1.5 for Indian languages or high-volume processing at lower cost.
