The Ollama provider enables LangExtract to use local models through Ollama, allowing you to run extractions completely offline without API keys or cloud services.

Quick Start

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a model
ollama pull gemma2:2b

# 3. Start the server (runs automatically on most systems)
ollama serve

Then in your Python code:
import langextract as lx

result = lx.extract(
    text="Your document text",
    model_id="gemma2:2b",
    prompt_description="Extract key information",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False
)
No API key required! Ollama runs models locally on your machine.

Installation

macOS and Linux

curl -fsSL https://ollama.com/install.sh | sh

Windows

Download the installer from ollama.com

Docker

docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Model Selection

Pulling Models

Before using a model, download it:
# Small, fast models (recommended for testing)
ollama pull gemma2:2b
ollama pull llama3.2:1b

# Medium models (good balance)
ollama pull llama3.2:3b
ollama pull mistral:7b

# Large models (better quality, slower)
ollama pull llama3.1:70b
ollama pull qwen2.5:7b

Supported Model Formats

The Ollama provider supports:
  • Standard Ollama: llama3.2:1b, gemma2:2b, mistral:7b
  • Hugging Face style: meta-llama/Llama-3.2-1B-Instruct, google/gemma-2b
import langextract as lx

# Standard Ollama format
result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False
)

# Hugging Face format also works
result = lx.extract(
    text="Your text",
    model_id="google/gemma-2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False
)

Configuration Options

Basic Parameters

result = lx.extract(
    text="Your document",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    # Provider-specific parameters:
    model_url="http://localhost:11434",  # Ollama server URL
    temperature=0.1,                     # Sampling temperature
    timeout=120,                         # Request timeout (seconds)
    max_output_tokens=1000,              # Maximum tokens to generate
)

Advanced Configuration

from langextract.providers.ollama import OllamaLanguageModel
import langextract as lx

model = OllamaLanguageModel(
    model_id="gemma2:2b",
    model_url="http://localhost:11434",
    timeout=120,              # Request timeout
    format_type=lx.data.FormatType.JSON,  # Output format
)

# Use with lx.extract
result = lx.extract(
    text="Your text",
    model=model,
    prompt_description="Extract data",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False
)

Timeout Settings

For larger models or complex prompts, increase the timeout:
result = lx.extract(
    text="Your text",
    model_id="llama3.1:70b",  # Large model needs more time
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    timeout=300,  # 5 minutes
    model_url="http://localhost:11434"
)

JSON Mode

Ollama supports JSON-structured output:
import langextract as lx

result = lx.extract(
    text="Your document",
    model_id="gemma2:2b",
    prompt_description="Extract structured data",
    examples=[...],
    format_type=lx.data.FormatType.JSON,  # Enables JSON mode
    fence_output=False,
    use_schema_constraints=False
)
Ollama’s JSON mode produces the {"extractions": [...]} wrapper structure automatically; the provider sets use_wrapper=True for compatibility.
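For reference, a wrapped payload can be inspected with the standard json module. The payload below is illustrative example data, not captured from a real model response, and the inner field names are hypothetical:

```python
import json

# Illustrative JSON-mode output wrapped in the {"extractions": [...]} structure.
# The inner keys ("person", "location", ...) are example data, not a fixed schema.
raw = '''
{
  "extractions": [
    {"person": "Dr. Jane Smith", "person_attributes": {"title": "Dr."}},
    {"location": "Paris", "location_attributes": {"type": "city"}}
  ]
}
'''

payload = json.loads(raw)
print(len(payload["extractions"]))  # → 2
```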

Code Examples

Basic Extraction

import langextract as lx

# Define your task
prompt = "Extract person names, locations, and dates in order of appearance."

examples = [
    lx.data.ExampleData(
        text="Dr. Jane Smith visited Paris on March 15, 2024.",
        extractions=[
            lx.data.Extraction(
                extraction_class="person",
                extraction_text="Dr. Jane Smith",
                attributes={"title": "Dr."}
            ),
            lx.data.Extraction(
                extraction_class="location",
                extraction_text="Paris",
                attributes={"type": "city"}
            ),
            lx.data.Extraction(
                extraction_class="date",
                extraction_text="March 15, 2024",
                attributes={"format": "full date"}
            )
        ]
    )
]

# Run extraction with Ollama
result = lx.extract(
    text="Prof. John Doe traveled to London on April 20, 2024.",
    model_id="gemma2:2b",
    prompt_description=prompt,
    examples=examples,
    fence_output=False,
    use_schema_constraints=False
)

print(f"Found {len(result.extractions)} extractions")
for ext in result.extractions:
    print(f"{ext.extraction_class}: {ext.extraction_text}")

Remote Ollama Server

import langextract as lx

# Connect to Ollama on a different machine
result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    model_url="http://192.168.1.100:11434",  # Remote server
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False
)

Docker Compose Setup

Create a docker-compose.yml for production:
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 30s
      timeout: 10s
      retries: 3

  langextract:
    build: .
    depends_on:
      ollama:
        condition: service_healthy
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    command: python your_extraction_script.py

volumes:
  ollama_data:
Then in your Python code:
import langextract as lx
import os

result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    model_url=os.environ.get('OLLAMA_BASE_URL', 'http://localhost:11434'),
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False
)

Authentication (Proxied Instances)

For proxied Ollama instances that require authentication:
from langextract.providers.ollama import OllamaLanguageModel
import langextract as lx

model = OllamaLanguageModel(
    model_id="gemma2:2b",
    model_url="https://your-proxy.example.com",
    api_key="your-api-key",
    auth_scheme="Bearer",  # or "Token", etc.
    auth_header="Authorization"  # Header name
)

result = lx.extract(
    text="Your text",
    model=model,
    prompt_description="Extract data",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False
)
Native Ollama doesn’t require authentication. Only use api_key for proxied instances.

Performance Optimization

CPU Threads

result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    num_threads=8,  # Utilize more CPU cores
)

Context Window

result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    num_ctx=4096,  # Increase context window (default: 2048)
)

Keep-Alive

Control how long the model stays loaded:
result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    keep_alive=600,  # Keep loaded for 10 minutes (default: 300)
)

Error Handling

import langextract as lx
from langextract.core.exceptions import InferenceConfigError, InferenceRuntimeError

try:
    result = lx.extract(
        text="Your text",
        model_id="gemma2:2b",
        prompt_description="Extract data",
        examples=[...],
        fence_output=False,
        use_schema_constraints=False
    )
except InferenceConfigError as e:
    # Configuration errors (model not found, invalid URL)
    if "Can't find Ollama" in str(e):
        print("Model not found. Run: ollama pull gemma2:2b")
    else:
        print(f"Configuration error: {e}")
except InferenceRuntimeError as e:
    # Runtime errors (timeouts, connection errors)
    if "timed out" in str(e):
        print("Request timed out. Try increasing timeout parameter.")
    else:
        print(f"Runtime error: {e}")

Troubleshooting

Model Not Found

InferenceConfigError: Can't find Ollama gemma2:2b. Try: ollama run gemma2:2b
Solution: Pull the model first:
ollama pull gemma2:2b

Connection Refused

InferenceRuntimeError: Ollama request failed: Connection refused
Solutions:
  1. Start Ollama: ollama serve
  2. Check if running: curl http://localhost:11434/api/tags
  3. Verify URL: model_url="http://localhost:11434"
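You can also verify the server is reachable from Python before running extractions. This is a small illustrative helper using only the standard library; it is not part of langextract:

```python
import urllib.request
import urllib.error

def ollama_is_running(base_url="http://localhost:11434", timeout=3):
    """Return True if an Ollama server responds at base_url/api/tags."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if not ollama_is_running():
    print("Ollama is not reachable. Start it with: ollama serve")
```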

Timeout Errors

InferenceRuntimeError: Ollama Model timed out (timeout=120)
Solution: Increase timeout for larger models:
result = lx.extract(
    text="Your text",
    model_id="llama3.1:70b",
    timeout=300,  # 5 minutes
    # ...
)
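If occasional timeouts persist even with a generous limit, a simple retry wrapper can help smooth over transient slowness. This is an illustrative sketch, not a langextract feature:

```python
import time

def retry(fn, attempts=3, delay=5.0, exceptions=(Exception,)):
    """Call fn(), retrying up to `attempts` times on the given exceptions."""
    for i in range(attempts):
        try:
            return fn()
        except exceptions:
            if i == attempts - 1:
                raise  # Out of attempts: re-raise the last error
            time.sleep(delay)

# Usage (sketch): wrap the extract call in a zero-argument callable, e.g.
# result = retry(lambda: lx.extract(..., timeout=300), attempts=3)
```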

Model Licenses

Ollama models are distributed under their own licenses. Review the license terms for any model you use.

Direct Provider Usage

For advanced use cases, instantiate the provider directly:
from langextract.providers.ollama import OllamaLanguageModel
from langextract.core.types import ScoredOutput
import langextract as lx

model = OllamaLanguageModel(
    model_id="gemma2:2b",
    model_url="http://localhost:11434",
    format_type=lx.data.FormatType.JSON,
    timeout=120
)

# Run inference on prompts
prompts = ["Extract entities from: ...", "Summarize: ..."]
for outputs in model.infer(prompts):
    for scored_output in outputs:
        print(f"Score: {scored_output.score}, Output: {scored_output.output}")

Comparison with Cloud Providers

Feature              Ollama                 Gemini           OpenAI
API Key              Not required           Required         Required
Internet             Not required           Required         Required
Cost                 Free (hardware only)   Pay per token    Pay per token
Privacy              Fully local            Cloud-based      Cloud-based
Speed                Depends on hardware    Fast             Fast
Model Selection      Open-source models     Gemini family    GPT family
Schema Constraints   Not supported          Supported        Not supported

Next Steps

  • Provider Overview: learn about the provider architecture
  • Gemini Provider: use Google’s Gemini models
  • OpenAI Provider: use OpenAI’s GPT models
  • Custom Providers: create your own providers