The Ollama provider enables LangExtract to use local models through Ollama, allowing you to run extractions completely offline without API keys or cloud services.

Quick Start

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a model
ollama pull gemma2:2b

# 3. Start the server (runs automatically on most systems)
ollama serve

Then in your Python code:
import langextract as lx

result = lx.extract(
    text="Your document text",
    model_id="gemma2:2b",
    prompt_description="Extract key information",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False
)
No API key required! Ollama runs models locally on your machine.

Installation

macOS and Linux

curl -fsSL https://ollama.com/install.sh | sh

Windows

Download the installer from ollama.com

Docker

docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Model Selection

Pulling Models

Before using a model, download it:
# Small, fast models (recommended for testing)
ollama pull gemma2:2b
ollama pull llama3.2:1b

# Medium models (good balance)
ollama pull llama3.2:3b
ollama pull mistral:7b

# Large models (better quality, slower)
ollama pull llama3.1:70b
ollama pull qwen2.5:7b

Supported Model Formats

The Ollama provider supports:
  • Standard Ollama: llama3.2:1b, gemma2:2b, mistral:7b
  • Hugging Face style: meta-llama/Llama-3.2-1B-Instruct, google/gemma-2b
import langextract as lx

# Standard Ollama format
result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False
)

# Hugging Face format also works
result = lx.extract(
    text="Your text",
    model_id="google/gemma-2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False
)

Configuration Options

Basic Parameters

result = lx.extract(
    text="Your document",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    # Provider-specific parameters:
    model_url="http://localhost:11434",  # Ollama server URL
    temperature=0.1,                     # Sampling temperature
    timeout=120,                         # Request timeout (seconds)
    max_output_tokens=1000,              # Maximum tokens to generate
)

Advanced Configuration

from langextract.providers.ollama import OllamaLanguageModel
import langextract as lx

model = OllamaLanguageModel(
    model_id="gemma2:2b",
    model_url="http://localhost:11434",
    timeout=120,              # Request timeout
    format_type=lx.data.FormatType.JSON,  # Output format
)

# Use with lx.extract
result = lx.extract(
    text="Your text",
    model=model,
    prompt_description="Extract data",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False
)

Timeout Settings

For larger models or complex prompts, increase the timeout:
result = lx.extract(
    text="Your text",
    model_id="llama3.1:70b",  # Large model needs more time
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    timeout=300,  # 5 minutes
    model_url="http://localhost:11434"
)

JSON Mode

Ollama supports JSON-structured output:
import langextract as lx

result = lx.extract(
    text="Your document",
    model_id="gemma2:2b",
    prompt_description="Extract structured data",
    examples=[...],
    format_type=lx.data.FormatType.JSON,  # Enables JSON mode
    fence_output=False,
    use_schema_constraints=False
)
Ollama’s JSON mode produces the {"extractions": [...]} wrapper structure automatically; the provider sets use_wrapper=True for compatibility.
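For reference, a wrapped payload can be inspected with the standard json module. The payload below is illustrative example data, not captured from a real model response, and the inner field names are hypothetical:

```python
import json

# Illustrative JSON-mode output wrapped in the {"extractions": [...]} structure.
# The inner keys ("person", "location", ...) are example data, not a fixed schema.
raw = '''
{
  "extractions": [
    {"person": "Dr. Jane Smith", "person_attributes": {"title": "Dr."}},
    {"location": "Paris", "location_attributes": {"type": "city"}}
  ]
}
'''

payload = json.loads(raw)
print(len(payload["extractions"]))  # → 2
```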

Code Examples

Basic Extraction

import langextract as lx

# Define your task
prompt = "Extract person names, locations, and dates in order of appearance."

examples = [
    lx.data.ExampleData(
        text="Dr. Jane Smith visited Paris on March 15, 2024.",
        extractions=[
            lx.data.Extraction(
                extraction_class="person",
                extraction_text="Dr. Jane Smith",
                attributes={"title": "Dr."}
            ),
            lx.data.Extraction(
                extraction_class="location",
                extraction_text="Paris",
                attributes={"type": "city"}
            ),
            lx.data.Extraction(
                extraction_class="date",
                extraction_text="March 15, 2024",
                attributes={"format": "full date"}
            )
        ]
    )
]

# Run extraction with Ollama
result = lx.extract(
    text="Prof. John Doe traveled to London on April 20, 2024.",
    model_id="gemma2:2b",
    prompt_description=prompt,
    examples=examples,
    fence_output=False,
    use_schema_constraints=False
)

print(f"Found {len(result.extractions)} extractions")
for ext in result.extractions:
    print(f"{ext.extraction_class}: {ext.extraction_text}")

Remote Ollama Server

import langextract as lx

# Connect to Ollama on a different machine
result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    model_url="http://192.168.1.100:11434",  # Remote server
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False
)

Docker Compose Setup

Create a docker-compose.yml for production:
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 30s
      timeout: 10s
      retries: 3

  langextract:
    build: .
    depends_on:
      ollama:
        condition: service_healthy
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    command: python your_extraction_script.py

volumes:
  ollama_data:
Then in your Python code:
import langextract as lx
import os

result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    model_url=os.environ.get('OLLAMA_BASE_URL', 'http://localhost:11434'),
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False
)

Authentication (Proxied Instances)

For proxied Ollama instances that require authentication:
from langextract.providers.ollama import OllamaLanguageModel
import langextract as lx

model = OllamaLanguageModel(
    model_id="gemma2:2b",
    model_url="https://your-proxy.example.com",
    api_key="your-api-key",
    auth_scheme="Bearer",  # or "Token", etc.
    auth_header="Authorization"  # Header name
)

result = lx.extract(
    text="Your text",
    model=model,
    prompt_description="Extract data",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False
)
Native Ollama doesn’t require authentication. Only use api_key for proxied instances.

Performance Optimization

CPU Threads

result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    num_threads=8,  # Utilize more CPU cores
)

Context Window

result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    num_ctx=4096,  # Increase context window (default: 2048)
)

Keep-Alive

Control how long the model stays loaded:
result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    keep_alive=600,  # Keep loaded for 10 minutes (default: 300)
)

Error Handling

import langextract as lx
from langextract.core.exceptions import InferenceConfigError, InferenceRuntimeError

try:
    result = lx.extract(
        text="Your text",
        model_id="gemma2:2b",
        prompt_description="Extract data",
        examples=[...],
        fence_output=False,
        use_schema_constraints=False
    )
except InferenceConfigError as e:
    # Configuration errors (model not found, invalid URL)
    if "Can't find Ollama" in str(e):
        print("Model not found. Run: ollama pull gemma2:2b")
    else:
        print(f"Configuration error: {e}")
except InferenceRuntimeError as e:
    # Runtime errors (timeouts, connection errors)
    if "timed out" in str(e):
        print("Request timed out. Try increasing timeout parameter.")
    else:
        print(f"Runtime error: {e}")

Troubleshooting

Model Not Found

InferenceConfigError: Can't find Ollama gemma2:2b. Try: ollama run gemma2:2b
Solution: Pull the model first:
ollama pull gemma2:2b

Connection Refused

InferenceRuntimeError: Ollama request failed: Connection refused
Solutions:
  1. Start Ollama: ollama serve
  2. Check if running: curl http://localhost:11434/api/tags
  3. Verify URL: model_url="http://localhost:11434"
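You can also verify the server is reachable from Python before running extractions. This is a small illustrative helper using only the standard library; it is not part of langextract:

```python
import urllib.request
import urllib.error

def ollama_is_running(base_url="http://localhost:11434", timeout=3):
    """Return True if an Ollama server responds at base_url/api/tags."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if not ollama_is_running():
    print("Ollama is not reachable. Start it with: ollama serve")
```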

Timeout Errors

InferenceRuntimeError: Ollama Model timed out (timeout=120)
Solution: Increase timeout for larger models:
result = lx.extract(
    text="Your text",
    model_id="llama3.1:70b",
    timeout=300,  # 5 minutes
    # ...
)
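If occasional timeouts persist even with a generous limit, a simple retry wrapper can help smooth over transient slowness. This is an illustrative sketch, not a langextract feature:

```python
import time

def retry(fn, attempts=3, delay=5.0, exceptions=(Exception,)):
    """Call fn(), retrying up to `attempts` times on the given exceptions."""
    for i in range(attempts):
        try:
            return fn()
        except exceptions:
            if i == attempts - 1:
                raise  # Out of attempts: re-raise the last error
            time.sleep(delay)

# Usage (sketch): wrap the extract call in a zero-argument callable, e.g.
# result = retry(lambda: lx.extract(..., timeout=300), attempts=3)
```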

Model Licenses

Ollama models are distributed under their own licenses. Review the license terms for any model you use.

Direct Provider Usage

For advanced use cases, instantiate the provider directly:
from langextract.providers.ollama import OllamaLanguageModel
from langextract.core.types import ScoredOutput
import langextract as lx

model = OllamaLanguageModel(
    model_id="gemma2:2b",
    model_url="http://localhost:11434",
    format_type=lx.data.FormatType.JSON,
    timeout=120
)

# Run inference on prompts
prompts = ["Extract entities from: ...", "Summarize: ..."]
for outputs in model.infer(prompts):
    for scored_output in outputs:
        print(f"Score: {scored_output.score}, Output: {scored_output.output}")

Comparison with Cloud Providers

Feature              Ollama                 Gemini           OpenAI
API Key              Not required           Required         Required
Internet             Not required           Required         Required
Cost                 Free (hardware only)   Pay per token    Pay per token
Privacy              Fully local            Cloud-based      Cloud-based
Speed                Depends on hardware    Fast             Fast
Model Selection      Open-source models     Gemini family    GPT family
Schema Constraints   Not supported          Supported        Not supported

Next Steps

  • Provider Overview: learn about the provider architecture
  • Gemini Provider: use Google’s Gemini models
  • OpenAI Provider: use OpenAI’s GPT models
  • Custom Providers: create your own providers