The Ollama provider enables LangExtract to use local models through Ollama, allowing you to run extractions completely offline without API keys or cloud services.
## Quick Start

```bash
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a model
ollama pull gemma2:2b

# 3. Start the server (runs automatically on most systems)
ollama serve
```

```python
import langextract as lx

result = lx.extract(
    text="Your document text",
    model_id="gemma2:2b",
    prompt_description="Extract key information",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
)
```
No API key required! Ollama runs models locally on your machine.
## Installation

### macOS and Linux

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

### Windows

Download the installer from ollama.com.

### Docker

```bash
docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```
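When running Ollama in Docker, models can be pulled inside the running container (this assumes the container name `ollama` from the `docker run` command above):

```shell
docker exec -it ollama ollama pull gemma2:2b
```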
## Model Selection

### Pulling Models

Before using a model, download it:

```bash
# Small, fast models (recommended for testing)
ollama pull gemma2:2b
ollama pull llama3.2:1b

# Medium models (good balance)
ollama pull llama3.2:3b
ollama pull mistral:7b

# Large models (better quality, slower)
ollama pull llama3.1:70b
ollama pull qwen2.5:7b
```
The Ollama provider supports two model ID formats:

- **Standard Ollama**: `llama3.2:1b`, `gemma2:2b`, `mistral:7b`
- **Hugging Face style**: `meta-llama/Llama-3.2-1B-Instruct`, `google/gemma-2b`

```python
import langextract as lx

# Standard Ollama format
result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
)

# Hugging Face format also works
result = lx.extract(
    text="Your text",
    model_id="google/gemma-2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
)
```
## Configuration Options

### Basic Parameters

```python
result = lx.extract(
    text="Your document",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    # Provider-specific parameters:
    model_url="http://localhost:11434",  # Ollama server URL
    temperature=0.1,                     # Sampling temperature
    timeout=120,                         # Request timeout (seconds)
    max_output_tokens=1000,              # Maximum tokens to generate
)
```
### Advanced Configuration

```python
import langextract as lx
from langextract.providers.ollama import OllamaLanguageModel

model = OllamaLanguageModel(
    model_id="gemma2:2b",
    model_url="http://localhost:11434",
    timeout=120,                          # Request timeout
    format_type=lx.data.FormatType.JSON,  # Output format
)

# Use with lx.extract
result = lx.extract(
    text="Your text",
    model=model,
    prompt_description="Extract data",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
)
```
### Timeout Settings

For larger models or complex prompts, increase the timeout:

```python
result = lx.extract(
    text="Your text",
    model_id="llama3.1:70b",  # Large model needs more time
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    timeout=300,  # 5 minutes
    model_url="http://localhost:11434",
)
```
## JSON Mode

Ollama supports JSON-structured output:

```python
import langextract as lx

result = lx.extract(
    text="Your document",
    model_id="gemma2:2b",
    prompt_description="Extract structured data",
    examples=[...],
    format_type=lx.data.FormatType.JSON,  # Enables JSON mode
    fence_output=False,
    use_schema_constraints=False,
)
```

Ollama's JSON mode produces an `{"extractions": [...]}` structure automatically. The provider uses `use_wrapper=True` for compatibility.
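To illustrate the wrapper shape, here is a hypothetical raw response in JSON mode (the fields inside each extraction follow your prompt and examples; this payload is made up for demonstration):

```python
import json

# Hypothetical raw model output: extractions are wrapped under a
# single top-level "extractions" key.
raw = '{"extractions": [{"extraction_class": "person", "extraction_text": "Jane Smith"}]}'

data = json.loads(raw)
print(list(data.keys()))                           # the wrapper key
print(data["extractions"][0]["extraction_class"])  # an individual extraction
```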
## Code Examples

```python
import langextract as lx

# Define your task
prompt = "Extract person names, locations, and dates in order of appearance."

examples = [
    lx.data.ExampleData(
        text="Dr. Jane Smith visited Paris on March 15, 2024.",
        extractions=[
            lx.data.Extraction(
                extraction_class="person",
                extraction_text="Dr. Jane Smith",
                attributes={"title": "Dr."},
            ),
            lx.data.Extraction(
                extraction_class="location",
                extraction_text="Paris",
                attributes={"type": "city"},
            ),
            lx.data.Extraction(
                extraction_class="date",
                extraction_text="March 15, 2024",
                attributes={"format": "full date"},
            ),
        ],
    )
]

# Run extraction with Ollama
result = lx.extract(
    text="Prof. John Doe traveled to London on April 20, 2024.",
    model_id="gemma2:2b",
    prompt_description=prompt,
    examples=examples,
    fence_output=False,
    use_schema_constraints=False,
)

print(f"Found {len(result.extractions)} extractions")
for ext in result.extractions:
    print(f"  {ext.extraction_class}: {ext.extraction_text}")
```
## Remote Ollama Server

```python
import langextract as lx

# Connect to Ollama on a different machine
result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    model_url="http://192.168.1.100:11434",  # Remote server
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
)
```
## Docker Compose Setup

Create a `docker-compose.yml` for production:

```yaml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 30s
      timeout: 10s
      retries: 3

  langextract:
    build: .
    depends_on:
      ollama:
        condition: service_healthy
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    command: python your_extraction_script.py

volumes:
  ollama_data:
```

Then in your Python code:

```python
import os

import langextract as lx

result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    model_url=os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
)
```
## Authentication (Proxied Instances)

For proxied Ollama instances that require authentication:

```python
import langextract as lx
from langextract.providers.ollama import OllamaLanguageModel

model = OllamaLanguageModel(
    model_id="gemma2:2b",
    model_url="https://your-proxy.example.com",
    api_key="your-api-key",
    auth_scheme="Bearer",         # or "Token", etc.
    auth_header="Authorization",  # Header name
)

result = lx.extract(
    text="Your text",
    model=model,
    prompt_description="Extract data",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
)
```

Native Ollama doesn't require authentication. Only use `api_key` for proxied instances.
## CPU Threads

```python
result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    num_threads=8,  # Utilize more CPU cores
)
```
## Context Window

```python
result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    num_ctx=4096,  # Increase context window (default: 2048)
)
```
## Keep-Alive

Control how long the model stays loaded:

```python
result = lx.extract(
    text="Your text",
    model_id="gemma2:2b",
    prompt_description="Extract entities",
    examples=[...],
    fence_output=False,
    use_schema_constraints=False,
    keep_alive=600,  # Keep loaded for 10 minutes (default: 300)
)
```
## Error Handling

```python
import langextract as lx
from langextract.core.exceptions import InferenceConfigError, InferenceRuntimeError

try:
    result = lx.extract(
        text="Your text",
        model_id="gemma2:2b",
        prompt_description="Extract data",
        examples=[...],
        fence_output=False,
        use_schema_constraints=False,
    )
except InferenceConfigError as e:
    # Configuration errors (model not found, invalid URL)
    if "Can't find Ollama" in str(e):
        print("Model not found. Run: ollama pull gemma2:2b")
    else:
        print(f"Configuration error: {e}")
except InferenceRuntimeError as e:
    # Runtime errors (timeouts, connection errors)
    if "timed out" in str(e):
        print("Request timed out. Try increasing the timeout parameter.")
    else:
        print(f"Runtime error: {e}")
```
## Troubleshooting

### Model Not Found

```
InferenceConfigError: Can't find Ollama gemma2:2b. Try: ollama run gemma2:2b
```

**Solution**: Pull the model first with `ollama pull gemma2:2b`.

### Connection Refused

```
InferenceRuntimeError: Ollama request failed: Connection refused
```

**Solutions**:

- Start Ollama: `ollama serve`
- Check if it's running: `curl http://localhost:11434/api/tags`
- Verify the URL: `model_url="http://localhost:11434"`
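The same reachability check can be run from Python. This is a minimal sketch using only the standard library; it probes the `/api/tags` endpoint mentioned above, and the helper name is our own:

```python
import urllib.error
import urllib.request


def ollama_is_running(url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers on its /api/tags endpoint."""
    try:
        with urllib.request.urlopen(f"{url}/api/tags", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_running())
```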
### Timeout Errors

```
InferenceRuntimeError: Ollama Model timed out (timeout=120)
```

**Solution**: Increase the timeout for larger models:

```python
result = lx.extract(
    text="Your text",
    model_id="llama3.1:70b",
    timeout=300,  # 5 minutes
    # ...
)
```
## Model Licenses

Ollama models ship under their own licenses. Review the license for any model you use.
## Direct Provider Usage

For advanced use cases, instantiate the provider directly:

```python
import langextract as lx
from langextract.providers.ollama import OllamaLanguageModel

model = OllamaLanguageModel(
    model_id="gemma2:2b",
    model_url="http://localhost:11434",
    format_type=lx.data.FormatType.JSON,
    timeout=120,
)

# Run inference on prompts; each result is a sequence of ScoredOutput objects
prompts = ["Extract entities from: ...", "Summarize: ..."]
for outputs in model.infer(prompts):
    for scored_output in outputs:
        print(f"Score: {scored_output.score}, Output: {scored_output.output}")
```
## Comparison with Cloud Providers

| Feature | Ollama | Gemini | OpenAI |
|---|---|---|---|
| API Key | Not required | Required | Required |
| Internet | Not required | Required | Required |
| Cost | Free (hardware only) | Pay per token | Pay per token |
| Privacy | Fully local | Cloud-based | Cloud-based |
| Speed | Depends on hardware | Fast | Fast |
| Model Selection | Open source models | Gemini family | GPT family |
| Schema Constraints | Not supported | Supported | Not supported |
## Next Steps

- **Provider Overview**: Learn about the provider architecture
- **Gemini Provider**: Use Google's Gemini models
- **OpenAI Provider**: Use OpenAI's GPT models
- **Custom Providers**: Create your own providers