The Local AI skill pack enables completely local AI inference without external API dependencies. Run language models and speech-to-text on your own hardware.

Included Services

Ollama

Local LLM inference for chat and embeddings

Whisper

Speech-to-text transcription

Skills Provided

Ollama Local LLM

Capabilities:
  • Chat completion
  • Text generation
  • Code generation
  • Text embeddings for RAG
  • JSON-structured output
  • Multi-turn conversations
  • Streaming responses
Example Usage:
# Chat completion
curl -X POST "http://ollama:11434/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing"}
    ],
    "stream": false
  }'

# Text generation
curl -X POST "http://ollama:11434/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codellama",
    "prompt": "Write a Python function to calculate fibonacci numbers",
    "stream": false
  }'

# Generate embeddings
curl -X POST "http://ollama:11434/api/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": ["Text to embed", "Another text"]
  }'

# JSON output mode
curl -X POST "http://ollama:11434/api/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "user", "content": "List 3 programming languages"}
    ],
    "format": "json",
    "stream": false
  }'
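
The capability list mentions streaming, while every example above sets "stream": false. With streaming enabled, Ollama returns newline-delimited JSON: one object per chunk, with the final chunk marked done: true. The sketch below reassembles such a body into a single message. The function name collectStreamedChat and the sample input are illustrative assumptions, not part of the pack.

```javascript
// Join the partial message contents from an Ollama streaming chat response.
// `ndjson` is the raw response body: one JSON object per line.
// collectStreamedChat is a hypothetical helper name used for illustration.
function collectStreamedChat(ndjson) {
  let content = '';
  let done = false;
  for (const line of ndjson.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines between chunks
    const chunk = JSON.parse(line);
    // Each chunk carries a fragment of the assistant message.
    content += chunk.message?.content ?? '';
    if (chunk.done) done = true; // the final chunk sets done: true
  }
  return { content, done };
}

// Illustrative input shaped like a two-chunk stream (not real model output).
const sampleStream = [
  '{"model":"llama3.2","message":{"role":"assistant","content":"Hel"},"done":false}',
  '{"model":"llama3.2","message":{"role":"assistant","content":"lo"},"done":true}',
].join('\n');

console.log(collectStreamedChat(sampleStream)); // { content: 'Hello', done: true }
```

In a real client the body would arrive incrementally, so lines would be parsed as they are received rather than after the whole response has been buffered.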

Whisper Transcribe

Capabilities:
  • Audio transcription
  • Multiple language support
  • Speaker diarization (engine-dependent; base Whisper models do not label speakers)
  • Timestamp generation
  • Various audio formats
  • Subtitle generation (SRT, VTT)
Example Usage:
# Transcribe audio file
curl -X POST "http://whisper:9000/asr?task=transcribe&language=en&output=json" \
  -F "audio_file=@/data/audio/recording.mp3"

# Response:
{
  "text": "Hello, this is a test recording.",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, this is a test recording."
    }
  ]
}

# Transcribe with timestamps (SRT format)
curl -X POST "http://whisper:9000/asr?task=transcribe&language=en&output=srt" \
  -F "audio_file=@/data/audio/recording.mp3"

# Translate to English
curl -X POST "http://whisper:9000/asr?task=translate&output=json" \
  -F "audio_file=@/data/audio/spanish.mp3"
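
The JSON response shown above carries start and end times in seconds for each segment. When subtitles need to be built from an already stored JSON result rather than by calling the service again with output=srt, the segments can be converted client-side. This is a minimal sketch; toSrtTimestamp and segmentsToSrt are hypothetical helper names.

```javascript
// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width) => String(n).padStart(width, '0');
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Turn an array of { start, end, text } segments into SRT cues.
// Cues are numbered from 1 and separated by a blank line.
function segmentsToSrt(segments) {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${toSrtTimestamp(seg.start)} --> ${toSrtTimestamp(seg.end)}\n${seg.text.trim()}\n`)
    .join('\n');
}

// Segment taken from the example response above.
const srt = segmentsToSrt([
  { start: 0.0, end: 2.5, text: 'Hello, this is a test recording.' },
]);
console.log(srt);
```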

Use Cases

RAG (Retrieval-Augmented Generation)

Build a complete local RAG system:
# 1. Generate embeddings for documents
curl -X POST "http://ollama:11434/api/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": ["Document content here..."]
  }'

# 2. Store in Qdrant (from Knowledge Base pack)
curl -X PUT "http://qdrant:6333/collections/docs/points" \
  -H "Content-Type: application/json" \
  -d '{
    "points": [{
      "id": 1,
      "vector": [...embedding...],
      "payload": {"text": "Document content"}
    }]
  }'

# 3. Query: Generate query embedding
QUERY_EMBEDDING=$(curl -s -X POST "http://ollama:11434/api/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": ["What is quantum computing?"]
  }' | jq -r '.embeddings[0]')

# 4. Search Qdrant (with_payload returns the stored text with each hit)
CONTEXT=$(curl -s -X POST "http://qdrant:6333/collections/docs/points/search" \
  -H "Content-Type: application/json" \
  -d "{
    \"vector\": $QUERY_EMBEDDING,
    \"limit\": 5,
    \"with_payload\": true
  }" | jq -r '[.result[].payload.text] | join("\n")')

# 5. Generate answer with context (jq escapes the context so the JSON stays valid)
jq -n --arg context "$CONTEXT" '{
  model: "llama3.2",
  messages: [
    {role: "system", content: ("Answer based on this context: " + $context)},
    {role: "user", content: "What is quantum computing?"}
  ],
  stream: false
}' | curl -X POST "http://ollama:11434/api/chat" \
  -H "Content-Type: application/json" \
  -d @-
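
The final step of the pipeline turns search hits into a prompt. The same step can be sketched in application code, assuming the Qdrant search was made with with_payload: true so that each hit carries payload.text. The function name buildRagMessages, the passage numbering, and the sample response are illustrative assumptions.

```javascript
// Build chat messages from a Qdrant search response and a user question.
// Assumes each hit has payload.text (requires with_payload: true on the search).
// buildRagMessages is a hypothetical helper name.
function buildRagMessages(searchResponse, question) {
  const passages = (searchResponse.result ?? [])
    .map((hit) => hit.payload?.text)
    .filter((text) => typeof text === 'string' && text.length > 0);
  // Number the passages so the model can tell them apart.
  const context = passages.map((text, i) => `[${i + 1}] ${text}`).join('\n');
  return [
    { role: 'system', content: `Answer based on this context:\n${context}` },
    { role: 'user', content: question },
  ];
}

// Illustrative response shape; ids, scores and texts are made up.
const sampleSearch = {
  result: [
    { id: 1, score: 0.91, payload: { text: 'Qubits can be in superposition.' } },
    { id: 2, score: 0.84, payload: { text: 'Entanglement links qubit states.' } },
    { id: 3, score: 0.5 }, // hit without payload is skipped
  ],
};

const ragMessages = buildRagMessages(sampleSearch, 'What is quantum computing?');
console.log(JSON.stringify({ model: 'llama3.2', messages: ragMessages, stream: false }));
```

Passing the array through JSON.stringify handles quoting and newlines in the retrieved text.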

Video Transcription Pipeline

Combine with Video Creator pack:
# 1. Extract audio from video (FFmpeg)
ffmpeg -i /data/videos/lecture.mp4 \
  -vn -ar 16000 -ac 1 \
  /data/audio/lecture.wav

# 2. Transcribe with Whisper
curl -X POST "http://whisper:9000/asr?task=transcribe&language=en&output=srt" \
  -F "audio_file=@/data/audio/lecture.wav" \
  -o /data/subtitles/lecture.srt

# 3. Burn subtitles into video
ffmpeg -i /data/videos/lecture.mp4 \
  -vf "subtitles=/data/subtitles/lecture.srt" \
  /data/output/lecture_subtitled.mp4

# 4. Generate summary with Ollama (jq escapes quotes and newlines in the transcript)
jq -n --rawfile transcript /data/subtitles/lecture.srt '{
  model: "llama3.2",
  messages: [
    {role: "system", content: "Summarize this lecture transcript."},
    {role: "user", content: $transcript}
  ],
  stream: false
}' | curl -X POST "http://ollama:11434/api/chat" \
  -H "Content-Type: application/json" \
  -d @-

Code Assistant

Local code generation and review:
# Generate code
curl -X POST "http://ollama:11434/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codellama",
    "prompt": "Write a REST API endpoint in Python using FastAPI for user registration",
    "stream": false
  }'

# Code review (jq reads app.py and escapes it, so quotes and newlines cannot break the JSON)
jq -n --rawfile code app.py '{
  model: "codellama",
  messages: [
    {role: "system", content: "You are a code reviewer. Find bugs and suggest improvements."},
    {role: "user", content: ("Review this code:\n" + $code)}
  ],
  stream: false
}' | curl -X POST "http://ollama:11434/api/chat" \
  -H "Content-Type: application/json" \
  -d @-

Chatbot with Memory

Build a stateful chatbot:
// Store conversation in Redis (from DevOps pack)
// Assumes the ioredis client and a Redis service reachable at redis:6379
import Redis from 'ioredis';

const redis = new Redis({ host: 'redis', port: 6379 });

async function chat(userId, userMessage) {
  const conversationKey = `chat:${userId}:history`;

  // Add user message
  await redis.rpush(conversationKey, JSON.stringify({
    role: 'user',
    content: userMessage
  }));

  // Get the last 10 messages of the conversation history
  const history = await redis.lrange(conversationKey, -10, -1);
  const messages = history.map((entry) => JSON.parse(entry));

  // Generate response
  const response = await fetch('http://ollama:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llama3.2',
      messages: messages,
      stream: false
    })
  });
  if (!response.ok) {
    throw new Error(`Ollama request failed with status ${response.status}`);
  }

  // Store assistant response
  const answer = await response.json();
  await redis.rpush(conversationKey, JSON.stringify({
    role: 'assistant',
    content: answer.message.content
  }));

  // Set expiry (24 hours)
  await redis.expire(conversationKey, 86400);

  return answer.message.content;
}
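
The chatbot sends only the most recent ten stored messages, so any instruction kept at the start of the conversation eventually drops out of the window. One possible approach, sketched here under that assumption, is to keep the system prompt outside the stored history and prepend it on every request. buildChatWindow and its parameters are hypothetical names.

```javascript
// Prepend a fixed system prompt to the most recent messages.
// history: array of { role, content } in chronological order.
// buildChatWindow is a hypothetical helper name.
function buildChatWindow(history, systemPrompt, maxMessages = 10) {
  // slice(-0) would return the whole array, so handle 0 explicitly.
  const recent = maxMessages > 0 ? history.slice(-maxMessages) : [];
  return [{ role: 'system', content: systemPrompt }, ...recent];
}

// Illustrative history of 12 alternating messages.
const sampleHistory = Array.from({ length: 12 }, (_, i) => ({
  role: i % 2 === 0 ? 'user' : 'assistant',
  content: `message ${i}`,
}));

const windowed = buildChatWindow(sampleHistory, 'You are a helpful assistant.');
console.log(windowed.length);     // 11
console.log(windowed[1].content); // message 2
```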

General Purpose

| Model | Size | Use Case | Memory |
|---|---|---|---|
| llama3.2 | 3B | Fast chat and reasoning | 4 GB |
| llama3.1:70b | 70B | Complex reasoning | 40 GB |
| mistral | 7B | Balanced performance | 5 GB |
| phi3 | 3.8B | Efficient reasoning | 4 GB |

Code Generation

| Model | Size | Use Case | Memory |
|---|---|---|---|
| codellama | 7B | Code generation | 5 GB |
| codellama:13b | 13B | Advanced code tasks | 8 GB |
| deepseek-coder | 6.7B | Multi-language coding | 5 GB |

Embeddings

| Model | Size | Dimensions | Memory |
|---|---|---|---|
| nomic-embed-text | 137M | 768 | 1 GB |
| mxbai-embed-large | 335M | 1024 | 2 GB |
| all-minilm | 23M | 384 | 512 MB |

Managing Models

# List installed models
curl "http://ollama:11434/api/tags"

# Pull a new model
curl -X POST "http://ollama:11434/api/pull" \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.2"}'

# Delete a model
curl -X DELETE "http://ollama:11434/api/delete" \
  -H "Content-Type: application/json" \
  -d '{"name": "old-model"}'

# Show model info
curl -X POST "http://ollama:11434/api/show" \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.2"}'
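
A common use of the list endpoint is checking whether a model is already installed before pulling it. The sketch below works on the parsed /api/tags response, which lists models under a models array with a name such as llama3.2:latest. The helper names and sample data are illustrative assumptions, and names that include a registry host with a port are not handled.

```javascript
// Treat a name without an explicit tag as ":latest", matching Ollama's default tag.
function normalizeModelName(name) {
  return name.includes(':') ? name : `${name}:latest`;
}

// Check a parsed /api/tags response for a given model.
// isModelInstalled is a hypothetical helper name.
function isModelInstalled(tagsResponse, model) {
  const wanted = normalizeModelName(model);
  return (tagsResponse.models ?? []).some(
    (entry) => normalizeModelName(entry.name) === wanted
  );
}

// Illustrative response; a real one carries more fields per model.
const sampleTags = {
  models: [{ name: 'llama3.2:latest' }, { name: 'codellama:13b' }],
};

console.log(isModelInstalled(sampleTags, 'llama3.2'));      // true
console.log(isModelInstalled(sampleTags, 'codellama'));     // false
console.log(isModelInstalled(sampleTags, 'codellama:13b')); // true
```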

Configuration

Environment Variables

# Ollama
OLLAMA_HOST=ollama
OLLAMA_PORT=11434
OLLAMA_MODELS=/data/ollama/models  # Model storage

# Whisper
WHISPER_HOST=whisper
WHISPER_PORT=9000
WHISPER_MODEL=base  # tiny, base, small, medium, large
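
Application code can derive the service base URLs from these variables instead of hard-coding them. A minimal sketch, assuming the variables hold a plain hostname and port as shown above; serviceUrls is a hypothetical helper, and the fallback values are the defaults listed in this section.

```javascript
// Build base URLs for the pack's services from environment variables.
// serviceUrls is a hypothetical helper name; defaults mirror the values above.
function serviceUrls(env) {
  const ollamaHost = env.OLLAMA_HOST ?? 'ollama';
  const ollamaPort = env.OLLAMA_PORT ?? '11434';
  const whisperHost = env.WHISPER_HOST ?? 'whisper';
  const whisperPort = env.WHISPER_PORT ?? '9000';
  return {
    ollama: `http://${ollamaHost}:${ollamaPort}`,
    whisper: `http://${whisperHost}:${whisperPort}`,
  };
}

// In a service this would be called as serviceUrls(process.env).
const defaultUrls = serviceUrls({});
console.log(defaultUrls.ollama);  // http://ollama:11434
console.log(defaultUrls.whisper); // http://whisper:9000
```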

Volume Mounts

Models persist across restarts:
services:
  ollama:
    volumes:
      - ollama_models:/root/.ollama
  
  whisper:
    volumes:
      - whisper_models:/root/.cache/whisper

volumes:
  ollama_models:
  whisper_models:

Memory Requirements

Ollama

Memory depends on model size:
  • Small models (3B-7B): 4-6 GB
  • Medium models (13B-30B): 10-20 GB
  • Large models (70B+): 40+ GB
GPU acceleration recommended for larger models.

Whisper

Memory depends on model variant:
  • tiny: ~1 GB
  • base: ~1 GB
  • small: ~2 GB
  • medium: ~5 GB
  • large: ~10 GB
Total Pack: ~4-8 GB minimum (with small models)
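
The variant figures above can drive a simple selection rule: pick the largest Whisper variant whose approximate footprint fits the memory budget. This is only a sketch built on the rough numbers listed in this section; pickWhisperModel is a hypothetical helper, and real usage varies with audio length and runtime.

```javascript
// Approximate memory per Whisper variant in GB, taken from the list above,
// ordered from smallest to largest.
const WHISPER_MEMORY_GB = [
  ['tiny', 1],
  ['base', 1],
  ['small', 2],
  ['medium', 5],
  ['large', 10],
];

// Return the largest variant that fits, or null if none does.
// pickWhisperModel is a hypothetical helper name.
function pickWhisperModel(availableGb) {
  let choice = null;
  for (const [name, gb] of WHISPER_MEMORY_GB) {
    if (gb <= availableGb) choice = name;
  }
  return choice;
}

console.log(pickWhisperModel(4));   // small
console.log(pickWhisperModel(16));  // large
console.log(pickWhisperModel(0.5)); // null
```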

Performance Tips

Ollama

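  • Use a GPU if available, for example: docker run --gpus all
  • Set the num_gpu option to control how many model layers are offloaded to the GPU
  • Lower temperature for more consistent output
  • Set a seed for reproducible results
  • Set stream: false to receive the complete response as a single JSON object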
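
The temperature and seed tips correspond to fields of the options object in an Ollama request body. The sketch below builds such a body; buildReproducibleChatRequest is a hypothetical helper, and the default values of 0 and 42 are arbitrary choices. Identical output is still only likely when the model and hardware stay the same.

```javascript
// Build a chat request body with a fixed seed and low temperature.
// buildReproducibleChatRequest is a hypothetical helper name;
// the defaults (temperature 0, seed 42) are arbitrary illustrative values.
function buildReproducibleChatRequest(model, messages, { seed = 42, temperature = 0 } = {}) {
  return {
    model,
    messages,
    stream: false, // return one complete JSON object
    options: { temperature, seed },
  };
}

const reproducibleBody = buildReproducibleChatRequest('llama3.2', [
  { role: 'user', content: 'List 3 programming languages' },
]);
console.log(JSON.stringify(reproducibleBody.options)); // {"temperature":0,"seed":42}
```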

Whisper

  • Use base or small model for real-time
  • Convert audio to 16kHz mono WAV for best performance
  • Use tiny model for quick drafts, medium for accuracy
  • Enable GPU acceleration for large models

Embedding Generation

Batch embeddings for efficiency:
# Single request with multiple inputs
curl -X POST "http://ollama:11434/api/embed" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text",
    "input": [
      "Document 1 text",
      "Document 2 text",
      "Document 3 text"
    ]
  }'
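
For large document sets, a single request with every input may become unwieldy, so inputs are often split into fixed-size batches with one request per batch. A minimal sketch follows; the batch size of 32 is an arbitrary assumption rather than a documented limit, and chunkItems and buildEmbedRequests are hypothetical helper names.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunkItems(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build one /api/embed request body per batch.
// The default batch size (32) is an illustrative assumption.
function buildEmbedRequests(texts, batchSize = 32, model = 'nomic-embed-text') {
  return chunkItems(texts, batchSize).map((input) => ({ model, input }));
}

const embedRequests = buildEmbedRequests(
  ['Document 1 text', 'Document 2 text', 'Document 3 text', 'Document 4 text', 'Document 5 text'],
  2
);
console.log(embedRequests.length);          // 3
console.log(embedRequests[2].input.length); // 1
```

Because each batch preserves input order, the embeddings returned for a batch line up with the texts in that batch.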

GPU Acceleration

NVIDIA GPU

Enable GPU support in docker-compose:
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
Verify GPU is detected:
docker exec ollama nvidia-smi

Next Steps

Knowledge Base Pack

Build RAG systems with vector search

Video Creator Pack

Add video transcription workflows
