The Local AI skill pack enables completely local AI inference without external API dependencies. Run language models and speech-to-text on your own hardware.
Included Services
Ollama: Local LLM inference for chat and embeddings
Whisper: Speech-to-text transcription
Skills Provided
Ollama Local LLM
Capabilities:
Chat completion
Text generation
Code generation
Text embeddings for RAG
JSON-structured output
Multi-turn conversations
Streaming responses
Example Usage:
# Chat completion
curl -X POST "http://ollama:11434/api/chat" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.2",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing"}
],
"stream": false
}'
# Text generation
curl -X POST "http://ollama:11434/api/generate" \
-H "Content-Type: application/json" \
-d '{
"model": "codellama",
"prompt": "Write a Python function to calculate fibonacci numbers",
"stream": false
}'
# Generate embeddings
curl -X POST "http://ollama:11434/api/embed" \
-H "Content-Type: application/json" \
-d '{
"model": "nomic-embed-text",
"input": ["Text to embed", "Another text"]
}'
# JSON output mode
curl -X POST "http://ollama:11434/api/chat" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.2",
"messages": [
{"role": "user", "content": "List 3 programming languages"}
],
"format": "json",
"stream": false
}'
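The examples above all set "stream": false. With streaming enabled, /api/chat returns newline-delimited JSON objects, each carrying a fragment of the reply. The sketch below reassembles those fragments with jq. It assumes each chunk uses the same message.content field as the non-streaming response; join_stream is a hypothetical helper name, not part of the pack.

```shell
# join_stream (hypothetical helper): read newline-delimited JSON chat chunks
# on stdin and print the concatenated assistant text.
join_stream() {
  # -j prints raw strings without a trailing newline, so fragments join up;
  # chunks that carry no message.content are skipped.
  jq -j '.message.content // empty'
  echo
}

# Intended use against the service (not executed here):
#   curl -sN -X POST "http://ollama:11434/api/chat" \
#     -H "Content-Type: application/json" \
#     -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hi"}], "stream": true}' \
#     | join_stream
```

Piping through a helper like this keeps the time to first token low while still producing one complete string at the end.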
Whisper Transcribe
Capabilities:
Audio transcription
Multiple language support
Speaker diarization
Timestamp generation
Various audio formats
Subtitle generation (SRT, VTT)
Example Usage:
# Transcribe audio file
curl -X POST "http://whisper:9000/asr?task=transcribe&language=en&output=json" \
-F "audio_file=@/data/audio/recording.mp3"
# Response:
{
  "text": "Hello, this is a test recording.",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, this is a test recording."
    }
  ]
}
# Transcribe with timestamps (SRT format)
curl -X POST "http://whisper:9000/asr?task=transcribe&language=en&output=srt" \
-F "audio_file=@/data/audio/recording.mp3"
# Translate to English
curl -X POST "http://whisper:9000/asr?task=translate&output=json" \
-F "audio_file=@/data/audio/spanish.mp3"
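If a JSON transcript has already been stored, subtitles can be derived from it without transcribing the audio again. The sketch below converts the segments array from the JSON response shown above into SRT cues. It assumes every segment has start and end in seconds plus a text field; segments_to_srt is a hypothetical helper name.

```shell
# segments_to_srt (hypothetical helper): read a Whisper JSON response on stdin
# and print numbered SRT cues built from its "segments" array.
segments_to_srt() {
  # One tab-separated line per segment: start, end, text.
  jq -r '.segments[] | [.start, .end, .text] | @tsv' |
  awk -F '\t' '
    # Format seconds as HH:MM:SS,mmm (the SRT timestamp layout).
    function ts(s,   ms) {
      ms = int(s * 1000 + 0.5)
      return sprintf("%02d:%02d:%02d,%03d",
        int(ms / 3600000), int(ms % 3600000 / 60000),
        int(ms % 60000 / 1000), ms % 1000)
    }
    { printf "%d\n%s --> %s\n%s\n\n", NR, ts($1), ts($2), $3 }
  '
}

# Intended use (not executed here):
#   segments_to_srt < /data/audio/recording.json > /data/subtitles/recording.srt
```

Requesting output=srt directly remains simpler when only subtitles are needed.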
Use Cases
RAG (Retrieval-Augmented Generation)
Build a complete local RAG system:
# 1. Generate embeddings for documents
curl -X POST "http://ollama:11434/api/embed" \
-H "Content-Type: application/json" \
-d '{
"model": "nomic-embed-text",
"input": ["Document content here..."]
}'
# 2. Store in Qdrant (from Knowledge Base pack)
curl -X PUT "http://qdrant:6333/collections/docs/points" \
-H "Content-Type: application/json" \
-d '{
"points": [{
"id": 1,
"vector": [...embedding...],
"payload": {"text": "Document content"}
}]
}'
# 3. Query: Generate query embedding
QUERY_EMBEDDING=$(curl -s -X POST "http://ollama:11434/api/embed" \
-H "Content-Type: application/json" \
-d '{
"model": "nomic-embed-text",
"input": ["What is quantum computing?"]
}' | jq -c '.embeddings[0]')
# 4. Search Qdrant (with_payload returns the stored text with each hit)
RESULTS=$(curl -s -X POST "http://qdrant:6333/collections/docs/points/search" \
-H "Content-Type: application/json" \
-d "{
\"vector\": $QUERY_EMBEDDING,
\"limit\": 5,
\"with_payload\": true
}")
# 5. Generate answer with context (jq escapes the search results for JSON)
jq -n --arg context "$RESULTS" '{
model: "llama3.2",
messages: [
{role: "system", content: ("Answer based on this context: " + $context)},
{role: "user", content: "What is quantum computing?"}
],
stream: false
}' | curl -X POST "http://ollama:11434/api/chat" \
-H "Content-Type: application/json" \
-d @-
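Passing the raw search response to the model also sends ids, scores, and other metadata. A possible refinement is to pass only the stored document text. The sketch below assumes the search was made with with_payload enabled, that hits are listed under result, and that each payload has the text field written in step 2; context_from_results is a hypothetical helper name.

```shell
# context_from_results (hypothetical helper): read a Qdrant search response
# on stdin and print the stored "text" payload of each hit, one per line.
context_from_results() {
  # Hits without a text payload are skipped rather than printed as "null".
  jq -r '.result[] | .payload.text // empty'
}

# Intended use (not executed here):
#   CONTEXT=$(printf '%s' "$RESULTS" | context_from_results)
```

The resulting string can then be passed to the chat request in place of the full search response.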
Video Transcription Pipeline
Combine with Video Creator pack:
# 1. Extract audio from video (FFmpeg)
ffmpeg -i /data/videos/lecture.mp4 \
-vn -ar 16000 -ac 1 \
/data/audio/lecture.wav
# 2. Transcribe with Whisper
curl -X POST "http://whisper:9000/asr?task=transcribe&language=en&output=srt" \
-F "audio_file=@/data/audio/lecture.wav" \
-o /data/subtitles/lecture.srt
# 3. Burn subtitles into video
ffmpeg -i /data/videos/lecture.mp4 \
-vf "subtitles=/data/subtitles/lecture.srt" \
/data/output/lecture_subtitled.mp4
# 4. Generate summary with Ollama (jq escapes the transcript for JSON)
TRANSCRIPT=$(cat /data/subtitles/lecture.srt)
jq -n --arg transcript "$TRANSCRIPT" '{
model: "llama3.2",
messages: [
{role: "system", content: "Summarize this lecture transcript."},
{role: "user", content: $transcript}
],
stream: false
}' | curl -X POST "http://ollama:11434/api/chat" \
-H "Content-Type: application/json" \
-d @-
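An SRT file carries cue numbers and timestamps that add tokens without helping a summary. The sketch below strips them before the transcript is sent to the model. srt_to_text is a hypothetical helper name, and the filter assumes the usual SRT layout of index line, timestamp line, text lines, then a blank line.

```shell
# srt_to_text (hypothetical helper): read SRT on stdin and print only the
# spoken text, dropping cue numbers, timestamp lines, and blank separators.
# Caveat: a subtitle line consisting only of digits is also dropped.
srt_to_text() {
  awk '
    { sub(/\r$/, "") }      # tolerate CRLF line endings
    /^[0-9]+$/  { next }    # cue index
    /-->/       { next }    # timestamp range
    /^[ \t]*$/  { next }    # blank separator
    { print }
  '
}

# Intended use (not executed here):
#   TRANSCRIPT=$(srt_to_text < /data/subtitles/lecture.srt)
```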
Code Assistant
Local code generation and review:
# Generate code
curl -X POST "http://ollama:11434/api/generate" \
-H "Content-Type: application/json" \
-d '{
"model": "codellama",
"prompt": "Write a REST API endpoint in Python using FastAPI for user registration",
"stream": false
}'
# Code review (jq escapes the file contents for JSON)
jq -n --arg code "$(cat app.py)" '{
model: "codellama",
messages: [
{
role: "system",
content: "You are a code reviewer. Find bugs and suggest improvements."
},
{
role: "user",
content: ("Review this code:\n" + $code)
}
],
stream: false
}' | curl -X POST "http://ollama:11434/api/chat" \
-H "Content-Type: application/json" \
-d @-
Chatbot with Memory
Build a stateful chatbot:
// Store conversation in Redis (from DevOps pack)
const conversationKey = `chat:${userId}:history`;
// Add user message
await redis.rpush(conversationKey, JSON.stringify({
  role: 'user',
  content: userMessage
}));
// Get conversation history (last 10 messages)
const history = await redis.lrange(conversationKey, -10, -1);
const messages = history.map((entry) => JSON.parse(entry));
// Generate response
const response = await fetch('http://ollama:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2',
    messages: messages,
    stream: false
  })
});
// Store assistant response
const answer = await response.json();
await redis.rpush(conversationKey, JSON.stringify({
  role: 'assistant',
  content: answer.message.content
}));
// Set expiry (24 hours)
await redis.expire(conversationKey, 86400);
Recommended Models
General Purpose
| Model | Size | Use Case | Memory |
| --- | --- | --- | --- |
| llama3.2 | 3B | Fast chat and reasoning | 4 GB |
| llama3.1:70b | 70B | Complex reasoning | 40 GB |
| mistral | 7B | Balanced performance | 5 GB |
| phi3 | 3.8B | Efficient reasoning | 4 GB |
Code Generation
| Model | Size | Use Case | Memory |
| --- | --- | --- | --- |
| codellama | 7B | Code generation | 5 GB |
| codellama:13b | 13B | Advanced code tasks | 8 GB |
| deepseek-coder | 6.7B | Multi-language coding | 5 GB |
Embeddings
| Model | Size | Dimensions | Memory |
| --- | --- | --- | --- |
| nomic-embed-text | 137M | 768 | 1 GB |
| mxbai-embed-large | 335M | 1024 | 2 GB |
| all-minilm | 23M | 384 | 512 MB |
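The Dimensions column matters when vectors are stored: a Qdrant collection is created with a fixed vector size, so it has to match the embedding model in use. The sketch below maps the models listed here to their dimensions and builds a collection definition. The helper names are hypothetical, and the Cosine distance is an assumption rather than something this pack prescribes.

```shell
# embedding_dims (hypothetical helper): print the vector size for a model
# listed in the Embeddings table; fail for anything else.
embedding_dims() {
  case "$1" in
    nomic-embed-text)  echo 768 ;;
    mxbai-embed-large) echo 1024 ;;
    all-minilm)        echo 384 ;;
    *) echo "unknown embedding model: $1" >&2; return 1 ;;
  esac
}

# collection_body (hypothetical helper): print a Qdrant collection definition
# sized for the given model. "Cosine" is an assumed distance metric.
collection_body() {
  dims=$(embedding_dims "$1") || return 1
  printf '{"vectors": {"size": %s, "distance": "Cosine"}}\n' "$dims"
}

# Intended use (not executed here):
#   curl -X PUT "http://qdrant:6333/collections/docs" \
#     -H "Content-Type: application/json" \
#     -d "$(collection_body nomic-embed-text)"
```

Switching to a model with a different dimension means creating a new collection and re-embedding the documents.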
Managing Models
# List installed models
curl "http://ollama:11434/api/tags"
# Pull a new model
curl -X POST "http://ollama:11434/api/pull" \
-H "Content-Type: application/json" \
-d '{"name": "llama3.2"}'
# Delete a model
curl -X DELETE "http://ollama:11434/api/delete" \
-H "Content-Type: application/json" \
-d '{"name": "old-model"}'
# Show model info
curl -X POST "http://ollama:11434/api/show" \
-H "Content-Type: application/json" \
-d '{"name": "llama3.2"}'
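A common pattern is to pull a model only when it is not installed yet. The sketch below checks the output of the list endpoint for a given model. It assumes the response lists installed models under models, each with a name that includes its tag (for example llama3.2:latest); has_model is a hypothetical helper name.

```shell
# has_model (hypothetical helper): read the /api/tags response on stdin and
# succeed if the given model is installed. A name without a tag is also
# matched against "<name>:latest".
has_model() {
  jq -e --arg m "$1" \
    'any(.models[]; .name == $m or .name == ($m + ":latest"))' >/dev/null
}

# Intended use (not executed here):
#   curl -s "http://ollama:11434/api/tags" | has_model llama3.2 || \
#     curl -X POST "http://ollama:11434/api/pull" \
#       -H "Content-Type: application/json" \
#       -d '{"name": "llama3.2"}'
```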
Configuration
Environment Variables
# Ollama
OLLAMA_HOST=ollama
OLLAMA_PORT=11434
OLLAMA_MODELS=/data/ollama/models # Model storage
# Whisper
WHISPER_HOST=whisper
WHISPER_PORT=9000
WHISPER_MODEL=base # tiny, base, small, medium, large
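Scripts can build service URLs from these variables instead of hard-coding hostnames. The sketch below assumes the variables hold a plain hostname and port as listed above, and falls back to the documented defaults when they are unset. ollama_url and whisper_url are hypothetical helper names.

```shell
# ollama_url (hypothetical helper): print the URL for an Ollama API path.
ollama_url() {
  printf 'http://%s:%s%s\n' "${OLLAMA_HOST:-ollama}" "${OLLAMA_PORT:-11434}" "$1"
}

# whisper_url (hypothetical helper): print the Whisper /asr URL for a task
# (default: transcribe) and an output format (default: json).
whisper_url() {
  printf 'http://%s:%s/asr?task=%s&output=%s\n' \
    "${WHISPER_HOST:-whisper}" "${WHISPER_PORT:-9000}" \
    "${1:-transcribe}" "${2:-json}"
}

# Intended use (not executed here):
#   curl "$(ollama_url /api/tags)"
#   curl -X POST "$(whisper_url transcribe srt)" -F "audio_file=@/data/audio/recording.mp3"
```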
Volume Mounts
Models persist across restarts:
services:
  ollama:
    volumes:
      - ollama_models:/root/.ollama
  whisper:
    volumes:
      - whisper_models:/root/.cache/whisper
volumes:
  ollama_models:
  whisper_models:
Memory Requirements
Ollama
Memory depends on model size:
Small models (3B-7B): 4-6 GB
Medium models (13B-30B): 10-20 GB
Large models (70B+): 40+ GB
GPU acceleration is recommended for larger models.
Whisper
Memory depends on model variant:
tiny: ~1 GB
base: ~1 GB
small: ~2 GB
medium: ~5 GB
large: ~10 GB
Total Pack: ~4-8 GB minimum (with small models)
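The figures above can be combined into a rough sizing check before choosing models. The sketch below adds the memory of one Ollama model (taken from the Recommended Models tables) to the memory of a Whisper variant. It is a back-of-the-envelope estimate under the approximate numbers on this page, not a measurement; both helper names are hypothetical.

```shell
# whisper_mem_gb (hypothetical helper): approximate memory in GB for a
# Whisper variant, using the figures listed above.
whisper_mem_gb() {
  case "$1" in
    tiny|base) echo 1 ;;
    small)     echo 2 ;;
    medium)    echo 5 ;;
    large)     echo 10 ;;
    *) echo "unknown Whisper variant: $1" >&2; return 1 ;;
  esac
}

# pack_mem_gb (hypothetical helper): rough total for one loaded Ollama model
# plus one Whisper variant.
#   $1 = Ollama model memory in whole GB, $2 = Whisper variant
pack_mem_gb() {
  whisper_gb=$(whisper_mem_gb "$2") || return 1
  echo $(( $1 + whisper_gb ))
}

# Example: llama3.2 (4 GB) with the base Whisper model
#   pack_mem_gb 4 base
```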
Performance Tips
Ollama
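Use a GPU if available (for example, docker run --gpus all)
Set num_gpu in the model options to control how many layers are offloaded to the GPU
Lower temperature for more consistent output
Set seed for reproducible results
Set stream: false to receive the full response as a single JSON object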
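Temperature and seed are passed per request in the options object. The sketch below builds a chat request body with both set, using jq so that the prompt is escaped correctly. The values 0 and 42 are illustrative assumptions, and chat_body is a hypothetical helper name.

```shell
# chat_body (hypothetical helper): print a /api/chat request body with a low
# temperature and a fixed seed.
#   $1 = model name, $2 = user prompt
chat_body() {
  jq -cn --arg model "$1" --arg prompt "$2" '{
    model: $model,
    messages: [{role: "user", content: $prompt}],
    stream: false,
    options: {temperature: 0, seed: 42}
  }'
}

# Intended use (not executed here):
#   chat_body llama3.2 "List 3 programming languages" | \
#     curl -X POST "http://ollama:11434/api/chat" \
#       -H "Content-Type: application/json" -d @-
```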
Whisper
Use the base or small model for near-real-time transcription
Convert audio to 16 kHz mono WAV for best performance
Use the tiny model for quick drafts and medium for higher accuracy
Enable GPU acceleration for large models
Embedding Generation
Batch embeddings for efficiency:
# Single request with multiple inputs
curl -X POST "http://ollama:11434/api/embed" \
-H "Content-Type: application/json" \
-d '{
"model": "nomic-embed-text",
"input": [
"Document 1 text",
"Document 2 text",
"Document 3 text"
]
}'
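Writing the input array by hand does not scale past a few documents. The sketch below builds the same request body from a text file with one document per line, letting jq handle quoting. embed_body is a hypothetical helper name, and the one-document-per-line layout is an assumption.

```shell
# embed_body (hypothetical helper): read one document per line on stdin and
# print an /api/embed request body. Empty lines are skipped.
#   $1 = embedding model (default: nomic-embed-text)
embed_body() {
  jq -cRn --arg model "${1:-nomic-embed-text}" \
    '{model: $model, input: [inputs | select(length > 0)]}'
}

# Intended use (not executed here):
#   embed_body < /data/docs/chunks.txt | \
#     curl -X POST "http://ollama:11434/api/embed" \
#       -H "Content-Type: application/json" -d @-
```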
GPU Acceleration
NVIDIA GPU
Enable GPU support in docker-compose:
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
Verify GPU is detected:
docker exec ollama nvidia-smi
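nvidia-smi confirms that the container can see the GPU, but not that a loaded model is actually using it. One possible follow-up check is sketched below. It assumes Ollama's running-models endpoint (/api/ps) reports a size_vram field per loaded model, with 0 meaning the model sits entirely in system memory; gpu_report is a hypothetical helper name.

```shell
# gpu_report (hypothetical helper): read the /api/ps response on stdin and
# print each loaded model with "gpu" or "cpu", tab-separated.
gpu_report() {
  jq -r '.models[] | "\(.name)\t\(if (.size_vram // 0) > 0 then "gpu" else "cpu" end)"'
}

# Intended use (not executed here; a model must be loaded first):
#   curl -s "http://ollama:11434/api/ps" | gpu_report
```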
Next Steps
Knowledge Base Pack: Build RAG systems with vector search
Video Creator Pack: Add video transcription workflows