Local AI Models

Run AI models locally for privacy, cost savings, and offline operation. Includes LLM inference, image generation, and speech-to-text.

Available Services

Ollama

Port: 11434 | Memory: 2048 MB | Maturity: Stable

Run large language models locally with an easy-to-use API. Supports Llama, Mistral, Gemma, and many more open-source models.

Features:

100+ open-source models
Simple REST API
Model management CLI
Streaming responses
OpenAI-compatible API
CPU and GPU support

Supported Models:

Llama 3.3, Llama 3.2, Llama 3.1
Mistral, Mixtral
Gemma 2, CodeGemma
Phi-3, Qwen 2.5
DeepSeek-Coder

OpenClaw Integration:

Skill: ollama-local-llm
Environment: OLLAMA_HOST, OLLAMA_PORT
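
As a rough sketch of how the REST API can be reached through the environment variables above: the helper names below are hypothetical, the fallbacks (localhost, 11434) assume a default local setup, and OLLAMA_HOST is assumed to hold a bare hostname. The /api/generate endpoint and its model, prompt, and stream fields are part of Ollama's HTTP API.

```shell
# Build the base URL from OLLAMA_HOST and OLLAMA_PORT.
# The fallback values are assumptions for a default local setup.
ollama_base_url() {
  printf 'http://%s:%s\n' "${OLLAMA_HOST:-localhost}" "${OLLAMA_PORT:-11434}"
}

# Send one non-streaming prompt to Ollama's /api/generate endpoint.
# Usage: ollama_generate MODEL PROMPT
# Note: the prompt is pasted into the JSON body as-is, so it must not
# contain double quotes or backslashes in this simple sketch.
ollama_generate() {
  curl -s "$(ollama_base_url)/api/generate" \
    -H 'Content-Type: application/json' \
    -d "{\"model\": \"$1\", \"prompt\": \"$2\", \"stream\": false}"
}
```

With a model already pulled, a call such as ollama_generate llama3.3 "Why is the sky blue?" should return a single JSON object; omitting "stream": false makes Ollama stream the response as a sequence of JSON lines instead.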

ComfyUI

Port: 8188 | Memory: 4096 MB | Maturity: Experimental

Node-based visual workflow editor for Stable Diffusion and other generative AI models. Design complex image/video generation pipelines.

Features:

Node-based workflow editor
Stable Diffusion support
ControlNet, LoRA, VAE support
Custom nodes ecosystem
REST API
Batch processing

Requirements:

NVIDIA GPU with CUDA
nvidia-docker2 installed
Minimum 4 GB VRAM (8 GB+ recommended)

OpenClaw Integration:

Skill: comfyui-generate
Environment: COMFYUI_HOST, COMFYUI_PORT
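
A minimal sketch of queueing a workflow over the REST API, assuming the workflow was exported from ComfyUI in API format to a local JSON file. The helper names and the localhost/8188 fallbacks are assumptions; the POST /prompt endpoint, which expects the workflow under a "prompt" key, is ComfyUI's own.

```shell
# Build the base URL from COMFYUI_HOST and COMFYUI_PORT.
# The fallback values are assumptions for a default local setup.
comfyui_base_url() {
  printf 'http://%s:%s\n' "${COMFYUI_HOST:-localhost}" "${COMFYUI_PORT:-8188}"
}

# Wrap an exported workflow file in the {"prompt": ...} envelope.
# Usage: comfyui_payload WORKFLOW_JSON_FILE
comfyui_payload() {
  printf '{"prompt": %s}\n' "$(cat "$1")"
}

# Queue the workflow by POSTing the envelope to /prompt.
# Usage: comfyui_queue WORKFLOW_JSON_FILE
comfyui_queue() {
  comfyui_payload "$1" | curl -s -X POST "$(comfyui_base_url)/prompt" \
    -H 'Content-Type: application/json' \
    --data-binary @-
}
```

Calling comfyui_queue with the path to an exported workflow file (the file name is up to you) queues that workflow on the server.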

⚠️ GPU Required

Stable Diffusion WebUI

Port: 7860 | Memory: 4096 MB | Maturity: Experimental

Local AI image generation with a web interface. Generate images from text prompts using Stable Diffusion.

Features:

Text-to-image generation
Image-to-image transformation
Inpainting and outpainting
Model management
Extensions support
Batch processing

Requirements:

NVIDIA GPU with CUDA
nvidia-docker2 installed
Minimum 4 GB VRAM

⚠️ GPU Required

Faster Whisper Server

Port: 8001 | Memory: 1024 MB | Maturity: Beta

Self-hosted speech-to-text transcription service using the Faster Whisper engine for high-performance audio transcription.

Features:

OpenAI Whisper models
Fast inference (CTranslate2)
Multiple languages
OpenAI-compatible API
Timestamp support
CPU and GPU support

Supported Models:

tiny, base, small, medium, large
Multilingual and English-only variants

OpenClaw Integration:

Skill: whisper-transcribe
Environment: WHISPER_HOST, WHISPER_PORT
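
Because the API is OpenAI-compatible, a transcription request would presumably follow the OpenAI path /v1/audio/transcriptions with a multipart upload. The sketch below rests on that assumption; the helper names, the localhost/8001 fallbacks, and the default model value "small" are illustrative, and the model identifier the server actually expects may differ.

```shell
# Build the base URL from WHISPER_HOST and WHISPER_PORT.
# The fallback values are assumptions for a default local setup.
whisper_base_url() {
  printf 'http://%s:%s\n' "${WHISPER_HOST:-localhost}" "${WHISPER_PORT:-8001}"
}

# Upload an audio file for transcription as multipart form data.
# Usage: whisper_transcribe AUDIO_FILE [MODEL]
# The default model name "small" is an assumption; check which
# identifiers your server accepts.
whisper_transcribe() {
  curl -s "$(whisper_base_url)/v1/audio/transcriptions" \
    -F "file=@$1" \
    -F "model=${2:-small}"
}
```

A call such as whisper_transcribe meeting.wav (the file name is only an example) should return the transcription as JSON if the server follows the OpenAI response format.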

Usage Examples

Local LLM Stack

npx create-better-openclaw --services ollama,open-webui --yes

Image Generation Stack (GPU Required)

npx create-better-openclaw --services comfyui,stable-diffusion --yes

Complete Local AI Stack

npx create-better-openclaw --preset local-ai --yes

Audio Transcription Stack

npx create-better-openclaw --services whisper,redis --yes

Model Management

Ollama Models

Pull models into Ollama:

# Open a shell inside the Ollama container
docker exec -it ollama bash

# Pull models (run these inside the container)
ollama pull llama3.3
ollama pull mistral
ollama pull codellama

ComfyUI Models

Download Stable Diffusion checkpoints to the comfyui-models volume:

# Models go in: /opt/ComfyUI/models/checkpoints/
# LoRAs go in: /opt/ComfyUI/models/loras/
# VAEs go in: /opt/ComfyUI/models/vae/
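
One possible way to get a downloaded file into the right directory is docker cp. This is a sketch only: the container name comfyui, the COMFYUI_CONTAINER override, and both helper names are assumptions rather than part of the stack; the target paths are the ones listed above.

```shell
# Map a model type to its directory inside the ComfyUI container.
# Usage: comfyui_model_dir checkpoint|lora|vae
comfyui_model_dir() {
  case "$1" in
    checkpoint) echo "/opt/ComfyUI/models/checkpoints/" ;;
    lora)       echo "/opt/ComfyUI/models/loras/" ;;
    vae)        echo "/opt/ComfyUI/models/vae/" ;;
    *)          echo "unknown model type: $1" >&2; return 1 ;;
  esac
}

# Copy a local model file into the running container.
# Usage: comfyui_install_model TYPE FILE
# The container name defaults to "comfyui" (an assumption); override it
# with the hypothetical COMFYUI_CONTAINER variable if yours differs.
comfyui_install_model() {
  dest=$(comfyui_model_dir "$1") || return 1
  docker cp "$2" "${COMFYUI_CONTAINER:-comfyui}:$dest"
}
```

For instance, comfyui_install_model lora ./my-style.safetensors would copy that file into the LoRA directory; the file name is only an example.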

Hardware Requirements

CPU-Only (LLMs)

Model Size	RAM Required	Performance
7B params	8 GB	Good
13B params	16 GB	Moderate
34B params	32 GB	Slow
70B params	64 GB+	Very Slow

GPU (Image Generation)

VRAM	Supported Models	Performance
4 GB	SD 1.5	Slow
6 GB	SD 1.5, SDXL (low res)	Moderate
8 GB	SDXL	Good
12 GB+	SDXL, SD3	Excellent
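
To check where a given GPU falls in this table, the VRAM reported by nvidia-smi can be compared against the thresholds. The function below is a hypothetical helper that simply encodes the table rows, with 1 GB taken as 1024 MiB.

```shell
# Map total VRAM in MiB to the models listed in the table above.
# Usage: sd_models_for_vram VRAM_MIB
sd_models_for_vram() {
  vram_mib=$1
  if   [ "$vram_mib" -ge 12288 ]; then echo "SDXL, SD3"
  elif [ "$vram_mib" -ge 8192 ];  then echo "SDXL"
  elif [ "$vram_mib" -ge 6144 ];  then echo "SD 1.5, SDXL (low res)"
  elif [ "$vram_mib" -ge 4096 ];  then echo "SD 1.5"
  else echo "below the 4 GB minimum"
  fi
}
```

On a machine with the NVIDIA driver installed, the input can come from nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits, which prints one value in MiB per GPU; pass the first line to sd_models_for_vram.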

Performance Tips

Ollama Optimization

Model Selection: Start with smaller models (7B) for faster inference
Context Length: Reduce context window for speed
Quantization: Use Q4 or Q5 quantized models
GPU Acceleration: Add NVIDIA GPU support with nvidia-docker2
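
To see why quantization matters, the size of the weights alone can be estimated as parameters × bits per weight ÷ 8. The helper below is a back-of-the-envelope sketch with a hypothetical name; it ignores the context cache and runtime overhead, and real Q4/Q5 formats use slightly more than 4 or 5 bits per weight.

```shell
# Approximate weight size in MB for a model.
# Usage: weights_mb PARAMS_IN_BILLIONS BITS_PER_WEIGHT
# 1 billion parameters at 8 bits is taken as 1000 MB.
weights_mb() {
  echo $(( $1 * $2 * 1000 / 8 ))
}

weights_mb 7 16   # 7B at 16-bit precision: prints 14000
weights_mb 7 4    # 7B at 4-bit quantization: prints 3500
```

By this estimate a 4-bit 7B model needs roughly a quarter of the memory of its 16-bit counterpart for weights.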

Image Generation Optimization

GPU Required: CPU inference is extremely slow (minutes per image)
VRAM Management: Close other GPU applications
Batch Size: Reduce for lower VRAM usage
Resolution: Start with 512x512, then scale up

Privacy & Offline Operation

Local models provide:

Data Privacy: All processing happens on your infrastructure
Offline Operation: No internet required after model download
Cost Savings: No API costs for inference
Customization: Fine-tune models for specific tasks
Low Latency: No network round trips

Integration Patterns

Local LLM + RAG

npx create-better-openclaw \
  --services ollama,open-webui,qdrant,redis \
  --yes

Multi-Modal Local AI

npx create-better-openclaw \
  --services ollama,comfyui,whisper \
  --yes

Local AI Development

npx create-better-openclaw \
  --services ollama,opencode,redis,postgresql \
  --yes
