Local AI Models

Run AI models locally for privacy, cost savings, and offline operation. This category covers LLM inference, image generation, and speech-to-text.

Available Services

Ollama

Port: 11434 | Memory: 2048 MB | Maturity: Stable

Run large language models locally with an easy-to-use API. Supports Llama, Mistral, Gemma, and many more open-source models.

Features:
  • 100+ open-source models
  • Simple REST API
  • Model management CLI
  • Streaming responses
  • OpenAI-compatible API
  • CPU and GPU support
Supported Models:
  • Llama 3.3, Llama 3.2, Llama 3.1
  • Mistral, Mixtral
  • Gemma 2, CodeGemma
  • Phi-3, Qwen 2.5
  • DeepSeek-Coder
OpenClaw Integration:
  • Skill: ollama-local-llm
  • Environment: OLLAMA_HOST, OLLAMA_PORT
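
As a rough illustration of the REST API listed above, the following sketch prepares a non-streaming call to Ollama's `/api/generate` endpoint. It assumes `OLLAMA_HOST` holds a bare hostname and `OLLAMA_PORT` a port number; the `localhost` fallback and the helper names are hypothetical, not part of OpenClaw.

```python
import json
import os
import urllib.request


def ollama_base_url() -> str:
    """Build the base URL from OLLAMA_HOST / OLLAMA_PORT."""
    host = os.environ.get("OLLAMA_HOST", "localhost")  # fallback is an assumption
    port = os.environ.get("OLLAMA_PORT", "11434")      # port listed for the service
    return f"http://{host}:{port}"


def build_generate_request(prompt: str, model: str = "llama3.3") -> urllib.request.Request:
    """Prepare (but do not send) a non-streaming POST to /api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{ollama_base_url()}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.3") -> str:
    """Send the request and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the service running and the model already pulled, `print(generate("Why is the sky blue?"))` should print the completion. Building the request separately from sending it keeps the payload easy to inspect.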

ComfyUI

Port: 8188 | Memory: 4096 MB | Maturity: Experimental

Node-based visual workflow editor for Stable Diffusion and other generative AI models. Design complex image/video generation pipelines.

Features:
  • Node-based workflow editor
  • Stable Diffusion support
  • ControlNet, LoRA, VAE support
  • Custom nodes ecosystem
  • REST API
  • Batch processing
Requirements:
  • NVIDIA GPU with CUDA
  • nvidia-docker2 installed
  • Minimum 4 GB VRAM (8 GB+ recommended)
OpenClaw Integration:
  • Skill: comfyui-generate
  • Environment: COMFYUI_HOST, COMFYUI_PORT
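
To show how the REST API might be driven from code, here is a minimal sketch that queues a workflow through ComfyUI's `POST /prompt` endpoint. It assumes the workflow was exported from the editor in API format (a dict of node IDs to node definitions) and that `COMFYUI_HOST` holds a bare hostname; the helper name and the `localhost` fallback are hypothetical.

```python
import json
import os
import urllib.request
import uuid
from typing import Optional


def build_queue_request(workflow: dict, client_id: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a POST /prompt request that queues a workflow."""
    host = os.environ.get("COMFYUI_HOST", "localhost")  # fallback is an assumption
    port = os.environ.get("COMFYUI_PORT", "8188")       # port listed for the service
    payload = {
        "prompt": workflow,                          # node graph in API format
        "client_id": client_id or uuid.uuid4().hex,  # identifies this caller
    }
    return urllib.request.Request(
        f"http://{host}:{port}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` submits the job. The JSON reply is expected to carry a `prompt_id` that can be used to look up the finished images later.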
⚠️ GPU Required

Stable Diffusion WebUI

Port: 7860 | Memory: 4096 MB | Maturity: Experimental

Local AI image generation with a web interface. Generate images from text prompts using Stable Diffusion.

Features:
  • Text-to-image generation
  • Image-to-image transformation
  • Inpainting and outpainting
  • Model management
  • Extensions support
  • Batch processing
Requirements:
  • NVIDIA GPU with CUDA
  • nvidia-docker2 installed
  • Minimum 4 GB VRAM
⚠️ GPU Required

Faster Whisper Server

Port: 8001 | Memory: 1024 MB | Maturity: Beta

Self-hosted speech-to-text transcription service using the Faster Whisper engine for high-performance audio transcription.

Features:
  • OpenAI Whisper models
  • Fast inference (CTranslate2)
  • Multiple languages
  • OpenAI-compatible API
  • Timestamp support
  • CPU and GPU support
Supported Models:
  • tiny, base, small, medium, large
  • Multilingual and English-only variants
OpenClaw Integration:
  • Skill: whisper-transcribe
  • Environment: WHISPER_HOST, WHISPER_PORT
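
Because the service advertises an OpenAI-compatible API, a transcription call would presumably follow the OpenAI path `/v1/audio/transcriptions` with a multipart form upload. The sketch below builds such a request with the standard library only. The endpoint path, the `localhost` fallback, and the format of the `model` value are assumptions; check which model identifiers your server accepts.

```python
import os
import urllib.request
import uuid


def build_transcription_request(audio: bytes, filename: str, model: str) -> urllib.request.Request:
    """Prepare (but do not send) a multipart POST to /v1/audio/transcriptions."""
    host = os.environ.get("WHISPER_HOST", "localhost")  # fallback is an assumption
    port = os.environ.get("WHISPER_PORT", "8001")       # port listed for the service
    boundary = uuid.uuid4().hex

    # Plain text form field carrying the model identifier.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode("utf-8")

    # File field header; the raw audio bytes follow it unchanged.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")

    closing = f"--{boundary}--\r\n".encode("utf-8")
    body = model_part + file_header + audio + b"\r\n" + closing
    return urllib.request.Request(
        f"http://{host}:{port}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` should return JSON whose `text` field holds the transcript, if the server follows the OpenAI response shape.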

Usage Examples

Local LLM Stack

npx create-better-openclaw --services ollama,open-webui --yes

Image Generation Stack (GPU Required)

npx create-better-openclaw --services comfyui,stable-diffusion --yes

Complete Local AI Stack

npx create-better-openclaw --preset local-ai --yes

Audio Transcription Stack

npx create-better-openclaw --services whisper,redis --yes

Model Management

Ollama Models

Pull models into Ollama:
# Open a shell inside the Ollama container (assumes it is named "ollama")
docker exec -it ollama bash

# Inside the container, pull the models you need
ollama pull llama3.3
ollama pull mistral
ollama pull codellama
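
To confirm which models are already present before pulling more, one option is to query Ollama's `GET /api/tags` endpoint, which lists installed models. The sketch below parses that response; the helper names and the default base URL are hypothetical.

```python
import json
import urllib.request


def installed_models(tags_json: str) -> list[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]


def normalize(name: str) -> str:
    """A name without a tag refers to the ':latest' tag."""
    return name if ":" in name else f"{name}:latest"


def missing_models(wanted: list[str], tags_json: str) -> list[str]:
    """Return the wanted models that are not installed yet."""
    installed = set(installed_models(tags_json))
    return [w for w in wanted if normalize(w) not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> str:
    """Fetch the raw /api/tags response from a running Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return resp.read().decode("utf-8")
```

For example, `missing_models(["llama3.3", "mistral", "codellama"], fetch_tags())` returns the subset that still needs an `ollama pull`.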

ComfyUI Models

Download Stable Diffusion checkpoints to the comfyui-models volume:
# Models go in: /opt/ComfyUI/models/checkpoints/
# LoRAs go in: /opt/ComfyUI/models/loras/
# VAEs go in: /opt/ComfyUI/models/vae/

Hardware Requirements

CPU-Only (LLMs)

| Model Size | RAM Required | Performance |
|------------|--------------|-------------|
| 7B params  | 8 GB         | Good        |
| 13B params | 16 GB        | Moderate    |
| 34B params | 32 GB        | Slow        |
| 70B params | 64 GB+       | Very Slow   |
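
The CPU-only figures above can be turned into a simple lookup when sizing a host. This is only a sketch: the tier values are copied from the table, and the function name is hypothetical.

```python
from typing import Optional, Tuple

# (model size, minimum RAM in GB, CPU performance) as listed in the table
CPU_TIERS = [
    ("7B", 8, "Good"),
    ("13B", 16, "Moderate"),
    ("34B", 32, "Slow"),
    ("70B", 64, "Very Slow"),
]


def largest_model_for_ram(ram_gb: float) -> Optional[Tuple[str, int, str]]:
    """Return the largest tier whose RAM requirement fits, or None if none fits."""
    fitting = [tier for tier in CPU_TIERS if tier[1] <= ram_gb]
    return fitting[-1] if fitting else None
```

A host with 16 GB of RAM maps to the 13B tier, while one with 4 GB falls below every listed tier and returns `None`.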

GPU (Image Generation)

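| VRAM   | Supported Models       | Performance |
|--------|------------------------|-------------|
| 4 GB   | SD 1.5                 | Slow        |
| 6 GB   | SD 1.5, SDXL (low res) | Moderate    |
| 8 GB   | SDXL                   | Good        |
| 12 GB+ | SDXL, SD3              | Excellent   |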

Performance Tips

Ollama Optimization

  1. Model Selection: Start with smaller models (7B) for faster inference
  2. Context Length: Reduce the context window; shorter contexts use less memory and respond faster
  3. Quantization: Use Q4 or Q5 quantized models to cut memory use at a small cost in quality
  4. GPU Acceleration: Add NVIDIA GPU support with nvidia-docker2
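
As a back-of-the-envelope illustration of the quantization and context-length tips, the sketch below estimates the size of the weights alone (parameters times bits per weight, divided by 8) and builds a request payload with a reduced context window through Ollama's `options.num_ctx` setting. Real quantized files run somewhat larger than this estimate, and runtime memory also includes the context cache. The function names are hypothetical.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters * bits / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


def generate_payload(model: str, prompt: str, num_ctx: int = 2048) -> dict:
    """Build an /api/generate payload that caps the context window."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # smaller window, less memory per request
    }


# A 7B model at 16 bits needs roughly 14 GB for weights; at 4 bits about 3.5 GB.
full_precision = estimate_weight_gb(7, 16)
four_bit = estimate_weight_gb(7, 4)
```

The fourfold drop from 16-bit to 4-bit weights is why a quantized 7B model fits comfortably within the 8 GB tier in the hardware table.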

Image Generation Optimization

  1. GPU Required: CPU inference is extremely slow (minutes per image)
  2. VRAM Management: Close other GPU applications to free VRAM
  3. Batch Size: Reduce the batch size to lower VRAM usage
  4. Resolution: Start with 512x512, then scale up

Privacy & Offline Operation

Local models provide:
  • Data Privacy: All processing happens on your infrastructure
  • Offline Operation: No internet required after model download
  • Cost Savings: No API costs for inference
  • Customization: Fine-tune models for specific tasks
  • Low Latency: No network round trips

Integration Patterns

Local LLM + RAG

npx create-better-openclaw \
  --services ollama,open-webui,qdrant,redis \
  --yes
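
To make the retrieval step of this pattern concrete, here is a small in-memory sketch. In the actual stack, the vectors would presumably come from an embedding model served by Ollama and be stored in Qdrant; the toy vectors, the in-memory store, and the function names below are stand-ins for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float],
             store: List[Tuple[Sequence[float], str]],
             k: int = 2) -> List[str]:
    """Return the k passages whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


def build_prompt(question: str, passages: List[str]) -> str:
    """Assemble the retrieved passages and the question into one LLM prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The resulting prompt string would then be sent to the local LLM, so the model answers from the retrieved passages rather than from its training data alone.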

Multi-Modal Local AI

npx create-better-openclaw \
  --services ollama,comfyui,whisper \
  --yes

Local AI Development

npx create-better-openclaw \
  --services ollama,opencode,redis,postgresql \
  --yes
