This guide covers the full setup and launch process for USB-Uncensored-LLM on macOS, including Intel Macs and Apple Silicon (M1/M2/M3) machines. The macOS installer downloads a native ollama-darwin binary along with its supporting runtime libraries, then imports your chosen models directly on the drive. Apple Silicon machines benefit from automatic Metal GPU acceleration for significantly faster inference.
Prerequisites
- macOS 11 (Big Sur) or later
- Intel x86-64 or Apple Silicon (M1 / M2 / M3)
- 8 GB RAM minimum (16 GB recommended for 9B/12B models)
- Python 3 (pre-installed on macOS 11+)
Installation
Run install.command
Drag the Mac/install.command file from Finder into the Terminal window and press Enter. This executes the installer with full filesystem path resolution.
You can also cd to the Mac/ directory and run bash install.command directly. Both methods are equivalent.
Choose your AI model(s)
The installer displays the same interactive model catalog as the Windows version. Enter one or more numbers separated by commas, type all for every preset model, or enter c for a custom HuggingFace GGUF URL. Press Enter with no input to default to [1] Gemma 2 2B Abliterated.
Wait for the engine download and model import
The installer runs a seven-step process:
| Step | Action |
|---|---|
| [1/7] | Model selection |
| [2/7] | Creates Shared/models/, Shared/bin/, and Shared/vendor/ |
| [3/7] | Downloads optional offline UI assets |
| [4/7] | Downloads selected GGUF model files from HuggingFace |
| [5/7] | Creates Modelfile-<name> configuration files in Shared/models/ |
| [6/7] | Downloads ollama-darwin.tgz from GitHub Releases, extracts the ollama-darwin binary to Shared/bin/ollama-darwin and supporting runtime libraries (including llama-server) to Shared/lib/ollama/ |
| [7/7] | Imports each selected model into the Ollama engine using ollama-darwin create, then shuts down the temporary server |
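The model-selection prompt in step [1/7] accepts comma-separated numbers, all, or c. Purely as an illustration, that input handling could be sketched as below. The function name parse_selection, the catalog_size parameter, and the error handling are hypothetical and not taken from install.command, which is a shell script; the empty-input default of entry 1 is assumed from the prompt description.

```python
def parse_selection(raw: str, catalog_size: int, default: int = 1):
    """Interpret one line typed at the model prompt.

    Returns ("custom", []) when the user asks for a custom GGUF URL,
    otherwise ("preset", [catalog numbers in the order given]).
    Hypothetical helper: the real installer may behave differently.
    """
    text = raw.strip().lower()
    if text == "":
        # Assumption: pressing Enter selects the default catalog entry.
        return ("preset", [default])
    if text == "all":
        return ("preset", list(range(1, catalog_size + 1)))
    if text == "c":
        return ("custom", [])

    picks = []
    for part in text.split(","):
        part = part.strip()
        if not part.isdecimal():
            raise ValueError(f"not a catalog number: {part!r}")
        number = int(part)
        if not 1 <= number <= catalog_size:
            raise ValueError(f"catalog number out of range: {number}")
        if number not in picks:  # ignore repeated entries
            picks.append(number)
    return ("preset", picks)
```

For example, parse_selection("2, 1", 5) yields the preset entries 2 and 1 in that order, while an entry outside the catalog raises ValueError so the prompt can be shown again.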
Launching
Double-click Mac/start.command in Finder, or run it from Terminal. The launcher starts the ollama-darwin engine in the background, waits up to 60 seconds for it to become ready, then launches the Python chat server. Your default browser opens automatically at http://localhost:3333.
To shut down the AI engine, press Ctrl + C in the Terminal window running the chat server.
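The "wait up to 60 seconds" behavior is a readiness poll. A minimal sketch of such a poll follows, assuming the engine listens on 127.0.0.1:11434 (the OLLAMA_HOST value used by this project) and answers a plain HTTP GET with status 200 once it is up. The names http_probe and wait_until_ready are hypothetical; start.command may implement the check differently.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str) -> bool:
    """Return True if a GET on url succeeds with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the engine is not ready yet.
        return False


def wait_until_ready(url="http://127.0.0.1:11434", timeout=60.0, interval=1.0,
                     probe=http_probe, sleep=time.sleep, clock=time.monotonic):
    """Poll url until probe succeeds or timeout seconds have elapsed.

    probe, sleep, and clock are injectable so the loop can be exercised
    without a running server.
    """
    deadline = clock() + timeout
    while True:
        if probe(url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Calling wait_until_ready() with the defaults returns True as soon as the engine responds and False after roughly 60 seconds otherwise, at which point a launcher would typically report the failure instead of starting the chat server.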
Apple Silicon Notes
On M1, M2, and M3 Macs, the Ollama engine automatically detects and uses Metal GPU acceleration through Apple’s unified memory architecture. No additional configuration is required. Metal acceleration lets the model use GPU compute for matrix operations, substantially increasing inference speed, particularly for the larger 5–7 GB models. You can verify that Metal is being used by observing lower CPU utilization and faster token generation compared to a pure CPU run.
Environment Variables
start.command exports the following variables before starting the engine. All paths are relative to the USB root, so nothing is written to the host machine’s home directory:
| Variable | Value | Purpose |
|---|---|---|
| OLLAMA_MODELS | Shared/models/ollama_data | Keeps model data on the USB drive |
| OLLAMA_HOME | Shared/.ollama-runtime | Redirects Ollama’s runtime directory away from ~/.ollama |
| OLLAMA_TMPDIR | Shared/.ollama-runtime/tmp | Redirects temporary files to the USB drive |
| OLLAMA_ORIGINS | * | Accepts requests from any origin, which enables LAN access from phones and tablets on the same network |
| OLLAMA_HOST | 127.0.0.1:11434 | Binds Ollama to localhost port 11434 |
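To make the table concrete, the sketch below builds the same variables from a USB root path. It is an illustration only: start.command is a shell script, and the function name build_ollama_env and the example mount point are hypothetical.

```python
from pathlib import Path


def build_ollama_env(usb_root) -> dict:
    """Return the Ollama variables from the table, resolved against usb_root."""
    shared = Path(usb_root) / "Shared"
    runtime = shared / ".ollama-runtime"
    return {
        "OLLAMA_MODELS": str(shared / "models" / "ollama_data"),
        "OLLAMA_HOME": str(runtime),
        "OLLAMA_TMPDIR": str(runtime / "tmp"),
        "OLLAMA_ORIGINS": "*",
        "OLLAMA_HOST": "127.0.0.1:11434",
    }


# Example use (not executed here; "/Volumes/MYUSB" is a made-up mount point):
#   import os, subprocess
#   env = {**os.environ, **build_ollama_env("/Volumes/MYUSB")}
#   subprocess.Popen(["/Volumes/MYUSB/Shared/bin/ollama-darwin", "serve"], env=env)
```

Merging the result over a copy of os.environ, rather than replacing the environment outright, keeps PATH and other host settings intact while still redirecting Ollama's data to the drive.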
Uninstalling
To remove individual models or all downloaded data, run the macOS uninstall script from Terminal. The installed components live in the Shared/ folder:
- Shared/bin/ollama-darwin: the Ollama engine binary
- Shared/lib/ollama/: runtime libraries including llama-server
- Shared/models/: downloaded GGUF model weights and Modelfiles
- Shared/.ollama-runtime/: runtime state directory
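Before removing anything, it can help to see how much space each of these locations occupies. The following is a read-only sketch under the assumption that the layout matches the paths listed above; shared_footprint and path_size are hypothetical helpers and are not part of the project's uninstall script.

```python
from pathlib import Path

# Locations under Shared/ named in this guide.
SHARED_TARGETS = ("bin/ollama-darwin", "lib/ollama", "models", ".ollama-runtime")


def path_size(path: Path) -> int:
    """Size in bytes of a file, or of all files under a directory; 0 if missing."""
    if path.is_file():
        return path.stat().st_size
    if path.is_dir():
        return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())
    return 0


def shared_footprint(usb_root) -> dict:
    """Map each Shared/ location to its size in bytes. Nothing is deleted."""
    shared = Path(usb_root) / "Shared"
    return {rel: path_size(shared / rel) for rel in SHARED_TARGETS}
```

The function only reads file sizes, so it is safe to run against a live drive; a location that does not exist is simply reported as 0 bytes.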