Run USB-Uncensored-LLM on macOS

This guide covers the full setup and launch process for USB-Uncensored-LLM on macOS, including Intel Macs and Apple Silicon (M1/M2/M3) machines. The macOS installer downloads a native ollama-darwin binary along with its supporting runtime libraries, then imports your chosen models directly on the drive. Apple Silicon machines benefit from automatic Metal GPU acceleration for significantly faster inference.

Prerequisites

macOS 11 (Big Sur) or later
Intel x86-64 or Apple Silicon (M1 / M2 / M3)
8 GB RAM minimum (16 GB recommended for 9B/12B models)
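Python 3 (confirm with python3 --version; on recent macOS releases it is supplied by the Xcode Command Line Tools rather than the base system)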
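
The requirements above can be checked from Terminal before running the installer. The following is a minimal sketch and not part of the project: the function names macos_version_ok and ram_ok are hypothetical, and the checks rely on the standard macOS tools sw_vers and sysctl.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of USB-Uncensored-LLM.

# Succeeds when a macOS version string such as "12.6.1" is at least 11.
macos_version_ok() {
  local major="${1%%.*}"   # keep the text before the first dot
  [ "$major" -ge 11 ] 2>/dev/null
}

# Succeeds when a byte count is at least 8 GiB.
ram_ok() {
  local bytes="$1"
  [ "$bytes" -ge $((8 * 1024 * 1024 * 1024)) ] 2>/dev/null
}

# Run the checks only on macOS, where sw_vers and sysctl are available.
if [ "$(uname -s)" = "Darwin" ]; then
  macos_version_ok "$(sw_vers -productVersion)" || echo "macOS 11 or later is required"
  ram_ok "$(sysctl -n hw.memsize)" || echo "Less than 8 GB of RAM detected"
  command -v python3 >/dev/null || echo "python3 not found on PATH"
fi
```

The script prints nothing when every check passes.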

Installation

Open Terminal

Press Cmd + Space to open Spotlight, type Terminal, and press Enter.

Run install.command

Drag the Mac/install.command file from Finder into the Terminal window and press Enter. Dragging inserts the file's full path, so the installer runs regardless of Terminal's current directory.

You can also cd to the Mac/ directory and run bash install.command directly. Both methods are equivalent.

Choose your AI model(s)

The installer displays the same interactive model catalog as the Windows version. Enter one or more numbers separated by commas, type all for every preset model, or enter c for a custom HuggingFace GGUF URL:

[1] Gemma 2 2B Abliterated (~1.6 GB) [UNCENSORED] - RECOMMENDED FOR ALL - BLAZING FAST
[2] Gemma 4 E4B Ultra Uncensored Heretic (~5.34 GB) [UNCENSORED] - HERETIC
[3] Qwen 3.5 9B Uncensored Aggressive (~5.2 GB) [UNCENSORED] - AGGRESSIVE
[4] NemoMix Unleashed 12B (~7.0 GB) [UNCENSORED] - HEAVYWEIGHT
[5] Dolphin 2.9 Llama 3 8B (~4.9 GB) [UNCENSORED]
[6] Phi-3.5 Mini 3.8B (~2.2 GB) [STANDARD] - LIGHTWEIGHT
[C] CUSTOM - Enter your own HuggingFace GGUF URL
Enter number(s) separated by commas (e.g. 1,3)

Press Enter with no input to default to [1] Gemma 2 2B Abliterated.
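
The input rules described above (comma-separated numbers, all, c, and an empty default) could be handled roughly as follows. This is a hypothetical sketch of the parsing logic, not the installer's actual code; parse_selection is an invented name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of parsing the menu input; the real installer may differ.

# Prints one normalized choice per line: a number 1-6, or "c" for custom.
parse_selection() {
  local input="${1:-1}"   # empty input falls back to model 1
  local lowered
  # Lowercase the input and drop spaces so "2, C" becomes "2,c".
  lowered="$(printf '%s' "$input" | tr '[:upper:]' '[:lower:]' | tr -d ' ')"
  if [ "$lowered" = "all" ]; then
    printf '%s\n' 1 2 3 4 5 6
    return 0
  fi
  # Split on commas and validate each entry.
  printf '%s\n' "$lowered" | tr ',' '\n' | while IFS= read -r item; do
    case "$item" in
      [1-6]|c) printf '%s\n' "$item" ;;
      '')      ;;   # skip empty entries such as "1,,3"
      *)       echo "Ignoring invalid choice: $item" >&2 ;;
    esac
  done
}
```

For example, parse_selection "1,3" prints 1 and 3 on separate lines, and parse_selection "" prints 1.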

Wait for the engine download and model import

The installer runs a seven-step process:

Step	Action
`[1/7]`	Model selection
`[2/7]`	Creates `Shared/models/`, `Shared/bin/`, and `Shared/vendor/`
`[3/7]`	Downloads optional offline UI assets
`[4/7]`	Downloads selected GGUF model files from HuggingFace
`[5/7]`	Creates `Modelfile-<name>` configuration files in `Shared/models/`
`[6/7]`	Downloads `ollama-darwin.tgz` from GitHub Releases, extracts the `ollama-darwin` binary to `Shared/bin/ollama-darwin` and supporting runtime libraries (including `llama-server`) to `Shared/lib/ollama/`
`[7/7]`	Imports each selected model into the Ollama engine using `ollama-darwin create`, then shuts down the temporary server
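
Steps [5/7] and [7/7] can be illustrated with a minimal sketch. It assumes the simplest possible Modelfile, a single FROM line pointing at a local GGUF file, and the standard create subcommand with its -f flag. The installer's real Modelfiles may add parameters or templates, and the function names and example file names here are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of steps [5/7] and [7/7]; not the installer's code.

# Writes a minimal Modelfile that points at a GGUF file in the same folder.
write_modelfile() {
  local models_dir="$1" name="$2" gguf_file="$3"
  printf 'FROM ./%s\n' "$gguf_file" > "$models_dir/Modelfile-$name"
}

# Imports the model with the portable binary (needs a running Ollama server).
import_model() {
  local usb_root="$1" name="$2"
  # cd first so the relative FROM path resolves next to the Modelfile.
  (cd "$usb_root/Shared/models" &&
    "$usb_root/Shared/bin/ollama-darwin" create "$name" -f "Modelfile-$name")
}
```

Calling write_modelfile "Shared/models" "example" "example.gguf" would produce Shared/models/Modelfile-example containing the line FROM ./example.gguf.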

Installation complete

When all steps succeed, the installer prints SETUP COMPLETE! YOUR PORTABLE AI IS READY! and lists the installed models. You are now ready to launch.

macOS Gatekeeper may block the downloaded binary. The installer automatically removes the quarantine attribute by running:

xattr -dr com.apple.quarantine "$OLLAMA_BIN" "$OLLAMA_LIB_DIR"

If you still see a security warning after installation, open System Settings → Privacy & Security on macOS 13 or later (System Preferences → Security & Privacy → General on earlier versions) and click Allow Anyway next to the blocked item.
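
To see whether the quarantine attribute is still present on the engine binary, a check along these lines may help. It is a sketch that assumes the standard macOS xattr tool and its -p (print) option; is_quarantined is a hypothetical helper, not something the installer provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of the installer.

# Succeeds only when the file still carries the com.apple.quarantine attribute.
is_quarantined() {
  xattr -p com.apple.quarantine "$1" >/dev/null 2>&1
}

# Example: report the state of the engine binary when run from the USB root.
if [ -e "Shared/bin/ollama-darwin" ]; then
  if is_quarantined "Shared/bin/ollama-darwin"; then
    echo "still quarantined"
  else
    echo "quarantine attribute not present"
  fi
fi
```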

Launching

Double-click Mac/start.command in Finder, or run the following in Terminal:

bash Mac/start.command

The script exports all required environment variables (see below), starts the ollama-darwin engine in the background, waits up to 60 seconds for it to become ready, then launches the Python chat server. Your default browser opens automatically at http://localhost:3333. To shut down the AI engine, press Ctrl + C in the Terminal window running the chat server.
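
The "wait up to 60 seconds" behavior can be pictured as a simple polling loop. The sketch below is an assumption about the general technique, not a copy of start.command; wait_until is a hypothetical name, and the curl probe in the comment assumes the Ollama server answers HTTP requests at its root URL.

```bash
#!/usr/bin/env bash
# Hypothetical readiness loop; start.command's actual implementation may differ.

# Retries a command once per second until it succeeds or the timeout expires.
# Usage: wait_until <timeout_seconds> <command> [args...]
wait_until() {
  local timeout="$1"; shift
  local waited=0
  until "$@" >/dev/null 2>&1; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example probe (assumes curl is installed and the engine is starting):
#   wait_until 60 curl -sf http://127.0.0.1:11434/ || echo "engine did not start"
```

Passing the probe as a command keeps the loop independent of how readiness is detected.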

Apple Silicon Notes

On M1, M2, and M3 Macs, the Ollama engine automatically detects and uses Metal GPU acceleration through Apple’s unified memory architecture. No additional configuration is required. Metal offloads the model's matrix operations to the GPU, which substantially increases inference speed, particularly for the larger 5–7 GB models. A rough way to confirm Metal is in use is to compare against a CPU-only run: CPU utilization should be lower and token generation faster.
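
To confirm which CPU family a Mac has, the output of uname -m can be mapped as in this small sketch. describe_arch is a hypothetical helper. Note that a Terminal running under Rosetta reports x86_64 even on Apple Silicon.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of the project.

# Maps `uname -m` output to the CPU families this guide covers.
describe_arch() {
  case "$1" in
    arm64)  echo "Apple Silicon" ;;
    x86_64) echo "Intel" ;;
    *)      echo "unknown" ;;
  esac
}

describe_arch "$(uname -m)"
```

If the bundled binary supports the standard ps subcommand, running Shared/bin/ollama-darwin ps while a model is loaded should also show whether the model is placed on the GPU or the CPU.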

Environment Variables

start.command exports the following variables before starting the engine. All paths are relative to the USB root, so nothing is written to the host machine’s home directory:

Variable	Value	Purpose
`OLLAMA_MODELS`	`Shared/models/ollama_data`	Keeps model data on the USB drive
`OLLAMA_HOME`	`Shared/.ollama-runtime`	Redirects Ollama’s runtime directory away from `~/.ollama`
`OLLAMA_TMPDIR`	`Shared/.ollama-runtime/tmp`	Redirects temporary files to the USB drive
`OLLAMA_ORIGINS`	`*`	Accepts requests from any browser origin (CORS), which LAN access from phones and tablets on the same network relies on
`OLLAMA_HOST`	`127.0.0.1:11434`	Binds Ollama to localhost port 11434
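
As a rough illustration, the exports in the table could be written like this. The variable names and values come from the table, but how start.command actually locates the USB root is an assumption, and export_portable_env is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; values come from the table above, the rest is assumed.

# Exports the five variables with paths resolved under the given USB root.
export_portable_env() {
  local usb_root="$1"
  export OLLAMA_MODELS="$usb_root/Shared/models/ollama_data"
  export OLLAMA_HOME="$usb_root/Shared/.ollama-runtime"
  export OLLAMA_TMPDIR="$usb_root/Shared/.ollama-runtime/tmp"
  export OLLAMA_ORIGINS="*"
  export OLLAMA_HOST="127.0.0.1:11434"
}

# Assumes this script sits in Mac/, one level below the USB root.
USB_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
export_portable_env "$USB_ROOT"
```

Resolving the root from the script's own location means the drive can be mounted under any volume name.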

Uninstalling

To remove individual models or all downloaded data, run the macOS uninstall script from Terminal:

bash Mac/uninstall.command

This presents an interactive menu to remove selected models or all downloaded files while preserving the base project files. For a manual clean-up, delete these items from the Shared/ folder:

Shared/bin/ollama-darwin — the Ollama engine binary
Shared/lib/ollama/ — runtime libraries including llama-server
Shared/models/ — downloaded GGUF model weights and Modelfiles
Shared/.ollama-runtime/ — runtime state directory
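
The manual clean-up above can be scripted with a dry run first. This is a hypothetical sketch, not the project's uninstall script: cleanup_targets and manual_cleanup are invented names, and nothing is deleted unless --delete is passed explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical clean-up sketch; Mac/uninstall.command is the supported route.

# Prints the four manual clean-up paths under a given USB root.
cleanup_targets() {
  local usb_root="$1"
  printf '%s\n' \
    "$usb_root/Shared/bin/ollama-darwin" \
    "$usb_root/Shared/lib/ollama" \
    "$usb_root/Shared/models" \
    "$usb_root/Shared/.ollama-runtime"
}

# Dry run by default: lists what would be removed. Pass --delete to remove.
manual_cleanup() {
  local usb_root="$1" mode="${2:-}"
  [ -n "$usb_root" ] || return 1   # refuse to run without a root path
  cleanup_targets "$usb_root" | while IFS= read -r target; do
    if [ "$mode" = "--delete" ]; then
      rm -rf -- "$target"
    else
      echo "would remove: $target"
    fi
  done
}
```

Running manual_cleanup /Volumes/MyUSB (a placeholder volume name) lists the targets; adding --delete removes them.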
