This guide covers the full setup and launch process for USB-Uncensored-LLM on macOS, on both Intel Macs and Apple Silicon (M1/M2/M3) machines. The macOS installer downloads a native ollama-darwin binary along with its supporting runtime libraries, then imports your chosen models directly onto the drive. On Apple Silicon, the engine uses Metal GPU acceleration automatically, which makes inference significantly faster than CPU-only execution.

Prerequisites

  • macOS 11 (Big Sur) or later
  • Intel x86-64 or Apple Silicon (M1 / M2 / M3)
  • 8 GB RAM minimum (16 GB recommended for 9B/12B models)
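  • Python 3 (check with python3 --version; on a stock macOS install, the first run of python3 offers to install Apple's Command Line Tools, which provide it)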
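
The requirements above can be checked from Terminal before running the installer. The following is a minimal sketch and not part of the project: the function names are hypothetical, and it assumes the standard macOS tools sw_vers, uname, and sysctl are available.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of USB-Uncensored-LLM.

# True when a macOS version string such as "13.4.1" has a major version >= 11.
macos_version_ok() {
  [ "${1%%.*}" -ge 11 ]
}

# Convert a byte count (as printed by `sysctl -n hw.memsize`) to whole GiB.
bytes_to_gib() {
  echo $(( $1 / 1073741824 ))
}

check_prereqs() {
  if macos_version_ok "$(sw_vers -productVersion)"; then
    echo "macOS version: ok"
  else
    echo "macOS version: older than 11"
  fi
  # uname -m prints x86_64 on Intel Macs and arm64 on Apple Silicon.
  echo "CPU architecture: $(uname -m)"
  echo "RAM: $(bytes_to_gib "$(sysctl -n hw.memsize)") GiB (8 minimum, 16 recommended)"
  # Note: /usr/bin/python3 may be Apple's stub, which prompts to install
  # the Command Line Tools the first time it is run.
  if command -v python3 >/dev/null 2>&1; then
    echo "Python 3: found on PATH"
  else
    echo "Python 3: not found"
  fi
}

# Only run the macOS-specific checks on macOS.
if [ "$(uname -s)" = "Darwin" ]; then
  check_prereqs
fi
```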

Installation

1. Open Terminal

Press Cmd + Space to open Spotlight, type Terminal, and press Enter.

2. Run install.command

Drag the Mac/install.command file from Finder into the Terminal window and press Enter. Dragging the file inserts its absolute path, so the installer runs correctly regardless of Terminal's current directory.
You can also cd to the Mac/ directory and run bash install.command directly. Both methods are equivalent.

3. Choose your AI model(s)

The installer displays the same interactive model catalog as the Windows version. Enter one or more numbers separated by commas, type all for every preset model, or enter c for a custom HuggingFace GGUF URL:
[1] Gemma 2 2B Abliterated (~1.6 GB) [UNCENSORED] - RECOMMENDED FOR ALL - BLAZING FAST
[2] Gemma 4 E4B Ultra Uncensored Heretic (~5.34 GB) [UNCENSORED] - HERETIC
[3] Qwen 3.5 9B Uncensored Aggressive (~5.2 GB) [UNCENSORED] - AGGRESSIVE
[4] NemoMix Unleashed 12B (~7.0 GB) [UNCENSORED] - HEAVYWEIGHT
[5] Dolphin 2.9 Llama 3 8B (~4.9 GB) [UNCENSORED]
[6] Phi-3.5 Mini 3.8B (~2.2 GB) [STANDARD] - LIGHTWEIGHT
[C] CUSTOM - Enter your own HuggingFace GGUF URL
Enter number(s) separated by commas (e.g. 1,3)
Press Enter with no input to default to [1] Gemma 2 2B Abliterated.
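
The selection rules above (comma-separated numbers, all, c, or an empty line for the default) can be illustrated with a small parser. This is a sketch of the described behavior, not the installer's actual code: the function name parse_selection, the output format, and the rejection of mixed or out-of-range input are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the menu rules; the real installer may parse differently.

# Print the chosen entries as a space-separated list, "custom" for a custom
# URL, or fail with a message on invalid input.
parse_selection() {
  # Strip whitespace and lowercase the input so " ALL " and "1, 3" work.
  input=$(printf '%s' "$1" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')

  case "$input" in
    "")  echo "1"; return 0 ;;            # empty input defaults to model 1
    all) echo "1 2 3 4 5 6"; return 0 ;;  # every preset model
    c)   echo "custom"; return 0 ;;       # custom GGUF URL
  esac

  # Anything other than the digits 1-6 and commas is rejected up front.
  case "$input" in
    *[!1-6,]*) echo "invalid selection: $1" >&2; return 1 ;;
  esac

  picks=""
  old_ifs=$IFS
  IFS=,
  for item in $input; do
    case "$item" in
      [1-6]) picks="${picks:+$picks }$item" ;;
      *) IFS=$old_ifs; echo "invalid selection: $1" >&2; return 1 ;;
    esac
  done
  IFS=$old_ifs
  echo "$picks"
}
```

For example, parse_selection "1, 3" prints 1 3, and parse_selection "" prints 1.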

4. Wait for the engine download and model import

The installer runs a seven-step process:
| Step | Action |
| --- | --- |
| [1/7] | Model selection |
| [2/7] | Creates Shared/models/, Shared/bin/, and Shared/vendor/ |
| [3/7] | Downloads optional offline UI assets |
| [4/7] | Downloads selected GGUF model files from HuggingFace |
| [5/7] | Creates Modelfile-<name> configuration files in Shared/models/ |
| [6/7] | Downloads ollama-darwin.tgz from GitHub Releases, then extracts the ollama-darwin binary to Shared/bin/ollama-darwin and the supporting runtime libraries (including llama-server) to Shared/lib/ollama/ |
| [7/7] | Imports each selected model into the Ollama engine using ollama-darwin create, then shuts down the temporary server |
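
Steps 2, 5, and 7 can be sketched in a few lines of shell. This is an illustration under assumptions, not the installer's source: the function names are hypothetical, the Modelfile is reduced to a single FROM line (the real files may set more parameters), and the import step assumes an Ollama server is already running.

```shell
#!/bin/sh
# Hypothetical sketch of steps 2, 5, and 7; the real installer may differ.

# Step 2: create the shared directory layout under the USB root ($1).
make_layout() {
  mkdir -p "$1/Shared/models" "$1/Shared/bin" "$1/Shared/vendor"
}

# Step 5: write Shared/models/Modelfile-<name> pointing at a GGUF file.
# $1 = USB root, $2 = model name, $3 = GGUF file name inside Shared/models/
write_modelfile() {
  printf 'FROM ./%s\n' "$3" > "$1/Shared/models/Modelfile-$2"
}

# Step 7: import the model with `create`. This needs a running Ollama
# server, so it is only defined here and not called.
# $1 = USB root, $2 = model name
import_model() {
  ( cd "$1/Shared/models" &&
    "$1/Shared/bin/ollama-darwin" create "$2" -f "Modelfile-$2" )
}
```

With a hypothetical drive mounted at /Volumes/MYUSB, calling make_layout /Volumes/MYUSB followed by write_modelfile /Volumes/MYUSB my-model my-model.gguf would leave a one-line Modelfile-my-model next to the downloaded weights.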

5. Installation complete

When all steps succeed, the installer prints SETUP COMPLETE! YOUR PORTABLE AI IS READY! and lists the installed models. You are now ready to launch.
macOS Gatekeeper may block the downloaded binary. The installer automatically removes the quarantine attribute by running:
xattr -dr com.apple.quarantine "$OLLAMA_BIN" "$OLLAMA_LIB_DIR"
If you still see a security warning after installation, open System Settings → Privacy & Security (macOS 13 and later) or System Preferences → Security & Privacy → General (macOS 12 and earlier), and click Allow Anyway next to the blocked item.
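
To see whether the quarantine attribute is still set on a file, you can query it with xattr -p. The helper below is a hypothetical sketch, not part of the project; it treats a missing xattr tool or a missing attribute as "not quarantined".

```shell
#!/bin/sh
# Hypothetical helper; not part of USB-Uncensored-LLM.

# True when the file at $1 still carries the com.apple.quarantine attribute.
is_quarantined() {
  # Without the xattr tool there is nothing to check.
  command -v xattr >/dev/null 2>&1 || return 1
  # xattr -p exits non-zero when the attribute is not present.
  xattr -p com.apple.quarantine "$1" >/dev/null 2>&1
}

# Usage (path relative to the USB root):
#   if is_quarantined Shared/bin/ollama-darwin; then
#     xattr -dr com.apple.quarantine Shared/bin/ollama-darwin
#   fi
```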

Launching

Double-click Mac/start.command in Finder, or run the following in Terminal:
bash Mac/start.command
The script exports all required environment variables (see below), starts the ollama-darwin engine in the background, waits up to 60 seconds for it to become ready, then launches the Python chat server. Your default browser opens automatically at http://localhost:3333. To shut down the AI engine, press Ctrl + C in the Terminal window running the chat server.
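
The "wait up to 60 seconds" behavior is a common polling pattern. The sketch below shows one way it could be written; it is not taken from start.command, the function names are hypothetical, and it assumes curl is available and that the engine answers on Ollama's /api/version endpoint once it is ready.

```shell
#!/bin/sh
# Hypothetical readiness loop; start.command may implement this differently.

# Run a probe command once per second until it succeeds or the timeout
# (in seconds) expires. $1 = timeout, remaining arguments = probe command.
wait_until() {
  timeout_s=$1
  shift
  waited=0
  while ! "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Probe: succeeds once the engine responds on its local port.
engine_ready() {
  curl -fsS "http://127.0.0.1:11434/api/version"
}

# Usage:
#   wait_until 60 engine_ready || echo "engine did not become ready" >&2
```

Separating the loop from the probe keeps the timeout logic reusable, for example to wait for the chat server on port 3333 with a different probe.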

Apple Silicon Notes

On M1, M2, and M3 Macs, the Ollama engine automatically detects and uses Metal GPU acceleration, taking advantage of Apple’s unified memory architecture. No additional configuration is required. Metal acceleration runs the model's matrix operations on the GPU, which substantially increases inference speed, particularly for the larger 5–7 GB models. To confirm that Metal is in use, look for lower CPU utilization and faster token generation than in a CPU-only run.
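
Token generation speed can be put into numbers rather than judged by eye. A non-streaming response from Ollama's /api/generate endpoint includes eval_count (tokens generated) and eval_duration (in nanoseconds); the hypothetical helper below turns those two values into tokens per second. How you extract the fields from the JSON response is left open here.

```shell
#!/bin/sh
# Hypothetical helper; not part of USB-Uncensored-LLM.

# Print tokens per second with one decimal place.
# $1 = eval_count (tokens generated), $2 = eval_duration in nanoseconds
tokens_per_sec() {
  awk -v count="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", count / (ns / 1e9) }'
}

# Usage: 120 tokens generated in 4 seconds.
#   tokens_per_sec 120 4000000000
```

Comparing this figure for the same prompt and model on different machines gives a rough sense of how much the GPU contributes.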

Environment Variables

start.command exports the following variables before starting the engine. All paths are relative to the USB root, so nothing is written to the host machine’s home directory:
| Variable | Value | Purpose |
| --- | --- | --- |
| OLLAMA_MODELS | Shared/models/ollama_data | Keeps model data on the USB drive |
| OLLAMA_HOME | Shared/.ollama-runtime | Redirects Ollama’s runtime directory away from ~/.ollama |
| OLLAMA_TMPDIR | Shared/.ollama-runtime/tmp | Redirects temporary files to the USB drive |
| OLLAMA_ORIGINS | * | Accepts requests from any origin (CORS), which the project uses to support access from phones and tablets on the same network |
| OLLAMA_HOST | 127.0.0.1:11434 | Binds Ollama to localhost port 11434 |
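
As a sketch of how these exports might look in a launcher script, the function below sets the five variables from a given USB root. The function name and the way the root is resolved are assumptions; only the variable names and values come from the table above.

```shell
#!/bin/sh
# Hypothetical sketch; start.command may structure this differently.

# Export the engine's environment with every path anchored at the USB root.
# $1 = absolute path of the USB root
export_ollama_env() {
  usb_root=$1
  export OLLAMA_MODELS="$usb_root/Shared/models/ollama_data"
  export OLLAMA_HOME="$usb_root/Shared/.ollama-runtime"
  export OLLAMA_TMPDIR="$usb_root/Shared/.ollama-runtime/tmp"
  export OLLAMA_ORIGINS="*"
  export OLLAMA_HOST="127.0.0.1:11434"
}

# Usage from a script stored in Mac/ (the root is one level above it):
#   export_ollama_env "$(cd "$(dirname "$0")/.." && pwd)"
```

Resolving the root from the script's own location, rather than hard-coding a volume name, keeps the setup working whatever name the drive is mounted under.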

Uninstalling

To remove individual models or all downloaded data, run the macOS uninstall script from Terminal:
bash Mac/uninstall.command
This presents an interactive menu to remove selected models or all downloaded files while preserving the base project files. For a manual clean-up, delete these items from the Shared/ folder:
  • Shared/bin/ollama-darwin — the Ollama engine binary
  • Shared/lib/ollama/ — runtime libraries including llama-server
  • Shared/models/ — downloaded GGUF model weights and Modelfiles
  • Shared/.ollama-runtime/ — runtime state directory
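
The manual clean-up can also be done from Terminal. The following is a hypothetical sketch, not the project's uninstall script: it deletes exactly the four items listed above and refuses to run if the given directory has no Shared/ folder.

```shell
#!/bin/sh
# Hypothetical manual clean-up; Mac/uninstall.command is the supported route.

# Remove downloaded engine files, models, and runtime state.
# $1 = absolute path of the USB root
manual_cleanup() {
  root=$1
  # Guard against an empty or wrong path before deleting anything.
  if [ -z "$root" ] || [ ! -d "$root/Shared" ]; then
    echo "no Shared/ folder under '$root'" >&2
    return 1
  fi
  rm -rf "$root/Shared/bin/ollama-darwin" \
         "$root/Shared/lib/ollama" \
         "$root/Shared/models" \
         "$root/Shared/.ollama-runtime"
}
```

The base project files, such as the Mac/ folder and its scripts, are left in place, so the installer can be run again later.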
