This guide covers installation and launch of USB-Uncensored-LLM on Ubuntu 20.04+, Debian, and compatible Linux distributions. The installer downloads a native ollama-linux binary and its bundled llama-server inference executable, then imports your chosen models, all inside the Shared/ folder. Nothing is written to your home directory or system paths.
Prerequisites
- Ubuntu 20.04+ or a Debian-compatible distribution
- x86-64 architecture
- 8 GB RAM minimum (16 GB recommended for 9B/12B models)
- curl installed (sudo apt install curl)
- Python 3 installed (sudo apt install python3)
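The prerequisites above can be checked before running the installer. The following is a minimal sketch, not part of the project: the function names `have_cmd` and `preflight` are hypothetical, and the RAM check assumes a Linux `/proc/meminfo`.

```shell
#!/bin/sh
# Preflight sketch: checks the prerequisites listed above.
# Function names are illustrative and not part of the project.

have_cmd() {
  # Succeeds if the named command is on PATH.
  command -v "$1" >/dev/null 2>&1
}

preflight() {
  status=0

  arch=$(uname -m)
  if [ "$arch" = "x86_64" ]; then
    echo "ok: x86-64 architecture"
  else
    echo "fail: unsupported architecture $arch"
    status=1
  fi

  for cmd in curl python3; do
    if have_cmd "$cmd"; then
      echo "ok: $cmd found"
    else
      echo "fail: $cmd missing (sudo apt install $cmd)"
      status=1
    fi
  done

  # MemTotal is reported in kB. An 8 GB machine reports slightly less
  # than 8 GiB, so the threshold here is 7 GiB (7340032 kB).
  if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    if [ "$mem_kb" -ge 7340032 ]; then
      echo "ok: $((mem_kb / 1048576)) GiB RAM reported"
    else
      echo "warn: below the 8 GB minimum"
    fi
  fi

  return "$status"
}

preflight || echo "Resolve the failures above before running the installer."
```

A low-RAM result is reported as a warning rather than a failure, so the check never blocks on memory alone.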
Installation
Open a terminal and navigate to the project folder
Open your terminal emulator and cd to the USB-Uncensored-LLM root directory, whether that’s a mounted USB drive or a local clone.
Choose your AI model(s)
The installer displays an interactive model catalog. Enter one or more numbers separated by commas, type all for every preset, or c for a custom HuggingFace GGUF URL. Press Enter with no input to default to [1] Gemma 2 2B Abliterated.
Wait for the engine download and model import
The installer runs a seven-step process:
| Step | Action |
|---|---|
| [1/7] | Model selection |
| [2/7] | Creates Shared/models/, Shared/bin/, and Shared/vendor/ |
| [3/7] | Downloads optional offline UI assets |
| [4/7] | Downloads selected GGUF model files from HuggingFace into Shared/models/ |
| [5/7] | Creates Modelfile-<name> configuration files in Shared/models/ |
| [6/7] | Downloads ollama-linux-amd64.tar.zst from GitHub Releases and extracts it directly into Shared/, placing the binary at Shared/bin/ollama-linux and runtime files (including llama-server) at Shared/lib/ollama/ |
| [7/7] | Imports each selected model into the Ollama engine using ollama-linux create, then shuts down the temporary server |
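Steps 2, 5, and 7 can be sketched by hand as follows. This is an illustration under stated assumptions, not the installer's actual code: `ROOT`, `MODEL_NAME`, and `GGUF_FILE` are hypothetical placeholders, and the Modelfile contains only a `FROM` line, whereas the real installer may write additional settings. `ROOT` defaults to a scratch directory so the sketch does not touch a real install.

```shell
#!/bin/sh
# Illustrative sketch of steps 2, 5, and 7; not the installer's code.
# ROOT, MODEL_NAME, and GGUF_FILE are hypothetical placeholders.
ROOT="${ROOT:-$(mktemp -d)}"
MODEL_NAME="example-model"
GGUF_FILE="example-model.gguf"

# Step 2: create the folder layout inside Shared/.
mkdir -p "$ROOT/Shared/models" "$ROOT/Shared/bin" "$ROOT/Shared/vendor"

# Step 5: write a minimal Modelfile that points at the GGUF file
# sitting next to it in Shared/models/.
MODELFILE="$ROOT/Shared/models/Modelfile-$MODEL_NAME"
printf 'FROM ./%s\n' "$GGUF_FILE" > "$MODELFILE"

# Step 7: import the model with `create NAME -f MODELFILE`.
# This needs the engine binary and a running `serve` process, so the
# sketch skips the import when the binary is absent.
OLLAMA_BIN="$ROOT/Shared/bin/ollama-linux"
if [ -x "$OLLAMA_BIN" ]; then
  (cd "$ROOT/Shared/models" && "$OLLAMA_BIN" create "$MODEL_NAME" -f "$MODELFILE")
else
  echo "engine not found at $OLLAMA_BIN; skipping import"
fi
```

Running the import from inside Shared/models/ keeps the relative `FROM ./...` path valid regardless of how the engine resolves it.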
The HOME environment variable is overridden to Shared/models/.ollama-runtime during import so that nothing is written to ~/.ollama on the host machine.
Launching
Run bash Linux/start.sh from the project root. The script starts Shared/bin/ollama-linux serve in the background (with HOME overridden to keep data on the drive), waits for the engine to become ready, then launches the Python chat server. The terminal displays the local access URL, http://localhost:3333. The chat server also detects and prints your machine’s LAN IP address so other devices on the same network can connect.
Press Ctrl + C to stop the chat server and shut down the Ollama engine.
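The start, wait, and clean-up sequence described above could look roughly like the sketch below. It is not the project's start.sh: the function names, the `RUNTIME_DIR` location, the 30-second limit, and the choice of Ollama's `/api/version` endpoint as the readiness probe are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a start-wait-cleanup launch pattern; not the real start.sh.
# RUNTIME_DIR, the timeout, and the readiness URL are assumptions.

wait_for_engine() {
  # Polls the URL once per second; succeeds as soon as it responds,
  # fails after the given number of attempts.
  url=$1
  tries=$2
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS --max-time 1 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

start_engine() {
  RUNTIME_DIR="$PWD/Shared/.ollama-runtime"   # assumed location
  mkdir -p "$RUNTIME_DIR"

  # The HOME=... prefix applies to this one command only, so the
  # calling shell keeps its own HOME.
  HOME="$RUNTIME_DIR" Shared/bin/ollama-linux serve &
  ENGINE_PID=$!

  # Stop the background engine whenever the script exits, including
  # after Ctrl + C.
  trap 'kill "$ENGINE_PID" 2>/dev/null' EXIT
  trap 'exit 130' INT TERM

  wait_for_engine "http://127.0.0.1:11434/api/version" 30
}

# Usage, from the project root:
#   start_engine && echo "engine ready"
```

Routing Ctrl + C through `exit` means the single EXIT trap handles shutdown on every path out of the script.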
CUDA GPU Acceleration
If NVIDIA drivers are installed on the host machine, Ollama automatically detects and uses CUDA for GPU-accelerated inference. No additional configuration is required because the detection is built into the ollama-linux binary. Verify that your GPU is visible by running nvidia-smi. If nvidia-smi shows your GPU, Ollama will use it automatically the next time you run bash Linux/start.sh. Expect significantly higher tokens-per-second compared to CPU-only mode, especially for the larger 9B and 12B models.
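The visibility check can also be scripted. The sketch below reports whether the NVIDIA driver exposes a GPU, using `nvidia-smi -L` to list devices. The function name `detect_accel` is hypothetical, and the result reflects only driver visibility; whether inference actually runs on the GPU is decided by the engine's own detection.

```shell
#!/bin/sh
# Sketch: report whether an NVIDIA GPU is visible to the driver.
# detect_accel is an illustrative name, not part of the project.

detect_accel() {
  # nvidia-smi -L lists detected GPUs and fails if the driver is
  # missing or no device is found.
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo "gpu"
  else
    echo "cpu"
  fi
}

echo "Expected inference mode: $(detect_accel)"
```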
Environment Variables
start.sh exports the following variables before starting the engine. All paths resolve inside the Shared/ folder, keeping the host system’s home directory completely clean:
| Variable | Value | Purpose |
|---|---|---|
| OLLAMA_MODELS | Shared/models/ollama_data | Keeps model data on the USB drive |
| OLLAMA_HOME | Shared/.ollama-runtime | Redirects Ollama’s runtime directory away from ~/.ollama |
| OLLAMA_TMPDIR | Shared/.ollama-runtime/tmp | Redirects temporary files to the USB drive |
| OLLAMA_ORIGINS | * | Enables LAN access from phones and tablets on the same network |
| OLLAMA_HOST | 127.0.0.1:11434 | Binds Ollama to localhost port 11434 |
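To reproduce this environment by hand, for example when debugging the engine outside start.sh, the table translates into exports along these lines. This is a sketch: the function name `export_portable_env` is hypothetical, and resolving each path against an absolute project root is an assumption about how start.sh builds them.

```shell
#!/bin/sh
# Sketch: export the variables from the table, resolved against a
# project root. export_portable_env is an illustrative name.

export_portable_env() {
  root=$1
  export OLLAMA_MODELS="$root/Shared/models/ollama_data"
  export OLLAMA_HOME="$root/Shared/.ollama-runtime"
  export OLLAMA_TMPDIR="$root/Shared/.ollama-runtime/tmp"
  export OLLAMA_ORIGINS="*"
  export OLLAMA_HOST="127.0.0.1:11434"
}

# Usage, from the project root: export, then list what was set.
export_portable_env "$PWD"
env | grep '^OLLAMA_' | sort
```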
Uninstalling
Run the Linux uninstall script to remove models or all downloaded data interactively. The downloaded data lives under Shared/:
- Shared/bin/ollama-linux: the Ollama engine binary
- Shared/lib/ollama/: runtime libraries including llama-server
- Shared/models/: downloaded GGUF weights and Modelfiles
- Shared/.ollama-runtime/: runtime state directory
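Before removing anything, it can help to see which of these paths exist and how much space they use. The following read-only sketch does that. It deletes nothing and is not the project's uninstall script; `list_installed` is a hypothetical name.

```shell
#!/bin/sh
# Read-only sketch: report which installed paths exist and their size.
# list_installed is an illustrative name; nothing is deleted.

list_installed() {
  root=$1
  for rel in bin/ollama-linux lib/ollama models .ollama-runtime; do
    path="$root/Shared/$rel"
    if [ -e "$path" ]; then
      du -sh "$path"
    else
      echo "absent: $path"
    fi
  done
}

# Usage, from the project root:
list_installed "$PWD"
```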