Run USB-Uncensored-LLM on Linux

This guide covers installation and launch of USB-Uncensored-LLM on Ubuntu 20.04+, Debian, and compatible Linux distributions. The installer downloads a native ollama-linux binary and its bundled llama-server inference executable, then imports your chosen models — all inside the Shared/ folder. Nothing is written to your home directory or system paths.

Prerequisites

Ubuntu 20.04+ or a Debian-compatible distribution
x86-64 architecture
8 GB RAM minimum (16 GB recommended for 9B/12B models)
curl installed (sudo apt install curl)
Python 3 installed (sudo apt install python3)
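
The requirements above can be checked before you run the installer. The following is a minimal sketch, not part of the project's scripts; the helper names missing_commands and kib_to_gib are hypothetical, and the RAM figure is read from /proc/meminfo, which is assumed to be available on your distribution.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not shipped with USB-Uncensored-LLM.

# Print the name of every listed command that is not found on PATH.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Convert a size in kB (the unit used by /proc/meminfo) to whole GiB.
kib_to_gib() {
  echo $(( $1 / 1024 / 1024 ))
}

missing=$(missing_commands curl python3)
if [ -n "$missing" ]; then
  echo "Missing commands: $missing"
else
  echo "curl and python3 found"
fi

echo "Architecture: $(uname -m) (this guide expects x86_64)"

if [ -r /proc/meminfo ]; then
  mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  if [ -n "$mem_kib" ]; then
    echo "RAM: about $(kib_to_gib "$mem_kib") GiB (8 GB minimum)"
  fi
fi
```

The script only reports what it finds. The kernel reserves some memory, so a machine sold with 8 GB usually reports slightly less than 8 GiB here.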

Installation

Open a terminal and navigate to the project folder

Open your terminal emulator and cd to the USB-Uncensored-LLM root directory — whether that’s a mounted USB drive or a local clone.

Run the installer

bash Linux/install.sh

Choose your AI model(s)

The installer displays an interactive model catalog. Enter one or more numbers separated by commas, type all for every preset, or c for a custom HuggingFace GGUF URL:

[1] Gemma 2 2B Abliterated (~1.6 GB) [UNCENSORED] - RECOMMENDED FOR ALL - BLAZING FAST
[2] Gemma 4 E4B Ultra Uncensored Heretic (~5.34 GB) [UNCENSORED] - HERETIC
[3] Qwen 3.5 9B Uncensored Aggressive (~5.2 GB) [UNCENSORED] - AGGRESSIVE
[4] NemoMix Unleashed 12B (~7.0 GB) [UNCENSORED] - HEAVYWEIGHT
[5] Dolphin 2.9 Llama 3 8B (~4.9 GB) [UNCENSORED]
[6] Phi-3.5 Mini 3.8B (~2.2 GB) [STANDARD] - LIGHTWEIGHT
[C] CUSTOM - Enter your own HuggingFace GGUF URL
Enter number(s) separated by commas (e.g. 1,3)

Press Enter with no input to default to [1] Gemma 2 2B Abliterated.
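
To illustrate how the accepted answers map to choices, here is a rough sketch of the input rules described above. It is not the installer's own code: parse_selection is a hypothetical helper, and the preset range 1 to 6 is taken from the catalog shown in this guide.

```shell
# Hypothetical helper: turn the catalog answer into one choice per line.
parse_selection() {
  local input
  # Drop whitespace and lowercase the answer so "1, 3" and "ALL" both work.
  input=$(printf '%s' "$1" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
  case "$input" in
    "")  echo 1 ;;                      # Enter with no input selects model 1
    all) printf '%s\n' 1 2 3 4 5 6 ;;   # every preset in the catalog
    c)   echo custom ;;                 # custom HuggingFace GGUF URL
    *)   printf '%s\n' "$input" | tr ',' '\n' | grep -E '^[1-6]$' ;;
  esac
}

parse_selection "1, 3"   # prints 1 and 3 on separate lines
```

In this sketch, entries outside the 1 to 6 range are silently dropped. The real installer may handle invalid input differently.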

Wait for the engine download and model import

The installer runs a seven-step process:

Step	Action
`[1/7]`	Model selection
`[2/7]`	Creates `Shared/models/`, `Shared/bin/`, and `Shared/vendor/`
`[3/7]`	Downloads optional offline UI assets
`[4/7]`	Downloads selected GGUF model files from HuggingFace into `Shared/models/`
`[5/7]`	Creates `Modelfile-<name>` configuration files in `Shared/models/`
`[6/7]`	Downloads `ollama-linux-amd64.tar.zst` from GitHub Releases and extracts it directly into `Shared/`, placing the binary at `Shared/bin/ollama-linux` and runtime files (including `llama-server`) at `Shared/lib/ollama/`
`[7/7]`	Imports each selected model into the Ollama engine using `ollama-linux create`, then shuts down the temporary server

The HOME environment variable is overridden to Shared/models/.ollama-runtime during import so that nothing is written to ~/.ollama on the host machine.
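
The effect of the HOME override can be sketched as follows. This is an illustration under assumptions, not the installer's code: SHARED_DIR and with_portable_home are hypothetical names, and the script is assumed to run from the project root.

```shell
# Assumed layout: run from the project root. SHARED_DIR is an illustrative name.
SHARED_DIR="${SHARED_DIR:-$PWD/Shared}"

# Run a command with HOME pointed at the drive. The assignment happens inside
# a subshell, so the caller's HOME is left unchanged.
with_portable_home() {
  (
    HOME="$SHARED_DIR/models/.ollama-runtime"
    export HOME
    "$@"
  )
}

with_portable_home sh -c 'echo "HOME inside:  $HOME"'
echo "HOME outside: ${HOME:-unset}"
```

Any program started this way resolves ~ and $HOME to the directory on the drive, which is why an engine launched with the override has no reason to touch ~/.ollama on the host.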

Installation complete

When all steps succeed, the installer prints "SETUP COMPLETE! YOUR PORTABLE AI IS READY!" and you are ready to launch.

If zstd is not installed on your system, the tar --use-compress-program=zstd extraction step may fail. Install it with:

sudo apt install zstd

Then re-run bash Linux/install.sh — it will skip already-downloaded models and retry only the engine extraction.

Launching

bash Linux/start.sh

The script exports all required environment variables, starts Shared/bin/ollama-linux serve in the background (with HOME overridden to keep data on the drive), waits for the engine to become ready, then launches the Python chat server. Once both are running, the terminal displays a status banner:

===================================================
  AI ENGINE IS RUNNING
  Chat UI will open automatically.
  Press Ctrl+C to shut down.
===================================================

Your default browser opens at http://localhost:3333. The chat server also detects and prints your machine’s LAN IP address so other devices on the same network can connect. Press Ctrl+C to stop the chat server and shut down the Ollama engine.
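
The "waits for the engine to become ready" step can be pictured as a retry loop. The sketch below is one possible shape, not the code in start.sh; wait_until_ready is a hypothetical helper, and the probe in the comment assumes curl plus the bind address listed under Environment Variables.

```shell
# Hypothetical helper: retry a probe command until it succeeds or the
# attempts run out. Usage: wait_until_ready <attempts> <delay-seconds> <command...>
wait_until_ready() {
  local attempts delay i
  attempts=$1
  delay=$2
  i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Against a running engine the probe could be:
#   wait_until_ready 30 1 curl -sf http://127.0.0.1:11434/api/version
# Here the loop is exercised with a probe that always succeeds.
if wait_until_ready 3 0 true; then
  echo "engine ready"
fi
```

Passing the probe as arguments keeps the loop independent of any one tool, so the same helper could wait on the chat server port as well.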

CUDA GPU Acceleration

If NVIDIA drivers are installed on the host machine, Ollama automatically detects and uses CUDA for GPU-accelerated inference. No additional configuration is required — the detection is built into the ollama-linux binary. Verify that your GPU is visible with:

nvidia-smi

If nvidia-smi shows your GPU, Ollama will use it automatically the next time you run bash Linux/start.sh. Expect significantly higher tokens-per-second compared to CPU-only mode, especially for the larger 9B and 12B models.
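
If you want a scripted version of that check, the sketch below reports which mode to expect. It is an illustration only: gpu_mode is a hypothetical helper, and it tells you whether a working NVIDIA driver is visible, not what Ollama ultimately decides to use.

```shell
# Hypothetical check: report the inference mode to expect on this machine.
gpu_mode() {
  # "nvidia-smi -L" lists detected GPUs and fails when no working driver is present.
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo "gpu"
  else
    echo "cpu"
  fi
}

echo "Expected mode: $(gpu_mode)"
```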

Environment Variables

start.sh exports the following variables before starting the engine. All paths resolve inside the Shared/ folder, keeping the host system’s home directory completely clean:

Variable	Value	Purpose
`OLLAMA_MODELS`	`Shared/models/ollama_data`	Keeps model data on the USB drive
`OLLAMA_HOME`	`Shared/.ollama-runtime`	Redirects Ollama’s runtime directory away from `~/.ollama`
`OLLAMA_TMPDIR`	`Shared/.ollama-runtime/tmp`	Redirects temporary files to the USB drive
`OLLAMA_ORIGINS`	`*`	Allows cross-origin (CORS) requests from any origin, so a chat UI opened from a LAN address is not rejected; it does not change the bind address
`OLLAMA_HOST`	`127.0.0.1:11434`	Binds Ollama to localhost port 11434
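
As a rough picture of what this table amounts to in a shell session, the variables could be exported as shown below. This is a sketch, not the contents of start.sh: ROOT is a hypothetical name for the project root, and resolving the Shared/ paths to absolute paths is an assumption.

```shell
# Assumption: the project root is the current directory. ROOT is an illustrative name.
ROOT="${ROOT:-$PWD}"

export OLLAMA_MODELS="$ROOT/Shared/models/ollama_data"
export OLLAMA_HOME="$ROOT/Shared/.ollama-runtime"
export OLLAMA_TMPDIR="$ROOT/Shared/.ollama-runtime/tmp"
export OLLAMA_ORIGINS="*"
export OLLAMA_HOST="127.0.0.1:11434"

# Show what a child process such as the engine would inherit.
env | grep '^OLLAMA_' | sort
```

Because the values are exported, they apply only to this shell and the processes it starts. Nothing is written to shell profiles on the host.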

Uninstalling

Run the Linux uninstall script to remove models or all downloaded data interactively:

bash Linux/uninstall.sh

The menu lets you remove individual models, a selection of models, or all downloaded files while keeping the base project files intact. For a manual clean-up, delete the following from Shared/:

Shared/bin/ollama-linux — the Ollama engine binary
Shared/lib/ollama/ — runtime libraries including llama-server
Shared/models/ — downloaded GGUF weights and Modelfiles
Shared/.ollama-runtime/ — runtime state directory
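
The manual clean-up can be scripted along these lines. This is a cautious sketch rather than the project's uninstall logic: cleanup_targets and DRY_RUN are hypothetical names, the script is assumed to run from the project root, and it only prints what it would delete unless you set DRY_RUN=0.

```shell
# Assumption: run from the project root. DRY_RUN defaults to 1, so nothing is deleted.
DRY_RUN="${DRY_RUN:-1}"

# The four paths named in the list above.
cleanup_targets() {
  printf '%s\n' \
    "Shared/bin/ollama-linux" \
    "Shared/lib/ollama" \
    "Shared/models" \
    "Shared/.ollama-runtime"
}

cleanup_targets | while IFS= read -r path; do
  if [ "$DRY_RUN" = "1" ]; then
    echo "would remove: $path"
  else
    rm -rf -- "$path"
  fi
done
```

Review the dry-run output first, then re-run with DRY_RUN=0 once the list matches what you intend to delete.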
