This guide covers installation and launch of USB-Uncensored-LLM on Ubuntu 20.04+, Debian, and compatible Linux distributions. The installer downloads a native ollama-linux binary and its bundled llama-server inference executable, then imports your chosen models — all inside the Shared/ folder. Nothing is written to your home directory or system paths.

Prerequisites

  • Ubuntu 20.04+ or a Debian-compatible distribution
  • x86-64 architecture
  • 8 GB RAM minimum (16 GB recommended for 9B/12B models)
  • curl installed (sudo apt install curl)
  • Python 3 installed (sudo apt install python3)
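Before running the installer, the prerequisites above can be checked from a terminal. The following is a minimal sketch, not part of the project: the function names missing_tools, is_x86_64, and total_ram_gib are hypothetical helpers introduced here for illustration.

```shell
# Hypothetical preflight helpers; not part of the project's installer.

# Print the name of every listed command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeed only on x86-64, which `uname -m` reports as x86_64.
is_x86_64() {
    [ "$(uname -m)" = "x86_64" ]
}

# Print total RAM in whole GiB, read from /proc/meminfo (the value is in kB).
# A machine sold as "8 GB" usually reports slightly less, so this may print 7.
total_ram_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}
```

For example, missing_tools curl python3 prints nothing when both tools are installed, and prints the missing names otherwise.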

Installation

1. Open a terminal and navigate to the project folder

Open your terminal emulator and cd to the USB-Uncensored-LLM root directory — whether that’s a mounted USB drive or a local clone.

2. Run the installer

bash Linux/install.sh

3. Choose your AI model(s)

The installer displays an interactive model catalog. Enter one or more numbers separated by commas, type all for every preset, or c for a custom HuggingFace GGUF URL:
[1] Gemma 2 2B Abliterated (~1.6 GB) [UNCENSORED] - RECOMMENDED FOR ALL - BLAZING FAST
[2] Gemma 4 E4B Ultra Uncensored Heretic (~5.34 GB) [UNCENSORED] - HERETIC
[3] Qwen 3.5 9B Uncensored Aggressive (~5.2 GB) [UNCENSORED] - AGGRESSIVE
[4] NemoMix Unleashed 12B (~7.0 GB) [UNCENSORED] - HEAVYWEIGHT
[5] Dolphin 2.9 Llama 3 8B (~4.9 GB) [UNCENSORED]
[6] Phi-3.5 Mini 3.8B (~2.2 GB) [STANDARD] - LIGHTWEIGHT
[C] CUSTOM - Enter your own HuggingFace GGUF URL
Enter number(s) separated by commas (e.g. 1,3)
Press Enter with no input to default to [1] Gemma 2 2B Abliterated.
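The selection rules above (comma-separated numbers, all, c, or empty input) can be sketched as a small parser. This is an illustration only, not the installer's actual code: parse_selection is a hypothetical function, it assumes the six presets shown in the catalog, and tolerating spaces and upper case is an assumption as well.

```shell
# Hypothetical helper: normalize a catalog answer into a space-separated list.
# Assumes six presets, as in the catalog shown above.
parse_selection() {
    # Drop spaces and lower-case the answer so "1, 3" and "ALL" are accepted.
    input=$(printf '%s' "$1" | tr -d ' ' | tr '[:upper:]' '[:lower:]')
    case "$input" in
        "")         echo "1" ;;                # empty input falls back to preset 1
        all)        echo "1 2 3 4 5 6" ;;      # every preset
        c)          echo "custom" ;;           # custom GGUF URL
        *[!1-6,]*)  echo "invalid selection: $1" >&2; return 1 ;;
        *)          printf '%s\n' "$input" | tr -s ',' ' ' ;;
    esac
}
```

With this sketch, parse_selection "1,3" prints 1 3, and an out-of-range answer such as 7 returns a non-zero status.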

4. Wait for the engine download and model import

The installer runs a seven-step process:
Step | Action
[1/7] | Model selection
[2/7] | Creates Shared/models/, Shared/bin/, and Shared/vendor/
[3/7] | Downloads optional offline UI assets
[4/7] | Downloads selected GGUF model files from HuggingFace into Shared/models/
[5/7] | Creates Modelfile-<name> configuration files in Shared/models/
[6/7] | Downloads ollama-linux-amd64.tar.zst from GitHub Releases and extracts it directly into Shared/, placing the binary at Shared/bin/ollama-linux and runtime files (including llama-server) at Shared/lib/ollama/
[7/7] | Imports each selected model into the Ollama engine using ollama-linux create, then shuts down the temporary server
The HOME environment variable is overridden to Shared/models/.ollama-runtime during import so that nothing is written to ~/.ollama on the host machine.
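The HOME override can be pictured as a per-command environment change. The sketch below is an assumption about the general technique, not the installer's code: run_with_home is a hypothetical wrapper, and the model name in the usage comment is a placeholder.

```shell
# Hypothetical wrapper: run one command with HOME pointed at a directory on
# the drive. `env` applies the override to that command only, so the caller's
# own HOME is unchanged afterwards.
run_with_home() {
    portable_home=$1
    shift
    mkdir -p "$portable_home" || return 1
    env HOME="$portable_home" "$@"
}

# Usage sketch (placeholder model name; `create NAME -f FILE` is the standard
# Ollama CLI form, and it needs a running Ollama server):
#   run_with_home "$PWD/Shared/models/.ollama-runtime" \
#       "$PWD/Shared/bin/ollama-linux" create my-model \
#       -f "$PWD/Shared/models/Modelfile-my-model"
```

Scoping the override to a single command means anything that command writes under its home directory lands in the given folder, while the rest of the shell session keeps its normal HOME.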

5. Installation complete

When all steps succeed, the installer prints "SETUP COMPLETE! YOUR PORTABLE AI IS READY!" and you are ready to launch.
If zstd is not installed on your system, the tar --use-compress-program=zstd extraction step may fail. Install it with:
sudo apt install zstd
Then re-run bash Linux/install.sh — it will skip already-downloaded models and retry only the engine extraction.
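If you want to see the failure mode in isolation, the extraction can be guarded with an explicit zstd check. This is a sketch under assumptions: extract_zst is a hypothetical helper, and the installer's real extraction logic may differ.

```shell
# Hypothetical helper: extract a .tar.zst archive into a destination directory,
# failing early with a hint when zstd is not on PATH.
extract_zst() {
    archive=$1
    dest=$2
    if ! command -v zstd >/dev/null 2>&1; then
        echo "zstd not found; install it with: sudo apt install zstd" >&2
        return 1
    fi
    mkdir -p "$dest" && tar --use-compress-program=zstd -xf "$archive" -C "$dest"
}

# Usage sketch (the archive location is an assumption):
#   extract_zst ollama-linux-amd64.tar.zst Shared
```

Checking for the tool first turns an opaque tar error into a message that names the missing package.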

Launching

bash Linux/start.sh
The script exports all required environment variables and starts Shared/bin/ollama-linux serve in the background, with HOME overridden to keep data on the drive. It then waits for the engine to become ready and launches the Python chat server. The terminal displays a status banner:
===================================================
  AI ENGINE IS RUNNING
  Chat UI will open automatically.
  Press Ctrl+C to shut down.
===================================================
Your default browser opens at http://localhost:3333. The chat server also detects and prints your machine’s LAN IP address so other devices on the same network can connect. Press Ctrl+C to stop the chat server and shut down the Ollama engine.
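The "waits for the engine to become ready" step is a standard readiness poll. The sketch below shows the general pattern only: wait_until is a hypothetical helper, and probing /api/version on 127.0.0.1:11434 is an assumption about how readiness could be checked, not a description of start.sh.

```shell
# Hypothetical helper: retry a probe command once per second until it
# succeeds or the given number of attempts is used up.
wait_until() {
    remaining=$1
    shift
    while ! "$@" >/dev/null 2>&1; do
        remaining=$((remaining - 1))
        [ "$remaining" -le 0 ] && return 1
        sleep 1
    done
    return 0
}

# Usage sketch: give the engine up to 30 attempts to answer on its API port.
#   wait_until 30 curl -fsS http://127.0.0.1:11434/api/version \
#       || echo "engine did not start" >&2
```

Taking the probe as an argument keeps the loop independent of curl, so the same helper can poll the chat server or any other check.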

CUDA GPU Acceleration

If NVIDIA drivers are installed on the host machine, Ollama automatically detects and uses CUDA for GPU-accelerated inference. No additional configuration is required — the detection is built into the ollama-linux binary. Verify that your GPU is visible with:
nvidia-smi
If nvidia-smi shows your GPU, Ollama will use it automatically the next time you run bash Linux/start.sh. Expect significantly higher tokens-per-second compared to CPU-only mode, especially for the larger 9B and 12B models.
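The manual check can also be scripted. The following sketch only mirrors the nvidia-smi test described above; detect_accel is a hypothetical helper and does not reproduce Ollama's own detection logic.

```shell
# Hypothetical check: print "cuda" when nvidia-smi exists and can list GPUs
# (`nvidia-smi -L`), otherwise print "cpu".
detect_accel() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
        echo "cuda"
    else
        echo "cpu"
    fi
}

# Usage sketch:
#   echo "Expected inference mode: $(detect_accel)"
```

A result of cpu here means the driver tools are missing or cannot see a GPU, which is worth resolving before expecting accelerated inference.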

Environment Variables

start.sh exports the following variables before starting the engine. All paths resolve inside the Shared/ folder, keeping the host system’s home directory completely clean:
Variable | Value | Purpose
OLLAMA_MODELS | Shared/models/ollama_data | Keeps model data on the USB drive
OLLAMA_HOME | Shared/.ollama-runtime | Redirects Ollama’s runtime directory away from ~/.ollama
OLLAMA_TMPDIR | Shared/.ollama-runtime/tmp | Redirects temporary files to the USB drive
OLLAMA_ORIGINS | * | Enables LAN access from phones and tablets on the same network
OLLAMA_HOST | 127.0.0.1:11434 | Binds Ollama to localhost port 11434
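As an illustration of how these variables might be set, the sketch below exports the values from the table relative to a project root. It is not copied from start.sh: set_portable_env is a hypothetical function, and resolving the paths to absolute locations under the root is an assumption.

```shell
# Sketch: export the variables from the table, resolved against a project root.
# Absolute paths under the root are an assumption, not confirmed behavior.
set_portable_env() {
    root=${1%/}    # drop one trailing slash, if present
    export OLLAMA_MODELS="$root/Shared/models/ollama_data"
    export OLLAMA_HOME="$root/Shared/.ollama-runtime"
    export OLLAMA_TMPDIR="$root/Shared/.ollama-runtime/tmp"
    export OLLAMA_ORIGINS="*"
    export OLLAMA_HOST="127.0.0.1:11434"
}

# Usage sketch, run from the project root:
#   set_portable_env "$PWD"
```

Deriving every path from one root argument is what keeps the setup portable: the same values work wherever the drive happens to be mounted.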

Uninstalling

Run the Linux uninstall script to remove models or all downloaded data interactively:
bash Linux/uninstall.sh
The menu lets you remove individual models, a selection of models, or all downloaded files while keeping the base project files intact. For a manual clean-up, delete the following from Shared/:
  • Shared/bin/ollama-linux — the Ollama engine binary
  • Shared/lib/ollama/ — runtime libraries including llama-server
  • Shared/models/ — downloaded GGUF weights and Modelfiles
  • Shared/.ollama-runtime/ — runtime state directory
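The manual clean-up above can be sketched as a single function. This is a hypothetical helper, not the project's uninstall script: clean_shared and its guard are introduced here for illustration, and it permanently deletes the listed paths.

```shell
# Hypothetical manual clean-up: delete only the four listed paths under a
# given project root and leave everything else in Shared/ alone.
clean_shared() {
    root=${1%/}
    # Refuse to run unless the argument contains a Shared/ folder.
    if [ -z "$root" ] || [ ! -d "$root/Shared" ]; then
        echo "no Shared/ folder found under '$1'" >&2
        return 1
    fi
    rm -rf -- "$root/Shared/bin/ollama-linux" \
              "$root/Shared/lib/ollama" \
              "$root/Shared/models" \
              "$root/Shared/.ollama-runtime"
}

# Usage sketch, run from the project root:
#   clean_shared "$PWD"
```

The guard exists because rm -rf on a mistyped path is unforgiving; the function does nothing unless the target looks like the project folder.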