Get Started with USB-Uncensored-LLM in Three Steps

This guide takes you from a fresh clone or download of USB-Uncensored-LLM to a fully running chat UI in three steps. The process is the same whether you’re working from a physical USB drive, an external SSD, or a local folder on your primary machine — run the installer for your OS, pick a model, and launch.

Before you begin, check that your drive and host machine meet the minimum hardware thresholds. See System Requirements for storage, RAM, and USB speed guidance.

Initialize the Engine

The install script detects your operating system and downloads the correct Ollama engine binary (~50 MB) into Shared/bin/. This is the only step that requires an internet connection.
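The sketch below is a rough illustration of the OS-detection step. It maps Python's `platform.system()` result (which reports macOS as "Darwin") to the launcher folder used in this guide and to an engine path under Shared/bin/. The per-OS subfolders (Shared/bin/windows/ and so on) and the helper names are hypothetical. The Linux and macOS Ollama executables are both named `ollama`, so a drive shared between them needs some way to keep them apart, but the project's actual layout may differ.

```python
import platform
from pathlib import Path
from typing import Optional

# OS name -> (launcher folder from this guide, ASSUMED bin/ subfolder, executable name)
LAYOUT = {
    "Windows": ("Windows", "windows", "ollama.exe"),
    "Darwin": ("Mac", "mac", "ollama"),
    "Linux": ("Linux", "linux", "ollama"),
}


def _lookup(system: Optional[str]) -> tuple:
    system = system or platform.system()  # "Windows", "Darwin" (macOS) or "Linux"
    if system not in LAYOUT:
        raise RuntimeError(f"unsupported operating system: {system!r}")
    return LAYOUT[system]


def launcher_folder(system: Optional[str] = None) -> str:
    """Folder holding this OS's install/start scripts (Windows/, Mac/ or Linux/)."""
    return _lookup(system)[0]


def engine_path(root: Path, system: Optional[str] = None) -> Path:
    """Hypothetical location of this OS's engine binary under Shared/bin/."""
    _, subdir, exe = _lookup(system)
    return root / "Shared" / "bin" / subdir / exe


if __name__ == "__main__":
    root = Path(".")  # assumes the script runs from the project root
    print(launcher_folder(), engine_path(root))
```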

Windows

Double-click Windows/install.bat in File Explorer, or run it from a terminal in the project root. If the script closes instantly, right-click the file and select Run as administrator instead. This resolves a known conflict with Windows App Execution Aliases.

Windows\install.bat

The script launches an interactive PowerShell menu that walks you through engine setup and model selection.

macOS

Open Terminal, drag the Mac/install.command file from Finder into the Terminal window, and press Enter. Alternatively, run it from the project root:

bash Mac/install.command

macOS may prompt you to allow the script to run. If you see a security warning, go to System Settings → Privacy & Security and click Allow Anyway.

Linux

Open a terminal, navigate to the project root, and run:

bash Linux/install.sh

The script checks for dependencies, downloads the Linux Ollama binary, and places it in Shared/bin/.

Initialization downloads the small (~50 MB) Ollama engine binary specific to your OS into the Shared/bin/ folder. Your model weights are not downloaded in this step — that happens in Step 2. You only need to run the installer once per OS you plan to use; the engine binary persists on the drive for future sessions.
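The "run once, then it persists" behavior amounts to an idempotent check: download only when the engine binary is missing. The sketch below assumes one way such a check could work and is not the project's installer. `ensure_engine` is a hypothetical name, and `fetch` stands in for whatever actually downloads the binary.

The sketch writes to a temporary .part file and renames it only after the download finishes. That way, unplugging the drive mid-download does not leave a truncated binary that a later run would mistake for a complete one.

```python
import os
from pathlib import Path
from typing import Callable


def ensure_engine(binary: Path, fetch: Callable[[Path], None]) -> bool:
    """Download the engine only if it is missing. Returns True if a download happened.

    `fetch(dest)` is a HYPOTHETICAL callback that writes the complete binary to `dest`.
    """
    if binary.is_file():
        return False  # initialized on an earlier run: nothing to download
    binary.parent.mkdir(parents=True, exist_ok=True)
    partial = binary.with_name(binary.name + ".part")
    fetch(partial)  # if this raises, the final path is never created
    if os.name != "nt":
        try:
            partial.chmod(0o755)  # make the binary executable on macOS/Linux
        except OSError:
            pass  # some FAT/exFAT mounts reject permission changes
    os.replace(partial, binary)  # atomic rename within the same filesystem
    return True
```

Because the final file name only appears after `os.replace`, the existence check on the next run is a reliable signal that initialization completed.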

Choose and Download a Model

After the engine is initialized, the installer presents an interactive numbered catalog of curated uncensored models. Enter the number of the model you want to download; it will be saved into Shared/models/ and is immediately available to all OS launchers on the drive.

[1] Gemma 2 2B Abliterated (~1.6 GB) [UNCENSORED] - RECOMMENDED FOR ALL - BLAZING FAST
[2] Gemma 4 E4B Ultra Uncensored Heretic (~5.34 GB) [UNCENSORED] - HERETIC
[3] Qwen 3.5 9B Uncensored Aggressive (~5.2 GB) [UNCENSORED] - AGGRESSIVE
[4] NemoMix Unleashed 12B (~7.0 GB) [UNCENSORED] - HEAVYWEIGHT
[5] Dolphin 2.9 Llama 3 8B (~4.9 GB) [UNCENSORED]
[6] Phi-3.5 Mini 3.8B (~2.2 GB) [STANDARD] - LIGHTWEIGHT

[C] CUSTOM - Enter your own HuggingFace GGUF URL
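The numbered menu can be modeled as a small table plus a parser for the user's answer. This is only a sketch. The entries and sizes are copied from the catalog above, but `Model`, `parse_choice`, and `fits_on_drive` are hypothetical names, and the 1 GB headroom default is an arbitrary assumption. The free-space check uses `shutil.disk_usage`, which matters on a small flash drive where the ~7.0 GB NemoMix download might not fit.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Union

CUSTOM = "custom"  # sentinel returned for the [C] option


@dataclass(frozen=True)
class Model:
    name: str
    size_gb: float  # approximate download size from the catalog


CATALOG = {
    "1": Model("Gemma 2 2B Abliterated", 1.6),
    "2": Model("Gemma 4 E4B Ultra Uncensored Heretic", 5.34),
    "3": Model("Qwen 3.5 9B Uncensored Aggressive", 5.2),
    "4": Model("NemoMix Unleashed 12B", 7.0),
    "5": Model("Dolphin 2.9 Llama 3 8B", 4.9),
    "6": Model("Phi-3.5 Mini 3.8B", 2.2),
}


def parse_choice(answer: str) -> Union[Model, str]:
    """Map the user's answer to a catalog entry, or CUSTOM for [C]."""
    key = answer.strip().upper()
    if key == "C":
        return CUSTOM
    if key in CATALOG:
        return CATALOG[key]
    raise ValueError(f"not a menu option: {answer!r}")


def fits_on_drive(size_gb: float, models_dir: Path, headroom_gb: float = 1.0) -> bool:
    """True if the filesystem holding models_dir has room for size_gb plus headroom."""
    free_bytes = shutil.disk_usage(models_dir).free
    return free_bytes >= (size_gb + headroom_gb) * 1e9  # decimal GB, as in the catalog
```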

Recommended for first-time users: choose [1] Gemma 2 2B Abliterated (~1.6 GB). It is the smallest model in the catalog, downloads quickly, runs well on machines with 8 GB of RAM, and performs strongly for its size. You can download additional models later by re-running the install script.

The [C] CUSTOM option lets you paste any direct .gguf download URL from HuggingFace. The installer downloads it into Shared/models/ alongside the curated options.
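Before handing a custom URL to a downloader, it helps to check that it points at the file itself. On HuggingFace, links containing /blob/ open the web page that previews a file, while links containing /resolve/ return the file's bytes. Pasting a /blob/ link would therefore save an HTML page rather than a model.

The validator below is a hypothetical sketch of that check, not the installer's actual logic. It deliberately accepts only https://huggingface.co links, and the repository names in the example are made up.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def gguf_filename(url: str) -> str:
    """Validate a direct HuggingFace .gguf link and return the file name to save it as."""
    parts = urlparse(url.strip())
    if parts.scheme != "https" or parts.netloc != "huggingface.co":
        raise ValueError("expected an https://huggingface.co/... link")
    path = unquote(parts.path)  # any query string (e.g. ?download=true) is already split off
    if "/blob/" in path:
        raise ValueError("this is a preview page; replace /blob/ with /resolve/ in the URL")
    if "/resolve/" not in path:
        raise ValueError("expected a direct download link containing /resolve/")
    name = PurePosixPath(path).name
    if not name.lower().endswith(".gguf"):
        raise ValueError("the link does not point to a .gguf file")
    return name


if __name__ == "__main__":
    # Hypothetical repository and file name, for illustration only.
    print(gguf_filename(
        "https://huggingface.co/example-user/example-model-GGUF"
        "/resolve/main/example-model.Q4_K_M.gguf?download=true"
    ))
```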

Launch the Chat UI

Once the engine is initialized and at least one model is downloaded, run the start script for your OS. The Ollama engine starts silently in the background, and your default web browser opens automatically at http://localhost:3333.
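Before launching the chat server and the browser, a start script has to wait until the engine is actually accepting connections. One common way to do that, sketched below in Python, is to poll the TCP port until a connection succeeds or a deadline passes.

The port numbers are assumptions. 11434 is Ollama's default listening port, which can be changed through the OLLAMA_HOST environment variable, and this project may configure it differently. 3333 is the chat UI port given above. `wait_for_port` is a hypothetical helper name.

```python
import socket
import time
import webbrowser


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Poll host:port until a TCP connection succeeds. Returns False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening: the service is up
        except OSError:
            time.sleep(interval)  # refused or timed out: try again shortly
    return False


if __name__ == "__main__":
    if not wait_for_port("127.0.0.1", 11434):  # ASSUMED engine port (Ollama's default)
        raise SystemExit("engine did not come online in time")
    # ...a start script would launch the chat server at this point...
    if wait_for_port("127.0.0.1", 3333):  # chat UI port from this guide
        webbrowser.open("http://localhost:3333")
```

Polling with a deadline keeps the script from hanging forever if the engine fails to start, and it gives a clear error instead of a browser tab that cannot connect.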

Windows

Double-click Windows/start-fast-chat.bat in File Explorer, or run it from a terminal:

Windows\start-fast-chat.bat

The script starts the Ollama engine, waits for it to come online, then launches the Python chat server. Your browser opens to http://localhost:3333. Keep the terminal window open, because closing it shuts down the engine and chat server.

macOS

Open Terminal, drag Mac/start.command into the Terminal window, and press Enter. Alternatively, run it from the project root:

bash Mac/start.command

The browser opens automatically once the engine is ready. Keep the Terminal window open for the duration of your session.

Linux

Run the start script from the project root:

bash Linux/start.sh

The engine initializes and the chat UI opens in your default browser at http://localhost:3333.
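Keeping the terminal open matters because the engine and chat server run as child processes of the start script. The Python sketch below shows one way to tie their lifetimes together. It starts both processes, and when the block exits (including via Ctrl+C) it terminates them, escalating to kill if they do not stop within a grace period.

This is an illustration, not the project's start script, and for brevity it starts both processes at once instead of waiting for the engine first. `ollama serve` is Ollama's standard command for running its server. The engine path and the chat_server.py file name are hypothetical. What happens when the window itself is closed, rather than the script exiting, depends on the OS.

```python
import contextlib
import subprocess
import sys
from typing import Iterator, List, Sequence


@contextlib.contextmanager
def running(*commands: Sequence[str], grace: float = 5.0) -> Iterator[List[subprocess.Popen]]:
    """Start each command as a child process; stop all of them when the block exits."""
    procs: List[subprocess.Popen] = []
    try:
        for cmd in commands:
            procs.append(subprocess.Popen(list(cmd)))
        yield procs
    finally:
        # Stop in reverse start order: the chat server first, then the engine it uses.
        for proc in reversed(procs):
            if proc.poll() is None:
                proc.terminate()
        for proc in reversed(procs):
            try:
                proc.wait(timeout=grace)
            except subprocess.TimeoutExpired:
                proc.kill()  # did not exit politely: force it
                proc.wait()


if __name__ == "__main__":
    # HYPOTHETICAL paths: adjust to wherever the engine binary and chat server live.
    engine = ["Shared/bin/ollama", "serve"]
    server = [sys.executable, "chat_server.py"]
    with running(engine, server) as procs:
        try:
            procs[-1].wait()  # block until the chat server exits
        except KeyboardInterrupt:
            pass  # Ctrl+C: leave the block so both children are stopped
```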

What Happens Next

Chat history is saved automatically. Every conversation is written to Shared/chat_data/ in real time. Because this folder lives on your portable drive, your history is available regardless of which machine you plug into: your conversation from a Windows laptop last Tuesday is waiting for you when you plug into a Linux desktop today.

Moving to a new machine is a one-step process. When you plug your drive into a different computer for the first time, run that machine's install script once to download the OS-specific engine binary into Shared/bin/. Your model weights are already on the drive, so no re-download is required. After initialization, use the start script as normal for every subsequent session on that machine.

LAN access from your phone. When the start script runs, the terminal displays a local network IP address (e.g., http://192.168.1.15:3333). Any device on the same WiFi network can open that address in a browser to use the chat UI, with no app installation needed on the mobile device.
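The address the start script prints is the machine's IP on the local network. A common Python idiom for finding it, sketched below, is to "connect" a UDP socket toward an outside address and read back which local interface the OS picked. A UDP connect sends no packets. The probe address 192.0.2.1 (from a range reserved for documentation) and the function names are assumptions, not the project's code.

For other devices to actually reach the UI, two more conditions apply. The server must listen on all interfaces (for example 0.0.0.0) or on this LAN address, rather than only on 127.0.0.1. The host firewall must also allow incoming connections on port 3333.

```python
import socket


def lan_ip(probe: str = "192.0.2.1") -> str:
    """Best-effort guess at this machine's IPv4 address on the local network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends nothing; it only makes the OS choose
        # the outgoing interface it would use to reach `probe`.
        sock.connect((probe, 9))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route (e.g. offline): fall back to loopback
    finally:
        sock.close()


def lan_url(port: int = 3333) -> str:
    """URL another device on the same network could open (chat UI port from this guide)."""
    return f"http://{lan_ip()}:{port}"


if __name__ == "__main__":
    print(f"Open {lan_url()} on your phone (same WiFi network)")
```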

Running USB-Uncensored-LLM from an internal SSD rather than a USB flash drive results in significantly faster model loading and token generation. If you are using this as a permanent local AI setup rather than a portable one, clone the repository to a folder on your C:\ or D:\ drive and run the installer from there — model loading becomes near-instant.
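To see why the drive matters for loading, divide the model file size by the drive's sustained read speed. The speeds below are illustrative assumptions, not measurements:

- roughly 30 MB/s for a USB 2.0 stick
- 100 MB/s for a modest USB 3 flash drive
- 3,000 MB/s for an NVMe SSD

The result models only the minimum time to read the weights off the drive.

```python
def read_seconds(size_gb: float, read_mb_per_s: float) -> float:
    """Minimum time to read a file of size_gb (decimal GB) at a sustained read speed."""
    return size_gb * 1000 / read_mb_per_s


# ASSUMED sustained read speeds in MB/s; real drives vary widely.
DRIVES = {"USB 2.0 stick": 30, "USB 3 flash drive": 100, "NVMe SSD": 3000}

if __name__ == "__main__":
    for model, size_gb in [("Gemma 2 2B", 1.6), ("NemoMix 12B", 7.0)]:
        for drive, speed in DRIVES.items():
            print(f"{model} on {drive}: ~{read_seconds(size_gb, speed):.1f} s")
```

For example, the 1.6 GB Gemma file takes about 16 s to read at 100 MB/s, but about 0.5 s at 3,000 MB/s.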
