

This guide takes you from a fresh clone or download of USB-Uncensored-LLM to a fully running chat UI in three steps. The process is the same whether you’re working from a physical USB drive, an external SSD, or a local folder on your primary machine — run the installer for your OS, pick a model, and launch.
Before you begin, check that your drive and host machine meet the minimum hardware thresholds. See System Requirements for storage, RAM, and USB speed guidance.
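As a rough pre-flight check for the storage side of those requirements, the sketch below reports free space on the target drive using only the Python standard library. The function names are illustrative and not part of the project, and the example figure (about 1.6 GB for the smallest catalog model plus about 50 MB for the engine) is taken from the sizes quoted later in this guide, not from the System Requirements page.

```python
import shutil


def free_gb(path: str) -> float:
    """Return free space on the volume that holds `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path: str, required_gb: float) -> bool:
    """True if the volume holding `path` has at least `required_gb` free."""
    return free_gb(path) >= required_gb


# Hypothetical usage, run from the root of the USB-Uncensored-LLM folder:
#   has_room(".", 1.6 + 0.05)   # smallest model (~1.6 GB) + engine (~50 MB)
```

Treat the result as an estimate: the catalog sizes are approximate, and the drive also needs room for chat history.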
Step 1: Initialize the Engine

The install script detects your operating system and downloads the correct Ollama engine binary (~50 MB) into Shared/bin/. This is the only step that requires an internet connection.
Double-click Windows/install.bat in File Explorer.
If the script closes instantly, right-click the file and select Run as Administrator instead. This resolves a known conflict with Windows App Execution Aliases.
Windows\install.bat
The script launches an interactive PowerShell menu that walks you through engine setup and model selection.
Initialization downloads only the engine binary. Your model weights are not downloaded in this step; that happens in Step 2. You only need to run the installer once per OS you plan to use, because the engine binary persists on the drive for future sessions.
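To confirm that initialization left something in Shared/bin/, a minimal check might look like the sketch below. It assumes only the Shared/bin/ location named above; the binary's file name is not stated in this guide, so the check just looks for any file in that folder. The function name is hypothetical.

```python
from pathlib import Path


def engine_initialized(drive_root: str) -> bool:
    """Heuristic: True if Shared/bin/ exists under `drive_root` and holds at least one file.

    Assumption: the installer places the engine somewhere under Shared/bin/.
    The exact file name is unknown, so any file counts.
    """
    bin_dir = Path(drive_root) / "Shared" / "bin"
    return bin_dir.is_dir() and any(p.is_file() for p in bin_dir.rglob("*"))
```

Because the folder may hold binaries for more than one OS, a True result does not prove that the binary for the current machine is present. It only shows that the installer has run at least once.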
Step 2: Choose and Download a Model

After the engine is initialized, the installer presents an interactive numbered catalog of curated uncensored models. Enter the number of the model you want to download; it will be saved into Shared/models/ and is immediately available to all OS launchers on the drive.
[1] Gemma 2 2B Abliterated (~1.6 GB) [UNCENSORED] - RECOMMENDED FOR ALL - BLAZING FAST
[2] Gemma 4 E4B Ultra Uncensored Heretic (~5.34 GB) [UNCENSORED] - HERETIC
[3] Qwen 3.5 9B Uncensored Aggressive (~5.2 GB) [UNCENSORED] - AGGRESSIVE
[4] NemoMix Unleashed 12B (~7.0 GB) [UNCENSORED] - HEAVYWEIGHT
[5] Dolphin 2.9 Llama 3 8B (~4.9 GB) [UNCENSORED]
[6] Phi-3.5 Mini 3.8B (~2.2 GB) [STANDARD] - LIGHTWEIGHT

[C] CUSTOM - Enter your own HuggingFace GGUF URL
Recommended for first-time users: Choose [1] Gemma 2 2B Abliterated (~1.6 GB). It is the smallest model in the catalog, downloads quickly, runs fast on any hardware with 8 GB RAM, and delivers strong performance for its size. You can always download additional models later by re-running the install script.
The [C] CUSTOM option lets you paste any direct .gguf download URL from HuggingFace. The installer downloads it into Shared/models/ alongside the curated options.
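Before pasting a URL into the [C] CUSTOM prompt, it can help to confirm that it really points at a .gguf file. The sketch below is an independent sanity check, not the installer's own validation (which this guide does not describe); the function name and the example URL are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def gguf_filename(url: str) -> str:
    """Return the file name a direct .gguf URL points to, or raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("expected an http(s) URL")
    # The query string (e.g. ?download=true) is not part of parsed.path.
    name = PurePosixPath(unquote(parsed.path)).name
    if not name.lower().endswith(".gguf"):
        raise ValueError("URL does not point to a .gguf file")
    return name


# Hypothetical example URL (placeholder repository, not a real model):
#   gguf_filename("https://huggingface.co/example-user/example-model/resolve/main/example.Q4_K_M.gguf")
```

On HuggingFace, direct file links normally contain /resolve/ in the path, while /blob/ links open a web page instead of the file, so a URL that passes this check may still need that adjustment.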
Step 3: Launch the Chat UI

Once the engine is initialized and at least one model is downloaded, run the start script for your OS. The Ollama engine starts silently in the background, and your default web browser opens automatically at http://localhost:3333.
Double-click Windows/start-fast-chat.bat in File Explorer, or run it from a terminal:
Windows\start-fast-chat.bat
The script starts the Ollama engine, waits for it to come online, then launches the Python chat server. Your browser opens to http://localhost:3333. Keep the terminal window open — closing it shuts down the engine and chat server.
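The "waits for it to come online" step can be pictured as a simple readiness poll. The sketch below is one way to do it in Python and is not taken from the project's start script: it retries a TCP connection until the port accepts it or a timeout expires. Port 3333 comes from this guide; 11434 is Ollama's usual default port, and whether this project keeps that default is an assumption.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until `host:port` accepts a TCP connection; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not up yet; retry after a short pause
    return False


# Hypothetical usage:
#   wait_for_port("127.0.0.1", 11434)  # engine (assumed default Ollama port)
#   wait_for_port("127.0.0.1", 3333)   # chat UI port from this guide
```

A port check only shows that a process is listening. It does not confirm that a model has finished loading.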

What Happens Next

Chat history is saved automatically. Every conversation is written to Shared/chat_data/ in real time. Because this folder lives on your portable drive, your history is available regardless of which machine you plug into: your conversation from a Windows laptop last Tuesday is waiting for you when you plug into a Linux desktop today.

Moving to a new machine is a one-step process. When you plug your drive into a different computer for the first time, run that machine’s install script once to download the OS-specific engine binary into Shared/bin/. Your model weights are already on the drive, so no re-download is required. After initialization, use the start script as normal for every subsequent session on that machine.

LAN access from your phone. When the start script runs, the terminal displays a local network IP address (e.g., http://192.168.1.15:3333). Any device on the same Wi-Fi network can open that address in a browser to use the chat UI, with no app installation needed on the mobile device.
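If the terminal output has scrolled away, the LAN address can be worked out independently. The sketch below uses a common standard-library technique and is not necessarily how the start script finds the address: connecting a UDP socket sends no packets, but it makes the OS choose the outgoing interface, whose address is then read back. The function names are illustrative; port 3333 comes from this guide.

```python
import socket


def lan_ip() -> str:
    """Best-effort guess at this machine's LAN IPv4 address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket only selects a route; nothing is sent.
        # 192.0.2.1 is a reserved documentation address (TEST-NET-1).
        s.connect(("192.0.2.1", 80))
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route; fall back to loopback
    finally:
        s.close()


def lan_url(port: int = 3333) -> str:
    """Address to open from another device on the same network."""
    return f"http://{lan_ip()}:{port}"
```

On machines with several network interfaces or an active VPN, the address returned may not be the one your phone can reach, so prefer the address the start script prints when it is available.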
Running USB-Uncensored-LLM from an internal SSD rather than a USB flash drive results in significantly faster model loading and token generation. If you are using this as a permanent local AI setup rather than a portable one, clone the repository to a folder on your C:\ or D:\ drive and run the installer from there — model loading becomes near-instant.
