This guide walks Windows 10 and Windows 11 users through the full setup process for USB-Uncensored-LLM, from running the installer and selecting your AI models to launching the browser-based chat UI and configuring environment variables for LAN access. Everything runs locally from your USB drive or a folder on your hard disk; no internet connection is required after the initial setup.
Prerequisites
- Windows 10 or Windows 11 (x86-64)
- 8 GB RAM minimum (16 GB recommended for 9B/12B models)
- USB 3.0+ flash drive or SSD with at least 8 GB free space (16 GB strongly recommended), or a local folder on your C:\ or D:\ drive
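The free-space prerequisite can be checked before running the installer. The following is a minimal sketch using only the Python standard library; the helper names and the "." placeholder path are illustrative and are not part of the project. It reports binary gibibytes, which are slightly larger than the decimal GB figures quoted above.

```python
import shutil

GIB = 1024 ** 3


def free_space_gib(path: str) -> float:
    """Return free space, in GiB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def meets_minimum(free_gib: float, minimum_gib: float = 8.0) -> bool:
    """True when the free space satisfies the stated minimum."""
    return free_gib >= minimum_gib


if __name__ == "__main__":
    # "." is a placeholder; point it at the USB drive or target folder.
    free = free_space_gib(".")
    status = "OK" if meets_minimum(free) else "below the 8 GB minimum"
    print(f"{free:.1f} GiB free: {status}")
```

Passing `minimum_gib=16.0` applies the recommended figure instead of the minimum.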
Installation
Navigate to the Windows folder
Open the USB-Uncensored-LLM root directory (on your USB drive or wherever you cloned it) and open the Windows/ subfolder.
Run install.bat
Double-click install.bat to launch the setup wizard. The batch file sets the window title, displays a summary of what will be downloaded, then hands control to the PowerShell-based core installer. If the terminal window closes instantly, right-click the file and choose Run as Administrator, or open a Command Prompt in the Windows/ folder and run install.bat from there.
Choose your AI model(s)
The PowerShell installer displays an interactive model catalog. Select one or more models by entering the corresponding number(s). Press Enter with no input to default to model [1], Gemma 2 2B Abliterated, the recommended starting point.
The 7 install steps explained:
| Step | What happens |
|---|---|
| [1/7] | Model selection menu: you pick which models to download |
| [2/7] | Creates Shared/models/, Shared/bin/, and Shared/vendor/ if they don’t exist |
| [3/7] | Downloads optional offline UI assets (markdown, PDF renderer, fonts) into Shared/vendor/ |
| [4/7] | Downloads selected GGUF model weight files from HuggingFace into Shared/models/ |
| [5/7] | Creates a Modelfile-<name> configuration file for each model in Shared/models/ |
| [6/7] | Downloads ollama-windows-amd64.zip from GitHub Releases and extracts ollama-windows.exe and llama-server.exe to Shared/bin/ |
| [7/7] | Imports each model into the Ollama engine using ollama-windows.exe create and then shuts down the temporary server |
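Steps [5/7] and [7/7] can be pictured with a short sketch. It assumes the generated Modelfile uses Ollama's standard FROM and SYSTEM instructions and that the import uses the usual `create <name> -f <Modelfile>` form; the exact contents the installer writes are not shown in this guide, so the helper names and file layout here are illustrative only.

```python
from pathlib import Path


def modelfile_text(gguf_name: str, system_prompt: str) -> str:
    """Build Modelfile contents using Ollama's FROM and SYSTEM instructions."""
    # Triple quotes let the system prompt span multiple lines.
    return f'FROM ./{gguf_name}\nSYSTEM """{system_prompt}"""\n'


def create_command(engine: Path, model_name: str, modelfile: Path) -> list[str]:
    """Argument list for `<engine> create <name> -f <Modelfile>`."""
    return [str(engine), "create", model_name, "-f", str(modelfile)]


if __name__ == "__main__":
    # Assumed layout, taken from the step table; the model name is an example.
    models_dir = Path("Shared/models")
    engine = Path("Shared/bin/ollama-windows.exe")
    name = "mymodel-local"
    print(modelfile_text("model-Q4_K_M.gguf", "You are a helpful AI assistant."))
    print(create_command(engine, name, models_dir / f"Modelfile-{name}"))
```

The argument list is returned rather than executed so that it can be inspected first; it could then be passed to `subprocess.run`.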
Launching the Chat UI
Double-click Windows/start-fast-chat.bat to start the Ollama engine and open the chat interface. The script auto-detects a portable Python installation in Shared/python/ and falls back to system Python if needed. Once the engine is running, the chat interface is available at http://localhost:3333. Keep the terminal window open; closing it shuts down the AI engine.
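The interpreter lookup described above (portable Python first, system Python second) might look like the sketch below. The file name python.exe inside Shared/python/ is an assumption, as is the function name; the real launcher performs this check in batch script, not Python.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_python(shared_dir: Path) -> Optional[str]:
    """Prefer a portable interpreter under Shared/python/, else use PATH."""
    # Assumption: the portable interpreter is named python.exe.
    portable = shared_dir / "python" / "python.exe"
    if portable.is_file():
        return str(portable)
    # Fall back to whatever "python" resolves to on PATH (None if absent).
    return shutil.which("python")


if __name__ == "__main__":
    print(find_python(Path("Shared")) or "No Python interpreter found")
```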
Multiple Model Selection
At the model selection prompt, you can select more than one model by entering numbers separated by commas, or select every model in the catalog by entering all.
Selecting 3 or more models (or all) triggers a storage warning. The installer calculates the estimated download size and checks your available drive space. You will be asked to confirm before proceeding. Make sure you have at least 16 GB free when installing multiple large models.
Custom Models
To install any .gguf model from HuggingFace, enter C (or include c in a comma-separated list, e.g. 1,3,c). The installer then prompts for three pieces of information:
- GGUF URL: paste the direct download link, e.g. https://huggingface.co/user/model-GGUF/resolve/main/model-Q4_K_M.gguf
- Short local name: a slug used to refer to the model inside Ollama, e.g. mymodel-local (the -local suffix is appended automatically if omitted)
- System prompt: press Enter to use the default, "You are a helpful AI assistant."
The model file is then downloaded to Shared/models/, a Modelfile-<name> is created, and the model is imported into the Ollama engine alongside any preset models you selected.
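The selection rules described in this guide (empty input defaults to model [1], comma-separated numbers, all, and c for a custom model) can be summarised as a small parser. This is a sketch of the described behaviour, not the installer's PowerShell code; `catalog_size` is a hypothetical parameter because the number of catalog entries is not stated here, and the handling of invalid entries is a guess.

```python
def parse_selection(raw: str, catalog_size: int) -> tuple[list[int], bool]:
    """Return (preset model numbers, custom-model flag) for one line of input."""
    text = raw.strip().lower()
    if not text:
        return [1], False  # Enter alone defaults to model [1]
    if text == "all":
        return list(range(1, catalog_size + 1)), False
    numbers: list[int] = []
    wants_custom = False
    for token in (part.strip() for part in text.split(",")):
        if token == "c":
            wants_custom = True
        elif token.isdecimal() and 1 <= int(token) <= catalog_size:
            if int(token) not in numbers:  # ignore repeated entries
                numbers.append(int(token))
        elif token:
            # Assumption: anything else is rejected rather than skipped.
            raise ValueError(f"unrecognised entry: {token!r}")
    return numbers, wants_custom


def local_name(slug: str) -> str:
    """Append the -local suffix when the user left it off."""
    return slug if slug.endswith("-local") else f"{slug}-local"
```

For example, `parse_selection("1,3,c", 5)` yields the preset numbers 1 and 3 plus a flag indicating that the custom-model prompts should follow.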
Environment Variables
start-fast-chat.bat sets the following environment variables before starting the Ollama engine:
| Variable | Value | Purpose |
|---|---|---|
| OLLAMA_MODELS | %~dp0..\Shared\models\ollama_data | Keeps all model data on the USB drive |
| OLLAMA_ORIGINS | * | Allows connections from any origin, enabling LAN access from phones/tablets |
| OLLAMA_HOST | 127.0.0.1:11434 | Binds Ollama to localhost port 11434 |
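For readers who want to reproduce the same environment from a script, the sketch below builds an equivalent mapping in Python. In a batch file, %~dp0 expands to the folder containing the script, so the models path is resolved relative to the Windows/ folder; the function name and the example paths are illustrative.

```python
import os
from pathlib import Path


def ollama_env(windows_dir: Path, base: dict[str, str]) -> dict[str, str]:
    """Copy `base` and add the three variables from the table above."""
    env = dict(base)
    # %~dp0 is the folder containing the .bat file, i.e. Windows/.
    env["OLLAMA_MODELS"] = str(windows_dir.parent / "Shared" / "models" / "ollama_data")
    env["OLLAMA_ORIGINS"] = "*"
    env["OLLAMA_HOST"] = "127.0.0.1:11434"
    return env


if __name__ == "__main__":
    env = ollama_env(Path("Windows").resolve(), dict(os.environ))
    for key in ("OLLAMA_MODELS", "OLLAMA_ORIGINS", "OLLAMA_HOST"):
        print(f"{key}={env[key]}")
```

The returned mapping could be passed as the `env` argument of `subprocess.Popen` when starting the engine from Python instead of the batch file.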
Uninstalling
To remove AI models and downloaded engine files, double-click Windows/uninstall.bat. It launches uninstall-core.ps1, which presents a menu to:
- Remove selected individual models
- Remove all downloaded files (engine binaries, models, runtime data) while preserving the base project files
The downloaded files live under Shared/:
- Shared/bin/: engine binaries (ollama-windows.exe, llama-server.exe)
- Shared/models/: downloaded GGUF weights and Modelfiles
- Shared/python/: portable Python environment (if downloaded)
Troubleshooting
The script closes instantly after double-clicking
This is caused by Windows App Execution Aliases interfering with the .bat file. To fix it, open a Command Prompt (Win + R → cmd), navigate to the Windows/ folder, and run install.bat manually. Alternatively, right-click the file and choose Run as Administrator.
“Ollama Engine Not Found” error when starting
This means start-fast-chat.bat was run before install.bat finished downloading the engine to Shared/bin/. Run Windows/install.bat first to download and set up the Ollama runtime, then launch the chat.
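To confirm whether the engine was actually installed, the contents of Shared/bin/ can be inspected. The sketch below checks for the two binaries named in the install steps; whether the launcher itself tests for both files is an assumption, and the function name is illustrative.

```python
from pathlib import Path

# Binary names taken from step [6/7] of the installer.
ENGINE_FILES = ("ollama-windows.exe", "llama-server.exe")


def missing_engine_files(shared_dir: Path) -> list[str]:
    """List engine binaries that are absent from Shared/bin/."""
    bin_dir = shared_dir / "bin"
    return [name for name in ENGINE_FILES if not (bin_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_engine_files(Path("Shared"))
    if missing:
        print("Missing:", ", ".join(missing), "(run Windows/install.bat first)")
    else:
        print("Engine binaries found in Shared/bin/")
```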
Slow generation speeds
The selected model is likely too large for the host machine’s RAM. Re-run install.bat and choose model [1], Gemma 2 2B Abliterated, which runs well even on older hardware.