This guide walks Windows 10 and Windows 11 users through the full setup process for USB-Uncensored-LLM — from running the installer and selecting your AI models, to launching the browser-based chat UI and configuring environment variables for LAN access. Everything runs locally from your USB drive or a folder on your hard disk; no internet connection is required after the initial setup.

Prerequisites

  • Windows 10 or Windows 11 (x86-64)
  • 8 GB RAM minimum (16 GB recommended for 9B/12B models)
  • USB 3.0+ flash drive or SSD with at least 8 GB free space (16 GB strongly recommended), or a local folder on your C:\ / D:\ drive
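The storage prerequisite can be checked before running the installer. The following is a minimal sketch, not part of the project: the function names are hypothetical, and it treats the guide's "GB" figures as GiB for simplicity.

```python
import shutil

GIB = 1024 ** 3
MINIMUM_FREE_GB = 8       # minimum from the prerequisites list
RECOMMENDED_FREE_GB = 16  # recommended from the prerequisites list


def classify_free_space(free_bytes: int) -> str:
    """Return 'insufficient', 'minimum', or 'recommended' for a free-space figure."""
    free_gb = free_bytes / GIB
    if free_gb < MINIMUM_FREE_GB:
        return "insufficient"
    if free_gb < RECOMMENDED_FREE_GB:
        return "minimum"
    return "recommended"


def check_target(path: str) -> str:
    """Classify the free space on the drive that contains `path`."""
    return classify_free_space(shutil.disk_usage(path).free)


if __name__ == "__main__":
    # "." is a placeholder; point it at the USB drive root or install folder.
    print(check_target("."))
```

Run it from the folder you plan to install into; a result of "insufficient" means even the smallest setup is unlikely to fit.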

Installation

1. Navigate to the Windows folder

Open the USB-Uncensored-LLM root directory (on your USB drive or wherever you cloned it) and open the Windows/ subfolder.
2. Run install.bat

Double-click install.bat to launch the setup wizard. If the terminal window closes instantly, right-click the file and choose Run as Administrator, or open a Command Prompt and run:
Windows\install.bat
The batch file sets the window title, displays a summary of what will be downloaded, then hands control to the PowerShell-based core installer:
powershell -ExecutionPolicy Bypass -File "%~dp0install-core.ps1"
3. Choose your AI model(s)

The PowerShell installer displays an interactive model catalog. Select one or more models by entering the corresponding number(s):
[1] Gemma 2 2B Abliterated (~1.6 GB) [UNCENSORED] - RECOMMENDED FOR ALL - BLAZING FAST
[2] Gemma 4 E4B Ultra Uncensored Heretic (~5.34 GB) [UNCENSORED] - HERETIC
[3] Qwen 3.5 9B Uncensored Aggressive (~5.2 GB) [UNCENSORED] - AGGRESSIVE
[4] NemoMix Unleashed 12B (~7.0 GB) [UNCENSORED] - HEAVYWEIGHT
[5] Dolphin 2.9 Llama 3 8B (~4.9 GB) [UNCENSORED]
[6] Phi-3.5 Mini 3.8B (~2.2 GB) [STANDARD] - LIGHTWEIGHT
[C] CUSTOM - Enter your own HuggingFace GGUF URL
Enter number(s) separated by commas (e.g. 1,3)
Press Enter with no input to default to model [1] — Gemma 2 2B Abliterated, the recommended starting point.
4. Wait for download and model import to complete

The installer runs a seven-step process in the terminal. Do not close the window during this phase. Each step is labelled [X/7] so you can track progress. When all steps succeed, the installer prints the message SETUP COMPLETE! YOUR PORTABLE AI IS READY!
The 7 install steps explained:

| Step | What happens |
| --- | --- |
| [1/7] | Model selection menu: you pick which models to download |
| [2/7] | Creates Shared/models/, Shared/bin/, and Shared/vendor/ if they don’t exist |
| [3/7] | Downloads optional offline UI assets (markdown, PDF renderer, fonts) into Shared/vendor/ |
| [4/7] | Downloads selected GGUF model weight files from HuggingFace into Shared/models/ |
| [5/7] | Creates a Modelfile-<name> configuration file for each model in Shared/models/ |
| [6/7] | Downloads ollama-windows-amd64.zip from GitHub Releases and extracts ollama-windows.exe and llama-server.exe to Shared/bin/ |
| [7/7] | Imports each model into the Ollama engine using ollama-windows.exe create and then shuts down the temporary server |
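Step [2/7] amounts to an idempotent "create if missing" operation. A minimal sketch of that behaviour follows. It is an illustration only, since the real installer does this in PowerShell, and `ensure_shared_layout` is a hypothetical name.

```python
from pathlib import Path

SHARED_SUBFOLDERS = ("models", "bin", "vendor")  # folder names from step [2/7]


def ensure_shared_layout(project_root: Path) -> list[Path]:
    """Make sure Shared/models, Shared/bin and Shared/vendor exist; return them."""
    folders = []
    for name in SHARED_SUBFOLDERS:
        folder = project_root / "Shared" / name
        # exist_ok=True makes a second run a no-op instead of an error.
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders
```

Because existing folders are left untouched, re-running the installer to add another model should not disturb files that are already in place.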

Launching the Chat UI

Double-click Windows/start-fast-chat.bat to start the Ollama engine and open the chat interface. The script auto-detects a portable Python installation in Shared/python/ and falls back to system Python if needed. The terminal shows:
===================================================
    Portable AI - Fast Web Chat Mode
===================================================

  Launches the AI engine + browser chat UI.
  All chats auto-save to the USB drive.

[OK] Using portable Python from USB drive.
Starting Ollama Engine...
Waiting for engine to initialize...
[OK] Engine is online!

===================================================
 AI ENGINE IS RUNNING
 Chat UI opening at: http://localhost:3333
 Close this window to shut down everything.
===================================================
Your default browser opens automatically at http://localhost:3333. Keep the terminal window open — closing it shuts down the AI engine.
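The "Waiting for engine to initialize..." phase can be pictured as a polling loop. The sketch below is an assumption about how such a wait might work, not a copy of start-fast-chat.bat: it issues an HTTP GET to the Ollama address used in this guide and treats a 200 response as "online". The function names, attempt count, and delay are hypothetical.

```python
import http.client
import time
import urllib.request
from typing import Callable

ENGINE_URL = "http://127.0.0.1:11434"  # address this guide lists for OLLAMA_HOST


def engine_responds(url: str = ENGINE_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to the engine address returns status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and HTTP errors all mean "not ready yet".
        return False


def wait_until_online(probe: Callable[[], bool] = engine_responds,
                      attempts: int = 30, delay: float = 1.0) -> bool:
    """Call `probe` until it succeeds or `attempts` tries have been used."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # pause between tries, but not after the last one
    return False
```

Calling `wait_until_online()` with the defaults polls for roughly half a minute; passing a custom `probe` makes the loop easy to exercise without a running engine.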

Multiple Model Selection

At the model selection prompt, you can select more than one model by entering numbers separated by commas:
1,3
To install every preset model at once, type all.
Selecting 3 or more models (or all) triggers a storage warning. The installer calculates the estimated download size and checks your available drive space. You will be asked to confirm before proceeding. Make sure you have at least 16 GB free when installing multiple large models.
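The selection rules and the size estimate can be sketched as follows. The sizes are copied from the catalog in this guide; the parsing details (case-insensitive input, rejecting unknown tokens, counting only preset models toward the warning) are assumptions, and all function names are hypothetical.

```python
# Approximate preset sizes in GB, copied from the model catalog in this guide.
PRESET_SIZES_GB = {1: 1.6, 2: 5.34, 3: 5.2, 4: 7.0, 5: 4.9, 6: 2.2}
WARNING_THRESHOLD = 3  # "3 or more models" triggers the storage warning


def parse_selection(text: str) -> tuple[list[int], bool]:
    """Return (preset numbers, custom_requested) for input such as '1,3,c'."""
    text = text.strip().lower()
    if not text:
        return [1], False                      # empty input defaults to model [1]
    if text == "all":
        return sorted(PRESET_SIZES_GB), False  # every preset model
    presets, custom = [], False
    for token in (part.strip() for part in text.split(",")):
        if token == "c":
            custom = True
        elif token.isdecimal() and int(token) in PRESET_SIZES_GB:
            if int(token) not in presets:      # ignore duplicates such as "1,1"
                presets.append(int(token))
        elif token:
            raise ValueError(f"unrecognized selection: {token!r}")
    return presets, custom


def estimated_download_gb(presets: list[int]) -> float:
    """Sum the catalog sizes of the selected preset models."""
    return round(sum(PRESET_SIZES_GB[n] for n in presets), 2)


def needs_storage_warning(presets: list[int]) -> bool:
    """True when the selection is large enough to trigger the warning."""
    return len(presets) >= WARNING_THRESHOLD
```

For example, `parse_selection("all")` selects all six presets, and `estimated_download_gb` on that list gives about 26 GB, which is why the installer asks for confirmation before proceeding.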

Custom Models

To install any .gguf model from HuggingFace, enter C (or include c in a comma-separated list, e.g. 1,3,c). The installer then prompts for three pieces of information:
  1. GGUF URL — paste the direct download link, e.g. https://huggingface.co/user/model-GGUF/resolve/main/model-Q4_K_M.gguf
  2. Short local name — a slug used to refer to the model inside Ollama, e.g. mymodel-local (the -local suffix is appended automatically if omitted)
  3. System prompt — press Enter to use the default: You are a helpful AI assistant.
The custom model is downloaded to Shared/models/, a Modelfile-<name> is created, and it is imported into the Ollama engine alongside any preset models you selected.
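The name and system-prompt handling can be illustrated with a short sketch. The `FROM` and `SYSTEM` lines use standard Ollama Modelfile syntax, but the exact contents the installer writes are not documented here, so treat the output as an assumption; both function names are hypothetical.

```python
DEFAULT_SYSTEM_PROMPT = "You are a helpful AI assistant."  # default from this guide


def normalize_name(name: str) -> str:
    """Append the -local suffix unless the name already ends with it."""
    name = name.strip()
    return name if name.endswith("-local") else f"{name}-local"


def build_modelfile(gguf_filename: str, system_prompt: str = "") -> str:
    """Return Modelfile text pointing at a GGUF file in the same folder."""
    prompt = system_prompt.strip() or DEFAULT_SYSTEM_PROMPT  # Enter = default
    return f'FROM ./{gguf_filename}\nSYSTEM """{prompt}"""\n'
```

With these helpers, a short name of mymodel becomes mymodel-local, and the generated text would be saved as Modelfile-mymodel-local next to the downloaded weights.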

Environment Variables

start-fast-chat.bat sets the following environment variables before starting the Ollama engine:
| Variable | Value | Purpose |
| --- | --- | --- |
| OLLAMA_MODELS | %~dp0..\Shared\models\ollama_data | Keeps all model data on the USB drive |
| OLLAMA_ORIGINS | * | Allows connections from any origin, enabling LAN access from phones/tablets |
| OLLAMA_HOST | 127.0.0.1:11434 | Binds Ollama to localhost port 11434 |
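For readers who want to start the engine from their own script, the same environment can be assembled in Python. This is a sketch under the assumption that `%~dp0` resolves to the Windows/ folder containing the batch file, so `%~dp0..` is the project root; `build_engine_env` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def build_engine_env(windows_dir: Path,
                     base_env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return a copy of the environment with the three Ollama variables set."""
    env = dict(os.environ if base_env is None else base_env)
    # %~dp0..\Shared\models\ollama_data: one level above the Windows/ folder.
    models_dir = windows_dir.parent / "Shared" / "models" / "ollama_data"
    env["OLLAMA_MODELS"] = str(models_dir)
    env["OLLAMA_ORIGINS"] = "*"
    env["OLLAMA_HOST"] = "127.0.0.1:11434"
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.Popen` when launching Shared/bin/ollama-windows.exe with the `serve` subcommand, which leaves the parent process environment unchanged.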

Uninstalling

To remove AI models and downloaded engine files, double-click Windows/uninstall.bat. It launches uninstall-core.ps1, which presents a menu to:
  • Remove selected individual models
  • Remove all downloaded files (engine binaries, models, runtime data) while preserving the base project files
To perform a manual clean-up without running the uninstaller, delete the following folders from Shared/:
  • Shared/bin/ — engine binaries (ollama-windows.exe, llama-server.exe)
  • Shared/models/ — downloaded GGUF weights and Modelfiles
  • Shared/python/ — portable Python environment (if downloaded)
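The manual clean-up can also be scripted. The sketch below deletes only the three folders listed above and defaults to a dry run, so nothing is removed until you opt in; `clean_shared` is a hypothetical name and is not part of the project.

```python
import shutil
from pathlib import Path

REMOVABLE = ("bin", "models", "python")  # folders named in the manual clean-up list


def clean_shared(project_root: Path, dry_run: bool = True) -> list[Path]:
    """Delete the removable folders under Shared/ and return those that were found."""
    found = []
    for name in REMOVABLE:
        folder = project_root / "Shared" / name
        if folder.is_dir():
            if not dry_run:
                shutil.rmtree(folder)  # permanently deletes the folder tree
            found.append(folder)
    return found
```

Call it once with the default `dry_run=True` to review what would be deleted, then again with `dry_run=False` to perform the removal.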

Troubleshooting

The script closes instantly after double-clicking

This is caused by Windows App Execution Aliases interfering with the .bat file. To fix it, open a Command Prompt (press Win + R, then type cmd), navigate to the Windows/ folder, and run install.bat manually. Alternatively, right-click the file and choose Run as Administrator.

“Ollama Engine Not Found” error when starting

This means start-fast-chat.bat was run before install.bat finished downloading the engine to Shared/bin/. Run Windows/install.bat first to download and set up the Ollama runtime, then launch the chat.

Slow generation speeds

The selected model is too large for the host machine’s RAM. Re-run install.bat and choose model [1], Gemma 2 2B Abliterated, which runs well even on older hardware.
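To confirm the cause of an “Ollama Engine Not Found” error, you can check whether the engine binaries are present in Shared/bin/. This is a small sketch using the file names given in the install steps; whether the launcher needs both files is an assumption, and `missing_engine_files` is a hypothetical name.

```python
from pathlib import Path

# Binaries that step [6/7] extracts into Shared/bin/.
ENGINE_BINARIES = ("ollama-windows.exe", "llama-server.exe")


def missing_engine_files(project_root: Path) -> list[str]:
    """Return the engine binaries that are absent from Shared/bin/."""
    bin_dir = project_root / "Shared" / "bin"
    return [name for name in ENGINE_BINARIES if not (bin_dir / name).is_file()]
```

An empty list suggests the engine download completed; any names in the result indicate that install.bat should be run again.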
