USB-Uncensored-LLM is a plug-and-play local AI environment with no host dependencies. It lives entirely on a USB drive or SSD and runs fully offline after a one-time setup. There is nothing to install on the host computer: no registry edits, no package managers, no system permissions required. You plug in your drive, run a single script, and shortly afterwards a private, uncensored large language model is running entirely on your hardware, served through a browser-based chat UI. Take the drive to any Windows, macOS, Linux, or Android device and the same models, the same chat history, and the same experience follow you.
What is USB-Uncensored-LLM?
USB-Uncensored-LLM is a portable AI runtime that runs heavyweight language models directly from external storage without installing anything on the host operating system. The entire stack (the inference engine, model weights, Python runtime, and chat server) lives inside a single folder that you can carry in your pocket. Key properties of the project:
- Zero-install: No administrator rights are required on the host machine. The engine binaries and a portable Python environment are self-contained inside the Shared/ directory.
- Air-gapped after setup: Internet access is needed only once, to download the engine (~50 MB) and your chosen model (1.6–7 GB). After that, everything runs completely offline.
- Cross-platform: The same drive works on Windows 10/11, macOS 11+, Ubuntu/Debian Linux, and natively on Android via Termux. Each OS has its own launcher in a dedicated folder.
- Uncensored models: The project ships with a curated catalog of abliterated and heretic fine-tuned models whose safety alignment has been removed in the weights themselves rather than filtered at runtime.
- Privacy-first: No telemetry, no cloud routing, no API keys. Your prompts and responses never leave your machine.
Core Architecture
The project is structured around a Shared/ directory that acts as a unified data layer for all operating systems. Heavy files (model weights, the Ollama engine binaries, and chat history) live here once and are referenced by whichever OS launcher you use, so nothing is duplicated when you move between machines.
The Ollama engine lives in Shared/bin/ as a platform-specific binary (e.g., ollama-windows.exe, ollama-darwin). Each OS's install script downloads the matching ~50 MB binary into that folder. Because the binary lands in Shared/, it only needs to be downloaded once per OS you intend to use, not once per machine.
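A launcher could locate the shared engine and model store roughly as follows. This is a minimal sketch, not the project's actual launcher. It assumes that each OS launcher folder sits next to Shared/ at the drive root, that the Linux binary is named `ollama-linux` (a hypothetical name; only the Windows and macOS names appear above), and that model storage is redirected with Ollama's `OLLAMA_MODELS` environment variable. It also ignores CPU architecture and Android/Termux detection.

```python
from __future__ import annotations

import os
import platform
from pathlib import Path

# Windows and macOS names come from the docs; "ollama-linux" is a hypothetical placeholder.
ENGINE_BINARIES = {
    "Windows": "ollama-windows.exe",
    "Darwin": "ollama-darwin",
    "Linux": "ollama-linux",
}


def shared_dir(launcher_dir: Path) -> Path:
    """Assumes every OS launcher folder sits next to Shared/ at the drive root."""
    return launcher_dir.resolve().parent / "Shared"


def engine_command(launcher_dir: Path, system: str | None = None) -> tuple[list[str], dict[str, str]]:
    """Return the argv and environment for starting the shared Ollama binary."""
    system = system or platform.system()
    if system not in ENGINE_BINARIES:
        raise RuntimeError(f"no engine binary known for {system!r}")
    shared = shared_dir(launcher_dir)
    env = dict(os.environ)
    # OLLAMA_MODELS moves Ollama's model store onto the drive, so every OS reuses Shared/models/.
    env["OLLAMA_MODELS"] = str(shared / "models")
    # Keep the engine on loopback; only the chat server faces the LAN (an assumption).
    env["OLLAMA_HOST"] = "127.0.0.1:11434"
    return [str(shared / "bin" / ENGINE_BINARIES[system]), "serve"], env
```

A launcher script would then call something like `cmd, env = engine_command(Path(__file__).parent)` followed by `subprocess.Popen(cmd, env=env)`. Passing `system` explicitly keeps the function testable on any host.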
The Python chat server (Shared/chat_server.py) is a custom HTTP server that listens on port 3333. It bridges the browser-based chat UI to the locally running Ollama engine, serves the UI on the local network so that phones and tablets on the same WiFi can connect, and auto-saves conversation history to Shared/chat_data/.
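The bridging and saving jobs described above can be sketched with the standard library alone. This sketch assumes Ollama's default local endpoint (`http://127.0.0.1:11434/api/chat`) and a hypothetical one-JSON-file-per-conversation layout in Shared/chat_data/. The real chat_server.py may structure both differently, and the names `build_chat_request`, `ask_ollama`, `save_conversation`, and `lan_ip` are illustrative.

```python
from __future__ import annotations

import json
import re
import socket
import time
import urllib.request
from pathlib import Path

# Ollama's default local endpoint; assumes the engine listens on 127.0.0.1:11434.
OLLAMA_CHAT_URL = "http://127.0.0.1:11434/api/chat"


def build_chat_request(model: str, messages: list[dict], url: str = OLLAMA_CHAT_URL) -> urllib.request.Request:
    """Wrap the browser's message list as a non-streaming /api/chat request."""
    body = json.dumps({"model": model, "messages": messages, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def ask_ollama(model: str, messages: list[dict], timeout: float = 300.0) -> str:
    """Send one turn to the local engine and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, messages), timeout=timeout) as resp:
        reply = json.load(resp)
    return reply["message"]["content"]


def save_conversation(chat_dir: Path, conversation_id: str, messages: list[dict]) -> Path:
    """Persist a conversation as one JSON file (hypothetical layout)."""
    # Reject IDs that could escape chat_dir (e.g. "../x"), since they come from the browser.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", conversation_id):
        raise ValueError(f"invalid conversation id: {conversation_id!r}")
    chat_dir.mkdir(parents=True, exist_ok=True)
    path = chat_dir / f"{conversation_id}.json"
    tmp = chat_dir / f"{conversation_id}.json.tmp"
    payload = {"id": conversation_id, "saved_at": time.time(), "messages": messages}
    tmp.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    # Write-then-rename, so an interrupted save is less likely to corrupt the existing file.
    tmp.replace(path)
    return path


def lan_ip() -> str:
    """Best-effort guess at the address other devices on the WiFi should use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends nothing; it only selects the outbound interface.
        sock.connect(("10.254.254.254", 1))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable network route, e.g. a fully offline machine
    finally:
        sock.close()
```

For LAN access, the HTTP server itself would bind to `0.0.0.0:3333` (for example with `http.server.ThreadingHTTPServer`) rather than to loopback, and it would print `http://{lan_ip()}:3333` for phones to open. Non-streaming requests keep the sketch short; a real UI would likely stream tokens instead.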
Why Uncensored?
Commercial AI models ship with RLHF-based safety alignment: a fine-tuning process that trains the model to refuse certain categories of requests. This alignment is not a separate filter applied on top of the model. It is baked into the model's own weights, and refusal behavior is largely mediated by a specific direction in the residual stream's activation space. Abliteration uses representation engineering to find that refusal direction, typically by contrasting the model's activations on prompts it refuses with its activations on prompts it answers. It then projects the direction out of the weight matrices that write into the residual stream. The result is a model that performs close to the original on capability benchmarks but no longer has the mechanism to generate refusal responses.
Heretic fine-tuning goes a step further: the model is fine-tuned on datasets specifically designed to enforce compliance regardless of the content or topic of the user's query. Both approaches produce community fine-tunes distributed as .gguf files on Hugging Face, which is where USB-Uncensored-LLM's curated model catalog sources its weights.
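The abliteration step can be illustrated with a toy, pure-Python sketch. It assumes you already have residual-stream activations for a set of refused ("harmful") and answered ("harmless") prompts. It estimates the refusal direction r̂ as the normalized difference of their means, then orthogonalizes a weight matrix whose rows index residual-stream dimensions: W' = (I − r̂r̂ᵀ)W. Real tooling applies this with tensor libraries to every matrix that writes into the residual stream. The function names here are illustrative.

```python
from __future__ import annotations

import math


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def refusal_direction(harmful: list[list[float]], harmless: list[list[float]]) -> list[float]:
    """Difference of mean activations, normalized to unit length (r-hat)."""
    diff = [h - b for h, b in zip(mean_vector(harmful), mean_vector(harmless))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0:
        raise ValueError("activations do not separate; no direction found")
    return [x / norm for x in diff]


def ablate(weight: list[list[float]], r_hat: list[float]) -> list[list[float]]:
    """Return W' = (I - r r^T) W for a matrix that writes into the residual stream.

    Each column of W is a vector in residual space. Subtracting its projection
    onto r_hat means the layer can no longer write along the refusal direction.
    """
    rows, cols = len(weight), len(weight[0])
    # r^T W: how strongly each column points along the refusal direction
    coeffs = [sum(r_hat[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    return [[weight[i][j] - r_hat[i] * coeffs[j] for j in range(cols)] for i in range(rows)]
```

After `ablate`, every column of the matrix is orthogonal to r̂ (r̂ᵀW' = 0). Because r̂ is a unit vector, the rest of each column is left unchanged, which is why capabilities are largely preserved.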
Key Features
Zero-Install Setup
No system permissions, registry edits, or package managers needed. The entire runtime is self-contained on the drive and starts with a single double-click on a launcher script.
Cross-Platform
One drive works across Windows 10/11, macOS 11+, Ubuntu/Debian Linux, and Android (ARM64 via Termux). Each OS has its own isolated launcher folder.
Shared Model Storage
Model weights live once in Shared/models/. Switching from Windows to Linux on the same drive does not require re-downloading your 5 GB model files.
Hardware Acceleration
Ollama automatically detects and uses NVIDIA CUDA, Apple Metal (M1/M2/M3), or CPU AVX instructions — whichever the host machine supports — with no configuration needed.
Persistent Chat History
All conversations auto-save to Shared/chat_data/. Your history is preserved across sessions and across different host machines that mount the same drive.
LAN Mobile Access
The Python chat server exposes the UI over your local network. Any phone or tablet on the same WiFi can open the chat interface by navigating to the displayed IP address on port 3333.
Responsible Use Disclaimer: USB-Uncensored-LLM is built for uncompromising computational freedom. By utilizing abliterated and heretic fine-tuned models, the system will not moralize, lecture, or refuse your prompts. You are solely responsible for the content you generate and how you use it. Please use this tool lawfully and ethically.