USB-Uncensored-LLM is a zero-dependency, plug-and-play local AI environment that lives entirely on a USB drive or SSD and runs air-gapped after a one-time setup download. Nothing is installed on the host computer: no registry edits, no package managers, no system permissions. You plug in the drive, run a single script, and a private, uncensored large language model starts on your own hardware, served through a browser-based chat UI. Take the drive to any Windows, macOS, Linux, or Android device and the same models, chat history, and experience follow you.

What is USB-Uncensored-LLM?

USB-Uncensored-LLM is a portable AI runtime that runs heavyweight language models directly from external storage without touching the host operating system. The entire stack — the inference engine, model weights, Python runtime, and chat server — lives inside a single folder that you can carry in your pocket. Key properties of the project:
  • Zero-install: No administrator rights are required on the host machine. The engine binaries and a portable Python environment are self-contained inside the Shared/ directory.
  • Air-gapped after setup: Internet access is needed only during setup, to download the engine (~50 MB for each OS you intend to use) and your chosen model (1.6–7 GB). After that, everything runs completely offline.
  • Cross-platform: The same drive works on Windows 10/11, macOS 11+, Ubuntu/Debian Linux, and natively on Android via Termux — each OS has its own launcher in a dedicated folder.
  • Uncensored models: The project ships with a curated catalog of abliterated and heretic fine-tuned models, whose refusal behavior has been removed by editing or retraining the model weights themselves rather than by bypassing an external filter.
  • Privacy-first: No telemetry, no cloud routing, no API keys. Your prompts and responses never leave your machine.

Core Architecture

The project is structured around a Shared/ volume that acts as a unified data layer for all operating systems. Heavy files — model weights, the Ollama engine binaries, and chat history — live here once and are referenced by whichever OS launcher you use. This prevents duplication when moving between machines.
[USB Drive or Local Folder]
 ├── 📁 Android    — Termux installers & launchers
 ├── 📁 Linux      — Ubuntu/Debian offline installers & launchers
 ├── 📁 Mac        — macOS offline installers & launchers
 ├── 📁 Windows    — Windows automatic UI menus
 └── 📁 Shared     — Unified data system
      ├── 📁 bin         (Ollama engine binaries per OS)
      ├── 📁 chat_data   (Cross-platform persistent conversation history)
      ├── 📁 models      (GGUF model weights & local database mapping)
      └── 📁 python      (Isolated portable Python environment)
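As a rough illustration of how a launcher might resolve this layout, the sketch below builds the Shared/ paths from the drive root and picks an engine binary by platform. The function names are hypothetical, the Windows and macOS file names follow the examples given on this page, and the Linux file name is an assumption.

```python
from pathlib import Path

# Binary names keyed by platform.system() values. The Windows and macOS
# names follow the page's examples; the Linux name is assumed.
ENGINE_BINARIES = {
    "Windows": "ollama-windows.exe",
    "Darwin": "ollama-darwin",
    "Linux": "ollama-linux",  # assumption, not confirmed by the project
}


def shared_paths(drive_root):
    """Return the Shared/ subfolders that every OS launcher reads from."""
    shared = Path(drive_root) / "Shared"
    return {
        "bin": shared / "bin",
        "chat_data": shared / "chat_data",
        "models": shared / "models",
        "python": shared / "python",
    }


def engine_binary(drive_root, system):
    """Return the path of the engine binary for a platform.system() value."""
    try:
        name = ENGINE_BINARIES[system]
    except KeyError:
        raise ValueError(f"unsupported platform: {system}") from None
    return shared_paths(drive_root)["bin"] / name
```

A launcher could call `engine_binary(root, platform.system())` and offer to download the binary if that path does not exist yet.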
The Ollama engine is stored in Shared/bin/ as a platform-specific binary (e.g., ollama-windows.exe, ollama-darwin). The install script for each OS downloads the correct ~50 MB binary into that folder. Because it lands in Shared/, it only needs to be downloaded once per OS you intend to use, not once per machine.

The Python chat server (Shared/chat_server.py) is a custom HTTP server that starts on port 3333. It bridges the browser-based chat UI to the locally-running Ollama engine, handles LAN routing so that phones and tablets on the same WiFi can connect, and auto-saves conversation history to Shared/chat_data/.
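The three jobs of the chat server can be sketched with the standard library alone. This is not the project's chat_server.py: the helper names, the one-JSON-file-per-conversation format, and the assumption that Ollama listens on its default local port 11434 are all illustrative.

```python
import json
import socket
import urllib.request
from pathlib import Path

CHAT_PORT = 3333  # port named in the text
# Ollama's default local endpoint; the project may configure it differently.
OLLAMA_CHAT_URL = "http://127.0.0.1:11434/api/chat"


def build_chat_payload(model, messages):
    """Encode a non-streaming request body for Ollama's /api/chat."""
    body = {"model": model, "messages": messages, "stream": False}
    return json.dumps(body).encode("utf-8")


def forward_to_ollama(payload, url=OLLAMA_CHAT_URL):
    """POST the payload to the local engine and return the reply text."""
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["message"]["content"]


def save_history(chat_dir, conversation_id, messages):
    """Write one conversation to chat_dir as JSON (hypothetical format)."""
    chat_dir = Path(chat_dir)
    chat_dir.mkdir(parents=True, exist_ok=True)
    target = chat_dir / f"{conversation_id}.json"
    tmp = chat_dir / f"{conversation_id}.json.tmp"
    # Write to a temp file first so an unplugged drive cannot leave a
    # half-written history file behind.
    tmp.write_text(json.dumps(messages, ensure_ascii=False, indent=2),
                   encoding="utf-8")
    tmp.replace(target)
    return target


def lan_url(port=CHAT_PORT):
    """Best-effort URL for other devices; falls back to loopback offline."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect(("192.0.2.1", 9))  # UDP connect sends no packets
        ip = sock.getsockname()[0]
    except OSError:
        ip = "127.0.0.1"
    finally:
        sock.close()
    return f"http://{ip}:{port}"
```

A request handler would call `build_chat_payload`, pass the result to `forward_to_ollama`, append the reply to the message list, and then call `save_history`. Printing `lan_url()` at startup gives the address a phone on the same network would open.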

Why Uncensored?

Commercial AI models ship with RLHF-based safety alignment: a fine-tuning process that trains the model to refuse certain categories of requests. This alignment is not a separate filter applied on top of the model. It is encoded in the model’s own weights, and interpretability work suggests that refusal is largely mediated by a direction in the model’s residual-stream activations.

Abliteration is a technique that identifies that refusal direction through representation engineering and mathematically projects it out of the weight matrices that write to the residual stream. The result is a model that is intended to retain the original’s capabilities, usually with only small changes on capability benchmarks, but that is far less able to generate refusal responses.

Heretic fine-tuning goes a step further: the model is fine-tuned on datasets specifically designed to enforce compliance regardless of the content or topic of the user’s query. Both approaches produce community fine-tunes distributed as .gguf files on Hugging Face, which is where USB-Uncensored-LLM’s curated model catalog sources its weights.
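The projection step behind abliteration can be shown on toy data. The sketch below uses the commonly described difference-of-means formulation: estimate a unit direction r from activations on refused versus answered prompts, then replace a weight matrix W with W − r(rᵀW). The function names are hypothetical, and the models in the catalog may have been produced with different tooling or variants of this idea.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def refusal_direction(refused_acts, answered_acts):
    """Unit vector along the difference of the two mean activations."""
    diff = [a - b for a, b in zip(mean_vector(refused_acts),
                                  mean_vector(answered_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0:
        raise ValueError("the two activation sets have identical means")
    return [x / norm for x in diff]


def ablate(weight, direction):
    """Return W - r (r^T W) for a matrix that writes to the residual stream.

    weight is a list of rows with shape (d_model, d_in); direction has
    length d_model. Afterwards no input can move the output along r.
    """
    d_model, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the direction.
    along = [sum(direction[i] * weight[i][j] for i in range(d_model))
             for j in range(d_in)]
    return [[weight[i][j] - direction[i] * along[j] for j in range(d_in)]
            for i in range(d_model)]
```

In a real model the same projection would be applied to every matrix that writes into the residual stream, using activations collected at a chosen layer. The toy version only shows the arithmetic.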

Key Features

Zero-Install Setup

No system permissions, registry edits, or package managers needed. The entire runtime is self-contained on the drive and starts by running a single launcher script.

Cross-Platform

One drive works across Windows 10/11, macOS 11+, Ubuntu/Debian Linux, and Android (ARM64 via Termux). Each OS has its own isolated launcher folder.

Shared Model Storage

Model weights live once in Shared/models/. Switching from Windows to Linux on the same drive does not require re-downloading your multi-gigabyte model files.

Hardware Acceleration

Ollama automatically detects and uses NVIDIA CUDA, Apple Metal (M1/M2/M3), or CPU AVX instructions — whichever the host machine supports — with no configuration needed.

Persistent Chat History

All conversations auto-save to Shared/chat_data/. Your history is preserved across sessions and across different host machines that mount the same drive.

LAN Mobile Access

The Python chat server exposes the UI over your local network. Any phone or tablet on the same WiFi can open the chat interface by navigating to the displayed IP address on port 3333.

Responsible Use Disclaimer: USB-Uncensored-LLM is built for uncompromising computational freedom. By utilizing abliterated and heretic fine-tuned models, the system will not moralize, lecture, or refuse your prompts. You are solely responsible for the content you generate and how you use it. Please use this tool lawfully and ethically.
