USB-Uncensored-LLM is a fully portable, zero-dependency local AI environment that runs high-quality uncensored language models directly from a USB drive or SSD. Download your models once, plug into any Windows, macOS, Linux, or Android device, and start chatting: no system installations, no internet required after setup.
Documentation Index
Fetch the complete documentation index at: https://mintlify.com/techjarves/USB-Uncensored-LLM/llms.txt
Use this file to discover all available pages before exploring further.
Introduction
Learn what USB-Uncensored-LLM is, how it works, and what makes it different from other local AI setups.
Quickstart
Get your portable AI running in three steps — install the engine, download a model, and launch the chat UI.
System Requirements
Check hardware specs, storage needs, and RAM requirements before you start.
Model Library
Browse the curated catalog of uncensored models — Gemma, Qwen, NemoMix, Dolphin, Phi, and more.
Platform Guides
Windows
Double-click install.bat to set up the engine and models, then launch with start-fast-chat.bat.
macOS
Run install.command in Terminal to download everything, then launch with start.command.
Linux
Use bash Linux/install.sh for a fully automated setup on Ubuntu, Debian, and compatible distros.
Android
Run natively on Android via Termux; llama.cpp is compiled on-device for maximum ARM64 performance.
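The per-platform entry points above can be summarized as a small lookup. The sketch below is illustrative only: the script names come from the guides above, while the `SCRIPTS` table and the `scripts_for` helper are hypothetical and not part of the project. The Linux launch script is not named on this page, so it is left out, and Android is omitted because Termux reports itself as Linux.

```python
# Script names are copied from the platform guides; the lookup table and
# helper are hypothetical, shown only to summarize the entry points.
SCRIPTS = {
    "Windows": {"install": "install.bat", "start": "start-fast-chat.bat"},
    "Darwin": {"install": "install.command", "start": "start.command"},
    "Linux": {"install": "Linux/install.sh"},  # launch script not named here
}


def scripts_for(system: str) -> dict:
    """Return the documented script names for a platform.system() value."""
    try:
        return SCRIPTS[system]
    except KeyError:
        raise ValueError(f"no launcher documented for {system!r}") from None
```

Calling `scripts_for(platform.system())` after `import platform` would return the entry for the current host.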
How It Works
Initialize the Engine
Run the installer for your operating system. It downloads the ~50 MB Ollama engine binary into Shared/bin/; nothing is installed system-wide.
Download AI Models
Choose from the interactive model catalog or paste any HuggingFace GGUF URL. Models land in Shared/models/ and are shared across all platforms on the same drive.
Launch the Chat UI
Run the start script. The Ollama engine starts in the background, and your browser opens to the locally-served chat interface at http://localhost:3333.
Key Features
Zero-Install Setup
Ships with portable Python and isolated engine binaries. No system permissions, registry edits, or package managers required.
Shared Model Storage
Download 5 GB+ model weights once. The Shared/ volume is read by all OS launchers, eliminating duplication.
Hardware Acceleration
Automatically uses NVIDIA CUDA, Apple Metal GPU, or AVX CPU instructions depending on the host machine.
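As a rough illustration of that kind of selection, here is a minimal sketch. It is an assumption about how such a choice could be made, not the engine's actual detection logic: `pick_backend` and `detect_backend` are hypothetical names, and using the presence of `nvidia-smi` on the PATH as a signal for an NVIDIA GPU is only a heuristic.

```python
import platform
import shutil


def pick_backend(system: str, has_nvidia_smi: bool) -> str:
    """Hypothetical heuristic mapping host facts to an acceleration label."""
    if system == "Darwin":
        return "metal"  # Apple Metal GPU on macOS
    if has_nvidia_smi:
        return "cuda"   # an NVIDIA driver tool was found on the PATH
    return "cpu"        # fall back to CPU (AVX) inference


def detect_backend() -> str:
    """Apply the heuristic to the current machine."""
    return pick_backend(platform.system(),
                        shutil.which("nvidia-smi") is not None)
```

Keeping the decision in a pure function makes the rule easy to check without the matching hardware.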
LAN Access
Access the chat UI from any phone or tablet on the same WiFi network — the server broadcasts its local IP at startup.
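To find the address to type on a phone, something like the following sketch can help. It assumes the LAN address uses the same port 3333 as the localhost URL, which this page does not state explicitly. `local_ip` and `chat_url` are hypothetical helpers, not project functions.

```python
import socket

PORT = 3333  # assumption: LAN access uses the same port as localhost


def local_ip() -> str:
    """Best-effort LAN IPv4 address of this machine, else loopback."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only makes the OS
        # pick the outgoing interface. 192.0.2.1 is a documentation address.
        sock.connect(("192.0.2.1", 80))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route, e.g. fully offline
    finally:
        sock.close()


def chat_url(ip: str, port: int = PORT) -> str:
    """Build the URL another device on the same network would open."""
    return f"http://{ip}:{port}"
```

Running `print(chat_url(local_ip()))` on the host shows the candidate address. On machines with several network interfaces the result may not be the WiFi one.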
Persistent Chat History
Conversations are saved as JSON to the drive. Switch machines without losing context.
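A minimal sketch of JSON-backed history, assuming one file per conversation. The directory layout, file naming, and the `id`/`messages` schema are all hypothetical; the page only states that conversations are stored as JSON on the drive.

```python
import json
from pathlib import Path


def save_chat(history_dir: Path, chat_id: str, messages: list) -> Path:
    """Write one conversation to <history_dir>/<chat_id>.json (assumed layout)."""
    history_dir.mkdir(parents=True, exist_ok=True)
    path = history_dir / f"{chat_id}.json"
    payload = {"id": chat_id, "messages": messages}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def load_chat(path: Path) -> list:
    """Read the message list back from a saved conversation file."""
    return json.loads(path.read_text(encoding="utf-8"))["messages"]
```

Because the file lives on the drive rather than on the host, a conversation saved on one machine can be loaded on the next.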
Custom Model Support
Download any .gguf model from HuggingFace directly into the drive’s engine during the install flow.
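The sketch below shows one way a pasted URL could be mapped to a file under Shared/models/. It is not the installer's actual code: `gguf_destination` is a hypothetical helper, and the example URL in the usage note is made up for illustration.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def gguf_destination(url: str, models_dir: Path) -> Path:
    """Derive the local file path for a GGUF download URL."""
    # Take the last path segment; query strings such as ?download=true
    # are excluded by urlparse().path.
    name = PurePosixPath(unquote(urlparse(url).path)).name
    if not name.lower().endswith(".gguf"):
        raise ValueError(f"not a .gguf file URL: {url!r}")
    return models_dir / name
```

For example, `gguf_destination("https://huggingface.co/example-org/example-model-GGUF/resolve/main/example-model.Q4_K_M.gguf", Path("Shared/models"))` would point at `Shared/models/example-model.Q4_K_M.gguf`.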