USB-Uncensored-LLM: Portable Local AI on Any Drive

USB-Uncensored-LLM is a fully portable, zero-dependency local AI environment that runs high-quality uncensored language models directly from a USB drive or SSD. Download your models once, plug into any Windows, macOS, Linux, or Android device, and start chatting — no system installations, no internet required after setup.

Introduction

Learn what USB-Uncensored-LLM is, how it works, and what makes it different from other local AI setups.

Quickstart

Get your portable AI running in three steps — install the engine, download a model, and launch the chat UI.

System Requirements

Check hardware specs, storage needs, and RAM requirements before you start.

Model Library

Browse the curated catalog of uncensored models — Gemma, Qwen, NemoMix, Dolphin, Phi, and more.

Platform Guides

Windows

Double-click install.bat to set up the engine and models, then launch with start-fast-chat.bat.

macOS

Run install.command in Terminal to download everything, then launch with start.command.

Linux

Use bash Linux/install.sh for a fully automated setup on Ubuntu, Debian, and compatible distros.

Android

Run natively on Android via Termux — llama.cpp is compiled on-device for maximum ARM64 performance.

How It Works

Initialize the Engine

Run the installer for your operating system. It downloads the ~50 MB Ollama engine binary into Shared/bin/ — nothing is installed system-wide.
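Because the installer is run once per OS, Shared/bin/ presumably ends up holding one engine binary per platform. The Python sketch below shows one way a launcher could resolve the right binary for the host machine and check whether it is already installed. The per-OS folder names (windows, macos, linux) and the helpers engine_binary and engine_installed are hypothetical. They are not the project's actual layout or code.

```python
import platform
from pathlib import Path
from typing import Optional

# Hypothetical per-OS layout inside Shared/bin/; the real installers may use
# different folder or file names.
ENGINE_LAYOUT = {
    "Windows": Path("windows") / "ollama.exe",
    "Darwin": Path("macos") / "ollama",
    "Linux": Path("linux") / "ollama",
}


def engine_binary(shared_root: Path, system: Optional[str] = None) -> Path:
    """Return the expected engine path for the given OS (default: this host)."""
    system = system or platform.system()
    try:
        relative = ENGINE_LAYOUT[system]
    except KeyError:
        raise RuntimeError(f"no portable engine layout for {system!r}") from None
    return shared_root / "bin" / relative


def engine_installed(shared_root: Path, system: Optional[str] = None) -> bool:
    """True once the installer has placed a binary for that OS on the drive."""
    return engine_binary(shared_root, system).is_file()
```

Keeping each OS's binary in its own subfolder lets one drive serve Windows, macOS, and Linux hosts without the binaries overwriting each other.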

Download AI Models

Choose from the interactive model catalog or paste any Hugging Face GGUF URL. Models land in Shared/models/ and are shared across all platforms on the same drive.
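Pasted links are a common source of bad downloads. A Hugging Face URL containing /blob/ opens the web file viewer, while the same path with /resolve/ serves the raw file. The sketch below normalizes a pasted URL and computes where the file would land in Shared/models/. The helpers normalize_gguf_url and model_destination are hypothetical, and the installer may handle URLs differently.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse, urlunparse


def normalize_gguf_url(url: str) -> str:
    """Turn a pasted Hugging Face link into a direct-download URL for a .gguf file."""
    parsed = urlparse(url)
    if parsed.netloc not in ("huggingface.co", "www.huggingface.co"):
        raise ValueError(f"not a Hugging Face URL: {url}")
    path = PurePosixPath(parsed.path)
    if path.suffix.lower() != ".gguf":
        raise ValueError(f"URL does not point at a .gguf file: {url}")
    # Expected shape: /<owner>/<repo>/(blob|resolve)/<revision>/<file>.gguf
    parts = list(path.parts)
    if len(parts) >= 6 and parts[3] == "blob":
        parts[3] = "resolve"  # /blob/ is the HTML viewer; /resolve/ is the raw file
    new_path = str(PurePosixPath(*parts))
    # Drop query strings such as ?download=true; they are not needed.
    return urlunparse(parsed._replace(path=new_path, query="", fragment=""))


def model_destination(shared_root: Path, url: str) -> Path:
    """Where the downloaded file would be stored on the drive."""
    return shared_root / "models" / PurePosixPath(urlparse(url).path).name
```

For example, a pasted .../org/repo/blob/main/model.Q4_K_M.gguf?download=true link becomes .../org/repo/resolve/main/model.Q4_K_M.gguf and is saved as Shared/models/model.Q4_K_M.gguf.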

Launch the Chat UI

Run the start script. The Ollama engine starts in the background, and your browser opens to the locally served chat interface at http://localhost:3333.
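The start script has to wait until the background engine and the UI server accept connections before opening the browser. A minimal sketch of that handshake follows. It assumes the engine listens on Ollama's default port 11434 and the chat UI is served on port 3333 as stated above. The names wait_for_port and open_chat_ui are illustrative, not the project's launcher code.

```python
import socket
import time
import webbrowser


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


def open_chat_ui(ui_port: int = 3333, engine_port: int = 11434) -> bool:
    """Wait for the engine, then the UI server, then open the browser."""
    # Assumption: the engine uses Ollama's default port 11434.
    for port in (engine_port, ui_port):
        if not wait_for_port("127.0.0.1", port):
            return False
    webbrowser.open(f"http://localhost:{ui_port}")
    return True
```

Polling with a deadline avoids a fixed sleep. On a slow USB drive the engine may take noticeably longer to start than on an internal SSD.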

Chat Anywhere

All conversations auto-save to Shared/chat_data/. Plug the drive into another computer, run the installer once for that OS, and your history travels with you.
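History can only travel with the drive if launchers find Shared/ relative to themselves, because the drive's mount point changes between machines (a drive letter such as E: on one PC, a /Volumes/ path on a Mac). A sketch of one way to do that, assuming the launcher script sits somewhere below the drive root; find_shared_root is a hypothetical helper.

```python
from pathlib import Path


def find_shared_root(start: Path) -> Path:
    """Walk upward from start until a directory containing Shared/ is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        shared = candidate / "Shared"
        if shared.is_dir():
            return shared
    raise FileNotFoundError(f"no Shared/ folder found above {start}")
```

A launcher would call find_shared_root(Path(__file__).parent) rather than using the current working directory. Then double-clicking the script from a file manager resolves the same folder on every host.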

Key Features

Zero-Install Setup

Ships with portable Python and isolated engine binaries. No system permissions, registry edits, or package managers required.

Shared Model Storage

Download model weights (often 5 GB or more) once. The Shared/ folder is read by every OS launcher, so the same files are never duplicated per platform.
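Because a single model can exceed 5 GB, it is worth confirming the drive has room before starting a download. Below is a minimal sketch using Python's shutil.disk_usage. The helper has_room_for and its default 1 GiB safety margin are illustrative choices, not values taken from the installer.

```python
import shutil
from pathlib import Path

GIB = 1024 ** 3


def has_room_for(target_dir: Path, needed_bytes: int, margin_bytes: int = 1 * GIB) -> bool:
    """True if the filesystem holding target_dir can fit the file plus a margin."""
    usage = shutil.disk_usage(target_dir)
    return usage.free >= needed_bytes + margin_bytes
```

One caveat is worth checking separately. A drive formatted as FAT32 cannot hold any single file of 4 GiB or larger, so 5 GB+ GGUF files need exFAT, NTFS, or another filesystem without that limit.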

Hardware Acceleration

Automatically uses NVIDIA CUDA, Apple Metal GPU, or AVX CPU instructions depending on the host machine.
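Ollama performs its own hardware detection at runtime. The sketch below only illustrates the priority order described above, under stated assumptions. An NVIDIA GPU is inferred from nvidia-smi being on PATH. Metal is assumed only on Apple Silicon (arm64) Macs. AVX support is read from /proc/cpuinfo, which exists only on Linux. The names pick_backend and detect_backend are illustrative, not the engine's API.

```python
import platform
import shutil


def pick_backend(system: str, machine: str, has_nvidia: bool, has_avx: bool) -> str:
    """Pure decision function: GPU first, then vectorized CPU, then plain CPU."""
    if has_nvidia:
        return "cuda"
    # Assumption: Metal acceleration is used only on Apple Silicon Macs.
    if system == "Darwin" and machine == "arm64":
        return "metal"
    if has_avx:
        return "cpu-avx"
    return "cpu"


def linux_cpu_has_avx() -> bool:
    """Read CPU flags from /proc/cpuinfo; returns False where that file is absent."""
    try:
        with open("/proc/cpuinfo", encoding="utf-8", errors="replace") as f:
            return any(line.startswith("flags") and " avx" in line for line in f)
    except OSError:
        return False


def detect_backend() -> str:
    return pick_backend(
        system=platform.system(),
        machine=platform.machine(),
        # Assumption: nvidia-smi on PATH implies a usable NVIDIA driver.
        has_nvidia=shutil.which("nvidia-smi") is not None,
        has_avx=linux_cpu_has_avx(),
    )
```

Separating the pure decision (pick_backend) from the probing (detect_backend) keeps the ordering testable on any machine.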

LAN Access

Open the chat UI from any phone or tablet on the same Wi-Fi network; the server displays its local IP address at startup so you know which address to enter.
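To show a reachable address, the server has to work out which of the host's interfaces faces the LAN. One common trick, sketched below, is to call connect() on a UDP socket with an outside address. No packet is sent, but the OS picks the local interface it would route through. For phones to connect, the UI server must also listen on 0.0.0.0 rather than 127.0.0.1. The function name lan_ip and its fallback behavior are assumptions, not the project's code.

```python
import socket


def lan_ip() -> str:
    """Best-effort guess at this machine's LAN IPv4 address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # connect() on a UDP socket sends nothing; it only selects a route.
            # 192.0.2.1 is a reserved documentation address (TEST-NET-1).
            s.connect(("192.0.2.1", 80))
            return s.getsockname()[0]
        except OSError:
            # No usable route (e.g. offline): fall back to loopback.
            return "127.0.0.1"
```

A launcher could then print something like f"Chat UI on your network: http://{lan_ip()}:3333" at startup.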

Persistent Chat History

Conversations are saved as JSON to the drive. Switch machines without losing context.
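Storing each conversation as its own JSON file keeps history portable across operating systems. Below is a minimal sketch. It assumes a hypothetical one-file-per-chat layout under Shared/chat_data/ with an {"id", "messages"} schema; the app's real file format may differ. Writing to a temporary file and then calling os.replace makes an interrupted save, for example from unplugging the drive, less likely to leave a truncated file. How atomic the rename is depends on the drive's filesystem.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def save_chat(chat_dir: Path, chat_id: str, messages: List[Dict[str, str]]) -> Path:
    """Write one conversation to chat_dir/<chat_id>.json via a temp file."""
    chat_dir.mkdir(parents=True, exist_ok=True)
    target = chat_dir / f"{chat_id}.json"
    fd, tmp = tempfile.mkstemp(dir=chat_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            # Hypothetical schema: {"id": ..., "messages": [{"role", "content"}, ...]}
            json.dump({"id": chat_id, "messages": messages}, f, ensure_ascii=False, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push bytes to the drive before renaming
        os.replace(tmp, target)
    except Exception:
        os.unlink(tmp)  # clean up the partial temp file, keep the old history
        raise
    return target


def load_chat(chat_dir: Path, chat_id: str) -> List[Dict[str, str]]:
    """Read back the messages of one saved conversation."""
    with open(chat_dir / f"{chat_id}.json", encoding="utf-8") as f:
        return json.load(f)["messages"]
```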

Custom Model Support

Download any .gguf model from Hugging Face directly into Shared/models/ on the drive during the install flow.
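Every GGUF file begins with the four ASCII bytes GGUF followed by a 32-bit format version (little-endian in the common layout). Checking those eight bytes is a cheap way to catch a failed download, such as an HTML error page saved with a .gguf name, before the engine tries to load it. The helper gguf_version is a hypothetical illustration, not part of the installer.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"


def gguf_version(path: Path) -> int:
    """Return the GGUF format version, or raise ValueError if the header is wrong."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} is not a GGUF file")
    # Assumption: little-endian file, as produced by typical GGUF converters.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```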
