USB-Uncensored-LLM is a fully portable, zero-dependency local AI environment that runs high-quality uncensored language models directly from a USB drive or SSD. Download your models once, plug into any Windows, macOS, Linux, or Android device, and start chatting — no system installations, no internet required after setup.

Introduction

Learn what USB-Uncensored-LLM is, how it works, and what makes it different from other local AI setups.

Quickstart

Get your portable AI running in three steps — install the engine, download a model, and launch the chat UI.

System Requirements

Check hardware specs, storage needs, and RAM requirements before you start.

Model Library

Browse the curated catalog of uncensored models — Gemma, Qwen, NemoMix, Dolphin, Phi, and more.

Platform Guides

Windows

Double-click install.bat to set up the engine and models, then launch with start-fast-chat.bat.

macOS

Run install.command in Terminal to download everything, then launch with start.command.

Linux

Use bash Linux/install.sh for a fully automated setup on Ubuntu, Debian, and compatible distros.

Android

Run natively on Android via Termux — llama.cpp is compiled on-device for maximum ARM64 performance.

How It Works

1. Initialize the Engine

Run the installer for your operating system. It downloads the ~50 MB Ollama engine binary into Shared/bin/; nothing is installed system-wide.

2. Download AI Models

Choose from the interactive model catalog or paste any HuggingFace GGUF URL. Models land in Shared/models/ and are shared across all platforms on the same drive.

3. Launch the Chat UI

Run the start script. The Ollama engine starts in the background, and your browser opens to the locally-served chat interface at http://localhost:3333.

4. Chat Anywhere

All conversations auto-save to Shared/chat_data/. Plug the drive into another computer, run the installer once for that OS, and your history travels with you.
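The drive layout that these steps describe can be sketched in a few lines of Python. This is only an illustration, not the project's own installer code: the function names `drive_layout`, `missing_dirs`, and `chat_ui_url` are hypothetical, and only the directory names and the port come from the steps above.

```python
from pathlib import Path

# Port and directory names are taken from the steps above.
# The helper functions themselves are hypothetical.
CHAT_UI_PORT = 3333


def drive_layout(drive_root):
    """Return the shared directories described in the steps above."""
    shared = Path(drive_root) / "Shared"
    return {
        "bin": shared / "bin",              # engine binary (step 1)
        "models": shared / "models",        # downloaded model weights (step 2)
        "chat_data": shared / "chat_data",  # auto-saved conversations (step 4)
    }


def missing_dirs(drive_root):
    """List the shared directories that do not exist yet on this drive."""
    return [name for name, path in drive_layout(drive_root).items()
            if not path.is_dir()]


def chat_ui_url(host="localhost", port=CHAT_UI_PORT):
    """Build the address the browser opens in step 3."""
    return f"http://{host}:{port}"
```

A check like `missing_dirs("/path/to/drive")` is one way to tell whether the installer has already been run on a given drive.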

Key Features

Zero-Install Setup

Ships with portable Python and isolated engine binaries. No system permissions, registry edits, or package managers required.

Shared Model Storage

Download 5 GB+ model weights once. Every OS launcher reads from the same Shared/ directory, so nothing is duplicated.

Hardware Acceleration

Automatically uses NVIDIA CUDA, Apple Metal GPU, or AVX CPU instructions depending on the host machine.
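The engine makes this choice on its own, so no user code is needed. Purely to illustrate the kind of decision involved, here is a rough Python sketch. It is not the engine's detection logic: `pick_backend` and `detect_backend` are hypothetical names, and treating the presence of the `nvidia-smi` tool as a sign of a usable NVIDIA driver is an assumption.

```python
import platform
import shutil


def pick_backend(system, has_nvidia_driver):
    """Map basic host facts to an acceleration label (illustrative only)."""
    if system == "Darwin":
        return "metal"   # Apple Metal GPU on macOS
    if has_nvidia_driver:
        return "cuda"    # NVIDIA CUDA on Windows or Linux
    return "cpu"         # fall back to AVX-capable CPU code paths


def detect_backend():
    """Apply pick_backend to the current machine using a simple heuristic."""
    # Assumption: nvidia-smi on PATH implies an installed NVIDIA driver.
    has_nvidia = shutil.which("nvidia-smi") is not None
    return pick_backend(platform.system(), has_nvidia)
```

Keeping the decision in a pure function (`pick_backend`) separate from the probing (`detect_backend`) makes the logic easy to check without the matching hardware.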

LAN Access

Access the chat UI from any phone or tablet on the same WiFi network — the server broadcasts its local IP at startup.
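To show how a LAN address like the one reported at startup could be found, here is a minimal Python sketch. It is an assumption about one common technique, not the project's actual code: `local_ip` and `lan_url` are hypothetical names, and port 3333 is taken from the chat UI address mentioned earlier.

```python
import socket


def local_ip():
    """Best-effort guess at this machine's LAN address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only asks the OS
        # which local interface would be used to reach this address.
        sock.connect(("192.0.2.1", 80))  # reserved documentation address
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route, fall back to loopback
    finally:
        sock.close()


def lan_url(port=3333):
    """Build the address other devices on the network would open."""
    return f"http://{local_ip()}:{port}"
```

Another device can only reach that address if the server listens on the LAN interface rather than on loopback alone.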

Persistent Chat History

Conversations are saved as JSON to the drive. Switch machines without losing context.
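As a rough picture of what saving a conversation as JSON can look like, consider the Python sketch below. The file naming and the `id`/`messages` schema are assumptions made for illustration; the project's real format in Shared/chat_data/ may differ, and `save_chat` and `load_chat` are hypothetical names.

```python
import json
from pathlib import Path


def save_chat(chat_dir, chat_id, messages):
    """Write one conversation to <chat_dir>/<chat_id>.json and return the path."""
    chat_dir = Path(chat_dir)
    chat_dir.mkdir(parents=True, exist_ok=True)
    path = chat_dir / f"{chat_id}.json"
    tmp = chat_dir / f"{chat_id}.json.tmp"
    payload = {"id": chat_id, "messages": messages}  # hypothetical schema
    tmp.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    # Rename over the old file so an interrupted write does not
    # damage the previously saved copy.
    tmp.replace(path)
    return path


def load_chat(path):
    """Read a conversation back from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Writing to a temporary file first is a deliberate choice for removable media, where the drive may be unplugged in the middle of a save.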

Custom Model Support

Download any .gguf model from HuggingFace directly onto the drive during the install flow.
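Before a custom model is downloaded, the pasted URL has to be turned into a file name under Shared/models/. The Python sketch below shows one way to do that. It is an illustration under stated assumptions: `gguf_filename` and `model_destination` are hypothetical names, and restricting URLs to HTTPS on huggingface.co is a choice made here, not a documented rule of the installer.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def gguf_filename(url):
    """Return the .gguf file name from a download URL, or raise ValueError."""
    parsed = urlparse(url)
    # Assumption: only HTTPS links on huggingface.co are accepted.
    if parsed.scheme != "https" or parsed.hostname != "huggingface.co":
        raise ValueError(f"not a HuggingFace HTTPS URL: {url}")
    # parsed.path excludes any query string such as ?download=true.
    name = PurePosixPath(unquote(parsed.path)).name
    if not name.lower().endswith(".gguf"):
        raise ValueError(f"URL does not point at a .gguf file: {url}")
    return name


def model_destination(url, models_dir="Shared/models"):
    """Build the drive-relative path where the model file would be stored."""
    return str(PurePosixPath(models_dir) / gguf_filename(url))
```

For a made-up link such as `https://huggingface.co/example-org/example-model-GGUF/resolve/main/example-model.Q4_K_M.gguf`, `model_destination` yields `Shared/models/example-model.Q4_K_M.gguf`.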
