System Requirements for USB-Uncensored-LLM

USB-Uncensored-LLM has a deliberately low hardware floor: it is designed to run on older machines, budget laptops, and portable drives rather than on a dedicated AI workstation. The most important resource for local LLM inference is RAM, not CPU clock speed. The model weights must fit into system memory (or GPU VRAM) during inference; if they do not, the OS begins swapping to disk and generation slows to a crawl. Choose a model whose Recommended RAM value in the RAM Requirements table fits within your machine's memory, and generation should stay usable even on older hardware.

Storage

The drive or folder where you place USB-Uncensored-LLM must have enough free space to hold the engine binaries and at least one model.

Requirement	Value
Minimum free space	8 GB
Recommended free space	16 GB
Minimum USB standard	USB 3.0
Recommended USB standard	USB 3.1 / USB-C SSD
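
As a rough illustration, the free-space thresholds in the table above can be checked before installing. The sketch below uses Python's standard `shutil.disk_usage`; the function names and the decimal definition of a gigabyte (10^9 bytes) are assumptions made for this example, not part of the project's install scripts.

```python
import shutil

MIN_FREE_GB = 8           # "Minimum free space" from the table above
RECOMMENDED_FREE_GB = 16  # "Recommended free space" from the table above

def classify_free_space(free_bytes: int) -> str:
    """Compare free bytes against the documented thresholds (1 GB = 10**9 bytes here)."""
    free_gb = free_bytes / 1e9
    if free_gb < MIN_FREE_GB:
        return "insufficient"
    if free_gb < RECOMMENDED_FREE_GB:
        return "minimum"
    return "recommended"

def check_drive(path: str) -> str:
    # shutil.disk_usage reports total, used, and free bytes for the
    # filesystem that contains `path`.
    return classify_free_space(shutil.disk_usage(path).free)

# Point this at the drive or folder that will hold USB-Uncensored-LLM.
print(check_drive("."))
```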

USB 2.0 drives will technically work, but model loading will be noticeably slow because the multi-gigabyte weight files must be read from the drive on every launch. USB 3.0 or faster is strongly recommended. Running from an internal SSD on the host machine is the fastest option, because internal drives typically read much faster than USB flash drives.
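
To put the difference between USB standards in numbers, the sketch below computes a best-case read time from the nominal signalling rates of USB 2.0 (480 Mbit/s) and USB 3.0 (5 Gbit/s). Real drives are slower because of protocol overhead and flash speed, so treat the results as lower bounds only. The function name is hypothetical.

```python
# Nominal signalling rates in megabits per second. Real-world throughput
# is lower, so these give a lower bound on load time, not a prediction.
NOMINAL_MBIT_PER_S = {"USB 2.0": 480, "USB 3.0": 5000}

def best_case_load_seconds(file_size_gb: float, standard: str) -> float:
    """Lower bound on the time to read a file of the given size (1 GB = 1000 MB)."""
    megabytes_per_s = NOMINAL_MBIT_PER_S[standard] / 8
    return file_size_gb * 1000 / megabytes_per_s

# Smallest and largest model files in the catalog.
for size_gb in (1.6, 7.0):
    for standard in NOMINAL_MBIT_PER_S:
        seconds = best_case_load_seconds(size_gb, standard)
        print(f"{size_gb} GB over {standard}: at least {seconds:.1f} s")
```

Even in this idealized case, the 7.0 GB model needs roughly two minutes over USB 2.0 versus about 11 seconds over USB 3.0.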

RAM Requirements

The table below lists each model in the catalog alongside its file size and the RAM needed to run it at a usable speed. “Minimum RAM” means the model will load and generate tokens; “Recommended RAM” means generation will be smooth with headroom for the OS and other applications.

Model	File Size	Minimum RAM	Recommended RAM
Gemma 2 2B Abliterated	1.6 GB	8 GB	8 GB
Phi-3.5 Mini 3.8B	2.2 GB	8 GB	8 GB
Dolphin 2.9 Llama 3 8B	4.9 GB	8 GB	16 GB
Gemma 4 E4B Ultra Uncensored Heretic	5.34 GB	8 GB	16 GB
Qwen 3.5 9B Uncensored Aggressive	5.2 GB	16 GB	16 GB
NemoMix Unleashed 12B	7.0 GB	16 GB	32 GB

If you are on a machine with 8 GB of RAM, start with Gemma 2 2B Abliterated. It is the fastest model in the catalog and performs remarkably well for its size.
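
The selection rule described in this section can be sketched as a small lookup over the table. The catalog values below are copied from the table; the `Model` type and `models_for` function are hypothetical names introduced for this example.

```python
from typing import NamedTuple

class Model(NamedTuple):
    name: str
    file_gb: float
    min_ram_gb: int
    rec_ram_gb: int

# Values copied from the RAM Requirements table.
CATALOG = [
    Model("Gemma 2 2B Abliterated", 1.6, 8, 8),
    Model("Phi-3.5 Mini 3.8B", 2.2, 8, 8),
    Model("Dolphin 2.9 Llama 3 8B", 4.9, 8, 16),
    Model("Gemma 4 E4B Ultra Uncensored Heretic", 5.34, 8, 16),
    Model("Qwen 3.5 9B Uncensored Aggressive", 5.2, 16, 16),
    Model("NemoMix Unleashed 12B", 7.0, 16, 32),
]

def models_for(ram_gb: int) -> dict:
    """Split the catalog into models that run comfortably and models that merely load."""
    comfortable = [m.name for m in CATALOG if ram_gb >= m.rec_ram_gb]
    tight = [m.name for m in CATALOG if m.min_ram_gb <= ram_gb < m.rec_ram_gb]
    return {"comfortable": comfortable, "tight": tight}

result = models_for(8)
print("Comfortable:", result["comfortable"])
print("Will load, little headroom:", result["tight"])
```

On an 8 GB machine this lists the two smallest models as comfortable and the two 8B-class entries as loadable with little headroom, which matches the advice to start with Gemma 2 2B Abliterated.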

GPU Acceleration

Hardware acceleration is fully automatic — no configuration files to edit, no environment variables to set. When the Ollama engine starts, it detects the available compute hardware on the host machine and uses the best option available.

Hardware	Acceleration
NVIDIA GPU	CUDA (auto-detected if NVIDIA drivers are installed)
Apple Silicon (M1 / M2 / M3)	Metal GPU acceleration, enabled by default
Intel / AMD (no discrete GPU)	CPU inference using AVX instruction sets

A GPU is not required. Every model in the catalog runs on CPU-only machines. GPU acceleration improves tokens-per-second generation speed significantly, but the output quality of the model is identical regardless of whether a GPU is used.
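
If you want a quick guess at which row of the table applies to your machine, the heuristic below may help. It is not the engine's own detection logic: it only assumes that `nvidia-smi` is on the PATH when NVIDIA drivers are installed, and that Python reports `arm64` on Apple Silicon (a Python running under Rosetta would report `x86_64` instead). The function name is hypothetical.

```python
import platform
import shutil

def likely_acceleration(system: str, machine: str, has_nvidia_smi: bool) -> str:
    """Guess which table row applies. A heuristic, not the engine's detection logic."""
    if has_nvidia_smi:
        return "CUDA"
    if system == "Darwin" and machine == "arm64":
        return "Metal"
    return "CPU"

print(
    likely_acceleration(
        platform.system(),
        platform.machine(),
        shutil.which("nvidia-smi") is not None,
    )
)
```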

Operating System

USB-Uncensored-LLM ships OS-specific launcher scripts for the following platforms:

Windows 10 / 11 (x86-64)
macOS 11 Big Sur and later (Intel and Apple Silicon)
Ubuntu / Debian Linux (x86-64)
Android (ARM64, via Termux from F-Droid)

Each OS has a dedicated subdirectory (Windows/, Mac/, Linux/, Android/) containing its own install and start scripts. The large downloaded files live in Shared/ and are reused by every platform.
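
The mapping from host platform to launcher subdirectory can be sketched as follows. The directory names come from the paragraph above; the `launcher_dir` function is hypothetical, and detecting Termux through a `PREFIX` environment variable containing `com.termux` is an assumption about a typical Termux installation.

```python
import os
import platform

# Directory names as listed above, keyed by platform.system() values.
SUBDIRS = {"Windows": "Windows", "Darwin": "Mac", "Linux": "Linux", "Android": "Android"}

def launcher_dir(system: str, prefix: str = "") -> str:
    """Return the subdirectory whose scripts match the host platform."""
    # Assumption: Termux sets PREFIX to a path under com.termux, while
    # platform.system() may still report "Linux" there.
    if "com.termux" in prefix:
        return "Android"
    if system not in SUBDIRS:
        raise ValueError(f"Unsupported platform: {system}")
    return SUBDIRS[system]

print(launcher_dir(platform.system(), os.environ.get("PREFIX", "")))
```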

Internet Access

Internet access is only required during the initial setup phase. The install script downloads the Ollama engine binary (~50 MB) and your chosen model weights (1.6 GB to 7 GB depending on your selection). Once both are saved to the Shared/ folder on your drive, the system operates fully offline — no internet connection is needed to run, chat, or use any feature of the UI.
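
One way to confirm that setup finished is to ask the running engine which models it has stored locally. The sketch below queries Ollama's `/api/tags` endpoint over the loopback interface, so it works with the network disconnected. It assumes the engine listens on Ollama's default address `127.0.0.1:11434`; the project's launcher scripts may configure a different port. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Assumption: Ollama's default address. The launcher may use another port.
OLLAMA_URL = "http://127.0.0.1:11434"

def model_names(payload: dict) -> list:
    """Extract model names from the JSON body returned by /api/tags."""
    return [entry["name"] for entry in payload.get("models", [])]

def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 2.0):
    """Return installed model names, or None if the engine is not reachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        return None

if __name__ == "__main__":
    models = list_local_models()
    print("Engine not reachable" if models is None else models)
```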

Android-Specific Requirements

Running USB-Uncensored-LLM natively on Android via Termux has slightly different requirements from the desktop platforms:

RAM: 6 GB minimum (8 GB+ recommended). Only the 2B model runs well on 6 GB devices; larger models require 12 GB or more.
Processor: ARM64 (virtually all modern Android phones and tablets qualify).
Termux: Must be installed from F-Droid — the Play Store version is outdated and will not work correctly.
Internet: Wi-Fi or mobile data is required for initial setup (engine + model downloads). After setup, the engine runs offline.
Battery: LLM inference is CPU-intensive. Plug in your charger before running extended sessions — inference drains battery significantly faster than typical app usage.
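
To see how much RAM a device actually exposes, you can read `/proc/meminfo`, which Linux and Android kernels provide. This sketch assumes the file is readable from Termux. The reported total is usually somewhat below the marketed capacity because part of the memory is reserved, so compare it loosely against the figures above. The function name is hypothetical.

```python
from pathlib import Path

def total_ram_gib(meminfo_text: str) -> float:
    """Parse the MemTotal line of /proc/meminfo (reported in kB) into GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])
            return kib / (1024 ** 2)
    raise ValueError("MemTotal not found")

meminfo = Path("/proc/meminfo")
if meminfo.exists():
    print(f"Total RAM: {total_ram_gib(meminfo.read_text()):.1f} GiB")
```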

On Android, expect generation speeds of approximately 3–10 tokens per second on the 2B model, compared to 30–50+ tokens per second on a PC with a discrete GPU. Run termux-wake-lock before starting the engine to prevent Android from suspending the process in the background.
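
To make these rates concrete, the arithmetic below converts tokens per second into waiting time for a single reply. The 300-token reply length is a hypothetical figure chosen for illustration; the speed ranges are the ones quoted above.

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to generate a reply of the given length at a steady rate."""
    return tokens / tokens_per_second

REPLY_TOKENS = 300  # hypothetical reply length, for illustration only

RANGES = {
    "Android, 2B model": (3, 10),
    "PC with discrete GPU": (30, 50),
}

for label, (slow, fast) in RANGES.items():
    worst = generation_seconds(REPLY_TOKENS, slow)
    best = generation_seconds(REPLY_TOKENS, fast)
    print(f"{label}: {best:.0f} to {worst:.0f} s for {REPLY_TOKENS} tokens")
```

At these rates a 300-token reply takes 30 to 100 seconds on Android and 6 to 10 seconds on the PC.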
