USB-Uncensored-LLM has a deliberately low hardware floor: it is designed to run on older machines, budget laptops, and portable drives rather than requiring a dedicated AI workstation. The most important resource for local LLM inference is RAM, not CPU clock speed. The model weights must fit into system memory (or GPU VRAM) during inference; if they do not, the OS begins swapping to disk and generation slows to a crawl. Choose a model whose size fits comfortably within your available RAM using the RAM Requirements table below, and generation should stay responsive even on older hardware.
Storage
The drive or folder where you place USB-Uncensored-LLM must have enough free space to hold the engine binaries and at least one model.

| Requirement | Value |
|---|---|
| Minimum free space | 8 GB |
| Recommended free space | 16 GB |
| Minimum USB standard | USB 3.0 |
| Recommended USB standard | USB 3.1 / USB-C SSD |
USB 2.0 drives will technically work, but model loading will be noticeably slow because the multi-gigabyte weight files must be read from the drive on every launch. USB 3.0 or faster is strongly recommended. Running from an internal SSD on the host machine is the fastest option — model loading is near-instant at internal drive read speeds.
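To check a drive against the free-space figures in the table above before installing, a short script can help. The following is a minimal sketch, not part of the project's install scripts: the function names `free_space_gb` and `classify_free_space` are hypothetical, and it assumes the table's "GB" means decimal gigabytes (10^9 bytes).

```python
import shutil

# Thresholds copied from the Storage table (assumed to be decimal GB).
MIN_FREE_GB = 8
RECOMMENDED_FREE_GB = 16


def free_space_gb(path: str) -> float:
    """Return the free space on the drive containing `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9


def classify_free_space(free_gb: float) -> str:
    """Compare a free-space figure against the documented thresholds."""
    if free_gb < MIN_FREE_GB:
        return "insufficient"
    if free_gb < RECOMMENDED_FREE_GB:
        return "minimum"
    return "recommended"


if __name__ == "__main__":
    # "." is the current folder; pass the USB drive's path to check that instead.
    free = free_space_gb(".")
    print(f"{free:.1f} GB free: {classify_free_space(free)}")
```

Run it from the folder where you plan to place USB-Uncensored-LLM, or pass that folder's path to `free_space_gb`.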
RAM Requirements
The table below lists each model in the catalog alongside its file size and the RAM needed to run it at a usable speed. “Minimum RAM” means the model will load and generate tokens; “Recommended RAM” means generation will be smooth with headroom for the OS and other applications.

| Model | File Size | Minimum RAM | Recommended RAM |
|---|---|---|---|
| Gemma 2 2B Abliterated | 1.6 GB | 8 GB | 8 GB |
| Phi-3.5 Mini 3.8B | 2.2 GB | 8 GB | 8 GB |
| Dolphin 2.9 Llama 3 8B | 4.9 GB | 8 GB | 16 GB |
| Gemma 4 E4B Ultra Uncensored Heretic | 5.34 GB | 8 GB | 16 GB |
| Qwen 3.5 9B Uncensored Aggressive | 5.2 GB | 16 GB | 16 GB |
| NemoMix Unleashed 12B | 7.0 GB | 16 GB | 32 GB |
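
The table can also be read programmatically. The sketch below copies the rows above into a list and filters them by installed RAM; `CATALOG` and `models_for_ram` are hypothetical names for illustration, and the RAM figure is passed in by hand because the Python standard library has no portable way to query it.

```python
# Rows copied from the RAM Requirements table:
# (model name, file size in GB, minimum RAM in GB, recommended RAM in GB)
CATALOG = [
    ("Gemma 2 2B Abliterated", 1.6, 8, 8),
    ("Phi-3.5 Mini 3.8B", 2.2, 8, 8),
    ("Dolphin 2.9 Llama 3 8B", 4.9, 8, 16),
    ("Gemma 4 E4B Ultra Uncensored Heretic", 5.34, 8, 16),
    ("Qwen 3.5 9B Uncensored Aggressive", 5.2, 16, 16),
    ("NemoMix Unleashed 12B", 7.0, 16, 32),
]


def models_for_ram(total_ram_gb: float, comfortable: bool = True) -> list[str]:
    """List models whose RAM requirement fits within `total_ram_gb`.

    With comfortable=True the "Recommended RAM" column is used;
    otherwise the "Minimum RAM" column is used.
    """
    column = 3 if comfortable else 2
    return [row[0] for row in CATALOG if row[column] <= total_ram_gb]


if __name__ == "__main__":
    for ram in (8, 16, 32):
        print(f"{ram} GB RAM:", ", ".join(models_for_ram(ram)))
```

For example, on an 8 GB machine the recommended column leaves the two smallest models, while `comfortable=False` also admits the two 8B-class models that list 8 GB as their minimum.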
GPU Acceleration
Hardware acceleration is fully automatic: there are no configuration files to edit and no environment variables to set. When the Ollama engine starts, it detects the available compute hardware on the host machine and uses the best option available.

| Hardware | Acceleration |
|---|---|
| NVIDIA GPU | CUDA (auto-detected if NVIDIA drivers are installed) |
| Apple Silicon (M1 / M2 / M3) | Metal GPU acceleration, enabled by default |
| Intel / AMD (no discrete GPU) | CPU inference using AVX instruction sets |
A GPU is not required. Every model in the catalog runs on CPU-only machines. GPU acceleration improves tokens-per-second generation speed significantly, but the output quality of the model is identical regardless of whether a GPU is used.
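If you want a rough idea of which row of the table applies to a machine before launching, the heuristic below may help. It is only an illustration and does not reproduce the engine's own detection: it assumes that an `nvidia-smi` executable on the PATH indicates installed NVIDIA drivers, and that a native Python on Apple Silicon reports `arm64`. The function names are hypothetical.

```python
import platform
import shutil


def guess_acceleration(system: str, machine: str, has_nvidia_smi: bool) -> str:
    """Map basic host facts onto the rows of the acceleration table."""
    if has_nvidia_smi:
        return "CUDA"
    if system == "Darwin" and machine == "arm64":
        return "Metal"
    return "CPU"


def guess_for_this_machine() -> str:
    """Apply the heuristic to the machine running this script."""
    return guess_acceleration(
        platform.system(),
        platform.machine(),
        shutil.which("nvidia-smi") is not None,
    )


if __name__ == "__main__":
    print("Likely acceleration:", guess_for_this_machine())
```

A "CPU" result is not a problem: as noted above, every model in the catalog runs without a GPU, only more slowly.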
Operating System
USB-Uncensored-LLM ships OS-specific launcher scripts for the following platforms:

- Windows 10 / 11 (x86-64)
- macOS 11 Big Sur and later (Intel and Apple Silicon)
- Ubuntu / Debian Linux (x86-64)
- Android (ARM64, via Termux from F-Droid)

Each platform has its own folder (Windows/, Mac/, Linux/, Android/) containing its own install and start scripts. The heavy files in Shared/ are shared between all of them.
Internet Access
Internet access is only required during the initial setup phase. The install script downloads the Ollama engine binary (~50 MB) and your chosen model weights (1.6 GB to 7 GB depending on your selection). Once both are saved to the Shared/ folder on your drive, the system operates fully offline: no internet connection is needed to run, chat, or use any feature of the UI.
Android-Specific Requirements
Running USB-Uncensored-LLM natively on Android via Termux has slightly different requirements from the desktop platforms:

- RAM: 6 GB minimum (8 GB+ recommended). Only the 2B model runs well on 6 GB devices; larger models require 12 GB or more.
- Processor: ARM64 (virtually all modern Android phones and tablets qualify).
- Termux: Must be installed from F-Droid — the Play Store version is outdated and will not work correctly.
- Internet: WiFi or mobile data is required for initial setup (engine + model downloads). After setup, the engine runs offline.
- Battery: LLM inference is CPU-intensive. Plug in your charger before running extended sessions — inference drains battery significantly faster than typical app usage.

On Android, expect generation speeds of approximately 3–10 tokens per second on the 2B model, compared to 30–50+ tokens per second on a PC with a discrete GPU. Run termux-wake-lock before starting the engine to prevent Android from suspending the process in the background.
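
To put those rates in practical terms, the arithmetic below converts them into waiting time for a reply of a given length. It is a simple sketch: `estimate_range` is a hypothetical helper, the 300-token reply length is an arbitrary example, and the upper PC figure is treated as exactly 50 tokens per second even though the text says "50+".

```python
def estimate_range(tokens: int, slow_tps: float, fast_tps: float) -> tuple[float, float]:
    """Return (best_case_seconds, worst_case_seconds) for generating `tokens`."""
    return (tokens / fast_tps, tokens / slow_tps)


# Token rates quoted in this section (tokens per second).
RATES = {
    "Android, 2B model": (3, 10),
    "PC with discrete GPU": (30, 50),
}

if __name__ == "__main__":
    reply_tokens = 300  # arbitrary example length
    for label, (slow, fast) in RATES.items():
        best, worst = estimate_range(reply_tokens, slow, fast)
        print(f"{label}: {best:.0f} to {worst:.0f} seconds for {reply_tokens} tokens")
```

Under these assumptions, a 300-token reply takes roughly 30 to 100 seconds on Android and 6 to 10 seconds on the PC.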