USB-Uncensored-LLM has a deliberately low hardware floor. It is designed to run on older machines, budget laptops, and portable drives rather than on a dedicated AI workstation. The most important resource for local LLM inference is RAM, not CPU clock speed. The model weights must fit in system memory (or GPU VRAM) during inference. If they do not, the OS starts swapping to disk and generation slows to a crawl. Use the table below to pick a model whose size fits comfortably within your available RAM, and you should get a smooth experience even on older hardware.

Storage

The drive or folder where you place USB-Uncensored-LLM must have enough free space to hold the engine binaries and at least one model.
| Requirement | Value |
|---|---|
| Minimum free space | 8 GB |
| Recommended free space | 16 GB |
| Minimum USB standard | USB 3.0 |
| Recommended USB standard | USB 3.1 / USB-C SSD |
USB 2.0 drives fall below the minimum listed above, but they will still technically work. Model loading will be noticeably slow because the multi-gigabyte weight files must be read from the drive on every launch, so USB 3.0 or faster is strongly recommended. Running from an internal SSD on the host machine is the fastest option, because internal drives read far faster than USB-attached storage.
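As a rough illustration of why the bus matters, the time to load a model is roughly the file size divided by the drive's sustained read speed. The throughput figures below are assumptions chosen for illustration, not measurements of any particular drive. Real drives vary widely within each class, and the project does not publish these numbers.

```python
# Rough load-time estimate: seconds ≈ file size / sustained read speed.
# ASSUMPTION: these throughputs are illustrative ballpark values, not
# measurements; real drives vary widely within each class.
ASSUMED_READ_MB_PER_S = {
    "USB 2.0": 35,
    "USB 3.0": 100,
    "USB 3.1 / USB-C SSD": 400,
    "Internal NVMe SSD": 2000,
}


def load_seconds(file_size_gb: float, read_mb_per_s: float) -> float:
    """Approximate time to read a model file once, in seconds (decimal units)."""
    return file_size_gb * 1000 / read_mb_per_s


if __name__ == "__main__":
    size_gb = 4.9  # Dolphin 2.9 Llama 3 8B file size, from the RAM table
    for bus, speed in ASSUMED_READ_MB_PER_S.items():
        print(f"{bus:>22}: ~{load_seconds(size_gb, speed):.0f} s")
```

Under these assumptions, the same 4.9 GB file takes minutes to load over USB 2.0 but only seconds from an internal SSD. This cost is paid on every launch.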

RAM Requirements

The table below lists each model in the catalog alongside its file size and the RAM needed to run it at a usable speed. “Minimum RAM” means the model will load and generate tokens; “Recommended RAM” means generation will be smooth with headroom for the OS and other applications.
| Model | File size | Minimum RAM | Recommended RAM |
|---|---|---|---|
| Gemma 2 2B Abliterated | 1.6 GB | 8 GB | 8 GB |
| Phi-3.5 Mini 3.8B | 2.2 GB | 8 GB | 8 GB |
| Dolphin 2.9 Llama 3 8B | 4.9 GB | 8 GB | 16 GB |
| Gemma 4 E4B Ultra Uncensored Heretic | 5.34 GB | 8 GB | 16 GB |
| Qwen 3.5 9B Uncensored Aggressive | 5.2 GB | 16 GB | 16 GB |
| NemoMix Unleashed 12B | 7.0 GB | 16 GB | 32 GB |
If you are on a machine with 8 GB of RAM, start with Gemma 2 2B Abliterated. It is the fastest model in the catalog and performs remarkably well for its size.
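The RAM table can be applied mechanically. The following is a minimal sketch, assuming you already know how much RAM the machine has. The `models_for` helper is hypothetical and is not part of the project's scripts. It simply filters the catalog rows copied from the table.

```python
# Catalog rows copied from the RAM table:
# (name, file size GB, minimum RAM GB, recommended RAM GB)
CATALOG = [
    ("Gemma 2 2B Abliterated", 1.6, 8, 8),
    ("Phi-3.5 Mini 3.8B", 2.2, 8, 8),
    ("Dolphin 2.9 Llama 3 8B", 4.9, 8, 16),
    ("Gemma 4 E4B Ultra Uncensored Heretic", 5.34, 8, 16),
    ("Qwen 3.5 9B Uncensored Aggressive", 5.2, 16, 16),
    ("NemoMix Unleashed 12B", 7.0, 16, 32),
]


def models_for(ram_gb: float, smooth: bool = False) -> list[str]:
    """Hypothetical helper: list catalog models that fit in `ram_gb`.

    smooth=False checks the "Minimum RAM" column (model loads and generates).
    smooth=True checks the "Recommended RAM" column (headroom for the OS).
    """
    return [
        name
        for name, _size_gb, min_ram, rec_ram in CATALOG
        if (rec_ram if smooth else min_ram) <= ram_gb
    ]


if __name__ == "__main__":
    print(models_for(8, smooth=True))
```

On an 8 GB machine, `models_for(8, smooth=True)` returns only the two smallest models. `models_for(8)` adds the two models that merely load at 8 GB.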

GPU Acceleration

Hardware acceleration is fully automatic — no configuration files to edit, no environment variables to set. When the Ollama engine starts, it detects the available compute hardware on the host machine and uses the best option available.
| Hardware | Acceleration |
|---|---|
| NVIDIA GPU | CUDA (auto-detected if NVIDIA drivers are installed) |
| Apple Silicon (M1 / M2 / M3) | Metal GPU acceleration, enabled by default |
| Intel / AMD (no discrete GPU) | CPU inference using AVX instruction sets |
A GPU is not required. Every model in the catalog runs on CPU-only machines. GPU acceleration improves tokens-per-second generation speed significantly, but the output quality of the model is identical regardless of whether a GPU is used.
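You do not need to do anything to enable acceleration. Still, you may want a quick way to guess which row of the table applies to a host. The sketch below mirrors the table with simple heuristics. It is illustrative only and is not Ollama's actual detection logic. It assumes that `nvidia-smi` being on the PATH indicates installed NVIDIA drivers, since that tool ships with the driver.

```python
import platform
import shutil


def likely_backend(system: str, machine: str, has_nvidia_smi: bool) -> str:
    """Guess which acceleration row of the table applies to a host.

    ILLUSTRATIVE ONLY: simple heuristics mirroring the table; this is not
    the engine's real hardware-detection code.
    """
    if has_nvidia_smi:
        # nvidia-smi is installed with the NVIDIA driver, which suggests CUDA.
        return "CUDA (NVIDIA GPU)"
    if system == "Darwin" and machine == "arm64":
        # Note: a Python running under Rosetta reports "x86_64" instead.
        return "Metal (Apple Silicon)"
    return "CPU (AVX)"


if __name__ == "__main__":
    print(likely_backend(platform.system(), platform.machine(),
                         shutil.which("nvidia-smi") is not None))
```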

Operating System

USB-Uncensored-LLM ships OS-specific launcher scripts for the following platforms:
  • Windows 10 / 11 (x86-64)
  • macOS 11 Big Sur and later (Intel and Apple Silicon)
  • Ubuntu / Debian Linux (x86-64)
  • Android (ARM64, via Termux from F-Droid)
Each OS has a dedicated subdirectory (Windows/, Mac/, Linux/, Android/) containing its own install and start scripts. The large files in Shared/ (the engine and model weights) are used by all of them, so they are stored only once.
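The layout can be summarized as "pick the per-OS folder, but always read from Shared/". The following is a minimal sketch of that mapping. The folder names come from this page. The helper functions and the Android check are hypothetical. The sketch assumes that Android is recognized by the `ANDROID_ROOT` environment variable, which Android sets for app processes such as Termux.

```python
import os
import platform
from pathlib import Path

# Folder names from this page; the helpers below are hypothetical.
LAUNCHER_DIRS = {"Windows": "Windows", "Darwin": "Mac", "Linux": "Linux"}


def launcher_dir(drive_root: Path, system: str, is_android: bool) -> Path:
    """Return the per-OS folder holding the install and start scripts."""
    if is_android:
        return drive_root / "Android"
    try:
        return drive_root / LAUNCHER_DIRS[system]
    except KeyError:
        raise ValueError(f"unsupported OS: {system}") from None


def shared_dir(drive_root: Path) -> Path:
    """Every launcher reads the engine and model weights from one folder."""
    return drive_root / "Shared"


if __name__ == "__main__":
    # ASSUMPTION: ANDROID_ROOT is present in the environment on Android/Termux.
    root = Path(".")
    print(launcher_dir(root, platform.system(), "ANDROID_ROOT" in os.environ))
    print(shared_dir(root))
```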

Internet Access

Internet access is only required during the initial setup phase. The install script downloads the Ollama engine binary (~50 MB) and your chosen model weights (1.6 GB to 7 GB depending on your selection). Once both are saved to the Shared/ folder on your drive, the system operates fully offline — no internet connection is needed to run, chat, or use any feature of the UI.
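Because everything is downloaded once into Shared/, it is worth confirming there is room before starting setup. The sketch below compares the approximate download size (the ~50 MB engine plus the chosen model) against the free space reported by Python's `shutil.disk_usage`. The `download_fits` helper and its 0.5 GB safety margin are assumptions for illustration, not project requirements.

```python
import shutil

ENGINE_GB = 0.05  # ~50 MB engine binary, per the figure above


def free_gb(path: str) -> float:
    """Free space on the drive containing `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9


def download_fits(model_gb: float, free: float, margin_gb: float = 0.5) -> bool:
    """Hypothetical check: engine + model + an assumed margin fit in `free` GB."""
    return ENGINE_GB + model_gb + margin_gb <= free


if __name__ == "__main__":
    available = free_gb(".")
    print(f"free: {available:.1f} GB; "
          f"NemoMix 12B (7.0 GB) fits: {download_fits(7.0, available)}")
```

With the 8 GB minimum free space from the Storage table, even the largest 7.0 GB model fits under this margin. Only one such model fits, however, which is why 16 GB is recommended.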

Android-Specific Requirements

Running USB-Uncensored-LLM natively on Android via Termux has slightly different requirements from the desktop platforms:
  • RAM: 6 GB minimum (8 GB+ recommended). Only the 2B model runs well on 6 GB devices; larger models require 12 GB or more.
  • Processor: ARM64 (virtually all modern Android phones and tablets qualify).
  • Termux: Must be installed from F-Droid — the Play Store version is outdated and will not work correctly.
  • Internet: WiFi or mobile data is required for initial setup (engine + model downloads). After setup, the engine runs offline.
  • Battery: LLM inference is CPU-intensive. Plug in your charger before running extended sessions — inference drains battery significantly faster than typical app usage.
On Android, expect generation speeds of approximately 3–10 tokens per second on the 2B model, compared to 30–50+ tokens per second on a PC with a discrete GPU. Run termux-wake-lock before starting the engine to prevent Android from suspending the process in the background.
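To put those rates in perspective, reply time is simply the reply length divided by the generation rate. The sketch below assumes a hypothetical 300-token answer and ignores prompt processing and model load time. It uses the tokens-per-second ranges quoted above.

```python
def reply_seconds(tokens: int, tokens_per_s: float) -> float:
    """Time to generate `tokens` tokens at a steady rate, in seconds."""
    return tokens / tokens_per_s


if __name__ == "__main__":
    reply = 300  # ASSUMPTION: a medium-length answer of about 300 tokens
    rates = {"Android, 2B model": (3, 10), "PC with discrete GPU": (30, 50)}
    for label, (slow, fast) in rates.items():
        print(f"{label}: {reply_seconds(reply, fast):.0f}-"
              f"{reply_seconds(reply, slow):.0f} s")
```

Under these assumptions, the Android reply takes roughly 30 to 100 seconds, versus about 6 to 10 seconds on a GPU-equipped PC. Long sessions can therefore keep the CPU busy for extended periods, which is why the charger and wake-lock advice matter.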
