Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/techjarves/USB-Uncensored-LLM/llms.txt

Use this file to discover all available pages before exploring further.

Android support in USB-Uncensored-LLM works through Termux, a Linux terminal emulator that runs a full Debian-like environment on your device. Unlike the Windows, macOS, and Linux installers — which use a pre-built Ollama binary — the Android installer clones the llama.cpp source repository and compiles it natively on your device for your exact ARM64 processor. This compilation step takes 10–30 minutes but produces a binary perfectly tuned to your hardware, giving maximum inference performance. No PC is required at any stage.

Prerequisites

Install Termux from F-Droid only. The Play Store version of Termux is severely outdated and broken — it will not work. Download the correct version here:👉 https://f-droid.org/en/packages/com.termux/
  • Termux installed from F-Droid (link above)
  • Android device with an ARM64 processor (virtually all modern Android phones and tablets)
  • 6 GB+ RAM (8 GB+ strongly recommended; only the 2B model runs reliably on 6 GB devices)
  • ~4 GB free storage (more if you want to keep the compiled build artifacts)
  • WiFi or mobile data for initial setup — required for the apt package installation, the llama.cpp git clone, and the model download

Installation

1

Install Termux from F-Droid

Navigate to https://f-droid.org/en/packages/com.termux/ on your Android device and install the Termux APK. Open Termux after installation and wait for the bootstrap to complete.
2

Copy the project to your device

Transfer the USB-Uncensored-LLM folder to your Android device using one of these methods:
  • USB OTG cable — connect the USB drive and copy the folder to internal storage
  • File transfer — copy over MTP/USB from a PC
  • Git clone — inside Termux, run git clone <repo-url>
3

Navigate to the project folder in Termux

In Termux, use cd to navigate to the USB-Uncensored-LLM root directory. If you copied it to your Android Downloads folder, it will typically be at ~/storage/downloads/USB-Uncensored-LLM. Run termux-setup-storage first if you need to access shared storage.
4

Run the Android installer

bash Android/install.sh
The script detects the Termux environment, requests storage permissions, and begins setup.
5

Build tools are installed automatically

The installer updates and upgrades the package list via apt, then installs build tools via pkg:
apt update -y
apt full-upgrade -y
pkg install -y clang cmake git wget ninja python
This ensures all compilers and build utilities needed for the llama.cpp compilation are present.
6

llama.cpp is cloned and compiled natively

The installer clones the llama.cpp repository to Shared/bin/llama.cpp/ and compiles it for your ARM64 processor:
cmake -B build -GNinja -DLLAMA_BUILD_SERVER=ON -DLLAMA_BUILD_TESTS=OFF
cmake --build build --config Release --target llama-server
The compiled llama-server binary is copied to Shared/bin/llama-server-android.
Run termux-wake-lock before starting compilation to prevent Android from killing the Termux process mid-build. The compilation takes 10–30 minutes depending on your device. Keep Termux in the foreground during this step.
7

Choose a model from the Android catalog

After compilation, the installer presents the Android-specific model catalog, which is optimized for mobile devices:
[1] Gemma 2 2B Abliterated (1.6 GB) [UNCENSORED - FASTEST]
[2] SmolLM2 1.7B Uncensored (1.0 GB) [UNCENSORED - LIGHT]
[3] Qwen2.5 1.5B Instruct (1.1 GB) [STANDARD - MULTILINGUAL]
[4] Phi 3.5 Mini 3.8B (2.2 GB) [STANDARD - SMART]
[5] Qwen 3.5 9B Uncensored (5.2 GB) [UNCENSORED - HEAVY - FOR 12GB+ RAM]
[C] CUSTOM - Paste HuggingFace .gguf direct link
[0] Skip downloading (I already have models in Shared/models/)
The installer uses wget -c to download your selected model, which supports resuming interrupted downloads.

Launching

bash Android/start.sh
The launcher starts Shared/bin/llama-server-android with the first .gguf file found in Shared/models/, acquires a Termux wakelock to prevent Android from killing the process, and waits up to 90 seconds for the server to become ready on port 8080. Once the engine is online, you are given a choice of UI:
[1] USB FastChat UI (Beautiful Dark Mode, Auto-Saves)
[2] Llama.cpp Default UI (Classic Raw Developer UI)
Selecting [1] opens http://localhost:3333 (the full chat server). Selecting [2] opens http://localhost:8080 (the raw llama.cpp interface). The browser is opened automatically using Android’s intent system:
am start -a android.intent.action.VIEW -d "$TARGET_URL"

Performance Tips

Maximize performance and stability on Android:
  • Run termux-wake-lock before starting the AI — this prevents Android from suspending or killing the inference process
  • Keep Termux in the foreground or use Android’s split-screen mode
  • Close all other apps before running the model to free RAM
  • Use the 2B model on devices with less than 12 GB RAM — the 9B model requires 12 GB+ to run without swapping
  • Plug in your charger — LLM inference is CPU-intensive and drains battery rapidly
  • Expect approximately 3–10 tokens per second on a 2B model (compared to 30–50+ tokens/sec on a GPU-equipped PC)

Android vs Desktop Architecture

The Android installer uses a fundamentally different execution path from the desktop platforms:
AspectDesktop (Windows / macOS / Linux)Android
EnginePre-built Ollama binary (ollama-windows.exe, ollama-darwin, ollama-linux)Natively compiled llama-server-android from llama.cpp source
API portOllama API on port 11434OpenAI-compatible API on port 8080
Chat server flagStandard modepython chat_server.py --no-browser --llama-cpp
Model formatImported into Ollama’s internal registryRaw .gguf files passed directly via -m flag
The --llama-cpp flag tells the Python chat server to translate Ollama API calls into the OpenAI-compatible format that llama-server’s /v1/ endpoint expects. This means the same FastChatUI.html interface works identically on all platforms. The llama-server is started with -c 2048 -cb -np 4 --port 8080, limiting context to 2048 tokens to fit in mobile RAM while allowing up to 4 parallel slots.

LAN Access from Android

Once bash Android/start.sh is running and the engine is online, other devices on the same WiFi network can access your Android-hosted AI. The launcher detects your local IP address and displays the network URL in the terminal. Simply navigate to that address in any browser on the same network:
http://192.168.x.x:3333
This lets you use a tablet as the AI server while browsing from a PC, or share the session with other users on your local network.

Build docs developers (and LLMs) love