Most issues with USB-Uncensored-LLM fall into three categories: engine not found (ran start before install), insufficient RAM (model too large for the host machine), or platform-specific permission problems. The solutions below cover every known issue across Windows, macOS, Linux, and Android.

Windows Issues

Cause: Windows App Execution Aliases intercept python or winget commands typed at a terminal and open the Microsoft Store instead of executing the binary. When a .bat file that calls python hits this interception, Windows opens the Store pop-up and the script exits immediately.
Fix:
  1. Run the .bat file from a Command Prompt (cmd.exe) window rather than double-clicking it from Explorer.
  2. Alternatively, right-click the .bat file and select Run as Administrator.
  3. To permanently fix the issue: go to Settings → Apps → Advanced app settings → App execution aliases and disable the aliases for python.exe and python3.exe.
The start script already bundles a portable Python at Shared/python/python.exe and downloads one automatically on first run if no system Python is found. The App Execution Alias issue only affects the brief moment before the bundled Python is located.
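To see whether a bare python command on a given machine resolves into the alias folder, a small diagnostic along these lines may help. It is a sketch, not part of the project's scripts: the function name is hypothetical, and it assumes that alias stubs live in a folder named WindowsApps. A Python installed from the Microsoft Store resolves through the same folder, so a match means "alias folder", not necessarily "broken".

```python
import shutil
from pathlib import PureWindowsPath
from typing import Optional


def resolves_to_windowsapps(resolved_path: Optional[str]) -> bool:
    # Hypothetical helper. Assumption: App Execution Alias stubs sit in a
    # directory called "WindowsApps" inside the user's local app data.
    if not resolved_path:
        return False
    parts = [part.lower() for part in PureWindowsPath(resolved_path).parts]
    return "windowsapps" in parts


if __name__ == "__main__":
    found = shutil.which("python")  # None when nothing is on PATH
    print("python resolves to:", found)
    print("inside the alias folder:", resolves_to_windowsapps(found))
```

If the result is True and no Store Python is installed, disabling the aliases as described in step 3 is the likely remedy.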
Cause: Windows/start-fast-chat.bat was run before Windows/install.bat completed. The start script checks for the engine binary at Shared/bin/ollama-windows.exe and exits with this error if it is not present.
Fix: Run Windows/install.bat first. The installer downloads the Ollama engine (step 6 of 7) and imports your selected models into the registry. The start script cannot function without the engine binary.
If install.bat was interrupted mid-download, re-run it. The installer skips files that already exist and re-downloads only what is missing.
Cause: The system PowerShell execution policy is set to Restricted, which blocks all PowerShell scripts from running.
Fix: The install.bat file already passes -ExecutionPolicy Bypass to every PowerShell invocation, which should bypass this restriction without changing system settings. If the error still appears, run the following in an Administrator PowerShell session:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
This allows locally created scripts to run while still requiring remote scripts to be signed. It only affects the current user and does not require a system restart.

macOS Issues

Cause: macOS Gatekeeper quarantines binaries downloaded from the internet and blocks execution of unsigned executables.
Fix: The macOS installer automatically runs xattr -dr com.apple.quarantine on the engine binaries during setup. If the error persists after installation (for example, if you copied the folder from another machine), remove the quarantine attribute manually:
xattr -dr com.apple.quarantine Shared/bin/ollama-darwin
xattr -dr com.apple.quarantine Shared/lib/ollama
Alternatively, use the GUI path: System Settings → Privacy & Security, scroll to the Security section, and click Allow Anyway next to the blocked binary name.
Cause: Double-clicking a .command file in Finder opens a new Terminal window and runs the script, but if macOS cannot determine a working directory or the script exits immediately due to a permission error, the window closes before you can read the output.
Fix: Open Terminal manually, drag the Mac/install.command file into the Terminal window to paste its full path, and press Enter. This keeps the Terminal session open so you can read any error messages.
If the script reports a permission denied error on the script itself:
chmod +x Mac/install.command
bash Mac/install.command

Linux Issues

Cause: The Linux Ollama runtime is distributed as a .tar.zst archive. The installer extracts it using tar --use-compress-program=zstd, which requires the zstd binary to be available on the host system. Many minimal Linux installations do not include it by default.
Fix: Install zstd using your distribution’s package manager, then re-run bash Linux/install.sh:
# Ubuntu / Debian
sudo apt install zstd

# CentOS / RHEL / Fedora
sudo yum install zstd
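Before re-running the installer, it can be useful to confirm that the extraction tools are actually on PATH. The following is a minimal sketch of such a check; the function name and the idea of checking tar alongside zstd are illustrative assumptions, not something the installer is documented to do.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str]) -> List[str]:
    # Hypothetical helper: return the names that cannot be found on PATH.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["tar", "zstd"])
    if missing:
        print("Install these before running Linux/install.sh:", ", ".join(missing))
    else:
        print("tar and zstd are both available.")
```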
Cause: The Ollama archive extraction was interrupted or only partially completed. The start script checks for both Shared/bin/ollama-linux and Shared/lib/ollama/llama-server. If either is missing, it reports the engine as incomplete.
Fix: Re-run the Linux installer. It detects the incomplete install and re-downloads the full archive:
bash Linux/install.sh
Do not manually delete partial files before re-running — the installer handles cleanup automatically.
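The completeness check described above amounts to testing for two files. A rough Python equivalent is sketched below for diagnosing a drive by hand; the constant and function names are hypothetical, and the real start script is a shell script whose exact logic may differ.

```python
from pathlib import Path
from typing import List

# The two paths named in this section, relative to the USB drive root.
REQUIRED_ENGINE_FILES = (
    "Shared/bin/ollama-linux",
    "Shared/lib/ollama/llama-server",
)


def missing_engine_files(drive_root: str) -> List[str]:
    # Hypothetical helper: list the required files absent under drive_root.
    root = Path(drive_root)
    return [rel for rel in REQUIRED_ENGINE_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_engine_files(".")
    print("Engine complete" if not missing else f"Missing: {missing}")
```

Run it from the root of the drive. Any path it prints indicates that the installer should be re-run.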

Android Issues

Cause: Android’s battery optimization aggressively terminates background processes, including long-running Termux sessions. Compiling llama.cpp from source takes 10–30 minutes, which is long enough for Android to kill the process multiple times.
Fix: Acquire a wake lock before starting the installer to prevent Android from suspending Termux:
termux-wake-lock
bash Android/install.sh
Keep Termux in the foreground for the entire duration of compilation. Do not switch to another app or lock the screen. Plug in your charger, because compilation is CPU-intensive and drains the battery quickly.
The installer installs these build dependencies before compiling: clang cmake git wget ninja python. If any of these packages fails to install, the compilation step will fail. Run apt update && apt full-upgrade -y first if you see package errors.
The Android/start.sh script also calls termux-wake-lock automatically at launch to prevent the inference process from being killed during use.
Cause: Termux installed from the Google Play Store is severely outdated and uses a different, broken package infrastructure. Stale versions of pkg can also have corrupted package caches that cause install failures.
Fix:
  1. Uninstall Termux from the Play Store.
  2. Install Termux from F-Droid — this is the maintained, up-to-date version.
The Android installer uses apt directly (instead of pkg) and runs apt full-upgrade before installing build tools to avoid cache-related issues with outdated Termux installations.

Performance Issues

Cause: The loaded model is larger than the host machine’s available RAM. When the model cannot fit in RAM, the operating system continuously swaps pages to disk. On a USB drive, this is especially severe and can reduce generation speed to near zero.
Fix: Re-run the installer and select a smaller model. Gemma 2 2B Abliterated (~1.6 GB) fits comfortably in 8 GB of RAM and generates at 10–30 tokens/second on a modern CPU. Larger models (9B, 12B) require 16+ GB of RAM for comfortable performance.
As a rule of thumb, the model’s file size should be no more than 60% of your total system RAM to leave headroom for the OS and other processes.
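The 60% rule of thumb is simple arithmetic, sketched here for convenience. The function name is hypothetical, and the 5.5 GB file in the usage example is an illustrative figure, not the size of any model shipped with the project.

```python
GIB = 1024 ** 3  # bytes per GiB


def fits_in_ram(model_bytes: float, total_ram_bytes: float,
                max_fraction: float = 0.60) -> bool:
    # Apply the rule of thumb: the model file should take at most
    # max_fraction of total system RAM.
    return model_bytes <= total_ram_bytes * max_fraction


if __name__ == "__main__":
    # ~1.6 GB model on an 8 GB machine: the budget is 8 * 0.6 = 4.8 GB.
    print(fits_in_ram(1.6 * GIB, 8 * GIB))
    # Hypothetical 5.5 GB model file on the same machine exceeds 4.8 GB.
    print(fits_in_ram(5.5 * GIB, 8 * GIB))
```

On an 8 GB machine the budget works out to 4.8 GB, so the 1.6 GB model passes and the hypothetical 5.5 GB file does not.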
The Ollama engine binary in Shared/bin/ auto-detects available hardware accelerators at startup and prints the detected configuration in the terminal window.
NVIDIA (Windows/Linux):
  • Verify that nvidia-smi runs and shows your GPU.
  • Ensure CUDA drivers are installed and up to date.
  • CUDA 11.x and 12.x are both supported by the bundled Ollama engine.
Apple Silicon (macOS):
  • Metal acceleration is detected and enabled automatically on M1/M2/M3/M4 chips. No action needed.
AMD (Linux):
  • ROCm support depends on the Ollama build. Check the terminal output at startup for rocm in the device list.
If Ollama reports cpu only despite a GPU being present, check that Gatekeeper is not blocking the GPU driver (macOS) and that the CUDA runtime is on the system PATH (Linux).

Chat Server Issues

Cause: Another process on the host machine is already listening on port 3333. This is uncommon but possible if a previous chat server session was not cleanly shut down, or if another application happens to use that port.
Fix: Identify and kill the process occupying port 3333:
# Linux / macOS
lsof -i :3333
kill -9 <PID>
:: Windows
netstat -ano | findstr :3333
taskkill /PID <PID> /F
After clearing the port, re-run the start script. To change the port permanently, edit CHAT_SERVER_PORT in Shared/chat_server.py.
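A quick cross-platform way to confirm whether the port is still occupied is to attempt a local TCP connection. This is a minimal sketch with a hypothetical function name; it only reports whether something is listening and does not identify the owning process.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    # connect_ex returns 0 when a listener accepts the connection,
    # and an error code (for example "connection refused") otherwise.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("Port 3333 in use:", port_in_use(3333))
```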
Cause: The Shared/chat_data/ directory does not exist or is not writable. This can happen on USB drives formatted as FAT32 if the physical write-protect switch on the drive is engaged, or if the drive was ejected and re-mounted as read-only.
Checklist:
  1. Confirm Shared/chat_data/ exists. The server creates it automatically on startup, so if it is missing, check that the server actually launched without errors.
  2. On USB drives, check for a small physical write-protect switch on the side of the drive and ensure it is in the unlocked position.
  3. On Linux/macOS, check mount flags: mount | grep <drive-name>. If ro appears, the drive was mounted read-only. Remount with write permissions or use a different USB port.
  4. On Windows, right-click the drive in Explorer → Properties → check that “Read-only” is not selected.
The chat server uses atomic writes (os.replace() after writing to a .tmp file) so partial saves cannot corrupt existing history — if a save fails, the previous chats.json remains intact.
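The write-then-replace pattern mentioned above can be sketched as follows. This is an illustration of the general technique, not the code in Shared/chat_server.py: the function name is hypothetical, and the fsync call and the cleanup of the temporary file on failure are assumptions added here.

```python
import json
import os
from pathlib import Path


def save_json_atomically(path: str, data) -> None:
    # Write to a sibling .tmp file first, then swap it into place.
    target = Path(path)
    tmp = target.with_name(target.name + ".tmp")
    try:
        with open(tmp, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())  # assumption: push bytes to the drive first
        # os.replace overwrites the destination in a single step, so readers
        # see either the old file or the new one, never a partial write.
        os.replace(tmp, target)
    except BaseException:
        # On any failure the existing target is untouched; drop the temp file.
        tmp.unlink(missing_ok=True)
        raise
```

If serialization or the write itself fails, the exception propagates and the previous file contents remain in place.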
Cause: A previous session left the Ollama engine running in the background, or you launched the start script twice.
Behavior: This is handled gracefully. The Windows, Linux, and macOS start scripts all check http://127.0.0.1:11434/api/tags before attempting to start the engine. If Ollama is already responding, the scripts skip the startup step and go straight to launching the chat server.
You will see [OK] Ollama engine is already running! in the terminal when this happens. This is normal and not an error.
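The same probe can be run by hand to check whether an engine is already up. The sketch below requests the URL named above and treats an HTTP 200 as "running". The function name is hypothetical, the start scripts may implement their check differently, and bypassing proxy settings for the loopback address is an assumption made here.

```python
import http.client
import urllib.error
import urllib.request


def ollama_is_running(url: str = "http://127.0.0.1:11434/api/tags",
                      timeout: float = 2.0) -> bool:
    # An empty ProxyHandler keeps loopback requests away from any
    # system-wide HTTP proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a malformed reply: not running.
        return False


if __name__ == "__main__":
    print("Ollama engine responding:", ollama_is_running())
```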
