USB-Uncensored-LLM uses a deliberate two-tier layout to solve the most expensive problem in cross-platform portability: you should never have to download a 5 GB model twice. Platform-specific installers and launchers are isolated inside per-OS folders (Windows/, Linux/, Mac/, Android/), each completely independent of the others. Everything else lives in a single Shared/ volume that every OS folder reads from and writes to: the heavy, OS-agnostic data (model weights, chat history, settings, and the Python chat server) plus the engine binaries and runtime libraries that each installer downloads into Shared/bin/ and Shared/lib/. The result is that plugging the drive into a new machine only requires downloading the small (~50 MB) engine binary for that OS; the gigabytes of model weights are already there.

Top-Level Layout

The root of the USB drive (or project clone) looks like this:
[Portable USB Drive / Project Root]
├── Android/          # Android (Termux) installers & launchers
│   ├── install.sh
│   ├── start.sh
│   └── uninstall.sh
├── Linux/            # Linux installers & launchers
│   ├── install.sh
│   ├── start.sh
│   └── uninstall.sh
├── Mac/              # macOS installers & launchers
│   ├── install.command
│   ├── start.command
│   └── uninstall.command
├── Windows/          # Windows installers & launchers
│   ├── install.bat
│   ├── install-core.ps1
│   ├── start-fast-chat.bat
│   └── uninstall.bat
└── Shared/           # Unified data volume (OS-agnostic)
    ├── FastChatUI.html
    ├── chat_server.py
    ├── bin/              # OS-specific engine binaries
    ├── chat_data/        # Persistent conversation history
    ├── config/           # Model catalog JSON files
    ├── lib/              # Ollama runtime libraries
    ├── logs/             # Chat server logs
    ├── models/           # GGUF weights + Ollama data
    ├── python/           # Portable Python environment (Windows)
    ├── scripts/          # Shared utility scripts
    └── vendor/           # Offline UI assets

The Shared/ Volume

Every subdirectory under Shared/ has a specific responsibility. The start scripts on each platform set environment variables pointing here before launching the engine.

bin/

Contains the engine executables, one per supported platform (Ollama on Windows, Linux, and macOS; llama-server on Android):
| File | Platform |
| --- | --- |
| ollama-windows.exe | Windows |
| ollama-linux | Linux |
| ollama-darwin | macOS |
| llama-server-android | Android (Termux, ARM64) |

Each OS installer downloads only the binary it needs into this shared folder. The start scripts know which binary to invoke based on the host OS, so the same bin/ directory serves all platforms without conflict.
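
As a rough illustration of how a launcher could pick the right binary, here is a minimal Python sketch. The file names come from the table above; the function name engine_path, the is_termux flag, and the lookup logic are hypothetical and are not taken from the project's start scripts.

```python
import platform
from pathlib import Path

# File names come from the table above; the mapping logic is an assumption.
ENGINE_BY_OS = {
    "Windows": "ollama-windows.exe",
    "Linux": "ollama-linux",
    "Darwin": "ollama-darwin",
}


def engine_path(shared_dir, system=None, is_termux=False):
    """Return the engine binary path under Shared/bin/ for the given host."""
    system = system or platform.system()
    if is_termux:
        # Termux may report "Linux" as well, so the caller passes an explicit flag.
        name = "llama-server-android"
    else:
        try:
            name = ENGINE_BY_OS[system]
        except KeyError:
            raise RuntimeError(f"Unsupported platform: {system}") from None
    return Path(shared_dir) / "bin" / name
```

Because every platform resolves to a different file name, several engines can sit side by side in the same bin/ folder without overwriting each other.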

chat_data/

Holds the two persistent JSON state files written and read by chat_server.py:
  • chats.json — the full conversation history array. Every chat session ever created is stored here, regardless of which OS launched the server.
  • settings.json — user preferences: temperature, globalSystemPrompt, and logMode. These settings survive reboots, OS switches, and drive moves.
Because both files live in Shared/, your chat history is available whether you launch the server from Windows or Linux on the same drive.
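
The following sketch shows one way settings.json could be read and written so that missing keys fall back to defaults. The three key names come from the list above; the default values, the function names, and the temp-file-then-rename write are assumptions and may differ from what chat_server.py actually does.

```python
import json
from pathlib import Path

# Key names come from the text above; the default values are placeholders.
DEFAULT_SETTINGS = {"temperature": 0.7, "globalSystemPrompt": "", "logMode": "all"}


def load_settings(chat_data_dir):
    """Read settings.json, using defaults for a missing file or missing keys."""
    path = Path(chat_data_dir) / "settings.json"
    settings = dict(DEFAULT_SETTINGS)
    if path.exists():
        with path.open(encoding="utf-8") as f:
            settings.update(json.load(f))
    return settings


def save_settings(chat_data_dir, settings):
    """Write to a temp file, then rename it over settings.json.

    This reduces the chance of a half-written file if the drive is
    unplugged mid-write.
    """
    directory = Path(chat_data_dir)
    directory.mkdir(parents=True, exist_ok=True)
    tmp = directory / "settings.json.tmp"
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(settings, f, indent=2)
    tmp.replace(directory / "settings.json")
```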

config/

Contains JSON catalog files consumed by the installer scripts:
  • models.json — the curated model catalog: names, download URLs, file sizes, and HuggingFace paths. Read by scripts/config_query.py during install.bat to render the interactive model selection menu.
  • ui-vendor-assets.json — a manifest of offline UI assets to download during install (fonts, markdown renderer, PDF viewer).

lib/ollama/

Holds the Ollama runtime libraries installed by the Linux and macOS installers, including the llama-server backend binary. This directory is created automatically by Linux/install.sh and Mac/install.command. Windows uses a self-contained ollama-windows.exe that bundles its own runtime, so it does not use this folder.

logs/

Contains the rotating chat server log file:
  • chat_server.log — structured, multi-line log entries written by chat_server.py. The file rotates at 10 MB with 1 backup file kept (chat_server.log.1). Log verbosity is controlled at runtime via POST /api/settings with logMode: "all" or logMode: "errors_only".
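
The rotation policy described above maps naturally onto Python's standard RotatingFileHandler. The sketch below is an assumption about how such a logger could be configured, not a copy of chat_server.py; the logger name, the build_logger helper, and the mapping from logMode to log levels are hypothetical.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(logs_dir, log_mode="all"):
    """Create a logger that rotates at 10 MB and keeps one backup file."""
    Path(logs_dir).mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        Path(logs_dir) / "chat_server.log",
        maxBytes=10 * 1024 * 1024,  # rotate once the file reaches 10 MB
        backupCount=1,              # keep a single chat_server.log.1
        encoding="utf-8",
    )
    logger = logging.getLogger("chat_server_sketch")
    logger.handlers.clear()
    logger.addHandler(handler)
    # Assumed mapping: "errors_only" drops everything below ERROR.
    logger.setLevel(logging.ERROR if log_mode == "errors_only" else logging.INFO)
    return logger
```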

models/

The largest subdirectory on the drive — all AI model data lives here:
| Path | Contents |
| --- | --- |
| models/*.gguf | Raw GGUF weight files (e.g. gemma-2-2b-abliterated.gguf) |
| models/ollama_data/ | Ollama's internal model registry and manifests |
| models/Modelfile-<name> | Ollama Modelfile configs (system prompt, parameters) |
| models/installed-models.txt | Plain-text record of which models have been set up |

The OLLAMA_MODELS environment variable is set to Shared/models/ollama_data by every start script before launching the engine, ensuring Ollama always reads from and writes to the USB regardless of host machine defaults.
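
A minimal Python sketch of the same idea: build an environment whose OLLAMA_MODELS points at the drive before starting the engine. The variable name and target folder come from the paragraph above; the engine_env helper is hypothetical, and the real start scripts are shell and batch files rather than Python.

```python
import os
from pathlib import Path


def engine_env(shared_dir, base_env=None):
    """Return a copy of the environment with OLLAMA_MODELS set to the drive."""
    env = dict(os.environ if base_env is None else base_env)
    models_dir = Path(os.path.abspath(shared_dir)) / "models" / "ollama_data"
    # An absolute path keeps the engine independent of its working directory.
    env["OLLAMA_MODELS"] = str(models_dir)
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run or subprocess.Popen when launching the engine, which leaves the host machine's own environment untouched.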

python/

A portable Python 3.12 embed distribution for Windows. Python is not guaranteed to be installed on every Windows machine, so start-fast-chat.bat checks for Shared/python/python.exe first and uses it directly if it exists. If it doesn't, the script falls back to a system python on PATH, and if that is also missing it downloads Python 3.12 embed (~11 MB) and extracts it here. This directory is Windows-only; on Linux and macOS the start scripts use the system python3.

scripts/

Shared utility scripts used across platforms:
| Script | Purpose |
| --- | --- |
| config_query.py | Parses config/models.json for installer menus |
| download-ui-assets.sh | Fetches offline vendor assets on Linux/macOS |
| download-ui-assets.ps1 | Fetches offline vendor assets on Windows |
| uninstall-common.sh | Shared cleanup logic used by OS-specific uninstallers |

vendor/

Offline copies of UI dependencies downloaded during install:
  • Markdown renderer (used to format AI responses in the browser)
  • Web fonts
  • PDF viewer assets
Bundling these assets means the chat UI works with no internet connection after install — the entire experience runs air-gapped.

Why This Layout?

The cross-platform benefit becomes clear when you use the drive across multiple machines:
1. First machine (Windows): run Windows/install.bat. This downloads ollama-windows.exe to Shared/bin/ and your selected model weights (e.g. a 5.2 GB .gguf file) to Shared/models/.
2. Second machine (Linux): run bash Linux/install.sh. This downloads only ollama-linux to Shared/bin/ and the Ollama runtime libraries to Shared/lib/ollama/. The model weights in Shared/models/ are already there, so nothing is re-downloaded.
3. Third machine (macOS): run Mac/install.command. This downloads only ollama-darwin. The same models, chat history, and settings are already present in Shared/.

Running from an internal SSD instead of a USB drive makes model loading much faster. The folder layout works identically whether the root is E:\USB-Uncensored-LLM\ or /home/user/USB-Uncensored-LLM/.
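
The reasoning in the steps above can be sketched as a small check of what a fresh host still needs. This is an illustration only: install_plan is a hypothetical helper, and the real installers may decide what to download differently (for example by consulting installed-models.txt).

```python
from pathlib import Path


def install_plan(shared_dir, engine_name):
    """Report which downloads a fresh host would still need."""
    shared = Path(shared_dir)
    has_engine = (shared / "bin" / engine_name).exists()
    # Any .gguf file already on the drive counts as existing model weights.
    has_models = any((shared / "models").glob("*.gguf"))
    return {"download_engine": not has_engine, "download_models": not has_models}
```

On a drive that was first set up on Windows, calling install_plan with "ollama-linux" would ask for the engine only, since the weights are already in Shared/models/.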

Portable Python (Windows)

On Linux and macOS, the start scripts use the system python3 interpreter. On Windows, Python is not guaranteed to be installed, so Windows/start-fast-chat.bat resolves an interpreter with a three-step fallback:
1. Check for Shared/python/python.exe  →  use portable USB copy
2. Check for system `python` on PATH   →  use system Python
3. Neither found                       →  download Python 3.12 embed (~11 MB)
                                           extract to Shared/python/
                                           use Shared/python/python.exe
The embed package is a minimal, self-contained Python 3.12 distribution with no installer, no registry entries, and no pip. It runs chat_server.py using only the standard library, which is all the server requires. The download happens once; subsequent launches find Shared/python/python.exe at step 1 and skip the remaining steps.
The portable Python embed is amd64 only. On ARM64 Windows devices, the system Python (if installed) will be used instead. If neither is present on an ARM64 machine, download a compatible Python 3.12 embed zip manually and extract it to Shared/python/.
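
The three-step fallback above can be expressed as a short Python sketch. The actual logic lives in a batch file, so this is only a model of the decision order; resolve_python and its injectable which parameter are hypothetical names used for illustration.

```python
import shutil
from pathlib import Path


def resolve_python(shared_dir, which=shutil.which):
    """Return (interpreter_path, needs_download) following the three steps."""
    portable = Path(shared_dir) / "python" / "python.exe"
    if portable.exists():
        return str(portable), False   # step 1: portable copy on the drive
    system_python = which("python")
    if system_python:
        return system_python, False   # step 2: system Python on PATH
    # Step 3: nothing found; the caller must download and extract the
    # embed package to Shared/python/ before using this path.
    return str(portable), True
```

Passing which as a parameter makes the fallback order easy to test without depending on what is installed on the host.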
