USB-Uncensored-LLM is configured through a combination of environment variables (set by the start scripts), a persistentDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/techjarves/USB-Uncensored-LLM/llms.txt
Use this file to discover all available pages before exploring further.
settings.json file (managed via the chat UI or API), and per-model Modelfiles. Most settings have sensible defaults and require no manual editing — this reference exists for users who want to customize behavior beyond what the UI exposes.
Environment Variables
The start scripts for each platform export these variables before launching the Ollama engine. You can override them in your own shell session before running a start script if you need non-default values.| Variable | Set By | Default | Description |
|---|---|---|---|
OLLAMA_MODELS | All start scripts | Shared/models/ollama_data | Directory where Ollama stores its model registry and imported model blobs |
OLLAMA_ORIGINS | All start scripts | * | CORS allowed origins for the Ollama API — set to * so the LAN-accessible chat server can proxy requests without browser CORS errors |
OLLAMA_HOST | All start scripts | 127.0.0.1:11434 | Address and port the Ollama engine listens on |
OLLAMA_HOME | Linux/Mac start scripts | Shared/.ollama-runtime | Overrides the default ~/.ollama location, keeping all engine state on the USB drive |
OLLAMA_TMPDIR | Linux/Mac start scripts | Shared/.ollama-runtime/tmp | Temp directory for Ollama during model operations — kept on the USB to avoid writing to the host system |
On Windows,
OLLAMA_HOME and OLLAMA_TMPDIR are not set because the Windows Ollama binary manages its working files relative to OLLAMA_MODELS automatically. On Linux and macOS, they are set to Shared/.ollama-runtime and Shared/.ollama-runtime/tmp respectively to prevent any writes to ~/.ollama on the host machine.Chat Server Settings
The chat server reads its configuration fromShared/chat_data/settings.json at startup. This file is created automatically with default values if it does not exist. Settings can be updated at runtime via the chat UI’s Settings panel or via a POST /api/settings request — no server restart is required.
A system prompt injected at the start of every new conversation. When set, this overrides the per-model system prompt defined in the Modelfile. Leave empty to use each model’s built-in system prompt. Can be set via the chat UI Settings panel.
Controls the randomness of model output.
0.0 produces fully deterministic, repetitive responses. 1.0 produces highly creative but potentially incoherent output. The default of 0.7 is a balanced starting point suitable for most use cases. This value is forwarded to Ollama on every /api/chat request.Controls which events are written to
Shared/logs/chat_server.log. Accepted values:"errors_only"— onlyERROR-level events are logged (failed proxies, file write errors, bad requests). Recommended for normal use to avoid filling up the USB drive."all"— every request event is logged, including successful chat completions, settings saves, and chat history reads. Useful for debugging.
POST /api/settings without restarting the server.settings.json:
Modelfile Parameters
Each installed model has a corresponding Modelfile stored atShared/models/Modelfile-<local-name> (for example, Shared/models/Modelfile-gemma2-2b-local). A legacy Shared/models/Modelfile is also maintained for backward compatibility — it always points to the first model installed.
Modelfiles are created automatically by the installer. The format used for every model is:
| Parameter | Default | Description |
|---|---|---|
temperature | 0.7 | Sampling temperature — same semantic as the chat server temperature setting. The Modelfile value is the model-level default; the chat server setting overrides it per-request. |
top_p | 0.9 | Nucleus sampling threshold. Only tokens whose cumulative probability reaches this value are considered. Lower values make output more focused; higher values allow more variety. |
SYSTEM | Model-specific | The system prompt baked into the model registration. Each curated model ships with an uncensored system prompt. Overridden at runtime by globalSystemPrompt if that setting is non-empty. |
The
FROM path uses a relative ./ prefix, which means Ollama resolves it relative to the directory where ollama create is run. The installer always runs ollama create from inside Shared/models/, so the .gguf file must be in that directory.Chat Server CLI Flags
Shared/chat_server.py accepts two optional flags. Flags can be combined.
| Flag | Description |
|---|---|
--llama-cpp | Switch to llama.cpp mode — targets http://127.0.0.1:8080 instead of the Ollama engine on :11434. Translates Ollama-style /api/chat payloads to OpenAI-compatible /v1/chat/completions requests for llama-server. Used automatically by Android/start.sh. |
--no-browser | Suppress automatic browser open on startup. Used by Android/start.sh since Android opens the browser via am start before launching the Python server. |
Ports
| Port | Service | Configurable? |
|---|---|---|
3333 | Chat server HTTP (serves the FastChatUI and /api/* endpoints) | No — hardcoded as CHAT_SERVER_PORT in Shared/chat_server.py |
11434 | Ollama engine (desktop platforms) | Yes — via the OLLAMA_HOST environment variable |
8080 | llama.cpp llama-server (Android) | No — hardcoded in Android/start.sh and matched by --llama-cpp mode |
To change the chat server port from
3333, open Shared/chat_server.py and modify the CHAT_SERVER_PORT constant near the top of the file. If you expose the server over a LAN, update any firewall rules to match the new port.Log File
The chat server writes structured logs toShared/logs/chat_server.log.
Rotation: The log file rotates automatically when it reaches 10 MB, keeping 1 backup (chat_server.log.1). Older backups are discarded. This caps total log storage at ~20 MB, which is safe for USB drives.
Log record fields (written for every event when logMode is "all", or only for errors when "errors_only"):
| Field | Description |
|---|---|
timestamp | ISO 8601 local time with timezone |
level | INFO or ERROR |
request_id | UUID per request, useful for correlating entries |
method | HTTP method (GET, POST, etc.) |
path | Request path |
client_ip | Requester’s IP address |
model_name | Model name from the request payload |
model_temp | Temperature value from the request payload |
model_stream | Whether streaming was requested |
hardware_specs | Snapshot of platform, CPU, RAM, and Python version at server startup |