Configuration Reference for USB-Uncensored-LLM

USB-Uncensored-LLM is configured through a combination of environment variables (set by the start scripts), a persistent settings.json file (managed via the chat UI or API), and per-model Modelfiles. Most settings have sensible defaults and require no manual editing — this reference exists for users who want to customize behavior beyond what the UI exposes.

Environment Variables

The start scripts for each platform export these variables before launching the Ollama engine. You can override them in your own shell session before running a start script if you need non-default values.

Variable	Set By	Default	Description
`OLLAMA_MODELS`	All start scripts	`Shared/models/ollama_data`	Directory where Ollama stores its model registry and imported model blobs
`OLLAMA_ORIGINS`	All start scripts	`*`	CORS allowed origins for the Ollama API — set to `*` so the LAN-accessible chat server can proxy requests without browser CORS errors
`OLLAMA_HOST`	All start scripts	`127.0.0.1:11434`	Address and port the Ollama engine listens on
`OLLAMA_HOME`	Linux/Mac start scripts	`Shared/.ollama-runtime`	Overrides the default `~/.ollama` location, keeping all engine state on the USB drive
`OLLAMA_TMPDIR`	Linux/Mac start scripts	`Shared/.ollama-runtime/tmp`	Temp directory for Ollama during model operations — kept on the USB to avoid writing to the host system

On Windows, OLLAMA_HOME and OLLAMA_TMPDIR are not set because the Windows Ollama binary manages its working files relative to OLLAMA_MODELS automatically. On Linux and macOS, they are set to Shared/.ollama-runtime and Shared/.ollama-runtime/tmp respectively to prevent any writes to ~/.ollama on the host machine.

Chat Server Settings

The chat server reads its configuration from Shared/chat_data/settings.json at startup. This file is created automatically with default values if it does not exist. Settings can be updated at runtime via the chat UI’s Settings panel or via a POST /api/settings request — no server restart is required.

globalSystemPrompt

string

default:"\"\""

A system prompt injected at the start of every new conversation. When set, this overrides the per-model system prompt defined in the Modelfile. Leave empty to use each model’s built-in system prompt. Can be set via the chat UI Settings panel.

temperature

float

default:"0.7"

Controls the randomness of model output. 0.0 produces fully deterministic, repetitive responses. 1.0 produces highly creative but potentially incoherent output. The default of 0.7 is a balanced starting point suitable for most use cases. This value is forwarded to Ollama on every /api/chat request.

logMode

string

default:"\"errors_only\""

Controls which events are written to Shared/logs/chat_server.log. Accepted values:

"errors_only" — only ERROR-level events are logged (failed proxies, file write errors, bad requests). Recommended for normal use to avoid filling up the USB drive.
"all" — every request event is logged, including successful chat completions, settings saves, and chat history reads. Useful for debugging.

This setting takes effect immediately when changed via POST /api/settings without restarting the server.

Default settings.json:

{
  "globalSystemPrompt": "",
  "temperature": 0.7,
  "logMode": "errors_only"
}

Modelfile Parameters

Each installed model has a corresponding Modelfile stored at Shared/models/Modelfile-<local-name> (for example, Shared/models/Modelfile-gemma2-2b-local). A legacy Shared/models/Modelfile is also maintained for backward compatibility — it always points to the first model installed. Modelfiles are created automatically by the installer. The format used for every model is:

FROM ./<model-filename>.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM <system prompt text>

Parameter	Default	Description
`temperature`	`0.7`	Sampling temperature — same semantic as the chat server `temperature` setting. The Modelfile value is the model-level default; the chat server setting overrides it per-request.
`top_p`	`0.9`	Nucleus sampling threshold. Only tokens whose cumulative probability reaches this value are considered. Lower values make output more focused; higher values allow more variety.
`SYSTEM`	Model-specific	The system prompt baked into the model registration. Each curated model ships with an uncensored system prompt. Overridden at runtime by `globalSystemPrompt` if that setting is non-empty.

The FROM path uses a relative ./ prefix, which means Ollama resolves it relative to the directory where ollama create is run. The installer always runs ollama create from inside Shared/models/, so the .gguf file must be in that directory.

Chat Server CLI Flags

Shared/chat_server.py accepts two optional flags. Flags can be combined.

Flag	Description
`--llama-cpp`	Switch to llama.cpp mode — targets `http://127.0.0.1:8080` instead of the Ollama engine on `:11434`. Translates Ollama-style `/api/chat` payloads to OpenAI-compatible `/v1/chat/completions` requests for `llama-server`. Used automatically by `Android/start.sh`.
`--no-browser`	Suppress automatic browser open on startup. Used by `Android/start.sh` since Android opens the browser via `am start` before launching the Python server.

Usage examples:

# Standard desktop mode (Ollama engine)
python3 Shared/chat_server.py

# Android / llama.cpp mode
python3 Shared/chat_server.py --llama-cpp

# Headless / no browser (e.g., server environments)
python3 Shared/chat_server.py --no-browser

# Android combined flags
python3 Shared/chat_server.py --no-browser --llama-cpp

Ports

Port	Service	Configurable?
`3333`	Chat server HTTP (serves the FastChatUI and `/api/*` endpoints)	No — hardcoded as `CHAT_SERVER_PORT` in `Shared/chat_server.py`
`11434`	Ollama engine (desktop platforms)	Yes — via the `OLLAMA_HOST` environment variable
`8080`	llama.cpp `llama-server` (Android)	No — hardcoded in `Android/start.sh` and matched by `--llama-cpp` mode

To change the chat server port from 3333, open Shared/chat_server.py and modify the CHAT_SERVER_PORT constant near the top of the file. If you expose the server over a LAN, update any firewall rules to match the new port.

Log File

The chat server writes structured logs to Shared/logs/chat_server.log. Rotation: The log file rotates automatically when it reaches 10 MB, keeping 1 backup (chat_server.log.1). Older backups are discarded. This caps total log storage at ~20 MB, which is safe for USB drives. Log record fields (written for every event when logMode is "all", or only for errors when "errors_only"):

Field	Description
`timestamp`	ISO 8601 local time with timezone
`level`	`INFO` or `ERROR`
`request_id`	UUID per request, useful for correlating entries
`method`	HTTP method (`GET`, `POST`, etc.)
`path`	Request path
`client_ip`	Requester’s IP address
`model_name`	Model name from the request payload
`model_temp`	Temperature value from the request payload
`model_stream`	Whether streaming was requested
`hardware_specs`	Snapshot of platform, CPU, RAM, and Python version at server startup

Get Started

Platform Guides

Models

Architecture

Reference

Configuration Reference for USB-Uncensored-LLM

Environment Variables

Chat Server Settings

Modelfile Parameters

Chat Server CLI Flags

Ports

Log File

Build docs developers (and LLMs) love

Get Started

Platform Guides

Models

Architecture

Reference

Documentation Index

​Environment Variables

​Chat Server Settings

​Modelfile Parameters

​Chat Server CLI Flags

​Ports

​Log File

Build docs developers (and LLMs) love

Environment Variables

Chat Server Settings

Modelfile Parameters

Chat Server CLI Flags

Ports

Log File