Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/techjarves/USB-Uncensored-LLM/llms.txt

Use this file to discover all available pages before exploring further.

USB-Uncensored-LLM is configured through a combination of environment variables (set by the start scripts), a persistent settings.json file (managed via the chat UI or API), and per-model Modelfiles. Most settings have sensible defaults and require no manual editing — this reference exists for users who want to customize behavior beyond what the UI exposes.

Environment Variables

The start scripts for each platform export these variables before launching the Ollama engine. You can override them in your own shell session before running a start script if you need non-default values.
VariableSet ByDefaultDescription
OLLAMA_MODELSAll start scriptsShared/models/ollama_dataDirectory where Ollama stores its model registry and imported model blobs
OLLAMA_ORIGINSAll start scripts*CORS allowed origins for the Ollama API — set to * so the LAN-accessible chat server can proxy requests without browser CORS errors
OLLAMA_HOSTAll start scripts127.0.0.1:11434Address and port the Ollama engine listens on
OLLAMA_HOMELinux/Mac start scriptsShared/.ollama-runtimeOverrides the default ~/.ollama location, keeping all engine state on the USB drive
OLLAMA_TMPDIRLinux/Mac start scriptsShared/.ollama-runtime/tmpTemp directory for Ollama during model operations — kept on the USB to avoid writing to the host system
On Windows, OLLAMA_HOME and OLLAMA_TMPDIR are not set because the Windows Ollama binary manages its working files relative to OLLAMA_MODELS automatically. On Linux and macOS, they are set to Shared/.ollama-runtime and Shared/.ollama-runtime/tmp respectively to prevent any writes to ~/.ollama on the host machine.

Chat Server Settings

The chat server reads its configuration from Shared/chat_data/settings.json at startup. This file is created automatically with default values if it does not exist. Settings can be updated at runtime via the chat UI’s Settings panel or via a POST /api/settings request — no server restart is required.
globalSystemPrompt
string
default:"\"\""
A system prompt injected at the start of every new conversation. When set, this overrides the per-model system prompt defined in the Modelfile. Leave empty to use each model’s built-in system prompt. Can be set via the chat UI Settings panel.
temperature
float
default:"0.7"
Controls the randomness of model output. 0.0 produces fully deterministic, repetitive responses. 1.0 produces highly creative but potentially incoherent output. The default of 0.7 is a balanced starting point suitable for most use cases. This value is forwarded to Ollama on every /api/chat request.
logMode
string
default:"\"errors_only\""
Controls which events are written to Shared/logs/chat_server.log. Accepted values:
  • "errors_only" — only ERROR-level events are logged (failed proxies, file write errors, bad requests). Recommended for normal use to avoid filling up the USB drive.
  • "all" — every request event is logged, including successful chat completions, settings saves, and chat history reads. Useful for debugging.
This setting takes effect immediately when changed via POST /api/settings without restarting the server.
Default settings.json:
{
  "globalSystemPrompt": "",
  "temperature": 0.7,
  "logMode": "errors_only"
}

Modelfile Parameters

Each installed model has a corresponding Modelfile stored at Shared/models/Modelfile-<local-name> (for example, Shared/models/Modelfile-gemma2-2b-local). A legacy Shared/models/Modelfile is also maintained for backward compatibility — it always points to the first model installed. Modelfiles are created automatically by the installer. The format used for every model is:
FROM ./<model-filename>.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM <system prompt text>
ParameterDefaultDescription
temperature0.7Sampling temperature — same semantic as the chat server temperature setting. The Modelfile value is the model-level default; the chat server setting overrides it per-request.
top_p0.9Nucleus sampling threshold. Only tokens whose cumulative probability reaches this value are considered. Lower values make output more focused; higher values allow more variety.
SYSTEMModel-specificThe system prompt baked into the model registration. Each curated model ships with an uncensored system prompt. Overridden at runtime by globalSystemPrompt if that setting is non-empty.
The FROM path uses a relative ./ prefix, which means Ollama resolves it relative to the directory where ollama create is run. The installer always runs ollama create from inside Shared/models/, so the .gguf file must be in that directory.

Chat Server CLI Flags

Shared/chat_server.py accepts two optional flags. Flags can be combined.
FlagDescription
--llama-cppSwitch to llama.cpp mode — targets http://127.0.0.1:8080 instead of the Ollama engine on :11434. Translates Ollama-style /api/chat payloads to OpenAI-compatible /v1/chat/completions requests for llama-server. Used automatically by Android/start.sh.
--no-browserSuppress automatic browser open on startup. Used by Android/start.sh since Android opens the browser via am start before launching the Python server.
Usage examples:
# Standard desktop mode (Ollama engine)
python3 Shared/chat_server.py

# Android / llama.cpp mode
python3 Shared/chat_server.py --llama-cpp

# Headless / no browser (e.g., server environments)
python3 Shared/chat_server.py --no-browser

# Android combined flags
python3 Shared/chat_server.py --no-browser --llama-cpp

Ports

PortServiceConfigurable?
3333Chat server HTTP (serves the FastChatUI and /api/* endpoints)No — hardcoded as CHAT_SERVER_PORT in Shared/chat_server.py
11434Ollama engine (desktop platforms)Yes — via the OLLAMA_HOST environment variable
8080llama.cpp llama-server (Android)No — hardcoded in Android/start.sh and matched by --llama-cpp mode
To change the chat server port from 3333, open Shared/chat_server.py and modify the CHAT_SERVER_PORT constant near the top of the file. If you expose the server over a LAN, update any firewall rules to match the new port.

Log File

The chat server writes structured logs to Shared/logs/chat_server.log. Rotation: The log file rotates automatically when it reaches 10 MB, keeping 1 backup (chat_server.log.1). Older backups are discarded. This caps total log storage at ~20 MB, which is safe for USB drives. Log record fields (written for every event when logMode is "all", or only for errors when "errors_only"):
FieldDescription
timestampISO 8601 local time with timezone
levelINFO or ERROR
request_idUUID per request, useful for correlating entries
methodHTTP method (GET, POST, etc.)
pathRequest path
client_ipRequester’s IP address
model_nameModel name from the request payload
model_tempTemperature value from the request payload
model_streamWhether streaming was requested
hardware_specsSnapshot of platform, CPU, RAM, and Python version at server startup

Build docs developers (and LLMs) love