Documentation Index
Fetch the complete documentation index at: https://mintlify.com/techjarves/USB-Uncensored-LLM/llms.txt
Use this file to discover all available pages before exploring further.
chat_server.py is a zero-dependency Python HTTP server built entirely from the standard library — no pip install required. It serves the FastChatUI.html web interface, persists conversation history as JSON files on the drive, proxies all Ollama API calls (eliminating browser CORS restrictions), and exposes live CPU and RAM metrics. It runs on port 3333 and binds to all network interfaces so it is reachable from phones and other devices on the same LAN.
Configuration Constants
These values are set at the top ofchat_server.py and control core server behaviour:
| Constant | Default | Description |
|---|---|---|
CHAT_SERVER_PORT | 3333 | TCP port the HTTP server listens on |
OLLAMA_HOST | http://127.0.0.1:11434 | Address of the local Ollama engine |
LLAMA_CPP_MODE | false | Activated by passing --llama-cpp as a CLI argument; changes OLLAMA_HOST to http://127.0.0.1:8080 and enables OpenAI API translation |
All file paths inside the server are resolved relative to the location of
chat_server.py itself (i.e., Shared/). This ensures the server always reads and writes to the USB drive, regardless of the current working directory when it is launched.API Endpoints
GET / and GET /index.html
Serves the main chat interface.
- Response:
200 text/html— contents ofShared/FastChatUI.html - Error:
404ifFastChatUI.htmlis missing from theShared/directory
GET /api/chats
Returns the full saved chat history from disk.
- Response:
200 application/json— a JSON array of all chat objects stored inShared/chat_data/chats.json - Fallback: If the file is missing or contains malformed JSON, returns
200with an empty array[]rather than an error, so the UI always loads cleanly
POST /api/chats
Persists the current chat history to disk.
- Body: JSON array of chat objects (the full chat history from the UI)
- Response:
200 application/json - Error:
500 application/json - Atomicity: Writes to
chats.json.tmpfirst, then renames tochats.jsonviaos.replace(). This prevents corruption if the process is killed mid-write.
GET /api/settings
Returns the current user settings from Shared/chat_data/settings.json.
- Response:
200 application/json - If the file is missing, these defaults are returned automatically.
POST /api/settings
Merges a partial or full settings object with the existing settings and saves to disk.
- Body: A partial JSON object — only keys you want to change are required
- The incoming object is merged on top of the existing settings (missing keys are preserved).
logModeis normalized: any value other than"all"is coerced to"errors_only".- The in-memory log mode is updated immediately without restart.
- Response:
200 application/json - Error:
500 application/json
GET /api/stats
Returns real-time CPU and RAM usage for the host machine.
-
Response:
200 application/json -
has_psutil— indicates whether the optionalpsutillibrary was found. The server works without it using platform-native fallbacks:Platform RAM source CPU source Windows GlobalMemoryStatusEx(kernel32)GetSystemTimes(kernel32)Linux /proc/meminfo/proc/statdeltamacOS — — On macOS, CPU and RAM are returned as0.0in the stdlib fallback path to avoid potential permission issues. Installpsutil(pip install psutil) for accurate macOS stats. -
Error:
500 application/jsonon unexpected failure
GET | POST | DELETE /ollama/*
Transparent reverse proxy to the local Ollama engine. All Ollama API calls from the browser are routed here to avoid CORS errors.
- Path rewriting: The
/ollamaprefix is stripped before forwarding. For example,GET /ollama/api/tagsis forwarded toGET http://127.0.0.1:11434/api/tags. - Streaming: For
/api/chatand/api/generate, the response is streamed back in 4096-byte chunks as they arrive from Ollama, enabling token-by-token rendering in the UI. - Validation:
POST /ollama/api/chatvalidates the request body before forwarding:modelmust be a non-empty string — returns400if missing or blankmessagesmust be a non-empty array — returns400if missing or empty
- LLAMA_CPP_MODE bridging: When started with
--llama-cpp, the proxy translates/api/chatpayloads from Ollama JSONL format to OpenAI/v1/chat/completionsformat for llama.cpp’sllama-server. SSE (data: {...}) responses from llama-server are bridged back to Ollama-style JSONL ({"message": ..., "done": false}\n) for the UI. - Responses:
- Upstream status code on success
400— validation failure (missing model or messages)502— Ollama engine is unreachable (not running or not yet ready)500— unexpected proxy error
Concurrency Model
chat_server.py uses a custom ThreadedHTTPServer class that extends Python’s built-in http.server.HTTPServer:
- A long-running streaming response (a model generating thousands of tokens) does not block the UI from loading or saving chats.
- Hardware stats polls from the UI run concurrently with active generation.
- Thread safety for shared file access is enforced with
threading.RLock()—DATA_FILE_LOCKguardschats.jsonandsettings.json, andLOG_MODE_LOCKguards the in-memory log mode state.
CORS
Every response from the server includes the following CORS headers, regardless of endpoint:OPTIONS requests return 204 No Content with these headers and no body. This configuration allows the UI to be accessed from any origin — critical for LAN access from mobile devices whose IP differs from the host machine.
Logging
Log entries are written toShared/logs/chat_server.log via Python’s logging.handlers.RotatingFileHandler:
| Setting | Value |
|---|---|
| Max file size | 10 MB |
| Backup count | 1 (chat_server.log.1) |
| Encoding | UTF-8 |
QueueHandler / QueueListener pair moves records off the request thread immediately, preventing I/O latency from slowing down responses. Each record is flushed to disk immediately after being written.
Each log entry is a structured, multi-line block that includes:
- Timestamp with timezone
- Request ID (UUID, unique per request)
- HTTP method and path
- Client IP address
- User-Agent header
- Model name, temperature, and stream flag
- Python module, function, and line number
- Thread name
- Full hardware snapshot (platform, CPU count, total RAM, Python version)
- Exception traceback (if applicable)
POST /api/settings:
logMode value | What is logged |
|---|---|
"errors_only" | Only ERROR-level and above (default) |
"all" | All levels including INFO (every request) |
Atomic File Writes
Bothchats.json and settings.json are saved using a write-then-rename pattern to prevent data loss:
os.replace() is an atomic operation on both POSIX and Windows. If the process is killed between the open and the replace, the original file is untouched. If it is killed after replace, the new data is fully committed. There is no window where the file can be half-written or empty.
Starting the Server Manually
The server is normally started by the OS-specificstart script, but it can be run directly: