Chat Server API Reference and Internals

chat_server.py is a zero-dependency Python HTTP server built entirely from the standard library — no pip install required. It serves the FastChatUI.html web interface, persists conversation history as JSON files on the drive, proxies all Ollama API calls (eliminating browser CORS restrictions), and exposes live CPU and RAM metrics. It runs on port 3333 and binds to all network interfaces so it is reachable from phones and other devices on the same LAN.

Configuration Constants

These values are set at the top of chat_server.py and control core server behaviour:

Constant	Default	Description
`CHAT_SERVER_PORT`	`3333`	TCP port the HTTP server listens on
`OLLAMA_HOST`	`http://127.0.0.1:11434`	Address of the local Ollama engine
`LLAMA_CPP_MODE`	`false`	Activated by passing `--llama-cpp` as a CLI argument; changes `OLLAMA_HOST` to `http://127.0.0.1:8080` and enables OpenAI API translation

All file paths inside the server are resolved relative to the location of chat_server.py itself (i.e., Shared/). This ensures the server always reads and writes to the USB drive, regardless of the current working directory when it is launched.

API Endpoints

`GET /` and `GET /index.html`

Serves the main chat interface.

Response: 200 text/html — contents of Shared/FastChatUI.html
Error: 404 if FastChatUI.html is missing from the Shared/ directory

`GET /api/chats`

Returns the full saved chat history from disk.

Response: 200 application/json — a JSON array of all chat objects stored in Shared/chat_data/chats.json
Fallback: If the file is missing or contains malformed JSON, returns 200 with an empty array [] rather than an error, so the UI always loads cleanly

`POST /api/chats`

Persists the current chat history to disk.

Body: JSON array of chat objects (the full chat history from the UI)
Response: 200 application/json
```
{ "ok": true }
```
Error: 500 application/json
```
{ "error": "..." }
```
Atomicity: Writes to chats.json.tmp first, then renames to chats.json via os.replace(). This prevents corruption if the process is killed mid-write.

`GET /api/settings`

Returns the current user settings from Shared/chat_data/settings.json.

Response: 200 application/json

{
  "globalSystemPrompt": "",
  "temperature": 0.7,
  "logMode": "errors_only"
}

If the file is missing, these defaults are returned automatically.

`POST /api/settings`

Merges a partial or full settings object with the existing settings and saves to disk.

Body: A partial JSON object — only keys you want to change are required
```
{ "temperature": 0.9, "logMode": "all" }
```
The incoming object is merged on top of the existing settings (missing keys are preserved).
logMode is normalized: any value other than "all" is coerced to "errors_only".
The in-memory log mode is updated immediately without restart.

Response: 200 application/json

{ "ok": true, "logMode": "errors_only" }

Error: 500 application/json
```
{ "error": "..." }
```

`GET /api/stats`

Returns real-time CPU and RAM usage for the host machine.

Response: 200 application/json

{
  "cpu_percent": 14.2,
  "ram_percent": 67.8,
  "has_psutil": false
}

has_psutil — indicates whether the optional psutil library was found. The server works without it using platform-native fallbacks:
Platform RAM source CPU source
Windows GlobalMemoryStatusEx (kernel32) GetSystemTimes (kernel32)
Linux /proc/meminfo /proc/stat delta
macOS — —

On macOS, CPU and RAM are returned as 0.0 in the stdlib fallback path to avoid potential permission issues. Install psutil (pip install psutil) for accurate macOS stats.
Error: 500 application/json on unexpected failure

Platform	RAM source	CPU source
Windows	`GlobalMemoryStatusEx` (kernel32)	`GetSystemTimes` (kernel32)
Linux	`/proc/meminfo`	`/proc/stat` delta
macOS	—	—

`GET | POST | DELETE /ollama/*`

Transparent reverse proxy to the local Ollama engine. All Ollama API calls from the browser are routed here to avoid CORS errors.

Path rewriting: The /ollama prefix is stripped before forwarding. For example, GET /ollama/api/tags is forwarded to GET http://127.0.0.1:11434/api/tags.
Streaming: For /api/chat and /api/generate, the response is streamed back in 4096-byte chunks as they arrive from Ollama, enabling token-by-token rendering in the UI.
Validation: POST /ollama/api/chat validates the request body before forwarding:
- model must be a non-empty string — returns 400 if missing or blank
- messages must be a non-empty array — returns 400 if missing or empty
LLAMA_CPP_MODE bridging: When started with --llama-cpp, the proxy translates /api/chat payloads from Ollama JSONL format to OpenAI /v1/chat/completions format for llama.cpp’s llama-server. SSE (data: {...}) responses from llama-server are bridged back to Ollama-style JSONL ({"message": ..., "done": false}\n) for the UI.
Responses:
- Upstream status code on success
- 400 — validation failure (missing model or messages)
- 502 — Ollama engine is unreachable (not running or not yet ready)
- 500 — unexpected proxy error

Concurrency Model

chat_server.py uses a custom ThreadedHTTPServer class that extends Python’s built-in http.server.HTTPServer:

class ThreadedHTTPServer(http.server.HTTPServer):
    def process_request(self, request, client_address):
        thread = threading.Thread(target=self._handle, args=(request, client_address))
        thread.daemon = True
        thread.start()

Each incoming request is dispatched to a new daemon thread. This means:

A long-running streaming response (a model generating thousands of tokens) does not block the UI from loading or saving chats.
Hardware stats polls from the UI run concurrently with active generation.
Thread safety for shared file access is enforced with threading.RLock() — DATA_FILE_LOCK guards chats.json and settings.json, and LOG_MODE_LOCK guards the in-memory log mode state.

CORS

Every response from the server includes the following CORS headers, regardless of endpoint:

Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization

Preflight OPTIONS requests return 204 No Content with these headers and no body. This configuration allows the UI to be accessed from any origin — critical for LAN access from mobile devices whose IP differs from the host machine.

Logging

Log entries are written to Shared/logs/chat_server.log via Python’s logging.handlers.RotatingFileHandler:

Setting	Value
Max file size	10 MB
Backup count	1 (`chat_server.log.1`)
Encoding	UTF-8

Log writes are asynchronous: a QueueHandler / QueueListener pair moves records off the request thread immediately, preventing I/O latency from slowing down responses. Each record is flushed to disk immediately after being written. Each log entry is a structured, multi-line block that includes:

Timestamp with timezone
Request ID (UUID, unique per request)
HTTP method and path
Client IP address
User-Agent header
Model name, temperature, and stream flag
Python module, function, and line number
Thread name
Full hardware snapshot (platform, CPU count, total RAM, Python version)
Exception traceback (if applicable)

Log mode is controlled at runtime via POST /api/settings:

`logMode` value	What is logged
`"errors_only"`	Only `ERROR`-level and above (default)
`"all"`	All levels including `INFO` (every request)

The mode change takes effect immediately — no server restart required.

Atomic File Writes

Both chats.json and settings.json are saved using a write-then-rename pattern to prevent data loss:

temp_file = CHATS_FILE + ".tmp"
with open(temp_file, "w", encoding="utf-8") as f:
    json.dump(chats, f, ensure_ascii=False, indent=2)
    f.flush()
os.replace(temp_file, CHATS_FILE)

os.replace() is an atomic operation on both POSIX and Windows. If the process is killed between the open and the replace, the original file is untouched. If it is killed after replace, the new data is fully committed. There is no window where the file can be half-written or empty.

Starting the Server Manually

The server is normally started by the OS-specific start script, but it can be run directly:

# Standard mode (proxies Ollama at :11434)
python3 Shared/chat_server.py

# llama.cpp mode (proxies llama-server at :8080, translates API format)
python3 Shared/chat_server.py --llama-cpp

# Suppress automatic browser open on launch
python3 Shared/chat_server.py --no-browser

# Combine flags
python3 Shared/chat_server.py --llama-cpp --no-browser

Use --no-browser when running headlessly (e.g. on a remote Linux server or in a Termux background session). The server still binds to 0.0.0.0:3333 and is reachable over the network.

Get Started

Platform Guides

Models

Architecture

Reference

Chat Server API Reference and Internals

Configuration Constants

API Endpoints

`GET /` and `GET /index.html`

`GET /api/chats`

`POST /api/chats`

`GET /api/settings`

`POST /api/settings`

`GET /api/stats`

`GET | POST | DELETE /ollama/*`

Concurrency Model

CORS

Logging

Atomic File Writes

Starting the Server Manually

Build docs developers (and LLMs) love

Get Started

Platform Guides

Models

Architecture

Reference

Documentation Index

​Configuration Constants

​API Endpoints

​GET / and GET /index.html

​GET /api/chats

​POST /api/chats

​GET /api/settings

​POST /api/settings

​GET /api/stats

​GET | POST | DELETE /ollama/*

​Concurrency Model

​CORS

​Logging

​Atomic File Writes

​Starting the Server Manually

Build docs developers (and LLMs) love

Configuration Constants

API Endpoints

`GET /` and `GET /index.html`

`GET /api/chats`

`POST /api/chats`

`GET /api/settings`

`POST /api/settings`

`GET /api/stats`

`GET | POST | DELETE /ollama/*`

Concurrency Model

CORS

Logging

Atomic File Writes

Starting the Server Manually