
Overview

Protocol Interception is the foundational mechanism that enables APITHON to capture authentication tokens and session data from web-based LLM interfaces. Using Playwright-based browser automation, APITHON listens to the browser's outgoing network requests and extracts the protocol parameters it needs from them.

How It Works

APITHON attaches a request listener at the browser network layer and inspects each outgoing HTTP request as the browser issues it.

The Protocol Sniffer Function

At the core of the interception system is the protocol_sniffer function (lines 67-82 in apithon.py):
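# Relies on module-level imports (re, urllib.parse), the global SESS dict,
# and the Playwright browser context created at startup.
async def protocol_sniffer(request):
    # Only StreamGenerate requests carry the parameters of interest.
    if "StreamGenerate" in request.url:
        post_data = request.post_data
        if post_data:
            # The POST body is URL-encoded; decode it before matching.
            decoded = urllib.parse.unquote(post_data)
            ctx_match = re.search(r'(![a-zA-Z0-9_\-]{100,})', decoded)
            sid_match = re.search(r'f.sid=([^&]+)', request.url)
            bl_match = re.search(r'bl=([^&]+)', request.url)

            # Require all three matches: bl_match.group(1) would raise
            # AttributeError if the bl parameter were missing.
            if ctx_match and sid_match and bl_match:
                SESS["internal_context"] = ctx_match.group(1)
                SESS["session_id"] = sid_match.group(1)
                SESS["build_id"] = bl_match.group(1)
                cookies = await context.cookies()
                SESS["auth_cookie"] = "; ".join([f"{c['name']}={c['value']}" for c in cookies])
                SESS["status_ready"] = True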
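
The extraction logic can be tried without a browser by moving it into a pure function. The sketch below is a hypothetical helper, not part of apithon.py: the name extract_tokens, the example.test URL, and the synthetic request body are all invented for illustration. It applies the same three patterns as protocol_sniffer and, as an assumption, returns None unless all three match.

```python
import re
import urllib.parse
from typing import Optional

# The same three patterns that protocol_sniffer uses.
CTX_RE = re.compile(r'(![a-zA-Z0-9_\-]{100,})')
SID_RE = re.compile(r'f.sid=([^&]+)')
BL_RE = re.compile(r'bl=([^&]+)')


def extract_tokens(url: str, post_data: Optional[str]) -> Optional[dict]:
    """Return the captured fields, or None if the request does not qualify.

    Hypothetical helper for illustration; it is not part of apithon.py.
    """
    if "StreamGenerate" not in url or not post_data:
        return None
    decoded = urllib.parse.unquote(post_data)
    ctx_match = CTX_RE.search(decoded)
    sid_match = SID_RE.search(url)
    bl_match = BL_RE.search(url)
    # Assumption: treat a request as usable only when all three fields match.
    if not (ctx_match and sid_match and bl_match):
        return None
    return {
        "internal_context": ctx_match.group(1),
        "session_id": sid_match.group(1),
        "build_id": bl_match.group(1),
    }


# Synthetic request data, invented for this example only.
fake_ctx = "!" + "A" * 120
fake_url = ("https://example.test/_/App/data/Service/StreamGenerate"
            "?bl=build_1&f.sid=12345&hl=en")
fake_body = "f.req=" + urllib.parse.quote('[null,"' + fake_ctx + '"]')

tokens = extract_tokens(fake_url, fake_body)
```

Keeping the matching logic separate from the Playwright callback makes it possible to check the patterns against recorded traffic before running a live session.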

Target Endpoint Pattern

The sniffer only processes requests whose URL contains "StreamGenerate". The full endpoint has the following form:
https://{domain}/_/BardChatUi/data/assistant.lamda.BardFrontendService/StreamGenerate
The endpoint pattern may vary between different LLM services, but the core interception logic remains the same.
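
A plain substring test also matches URLs that merely mention StreamGenerate in a query string. If that becomes a problem for a given target, a stricter check on the URL path is one option. The helper below is a sketch under that assumption; is_stream_generate is a hypothetical name and the example.test URLs are placeholders.

```python
from urllib.parse import urlsplit


def is_stream_generate(url: str) -> bool:
    """True when the last path segment of the URL is exactly StreamGenerate."""
    path = urlsplit(url).path
    return path.rstrip("/").rsplit("/", 1)[-1] == "StreamGenerate"


# Placeholder URLs for illustration.
endpoint = ("https://example.test/_/BardChatUi/data/"
            "assistant.lamda.BardFrontendService/StreamGenerate?bl=x")
decoy = "https://example.test/search?q=StreamGenerate"

endpoint_ok = is_stream_generate(endpoint)
decoy_ok = is_stream_generate(decoy)
```

Here the substring test would accept both URLs, while the path-based check accepts only the first.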

Captured Tokens

When a valid StreamGenerate request is detected, APITHON extracts four critical components:

internal_context

A long-lived context token (100+ characters) that begins with ! and maintains conversation state across requests.

session_id

Extracted from the f.sid URL parameter, this identifies the unique browser session.

build_id

Found in the bl URL parameter, this represents the frontend build version for protocol compatibility.

auth_cookie

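The complete authentication cookie string, assembled from the cookies held by the Playwright browser context. Called without arguments, as it is here, context.cookies() returns every cookie in the context, not only those for the target domain.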

Token Extraction Details

  • Internal Context: extracted using the regex pattern (![a-zA-Z0-9_\-]{100,}) from the URL-decoded POST data payload.
  • Session ID: parsed from the f.sid query parameter in the request URL.
  • Build ID: parsed from the bl query parameter in the request URL.
  • Authentication Cookies: retrieved via Playwright’s context.cookies() API and formatted as a semicolon-delimited string.
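
The cookie string is built from the list of dictionaries that Playwright returns, each with at least name, value, and domain keys. The following sketch shows that formatting step on sample data, with an optional host filter added as an assumption: format_cookie_header and cookie_matches are hypothetical names, and the matching rule is deliberately simplified (it ignores path, secure, and host-only semantics).

```python
def cookie_matches(cookie: dict, host: str) -> bool:
    """Simplified domain match: exact host or a parent domain of it."""
    domain = cookie.get("domain", "").lstrip(".")
    return bool(domain) and (host == domain or host.endswith("." + domain))


def format_cookie_header(cookies, host=None) -> str:
    """Join cookies as 'name=value; name=value', optionally filtered by host."""
    if host is not None:
        cookies = [c for c in cookies if cookie_matches(c, host)]
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


# Sample data shaped like Playwright cookie dictionaries (values invented).
sample_cookies = [
    {"name": "SID", "value": "abc", "domain": ".example.test", "path": "/"},
    {"name": "HSID", "value": "xyz", "domain": "chat.example.test", "path": "/"},
    {"name": "TRACK", "value": "1", "domain": ".other.test", "path": "/"},
]

all_header = format_cookie_header(sample_cookies)
filtered_header = format_cookie_header(sample_cookies, host="chat.example.test")
```

Without the filter the result matches what the sniffer produces; with it, cookies set by unrelated domains are left out of the header.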

Interception Flow

Here’s the complete flow from browser interaction to token capture:
1. Browser Launch

Playwright launches a Chromium instance in non-headless mode, creating a real browser context.

2. Request Listener Attached

The protocol_sniffer function is attached as an event listener so that it is called for every outgoing request:
page.on("request", protocol_sniffer)

3. User Navigates

The browser navigates to the target URL, and the user authenticates (if required).

4. Message Sent

APITHON automatically sends a validation message (a single "." character) to trigger backend communication.

5. StreamGenerate Request Detected

Sending the message causes the browser to issue a StreamGenerate request, which triggers the sniffer.

6. Tokens Extracted

The internal context, session ID, and build ID are extracted via regex matching, the cookies are read from the browser context, and all four values are stored in the global SESS dictionary.

7. Status Set to Ready

The status_ready flag is set to True, signaling that the system is ready to bridge requests.
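
Code that depends on the captured tokens has to wait until the flow above has set status_ready. The source does not show how APITHON does this, so the following is only a sketch of one possible approach: polling the flag with a timeout. wait_until_ready, demo, and fake_sniffer are hypothetical names, and the local SESS dictionary stands in for the real one.

```python
import asyncio

# Stand-in for the global session dictionary.
SESS = {"status_ready": False}


async def wait_until_ready(sess, timeout=30.0, poll_interval=0.1):
    """Poll sess['status_ready'] until it is True or the timeout elapses."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while not sess.get("status_ready"):
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(poll_interval)
    return True


async def demo():
    # Simulates the sniffer firing shortly after a StreamGenerate request.
    async def fake_sniffer():
        await asyncio.sleep(0.05)
        SESS["status_ready"] = True

    task = asyncio.create_task(fake_sniffer())
    ready = await wait_until_ready(SESS, timeout=2.0, poll_interval=0.01)
    await task
    return ready
```

Calling asyncio.run(demo()) returns True once the simulated sniffer has set the flag. An asyncio.Event would avoid polling, at the cost of changing how the sniffer signals readiness.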

Implementation Details

Playwright Integration

APITHON uses Playwright’s async API to gain fine-grained control over browser behavior:
async with async_playwright() as p:
    browser = await p.chromium.launch(headless=False)
    context = await browser.new_context()
    page = await context.new_page()
Why Non-Headless? The browser runs in visible mode (headless=False) to allow for manual authentication steps like CAPTCHA solving or two-factor authentication.
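
For context, here is a fuller sketch of how the launch snippet might be combined with a request listener and cleanup. It is an illustration, not the code from apithon.py: open_page_with_listener, its parameters, and the 60-second wait are assumptions, and Playwright is imported inside the function so the module can be loaded where Playwright is not installed.

```python
import asyncio


async def open_page_with_listener(target_url, on_request, wait_ms=60_000):
    """Launch Chromium, attach a request listener, and open target_url."""
    # Lazy import: only needed when the function actually runs.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        try:
            context = await browser.new_context()
            page = await context.new_page()
            # Attach the listener before navigating so no request is missed.
            page.on("request", on_request)
            await page.goto(target_url)
            # Leave the window open for manual login; the duration is arbitrary.
            await page.wait_for_timeout(wait_ms)
        finally:
            await browser.close()


# Example invocation (placeholder URL), left commented out because it opens
# a real browser window:
# asyncio.run(open_page_with_listener("https://example.test",
#                                     lambda request: print(request.url)))
```

Attaching the listener before page.goto matters because requests issued during the initial page load would otherwise go unobserved.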

URL Decoding

The POST body arrives URL-encoded (percent-encoded), so APITHON decodes it before pattern matching:
decoded = urllib.parse.unquote(post_data)
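
A small example shows what the decoding step does. The encoded string is made up for illustration. Note that unquote leaves "+" characters untouched, whereas unquote_plus also turns them into spaces; this does not affect the context-token pattern, whose character class contains neither, but it can matter if other fields are read from the body.

```python
from urllib.parse import unquote, unquote_plus

# Invented sample body, shaped like a form-encoded payload.
encoded = "f.req=%5B%22hello+world%22%5D&at=abc%3A123"

plain = unquote(encoded)        # percent-escapes decoded, "+" kept
spaced = unquote_plus(encoded)  # percent-escapes decoded, "+" becomes a space

print(plain)   # f.req=["hello+world"]&at=abc:123
print(spaced)  # f.req=["hello world"]&at=abc:123
```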

Regex Pattern Matching

Three distinct regex patterns extract the necessary tokens:
  • Context Token: r'(![a-zA-Z0-9_\-]{100,})'
  • Session ID: r'f.sid=([^&]+)'
  • Build ID: r'bl=([^&]+)'
If the target service changes its protocol structure, these regex patterns must be updated accordingly. Always inspect the network traffic first when working with a new target.
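
When adapting the patterns to a new target, it may help to know that the two URL patterns are loose: the dot in f.sid matches any character, and bl= also matches inside a longer parameter name. One alternative, sketched below as an assumption rather than as APITHON's method, is to parse the query string with the standard library. query_param is a hypothetical helper and the URL is invented.

```python
import re
from urllib.parse import urlsplit, parse_qs


def query_param(url: str, name: str):
    """Return the first value of a query parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else None


# Invented URL with a decoy parameter whose name ends in "bl".
url = ("https://example.test/_/App/data/Service/StreamGenerate"
       "?xbl=decoy&bl=build_1&f.sid=-12345")

regex_bl = re.search(r'bl=([^&]+)', url).group(1)  # picks up the decoy
parsed_bl = query_param(url, "bl")
parsed_sid = query_param(url, "f.sid")
```

In this example the regex returns the decoy value while the parsed lookup returns build_1. parse_qs also percent-decodes the values, which the regex approach does not.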

Security Considerations

All captured tokens are stored in-memory only in the global SESS dictionary. They are never written to disk, ensuring that sensitive authentication data doesn’t persist after the process terminates.
Since interception happens at the browser level (not network level), it’s transparent to the target service. The service sees normal browser traffic.

Diagram: Interception Architecture

Next Steps

Once tokens are captured via Protocol Interception, they must be validated through Session Synchronization before the gateway can be activated.

Session Synchronization

Learn how APITHON validates and maintains session continuity

Gateway Mode

Understand how captured tokens power the API gateway
