Telemetry Data Pipeline

Telemetry pipelines are critical for modern observability. Timeplus Proton provides a powerful SQL-based approach to process logs, metrics, and traces in real-time, offering an alternative to traditional ELK stacks with superior performance.

Use Case Overview

Telemetry pipelines powered by Proton enable:

Real-time Log Analysis: Process application and system logs as they arrive
Metrics Aggregation: Compute statistics and rollups for time-series data
Distributed Tracing: Analyze request flows across microservices
Alerting: Detect anomalies and trigger notifications
SQL-based Observability: Query telemetry data using standard SQL

Why SQL-based Observability?

SQL-based Observability is growing in popularity due to:

Performance: ClickHouse-based solutions handle 300K+ logs/second vs 25K for ELK (Uber case study)
Fast Aggregations: Columnar storage enables sub-second queries over billions of rows
Familiar Interface: Use SQL instead of domain-specific query languages
Real-time + Historical: Single system for both streaming and batch analytics

Architecture: Nginx Access Log Streaming

This tutorial demonstrates real-time analysis of Nginx access logs using Timeplus Proton.

Deployment Options

Option 1: Simple (Development)

Nginx → Proton (via external stream on log files)

Option 2: Production (Scalable)

Nginx → Filebeat → Kafka → Proton → ClickHouse/Visualization

Option 3: Hybrid

Nginx → Vector → Proton → Output Destinations

Tutorial: Real-time Nginx Log Analysis

For production environments, share logs securely over NFS: On Nginx Server:

# Install NFS server
sudo apt-get install nfs-kernel-server -y

# Configure export (whitelist Proton server IP)
echo "/var/log/nginx 172.31.24.29(ro,sync,no_subtree_check)" | sudo tee -a /etc/exports

# Export and start
sudo exportfs -ra
sudo systemctl restart nfs-kernel-server

On Proton Server:

# Install NFS client
sudo apt install nfs-common -y

# Mount logs
sudo mkdir -p /mnt/nginx
sudo mount 172.31.17.58:/var/log/nginx /mnt/nginx

Step 1: Create External Stream for Log Files

Proton can directly tail log files and parse timestamps:

CREATE EXTERNAL STREAM nginx_access_log (
    raw string
)
SETTINGS
    type='log',
    log_files='access.log',
    log_dir='/mnt/nginx',
    timestamp_regex='(\[\d{2}\/\w+\/\d{4}:\d{2}:\d{2}:\d{2} \+\d{4}\])',
    row_delimiter='(\n)';

Step 2: Query Live Traffic

View logs in real-time:

SELECT * FROM nginx_access_log;

This query continuously returns new log lines as requests hit your server.

Step 3: Parse and Analyze Logs

Create a parsed view with structured fields:

CREATE MATERIALIZED VIEW nginx_parsed AS
SELECT
    extract(raw, '(\d+\.\d+\.\d+\.\d+)') AS ip,
    parse_datetime64_best_effort(extract(raw, '\[(.*?)\]')) AS timestamp,
    extract(raw, '"(\w+) (.*?) HTTP/.*?"') AS method,
    extract(raw, '"\w+ (.*?) HTTP/.*?"') AS path,
    extract(raw, '" (\d{3}) ') AS status,
    extract(raw, '" \d{3} (\d+)') AS bytes
FROM nginx_access_log;

Step 4: Real-time Analytics

Request Rate by Method:

SELECT
    method,
    count() AS requests,
    bar(requests, 0, 100, 50) AS bar
FROM nginx_parsed
GROUP BY method
LIMIT 10 BY emit_version();

Error Rate Monitoring:

SELECT
    if(status >= 400, '40x+', 'ok') AS status_category,
    count() AS count,
    count() * 100.0 / sum(count()) OVER () AS percentage
FROM nginx_parsed
GROUP BY status_category;

Top Requested Pages (excluding static assets):

SELECT
    path,
    count() AS requests
FROM nginx_parsed
WHERE path NOT LIKE '%.js%'
  AND path NOT LIKE '%.css%'
  AND path NOT LIKE '%.png%'
  AND path NOT LIKE '%.jpg%'
GROUP BY path
ORDER BY requests DESC
LIMIT 10;

Step 5: Windowed Aggregations

Compute metrics per time window:

SELECT
    window_start,
    count() AS requests,
    count_if(status >= 400) AS errors,
    errors * 100.0 / requests AS error_rate,
    avg(bytes) AS avg_bytes
FROM tumble(nginx_parsed, 1m)
GROUP BY window_start;

Alerting Patterns

High Error Rate Alert

CREATE MATERIALIZED VIEW alert_high_error_rate AS
SELECT
    window_start,
    count_if(status >= 500) AS server_errors,
    count() AS total_requests,
    server_errors * 100.0 / total_requests AS error_rate
FROM tumble(nginx_parsed, 1m)
GROUP BY window_start
HAVING error_rate > 5.0;  -- Alert when >5% error rate

Traffic Spike Detection

CREATE MATERIALIZED VIEW alert_traffic_spike AS
SELECT
    window_start,
    count() AS current_requests,
    lag(count(), 1) OVER (ORDER BY window_start) AS previous_requests,
    current_requests * 100.0 / previous_requests AS growth_rate
FROM tumble(nginx_parsed, 1m)
GROUP BY window_start
HAVING growth_rate > 200;  -- Alert when traffic doubles

Slow Response Detection

CREATE MATERIALIZED VIEW alert_slow_requests AS
SELECT
    _tp_time,
    path,
    extract(raw, 'request_time=(\d+\.\d+)') AS response_time
FROM nginx_access_log
WHERE response_time::float64 > 2.0;  -- Alert when >2 seconds

Historical Analysis with CSV Import

For analyzing historical logs, convert and import them:

Step 1: Convert Logs to CSV

Use a parser script to convert Nginx logs:

import re
import csv

def parse_nginx_log(line):
    pattern = r'(\S+) - (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d+) (\d+)'
    match = re.match(pattern, line)
    if match:
        return match.groups()
    return None

with open('access.log') as f_in, open('access.csv', 'w') as f_out:
    writer = csv.writer(f_out)
    writer.writerow(['ip', 'user', 'timestamp', 'method', 'path', 'protocol', 'status', 'bytes'])
    for line in f_in:
        parsed = parse_nginx_log(line)
        if parsed:
            writer.writerow(parsed)

Step 2: Create Historical Table

CREATE TABLE nginx_historical (
    remote_ip ipv4,
    remote_user string,
    date_time datetime64(3),
    http_verb string,
    path string,
    http_version string,
    status uint32,
    bytes uint32
)
PRIMARY KEY (date_time, remote_ip);

Step 3: Import CSV

INSERT INTO nginx_historical
SELECT *
FROM file('access.csv', 'CSVWithNames')
SETTINGS max_insert_threads = 8;

Step 4: Historical Analytics

Error analysis:

SELECT
    if(status >= 400, '40x+', 'ok') AS status_category,
    count() AS count
FROM nginx_historical
GROUP BY status_category;

Traffic patterns by hour:

SELECT
    to_hour(date_time) AS hour,
    count() AS requests
FROM nginx_historical
GROUP BY hour
ORDER BY hour;

Geographic Analysis with IP Enrichment

Enrich logs with geographic data:

-- Create IP info table
CREATE TABLE ip_geolocation (
    ip ipv4,
    city string,
    country string,
    country_name string,
    country_flag_emoji string,
    continent string
);

-- Query with geolocation
SELECT
    ipv4_num_to_string_class_c(n.remote_ip) AS ip,
    count(*) AS requests,
    g.country_name,
    g.country_flag_emoji
FROM nginx_historical AS n
JOIN ip_geolocation AS g ON n.remote_ip = g.ip
GROUP BY ip, country_name, country_flag_emoji
HAVING requests > 100
ORDER BY requests DESC
LIMIT 50;

Visualization with Grafana

Connect Proton to Grafana using the Timeplus Proton plugin:

Install the plugin in Grafana
Configure data source pointing to Proton
Create dashboards with real-time queries

Example Dashboard Queries:

Requests per second: SELECT count() FROM nginx_parsed
P95 Response Time: SELECT quantile(0.95)(response_time) FROM nginx_parsed
Geographic Distribution: Join with IP geolocation table

Performance Tips

Materialize Frequently Queried Fields

Instead of parsing JSON in every query, create a materialized view:

CREATE MATERIALIZED VIEW nginx_structured AS
SELECT
    parse_datetime64_best_effort(extract(raw, '\[(.*?)\]')) AS timestamp,
    extract(raw, '(\d+\.\d+\.\d+\.\d+)') AS ip,
    extract(raw, '"(\w+)') AS method,
    extract(raw, '" (\d{3})') AS status
FROM nginx_access_log;

Use Appropriate Data Types

-- Use ipv4 instead of string for IP addresses
CREATE TABLE logs (
    ip ipv4,  -- More efficient than string
    timestamp datetime64(3),
    status uint16  -- Use smallest type that fits
);

Next Steps

Explore Feature Pipeline for ML feature computation
Learn about Real-time ETL patterns
Set up Change Data Capture for databases

Use Cases

Use Case Overview

Why SQL-based Observability?

Architecture: Nginx Access Log Streaming

Deployment Options

Tutorial: Real-time Nginx Log Analysis

Step 1: Create External Stream for Log Files

Step 2: Query Live Traffic

Step 3: Parse and Analyze Logs

Step 4: Real-time Analytics

Step 5: Windowed Aggregations

Alerting Patterns

High Error Rate Alert

Traffic Spike Detection

Slow Response Detection

Historical Analysis with CSV Import

Step 1: Convert Logs to CSV

Step 2: Create Historical Table

Step 3: Import CSV

Step 4: Historical Analytics

Geographic Analysis with IP Enrichment

Visualization with Grafana

Performance Tips

Materialize Frequently Queried Fields

Use Appropriate Data Types

Next Steps

Build docs developers (and LLMs) love

Use Cases

Documentation Index

​Use Case Overview

​Why SQL-based Observability?

​Architecture: Nginx Access Log Streaming

​Deployment Options

​Tutorial: Real-time Nginx Log Analysis

​Setup: NFS-based Log Sharing

​Step 1: Create External Stream for Log Files

​Step 2: Query Live Traffic

​Step 3: Parse and Analyze Logs

​Step 4: Real-time Analytics

​Step 5: Windowed Aggregations

​Alerting Patterns

​High Error Rate Alert

​Traffic Spike Detection

​Slow Response Detection

​Historical Analysis with CSV Import

​Step 1: Convert Logs to CSV

​Step 2: Create Historical Table

​Step 3: Import CSV

​Step 4: Historical Analytics

​Geographic Analysis with IP Enrichment

​Visualization with Grafana

​Performance Tips

​Materialize Frequently Queried Fields

​Use Appropriate Data Types

​Next Steps

​Related Resources

Build docs developers (and LLMs) love

Use Case Overview

Why SQL-based Observability?

Architecture: Nginx Access Log Streaming

Deployment Options

Tutorial: Real-time Nginx Log Analysis

Setup: NFS-based Log Sharing

Step 1: Create External Stream for Log Files

Step 2: Query Live Traffic

Step 3: Parse and Analyze Logs

Step 4: Real-time Analytics

Step 5: Windowed Aggregations

Alerting Patterns

High Error Rate Alert

Traffic Spike Detection

Slow Response Detection

Historical Analysis with CSV Import

Step 1: Convert Logs to CSV

Step 2: Create Historical Table

Step 3: Import CSV

Step 4: Historical Analytics

Geographic Analysis with IP Enrichment

Visualization with Grafana

Performance Tips

Materialize Frequently Queried Fields

Use Appropriate Data Types

Next Steps

Related Resources