Loki provides scalable log aggregation for the Kimbernetes cluster, storing logs collected by Grafana Alloy. It’s deployed in SingleBinary mode with MinIO for S3-compatible object storage, optimized for Kubernetes environments.
HelmRelease Configuration
overlays/base/grafana/grafana-loki/helm-release.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: grafana-loki
spec:
  timeout: 15m
  chart:
    spec:
      chart: loki
      sourceRef:
        kind: HelmRepository
        name: grafana
      version: 6.49.0
  interval: 24h
  releaseName: loki-monolith
The release is named loki-monolith to distinguish it from distributed deployments.
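In-cluster, clients reach the single-binary deployment through the release's service (the observability namespace is assumed here, matching the port-forward commands later on this page):

```
http://loki-monolith.observability.svc:3100
```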
Deployment Architecture
Loki runs in SingleBinary mode with all components in a single process:
overlays/base/grafana/grafana-loki/helm-release.yaml
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
  persistence:
    storageClass: local-storage
  extraArgs:
    - -store.retention=31d
Why SingleBinary Mode?
Simplicity: Single deployment to manage, easier troubleshooting
Resource Efficiency: Lower overhead compared to microservices mode
Sufficient for Most Workloads: Handles moderate log volumes effectively
Easy Upgrades: No complex migration between versions
For high-scale deployments (>100GB/day), consider switching to microservices mode with distributed components.
Storage Configuration
Loki uses a hybrid storage approach:
Index and Cache Storage
loki:
  schemaConfig:
    configs:
      - from: "2024-04-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
TSDB (Time Series Database) index format provides better query performance and compression compared to legacy BoltDB.
Object Storage (MinIO)
overlays/base/grafana/grafana-loki/helm-release.yaml
minio:
  enabled: true
  persistence:
    size: 30Gi
    storageClass: local-storage
MinIO provides S3-compatible storage for log chunks:
Capacity: 30Gi
Storage Class: local-storage
Access: Internal cluster service
MinIO exposes two services:
# S3 API endpoint (used by Loki)
loki-monolith-minio.observability.svc:9000
# Web UI (for administration)
loki-monolith-minio-console.observability.svc:9001
Access the MinIO console:
kubectl port-forward -n observability svc/loki-monolith-minio-console 9001:9001
Persistent Volume
A PV is configured for WAL and temporary data:
overlays/kimawesome/infrastructure/observability/grafana-loki/loki-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: loki-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  hostPath:
    path: /mnt/loki-data
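Note that hostPath volumes are not pinned to a particular node. On a multi-node cluster, a local volume with node affinity is the safer equivalent; a sketch, with a hypothetical node name:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: loki-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  local:
    path: /mnt/loki-data
  nodeAffinity:            # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-node-1   # hypothetical node name
```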
Configuration
Authentication
loki:
  auth_enabled: false
Authentication is disabled for internal cluster use. For multi-tenant deployments, set auth_enabled: true and configure tenants.
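If multi-tenancy is ever needed, the switch is small on the Loki side, but every client must then identify its tenant. A minimal sketch (tenant name hypothetical):

```yaml
loki:
  auth_enabled: true
```

With auth enabled, Loki rejects requests without an X-Scope-OrgID header (e.g. X-Scope-OrgID: team-a), so Alloy's loki.write and the Grafana datasource must be configured to send it.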
Replication
loki:
  commonConfig:
    replication_factor: 1
Single replica configuration suitable for development and small-scale production.
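For a more resilient setup, Loki's documentation pairs a replication factor of 3 with at least 3 replicas. A sketch of what that would change here (assumes the shared MinIO object store stays as-is):

```yaml
loki:
  commonConfig:
    replication_factor: 3   # each stream is written to 3 ingesters
singleBinary:
  replicas: 3
```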
Limits and Features
overlays/base/grafana/grafana-loki/helm-release.yaml
loki:
  pattern_ingester:
    enabled: true
  limits_config:
    allow_structured_metadata: true
    volume_enabled: true
  ruler:
    enable_api: true
Pattern Ingester
Automatically extracts log patterns for faster queries:
# Query detected patterns
{namespace="default"} | pattern
Patterns help identify common log structures without indexing every field.
Structured Metadata
Allows attaching metadata that doesn’t affect log stream cardinality:
# Filter by structured metadata
{namespace="default"} | trace_id="abc123"
Volume Queries
Enables volume range queries for analyzing log throughput:
# Get log volume by app
sum by (app) (bytes_over_time({namespace="default"}[1h]))
Ruler API
Enables LogQL-based alerting rules, e.g. via the Loki Operator's AlertingRule CRD:
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: high-error-rate
spec:
  groups:
    - name: loki-alerts   # group name is arbitrary
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate({namespace="default"} |= "error" [5m])) > 10
Retention Policy
Logs are retained for 31 days:
singleBinary:
  extraArgs:
    - -store.retention=31d
Retention is applied to the entire Loki instance. For per-tenant retention, configure limits_config.retention_period with authentication enabled.
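Per-tenant retention is driven by the compactor plus runtime overrides. A sketch, assuming the chart renders loki.limits_config and a runtime overrides file (tenant name hypothetical):

```yaml
loki:
  compactor:
    retention_enabled: true
    delete_request_store: s3
  limits_config:
    retention_period: 31d        # cluster-wide default
  runtimeConfig:
    overrides:
      team-a:                    # hypothetical tenant
        retention_period: 7d
```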
LogQL Query Examples
Filter by Namespace
Application Logs
Error Logs
JSON Parsing
Rate of Errors
Top 10 Log Volumes
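The basic examples above, written out (the app label and the JSON level field are assumptions about your log schema):

```
# Filter by namespace
{namespace="default"}

# Application logs (assumes an "app" label)
{namespace="default", app="my-app"}

# Error logs
{namespace="default"} |= "error"

# JSON parsing (assumes a "level" field)
{namespace="default"} | json | level="error"

# Rate of errors
sum(rate({namespace="default"} |= "error" [5m]))

# Top 10 log volumes
topk(10, sum by (app) (bytes_over_time({namespace="default"}[1h])))
```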
Advanced LogQL
Pattern Matching
{namespace="default"}
  | pattern `<ip> - - <_> "<method> <path> <_>" <status> <_>`
  | status >= 400
Extract structured data from unstructured logs.
Metric Queries
# Request rate by status code
sum by (status_code) (
  rate(
    {namespace="default"}
      | json
      | __error__="" [5m]
  )
)
Convert logs to metrics for alerting.
Line Format
{namespace="default"}
  | json
  | line_format "{{.level}} [{{.app}}] {{.message}}"
Reformat log lines for readability.
Label Format
{namespace="default"}
  | json
  | label_format env="{{.environment}}"
Create new labels from log content.
Accessing Loki
Via Grafana Datasource
Loki is pre-configured as a Grafana datasource. Access through:
Grafana → Explore → Select “Loki” datasource
Direct API Access
Port forward to the Loki service:
kubectl port-forward -n observability svc/loki-monolith 3100:3100
Query API:
# Query logs from the last hour (start takes a nanosecond Unix epoch or an RFC3339 timestamp)
curl -G -s "http://localhost:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={namespace="default"}' \
  --data-urlencode "start=$(($(date +%s) - 3600))000000000" | jq
# Get labels
curl -s http://localhost:3100/loki/api/v1/labels | jq
LogCLI Tool
Use Grafana’s LogCLI for command-line queries:
# Install LogCLI
go install github.com/grafana/loki/cmd/logcli@latest
# Forward port and query
export LOKI_ADDR=http://localhost:3100
logcli query '{namespace="default"}'
Use specific label filters
# Good: Specific label selector
{namespace="default", app="my-app"}
# Bad: Broad selector
{namespace=~".*"}
More specific label filters reduce the number of log streams Loki must search.
Use short time ranges
# Good: Recent time range
{namespace="default"}[1h]
# Bad: Large time range
{namespace="default"}[30d]
Shorter time ranges query fewer chunks and return faster.
Filter before parsing
# Good: Filter early in pipeline
{namespace="default"} |= "error" | json
# Bad: Parse before filtering
{namespace="default"} | json | status >= 400
Line filters (|=, !=, |~, !~) are faster than parsers.
Resource Allocation
For high log volumes, increase resources:
singleBinary:
  resources:
    limits:
      memory: 4Gi
      cpu: 2000m
    requests:
      memory: 2Gi
      cpu: 1000m
Cache Configuration
The deployment includes built-in caching:
overlays/base/grafana/grafana-loki/helm-release.yaml
chunksCache:
  resources:
    limits:
      memory: "1Gi"
    requests:
      memory: "512Mi"
resultsCache:
  resources:
    limits:
      memory: "256Mi"
    requests:
      memory: "128Mi"
Troubleshooting
Check pod status:
kubectl get pod -n observability -l app.kubernetes.io/name=loki
kubectl describe pod -n observability -l app.kubernetes.io/name=loki
View logs:
kubectl logs -n observability -l app.kubernetes.io/name=loki
Common issues:
PVC not bound (check storage class)
MinIO not ready (check MinIO pods)
Resource limits too low
Verify Alloy is sending logs:
# Check Alloy write metrics
kubectl port-forward -n observability ds/grafana-monitoring-alloy-logs 12345:12345
curl http://localhost:12345/metrics | grep loki_write_sent_bytes_total
Check Loki ingestion:
kubectl port-forward -n observability svc/loki-monolith 3100:3100
curl http://localhost:3100/metrics | grep loki_ingester_memory_streams
Check query performance:
kubectl port-forward -n observability svc/loki-monolith 3100:3100
curl http://localhost:3100/metrics | grep loki_query_duration_seconds
Optimize queries:
Use more specific label selectors
Reduce time ranges
Add line filters before parsers
Check MinIO storage usage:
kubectl exec -n observability -it deploy/loki-monolith-minio -- df -h /export
Options:
Reduce retention period
Increase MinIO PVC size
Migrate to external S3
Migrating to Distributed Mode
For production at scale, consider distributed deployment:
Plan the Migration
Export existing logs or accept data loss
Plan for separate read/write/backend components
Configure object storage (S3, GCS, etc.)
Update HelmRelease
deploymentMode: Distributed
read:
  replicas: 3
write:
  replicas: 3
backend:
  replicas: 3
Update Datasource
Point to the new gateway service:
url: http://loki-gateway.observability.svc:80
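If the datasource is provisioned declaratively, the change is one line in the provisioning file; a sketch in Grafana's standard datasource provisioning format:

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-gateway.observability.svc:80
```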
Next Steps
Visualize in Grafana: Create dashboards for log analytics
Configure Alloy: Customize log collection pipelines
Add Metrics: Correlate logs with metrics
Overview: Return to observability architecture