The Historical service is responsible for storing and querying historical data. Historical services cache data segments on local disk and serve queries from that cache as well as from an in-memory cache.

Key Responsibilities

Segment Storage

Caches data segments on local disk in the segment cache

Query Execution

Serves queries from local disk cache and memory-mapped cache

Segment Loading

Pulls segment files from deep storage to local disk

ZooKeeper Integration

Monitors ZooKeeper for segment assignment and announces segment availability

Configuration

For Apache Druid Historical service configuration, see the Historical section of the Configuration reference.

Running the Historical

```
org.apache.druid.cli.Main server historical
```
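On a deployed node, this entry point runs under `java` with the service configuration on the classpath. A minimal sketch, assuming a standard Druid distribution layout (the conf paths and heap size below are illustrative, not recommendations):

```shell
java -Xms8g -Xmx8g \
  -cp "conf/druid/cluster/_common:conf/druid/cluster/data/historical:lib/*" \
  org.apache.druid.cli.Main server historical
```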

Loading and Serving Segments

Each Historical service copies or pulls segment files from deep storage to local disk in an area called the segment cache.
To configure the size and location of the segment cache on each Historical service, set the druid.segmentCache.locations property. For more information, see Segment cache size.
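For example, a single cache location capped at roughly 300 GB could be configured in the Historical's runtime.properties like this (the path and size are illustrative):

```properties
druid.segmentCache.locations=[{"path":"/mnt/druid/segment-cache","maxSize":300000000000}]
```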

How Segment Assignment Works

1. Coordinator creates ZooKeeper entry: The Coordinator controls the assignment of segments to Historicals and the balance of segments between them. It creates ephemeral entries in ZooKeeper under a load queue path.
2. Historical monitors ZooKeeper: Each Historical service maintains a connection to ZooKeeper and watches those paths for segment information.
3. Historical checks segment cache: When a Historical detects a new entry in the ZooKeeper load queue, it checks its own segment cache.
4. Retrieve segment metadata: If the segment is not already in the cache, the Historical first retrieves metadata about it from ZooKeeper, including where the segment is located in deep storage and how to decompress and process it.
5. Pull from deep storage: The Historical pulls down and processes the segment from deep storage.
6. Announce availability: After processing the segment, the Historical announces in ZooKeeper, under a served segments path, that the segment is available to serve queries from the Broker.
Historical services do not communicate directly with each other, nor do they communicate directly with the Coordinator. All coordination happens through ZooKeeper.
To make data from the segment cache available for querying as soon as possible, Historical services search the local segment cache upon startup and advertise the segments found there.
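The load flow above can be sketched as a toy simulation. All names here are illustrative; the real implementation reacts to ZooKeeper ephemeral nodes and watches rather than plain dictionaries:

```python
# Toy simulation of the segment-assignment flow described above.
# `deep_storage`, `segment_cache`, and `served_segments` stand in for
# deep storage, the local segment cache, and the ZooKeeper served
# segments path, respectively.

def handle_load_queue_entry(segment_id, segment_cache, deep_storage, served_segments):
    """React to a new entry in the (simulated) load queue."""
    if segment_id in segment_cache:            # step 3: already cached locally
        served_segments.add(segment_id)        # step 6: announce availability
        return "served-from-cache"
    metadata = deep_storage[segment_id]        # step 4: fetch segment metadata
    segment_cache[segment_id] = metadata       # step 5: pull and process segment
    served_segments.add(segment_id)            # step 6: announce availability
    return "pulled-from-deep-storage"

deep_storage = {"seg-2024-01": {"location": "s3://bucket/seg-2024-01", "codec": "zstd"}}
cache, served = {}, set()
print(handle_load_queue_entry("seg-2024-01", cache, deep_storage, served))  # pulled-from-deep-storage
print(handle_load_queue_entry("seg-2024-01", cache, deep_storage, served))  # served-from-cache
```

The second call returns from the cache without touching deep storage, which mirrors why Historicals re-announce cached segments on startup.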

Loading and Serving Segments from Cache

The segment cache uses memory mapping (mmap). The cache consumes memory from the underlying operating system so Historicals can hold parts of segment files in memory to increase query performance at the data level.
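Memory mapping is an operating-system facility rather than anything Druid-specific. As a minimal illustration, the Python sketch below maps a small "segment" file and reads a slice of it through the OS page cache instead of an explicit read call:

```python
import mmap
import os
import tempfile

def mapped_prefix(path, n):
    """Read the first n bytes of a file through a read-only memory map."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # Slicing the map reads through the OS page cache; pages that
            # are already resident are served without touching disk again.
            return mm[:n]
        finally:
            mm.close()

# Write a small stand-in for a segment file, then map it.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"segment-bytes-" * 1024)
print(mapped_prefix(path, 14))  # b'segment-bytes-'
os.remove(path)
```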

Memory-Mapped Cache Behavior

Available memory for the memory-mapped segment cache is affected by:
  • The size of the Historical JVM
  • Heap and direct memory buffer allocations
  • Other services running on the same operating system
At query time, if the required part of a segment file is available in the memory mapped cache or “page cache”, the Historical re-uses it and reads it directly from memory.
If free operating system memory is close to druid.server.maxSize, segment data is more likely to be available in memory and reduce query times.
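druid.server.maxSize caps the total size of segments a Historical announces, and is typically set to match the summed maxSize of the segment cache locations. An illustrative value:

```properties
druid.server.maxSize=300000000000
```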

Understanding Cache Layers

This memory-mapped segment cache is in addition to other query-level caches. For more information, see Query Caching.

Querying Segments

You can configure a Historical service to log and report metrics for every query it services.

Query Documentation

For information on querying Historical services, see the Querying documentation.

HTTP Endpoints

For a list of API endpoints supported by the Historical, see the Druid API reference.

Architecture Integration

With Coordinator

  • Receives segment assignment instructions via ZooKeeper
  • Does not communicate directly
  • Coordinator manages which segments to load/drop

With Broker

  • Announces segment availability via ZooKeeper
  • Receives and executes queries from Broker
  • Returns query results to Broker for consolidation

With ZooKeeper

  • Monitors load queue path for new segment assignments
  • Announces served segments in served segments path
  • Retrieves segment metadata

With Deep Storage

  • Pulls segment files from deep storage
  • Caches segments locally on disk
  • Decompresses and processes segment files

Performance Considerations

To optimize Historical service performance:
  1. Size segment cache appropriately: Ensure druid.segmentCache.locations has enough space for your working set of segments
  2. Maximize free OS memory: More free memory means more data in the memory-mapped cache
  3. Use SSDs for segment cache: Fast disk I/O improves cold query performance
  4. Monitor cache hit rates: Track how often queries hit memory vs. disk
  5. Configure appropriate replication: Balance query performance with storage costs
Segment Cache vs. Heap Memory

The segment cache is separate from JVM heap memory. It uses OS-level memory mapping, so you need to consider:
  • JVM heap size
  • Direct memory buffers
  • OS page cache size
  • Total available RAM
Ensure your system has enough RAM for all these components.
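As a back-of-the-envelope check, the RAM left over for the OS page cache (and thus for memory-mapped segment data) is roughly total RAM minus heap, direct memory, and other services. The numbers below are illustrative, not sizing recommendations:

```python
GIB = 1024 ** 3

def page_cache_budget(total_ram, jvm_heap, direct_memory, other_services=0):
    """Approximate RAM left for the OS page cache on a Historical host."""
    return total_ram - jvm_heap - direct_memory - other_services

# Example: 64 GiB host, 8 GiB heap, 12 GiB direct memory, 4 GiB other services.
budget = page_cache_budget(64 * GIB, 8 * GIB, 12 * GIB, 4 * GIB)
print(budget // GIB)  # 40
```

The closer this budget comes to the configured segment cache size, the more segment data tends to stay resident in memory at query time.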
