Key Responsibilities
Segment Storage
Caches data segments on local disk in the segment cache
Query Execution
Serves queries from local disk cache and memory-mapped cache
Segment Loading
Pulls segment files from deep storage to local disk
ZooKeeper Integration
Monitors ZooKeeper for segment assignment and announces segment availability
Configuration
For Apache Druid Historical service configuration, see Running the Historical.
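As a configuration illustration: the segment cache is configured through druid.segmentCache.locations, a JSON array of cache directories. Each entry has a path and a maxSize, given here in bytes. The Python below is a hypothetical helper, not part of Druid, that renders that property line and totals the configured capacity. The directory paths and sizes are placeholders. To our understanding, recent Druid releases default druid.server.maxSize to the sum of these maxSize values. Treat that as an assumption and check the configuration reference for your version.

```python
import json

# Hypothetical cache directories and sizes (bytes); adjust to your hardware.
locations = [
    {"path": "/mnt/disk1/druidSegments", "maxSize": 300 * 1024 ** 3},
    {"path": "/mnt/disk2/druidSegments", "maxSize": 300 * 1024 ** 3},
]


def segment_cache_property(locs):
    """Render the locations list as a runtime.properties line (hypothetical helper)."""
    return "druid.segmentCache.locations=" + json.dumps(locs)


def total_cache_bytes(locs):
    """Total configured segment cache capacity across all locations."""
    return sum(loc["maxSize"] for loc in locs)


print(segment_cache_property(locations))
print(total_cache_bytes(locations))
```

Splitting the cache across several disks spreads segment I/O across them. The total is the figure to compare against how much segment data you expect the Historical to be assigned.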
Loading and Serving Segments
Each Historical service copies or pulls segment files from deep storage to local disk in an area called the segment cache. To configure the size and location of the segment cache on each Historical service, set the druid.segmentCache.locations property. For more information, see Segment cache size.
How Segment Assignment Works
Coordinator creates ZooKeeper entry
The Coordinator controls the assignment of segments to Historicals and the balance of segments between Historicals. The Coordinator creates ephemeral entries in ZooKeeper in a load queue path.
Historical monitors ZooKeeper
Each Historical service maintains a connection to ZooKeeper, watching those paths for segment information.
Historical checks segment cache
When a Historical service detects a new entry in the ZooKeeper load queue, it checks its own segment cache.
Retrieve segment metadata
If no information about the segment exists in the cache, the Historical service first retrieves metadata about the segment from ZooKeeper, including where the segment is located in deep storage and how to decompress and process it.
Historical services do not communicate directly with each other, nor do they communicate directly with the Coordinator. All coordination happens through ZooKeeper.
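The assignment steps above can be simulated in a few lines. This is a minimal sketch under stated assumptions. ZooKeeper is replaced by an in-memory dictionary of paths (FakeZooKeeper), and the Historical polls instead of using real watches. The znode prefixes, the loadSpec payload, and the announce-then-delete handling are hypothetical choices for illustration, not Druid's actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical znode prefixes; real paths derive from Druid's ZooKeeper path settings.
LOAD_QUEUE = "/druid/loadQueue"
SERVED = "/druid/segments"


@dataclass
class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper: maps znode paths to payloads."""
    nodes: dict = field(default_factory=dict)

    def create(self, path, data):
        self.nodes[path] = data

    def children(self, parent):
        """Return all paths under parent, sorted."""
        prefix = parent + "/"
        return sorted(p for p in self.nodes if p.startswith(prefix))

    def delete(self, path):
        self.nodes.pop(path, None)


def coordinator_assign(zk, historical, segment_id, metadata):
    """Coordinator side: write a load request under the Historical's queue path."""
    zk.create(f"{LOAD_QUEUE}/{historical}/{segment_id}", metadata)


@dataclass
class Historical:
    name: str
    segment_cache: dict = field(default_factory=dict)

    def process_load_queue(self, zk):
        """Historical side: handle queued requests; return newly loaded segment ids."""
        loaded = []
        for path in zk.children(f"{LOAD_QUEUE}/{self.name}"):
            segment_id = path.rsplit("/", 1)[1]
            # Check the local segment cache before doing any work.
            if segment_id not in self.segment_cache:
                # Read segment metadata (e.g. deep storage location) from the entry.
                metadata = zk.nodes[path]
                # A real Historical would now pull and decompress the segment.
                self.segment_cache[segment_id] = metadata["loadSpec"]
                loaded.append(segment_id)
            # Announce availability and clear the request (assumed behavior).
            zk.create(f"{SERVED}/{self.name}/{segment_id}", {})
            zk.delete(path)
        return loaded


zk = FakeZooKeeper()
coordinator_assign(zk, "hist1", "wiki_2024-01-01",
                   {"loadSpec": {"type": "local", "path": "/deep/wiki/index.zip"}})
print(Historical("hist1").process_load_queue(zk))  # ['wiki_2024-01-01']
```

Note that the Coordinator and the Historical never call each other in this sketch. They only read and write shared paths, mirroring the rule that all coordination happens through ZooKeeper.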
Loading and Serving Segments from Cache
The segment cache uses memory mapping (mmap). The cache consumes memory from the underlying operating system so Historicals can hold parts of segment files in memory to increase query performance at the data level.
Memory-Mapped Cache Behavior
The in-memory segment cache is affected by:
- Size of the Historical JVM
- Heap / direct memory buffers
- Other services on the operating system itself
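A back-of-the-envelope estimate makes these factors concrete. This sketch assumes the page cache gets whatever RAM the JVM heap, direct memory buffers, and other services leave over. It ignores kernel overhead and access patterns, so the result is an upper bound, not a prediction. It then compares that budget with druid.server.maxSize, the maximum bytes of segments the Historical wants assigned to it. The function names and machine sizes are hypothetical.

```python
GiB = 1024 ** 3


def page_cache_budget(total_ram, jvm_heap, direct_memory, other_services):
    """Bytes left for the OS page cache after the listed consumers (floored at 0)."""
    return max(0, total_ram - jvm_heap - direct_memory - other_services)


def fraction_in_memory(budget, server_max_size):
    """Upper bound on the share of served segment bytes that fits in the page cache."""
    if server_max_size <= 0:
        raise ValueError("server_max_size must be positive")
    return min(1.0, budget / server_max_size)


# Hypothetical machine: 64 GiB RAM, 8 GiB heap, 4 GiB direct memory, 4 GiB other processes.
budget = page_cache_budget(64 * GiB, 8 * GiB, 4 * GiB, 4 * GiB)
print(budget // GiB)                         # 48
print(fraction_in_memory(budget, 96 * GiB))  # 0.5
```

In this example, at most half of the assigned segment data can be resident in memory at once. Queries that touch the rest will read from disk.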
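At query time, if the required part of a segment file is available in the memory-mapped cache, or “page cache”, the Historical reuses it and reads it directly from memory. If it is not, the Historical reads that part of the segment from disk, and the newly loaded data may evict other segment data from memory.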
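The OS mechanism behind this can be shown with Python's standard mmap module. Druid itself is written in Java. This sketch only illustrates the general technique of memory-mapping a file and reading a byte range from it. The operating system faults the touched pages in from disk on first access and may serve later reads from the page cache. The file contents and the read_range helper are hypothetical.

```python
import mmap
import os
import tempfile


def read_range(path, offset, length):
    """Read bytes [offset, offset + length) through a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing touches only the pages covering this range.
            return mm[offset:offset + length]


# A throwaway file standing in for a segment file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header" + b"columnA" + b"columnB")
    path = tmp.name

try:
    print(read_range(path, 6, 7))  # b'columnA'
finally:
    os.remove(path)
```

The application never copies the whole file into its own heap. Whether a read is served from memory or disk is decided by the kernel, which is why free OS memory, not JVM heap, governs this cache.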
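If free operating system memory is close to druid.server.maxSize, segment data is more likely to be available in memory, which reduces query times. Conversely, the less free memory the operating system has, the more likely a Historical is to read segments from disk.
Understanding Cache Layers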
This memory-mapped segment cache is in addition to other query-level caches. For more information, see Query Caching.
Querying Segments
You can configure a Historical service to log and report metrics for every query it services.
Query Documentation
For information on querying Historical services, see the Querying documentation.
HTTP Endpoints
For a list of API endpoints supported by the Historical, see:
Architecture Integration
With Coordinator
- Receives segment assignment instructions via ZooKeeper
- Does not communicate directly with the Coordinator
- Coordinator manages which segments to load/drop
With Broker
- Announces segment availability via ZooKeeper
- Receives and executes queries from Broker
- Returns query results to Broker for consolidation
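The Broker's consolidation step can be pictured with a toy example. Each Historical returns partial aggregates for the segments it scanned, and the Broker combines them. The sketch assumes a simple count-per-key result shape, which is hypothetical and not Druid's wire format, and sums the partials. Summing is only valid for additive aggregations such as counts and sums. An average, for instance, would need its sums and counts merged separately.

```python
from collections import Counter


def merge_partial_results(partials):
    """Combine per-Historical partial counts by summing values per key."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts for matching keys
    return dict(merged)


# Hypothetical partial results from two Historicals.
hist_a = {"en": 120, "fr": 30}
hist_b = {"en": 80, "de": 15}
print(merge_partial_results([hist_a, hist_b]))  # {'en': 200, 'fr': 30, 'de': 15}
```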
With ZooKeeper
- Monitors load queue path for new segment assignments
- Announces served segments in served segments path
- Retrieves segment metadata
With Deep Storage
- Pulls segment files from deep storage
- Caches segments locally on disk
- Decompresses and processes segment files
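The deep storage steps above (pull, cache locally, decompress) can be sketched with the standard library. This is a minimal sketch assuming the segment is stored in deep storage as a single zip archive, and that "pulling" is a plain file copy. Real deep storage backends use their own clients. pull_segment is a hypothetical helper, and meta.smoosh is only a placeholder file name.

```python
import os
import shutil
import tempfile
import zipfile


def pull_segment(deep_storage_zip, cache_root, segment_id):
    """Copy a zipped segment into the local cache and unpack it; return its directory."""
    segment_dir = os.path.join(cache_root, segment_id)
    if os.path.isdir(segment_dir):
        return segment_dir  # already cached locally: skip the pull
    os.makedirs(segment_dir)
    local_zip = os.path.join(segment_dir, "download.zip")
    shutil.copyfile(deep_storage_zip, local_zip)  # stand-in for the deep storage pull
    with zipfile.ZipFile(local_zip) as zf:
        zf.extractall(segment_dir)  # decompress the segment files
    os.remove(local_zip)
    return segment_dir


# Demo: temporary directories stand in for deep storage and the segment cache.
with tempfile.TemporaryDirectory() as deep, tempfile.TemporaryDirectory() as cache:
    archive = os.path.join(deep, "index.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("meta.smoosh", "placeholder")
    path = pull_segment(archive, cache, "wiki_2024-01-01")
    print(sorted(os.listdir(path)))  # ['meta.smoosh']
```

Checking the cache directory first mirrors the Historical's behavior of consulting its segment cache before contacting deep storage.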