The trigger endpoint starts the ingestion pipeline for a single book identified by its Project Gutenberg integer ID. When called, the service fetches the book's plain-text content from Gutenberg, writes the header and body files to the datalake filesystem, stores a record in the Hazelcast "datalake" IMap, and publishes a notification to the ActiveMQ documents.ingested queue. If the book has already been downloaded (its ID is present in the "log" ISet), the download step is skipped and the existing data is left untouched, making the operation safe to call multiple times.
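The notification step can be sketched as follows. This is a hypothetical illustration: the JSON message schema and the use of the stomp.py STOMP client are assumptions, since this page does not document the message format.

```python
# Hypothetical sketch of the documents.ingested notification; the message
# schema (bookId/headerPath/bodyPath) is an assumption, not documented here.
import json

def ingestion_message(book_id: int, header_path: str, body_path: str) -> str:
    """Build the JSON body published to the documents.ingested queue."""
    return json.dumps({
        "bookId": book_id,
        "headerPath": header_path,
        "bodyPath": body_path,
    })

# Uncomment to publish against a running ActiveMQ broker (stomp.py client):
# import stomp
# conn = stomp.Connection([("localhost", 61613)])
# conn.connect(wait=True)
# conn.send(destination="documents.ingested",
#           body=ingestion_message(1342, "…_header.txt", "…_body.txt"))
```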
Path parameters
id
The Project Gutenberg numeric book ID. For example, 1342 for Pride and Prejudice or 84 for Frankenstein.

Example request
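A request might look like the following. Note that the host, port, and route (`/api/datalake/{id}`) are assumptions for illustration, since the endpoint's exact path is not shown on this page.

```python
# Hypothetical example request; host, port, and route are assumptions.
import urllib.request

book_id = 1342
url = f"http://localhost:8080/api/datalake/{book_id}"

# Uncomment to trigger ingestion against a running instance:
# req = urllib.request.Request(url, method="POST")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```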
What happens on success
When the book is not yet in the datalake, the service:

- Downloads https://www.gutenberg.org/cache/epub/{id}/pg{id}.txt
- Splits the content into a header and a body
- Writes both parts to the filesystem under datalake/{date}/{hour}/{id}_header.txt and datalake/{date}/{hour}/{id}_body.txt
- Stores the entry in the Hazelcast "datalake" IMap
- Marks the book as downloaded in the "log" ISet
- Publishes a message to the ActiveMQ documents.ingested queue
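The split and file-layout steps above can be sketched as follows. This is a minimal sketch under stated assumptions: splitting on the standard Gutenberg `*** START OF` marker and formatting the date as YYYYMMDD and the hour as a two-digit value are guesses about details this page does not pin down.

```python
from datetime import datetime
from pathlib import Path

# Assumption: the header/body split happens at the standard Gutenberg marker.
START_MARKER = "*** START OF"

def split_book(text: str) -> tuple[str, str]:
    """Split a raw Gutenberg text into header and body at the START marker."""
    idx = text.find(START_MARKER)
    if idx == -1:
        return "", text              # no marker: treat everything as body
    end = text.index("\n", idx) + 1  # header ends after the marker line
    return text[:end], text[end:]

def datalake_paths(book_id: int, now: datetime) -> tuple[Path, Path]:
    """Build the datalake/{date}/{hour}/ file paths (formats are assumptions)."""
    base = Path("datalake") / now.strftime("%Y%m%d") / now.strftime("%H")
    return base / f"{book_id}_header.txt", base / f"{book_id}_body.txt"
```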
Idempotency
If the book ID is already present in the download log, the service skips the download and storage steps entirely. Calling this endpoint multiple times for the same book is safe.

Ingestion may be paused by the system when the indexing buffer is full. In that case, the periodic scheduler will hold back further processing until capacity is available. Direct calls to this endpoint still execute immediately regardless of the pause state.
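The idempotency guard can be sketched with a plain Python set standing in for the Hazelcast "log" ISet; the function and parameter names here are illustrative, not the service's actual API.

```python
def trigger_ingestion(book_id: int, log: set, fetch, store) -> bool:
    """Run the ingestion steps unless book_id was already ingested.

    `log` stands in for the Hazelcast "log" ISet; `fetch` and `store`
    stand in for the download and storage steps. Returns True if work
    was done, False if the book was skipped.
    """
    if book_id in log:
        return False             # already ingested: leave existing data untouched
    text = fetch(book_id)
    store(book_id, text)
    log.add(book_id)             # mark as downloaded so repeat calls are no-ops
    return True
```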