

The trigger endpoint starts the ingestion pipeline for a single book identified by its Project Gutenberg integer ID. When called, the service fetches the book's plain-text content from Project Gutenberg, writes the header and body files to the datalake filesystem, stores a record in the Hazelcast "datalake" IMap, and publishes a notification to the ActiveMQ documents.ingested queue. If the book has already been downloaded (its ID is present in the "log" ISet), the download step is skipped and the existing data is left untouched, so the operation is safe to call multiple times.

Path parameters

book_id
integer
required
The Project Gutenberg numeric book ID. For example, 1342 for Pride and Prejudice or 84 for Frankenstein.
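The download URL the service derives from this parameter can be sketched as follows; the validation logic shown here is an illustrative assumption, not taken from the implementation.

```python
def gutenberg_text_url(book_id: int) -> str:
    """Build the plain-text download URL for a Project Gutenberg book ID.

    The URL pattern matches the one used by the ingestion service;
    the positivity check is an illustrative assumption.
    """
    if not isinstance(book_id, int) or book_id <= 0:
        raise ValueError("book_id must be a positive integer")
    return f"https://www.gutenberg.org/cache/epub/{book_id}/pg{book_id}.txt"
```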

Example request

curl -X POST http://localhost:7001/ingest/1342

What happens on success

When the book is not yet in the datalake, the service:
  1. Downloads https://www.gutenberg.org/cache/epub/{id}/pg{id}.txt
  2. Splits the content into a header and a body
  3. Writes both parts to the filesystem under datalake/{date}/{hour}/{id}_header.txt and datalake/{date}/{hour}/{id}_body.txt
  4. Stores the entry in the Hazelcast "datalake" IMap
  5. Marks the book as downloaded in the "log" ISet
  6. Publishes a message to the ActiveMQ documents.ingested queue
The endpoint returns HTTP 200 on success and an HTTP 4xx or 5xx status code if an error occurs.
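Steps 2 and 3 can be sketched as below, assuming the split happens at Gutenberg's conventional "*** START OF THE PROJECT GUTENBERG EBOOK ... ***" marker and that {date} is formatted as YYYYMMDD; the marker, date format, and helper names are assumptions for illustration.

```python
import re
from datetime import datetime

# Gutenberg texts conventionally begin the body after a line like
# "*** START OF THE PROJECT GUTENBERG EBOOK ... ***"; treating that
# marker as the header/body boundary is an assumption here.
START_MARKER = re.compile(r"\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*")

def split_header_body(text: str) -> tuple[str, str]:
    match = START_MARKER.search(text)
    if match is None:
        return "", text  # no marker found: store everything as body
    return text[:match.end()], text[match.end():]

def datalake_paths(book_id: int, now: datetime) -> tuple[str, str]:
    # The YYYYMMDD/HH layout is an assumed reading of {date}/{hour}.
    date, hour = now.strftime("%Y%m%d"), now.strftime("%H")
    base = f"datalake/{date}/{hour}/{book_id}"
    return f"{base}_header.txt", f"{base}_body.txt"
```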

Idempotency

If the book ID is already present in the download log, the service skips the download and storage steps entirely. Calling this endpoint multiple times for the same book is safe.
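The check can be pictured with a plain set standing in for the Hazelcast "log" ISet; the function and parameter names are illustrative stand-ins based on the description above, not the service's actual code.

```python
def ingest(book_id: int, log: set, download) -> bool:
    """Run the pipeline at most once per book.

    `log` stands in for the Hazelcast "log" ISet and `download` for the
    fetch-and-store steps; both are illustrative stand-ins.
    """
    if book_id in log:
        return False  # already ingested: skip, existing data untouched
    download(book_id)
    log.add(book_id)
    return True
```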

Ingestion may be paused when the indexing buffer is full; in that case, the periodic scheduler holds back further processing until capacity is available. Direct calls to this endpoint, however, execute immediately regardless of the pause state.
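One way to picture the pause behaviour described above (the buffer-size gate and parameter names are illustrative assumptions, not the service's actual mechanism):

```python
def scheduler_should_process(buffer_size: int, capacity: int, direct_call: bool) -> bool:
    # Direct endpoint calls bypass the pause entirely; the periodic
    # scheduler only proceeds while the indexing buffer has capacity.
    if direct_call:
        return True
    return buffer_size < capacity
```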
