Complete sift-kg pipeline output from 9 Wikipedia articles documenting the FTX cryptocurrency exchange collapse, producing a graph of 373 entities and 1,184 relations after deduplication.
Overview
This example demonstrates knowledge graph extraction from journalistic and encyclopedic content, with emphasis on the entity resolution workflow to merge duplicate entities across multiple documents.
View Interactive Graph
Open examples/ftx/output/graph.html in your browser
Source Documents
9 text files covering FTX, Alameda Research, Binance, and key people
Quick Start
Pipeline Output
The output/ directory contains the complete pipeline results:
Pipeline Statistics
| Metric | Value |
|---|---|
| Input Documents | 9 text files (~148K total) |
| Topics Covered | FTX, Alameda Research, Binance, Sam Bankman-Fried, and other key figures |
| Raw Entities Extracted | ~777 entities from LLM |
| After Pre-dedup (semhash) | 750 entities (27 deterministic merges) |
| After Build + Postprocess | 432 entities, 1,201 relations |
| After Resolution (3 passes) | 373 entities, 1,184 relations (59 entities merged via LLM + human review) |
| Final Entity Descriptions | 100 entity profiles in narrative |
| Model Used | claude-haiku-4-5-20251001 |
| Total Cost | ~$0.28 (extraction was separate) |
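
The stage-to-stage numbers in the table can be cross-checked with a few lines of arithmetic. The sketch below only restates figures from the table; the `stages` list and the `stage_deltas` helper are illustrative names, not part of sift-kg, and the raw count of 777 is approximate in the source.

```python
# Entity and relation counts copied from the Pipeline Statistics table.
# Relation counts are None where the table does not report them.
stages = [
    ("raw extraction", 777, None),          # reported as "~777" in the table
    ("pre-dedup (semhash)", 750, None),
    ("build + postprocess", 432, 1201),
    ("resolution (3 passes)", 373, 1184),
]


def stage_deltas(stages):
    """Return (stage name, entities removed since the previous stage) pairs."""
    deltas = []
    for (_, prev_entities, _), (name, entities, _) in zip(stages, stages[1:]):
        deltas.append((name, prev_entities - entities))
    return deltas


deltas = stage_deltas(stages)
overall_reduction = 1 - stages[-1][1] / stages[0][1]
relations_removed = stages[-2][2] - stages[-1][2]

for name, removed in deltas:
    print(f"{name}: {removed} fewer entities")
print(f"overall entity reduction: {overall_reduction:.1%}")
print(f"relations removed during resolution: {relations_removed}")
```

The 27 and 59 deltas match the merge counts stated in the table; the largest drop (750 to 432) happens in the build and postprocess step rather than in resolution.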
Entity Resolution Workflow
This example showcases the full deduplication pipeline:
1. Automatic Semantic Deduplication
During extraction, semantic hashing automatically merges near-identical entities:
- Before: 777 raw entities
- After: 750 entities (27 deterministic merges)
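
To give a feel for what a deterministic merge of near-identical names looks like, here is a minimal standard-library sketch. It is not semhash's algorithm and not sift-kg's code: it substitutes a simple normalization key (case-folding, accent and punctuation stripping) for semantic hashing, and the `normalize` and `merge_near_identical` helpers and the sample names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(name: str) -> str:
    """Hypothetical normalization key: case-fold, strip accents and punctuation."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.casefold())
    return text.strip()


def merge_near_identical(names):
    """Group names that share a key; the first spelling seen becomes canonical."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    # Map each canonical spelling to the variants folded into it.
    return {variants[0]: variants[1:] for variants in groups.values()}


# Illustrative input only; these are not rows from the real extraction output.
raw = ["FTX", "ftx", "Alameda Research", "Alameda  Research.", "Binance"]
merged = merge_near_identical(raw)
print(len(raw), "->", len(merged))
```

Because the key is computed the same way on every run, the merges are reproducible and need no LLM call, which is the property the "deterministic merges" wording above points to.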
2. Build + Postprocess
Graph construction with normalization and filtering:
- Result: 432 entities, 1,201 relations
3. LLM-Assisted Resolution
Three passes of sift resolve to identify remaining duplicates.
Running sift resolve generates merge_proposals.yaml with candidate merges.
4. Human Review
5. Apply Merges
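
Conceptually, applying approved merges means rewriting every alias to its canonical entity and collapsing relations that become identical. The sketch below illustrates that idea on a toy graph; it is not sift-kg's implementation, and the `apply_merges` function, the alias-to-canonical dictionary shape, and the `FOUNDED` relation label are assumptions made for illustration.

```python
def apply_merges(entities, relations, merges):
    """Rewrite aliases to canonical names and drop relations that become duplicates.

    entities:  set of entity names
    relations: list of (source, relation_type, target) triples
    merges:    dict mapping alias -> canonical name (hypothetical shape)
    """
    def canonical(name):
        return merges.get(name, name)

    merged_entities = {canonical(e) for e in entities}
    seen = set()
    merged_relations = []
    for source, rel_type, target in relations:
        triple = (canonical(source), rel_type, canonical(target))
        if triple not in seen:  # identical triples collapse after merging
            seen.add(triple)
            merged_relations.append(triple)
    return merged_entities, merged_relations


# Toy graph with one duplicate person entity.
entities = {"Sam Bankman-Fried", "SBF", "FTX", "Alameda Research"}
relations = [
    ("Sam Bankman-Fried", "FOUNDED", "FTX"),
    ("SBF", "FOUNDED", "FTX"),
    ("SBF", "FOUNDED", "Alameda Research"),
]
approved = {"SBF": "Sam Bankman-Fried"}  # a merge accepted during human review

new_entities, new_relations = apply_merges(entities, relations, approved)
print(len(new_entities), len(new_relations))
```

Collapsing duplicate triples like this is one plausible reason a relation count can fall when entities are merged, as it does in this example (1,201 to 1,184). The sketch does not handle chained merges, where an alias maps to another alias.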
Key Insights from the Graph
The generated narrative (narrative.md) provides:
- Overview — High-level synthesis of the FTX collapse timeline
- Entity descriptions — AI-generated profiles for 100 key entities (companies, people, events)
- Relationship mapping — How FTX, Alameda, Binance, and key figures are connected
Re-running the Example
Option 1: Build from Existing Extractions (Free)
Use the pre-extracted entities, with no LLM API calls.
Option 2: Full Pipeline from Scratch
Re-extract entities from the source documents.
Cost Breakdown
The ~$0.28 total cost includes:
- Resolution: LLM calls across 3 passes to identify duplicates
- Narration: LLM calls to generate entity descriptions and overview
Extraction was run separately and is not included in this total. The cost varies with:
- Number of entities to resolve
- Model chosen (Haiku vs GPT-4o-mini vs others)
- Number of resolution passes
Use Cases
This example pattern works well for:
- Investigative journalism: Map connections across news articles
- Business intelligence: Track companies, people, and events across sources
- Historical analysis: Document timelines and relationships in major events
- Due diligence: Aggregate information about entities from multiple sources
Merge Proposals File
The merge_proposals.yaml file is a key artifact. It:
- Is generated by sift resolve
- Can be manually edited before applying
- Supports version control and collaboration
- Is applied with sift apply-merges
Next Steps
Explore Other Examples
See Transformers and Epstein examples
Resolution Guide
Learn more about entity deduplication