Introduction
While sift-kg is primarily a CLI tool, you can also use it as a Python library in your notebooks, web apps, or data pipelines. The Python API gives you full control over the extraction pipeline with explicit parameters instead of CLI arguments.
Installation
For embedding-based entity resolution:
pip install "sift-kg[embeddings]"
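Before importing the library in a notebook or pipeline, it can help to confirm that the package is present in the active environment. The following is a minimal sketch using only the standard library. `installed_version` is a hypothetical helper, not part of sift-kg, and the distribution name `sift-kg` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("sift-kg")
    if found:
        print(f"sift-kg version: {found}")
    else:
        print("sift-kg is not installed in this environment")
```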
Quick Start
from pathlib import Path

from sift_kg import load_domain, run_pipeline

# Load a domain configuration
domain = load_domain(bundled_name="schema-free")

# Run the full pipeline
output_dir = run_pipeline(
    doc_dir=Path("./documents"),
    model="openai/gpt-4o-mini",
    domain=domain,
    output_dir=Path("./output"),
    max_cost=5.0,  # Budget cap in USD
    include_narrative=True,
)
print(f"Pipeline complete! Results in {output_dir}")
Core Imports
from sift_kg import (
    # Pipeline functions
    run_pipeline,      # Full pipeline: extract → build → narrate
    run_extract,       # Step 1: Extract entities and relations
    run_build,         # Step 2: Build knowledge graph
    run_resolve,       # Step 3: Find duplicate entities
    run_apply_merges,  # Step 4: Apply confirmed merges
    run_narrate,       # Step 5: Generate narrative summary
    run_view,          # Generate interactive visualization
    run_export,        # Export to various formats
    # Core classes
    KnowledgeGraph,    # Knowledge graph data structure
    DomainConfig,      # Domain configuration schema
    LLMClient,         # LLM client wrapper
    # Utilities
    load_domain,       # Load domain configurations
    export_graph,      # Export graph helper
)
Pipeline Architecture
The sift-kg pipeline consists of several stages:
- Extract (run_extract): Parse documents and extract entities/relations using an LLM
- Build (run_build): Construct a knowledge graph from extractions
- Resolve (run_resolve): Find duplicate entities using LLM-based comparison
- Apply Merges (run_apply_merges): Apply human-reviewed entity merges
- Narrate (run_narrate): Generate narrative summaries using community detection
- View (run_view): Create interactive HTML visualizations
- Export (run_export): Export to GraphML, GEXF, CSV, or SQLite
You can run the full pipeline with run_pipeline(), or call the individual stage functions yourself for more control.
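When only some stages are needed, one way to keep them in a consistent sequence is to sort the requested stage names by the order listed above. The sketch below is illustrative only: `STAGE_ORDER` and `plan_stages` are hypothetical names, not part of sift-kg, and the sketch assumes the listed order is the order in which stages should run.

```python
from typing import Iterable, List

# Stage names in the order they are listed above (an assumption of this sketch).
STAGE_ORDER = [
    "extract",
    "build",
    "resolve",
    "apply_merges",
    "narrate",
    "view",
    "export",
]


def plan_stages(requested: Iterable[str]) -> List[str]:
    """Return the requested stages sorted into the listed pipeline order."""
    wanted = set(requested)
    unknown = wanted - set(STAGE_ORDER)
    if unknown:
        raise ValueError(f"Unknown stages: {sorted(unknown)}")
    return [stage for stage in STAGE_ORDER if stage in wanted]


print(plan_stages(["view", "extract", "build"]))
# prints ['extract', 'build', 'view']
```

Each name in the returned list would then map to the corresponding run_* function, called with the explicit parameters shown in the examples on this page.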
Basic Example: Step-by-Step
from pathlib import Path

from sift_kg import (
    load_domain,
    run_extract,
    run_build,
    run_view,
    KnowledgeGraph,
)

# 1. Load domain
domain = load_domain(bundled_name="schema-free")

# 2. Extract entities and relations
extractions = run_extract(
    doc_dir=Path("./documents"),
    model="openai/gpt-4o-mini",
    domain=domain,
    output_dir=Path("./output"),
    chunk_size=10000,
    concurrency=4,
)
print(f"Extracted {len(extractions)} documents")

# 3. Build knowledge graph
kg = run_build(
    output_dir=Path("./output"),
    domain=domain,
    review_threshold=0.7,
    postprocess=True,
)
print(f"Graph: {kg.entity_count} entities, {kg.relation_count} relations")

# 4. Generate visualization
html_path = run_view(
    output_dir=Path("./output"),
    open_browser=False,
    min_confidence=0.5,
)
print(f"Visualization saved to {html_path}")
Using Custom Domains
from pathlib import Path

from sift_kg import load_domain, run_extract

# Load custom domain configuration
domain = load_domain(domain_path=Path("./my_domain.yaml"))

# Use it in extraction
extractions = run_extract(
    doc_dir=Path("./documents"),
    model="openai/gpt-4o-mini",
    domain=domain,
    output_dir=Path("./output"),
)
Working with the Knowledge Graph
from pathlib import Path

from sift_kg import KnowledgeGraph, export_graph

# Load an existing graph
kg = KnowledgeGraph.load("./output/graph_data.json")

# Query entities (get_entity returns a falsy value when the ID is not found)
entity = kg.get_entity("person:alice")
if entity:
    print(f"Name: {entity['name']}")
    print(f"Type: {entity['entity_type']}")
    print(f"Confidence: {entity['confidence']}")

# Get outgoing relations for the same entity
relations = kg.get_relations("person:alice", direction="out")
for rel in relations:
    print(f"{rel['source']} --[{rel['relation_type']}]--> {rel['target']}")

# Export to different formats
export_graph(kg, Path("./graph.graphml"), "graphml")
export_graph(kg, Path("./graph.sqlite"), "sqlite")
export_graph(kg, Path("./csv"), "csv")
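Because the relations in the example above are accessed as dictionaries with `source`, `relation_type`, and `target` keys, they can be post-processed with ordinary Python. The sketch below groups relation targets by relation type. `group_targets_by_type` is a hypothetical helper, and the sample records are made up for illustration rather than produced by sift-kg.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def group_targets_by_type(
    relations: Iterable[Mapping[str, str]],
) -> Dict[str, List[str]]:
    """Map each relation type to the list of target entity IDs it points to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for rel in relations:
        grouped[rel["relation_type"]].append(rel["target"])
    return dict(grouped)


# Made-up sample data shaped like the dictionaries used in the example above.
sample = [
    {"source": "person:alice", "relation_type": "WORKS_FOR", "target": "organization:acme"},
    {"source": "person:alice", "relation_type": "KNOWS", "target": "person:bob"},
    {"source": "person:alice", "relation_type": "KNOWS", "target": "person:carol"},
]

for relation_type, targets in group_targets_by_type(sample).items():
    print(f"{relation_type}: {', '.join(targets)}")
```

With a real graph, the list returned by `kg.get_relations(...)` would take the place of `sample`, assuming its items expose the same three keys.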
Environment Setup
Set your LLM API keys before running:
# OpenAI
export OPENAI_API_KEY="sk-..."
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
# Or other providers via LiteLLM
export COHERE_API_KEY="..."
export GEMINI_API_KEY="..."
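From Python, a quick pre-flight check can catch a missing key before any documents are sent to a model. This is a minimal sketch: `missing_env_vars` is a hypothetical helper, not part of sift-kg, and which variable a given model needs depends on the provider you choose.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(
    required: Iterable[str],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # Example: the Quick Start uses an OpenAI model, so check that key.
    missing = missing_env_vars(["OPENAI_API_KEY"])
    if missing:
        print(f"Set these variables before running: {', '.join(missing)}")
    else:
        print("All required API keys are set")
```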