
The sift export command converts your knowledge graph into standard formats for use in external tools like Gephi, Cytoscape, Neo4j, Excel, and more.

Quick Start

# Export to GraphML
sift export graphml

# Export to CSV
sift export csv

# Export to SQLite database
sift export sqlite

Command Syntax

sift export <format> [options]

format (choice, required): Export format, one of:
  • graphml: GraphML XML (yEd, Cytoscape, NetworkX)
  • gexf: GEXF XML (Gephi native format)
  • csv: CSV tables (Excel, Pandas, R)
  • sqlite: SQLite database (SQL queries, BI tools)
  • json: sift-kg native JSON format
-o, --output (path): Output directory containing graph_data.json (defaults to output/)
--to (path): Custom export file or directory path
-v, --verbose (boolean): Verbose logging

Export Formats

GraphML

Best for: yEd, Cytoscape, Gephi, NetworkX, igraph
sift export graphml

# Custom output path
sift export graphml --to ./analysis/graph.graphml
Features:
  • Full attribute preservation (entity types, confidence, evidence)
  • Pre-computed layout (spring layout, 1000x1000 canvas)
  • Node colors by entity type
  • Edge colors by relation type
  • Compatible with most graph analysis tools
Output: Single .graphml file. Abbreviated example (namespace and <key> declarations omitted):
<?xml version="1.0" encoding="UTF-8"?>
<graphml>
  <graph edgedefault="directed">
    <node id="person:john_smith">
      <data key="name">John Smith</data>
      <data key="entity_type">PERSON</data>
      <data key="label">John Smith</data>
      <data key="color">#42A5F5</data>
      <data key="size">25.0</data>
      <data key="x">123.45</data>
      <data key="y">678.90</data>
    </node>
    <edge source="person:john_smith" target="organization:acme_corp">
      <data key="relation_type">WORKS_FOR</data>
      <data key="label">WORKS_FOR</data>
      <data key="confidence">0.95</data>
      <data key="evidence">John Smith is CEO of Acme Corp</data>
      <data key="color">#4CAF50</data>
    </edge>
  </graph>
</graphml>
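To inspect a GraphML export without extra dependencies, Python's standard-library ElementTree is enough. The following is a minimal sketch, not part of sift-kg: `read_graphml` is a hypothetical helper, and it assumes a full export declares `<key>` elements that map key ids to attribute names (the trimmed sample below has none, so it falls back to the raw `key` value). With NetworkX installed, `nx.read_graphml(path)` does the same job and also casts typed values.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    # Drop a "{namespace}" prefix so namespaced and bare files parse alike
    return tag.rsplit("}", 1)[-1]


def read_graphml(root: ET.Element):
    """Return ({node_id: attrs}, [(source, target, attrs)]) from a GraphML tree."""
    # <key id="d0" attr.name="entity_type"/> maps data keys to attribute names
    key_names = {
        k.get("id"): k.get("attr.name", k.get("id"))
        for k in root.iter()
        if _local(k.tag) == "key"
    }
    nodes, edges = {}, []
    for el in root.iter():
        kind = _local(el.tag)
        if kind not in ("node", "edge"):
            continue
        # <data> values come back as strings; cast numeric fields yourself
        attrs = {
            key_names.get(d.get("key"), d.get("key")): d.text
            for d in el
            if _local(d.tag) == "data"
        }
        if kind == "node":
            nodes[el.get("id")] = attrs
        else:
            edges.append((el.get("source"), el.get("target"), attrs))
    return nodes, edges


# Trimmed copy of the excerpt above; for a real export use
# ET.parse("./analysis/graph.graphml").getroot()
SAMPLE = """<graphml><graph edgedefault="directed">
  <node id="person:john_smith"><data key="entity_type">PERSON</data></node>
  <node id="organization:acme_corp"><data key="entity_type">ORGANIZATION</data></node>
  <edge source="person:john_smith" target="organization:acme_corp">
    <data key="relation_type">WORKS_FOR</data><data key="confidence">0.95</data>
  </edge>
</graph></graphml>"""

nodes, edges = read_graphml(ET.fromstring(SAMPLE))
for src, dst, attrs in edges:
    print(src, attrs["relation_type"], dst, float(attrs["confidence"]))
```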

GEXF

Best for: Gephi (native format)
sift export gexf

# Custom path
sift export gexf --to ./gephi/knowledge-graph.gexf
Features:
  • Gephi’s native format (best compatibility)
  • RGB colors for nodes and edges
  • Pre-computed positions (spring layout)
  • All attributes preserved
Output: Single .gexf file

Gephi Import:
  1. Open Gephi
  2. File → Open → Select .gexf file
  3. Graph loads with colors and layout already applied
Use GEXF for Gephi, GraphML for everything else. Both contain the same data.
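GEXF stores node colors and positions in `viz:color` (r, g, b attributes) and `viz:position` (x, y, z) child elements. The sketch below reads them back with the standard library, for example to reuse Gephi colors in another tool. It assumes that standard viz layout; `gexf_node_styles` is a hypothetical helper that matches local tag names, so it does not depend on which GEXF namespace version the export declares. The inline sample is hand-written, not real sift-kg output.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    # Ignore namespaces: GEXF versions use different namespace URIs
    return tag.rsplit("}", 1)[-1]


def gexf_node_styles(root: ET.Element):
    """Map node id -> (label, hex color or None, (x, y) or None)."""
    styles = {}
    for node in root.iter():
        if _local(node.tag) != "node":
            continue
        color = pos = None
        for child in node:
            name = _local(child.tag)
            if name == "color":
                # RGB integers 0-255 -> "#RRGGBB"
                r, g, b = (int(child.get(c)) for c in "rgb")
                color = f"#{r:02X}{g:02X}{b:02X}"
            elif name == "position":
                pos = (float(child.get("x")), float(child.get("y")))
        styles[node.get("id")] = (node.get("label"), color, pos)
    return styles


# Hand-written sample in GEXF 1.3 syntax; for a real export use
# ET.parse("./gephi/knowledge-graph.gexf").getroot()
SAMPLE = """<gexf xmlns="http://gexf.net/1.3" xmlns:viz="http://gexf.net/1.3/viz" version="1.3">
  <graph defaultedgetype="directed"><nodes>
    <node id="person:john_smith" label="John Smith">
      <viz:color r="66" g="165" b="245"/>
      <viz:position x="123.45" y="678.9" z="0.0"/>
    </node>
  </nodes><edges/></graph>
</gexf>"""

styles = gexf_node_styles(ET.fromstring(SAMPLE))
print(styles)
```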

CSV

Best for: Excel, Pandas, R, SQL imports, manual analysis
sift export csv

# Custom directory
sift export csv --to ./analysis/csv-export
Output: Directory with two CSV files
  • entities.csv — Entity nodes
  • relations.csv — Relation edges

entities.csv

id,name,entity_type,confidence,source_documents,attributes,description
person:john_smith,John Smith,PERSON,0.95,"document1; document2","{""role"": ""CEO""}","John Smith is the CEO of..."
organization:acme_corp,Acme Corporation,ORGANIZATION,0.9,document1,"{}","Acme Corporation is a..."

relations.csv

source,target,relation_type,confidence,support_count,support_documents,support_doc_count,evidence,source_document
person:john_smith,organization:acme_corp,WORKS_FOR,0.95,3,"document1; document2",2,"John Smith is CEO of Acme Corp",document1
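The list fields (`source_documents`, `support_documents`) are semicolon-separated values inside a single comma-delimited cell, and `attributes` is a JSON object serialized into one cell. A minimal sketch with Python's `csv` and `json` modules that decodes them, assuming the column layout shown above (`load_entities` and `split_list` are hypothetical helpers, not a sift-kg API):

```python
import csv
import io
import json


def split_list(cell: str) -> list[str]:
    # "document1; document2" -> ["document1", "document2"]; "" -> []
    return [part.strip() for part in cell.split(";") if part.strip()]


def load_entities(f) -> dict[str, dict]:
    """Read entities.csv rows keyed by id, decoding list, JSON and number fields."""
    entities = {}
    for row in csv.DictReader(f):
        row["confidence"] = float(row["confidence"])
        row["source_documents"] = split_list(row["source_documents"])
        row["attributes"] = json.loads(row["attributes"] or "{}")
        entities[row["id"]] = row
    return entities


# The two sample rows above; for a real export use
# open("output/csv/entities.csv", newline="", encoding="utf-8")
SAMPLE = '''id,name,entity_type,confidence,source_documents,attributes,description
person:john_smith,John Smith,PERSON,0.95,"document1; document2","{""role"": ""CEO""}","John Smith is the CEO of..."
organization:acme_corp,Acme Corporation,ORGANIZATION,0.9,document1,"{}","Acme Corporation is a..."
'''

entities = load_entities(io.StringIO(SAMPLE))
print(entities["person:john_smith"]["attributes"]["role"])
```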
Excel Import:
1. Open Excel
2. Data → From Text/CSV
3. Select entities.csv or relations.csv
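4. Keep comma as the delimiter; list fields such as source_documents stay in one cell as semicolon-separated values (split them afterwards with Data → Text to Columns if needed)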
Pandas Import:
import pandas as pd

entities = pd.read_csv('output/csv/entities.csv')
relations = pd.read_csv('output/csv/relations.csv')

# Top entities by confidence
top_entities = entities.nlargest(10, 'confidence')

# Find most supported relations
top_relations = relations.nlargest(10, 'support_count')

SQLite

Best for: SQL queries, BI tools, joins with other data
sift export sqlite

# Custom database path
sift export sqlite --to ./analysis/knowledge-graph.db
Output: Single .sqlite database file

Schema:
CREATE TABLE nodes (
    node_id TEXT PRIMARY KEY,
    name TEXT,
    entity_type TEXT,
    confidence REAL,
    source_documents TEXT,  -- semicolon-separated
    attributes TEXT,        -- JSON string
    description TEXT
);

CREATE TABLE edges (
    source_id TEXT,
    target_id TEXT,
    relation_type TEXT,
    confidence REAL,
    support_count INTEGER,
    support_documents TEXT,     -- semicolon-separated
    support_doc_count INTEGER,
    evidence TEXT,
    source_document TEXT,
    FOREIGN KEY(source_id) REFERENCES nodes(node_id),
    FOREIGN KEY(target_id) REFERENCES nodes(node_id)
);

CREATE INDEX idx_edges_source ON edges(source_id);
CREATE INDEX idx_edges_target ON edges(target_id);
CREATE INDEX idx_edges_relation ON edges(relation_type);
Example Queries:
-- Most mentioned entities
SELECT name, entity_type, 
       LENGTH(source_documents) - LENGTH(REPLACE(source_documents, ';', '')) + 1 AS doc_count
FROM nodes
WHERE entity_type != 'DOCUMENT'
ORDER BY doc_count DESC
LIMIT 20;
Open in DB Browser:
# Install DB Browser for SQLite (free GUI)
brew install --cask db-browser-for-sqlite  # macOS

# Open database
open output/graph.sqlite
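You can also query the database from Python with the standard-library `sqlite3` module. The sketch below ranks entities by degree (the number of edges touching them), one way to get a "most connected entities" view. To stay self-contained it builds an in-memory database with a reduced copy of the schema above and three made-up edges; `most_connected` is a hypothetical helper. Point `sqlite3.connect()` at your exported file instead, e.g. `output/graph.sqlite`.

```python
import sqlite3

# Degree = number of edges where the node is either the source or the target
DEGREE_SQL = """
SELECT n.name, n.entity_type, COUNT(*) AS degree
FROM nodes AS n
JOIN edges AS e ON n.node_id IN (e.source_id, e.target_id)
GROUP BY n.node_id
ORDER BY degree DESC, n.name
LIMIT ?
"""


def most_connected(conn: sqlite3.Connection, limit: int = 10):
    return conn.execute(DEGREE_SQL, (limit,)).fetchall()


# Demo data only; real usage: conn = sqlite3.connect("output/graph.sqlite")
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (node_id TEXT PRIMARY KEY, name TEXT, entity_type TEXT);
CREATE TABLE edges (source_id TEXT, target_id TEXT, relation_type TEXT);
INSERT INTO nodes VALUES
  ('person:john_smith', 'John Smith', 'PERSON'),
  ('person:jane_doe', 'Jane Doe', 'PERSON'),
  ('organization:acme_corp', 'Acme Corporation', 'ORGANIZATION');
INSERT INTO edges VALUES
  ('person:john_smith', 'organization:acme_corp', 'WORKS_FOR'),
  ('person:john_smith', 'organization:acme_corp', 'FOUNDED'),
  ('person:jane_doe', 'organization:acme_corp', 'WORKS_FOR');
""")

for name, entity_type, degree in most_connected(conn):
    print(f"{name} ({entity_type}): {degree}")
```

Entities with no edges do not appear in the result, because of the inner join.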

JSON

Best for: Python scripts, JavaScript apps, re-importing to sift-kg
sift export json

# Custom path
sift export json --to ./backup/graph-snapshot.json
Output: sift-kg native format (same as graph_data.json). This is the internal format used by sift-kg. Use it for:
  • Backups and versioning
  • Sharing graphs between sift-kg installations
  • Custom processing with NetworkX:
import json

import networkx as nx
import networkx.algorithms.community as nx_comm

# Load as NetworkX MultiDiGraph (node_link_graph follows the file's
# "directed" and "multigraph" flags)
with open('output/graph.json') as f:
    data = json.load(f)

G = nx.node_link_graph(data)

# Detect communities with Louvain; a fixed seed makes the result reproducible
communities = nx_comm.louvain_communities(G.to_undirected(), seed=42)

# Analyze
print(f"Communities: {len(communities)}")
for i, comm in enumerate(communities):
    print(f"Community {i+1}: {len(comm)} entities")
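For a quick summary without NetworkX, the standard `json` module is enough. This sketch assumes the node-link layout that the `nx.node_link_graph` call above implies (a `nodes` list plus a `links` or `edges` list) and an `entity_type` field on each node; check those key names against your own file. `summarize` is a hypothetical helper and the inline data is made up.

```python
import json
from collections import Counter


def summarize(data: dict) -> dict:
    """Count nodes per entity_type and edges per relation_type."""
    # NetworkX has historically written edges under "links"; newer releases can use "edges"
    links = data.get("links", data.get("edges", []))
    return {
        "entity_types": Counter(n.get("entity_type", "UNKNOWN") for n in data["nodes"]),
        "relation_types": Counter(e.get("relation_type", "UNKNOWN") for e in links),
    }


# Tiny inline sample; real usage:
# with open("output/graph.json", encoding="utf-8") as f: data = json.load(f)
data = json.loads("""{
  "directed": true, "multigraph": true, "graph": {},
  "nodes": [
    {"id": "person:john_smith", "entity_type": "PERSON"},
    {"id": "organization:acme_corp", "entity_type": "ORGANIZATION"}
  ],
  "links": [
    {"source": "person:john_smith", "target": "organization:acme_corp",
     "relation_type": "WORKS_FOR", "key": 0}
  ]
}""")

summary = summarize(data)
print(summary["entity_types"].most_common())
```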

Including Descriptions

If you’ve run sift narrate, entity descriptions are automatically included in exports:
# Generate descriptions first
sift narrate

# Export with descriptions embedded
sift export graphml
sift export csv
sift export sqlite
Descriptions appear in:
  • GraphML/GEXF: description attribute on nodes
  • CSV: description column in entities.csv
  • SQLite: description column in nodes table
Without sift narrate, the description field will be empty.

Use Cases

Export to CSV, then analyze with Pandas/NetworkX/igraph:
import pandas as pd
import networkx as nx

# Load CSV export
entities = pd.read_csv('output/csv/entities.csv')
relations = pd.read_csv('output/csv/relations.csv')

# Build a MultiDiGraph so parallel relations (one CSV row each) are kept
G = nx.MultiDiGraph()
for row in entities.to_dict('records'):
    G.add_node(row['id'], **row)
for row in relations.to_dict('records'):
    G.add_edge(row['source'], row['target'], **row)

# Centrality analysis
centrality = nx.betweenness_centrality(G)
top_10 = sorted(centrality.items(), key=lambda x: -x[1])[:10]
Export to GEXF, open in Gephi:
sift export gexf --to ./gephi-import.gexf
In Gephi:
  1. File → Open → gephi-import.gexf
  2. Graph loads with colors and positions
  3. Apply layout algorithms (Force Atlas 2, Fruchterman-Reingold)
  4. Adjust styling and export publication-quality images
Export to SQLite, connect from BI tools:
sift export sqlite --to ./dashboard/knowledge-graph.db
Connect from:
  • Tableau: SQLite connector
  • Power BI: ODBC driver for SQLite
  • Metabase: SQLite database connection
  • Apache Superset: SQLite support
Build dashboards showing:
  • Entity counts by type over time
  • Most connected entities
  • Confidence score distributions
  • Document coverage (entities per doc)
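As a starting point for the document coverage chart, this sketch counts entities per source document by splitting the semicolon-separated `source_documents` column of the `nodes` table in Python (SQLite has no built-in string-split function). It uses a small in-memory stand-in for the exported database; `entities_per_document` is a hypothetical helper.

```python
import sqlite3
from collections import Counter


def entities_per_document(conn: sqlite3.Connection) -> Counter:
    counts = Counter()
    rows = conn.execute(
        "SELECT source_documents FROM nodes WHERE entity_type != 'DOCUMENT'"
    )
    for (docs,) in rows:
        # docs looks like "document1; document2" and may be NULL or empty
        for doc in (docs or "").split(";"):
            if doc.strip():
                counts[doc.strip()] += 1
    return counts


# Stand-in data; real usage: sqlite3.connect("./dashboard/knowledge-graph.db")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (node_id TEXT, entity_type TEXT, source_documents TEXT)")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [
        ("person:john_smith", "PERSON", "document1; document2"),
        ("organization:acme_corp", "ORGANIZATION", "document1"),
        ("person:jane_doe", "PERSON", None),
    ],
)

for doc, n in entities_per_document(conn).most_common():
    print(doc, n)
```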
Export to CSV for audit reports:
sift export csv --to ./audit/extraction-report-2024-03
Use source_documents, support_documents, evidence fields to trace:
  • Which documents mention each entity
  • Evidence supporting each relation
  • Cross-document validation (entities in 3+ docs = high confidence)
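A minimal audit sketch of that check, applied here to relations because relations.csv already carries a `support_doc_count` column. It reads the file with Python's `csv` module and keeps relations backed by at least 3 documents, together with their supporting documents and evidence. The threshold mirrors the rule of thumb above; `well_supported` is a hypothetical helper and the inline rows are made up.

```python
import csv
import io


def well_supported(f, min_docs: int = 3):
    """Yield (source, relation_type, target, docs, evidence) for well-supported relations."""
    for row in csv.DictReader(f):
        if int(row["support_doc_count"]) >= min_docs:
            docs = [d.strip() for d in row["support_documents"].split(";") if d.strip()]
            yield row["source"], row["relation_type"], row["target"], docs, row["evidence"]


# Made-up rows in the relations.csv layout; real usage:
# open("./audit/extraction-report-2024-03/relations.csv", newline="", encoding="utf-8")
SAMPLE = '''source,target,relation_type,confidence,support_count,support_documents,support_doc_count,evidence,source_document
person:john_smith,organization:acme_corp,WORKS_FOR,0.95,4,"doc1; doc2; doc3",3,"John Smith is CEO of Acme Corp",doc1
person:jane_doe,organization:acme_corp,WORKS_FOR,0.7,1,doc2,1,"Jane Doe joined Acme",doc2
'''

for src, rel, dst, docs, evidence in well_supported(io.StringIO(SAMPLE)):
    print(f"{src} -[{rel}]-> {dst} in {len(docs)} docs: {evidence!r}")
```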
Export to CSV, import to Neo4j:
sift export csv --to ./neo4j-import
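Neo4j Cypher import script (copy entities.csv and relations.csv into Neo4j's import directory first, since file:/// URLs resolve there by default):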
// Load entities as nodes
LOAD CSV WITH HEADERS FROM 'file:///entities.csv' AS row
CREATE (e:Entity {
  id: row.id,
  name: row.name,
  type: row.entity_type,
  confidence: toFloat(row.confidence)
});

// Load relations as edges
LOAD CSV WITH HEADERS FROM 'file:///relations.csv' AS row
MATCH (source:Entity {id: row.source})
MATCH (target:Entity {id: row.target})
CREATE (source)-[r:RELATED {
  type: row.relation_type,
  confidence: toFloat(row.confidence),
  evidence: row.evidence
}]->(target);
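Because `MATCH` produces no row when either endpoint is missing, a relation whose `source` or `target` has no matching entity is skipped silently by the script above. A small pre-flight check in Python can list such rows before you import; `dangling_relations` is a hypothetical helper that assumes the CSV headers shown earlier, and the inline rows are made up.

```python
import csv
import io


def dangling_relations(entities_f, relations_f):
    """Return relations whose source or target id is absent from entities.csv."""
    known = {row["id"] for row in csv.DictReader(entities_f)}
    return [
        (row["source"], row["relation_type"], row["target"])
        for row in csv.DictReader(relations_f)
        if row["source"] not in known or row["target"] not in known
    ]


# Inline samples; real usage opens ./neo4j-import/entities.csv and relations.csv
entities = io.StringIO(
    "id,name\nperson:john_smith,John Smith\norganization:acme_corp,Acme Corporation\n"
)
relations = io.StringIO(
    "source,target,relation_type\n"
    "person:john_smith,organization:acme_corp,WORKS_FOR\n"
    "person:jane_doe,organization:acme_corp,WORKS_FOR\n"
)
missing = dangling_relations(entities, relations)
print(missing)
```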

Format Comparison

| Format | Best For | Node Attrs | Edge Attrs | Multi-Edges | File Size |
|---|---|---|---|---|---|
| GraphML | General-purpose, yEd, Cytoscape | ✅ Full | ✅ Full | ⚠️ Merged | Large |
| GEXF | Gephi | ✅ Full | ✅ Full | ⚠️ Merged | Large |
| CSV | Excel, Pandas, SQL import | ✅ Full | ✅ Full | ✅ Preserved | Small |
| SQLite | SQL queries, BI tools | ✅ Full | ✅ Full | ✅ Preserved | Medium |
| JSON | Python, backup, re-import | ✅ Full | ✅ Full | ✅ Preserved | Medium |
The GraphML and GEXF exports do not keep multi-edges (multiple relations between the same entity pair): parallel edges are merged, with their relation types concatenated, e.g. WORKS_FOR; FOUNDED. Use CSV or SQLite if you need to preserve every individual relation.
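To see what that merge does to your data, the sketch below groups relation triples by (source, target) pair and joins their types with "; ", reproducing the merged WORKS_FOR; FOUNDED label described above. It only illustrates the documented behavior; it is not sift-kg's exporter code, and `merge_parallel` is a hypothetical name.

```python
from collections import defaultdict


def merge_parallel(relations):
    """Collapse (source, target, relation_type) triples into one label per entity pair."""
    merged = defaultdict(list)
    for source, target, relation_type in relations:
        merged[(source, target)].append(relation_type)
    # Join the types of parallel relations, e.g. "WORKS_FOR; FOUNDED"
    return {pair: "; ".join(types) for pair, types in merged.items()}


relations = [
    ("person:john_smith", "organization:acme_corp", "WORKS_FOR"),
    ("person:john_smith", "organization:acme_corp", "FOUNDED"),
    ("person:jane_doe", "organization:acme_corp", "WORKS_FOR"),
]
merged = merge_parallel(relations)
print(f"{len(relations)} relations -> {len(merged)} GraphML/GEXF edges")
for (src, dst), label in merged.items():
    print(src, "->", dst, ":", label)
```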

Advanced Options

Custom Export Paths

# Export to specific file
sift export graphml --to ~/Desktop/analysis.graphml

# Export CSV to specific directory
sift export csv --to ~/Documents/graph-export

# Change output directory (where graph_data.json is read from)
sift export graphml -o ./project-output

Batch Exports

# Export all formats for archival
sift export json --to ./archive/graph.json
sift export graphml --to ./archive/graph.graphml
sift export csv --to ./archive/csv
sift export sqlite --to ./archive/graph.db

Troubleshooting

"Graph not found"

Run sift build first to create graph_data.json.

Attribute truncation in GraphML/GEXF

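Complex attributes (nested dicts, long lists) are JSON-serialized to strings, because GraphML and GEXF attribute values must be scalars. CSV and SQLite also store the attributes field as a JSON string, but keep it intact in one column so you can decode it (for example with Python's json.loads); the JSON export keeps the native nested structure.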

Large file sizes

GraphML/GEXF are verbose XML formats. For large graphs (>10k entities):
# Use CSV (smaller)
sift export csv

# Or compress GraphML
sift export graphml --to graph.graphml
gzip graph.graphml  # graph.graphml.gz
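Python can read the compressed file without unpacking it first. This stdlib sketch writes a tiny hand-made GraphML document to `graph.graphml.gz` in a temporary directory (standing in for the gzip step above) and parses it through `gzip.open`. NetworkX's `nx.read_graphml` also accepts a `.gz` path directly.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

# Hand-made stand-in for a real export compressed with `gzip graph.graphml`
SAMPLE = (
    b'<graphml><graph edgedefault="directed">'
    b'<node id="a"/><node id="b"/><edge source="a" target="b"/>'
    b"</graph></graphml>"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.graphml.gz")
    with gzip.open(path, "wb") as f:
        f.write(SAMPLE)

    # gzip.open returns a binary file object that ElementTree parses directly
    with gzip.open(path, "rb") as f:
        root = ET.parse(f).getroot()

# Count <node> elements, ignoring any XML namespace prefix
node_count = sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "node")
print(node_count, "nodes")
```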

Parallel edges collapsed

GraphML/GEXF collapse multi-edges (same source/target, different types). Solution: Use CSV or SQLite to preserve all individual relations:
sift export csv  # relations.csv has one row per relation

Special characters in CSV

Entity names with commas, quotes, or newlines are properly escaped. If importing to Excel and seeing issues:
  • Use “From Text/CSV” import wizard (not drag-and-drop)
  • Select UTF-8 encoding
  • Keep comma as the delimiter; list fields (source_documents, support_documents) are semicolon-separated values inside a single cell
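If accented characters still look garbled when you open the file in Excel by double-clicking, one common workaround (an assumption about your Excel setup, not a sift-kg feature) is to re-save the CSV as UTF-8 with a byte-order mark, which Excel uses to detect the encoding. A minimal stdlib sketch; `add_utf8_bom` is a hypothetical helper and the sample file is made up:

```python
import os
import tempfile


def add_utf8_bom(src: str, dst: str) -> None:
    """Copy a UTF-8 CSV, prefixing a byte-order mark so Excel detects UTF-8."""
    # "utf-8-sig" drops an existing BOM on read and writes one on write
    with open(src, encoding="utf-8-sig", newline="") as fin, \
         open(dst, "w", encoding="utf-8-sig", newline="") as fout:
        fout.write(fin.read())


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "entities.csv")
    dst = os.path.join(tmp, "entities_excel.csv")
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write('id,name\nperson:jose_garcia,"García, José"\n')
    add_utf8_bom(src, dst)
    with open(dst, "rb") as f:
        head = f.read(3)

print(head == b"\xef\xbb\xbf")
```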

Next Steps

Visualize Graph

Interactive exploration before exporting

API Reference

Programmatic export from Python scripts
