What Are Domains?
A domain is a YAML configuration that defines:
- Entity types: What kinds of things to extract (people, organizations, concepts, etc.)
- Relation types: How entities can be connected
- Extraction hints: Guidance for the LLM to improve accuracy
- Review requirements: Which relations need human validation
- System context: Background information to help the LLM understand your documents
Bundled Domains
sift-kg ships with four production-ready domains: schema-free, general, OSINT, and academic.
Schema-Free
The schema-free domain lets the LLM discover entity and relation types from your documents:
- Your documents are sampled
- The LLM designs entity and relation types tailored to the corpus
- The schema is saved to discovered_domain.yaml
- The discovered schema is used for consistent extraction
Use the schema-free domain when:
- You don’t know what entity types exist in your documents
- You want to explore a new dataset
- You’re building a custom domain and want to see what the LLM finds
General Purpose
The general domain provides broad coverage for common entity types. It is defined in src/sift_kg/domains/bundled/general/domain.yaml.
OSINT Investigation
Optimized for open-source intelligence work. It is defined in src/sift_kg/domains/bundled/osint/domain.yaml.
Academic Research
Maps the intellectual landscape of research areas. It is defined in src/sift_kg/domains/bundled/academic/domain.yaml.
Using Bundled Domains
Specify a bundled domain with the --domain-name flag, or set it in your sift.yaml project config.
Creating Custom Domains
Build a domain tailored to your use case.
Basic Structure
Advanced Features
1. Type Constraints
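Restrict which entities can be connected by listing the allowed source_types and target_types for a relation type.
A relation whose type is not defined in the domain is:
- Dropped if domain_relation_types is provided to build_graph
- Mapped to fallback_relation if defined
- Kept as-is (no constraints)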
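As a rough illustration of what a type constraint checks, the sketch below tests an entity type pair against a relation type's source_types and target_types. It is not sift-kg's implementation: the `RelationType` class, the `is_allowed` helper, and the example type names are hypothetical, and treating an empty list as "no constraint" is an assumption based on the empty-list defaults in the configuration reference.

```python
from dataclasses import dataclass, field


@dataclass
class RelationType:
    """Illustrative stand-in for one relation type entry in a domain."""
    source_types: list[str] = field(default_factory=list)
    target_types: list[str] = field(default_factory=list)


def is_allowed(rel: RelationType, source_type: str, target_type: str) -> bool:
    """Return True if the entity type pair satisfies the relation's constraints.

    Assumption: an empty list means "no constraint on this side".
    """
    source_ok = not rel.source_types or source_type in rel.source_types
    target_ok = not rel.target_types or target_type in rel.target_types
    return source_ok and target_ok


# Hypothetical relation type: only a PERSON may work for an ORGANIZATION.
works_for = RelationType(source_types=["PERSON"], target_types=["ORGANIZATION"])
unconstrained = RelationType()

print(is_allowed(works_for, "PERSON", "ORGANIZATION"))  # True
print(is_allowed(works_for, "ORGANIZATION", "PERSON"))  # False
print(is_allowed(unconstrained, "LOCATION", "EVENT"))   # True
```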
2. Symmetric Relations
Mark bidirectional relationships by setting symmetric to true on the relation type.
3. Review Requirements
Flag specific relation types for human validation by setting review_required to true. During sift build, all instances of a flagged relation type are written to relation_review.yaml regardless of confidence.
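The "regardless of confidence" rule can be pictured with a small sketch. Everything here apart from the review_required field name is an assumption made for illustration: the `needs_review` helper, the dictionary shapes, the example relation types, and the 0.7 confidence threshold are hypothetical and do not describe how sift build actually selects relations.

```python
def needs_review(relation: dict, relation_types: dict, threshold: float = 0.7) -> bool:
    """Decide whether a relation should be queued for human review.

    Assumptions: `threshold` and the dict shapes are hypothetical; only the
    `review_required` flag comes from the domain configuration reference.
    """
    config = relation_types.get(relation["type"], {})
    if config.get("review_required", False):
        return True  # flagged types are always reviewed, whatever the confidence
    return relation["confidence"] < threshold


# Hypothetical domain fragment and extracted relations.
relation_types = {
    "ACCUSED_OF": {"review_required": True},
    "LOCATED_IN": {},
}
relations = [
    {"type": "ACCUSED_OF", "confidence": 0.99},
    {"type": "LOCATED_IN", "confidence": 0.95},
    {"type": "LOCATED_IN", "confidence": 0.40},
]
flags = [needs_review(r, relation_types) for r in relations]
print(flags)  # [True, False, True]
```

The high-confidence ACCUSED_OF relation is still flagged, which is the point of marking a sensitive relation type as review_required.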
4. Canonical Vocabularies
Enforce closed vocabularies for specific entity types:
- Canonical entities are pre-created in the graph
- Extractions matching canonical names (case-insensitive) map to canonical entities
- Non-canonical entities are retyped to canonical_fallback_type
- All relations resolve correctly
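The matching and retyping steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not sift-kg's code: the `resolve_entity` helper, the dictionary layout, and the DEPARTMENT and ORGANIZATION types are hypothetical, and keeping the original type when no canonical_fallback_type is set is an assumption.

```python
def resolve_entity(name: str, entity_type: str, entity_types: dict) -> tuple[str, str]:
    """Map an extracted entity onto a canonical one, or retype it.

    Returns (name, type). The dict shape mirrors the entity type fields
    canonical_names and canonical_fallback_type.
    """
    config = entity_types.get(entity_type, {})
    canonical = config.get("canonical_names", [])
    if not canonical:
        return name, entity_type  # open vocabulary: nothing to enforce
    by_lower = {c.lower(): c for c in canonical}
    match = by_lower.get(name.lower())
    if match is not None:
        return match, entity_type  # case-insensitive hit on a canonical name
    # Assumption: without a fallback type, the original type is kept.
    fallback = config.get("canonical_fallback_type")
    return name, fallback or entity_type


# Hypothetical closed vocabulary for a DEPARTMENT entity type.
entity_types = {
    "DEPARTMENT": {
        "canonical_names": ["Engineering", "Finance"],
        "canonical_fallback_type": "ORGANIZATION",
    },
}
print(resolve_entity("engineering", "DEPARTMENT", entity_types))  # ('Engineering', 'DEPARTMENT')
print(resolve_entity("Skunkworks", "DEPARTMENT", entity_types))   # ('Skunkworks', 'ORGANIZATION')
```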
Domain File Placement
Save your custom domain as a YAML file, then reference it with the --domain flag or in sift.yaml.
Domain Best Practices
1. Start with Schema-Free
Before building a custom domain, run schema-free extraction to see what the LLM discovers.
2. Use Extraction Hints
Guide the LLM with specific instructions in the extraction_hints of an entity or relation type.
3. Provide System Context
Help the LLM understand your domain by describing it in system_context.
4. Balance Granularity
Choose entity and relation types that are neither too coarse nor too fine.
5. Test Iteratively
Domain design is iterative:
- Extract a sample of documents
- Review the results
- Add hints or adjust types
- Re-extract with --force
- Repeat until quality is acceptable
6. Version Your Domains
Track domain evolution with the version field.
Domain Configuration Reference
Top-Level Fields
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Domain name |
| version | string | No | Semantic version (default: "1.0.0") |
| description | string | No | Domain description |
| entity_types | object | Yes | Entity type definitions |
| relation_types | object | Yes | Relation type definitions |
| system_context | string | No | LLM context for extraction |
| fallback_relation | string | No | Default relation for undefined types |
| schema_free | boolean | No | Enable schema discovery mode |
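To make the Required column concrete, here is a hedged sketch that checks a domain (already loaded into a Python dict) for the three required fields and fills in the documented version default. The `check_domain` helper and the example "contracts" domain are hypothetical; sift-kg's own validation may behave differently.

```python
REQUIRED_FIELDS = ("name", "entity_types", "relation_types")
DEFAULTS = {"version": "1.0.0"}  # default taken from the table above


def check_domain(domain: dict) -> dict:
    """Return a copy with defaults filled in; raise if a required field is missing."""
    missing = [f for f in REQUIRED_FIELDS if f not in domain]
    if missing:
        raise ValueError(f"domain is missing required fields: {', '.join(missing)}")
    return {**DEFAULTS, **domain}


# Hypothetical minimal domain, written as the dict a YAML loader would produce.
domain = check_domain({
    "name": "contracts",
    "entity_types": {"PARTY": {"description": "A signatory to a contract"}},
    "relation_types": {"SIGNED": {"source_types": ["PARTY"]}},
})
print(domain["version"])  # 1.0.0
```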
Entity Type Config
| Field | Type | Default | Description |
|---|---|---|---|
| description | string | "" | Entity type description |
| extraction_hints | list[string] | [] | LLM guidance for extraction |
| canonical_names | list[string] | [] | Closed vocabulary (optional) |
| canonical_fallback_type | string | null | Type for non-canonical entities |
Relation Type Config
| Field | Type | Default | Description |
|---|---|---|---|
| description | string | "" | Relation type description |
| source_types | list[string] | [] | Valid source entity types |
| target_types | list[string] | [] | Valid target entity types |
| symmetric | boolean | false | Bidirectional relationship |
| extraction_hints | list[string] | [] | LLM guidance for extraction |
| review_required | boolean | false | Flag all instances for review |
Source: src/sift_kg/domains/models.py
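The two per-type tables can be read as plain data classes with defaults. The sketch below mirrors the documented fields and defaults using the standard library only; the class names `EntityTypeConfig` and `RelationTypeConfig` are hypothetical, and this is not a copy of the actual definitions in models.py, which may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EntityTypeConfig:
    """Fields and defaults as listed in the Entity Type Config table."""
    description: str = ""
    extraction_hints: list[str] = field(default_factory=list)
    canonical_names: list[str] = field(default_factory=list)
    canonical_fallback_type: Optional[str] = None


@dataclass
class RelationTypeConfig:
    """Fields and defaults as listed in the Relation Type Config table."""
    description: str = ""
    source_types: list[str] = field(default_factory=list)
    target_types: list[str] = field(default_factory=list)
    symmetric: bool = False
    extraction_hints: list[str] = field(default_factory=list)
    review_required: bool = False


# Unspecified fields fall back to the defaults listed in the tables.
married_to = RelationTypeConfig(description="Spouse relationship", symmetric=True)
print(married_to.review_required, married_to.source_types)  # False []
```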
Next Steps
- How It Works: Understand the full pipeline from extraction to visualization
- Entity Resolution: Learn how sift-kg finds and merges duplicate entities