Exploring the knowledge graph and full-text search in Sherpa

After a folder is ingested, Sherpa exposes everything it learned through three complementary views: an interactive knowledge graph, a 3D overview of the entire world, and a full-text Elasticsearch panel. Each view suits a different kind of exploration, from tracing call chains and copybook lineage to open-ended Japanese keyword search. All graph and search screens require the admin role when authentication is enabled.

Knowledge graph viewer

Open ナレッジグラフ (Knowledge Graph) at /ui/graph.html to see the live graph for a registered world. Nodes are colour-coded by type (see the node-type table below). Nodes and edges both carry a confidence indicator that reflects how they were extracted.

Click a node to expand its immediate neighbours. The panel on the right shows the node’s properties, its source document links, and a button to launch an impact analysis run from that node.
Name search — type a name in the search box to jump directly to a matching node. Partial matches are supported.
Filters — toggle node types, edge types, deprecated status, and confidence threshold from the toolbar.

Clicking a node and selecting 影響調査 (Impact Analysis) sends that node directly to the chat agent as the starting point for a full impact traversal. This is the fastest path from “I found the node” to “what does changing it affect?”

Node types

The graph uses a closed vocabulary of node labels — Sherpa will not create nodes outside this list.

Node label	Description	Example	Primary extraction source
`Module`	A program, class, or COBOL `PROGRAM-ID`	`BILLGEN`, `TaxCalc`	Static (COBOL / JCL)
`Copybook`	A COBOL copybook (shared data definitions)	`TAXRATE-CPY`	Static (COPY statement)
`DataItem`	A column, copybook field, or dataset item	`TAX-RATE`, `CUST-ID`	Static (copybook / SQL)
`Function`	A business function	Invoice issuance	LLM (design documents)
`Screen`	A UI screen or form	Invoice entry screen	LLM + static (web)
`Report`	A printed output or batch report	Invoice, delivery note	LLM + static (OUTPUT)
`Batch`	A job or job-net	Daily close job	Static (JCL)
`Table`	A database table, VSAM file, or dataset	`INVOICE_HDR`	Static (EXEC SQL / FILE)
`BusinessRule`	A calculation or decision rule	Consumption-tax calculation	LLM (design documents)
`Parameter`	A code value, constant, or threshold	Consumption-tax rate = 10%	LLM + static (constants)
`Document`	A design document — 1:1 with the ledger	Specification sheet	Set at ingest time
`Standard`	A regulation or coding standard	Consumption Tax Act	LLM
`Incident`	A recorded fault or incident	Ticket #1234	LLM

Edge types

Edge	Direction (dependency)	Example	Used in impact traversal
`COPIES`	`Module → Copybook`	BILLGEN copies TAXRATE-CPY	✓
`INVOKES`	`Screen / Batch / Module → Module / Function`	Daily-close job invokes BILLGEN	✓
`CONTAINS`	`Copybook / Table → DataItem`	TAXRATE-CPY contains TAX-RATE	✓
`REALIZES`	`DataItem → Parameter / BusinessRule`	TAX-RATE realizes consumption-tax rate	✓
`IMPLEMENTED_BY`	`Function / BusinessRule → Module`	Invoice-issuance function implemented by BILLGEN	✓
`PRODUCED_BY`	`Report → Function / Module / Batch`	Invoice produced by invoice-issuance function	✓
`DOCUMENTS`	`Document → any artefact`	Spec sheet documents invoice-issuance function	— (provenance only)
`RELATES_TO`	`Incident → any artefact`	Incident #1234 relates to BILLGEN	— (reference only)
`USES`	`Function / Module → BusinessRule / Parameter`	Invoice function uses tax rate	✓
`ACCESSES`	`Function / Module → Table / DataItem`	BILLGEN accesses INVOICE_HDR (rw=write)	✓
`CORRESPONDS_TO`	Same-name node in different generations	TAXCALC in v1 ↔ TAXCALC in v2	— (cross-scope only)
`CONFORMS_TO`	`Function / Module → Standard`	BILLGEN conforms to Consumption Tax Act	✓ (regulation scope)
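
To make the direction column concrete, here is a minimal Python sketch of how an impact traversal could walk these edges. It is not Sherpa's implementation: the triples come from the example column above, the `impacted` helper is hypothetical, and it assumes that impact flows against the dependency arrow (if the consumption-tax rate changes, whatever depends on it is affected). Edge types marked as provenance, reference, or cross-scope only are skipped, and `CONFORMS_TO` is left out because the table ties it to regulation scope.

```python
from collections import deque

# Edge types the table marks as used in impact traversal (general scope).
IMPACT_EDGES = {
    "COPIES", "INVOKES", "CONTAINS", "REALIZES",
    "IMPLEMENTED_BY", "PRODUCED_BY", "USES", "ACCESSES",
}

# (source, edge type, target) triples taken from the example column.
EDGES = [
    ("BILLGEN", "COPIES", "TAXRATE-CPY"),
    ("Daily-close job", "INVOKES", "BILLGEN"),
    ("TAXRATE-CPY", "CONTAINS", "TAX-RATE"),
    ("TAX-RATE", "REALIZES", "Consumption-tax rate"),
    ("Invoice issuance", "IMPLEMENTED_BY", "BILLGEN"),
    ("Invoice", "PRODUCED_BY", "Invoice issuance"),
    ("Spec sheet", "DOCUMENTS", "Invoice issuance"),  # provenance only
    ("Incident #1234", "RELATES_TO", "BILLGEN"),      # reference only
]


def impacted(start, edges=EDGES, allowed=IMPACT_EDGES):
    """Breadth-first walk against the dependency arrows (an assumption)."""
    # Index edges by target so we can look up "who depends on this node".
    dependents = {}
    for source, edge_type, target in edges:
        if edge_type in allowed:
            dependents.setdefault(target, []).append(source)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for source in dependents.get(node, []):
            if source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


print(impacted("Consumption-tax rate"))
```

Starting from the parameter, the walk reaches `TAX-RATE`, then `TAXRATE-CPY`, then `BILLGEN` and everything that invokes or is implemented by it. The spec sheet and the incident never appear because their edge types are excluded.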

Confidence markers

Every node and edge carries a confidence indicator that reflects how it was extracted:

Marker	Meaning	Extraction method
● Confirmed	Deterministic — reproduced identically on every ingest run	Static analysis (COBOL / JCL parser)
○ Requires check	Probabilistic — derived by the LLM from document text	LLM semantic extraction

Nodes with status = "deprecated" or status = "hidden_candidate" are excluded from RAG retrieval by default but remain visible in the graph viewer with a reduced confidence display. Use the 廃止を含む (Include deprecated) filter to show or hide them. You can force their inclusion in impact queries by passing include_deprecated=true to the graph search API.
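
The default exclusion described above amounts to a status filter. A minimal Python sketch, assuming nodes are plain dictionaries with a `status` key; the dictionary shape, the `retrievable` helper, the `"active"` status value, and the `OLDRPT` node are illustrative and not taken from Sherpa:

```python
# Status values the text says are excluded from retrieval by default.
EXCLUDED_STATUSES = {"deprecated", "hidden_candidate"}


def retrievable(nodes, include_deprecated=False):
    """Return the nodes that would take part in retrieval."""
    if include_deprecated:
        return list(nodes)
    return [n for n in nodes if n.get("status") not in EXCLUDED_STATUSES]


nodes = [
    {"name": "BILLGEN", "status": "active"},           # "active" is a made-up value
    {"name": "TAXCALC", "status": "deprecated"},
    {"name": "OLDRPT", "status": "hidden_candidate"},  # hypothetical node
]

print([n["name"] for n in retrievable(nodes)])
print([n["name"] for n in retrievable(nodes, include_deprecated=True)])
```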

3D knowledge overview

The ナレッジ概観 (Knowledge Overview) screen renders the entire world as a 3D force-directed graph. It is designed for a top-down “orient first, then drill down” workflow:

Every line drawn is an actual relationship from the graph, so the clusters you see reflect real structural density rather than an arbitrary arrangement.
Hover over a node to highlight its immediate neighbours.
Filters for type, confidence, and deprecated status are available in the toolbar.
Click any node to jump to the standard graph viewer and continue to impact analysis from there.

Graph search

GET /graph/search filters the graph by relationship type and attribute conditions, returning a nodes-and-edges payload in the same format as the graph viewer.

GET /graph/search
  ?version=my-world
  &relationship=COPIES
  &field=name
  &value=TAXRATE
  &op=contains
  &include_deprecated=false
  &limit=200

Parameter	Description
`version`	World ID to query
`relationship`	One or more edge types to filter (repeat the parameter for multiple values)
`field`	Node property name to filter on
`value`	Value to match
`op`	Comparison operator: `eq`, `contains`, `starts_with`
`include_deprecated`	Include deprecated / hidden nodes and edges (default `false`)
`scope_paths`	Folder prefix(es) to restrict the search scope
`limit`	Maximum number of result nodes (1–1000, default 200)
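
A minimal Python sketch of building this request with the standard library. The base URL `http://localhost:8000` and the `graph_search_url` helper are assumptions, so substitute your own deployment address. Passing a list with `doseq=True` repeats the `relationship` parameter, which is how the table says multiple edge types are supplied. Repeating `scope_paths` in the same way is a guess, since the table does not say how several prefixes are encoded.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://localhost:8000"  # assumption: replace with your Sherpa host


def graph_search_url(version, relationships, field, value, op="contains",
                     include_deprecated=False, limit=200, scope_paths=()):
    """Build a GET /graph/search URL from the documented parameters."""
    if op not in {"eq", "contains", "starts_with"}:
        raise ValueError(f"unsupported op: {op}")
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    params = {
        "version": version,
        "relationship": list(relationships),  # repeated once per edge type
        "field": field,
        "value": value,
        "op": op,
        "include_deprecated": str(include_deprecated).lower(),
        "limit": limit,
    }
    if scope_paths:
        params["scope_paths"] = list(scope_paths)  # assumption: repeated too
    return f"{BASE_URL}/graph/search?{urlencode(params, doseq=True)}"


url = graph_search_url("my-world", ["COPIES", "CONTAINS"], "name", "TAXRATE")
print(url)
```

The resulting URL can be passed to any HTTP client. When authentication is enabled, the request also needs admin credentials, which this sketch does not cover.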

To retrieve the full closed vocabulary of available labels and relationship types, call GET /graph/facets — this returns the exact string values accepted by the relationship and field parameters.

Graph ask (natural-language question)

POST /graph/ask lets you ask a free-text question against the graph. The AI traverses the graph neighbourhood, returns matching nodes with their relationship paths, and explains what it found. If no supporting evidence exists in the graph the response states this explicitly rather than hallucinating an answer.

POST /graph/ask
Content-Type: application/json

{
  "question": "Which programs are affected if the consumption-tax rate changes?",
  "version": "my-world",
  "scope_paths": ["src/billing"]
}

Graph ask requires a configured LLM connection (OpenAI, Gemini, or Ollama). If no LLM is configured the endpoint returns HTTP 503. Enable and select the model in the Settings screen.
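
The request above, including the HTTP 503 case, can be sketched in Python with the standard library. The base URL and both helper names are assumptions, and the sketch assumes the response body is JSON without relying on any particular field, since the response shape is not described here.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your Sherpa host


def build_ask_request(question, version, scope_paths=None):
    """Build (but do not send) a POST /graph/ask request."""
    body = {"question": question, "version": version}
    if scope_paths:
        body["scope_paths"] = list(scope_paths)
    return urllib.request.Request(
        f"{BASE_URL}/graph/ask",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(request):
    """Send the request; return parsed JSON, or None when no LLM is configured."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)  # assumption: the body is JSON
    except urllib.error.HTTPError as err:
        if err.code == 503:  # documented: no LLM connection configured
            return None
        raise


req = build_ask_request(
    "Which programs are affected if the consumption-tax rate changes?",
    "my-world",
    ["src/billing"],
)
print(req.get_method(), req.full_url)
```

Calling `ask(req)` sends the question. A `None` result means the endpoint answered 503, so an LLM still has to be selected in the Settings screen.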

Full-text search (Elasticsearch)

GET /admin/es/search runs a BM25 full-text query across the world’s Elasticsearch index using kuromoji morphological analysis for Japanese. Results are ranked by relevance score and include the document ID, an excerpt, and the score.

GET /admin/es/search
  ?version=my-world
  &query=消費税計算
  &scope_paths=src/billing
  &k=20

Parameter	Description
`version`	World ID to query (same as `version` on other endpoints)
`query`	Search string — kuromoji tokenisation, BM25 ranking
`scope_paths`	Restrict results to a sub-folder prefix
`k`	Number of results to return

Results show the document name (relative path from the world root) and a text excerpt. Physical file paths are never exposed.
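
Japanese query strings must be percent-encoded as UTF-8 before they go into the URL. A minimal Python sketch, again assuming a base URL of `http://localhost:8000` and a hypothetical `es_search_url` helper:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://localhost:8000"  # assumption: replace with your Sherpa host


def es_search_url(version, query, scope_paths=None, k=20):
    """Build a GET /admin/es/search URL; urlencode handles the UTF-8 escaping."""
    params = {"version": version, "query": query, "k": k}
    if scope_paths:
        params["scope_paths"] = scope_paths
    return f"{BASE_URL}/admin/es/search?{urlencode(params)}"


url = es_search_url("my-world", "消費税計算", "src/billing")
print(url)

# Decoding the query string gives back the original Japanese text.
print(parse_qs(urlsplit(url).query)["query"][0])
```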

Vector/semantic search (embedding-based) is available as an optional enhancement. When an embedding backend is enabled, results also incorporate semantic similarity, which improves recall for paraphrased queries. Without an embedding backend, only BM25 keyword matching is used.

Concept management

Sherpa automatically proposes bridges between business-layer concepts (e.g. Parameter: consumption-tax rate) and their code-level counterparts (e.g. DataItem: TAX-RATE) via the REALIZES edge. Admins can review, confirm, and disable these proposals.

Endpoint	Action
`POST /worlds/{wid}/concepts/propose`	Ask the AI to generate concept-to-code bridge proposals for the world
`POST /worlds/{wid}/concepts/confirm`	Accept a set of proposals — writes them to `concepts.json` and rebuilds the graph with the new bridges active
`POST /worlds/{wid}/concepts/disable`	Add a concept–item pair to the denylist — the bridge is removed from the graph and will not be recreated on re-ingest

Disabling a concept bridge is permanent with respect to that world’s concepts.json; a full re-ingest will not restore a disabled bridge.
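
The denylist behaviour can be pictured as a filter applied whenever bridges are rebuilt. The sketch below is a toy model in Python, not Sherpa's code: the (concept, item) pair representation and the `rebuild_bridges` helper are hypothetical, and the real layout of `concepts.json` is not shown here.

```python
def rebuild_bridges(candidates, denylist):
    """Return the REALIZES bridges to create, skipping disabled pairs."""
    blocked = set(denylist)
    return [pair for pair in candidates if pair not in blocked]


# Pairs a re-ingest would otherwise turn into bridges (illustrative data).
candidates = [
    ("Consumption-tax rate", "TAX-RATE"),
    ("Consumption-tax rate", "CUST-ID"),
]

# The second pair was disabled earlier, so it stays out after every rebuild.
denylist = [("Consumption-tax rate", "CUST-ID")]

print(rebuild_bridges(candidates, denylist))
```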
