Core concepts: worlds, scope paths, lenses, and agents

Sherpa is built around a small set of concepts that fit together consistently. Once you understand how a world maps to a folder, how scope paths filter within it, and how the three analysis lenses use the underlying knowledge graph, the behaviour of every feature in the UI will make sense. This page defines each concept precisely, using the same terminology you will see in the interface and the API.

Worlds

A world is the knowledge base derived from a single registered directory. When you register a folder path in Sherpa, Sherpa creates:

One Neo4j graph — nodes and edges extracted from the folder’s contents.
One Elasticsearch index — full-text chunks of all documents in the folder.

Every node in the graph, every chunk in Elasticsearch, and every document in the ledger carries a world_id that ties it back to its source registration. Key behaviours:

Re-ingest = replace. Running ingest again on the same world scans the folder and brings the knowledge base into sync: changed files are replaced, deleted files are removed, new files are added. There is no separate “publish” or “approve” step.
Delete = remove everything. Deleting a world removes the graph, the ES index, and all derived artefacts for that world_id. The source folder on disk is untouched.
One folder : one world. The same folder path cannot be registered to two different worlds simultaneously (UNIQUE(root_path) constraint). If you need to point a world at a new path, you must delete the existing world and re-register from the new path.
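The re-ingest behaviour above amounts to a three-way diff between what was recorded at the last ingest and what the folder contains now. The following is a minimal sketch of that diff, not Sherpa's implementation: the text does not say how changes are detected, so content hashing is an assumption, and `scan_folder`, `plan_sync`, and the ledger shape (relative path mapped to digest) are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan_folder(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def plan_sync(ledger: dict[str, str], on_disk: dict[str, str]) -> dict[str, list[str]]:
    """Compare the previous ledger (assumed shape) with the current folder scan."""
    return {
        "added": sorted(k for k in on_disk if k not in ledger),
        "changed": sorted(k for k in on_disk if k in ledger and ledger[k] != on_disk[k]),
        "removed": sorted(k for k in ledger if k not in on_disk),
    }
```

Under this sketch, calling `plan_sync(previous_ledger, scan_folder(Path(root_path)))` would yield the files to add, replace, and remove for one ingest run.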

The directory-mirror principle means the folder tree is the knowledge base. You do not configure which files to include — everything in the registered folder is ingested. To exclude content, simply remove it from the folder before running ingest.

Scope paths

A scope path is a subfolder prefix inside a world used to narrow a search, impact analysis, or citation to a subset of the world’s content. For example, if your world contains:

4期更改/
  01_要件定義/
  02_設計/
  03_開発/
4期保守/
5期更改/

You can direct any query at 4期更改/02_設計 (just the design documents for the fourth-generation change project) or at 4期更改 (the entire generation), or issue a cross-scope query spanning 4期更改 and 5期更改.

How scope filtering works:

The same top_scope, phase, and category metadata attributes are attached to graph nodes, ES chunks, and document ledger entries alike. Filtering is consistent across all three stores.
Impact analysis Cypher queries include the scope prefix directly in the traversal, so the graph traversal itself is bounded — not just the result set.
Common components (e.g. a shared COBOL copybook that appears in multiple subfolders) are resolved within the same generation (top-level folder segment). They are not merged across generations, which prevents false cross-generation impact paths.
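As an illustration of prefix-based scoping, the sketch below checks whether a document path falls under one or more scope paths and extracts the generation (top-level folder segment). It assumes world-relative POSIX-style paths and matches on whole path segments; `in_scope` and `generation_of` are hypothetical helper names, and Sherpa's own filtering runs inside the Neo4j and Elasticsearch queries rather than in application code like this.

```python
from pathlib import PurePosixPath


def in_scope(doc_path: str, scope_paths: list[str]) -> bool:
    """True if doc_path lies under at least one scope path (whole-segment prefix match)."""
    doc_parts = PurePosixPath(doc_path).parts
    for scope in scope_paths:
        scope_parts = PurePosixPath(scope).parts
        if doc_parts[: len(scope_parts)] == scope_parts:
            return True
    return False


def generation_of(doc_path: str) -> str:
    """Return the top-level folder segment, which the text calls the generation."""
    return PurePosixPath(doc_path).parts[0]
```

Matching on segments rather than raw string prefixes means a scope of 4期更改 does not accidentally match a sibling folder whose name merely starts with the same characters.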

Analysis lenses

Sherpa recognises three investigation modes. The AI agent selects the appropriate lens automatically from the wording of your question, but you can also select it explicitly in the UI.

Impact analysis

What it does: Traverses the Neo4j knowledge graph starting from one or more nodes that correspond to the concept you named (e.g. a business parameter, a COBOL copybook, or a data item). The traversal follows dependency edges in reverse (“what depends on this?”), collecting all reachable nodes.

What you receive:

Total hit count with a breakdown by status: fix now / needs review / not affected.
The traversal path for each result (e.g. 消費税率 ← TAX-RATE ← 税率コピーブック ← BILLGEN), showing why each item is affected.
Source citations with download links.
Optional Excel export of the full result set.

Underlying mechanism: Deterministic Neo4j graph traversal — not probabilistic AI generation. The AI’s role is to resolve the natural-language starting term to one or more graph nodes; the traversal itself is exact.
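The reverse traversal can be illustrated on a tiny in-memory graph. This is a sketch of the technique only (a breadth-first walk over reversed dependency edges), not Sherpa's Cypher: the node names come from the example path above, the `INVOICE_TABLE` edge is hypothetical, and scope bounding and status classification are left out.

```python
from collections import deque

# Edges as (dependent, edge_type, dependency); names follow the example path in the text.
EDGES = [
    ("TAX-RATE", "REALIZES", "消費税率"),
    ("税率コピーブック", "CONTAINS", "TAX-RATE"),
    ("BILLGEN", "COPIES", "税率コピーブック"),
    ("BILLGEN", "ACCESSES", "INVOICE_TABLE"),  # hypothetical edge, unrelated to the tax rate
]


def impact_paths(start: str, edges: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Breadth-first walk over reversed edges; returns one shortest path per affected node."""
    dependents: dict[str, list[str]] = {}
    for dependent, _edge_type, dependency in edges:
        dependents.setdefault(dependency, []).append(dependent)

    paths: dict[str, list[str]] = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in paths:  # visit each node once, so cycles terminate
                paths[dep] = paths[node] + [dep]
                queue.append(dep)
    del paths[start]  # the starting node itself is not an impact result
    return paths


for node, path in impact_paths("消費税率", EDGES).items():
    print(" ← ".join(path))
```

Because the walk only follows edges from a dependency to its dependents, `INVOICE_TABLE` is never reached: BILLGEN depends on it, not the other way round.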

Troubleshooting

What it does: Takes a symptom description (e.g. “the nightly batch terminated abnormally”) and returns a ranked list of cause candidates drawn from the knowledge graph and document corpus.

What you receive:

Cause-candidate cards ordered by confidence score.
For each candidate: the inferred reason, suggested verification steps, and a source citation.

Underlying mechanism: Agentic — the AI calls Elasticsearch search, grep, and graph neighbourhood queries autonomously, then synthesises a ranked candidate list. Incident nodes and RELATES_TO edges in the graph are the primary signal.

Spec Q&A

What it does: Answers a natural-language question about how a system works by retrieving the most relevant document excerpts from Elasticsearch and the knowledge graph.

What you receive:

The relevant passage(s) from the source documents.
A citation link to each source (original Office file or source code, downloadable in one click).
If no supporting evidence is found, Sherpa returns “no evidence found” explicitly rather than generating a speculative answer.

Underlying mechanism: Agentic — the AI calls ES full-text search and graph neighbourhood queries, then composes an answer grounded only in retrieved content.
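The “grounded only in retrieved content” rule can be pictured as a guard in front of answer composition. The sketch below is an assumption-laden illustration: `Passage`, `answer_from_evidence`, and the returned dictionary shape are hypothetical, and the real agent composes its answer with an AI provider rather than by joining passages.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    source: str  # path of the original file used for the citation link


NO_EVIDENCE = "no evidence found"


def answer_from_evidence(passages: list[Passage]) -> dict:
    """Build an answer only from retrieved passages; never fall back to free generation."""
    if not passages:
        return {"answer": NO_EVIDENCE, "citations": []}
    return {
        "answer": "\n\n".join(p.text for p in passages),
        "citations": sorted({p.source for p in passages}),
    }
```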

Knowledge graph

The knowledge graph in Neo4j uses a closed vocabulary of node labels and edge types. Using a fixed schema (rather than open-ended entity extraction) ensures that impact traversals are reliable and reproducible.

Node labels

Layer	Label	Description
Business	`BusinessRule`	Calculation or decision logic (e.g. consumption-tax calculation rule)
Business	`Parameter`	Codes, constants, thresholds (e.g. consumption tax rate = 10%)
Function / UI	`Function`	Business function (e.g. invoice issuance)
Function / UI	`Screen`	UI screen (e.g. invoice list screen)
Function / UI	`Report`	Output report or form (e.g. invoice, delivery note)
Function / UI	`Batch`	Batch job or job-net (e.g. daily close). Derived from JCL.
Implementation	`Module`	Program / class / procedure (COBOL `PROGRAM-ID`, Java class, etc.)
Implementation	`Copybook`	COBOL copybook — the primary source of coverage in impact analysis
Implementation	`Table`	RDB table, VSAM file, or dataset
Implementation	`DataItem`	Column or copybook data item (e.g. `TAX-RATE`)
Evidence	`Document`	Design spec, definition document — one-to-one with the document ledger entry; the download anchor for citations
Evidence	`Standard`	Regulations or coding standards (e.g. Consumption Tax Act)
Evidence	`Incident`	Incident record — primary signal for troubleshooting

Edge types

Edges are directed by dependency: A → B means “A depends on B”. Impact analysis traverses edges in reverse — “what depends on the node that changed?”

Edge	Direction (dependent → dependency)	Example	Used in impact traversal
`USES`	Function/Module → BusinessRule/Parameter	A function uses the tax rate	✓
`REFERENCES`	BusinessRule → Parameter / Module → DataItem	A rule references the tax rate	✓
`IMPLEMENTED_BY`	Function/BusinessRule → Module	A function is implemented by a COBOL module	✓
`INVOKES`	Screen/Batch/Function/Module → Function/Module	A batch job invokes a function	✓
`PRODUCED_BY`	Report → Function/Module/Batch	An invoice is produced by the invoice-issuance function	✓
`ACCESSES`	Function/Module → Table/DataItem	A module reads or writes a DB table	✓
`COPIES`	Module → Copybook	A COBOL module copies a copybook	✓
`CONTAINS`	Copybook → DataItem / Table → DataItem	A copybook contains a data item	✓
`REALIZES`	DataItem → Parameter/BusinessRule	A code-level item realises a business parameter (the bridge between business and implementation layers)	✓
`CONFORMS_TO`	Function/Module → Standard	A module conforms to a legal standard	✓
`DOCUMENTS`	Document → any artefact	A spec document describes a function (evidence attachment)	—
`RELATES_TO`	Incident → any artefact	An incident is related to a function (troubleshooting signal)	△

The REALIZES edge is the critical bridge between the business layer and the implementation layer. Without it, an impact query starting from a business term like “consumption tax rate” (Parameter) would not reach the COBOL modules that COPY the copybook containing TAX-RATE (DataItem).
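One way to picture the closed vocabulary is as a lookup table that rejects any edge whose type or endpoint labels are not listed. The sketch below transcribes the edge table above into such a lookup; `EDGE_SCHEMA` and `is_valid_edge` are hypothetical names, and how Sherpa actually enforces the schema is not stated in this page.

```python
# Allowed (source labels, target labels) per edge type, copied from the edge table.
# "*" stands for "any artefact".
EDGE_SCHEMA = {
    "USES": [({"Function", "Module"}, {"BusinessRule", "Parameter"})],
    "REFERENCES": [({"BusinessRule"}, {"Parameter"}), ({"Module"}, {"DataItem"})],
    "IMPLEMENTED_BY": [({"Function", "BusinessRule"}, {"Module"})],
    "INVOKES": [({"Screen", "Batch", "Function", "Module"}, {"Function", "Module"})],
    "PRODUCED_BY": [({"Report"}, {"Function", "Module", "Batch"})],
    "ACCESSES": [({"Function", "Module"}, {"Table", "DataItem"})],
    "COPIES": [({"Module"}, {"Copybook"})],
    "CONTAINS": [({"Copybook"}, {"DataItem"}), ({"Table"}, {"DataItem"})],
    "REALIZES": [({"DataItem"}, {"Parameter", "BusinessRule"})],
    "CONFORMS_TO": [({"Function", "Module"}, {"Standard"})],
    "DOCUMENTS": [({"Document"}, {"*"})],
    "RELATES_TO": [({"Incident"}, {"*"})],
}


def is_valid_edge(source_label: str, edge_type: str, target_label: str) -> bool:
    """Check one edge against the closed vocabulary."""
    for sources, targets in EDGE_SCHEMA.get(edge_type, []):
        if source_label in sources and ("*" in targets or target_label in targets):
            return True
    return False
```

For example, `is_valid_edge("DataItem", "REALIZES", "Parameter")` is accepted, while a `REALIZES` edge drawn directly from a `Module` is rejected, which keeps the business-to-implementation bridge anchored on data items.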

Ingest pipeline

Sherpa’s ingest pipeline has two complementary tracks that run on every world ingest.

Track S: Static analysis (structural skeleton)

Sherpa parses source files directly without AI involvement. For COBOL programs it extracts PROGRAM-ID, COPY statements, CALL statements, and EXEC SQL / FILE references. For JCL it extracts jobs and steps. This produces the structural edges (COPIES, INVOKES, ACCESSES, etc.) with 100% reproducibility. Name resolution uses a two-pass strategy:

Collect all definitions (program IDs, copybook stems, job names, data items).
Resolve references (COPY, CALL, EXEC PGM) using nearest-neighbour lookup: same folder → parent folder → generation root. References that cannot be resolved unambiguously are flagged as ambiguous_reference rather than silently guessed.
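The two passes above can be sketched as follows. This is an illustration under stated assumptions, not Sherpa's resolver: `collect_definitions` and `resolve` are hypothetical names, the input is assumed to map each world-relative path to the single name it defines, every file is assumed to live under a generation folder, and “generation root” is interpreted as “anywhere under the same top-level folder”.

```python
from pathlib import PurePosixPath
from typing import Optional


def collect_definitions(files: dict[str, str]) -> dict[str, list[str]]:
    """Pass 1: map each defined name to the files that define it."""
    index: dict[str, list[str]] = {}
    for path, name in files.items():
        index.setdefault(name, []).append(path)
    return index


def resolve(name: str, referrer: str, index: dict[str, list[str]]) -> tuple[str, Optional[str]]:
    """Pass 2: try the same folder, then the parent folder, then the whole generation."""
    folder = PurePosixPath(referrer).parent
    generation = PurePosixPath(referrer).parts[0]
    candidates = index.get(name, [])

    levels = [[c for c in candidates if PurePosixPath(c).parent == folder]]
    if len(folder.parts) > 1:  # the parent folder is still inside the generation
        levels.append([c for c in candidates if PurePosixPath(c).parent == folder.parent])
    levels.append([c for c in candidates if PurePosixPath(c).parts[0] == generation])

    for matches in levels:
        if len(matches) == 1:
            return ("resolved", matches[0])
        if len(matches) > 1:
            return ("ambiguous_reference", None)  # flag it rather than guess
    return ("unresolved", None)
```

Stopping at the first level that has any candidate is what makes the lookup nearest-neighbour: a copybook in the referrer's own folder wins over one with the same name elsewhere, and a definition that exists only in another generation is never picked up.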

Track L: LLM semantic layer (business meaning)

The AI provider reads Markdown-converted document text and extracts business-layer nodes (BusinessRule, Parameter, Function, etc.) and the REALIZES bridges that link business concepts to implementation artefacts. This track produces the semantic edges that make natural-language starting terms work in impact analysis.

Office / document conversion

Word (.docx), Excel (.xlsx), and other Office files are converted to Markdown before indexing. The Markdown is what Elasticsearch indexes and what the AI reads. The original binary files are kept as-is for download: when you click a citation, you receive the original .xlsx or .docx, not the converted Markdown.

Source files

COBOL, JCL, and copybook files are stored as plain text. They are available for exact-string grep queries (used in the Spec Q&A and Troubleshooting agentic loops) as well as for static analysis.

AI providers

Sherpa supports five provider options: a heuristic mode that needs no AI, and four AI-backed providers. The provider is switchable per user from the settings panel in the chat UI.

Provider	Model / endpoint	Notes
Heuristic	— (no model)	Default. No AI key required. Uses deterministic Neo4j traversal and template-based answers. Shown in the UI as “簡易（AIなし）”.
Codex	`gpt-5.5` (OpenAI Codex)	Agentic — Codex drives grep, ES, and graph tool calls autonomously. Reasoning depth configurable (`low` by default). Requires the Codex CLI to be installed.
OpenAI	GPT models via OpenAI API	Set `OPENAI_API_KEY` in `.env`. Text only — files are never uploaded to the API.
Gemini	Google Gemini API	Set `GEMINI_API_KEY` in the environment or via the settings UI.
Ollama	Local LLM via Ollama	Set `OLLAMA_URL` in `.env` (default `http://localhost:11434`). No external API calls.

Regardless of which AI provider is selected, Sherpa sends only the extracted text content of documents to the provider — never the original binary files and never data from other worlds. Files are not retained by any external service.
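The environment variables named in the table can be read along the following lines. This is a hedged sketch: the variable names and the Ollama default URL come from the table, but `provider_settings`, the lowercase provider identifiers, and the returned dictionary shape are hypothetical, and Codex is omitted because it depends on an installed CLI rather than an environment key.

```python
import os
from typing import Mapping, Optional

DEFAULT_OLLAMA_URL = "http://localhost:11434"


def provider_settings(provider: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the settings a provider needs, or raise if a required key is missing."""
    env = os.environ if env is None else env
    if provider == "heuristic":
        return {"provider": "heuristic"}  # no key required
    if provider == "openai":
        return {"provider": "openai", "api_key": _require(env, "OPENAI_API_KEY")}
    if provider == "gemini":
        return {"provider": "gemini", "api_key": _require(env, "GEMINI_API_KEY")}
    if provider == "ollama":
        return {"provider": "ollama", "url": env.get("OLLAMA_URL", DEFAULT_OLLAMA_URL)}
    raise ValueError(f"unknown provider: {provider}")


def _require(env: Mapping[str, str], key: str) -> str:
    """Fetch a mandatory variable, treating an empty value as missing."""
    value = env.get(key, "")
    if not value:
        raise KeyError(f"{key} is not set")
    return value
```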

Personal workspace

Each authenticated user has a personal workspace (My Workspace / マイワークスペース) — a private area for uploading their own files. Personal workspace files:

Are accessible only to the uploading user.
Can be searched with grep (exact-string matching).
Are never indexed into the shared Elasticsearch index or the shared knowledge graph.
Are not available as citations in shared chat sessions.

This separation means that a user can keep working notes, draft specs, or personal reference files without inadvertently polluting the shared knowledge base that the rest of the team relies on.

Directory-mirror principle (summary)

The directory-mirror principle in one sentence: the folder tree on disk is the knowledge base. Sherpa does not maintain a separate internal copy of your documents; it mirrors what the folder contains at the time of each ingest run.

Practical implications:

To add content: add files to the folder, then re-ingest.
To update content: update files in the folder, then re-ingest.
To remove content: delete files from the folder, then re-ingest.
There is no versioning, branching, or draft concept inside Sherpa. If you need version history, use Git on the source folder.
