Sherpa is built around a small set of concepts that fit together consistently. Once you understand how a world maps to a folder, how scope paths filter within it, and how the three analysis lenses use the underlying knowledge graph, the behaviour of every feature in the UI will make sense. This page defines each concept precisely, using the same terminology you will see in the interface and the API.
Worlds
A world is the knowledge base derived from a single registered directory. When you register a folder path in Sherpa, Sherpa creates:
- One Neo4j graph: nodes and edges extracted from the folder’s contents.
- One Elasticsearch index: full-text chunks of all documents in the folder.
Each world has a `world_id` that ties it back to its source registration.
Key behaviours:
- Re-ingest = replace. Running ingest again on the same world scans the folder and brings the knowledge base into sync: changed files are replaced, deleted files are removed, new files are added. There is no separate “publish” or “approve” step.
- Delete = remove everything. Deleting a world removes the graph, the ES index, and all derived artefacts for that `world_id`. The source folder on disk is untouched.
- One folder : one world. The same folder path cannot be registered to two different worlds simultaneously (`UNIQUE(root_path)` constraint). If you need to point a world at a new path, you must delete the existing world and re-register from the new path.
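
The re-ingest rule above amounts to a three-way diff between what was indexed last time and what is on disk now. The following is a minimal sketch of that idea, assuming files are compared by content hash; the source does not say how Sherpa detects changes, and `plan_reingest` and `SyncPlan` are hypothetical names used only for illustration.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    added: List[str]
    replaced: List[str]
    removed: List[str]


def plan_reingest(indexed: Dict[str, str], on_disk: Dict[str, str]) -> SyncPlan:
    """Compare {relative_path: content_hash} maps and decide what to do.

    Hypothetical helper: content hashing is an assumed change-detection method.
    """
    added = sorted(p for p in on_disk if p not in indexed)
    removed = sorted(p for p in indexed if p not in on_disk)
    replaced = sorted(
        p for p in on_disk if p in indexed and indexed[p] != on_disk[p]
    )
    return SyncPlan(added=added, replaced=replaced, removed=removed)


# Illustrative data only: file names and hashes are made up.
previous = {"a.cbl": "h1", "b.cpy": "h2", "old.docx": "h3"}
current = {"a.cbl": "h1", "b.cpy": "h2-changed", "new.xlsx": "h4"}
plan = plan_reingest(previous, current)
print(plan)
```

Unchanged files appear in none of the three lists, which matches the description of ingest as a sync rather than a full rebuild.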
Scope paths
A scope path is a subfolder prefix inside a world used to narrow a search, impact analysis, or citation to a subset of the world’s content. For example, if your world contains the top-level folders `4期更改` and `5期更改`, you can scope a query at `4期更改/02_設計` (just the design documents for the fourth-generation change project), at `4期更改` (the entire generation), or issue a cross-scope query spanning `4期更改` and `5期更改`.
How scope filtering works:
- The same `top_scope`, `phase`, and `category` metadata attributes are attached to graph nodes, ES chunks, and document ledger entries alike. Filtering is consistent across all three stores.
- Impact analysis Cypher queries include the scope prefix directly in the traversal, so the graph traversal itself is bounded, not just the result set.
- Common components (e.g. a shared COBOL copybook that appears in multiple subfolders) are resolved within the same generation (top-level folder segment). They are not merged across generations, which prevents false cross-generation impact paths.
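
Scope filtering as described above can be pictured as a prefix test on path segments. The sketch below is an in-memory illustration, not Sherpa's actual query code. It assumes that `top_scope` corresponds to the first path segment, and the file names and the `4期更改_旧` folder are hypothetical, added only to show why whole segments are compared rather than raw strings.

```python
from typing import Iterable, List


def in_scope(doc_path: str, scope_paths: Iterable[str]) -> bool:
    """True if doc_path lies under any scope path, matching whole segments."""
    doc_parts = doc_path.strip("/").split("/")
    for scope in scope_paths:
        scope_parts = scope.strip("/").split("/")
        if doc_parts[: len(scope_parts)] == scope_parts:
            return True
    return False


def top_scope(doc_path: str) -> str:
    """First path segment, assumed here to be the generation of the document."""
    return doc_path.strip("/").split("/")[0]


# Hypothetical documents inside one world.
docs: List[str] = [
    "4期更改/02_設計/billing.docx",
    "4期更改/03_ソース/BILLGEN.cbl",
    "5期更改/02_設計/billing.docx",
    "4期更改_旧/02_設計/old.docx",
]
design_only = [d for d in docs if in_scope(d, ["4期更改/02_設計"])]
cross_scope = [d for d in docs if in_scope(d, ["4期更改", "5期更改"])]
```

Comparing segments keeps `4期更改_旧` out of the `4期更改` scope, which a plain string prefix test would wrongly include.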
Analysis lenses
Sherpa recognises three investigation modes. The AI agent selects the appropriate lens automatically from the wording of your question, but you can also select it explicitly in the UI.
Impact analysis
What it does: Traverses the Neo4j knowledge graph starting from one or more nodes that correspond to the concept you named (e.g. a business parameter, a COBOL copybook, or a data item). The traversal follows dependency edges in reverse (“what depends on this?”), collecting all reachable nodes.
What you receive:
- Total hit count with a breakdown by status: fix now / needs review / not affected.
- The traversal path for each result (e.g. `消費税率 ← TAX-RATE ← 税率コピーブック ← BILLGEN`, where 消費税率 is the consumption tax rate and 税率コピーブック is the tax-rate copybook), showing why each item is affected.
- Source citations with download links.
- Optional Excel export of the full result set.
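
The reverse traversal described above can be illustrated with a breadth-first search over an in-memory edge list. This is a sketch of the idea only: Sherpa runs the traversal in Neo4j with Cypher, and the function name `impacted` is hypothetical. The edges reproduce the example path from the list above.

```python
from collections import deque
from typing import Dict, List, Tuple

# Each pair reads "dependent depends on dependency".
DEPENDS_ON: List[Tuple[str, str]] = [
    ("TAX-RATE", "消費税率"),
    ("税率コピーブック", "TAX-RATE"),
    ("BILLGEN", "税率コピーブック"),
]


def impacted(start: str, edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Breadth-first walk over reversed edges, keeping the path to each hit."""
    dependents: Dict[str, List[str]] = {}
    for dependent, dependency in edges:
        dependents.setdefault(dependency, []).append(dependent)

    paths: Dict[str, List[str]] = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, []):
            if dependent not in paths:  # visit each node once, even in cycles
                paths[dependent] = paths[node] + [dependent]
                queue.append(dependent)
    del paths[start]  # the starting node is not an impact result
    return paths


for target, path in impacted("消費税率", DEPENDS_ON).items():
    print(" ← ".join(path))
```

Recording the path alongside each hit is what makes it possible to show why an item is affected, not just that it is.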
Troubleshooting
What it does: Takes a symptom description (e.g. “the nightly batch terminated abnormally”) and returns a ranked list of cause candidates drawn from the knowledge graph and document corpus.
What you receive:
- Cause-candidate cards ordered by confidence score.
- For each candidate: the inferred reason, suggested verification steps, and a source citation.
Incident nodes and RELATES_TO edges in the graph are the primary signal.
Spec Q&A
What it does: Answers a natural-language question about how a system works by retrieving the most relevant document excerpts from Elasticsearch and the knowledge graph.
What you receive:
- The relevant passage(s) from the source documents.
- A citation link to each source (original Office file or source code, downloadable in one click).
- If no supporting evidence is found, Sherpa returns “no evidence found” explicitly rather than generating a speculative answer.
Knowledge graph
The knowledge graph in Neo4j uses a closed vocabulary of node labels and edge types. Using a fixed schema (rather than open-ended entity extraction) ensures that impact traversals are reliable and reproducible.
Node labels
| Layer | Label | Description |
|---|---|---|
| Business | BusinessRule | Calculation or decision logic (e.g. consumption-tax calculation rule) |
| Business | Parameter | Codes, constants, thresholds (e.g. consumption tax rate = 10%) |
| Function / UI | Function | Business function (e.g. invoice issuance) |
| Function / UI | Screen | UI screen (e.g. invoice list screen) |
| Function / UI | Report | Output report or form (e.g. invoice, delivery note) |
| Function / UI | Batch | Batch job or job-net (e.g. daily close). Derived from JCL. |
| Implementation | Module | Program / class / procedure (COBOL PROGRAM-ID, Java class, etc.) |
| Implementation | Copybook | COBOL copybook — the primary source of coverage in impact analysis |
| Implementation | Table | RDB table, VSAM file, or dataset |
| Implementation | DataItem | Column or copybook data item (e.g. TAX-RATE) |
| Evidence | Document | Design spec, definition document — one-to-one with the document ledger entry; the download anchor for citations |
| Evidence | Standard | Regulations or coding standards (e.g. Consumption Tax Act) |
| Evidence | Incident | Incident record — primary signal for troubleshooting |
Edge types
Edges are directed by dependency: A → B means “A depends on B”. Impact analysis traverses edges in reverse: “what depends on the node that changed?”
| Edge | Direction (dependent → dependency) | Example | Used in impact traversal |
|---|---|---|---|
| `USES` | Function/Module → BusinessRule/Parameter | A function uses the tax rate | ✓ |
| `REFERENCES` | BusinessRule → Parameter / Module → DataItem | A rule references the tax rate | ✓ |
| `IMPLEMENTED_BY` | Function/BusinessRule → Module | A function is implemented by a COBOL module | ✓ |
| `INVOKES` | Screen/Batch/Function/Module → Function/Module | A batch job invokes a function | ✓ |
| `PRODUCED_BY` | Report → Function/Module/Batch | An invoice is produced by the invoice-issuance function | ✓ |
| `ACCESSES` | Function/Module → Table/DataItem | A module reads or writes a DB table | ✓ |
| `COPIES` | Module → Copybook | A COBOL module copies a copybook | ✓ |
| `CONTAINS` | Copybook → DataItem / Table → DataItem | A copybook contains a data item | ✓ |
| `REALIZES` | DataItem → Parameter/BusinessRule | A code-level item realises a business parameter (the bridge between business and implementation layers) | ✓ |
| `CONFORMS_TO` | Function/Module → Standard | A module conforms to a legal standard | ✓ |
| `DOCUMENTS` | Document → any artefact | A spec document describes a function (evidence attachment) | — |
| `RELATES_TO` | Incident → any artefact | An incident is related to a function (troubleshooting signal) | △ |
The `REALIZES` edge is the critical bridge between the business layer and the implementation layer. Without it, an impact query starting from a business term like “consumption tax rate” (Parameter) would not reach the COBOL modules that COPY the copybook containing TAX-RATE (DataItem).
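
A closed vocabulary like the one in the tables above can be enforced with a simple lookup before an edge is written. The sketch below encodes a subset of the edge table. Whether Sherpa validates edges this way is an assumption, and `ALLOWED` and `is_valid_edge` are hypothetical names.

```python
from typing import Dict, Set, Tuple

# Subset of the edge table: edge type -> allowed (source label, target label).
ALLOWED: Dict[str, Set[Tuple[str, str]]] = {
    "REFERENCES": {("BusinessRule", "Parameter"), ("Module", "DataItem")},
    "COPIES": {("Module", "Copybook")},
    "CONTAINS": {("Copybook", "DataItem"), ("Table", "DataItem")},
    "REALIZES": {("DataItem", "Parameter"), ("DataItem", "BusinessRule")},
}


def is_valid_edge(source_label: str, edge_type: str, target_label: str) -> bool:
    """Reject any edge whose type or endpoint labels fall outside the schema."""
    return (source_label, target_label) in ALLOWED.get(edge_type, set())


print(is_valid_edge("DataItem", "REALIZES", "Parameter"))  # inside the schema
print(is_valid_edge("Module", "REALIZES", "Parameter"))    # wrong source label
print(is_valid_edge("Module", "MENTIONS", "Document"))     # unknown edge type
```

Rejecting unknown edge types outright is what keeps the vocabulary closed: an extractor cannot introduce a new relationship that later traversals would not know how to follow.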
Ingest pipeline
Sherpa’s ingest pipeline has two complementary tracks that run on every world ingest.
Track S: Static analysis (structural skeleton)
Sherpa parses source files directly without AI involvement. For COBOL programs it extracts `PROGRAM-ID`, `COPY` statements, `CALL` statements, and `EXEC SQL` / `FILE` references. For JCL it extracts jobs and steps. This produces the structural edges (`COPIES`, `INVOKES`, `ACCESSES`, etc.) with 100% reproducibility.
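
To give a feel for what static extraction involves, here is a deliberately simplified, regex-based sketch that pulls `PROGRAM-ID`, `COPY`, and literal `CALL` targets out of COBOL text. It is not Sherpa's parser: it ignores fixed-column layout, comment lines, continuation lines, and dynamic calls, and the sample program and the name `extract_structure` are made up for illustration.

```python
import re
from typing import Dict, List

PROGRAM_ID = re.compile(
    r"^\s*PROGRAM-ID\.\s+([A-Z0-9-]+)", re.IGNORECASE | re.MULTILINE
)
COPY = re.compile(r"^\s*COPY\s+([A-Z0-9-]+)", re.IGNORECASE | re.MULTILINE)
CALL = re.compile(r"\bCALL\s+['\"]([A-Z0-9-]+)['\"]", re.IGNORECASE)


def extract_structure(source: str) -> Dict[str, List[str]]:
    """Pull PROGRAM-ID, COPY and literal CALL targets out of COBOL text.

    Simplified: no fixed-column handling, no comment or continuation lines.
    """
    return {
        "program_id": PROGRAM_ID.findall(source),
        "copies": COPY.findall(source),
        "calls": CALL.findall(source),
    }


# Hypothetical sample program.
SAMPLE = """\
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BILLGEN.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       COPY TAXRATE.
       PROCEDURE DIVISION.
           CALL 'INVPRINT' USING WS-INVOICE.
           STOP RUN.
"""
structure = extract_structure(SAMPLE)
print(structure)
```

Because the extraction is purely pattern-based, running it twice on the same file gives the same result, which is the property the static track relies on.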
Name resolution uses a two-pass strategy:
- Collect all definitions (program IDs, copybook stems, job names, data items).
- Resolve references (COPY, CALL, EXEC PGM) using nearest-neighbour lookup: same folder → parent folder → generation root. References that cannot be resolved unambiguously are flagged as `ambiguous_reference` rather than silently guessed.
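
The two passes above can be sketched as follows. This is one possible reading of the lookup order, with several assumptions: at each level the search covers every definition beneath that folder, the search never goes above the generation root, and the `unresolved` status, the folder names, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

AMBIGUOUS = "ambiguous_reference"
UNRESOLVED = "unresolved"  # hypothetical status for names with no definition


def collect_definitions(files: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Pass 1: map each defined name to the folders that define it."""
    index: Dict[str, List[str]] = {}
    for folder, name in files:
        index.setdefault(name, []).append(folder)
    return index


def is_under(folder: str, ancestor: str) -> bool:
    """Segment-wise check that folder equals or lies below ancestor."""
    f, a = folder.split("/"), ancestor.split("/")
    return f[: len(a)] == a


def resolve(name: str, from_folder: str, index: Dict[str, List[str]]) -> str:
    """Pass 2: widen the search from the referencing folder to the generation root."""
    parts = from_folder.split("/")
    # parts[:1] is the generation root, so the loop never searches above it.
    for depth in range(len(parts), 0, -1):
        ancestor = "/".join(parts[:depth])
        hits = [f for f in index.get(name, []) if is_under(f, ancestor)]
        if len(hits) == 1:
            return f"{hits[0]}/{name}"
        if len(hits) > 1:
            return AMBIGUOUS  # flag it instead of guessing
    return UNRESOLVED


# Hypothetical definitions found during pass 1.
definitions = collect_definitions([
    ("4期更改/src/cpy", "TAXRATE"),
    ("5期更改/src/cpy", "TAXRATE"),
    ("4期更改/src/cbl", "INVPRINT"),
    ("4期更改/batch", "INVPRINT"),
])
print(resolve("TAXRATE", "4期更改/src/cbl", definitions))
print(resolve("INVPRINT", "4期更改/jcl", definitions))
```

In this example the `TAXRATE` reference resolves inside `4期更改` without ever seeing the copy in `5期更改`, while the `INVPRINT` reference from `4期更改/jcl` finds two equally near candidates and is flagged.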
The second track complements Track S by extracting the semantic nodes (`BusinessRule`, `Parameter`, `Function`, etc.) and the `REALIZES` bridges that link business concepts to implementation artefacts. This track produces the semantic edges that make natural-language starting terms work in impact analysis.
Office / document conversion
Word (.docx), Excel (.xlsx), and other Office files are converted to Markdown before indexing. The Markdown is what Elasticsearch indexes and what the AI reads. The original binary files are kept as-is for download — when you click a citation, you receive the original .xlsx or .docx, not the converted Markdown.
Source files
COBOL, JCL, and copybook files are stored as plain text. They are available for exact-string grep queries (used in the Spec Q&A and Troubleshooting agentic loops) as well as for static analysis.
AI providers
Sherpa supports five provider options, one of which (Heuristic) uses no AI model. The provider is switchable per user from the settings panel in the chat UI.
| Provider | Model / endpoint | Notes |
|---|---|---|
| Heuristic | — (no model) | Default. No AI key required. Uses deterministic Neo4j traversal and template-based answers. Shown in the UI as “簡易(AIなし)” (“Simple (no AI)”). |
| Codex | gpt-5.5 (OpenAI Codex) | Agentic — Codex drives grep, ES, and graph tool calls autonomously. Reasoning depth configurable (low by default). Requires the Codex CLI to be installed. |
| OpenAI | GPT models via OpenAI API | Set OPENAI_API_KEY in .env. Text only — files are never uploaded to the API. |
| Gemini | Google Gemini API | Set GEMINI_API_KEY in the environment or via the settings UI. |
| Ollama | Local LLM via Ollama | Set OLLAMA_URL in .env (default http://localhost:11434). No external API calls. |
Regardless of which AI provider is selected, Sherpa sends only the extracted text content of documents to the provider — never the original binary files and never data from other worlds. Files are not retained by any external service.
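
The environment variables named in the table can be read as in the sketch below. Only the variable names and the Ollama default URL come from the table. The function `provider_settings` and the shape of its result are hypothetical and do not describe how Sherpa loads its configuration.

```python
import os
from typing import Dict, Mapping, Optional

DEFAULT_OLLAMA_URL = "http://localhost:11434"


def provider_settings(env: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Collect provider settings from an environment-like mapping."""
    return {
        "openai_api_key": env.get("OPENAI_API_KEY"),  # None if not set
        "gemini_api_key": env.get("GEMINI_API_KEY"),  # None if not set
        "ollama_url": env.get("OLLAMA_URL", DEFAULT_OLLAMA_URL),
    }


settings = provider_settings(os.environ)
print(settings["ollama_url"])
```

Passing the mapping in as an argument, rather than reading `os.environ` inside the function, makes the lookup easy to exercise with a plain dictionary.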
Personal workspace
Each authenticated user has a personal workspace (My Workspace / マイワークスペース): a private area for uploading their own files. Personal workspace files:
- Are accessible only to the uploading user.
- Can be searched with grep (exact-string matching).
- Are never indexed into the shared Elasticsearch index or the shared knowledge graph.
- Are not available as citations in shared chat sessions.