

The GitHub Pages Discovery Set Protocol v1 is a working contract for building the machine-readable discovery layer on top of a GitHub Pages artifact. It provides infrastructure that helps crawlers, bots, LLM readers, and external systems understand what has been published — without competing with the human-facing page. Use it after a GitHub Pages deployment has been committed to, once the owner, repository, deployment type, and canonical URLs are known.
This protocol applies only to artifacts published through GitHub Pages. Do not apply it to generic static hosting, non-GitHub web hosting, SaaS apps, backend apps, CMS sites, or deployment targets that are not GitHub Pages. A custom domain is in scope only when it is configured for GitHub Pages. If the deployment target is not GitHub Pages, stop and use the HTML and Website Discovery Set Protocol instead.

Core Principle

Discovery Set is infrastructure. It helps crawlers, bots, LLM readers, and external systems understand the published GitHub Pages artifact. It must not compete with the human page. GitHub Pages path rules are part of the artifact. Never generate discovery URLs before the GitHub Pages publication root is closed.

GitHub Pages Gate

Before generating any discovery files, all required variables must be closed. If any required value is missing, do not guess paths; ask for the missing deployment root or produce placeholders only.

| Variable | Description |
| --- | --- |
| OWNER | GitHub user or organisation name |
| REPO | Repository name |
| PAGES_TYPE | USER_OR_ORG_SITE, PROJECT_SITE, or CUSTOM_DOMAIN_SITE |
| SOURCE_BRANCH | Branch used by GitHub Pages, when known |
| SOURCE_PATH | root or /docs, when known |
| CUSTOM_DOMAIN | Active custom domain, or NONE |
| DOMAIN_ROOT | Absolute URL of the bare Pages host or custom domain |
| SITE_ROOT | Absolute URL where the GitHub Pages artifact begins |
| DISCOVERY_ROOT | Root where llms.txt, raw-manifest.json, sitemap.xml, and companion files live |
| CURRENT_PAGE_URL | Absolute canonical URL of each canonical HTML page |
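
The gate above can be sketched as a small check that runs before any URL is built. This is a minimal illustration, not part of the protocol: the function name `missing_variables` and the choice of which variables count as required are assumptions (SOURCE_BRANCH and SOURCE_PATH are treated as optional because the table marks them "when known").

```python
from typing import Mapping, Optional

# Assumption: these are the variables that must be closed before any
# discovery URL is generated. SOURCE_BRANCH and SOURCE_PATH are left out
# because the table describes them as "when known".
REQUIRED = (
    "OWNER", "REPO", "PAGES_TYPE", "CUSTOM_DOMAIN",
    "DOMAIN_ROOT", "SITE_ROOT", "DISCOVERY_ROOT",
)


def missing_variables(values: Mapping[str, Optional[str]]) -> list[str]:
    """Return the required variables that are absent or blank."""
    return [name for name in REQUIRED if not (values.get(name) or "").strip()]
```

If the returned list is non-empty, the caller would ask for the missing values or emit placeholders instead of guessing paths. Note that CUSTOM_DOMAIN set to NONE counts as closed.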

GitHub Pages Deployment Types

User or organisation site (USER_OR_ORG_SITE)
Repository pattern: OWNER.github.io
Default SITE_ROOT: https://OWNER.github.io/
DOMAIN_ROOT usually equals SITE_ROOT. Root-relative discovery paths may be valid only when DOMAIN_ROOT and SITE_ROOT are the same.
Example variables:
OWNER = example-user
REPO  = example-user.github.io
SITE_ROOT  = https://example-user.github.io/
DISCOVERY_ROOT = https://example-user.github.io/
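
As a hedged sketch, the default SITE_ROOT could be derived from OWNER, REPO, and PAGES_TYPE as below. The helper name `default_site_root` is hypothetical. The PROJECT_SITE branch assumes the usual `OWNER.github.io/REPO/` layout, and the sketch deliberately refuses to guess a root for CUSTOM_DOMAIN_SITE, in keeping with the rule that missing roots are never guessed.

```python
def default_site_root(owner: str, repo: str, pages_type: str) -> str:
    """Return the default SITE_ROOT for a deployment type (illustrative only)."""
    host = f"https://{owner}.github.io/"
    if pages_type == "USER_OR_ORG_SITE":
        # Repository is named OWNER.github.io and is served at the host root.
        return host
    if pages_type == "PROJECT_SITE":
        # Assumption: project sites are served under OWNER.github.io/REPO/.
        return f"{host}{repo}/"
    # CUSTOM_DOMAIN_SITE or anything unknown: the root must be supplied.
    raise ValueError(f"no default SITE_ROOT for {pages_type!r}; provide it explicitly")
```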

Publication Variables and Default Resolution

| Variable | Default |
| --- | --- |
| DISCOVERY_ROOT | equals SITE_ROOT |
| LLMS_URL | URL_JOIN(DISCOVERY_ROOT, "llms.txt") |
| RAW_MANIFEST_URL | URL_JOIN(DISCOVERY_ROOT, "raw-manifest.json") |
| SITEMAP_URL | URL_JOIN(DISCOVERY_ROOT, "sitemap.xml") |
| ROBOTS_URL (authoritative) | URL_JOIN(DOMAIN_ROOT, "robots.txt") |
| ROBOTS_URL (companion) | URL_JOIN(DISCOVERY_ROOT, "robots.txt") |

When constructing URLs, join path segments with exactly one slash. Do not construct discovery URLs through raw string concatenation. Prefer absolute URLs when in doubt.
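
The protocol does not define URL_JOIN beyond the one-slash rule, so the following is only one possible reading of it. The function name `url_join` and the decision to reject non-absolute roots are assumptions.

```python
def url_join(root: str, segment: str) -> str:
    """Join an absolute root URL and a relative segment with exactly one slash."""
    if not root.startswith(("http://", "https://")):
        raise ValueError("root must be an absolute URL")
    # Normalise both sides so neither a missing nor a doubled slash survives.
    return root.rstrip("/") + "/" + segment.lstrip("/")
```

A dedicated helper is used here rather than `urllib.parse.urljoin` because `urljoin` replaces the last path segment when the base has no trailing slash: joining `https://example-user.github.io/demo` with `llms.txt` that way yields `https://example-user.github.io/llms.txt`, which would silently drop a project site's path.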

Site vs. Page Classification

Before generating discovery files, classify the artifact using the binary classifier: Does this GitHub Pages artifact expose more than one canonical HTML URL under the same SITE_ROOT?

| Answer | Type |
| --- | --- |
| Yes | SITE: multiple canonical HTML pages |
| No | PAGE: single canonical HTML URL |

Anchor links, GitHub repository links, links to llms.txt, raw-manifest.json, sitemap.xml, assets, or external pages do not make a SITE. Only multiple canonical HTML pages under the same SITE_ROOT make a SITE.
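
A minimal sketch of the classifier follows. It assumes the caller already passes only canonical HTML page URLs (not llms.txt, assets, or other discovery files); the function name `classify_artifact` is hypothetical. Fragments are stripped so anchor links never count as extra pages, and URLs outside SITE_ROOT are ignored.

```python
from typing import Iterable


def classify_artifact(canonical_html_urls: Iterable[str], site_root: str) -> str:
    """Return "SITE" if more than one canonical HTML URL lives under site_root."""
    root = site_root.rstrip("/") + "/"
    pages = set()
    for url in canonical_html_urls:
        url = url.split("#", 1)[0]      # anchors do not create new pages
        if url + "/" == root:
            url = root                  # treat the bare root as the root page
        if url.startswith(root):
            pages.add(url)              # external URLs never reach this point
    if not pages:
        raise ValueError("no canonical HTML URL found under SITE_ROOT")
    return "SITE" if len(pages) > 1 else "PAGE"
```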

What the Discovery Set Includes

Definition (PAGE): Single canonical HTML URL, usually one index.html at SITE_ROOT.

HEAD: required elements
  • <title>
  • <meta name="description">
  • <meta name="robots" content="index, follow">
  • <link rel="canonical" href="[CURRENT_PAGE_URL]">
  • <link rel="alternate" type="text/plain" href="[LLMS_URL]" title="LLM-readable index">
  • <link rel="alternate" type="application/json" href="[RAW_MANIFEST_URL]" title="Machine-readable manifest">
  • Open Graph: og:title, og:description, og:type="website", og:url, og:image (absolute)
  • JSON-LD: SoftwareSourceCode for repos/protocols/tools; WebPage for editorial; ProfilePage for profile surfaces
  • Favicon
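
As an illustration of the canonical and alternate link elements in the list above, the following sketch renders them from already-resolved absolute URLs. The function name `discovery_head_links` is an assumption, and the sketch covers only the three `<link>` elements, not the title, meta, Open Graph, JSON-LD, or favicon entries.

```python
from html import escape


def discovery_head_links(current_page_url: str, llms_url: str, raw_manifest_url: str) -> str:
    """Render the canonical and alternate <link> elements for one page."""
    def attr(value: str) -> str:
        # Escape quotes and ampersands so the URL is safe inside an attribute.
        return escape(value, quote=True)

    return "\n".join([
        f'<link rel="canonical" href="{attr(current_page_url)}">',
        f'<link rel="alternate" type="text/plain" href="{attr(llms_url)}" title="LLM-readable index">',
        f'<link rel="alternate" type="application/json" href="{attr(raw_manifest_url)}" title="Machine-readable manifest">',
    ])
```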
Discovery root files
  • index.html
  • llms.txt
  • raw-manifest.json
  • sitemap.xml — single <url> entry for CURRENT_PAGE_URL
  • robots.txt — authoritative only when DOMAIN_ROOT is controlled; otherwise optional companion
Footer
Low-noise machine links (visible, small, non-dominant):
<!-- Discovery Set: low-noise links for crawlers, bots, and LLM readers. -->
<div class="machine-links" aria-label="Machine-readable project files">
  <span>Machine-readable:</span>
  <a href="llms.txt">llms.txt</a>
  <a href="raw-manifest.json">manifest</a>
  <a href="sitemap.xml">sitemap</a>
</div>
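
For the single-entry sitemap.xml listed among the discovery root files above, one possible generator is sketched here using the Python standard library. The function name `single_page_sitemap` is hypothetical; the namespace is the standard sitemaps.org 0.9 schema, and optional fields such as lastmod are omitted.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def single_page_sitemap(current_page_url: str) -> str:
    """Build a sitemap.xml body with exactly one <url> entry."""
    ET.register_namespace("", SITEMAP_NS)   # emit xmlns="..." without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    # <loc> must hold the absolute canonical URL of the page.
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = current_page_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"
```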

Robots Authority Rule

robots.txt has a crawler-standard location at DOMAIN_ROOT/robots.txt.
  • If the GitHub Pages deployment controls DOMAIN_ROOT, place authoritative robots.txt there.
  • If the artifact is a PROJECT_SITE under OWNER.github.io/REPO/ and does not control DOMAIN_ROOT, a robots.txt at DISCOVERY_ROOT may be included as a project-level discovery companion — but it must not be treated as authoritative crawler control.
  • sitemap.xml may be cited from an authoritative domain-root robots.txt when domain-root control exists.

Path Rules Summary

| Element | Rule |
| --- | --- |
| canonical | Always absolute URL; equals CURRENT_PAGE_URL |
| og:url | Always absolute URL; equals CURRENT_PAGE_URL |
| og:image | Always absolute URL |
| robots sitemap entry | Must be absolute URL |
| llms.txt / raw-manifest.json in HEAD | Relative allowed at DISCOVERY_ROOT; absolute preferred for subpages and PROJECT_SITE |
| Root-relative paths (/llms.txt) | Only allowed when DISCOVERY_ROOT equals DOMAIN_ROOT |
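
The root-relative rule in the last row can be expressed as a small check. This is a sketch under the assumption that both roots are already closed absolute URLs; the names `root_relative_allowed` and `discovery_href` are hypothetical. When the roots differ, the sketch falls back to an absolute URL, the form the table prefers.

```python
def root_relative_allowed(domain_root: str, discovery_root: str) -> bool:
    """Root-relative paths are valid only when both roots are the same URL."""
    return domain_root.rstrip("/") == discovery_root.rstrip("/")


def discovery_href(filename: str, domain_root: str, discovery_root: str) -> str:
    """Pick an href for a discovery file such as llms.txt."""
    if root_relative_allowed(domain_root, discovery_root):
        return "/" + filename.lstrip("/")
    # Project sites and other nested roots: use the absolute form.
    return discovery_root.rstrip("/") + "/" + filename.lstrip("/")
```

For a project site, `/llms.txt` would resolve against the bare Pages host rather than the project path, which is why the absolute form is returned there.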

What Happens When Required Variables Are Missing

If OWNER, REPO, PAGES_TYPE, CUSTOM_DOMAIN state, SITE_ROOT, or DISCOVERY_ROOT is missing, do not guess paths. Ask for the missing deployment root or produce placeholders only. Never assume a root-relative discovery path until PAGES_TYPE, SITE_ROOT, DOMAIN_ROOT, and DISCOVERY_ROOT are all closed.
