Architecture Overview

Beacon transforms any codebase into an agent-ready repository through three distinct phases:
1. Scan — Walk the repository and extract context: README, source files, package manifests, OpenAPI specs.
2. Infer — Send the context to an AI provider to identify capabilities, endpoints, and schemas.
3. Generate — Render the inferred data into a standards-compliant AGENTS.md file.
This architecture keeps Beacon fast, accurate, and provider-agnostic.

Phase 1: Scanning

Module: src/scanner.rs

What Gets Scanned

Beacon performs a filesystem walk using the walkdir crate, intelligently collecting:
README files — Any file starting with README (case-insensitive). These provide high-level context about what the project does.
if filename.starts_with("readme") {
    ctx.readme = read_file(path).ok();
    println!("   ✓ README found");
}
See src/scanner.rs:70

What Gets Skipped

To keep scanning fast and focused, Beacon ignores:

Directories:
const SKIP_DIRS: &[&str] = &[
    "target", "node_modules", ".git", ".github", "dist",
    "build", "__pycache__", ".venv", "venv",
];
Files:
  • Hidden files (starting with .)
  • Lock files (*.lock, *.sum)
  • System files (.DS_Store, Thumbs.db)
  • Files larger than 50KB (to avoid overwhelming the AI)
See src/scanner.rs:7 and src/scanner.rs:17
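Taken together, the skip rules above amount to two small predicates. A minimal sketch (the function names and the placement of the 50KB constant are illustrative, not Beacon's actual helpers):

```rust
// Illustrative sketch of the skip rules; the helper names here are
// hypothetical, not Beacon's actual functions in src/scanner.rs.
const SKIP_DIRS: &[&str] = &[
    "target", "node_modules", ".git", ".github", "dist",
    "build", "__pycache__", ".venv", "venv",
];

const MAX_FILE_SIZE: u64 = 50 * 1024; // 50KB cap to avoid overwhelming the AI

fn should_skip_dir(name: &str) -> bool {
    SKIP_DIRS.contains(&name)
}

fn should_skip_file(name: &str, size: u64) -> bool {
    name.starts_with('.')          // hidden files (also covers .DS_Store)
        || name.ends_with(".lock") // lock files
        || name.ends_with(".sum")
        || name == "Thumbs.db"     // Windows system file
        || size > MAX_FILE_SIZE
}

fn main() {
    println!("{}", should_skip_dir("node_modules")); // true
    println!("{}", should_skip_file("Cargo.lock", 120)); // true
    println!("{}", should_skip_file("main.rs", 120)); // false
}
```

Keeping these checks as pure functions makes them easy to unit-test independently of the filesystem walk.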

Output: RepoContext

The scanner produces a RepoContext struct defined in src/models.rs:60:
pub struct RepoContext {
    pub name: String,
    pub readme: Option<String>,
    pub source_files: Vec<SourceFile>,
    pub openapi_spec: Option<String>,
    pub package_manifest: Option<String>,
    pub existing_agents_md: Option<String>,
}
This structured context is what gets sent to the AI in Phase 2.
Beacon also detects existing AGENTS.md files during scanning. This allows the AI to refine and update existing documentation rather than starting from scratch.
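As a rough illustration, routing special files into the RepoContext fields during the walk could look like this (the classify helper and the exact manifest filenames it matches are assumptions, not Beacon's actual logic):

```rust
// Hypothetical sketch: decide which RepoContext field a filename feeds.
// The helper and the manifest filenames matched are assumptions, not
// Beacon's actual scanner code.
fn classify(filename: &str) -> Option<&'static str> {
    let lower = filename.to_lowercase();
    if lower == "agents.md" {
        Some("existing_agents_md") // lets the AI refine existing docs
    } else if lower.starts_with("readme") {
        Some("readme")
    } else if matches!(lower.as_str(), "cargo.toml" | "package.json" | "pyproject.toml") {
        Some("package_manifest")
    } else if lower == "openapi.yaml" || lower == "openapi.json" {
        Some("openapi_spec")
    } else {
        None // ordinary source file, collected separately
    }
}

fn main() {
    println!("{:?}", classify("README.md")); // Some("readme")
    println!("{:?}", classify("AGENTS.md")); // Some("existing_agents_md")
}
```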

Phase 2: Inference

Module: src/inferrer.rs

This is where the magic happens — Beacon uses AI to understand what your code actually does.

The Inference Prompt

Beacon constructs a detailed prompt in src/inferrer.rs:272:
fn build_prompt(ctx: &RepoContext) -> String {
    let mut parts: Vec<String> = vec![
        "You are an expert at analyzing software repositories...".to_string(),
        "Analyze the following repository context and return a JSON object...".to_string(),
        "GUIDANCE: Look beyond just utility scripts. Identify server-side capabilities, REST API endpoints...".to_string(),
        "CRITICAL: Return ONLY valid JSON. No markdown, no explanation...".to_string(),
    ];

    // Include README (truncated to 3000 chars)
    if let Some(readme) = &ctx.readme {
        parts.push(format!("\n## README\n{}", truncate(readme, 3000)));
    }

    // Include package manifest (1000 chars)
    // Include OpenAPI spec (3000 chars)
    // Include up to 10 source files (1500 chars each)

    parts.join("\n")
}
The prompt provides:
  1. Explicit instructions to focus on agent-usable capabilities (APIs, services, not just scripts)
  2. The exact JSON schema the AI must follow
  3. Truncated context from the scanned repository
Beacon truncates long files to keep token usage reasonable while still providing enough context for accurate inference. README gets 3000 chars, source files get 1500 chars each.
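A truncate helper of the kind referenced above might look like this (a minimal sketch; Beacon's actual implementation in src/inferrer.rs may differ). Cutting on char boundaries rather than bytes avoids panics on multi-byte UTF-8 content:

```rust
// Hypothetical sketch of a truncate helper; Beacon's actual
// implementation in src/inferrer.rs may differ.
fn truncate(s: &str, max_chars: usize) -> String {
    if s.chars().count() <= max_chars {
        s.to_string()
    } else {
        // Take whole chars so multi-byte UTF-8 text never splits mid-character.
        s.chars().take(max_chars).collect()
    }
}

fn main() {
    println!("{}", truncate("hello world", 5)); // "hello"
}
```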

Provider-Specific Implementation

Beacon supports four AI providers, each with a different API format. Gemini, the default provider, is fast and cost-effective:
async fn call_gemini(prompt: &str, api_key: &str) -> Result<AgentsManifest> {
    let response = CLIENT
        .post(format!("{}?key={}", GEMINI_URL, api_key))
        .json(&json!({
            "contents": [{ "parts": [{ "text": prompt }] }],
            "generationConfig": {
                "temperature": 0.2,
                "responseMimeType": "application/json"
            }
        }))
        .send()
        .await?;
    // Parse response...
}
Uses a low temperature (0.2) for deterministic, accurate output and enforces a JSON response format. See src/inferrer.rs:63

Response Parsing

All providers return structured JSON that gets deserialized into:
pub struct AgentsManifest {
    pub name: String,
    pub description: String,
    pub version: Option<String>,
    pub capabilities: Vec<Capability>,
    pub endpoints: Vec<Endpoint>,
    pub authentication: Option<Authentication>,
    pub rate_limits: Option<RateLimits>,
    pub contact: Option<String>,
}
Defined in src/models.rs:5

The AI populates this structure by analyzing your code and identifying:
  • Capabilities — High-level actions agents can perform
  • Endpoints — Specific API routes with parameters
  • Authentication — How agents should authenticate
  • Rate Limits — Usage constraints
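Despite the "Return ONLY valid JSON" instruction, models occasionally wrap their output in markdown code fences. A sketch of a pre-processing step that strips such fences before the text is deserialized (the helper is hypothetical; Beacon's actual parsing may handle this differently):

```rust
// Hypothetical pre-processing step: strip accidental ```json fences from a
// model response before deserializing it into AgentsManifest.
fn strip_code_fences(raw: &str) -> &str {
    let trimmed = raw.trim();
    // Remove an opening ```json or bare ``` fence, if present.
    let without_open = trimmed
        .strip_prefix("```json")
        .or_else(|| trimmed.strip_prefix("```"))
        .unwrap_or(trimmed);
    // Remove a trailing ``` fence, if present.
    without_open
        .strip_suffix("```")
        .unwrap_or(without_open)
        .trim()
}

fn main() {
    let raw = "```json\n{\"name\": \"my-project\"}\n```";
    println!("{}", strip_code_fences(raw)); // {"name": "my-project"}
}
```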

Phase 3: Generation

Module: src/generator.rs

The final phase is straightforward: transform the structured AgentsManifest into markdown.

Rendering Logic

pub fn generate_agents_md(manifest: &AgentsManifest, output_path: &str) -> Result<()> {
    let content = render_markdown(manifest);
    fs::write(output_path, &content)?;
    println!("   ✓ Written to {}", output_path);
    Ok(())
}
See src/generator.rs:5

The render_markdown function builds the AGENTS.md file section by section:
out.push_str(&format!("# AGENTS.md — {}\n\n", m.name));
out.push_str(&format!("> {}\n\n", m.description));

if let Some(version) = &m.version {
    out.push_str(&format!("**Version:** {}\n\n", version));
}
See src/generator.rs:15
if let Some(auth) = &m.authentication {
    out.push_str("## Authentication\n\n");
    out.push_str(&format!("**Type:** `{}`\n\n", auth.r#type));
    if let Some(desc) = &auth.description {
        out.push_str(&format!("{}\n\n", desc));
    }
}
See src/generator.rs:24
for cap in &m.capabilities {
    out.push_str(&format!("### `{}`\n\n", cap.name));
    out.push_str(&format!("{}\n\n", cap.description));
    
    if let Some(input) = &cap.input_schema {
        out.push_str("**Input:**\n\n```json\n");
        out.push_str(&serde_json::to_string_pretty(input).unwrap_or_default());
        out.push_str("\n```\n\n");
    }
    
    if let Some(output) = &cap.output_schema {
        out.push_str("**Output:**\n\n```json\n");
        out.push_str(&serde_json::to_string_pretty(output).unwrap_or_default());
        out.push_str("\n```\n\n");
    }
    
    if !cap.examples.is_empty() {
        out.push_str("**Examples:**\n\n");
        for ex in &cap.examples {
            out.push_str(&format!("- {}\n", ex));
        }
    }
}
See src/generator.rs:37
for ep in &m.endpoints {
    out.push_str(&format!(
        "### `{} {}`\n\n{}\n\n",
        ep.method.to_uppercase(),
        ep.path,
        ep.description
    ));
    
    if !ep.parameters.is_empty() {
        out.push_str("| Parameter | Type | Required | Description |\n");
        out.push_str("|-----------|------|----------|-------------|\n");
        for p in &ep.parameters {
            out.push_str(&format!(
                "| `{}` | `{}` | {} | {} |\n",
                p.name,
                p.r#type,
                if p.required { "✅" } else { "❌" },
                p.description
            ));
        }
    }
}
See src/generator.rs:70

The Complete Flow

1. User runs Beacon

beacon generate ./my-project --provider gemini

2. Scanner walks the filesystem

Collects the README, source files (up to 50), the package manifest, and OpenAPI specs. Skips node_modules, .git, and files over 50KB.

3. Inferrer sends context to the AI

Truncated context (README: 3000 chars, sources: 1500 chars each) is sent to Gemini. Temperature is set to 0.2 for deterministic output.

4. AI returns structured JSON

{
  "name": "my-project",
  "description": "A tool that...",
  "capabilities": [...],
  "endpoints": [...],
  "authentication": {...}
}

5. Generator renders AGENTS.md

Transforms the JSON into markdown with proper formatting, JSON schema blocks, and tables.

6. File written to disk
✅ Done! AGENTS.md written to: AGENTS.md
   Provider:     gemini
   Capabilities: 3
   Endpoints:    5

CLI vs API Mode

Beacon can run as both a CLI tool and a web API:
# Generate from local path
beacon generate ./my-project

# Generate from GitHub URL
beacon generate https://github.com/user/repo

# Specify provider and output
beacon generate ./my-project \
  --provider claude \
  --api-key sk-ant-... \
  --output ./docs/AGENTS.md
In API mode, Beacon includes:
  • Rate limiting — 20 requests per minute per IP (using Redis)
  • Health checks — GET /health returns service status
  • Payment handling — For Beacon Cloud provider
See src/main.rs:448 for the Axum server implementation

Performance Characteristics

Scanning

Fast — the filesystem walk is I/O bound. Typical repo: < 1 second. Large monorepo: 2-5 seconds.

Inference

Variable — depends on the AI provider. Gemini: 2-5 seconds. Claude: 3-8 seconds. OpenAI: 3-6 seconds.

Generation

Instant — pure string formatting. < 100ms for any repo size.

Total Time

3-15 seconds for most repos, dominated by AI inference latency.

Error Handling

Beacon uses anyhow::Result for error propagation and custom errors in src/errors.rs:
pub enum BeaconError {
    InferenceError(String),
    ValidationError(String),
    CloudError { status: u16, message: String },
    PaymentRequired { run_id: String, amount: String, ... },
    TransactionAlreadyUsed,
    // ...
}
Errors are:
  • Logged with tracing at appropriate levels
  • Returned as HTTP status codes in API mode
  • Displayed with context in CLI mode
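For illustration, the mapping from error variants to HTTP status codes in API mode might look like this (a sketch only; the variant fields are trimmed and the actual Axum handler in src/main.rs may choose different codes):

```rust
// Illustrative mapping from BeaconError variants to HTTP status codes;
// variant fields are trimmed, and the real handler in src/main.rs may differ.
#[allow(dead_code)]
enum BeaconError {
    InferenceError(String),
    ValidationError(String),
    CloudError { status: u16, message: String },
    PaymentRequired { run_id: String, amount: String },
    TransactionAlreadyUsed,
}

fn status_code(err: &BeaconError) -> u16 {
    match err {
        BeaconError::InferenceError(_) => 502,             // upstream AI provider failed
        BeaconError::ValidationError(_) => 400,            // bad request from the client
        BeaconError::CloudError { status, .. } => *status, // pass the cloud status through
        BeaconError::PaymentRequired { .. } => 402,        // HTTP 402 Payment Required
        BeaconError::TransactionAlreadyUsed => 409,        // conflict: tx already consumed
    }
}

fn main() {
    let e = BeaconError::ValidationError("missing repo path".into());
    println!("{}", status_code(&e)); // 400
}
```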

What Makes This Architecture Special

The three-phase design cleanly separates data collection (scanning) from AI inference. This means you can swap providers without changing the scanning or generation logic.
By using structured JSON schemas and a low temperature (0.2), Beacon produces consistent results. Running the same repo through Beacon twice yields nearly identical AGENTS.md files.
Truncation limits keep token usage reasonable while preserving the most important information (the README and key source files). This makes Beacon fast and cost-effective even for large codebases.
The generated markdown follows the AAIF specification exactly, ensuring compatibility with any agent that supports the standard.

Next Steps

AGENTS.md Specification

Learn about the AGENTS.md format in detail

AI Providers

Choose the right provider for your needs

Generate Your First File

Try Beacon on your repository

API Reference

Use Beacon as a web service
