
Overview

The Repository Analyzer module provides utilities for analyzing code repositories, generating summaries, and calculating importance scores for various code components.

Repository Summary

generate_repository_summary

Generate an intelligent summary of a code repository by analyzing important files.
def generate_repository_summary(
    code_list: list[dict],
    max_important_files_token: int = 2000
) -> dict
Parameters:
  • code_list (list, required): List of code file information. Each element should contain:
    {
        "file_path": "File path",
        "file_content": "File content"
    }
  • max_important_files_token (int, default: 2000): Token count limit for important files, to prevent the summary from becoming too large

Returns: Dictionary mapping file paths to their summaries

Example:
from src.core.repo_summary import generate_repository_summary

code_files = [
    {
        "file_path": "README.md",
        "file_content": "# Project Name\n\nDescription..."
    },
    {
        "file_path": "src/main.py",
        "file_content": "def main():\n    ..."
    }
]

summary = generate_repository_summary(
    code_list=code_files,
    max_important_files_token=3000
)

get_readme_summary

Extract and summarize key information from README and documentation files.
def get_readme_summary(
    code_content: str,
    history_summary: dict
) -> str
Parameters:
  • code_content (str, required): Content of a README or documentation file
  • history_summary (dict, required): Previous summary context, used to avoid duplication

Returns: Concise summary with important information preserved, using <cite> tags for referenced content

Example:
from src.core.repo_summary import get_readme_summary

readme_content = """
# MyProject

A powerful data processing framework.

## Installation
pip install myproject

## Usage
from myproject import Processor
"""

summary = get_readme_summary(readme_content, {})

Importance Analyzer

ImportanceAnalyzer

Analyze and score the importance of code components using multiple metrics.

Constructor

ImportanceAnalyzer(
    repo_path: str,
    modules: Dict,
    classes: Dict,
    functions: Dict,
    imports: Dict,
    code_tree: Dict,
    call_graph: Optional[nx.DiGraph] = None,
    weights: Optional[Dict] = None
)
Parameters:
  • repo_path (str, required): Path to the code repository
  • modules (dict, required): Module information dictionary from the code tree builder
  • classes (dict, required): Class information dictionary
  • functions (dict, required): Function information dictionary
  • imports (dict, required): Import relationships dictionary
  • code_tree (dict, required): Complete code tree structure
  • call_graph (nx.DiGraph, optional): NetworkX directed graph representing function call relationships
  • weights (dict, optional): Custom weights for the importance calculation metrics. Default weights:
{
    'key_component': 0.0,
    'usage': 2.0,
    'imports_relationships': 3.0,
    'complexity': 1.0,
    'semantic': 0.5,
    'documentation': 0.0,
    'git_history': 4.0,
    'size': 0.0
}

Methods

calculate_node_importance

Calculate the overall importance score of a code node.
def calculate_node_importance(node: Dict) -> float
Parameters:
  • node (dict, required): Node information dictionary containing 'type', 'id', 'name', etc.

Returns: Importance score from 0.0 to 10.0

Example:
from src.core.importance_analyzer import ImportanceAnalyzer

analyzer = ImportanceAnalyzer(
    repo_path="/path/to/repo",
    modules=modules_dict,
    classes=classes_dict,
    functions=functions_dict,
    imports=imports_dict,
    code_tree=tree
)

# Calculate module importance
module_node = {
    'type': 'module',
    'id': 'src.core.engine',
    'name': 'engine'
}
score = analyzer.calculate_node_importance(module_node)
print(f"Importance score: {score}/10.0")

get_file_history_importance

Calculate importance based on Git commit history.
def get_file_history_importance(file_path: str) -> float
Parameters:
  • file_path (str, required): Path to the file to analyze

Returns: Importance score from 0.0 to 1.0, based on:
  • Number of commits (files with more commits are more important)
  • Recency of changes (recently modified files score higher)
Example:
git_score = analyzer.get_file_history_importance("src/core/engine.py")
# Returns higher score for frequently modified files

Importance Scoring Metrics

The ImportanceAnalyzer uses multiple weighted metrics to calculate importance:

1. Usage Analysis (Weight: 2.0)

Measures how frequently a component is referenced:
  • Import count for modules
  • Function call frequency
  • Class instantiation count
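
Under assumed data shapes, the usage metric can be sketched as a simple reference count. The imports dictionary below is a hypothetical structure (module name mapped to the modules it imports); the real dictionary produced by the code tree builder may differ:

```python
# Hypothetical structure: module -> list of modules it imports
imports = {
    "src.main": ["src.core.engine", "src.utils"],
    "src.api": ["src.core.engine"],
    "src.core.engine": ["src.utils"],
}

def usage_count(target: str, imports: dict) -> int:
    """Count how many modules reference (import) the target module."""
    return sum(target in deps for deps in imports.values())

print(usage_count("src.core.engine", imports))  # 2
print(usage_count("src.main", imports))         # 0
```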

2. Import Relationships (Weight: 3.0)

Analyzes inter-module dependencies using graph metrics:
  • In-degree: How many modules import this module
  • Out-degree: How many modules this imports
  • PageRank: Centrality in dependency network
  • Betweenness: Importance as a bridge between modules
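
These graph metrics can be reproduced directly with NetworkX on a toy dependency graph, where an edge A -> B means "A imports B" (module names are made up for illustration):

```python
import networkx as nx

# Toy dependency graph: an edge A -> B means "A imports B"
g = nx.DiGraph()
g.add_edges_from([
    ("src.main", "src.core.engine"),
    ("src.api", "src.core.engine"),
    ("src.core.engine", "src.utils"),
    ("src.api", "src.utils"),
])

module = "src.core.engine"
print(g.in_degree(module))   # 2: imported by src.main and src.api
print(g.out_degree(module))  # 1: imports src.utils
print(nx.pagerank(g)[module])               # centrality in the network
print(nx.betweenness_centrality(g)[module]) # bridge importance
```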

3. Code Complexity (Weight: 1.0)

Evaluates code complexity through:
  • Conditional statements (if/else)
  • Loops (for/while)
  • Exception handling (try/except)
  • Function nesting depth
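
A rough sketch of such a complexity count, using Python's ast module to tally branching constructs (an illustration, not the analyzer's exact formula):

```python
import ast

# Constructs counted as branches: conditionals, loops, exception handling
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.ExceptHandler)

def complexity_score(source: str) -> int:
    """Count branching constructs as a rough cyclomatic-style proxy."""
    tree = ast.parse(source)
    return sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))

code = """
def load(path):
    try:
        for line in open(path):
            if line.strip():
                yield line
    except OSError:
        pass
"""
print(complexity_score(code))  # 4: try, for, if, except handler
```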

4. Semantic Importance (Weight: 0.5)

Identifies important patterns in names:
  • Keywords: 'main', 'core', 'engine', 'api', 'service'
  • Entry points: '__main__', 'app', 'config'
  • Architectural patterns: 'controller', 'manager', 'handler'
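
A minimal sketch of keyword-based scoring; the keyword weights below are invented for illustration and are not the analyzer's actual values:

```python
# Hypothetical weights: higher for core components, lower for patterns
IMPORTANT_KEYWORDS = {
    "main": 1.0, "core": 1.0, "engine": 1.0,
    "api": 0.8, "service": 0.8, "controller": 0.8,
    "manager": 0.6, "handler": 0.6, "config": 0.6,
}

def semantic_score(name: str) -> float:
    """Return the highest keyword weight found in the (lowercased) name."""
    lowered = name.lower()
    return max((w for kw, w in IMPORTANT_KEYWORDS.items() if kw in lowered),
               default=0.0)

print(semantic_score("src.core.engine"))  # 1.0
print(semantic_score("utils.helpers"))    # 0.0
```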

5. Git History (Weight: 4.0)

Analyzes version control history:
  • Commit frequency
  • Last modification time
  • Change recency
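
As an illustration only (not the analyzer's actual algorithm), a score combining commit frequency with exponential recency decay might look like this; commit timestamps are assumed to be Unix epoch seconds:

```python
import math
import time

def git_history_score(commit_timestamps, now=None, half_life_days=90.0):
    """Score in [0, 1]: more commits and more recent commits score higher."""
    if not commit_timestamps:
        return 0.0
    now = now or time.time()
    # Frequency term: saturates as the commit count grows.
    freq = 1.0 - math.exp(-len(commit_timestamps) / 10.0)
    # Recency term: exponential decay from the most recent commit.
    age_days = (now - max(commit_timestamps)) / 86400.0
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    return 0.5 * freq + 0.5 * recency

now = time.time()
recent = [now - d * 86400 for d in (1, 3, 7, 30)]  # 4 recent commits
stale = [now - d * 86400 for d in (400, 500)]      # 2 old commits
print(git_history_score(recent, now) > git_history_score(stale, now))  # True
```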

Usage Example

from src.core.importance_analyzer import ImportanceAnalyzer
from src.core.tree_code import GlobalCodeTreeBuilder

# Build code tree
builder = GlobalCodeTreeBuilder("/path/to/repo")
builder.parse_repository()

# Create analyzer with custom weights
custom_weights = {
    'usage': 3.0,              # Emphasize usage patterns
    'imports_relationships': 4.0,  # Focus on architecture
    'git_history': 2.0         # Reduce git weight
}

analyzer = ImportanceAnalyzer(
    repo_path="/path/to/repo",
    modules=builder.modules,
    classes=builder.classes,
    functions=builder.functions,
    imports=builder.imports,
    code_tree=builder.code_tree,
    weights=custom_weights
)

# Analyze all modules and rank by importance
module_scores = []
for module_id, module_info in builder.modules.items():
    node = {
        'type': 'module',
        'id': module_id,
        'name': module_info['name']
    }
    score = analyzer.calculate_node_importance(node)
    module_scores.append((module_id, score))

# Sort by importance
module_scores.sort(key=lambda x: x[1], reverse=True)

# Print top 10 most important modules
print("Top 10 Most Important Modules:")
for module_id, score in module_scores[:10]:
    print(f"{score:5.2f} - {module_id}")
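
The exact way calculate_node_importance folds the per-metric scores into a 0.0-10.0 value is not documented here; one plausible scheme is a weighted average scaled to the 10-point range (combine_scores is a hypothetical helper, not part of the API):

```python
def combine_scores(metric_scores: dict, weights: dict) -> float:
    """Weighted average of per-metric scores (each in [0, 1]), scaled to 0-10."""
    total_weight = sum(weights.get(m, 0.0) for m in metric_scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(score * weights.get(metric, 0.0)
                   for metric, score in metric_scores.items())
    return 10.0 * weighted / total_weight

weights = {"usage": 2.0, "imports_relationships": 3.0,
           "complexity": 1.0, "git_history": 4.0}
scores = {"usage": 0.8, "imports_relationships": 0.6,
          "complexity": 0.4, "git_history": 0.9}
print(round(combine_scores(scores, weights), 2))  # 7.4
```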

Advanced Features

Module Dependency Graph

The analyzer automatically builds a NetworkX directed graph of module dependencies:
# Access the dependency graph
graph = analyzer.module_dependency_graph

# Analyze specific module
module_id = "src.core.engine"
in_degree = graph.in_degree(module_id)   # Modules that import this
out_degree = graph.out_degree(module_id)  # Modules this imports

print(f"{module_id} is imported by {in_degree} modules")
print(f"{module_id} imports {out_degree} modules")

Important Keywords

The analyzer recognizes semantically important keywords:
  • Core components: main, core, engine
  • API layers: api, service, controller
  • Infrastructure: manager, handler, processor
  • Patterns: factory, builder, provider, repository
  • Operations: executor, scheduler
  • Configuration: config, security
