
Overview

The Repository Analyzer module provides utilities for analyzing code repositories, generating summaries, and calculating importance scores for various code components.

Repository Summary

generate_repository_summary

Generate an intelligent summary of a code repository by analyzing important files.
def generate_repository_summary(
    code_list: list[dict],
    max_important_files_token: int = 2000
) -> dict
Parameters:
  • code_list (list, required): List of code file information. Each element should contain:
    {
        "file_path": "File path",
        "file_content": "File content"
    }
  • max_important_files_token (int, default: 2000): Token count limit for important files, to prevent the summary from becoming too large

Returns: Dictionary mapping file paths to their summaries

Example:
from src.core.repo_summary import generate_repository_summary

code_files = [
    {
        "file_path": "README.md",
        "file_content": "# Project Name\n\nDescription..."
    },
    {
        "file_path": "src/main.py",
        "file_content": "def main():\n    ..."
    }
]

summary = generate_repository_summary(
    code_list=code_files,
    max_important_files_token=3000
)

get_readme_summary

Extract and summarize key information from README and documentation files.
def get_readme_summary(
    code_content: str,
    history_summary: dict
) -> str
Parameters:
  • code_content (str, required): Content of a README or documentation file
  • history_summary (dict, required): Previous summary context, used to avoid duplication

Returns: Concise summary with important information preserved, using <cite> tags for referenced content

Example:
from src.core.repo_summary import get_readme_summary

readme_content = """
# MyProject

A powerful data processing framework.

## Installation
pip install myproject

## Usage
from myproject import Processor
"""

summary = get_readme_summary(readme_content, {})

Importance Analyzer

ImportanceAnalyzer

Analyze and score the importance of code components using multiple metrics.

Constructor

ImportanceAnalyzer(
    repo_path: str,
    modules: Dict,
    classes: Dict,
    functions: Dict,
    imports: Dict,
    code_tree: Dict,
    call_graph: Optional[nx.DiGraph] = None,
    weights: Optional[Dict] = None
)
Parameters:
  • repo_path (str, required): Path to the code repository
  • modules (dict, required): Module information dictionary from the code tree builder
  • classes (dict, required): Class information dictionary
  • functions (dict, required): Function information dictionary
  • imports (dict, required): Import relationships dictionary
  • code_tree (dict, required): Complete code tree structure
  • call_graph (nx.DiGraph, optional): NetworkX directed graph representing function call relationships
  • weights (dict, optional): Custom weights for the importance calculation metrics. Default weights:
{
    'key_component': 0.0,
    'usage': 2.0,
    'imports_relationships': 3.0,
    'complexity': 1.0,
    'semantic': 0.5,
    'documentation': 0.0,
    'git_history': 4.0,
    'size': 0.0
}

Methods

calculate_node_importance

Calculate the overall importance score of a code node.
def calculate_node_importance(node: Dict) -> float
Parameters:
  • node (dict, required): Node information dictionary containing 'type', 'id', 'name', etc.

Returns: Importance score from 0.0 to 10.0

Example:
from src.core.importance_analyzer import ImportanceAnalyzer

analyzer = ImportanceAnalyzer(
    repo_path="/path/to/repo",
    modules=modules_dict,
    classes=classes_dict,
    functions=functions_dict,
    imports=imports_dict,
    code_tree=tree
)

# Calculate module importance
module_node = {
    'type': 'module',
    'id': 'src.core.engine',
    'name': 'engine'
}
score = analyzer.calculate_node_importance(module_node)
print(f"Importance score: {score}/10.0")

get_file_history_importance

Calculate importance based on Git commit history.
def get_file_history_importance(file_path: str) -> float
Parameters:
  • file_path (str, required): Path to the file to analyze

Returns: Importance score from 0.0 to 1.0, based on:
  • Number of commits (files with more commits are more important)
  • Recency of changes (recently modified files score higher)
Example:
git_score = analyzer.get_file_history_importance("src/core/engine.py")
# Returns higher score for frequently modified files

Importance Scoring Metrics

The ImportanceAnalyzer uses multiple weighted metrics to calculate importance:

1. Usage Analysis (Weight: 2.0)

Measures how frequently a component is referenced:
  • Import count for modules
  • Function call frequency
  • Class instantiation count
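
Under assumed data shapes, the usage metric can be sketched as a simple reference count. The imports dictionary below is a hypothetical structure (module name mapped to the modules it imports); the real dictionary produced by the code tree builder may differ:

```python
# Hypothetical structure: module -> list of modules it imports
imports = {
    "src.main": ["src.core.engine", "src.utils"],
    "src.api": ["src.core.engine"],
    "src.core.engine": ["src.utils"],
}

def usage_count(target: str, imports: dict) -> int:
    """Count how many modules reference (import) the target module."""
    return sum(target in deps for deps in imports.values())

print(usage_count("src.core.engine", imports))  # 2
print(usage_count("src.main", imports))         # 0
```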

2. Import Relationships (Weight: 3.0)

Analyzes inter-module dependencies using graph metrics:
  • In-degree: How many modules import this module
  • Out-degree: How many modules this imports
  • PageRank: Centrality in dependency network
  • Betweenness: Importance as a bridge between modules
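
These graph metrics can be reproduced directly with NetworkX on a toy dependency graph, where an edge A -> B means "A imports B" (module names are made up for illustration):

```python
import networkx as nx

# Toy dependency graph: an edge A -> B means "A imports B"
g = nx.DiGraph()
g.add_edges_from([
    ("src.main", "src.core.engine"),
    ("src.api", "src.core.engine"),
    ("src.core.engine", "src.utils"),
    ("src.api", "src.utils"),
])

module = "src.core.engine"
print(g.in_degree(module))   # 2: imported by src.main and src.api
print(g.out_degree(module))  # 1: imports src.utils
print(nx.pagerank(g)[module])               # centrality in the network
print(nx.betweenness_centrality(g)[module]) # bridge importance
```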

3. Code Complexity (Weight: 1.0)

Evaluates code complexity through:
  • Conditional statements (if/else)
  • Loops (for/while)
  • Exception handling (try/except)
  • Function nesting depth
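
A rough sketch of such a complexity count, using Python's ast module to tally branching constructs (an illustration, not the analyzer's exact formula):

```python
import ast

# Constructs counted as branches: conditionals, loops, exception handling
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.ExceptHandler)

def complexity_score(source: str) -> int:
    """Count branching constructs as a rough cyclomatic-style proxy."""
    tree = ast.parse(source)
    return sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))

code = """
def load(path):
    try:
        for line in open(path):
            if line.strip():
                yield line
    except OSError:
        pass
"""
print(complexity_score(code))  # 4: try, for, if, except handler
```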

4. Semantic Importance (Weight: 0.5)

Identifies important patterns in names:
  • Keywords: 'main', 'core', 'engine', 'api', 'service'
  • Entry points: '__main__', 'app', 'config'
  • Architectural patterns: 'controller', 'manager', 'handler'
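
A minimal sketch of keyword-based scoring; the keyword weights below are invented for illustration and are not the analyzer's actual values:

```python
# Hypothetical weights: higher for core components, lower for patterns
IMPORTANT_KEYWORDS = {
    "main": 1.0, "core": 1.0, "engine": 1.0,
    "api": 0.8, "service": 0.8, "controller": 0.8,
    "manager": 0.6, "handler": 0.6, "config": 0.6,
}

def semantic_score(name: str) -> float:
    """Return the highest keyword weight found in the (lowercased) name."""
    lowered = name.lower()
    return max((w for kw, w in IMPORTANT_KEYWORDS.items() if kw in lowered),
               default=0.0)

print(semantic_score("src.core.engine"))  # 1.0
print(semantic_score("utils.helpers"))    # 0.0
```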

5. Git History (Weight: 4.0)

Analyzes version control history:
  • Commit frequency
  • Last modification time
  • Change recency
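
As an illustration only (not the analyzer's actual algorithm), a score combining commit frequency with exponential recency decay might look like this; commit timestamps are assumed to be Unix epoch seconds:

```python
import math
import time

def git_history_score(commit_timestamps, now=None, half_life_days=90.0):
    """Score in [0, 1]: more commits and more recent commits score higher."""
    if not commit_timestamps:
        return 0.0
    now = now or time.time()
    # Frequency term: saturates as the commit count grows.
    freq = 1.0 - math.exp(-len(commit_timestamps) / 10.0)
    # Recency term: exponential decay from the most recent commit.
    age_days = (now - max(commit_timestamps)) / 86400.0
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    return 0.5 * freq + 0.5 * recency

now = time.time()
recent = [now - d * 86400 for d in (1, 3, 7, 30)]  # 4 recent commits
stale = [now - d * 86400 for d in (400, 500)]      # 2 old commits
print(git_history_score(recent, now) > git_history_score(stale, now))  # True
```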

Usage Example

from src.core.importance_analyzer import ImportanceAnalyzer
from src.core.tree_code import GlobalCodeTreeBuilder

# Build code tree
builder = GlobalCodeTreeBuilder("/path/to/repo")
builder.parse_repository()

# Create analyzer with custom weights
custom_weights = {
    'usage': 3.0,              # Emphasize usage patterns
    'imports_relationships': 4.0,  # Focus on architecture
    'git_history': 2.0         # Reduce git weight
}

analyzer = ImportanceAnalyzer(
    repo_path="/path/to/repo",
    modules=builder.modules,
    classes=builder.classes,
    functions=builder.functions,
    imports=builder.imports,
    code_tree=builder.code_tree,
    weights=custom_weights
)

# Analyze all modules and rank by importance
module_scores = []
for module_id, module_info in builder.modules.items():
    node = {
        'type': 'module',
        'id': module_id,
        'name': module_info['name']
    }
    score = analyzer.calculate_node_importance(node)
    module_scores.append((module_id, score))

# Sort by importance
module_scores.sort(key=lambda x: x[1], reverse=True)

# Print top 10 most important modules
print("Top 10 Most Important Modules:")
for module_id, score in module_scores[:10]:
    print(f"{score:5.2f} - {module_id}")
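
The exact way calculate_node_importance folds the per-metric scores into a 0.0-10.0 value is not documented here; one plausible scheme is a weighted average scaled to the 10-point range (combine_scores is a hypothetical helper, not part of the API):

```python
def combine_scores(metric_scores: dict, weights: dict) -> float:
    """Weighted average of per-metric scores (each in [0, 1]), scaled to 0-10."""
    total_weight = sum(weights.get(m, 0.0) for m in metric_scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(score * weights.get(metric, 0.0)
                   for metric, score in metric_scores.items())
    return 10.0 * weighted / total_weight

weights = {"usage": 2.0, "imports_relationships": 3.0,
           "complexity": 1.0, "git_history": 4.0}
scores = {"usage": 0.8, "imports_relationships": 0.6,
          "complexity": 0.4, "git_history": 0.9}
print(round(combine_scores(scores, weights), 2))  # 7.4
```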

Advanced Features

Module Dependency Graph

The analyzer automatically builds a NetworkX directed graph of module dependencies:
# Access the dependency graph
graph = analyzer.module_dependency_graph

# Analyze specific module
module_id = "src.core.engine"
in_degree = graph.in_degree(module_id)   # Modules that import this
out_degree = graph.out_degree(module_id)  # Modules this imports

print(f"{module_id} is imported by {in_degree} modules")
print(f"{module_id} imports {out_degree} modules")

Important Keywords

The analyzer recognizes semantically important keywords:
  • Core components: main, core, engine
  • API layers: api, service, controller
  • Infrastructure: manager, handler, processor
  • Patterns: factory, builder, provider, repository
  • Operations: executor, scheduler
  • Configuration: config, security
