Overview
The Repository Analyzer module provides utilities for analyzing code repositories, generating summaries, and calculating importance scores for various code components.
Repository Summary
generate_repository_summary
Generate an intelligent summary of a code repository by analyzing important files.
def generate_repository_summary(
    code_list: list[dict],
    max_important_files_token: int = 2000
) -> dict
code_list
List containing code file information. Each element should contain:
{
    "file_path": "File path",
    "file_content": "File content"
}
max_important_files_token
Token count limit for important files to prevent summary from becoming too large
Returns: Dictionary mapping file paths to their summaries
Example:
from src.core.repo_summary import generate_repository_summary
code_files = [
    {
        "file_path": "README.md",
        "file_content": "# Project Name\n\nDescription..."
    },
    {
        "file_path": "src/main.py",
        "file_content": "def main():\n ..."
    }
]
summary = generate_repository_summary(
    code_list=code_files,
    max_important_files_token=3000
)
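To illustrate how a limit such as max_important_files_token could bound the material passed to a summarizer, here is a minimal sketch. It is not the module's implementation: `estimate_tokens` and `select_within_budget` are hypothetical helpers, and counting whitespace-separated words is only a crude stand-in for a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate (assumption): count whitespace-separated words."""
    return len(text.split())


def select_within_budget(code_list: list[dict], max_tokens: int = 2000) -> list[dict]:
    """Greedily keep files, in input order, while the running estimate fits the budget."""
    selected = []
    used = 0
    for item in code_list:
        cost = estimate_tokens(item["file_content"])
        if used + cost > max_tokens:
            continue  # this file would exceed the budget, so skip it
        selected.append(item)
        used += cost
    return selected


files = [
    {"file_path": "README.md", "file_content": "# Project Name\n\nDescription of the project"},
    {"file_path": "src/main.py", "file_content": "def main():\n    pass"},
]
kept = select_within_budget(files, max_tokens=8)
print([f["file_path"] for f in kept])
```

A greedy pass keeps the sketch short; a real selector would more likely rank files by importance before spending the budget.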
get_readme_summary
Extract and summarize key information from README and documentation files.
def get_readme_summary(
    code_content: str,
    history_summary: dict
) -> str
code_content
Content of README or documentation file
history_summary
Previous summary context to avoid duplication
Returns: Concise summary with important information preserved, using <cite> tags for referenced content
Example:
from src.core.repo_summary import get_readme_summary
readme_content = """
# MyProject
A powerful data processing framework.
## Installation
pip install myproject
## Usage
from myproject import Processor
"""
summary = get_readme_summary(readme_content, {})
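Because the returned summary marks referenced content with <cite> tags, a caller may want to pull those spans out. The sketch below assumes the tags appear as plain `<cite>...</cite>` pairs without attributes. `extract_cited_spans` is a hypothetical helper, and the sample string is invented for illustration rather than real output of get_readme_summary.

```python
import re

# Non-greedy match so that each <cite>...</cite> pair is captured separately
CITE_PATTERN = re.compile(r"<cite>(.*?)</cite>", re.DOTALL)


def extract_cited_spans(summary: str) -> list[str]:
    """Return the text inside each <cite>...</cite> pair, in order of appearance."""
    return [span.strip() for span in CITE_PATTERN.findall(summary)]


# Invented sample, shaped like the return value described above
sample_summary = (
    "MyProject is <cite>a powerful data processing framework</cite>. "
    "Install it with <cite>pip install myproject</cite>."
)
cited = extract_cited_spans(sample_summary)
print(cited)
```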
Importance Analyzer
ImportanceAnalyzer
Analyze and score the importance of code components using multiple metrics.
Constructor
ImportanceAnalyzer(
    repo_path: str,
    modules: Dict,
    classes: Dict,
    functions: Dict,
    imports: Dict,
    code_tree: Dict,
    call_graph: Optional[nx.DiGraph] = None,
    weights: Optional[Dict] = None
)
repo_path
Path to the code repository
modules
Module information dictionary from code tree builder
classes
Class information dictionary
functions
Function information dictionary
imports
Import relationships dictionary
code_tree
Complete code tree structure
call_graph
NetworkX directed graph representing function call relationships
weights
Custom weights for importance calculation metrics. Default weights:
{
    'key_component': 0.0,
    'usage': 2.0,
    'imports_relationships': 3.0,
    'complexity': 1.0,
    'semantic': 0.5,
    'documentation': 0.0,
    'git_history': 4.0,
    'size': 0.0
}
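The reference does not say whether a partial weights dictionary is merged with the defaults or replaces them outright. The sketch below shows the merge interpretation, which is an assumption. `resolve_weights` is a hypothetical helper, not part of ImportanceAnalyzer, and rejecting unknown keys is a design choice of this sketch only.

```python
from typing import Dict, Optional

# Default weights as listed above
DEFAULT_WEIGHTS: Dict[str, float] = {
    "key_component": 0.0,
    "usage": 2.0,
    "imports_relationships": 3.0,
    "complexity": 1.0,
    "semantic": 0.5,
    "documentation": 0.0,
    "git_history": 4.0,
    "size": 0.0,
}


def resolve_weights(custom: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Overlay custom weights on a copy of the defaults (assumed merge behavior)."""
    weights = dict(DEFAULT_WEIGHTS)  # copy, so the defaults are never mutated
    if custom:
        unknown = set(custom) - set(DEFAULT_WEIGHTS)
        if unknown:
            raise KeyError(f"Unknown weight keys: {sorted(unknown)}")
        weights.update(custom)
    return weights


effective = resolve_weights({"usage": 3.0, "git_history": 2.0})
print(effective)
```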
Methods
calculate_node_importance
Calculate the overall importance score of a code node.
def calculate_node_importance(node: Dict) -> float
node
Node information dictionary containing 'type', 'id', 'name', etc.
Returns: Importance score from 0.0 to 10.0
Example:
from src.core.importance_analyzer import ImportanceAnalyzer
analyzer = ImportanceAnalyzer(
    repo_path="/path/to/repo",
    modules=modules_dict,
    classes=classes_dict,
    functions=functions_dict,
    imports=imports_dict,
    code_tree=tree
)
# Calculate module importance
module_node = {
    'type': 'module',
    'id': 'src.core.engine',
    'name': 'engine'
}
score = analyzer.calculate_node_importance(module_node)
print(f"Importance score: {score}/10.0")
get_file_history_importance
Calculate importance based on Git commit history.
def get_file_history_importance(file_path: str) -> float
file_path
Path to the file to analyze
Returns: Importance score from 0.0 to 1.0 based on:
- Number of commits (files with more commits are more important)
- Recency of changes (recently modified files score higher)
Example:
git_score = analyzer.get_file_history_importance("src/core/engine.py")
# Returns higher score for frequently modified files
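The exact formula behind this score is not documented here. As one possible way to combine commit count and recency into a value between 0.0 and 1.0, consider the sketch below. `history_score` and its parameters (`max_commits`, `half_life_days`) are hypothetical, and the equal 50/50 blend, log scaling, and exponential decay are all assumptions.

```python
import math


def history_score(
    commit_count: int,
    days_since_last_change: float,
    max_commits: int = 100,
    half_life_days: float = 90.0,
) -> float:
    """Blend commit frequency and recency into a 0.0-1.0 score (illustrative only)."""
    # Frequency: log-scaled commit count, capped at 1.0 once max_commits is reached
    frequency = min(math.log1p(max(commit_count, 0)) / math.log1p(max_commits), 1.0)
    # Recency: 1.0 for a change made today, halving every half_life_days
    recency = 0.5 ** (max(days_since_last_change, 0.0) / half_life_days)
    return 0.5 * frequency + 0.5 * recency


print(history_score(commit_count=50, days_since_last_change=10))
print(history_score(commit_count=5, days_since_last_change=300))
```

In practice the two inputs could be read from `git log` for the file in question; that step is left out so the sketch stays self-contained.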
Importance Scoring Metrics
The ImportanceAnalyzer uses multiple weighted metrics to calculate importance:
1. Usage Analysis (Weight: 2.0)
Measures how frequently a component is referenced:
- Import count for modules
- Function call frequency
- Class instantiation count
2. Import Relationships (Weight: 3.0)
Analyzes inter-module dependencies using graph metrics:
- In-degree: How many modules import this module
- Out-degree: How many modules this imports
- PageRank: Centrality in dependency network
- Betweenness: Importance as a bridge between modules
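To make the graph metrics above concrete, the following sketch computes in-degree, out-degree, and a basic PageRank over a small import graph in plain Python. The analyzer itself is described as using NetworkX, so this is a stand-in, not its code. The module names and the `degree_counts` and `pagerank` helpers are hypothetical, and betweenness is omitted for brevity. Edges point from the importing module to the imported one.

```python
def degree_counts(edges: list[tuple[str, str]]) -> tuple[dict, dict]:
    """Return (in_degree, out_degree) dictionaries for (importer, imported) edges."""
    nodes = {n for edge in edges for n in edge}
    in_deg = {n: 0 for n in nodes}
    out_deg = {n: 0 for n in nodes}
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg


def pagerank(edges: list[tuple[str, str]], damping: float = 0.85, iterations: int = 50) -> dict:
    """Power-iteration PageRank; rank flows from importers to the modules they import."""
    nodes = sorted({n for edge in edges for n in edge})
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes
        dangling = sum(rank[n] for n in nodes if not successors[n])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in successors[src]:
                new_rank[dst] += damping * rank[src] / len(successors[src])
        rank = new_rank
    return rank


# Hypothetical import graph: two entry modules import core, which imports utils
edges = [
    ("app.cli", "app.core"),
    ("app.api", "app.core"),
    ("app.core", "app.utils"),
]
in_deg, out_deg = degree_counts(edges)
ranks = pagerank(edges)
print(in_deg["app.core"], out_deg["app.core"])
print(sorted(ranks, key=ranks.get, reverse=True))
```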
3. Code Complexity (Weight: 1.0)
Evaluates code complexity through:
- Conditional statements (if/else)
- Loops (for/while)
- Exception handling (try/except)
- Function nesting depth
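The constructs listed above can be counted with Python's standard `ast` module. The sketch below is one way to do it, not the analyzer's actual implementation. `count_complexity` and its result keys are hypothetical names. Note that each `elif` is represented as a nested `If` node and is therefore counted as a separate conditional.

```python
import ast


def count_complexity(source: str) -> dict:
    """Count conditionals, loops, try blocks, and the deepest function nesting."""
    counts = {"conditionals": 0, "loops": 0, "try_blocks": 0, "max_function_depth": 0}

    def visit(node: ast.AST, depth: int) -> None:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            depth += 1  # entering a function increases the nesting depth
            counts["max_function_depth"] = max(counts["max_function_depth"], depth)
        elif isinstance(node, ast.If):
            counts["conditionals"] += 1
        elif isinstance(node, (ast.For, ast.AsyncFor, ast.While)):
            counts["loops"] += 1
        elif isinstance(node, ast.Try):
            counts["try_blocks"] += 1
        for child in ast.iter_child_nodes(node):
            visit(child, depth)

    visit(ast.parse(source), 0)
    return counts


sample = '''
def outer(items):
    def inner(x):
        try:
            return int(x)
        except ValueError:
            return 0
    total = 0
    for item in items:
        if item:
            total += inner(item)
    return total
'''
result = count_complexity(sample)
print(result)
```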
4. Semantic Importance (Weight: 0.5)
Identifies important patterns in names:
- Keywords: 'main', 'core', 'engine', 'api', 'service'
- Entry points: '__main__', 'app', 'config'
- Architectural patterns: 'controller', 'manager', 'handler'
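Name-based matching of this kind can be sketched as follows. The keyword set is taken from the list above (with `__main__` reduced to the token `main` by the tokenizer). The `name_tokens` and `semantic_score` helpers and the rule that two hits saturate the score at 1.0 are assumptions made for illustration.

```python
import re

# Keywords from the list above; '__main__' tokenizes to 'main'
IMPORTANT_KEYWORDS = {
    "main", "core", "engine", "api", "service",
    "app", "config",
    "controller", "manager", "handler",
}


def name_tokens(name: str) -> list[str]:
    """Split a dotted, snake_case, or CamelCase name into lowercase tokens."""
    # Add a space at lower-to-upper boundaries so 'RequestHandler' splits in two
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def semantic_score(name: str) -> float:
    """Score 0.0-1.0 by distinct keyword hits; two or more hits give 1.0 (assumption)."""
    hits = set(name_tokens(name)) & IMPORTANT_KEYWORDS
    return min(len(hits) / 2.0, 1.0)


for candidate in ["src.core.engine", "RequestHandler", "__main__", "utils.strings"]:
    print(candidate, semantic_score(candidate))
```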
5. Git History (Weight: 4.0)
Analyzes version control history:
- Commit frequency
- Last modification time
- Change recency
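How the five metrics are folded into a single 0.0 to 10.0 score is not spelled out in this reference. One plausible reading is a weighted average rescaled to 10, sketched below under that assumption. `combine_scores` is a hypothetical function, each per-metric score is assumed to lie between 0.0 and 1.0, and the sample metric values are invented.

```python
def combine_scores(metric_scores: dict, weights: dict, scale: float = 10.0) -> float:
    """Weighted average of per-metric scores (each clamped to 0-1), rescaled to 0-scale."""
    active = {name: w for name, w in weights.items() if w > 0}
    total_weight = sum(active.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(
        w * min(max(metric_scores.get(name, 0.0), 0.0), 1.0)
        for name, w in active.items()
    )
    return scale * weighted / total_weight


# Weights of the five metrics described above
weights = {
    "usage": 2.0,
    "imports_relationships": 3.0,
    "complexity": 1.0,
    "semantic": 0.5,
    "git_history": 4.0,
}
# Invented per-metric scores for a single module
metric_scores = {
    "usage": 0.5,
    "imports_relationships": 1.0,
    "complexity": 0.0,
    "semantic": 1.0,
    "git_history": 0.25,
}
score = combine_scores(metric_scores, weights)
print(f"{score:.2f}")
```

Dividing by the sum of active weights keeps the result within the documented range regardless of how the weights are tuned.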
Usage Example
from src.core.importance_analyzer import ImportanceAnalyzer
from src.core.tree_code import GlobalCodeTreeBuilder
# Build code tree
builder = GlobalCodeTreeBuilder("/path/to/repo")
builder.parse_repository()
# Create analyzer with custom weights
custom_weights = {
    'usage': 3.0,                  # Emphasize usage patterns
    'imports_relationships': 4.0,  # Focus on architecture
    'git_history': 2.0             # Reduce git weight
}
analyzer = ImportanceAnalyzer(
    repo_path="/path/to/repo",
    modules=builder.modules,
    classes=builder.classes,
    functions=builder.functions,
    imports=builder.imports,
    code_tree=builder.code_tree,
    weights=custom_weights
)
# Analyze all modules and rank by importance
module_scores = []
for module_id, module_info in builder.modules.items():
    node = {
        'type': 'module',
        'id': module_id,
        'name': module_info['name']
    }
    score = analyzer.calculate_node_importance(node)
    module_scores.append((module_id, score))
# Sort by importance
module_scores.sort(key=lambda x: x[1], reverse=True)
# Print top 10 most important modules
print("Top 10 Most Important Modules:")
for module_id, score in module_scores[:10]:
    print(f"{score:5.2f} - {module_id}")
Advanced Features
Module Dependency Graph
The analyzer automatically builds a NetworkX directed graph of module dependencies:
# Access the dependency graph
graph = analyzer.module_dependency_graph
# Analyze specific module
module_id = "src.core.engine"
in_degree = graph.in_degree(module_id) # Modules that import this
out_degree = graph.out_degree(module_id) # Modules this imports
print(f"{module_id} is imported by {in_degree} modules")
print(f"{module_id} imports {out_degree} modules")
Important Keywords
The analyzer recognizes semantically important keywords:
- Core components: main, core, engine
- API layers: api, service, controller
- Infrastructure: manager, handler, processor
- Patterns: factory, builder, provider, repository
- Operations: executor, scheduler
- Configuration: config, security