Repository Agent

The Repository Agent provides advanced capabilities for working with code repositories. It can analyze repository structure, understand codebases, and execute tasks using existing code from GitHub or local repositories.

Overview

Implemented in src/core/agent_scheduler.py:178, the run_repository_agent() method provides a unified interface for executing tasks based on repositories. It automatically detects repository type (GitHub or local) and orchestrates the complete task execution workflow.

Key Features

Unified Interface

Single method handles both GitHub and local repositories automatically

Repository Analysis

Hierarchical analysis of repository structure and code

Autonomous Execution

Independent code exploration and task execution

Input Data Support

Process tasks with user-provided input files

Usage

Basic Usage

from src.core.agent_scheduler import RepoMasterAgent
from configs.oai_config import get_llm_config
import os

# Initialize
llm_config = get_llm_config()
code_execution_config = {
    "work_dir": "coding/workspace",
    "use_docker": False
}

agent = RepoMasterAgent(
    llm_config=llm_config,
    code_execution_config=code_execution_config
)

# Use with GitHub repository
result = agent.run_repository_agent(
    task_description="Extract text from PDF and save to txt file",
    repository="https://github.com/jsvine/pdfplumber",
    input_data='[{"path": "document.pdf", "description": "Input PDF file"}]'
)

CLI Mode

Direct Mode

# Launch Repository Agent directly
python launcher.py --mode repository_agent

# Interactive prompts
📝 Please describe your task: Extract all tables from PDF
📁 Please enter repository path or URL: https://github.com/jsvine/pdfplumber
🗂️  Do you need to provide input data files? (y/N): y
📂 Please enter data file path: /path/to/document.pdf

🔧 Processing repository task...
📋 Task result: [execution results]

Via Unified Interface

# Launch unified mode - automatically selects Repository Agent
python launcher.py --mode unified

🤖 Please describe your task:
   Use the pandas library from my local repo at /home/user/pandas
   to analyze sales data

🔧 Intelligent task analysis...
📊 Selecting optimal processing method...
[System automatically detects and uses Repository Agent]

Method Signature

run_repository_agent()

Unified interface for executing a user task against a specified repository. Source: src/core/agent_scheduler.py:178

task_description (str, required)

The task the user needs solved. Pass the complete description without omitting any information.

repository (str, required)

Repository path or URL. Can be:

GitHub repository URL: https://github.com/owner/repo
Local repository absolute path: /path/to/my/project

input_data (str, default: None)

JSON string describing local input data. Format:

'[{"path": "file.csv", "description": "Input data description"}]'

Must be provided when the task requires local files as input.

repo_type (str, default: None)

Repository type: 'github' or 'local'. Auto-detected if not specified.

Returns: str - Result of agent executing the task, including completion status and output description
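Read together, the parameters above imply a call signature along the following lines. This is a sketch reconstructed from this page, not copied from src/core/agent_scheduler.py: the parameter order, the Optional annotations, and the class name RepositoryAgentStub are assumptions made for illustration.

```python
import inspect
from typing import Optional


class RepositoryAgentStub:
    """Hypothetical stand-in used only to illustrate the documented parameters."""

    def run_repository_agent(
        self,
        task_description: str,             # required: the full task text
        repository: str,                   # required: GitHub URL or local absolute path
        input_data: Optional[str] = None,  # optional: JSON string describing input files
        repo_type: Optional[str] = None,   # optional: 'github' or 'local'; auto-detected if None
    ) -> str:
        # The real method runs the agent; this stub only echoes its arguments.
        return f"task={task_description!r} repo={repository!r} type={repo_type!r}"


# List the parameters that have no default, i.e. the required ones.
signature = inspect.signature(RepositoryAgentStub.run_repository_agent)
required = [
    name
    for name, param in signature.parameters.items()
    if name != "self" and param.default is inspect.Parameter.empty
]
print(required)
```

Calling with keyword arguments, as the examples on this page do, keeps the call independent of whatever the actual positional order turns out to be.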

Repository Type Detection

The agent automatically detects the repository type from the repository parameter (source: src/core/agent_scheduler.py:206):

# Auto-detection logic
if repository.startswith(('http://', 'https://')) and 'github.com' in repository:
    repo_type = 'github'
elif os.path.exists(repository):
    repo_type = 'local'
else:
    # Fallback: check if it looks like a URL
    if repository.startswith(('http://', 'https://')) or repository.count('/') >= 1:
        repo_type = 'github'
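To experiment with these branches outside the agent, the same checks can be wrapped in a standalone function. The sketch below mirrors the excerpt; the function name detect_repo_type and the None return for unmatched input are assumptions, since the excerpt does not show what happens when no branch applies.

```python
import os
from typing import Optional


def detect_repo_type(repository: str) -> Optional[str]:
    """Return 'github', 'local', or None, following the documented branches."""
    is_url = repository.startswith(("http://", "https://"))
    if is_url and "github.com" in repository:
        return "github"
    if os.path.exists(repository):
        return "local"
    # Fallback from the excerpt: anything URL-like or containing a slash
    # is treated as a GitHub reference.
    if is_url or repository.count("/") >= 1:
        return "github"
    # Assumption: the excerpt does not cover this case, so None is returned here.
    return None


print(detect_repo_type("https://github.com/jsvine/pdfplumber"))
print(detect_repo_type(os.getcwd()))
```

One consequence of this logic is that a mistyped local path still contains a slash, so it falls through to 'github' rather than failing as a missing local path. Passing repo_type explicitly avoids that ambiguity.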

Input Data Format

When your task requires input files, provide them in JSON format:

Single File

input_data = '[{"path": "/path/to/data.csv", "description": "Sales data"}]'

Multiple Files

input_data = '''[
    {"path": "/path/to/train.csv", "description": "Training data"},
    {"path": "/path/to/test.csv", "description": "Test data"},
    {"path": "/path/to/config.json", "description": "Configuration"}
]'''
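Hand-written JSON strings are easy to get wrong (a stray trailing comma or a single-quoted key will fail to parse). One option is to build the string from Python data with json.dumps. The helper name build_input_data below is hypothetical; only the output format comes from this page.

```python
import json
from typing import List, Tuple


def build_input_data(files: List[Tuple[str, str]]) -> str:
    """Serialize (path, description) pairs into the documented JSON array string."""
    return json.dumps(
        [{"path": path, "description": description} for path, description in files]
    )


input_data = build_input_data([
    ("/path/to/train.csv", "Training data"),
    ("/path/to/test.csv", "Test data"),
])
print(input_data)
```

The resulting string can be passed directly as the input_data argument.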

Validation

The agent validates the input data structure (source: src/core/agent_scheduler.py:232):

Must be valid JSON array
Each element must be a dictionary
Each dictionary must have path and description fields
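The three rules above can be checked before calling the agent, so that malformed input is caught early. The following is a minimal sketch, not the agent's own validator: the function name validate_input_data is hypothetical, and it raises ValueError for every failure, whereas the agent itself may raise different exception types for some of these cases.

```python
import json
from typing import Dict, List


def validate_input_data(input_data: str) -> List[Dict[str, str]]:
    """Parse input_data and enforce the three documented rules."""
    try:
        items = json.loads(input_data)
    except json.JSONDecodeError as exc:
        raise ValueError(f"input_data is not valid JSON: {exc}") from exc
    # Rule 1: the top-level value must be an array.
    if not isinstance(items, list):
        raise ValueError("input_data must be a JSON array")
    for index, item in enumerate(items):
        # Rule 2: each element must be a dictionary (JSON object).
        if not isinstance(item, dict):
            raise ValueError(f"item {index} must be a JSON object")
        # Rule 3: each dictionary needs both fields.
        for field in ("path", "description"):
            if field not in item:
                raise ValueError(f"item {index} is missing the '{field}' field")
    return items


items = validate_input_data('[{"path": "/path/to/data.csv", "description": "Sales data"}]')
print(items[0]["path"])
```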

Task Workflow

The Repository Agent runs each task through a fixed sequence of steps, listed under Implementation Details.

Implementation Details

Source: src/core/agent_scheduler.py:246

Repository Configuration: Build config based on type
Task Initialization: Call TaskManager.initialize_tasks()
Agent Execution: Run AgentRunner.run_agent() with retry logic
Result Return: Provide execution results and output descriptions
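The four steps above can be sketched end to end with stand-ins for the real components. Everything here is illustrative: build_repo_config, run_with_retries, and execute are hypothetical names, the config dictionary shape is invented, and retry_times is assumed to mean "extra attempts after the first failure", which this page does not confirm.

```python
from typing import Callable, Dict, Optional


def build_repo_config(repository: str, repo_type: str) -> Dict[str, str]:
    """Step 1 (hypothetical shape): describe the repository for later steps."""
    key = "url" if repo_type == "github" else "path"
    return {"type": repo_type, key: repository}


def run_with_retries(run_once: Callable[[], str], retry_times: int = 1) -> str:
    """Step 3 (sketch): call run_once, retrying up to retry_times after a failure."""
    last_error: Optional[Exception] = None
    for _attempt in range(retry_times + 1):
        try:
            return run_once()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
    raise RuntimeError(f"all {retry_times + 1} attempts failed") from last_error


def execute(task_description: str, repository: str, repo_type: str) -> str:
    """Chain the four steps using stand-ins for TaskManager and AgentRunner."""
    repo_config = build_repo_config(repository, repo_type)       # step 1
    task_info = {"repo": repo_config, "task": task_description}  # step 2 stand-in

    def run_once() -> str:                                       # step 3 stand-in
        return f"completed: {task_info['task']}"

    return run_with_retries(run_once, retry_times=1)             # step 4: return result


print(execute("Extract tables", "/home/user/projects/my_app", "local"))
```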

Example Use Cases

GitHub Repository

# Use a specialized library from GitHub
result = agent.run_repository_agent(
    task_description="""
    Extract all tables from a PDF file and save as CSV.
    Use the pdfplumber library functionality.
    """,
    repository="https://github.com/jsvine/pdfplumber",
    input_data='[{"path": "report.pdf", "description": "Annual report"}]'
)

Local Repository

# Analyze local codebase
result = agent.run_repository_agent(
    task_description="""
    Add comprehensive docstrings to all functions in the utils module.
    Follow Google style guide.
    """,
    repository="/home/user/projects/my_app",
    repo_type="local"
)

Data Processing

# Process data using repository code
result = agent.run_repository_agent(
    task_description="""
    Use the data processing pipeline in this repository to:
    1. Clean the input data
    2. Apply feature engineering
    3. Generate summary statistics
    """,
    repository="https://github.com/username/data-pipeline",
    input_data='''[
        {"path": "raw_data.csv", "description": "Raw sensor data"},
        {"path": "config.yaml", "description": "Processing config"}
    ]'''
)

Integration with Task Manager

The Repository Agent uses TaskManager and AgentRunner for execution (source: src/core/agent_scheduler.py:258):

# Build task configuration
args = argparse.Namespace(
    config_data={
        "repo": repo_config,
        "task_description": task_description,
        "input_data": input_data,
        "root_path": self.work_dir,
    },
    root_path='coding',
)

# Initialize and execute
task_info = TaskManager.initialize_tasks(args)
result = AgentRunner.run_agent(
    task_info, 
    retry_times=1, 
    work_dir=self.work_dir
)

Scheduler Integration

The Repository Agent is part of the RepoMaster multi-agent system and can be invoked automatically by the scheduler (source: the scheduler system message at src/core/agent_scheduler.py:19).

Automatic Selection Triggers

The scheduler automatically selects Repository Agent when:

User mentions local file paths or directories
User provides GitHub URLs or repository paths
Task requires specialized tools/libraries from repositories
User wants to find existing solutions

GitHub Repository Search

Before using a repository, you can search for relevant ones (source: src/core/agent_scheduler.py:142):

# Search for repositories (the call is awaited, so run it inside an async function)
repo_list = await agent.github_repo_search(
    task="Find Python libraries for PDF text extraction"
)

# Returns JSON with top 5 repositories:
# [
#     {
#         "repo_name": "jsvine/pdfplumber",
#         "repo_url": "https://github.com/jsvine/pdfplumber",
#         "repo_description": "Plumb a PDF for detailed information..."
#     },
#     ...
# ]
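Once the search returns, the result can be decoded and one repository URL handed to run_repository_agent(). The sketch below assumes the return value is a JSON string shaped like the commented example above; the helper names parse_repo_list and first_repo_url are hypothetical, and the sample data is made up to match that shape.

```python
import json
from typing import Dict, List, Optional


def parse_repo_list(repo_list_json: str) -> List[Dict[str, str]]:
    """Decode the search result and keep entries that carry a repo_url."""
    repos = json.loads(repo_list_json)
    return [repo for repo in repos if isinstance(repo, dict) and repo.get("repo_url")]


def first_repo_url(repo_list_json: str) -> Optional[str]:
    """Return the URL of the first usable repository, or None if there is none."""
    repos = parse_repo_list(repo_list_json)
    return repos[0]["repo_url"] if repos else None


# Sample data in the documented shape (not real search output).
sample = json.dumps([
    {
        "repo_name": "jsvine/pdfplumber",
        "repo_url": "https://github.com/jsvine/pdfplumber",
        "repo_description": "Plumb a PDF for detailed information...",
    }
])
print(first_repo_url(sample))
```

Returning None for an empty result lets the caller decide whether to refine the search or fall back to a known repository.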

Error Handling

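The agent validates the repository and the input data before execution. Wrap the call so that validation failures and execution failures can be handled separately: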

try:
    result = agent.run_repository_agent(
        task_description=task,
        repository=repo_path,
        input_data=data
    )
except ValueError as e:
    # Invalid repository path or input data format
    print(f"Validation error: {e}")
except Exception as e:
    # Execution errors
    print(f"Execution error: {e}")

Common Errors

Invalid Repository Path

ValueError: Local repository path does not exist: /invalid/path

Solution: Verify the path exists and is accessible

Invalid GitHub URL

ValueError: Invalid GitHub repository URL format: not-a-url

Solution: Use format https://github.com/owner/repo

Invalid Input Data

ValueError: input_data format error, please check input data format

Solution: Ensure input_data is valid JSON array with required fields

Missing Required Fields

AssertionError: Each data item must contain 'path' field

Solution: Include both 'path' and 'description' in each input data item

Performance Considerations

Repository Cloning: GitHub repositories are cloned on first use, which may take time for large repos. Consider using shallow clones for faster initialization.
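This page does not say whether the agent exposes a clone-depth setting. One workaround, sketched here under that assumption, is to shallow-clone the repository yourself with git's --depth option and then pass the resulting directory as a local repository. The function names and the destination path are illustrative.

```python
import subprocess
from typing import List


def shallow_clone_command(repo_url: str, destination: str) -> List[str]:
    """Build a git command that fetches only the most recent commit."""
    return ["git", "clone", "--depth", "1", repo_url, destination]


def shallow_clone(repo_url: str, destination: str) -> None:
    """Run the shallow clone; raises CalledProcessError if git exits non-zero."""
    subprocess.run(shallow_clone_command(repo_url, destination), check=True)


# Printing the command keeps this example free of network access.
print(shallow_clone_command(
    "https://github.com/jsvine/pdfplumber",
    "coding/workspace/pdfplumber",
))
```

After shallow_clone() succeeds, the destination directory can be supplied as the repository argument, with repo_type="local".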

Retry Logic: The agent runner includes retry logic (retry_times=1) to handle transient failures gracefully.

Limitations

Repository Size: Very large repositories may be slow to clone and analyze
Private Repositories: Requires appropriate authentication for private GitHub repos
Dependencies: Repository dependencies must be installable in the execution environment
Python Focus: Primarily optimized for Python repositories


Next Steps

Unified Interface

Deep Search Agent
