The Repository Agent provides advanced capabilities for working with code repositories. It can analyze repository structure, understand codebases, and execute tasks using existing code from GitHub or local repositories.

Overview

Implemented in src/core/agent_scheduler.py:178, the RepoMasterAgent.run_repository_agent() method is a single entry point for running a task against a repository. It automatically detects the repository type (GitHub or local) and orchestrates the complete task execution workflow.

Key Features

Unified Interface

A single method handles both GitHub and local repositories automatically

Repository Analysis

Hierarchical analysis of repository structure and code

Autonomous Execution

Independent code exploration and task execution

Input Data Support

Process tasks with user-provided input files

Usage

Basic Usage

from src.core.agent_scheduler import RepoMasterAgent
from configs.oai_config import get_llm_config
import os

# Initialize
llm_config = get_llm_config()
code_execution_config = {
    "work_dir": "coding/workspace",
    "use_docker": False
}

agent = RepoMasterAgent(
    llm_config=llm_config,
    code_execution_config=code_execution_config
)

# Use with GitHub repository
result = agent.run_repository_agent(
    task_description="Extract text from PDF and save to txt file",
    repository="https://github.com/jsvine/pdfplumber",
    input_data='[{"path": "document.pdf", "description": "Input PDF file"}]'
)

CLI Mode

# Launch Repository Agent directly
python launcher.py --mode repository_agent

# Interactive prompts
📝 Please describe your task: Extract all tables from PDF
📁 Please enter repository path or URL: https://github.com/jsvine/pdfplumber
🗂️  Do you need to provide input data files? (y/N): y
📂 Please enter data file path: /path/to/document.pdf

🔧 Processing repository task...
📋 Task result: [execution results]

Method Signature

run_repository_agent()

Unified interface for executing a user task based on the specified repository. Source: src/core/agent_scheduler.py:178

Parameters
  • task_description (str, required): Description of the task the user needs solved. Keep it complete; do not omit any information.
  • repository (str, required): Repository path or URL. Either a GitHub repository URL (https://github.com/owner/repo) or an absolute path to a local repository (/path/to/my/project).
  • input_data (str, optional, default None): JSON string describing local input data, in the format '[{"path": "file.csv", "description": "Input data description"}]'. Must be provided when the task requires local files as input.
  • repo_type (str, optional, default None): Repository type, 'github' or 'local'. Auto-detected if not specified.

Returns: str. The result of the agent executing the task, including completion status and an output description.

Repository Type Detection

When repo_type is not given, the agent detects the repository type from the repository parameter (source: src/core/agent_scheduler.py:206):
import os

# Auto-detection logic (excerpt), used only when repo_type is not passed explicitly
if repo_type is None:
    if repository.startswith(('http://', 'https://')) and 'github.com' in repository:
        repo_type = 'github'
    elif os.path.exists(repository):
        repo_type = 'local'
    elif repository.startswith(('http://', 'https://')) or repository.count('/') >= 1:
        # Fallback: any other URL, or any string containing a slash, is treated as GitHub
        repo_type = 'github'
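The same rules can be tried outside the agent. The following is a minimal, self-contained sketch; the helper name `detect_repo_type` is hypothetical, and returning `None` when no rule matches is an assumption (the excerpt above simply leaves `repo_type` unset in that case).

```python
import os
from typing import Optional


def detect_repo_type(repository: str) -> Optional[str]:
    """Hypothetical helper mirroring the auto-detection rules above.

    Returns 'github', 'local', or None when no rule matches (assumed behavior).
    """
    is_url = repository.startswith(('http://', 'https://'))
    # Rule 1: an http(s) URL that points at github.com
    if is_url and 'github.com' in repository:
        return 'github'
    # Rule 2: a path that exists on the local filesystem
    if os.path.exists(repository):
        return 'local'
    # Rule 3 (fallback): any other URL, or any string containing a slash
    if is_url or repository.count('/') >= 1:
        return 'github'
    return None


if __name__ == '__main__':
    print(detect_repo_type('https://github.com/jsvine/pdfplumber'))  # github
    print(detect_repo_type(os.getcwd()))  # local
```

Note that the fallback classifies any non-existent path containing a slash, such as a mistyped local path like /path/to/my/projekt, as a GitHub repository. Passing repo_type explicitly avoids that ambiguity.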

Input Data Format

When your task requires input files, provide them in JSON format:

Single File

input_data = '[{"path": "/path/to/data.csv", "description": "Sales data"}]'

Multiple Files

input_data = '''[
    {"path": "/path/to/train.csv", "description": "Training data"},
    {"path": "/path/to/test.csv", "description": "Test data"},
    {"path": "/path/to/config.json", "description": "Configuration"}
]'''
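Writing the JSON string by hand is error-prone (for example, Windows paths need escaped backslashes). A minimal sketch that builds the same string with the standard json module; the helper name `build_input_data` is hypothetical and not part of RepoMaster.

```python
import json
from typing import List, Tuple


def build_input_data(files: List[Tuple[str, str]]) -> str:
    """Hypothetical helper: turn (path, description) pairs into the input_data JSON string."""
    items = [{"path": path, "description": description} for path, description in files]
    # json.dumps handles quoting and escaping (e.g. backslashes in Windows paths)
    return json.dumps(items)


input_data = build_input_data([
    ("/path/to/train.csv", "Training data"),
    ("/path/to/test.csv", "Test data"),
    ("/path/to/config.json", "Configuration"),
])
print(input_data)
```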

Validation

The agent validates the structure of input_data (source: src/core/agent_scheduler.py:232):
  • It must be a valid JSON array
  • Each element must be a dictionary (JSON object)
  • Each dictionary must have path and description fields
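A minimal sketch of these three checks, assuming the agent raises ValueError for unparseable input and uses assertions for the per-item checks, as the messages listed under Common Errors suggest. The function name `validate_input_data` and the "must be a dictionary" message are hypothetical; the actual implementation may differ.

```python
import json
from typing import Any, Dict, List

FORMAT_ERROR = "input_data format error, please check input data format"


def validate_input_data(input_data: str) -> List[Dict[str, Any]]:
    """Hypothetical validator for the input_data JSON string."""
    try:
        items = json.loads(input_data)
    except json.JSONDecodeError as exc:
        raise ValueError(FORMAT_ERROR) from exc
    # Check 1: the top level must be a JSON array
    if not isinstance(items, list):
        raise ValueError(FORMAT_ERROR)
    for item in items:
        # Check 2: each element must be a dictionary
        assert isinstance(item, dict), "Each data item must be a dictionary"
        # Check 3: each dictionary needs both required fields
        assert "path" in item, "Each data item must contain 'path' field"
        assert "description" in item, "Each data item must contain 'description' field"
    return items
```

Because Python strips assert statements when run with -O, a production validator would more safely raise ValueError for the per-item checks as well.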

Task Workflow

The Repository Agent runs each task through four stages, from repository configuration to returning the result.

Implementation Details

Source: src/core/agent_scheduler.py:246
  1. Repository Configuration: Build config based on type
  2. Task Initialization: Call TaskManager.initialize_tasks()
  3. Agent Execution: Run AgentRunner.run_agent() with retry logic
  4. Result Return: Provide execution results and output descriptions

Example Use Cases

# Use a specialized library from GitHub
result = agent.run_repository_agent(
    task_description="""
    Extract all tables from a PDF file and save as CSV.
    Use the pdfplumber library functionality.
    """,
    repository="https://github.com/jsvine/pdfplumber",
    input_data='[{"path": "report.pdf", "description": "Annual report"}]'
)

Integration with Task Manager

The Repository Agent delegates execution to TaskManager and AgentRunner (source: src/core/agent_scheduler.py:258). The excerpt below runs inside run_repository_agent(), where repo_config comes from the repository configuration stage:
import argparse

# Build task configuration (excerpt from inside the method; self is the agent)
args = argparse.Namespace(
    config_data={
        "repo": repo_config,
        "task_description": task_description,
        "input_data": input_data,
        "root_path": self.work_dir,
    },
    root_path='coding',
)

# Initialize and execute
task_info = TaskManager.initialize_tasks(args)
result = AgentRunner.run_agent(
    task_info,
    retry_times=1,
    work_dir=self.work_dir
)

Scheduler Integration

The Repository Agent is part of the RepoMaster multi-agent system, and the scheduler can invoke it automatically (source: src/core/agent_scheduler.py:19, scheduler system message).

Automatic Selection Triggers

The scheduler automatically selects Repository Agent when:
  • User mentions local file paths or directories
  • User provides GitHub URLs or repository paths
  • Task requires specialized tools/libraries from repositories
  • User wants to find existing solutions

Searching for Repositories

Before using a repository, you can search GitHub for relevant ones (source: src/core/agent_scheduler.py:142):
# Search for repositories; github_repo_search() is awaited, so call it inside an async function
repo_list = await agent.github_repo_search(
    task="Find Python libraries for PDF text extraction"
)

# Returns JSON with top 5 repositories:
# [
#     {
#         "repo_name": "jsvine/pdfplumber",
#         "repo_url": "https://github.com/jsvine/pdfplumber",
#         "repo_description": "Plumb a PDF for detailed information..."
#     },
#     ...
# ]
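The search result can then feed run_repository_agent(). A minimal sketch, assuming github_repo_search() returns a JSON string shaped like the comment above; the helper `top_repo_urls` is hypothetical and also accepts an already-parsed list in case the return value is not a string.

```python
import json
from typing import Any, List, Union


def top_repo_urls(repo_list: Union[str, List[Any]], limit: int = 5) -> List[str]:
    """Hypothetical helper: extract repo_url values from a search result."""
    entries = json.loads(repo_list) if isinstance(repo_list, str) else repo_list
    # Keep at most `limit` entries and skip any without a repo_url
    return [entry["repo_url"] for entry in entries[:limit] if "repo_url" in entry]


# Sample shaped like the documented output (single entry)
sample = json.dumps([
    {
        "repo_name": "jsvine/pdfplumber",
        "repo_url": "https://github.com/jsvine/pdfplumber",
        "repo_description": "Plumb a PDF for detailed information...",
    }
])
print(top_repo_urls(sample))  # ['https://github.com/jsvine/pdfplumber']
```

The first URL could then be passed as the repository argument of agent.run_repository_agent().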

Error Handling

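run_repository_agent() validates its inputs before execution. An invalid repository path or input data format raises ValueError, and a malformed input data item can raise AssertionError:
try:
    result = agent.run_repository_agent(
        task_description=task,
        repository=repo_path,
        input_data=data
    )
except (ValueError, AssertionError) as e:
    # Invalid repository path or input data format
    print(f"Validation error: {e}")
except Exception as e:
    # Errors raised during task execution
    print(f"Execution error: {e}")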

Common Errors

ValueError: Local repository path does not exist: /invalid/path
Solution: Verify the path exists and is accessible
ValueError: Invalid GitHub repository URL format: not-a-url
Solution: Use format https://github.com/owner/repo
ValueError: input_data format error, please check input data format
Solution: Ensure input_data is a valid JSON array with the required fields
AssertionError: Each data item must contain 'path' field
Solution: Include both 'path' and 'description' in each input data item

Performance Considerations

Repository Cloning: GitHub repositories are cloned on first use, which can take a while for large repositories. Consider a shallow clone (git clone --depth 1, which fetches only the latest commit) for faster initialization.
Retry Logic: The agent runner includes retry logic (retry_times=1) to handle transient failures gracefully.
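One way to apply the shallow-clone suggestion, assuming git is installed: clone the repository yourself with git clone --depth 1 and pass the resulting absolute path to run_repository_agent() as a local repository. The helper names below are hypothetical; this is not a RepoMaster option.

```python
import subprocess
from typing import List


def shallow_clone_cmd(repo_url: str, dest: str) -> List[str]:
    """Build a git command that fetches only the latest commit (--depth 1)."""
    return ["git", "clone", "--depth", "1", repo_url, dest]


def shallow_clone(repo_url: str, dest: str) -> str:
    """Run the shallow clone and return the local path.

    Raises subprocess.CalledProcessError if git exits with a non-zero status.
    """
    subprocess.run(shallow_clone_cmd(repo_url, dest), check=True)
    return dest


# Example usage (requires network access and an initialized agent):
# local_path = shallow_clone("https://github.com/jsvine/pdfplumber", "/tmp/pdfplumber")
# result = agent.run_repository_agent(
#     task_description="Extract all tables from a PDF file and save as CSV",
#     repository=local_path,
#     repo_type="local",
# )
```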

Limitations

  • Repository Size: Very large repositories may be slow to clone and analyze
  • Private Repositories: Requires appropriate authentication for private GitHub repos
  • Dependencies: Repository dependencies must be installable in the execution environment
  • Python Focus: Primarily optimized for Python repositories

Next Steps

Unified Interface

Learn about intelligent multi-agent orchestration

Deep Search Agent

Search for repositories and information
