Analyze and execute tasks using GitHub or local repositories
The Repository Agent provides advanced capabilities for working with code repositories. It can analyze repository structure, understand codebases, and execute tasks using existing code from GitHub or local repositories.
Implemented in src/core/agent_scheduler.py:178, the run_repository_agent() method provides a unified interface for executing tasks based on repositories. It automatically detects repository type (GitHub or local) and orchestrates the complete task execution workflow.
from src.core.agent_scheduler import RepoMasterAgent
from configs.oai_config import get_llm_config
import os

# Initialize
llm_config = get_llm_config()
code_execution_config = {
    "work_dir": "coding/workspace",
    "use_docker": False
}
agent = RepoMasterAgent(
    llm_config=llm_config,
    code_execution_config=code_execution_config
)

# Use with GitHub repository
result = agent.run_repository_agent(
    task_description="Extract text from PDF and save to txt file",
    repository="https://github.com/jsvine/pdfplumber",
    input_data='[{"path": "document.pdf", "description": "Input PDF file"}]'
)
# Launch Repository Agent directly
python launcher.py --mode repository_agent

# Interactive prompts
📝 Please describe your task: Extract all tables from PDF
📁 Please enter repository path or URL: https://github.com/jsvine/pdfplumber
🗂️ Do you need to provide input data files? (y/N): y
📂 Please enter data file path: /path/to/document.pdf
🔧 Processing repository task...
📋 Task result: [execution results]
# Launch unified mode - automatically selects Repository Agent
python launcher.py --mode unified

🤖 Please describe your task: Use the pandas library from my local repo at /home/user/pandas to analyze sales data
🔧 Intelligent task analysis...
📊 Selecting optimal processing method...
[System automatically detects and uses Repository Agent]
The agent automatically detects the repository type from the repository parameter.
Source: src/core/agent_scheduler.py:206
# Auto-detection logic
if repository.startswith(('http://', 'https://')) and 'github.com' in repository:
    repo_type = 'github'
elif os.path.exists(repository):
    repo_type = 'local'
else:
    # Fallback: check if it looks like a URL
    if repository.startswith(('http://', 'https://')) or repository.count('/') >= 1:
        repo_type = 'github'
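To experiment with these rules outside the agent, they can be wrapped in a small standalone function. The following is a sketch that mirrors the excerpt, not the code in agent_scheduler.py itself; the name `detect_repo_type` and the `None` return for input that matches no rule are assumptions, since the excerpt does not show what happens in that case.

```python
import os
from typing import Optional


def detect_repo_type(repository: str) -> Optional[str]:
    """Classify a repository reference as 'github' or 'local'.

    Hypothetical helper that mirrors the auto-detection excerpt.
    """
    is_url = repository.startswith(("http://", "https://"))
    if is_url and "github.com" in repository:
        return "github"
    if os.path.exists(repository):
        return "local"
    # Fallback: anything that looks like a URL or contains a slash
    if is_url or repository.count("/") >= 1:
        return "github"
    # Assumption: the excerpt does not cover this case, so signal "unknown"
    return None
```

Under these rules a mistyped local path that contains a slash falls through to 'github', so passing repo_type explicitly, as the local codebase example does, may be the safer choice for local paths.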
# Use a specialized library from GitHub
result = agent.run_repository_agent(
    task_description="""
    Extract all tables from a PDF file and save as CSV.
    Use the pdfplumber library functionality.
    """,
    repository="https://github.com/jsvine/pdfplumber",
    input_data='[{"path": "report.pdf", "description": "Annual report"}]'
)
# Analyze local codebase
result = agent.run_repository_agent(
    task_description="""
    Add comprehensive docstrings to all functions in the utils module.
    Follow Google style guide.
    """,
    repository="/home/user/projects/my_app",
    repo_type="local"
)
# Process data using repository code
result = agent.run_repository_agent(
    task_description="""
    Use the data processing pipeline in this repository to:
    1. Clean the input data
    2. Apply feature engineering
    3. Generate summary statistics
    """,
    repository="https://github.com/username/data-pipeline",
    input_data='''[
        {"path": "raw_data.csv", "description": "Raw sensor data"},
        {"path": "config.yaml", "description": "Processing config"}
    ]'''
)
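In the examples above, input_data is passed as a JSON string containing a list of objects with "path" and "description" keys. Writing that string by hand is error-prone, so one option is to build it with `json.dumps`. The helper name `build_input_data` is hypothetical, and treating both keys as required is an assumption based only on the examples shown here.

```python
import json
from typing import Dict, List


def build_input_data(files: List[Dict[str, str]]) -> str:
    """Serialize file descriptors into the JSON string form used above."""
    for entry in files:
        # Assumption: every entry carries both keys, as in the examples
        missing = {"path", "description"} - entry.keys()
        if missing:
            raise ValueError(f"input entry is missing keys: {sorted(missing)}")
    return json.dumps(files)


input_data = build_input_data([
    {"path": "raw_data.csv", "description": "Raw sensor data"},
    {"path": "config.yaml", "description": "Processing config"},
])
```

The resulting string can then be passed as the input_data argument of run_repository_agent().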
The Repository Agent is part of the RepoMaster multi-agent system and can be invoked automatically.
Source: src/core/agent_scheduler.py:19 (scheduler system message)
Before using repositories, you can search for relevant ones.
Source: src/core/agent_scheduler.py:142
# Search for repositories
repo_list = await agent.github_repo_search(
    task="Find Python libraries for PDF text extraction"
)

# Returns JSON with top 5 repositories:
# [
#   {
#     "repo_name": "jsvine/pdfplumber",
#     "repo_url": "https://github.com/jsvine/pdfplumber",
#     "repo_description": "Plumb a PDF for detailed information..."
#   },
#   ...
# ]
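To feed a search result into run_repository_agent(), the repository URLs have to be pulled out of the returned JSON. The sketch below assumes the result arrives as a JSON string with the shape shown in the comment above; if the method returns an already parsed list, the `json.loads` step would be unnecessary. The helper name `repo_urls` is hypothetical.

```python
import json
from typing import List


def repo_urls(search_result: str) -> List[str]:
    """Extract the repo_url values from a search result JSON string."""
    repos = json.loads(search_result)
    # Skip entries without a URL instead of failing on them
    return [repo["repo_url"] for repo in repos if "repo_url" in repo]


# Sample payload copied from the documented return shape
sample = json.dumps([
    {
        "repo_name": "jsvine/pdfplumber",
        "repo_url": "https://github.com/jsvine/pdfplumber",
        "repo_description": "Plumb a PDF for detailed information...",
    }
])
urls = repo_urls(sample)
```

The first entry of `urls` could then be passed as the repository argument.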
Repository Cloning: GitHub repositories are cloned on first use, which may take time for large repos. Consider using shallow clones for faster initialization.
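As an illustration of the shallow-clone suggestion, a repository can be cloned ahead of time with git's `--depth` option and then passed to the agent as a local path. This is a sketch of a manual workaround, not a description of how RepoMaster clones repositories; the function names are hypothetical.

```python
import subprocess
from typing import List


def shallow_clone_command(repo_url: str, dest: str, depth: int = 1) -> List[str]:
    """Build a git command that fetches only the most recent commits."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return ["git", "clone", "--depth", str(depth), repo_url, dest]


def shallow_clone(repo_url: str, dest: str, depth: int = 1) -> None:
    """Run the clone; raises CalledProcessError if git exits non-zero."""
    subprocess.run(shallow_clone_command(repo_url, dest, depth), check=True)
```

After `shallow_clone(...)` succeeds, the destination directory can be used as the repository argument with repo_type="local".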
Retry Logic: The agent runner includes retry logic (retry_times=1) to handle transient failures gracefully.
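The general pattern behind such retry logic can be sketched as follows. This is not the agent runner's implementation; the helper `run_with_retry` is hypothetical, and it assumes that retry_times counts the extra attempts made after the first failure, which the source does not confirm.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retry(fn: Callable[[], T], retry_times: int = 1) -> T:
    """Call fn, retrying up to retry_times more times if it raises."""
    if retry_times < 0:
        raise ValueError("retry_times must be non-negative")
    last_error: Optional[Exception] = None
    for _ in range(retry_times + 1):
        try:
            return fn()
        except Exception as exc:
            # Remember the failure and try again if attempts remain
            last_error = exc
    assert last_error is not None
    raise last_error
```

With retry_times=1, a task that fails once and then succeeds returns normally, while a task that fails twice re-raises the second error.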