Skip to main content

Overview

RepoMaster’s task routing system automatically analyzes user requests and selects the optimal execution strategy. The scheduler agent acts as an intelligent orchestrator, choosing between web search, repository exploration, or general programming assistance.

Routing Architecture

Mode Selection Strategy

The scheduler agent analyzes tasks using a prioritized decision process defined in agent_scheduler.py:19-71.

1. Web Search Priority

Trigger Conditions:
  • Questions requiring real-time data
  • Current events or latest information
  • External documentation lookup
  • General knowledge queries
Implementation (agent_scheduler.py:127-140):
async def web_search(
    self, 
    query: Annotated[str, "Query for general web search"]
) -> str:
    """Perform general web search for real-time information"""
    return await self.repo_searcher.deep_search(query)
Example Queries:
  • “What is the stock price of APPLE?”
  • “Latest Python 3.12 features”
  • “How to use the new React hooks API?“

2. Repository Mode

Trigger Conditions:
  • User mentions local file paths or directories
  • Specific repository URLs provided
  • Task requires analyzing existing code
  • Need for specialized tools/libraries
Repository Type Detection (agent_scheduler.py:206-229):
# Automatically detect repository type
if repo_type is None:
    if repository.startswith(('http://', 'https://')) and 'github.com' in repository:
        repo_type = 'github'
    elif os.path.exists(repository):
        repo_type = 'local'
    else:
        # Try to determine if it's a GitHub URL format
        if repository.startswith(('http://', 'https://')) or repository.count('/') >= 1:
            repo_type = 'github'
        else:
            raise ValueError(f"Unable to determine repository type: {repository}")
Process:
  1. Validate GitHub URL format
  2. Build repository configuration
  3. Initialize TaskManager with GitHub URL
  4. Clone repository to working directory
  5. Execute task with CodeExplorer
Configuration:
repo_config = {
    "type": "github",
    "url": repository
}
Process:
  1. Validate local path exists
  2. Build repository configuration
  3. Initialize TaskManager with local path
  4. Copy repository to working directory
  5. Execute task with CodeExplorer
Configuration:
repo_config = {
    "type": "local",
    "path": repository
}

3. General Code Assistant Mode

Trigger Conditions:
  • General programming questions
  • Algorithm implementations
  • Code examples without specific repository
  • Debugging help
Implementation (agent_scheduler.py:273-332):
def run_general_code_assistant(
    self,
    task_description: Annotated[str, "Programming task"],
    work_directory: Annotated[Optional[str], "Working directory"] = None
):
    """Provide general programming assistance"""
    explorer = CodeExplorer(
        local_repo_path=None,  # No repository
        work_dir=work_dir,
        task_type="general",
        use_venv=True,
        is_cleanup_venv=False
    )
    
    result = asyncio.run(explorer.a_code_analysis(enhanced_task, max_turns=20))
    return result

Task Initialization Flow

TaskManager Initialization

The TaskManager class handles task environment setup (git_task.py:159-256). Key Method: initialize_tasks(args) (git_task.py:236-256):
@staticmethod
def initialize_tasks(args, root_path='coding'):
    """Initialize task environment and task list"""
    # Create unique working directory
    work_task_path = PathManager.create_unique_path(root_path)
    
    # Extract task configuration
    task_info = args.config_data
    task_id = "repo_master"
    
    # Build output configuration
    out_task_info = {
        'repo': task_info['repo'],
        'task_description': task_info['task_description'],
        'task_prompt': task_info.get('task_prompt', 
                                     TaskManager.get_task_prompt()),
        'input_data': task_info['input_data'],
        'parameters': task_info.get('parameters', {}),
        'root_path': root_path,
        'work_task_path': work_task_path,
        'task_id': task_id
    }
    
    return out_task_info

Task Prompt Generation

Template Structure (git_task.py:171-191):
def get_task_prompt():
    return """### Task Description
{task_description}

#### Repository Path (Absolute Path): 
Understanding Guide: ['Read README.md to understand basic functionality']

#### File Paths
- Input file paths and descriptions:
{input_data}

- Output file directory: 
Results must be saved in the {output_dir_path} directory.

#### Additional Notes
**Core Objective**: Quickly understand and analyze the code repository, 
generate and execute necessary code to complete user-specified tasks.
"""
Placeholder Replacement (git_task.py:210-222):
placeholders = {
    '{repo_path}': target_repo_path,
    '{input_data}': target_input_data,
    '{output_dir_path}': target_output_path,
    '{task_description}': task_info.get('task_description', '')
}
for placeholder, value in placeholders.items():
    task_desc = task_desc.replace(placeholder, value)

Environment Setup

The DataProcessor class handles environment preparation (git_task.py:62-157).

Repository Setup

GitHub Repository (git_task.py:116-124):
elif repo_type == 'github':
    repo_url = repo_info['url']
    repo_name = repo_url.split('/')[-1].replace('.git', '')
    target_repo_path = f"{work_dir}/{repo_name}"
    
    if not os.path.exists(target_repo_path):
        clone_cmd = f"git clone {repo_url} {target_repo_path}"
        subprocess.run(clone_cmd, shell=True, check=True)
Local Repository (git_task.py:108-114):
if repo_type == 'local':
    source_repo_path = repo_info['path']
    repo_name = Path(source_repo_path).name
    target_repo_path = f"{work_dir}/{repo_name}"
    
    if not os.path.exists(target_repo_path):
        os.system(f"cp -a {source_repo_path} {target_repo_path}")

Input Data Handling

Data Copy Process (git_task.py:136-156):
data_info = task_info.get('input_data', [])
try:
    new_data_info = []
    for data in data_info:
        data_path = data.get('path')
        data_desc = data.get('description')
        if data_path and os.path.exists(data_path):
            new_data_path = DataProcessor.copy_dataset(
                data_path, target_input_path
            )
            new_data_info.append({
                'path': new_data_path,
                'description': data_desc
            })
except Exception as e:
    print(f"Error occurred while copying dataset: {e}")
Input Data Format: Provide input data as JSON list:
[
  {
    "path": "/path/to/input.pdf",
    "description": "PDF file to process"
  }
]

Agent Execution Flow

Repository-First Approach (agent_scheduler.py:48-54):
1. Repository Search: Use github_repo_search to find relevant repositories
2. Sequential Execution: Select most promising repository
3. Result Evaluation: Critically evaluate if result satisfies requirements
4. Switching: If failed, select next best repository and retry
5. Continue: Until success or all options exhausted
Execution Logic:
# 1. Search for repositories
repo_list = github_repo_search(task)

# 2. Try each repository sequentially
for repo in repo_list:
    result = run_repository_agent(
        task_description=task,
        repository=repo['repo_url']
    )
    
    # 3. Evaluate result
    if result_satisfies_requirements(result):
        return result
    
    # 4. Try next repository
    continue

AgentRunner Execution

Main Execution Method (git_task.py:262-323):
@staticmethod
def run_agent(task_info, retry_times=2, work_dir=None):
    """Run Code Agent to execute tasks"""
    task_id = task_info['task_id']
    work_task_path = task_info['work_task_path']
    
    # Create working directory
    work_dir = work_dir if work_dir else f'{work_task_path}/{task_id}/workspace'
    os.makedirs(work_dir, exist_ok=True)
    
    # Prepare environment
    target_output_path, target_input_data, target_repo_path = \
        DataProcessor.setup_task_environment(task_info, work_dir)
    
    # Generate task description
    task = TaskManager.prepare_task_description(
        task_info, target_output_path, target_input_data, target_repo_path
    )
    
    # Run code agent
    explorer = CodeExplorer(
        target_repo_path,
        work_dir=work_dir,
        task_type="gitbench",
        use_venv=True,
        is_cleanup_venv=False
    )
    
    answer = asyncio.run(explorer.a_code_analysis(task, max_turns=20))
    
    # Retry logic
    if not os.path.exists(target_output_path) and retry_times > 0:
        return AgentRunner.run_agent(task_info, retry_times - 1)
    
    return answer

Path Management

The PathManager class provides utilities for directory and ID generation (git_task.py:20-60).

Unique Path Generation

Task ID Generation (git_task.py:24-27):
@staticmethod
def generate_task_id():
    """Generate random task ID"""
    date_str = datetime.now().strftime("%m%d_%H%M")
    return f'gitbench_{date_str}'
Unique Directory Creation (git_task.py:38-44):
@staticmethod
def create_unique_dir(base_path, prefix):
    """Create a unique directory that doesn't exist"""
    path = f'{base_path}/{prefix}'
    while os.path.exists(path):
        path = f'{path}_{random.randint(1, 10)}'
    os.makedirs(path, exist_ok=True)
    return path

Work Task Path

coding/gitbench_0304_1423/ - Root task directory

Workspace

workspace/ - Code execution environment

Input Dataset

input_dataset/ - User-provided input files

Output Result

output_result/ - Task execution outputs

Retry and Fallback Mechanisms

Automatic Retry

Retry Logic (git_task.py:310-312):
if not os.path.exists(target_output_path) and retry_times > 0:
    print(f"---Task {task_id} submission failed, retrying {retry_times} times---")
    return AgentRunner.run_agent(task_info, retry_times - 1)
Default Retry Configuration:
  • Initial retry count: 2
  • Triggers: Missing output directory
  • Action: Re-execute entire agent workflow

Mode Fallback

From scheduler system message (agent_scheduler.py:56-60):
3. Sequential Execution and Fallback:
   - If one approach doesn't yield a solution, consider trying an 
     alternative mode (e.g., switching to General Code Assistant Mode)
   - Be persistent in finding a solution
   - If one repository doesn't work, try another

Task Tracing and Debugging

Conversation History Export

Trace File Generation (agent_code_explore.py:309-315):
task_trace_dir = f"res/trace/code_analysis_{self.task_id}"

if os.path.exists(self.work_dir):
    with open(f"{self.work_dir}/trace_{timestamp}.txt", "w") as f:
        f.write(json.dumps(messages, ensure_ascii=False, indent=2))

Result Extraction

Final Answer Extraction (agent_scheduler.py:394-414):
def _extract_final_answer(self, chat_result) -> str:
    """Extract final answer from chat history"""
    final_answer = chat_result.summary
    
    if isinstance(final_answer, dict):
        final_answer = final_answer['content']
    
    if final_answer is None:
        final_answer = ""
    final_answer = final_answer.strip().lstrip()
    
    # Fallback to last message content
    messages = chat_result.chat_history
    final_content = messages[-1].get("content", "")
    
    if final_answer == "":
        final_answer = final_content
    
    return final_answer

Working Directory Structure

coding/
└── gitbench_0304_1423/
    └── repo_master/
        └── workspace/
            ├── repository_name/     # Cloned/copied repository
            ├── input_dataset/       # Input data files
            ├── output_result/       # Generated outputs
            ├── task_info.json       # Task configuration
            └── trace_*.txt          # Conversation history

Best Practices

For Task Description

Provide clear, detailed task descriptions with:
  • Exact requirements
  • Expected output format
  • Input data descriptions
  • Success criteria
Mention:
  • Repository URLs or paths
  • Related documentation
  • Dependencies or requirements
  • Example inputs/outputs

For Repository Tasks

  • Use absolute paths for local repositories
  • Ensure GitHub URLs are accessible
  • Provide input data descriptions
  • Specify output directory requirements

For Performance

  • Set appropriate max_turns (default: 20)
  • Use retry count wisely (default: 2)
  • Enable virtual environments for isolation
  • Monitor working directory sizes

Next Steps

Repository Exploration

Learn about code analysis tools

Multi-Agent System

Understand agent collaboration

Build docs developers (and LLMs) love