Task Routing & Scheduling

Overview

RepoMaster’s task routing system analyzes each user request and selects the most suitable execution mode for it. The scheduler agent acts as the orchestrator, choosing between web search, repository exploration, and general programming assistance.

Routing Architecture

Mode Selection Strategy

The scheduler agent analyzes tasks using a prioritized decision process defined in agent_scheduler.py:19-71.

1. Web Search Priority

Trigger Conditions:

Questions requiring real-time data
Current events or latest information
External documentation lookup
General knowledge queries

Implementation (agent_scheduler.py:127-140):

async def web_search(
    self, 
    query: Annotated[str, "Query for general web search"]
) -> str:
    """Perform general web search for real-time information"""
    return await self.repo_searcher.deep_search(query)
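The method is pure delegation, and the `Annotated` string on `query` is what describes the parameter to the agent. A minimal, self-contained sketch, assuming a stub in place of the real `repo_searcher` (`StubSearcher` and `Scheduler` are hypothetical names), shows the delegation and how that description can be read back with `typing.get_type_hints`; tool-registration layers (AutoGen, for example) typically use this string as the parameter description.

```python
import asyncio
from typing import Annotated, get_type_hints


class StubSearcher:
    """Hypothetical stand-in for the real repo_searcher."""

    async def deep_search(self, query: str) -> str:
        return f"results for: {query}"


class Scheduler:
    def __init__(self, repo_searcher):
        self.repo_searcher = repo_searcher

    async def web_search(
        self,
        query: Annotated[str, "Query for general web search"]
    ) -> str:
        """Perform general web search for real-time information"""
        # Pure delegation: the scheduler adds no logic of its own here
        return await self.repo_searcher.deep_search(query)


if __name__ == "__main__":
    scheduler = Scheduler(StubSearcher())
    print(asyncio.run(scheduler.web_search("Latest Python 3.12 features")))
    # The parameter description travels with the type hint
    hints = get_type_hints(Scheduler.web_search, include_extras=True)
    print(hints["query"].__metadata__[0])
```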

Example Queries:

“What is the stock price of APPLE?”
“Latest Python 3.12 features”
“How to use the new React hooks API?”

2. Repository Mode

Trigger Conditions:

User mentions local file paths or directories
Specific repository URLs provided
Task requires analyzing existing code
Need for specialized tools/libraries

Repository Type Detection (agent_scheduler.py:206-229):

# Automatically detect repository type
if repo_type is None:
    if repository.startswith(('http://', 'https://')) and 'github.com' in repository:
        repo_type = 'github'
    elif os.path.exists(repository):
        repo_type = 'local'
    else:
        # Try to determine if it's a GitHub URL format
        if repository.startswith(('http://', 'https://')) or repository.count('/') >= 1:
            repo_type = 'github'
        else:
            raise ValueError(f"Unable to determine repository type: {repository}")
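Pulling these rules into a standalone function makes their edge cases easy to check. This sketch mirrors the snippet above (the name `detect_repo_type` is hypothetical). One consequence of the fallback branch: a mistyped or not-yet-created local path such as `/no/such/checkout` contains a slash, so it is classified as `github` rather than rejected.

```python
import os


def detect_repo_type(repository: str) -> str:
    """Replicates the detection rules above as a standalone function."""
    if repository.startswith(('http://', 'https://')) and 'github.com' in repository:
        return 'github'
    if os.path.exists(repository):
        return 'local'
    # Fallback: any URL, or any string containing a slash, is treated as GitHub
    if repository.startswith(('http://', 'https://')) or repository.count('/') >= 1:
        return 'github'
    raise ValueError(f"Unable to determine repository type: {repository}")


if __name__ == "__main__":
    print(detect_repo_type(os.getcwd()))                               # local
    print(detect_repo_type("https://github.com/octocat/Hello-World"))  # github
    print(detect_repo_type("/no/such/checkout"))                       # github, not an error
```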

GitHub Repository Mode

Process:

Validate GitHub URL format
Build repository configuration
Initialize TaskManager with GitHub URL
Clone repository to working directory
Execute task with CodeExplorer

Configuration:

repo_config = {
    "type": "github",
    "url": repository
}

Local Repository Mode

Process:

Validate local path exists
Build repository configuration
Initialize TaskManager with local path
Copy repository to working directory
Execute task with CodeExplorer

Configuration:

repo_config = {
    "type": "local",
    "path": repository
}
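The "validate, then build configuration" steps of both modes can be sketched together. This helper is hypothetical, not RepoMaster's code: it accepts only full `https://github.com/<owner>/<repo>` URLs, which is stricter than the detection fallback (whether RepoMaster accepts an `owner/repo` shorthand is not shown here), and it normalizes local paths with `os.path.abspath`, in line with the advice to use absolute paths.

```python
import os
from urllib.parse import urlparse


def build_repo_config(repository: str, repo_type: str) -> dict:
    """Validate the input, then build a repo_config dict in the shapes shown above."""
    if repo_type == 'github':
        parsed = urlparse(repository)
        parts = [p for p in parsed.path.split('/') if p]
        # Expect http(s)://github.com/<owner>/<repo>[.git]
        if (parsed.scheme not in ('http', 'https')
                or parsed.netloc not in ('github.com', 'www.github.com')
                or len(parts) < 2):
            raise ValueError(f"Not a GitHub repository URL: {repository}")
        return {"type": "github", "url": repository}
    if repo_type == 'local':
        if not os.path.isdir(repository):
            raise ValueError(f"Local repository path does not exist: {repository}")
        return {"type": "local", "path": os.path.abspath(repository)}
    raise ValueError(f"Unsupported repository type: {repo_type}")
```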

3. General Code Assistant Mode

Trigger Conditions:

General programming questions
Algorithm implementations
Code examples without specific repository
Debugging help

Implementation (agent_scheduler.py:273-332):

def run_general_code_assistant(
    self,
    task_description: Annotated[str, "Programming task"],
    work_directory: Annotated[Optional[str], "Working directory"] = None
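):
    """Provide general programming assistance"""
    # Abridged excerpt: work_dir and enhanced_task are set earlier in the full
    # method (not shown), presumably from work_directory and task_description.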
    explorer = CodeExplorer(
        local_repo_path=None,  # No repository
        work_dir=work_dir,
        task_type="general",
        use_venv=True,
        is_cleanup_venv=False
    )
    
    result = asyncio.run(explorer.a_code_analysis(enhanced_task, max_turns=20))
    return result
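RepoMaster leaves this choice to the scheduler LLM, guided by its system message. To make the priority order concrete, here is a deliberately simplified, rule-based stand-in (the `route` function, the `Mode` enum, and the keyword list are hypothetical and not part of RepoMaster) that checks the three modes in the documented order.

```python
import os
import re
from enum import Enum


class Mode(Enum):
    WEB_SEARCH = "web_search"
    REPOSITORY = "repository"
    GENERAL_CODE = "general_code"


# Hypothetical cues; the real scheduler reasons over the request with an LLM.
REALTIME_CUES = ("latest", "current", "today", "stock price", "news")
URL_PATTERN = re.compile(r"https?://\S+")


def mentions_repository(task: str) -> bool:
    """True if the task contains a URL or an existing local path."""
    if URL_PATTERN.search(task):
        return True
    return any(os.path.exists(token) for token in task.split() if "/" in token)


def route(task: str) -> Mode:
    lowered = task.lower()
    # 1. Web search first: the request needs real-time or external information
    if any(cue in lowered for cue in REALTIME_CUES):
        return Mode.WEB_SEARCH
    # 2. Repository mode: the user points at a repository URL or local path
    if mentions_repository(task):
        return Mode.REPOSITORY
    # 3. Otherwise fall back to the general code assistant
    return Mode.GENERAL_CODE


if __name__ == "__main__":
    for query in ("What is the stock price of APPLE?",
                  "Convert the PDF with https://github.com/owner/pdf-tool",
                  "Implement quicksort in Python"):
        print(f"{route(query).value:>12}  {query}")
```

A fixed check order mirrors the prioritized list, but it also shows the limitation: a request that names a repository and also says "latest" goes to web search, which is the kind of ambiguity an LLM scheduler is better placed to resolve.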

Task Initialization Flow

TaskManager Initialization

The TaskManager class handles task environment setup (git_task.py:159-256). Its key method is initialize_tasks(args, root_path='coding') (git_task.py:236-256):

@staticmethod
def initialize_tasks(args, root_path='coding'):
    """Initialize task environment and task list"""
    # Create unique working directory
    work_task_path = PathManager.create_unique_path(root_path)
    
    # Extract task configuration
    task_info = args.config_data
    task_id = "repo_master"
    
    # Build output configuration
    out_task_info = {
        'repo': task_info['repo'],
        'task_description': task_info['task_description'],
        'task_prompt': task_info.get('task_prompt', 
                                     TaskManager.get_task_prompt()),
        'input_data': task_info['input_data'],
        'parameters': task_info.get('parameters', {}),
        'root_path': root_path,
        'work_task_path': work_task_path,
        'task_id': task_id
    }
    
    return out_task_info
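Because `initialize_tasks` indexes `repo`, `task_description`, and `input_data` directly but uses `.get()` for `task_prompt` and `parameters`, the first three keys are mandatory in `config_data` and a missing one raises `KeyError`. The sketch below reproduces that mapping without touching the file system: `SimpleNamespace` stands in for the parsed `args`, the work path is passed in rather than created by `PathManager`, `DEFAULT_PROMPT` is a hypothetical placeholder for `TaskManager.get_task_prompt()`, and the repository URL is illustrative.

```python
from types import SimpleNamespace

# Hypothetical stand-in for TaskManager.get_task_prompt()
DEFAULT_PROMPT = "### Task Description\n{task_description}\n"


def build_task_info(args, work_task_path, root_path='coding'):
    """Same key mapping as initialize_tasks, minus the directory creation."""
    task_info = args.config_data
    return {
        'repo': task_info['repo'],                          # required
        'task_description': task_info['task_description'],  # required
        'task_prompt': task_info.get('task_prompt', DEFAULT_PROMPT),
        'input_data': task_info['input_data'],              # required
        'parameters': task_info.get('parameters', {}),
        'root_path': root_path,
        'work_task_path': work_task_path,
        'task_id': "repo_master",
    }


if __name__ == "__main__":
    args = SimpleNamespace(config_data={
        'repo': {'type': 'github', 'url': 'https://github.com/owner/pdf-tool'},
        'task_description': 'Extract all tables from the PDF',
        'input_data': [{'path': '/path/to/input.pdf', 'description': 'PDF file to process'}],
    })
    info = build_task_info(args, 'coding/gitbench_0304_1423')
    print(info['parameters'], info['task_id'])
```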

Task Prompt Generation

Template Structure (git_task.py:171-191):

def get_task_prompt():
    return """### Task Description
{task_description}

#### Repository Path (Absolute Path): 
{repo_path}

Understanding Guide: ['Read README.md to understand basic functionality']

#### File Paths
- Input file paths and descriptions:
{input_data}

- Output file directory: 
Results must be saved in the {output_dir_path} directory.

#### Additional Notes
**Core Objective**: Quickly understand and analyze the code repository, 
generate and execute necessary code to complete user-specified tasks.
"""

Placeholder Replacement (git_task.py:210-222):

placeholders = {
    '{repo_path}': target_repo_path,
    '{input_data}': target_input_data,
    '{output_dir_path}': target_output_path,
    '{task_description}': task_info.get('task_description', '')
}
for placeholder, value in placeholders.items():
    task_desc = task_desc.replace(placeholder, value)
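Substituting with `str.replace` rather than `str.format` means a prompt template may contain literal braces (for example a JSON sample in a user-supplied `task_prompt`) without raising. A minimal sketch of the substitution; the template, the `fill_prompt` name, and serializing `input_data` with `json.dumps` are assumptions here (the real `target_input_data` must already be a string for `str.replace` to accept it).

```python
import json

TEMPLATE = """### Task Description
{task_description}

- Input file paths and descriptions:
{input_data}

Results must be saved in the {output_dir_path} directory.
Example record: {"path": "...", "description": "..."}
"""


def fill_prompt(template, task_description, input_data, output_dir_path):
    """Substitute placeholders with str.replace, as the snippet above does."""
    placeholders = {
        '{input_data}': json.dumps(input_data, ensure_ascii=False, indent=2),
        '{output_dir_path}': output_dir_path,
        # Substituted last, so text inside the description is never re-scanned
        '{task_description}': task_description,
    }
    for placeholder, value in placeholders.items():
        template = template.replace(placeholder, value)
    return template


if __name__ == "__main__":
    try:
        TEMPLATE.format(task_description="x", input_data="[]", output_dir_path="out")
    except (KeyError, ValueError, IndexError) as exc:
        print("str.format fails on the literal JSON braces:", repr(exc))
    print(fill_prompt(TEMPLATE, "Extract tables",
                      [{"path": "in.pdf", "description": "PDF"}], "output_result"))
```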

Environment Setup

The DataProcessor class handles environment preparation (git_task.py:62-157).

Repository Setup

Local Repository (git_task.py:108-114):

if repo_type == 'local':
    source_repo_path = repo_info['path']
    repo_name = Path(source_repo_path).name
    target_repo_path = f"{work_dir}/{repo_name}"
    
    if not os.path.exists(target_repo_path):
        os.system(f"cp -a {source_repo_path} {target_repo_path}")

GitHub Repository (git_task.py:116-124):

elif repo_type == 'github':
    repo_url = repo_info['url']
    repo_name = repo_url.split('/')[-1].replace('.git', '')
    target_repo_path = f"{work_dir}/{repo_name}"
    
    if not os.path.exists(target_repo_path):
        clone_cmd = f"git clone {repo_url} {target_repo_path}"
        subprocess.run(clone_cmd, shell=True, check=True)
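For comparison, a minimal sketch of the same two branches using only standard-library file and process APIs (the function names are hypothetical, not RepoMaster's; `str.removesuffix` needs Python 3.9+). It sidesteps two pitfalls of the snippets above: `str.replace('.git', '')` removes every occurrence, so `my.github-tool` would become `myhub-tool`, and building a shell command string from a URL or path breaks on spaces and is open to shell injection.

```python
import shutil
import subprocess
from pathlib import Path


def repo_name_from_url(repo_url: str) -> str:
    """Last path segment, with a trailing slash and a trailing '.git' removed."""
    return repo_url.rstrip('/').split('/')[-1].removesuffix('.git')


def setup_repository(repo_info: dict, work_dir: str) -> str:
    """Copy or clone the repository into work_dir; returns the target path."""
    if repo_info['type'] == 'local':
        source = Path(repo_info['path'])
        target = Path(work_dir) / source.name
        if not target.exists():
            # copytree copies metadata via copy2 and, with symlinks=True,
            # keeps symlinks as links: roughly what `cp -a` does
            shutil.copytree(source, target, symlinks=True)
    elif repo_info['type'] == 'github':
        target = Path(work_dir) / repo_name_from_url(repo_info['url'])
        if not target.exists():
            # An argument list instead of a shell string: no quoting issues
            subprocess.run(['git', 'clone', repo_info['url'], str(target)], check=True)
    else:
        raise ValueError(f"Unsupported repository type: {repo_info['type']}")
    return str(target)
```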

Input Data Handling

Data Copy Process (git_task.py:136-156):

data_info = task_info.get('input_data', [])
try:
    new_data_info = []
    for data in data_info:
        data_path = data.get('path')
        data_desc = data.get('description')
        if data_path and os.path.exists(data_path):
            new_data_path = DataProcessor.copy_dataset(
                data_path, target_input_path
            )
            new_data_info.append({
                'path': new_data_path,
                'description': data_desc
            })
except Exception as e:
    print(f"Error occurred while copying dataset: {e}")
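`DataProcessor.copy_dataset` itself is not shown here. A minimal sketch, assuming it copies either a single file or a whole directory into the input directory and returns the new path (both function bodies below are hypothetical), makes the loop runnable end to end. Note that in the snippet above an entry whose `path` does not exist is dropped from `new_data_info` without any message; the sketch reports it instead.

```python
import os
import shutil


def copy_dataset(data_path: str, target_input_path: str) -> str:
    """Hypothetical stand-in for DataProcessor.copy_dataset: copy a file or directory."""
    os.makedirs(target_input_path, exist_ok=True)
    name = os.path.basename(os.path.normpath(data_path))
    destination = os.path.join(target_input_path, name)
    if os.path.isdir(data_path):
        shutil.copytree(data_path, destination, dirs_exist_ok=True)
    else:
        shutil.copy2(data_path, destination)
    return destination


def copy_input_data(data_info: list, target_input_path: str) -> list:
    """Copy each existing input and return entries rewritten to the new paths."""
    new_data_info = []
    for data in data_info:
        data_path = data.get('path')
        if data_path and os.path.exists(data_path):
            new_data_info.append({
                'path': copy_dataset(data_path, target_input_path),
                'description': data.get('description'),
            })
        else:
            # Surface missing inputs instead of skipping them silently
            print(f"Skipping missing input: {data_path}")
    return new_data_info
```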

Input data format: provide input data as a JSON list of objects, each with a path and a description:

[
  {
    "path": "/path/to/input.pdf",
    "description": "PDF file to process"
  }
]
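Since each entry is read with `data.get('path')` and `data.get('description')`, every element must be an object rather than a bare path string. A small validation helper (hypothetical, not part of RepoMaster) can catch malformed input before a task starts:

```python
import json


def load_input_data(raw: str) -> list:
    """Parse and sanity-check the input_data JSON list."""
    entries = json.loads(raw)
    if not isinstance(entries, list):
        raise ValueError("input_data must be a JSON list")
    for i, entry in enumerate(entries):
        # Each element needs at least a string 'path'
        if not isinstance(entry, dict) or not isinstance(entry.get('path'), str):
            raise ValueError(f"input_data[{i}] needs a string 'path'")
        entry.setdefault('description', '')
    return entries
```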

Agent Execution Flow

Sequential Repository Search

Repository-First Approach (agent_scheduler.py:48-54):

Repository Search: Use github_repo_search to find relevant repositories
Sequential Execution: Select most promising repository
Result Evaluation: Critically evaluate if result satisfies requirements
Switching: If failed, select next best repository and retry
Continue: Until success or all options exhausted

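Execution logic (illustrative pseudocode: the name solve_with_repositories is hypothetical, and result_satisfies_requirements stands for the scheduler's own critical evaluation rather than a real function):

def solve_with_repositories(task):
    # 1. Search for repositories
    repo_list = github_repo_search(task)

    # 2. Try each repository sequentially
    for repo in repo_list:
        result = run_repository_agent(
            task_description=task,
            repository=repo['repo_url']
        )

        # 3. Evaluate result; stop at the first one that satisfies the task
        if result_satisfies_requirements(result):
            return result

        # 4. Otherwise move on to the next repository

    # 5. All repositories exhausted: fall back to another mode
    return None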

AgentRunner Execution

Main Execution Method (git_task.py:262-323):

@staticmethod
def run_agent(task_info, retry_times=2, work_dir=None):
    """Run Code Agent to execute tasks"""
    task_id = task_info['task_id']
    work_task_path = task_info['work_task_path']
    
    # Create working directory
    work_dir = work_dir if work_dir else f'{work_task_path}/{task_id}/workspace'
    os.makedirs(work_dir, exist_ok=True)
    
    # Prepare environment
    target_output_path, target_input_data, target_repo_path = \
        DataProcessor.setup_task_environment(task_info, work_dir)
    
    # Generate task description
    task = TaskManager.prepare_task_description(
        task_info, target_output_path, target_input_data, target_repo_path
    )
    
    # Run code agent
    explorer = CodeExplorer(
        target_repo_path,
        work_dir=work_dir,
        task_type="gitbench",
        use_venv=True,
        is_cleanup_venv=False
    )
    
    answer = asyncio.run(explorer.a_code_analysis(task, max_turns=20))
    
    # Retry logic
    if not os.path.exists(target_output_path) and retry_times > 0:
        return AgentRunner.run_agent(task_info, retry_times - 1)
    
    return answer

Path Management

The PathManager class provides utilities for directory and ID generation (git_task.py:20-60).

Unique Path Generation

Task ID Generation (git_task.py:24-27):

@staticmethod
def generate_task_id():
    """Generate random task ID"""
    date_str = datetime.now().strftime("%m%d_%H%M")
    return f'gitbench_{date_str}'
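Despite its docstring, this ID is not random: it encodes only month, day, hour, and minute, so two tasks started within the same minute (or at the same date and time in different years) get the same ID, which is presumably why directory creation performs its own uniqueness check. Injecting the clock, as in this sketch (the `now` parameter is an addition, not in the original), makes the collision easy to see:

```python
from datetime import datetime
from typing import Optional


def generate_task_id(now: Optional[datetime] = None) -> str:
    """Same format as PathManager.generate_task_id, with an injectable clock."""
    date_str = (now or datetime.now()).strftime("%m%d_%H%M")
    return f'gitbench_{date_str}'


if __name__ == "__main__":
    print(generate_task_id(datetime(2025, 3, 4, 14, 23, 5)))   # gitbench_0304_1423
    print(generate_task_id(datetime(2025, 3, 4, 14, 23, 55)))  # gitbench_0304_1423 (same minute)
```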

Unique Directory Creation (git_task.py:38-44):

@staticmethod
def create_unique_dir(base_path, prefix):
    """Create a unique directory that doesn't exist"""
    path = f'{base_path}/{prefix}'
    while os.path.exists(path):
        path = f'{path}_{random.randint(1, 10)}'
    os.makedirs(path, exist_ok=True)
    return path
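Two properties of this loop are worth knowing: each retry appends to the already-suffixed `path`, so names can grow (for example `gitbench_0304_1423_3_7`), and the gap between `os.path.exists` and `os.makedirs` leaves room for a race if two tasks start at once. A counter-based alternative (a sketch, not RepoMaster's code) lets `os.makedirs` itself detect collisions:

```python
import os


def create_unique_dir(base_path: str, prefix: str) -> str:
    """Create base_path/prefix, or base_path/prefix_1, _2, ... if taken."""
    os.makedirs(base_path, exist_ok=True)
    candidate = os.path.join(base_path, prefix)
    counter = 0
    while True:
        try:
            # exist_ok=False turns check-and-create into a single step, so two
            # concurrent callers cannot both claim the same directory
            os.makedirs(candidate)
            return candidate
        except FileExistsError:
            counter += 1
            candidate = os.path.join(base_path, f"{prefix}_{counter}")
```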

Work Task Path: coding/gitbench_0304_1423/ (root task directory)
Workspace: workspace/ (code execution environment)
Input Dataset: input_dataset/ (user-provided input files)
Output Result: output_result/ (task execution outputs)

Retry and Fallback Mechanisms

Automatic Retry

Retry Logic (git_task.py:310-312):

if not os.path.exists(target_output_path) and retry_times > 0:
    print(f"---Task {task_id} submission failed, retrying {retry_times} times---")
    return AgentRunner.run_agent(task_info, retry_times - 1)

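Default Retry Configuration:

Initial retry count: 2 (so at most three attempts in total)
Trigger: the output path (target_output_path) does not exist after a run
Action: re-execute the entire agent workflow; the recursive call does not pass work_dir, so a retry uses the default {work_task_path}/{task_id}/workspace directory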
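The same policy can be written as a loop instead of recursion, which makes the attempt count explicit. In this sketch `run_once` and `output_exists` are hypothetical callables standing in for the `CodeExplorer` run and the `os.path.exists(target_output_path)` check; as in the original, the last answer is returned even if the output never appears.

```python
from typing import Callable, Optional


def run_with_retries(run_once: Callable[[], str],
                     output_exists: Callable[[], bool],
                     retry_times: int = 2) -> Optional[str]:
    """Run once, then retry while the output is missing (1 + retry_times attempts max)."""
    answer = None
    for attempt in range(retry_times + 1):
        answer = run_once()
        if output_exists():
            break
        if attempt < retry_times:
            print(f"---Output missing, retrying ({retry_times - attempt} left)---")
    return answer
```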

Mode Fallback

From the scheduler's system message (agent_scheduler.py:56-60):

3. Sequential Execution and Fallback:
   - If one approach doesn't yield a solution, consider trying an 
     alternative mode (e.g., switching to General Code Assistant Mode)
   - Be persistent in finding a solution
   - If one repository doesn't work, try another

Task Tracing and Debugging

Conversation History Export

Trace File Generation (agent_code_explore.py:309-315):

task_trace_dir = f"res/trace/code_analysis_{self.task_id}"

if os.path.exists(self.work_dir):
    with open(f"{self.work_dir}/trace_{timestamp}.txt", "w") as f:
        f.write(json.dumps(messages, ensure_ascii=False, indent=2))
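Because the file holds plain JSON despite its `.txt` extension, a trace can be loaded back with `json.load` for inspection. A minimal sketch of the export and a round trip, assuming a `%Y%m%d_%H%M%S` timestamp (the snippet above does not show how `timestamp` is built; `export_trace` is a hypothetical name) and an explicit UTF-8 encoding, which matters once `ensure_ascii=False` lets non-ASCII characters through:

```python
import json
import os
from datetime import datetime


def export_trace(messages: list, work_dir: str) -> str:
    """Write the conversation as pretty-printed JSON to work_dir/trace_<timestamp>.txt."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # assumed format
    path = os.path.join(work_dir, f"trace_{timestamp}.txt")
    # Explicit UTF-8: with ensure_ascii=False the platform default may not cope
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps(messages, ensure_ascii=False, indent=2))
    return path
```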

Result Extraction

Final Answer Extraction (agent_scheduler.py:394-414):

def _extract_final_answer(self, chat_result) -> str:
    """Extract final answer from chat history"""
    final_answer = chat_result.summary
    
    # The summary may be a message dict rather than a plain string
    if isinstance(final_answer, dict):
        final_answer = final_answer['content']
    
    if final_answer is None:
        final_answer = ""
    final_answer = final_answer.strip()  # strip() already covers lstrip()
    
    # Fallback to the last message content when the summary is empty
    messages = chat_result.chat_history
    if final_answer == "" and messages:
        final_answer = messages[-1].get("content", "")
    
    return final_answer

Working Directory Structure

coding/
└── gitbench_0304_1423/
    └── repo_master/
        └── workspace/
            ├── repository_name/     # Cloned/copied repository
            ├── input_dataset/       # Input data files
            ├── output_result/       # Generated outputs
            ├── task_info.json       # Task configuration
            └── trace_*.txt          # Conversation history
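The tree follows `{work_task_path}/{task_id}/workspace`, the default `work_dir` in `run_agent`. A small helper (hypothetical; the `repo_name` value is illustrative) that derives each path from the run directory can keep post-processing scripts consistent with this layout:

```python
from pathlib import Path


def workspace_layout(root: str, run_id: str, repo_name: str,
                     task_id: str = "repo_master") -> dict:
    """Map each role in the tree above to its path."""
    workspace = Path(root) / run_id / task_id / "workspace"
    return {
        "workspace": workspace,
        "repository": workspace / repo_name,
        "input_dataset": workspace / "input_dataset",
        "output_result": workspace / "output_result",
        "task_info": workspace / "task_info.json",
    }


if __name__ == "__main__":
    for role, path in workspace_layout("coding", "gitbench_0304_1423", "pdf-tool").items():
        print(f"{role:>14}: {path}")
```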

Best Practices

For Task Description

Be Specific

Provide clear, detailed task descriptions with:

Exact requirements
Expected output format
Input data descriptions
Success criteria

Include Context

Mention:

Repository URLs or paths
Related documentation
Dependencies or requirements
Example inputs/outputs

For Repository Tasks

Use absolute paths for local repositories
Ensure GitHub URLs are accessible
Provide input data descriptions
Specify output directory requirements

For Performance

Set appropriate max_turns (default: 20)
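Tune the retry count (default: 2); each retry re-runs the entire agent workflow
Enable virtual environments for isolation
Monitor working directory sizes: every run clones or copies a repository, and virtual environments are kept (is_cleanup_venv=False)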

Next Steps

Repository Exploration

Learn about code analysis tools

Multi-Agent System

Understand agent collaboration
