The replication workflow helps you plan and execute reproductions of published experiments, benchmark results, or specific claims. It extracts implementation details, produces a structured replication plan, and can guide execution across multiple compute environments including local, Docker, Modal, and RunPod.
Invocation
- CLI
- REPL
Workflow stages
Extract
The researcher subagent pulls all implementation details from the target paper and any linked code: model architecture, hyperparameters, training schedule, dataset preparation, evaluation protocol, and hardware requirements.
If CHANGELOG.md exists in your workspace, the most recent relevant entries are read before planning or resuming, so the workflow can pick up where it left off.
Plan
A structured replication plan is produced, specifying:
- What code, datasets, metrics, and environment are needed
- What is verified vs. inferred vs. still missing
- Which checks or test oracles will be used to decide whether replication succeeded
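As a rough illustration of the plan contents listed above, the sketch below models plan items tagged as verified, inferred, or missing, together with checks that decide whether a replication succeeded. All class and field names here are hypothetical and are not taken from the workflow itself; the tolerance-based check is just one simple kind of test oracle.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    VERIFIED = "verified"  # confirmed from the paper or its linked code
    INFERRED = "inferred"  # a reasonable guess that is not confirmed
    MISSING = "missing"    # still unknown


@dataclass
class PlanItem:
    """One implementation detail in the plan (hypothetical structure)."""
    name: str
    value: Optional[str]
    status: Status


@dataclass
class Check:
    """A simple test oracle: a reported metric value and an accepted tolerance."""
    metric: str
    expected: float
    tolerance: float

    def passes(self, observed: float) -> bool:
        return abs(observed - self.expected) <= self.tolerance


@dataclass
class ReplicationPlan:
    items: list = field(default_factory=list)
    checks: list = field(default_factory=list)

    def unresolved(self) -> list:
        """Items that are inferred or missing and still need attention."""
        return [i for i in self.items if i.status is not Status.VERIFIED]

    def replicated(self, observed: dict) -> bool:
        """True only if at least one check is planned and every check passed."""
        return bool(self.checks) and all(
            c.metric in observed and c.passes(observed[c.metric])
            for c in self.checks
        )
```

With a structure like this, a run that produced no value for a planned metric counts as not replicated, which matches the idea that success depends on the planned checks actually passing.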
Environment selection
Before running anything, you are asked where to execute the experiment:
| Environment | Description |
|---|---|
| Local | Run in the current working directory |
| Virtual environment | Create an isolated venv or conda environment first |
| Docker | Run experiment code inside an isolated Docker container |
| Modal | Run on Modal’s serverless GPU infrastructure. A Modal-decorated Python script is written and executed with modal run <script.py>. Requires modal CLI (pip install modal && modal setup). Best for burst GPU jobs that don’t need persistent state. |
| RunPod | Provision a GPU pod on RunPod and SSH in for execution. Uses runpodctl to create pods, transfer files, and manage lifecycle. Requires runpodctl CLI and RUNPOD_API_KEY. Best for long-running experiments or when you need SSH access and persistent storage. |
| Plan only | Produce the replication plan without executing |
If you choose “Plan only”, the workflow produces a complete, actionable plan that you can hand off to a human researcher or execute yourself.
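Before picking a remote environment, it can help to confirm that the tools named in the table are available. The following is a minimal sketch, not part of the workflow: the function name and the mapping are hypothetical, and the assumption that the Docker option needs a docker executable on PATH is ours rather than something the table states.

```python
import os
import shutil

# CLI tools and environment variables per environment. The modal and runpod
# entries come from the table above; the docker entry is an assumption.
REQUIREMENTS = {
    "docker": {"tools": ["docker"], "env": []},
    "modal": {"tools": ["modal"], "env": []},
    "runpod": {"tools": ["runpodctl"], "env": ["RUNPOD_API_KEY"]},
}


def missing_prerequisites(environment, which=shutil.which, env=os.environ):
    """Return a list of missing CLI tools and unset environment variables.

    `which` and `env` are injectable so the check can be exercised
    without touching the real system.
    """
    req = REQUIREMENTS.get(environment, {"tools": [], "env": []})
    missing = [f"CLI: {tool}" for tool in req["tools"] if which(tool) is None]
    missing += [f"env: {var}" for var in req["env"] if not env.get(var)]
    return missing
```

Calling `missing_prerequisites("runpod")` returns an empty list when runpodctl is on PATH and RUNPOD_API_KEY is set; environments without an entry in the mapping, such as a local run, report nothing missing.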
Execute
If you chose an execution environment, the replication steps are implemented and run there. Notes, scripts, raw outputs, and results are saved to disk in a reproducible layout under experiments/.
The outcome is not marked as replicated unless the planned checks actually passed.
Log
For multi-step or resumable replication work, concise entries are appended to CHANGELOG.md after meaningful progress, failed attempts, and major verification outcomes. Each entry records the active objective, what changed, what was checked, and the next step.
Outputs
| Artifact | Path |
|---|---|
| Replication plan | outputs/.plans/<slug>.md |
| Experiment code and scripts | experiments/ |
| Run logs and raw outputs | experiments/<slug>/ |
| Progress log | CHANGELOG.md |
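To make the progress log more concrete, here is a minimal sketch of appending an entry with the four fields a log entry records (active objective, what changed, what was checked, next step). The entry layout, heading style, and function names are assumptions for illustration; the actual format written to CHANGELOG.md may differ.

```python
from datetime import date
from pathlib import Path


def format_entry(objective, changed, checked, next_step, when=None):
    """Render one log entry as text (hypothetical layout)."""
    when = when or date.today()
    return (
        f"## {when.isoformat()}\n"
        f"- Objective: {objective}\n"
        f"- Changed: {changed}\n"
        f"- Checked: {checked}\n"
        f"- Next step: {next_step}\n\n"
    )


def append_entry(path, entry):
    """Append an entry to the log file, creating the file if needed."""
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(entry)
```

Appending rather than rewriting keeps earlier entries intact, so a later session can read the most recent ones to resume work.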
Compute environment notes
Modal (serverless GPU)
Modal is best for burst workloads: training jobs, inference benchmarks, or evaluation sweeps that need a GPU but do not require persistent storage between runs. The workflow generates a complete Modal script with the appropriate decorators (@app.function, GPU selection, image specification) and runs it directly.
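For orientation, the sketch below renders the kind of script this section describes: an app, an image specification, a GPU-backed function, and a local entrypoint. It is an illustrative template, not the script the workflow actually produces. The app name, package list, GPU type, timeout, and function names are placeholders chosen for the example, and the experiment body is a stub.

```python
import textwrap

# Template for a small Modal script. The experiment body is a stub;
# real replication code would replace the print call.
MODAL_TEMPLATE = textwrap.dedent('''\
    import modal

    app = modal.App("{app_name}")
    image = modal.Image.debian_slim().pip_install({packages})


    @app.function(gpu="{gpu}", image=image, timeout={timeout})
    def run_experiment():
        print("running experiment")


    @app.local_entrypoint()
    def main():
        run_experiment.remote()
''')


def render_modal_script(app_name, gpu, packages, timeout=3600):
    """Fill the template with an app name, GPU type, and pip packages."""
    return MODAL_TEMPLATE.format(
        app_name=app_name,
        gpu=gpu,
        packages=", ".join(repr(p) for p in packages),
        timeout=timeout,
    )
```

Writing the rendered text to a file such as modal_job.py (a hypothetical name) would give a script that can be launched with modal run modal_job.py, assuming the modal CLI is installed and configured.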
RunPod (persistent GPU pods)
RunPod is best for long-running experiments or when you need a stable SSH session, persistent storage, and full control over the environment. The workflow uses runpodctl to provision the pod, transfer files, and manage the lifecycle.
Related
- Paper Audit — compare paper claims against a public codebase without running code
- Deep Research — investigate a topic before deciding what to replicate