Versioning System
Overview
SWE-bench assigns each task instance a specific version with respect to its repository. This version information is crucial for reproducible execution-based evaluation, because it determines the exact installation instructions needed for that repository.
How Versioning Works
When task instances are created, they are assigned a version based on the repository's state at the time of issue creation. This version information enables the evaluation harness to set up the correct environment for testing the proposed patch.
Tools for Version Management
General Purpose Tool: get_versions.py
The get_versions.py script is a general-purpose tool for retrieving version information. It can obtain versions through two methods:
- Reading directly from the GitHub repository
- Building the repository locally and locating appropriate version files
The script assigns each task instance a new version: <value> key/value pair in its metadata.
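As a rough illustration of that last step, the sketch below pulls a version string out of the text of a version file and records it under a version key. The regular expression, the helper names, the truncation to major.minor, and the sample instance fields other than version are all assumptions made for illustration; they are not taken from get_versions.py.

```python
import re

# Hypothetical pattern: matches lines such as  __version__ = "3.2.1"
VERSION_PATTERN = re.compile(r"""__version__\s*=\s*['"]([^'"]+)['"]""")


def extract_version(file_text):
    """Return a 'major.minor' string found in file_text, or None."""
    match = VERSION_PATTERN.search(file_text)
    if match is None:
        return None
    parts = match.group(1).split(".")
    # Assumption: only the major and minor components are kept.
    return ".".join(parts[:2])


def add_version(instance, file_text):
    """Return a copy of the instance with a 'version' key added when found."""
    versioned = dict(instance)
    version = extract_version(file_text)
    if version is not None:
        versioned["version"] = version
    return versioned


# Hypothetical task instance and version file contents.
instance = {"instance_id": "example__repo-123", "repo": "example/repo"}
print(add_version(instance, '__version__ = "3.2.1"\n'))
```

Returning a copy rather than mutating the input keeps the unversioned candidate instances intact, which is convenient when a retrieval attempt fails and has to be retried with another method.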
Usage
The script can be invoked via the run_get_version.sh wrapper script:
python get_versions.py \
    --instances_path   [Required] [folder] Path to candidate task instances \
    --retrieval_method [Required] [choice] Method to retrieve versions ("build", "mix", or "github") \
    --cleanup          [Required] [bool]   Remove testbed and conda environments upon task completion \
    --conda_env        [Required] [str]    Name of conda environment to run task installation within \
    --num_workers      [Required] [int]    Number of processes to parallelize on \
    --path_conda       [Required] [folder] Path to miniconda or anaconda installation \
    --output_dir       [Required] [folder] Path to directory to write versioned task instances to \
    --testbed          [Required] [folder] Path to testbed directory, for cloning GitHub repos to
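For a concrete picture of an invocation, the sketch below assembles the argument list in Python and prints it without executing anything. Every path and value is a placeholder, and the way the boolean --cleanup flag is passed (as an explicit value) is an assumption; check the script's argument parser before relying on it.

```python
import shlex

# Placeholder values; replace with paths that exist on your machine.
options = {
    "instances_path": "/path/to/candidate_instances",
    "retrieval_method": "github",  # one of "build", "mix", "github"
    "cleanup": "True",             # assumption: the flag takes an explicit value
    "conda_env": "versioning-env",
    "num_workers": "4",
    "path_conda": "/path/to/miniconda3",
    "output_dir": "/path/to/versioned_instances",
    "testbed": "/path/to/testbed",
}


def build_command(opts):
    """Turn an options mapping into an argv list for get_versions.py."""
    command = ["python", "get_versions.py"]
    for name, value in opts.items():
        command += [f"--{name}", value]
    return command


command = build_command(options)
# Print a shell-safe rendering for inspection; nothing is executed here.
print(shlex.join(command))
```

Keeping the options in a mapping makes it easy to review the full set of required flags in one place before handing the list to a process runner.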
Repository-Specific Version Extraction
For certain repositories, SWE-bench provides specialized scripts in the extract_web/ directory that crawl the package's website to find versions and their cutoff dates. These scripts (like get_versions_*.py) can be adapted to other repositories to check task instances' creation_date against the version dates.
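The date comparison can be sketched as a sorted lookup. The release table, the function name, and the timestamp format of creation_date below are hypothetical. The sketch also makes one arbitrary choice: it maps an instance to the newest version released on or before its creation date, which may not match the cutoff rule a given extract_web/ script applies.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical release table: (release date, version), sorted by date.
RELEASES = [
    (date(2019, 1, 15), "1.0"),
    (date(2020, 6, 1), "1.1"),
    (date(2021, 3, 10), "2.0"),
]


def version_at(created, releases=RELEASES):
    """Return the newest version released on or before `created`, or None."""
    dates = [released for released, _ in releases]
    index = bisect_right(dates, created)
    if index == 0:
        return None  # the instance predates the first known release
    return releases[index - 1][1]


# Assumption: creation_date is an ISO 8601 timestamp string.
instance = {
    "instance_id": "example__repo-123",
    "creation_date": "2020-08-20T12:00:00Z",
}
created = date.fromisoformat(instance["creation_date"][:10])
print(version_at(created))
```

A binary search is more than a three-row table needs, but it keeps the lookup cheap when a crawled release history runs to hundreds of entries and many instances must be checked.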
Integration with Evaluation
The version information is used by the evaluation harness to:
- Set up the correct Docker environment for each task
- Install the proper dependencies based on the version
- Ensure consistent evaluation conditions across runs
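One way to picture this dependency is a table of installation specs keyed by repository and version. The table contents, field names, and function below are hypothetical and only illustrate the shape of such a lookup; they do not reproduce the harness's actual data structures.

```python
# Hypothetical specs table keyed by repository, then by version.
INSTALL_SPECS = {
    "example/repo": {
        "1.1": {"python": "3.8", "install": "pip install -e ."},
        "2.0": {"python": "3.10", "install": "pip install -e .[test]"},
    },
}


def lookup_specs(instance, specs=INSTALL_SPECS):
    """Return the install spec for an instance's repo and version."""
    repo, version = instance["repo"], instance["version"]
    try:
        return specs[repo][version]
    except KeyError:
        # Fail loudly: an unversioned or unknown instance cannot be set up.
        raise KeyError(f"no install spec for {repo} at version {version}") from None


instance = {"repo": "example/repo", "version": "2.0"}
print(lookup_specs(instance))
```

Because the same repository and version always resolve to the same spec, two runs on the same task instance start from the same environment definition.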
This versioning system is a key component in making SWE-bench evaluations reproducible and reliable.