User Guide Overview

This guide explains how to use the SWE-bench CLI (sb-cli). Each command is documented with examples and common use cases.

Available Commands

The workflows in this guide use the following commands:

  • submit: Submit a predictions file for evaluation under a run ID
  • get-report: Retrieve the report for a run
  • list-runs: List the runs for a subset and split
  • delete-run: Delete a run by its run ID

Dataset Information

SWE-bench is available in several subsets and splits. Each command shown in this guide takes a subset and a split as its first two arguments:

Subsets

  • swe-bench-m: The main dataset
  • swe-bench_lite: A smaller subset for testing and development

Splits

  • dev: Development/validation split
  • test: Test split (currently only available for swe-bench_lite)
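
Because not every subset offers every split, it can help to check the pair before calling sb-cli. The following is a minimal sketch that encodes only the combinations listed above; `valid_combo` is a hypothetical helper name, not part of sb-cli, and the list would need updating if more subsets or splits become available.

```shell
# Hypothetical helper (not part of sb-cli): returns 0 if the subset/split
# pair matches the combinations listed in this guide, 1 otherwise.
valid_combo() {
  subset="$1"
  split="$2"

  # Only the two subsets named in this guide are accepted.
  case "$subset" in
    swe-bench-m|swe-bench_lite) ;;
    *) return 1 ;;
  esac

  case "$split" in
    dev) return 0 ;;
    # The test split is listed as available only for swe-bench_lite.
    test) [ "$subset" = "swe-bench_lite" ] ;;
    *) return 1 ;;
  esac
}
```

For example, `valid_combo swe-bench_lite test && echo ok` prints `ok`, while `valid_combo swe-bench-m test` returns a non-zero status.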

Common Workflows

  1. Basic Evaluation:

    sb-cli submit swe-bench-m dev --predictions_path preds.json --run_id my_run
    sb-cli get-report swe-bench-m dev my_run
    

  2. Development Testing:

    sb-cli submit swe-bench_lite dev --predictions_path test.json --run_id test_run
    

  3. Managing Runs:

    sb-cli list-runs swe-bench-m dev
    sb-cli delete-run swe-bench-m dev old_run_id
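
The basic evaluation workflow can be wrapped in a small script so that the same subset, split, and run ID are reused for both steps. This is only a sketch: `sb`, `run_eval`, and the `DRY_RUN` variable are hypothetical names introduced for illustration, and the wrapper uses only the `submit` and `get-report` invocations shown above. It assumes nothing about how long a submission takes to evaluate.

```shell
# Hypothetical wrapper (not part of sb-cli). With DRY_RUN=1 it prints the
# sb-cli command line instead of executing it.
sb() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sb-cli $*"
  else
    sb-cli "$@"
  fi
}

# Submit predictions, then request the report for the same run.
# Arguments: subset, split, predictions file, run ID.
run_eval() {
  subset="$1"
  split="$2"
  preds="$3"
  run_id="$4"

  # get-report is attempted only if submit exits successfully.
  sb submit "$subset" "$split" --predictions_path "$preds" --run_id "$run_id" &&
    sb get-report "$subset" "$split" "$run_id"
}
```

Setting `DRY_RUN=1` before calling `run_eval swe-bench-m dev preds.json my_run` prints the two commands from workflow 1 without contacting the service, which is a convenient way to review them first.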