User Guide Overview
This guide provides detailed information about using the SWE-bench CLI. Each command is documented with examples and common use cases.
Available Commands
- submit: Submit model predictions for evaluation
- get-report: Retrieve evaluation reports
- list-runs: View all your submitted runs
- delete-run: Remove a specific run
Dataset Information
SWE-bench is available in the following subsets and splits:
Subsets
swe-bench-m
: The main dataset
swe-bench_lite
: A smaller subset for testing and development
Splits
dev
: Development/validation split
test
: Test split (currently only available for swe-bench_lite)
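The availability note above can be checked before a submission is attempted. The following is a minimal sketch, not part of the CLI: the `AVAILABLE_SPLITS` table and the `check_subset_split` helper are hypothetical names introduced here for illustration, and the table simply encodes the subsets and splits as this guide lists them.

```python
# Hypothetical lookup table: encodes the subset/split availability
# exactly as described in this guide (an assumption, not a CLI API).
AVAILABLE_SPLITS = {
    "swe-bench-m": {"dev"},
    "swe-bench_lite": {"dev", "test"},
}


def check_subset_split(subset: str, split: str) -> None:
    """Raise ValueError if the subset/split pair is not listed in this guide."""
    if subset not in AVAILABLE_SPLITS:
        known = ", ".join(sorted(AVAILABLE_SPLITS))
        raise ValueError(f"unknown subset {subset!r}; expected one of: {known}")
    if split not in AVAILABLE_SPLITS[subset]:
        allowed = ", ".join(sorted(AVAILABLE_SPLITS[subset]))
        raise ValueError(
            f"split {split!r} is not available for {subset!r}; available: {allowed}"
        )


# A listed combination passes silently.
check_subset_split("swe-bench_lite", "test")

# An unlisted combination is rejected with an explanatory message.
try:
    check_subset_split("swe-bench-m", "test")
except ValueError as err:
    print(err)
```

Running a check like this locally catches a mistyped subset name (note the mix of hyphen and underscore in the two names) before any predictions are uploaded.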
Common Workflows
- Basic Evaluation
- Development Testing
- Managing Runs