Overview

SWE-bench Multimodal

SWE-bench Multimodal augments the original benchmark with 517 issues that contain visual elements such as:

  • Screenshots of bugs or interface issues
  • Design mockups or wireframes
  • Diagrams explaining desired functionality
  • Error messages with visual context

This extension evaluates models' ability to interpret and act on information presented in both textual and visual formats.
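
As a rough illustration of what "both textual and visual formats" means for a single issue, the sketch below separates an issue body into its prose and the images it embeds. It is not part of the benchmark's tooling: the `IssueContent` class, the `split_issue` helper, and the example URL are hypothetical, and it assumes images are embedded with Markdown image syntax.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Matches Markdown image syntax: ![alt text](url)
# Assumption: images in an issue body are embedded this way.
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


@dataclass
class IssueContent:
    """Hypothetical container for the two modalities of one issue."""
    text: str
    image_urls: List[str] = field(default_factory=list)


def split_issue(body: str) -> IssueContent:
    """Separate an issue body into its prose and its embedded image URLs."""
    image_urls = MARKDOWN_IMAGE.findall(body)
    # Remove the image markup so only the textual description remains.
    text = MARKDOWN_IMAGE.sub("", body).strip()
    return IssueContent(text=text, image_urls=image_urls)


# Illustrative input only; the URL is a placeholder.
example = "The legend overlaps the chart.\n\n![screenshot](https://example.com/bug.png)"
issue = split_issue(example)
```

A model evaluated on such an issue would need both parts: the prose in `issue.text` and the content of the images referenced in `issue.image_urls`.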

Correspondence

For questions about SWE-bench Multimodal, please contact:

Citation

If you use SWE-bench Multimodal in your research, please cite our paper:

@inproceedings{yang2025swebench,
    title={SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?},
    author={John Yang and Carlos E Jimenez and Alex L Zhang and Kilian Lieret and Joyce Yang and Xindi Wu and Ori Press and Niklas Muennighoff and Gabriel Synnaeve and Karthik R Narasimhan and Diyi Yang and Sida Wang and Ofir Press},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=riTiq3i21b}
}