Paper · LLMs · v1.0

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

by Princeton University · free · Last verified 2026-03-17

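Introduces SWE-bench, a benchmark of 2,294 software engineering problems drawn from real GitHub issues and their corresponding pull requests across 12 popular Python repositories. Given a codebase and an issue description, a model must produce a code patch that resolves the issue, and the patch is checked by running the repository's tests. Even the best-performing model evaluated in the paper, Claude 2, resolves only 1.96% of issues. This result has motivated research into code agents.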
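The resolution criterion behind these numbers can be sketched in a few lines. This is a minimal sketch that assumes per-test outcomes have already been collected by running the repository's test suite on the patched codebase. Following the public SWE-bench harness, an instance counts as resolved only if three conditions hold:

- the patch applies,
- every FAIL_TO_PASS test (tests the reference fix makes pass) passes, and
- every PASS_TO_PASS test (tests that must not regress) passes.

The `InstanceResult` record, the function names, and the demo instance IDs are hypothetical and are not part of the official tooling.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceResult:
    """Hypothetical record of one SWE-bench task after applying a model's patch."""
    instance_id: str
    patch_applied: bool
    # Test name -> whether it passed on the patched repository.
    test_outcomes: dict[str, bool] = field(default_factory=dict)
    fail_to_pass: list[str] = field(default_factory=list)  # tests the fix must make pass
    pass_to_pass: list[str] = field(default_factory=list)  # tests that must keep passing


def is_resolved(r: InstanceResult) -> bool:
    """Resolved only if the patch applies and every required test passes.
    A required test with no recorded outcome is treated as a failure."""
    if not r.patch_applied:
        return False
    required = r.fail_to_pass + r.pass_to_pass
    return all(r.test_outcomes.get(t, False) for t in required)


def resolved_rate(results: list[InstanceResult]) -> float:
    """Percentage of task instances resolved (0.0 for an empty list)."""
    if not results:
        return 0.0
    return 100.0 * sum(is_resolved(r) for r in results) / len(results)


# Usage with made-up instances: one fixed, one whose fix fails, one whose patch does not apply.
results = [
    InstanceResult("demo__repo-1", True, {"test_bug": True, "test_old": True}, ["test_bug"], ["test_old"]),
    InstanceResult("demo__repo-2", True, {"test_bug": False, "test_old": True}, ["test_bug"], ["test_old"]),
    InstanceResult("demo__repo-3", False),
]
print(f"{resolved_rate(results):.1f}%")  # 33.3%
```

Counting PASS_TO_PASS tests alongside FAIL_TO_PASS tests means a patch that fixes the reported bug but breaks existing behavior is still scored as unresolved.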

https://arxiv.org/abs/2310.06770

Rating: B+ (Good)

Adoption: A
Quality: A+
Freshness: A
Citations: B+
Engagement: F

Specifications

License: MIT
Pricing: free
Capabilities: software-engineering-evaluation, code-patch-generation, issue-resolution
Integrations: GitHub, git, Python
Use Cases: ai-software-engineering, code-agent-evaluation, benchmark-research
API Available: No
Tags: swe-bench, software-engineering, benchmark, github, code-agents
Added: 2026-03-17
Completeness: 100%

Index Score: 71.3
