
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

by Princeton University · free · Last verified 2026-03-17

Introduced SWE-bench, a benchmark of 2,294 real GitHub issues from 12 popular Python repositories; given an issue and the codebase it was filed against, a model must resolve it by generating a code patch. SWE-bench reveals that even the best LLMs resolve fewer than 4% of issues with standard techniques, motivating research into code agents.

https://arxiv.org/abs/2310.06770
Overall grade: B+ (Good)
Adoption: A · Quality: A+ · Freshness: A · Citations: B+ · Engagement: F
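
As the summary above describes, each SWE-bench task pairs a GitHub issue with the repository state it was filed against, and a model must emit a patch that resolves it. Below is a minimal sketch of loading and inspecting the benchmark, assuming the public release on the Hugging Face Hub under "princeton-nlp/SWE-bench"; the field names reflect that release and may change between versions.

    # Sketch: load SWE-bench task instances from the Hugging Face Hub
    # (assumes the "princeton-nlp/SWE-bench" dataset id; verify field
    # names against the current release before relying on them).
    from datasets import load_dataset

    swebench = load_dataset("princeton-nlp/SWE-bench", split="test")
    print(len(swebench))  # 2,294 task instances in the full test split

    task = swebench[0]
    print(task["instance_id"])        # unique task id, e.g. "astropy__astropy-12907"
    print(task["repo"])               # source repository the issue was filed against
    print(task["base_commit"])        # commit to check the repository out at
    print(task["problem_statement"])  # the issue text the model must resolve

A model's output for each instance is a unified diff against the checkout at base_commit; the dataset's gold "patch" and "test_patch" fields record the reference fix and the tests that adjudicate it.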

Specifications

License
MIT
Pricing
free
Capabilities
software-engineering-evaluation, code-patch-generation, issue-resolution
Integrations
None listed
Use Cases
ai-software-engineering, code-agent-evaluation, benchmark-research
API Available
No
Tags
swe-bench, software-engineering, benchmark, github, code-agents
Added
2026-03-17
Completeness
100%
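
The capabilities above (code-patch-generation, issue-resolution) are scored by execution rather than through an API, consistent with "API Available: No": you run the evaluation harness locally against a predictions file. Here is a minimal sketch of writing that file, assuming the three-key JSON format documented in the SWE-bench repository; key names may differ across harness versions, and the diff shown is a placeholder.

    # Sketch: build a predictions file for the SWE-bench evaluation harness.
    # The keys below follow the format documented in the SWE-bench repo;
    # treat them as assumptions and check the harness version you install.
    import json

    predictions = [
        {
            "instance_id": "astropy__astropy-12907",  # which task the patch targets
            "model_name_or_path": "my-model",          # label for the system under test
            "model_patch": "diff --git a/... b/...",   # placeholder unified diff
        }
    ]

    with open("preds.json", "w") as f:
        json.dump(predictions, f)

For each entry, the harness checks the repository out at the task's base_commit, applies model_patch, and reruns the issue's tests; an instance counts as resolved only when the previously failing tests pass and the previously passing ones still do.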

Index Score

71.3
Adoption
85
Quality
93
Freshness
84
Citations
75
Engagement
0
