Paper · LLMs · v1.0

Evaluating Large Language Models Trained on Code (Codex)

by OpenAI · paid · Last verified 2026-03-17

Introduces Codex, a GPT language model fine-tuned on publicly available code from GitHub, and HumanEval, a benchmark that measures functional correctness when synthesizing programs from docstrings. A distinct production version of Codex powers GitHub Copilot.

https://arxiv.org/abs/2107.03374
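
The docstring-to-code evaluation described above can be illustrated with a minimal sketch. It assumes a HumanEval-style task consists of a prompt (a function signature plus docstring), a candidate completion, and unit tests that decide functional correctness. The task, the `passes` helper, and the hand-written completion standing in for model output are all hypothetical and are not taken from the benchmark itself.

```python
# Hypothetical HumanEval-style task: the prompt is a signature plus docstring.
PROMPT = '''def add(a, b):
    """Return the sum of a and b."""
'''

# A candidate completion, standing in for model output (hand-written here).
CANDIDATE_BODY = "    return a + b\n"

# Unit tests that define functional correctness for this illustrative task.
TEST = '''def check(candidate):
    assert candidate(1, 2) == 3
    assert candidate(-1, 1) == 0
'''


def passes(prompt: str, completion: str, test: str, entry_point: str) -> bool:
    """Return True if prompt + completion passes the task's unit tests.

    Note: exec() on untrusted code is unsafe; a real harness would run
    generated programs inside a sandbox.
    """
    namespace: dict = {}
    try:
        exec(prompt + completion, namespace)  # define the candidate function
        exec(test, namespace)                 # define check()
        namespace["check"](namespace[entry_point])
        return True
    except Exception:
        # Any syntax error, runtime error, or failed assertion counts as a fail.
        return False


print(passes(PROMPT, CANDIDATE_BODY, TEST, "add"))        # True
print(passes(PROMPT, "    return a - b\n", TEST, "add"))  # False
```

A completion counts as correct only if every assertion passes, so output that looks plausible but computes the wrong result is scored as a failure.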
Grade: B+ (Good)
Adoption: A+ · Quality: A+ · Freshness: B+ · Citations: A+ · Engagement: F

Specifications

License: Proprietary
Pricing: paid
Capabilities: code-generation, code-completion, docstring-to-code, unit-test-generation
Integrations: github-copilot
Use Cases: automated-programming, code-completion, developer-productivity
API Available: No
Tags: codex, code-generation, github-copilot, python, humaneval
Added: 2026-03-17
Completeness: 100%

Index Score

79.2
Adoption: 95
Quality: 90
Freshness: 71
Citations: 93
Engagement: 0
