Paper · LLMs · v1.0

Evaluating Large Language Models Trained on Code (Codex)

by OpenAI · paid · Last verified 2026-03-17

Introduces Codex, a GPT language model fine-tuned on publicly available code from GitHub, and HumanEval, a benchmark that measures the functional correctness of programs synthesized from docstrings. A distinct production version of Codex powers GitHub Copilot.

https://arxiv.org/abs/2107.03374
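
The paper scores HumanEval results with the pass@k metric: for each problem, n samples are generated, c of them pass the unit tests, and pass@k estimates the probability that at least one of k samples is correct. The following is a minimal sketch of that unbiased estimator, 1 - C(n-c, k) / C(n, k), using only the Python standard library. The function name `pass_at_k` and the example numbers are illustrative assumptions, not results or code taken from the paper.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that passed the unit tests
    k: number of samples the metric is allowed to draw
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples are incorrect, any draw of k
    # samples must contain at least one correct sample.
    if n - c < k:
        return 1.0
    # Probability that all k drawn samples are incorrect is
    # C(n-c, k) / C(n, k); pass@k is its complement.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts for a single problem: 200 samples, 20 correct.
    for k in (1, 10, 100):
        print(f"pass@{k} = {pass_at_k(200, 20, k):.4f}")
```

A benchmark-level score would average this value over all problems. For k = 1 the estimator reduces to the fraction of correct samples, c / n.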

C+ (Average)

Adoption: A+ · Quality: A+ · Freshness: B+ · Citations: F · Engagement: F

Specifications

License: Proprietary
Pricing: paid
Capabilities: code-generation, code-completion, docstring-to-code, unit-test-generation
Integrations: github-copilot
Use Cases: automated-programming, code-completion, developer-productivity
API Available: No
Tags: codex, code-generation, github-copilot, python, humaneval
Added: 2026-03-17
Completeness: 100%
