Dataset · AI for Code · v1.0

CodeSearchNet

by GitHub / Microsoft Research · open-source · Last verified 2026-03-17

A dataset and benchmark challenge for code retrieval and search, containing 2 million (code, documentation) pairs in six programming languages (Python, Java, JavaScript, PHP, Ruby, and Go), curated by GitHub and Microsoft Research. It is a standard benchmark for code-to-natural-language and natural-language-to-code retrieval tasks and is widely used to evaluate code embedding models.

https://huggingface.co/datasets/code_search_net
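
To make the retrieval task concrete, here is a minimal sketch of how natural-language-to-code retrieval over (code, documentation) pairs might be scored. It assumes mean reciprocal rank (MRR) as the metric and uses a simple bag-of-words cosine similarity in place of a learned embedding model. The toy pairs and all function names are hypothetical illustrations; none of them come from the dataset or from an official evaluation script.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit (underscores included)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_reciprocal_rank(queries, codes):
    """Assumes queries[i] describes codes[i]; every code is a candidate for every query."""
    code_vecs = [Counter(tokenize(c)) for c in codes]
    total = 0.0
    for i, query in enumerate(queries):
        q_vec = Counter(tokenize(query))
        scores = [cosine(q_vec, c_vec) for c_vec in code_vecs]
        # Rank = 1 + number of candidates scoring strictly higher than the true match
        # (ties are resolved in favour of the true match, an optimistic choice).
        rank = 1 + sum(1 for s in scores if s > scores[i])
        total += 1.0 / rank
    return total / len(queries)


# Hypothetical toy pairs, not taken from the dataset.
PAIRS = [
    ("read a json file from disk",
     "def read_json(path):\n    with open(path) as file:\n        return json.load(file)"),
    ("compute the sum of a list",
     "def list_sum(values):\n    return sum(values)"),
    ("reverse a string",
     "def reverse_string(s):\n    return s[::-1]"),
]

if __name__ == "__main__":
    queries = [q for q, _ in PAIRS]
    codes = [c for _, c in PAIRS]
    print(f"MRR: {mean_reciprocal_rank(queries, codes):.3f}")
```

In practice the similarity function would be replaced by the embedding model under evaluation, and the candidate pool would be far larger than three snippets.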
Overall grade: B+ (Good)
Adoption: B+ · Quality: A · Freshness: C · Citations: A · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: code-search, code-documentation, evaluation
Integrations: hugging-face
Use Cases: code-retrieval, documentation-generation, model-evaluation
API Available: Yes
Tags: code, code-search, documentation, function-docstring, evaluation
Added: 2026-03-17
Completeness: 100%

Index Score

Overall: 70.4
Adoption: 78
Quality: 86
Freshness: 45
Citations: 88
Engagement: 0
