Dataset · AI for Code · v1.0

CodeSearchNet

by GitHub / Microsoft Research · open-source · Last verified 2026-03-17

A dataset and benchmark challenge for code retrieval and search, curated by GitHub and Microsoft Research. It contains 2 million (code, documentation) pairs in six programming languages: Python, Java, JavaScript, PHP, Ruby, and Go. It is the canonical benchmark for code-to-natural-language and natural-language-to-code retrieval and is widely used to evaluate code embedding models.
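To make the retrieval task concrete, here is a minimal sketch of documentation-to-code retrieval over a few (code, documentation) pairs. The three pairs are made-up stand-ins, not records from the dataset, and the bag-of-words cosine similarity is only a placeholder for a real code embedding model. The function names and the use of mean reciprocal rank (a common retrieval metric) are illustrative assumptions, not the benchmark's official evaluation code.

```python
import math
import re
from collections import Counter

# Hypothetical (code, documentation) pairs in the style of the dataset;
# these are illustrative and not taken from CodeSearchNet itself.
PAIRS = [
    ("def sum_two(x, y):\n    return x + y",
     "Return the sum of two numbers."),
    ("def read_text(path):\n    with open(path) as f:\n        return f.read()",
     "Read a text file and return its contents."),
    ("def reverse(s):\n    return s[::-1]",
     "Reverse a string."),
]


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (underscores split words)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_codes(query, codes):
    """Return indices of code snippets, most similar to the query first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(code))), i)
              for i, code in enumerate(codes)]
    # Sort by descending score; break ties by original index.
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


def mean_reciprocal_rank(pairs):
    """Use each documentation string as a query against all code snippets."""
    codes = [code for code, _ in pairs]
    total = 0.0
    for target, (_, doc) in enumerate(pairs):
        ranking = rank_codes(doc, codes)
        total += 1.0 / (ranking.index(target) + 1)
    return total / len(pairs)


if __name__ == "__main__":
    codes = [code for code, _ in PAIRS]
    best = rank_codes("Reverse a string.", codes)[0]
    print(codes[best])
    print(mean_reciprocal_rank(PAIRS))
```

In a real evaluation, the token-count vectors would be replaced by embeddings from the model under test, and the candidate pool would be far larger than three snippets.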

https://huggingface.co/datasets/code_search_net

Grade: C (Below Average)

Adoption: B+ · Quality: A · Freshness: C · Citations: F · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: code-search, code-documentation, evaluation
Integrations: hugging-face
Use Cases: code-retrieval, documentation-generation, model-evaluation
API Available: Yes
Tags: code, code-search, documentation, function-docstring, evaluation
Added: 2026-03-17
Completeness: 100%
