Benchmark · LLMs · v1.0

Needle-in-a-Haystack

by Greg Kamradt (community) · open-source · Last verified 2026-03-17

Needle-in-a-Haystack is a pressure test for long-context language models that places a single fact (the needle) at a specific position within a long document (the haystack) and asks the model to retrieve it. It systematically varies both context length and needle depth to reveal performance degradation patterns.
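The placement step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's own code: the helper name `insert_needle` is hypothetical, and whitespace-separated words stand in for model tokens, which a real run would count with the model's tokenizer.

```python
def insert_needle(haystack: str, needle: str, depth_percent: float) -> str:
    """Place `needle` roughly `depth_percent` of the way through `haystack`.

    Assumption: whitespace-separated words approximate tokens.
    """
    if not 0 <= depth_percent <= 100:
        raise ValueError("depth_percent must be between 0 and 100")
    words = haystack.split()
    # Index of the word before which the needle is inserted.
    position = round(len(words) * depth_percent / 100)
    return " ".join(words[:position] + [needle] + words[position:])


# Toy haystack of ten filler words with the needle placed at 50% depth.
filler = " ".join(f"w{i}" for i in range(10))
context = insert_needle(filler, "NEEDLE", 50)
print(context)
```

A depth of 0 puts the needle at the very start of the context and 100 puts it at the very end, so sweeping the depth value moves the fact through the document.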

https://github.com/gkamradt/LLMTest_NeedleInAHaystack ↗

Grade: B+ (Good)

Adoption: A · Quality: A · Freshness: A · Citations: A · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: evaluation, long-context-evaluation, retrieval-testing
Integrations
Use Cases: model-evaluation, long-context-ai
API Available: No
Evaluated Models: gpt-4o, claude-opus-4, gemini-2-5-pro, llama-3-70b
Metrics: retrieval-accuracy
Methodology: A unique fact is inserted at varying positions (10%–100% depth) within a context built from Paul Graham essays, with context lengths ranging from 1K to 128K tokens. The model is asked to retrieve the fact, and retrieval accuracy is plotted as a heatmap of context length × needle depth.
Last Run: 2026-03-01
Tags: long-context, retrieval, single-fact, pressure-test, context-length
Added: 2026-03-17
Completeness: 100%
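As a rough illustration of the methodology listed above, the sweep over context length × depth can be sketched as a nested loop that fills an accuracy grid. Everything named here is an assumption made for the example: the needle text, the `run_sweep` and `build_context` helpers, the word-based length measure, and the substring-match scoring. The `forgetful_model` function is a toy stand-in for a real model call that only "sees" the last 50 words of its context.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical needle, question, and expected answer for this sketch.
NEEDLE = "The secret code is 7421."
QUESTION = "What is the secret code?"
ANSWER = "7421"


def build_context(filler_words: List[str], length: int, depth_percent: float) -> str:
    """Trim the filler to `length` words and insert the needle at the given depth."""
    body = filler_words[:length]
    position = round(len(body) * depth_percent / 100)
    return " ".join(body[:position] + [NEEDLE] + body[position:])


def run_sweep(
    model: Callable[[str, str], str],
    filler_words: List[str],
    lengths: List[int],
    depths: List[float],
) -> Dict[Tuple[int, float], float]:
    """Score one retrieval attempt per (context length, depth) cell."""
    grid = {}
    for length in lengths:
        for depth in depths:
            context = build_context(filler_words, length, depth)
            reply = model(context, QUESTION)
            # Simplified scoring: 1.0 if the answer string appears in the reply.
            grid[(length, depth)] = 1.0 if ANSWER in reply else 0.0
    return grid


def forgetful_model(context: str, question: str) -> str:
    """Toy stand-in for a model: echoes only the last 50 words it was given."""
    return " ".join(context.split()[-50:])


filler_words = ["lorem"] * 200
grid = run_sweep(forgetful_model, filler_words, lengths=[20, 100], depths=[10, 50, 100])

# Print the grid as a small text table: one row per depth, one column per length.
for depth in [10, 50, 100]:
    row = "  ".join(f"{grid[(length, depth)]:.1f}" for length in [20, 100])
    print(f"depth {depth:>3}%: {row}")
```

The resulting dictionary maps each (length, depth) pair to a score, which is the shape of data a heatmap plot would consume. With a real model, the `model` argument would wrap an API call, and each cell could average several trials instead of a single attempt.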

Index Score

70.4

Score components: Adoption, Quality, Freshness, Citations, Engagement
