AI Research Papers
122 landmark AI and LLM research papers ranked by composite score — covering transformers, agents, RAG, fine-tuning, alignment, and more. Each paper is scored on citations, quality, freshness, adoption, and community engagement.
Attention Is All You Need
by Google Brain
Introduced the Transformer architecture, replacing RNNs with self-attention for sequence-to-sequence tasks. This paper fundamentally changed the field of NLP and became the foundation for all modern large language models.
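The paper's core operation, scaled dot-product attention, can be sketched in a few lines of NumPy (single head, no projections or masking):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V — each query attends to all keys."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise query/key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                # weighted sum of values

# Toy example: 3 tokens, 4-dimensional embeddings
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((3, 4)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

The full Transformer runs many such heads in parallel on linearly projected inputs; the sqrt(d_k) scaling keeps the softmax from saturating as dimensionality grows.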
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
by Google AI
Introduced BERT, a bidirectional Transformer pre-trained on masked language modeling and next sentence prediction. Established the pretrain-then-fine-tune paradigm that dominated NLP for years and achieved state-of-the-art on 11 NLP benchmarks.
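BERT's masked-language-modeling corruption scheme (select ~15% of positions; replace 80% of those with [MASK], 10% with a random token, leave 10% unchanged) can be sketched as follows — the tiny vocabulary and seed are illustrative only:

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "dog", "sat", "on", "mat"]  # toy vocabulary for the sketch

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style masking: pick ~15% of positions as prediction targets;
    of those, 80% become [MASK], 10% a random token, 10% stay unchanged."""
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)
    n_mask = max(1, round(mask_prob * len(tokens)))
    for i in rng.sample(range(len(tokens)), n_mask):
        labels[i] = tokens[i]                  # the model must predict the original
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.choice(VOCAB)      # random replacement
        # else: leave the token as-is
    return inputs, labels

inputs, labels = mask_tokens("the cat sat on the mat".split())
```

Keeping some selected tokens unchanged or randomly replaced prevents the model from only learning representations at literal [MASK] positions.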
Learning Transferable Visual Models From Natural Language Supervision (CLIP)
by OpenAI
Introduced CLIP (Contrastive Language-Image Pre-training), a model trained on 400 million image-text pairs using contrastive learning that achieves remarkable zero-shot transfer to diverse vision tasks. CLIP became foundational for vision-language alignment and generative AI pipelines.
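CLIP's training objective is a symmetric contrastive loss over a batch: matching image/text pairs (the diagonal of the similarity matrix) should outscore every mismatched pair. A minimal NumPy sketch, assuming precomputed embeddings:

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric cross-entropy over the image-text similarity matrix."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (batch, batch) cosine similarities
    labels = np.arange(len(logits))             # pair i matches pair i

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(y)), y].mean()

    # Average the image->text and text->image directions
    return (cross_entropy(logits, labels) + cross_entropy(logits.T, labels)) / 2
```

Zero-shot classification then reduces to embedding class-name prompts ("a photo of a dog") and picking the caption most similar to the image.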
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
by Google Brain
Introduced chain-of-thought prompting, a simple technique of providing exemplars with step-by-step reasoning traces in few-shot prompts. This approach dramatically improves LLM performance on arithmetic, commonsense, and symbolic reasoning tasks, with the effect emerging at approximately 100B parameters.
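In practice the technique is just prompt construction. A minimal sketch using the paper's well-known tennis-ball exemplar (the helper name is ours):

```python
# The exemplar includes the reasoning trace before the final answer —
# that trace, not the answer format, is what elicits step-by-step reasoning.
EXEMPLAR = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. \
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. \
5 + 6 = 11. The answer is 11."""

def cot_prompt(question: str) -> str:
    """Prepend worked exemplars so the model imitates step-by-step reasoning."""
    return f"{EXEMPLAR}\n\nQ: {question}\nA:"

print(cot_prompt("A farm has 12 cows and buys 7 more. How many cows are there?"))
```

Standard few-shot prompting would show only question/answer pairs; chain-of-thought differs solely in including the intermediate steps.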
High-Resolution Image Synthesis with Latent Diffusion Models (Stable Diffusion)
by CompVis / Stability AI
Introduced Latent Diffusion Models (LDMs), which perform the diffusion process in a compressed latent space rather than pixel space, dramatically reducing computational cost while maintaining image quality. This work underpins Stable Diffusion, the most widely used open-source image generation model.
Language Models are Few-Shot Learners (GPT-3)
by OpenAI
Introduced GPT-3, a 175B parameter language model demonstrating remarkable few-shot learning capabilities across diverse tasks. Showed that scaling model size dramatically improves in-context learning without gradient updates, reshaping the field.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
by Google Brain
Introduced the Vision Transformer (ViT), demonstrating that a pure transformer applied directly to sequences of image patches achieves state-of-the-art performance on image classification when pretrained on large datasets. The paper challenged the dominance of convolutional neural networks in computer vision.
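The "image as words" step is a pure reshape: split the image into fixed-size patches and flatten each into a vector. A sketch (omitting the linear projection, class token, and position embeddings):

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into a sequence of flattened patch vectors —
    the 'words' fed to the Transformer."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    x = image.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)              # group pixels by patch
    return x.reshape(-1, patch * patch * C)     # (num_patches, patch*patch*C)

img = np.zeros((224, 224, 3))
seq = patchify(img)
print(seq.shape)  # (196, 768): 14x14 patches of 16*16*3 = 768 values each
```

At 224x224 resolution with 16x16 patches the sequence has 196 tokens, which is why the title calls an image "16x16 words".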
Training Language Models to Follow Instructions with Human Feedback
by OpenAI
Presents InstructGPT, which uses Reinforcement Learning from Human Feedback (RLHF) to align GPT-3 with human intent. By fine-tuning on human demonstrations and training a reward model on human preference comparisons, InstructGPT produces outputs that human evaluators prefer to GPT-3 outputs despite having 100× fewer parameters.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
by Facebook AI Research
Introduces Retrieval-Augmented Generation (RAG), combining parametric memory (language model weights) with non-parametric memory (dense retrieval over Wikipedia) for knowledge-intensive NLP tasks. RAG models achieve state-of-the-art on open-domain QA benchmarks and produce more specific, factual, and diverse responses than pure parametric models.
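The retrieve-then-generate pattern can be sketched end to end. Here the embedding function is a hashed character-trigram stand-in purely for illustration; RAG itself uses a trained dense retriever (DPR) and a seq2seq generator:

```python
import numpy as np

def embed(text):
    """Stand-in embedding (hashed character trigrams) — a real system
    would use a learned dense encoder."""
    v = np.zeros(256)
    for i in range(len(text) - 2):
        v[hash(text[i:i + 3]) % 256] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query, docs, k=2):
    """Non-parametric memory: rank documents by similarity to the query."""
    q = embed(query)
    scores = [q @ embed(d) for d in docs]
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

def rag_prompt(query, docs):
    """Condition the generator on retrieved evidence."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

Because the knowledge lives in the retrieved documents rather than the model weights, the corpus can be updated without retraining.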
Proximal Policy Optimization Algorithms
by OpenAI
PPO introduces a clipped surrogate objective that constrains policy update step sizes, achieving the stability of trust-region methods (TRPO) with the simplicity and scalability of first-order optimizers. It quickly became the dominant RL algorithm for training large language models with human feedback.
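The clipped surrogate objective itself is short enough to state directly in code:

```python
import numpy as np

def ppo_clip_loss(log_probs, old_log_probs, advantages, eps=0.2):
    """PPO's clipped surrogate (to be maximized):
    E[min(r * A, clip(r, 1-eps, 1+eps) * A)], where r = pi_new / pi_old."""
    ratio = np.exp(log_probs - old_log_probs)           # probability ratio r
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()        # pessimistic bound
```

Taking the minimum removes the incentive to push the ratio outside [1-eps, 1+eps], which is what keeps updates trust-region-sized without TRPO's second-order machinery.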
Highly Accurate Protein Structure Prediction with AlphaFold
by DeepMind
AlphaFold 2 achieves atomic-level accuracy in protein structure prediction by combining evolutionary information from multiple sequence alignments with a novel Evoformer architecture and structure module, solving a 50-year grand challenge in biology. Its predictions have been released for virtually all known proteins and have accelerated drug discovery, enzyme design, and structural biology worldwide.
GPT-4 Technical Report
by OpenAI
Technical report for GPT-4, OpenAI's multimodal large language model accepting image and text inputs. Demonstrates state-of-the-art performance on academic and professional benchmarks, including passing the bar exam in the top 10% of test takers.
Segment Anything
by Meta AI
Introduced the Segment Anything Model (SAM) and the SA-1B dataset of 1 billion masks on 11 million images. SAM is a promptable segmentation foundation model that generalizes to new image distributions and tasks without additional training, enabling a new paradigm of interactive segmentation.
Evaluating Large Language Models Trained on Code (Codex)
by OpenAI
Introduced Codex, a GPT language model fine-tuned on publicly available code from GitHub, and the HumanEval benchmark for measuring code synthesis from docstrings. Codex powers GitHub Copilot and represents a breakthrough in automated programming assistance.
ReAct: Synergizing Reasoning and Acting in Language Models
by Google / Princeton
Introduces ReAct, a paradigm that combines reasoning traces and task-specific actions in language models. By interleaving thinking steps with tool calls, ReAct agents outperform chain-of-thought and act-only baselines on diverse tasks including question answering, fact verification, and interactive decision-making.
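The control flow is a simple loop. A minimal sketch in which `llm` and the tool registry are placeholders — any chat model and tool set fit this shape:

```python
def react_agent(question, llm, tools, max_steps=5):
    """Interleave Thought -> Action -> Observation until the model answers."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                 # model emits its next step
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):
            name, _, arg = step.removeprefix("Action:").strip().partition(" ")
            result = tools[name](arg)          # execute the tool call
            transcript += f"Observation: {result}\n"
    return None
```

The observation is appended to the transcript, so each subsequent reasoning step is grounded in real tool output rather than the model's parametric guesses.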
LoRA: Low-Rank Adaptation of Large Language Models
by Microsoft Research
Introduces LoRA, which freezes pretrained model weights and injects trainable low-rank decomposition matrices into Transformer layers. Reduces trainable parameters by 10,000× and GPU memory by 3× with no inference latency overhead, enabling efficient LLM fine-tuning.
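The adapted forward pass replaces a frozen weight W with W + (alpha/r)·BA, where only the low-rank factors A and B are trained. A NumPy sketch:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """y = x W^T + (alpha/r) * x A^T B^T. W is frozen; only A (r x d_in) and
    B (d_out x r) train. B @ A can be merged into W at deploy time, so there
    is no extra inference latency."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

d_in, d_out, r = 512, 512, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init
x = rng.standard_normal((1, d_in))
y = lora_forward(x, W, A, B)                # equals x @ W.T at initialization
```

With r=8 the adapter trains 2·r·512 = 8,192 parameters per layer versus 512² = 262,144 for full fine-tuning; initializing B to zero makes the adapted model start out identical to the pretrained one.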
LLaMA: Open and Efficient Foundation Language Models
by Meta AI
Introduces LLaMA, a collection of foundation language models ranging from 7B to 65B parameters, trained on publicly available datasets. Showed that smaller models trained on more tokens can match or exceed larger models, democratizing LLM research.
Deep Reinforcement Learning from Human Preferences
by OpenAI
This foundational RLHF paper shows that human preference comparisons between agent behaviors can train a reward model that guides deep RL agents in complex tasks like Atari games and MuJoCo locomotion, without hand-crafted reward functions. The approach reduces human labeling effort by ~3 orders of magnitude compared to direct reward specification.
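The reward model is trained on pairwise comparisons with a Bradley-Terry style objective: the probability that the preferred behavior wins is the sigmoid of the reward difference. A one-function sketch:

```python
import numpy as np

def preference_loss(r_preferred, r_rejected):
    """Negative log-likelihood of the human's choice under
    P(preferred wins) = sigmoid(r_preferred - r_rejected).
    log1p(exp(-x)) is a numerically stable -log(sigmoid(x))."""
    return np.mean(np.log1p(np.exp(-(r_preferred - r_rejected))))
```

Minimizing this pushes the reward model to score preferred clips above rejected ones; the learned reward then stands in for a hand-crafted reward function during RL.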
Gemini: A Family of Highly Capable Multimodal Models
by Google DeepMind
Introduced the Gemini family of multimodal models (Ultra, Pro, Nano) natively trained to process and combine text, images, audio, and video. Gemini Ultra is the first model to surpass human expert performance on MMLU and achieves state-of-the-art across 30 of 32 benchmarks evaluated.
Efficient Memory Management for Large Language Model Serving with PagedAttention
by UC Berkeley
Introduced PagedAttention and the vLLM serving system, which manages the KV cache in non-contiguous physical memory blocks inspired by OS paging, enabling near-zero memory waste and efficient sharing of KV cache across requests. vLLM achieves 2-4× higher throughput at the same latency than state-of-the-art serving systems such as FasterTransformer and Orca.
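The bookkeeping idea can be sketched as a toy block table (real vLLM also stores the actual key/value tensors and supports copy-on-write sharing, omitted here):

```python
class PagedKVCache:
    """Toy PagedAttention bookkeeping: each sequence's KV cache is a list of
    fixed-size blocks drawn from a shared pool, so waste is bounded by one
    partially filled block per sequence instead of a preallocated max length."""

    def __init__(self, num_blocks, block_size=16):
        self.block_size = block_size
        self.free = list(range(num_blocks))   # shared pool of physical blocks
        self.tables = {}                      # seq_id -> list of block ids
        self.lengths = {}                     # seq_id -> tokens cached so far

    def append_token(self, seq_id):
        """Reserve cache space for one new token; allocate a fresh block
        only when the sequence's last block is full."""
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:          # last block full (or first token)
            self.tables.setdefault(seq_id, []).append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def release(self, seq_id):
        """Return a finished sequence's blocks to the shared pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because blocks are allocated on demand and returned on completion, memory freed by a finished request is immediately available to others — the property that drives vLLM's throughput gains.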
Generative Agents: Interactive Simulacra of Human Behavior
by Stanford University / Google
Introduces generative agents—computational software agents that simulate believable human behavior—by combining a large language model with memory streams, reflection synthesis, and planning mechanisms. Twenty-five agents populate a virtual town, exhibiting emergent social behaviors including relationship formation, information propagation, and event coordination.
Hierarchical Text-Conditional Image Generation with CLIP Latents (DALL-E 2)
by OpenAI
Presented DALL-E 2 (unCLIP), a hierarchical text-conditional image generation system using CLIP image embeddings as a prior and a diffusion decoder. The system achieves state-of-the-art photorealism and text-image alignment, substantially advancing the field of text-to-image synthesis.
Self-Consistency Improves Chain of Thought Reasoning in Language Models
by Google Brain
Introduced self-consistency, a decoding strategy that samples diverse reasoning paths from a language model and returns the most consistent answer by marginalizing out the reasoning paths. Self-consistency is a simple, training-free technique that substantially improves chain-of-thought prompting across arithmetic and commonsense reasoning tasks.
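The aggregation step is a majority vote over extracted final answers. A sketch, assuming (as in chain-of-thought prompts) that each sample ends with "The answer is ...":

```python
from collections import Counter

def self_consistent_answer(samples):
    """Marginalize over reasoning paths: extract each sampled completion's
    final answer and return the most frequent one."""
    answers = [s.rsplit("The answer is", 1)[-1].strip(" .") for s in samples]
    return Counter(answers).most_common(1)[0][0]

samples = [
    "5 + 6 = 11. The answer is 11.",
    "Adding the cans gives 6, plus 5 is 11. The answer is 11.",
    "5 * 2 + 3 = 13. The answer is 13.",   # a faulty path gets outvoted
]
```

Distinct reasoning paths rarely agree on the same wrong answer, so sampling (with temperature) and voting filters out individual faulty chains at the cost of extra inference calls.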
Missing a paper?
Submit any AI research paper to the index. Our pipeline automatically scores it on citations, quality, and community impact.
Submit a Paper