AI Research Papers
122 landmark AI and LLM research papers ranked by composite score — covering transformers, agents, RAG, fine-tuning, alignment, and more. Each paper is scored on citations, quality, freshness, adoption, and community engagement.
Attention Is All You Need
by Google Brain
Introduced the Transformer architecture, replacing RNNs with self-attention for sequence-to-sequence tasks. This paper fundamentally changed the field of NLP and became the foundation for all modern large language models.
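The paper's core operation, scaled dot-product attention, can be sketched in a few lines of NumPy (single head, no projections or masking):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V — each query attends to all keys."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise query/key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                # weighted sum of values

# Toy example: 3 tokens, 4-dimensional embeddings
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((3, 4)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

The full Transformer runs many such heads in parallel on linearly projected inputs; the sqrt(d_k) scaling keeps the softmax from saturating as dimensionality grows.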
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
by Google AI
Introduced BERT, a bidirectional Transformer pre-trained on masked language modeling and next sentence prediction. Established the pretrain-then-fine-tune paradigm that dominated NLP for years and achieved state-of-the-art on 11 NLP benchmarks.
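BERT's masked-language-modeling corruption scheme (select ~15% of positions; replace 80% of those with [MASK], 10% with a random token, leave 10% unchanged) can be sketched as follows — the tiny vocabulary and seed are illustrative only:

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "dog", "sat", "on", "mat"]  # toy vocabulary for the sketch

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style masking: pick ~15% of positions as prediction targets;
    of those, 80% become [MASK], 10% a random token, 10% stay unchanged."""
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)
    n_mask = max(1, round(mask_prob * len(tokens)))
    for i in rng.sample(range(len(tokens)), n_mask):
        labels[i] = tokens[i]                  # the model must predict the original
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.choice(VOCAB)      # random replacement
        # else: leave the token as-is
    return inputs, labels

inputs, labels = mask_tokens("the cat sat on the mat".split())
```

Keeping some selected tokens unchanged or randomly replaced prevents the model from only learning representations at literal [MASK] positions.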
Learning Transferable Visual Models From Natural Language Supervision (CLIP)
by OpenAI
Introduced CLIP (Contrastive Language-Image Pre-training), a model trained on 400 million image-text pairs using contrastive learning that achieves remarkable zero-shot transfer to diverse vision tasks. CLIP became foundational for vision-language alignment and generative AI pipelines.
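CLIP's training objective is a symmetric contrastive loss over a batch: matching image/text pairs (the diagonal of the similarity matrix) should outscore every mismatched pair. A minimal NumPy sketch, assuming precomputed embeddings:

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric cross-entropy over the image-text similarity matrix."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (batch, batch) cosine similarities
    labels = np.arange(len(logits))             # pair i matches pair i

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(y)), y].mean()

    # Average the image->text and text->image directions
    return (cross_entropy(logits, labels) + cross_entropy(logits.T, labels)) / 2
```

Zero-shot classification then reduces to embedding class-name prompts ("a photo of a dog") and picking the caption most similar to the image.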
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
by Google Brain
Introduced chain-of-thought prompting, a simple technique of providing exemplars with step-by-step reasoning traces in few-shot prompts. This approach dramatically improves LLM performance on arithmetic, commonsense, and symbolic reasoning tasks, with the effect emerging at approximately 100B parameters.
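In practice the technique is just prompt construction. A minimal sketch using the paper's well-known tennis-ball exemplar (the helper name is ours):

```python
# The exemplar includes the reasoning trace before the final answer —
# that trace, not the answer format, is what elicits step-by-step reasoning.
EXEMPLAR = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. \
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. \
5 + 6 = 11. The answer is 11."""

def cot_prompt(question: str) -> str:
    """Prepend worked exemplars so the model imitates step-by-step reasoning."""
    return f"{EXEMPLAR}\n\nQ: {question}\nA:"

print(cot_prompt("A farm has 12 cows and buys 7 more. How many cows are there?"))
```

Standard few-shot prompting would show only question/answer pairs; chain-of-thought differs solely in including the intermediate steps.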
High-Resolution Image Synthesis with Latent Diffusion Models (Stable Diffusion)
by CompVis / Stability AI
Introduced Latent Diffusion Models (LDMs), which perform the diffusion process in a compressed latent space rather than pixel space, dramatically reducing computational cost while maintaining image quality. This work underpins Stable Diffusion, the most widely used open-source image generation model.
Language Models are Few-Shot Learners (GPT-3)
by OpenAI
Introduced GPT-3, a 175B parameter language model demonstrating remarkable few-shot learning capabilities across diverse tasks. Showed that scaling model size dramatically improves in-context learning without gradient updates, reshaping the field.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
by Google Brain
Introduced the Vision Transformer (ViT), demonstrating that a pure transformer applied directly to sequences of image patches achieves state-of-the-art performance on image classification when pretrained on large datasets. The paper challenged the dominance of convolutional neural networks in computer vision.
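The "image as words" step is a pure reshape: split the image into fixed-size patches and flatten each into a vector. A sketch (omitting the linear projection, class token, and position embeddings):

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into a sequence of flattened patch vectors —
    the 'words' fed to the Transformer."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    x = image.reshape(H // patch, patch, W // patch, patch, C)
    x = x.transpose(0, 2, 1, 3, 4)              # group pixels by patch
    return x.reshape(-1, patch * patch * C)     # (num_patches, patch*patch*C)

img = np.zeros((224, 224, 3))
seq = patchify(img)
print(seq.shape)  # (196, 768): 14x14 patches of 16*16*3 = 768 values each
```

At 224x224 resolution with 16x16 patches the sequence has 196 tokens, which is why the title calls an image "16x16 words".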
Training Language Models to Follow Instructions with Human Feedback
by OpenAI
Presents InstructGPT, which uses Reinforcement Learning from Human Feedback (RLHF) to align GPT-3 with human intent. By fine-tuning on human demonstrations and training a reward model on human preference comparisons, InstructGPT produces outputs that human evaluators prefer to GPT-3 outputs despite having 100× fewer parameters.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
by Facebook AI Research
Introduces Retrieval-Augmented Generation (RAG), combining parametric memory (language model weights) with non-parametric memory (dense retrieval over Wikipedia) for knowledge-intensive NLP tasks. RAG models achieve state-of-the-art on open-domain QA benchmarks and produce more specific, factual, and diverse responses than pure parametric models.
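The retrieve-then-generate pattern can be sketched end to end. Here the embedding function is a hashed character-trigram stand-in purely for illustration; RAG itself uses a trained dense retriever (DPR) and a seq2seq generator:

```python
import numpy as np

def embed(text):
    """Stand-in embedding (hashed character trigrams) — a real system
    would use a learned dense encoder."""
    v = np.zeros(256)
    for i in range(len(text) - 2):
        v[hash(text[i:i + 3]) % 256] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query, docs, k=2):
    """Non-parametric memory: rank documents by similarity to the query."""
    q = embed(query)
    scores = [q @ embed(d) for d in docs]
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

def rag_prompt(query, docs):
    """Condition the generator on retrieved evidence."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

Because the knowledge lives in the retrieved documents rather than the model weights, the corpus can be updated without retraining.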
Proximal Policy Optimization Algorithms
by OpenAI
PPO introduces a clipped surrogate objective that constrains policy update step sizes, achieving the stability of trust-region methods (TRPO) with the simplicity and scalability of first-order optimizers. It quickly became the dominant RL algorithm for training large language models with human feedback.
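The clipped surrogate objective itself is short enough to state directly in code:

```python
import numpy as np

def ppo_clip_loss(log_probs, old_log_probs, advantages, eps=0.2):
    """PPO's clipped surrogate (to be maximized):
    E[min(r * A, clip(r, 1-eps, 1+eps) * A)], where r = pi_new / pi_old."""
    ratio = np.exp(log_probs - old_log_probs)           # probability ratio r
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()        # pessimistic bound
```

Taking the minimum removes the incentive to push the ratio outside [1-eps, 1+eps], which is what keeps updates trust-region-sized without TRPO's second-order machinery.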
Highly Accurate Protein Structure Prediction with AlphaFold
by DeepMind
AlphaFold 2 achieves atomic-level accuracy in protein structure prediction by combining evolutionary information from multiple sequence alignments with a novel Evoformer architecture and structure module, solving a 50-year grand challenge in biology. Its predictions have been released for virtually all known proteins and have accelerated drug discovery, enzyme design, and structural biology worldwide.
GPT-4 Technical Report
by OpenAI
Technical report for GPT-4, OpenAI's multimodal large language model accepting image and text inputs. Demonstrates state-of-the-art performance on academic and professional benchmarks, including passing the bar exam in the top 10% of test takers.
Segment Anything
by Meta AI
Introduced the Segment Anything Model (SAM) and the SA-1B dataset of 1 billion masks on 11 million images. SAM is a promptable segmentation foundation model that generalizes to new image distributions and tasks without additional training, enabling a new paradigm of interactive segmentation.
Evaluating Large Language Models Trained on Code (Codex)
by OpenAI
Introduced Codex, a GPT language model fine-tuned on publicly available code from GitHub, and the HumanEval benchmark for measuring code synthesis from docstrings. Codex powers GitHub Copilot and represents a breakthrough in automated programming assistance.
ReAct: Synergizing Reasoning and Acting in Language Models
by Google / Princeton
Introduces ReAct, a paradigm that combines reasoning traces and task-specific actions in language models. By interleaving thinking steps with tool calls, ReAct agents outperform chain-of-thought and act-only baselines on diverse tasks including question answering, fact verification, and interactive decision-making.
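The control flow is a simple loop. A minimal sketch in which `llm` and the tool registry are placeholders — any chat model and tool set fit this shape:

```python
def react_agent(question, llm, tools, max_steps=5):
    """Interleave Thought -> Action -> Observation until the model answers."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                 # model emits its next step
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if step.startswith("Action:"):
            name, _, arg = step.removeprefix("Action:").strip().partition(" ")
            result = tools[name](arg)          # execute the tool call
            transcript += f"Observation: {result}\n"
    return None
```

The observation is appended to the transcript, so each subsequent reasoning step is grounded in real tool output rather than the model's parametric guesses.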
LoRA: Low-Rank Adaptation of Large Language Models
by Microsoft Research
Introduces LoRA, which freezes pretrained model weights and injects trainable low-rank decomposition matrices into Transformer layers. Reduces trainable parameters by 10,000× and GPU memory by 3× with no inference latency overhead, enabling efficient LLM fine-tuning.
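The adapted forward pass replaces a frozen weight W with W + (alpha/r)·BA, where only the low-rank factors A and B are trained. A NumPy sketch:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """y = x W^T + (alpha/r) * x A^T B^T. W is frozen; only A (r x d_in) and
    B (d_out x r) train. B @ A can be merged into W at deploy time, so there
    is no extra inference latency."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

d_in, d_out, r = 512, 512, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init
x = rng.standard_normal((1, d_in))
y = lora_forward(x, W, A, B)                # equals x @ W.T at initialization
```

With r=8 the adapter trains 2·r·512 = 8,192 parameters per layer versus 512² = 262,144 for full fine-tuning; initializing B to zero makes the adapted model start out identical to the pretrained one.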
LLaMA: Open and Efficient Foundation Language Models
by Meta AI
Introduces LLaMA, a collection of foundation language models ranging from 7B to 65B parameters, trained on publicly available datasets. Showed that smaller models trained on more tokens can match or exceed larger models, democratizing LLM research.
Deep Reinforcement Learning from Human Preferences
by OpenAI
This foundational RLHF paper shows that human preference comparisons between agent behaviors can train a reward model that guides deep RL agents in complex tasks like Atari games and MuJoCo locomotion, without hand-crafted reward functions. The approach reduces human labeling effort by ~3 orders of magnitude compared to direct reward specification.
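The reward model is trained on pairwise comparisons with a Bradley-Terry style objective: the probability that the preferred behavior wins is the sigmoid of the reward difference. A one-function sketch:

```python
import numpy as np

def preference_loss(r_preferred, r_rejected):
    """Negative log-likelihood of the human's choice under
    P(preferred wins) = sigmoid(r_preferred - r_rejected).
    log1p(exp(-x)) is a numerically stable -log(sigmoid(x))."""
    return np.mean(np.log1p(np.exp(-(r_preferred - r_rejected))))
```

Minimizing this pushes the reward model to score preferred clips above rejected ones; the learned reward then stands in for a hand-crafted reward function during RL.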
Gemini: A Family of Highly Capable Multimodal Models
by Google DeepMind
Introduced the Gemini family of multimodal models (Ultra, Pro, Nano) natively trained to process and combine text, images, audio, and video. Gemini Ultra is the first model to surpass human expert performance on MMLU and achieves state-of-the-art across 30 of 32 benchmarks evaluated.
Efficient Memory Management for Large Language Model Serving with PagedAttention
by UC Berkeley
Introduced PagedAttention and the vLLM serving system, which manages the KV cache in non-contiguous physical memory blocks inspired by OS paging, enabling near-zero memory waste and efficient sharing of KV cache across requests. vLLM achieves 2-4× higher throughput at the same latency than state-of-the-art serving systems such as FasterTransformer and Orca.
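The bookkeeping idea can be sketched as a toy block table (real vLLM also stores the actual key/value tensors and supports copy-on-write sharing, omitted here):

```python
class PagedKVCache:
    """Toy PagedAttention bookkeeping: each sequence's KV cache is a list of
    fixed-size blocks drawn from a shared pool, so waste is bounded by one
    partially filled block per sequence instead of a preallocated max length."""

    def __init__(self, num_blocks, block_size=16):
        self.block_size = block_size
        self.free = list(range(num_blocks))   # shared pool of physical blocks
        self.tables = {}                      # seq_id -> list of block ids
        self.lengths = {}                     # seq_id -> tokens cached so far

    def append_token(self, seq_id):
        """Reserve cache space for one new token; allocate a fresh block
        only when the sequence's last block is full."""
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:          # last block full (or first token)
            self.tables.setdefault(seq_id, []).append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def release(self, seq_id):
        """Return a finished sequence's blocks to the shared pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because blocks are allocated on demand and returned on completion, memory freed by a finished request is immediately available to others — the property that drives vLLM's throughput gains.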
Generative Agents: Interactive Simulacra of Human Behavior
by Stanford University / Google
Introduces generative agents—computational software agents that simulate believable human behavior—by combining a large language model with memory streams, reflection synthesis, and planning mechanisms. Twenty-five agents populate a virtual town, exhibiting emergent social behaviors including relationship formation, information propagation, and event coordination.
Hierarchical Text-Conditional Image Generation with CLIP Latents (DALL-E 2)
by OpenAI
Presented DALL-E 2 (unCLIP), a hierarchical text-conditional image generation system using CLIP image embeddings as a prior and a diffusion decoder. The system achieves state-of-the-art photorealism and text-image alignment, substantially advancing the field of text-to-image synthesis.
Self-Consistency Improves Chain of Thought Reasoning in Language Models
by Google Brain
Introduced self-consistency, a decoding strategy that samples diverse reasoning paths from a language model and returns the most consistent answer by marginalizing out the reasoning paths. Self-consistency is a simple, training-free technique that substantially improves chain-of-thought prompting across arithmetic and commonsense reasoning tasks.
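The aggregation step is a majority vote over extracted final answers. A sketch, assuming (as in chain-of-thought prompts) that each sample ends with "The answer is ...":

```python
from collections import Counter

def self_consistent_answer(samples):
    """Marginalize over reasoning paths: extract each sampled completion's
    final answer and return the most frequent one."""
    answers = [s.rsplit("The answer is", 1)[-1].strip(" .") for s in samples]
    return Counter(answers).most_common(1)[0][0]

samples = [
    "5 + 6 = 11. The answer is 11.",
    "Adding the cans gives 6, plus 5 is 11. The answer is 11.",
    "5 * 2 + 3 = 13. The answer is 13.",   # a faulty path gets outvoted
]
```

Distinct reasoning paths rarely agree on the same wrong answer, so sampling (with temperature) and voting filters out individual faulty chains at the cost of extra inference calls.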
Missing a paper?
Submit any AI research paper to the index. Our pipeline automatically scores it on citations, quality, and community impact.
Submit a Paper