Channel

LLMs

Large language models, fine-tuning, RAG, and inference

30 entities in this channel

Paper · LLMs

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

by Google AI

Introduced BERT, a bidirectional Transformer pre-trained on masked language modeling and next sentence prediction. Established the pretrain-then-fine-tune paradigm that dominated NLP for years and achieved state-of-the-art results on eleven NLP tasks.

bert · pre-training · bidirectional
82.8 · A
Paper · LLMs

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

by Google Brain

Introduced chain-of-thought prompting, a simple technique of providing exemplars with step-by-step reasoning traces in few-shot prompts. This approach dramatically improves LLM performance on arithmetic, commonsense, and symbolic reasoning tasks, with the effect emerging at approximately 100B parameters.

chain-of-thought · reasoning · prompting
82.1 · A
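
To make the technique concrete, here is a minimal sketch of a chain-of-thought few-shot prompt, paraphrasing the paper's well-known tennis-ball exemplar; the helper function and the final question are illustrative, not part of the paper's prompt set.

```python
# One worked exemplar with an explicit reasoning trace, followed by the
# target question. Exemplar paraphrased from the paper's tennis-ball example.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: {question}
A:"""

def build_cot_prompt(question: str) -> str:
    """Append a new question after the worked exemplar."""
    return COT_PROMPT.format(question=question)

print(build_cot_prompt("A bus has 12 rows of 4 seats. How many seats in total?"))
```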
Paper · LLMs

Language Models are Few-Shot Learners (GPT-3)

by OpenAI

Introduced GPT-3, a 175B parameter language model demonstrating remarkable few-shot learning capabilities across diverse tasks. Showed that scaling model size dramatically improves in-context learning without gradient updates, reshaping the field.

gpt-3 · few-shot · in-context-learning
82 · A
Paper · LLMs

GPT-4 Technical Report

by OpenAI

Technical report for GPT-4, OpenAI's multimodal large language model accepting image and text inputs. Demonstrates state-of-the-art performance on academic and professional benchmarks, including passing a simulated bar exam with a score in the top 10% of test takers.

gpt-4 · multimodal · rlhf
81 · A
Dataset · LLMs

Wikipedia Dump

by Wikimedia Foundation

The full text dump of Wikipedia articles available in over 300 languages, regularly updated and distributed by the Wikimedia Foundation. It is one of the most universally included components in language model pretraining pipelines due to its high factual density, editorial quality, and broad topical coverage.

nlp · encyclopedic · factual
80.2 · A
Paper · LLMs

Evaluating Large Language Models Trained on Code (Codex)

by OpenAI

Introduced Codex, a GPT language model fine-tuned on publicly available code from GitHub, and the HumanEval benchmark for measuring code synthesis from docstrings. Codex powers GitHub Copilot and represents a breakthrough in automated programming assistance.

codex · code-generation · github-copilot
79.2 · B+
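
As a sketch of the HumanEval format: each problem presents a function signature and docstring as the prompt, and a generated completion counts as correct only if hidden unit tests pass (this is what pass@k measures). The problem below is invented for illustration, not an actual HumanEval item.

```python
# Prompt shown to the model: signature plus docstring, body omitted.
PROMPT = '''def is_palindrome(s: str) -> bool:
    """Return True if s reads the same forwards and backwards."""
'''

def check(candidate) -> bool:
    """Hidden unit tests run against the model's completion."""
    return candidate("level") is True and candidate("hello") is False

# A correct completion makes the hidden tests pass:
def is_palindrome(s: str) -> bool:
    return s == s[::-1]

assert check(is_palindrome)
```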
Model · LLMs

GPT-5

by OpenAI

OpenAI's frontier model with advanced reasoning, native multimodal understanding, and robust function calling. Designed for complex enterprise workflows and agentic applications.

llm · reasoning · multimodal
78.7 · B+
Model · LLMs

GPT-4o

by OpenAI

OpenAI's natively multimodal flagship model processing text, image, and audio inputs with a single unified architecture. Delivers GPT-4 Turbo-level intelligence at 2x speed and 50% lower cost, with breakthrough real-time voice capabilities.

llm · multimodal · omni
78.1 · B+
Paper · LLMs

LLaMA: Open and Efficient Foundation Language Models

by Meta AI

Introduced LLaMA, a collection of foundation language models ranging from 7B to 65B parameters, trained on publicly available datasets. Showed that smaller models trained on more tokens can match or exceed larger models, democratizing LLM research.

llama · open-source · efficient
78.1 · B+
Model · LLMs

Claude 4

by Anthropic

Anthropic's most capable model featuring advanced reasoning, coding, and multimodal capabilities. Excels at complex analysis, agentic tasks, and extended thinking with industry-leading safety.

llm · reasoning · coding
78 · B+
Model · LLMs

GPT-4

by OpenAI

OpenAI's breakthrough large language model that demonstrated a significant leap in reasoning and factual accuracy over GPT-3.5. Widely adopted across enterprise and developer workflows for code generation, analysis, and complex problem-solving.

llm · reasoning · multimodal
77.9 · B+
Model · LLMs

Claude 3.5 Sonnet

by Anthropic

Anthropic's breakout model that surpassed Claude 3 Opus at Sonnet-tier pricing, setting new industry benchmarks for coding. Introduced computer use capability and became the most popular model on the API due to its exceptional intelligence-to-cost ratio.

llm · coding · multimodal
77.7 · B+
Skill · LLMs

Chain-of-Thought

by AaaS

Guides LLMs to produce step-by-step reasoning before arriving at a final answer. Dramatically improves performance on math, logic, and multi-step problems by making the model's reasoning process explicit and verifiable.

prompting · reasoning · chain-of-thought
76.6 · B+
Skill · LLMs

Prompt Engineering

by AaaS

The foundational discipline of crafting effective prompts to elicit desired behaviors from language models. Covers system prompt design, instruction formatting, output structuring, temperature tuning, and iterative prompt refinement techniques.

prompting · engineering · optimization
76.5 · B+
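
A minimal sketch of the levers this entry names, using the OpenAI Python SDK (openai>=1.0): a system prompt that fixes role and output structure, plus a low temperature for more deterministic output. The model name and prompt text are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.2,  # low temperature favors deterministic, factual output
    messages=[
        {"role": "system", "content": (
            "You are a precise technical assistant. "
            "Answer in exactly three bullet points."
        )},
        {"role": "user", "content": "Why does deduplication help pretraining?"},
    ],
)
print(response.choices[0].message.content)
```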
Dataset · LLMs

Common Crawl

by Common Crawl Foundation

The world's largest open repository of web crawl data, maintained by the non-profit Common Crawl Foundation and updated with new crawls roughly monthly since 2011. It forms the foundational raw data layer for virtually every major language model pretraining pipeline, including GPT-3, LLaMA, PaLM, and Falcon, typically after quality filtering and deduplication steps.

nlp · web-crawl · massive-scale
76.4 · B+
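
An illustrative sketch of the quality-filtering and deduplication step mentioned above; production pipelines such as CCNet or RefinedWeb are far more elaborate, and the heuristics and thresholds here are invented for demonstration.

```python
import hashlib

def keep(doc: str, seen: set[str]) -> bool:
    """Toy heuristics: drop short pages, non-prose junk, and exact duplicates."""
    words = doc.split()
    if len(words) < 50:  # too short to be useful prose
        return False
    if sum(len(w) for w in words) / len(words) > 12:  # likely markup or code debris
        return False
    digest = hashlib.sha256(doc.encode()).hexdigest()
    if digest in seen:  # exact-duplicate removal via content hash
        return False
    seen.add(digest)
    return True

seen: set[str] = set()
raw_documents: list[str] = []  # stand-in for page texts extracted from a crawl
filtered = [d for d in raw_documents if keep(d, seen)]
```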
Model · LLMs

BERT

by Google

BERT (Bidirectional Encoder Representations from Transformers) is Google's landmark 2018 language model that introduced the bidirectional pre-training paradigm using masked language modeling and next sentence prediction. It revolutionized NLP by demonstrating that a single pre-trained model could achieve state-of-the-art results across dozens of downstream tasks with minimal fine-tuning.

foundational · google · transformer
76.3 · B+
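
The masked-language-modeling objective is easy to see at inference time; a minimal sketch using the Hugging Face transformers pipeline with the public bert-base-uncased checkpoint:

```python
from transformers import pipeline

# BERT predicts the [MASK] token using context on both sides of it.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```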
Benchmark · LLMs

GSM8K

by OpenAI

Grade School Math 8K benchmark with 8,500 linguistically diverse grade school math word problems requiring 2-8 step reasoning. Tests basic mathematical reasoning and arithmetic with problems that require sequential multi-step solutions.

benchmark · evaluation · math
75.7 · B+
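
GSM8K stores each gold answer after a "#### " marker, so evaluation typically compares the final number in the model's reasoning trace against it. A minimal scoring sketch; the problem text is abridged from the dataset's first training example.

```python
import re

def final_number(text: str) -> str | None:
    """Return the last integer-like token in a reasoning trace."""
    matches = re.findall(r"-?\d[\d,]*", text)
    return matches[-1].replace(",", "") if matches else None

gold_field = "Natalia sold 48 + 24 = 72 clips altogether.\n#### 72"
gold = gold_field.split("####")[-1].strip()

model_output = "She sold 48 in April and half as many, 24, in May. 48 + 24 = 72."
assert final_number(model_output) == gold  # exact match on the final answer
```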
Benchmark · LLMs

MATH

by UC Berkeley

Collection of 12,500 competition mathematics problems from AMC, AIME, and other math competitions covering algebra, geometry, number theory, combinatorics, and more. Problems require multi-step reasoning and mathematical insight beyond pattern matching.

benchmark · evaluation · mathematics
74.4 · B+
Benchmark · LLMs

ARC-AGI

by Chollet / ARC Prize Foundation

ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence) measures fluid intelligence through visual grid transformation puzzles. Models must infer transformation rules from three or fewer examples and apply them to a test grid — a task trivially solved by humans but historically extremely difficult for AI systems.

agi · abstract-reasoning · visual-patterns
74.1 · B+
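
ARC-AGI tasks ship as JSON with train/test pairs of integer grids (colors encoded 0-9). The toy task below, a horizontal mirror, is invented for illustration; real ARC-AGI rules are far harder to induce.

```python
task = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 0], [0, 4]], "output": [[0, 3], [4, 0]]},
    ],
    "test": [{"input": [[5, 0], [0, 6]]}],
}

def mirror(grid):
    """The rule a solver would have to infer from the train pairs."""
    return [row[::-1] for row in grid]

assert all(mirror(p["input"]) == p["output"] for p in task["train"])
print(mirror(task["test"][0]["input"]))  # [[0, 5], [6, 0]]
```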
Benchmark · LLMs

HellaSwag

by Allen AI

Evaluates commonsense natural language inference by asking models to select the most plausible continuation of a scenario. Uses adversarially filtered endings generated by language models, making it challenging for machines while trivial for humans.

benchmark · evaluation · commonsense
74 · B+
Skill · LLMs

Few-Shot Learning

by AaaS

Teaches LLMs to perform tasks by providing a small number of input-output examples in the prompt. Enables rapid task adaptation without fine-tuning by demonstrating the desired pattern through carefully selected, representative examples.

prompting · few-shot · examples
73.5 · B+
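
A minimal sketch of few-shot prompt construction for a labeling task; the example pairs are invented. Unlike chain-of-thought, the shots demonstrate only input-output pairs, with no reasoning traces.

```python
EXAMPLES = [
    ("The movie was a masterpiece.", "positive"),
    ("I want my money back.", "negative"),
    ("It was fine, nothing special.", "neutral"),
]

def few_shot_prompt(text: str) -> str:
    """Demonstrate the task with labeled pairs, then append the new input."""
    shots = "\n".join(f"Review: {x}\nSentiment: {y}" for x, y in EXAMPLES)
    return f"{shots}\nReview: {text}\nSentiment:"

print(few_shot_prompt("Surprisingly good, I'd watch it again."))
```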
Benchmark · LLMs

MLPerf Inference

by MLCommons

MLPerf Inference is the industry-standard benchmark for measuring AI inference performance across hardware platforms. It covers image classification, object detection, NLP, speech recognition, and generative AI workloads, enabling fair apples-to-apples comparison of accelerators and inference stacks.

inference · throughput · latency
73.1 · B+
Benchmark · LLMs

ARC Challenge

by Allen AI

AI2 Reasoning Challenge featuring grade-school science questions that require commonsense reasoning and world knowledge. The Challenge set contains questions that simple retrieval and co-occurrence methods fail to answer correctly.

benchmark · evaluation · science
73.1 · B+
Dataset · LLMs

BookCorpus

by University of Toronto

A dataset of over 11,000 unpublished books spanning fiction and non-fiction genres, originally scraped from Smashwords and used as the primary pretraining corpus for BERT alongside Wikipedia. It provides rich long-range dependency data that helps models learn coherent narrative structure and extended discourse patterns.

nlp · books · long-form
71.3 · B+
Skill · LLMs

Summarization

by AaaS

Condenses long documents into concise summaries while preserving key information and maintaining factual accuracy. Supports extractive, abstractive, and hierarchical summarization with configurable length, style, and focus area parameters.

summarization · condensation · nlp
69.8 · B
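
A sketch of a parameterized summarization prompt reflecting the length, style, and focus knobs described above; the template and its fields are illustrative, not a standard API.

```python
TEMPLATE = (
    "Summarize the document below in at most {max_sentences} sentences, "
    "in a {style} style, focusing on {focus}. "
    "Preserve all numbers and named entities exactly.\n\n{document}"
)

def summarization_prompt(document: str, max_sentences: int = 3,
                         style: str = "neutral", focus: str = "key findings") -> str:
    """Fill the template with the requested length, style, and focus."""
    return TEMPLATE.format(max_sentences=max_sentences, style=style,
                           focus=focus, document=document)

print(summarization_prompt("(document text here)", max_sentences=2, style="plain"))
```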
Skill · LLMs

RAG Retrieval

by AaaS

A technique that enhances large language models by dynamically retrieving relevant information from an external knowledge base. This process grounds the model's responses in factual data, reducing hallucinations and enabling it to answer questions about information not present in its original training data.

rag · retrieval-augmented-generation · llm
68.3 · B
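
The end-to-end shape of one RAG step, as a sketch: retrieve the most relevant passages, then ground the generation prompt in them. The embed() and generate() callables stand in for a real embedding model and LLM; they are assumptions, not a fixed API.

```python
import numpy as np

def retrieve(query: str, passages: list[str], embed, k: int = 3) -> list[str]:
    """Rank passages by cosine similarity to the query embedding."""
    q = embed(query)
    q = q / np.linalg.norm(q)
    vecs = [embed(p) for p in passages]
    scores = [float(np.dot(q, v / np.linalg.norm(v))) for v in vecs]
    top = np.argsort(scores)[::-1][:k]
    return [passages[i] for i in top]

def rag_answer(query: str, passages: list[str], embed, generate) -> str:
    """Ground the generation prompt in the retrieved passages."""
    context = "\n\n".join(retrieve(query, passages, embed))
    prompt = ("Answer using only the context below. If the answer is not in "
              f"the context, say you don't know.\n\nContext:\n{context}\n\n"
              f"Question: {query}")
    return generate(prompt)
```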
Skill · LLMs

Semantic Search

by AaaS

Enables meaning-based retrieval by converting queries and documents into dense vector representations and finding nearest neighbors. Foundational skill for any RAG pipeline or knowledge-base-powered agent.

search · embeddings · similarity
67.6 · B
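
A minimal nearest-neighbor search sketch using the sentence-transformers library with the public all-MiniLM-L6-v2 checkpoint; any embedding model would do, and the documents are invented.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "Reset your password from the account settings page.",
    "Our office is closed on public holidays.",
    "Invoices are emailed on the first of each month.",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)

query_vec = model.encode(["how do I change my login credentials"],
                         normalize_embeddings=True)[0]
best = int(np.argmax(doc_vecs @ query_vec))  # cosine similarity via dot product
print(docs[best])  # matches on meaning, not shared keywords
```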
Dataset · LLMs

OpenWebText

by Aaron Gokaslan & Vanya Cohen

OpenWebText is a large-scale, open-source English text corpus created by scraping web pages linked from Reddit. Designed as a public replication of OpenAI's original WebText dataset used for GPT-2, it contains approximately 38 GB of text filtered by Reddit upvotes to ensure a baseline of quality and relevance.

nlp · web-text · reddit
66.4 · B
Dataset · LLMs

LAION-400M Text Captions

by LAION

The text caption component of the LAION-400M dataset, offering 400 million English alt-text captions. These captions were scraped from the web and filtered using CLIP to ensure a minimum similarity to their corresponding images. The text is used independently for large-scale NLP and multimodal research.

nlp · captions · image-text
66.3 · B
Dataset · LLMs

SlimPajama

by Cerebras

SlimPajama is a cleaned and deduplicated version of the RedPajama dataset, containing 627 billion high-quality tokens. Produced by Cerebras, it demonstrates that training on fewer, higher-quality tokens can match or exceed the performance of models trained on larger, noisier datasets.

nlp · pretraining · deduplicated
65.5 · B