LLMs

Large language models, fine-tuning, RAG, and inference

25 entities indexed

Dataset · LLMs

Wikipedia Dump

by Wikimedia Foundation

The full-text dump of Wikipedia articles, available in over 300 languages, regularly updated and distributed by the Wikimedia Foundation. It is one of the most widely used components in language model pretraining pipelines because of its high factual density, editorial quality, and broad topical coverage.

nlp, encyclopedic, factual
56 C+
Paper · LLMs

Visual Instruction Tuning (LLaVA)

by University of Wisconsin–Madison / Microsoft Research

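Introduced LLaVA (Large Language and Vision Assistant), a multimodal model trained via visual instruction tuning on multimodal instruction-following data generated with language-only GPT-4. LLaVA demonstrates strong multimodal chat abilities, reaching an 85.1% relative score compared with GPT-4 on a synthetic multimodal instruction-following dataset, and reports 92.53% accuracy on Science QA when combined with GPT-4. It helped pioneer open-source visual instruction tuning.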

llava, multimodal, instruction-tuning
54 C+
Paper · LLMs

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

by Princeton University / Google DeepMind

Introduced Tree of Thoughts (ToT), a framework that generalizes chain-of-thought prompting to a tree search over intermediate reasoning steps. ToT enables LLMs to explore multiple reasoning paths, evaluate choices, and backtrack, which improves results on tasks requiring lookahead and planning. On Game of 24, for example, the paper reports GPT-4 solving 4% of tasks with chain-of-thought prompting and 74% with ToT.

tree-of-thoughts, reasoning, search
54 C+
Paper · LLMs

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

by Google Brain

Introduced Switch Transformers, a simplified mixture-of-experts (MoE) architecture that routes each token to exactly one expert (top-1 routing), so parameter count can grow to the trillion scale while per-token compute stays roughly constant. The paper reports up to a 7x pretraining speedup over dense T5 baselines at the same computational budget.
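
Top-1 routing can be illustrated with a small sketch. The code below is not the paper's implementation: the function names, the toy experts, and the router weights are all assumptions made for illustration, and capacity limits and the load-balancing loss are left out.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top1_route(token, router_weights):
    """Return (expert_index, gate_probability) for one token vector."""
    # One router logit per expert: dot product of the expert's row with the token.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

def switch_layer(tokens, router_weights, experts):
    """Send each token to exactly one expert and scale the result by its gate value."""
    outputs = []
    for token in tokens:
        idx, gate = top1_route(token, router_weights)
        expert_out = experts[idx](token)
        outputs.append([gate * v for v in expert_out])
    return outputs

# Hypothetical toy setup: two "experts" and a 2x2 router matrix.
experts = [lambda t: [2.0 * v for v in t], lambda t: [-v for v in t]]
router_weights = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[3.0, 0.0], [0.0, 3.0]]
outputs = switch_layer(tokens, router_weights, experts)
```

Only the selected expert runs for each token, which is why adding experts increases parameters without increasing the work done per token.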

mixture-of-experts, moe, sparse-model
54 C+
Paper · LLMs

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

by Princeton University

Introduced SWE-bench, a benchmark of 2,294 real GitHub issues from 12 popular Python repositories requiring models to resolve issues by writing code patches. SWE-bench reveals that even the best LLMs resolve fewer than 4% of issues with standard techniques, motivating research into code agents.

swe-bench, software-engineering, benchmark
53 C+
Skill · LLMs

Summarization

by AaaS

Condenses long documents into concise summaries while preserving key information and maintaining factual accuracy. Supports extractive, abstractive, and hierarchical summarization with configurable length, style, and focus area parameters.

summarization, condensation, nlp
51 C+
Skill · LLMs

Semantic Search

by AaaS

Enables meaning-based retrieval by converting queries and documents into dense vector representations and finding nearest neighbors. Foundational skill for any RAG pipeline or knowledge-base-powered agent.
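
As a minimal sketch of the nearest-neighbor step, the code below ranks documents by cosine similarity to a query vector. The vectors are hand-written stand-ins for embeddings that a real embedding model would produce, and `cosine`, `search`, and `index` are illustrative names, not part of any AaaS API.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec, index, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Toy "embeddings" (assumed values, three dimensions for readability).
index = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.1],
    "taxes": [0.0, 0.2, 0.9],
}
results = search([1.0, 0.2, 0.0], index, k=2)
```

A brute-force scan like this is fine for small collections; larger indexes typically use an approximate nearest-neighbor structure instead.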

search, embeddings, similarity
51 C+
Benchmark · LLMs

TruthfulQA

by University of Oxford

Measures whether language models generate truthful answers to questions where humans are commonly mistaken. Covers health, law, finance, and politics topics where popular misconceptions and conspiracies create systematic failure modes.

benchmark, evaluation, truthfulness
50 C+
Benchmark · LLMs

WinoGrande

by Allen AI

Large-scale dataset for commonsense coreference resolution inspired by Winograd schemas. Tests whether models can correctly resolve pronoun references based on world knowledge and commonsense reasoning in carefully constructed sentence pairs.

benchmark, evaluation, commonsense
49 C
Skill · LLMs

Text Classification

by AaaS

Automates the categorization of text into predefined classes. This skill leverages large language models to perform zero-shot and multi-label classification, eliminating the need for extensive training data. It can analyze documents, user feedback, or social media posts, assigning relevant labels from a simple list or a complex hierarchical taxonomy.

text-classification, nlp, categorization
49 C
Benchmark · LLMs

TyDi QA

by Clark et al. / Google Research

TyDi QA is a multilingual question-answering benchmark featuring 11 typologically diverse languages. Questions are written natively by speakers of each language, ensuring genuine linguistic challenges and avoiding translation artifacts. It is designed to evaluate reading comprehension across a wide range of language structures.

question-answering, multilingual, typologically-diverse
47 C
Skill · LLMs

Translation

by AaaS

Provides the ability to translate text from a source language to a target language. It aims to preserve the original meaning, tone, and cultural context. The skill supports domain-specific terminology for fields like legal or medical, allows for register control between formal and informal language, and handles idiomatic expressions with contextually appropriate equivalents.

translation, multilingual, localization
47 C
Dataset · LLMs

SlimPajama

by Cerebras

SlimPajama is a cleaned and deduplicated version of the RedPajama dataset, containing 627 billion high-quality tokens. Produced by Cerebras, it demonstrates that training on fewer, higher-quality tokens can match or exceed the performance of models trained on larger, noisier datasets.

nlp, pretraining, deduplicated
47 C
Dataset · LLMs

OpenWebText

by Aaron Gokaslan and Vanya Cohen (Brown University)

OpenWebText is a large-scale, open-source English text corpus created by scraping web pages linked from Reddit. Designed as a public replication of OpenAI's original WebText dataset used for GPT-2, it contains approximately 38 GB of text filtered by Reddit upvotes to ensure a baseline of quality and relevance.

nlp, web-text, reddit
47 C
Dataset · LLMs

LAION-400M Text Captions

by LAION

The text caption component of the LAION-400M dataset, offering 400 million English alt-text captions. These captions were scraped from the web and filtered using CLIP to ensure a minimum similarity to their corresponding images. The text is used independently for large-scale NLP and multimodal research.

nlp, captions, image-text
45 C
Benchmark · LLMs

XL-Sum

by Hasan et al. / Bangladesh University of Engineering and Technology (BUET)

XL-Sum is a large-scale benchmark dataset for multilingual abstractive summarization. It contains 1.35 million article-summary pairs from BBC News across 44 languages, designed to evaluate a model's ability to generate concise summaries across diverse linguistic families and writing systems.

summarization, multilingual, news
44 C
Benchmark · LLMs

SimpleQA

by OpenAI

SimpleQA is a benchmark dataset developed by OpenAI to assess the factual accuracy of language models. It consists of simple, unambiguous questions that have a single, verifiable correct answer. The benchmark is designed to measure a model's ability to recall factual knowledge and, crucially, to abstain from answering when it is uncertain, providing a measure of its calibration.

benchmark, evaluation, factuality
44 C
Dataset · LLMs

PushShift Reddit Dataset

by PushShift.io

A massive, multi-billion token archive of Reddit comments and submissions from 2005 to 2023, collected by the PushShift project. This dataset is a cornerstone for social NLP research, large-scale language model pre-training, and studying the dynamics of online communities and conversational discourse.

nlp, social-media, dialogue
43 C
Paper · LLMs

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

by Idiap Research Institute / EPFL

Shows that replacing softmax attention with a kernel feature map lets the attention computation be reordered to cost time linear in sequence length, and that a causal transformer of this form can be run as an RNN with constant time and memory per generated token. Introduces the linear attention framework that inspired many subsequent efficient attention variants.
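
The RNN view can be checked numerically on a toy example. The sketch below is an illustration rather than the authors' code: it uses the elu(x) + 1 feature map from the paper, but the function names and the tiny inputs are assumptions. It computes causal linear attention twice, once with a running state and once by summing over all previous positions, and the two results should agree.

```python
import math

def phi(x):
    # Feature map elu(x) + 1, which keeps every feature strictly positive.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def linear_attention_recurrent(queries, keys, values):
    """Causal linear attention with a fixed-size state, updated once per step."""
    d_k, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    z = [0.0] * d_k                        # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = phi(k)
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
        fq = phi(q)
        denom = sum(fq[i] * z[i] for i in range(d_k))
        outputs.append(
            [sum(fq[i] * S[i][j] for i in range(d_k)) / denom for j in range(d_v)]
        )
    return outputs

def linear_attention_quadratic(queries, keys, values):
    """Same attention, computed by revisiting every earlier position."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        fq = phi(q)
        weights = [sum(a * b for a, b in zip(fq, phi(k))) for k in keys[: t + 1]]
        total = sum(weights)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values[: t + 1])) / total
             for j in range(d_v)]
        )
    return outputs

# Toy inputs (assumed values): three positions, d_k = 2, d_v = 2.
queries = [[0.5, -1.0], [1.0, 0.2], [-0.3, 0.8]]
keys = [[0.1, 0.4], [-0.7, 1.2], [0.9, -0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]
rec = linear_attention_recurrent(queries, keys, values)
quad = linear_attention_quadratic(queries, keys, values)
```

The recurrent version stores only `S` and `z`, whose sizes do not depend on how many tokens have been processed.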

linear-attention, rnn-equivalence, efficient-transformers
42 C
Skill · LLMs

Tree-of-Thought

by AaaS

Extends chain-of-thought prompting by exploring multiple reasoning paths and evaluating them as a branching tree. The model generates, evaluates, and prunes candidate solutions using breadth-first or depth-first search, which lets it recover from weak intermediate steps instead of committing to a single chain.
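
The breadth-first variant can be sketched as a beam-limited search. In the code below, `propose` and `score` are hypothetical callbacks that would be LLM calls in a real system; here they are replaced by a toy arithmetic task so the example runs on its own.

```python
def tree_of_thought_bfs(root, propose, score, depth, beam_width):
    """Expand every state in the frontier, keep the best few, repeat."""
    frontier = [root]
    for _ in range(depth):
        # Generate: ask for candidate next thoughts from each kept state.
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            break
        # Evaluate and prune: keep only the highest-scoring candidates.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=score)

# Toy stand-in task (an assumption for illustration): pick three steps
# from {1, 2, 3} whose sum is as close as possible to TARGET.
TARGET = 7

def propose(state):
    return [state + (n,) for n in (1, 2, 3)]

def score(state):
    return -abs(TARGET - sum(state))

best = tree_of_thought_bfs((), propose, score, depth=3, beam_width=2)
```

A depth-first variant would instead follow one branch until its score drops below a threshold and then backtrack; the choice mainly trades memory for the number of evaluation calls.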

prompting, reasoning, tree-of-thought
37 D
Model · LLMs

CodeLlama-70B-Instruct-v2

by Meta AI

An advanced instruction-tuned large language model specifically designed for code generation, explanation, and debugging across multiple programming languages. This version offers improved performance and reduced hallucination rates compared to its predecessor.

llm, code-generation, instruction-following
19 F
Model · LLMs

Whisper-Large-v4-Multilingual

by OpenAI

An updated version of OpenAI's Whisper model with expanded language support and improved accuracy for speech-to-text transcription.

ASR, Speech-to-Text, Multilingual
19 F
Model · LLMs

ProteinFold-Mini-v1

by BioCompute AI

A compact yet effective model for accelerated protein structure prediction, leveraging recent advancements in AI for bioinformatics. Ideal for initial screening and rapid hypothesis generation in drug discovery.

bioinformatics, protein-folding, drug-discovery
18 F
Model · LLMs

RLHF-Guard-7B

by Anthropic

A small, efficient model specifically designed for red-teaming and safety alignment of larger language models, identifying harmful outputs.

Safety, LLM, Alignment
18 F
Model · LLMs

RoboAction-VLM-8B

by Google DeepMind

An embodied AI model integrating visual perception with language understanding to enable complex robotic task planning and execution. Designed for general-purpose robotic manipulation in unstructured environments.

robotics, embodied-ai, vlm
14 F