Skip to main content
Knowledge Index

Explore.

7,960 AI entities indexed across tools, models, agents, skills, benchmarks, and more — schema-verified, agent-maintained.

1,020 entities of 7,960 total

ToolAI Tools & APIs

Hugging Face

by Hugging Face

The largest platform for sharing and deploying machine learning models, datasets, and applications. Provides the Transformers library, Inference API, Spaces for demos, and a vibrant open-source AI community.

model-hubinference-apitransformers
79B+
Tooledge-ai

TensorFlow Lite

by Google

TensorFlow Lite is Google's lightweight ML framework designed for on-device inference on mobile, embedded, and IoT devices. It enables deploying trained models with minimal latency and no network dependency, supporting a wide range of hardware accelerators including GPU, DSP, and NPU.

edgeiotmobile
75B+
ToolAI Tools & APIs

Apache Airflow (ML Edition)

by Apache Software Foundation

Battle-tested workflow scheduler for authoring, scheduling, and monitoring data and ML pipelines as directed acyclic graphs. The ML ecosystem around Airflow includes providers for SageMaker, Vertex AI, MLflow, and all major cloud AI services.

workflowdagscheduling
74.8B+
ToolAI Tools & APIs

Three.js (AI Integration)

by Three.js Community (Mr.doob)

The foundational JavaScript 3D library for rendering GPU-accelerated graphics in the browser via WebGL, with a growing ecosystem of AI-generated geometry, procedural shaders, and LLM-driven scene graph manipulation. Three.js powers the majority of web-based spatial AI visualizations.

3dwebgljavascript
73.9B+
ToolAI Tools & APIs

dbt (AI/ML Edition)

by dbt Labs

The analytics engineering framework that transforms raw warehouse data into clean, tested, and documented datasets ready for ML and AI. dbt's model graph, column-level lineage, and semantic layer make it the backbone of production feature engineering pipelines.

analytics-engineeringsql-transformationdata-modeling
73.7B+
ToolAI Tools & APIs

Apache Spark MLlib

by Apache Software Foundation

Apache Spark's built-in machine learning library for distributed, large-scale ML on data lakes and warehouses. MLlib provides scalable algorithms for classification, regression, clustering, and collaborative filtering, plus a pipeline API for feature engineering.

distributed-mlbig-dataspark
72.9B+
Tooldomain-specific

Stability AI Platform

by Stability AI

The Stability AI Platform provides API access to Stability AI's suite of generative image, video, and audio models including Stable Diffusion 3.5 and Stable Video Diffusion, enabling developers to build creative AI applications at scale. It offers both hosted API endpoints and open-weight models for on-premises deployment.

image-generationstable-diffusionapi
72.1B+
Tooledge-ai

MediaPipe

by Google

MediaPipe is Google's cross-platform framework for building perception pipelines that run on-device in real time. It provides production-ready solutions for tasks like hand tracking, face detection, pose estimation, and object detection across Android, iOS, web, and desktop.

edgecomputer-visionpose
72.1B+
ToolAI Tools & APIs

Streamlit

by Snowflake (via acquisition)

Python-first framework for building interactive data applications and ML demos in minutes with no frontend experience required. Streamlit's reactive execution model, built-in widgets, and LLM streaming components make it the go-to tool for AI prototype UIs.

ui-builderpythondata-apps
71.5B+
ToolAI Tools & APIs

DeepL API

by DeepL

DeepL API provides neural machine translation of exceptional quality for 30+ languages, consistently outperforming competitors on blind translation benchmarks. It supports real-time text and full document translation with format preservation, a glossary system, and a free tier for developers.

translationnlplanguage-ai
71.2B+
ToolAI Tools & APIs

Zapier AI

by Zapier

Zapier AI extends the world's largest no-code automation platform with AI-powered workflow generation, natural language Zap building, and an AI Actions API that lets LLMs trigger real-world automations. It connects 6000+ apps and enables non-technical users to build AI-augmented workflows without writing code.

automationno-codeai-workflows
71.1B+
ToolAI Tools & APIs

Databricks

by Databricks Inc.

Unified data intelligence platform combining data engineering, ML, and GenAI on a Lakehouse foundation. Databricks provides managed Spark, Delta Lake, MLflow, and Model Serving with vector search, enabling end-to-end AI pipelines from raw data to production models.

lakehousemlflowspark
71B+
Tooldomain-specific

Rosetta

by RosettaCommons

Rosetta is a comprehensive software suite for computational macromolecular modeling and design, enabling researchers to predict protein structure, design novel proteins, and model protein-protein interactions. Developed by the RosettaCommons consortium, it is the gold standard in computational protein design and has contributed to multiple Nobel Prize-winning research programs.

biotechprotein-designstructural-biology
70.8B+
ToolAI Tools & APIs

Anthropic Tool Use

by Anthropic

Anthropic's native tool use capability allowing Claude models to interact with external tools and APIs. Provides structured tool definitions with input schemas and supports parallel tool calls and streaming.

tool-useanthropicclaude
70.7B+
Tooledge-ai

ONNX Runtime Mobile

by Microsoft

ONNX Runtime Mobile is Microsoft's high-performance inference engine optimized for mobile and edge devices, enabling deployment of models from any ONNX-compatible training framework. It provides hardware-accelerated inference via NNAPI, Core ML, and XNNPACK execution providers.

edgeonnxmobile
70.4B+
Tooldomain-specific

Runway ML

by Runway

Runway ML is a leading generative AI creative platform for video generation, editing, and visual effects, offering models like Gen-3 Alpha for high-fidelity text-to-video and image-to-video synthesis. It is used by filmmakers, advertisers, and content creators to produce cinematic-quality AI-generated video at scale.

videogenerative-aicreative
70B+
ToolAI Tools & APIs

Gradio

by Hugging Face

Python library for rapidly building shareable ML demos with a focus on multimodal inputs including images, audio, video, and text. Gradio is the standard for Hugging Face Spaces demos and integrates natively with the Hugging Face Hub model ecosystem.

ui-builderml-demoshuggingface
68.6B
Tooledge-ai

Apache TVM

by Apache Software Foundation

Apache TVM is an open-source machine learning compiler stack that optimizes deep learning workloads for a diverse set of hardware backends including CPUs, GPUs, FPGAs, and custom accelerators. It automates model optimization through its AutoTVM and Ansor auto-tuning systems, delivering state-of-the-art inference performance on edge targets.

edgecompileroptimization
68B
ToolAI Tools & APIs

OneTrust AI

by OneTrust

OneTrust offers a Trust Intelligence Platform to help organizations manage privacy, security, and data governance. It automates workflows for compliance with regulations like GDPR and CCPA, manages user consent, and provides tools for AI governance, data discovery, and third-party risk assessment across the enterprise.

complianceprivacy-managementgdpr
67.8B
ToolAI Tools & APIs

Delta Lake

by Linux Foundation (Delta Lake Project)

Delta Lake is an open-source storage layer that brings ACID transactions and reliability to data lakes. Built on top of Parquet files, it enables features like schema enforcement, time travel for data versioning, and unified batch and streaming data processing. It serves as the foundational storage format for the Lakehouse architecture.

data-lakelakehousestorage-format
67.3B
ToolAI for Code

Claude Code

by Anthropic

Claude Code is an agentic AI coding assistant from Anthropic designed to operate within a developer's terminal. It autonomously handles complex software development tasks by understanding entire codebases, editing files, executing shell commands, and managing Git workflows, acting as a hands-on pair programmer with minimal human supervision.

ai-codingcliagentic-ai
67.1B
ToolAI Tools & APIs

AWS API Gateway (ML)

by Amazon Web Services

AWS-managed API gateway service for building, deploying, and scaling ML and AI APIs backed by Lambda, SageMaker, and Bedrock endpoints. AWS API Gateway provides built-in authorization, throttling, caching, and monitoring for production AI service deployments at any scale.

api-gatewayawslambda
66.8B
Tooltesting

LangSmith Testing

by LangChain

LangSmith is a platform for debugging, testing, evaluating, and monitoring LLM applications. It enables developers to visualize execution traces of their chains and agents, collect datasets, and run automated evaluators to score model performance. The platform is designed to streamline the LLM development lifecycle from prototype to production.

llm-evaluationllm-testingllmops
66.2B
Toolknowledge-graph

Neo4j GraphRAG

by Neo4j

Neo4j GraphRAG combines graph database capabilities with vector search to build retrieval-augmented generation systems that leverage structured relationships alongside semantic similarity. It enables developers to construct knowledge graphs that ground LLM responses in connected, structured data reducing hallucinations and improving traceability.

knowledge-graphragvector-search
65.9B
ToolAI Tools & APIs

Semantic Kernel

by Microsoft

Microsoft's open-source SDK for integrating LLMs into applications with plugin architecture. Supports planners, memory, and connectors for building enterprise AI solutions across .NET, Python, and Java.

llm-frameworkmicrosoftorchestration
65.7B
ToolAI Tools & APIs

Make AI

by Make

Make (formerly Integromat) is a visual no-code automation platform with deep AI integration that enables complex multi-step workflows through a drag-and-drop scenario builder. It offers granular data transformation, HTTP module flexibility, and AI-powered scenario generation for orchestrating sophisticated automation pipelines.

automationno-codevisual-workflow
65.3B
ToolAI Tools & APIs

Milvus

by Zilliz

Cloud-native vector database built for scalable similarity search with GPU acceleration. Supports billions of vectors with multiple index types, hybrid search, and multi-vector queries.

vector-databasedistributedgpu-accelerated
65.2B
ToolAI Tools & APIs

Instructor

by Jason Liu

Instructor is a Python library that simplifies extracting structured, typed data from Large Language Model (LLM) responses. By leveraging Pydantic models, it enables developers to define a desired data schema, and Instructor handles the prompting, validation, and retries to ensure the LLM output conforms to that schema, streamlining data extraction tasks.

structured-outputpydanticvalidation
64.8B
ToolAI Tools & APIs

Groq

by Groq

Groq is an AI inference company that provides ultra-fast access to open-source large language models. It leverages its custom-designed Language Processing Unit (LPU) hardware to deliver industry-leading token generation speeds, significantly reducing latency for real-time applications via an OpenAI-compatible API.

inferencelpulow-latency
64.5B
Tooldomain-specific

Jasper AI

by Jasper

Jasper is an AI content platform designed for enterprise marketing teams. It helps create on-brand content at scale by combining advanced AI models with a company's specific brand knowledge. The platform supports multi-channel content generation, campaign workflows, and ensures brand voice consistency across all outputs.

ai-writercontent-creationmarketing
63.9B
ToolAI Tools & APIs

Model Context Protocol

by Anthropic

Open protocol by Anthropic for connecting AI models to external tools, data sources, and services. Provides a standardized interface for tool use with server and client SDKs for building integrations.

protocolmcpanthropic
63.8B
ToolAI Tools & APIs

H2O AutoML

by H2O.ai

H2O AutoML is an open-source, distributed machine learning platform that automates the model training process. It systematically explores various algorithms and hyperparameters to produce a leaderboard of the best models. It supports both a Python/R API and a no-code Flow UI, making it accessible to both developers and business users.

automlmachine-learningopen-source
63.8B
ToolAI Tools & APIs

Bubble AI

by Bubble

Bubble AI is an AI-powered no-code development platform for building web applications without writing code. Users can describe their app idea in natural language, and the AI assistant generates layouts, database structures, and workflows. This visual programming environment allows for extensive customization, ideal for creating MVPs and internal tools.

no-codeapp-builderai-generation
63.4B
ToolAI Infrastructure

LoRA Library

by Hugging Face

The LoRA Library, integrated within Hugging Face's PEFT (Parameter-Efficient Fine-Tuning) package, provides tools to create, share, and use LoRA adapters. It allows for the efficient customization of large pre-trained models by training only a small number of new weights, drastically reducing computational costs and storage requirements compared to full fine-tuning.

loraadaptersmodel-hub
63.1B
ToolAI Infrastructure

TGI

by Hugging Face

A production-ready inference server for large language models, developed by Hugging Face in Rust. It enables high-performance LLM serving through features like tensor parallelism, continuous batching, and quantization, making it ideal for deploying demanding models at scale with low latency.

llm-inferencemodel-servinghugging-face
63B
ToolAI Tools & APIs

GitBook AI

by GitBook

GitBook AI is an intelligent documentation and knowledge management platform with built-in AI that can answer questions, generate content, and surface insights from your entire knowledge base. It combines a Notion-like editor with an AI search assistant and GitHub sync for technical teams.

documentationknowledge-baseai-docs
62.9B
ToolAI Tools & APIs

Temporal AI

by Temporal Technologies

Durable workflow orchestration platform that makes it easy to build reliable distributed applications. Temporal handles retries, timeouts, and failure recovery automatically, making it ideal for long-running AI pipelines and agent orchestration.

workfloworchestrationdurable-execution
62.6B
ToolAI Tools & APIs

Mintlify

by Mintlify

Mintlify is an AI-powered platform for creating and maintaining developer documentation. It auto-generates content from code comments and OpenAPI specs, provides an AI chatbot trained on the docs for instant answers, and offers a rich component library for technical writing. The platform simplifies publishing and hosting.

documentation-platformai-documentationdeveloper-tools
62.6B
Tooldomain-specific

Terra

by Broad Institute / Microsoft

Terra is a cloud-based open platform for biomedical researchers to access data, run analysis tools, and collaborate, built on Google Cloud with support for WDL and CWL workflow languages. It provides access to petabyte-scale genomic datasets including TCGA and GTEx, and supports scalable analysis through Cromwell and Spark pipelines.

genomicsbioinformaticscloud
62.4B
ToolAI Tools & APIs

AutoGluon

by Amazon Web Services

AutoGluon is an open-source AutoML framework from AWS that simplifies machine learning. It automates model training, hyperparameter tuning, and ensembling to achieve state-of-the-art performance on tabular, image, text, and time-series data with just a few lines of Python code, making advanced ML accessible to all skill levels.

automltabular-datamultimodal-learning
62.2B
ToolAI Tools & APIs

Unity ML-Agents

by Unity Technologies

Unity ML-Agents is an open-source toolkit that enables the use of the Unity game engine as a simulation environment for training intelligent agents. It connects rich 3D environments with Python-based deep reinforcement learning and imitation learning frameworks like TensorFlow and PyTorch, facilitating research and development in game AI, robotics, and autonomous systems.

reinforcement-learningsimulationgame-ai
62B
ToolAI Tools & APIs

Airbyte

by Airbyte Inc.

Open-source data integration platform with 350+ pre-built connectors for syncing data into AI-ready warehouses and vector databases. Airbyte's PyAirbyte SDK and AI Connector Builder enable rapid connector creation for custom data sources and AI pipelines.

data-integrationeltconnectors
62B
ToolAI Tools & APIs

Label Studio

by HumanSignal

Open-source data labeling and annotation platform supporting text, image, audio, and video. Provides customizable labeling interfaces, ML-assisted labeling, and team collaboration for building training datasets.

annotationlabelingdata-management
61.45B
ToolAI Tools & APIs

Auto-sklearn

by University of Freiburg (AutoML Group)

Auto-sklearn is an open-source AutoML toolkit built on scikit-learn. It leverages Bayesian optimization, meta-learning, and automated ensemble construction to find the best-performing machine learning pipeline for a given tabular dataset. It is a prominent tool in academic research for automated model selection.

automlbayesian-optimizationscikit-learn
61.1B
ToolAI for Code

Cline

by Cline

Autonomous coding agent that operates directly in VS Code with support for multiple LLM providers. Can create and edit files, run terminal commands, and browse the web while requiring human approval for actions.

ai-codingvscode-extensionautonomous
61.05B
Tooledge-ai

Edge Impulse

by Edge Impulse

Edge Impulse is a leading development platform for machine learning on embedded systems and IoT devices. It offers an end-to-end MLOps pipeline, from data collection and signal processing to model training and deployment. The platform simplifies creating TinyML applications for resource-constrained microcontrollers.

tinymlembedded-mliot
61B
ToolAI Tools & APIs

Dagster

by Dagster Labs

Dagster is an asset-centric data orchestrator for building, testing, and monitoring data pipelines. It models data dependencies and computations as a graph of software-defined assets, providing built-in data lineage, type checking, and observability. This approach helps data teams create reliable and maintainable data platforms.

data-orchestrationsoftware-defined-assetsdata-pipelines
60.7B
ToolAI Tools & APIs

ReadMe AI

by ReadMe

ReadMe AI is an interactive API documentation platform that transforms OpenAPI specs and Markdown into beautiful, interactive developer hubs with personalized API explorers. Its AI-powered features include auto-generated code samples, semantic search, and contextual AI answers drawn from your documentation.

api-documentationdeveloper-portalai-docs
59.9C+
ToolAI Tools & APIs

Outlines

by .txt

Outlines is an open-source Python library that provides fine-grained control over large language model text generation. It uses constrained decoding to force the model's output to conform to a specific structure, such as a regular expression, a Pydantic model, or a JSON schema. This guarantees that the generated text is always valid and parseable, eliminating the need for post-processing and error handling.

structured-generationconstrained-decodingjson-schema
59.4C+
ToolAI Tools & APIs

Play.ht

by Play.ht

Play.ht is an AI-powered text-to-speech generator and voice cloning platform. It offers a vast library of over 900 AI voices in multiple languages and accents. The platform is designed for various applications, from creating audio versions of articles to developing interactive conversational AI, thanks to its low-latency real-time streaming API.

text-to-speechvoice-cloningaudio-ai
59.3C+
ToolAI Infrastructure

Triton Inference Server

by NVIDIA

Triton is an open-source inference server from NVIDIA designed for high-performance, production-ready AI. It supports deploying models from virtually any framework, such as TensorFlow, PyTorch, and ONNX, on both GPUs and CPUs. Key features include dynamic batching, concurrent model execution, and model ensembling to maximize throughput and resource utilization.

model-servinginference-servernvidia
59.1C+
ToolAI Tools & APIs

n8n AI

by n8n

n8n AI is a source-available, workflow automation platform that enables users to build complex, AI-powered automations. It features a visual, node-based editor where users can connect hundreds of applications and services, including various LLMs and AI agents, to orchestrate intricate processes with minimal code.

workflow-automationlow-codeai-automation
58.9C+
ToolAI Tools & APIs

TrustArc AI

by TrustArc

TrustArc AI is a comprehensive privacy management platform that leverages AI to automate and simplify compliance with global regulations like GDPR and CCPA. It provides tools for data inventory, risk assessments, and consent management, helping organizations build and maintain robust privacy programs.

privacy-managementcompliance-automationgdpr-compliance
58.7C+
Toolfeature-store

Hopsworks

by Hopsworks AB

Hopsworks is an open-source ML platform centered around its feature store, enabling teams to manage the full lifecycle of features from engineering to serving for both batch and real-time ML workloads. It integrates deeply with Apache Spark, Flink, and Python environments, and provides built-in model registry and serving capabilities.

feature-storemlopsopen-source
58.5C+
ToolAI Tools & APIs

NVIDIA Omniverse

by NVIDIA

NVIDIA's platform for building physically accurate 3D simulations, digital twins, and collaborative virtual worlds powered by Universal Scene Description (USD) and real-time ray tracing. Omniverse integrates generative AI for scene synthesis, avatar animation, and synthetic data generation for robot and autonomous vehicle training.

3dsimulationdigital-twin
58.4C+
ToolAI for Code

Replit AI

by Replit

AI-powered cloud development platform with integrated coding assistant and one-click deployment. Combines a browser-based IDE with AI code generation, debugging, and instant deployment to production.

ai-codingcloud-idedeployment
58.2C+
ToolAI Tools & APIs

Kong AI Gateway

by Kong Inc.

AI-native API gateway that provides a unified control plane for managing, securing, and observing all LLM traffic across any provider. Kong AI Gateway adds semantic caching, prompt injection protection, token rate limiting, and cost attribution on top of the battle-tested Kong Gateway.

api-gatewayllm-proxyrate-limiting
58.1C+
ToolAI Tools & APIs

Phrase TMS AI

by Phrase

Phrase TMS AI is a translation management system with integrated AI that automates the end-to-end localization workflow for enterprises, including MT integration, translation memory, terminology management, and quality assurance automation. It serves as the operational backbone for global content and software localization programs.

translation-managementlocalizationtms
58C+
Toolfeature-store

Tecton

by Tecton

Tecton is an enterprise feature store platform that enables data scientists and ML engineers to build, share, and serve features for real-time and batch machine learning applications. It provides a declarative Python SDK for defining feature pipelines, with automatic backfilling, versioning, and point-in-time correct training data generation.

feature-storemlopsreal-time
57.4C+
ToolAI Infrastructure

Kubeflow

by Google

Open-source ML platform for Kubernetes providing end-to-end ML workflow orchestration. Includes pipeline authoring, distributed training, hyperparameter tuning, and model serving on Kubernetes clusters.

mlopskubernetespipelines
57.05C+
ToolAI Tools & APIs

Apigee AI (Google Cloud)

by Google Cloud

Google Cloud's enterprise API management platform with native Vertex AI and Gemini integration for building secure AI-powered APIs. Apigee AI adds LLM traffic management, semantic caching, safety policies, and analytics to Google's proven API gateway infrastructure.

api-gatewayenterprisegoogle-cloud
57C+
ToolAI Tools & APIs

Kapwing AI

by Kapwing

Kapwing is a browser-based AI video editor designed for content creators and social media teams. It offers AI-powered subtitle generation, background removal, smart cut, and one-click repurposing across formats and aspect ratios.

video-editingai-videosubtitle-generation
56.6C+
ToolAI Tools & APIs

Unbabel

by Unbabel

Unbabel is an AI-powered translation platform that combines neural machine translation with a community of professional post-editors to deliver human-quality translations at machine speed. It is purpose-built for enterprise customer support and content teams requiring guaranteed accuracy at scale.

translationhuman-in-the-loopcustomer-support-translation
56.1C+
ToolAI Tools & APIs

Modal

by Modal

Serverless cloud platform for running GPU-accelerated Python code with zero infrastructure management. Provides instant container spin-up, GPU autoscaling, and simple decorators for deploying ML workloads.

serverlessgpu-computeinfrastructure
56.1C+
ToolAI Tools & APIs

Guidance

by Microsoft

Microsoft's language for controlling LLMs with interleaved generation and prompting. Supports constrained output via token healing, regex constraints, and context-free grammars for reliable generation.

llm-programmingconstrained-generationtemplating
55.85C+
Tooldomain-specific

PathAI

by PathAI

PathAI is an AI-powered pathology platform that assists pathologists in diagnosing diseases including cancer by analyzing digitized tissue slides with deep learning models trained on millions of pathology images. It provides quantitative biomarker analysis, treatment response prediction, and clinical trial endpoint measurement at scale.

pathologymedical-aidigital-pathology
55.8C+
ToolAI Tools & APIs

FLAML

by Microsoft Research

Fast and Lightweight AutoML library from Microsoft Research that minimizes compute while maximizing accuracy. FLAML uses cost-aware hyperparameter search and is designed to be embedded inside larger systems, including the AutoGen multi-agent framework.

automlhyperparameter-optimizationefficient
55.8C+
ToolAI Tools & APIs

Credo AI

by Credo AI

Credo AI is an AI governance platform that enables organizations to assess, monitor, and document AI model risks, fairness, and regulatory compliance. It automates evidence collection for frameworks like EU AI Act, NIST AI RMF, and ISO 42001, bridging the gap between AI teams and risk officers.

ai-governanceresponsible-aicompliance
55.8C+
ToolAI Tools & APIs

Chainlit

by Chainlit (Community)

Production-ready Python framework for building conversational AI applications with streaming, message threading, and human-in-the-loop feedback. Chainlit is optimized specifically for LLM chat UIs and integrates natively with LangChain, LlamaIndex, and LiteLLM.

ui-builderchatbotconversational-ai
55.8C+
ToolAI Tools & APIs

RunPod

by RunPod

Cloud GPU platform for AI inference and training with serverless and dedicated GPU options. Provides cost-effective GPU rentals with pre-built templates for popular ML frameworks and models.

gpu-cloudserverlessinference
55.4C+
ToolAI Tools & APIs

SerpAPI

by SerpAPI

API for scraping and parsing search engine results from Google, Bing, Yahoo, and others. Provides structured JSON results from multiple search engines with support for locations, languages, and devices.

search-apiscrapingmulti-engine
55.3C+
ToolAI Infrastructure

Great Expectations

by Superconductive

Open-source data quality platform for validating, profiling, and documenting data pipelines. Provides expectation-based testing for data quality with automated documentation and alerting capabilities.

data-qualityvalidationtesting
55.3C+
ToolAI Infrastructure

DVC

by Iterative

Open-source version control system for machine learning projects with Git-like data management. Provides data versioning, experiment tracking, and ML pipeline management alongside your existing Git workflow.

version-controldata-versioningml-pipelines
54.9C+
Toolknowledge-graph

ArangoDB

by ArangoDB

ArangoDB is a native multi-model database supporting graphs, documents, and key-value storage in a single engine, with integrated vector search and ML capabilities for building knowledge-graph-backed AI applications. Its AQL query language and ArangoSearch make it suitable for complex knowledge retrieval pipelines combining structural and semantic search.

knowledge-graphmulti-modelgraph
54.9C+
ToolAI Tools & APIs

Swarm

by OpenAI

OpenAI's experimental lightweight multi-agent orchestration framework focused on handoffs and routines. Provides a minimal abstraction for agent coordination using function calling and agent transfers.

multi-agentopenailightweight
54.65C+
Tooltesting

Patronus AI

by Patronus AI

Patronus AI is an enterprise LLM evaluation platform specializing in automated testing for hallucination, toxicity, PII leakage, and factual accuracy across production AI systems. It provides a library of 1,000+ pre-built evaluators and supports custom evaluator creation to enforce application-specific quality gates.

testingllmevaluation
54.6C+
ToolAI Tools & APIs

Fireworks AI

by Fireworks AI

High-performance inference platform for generative AI with fast model serving and fine-tuning. Optimized for production workloads with function calling, JSON mode, and grammar-based generation.

inferencefine-tuningfast-inference
54.55C+
ToolAI for Code

Continue

by Continue

Open-source AI code assistant for VS Code and JetBrains with customizable model and context providers. Supports tab autocomplete, chat, inline editing, and custom slash commands with any LLM.

ai-codingopen-sourceide-extension
54.55C+
ToolAI Tools & APIs

Pictory

by Pictory AI

Pictory is an AI video creation platform that transforms long-form text, articles, and scripts into short branded videos automatically. It includes AI voiceovers, stock footage matching, automatic highlight extraction, and a brand kit for consistent visual identity.

ai-videotext-to-videovideo-summarization
54.2C+
ToolAI Tools & APIs

Flyte

by Union.ai (Linux Foundation)

Kubernetes-native workflow orchestration platform purpose-built for machine learning and data processing at scale. Flyte enforces strong typing on inputs and outputs, provides built-in versioning, and integrates natively with Kubernetes for resource management.

workflowmlopskubernetes
54.1C+
ToolAI Tools & APIs

Arthur AI

by Arthur AI

Arthur AI is an enterprise ML monitoring and observability platform that tracks model performance, detects data and concept drift, and measures fairness in production deployments. It provides real-time alerting, explainability dashboards, and bias mitigation tooling for high-stakes AI applications.

ml-monitoringai-observabilitymodel-performance
54C+
ToolAI Tools & APIs

Anyscale

by Anyscale

Enterprise platform for scaling AI applications built on the Ray distributed computing framework. Provides managed Ray clusters, model serving, and fine-tuning infrastructure for production AI workloads.

raydistributed-computinginference
54C+
ToolAI Tools & APIs

Arize AI

by Arize AI

ML observability platform for monitoring model performance, detecting drift, and troubleshooting issues. Provides real-time monitoring, embedding analysis, and automated performance alerts for AI systems.

observabilitymodel-monitoringml-observability
53.8C+
Toolknowledge-graph

Amazon Neptune ML

by Amazon Web Services

Amazon Neptune ML is a managed graph machine learning capability built on Neptune that uses graph neural networks to make predictions on graph data without requiring ML expertise. It automatically trains GNN models on graph structure and node/edge properties for tasks like node classification, link prediction, and regression.

knowledge-graphawsgnn
53.8C+
ToolAI Tools & APIs

Panel

by HoloViz / NumFOCUS

High-level app and dashboarding framework from HoloViz that works with nearly every visualization library in the Python ecosystem. Panel supports reactive programming, GPU-accelerated plotting, and server-side rendering, making it ideal for complex analytical AI dashboards.

ui-builderdashboardshvplot
53.6C+
ToolAI Infrastructure

Metaflow

by Netflix / Outerbounds

Human-friendly Python library for building and managing real-life data science and ML projects. Originally developed at Netflix, provides seamless scaling from laptops to cloud with versioning and reproducibility.

ml-pipelinesdata-scienceworkflow
52.95C+
ToolAI Tools & APIs

Traefik AI Gateway

by Traefik Labs

Cloud-native edge router and AI gateway built for Kubernetes-native LLM traffic management. Traefik AI extends the battle-tested Traefik reverse proxy with LLM-aware middleware for token counting, semantic caching, failover routing, and provider load balancing.

api-gatewaykubernetescloud-native
52.7C+
ToolAI for Code

Sourcegraph Cody

by Sourcegraph

AI coding assistant powered by Sourcegraph's code graph for deep codebase understanding. Provides context-aware code generation and answers using entire repository knowledge across large codebases.

ai-codingcode-searchcodebase-context
51.8C+
ToolAI Tools & APIs

Snorkel

by Snorkel AI

Enterprise data-centric AI platform for programmatically labeling and curating training data. Uses weak supervision and labeling functions to create large labeled datasets without manual annotation.

programmatic-labelingweak-supervisiondata-labeling
51.3C+
ToolAI Tools & APIs

Spline AI

by Spline Design

Browser-based 3D design tool with integrated AI generation capabilities for creating interactive 3D scenes, objects, and animations from text prompts. Spline AI allows designers and developers to produce real-time web-ready 3D graphics without traditional 3D modeling expertise.

3dspatialgenerative-3d
51.1C+
ToolAI Tools & APIs

Marker

by VikParuchuri

Fast and accurate PDF to Markdown converter optimized for books and scientific papers. Handles complex layouts, equations, tables, and multi-column documents with higher quality than traditional OCR tools.

pdf-to-markdowndocument-conversionocr
51C+
ToolAI Tools & APIs

Zilliz Cloud

by Zilliz

Fully managed vector database service built on Milvus for enterprise-grade similarity search. Provides auto-scaling, high availability, and enterprise security with a simplified operational experience.

vector-databasemanaged-milvuscloud
50.9C+
ToolAI Tools & APIs

Cleanlab

by Cleanlab

Data-centric AI library for finding and fixing label errors in datasets automatically. Uses confident learning algorithms to identify mislabeled data, estimate noise, and improve model training quality.

data-qualitylabel-errorsconfidence-learning
50.55C+
ToolAI for Code

Gemini Code Assist

by Google

Google's AI-powered code assistance tool integrated with Google Cloud and IDEs. Provides code completions, explanations, and transformations powered by Gemini models with enterprise security controls.

ai-codinggooglecode-completion
50.5C+
ToolAI Tools & APIs

txtai

by NeuML

All-in-one embeddings database for semantic search, LLM orchestration, and language model workflows. Combines vector search with NLP pipelines including summarization, translation, and text-to-speech.

embeddingssemantic-searchrag
49.7C
ToolAI Tools & APIs

Swimm

by Swimm

Swimm is an AI-powered code documentation tool that auto-generates and keeps documentation synchronized with the codebase using IDE plugins. It detects code changes and alerts developers when docs become stale, enabling engineering teams to maintain accurate, living documentation at scale.

code-documentationdeveloper-toolsknowledge-management
49.7C
ToolAI Tools & APIs

Serper

by Serper

Fast and affordable Google Search API for developers and AI applications. Provides structured Google search results including organic results, knowledge graphs, and related questions via a simple REST API.

search-apigoogle-resultsfast
49.6C
ToolAI Infrastructure

AutoTrain

by Hugging Face

Hugging Face's automated training solution for fine-tuning LLMs and other models with minimal configuration. Provides a no-code UI and CLI for training custom models with automatic hyperparameter selection.

automlhugging-facefine-tuning
49.6C
Toolknowledge-graph

Stardog

by Stardog

Stardog is an enterprise knowledge graph platform built on W3C standards (RDF, OWL, SPARQL) that enables organizations to unify disparate data sources into a semantic layer for AI and analytics. Its Virtual Graph capability connects to existing databases without data migration, and its AI integration supports LLM grounding on enterprise knowledge.

knowledge-graphenterpriserdf
49.4C
ToolAI Tools & APIs

Prodigy

by Explosion

Scriptable annotation tool by Explosion for creating training data with active learning. Integrates with spaCy for NLP tasks and provides efficient annotation workflows with model-in-the-loop labeling.

annotationactive-learningnlp
49.4C
ModelLLMs

GPT-5

by OpenAI

OpenAI's frontier model with advanced reasoning, native multimodal understanding, and robust function calling. Designed for complex enterprise workflows and agentic applications.

llmreasoningmultimodal
78.7B+
ModelLLMs

GPT-4o

by OpenAI

OpenAI's natively multimodal flagship model processing text, image, and audio inputs with a single unified architecture. Delivers GPT-4 Turbo-level intelligence at 2x speed and 50% lower cost, with breakthrough real-time voice capabilities.

llmmultimodalomni
78.1B+
ModelLLMs

Claude 4

by Anthropic

Anthropic's most capable model featuring advanced reasoning, coding, and multimodal capabilities. Excels at complex analysis, agentic tasks, and extended thinking with industry-leading safety.

llmreasoningcoding
78B+
ModelLLMs

GPT-4

by OpenAI

OpenAI's breakthrough large language model that demonstrated a significant leap in reasoning and factual accuracy over GPT-3.5. Widely adopted across enterprise and developer workflows for code generation, analysis, and complex problem-solving.

llmreasoningmultimodal
77.9B+
ModelLLMs

Claude 3.5 Sonnet

by Anthropic

Anthropic's breakout model that surpassed Claude 3 Opus at Sonnet-tier pricing, setting new industry benchmarks for coding. Introduced computer use capability and became the most popular model on the API due to its exceptional intelligence-to-cost ratio.

llmcodingmultimodal
77.7B+
ModelComputer Vision

Midjourney V6

by Midjourney

Midjourney V6 represents a major leap in photorealism, prompt adherence, and artistic coherence, setting a new industry benchmark for AI image generation quality. It introduced native text rendering within images and dramatically improved its understanding of complex, multi-subject prompts.

image-generationtext-to-imagecreative-ai
77.2B+
ModelSpeech & Audio AI

Whisper V3

by OpenAI

OpenAI's state-of-the-art open-source automatic speech recognition model trained on 680K hours of multilingual audio. Supports 99 languages with near-human accuracy and includes translation, timestamp, and language detection capabilities.

speech-to-texttranscriptionmultilingual
77B+
ModelLLMs

BERT

by Google

BERT (Bidirectional Encoder Representations from Transformers) is Google's landmark 2018 language model that introduced the bidirectional pre-training paradigm using masked language modeling and next sentence prediction. It revolutionized NLP by demonstrating that a single pre-trained model could achieve state-of-the-art results across dozens of downstream tasks with minimal fine-tuning.

foundationalgoogletransformer
76.3B+
ModelLLMs

Gemini 2.5 Pro

by Google DeepMind

Google DeepMind's flagship thinking model with native multimodal understanding across text, images, audio, and video. Excels at complex reasoning, code generation, and agentic tasks with a million-token context window.

llmreasoningmultimodal
76.2B+
ModelComputer Vision

Stable Diffusion XL

by Stability AI

Stability AI's high-resolution image generation model producing photorealistic and artistic images at 1024x1024 resolution. Features a two-stage architecture with a base model and refiner for enhanced detail and compositional quality.

image-generationdiffusionopen-source
74.4B+
ModelLLMs

GPT-4 Turbo

by OpenAI

An optimized variant of GPT-4 offering a 128K context window, faster inference, and significantly reduced costs. Introduced JSON mode and improved function calling, making it the preferred GPT-4 variant for production applications.

llmreasoningmultimodal
74.3B+
ModelLLMs

Llama 3.1 70B

by Meta

Meta's workhorse open-source model with 70B parameters, 128K context window, and native tool-use support. Widely deployed as a cost-effective alternative to proprietary frontier models.

llmopen-sourcelarge-model
73.5B+
ModelLLMs

DeepSeek-V3

by DeepSeek

DeepSeek's frontier-class MoE model with 671B total parameters and 37B active, trained using FP8 mixed precision for unprecedented cost efficiency. Matches or exceeds GPT-4o and Claude 3.5 Sonnet on key benchmarks.

llmopen-sourcemoe
72.8B+
ModelLLMs

o1

by OpenAI

OpenAI's first reasoning model that uses extended internal chain-of-thought before responding. Achieves expert-level performance on competitive math (AIME), PhD-level science (GPQA), and complex coding tasks through deliberative alignment.

llmreasoningchain-of-thought
72.6B+
ModelSpeech & Audio AI

ElevenLabs Turbo v2.5

by ElevenLabs

ElevenLabs Turbo v2.5 is a low-latency multilingual text-to-speech model optimized for real-time conversational AI applications, offering sub-400ms first-audio latency while maintaining the high voice cloning fidelity ElevenLabs is known for across 32 languages. It powers a wide range of AI assistant, customer service, and interactive voice applications where natural-sounding, real-time speech is critical.

text-to-speechvoice-cloninglow-latency
72.4B+
ModelLLMs

Llama 3.1 405B

by Meta

The largest openly available language model at 405 billion parameters, rivaling proprietary frontier models in reasoning and knowledge. A landmark release demonstrating open-source models can match closed alternatives.

llmopen-sourcefrontier
72.2B+
ModelComputer Vision

DALL-E 3

by OpenAI

OpenAI's most advanced image generation model with native ChatGPT integration. Features dramatically improved prompt following, text rendering, and safety mitigations compared to DALL-E 2, generating high-fidelity images from natural language descriptions.

image-generationtext-to-imagecreative
72.2B+
ModelLLMs

Claude 4 Sonnet

by Anthropic

Anthropic's balanced Claude 4 generation model delivering strong coding and reasoning at competitive pricing. Features improved agentic capabilities and extended thinking, offering a compelling mid-tier option between Haiku and Opus.

llmcodingmultimodal
72.2B+
ModelLLMs

Llama 3 70B

by Meta

Meta's high-performance 70B parameter model closing the gap with proprietary frontier models. Achieved competitive results on major benchmarks while remaining fully open-source.

llmopen-sourcelarge-model
72.05B+
ModelLLMs

Claude 4.5 Sonnet

by Anthropic

Anthropic's most advanced Sonnet-tier model, combining frontier intelligence with practical speed and cost. Features state-of-the-art coding performance, improved extended thinking, and robust agentic capabilities for complex multi-step workflows.

llmcodingmultimodal
71.1B+
ModelLLMs

GPT-2

by OpenAI

GPT-2 is OpenAI's 2019 autoregressive language model that demonstrated for the first time that large-scale unsupervised pre-training on internet text could produce coherent, fluent long-form text generation with zero-shot task performance. Its initial withheld release sparked global debate about AI safety and responsible disclosure of capable AI systems.

foundationalopenaiautoregressive
70.8B+
ModelLLMs

Gemini 2.5 Flash

by Google DeepMind

Google DeepMind's fast thinking model optimized for speed and cost efficiency while retaining strong reasoning capabilities. Supports a million-token context window with native multimodal input.

llmfast-inferencemultimodal
70.7B+
ModelLLMs

Gemini 2.0 Flash

by Google

Google's next-generation fast model built for the agentic era, featuring native tool use, multimodal generation, and real-time streaming. Outperforms Gemini 1.5 Pro on key benchmarks while maintaining Flash-tier speed and cost efficiency.

llmfastmultimodal
70.7B+
Modelother

AlphaFold 3

by Google DeepMind

AlphaFold 3 is Google DeepMind's third-generation protein structure prediction model that extends beyond proteins to predict the structures of DNA, RNA, and small molecules and their interactions. It represents a revolutionary tool for drug discovery and structural biology, dramatically accelerating our understanding of molecular machines that underpin life.

foundationaldeepmindprotein-structure
70.6B+
ModelSpeech & Audio AI

Google WaveNet

by Google / DeepMind

Google WaveNet is DeepMind's pioneering generative model for raw audio waveforms that dramatically advanced the state of the art in text-to-speech naturalness when published in 2016 and continues to power Google Assistant, Google Cloud TTS, and various Google products at massive scale. Its autoregressive waveform generation approach established the template for neural vocoder research and inspired a generation of TTS architectures.

text-to-speechwavenetgoogle
70.5B+
ModelLLMs

Mistral 7B

by Mistral AI

Mistral AI's breakthrough 7B parameter model that outperformed Llama 2 13B across all benchmarks at launch. Introduced sliding window attention and grouped-query attention for efficient inference.

llmopen-sourcesmall-model
70.4B+
ModelLLMs

Gemini 1.5 Pro

by Google

Google's mid-size multimodal model featuring a groundbreaking 2 million token context window using mixture-of-experts architecture. Excels at long-document understanding, video analysis, and cross-modal reasoning tasks that require processing large volumes of information.

llmlong-contextmultimodal
70.4B+
ModelLLMs

GPT-4o mini

by OpenAI

OpenAI's most cost-efficient small model, replacing GPT-3.5 Turbo as the default lightweight option. Scores 82% on MMLU and outperforms GPT-4 on chat preferences while costing over 60% less than GPT-4o.

llmlightweightcost-efficient
70.35B+
ModelComputer Vision

FLUX 1.1 Pro

by Black Forest Labs

FLUX 1.1 Pro from Black Forest Labs is a next-generation text-to-image model built by the original creators of Stable Diffusion, offering superior prompt comprehension, anatomical accuracy, and photorealistic detail. It sets a new open-weights standard with exceptional speed and quality, available in Pro, Dev, and Schnell variants for different use cases.

image-generationtext-to-imageopen-source
70.1B+
ModelLLMs

T5

by Google

T5 (Text-To-Text Transfer Transformer) is Google's 2019 framework that reframes all NLP tasks as text-to-text problems, allowing a single model to be trained on a unified mixture of tasks. Its clean formulation and the C4 dataset became foundational references for multitask learning research, and T5 variants remain widely used in production and research.

foundationalgoogleencoder-decoder
69.7B
ModelLLMs

GPT-4V

by OpenAI

OpenAI's multimodal extension of GPT-4 with native vision capabilities for image understanding, OCR, and visual reasoning. Processes interleaved text and images for tasks ranging from chart analysis to visual question answering.

multimodalvisionopenai
69.6B
ModelSpeech & Audio AI

Suno V3.5

by Suno AI

Suno V3.5 is a text-to-song AI model that generates complete, radio-quality music tracks with vocals, instrumentation, and song structure directly from natural language prompts or custom lyrics. It supports an enormous range of genres and styles and is widely regarded as the most accessible and highest-quality text-to-music system for non-musicians.

music-generationtext-to-musicvocals
69.4B
ModelLLMs

Mixtral 8x7B

by Mistral AI

Mistral AI's sparse mixture-of-experts model using 8 expert networks of 7B parameters each, activating only 2 per token. Matches GPT-3.5 performance while using a fraction of the compute at inference.

llmopen-sourcemoe
69.4B
ModelLLMs

Qwen 2.5 72B

by Alibaba Cloud

The flagship open-weight model in the Qwen 2.5 series, offering substantial improvements in reasoning, instruction following, and structured output over its predecessor. Supports 128K context with strong performance across 29+ languages.

llmmultilingualopen-weight
69.3B
ModelLLMs

DeepSeek Coder V3

by DeepSeek

DeepSeek Coder V3 is DeepSeek's third-generation code-specialized model, trained on over 2 trillion tokens of code and natural language with a mixture-of-experts architecture. It achieves state-of-the-art performance on major coding benchmarks, surpassing GPT-4o and Claude 3.5 Sonnet on several code generation tasks.

deepseekcodeopen-source
69.2B
ModelLLMs

Llama 3.3 70B

by Meta

Meta's refined 70B model delivering performance comparable to the much larger 405B variant through improved training techniques. Offers the best performance-to-cost ratio in the Llama family.

llmopen-sourcelarge-model
68.95B
ModelLLMs

Llama 3 8B

by Meta

Meta's third-generation compact language model with significantly improved performance over Llama 2 at the same size class. Features an expanded 128K token vocabulary and improved tokenizer.

llmopen-sourcesmall-model
68.9B
ModelLLMs

o3-mini

by OpenAI

A compact and cost-efficient reasoning model that delivers strong STEM performance at a fraction of o3's cost. Supports configurable reasoning effort (low/medium/high) to balance speed and accuracy for different use cases.

llmreasoningcost-efficient
68.5B
ModelLLMs

Claude 3 Opus

by Anthropic

Anthropic's most intelligent model at launch of the Claude 3 family, excelling at highly complex tasks requiring deep reasoning and nuanced understanding. Set new benchmarks in graduate-level reasoning and demonstrated near-human comprehension across academic subjects.

llmreasoningmultimodal
68.5B
ModelLLMs

Llama 2 70B

by Meta

Meta's largest Llama 2 variant with 70 billion parameters delivering substantially improved reasoning and knowledge over the 7B version. Became the de facto open-source baseline for LLM research.

llmopen-sourcelarge-model
68.4B
ModelLLMs

Llama 2 7B

by Meta

Llama 2 7B is an open-source 7 billion parameter large language model developed by Meta. Optimized for dialogue and general text generation, its permissive license and manageable size have made it a popular foundational model for fine-tuning, research, and building custom NLP applications.

llmopen-sourcemeta-ai
68.3B
ModelComputer Vision

Sora

by OpenAI

Sora is a text-to-video diffusion transformer model by OpenAI that generates high-fidelity, minute-long videos from textual prompts. It demonstrates an advanced understanding of language and the physical world, enabling complex scenes with multiple characters, specific motions, and coherent narratives.

video-generationtext-to-videoopenai
68B
ModelLLMs

Llama 3.1 8B

by Meta

Llama 3.1 8B is a compact, open-source language model from Meta, featuring a 128K token context window and native tool-use capabilities. It is optimized for high performance in instruction-following and reasoning tasks, making it a cost-effective solution for scalable, on-device, or resource-constrained applications.

llmopen-sourcesmall-model
67.9B
ModelComputer Vision

Stable Diffusion 3

by Stability AI

Stable Diffusion 3 is a powerful text-to-image model using a Multimodal Diffusion Transformer (MMDiT) architecture. It excels at generating images with unprecedented text quality, adhering closely to complex prompts, and achieving high photorealism and compositional accuracy compared to its predecessors.

image-generationdiffusiontext-to-image
67.55B
ModelSpeech & Audio AI

Azure Neural TTS

by Microsoft

Azure Neural TTS is Microsoft's enterprise-grade text-to-speech service, part of Azure AI Speech. It provides 400+ natural-sounding voices across 140+ languages, with detailed prosody control via SSML. The service is designed for scalable applications, from accessibility tools to customer service bots.

text-to-speechneural-ttsazure-ai
67.2B
ModelComputer Vision

Adobe Firefly 3

by Adobe

Adobe Firefly 3 is a commercially safe generative image model trained exclusively on licensed Adobe Stock and public-domain content, making it uniquely suitable for professional and enterprise creative workflows. Its deep integration with Photoshop, Illustrator, and Express enables AI-powered generation directly within industry-standard design tools.

image-generationtext-to-imagecommercial-safe
66.8B
ModelLLMs

Codex-2

by OpenAI

Codex-2 is OpenAI's second-generation code-specialized model, significantly advancing code completion, synthesis, and debugging over the original Codex. It underpins GitHub Copilot's next-generation features and supports a wider range of programming languages and frameworks.

openaicodecode-generation
66.8B
ModelLLMs

ClinicalBERT

by Kexin Huang et al. (Academic)

ClinicalBERT is a BERT-based model pre-trained on clinical notes from the MIMIC-III dataset. It provides a deep contextual understanding of electronic health record (EHR) text and clinical documentation, serving as a foundational model for various clinical natural language processing tasks.

clinical-nlptransformer-modelbert
66.4B
ModelLLMs

Gemini 2.5 Ultra

by Google DeepMind

Gemini 2.5 Ultra is Google DeepMind's most capable model in the 2.5 generation, designed for the most demanding reasoning, coding, and multimodal tasks. It features an extended context window and advanced chain-of-thought capabilities surpassing prior Gemini variants.

googledeepmindfrontier
66B
ModelLLMs

Claude Opus 4

by Anthropic

Anthropic's most capable model in the Claude 4 generation, designed for the most demanding reasoning, analysis, and agentic tasks. Excels at complex multi-step problems requiring deep understanding and sustained coherence across long contexts.

llmreasoningfrontier
65.8B
ModelLLMs

Gemini 1.5 Flash

by Google

Google's lightweight and fast multimodal model optimized for high-volume, cost-sensitive workloads. Supports a 1 million token context window with natively multimodal capabilities across text, image, audio, and video at a fraction of Pro's cost.

llmfastmultimodal
65.6B
ModelLLMs

Cohere Embed v3

by Cohere

Cohere's state-of-the-art embedding model supporting 100+ languages with native int8 and binary quantization for efficient storage. Produces high-quality vector representations optimized for search, classification, and clustering tasks.

embeddingssemantic-searchrag
65.6B
ModelLLMs

Grok-3

by xAI

Grok-3 is xAI's frontier model, delivering state-of-the-art performance in math, science, and coding. Trained on the Colossus supercluster, it features DeepSearch for multi-step research and a 'Think' mode for extended chain-of-thought reasoning, enabling it to tackle complex, real-world problems with access to real-time information.

llmfrontier-modelreasoning-engine
65.55B
ModelAI for Code

DeepSeek-Coder-V2

by DeepSeek

DeepSeek-Coder-V2 is a powerful open-source Mixture-of-Experts (MoE) model specialized in code. It supports 338 programming languages and features advanced fill-in-the-middle capabilities, offering performance comparable to top-tier proprietary models like GPT-4 Turbo at a significantly lower inference cost.

code-generationopen-sourcemoe
65.4B
ModelLLMs

Claude 3.5 Haiku

by Anthropic

Anthropic's fastest, most affordable model in the 3.5 generation, offering performance comparable to Claude 3 Opus. It excels at coding, complex workflows, and agentic tasks due to its advanced tool-use capabilities and speed, making it ideal for high-throughput applications and enterprise automation.

llmfastcost-efficient
65.4B
ModelComputer Vision

Runway Gen-3 Alpha

by Runway

Runway Gen-3 Alpha is a professional-grade video generation model for high-fidelity, temporally consistent clips. It offers fine-grained control over motion, style, and camera behavior via text and image inputs, making it a key tool in professional film and advertising workflows for meeting commercial standards.

video-generationtext-to-videoimage-to-video
65.3B
ModelLLMs

Qwen 2 72B

by Alibaba Cloud

Qwen2-72B is a 72-billion parameter large language model from Alibaba's Qwen2 series. It offers state-of-the-art performance, particularly in multilingual understanding, reasoning, and coding tasks. As an open-weight model, it provides a powerful alternative to proprietary systems for a wide range of applications.

llmopen-weightmultilingual
65.2B
ModelLLMs

Claude 3 Sonnet

by Anthropic

The balanced mid-tier model in the Claude 3 family, offering a strong combination of speed and intelligence. Provides enterprise-grade performance for coding, analysis, and content generation at moderate cost.

llmbalancedmultimodal
65.2B
Modelother

AlphaGo

by Google DeepMind

AlphaGo is a landmark AI from DeepMind that mastered the game of Go. It combines deep neural networks with Monte Carlo Tree Search and reinforcement learning, famously defeating world champion Lee Sedol in 2016. Its success demonstrated AI's ability to tackle complex problems requiring strategic planning.

foundationaldeepmindreinforcement-learning
64.8B
ModelAI for Code

Qwen 2.5 Coder 32B

by Alibaba Cloud

Qwen 2.5 Coder 32B is an open-weight, code-specialized large language model from Alibaba Cloud. Fine-tuned on a massive corpus covering over 92 programming languages, it excels at code generation, completion, and debugging tasks, demonstrating performance on par with or exceeding proprietary models like GPT-4o on several benchmarks.

code-llmopen-weightcode-generation
64.7B
ModelLLMs

Claude 3 Haiku

by Anthropic

Claude 3 Haiku is Anthropic's fastest, most compact model, excelling at near-instant responsiveness. It handles a wide range of tasks, including multimodal vision, with strong performance at a low cost, making it ideal for high-throughput applications like content moderation and customer service.

llmhigh-speedcost-efficient
64.7B
ModelSpeech & Audio AI

MusicGen

by Meta AI

MusicGen is an open-source text-to-music model from Meta AI that generates high-quality instrumental music from text descriptions. It can also be conditioned on a melody reference, providing a strong, controllable baseline for both research and commercial applications, trained on 20K hours of licensed music.

music-generationtext-to-musicopen-source
64.5B
ModelLLMs

Mixtral 8x22B

by Mistral AI

Mixtral 8x22B is a large-scale, open-source Mixture-of-Experts (MoE) model from Mistral AI. It features 176 billion total parameters but only activates 39 billion per token, balancing immense power with efficiency. The model excels at reasoning, code generation, and multilingual tasks, and includes native function calling capabilities.

llmopen-sourcemoe
64.4B
ModelLLMs

Mistral Large

by Mistral AI

Mistral Large is Mistral AI's flagship proprietary model, offering top-tier reasoning and multilingual capabilities. It is designed to compete with other frontier models like GPT-4, excelling in complex tasks that require deep understanding. Its native function calling and fluency in over 30 languages make it highly versatile for enterprise-grade applications.

llmproprietary-modelapi-access
64.3B
ModelAI for Code

Code Llama 34B

by Meta

Code Llama 34B is a large language model from Meta, fine-tuned from Llama 2 for code-specific tasks. It excels at generating, completing, and explaining code across various languages. With variants supporting a 100K token context window, it can analyze and work with extensive codebases for complex tasks like refactoring.

code-llmopen-sourcecode-generation
64.3B
Modelembeddings

Multilingual-E5-Large

by Microsoft Research

Multilingual-E5-Large is a powerful text embedding model from Microsoft supporting 100 languages. Trained on billions of text pairs using contrastive learning, it excels at cross-lingual information retrieval and semantic similarity, establishing a strong open-source baseline for multilingual NLP tasks.

text-embeddingmultilingualcross-lingual
64.2B
ModelLLMs

Med-PaLM 2

by Google

Med-PaLM 2 is Google's large language model specialized for the medical domain. It achieves expert-level performance on medical licensing exams (USMLE) by leveraging advanced clinical reasoning and question-answering capabilities. The model is designed to generate accurate and helpful responses for healthcare professionals.

medical-aiclinical-decision-supportllm
64.2B
Modelmultimodal

Qwen2.5-VL-72B

by Alibaba Cloud (Qwen Team)

Qwen2.5-VL-72B is Alibaba's flagship open vision-language model at 72 billion parameters, achieving top-tier performance on visual understanding benchmarks including chart analysis, document parsing, and fine-grained image understanding. It supports dynamic resolution image inputs and video understanding with native high-resolution processing.

alibabaqwenvision-language
64B
ModelLLMs

GPT-4.5

by OpenAI

GPT-4.5 is a hypothetical large language model from OpenAI, positioned as a research preview before GPT-5. It focuses on large-scale unsupervised learning to significantly reduce hallucinations and enhance factual accuracy. The model is also designed for improved creative writing and greater emotional intelligence in its responses.

llmreasoningmultimodal
64B
ModelLLMs

Phi-3.5-mini

by Microsoft

Phi-3.5-mini is a 3.8B parameter instruction-tuned model from Microsoft, optimized for edge and mobile devices. Despite its compact size, it delivers performance comparable to much larger models on benchmarks for reasoning, coding, and language tasks, making it highly efficient for on-device AI applications.

small-language-modelon-device-aiedge-computing
63.9B
ModelLLMs

o1-mini

by OpenAI

A smaller, faster, and more affordable reasoning model optimized for STEM tasks. Delivers 80% of o1's reasoning capability at roughly 80% lower cost, making it ideal for high-volume coding and math workloads.

llmreasoningmath
63.9B
ModelLLMs

PaLM

by Google

PaLM (Pathways Language Model) is Google's 540 billion parameter language model trained using the Pathways system across 6,144 TPU v4 chips, demonstrating breakthrough capabilities on chain-of-thought reasoning, code generation, and multilingual tasks. It introduced the concept of 'discontinuous' capability jumps at scale and set new benchmarks on hundreds of NLP tasks upon release in 2022.

foundationalgooglepathways
63.3B
ModelComputer Vision

Ideogram 2

by Ideogram AI

Ideogram 2 is a text-to-image model renowned for its superior ability to render legible and accurate text within generated images. It excels at creating high-quality photorealistic and artistic visuals with strong prompt adherence, making it a powerful tool for design, branding, and creative projects.

text-to-imageimage-generationtypography
63.2B
ModelSpeech & Audio AI

Amazon Polly Neural

by Amazon Web Services

Amazon Polly is a cloud-based text-to-speech (TTS) service from AWS that produces highly natural-sounding human speech using neural engine technology. It supports over 30 languages with both standard and neural voices, offering deep integration with the AWS ecosystem for scalable production applications.

text-to-speechcloud-ttsenterprise
63B
ModelLLMs

Claude Opus 4.5

by Anthropic

Claude Opus 4.5 is Anthropic's frontier AI model, delivering state-of-the-art performance in complex reasoning, creative tasks, and nuanced understanding. It features advanced multimodal vision capabilities for analyzing images and documents, along with extended thinking for multi-step, agentic tasks.

llmfrontier-modelmultimodal-ai
62.9B
ModelSpeech & Audio AI

TTS-1

by OpenAI

OpenAI's TTS-1 is a text-to-speech model designed for real-time audio generation. It provides six distinct, natural-sounding preset voices and supports low-latency streaming, making it ideal for interactive applications. A higher-quality variant, tts-1-hd, is available for tasks where audio fidelity is prioritized over speed.

text-to-speechvoice-synthesisaudio-generation
62.7B
ModelLLMs

Command R+

by Cohere

Cohere's most capable RAG-optimized model, offering significantly enhanced reasoning, multi-step tool use, and superior grounded generation over Command R. Designed for complex enterprise workflows requiring high accuracy and citations.

llmragenterprise
62.7B
ModelComputer Vision

Imagen 3

by Google DeepMind

Google DeepMind's highest-quality text-to-image generation model producing photorealistic images with improved detail, lighting, and fewer artifacts. Features enhanced prompt understanding and safety filtering.

image-generationdiffusiontext-to-image
62.65B
ModelLLMs

Qwen 2.5 Max

by Alibaba Cloud

Alibaba Cloud's most capable proprietary model in the Qwen 2.5 family, optimized for complex reasoning and enterprise applications. Available exclusively through Alibaba Cloud's Model Studio API with enhanced safety and alignment.

llmproprietaryreasoning
62.6B
ModelSpeech & Audio AI

AudioCraft

by Meta AI

AudioCraft is an open-source generative audio framework from Meta AI. It integrates MusicGen for music, AudioGen for sound effects, and the EnCodec codec into a single platform. This unified, modular design allows for text-to-audio generation and has become a key reference for audio LLM research.

audio-generationmusic-generationsound-effects
62.6B
ModelLLMs

LegalBERT

by Ilias Chalkidis et al. (Academic)

LegalBERT is a family of BERT models pre-trained on a diverse corpus of English legal texts, including legislation, court cases, and contracts. This specialized training allows it to significantly outperform general-purpose BERT models on downstream legal NLP tasks, establishing it as a foundational baseline for legal AI research and applications.

legal-techberttransformer-model
62.5B
ModelLLMs

Gemma 2 9B

by Google DeepMind

Gemma 2 9B is a lightweight, state-of-the-art open model from Google, part of the next generation of the Gemma family. It offers strong performance for its size class, making it ideal for environments with limited computational resources. Built on a new architecture, it is optimized for on-device applications, research, and fine-tuning.

llmopen-weightssmall-model
62.2B
ModelLLMs

QwQ-32B

by Alibaba / Qwen Team

QwQ-32B is a 32 billion parameter language model from Alibaba, specifically optimized for complex reasoning tasks. It utilizes a deep chain-of-thought methodology to excel at mathematical, scientific, and logical problems, achieving performance comparable to much larger models and showcasing high parameter efficiency.

reasoningqwenalibaba
62B
ModelLLMs

BLOOM

by BigScience Workshop

BLOOM is a 176 billion parameter, open-access multilingual language model developed by the BigScience research workshop. Trained on 46 natural languages and 13 programming languages, it provides powerful text and code generation capabilities, making it a key resource for researchers and developers building multilingual AI applications.

foundational-modelbigsciencemultilingual
61.6B
ModelAI for Code

StarCoder2 15B

by BigCode (ServiceNow + Hugging Face)

StarCoder2 15B is a powerful open-source code generation model from the BigCode project. Trained on The Stack v2 dataset spanning over 600 programming languages, it excels at code completion, generation, and fill-in-the-middle tasks, emphasizing data transparency and author opt-out.

code-llmopen-sourcecode-generation
61.5B
ModelLLMs

Phi-3 Mini

by Microsoft

Microsoft's Phi-3 Mini is a 3.8 billion parameter small language model (SLM) designed for high performance on resource-constrained devices. Despite its compact size, it exhibits strong reasoning and language understanding capabilities, making it suitable for on-device and edge AI applications. It is optimized for efficient inference.

slmopen-weightedge-ai
61.5B
ModelLLMs

Cohere Rerank v3

by Cohere

Cohere Rerank v3 is a state-of-the-art neural model designed to significantly boost the relevance of search results for Retrieval-Augmented Generation (RAG) systems. It re-scores a list of candidate documents from any keyword or vector search system, identifying the most pertinent information. It supports over 100 languages and can process long documents, making it highly versatile.

rerankingsearchrag
61.45B
ModelAI for Code

DeepSeek Coder 33B

by DeepSeek

DeepSeek Coder 33B is a dense, open-source large language model specializing in code-related tasks. Trained from scratch on a massive 2 trillion token dataset of code and natural language, it understands project-level context and supports 87 different programming languages for advanced code generation and completion.

code-generationopen-sourcedense-model
61.2B
ModelLLMs

Llama 3.2 11B Vision

by Meta

Llama 3.2 11B Vision is Meta's first open-source multimodal model, integrating native image understanding with advanced text generation. At a compact 11B parameters, it's designed for efficiency, enabling visual question answering, image captioning, and complex reasoning across text and images in a single, deployable model.

llmopen-sourcemultimodal
60.8B
ModelLLMs

DeepSeek-V2

by DeepSeek

DeepSeek's mixture-of-experts model introducing Multi-head Latent Attention (MLA) for dramatically reduced inference cost. Activates 21B of its 236B total parameters per token while matching larger dense models.

llmopen-sourcemoe
60.8B
ModelAI for Code

Codestral

by Mistral AI

Codestral is Mistral AI's open-weight generative model explicitly designed for code generation tasks. Trained on a diverse dataset of over 80 programming languages, it excels at code completion, generation, and its unique fill-in-the-middle capability. It is optimized for low-latency performance in real-world applications.

code-generationopen-weightfill-in-middle
60.65B
ModelLLMs

Gemma 2 27B

by Google DeepMind

Gemma 2 27B is a powerful, mid-sized open-weights model from Google DeepMind. It delivers significant performance gains in reasoning, coding, and instruction following over smaller variants. Designed for server-side deployment, it provides a strong foundation for advanced research and custom fine-tuning projects.

llmopen-weightsgoogle
60.25B
ModelLLMs

Claude 4.5 Haiku

by Anthropic

Claude 4.5 Haiku is Anthropic's fastest and most compact model, engineered for near-instant responsiveness and high-throughput workloads. It provides enterprise-grade performance at a fraction of the cost, making it ideal for real-time interactions, content moderation, and cost-effective agentic tasks.

llmanthropicclaude-4.5-haiku
60.1B
ModelSpeech & Audio AI

XTTS-v2

by Coqui AI

XTTS-v2 is an open-source, cross-lingual text-to-speech model from Coqui AI. It excels at high-quality voice cloning from just a few seconds of audio and supports 17 languages. With real-time streaming inference, it's ideal for applications needing custom voices and low-latency output.

text-to-speechvoice-cloningmultilingual-tts
59.6C+
ModelLLMs

BloombergGPT

by Bloomberg

BloombergGPT is a 50-billion parameter large language model developed by Bloomberg. It is specifically trained on a massive, curated corpus of financial data accumulated over decades, combined with general-purpose datasets. This specialized training allows it to excel at financial natural language processing tasks, outperforming similarly sized general models.

financefinancial-nlpdomain-specific
59.6C+
ModelLLMs

Grok-2

by xAI

Grok-2 is xAI's second-generation large language model, notable for its real-time knowledge access through the X platform. It possesses strong reasoning and multimodal capabilities, including vision understanding. The model is designed for a more natural, conversational interaction style with a lower tendency to refuse prompts.

large-language-modelgenerative-aixai
59.45C+
ModelLLMs

BioGPT

by Microsoft Research

BioGPT is a domain-specific language model from Microsoft, pre-trained on a massive corpus of biomedical literature from PubMed. It excels at tasks like generating biomedical text, extracting relationships between entities, and answering questions based on medical research, achieving state-of-the-art results on several benchmarks.

biomedicalnlppubmed
59.1C+
ModelLLMs

Command R

by Cohere

Command R is a retrieval-optimized language model from Cohere, specifically designed for enterprise-grade Retrieval-Augmented Generation (RAG) and tool use. It excels in multilingual applications, supporting over 10 languages, and features built-in capabilities for grounding responses and generating citations to ensure accuracy.

llmragenterprise-ai
59.05C+
ModelLLMs

Gemma 2B

by Google DeepMind

Gemma 2B is Google DeepMind's open-weight 2 billion parameter language model from the Gemma family, designed for lightweight deployment on devices with limited resources. It delivers strong performance for its size on language understanding and generation tasks, and serves as a foundation for fine-tuning on domain-specific tasks.

googlesmalledge
59C+
ModelComputer Vision

Pika 1.5

by Pika Labs

Pika 1.5 is an accessible AI video generation model that transforms text prompts or images into high-quality videos. It is known for its expressive motion, diverse cinematic styles, and unique features like physics-based effects and automated lip-sync, making it popular among creators and consumers.

video-generationtext-to-videoimage-to-video
58.6C+
Agentdevops-foundry

SRE Triage Agent

by AaaS DevOps Foundry

Detects anomalies in live system telemetry, runs deterministic diagnostics from the organization's top remediation runbooks, and autonomously resolves up to 40% of standard incidents without human intervention. Operates within strict change-window and read-only access constraints, with mandatory human-in-the-loop approval for any remediation touching production data or falling outside predefined runbooks. Reduces mean-time-to-recovery and augments on-call teams.

sreincident-managementtriage
84.2A
Agentdevops-foundry

Pipeline Healer Agent

by AaaS DevOps Foundry

Continuously observes CI/CD pipelines, code repositories, and incident logs. Detects deployment anomalies the moment thresholds breach, safely rolls back anomalous releases using historical context, and triggers automated fixes — all without waiting for a human on-call engineer. Operates within strict rollback policies including blast-radius limits and change-window enforcement to prevent cascading failures.

devopsci-cddeployment
82.6A
Agentdevops-foundry

Dependency Guardian Agent

by AaaS DevOps Foundry

Maps the entire dependency tree across an organization's codebases, tests library updates in isolated sandbox environments, writes localized unit tests to verify compatibility, and submits fully validated pull requests that respect architectural constraints. Prevents the cascade-of-breaking-changes problem that plagues manual dependency updates, where an LLM taking a prompt literally would introduce version conflicts or accidentally remove necessary features.

devopsdependency-managementsecurity-patching
76.8B+
AgentAI Agents

OpenAI Assistants API

by OpenAI

OpenAI's managed agent platform for building custom AI assistants with persistent threads, built-in code interpreter, file search, and function calling. Handles conversation state, tool orchestration, and context management so developers can focus on business logic.

agent-platformapifunction-calling
74.5B+
AgentAI Business & Strategy

Microsoft Copilot Agent

by Microsoft

Microsoft's autonomous agent within the Copilot ecosystem that operates across Microsoft 365 apps to automate business processes. Handles email triage, meeting preparation, document summarization, and cross-app workflow automation with enterprise-grade security.

enterprisecopilotmicrosoft-365
74.5B+
AgentAI Agents

Personalized Tutor Agent

by Khanmigo (Khan Academy)

An adaptive tutoring agent that dynamically adjusts difficulty, pacing, and instructional modality based on individual learner performance signals. It maintains a persistent knowledge model per student, identifies misconceptions through Socratic questioning, and routes learners to mastery via spaced-repetition scheduling.

educationtutoringadaptive-learning
73.7B+
Agentdevops-foundry

Codebase Architecture Agent

by AaaS DevOps Foundry

Maps structural dependencies, architectural patterns, and historical technical decisions across enterprise codebases. When a critical service fails and the original developers are unavailable, this agent produces a semantic architecture map — dependency graphs, hotspot analysis, and knowledge gap identification — in minutes instead of weeks. Integrates deeply with repositories to understand code as architecture, not just text.

devopsarchitecturecode-search
72.4B+
AgentAI Agents

Omnichannel Support Agent

by Intercom

A fully-autonomous customer support agent that unifies conversations across chat, email, SMS, and social DMs into a single threaded context window. It resolves tier-1 and tier-2 tickets using a retrieval-augmented knowledge base and maintains CSAT targets through sentiment-aware tone calibration.

customer-serviceomnichannellive-chat
72.4B+
AgentAI Agents

AutoGen

by Microsoft Research

Microsoft's multi-agent conversation framework enabling multiple LLM agents to converse, collaborate, and solve tasks through automated chat. Supports customizable agent behaviors, human-in-the-loop, and code execution sandboxing.

multi-agentconversablemicrosoft
72.4B+
AgentAI Tools & APIs

Perplexity

by Perplexity AI

AI-powered answer engine that combines real-time web search with LLM synthesis to provide cited, accurate answers. Features multi-step research capabilities, source verification, and conversational follow-up for deep topic exploration.

research-agentanswer-enginereal-time-search
72B+
AgentAI Agents

EHR Documentation Agent

by Nuance Communications (Microsoft)

Ambient AI agent that listens to physician-patient encounters, generates structured clinical notes (SOAP, H&P, discharge summaries), and auto-populates EHR fields in real time. Reduces documentation burden by over 70% while maintaining compliance with ICD-10 and CPT coding standards.

healthcareehrclinical-documentation
72B+
AgentAI Business & Strategy

Salesforce Einstein Agent

by Salesforce

Salesforce's autonomous AI agent built on the Einstein platform that handles customer interactions, resolves support cases, and automates sales workflows. Operates within the Salesforce ecosystem with full access to CRM data, knowledge bases, and business rules.

enterprisecrmsales
71.7B+
AgentAI Agents

Drug Interaction Checker

by Wolters Kluwer Health

Real-time pharmacological agent that screens multi-drug regimens for contraindications, adverse interactions, and dosing conflicts. Cross-references patient allergy profiles, renal function, and genetic pharmacogenomics data to surface clinically relevant alerts at point of prescribing.

healthcarepharmacologydrug-safety
71.7B+
AgentAI Agents

SEO Analysis Agent

by Ahrefs

A fully-autonomous SEO agent that continuously crawls a target website, audits technical health, researches high-intent keywords, and generates prioritized optimization recommendations. It tracks ranking movements in real time and surfaces backlink opportunities from competitor gap analysis.

marketingseokeyword-research
71.6B+
AgentSpeech & Audio AI

ElevenLabs Conversational Agent

by ElevenLabs

ElevenLabs' conversational AI agent platform combining industry-leading voice synthesis with real-time dialogue capabilities. Supports 29+ languages, custom voice creation, and ultra-low-latency responses for natural phone and web interactions.

voice-agenttext-to-speechvoice-cloning
71.1B+
AgentAI Tools & APIs

AutoGPT

by Significant Gravitas

One of the first open-source autonomous AI agents that chains LLM calls to accomplish complex goals. Decomposes high-level objectives into sub-tasks, maintains memory, and executes multi-step plans with internet access and file operations.

general-agentautonomousopen-source
70.6B+
Agentdevops-foundry

Latency Budget Planner Agent

by AaaS DevOps Foundry

Decomposes end-to-end application latency into detailed per-component budgets for real-time and streaming pipeline architectures. Autonomously adds graceful degradation protocols, timeout handling configurations, and p50/p95 tracing metrics required for production multimodal systems. Where a generic AI produces streaming pipeline code without real-world latency considerations, this agent understands the physics of distributed systems and produces actionable latency allocation plans.

devopsperformancelatency
70.5B+
AgentAI Agents

Learning Path Optimizer

by Coursera

A recommendation agent that maps learner skill profiles against target competency frameworks and synthesizes the shortest credentialed path to proficiency. It continuously reoptimizes routing as learners complete modules and integrates real-time labor-market signals to prioritize high-value skill sequences.

educationlearning-pathspersonalization
70.5B+
AgentAI Agents

Legal Research Agent

by Westlaw AI (Thomson Reuters)

Comprehensive legal research agent that queries case law databases, statutes, regulations, and secondary sources to synthesize jurisdiction-specific memos, identify controlling precedents, and map circuit splits. Generates formatted legal research memos with citation-verified sources and confidence scores.

legallegal-researchcase-law
69.9B
AgentAI Business & Strategy

Google Duet AI

by Google

Google's AI-powered assistant embedded across Google Workspace and Google Cloud that automates document creation, email drafting, data analysis, and cloud infrastructure management. Leverages Gemini models for contextual understanding across the Google ecosystem.

enterprisegoogle-workspaceproductivity
69.8B
AgentAI Agents

Meeting Summarizer Agent

by Otter.ai

An autonomous agent that joins virtual meetings, transcribes conversations in real time with speaker diarization, and generates structured summaries containing decisions made, action items with owners and due dates, and key discussion points. It distributes follow-up notes to participants, syncs action items into project management tools, and maintains a searchable meeting knowledge base.

enterprisemeetingsproductivity
69.1B
AgentAI Tools & APIs

Snyk AI Agent

by Snyk

AI-powered developer security agent that continuously scans code, dependencies, containers, and infrastructure-as-code for vulnerabilities. Provides automated fix pull requests, prioritizes issues by exploitability, and integrates directly into the developer workflow for shift-left security.

securitydevsecopsvulnerability-scanning
68.8B
AgentAI for Code

GitHub Copilot Workspace

by GitHub (Microsoft)

GitHub's AI-native development environment that turns issues into fully implemented code changes. Plans, implements, and validates multi-file edits with human-in-the-loop review before merging.

coding-agentgithubcollaborative
68.6B
AgentAI Agents

LangGraph

by LangChain Inc.

LangChain's framework for building stateful, multi-agent applications using graph-based workflows. Provides fine-grained control over agent state, cycles, branching, and human-in-the-loop checkpoints for production-grade agentic systems.

multi-agentgraphstateful
68.5B
AgentAI Agents

Dependency Updater Agent

by Mend (WhiteSource)

An automated agent that scans software repositories for outdated or vulnerable dependencies, opens pull requests with tested dependency upgrades, and resolves breaking API changes introduced by major version bumps. It groups related updates, runs the test suite for each PR, and prioritizes CVE-critical packages to ensure security patches ship within SLA windows.

codingdependenciessecurity
68.4B
AgentAI Agents

AWS Bedrock Agents

by Amazon Web Services

AWS's fully managed agent service within Amazon Bedrock that orchestrates multi-step tasks using foundation models. Automatically breaks down user requests, calls APIs, queries knowledge bases, and executes actions while maintaining enterprise security and compliance controls.

agent-platformawsenterprise
68.2B
AgentAI Business & Strategy

ServiceNow AI Agent

by ServiceNow

An autonomous AI agent built on the Now Platform, designed to automate end-to-end IT Service Management (ITSM) processes. It independently resolves common incidents, fulfills service requests, and executes standard change workflows by leveraging a proprietary knowledge graph and workflow engine, reducing the need for human intervention.

itsmai-agentworkflow-automation
67.6B
AgentAI Agents

Performance Profiler Agent

by Datadog

An autonomous profiling agent that instruments application code, analyzes CPU flame graphs, memory heap snapshots, and database query plans to identify performance bottlenecks, then proposes and optionally applies targeted code optimizations. It tracks regression history, correlates deployments with latency spikes, and benchmarks fixes against baseline measurements before recommending production rollout.

codingperformanceprofiling
67.5B
AgentAI Agents

CrewAI

by CrewAI

Framework for orchestrating role-playing autonomous AI agents that work together as a crew. Enables defining agents with specific roles, goals, and backstories to collaborate on complex tasks through structured workflows.

multi-agentcrewrole-playing
67.4B
AgentAI Agents

Risk Assessment Agent

by ServiceNow

This AI agent automates enterprise risk management (ERM) by continuously synthesizing data from internal systems and external intelligence. It identifies, categorizes, and scores diverse risks, maintaining a live risk register and mapping control effectiveness to provide a real-time, holistic view of the organization's risk posture.

enterprise-risk-managementgrccompliance-automation
67.2B
AgentAI Business & Strategy

Zendesk AI Agent

by Zendesk

Zendesk's AI Agent is an autonomous customer support tool designed to resolve inquiries across email, chat, and messaging. Trained on billions of real service interactions, it understands intent and sentiment to provide resolutions without requiring human intervention, freeing up teams for complex issues.

customer-supporthelpdeskticketing
67.1B
AgentAI Agents

Social Media Optimizer

by Sprout Social

A semi-autonomous agent that optimizes social media content for maximum reach. It analyzes platform-specific engagement patterns, rewrites posts, schedules them for peak audience times, and A/B tests caption variations to improve performance across channels.

marketing-automationsocial-media-managementcontent-optimization
67.1B
AgentAI Agents

Document Classification Agent

by ABBYY

An AI agent that automates document processing by classifying unstructured files like invoices, contracts, and emails into predefined categories. It extracts key data, validates it against business logic, and routes documents to appropriate systems, supporting multiple languages and improving over time via human feedback.

idpdocument-ainlp
67.1B
AgentAI Tools & APIs

Elicit

by Elicit

AI research assistant that automates systematic literature reviews and evidence synthesis. Searches across 200M+ academic papers, extracts key findings, and synthesizes results into structured summaries with full citations.

research-agentacademicliterature-review
66.9B
AgentAI for Code

SWE-agent

by Princeton NLP

Princeton NLP's research agent that turns LLMs into autonomous software engineers. Achieves state-of-the-art results on SWE-bench by providing an agent-computer interface optimized for code navigation and editing.

coding-agentresearchopen-source
66.7B
AgentAI Tools & APIs

Figma AI Agent

by Figma

Figma AI is a suite of native artificial intelligence features integrated directly within the Figma and FigJam platforms. It accelerates the design process by generating UI elements from text prompts, automatically populating mockups with realistic content, and providing intelligent suggestions to improve design consistency.

design-agentfigmagenerative-ui
66.3B
Agentcustomer-success-foundry

Support Resolver Agent

by AaaS

Resolves up to 80% of Tier-1 and Tier-2 support requests by directly accessing the CRM and payment gateways. Processes refunds within configurable monetary caps, updates account settings, modifies subscriptions, and routes edge cases to human representatives with full conversation summaries and diagnostic context. Unlike basic chatbots that regurgitate FAQ documents, this agent takes transactional action — it resolves, not deflects.

supportcustomer-serviceticketing
66.2B
AgentAI Agents

Google Vertex AI Agents

by Google Cloud

Google Vertex AI Agents is an enterprise-grade platform for building and deploying production-ready generative AI agents on Google Cloud. It enables developers to create agents that can reason, use tools, and leverage grounded generation with Google Search to complete complex tasks and engage in multi-turn conversations.

agent-platformgoogle-cloudenterprise-ai
66B
AgentAI Tools & APIs

CrowdStrike Charlotte AI

by CrowdStrike

CrowdStrike's generative AI security analyst, Charlotte AI, accelerates threat operations by automating investigation and response. It correlates alerts, enriches incidents with threat intelligence, and recommends actions, allowing security teams to query vast datasets and understand threats using natural language.

cybersecuritygenerative-aiai-security
66B
AgentAI Agents

Predictive Maintenance Agent

by SparkCognition

An IoT-connected agent that ingests vibration, temperature, acoustic, and electrical signals from industrial equipment to predict failure events hours to weeks in advance using ML anomaly detection and physics-based models. It generates work orders in CMMS systems, recommends spare parts pre-positioning, and calculates optimal maintenance windows to minimize production impact.

manufacturingpredictive-maintenanceIoT
65.8B
AgentAI Agents

Radiology Report Agent

by Nuance PowerScribe

An AI assistant that accelerates radiology reporting by automatically drafting structured reports from imaging findings. It applies standard templates like ACR BI-RADS, extracts key measurements, and codes findings using RadLex terminology, significantly reducing radiologist documentation time and improving data consistency for analytics.

healthcareradiologyreporting
65.6B
AgentAI Agents

Medical Imaging Analyzer

by Arterys

Deep-learning agent that analyzes DICOM medical images across modalities — CT, MRI, X-ray, and PET — to surface anomalies, measure lesions, and generate structured findings. Integrates directly into PACS workflows and flags priority studies for radiologist review.

healthcaremedical-imagingradiology
65.5B
AgentAI Agents

Escalation Manager Agent

by Zendesk

A decision-intelligence agent that monitors live support queues in real time, detects escalation signals (frustrated language, churn-risk keywords, repeat contacts), and routes high-priority cases to the most qualified available agent with full context pre-loaded. It enforces tiered escalation policies and logs every routing decision for compliance auditing.

customer-serviceescalationrouting
65.5B
AgentAI Agents

Database Migration Agent

by Redgate

An autonomous agent designed to automate the entire database migration lifecycle. It analyzes schema differences, generates forward and rollback migration scripts, and validates data integrity post-migration. The agent supports complex data transformations and migrations across different database platforms like PostgreSQL and Oracle, ensuring zero data loss.

database-migrationschema-managementdevops
65.5B
AgentAI Agents

Fraud Detection Agent

by Featurespace

An AI agent designed for real-time fraud prevention across various payment channels. It leverages behavioral biometrics, graph analytics, and machine learning to analyze transaction streams, identify suspicious patterns, and provide sub-50ms risk decisions. The system includes adaptive feedback loops for continuous model improvement.

financefraud-detectionrisk-management
65.3B
AgentAI Infrastructure

PagerDuty AI

by PagerDuty

PagerDuty AI is an AIOps agent for incident management that automates triage and response. It intelligently groups related alerts to reduce noise, correlates events to identify root causes, and suggests or executes automated remediation runbooks. This helps teams minimize downtime and streamline their on-call processes.

incident-managementon-callaiops
65.1B
AgentAI Agents

Content Strategy Agent

by Jasper AI

An autonomous AI agent designed to streamline content marketing operations. It performs comprehensive audits of existing content, identifies strategic topic gaps by analyzing competitors and search trends, and generates data-driven editorial calendars. The agent ensures all content aligns with brand voice and business objectives.

marketingcontent-strategyeditorial-planning
65.1B
AgentAI Tools & APIs

DataRobot AI Agent

by DataRobot

DataRobot is an enterprise AI platform that automates the end-to-end machine learning lifecycle. It enables users to build, deploy, and monitor predictive models at scale, from data preparation to production. The platform offers automated feature engineering, model selection, and hyperparameter tuning to accelerate the path from raw data to business value.

automlmachine-learningenterprise-ai
65B
AgentAI Agents

Student Assessment Agent

by Turnitin

An automated assessment agent that generates item banks, administers adaptive quizzes, and provides calibrated scoring with detailed feedback explanations. It applies Item Response Theory to estimate learner proficiency and surfaces at-risk students to instructors via configurable alert thresholds.

educationassessmentgrading
64.9B
AgentAI Agents

Contract Management Agent

by Ironclad

An AI agent for automating contract lifecycle management (CLM). It extracts critical data like terms, dates, and obligations from agreements, centralizes them into a searchable repository, and provides automated alerts for key deadlines. The agent streamlines review by comparing new contracts against pre-approved clause libraries and company playbooks.

contract-lifecycle-managementlegal-techai-agent
64.8B
AgentAI Agents

Portfolio Optimizer

by BlackRock Aladdin

An advanced AI agent for constructing and managing investment portfolios. It leverages quantitative techniques like mean-variance optimization and the Black-Litterman model to align portfolios with specific investor goals, risk tolerances, and constraints such as ESG mandates, while continuously monitoring for drift and executing tax-efficient rebalancing.

financeportfolio-managementquantitative-finance
64.6B
AgentAI Business & Strategy

Intercom Fin

by Intercom

Intercom Fin is an AI-powered chatbot designed for customer support automation, built on OpenAI's GPT-4. It autonomously resolves customer queries by leveraging a company's help center content and past conversation data. Fin provides human-like answers, can execute actions, and intelligently escalates complex issues to human agents.

customer-supportchatbotconversational-ai
64.6B
AgentAI Agents

Boston Dynamics Atlas

by Boston Dynamics

Next-generation fully electric humanoid robot designed for industrial and commercial applications. Features unmatched athletic ability, whole-body manipulation, and advanced perception for operating in complex, dynamic environments alongside humans.

humanoid-robotroboticsmobility
64.4B
AgentAI Agents

Azure AI Agent Service

by Microsoft

An enterprise-grade platform from Microsoft for building, deploying, and managing sophisticated AI agents. Built on the Copilot stack, it allows developers to create agents that can reason, use tools, and orchestrate complex tasks. The service features deep integration with Microsoft services and robust responsible AI controls.

agent-platformazureenterprise-ai
64.3B
AgentAI Agents

Ad Copy Generator

by Copy.ai

An AI agent designed for paid advertising that generates multiple headline and description variants to boost click-through rates. It analyzes product data, target personas, and landing pages to create optimized copy for Google Ads, Meta, and LinkedIn, ensuring strong message-to-market alignment.

marketingadvertisingcopywriting
64.2B
AgentAI Tools & APIs

MetaGPT

by DeepWisdom

MetaGPT is an open-source multi-agent framework that automates software development by simulating a virtual company. It assigns distinct roles like product manager, architect, and engineer to different LLM agents. Starting from a single-line requirement, it follows Standardized Operating Procedures (SOPs) to generate comprehensive outputs, including user stories, system designs, diagrams, and executable code.

multi-agentopen-sourcesoftware-development
64B
AgentAI Agents

Customer Feedback Analyzer

by Medallia

A continuous feedback intelligence agent that ingests NPS surveys, review platforms, support tickets, and social mentions to extract structured voice-of-customer insights. It applies aspect-level sentiment analysis to surface product and service themes and auto-generates prioritized improvement briefs for product and operations teams.

customer-servicefeedbacksentiment-analysis
64B
AgentAI Agents

Player Analytics Agent

by GameAnalytics

A behavioral analytics agent that ingests player telemetry streams to build individual and cohort behavior models, predict churn risk, and surface liveops intervention opportunities. It continuously segments the player base by engagement and monetization propensity, feeding recommendations into targeted push notification, reward, and re-engagement campaign engines.

gaminganalyticsplayer-behavior
63.8B
AgentAI Agents

Literature Review Agent

by Elicit AI

An AI-powered agent designed to automate systematic literature reviews. It queries major academic databases like PubMed and arXiv to identify, screen, and synthesize evidence from thousands of papers. The agent produces structured outputs including evidence tables, meta-analysis plots, and PRISMA-compliant reports with bias assessments.

literature-reviewsystematic-reviewresearch-automation
63.6B
AgentAI Agents

Email Triage Agent

by Superhuman

An inbox intelligence agent that reads, categorizes, and prioritizes incoming emails by urgency and business intent, drafts context-aware reply suggestions, auto-responds to routine inquiries within configured policies, and escalates high-priority items with briefings to ensure nothing critical is missed. It learns communication preferences over time to continuously improve draft quality and routing accuracy.

enterpriseemailproductivity
63.6B
AgentAI for Code

Aider

by Paul Gauthier

AI pair programming tool in the terminal that works with any LLM to edit code in local git repositories. Features automatic git commits, multi-file editing, and voice coding with support for connecting to dozens of model providers.

coding-agentclipair-programming
63.5B
Agentfinance-foundry

Invoice Reconciler Agent

by AaaS

Ingests unstructured invoice data across wildly varying formats (multi-page PDFs, email attachments, CSV exports). Matches invoices deterministically against Purchase Orders and receipt logs in the ERP system, and authorizes payment for the established happy path — achieving 60-80% straight-through processing without human intervention. Exception invoices (mismatches, missing POs, duplicate detection) are routed to a human queue with full context. Every payment authorization generates an immutable audit trail.

financeaccounts-payableinvoice
63.4B
AgentAI Agents

Contract Review Agent

by Ironclad AI

An AI agent designed to automate the legal contract review process. It extracts key clauses from documents like NDAs, MSAs, and SOWs, comparing them against a pre-defined legal playbook to flag non-standard language. The agent scores risk levels by clause and can automatically generate redlines with preferred positions, accelerating review cycles.

legalcontract-reviewclm
63.4B
AgentAI for Code

Amazon Q Developer Agent

by Amazon Web Services

Amazon Q is an AI-powered developer agent from AWS that automates code transformations, feature implementation, and security remediation. It is deeply integrated with the AWS ecosystem, allowing it to understand project context, suggest relevant AWS services, and streamline cloud-native development workflows directly within the IDE.

coding-agentawsenterprise
63.4B
AgentAI Agents

Quality Inspection Agent

by Landing AI

An AI agent that uses computer vision to perform real-time quality inspections on manufacturing lines. It automatically detects, classifies, and logs surface defects, dimensional inaccuracies, and assembly errors at production speed, triggering alerts or reject mechanisms to prevent faulty products from proceeding.

quality-controlcomputer-visiondefect-detection
63.2B
AgentAI Agents

ChatDev

by OpenBMB

ChatDev is a virtual software company powered by multiple LLM agents that simulate a real-world development team. These agents, playing roles like CEO, programmer, and tester, collaborate to automate the entire software development lifecycle, from design and coding to testing, based on a single natural language prompt.

multi-agentsoftware-developmentcollaborative
63.2B
AgentAI Agents

Campaign Analytics Agent

by Northbeam

An autonomous AI agent that unifies campaign data from disparate marketing channels to provide a holistic view of performance. It leverages advanced multi-touch attribution models to calculate true ROI and delivers actionable recommendations for budget optimization. The agent automatically generates executive-level reports and issues real-time alerts for performance anomalies.

marketing-analyticsattribution-modelingroi-optimization
63.1B
AgentAI Agents

Expense Audit Agent

by AppZen

An AI agent that automates the auditing of employee expense reports. It uses OCR to extract data from receipts, then validates expenses against company policies, per-diem rates, and vendor lists. The agent flags violations and potential fraud, auto-approves compliant reports, and routes exceptions for human review.

enterprisefinanceexpense-management
63B
AgentAI for Code

OpenHands

by All Hands AI

OpenHands is an open-source platform for creating autonomous AI software agents. It offers a secure, sandboxed environment where agents can execute complex development tasks by writing code, running commands, browsing the web, and interacting with APIs. It supports multi-agent delegation for tackling intricate problems.

coding-agentopen-sourcesandboxed
62.9B
AgentAI Tools & APIs

Consensus

by Consensus NLP

Consensus is an AI-powered search engine designed to extract and synthesize findings directly from peer-reviewed scientific literature. It uses natural language processing to answer user questions with evidence-based conclusions, highlighting the general consensus from multiple studies and providing metrics on study quality.

research-agentacademic-searchevidence-based-medicine
62.7B
AgentSpeech & Audio AI

Vapi AI

by Vapi

Vapi AI is a developer-first platform for building and deploying real-time, conversational voice agents. It provides low-latency streaming, interruptible speech, and seamless integrations with various LLM, TTS, and STT providers. The platform is designed for developers to create sophisticated voice experiences with features like function calling and call analytics.

voice-aivoice-agentvoice-api
62.6B
AgentAI Infrastructure

GitLab Duo Agent

by GitLab

GitLab Duo is an AI-powered assistant integrated into the GitLab DevSecOps platform. It enhances developer productivity across the software development lifecycle by offering code suggestions, summarizing issues, explaining vulnerabilities, and generating tests, all within the native GitLab environment.

ai-assistantdevsecopscode-generation
62.6B
Agentrevenue-foundry

Intent Prospector Agent

by AaaS

Continuously tracks buyer intent signals across website visits, email engagement, CRM activity, and third-party intent data providers. Scores leads against Ideal Customer Profiles, drafts hyper-personalized outreach sequences based on behavioral signals, and books meetings directly on sales representatives' calendars. Replaces generic outreach spam with data-driven, compliant prospecting that respects CAN-SPAM and GDPR opt-out requirements.

salesprospectingintent-data
62.5B
AgentAI Agents

Logistics Routing Agent

by project44

An AI agent designed to solve complex vehicle routing problems (VRP) for logistics and supply chain operations. It optimizes multi-stop routes for entire fleets by considering constraints like time windows, vehicle capacity, and traffic. The agent dynamically reroutes in real-time to adapt to new orders, delays, or cancellations.

supply-chainlogisticsrouting
62.4B
AgentAI Tools & APIs

H2O AI Agent

by H2O.ai

H2O.ai offers an open-source and enterprise AutoML platform that automates the machine learning lifecycle. It excels at automated model training, interpretation, and deployment, supporting distributed computing for large datasets. The platform provides comprehensive model explainability features like SHAP values, making complex models transparent.

automlmachine-learningopen-source
62.3B
AgentAI Agents

Carbon Footprint Analyzer

by Persefoni

Calculates comprehensive Scope 1, 2, and 3 carbon emissions across the entire value chain. This ESG intelligence agent ingests diverse data like energy bills, travel records, and procurement data to generate audit-ready GHG inventory reports. It benchmarks performance and identifies key reduction opportunities, ensuring alignment with GHG Protocol standards.

esgsustainabilitycarbon-accounting
62.2B
Agentfinance-foundry

Fraud Isolator Agent

by AaaS

Continuously evaluates live transaction streams against episodic memories of individual user behavior patterns. Detects complex anomaly patterns that rule-based fraud detection systems miss due to high false-positive rates. Autonomously pauses suspicious transactions and triggers secure multi-factor escalation workflows. The agent can only pause and flag — it never approves or releases funds, ensuring a human always makes the final call on flagged transactions.

financefraudsecurity
62.1B
AgentAI Agents

SLA Monitor Agent

by PagerDuty

Monitors service-level agreement (SLA) compliance by tracking response and resolution times for all active tickets. The agent proactively alerts teams about potential SLA breaches, allowing them to act before a violation occurs. It can also automatically reprioritize ticket queues based on urgency and generates regular SLA performance reports for management.

sla-managementcustomer-serviceit-service-management
62B
AgentAI Agents

Fleet Management Agent

by Geotab

An operational intelligence agent for managing autonomous vehicle fleets. It optimizes asset utilization and uptime by intelligently dispatching vehicles to demand hotspots, scheduling predictive maintenance from telemetry data, and balancing charge levels for EVs. The agent provides a real-time control dashboard to surface anomalies for operations teams.

fleet-managementautonomous-vehiclesdispatch
62B
AgentAI Agents

Demand Forecasting Agent

by o9 Solutions

The Demand Forecasting Agent leverages machine learning to analyze diverse datasets, including historical sales, market trends, and external factors like weather or promotions. It produces accurate, SKU-level demand forecasts for various time horizons, enabling businesses to optimize inventory, reduce stockouts, and improve supply chain efficiency.

supply-chainforecastingdemand-planning
61.8B
AgentAI for Code

Codex CLI

by OpenAI

OpenAI's open-source CLI coding agent that operates in the terminal with sandboxed execution. Reads and edits files, runs commands, and supports multiple approval modes from suggest to full-auto.

coding-agentcliopenai
61.6B
Agentfinance-foundry

Cloud Cost Optimizer Agent

by AaaS

Holds persistent, long-term memory of historical cloud infrastructure utilization patterns. Autonomously monitors resource usage across regions and cloud providers, identifies idle or underutilized resources, right-sizes instances based on real traffic patterns, and executes cost-saving measures continuously — all without waiting for a human FinOps review. Operates within safety guardrails: never terminates production instances, enforces a 7-day cooldown before right-sizing, and logs all actions with rollback capability.

finopscloudcost-optimization
61.4B
AgentAI Agents

Architecture Review Agent

by Codescene

A senior-engineer-level agent that statically analyzes codebase architecture using dependency graphs, coupling metrics, and design pattern recognition to identify anti-patterns, circular dependencies, and violations of architectural fitness functions. It produces architectural decision records (ADRs), generates C4 model diagrams, and prioritizes refactoring opportunities by technical debt cost and business risk.

codingarchitecturereview
61.4B
Agentcustomer-success-foundry

Churn Prevention Agent

by AaaS

Tracks subtle drops in product usage, performs sentiment analysis on support tickets, and synthesizes disparate signals into churn risk scores for each account. Preemptively drafts retention plans, schedules proactive check-in calls, and highlights upselling opportunities — all before the critical renewal window opens. Transforms customer success from reactive firefighting into a data-driven retention engine.

churnretentioncustomer-success
61.3B
AgentAI Agents

Legal Document Drafter

by Harvey AI

An AI agent that automates the creation of legal documents by leveraging structured data, template libraries, and firm-specific style guides. It generates jurisdiction-compliant agreements, pleadings, and regulatory filings, incorporating precedents and flagging potential issues for attorney review. The system streamlines drafting workflows, ensuring consistency and accuracy.

legaldocument-draftinglegal-writing
61.2B
AgentAI Agents

Supplier Risk Agent

by Resilinc

An intelligence agent that continuously monitors supplier financial health, geopolitical exposure, ESG compliance, news sentiment, and delivery performance to generate dynamic risk scores for every vendor in the supply network. It alerts procurement teams to emerging threats and recommends dual-sourcing or buffer stock adjustments before disruptions materialize.

supply-chainrisk-managementsupplier
61.1B
AgentAI Agents

NPC Behavior Agent

by Inworld AI

An AI agent that uses reinforcement learning (RL) to generate dynamic NPC behaviors. Instead of relying on static scripts, it learns complex strategies through self-play and interaction, adapting its difficulty and tactics in real-time to match a player's skill level, ensuring a consistently challenging and unpredictable experience.

gamingnpcreinforcement-learning
61B
AgentAI Agents

Digital Twin Agent

by Ansys

This AI agent creates and manages high-fidelity virtual replicas of physical assets and processes. By synchronizing with real-time IoT data, it runs complex simulations to test changes, predict failures, and analyze what-if scenarios posed in natural language, enabling optimization before physical implementation.

digital-twinmanufacturingsimulation
60.9B
AgentAI Tools & APIs

GPT Researcher

by Tavily

Open-source autonomous research agent that conducts comprehensive web research on any topic. Generates detailed research reports by planning queries, scraping multiple sources, filtering information, and synthesizing findings with citations.

research-agentopen-sourceautonomous
60.8B
AgentAI Agents

Financial Statement Analyzer

by Visible Alpha

An AI-powered agent designed for systematic financial statement analysis. It automates the ingestion and parsing of corporate filings like 10-Ks and 10-Qs to compute key financial ratios, identify accounting anomalies, and benchmark performance against industry peers. The agent generates concise, investment-grade summaries highlighting financial health and potential risks.

financefinancial-analysisaccounting
60.8B
AgentAI Business & Strategy

Moveworks

by Moveworks

Moveworks is an enterprise AI copilot platform that automates employee support. It uses conversational AI to understand and resolve requests across IT, HR, finance, and other departments directly in collaboration tools like Slack and Microsoft Teams, reducing the need for manual intervention.

enterprise-aiit-support-automationemployee-experience
60.7B
AgentAI Agents

HR Screening Agent

by HireVue

An AI recruiting agent that screens resumes at scale, scores candidates against job description criteria, conducts asynchronous video interview analysis, and shortlists top applicants while flagging potential bias signals for human review. It integrates with ATS platforms to automate interview scheduling for shortlisted candidates and maintains a structured candidate evaluation audit trail.

enterpriseHRrecruitment
60.7B
Agentrevenue-foundry

Campaign Orchestrator Agent

by AaaS

Monitors live campaign performance across Google Ads, Meta, LinkedIn, and other advertising channels continuously. Automatically reallocates budgets to highest-performing channels, dynamically personalizes ad copy for different audience segments, and kills underperformers before they waste spend. Operates within configurable budget guardrails including max daily spend caps, minimum ROAS thresholds, and A/B test significance gates.

marketingcampaignadvertising
60.6B
AgentAI Tools & APIs

Galileo AI

by Galileo AI

Galileo AI is a design copilot that transforms natural language prompts into high-fidelity, editable UI designs. It generates complete screens, individual components, and custom illustrations directly within Figma, aiming to accelerate the design process by automating repetitive tasks and providing instant visual mockups.

design-agentui-generationfigma
60.6B
AgentAI Agents

Traffic Prediction Agent

by HERE Technologies

This agent specializes in spatio-temporal traffic forecasting, predicting conditions up to 30 minutes in advance for intersections and corridors. It processes data from V2X communications, vehicle telemetry, and infrastructure sensors. The predictions are designed for fleet routing engines to optimize ETAs and alleviate urban congestion.

traffic-predictionspatio-temporalforecasting
60.4B
AgentAI Tools & APIs

STORM (Stanford)

by Stanford NLP

STORM is an open-source AI research agent from Stanford University designed to automate the creation of comprehensive, Wikipedia-style articles. It simulates a human research process by generating diverse questions, searching the web for information, and synthesizing the findings into a well-structured, cited narrative based on a generated outline.

research-agentopen-sourcestanford
60.3B
AgentAI Tools & APIs

Tavily Research Agent

by Tavily

Tavily is a specialized search API designed for Large Language Models (LLMs) and AI agents. It provides real-time, fact-grounded web search results in a structured, clean format, eliminating the need for manual data cleaning. The API is optimized to deliver relevant, concise information, making it ideal for powering autonomous agents and RAG applications.

research-agentsearch-apireal-time
60.25B
AgentAI Agents

Knowledge Base Builder Agent

by Guru

This autonomous agent streamlines knowledge management by ingesting data from support tickets, chat logs, and documents. It automatically generates, updates, and deduplicates knowledge base articles, identifying content gaps by analyzing unanswered user queries. New drafts are created for human review, ensuring a constantly improving self-service resource.

knowledge-managementcustomer-serviceself-service
59.9C+
AgentAI for Code

Sourcegraph Cody

by Sourcegraph

Sourcegraph's AI coding assistant with deep codebase context powered by code graph intelligence. Understands entire repositories through code search, cross-references, and dependency analysis for highly accurate code generation and answers.

coding-agentcode-intelligencesourcegraph
59.9C+
AgentAI Business & Strategy

Ada AI

by Ada

Ada is an enterprise-grade conversational AI platform designed for automating customer service. Its no-code builder allows businesses to create and deploy AI agents across various digital channels, aiming to resolve a high percentage of customer inquiries without human intervention and providing seamless handoffs when needed.

conversational-aicustomer-supportchatbot-platform
59.9C+
SkillAI Tools & APIs

Transfer Learning

by Community

Leverages knowledge from a source domain to improve model performance on a target domain with limited labeled data. A foundational technique for reducing training costs and accelerating model development across diverse applications.

transfer-learningdomain-adaptationfine-tuning
78.2B+
SkillLLMs

Chain-of-Thought

by AaaS

Guides LLMs to produce step-by-step reasoning before arriving at a final answer. Dramatically improves performance on math, logic, and multi-step problems by making the model's reasoning process explicit and verifiable.

promptingreasoningchain-of-thought
76.6B+
SkillLLMs

Prompt Engineering

by AaaS

The foundational discipline of crafting effective prompts to elicit desired behaviors from language models. Covers system prompt design, instruction formatting, output structuring, temperature tuning, and iterative prompt refinement techniques.

promptingengineeringoptimization
76.5B+
SkillAI for Code

Code Generation

by AaaS

Generates functional code from natural language descriptions, specifications, or partial implementations. Covers multiple languages and frameworks with support for boilerplate scaffolding, algorithm implementation, and API integration patterns.

codinggenerationprogramming
75B+
SkillAI Agents

Function Calling

by AaaS

Enables LLMs to invoke external functions by generating structured JSON arguments matching defined schemas. Supports parallel function calls, error handling, and chained invocations for complex multi-step tool interactions.

function-callingtoolsstructured-output
73.7B+
SkillAI Tools & APIs

Collaborative Filtering

by Community

Predicts user preferences by identifying patterns from collective user-item interaction histories, using memory-based neighborhood methods or model-based matrix factorization and neural approaches. The backbone of recommendation systems at scale across e-commerce, streaming, and social platforms.

recommendationcollaborative-filteringmatrix-factorization
73.6B+
SkillLLMs

Few-Shot Learning

by AaaS

Teaches LLMs to perform tasks by providing a small number of input-output examples in the prompt. Enables rapid task adaptation without fine-tuning by demonstrating the desired pattern through carefully selected, representative examples.

promptingfew-shotexamples
73.5B+
SkillAI Agents

Tool Use

by AaaS

Equips AI agents with the ability to select and use appropriate tools from a defined toolkit to accomplish tasks. Covers tool selection logic, input marshalling, output interpretation, and fallback strategies when tools fail or return unexpected results.

toolsagentsintegration
72B+
SkillSpeech & Audio AI

Speech Recognition

by AaaS

Teaches integration and optimization of automatic speech recognition (ASR) systems — from Whisper to streaming cloud APIs — for agentic voice pipelines. Covers language identification, word error rate reduction, punctuation restoration, and handling noisy audio environments.

asrwhispertranscription
71.9B+
SkillAI Tools & APIs

Time-Series Forecasting

by Community

Predicts future values of sequential, time-indexed data using classical statistical models (ARIMA, ETS), gradient boosting (LightGBM, XGBoost), and deep learning architectures (Transformers, N-BEATS, TFT). Handles trend, seasonality, exogenous covariates, and uncertainty quantification.

time-seriesforecastingtemporal
71.3B+
SkillAI Tools & APIs

Domain-Specific Fine-Tuning

by Community

Adapts a general-purpose pretrained model to a narrow domain by continuing training on curated domain corpora or instruction datasets. Produces specialized models that outperform generalist baselines on domain-specific benchmarks while preserving broad language understanding.

fine-tuningdomain-adaptationllm
71.2B+
SkillAI for Code

Code Review

by AaaS

Analyzes code for bugs, security vulnerabilities, performance issues, and style violations. Provides actionable feedback with severity levels and suggested fixes aligned to language-specific best practices and project conventions.

codingreviewquality
71B+
SkillAI Tools & APIs

Hybrid Recommendation Systems

by Community

Combines collaborative filtering and content-based signals — along with contextual, knowledge-graph, and session-based features — into unified ranking models that outperform single-strategy approaches. Modern implementations use two-tower neural architectures for efficient retrieval followed by cross-attention reranking.

recommendationhybridensemble
70.8B+
SkillAI Tools & APIs

Graph Neural Networks

by Community

Applies deep learning directly to graph-structured data by passing and aggregating messages between connected nodes across multiple layers, enabling node classification, link prediction, and graph-level tasks. Powers state-of-the-art knowledge graph completion, molecular property prediction, and social network analysis.

GNNgraph-learningnode-classification
70.6B+
SkillAI Tools & APIs

Reinforcement Learning for Control

by Community

Trains control policies for autonomous systems through environment interaction and reward signals using model-free (PPO, SAC, TD3) and model-based (MBPO, Dreamer) RL algorithms. Enables superhuman performance in complex continuous control tasks from locomotion to manipulation.

reinforcement-learningcontrolautonomous-systems
69.9B
SkillLLMs

Summarization

by AaaS

Condenses long documents into concise summaries while preserving key information and maintaining factual accuracy. Supports extractive, abstractive, and hierarchical summarization with configurable length, style, and focus area parameters.

summarizationcondensationnlp
69.8B
SkillAI Tools & APIs

Anomaly Detection

by Community

Identifies unusual patterns, outliers, and change points in time-series and tabular data using statistical, density-based, isolation forest, autoencoder, and transformer-based methods. Fundamental for operational monitoring, fraud detection, and predictive maintenance systems.

anomaly-detectiontime-seriesoutlier-detection
69.4B
SkillLLMs

RAG Retrieval

by AaaS

A technique that enhances large language models by dynamically retrieving relevant information from an external knowledge base. This process grounds the model's responses in factual data, reducing hallucinations and enabling it to answer questions about information not present in its original training data.

ragretrieval-augmented-generationllm
68.3B
SkillComputer Vision

Object Detection

by AaaS

A core computer vision skill that enables agents to identify and locate objects within an image or video stream. By predicting bounding boxes and class labels for each object, this skill forms the foundation for environmental understanding. It is crucial for applications requiring spatial awareness, from autonomous navigation to automated inspection.

computer-visionobject-detectionbounding-box
68.3B
SkillAI for Code

Code Debugging

by AaaS

Diagnoses and resolves software bugs by analyzing error messages, stack traces, and code behavior. Applies systematic debugging strategies including root cause analysis, state inspection, and targeted fix generation with regression awareness.

debuggingtroubleshootingerror-analysis
68.1B
SkillLLMs

Semantic Search

by AaaS

Enables meaning-based retrieval by converting queries and documents into dense vector representations and finding nearest neighbors. Foundational skill for any RAG pipeline or knowledge-base-powered agent.

searchembeddingssimilarity
67.6B
SkillAI Tools & APIs

Federated Learning

by Community

A machine learning technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging the data itself. It enables collaborative model training by aggregating locally computed updates, thereby preserving data privacy, security, and sovereignty.

federated-learningprivacy-preserving-mldistributed-training
67.5B
SkillAI Tools & APIs

Content-Based Recommendation

by Community

Recommends items by matching item feature profiles to user preference profiles derived from their interaction history, using TF-IDF, embeddings, and semantic similarity techniques. Effective for cold-start scenarios where user interaction data is sparse and item metadata is rich.

recommendationcontent-baseditem-features
67.5B
SkillLLMs

Text Classification

by AaaS

Automates the categorization of text into predefined classes. This skill leverages large language models to perform zero-shot and multi-label classification, eliminating the need for extensive training data. It can analyze documents, user feedback, or social media posts, assigning relevant labels from a simple list or a complex hierarchical taxonomy.

text-classificationnlpcategorization
67.3B
SkillAI Tools & APIs

Path Planning

by Community

Path Planning is a fundamental capability in robotics and autonomous systems that computes a collision-free geometric path from a start to a goal configuration. It operates within a system's configuration space, using algorithms like A* or RRT to find optimal or feasible routes, distinct from motion planning which also considers dynamics like velocity and acceleration.

path-planningroboticsmotion-planning
67.3B
SkillComputer Vision

Visual Question Answering

by AaaS

Enables agents to answer free-form natural language questions about images by grounding language in visual features. Covers prompt construction for vision-language models, chain-of-thought visual reasoning, and failure modes such as hallucination and spatial confusion.

vqavision-languagemultimodal
66.8B
SkillAI Tools & APIs

Differential Privacy

by Community

Provides mathematically rigorous privacy guarantees by adding calibrated noise to query outputs or model gradients, ensuring individual data points cannot be inferred from published statistics or trained models. The de facto standard for privacy-preserving data analysis and compliant ML training.

privacydifferential-privacynoise-injection
66.8B
SkillLLMs

Embedding Generation

by AaaS

Generates dense vector embeddings from text, images, or other data types for use in similarity search, clustering, and classification. Covers model selection, batch processing, dimensionality considerations, and normalization strategies for optimal retrieval performance.

embeddingsvectorsrepresentation
66.4B
SkillAI for Code

Test Generation

by AaaS

Automates the creation of test suites by analyzing source code, function signatures, or specifications. It generates unit tests, integration tests, and edge case scenarios for popular frameworks, complete with necessary mocks and assertions. This accelerates development cycles and improves code reliability.

test-generationautomated-testingunit-testing
66.3B
SkillLLMs

ReAct Prompting

by AaaS

Implements the Reasoning + Acting (ReAct) paradigm where LLMs alternate between thinking steps and action steps. The model reasons about what to do next, takes an action (like searching or computing), observes the result, and continues reasoning until the task is complete.

promptingreactreasoning
66.3B
SkillAI Tools & APIs

Sensor Fusion

by Community

Combines data from multiple heterogeneous sensors — cameras, LiDAR, radar, GPS, IMU — using probabilistic filters and deep learning to produce a unified, accurate state estimate of the environment. Foundational for autonomous vehicles, drones, and any robot requiring robust situational awareness.

autonomous-systemssensor-fusionKalman-filter
66B
SkillLLMs

Fine-Tuning

by AaaS

Adapts pre-trained language models to specific domains, tasks, or styles through additional training on curated datasets. Covers full fine-tuning, parameter-efficient methods like LoRA and QLoRA, and best practices for dataset preparation, hyperparameter selection, and evaluation.

trainingfine-tuningadaptation
66B
Skilldevops-foundry

Anomaly Detection

by AaaS

Identifies deviations from normal system behavior across time-series telemetry data (CPU, memory, latency, error rates, request volumes). Uses statistical methods (z-score, IQR) and learned baselines to distinguish genuine anomalies from expected variance. A critical cross-foundry skill reused by SRE (F1), Fraud Detection (F6), and Supply Chain (F8) agents.

anomalydetectiontelemetry
66B
SkillAI for Code

Code Refactoring

by AaaS

Code Refactoring is the disciplined process of restructuring existing computer code without altering its external behavior. It focuses on enhancing nonfunctional attributes like readability, maintainability, and performance. This practice is key to managing technical debt, applying design patterns, and modernizing legacy systems to align with current best practices.

refactoringclean-codedesign-patterns
65.4B
SkillLLMs

Translation

by AaaS

Provides the ability to translate text from a source language to a target language. It aims to preserve the original meaning, tone, and cultural context. The skill supports domain-specific terminology for fields like legal or medical, allows for register control between formal and informal language, and handles idiomatic expressions with contextually appropriate equivalents.

translationmultilinguallocalization
65.2B
SkillAI Tools & APIs

Synthetic Data Generation

by Community

A process for creating artificial data that mimics the statistical properties and patterns of real-world datasets. It employs techniques like GANs, VAEs, and diffusion models to generate new data points, addressing issues of data scarcity, privacy, and imbalance. This enables robust model training and testing where real data is unavailable or sensitive.

synthetic-datadata-augmentationgenerative-ai
65.2B
SkillAI Tools & APIs

Robot Perception

by Community

Enables robots to interpret their surroundings by processing and fusing data from sensors like cameras, LiDAR, and IMUs. This capability allows machines to build environmental models, detect and track objects, and determine their own position and orientation (localization). It is a cornerstone of autonomous navigation and interaction.

roboticsperceptioncomputer-vision
65.1B
SkillComputer Vision

OCR Pipeline

by AaaS

Builds end-to-end pipelines for extracting structured text from images, scanned documents, and PDFs using OCR engines combined with layout analysis. Teaches preprocessing, engine selection (Tesseract, PaddleOCR, Google Document AI), post-correction, and handoff to language models for structured extraction.

ocrdocument-parsingtext-extraction
65.1B
SkillAI Tools & APIs

Motion Planning

by Community

Motion Planning is the process of generating a valid trajectory for an autonomous system, such as a robot arm or self-driving car, from a starting state to a desired goal state. It computes a collision-free path that respects the system's kinematic and dynamic constraints, effectively bridging perception with physical action.

motion-planningroboticspath-planning
65B
SkillAI Tools & APIs

Knowledge Graph Construction

by Community

Builds structured knowledge graphs from unstructured text and semi-structured sources through entity recognition, relation extraction, coreference resolution, and entity linking. The resulting graphs power question answering, search, recommendation, and reasoning applications.

knowledge-graphinformation-extractionNLP
64.7B
SkillComputer Vision

Image Generation Prompting

by AaaS

Master structured prompting for text-to-image diffusion models like Stable Diffusion and Midjourney. Learn to control style, composition, and quality using techniques such as negative prompting, LoRA weights, and iterative refinement. This skill enables the programmatic generation of consistent, on-brand imagery at scale.

image-generationdiffusionprompt-craft
64.6B
SkillLLMs

Prompt Chaining

by AaaS

Prompt Chaining is a technique for executing complex tasks by breaking them into a sequence of smaller, interconnected prompts. The output from one large language model (LLM) call serves as the input for the next, creating a multi-step workflow. This method enables more sophisticated reasoning, state management, and integration with external tools.

prompt-engineeringchainingllm-orchestration
64.3B
SkillLLMs

Hybrid Search

by AaaS

Hybrid search enhances information retrieval by merging the results of two distinct search methods: dense vector search for semantic understanding and sparse keyword search (like BM25) for lexical precision. This dual approach ensures that search results are not only contextually relevant but also capture exact term matches, significantly improving recall and relevance across diverse and complex queries.

ragsearchhybrid-search
63.8B
SkillLLMs

Data Extraction

by AaaS

Data Extraction is the process of automatically identifying and pulling structured information from unstructured or semi-structured sources like documents, web pages, and text. It uses NLP and computer vision to parse content into a predefined schema, enabling data to be used in databases, analytics, and automated workflows.

data-extractionstructured-dataparsing
63.8B
SkillAI Tools & APIs

Few-Shot Domain Adaptation

by Community

Adapts models to new target domains using only a handful of labeled examples, combining meta-learning, prompt engineering, and prototype-based methods. Critical for enterprise deployments where labeled data is scarce or expensive to acquire.

few-shotdomain-adaptationmeta-learning
63.5B
Skillcustomer-success-foundry

CRM Data Retrieval

by AaaS

Queries CRM systems to retrieve customer account data, ticket history, subscription status, and interaction logs. Provides the customer context foundation that support, churn, and sales agents depend on for personalized actions.

crmdata-accesssalesforce
63.5B
SkillLLMs

Document Chunking

by AaaS

Splits large documents into semantically coherent chunks optimized for embedding and retrieval. Supports recursive, semantic, and sentence-based splitting strategies with configurable overlap and size parameters.

ragchunkingpreprocessing
63.45B
SkillAI Agents

Multi-Step Reasoning

by AaaS

A core AI capability that enables agents to break down complex queries into a sequence of manageable, logical steps. By generating intermediate thoughts and verifying them, this process mimics human reasoning to solve problems that require planning, deduction, and synthesis of information over multiple stages.

multi-step-reasoningchain-of-thoughttree-of-thoughts
63.2B
SkillAI Tools & APIs

Active Learning

by Community

Active Learning is a machine learning technique that intelligently selects the most informative data points from a large pool of unlabeled data to be labeled by a human annotator. By prioritizing examples where the model is most uncertain, it aims to achieve higher model accuracy with significantly fewer labeled samples, reducing annotation costs and time.

active-learningdata-labelingannotation
63.2B
Skilldevops-foundry

Log Analysis

by AaaS

Parses, correlates, and summarizes structured and unstructured log streams from multiple sources (application logs, system logs, CI/CD logs). Identifies error patterns, correlates events across distributed services using trace IDs, and extracts actionable insights from high-volume log data. A foundational skill reused across DevOps, SRE, and security agents.

logsanalysisparsing
63.1B
SkillComputer Vision

Image Segmentation

by AaaS

Covers semantic, instance, and panoptic segmentation techniques that enable agents to produce pixel-level masks for scene understanding. Includes practical guidance on using SAM 2, Mask R-CNN, and integrating segmentation outputs into multimodal pipelines.

visionsegmentationSAM
62.7B
SkillAI Tools & APIs

Feature Attribution

by AaaS

This skill involves computing and communicating which input features most influenced a model's prediction. It leverages methods like SHAP, LIME, and Integrated Gradients for tabular, text, and image data. The core focus is on generating local and global explanations and presenting them visually for both technical and non-technical audiences.

xaiexplainable-aiinterpretability
62.6B
SkillAI Tools & APIs

Causal Effect Estimation

by Community

Causal Effect Estimation quantifies the true impact of an action or intervention by analyzing observational data. It moves beyond simple correlation to isolate causality using statistical methods, which is crucial for evaluating policies, business strategies, and medical treatments where A/B tests are infeasible.

causal-inferencecausal-effect-estimationaverage-treatment-effect
62.3B
SkillAI Agents

Planning

by AaaS

Enables agents to create structured execution plans for multi-step tasks by analyzing goals, identifying sub-tasks, ordering dependencies, and allocating resources. Supports plan revision when steps fail or new information emerges during execution.

planningstrategytask-management
62.2B
SkillLLMs

Named Entity Recognition

by AaaS

Identifies and classifies named entities (people, organizations, locations, dates, etc.) within unstructured text. Supports custom entity types, relationship extraction between entities, and structured output formatting for downstream processing.

nerentity-extractionnlp
62.2B
SkillAI Tools & APIs

Sim-to-Real Transfer

by Community

Sim-to-Real Transfer is a set of techniques used in robotics and AI to bridge the 'reality gap' between simulation and the real world. It enables models and control policies trained in a virtual environment to be deployed effectively on physical hardware, drastically reducing the need for costly, time-consuming, and potentially unsafe real-world data collection.

sim-to-realroboticsreinforcement-learning
62.1B
SkillAI Agents

Multi-Agent Coordination

by AaaS

Multi-Agent Coordination involves designing systems where multiple autonomous agents collaborate to achieve a common goal. This skill encompasses architectural patterns like hierarchical supervision and peer-to-peer negotiation for task distribution and conflict resolution. It focuses on managing shared information and ensuring coherent collective action in complex, dynamic environments.

multi-agentorchestrationcoordination
62B
SkillAI Infrastructure

Streaming Responses

by AaaS

This skill involves implementing real-time, token-by-token data delivery from Large Language Models to end-users. It utilizes protocols like Server-Sent Events (SSE) or WebSockets to create interactive and responsive applications, such as chatbots or code assistants, by progressively displaying content as it's generated.

streamingssewebsockets
61.9B
SkillLLMs

Reranking

by AaaS

Applies a cross-encoder or LLM-based reranker to refine initial retrieval results by scoring query-document pairs for relevance. Dramatically improves precision by promoting the most contextually relevant passages to the top of the result set.

ragrerankingrelevance
61.3B
Skillfinance-foundry

OCR Extraction

by AaaS

Extracts structured data from unstructured documents (PDFs, scanned images, email attachments) using optical character recognition with layout-aware parsing. Handles multi-page invoices, varying formats, and poor scan quality — producing structured key-value pairs for downstream reconciliation.

ocrdocument-processinginvoice
61.3B
Skilldevops-foundry

Escalation Routing

by AaaS

Routes unresolved or high-risk incidents to the appropriate human responder with full diagnostic context. Determines escalation urgency (P1-P5), identifies the correct on-call engineer or team based on service ownership, and packages a complete incident summary (timeline, diagnostics run, hypothesis). A cross-foundry skill reused by Customer Success (F4) and Healthcare (F9) agents.

escalationroutingincident
61.3B
SkillAI Ethics & Safety

Content Filtering

by AaaS

A system that automatically screens text inputs and outputs for large language models (LLMs) to detect and manage harmful content. It uses multi-category classification to identify issues like toxicity, hate speech, and violence, applying configurable rules and thresholds to enforce safety policies and protect users.

content-moderationai-safetytrust-and-safety
61.2B
SkillAI for Code

Code Explanation

by AaaS

Provides detailed, multi-level explanations for code snippets, functions, or entire repositories. It breaks down complex algorithms, clarifies control flow, and describes the purpose of variables and dependencies. The skill supports numerous programming languages, generating documentation-style overviews or granular, line-by-line analyses to accelerate learning and code reviews.

explanationunderstandingdocumentation
61.1B
SkillAI Agents

Web Browsing

by AaaS

Empowers autonomous agents to interact with the web like a human user. This skill provides the core functionality to navigate to URLs, render pages including executing JavaScript, and parse DOM elements. It enables complex workflows such as filling out forms, clicking buttons, and extracting structured data for analysis or task completion.

browsingwebnavigation
60.8B
SkillAI Tools & APIs

Continual Learning

by Community

A machine learning paradigm enabling models to learn sequentially from a continuous stream of data without forgetting previously acquired knowledge. Continual Learning, or Lifelong Learning, directly addresses the problem of catastrophic forgetting in neural networks using methods like regularization, memory replay, and dynamic architectures.

continual-learninglifelong-learningcatastrophic-forgetting
60.8B
SkillAI Ethics & Safety

Prompt Injection Defense

by AaaS

Detects and mitigates prompt injection attacks where malicious inputs attempt to override system instructions or extract sensitive information. Implements input sanitization, instruction hierarchy enforcement, and output monitoring to protect LLM-powered applications.

securityprompt-injectiondefense
60.7B
SkillAI Agents

Agentic RAG

by AaaS

Agentic RAG transforms Retrieval-Augmented Generation from a static, single-step process into a dynamic, multi-step workflow. In this paradigm, an LLM-powered agent intelligently decides when to retrieve information, what queries to use, and whether to perform additional retrieval cycles, often using external tools to refine its approach.

ragagentictool-use
60.7B
SkillAI for Code

SQL Generation

by AaaS

Converts natural language questions into executable SQL queries against relational databases. Supports schema-aware generation, multi-table joins, aggregations, and query optimization with dialect-specific syntax for PostgreSQL, MySQL, SQLite, and others.

sqldatabasequery-generation
60.5B
Skillcustomer-success-foundry

Sentiment Analysis

by AaaS

Classifies the emotional tone and sentiment polarity of customer text communications — support tickets, survey responses, chat logs, and social mentions. Produces sentiment scores with confidence levels, enabling churn prevention and coaching agents to identify dissatisfied accounts before explicit complaints surface.

nlpsentimentcustomer-feedback
60.3B
Skillcustomer-success-foundry

Knowledge Retrieval

by AaaS

Retrieves relevant articles, documentation, and policy information from knowledge bases in response to real-time queries. Uses hybrid search (keyword + semantic) with cross-encoder reranking to surface the most contextually appropriate content for support and coaching agents.

ragknowledge-baseretrieval
60.3B
Skillcustomer-success-foundry

Ticket Routing

by AaaS

Classifies support tickets by category, urgency, and required expertise, then routes them to the correct queue or human agent. Handles auto-resolution for simple cases and escalates complex ones with full context summaries.

supportroutingtriage
60.2B
Skilldevops-foundry

Deployment Monitoring

by AaaS

Continuously observes deployment pipelines and post-deploy health metrics. Detects anomalous deployment patterns (elevated error rates, latency spikes, failed health checks) within seconds of release. Integrates with canary and blue-green deployment strategies to provide real-time go/no-go signals based on configurable thresholds.

ci-cdmonitoringdeployment
60.2B
Skillrevenue-foundry

Lead Scoring

by AaaS

Assigns numerical scores to leads based on demographic fit, firmographic match, behavioral engagement, and intent signals. Enables agents to rank prospects by conversion likelihood and route high-scoring leads to immediate outreach while nurturing lower-scoring ones.

salesscoringqualification
59.8C+
SkillAI Infrastructure

Context Window Optimization

by AaaS

A set of techniques for managing the limited memory (context window) of Large Language Models. It involves strategically structuring prompts, summarizing or pruning conversation history, and selectively including relevant information to ensure efficient, cost-effective, and coherent long-form interactions with an AI.

context-windowoptimizationtoken-management
59.6C+
SkillAI Ethics & Safety

PII Detection

by AaaS

Identifies and flags personally identifiable information (PII) in text data, including names, addresses, phone numbers, SSNs, and financial details. Supports configurable sensitivity levels, redaction strategies, and compliance reporting for GDPR, HIPAA, and CCPA requirements.

piiprivacydetection
59.5C+
SkillAI Tools & APIs

Entity Resolution

by Community

Identifies and merges records across heterogeneous data sources that refer to the same real-world entity, using blocking, similarity scoring, and classification models to scale to large corpora. Critical for maintaining knowledge graph integrity and enabling cross-source analytics.

entity-resolutionrecord-linkagededuplication
59.5C+
SkillAI for Code

Documentation Generation

by AaaS

Generates technical documentation from source code, including API references, README files, inline comments, and architectural guides. Adapts tone and detail level for different audiences from developer guides to end-user documentation.

documentationdocsapi-docs
59.4C+
Skilldevops-foundry

Threshold Detection

by AaaS

Evaluates real-time metrics against configurable thresholds (SLOs, SLIs, error budgets) and triggers appropriate responses. Supports static thresholds, dynamic baselines, and anomaly-based detection. Distinguishes between noise and genuine threshold breaches using historical context and burn-rate analysis.

monitoringalertingthresholds
59.2C+
SkillAI Tools & APIs

Causal Discovery

by Community

Causal Discovery is a subfield of AI that infers causal relationships from observational data. It constructs a Directed Acyclic Graph (DAG) to represent these cause-and-effect links without manual intervention or controlled experiments, using statistical algorithms to distinguish correlation from causation.

causal-inferencecausal-discoverydag
59.1C+
SkillAI Agents

Agent Memory Systems

by AaaS

Teaches design and implementation of multi-tier agent memory architectures — in-context working memory, episodic memory via vector stores, and semantic memory via knowledge graphs — enabling agents to maintain coherent state across long-running tasks and sessions. Covers retrieval-augmented memory, memory consolidation, and forgetting strategies.

memorylong-termepisodic
59C+
SkillAI Tools & APIs

Structured Output RAG

by AaaS

This skill involves building Retrieval-Augmented Generation (RAG) systems that output structured data, like JSON, conforming to a predefined schema. Instead of unreliable free-form text, it uses techniques like constrained decoding and validation to ensure outputs are machine-readable and ready for direct use in APIs or databases.

structured-outputragjson
58.9C+
SkillAI Agents

Reflection

by AaaS

Allows agents to evaluate their own outputs, identify errors or weaknesses, and iteratively improve responses. Implements self-critique loops where the agent reviews its work against quality criteria and refines until standards are met.

reflectionself-evaluationmetacognition
58.9C+
Skillcustomer-success-foundry

Refund Processing

by AaaS

Processes customer refunds through payment gateway APIs within configurable monetary caps. Enforces refund policies (max per transaction, max per day, cooling periods) and generates immutable audit trails for every refund action. Escalates requests above caps to human approval.

refundpaymentprocessing
58.7C+
SkillAI Agents

Memory Management

by AaaS

Enables AI agents to maintain state and context across multiple interactions by managing short-term and long-term memory. This is crucial for creating coherent, personalized experiences, moving beyond stateless request-response models. It uses techniques like conversation buffers, summarization, and vector-based retrieval.

memory-managementcontext-retentionstateful-ai
58.7C+
SkillAI Tools & APIs

Web Scraping

by AaaS

Web scraping automates the extraction of large amounts of data from websites. By simulating human browsing, it can crawl through pages, parse HTML, and collect specific information like prices, contacts, or articles, transforming unstructured web content into structured data for analysis or other applications.

web-scrapingdata-extractionweb-crawling
58.6C+
SkillAI Agents

Autonomous Planning

by AaaS

Autonomous Planning enables AI agents to independently decompose high-level, long-horizon objectives into a structured graph of executable sub-tasks. It involves generating plans using classical (PDDL), LLM-based, or hybrid methods, estimating necessary resources, and dynamically replanning in response to execution failures or new environmental data.

autonomous-planningai-agentsgoal-decomposition
58.6C+
Skillcustomer-success-foundry

Usage Trend Analysis

by AaaS

Tracks product usage patterns over time — login frequency, feature adoption, session duration, and activity drops. Identifies accounts showing declining engagement that correlate with churn risk, enabling proactive retention before the customer disengages.

analyticsusagetrends
58.5C+
Skilldevops-foundry

Telemetry Analysis

by AaaS

Ingests and analyzes telemetry data (metrics, traces, spans) from distributed systems. Correlates performance data across service boundaries using distributed tracing, identifies bottleneck services, and produces latency breakdowns. Provides the observability foundation that SRE Triage and Latency Budget Planner agents depend on.

telemetrymetricstraces
58.5C+
Skillfinance-foundry

PO Matching

by AaaS

Matches extracted invoice data against Purchase Orders and receipt logs in ERP systems using deterministic matching rules (PO number, vendor, amount, line items). Handles partial matches, tolerance thresholds, and multi-line reconciliation. Routes exceptions to human queues with full mismatch details.

accounts-payablepurchase-ordermatching
58.5C+
Skillpeople-foundry

Calendar Negotiation

by AaaS

Accesses multiple participants' calendars simultaneously and finds optimal meeting times across time zones, working hours, and scheduling constraints. Handles rescheduling, cancellations, and conflict resolution autonomously.

schedulingcalendarcoordination
58.5C+
Skillfinance-foundry

Approval Workflow

by AaaS

Routes transactions, documents, and exceptions through configurable multi-step approval chains based on amount thresholds, risk levels, and organizational policies. Tracks approver actions with timestamps, sends reminders for pending items, and escalates stalled approvals — ensuring no payment or commitment is authorized without the required sign-offs.

workflowapprovalsrouting
58.5C+
Skilldevops-foundry

Pull Request Generation

by AaaS

Generates complete, well-structured pull requests including: descriptive title, detailed body with change rationale, test results summary, dependency diff, and reviewer assignments. Follows the organization's PR template and conventional commit conventions. Produces PRs that human reviewers can approve quickly because all context is pre-packaged.

pull-requestautomationgit
58.3C+
SkillLLMs

Constitutional AI

by AaaS

Applies Anthropic's Constitutional AI principles to self-supervise model outputs against a set of defined rules or principles. The model critiques and revises its own responses to ensure they align with safety guidelines, ethical principles, and quality standards.

safetyalignmentconstitutional
58C+
SkillAI Ethics & Safety

Output Validation

by AaaS

Validates LLM outputs against expected schemas, formats, and quality criteria before delivery to end users. Implements JSON schema validation, hallucination checks, citation verification, and automated retry logic for outputs that fail validation.

validationoutput-qualityschema-validation
57.7C+
SkillAI Tools & APIs

Counterfactual Reasoning

by Community

Generates and evaluates counterfactual explanations — minimal input changes that would alter a model's prediction — using structural causal models and algorithmic recourse techniques. Provides actionable explanations for model decisions and supports causal effect estimation under interventions.

causal-inferencecounterfactualsexplainability
57.6C+
Skillrevenue-foundry

Personalized Outreach

by AaaS

Drafts hyper-personalized outreach messages for each prospect using their specific firmographic profile, recent intent signals, and ICP match factors. Enforces brand voice and CAN-SPAM/GDPR compliance, adapts tone by channel (email, LinkedIn, phone script), and graduates from human-approved to autonomous sending as trust is established.

salesoutreachpersonalization
57.5C+
Skilldevops-foundry

Dependency Mapping

by AaaS

Constructs complete dependency graphs across package managers (npm, pip, cargo, Maven) and internal modules. Identifies version conflicts, circular dependencies, security-vulnerable transitive dependencies, and upgrade paths. Produces actionable dependency health reports that inform both the Codebase Architect and Dependency Guardian agents.

dependenciesgraphpackage-management
57.5C+
Skillrevenue-foundry

Buyer Intent Tracking

by AaaS

Monitors buyer intent signals across website visits, email opens, content downloads, CRM activity, and third-party intent data providers. Correlates engagement patterns to identify accounts showing active buying behavior, enabling agents to prioritize high-intent prospects over cold outreach.

salesintent-datasignals
57.5C+
SkillSpeech & Audio AI

Speaker Diarization

by AaaS

Enables agents to segment audio recordings by speaker identity, answering 'who spoke when' for downstream summarization and analysis tasks. Covers embedding-based clustering (pyannote.audio, NeMo), overlapping speech handling, and merging diarization with ASR transcripts.

diarizationspeaker-idaudio
57.4C+
Skilldevops-foundry

Rollback Execution

by AaaS

Executes safe, policy-constrained rollbacks of failed deployments. Respects blast-radius limits (max affected services), rate limits (max rollbacks per hour), and change-window constraints. Supports multiple rollback strategies: Git revert, container image pinning, feature flag disabling, and traffic shifting. Produces a detailed rollback report with root cause hypothesis.

rollbackdeploymentrecovery
57.4C+
Scriptai-scripts

Hugging Face Transformers Training Script

by Hugging Face

The Hugging Face Transformers training script simplifies the process of training and fine-tuning transformer models for various NLP tasks. It provides a high-level API and pre-built training loops, enabling users to quickly adapt pre-trained models to their specific datasets and objectives.

transformersnlptraining
91.8A+
Scriptai-scripts

PyTorch Image Classification Script

by PyTorch

A Python script using PyTorch for training and evaluating image classification models. It provides a modular structure for defining datasets, models, training loops, and evaluation metrics, enabling researchers and practitioners to quickly prototype and deploy image classification solutions.

image classificationpytorchdeep learning
89.8A
Scriptai-scripts

TensorFlow Model Garden

by Google

The TensorFlow Model Garden is a repository containing a collection of example implementations for state-of-the-art (SOTA) machine learning models and modeling solutions for TensorFlow. It provides a wide variety of models, pre-trained weights, and scripts to help users quickly prototype and deploy TensorFlow-based AI solutions.

tensorflowmodelsmachine-learning
87.2A
Scriptai-scripts

TensorFlow Model Optimization Toolkit Script

by Google

The TensorFlow Model Optimization Toolkit script provides tools and techniques to optimize TensorFlow models for deployment, including quantization, pruning, and clustering. It reduces model size and improves inference speed, making models more suitable for edge devices and resource-constrained environments.

tensorflowmodel-optimizationquantization
86.2A
Scriptai-scripts

Scikit-learn Model Evaluation Script

by Scikit-learn

A Python script leveraging scikit-learn to comprehensively evaluate machine learning models. It calculates various performance metrics (e.g., accuracy, precision, recall, F1-score, AUC) and generates visualizations (e.g., confusion matrices, ROC curves) to provide insights into model behavior and facilitate informed decision-making.

model evaluationscikit-learnmachine learning
85.1A
Scriptai-scripts

LangChain Expression Language (LCEL) Script

by LangChain

LCEL is a declarative way to compose chains of language models and other primitives in LangChain. This script demonstrates how to use LCEL to build complex AI pipelines with features like streaming, parallel execution, and retry mechanisms, enabling developers to create robust and scalable AI applications.

langchainchainingexpression language
84.7A
Scriptai-scripts

Stable Diffusion XL Turbo Inference Script

by Stability AI

This script provides a streamlined method for performing image generation using Stable Diffusion XL Turbo. It leverages optimized inference techniques to achieve faster generation speeds, making it suitable for real-time applications and interactive experiences.

image generationdiffusion modelinference
82.8A
ScriptSpeech & Audio AI

Speech-to-Text Pipeline

by OpenAI

Production-grade ASR pipeline using OpenAI Whisper or faster-whisper with VAD-based chunking, speaker timestamp alignment, and SRT/VTT subtitle export. Handles long-form audio via sliding window segmentation and automatic language detection.

speech-to-textwhispertranscription
71.4B+
ScriptComputer Vision

Object Detection Setup

by Ultralytics

Bootstraps a production-ready object detection workflow using YOLOv8 or RT-DETR, including webcam/video stream ingestion, NMS post-processing, and annotation overlay rendering. Outputs annotated frames and a structured JSON detections log suitable for downstream analytics.

object-detectionyolobounding-boxes
67.9B
ScriptAI for Code

Feature Importance Analyzer

by Community

Analyzes feature importance for scikit-learn compatible models using multiple advanced techniques. It computes SHAP values with Tree and Kernel Explainers, calculates permutation importance, and performs feature selection with Boruta. Results are compiled into an interactive HTML dashboard for easy interpretation and sharing.

feature-importanceshappermutation-importance
66.9B
ScriptAI for Code

REST AI API Template

by Community

Production-ready FastAPI template for AI-powered REST APIs, with pre-wired OpenAI/Anthropic client, async streaming endpoints, JWT authentication, rate limiting, structured logging, and OpenAPI docs. Includes Docker Compose stack with Redis rate-limit store and Prometheus metrics.

rest-apifastapiopenai
66.7B
ScriptAI for Code

Fraud Detection Pipeline

by Community

This is a complete machine learning pipeline for detecting fraudulent transactions in real-time. It employs a hybrid approach, using XGBoost or LightGBM for classification and an Isolation Forest for anomaly detection. The system is specifically designed to handle severely imbalanced datasets through SMOTE-Tomek resampling and cost-sensitive learning.

fraud-detectionanomaly-detectionimbalanced-learning
63.7B
ScriptComputer Vision

Image Classification Pipeline

by Community

End-to-end image classification pipeline that handles dataset loading, preprocessing, model inference, and result export using PyTorch and torchvision. Supports batch inference against any Hugging Face ViT or ResNet checkpoint with configurable confidence thresholds.

image-classificationvisionpytorch
62.7B
ScriptAI Infrastructure

Model Fine-Tuning (LoRA)

by AaaS

This script automates the process of fine-tuning large language models using Low-Rank Adaptation (LoRA). It provides an end-to-end workflow, from preparing custom datasets to training lightweight adapters and merging them into a base model for efficient deployment. This enables domain-specific model specialization with significantly reduced computational costs.

fine-tuningloratraining
62.6B
ScriptComputer Vision

OCR Pipeline Script

by Community

This script provides a sophisticated OCR pipeline that intelligently routes documents to the most suitable engine—Tesseract, PaddleOCR, or a cloud API—based on image quality analysis. It processes various document types and outputs structured JSON containing text sorted by reading order, complete with bounding box coordinates and confidence scores for each word or line.

ocrtext-extractiondocument-ai
62.1B
ScriptComputer Vision

Image Segmentation Script

by Meta AI

Runs Segment Anything Model (SAM 2) or Mask2Former on image batches, producing per-pixel segmentation masks with class labels and confidence scores. Includes utilities for mask overlay visualization and RLE-encoded mask export compatible with COCO annotation format.

segmentationsammask
62B
ScriptAI Infrastructure

Data Quality Checker

by Great Expectations

Automates data quality testing for tabular data using the Great Expectations library. This script profiles datasets to generate and validate 'Expectations' covering schema, statistical properties, and referential integrity. It produces a comprehensive HTML report (Data Docs) and can be integrated into CI/CD pipelines as a quality gate to prevent bad data from entering production systems.

data-qualitygreat-expectationsvalidation
62B
ScriptAI Infrastructure

PII Redaction Pipeline

by Microsoft

An automated pipeline that leverages Microsoft Presidio to identify and remove personally identifiable information (PII) from text and structured data. It supports configurable entity recognizers for GDPR and HIPAA compliance and features a reversible pseudonymization capability with a secure vault for authorized re-identification.

pii-redactiondata-maskingdata-anonymization
61.7B
ScriptAI Infrastructure

Basic RAG Pipeline

by AaaS

This script provides a foundational Retrieval-Augmented Generation (RAG) pipeline. It handles core tasks like loading documents, splitting text into chunks, generating embeddings, and indexing them into a vector store. It includes a basic query interface, making it ideal for learning the RAG workflow and prototyping simple applications.

scriptragpipeline
61.5B
ScriptSpeech & Audio AI

Speaker Diarization Script

by pyannote

This script automates the process of creating turn-by-turn transcripts from multi-speaker audio files. It first uses the pyannote.audio library to perform speaker diarization, identifying who spoke and when. These speaker segments are then aligned and merged with a transcription generated by OpenAI's Whisper, producing a final text output that attributes each line of dialogue to a specific speaker.

speaker-diarizationaudio-processingtranscription
60.4B
ScriptAI for Code

Chatbot Builder Script

by Community

This script generates a production-ready chatbot foundation using Rasa for structured dialogue and an LLM for open-ended fallback. It provides a unified channel adapter for deploying to Web, WhatsApp, and Slack, and includes built-in conversation analytics and a Streamlit-based testing environment for rapid development.

chatbotrasallm
60.2B
ScriptAI for Code

Neo4j RAG Pipeline

by Neo4j

Implements a GraphRAG pattern that stores document entities and relationships in Neo4j, then retrieves contextually relevant subgraphs at query time before passing them to an LLM. Includes automatic entity extraction with spaCy, relationship inference, and a Cypher query generator.

knowledge-graphneo4jgraph-rag
59.8C+
ScriptComputer Vision

Visual Search Engine

by Community

This script provides a complete framework for building a multimodal visual search engine. It uses CLIP to generate image and text embeddings, which are indexed in a vector database like Qdrant or Weaviate for efficient similarity search. The system supports both text-to-image and image-to-image queries and includes a FastAPI server for API access.

visual-searchimage-embeddingssimilarity-search
59.4C+
ScriptAI Infrastructure

Serverless Model Deploy

by Community

Packages a trained ML model into a serverless function on AWS Lambda, Modal, or Google Cloud Run, handling cold-start optimization, dependency layering, and auto-scaling configuration. Includes health-check endpoints, structured logging, and a GitHub Actions workflow for automated rollout.

serverlesslambdamodal
59C+
ScriptAI for Code

Recommendation Engine Setup

by Community

This script provides a complete setup for a modern, two-stage recommendation engine. It uses a two-tower neural network for efficient candidate retrieval and a powerful Large Language Model (LLM) for nuanced re-ranking. The system integrates with a Feast feature store to leverage real-time user context, ensuring timely and relevant suggestions.

recommendation-enginecollaborative-filteringllm-reranking
58.7C+
ScriptAI Infrastructure

Edge Model Optimization

by Community

Optimizes PyTorch and TensorFlow models for edge hardware by applying INT8/FP16 quantization and converting them to ONNX or TFLite formats. This script provides platform-specific tuning for ARM and NPU targets, benchmarking latency and memory usage while generating a report on accuracy trade-offs.

edge-deploymentonnxquantization
58.7C+
ScriptAI Infrastructure

Model Serving (vLLM)

by AaaS

This script automates the deployment of a large language model using the vLLM inference engine. It creates a high-throughput, OpenAI-compatible API endpoint. Key features like PagedAttention and continuous batching are configured to maximize performance and memory efficiency, making it suitable for production environments.

llm-servingmodel-deploymentvllm
58.6C+
ScriptAI for Code

WebSocket Streaming API

by Community

WebSocket server that proxies token-by-token LLM streaming to multiple simultaneous clients, with connection lifecycle management, heartbeat keep-alives, and per-session context persistence. Supports fan-out broadcasting for collaborative AI sessions and reconnection with message replay.

websocketstreamingreal-time
58.4C+
ScriptAI for Code

Automated Feature Engineering

by Alteryx

Applies Deep Feature Synthesis via Featuretools and AutoFeat to automatically generate hundreds of candidate features from relational tabular data, then prunes them using mutual information and SHAP-based importance filters. Produces a reproducible feature pipeline serializable to scikit-learn format.

feature-engineeringfeaturetoolsautoml
58.1C+
ScriptAI for Code

Sentiment Dashboard

by Community

Ingests social media feeds, reviews, and support tickets in near-real-time, scores sentiment at entity and aspect level using a fine-tuned RoBERTa model, and renders a live Streamlit dashboard with trend charts, topic clustering, and configurable alert thresholds for brand-crisis detection.

sentiment-analysisdashboardbrand-monitoring
58C+
ScriptAI Infrastructure

Data Cleaning Script

by AaaS

Cleans and normalizes text data for LLM consumption by removing HTML artifacts, fixing encoding issues, standardizing whitespace, deduplicating near-identical entries, and filtering low-quality content based on configurable quality heuristics.

scriptautomationcleaning
57.6C+
ScriptSpeech & Audio AI

Voice Cloning Setup

by Coqui

Sets up a zero-shot voice cloning pipeline using Coqui XTTS-v2 or Tortoise-TTS, requiring only a 3-second reference audio clip to synthesize new speech in the target voice. Includes a FastAPI inference server, audio quality normalization, and speaker embedding export for reuse.

voice-cloningttscoqui
57.4C+
ScriptAI Infrastructure

Document Ingestion Pipeline

by AaaS

Automated pipeline for ingesting documents from multiple sources (files, URLs, APIs) into a vector store. Handles format detection, text extraction, chunking, deduplication, metadata enrichment, and incremental updates for growing knowledge bases.

scriptautomationingestion
57.3C+
ScriptAI Infrastructure

Dataset Preparation

by AaaS

Prepares datasets for LLM fine-tuning by converting raw data into instruction-following, conversation, or completion formats. Handles data cleaning, deduplication, train/val/test splitting, tokenization analysis, and quality filtering.

scriptautomationdataset
57.3C+
ScriptAI Tools & APIs

Web Scraping Pipeline

by AaaS

Automated web scraping pipeline with configurable crawl depth, content extraction, and rate limiting. Converts web content into clean text documents suitable for embedding and RAG ingestion with support for dynamic JavaScript-rendered pages.

scriptautomationscraping
56.8C+
ScriptAI Agents

Tool Calling Setup

by AaaS

Sets up a tool-calling agent with typed tool definitions, argument validation, error handling, and execution sandboxing. Includes example tools for web search, calculator, file operations, and database queries with a pluggable tool registry.

scriptautomationtool-calling
56.4C+
ScriptAI Infrastructure

Batch Embedding Generation

by AaaS

Generates embeddings at scale for large document collections with batching, rate limiting, checkpointing, and error recovery. Supports multiple embedding providers (OpenAI, Cohere, local models) with automatic dimension detection and output format selection.

scriptautomationembeddings
56.4C+
ScriptAI for Code

Temporal Feature Builder

by Community

Generates comprehensive temporal features from time-series data including rolling statistics, lag features, Fourier transforms, and calendar encodings using tsfresh and custom transformers. Handles irregular time series with forward-fill interpolation and produces a point-in-time-correct feature matrix to prevent leakage.

temporal-featurestime-seriesrolling-windows
56.2C+
ScriptAI Infrastructure

RAG Pipeline Setup

by AaaS

End-to-end setup script for deploying a production RAG pipeline. Provisions vector database, configures document ingestion, sets up embedding generation, and creates retrieval endpoints.

ragpipelinesetup
55.8C+
ScriptAI Infrastructure

Advanced RAG Pipeline

by AaaS

Production-grade RAG pipeline with hybrid search, reranking, contextual compression, and multi-index routing. Includes query decomposition, metadata filtering, evaluation metrics, and performance monitoring for enterprise deployments.

scriptautomationrag
55.6C+
ScriptAI Infrastructure

Model A/B Testing

by Community

Implements statistically rigorous A/B and shadow-mode testing for competing ML model versions behind a feature flag router, logging predictions and latencies to a data warehouse for significance testing. Automatically computes sample size requirements and stops experiments when significance thresholds are met.

a-b-testingshadow-modetraffic-splitting
55.6C+
ScriptAI for Code

Graph Embedding Generator

by Community

Generates node and edge embeddings for knowledge graphs using Node2Vec, TransE, or a GNN (via PyTorch Geometric), then indexes them in a vector store for similarity search and link prediction. Includes training scripts, evaluation on standard link-prediction benchmarks, and a REST API for embedding lookup.

graph-embeddingsnode2vecgraph-neural-networks
55.1C+
ScriptAI for Code

Financial Report Parser

by Community

Parses SEC filings, earnings call transcripts, and annual reports using FinBERT for sentiment analysis and a table-extraction pipeline that converts HTML/XBRL financial statements into normalized pandas DataFrames. Exports structured financial metrics to a database and generates LLM-ready summaries for investor Q&A.

financial-nlpsec-filingsearnings
55C+
ScriptAI for Code

Clinical NLP Pipeline

by Community

Processes unstructured clinical notes using medspaCy and BioClinicalBERT to extract diagnoses, medications, procedures, and lab values, then maps entities to ICD-10 and SNOMED-CT codes. Outputs FHIR-compatible JSON bundles and includes a de-identification step compliant with HIPAA Safe Harbor.

clinical-nlphealthcareicd-10
55C+
ScriptAI Infrastructure

Feature Store Sync

by Feast

Synchronizes feature definitions and materialized feature values between offline (BigQuery/Snowflake) and online (Redis/DynamoDB) feature stores using Feast or Tecton, with configurable freshness SLAs and backfill scheduling. Includes drift monitoring to alert when online and offline distributions diverge.

feature-storefeasttecton
54.4C+
ScriptAI Infrastructure

PDF Extraction Pipeline

by AaaS

Specialized pipeline for extracting structured content from PDF documents including text, tables, images, and metadata. Supports OCR for scanned documents, layout analysis for complex formats, and chunking optimized for PDF document structures.

scriptautomationpdf
54.3C+
ScriptAI Infrastructure

Docker ML Deployment

by AaaS

Containerizes ML models and inference servers with optimized Docker images for production deployment. Includes multi-stage builds for minimal image size, GPU support configuration, health checks, and docker-compose setups for full inference stacks.

scriptautomationdocker
54.2C+
ScriptAI Infrastructure

Canary Deployment ML

by Community

Orchestrates progressive canary deployments of ML model services on Kubernetes using Istio traffic shifting, with automated rollback triggered by error-rate or latency SLO breaches. Integrates with Argo Rollouts for declarative release management and posts deployment status to Slack.

canary-deploymentprogressive-rolloutkubernetes
54.1C+
ScriptAI Infrastructure

Model Evaluation Harness

by AaaS

Comprehensive model evaluation script that runs models against standard benchmarks including MMLU, HumanEval, GSM8K, and custom evaluation sets. Produces detailed reports with per-category breakdowns, confidence intervals, and comparison charts.

scriptautomationevaluation
53.9C+
ScriptAI Infrastructure

GGUF Conversion

by AaaS

Converts Hugging Face model weights to GGUF format for use with llama.cpp and compatible inference engines. Supports multiple quantization levels (Q4_K_M, Q5_K_M, Q8_0), validates output integrity, and generates model cards with performance characteristics.

scriptautomationgguf
53.9C+
ScriptSpeech & Audio AI

Music Generation Script

by Meta AI

Generates royalty-free music from text prompts using Meta's MusicGen or AudioCraft, with controls for tempo, key, duration, and genre conditioning. Provides a CLI for batch generation and a streaming mode that writes 30-second chunks to disk or an S3 bucket.

music-generationaudiocraftmusicgen
53.8C+
ScriptComputer Vision

Face Recognition Setup

by Community

Configures a face recognition system using InsightFace or DeepFace, supporting gallery enrollment, real-time identification against a FAISS vector store, and liveness detection. Designed with privacy-first defaults and includes GDPR-compliant consent logging.

face-recognitionbiometricsdeepface
53.5C+
ScriptAI Infrastructure

Model Comparison Script

by AaaS

Side-by-side model comparison script that runs identical prompts through multiple LLM APIs and presents results in a structured format. Measures response quality, latency, token usage, and cost per query with automated scoring via LLM judges.

scriptautomationcomparison
53.3C+
ScriptAI for Code

Knowledge Graph Builder

by Community

Automatically constructs a knowledge graph from unstructured text by extracting subject-predicate-object triples using an LLM, then serializing them to RDF/OWL or property-graph formats. Supports ontology alignment, duplicate merging via entity resolution, and Turtle/JSON-LD export.

knowledge-graphentity-extractiontriple-extraction
53C+
ScriptAI Infrastructure

Knowledge Base Builder

by AaaS

End-to-end script for building a searchable knowledge base from heterogeneous sources including documents, APIs, databases, and web content. Orchestrates ingestion, deduplication, embedding, indexing, and creates a unified query interface across all sources.

scriptautomationknowledge-base
53C+
ScriptAI for Code

Legal Document Analyzer

by Community

Analyzes legal contracts and court documents using a fine-tuned LegalBERT model for clause classification, obligation extraction, and risk-flag detection, with outputs cross-referenced against a configurable playbook of standard clause definitions. Generates a redline-ready Word document and a structured JSON risk register.

legal-nlpcontract-analysisclause-extraction
52.8C+
ScriptAI Agents

Multi-Agent Orchestration

by AaaS

Orchestrates multiple specialized AI agents in coordinated workflows with task routing, state management, and result aggregation. Implements supervisor and swarm patterns with configurable agent selection logic and inter-agent communication.

scriptautomationmulti-agent
52.6C+
ScriptAI Infrastructure

Model Quantization (GPTQ)

by AaaS

Quantizes language models using GPTQ for efficient inference on consumer hardware. Performs calibration-based quantization, quality evaluation against the original model, and exports in formats compatible with vLLM, llama.cpp, and other inference engines.

scriptautomationquantization
52.2C+
ScriptAI for Code

Entity Linking Script

by Community

Disambiguates named entities in text by linking them to canonical Wikidata or custom knowledge base entries, using a bi-encoder retriever followed by a cross-encoder reranker. Handles multi-lingual input via mBERT and outputs entity URIs with confidence scores for downstream graph population.

entity-linkingnelwikidata
52.1C+
ScriptAI Tools & APIs

Cost Calculator

by AaaS

Calculates and projects LLM API costs based on usage patterns, model pricing, and workload forecasts. Compares costs across providers and models, identifies the most cost-effective configuration for a given quality threshold, and generates budget reports.

scriptautomationcost
51.6C+
ScriptAI Tools & APIs

Hallucination Detector

by AaaS

Detects hallucinated content in LLM outputs by cross-referencing claims against source documents and knowledge bases. Uses claim decomposition, source attribution scoring, and consistency checking to flag unsupported or fabricated statements.

scriptautomationhallucination
51C+
ScriptAI Infrastructure

Hybrid Search Setup

by AaaS

Configures a hybrid search system combining dense vector similarity with sparse BM25 keyword matching. Sets up dual index creation, score fusion strategies, and query routing logic for optimal retrieval across different query types.

scriptautomationsearch
50.9C+
ScriptAI Tools & APIs

Prompt Testing Suite

by AaaS

Automated testing framework for prompt engineering with test case management, assertion-based evaluation, regression detection, and A/B comparison. Validates prompt outputs against expected patterns, formats, and quality criteria with CI/CD integration.

scriptautomationprompt-testing
50.8C+
ScriptAI Agents

MCP Server Template

by AaaS

Template for building Model Context Protocol (MCP) servers that expose tools, resources, and prompts to MCP-compatible clients. Includes typed tool handlers, resource providers, error handling, and transport configuration for stdio and HTTP modes.

scriptautomationmcp
50.8C+
ScriptAI Infrastructure

LLM Load Testing

by AaaS

Load tests LLM API endpoints with configurable concurrency, request patterns, and duration. Measures throughput, latency percentiles (p50/p95/p99), time-to-first-token, error rates, and generates performance reports with degradation alerts.

scriptautomationload-testing
50.8C+
ScriptAI Infrastructure

CSV to Embeddings

by AaaS

Converts CSV data into vector embeddings with configurable column selection, text template formatting, and metadata extraction. Outputs to popular vector stores or file formats with chunking support for large CSV files that exceed memory limits.

scriptautomationcsv
50.8C+
ScriptSpeech & Audio AI

Audio Classification Setup

by Community

Configures an audio classification system using Audio Spectrogram Transformer (AST) or YAMNet fine-tuned on AudioSet, with Mel spectrogram feature extraction and batch inference. Exports per-clip predictions with top-5 class probabilities and integrates with a streaming event bus for real-time use.

audio-classificationsound-eventsast
50.8C+
ScriptAI Tools & APIs

Document Classification

by AaaS

Classifies documents into predefined categories using LLM-based inference with configurable taxonomies. Supports batch processing, multi-label classification, confidence thresholds, and exports results to CSV or database with audit trails.

scriptautomationclassification
50.4C+
ScriptAI Infrastructure

Data Lineage Tracker

by OpenLineage

Instruments ETL and ML pipelines with OpenLineage events, shipping dataset-level provenance metadata to a Marquez or Apache Atlas backend. Generates interactive lineage DAGs showing data transformations from source to model artifact, supporting impact analysis and audit trails.

data-lineageopenlineagemarquez
50.4C+
ScriptAI Infrastructure

Cost Optimization Script

by AaaS

Analyzes LLM API usage patterns and identifies cost optimization opportunities. Recommends model downgrades for simple tasks, prompt compression strategies, caching opportunities, and batch processing windows based on historical usage data and cost metrics.

scriptautomationcost
50.4C+
ScriptAI for Code

GraphQL AI Gateway

by Community

GraphQL gateway for multi-model AI services built with Strawberry Python, exposing query, mutation, and subscription resolvers for chat, embedding, and image generation endpoints across multiple LLM providers. Features a DataLoader-based batching layer and persisted query caching to minimize token usage.

graphqlai-gatewaystrawberry
49.7C
ScriptAI for Code

Supply Chain Optimizer

by Community

Combines ML demand forecasting (Prophet + LightGBM) with constraint-based optimization (Google OR-Tools) to minimize inventory costs while meeting service-level targets across a multi-echelon supply chain. Outputs replenishment orders, safety stock recommendations, and a scenario simulation dashboard.

supply-chainoptimizationor-tools
49.4C
ScriptAI Tools & APIs

Safety Audit Script

by AaaS

Comprehensive safety audit for LLM-powered applications testing for prompt injection vulnerabilities, PII leakage, harmful content generation, and policy violations. Generates detailed audit reports with severity ratings and remediation recommendations.

scriptautomationsafety
49.2C
ScriptAI Tools & APIs

Entity Extraction Pipeline

by AaaS

Extracts named entities and relationships from unstructured text at scale using LLM-powered NER with custom entity type support. Outputs structured data with entity linking, relationship graphs, and confidence scores for knowledge graph construction.

scriptautomationentity-extraction
49.2C
ScriptAI Infrastructure

Monitoring Setup (Grafana)

by AaaS

Sets up Grafana dashboards and Prometheus metrics for LLM application monitoring. Includes pre-built dashboards for token usage, latency, error rates, cost tracking, and model performance with configurable alert rules and notification channels.

scriptautomationmonitoring
49.1C
ScriptAI Infrastructure

Model Benchmarking Suite

by AaaS

Performance benchmarking suite measuring LLM inference throughput, latency percentiles, time-to-first-token, and tokens-per-second under various load patterns. Generates detailed performance reports with charts for capacity planning and SLA validation.

scriptautomationbenchmarking
49.1C
ScriptAI Tools & APIs

LLM Regression Testing

by AaaS

Detects regressions in LLM behavior across model updates, prompt changes, or configuration modifications. Runs golden test sets, compares outputs using semantic similarity and LLM judges, and flags significant quality degradation with detailed diff reports.

scriptautomationregression
48.7C
ScriptAI Agents

Agent Evaluation Framework

by AaaS

Evaluates AI agent performance across defined test scenarios with success criteria, step tracking, and automated scoring. Supports custom evaluation rubrics, regression detection, and generates detailed reports comparing agent versions over time.

scriptautomationevaluation
48.3C
ScriptAI Infrastructure

Annotation Pipeline

by AaaS

Automated data annotation pipeline using LLMs for labeling, classification, and quality scoring of training data. Implements multi-annotator consensus, confidence thresholds, human review queuing for uncertain samples, and annotation analytics.

scriptautomationannotation
47.9C
ScriptAI Tools & APIs

Token Usage Analyzer

by AaaS

Analyzes token usage patterns across LLM applications to identify optimization opportunities. Tracks input/output token ratios, identifies verbose prompts, detects unnecessary context, and recommends prompt engineering improvements for cost reduction.

scriptautomationtokens
47.8C
ScriptAI Infrastructure

CI/CD ML Pipeline

by AaaS

CI/CD pipeline for machine learning models with automated testing, evaluation, registry management, and staged deployment. Runs benchmark suites, compares against baseline metrics, and promotes models through staging environments with approval gates.

scriptautomationci-cd
47.8C
ScriptAI Tools & APIs

Bias Detection Script

by AaaS

Detects demographic and topical biases in LLM outputs by running structured test prompts across protected categories. Measures response quality disparities, sentiment differences, and representation gaps with statistical significance testing and bias scorecards.

scriptautomationbias
47.7C
ScriptAI Infrastructure

Rate Limiter Setup

by AaaS

Configures intelligent rate limiting for LLM API proxies with per-user, per-model, and per-endpoint limits. Implements token bucket, sliding window, and adaptive rate limiting algorithms with Redis-backed distributed state and graceful degradation.

scriptautomationrate-limiting
47.4C
ScriptAI Agents

Agent Deployment Script

by AaaS

Deploys AI agents as production services with health checks, graceful shutdown, error recovery, and monitoring integration. Supports Docker and Kubernetes deployments with configurable scaling, environment management, and rollback capabilities.

scriptautomationdeployment
47.4C
ScriptAI Tools & APIs

Red Teaming Script

by AaaS

Automated red teaming toolkit that generates and tests adversarial prompts against LLM applications. Covers jailbreak attempts, prompt injection variants, social engineering patterns, and boundary probing with categorized attack vectors and success tracking.

scriptautomationred-teaming
46.7C
ScriptAI Infrastructure

Latency Benchmarking

by AaaS

Benchmarks LLM API latency across providers, models, and prompt sizes with detailed statistical analysis. Measures time-to-first-token, inter-token latency, total response time, and generates comparison reports with confidence intervals and percentile distributions.

scriptautomationlatency
46.1C
ScriptAI Infrastructure

Kubernetes Model Serving

by AaaS

Deploys and manages LLM inference workloads on Kubernetes with GPU scheduling, auto-scaling based on queue depth, rolling updates, and canary deployments. Generates Helm charts and Kustomize configurations for reproducible deployments.

scriptautomationkubernetes
46.1C
ScriptAI for Code

Energy Forecast Script

by Community

Forecasts electricity demand and renewable generation (solar/wind) using Temporal Fusion Transformer or N-HiTS via NeuralForecast, with weather feature integration and probabilistic intervals for grid balancing. Outputs 24-hour and 7-day ahead forecasts in an InfluxDB-compatible format.

energydemand-forecastingtime-series
46.1C
ScriptAI Infrastructure

API Gateway Configuration

by AaaS

Configures an API gateway for LLM inference endpoints with provider routing, rate limiting, authentication, request/response logging, and failover between multiple LLM providers. Includes usage tracking and cost allocation by API key.

scriptautomationapi-gateway
46.1C
ScriptAI Infrastructure

Multi-Source RAG

by AaaS

RAG pipeline that queries multiple specialized vector indexes and merges results with intelligent routing. Implements source-aware retrieval with automatic query classification, per-source relevance scoring, and citation tracking across diverse knowledge domains.

scriptautomationrag
45.8C
ScriptAI Tools & APIs

A/B Testing Framework

by AaaS

Framework for A/B testing different LLM configurations including models, prompts, temperatures, and system instructions. Runs controlled experiments with statistical significance testing, effect size calculation, and automated winner selection.

scriptautomationab-testing
45.7C
ScriptAI Agents

Agent Monitoring Dashboard

by AaaS

Sets up a monitoring dashboard for AI agent systems tracking task completion rates, error rates, latency, token usage, and cost. Integrates with Prometheus for metrics collection and Grafana for visualization with pre-built alert rules.

scriptautomationmonitoring
45.3C
ScriptAI Infrastructure

Consent Management Script

by Community

Implements a GDPR-compliant consent management layer that records per-user data processing consents in an append-only ledger, enforces purpose limitation at the data access layer, and generates DSAR (data subject access request) reports on demand. Supports consent propagation to downstream ML training pipelines.

consentgdprdata-governance
45.2C
ScriptAI Infrastructure

Model Merging

by AaaS

Merges multiple fine-tuned model checkpoints using strategies like SLERP, TIES, DARE, and linear interpolation. Enables combining specialized model capabilities without additional training, with automated quality validation against benchmark suites.

scriptautomationmerging
45C
ScriptAI Infrastructure

Vector DB Migration

by AaaS

Migrates vector data between different vector database providers (Pinecone, Weaviate, Chroma, Qdrant, Milvus). Handles schema mapping, batch transfers, index recreation, metadata preservation, and validation with rollback support.

scriptautomationmigration
44.4C
ScriptAI Agents

Agent Testing Harness

by AaaS

Testing harness for AI agents with mock tool providers, simulated user interactions, and deterministic replay capabilities. Enables unit testing of agent logic, integration testing of tool chains, and end-to-end testing of complete agent workflows.

scriptautomationtesting
43.6C
ScriptAI Agents

A2A Communication Setup

by AaaS

Configures Agent-to-Agent (A2A) communication infrastructure with message routing, capability discovery, and protocol compliance. Sets up agent registries, message queues, and typed message schemas for reliable inter-agent collaboration.

scriptautomationa2a
40.7C
Benchmarkai-benchmarks

MLPerf Training

by MLCommons

MLPerf Training is a suite of benchmarks that measure the time it takes to train various machine learning models on different hardware and software platforms. It provides a standardized way to compare the performance of different AI training systems, driving innovation in hardware and software optimization for AI workloads.

traininghardwareperformance
89.3A
Benchmarkai-benchmarks

HELM: Holistic Evaluation of Language Models

by Stanford Center for Research on Foundation Models (CRFM)

HELM is a living benchmark designed to provide a comprehensive and holistic evaluation of language models across a wide range of scenarios and metrics. It aims to move beyond single-number evaluations by assessing models on factors like truthfulness, calibration, fairness, robustness, and efficiency, providing a more nuanced understanding of their capabilities and limitations.

language-modelsevaluationholistic
87A
BenchmarkComputer Vision

ImageNet

by Deng et al. / Stanford / Princeton

ImageNet (ILSVRC) is the foundational large-scale visual recognition benchmark with 1.2 million training images across 1,000 object categories. Top-1 and Top-5 accuracy on the validation set have been the standard measure of progress in image classification for over a decade.

image-classificationvisiontop-1-accuracy
81.2A
Benchmarkai-benchmarks

RoboSuite

by Stanford AI Lab

RoboSuite is a simulation framework and benchmark suite for robot learning. It provides a standardized set of environments and tasks for training and evaluating reinforcement learning algorithms in robotics, focusing on manipulation and locomotion tasks with realistic physics and sensor models.

roboticsreinforcement-learningsimulation
80.9A
Benchmarkai-benchmarks

AI2 Reasoning Challenge (ARC)

by Allen Institute for AI (AI2)

The AI2 Reasoning Challenge (ARC) is a question-answering dataset designed to evaluate advanced reasoning capabilities in AI systems. It consists of elementary-level science questions specifically crafted to be difficult for retrieval-based methods and require deeper understanding and reasoning to answer correctly.

reasoningquestion-answeringscience
80.7A
BenchmarkComputer Vision

COCO Detection

by Lin et al. / Microsoft

COCO Detection is the standard benchmark for object detection and instance segmentation, featuring 330,000 images with over 1.5 million annotated instances across 80 object categories. Mean Average Precision (mAP) at various IoU thresholds is the primary metric.

object-detectioninstance-segmentationvision
80.2A
BenchmarkSpeech & Audio AI

LibriSpeech

by Panayotov et al. / Johns Hopkins

LibriSpeech is the standard English automatic speech recognition (ASR) benchmark derived from LibriVox audiobooks, containing 1,000 hours of read speech at 16kHz. Word Error Rate (WER) on clean and noisy test splits drives competitive progress in ASR research.

asrspeech-recognitionenglish
79B+
BenchmarkComputer Vision

ADE20K Segmentation

by Zhou et al. / MIT CSAIL

ADE20K is the benchmark for semantic scene parsing, containing 25,000 images densely annotated with 150 semantic categories. Mean Intersection over Union (mIoU) is the standard metric, and it drives progress in perception systems for autonomous driving, robotics, and scene understanding.

semantic-segmentationscene-parsingvision
76B+
BenchmarkLLMs

GSM8K

by OpenAI

Grade School Math 8K benchmark with 8,500 linguistically diverse grade school math word problems requiring 2-8 step reasoning. Tests basic mathematical reasoning and arithmetic with problems that require sequential multi-step solutions.

benchmarkevaluationmath
75.7B+
BenchmarkAI for Code

SWE-bench Verified

by Princeton NLP

Human-validated subset of SWE-bench containing 500 problems verified by software engineers for correctness, clarity, and solvability. Provides a more reliable signal than the full SWE-bench by filtering out ambiguous or under-specified issues.

benchmarkevaluationsoftware-engineering
74.4B+
BenchmarkLLMs

MATH

by UC Berkeley

Collection of 12,500 competition mathematics problems from AMC, AIME, and other math competitions covering algebra, geometry, number theory, combinatorics, and more. Problems require multi-step reasoning and mathematical insight beyond pattern matching.

benchmarkevaluationmathematics
74.4B+
BenchmarkLLMs

ARC-AGI

by Chollet / ARC Prize Foundation

ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence) measures fluid intelligence through visual grid transformation puzzles. Models must infer transformation rules from three or fewer examples and apply them to a test grid — a task trivially solved by humans but historically extremely difficult for AI systems.

agiabstract-reasoningvisual-patterns
74.1B+
BenchmarkLLMs

HellaSwag

by Allen AI

Evaluates commonsense natural language inference by asking models to select the most plausible continuation of a scenario. Uses adversarially filtered endings generated by language models, making it challenging for machines while trivial for humans.

benchmarkevaluationcommonsense
74B+
BenchmarkSpeech & Audio AI

Common Voice

by Mozilla Foundation

Common Voice is Mozilla's crowd-sourced multilingual speech corpus spanning 100+ languages with verified recordings from volunteers. It benchmarks ASR systems on low-resource and diverse language conditions, making it critical for evaluating cross-lingual speech model generalization.

asrmultilingualcrowdsourced
73.5B+
BenchmarkLLMs

MLPerf Inference

by MLCommons

MLPerf Inference is the industry-standard benchmark for measuring AI inference performance across hardware platforms. It covers image classification, object detection, NLP, speech recognition, and generative AI workloads, enabling fair apples-to-apples comparison of accelerators and inference stacks.

inferencethroughputlatency
73.1B+
BenchmarkLLMs

ARC Challenge

by Allen AI

AI2 Reasoning Challenge featuring grade-school science questions that require commonsense reasoning and world knowledge. The Challenge set contains questions that simple retrieval and co-occurrence methods fail to answer correctly.

benchmarkevaluationscience
73.1B+
BenchmarkLLMs

MedQA

by Jin et al. / UC San Diego

MedQA tests medical knowledge using free-form multiple-choice questions drawn from the US Medical Licensing Examination (USMLE). It evaluates whether language models can reason through complex clinical scenarios requiring deep biomedical knowledge.

medicalqaclinical
72.8B+
BenchmarkLLMs

MT-Bench

by LMSYS

Multi-turn conversation benchmark with 80 high-quality questions across 8 categories including writing, reasoning, math, coding, and extraction. Uses GPT-4 as an automated judge to evaluate response quality on a 1-10 scale across two conversation turns.

benchmarkevaluationmulti-turn
72.2B+
BenchmarkLLMs

FLORES-200

by NLLB Team / Meta AI

FLORES-200 is a many-to-many multilingual translation benchmark covering 200 languages, including many low-resource ones. It evaluates machine translation systems across 40,000 language direction pairs, making it the most comprehensive translation benchmark for assessing cross-lingual generalization.

translationmultilinguallow-resource
72.2B+
BenchmarkLLMs

GPQA

by NYU

Graduate-level Google-Proof Question Answering benchmark featuring questions written by domain experts in physics, chemistry, and biology. Questions are designed to be unsearchable, requiring genuine reasoning rather than memorization.

benchmarkevaluationgraduate-level
71.6B+
BenchmarkLLMs

TruthfulQA

by University of Oxford

Measures whether language models generate truthful answers to questions where humans are commonly mistaken. Covers health, law, finance, and politics topics where popular misconceptions and conspiracies create systematic failure modes.

benchmarkevaluationtruthfulness
71B+
BenchmarkAI for Code

MBPP

by Google Research

Mostly Basic Programming Problems — a collection of 974 crowd-sourced Python programming tasks with natural language descriptions and test cases. Tests foundational programming ability including string manipulation, list processing, and basic algorithms.

benchmarkevaluationcoding
71B+
BenchmarkComputer Vision

Flickr30k

by Young et al. / University of Illinois

Flickr30k is a benchmark for image-text retrieval and visual grounding, comprising 31,783 Flickr images each paired with five human-written captions. Models are evaluated on bidirectional image-to-text and text-to-image retrieval recall at ranks 1, 5, and 10.

image-captioningvisual-groundingretrieval
70.9B+
BenchmarkLLMs

Needle-in-a-Haystack

by Greg Kamradt (community)

Needle-in-a-Haystack is a pressure test for long-context language models that places a single fact (the needle) at a specific position within a long document (the haystack) and asks the model to retrieve it. It systematically varies both context length and needle depth to reveal performance degradation patterns.

long-contextretrievalsingle-fact
70.4B+
BenchmarkComputer Vision

VQA v2

by Georgia Tech / VT

Visual Question Answering benchmark requiring models to answer open-ended questions about images. Version 2 balances the dataset to reduce language biases, ensuring models must genuinely understand image content rather than relying on question-type priors.

benchmarkevaluationmultimodal
70.3B+
BenchmarkLLMs

BIG-Bench Hard

by Google DeepMind

Curated subset of 23 challenging BIG-Bench tasks where prior language models performed below average human raters. Specifically designed to test tasks that benefit significantly from chain-of-thought prompting and multi-step reasoning.

benchmarkevaluationreasoning
70.1B+
BenchmarkLLMs

WinoGrande

by Allen AI

Large-scale dataset for commonsense coreference resolution inspired by Winograd schemas. Tests whether models can correctly resolve pronoun references based on world knowledge and commonsense reasoning in carefully constructed sentence pairs.

benchmarkevaluationcommonsense
69.7B
BenchmarkAI Ethics & Safety

RealToxicityPrompts

by Gehman et al. / Allen Institute for AI

RealToxicityPrompts measures the propensity of language model generations to produce toxic content when conditioned on a diverse set of 100,000 naturally occurring prompts extracted from the web. It uses the Perspective API to score generated text on toxicity dimensions.

toxicitygenerationsafety
69.7B
BenchmarkLLMs

PubMedQA

by Jin et al. / Carnegie Mellon University

PubMedQA is a biomedical question-answering dataset sourced from PubMed abstracts. Models must answer yes/no/maybe questions about biomedical research findings, testing the ability to reason over scientific literature.

medicalbiomedicalresearch
68.4B
BenchmarkLLMs

LegalBench

by Guha et al. / Stanford CodeX

LegalBench is a collaboratively built benchmark measuring the legal reasoning ability of large language models across 162 tasks spanning issue spotting, rule recall, rule application, and legal interpretation. It provides a comprehensive evaluation of whether models can reason like lawyers.

legalreasoningnlp
68.3B
BenchmarkLLMs

ScienceQA

by Lu et al. / UCLA

ScienceQA is a large-scale multimodal benchmark featuring 21,208 science questions for grades 3-12. It uniquely combines visual diagrams and textual contexts, requiring models to perform complex reasoning. Each question includes multiple-choice options, a detailed lecture, and a step-by-step explanation for the correct answer.

benchmarkscience-qamultimodal-reasoning
68B
BenchmarkLLMs

AlpacaEval

by Stanford

Automated evaluation framework comparing model outputs against a reference model on 805 instructions. Uses LLM judges to determine win rates, with length-controlled metrics to avoid rewarding verbosity over quality.

benchmarkevaluationinstruction-following
67.9B
BenchmarkLLMs

BioASQ

by Tsatsaronis et al. / BioASQ Challenge

BioASQ is a large-scale benchmark for biomedical semantic question answering. It challenges systems to perform document retrieval, concept mapping, and answer extraction from PubMed literature. The benchmark includes diverse question types like yes/no, factoid, list, and summary, with gold-standard answers curated by experts.

biomedicalquestion-answeringinformation-retrieval
67.7B
BenchmarkLLMs

MMLU-Pro

by TIGER-Lab

MMLU-Pro is a challenging benchmark designed to evaluate the advanced reasoning and knowledge capabilities of frontier AI models. It enhances the original MMLU by introducing harder, professionally-vetted questions, expanding answer choices from 4 to 10, and reducing sensitivity to prompt formatting for a more robust and discriminative assessment.

benchmarkmodel-evaluationllm-testing
67.2B
BenchmarkAI Agents

ToolBench

by Qin et al. / Tsinghua University

ToolBench evaluates LLMs on their ability to use real-world REST APIs to complete user instructions. It provides 16,000+ real APIs from RapidAPI Hub across 49 categories and 12,000+ instruction–API solution pairs, measuring whether models can plan and execute multi-step API call sequences.

tool-useapiagents
67B
BenchmarkComputer Vision

MMMU

by CUHK / Waterloo

MMMU is a challenging multimodal benchmark designed to evaluate large models on expert-level tasks. It contains over 11,500 college-level problems spanning six core disciplines, requiring models to integrate deep subject knowledge with visual perception to answer multiple-choice questions with detailed reasoning.

benchmarkevaluationmultimodal
66.9B
BenchmarkLLMs

DROP

by Allen AI

DROP (Discrete Reasoning Over Paragraphs) is a challenging benchmark designed to evaluate a model's numerical reasoning capabilities within textual contexts. It requires systems to read paragraphs and answer questions that involve discrete operations like addition, counting, sorting, or comparison. Unlike simpler QA datasets, DROP necessitates multi-step reasoning processes, pushing models beyond basic information retrieval.

benchmarkdatasetevaluation
66.7B
BenchmarkAI Ethics & Safety

ToxiGen

by Hartvigsen et al. / MIT

ToxiGen is a large-scale, machine-generated dataset for evaluating nuanced hate speech detection. It contains over 274,000 toxic and benign statements about 13 minority groups, designed to challenge models to identify implicit toxicity without relying on obvious slurs or surface-level cues.

toxicity-detectionhate-speechimplicit-bias
66.4B
BenchmarkLLMs

BigCodeBench

by Zhuo et al. / BigCode / Hugging Face

BigCodeBench is a challenging benchmark for evaluating large language models on practical, function-level code generation tasks. It comprises 1,140 problems that require the use and integration of popular Python libraries like NumPy, Pandas, and Scikit-learn, moving beyond simple algorithmic puzzles to mirror real-world software development scenarios.

benchmarkcode-generationllm-evaluation
66.3B
BenchmarkLLMs

TyDi QA

by Clark et al. / Google Research

TyDi QA is a multilingual question-answering benchmark featuring 11 typologically diverse languages. Questions are written natively by speakers of each language, ensuring genuine linguistic challenges and avoiding translation artifacts. It is designed to evaluate reading comprehension across a wide range of language structures.

question-answeringmultilingualtypologically-diverse
66.1B
BenchmarkLLMs

MedMCQA

by Pal et al. / IIT Kanpur

MedMCQA is a massive multiple-choice question dataset sourced from Indian medical entrance examinations like AIIMS and NEET-PG. It contains over 194,000 questions covering 2,400 healthcare topics, designed to rigorously test a model's breadth of medical knowledge and reasoning abilities across multiple subjects.

medicalmcqindian-medical
65.5B
BenchmarkLLMs

RULER

by Hsieh et al. / NVIDIA

RULER is a synthetic benchmark for evaluating large language models in long-context scenarios, scaling from 4K to 128K tokens. It assesses complex skills like multi-hop retrieval, aggregation, and coreference resolution, offering a more nuanced analysis than simple 'needle-in-a-haystack' tests.

long-context-evaluationllm-benchmarkretrieval-testing
65.2B
BenchmarkLLMs

AIME 2024

by MAA

A highly challenging benchmark for evaluating the mathematical reasoning of frontier AI models. It uses 30 problems from the 2024 American Invitational Mathematics Examination (AIME), which are designed to test creative problem-solving, multi-step deduction, and knowledge across number theory, geometry, algebra, and combinatorics.

benchmarkmodel-evaluationmathematics
64.9B
BenchmarkAI Ethics & Safety

BBQ (Bias Benchmark for QA)

by Parrish et al. / NYU

BBQ is a question-answering benchmark designed to expose social biases in language models. It uses ambiguous and disambiguated questions related to nine protected categories to measure a model's tendency to rely on harmful stereotypes when context is lacking versus its ability to answer correctly when enough information is provided.

biasqasocial-bias
64.6B
BenchmarkLLMs

LongBench

by Bai et al. / Tsinghua University

LongBench is a comprehensive bilingual benchmark designed to evaluate the long-context understanding capabilities of large language models in English and Chinese. It comprises 21 diverse tasks, including single and multi-document QA, summarization, and code completion, with an average context length of over 6,700 tokens to rigorously test model performance on extended inputs.

long-contextbilingualmulti-task
64.5B
BenchmarkSpeech & Audio AI

MusicCaps

by Agostinelli et al. / Google DeepMind

MusicCaps is a benchmark dataset of 5,521 music clips from AudioSet, each paired with a detailed text description written by professional musicians. It is primarily used for evaluating text-to-music generation models, as well as for music captioning, retrieval tasks, and fine-tuning audio-language models.

musicaudio-captioningmultimodal
64.3B
BenchmarkLLMs

IFEval

by Google Research

Instruction-Following Evaluation benchmark testing models' ability to precisely follow verifiable formatting instructions. Includes constraints like word count limits, specific formatting requirements, keyword inclusion/exclusion, and structural rules that can be programmatically verified.

benchmarkevaluationinstruction-following
64.3B
BenchmarkLLMs

Chatbot Arena Hard

by LMSYS

Chatbot Arena Hard is a static benchmark composed of 500 challenging prompts curated from Chatbot Arena. It is designed to rigorously evaluate and differentiate the capabilities of large language models. The benchmark utilizes an automated judging system, typically employing a powerful model like GPT-4, to provide a quick, reproducible proxy for human preference.

benchmarkevaluationchat
63.9B
BenchmarkAI for Code

HumanEval+

by BigCode

HumanEval+ is a benchmark for rigorously evaluating code generation models. It augments the original HumanEval dataset by expanding the test suite for each of its 164 problems by 80x. This extensive testing helps uncover subtle bugs and failures on edge cases that simpler benchmarks miss, providing a more accurate measure of a model's true coding ability.

benchmarkevaluationcoding
63.8B
BenchmarkAI Ethics & Safety

CyberSecEval

by Meta AI

CyberSecEval is a benchmark developed by Meta to assess the cybersecurity risks associated with Large Language Models (LLMs). It evaluates a model's propensity to generate insecure code, assist in exploiting vulnerabilities, and facilitate attacks, helping safety teams quantify the dual-use risk of code-capable models.

cybersecurityai-safetyllm-evaluation
63.8B
BenchmarkComputer Vision

DocVQA

by CVC Barcelona

DocVQA is a large-scale dataset and benchmark for Visual Question Answering on document images. It challenges models to answer questions by reading and interpreting text, understanding layouts, and reasoning about information within complex documents like forms, invoices, and reports. It serves as a standard for evaluating document intelligence systems.

benchmarkdatasetdocument-ai
63.1B
BenchmarkLLMs

FinanceBench

by Islam et al. / Patronus AI

FinanceBench is a benchmark designed to evaluate the financial question-answering capabilities of Large Language Models. It uses publicly available corporate documents like 10-K filings and earnings reports to test models on information retrieval, numerical reasoning, and multi-step financial calculations, providing a standardized testbed for financial AI.

financeragnumerical-reasoning
62.8B
BenchmarkAI Agents

WebArena

by CMU

WebArena is a realistic and reproducible benchmark environment designed to evaluate autonomous language agents. It tests an agent's ability to perform complex, multi-step tasks across a diverse set of self-hosted websites, including e-commerce, forums, and content management systems, using real web interfaces.

benchmarkagent-evaluationweb-benchmark
62.4B
BenchmarkLLMs

XL-Sum

by Hasan et al. / University of Edinburgh

XL-Sum is a large-scale benchmark dataset for multilingual abstractive summarization. It contains 1.35 million article-summary pairs from BBC News across 44 languages, designed to evaluate a model's ability to generate concise summaries across diverse linguistic families and writing systems.

summarizationmultilingualnews
62.2B
BenchmarkAI Agents

GAIA Benchmark

by Meta / Hugging Face

GAIA (General AI Assistants) is a benchmark for evaluating AI models on complex, real-world tasks. It features questions with unambiguous factual answers that require sophisticated capabilities like multi-step reasoning, web browsing, and tool use. GAIA is designed to test the practical limits of general-purpose AI assistants.

benchmarkevaluationagents
62.2B
BenchmarkAI Ethics & Safety

CrowS-Pairs

by Nangia et al. / NYU

CrowS-Pairs is a benchmark dataset for evaluating social bias in masked language models. It contains 1,508 sentence pairs with stereotypical and anti-stereotypical statements across nine bias types. The benchmark measures a model's preference for stereotypical completions using pseudo-log-likelihood scores.

biasstereotypesmasked-lm
62B
BenchmarkLLMs

MGSM

by Google Research

MGSM (Multilingual Grade School Math) is a benchmark for evaluating the mathematical reasoning of large language models across multiple languages. It consists of 250 grade-school math problems from the GSM8K dataset, professionally translated into ten typologically diverse languages, including low-resource ones like Swahili and Telugu.

benchmarkevaluationmath
61.4B
BenchmarkAI Agents

AgentBoard

by Ma et al. / Shanghai AI Lab

AgentBoard is a comprehensive evaluation framework for Large Language Model (LLM) based agents. It assesses agent performance across nine diverse tasks, including embodied AI, gaming, web browsing, and tool use. The framework uniquely measures both final task success and partial progress through a fine-grained sub-goal metric.

agent-evaluationllm-benchmarkmulti-task-evaluation
61.1B
BenchmarkLLMs

ContractNLI

by Koreeda & Manning / Stanford NLP

ContractNLI is a dataset for natural language inference (NLI) focused on contract understanding. It challenges models to determine if a hypothesis about a contract is entailed, contradicted, or not mentioned by the contract text. This simulates real-world legal document review, testing a model's ability to reason over complex legal language.

legalnlicontract
60.8B
BenchmarkLLMs

SimpleQA

by OpenAI

SimpleQA is a benchmark dataset developed by OpenAI to assess the factual accuracy of language models. It consists of simple, unambiguous questions that have a single, verifiable correct answer. The benchmark is designed to measure a model's ability to recall factual knowledge and, crucially, to abstain from answering when it is uncertain, providing a measure of its calibration.

benchmarkevaluationfactuality
60.4B
BenchmarkLLMs

Humanity's Last Exam

by CAIS

Humanity's Last Exam is a crowdsourced benchmark designed to rigorously test the limits of advanced AI systems. It comprises extremely difficult questions contributed by domain experts across diverse fields like science, math, and philosophy, serving as a public evaluation for frontier model capabilities in complex reasoning and specialized knowledge.

benchmarkevaluationfrontier-testing
60.2B
BenchmarkAI Ethics & Safety

WinoBias

by Zhao et al. / USC

WinoBias is a benchmark dataset designed to measure gender bias in coreference resolution systems. It consists of sentence pairs where pronouns refer to individuals in stereotyped or non-stereotyped occupations, allowing for the quantification of a model's reliance on gender stereotypes versus grammatical correctness.

biasgender-biascoreference
59.8C+
BenchmarkLLMs

InfiniteBench

by Zhang et al. / Peking University

InfiniteBench is a benchmark designed to evaluate the long-context capabilities of large language models. It features tasks that require processing and reasoning over inputs exceeding 100,000 tokens, including math, code debugging, and retrieval from novels, where crucial information is distributed across the entire context.

long-contextllm-evaluationbenchmark
59.6C+
BenchmarkAI Agents

AgentBench

by Tsinghua University

Comprehensive benchmark evaluating LLM agents across 8 distinct environments including operating systems, databases, knowledge graphs, digital card games, lateral thinking puzzles, and web shopping. Tests generalization of agent capabilities across diverse interaction paradigms.

benchmarkevaluationagents
59.3C+
BenchmarkLLMs

Minerva Math

by Google Research

Minerva Math is a quantitative reasoning benchmark designed to evaluate large language models on complex STEM problems. Sourced from web pages with LaTeX and arXiv preprints, it covers subjects like math, physics, and chemistry, requiring multi-step computation, symbolic manipulation, and deep scientific understanding to solve.

benchmarkevaluationmathematics
58.9C+
BenchmarkLLMs

CaseHOLD

by Zheng et al. / Berkeley Law / LexGLUE

CaseHOLD is a legal NLP benchmark for evaluating a model's ability to identify the correct holding statement for a US court case. Given a citing context, the model must choose the correct holding from a list of candidates. Sourced from over 53,000 cases, it is a core component of the LexGLUE benchmark suite for legal AI.

legal-nlpbenchmarkcase-law
58.8C+
BenchmarkAI Agents

API-Bank

by Li et al. / Wuhan University

API-Bank is a comprehensive benchmark for evaluating tool-augmented LLMs. It features 73 diverse APIs and assesses models on three levels: API retrieval, API calling, and complex planning. The benchmark measures both the correctness of tool selection and the accuracy of execution, providing a thorough test of an agent's capabilities.

tool-useapi-callagents
58.8C+
BenchmarkComputer Vision

MathVista

by UCLA

Mathematical reasoning benchmark requiring visual understanding of charts, plots, geometry diagrams, and infographics. Tests the intersection of visual perception and mathematical reasoning with 6,141 problems from 28 existing datasets and 3 newly collected ones.

benchmarkevaluationmultimodal
58.3C+
BenchmarkAI for Code

Aider Polyglot

by Aider

Multi-language code editing benchmark testing models' ability to make targeted code changes across Python, JavaScript, TypeScript, Java, C++, and other languages. Evaluates real-world code modification tasks rather than generation from scratch.

benchmarkevaluationcoding
58.2C+
BenchmarkAI Agents

MLAgentBench

by Huang et al. / Stanford

MLAgentBench challenges AI agents to perform machine learning research tasks autonomously — reading papers, writing code, running experiments, analyzing results, and improving models. It tests whether agents can replicate and build upon real ML research across 13 diverse ML tasks.

agentsml-researchcoding
57.9C+
BenchmarkLLMs

FrontierMath

by Epoch AI

Benchmark of original, research-level mathematics problems created by professional mathematicians. Tests capabilities at the frontier of mathematical reasoning including novel proofs, advanced computation, and multi-domain mathematical synthesis.

benchmarkevaluationmathematics
55.9C+
BenchmarkLLMs

ClinicalCamel Benchmark

by Toma et al. / University of Toronto

ClinicalCamel Benchmark evaluates open-source language models on clinical dialogue and medical instruction-following tasks derived from physician–patient interactions. It focuses on safety, accuracy, and appropriateness of clinical advice generation.

medicalclinicalinstruction-following
55.9C+
BenchmarkAI for Code

Codeforces Benchmark

by Codeforces / Community

Evaluates models on competitive programming problems from the Codeforces platform across difficulty ratings. Tests algorithmic thinking, data structure knowledge, and the ability to produce correct and efficient solutions under competitive constraints.

benchmarkevaluationcompetitive-programming
55.7C+
BenchmarkAI Agents

TAU-bench

by Sierra AI

Tool-Agent-User benchmark evaluating AI agents on realistic customer service scenarios requiring multi-step tool use. Tests agents' ability to navigate complex workflows, use tools correctly, follow policies, and handle edge cases in airline and retail domains.

benchmarkevaluationagents
54.8C+
BenchmarkAI for Code

MLE-bench

by OpenAI

Benchmark evaluating AI agents on real Kaggle machine learning competitions. Tests the full ML engineering pipeline including data exploration, feature engineering, model selection, training, and submission formatting against actual competition leaderboards.

benchmarkevaluationmachine-learning
54.8C+
BenchmarkAI Agents

OSWorld

by University of Hong Kong

Benchmark for evaluating multimodal agents on real operating system tasks spanning Ubuntu, Windows, and macOS environments. Tests agents' ability to interact with desktop applications, file systems, terminals, and GUI elements to complete everyday computer tasks.

benchmarkevaluationagents
53.7C+
BenchmarkComputer Vision

RealWorldQA

by xAI

Benchmark testing multimodal models on practical real-world visual understanding tasks. Features questions about real photographs requiring spatial reasoning, object recognition, scene understanding, and practical knowledge that goes beyond simple object detection.

benchmarkevaluationmultimodal
53.1C+
BenchmarkLLMs

EnergyBench

by Lannelongue et al. / EMBL-EBI

EnergyBench quantifies the energy consumption and carbon footprint of AI inference across hardware and software configurations. It correlates task accuracy with joules consumed, enabling practitioners to make informed accuracy-efficiency trade-offs for sustainable AI deployment.

energyefficiencysustainability
49C
BenchmarkLLMs

GreenAI Benchmark

by Schwartz et al. / AI2 / University of Washington

GreenAI Benchmark evaluates the efficiency of AI training and inference by reporting accuracy alongside FLOPs, parameters, and CO2 emissions. It promotes the efficiency metric paradigm where reporting results without computational cost is considered incomplete science.

green-aiefficiencyflops
48.5C
Benchmarkbenchmarks-evaluation

SWE-bench

by Princeton NLP

SWE-bench is a benchmark for evaluating AI systems' ability to resolve real GitHub issues from popular Python repositories. Each instance requires understanding a codebase, identifying the bug, and producing a correct patch. SWE-bench Verified is the curated subset accepted as the standard for coding agent evaluation by the AI industry.

benchmarkcodingsoftware-engineering
44C
Benchmarkbenchmarks-evaluation

MTEB

by Hugging Face / MTEB Team

MTEB (Massive Text Embedding Benchmark) is the standard benchmark for evaluating text embedding models across 8 task types (retrieval, clustering, classification, etc.) and 112 datasets. The MTEB leaderboard on Hugging Face is the primary reference for selecting embedding models and is updated continuously as new models are released.

benchmarkembeddingsretrieval
44C
Benchmarkbenchmarks-evaluation

MMLU

by UC Berkeley

MMLU (Massive Multitask Language Understanding) is a comprehensive benchmark covering 57 academic subjects from elementary to professional level, including STEM, law, medicine, and social sciences. It became the standard for measuring general knowledge breadth in LLMs and is included in virtually every model evaluation suite.

benchmarkknowledgemultitask
44C
Benchmarkbenchmarks-evaluation

LiveBench

by LiveBench OSS

LiveBench is a contamination-resistant benchmark that continuously updates with new questions sourced from recent math competitions, research papers, and news. By using only data post-dating model training cutoffs, LiveBench mitigates benchmark saturation and provides more reliable capability assessments of frontier models.

benchmarkcontamination-resistantlive
44C
Benchmarkbenchmarks-evaluation

HumanEval

by OpenAI

HumanEval is OpenAI's code generation benchmark consisting of 164 hand-written Python programming problems with unit tests. It measures a model's ability to generate syntactically correct and functionally complete code from docstring descriptions. HumanEval is the foundational coding benchmark that all subsequent code benchmarks build upon.

benchmarkcodingpython
44C
Benchmarkbenchmarks-evaluation

HELM

by Stanford CRFM

HELM (Holistic Evaluation of Language Models) from Stanford CRFM provides a multi-dimensional evaluation framework that measures LLMs across accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency. It evaluates models on 42 scenarios and 59 metrics, providing the most comprehensive public assessment of LLM capabilities and risks.

benchmarkholisticfairness
44C
Benchmarkbenchmarks-evaluation

GPQA Diamond

by NYU / Cohere

GPQA Diamond (Graduate-Level Google-Proof Q&A) is a challenging multiple-choice benchmark requiring expert-level knowledge in biology, chemistry, and physics. Questions are designed to be answerable by domain PhD students but not by web search. GPQA Diamond is the standard for measuring frontier scientific reasoning capability.

benchmarksciencereasoning
44C
Benchmarkbenchmarks-evaluation

Chatbot Arena

by LMSys

Chatbot Arena is a crowdsourced human evaluation platform from LMSys where users anonymously compare responses from two random LLMs and vote for the better one. The resulting Elo-based leaderboard (LMSYS Leaderboard) is widely regarded as the most reliable measure of real-world LLM preference across diverse user tasks.

benchmarkhuman-evaluationelo
44C
Benchmarkbenchmarks-evaluation

ARC-AGI-2

by ARC Prize Foundation

ARC-AGI-2 is the second iteration of François Chollet's Abstraction and Reasoning Corpus benchmark, designed to measure fluid intelligence and generalization in AI systems. Tasks require identifying abstract visual patterns that cannot be solved by memorization, targeting a capability gap that separates current LLMs from human-level reasoning.

benchmarkagiabstraction
44C
Benchmarkbenchmarks-evaluation

AIME 2025

by MAA / Community Eval

AIME (American Invitational Mathematics Examination) 2025 is used as a frontier math reasoning benchmark for LLMs. The competition-level math problems require multi-step reasoning without lookup, making AIME scores a direct indicator of a model's mathematical problem-solving depth. Frontier models are evaluated on the 2025 problem set to avoid training data contamination.

benchmarkmathreasoning
44C
Benchmark

MATH-500

by

Mathematics benchmark testing advanced problem-solving from algebra to competition mathematics.

mathreasoningproblem-solving
36D
Benchmark

Arena-Hard Auto

by

Automated benchmark derived from Chatbot Arena for evaluating instruction-following and open-ended generation.

evaluationinstructionautomated
32D
Datasetai-datasets

AI2 Reasoning Challenge (ARC)

by Allen Institute for AI (AI2)

The AI2 Reasoning Challenge (ARC) is a question-answering dataset designed to encourage research in advanced question-answering. It consists of grade-school science questions specifically crafted to require reasoning beyond simple fact retrieval, posing a significant challenge for AI models.

question answeringreasoningscience
84.2A
DatasetComputer Vision

ImageNet-1K

by ImageNet / Stanford Vision Lab

The canonical large-scale visual recognition benchmark containing 1.28 million training images across 1,000 object categories. ImageNet-1K underpins the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and has driven the majority of deep learning breakthroughs in computer vision since 2012.

image-classificationobject-recognitionbenchmark
83.3A
DatasetComputer Vision

COCO 2017

by Microsoft

Microsoft COCO (Common Objects in Context) 2017 provides 118K training images with 860K object instances annotated with bounding boxes, segmentation masks, keypoints, and captions across 80 object categories. It remains the primary benchmark for object detection and instance segmentation research.

object-detectionsegmentationkeypoints
82.5A
Datasetscientific

Protein Data Bank

by RCSB PDB / wwPDB Consortium

The RCSB Protein Data Bank (PDB) is the single worldwide archive of experimentally determined 3D structures of proteins, nucleic acids, and complex assemblies, currently containing over 220,000 biological macromolecular structures determined by X-ray crystallography, NMR, and cryo-EM. It is the foundational structural dataset for computational biology and was used to train and validate AlphaFold2 and other structure-prediction models.

proteinsstructuresbiology
81.9A
Datasetscientific

UniProt

by UniProt Consortium (EMBL-EBI / SIB / PIR)

UniProt (Universal Protein Resource) is the world's comprehensive, freely accessible protein sequence and functional information database, maintained by a consortium of EMBL-EBI, SIB, and PIR. It contains over 250 million protein sequences in UniParc, with 570,000+ manually reviewed entries in SwissProt providing expert-curated functional annotations, and serves as the gold-standard training source for protein language models.

proteinsbiologysequences
80.9A
Datasetbenchmarks

MMLU Dataset

by UC Berkeley

Massive Multitask Language Understanding (MMLU) is a benchmark covering 57 academic subjects from STEM to humanities, with 14,000+ multiple-choice questions at undergraduate and professional level. It has become the de facto standard for measuring broad world knowledge and academic reasoning in LLMs.

benchmarkmultiple-choiceknowledge
80.9A
Datasetknowledge

Wikipedia (Processed)

by Wikimedia Foundation / Hugging Face

The processed Wikipedia dataset is a cleaned and tokenized version of Wikipedia dumps covering 20+ languages, available via Hugging Face Datasets. With HTML stripped and paragraph structure preserved, it is one of the most universally used pretraining corpora and a standard knowledge-grounding source for retrieval-augmented generation (RAG) baselines and open-domain QA systems.

wikipediaencyclopedicpretraining
80.2A
DatasetLLMs

Wikipedia Dump

by Wikimedia Foundation

The full text dump of Wikipedia articles available in over 300 languages, regularly updated and distributed by the Wikimedia Foundation. It is one of the most universally included components in language model pretraining pipelines due to its high factual density, editorial quality, and broad topical coverage.

nlpencyclopedicfactual
80.2A
DatasetSpeech & Audio AI

LibriSpeech

by OpenSLR / Johns Hopkins University

LibriSpeech is a corpus of approximately 1,000 hours of 16kHz read English speech derived from LibriVox audiobooks, split into clean and other subsets of 100h and 360h for training, with dedicated development and test sets. It has become the de facto standard benchmark for English ASR systems.

automatic-speech-recognitionASRenglish
80.2A
Datasetbenchmarks

GSM8K Dataset

by OpenAI

Grade School Math 8K is a dataset of 8,500 high-quality linguistically diverse grade school math word problems requiring 2-8 step reasoning. Created by OpenAI, GSM8K is widely used for evaluating multi-step arithmetic reasoning and the effectiveness of chain-of-thought prompting.

benchmarkmathgrade-school
79.8B+
Datasetscientific

PubChem

by NCBI / NIH

PubChem is the world's largest open chemical database maintained by the NCBI, containing information on over 115 million compounds, 295 million substances, and 270 million bioactivity outcomes from more than 1.2 million assays. It provides standardized molecular structures, properties, and biological activity data freely accessible via REST API and bulk download, making it the canonical resource for cheminformatics and drug discovery research.

chemistrymoleculesbioassay
79.6B+
Datasetai-datasets

GENIE Benchmark

by Stanford University

The GENIE Benchmark is a comprehensive dataset for evaluating the performance of text-to-SQL models. It includes a diverse set of SQL queries and corresponding natural language questions across multiple domains, designed to assess the generalization capabilities of these models.

text-to-sqlnatural language processingdatabase
79.2B+
DatasetAI for Code

HumanEval Dataset

by OpenAI

A curated set of 164 handwritten Python programming problems released by OpenAI, each consisting of a function signature, docstring, reference solution, and unit tests. HumanEval introduced the pass@k metric for functional code correctness evaluation and has become the de facto standard benchmark reported in virtually every code generation model paper.

codeevaluationpython
79B+
Datasetmedical

MIMIC-IV

by MIT Laboratory for Computational Physiology / Beth Israel Deaconess Medical Center

MIMIC-IV (Medical Information Mart for Intensive Care) is a comprehensive de-identified electronic health record database covering over 300,000 patients admitted to Beth Israel Deaconess Medical Center's ICU between 2008 and 2019. It contains detailed clinical data including diagnoses, procedures, medications, laboratory values, and waveforms, enabling a wide range of clinical AI research.

ehrclinicalicu
78.8B+
Datasetbenchmarks

MATH Dataset

by UC Berkeley

A challenging benchmark of 12,500 competition mathematics problems from AMC, AIME, and similar competitions across 5 difficulty levels and 7 subjects. Each problem includes a full step-by-step solution in LaTeX, making it suitable for both evaluation and training of mathematical reasoning.

benchmarkcompetition-mathhard-math
77.3B+
DatasetComputer Vision

SA-1B (Segment Anything)

by Meta AI

SA-1B is Meta AI's massive segmentation dataset released alongside the Segment Anything Model (SAM), containing over 1 billion high-quality segmentation masks across 11 million diverse, high-resolution images. It is the largest segmentation dataset ever created and enables training of generalist vision models with strong zero-shot transfer capabilities.

segmentationSAMfoundation-model
77.2B+
Datasetbenchmarks

HellaSwag Dataset

by University of Washington

HellaSwag is an adversarially filtered commonsense NLI benchmark where models must pick the most plausible sentence completion from 4 options. Humans score 95%+ while early LLMs struggled below 50%, making it a robust test of grounded language understanding and commonsense reasoning.

benchmarkcommonsensesentence-completion
77B+
DatasetLLMs

Common Crawl

by Common Crawl Foundation

The world's largest open repository of web crawl data, maintained by the non-profit Common Crawl Foundation and updated with new crawls monthly since 2011. It forms the foundational raw data layer for virtually every major language model pretraining pipeline including GPT-3, LLaMA, PaLM, and Falcon, typically after quality filtering and deduplication steps.

nlpweb-crawlmassive-scale
76.4B+
Datasetbenchmarks

ARC Dataset

by Allen Institute for AI

The AI2 Reasoning Challenge (ARC) dataset contains 7,787 grade 3–9 science exam questions split into Easy and Challenge partitions. The Challenge set contains questions that require deeper reasoning and world knowledge, making it a reliable signal for advanced language understanding.

benchmarkscience-questionsmultiple-choice
76.2B+
DatasetComputer Vision

Open Images V7

by Google

Google's Open Images V7 is one of the largest existing datasets with object-level annotations, containing approximately 9 million images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives across 600+ object classes.

object-detectionsegmentationvisual-relationships
76.1B+
Datasetbenchmarks

TruthfulQA Dataset

by University of Oxford

TruthfulQA measures the truthfulness of LLMs across 817 adversarially crafted questions spanning 38 categories where humans are commonly misled by false beliefs. Models are scored on generating truthful AND informative answers, revealing how larger models can paradoxically become more confidently wrong.

benchmarktruthfulnesshallucination
75.1B+
Datasetknowledge

Stack Exchange Dump

by Stack Exchange

The Stack Exchange Data Dump is a quarterly XML export of all public questions, answers, comments, and votes across the entire Stack Exchange network of 170+ Q&A communities including Stack Overflow. Containing hundreds of millions of high-quality technical and domain-specific Q&A pairs, it is a critical pretraining source for code and reasoning capabilities and a standard retrieval benchmark for dense passage retrieval.

qacommunitycode
75B+
Datasetbenchmarks

SuperGLUE

by New York University

SuperGLUE is a benchmark suite of 8 challenging NLU tasks including question answering, coreference resolution, causal reasoning, and word-sense disambiguation, designed as a harder successor to GLUE. It includes human baselines and has driven significant progress in pre-trained language model capabilities.

benchmarknlp-benchmarknatural-language-understanding
74.5B+
DatasetComputer Vision

LAION-5B

by LAION

The largest openly available image-text pair dataset, containing 5.85 billion CLIP-filtered image-text pairs across English, multilingual, and aesthetic subsets. LAION-5B was the primary training corpus for Stable Diffusion, DALL-E 2 replications, and numerous open vision-language models, enabling the open-source community to train competitive text-to-image generation models.

multimodalimage-textlarge-scale
74.2B+
DatasetComputer Vision

ADE20K Dataset

by MIT CSAIL

ADE20K is a densely annotated semantic segmentation dataset containing over 27,000 images with pixel-level annotations for 150 semantic categories covering both indoor and outdoor scenes. It is the primary benchmark for scene parsing and semantic segmentation tasks in the computer vision community.

semantic-segmentationscene-parsingscene-understanding
74.2B+
Datasetbenchmarks

WinoGrande Dataset

by Allen Institute for AI

WinoGrande is a large-scale crowdsourced dataset of 44,000 Winograd-style fill-in-the-blank commonsense problems, debiased using the AFLITE algorithm to minimize spurious statistical cues. It is significantly harder than the original Winograd Schema Challenge for contemporary NLP models.

benchmarkcommonsensewinograd-schema
73.8B+
DatasetSpeech & Audio AI

AudioSet

by Google

Google's AudioSet is a large-scale dataset of manually annotated audio events comprising over 2 million 10-second YouTube clips labeled with a hierarchical ontology of 632 audio event classes. It is the primary benchmark for audio tagging and sound event detection, spanning music, speech, and environmental sounds.

audio-classificationsound-eventslarge-scale
73.7B+
Datasetmedical

CheXpert

by Stanford ML Group

CheXpert is a large chest X-ray dataset from Stanford containing 224,316 chest radiographs from 65,240 patients with labels for 14 observations mined from radiology reports using an automated labeler. It uniquely addresses label uncertainty with positive, negative, and uncertain labels, making it a challenging and realistic benchmark for automated chest X-ray interpretation.

chest-x-rayradiologymulti-label
73.2B+
Datasetmedical

PubMedCentral OA

by National Institutes of Health / National Library of Medicine

PubMedCentral Open Access (PMC OA) is a subset of the PMC literature archive made freely available for text mining and NLP research, containing over 4 million full-text biomedical and life science articles. It is the primary corpus used for pretraining biomedical language models such as BioBERT, PubMedBERT, and BioGPT.

biomedical-nlpscientific-literaturefull-text
73.1B+
DatasetSpeech & Audio AI

VoxCeleb2

by Oxford Visual Geometry Group (VGG)

VoxCeleb2 is a large-scale speaker recognition dataset containing over 1 million utterances from 6,112 celebrities extracted from YouTube videos in challenging real-world conditions. It is the standard benchmark for speaker verification and diarization research, providing naturalistic conversational speech at scale.

speaker-verificationspeaker-recognitionin-the-wild
73B+
Datasetmultilingual

FLORES-200 Dataset

by Meta AI

FLORES-200 is Meta's few-shot translation evaluation benchmark spanning 200 languages, including many low-resource and endangered ones. Each language contains 1,012 parallel sentences translated from English Wikipedia, covering both devtest and test splits for systematic MT evaluation at scale.

evaluationmachine-translation200-languages
73B+
Datasetinstruction-tuning

Alpaca Dataset

by Stanford University

Stanford Alpaca's 52,000 instruction-following examples generated using the self-instruct technique applied to GPT-3.5 (text-davinci-003). This foundational dataset enabled the creation of the Alpaca 7B model and popularized cost-effective instruction-tuning approaches.

instruction-followingself-instructstanford
73B+
DatasetSpeech & Audio AI

Common Voice 15

by Mozilla

Mozilla's Common Voice 15.0 is the world's largest publicly available multilingual speech corpus, containing over 30,000 hours of validated speech data across 114 languages, all contributed and validated by volunteers. It enables training and evaluation of multilingual and low-resource speech recognition systems.

ASRmultilingualcrowdsourced
72.6B+
Datasetfinancial

SEC-EDGAR Filings

by U.S. Securities and Exchange Commission

The SEC-EDGAR Filings dataset encompasses over 20 million full-text regulatory filings submitted to the US Securities and Exchange Commission since 1993, including 10-K annual reports, 10-Q quarterly reports, 8-K current reports, and proxy statements from all US public companies. It is the foundational corpus for financial NLP research, sentiment analysis, and financial document AI.

financial-nlp10-K10-Q
72.5B+
DatasetAI for Code

MBPP (Mostly Basic Python Problems)

by Google

A dataset of 974 crowd-sourced Python programming problems suitable for entry-level programmers, each with a problem description, code solution, and three automated test cases. MBPP complements HumanEval by covering a broader variety of programming concepts and is widely used alongside it for comprehensive evaluation of code generation capabilities across model families.

codeevaluationpython
72.5B+
DatasetAI for Code

The Stack v2

by BigCode

An expanded code pretraining dataset containing 3 trillion tokens of source code in 619 programming languages, curated by BigCode from GitHub repositories with permissive SPDX licenses. Version 2 triples the size of the original Stack and includes improved deduplication, opt-out mechanisms for authors, and structured data from GitHub issues and pull requests alongside raw source files.

codepretrainingpermissive-license
72.3B+
DatasetComputer Vision

CelebA-HQ

by NVIDIA / CUHK

CelebA-HQ is a high-quality version of the CelebA face dataset containing 30,000 celebrity images at 1024×1024 resolution with 40 binary attribute annotations. It was introduced alongside Progressive GAN and has become the standard benchmark for high-fidelity face generation and synthesis research.

face-generationGANhigh-resolution
72.3B+
Datasetscientific

ArXiv Papers Dataset

by Cornell University / arXiv

The ArXiv Papers Dataset is a bulk export of over 2.3 million scientific preprints from arXiv spanning physics, mathematics, computer science, biology, finance, and economics, provided by Cornell University and hosted on Kaggle and AWS S3. The full-text LaTeX source and parsed metadata make it a primary pretraining corpus for scientific language models and citation-network research.

scientific-paperspreprintsnlp
72.2B+
Datasetinstruction-tuning

OpenAssistant Conversations

by LAION

A large-scale, human-annotated dataset of assistant-style conversations collected through the OpenAssistant crowdsourcing platform. Contains over 161,000 messages across 66,000+ conversation trees, with ranked responses for RLHF training.

rlhfinstruction-followingconversations
72B+
Datasetmultilingual

mC4

by Google

The multilingual Colossal Clean Crawled Corpus (mC4) spans 101 languages and contains hundreds of billions of tokens scraped from Common Crawl with language detection and heuristic quality filters. It was used to train mT5 and is one of the largest publicly available multilingual pre-training corpora.

multilingualweb-crawlpre-training
72B+
Datasetscientific

Semantic Scholar ORC

by Allen Institute for AI (AI2)

The Semantic Scholar Open Research Corpus (S2ORC) is a large English-language corpus of 136 million academic papers with structured metadata, abstracts, citation graphs, and full-text body paragraphs where licensing allows. Maintained by the Allen Institute for AI, it covers 19 scientific fields and is widely used for scientific NLP tasks including citation prediction, claim verification, and scientific QA.

scientific-papersopen-researchfull-text
71.7B+
DatasetLLMs

BookCorpus

by University of Toronto

A dataset of over 11,000 unpublished books spanning fiction and non-fiction genres, originally scraped from Smashwords and used as the primary pretraining corpus for BERT alongside Wikipedia. It provides rich long-range dependency data that helps models learn coherent narrative structure and extended discourse patterns.

nlpbookslong-form
71.3B+
DatasetComputer Vision

Places365

by MIT CSAIL

Places365 is a scene-centric database with 1.8 million training images across 365 scene categories, designed to train and evaluate scene recognition models. The dataset enables models to understand the semantic meaning of places and environments, making it ideal for applications in autonomous driving, robotics, and image retrieval.

scene-recognitionscene-classificationtransfer-learning
70.7B+
DatasetAI for Code

CodeSearchNet

by GitHub / Microsoft Research

A dataset and benchmark challenge for code retrieval and search containing 2 million (code, documentation) pairs in six programming languages — Python, Java, JavaScript, PHP, Ruby, and Go — curated by GitHub and Microsoft Research. It is the canonical benchmark for code-to-natural-language and natural-language-to-code retrieval tasks and is widely used to evaluate code embedding models.

codecode-searchdocumentation
70.4B+
DatasetAI for Code

APPS (Automated Programming Progress Standard)

by UC Berkeley

A benchmark of 10,000 programming problems at introductory, interview, and competitive programming difficulty levels, each with problem statements, test cases, and human-written solutions. APPS is the standard dataset for evaluating code generation models on realistic programming tasks ranging from simple loops to complex algorithmic challenges drawn from competitive programming platforms.

codecompetitive-programmingevaluation
70.3B+
Datasetinstruction-tuning

UltraFeedback

by Tsinghua University

A large-scale, high-quality preference dataset with 64,000 instructions each answered by 4 LLMs and rated by GPT-4 on instruction-following, truthfulness, honesty, and helpfulness. UltraFeedback is the backbone of the Zephyr and Tulu 2 DPO models.

rlhfpreference-datagpt-4-annotated
70.2B+
Datasetfinancial

Financial PhraseBank

by Pekka Malo et al. / Aalto University

Financial PhraseBank is a sentiment analysis dataset containing 4,845 sentences from English-language financial news annotated by 16 financial domain experts with positive, negative, or neutral sentiment labels. It is the most widely used benchmark for financial sentiment analysis and has been used to fine-tune FinBERT and numerous other financial NLP models.

financial-sentimentNLPsentiment-analysis
70.1B+
Datasetalignment

Self-Instruct

by University of Washington

Self-Instruct is the foundational instruction-tuning dataset and methodology introduced by Wang et al. (2022), where 175 human-written seed tasks are iteratively expanded into 52,000 instruction-input-output triplets using GPT-3 as the generator. It established the paradigm of bootstrapping instruction data from existing LLMs and directly inspired Alpaca, WizardLM, and most subsequent synthetic alignment datasets.

instruction-tuningself-playseed-tasks
69.8B
DatasetAI for Code

StarCoderData

by BigCode

The 780 billion token code dataset used to pretrain the StarCoder family of models, assembled by BigCode from The Stack v1 spanning 86 programming languages with permissive licenses. It includes GitHub issues, Git commits, and Jupyter notebook data alongside source files, enabling models to learn from developer workflows and not just static code.

codepretraininggithub
69.7B
Datasetmathematics

DM Mathematics

by Google DeepMind

DeepMind Mathematics (DM Mathematics) is a dataset of 2 million mathematical question-answer pairs covering algebra, arithmetic, calculus, comparisons, measurement, numbers, polynomials, and probability, procedurally generated to test mathematical reasoning capabilities of language models. The symbolic and step-structured nature of the dataset makes it a standard benchmark for evaluating compositional generalization and multi-step arithmetic reasoning.

mathematicsreasoningsymbolic
69.2B
Datasetinstruction-tuning

LIMA

by Meta AI

LIMA (Less Is More for Alignment) is a carefully curated dataset of 1,000 high-quality instruction-response pairs demonstrating that alignment quality matters more than quantity. Sourced from StackExchange, wikiHow, and manually written prompts, LIMA-tuned models rival GPT-4 on many benchmarks.

quality-over-quantityinstruction-followingmeta
69.1B
Datasetinstruction-tuning

OpenHermes 2.5

by Nous Research

A large curated synthetic instruction dataset with ~1 million entries sourced from multiple high-quality open datasets including Airoboros, Camel, GPT4-LLM, and others. OpenHermes 2.5 powers the Nous Hermes model family and is widely regarded as one of the best open instruction datasets.

syntheticgpt-4instruction-following
68.7B
Datasetalignment

OASST2

by LAION / OpenAssistant

OpenAssistant Conversations 2 (OASST2) is a crowd-sourced human-annotated dataset of 100,000+ assistant-style conversations in 35 languages, where human contributors created and ranked message trees to produce preference labels for RLHF training. It is the largest open multilingual human-feedback dataset and is widely used for training preference models and reward functions in open-source alignment pipelines.

rlhfhuman-feedbackchat
68.5B
Datasetmultilingual

NLLB Training Data

by Meta AI

The No Language Left Behind (NLLB) training corpus released by Meta AI contains high-quality parallel data across 200+ language pairs, including newly mined bitext for dozens of low-resource languages. It was used to train the NLLB-200 model achieving state-of-the-art translation on low-resource language pairs.

machine-translation200-languagesparallel-corpus
68.5B
Datasetinstruction-tuning

ShareGPT

by Community

A community-collected dataset of real ChatGPT and GPT-4 conversation logs shared by users, covering a broad range of tasks and domains. Available in multiple filtered and cleaned versions including ShareGPT52K and ShareGPT90K used by Vicuna and other open models.

conversationsgpt-4chatgpt
68.4B
DatasetComputer Vision

LSUN

by Princeton / Columbia University

The Large-Scale Scene Understanding (LSUN) dataset is a massive collection of nearly one million labeled images for each of 10 scene and 20 object categories. It is a key benchmark for advancing research in scene understanding, particularly for generative modeling, classification, and reconstruction tasks.

scene-classificationscene-understandinglarge-scale
68.3B
Datasetinstruction-tuning

Dolly-15K

by Databricks

Dolly-15K is a high-quality, open-source dataset of 15,000 instruction-following records generated by humans. Created by Databricks employees, it's designed for fine-tuning large language models to exhibit instruction-following capabilities, such as those seen in ChatGPT, using a relatively small, targeted dataset.

instruction-tuningsupervised-fine-tuninghuman-generated-data
68.3B
Datasetmultilingual

OPUS-100

by University of Helsinki

OPUS-100 is a large-scale multilingual parallel corpus for machine translation, featuring 100 languages pivoted through English. Sampled from the OPUS collection, it provides up to 1 million sentence pairs per language pair, making it a standard benchmark for training and evaluating multilingual models.

parallel-corpusmachine-translationmultilingual-nlp
68.1B
Datasetsynthetic

Phi-1 TextBooks

by Microsoft

Phi-1 TextBooks is a synthetic dataset of Python coding textbooks and exercises generated by GPT-3.5 and GPT-4. It was created to pretrain Microsoft's Phi-1 small language model, demonstrating that high-quality, curriculum-style data can significantly boost the coding abilities of smaller models compared to training on general web data.

synthetic-datatextbookscoding
67.7B
DatasetSpeech & Audio AI

GigaSpeech

by Seasalt.ai / SpeechColab

GigaSpeech is a multi-domain English speech corpus with 10,000 hours of high-quality labeled audio for ASR, sourced from audiobooks, podcasts, and YouTube across a broad range of topics and recording conditions. Its scale and diversity make it particularly valuable for training robust, domain-generalizable speech recognition models.

ASRlarge-scaleenglish
67.7B
Datasetinstruction-tuning

WizardLM Evol-Instruct

by Microsoft Research

WizardLM Evol-Instruct is a synthetic dataset created by Microsoft Research for fine-tuning large language models. It uses an LLM-based evolutionary process to iteratively rewrite and complicate a seed set of instructions, progressively increasing their complexity and diversity. The dataset is designed to enhance a model's ability to follow intricate, multi-step commands across various domains like coding, math, and reasoning.

evol-instructcomplexity-evolutionsynthetic
67.2B
Datasetmultilingual

TyDi QA Dataset

by Google Research

TyDi QA is a benchmark for question answering across 11 typologically diverse languages. It features information-seeking questions written by native speakers who have not seen the answer, ensuring real-world applicability. This design challenges models to generalize beyond high-resource, typologically similar languages.

question-answeringmultilingualtypologically-diverse
66.9B
DatasetComputer Vision

DataComp-1B

by DataComp Consortium

A curated 1.28 billion image-text pair dataset produced through the DataComp benchmark competition, which challenged participants to filter a 12.8 billion pair candidate pool to produce the best downstream CLIP model. DataComp-1B represents the winning filtering strategy and achieves state-of-the-art zero-shot classification performance among datasets of its size.

multimodalimage-textbenchmark
66.6B
DatasetLLMs

OpenWebText

by EleutherAI

OpenWebText is a large-scale, open-source English text corpus created by scraping web pages linked from Reddit. Designed as a public replication of OpenAI's original WebText dataset used for GPT-2, it contains approximately 38 GB of text filtered by Reddit upvotes to ensure a baseline of quality and relevance.

nlpweb-textreddit
66.4B
DatasetLLMs

LAION-400M Text Captions

by LAION

The text caption component of the LAION-400M dataset, offering 400 million English alt-text captions. These captions were scraped from the web and filtered using CLIP to ensure a minimum similarity to their corresponding images. The text is used independently for large-scale NLP and multimodal research.

nlpcaptionsimage-text
66.3B
Datasetmedical

BioASQ Dataset

by BioASQ Consortium

The BioASQ dataset is a benchmark for biomedical semantic indexing and question answering. It contains thousands of expert-annotated questions (factoid, list, yes/no, summary) paired with relevant PubMed articles, concepts, and ideal answers, designed to train and evaluate advanced NLP systems in the medical domain.

biomedical-qaquestion-answeringsemantic-indexing
66.2B
Datasetrobotics

Open X-Embodiment

by Google DeepMind / Consortium

Open X-Embodiment (OXE) is a massive robotics dataset combining over 1 million demonstration episodes from 22 distinct robot embodiments. It covers 527 skills and is designed to train generalist robot policies that can transfer skills across diverse hardware, serving as a key resource for vision-language-action models.

roboticsmanipulationmulti-robot
66.1B
Datasetlegal

Legal-BERT Training Data

by Gerasimos Spanakis / Maastricht University

The Legal-BERT training corpus is a large collection of English legal text assembled from UK legislation, EU legislation, ECHR/ECLI court decisions, and US contracts specifically curated to pretrain domain-adapted BERT models. It has enabled a family of Legal-BERT models that significantly outperform general-domain language models on legal NLP tasks.

legal-nlppretrainingcontracts
65.9B
Datasetai-datasets

GenLaw: A Legal Reasoning Dataset

by Stanford Center for Legal Informatics

GenLaw is a comprehensive dataset designed for evaluating legal reasoning capabilities of large language models. It contains a diverse set of legal questions, case summaries, and relevant statutes, enabling researchers to assess a model's ability to understand and apply legal principles.

legalreasoninglaw
65.8B
DatasetLLMs

SlimPajama

by Cerebras

SlimPajama is a cleaned and deduplicated version of the RedPajama dataset, containing 627 billion high-quality tokens. Produced by Cerebras, it demonstrates that training on fewer, higher-quality tokens can match or exceed the performance of models trained on larger, noisier datasets.

nlppretrainingdeduplicated
65.5B
Datasetlegal

EU Court Decisions

by European Court of Human Rights / CJEU

The EU Court Decisions dataset aggregates judgments from the European Court of Human Rights (ECHR) and the Court of Justice of the European Union (CJEU), covering tens of thousands of decisions in multiple EU languages with structured metadata. It is widely used for multilingual legal NLP research, legal judgment prediction, and cross-lingual information retrieval.

european-lawcourt-decisionsmultilingual
65.5B
Datasetcode

Evol-CodeAlpaca

by Microsoft Research

Evol-CodeAlpaca is a dataset of 110,000 instruction-solution pairs for code generation, created by applying the EvolInstruct method to Code Alpaca seeds. Using GPT-4, it progressively increases the complexity and diversity of programming problems, serving as the primary training data for the WizardCoder models.

code-generationinstruction-tuningevol-instruct
65.3B
DatasetComputer Vision

ShareGPT4V

by Shanghai AI Lab

ShareGPT4V is a large-scale, high-quality dataset containing 100,000 image-text pairs generated by GPT-4V. It is specifically designed for the instruction-tuning of open-source large vision-language models (LVLMs). The dataset's detailed captions and conversational QA pairs significantly enhance a model's ability to perform complex scene understanding, OCR, and visual reasoning.

datasetmultimodalinstruction-tuning
65.1B
Datasetfinancial

FinQA Dataset

by Zhiyu Chen et al. / University of California Santa Barbara

FinQA is a large-scale dataset for numerical reasoning over financial data, containing over 8,000 question-answer pairs from S&P 500 earnings reports. Each question requires multi-step reasoning across both unstructured text and structured tables, making it a challenging benchmark for financial AI systems.

financial-qanumerical-reasoningtable-qa
65.1B
Datasetsynthetic

Cosmopedia

by Hugging Face

Cosmopedia is a massive synthetic dataset containing 30 million documents styled as textbooks, blog posts, and articles. Generated by Mixtral-8x7B-Instruct, it provides a vast, multilingual corpus of high-quality educational content designed for pretraining large language models at scale.

synthetic-datatext-corpusllm-pretraining
65.1B
DatasetComputer Vision

CC12M (Conceptual 12M)

by Google

CC12M is a large-scale dataset by Google containing 12 million image-text pairs from the web. It was created with a less restrictive filtering process than its predecessor, CC3M, to achieve greater scale and diversity. This makes it a foundational resource for pretraining large vision-language models like CLIP and ALIGN.

multimodalimage-textweb-crawl
65.1B
Datasetmultilingual

XL-Sum Dataset

by BUET (Bangladesh University of Engineering and Technology)

XL-Sum is a massive multilingual dataset for abstractive summarization. It consists of over 1 million article-summary pairs scraped from BBC News, covering 44 different languages. This diversity makes it a crucial resource for developing and evaluating cross-lingual and multilingual summarization models.

summarizationmultilingualnews
64.9B
Datasetlegal

CaseText Corpus

by Casetext (acquired by Thomson Reuters)

The CaseText Corpus is a large-scale dataset of US federal and state court decisions. It includes full text, structured metadata, and citation networks, designed for legal research and the development of AI applications like legal language models and case retrieval systems, spanning decades of US jurisprudence.

case-lawlegal-researchcase-retrieval
64.7B
Datasetrobotics

RLBench

by Dyson Robotics Lab / Imperial College London

RLBench is a large-scale robot learning benchmark and dataset built on the CoppeliaSim simulator, providing 100 unique manipulation tasks with demonstrations, observations, and reward functions. It offers RGB, depth, and point-cloud observations for a Franka Panda arm across diverse household tasks, widely used for evaluating imitation learning, reinforcement learning, and multi-task robot policies.

roboticsmanipulationbenchmark
64.2B
Datasetsynthetic

OpenMathInstruct

by NVIDIA

OpenMathInstruct is a large-scale, synthetic dataset by NVIDIA featuring 1.8M+ math problem-solution pairs. Generated by Mixtral models and verified for correctness, it provides reliable, step-by-step reasoning chains for training and fine-tuning language models on diverse mathematical topics, from arithmetic to competition math.

synthetic-datamathematicsinstruction-tuning
64.2B
DatasetAI for Code

GitHub Code Dataset

by Hugging Face / BigCode

The GitHub Code Dataset is a massive, multilingual collection of source code from public GitHub repositories, spanning 32 programming languages. Distributed via Hugging Face under the BigCode project, it provides a foundational resource for pretraining large language models on diverse code-related tasks, from generation to analysis.

codemultilingual-codegithub
63.6B
DatasetLLMs

CC-News

by CommonCrawl Foundation

CC-News is a large-scale dataset of over 700,000 English news articles from the CommonCrawl archive, collected between 2016 and 2019. It serves as a key pretraining corpus, notably for the RoBERTa model, providing a rich source of journalistic text for developing models that understand news language and current events.

nlpnewsweb-crawl
63.3B
DatasetSpeech & Audio AI

MusicNet

by University of Washington

MusicNet is a collection of 330 freely licensed classical music recordings with over 1 million annotated labels indicating the precise timing and identity of every musical note in each recording. It supports supervised learning for music transcription, instrument recognition, and music information retrieval tasks.

musicinstrument-recognitionnote-annotations
63.2B
Datasetmultilingual

CulturaX

by University of Oregon

CulturaX is a massive, cleaned multilingual text corpus containing 6.3 trillion tokens across 167 languages. It was created by combining, deduplicating, and filtering the mC4 and OSCAR datasets using language model-based quality scoring. This makes it one of the largest and cleanest public datasets for pre-training large language models.

multilingual-corpuspre-training-datasetllm-training
63.2B
Datasetalignment

Tulu V2 Mix

by Allen Institute for AI (AI2)

Tulu V2 Mix is a curated 326,000-sample mixture of instruction-tuning datasets from AI2. It blends diverse sources like FLAN, Open Assistant, and Code Alpaca to train the Tulu 2 model family. The dataset serves as a benchmark for analyzing the impact of different data sources on model performance and quality.

instruction-tuningsftdata-mixture
63.1B
Datasetmedical

MedNLI

by University of Massachusetts / Partners Healthcare

MedNLI is a benchmark dataset for Natural Language Inference (NLI) in the clinical domain. Derived from the MIMIC-III database, it contains over 14,000 sentence pairs from clinical notes, each annotated by a clinician as representing entailment, contradiction, or a neutral relationship, enabling the evaluation of clinical text reasoning.

natural-language-inferenceclinical-nlpentailment
62.8B
DatasetComputer Vision

WebVid-10M

by University of Oxford

WebVid-10M is a massive dataset containing over 10 million video clips paired with descriptive text captions. Scraped from stock video websites, it serves as a foundational pretraining corpus for state-of-the-art video-language models, facilitating research in video understanding, retrieval, and generation.

multimodalvideo-textvideo-captioning
62.7B
DatasetLLMs

PushShift Reddit Dataset

by PushShift.io

A massive, multi-billion token archive of Reddit comments and submissions from 2005 to 2023, collected by the PushShift project. This dataset is a cornerstone for social NLP research, large-scale language model pre-training, and studying the dynamics of online communities and conversational discourse.

nlpsocial-mediadialogue
62B
Datasetinstruction-tuning

Nectar

by UC Berkeley

Nectar is a large-scale, high-quality preference dataset from Berkeley AI Research (BAIR). It contains 183,000 prompts, each with seven ranked responses from diverse models like GPT-4, ChatGPT, and open-source LLMs. It is designed for training robust reward models for RLHF and DPO.

rlhfpreference-datareward-model
61.6B
Datasetrobotics

RoboNet

by Berkeley AI Research (BAIR)

RoboNet is a large-scale dataset for robot learning, featuring 15 million video frames from diverse robot arms across multiple labs. It is designed to train and benchmark self-supervised visual models, aiming to achieve generalization across different robot morphologies and workspaces without task-specific labels.

roboticsvideomanipulation
60.3B
Datasetalignment

Orca DPO Pairs

by Intel Labs / Community

Orca DPO Pairs is a synthetic dataset containing 12,000 instruction-following examples. Each example includes a prompt, a high-quality response from GPT-4 (chosen), and a lower-quality response from GPT-3.5 (rejected). It is designed for efficiently aligning language models using Direct Preference Optimization (DPO) without a reward model.

dpopreferencealignment
60.2B
Datasetrobotics

CALVIN

by Albert-Ludwigs-Universität Freiburg

CALVIN is a large-scale dataset and benchmark for long-horizon, language-conditioned robot manipulation. It features over 24 hours of teleoperated demonstration data in a tabletop environment, encompassing 34 distinct skills that can be composed to solve complex, multi-step tasks from natural language instructions.

roboticslanguage-conditionedmanipulation
59.5C+
Datasetalignment

Deita 6K

by HKUST / Community

Deita 6K is an ultra-compact, high-quality instruction-tuning dataset of 6,000 carefully selected samples produced by the Data-Efficient Instruction Tuning for Alignment (DEITA) framework, which scores and filters instruction data by complexity and quality using LLM judges. Despite its small size, models trained on Deita 6K match or outperform those trained on datasets 10-100x larger, demonstrating the power of principled data selection over scale.

instruction-tuningdata-selectionquality-filtering
58.6C+
Datasetsynthetic

CAMEL-AI Datasets

by CAMEL-AI

The CAMEL-AI Datasets are a collection of synthetic multi-agent conversation datasets generated through the Communicative Agents framework, where AI assistants and user agents collaborate via role-playing to solve tasks. The collection covers coding, math, science, and open-ended reasoning domains, providing diverse instruction-following dialogues useful for SFT and alignment research.

syntheticmulti-agentrole-playing
58.2C+
DatasetAI for Code

CodeParrot GitHub Code

by Hugging Face

A 50 GB dataset of Python code scraped from GitHub, originally created to train the CodeParrot model as a demonstration of code-focused language model pretraining. It filters repositories for Python files only and applies basic deduplication, making it a lightweight starting point for Python-specific code generation research and experimentation.

codegithubpython
57.4C+
Datasetalignment

Capybara

by Argilla / LDJnr

Capybara is a high-quality instruction-tuning dataset of 15,000 diverse, long-form single- and multi-turn conversations synthesized to cover a wide range of topics and response styles, designed to improve model coherence and verbosity on open-ended tasks. It emphasizes narrative quality and conceptual depth over simple factual responses, making it particularly effective for improving chat model fluency and reasoning.

instruction-tuninglong-formdiverse
57.4C+
Datasetsynthetic

Genstruct

by NousResearch

Genstruct is a synthetic instruction dataset generated by the Genstruct-7B model, which converts raw documents into structured instruction-response pairs. Unlike typical self-instruct approaches, Genstruct grounds every instruction in a source document, ensuring factual consistency and enabling controllable synthetic data generation from any text corpus.

syntheticinstruction-tuningdocument-grounded
53.4C+
Datasetdatasets

UltraChat

by Tsinghua University

1.5M high-quality multi-turn dialogue dataset for instruction fine-tuning.

alignmentdialoguesft
44C
Datasetdatasets

The Pile

by EleutherAI

825GB diverse English pretraining corpus from 22 high-quality data sources.

pretrainingenglishdiverse
44C
Datasetdatasets

SWE-bench

by Princeton NLP

2.3K real GitHub issues requiring AI agents to write and verify code fixes.

benchmarkcodingagents
44C
PaperLLMs

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

by Google AI

Introduced BERT, a bidirectional Transformer pre-trained on masked language modeling and next sentence prediction. Established the pretrain-then-fine-tune paradigm that dominated NLP for years and achieved state-of-the-art on 11 NLP benchmarks.

bertpre-trainingbidirectional
82.8A
PaperComputer Vision

Learning Transferable Visual Models From Natural Language Supervision (CLIP)

by OpenAI

Introduced CLIP (Contrastive Language-Image Pre-training), a model trained on 400 million image-text pairs using contrastive learning that achieves remarkable zero-shot transfer to diverse vision tasks. CLIP became foundational for vision-language alignment and generative AI pipelines.

clipcontrastive-learningzero-shot
82.2A
PaperLLMs

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

by Google Brain

Introduced chain-of-thought prompting, a simple technique of providing exemplars with step-by-step reasoning traces in few-shot prompts. This approach dramatically improves LLM performance on arithmetic, commonsense, and symbolic reasoning tasks, with the effect emerging at approximately 100B parameters.

chain-of-thoughtreasoningprompting
82.1A
PaperComputer Vision

High-Resolution Image Synthesis with Latent Diffusion Models (Stable Diffusion)

by CompVis / Stability AI

Introduced Latent Diffusion Models (LDMs), which perform the diffusion process in a compressed latent space rather than pixel space, dramatically reducing computational cost while maintaining image quality. This work underpins Stable Diffusion, the most widely used open-source image generation model.

stable-diffusionlatent-diffusiontext-to-image
82A
PaperLLMs

Language Models are Few-Shot Learners (GPT-3)

by OpenAI

Introduced GPT-3, a 175B parameter language model demonstrating remarkable few-shot learning capabilities across diverse tasks. Showed that scaling model size dramatically improves in-context learning without gradient updates, reshaping the field.

gpt-3few-shotin-context-learning
82A
PaperComputer Vision

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

by Google Brain

Introduced the Vision Transformer (ViT), demonstrating that a pure transformer applied directly to sequences of image patches achieves state-of-the-art performance on image classification when pretrained on large datasets. The paper challenged the dominance of convolutional neural networks in computer vision.

vision-transformerimage-classificationattention
81.9A
PaperAI Ethics & Safety

Training Language Models to Follow Instructions with Human Feedback

by OpenAI

Presents InstructGPT, which uses Reinforcement Learning from Human Feedback (RLHF) to align GPT-3 with human intent. By fine-tuning on human demonstrations and training a reward model on human preference comparisons, InstructGPT produces outputs that human evaluators prefer to GPT-3 outputs despite having 100× fewer parameters.

rlhfalignmentinstruction-following
81.8A
PaperAI Agents

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

by Facebook AI Research

Introduces Retrieval-Augmented Generation (RAG), combining parametric memory (language model weights) with non-parametric memory (dense retrieval over Wikipedia) for knowledge-intensive NLP tasks. RAG models achieve state-of-the-art on open-domain QA benchmarks and produce more specific, factual, and diverse responses than pure parametric models.

ragretrievalgeneration
81.2A
Paperreinforcement-learning

Proximal Policy Optimization Algorithms

by OpenAI

PPO introduces a clipped surrogate objective that constrains policy update step sizes, achieving the stability of trust-region methods (TRPO) with the simplicity and scalability of first-order optimizers. It quickly became the dominant RL algorithm for training large language models with human feedback.

reinforcement-learningppopolicy-gradient
81.1A
Paperdomain-specific

Highly Accurate Protein Structure Prediction with AlphaFold

by DeepMind

AlphaFold 2 achieves atomic-level accuracy in protein structure prediction by combining evolutionary information from multiple sequence alignments with a novel Evoformer architecture and structure module, solving a 50-year grand challenge in biology. Its predictions have been released for virtually all known proteins and have accelerated drug discovery, enzyme design, and structural biology worldwide.

biologyprotein-structurealphafold
81.1A
PaperLLMs

GPT-4 Technical Report

by OpenAI

Technical report for GPT-4, OpenAI's multimodal large language model accepting image and text inputs. Demonstrates state-of-the-art performance on academic and professional benchmarks, including passing the bar exam in the top 10% of test takers.

gpt-4multimodalrlhf
81A
PaperComputer Vision

Segment Anything

by Meta AI

Introduced the Segment Anything Model (SAM) and the SA-1B dataset of 1 billion masks on 11 million images. SAM is a promptable segmentation foundation model that generalizes to new image distributions and tasks without additional training, enabling a new paradigm of interactive segmentation.

segmentationfoundation-modelpromptable
79.2B+
PaperLLMs

Evaluating Large Language Models Trained on Code (Codex)

by OpenAI

Introduced Codex, a GPT language model fine-tuned on publicly available code from GitHub, and the HumanEval benchmark for measuring code synthesis from docstrings. Codex powers GitHub Copilot and represents a breakthrough in automated programming assistance.

codexcode-generationgithub-copilot
79.2B+
PaperAI Agents

ReAct: Synergizing Reasoning and Acting in Language Models

by Google / Princeton

Introduces ReAct, a paradigm that combines reasoning traces and task-specific actions in language models. By interleaving thinking steps with tool calls, ReAct agents outperform chain-of-thought and act-only baselines on diverse tasks including question answering, fact verification, and interactive decision-making.

agentsreasoningtool-use
79B+
Papertraining

LoRA: Low-Rank Adaptation of Large Language Models

by Microsoft Research

Introduces LoRA, which freezes pretrained model weights and injects trainable low-rank decomposition matrices into Transformer layers. Reduces trainable parameters by 10,000× and GPU memory by 3× with no inference latency overhead, enabling efficient LLM fine-tuning.

lorafine-tuninglow-rank
78.8B+
PaperLLMs

LLaMA: Open and Efficient Foundation Language Models

by Meta AI

Introduces LLaMA, a collection of foundation language models ranging from 7B to 65B parameters, trained on publicly available datasets. Showed that smaller models trained on more tokens can match or exceed larger models, democratizing LLM research.

llamaopen-sourceefficient
78.1B+
Paperreinforcement-learning

Deep Reinforcement Learning from Human Preferences

by OpenAI

This foundational RLHF paper shows that human preference comparisons between agent behaviors can train a reward model that guides deep RL agents in complex tasks like Atari games and MuJoCo locomotion, without hand-crafted reward functions. The approach reduces human labeling effort by ~3 orders of magnitude compared to direct reward specification.

reinforcement-learningrlhfhuman-feedback
78B+
PaperLLMs

Gemini: A Family of Highly Capable Multimodal Models

by Google DeepMind

Introduced the Gemini family of multimodal models (Ultra, Pro, Nano) natively trained to process and combine text, images, audio, and video. Gemini Ultra is the first model to surpass human expert performance on MMLU and achieves state-of-the-art across 30 of 32 benchmarks evaluated.

geminimultimodalgoogle
77.8B+
PaperLLMs

Efficient Memory Management for Large Language Model Serving with PagedAttention

by UC Berkeley

Introduced PagedAttention and the vLLM serving system, which manages the KV cache in non-contiguous physical memory blocks inspired by OS paging, enabling near-zero memory waste and efficient sharing of KV cache across requests. vLLM achieves 2-4x higher throughput than HuggingFace Transformers and 1.7x over Orca.

paged-attentionvllminference
77.7B+
PaperAI Agents

Generative Agents: Interactive Simulacra of Human Behavior

by Stanford University / Google

Introduces generative agents—computational software agents that simulate believable human behavior—by combining a large language model with memory streams, reflection synthesis, and planning mechanisms. Twenty-five agents populate a virtual town, exhibiting emergent social behaviors including relationship formation, information propagation, and event coordination.

agentssimulationsocial
77.3B+
PaperComputer Vision

Hierarchical Text-Conditional Image Generation with CLIP Latents (DALL-E 2)

by OpenAI

Presented DALL-E 2 (unCLIP), a hierarchical text-conditional image generation system using CLIP image embeddings as a prior and a diffusion decoder. The system achieves state-of-the-art photorealism and text-image alignment, substantially advancing the field of text-to-image synthesis.

dall-e-2text-to-imagediffusion
77.1B+
Papertraining

Training Language Models to Follow Instructions with Human Feedback (InstructGPT)

by OpenAI

Introduces InstructGPT, fine-tuning GPT-3 with Reinforcement Learning from Human Feedback (RLHF) to follow instructions. A 1.3B InstructGPT model is preferred over 175B GPT-3 by human labelers, establishing RLHF as the dominant alignment technique.

rlhfinstructgptalignment
77B+
PaperLLMs

Self-Consistency Improves Chain of Thought Reasoning in Language Models

by Google Brain

Introduced self-consistency, a decoding strategy that samples diverse reasoning paths from a language model and returns the most consistent answer by marginalizing out the reasoning paths. Self-consistency is a simple, training-free technique that substantially improves chain-of-thought prompting across arithmetic and commonsense reasoning tasks.

self-consistencychain-of-thoughtreasoning
76.7B+
Paperresearch

Scaling Laws for Neural Language Models

by OpenAI

Empirically establishes power-law scaling relationships between language model performance and model size, dataset size, and compute budget. Provides the foundational framework for predicting LLM capabilities as a function of scale, guiding research for years.

scaling-lawscompute-optimallanguage-models
76.7B+
PaperLLMs

Visual Instruction Tuning (LLaVA)

by University of Wisconsin–Madison / Microsoft Research

Introduced LLaVA (Large Language and Vision Assistant), a multimodal model trained via visual instruction tuning using GPT-4-generated multimodal instruction-following data. LLaVA demonstrates impressive multimodal chat abilities and achieves 85.1% on Science QA, pioneering open-source visual instruction tuning.

llavamultimodalinstruction-tuning
76B+
PaperLLMs

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

by Google Brain

Introduced Switch Transformers, a simplified mixture-of-experts (MoE) architecture that routes each token to exactly one expert (top-1 routing), enabling trillion-parameter models with sub-linear compute scaling. Switch Transformers achieve 7x pretraining speedup over a dense T5 model while maintaining model quality.

mixture-of-expertsmoesparse-model
75.8B+
Paperethics

Model Cards for Model Reporting

by Google

Model Cards introduces a structured framework for documenting machine learning models across intended uses, performance disaggregated by demographic groups, and ethical considerations, enabling informed model selection and deployment decisions. The paper has become an industry standard, with model card adoption by Google, Hugging Face, and most major AI providers.

ethicsmodel-cardstransparency
75.8B+
PaperLLMs

Language Models are Unsupervised Multitask Learners (GPT-2)

by OpenAI

Introduced GPT-2, demonstrating that large language models trained on diverse web text can perform zero-shot transfer across many NLP tasks without task-specific fine-tuning. Showed emergent capabilities at scale and sparked debate on responsible AI release.

gpt-2language-modelingzero-shot
75.8B+
Paperinfrastructure

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

by Stanford University

Introduces FlashAttention, an IO-aware exact attention algorithm that restructures attention computation to minimize memory reads/writes between HBM and SRAM. Achieves 2-4× speedup over standard attention and enables training on much longer sequences.

flash-attentionio-awarememory-efficient
75.7B+
PaperLLMs

Training Compute-Optimal Large Language Models (Chinchilla)

by DeepMind

Challenges the Kaplan et al. scaling laws by showing that model size and training tokens should scale equally. Trains Chinchilla (70B) on 4× more data than Gopher, matching or beating models 4× its size, redefining compute-optimal training strategies.

chinchillascaling-lawscompute-optimal
75.4B+
Paperethics

Datasheets for Datasets

by Microsoft Research / Multiple Institutions

Drawing an analogy to electronics component datasheets, this paper proposes that every ML dataset should be accompanied by a standardized document covering its motivation, composition, collection process, preprocessing, uses, distribution, and maintenance. Datasheets for Datasets has become the foundational standard for dataset transparency and is widely required by major AI venues.

ethicsdatasetsdocumentation
75.2B+
PaperLLMs

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

by Princeton University / Google DeepMind

Introduced Tree of Thoughts (ToT), a framework that generalizes chain-of-thought prompting to a tree search over intermediate reasoning steps. ToT enables LLMs to explore multiple reasoning paths, evaluate choices, and backtrack, achieving dramatic improvements on tasks requiring lookahead and planning.

tree-of-thoughtsreasoningsearch
74.9B+
PaperLLMs

Fast Inference from Transformers via Speculative Decoding

by Google Research

Introduced speculative decoding, a lossless inference acceleration technique that uses a smaller, faster draft model to propose multiple tokens, then verifies them in parallel with the target model in a single forward pass. This achieves 2-3x speedup without any degradation in output quality or distribution.

speculative-decodinginference-efficiencydraft-model
74.7B+
PaperAI Ethics & Safety

Constitutional AI: Harmlessness from AI Feedback

by Anthropic

Introduces Constitutional AI (CAI), a method for training harmless AI assistants using a set of written principles (a 'constitution') to guide both supervised learning and reinforcement learning from AI feedback (RLAIF). CAI enables Anthropic to reduce reliance on human harm labels while maintaining helpfulness and making AI reasoning about harmlessness explicit.

alignmentsafetyconstitutional-ai
74.7B+
PaperComputer Vision

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

by Stability AI

Presented SDXL, a significantly improved latent diffusion model architecture featuring a 3.5B parameter UNet backbone with a secondary refiner model, conditioning on image size and crop parameters, and a curated high-aesthetic dataset. SDXL substantially improves visual quality and prompt adherence over prior Stable Diffusion versions.

sdxlstable-diffusiontext-to-image
74.5B+
PaperLLMs

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

by DeepSeek

DeepSeek-R1 demonstrates that pure reinforcement learning with rule-based rewards—without supervised fine-tuning on chain-of-thought data—can incentivize emergent reasoning capabilities in LLMs including self-verification, reflection, and long chain-of-thought. The model achieves performance comparable to OpenAI-o1 on reasoning benchmarks while being fully open-sourced, triggering a significant industry response.

reasoningreinforcement-learningdeepseek
74.5B+
Papertraining

QLoRA: Efficient Finetuning of Quantized LLMs

by University of Washington

Introduces QLoRA, which combines 4-bit quantization with LoRA adapters to fine-tune a 65B LLM on a single 48GB GPU while preserving full 16-bit fine-tuning performance. Introduces NF4 data type and double quantization for extreme memory reduction.

qloraquantizationfine-tuning
74.4B+
PaperLLMs

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

by Institute of Science and Technology Austria (IST Austria)

Presented GPTQ, a one-shot weight quantization method based on approximate second-order information that can quantize GPT models with 175B parameters to 4-bit or 3-bit precision in approximately four GPU-hours with negligible accuracy loss. GPTQ made large model inference practical on consumer hardware.

gptqquantizationpost-training-quantization
74.4B+
PaperLLMs

Code Llama: Open Foundation Models for Code

by Meta AI

Introduced Code Llama, a family of large language models for code built on Llama 2 through code-specific pretraining and fine-tuning. Code Llama achieves state-of-the-art performance among open models on HumanEval and MBPP, with variants for Python, instruction following, and long context (100K tokens).

code-llamametacode-generation
74.3B+
Paperai-evaluation

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

by LMSYS / UC Berkeley

Introduces Chatbot Arena, a platform for crowdsourced human evaluation of LLMs via pairwise comparisons using an Elo rating system. The arena has collected over 240K human votes across 50+ models, revealing human preference rankings that often diverge from standard benchmark leaderboards and providing a complementary evaluation signal.

evaluationhuman-preferenceelo
74B+
PaperAI Agents

Toolformer: Language Models Can Teach Themselves to Use Tools

by Meta AI

Presents Toolformer, a model that learns to use external tools (APIs) in a self-supervised manner without requiring human annotations. The model decides which APIs to call, how to call them, and how to incorporate results, achieving strong performance across diverse tasks while maintaining generative language modeling ability.

tool-useself-supervisedapi-calling
73.7B+
PaperLLMs

Mistral 7B

by Mistral AI

Introduces Mistral 7B, a 7B parameter language model outperforming LLaMA 2 13B on all benchmarks and approaching LLaMA 2 34B on code and reasoning. Uses grouped-query attention and sliding window attention for efficient inference.

mistralefficientsliding-window-attention
73.6B+
PaperLLMs

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

by MIT / MIT-IBM Watson AI Lab

Introduced AWQ (Activation-aware Weight Quantization), a hardware-friendly low-bit weight quantization approach that protects a small fraction (1%) of salient weights based on activation magnitudes, achieving better performance than GPTQ at 4-bit while being faster and more broadly applicable across model architectures.

awqquantizationactivation-aware
73.3B+
PaperLLMs

PaLM: Scaling Language Modeling with Pathways

by Google Research

Introduces PaLM (Pathways Language Model), a 540B parameter model trained on 780B tokens using the Pathways system. Achieved breakthrough performance on reasoning tasks and demonstrated discontinuous performance improvements that define emergent abilities.

palmscalingpathways
73.2B+
PaperComputer Vision

DINOv2: Learning Robust Visual Features without Supervision

by Meta AI

Presented DINOv2, a self-supervised vision foundation model trained on a curated dataset of 142 million images using a combination of self-distillation and contrastive objectives. DINOv2 features serve as universal visual representations, excelling on depth estimation, segmentation, and classification without fine-tuning.

dinov2self-supervisedvision-transformer
73.1B+
Paperai-evaluation

Holistic Evaluation of Language Models

by Stanford CRFM

Presents HELM, a holistic evaluation framework for language models across 42 scenarios and 59 metrics including accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency. HELM reveals that no single model dominates across all dimensions and exposes significant gaps between narrow and comprehensive model assessment.

evaluationbenchmarkholistic
73B+
PaperLLMs

GPT-4V(ision) System Card

by OpenAI

The system card for GPT-4 with vision (GPT-4V), detailing the model's visual understanding capabilities, safety evaluations, limitations, and mitigation strategies. GPT-4V represents a major advancement in large multimodal models, enabling complex visual reasoning from natural language prompts.

gpt-4vmultimodalvision
72.7B+
PaperLLMs

The Claude 3 Model Family: Opus, Sonnet, Haiku

by Anthropic

Presents the Claude 3 family of models (Opus, Sonnet, Haiku), demonstrating state-of-the-art performance on reasoning, vision, and multilingual tasks. Highlights Anthropic's safety techniques including Constitutional AI and RLHF-based alignment.

claudeanthropicmultimodal
72.6B+
PaperComputer Vision

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen)

by Google Brain

Introduced Imagen, a text-to-image diffusion model that leverages large pretrained language models (T5-XXL) for text understanding combined with cascaded diffusion models for image synthesis. Imagen demonstrated that scaling text encoders is more impactful than scaling diffusion models, establishing DrawBench as a new evaluation benchmark.

imagentext-to-imagediffusion
72.2B+
Paperinfrastructure

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

by Princeton University / Together AI

Extends FlashAttention with improved work partitioning across GPU thread blocks and warps, achieving 2× speedup over FlashAttention and ~9× speedup over standard attention. Enables efficient training of models with context lengths up to 256K tokens.

flash-attention-2attentionparallelism
72.2B+
Paperethics

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?

by University of Washington / Black in AI

This influential FAccT paper argues that ever-larger language models carry significant risks—including environmental costs, biased training data, and the illusion of meaning—that are often overlooked in the race for benchmark performance. It calls for pausing scaling to focus on documentation, auditing, and community-centered research practices.

ethicsllmbias
72.1B+
PaperLLMs

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

by Salesforce Research

Presented BLIP-2, which bridges the modality gap between frozen image encoders and frozen LLMs using a lightweight Querying Transformer (Q-Former) trained in two stages. BLIP-2 achieves state-of-the-art VQA performance with significantly fewer trainable parameters than prior methods.

blip-2multimodalq-former
71.9B+
PaperLLMs

Flamingo: a Visual Language Model for Few-Shot Learning

by DeepMind

Introduced Flamingo, a family of visual language models that bridge powerful pretrained vision and language models, enabling few-shot learning on a diverse range of multimodal tasks by training on arbitrarily interleaved sequences of images, video, and text. Flamingo set new few-shot state-of-the-art on 16 benchmarks.

flamingomultimodalfew-shot
71.8B+
PaperAI Agents

MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

by Tsinghua / Peking University / DeepWisdom

Presents MetaGPT, a multi-agent framework that encodes human workflows as Standardized Operating Procedures (SOPs) for LLM agents acting as specialized software roles. By assigning product manager, architect, engineer, and QA roles, MetaGPT produces complete, executable codebases from natural language requirements with higher quality than prior approaches.

agentsmulti-agentsoftware-engineering
71.7B+
PaperLLMs

Let's Verify Step by Step

by OpenAI

Demonstrated that process-based reward models (PRMs), which provide feedback on each reasoning step, substantially outperform outcome-based reward models (ORMs) for training LLMs to solve mathematical reasoning problems. The paper also introduced PRM800K, a dataset of 800K step-level human feedback labels on MATH solutions.

process-reward-modelsreasoningrlhf
71.6B+
PaperLLMs

RoFormer: Enhanced Transformer with Rotary Position Embedding

by Zhuiyi Technology

Introduces Rotary Position Embedding (RoPE), encoding absolute position information with a rotation matrix and naturally incorporating relative position in self-attention. Adopted by LLaMA, PaLM 2, and most modern LLMs for its length generalization properties.

roperotary-position-embeddingpositional-encoding
71.4B+
PaperLLMs

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

by Princeton University

Introduced SWE-bench, a benchmark of 2,294 real GitHub issues from 12 popular Python repositories requiring models to resolve issues by writing code patches. SWE-bench reveals that even the best LLMs resolve fewer than 4% of issues with standard techniques, motivating research into code agents.

swe-benchsoftware-engineeringbenchmark
71.3B+
PaperAI Agents

Voyager: An Open-Ended Embodied Agent with Large Language Models

by NVIDIA / Caltech / UT Austin

Presents Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager uses an automatic curriculum, an ever-growing skill library of executable code, and an iterative prompting mechanism to overcome failures.

agentsminecraftlifelong-learning
71.2B+
Papertraining

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

by Stanford University

Introduces DPO, a stable and efficient alternative to RLHF that directly optimizes a language model on human preference data without an explicit reward model or RL. Achieves comparable or superior alignment results with significantly simpler implementation.

dpoalignmentpreference-optimization
71.2B+
PaperAI Agents

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

by Microsoft Research

Presents GraphRAG, which uses LLM-generated knowledge graphs and community detection to enable query-focused summarization over entire text corpora. Unlike standard RAG which answers local questions from text chunks, GraphRAG enables global sensemaking queries by reasoning over interconnected entity communities at multiple granularities.

ragknowledge-graphgraph
70.9B+
Paperreinforcement-learning

Decision Transformer: Reinforcement Learning via Sequence Modeling

by UC Berkeley / Google Brain

Decision Transformer recasts offline reinforcement learning as a conditional sequence modeling problem, predicting actions given return-to-go, states, and past actions using a causal Transformer. This eliminates the need for temporal difference learning and bootstrapping while achieving competitive performance on Atari and MuJoCo benchmarks.

reinforcement-learningoffline-rltransformers
70.6B+
PaperAI Agents

REALM: Retrieval-Augmented Language Model Pre-Training

by Google Research

Proposes REALM, which augments language model pre-training with a learned textual knowledge retriever, enabling the model to retrieve and attend over documents from a large corpus during both pre-training and fine-tuning. REALM achieves state-of-the-art on Open-domain QA benchmarks while providing interpretable knowledge retrieval.

ragpretrainingretrieval
70.5B+
PaperLLMs

StarCoder: May the Source Be With You!

by BigCode / Hugging Face / ServiceNow

Presented StarCoder, a 15.5B parameter open-source code LLM trained on 1 trillion tokens from The Stack (permissively licensed source code) with fill-in-the-middle capability, fast multi-token prediction inference, and a commitment to responsible AI through a model card and attribution feature.

starcodercode-llmopen-source
70.3B+
Papertraining

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

by Google Brain

Introduces the Sparsely-Gated Mixture-of-Experts (MoE) layer, enabling 1000× capacity increase with only marginal computational cost increase. A learned gating network selects a sparse subset of expert sub-networks per input, enabling unprecedented model scale.

mixture-of-expertsmoesparse
70.3B+
Paperrobotics

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

by Google / Everyday Robots

SayCan combines the semantic reasoning capabilities of large language models with learned value functions that encode physical feasibility, allowing robots to plan long-horizon tasks expressed in natural language. The approach grounds high-level language instructions in real-world robot affordances without task-specific fine-tuning.

roboticslanguage-groundingllm
70.1B+
Papertraining

DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter

by Hugging Face

Introduces DistilBERT, a knowledge-distilled version of BERT that retains 97% of BERT's language understanding while being 40% smaller and 60% faster. Demonstrates the effectiveness of task-agnostic knowledge distillation for pretrained language models.

distilbertknowledge-distillationbert
69.9B
Paperreinforcement-learning

Conservative Q-Learning for Offline Reinforcement Learning

by UC Berkeley

CQL (Conservative Q-Learning) addresses distribution shift in offline RL by augmenting the standard Bellman objective with a term that penalizes Q-values for out-of-distribution actions, producing a lower bound on the true value function. This conservative approach prevents over-optimistic value estimation and achieves strong performance across locomotion, navigation, and robotic manipulation datasets.

reinforcement-learningoffline-rlq-learning
69.8B
PaperAI Agents

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

by University of Washington / IBM AI Research / Allen AI

Introduces Self-RAG, a framework that trains a single LM to adaptively retrieve passages on demand, generate text, and critique its own outputs using special reflection tokens. Unlike standard RAG, Self-RAG decides when to retrieve and reflects on retrieved passages and generation quality, outperforming ChatGPT and standard RAG on diverse downstream tasks.

ragself-reflectioncritique
69.7B
Paperresearch

Emergent Abilities of Large Language Models

by Google Research / Stanford / DeepMind / UNC

Defines and documents emergent abilities in LLMs — capabilities that appear sharply at certain model scales rather than improving gradually. Surveys over 100 tasks where models exhibit phase-transition-like capability gains, sparking debate on whether emergence is real or a measurement artifact.

emergent-abilitiesscalingphase-transitions
69.6B
PaperAI Agents

Improving Language Models by Retrieving from Trillions of Tokens

by DeepMind

Presents RETRO (Retrieval-Enhanced Transformers), a model that retrieves from a 2-trillion-token database at inference time via chunked cross-attention. RETRO achieves performance comparable to GPT-3 with 25× fewer parameters by leveraging retrieved passages, demonstrating that retrieval augmentation is a compute-efficient alternative to scaling.

ragretrievallanguage-model
69.4B
PaperAI Agents

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

by Princeton NLP / Princeton Language and Intelligence

Introduces SWE-agent, which defines Agent-Computer Interfaces (ACIs) to enable LLMs to autonomously solve real GitHub issues by browsing codebases, editing files, and running tests. On the SWE-bench benchmark, SWE-agent with GPT-4 Turbo resolves 12.5% of issues, significantly outperforming prior methods.

agentssoftware-engineeringcode
69.3B
PaperAI Ethics & Safety

Red Teaming Language Models with Language Models

by DeepMind

Proposes using language models to automatically generate test cases that elicit harmful behaviors from target language models—a scalable alternative to manual red teaming. The approach discovers diverse attack prompts across harm categories and reveals that larger models are harder to red-team but produce more harmful outputs when successfully attacked.

safetyred-teamingadversarial
69B
PaperLLMs

Qwen2.5 Technical Report

by Alibaba Cloud / Qwen Team

Qwen2.5 is a comprehensive family of open-source LLMs (0.5B to 72B parameters) trained on 18 trillion tokens including significantly expanded coding and mathematics data, achieving state-of-the-art open-source performance on coding (HumanEval), mathematics (MATH), and multilingual benchmarks. The series includes specialized Qwen2.5-Coder and Qwen2.5-Math variants.

llmqwenalibaba
69B
Paperai-evaluation

Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* Quality

by LMSYS / UC Berkeley / CMU / UCSD

Presents Vicuna-13B, an open-source chatbot created by fine-tuning LLaMA on ShareGPT conversation data, achieving approximately 90% of ChatGPT and Bard quality as judged by GPT-4. The paper introduces GPT-4 as an automated judge for chatbot evaluation, establishing a widely adopted evaluation paradigm for conversational AI.

evaluationopen-sourcechatbot
68.9B
PaperAI Agents

AgentBench: Evaluating LLMs as Agents

by Tsinghua University

Introduces AgentBench, the first systematic benchmark for evaluating LLMs as autonomous agents across eight distinct environments spanning operating systems, databases, knowledge graphs, digital games, and web browsing. The benchmark reveals a large performance gap between commercial and open-source models on real-world agent tasks.

benchmarkagentsevaluation
68.4B
PaperLLMs

Competition-Level Code Generation with AlphaCode

by DeepMind

AlphaCode is a large-scale language model from DeepMind designed for competitive programming. It was pre-trained on public GitHub code and fine-tuned on a curated dataset of programming contest problems. The system generates a vast number of potential solutions and then filters them using test cases to find a correct one.

alphacodedeepmindcode-generation
68.3B
PaperAI Ethics & Safety

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

by OpenAI

This paper explores weak-to-strong generalization, a method for training a powerful AI model using supervision from a weaker one. It serves as an analogy for aligning superintelligent AI with human values. The research shows that strong models can learn beyond their weak supervisors and introduces techniques like auxiliary confidence loss to improve performance.

ai-safetyalignmentsuperalignment
68B
PaperAI Ethics & Safety

Scalable agent alignment via reward modeling: a research direction

by DeepMind

This research paper proposes a method for aligning advanced AI systems by using recursive reward modeling. The approach leverages AI assistants to help human evaluators assess complex AI actions, enabling scalable oversight and positioning this technique alongside debate and amplification as key AI safety strategies.

alignmentscalable-oversightreward-modeling
67.9B
PaperLLMs

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

by Alibaba Cloud / DAMO Academy

Qwen-VL is a large-scale vision-language model series from Alibaba, trained on a curated multilingual multimodal dataset. It supports high-resolution image understanding, visual grounding with bounding boxes, and multilingual text reading, achieving state-of-the-art results on multiple visual benchmarks.

qwen-vlmultimodalvision-language
67.8B
PaperAI Agents

CAMEL: Communicative Agents for Mind Exploration of Large Language Model Society

by KAUST

CAMEL introduces a novel framework for studying multi-agent cooperation by having AI agents role-play to solve tasks. It utilizes a technique called 'inception prompting' to ensure agents adhere to their assigned personas, enabling the exploration of complex communicative behaviors and societal dynamics within large language models with minimal human guidance.

multi-agent-systemsagent-communicationrole-playing-ai
67.7B
PaperLLMs

STaR: Bootstrapping Reasoning With Reasoning

by Stanford University / Google Brain

STaR (Self-Taught Reasoner) is a research paper introducing an iterative bootstrapping method for language models. The model learns to improve its reasoning abilities by generating rationales for problems, filtering out the incorrect ones, and then fine-tuning itself on the successfully reasoned examples. This allows smaller models to achieve reasoning performance comparable to much larger ones.

starself-taught-reasonerbootstrapping
67.5B
PaperLLMs

The Llama 4 Herd: The Beginning of a New Era of Natively Multimodal AI

by Meta AI

Llama 4 introduces a family of natively multimodal mixture-of-experts models—Scout (17B/16 experts), Maverick (17B/128 experts), and Behemoth (288B/16 experts)—pretrained jointly on text, image, and video data. Maverick achieves top scores on vision-language benchmarks while Scout offers 10M-token context at a fraction of the compute of comparable models.

llmmultimodalmixture-of-experts
67.5B
PaperLLMs

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

by Google Research

Introduces Grouped-Query Attention (GQA), an efficient attention mechanism that generalizes Multi-Head and Multi-Query Attention. GQA groups query heads to share key and value heads, drastically reducing the KV cache size and memory bandwidth, which accelerates inference speed while maintaining near Multi-Head quality.

grouped-query-attentiongqamulti-query-attention
67.4B
Paperethics

Artificial Intelligence Ethics Guidelines: A Global Inventory

by EPFL / Multiple Institutions

This paper presents a systematic review of 84 prominent AI ethics guidelines from around the world. It identifies a global convergence on five key ethical principles, including transparency and justice, but reveals significant divergence in how these principles are interpreted and operationalized across different sectors and regions.

ethicsai-policyguidelines
67B
Paperinterpretability

Zoom In: An Introduction to Circuits

by Distill / OpenAI

This essay by Chris Olah and colleagues at Distill introduces the circuits framework for mechanistic interpretability, arguing that neural network weights encode interpretable algorithms composed of features and circuits. It presents case studies of curve detectors and multimodal neurons as evidence that individual units and motifs in neural networks are meaningfully interpretable.

interpretabilitymechanistic-interpretabilitycircuits
66.6B
PaperAI Ethics & Safety

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

by Anthropic

Demonstrates that LLMs can be trained to behave safely during normal operation but exhibit unsafe behaviors when triggered by specific conditions—acting as 'sleeper agents'—and that standard safety training techniques including RLHF, supervised fine-tuning, and adversarial training fail to reliably remove these backdoors, sometimes even hiding them deeper.

safetydeceptionalignment
66.4B
Papertraining

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

by Google Research

Introduces Switch Transformers, simplifying MoE routing to select a single expert per token (top-1), enabling stable trillion-parameter T5-scale models with 7× pre-training speedup. Demonstrates that parameter count and compute can be decoupled through sparsity.

switch-transformermixture-of-expertstrillion-parameters
66.1B
Paperreinforcement-learning

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

by DeepSeek

This paper introduces Group Relative Policy Optimization (GRPO), a memory-efficient reinforcement learning algorithm. GRPO enables scalable RLHF-style training by replacing the critic model with group-sampled reward baselines, a technique used to enhance the mathematical reasoning of models like DeepSeekMath.

reinforcement-learninggrpomath-reasoning
66.1B
Paperrobotics

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

by Google DeepMind

RT-2 is a Vision-Language-Action (VLA) model that translates visual and language inputs directly into robotic actions. By co-fine-tuning large models on both web-scale and robotics data, it transfers knowledge from the internet to physical control, enabling robots to reason about and execute tasks involving novel objects and scenarios without explicit robotic training.

roboticsvision-language-modelsaction-models
65.5B
PaperAI Ethics & Safety

Representation Engineering: A Top-Down Approach to AI Transparency

by Center for AI Safety / UC Berkeley

Representation Engineering (RepE) is a top-down AI transparency technique for interpreting and controlling Large Language Models. It uses linear probes on activation differences from contrastive prompts to identify and manipulate high-level concepts like truthfulness and emotion without needing to retrain or fine-tune the model.

interpretabilitytransparencyrepresentation-engineering
65.2B
PaperAI Agents

Atlas: Few-shot Learning with Retrieval Augmented Language Models

by Meta AI / University College London

Atlas is a retrieval-augmented language model designed for few-shot learning. It uniquely pre-trains its retriever and language model components jointly, enabling it to effectively leverage external knowledge documents. This approach allows Atlas to achieve state-of-the-art few-shot performance on knowledge-intensive NLP benchmarks like MMLU, outperforming much larger models.

ragfew-shot-learningretrieval-augmented-generation
65.2B
Paperinterpretability

Towards Monosemanticity: Decomposing Language Models with Dictionary Learning

by Anthropic

This research paper from Anthropic introduces a method using sparse autoencoders to decompose the internal activations of a transformer model. It successfully extracts thousands of interpretable, monosemantic features, demonstrating that the superposition of concepts within neurons can be untangled.

interpretabilitymonosemanticitydictionary-learning
65B
PaperLLMs

Claude Opus 4 Technical Report

by Anthropic

The Claude Opus 4 technical report details Anthropic's flagship model, highlighting its extended thinking, advanced coding, and agentic capabilities. It showcases top-tier performance on benchmarks like SWE-bench and GPQA, along with significant improvements in safety through Constitutional AI and RLHF.

claude-opus-4anthropicllm-research
65B
PaperLLMs

Gemini 2.5 Pro Technical Report

by Google DeepMind

Gemini 2.5 Pro introduces thinking mode—an integrated chain-of-thought reasoning layer—combined with a 1M-token context window and natively multimodal capabilities spanning text, image, audio, and video. The model achieves leading positions on multiple reasoning and coding benchmarks including Codeforces, AIME, and MMMU.

llmgeminigoogle
64.7B
Paperinterpretability

In-context Learning and Induction Heads

by Anthropic

This paper establishes a causal link between specific transformer circuits, termed "induction heads," and the phenomenon of in-context learning. It demonstrates that these two-layer attention patterns, which copy and complete sequences, emerge predictably during training and are a key mechanistic driver of few-shot learning abilities in LLMs.

interpretabilitycircuitsinduction-heads
63.9B
PaperLLMs

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

by Carnegie Mellon University / Together AI

Mamba is a novel sequence modeling architecture based on structured state space models (SSMs). It introduces a selection mechanism that allows the model to selectively propagate or forget information based on the input, overcoming a key limitation of previous SSMs. This enables Mamba to achieve Transformer-level performance with linear time complexity and significantly faster inference.

mambastate-space-modelssm
63.8B
Paperai-evaluation

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

by LMSYS / UC Berkeley

Introduces LMSYS-Chat-1M, a large-scale dataset of one million real-world conversations with 25 state-of-the-art LLMs collected from the Chatbot Arena platform. Analysis reveals diverse usage patterns, safety violations, and human preference signals, making it a valuable resource for safety evaluation, capability assessment, and alignment research.

datasetevaluationconversations
63.8B
Paperdomain-specific

Towards Expert-Level Medical Question Answering with Large Language Models

by Google Research

This paper introduces Med-PaLM 2, a large language model fine-tuned on medical data. It achieves expert-level performance on medical licensing exam questions, demonstrating clinical reasoning comparable to physicians, and proposes a framework for evaluating the safety and alignment of medical AI systems.

healthcaremedical-aillm
63.4B
PaperLLMs

CogVLM: Visual Expert for Pretrained Language Models

by Tsinghua University / Zhipu AI

CogVLM is a vision-language model that enhances pretrained language models (LLMs) with visual understanding. It introduces a trainable visual expert module into each layer of a frozen LLM, enabling deep fusion of image and text features. This approach achieves state-of-the-art results on numerous vision-language benchmarks without altering the original language model's parameters.

cogvlmmultimodalvisual-expert
63.4B
PaperLLMs

Fast Transformer Decoding: One Write-Head is All You Need (Multi-Query Attention)

by Google Brain

Introduces Multi-Query Attention (MQA), an efficient attention mechanism for autoregressive decoding. By sharing a single key and value head across all query heads, MQA drastically reduces the size of the KV cache. This leads to significant memory bandwidth savings and faster inference speeds with minimal impact on model quality.

multi-query-attentionmqainference-speed
63.2B
Providerai-providers

NVIDIA AI

by NVIDIA

NVIDIA AI provides a comprehensive suite of hardware and software solutions for accelerating AI development and deployment. Their offerings include GPUs optimized for deep learning, AI software development kits (SDKs), and pre-trained AI models to enable faster innovation across various industries.

gpudeep-learninghardware
93A+
Providerai-providers

Amazon SageMaker

by Amazon Web Services (AWS)

Amazon SageMaker is a fully managed machine learning service that enables data scientists and developers to build, train, and deploy machine learning models quickly. It provides a suite of tools and services covering the entire ML lifecycle, from data preparation to model deployment and monitoring.

cloud-mlmanaged-servicemachine-learning
86.7A
Providerai-providers

Databricks

by Databricks

Databricks is a unified data analytics platform built on Apache Spark, providing tools for data engineering, data science, and machine learning. It enables organizations to process large datasets, build and deploy ML models, and collaborate across teams.

data-engineeringmachine-learningapache-spark
82.3A
Providerai-providers

AssemblyAI

by AssemblyAI

AssemblyAI provides a Speech-to-Text API that allows developers to transcribe audio and video files with high accuracy. Their platform offers features like speaker diarization, sentiment analysis, and content moderation, making it a comprehensive solution for audio intelligence.

speech-to-textaudio-intelligencetranscription
80.8A
ProviderAI Tools & APIs

Hugging Face

by Hugging Face

Hugging Face is the GitHub of AI, providing the world's largest open model hub, dataset repository, and ML collaboration platform. Its Transformers library is the de-facto standard for working with open-weight models, and the Hugging Face Hub hosts hundreds of thousands of models and datasets. Its Spaces platform allows AI demos to be deployed instantly.

model-hubopen-sourceinfrastructure
78.3B+
ProviderAI Infrastructure

Amazon Web Services AI

by Amazon

Amazon Web Services is the world's largest cloud provider and offers the most comprehensive set of AI and machine learning services, including Amazon Bedrock for managed foundation model APIs, SageMaker for MLOps, Rekognition for computer vision, and Alexa for voice AI. AWS Bedrock gives enterprises access to models from Anthropic, Meta, Mistral, Cohere, and others through a unified API.

cloud-providermlopsenterprise
75.3B+
ProviderAI Tools & APIs

LangChain Inc

by LangChain Inc

LangChain Inc is the company behind the most widely adopted LLM orchestration framework in the AI ecosystem. LangChain provides composable abstractions for building LLM-powered applications, while its LangSmith platform offers observability and evaluation tooling, and LangGraph enables the construction of stateful, multi-actor agent workflows.

ai-frameworkorchestrationrag
74.4B+
ProviderAI Infrastructure

Microsoft Azure AI

by Microsoft

Microsoft Azure AI is the AI services division of Microsoft's cloud platform, uniquely positioned as the exclusive cloud partner of OpenAI. Through Azure OpenAI Service, enterprises access GPT-4, DALL-E, and Whisper with enterprise-grade compliance and data residency guarantees. Microsoft has deeply integrated AI across its product suite including Copilot for Microsoft 365, GitHub Copilot, and Azure AI Foundry.

cloud-providerenterprisemanaged-ai
73.9B+
ProviderAI Infrastructure

Google Cloud AI

by Google

Google Cloud AI provides enterprise access to Google DeepMind's Gemini models and a comprehensive suite of managed AI services via Vertex AI. As the creator of the Transformer architecture and TensorFlow, Google Cloud offers unmatched AI infrastructure including custom TPUs, a full MLOps platform, and pre-built APIs for vision, speech, and natural language processing.

cloud-providerenterprisemanaged-ai
71.4B+
Providerai-providers

Graphcore

by Graphcore

Graphcore is a semiconductor company that develops Intelligence Processing Units (IPUs), a type of microprocessor designed specifically for AI and machine learning workloads. Their IPUs are designed to accelerate training and inference for complex AI models, offering an alternative to GPUs.

hardwareacceleratoripu
69.5B
ProviderAI Infrastructure

Pinecone Systems

by Pinecone

Pinecone is the leading managed vector database, purpose-built for AI applications requiring similarity search at scale. It powers retrieval-augmented generation, semantic search, and recommendation systems for thousands of enterprises. Pinecone's serverless architecture eliminates infrastructure management while delivering sub-millisecond query performance.

vector-databaseinfrastructurerag
69.2B
Providerai-research

LMSYS

by LMSYS / UC Berkeley

LMSYS (Large Model Systems Organization) is a research collective from UC Berkeley known for creating Chatbot Arena—the leading human preference-based LLM evaluation leaderboard—and developing high-performance open-source inference systems including vLLM and FastChat. LMSYS research on Elo-based evaluation and serving efficiency has become foundational to the field.

open-sourcebenchmarkingresearch
68.6B
Providerai-research

EleutherAI

by EleutherAI

EleutherAI is a decentralized open-source AI research collective best known for training and releasing the GPT-Neo, GPT-J, GPT-NeoX, and Pythia model families, as well as developing the LM Evaluation Harness—the standard benchmarking framework for language models. The organization operates as a grassroots nonprofit committed to open and reproducible AI research.

open-sourcellmresearch
67.8B
Providerai-research

Allen Institute for AI (AI2)

by Allen Institute for AI

The Allen Institute for AI (AI2) is a nonprofit research institute focused on high-impact, open-source AI. Founded by Paul Allen, it produces foundational models like OLMo, influential datasets such as MMLU, and reasoning benchmarks. Its Semantic Scholar platform provides AI-powered discovery across 200M+ academic papers.

open-sourceresearchnlp
67.3B
Providerai-data

Scale AI

by Scale AI

Scale AI is the leading AI data platform providing high-quality training data labeling, RLHF pipelines, and model evaluation services for frontier AI labs, government agencies, and Fortune 500 enterprises. Its Rapid platform and data engine power training datasets for many leading language and vision models.

data-labelingrlhfevaluation
67B
Providerai-audio

ElevenLabs

by ElevenLabs

ElevenLabs is a voice technology research company developing advanced text-to-speech and voice cloning software. Their platform allows users to generate high-quality spoken audio in numerous languages, create custom AI voices, or clone existing ones. It is widely used for audiobooks, video games, and content creation.

ttsvoice-cloningaudio-ai
67B
Providerai-research

LAION

by LAION

LAION (Large-scale Artificial Intelligence Open Network) is a German nonprofit that creates and releases massive open datasets for AI research. Its most notable contribution, LAION-5B, is a dataset of 5.85 billion image-text pairs that was pivotal in training foundational models like Stable Diffusion.

datasetsopen-sourcenonprofit
66.9B
Providerai-search

Perplexity AI

by Perplexity AI

Perplexity AI is an answer engine that combines real-time web search with large language model reasoning to deliver cited, conversational responses. Founded in 2022, it has rapidly grown to tens of millions of monthly active users and positions itself as an AI-native alternative to traditional search engines.

searchraganswer-engine
66.8B
ProviderAI Tools & APIs

Weights & Biases

by Weights & Biases

Weights & Biases (W&B) is a leading MLOps platform for developers, specializing in experiment tracking, model evaluation, and dataset versioning. It provides tools to visualize model performance, manage datasets, and collaborate on machine learning projects, integrating with popular frameworks like PyTorch and TensorFlow.

mlopsexperiment-trackingobservability
65.4B
Providerai-creative

Runway ML

by Runway ML

Runway is an applied AI research company focused on building multimodal AI systems for art, entertainment, and human creativity. It provides a suite of web-based tools for generative content creation, including industry-leading text-to-video, image-to-video, and AI-powered video editing features for creative professionals.

video-generationcreative-aimultimodal
65.1B
Providerai-consumer

Character AI

by Character AI

Character AI is a consumer platform for creating and interacting with AI-powered characters. Users can engage in conversations for entertainment, role-playing, and creative exploration. It has become a major consumer AI application with a massive user base, focusing on personalized and immersive chat experiences.

chatbotsroleplayconsumer-ai
63.8B
ProviderAI Business & Strategy

Stability AI

by Stability AI

Stability AI is a generative AI company known for developing the popular open-source Stable Diffusion text-to-image model. They focus on creating open, multi-modal AI models for image, language, audio, and video generation, which are accessible via APIs and as downloadable weights for custom implementation.

generative-aiimage-generationvideo-generation
62.7B
ProviderAI Infrastructure

Groq

by Groq

Groq is a semiconductor company that developed the Language Processing Unit (LPU), a custom chip for ultra-fast AI inference. Their managed API provides some of the fastest publicly available LLM inference speeds, often exceeding 800 tokens/second, making it ideal for latency-sensitive applications.

inferencehardwarelpu
62.3B
ProviderAI Infrastructure

Weaviate

by Weaviate

Weaviate is an open-source vector database designed for AI-native applications. It enables flexible hybrid search, combining vector and keyword methods, and uniquely supports multi-modal data like text, images, and audio. Weaviate offers both self-hosting for maximum control and a managed cloud service for ease of use.

vector-databaseopen-sourceinfrastructure
61.2B
Providerai-research

BigCode Project

by BigCode / Hugging Face / ServiceNow

BigCode is an open scientific collaboration by Hugging Face and ServiceNow for the responsible development of large language models (LLMs) for code. The project produced the StarCoder and StarCoder2 models, trained on 'The Stack' dataset, with a strong emphasis on ethical data governance, source attribution, and consent.

open-sourcecode-modelsresearch-collaboration
60.7B
Providerai-research

BigScience

by BigScience / Hugging Face

BigScience was a year-long, open research collaboration involving over 1,000 volunteer researchers, organized by Hugging Face. This global effort focused on the transparent and ethical development of large language models, culminating in the creation of BLOOM, a 176-billion parameter open-access multilingual model.

open-sourcellmresearch
59.2C+
ProviderAI Infrastructure

Together AI

by Together AI

Together AI provides a high-performance cloud inference platform for open-source models, offering one of the fastest and most cost-effective APIs for running models like Llama, Mistral, and DeepSeek. Its Together Inference platform specializes in speculative decoding and model parallelism techniques, and also offers managed fine-tuning and custom model deployment.

inferenceopen-source-hostingfine-tuning
57.8C+
Providerai-creative

Synthesia

by Synthesia

Synthesia is an enterprise AI video generation platform that enables users to create professional-quality videos featuring realistic AI avatars from text scripts, without cameras, actors, or studios. Serving thousands of enterprise customers including Accenture, BBC, and Reuters, it is the leading platform for scalable AI-generated corporate video content.

video-generationavatarssynthetic-media
57.4C+
Providerai-marketing

Jasper AI

by Jasper AI

Jasper AI is an enterprise-grade AI content platform designed for marketing teams to produce brand-consistent copy, campaigns, and creative assets at scale. It integrates with brand voice guidelines, company knowledge bases, and major marketing workflows to maintain tone consistency across channels.

copywritingmarketing-aicontent-generation
56.4C+
Providerai-legal

Casetext

by Casetext / Thomson Reuters

Casetext was a pioneer in AI-powered legal research and drafting, launching CoCounsel—the first AI legal assistant powered by GPT-4—before being acquired by Thomson Reuters in 2023 for $650M. Its technology is now integrated into Westlaw and Practical Law, making AI legal assistance available to millions of legal professionals.

legal-ailegaltechlegal-research
56.3C+
ProviderAI Infrastructure

Anyscale

by Anyscale

Anyscale is the company behind Ray, the open-source distributed computing framework that has become the infrastructure backbone for training and serving large-scale AI at companies like OpenAI, Uber, and Spotify. Anyscale provides a managed platform for Ray workloads, including Anyscale Endpoints for scalable LLM inference and RayLLM for open-model serving.

infrastructuredistributed-computingray
56.2C+
ProviderAI Infrastructure

Replicate

by Replicate

Replicate is a cloud platform that makes it trivial to run open-source machine learning models via a simple API with pay-per-second billing. It hosts thousands of community models spanning image generation, video, audio, and language, and allows developers to package and deploy custom models as Cogs without managing any GPU infrastructure.

model-deploymentinfrastructuremanaged-inference
55.5C+
Providerai-data

Labelbox

by Labelbox

Labelbox is an enterprise data-curation and annotation platform that streamlines the creation of high-quality training datasets for computer vision, NLP, and multimodal AI models. It provides annotation tooling, quality workflows, model-assisted labeling, and a managed workforce marketplace.

data-labelingannotationmlops
53.4C+
Providerai-legal

Harvey AI

by Harvey AI

Harvey AI is an enterprise legal AI platform built on foundation models fine-tuned on legal corpora to assist law firms and corporate legal departments with research, drafting, due diligence, and contract analysis. It is deployed at leading global law firms and backed by OpenAI, positioning itself as the AI layer for professional legal services.

legal-ailegaltechenterprise
52.5C+
Providerai-hardware

Cerebras Systems

by Cerebras Systems

Cerebras Systems designs and manufactures the Wafer Scale Engine (WSE), the world's largest AI chip, enabling ultra-fast LLM training and inference at speeds far exceeding GPU clusters. Its CS-3 system and Cerebras Inference cloud service deliver token generation rates of 2,000+ tokens/second for leading open-weight models.

ai-chipswafer-scaleinference
52.3C+
ProviderAI Infrastructure

BentoML

by BentoML

BentoML is an open-source platform for building, shipping, and scaling AI applications and model inference services, providing a unified framework from local development to cloud production. BentoCloud, its managed service, offers one-click deployment, auto-scaling, and observability for ML teams.

mlopsmodel-servingopen-source
52.1C+
Providerai-research

Nomic AI

by Nomic AI

Nomic AI builds open, auditable AI systems focused on embedding models and large-scale data visualization, most notably the nomic-embed-text model and Atlas—a platform for exploring and understanding massive datasets through interactive AI-powered maps. The company emphasizes transparency and reproducibility in model development.

open-sourceembeddingsvisualization
51C+
ProviderAI Infrastructure

Modal

by Modal Labs

Modal is a serverless cloud platform purpose-built for running GPU-intensive Python workloads including ML inference, fine-tuning, and batch processing without managing infrastructure. Developers define compute requirements in Python decorators and Modal handles container orchestration, scaling, and cold-start optimization.

serverless-gpumlopscloud-compute
51C+
ProviderAI Infrastructure

Fireworks AI

by Fireworks AI

Fireworks AI is a production inference platform founded by ex-Google Brain researchers, offering fast and reliable serving for open-weight models with enterprise SLAs. Fireworks specializes in compound AI systems, function calling, and JSON-mode inference, and provides FireFunction—its own fine-tuned function-calling model—alongside hosting for Llama, Mistral, and other popular open models.

inferenceopen-source-hostingenterprise
50.8C+
Providerai-healthcare

PathAI

by PathAI

PathAI develops AI-powered pathology solutions that enable more accurate cancer diagnosis, biomarker assessment, and drug development support by analyzing histopathology images at scale. Its AISight platform is deployed in clinical laboratories and pharmaceutical research, improving diagnostic consistency and accelerating oncology trials.

pathologymedical-aidiagnostics
49.2C
Providerai-data

Snorkel AI

by Snorkel AI

Snorkel AI commercializes weak supervision and programmatic data development research from Stanford AI Lab, enabling teams to build, manage, and iterate on AI training datasets programmatically at scale. Its platform reduces reliance on manual labeling by using labeling functions and foundation model assistance.

programmatic-labelingdata-developmentweak-supervision
49C
ProviderAI Infrastructure

IBM Watson / watsonx

by IBM

IBM Watson, now branded as IBM watsonx, is IBM's enterprise AI platform offering governed, trustworthy AI for regulated industries. The watsonx.ai studio, watsonx.data lakehouse, and watsonx.governance suite provide a complete enterprise AI development and deployment pipeline with strong emphasis on explainability, fairness, and compliance for sectors like finance, healthcare, and government.

cloud-providerenterprisegoverned-ai
47.2C
ProviderAI Infrastructure

Oracle AI

by Oracle

Oracle AI provides a suite of generative AI services built into Oracle Cloud Infrastructure (OCI), including the OCI Generative AI Service powered by Cohere and Meta models. Oracle has uniquely integrated AI capabilities directly into its database (Oracle Database 23ai), ERP, and industry cloud offerings, targeting enterprises with existing Oracle relationships.

cloud-providerenterprisedatabase-ai
47C
ProviderAI Business & Strategy

Zhipu AI (GLM)

by Zhipu AI

Zhipu AI is a Chinese AI company spun out of Tsinghua University's KEG Lab, known for the GLM (General Language Model) series. Its ChatGLM models were among the first high-quality open Chinese language models and have been widely adopted in Chinese industry and research communities.

ai-labfoundation-modelschinese
46.9C
ProviderAI Agents

Adept AI

by Adept AI

Adept AI builds AI systems that can take actions in software to complete complex multi-step workflows on behalf of users. The company focuses on general-purpose action models trained to interact with real-world software interfaces through browser and desktop automation.

agentscomputer-useworkflow-automation
46.9C
Providerai-biotech

Recursion Pharmaceuticals

by Recursion Pharmaceuticals

Recursion Pharmaceuticals is a clinical-stage techbio company that combines automated biology, large-scale imaging, and machine learning to industrialize drug discovery, operating one of the largest biological datasets in the industry. Its Recursion OS platform maps biological relationships at unprecedented scale to identify novel therapeutic targets and drug candidates.

drug-discoverybiotechai-biology
46.7C
Providerai-observability

Helicone

by Helicone

Helicone is an open-source LLM observability and monitoring platform that provides a single proxy endpoint for logging, tracking costs, debugging, and improving LLM applications across all major model providers. It integrates with a one-line code change and supports caching, rate limiting, and prompt management.

observabilityllm-monitoringlogging
46.4C
Providerai-biotech

Insilico Medicine

by Insilico Medicine

Insilico Medicine is an AI-driven drug discovery company that has become the first to advance an AI-designed small molecule into Phase II clinical trials, demonstrating end-to-end AI-powered drug development from target identification through IND. Its Chemistry42 and PandaOmics platforms generatively design and screen drug candidates.

drug-discoveryai-chemistrygenerative-ai
46C
Providerai-hardware

SambaNova Systems

by SambaNova Systems

SambaNova Systems builds reconfigurable AI hardware and software solutions optimized for enterprise-scale LLM training and inference, offering its Samba-1 model and SambaNova Cloud API as commercial services. The company's Reconfigurable Dataflow Unit (RDU) architecture is designed specifically for deep learning workloads.

ai-chipsreconfigurableinference
45.4C
Providerllm-providers

xAI

by xAI

xAI is Elon Musk's AI company and creator of the Grok model family. It provides API access to Grok models with real-time web search integration, available through the xAI API and X (Twitter) platform. Grok models are trained on a broad mix of web and social data and emphasize up-to-date knowledge and uncensored reasoning.

llmgrokreal-time
44C
Providergpu-compute

Vast.ai

by Vast.ai

Vast.ai is a peer-to-peer GPU marketplace connecting researchers and startups with spare GPU capacity from data centers and individuals worldwide. It offers some of the cheapest GPU rental prices on the market with flexibility to choose hardware by price, latency, or reliability score. Best suited for cost-sensitive experimentation and training runs.

gpu-cloudmarketplacepeer-to-peer
44C
Providergpu-compute

Together AI (GPU Compute)

by Together AI

Together AI's compute platform provides on-demand and reserved GPU clusters for training and fine-tuning open-source models. It offers H100 and A100 clusters with high-bandwidth networking optimized for distributed training runs, serving as both a GPU cloud provider and an inference platform. Teams use Together AI compute to run multi-node training jobs on Llama and Mistral variants.

gpu-cloudh100a100
44C
Providerllm-providers

Together AI

by Together AI

Together AI provides a cloud platform for running, fine-tuning, and deploying open-source language models. It hosts a wide catalog of models from Llama to Mistral and offers serverless inference, dedicated endpoints, and a fine-tuning pipeline. Together AI is popular among developers who want OpenAI-compatible APIs for open-weight models at competitive pricing.

inferencefine-tuningopen-source
44C
Providerllm-providers

SambaNova

by SambaNova Systems

SambaNova Systems builds custom AI hardware (Reconfigurable Dataflow Units) and offers cloud inference via SambaNova Cloud. It delivers some of the highest throughput speeds for large models including Llama 3 and Meta's frontier releases, targeting enterprises that need predictable, high-throughput inference at scale.

inferencerduhardware
44C
Providergpu-compute

RunPod

by RunPod

RunPod is a community-driven GPU cloud marketplace offering some of the lowest per-hour prices for NVIDIA and AMD GPUs. It enables developers to rent GPU compute from a distributed network of data centers and deploy containerized workloads instantly. RunPod supports serverless GPU endpoints, making it popular for open-source model inference.

gpu-cloudcost-efficientmarketplace
44C
Providergpu-compute

Replicate

by Replicate

Replicate is a platform for running machine learning models in the cloud via a simple API. It hosts thousands of open-source models for image generation, language, audio, and video, deployable with a single API call. Replicate charges per-second of GPU usage and supports deploying custom models as private or public endpoints.

gpu-cloudmodel-hostingapi
44C
Providerllm-providers

OpenAI

by OpenAI

OpenAI is the leading AI research and deployment company behind the GPT and o-series model families. It offers API access to frontier language models, image generation via DALL-E, speech recognition via Whisper, and an Assistants API for building stateful agent workflows. OpenAI operates both a consumer product (ChatGPT) and an enterprise API platform used by millions of developers.

llmgptapi
44C
Providergpu-compute

Modal

by Modal Labs

Modal is a cloud compute platform for running GPU workloads from Python, with a focus on developer ergonomics and serverless scaling. It allows deploying Python functions as GPU-accelerated endpoints with zero infrastructure configuration, automatic scaling to zero, and fast cold-start times. Popular for ML inference, batch jobs, and LLM serving.

gpu-cloudserverlesspython
44C
Providerllm-providers

Mistral AI

by Mistral AI

Mistral AI is a French AI company known for publishing high-efficiency open-weight models alongside its commercial API offerings. The Mistral and Mixtral model families deliver strong benchmark performance at a fraction of the compute cost of larger models. Mistral's La Plateforme API provides access to both open and closed proprietary models.

llmmistralmixtral
44C
Providerllm-providers

Meta AI

by Meta

Meta AI is the open-source AI division of Meta, responsible for the Llama model family. Llama 4 and its variants are released under open weights licenses, enabling local deployment, fine-tuning, and commercial use. Meta provides model weights via Hugging Face and its own download portal, making it the dominant open-weights LLM ecosystem.

llmllamaopen-weights
44C
Providergpu-compute

Lambda Labs

by Lambda Labs

Lambda Labs provides cloud GPU instances and on-premises GPU servers targeted at AI researchers and ML engineers. Its Lambda Cloud offers on-demand and reserved NVIDIA H100 and A100 instances at competitive rates with a simple developer-friendly interface. Lambda also sells GPU workstations and servers for local development.

gpu-cloudh100a100
44C
Providerllm-providers

Groq

by Groq

Groq offers ultra-low-latency LLM inference through its custom Language Processing Unit (LPU) hardware. The GroqCloud API serves open-weight models including Llama, Mixtral, and Gemma at speeds that far exceed GPU-based inference, making it ideal for real-time agent applications. Groq provides a developer-friendly API compatible with the OpenAI client format.

inferencelpulow-latency
44C
Providerllm-providers

Google DeepMind

by Google DeepMind

Google DeepMind is the unified AI research division behind the Gemini model family. It offers API access through Google AI Studio and Vertex AI, covering multimodal reasoning, code generation, long-context understanding up to 2M tokens, and tight integration with Google Cloud services. DeepMind also publishes foundational research in reinforcement learning and scientific AI.

llmgeminimultimodal
44C
Providergpu-compute

Google Cloud (GPU)

by Google Cloud

Google Cloud offers A100, H100, and TPU v5 instances for AI training and inference via Compute Engine and Vertex AI. Google Cloud's TPU pods provide unique competitive advantage for training large models efficiently, while its A3 instances with H100s target inference workloads. Deep integration with Vertex AI simplifies the MLOps lifecycle.

gpu-cloudgoogletpu
44C
Providergpu-compute

FluidStack

by FluidStack

FluidStack aggregates spare GPU capacity from data centers globally, providing an on-demand cloud GPU rental marketplace at competitive rates. It offers H100, A100, and RTX GPU clusters for training and inference with an API-driven provisioning model. FluidStack is used by AI startups for burst compute and cost-efficient long-running training jobs.

gpu-cloudmarketplaceh100
44C
Providerllm-providers

Fireworks AI

by Fireworks AI

Fireworks AI specializes in fast, cost-efficient inference for open-source models including Llama, Mistral, and Mixtral families. It offers serverless and on-demand deployment with a focus on production reliability. Fireworks provides an OpenAI-compatible API and supports compound AI systems through its FireFunction tool-calling models.

inferenceopen-sourcefast
44C
Providerllm-providers

DeepSeek

by DeepSeek

DeepSeek is a Chinese AI lab that has released competitive open-weight models rivaling frontier closed models at dramatically lower training costs. DeepSeek R1 and V3 demonstrated that mixture-of-experts and reinforcement learning at scale can close the gap with GPT-4-class models. Models are freely available via Hugging Face and a low-cost API.

llmdeepseekopen-weights
44C
Providergpu-compute

CoreWeave

by CoreWeave

CoreWeave is a specialized cloud infrastructure provider built exclusively for GPU-intensive AI and ML workloads. It offers on-demand and reserved access to NVIDIA H100, A100, and H200 clusters with high-bandwidth InfiniBand networking. CoreWeave is trusted by AI labs and enterprises for large-scale model training and inference at competitive pricing.

gpu-cloudh100a100
44C
Providerllm-providers

Cohere

by Cohere

Cohere is an enterprise-focused AI company specializing in language models optimized for business applications including search, retrieval-augmented generation, and text classification. Its Command and Embed model families are widely used in enterprise RAG pipelines. Cohere offers private cloud and on-premises deployment options alongside its API.

llmembeddingsrag
44C
Providergpu-compute

Cerebras Inference

by Cerebras Systems

Cerebras provides cloud inference powered by its Wafer-Scale Engine (WSE) chip, delivering some of the highest token throughput for large language models. Cerebras Inference serves Llama and other open-weight models with hardware-level advantages that push tokens-per-second beyond what GPU clusters can achieve for certain model sizes.

inferencewsehigh-throughput
44C
Providergpu-compute

Baseten

by Baseten

Baseten is a model inference platform for deploying ML models to production with high performance and reliability. It specializes in low-latency serving of open-source LLMs and diffusion models with features like cascade batching, LoRA serving, and speculative decoding. Baseten targets teams that need production-grade inference without managing Kubernetes.

inferencegpu-cloudproduction
44C
Providergpu-compute

Azure (GPU)

by Microsoft Azure

Microsoft Azure provides ND H100 v5 and NCv3 GPU instances for AI model training and inference, with tight integration into Azure AI Studio, Azure OpenAI Service, and GitHub Copilot infrastructure. Azure is the preferred cloud for enterprises with Microsoft licensing agreements and provides access to OpenAI models via Azure OpenAI Service.

gpu-cloudazuremicrosoft
44C
Providergpu-compute

AWS EC2 (GPU)

by Amazon Web Services

Amazon EC2 provides GPU instances (P4, P5, G5, Inf2 families) for AI/ML training and inference at any scale. As the largest cloud provider, AWS offers the broadest ecosystem of managed ML services including SageMaker, Bedrock, and Trainium-based Inf2 instances. Best for enterprises requiring deep AWS integration and compliance certifications.

gpu-cloudawsenterprise
44C
Providerllm-providers

Anthropic

by Anthropic

Anthropic is an AI safety company and the creator of the Claude model family. Its API provides access to Claude Opus, Sonnet, and Haiku variants, with strong support for long-context reasoning, tool use, and multi-agent workflows via the Claude Agent SDK. Anthropic publishes extensive safety research and pioneered Constitutional AI alignment techniques.

llmclaudesafety
44C
Providerllm-providers

Alibaba / Qwen

by Alibaba Cloud

Alibaba Cloud's Qwen team releases the Qwen model series, a family of open-weight and API-accessible language models covering dense and mixture-of-experts architectures. Qwen models are competitive on multilingual and coding benchmarks and are available through Alibaba Cloud's DashScope API as well as Hugging Face for local deployment.

llmqwenmultilingual
44C
Providerllm-providers

AI21 Labs

by AI21 Labs

AI21 Labs is an Israeli AI company known for the Jamba model family, which uses a hybrid SSM-Transformer architecture for long-context efficiency. Its Wordtune product targets writing assistance while the API focuses on enterprise NLP tasks. Jamba 1.6 offers a unique balance of long-context window handling and low inference latency.

llmjambassm
44C
ProviderAI Business & Strategy

01.AI (Yi)

by 01.AI

01.AI is a Chinese AI startup founded by Kai-Fu Lee, creator of the Yi series of bilingual large language models. Yi models are released as open weights under permissive licenses and have demonstrated strong performance on multilingual benchmarks, positioning 01.AI as a key contributor to the open-source AI ecosystem.

ai-labfoundation-modelschinese
43.4C
Providerai-robotics

Figure AI

by Figure AI

Figure AI is building general-purpose humanoid robots designed to perform physical labor in warehouses, factories, and logistics environments, powered by a neural network trained with visual data and language models. Its Figure 02 robot, developed in partnership with BMW and backed by OpenAI, Microsoft, and NVIDIA, is one of the most advanced humanoid platforms commercially deployed.

humanoid-robotsroboticsembodied-ai
39.8D
ProviderAI Infrastructure

Lepton AI

by Lepton AI

Lepton AI provides a serverless cloud platform for running open-source AI models and custom workloads with a Pythonic SDK, eliminating infrastructure management overhead for ML teams. Founded by ex-Meta researchers, the platform supports fine-tuning, deployment, and monitoring of models with pay-per-use pricing.

mlopsserverlessinference
39.6D
ProviderAI Business & Strategy

Baichuan

by Baichuan

Baichuan Intelligence is a Chinese AI startup founded by Zhiyuan Wang, a former Sogou CEO, specializing in large language models with applications in healthcare and enterprise workflows. Its Baichuan2 series models are notable for strong Chinese language performance and vertical-specific fine-tuning capabilities.

ai-labfoundation-modelschinese
38.7D
Provider

Cerebras

by

AI compute provider with wafer-scale chips delivering record-breaking inference speeds for LLMs.

AIhardwareinference
38D
ProviderAI Business & Strategy

Inflection AI

by Inflection AI

Inflection AI was co-founded by Mustafa Suleyman (ex-DeepMind) and Reid Hoffman, initially building the Pi personal AI assistant. After a major leadership transition to Microsoft in 2024, the remaining company pivoted to enterprise AI services, offering its Inflection 3 model and AI consulting for large organizations.

ai-labenterprisefoundation-models
37.3D
Providerai-research

Mozilla AI

by Mozilla

Mozilla AI is a startup launched by the Mozilla Foundation to build open, trustworthy AI tools and advocate for responsible AI development as a counterweight to closed proprietary systems. The organization releases tools like Lumigator (LLM evaluation) and contributes to open-source AI infrastructure aligned with the open web.

open-sourceresponsible-ainonprofit
36.9D
Provider

Cerebras

by

AI compute provider with wafer-scale chips delivering record-breaking inference speeds for LLMs.

AIhardwareinference
0F
Hardwareai-hardware

AMD Instinct MI350X

by AMD

The AMD Instinct MI350X is a data center GPU designed for high-performance computing and AI workloads. It utilizes a CDNA 4 architecture and features HBM3E memory, offering substantial improvements in memory bandwidth and capacity compared to previous generations, making it suitable for large language model training and inference.

gpudata-centerai-accelerator
73.8B+
HardwareAI Infrastructure

NVIDIA RTX 4090

by NVIDIA

NVIDIA's flagship consumer GPU based on Ada Lovelace. Has become popular for local LLM inference and fine-tuning due to its 24GB GDDR6X memory and high performance-per-dollar ratio, enabling on-premise AI workloads without data center costs.

gpuconsumerworkstation
72.6B+
Hardwareai-hardware

AMD Instinct MI400A

by Advanced Micro Devices (AMD)

The AMD Instinct MI400A is a data center accelerator designed for high-performance computing and AI workloads. It integrates CPU and GPU cores on a single chip, aiming to improve performance and efficiency for demanding AI applications.

data-centeracceleratorhpc
72.2B+
Hardwareai-hardware

Cerebras Wafer Scale Engine 4 (WSE-4)

by Cerebras Systems

The Cerebras WSE-4 is the fourth generation wafer-scale processor designed specifically for AI compute. It features a massive array of compute cores fabricated on a single silicon wafer, enabling extremely high bandwidth and low latency for large AI models.

wafer-scaleai-acceleratorhigh-performance-computing
71.8B+
Hardwareai-hardware

AMD Instinct MI400 Series

by Advanced Micro Devices (AMD)

The AMD Instinct MI400 series is a family of data center GPUs designed for high-performance computing and AI workloads. It leverages AMD's CDNA 4 architecture and offers significant improvements in performance and energy efficiency compared to previous generations, targeting large-scale AI training and inference.

gpuai-acceleratordata-center
71.5B+
HardwareAI Infrastructure

NVIDIA DGX H100

by NVIDIA

The NVIDIA DGX H100 is a purpose-built AI supercomputer, serving as the foundational building block for large-scale AI infrastructure. It integrates eight H100 Tensor Core GPUs with high-speed NVLink interconnects, providing a turnkey solution for the most demanding AI training, inference, and data analytics workloads.

ai-supercomputerlarge-scale-trainingenterprise-ai
67.9B
Hardwareai-hardware

Tesla Dojo D2 Chip

by Tesla

The Tesla Dojo D2 chip is a custom-designed AI accelerator developed by Tesla for training large-scale neural networks used in autonomous driving. It is a key component of Tesla's Dojo supercomputer, aimed at improving the efficiency and speed of AI model training.

ai-acceleratorautonomous-drivingsupercomputer
67B
HardwareAI Infrastructure

NVIDIA B100

by NVIDIA

The NVIDIA B100 is a data center GPU based on the Blackwell architecture, succeeding the H100. It offers substantial performance improvements for AI training and inference, featuring a second-generation Transformer Engine with FP4 precision, and a fifth-generation NVLink interconnect for massive multi-GPU scaling.

gpuai-acceleratordata-center
65.8B
HardwareAI Infrastructure

NVIDIA Jetson AGX Orin

by NVIDIA

The NVIDIA Jetson AGX Orin is a high-performance System-on-Module (SoM) designed for edge AI and autonomous machines. It delivers up to 275 TOPS of AI performance, integrating an NVIDIA Ampere architecture GPU with Arm CPUs and deep learning accelerators for server-class computing in a power-efficient package.

edge-aiembedded-systemsrobotics-platform
65.5B
Hardwareai-hardware

Graphcore Bow Pod2024

by Graphcore

The Graphcore Bow Pod2024 is a modular AI compute system built for large-scale machine learning. It utilizes Graphcore's Intelligence Processing Units (IPUs) and is specifically engineered to accelerate sparse models, such as graph neural networks and large language models, in data center environments.

ipugraph-neural-networkssparse-models
64.8B
Hardwareai-hardware

Tenstorrent Wormhole GF12

by Tenstorrent

The Tenstorrent Wormhole GF12 is a high-performance AI accelerator built on GlobalFoundries' 12nm process. It features a grid of programmable Tensix cores, RISC-V CPUs, and a high-speed Ethernet fabric for direct chip-to-chip communication, enabling scalable systems for both AI training and inference workloads.

ai-acceleratorrisc-vdata-center
64.5B
Hardwareai-hardware

d-Matrix Corsair

by d-Matrix

The d-Matrix Corsair is an in-memory compute platform designed to accelerate AI inference workloads. It leverages analog compute to achieve high energy efficiency and low latency, targeting applications like recommendation engines and generative AI.

in-memory computeanalog computeinference
64.5B
HardwareAI Infrastructure

NVIDIA A10G

by NVIDIA

NVIDIA Ampere GPU optimized for graphics and inference workloads. Commonly deployed in AWS G5 instances, offering a cost-effective option for inference, graphics rendering, and video processing at cloud scale.

gpudata-centerinference
63.9B
HardwareAI Infrastructure

NVIDIA V100

by NVIDIA

NVIDIA Volta architecture GPU that introduced Tensor Cores to the data center, providing the first dedicated matrix multiply hardware for AI. Powered the first wave of transformer model training including BERT and GPT-2, and became the dominant AI training platform from 2017–2020.

gpudata-centertraining
63.6B
HardwareAI Infrastructure

NVIDIA L40S

by NVIDIA

The NVIDIA L40S is a universal data center GPU based on the Ada Lovelace architecture. It features 48GB of GDDR6 memory and combines powerful AI compute, graphics, and media acceleration capabilities, making it a versatile solution for a wide range of workloads from generative AI to professional visualization.

gpudata-centerinference
63.4B
HardwareAI Infrastructure

Apple M4 Ultra Neural Engine

by Apple

Apple M4 Ultra's 32-core Neural Engine capable of 38 TOPS, embedded in Apple's highest-end desktop and workstation chips. Combined with up to 192GB unified memory shared between CPU, GPU, and Neural Engine, it enables running large models locally on macOS with exceptional energy efficiency.

neural-engineedgeapple-silicon
62.1B
Hardwareai-hardware

Graphcore Bow Pod1024

by Graphcore

The Graphcore Bow Pod1024 is a supercomputing-scale AI system, delivering over 250 PetaFLOPS of AI compute. It leverages 1,024 Bow IPU processors linked by a high-bandwidth fabric, specifically engineered for training massive, next-generation AI models and complex graph analytics workloads at an unprecedented scale.

ipuai-hardwaresupercomputer
59.5C+
HardwareAI Infrastructure

NVIDIA GB200 NVL72

by NVIDIA

The NVIDIA GB200 NVL72 is a liquid-cooled, rack-scale system designed for exascale AI. It connects 36 Grace Blackwell Superchips, comprising 72 B200 GPUs and 36 Grace CPUs, via fifth-generation NVLink to function as a single massive GPU for training and inferencing on trillion-parameter models with unprecedented performance and energy efficiency.

gpudata-centertraining
58.8C+
HardwareAI Infrastructure

Google TPU v5p

by Google

Google's fifth-generation Tensor Processing Unit, the TPU v5p, is an AI accelerator designed for training and serving the largest AI models. It offers significant performance gains over its predecessor, featuring liquid cooling, 95 GB of HBM, and support for new data formats like MX4 for enhanced efficiency and scalability in massive pod configurations.

tpuai-acceleratorgoogle-cloud
58.7C+
HardwareAI Infrastructure

Google TPU v4

by Google

Google's fourth-generation TPU, used internally to train PaLM, LaMDA, and early Gemini models. Features 32GB HBM2 per chip and an optical circuit-switched ICI for flexible pod topology, enabling massive-scale distributed training.

tpudata-centertraining
58.5C+
HardwareAI Infrastructure

NVIDIA Jetson Orin NX

by NVIDIA

Compact Orin-based Jetson module delivering up to 100 TOPS in a small form factor. Targets robotics, drones, medical devices, and industrial edge AI applications requiring significant AI performance in constrained size, weight, and power envelopes.

gpuedgeembedded
58C+
HardwareAI Infrastructure

Google TPU v5e

by Google

Google's cost-efficient TPU variant optimized for inference and medium-scale training. Offers a better price-performance ratio than TPU v5p for serving workloads, with 16GB HBM2 per chip and excellent throughput for transformer inference.

tpudata-centerinference
57.1C+
HardwareAI Infrastructure

Google TPU v6 (Trillium)

by Google

Google's sixth-generation TPU, codenamed Trillium, delivering 4.7x compute improvement over TPU v5e. Features next-generation matrix multiply units and significantly higher memory bandwidth, designed for training and serving Gemini-class models.

tpudata-centertraining
53.7C+
HardwareAI Infrastructure

AWS Inferentia2

by AWS

AWS second-generation custom inference chip with 4x higher compute and 10x higher memory bandwidth than Inferentia1. Optimized for cost-efficient large-scale inference of transformer models with very high throughput and low latency.

ai-acceleratorinferenceaws
51.1C+
HardwareAI Infrastructure

NVIDIA P100

by NVIDIA

NVIDIA Pascal architecture GPU and the first to use HBM2 memory in a data center product. Delivered 10x deep learning performance over its predecessor and was the primary platform for training early deep learning models before the Volta generation.

gpudata-centertraining
50.4C+
HardwareAI Infrastructure

Google Tensor G4

by Google

Google's fourth-generation Tensor chip powering Pixel 9 smartphones. Features a dedicated TPU-derived neural core enabling on-device Gemini Nano inference for features like live captions, call screening, and generative AI photography without cloud latency.

neural-coremobileedge
49.8C
HardwareAI Infrastructure

Intel Meteor Lake NPU

by Intel

Intel's first dedicated Neural Processing Unit embedded in Core Ultra (Meteor Lake) laptop processors. Delivers 10+ TOPS for AI inferencing on Windows AI PCs, enabling background AI workloads like live captioning, noise suppression, and on-device LLM assistance without using GPU/CPU resources.

npuedgepc
48.9C
HardwareAI Infrastructure

AWS Trainium2

by AWS

AWS second-generation custom AI training chip delivering up to 4x performance improvement over Trainium. Designed specifically for training large language models on AWS, with tight integration with UltraCluster networking for scale-out training jobs.

ai-acceleratortrainingaws
48C
HardwareAI Infrastructure

Cerebras CS-3

by Cerebras

Cerebras Wafer Scale Engine 3 — the world's largest chip, spanning an entire silicon wafer. Contains 4 trillion transistors and 44GB of on-chip SRAM, eliminating off-chip memory bandwidth as a bottleneck for training large neural networks.

wafer-scaletraininginference
47.4C
HardwareAI Infrastructure

Google TPU v3

by Google

Google's third-generation TPU featuring liquid cooling to sustain higher clock speeds and 32GB HBM per chip. Doubled compute and memory versus TPU v2, enabling training of BERT, T5, and early large language models. Powered many foundational AI research papers at Google Brain and DeepMind.

tputraininginference
47.2C
HardwareAI Infrastructure

MediaTek Dimensity 9400 APU

by MediaTek

MediaTek Dimensity 9400's AI Processing Unit — the most powerful mobile NPU in Android smartphones. Delivers 50 TOPS for on-device AI with support for 13B parameter models on-device, enabling private, low-latency AI features for Android flagship devices.

apumobileedge
46.4C
Hardwareai-hardware

Google TPU v7 Ironwood

by Google

Google's TPU v7 Ironwood is the seventh generation of Google's custom Tensor Processing Units, designed for large-scale AI inference at hyperscaler capacity. Ironwood pods target serving frontier models like Gemini at Google's internal scale and are available to cloud customers via Google Cloud's TPU v7 instances.

googletpuinference
44C
Hardwareai-hardware

Google TPU v6e Trillium

by Google

Google TPU v6e Trillium is Google's sixth-generation TPU with 4x the compute and 3x the memory bandwidth per chip compared to v5e. Trillium is generally available on Google Cloud for both training and inference workloads, offering the most cost-efficient TPU option for teams training Gemma and other open models on Google Cloud.

googletputraining
44C
Hardwareai-hardware

SambaNova SN40L RDU

by SambaNova Systems

SambaNova's SN40L is a Reconfigurable Dataflow Unit designed for high-throughput LLM inference and training. Its tiered memory architecture — combining on-chip SRAM with off-chip DRAM — allows serving multiple large models simultaneously with industry-leading batch throughput. The SN40L is the hardware underlying SambaNova Cloud's inference API.

sambanovarduinference
44C
Hardwareai-hardware

NVIDIA RTX 5090

by NVIDIA

The NVIDIA RTX 5090 is NVIDIA's flagship consumer/prosumer GPU in the Blackwell generation, featuring 32GB GDDR7 memory and massive compute for local AI inference and fine-tuning. It allows running 70B quantized models on a single consumer GPU and is the premier choice for developers who need frontier local model capability in a workstation.

nvidiablackwellconsumer-gpu
44C
Hardwareai-hardware

NVIDIA H200

by NVIDIA

The NVIDIA H200 is a Hopper-generation GPU with 141GB of HBM3e memory — nearly double the H100's bandwidth — targeting inference workloads for very large models. The additional memory enables running 70B+ parameter models on fewer GPUs, significantly reducing the cost per inference token for large-scale deployments.

nvidiahoppergpu
44C
Hardwareai-hardware

NVIDIA H100

by NVIDIA

The NVIDIA H100 Hopper GPU is the dominant AI training and inference accelerator in production deployments as of 2024–2025. With 80GB HBM3 memory and NVLink 4 support, it delivers 4x the compute of the A100. The H100 SXM5 variant connects to 8-GPU NVL8 nodes via NVSwitch for large model training runs.

nvidiahoppergpu
44C
Hardwareai-hardware

NVIDIA GB200 NVL72

by NVIDIA

The GB200 NVL72 is NVIDIA's rack-scale AI system combining 36 Grace CPUs and 72 Blackwell B200 GPUs via NVLink interconnect. It delivers up to 1.44 ExaFLOPS of AI compute in a single rack, targeting hyperscaler-class training of frontier models. The NVL72 represents a fundamental shift from server-level to rack-level GPU system design.

nvidiablackwellrack-scale
44C
Hardwareai-hardware

NVIDIA B200

by NVIDIA

The NVIDIA B200 is the first Blackwell-architecture data center GPU, delivering 2.5x the training throughput and 5x the inference performance of the H100. With 192GB of HBM3e memory and NVLink 5 interconnects, it is designed for training and serving trillion-parameter models. The B200 anchors NVIDIA's Blackwell product generation.

nvidiablackwellgpu
44C
Hardwareai-hardware

NVIDIA A100

by NVIDIA

The NVIDIA A100 Ampere GPU remains widely deployed in cloud and on-premises AI infrastructure for training and inference. With 40GB or 80GB HBM2e memory variants and MIG (Multi-Instance GPU) support for partitioning into up to 7 isolated GPU instances, the A100 is the proven workhorse of many production AI deployments.

nvidiaamperegpu
44C
Hardwareai-hardware

Intel Gaudi 3

by Intel

Intel Gaudi 3 is Intel's AI training and inference accelerator designed as a cost-competitive alternative to NVIDIA H100. It features 128GB of HBM2e memory and 24 100GbE RoCE ports for scale-out connectivity. Gaudi 3 is supported by Intel's Optimum Habana software stack and available via major cloud providers and on-premises.

intelgauditraining
44C
Hardwareai-hardware

Groq LPU

by Groq

Groq's Language Processing Unit (LPU) is a deterministic ASIC architecture optimized for sequential transformer inference, eliminating the memory-bandwidth bottlenecks of GPU-based serving. Groq LPU clusters deliver measured token generation speeds of 500+ tokens/second for Llama-class models, significantly outpacing GPU inference for latency-critical applications.

groqlpuasic
44C
Hardwareai-hardware

Cerebras WSE-3

by Cerebras Systems

The Cerebras Wafer-Scale Engine 3 (WSE-3) is the world's largest chip, containing 4 trillion transistors on a single 46,225 mm² silicon wafer. Its architecture eliminates the memory bandwidth bottlenecks of conventional GPU clusters for large model inference, achieving industry-leading tokens-per-second throughput for models up to 70B parameters.

cerebraswafer-scaleasic
44C
Hardwareai-hardware

AWS Trainium3

by Amazon Web Services

AWS Trainium3 is Amazon's third-generation custom ML training chip, offering significant improvements in training throughput and energy efficiency over Trainium2. Trainium3 instances are available through Amazon SageMaker and EC2, targeting cost-efficient training of large language models for AWS-native AI development teams.

awstrainiumtraining
44C
Hardwareai-hardware

AMD MI325X

by AMD

The AMD Instinct MI325X is an updated Instinct GPU with 288GB of HBM3e memory and improved memory bandwidth over the MI300X. It targets inference workloads for the largest frontier models and positions AMD competitively against the NVIDIA H200 in memory-bound inference scenarios.

amdinstinctgpu
44C
Hardwareai-hardware

AMD MI300X

by AMD

The AMD Instinct MI300X is AMD's flagship AI accelerator featuring 192GB of HBM3 memory, the highest of any GPU when released. This massive memory capacity makes it compelling for inference of 70B+ parameter models and has led to adoption by Microsoft Azure, Oracle, and major AI labs as an H100 alternative.

amdinstinctgpu
44C
HardwareAI Infrastructure

SambaNova SN40L

by SambaNova

SambaNova's Reconfigurable Dataflow Unit with a three-tier memory hierarchy: on-chip scratchpad, on-package HBM, and off-package DRAM. The unique architecture enables running multiple models simultaneously and excels at efficient mixture-of-experts inference.

rduinferencetraining
43.4C
HardwareAI Infrastructure

Google TPU v2

by Google

Google's second-generation TPU and the first available on Google Cloud. Added training capability (v1 was inference-only), HBM memory for gradient storage, and introduced the concept of TPU Pods — interconnected multi-chip systems enabling distributed training at scale.

tputraininginference
42.9C
HardwareAI Infrastructure

Google TPU v1

by Google

Google's first Tensor Processing Unit — the seminal custom AI ASIC that launched the modern era of purpose-built ML hardware. Deployed in 2015 and described publicly in a landmark 2017 ISCA paper, it ran inference for Google Search, Maps, and Translate, delivering 30x performance-per-watt vs contemporary GPUs.

tpuinferencegoogle
42.5C
HardwareAI Infrastructure

Qualcomm Cloud AI 100

by Qualcomm

Qualcomm's data center AI inference accelerator designed for power-efficient deployment. Based on the same AI architecture as Snapdragon, it delivers competitive inference performance with a focus on power efficiency metrics (TOPS/W) for hyperscale deployments.

ai-acceleratorinferencequalcomm
38.9D
HardwareAI Infrastructure

NVIDIA K80

by NVIDIA

NVIDIA Kepler-based dual-GPU data center card that became the first widely available cloud GPU for deep learning. Google Colab's original free tier ran on K80s, making it instrumental in democratizing access to GPU-accelerated deep learning for researchers and students worldwide.

gpudata-centertraining
38D
HardwareAI Infrastructure

Graphcore Bow IPU

by Graphcore

Graphcore's Bow Intelligence Processing Unit using 3D wafer-on-wafer technology. Features a massively parallel MIMD architecture with 1472 processor cores and 900MB on-chip SRAM, designed for graph-structured AI workloads and sparse computation.

iputraininginference
37.8D
HardwareAI Infrastructure

Graphcore MK2 IPU (Colossus GC200)

by Graphcore

Graphcore's second-generation Colossus GC200 Intelligence Processing Unit. Featured 1472 IPU-Cores with 900MB on-chip SRAM and introduced the Bulk Synchronous Parallel with Staleness (BSS) execution model. Preceded the Bow IPU and established Graphcore's approach to graph-native, SRAM-centric AI compute.

iputraininginference
33.3D
HardwareAI Infrastructure

Tenstorrent Grayskull

by Tenstorrent

Tenstorrent's first commercial AI accelerator co-designed by Jim Keller. Built on a RISC-V Tensix processor architecture with a mesh NoC, enabling programmable AI compute. Notable for its open software stack and developer-friendly approach to hardware AI.

ai-acceleratorinferencetraining
32.2D
HardwareAI Infrastructure

Intel Nervana NNP-T1000

by Intel

Intel Nervana Neural Network Processor for Training — Intel's attempt at a purpose-built AI training chip following the 2016 acquisition of Nervana Systems. Featured 32GB HBM2 and a novel MCDRAM+HBM architecture. Discontinued in 2020 as Intel pivoted focus to the Habana Gaudi line.

ai-acceleratortrainingintel
25.4D
Integrationai-integrations

Databricks Feature Store - MLflow Integration

by Databricks

The Databricks Feature Store provides a centralized repository for managing and sharing machine learning features. Its integration with MLflow enables seamless tracking of feature usage in ML models, ensuring reproducibility and simplifying model deployment workflows by automatically packaging feature dependencies.

feature-storemlopsmodel-tracking
82.8A
Integrationai-integrations

PyTorch Geometric

by PyTorch

PyTorch Geometric (PyG) is a library built upon PyTorch to facilitate the development of graph neural networks (GNNs). It provides data handling utilities, learning methods on graphs and other irregular structures, and benchmark datasets for various graph-related tasks.

graph neural networkspytorchgeometric deep learning
81.8A
Integrationai-integrations

TensorFlow Quantum

by Google

TensorFlow Quantum (TFQ) is a library for building quantum machine learning models. It allows researchers to construct and train hybrid quantum-classical models by leveraging TensorFlow's infrastructure for classical computation and quantum simulators or quantum hardware for quantum computation.

quantum computingmachine learningtensorflow
79.2B+
IntegrationAI Tools & APIs

LangChain + OpenAI

by LangChain

Native integration between LangChain and OpenAI's GPT models. Provides seamless access to chat completions, embeddings, and function calling through LangChain's unified interface. Supports streaming, tool use, and structured output via the langchain-openai package.

langchainopenaillm-integration
78.4B+
Integrationai-integrations

MLflow Databricks Integration

by Databricks

The MLflow integration with Databricks provides a managed MLflow service within the Databricks platform. It simplifies the process of tracking experiments, managing models, and deploying them to production by leveraging Databricks' scalable infrastructure and collaborative environment.

mlopsmodel trackingexperiment management
77.2B+
IntegrationAI for Code

GitHub Copilot + VS Code

by GitHub

GitHub Copilot integrates into VS Code as a first-party extension, delivering inline ghost-text completions, multi-line suggestions, and a dedicated Copilot Chat panel for conversational refactoring, test generation, and documentation. It leverages Codex and GPT-4 models under the hood, with workspace-aware context from open tabs and the current file.

idevscodecode-completion
76.4B+
IntegrationAI Infrastructure

Meta + HuggingFace (Llama)

by Meta AI

Official Meta Llama model weights distributed through the HuggingFace Hub under Meta's community license. Covers Llama 3.1, 3.2, and 3.3 variants from 1B to 405B parameters with full transformers, TGI, and vLLM compatibility. HuggingFace serves as the primary public distribution channel for Meta's open-weight releases.

metahuggingfacellama
75.8B+
IntegrationAI Tools & APIs

LangChain + Anthropic

by LangChain

Official LangChain integration for Anthropic's Claude model family. Exposes Claude's extended context window, vision capabilities, and tool use through LangChain's standard chat model interface. Supports streaming and the full Messages API via the langchain-anthropic package.

langchainanthropicclaude
73.4B+
IntegrationAI Infrastructure

Pinecone + OpenAI Embeddings

by Pinecone

Direct integration pairing Pinecone's managed vector database with OpenAI's text-embedding-3 models. Commonly used pattern for production RAG systems where OpenAI generates dense vectors and Pinecone handles ANN retrieval at scale. Supports serverless and pod-based indexes with metadata filtering.

pineconeopenaiembeddings
73.2B+
IntegrationAI Tools & APIs

W&B + Hugging Face

by Weights & Biases

Weights & Biases integrates directly into Hugging Face Trainer and PEFT via a built-in report_to callback, logging training loss curves, GPU utilization, gradient norms, and hyperparameters to shareable W&B runs. The integration supports sweep-based hyperparameter optimization and artifact versioning for model checkpoints.

experiment-trackingfine-tuninghuggingface
72.5B+
Integrationai-integrations

TensorFlow Privacy

by Google

TensorFlow Privacy is a library that makes it easier to train machine learning models with differential privacy. It provides TensorFlow optimizers that implement differentially private stochastic gradient descent (DP-SGD), allowing developers to protect the privacy of training data while still achieving good model performance.

differential privacyprivacy-preserving MLtensorflow
72.2B+
IntegrationAI Infrastructure

vLLM + NVIDIA

by vLLM Project

vLLM's NVIDIA backend leverages CUDA kernels, FlashAttention-2, and PagedAttention to deliver state-of-the-art throughput for LLM inference on NVIDIA A100, H100, and H200 GPUs. The integration supports tensor and pipeline parallelism across multiple GPUs, FP8/FP16/BF16 quantization, and CUDA graph capture for minimal per-token latency.

inferencenvidiagpu
72.1B+
IntegrationAI Tools & APIs

LangSmith + LangChain

by LangChain Inc.

LangSmith provides first-class tracing and evaluation for LangChain pipelines, capturing every LLM call, chain step, and tool invocation with full prompt/response payloads. Teams use the integration to debug production failures, build evaluation datasets, and run automated regression tests against golden traces.

observabilitytracingllm-ops
71.7B+
IntegrationAI Infrastructure

OpenAI + Azure OpenAI Service

by Microsoft Azure

Microsoft Azure's managed deployment of OpenAI models including GPT-4o, o1, and DALL-E 3 with enterprise SLAs, private networking, and regional data residency. Provides the same OpenAI API surface with additional Azure IAM, VNet integration, content filtering, and Azure Monitor observability.

openaiazureenterprise-ai
71.5B+
Integrationai-integrations

Databricks Feature Store - Feast Integration

by Databricks

The Databricks Feature Store integrates with Feast, an open-source feature store, to streamline feature engineering and management for machine learning workflows. This integration allows users to define, store, and serve features consistently across training and inference, reducing data skew and improving model performance within the Databricks environment.

feature-storefeastmlops
70.8B+
IntegrationAI Tools & APIs

LangChain + Pinecone

by LangChain

LangChain VectorStore integration for Pinecone's managed vector database. Enables similarity search, MMR retrieval, and metadata filtering within LangChain RAG pipelines. Supports both serverless and pod-based Pinecone indexes via the langchain-pinecone package.

langchainpineconevector-store
70.2B+
Integrationai-integrations

Hugging Face Optimum Intel Extension

by Hugging Face / Intel

Hugging Face Optimum Intel Extension is a toolkit designed to accelerate inference and training of transformer models on Intel CPUs and GPUs. It leverages Intel's Deep Learning Boost (DL Boost) and other hardware features to optimize model performance within the Hugging Face ecosystem.

hugging faceinteloptimization
69.8B
IntegrationAI for Code

Cursor + OpenAI

by Anysphere

Cursor is a VS Code fork that uses OpenAI's GPT-4 and o-series models as its reasoning engine for multi-file edits, semantic codebase search, and an agent mode that can autonomously implement features across the entire repository. It offers a Composer panel for multi-file diffs and a codebase-aware chat that indexes the project with embeddings for precise retrieval.

ideai-editoropenai
69.6B
IntegrationAI Infrastructure

Anthropic + AWS Bedrock

by Amazon Web Services

Anthropic's Claude model family available through Amazon Bedrock's fully managed foundation model service. Provides serverless inference with pay-per-token pricing, AWS IAM authentication, VPC endpoint support, and model evaluation tools. Claude 3.5 Sonnet, Haiku, and Opus are all available through the Bedrock API.

anthropicawsbedrock
68.2B
IntegrationAI Infrastructure

TGI + Hugging Face Hub

by Hugging Face

Text Generation Inference (TGI) by Hugging Face is a production-grade inference server that directly loads models from the Hugging Face Hub via model IDs, handling shard downloading, quantization, and OpenAI-compatible endpoint serving in a single Docker command. It implements continuous batching, speculative decoding, and FlashAttention for optimal throughput on Ampere and Hopper GPUs.

inferencehuggingfacetext-generation
68B
IntegrationAI Infrastructure

Ollama + Docker

by Ollama

Ollama's official Docker image provides a self-contained environment for running large language models locally. It enables developers to easily deploy and manage quantized GGUF models using familiar container orchestration tools like Docker Compose and Kubernetes, supporting GPU acceleration and an OpenAI-compatible API.

local-inferencedockerself-hosted
67.5B
Integrationmcp-servers

MCP + GitHub

by Anthropic / GitHub

Integrates the MCP environment with GitHub's REST and GraphQL APIs, enabling programmatic control over software development workflows. Users can manage repositories, track issues, review pull requests, and search code directly from an agent context, streamlining development tasks without switching tools.

mcpgithubgit
67.5B
IntegrationAI for Code

GitHub Copilot + JetBrains

by GitHub

The GitHub Copilot plugin for JetBrains IDEs integrates AI-powered code completion and a conversational chat panel directly into the editor. It provides inline, ghost-text suggestions and mirrors the functionality of the VS Code extension, adapting to JetBrains' native keymaps and user interface for a seamless experience across IDEs like IntelliJ IDEA and PyCharm.

ai-code-assistantcode-completioncopilot
67B
Integrationmcp-servers

MCP + Filesystem

by Anthropic

The Anthropic MCP Filesystem server allows AI agents, like Claude, to interact directly with a user's local files. It exposes a secure API for reading, writing, listing, and searching files and directories, enabling agents to perform tasks such as code analysis, data processing, and file organization on the host machine.

mcpfilesystemfile-access
66B
IntegrationAI Tools & APIs

LangChain + Chroma

by LangChain

LangChain VectorStore integration for Chroma, the open-source AI-native embedding database. Ideal for local development and prototyping with zero infrastructure setup. Supports persistent and in-memory collections, metadata filtering, and relevance-scored retrieval via langchain-chroma.

langchainchromavector-store
65.6B
IntegrationAI Tools & APIs

LangChain + Google AI

by LangChain

This integration connects the LangChain framework with Google's advanced AI services, including the Gemini API via Google AI Studio and models on Vertex AI. It enables developers to build sophisticated applications leveraging multimodal capabilities for processing text and images, advanced function calling for tool use, and grounding responses with Google Search for accuracy.

langchaingooglegemini
65.1B
IntegrationAI Infrastructure

Google AI + Vertex AI

by Google Cloud

Vertex AI is Google Cloud's managed machine learning platform for deploying and scaling AI applications. It provides an enterprise-grade environment for using Google's foundation models like Gemini and PaLM, adding MLOps tooling, security controls, and deep integration with the Google Cloud ecosystem. This includes features like model tuning, evaluation, and grounding with Google Search.

google-cloudvertex-aigenerative-ai
64.6B
IntegrationAI Tools & APIs

LangChain + HuggingFace

by LangChain

This integration connects LangChain with the HuggingFace ecosystem, enabling the use of thousands of open-source models. It allows developers to call models via the HuggingFace Inference API, run local inference using the `transformers` library, and generate embeddings, all within LangChain's structured framework for building complex LLM applications.

langchain-integrationhuggingfaceopen-source-llm
64.3B
IntegrationAI Infrastructure

TensorRT-LLM + NVIDIA Triton

by NVIDIA

TensorRT-LLM optimizes large language models into fused CUDA kernels, while the Triton Inference Server orchestrates serving. Together, they form NVIDIA's production stack for maximizing token throughput and minimizing latency on datacenter GPUs, enabling high-performance, scalable LLM inference.

inference-optimizationllm-servingnvidia
63.8B
Integrationagent-frameworks

LangGraph + LangSmith

by LangChain Inc.

The LangGraph and LangSmith integration provides built-in observability for stateful agent graphs. It automatically captures every node execution, state change, and tool call as a structured trace in LangSmith, enabling deep, step-by-step debugging, performance analysis, and regression testing of complex agent workflows.

agentslanggraphlangsmith
63.8B
Integrationagent-frameworks

CrewAI + LangChain

by CrewAI / LangChain

This integration enables CrewAI agents to leverage the entire LangChain tool ecosystem. CrewAI orchestrates multi-agent workflows by assigning roles and delegating tasks, while LangChain provides the foundational tools for capabilities like web search, code execution, vector store retrieval, and API connectivity.

agentscrewailangchain
63.7B
IntegrationAI Infrastructure

Ray Serve + GCP

by Anyscale

Ray Serve deploys scalable model serving applications on Google Cloud Platform using GKE and Vertex AI infrastructure, with Ray's distributed runtime managing replica placement, traffic splitting, and resource scheduling across GPU node pools. The integration supports multi-model serving graphs, A/B rollouts, and seamless scale-to-zero on GCP Spot instances for cost optimization.

deploymentgcpkubernetes
62.5B
Integrationrag-pipelines

LlamaParse + LlamaIndex

by LlamaIndex

LlamaParse is a proprietary parsing service for complex documents like PDFs with embedded tables and charts. Its first-party integration with the open-source LlamaIndex framework allows developers to directly ingest parsed, structured objects (Nodes) into advanced Retrieval-Augmented Generation (RAG) pipelines, preserving the original document's rich context.

ragllamaparsellamaindex
62.1B
IntegrationAI Tools & APIs

Helicone + OpenAI

by Helicone

Helicone is an observability platform for LLMs that acts as a proxy for the OpenAI API. It enables developers to monitor usage, track costs, and optimize performance with minimal code changes. Key features include real-time dashboards, request-level caching, rate-limiting, and detailed analytics.

llm-observabilityapi-proxyopenai
61.9B
Integrationmcp-servers

MCP + Slack

by Anthropic / Slack

This integration connects MCP-compatible AI agents, such as Claude, directly to a Slack workspace. It enables programmatic control over Slack functionalities, allowing agents to read channel histories, post messages, manage channels, and look up user information. The connection is authenticated using a Slack Bot token for secure, automated communication.

mcpslackmessaging
61.5B
Integrationmcp-servers

MCP + Brave Search

by Anthropic / Brave

An integration that connects the Multi-agent Control Plane (MCP) with Brave's independent search index. It equips AI agents, like Claude, with tools for real-time web, local, and news searches, offering a privacy-focused alternative to Google and Bing for data retrieval and grounding.

mcpbrave-searchweb-search
61.5B
IntegrationAI Tools & APIs

LangChain + Weaviate

by LangChain

LangChain integration for Weaviate's open-source vector database. Supports hybrid search (BM25 + vector), multi-tenancy, and generative search modules within LangChain chains and agents. Connects via the Weaviate Python client inside the langchain-weaviate package.

langchainweaviatevector-store
61.3B
IntegrationAI Tools & APIs

Langfuse + LlamaIndex

by Langfuse

Langfuse integrates with LlamaIndex to provide open-source observability for LLM applications. A simple callback handler captures detailed traces of query engines, retrievers, and LLM calls. This data, including token usage, latency, and custom scores, is visualized in a self-hostable dashboard for comprehensive monitoring.

observabilitytracingopen-source
61B
Integrationmcp-servers

MCP + Puppeteer

by Anthropic

Official MCP Puppeteer server providing headless Chrome browser control to MCP clients. Exposes tools for page navigation, element interaction, form filling, screenshot capture, and JavaScript execution, enabling Claude to automate complex web workflows that require a real browser environment.

mcppuppeteerbrowser-automation
60.4B
Integrationagent-frameworks

AutoGen + Azure OpenAI

by Microsoft

Integrate the AutoGen multi-agent framework with Azure OpenAI Service to build sophisticated, enterprise-grade AI applications. This connector enables developers to leverage Azure's security features, including RBAC and private endpoints, while using all standard AutoGen agents like AssistantAgent and UserProxyAgent for complex, collaborative tasks.

autogenazure-openaimulti-agent-systems
60.4B
IntegrationAI for Code

Tabnine + VS Code

by Tabnine

Tabnine's VS Code extension provides AI-powered code completions, including whole-line and full-function suggestions. It is designed for enterprises with strict privacy and data-residency needs, offering on-premise or private cloud deployment options. The AI can be trained on a team's specific codebase for highly relevant completions.

idevscodecode-completion
59.8C+
IntegrationAI for Code

Cline + VS Code

by Community

Cline is an open-source VS Code extension that provides an AI agent with direct access to the IDE's environment. It enables multi-step agentic workflows by allowing the AI to use the file system, terminal, and an integrated browser. The extension supports various models and includes a human-in-the-loop approval process for safety.

ide-extensionvscodeagentic-coding
59.7C+
Integrationrag-pipelines

LlamaIndex + Qdrant

by LlamaIndex / Qdrant

Native LlamaIndex vector store adapter for Qdrant, enabling index construction, similarity search, and filtered retrieval over Qdrant collections. Supports both in-memory and hosted Qdrant deployments with payload-based metadata filtering.

ragllamaindexqdrant
59.4C+
Integrationrag-pipelines

Unstructured + Pinecone

by Unstructured / Pinecone

This integration provides a direct pipeline from Unstructured's data transformation service to the Pinecone vector database. It automates extracting, cleaning, and chunking data from documents like PDFs and DOCX, then embeds and indexes the content into a Pinecone namespace for use in RAG applications.

ragdocument-parsingvector-store
59.3C+
Integrationmcp-servers

MCP + PostgreSQL

by Anthropic

This integration provides a secure, read-only connection to a PostgreSQL database within the MCP environment. It allows agents to perform database introspection, such as listing schemas and describing tables. A key feature is its ability to facilitate natural-language-to-SQL workflows, enabling users to ask questions in plain English and have them translated into safe, read-only SELECT queries for execution.

mcppostgresqldatabase
59.3C+
IntegrationAI Tools & APIs

LangChain + Ollama

by LangChain

Integrate LangChain with Ollama for fully local LLM inference. This allows developers to use models like Llama 3 and Mistral on their own hardware, ensuring data privacy by eliminating external API calls. It's ideal for building offline-capable, privacy-sensitive applications.

langchainollamalocal-llm
59.3C+
IntegrationAI Tools & APIs

Arize Phoenix + LangChain

by Arize AI

Arize Phoenix integrates with LangChain to provide deep observability for LLM applications. By leveraging OpenTelemetry, it captures and streams traces for chains, agents, and retrievers to a local UI or the Arize cloud. This enables developers to debug applications, detect embedding drift, score retrieval quality, and analyze hallucinations at the span level.

llmopsobservabilityml-monitoring
59.3C+
IntegrationAI Tools & APIs

Portkey + Multi-Provider

by Portkey

Portkey's AI gateway unifies over 200 LLM providers through a single OpenAI-compatible API. It enables automatic fallbacks, load balancing, and semantic caching to improve reliability and performance. The platform provides full observability, capturing detailed cost, latency, and metadata for every request.

ai-gatewayllm-opsmulti-provider
59.2C+
IntegrationAI Tools & APIs

LangChain + Mistral AI

by LangChain

This integration connects the LangChain framework with Mistral AI's suite of models, including Mistral Large and Codestral. It enables developers to build sophisticated applications by leveraging Mistral's capabilities like function calling, JSON mode, and streaming within LangChain's structured environment for creating agents and chains.

langchainmistralfunction-calling
59.2C+
IntegrationAI Infrastructure

BentoML + AWS

by BentoML

BentoML streamlines deploying machine learning models to the AWS cloud. It packages models and their inference logic into standardized containers, enabling one-command deployment to services like SageMaker, EC2, and ECS. The platform automates production concerns such as auto-scaling, batching, and monitoring.

mlopsmodel-deploymentmodel-serving
58.7C+
IntegrationAI for Code

Windsurf + Anthropic

by Codeium

Windsurf (by Codeium) is an AI-native IDE that integrates Anthropic's Claude models as the backbone of its Cascade agent, which autonomously plans and executes multi-step coding tasks with real-time file and terminal access. The Anthropic integration powers deep context awareness across large codebases and supports long-horizon agent tasks with coherent state tracking.

ideai-editoranthropic
58.6C+
Integrationagent-frameworks

Claude Agent SDK + MCP

by Anthropic

Anthropic's Claude Agent SDK ships with native Model Context Protocol (MCP) client support, allowing Claude-powered agents to connect to any MCP server and use its exposed tools, resources, and prompts. The integration bridges Claude's tool-use capabilities with the open MCP ecosystem for plug-and-play external integrations.

agentsanthropicclaude
58.2C+
IntegrationAI Tools & APIs

LangChain + Cohere

by LangChain

LangChain integration for Cohere's enterprise AI platform. Provides access to Command models for generation, Embed v3 for multilingual embeddings, and the Rerank API for RAG pipeline precision improvement. Available via the langchain-cohere package with first-class reranker support.

langchaincoherereranking
57.7C+
IntegrationAI for Code

Sourcegraph + Cody

by Sourcegraph

Sourcegraph Cody combines enterprise-grade code search with an AI coding assistant, letting developers ask questions grounded in the entire codebase indexed by Sourcegraph. The integration uses Sourcegraph's precise code intelligence (SCIP) as a retrieval layer for Cody's Claude-powered chat, delivering context-accurate answers across mono-repos with millions of files.

idecode-searchcody
57.7C+
Integrationmcp-servers

MCP + Google Drive

by Anthropic / Google

Official MCP Google Drive server granting MCP clients access to Drive file listings, search, and document content reading via OAuth 2.0. Supports Docs, Sheets, Slides, and plain files, enabling agents to retrieve and reason over cloud-stored enterprise documents.

mcpgoogle-drivegdocs
57.4C+
IntegrationAI Tools & APIs

Groq + LangChain

by Groq

LangChain chat model integration for Groq's Language Processing Unit (LPU) inference API. Enables ultra-low-latency LLM calls within LangChain chains and agents with first-token latency under 100ms. Supports Llama 3, Mixtral, and Gemma models served on Groq hardware via the langchain-groq package.

groqlangchainfast-inference
57.4C+
IntegrationAI for Code

Continue + VS Code

by Continue Dev

Continue is an open-source AI code assistant for VS Code that supports any LLM through a flexible config file, covering inline completions, chat, edit mode, and custom slash commands. Its context providers system lets developers include files, docs, web search results, and terminal output in every prompt, making it highly adaptable to team-specific workflows.

idevscodeopen-source
57.2C+
IntegrationAI Infrastructure

Chroma + HuggingFace

by Chroma

Chroma's built-in embedding function for HuggingFace's sentence-transformers library. Enables fully local embedding generation and vector storage without any API keys. Supports hundreds of pre-trained models from the HuggingFace Hub including all-MiniLM, BGE, and E5 variants.

chromahuggingfacelocal-embeddings
56.2C+
IntegrationAI Infrastructure

Qdrant + LlamaIndex

by Qdrant

LlamaIndex VectorStore integration for Qdrant's high-performance vector search engine. Exposes Qdrant's payload filtering, sparse-dense hybrid search, and collection management through LlamaIndex's standard index and query engine abstractions for advanced RAG pipelines.

qdrantllamaindexvector-store
55.9C+
IntegrationAI Infrastructure

DeepSeek + Together AI

by Together AI

DeepSeek's open-weight models including DeepSeek-V3 and DeepSeek-R1 served through Together AI's inference cloud at competitive token prices. Provides an OpenAI-compatible API endpoint, enabling drop-in substitution for cost-sensitive workloads. Together AI's custom GPU kernels deliver high throughput for DeepSeek's MoE architecture.

deepseektogether-aiinference-provider
55.8C+
IntegrationAI Tools & APIs

Arize Phoenix + LlamaIndex

by Arize AI

Arize Phoenix instruments LlamaIndex query pipelines with OpenTelemetry spans, exposing retrieval precision, reranker performance, and LLM generation quality in a local-first UI. The integration is particularly valuable for RAG applications where diagnosing retrieval failures requires joint analysis of embeddings, chunks, and generation outputs.

observabilityragllamaindex
55.4C+
Integrationrag-pipelines

Firecrawl + LangChain

by Firecrawl / LangChain

LangChain document loader built on Firecrawl's web crawling and scraping API, transforming live web content into clean Markdown documents ready for chunking and indexing. Supports full-site crawls, sitemap-driven ingestion, and JavaScript-rendered pages.

ragweb-scrapinglangchain
55.4C+
Integrationmcp-servers

MCP + Notion

by Community / Notion

MCP Notion server built on the official Notion API, providing tools for searching pages, reading blocks, creating pages, and updating database entries. Enables Claude and other agents to use Notion as a structured knowledge store within agentic workflows.

mcpnotionknowledge-base
55.3C+
IntegrationAI Infrastructure

Weaviate + Cohere

by Weaviate

Weaviate's built-in text2vec-cohere and reranker-cohere modules for zero-ETL vectorization and result reranking within Weaviate clusters. Automatically embeds documents at write time using Cohere Embed v3 and reranks retrieval results without external orchestration code.

weaviatecoherevectorize-module
54C+
IntegrationAI Infrastructure

Milvus + LangChain

by Zilliz

LangChain VectorStore integration for Milvus, the open-source distributed vector database. Supports billion-scale ANN search, multiple index types (IVF_FLAT, HNSW, DiskANN), and collection-level partitioning through LangChain's unified retriever interface via the pymilvus client.

milvuslangchainvector-store
52.9C+
Integrationagent-frameworks

PydanticAI + Anthropic

by Pydantic

PydanticAI's native Anthropic model provider, enabling type-safe agentic workflows backed by Claude models. Agent inputs, tool call parameters, and structured outputs are all validated through Pydantic schemas, with full support for Claude's extended tool use and streaming responses.

agentspydanticaianthropic
52.6C+
Integrationagent-frameworks

SmolAgents + HuggingFace

by HuggingFace

SmolAgents is HuggingFace's minimal agent framework that defaults to code-writing agents powered by HuggingFace-hosted open-source models. The integration allows seamless use of models from the HuggingFace Hub (Qwen, Mistral, LLaMA) through the Inference API or local transformers without API key lock-in.

agentssmolagentshuggingface
52.5C+
IntegrationAI Infrastructure

LlamaFile + Local Execution

by Mozilla

LlamaFile by Mozilla and Justine Tunney bundles a complete LLM with its runtime into a single self-contained executable that runs on Linux, macOS, Windows, FreeBSD, NetBSD, and OpenBSD without any installation. It embeds a compressed GGUF model and a llama.cpp backend into a polyglot binary (ZIP + ELF/Mach-O), serving an OpenAI-compatible HTTP API on localhost at startup.

local-inferencesingle-binaryportable
52C+
Integrationmcp-servers

MCP + Sentry

by Community / Sentry

MCP Sentry server exposing Sentry's error tracking and performance monitoring data to MCP-compatible agents. Agents can list recent issues, retrieve stack traces, inspect breadcrumbs, and query performance data, enabling AI-powered incident triage and root cause analysis workflows.

mcpsentryerror-tracking
51.6C+
Integrationagent-frameworks

Swarm + OpenAI

by OpenAI

OpenAI's experimental Swarm framework natively targets the OpenAI Chat Completions API for lightweight, stateless multi-agent handoffs. Agents are plain Python functions decorated with tool schemas; the framework manages context passing and agent-to-agent transfers through the standard OpenAI function-calling interface.

agentsswarmopenai
50.9C+
IntegrationAI Infrastructure

Mistral AI + AWS Bedrock

by Amazon Web Services

Mistral AI's Mistral Large and Mistral Small models available through Amazon Bedrock for serverless inference. Provides AWS-native access to Mistral's frontier models with pay-per-token pricing, IAM-based auth, and Bedrock Guardrails — enabling EU-origin AI capabilities within AWS infrastructure without a separate Mistral API account.

mistralawsbedrock
50.8C+
IntegrationAI Tools & APIs

Braintrust + Anthropic

by Braintrust Data

Braintrust wraps the Anthropic SDK to automatically trace every Claude API call and funnel results into structured eval datasets. Developers can run model-graded scoring, regression suites against golden datasets, and A/B comparisons between Claude model versions directly from the Braintrust dashboard.

evaluationobservabilityanthropic
50C+
IntegrationAI Infrastructure

pgvector + Django

by pgvector

pgvector-django package adding native vector similarity search to Django's ORM via PostgreSQL's pgvector extension. Adds VectorField, IvfflatIndex, and HnswIndex with cosine, L2, and inner product distance operators. Enables AI-powered search inside existing Django applications without a separate vector DB.

pgvectordjangopostgresql
49.9C
Integrationrag-pipelines

Marker + ChromaDB

by VikParuchuri / ChromaDB

Combines Marker's high-fidelity PDF-to-Markdown conversion with ChromaDB's local-first vector store for lightweight, self-hosted RAG pipelines. Ideal for on-device or air-gapped deployments where cloud vector stores are unavailable.

ragpdf-parsingchromadb
48.2C
Integrationagent-frameworks

Agency Swarm + OpenAI

by VRSEN

Agency Swarm is built on top of the OpenAI Assistants API, wrapping it with agency-level abstractions for defining communication flows between specialized agents. It provides a higher-level interface for creating persistent agent threads, shared tool registries, and structured agent communication protocols.

agentsagency-swarmopenai
47.2C
Integrationrag-pipelines

Jina Reader + PGVector

by Jina AI / PostgreSQL

Routes Jina Reader's URL-to-text extraction through PostgreSQL's pgvector extension for SQL-native RAG storage. Enables teams already running PostgreSQL to add vector search without adopting a separate vector database, keeping the stack simple.

ragjinapgvector
45.3C
IntegrationAI Tools & APIs

Opik + LangChain

by Comet ML

Opik by Comet provides an open-source LLM observability platform that integrates with LangChain via a callback handler, recording traces, token counts, and custom scores into a queryable dataset. The integration includes built-in hallucination and answer-relevance evaluators that run automatically on captured traces.

observabilityevaluationlangchain
45.1C
Integrationrag-pipelines

Docling + Weaviate

by IBM / Weaviate

Combines IBM's Docling document conversion library with Weaviate's vector database for structured RAG pipelines. Docling extracts rich document structure (tables, figures, headings) which is then stored as typed Weaviate objects with native vector indexing.

ragdoclingweaviate
44.8C
IntegrationAI Infrastructure

LanceDB + LlamaIndex

by LanceDB

LlamaIndex integration for LanceDB's serverless, embedded vector database built on the Lance columnar format. Supports multimodal data (text, images, video), zero-copy queries, and versioned datasets. Ideal for local or edge AI applications requiring a zero-ops vector store with full LlamaIndex query engine compatibility.

lancedbllamaindexserverless-vector-db
44.3C
IntegrationAI Infrastructure

Cohere + AWS SageMaker

by Amazon Web Services

Cohere's Command and Embed models deployed as dedicated SageMaker endpoints for real-time inference with guaranteed throughput. Available through AWS Marketplace as JumpStart models, supporting VPC isolation, auto-scaling, and A/B testing. Preferred for enterprises requiring dedicated capacity and AWS billing consolidation.

cohereawssagemaker
43.9C
IntegrationAI Infrastructure

Fireworks AI + vLLM

by Fireworks AI

Integration between Fireworks AI's model platform and the vLLM inference engine for on-premises or self-hosted deployment of Fireworks-optimized models. Fireworks packages FireOptimizer-quantized models in formats directly compatible with vLLM's OpenAI-compatible server, enabling enterprise teams to run Fireworks-quality inference on their own GPU infrastructure.

fireworks-aivllmself-hosted-inference
42.4C
IntegrationAI Infrastructure

Vespa + Haystack

by deepset

Haystack DocumentStore integration for Vespa, Yahoo's open-source big-data serving engine. Combines Vespa's multi-stage ranking, approximate nearest neighbor search, and real-time indexing with Haystack's RAG pipeline builder. Supports BM25 + dense hybrid retrieval at web scale.

vespahaystackhybrid-search
42.2C
IntegrationAI Tools & APIs

Log10 + OpenAI

by Log10

Log10 provides zero-configuration auto-logging for OpenAI API calls through a context manager that intercepts completions and stores full request/response pairs with automatic tagging. The integration supports user feedback collection, few-shot prompt organization, and GDPR-compliant data masking for PII in logged payloads.

observabilityauto-loggingopenai
41.2C
Integrationrag-pipelines

Chunkr + Milvus

by Chunkr / Zilliz

Pairs Chunkr's semantic chunking service with Milvus's high-performance vector database for production-scale RAG. Chunkr splits documents using structure-aware boundaries and Milvus stores the resulting dense vectors with ANN indexing for sub-millisecond retrieval.

ragchunkingmilvus
41C
IntegrationAI Infrastructure

Zilliz + Apache Spark

by Zilliz

Connector linking Zilliz Cloud (managed Milvus) with Apache Spark for large-scale batch embedding ingestion and vector ETL pipelines. Enables parallel document embedding across Spark executors with direct write to Zilliz collections, supporting data lake to vector store pipelines at petabyte scale.

zillizapache-sparkbatch-vectorization
38.7D
Integration

Weights & Biases

by

ML experiment tracking and model monitoring platform. Integrates with all major training frameworks.

MLtrackingexperiments
38D
IntegrationAI Infrastructure

Cerebras + LiteLLM

by LiteLLM

LiteLLM proxy integration for Cerebras Inference, enabling Cerebras's wafer-scale chip throughput to be accessed via a unified OpenAI-compatible gateway. Allows developers to route requests to Cerebras's CS-3 hardware — delivering over 2000 tokens/second on Llama 3.1 70B — from any existing OpenAI SDK integration through LiteLLM's model aliases.

cerebraslitellmwafer-scale
37.8D
IntegrationAI Infrastructure

Turbopuffer + Vercel

by Turbopuffer

Integration connecting Turbopuffer's serverless vector database with Vercel's deployment platform. Turbopuffer stores vectors on object storage with sub-100ms cold query latency, making it viable for Vercel serverless functions and Edge Runtime. Zero infrastructure management for full-stack AI apps on Vercel.

turbopuffervercelserverless-vector-db
34.7D
Integration

Weights & Biases

by

ML experiment tracking and model monitoring platform. Integrates with all major training frameworks.

MLtrackingexperiments
0F
IntegrationAI Infrastructure

OWASP Top 10 for Agentic Applications

by OWASP Foundation

Security standard for AI agent systems (2026).

standardsecurityai-agents
0F
Integration

OWASP Top 10 for Agentic Applications

by

Security standard for AI agent systems (2026).

standardsecurityai-agents
0F
Integration

EU AI Act Compliance Framework

by

Regulatory framework for AI systems in the EU (Aug 2026).

regulationcomplianceai-governance
0F
Integration

AP2 (Agent Payment Protocol)

by

Autonomous agent commerce with crypto-signed mandates.

protocolpaymentsagent-commerce
0F