AI Agents

ReAct: Synergizing Reasoning and Acting in Language Models

by Google / Princeton

Introduces ReAct, a paradigm that combines reasoning traces and task-specific actions in language models. By interleaving thinking steps with tool calls, ReAct agents outperform chain-of-thought and act-only baselines on diverse tasks including question answering, fact verification, and interactive decision-making.

agents, reasoning, tool-use

79 B+
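The interleaving of thinking steps and tool calls described for ReAct can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `scripted_model` is a hard-coded stand-in for a language model, `lookup` is a hypothetical tool, and the `Action: name[argument]` text format is just one convenient convention, not necessarily the paper's exact prompt format.

```python
import re
from typing import Callable

# Hypothetical tool: a tiny lookup table standing in for a real search API.
FACTS = {"capital of France": "Paris"}

def lookup(query: str) -> str:
    return FACTS.get(query, "no result")

TOOLS: dict[str, Callable[[str], str]] = {"lookup": lookup}

def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM call: returns the next Thought/Action pair."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: lookup[capital of France]"
    return "Thought: The observation answers the question.\nAction: finish[Paris]"

ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")

def react(question: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = model(transcript)            # reasoning trace plus chosen action
        transcript += "\n" + step
        match = ACTION_RE.search(step)
        if match is None:
            raise ValueError("model output contained no action")
        name, arg = match.groups()
        if name == "finish":                # the model decided it has the answer
            return arg
        observation = TOOLS[name](arg)      # act, then feed the result back in
        transcript += f"\nObservation: {observation}"
    raise RuntimeError("no answer within step budget")

answer = react("What is the capital of France?", scripted_model)
```

Swapping `scripted_model` for a real model call keeps the loop unchanged: the transcript accumulates thoughts, actions, and observations until a `finish` action appears.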

Generative Agents: Interactive Simulacra of Human Behavior

by Stanford University / Google

Introduces generative agents (computational software agents that simulate believable human behavior), built by combining a large language model with a memory stream, reflection, and planning mechanisms. Twenty-five agents populate a virtual town, exhibiting emergent social behaviors including relationship formation, information propagation, and event coordination.

agents, simulation, social

77.3 B+

agent-platform, api, function-calling

OpenAI Assistants API

by OpenAI

OpenAI's managed agent platform for building custom AI assistants with persistent threads, built-in code interpreter, file search, and function calling. Handles conversation state, tool orchestration, and context management so developers can focus on business logic.

74.5 B+

tool-use, self-supervised, api-calling

Toolformer: Language Models Can Teach Themselves to Use Tools

by Meta AI

Presents Toolformer, a model that learns to use external tools (APIs) in a self-supervised manner without requiring human annotations. The model decides which APIs to call, how to call them, and how to incorporate results, achieving strong performance across diverse tasks while maintaining generative language modeling ability.

73.7 B+

education, tutoring, adaptive-learning

Personalized Tutor Agent

by Khanmigo (Khan Academy)

An adaptive tutoring agent that dynamically adjusts difficulty, pacing, and instructional modality based on individual learner performance signals. It maintains a persistent knowledge model per student, identifies misconceptions through Socratic questioning, and routes learners to mastery via spaced-repetition scheduling.

73.7 B+

function-calling, tools, structured-output

Function Calling

by AaaS

Enables LLMs to invoke external functions by generating structured JSON arguments matching defined schemas. Supports parallel function calls, error handling, and chained invocations for complex multi-step tool interactions.

73.7 B+
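As a rough sketch of the Function Calling entry, the snippet below validates model-generated JSON arguments against a schema and dispatches the call, returning a structured error instead of raising. The schema layout, the `get_weather` tool, and the `dispatch` helper are all hypothetical; a production system would more likely use a full JSON Schema validator.

```python
import json

# Hypothetical tool schema, loosely modeled on JSON-Schema-style definitions.
WEATHER_SCHEMA = {
    "name": "get_weather",
    "parameters": {
        "city": {"type": str, "required": True},
        "unit": {"type": str, "required": False},
    },
}

def get_weather(city: str, unit: str = "celsius") -> dict:
    # Placeholder implementation; a real tool would call a weather service.
    return {"city": city, "unit": unit, "temperature": 21}

REGISTRY = {"get_weather": (WEATHER_SCHEMA, get_weather)}

def validate(schema: dict, args: dict) -> None:
    params = schema["parameters"]
    for key in args:
        if key not in params:
            raise ValueError(f"unexpected argument: {key}")
    for key, spec in params.items():
        if key not in args:
            if spec["required"]:
                raise ValueError(f"missing argument: {key}")
            continue
        if not isinstance(args[key], spec["type"]):
            raise ValueError(f"wrong type for argument: {key}")

def dispatch(call_json: str) -> dict:
    """Parse one model-emitted call and return a result or a structured error."""
    try:
        call = json.loads(call_json)
        schema, func = REGISTRY[call["name"]]
        validate(schema, call["arguments"])
        return {"ok": True, "result": func(**call["arguments"])}
    except (json.JSONDecodeError, KeyError, ValueError) as exc:
        # The error text can be handed back to the model so it can retry.
        return {"ok": False, "error": str(exc)}

good = dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
bad = dispatch('{"name": "get_weather", "arguments": {"town": "Oslo"}}')
```

Parallel or chained invocations would call `dispatch` once per emitted call and collect the results in order.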

customer-service, omnichannel, live-chat

Omnichannel Support Agent

by Intercom

A fully autonomous customer support agent that unifies conversations across chat, email, SMS, and social DMs into a single, threaded context window. It resolves tier-1 and tier-2 tickets using a retrieval-augmented knowledge base and maintains CSAT targets through sentiment-aware tone calibration.

72.4 B+

multi-agent, conversable, microsoft

AutoGen

by Microsoft Research

Microsoft's multi-agent conversation framework enabling multiple LLM agents to converse, collaborate, and solve tasks through automated chat. Supports customizable agent behaviors, human-in-the-loop, and code execution sandboxing.

72.4 B+

healthcare, ehr, clinical-documentation

EHR Documentation Agent

by Nuance Communications (Microsoft)

Ambient AI agent that listens to physician-patient encounters, generates structured clinical notes (SOAP, H&P, discharge summaries), and auto-populates EHR fields in real time. Reduces documentation burden by over 70% while maintaining compliance with ICD-10 and CPT coding standards.

72 B+

Tool Use

by AaaS

Equips AI agents with the ability to select and use appropriate tools from a defined toolkit to accomplish tasks. Covers tool selection logic, input marshalling, output interpretation, and fallback strategies when tools fail or return unexpected results.

tools, agents, integration

72 B+
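The fallback strategies mentioned in the Tool Use entry can be sketched as an ordered list of tools tried in turn. Both tools and the `run_with_fallback` helper are hypothetical, and treating an empty string as an "unexpected result" is an assumption made for this example.

```python
from typing import Callable, Optional

def primary_search(query: str) -> str:
    # Hypothetical tool that simulates an outage.
    raise TimeoutError("primary search timed out")

def cached_search(query: str) -> str:
    # Hypothetical backup tool with a small local cache.
    return {"status page": "all systems operational"}.get(query, "")

def run_with_fallback(
    tools: list[Callable[[str], str]], query: str
) -> tuple[Optional[str], list[str]]:
    """Try each tool in order; return the first usable result plus an error log."""
    errors: list[str] = []
    for tool in tools:
        try:
            result = tool(query)
        except Exception as exc:            # tool failed outright
            errors.append(f"{tool.__name__}: {exc}")
            continue
        if result:                          # non-empty output counts as usable
            return result, errors
        errors.append(f"{tool.__name__}: empty result")
    return None, errors

result, errors = run_with_fallback([primary_search, cached_search], "status page")
```

Returning the error log alongside the result lets the agent explain which tools failed, or decide to ask the user for help when every tool has been exhausted.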

agents, multi-agent, software-engineering

MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

by Tsinghua / Peking University / DeepWisdom

Presents MetaGPT, a multi-agent framework that encodes human workflows as Standardized Operating Procedures (SOPs) for LLM agents acting as specialized software roles. By assigning product manager, architect, engineer, and QA roles, MetaGPT produces complete, executable codebases from natural language requirements with higher quality than prior approaches.

71.7 B+

healthcare, pharmacology, drug-safety

Drug Interaction Checker

by Wolters Kluwer Health

Real-time pharmacological agent that screens multi-drug regimens for contraindications, adverse interactions, and dosing conflicts. Cross-references patient allergy profiles, renal function, and pharmacogenomic data to surface clinically relevant alerts at the point of prescribing.

71.7 B+

agents, minecraft, lifelong-learning

Voyager: An Open-Ended Embodied Agent with Large Language Models

by NVIDIA / Caltech / UT Austin

Presents Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager uses an automatic curriculum, an ever-growing skill library of executable code, and an iterative prompting mechanism to overcome failures.

71.2 B+

ToolBench

by Qin et al. / Tsinghua University

ToolBench evaluates LLMs on their ability to use real-world REST APIs to complete user instructions. It provides 16,000+ real APIs from RapidAPI Hub across 49 categories and 12,000+ instruction–API solution pairs, measuring whether models can plan and execute multi-step API call sequences.

tool-use, api, agents

67 B

multi-step-reasoning, chain-of-thought, tree-of-thoughts

Multi-Step Reasoning

by AaaS

A core AI capability that enables agents to break down complex queries into a sequence of manageable, logical steps. By generating intermediate thoughts and verifying them, this process mimics human reasoning to solve problems that require planning, deduction, and synthesis of information over multiple stages.

63.2 B

benchmark, agent-evaluation, web-benchmark

WebArena

by CMU

WebArena is a realistic and reproducible benchmark environment designed to evaluate autonomous language agents. It tests an agent's ability to perform complex, multi-step tasks across a diverse set of self-hosted websites, including e-commerce, forums, and content management systems, using real web interfaces.

62.4 B

benchmark, evaluation, agents

GAIA Benchmark

by Meta / Hugging Face

GAIA (General AI Assistants) is a benchmark for evaluating AI models on complex, real-world tasks. It features questions with unambiguous factual answers that require sophisticated capabilities like multi-step reasoning, web browsing, and tool use. GAIA is designed to test the practical limits of general-purpose AI assistants.

62.2 B

planning, strategy, task-management

Planning

by AaaS

Enables agents to create structured execution plans for multi-step tasks by analyzing goals, identifying sub-tasks, ordering dependencies, and allocating resources. Supports plan revision when steps fail or new information emerges during execution.

62.2 B
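Ordering sub-tasks by their dependencies, and revising the plan when a step fails, can be sketched with the standard library's `graphlib.TopologicalSorter`. The task names and the `revise_plan` helper are hypothetical, and replacing a failed step one-for-one is only one possible revision policy.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each sub-task maps to the sub-tasks it depends on.
PLAN = {
    "gather_requirements": set(),
    "draft_design": {"gather_requirements"},
    "implement": {"draft_design"},
    "write_tests": {"draft_design"},
    "release": {"implement", "write_tests"},
}

def order_plan(plan: dict[str, set[str]]) -> list[str]:
    """Return an execution order in which every dependency comes first."""
    return list(TopologicalSorter(plan).static_order())

def revise_plan(plan: dict[str, set[str]], failed: str, replacement: str) -> dict[str, set[str]]:
    """Swap a failed step for a replacement, rewiring anything that depended on it."""
    revised = {}
    for task, deps in plan.items():
        new_task = replacement if task == failed else task
        revised[new_task] = {replacement if d == failed else d for d in deps}
    return revised

order = order_plan(PLAN)
revised_order = order_plan(revise_plan(PLAN, "write_tests", "run_manual_qa"))
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies are circular, which is a useful early check before execution starts.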

multi-agent, orchestration, coordination

Multi-Agent Coordination

by AaaS

Multi-Agent Coordination involves designing systems where multiple autonomous agents collaborate to achieve a common goal. This skill encompasses architectural patterns like hierarchical supervision and peer-to-peer negotiation for task distribution and conflict resolution. It focuses on managing shared information and ensuring coherent collective action in complex, dynamic environments.

62 B

agent-evaluation, llm-benchmark, multi-task-evaluation

AgentBoard

by Ma et al. / Shanghai AI Lab

AgentBoard is a comprehensive evaluation framework for Large Language Model (LLM) based agents. It assesses agent performance across nine diverse tasks, including embodied AI, gaming, web browsing, and tool use. The framework uniquely measures both final task success and partial progress through a fine-grained sub-goal metric.

61.1 B
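The idea of scoring partial progress alongside final success can be illustrated with a simplified sketch. This is not AgentBoard's actual metric definition: the sub-goal names, the `EpisodeScore` type, and the plain "fraction of sub-goals reached" formula are assumptions chosen to show the general shape of such a measure.

```python
from dataclasses import dataclass

@dataclass
class EpisodeScore:
    success: bool      # did the agent finish the whole task?
    progress: float    # share of annotated sub-goals it reached

def score_episode(subgoals: list[str], achieved: set[str]) -> EpisodeScore:
    if not subgoals:
        raise ValueError("at least one sub-goal is required")
    hit = sum(1 for goal in subgoals if goal in achieved)
    return EpisodeScore(success=hit == len(subgoals), progress=hit / len(subgoals))

# Hypothetical web-shopping episode that stalls before checkout.
score = score_episode(
    ["open_site", "search_item", "add_to_cart", "checkout"],
    {"open_site", "search_item", "add_to_cart"},
)
```

A success-only metric would score this episode as zero, while the progress value distinguishes it from an agent that never found the item.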

Web Browsing

by AaaS

Empowers autonomous agents to interact with the web like a human user. This skill provides the core functionality to navigate to URLs, render pages including executing JavaScript, and parse DOM elements. It enables complex workflows such as filling out forms, clicking buttons, and extracting structured data for analysis or task completion.

browsing, web, navigation

60.8 B

benchmark, evaluation, agents

AgentBench

by Tsinghua University

Comprehensive benchmark evaluating LLM agents across 8 distinct environments including operating systems, databases, knowledge graphs, digital card games, lateral thinking puzzles, and web shopping. Tests generalization of agent capabilities across diverse interaction paradigms.

59.3 C+

API-Bank

by Li et al. / Wuhan University

API-Bank is a comprehensive benchmark for evaluating tool-augmented LLMs. It features 73 diverse APIs and assesses models on three levels: API retrieval, API calling, and complex planning. The benchmark measures both the correctness of tool selection and the accuracy of execution, providing a thorough test of an agent's capabilities.

tool-use, api-call, agents

58.8 C+

script, automation, tool-calling

Tool Calling Setup

by AaaS

Sets up a tool-calling agent with typed tool definitions, argument validation, error handling, and execution sandboxing. Includes example tools for web search, calculator, file operations, and database queries with a pluggable tool registry.

56.4 C+

script, automation, multi-agent

Multi-Agent Orchestration

by AaaS

Orchestrates multiple specialized AI agents in coordinated workflows with task routing, state management, and result aggregation. Implements supervisor and swarm patterns with configurable agent selection logic and inter-agent communication.

52.6 C+
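A minimal sketch of the supervisor pattern named in this entry: a router picks a specialist for each task, shared state records the decisions, and the results are aggregated at the end. The agents, the keyword-based `route` function, and the state layout are all hypothetical; the swarm pattern is not shown.

```python
from typing import Callable

# Hypothetical specialist agents; each is a plain function here.
def research_agent(task: str) -> str:
    return f"research notes for '{task}'"

def coding_agent(task: str) -> str:
    return f"patch for '{task}'"

AGENTS: dict[str, Callable[[str], str]] = {
    "research": research_agent,
    "code": coding_agent,
}

def route(task: str) -> str:
    """Keyword-based selection logic; a real supervisor might ask an LLM instead."""
    keywords = ("bug", "implement", "fix")
    return "code" if any(word in task.lower() for word in keywords) else "research"

def supervise(tasks: list[str]) -> dict:
    state = {"results": [], "log": []}      # shared state across all agents
    for task in tasks:
        name = route(task)
        state["log"].append((task, name))   # record each routing decision
        state["results"].append(AGENTS[name](task))
    state["summary"] = " | ".join(state["results"])  # simple result aggregation
    return state

state = supervise(["Fix login bug", "Survey caching strategies"])
```

Keeping the routing log in the shared state makes it straightforward to audit why a given agent was chosen for a task.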

MCP Server Template

by AaaS

Template for building Model Context Protocol (MCP) servers that expose tools, resources, and prompts to MCP-compatible clients. Includes typed tool handlers, resource providers, error handling, and transport configuration for stdio and HTTP modes.

script, automation, mcp

50.8 C+

script, automation, evaluation

Agent Evaluation Framework

by AaaS

Evaluates AI agent performance across defined test scenarios with success criteria, step tracking, and automated scoring. Supports custom evaluation rubrics, regression detection, and generates detailed reports comparing agent versions over time.

48.3 C
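Scenario-based scoring and regression detection between agent versions might look roughly like the sketch below. The scenarios, the two toy agents, and the `evaluate` and `regressions` helpers are hypothetical; real rubrics would typically be richer than a pass/fail predicate.

```python
from typing import Callable

# Hypothetical scenarios: an input plus a predicate that defines success.
SCENARIOS: dict[str, tuple[str, Callable[[str], bool]]] = {
    "add": ("2+3", lambda out: out == "5"),
    "upper": ("abc", lambda out: out == "ABC"),
}

def agent_v1(prompt: str) -> str:
    return "5" if prompt == "2+3" else prompt.upper()

def agent_v2(prompt: str) -> str:
    # Simulated newer version that has regressed on arithmetic.
    return "23" if prompt == "2+3" else prompt.upper()

def evaluate(agent: Callable[[str], str]) -> dict[str, bool]:
    """Run every scenario and record whether its success criterion held."""
    return {name: check(agent(inp)) for name, (inp, check) in SCENARIOS.items()}

def regressions(old: dict[str, bool], new: dict[str, bool]) -> list[str]:
    """Scenarios that passed in the old version but fail in the new one."""
    return sorted(name for name, passed in old.items() if passed and not new.get(name, False))

report_v1 = evaluate(agent_v1)
report_v2 = evaluate(agent_v2)
regressed = regressions(report_v1, report_v2)
```

Storing one report per version allows the same comparison to be repeated across any pair of releases.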

script, automation, deployment

Agent Deployment Script

by AaaS

Deploys AI agents as production services with health checks, graceful shutdown, error recovery, and monitoring integration. Supports Docker and Kubernetes deployments with configurable scaling, environment management, and rollback capabilities.

47.4 C

Provider, AI Agents

Adept AI

by Adept AI

Adept AI builds AI systems that can take actions in software to complete complex multi-step workflows on behalf of users. The company focuses on general-purpose action models trained to interact with real-world software interfaces through browser and desktop automation.

agents, computer-use, workflow-automation

46.9 C