Skip to main content
brand
context
industry
strategy
AaaS
Channel

AI Agents

Autonomous agents, assistants, and multi-agent systems

31 entities in this channel

PaperAI Agents

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

by Facebook AI Research

Introduces Retrieval-Augmented Generation (RAG), combining parametric memory (language model weights) with non-parametric memory (dense retrieval over Wikipedia) for knowledge-intensive NLP tasks. RAG models achieve state-of-the-art on open-domain QA benchmarks and produce more specific, factual, and diverse responses than pure parametric models.

ragretrievalgeneration
81.2A
PaperAI Agents

ReAct: Synergizing Reasoning and Acting in Language Models

by Google / Princeton

Introduces ReAct, a paradigm that combines reasoning traces and task-specific actions in language models. By interleaving thinking steps with tool calls, ReAct agents outperform chain-of-thought and act-only baselines on diverse tasks including question answering, fact verification, and interactive decision-making.

agentsreasoningtool-use
79B+
PaperAI Agents

Generative Agents: Interactive Simulacra of Human Behavior

by Stanford University / Google

Introduces generative agents—computational software agents that simulate believable human behavior—by combining a large language model with memory streams, reflection synthesis, and planning mechanisms. Twenty-five agents populate a virtual town, exhibiting emergent social behaviors including relationship formation, information propagation, and event coordination.

agentssimulationsocial
77.3B+
AgentAI Agents

OpenAI Assistants API

by OpenAI

OpenAI's managed agent platform for building custom AI assistants with persistent threads, built-in code interpreter, file search, and function calling. Handles conversation state, tool orchestration, and context management so developers can focus on business logic.

agent-platformapifunction-calling
74.5B+
PaperAI Agents

Toolformer: Language Models Can Teach Themselves to Use Tools

by Meta AI

Presents Toolformer, a model that learns to use external tools (APIs) in a self-supervised manner without requiring human annotations. The model decides which APIs to call, how to call them, and how to incorporate results, achieving strong performance across diverse tasks while maintaining generative language modeling ability.

tool-useself-supervisedapi-calling
73.7B+
AgentAI Agents

Personalized Tutor Agent

by Khanmigo (Khan Academy)

An adaptive tutoring agent that dynamically adjusts difficulty, pacing, and instructional modality based on individual learner performance signals. It maintains a persistent knowledge model per student, identifies misconceptions through Socratic questioning, and routes learners to mastery via spaced-repetition scheduling.

educationtutoringadaptive-learning
73.7B+
SkillAI Agents

Function Calling

by AaaS

Enables LLMs to invoke external functions by generating structured JSON arguments matching defined schemas. Supports parallel function calls, error handling, and chained invocations for complex multi-step tool interactions.

function-callingtoolsstructured-output
73.7B+
AgentAI Agents

Omnichannel Support Agent

by Intercom

A fully-autonomous customer support agent that unifies conversations across chat, email, SMS, and social DMs into a single threaded context window. It resolves tier-1 and tier-2 tickets using a retrieval-augmented knowledge base and maintains CSAT targets through sentiment-aware tone calibration.

customer-serviceomnichannellive-chat
72.4B+
AgentAI Agents

AutoGen

by Microsoft Research

Microsoft's multi-agent conversation framework enabling multiple LLM agents to converse, collaborate, and solve tasks through automated chat. Supports customizable agent behaviors, human-in-the-loop, and code execution sandboxing.

multi-agentconversablemicrosoft
72.4B+
AgentAI Agents

EHR Documentation Agent

by Nuance Communications (Microsoft)

Ambient AI agent that listens to physician-patient encounters, generates structured clinical notes (SOAP, H&P, discharge summaries), and auto-populates EHR fields in real time. Reduces documentation burden by over 70% while maintaining compliance with ICD-10 and CPT coding standards.

healthcareehrclinical-documentation
72B+
SkillAI Agents

Tool Use

by AaaS

Equips AI agents with the ability to select and use appropriate tools from a defined toolkit to accomplish tasks. Covers tool selection logic, input marshalling, output interpretation, and fallback strategies when tools fail or return unexpected results.

toolsagentsintegration
72B+
PaperAI Agents

MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

by Tsinghua / Peking University / DeepWisdom

Presents MetaGPT, a multi-agent framework that encodes human workflows as Standardized Operating Procedures (SOPs) for LLM agents acting as specialized software roles. By assigning product manager, architect, engineer, and QA roles, MetaGPT produces complete, executable codebases from natural language requirements with higher quality than prior approaches.

agentsmulti-agentsoftware-engineering
71.7B+
AgentAI Agents

Drug Interaction Checker

by Wolters Kluwer Health

Real-time pharmacological agent that screens multi-drug regimens for contraindications, adverse interactions, and dosing conflicts. Cross-references patient allergy profiles, renal function, and genetic pharmacogenomics data to surface clinically relevant alerts at point of prescribing.

healthcarepharmacologydrug-safety
71.7B+
PaperAI Agents

Voyager: An Open-Ended Embodied Agent with Large Language Models

by NVIDIA / Caltech / UT Austin

Presents Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager uses an automatic curriculum, an ever-growing skill library of executable code, and an iterative prompting mechanism to overcome failures.

agentsminecraftlifelong-learning
71.2B+
BenchmarkAI Agents

ToolBench

by Qin et al. / Tsinghua University

ToolBench evaluates LLMs on their ability to use real-world REST APIs to complete user instructions. It provides 16,000+ real APIs from RapidAPI Hub across 49 categories and 12,000+ instruction–API solution pairs, measuring whether models can plan and execute multi-step API call sequences.

tool-useapiagents
67B
SkillAI Agents

Multi-Step Reasoning

by AaaS

A core AI capability that enables agents to break down complex queries into a sequence of manageable, logical steps. By generating intermediate thoughts and verifying them, this process mimics human reasoning to solve problems that require planning, deduction, and synthesis of information over multiple stages.

multi-step-reasoningchain-of-thoughttree-of-thoughts
63.2B
BenchmarkAI Agents

WebArena

by CMU

WebArena is a realistic and reproducible benchmark environment designed to evaluate autonomous language agents. It tests an agent's ability to perform complex, multi-step tasks across a diverse set of self-hosted websites, including e-commerce, forums, and content management systems, using real web interfaces.

benchmarkagent-evaluationweb-benchmark
62.4B
BenchmarkAI Agents

GAIA Benchmark

by Meta / Hugging Face

GAIA (General AI Assistants) is a benchmark for evaluating AI models on complex, real-world tasks. It features questions with unambiguous factual answers that require sophisticated capabilities like multi-step reasoning, web browsing, and tool use. GAIA is designed to test the practical limits of general-purpose AI assistants.

benchmarkevaluationagents
62.2B
SkillAI Agents

Planning

by AaaS

Enables agents to create structured execution plans for multi-step tasks by analyzing goals, identifying sub-tasks, ordering dependencies, and allocating resources. Supports plan revision when steps fail or new information emerges during execution.

planningstrategytask-management
62.2B
SkillAI Agents

Multi-Agent Coordination

by AaaS

Multi-Agent Coordination involves designing systems where multiple autonomous agents collaborate to achieve a common goal. This skill encompasses architectural patterns like hierarchical supervision and peer-to-peer negotiation for task distribution and conflict resolution. It focuses on managing shared information and ensuring coherent collective action in complex, dynamic environments.

multi-agentorchestrationcoordination
62B
BenchmarkAI Agents

AgentBoard

by Ma et al. / Shanghai AI Lab

AgentBoard is a comprehensive evaluation framework for Large Language Model (LLM) based agents. It assesses agent performance across nine diverse tasks, including embodied AI, gaming, web browsing, and tool use. The framework uniquely measures both final task success and partial progress through a fine-grained sub-goal metric.

agent-evaluationllm-benchmarkmulti-task-evaluation
61.1B
SkillAI Agents

Web Browsing

by AaaS

Empowers autonomous agents to interact with the web like a human user. This skill provides the core functionality to navigate to URLs, render pages including executing JavaScript, and parse DOM elements. It enables complex workflows such as filling out forms, clicking buttons, and extracting structured data for analysis or task completion.

browsingwebnavigation
60.8B
BenchmarkAI Agents

AgentBench

by Tsinghua University

Comprehensive benchmark evaluating LLM agents across 8 distinct environments including operating systems, databases, knowledge graphs, digital card games, lateral thinking puzzles, and web shopping. Tests generalization of agent capabilities across diverse interaction paradigms.

benchmarkevaluationagents
59.3C+
BenchmarkAI Agents

API-Bank

by Li et al. / Wuhan University

API-Bank is a comprehensive benchmark for evaluating tool-augmented LLMs. It features 73 diverse APIs and assesses models on three levels: API retrieval, API calling, and complex planning. The benchmark measures both the correctness of tool selection and the accuracy of execution, providing a thorough test of an agent's capabilities.

tool-useapi-callagents
58.8C+
ScriptAI Agents

Tool Calling Setup

by AaaS

Sets up a tool-calling agent with typed tool definitions, argument validation, error handling, and execution sandboxing. Includes example tools for web search, calculator, file operations, and database queries with a pluggable tool registry.

scriptautomationtool-calling
56.4C+
ScriptAI Agents

Multi-Agent Orchestration

by AaaS

Orchestrates multiple specialized AI agents in coordinated workflows with task routing, state management, and result aggregation. Implements supervisor and swarm patterns with configurable agent selection logic and inter-agent communication.

scriptautomationmulti-agent
52.6C+
ScriptAI Agents

MCP Server Template

by AaaS

Template for building Model Context Protocol (MCP) servers that expose tools, resources, and prompts to MCP-compatible clients. Includes typed tool handlers, resource providers, error handling, and transport configuration for stdio and HTTP modes.

scriptautomationmcp
50.8C+
ScriptAI Agents

Agent Evaluation Framework

by AaaS

Evaluates AI agent performance across defined test scenarios with success criteria, step tracking, and automated scoring. Supports custom evaluation rubrics, regression detection, and generates detailed reports comparing agent versions over time.

scriptautomationevaluation
48.3C
ScriptAI Agents

Agent Deployment Script

by AaaS

Deploys AI agents as production services with health checks, graceful shutdown, error recovery, and monitoring integration. Supports Docker and Kubernetes deployments with configurable scaling, environment management, and rollback capabilities.

scriptautomationdeployment
47.4C
ProviderAI Agents

Adept AI

by Adept AI

Adept AI builds AI systems that can take actions in software to complete complex multi-step workflows on behalf of users. The company focuses on general-purpose action models trained to interact with real-world software interfaces through browser and desktop automation.

agentscomputer-useworkflow-automation
46.9C
ScriptAI Agents

Agent Monitoring Dashboard

by AaaS

Sets up a monitoring dashboard for AI agent systems tracking task completion rates, error rates, latency, token usage, and cost. Integrates with Prometheus for metrics collection and Grafana for visualization with pre-built alert rules.

scriptautomationmonitoring
45.3C