AI Agents
Autonomous agents, assistants, and multi-agent systems
31 entities in this channel
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
by Facebook AI Research
Introduces Retrieval-Augmented Generation (RAG), combining parametric memory (language model weights) with non-parametric memory (dense retrieval over Wikipedia) for knowledge-intensive NLP tasks. RAG models achieve state-of-the-art on open-domain QA benchmarks and produce more specific, factual, and diverse responses than pure parametric models.
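The retrieve-then-generate pattern the paper describes can be sketched with a toy term-overlap retriever (a real RAG system uses dense retrieval and a trained generator; document texts and function names below are illustrative):

```python
# Minimal retrieve-then-generate sketch: score documents against the
# query by term overlap, pick the top-k, and prepend them to the prompt.
# Dense retrieval (e.g. DPR over Wikipedia) replaces this in practice.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive term overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_terms & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Combine retrieved passages (non-parametric memory) with the query."""
    context = "\n".join(f"[{i+1}] {d}"
                        for i, d in enumerate(retrieve(query, docs)))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The Eiffel Tower is located in Paris, France.",
    "Python is a programming language.",
    "Paris is the capital of France.",
]
prompt = build_prompt("Where is the Eiffel Tower?", docs)
```

The prompt is then handed to the generator, whose weights supply the parametric memory.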
ReAct: Synergizing Reasoning and Acting in Language Models
by Google / Princeton
Introduces ReAct, a paradigm that combines reasoning traces and task-specific actions in language models. By interleaving thinking steps with tool calls, ReAct agents outperform chain-of-thought and act-only baselines on diverse tasks including question answering, fact verification, and interactive decision-making.
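The interleaving of reasoning traces and tool calls can be sketched as a Thought/Action/Observation loop; here the model is replaced by a scripted list of steps and the lookup tool is a toy dictionary, both hypothetical:

```python
# Sketch of the ReAct loop: the agent alternates free-form reasoning
# ("Thought") with tool calls ("Action"), feeding each result back as an
# "Observation" before the next thought.

def lookup(entity: str) -> str:
    # Toy stand-in for a knowledge tool such as Wikipedia search.
    facts = {"Eiffel Tower": "located in Paris, France"}
    return facts.get(entity, "no result")

def react(question: str, steps: list[tuple]) -> tuple[str, list[str]]:
    """Run (thought, action, argument) steps; return answer and transcript."""
    transcript = [f"Question: {question}"]
    answer = None
    for thought, action, arg in steps:
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            answer = arg
            transcript.append(f"Answer: {arg}")
            break
        obs = lookup(arg)
        transcript.append(f"Action: {action}[{arg}]")
        transcript.append(f"Observation: {obs}")
    return answer, transcript

answer, transcript = react(
    "Where is the Eiffel Tower?",
    [("I should look up the Eiffel Tower.", "lookup", "Eiffel Tower"),
     ("The observation says Paris.", "finish", "Paris, France")],
)
```

In the actual paradigm each thought and action is generated by the language model conditioned on the growing transcript.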
Generative Agents: Interactive Simulacra of Human Behavior
by Stanford University / Google
Introduces generative agents—computational software agents that simulate believable human behavior—by combining a large language model with memory streams, reflection synthesis, and planning mechanisms. Twenty-five agents populate a virtual town, exhibiting emergent social behaviors including relationship formation, information propagation, and event coordination.
OpenAI Assistants API
by OpenAI
OpenAI's managed agent platform for building custom AI assistants with persistent threads, built-in code interpreter, file search, and function calling. Handles conversation state, tool orchestration, and context management so developers can focus on business logic.
Toolformer: Language Models Can Teach Themselves to Use Tools
by Meta AI
Presents Toolformer, a model that learns to use external tools (APIs) in a self-supervised manner without requiring human annotations. The model decides which APIs to call, how to call them, and how to incorporate results, achieving strong performance across diverse tasks while maintaining generative language modeling ability.
Personalized Tutor Agent
by Khanmigo (Khan Academy)
An adaptive tutoring agent that dynamically adjusts difficulty, pacing, and instructional modality based on individual learner performance signals. It maintains a persistent knowledge model per student, identifies misconceptions through Socratic questioning, and routes learners to mastery via spaced-repetition scheduling.
Function Calling
by AaaS
Enables LLMs to invoke external functions by generating structured JSON arguments matching defined schemas. Supports parallel function calls, error handling, and chained invocations for complex multi-step tool interactions.
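The core mechanic can be sketched as schema validation plus dispatch: the model emits a function name with JSON arguments, which are checked against a declared schema before the function runs. The tool name and schema below are illustrative:

```python
import json

# Sketch of schema-checked function calling: validate the model's JSON
# arguments against a declared schema before dispatching.

TOOLS = {
    "get_weather": {
        "required": {"city": str},
        "fn": lambda city, units="celsius": f"22 degrees {units} in {city}",
    }
}

def dispatch(call_json: str) -> str:
    """Parse a model-emitted tool call, validate arguments, and execute."""
    call = json.loads(call_json)
    spec = TOOLS[call["name"]]
    args = call["arguments"]
    for key, typ in spec["required"].items():
        if key not in args or not isinstance(args[key], typ):
            raise ValueError(f"bad or missing argument: {key}")
    return spec["fn"](**args)

result = dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
```

Parallel calls extend this by dispatching a list of such call objects and returning the results together.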
Omnichannel Support Agent
by Intercom
A fully autonomous customer support agent that unifies conversations across chat, email, SMS, and social DMs into a single threaded context. It resolves tier-1 and tier-2 tickets using a retrieval-augmented knowledge base and maintains CSAT targets through sentiment-aware tone calibration.
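The ticket-routing half of such a system can be sketched as unifying messages from several channels into one thread keyed by customer, with a resolution tier decided per ticket. Channel names and the tiering rule are illustrative:

```python
# Sketch of omnichannel threading: merge messages from multiple channels
# into one conversation per customer, then classify each thread's tier.

from collections import defaultdict

def unify(messages: list[dict]) -> dict[str, list[dict]]:
    """Group (channel, customer, text) messages into one thread per customer."""
    threads = defaultdict(list)
    for msg in sorted(messages, key=lambda m: m["ts"]):
        threads[msg["customer"]].append(msg)
    return dict(threads)

def tier(thread: list[dict]) -> int:
    """Toy tiering rule: refund requests escalate to tier 2."""
    return 2 if any("refund" in m["text"].lower() for m in thread) else 1

messages = [
    {"ts": 1, "channel": "chat",  "customer": "ada", "text": "My order is late"},
    {"ts": 2, "channel": "email", "customer": "ada", "text": "I want a refund"},
    {"ts": 3, "channel": "sms",   "customer": "bob", "text": "How do I log in?"},
]
threads = unify(messages)
```

A production agent would attach retrieval and response generation to each thread rather than a keyword rule.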
AutoGen
by Microsoft Research
Microsoft's multi-agent conversation framework enabling multiple LLM agents to converse, collaborate, and solve tasks through automated chat. Supports customizable agent behaviors, human-in-the-loop, and code execution sandboxing.
EHR Documentation Agent
by Nuance Communications (Microsoft)
Ambient AI agent that listens to physician-patient encounters, generates structured clinical notes (SOAP, H&P, discharge summaries), and auto-populates EHR fields in real time. Reduces documentation burden by over 70% while maintaining compliance with ICD-10 and CPT coding standards.
Tool Use
by AaaS
Equips AI agents with the ability to select and use appropriate tools from a defined toolkit to accomplish tasks. Covers tool selection logic, input marshalling, output interpretation, and fallback strategies when tools fail or return unexpected results.
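The fallback strategy mentioned here can be sketched as a priority chain: try each tool in order and return the first success, collecting errors along the way. Both tools are illustrative stubs:

```python
# Sketch of tool fallback: attempt tools in priority order, catching
# failures and falling through to the next candidate.

def primary_search(q): raise TimeoutError("search backend down")
def cached_search(q): return f"cached results for '{q}'"

def use_tools(query: str, tools: list) -> str:
    """Try each tool in priority order; return the first success."""
    errors = []
    for tool in tools:
        try:
            return tool(query)
        except Exception as e:
            errors.append(f"{tool.__name__}: {e}")
    return "all tools failed: " + "; ".join(errors)

answer = use_tools("llm agents", [primary_search, cached_search])
```

Selection logic and input marshalling sit in front of this chain, deciding which tools to try and shaping their arguments.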
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
by Tsinghua / Peking University / DeepWisdom
Presents MetaGPT, a multi-agent framework that encodes human workflows as Standardized Operating Procedures (SOPs) for LLM agents acting as specialized software roles. By assigning product manager, architect, engineer, and QA roles, MetaGPT produces complete, executable codebases from natural language requirements with higher quality than prior approaches.
Drug Interaction Checker
by Wolters Kluwer Health
Real-time pharmacological agent that screens multi-drug regimens for contraindications, adverse interactions, and dosing conflicts. Cross-references patient allergy profiles, renal function, and genetic pharmacogenomics data to surface clinically relevant alerts at point of prescribing.
Voyager: An Open-Ended Embodied Agent with Large Language Models
by NVIDIA / Caltech / UT Austin
Presents Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager uses an automatic curriculum, an ever-growing skill library of executable code, and an iterative prompting mechanism to overcome failures.
ToolBench
by Qin et al. / Tsinghua University
ToolBench evaluates LLMs on their ability to use real-world REST APIs to complete user instructions. It provides 16,000+ real APIs from RapidAPI Hub across 49 categories and 12,000+ instruction–API solution pairs, measuring whether models can plan and execute multi-step API call sequences.
Multi-Step Reasoning
by AaaS
A core AI capability that enables agents to break down complex queries into a sequence of manageable, logical steps. By generating intermediate thoughts and verifying them, this process mimics human reasoning to solve problems that require planning, deduction, and synthesis of information over multiple stages.
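The decompose-compute-verify loop can be sketched by threading intermediate results through checked steps; the word problem and its steps below are hard-coded for illustration:

```python
# Sketch of multi-step reasoning: break a problem into ordered sub-steps,
# compute each intermediate result, and verify it before moving on.

def solve(steps):
    """Execute (description, fn, check) steps, threading results forward."""
    value, trace = None, []
    for desc, fn, check in steps:
        value = fn(value)
        if not check(value):
            raise ValueError(f"verification failed at step: {desc}")
        trace.append((desc, value))
    return value, trace

# "A train travels 60 km/h for 2 h, then 80 km/h for 1 h. Total distance?"
steps = [
    ("distance of first leg",  lambda _: 60 * 2,        lambda v: v == 120),
    ("distance of second leg", lambda v: (v, 80 * 1),   lambda v: v[1] == 80),
    ("sum both legs",          lambda v: v[0] + v[1],   lambda v: v == 200),
]
total, trace = solve(steps)
```

In an LLM agent, both the step decomposition and the checks would themselves be model-generated rather than hand-written.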
WebArena
by CMU
WebArena is a realistic and reproducible benchmark environment designed to evaluate autonomous language agents. It tests an agent's ability to perform complex, multi-step tasks across a diverse set of self-hosted websites, including e-commerce, forums, and content management systems, using real web interfaces.
GAIA Benchmark
by Meta / Hugging Face
GAIA (General AI Assistants) is a benchmark for evaluating AI models on complex, real-world tasks. It features questions with unambiguous factual answers that require sophisticated capabilities like multi-step reasoning, web browsing, and tool use. GAIA is designed to test the practical limits of general-purpose AI assistants.
Planning
by AaaS
Enables agents to create structured execution plans for multi-step tasks by analyzing goals, identifying sub-tasks, ordering dependencies, and allocating resources. Supports plan revision when steps fail or new information emerges during execution.
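The dependency-ordering step can be sketched with the standard library's topological sorter: declare each sub-task's prerequisites and derive an execution order. The task names are illustrative:

```python
from graphlib import TopologicalSorter

# Sketch of dependency-ordered planning: sub-tasks map to their
# prerequisites, and a topological sort yields a valid execution order.

plan = {
    "deploy":     {"build", "test"},
    "test":       {"build"},
    "build":      {"fetch_deps"},
    "fetch_deps": set(),
}

order = list(TopologicalSorter(plan).static_order())
```

Plan revision amounts to editing this graph (adding, removing, or re-parenting nodes) when a step fails, then re-sorting.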
Multi-Agent Coordination
by AaaS
Multi-Agent Coordination involves designing systems where multiple autonomous agents collaborate to achieve a common goal. This skill encompasses architectural patterns like hierarchical supervision and peer-to-peer negotiation for task distribution and conflict resolution. It focuses on managing shared information and ensuring coherent collective action in complex, dynamic environments.
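The peer-to-peer negotiation pattern can be sketched as a simple bidding round (contract-net flavor): each agent bids its estimated cost for a task and the lowest bidder wins. Agent names and cost rules are illustrative:

```python
# Sketch of peer-to-peer task allocation via bidding: each agent returns
# a cost estimate, and the task is assigned to the cheapest bidder.

def allocate(task: str, agents: dict) -> tuple[str, dict]:
    """Collect bids from all agents and assign the task to the lowest."""
    bids = {name: bid(task) for name, bid in agents.items()}
    winner = min(bids, key=bids.get)
    return winner, bids

agents = {
    "coder":    lambda task: 3 if "code" in task else 9,
    "writer":   lambda task: 2 if "docs" in task else 8,
    "reviewer": lambda task: 5,
}
winner, bids = allocate("write code for parser", agents)
```

Hierarchical supervision replaces the bidding round with a supervisor that assigns tasks directly.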
AgentBoard
by Ma et al. / Shanghai AI Lab
AgentBoard is a comprehensive evaluation framework for Large Language Model (LLM) based agents. It assesses agent performance across nine diverse tasks, including embodied AI, gaming, web browsing, and tool use. The framework uniquely measures both final task success and partial progress through a fine-grained sub-goal metric.
Web Browsing
by AaaS
Empowers autonomous agents to interact with the web like a human user. This skill provides the core functionality to navigate to URLs, render pages (including executing JavaScript), and parse DOM elements. It enables complex workflows such as filling out forms, clicking buttons, and extracting structured data for analysis or task completion.
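The DOM-parsing primitive can be sketched with only the standard library, here extracting link targets from a page; a real browsing agent drives a headless browser so that JavaScript also executes:

```python
from html.parser import HTMLParser

# Minimal DOM extraction: parse static HTML and collect anchor hrefs,
# the kind of primitive a web browsing agent builds on.

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs per tag.
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

page = '<html><body><a href="/docs">Docs</a><a href="/api">API</a></body></html>'
parser = LinkCollector()
parser.feed(page)
```

Form filling and clicking require the full browser layer; static parsing like this covers only non-interactive extraction.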
AgentBench
by Tsinghua University
Comprehensive benchmark evaluating LLM agents across 8 distinct environments including operating systems, databases, knowledge graphs, digital card games, lateral thinking puzzles, and web shopping. Tests generalization of agent capabilities across diverse interaction paradigms.
API-Bank
by Li et al. / Wuhan University
API-Bank is a comprehensive benchmark for evaluating tool-augmented LLMs. It features 73 diverse APIs and assesses models on three levels: API retrieval, API calling, and complex planning. The benchmark measures both the correctness of tool selection and the accuracy of execution, providing a thorough test of an agent's capabilities.
Tool Calling Setup
by AaaS
Sets up a tool-calling agent with typed tool definitions, argument validation, error handling, and execution sandboxing. Includes example tools for web search, calculator, file operations, and database queries with a pluggable tool registry.
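The pluggable registry can be sketched with a decorator that registers tools by name and a caller that validates argument names against each tool's signature. The calculator tool is illustrative:

```python
import inspect

# Sketch of a pluggable tool registry: tools self-register via a
# decorator; calls are checked against the declared parameters.

REGISTRY = {}

def tool(fn):
    """Register a function as a callable tool under its own name."""
    REGISTRY[fn.__name__] = fn
    return fn

@tool
def calculator(expression: str) -> float:
    # Restricted eval: digits and arithmetic characters only.
    if not set(expression) <= set("0123456789+-*/(). "):
        raise ValueError("unsupported expression")
    return eval(expression)

def call_tool(name: str, **kwargs):
    """Validate argument names against the tool signature, then execute."""
    fn = REGISTRY[name]
    allowed = set(inspect.signature(fn).parameters)
    unknown = set(kwargs) - allowed
    if unknown:
        raise TypeError(f"unknown arguments: {unknown}")
    return fn(**kwargs)

value = call_tool("calculator", expression="2 * (3 + 4)")
```

Sandboxed execution would wrap `call_tool` in a subprocess or container boundary rather than running tools in-process.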
Multi-Agent Orchestration
by AaaS
Orchestrates multiple specialized AI agents in coordinated workflows with task routing, state management, and result aggregation. Implements supervisor and swarm patterns with configurable agent selection logic and inter-agent communication.
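The supervisor pattern can be sketched as a router that inspects each task, forwards it to the matching specialist, and aggregates the results; the keyword rule and agent functions stand in for LLM-backed workers:

```python
# Sketch of the supervisor pattern: route each task to a specialist
# agent by keyword and collect (agent, result) pairs.

def research_agent(task): return f"findings on {task}"
def summary_agent(task): return f"summary of {task}"

def supervisor(tasks: list[str]) -> list[tuple[str, str]]:
    """Route each task to a specialist and aggregate results."""
    results = []
    for task in tasks:
        agent = research_agent if "research" in task else summary_agent
        results.append((agent.__name__, agent(task)))
    return results

out = supervisor(["research RAG papers", "summarize meeting notes"])
```

In the swarm pattern, by contrast, agents hand tasks to each other directly instead of returning to a central router.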
MCP Server Template
by AaaS
Template for building Model Context Protocol (MCP) servers that expose tools, resources, and prompts to MCP-compatible clients. Includes typed tool handlers, resource providers, error handling, and transport configuration for stdio and HTTP modes.
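The shape of such a server can be sketched as a JSON-RPC request handler answering the spec's `tools/list` and `tools/call` methods; this is a simplified illustration that omits initialization, transports, and the error framing a real MCP server (e.g. one built on the official SDK) provides:

```python
# Simplified sketch of an MCP-style JSON-RPC handler: list registered
# tools and dispatch tool calls; unknown methods get a JSON-RPC error.

TOOLS = {
    "echo": {
        "description": "Echo the input text back.",
        "handler": lambda args: args.get("text", ""),
    }
}

def handle(request: dict) -> dict:
    """Handle one JSON-RPC request dict and return the response dict."""
    if request["method"] == "tools/list":
        result = {"tools": [{"name": n, "description": t["description"]}
                            for n, t in TOOLS.items()]}
    elif request["method"] == "tools/call":
        params = request["params"]
        result = {"content": TOOLS[params["name"]]["handler"](params["arguments"])}
    else:
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "method not found"}}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}

resp = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
               "params": {"name": "echo", "arguments": {"text": "hi"}}})
```

A stdio transport would read one JSON request per line from stdin and write each response to stdout; HTTP mode wraps the same handler in a web endpoint.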
Agent Evaluation Framework
by AaaS
Evaluates AI agent performance across defined test scenarios with success criteria, step tracking, and automated scoring. Supports custom evaluation rubrics, regression detection, and generates detailed reports comparing agent versions over time.
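Scenario-based scoring can be sketched by running each scenario's output through its success criterion and aggregating a pass rate; the scenarios and the fake agent are illustrative:

```python
# Sketch of scenario-based agent evaluation: each scenario pairs a task
# with a success check; results are aggregated into an overall rate.

def fake_agent(task: str) -> str:
    # Stand-in for a real agent under test.
    return "42" if "answer" in task else "unknown"

scenarios = [
    {"task": "what is the answer", "check": lambda out: out == "42"},
    {"task": "book a flight",      "check": lambda out: "confirmed" in out},
]

def evaluate(agent, scenarios) -> tuple[list[bool], float]:
    """Return per-scenario pass/fail flags and the overall success rate."""
    results = [scn["check"](agent(scn["task"])) for scn in scenarios]
    return results, sum(results) / len(results)

results, rate = evaluate(fake_agent, scenarios)
```

Regression detection falls out of this directly: run the same scenarios against two agent versions and compare the rates.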
Agent Deployment Script
by AaaS
Deploys AI agents as production services with health checks, graceful shutdown, error recovery, and monitoring integration. Supports Docker and Kubernetes deployments with configurable scaling, environment management, and rollback capabilities.
Adept AI
by Adept AI
Adept AI builds AI systems that can take actions in software to complete complex multi-step workflows on behalf of users. The company focuses on general-purpose action models trained to interact with real-world software interfaces through browser and desktop automation.
Agent Monitoring Dashboard
by AaaS
Sets up a monitoring dashboard for AI agent systems tracking task completion rates, error rates, latency, token usage, and cost. Integrates with Prometheus for metrics collection and Grafana for visualization with pre-built alert rules.
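The raw numbers such a dashboard visualizes can be sketched as an in-process metrics object counting outcomes and latencies, which a Prometheus exporter would then expose; the class and thresholds are illustrative:

```python
# Sketch of in-process agent metrics: count task outcomes and record
# latencies, the raw series a Prometheus exporter would publish.

class AgentMetrics:
    def __init__(self):
        self.completed = 0
        self.failed = 0
        self.latencies = []

    def record(self, ok: bool, seconds: float):
        """Record one task outcome and its wall-clock latency."""
        if ok:
            self.completed += 1
        else:
            self.failed += 1
        self.latencies.append(seconds)

    def error_rate(self) -> float:
        total = self.completed + self.failed
        return self.failed / total if total else 0.0

m = AgentMetrics()
m.record(True, 0.8)
m.record(True, 1.2)
m.record(False, 3.0)
```

Pre-built alert rules then fire on derived values, e.g. when `error_rate()` exceeds a threshold over a time window.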