
AI Agents

Autonomous agents, assistants, and multi-agent systems

26 entities indexed

Skill · AI Agents

Tool Use

by AaaS

Equips AI agents with the ability to select and use appropriate tools from a defined toolkit to accomplish tasks. Covers tool selection logic, input marshalling, output interpretation, and fallback strategies when tools fail or return unexpected results.

tools, agents, integration
54C+
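Here is a minimal sketch of the fallback strategy described above. It assumes tools are plain Python callables tried in priority order, and that a caller-supplied `validate` predicate defines what counts as an expected result. `call_with_fallback`, `ToolError`, and the two lookup tools are hypothetical names for illustration. They are not part of the AaaS skill.

```python
from typing import Any, Callable, Sequence


class ToolError(Exception):
    """Raised when every tool in the fallback chain fails."""


def call_with_fallback(
    tools: Sequence[tuple[str, Callable[..., Any]]],
    args: dict[str, Any],
    validate: Callable[[Any], bool],
) -> tuple[str, Any]:
    """Try tools in priority order and return (tool_name, result) for the first
    one that neither raises nor returns a result rejected by `validate`."""
    errors: list[str] = []
    for name, tool in tools:
        try:
            result = tool(**args)
        except Exception as exc:  # tool failed outright: note it and fall back
            errors.append(f"{name}: raised {exc!r}")
            continue
        if not validate(result):  # unexpected output: note it and fall back
            errors.append(f"{name}: unexpected result {result!r}")
            continue
        return name, result
    raise ToolError("all tools failed: " + "; ".join(errors))


# Hypothetical tools: a primary API that times out and a cached fallback.
def primary_lookup(city: str) -> float:
    raise TimeoutError("upstream timed out")


def cached_lookup(city: str) -> float:
    return {"Paris": 18.5}.get(city, float("nan"))  # NaN means "no data"


name, temp = call_with_fallback(
    [("primary", primary_lookup), ("cache", cached_lookup)],
    {"city": "Paris"},
    validate=lambda r: isinstance(r, float) and r == r,  # r == r is False for NaN
)
print(name, temp)  # cache 18.5
```

The error list records every failure reason, not just the last one. That gives the agent enough context to decide whether to retry, rephrase the request, or give up.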
Paper · AI Agents

Toolformer: Language Models Can Teach Themselves to Use Tools

by Meta AI

Presents Toolformer, a model that learns to use external tools (APIs) in a self-supervised manner without requiring human annotations. The model decides which APIs to call, how to call them, and how to incorporate results, achieving strong performance across diverse tasks while maintaining generative language modeling ability.

tool-use, self-supervised, api-calling
51C+
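The self-supervised signal comes from a filtering step. A sampled API call is kept only if prefixing the call and its result lowers the model's loss on the following tokens by at least a threshold τ. The comparison is against the better of two alternatives: making no call, or making the call without its result. The function below sketches that test on precomputed loss values. Computing the losses themselves (a weighted cross-entropy over subsequent tokens in the paper) is out of scope, and the function name and numbers are illustrative.

```python
def keep_api_call(loss_with_result: float,
                  loss_no_call: float,
                  loss_call_no_result: float,
                  tau: float) -> bool:
    """Toolformer-style filter on precomputed losses over the tokens after the call.

    loss_with_result     L+: loss when the API call and its result are prefixed
    loss_no_call         loss when no API call is inserted
    loss_call_no_result  loss when the call is inserted but its result is empty
    tau                  filtering threshold
    """
    loss_minus = min(loss_no_call, loss_call_no_result)  # L-
    return loss_minus - loss_with_result >= tau


print(keep_api_call(1.2, 2.0, 1.9, tau=0.5))  # True: the result cuts the loss by ~0.7
print(keep_api_call(1.8, 2.0, 1.9, tau=0.5))  # False: improvement ~0.1 is below tau
```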
Paper · AI Agents

Voyager: An Open-Ended Embodied Agent with Large Language Models

by NVIDIA / Caltech / UT Austin

Presents Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager uses an automatic curriculum, an ever-growing skill library of executable code, and an iterative prompting mechanism to overcome failures.

agents, minecraft, lifelong-learning
50C+
Paper · AI Agents

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

by Princeton NLP / Princeton Language and Intelligence

Introduces SWE-agent, which defines Agent-Computer Interfaces (ACIs) to enable LLMs to autonomously solve real GitHub issues by browsing codebases, editing files, and running tests. On the SWE-bench benchmark, SWE-agent with GPT-4 Turbo resolves 12.5% of issues, significantly outperforming prior methods.

agents, software-engineering, code
50C+
Paper · AI Agents

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

by University of Washington / IBM AI Research / Allen AI

Introduces Self-RAG, a framework that trains a single LM to adaptively retrieve passages on demand, generate text, and critique its own outputs using special reflection tokens. Unlike standard RAG, Self-RAG decides when to retrieve and reflects on retrieved passages and generation quality, outperforming ChatGPT and standard RAG on diverse downstream tasks.

rag, self-reflection, critique
49C
Skill · AI Agents

Planning

by AaaS

Enables agents to create structured execution plans for multi-step tasks by analyzing goals, identifying sub-tasks, ordering dependencies, and allocating resources. Supports plan revision when steps fail or new information emerges during execution.

planning, strategy, task-management
47C
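Ordering sub-tasks by their dependencies is a topological sort. Below is a minimal sketch using Python's standard `graphlib` module. It assumes the plan is given as a mapping from each sub-task to the set of sub-tasks it depends on. `plan_stages` and the example task names are hypothetical. Grouping ready tasks into stages also shows which steps could run in parallel.

```python
from graphlib import CycleError, TopologicalSorter


def plan_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group sub-tasks into stages; every task in a stage depends only on
    tasks from earlier stages, so tasks within a stage could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the plan has circular dependencies
    stages: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        stages.append(ready)
        sorter.done(*ready)  # mark the stage complete, unlocking dependents
    return stages


# Hypothetical goal "publish a report": each task maps to its prerequisites.
plan = {
    "collect_data": set(),
    "clean_data": {"collect_data"},
    "draft_charts": {"clean_data"},
    "write_summary": {"clean_data"},
    "publish": {"draft_charts", "write_summary"},
}
print(plan_stages(plan))
# [['collect_data'], ['clean_data'], ['draft_charts', 'write_summary'], ['publish']]
```

Plan revision fits the same structure. When a step fails, the agent can edit the dependency mapping (add a recovery step, drop a branch) and recompute the stages. `prepare()` rejects any revised plan that introduces a cycle.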
Benchmark · AI Agents

ToolBench

by Qin et al. / Tsinghua University

ToolBench evaluates LLMs on their ability to use real-world REST APIs to complete user instructions. It provides 16,000+ real APIs from RapidAPI Hub across 49 categories and 12,000+ instruction–API solution pairs, measuring whether models can plan and execute multi-step API call sequences.

tool-use, api, agents
47C
Paper · AI Agents

Improving Language Models by Retrieving from Trillions of Tokens

by DeepMind

Presents RETRO (Retrieval-Enhanced Transformer), a model that conditions on passages retrieved from a 2-trillion-token database and incorporates them through chunked cross-attention. RETRO achieves performance comparable to GPT-3 and Jurassic-1 on the Pile while using 25× fewer parameters, demonstrating that retrieval augmentation is a compute-efficient alternative to scaling parameter count.

rag, retrieval, language-model
47C
Skill · AI Agents

Web Browsing

by AaaS

Empowers autonomous agents to interact with the web as a human user would. This skill provides the core functionality to navigate to URLs, render pages (including executing JavaScript), and parse DOM elements. It enables complex workflows such as filling out forms, clicking buttons, and extracting structured data for analysis or task completion.

browsing, web, navigation
45C
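The standard library's `html.parser` can illustrate the DOM-parsing part of this skill. This sketch only handles static HTML. Rendering pages and executing JavaScript require a real browser engine (for example a headless browser), which is not shown. `FormAndLinkExtractor` is a hypothetical helper that collects link targets and the named fields of each form. That is the raw material an agent needs before it can fill out a form or follow a link.

```python
from html.parser import HTMLParser


class FormAndLinkExtractor(HTMLParser):
    """Collects link targets and named form fields from static HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.forms: list[dict] = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute values may be None for bare attributes
        if tag == "a" and a.get("href"):
            self.links.append(a["href"])
        elif tag == "form":
            self._in_form = True
            self.forms.append({"action": a.get("action") or "",
                               "method": (a.get("method") or "get").lower(),
                               "fields": []})
        elif tag in ("input", "select", "textarea") and self._in_form and a.get("name"):
            self.forms[-1]["fields"].append(a["name"])  # unnamed inputs are skipped

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


page = """
<a href="/login">Log in</a>
<form action="/search" method="POST">
  <input name="q" type="text"><select name="sort"></select>
  <input type="submit">
</form>
"""
parser = FormAndLinkExtractor()
parser.feed(page)
print(parser.links)  # ['/login']
print(parser.forms)  # [{'action': '/search', 'method': 'post', 'fields': ['q', 'sort']}]
```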
Benchmark · AI Agents

WebArena

by CMU

WebArena is a realistic and reproducible benchmark environment designed to evaluate autonomous language agents. It tests an agent's ability to perform complex, multi-step tasks across a diverse set of self-hosted websites, including e-commerce, forums, and content management systems, using real web interfaces.

benchmark, agent-evaluation, web-benchmark
44C
Skill · AI Agents

Reflection

by AaaS

Allows agents to evaluate their own outputs, identify errors or weaknesses, and iteratively improve responses. Implements self-critique loops where the agent reviews its work against quality criteria and refines until standards are met.

reflection, self-evaluation, metacognition
43C
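Here is a minimal sketch of the self-critique loop. It assumes the generator and critic are plain functions; in practice each would usually be an LLM call. The critic returns a list of unmet quality criteria, which is empty when the output passes. `reflect_and_refine`, `toy_generate`, and `toy_critique` are hypothetical. The `max_rounds` cap keeps the loop from running forever when the criteria are never met.

```python
from typing import Callable


def reflect_and_refine(
    task: str,
    generate: Callable[[str, str, list[str]], str],
    critique: Callable[[str], list[str]],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Draft, critique, and revise until the critic reports no issues or the
    round budget runs out. Returns (final_output, rounds_used)."""
    output = ""
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        output = generate(task, output, feedback)  # first draft or revision
        feedback = critique(output)                # unmet quality criteria
        if not feedback:                           # standards met: stop early
            return output, round_no
    return output, max_rounds  # budget exhausted: best-effort output


# Hypothetical stand-ins for LLM calls.
def toy_generate(task: str, previous: str, feedback: list[str]) -> str:
    text = previous or "the capital of France is paris"
    if "capitalize first letter" in feedback:
        text = text[0].upper() + text[1:]
    if "end with a period" in feedback:
        text += "."
    return text


def toy_critique(text: str) -> list[str]:
    issues = []
    if not text[:1].isupper():
        issues.append("capitalize first letter")
    if not text.endswith("."):
        issues.append("end with a period")
    return issues


final, rounds = reflect_and_refine("capital of France", toy_generate, toy_critique)
print(final, rounds)  # The capital of France is paris. 2
```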
Script · AI Agents

Tool Calling Setup

by AaaS

Sets up a tool-calling agent with typed tool definitions, argument validation, error handling, and execution sandboxing. Includes example tools for web search, calculator, file operations, and database queries with a pluggable tool registry.

script, automation, tool-calling
42C
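Here is a minimal sketch of a typed tool registry with argument validation and error handling. It assumes each tool declares its arguments as a mapping from argument name to Python type. `Tool`, `ToolRegistry`, and the `divide` tool are hypothetical names, not the script's actual API, and execution sandboxing is not shown. Errors come back in a result envelope rather than being raised, so the calling model can read the message and correct its arguments.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    params: dict[str, type]  # argument name -> expected Python type
    fn: Callable[..., Any]


@dataclass
class ToolRegistry:
    tools: dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        if tool.name in self.tools:
            raise ValueError(f"duplicate tool name: {tool.name}")
        self.tools[tool.name] = tool

    def call(self, name: str, args: dict[str, Any]) -> dict[str, Any]:
        """Validate arguments, run the tool, and wrap the outcome in an envelope."""
        tool = self.tools.get(name)
        if tool is None:
            return {"ok": False, "error": f"unknown tool: {name}"}
        missing = tool.params.keys() - args.keys()
        extra = args.keys() - tool.params.keys()
        if missing or extra:
            return {"ok": False, "error": f"missing={sorted(missing)} extra={sorted(extra)}"}
        for key, expected in tool.params.items():
            if not isinstance(args[key], expected):
                return {"ok": False, "error": f"{key} must be {expected.__name__}"}
        try:
            return {"ok": True, "result": tool.fn(**args)}
        except Exception as exc:  # surface runtime failures to the model
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


registry = ToolRegistry()
registry.register(Tool("divide", "Divide a by b.", {"a": float, "b": float},
                       lambda a, b: a / b))
print(registry.call("divide", {"a": 1.0, "b": 4.0}))  # {'ok': True, 'result': 0.25}
print(registry.call("divide", {"a": 1.0}))            # {'ok': False, 'error': "missing=['b'] extra=[]"}
print(registry.call("divide", {"a": 1.0, "b": 0.0}))  # ok is False, ZeroDivisionError reported
```

The `isinstance` check is deliberately strict: an `int` is rejected where a `float` is declared. A fuller implementation might instead validate against a JSON Schema generated from the same definitions.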
Skill · AI Agents

Tool Selection Strategy

by AaaS

Covers heuristics and learned strategies for agents to select the right tool from a large catalog given a task description, including embedding-based tool retrieval, LLM-based routing, and multi-step tool chaining. Teaches fallback hierarchies, tool description engineering, and cost-aware selection to minimize unnecessary API calls.

tool-use, routing, tool-selection
42C
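Here is a minimal sketch of embedding-based tool retrieval with a cost penalty. To stay self-contained, it uses a toy bag-of-words vector in place of a real embedding model. `embed`, `select_tools`, the catalog entries, their costs, and `cost_weight` are all hypothetical. Each tool's score is the cosine similarity between the task and the tool's description, minus `cost_weight` times the tool's per-call cost. When relevance is similar, the cheaper tool wins.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_tools(query: str, catalog: dict[str, tuple[str, float]],
                 k: int = 2, cost_weight: float = 0.1) -> list[str]:
    """Rank tools by similarity to the query minus a per-call cost penalty."""
    q = embed(query)
    scored = [(cosine(q, embed(desc)) - cost_weight * cost, name)
              for name, (desc, cost) in catalog.items()]
    scored.sort(reverse=True)  # highest adjusted score first
    return [name for _, name in scored[:k]]


catalog = {  # hypothetical tools: name -> (description, relative cost per call)
    "web_search": ("search the web for current news and pages", 1.0),
    "weather_api": ("get the current weather forecast for a city", 0.5),
    "calculator": ("evaluate an arithmetic expression", 0.0),
}
print(select_tools("what is the weather forecast in Paris", catalog, k=1))  # ['weather_api']
```

With no lexical overlap, as in `select_tools("hello", catalog, k=1)`, every similarity is zero and the cost term alone picks the free `calculator`. The example also shows why tool description engineering matters: retrieval can only match terms (or, with a real embedding model, meanings) that the description actually contains.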
Benchmark · AI Agents

TAU-bench

by Sierra AI

Tool-Agent-User benchmark evaluating AI agents on realistic customer service scenarios requiring multi-step tool use. Tests agents' ability to navigate complex workflows, use tools correctly, follow policies, and handle edge cases in airline and retail domains.

benchmark, evaluation, agents
41C
Benchmark · AI Agents

MLAgentBench

by Huang et al. / Stanford

MLAgentBench challenges AI agents to carry out machine learning experimentation autonomously: reading and writing files, executing code, inspecting outputs, and iterating to improve model performance. Its 13 tasks range from improving model performance on CIFAR-10 to recent research problems such as BabyLM.

agents, ml-research, coding
41C
Script · AI Agents

Multi-Agent Orchestration

by AaaS

Orchestrates multiple specialized AI agents in coordinated workflows with task routing, state management, and result aggregation. Implements supervisor and swarm patterns with configurable agent selection logic and inter-agent communication.

script, automation, multi-agent
40C
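Here is a minimal sketch of the supervisor pattern. A routing function picks a specialist agent for each task, the agents share a state dictionary, and the supervisor aggregates their outputs. Agents here are plain functions and routing is a keyword rule; both stand in for LLM-backed components. `supervisor`, `researcher`, `writer`, and `route` are hypothetical names. The swarm pattern, where agents hand off to one another directly, is not shown.

```python
from typing import Callable, Optional

Agent = Callable[[str, dict], str]


def supervisor(tasks: list[str], agents: dict[str, Agent],
               route: Callable[[str], str],
               state: Optional[dict] = None) -> list[tuple[str, str, str]]:
    """Route each task to one specialist agent, pass it the shared state,
    and aggregate (task, agent_name, output) rows."""
    state = {} if state is None else state
    results = []
    for task in tasks:
        name = route(task)  # task routing (keyword rule here; could be an LLM call)
        agent = agents.get(name)
        if agent is None:
            results.append((task, name, "error: no such agent"))
            continue
        output = agent(task, state)
        state[task] = output  # later agents can read earlier results
        results.append((task, name, output))
    return results


# Hypothetical specialist agents and routing rule.
def researcher(task: str, state: dict) -> str:
    return f"notes({task})"


def writer(task: str, state: dict) -> str:
    return f"draft using {len(state)} prior result(s)"


def route(task: str) -> str:
    return "researcher" if task.startswith("research") else "writer"


for row in supervisor(["research pricing", "write summary"],
                      {"researcher": researcher, "writer": writer}, route):
    print(row)
# ('research pricing', 'researcher', 'notes(research pricing)')
# ('write summary', 'writer', 'draft using 1 prior result(s)')
```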
Script · AI Agents

MCP Server Template

by AaaS

Template for building Model Context Protocol (MCP) servers that expose tools, resources, and prompts to MCP-compatible clients. Includes typed tool handlers, resource providers, error handling, and transport configuration for stdio and HTTP modes.

script, automation, mcp
39D
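Here is a minimal, hand-rolled sketch of the tool-handling core of an MCP server over the stdio transport, which exchanges newline-delimited JSON-RPC 2.0 messages. The `tools/list` and `tools/call` method names and the `content`/`isError` result shape follow the MCP specification. The `add` tool and all function names are hypothetical. The sketch omits the `initialize` handshake, capability negotiation, resources, prompts, and the HTTP transport, and it does not use the official MCP SDKs. Check it against the current specification before relying on it.

```python
import json
import sys
from typing import Any, Optional

# Hypothetical tool table: name -> description, JSON Schema for arguments, handler.
TOOLS: dict[str, dict[str, Any]] = {
    "add": {
        "description": "Add two numbers.",
        "inputSchema": {"type": "object",
                        "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
                        "required": ["a", "b"]},
        "handler": lambda args: str(args["a"] + args["b"]),
    }
}


def handle_message(line: str) -> Optional[str]:
    """Handle one JSON-RPC message; return the serialized reply, or None for notifications."""
    msg = json.loads(line)
    if "id" not in msg:  # notifications never get a response
        return None
    reply: dict[str, Any] = {"jsonrpc": "2.0", "id": msg["id"]}
    method, params = msg.get("method"), msg.get("params") or {}
    if method == "tools/list":
        reply["result"] = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()]}
    elif method == "tools/call" and params.get("name") in TOOLS:
        try:
            text, failed = TOOLS[params["name"]]["handler"](params.get("arguments") or {}), False
        except Exception as exc:  # tool errors are reported in-band via isError
            text, failed = f"tool failed: {exc!r}", True
        reply["result"] = {"content": [{"type": "text", "text": text}], "isError": failed}
    elif method == "tools/call":
        reply["error"] = {"code": -32602, "message": f"unknown tool: {params.get('name')}"}
    else:
        reply["error"] = {"code": -32601, "message": f"method not found: {method}"}
    return json.dumps(reply)


def serve_stdio() -> None:
    """Read newline-delimited JSON-RPC from stdin and write replies to stdout."""
    for line in sys.stdin:
        if line.strip():
            out = handle_message(line)
            if out is not None:
                print(out, flush=True)


print(handle_message('{"jsonrpc": "2.0", "id": 1, "method": "tools/call", '
                     '"params": {"name": "add", "arguments": {"a": 2, "b": 3}}}'))
# {"jsonrpc": "2.0", "id": 1, "result": {"content": [{"type": "text", "text": "5"}], "isError": false}}
```

Calling `serve_stdio()` from the program's entry point would connect `handle_message` to a client that launches the server as a subprocess. Tool failures are reported inside a successful result with `isError: true`, so the model sees them. Protocol problems, such as an unknown method or tool, become JSON-RPC errors.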
Benchmark · AI Agents

OSWorld

by University of Hong Kong

Benchmark for evaluating multimodal agents on real operating system tasks spanning Ubuntu, Windows, and macOS environments. Tests agents' ability to interact with desktop applications, file systems, terminals, and GUI elements to complete everyday computer tasks.

benchmark, evaluation, agents
39D
Script · AI Agents

Agent Monitoring Dashboard

by AaaS

Sets up a monitoring dashboard for AI agent systems tracking task completion rates, error rates, latency, token usage, and cost. Integrates with Prometheus for metrics collection and Grafana for visualization with pre-built alert rules.

script, automation, monitoring
35D
Script · AI Agents

Agent Testing Harness

by AaaS

Testing harness for AI agents with mock tool providers, simulated user interactions, and deterministic replay capabilities. Enables unit testing of agent logic, integration testing of tool chains, and end-to-end testing of complete agent workflows.

script, automation, testing
34D
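Here is a minimal sketch of record-and-replay testing. It assumes the agent reaches tools only through a provider object with a `call(name, **args)` method. A recording provider wraps the live tools and logs each call, and the log is serialized as a "cassette". Later, a replay provider serves the same results in the same order and fails the test if the agent makes a different call. `RecordingToolProvider`, `ReplayToolProvider`, `agent_under_test`, and the 0.9 conversion rate are hypothetical.

```python
import json
import random
from typing import Any, Callable


class RecordingToolProvider:
    """Wraps live tools and records every (tool, args, result) triple."""

    def __init__(self, tools: dict[str, Callable[..., Any]]):
        self.tools = tools
        self.log: list[dict[str, Any]] = []

    def call(self, name: str, **args: Any) -> Any:
        result = self.tools[name](**args)
        self.log.append({"tool": name, "args": args, "result": result})
        return result


class ReplayToolProvider:
    """Serves recorded results in order and fails if the agent deviates."""

    def __init__(self, log: list[dict[str, Any]]):
        self.log = list(log)
        self.pos = 0

    def call(self, name: str, **args: Any) -> Any:
        if self.pos >= len(self.log):
            raise AssertionError(f"unexpected extra call: {name}({args})")
        expected = self.log[self.pos]
        if expected["tool"] != name or expected["args"] != args:
            raise AssertionError(f"call {self.pos}: expected {expected['tool']}"
                                 f"({expected['args']}), got {name}({args})")
        self.pos += 1
        return expected["result"]


def agent_under_test(tools) -> str:
    """Hypothetical agent logic: look up a price, then convert currency."""
    usd = tools.call("get_price", sku="A1")
    eur = tools.call("convert", amount=usd, to="EUR")
    return f"A1 costs {eur} EUR"


live = RecordingToolProvider({
    "get_price": lambda sku: round(random.uniform(10, 20), 2),  # nondeterministic "live" tool
    "convert": lambda amount, to: round(amount * 0.9, 2),
})
first = agent_under_test(live)
cassette = json.dumps(live.log)  # persist the recording, e.g. as a test fixture
replayed = agent_under_test(ReplayToolProvider(json.loads(cassette)))
assert replayed == first  # same tool results -> same agent output
```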
Provider · AI Agents

Adept AI

by Adept AI

Adept AI builds AI systems that can take actions in software to complete complex multi-step workflows on behalf of users. The company focuses on general-purpose action models trained to interact with real-world software interfaces through browser and desktop automation.

agents, computer-use, workflow-automation
32D
Agent · AI Agents

LyricLoom

by SonicCraft Studios

A creative voice agent specializing in generating original spoken word content, from podcasts to audiobooks, with customizable voices and styles.

AI agent, voice AI, content creation
18F
Agent · AI Agents

AuraSpeak

by Vocalix Technologies

A next-generation voice agent framework for building highly conversational and context-aware AI assistants across various platforms.

AI agent, voice AI, conversational AI
18F
Agent · AI Agents

DataScout AI

by ScoutLogic Corp.

An enterprise-grade browser agent for automated data collection and analysis from public web sources, ensuring compliance and scalability.

AI agent, browser automation, enterprise
16F
Agent · AI Agents

TaskWeaver

by AutoFlow Personal

A personal browser agent that learns user habits to automate repetitive online tasks, from managing emails to booking appointments and comparing prices.

AI agent, browser automation, personal assistant
12F
Agent · AI Agents

BugFixer Bot

by DebugAI Solutions

An AI-powered debugging agent that automatically identifies, diagnoses, and suggests fixes for code errors across multiple programming languages.

AI agent, coding, debugging
11F