Benchmark Report

Top of the CLASS: Benchmarking LLM Agents on Real-World Enterprise Tasks

Authors

Michael Wornow, Ph.D. Student at Stanford University
Vaishnav Garodia, M.S. Computer Science at Stanford University
Vasilis Vassalos, Sr. Director of AI/ML | Aisera
Utkarsh Contractor, Field CTO | Aisera

Aisera’s AI Agents, built using domain-specific LLMs, set the standard, outperforming general-purpose models in real-world use cases!

Models compared: Aisera Agents, GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro

AI agents are reshaping enterprise work. As LLMs draw growing industry attention, how can organizations identify the best fit for their use case or domain? Traditional evaluation methods often rely on synthetic data or artificial scenarios and so fail to reflect real-world use cases in domains such as IT and HR.

Download this report to:

  • Learn about the CLASSic framework for agentic AI benchmarking
  • Get a comprehensive report with results and analysis
  • Understand the implications and tradeoffs for implementing agentic AI

Coming soon: algorithms and datasets to evaluate AI agents of your choice.

Accepted as a conference paper at ICLR 2025