Benchmark Report
Top of the CLASS: Benchmarking LLM Agents on Real-World Enterprise Tasks
Authors

Michael Wornow, Ph.D. Student at Stanford University

Vaishnav Garodia, M.S. Computer Science at Stanford University

Vasilis Vassalos, Sr. Director of AI/ML | Aisera

Utkarsh Contractor, Field CTO | Aisera
Aisera’s AI Agents, built using domain-specific LLMs, set the standard, outperforming general-purpose models in real-world use cases!
[Benchmark chart: Aisera Agents compared with GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro]
AI agents are reshaping enterprise work. As LLMs become increasingly focused on specific industries, how can organizations identify the best fit for their use case or domain? Traditional evaluation methods often rely on synthetic data or artificial scenarios, failing to reflect real-world use cases across domains such as IT and HR.
Download this report to:
- Learn about the CLASSic framework for agentic AI benchmarking
- Get a comprehensive report with results and analysis
- Understand the implications and tradeoffs for implementing agentic AI
Coming soon: algorithms and datasets for evaluating AI agents of your choice.
Download Report
Accepted as a conference paper at ICLR 2025