New ITBench-AA Benchmark Reveals Frontier Models Struggle with Agentic Enterprise IT Tasks

Artificial Analysis and IBM have introduced ITBench-AA, the first benchmark specifically designed for agentic enterprise IT tasks. Initial results show that frontier AI models score below 50% on the benchmark, leaving significant room for improvement in handling complex IT operations.

Source: Hugging Face