Artificial Analysis and IBM have introduced ITBench-AA, the first benchmark designed specifically for agentic enterprise IT tasks. In initial results, frontier AI models score below 50%, leaving significant room for improvement on complex IT operations.
Source: Hugging Face