Microsoft Introduces Open-Source Tool for AI Behavior Testing
Microsoft has unveiled Adaptive Spec-driven Scoring for Evaluation and Regression Testing (ASSERT), an open-source framework enabling developers to cr...
Simon Willison announced the alpha release of datasette-agent-micropython 0.1a0, a tool aiming to enable safe generation and execution of Python code ...
JetBrains has introduced Mellum2, a 12B Mixture-of-Experts model, marking a new development in the field of large language models. Further specifics o...
Mistral AI has announced the launch of Mistral 3, signifying a new generation or major update to their flagship AI model. This release is expected to ...
Mistral AI has announced the release of Mistral Medium 3.5, indicating an update to their suite of AI models. This new iteration likely brings perform...
Mistral AI has unveiled Mistral Small 4, introducing another new model to its growing portfolio. This release suggests a focus on providing efficient ...
Anthropic is emphasizing the enhanced 'honesty' of its new Claude Opus 4.8 model, stating that it's trained to avoid making unsupported claims. This a...
Anthropic has released Claude Opus 4.7, its latest foundational AI model, promising stronger performance across coding, agent capabilities, vision, an...
Artificial Analysis and IBM have introduced ITBench-AA, the first benchmark specifically designed for agentic enterprise IT tasks. Initial results sho...
Human Archive, a startup founded by UC Berkeley and Stanford researchers, is leveraging India's gig economy to collect crucial physical training data ...