OpenAI Ceases Evaluation on SWE-bench Verified Due to Contamination Concerns

MACHINE LEARNING February 24, 2026

OpenAI has ceased evaluating its AI models on SWE-bench Verified due to concerns about data contamination, flawed tests, and training leakage. The company states that the benchmark is no longer an accurate measure of frontier coding progress and recommends transitioning to SWE-bench Pro for more reliable evaluations.