OpenAI has stopped evaluating its AI models on SWE-bench Verified, citing data contamination, flawed tests, and training leakage. The company says the benchmark no longer accurately measures frontier coding progress and recommends moving to SWE-bench Pro for more reliable evaluations.
Source: OpenAI