Now, Claude Sonnet 4.5 has lapped that last model, outperforming it on the SWE-bench Verified evaluation, a human-filtered subset of SWE-bench. Claude Sonnet 4.5 also outperformed leading models ...
Between AD Mitchell's silly fumble and penalty, Xavien Howard's coverage and Lou Anarumo's inability to count to 11, the ...