Math and Music Models

DeepSeek claims its ‘reasoning’ model beats OpenAI’s o1 on certain benchmarks

According to DeepSeek, R1 beats o1 on the benchmarks AIME, MATH-500, and SWE-bench Verified. AIME draws its problems from the American Invitational Mathematics Examination, a competition-level math exam, while MATH-500 is a collection of word ...
