News
For the first time, large language models performed on a par with gold medallists in the International Mathematical Olympiad.
Anthropic research reveals AI models perform worse with extended reasoning time, challenging industry assumptions about test-time compute scaling in enterprise deployments.