News

For the first time, large language models performed on a par with gold medallists in the International Mathematical Olympiad.
Anthropic research finds that AI models can perform worse when given extended reasoning time, challenging industry assumptions about test-time compute scaling in enterprise deployments.