The DeepSeek R1 developers relied mostly on reinforcement learning (RL) to improve the model’s reasoning abilities. This training method uses a reward system to give the model feedback on its outputs, which made ...