Market misconception (Score: 1)
The reaction here is that it took fewer GPUs to TRAIN the model, but that's simply because, by using strictly reinforcement learning, they were able to skip the computationally expensive step of training on a supervised learning dataset (not counting the "cold start" dataset). What the market reactors don't realize is that it's inference, not training, where the vast majority of GPU compute will go.
The fact is, DeepSeek-R1 actually requires MORE GPU processing power during inference, because it does all its thinking in real time, which requires far more tokens than an equivalent Llama or Qwen (or any other current LLM) would generate for the same prompt.
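Rough back-of-envelope sketch in Python, just to show how the cost scales with output length. The token counts are made up for illustration, and it assumes the usual ~2 * active-params FLOPs-per-token estimate and roughly 37B activated parameters per token for R1's MoE architecture:

# Transformer inference costs roughly 2 * active_params FLOPs per
# generated token, so total cost scales linearly with output length.
ACTIVE_PARAMS = 37e9            # assumed ~37B activated params per token
FLOPS_PER_TOKEN = 2 * ACTIVE_PARAMS

plain_llm_tokens = 500          # hypothetical: direct answer, no thinking
reasoning_tokens = 5000         # hypothetical: answer plus "thinking" tokens

plain_cost = FLOPS_PER_TOKEN * plain_llm_tokens
reasoning_cost = FLOPS_PER_TOKEN * reasoning_tokens

print(f"plain LLM:       {plain_cost:.2e} FLOPs")
print(f"reasoning model: {reasoning_cost:.2e} FLOPs "
      f"({reasoning_cost / plain_cost:.0f}x more)")

Same model size, same prompt, but ten times the generated tokens means ten times the inference compute.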