Comment Market misconception (Score 1) 77

The reaction here is that it took fewer GPUs to TRAIN the model, but that's simply because, by using strictly reinforcement learning, they were able to skip the computationally expensive step of training on a supervised learning dataset (not counting the "cold start" dataset). What the market doesn't realize is that it's inference, not training, where the vast majority of GPU compute will go.

The fact is, DeepSeek-R1 actually requires MORE GPU processing power during inference because it does all its thinking in real time, which takes a lot more tokens than an equivalent Llama, Qwen, or any other current LLM would generate for the same prompt.
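A rough back-of-envelope sketch of why more generated tokens means more inference compute, using the common rule of thumb of ~2 FLOPs per parameter per generated token. The model size and token counts below are assumed round numbers for illustration, not measurements of any particular model:

#include <stdio.h>

/* Rule of thumb: ~2 FLOPs per parameter per generated token.
 * All numbers here are illustrative assumptions, not benchmarks. */
int main(void) {
    const double params = 70e9;                /* assumed 70B-parameter model           */
    const double flops_per_token = 2.0 * params;

    const double plain_tokens = 500.0;         /* direct answer only (assumed)          */
    const double reasoning_tokens = 5000.0;    /* answer plus chain-of-thought (assumed)*/

    printf("Plain LLM:       %.2e FLOPs per prompt\n", plain_tokens * flops_per_token);
    printf("Reasoning model: %.2e FLOPs per prompt\n", reasoning_tokens * flops_per_token);
    printf("Ratio: %.1fx more inference compute\n", reasoning_tokens / plain_tokens);
    return 0;
}

Under those assumptions the reasoning model burns roughly ten times the inference FLOPs per prompt, purely because it emits ten times as many tokens.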

Comment Questions answered (Score 2) 63

Just like SIMD operations on a CPU, GPUs are typically designed to execute the same instruction in parallel across a set of data. Even if each individual operation is slower, parallelizing across a data stream can end up being much faster than executing operations one at a time, especially since hardware buses perform better with bursts of data.
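A small illustration of the "same instruction over many data elements" idea, sketched in C with x86 AVX intrinsics (assumes an AVX-capable CPU and compiling with something like gcc -mavx; GPUs extend the same pattern to thousands of lanes):

#include <immintrin.h>
#include <stdio.h>

#define N 1024

/* Scalar version: one addition per loop iteration. */
static void add_scalar(const float *a, const float *b, float *out) {
    for (int i = 0; i < N; i++)
        out[i] = a[i] + b[i];
}

/* SIMD version: a single AVX instruction adds 8 floats at once --
 * the same operation applied across a chunk of the data stream. */
static void add_avx(const float *a, const float *b, float *out) {
    for (int i = 0; i < N; i += 8) {
        __m256 va = _mm256_loadu_ps(&a[i]);
        __m256 vb = _mm256_loadu_ps(&b[i]);
        _mm256_storeu_ps(&out[i], _mm256_add_ps(va, vb));
    }
}

int main(void) {
    static float a[N], b[N], out[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    add_scalar(a, b, out);
    add_avx(a, b, out);
    printf("out[100] = %f\n", out[100]); /* 300.0 either way */
    return 0;
}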

The advantage of FPGAs and custom/special-purpose ASICs is that you can trade generality for handling specialized tasks. If you have enough transistors/LUTs, you can use more and more of them to reduce how many clock cycles your algorithm takes, all the way down to a single clock cycle (as long as you're not bumping into critical-path length limitations). FPGA clock speeds are typically under 800 MHz, so even single-cycle operations can't get any faster than your FPGA's maximum clock, whereas an ASIC can be designed to run at much higher clock speeds.
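To make that clock-versus-cycles trade-off concrete, here is a tiny throughput calculation. All clock rates and cycle counts are assumed round numbers for illustration, not specs for any real part:

#include <stdio.h>

/* Throughput of one hardware unit = clock_hz / cycles_per_result.
 * Numbers below are illustrative assumptions only. */
static double results_per_sec(double clock_hz, double cycles_per_result) {
    return clock_hz / cycles_per_result;
}

int main(void) {
    /* CPU software loop: fast clock, but the algorithm takes many cycles. */
    printf("CPU  (3 GHz, 40 cycles/result): %.2e results/s\n",
           results_per_sec(3e9, 40));
    /* FPGA: single-cycle custom datapath, but clock capped well below 800 MHz. */
    printf("FPGA (400 MHz, 1 cycle/result): %.2e results/s\n",
           results_per_sec(400e6, 1));
    /* ASIC: the same single-cycle datapath, hardened to run at a higher clock. */
    printf("ASIC (2 GHz, 1 cycle/result):   %.2e results/s\n",
           results_per_sec(2e9, 1));
    return 0;
}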

ASIC designs are typically prototyped on fairly beefy FPGAs, often several working in concert, before being produced, so estimating speed is obviously doable.
