Comment: Who is sailing on a sinking ship? (Score 1)
First... We can't release this model because it doesn't work
Second... We need to convince the Christian right that they should use their influence to force this tech down everyone's throats.
Anthropic is going to go public, but this should be considered gross negligence because they are knowingly asking for money for something they know can only decline in value.
Try the open models and tell me they aren't already good enough to replace Anthropic in 95% or more of cases. And how will Anthropic compete with free?
Why do open models matter? Well, it's only a matter of a few years before even minuscule devices will be able to host AI locally.
Here's the next thing. You need to see AI as an onion. Neural networks are a series of layers. Last week, I was playing with running layers on hardware at different cost levels. I used a cluster of H200s for the outer layers, sub-$100 AI accelerators for the inner layers, and an RTX3090 for the middle layers. I then tested coding questions and general nonsense like "what eyeshadow matches these earrings". 85% of all questions were answered quickly on the $100 accelerator. 99% were answered with the two cheapest options. And remember, I wasn't running a small model; I was running a gigantic model sharded across a $100 device, a $1000 device, and a $500,000 device. I reduced usage of the $500,000 device to almost nothing. I achieved the same results with about a 20% performance drop on a 1 trillion parameter model while increasing the compute density of a cluster of H200s 100-fold.
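To make the routing idea concrete, here's a minimal sketch of a confidence-based cascade: try the cheapest tier first and only escalate when it isn't confident. The `Tier`/`cascade` names and the toy confidence functions are mine for illustration, not from any real inference stack.

```python
# Hypothetical sketch: route each prompt through hardware tiers cheapest-first,
# escalating to the next tier only when the cheaper tier's confidence is low.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Tier:
    name: str
    cost_usd: float
    answer: Callable[[str], Tuple[str, float]]  # returns (answer, confidence)

def cascade(prompt: str, tiers: List[Tier], threshold: float = 0.8) -> Tuple[str, str]:
    """Try tiers in order of increasing cost; return (tier_name, answer)."""
    ans = ""
    for tier in sorted(tiers, key=lambda t: t.cost_usd):
        ans, conf = tier.answer(prompt)
        if conf >= threshold:
            return tier.name, ans
    return tier.name, ans  # fall through: the most expensive tier's answer stands

# Toy stand-ins for the three tiers above; real confidences would come from
# the model itself (e.g. early-exit logits), not prompt length.
tiers = [
    Tier("h200_cluster", 500_000, lambda p: ("deep answer", 0.99)),
    Tier("accelerator", 100, lambda p: ("quick answer", 0.9 if len(p) < 40 else 0.1)),
    Tier("rtx3090", 1_000, lambda p: ("mid answer", 0.85 if len(p) < 80 else 0.1)),
]

print(cascade("what eyeshadow matches these earrings?", tiers)[0])  # prints "accelerator"
```

With splits like 85%/14%/1%, the expensive cluster only ever sees the hard tail of the traffic.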
So what this means is that using extreme MoE models, sharded properly, plus what is currently a $100 accelerator (and will soon be a $5 accelerator) with a thin layer in between... assume a single RTX3090-class card for 1000 users (500 for better performance)... the case for massive inference data centers is screwed. Give me a grant and a few months and I am 100% sure I can get efficiency closer to 10,000x rather than 100x better. And no, that is not an exaggeration. I would retrain the models to be spread across more, thinner layers with a LOT more experts. Of course, retraining something on the scale of a 1 trillion parameter model is expensive. What's great is that there is true value in China footing the bill for this, because cutting their dependence on gigawatt data centers filled with NVidia and tons of HBM memory (possibly literally tons) is a survival requirement.
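The 100x figure falls straight out of the measured splits. A quick back-of-envelope check, using only the numbers from the experiment above:

```python
# Back-of-envelope using the splits reported above (the fractions are the
# comment's own measurements, not mine).
frac_accel = 0.85   # answered on the <$100 accelerator
frac_3090 = 0.14    # answered on the RTX3090 tier
frac_h200 = 1.0 - frac_accel - frac_3090   # ~1% escalates to the H200 cluster

# If every request used to hit the H200 cluster but now only ~1% does,
# one cluster can serve roughly 100x as many users.
density_gain = 1.0 / frac_h200
print(round(density_gain))  # prints 100
```

Pushing the cheap tiers' share from 99% to 99.99% is what would take that 100x toward 10,000x.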
If there's anyone in China reading this: take Qwen or Deepseek, spread them REALLY REALLY thin, then distribute the layers and open the weights. You'll make it so that companies like Huawei and the others can run layers locally on devices as small as an ESP32 and then distribute the layers outward. It was LM Studio's magical cross-platform sharding that got me going on this. It just works. It's so simple. It just works.