Comment Re:Why not say 55 (Score 1) 45
Rounding rules.
I think OLMo had clear dataset downloads. Need to check later.
I appreciate your response and think you've got a few good points, but I still don't see the training data as the important part. Let's be honest, 80% of the data is crap anyway. The commercial curated datasets have a higher standard, but if you take some crawl dataset you first need good quality filters, and then you notice that it's outdated and you somehow need to mix in at least a bit of recent data, so that people asking about the current president don't call the model crap.
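To give an idea of what those quality filters look like in practice, here's a minimal sketch. All thresholds and rules are made up for illustration, roughly in the spirit of the heuristic filters commonly run over crawl data:

```python
def passes_quality_filter(doc: str) -> bool:
    """Cheap heuristics to drop obvious crawl junk; thresholds are illustrative."""
    words = doc.split()
    if len(words) < 50:
        return False  # too short to be useful training text
    # Menus, spam and markup dumps have few alphabetic characters.
    alpha_ratio = sum(c.isalpha() for c in doc) / len(doc)
    if alpha_ratio < 0.6:
        return False
    # Boilerplate-heavy pages repeat the same lines over and over.
    lines = [l.strip() for l in doc.splitlines() if l.strip()]
    if lines and len(set(lines)) / len(lines) < 0.5:
        return False
    # Word salad tends to have implausible average word lengths.
    avg_len = sum(len(w) for w in words) / len(words)
    return 3 <= avg_len <= 10

docs = [
    "Login Register Login Register",  # too short, repetitive
    " ".join(["word"] * 60),          # long enough to clear the length bar
]
kept = [d for d in docs if passes_quality_filter(d)]
```

And even after filtering like this, the staleness problem above remains.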
The important things are the architecture and the insights about it. That's why we're still stuck with transformers: we know a lot about them, and companies investing millions know they will get a workable model out, while other architectures are more of a gamble. Not even BitNet transformers have been tried at larger scale, because it could be a complete loss of the investment.
The next thing is insights about efficiency. After the DeepSeek release they had an open source week where they released most of their software, including training optimizations. Google is a bit more closed, but they still published a paper on how they made their inference pipeline efficient that communicates the high-level ideas. Details would be nice, but if you build a Google-scale datacenter it won't replicate the structure Google uses in detail anyway, so maybe the ideas are already helpful and the plumbing not that much needed.
I also think you still don't value foundation models enough. If you want to evaluate some idea, say a new reinforcement learning technique, you need a base model to try it on. Your budget covers trying your tweak, not training a base model before you even get started. When I read my first model paper I thought "... and what's their result?", because they only described their model and gave very little you could actually program from the paper. But in the end the result is the model itself: something you can use and whose most important properties you know. The first Llama 3 models are quite outdated by now, yet they are still often used in papers, because their properties are well known and you can compare better with other research when you use the same base. They also train well as far as I know (I'm not doing LLM training myself).
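To make that concrete, here's a minimal sketch of what "trying your tweak on a public base" looks like. It assumes the Hugging Face transformers API, the model name is just one well-known choice, and my_rl_tweak is a hypothetical stand-in for whatever you are actually evaluating:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# A well-characterized public base: its properties are documented in
# many papers, so results built on it are comparable across labs.
BASE = "meta-llama/Meta-Llama-3-8B"

tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)

def my_rl_tweak(model):
    """Hypothetical: the new RL technique under evaluation.
    Only this part consumes your research budget."""
    return model  # placeholder: apply your modification here

model = my_rl_tweak(model)

# Compare against the untouched base on the same prompts; since the
# base is public, anyone can rerun the comparison.
inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=8)
print(tok.decode(out[0], skip_special_tokens=True))
```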
"Take the full set of preferred elements that build the full state of the art system"
You expect everyone to have the budget to do so. A hobbyist can grab a smaller Llama model and an xx90 card and start tinkering. But many smaller research departments could not train an 8B Llama from scratch. While it would certainly be interesting to have a base model more tailored to the thing you want to build on top of it, that would be costly and take time.
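As a sketch of that hobbyist path: with parameter-efficient fine-tuning you only train a small adapter on top of the frozen base, which is what makes a single xx90 card enough. This assumes the peft library, and the model name is just an illustrative small choice:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# A small open base that fits on one consumer GPU.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # illustrative choice
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
)

# LoRA trains a few million adapter weights instead of the full model.
config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Training an 8B model from scratch, by contrast, is a multi-GPU, multi-week undertaking.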
I mean, the best case is someone releasing everything: model, benchmarks, architecture, inference software, training material, plus reports on what problems they encountered during training and how they solved them. But if I have to choose between the base model and the training data, I'd choose the base model.
Otherwise I agree: with models at that scale we depend a bit (too) much on large companies doing the base training. But we currently have little other choice. On the other hand, in 2025 people got the GPT-2 speedrun down to under 2 minutes. I wonder how long the Kimi K2 speedrun will take in 2030.
My browser read your post before rendering it. Putting it through an AI does the same thing my browser does: it takes it as input, does some fancy processing, and gives me an output. The browser takes the bytes and renders them into TrueType glyphs; the AI can, for example, summarize them. In the end, both process the input to produce an output created from the original content.
If I buy a book, the authors can't opt out of me using it as a doorstop or doing other things with it that they didn't intend. I'd be more worried if authors could deny me certain uses after I bought the book.
But who are the people using the AI?
I've got the impression the "AI will replace us" people aren't thinking about the AI we have, but about some kind of robot that autonomously does everything on its own. But every image generator has a person who uses it: someone who first thinks about what they want, then tries designs with different prompts and ideas, then refines a good candidate with the more direct tools, touches the image up in an editor afterward, possibly trains models to create more consistent images of the same character/thing/style, and so on.
If you think AI is creative on its own, ask an LLM for a joke. Most will tell you something about atoms or scarecrows over and over again. AI is good at executing things, good at knowing things, but where is the creativity supposed to come in? At best there are a lot of learned "presets" that get chosen depending on a random seed, which can look like creativity, but in practice the creativity has to come from the human controlling the thing.
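As a caricature of what I mean (a real LLM samples tokens, not whole jokes, but the effect is similar): the "variety" comes entirely from the seed picking among memorized patterns. The joke list below is obviously made up for illustration:

```python
import random

# The memorized "presets" the model keeps falling back on.
PRESETS = [
    "Why don't scientists trust atoms? Because they make up everything.",
    "Why did the scarecrow win an award? He was outstanding in his field.",
]

def tell_joke(seed: int) -> str:
    """Looks creative, but the only choice made here is the seed's."""
    return random.Random(seed).choice(PRESETS)

print(tell_joke(0))
print(tell_joke(1))  # different seed, different preset, same repertoire
```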
I don't see this as a big problem, and not only because it saves jobs, but also simply because nobody said these tools have to replace humans or human creativity. Photoshop doesn't replace your drawing skill either; it makes the work more comfortable with advanced tools, but the skill isn't provided by the tool itself.
The missing
Would you please adjust your irony detector? Or do you really need a
Anti-Trust.
"the greatest competitive pressure we've ever seen"
Given that they didn't have much competition before, that's not much of a superlative. Google only managed to become part of the game really late. Remember their old announcements about Bard and how they struggled to catch up at all (even though they invented the transformer)? They've managed to get good by now, but before that, Anthropic was the only competition for OpenAI.
If companies don't need to pay humans to make the ads, they'll take the money and pay humans to make other things. It's not as if there wouldn't be enough work left.
Great, tell the politicians how you trick the face detector, so they can introduce a new law forcing you to upload your full ID, or to get verified by some GAFAM company that they trust more than the webcam verification companies. Those are possibly shady, but still better than giving Meta your data so it can vouch for your age.
We'll probably soon be seeing "May contain AI" disclaimers on everything. How can you be sure that no component of your ad pipeline uses AI? You can't. And a few probably do.
By the way, can we get disclaimers on food ads when they use glue instead of milk to show how it allegedly looks when you pour milk on cornflakes?
Installing someone's whole system image instead of just a piece of software was a dumb idea from the very first Docker release on. We're lucky if people manage to build secure software; Docker requires them to also build secure system images. It also discourages people from running updates, with the excuse that "you can just deploy the new image in no time". Yeah, but I'd rather update when my OS has the update, not when some random guy uploading stuff to Docker Hub finally gets around to rebuilding the image.
Add to that:
- ChatGPT/Sora gets an advantage over other video generators
- ChatGPT gets more (paying) users and makes more profit (to the benefit of all investors)
- OpenAI has fewer legal problems
- OpenAI's investors have fewer worries about OpenAI's legal problems
- Disney's content is present on another platform, with users interacting with it, which is basically free advertising
- There are probably plenty of ideas for how to monetize such creations, from ads for related Disney movies to selling prints or even plushies of the creations, all of which makes Disney money (and not OpenAI)
It is a bet on OpenAI's continued growth, but it may be a good investment in the future.
Opera offers you the Opera browser and the Neon AI browser. So why ask for Neon without AI instead of just using Opera?
My idea of roughing it is when room service is late.