
Reasoning LLMs Deliver Value Today, So AGI Hype Doesn't Matter (simonwillison.net)
Simon Willison, commenting on the recent paper from Apple researchers that found state-of-the-art large language models face complete performance collapse beyond certain complexity thresholds: I thought this paper got way more attention than it warranted -- the title "The Illusion of Thinking" captured the attention of the "LLMs are over-hyped junk" crowd. I saw enough well-reasoned rebuttals that I didn't feel it worth digging into.
And now, notable LLM skeptic Gary Marcus has saved me some time by aggregating the best of those rebuttals together in one place!
[...] And therein lies my disagreement. I'm not interested in whether or not LLMs are the "road to AGI". I continue to care only about whether they have useful applications today, once you've understood their limitations.
Reasoning LLMs are a relatively new and interesting twist on the genre. They are demonstrably able to solve a whole bunch of problems that previous LLMs were unable to handle, hence why we've seen a rush of new models from OpenAI and Anthropic and Gemini and DeepSeek and Qwen and Mistral.
They get even more interesting when you combine them with tools.
They're already useful to me today, whether or not they can reliably solve the Tower of Hanoi or River Crossing puzzles.
combine them with tools? (Score:1)
"They get even more interesting when you combine them with tools."
LLMs ARE tools. How long will it take for people to understand that LLMs are not magic, they are computer applications.
Re: (Score:1)
"They get even more interesting when you combine them with tools."
LLMs ARE tools. How long will it take for people to understand that LLMs are not magic, they are computer applications.
Tools has a specific meaning in this context. But if it makes you happy, insert the word "other" in the obvious place.
Re: (Score:2)
Look up what "tool calling" in LLMs is. And read some article about MCP.
The sentence basically says "a computer can print more easily if you install printer drivers".
If you want the LLM to control something, you need to give it access to tools for controlling it.
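To make the jargon concrete, here is a minimal sketch of a tool-calling loop in Python. Everything in it (ask_model, get_weather, the JSON shape) is an illustrative assumption, not any particular vendor's API: the model asks for a tool by name, the harness runs it, and a real system would feed the result back to the model.

```python
import json

# Hypothetical tool the model is allowed to call; the name and signature
# are illustrative, not part of any real vendor API.
def get_weather(city: str) -> str:
    return f"Sunny and 22C in {city}"  # stub; a real tool would hit an API

TOOLS = {"get_weather": get_weather}

def ask_model(prompt: str) -> str:
    # Stub standing in for a real LLM call. A tool-calling model emits
    # structured output like this when it wants a tool invoked.
    return json.dumps({"tool": "get_weather", "arguments": {"city": "Berlin"}})

def run(prompt: str) -> str:
    reply = ask_model(prompt)
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # plain-text answer, no tool requested
    result = TOOLS[call["tool"]](**call["arguments"])
    # A real loop would hand `result` back to the model for a final answer.
    return result

print(run("What's the weather in Berlin?"))  # Sunny and 22C in Berlin
```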
Re: (Score:2)
How long will it take for people to understand that LLMs are not magic
Have you ever met humans?
I Disagree (Score:5, Insightful)
Your argument seems to be that, since the lies, politely referred to as AI hype, contain partial truths (LLMs do some of what you think they should), the fact that they are lying does not matter.
I disagree. It is my opinion that big fat whopper lies are being told continuously and few are being called to account for it, as they should be. They are still lying and those lies definitely matter.
The Emperor has no clothes. That he's wearing socks does not change the first statement.
Re:I Disagree ... Pragmatism (Score:2)
I continue to care only about whether they have useful applications today, once you've understood their limitations.
The point being made, I think, is to ignore all the hype, propaganda, marketing and sales pitches and focus on whether the tool is useful to you.
Make up your own mind and please don't tell me about it :)
Re: (Score:3)
Your argument seems to be that, since the lies
Let me stop you there. The fact that some people use LLMs to generate text without thought and end up with lies / hallucinations / bullshit outputs has little to nothing to do with the concept of AIs and LLMs providing value. LLMs objectively do provide value; the value depends on how they are fed and implemented. We use LLMs at work to process natural language input and respond with only authoritative references, effectively giving us precise document search.
Just some of the examples of LLM use cases whi
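For illustration, a minimal sketch of the retrieve-then-answer pattern described above, with a toy corpus and crude keyword-overlap scoring; this is an assumption about the shape of such a system, not the poster's actual setup.

```python
# Toy corpus standing in for a real document store.
DOCS = {
    "policy-7": "Refunds are issued within 14 days of purchase.",
    "policy-9": "Warranty claims require the original receipt.",
}

def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    # Crude keyword-overlap scoring; real systems use BM25 or embeddings.
    def score(text: str) -> int:
        return len(set(query.lower().split()) & set(text.lower().split()))
    ranked = sorted(DOCS.items(), key=lambda kv: score(kv[1]), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query))
    # Constraining the model to the retrieved sources is what keeps the
    # answers limited to authoritative references.
    return (f"Answer using ONLY the sources below, citing their IDs.\n"
            f"{context}\nQuestion: {query}")

print(build_prompt("how long do refunds take"))  # prompt a real LLM would get
```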
Re: (Score:2)
Let me stop you there. The fact that some people use LLMs to generate text without thought and end up with lies / hallucinations / bullshit outputs has little to nothing to do with the concept of AIs and LLMs providing value.
You stopped yourself too soon. You should have read the whole sentence. Then you might have realized that the lies that I and the author are talking about are the marketing and sales lies, not the questionable output of the LLM itself.
Re: (Score:2)
Erm. How many trillions of dollars does it take to redo "Google" in your organization yourself, in-house?
The argument about the value of AI is about whether it is superhuman, because only superhuman capabilities justify the fuckload of investor money being thrown into this technology. There is a bubble currently and it will burst, and in the meantime the bubble is causing unnecessary destruction in the economy by reorganising companies with proven expertise into generic service providers of dubious quality.
Re: (Score:3)
Well, yes -- the lies and the exaggerations are a problem. But even if you *discount* the lies and exaggerations, they're not *all of the problem*.
I have no reason to believe this particular individual is a liar, so I'm inclined to entertain his argument as being offered in good faith. That doesn't mean I necessarily have to buy into it. I'm also allowed to have *degrees* of belief; while the gentleman has *a* point, that doesn't mean there aren't other points to make.
That's where I am on his point. I th
Re: (Score:2)
That he's wearing socks does not change the first statement.
To the technically minded it does change the first statement; however, it does not change the intended message of the first statement.
Re: (Score:2)
The point is another one: they deliver value, but NOT as an encyclopedia. If you try to use a text generator as an encyclopedia, you shouldn't be surprised when it doesn't only generate works about existing things, but also does what it is programmed to do: generate new texts.
The value delivered is not in producing truth, but in processing truth. Give the LLM a Wikipedia tool and you can ask it questions about an article. It then retrieves the article and answers your questions about it. The point is, you
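A minimal sketch of that Wikipedia-tool pattern: the endpoint is Wikipedia's real REST summary API, while ask_llm is a hypothetical stub standing in for whatever model API you use.

```python
import json
import urllib.request

def wikipedia_summary(title: str) -> str:
    # Wikipedia's REST summary endpoint returns JSON whose "extract"
    # field holds the lead section of the article.
    url = f"https://ancillary-proxy.atarimworker.io?url=https%3A%2F%2Fen.wikipedia.org%2Fapi%2Frest_v1%2Fpage%2Fsummary%2F%7Btitle%7D"
    req = urllib.request.Request(url, headers={"User-Agent": "wiki-tool-demo"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["extract"]

def ask_llm(prompt: str) -> str:
    return "(model answer grounded in the supplied article)"  # stub

article = wikipedia_summary("Tower_of_Hanoi")
print(ask_llm(f"Using only this article, answer: how many moves does the "
              f"minimal solution need?\n\n{article}"))
```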
The paper is TL;DR (Score:4, Funny)
Has anyone got an AI-generated summary for me to read?
Re: (Score:2)
What about asking your favorite AI model to create one?
Need a new name - Artificial skill? knowledge? (Score:4, Interesting)
When we call LLMs and related systems "Artificial Intelligence", what we are really doing is false advertising. We need a better name. Maybe "Artificial Skills" and "Artificial Knowledge"? This whole AGI thing, pretending that current "AI" is a step on the way to actual artificial intelligence (except in the sense that it's another failed step by researchers trying to work out what intelligence actually is), is a big con job. There's no clarity about that at all.
This is really needed because the systems break in horrible ways, such as Tesla cars being able to drive but not understanding that they are in dangerous conditions where their cameras aren't enough and they need to slow down. The confusion this is causing is already getting people killed.
Re: (Score:2)
The strong do as they will. The weak do as they must. 5th century BC Athenian commons vs Militians
The quote is "the strong do what they can and the weak suffer what they must," and it was the Melians not the Militians, and it wasn't Athenian commons, it was an Athenian commander.
Re: (Score:2)
You make a case for not allowing laymen access to scientific research. "Artificial intelligence" is a fairly well defined technical term. Your point is that it is confusing for the uneducated.
I disagree. It's important that the public be allowed access to scientific research, not only because they pay for most of it, but also because it is a collective endeavour of humanity, and everyone should be given as much opportunity as practical to educate themselves.
Re: (Score:2)
I don't want the research departments renamed. If they are either searching for how to create artificial intelligence or they are dealing with the outcomes of the research that's fine. What I want is that the products delivered are not allowed to use the branding.
Re: (Score:2)
We "created" artificial intelligence in the 50s. You're taking your definition of the term from science fiction. Emphasis on the last word in that sentence.
Re: (Score:2)
My definition was clearly stated in "Professor Jefferson’s Lister Oration for 1949" as referenced in the seminal work Computing Machinery and Intelligence, itself in part written before 1950. It is completely standard.
Re: (Score:2)
I didn't see the term "artificial intelligence" in there anywhere, sorry. Not even "intelligence."
Re: (Score:3)
Why? Artificial Intelligence is exactly what the definition has always been, a system that learns / is trained on input and then produces output. LLMs fit this description, as did the earliest AI papers. The term was effectively coined in the 50s to describe exactly the kind of thing LLMs are now. Your desire to redefine it now is your misunderstanding (probably from reading too many sci-fi books), not a problem with the word itself.
I agree with the rest of it though. AI in its current form is not a step to A
Re: (Score:2)
Artificial Intelligence is exactly what the definition has always been, a system that learns / is trained on input and then produces output.
That isn't the original definition. Here is how it was defined in 1955, in the proposal for the Dartmouth Summer Research Project on Artificial Intelligence [computerhistory.org].
The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.
They defined AI in terms of what it does (simulating intelligent behavior), not what approach or mechanism is used to do it (learning from input, programming by an expert, trial and error, etc.)
Re: (Score:2)
Errr. Maybe re-read what you quoted. They did define AI in terms of what it does, and what it does is defined as simulating intelligence through a learning process, which is precisely what LLM models do.
Re: (Score:2)
The important part is the handwaving. In other words, for example: "every aspect of learning *or any other feature* of intelligence" (my emphasis).
We now have learning clearly separated from the rest of intelligence. Just as we didn't call expert systems "intelligent systems", we should now be calling these "artificial learning" or "learning systems" and not "artificial intelligence", because they are not that.
Re: (Score:2)
and what it does is defined as simulating intelligence through a learning process
No, they defined learning as only one aspect of intelligence. Read the whole proposal, which is fascinating. They start by listing seven aspects of "the artificial intelligence problem", only one of which (self-improvement) requires learning. Then they each propose the specific problems they want to work on during the two month project. Shannon wants to explore information theory concepts: noise, redundancy, etc. Minsky describes something similar to what we would now call reinforcement learning. McCa
Re: (Score:2)
When we call LLMs and related systems "Artificial Intelligence", what we are really doing is false advertising. We need a better name.
Traditionally the names have been "strong AI" or "weak AI." You can call LLMs "weak AI."
Re: (Score:2)
I'd be happy with that, but I think there's a reason that the marketing departments of AI-related companies aren't using it.
partially true (Score:4, Interesting)
The best value an LLM provides, imho, is that it will likely know more of the subject matter than you do.
I suspect when the bubble bursts and the dust settles, we'll end up with a kind of interactive encyclopedia as a usable form factor for LLMs.
Re: (Score:2)
Interactive encyclopedias that at times are confidently.... Wrong!
Just like Wikipedia, which is nonetheless the most useful website ever created. Both Wikipedia and LLMs give you an initial idea, which you can then dig through other resources to confirm.
If Wikipedia stopped accepting new contributions, its static dump would be an important achievement of the 2000s. If LLMs stopped being updated, existing open weights models (Llama) would be an important achievement of the 2020s.
Re: (Score:1)
Don't forget that they also preserve language, being language models. You want to know how people talked in the 20s? Current models will preserve that forever. You want to know whether a sentence you wrote for a character in your book who lives in the 20s uses the right language for the period? A language model tells you. I bet people in science, linguistics and history will be very happy in a few decades that they have the essence of the language of today packed into a small model.
Re: (Score:2)
Very interesting. For once, I have to note the contribution of Meta for hosting conversations in so many languages, enabling them to preserve even relatively rare languages.
I agree very much that the value of existing LLMs is already far beyond the interactive encyclopedia mentioned by the OP. Even just the ability to summarise/expand/rephrase/explain text is mind-boggling.
People here who ridicule LLM based on their failure at complex logics or pitiful level at chess are missing the point. Which is b
Re: (Score:3, Interesting)
Personally I have found LLMs valuable in several ways, for example:
1. Negative searches that were not possible with traditional search engines except in very rare scenarios. Like "Find me a molecule that has iron and oxygen, but not nitrogen."
2. Searching research papers with fuzzy words. It is very hard to find a research paper unless you know the exact words to search for it, but AI can translate your fuzzy wording into meaningful search results (see the sketch after this list).
3. Testing out coding ideas. You can describe something you wan
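A sketch of the query-rewriting idea from point 2, where the model's job is to turn a fuzzy description into the precise terms a conventional index needs; rewrite_query is a hypothetical stand-in for a real LLM call, and the corpus is made up.

```python
def rewrite_query(fuzzy: str) -> list[str]:
    # A real model, prompted with "suggest exact keywords for this paper
    # description", might return a list like this one (hard-coded stub).
    return ["catastrophic forgetting", "continual learning"]

PAPERS = [
    "A survey of continual learning and catastrophic forgetting",
    "Attention is all you need",
]

def search(fuzzy: str) -> list[str]:
    terms = rewrite_query(fuzzy)
    return [p for p in PAPERS if any(t.lower() in p.lower() for t in terms)]

print(search("that paper about networks forgetting old tasks"))
```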
Re: (Score:2)
1. Negative searches that were not possible with traditional search engines except in very rare scenarios. Like "Find me a molecule that has iron and oxygen, but not nitrogen."
2. Searching research papers with fuzzy words. It is very hard to find a research paper unless you know the exact words to search for it, but AI can translate your fuzzy wording into meaningful search results
uhhhh search engines supported sort of both of these until they all agreed to start sucking. They were fantastic features.
If you're too young to know about old search engines, how did you get here?
Re: (Score:2)
It is very hard to find a research paper unless you know the exact words to search for it
If you have the citation, finding the paper is likely going to be trivial. My guess is that you're just asking the LLM for a citation for something and not checking to see if the paper even exists. They're very good at generating pretend citations. LLMs are not search engines.
Testing out coding ideas. You can describe something you want and you get instantly code that creates the UI for you.
Or you could just draw a picture. Not only will you be able to iterate faster, you'll use significantly fewer resources. If that's not your thing, you could use any one of a zillion interface design tools. They don't need endles
Re: (Score:2)
The best value an LLM provides, imho, is that it will likely know more of the subject matter than you do.
They are stuffed with statistics about more of the subject matter than you're familiar with, which is not the same thing as knowing. Even if you trained them only and exclusively on correct information presented logically, they would still hallucinate bullshit that looks as statistically likely as factual information.
I suspect when the bubble bursts and the dust settles, we'll end up with a kind of interactive encyclopedia as a usable form factor for LLMs.
An LLM could be a guide to a real encyclopedia with actual facts, but if you trained it on the encyclopedia instead of having it cite the encyclopedia, it would still hallucinate horseshit.
Re: (Score:2)
Sure, but...
Let me ask you this. Let's say the topic is teaching math... like the Khan Academy... where there is a lesson and the facts of whichever topic aren't in dispute... Trigonomet
Re: (Score:2)
Could the model not be trained to be nearly deterministic in its outputs?
No. The technology doesn't do that. Instead of whatever ineffable process we use to correlate things in ways that make sense, it only and solely correlates things in ways which look like they make sense. You cannot train your way out of this problem, an entirely new technology is needed. Maybe to replace this, maybe only to augment it, but still fundamentally different.
Re: (Score:2)
It's a shame you are answering as AC instead of simply setting up a new account. It makes it less worth replying to, since most of the time ACs never come back.
I assume you mean this: https://ancillary-proxy.atarimworker.io?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2F... [wikipedia.org]
Which only applies to neural networks. However, there is evidence, including slime moulds which have no neurons but show intelligence, that intelligence is not just a neural network.
Over the target (Score:1)
So now we're getting daily or more frequent stories questioning the value of machine learning, discounting the importance of machine learning, denying the feasibility of AGI, downplaying the impact of models on employment, criticizing the hype around it all, predicting crashes in investments, etc. Every day, someone, somewhere is furiously writing another think piece along these lines. There are at least two on the Slashdot main page right now.
Meanwhile, the makers are making, the investors are investing,
Re: (Score:2)
No
And it’s “they’re”
Re: (Score:2)
The really funny thing is, if you give most people a 20-disk Tower of Hanoi problem and ask them to solve it, you'll get something that you might call "complete performance collapse."
The people who get to administer such tests usually refer to it by terms like "you've got to be fucking kidding" or similar.
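For scale: the optimal strategy is a three-line recursion, but a 20-disk instance still takes 2**20 - 1 = 1,048,575 moves, which is why nobody hands it to a human either. A quick sketch in Python:

```python
def hanoi(n: int, src: str, dst: str, via: str, moves: list) -> None:
    # Classic recursion: move n-1 disks aside, move the big one, restack.
    if n == 0:
        return
    hanoi(n - 1, src, via, dst, moves)
    moves.append((src, dst))
    hanoi(n - 1, via, dst, src, moves)

moves = []
hanoi(20, "A", "C", "B", moves)
print(len(moves))  # 1048575 == 2**20 - 1
```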
Re: (Score:2)
This is a good perspective. It is almost certainly true that the capabilities of LLMs will continue to advance.
I also think a crash is very likely. That's because startups aren't really about inventing new technology; they are about finding new business models for some technology. I think a lot of the stuff that people are applying LLMs to is likely not to be particularly value-generating. That's in part because of structural problems with accuracy and hallucination, but also because a lot of human work is
Yep (Score:2)
Re: (Score:2)
They are far too inconsistent to be considered tools. These things are toys, as more people are discovering every day.
Same could be said for humans (Score:2)
"researchers that found state-of-the-art large language models face complete performance collapse beyond certain complexity thresholds"
Humans also face complete performance collapse with cognitive tasks beyond certain complexity thresholds
Re: (Score:2)
This is the dumbest take. Yes, humans make mistakes and their performance drops off with task complexity, but these things are in no way comparable to the failures of LLMs. The kinds of mistakes humans make are nothing like the kinds of 'mistakes' that LLMs make. You won't find humans accidentally fabricating citations or unintentionally summarizing text that doesn't exist. As for 'complexity', LLMs fail on even simplified versions of Towers of Hanoi, even when they're given explicit instructions. Hum
Pretty simple! (Score:2)
Oh, well - if Simon Willison says so! (Score:2)
Who is this "Simon Willison", exactly?
Yes, I can easily DDG him... but the point is, if this isn't someone most of us know, TFS should really provide some information regarding who he is.
And now that I know he's just one of the creators of Django... why should I care what his opinion is regarding LLMs, one way or the other?
Re: (Score:2)
Well, I can tell from this sample of his writing that he is either an idiot or a liar, maybe both.
Buzzword Industrial Complex needs a (Score:1)
new goal to keep the hype-train funded. Resting on the laurels of existing AI is not enough to justify bigly P/E ratios.
"Once you understand their limitations" (Score:3, Insightful)
Worse, they are not being marketed OR deployed as assistants, or force multipliers, but as *replacements* for entire processes, without human oversight or intervention when they are - in NO way - suitable, or well enough trained to do so.
Most things comply somewhat closely with the 80/20 rule... 20% of the work takes 80% of the time. When well trained and in a solid framework (which is a lot of work in and of itself), LLMs can do the other 80%, maybe about 80% of the time. That's a huge productivity boost - but it's being sold as much, much more than that (the arithmetic below spells out how much it actually is). An Air Traffic Control LLM has been floated. Not as a joke. No one who "understands the limitations" would ever take that seriously - but people in positions of responsibility are still seriously considering insanity like this.
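Taking the parent's numbers at face value, and assuming the 80% of tasks the LLM can handle are exactly the ones occupying 20% of the time, the boost works out to a modest share of total time:

```python
# Back-of-envelope check of the 80/20 claim above.
easy_share_of_time = 0.20  # the "other 80%" of tasks take 20% of the time
llm_success_rate = 0.80    # the LLM handles that chunk about 80% of the time

time_saved = easy_share_of_time * llm_success_rate
print(f"{time_saved:.0%} of total time saved")  # 16%
```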
He has a point... (Score:2)
We've gotten very wrapped up in the philosophical discussion of whether AI models are "thinking." But most people don't actually care whether we've reached some abstract achievement of creating "thought." Most people just care if the tool can do the job.
Of course, there are serious limitations with current AI tools, but every tool has limitations. The trick is making sure they are used responsibly and in a way that is cognizant of those limitations.
Re: (Score:2)
We've gotten very wrapped up in the philosophical discussion of whether AI models are "thinking." But most people don't actually care whether we've reached some abstract achievement of creating "thought." Most people just care if the tool can do the job.
The tool can't do the job because it's not thinking, which is why people keep bringing that up. Think about it before complaining!
Re: (Score:2)
No, the tool can't do certain jobs because it's too limited to do those jobs, not because it fails to meet some abstract definition of "thinking." It doesn't matter if my calculator is "thinking" when it multiplies numbers; it can do my arithmetic no problem because it is doing the task it is designed to do.
Re: (Score:2)
Your calculator is executing a simple and provable algorithm; the LLM is executing a complex and non-provable one, because the inputs are too varied. They are fundamentally different things. Your calculator is limited, but predictable and thus reliable.
Re: (Score:2)
There are different ways to validate an LLM and the ways to validate it may depend on what it's asked to do. For example, LLMs typically do a good job of drafting a simple letter (for example, a business cover letter). I can validate the output by simply reading it, and I can do so far quicker than it would take me to manually type out and format the letter.
On the other hand, there are certainly tasks I could ask an LLM to do where validation is much more difficult. But these are also tasks where validating
Re: (Score:2)
There are different ways to validate an LLM and
...and all of the ones that don't involve human effort are bullshit, because only humans can reliably detect the hallucinations.
Can't fly a plane! /s (Score:2)
This is the problem a lot of people apply to every piece of technology that comes along.
"It can't safely fly a plane full of 300 people! It's useless!"
Ok, yeah, sure... I guess. But most things in the world don't need that degree of confidence. I used an LLM this morning to remind me of a plot thread in a TV show I haven't watched in 15 years. It got it right, as far as I remember, and it's infinitely easier than scrolling through 300 episode summaries.
And then... (Score:2)
... my recent Programming 102 students tried ChatGPT on some really simple code (Game of Life in about 40 lines of Python) and ChatGPT completely failed. "Reasoning" models will not do any better. The fact of the matter is that LLMs are really not any better than a somewhat improved search. And that falls flat on its face at pretty low complexities, and there is no way to do better. If the model is specialized, it will get a bit more depth given good (and hence expensive) training data, but it will still fa
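For scale, here is roughly the kind and size of program being asked for: a complete Game of Life step function in Python. This is one sketch of it; any 102-level variant would do.

```python
from collections import Counter

def step(alive: set) -> set:
    # Live cells are (x, y) tuples; count live neighbours of every
    # cell adjacent to a live cell.
    neighbours = Counter(
        (x + dx, y + dy)
        for x, y in alive
        for dx in (-1, 0, 1) for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {cell for cell, n in neighbours.items()
            if n == 3 or (n == 2 and cell in alive)}

# A glider: after 4 steps the same shape reappears, shifted diagonally.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for _ in range(4):
    glider = step(glider)
print(sorted(glider))
```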