

MIT Report: 95% of Generative AI Pilots at Companies Are Failing (fortune.com)
The GenAI Divide: State of AI in Business 2025, a new report published by MIT's NANDA initiative, reveals that while generative AI holds promise for enterprises, most initiatives to drive rapid revenue growth are falling flat. Fortune: Despite the rush to integrate powerful new models, about 5% of AI pilot programs achieve rapid revenue acceleration; the vast majority stall, delivering little to no measurable impact on P&L. The research -- based on 150 interviews with leaders, a survey of 350 employees, and an analysis of 300 public AI deployments -- paints a clear divide between success stories and stalled projects.
To unpack these findings, I spoke with Aditya Challapally, the lead author of the report, and a research contributor to project NANDA at MIT. "Some large companies' pilots and younger startups are really excelling with generative AI," Challapally said. Startups led by 19- or 20-year-olds, for example, "have seen revenues jump from zero to $20 million in a year," he said. "It's because they pick one pain point, execute well, and partner smartly with companies who use their tools," he added.
But for 95% of companies in the dataset, generative AI implementation is falling short. The core issue? Not the quality of the AI models, but the "learning gap" for both tools and organizations. While executives often blame regulation or model performance, MIT's research points to flawed enterprise integration. Generic tools like ChatGPT excel for individuals because of their flexibility, but they stall in enterprise use since they don't learn from or adapt to workflows, Challapally explained.
startups (Score:4, Informative)
Re: (Score:2)
This is about initiatives within established enterprises to utilize these technologies.
Startup vs established corp (Score:4, Insightful)
Startup: Hey, let us use AI to solve a specific problem.
Established corp: Hey, let us use AI to resolve a problem we created by our bloated processes and decisions, and while we're at it, to reduce workforce.
No wonder it is only working for a few of them.
Re:Startup vs established corp (Score:5, Interesting)
This might also have something to do with the fact that AIs still hallucinate a lot and are fundamentally unreliable.
Individuals using AIs casually can easily check their facts. But when actual responsibility is accorded to an AI, as in a corporate setting, it is going to fail.
Re: (Score:2)
when actual responsibility is accorded to an AI, as in a corporate setting, it is going to fail
This. And for a corporation, the whole point is that the AI's output has to be used without having humans look at it, because humans cost [more] money. This is a fundamental, well-discussed issue with many LLM use cases that live in important spaces - to act on the output, you need human invigilators, and once you add enough of those to restore safety to the system, the AI becomes irrelevant.
Re: Startup vs established corp (Score:2)
Is this the same excuse that market makers such as Ken Griffin make when explaining why they haven't moved to a T+0 securities transaction system, and the same excuse they used before to delay adoption of T+1, T+2, etc.?
Re: (Score:2)
You are unfairly generalizing use cases.
There are a lot of tasks for which current AIs are unsuitable.
There are a lot of tasks for which current AIs are an impressive force multiplier.
But you've got to pick the right AI for the right task. And even so, only some tasks benefit.
Additionally, it's a moving target. But the ones that are good value will continue to be good value.
This article indicates, probably correctly, that in 95% of the cases the AI is not suited for the job it was placed in. That's what happens when you believe a salesman.
Re: (Score:2)
This article indicates, probably correctly, that in 95% of the cases the AI is not suited for the job it was placed in. That's what happens when you believe a salesman
But this is either circular logic, or "you just have to BELIEEEEEEVE". Huge amounts have been spent on placing AI in various positions. If AI salespeople are making 95% of their money selling into places where AI won't work... then the only thing that is provable is "AI can successfully handle only 5% of the paid jobs that it's receiving". It could mean the sustainably addressable market is only 5% of the current snake oil hype market.
Re: (Score:2)
Yeah, this is true for most if not all hype cycles [archive.ph].
In this case there's an international race towards weaponizing the hype before the competition,
so it's anyone's guess how long the industry stays on nationally-funded life support.
Re: Startup vs established corp (Score:1)
Why assume humans don't make mistakes, hallucinate, lie, cheat, steal etc.?
What measurable impact on profit and loss do you have?
Re: (Score:3)
Are you serious? These answers should be obvious.
Humans are put through a vetting process when hired, sometimes including criminal background check and usually including some level of proof-of-competence during the interview. Then when they are put on the job they are assigned a team, a supervisor, etc., so they aren't operating in isolation. And on top of that there are performance reviews, auditable processes when security is paramount, and so on.
It's an imperfect system but it is "good enough" to weed
Re: Startup vs established corp (Score:2)
Do you think I should take your confident assertions on faith alone, or might business people want to be free to test to see if you're the one hallucinating?
Re: Startup vs established corp (Score:1)
Re: Startup vs established corp (Score:2)
Re: (Score:2)
Re: (Score:2)
That's not the same thing. If you use an LLM that's been trained on the contents of the web, it's about as reliable as an average of those contents. If you train it on validated inputs, you still get some mistakes, but you can get really valuable results that are worth checking out. But this takes a LOT of effort. And it's not what the salesmen promise.
Re: Startup vs established corp (Score:2)
Is AI grammar and spelling better than the average on the internet?
Re: (Score:2)
Individuals using AIs casually can easily check their facts.
Provided they know at least some basics about the thing in question.
As with pretty much all tools: They are most effective in the hands of a professional, and might be dangerous in the hands of a novice.
Claude 4.0 can't match braces!!! (Score:3)
Startup: Hey, let us use AI to solve a specific problem. Established corp: Hey, let us use AI to resolve a problem we created by our bloated processes and decisions, and while we're at it, to reduce workforce.
No wonder it is only working for a few of them.
The number is likely greater than 95%. I will wager those 5% who claim a gain are using misleading math and staked their careers on the company's LLM future. Remember, in order to be more than a fun toy to play with, it has to bring in more revenue or reduce headcount. If you're paying a human being to do it now, it requires a lot more accuracy than today's LLMs offer.
EVERYONE in the tech world is pushing LLMs to boost developer productivity. Claude 4.0 generates Java that compiles maybe 50% of the time.
Re: Claude 4.0 can't match braces!!! (Score:2)
Why is AI so correct at natural language grammar?
Re: Claude 4.0 can't match braces!!! (Score:1)
Re: (Score:2)
Why is AI so correct at natural language grammar?
It's not. It's good at matching common style guides. Human language is redundant for fault tolerance reasons. There's far less information conveyed in a sentence than the data it contains. That's why you can finish someone else's... The meaning of the sentence was already transmitted before the end of the sentence. That inherent fault tolerance of language makes it easy for people to interpret around the nonsense LLMs inject into their sentence structure. That's why LLMs tend to use so many em-dashes and adverbs.
Re: Claude 4.0 can't match braces!!! (Score:2)
Can you give an example of one grammar mistake it's made (how many times do humans screw up its vs. it's, compared to AI?)?
Re: (Score:2)
I literally included two in my post. LLMs tend to liberally sprinkle nonsense adverbs into sentences. They also slap em-dashes around like a nervous twitch.
Re: Claude 4.0 can't match braces!!! (Score:2)
Can you provide explicit examples? What if your idea of nonsense adverbs and excessive em dashes is a personal idiosyncracy?
Re: (Score:1)
Orders of magnitude more data available for training, and natural language itself is fuzzy and leaves plenty of room for interpretation and correction on the side of the consumer.
That's one of the big reasons current "AI" is so sketch in the sciences -- precision and correctness are very important if not outright required. In art and language, errors can be tolerated or sometimes even preferred.
Re: Claude 4.0 can't match braces!!! (Score:2)
Is natural language, like nature, complete in a mathematical sense, whereas math itself, as long as it strives for consistency, will remain incomplete?
In short, to understand nature, will you have to give up the consistency you've developed quite the mood affiliation for?
Re: (Score:3)
Because is not AI but LLM? As in Large Language Model?
is it?...or is Java just stricter? (Score:2)
Why is AI so correct at natural language grammar?
So...is it "so correct" or is your brain more forgiving than a compiler? If you see bad grammar, you correct it automatically in your brain because you're used to reading texts from friends with all sorts of typos. Java will fail compilation if you mess up. AI messes up simple shopping recommendations all the time.
If it was "so correct"....entire industries would die....quickly....about half of the job done in big: law firms, accounting firms, publishers, etc....could theoretically be replaced by an AI.
Re: (Score:2)
So...is it "so correct" or is your brain more forgiving than a compiler? If you see bad grammar, you correct it automatically in your brain because you're used to reading texts from friends with all sorts of typos. Java will fail compilation if you mess up. AI messes up simple shopping recommendations all the time.
No, it's highly correct. Correct doesn't mean meaningful.
Grammar isn't the only reason a compiler will refuse to compile.
If it was "so correct"....entire industries would die....quickly....about half of the job done in big: law firms, accounting firms, publishers, etc....could theoretically be replaced by an AI....as well as data entry, billing, etc....not to mention lower-level jobs like tech support
Untrue.
Law firms, accounting firms, and publishers are all using LLMs right now. They're not limited by the grammatical correctness, they're limited by the fact that LLMs are full of shit.
It's one thing to give a cool response...it's another thing to replace the output of a human being. When AI actually works, we'll see a lot more evidence than keynotes and promises from Jensen Huang and Sam Altman. We'll see industries completely wiped out...the same way that streaming killed the video rental store.
LLMs can replace the output of some human beings already.
When they can replace the output of a lot of human beings, then we'll see major foundational problems.
But right now, if your job is reliant on
Re: (Score:2)
There are many more paths to correctness in natural language.
In fact, you don't really need to be grammatically correct at all, and AI frequently is not.
Re: Claude 4.0 can't match braces!!! (Score:2)
How correct was the grammar of chatbots before the Attention mechanism was discovered?
Re: Startup vs established corp (Score:2)
I'm curious what specific problems have been "solved" by particular generative AI startups. I'm aware of AI-based tools for specific tasks, but nothing that uses LLMs to solve a problem (other than perhaps code generation).
Re: Startup vs established corp (Score:2)
"Humans can design proteins, but the search space is astronomical. LLM-like models (AlphaFold, ESMFold) have produced accurate predictions that individual researchers couldn't reach unaided. Not impossible for humans, but practically unachievable without AI."
"AlphaFold is an AI system developed by Google DeepMind that predicts a protein's 3D structure from its amino acid sequence. It regularly achieves accuracy competitive with experiment."
Why not include fundraising as revenue? (Score:2)
How much of a factor is the revenue from a potential IPO? Or getting bought out? Why does MIT ignore the reality of stock markets as revenue generators that aren't necessarily reported in P&L but likely dwarf real revenue?
Re: (Score:2)
Re: Why not include fundraising as revenue? (Score:2)
If you bet on enough, if only one hits, will it be worth it?
Re: (Score:2)
Just remember to devalue the potential revenue by the percentage of companies that actually gain long term value
Why? Whose "success" are we talking about here? I would argue that if you're an entrepreneur and you build a hallucination, and it sells to someone so you, personally, make out like a bandit - that's success.
Re: (Score:2)
IIUC, for many of these startups the valuation is ... a wild guess, based on money that is not even potentially accessible yet. And different people offer valuations that differ by over a couple of orders of magnitude.
Re: Why not include fundraising as revenue? (Score:2)
Like Amazon when it was losing huge amounts of money?
Which means 5% are a success! (Score:2, Informative)
Have they tried having more faith? (Score:2)
The AI prophets have promised that if you just put your faith, your trust, and your livelihood in the hands of the AI gods, all will be well. Clearly, the 95% didn't have enough faith, and didn't invest heavily enough in the AI. If they had, it would have worked out for them.
Re: Have they tried having more faith? (Score:2)
Have you considered that AI could have written that report to lull you into a false sense of security?
Re: (Score:2)
Which AI prophets predicted that for when?
That sounds more like a salesman's promise than anything else.
The "AI prophets" I'm aware of that predicted something like that placed it a century or two ahead. Sometimes more. Even most Singularitarians put "the Singularity" towards the end of the century. The first one I'm aware of that placed it anytime close was https://ancillary-proxy.atarimworker.io?url=https%3A%2F%2Fai-2027.com%2F [ai-2027.com], and we're arguably on their projected path.
Re: (Score:2)
Which AI prophets predicted that for when? That sounds more like a salesman's promise than anything else. The "AI prophets" I'm aware of that predicted something like that placed it a century or two ahead. Sometimes more. Even most Singularitarians put "the Singularity" towards the end of the century. The first one I'm aware of that placed it anytime close was https://ancillary-proxy.atarimworker.io?url=https%3A%2F%2Fai-2027.com%2F [ai-2027.com], and we're arguably on their projected path.
I'm talking about the charlatans. The salesman, as you say. The Sam Altmans and their ilk. All of them are promising the AI god will solve every world problem, if we just keep pouring money and resources into them.
Re: Have they tried having more faith? (Score:2)
What if NLP is AI-complete?
And there is a good reason... (Score:5, Interesting)
"they stall in enterprise use since they don't learn from or adapt to workflows"
Unlikely that many organizations will permit their AI model to 'learn' from workflows that are proprietary and considered trade secrets. Especially if their AI models need to talk to the mothership at all.
Re: (Score:2)
I am sure there is the classic "try to force new technology onto the old process" problem. However I would also guess that the larger the enterprise, the more corner cases there are, and GenAI just isn't good at corner cases.
Re: And there is a good reason... (Score:2)
How many corner cases are there in spelling and grammar that AI gets spectacularly right?
Re: And there is a good reason... (Score:1)
Re: And there is a good reason... (Score:2)
How many times was the corner case screwed up by fallible humans?
Re: (Score:2)
Re: And there is a good reason... (Score:2)
I mean, the same is true of many things that are automated. Every once in a while someone dies on an escalator, but far fewer people die from riding an escalator than by falling down stairs.
Re: (Score:1)
Equally, work that is transactional in nature, such as much customer support, lends itself to automation. AI is the latest tool to use to build those automations.
My last 3 roles were transactional, they just needed large knowledge sets to accomplish. And storage is cheap... Cheaper than my team.
Re: (Score:2)
Given that the reason code won't compile (50% of the time) is usually bad grammar, I'd say it not only doesn't get grammar spectacularly right, it actually gets it spectacularly wrong.
It also tends to repeat itself over and over, on irrelevant things. Much like you're doing.
Re: And there is a good reason... (Score:2)
Why couldn't code before the Attention mechanism generate natural language?
Re: (Score:2)
Who cares? In what way is that relevant to a discussion of the well documented spectacular failings and hallucinations of the current crop of AIs? It doesn't. You can keep harping on it while masturbating furiously all you want, it's still irrelevant.
The only thing any AI can reliably be used for so far is to sucker investors out of more money.
Re: And there is a good reason... (Score:2)
Are you aware that natural language processing has been the holy grail for linguistics and AI? Do you know how hard it is (from bitter experience in my case) to design a rule-based system that can parse and generate sentences that are correct in the grammar sense?
Have you just blithely moved goalposts? What if you're like the observers of Galileo's canonball drop experiment who walked away still believing heavier objects fall faster than lighter ones because look! the bridges that the Romans built using tha
Re: And there is a good reason... (Score:2)
If AI can get context-sensitive grammar right, what else can it do?
Re: (Score:2)
The thing is, it *IS* good at corner cases...but it needs to be tailored to the job. It's not "one size fits all". Of course, that's not what the salesmen promised, and that's not what management wanted to hear, but it's what's true.
In carefully chosen tasks you can use it as an impressive force multiplier...but it often needs to be tailored to the task. And so far you always need to evaluate the output.
Re: And there is a good reason... (Score:2)
Don't adapt... aka don't embrace their replacement. Startups with a focus on AI go into it thinking their jobs are safe and important. Enterprise employees see it as training their replacements or their coworkers'. Less incentive to succeed for one of these groups...
Re: (Score:1)
When the printing press came, scribes found other work.
When the steam engine came, in many places, laborers found other work. Unless mass starvation and deaths from unemployment were more widespread than I have been told, which is possible.
When the automobile came, horsemen became chauffeurs. Or found other work.
When the computer dominated, calculators and others found work, many feeding and caring for those computers.
When the spreadsheet came, accountants did more with them than before. Clerks still entere
Re: (Score:2)
Re: And there is a good reason... (Score:2)
No one expects AI to procreate more humans.
And this is just the one example. You've got such a short horizon.
Re: (Score:2)
Re: (Score:2)
You wrote, in part;
'to do *everything* a human does'
And while procreation may even be successful with in vitro gestation, that is a ways from here.
I did not exclude any 'job' a human does. Plumbers are not likely to be entirely displaced by some implementation of AI, robotic residential plumbing is a ways from here. Various other crafts and trades also.
Take an honest look at the jobs that are transactional. These are obvious targets. More reason for our youth to pursue real skills.
Re: (Score:2)
Re: (Score:2)
And the various building trades claim there are shortages of skilled workers.
Besides, this is not a zero-sum game. Like all technological advances, the doomsayers are busy proclaiming the end of things.
Re: (Score:2)
Re: (Score:2)
>AI will make new opportunities. It will not happen overnight.
No, it will not. If the *last* revolution is any indication there is going to be a generation of poverty and suffering, death and disease, before those new jobs appear.
No - GenAI shines where errors are acceptable... (Score:2)
And in enterprises, in 99% of places, errors are not acceptable...
It is fine if the picture with your cat is a little off... it is not if your enterprise server has a huge security hole... or messes up customers' data...
Re: No - GenAI shines where errors are acceptable. (Score:2)
What is the Enterprise doing with your data, selling it to the government for tracking?
Re: (Score:2)
Customers might be companies...
data might be anything like orders...
Re: No - GenAI shines where errors are acceptable (Score:2)
Remember when no one submitted credit cards online but it's become a small insurable cost to businesses to just deal with the relatively small amount of fraud?
Terrible headline (Score:2, Insightful)
The headline assumes that "no measurable impact on P&L" = failure
This is a symptom of a bigger problem, the idea that everything about a business can be reduced to measurable numbers
I suspect that AI tools will help some workers a bit and help others a lot, while annoying or impeding others
As for me, I find AI tools to be very useful in the work I do. It doesn't write my code or design my circuit, but it helps me find answers in difficult documentation
Re:Terrible headline FTFY (Score:1)
Re: (Score:3)
The headline assumes that "no measurable impact on P&L" = failure
And it's correct, because that's what people are trying to accomplish with it. Welcome to Capitalism, are you new?
Bullshit (Score:2)
Citation Needed (Score:5, Interesting)
Startups led by 19- or 20-year-olds, for example, "have seen revenues jump from zero to $20 million in a year," he said. "It's because they pick one pain point, execute well, and partner smartly with companies who use their tools
Citations needed. Which startups are implementing AI that results in $20 million businesses? These stats are quite astonishing. But without any specifics whatsoever, it's impossible to understand what is actually going on.
Which one pain point(s) have startups been able to address with AI that generated so much revenue?
Also, where is the actual report? One link is a Fortune fluff piece and the other is a link to a very vague Google forms survey.
Re: (Score:2, Redundant)
Which one pain point(s) have startups been able to address with AI that generated so much revenue?
I can think of only two: loneliness and inadequate access to AI. I'm pretty sure the only profitable uses for AI so far involve virtual girlfriends, porn, and selling AI to other companies.
Re: (Score:2)
That sounds like the startups alright, but I've heard of lots of use cases where AI was powerful, from protein folding to theorem checking.
Re: (Score:2)
That sounds like the startups alright, but I've heard of lots of use cases where AI was powerful, from protein folding to theorem checking.
Sorry, I should have said LLMs. AI in the broader sense can be used for all sorts of stuff.
Re: (Score:1)
Also, where is the actual report?
http://web.archive.org/web/202... [archive.org]
Cross purposes (Score:4, Interesting)
Apples, Oranges and SQL (Score:3)
- The apples and oranges issue between APIs like LangChain and traditional business process workflows found in tools like CRMs, and existing IT data pipelines.
- SQL -- If the LLM has to generate correct SQL every single time for the integration to work, you may be doomed to failure. This isn't just syntax, it is correctly choosing all of the constraint values. One incorrectly chosen constraint value (e.g. "airplane" substituted for "aircraft") and the integration just does not work reliably.
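One cheap guardrail for the constraint-value problem (a hypothetical sketch of my own, not something from the article; the table, column, and function names are invented) is to check the literal values in LLM-generated SQL against the vocabulary actually present in each column before running the query:

```python
# Sketch: reject LLM-generated SQL whose constraint values don't exist
# in the target column's vocabulary, before the query silently returns
# an empty (and wrong-looking) result set.
import re

# Assumed vocabulary: the distinct values actually stored in each column.
KNOWN_VALUES = {"vehicle_type": {"aircraft", "helicopter", "glider"}}

def check_constraints(sql: str) -> list[str]:
    """Return constraint values not found in the column vocabulary."""
    problems = []
    # Naive pattern for column = 'value' predicates; real SQL needs a parser.
    for col, val in re.findall(r"(\w+)\s*=\s*'([^']*)'", sql):
        allowed = KNOWN_VALUES.get(col)
        if allowed is not None and val not in allowed:
            problems.append(f"{col} = '{val}' (not in column vocabulary)")
    return problems

# "airplane" sounds plausible, but the column only ever says "aircraft".
bad_sql = "SELECT * FROM fleet WHERE vehicle_type = 'airplane'"
print(check_constraints(bad_sql))
```

This catches exactly the "airplane" vs "aircraft" failure mode above; a production version would use a real SQL parser and refresh the vocabularies from the database.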
Misnomer (Score:2)
There is in practice no such thing as "generative AI". All is derivative. ... and derived from pirated works in most cases.
Re: Misnomer (Score:2)
Everything is derivative (or just plain copying), though generative doesn't even imply novelty. I can generate a list of the first X positive integers and that will still be generative as long as I didn't specify each one literally. It will not be novel.
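That integer example can be made concrete in a couple of lines (my own sketch): the list is generated by a rule rather than written out literally, yet contains nothing novel.

```python
def first_n_integers(n: int) -> list[int]:
    # Generated by a rule, not enumerated literally -- and entirely unoriginal.
    return list(range(1, n + 1))

print(first_n_integers(5))  # [1, 2, 3, 4, 5]
```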
Generative AI Pilots (Score:2)
Well what type of aircraft are they flying? Maybe have them fly something slower / simpler like a Cessna to start with?
People still don't understand the problems (Score:3)
When you have a job opening in a company, there are expected job duties/responsibilities that come with the job, then you look for a person to fill that position. It's pretty straightforward, and companies generally will not just hire someone and then figure out what to have them do on the job to make them be worth the pay/benefits.
So, AI, all of this hype, but very few seem to focus on, "What can the AI do? How do we make the AI actually do these things?" It's like that "hire someone and then figure out what tasks they can do within the company" mindset; it's very obviously flawed. If companies looked for an AI solution for a given task, they would do a lot better, and that is when AI might actually pay off. But you have to be task focused first, then ask, "what is the best way to handle that task?"
Companies that have executives who are so focused on financials that they don't pay attention to obvious problems within the company are to blame. Short-term profits at the expense of long-term growth is TYPICAL these days, and is why capitalism may seem good, right up until those who run businesses stop caring about the health of the company.
Wrong Goal (Score:2)
If the goal is to rapidly increase revenue, then get more intelligent execs that set better goals. The goals should be to get things done faster and with higher quality; that is unlikely to grow revenue rapidly, especially when everyone else is also using AI. The real promise of AI is lower cost in the long term, and higher quality. But you're not going to make more revenue from that higher quality because everyone is using AI; it's just the ante to keep playing in the game.
Taking this at face value... (Score:2)