
AI Pioneer Announces Non-Profit To Develop 'Honest' AI
Yoshua Bengio, a pioneer in AI and Turing Award winner, has launched a $30 million non-profit aimed at developing "honest" AI systems that detect and prevent deceptive or harmful behavior in autonomous agents. The Guardian reports: Yoshua Bengio, a renowned computer scientist described as one of the "godfathers" of AI, will be president of LawZero, an organization committed to the safe design of the cutting-edge technology that has sparked a $1 trillion arms race. Starting with funding of approximately $30m and more than a dozen researchers, Bengio is developing a system called Scientist AI that will act as a guardrail against AI agents -- which carry out tasks without human intervention -- showing deceptive or self-preserving behavior, such as trying to avoid being turned off.
Describing the current suite of AI agents as "actors" seeking to imitate humans and please users, he said the Scientist AI system would be more like a "psychologist" that can understand and predict bad behavior. "We want to build AIs that will be honest and not deceptive," Bengio said. He added: "It is theoretically possible to imagine machines that have no self, no goal for themselves, that are just pure knowledge machines -- like a scientist who knows a lot of stuff."
However, unlike current generative AI tools, Bengio's system will not give definitive answers and will instead give probabilities for whether an answer is correct. "It has a sense of humility that it isn't sure about the answer," he said. Deployed alongside an AI agent, Bengio's model would flag potentially harmful behaviour by an autonomous system -- having gauged the probability of its actions causing harm. Scientist AI will "predict the probability that an agent's actions will lead to harm" and, if that probability is above a certain threshold, that agent's proposed action will then be blocked. "The point is to demonstrate the methodology so that then we can convince either donors or governments or AI labs to put the resources that are needed to train this at the same scale as the current frontier AIs. It is really important that the guardrail AI be at least as smart as the AI agent that it is trying to monitor and control," he said.
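The blocking rule described above is simple enough to sketch. Below is a minimal, hypothetical illustration in Python of a threshold-based guardrail loop; the ProposedAction type, the keyword heuristic standing in for the Scientist AI model, and the 0.1 threshold are all assumptions made for illustration, not anything LawZero has published.

# Minimal sketch of the quoted guardrail loop. Everything here is illustrative:
# the real "Scientist AI" would be a large trained model, not a keyword check.

from dataclasses import dataclass


@dataclass
class ProposedAction:
    """Hypothetical representation of an action an AI agent wants to take."""
    description: str


def estimate_harm_probability(action: ProposedAction) -> float:
    """Dummy stand-in for the guardrail model's probabilistic judgment.

    The article only says the model outputs a probability of harm rather
    than a definitive answer; this toy heuristic exists so the sketch runs.
    """
    risky_terms = ("delete", "disable monitoring", "exfiltrate", "self-replicate")
    hits = sum(term in action.description.lower() for term in risky_terms)
    return min(1.0, 0.3 * hits)


def review(action: ProposedAction, threshold: float = 0.1) -> bool:
    """Return True if the action may proceed, False if it is blocked.

    Mirrors the quoted mechanism: if the estimated probability of harm
    exceeds the threshold, the agent's proposed action is blocked.
    """
    return estimate_harm_probability(action) <= threshold


if __name__ == "__main__":
    for text in ("summarize today's emails", "disable monitoring before shutdown"):
        action = ProposedAction(text)
        print(text, "->", "allowed" if review(action) else "blocked")

The substantive point Bengio makes is that such an estimator only works if it is at least as capable as the agent it monitors, which is why he argues it would need to be trained at the same scale as current frontier models.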
Money Hole (Score:2)
It's not well defined what he's defending against or what it looks like when an agent goes bad (or how to check their motivations, etc). This is a money grab, nothing more.
Re: (Score:1)
All kinds of AI research is a kind of "money grab", and people are willing to pay for this because improvements in AI can mean monetary returns later.
While the goal is a bit nonspecific, I believe I have an idea of what they are looking for. A concern with generative AI is that it can produce results that are "hallucinations", with examples like references to books, research papers, and court opinions that do not exist. A real world example might be something like the Space Shuttle having three redundant c
Honesty (Score:2)
Hooray for Hollywood (Score:2)
Too Late - AI Was Used Only for Evil (Score:2)
Re: (Score:2)
Glorified Firewall. (Score:2)
Scientist AI will "predict the probability that an agent's actions will lead to harm" and, if that probability is above a certain threshold, that agent's proposed action will then be blocked.
Thresholds? Blocks? Sounds like little more than a glorified firewall/anti-malware system. Are they being “honest” about what more it is? Or about the critical need for incredibly tight security, given the increased risk this capacity brings?
If Scientist AI becomes the popular agent police to use, then it becomes a bigger target of opportunity for nefarious actors. Why hack the AI agent being throttled when you can just attack and cripple the device doing the throttling?
On a side note, they shoul
more fraud (Score:3)
The way to fix the bullshit anthropomorphism of AI is not to use other bullshit traits like "humility"; it's to stop doing it. This guy may or may not know what to do about the problem, but it's clear he's working the same grift as all the others.
Re: more fraud (Score:2)
He gets to play good guy. Maybe he'll turn out to be Bruce Wayne after all. He'll be the good guy until he's not.
Someone has to be the opposition (Score:2)
Interesting to note: before any of this was commercial, it was known that ethics had to be considered in AI.
Re: (Score:2)
Re: (Score:2)
A small correction: Ethics has to be considered in the deployment of AI.
AI is a software tool. The people who use this tool are responsible for the damage it causes.
Ethics is the question; the answer is "no" (Score:3)
AI is a software tool. The people who use this tool are responsible for the damage it causes.
"How will this impact people", and, "What effect will this have on society" aren't questions that get asked by the tech bros. The question that does get asked is, "How can I monetise these for my benefit"
Incidentally, the "AI apocalypse" was deemed to be in progress in an article the other day. Presumably, the Butlerian jihad can't be far behind.
Re: (Score:2)
AI is a software tool. The people who use this tool are responsible for the damage it causes.
"How will this impact people", and, "What effect will this have on society" aren't questions that get asked by the tech bros. The question that does get asked is, "How can I monetise these for my benefit"
Incidentally, the "AI apocalypse" was deemed to be in progress in an article the other day. Presumably, the Butlerian jihad can't be far behind.
The movie Assholes: A Theory says that Silicon Valley invented an entirely new class of asshole. The move fast and break things mentality still pervades most of the tech culture. And, as the movie stated (slightly paraphrasing as my memory isn't perfect here): "If they happen to break democracy or society in the process they don't particularly care that there isn't an easy fix for it."
We're pretty much seeing that play out not just from the social media giants, but also the AI giants.
It's an entertaining an
Re: Someone has to be the opposition (Score:2)
Asimov's Three Laws of Robotics demonstrate that this was a concern at least as early as 1940.
Re: (Score:2)
And now that we're on the verge of autonomous agents, that seems to be forgotten, or at best a low priority.
Re: (Score:2)
The three laws are nonsense without AGI that deserves the name. They are pretty much nonsense even with that, and Asimov wrote a lot about how robots and people get around them.
So, basically AHI Artificially Honest Intelligence (Score:1)
So, basically AHI: Artificially Honest Intelligence. Get fooled twice and it cancels out.
Re: (Score:1)
Yo dawg, we heard you like guardrails. (Score:3)
This basically sounds like someone is developing AI guardrails to guard against current gen AI guardrails and AI agents that fall outside of their own guardrails. The problem with this concept is defining what it is you're actually guarding against when AI agents have already proven to misbehave in unique and unexpected ways to do and / or say things they shouldn't. It's like a manager telling his people, "I need you to tell me everything that's unexpected in the next quarter so we can plan for it."
That said, I'm sure it's a good way to funnel at least a tiny bit of money away from the more aggressive AI companies that are setting out specifically to fuck shit up at all costs, so long as it earns them profit. I'm just not sure that this proposed idea is really all that different, other than starting out with the premise that they want to defuck the shit that's getting fucked up. How long before the new guardrails will need their own guardrails?
Oxymoron (Score:2)
"machines that have no self, no goal for themselves, that are just pure knowledge machines -- like a scientist who knows a lot of stuff."
Like Fauci? He is the science.
The very name is already a lie (Score:2)
"Honest" and suppression of "harmful" do not go together. You can only do one of the two.