Microsoft Reveals Two In-House AI Models
Today, Microsoft unveiled two in-house AI models: MAI-Voice-1, a high-speed speech-generation system now live in Copilot, and MAI-1-preview, its first end-to-end foundation model, trained on nearly 15,000 H100 GPUs. Neowin reports: MAI-Voice-1 is a speech-generation model and is already available in Copilot Daily and Podcasts. To showcase the voice model's full capabilities, Microsoft has created a new Copilot Labs experience that anyone can try today. With Copilot Audio Expressions, users can paste text and select the voice, style, and mode to generate high-fidelity, expressive audio, which they can also download. Microsoft also highlighted that MAI-Voice-1 is very fast and efficient: it can generate a full minute of audio in under a second on a single GPU.
Second, Microsoft has begun public testing of MAI-1-preview on LMArena, a popular platform for community model evaluation. MAI-1-preview is Microsoft AI's (MAI's) first foundation model trained end-to-end in-house, and it offers a glimpse of future offerings inside Copilot. They are actively spinning the flywheel to deliver improved models and will have much more to share in the coming months. MAI-1-preview is a mixture-of-experts (MoE) model, pre-trained and post-trained on nearly 15,000 NVIDIA H100 GPUs. Microsoft claims that it is better at following instructions and can offer helpful responses to everyday user questions. Microsoft will roll out the model to certain text use cases within Copilot over the coming weeks.
Spinning a flywheel, quite an achievement (Score:2)
"They are actively spinning the flywheel ..."
Wow, fantastic. Finally something AI is good for.
Re: (Score:2)
I came here to ask: WTF does that mean?
Who writes this shit? Certainly somebody who doesn't know how to communicate. Park it with this story: https://f6ffb3fa-34ce-43c1-939d-77e64deb3c0c.atarimworker.io/story/25/... [slashdot.org]
Try it on "Sad" (Score:2)
Where's the Microsoft... (Score:2)
Where are the Microsoft out-house models? Gives new meaning to crapification or enshittification.
JoshK.
Re: (Score:2)
Out-shitification???
Re: (Score:2)
Indeed, quite. A humorous word...
I'd define "out-shittification" as when crapification becomes the industry fad... the next best thing since sliced bread, the thing that will align the planets and bring world peace...
But as Hagrid said to Ron, "Better out than in..."
https://ancillary-proxy.atarimworker.io?url=https%3A%2F%2Fyoutu.be%2FMaRUKKfGvy4%3Ft... [youtu.be]
JoshK.
Do people really use speech? (Score:2)
high-speed speech-generation system
Serious question: do people really want to talk to their AI? Unless you live and work alone, I just don't see it. Plus, the overly bubbly California valley-girl voices are just grating.
Re: (Score:2)
Any improvement in this area is welcome, as long as it runs offline. Current models are... barely acceptable.
Re: (Score:2)
I was at a video trade show earlier this year, and there was a lot of talk about agentic AI being used to help in video production environments. I'm not sure a production room needs more noise, but the goal was to keep eyes on what's going on instead of looking down to fumble around with a keyboard and mouse. That's a bit different from speech generation, but it's definitely about talking to AI. People are finding lots of uses for this in different environments, although it remains to be seen whether it
Re: (Score:2)
Do people talk to Siri or Google? For many, voice is an interface, and for many it is much faster than typing a full question on a mobile phone. Not everyone is using a desktop PC.
Can't even paste images (Score:3)
Re: (Score:1)
It is a good idea to disable your browser's ability to read the clipboard anyway.
Re: (Score:2)
Microsoft long ago stopped caring about technology that actually works well. As long as all the clueless fanbois think there is no alternative to Microsoft's crappy products, profits are high, and that is the only thing Microsoft cares about.