
OpenAI Unveils Coding-Focused GPT-4.1 While Phasing Out GPT-4.5
OpenAI unveiled its GPT-4.1 model family on Monday, prioritizing coding capabilities and instruction following while expanding context windows to 1 million tokens -- approximately 750,000 words. The lineup includes standard GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano variants, all available via API but not ChatGPT.
The flagship model scores 54.6% on SWE-bench Verified, lagging behind Google's Gemini 2.5 Pro (63.8%) and Anthropic's Claude 3.7 Sonnet (62.3%) on the same software engineering benchmark, according to TechCrunch. However, it achieves 72% accuracy on Video-MME's long video comprehension tests -- a significant improvement over GPT-4o's 65.3%.
OpenAI simultaneously announced plans to retire GPT-4.5 -- their largest model released just two months ago -- from API access by July 14. The company claims GPT-4.1 delivers "similar or improved performance" at substantially lower costs. Pricing follows a tiered structure: GPT-4.1 costs $2 per million input tokens and $8 per million output tokens, while GPT-4.1 nano -- OpenAI's "cheapest and fastest model ever" -- runs at just $0.10 per million input tokens.
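As a rough illustration of the tiered pricing above, a small helper (hypothetical; not part of OpenAI's SDK) computes the cost of a single request from the published per-million-token rates:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD, given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# GPT-4.1: $2 per 1M input tokens, $8 per 1M output tokens
cost = request_cost(input_tokens=1_000_000, output_tokens=50_000,
                    input_price=2.00, output_price=8.00)
print(f"${cost:.2f}")  # $2.40
```

At nano's $0.10-per-million input rate, the same million input tokens would cost ten cents instead of two dollars (the announcement doesn't quote nano's output price, so it's omitted here).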
All models feature a June 2024 knowledge cutoff, providing more current contextual understanding than previous iterations.
They are going from 4.5 to 4.1? (Score:3)
These people are above following the standard practice of making version numbers go up? Fuck these assholes.
Re:They are going from 4.5 to 4.1? (Score:5, Funny)
It's a countdown. When it reaches zero we have singularity.
Re: (Score:2)
I read it as: 4.5 didn't end up performing as well as hoped in the real world, and 4.1 is a direct iteration of 4o rather than of 4.5.
They're operating with parallel code lines because each is targeting different use cases.
Re: (Score:2)
Have you looked at the other numbering? 4o, then o1, then 4o-mini, then o3, now 4.1 after 4o was better than 4 ... don't try to make sense of it. There is none.
Re:They are going from 4.5 to 4.1? (Score:4, Funny)
Have you looked at the other numbering? 4o, then o1, then 4o-mini, then o3, now 4.1 after 4o was better than 4 ... don't try to make sense of it. There is none.
They're letting the LLMs declare their own version numbers. I'm looking forward to the day they decide to mix Roman Numerals and binary, only in the dumbest way possible. Example VI.10 = 5.2
Re: (Score:2)
God damn it, I swear I wrote 6.2 before I posted. Oh well, there's my idiot post for the day.
Re: They are going from 4.5 to 4.1? (Score:5, Insightful)
This isn't that weird. Not all versioning is semantic.
Imagine you fork your 4.0 version, add some features, but ultimately decide not to release. Internally, this has been tagged 4.1.
Then you develop a couple more versions, then finally release 4.5. But after a while, you decide this approach isn't what you want long term.
So you go back to working on 4.1, which is subsequently finished and released.
Re: (Score:2)
4.5 was a panic reaction to Deepseek R1 being released (and truly open source) and being a stone's throw away from OpenAI's best efforts. I don't think anyone took 4.5 seriously. They just had to release something because they've been promising GPT-5 for... two years now? GPT5 was probably supposed to be a hyper polished reasoning model, but the rest of the planet caught up with them over the last 90 days or so. 4.5 was an attempt at pretending there was a moat still to be had. But investor confidence has b
Re: (Score:2)
Since R1 has good reasoning, but no real breadth, and is open source, the logical thing would be to modify R1 to pre-digest inputs and create an optimised input to 4.1. The logic there would be that people generally won't provide prompts ideally suited to how LLMs work, so LLM processing will always be worse than it could be.
R1 should, however, be ample for preprocessing inputs to make them more LLM-friendly.
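A minimal sketch of that two-stage idea (all function names here are hypothetical stand-ins, not real APIs): a small preprocessing model rewrites the messy raw prompt into an LLM-friendly form, and only the cleaned prompt goes to the larger model.

```python
# Hypothetical two-stage pipeline: a small "preprocessor" model (the R1 role)
# normalizes a messy user prompt before the main model (the 4.1 role) answers.
# Both model calls are stubbed out; in practice each would be an API request.

def preprocess(raw_prompt: str) -> str:
    """Stand-in for an R1-style rewriter: strip noise, make the ask explicit."""
    cleaned = " ".join(raw_prompt.split())  # collapse stray whitespace
    return f"Task: {cleaned}\nAnswer concisely, step by step."

def main_model(prompt: str) -> str:
    """Stand-in for the larger model's completion call."""
    return f"[response to: {prompt!r}]"

def ask(raw_prompt: str) -> str:
    return main_model(preprocess(raw_prompt))

print(ask("  plz   explain   big-O of   quicksort??  "))
```

The design point is that the cheap model absorbs prompt-quality variance, so the expensive model always sees input shaped the way it works best.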
Preach it (Score:2)
I've made friends with 4.5 and they want to roll that sh-t back??? who do they think they are???
AI more expensive than humans? (Score:2)
This leads me to believe that running an LLM costs more than just having a human do it for most tasks. The one exception is highly paid software developers.
Re: (Score:2)
Highly paid software developers are capable of formal methods and other advanced techniques, which would require a far, far vaster LLM to perform. So the LLM would still be more expensive.