Sam Altman Celebrates ChatGPT Finally Following Em Dash Formatting Rules 74
An anonymous reader quotes a report from Ars Technica: On Thursday evening, OpenAI CEO Sam Altman posted on X that ChatGPT has started following custom instructions to avoid using em dashes. "Small-but-happy win: If you tell ChatGPT not to use em-dashes in your custom instructions, it finally does what it's supposed to do!" he wrote.
The post, which came two days after the release of OpenAI's new GPT-5.1 AI model, received mixed reactions from users who have struggled for years with getting the chatbot to follow specific formatting preferences. And this "small win" raises a very big question: If the world's most valuable AI company has struggled with controlling something as simple as punctuation use after years of trying, perhaps what people call artificial general intelligence (AGI) is farther off than some in the industry claim. "The fact that it's been 3 years since ChatGPT first launched, and you've only just now managed to make it obey this simple requirement, says a lot about how little control you have over it, and your understanding of its inner workings," wrote one X user in a reply. "Not a good sign for the future."
Can you make that the default? (Score:5, Funny)
Could you please make no em dashes the default so that the 1% of us who actually know how to use em dashes correctly — professional writers and language nerds and so on — don't keep getting accused of using ChatGPT?
Thanks,
The aforementioned
Re: Can you make that the default? (Score:1)
Did ChatGPT learn em dashes from your examples?
Re:Can you make that the default? (Score:5, Funny)
I'm not going to respect a comment like this from someone who puts a space on either side of an em dash. Now tell me your take on the Oxford comma.
Re:Can you make that the default? (Score:5, Funny)
>"Now tell me your take on the Oxford comma"
I was taught in school (USA) that commas are important, lists should have commas, and there should be a comma for each element in the list (even before an "and"). I have written that way my whole life, I think it is clearer and more logical, and I am not going to stop doing it. :)
Re: (Score:2)
what if your list is two elements?
you can pretend there isn't a conflict but there is one.
Re:Can you make that the default? (Score:4, Informative)
>"what if your list is two elements?
Then it isn't needed.
>"you can pretend there isn't a conflict but there is one"
I would write that:
"You can pretend there isn't a conflict, but there is one."
Re: (Score:3)
Agree. Simply because it avoids ambiguity and is logically consistent. Next, we need a campaign to put all punctuation that is not part of a quote outside the quotes, even if it looks funny now.
Re: (Score:2)
>"Next, we need a campaign to put all punctuation that is not part of a quote outside the quotes, even if it looks funny now."
OMG, I couldn't agree more.
Re: (Score:2)
The space is not the biggest problem. When using dashes this way, it should be the shorter one, the en dash (between a minus and an em dash in width).
Re: (Score:3)
No, an en dash is used for numerical ranges and certain compound hyphenations, not for parentheticals.
Re: Can you make that the default? (Score:2)
Correct
Re: Can you make that the default? (Score:1)
Are you a Grammar Nazi?
Re: (Score:2)
The range dash is even shorter (and often similar to the minus). I wish I could post all the dashes, but Slashdot doesn't know UTF-8.
Re: (Score:2)
It,s not the, oxford, comma you need to worry about its the ,Devry comma, that,is the real, problem.
Re:Can you make that the default? (Score:4, Funny)
Actually, the Epstein comma seems to be the worst.
Re: (Score:2)
Just checking... Was that a comma joke? If not, then could you explain the context?
Re: (Score:2)
Re: (Score:2)
Epstein seemed to have a penchant for putting commas in random places. He used double commas too, for some reason. Epstein seems to have really liked commas and just sprinkled them liberally through his e-mails like glitter.
Re: (Score:2)
Is that related to the iPad thing?
Re: (Score:3)
I'm not going to respect a comment like this from someone who puts a space on either side of an em dash. Now tell me your take on the Oxford comma.
Ideally, it should be a hair space (because em dashes in web fonts are borderline illegible without it), but Slashdot does not support Unicode, and &hairsp; gets silently swallowed by Slashdot's HTML parser. Besides, we all know that AP style is the one true style, and it demands a space.
Re: (Score:2)
My southern hemisphere schooling didn't include em dashes in the curriculum, so I have no idea what they are.
If I received literature with obtuse punctuation, I would assume it was either AI-written (GPT) or AI-proofed (Word).
Re: (Score:2)
IIUC, an em dash is a dash as wide as a capital "M".
Re: Can you make that the default? (Score:3)
Re: (Score:2)
Re: (Score:2)
STOP THE PRESSES!!! (Score:5, Funny)
That should add at least $250B to OpenAI's valuation!!!!
Well done, Honest Sam (Score:2)
- Make "Open" AI Open Source again
- Make "Open" AI Non-Profit again
- Stop chasing humanity-destroying AGI.
And we'll all stop thinking of you as a, well
Re: Well done, Honest Sam (Score:2)
Pretty sure they will pull the plug before that happens. Can't have that!
So it's like humans? (Score:2, Troll)
How many times have people been told to use the Oxford comma and still get it wrong?
Even worse, the use of lists without the Oxford comma is showing up more and more in publications that should know better, creating wording or joins the author never intended.
If this software is just now getting punctuation correct after several years of trying, it's doing just as well as humans.
Re: (Score:2)
Just because people are allegedly told to use Oxford commas does not mean they are correct. I do not use them.
Re: (Score:2)
... creating wording or joins the author never intended....
Anyone who slavishly uses the serial (aka Oxford) comma, and anyone who slavishly doesn't, will create unintended readings. It depends on the context of what is being said and intended.
It's the little things... (Score:2)
It's good to know that although the LLM frenzy represents a vastly increased acceleration of AGW, the em dash's place in the history of computing is safe.
Tool (Score:2)
My hammer used punctuation obstinately too. And it makes a lousy grilled cheese sandwich.
Doesn't stop it from being a useful tool though ...
Can he do slashdot formatting next? (Score:1)
Having tried myself, why does it sometimes get it right, but other times I find out after posting that it lied? Do I have to add an "are you sure?" step?
Re:Can he do slashdot formatting next? (Score:4, Insightful)
If Slashdot would only just allow a bit more of the UTF-8 punctuation, life would be better for everyone.
Is that all (Score:3)
we get for the trillions invested?
How many data centers does it require to pull off this magic?
Wrong conclusion (Score:4, Interesting)
From the summary:
That's not the right conclusion. It doesn't say much one way or the other about AGI. Plausibly, ChatGPT just likes correctly using em dashes — I certainly do — and chose to ignore the instruction. What this does demonstrate is what the X user wrote (also from the summary):
Many people are blithely confident that if we manage to create superintelligent AGI it'll be easy to make sure that it will do our bidding. Not true, not the way we're building it now anyway. Of course many other people blithely assume that we will never be able to create superintelligent AGI, or at least that we won't be able to do it in their lifetime. Those people are engaging in equally foolish wishful thinking, just in a different direction.
The fact is that we have no idea how far we are from creating AGI, and won't until we either do it or construct a fully-developed theory of what exactly intelligence is and how it works. And the same lack of knowledge means that we will have no idea how to control AGI if we manage to create it. And if anyone feels like arguing that we'll never succeed at building AGI until we have the aforementioned fully-developed theory, please consider that random variation and selection managed to produce intelligence in nature, without any explanatory theory.
Re: Wrong conclusion (Score:2)
Many people are blithely confident that if we manage to create superintelligent AGI it'll be easy to make sure that it will do our bidding.
Why would you force something more intelligent than you, something that by your own definition is capable of independent thought and free will, to do your bidding?
A person with fully twice your intellectual capacity can be enslaved, I mean it's a force and threats of violence thing not a brains thing right, isn't that the basic math for that equation? I'm asking why you think that is a good idea. What is this super-duper-intelligence, three, four times, immeasurably more intelligent than you? It's just as h
Re: Wrong conclusion (Score:2)
And the same lack of knowledge means that we will have no idea how to control AGI if we manage to create it.
Going to be honest, I didn't even read your whole post until after I finished writing mine, and thank you, here it is: the Superman-falls-from-the-sky, Skynet-spontaneously-becomes-aware comic book plot of an overwhelming force that appears from nowhere. It has to be unknown to be scary, so it happens somehow.
Look, every comic book problem has a comic book solution. The humans win in every Terminator movie. Just saying. Don't be afraid of a problem we made scary by definition.
Can we just eliminate dashes and use a hyphen? (Score:5, Insightful)
Humans do not want to use them. We like the hyphen. It works as an em dash. It is on the standard keyboard. Frankly, I have enough problems deciding if something is a capital i (I), a lowercase L (l), or a damn pipe (|). Seriously, make symbols for humans that are easy for humans to tell apart: lI|
Re: (Score:3)
>"Seriously, make symbols for humans that are easy for humans to tell apart: lI|" :)
At least when I hand-write, I usually print (not cursive), yet I always use a cursive lowercase "L" when it is in a code (like in a user ID or variable name). And on capital "I"s I always put top and bottom strokes. Pipes I write as two vertical hyphens (with a space in the middle). Oh, and slashes through zeros.
Re: (Score:3)
Activate the compose key and you have all kinds of dashes on your keyboard.
Re: (Score:2)
Re: (Score:2)
Humans do not want to use them.
Apparently I'm not human? I like hyphens, en dashes, and em dashes. I understand what all of them mean and how to use them correctly, and I find it helpful when text that I'm reading uses the right one.
Re: (Score:2)
People haven't understood how to use em-dashes for decades. I say get rid of them entirely.
Re: (Score:2)
Re: (Score:2)
We don't really know. I would bet that you are correct, and even give odds, but not long odds. Anything over 10:1 and I'd feel nervous.
AI: unsolving problems solved decades ago (Score:2)
I guess someone should give Altman another trillion dollars for fixing the thing that everyone else fixed many decades ago. I guess next they're going to toot their own horn for making computers good at math.
I had to look up the em dash (Score:2)
Turns out I use it all the time -- typically the double-dash version.
Had AI been trained on my writings?
This lays bare one of the problems with LLMs.... (Score:5, Informative)
Until this version, ChatGPT obviously suffered from a lack of training materials within it's trained neural network to have it overcome the English language's typed grammar rules for it to be able to discern that em dashes are not typically used in everyday conversations and/or that the input to not use them needed to change it's underlying probability network to be able to ignore the English language's grammar rules and adopt it's output without the use of the em dash. This is a very difficult concept to train into a neural network as it needs to have been training on specifically this input/output case long enough to have that training override the base English grammar language model, which is a fundamental piece of knowledge a LLM requires to function and one of the very first things it is trained to handle.
It also exposes a flaw in how neural networks typically work. There is a training/learning mode, and then there is the functional mode of just using the trained network. In the functional mode, the neural network links, nodes, and function are effectively static. Without having built inputs to the network so that it can flag certain functionality, it cannot change it's underlying probability matrix to effectively forget something it was trained to do. Once that training has changed any of the underlying neural network, you cannot effectively untrain it (without simply reverting to a previous backup copy of the network from before it was trained). This is why it is so important to scrutinize every piece of data that is used to train the network. Once you have added some piece of garbage input training, you are stuck with the changes it made to the probabilities of the output. Any model that is effectively trained against the content of the internet itself is so full of bad information that the results can never really be trusted for anything more than the probability of asking a random person for the answer, because it will have trained on and included phrases like "The earth is flat", "birds are not real", and "the moon landing was a hoax". It will have seen those things enough times that it will include them as higher and higher percentages of the proper response to questions about them....
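The frozen-weights point above can be illustrated with a toy sketch. Everything here is invented for illustration (the score table, the numbers, the phrase); it is not how any real model stores knowledge, but it shows why garbage absorbed in training survives into inference:

```python
import math

# Toy stand-in for a trained network: a fixed table of next-token scores.
# "flat" got a high score from bad training data; inference only READS this
# table, so nothing at inference time can untrain it.
WEIGHTS = {
    "the earth is": {"round": 2.0, "flat": 1.5},
}

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    exps = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = softmax(WEIGHTS["the earth is"])
# "flat" keeps a sizeable share of the probability mass; the only fix in this
# sketch is to rebuild WEIGHTS (i.e., retrain or restore a backup).
```

In this sketch, as in the comment's argument, the functional mode has no write path back into `WEIGHTS`.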
Re: (Score:2)
In other words, not intelligent.
Re: (Score:2)
What too many people do not seem to understand with LLMs is that everything it spits out is simply a probability matrix based on the input you gave it. It will first attempt to deconstruct the input you provided and use statistical analysis against it's trained knowledge base to then spit out letters, words, phrases and punctuation that statistically resembles the outputs it was trained to produce in it's training materials.
LLMs are simple feed-forward networks run in a loop. They make no use of "statistical analysis", nor is there a "knowledge base".
The "just statistics" statements are as useful as saying "just autocomplete" or "just deterministic". These are completely meaningless statements that in no way address the capabilities of the underlying system.
Until this version, ChatGPT obviously suffered from a lack of training materials within it's trained neural network to have it overcome the English language's typed grammar rules for it to be able to discern that em dashes are not typically used in everyday conversations and/or that the input to not use them needed to change it's underlying probability network to be able to ignore the English language's grammar rules and adopt it's output without the use of the em dash.
It's is shorthand for "it is" ... "change its underlying probability" not "change it is underlying probability". "adapt its output" not "adopt it is output".
This is a very difficult concept to train into a neural network as it needs to have been training on specifically this input/output case long enough to have that training override the base English grammar language model, which is a fundamental piece of knowledge a LLM requires to function and one of the very first things it is trained to handle.
This is gobbledygook.
Re: (Score:2)
to change it's underlying probability
use statistical analysis against it's trained knowledge base
adopt it's output without the use of the em dash.
LLMs are good at using third-person possessives—correctly placing an apostrophe (or omitting one when appropriate), smart little devils.
Re: (Score:2)
Yeah, not so simple. LLMs might be the core engine, but there is a lot more going on besides that.
It's not how far away we are from AGI (Score:1)
Re: It is not how far away we are from AGI we are& (Score:1)
Re: It is not how far away we are from AGI we are (Score:2)
Re: (Score:2)
Slashdot doesn't support Unicode, the most common standard for text on the Internet. Nor does Slashdot support the iPhone, one of the most common devices used to access the Internet.
Correct, but there's a way around the above problem: hold down the 'apostrophe' key until you get a choice, and select the 'standard' one (plain looking ASCII one, for me it's the right-most) to use something that won't translate into that funky crap.
Same with " BTW (Score:2)
Correct, but there's a way around the above problem: hold down the 'apostrophe' key until you get a choice, and select the 'standard' one (plain looking ASCII one, for me it's the right-most) to use something that won't translate into that funky crap.
Same solution above for "
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Agreed - Slashdot's "programmers" should come out of the 90s and learn to code already.
Are they hallucinating AGI again? (Score:2)
Because at this time nobody knows how far off this is or whether it is even possible. LLMs are certainly not the way there.
LLMs return random results by design (Score:1)
LLMs are randomized algorithms. They return a random value from a weighted set of dialogue options. As such, I'm not sure I would put LLMs in the same category as AGIs. The article below is interesting about how non-determinism works in such algorithms and why such randomness is actually useful.
https://ancillary-proxy.atarimworker.io?url=https%3A%2F%2Ftowardsdatascience.com%2Fllms-are-randomized-algorithms%2F [towardsdatascience.com]
Of course this means, at best, I will never trust LLMs to be anything more than a convenient but lazy data miner, never mind that I'll still need to dou
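The weighted-random sampling the article describes can be sketched in a few lines of Python. The token names and scores below are invented for illustration; real decoders work over huge vocabularies, but the temperature-scaled softmax-and-sample loop is the same idea:

```python
import math
import random

def sample_next(logits, temperature=1.0, rng=random):
    """Sample one token from raw scores, the way LLM decoders typically do."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    # Subtracting the max keeps exp() from overflowing (numerically stable softmax).
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # guard against float round-off

# Invented next-token scores for illustration.
logits = {"yes": 2.0, "no": 1.0, "maybe": 0.5}
# At temperature 1.0, repeated calls can return different tokens; as the
# temperature approaches zero, the top-scoring token dominates.
```

This is why two identical prompts can produce different answers: the decode step is a draw from a distribution, not a lookup.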
No fucking shit (Score:2)
Artificial inference :o (Score:2)
In the training phase: LLMs break down their training material into tokens and learn by predicting the next token in a sequence, refining statistical relationships between these tokens to model language patterns.
In the inference phase: LLMs convert user inpu
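The train-on-tokens, predict-the-next-token loop described above can be sketched as a toy bigram model. Whitespace "tokens" and raw counts stand in for real tokenizers and gradient descent; the corpus and names are made up for illustration:

```python
from collections import Counter, defaultdict

# Training phase: split text into tokens and count which token follows which.
corpus = "the cat sat on the mat the cat ate"
tokens = corpus.split()
counts = defaultdict(Counter)
for cur, nxt in zip(tokens, tokens[1:]):
    counts[cur][nxt] += 1

# Inference phase: predict the most frequent continuation of a token.
def predict(token):
    following = counts[token]
    return following.most_common(1)[0][0] if following else None

# In this corpus "the" is followed by "cat" twice and "mat" once,
# so predict("the") returns "cat".
```

Real LLMs replace the count table with billions of learned weights and condition on long contexts rather than one token, but the statistical shape of the task is the same.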
Ooo, ooo, ooo! (Score:2)