Microsoft Copilot Joins ChatGPT At the Feet of the Mighty Atari 2600 Video Chess (theregister.com) 54

Robert Caruso once again pitted an AI chatbot against Atari 2600 Video Chess -- this time using Microsoft's Copilot instead of ChatGPT. Despite confident claims of chess mastery, Copilot fell just as hard. The Register reports: By now, anybody with experience of today's generative AI systems will know what happened. Copilot's hubris was misplaced. Its moves were... interesting, and it managed to lose two pawns, a knight, and a bishop while the mighty Atari 2600 Video Chess was only down a single pawn. Eventually, Caruso asked Copilot to compare what it thought the board looked like with the last screenshot he'd pasted, and the chatbot admitted they were different. "ChatGPT deja vu."

There was no way Microsoft's chatbot could win with this handicap. Still, it was gracious in defeat: "Atari's earned the win this round. I'll tip my digital king with dignity and honor [to] the vintage silicon mastermind that bested me fair and square." Caruso's experiment is amusing but also highlights the absolute confidence with which an AI can spout nonsense. Copilot (like ChatGPT) had likely been trained on the fundamentals of chess, but could not form strategies. The problem was compounded by the fact that its understanding of the positions on the chessboard differed markedly from reality.

The story's moral has to be: Beware of the confidence of chatbots. LLMs are apparently good at some things. A 45-year-old chess game is clearly not one of them.

  • I tried to play chess with ChatGPT. It constantly said it was not designed to do this. I prodded it and got about 7 moves out of it. It is a chatbot, not a chess player. I know this and it knows this. It did play a great game, but only after I repeatedly asked, "If I played this, what would you do?" It is not a chess player.
  • It is just ludicrously slow. But as a chess player it's surprisingly good, if you're willing to be very, very patient.
    • eh, no. I played that for a few rounds and got bored of beating it. I am on Chess.com if you are willing. I am rated around 1750. Not great, but I play exciting games... with sacrifices and dramatic checkmates.
      • by muntjac ( 805565 )
        1750 on chess.com is like a god compared to most average players, isn't it? I don't think that's a good representation of how most people would fare against the Atari.
        • humbly, yes, I think 1750 is like a god. I was a child prodigy, and was a chess champion. I have many fond memories.
      • by test321 ( 8891681 ) on Thursday July 03, 2025 @08:20PM (#65495416)

        People elsewhere estimate the Atari 2600 Video Chess to be ~1300 Elo https://ancillary-proxy.atarimworker.io?url=https%3A%2F%2Fwww.reddit.com%2Fr%2Fchess... [reddit.com]

        • back in that day, the programs did not know about pawns becoming queens when they reached the eighth rank, and did not recognize simple sacrifices to go for a checkmate. I got bored with them quickly.
          • It's expected. At their core, all classical chess engines search a min-max tree, but instead of doing an exhaustive breadth- or depth-first search, they use heuristics to prioritize some branches of the tree (alpha-beta pruning with move ordering, in the same spirit as A-star).

            On the modest hardware of the older machine there isn't that much you can explore before the player gets bored waiting.
            So obviously, you're going to use much more stringent rules: "Never take a branch where you lose a piece" prunes entire swaths of the tree, rather than "see if sacrificing piece XXX gives us a
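The pruning idea described above can be sketched in a few lines. This is a hedged illustration, not the Atari's actual code: a generic minimax search with alpha-beta cutoffs over a hand-built toy tree, where the tree shape and leaf scores are made up for the example.

```python
# Minimal minimax with alpha-beta pruning over a toy game tree.
# A real engine would search chess positions and score them with an
# evaluation function; here the tree and leaf values are invented.

def minimax(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Return the minimax value of `node`, skipping ("pruning")
    branches that provably cannot change the result."""
    if isinstance(node, (int, float)):  # leaf: static evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, minimax(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:  # opponent will never allow this line
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, minimax(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

# Two plies: the maximizer picks a branch, the minimizer replies.
tree = [[3, 5], [6, 9], [1, 2]]
print(minimax(tree, True))  # → 6
```

The rule "never take a branch where you lose a piece" is exactly such a cutoff, just applied earlier and more aggressively than alpha-beta's provably-safe one.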

    • by AmiMoJo ( 196126 )

      Its main advantage seems to be that it knows where the pieces are on the board.

      I've had ChatGPT forget the current state of things with other stuff too. I asked it to do some web code, and it kept forgetting what state the files were in. I hear that some are better like Claude with access to a repo, but with ChatGPT even if you give it the current file as an attachment it often just ignores it and carries on blindly.

      In fact one bug it created was due to it forgetting what it named a variable, and trying to

      • I've had ChatGPT forget the current state of things with other stuff too. I asked it to do some web code, and it kept forgetting what state the files were in. I hear that some are better like Claude with access to a repo, but with ChatGPT even if you give it the current file as an attachment it often just ignores it and carries on blindly.

        Yup, they currently have very limited context windows.

        And it's also a case of the wrong tool for the job. Keeping track of very large code bases is well within the range of much simpler software (e.g. the thing that powers the "autosuggest" function of your IDE, which is fed from a database of all the function/variable/etc. names in the entire codebase).
        For code, you would need such an exhaustive tool to give the list of possible suggestions, and then the language model to only predict which from the pre-fi
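As a hedged sketch of that division of labor (the names and code snippet below are invented for illustration): a trivial symbol index can do the exhaustive bookkeeping, leaving only the ranking of candidates to a model.

```python
# Toy version of an IDE-style "autosuggest" symbol database: index
# every identifier in the source once, then complete any prefix
# exactly. The code snippet being indexed is made up.

import re

def index_symbols(source):
    """Collect every identifier appearing in the source text."""
    return sorted(set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)))

def suggest(symbols, prefix):
    """All known identifiers starting with `prefix` -- exhaustive,
    unlike a language model, which can hallucinate names."""
    return [s for s in symbols if s.startswith(prefix)]

code = "def render_board(board_state): return draw(board_state)"
symbols = index_symbols(code)
print(suggest(symbols, "board"))  # → ['board_state']
```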

  • by cathector ( 972646 ) on Thursday July 03, 2025 @08:29PM (#65495428)

    this is just click bait.
    everyone knows these models are not good at actual gameplay, nor is it news that they will confidently mis-state stuff. it wasn't news on the first round and it's still not news, and it misses the point that there is a Ton of stuff that humans currently do which the models will do cheaper.

    • To pull at a thread, I did play ChatGPT for 6 or 7 moves. It did do well. I know it scanned and consumed, like, all of the great chess games ever played. It can only predict the next word, or move. That seems to be the nature of LLMs. If I can ever coax ChatGPT to play a whole chess game, I will let you know the results.
      • I know it scanned and consumed, like, all of the great chess games ever played. It can only predict the next word, or move.

        ...and this was already demonstrated eons ago using hidden Markov models.
        (I can't manage to find the website with the exact example I had in mind, but it's by the same guy who had fun feeding both Alice in Wonderland and the Bible into a Markov model and using it to predict/generate funny walls of text.)

        That seems to be the nature of LLMs. If I can ever coax ChatGPT to play a whole chess game, I will let you know the results.

        The only limitation of both old models like HMMs and the current chatbots is that they don't have a concept of the state of the chess board.

        Back in that example, the dev used a simple chess software to keep
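The Markov-model trick being recalled above is easy to reproduce in miniature. A hedged sketch (an order-1 chain over words; the training sentence is made up, and a real hidden Markov model adds hidden states on top of this):

```python
# Order-1 Markov chain: count which word follows which, then
# predict the most likely successor -- next-token prediction
# in its oldest form.

from collections import Counter, defaultdict

def train(text):
    """Count word-to-word transitions in the training text."""
    transitions = defaultdict(Counter)
    words = text.split()
    for a, b in zip(words, words[1:]):
        transitions[a][b] += 1
    return transitions

def predict_next(transitions, word):
    """Most frequent successor of `word`, or None if unseen."""
    followers = transitions.get(word)
    return followers.most_common(1)[0][0] if followers else None

model = train("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # → cat
```

Feed it Alice in Wonderland instead of one sentence and you get funny walls of text; feed it chess games in algebraic notation and you get move prediction with exactly the same blindness to board state.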

          The only limitation of both old models like HMMs and the current chatbots is that they don't have a concept of the state of the chess board.

          I'm not so sure. I was recently working on some font rendering software with ChatGPT, parsing the glyphs of TrueType files. The glyphs are described by contours: closed loops of (x,y) points that define Bezier curves. I had parsed out the (x,y) points for the contours of the character 'a' in some rare font, and I gave ChatGPT these points (but I didn't tell it what character they were from). It suddenly occurred to me to ask a side question of whether it could recognize this
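For context on what those contour points encode (a hedged aside; the coordinates below are invented, not taken from any real font): TrueType outlines are closed loops of points in which off-curve points act as control points of quadratic Bezier segments.

```python
# Evaluate one quadratic Bezier segment of a glyph contour.
# TrueType uses quadratics: on-curve endpoints p0 and p2, with
# off-curve control point p1. The points here are made up.

def quad_bezier(p0, p1, p2, t):
    """Point at parameter t (0..1) on the quadratic Bezier p0-p1-p2."""
    x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * p1[0] + t ** 2 * p2[0]
    y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * p1[1] + t ** 2 * p2[1]
    return (x, y)

# An arch from (0, 0) through control (50, 100) to (100, 0),
# roughly the shape of the top of a letter bowl.
print(quad_bezier((0, 0), (50, 100), (100, 0), 0.5))  # → (50.0, 50.0)
```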

            It suddenly occurred to me to ask a side question of whether it could recognize this character from these contour points, and it could. It said it "looked" like the letter 'a'.

            You only asked it once. It could be any of:

            - You got one lucky answer. It could be that chatbots are bad at classifying glyphs and you just got one lucky answer (see: spelling questions. Until byte-based input handlers are a thing, a chatbot doesn't see the ASCII chars forming a string, it only sees tokens -- very vaguely, but a good enough metaphor: words). (Same category: all the usual "Chatbot successfully passes bar/med school/Hogwarts final exams better than average students" press releases.) But give it

    • I think it's kind of delicious to see chess used as a benchmark of intelligence again. Of course the chatbot could be augmented with a chess engine that it knows how to invoke, to easily beat any human. But using an LLM as a chess engine itself is a nice challenge. Maybe there's a way to do it, or maybe AI as we know it needs more visualization capability. Or maybe the AI could write a good chess engine, given the rules.

      In any case a single guy trying to prove a negative by failing to do something (set

      • I would prefer that an AI company just go for general intelligence. Maybe something self-aware. I sometimes think that I am smarter and better than other people because I can beat them at chess, but then I hear my mom yelling at me, and can almost feel her slapping me on the butt, telling me that being good at one thing does not mean that I am "better than" other people.
      • by allo ( 1728082 )

        You could maybe get an average engine if you really optimized your prompts with things like "Always output a representation of the current board at the end of your answer", but by just using a dialog and hoping that all moves make sense, you are really benchmarking whether the LLM can reconstruct the state of the board from a long dialog, with some weak move as a bonus. If you want to make this competition fair, first think real hard about how to optimize an LLM to play chess as well as possible, instead of using an arbitr

    • it misses the point that there is a Ton of stuff that humans currently do which the models will do cheaper.

      A WORKING AI can do a lot of things cheaper than MANY jobs. However, nothing I have seen can reliably generate Java that compiles... it makes obvious, basic syntax mistakes, like missing braces or semicolons placed randomly. Java may not be your jam, but if it can't do that... what can it do? That's an easy use case and perfect for AI.

      I use Claude 4.0 daily at work under a mandate from my employer... their quote: "those who don't embrace AI will be replaced by programmers who do." OK, kewl... I want to be more pr

      • The code compiles about 50% of the time. It often can't even match braces and will introduce semi-colons in the middle of statements.

        That doesn't match my experience at all with C++ and Python. I use the Plus version of ChatGPT, and I've literally never seen it make a syntax error since the GPT-4-based models launched. It occasionally produces subtle logic bugs (but so do humans), but we're talking about once every few thousand lines of code - and never something as basic as mismatched braces or misplaced semicolons. Given that Java has a simpler grammar and is so popular, I would have expected its proficiency in Java to be at least as good.

        You menti

  • HVAC Repair... (Score:5, Interesting)

    by NobleNobbler ( 9626406 ) on Thursday July 03, 2025 @08:41PM (#65495460)

    Was diagnosing an HVAC low-delta problem on 3 hours of sleep and tried some LLMs as an experiment. They all rang 20-alarm fires, said the compressor was going to explode, and absolutely went all in on catastrophe.

    Then I noticed the liquid line had a lower pressure than the suction line.

    I reversed the probes.

    The things that "AI" misses are outrageous. The language it uses is definitive and it draws on complex topics.

    And it misses literally classroom 101 common sense sanity checks.

    • I work with insanely high voltages, and with software too. I do research and development. We have to have a firm grasp of what AI is now and what it is not. It does not have common sense; it simply predicts the next word in a sentence. I find that amazing when I am writing software.
      • by leptons ( 891340 )
        I find the LLM is wrong 90% of the time when it tries to write software for me.
        • me2. I like that it does the simple stuff, though. It is great at syntax and making loops. It goes nuts when I tell it that stuff does not work; it keeps apologizing and telling me that it does work.
  • Everyone here is so righteous and says this isn't AI, but this thing can make moves and use its limited abilities to predict the next best move. Given its limited skill set, the fact that it can even play is already miraculous.
    • Re:Well darn (Score:4, Insightful)

      by evanh ( 627108 ) on Friday July 04, 2025 @02:38AM (#65495918)

      The Atari 2600 manages with maybe a 0.1 MIPS single core, 128 bytes of RAM, a few kilobytes of ROM, and one or two watts of electricity.

    • the fact that it can even play is already miraculous.

      LLMs invent invalid moves. It is somewhat miraculous that they can sometimes generate valid ones, but the real miracle is that people think there is intelligence there. It's a miracle for the people selling "AI" anyway.

      • Here in Russia, ChatGPT is the only thing keeping me alive. The local doctors are grossly unqualified, negligent, and basically don't care about your health.

        But yeah, it's not "AI"; too bad real fucking two-legged intelligence doesn't give a fuck about what's going on with me, nor does it even try to help.

        And it's not exclusive to my country. r/ChatGPT has already seen many stories about people who have saved themselves or their loved ones using a next token predictor.

        Lastly, this will be a revelation for

  • A chess AI like the Atari's has an exact internal representation of the current state, can keep a history of previous states (I don't know if it does), has a list of legal moves with no option to attempt anything else, and a planning algorithm that can look many moves ahead before deciding on one.

    They benchmarked this against an LLM, which needs to parse the history of the game to (hopefully) get some kind of meaningful representation in its latent space and can then try to make up a move. Even with reasoning it

    • I wonder if you wouldn't win if you just told ChatGPT to write a chess AI and then used that chess AI to beat the Atari. Writing code is something text models are good at. Playing chess is not.

      The devil is in the details.
      All classical chess engines are heuristic tree searches: they explore a min-max tree, but use heuristics to prioritize some branches instead of doing a breadth- or depth-first exhaustive search.
      Generating a template of a standard chess algorithm would probably be easy for a chatbot (these are prominently featured in tons of "introduction to machine learning" courses that the LLM's training could have ingested); writing the heuristic function to guide the search is more of an art form and is probably where the chatbot is going

      • by allo ( 1728082 )

        The Go breakthrough was Monte Carlo tree search. That's also a planning algorithm, with a good heuristic for continuing to explore underrated paths. The main heuristic for whether a path is good is "play it to the end (randomly or with a good simple heuristic) and see who wins", so you get to explore which prefixes in the tree yield good heuristic results. The main advantage is that you can do this in a highly parallel way, and you can stop it at any time and play the best move found so far. Go AI got really good when they added neural networ
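The "play it to the end and see who wins" heuristic is easy to demonstrate on something far smaller than Go. A hedged sketch: flat Monte Carlo move selection (the core of MCTS, without the tree) for simple Nim, where taking the last stone wins; the choice of game and playout count are arbitrary.

```python
# Flat Monte Carlo move selection for simple Nim (take 1-3 stones,
# taking the last stone wins): for each legal move, finish the game
# many times with random play and keep the best win rate. Full MCTS
# adds a tree over this, biased toward promising prefixes.

import random

def playout(pile, our_turn):
    """Play randomly to the end; True if we take the last stone."""
    while True:
        pile -= random.randint(1, min(3, pile))
        if pile == 0:
            return our_turn
        our_turn = not our_turn

def best_move(pile, playouts=200):
    """Legal move with the highest random-playout win rate."""
    scores = {}
    for move in range(1, min(3, pile) + 1):
        if pile - move == 0:
            scores[move] = playouts  # taking the last stone wins now
        else:
            scores[move] = sum(playout(pile - move, our_turn=False)
                               for _ in range(playouts))
    return max(scores, key=scores.get)

print(best_move(2))  # → 2 (take both stones and win immediately)
```

You can stop the playouts at any time and still have a ranking of moves, which is the anytime property described above.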

        • It is interesting that they manage to play a game to the end, but there is no point to have them play in a competition.

          Oh, definitely. I wasn't musing in the sense of "let's send a ChatGPT-powered chess engine to a tournament!",
          more like "maybe a ChatGPT-powered chess engine could manage to play more than a couple of rounds, enough to keep your nephew entertained".

    • You could probably have an 'AI' that was not a language model but purely a predictive model for chess moves. There are a great many chess games, especially at a high level, that consist entirely of moves someone has played before. You could feed it a shit-ton of games, then feed it moves in the same format as the training data. I imagine you'd be able to confuse that 'AI' by playing crazy nonstandard moves, but as long as you stuck to standard lines it would probably give you a pretty good game.

      That's one of the
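A hedged sketch of that pure-prediction idea (the "database" here is three made-up opening fragments, not a real game collection): store games as move sequences and answer "what was most often played after this exact prefix?".

```python
# Opening-book-style move prediction: map every seen prefix of
# moves to a count of continuations, then predict the most common
# one. The games below are tiny invented fragments in algebraic
# notation, not a real database.

from collections import Counter

def build_book(games):
    """Map each prefix of each game to a Counter of next moves."""
    book = {}
    for game in games:
        for i in range(len(game)):
            prefix = tuple(game[:i])
            book.setdefault(prefix, Counter())[game[i]] += 1
    return book

def predict(book, moves_so_far):
    """Most common continuation, or None once we leave known lines."""
    followers = book.get(tuple(moves_so_far))
    return followers.most_common(1)[0][0] if followers else None

games = [
    ["e4", "e5", "Nf3", "Nc6"],
    ["e4", "e5", "Nf3", "Nf6"],
    ["e4", "c5", "Nf3"],
]
book = build_book(games)
print(predict(book, ["e4"]))  # → e5
```

The `None` case is precisely the weakness noted above: one crazy nonstandard move and the predictor has nothing to say, whereas a real engine just keeps searching.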

      • by allo ( 1728082 )

        The field of AI is more than LLMs. And the Atari chess engine is certainly part of that field. And yeah, there are many architectures, with neural nets and without, that can easily beat the Atari. And as I said, I think some of the top LLMs could probably write quite good chess programs, as most of the algorithms are widely documented, discussed, and contained in tutorial texts. I'm not sure how well the Atari plays, but I guess many people here would first need to read some articles about modern chess AIs (thes

  • ...except media gullibility. Look, I get it. Watching a generative AI flail against an Atari 2600 is funny. It plays well on social media. It makes people feel good about “real” computing. But let’s be clear: LLMs getting curb-stomped by 8-bit silicon in a chess match isn’t just apples to oranges — it’s apples to architecture diagrams.

    ChatGPT and Copilot are language models. They don’t play chess the way AlphaZero or even Stockfish does. They generate plausible descriptions of chess moves based on training data. They aren’t tracking game state in structured memory. They don’t use a search tree or evaluation function. They’re basically cosplaying a chess engine — like a high schooler pretending to be a lawyer after binge-watching Suits.

    And you can flip that analogy around and still make it work: expecting an LLM to beat a dedicated chess algorithm is like asking Tom Cruise to fly a combat mission over Iran just because he looked convincing doing it on screen.

    Meanwhile, even the humble Atari 2600 version of Video Chess was running a purpose-built minimax search algorithm with a handcrafted evaluation function — all in silicon, not tokens. It doesn't have to guess what the board looks like. It knows. And it doesn't hallucinate, get distracted, or lose track of a bishop because the move history got flattened in the working token space.

    So what does this little stunt prove? That LLMs aren't optimized for real-time spatial state tracking? Shocking. That trying to bolt a complex turn-based system onto a model that lacks persistent memory and visual context is a bad idea? Groundbreaking. That prompt-driven hubris doesn’t equal capability? You don't say.

    This isn’t a fair fight. It's a stunt for attracting eyeballs and mouse clicks. And it's about as informative as asking an Atari to write a sonnet or explain Gödel’s incompleteness theorems — both of which LLMs can do, and often better than most poets or mathematicians could manage on the fly. Wake me when someone wires up a transformer-based architecture with structured spatial memory and an embedded rules engine — something capable of reproducing the cognitive contours in Hilbert space that mirror what biological chess engines like Boris Spassky or Bobby Fischer did in their wetware.

    Until then, all this proves is that language models are terrible chess engines — which is like saying your microwave is bad at making omelets. We knew that already.

    • Until then, all this proves is that language models are terrible chess engines — which is like saying your microwave is bad at making omelets. We knew that already.

      OK, so what is it good for? ChatGPT was released 3 years ago. Think of how much the internet advanced in the 3 years after Netscape Navigator 1.0 was released. What is something commercially valuable these LLMs can actually do that we can objectively see? (No, Mark Zuckerberg promising that his developers are so much more productive with them is pure bullshit... he has presented no evidence, only subjective "vibes".)

      The wealthiest companies in history have poured trillions into and hired the best minds an

      • OK, so what is it good for?

        LLMs excel at writing filler text which is mandated by some ritual but whose content is unimportant. For example, your fifth-grade report on the life and times of George Washington Carver, which is required work to get a grade but not anything likely to contribute to the sum total of human knowledge.

        An LLM can write a summary better than a bored student, but that's really only of value to the student wanting to cheat -- it doesn't benefit society in any way, and the real value of educating the student is l

        • OK, so what is it good for?

          LLMs excel at writing filler text which is mandated by some ritual but whose content is unimportant. For example, your fifth-grade report on the life and times of George Washington Carver, which is required work to get a grade but not anything likely to contribute to the sum total of human knowledge.

          An LLM can write a summary better than a bored student, but that's really only of value to the student wanting to cheat -- it doesn't benefit society in any way, and the real value of educating the student is lost.

          This is the same dusty argument that gets trotted out every time a new technology challenges the gatekeepers of rote effort. Yes, LLMs can write fifth-grade reports — just like calculators can do your long division, Photoshop can color-correct your vacation photos, and Google Maps can tell you where north is without needing a compass or a sextant. That doesn’t mean the tools are pointless. It means the task was never the point — the thinking behind it was.

          You’re fixating on the most

      • Until then, all this proves is that language models are terrible chess engines — which is like saying your microwave is bad at making omelets. We knew that already.

        OK, so what is it good for? ChatGPT was released 3 years ago. Think of how much the internet advanced in the 3 years after Netscape Navigator 1.0 was released. What is something commercially valuable these LLMs can actually do that we can objectively see? (No, Mark Zuckerberg promising that his developers are so much more productive with them is pure bullshit... he has presented no evidence, only subjective "vibes".)

        The wealthiest companies in history have poured trillions into and hired the best minds and turned it loose on the collective public imagination. What do they have to show for it? Is there anything we can objectively measure?

        You’re right to ask what LLMs are good for — but comparing them to Netscape Navigator isn’t quite as apt as you might think. Netscape was a browser, built on protocols like TCP/IP that were already designed to do something specific: connect people and allow them to share information over the internet. Those technologies grew, evolved, and helped spark the explosion of the web.

        In fact, let’s take a moment to look back at the history of TCP/IP itself. These protocols were developed in

          • Everyone is scared of AI taking their job... legitimately so... but think about it from the pro-AI side: where's their victory? You brought up games. Where's a game that was built with AI, or maybe a great game built in record time by a small team using AI that looks like a AAA game? Why aren't games shipping faster? If the big studios haven't embraced them, then surely upstarts would be building some mind-blowing games that were built in a few weeks but look like they took many years in a AA
          • You’re asking a fair question, but it’s worth clarifying: games aren’t being built entirely by AI — and no serious dev thinks they can crank out a AAA title in a few weeks with ChatGPT. But AI tools are already being used in real production pipelines.

            For example, Ubisoft's Ghostwriter [ubisoft.com] helps narrative designers create ambient NPC dialogue — a massive time-saver for open-world games. NVIDIA’s ACE [gamedeveloper.com] showed AI-driven NPCs responding to unscripted player dialogue. Ubisoft's Capt

              • I'm definitely aware, first-hand, that AI can't do anything on its own. You listed many tools. They sound impressive in isolation. However, you're just repeating what the AI salespeople tell us... "I'll give you new, shiny tools that'll make you SOOOOOO productive." Mark Zuckerberg says his AI can replace a mid-level coder. So I use one daily and it sucks (Claude 4.0). I've dabbled with ChatGPT and Copilot and they were far worse... OK, so maybe I'm the issue and I suck... well... there's still no result
    • by jonadab ( 583620 )
      So what you're saying is, the job LLMs are really best suited for is acting.

      Or generating scripts for a children's program called "Let's Play Make Believe".
