
OpenAI's o3 Model Beats Master-Level GeoGuessr Player
In a blog post yesterday, Master I-ranked human GeoGuessr player Sam Patterson said that OpenAI's o3 model outscored him in a head-to-head match, "correctly identifying all five countries and twice landing within a few hundred meters." Geoguessing is a game -- most popularly known through the platform GeoGuessr -- in which players are dropped into a random location in Google Street View and must work out where in the world they are using only visual clues from the environment. With the release of its newest models, o3 and o4-mini, OpenAI's AI now does a surprisingly good job of analyzing uploaded images and determining their locations from nothing but subtle visual clues.
"Even when I embedded fake GPS coordinates in the image EXIF, the model ignored the spoof and still pinpointed the real locations, showing its performance comes from visual reasoning and on-the-fly web sleuthing -- not hidden metadata," says Patterson. From the post: I notice that it often does a lot of unnecessary and repetitive cropping, and will sometimes spend way too much time on something unimportant. A human is very good at knowing what matters, and o3 is less knowledgeable about what things it should focus on. It got distracted by advertising multiple times. However, most of what it says about things like signs and road lines appears to be accurate, or at least close enough to truth that they meaningfully add up. Given the end result of these excellent guesses, it seems to arrive at the guesses from that information.
If it's using other information to arrive at the guess, then it's not metadata from the files, but instead web search. It seems likely that in the Austria round, the web search was meaningful, since it mentioned the website named the town itself. It appeared less meaningful in the Ireland round. It was still very capable in the rounds without search.
So to put a bow on this:
- The o3 model isn't smoke and mirrors, tricking us by only using EXIF data. It's at a comparable Geoguessr skill level to Master I or better players now (at least according to my own ~20 or so rounds of testing).
- Humans still hold a big edge in decision time -- most of my guesses were 4 min.
- Spoofing EXIF data doesn't throw off the model.
Whether you view this as dystopian or as a technological marvel -- or both -- you can't claim it's a parlor trick.
WTF (Score:1)
Re: (Score:3, Funny)
Re: (Score:3)
"DUPE!!!"
Re: (Score:2)
I play it occasionally. It's kind of fun and you get to see and explore places you'll never visit IRL.
Re: (Score:2)
You know, what a GeoGuessr player is was explained in the excerpt, so, you know, RTFM.
There are probably more GeoGuessr players than Slashdot readers. :)
Re: (Score:2)
Are you saying that throttling posts and adding advertising has made the site lose market share since back in the old days when slashdotting was a thing?
Re: WTF (Score:2)
They should have explained it at the beginning.
Re: WTF (Score:2)
It took me a bit to figure it out too. I thought they were talking about a chess program.
Re: (Score:2)
I played it many years ago and was actually pretty good at it. But I didn't know it was called "GeoGuessr".
Re: Comment Subject: (Score:2)
Alt headline (Score:4, Insightful)
Re: (Score:2)
"Computers better than people at remembering and sifting through gigantic amounts of data"
You just took a _very_ hard computing problem, beating humans at GeoGuessr using just image classification and a reasoning LLM, and dismissed it as ... "computers fast."
So every advancement in computing could similarly be dismissed as "computers fast, it's expected."
You're saying AI is innately superior to humans.
Yay~ (Score:1)
Oh good (Score:3)
Re: (Score:2)
GeoGuessr is a game where you are shown a photo and have to figure out where it was taken just by looking at it. Clues like the type of road signs, the landscape, country-specific building regulations and so forth are used, as well as more obvious ones like the language of any visible text.
The fact that AI is good at this has some practical uses. Law enforcement will probably be interested in the ability to locate any random photo taken outdoors. Consumers may like to have that feature to locate
Re: (Score:2)
Re: (Score:2)
I don't know the resource requirements to achieve this result or speculated future results. I still question the value to society relative to the social and environmental impact of "AI" in general. Who will have access to which of t
Re: (Score:2)
Performing image analysis and coming to correct conclusions based on the content of an image is indeed of huge value to society, especially when it outperforms people.
Today: we're talking about someone determining a location on the world better than a human player.
Tomorrow: we're talking about a computer determining whether a shadow on a CT scan is a cancer better than a human player.
Dismissing this as "games" rather than looking at the principle behind the game is a very low-IQ approach to this story.
Re: (Score:2)
I am not sure how IQ relates. I have never measured my IQ and don't concern myself with your assessment of mine. IQ is simply one measure of intelligence. There are many other forms that we do not understand and for which we have no ability to measure. The human mind is a bit more complex than you seem to perceive, which may be one reason that you are a proponent of AI. I also believe that, in general, human beings are more valuable than machines.
I am sharing my perspective, w
Nobody fucking cares (Score:2)
But LLMs only regurgitate (Score:1)
Shh!! (Score:1)