Submission + - Anthropic can now track the bizarre inner workings of a large language model
What the firm found challenges some basic assumptions about how this technology really works.
MIT Technology Review
https://ancillary-proxy.atarimworker.io?url=https%3A%2F%2Fwww.technologyreview.com%2F2025%2F03%2F27%2F1113916%2Fanthropic-can-now-track-the-bizarre-inner-workings-of-a-large-language-model%2F (paywalled)
https://ancillary-proxy.atarimworker.io?url=https%3A%2F%2Farchive.is%2F4mujU (free)
= This caught my eye first: studying something that claims to be brainy with the tools and approaches of brain research.
Anthropic says it was inspired by brain-scan techniques used in neuroscience to build what the firm describes as a kind of microscope that can be pointed at different parts of a model while it runs. The technique highlights components that are active at different times. Researchers can then zoom in on different components and record when they are and are not active.
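= To make the "record when components are active" idea concrete, here is a rough toy sketch, not Anthropic's actual tooling (which traces learned interpretable features, not raw neurons): run an ordinary open model with forward hooks and note which MLP units light up on a prompt. The model name and activation threshold are arbitrary placeholders.

```python
# Toy sketch only: note which MLP units "light up" while a small open model
# processes a prompt -- loosely analogous to recording when components are
# active. Anthropic's "microscope" works on learned features, not raw neurons;
# the model name and threshold here are arbitrary placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder small model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

active = {}  # layer index -> set of unit indices that exceeded the threshold

def make_hook(layer_idx, threshold=2.0):
    def hook(module, inputs, output):
        # output: (batch, seq_len, hidden) activations from this layer's MLP
        fired = (output.abs() > threshold).flatten(0, 1).any(dim=0)
        active.setdefault(layer_idx, set()).update(
            fired.nonzero().flatten().tolist()
        )
    return hook

handles = [
    block.mlp.register_forward_hook(make_hook(i))
    for i, block in enumerate(model.transformer.h)
]

with torch.no_grad():
    model(**tok("The capital of France is", return_tensors="pt"))

for h in handles:
    h.remove()

for layer, units in sorted(active.items()):
    print(f"layer {layer}: {len(units)} units above threshold")
```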
= Secondly, LLMs get "better" when they know when to shut up.
The latest generation of large language models, like Claude 3.5, Gemini, and GPT-4o, hallucinate far less than previous versions, thanks to extensive post-training (the steps that take an LLM trained on text scraped from most of the internet and turn it into a usable chatbot). But the team, led by Anthropic research scientist Joshua Batson, was surprised to find that this post-training seems to have made Claude refuse to speculate as a default behavior. When it did respond with false information, it was because some other component had overridden the “don’t speculate” component.