Submission + - Anthropic can now track the bizarre inner workings of a large language model

tomatoguy writes: Having psychology-adjacent interests (and perhaps because it's a Friday afternoon), I found this fascinating.

What the firm found challenges some basic assumptions about how this technology really works.

MIT Technology Review

https://ancillary-proxy.atarimworker.io?url=https%3A%2F%2Fwww.technologyreview.com%2F2025%2F03%2F27%2F1113916%2Fanthropic-can-now-track-the-bizarre-inner-workings-of-a-large-language-model%2F (paywalled)

https://ancillary-proxy.atarimworker.io?url=https%3A%2F%2Farchive.is%2F4mujU (free)

= This caught my eye first: studying something that claims to be brain-like with the tools and techniques of brain research.

Anthropic says it was inspired by brain-scan techniques used in neuroscience to build what the firm describes as a kind of microscope that can be pointed at different parts of a model while it runs. The technique highlights components that are active at different times. Researchers can then zoom in on different components and record when they are and are not active.
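If you're wondering what "recording when components are and are not active" might look like in practice, here is a rough, generic sketch. To be clear, this is not Anthropic's tooling (their method traces learned features and circuits inside the model, which is far more involved); it's just an illustrative PyTorch snippet, using GPT-2 as a hypothetical stand-in model, that hooks each transformer block and logs how strongly it fires on a single prompt:

    # Generic illustration only -- NOT Anthropic's circuit-tracing method.
    # Hooks each transformer block and records a crude "activity" number
    # (mean absolute activation) for one forward pass.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # hypothetical stand-in model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    activations = {}

    def make_hook(name):
        def hook(module, inputs, output):
            # GPT-2 blocks return a tuple; the first element is the hidden states.
            hidden = output[0] if isinstance(output, tuple) else output
            activations[name] = hidden.detach().abs().mean().item()
        return hook

    # Attach a forward hook to every transformer block.
    handles = [block.register_forward_hook(make_hook(f"block_{i}"))
               for i, block in enumerate(model.transformer.h)]

    with torch.no_grad():
        inputs = tokenizer("The capital of France is", return_tensors="pt")
        model(**inputs)

    for name, value in sorted(activations.items()):
        print(f"{name}: {value:.4f}")

    for h in handles:
        h.remove()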

= Secondly, LLMs get "better" when they know when to shut up.

The latest generation of large language models, like Claude 3.5, Gemini, and GPT-4o, hallucinate far less than previous versions, thanks to extensive post-training (the steps that take an LLM trained on text scraped from most of the internet and turn it into a usable chatbot). But [Anthropic researcher Joshua] Batson’s team was surprised to find that this post-training seems to have made Claude refuse to speculate as a default behavior. When it did respond with false information, it was because some other component had overridden the “don’t speculate” component.
