Comment Re: It's because (Score 1) 98

You don't need to consume copyright materials.

Until you get to college and you're given a list of required textbooks and required reading in a humanities class. Or until the government, a public utility, or some other monopoly on an essential service requires a particular brand of proprietary operating system to run an application through which to access said service. This could be Windows or macOS on desktop, or Android with Google Play* or iOS on mobile.

* Though Android Open Source Project is free software, many popular applications require Google Play Services, which is proprietary and lawfully available only preinstalled on a handset.

Comment Re:Rational delusions (Score 1) 105

Well, again, a real brain just manipulates synaptic signals. There is no substantive difference between these and tokens because either can emulate the other. You might find it useful to think of it as a generalization of the Turing machine concept. Yes, I am a bot, you found me. Good work.

Comment Re:chess (Score 1) 105

Not really- it's simply an indication that the attention layers haven't been trained to play Chess. You could absolutely do so.

But isn't that what thinking is? Doing that adaptation on the fly for a new situation?

A human doesn't remember every state the board was in, either, and if they tried to, I imagine it would significantly degrade their ability to play the board in front of them.

Funnily enough, the really, really good players have no problem memorizing games, and it appears to happen naturally. I can't, that's for sure, and I don't think it helps per se, but it doesn't appear to be counterproductive either. My brother used to pull shit like that: play a bunch of people simultaneously blindfolded, or play you blindfolded and then reel off the game from memory.

Comment Re:This was obvious ... (Score 1) 105

I'd say you're the one being pedantic ;)

That's what I meant: I'm being very, very pedantic about the definition of "Markov" :)

I'm also not going to claim this is in any way useful for reasoning about LLMs.

It's a pretty serious contortion to call an LLM a Markov Chain.

Is it? Let's say your LLM has a maximum of 1000 tokens. Your state at any time is 1000 tokens, plus 1000 bools indicating the presence or absence of each token.

Your state transition function is the transformer network.

Now you can generate state N+1 from state N without reference to state N-1!
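
A minimal sketch of that construction in Python (the names and the stub transition function are illustrative assumptions, not any real API):

    from dataclasses import dataclass

    MAX_TOKENS = 1000

    @dataclass
    class State:
        tokens: list[int]     # always length MAX_TOKENS; empty slots hold 0
        present: list[bool]   # always length MAX_TOKENS; occupancy flags

    def transformer_next_token(state: State) -> int:
        # Stand-in for the real network. Any function of the *current*
        # state alone keeps the chain Markovian; here, a constant.
        return 42

    def step(state: State) -> State:
        """One Markov transition: state N -> state N+1, no history needed."""
        new_token = transformer_next_token(state)
        i = state.present.index(False)   # first empty slot (assumes not full)
        tokens, present = state.tokens.copy(), state.present.copy()
        tokens[i], present[i] = new_token, True
        return State(tokens, present)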

There is some sleight of hand here. Implementations of Markov processes do have a variety of formulations, for example producing a probability distribution from a state; then you sample the distribution and blammmo! New state.

But that's a specific case of [new state] = f([old state], [noise]).

That function could equally well be a transformer run on the old state with some noise injected. There's no explicit modelling of the probability distribution, but does there need to be?

Rather trivially, the transformer perfectly models itself, so while we don't have an explicit formulation of the distribution that we can sample, we do have a perfect representation of the generate-and-sample function.
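
One concrete way to make "generate-and-sample is just f(old state, noise)" explicit is the Gumbel-max trick: sampling from a softmax over logits is exactly equivalent to adding independent Gumbel noise to the logits and taking the argmax. A small numpy sketch (the logits are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    logits = np.array([2.0, 0.5, -1.0])   # stand-in for the transformer's output

    # Sampling from the explicit distribution...
    probs = np.exp(logits) / np.exp(logits).sum()
    token_a = rng.choice(len(logits), p=probs)

    # ...has the same distribution as a deterministic function of (state, noise):
    noise = rng.gumbel(size=logits.shape)
    token_b = int(np.argmax(logits + noise))   # [new state] = f([old state], [noise])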

We're talking hundreds of millions of significant digits for the token IDs.

I'm not sure I get why. I think all you need is simply the old tokens with the newly generated one concatenated on. Or, more pedantically, the equivalent of that implemented with a fixed length and a set of occupancy bools.

With that you never need a previous state because all the previous state is in the current state, which is possible because the state is fixed in size and not all that big.

Is it cheating to simply fold the old state into the new state? Isn't that what velocity and acceleration are with respect to position in a Kalman filter?
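
The Kalman analogy, concretely: a constant-acceleration filter folds velocity and acceleration into the state vector precisely so that the next state depends only on the current one. A small numpy sketch (dt is arbitrary):

    import numpy as np

    dt = 0.1
    # State x = [position, velocity, acceleration]
    F = np.array([[1.0, dt,  0.5 * dt**2],
                  [0.0, 1.0, dt         ],
                  [0.0, 0.0, 1.0        ]])

    x = np.array([0.0, 1.0, 0.5])
    x_next = F @ x   # depends on the current state only: Markovian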

Comment Re:GPT-2 (and flies) cannot reason (Score 1) 105

I had a great conversation with ChatGPT 5 the other day, wherein we concluded that ChatGPT's UI coders just plain dropped the ball on an important, missing UI feature. It was more pleasant than discussing such questions with an actual human, in my experience.

Mind you, I still felt a bit weird about it. Then the conversation turned back to the structural engineering equations I originally asked about, and I got a fine, non-hallucinated result that checks out.

Comment Better reasoning than the average trailer park boy (Score 1) 105

Better reasoning than the average trailer park boy. You know, not those trailer park boys, but average ones.

How the fuck do we know that actual intelligence, and indeed, consciousness, is not just a matter of coming up with the most likely next word? Yeah, everyone's reasoning is brittle to some degree, even Einstein's.

Comment Re:This was obvious ... (Score 1) 105

To model an LLM as a Markov Chain, the Markov Chain would need a state the size of every possible configuration of the hidden state vectors.

I think, pedantically, you are mistaken. First, the definition:

A Markov chain is basically one where state N+1 is conditionally independent of state N-1 given state N.
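
In symbols (the standard definition, nothing LLM-specific):

    P(X_{N+1} | X_N, X_{N-1}, ..., X_0) = P(X_{N+1} | X_N)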

At first glance an LLM is not a Markov chain because it uses history, so state N+1 depends on N, N-1 and so on.

The key, though, is that LLMs have a bounded number of tokens. The state is then a fixed-size vector which holds a list of all tokens in the window, plus a flag for each token indicating whether it's present or not. State N+1 can be predicted entirely from state N.

It's a real-valued Markov chain, so you have the choice of modelling the transition PDF as Gaussian, or using some sort of approximation. And what better than a universal one such as a neural network, specifically one with a transformer architecture. And it turns out that's not an approximation, since a transformer perfectly models itself, and we're not interested in an abstract state transition function, but the one in the LLM.

IMO this means that LLMs (excluding the ones that have access to non-LLM tools to generate tokens) are Markovian. That doesn't mean they are a simple HMM or anything like that, merely that, with a little restructuring of how they are executed, they have the Markovian conditional independence property.

Take from that what you will; I don't believe it has much bearing on discussions of LLMs other than whether they are, in a strict sense, Markovian.

Comment Re:chess (Score 1) 105

What you experienced was its attention heads being overwhelmed by its context. It's a limitation of the models.

I would think a chess game would take a very long time to run out of tokens, though. IIRC, ChatGPT tells you when it has exceeded the number of tokens it can process, but a few tens of moves won't come remotely close.
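
A rough back-of-the-envelope check (the per-move token count is an assumption, not a measurement):

    tokens_per_ply = 4    # assumed: one move in algebraic notation, e.g. "24. Rxe5"
    plies = 80            # a fairly long 40-move game
    overhead = 200        # assumed instructions/prompt boilerplate
    print(plies * tokens_per_ply + overhead)   # ~520 tokens, nowhere near a context limit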

You will have better luck if you completely wipe the context and start over, giving it nothing but the current board state- that's how I handle the problem when I'm playing around with a game simulator driven by an LLM.

Undoubtedly, but this is yet another good indication that LLMs can't think (in case we needed it!). It's obvious to anyone who has played chess (or knows the rules), or really any board game of full information, that it's the board now that matters and nothing else.
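
For what it's worth, here is a minimal sketch of the wipe-and-restate loop the parent describes, using the python-chess package for the board and a hypothetical query_llm() standing in for the model call (the prompt wording is illustrative):

    import chess

    def query_llm(prompt: str) -> str:
        # Stand-in for a real model call; assumed to return one UCI move.
        raise NotImplementedError

    board = chess.Board()
    while not board.is_game_over():
        # Fresh context every move: the FEN string *is* the entire game state.
        prompt = (f"Position (FEN): {board.fen()}\n"
                  "Reply with a single legal move in UCI notation.")
        move = chess.Move.from_uci(query_llm(prompt).strip())
        if move not in board.legal_moves:
            break   # illegal reply; a real loop would retry or penalize
        board.push(move)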

Comment Re:No mention of GPA? (Score 1) 148

I've been a professional software engineer for 35 years, and been involved in hiring for all but the first two, in several different companies from tiny startups to giant corps (IBM and now Google). Maybe computer engineering is different (I doubt it), but in all that time I've never seen or heard of a company that cared about GPA, because it's a really lousy predictor of ability.

I once interviewed a guy (whom we subsequently hired) who lived by that creed. He didn't put his degree class (the closest UK equivalent to GPA) on his CV. It turned out later he has a First (i.e., roughly a 4.0 equivalent). Awesome guy, mad as a sack of badgers, and I need to bug him into going for a pint.

I agree BTW.
