Submission + - Kaido Orav and Byron Knoll's fx2-cmix Wins 7950€ Hutter Prize Award!

Baldrson writes: Kaido Orav and Byron Knoll just beat the Nuclear Code Golf Course Record!

What's Nuclear Code Golf?

Some of you may have heard that "next token prediction" is the basis of large language models' ability to generalize. Well, there is just one contest that pays cash prizes in proportion to how much you beat the best prior benchmark on the most rigorous measure of next-token prediction: lossless compression length, including the length of the decompressor. The catch is that, to keep the contest relevant regardless of The Hardware Lottery's hysterics*, you are restricted to a single general-purpose CPU. This contest is not for the faint of heart. Think of it as Nuclear Code Golf.
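The number being minimized is simple to state, even if improving it is brutally hard: the size of a self-extracting archive, i.e., the compressed data plus the decompressor itself. A minimal sketch of that scoring rule (file names here are hypothetical; the real contest additionally imposes time, memory, and single-CPU constraints):

```python
import os

def hutter_score(decompressor_path: str, archive_path: str) -> int:
    """Total size being minimized: the decompressor program plus its payload.

    Counting the decompressor means the model must pay for its own code,
    which is what makes this a Kolmogorov-complexity-style benchmark.
    """
    return os.path.getsize(decompressor_path) + os.path.getsize(archive_path)
```

A smaller total is, by this measure, strictly better next-token prediction: any cleverness in the model is charged against the score.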

Kaido Orav and Byron Knoll are the team to beat now.

*The global economy is starting to look like a GPU-maximizer AGI.

Comment Apple. Cloud. (Score 1) 58

People hand themselves over to Apple in toto. People hand themselves over to some "cloud" or other, in toto.

Why?

Why would any presumably sane person ever do such a thing?

It's already more than bad enough to give away even the little I'm giving away here and now with this comment.

But to hand it all over?

How can people possibly bring themselves to believe things like this could ever end well for them?

I cannot fathom the least of it. It is beyond senseless. It is beyond imagining.

And yet people do it. Every day. In droves.

Comment Re:What's the size again? (Score 2) 22

kvezach writes: To put it differently: suppose that the text was 10 bytes long.

A better way of thinking about data scaling is to ask "How many digits of Pi would it take before, say, John Tromp's 401-bit binary lambda calculus program that generates Pi would become a better model than the literal string of those apparently-random digits?" (And by "better" I mean not only that the program would be shorter than those digits, but that it would extrapolate to (ie: "predict") the next digit of Pi.)
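The arithmetic behind that thought experiment is short: a literal decimal digit costs about log2(10) ≈ 3.32 bits to store, so a fixed 401-bit program becomes the shorter description once the digit string passes roughly 121 digits (assuming, as the question does, that the digits don't compress by other means):

```python
import math

PROGRAM_BITS = 401              # size of Tromp's Pi-generating lambda term
BITS_PER_DIGIT = math.log2(10)  # ~3.32 bits to store one decimal digit literally

# Smallest digit count at which the fixed-size program is the shorter description.
crossover = math.ceil(PROGRAM_BITS / BITS_PER_DIGIT)
print(crossover)  # 121
```

And unlike the literal string, the program also predicts digit 122 and every digit after it, which is the "better model" half of the question.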

In terms of how much data humans require, this is, as I said, something about which everyone has an opinion (including, obviously, you, and you are of course entitled to yours) but on which there is no settled science: hence the legitimacy of the 1GB limit on a wide range of human knowledge for research purposes.

Concerns about the bias implied by "The Hardware Lottery" are not particularly relevant for engineering/business decisions, but path dependencies implicit in the economics of the world are always suspect as biasing research directions away from more viable models and, in the present instance, meta-models.

Comment Re:What's the size again? (Score 2) 22

There is only one Hutter Prize contest and it's for 1GB. 100MB was the original size for the Hutter Prize starting in 2006, but it was increased to 1GB in 2020, along with a factor of 10 increase in the payout per incremental improvement. See the "Hutter Prize History".

Insofar as the size is concerned: The purpose of the Hutter Prize is research into radically better means of automated data-driven model creation, not biased by what Sara Hooker has called "The Hardware Lottery".

One of the primary limitations of current machine learning techniques is that their data efficiency is low compared to what natural intelligence is speculated, by some theories, to attain. Everyone has their opinion, of course, but it is far from "settled science". In particular, the use of ReLU activation seems to indicate that machine learning currently relies heavily on piece-wise linear interpolation in constructing its world model from language.

Any attempt to model causality has to identify system dynamics (including cognitive dynamics) to extrapolate to future observations (ie: predictions) from past observations (ie: "the data in evidence"). Although there is reason to believe Transformers can do something like dynamics within their context windows despite using ReLU (and that this is what gives them their true potential for "emergence at scale"), it wasn't until people started going to State Space Models that they started returning to dynamical systems identification (under another name, as academics are wont to gratuitously impose on their fields).

Submission + - Kaido Orav's fx-cmix Wins 6911€ Hutter Prize Award! (google.com)

Baldrson writes: Kaido Orav has just improved 1.38% on the Hutter Prize for Lossless Compression of Human Knowledge with his “fx-cmix” entry.

The competition seems to be heating up, with this winner coming a mere 6 months since the prior winner. This is all the more impressive since each improvement in the benchmark approaches the (unknown) minimum size called the Kolmogorov Complexity of the data.

Comment Re:No, that's not 114 megabytes (Score 1) 64

You didn't submit an executable archive of enwik9 purported to expand into a file that matched bit for bit. While you also failed in some other ways to qualify, that is now the first bar you must clear, before any further investment by the judging committee.

Or are you Yann LeCun out to pull my leg again?

https://ancillary-proxy.atarimworker.io?url=https%3A%2F%2Ftwitter.com%2Fylecun%2Fsta...

Comment Re:Silicon Valley anyone? (Score 1) 64

One big mistake they made early on with the Hutter Prize was not insisting that all contestants make their entries Open Source.

IIRC, only one entry was closed source. You may be thinking of Matt Mahoney's Large Text Compression Benchmark where the top contender is frequently closed source.

That the machine learning world has yet to recognize lossless compression as the most principled loss function is a tragedy, but it is due to a lot more than that entry. This failure stretches back to when Solomonoff's proof was overshadowed by Popper's falsification dogma in his popularization of the philosophy of science:

When a model's prediction is wrong, under Popper's falsification dogma, the model is "falsified", whereas under Solomonoff, the model is penalized not only by a measurement of the error (such as LSE), but by literally encoding the error within the context of the model. The significance of this subtle difference is hard for people to understand, and this lack of understanding derailed the principled application of Moore's Law to science. Instead we got an explosion of statistical "information criteria for model selection", all of which are less principled than the Algorithmic Information Criterion, and now we have ChatGPT hallucinating us into genuinely treacherous territory.
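The difference can be made concrete with a code-length loss. Under the Solomonoff/MDL view, the model assigns each observation a probability, and encoding an observation costs -log2 p bits; a surprising observation is expensive, not fatal. A minimal sketch (the two-symbol model below is hypothetical, just to show the shape of the penalty):

```python
import math

def code_length_bits(probs, data):
    """Bits needed to encode `data` under a model's predictive distribution.

    A wrong prediction is not "falsified"; it is penalized by the literal
    cost of encoding the surprise: -log2 p(symbol) bits per observation.
    """
    return sum(-math.log2(probs[sym]) for sym in data)

# A model that is 90% sure of 'a' pays ~0.15 bits per 'a' but ~3.32 bits
# for the surprising 'b' -- and survives the 'b', unlike outright falsification.
model = {"a": 0.9, "b": 0.1}
print(round(code_length_bits(model, "aaab"), 3))  # 3.778
```

Minimizing total code length (model description plus encoded surprises) is exactly the quantity the compression contest measures.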

Submission + - Saurabh Kumar's fast-cmix wins €5187 Hutter Prize Award!

Baldrson writes: Marcus Hutter's tweet makes it official:

Saurabh Kumar has just raised the bar 1.04% on the Hutter Prize for Lossless Compression of Human Knowledge with his "fast-cmix" entry. If you would like to supplement Marcus's monetary award of €5187, one way is to send BTC to Saurabh at bc1qr9t26degxjc8kvx8a66pem70ye5sgdw7u4tyjy or contact Marcus Hutter directly.

Before describing Saurabh's contribution, there are two salient facts required to understand the importance of this competition:

1) It is more important than a language modeling competition. It is knowledge comprehension. To quote Gregory Chaitin, "Compression is comprehension."

  • Every programming language is described in Wikipedia.
  • Every scientific concept is described in Wikipedia.
  • Every mathematical concept is described in Wikipedia.
  • Every historic event is described in Wikipedia.
  • Every technology is described in Wikipedia.
  • Every work of art is described in Wikipedia — with examples.
  • There is even the Wikidata project that provides Wikipedia a substantial amount of digested statistics about the real world.

Are you going to argue that comprehension of all that knowledge is insufficient to generatively speak the truth consistent with all that knowledge — and that this notion of "truth" will not be at least comparable to that generatively spoken by large language models such as ChatGPT?

2) The above also applies to Matt Mahoney's Large Text Compression Benchmark, which, unlike the Hutter Prize, allows unlimited computing resources. However, the Hutter Prize is geared toward research in that it restricts computation resources to the most general-purpose hardware that is widely available.

Why?

As described in the seminal paper "The Hardware Lottery" by Sara Hooker, AI research is biased toward algorithms optimized for existing hardware infrastructure. While this hardware bias is justified for engineering (applying existing scientific understanding to the "utility function" of making money), to quote Sara Hooker, it "can delay research progress by casting successful ideas as failures".

Saurabh Kumar's Contribution

Saurabh's fast-cmix README describes how he went about substantially increasing the speed of the prior Hutter Prize algorithms, most recently Artemiy Margaritov's SorTing ARticLes by sImilariTy (STARLIT).

The complaint that this is "mere" optimization ignores the fact that this was done on general-purpose computation hardware, and is therefore in line with the spirit of Sara Hooker's admonition to researchers in "The Hardware Lottery". By showing how to optimize within the constraint of general-purpose computation, Saurabh's contribution may help point the way toward future directions in hardware architecture.

Comment "writers, designers, artists, and communicators" (Score 2, Funny) 68

Read every single comment, all the way down to -1.

Nobody has addressed " What I'd really like to see more of are more non-technical contributors. I mean, yes, we can always benefit from more packagers and coders and engineers, but I think what we really need desperately are writers, designers, artists, videographers, communicators, organizers and planners. ".

Linux has a HUGE communication problem.

Linux not only has a huge communication problem, it steadfastly refuses to even entertain the possibility that communication is even something that requires any kind of serious and polished attention in the first place.

And it fails horribly, with the general populace, right there.

Linux fails to accept that there are "iron laws" of communication, and in so doing breaks all of them, and in so doing loses the general populace, who will never in their lives sit still for being talked to or communicated with, be it linguistically, visually, or any other way, in the manner in which Linux everlastingly insists on attempting to communicate with them. And then Linux goes off scratching its head, wondering why people "don't get it."

The problem is Linux, not the people.

And by "Linux" I mean the entire community of people who design, build, and implement it.

A minority of that community actually can communicate, but they are so vanishingly small in number, so small a fraction of the total body of people who design, build, and implement, that they disappear altogether from sensible view.

Here is an example of one of the “iron laws” of communication, for all the hard-heads out there who double down on their refusal to accept or accede to such things.

If they’re not laughing, then it’s not funny.

Period.

It does not matter if you think it’s funny. Keep it to yourself, if you think it’s funny. Fine and dandy. No worries, mate. But the instant you open your mouth and communicate it to someone else, it had better damn well cause them to laugh or it’s not funny.

Period.

And right here is where Linux fails so horribly as to cause people, normal people, members of the general populace, to cringe deeply, and recoil from Linux as if it was something with a disease that they fear catching if they stay too close to it for too long.

And don’t forget, hard-heads, that I’m not just talking about humor. I’m only using my example of The Iron Law of Humor in an effort to simplify things to the point where even hard-heads can figure it out.

The failures of Linux to communicate, the failures of Linux to engage successfully with the general populace, the failures of Linux to abide by The Iron Laws of Communication, are so vast in scope as to defy enumerating them all.

All the little in-groups in Linux have all their little in-group ways, and inside the group, it’s all so very wonderful, and everything is seen through rose-colored glasses.

But nobody ever seems to want to step outside and find out what the general populace might be thinking about it.

Which is unfortunate to an extreme, because the general populace is having none of it.

The general populace despises all the “cutesy” crap that Linux insists on wrapping itself up in, all of which is clearly (as seen by the general populace) the creation of lamers, social misfits, and tone-deaf idiots, from one end to the other, without exception.

Another example.

“GNU's Not Unix.”

GNU’s not funny, either.

And recursive acronyms constitute felony assault against any attempt to engage with, bond with, or even communicate with the general populace.

And the lameness of the logo that’s associated with GNU is enough to take your breath away. People. Think. It’s. Lame.

And it is.

And that’s only a single example.

It’s only a single egregious faux pas in attempting to communicate with the general populace, among too many to count.

And the penguin is lame, sitting there on its ass, feet stuck out in front of it in a posture no real penguin would ever be seen sitting in, with a blank look of deep stupidity on its face.

And you people have bonded with it.

You people have bonded with the very thing that causes everybody else in the world to view you with contempt.

To view you as lamers and losers.

And this is the first thing that people see when they’re introduced to Linux.

And that fox-thing, or dog-thing, or whateverthehellitis-thing that’s plastered across a program that’s called “GIMP”? GIMP ???

No.

We’re not even going to be talking about a thing which is that lame.

People hate that crap. They HATE it.

And Linux hates that it’s hated, right back, and right there, all communication with the general populace is ended, and all chance of integrating with the general populace is destroyed.

You like your little cutesy naming conventions.

You like your little cutesy visuals.

You like your incomprehensibly-opaque lingo with little in-group hooks and jokes buried inside of it.

You like all of it.

Ok.

Fine.

Whatever you want. It’s all good.

Just don’t be coming out here with it, attempting to shove it in our faces, expecting that we are going to be liking it too, because we don’t.

You wanna come out here and engage with us, then you’ll do it on our terms, or you won’t do it at all.

You’ll learn how to be engaging on our terms, or you won’t be engaging at all.

And if you want to learn how to be engaging, then the first thing you’ll do is to recognize the seriousness of it all, and stop giving a bunch of amateurs and lamers total control over it.

Hand that kind of work over to the kind of people who know the discipline, and who know how to implement it, in the exact same way you hand coding over to the kind of people who know the discipline and who know how to implement it.

No difference.

None at all.

The exterior surface of Linux, the part that the general populace sees, is the ugliest, and least-engaging, and most impossible to understand, part of the entire operation.

The part that people see FIRST.

WHY?

Why insist on a thing like that?

If you’re going to communicate well then communicate.

Nothing less will do.

Nothing less will ever stand a chance of working.

Obey the Iron Laws of Communication, or go back inside where you came from and quit bemoaning the fact that the rest of us out here are never going to be signing on for your whatever it is that you’re doing, because whatever it is, it’s very definitely not communication.

Submission + - Artemiy Margaritov Wins €9000 In the First 10x Hutter Prize Award

Baldrson writes: The Hutter Prize for Lossless Compression of Human Knowledge has now awarded €9000 to Artemiy Margaritov as the first winner of the 10x expansion of the HKCP, announced over a year ago in conjunction with a Lex Fridman podcast!

Artemiy Margaritov's STARLIT algorithm's 1.13% improvement cleared the 1% hurdle required to beat the last benchmark, set by Alexander Rhatushnyak. He receives a bonus in proportion to the time elapsed since the last benchmark was set, raising his award by 60% to €9000.

Congratulations to Artemiy Margaritov for his winning submission!

Comment Re:I never let windows automatically reboot (Score 1) 292

But I mean sure you could just start immediately telling somebody with a four-digit user number that starts with the digit 1 how this computer stuff works, instead of performing your own due diligence in advance, and maybe stop to take a look around, and then maybe figure out where you are, and who you're dealing with, and then start spouting FUD with a sarcastic tone of voice if that suits you.
