
Comment Re:BS (Score 1) 149

LLMs perform very well with what they've got in context.

True in general, I agree. How well any local tools pick out context to upload seems to be a big (maybe the big) factor in how good their results are with the current generation of models, and if they're relying on a RAG approach then there's definitely scope for that to work well or not.

That said, the experiment I mentioned that collapsed horribly was explicit about adding those source files as context. Unless there was then a serious bug related to uploading that context, it looks like one of the newest models available really did just get a prompt marginally more complicated than "Call this named function and print the output" completely wrong on that occasion. Given that several other experiments using the same tool and model did not seem to suffer from that kind of total collapse, and the performance of that tool and model combination was quite inconsistent overall, such a bug seems very unlikely, though of course I can't be 100% certain.

It's also plausible that the model was confused by having too much context. If it hadn't known about the rest of the codebase, including underlying SQL that it didn't need to respond to the immediate prompt, maybe it would have done better and not hallucinated a bad implementation of a function that was already there.

That's an interesting angle, IMHO, because it's the opposite take to the usual assumption that LLMs perform better when they have more relevant context. In fact, being more selective about the context provided is something I've noticed a few people advocating recently, though usually on cost/performance grounds rather than because they expected it to improve the quality of the output. This could become an interesting subject as we move to models that can accept much more context: if it turns out that having too much information can be a real problem, the general premise that soon we'll provide LLMs with entire codebases to analyse becomes doubtful, but then the question is what we do instead.

Comment Re:BS (Score 1) 149

I could certainly accept the possibility that I write bad prompts if that had been an isolated case, but such absurdities have not been rare in my experiments so far, and yet in other apparently similar scenarios I've seen much better results. Sometimes the AI nails it. Sometimes it's on a different planet. What I have not seen yet is much consistency in what does or doesn't get workable results, across several tools and models, several variations of prompting style, and both my own experiments and what I've heard in discussions with others.

The thing is, if an AI-backed coding aid can't reliably parse a simple one-sentence prompt containing a single explicit instruction, together with existing code as context that objectively defines the function call required to get started and the data format that will be returned, I contend that this necessarily means the AI is the problem. Again I can only rely on my own experience, but once you start down the path of spelling out exactly what you want in detail in the prompt and then iterating with further corrections or reinforcement to fix the problems in the earlier responses, I have found it close to certain that the session will end either unproductively, with the results being completely discarded, or with a series of prompts so long and detailed that you might as well have written the code yourself directly. Whatever effect sometimes causes these LLMs to spectacularly miss the mark also seems to be quite sticky.

In the interests of completeness, there are several differences between the scenario you tested and the one I described above that potentially explain the very different results we achieved. I haven't tried anything with Qwen3, so I can't comment on the performance of that model from my own experience. I was using local tools that were handling the communication with (in that case) Sonnet, so they might have been obscuring some problems or failing to pass through some relevant information. I wasn't providing only the SQL and the function to be called; I gave the tool access to my entire codebase, probably a few thousand lines of code scattered across tens of files in that particular scenario. Any or all of those factors might have made a difference in the cases where I saw the AI's performance collapse.

Comment Re:I for one am SHOCKED. (Score 1) 52

You don't appear to consider the cost to everyone who didn't buy the glasses, but encounters someone wearing them.

This is the thing that people saying things like "You have no reasonable expectation of privacy in public" seem unable to grasp. There is a massive and qualitative difference between casual social observations that would naturally occur but naturally be forgotten just as quickly and the systematic, global scale, permanently recorded, machine-analysed surveillance orchestrated by the likes of Google and Meta. Privacy norms and (if you're lucky) laws supporting them developed for the former environment and are utterly inadequate at protecting us against the risks of the latter.

And it should probably be illegal to sell or operate any device that is intended to be taken into private settings and includes both sensors and communications, so that even in a private setting the organisations behind those devices can receive surveillance data without others present even knowing, never mind consenting.

Perhaps a proportionate penalty would be that the entire board and executive leadership team of any such organisation, plus a random selection of 20 of each of their family and friends, should be moved to an open plan jail for a year where publicly accessible cameras and microphones cover literally every space. Oh, and any of the 20 potentially innocent bystanders who don't think that's OK have the option to leave, but if they do, their year gets added to the sentence of the board member or executive they're associated with instead.

Comment Re:BS (Score 1) 149

FWIW, I was indeed surprised by some of the things these tools missed. And yes, the worst offenders were the hybrid systems running some sort of local front-end assistant talking to a remote model. Personally, while small context limits get blamed a lot for the limitations of current systems, I suspect that explanation is a bit misleading. Even with some of the newer models that can theoretically accept much more context, it would still be extremely slow and expensive to provide all of a large codebase to an LLM along with every prompt, at least until we reach a point where we can run the serious LLMs locally on developer PCs instead of relying on remote services.

Even with all of those caveats, if I give a tool explicit context that includes the SQL to define a few tables, a function that runs a SQL query using those tables and returns the results in an explicitly defined type, and a simple prompt to write a function that calls the other function (specified by name) and print out the data it's retrieved in a standard format like JSON, I would not expect it to completely ignore the explicitly named function, hallucinate a different function that it thinks is returning some hacky structure containing about 80% of the relevant data fields, and then mess up the widely known text output format. And yet that is exactly what Sonnet 3.7 did in one of my experiments. That is not a prototype front-end assistant misjudging which context to pass through or a failure to provide an effective prompt. That's a model that just didn't work at all on any level when given a simple task, a clear prompt, and all the context it could possibly need.
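For concreteness, the task was roughly of this shape (all of the names and data below are hypothetical stand-ins, not the actual code from my project): given an existing function that runs a SQL query and returns typed rows, the prompt asked only for a wrapper that calls that function by name and prints the rows as JSON.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical stand-in for the typed result of the existing query function.
@dataclass
class OrderRow:
    order_id: int
    customer: str
    total: float

def fetch_recent_orders() -> list[OrderRow]:
    # In the real experiment this ran a SQL query against tables whose
    # definitions were provided as context; stubbed here with fixed data.
    return [OrderRow(1, "alice", 19.99), OrderRow(2, "bob", 5.00)]

# The entirety of what the one-sentence prompt asked for: call the
# explicitly named function and print its results as standard JSON.
def print_recent_orders() -> None:
    rows = fetch_recent_orders()
    print(json.dumps([asdict(row) for row in rows], indent=2))
```

A competent junior developer would produce something like those last four lines in a minute or two; the failure mode I described was the model inventing a different fetch function with a different, incomplete return structure and then getting the JSON output wrong on top of that.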

Comment Re:BS (Score 1) 149

As for their ability to infer, I couldn't agree less with that.

Dump an entire code base into their context window, and they demonstrate remarkable insight on the code.

Our mileage varies, I guess. I've done quite a few experiments like that recently and so far it seems worse than a 50/50 shot that most of the state-of-the-art models will even pick up on project naming conventions reliably, let alone follow basic design ideas like keeping UI code and database code in separate packages or preferring the common idioms of the programming languages I was using. These were typically tests with real, existing codebases on the scale of a few thousand lines, and the tools running locally had access to all of that code to provide whatever context they wanted to the remote services. I've also tried several strategies involving CONVENTIONS.md files and the like to see if that helped with the coding style, again with less than convincing results.

Honestly, after so much hype over the past couple of years, I've been extremely disappointed so far by the reality in my own experiments. I understand how LLMs work and wasn't expecting miracles, but I was expecting something that would at least be quicker than me and my colleagues at doing simple, everyday programming tasks. I'm not sure I've found any actual examples of that yet, and if I have, it was faster by more like 10% than 10x. The general response among my colleagues when we discuss these things is open ridicule at this point, as it seems like most of us have given it a try and reached similar conclusions. I'm happy for you if you've managed to do much better, but I've yet to see it myself.

Comment Re:BS (Score 1) 149

I've done some experiments recently with LLM-backed tools to try to understand the current state of the art. FWIW, my own experience has been that for relatively simple boilerplate-generation jobs they can often produce useful code, but their limit is roughly the capabilities of a junior developer. They make mistakes fairly often. Maybe more importantly, even when their code technically produces the right answer, they rarely infer much about any existing design or coding standards and their code often doesn't fit in with what is already there. I have found that to be the case disappointingly consistently, even in projects small enough to fit the entire codebase in the context, and across a variety of prompting strategies and tools.

So far, I'd say a relatively good session can produce a lot of correct boilerplate without much human intervention other than the prompts themselves. I'm uncertain about whether it really does so significantly faster than a senior dev who could stream the same kind of boilerplate as fast as their fingers could type it, once you take into account the need to proofread and correct the LLM's output, but it was probably faster than a mid and certainly faster than a junior in most experiments I tried. In contrast, a bad session can last an hour or more and still result in discarding the entire output from numerous interactions with an LLM because it has produced literally no code of sufficient value to keep.

Comment Re:Where Do You See It Going? (Score 1) 125

This is the danger of consolidation. If everyone comes to rely on a single provider for critical IT facilities - see also Microsoft - and then one day even that big corporation decides that providing those facilities in the same way as it has for a long time is no longer the right business strategy - again, see also Microsoft - that can leave a big hole.

However, in Intel's case there are alternative providers of both compatible CPUs (AMD) and effective competitors (everyone building ARM). There have been times in the past when AMD hardware looked better, and the two have gone back and forth over the years. So unlike the Microsoft case, where a lot of their Windows and Office customers have put up with worse and worse software because they don't see any alternative, Intel could flush itself down the toilet and I expect most of the world would shrug and buy from others instead, after a bit of short-term disruption to supply chains.

Comment Re:Will it make a differnce? (Score 1) 64

The community has not been able to deal with the DRM problem, not really. You still can't use any of the major streaming services normally on Linux. Almost all of them cap the feed at standard def or maybe 720p for one or two of the better ones.

Frankly, breaking that chokehold that the big players like Windows, macOS, iOS and Android have and forcing content distribution to open standards would be a boon to moving people off those big but increasingly customer-hostile platforms. Linux gaming used to be a sticking point for home users but these days games that don't work on Linux because of malware-like anticheats or the like are the ones that stand out. Today it's more about streaming media, something else that home users increasingly expect to Just Work.

But breaking that oligopoly is probably never going to happen through market forces alone, because the paranoid industry execs are convinced that anyone running Linux is just going to pirate all their stuff if they don't put DRM in all their contracts, as if the exact opposite isn't actually what's already happening. So regulation it is, and opening up a widely used market to much greater competition. It would help with allowing small players to make less user-hostile devices that stream to big screens as well.

Comment Re:Same (Score 4, Interesting) 98

There are so many things wrong with this statement, but to start with, what is or isn't illegal now might change with the next government. (Ask women in the US who have considered having an abortion if you think that's a someone-else problem that only applies in far away places with different cultures.)

You can now get in trouble for things you've said or done that aren't actually illegal.

It's not just about illegal content and your own police being able to access it, it's also about a commercial organisation with many thousands of employees being able to access it, and anyone else who manages to compromise that system being able to access it.

The list goes on. Everyone should be worried about moves to weaken online security. And contrary to the claim in the garbagewalled article that iPhone users have shown minimal interest, this story made the #1 spot on literally every major news and discussion website I use last week and in several cases stayed there for a day or more with unusually high participation in the resulting discussions.

Comment Re:Have to? (Score 2) 57

Because users aren't willing to pay what the service would cost with the additional revenue from selling your personal information.

[citation needed]

One of the major problems today is the often unspoken assumption that the above is true. And yet I know plenty of people who would be willing to pay entirely viable amounts of money, or extra money, for untainted products and services comparable to what we have today. The failure of the market to provide for that group of customers, whose size is unknown but certainly significant, is probably the strongest argument there is that capitalism has failed here and government regulation is needed to protect the ordinary people and rein in the "tech" (surveillance capitalist) firms.

Sometimes we don't necessarily even need it to be more profitable for the tech giants not to sell us out. We just need it to be viable for competitors to enter the market and distinguish themselves by offering respect for privacy as a selling point. That mostly doesn't depend on the price sensitivity of users. It depends on interoperability and being able to break the networking effects that keep companies like Meta, Alphabet and Microsoft dominant in their target markets.

Although when it comes to products like TVs and cars, it still wouldn't do any harm to dump on the entire industries that have sold out their customers for a few extra bucks. There should be laws requiring prominent advance disclosure of user/buyer-hostile practices before people choose to use or purchase something, like the very large health warnings cigarette companies have been legally required to cover much of their packaging with here in the UK for years, and if that still doesn't work, those user-hostile practices should simply be regulated out of existence without mercy.

Comment Re:I'd argue that smart TVs have always been shit (Score 4, Interesting) 249

The problem with this argument is that 5G and mesh networks are going to render your willingness to let your "smart" home devices spy on you and phone home irrelevant.

New cars already come with built-in SIM cards of their own. The next generation of networking technologies is probably going to make it cheap and ubiquitous for other devices to be independently online as well.

This is the part where the market has clearly failed and regulation is supposed to step in to protect the ordinary citizen, right?

Comment Re:I can't believe the U word wasn't mentioned (Score 4, Interesting) 136

Real engineers usually have formal responsibility for their work, and with that, they may also have the authority to direct changes or even bring a halt to a whole project to make sure the work is done properly and the results meet the necessary standards. People with that authority don't need a union to help them enforce acceptable quality levels and stop something known to be dangerous or harmful from going into production.

Part of the way you know software engineering has little to do with real engineering is that software engineers have no such responsibility and no such authority. This explains a lot about the quality of software compared to the quality of bridges, and about common attitudes among managers at software companies compared to real engineering organisations.

Comment Re:Were it so easy (Score 1) 136

Are you seriously contending that software developers working for Big Tech firms with TC that is often orders of magnitude higher than the average and, at least in the US, historically several times as high as similarly skilled and capable people doing similar work in other similarly developed countries, have no choice but to take those jobs and build whatever systems they are told in order to survive? Yeah, right.

No-one with the skills and experience to get those jobs at Big Tech actually needs to work on ad-tech, surveillance capitalism, high-pressure sales, content harvesting, addictive and ever-less-social media, or similarly ethically challenged applications to pay their bills. Not a single one of them.
