
Comment Re:That's because you don't understand (Score 1) 134

Some are. I work more with smaller businesses than Big Tech and I don't think we've ever had more interest in our software development services.

There is a rational concern that technical people will understand the benefits and limitations of generative AI, but management and executive leadership will fall for the hype because it was in the right Gartner quadrant or something, and that will lead to restructuring and job losses. Businesses that get that wrong will probably be making a very expensive mistake, and personally I'm quite looking forward to bumping our rates very significantly when they come crying to people who actually know what they're doing to clean up the mess later. It's not nice for anyone whose livelihood is being toyed with in the meantime, obviously, but I don't buy the argument, which the comment I replied to seemed to imply, that this isn't fundamentally an economic inevitability.

Comment Re:That's because you don't understand (Score 1) 134

Historically and economically, it is far from certain that your hypothetical 20% increase in productivity would actually result in a proportionate decrease in employment. Indeed, the opposite effect is sometimes observed. Increased efficiency makes each employee more productive/valuable, which in turn makes newer and harder problems cost-effective to solve.

Personally, I question whether any AI coding experiment I have yet performed myself resulted in as much as a 20% productivity gain anyway. I have seen plenty of first-hand evidence to support the theory that seems to be shared by most of the senior+ devs I've talked with, that AI code generators are basically performing on the level of a broadly- but shallowly-experienced junior dev and not showing much qualitative improvement over time.

Whenever yet another tech CEO trots out some random stat about how AI is now writing 105% of the new code in their org, I am reminded of the observation by another former tech CEO, Bill Gates, that measuring programming progress by lines of code is like measuring aircraft building progress by weight.

Comment Re:BS (Score 1) 149

LLMs perform very well with what they've got in context.

True in general, I agree. How well any local tools pick out context to upload seems to be a big (maybe the big) factor in how good their results are with the current generation of models, and if they're relying on a RAG approach then there's definitely scope for that to work well or not.

That said, the experiment I mentioned that collapsed horribly was explicit about adding those source files as context. Unless there was then a serious bug related to uploading that context, it looks like one of the newest models available really did just get a prompt marginally more complicated than "Call this named function and print the output" completely wrong on that occasion. Given that several other experiments using the same tool and model did not seem to suffer from that kind of total collapse, and the performance of that tool and model combination was quite inconsistent overall, such a bug seems very unlikely, though of course I can't be 100% certain.

It's also plausible that the model was confused by having too much context. If it hadn't known about the rest of the codebase, including underlying SQL that it didn't need to respond to the immediate prompt, maybe it would have done better and not hallucinated a bad implementation of a function that was already there.

That's an interesting angle, IMHO, because it's the opposite take to the usual assumption that LLMs perform better when they have more relevant context. In fact, being more selective about the context provided is something I've noticed a few people advocating recently, though usually on cost/performance grounds rather than because they expected it to improve the quality of the output. This could become an interesting subject as we move to models that can accept much more context: if it turns out that having too much information can be a real problem, the general premise that soon we'll provide LLMs with entire codebases to analyse becomes doubtful, but then the question is what we do instead.
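
To make that concrete, the kind of "being more selective" I have in mind could be as crude as ranking source files by keyword overlap with the prompt and only uploading the top handful. This is a made-up sketch, assuming a Python codebase, and it doesn't reflect how any particular tool actually works:

# Hypothetical sketch: send a few plausibly relevant files as context
# instead of the whole codebase. Purely illustrative and deliberately naive.
from pathlib import Path

def select_context_files(prompt: str, root: str, limit: int = 5) -> list[Path]:
    # Treat the longer words in the prompt as keywords.
    keywords = {word.lower() for word in prompt.split() if len(word) > 3}
    scored = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore").lower()
        score = sum(text.count(keyword) for keyword in keywords)
        if score:
            scored.append((score, path))
    # Highest keyword overlap first; only the top few get uploaded.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in scored[:limit]]

Whether something like that actually beats dumping everything in on output quality, rather than just on cost, is exactly the open question.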

Comment Re: yes? (Score 1) 35

This is, remarkably, one of the worst takes I have ever seen.

Everything is politics. Especially art. Narrative and storytelling are always going to be political. There are games about war between actual countries on this earth, and you think games aren't political? Maybe Candy Crush isn't, and that's all you play. But there are political choices made throughout the development of a game, and they can and should be scrutinized through that lens.

Some games are more political than others, definitely. That's fine. But any game with more than a facile narrative better be something we can talk politics about or it's a huge waste of time.

Even this discussion of whether politics belongs in games, or can even be removed from them, is a political topic. Polygon was a good site that often had interesting takes. I'll be sad to see it turned to AI slop.

Comment Re:BS (Score 1) 149

I could certainly accept the possibility that I write bad prompts if that had been an isolated case, but such absurdities have not been rare in my experiments so far, and yet in other apparently similar scenarios I've seen much better results. Sometimes the AI nails it. Sometimes it's on a different planet. What I have not seen yet is much consistency in what does or doesn't get workable results, across several tools and models, several variations of prompting style, and both my own experiments and what I've heard about in discussions with others.

The thing is, if an AI-backed coding aid can't reliably parse a simple one-sentence prompt containing a single explicit instruction, together with existing code as context that objectively defines the function call required to get started and the data format that will be returned, I contend that the AI is necessarily the problem. Again I can only rely on my own experience, but once you start down the path of spelling out exactly what you want in detail in the prompt and then iterating with further corrections or reinforcement to fix the problems in the earlier responses, I have found it close to certain that the session will end either unproductively, with the results being completely discarded, or with a series of prompts so long and detailed that you might as well have written the code yourself directly. Whatever effect sometimes causes these LLMs to spectacularly miss the mark also seems to be quite sticky.

In the interests of completeness, there are several differences between the scenario you tested and the one I described above that potentially explain the very different results we achieved. I haven't tried anything with Qwen3, so I can't comment on the performance of that model from my own experience. I was using local tools that were handling the communication with (in that case) Sonnet, so they might have been obscuring some problems or failing to pass through some relevant information. I wasn't providing only the SQL and the function to be called; I gave the tool access to my entire codebase, probably a few thousand lines of code scattered across tens of files in that particular scenario. Any or all of those factors might have made a difference in the cases where I saw the AI's performance collapse.

Comment Re:I for one am SHOCKED. (Score 1) 52

You don't appear to consider the cost to everyone who didn't buy the glasses, but encounters someone wearing them.

This is the thing that people saying things like "You have no reasonable expectation of privacy in public" seem unable to grasp. There is a massive and qualitative difference between casual social observations that would naturally occur but naturally be forgotten just as quickly and the systematic, global scale, permanently recorded, machine-analysed surveillance orchestrated by the likes of Google and Meta. Privacy norms and (if you're lucky) laws supporting them developed for the former environment and are utterly inadequate at protecting us against the risks of the latter.

And it should probably be illegal to sell or operate any device that is intended to be taken into private settings and includes both sensors and communications, so that even in a private setting the organisations behind those devices can be receiving surveillance data without others present even knowing, never mind consenting.

Perhaps a proportionate penalty would be that the entire board and executive leadership team of any such organisation and a random selection of 20 of each of their family and friends should be moved to an open plan jail for a year where there are publicly accessible cameras and microphones covering literally every space. Oh, and any of the 20 potentially innocent bystanders who don't think that's OK have the option to leave, but if they do, their year gets added to the board member or executive they're associated with instead.

Comment Re:BS (Score 1) 149

FWIW, I was indeed surprised by some of the things these tools missed. And yes, the worst offenders were the hybrid systems running some sort of local front-end assistant talking to a remote model. Personally, while small context limits get blamed a lot for some of the limitations of current systems, I suspect that limitation is a bit misleading. Even with some of the newer models that can theoretically accept much more context, it would still be extremely slow and expensive to provide all of a large codebase to an LLM as context along with every prompt, at least until we reach a point where we can run the serious LLMs locally on developer PCs instead of relying on remote services.

Even with all of those caveats, if I give a tool explicit context that includes the SQL to define a few tables, a function that runs a SQL query using those tables and returns the results in an explicitly defined type, and a simple prompt to write a function that calls the other function (specified by name) and print out the data it's retrieved in a standard format like JSON, I would not expect it to completely ignore the explicitly named function, hallucinate a different function that it thinks is returning some hacky structure containing about 80% of the relevant data fields, and then mess up the widely known text output format. And yet that is exactly what Sonnet 3.7 did in one of my experiments. That is not a prototype front-end assistant misjudging which context to pass through or a failure to provide an effective prompt. That's a model that just didn't work at all on any level when given a simple task, a clear prompt, and all the context it could possibly need.
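
To give a sense of how trivial the ask was, here's a rough reconstruction of the shape of the task, with invented table, type, and function names rather than anything from the actual project:

# Illustrative reconstruction only; none of these names are from the real codebase.
import json
import sqlite3
from dataclasses import dataclass, asdict

@dataclass
class OrderSummary:
    order_id: int
    customer: str
    total: float

def fetch_recent_orders(conn: sqlite3.Connection) -> list[OrderSummary]:
    # This function, and the SQL defining the tables it queries, were provided as context.
    rows = conn.execute(
        "SELECT order_id, customer, total FROM orders ORDER BY order_id DESC LIMIT 10"
    ).fetchall()
    return [OrderSummary(*row) for row in rows]

def print_recent_orders(conn: sqlite3.Connection) -> None:
    # The prompt amounted to: call the named function above and print its result as JSON.
    orders = fetch_recent_orders(conn)
    print(json.dumps([asdict(order) for order in orders], indent=2))

Something equivalent to that last short function is all that was asked for; what came back ignored the named function entirely and fumbled the JSON on top of that.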

Comment Re:BS (Score 1) 149

As for their ability to infer, I couldn't agree less with that.

Dump an entire code base into their context window, and they demonstrate remarkable insight on the code.

Our mileage varies, I guess. I've done quite a few experiments like that recently, and so far it seems worse than a 50/50 shot that most of the state-of-the-art models will even pick up on project naming conventions reliably, much less follow basic design ideas like keeping UI code and database code in separate packages or preferring the common idioms in the programming languages I was using. These were typically tests with real, existing codebases on the scale of a few thousand lines, and the tools running locally had access to all of that code to provide any context they wanted to the remote services. I've also tried several strategies with including CONVENTIONS.md files and the like to see if that helped with the coding style, again with less than convincing results.

Honestly, after so much hype over the past couple of years, I've been extremely disappointed so far by the reality in my own experiments. I understand how LLMs work and wasn't expecting miracles, but I was expecting something that would at least be quicker than me and my colleagues at doing simple, everyday programming tasks. I'm not sure I've found any actual examples of that yet, and if I have, it was faster by more like 10% than 10x. The general response among my colleagues when we discuss these things is open ridicule at this point, as it seems like most of us have given it a try and reached similar conclusions. I'm happy for you if you've managed to do much better, but I've never seen it myself yet.

Comment Re:BS (Score 1) 149

I've done some experiments recently with LLM-backed tools to try to understand the current state of the art. FWIW, my own experience has been that for relatively simple boilerplate-generation jobs they can often produce useful code, but their limit is roughly the capabilities of a junior developer. They make mistakes fairly often. Maybe more importantly, even when their code technically produces the right answer, they rarely infer much about any existing design or coding standards and their code often doesn't fit in with what is already there. I have found that to be the case disappointingly consistently, even in projects small enough to fit the entire codebase in the context, and across a variety of prompting strategies and tools.

So far, I'd say a relatively good session can produce a lot of correct boilerplate without much human intervention other than the prompts themselves. I'm uncertain whether it really does so significantly faster than a senior dev who could stream the same kind of boilerplate as fast as their fingers could type it, once you take into account the need to proofread and correct the LLM's output, but it was probably faster than a mid-level dev and certainly faster than a junior in most experiments I tried. In contrast, a bad session can last an hour or more and still result in discarding the entire output from numerous interactions with an LLM because it has produced literally no code of sufficient value to keep.

Comment Re:Where Do You See It Going? (Score 1) 125

This is the danger of consolidation. If everyone comes to rely on a single provider for critical IT facilities - see also Microsoft - and then one day even that big corporation decides that providing those facilities in the same way as it has for a long time is no longer the right business strategy - again, see also Microsoft - that can leave a big hole.

However, in Intel's case there are alternative providers of both compatible CPUs (AMD) and effective competitors (everyone building ARM). There have been times in the past when AMD hardware looked better, and the two have gone back and forth over the years. So unlike the Microsoft case, where a lot of their Windows and Office customers have put up with worse and worse software because they don't see any alternative, Intel could flush itself down the toilet and I expect most of the world would shrug and buy from others instead after a bit of short-term disruption to supply chains.

Comment People make fun of Bulwer-Lytton (Score 1) 14

But they still quote "the pen is mightier than the sword", which in context is a superb description of good government: "Beneath the rule of men entirely great, the pen is mightier than the sword". He should get credit. Besides, the opening of Paul Clifford was standard Victorian style.
