Re:BS
LLMs perform very well with what they've got in context.
True in general, I agree. How well a local tool picks out the context to upload seems to be a big (maybe the big) factor in how good its results are with the current generation of models, and if it's relying on a RAG approach then there's clearly scope for that retrieval step to work well or badly.
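For what it's worth, here's a minimal sketch of the kind of selection step I mean, assuming a tool that ranks source files against the prompt before uploading a few of them. The scoring function is a toy keyword-overlap stand-in for a real embedding search, and the prompt and file patterns are made up for illustration.

```python
# Hypothetical sketch of RAG-style context selection for a coding tool:
# rank candidate source files by a crude relevance score and upload only the top few.
from pathlib import Path


def score(prompt: str, text: str) -> float:
    """Toy relevance score: fraction of prompt words that also appear in the file."""
    prompt_words = set(prompt.lower().split())
    file_words = set(text.lower().split())
    return len(prompt_words & file_words) / max(len(prompt_words), 1)


def select_context(prompt: str, repo_root: str, max_files: int = 3) -> list[Path]:
    """Rank source files by relevance to the prompt and keep only the top few."""
    candidates = [p for p in Path(repo_root).rglob("*.py") if p.is_file()]
    ranked = sorted(candidates,
                    key=lambda p: score(prompt, p.read_text(errors="ignore")),
                    reverse=True)
    return ranked[:max_files]


if __name__ == "__main__":
    prompt = "call report_totals and print the output"  # hypothetical task prompt
    for path in select_context(prompt, "."):
        print(path)
```

A real tool would presumably use embeddings and chunking rather than whole files, but the basic shape is the same: whatever this ranking misses never reaches the model at all.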
That said, the experiment I mentioned that collapsed horribly was explicit about adding those source files as context. Unless there was a serious bug in how that context was uploaded, it looks like one of the newest available models really did get a prompt only marginally more complicated than "Call this named function and print the output" completely wrong on that occasion. Given that several other experiments with the same tool and model showed no such total collapse, and that the combination's performance was inconsistent overall, a bug of that kind seems very unlikely, though of course I can't be 100% certain.
It's also plausible that the model was simply confused by having too much context. If it hadn't known about the rest of the codebase, including underlying SQL it didn't need in order to answer the immediate prompt, maybe it would have done better and not hallucinated a bad implementation of a function that already existed.
That's an interesting angle, IMHO, because it runs opposite to the usual assumption that LLMs perform better the more relevant context they're given. In fact, being more selective about the context provided is something I've noticed a few people advocating recently, though usually on cost/performance grounds rather than because they expected it to improve output quality. This could become an interesting question as we move to models that accept much larger contexts: if having too much information really is a problem, the premise that we'll soon hand LLMs entire codebases to analyse becomes doubtful, and then the question is what we do instead.
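To make the "less context" idea concrete, here's a rough sketch of one way a tool could cap what it sends rather than shipping everything it can find: take the ranked files from a step like the one above and drop the lowest-ranked ones once a token budget is used up. The budget value and the roughly-4-characters-per-token estimate are assumptions for illustration, not how any particular tool actually behaves.

```python
# Hypothetical context-budgeting step: keep only the best-ranked files that fit.
def trim_to_budget(ranked_files: list[tuple[str, str]],
                   budget_tokens: int = 8000) -> list[tuple[str, str]]:
    """Keep the highest-ranked (name, text) pairs until an approximate token budget is hit."""
    kept, used = [], 0
    for name, text in ranked_files:
        est_tokens = len(text) // 4  # crude heuristic: roughly 4 characters per token
        if used + est_tokens > budget_tokens:
            break  # everything ranked below this point is never uploaded
        kept.append((name, text))
        used += est_tokens
    return kept


if __name__ == "__main__":
    # Made-up ranked input: a small relevant module and a large SQL schema.
    files = [("main.py", "def report_totals(): ...\n" * 50),
             ("schema.sql", "CREATE TABLE totals (...);\n" * 400)]
    for name, _ in trim_to_budget(files, budget_tokens=500):
        print("keeping", name)
```

Whether dropping files like that actually improves the quality of the output, as opposed to just saving tokens, is exactly the open question.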