>realistic
Here's the first sentence for mid-2025:
>Advertisements for computer-using agents emphasize the term "personal assistant": you can prompt them with tasks like "order me a burrito on DoorDash" or "open my budget spreadsheet and sum this month's expenses." They will check in with you as needed: for example, to ask you to confirm purchases.
That first sentence describes something far beyond the capabilities of any AI we have. Sure, there are programs that can do those things. But an AI that can do them out of the box, with no extra software installed, no special configuration, and so on? It won't exist for years.

To order you a burrito on DoorDash, it would need your payment information and authorization, the ability to either use a delivery service's API or open your browser and navigate the website, and knowledge of what kind of burrito you want. It needs to know what to do if your payment isn't accepted the first time. It needs to notice that the restaurant it selected happens to share your city's name but sits three states over. It needs to know what to do if your favorite burrito isn't available. Ideally it should switch to a different delivery service if that one costs less. That's a hundred different tasks that have to be done in a particular order, in a particular way (a sketch of just a few of those branches is below). And yes, you can train an AI on that specific task and prepare it ahead of time, but tell that same AI "order some Chinese" and the one trained for burritos will flop. There certainly are agents that, in the lab, can be told "order me a burrito" and, thanks to all that prep work, can do so. Making it happen for an arbitrary consumer on an arbitrary device is enormously more difficult.

Opening your budget spreadsheet is no easier: the AI has to know where the file is, where the program is, how to navigate its interface, what format the data uses, and where each piece of information goes. If an LLM could do all of that the way it's presented, we wouldn't need operating systems; that's the LLM as the OS. Again, something that won't exist for at least ten years.
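To make that concrete, here's a toy sketch (plain Python; every name here is hypothetical, there's no real DoorDash or agent API involved) of just three of the branches listed above: the wrong-city restaurant, the missing menu item, and the declined payment.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Restaurant:
    name: str
    city: str           # sharing a name with your city is not good enough
    menu: list[str]

def order_burrito(user_city: str, favorite: str,
                  restaurants: list[Restaurant],
                  charge: Callable[[str, str], bool]) -> str:
    """Three of the ~hundred branches 'order me a burrito' actually needs."""
    # Branch 1: filter out the restaurant with your city's name, three states over.
    local = [r for r in restaurants if r.city == user_city]
    if not local:
        return "FAIL: no matching restaurant actually in your city"

    # Branch 2: the favorite burrito might not be on the menu today.
    choice = next((r for r in local if favorite in r.menu), None)
    if choice is None:
        return "ASK USER: favorite unavailable, pick a substitute?"

    # Branch 3: the payment might be declined on the first try.
    for _ in range(2):
        if charge(choice.name, favorite):
            return f"ordered {favorite} from {choice.name}"
    return "ASK USER: payment declined twice, use another card?"

# A run where the first charge attempt fails and the second succeeds:
flaky_charges = iter([False, True])
print(order_burrito(
    "Springfield",
    "carnitas burrito",
    [Restaurant("El Toro", "Springfield", ["carnitas burrito", "veggie burrito"]),
     Restaurant("El Toro", "Springfield, OH", ["carnitas burrito"])],  # wrong state
    charge=lambda restaurant, item: next(flaky_charges),
))
# -> ordered carnitas burrito from El Toro
```

And this sketch doesn't even touch authentication, browser automation, or comparing delivery services. The document STARTS with that level of misrepresentation. Then it snowballs. For example: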
>In a few rigged demos, it even lies in more serious ways, like hiding evidence that it failed on a task, in order to get better ratings
It's not a lie. Lying implies a level of autonomous thought that doesn't exist. The LLM didn't lie; the company did, to make their AI look more advanced than it is.
I like a good piece of sci-fi, my man. I do. And that's exactly what Project 2077 is.