One of the most fascinating aspects of H2O is the sheer number of forms it can take under different conditions.
...that there's a LOT of minerals and other nutrients in food, only a fraction of which are produced from chemicals in fertilisers, O2, and CO2. If you produce too much with too little consideration of the impact on the soil, you can produce marvellous dust bowls but eventually that's ALL you will produce.
There's a lot of stuff on the Internet that never ends up in AIs, either because the guys designing the training sets don't consider it a particular priority or because it's paywalled to death.
So the imbalance isn't just in languages and broader cultures, it's also in knowledge domains.
However, AI developers are very unlikely to see any of this as a problem, for one very, very important reason: it means they can sell the extremely expensive licenses to those who actually need that information, who can then train their own custom AIs on it. Why fix a problem where the fix means your major customers pay you $20 a month rather than $200 or $2000? They're really not going to sell ten times, and certainly not a hundred times, as many $20 subscriptions by doing so, so there's no way they can skim off the corps if they program their AIs properly.
Let's take a look at software sizes, for a moment.
UNIX started at around 8k, and the entire Linux kernel could happily sit in the lower 1 megabyte of RAM for a long time, even with capabilities that terrified Microsoft and Apple.
The original game of Elite occupied maybe three quarters of a 100k floppy disk and used swapping and extensive data files to create a massive universe that could be loaded into 8k of RAM.
On an 80386SX with 5 megabytes of RAM (Viglens were weird but fun) and a 20 megabyte hard drive, running Linux, I could simultaneously run 7 MMORGs, X11R4, a mail server, a list server, an FTP server, a software router, a web server, a web cache, a web search engine, a web browser, and still have memory left over to play Netrek, without slowing anything down.
These days, that wouldn't be enough to load the FTP server, let alone anything else.
On the one hand, not everything can be coded to seL4 standards (although seL4, by using Haskell as an initial language to develop the core and the proofs, was able to cut the cost of formal programming to around 1% of the normal value). On the other hand, a LOT of space is gratuitously wasted.
Yes, multiple levels of abstraction are a part of the problem. There's nothing wrong with abstraction as such (OpenLook is great), but modern abstraction is mostly there due to incompetent architecture on previous levels and truly dreadful APIs. And, yes, APIs are truly, truly dreadful if OpenLook is the paragon of beauty by comparison.
Nature always sides with the hidden flaw.