Sorry, I disagree strongly.
You don't need strong AI to implement monitoring and elasticity, or to write software that fails in a manner amenable to self-repair and, in extreme cases, quick diagnosis.
I've been frontline on PagerDuty at the company I founded for 5 years. There were some rough times during the first few Apple features. No longer, though. Most of our systems self-heal or are designed to tolerate and report outages and partitions gracefully. Some categories of failure do knock us offline, but they're total failures and the system knows to go into an exponential backoff mode in that case.
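For anyone unfamiliar with the exponential backoff part, here is a minimal sketch of the general technique in Python. It is not the code from our library; the names `backoff_delays` and `call_with_backoff` and every default (base delay, cap, attempt limit, jitter) are assumptions made up for illustration.

```python
import random
import time


def backoff_delays(base=0.5, cap=60.0, factor=2.0, jitter=True, rng=random.random):
    """Yield an endless series of retry delays (seconds) that grow up to `cap`.

    All parameter names and defaults are hypothetical, chosen for illustration.
    """
    delay = min(cap, base)
    while True:
        # With jitter, wait a random fraction of the delay so that many
        # clients do not all retry at the same instant.
        yield delay * rng() if jitter else delay
        delay = min(cap, delay * factor)


def call_with_backoff(operation, max_attempts=8, sleep=time.sleep, **backoff_kwargs):
    """Call operation(); after each failure, wait longer before trying again."""
    delays = backoff_delays(**backoff_kwargs)
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure so it gets reported
            sleep(next(delays))
```

The cap keeps the wait bounded during a long outage, and re-raising after the last attempt means the failure still gets reported instead of being retried silently forever.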
Sad part is, most of that code was trivial to write and applies across our entire app (we have one shared library called "tragedy" that regulates nearly all of this). I wrote it years ago, and it has required only a handful of fixes over time.
People who say it's impossible usually don't understand what's actually hard about it, don't keep up with current research in the field, and say things like, "No one understands ZooKeeper" and "Multi-Paxos is over-engineered" and "backpressure is what broken systems need." I try to help educate people on this stuff, but I have a strong financial incentive not to, so... I generally let other people do it and laugh at the useless hand-wringers talking about how we should give every engineer a piece of a broken bridge or some shit.