AI Will Not Replace DevOps: The Costly Lesson of the AWS Outage

When Amazon Web Services (AWS) experienced one of its most significant outages in recent years, it did not come as a shock to most people in the technology sector. The event followed reports that AWS had laid off almost 40 percent of its DevOps staff and replaced them with what the company called AI-powered self-healing infrastructure.

The result? A global outage that disrupted businesses all over the world, and a very clear demonstration that AI is no match for human expertise when it comes to managing complex systems.

What Actually Happened

According to reports, a DNS configuration problem caused the outage. It is the type of issue that an experienced DevOps engineer would have detected and fixed within minutes. This time, however, there were not enough seasoned engineers on hand to notice it.

Instead, the heavily promoted AI-driven automation at AWS did not react, because AI cannot fix a problem it does not recognize. Machine learning systems know only what they have seen before, and novel, unexpected edge cases often fall outside their range.

Meanwhile, the engineers who knew the AWS architecture inside and out, including its quirks and idiosyncratic failure modes, were gone. The years of knowledge they had built up on the job had been lost.

The Real Issue: Buying Into the AI Hype

This is not just a story about AWS. It is about the growing belief across the industry that AI can completely replace human DevOps teams.

Executives love the idea of automation as a way to cut costs. What they tend to forget, however, is that DevOps is not just about running scripts or pipelines. It is about understanding systems as a whole: where they break, why they break, and how to keep the same failure from happening again.

Sure, AI can automate repetitive tasks. However, it cannot troubleshoot problems it has never seen, nor can it replicate the intuition, pattern recognition, and historical context that make a great DevOps engineer invaluable.

When Experience Matters More Than Automation

An experienced DevOps engineer may remember that Route 53 misbehaves around leap seconds, or that a particular subnet tends to fail when traffic is high. This is not something that is documented anywhere; it is tacit knowledge gained through experience.

By replacing those people with AI, companies lose more than headcount. They lose the institutional memory that helps systems recover quickly in a crisis.

A Lesson for Every Technology Company

If AWS, with its enormous resources and state-of-the-art infrastructure, cannot completely automate DevOps using AI, why would smaller startups believe they can?

Automation should support engineers, not replace them. AI can deal with the predictable (scaling, backups, alerts), but the unpredictable remains the responsibility of humans.
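To make that division of labor concrete, here is a minimal sketch in Python of how an incident router might work: failure classes the automation already understands go to a runbook, and anything it does not recognize is handed to a person. Every name here (`Incident`, `RUNBOOKS`, `triage`, `page_on_call`) is hypothetical and invented for illustration; it does not describe how AWS or any real tool works.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Incident:
    kind: str    # hypothetical label, e.g. "high_cpu" or "dns_misconfig"
    detail: str  # free-text description of the affected component


# Hypothetical runbooks for failure classes the automation already understands.
def scale_out(incident: Incident) -> str:
    return f"auto: added capacity for {incident.detail}"


def restore_backup(incident: Incident) -> str:
    return f"auto: restored latest backup for {incident.detail}"


RUNBOOKS: Dict[str, Callable[[Incident], str]] = {
    "high_cpu": scale_out,
    "data_loss": restore_backup,
}


def page_on_call(incident: Incident) -> str:
    # Stand-in for a real paging integration (assumption: one exists).
    return f"escalated to on-call engineer: {incident.kind} ({incident.detail})"


def triage(incident: Incident) -> str:
    """Run a known runbook if there is one; otherwise escalate to a human."""
    handler = RUNBOOKS.get(incident.kind)
    if handler is None:
        return page_on_call(incident)
    return handler(incident)
```

Calling `triage(Incident("high_cpu", "web tier"))` takes the automated path, while `triage(Incident("dns_misconfig", "internal resolver"))` has no matching runbook and is escalated. The fallback branch only helps if there is still an experienced engineer on the other end of the page.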

Companies that rush blindly into AI automation without preserving the human factor are setting themselves up to learn this lesson at a very high cost when things go wrong.
