There have been countless articles written in the last few years about how machine learning is beginning to reshape the world. A less talked about topic, however, is how the technology’s confinement to academia over the last 50 years has concealed the massive difficulties of getting it to work against real-world problems. AI’s emergence into real-world systems is revealing how difficult such systems are to debug, test, monitor, and maintain. But a huge roadblock to addressing these problems hides in plain sight: machine learning is not software, and we need to stop treating it as such.
Over a series of posts, I’m going to dissect this paper, written in 2015: Hidden Technical Debt in Machine Learning Systems. It was one of the first academic papers to highlight the hidden costs of machine learning systems. Throughout these posts, we will build on it, highlighting the real-world problems of adopting machine learning and investigating methods of mitigating and avoiding those problems. Any unattributed block quotes throughout this article come from this paper.
In this first post, we’re going to walk through one of the most important topics: how machine learning is a different beast from traditional software, setting the scene for how to approach and consider machine learning in the right way.
The benefits and the problems of machine learning stem from the same fact: its algorithms are generated automatically from labeled data, not a developer’s own handwritten code. To understand what this means, we’re going to compare ML to something more familiar: software (or software 1.0).
Traditional software is a hand-coded program created by a programmer that is cleanly articulated using deterministic, handwritten rules.
In machine learning (or software 2.0), the programmer doesn’t actually write the program: the program is created by data labelers. The data labelers define the constraints of the program by labeling data, then the ML model automatically learns from the labeled data and generates the program using statistical optimization.
It can seem surprising when ML fails at tasks that seem trivial to us—and indeed are trivial for a simple handwritten software program to solve—such as learning arithmetic. However, this apparent shortcoming is also what makes machine learning so powerful. When a solution lacks a concrete mathematical specification, software tends to fail miserably, and it is here that machine learning excels.
For example, take the problem of verifying “connectedness”. Look at the maze above. Can you tell in less than a second if it’s possible for the mouse to reach the cheese? Probably not (if you can, submit yourself for scientific study immediately). The brain is not wired to solve problems in this way. For a software algorithm, however, this is a simple problem. One such algorithm is to trace your finger along each possible path in turn to find if one allows the mouse to reach the cheese. We can code this up and easily solve this problem, since it has a clear mathematical specification.
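The finger-tracing idea above is exactly a graph search. As a sketch (the maze, coordinates, and function name are illustrative, not from the original), here is a breadth-first search that answers the connectedness question for a small grid maze:

```python
from collections import deque

def is_connected(grid, start, goal):
    """Breadth-first search: True if `goal` is reachable from `start`
    through open cells ('.') in `grid`. This is the 'trace every path'
    idea made systematic."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    seen = {start}
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == goal:
            return True
        # Explore the four neighbouring cells.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == '.' and (nr, nc) not in seen):
                seen.add((nr, nc))
                frontier.append((nr, nc))
    return False

# A toy maze: '.' is open floor, '#' is a wall.
maze = [
    "..#.",
    ".#..",
    "..#.",
    "#...",
]
# Mouse at top-left, cheese at bottom-right.
print(is_connected(maze, (0, 0), (3, 3)))  # → True
```

The whole problem fits in a few dozen deterministic lines precisely because “reachable” has an exact mathematical definition.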
In 1969, Minsky and Papert1 showed that very simple algorithms can solve this problem, while perceptrons, the forerunners of the neural networks we use in today’s applications, cannot.
Can you identify the above shapes? Simple. Your brain immediately knows the left shape represents a mouse, and the right a piece of cheese.
For hand-coded software algorithms, this problem is daunting and complex. For machine learning algorithms, it’s simple. Why?
The power of traditional software is that its handwritten rules can solve problems with clear mathematical specifications: the programmer is in complete control of the algorithm and can write those specifications down. But this benefit is also its biggest downside, and machine learning’s greatest strength.
The “connectedness” problem of the maze has a clear mathematical specification, so we can write hand-coded software to solve this problem. “Mousiness” and “cheesiness” do not have a clear mathematical specification, so we cannot envision a clear hand-coded program to solve this. Instead, we have to rely on heuristics and learning techniques.
Let’s imagine we start with a program where we write rules such as, “if the object is small, brown, and has two ears, it is a mouse”. While this might work for the above example, its limitations quickly become clear as we introduce new data and our algorithm begins to identify certain rats, gerbils, and chihuahuas as mice. We could start creating lots of handwritten rules to distinguish these cases, but this is slow and results in a rigid, convoluted system.
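The brittleness of such rules is easy to demonstrate. Below is a hypothetical sketch of the “small, brown, two ears” heuristic (the function and feature names are my own, not from the original); a rat with the same coarse features defeats it immediately:

```python
def looks_like_mouse(animal):
    """Hand-written heuristic (hypothetical): classify by a few
    coarse features. Brittle by construction."""
    return (animal["size"] == "small"
            and animal["color"] == "brown"
            and animal["ears"] == 2)

mouse = {"size": "small", "color": "brown", "ears": 2}
rat   = {"size": "small", "color": "brown", "ears": 2}  # same coarse features

print(looks_like_mouse(mouse))  # → True (correct)
print(looks_like_mouse(rat))    # → True (a false positive the rules can't avoid)
```

Every fix means another handwritten rule, and each new rule interacts with all the previous ones, which is exactly the rigid, convoluted system described above.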
Machine learning solves this problem by learning patterns from new image data. Increasing the volume of input data makes defining heuristics harder, but it makes training ML systems more effective. Feed an ML algorithm labeled data (for example, an image database where every image of a mouse carries a metadata tag saying “mouse”) and, with each image, the algorithm becomes better at determining precisely what a mouse looks like. This is fundamentally different from software, where more data doesn’t improve the quality of the algorithm. In machine learning, problems like the one above are best solved with clean, structured, and well-annotated data. Throughout this series of posts, we will describe in more detail how to achieve this.
It is this fact—that machine learning does not feature software’s deterministic, handwritten algorithms—that makes ML difficult to test, debug, maintain, and understand.
Take unit tests as an example: with software, unit tests are possible because of abstraction boundaries. You can test a piece of code because the algorithm is explicitly laid out in code, independent of the input data. An ML model’s algorithm, on the other hand, is determined by the training data: with each new dataset, a new algorithm is born. To have the same test coverage as a software algorithm, you would need to write a unit test for every single input example ever given to every model you train, a next-to-impossible task. And it’s not even clear what you would test, since the algorithm learned by the ML model is usually not explicitly discernible.
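The contrast can be sketched in a few lines. The toy “learned threshold” below is hypothetical (not any real library’s API): a deterministic function has one fixed behavior a test can pin down, while the learned function’s behavior, and therefore any assertion about it, depends on which data it saw:

```python
# Deterministic code: behavior is fixed by the source, so one test pins it down.
def clamp(x, lo, hi):
    return max(lo, min(x, hi))

assert clamp(5, 0, 3) == 3  # holds for every run, forever

# A 'learned' function: behavior is fixed by the training data instead.
def fit_threshold(examples):
    """Learn a 1-D decision threshold (hypothetical toy 'model'):
    the midpoint between the two class means."""
    pos = [x for x, label in examples if label]
    neg = [x for x, label in examples if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

# The same assertion may pass or fail depending on the data the model saw:
model_a = fit_threshold([(1.0, False), (3.0, True)])  # threshold 2.0
model_b = fit_threshold([(1.0, False), (9.0, True)])  # threshold 5.0
print(model_a, model_b)
```

A test written against `model_a`’s threshold silently becomes wrong the moment the training set changes, which is the heart of the coverage problem described above.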
We’re using software logic to test something that cannot be expressed in software logic. ML is new and powerful, and it demands a new and powerful method of management.
Academic practitioners emerging from universities are often surprised to find that ML code accounts for only around 5% of a real-world machine learning system1. They are accustomed to training an ML model and then publishing a paper on it, without ever pushing it into a production system. In reality, once the model is trained, they are only 5% of the way to creating a real-world product capable of providing value to customers.
On the other hand, those in the tech industry frequently find it surprising that they can’t just sprinkle machine learning over their current software system and reap the rewards. Adding ML into a software system, even through an external API, creates dependencies that affect interactions throughout the entire system—often in hidden and detrimental ways.
This dichotomy can be understood through the lens of technical debt, a metaphor introduced by Ward Cunningham in 1992 to help reason about the long-term costs incurred by moving quickly in software engineering. As with fiscal debt, there are often sound strategic reasons to take on technical debt. Not all debt is bad, but all debt needs to be serviced.
In software engineering, you take on technical debt in order to move faster. As when buying a house or starting a business, some things are only possible by taking on debt. You do this knowing that, in the long term, you’ll create enough value to pay off the debt and potentially make a profit.
But there’s a problem:
ML systems have a special capacity for incurring technical debt, because they have all of the maintenance problems of traditional code plus an additional set of ML-specific issues. This debt may be difficult to detect because it exists at the system level rather than the code level.
Methods for paying down technical debt incurred from software, therefore, cannot simply be applied to ML problems. Instead, we need to adopt a system-wide approach.
Hidden technical debt is the worst technical debt you can incur, and understanding the type of debt you’re incurring is essential to managing it. Imagine a teenager receiving their first credit card in the mail and immediately embarking on an extravagant shopping spree. Most likely, they did so ignorant of how 20% compound interest works. They will learn eventually, but likely too late.
Think of ML as that teenager’s high interest credit card. ML is particularly susceptible to hidden debt, which is dangerous because the interest on it compounds quickly and often goes unnoticed.
We are only beginning to explore the possibilities of machine learning. But this process of discovery is also revealing the challenges inherent in the technology. The utility of ML is well discussed, but the technology’s darker side is a much lesser-known subject. Fortunately, there are ways to mitigate the problems that ML introduces, and even avoid them altogether.