7 costly surprises of machine learning: part five
As discussed in the first post in this series, 95% of the code in an ML system is actually just “plumbing”: code that handles configuration, feature extraction, monitoring, analysis, resource management, serving production models, and so on.
Our last post highlighted the importance of maintaining stable, relevant input data free of feedback loops. In this post, we explore how plumbing code frequently leaves systems with designs that are high in technical debt.
This is the fifth in a series of seven posts dissecting a 2015 research paper, Hidden Technical Debt in Machine Learning Systems, and its implications when using ML to solve real-world problems. Any block quotes throughout this piece are from this paper.
ML researchers tend to develop general purpose solutions as self-contained packages. A wide variety of these are available as open-source packages at places… or from in-house code, proprietary packages, and cloud-based platforms
This general-purpose approach works fine in academia or if you plan to only ever use one package. But in the real world, this rarely leads to the best results.
Using general purpose packages creates the need for glue code in order to transform the input and output data into the right format, train the classifier using the package-specific API, store the model in a specific way, and optimize the objective function using a package-specific API.
Glue code is costly in the long term because it tends to freeze a system to the peculiarities of a specific package; testing alternatives may become prohibitively expensive
Using a general purpose package can hinder your ability to make improvements, as you’re denying yourself access to domain-specific properties and are unable to tune the classifier towards a domain-specific goal. In addition, new, more powerful libraries and algorithms are being released at a faster pace than ever before and it’s important to be able to quickly incorporate these into your pipeline.
For example, in 2012 the AlexNet deep neural network dominated the ImageNet competition and changed the way people use machine learning in computer vision. Companies that cannot move fast enough to adopt new technologies like this will ultimately be the ones that get left behind.
The best defence against glue code is to wrap your ML libraries in a common API. This avoids building pipelines around the quirks of one particular ML package, which makes it easier to try out new libraries quickly and avoid vendor lock-in. This is especially important because the field is changing so quickly: new algorithms and libraries are coming out nearly every month.
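As a rough illustration of what a common API can look like, here is a minimal sketch in plain Python. The `Classifier` interface and the two toy classifiers are hypothetical stand-ins invented for this example; in practice each subclass would be a thin adapter around a real ML package.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import Counter
from typing import Sequence


class Classifier(ABC):
    """Hypothetical common interface that the rest of the pipeline codes against."""

    @abstractmethod
    def fit(self, rows: Sequence[Sequence[float]], labels: Sequence[str]) -> Classifier:
        ...

    @abstractmethod
    def predict(self, rows: Sequence[Sequence[float]]) -> list[str]:
        ...


class MajorityClassifier(Classifier):
    """Stand-in for an adapter around one package: always predicts the most common label."""

    def fit(self, rows, labels):
        self._label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        return [self._label for _ in rows]


class NearestCentroidClassifier(Classifier):
    """Stand-in for an adapter around another package: predicts the closest class mean."""

    def fit(self, rows, labels):
        grouped = {}
        for row, label in zip(rows, labels):
            grouped.setdefault(label, []).append(row)
        # One centroid per label: the column-wise mean of that label's rows.
        self._centroids = {
            label: [sum(column) / len(members) for column in zip(*members)]
            for label, members in grouped.items()
        }
        return self

    def predict(self, rows):
        def squared_distance(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        return [
            min(self._centroids, key=lambda label: squared_distance(row, self._centroids[label]))
            for row in rows
        ]


def accuracy(model: Classifier, rows, labels) -> float:
    """Pipeline code like this depends only on the Classifier interface."""
    predictions = model.predict(rows)
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


train_rows = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8], [0.1, 0.2]]
train_labels = ["a", "a", "b", "b", "a"]

# Trying a different implementation only changes the list of models below.
results = {
    type(model).__name__: accuracy(model.fit(train_rows, train_labels), train_rows, train_labels)
    for model in (MajorityClassifier(), NearestCentroidClassifier())
}
print(results)
```

Because `accuracy` and the comparison loop never touch package-specific calls, evaluating a new library means writing one more adapter rather than reworking the pipeline.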
A specific and notable case of glue code, which occurs in data preparation, is pre-processing spaghetti. Pre-processing spaghetti forms organically as new features and input data are identified over time. Trying to prepare data in an ML-friendly format leads to lots of joins, transformations, and sampling steps, often with intermediate file outputs and database lookups.
Managing these increasingly complex pre-processing pipelines is difficult and costly. In practice, it becomes hard to detect errors and recover from failures. Pre-processing spaghetti adds a significant amount of technical debt and hinders innovation by making systems slower, costlier, and less robust.
The negative effects of pre-processing spaghetti can be partly mitigated by end-to-end integration tests. However, these tests only address the diagnosis piece of the problem, and they are often difficult and expensive to implement and maintain.
The only way to really combat pre-processing spaghetti is to think holistically about data collection and feature engineering. The root cause of both glue code and pre-processing spaghetti is that, in practice, the roles of the engineer and the researcher are generally separated.
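To make the idea of an end-to-end test concrete, here is a small sketch. The pipeline steps (`join_users`, `add_features`) and the column names are made up for illustration; the point is only that the test feeds a tiny, hand-checked input through every step and asserts on the final output.

```python
def join_users(events, users):
    """Attach user attributes to each event; drop events from unknown users."""
    return [
        {**event, **users[event["user_id"]]}
        for event in events
        if event["user_id"] in users
    ]


def add_features(rows):
    """Derive a model-ready feature from the raw columns."""
    return [
        {**row, "spend_per_visit": row["spend"] / max(row["visits"], 1)}
        for row in rows
    ]


def run_pipeline(events, users):
    """The full (toy) pre-processing pipeline: join, then feature derivation."""
    return add_features(join_users(events, users))


def test_pipeline_end_to_end():
    events = [
        {"user_id": 1, "spend": 30.0},
        {"user_id": 2, "spend": 10.0},
        {"user_id": 99, "spend": 5.0},  # unknown user: the join should drop this row
    ]
    users = {1: {"visits": 3}, 2: {"visits": 0}}

    output = run_pipeline(events, users)

    assert [row["user_id"] for row in output] == [1, 2]
    assert output[0]["spend_per_visit"] == 10.0
    assert output[1]["spend_per_visit"] == 10.0  # zero visits must not divide by zero


test_pipeline_end_to_end()
```

A test like this can tell you that something in the pipeline broke, but not which join or transformation to simplify, which is why it only helps with diagnosis.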
ML packages developed in isolation appear opaque to the engineers who implement them in practice.
A hybrid research approach where engineers and researchers are embedded together on the same teams (and indeed, are often the same people) can help reduce this source of friction significantly
When you find yourself wrapped in spaghetti, the best solution is often to scrap it altogether and redesign data preparation from the ground up.
Dead experimental code paths
Glue code and pre-processing spaghetti often have the unintended effect of making quick experiments increasingly difficult to conduct. The appealing shortcut is to implement each experiment as a temporary conditional branch within the main codebase, leaving the surrounding infrastructure untouched.
However, over time, these accumulated code paths can create a growing debt due to the increasing difficulties of maintaining backward compatibility and an exponential increase in cyclomatic complexity. Testing all possible interactions between codepaths becomes difficult or impossible
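A quick back-of-the-envelope sketch shows why testing every interaction becomes impractical. The flag names below are invented; the simplifying assumption is that each experimental branch is guarded by an independent on/off flag, so the number of configurations to test doubles with every flag added.

```python
from itertools import product

# Hypothetical experimental flags, each guarding one conditional branch.
experiment_flags = ["use_new_tokenizer", "use_v2_features", "try_alt_loss", "skip_dedup"]

# Every on/off combination is a distinct configuration the system can run in.
configurations = list(product([False, True], repeat=len(experiment_flags)))
print(len(configurations))

# The count grows as 2 ** n for n independent flags.
growth = {n: 2 ** n for n in (4, 10, 20)}
print(growth)
```

With four flags there are 16 combinations; with 20 there are over a million.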
The paper cites an infamous example of these dangers: in 2012, Knight Capital’s trading system lost $465 million in 45 minutes, apparently because of unexpected behavior from obsolete experimental code paths.
As with the case of dead flags in traditional software, it is often beneficial to periodically reexamine each experimental branch to see what can be ripped out. Often only a small subset of the possible branches is actually used; many others may have been tested once and abandoned.
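One way to make that periodic review cheap is to record when each experimental branch was last taken. The sketch below is only an illustration: the `ExperimentFlags` class, the flag names, and the 90-day threshold are all assumptions, not something prescribed by the paper.

```python
from datetime import date, timedelta


class ExperimentFlags:
    """Hypothetical registry that records when each experimental branch was last taken."""

    def __init__(self, names):
        self._last_used = {name: None for name in names}

    def enabled(self, name, active, today):
        """Wrap each check of an experimental flag so that taking the branch is recorded."""
        if active:
            self._last_used[name] = today
        return active

    def stale(self, today, max_idle=timedelta(days=90)):
        """Flags never used, or idle for longer than max_idle: candidates for removal."""
        return sorted(
            name
            for name, last in self._last_used.items()
            if last is None or today - last > max_idle
        )


flags = ExperimentFlags(["alt_tokenizer", "v2_features", "legacy_sampler"])

# Simulated usage; in a real system these calls would sit at the branch points.
flags.enabled("alt_tokenizer", active=True, today=date(2024, 1, 10))
flags.enabled("v2_features", active=True, today=date(2024, 5, 20))
flags.enabled("legacy_sampler", active=False, today=date(2024, 5, 20))

stale_flags = flags.stale(today=date(2024, 6, 1))
print(stale_flags)
```

Here `alt_tokenizer` has been idle for more than 90 days and `legacy_sampler` was never switched on, so both show up as candidates to rip out.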
A better solution still is to adopt a platform that makes it easy to conduct experiments without creating glue code and pre-processing spaghetti. Such a platform is designed with experiments as first-class citizens, making it easy to quickly test different features and classifiers without an exponential increase in cyclomatic complexity.
The code that surrounds the core model is sizeable and can create a lot of problems if not managed carefully. In this post, we’ve explored mitigation and avoidance tactics to help your model tick along without the plumbing springing a leak.
So far, we’ve focused largely on the internals of an ML system. But what about when things out there in the world get messy? In the next post in this series, we’ll be exploring how to manage changes in the external world that lead to covariate shift and prediction shift, among other external effects that can profoundly alter the behavior of any ML system.
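What treating experiments as first-class might look like, in miniature: each experiment is a piece of data naming the features and settings to try, and a single code path runs all of them. Everything here (the `Experiment` dataclass, the `FEATURES` registry, the toy threshold classifier, and the example texts) is a hypothetical sketch rather than a description of any particular platform.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical feature registry: new features are added here, not as branches in the pipeline.
FEATURES = {
    "length": lambda text: float(len(text)),
    "exclamations": lambda text: float(text.count("!")),
}


@dataclass(frozen=True)
class Experiment:
    """An experiment is declared as data: which features to use and which threshold to apply."""

    name: str
    features: tuple[str, ...]
    threshold: float


def run(experiment, texts, labels):
    """Toy classifier: predict True when the summed features exceed the threshold."""
    predictions = [
        sum(FEATURES[name](text) for name in experiment.features) > experiment.threshold
        for text in texts
    ]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


texts = ["BUY NOW!!!", "see you at lunch", "WIN!!! CLICK!!!", "meeting moved to 3pm"]
labels = [True, False, True, False]

experiments = [
    Experiment("exclamations_only", ("exclamations",), 0.5),
    Experiment("length_only", ("length",), 12.0),
]

# One code path runs every experiment; abandoning one means deleting a list entry.
scores = {experiment.name: run(experiment, texts, labels) for experiment in experiments}
print(scores)
```

Adding or dropping an experiment changes only the `experiments` list, so no conditional branches accumulate in the pipeline itself.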