Apache Kafka and the four challenges of production machine learning systems – O’Reilly Media
Machine learning has become mainstream, and suddenly businesses everywhere are looking to build systems that use it to optimize aspects of their products, processes, or customer experience. The cartoon version of machine learning sounds quite easy: you feed in training data made up of examples of good and bad outcomes, and the computer automatically learns from these and spits out a model that can make the same kind of predictions on data it has not seen before. What could be easier, right?
Those with real experience building and deploying production systems built around machine learning know that, in fact, these systems are shockingly hard to build. The difficulty does not lie, for the most part, in the algorithmic or mathematical complexity of machine learning algorithms. Creating such algorithms is difficult, to be sure, but that work is mostly done by academic researchers. Teams that use the algorithms in production systems almost always rely on off-the-shelf libraries of pre-built algorithms. Nor is the difficulty primarily in using the algorithms to generate a model, though learning to debug and improve machine learning models is a skill in its own right. Rather, the core difficulty is that building and deploying systems that use machine learning is very different from traditional software engineering, and as a result it requires practices and architectures that are unfamiliar to teams with a traditional software engineering background.