How does real world machine learning production systems run?

Question

Dear Machine Learning/AI Community,

I am just a budding and aspiring Machine Learner who has worked on open online data sets and some POC's built locally for my project. I have built some models and converted into pickle objects in order to avoid re-training.

And this question always puzzles me. How does a real production system work for ML algorithms?

Say, I have trained my ML algorithm with some millions of data and I want to move it to production system or host it on a server. In real world, do they convert into pickle objects? If so, it would be huge pickled file, isn't. The ones I trained locally and converted for 50000 rows data itself took 300 Mb space on disk for that pickled object. I don't think so this is right approach.

So how does it work in order to avoid my ML algorithm to re-train and start predicting on incoming data? And how do we actually make ML algorithm as a continuous online learner. For example, I built a image classifier, and start predicting the incoming images. But I want to again train algorithm by adding the incoming online images to my previously trained data sets. May be not for every data, but daily once I want to combine all received data for that day and re-train with newly 100 images which my previously trained classifier predicted with actual value. And this approach shouldn't effect my previously trained algorithm to stop predicting incoming data as this re-training may take time based on computational resources and data.

I have Googled and read many articles, but couldn't find or understand to my above question. And this is puzzling me every day. Do manual intervention is needed for production systems as well? or any automated approach is there for it?

Any leads or answers to above questions would be highly helpful and appreciated. Please let me know if my questions doesn't make sense or not understandable.

This is not a project centric I am looking for. Just a generic case of real world production ML systems example.

Thank you in advance!

dennlinger dennlinger · Accepted Answer · 2018-06-22T06:12:39

Note that this is is very broadly formulated, and your question should be put on hold probably, but I try to give a brief summary of what you are trying to ask:

"How does a real production system work?"
Well, it always depends on the scale of your product, and in what way you are using ML/AI in your system. For the most parts, you would deploy a model on your server or app.
Note that deployment does NOT lineraly scale with the amount of training data you have. Rather, the size of your network is solely determined by the number of activations in your network. Note that, after training, you might not even need as much storage space, since for example CNNs have a very limited number of connections, while retaining a much larger number during training. I can highly recommend Roger Grosse's slides on the size of a network. This also directly relates to the second point.
"How to avoid re-training?"
From what I am aware of, most systems will not be retrained on a regular basis, at least for the smaller scale. This means that a network will mostly run in inference mode only, which has the aforementioned benefit of what I mentioned about the size of the network (and the time it takes to compute a result). Then again, this also highly depends on the specific task for which you are deploying a ML model. Image classification on "standard categories" have the benefit of already delivering quite substantial models (AlexNet, Inception, ResNet,...), whereas a model for machine translation mostly depends on your specific domain and vocabulary.
"How would I go about re-training?"
This is actually the tricky part, which has a significant field called "bandit learning" behind it. The problem is that most of your incoming "new" data will be unlabeled, i.e. cannot be used for the direct integration into a new training phase. Instead, you rely on feedback from users to give you a sense of what was wrong or right. Then again, not every user has the same ratings for the same machine translation (or same recommendations on Amazon etc.), for example, so judging whether your system is "right" or "wrong" becomes very hard.
There are obviously quite a few methods to automate labeling (i.e. nearest neighbor for images, or other similarity-based searches). Online learning therefore only also works if you have this continuous loop of feedback/retraining.

For larger scale systems, it also becomes important to scale your models, to perform the desired amount of predictions/classifications per second. This is also mentioned in the link to the TensorFlow deployment page I provided, and mainly builds on top of cloud/distributed architectures, such as Hadoop or (more recently) Kubernetes. Then again, for smaller products this is mostly overkill, but serves the purpose of delivering enough resources at any arbitrary scale (and possibly on demand).

As for the integration cycle of machine learning models, there is a nice overview in this article. I want to conclue by stressing that this is a heavily opinionated question, so every answer might be different!

How does real world machine learning production systems run?

1 Answers