Dear Machine Learning/AI Community,
I am a budding, aspiring machine learning practitioner who has worked with open online data sets and built some POCs locally for my project. I have built some models and saved them as pickle objects to avoid re-training.
One question always puzzles me: how does a real production system work for ML algorithms?
Say I have trained my ML algorithm on millions of records and I want to move it to a production system or host it on a server. In the real world, do teams convert their models into pickle objects? If so, the pickled file would be huge, wouldn't it? A model I trained locally on just 50,000 rows took 300 MB on disk as a pickled object, so I don't think this is the right approach.
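To make the size concern concrete, here is a minimal sketch, not a description of how any particular production system does it. It assumes a toy nearest-centroid model written with the standard library only; `fit_centroids`, `predict`, and the synthetic data are hypothetical and made up for illustration. It shows that pickle stores whatever object it is given, so a file holding only learned parameters can be much smaller than one that also carries the training rows, and that a loaded model can predict without re-training.

```python
import pickle


def fit_centroids(rows, labels):
    """Train a toy model: one mean feature vector (centroid) per class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest to the given row."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Synthetic training data (an assumption, only here to have something to fit).
rows = [[float(i % 7), float(i % 11)] for i in range(5000)]
labels = [i % 2 for i in range(5000)]

centroids = fit_centroids(rows, labels)

# Pickle only the learned parameters versus parameters plus the training data.
params_only = pickle.dumps({"centroids": centroids})
with_data = pickle.dumps({"centroids": centroids, "rows": rows, "labels": labels})
print(len(params_only), len(with_data))

# Later, e.g. on a server: load the parameters and predict, with no re-training.
restored = pickle.loads(params_only)["centroids"]
print(predict(restored, [1.0, 2.0]))
```

Whether a real pickled model is large therefore depends on what the model object keeps inside it; instance-based models that hold on to their training data will grow with the data set, while models that keep only fitted parameters will not.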
So how does it work so that my ML algorithm can start predicting on incoming data without being re-trained? And how do we actually turn an ML algorithm into a continuous online learner? For example, say I build an image classifier and it starts predicting on incoming images. I then want to train the algorithm again by adding those incoming images to my previous training data. Maybe not after every single image, but once a day I want to combine all the data received that day and re-train, for example with the 100 new images that my previously trained classifier predicted on, together with their actual labels. This re-training should not stop my previously trained classifier from predicting on incoming data, since re-training may take a while depending on the computational resources and the amount of data.
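The behaviour asked about above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive production design: `ModelServer`, `train`, `record`, and `retrain` are hypothetical names, the "model" is a trivial majority-label stand-in for a real image classifier, and the daily trigger (a scheduler or similar) is left out. The idea shown is that re-training runs in a background thread on a snapshot of old plus newly labelled data, while predictions keep using the old model until the new one is swapped in.

```python
import threading


def train(examples):
    """Hypothetical stand-in for a slow training job: learns the majority label."""
    counts = {}
    for _features, label in examples:
        counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get)


class ModelServer:
    """Serves predictions from the current model while a new one is trained."""

    def __init__(self, examples):
        self._lock = threading.Lock()
        self._examples = list(examples)   # data the current model was trained on
        self._pending = []                # newly labelled data collected since then
        self._model = train(self._examples)
        self.version = 1

    def predict(self, features):
        # Grab the current model reference; the toy model ignores the features.
        with self._lock:
            model = self._model
        return model

    def record(self, features, actual_label):
        # Store an incoming example together with its actual label.
        with self._lock:
            self._pending.append((features, actual_label))

    def retrain(self):
        # Take a snapshot under the lock, then train WITHOUT holding the lock,
        # so predict() keeps working with the old model in the meantime.
        with self._lock:
            batch, self._pending = self._pending, []
            snapshot = self._examples + batch
        new_model = train(snapshot)
        with self._lock:                  # swap in the new model
            self._examples = snapshot
            self._model = new_model
            self.version += 1


server = ModelServer([("img1", "cat"), ("img2", "cat"), ("img3", "dog")])
before = server.predict("new_img")

for i in range(5):                        # the day's images with their actual labels
    server.record(f"img_day_{i}", "dog")

worker = threading.Thread(target=server.retrain)
worker.start()
during = server.predict("another_img")    # still answered while re-training runs
worker.join()
after = server.predict("another_img")
print(before, during, after, server.version)
```

The sketch assumes only one re-training job runs at a time; anything recorded while a job is running stays in the pending list for the next run.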
I have Googled and read many articles, but couldn't find or understand an answer to the questions above, and this puzzles me every day. Is manual intervention needed for production systems as well, or is there an automated approach?
Any leads or answers to the above questions would be very helpful and much appreciated. Please let me know if my questions don't make sense or are unclear.
I am not looking for a project-specific answer, just a generic example of how real-world production ML systems work.
Thank you in advance!