You can deploy the trained model to production yourself by wrapping it in an API and running that on at least a couple of instances behind a load balancer. The other route is to use a service that handles this for you. My service mlrequest makes it very simple to create and deploy high-availability, low-latency models.
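If you go the do-it-yourself route, the API itself can be quite small. Here is a minimal sketch using Flask, with a placeholder predict() standing in for your trained model's inference call (the endpoint name and response shape are illustrative assumptions); a real deployment would load the model once at startup and run several copies of this app behind a load balancer.

```python
# Minimal sketch of serving a model behind your own API.
# predict() is a hypothetical stand-in for your real model.
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict(features):
    # Placeholder logic; replace with your trained model's inference.
    if features.get('favorite-genre') == 'action':
        return 'die-hard'
    return 'amelie'

@app.route('/predict', methods=['POST'])
def predict_endpoint():
    # Accept a JSON feature dictionary and return a recommendation.
    features = request.get_json(force=True)
    return jsonify({'recommendation': predict(features)})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)
```

Each instance is stateless, so scaling out is just a matter of starting more copies and pointing the load balancer at them.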
mlrequest provides a reinforcement learning model that is well suited to making recommendations. Below is an example of a movie recommender.
from mlrequest import RL
rl = RL('your-api-key')
features = {'favorite-genre': 'action', 'last-movie-watched': 'die-hard'}
# Ask the model to choose among 3 candidate actions for this session's context
r = rl.predict(features=features, model_name='movie-recommender', session_id='the-user-session-id', negative_reward=0, action_count=3)
# The first element of predict_result is the top recommendation
best_movie = r.predict_result[0]
When the model does a good job (e.g., the user clicked on or started watching the predicted movie), you should reward it with a positive value (in this case, a 1).
r = rl.reward(model_name='movie-recommender', session_id='the-user-session-id', reward=1)
Each time the model is rewarded for a correct action, it learns to take that action in that context more often. The model therefore learns in real time from its own environment and adjusts its behavior to give better recommendations, with no manual intervention needed.
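To make the reward loop concrete, here is a toy illustration of the idea (not mlrequest's actual implementation): a simple greedy bandit that tries each action once per context, then keeps recommending whichever action earned the best average reward. All names here are hypothetical.

```python
# Toy reward loop: explore each action once, then exploit the best
# average reward observed for this context.
from collections import defaultdict

class ToyRecommender:
    def __init__(self, actions):
        self.actions = actions
        # stats[context][action] = [total_reward, times_recommended]
        self.stats = defaultdict(lambda: {a: [0.0, 0] for a in actions})

    def predict(self, context):
        stats = self.stats[context]
        for a in self.actions:          # explore untried actions first
            if stats[a][1] == 0:
                return a
        # then exploit the action with the best average reward so far
        return max(self.actions, key=lambda a: stats[a][0] / stats[a][1])

    def reward(self, context, action, r):
        self.stats[context][action][0] += r
        self.stats[context][action][1] += 1

rec = ToyRecommender(['die-hard', 'amelie', 'alien'])
# Simulate a user who only engages with 'alien' recommendations.
for _ in range(5):
    movie = rec.predict('action-fan')
    rec.reward('action-fan', movie, 1 if movie == 'alien' else 0)
print(rec.predict('action-fan'))  # → alien
```

Positive rewards raise an action's average for that context, so the recommender's future choices drift toward whatever the user actually responds to, which is the same feedback dynamic the hosted model uses at scale.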