I've deployed a linear model for classification on Google Machine Learning Engine and want to predict new data using online prediction.
When I called the API through the Google API client library, it took around 0.5 s to get a response for a request containing a single instance. I expected the latency to be under 10 microseconds (because the model is quite simple), so 0.5 s was far too long. I also tried scoring the new data offline with the model's predict_proba method: it took 8.2 s for more than 100,000 instances, i.e. roughly 82 µs per instance, which is much faster than Google ML Engine. Is there a way to reduce the latency of online prediction? The model and the server that sends the requests are hosted in the same region.
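For context, here is how I estimated the per-instance cost of the model itself. This is a minimal pure-Python sketch with a hypothetical 20-feature logistic model (not my actual model), just to show that raw scoring of a linear classifier lands in the microsecond range, so the 0.5 s must be overhead (network, serialization, serving infrastructure) rather than model compute:

```python
import math
import random
import time

# Hypothetical linear classifier: 20 features, logistic (sigmoid) output.
NUM_FEATURES = 20
weights = [random.uniform(-1, 1) for _ in range(NUM_FEATURES)]
bias = 0.1

def predict_proba_one(x):
    """Probability of the positive class for a single instance."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Score 100,000 random instances and report per-instance latency.
instances = [[random.random() for _ in range(NUM_FEATURES)]
             for _ in range(100_000)]
start = time.perf_counter()
probs = [predict_proba_one(x) for x in instances]
elapsed = time.perf_counter() - start
print(f"total: {elapsed:.2f} s, "
      f"per instance: {elapsed / len(instances) * 1e6:.1f} µs")
```

Even this unvectorized loop stays in the low tens of microseconds per instance; a vectorized predict_proba is faster still.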
I want to make predictions in real time (the response is returned immediately after the API receives the request). Is Google ML Engine suitable for this purpose?