In our application, we currently expose a UI where a user can select some basic settings (model type, input features, hyperparameters) to specify a forecast model. Every time a user specifies such a model, the backend Python application reads these settings, pulls training data from the relevant database, trains the corresponding model, and stores the model file, which is then used at prediction time. The model is then retrained at a fixed frequency. We are looking to replace this entire flow with GCP, but I am not sure of the right approach. My initial thought is to write the entire backend application as a single Vertex AI pipeline: any time the user specifies a model, the pipeline runs, creates and deploys a model (custom or AutoML), and that model is then used at prediction time. What I am not sure about is whether I can do the following:

  1. Since there is one pipeline that runs every time a user specifies a model, the pipeline will need to be parametrized. Say a user specifies a model for metric A: the pipeline creates and deploys model1; then for metric B, it deploys model2; and so on. (A rough sketch of what I mean follows this list.)
  2. Can we actually pull data from sources other than BigQuery and Cloud Storage inside the pipeline?
  3. How could I schedule the pipeline runs for each model separately? Say model A needs to be retrained biweekly, model B weekly, and so on. Since there is just one pipeline but many deployed models, I am not even sure how to set up the scheduling. (A sketch of the per-model scheduling I have in mind is at the end of the post.)

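To make (1) and (2) concrete, here is a rough sketch of the kind of parameterized pipeline I have in mind, using the KFP v2 SDK. The component names, the `training_data` table, the `target` column, and the SQLAlchemy-based data pull are all illustrative assumptions, not a working implementation; deployment steps (model upload, endpoint deploy) would follow the training step.

```python
from kfp import dsl, compiler


@dsl.component(packages_to_install=["pandas", "sqlalchemy", "psycopg2-binary"])
def pull_training_data(source_uri: str, metric: str, dataset: dsl.Output[dsl.Dataset]):
    # Custom components run arbitrary Python in a container, so in principle they can
    # read from any reachable source (an external DB, an API, etc.), not just
    # BigQuery or Cloud Storage. Table/column names here are made up.
    import pandas as pd
    from sqlalchemy import create_engine, text

    engine = create_engine(source_uri)
    with engine.connect() as conn:
        df = pd.read_sql(
            text("SELECT * FROM training_data WHERE metric = :metric"),
            conn,
            params={"metric": metric},
        )
    df.to_csv(dataset.path, index=False)


@dsl.component(packages_to_install=["pandas", "scikit-learn"])
def train_model(dataset: dsl.Input[dsl.Dataset], model_type: str,
                hyperparameters: dict, model: dsl.Output[dsl.Model]):
    # Stand-in training step: a real version would branch on model_type or
    # launch a Vertex AI custom / AutoML training job instead.
    import pickle
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    df = pd.read_csv(dataset.path)
    reg = LinearRegression(**hyperparameters)
    reg.fit(df.drop(columns=["target"]), df["target"])  # assumes a "target" column
    with open(model.path, "wb") as f:
        pickle.dump(reg, f)


@dsl.pipeline(name="forecast-model-pipeline")
def forecast_pipeline(metric: str, model_type: str,
                      hyperparameters: dict, source_uri: str):
    # All user-chosen settings arrive as pipeline parameters, so one definition
    # serves every model the user specifies.
    data = pull_training_data(source_uri=source_uri, metric=metric)
    train_model(dataset=data.outputs["dataset"], model_type=model_type,
                hyperparameters=hyperparameters)


compiler.Compiler().compile(pipeline_func=forecast_pipeline,
                            package_path="forecast_pipeline.json")
```
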
I am relatively new to GCP Vertex AI and still exploring, so I am not sure I am on the right path. Does a single pipeline even make sense for this use case, or should I be considering a custom Python application that creates a new pipeline every time a model is requested?
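
For (3), this is roughly the per-model scheduling I was picturing, assuming the Vertex AI Python SDK's scheduler API (`PipelineJob.create_schedule`); the project, bucket, model specs, and cron expressions below are made up:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# One entry per user-specified model: same compiled template, different
# parameter values and a different cron per model (all values illustrative).
model_specs = {
    "metric-a": {"cron": "0 2 1,15 * *",  # roughly biweekly: 1st and 15th, 02:00
                 "params": {"metric": "A", "model_type": "linear",
                            "hyperparameters": {},
                            "source_uri": "postgresql://host/db"}},
    "metric-b": {"cron": "0 2 * * 1",     # weekly: Mondays, 02:00
                 "params": {"metric": "B", "model_type": "linear",
                            "hyperparameters": {"fit_intercept": False},
                            "source_uri": "postgresql://host/db"}},
}

for name, spec in model_specs.items():
    job = aiplatform.PipelineJob(
        display_name=f"forecast-{name}",
        template_path="forecast_pipeline.json",
        pipeline_root="gs://my-bucket/pipeline-root",
        parameter_values=spec["params"],
    )
    # Each model gets its own schedule over the same pipeline definition.
    job.create_schedule(
        display_name=f"forecast-{name}-schedule",
        cron=spec["cron"],
        max_concurrent_run_count=1,
    )
```

Is that a reasonable way to handle per-model schedules with a single pipeline, or is it fighting the intended usage?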