For folks who are Googling around with this problem - here's another option. The open source modelstore library is a wrapper that handles saving models and uploading them to, or downloading them from, Google Cloud Storage.
Under the hood, it saves scikit-learn models using joblib, creates a tar archive with the files, and uploads/downloads that archive to/from a Google Cloud Storage bucket using blob.upload_from_file() and blob.download_to_filename().
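If you'd rather see roughly what those steps amount to, here is a minimal sketch. It is not modelstore's actual code: the function names, the "model.joblib" and "artifacts.tar.gz" file names, and the pickle fallback (used only if joblib isn't installed) are all my own assumptions. The `bucket` argument is expected to be a google.cloud.storage Bucket, or anything with a compatible blob() method.

```python
import os
import pickle
import tarfile
import tempfile

try:
    import joblib  # preferred serializer for scikit-learn models
except ImportError:  # assumption: fall back to pickle so the sketch still runs
    joblib = None

MODEL_FILE = "model.joblib"        # hypothetical file name inside the archive
ARCHIVE_FILE = "artifacts.tar.gz"  # hypothetical archive name


def _dump(model, path):
    if joblib is not None:
        joblib.dump(model, path)
    else:
        with open(path, "wb") as f:
            pickle.dump(model, f)


def _load(path):
    if joblib is not None:
        return joblib.load(path)
    with open(path, "rb") as f:
        return pickle.load(f)


def upload_model(bucket, model, blob_name):
    """Serialize the model, tar it, and upload the archive to the bucket."""
    with tempfile.TemporaryDirectory() as tmp:
        model_path = os.path.join(tmp, MODEL_FILE)
        _dump(model, model_path)
        archive_path = os.path.join(tmp, ARCHIVE_FILE)
        with tarfile.open(archive_path, "w:gz") as tar:
            tar.add(model_path, arcname=MODEL_FILE)
        with open(archive_path, "rb") as f:
            bucket.blob(blob_name).upload_from_file(f)


def download_model(bucket, blob_name):
    """Download the archive, unpack the model file, and load the model."""
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = os.path.join(tmp, ARCHIVE_FILE)
        bucket.blob(blob_name).download_to_filename(archive_path)
        model_path = os.path.join(tmp, MODEL_FILE)
        with tarfile.open(archive_path, "r:gz") as tar:
            member = tar.extractfile(MODEL_FILE)
            with open(model_path, "wb") as out:
                out.write(member.read())
        return _load(model_path)
```

With the google-cloud-storage package installed, you would get the bucket from something like storage.Client(project=...).bucket(...). The library saves you from maintaining this plumbing (plus the metadata tracking) yourself.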
In practice it looks a bit like this (a full example is here):
# Create modelstore instance
import os
from modelstore import ModelStore
modelstore = ModelStore.from_gcloud(
    os.environ["GCP_PROJECT_ID"],  # Your GCP project ID
    os.environ["GCP_BUCKET_NAME"],  # Your Cloud Storage bucket name
)
# Train and upload a model (this currently works with 9 different ML frameworks)
model = train() # Replace with your code to train a model
meta_data = modelstore.sklearn.upload("my-model-domain", model=model)
# ... and later when you want to download it
model_path = modelstore.download(
    local_path="/path/to/a/directory",
    domain="my-model-domain",
    model_id=meta_data["model"]["model_id"],
)
The full documentation is here.