An end-to-end example of MLOps on Google Cloud using TensorFlow, TFX, and Vertex AI
This example implements an end-to-end MLOps process using the Vertex AI platform and Smart Analytics technology capabilities. The example uses Keras to implement the ML model, TFX to implement the training pipeline, and the Model Builder SDK to interact with Vertex AI.
Set up your MLOps environment on Google Cloud.
Start your AI Notebook instance.
Open JupyterLab, then open a new terminal.
Clone the repository to your AI Notebook instance:
git clone https://github.com/GoogleCloudPlatform/mlops-with-vertex-ai.git
cd mlops-with-vertex-ai
Install the required Python packages:
pip install tfx==1.2.0 --user
pip install -r requirements.txt
NOTE: You can ignore the pip dependency issues. These will be fixed when upgrading to a subsequent TFX version.
Upgrade the gcloud components:
sudo apt-get install google-cloud-sdk
gcloud components update
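After installation, it can help to confirm which TFX version the notebook kernel actually sees. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of the repository.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    expected = "1.2.0"  # the TFX version pinned in the install step above
    found = installed_version("tfx")
    if found is None:
        print("tfx is not installed in this environment")
    elif found != expected:
        print(f"tfx {found} is installed, but {expected} was expected")
    else:
        print(f"tfx {found} is installed")
```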
The Chicago Taxi Trips dataset is one of the public datasets hosted in BigQuery. It includes taxi trips from 2013 to the present, reported to the City of Chicago in its role as a regulatory agency. The task is to predict whether a given trip will result in a tip of more than 20%.
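The prediction target can be expressed as a binary label. The sketch below assumes that the tip percentage is measured against the trip total (the dataset's `tips` and `trip_total` columns) and that trips without a positive total are labelled negative. Both choices are illustrative assumptions, as is the function name `tip_label`; the notebooks may derive the label differently.

```python
def tip_label(tips: float, trip_total: float, threshold: float = 0.2) -> int:
    """Return 1 if the tip exceeds `threshold` of the trip total, else 0."""
    if trip_total <= 0:
        # Assumption: a trip without a positive total has no meaningful tip ratio.
        return 0
    return 1 if tips / trip_total > threshold else 0
```

For example, `tip_label(3.0, 10.0)` yields the positive class, while a tip of exactly 20% does not, because the comparison is strict.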
The 01-dataset-management notebook covers:
BigQuery.
Vertex AI Dataset resource using the Python SDK.
We experiment with creating a Custom Model using the 02-experimentation notebook, which covers:
Dataflow.
Keras classification model.
Keras model with Vertex AI using a pre-built container.
Cloud Storage to Vertex AI.
Vertex AI for hyperparameter tuning.
We use Vertex TensorBoard and Vertex ML Metadata to track, visualize, and compare ML experiments.
In addition, the training steps are formalized by implementing a TFX pipeline. The 03-training-formalization notebook covers implementing and testing the pipeline components interactively.
The 04-pipeline-deployment notebook covers executing the CI/CD steps for the training pipeline deployment using Cloud Build. The CI/CD routine is defined in the pipeline-deployment.yaml file, and consists of the following steps:
TFX pipeline.
Cloud Storage.
After testing, compiling, and uploading the pipeline definition to Cloud Storage, the pipeline is executed with respect to a trigger.
We use Cloud Functions and Cloud Pub/Sub as a triggering mechanism.
The Cloud Function listens to the Pub/Sub topic and runs the training pipeline when a message is sent to that topic.
The Cloud Function is implemented in src/pipeline_triggering.
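To illustrate the triggering flow: a Pub/Sub-triggered background Cloud Function receives the message payload base64-encoded under the event's `data` key. The sketch below decodes such a payload into pipeline parameters. The function names, the JSON message schema, and the `submit` callback are hypothetical; the actual implementation in src/pipeline_triggering may differ.

```python
import base64
import json
from typing import Any, Callable, Dict


def decode_parameters(event: Dict[str, Any]) -> Dict[str, Any]:
    """Decode the base64-encoded JSON payload of a Pub/Sub event."""
    raw = event.get("data")
    if not raw:
        # Assumption: an empty message means "run with default parameters".
        return {}
    return json.loads(base64.b64decode(raw).decode("utf-8"))


def trigger_pipeline(
    event: Dict[str, Any],
    context: Any,
    submit: Callable[[Dict[str, Any]], Any] = print,
) -> Dict[str, Any]:
    """Entry point sketch: decode the message, then pass the parameters to `submit`.

    `submit` is a placeholder for whatever call starts the pipeline run.
    """
    parameters = decode_parameters(event)
    submit(parameters)
    return parameters
```

Keeping the decoding separate from the submission makes the message handling testable without any cloud credentials.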
The 05-continuous-training notebook covers:
Pub/Sub topic.
Cloud Function.
The end-to-end TFX training pipeline implementation is in the src/pipelines directory, which covers the following steps:
hyperparam_gen custom Python component.
BigQuery using BigQueryExampleGen component.
StatisticsGen and ExampleValidator components.
Dataflow Transform component.
Vertex AI using Trainer component.
ModelEvaluator component.
Cloud Storage using Pusher component.
Vertex AI using vertex_model_pusher custom Python component.
The 06-model-deployment notebook covers executing the CI/CD steps for the model deployment using Cloud Build. The CI/CD routine is defined in the build/model-deployment.yaml file, and consists of the following steps:
Vertex AI.
endpoint.
Vertex AI endpoint.
We serve the deployed model for prediction. The 07-prediction-serving notebook covers:
Vertex AI endpoint for online prediction.
Vertex AI uploaded model for batch prediction.
Vertex Pipelines.
After a model is deployed for prediction serving, continuous monitoring is set up to ensure that the model continues to perform as expected. The 08-model-monitoring notebook covers configuring Vertex AI Model Monitoring for skew and drift detection.
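Skew detection compares the feature distribution seen at serving time against the training baseline, while drift detection compares serving distributions over time. As a simplified illustration of the idea (not the service's implementation), the sketch below scores a categorical feature with the L-infinity distance, that is, the largest per-category frequency difference, and flags the feature when the score exceeds a threshold. The function names, the sample values, and the threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def distribution(values: Iterable[str]) -> Dict[str, float]:
    """Normalize category counts into relative frequencies."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def l_infinity_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Largest absolute frequency difference over all categories in p or q."""
    categories = set(p) | set(q)
    return max((abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in categories), default=0.0)


def exceeds_threshold(
    baseline: Iterable[str], serving: Iterable[str], threshold: float
) -> bool:
    """Return True when the serving distribution is farther than `threshold` from the baseline."""
    score = l_infinity_distance(distribution(baseline), distribution(serving))
    return score > threshold
```

With a baseline split evenly between two payment types and serving traffic split 90/10, the score is about 0.4, so a threshold of 0.3 would raise an alert.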
You can view the parameters and metrics logged by your experiments, as well as the artifacts and metadata stored by your Vertex Pipelines in Cloud Console.
This is not an official Google product but sample code provided for educational purposes.
Copyright 2021 Google LLC.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.