
Custom model deployment on Google A.I. Platform Serving.

August 05, 2019 By Yash Vijay

My experience deploying custom Machine Learning models, pipelines and prediction routines at Wootric

Recently, Google Cloud AI Platform Serving (CAIP) added a new feature that lets Machine Learning (ML) practitioners deploy models with customized pre-processing pipelines and prediction routines using their favorite frameworks, all under one serverless microservice. In this blog post, I explain in detail how we at Wootric make use of this feature, and point out a few nitty-gritty details to be careful about, as the product is still in beta.

A little bit of background


Before we completely shift our focus to model deployment, I’m going to take just a bit to give you some context about the problem our model is trying to solve. At Wootric, we use NLP and Deep Learning to learn and tag themes in customer and employee feedback across different industries. One piece of feedback can be associated with one or multiple tags, as shown in the visual above. To learn more about our product, visit this link. This can be treated as a multi-label classification problem with one model, or as n binary classification problems with n models. For the purposes of this blog, we’re going to take the latter approach and deploy n models for n tags.

Model Overview

When I began working on this problem, I experimented with traditional ML classifiers trained on different bag-of-words representations of the data for each tag, to act as baselines. Amongst these, the best ones overall turned out to be the random forest models (surprise surprise). In this post, we’re going to deploy these models for ease of explanation, so that we can spend more time on custom deployment on CAIP and not get lost in model complexity. In later posts, we’ll discuss deployment at a higher level for a CNN-based model and a sequence-based multi-label model, both written in PyTorch.

A.I. Platform Serving Overview

[Diagram — the blue boxes highlight where CAIP comes into play]

We’re going to quickly talk about the benefits of CAIP in the context of the 4 boxes in the lower half of the above diagram. To learn more about how you can use CAIP for training and evaluating models, visit this link.

On CAIP, we can either deploy models trained using pipelines from frameworks that are directly compatible with it — sklearn, XGBoost, TensorFlow — or deploy a custom predictor class with the required dependencies and any custom state-dependent pre-processing and post-processing classes as a source distribution package. A big positive here is the coupling of raw-data pre-processing, prediction, and prediction post-processing under one model microservice; everything happens server side! This has a few notable benefits. Firstly, since no preprocessing happens on the client side, the client is decoupled from the model: we can update the preprocessing code without updating clients or providing new endpoints for the update. Secondly, there’s no training-serving skew, as any change in the pre-processing step forces us to retrain the model.

In addition, custom prediction routines on CAIP allow us to integrate rule-based predictions with model predictions, and to aggregate predictions from an ensemble of models stored in a bucket.

Since this blog focuses entirely on custom deployment, if you’re interested in deploying pre-defined model pipelines from libraries supported by CAIP, refer to this tutorial.

Getting into it…

Before we move forward, you should:

  • Get familiar with Google Cloud SDK,
  • Create a project on GCP, enable billing and enable the AI Platform (“Cloud Machine Learning Engine”) and Compute Engine APIs.

Alternatively, you can replicate these steps on the Google Cloud Platform (GCP) console, or using the REST API.

Let’s get into the code

So, I have 11 random forest models trained for 11 different tags. These tags are common themes in feedback written by consumers about SaaS companies. Here are the tags:

["Alerts_Notification", "Bugs", "Customer_Support", "Documentation", "Feature_Request", "Onboarding", "Performance", "Price", "Reporting", "UX_UI", "Value_Prop"]

Given that we’re dealing with noisy text data, it’s definitely useful to have custom pre-processing helpers that can clean the data, as shown in this gist:

In brief, the cleaning steps involve lowercasing the data, lemmatizing using WordNet POS tags, and removing punctuation. An important step to note is right at the top of the gist, where I download the required NLTK resources into the /tmp folder and append that folder to the list of paths where NLTK looks for resources; /tmp is the only folder writable for such resources at runtime. An alternative is to save these resources within the source distribution package tarball that contains all model artifacts as serialized objects, and load them directly in a custom predictor class (which we will discuss later).
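A minimal sketch of such cleaning helpers, along the lines described above (function names are illustrative, not the exact gist):

```python
import string

import nltk
from nltk.stem import WordNetLemmatizer

NLTK_DATA_DIR = "/tmp"  # the only writable location for NLTK data at serving time


def ensure_nltk_resources():
    """Download the required NLTK resources into /tmp and register that path."""
    nltk.data.path.append(NLTK_DATA_DIR)
    for resource in ("punkt", "wordnet", "averaged_perceptron_tagger"):
        nltk.download(resource, download_dir=NLTK_DATA_DIR, quiet=True)


def penn_to_wordnet(tag):
    """Map a Penn Treebank POS tag to the WordNet POS character."""
    return {"J": "a", "V": "v", "R": "r"}.get(tag[:1], "n")


def strip_punctuation(text):
    """Remove all punctuation characters from a string."""
    return text.translate(str.maketrans("", "", string.punctuation))


def clean_text(text):
    """Lowercase, strip punctuation, and lemmatize using WordNet POS tags."""
    ensure_nltk_resources()
    lemmatizer = WordNetLemmatizer()
    tokens = nltk.word_tokenize(strip_punctuation(text.lower()))
    return " ".join(
        lemmatizer.lemmatize(tok, penn_to_wordnet(tag))
        for tok, tag in nltk.pos_tag(tokens)
    )
```

Downloading lazily inside `clean_text` keeps the module importable even before the resources are staged.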

Here is the code I used for training each model:

An important point to note in the above gist is the use of the FunctionTransformer class from sklearn, which lets us add custom pre-processing routines to the pipeline as long as the input/output of the custom routine agrees with that of its neighboring steps. Essentially, the code loops over each tag in the data frame, converting the tag column to labels using the LabelEncoder. A model is then trained for each tag using the feedback text (r_text) and the transformed labels, and finally exported as a pickled object. It can be argued that it would be optimal here to have a common pre-processing sklearn pipeline object containing clean_data_custom and text_transform, since every model is trained on the same data with identical configuration. However, for simplicity we allow every model to have its own pre-processing pipeline. If you do end up with a common pipeline object, it can be incorporated in the custom predictor class, which we will discuss next.

Let’s do a quick recap. As of now, I have 11 pickled model pipelines (one for each tag), and a preprocess module with helper functions for cleaning. 

The main question now is: how does CAIP direct the input feedback for prediction into these 11 models? In addition, what do I do with the preprocess module so that CAIP can link it to the model pipeline objects, which need its clean function?

To solve the first issue, the custom prediction routine comes into play. Refer to this CustomPredictor class I’ve written:

Before we move on, know that every time we create a new model version for a custom prediction routine, we pass information to CAIP that it uses to look for model artifacts in a storage bucket. We’ll see how to do this in code a little later, but for now, understand that this information is passed to gcloud using flags. So, when we create a new model version, we set the prediction-class gcloud flag to point to the CustomPredictor class in the multiTag module. GCP then looks for this class in the source distribution package and calls its from_path class method with the model directory as an argument, which it picks up from another flag, origin, also passed while creating a model version. Making sure that GCP gets the right directory path here is crucial! The method stores all the loaded models in a list and instantiates a CustomPredictor object with this list.

Now, it’s imperative to follow this template when creating a custom predictor class (courtesy: Google):

Basically, always have the from_path method return an instance of your custom predictor class with the relevant arguments, which in this case is just a list of models. This is also where we could have loaded a common pre-processing pipeline object and passed it on to the custom predictor, as discussed earlier. All the magic happens within the predict function, which takes as input a list of JSON-deserialized input strings. As shown in the code, I simply loop over each instance and over every model, appending the prediction results as a Python dictionary. The returned results must be JSON serializable.

Now, getting back to the second issue (do you remember what it was?): how do I tell CAIP to link the preprocess module to the model pipeline objects, so that they can use the clean function? Also, how do I upload the multiTag.py module?

Both these questions have the same answer: as a source distribution package (SDP)! If you remember, I mentioned a couple of times that all artifacts must be contained in the SDP, and both these files count as model artifacts.

Here is the module that creates the SDP:

The scripts argument should be an iterable of the local paths of each relevant script. To create the SDP, run the following command:
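A minimal setup.py along these lines might look like the following (the package name matches the tarball copied to the bucket later; the script list is illustrative):

```python
from setuptools import setup

setup(
    name="multiTag_custom_package",
    version="0.1",
    # Local paths to each script that must ship with the package.
    scripts=["multiTag.py", "preprocess.py"],
)
```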

python setup.py sdist --formats=gztar

This command places the SDP tarball in the dist directory.

Now that we have all the scripts and models ready, all thats left is to run a few gcloud commands to copy the models and SDP to a storage bucket, create a model resource and link it to this bucket, and finally create a model version and make online predictions!

First, create a bucket with a valid bucket name and region:

gsutil mb -l $REGION gs://$BUCKET_NAME

For copying the models and SDP, I use the gsutil cp command:

gsutil -m cp *.pkl gs://$BUCKET_NAME
gsutil cp dist/multiTag_custom_package-0.1.tar.gz gs://$BUCKET_NAME

It’s easiest to have the bucket in the same project that’s registered to AI Platform; otherwise, you’ll need to grant the AI Platform service accounts explicit access to the bucket.

Now that we have all our artifacts in a storage bucket, let’s create a model:

gcloud ai-platform models create "multiTag_model"

Let’s also set the following variables:
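For instance (the bucket name and version are placeholders to substitute with your own):

```shell
MODEL_NAME="multiTag_model"
VERSION_NAME="v1"
BUCKET_NAME="your-bucket-name"
# All artifacts sit at the bucket root here, so the bucket URI is the model dir.
MODEL_DIR="gs://${BUCKET_NAME}/"
CUSTOM_CODE_PATH="gs://${BUCKET_NAME}/multiTag_custom_package-0.1.tar.gz"
```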


I placed all my models and artifacts directly into the bucket without creating any extra directories, so my MODEL_DIR was just the bucket URI (Uniform Resource Identifier). In case you have a different directory structure for your model and associated artifacts, make sure you give the right absolute paths for MODEL_DIR and CUSTOM_CODE_PATH.

Now, create the version as shown in the code below. This is what I meant before, when I mentioned passing all location metadata regarding model artifacts to CAIP while creating a model version.

gcloud beta ai-platform versions create $VERSION_NAME \
 --model $MODEL_NAME \
 --origin $MODEL_DIR \
 --runtime-version=1.13 \
 --python-version=3.5 \
 --package-uris=$CUSTOM_CODE_PATH \
 --prediction-class=multiTag.CustomPredictor

Once the version is created, you can now send inputs for online predictions. Create a .txt or a JSON file (recommended) with inputs in this format.
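With --json-instances, the input file is newline-delimited JSON: one instance per line, which in our case is just a JSON string of raw feedback. For example (the feedback text is made up):

```shell
# Each line is one JSON-encoded feedback string.
cat > inputs.json <<'EOF'
"The new dashboard looks great but loading is slow"
"Support took three days to get back to me"
EOF
INPUT_FILE="inputs.json"
```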

You can now get online predictions by running the code below:

gcloud ai-platform predict --model $MODEL_NAME --version $VERSION_NAME --json-instances $INPUT_FILE

The command returns each model’s prediction for every input instance.



Overall, I think the deployment pipeline is pretty smooth and easy to understand. However, one challenge I faced when I’d just started playing around with CAIP was that for internal errors during deployment or post-deployment (during predictions), the tracebacks aren’t super helpful for debugging. I had to scrutinize each bit of code to understand the issue and then deploy again, which can get on your nerves, as the deployment process itself takes a few minutes, especially for big models. So debugging little errors here and there can actually end up taking a lot of your time!

Wait, what? Test my model with CAIP local predict, you say? For those of you who don’t know: before deployment, you can use the CAIP local predict command to test how your model serves predictions. The command uses dependencies in your local environment to perform predictions and returns results in the same format that gcloud ai-platform predict uses for online predictions. Testing predictions locally can help you discover errors before you incur costs for online prediction requests. But, but... BUT. This feature does NOT work with custom prediction routines, so be super careful with your code before deployment!

And finally, one last point to note: the size limit for model artifacts on CAIP is 250 MB. You can request a higher quota to deploy larger models.

So there you go, you made it! I really hope you enjoyed reading this article and found it helpful. If you have any questions, please leave a comment!

Happy Deploying!
