Deploy a TensorFlow Model to Production

Build and train the model

We are going to create a simple TensorFlow Sequential model to recognize handwritten digits from 0 to 9.

First, we load the data:

import tensorflow as tf
print("TensorFlow version:", tf.__version__)

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale the pixel values from the [0, 255] range to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

Then we define the model:

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

The output is a Dense layer with 10 units and a softmax activation, so the model returns a vector of probabilities, one for each class.

predictions = model(x_train[:1]).numpy()

array([[0.10152689, 0.17136204, 0.12844075, 0.06615458, 0.11072428,
        0.11157966, 0.04291946, 0.10779996, 0.08779941, 0.07169289]],
      dtype=float32)
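An untrained model spreads probability roughly evenly across the 10 classes. To read such a vector, take the index of the largest probability. A minimal sketch using plain NumPy on the values above:

```python
import numpy as np

# Probabilities copied from the sample output above (your values will
# differ, since the model has not been trained yet).
probs = np.array([[0.10152689, 0.17136204, 0.12844075, 0.06615458, 0.11072428,
                   0.11157966, 0.04291946, 0.10779996, 0.08779941, 0.07169289]])

# The predicted class is the index of the highest probability.
predicted_digit = int(np.argmax(probs, axis=1)[0])
print(predicted_digit)  # -> 1

# A softmax output sums to 1 across the classes.
print(abs(probs.sum() - 1.0) < 1e-6)  # -> True
```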

We can compile the model and train it:

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)

After training, we evaluate our model on the test set to see how well it performs on data it has never seen before:

model.evaluate(x_test, y_test, verbose=2)
# [0.0701991468667984, 0.9783000349998474]
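evaluate returns the metrics in compilation order: the loss first, then the accuracy. Unpacking the sample output above:

```python
# Sample return value of model.evaluate(...) copied from above.
loss, accuracy = [0.0701991468667984, 0.9783000349998474]
print(f"accuracy: {accuracy:.1%}")  # -> accuracy: 97.8%
```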

It achieves 97.8% accuracy. Not bad!

Save the model

We save the model in TensorFlow's SavedModel format, under a versioned directory:

import os

model_version = "0001"
model_name = "my_mnist_model"
model_path = os.path.join(model_name, model_version)
tf.saved_model.save(model, model_path)

Deploy on Google Cloud Platform

Now, we are going to deploy our model on Google Cloud Platform to make it available for live predictions.

You need to have the gcloud CLI installed on your computer to run the following commands.

Define the variables

Open a terminal and declare some variables for convenience:

export PROJECT_ID="project-123"
export BUCKET_NAME="my-bucket"
export REGION="europe-west4"

Create a Storage bucket

gsutil mb -p ${PROJECT_ID} -l ${REGION} -b on "gs://${BUCKET_NAME}"

Upload the model folder to your bucket

gsutil cp -r ./my_mnist_model "gs://${BUCKET_NAME}/"

Create the Vertex AI model

Upload the SavedModel as a Vertex AI model. We use one of the prebuilt TensorFlow prediction containers (pick the image matching your TensorFlow version) and point --artifact-uri at the model folder in the bucket:

gcloud ai models upload \
  --region=${REGION} \
  --display-name="mnist" \
  --container-image-uri="europe-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest" \
  --artifact-uri="gs://${BUCKET_NAME}/my_mnist_model/"

export MODEL_ID=$(gcloud ai models list --region=${REGION} --filter=display_name="mnist" --format="value(name)")

Create the endpoint

gcloud ai endpoints create \
  --region=${REGION} \
  --display-name="mnist-endpoint"

export ENDPOINT_ID=$(gcloud ai endpoints list --region=${REGION} --filter=display_name="mnist-endpoint" --format="value(name)")

Deploy the model

gcloud ai endpoints deploy-model ${ENDPOINT_ID} \
  --region=${REGION} \
  --model=${MODEL_ID} \
  --display-name="mnist" \
  --machine-type="n1-standard-2" \
  --traffic-split=0=100

Use the model endpoint

Once our model has been deployed on GCP, we can use the endpoint to get online predictions.

First let's generate some data from the test set and save it in a JSON file:

import json

pred_file = {
  'instances': x_test[0:5].tolist(),
}

# Write to a file
with open('pred_file.json', 'w') as f:
  json.dump(pred_file, f)
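The predict endpoint expects a JSON body with an 'instances' key: a list of inputs, one per prediction, each a 28x28 array of pixel values. A self-contained sketch of the payload shape, using a dummy all-zero image in place of x_test:

```python
import json

# A dummy 28x28 "image" standing in for a real MNIST sample.
dummy_image = [[0.0] * 28 for _ in range(28)]

payload = {'instances': [dummy_image]}  # one entry per prediction

# Round-trip through JSON, the way the request body is sent.
body = json.dumps(payload)
decoded = json.loads(body)
print(len(decoded['instances']))     # -> 1   (number of instances)
print(len(decoded['instances'][0]))  # -> 28  (rows per image)
```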

We can now send a simple curl request to get the predictions:

curl \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/endpoints/${ENDPOINT_ID}:predict \
  -d "@pred_file.json"
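The response mirrors the request: a 'predictions' key holding one probability vector per instance. A minimal sketch of extracting the predicted digits, using a made-up response body (the real values come from the endpoint):

```python
import json

# Hypothetical response for a single instance, for illustration only.
response_body = '{"predictions": [[0.0, 0.01, 0.0, 0.02, 0.0, 0.0, 0.0, 0.95, 0.01, 0.01]]}'

response = json.loads(response_body)
# Take the index of the highest probability for each instance.
digits = [max(range(10), key=lambda i: probs[i]) for probs in response['predictions']]
print(digits)  # -> [7]
```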

Cleanup GCP resources (optional)

To avoid incurring charges, you can delete the resources.

Undeploy the model

export DEPLOYED_MODEL_ID=$(gcloud ai endpoints describe ${ENDPOINT_ID} --project=${PROJECT_ID} --region=${REGION} --format="value(deployedModels.id)")

gcloud ai endpoints undeploy-model ${ENDPOINT_ID} --deployed-model-id=${DEPLOYED_MODEL_ID} --project=${PROJECT_ID} --region=${REGION}

Delete the endpoint

gcloud ai endpoints delete ${ENDPOINT_ID} --project=${PROJECT_ID} --region=${REGION}

Delete the model

gcloud ai models delete ${MODEL_ID} --project=${PROJECT_ID} --region=${REGION}