Chapter 19 moves from research to production. You’ll learn the complete lifecycle for shipping a TensorFlow model: exporting it to the SavedModel format, serving it via TensorFlow Serving’s REST and gRPC APIs, deploying to Google Vertex AI for serverless predictions, running in the browser with TensorFlow.js, and scaling training across multiple GPUs and machines with TensorFlow’s Distribution Strategy API. The chapter also revisits Keras Tuner for distributed hyperparameter searches and covers the PipeDream / Pathways model parallelism approaches.
What you’ll learn
- Exporting models in the SavedModel format with model.save()
- Inspecting SavedModels with saved_model_cli
- Installing and running TensorFlow Serving (Docker or native)
- Querying TF Serving via the REST API (requests) and gRPC API
- Deploying model versions: TF Serving automatically picks up new versions
- Deploying to Google Vertex AI for managed online prediction
- Running models in the browser with TensorFlow.js
- Distributed training:
  - MirroredStrategy (single machine, multiple GPUs)
  - MultiWorkerMirroredStrategy for multi-machine training
  - CentralStorageStrategy and ParameterServerStrategy
- Distributed hyperparameter search with Keras Tuner
Key concepts
SavedModel format
model.save(path, save_format="tf") exports the model as a SavedModel, a directory containing:
- saved_model.pb: the model’s computation graph and metadata.
- variables/: a checkpoint of all weight values.
- assets/: optional auxiliary files (e.g. vocabulary files for text models).
tf.saved_model.load() restores the model on any platform that runs TensorFlow.
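To check that an export has the layout listed above, the directory can be inspected with the standard library alone. The following is a minimal sketch: the helper names describe_saved_model and looks_like_saved_model and the demo directory are hypothetical, and the check only looks at file names, not at whether TensorFlow can actually load the result.

```python
import tempfile
from pathlib import Path


def describe_saved_model(export_dir):
    """Report which SavedModel entries exist under export_dir (names only)."""
    root = Path(export_dir)
    return {
        "saved_model.pb": (root / "saved_model.pb").is_file(),
        "variables/": (root / "variables").is_dir(),
        "assets/": (root / "assets").is_dir(),  # optional entry
    }


def looks_like_saved_model(export_dir):
    """True if the graph file and the variables directory are both present."""
    found = describe_saved_model(export_dir)
    return found["saved_model.pb"] and found["variables/"]


# Demo on a fake directory so the sketch runs without TensorFlow installed.
with tempfile.TemporaryDirectory() as tmp:
    fake_export = Path(tmp) / "my_model"  # hypothetical export path
    (fake_export / "variables").mkdir(parents=True)
    (fake_export / "saved_model.pb").touch()
    demo_report = describe_saved_model(fake_export)
    demo_ok = looks_like_saved_model(fake_export)

print(demo_report)
print(demo_ok)
```

Pointing describe_saved_model at a real export directory gives a quick sanity check before handing the directory to a model server.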
TensorFlow Serving
TF Serving is a production-grade model server that monitors a versioned directory of SavedModels and serves predictions via REST (port 8501) or gRPC (port 8500). It handles multiple model versions simultaneously and can switch traffic to a new version without restarting. The REST endpoint follows the convention /v1/models/{model_name}:predict and accepts JSON-encoded inputs.
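The versioned directory contains one numbered subdirectory per model version, and by default the server serves the highest number. The sketch below mimics that selection rule in plain Python to make the layout concrete. The function latest_version and the my_model directory are hypothetical names, and this is an illustration of the convention, not TF Serving’s own implementation.

```python
import tempfile
from pathlib import Path


def latest_version(model_dir):
    """Return the highest integer-named subdirectory of model_dir, or None."""
    versions = [
        int(entry.name)
        for entry in Path(model_dir).iterdir()
        if entry.is_dir() and entry.name.isdecimal()
    ]
    return max(versions, default=None)


# Demo: build a fake versioned layout such as my_model/0001, my_model/0002, ...
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "my_model"  # hypothetical model directory
    for version in ("0001", "0002", "0010"):
        (model_dir / version).mkdir(parents=True)
    (model_dir / "notes").mkdir()  # non-numeric entries are ignored
    demo_latest = latest_version(model_dir)

print(demo_latest)
```

Under this convention, adding a new numbered subdirectory is what publishes a new version; no server restart is involved.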
Distribution strategies
TensorFlow’s tf.distribute API abstracts the communication between accelerators, letting you scale a training script by wrapping model creation and training in a strategy scope.
- MirroredStrategy: replicates the model on all available GPUs on one machine; each GPU processes a different shard of the mini-batch and gradients are reduced via all-reduce (NCCL by default).
- MultiWorkerMirroredStrategy: extends MirroredStrategy to multiple machines; each machine runs one worker and gradients are communicated over the network.
- TPUStrategy: equivalent strategy for Google Cloud TPUs.
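Model creation and the model.compile() call must both happen inside the strategy’s scope() context manager, so that the model’s variables are created as mirrored variables on every replica.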
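The idea behind mirrored data parallelism can be shown without any accelerator. The sketch below is a pure Python stand-in, not TensorFlow code: it shards a mini-batch across two pretend replicas, computes a gradient on each shard, and averages the results the way an all-reduce would. The one-weight linear model, the data, and all function names are assumptions chosen for illustration.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def shard(seq, n):
    """Split seq into n equal contiguous shards (len(seq) must be divisible by n)."""
    size = len(seq) // n
    return [seq[i * size:(i + 1) * size] for i in range(n)]


def all_reduce_mean(values):
    """Stand-in for an all-reduce: every replica receives the same mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5  # every replica starts from the same weight
n_replicas = 2

# Each replica computes a gradient on its own shard of the mini-batch.
per_replica = [
    grad_mse(w, shard_x, shard_y)
    for shard_x, shard_y in zip(shard(xs, n_replicas), shard(ys, n_replicas))
]
synced = all_reduce_mean(per_replica)  # identical value on every replica
full_batch = grad_mse(w, xs, ys)       # single-device reference

print(per_replica)
print(synced)
print(full_batch)
```

With equal-sized shards the averaged gradient matches the full-batch gradient, which is why every replica can apply the same update and the model copies stay identical.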
Code examples
Exporting a model as SavedModel
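A minimal sketch of the export step, assuming TensorFlow 2.x is installed. The tiny model and the my_model/0001 path are hypothetical placeholders. Newer Keras releases no longer accept save_format="tf", so the sketch uses model.export() when that method is available and otherwise falls back to the model.save() call described above; treat the branch as a compatibility assumption rather than the notebook’s exact code.

```python
import os
import tempfile

import tensorflow as tf

# Tiny stand-in model; the architecture is arbitrary and only for illustration.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])

# Export into a numbered subdirectory so a model server can treat it as version 1.
export_dir = os.path.join(tempfile.mkdtemp(), "my_model", "0001")  # hypothetical path

if hasattr(model, "export"):
    # Newer Keras releases: export() writes an inference-only SavedModel.
    model.export(export_dir)
else:
    # Older TF 2.x releases: the call described in the text above.
    model.save(export_dir, save_format="tf")

exported_files = sorted(os.listdir(export_dir))
print(exported_files)

# Restore the SavedModel and list its serving signatures.
reloaded = tf.saved_model.load(export_dir)
signature_names = list(reloaded.signatures.keys())
print(signature_names)
```

The listed files should include saved_model.pb and a variables directory, matching the layout described under Key concepts.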
Starting TF Serving with Docker
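One way to start the server is to run the official tensorflow/serving image with the model directory mounted under /models and the MODEL_NAME environment variable set. The sketch below only builds and prints that docker run command from Python. The helper name tf_serving_docker_cmd and the my_model directory are hypothetical, and actually launching the container requires Docker to be installed.

```python
import shlex
from pathlib import Path


def tf_serving_docker_cmd(model_dir, model_name, rest_port=8501, grpc_port=8500):
    """Build a `docker run` argument list for the tensorflow/serving image."""
    host_path = Path(model_dir).resolve()  # Docker volume mounts need absolute paths
    return [
        "docker", "run", "--rm",
        "-v", f"{host_path}:/models/{model_name}",  # mount the versioned model dir
        "-p", f"{grpc_port}:8500",                  # gRPC API
        "-p", f"{rest_port}:8501",                  # REST API
        "-e", f"MODEL_NAME={model_name}",
        "tensorflow/serving",
    ]


cmd = tf_serving_docker_cmd("my_model", "my_model")
print(shlex.join(cmd))
# To launch it for real (requires Docker):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell quoting problems if the model path contains spaces.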
Querying TF Serving via REST
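A minimal sketch of a REST client, assuming a TF Serving instance is listening on localhost:8501 and serving a model named my_model (both hypothetical). The helper names are illustrative. The request body uses the "instances" row format and the response is read from its "predictions" field; only the request-building part runs here, since sending it needs a live server and the third-party requests package.

```python
import json


def build_predict_request(model_name, instances, host="localhost", port=8501):
    """Return (url, body) for a TF Serving REST predict call."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({
        "signature_name": "serving_default",
        "instances": instances,  # one entry per input example
    })
    return url, body


def predict(model_name, instances, **kwargs):
    """POST the request to the server and return the list of predictions."""
    import requests  # third-party; imported lazily so the builder has no dependencies

    url, body = build_predict_request(model_name, instances, **kwargs)
    response = requests.post(url, data=body, timeout=10)
    response.raise_for_status()
    return response.json()["predictions"]


url, body = build_predict_request("my_model", [[0.1, 0.2, 0.3, 0.4]])
print(url)
print(body)
# With a server running:
#   predictions = predict("my_model", [[0.1, 0.2, 0.3, 0.4]])
```

Each inner list in instances is one input example, so its length has to match the input shape the served model expects.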
Multi-GPU training with MirroredStrategy
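A minimal sketch of multi-GPU training, assuming TensorFlow 2.x and NumPy are installed. The model and the synthetic data are placeholders chosen for illustration. On a machine without GPUs the strategy falls back to a single replica, so the script still runs.

```python
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Model creation and compile() go inside the scope so variables are mirrored.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(loss="mse", optimizer="sgd")

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(42)
X = rng.normal(size=(256, 4)).astype("float32")
y = X.sum(axis=1, keepdims=True).astype("float32")

# Scale the global batch size with the replica count so each replica gets 32 examples.
batch_size = 32 * strategy.num_replicas_in_sync
history = model.fit(X, y, epochs=2, batch_size=batch_size, verbose=0)
print(history.history["loss"])
```

The fit() call itself stays outside the scope; only variable creation needs to happen inside it.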
Running this notebook
Install Google Cloud SDK (optional)
The Vertex AI sections require google-cloud-aiplatform~=1.36.2 and a GCP project; the package can be installed with pip. Skip these sections if you don’t have a GCP account.
Note: On Colab you must restart the Runtime after installing google-cloud-aiplatform.
Install TF Serving
On Colab the notebook installs TF Serving automatically. Locally, use Docker (see “Starting TF Serving with Docker” above) or install the native binary from the TensorFlow Serving APT repository.
Exercises
Exercises include deploying a model to Google Vertex AI, writing a client that calls the gRPC endpoint, implementing a MultiWorkerMirroredStrategy training script, and using Keras Tuner in distributed mode. Solutions are in the notebook.