8.3.1. Cloud Deployment#
Cloud platforms provide on-demand access to compute, storage, and networking resources without requiring teams to buy or manage physical hardware. For machine learning deployment, they offer a practical path from a containerized model on a developer’s laptop to an endpoint that can serve thousands of requests per second across multiple geographic regions.
The three dominant providers—AWS, Google Cloud, and Azure—each offer a similar menu of services at different abstraction levels: raw virtual machines where you control everything, managed container services that handle the runtime for you, Kubernetes clusters for large-scale orchestration, serverless functions for event-driven or low-traffic use cases, and specialized ML platforms that wrap all of these together with experiment tracking, model registries, and monitoring.
Choosing between them is often driven by where the rest of your system already lives. If your data pipeline runs on Google Cloud Storage and BigQuery, Cloud Run or Vertex AI are the natural deployment targets. An organization already deeply invested in AWS tooling will reach for SageMaker, ECS, or Lambda. The important thing is to understand the deployment pattern you need before worrying about which provider to use.
8.3.1.1. Why Cloud Deployment?#
Traditional On-Premises Challenges#
Hardware procurement (weeks/months)
Manual scaling
Maintenance overhead
Capital expenditure
Limited global reach
Cloud Advantages#
Instant provisioning: Resources in minutes
Pay-as-you-go: No upfront capital costs
Global reach: Deploy worldwide easily
Managed services: Less operational burden
Auto-scaling: Handle traffic spikes automatically
High availability: Built-in redundancy
8.3.1.2. Major Cloud Providers#
Amazon Web Services (AWS)#
ML-Specific Services:
SageMaker: End-to-end ML platform (training, deployment, monitoring)
Lambda: Serverless function execution
ECS/EKS: Container orchestration
EC2: Virtual machines
Strengths:
Most mature cloud platform
Extensive service catalog
Large ecosystem
Strong DevOps tools
Best for: Teams already using AWS, enterprise deployments
Google Cloud Platform (GCP)#
ML-Specific Services:
Vertex AI: Unified ML platform
Cloud Run: Serverless containers
GKE: Kubernetes engine
Cloud Functions: Serverless functions
Strengths:
Strong AI/ML capabilities
Excellent integration with TensorFlow
Good pricing
User-friendly console
Best for: TensorFlow users, data analytics-heavy workloads
Microsoft Azure#
ML-Specific Services:
Azure ML: Comprehensive ML service
AKS: Azure Kubernetes Service
Container Instances: Simple container deployment
Azure Functions: Serverless
Strengths:
Deep Microsoft ecosystem integration
Enterprise-friendly
Hybrid cloud capabilities
Strong in regulated industries
Best for: Microsoft shops, enterprise customers
8.3.1.3. Deployment Patterns#
Most cloud deployments of ML models fall into one of five patterns. Each represents a different trade-off between simplicity, control, scalability, and cost.
Virtual Machines#
The simplest cloud deployment: rent a virtual machine, install Docker, and run your container. You retain full control of the environment but are responsible for everything—operating system updates, security patches, monitoring, and manual scaling. This is the right starting point for internal tools or low-traffic deployments where operational simplicity matters more than scale.
Local Development
↓
Package Model & Code
↓
Deploy to Cloud VM
↓
Expose via Public IP/Domain
Example conceptual workflow:
```bash
# Create VM (AMI ID and instance type are placeholders)
aws ec2 run-instances --image-id ami-12345 --instance-type t3.medium

# SSH to the VM
ssh user@vm-public-ip

# Install Docker
sudo apt-get update && sudo apt-get install -y docker.io

# Run the model container
sudo docker run -d -p 80:5000 ml-model:v1.0
```
Pros:
Simple mental model
Full control
Similar to local development
Cons:
Manual management
Manual scaling
You manage security patches
Use when: Simple deployments, learning, fully custom environments
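The container in the workflow above has to expose an HTTP endpoint on the published port. As a stdlib-only sketch of what runs inside it (the `DummyModel` is a stand-in for a real trained model, and the request shape is illustrative):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class DummyModel:
    """Stand-in for a trained model; predicts the sum of the features."""
    def predict(self, rows):
        return [sum(row) for row in rows]


model = DummyModel()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expects a JSON body like {"features": [1.0, 2.0, 3.0]}
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        prediction = model.predict([payload["features"]])[0]

        body = json.dumps({"prediction": prediction}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)


def serve(port=5000):
    # docker run -p 80:5000 maps this port to the VM's port 80
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

A production container would use a real framework (Flask, FastAPI) behind a proper WSGI/ASGI server, but the contract is the same: listen on a known port, accept features, return a prediction.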
Managed Container Services#
Managed container services (Google Cloud Run, AWS ECS/Fargate, Azure Container Instances) run your Docker image without requiring you to manage the underlying virtual machines. You provide the container image; the platform handles provisioning, scaling, and availability. Most of these services scale to zero—costing nothing when idle—which makes them economical for APIs that receive intermittent traffic.
AWS ECS/Fargate (conceptual):
Docker Image → ECR (Registry)
↓
ECS Task Definition
↓
ECS Service (auto-scaling)
↓
Load Balancer → Internet
Google Cloud Run:
Docker Image → Artifact Registry (formerly GCR)
↓
Cloud Run Service
↓
Automatic HTTPS endpoint
Benefits:
No server management
Auto-scaling built-in
Pay only for requests
HTTPS automatically configured
Use when: Production APIs, variable traffic, want simplicity
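One platform detail worth knowing: Cloud Run (and similar services) tell the container which port to listen on through the `PORT` environment variable, so the server should not hard-code it. A minimal sketch:

```python
import os


def get_port(default: int = 8080) -> int:
    """Return the port the platform asked us to bind, falling back to a default."""
    return int(os.environ.get("PORT", default))
```

The server startup then becomes something like `app.run(host="0.0.0.0", port=get_port())`; containers that ignore `PORT` fail Cloud Run's health checks.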
Kubernetes#
Container orchestration at scale.
Managed Kubernetes Services:
AWS EKS
Google GKE
Azure AKS
Conceptual architecture:
Kubernetes Cluster
├── Ingress (Load Balancer)
├── Service (Internal routing)
└── Deployment
    ├── Pod (Model Instance 1)
    ├── Pod (Model Instance 2)
    └── Pod (Model Instance 3) [auto-scales]
Benefits:
Industry-standard orchestration
Multi-cloud portability
Advanced deployment strategies (canary, blue-green)
Rich ecosystem
Complexity:
Steep learning curve
Operational overhead
Configuration complexity
Use when: Large deployments, multiple models, advanced DevOps team
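To make the auto-scaling in the diagram above concrete: Kubernetes' Horizontal Pod Autoscaler sizes a Deployment using (roughly) the ratio of the observed metric to its target, which can be sketched as:

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Approximation of the HPA formula:
    desired = ceil(current_replicas * current_metric / target_metric)
    """
    return math.ceil(current_replicas * current_metric / target_metric)


# e.g. 3 pods averaging 90% CPU against a 60% target scale up to
# ceil(3 * 90 / 60) = 5 pods
```

The real controller adds tolerances, stabilization windows, and min/max replica bounds on top of this ratio, but the core behavior is this proportional rule.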
Serverless#
Execute code without managing servers.
AWS Lambda Example (conceptual):
```python
# lambda_function.py
import json

import boto3
import joblib

# Load model from S3 on cold start
s3 = boto3.client('s3')
s3.download_file('my-bucket', 'model.joblib', '/tmp/model.joblib')
model = joblib.load('/tmp/model.joblib')


def lambda_handler(event, context):
    """AWS Lambda handler."""
    features = event['features']
    prediction = model.predict([features])[0]
    return {
        'statusCode': 200,
        'body': json.dumps({'prediction': int(prediction)})
    }
```
Deployment:
```bash
# Package dependencies
pip install -r requirements.txt -t package/

# Add function code
cp lambda_function.py package/

# Create deployment package
cd package && zip -r ../deployment.zip .

# Deploy to AWS Lambda (an execution role ARN is required)
aws lambda create-function \
    --function-name ml-prediction \
    --runtime python3.11 \
    --handler lambda_function.lambda_handler \
    --role arn:aws:iam::ACCOUNT_ID:role/lambda-execution-role \
    --zip-file fileb://deployment.zip
```
Benefits:
No server management
Pay per invocation
Auto-scales to zero
Built-in availability
Limitations:
Cold start latency (first request)
Execution time limits (AWS: 15 min, GCP: 60 min)
Memory limits (AWS: 10GB, GCP: 32GB)
Package size limits
Use when: Sporadic traffic, budget constraints, simple models
Managed ML Platforms#
Full-featured ML deployment platforms.
AWS SageMaker (conceptual):
Train Model in SageMaker Notebook
↓
Save Model to S3
↓
Create SageMaker Endpoint
↓
Invoke Endpoint (REST API)
Benefits:
End-to-end workflow (training → deployment)
Built-in monitoring
A/B testing support
Auto-scaling
Multi-model endpoints
Example (conceptual):
```python
import sagemaker
from sagemaker.sklearn import SKLearnModel

# Create model from trained artifact
sklearn_model = SKLearnModel(
    model_data='s3://bucket/model.tar.gz',
    role=iam_role,
    entry_point='inference.py',
    framework_version='1.0-1'
)

# Deploy to endpoint
predictor = sklearn_model.deploy(
    instance_type='ml.m5.large',
    initial_instance_count=2
)

# Make predictions
prediction = predictor.predict(features)
```
Use when: Prefer managed solutions, using cloud ecosystem end-to-end
8.3.1.4. Cloud Deployment Decision Tree#
Do you need millisecond latency?
├─ Yes → Container service (ECS, Cloud Run) or VMs
└─ No
├─ Is traffic sporadic?
│ ├─ Yes → Serverless (Lambda, Cloud Functions)
│ └─ No → Container service
└─ Need complex orchestration?
├─ Yes → Kubernetes
└─ No → Managed ML platform (SageMaker, Vertex AI)
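The "sporadic traffic" branch above is ultimately a cost question. A back-of-the-envelope comparison can be sketched as follows (the default prices are illustrative placeholders, not current list prices; check your provider's pricing page):

```python
def serverless_monthly_cost(requests_per_month: int,
                            avg_duration_s: float,
                            memory_gb: float,
                            price_per_gb_second: float = 0.0000166667,
                            price_per_million_requests: float = 0.20) -> float:
    """Pay-per-invocation model: compute time (GB-seconds) plus a per-request fee."""
    gb_seconds = requests_per_month * avg_duration_s * memory_gb
    return (gb_seconds * price_per_gb_second
            + requests_per_month / 1_000_000 * price_per_million_requests)


def always_on_monthly_cost(hourly_rate: float, hours: float = 730) -> float:
    """A VM or container that runs around the clock, idle or not."""
    return hourly_rate * hours
```

At 100K short requests a month, serverless costs cents while even a small always-on instance costs tens of dollars; at sustained high traffic the comparison flips, which is why the decision tree asks about traffic shape first.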
8.3.1.5. Best Practices#
DO:#
Start with managed services
Use Infrastructure as Code (Terraform, CloudFormation)
Implement auto-scaling
Monitor everything
Use multiple availability zones
Encrypt data
Version deployments
Test in staging environment first
DON’T:#
Expose models without authentication
Hard-code credentials
Ignore security best practices
Over-provision resources
Deploy directly to production
Skip monitoring
Forget about costs
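The "hard-code credentials" point deserves a concrete shape: read secrets from the environment (or a secret manager) and fail loudly when they are missing. A minimal sketch; the variable name is illustrative:

```python
import os


def require_env(name: str) -> str:
    """Fetch a required secret from the environment; never commit it to code."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Set by the deployment platform (Lambda env vars, Cloud Run secrets, etc.):
# api_key = require_env("MODEL_API_KEY")
```

Failing at startup when a secret is absent is deliberate: a misconfigured deployment should crash its health check, not serve traffic with empty credentials.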
8.3.1.6. Summary#
Cloud platforms offer:
Managed infrastructure: Focus on models, not servers
Global scale: Deploy worldwide easily
Multiple deployment options: VMs, containers, serverless, managed ML
Auto-scaling: Handle variable traffic automatically
Pay-as-you-go: Cost-effective for variable workloads
Choose your cloud deployment strategy based on:
Team expertise
Latency requirements
Budget constraints
Existing infrastructure
Scale requirements
For detailed implementation guides, refer to the official documentation of your cloud provider.