8.3.1. Cloud Deployment#

Cloud platforms provide on-demand access to compute, storage, and networking resources without requiring teams to buy or manage physical hardware. For machine learning deployment, they offer a practical path from a containerized model on a developer’s laptop to an endpoint that can serve thousands of requests per second across multiple geographic regions.

The three dominant providers—AWS, Google Cloud, and Azure—each offer a similar menu of services at different abstraction levels: raw virtual machines where you control everything, managed container services that handle the runtime for you, Kubernetes clusters for large-scale orchestration, serverless functions for event-driven or low-traffic use cases, and specialized ML platforms that wrap all of these together with experiment tracking, model registries, and monitoring.

Choosing between them is often driven by where the rest of your system already lives. If your data pipeline runs on Google Cloud Storage and BigQuery, Cloud Run or Vertex AI are the natural deployment targets. An organization already deeply invested in AWS tooling will reach for SageMaker, ECS, or Lambda. The important thing is to understand the deployment pattern you need before worrying about which provider to use.

8.3.1.1. Why Cloud Deployment?#

Traditional On-Premises Challenges#

  • Hardware procurement (weeks/months)

  • Manual scaling

  • Maintenance overhead

  • Capital expenditure

  • Limited global reach

Cloud Advantages#

  • Instant provisioning: Resources in minutes

  • Pay-as-you-go: No upfront capital costs

  • Global reach: Deploy worldwide easily

  • Managed services: Less operational burden

  • Auto-scaling: Handle traffic spikes automatically

  • High availability: Built-in redundancy

8.3.1.2. Major Cloud Providers#

Amazon Web Services (AWS)#

ML-Specific Services:

  • SageMaker: End-to-end ML platform (training, deployment, monitoring)

  • Lambda: Serverless function execution

  • ECS/EKS: Container orchestration

  • EC2: Virtual machines

Strengths:

  • Most mature cloud platform

  • Extensive service catalog

  • Large ecosystem

  • Strong DevOps tools

Best for: Teams already using AWS, enterprise deployments

Google Cloud Platform (GCP)#

ML-Specific Services:

  • Vertex AI: Unified ML platform

  • Cloud Run: Serverless containers

  • GKE: Kubernetes engine

  • Cloud Functions: Serverless functions

Strengths:

  • Strong AI/ML capabilities

  • Excellent integration with TensorFlow

  • Good pricing

  • User-friendly console

Best for: TensorFlow users, data analytics-heavy workloads

Microsoft Azure#

ML-Specific Services:

  • Azure ML: Comprehensive ML service

  • AKS: Azure Kubernetes Service

  • Container Instances: Simple container deployment

  • Azure Functions: Serverless

Strengths:

  • Deep Microsoft ecosystem integration

  • Enterprise-friendly

  • Hybrid cloud capabilities

  • Strong in regulated industries

Best for: Microsoft shops, enterprise customers

8.3.1.3. Deployment Patterns#

Most cloud deployments of ML models fall into one of five patterns. Each represents a different trade-off between simplicity, control, scalability, and cost.

Virtual Machines#

The simplest cloud deployment: rent a virtual machine, install Docker, and run your container. You retain full control of the environment but are responsible for everything—operating system updates, security patches, monitoring, and manual scaling. This is the right starting point for internal tools or low-traffic deployments where operational simplicity matters more than scale.

Local Development
      ↓
Package Model & Code
      ↓
Deploy to Cloud VM
      ↓
Expose via Public IP/Domain

Example conceptual workflow:

# Create VM
aws ec2 run-instances --image-id ami-12345 --instance-type t3.medium

# SSH to VM
ssh user@vm-public-ip

# Install Docker
sudo apt-get update && sudo apt-get install -y docker.io

# Run model container (map VM port 80 to the app's port 5000)
sudo docker run -d -p 80:5000 ml-model:v1.0

Pros:

  • Simple mental model

  • Full control

  • Similar to local development

Cons:

  • Manual management

  • Manual scaling

  • You manage security patches

Use when: Simple deployments, learning, fully custom environments

Managed Container Services#

Managed container services (Google Cloud Run, AWS ECS/Fargate, Azure Container Instances) run your Docker image without requiring you to manage the underlying virtual machines. You provide the container image; the platform handles provisioning, scaling, and availability. Most of these services scale to zero—costing nothing when idle—which makes them economical for APIs that receive intermittent traffic.

AWS ECS/Fargate (conceptual):

Docker Image → ECR (Registry)
                 ↓
            ECS Task Definition
                 ↓
            ECS Service (auto-scaling)
                 ↓
        Load Balancer → Internet

Google Cloud Run:

Docker Image → GCR (Registry)
                 ↓
            Cloud Run Service
                 ↓
        Automatic HTTPS endpoint
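Cloud Run (and similar services) impose a simple container contract: the image must listen for HTTP on the port the platform injects via the `PORT` environment variable. A minimal Dockerfile sketch, assuming a Python web app exposed as `app:app` and served with gunicorn (both placeholders):

```dockerfile
# Sketch of a Cloud Run-ready image; file and module names are placeholders.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Cloud Run injects the listening port via $PORT (default 8080)
CMD exec gunicorn --bind :$PORT app:app
```

The same image runs unchanged on ECS/Fargate or a plain VM, which keeps the pattern portable.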

Benefits:

  • No server management

  • Auto-scaling built-in

  • Pay only for requests

  • HTTPS automatically configured

Use when: Production APIs, variable traffic, want simplicity

Kubernetes#

Kubernetes orchestrates containers at scale: it schedules model replicas across a cluster of machines, restarts failed instances, and scales pods up or down with demand. All three major providers offer a managed control plane, so you operate the workloads without running Kubernetes itself.

Managed Kubernetes Services:

  • AWS EKS

  • Google GKE

  • Azure AKS

Conceptual architecture:

Kubernetes Cluster
├── Ingress (Load Balancer)
├── Service (Internal routing)
└── Deployments
    ├── Pod (Model Instance 1)
    ├── Pod (Model Instance 2)
    └── Pod (Model Instance 3) [auto-scales]
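The Deployment layer of the architecture above can be sketched as a minimal manifest (names and ports are placeholders; an Ingress and a HorizontalPodAutoscaler would typically accompany it):

```yaml
# Minimal Deployment: three replicas of the model container behind one label
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
      - name: ml-model
        image: ml-model:v1.0
        ports:
        - containerPort: 5000
```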

Benefits:

  • Industry-standard orchestration

  • Multi-cloud portability

  • Advanced deployment strategies (canary, blue-green)

  • Rich ecosystem

Complexity:

  • Steep learning curve

  • Operational overhead

  • Configuration complexity

Use when: Large deployments, multiple models, advanced DevOps team

Serverless#

Serverless platforms execute your code on demand with no servers to manage: you upload a function, and the provider provisions capacity, scales it with traffic, and bills per invocation.

AWS Lambda Example (conceptual):

# lambda_function.py
import json
import joblib
import boto3

# Load model from S3 on cold start
s3 = boto3.client('s3')
s3.download_file('my-bucket', 'model.joblib', '/tmp/model.joblib')
model = joblib.load('/tmp/model.joblib')

def lambda_handler(event, context):
    """AWS Lambda handler."""
    features = event['features']
    prediction = model.predict([features])[0]
    
    return {
        'statusCode': 200,
        'body': json.dumps({'prediction': int(prediction)})
    }
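Before packaging, the handler logic can be smoke-tested locally. A minimal sketch, with a stub model standing in for the S3 artifact so no AWS access is needed:

```python
import json

class StubModel:
    """Stands in for the joblib artifact during local testing."""
    def predict(self, X):
        return [1 for _ in X]  # always predicts class 1

model = StubModel()

def lambda_handler(event, context):
    """Same handler logic as above, exercised against the stub."""
    features = event['features']
    prediction = model.predict([features])[0]
    return {
        'statusCode': 200,
        'body': json.dumps({'prediction': int(prediction)})
    }

resp = lambda_handler({'features': [5.1, 3.5, 1.4, 0.2]}, None)
print(resp['body'])  # → {"prediction": 1}
```

Swapping the stub for the real `joblib.load` call is the only change needed for deployment.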

Deployment:

# Package dependencies
pip install -r requirements.txt -t package/

# Add function code
cp lambda_function.py package/

# Create deployment package
cd package && zip -r ../deployment.zip .

# Deploy to AWS Lambda (--role is required; the ARN below is a placeholder)
aws lambda create-function \
  --function-name ml-prediction \
  --runtime python3.11 \
  --role arn:aws:iam::123456789012:role/lambda-execution-role \
  --handler lambda_function.lambda_handler \
  --zip-file fileb://deployment.zip

Benefits:

  • No server management

  • Pay per invocation

  • Auto-scales to zero

  • Built-in availability

Limitations:

  • Cold start latency (first request)

  • Execution time limits (AWS: 15 min, GCP: 60 min)

  • Memory limits (AWS: 10GB, GCP: 32GB)

  • Package size limits

Use when: Sporadic traffic, budget constraints, simple models

Managed ML Platforms#

Managed ML platforms (SageMaker, Vertex AI, Azure ML) bundle deployment with the rest of the ML lifecycle: training, a model registry, monitoring, and A/B testing behind a single API.

AWS SageMaker (conceptual):

Train Model in SageMaker Notebook
              ↓
Save Model to S3
              ↓
Create SageMaker Endpoint
              ↓
Invoke Endpoint (REST API)

Benefits:

  • End-to-end workflow (training → deployment)

  • Built-in monitoring

  • A/B testing support

  • Auto-scaling

  • Multi-model endpoints

Example (conceptual):

import sagemaker
from sagemaker.sklearn import SKLearnModel

# Create model from trained artifact
sklearn_model = SKLearnModel(
    model_data='s3://bucket/model.tar.gz',
    role=iam_role,  # IAM role ARN with SageMaker permissions
    entry_point='inference.py',
    framework_version='1.0-1'
)

# Deploy to endpoint
predictor = sklearn_model.deploy(
    instance_type='ml.m5.large',
    initial_instance_count=2
)

# Make predictions
prediction = predictor.predict(features)

Use when: Prefer managed solutions, using cloud ecosystem end-to-end

8.3.1.4. Cloud Deployment Decision Tree#

Do you need consistently low latency (no cold starts)?
├─ Yes → Container service (ECS, Cloud Run) or VMs
└─ No
     ├─ Is traffic sporadic?
     │   ├─ Yes → Serverless (Lambda, Cloud Functions)
     │   └─ No → Container service
     └─ Need complex orchestration?
         ├─ Yes → Kubernetes
         └─ No → Managed ML platform (SageMaker, Vertex AI)
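The tree above can be encoded as a toy helper (illustrative only; real decisions also weigh team expertise, budget, and existing infrastructure):

```python
def suggest_deployment(low_latency, sporadic_traffic, complex_orchestration):
    """Mirror the decision tree above; returns a deployment pattern name."""
    if low_latency:
        return "container service or VM"
    if sporadic_traffic:
        return "serverless"
    if complex_orchestration:
        return "kubernetes"
    return "managed ML platform"

print(suggest_deployment(False, True, False))  # → serverless
```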

8.3.1.5. Best Practices#

DO:#

  • Start with managed services

  • Use Infrastructure as Code (Terraform, CloudFormation)

  • Implement auto-scaling

  • Monitor everything

  • Use multiple availability zones

  • Encrypt data

  • Version deployments

  • Test in staging environment first

DON’T:#

  • Expose models without authentication

  • Hard-code credentials

  • Ignore security best practices

  • Over-provision resources

  • Deploy directly to production

  • Skip monitoring

  • Forget about costs

8.3.1.6. Summary#

Cloud platforms offer:

  • Managed infrastructure: Focus on models, not servers

  • Global scale: Deploy worldwide easily

  • Multiple deployment options: VMs, containers, serverless, managed ML

  • Auto-scaling: Handle variable traffic automatically

  • Pay-as-you-go: Cost-effective for variable workloads

Choose your cloud deployment strategy based on:

  • Team expertise

  • Latency requirements

  • Budget constraints

  • Existing infrastructure

  • Scale requirements

For detailed implementation guides, refer to the official documentation of your cloud provider.