8.2. Containerization#
Machine learning models accumulate dependencies: a specific Python version, particular library versions, system-level shared libraries, and sometimes GPU drivers. A model that runs correctly on one machine can fail silently or noisily on another if even one of these dependencies differs. Containerization solves this by packaging the model together with its entire environment into a single portable unit called a container.
Docker is the dominant containerization platform. A Docker container is defined by a Dockerfile—a text file that specifies which base image to start from, which packages to install, and how to run the application. Once the Dockerfile is built into an image, containers started from that image run identically on any machine that has Docker installed—your laptop, a colleague’s workstation, a cloud VM, or a Kubernetes cluster.
For machine learning deployment, containerization is the bridge between a model that works in a development notebook and a model that can be reliably operated by others in production.
8.2.1. Docker for ML Deployment#
Docker is a platform that lets you package an application—your model, its dependencies, its runtime—into a self-contained unit called a container. Building once and running anywhere is the central promise: the same container image you build on your laptop will run identically on a cloud server, a colleague’s machine, or a Kubernetes cluster.
Understanding a few core concepts makes the rest of Docker much easier to follow.
An image is an immutable blueprint that defines everything inside a container: the operating system layer, installed packages, application code, and how it should start.
A container is a running instance of an image. You can run many containers from the same image simultaneously.
A Dockerfile is a plain-text script that tells Docker how to build an image, step by step.
A registry (such as Docker Hub or Amazon ECR) is a storage service for sharing and versioning images.
The relationship is simply: Dockerfile → (docker build) → Image → (docker run) → Container.
8.2.2. Your First ML Model Container#
Let us walk through packaging a simple scikit-learn model as a container. The file structure you need:
my_model/
├── model.joblib # Trained model
├── app.py # Flask prediction API
├── requirements.txt # Pinned dependencies
└── Dockerfile # Container instructions
requirements.txt — always pin exact versions for reproducibility:
scikit-learn==1.3.2
joblib==1.3.2
numpy==1.24.3
flask==3.0.0
app.py — a minimal Flask API:
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)
model = joblib.load('model.joblib')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    features = np.array(data['features']).reshape(1, -1)
    prediction = model.predict(features)
    probability = model.predict_proba(features)[0]
    return jsonify({
        'prediction': int(prediction[0]),
        'probability': probability.tolist()
    })

@app.route('/health', methods=['GET'])
def health():
    return jsonify({'status': 'healthy'})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
Note that Flask binds to 0.0.0.0 rather than 127.0.0.1. Inside a container, 127.0.0.1 refers only to the container’s own loopback interface; binding to 0.0.0.0 makes the application reachable from outside the container via port mapping.
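The predict handler above assumes a well-formed payload. In production you would typically validate input before calling the model. Here is a minimal, stdlib-only sketch; the `validate_payload` helper and its four-feature default are illustrative assumptions, not part of the app above:

```python
def validate_payload(data, n_features=4):
    """Check a /predict payload; return (features, error_message).

    n_features=4 matches the four-feature example request used in
    this section; set it to your model's actual input size.
    """
    if not isinstance(data, dict) or "features" not in data:
        return None, "payload must be a JSON object with a 'features' key"
    feats = data["features"]
    if not isinstance(feats, list) or len(feats) != n_features:
        return None, f"'features' must be a list of {n_features} numbers"
    try:
        return [float(x) for x in feats], None
    except (TypeError, ValueError):
        return None, "'features' must contain only numbers"

# In app.py you might call it like:
#   feats, err = validate_payload(request.get_json())
#   if err:
#       return jsonify({'error': err}), 400
```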
Dockerfile:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.joblib .
COPY app.py .
EXPOSE 5000
CMD ["python", "app.py"]
Build and run:
docker build -t ml-api:v1.0 .
docker run -p 5000:5000 ml-api:v1.0
The -p 5000:5000 flag maps port 5000 on the host to port 5000 inside the container. Once running, test with:
curl http://localhost:5000/health
curl -X POST http://localhost:5000/predict \
-H "Content-Type: application/json" \
-d '{"features": [5.1, 3.5, 1.4, 0.2]}'
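The same request can be made from Python with no extra dependencies. A stdlib-only sketch, assuming the container above is running on localhost:5000 (the `build_predict_request` helper is illustrative):

```python
import json
import urllib.request

def build_predict_request(features, url="http://localhost:5000/predict"):
    """Build a POST request matching the API's expected JSON shape."""
    payload = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )

req = build_predict_request([5.1, 3.5, 1.4, 0.2])
# With the container running, this returns the prediction JSON:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```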
8.2.3. Understanding the Dockerfile#
Each instruction in a Dockerfile creates a layer in the final image.
FROM python:3.11-slim
Specifies the base image. python:3.11-slim is an official Python image built on Debian with unnecessary packages stripped out, balancing convenience with size.
WORKDIR /app
Sets the working directory inside the container. All subsequent COPY, RUN, and CMD instructions operate relative to this path. The directory is created if it does not already exist.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
Copies the requirements file before the application code. This is deliberate: Docker caches each layer, and a cached layer is reused if nothing above it has changed. Copying requirements separately means that changing your Python code does not force Docker to reinstall all dependencies on the next build.
EXPOSE 5000
Documents which port the container uses. It does not publish the port to the host by itself—that is done with -p when running the container—but it serves as documentation and is used by orchestration tools.
CMD ["python", "app.py"]
The default command run when the container starts. Using the JSON array form (rather than a shell string) avoids spawning an unnecessary shell process.
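Beyond avoiding an extra process, the exec form matters for shutdown behavior: your process runs as PID 1 and receives stop signals directly. For contrast (a sketch, not part of the Dockerfile above):

```dockerfile
# Exec form: python is PID 1 and receives SIGTERM directly on docker stop
CMD ["python", "app.py"]

# Shell form: wraps the command in /bin/sh -c; the shell is PID 1 and
# may not forward SIGTERM to Python, delaying graceful shutdown
# CMD python app.py
```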
8.2.4. Essential Docker Commands#
Building:
# Build image
docker build -t image_name:tag .
# Build with no cached layers (fresh build)
docker build --no-cache -t image_name:tag .
Running:
# Run in foreground
docker run image_name:tag
# Run in background (detached)
docker run -d image_name:tag
# Run with port mapping
docker run -p 8080:5000 image_name:tag
# Run with environment variables
docker run -e MODEL_PATH=/models/model.joblib image_name:tag
# Run with a volume mount (share files between host and container)
docker run -v /host/path:/container/path image_name:tag
# Run interactively with a shell session
docker run -it image_name:tag bash
Managing containers:
docker ps # List running containers
docker ps -a # List all containers (including stopped)
docker stop container_id # Stop a container
docker rm container_id # Remove a stopped container
docker logs container_id # View container output
docker logs -f container_id # Follow logs in real time
docker exec -it container_id bash # Open a shell in a running container
Managing images:
docker images # List images
docker rmi image_name:tag # Remove an image
docker image prune # Remove unused images
docker save -o model.tar image:tag # Export image to file
docker load -i model.tar # Import image from file
8.2.5. Dockerfile Patterns for ML#
8.2.5.1. Environment Variables for Python#
Start every ML Dockerfile with these Python-specific settings:
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1
PYTHONUNBUFFERED=1 ensures that logs from your application appear immediately rather than being buffered—crucial when debugging a running container. PYTHONDONTWRITEBYTECODE=1 prevents Python from writing .pyc files, which are unnecessary in containers and only add clutter. PIP_NO_CACHE_DIR=1 reduces image size by skipping pip’s download cache.
8.2.5.2. Layer Caching: Order Matters#
Docker caches each layer and reuses it if neither the instruction nor any file it depends on has changed. The practical rule: put instructions that change infrequently at the top, and instructions that change frequently at the bottom.
A poorly ordered Dockerfile wastes time:
# Every code change forces a full pip reinstall
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python", "src/app.py"]
A well-ordered Dockerfile is fast to rebuild after code changes:
FROM python:3.11-slim
ENV PYTHONUNBUFFERED=1 PYTHONDONTWRITEBYTECODE=1
WORKDIR /app
# System dependencies (rarely change)
RUN apt-get update && apt-get install -y --no-install-recommends libgomp1 \
&& rm -rf /var/lib/apt/lists/*
# Python dependencies (change occasionally)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Model file (changes when retrained)
COPY models/ ./models/
# Application code (changes frequently)
COPY src/ ./src/
CMD ["python", "src/app.py"]
Now a code change only invalidates the last layer; dependencies are reused from cache.
8.2.5.3. Handling Large Model Files#
How you include the model file in the container is a significant architectural decision.
Model baked into the image — the simplest approach:
COPY models/model.joblib ./models/
The model lives inside the image. Every model update requires a full rebuild and redeploy of the image. Works well for small models or when simplicity is preferred.
Model mounted at runtime — the image contains no model file:
VOLUME /app/models
Run with:
docker run -v /host/path/to/models:/app/models ml-model:v1.0
The container reads the model from a directory on the host. Updating the model requires only updating the directory, not rebuilding the image. Useful when models are large, frequently retrained, or managed separately from application code.
Model downloaded on startup — the container fetches the model from a remote store when it first runs:
# startup.py — fetch the model from S3 if it is not already present
import os
import boto3

if not os.path.exists('models/model.joblib'):
    os.makedirs('models', exist_ok=True)
    boto3.client('s3').download_file('my-bucket', 'model.joblib', 'models/model.joblib')
This keeps the image small and centralizes model management in object storage (S3, GCS). The trade-off is that the first request after a cold start will be delayed by the download.
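Whichever option you choose, it helps to read the model location from configuration rather than hardcoding it, so the same image works with a baked-in, mounted, or downloaded model. A minimal sketch matching the `-e MODEL_PATH` docker run example shown earlier (the `resolve_model_path` helper is illustrative):

```python
import os

def resolve_model_path(default="models/model.joblib"):
    """Return the model path from the MODEL_PATH environment variable,
    falling back to the baked-in default location."""
    return os.environ.get("MODEL_PATH", default)

# In app.py: model = joblib.load(resolve_model_path())
```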
8.2.5.4. GPU Support#
For models that require GPU inference, use an NVIDIA base image:
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
# Ubuntu 22.04 ships Python 3.10 as python3
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
COPY models/ ./models/
COPY src/ ./src/
CMD ["python3", "src/app.py"]
Run with GPU access:
docker run --gpus all -p 5000:5000 ml-model-gpu:v1.0
8.2.5.5. The .dockerignore File#
Docker sends the entire build context (the directory you specify with .) to the daemon before building, including files not used by any COPY instruction. A .dockerignore file limits what gets sent, which speeds up every build.
Create .dockerignore at the root of your project:
__pycache__/
*.py[cod]
.venv/
venv/
env/
.git/
.gitignore
.vscode/
.idea/
*.ipynb
.ipynb_checkpoints/
data/
experiments/
models/checkpoints/
tests/
*.md
docs/
.DS_Store
8.2.6. Complete Sample Production Dockerfile#
Bringing the patterns above together:
# Stage 1: build
FROM python:3.11 AS builder
ARG MODEL_VERSION=latest
WORKDIR /build
RUN apt-get update && apt-get install -y --no-install-recommends build-essential \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --prefix=/install --no-cache-dir -r requirements.txt
# Stage 2: runtime
FROM python:3.11-slim
LABEL maintainer="ml-team@example.com" version="1.0" description="ML Model API"
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1
RUN apt-get update && apt-get install -y --no-install-recommends libgomp1 curl \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY --from=builder /install /usr/local
COPY models/ ./models/
COPY src/ ./src/
RUN useradd -m -u 1000 modeluser && chown -R modeluser:modeluser /app
USER modeluser
EXPOSE 5000
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
CMD curl -f http://localhost:5000/health || exit 1
CMD ["python", "src/app.py"]
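To build and run this image (the tag and MODEL_VERSION value below are illustrative):

```shell
docker build --build-arg MODEL_VERSION=1.2.0 -t ml-api:1.2.0 .
docker run -d -p 5000:5000 --name ml-api ml-api:1.2.0

# The HEALTHCHECK result shows up in docker ps (healthy/unhealthy),
# or can be queried directly:
docker inspect --format '{{.State.Health.Status}}' ml-api
```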