6.1.2. Supervised vs. Unsupervised Learning#
In machine learning, the presence or absence of labels fundamentally changes how we approach a problem. This distinction gives us two major categories: supervised learning and unsupervised learning. Understanding this difference is crucial because it determines which algorithms you can use and what questions you can answer.
6.1.2.1. The Label Makes All the Difference#
Imagine you’re teaching a child to recognize animals:
With labels (Supervised):
You show pictures and say “This is a cat,” “This is a dog”
The child learns by observing labeled examples
Later, they can identify new animals by what they’ve learned
Without labels (Unsupervised):
You give the child many animal pictures with no names
The child might still group similar-looking animals together
They discover patterns without being told what’s what
This same concept applies to machine learning. The fundamental question is:
Important
Do you have target values (labels) for your data?
YES → Use supervised learning
NO → Use unsupervised learning
6.1.2.2. Supervised Learning: Learning with a Teacher#
Supervised learning is like having a teacher who gives you both questions and answers during practice. The model learns from these labeled examples.
The Setup#
In supervised learning, your data consists of pairs: (X, y)
X: Input features (what you know)
y: Target output (what you want to predict)
The model learns a function f such that y ≈ f(X) — predictions that approximate the true targets.
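As a concrete sketch of (X, y) pairs, here is a tiny, invented housing dataset (the numbers are made up for illustration): each row of X holds one example's features, and the matching entry of y is its label.

```python
import numpy as np

# Hypothetical housing data: each row of X is one example's features,
# and the matching entry of y is the label we want to predict.
X = np.array([[1400, 3],   # square footage, bedrooms
              [1600, 3],
              [2100, 4]])
y = np.array([245_000, 280_000, 390_000])  # sale prices (the labels)

# Supervised learning seeks a function f with f(X[i]) close to y[i].
assert X.shape[0] == y.shape[0]  # one label per example
```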
Two Main Types of Supervised Learning#
1. Regression: Predicting Continuous Values#
When your target y is a number (continuous), it’s a regression problem.
Examples:
Predicting house prices from features (location, size, bedrooms)
Forecasting temperature from historical weather data
Estimating customer lifetime value
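To make the first example concrete, here is a minimal regression sketch using scikit-learn's `LinearRegression` on invented house-price data (the sizes and prices are made up and exactly linear, so the fit is essentially perfect):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: house size (sq ft) -> price; values invented for illustration
X = np.array([[1000], [1500], [2000], [2500], [3000]])
y = np.array([200_000, 270_000, 340_000, 410_000, 480_000])

model = LinearRegression()
model.fit(X, y)                  # learn f: size -> price
pred = model.predict([[1800]])   # the output is a continuous number
```

Because the target is a number on a continuous scale, any real value is a legal prediction — that is what makes this regression rather than classification.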
2. Classification: Predicting Categories#
When your target y is a category (discrete), it’s a classification problem.
Examples:
Email spam detection (spam vs. not spam)
Medical diagnosis (disease type)
Image recognition (cat, dog, bird)
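The spam-detection example can be sketched the same way with a classifier. The two features below (link count, ALL-CAPS word count) are invented for illustration; the point is that the target is a discrete category, not a number:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features: [number of links, ALL-CAPS word count]
# Labels: 1 = spam, 0 = not spam
X = np.array([[0, 0], [1, 0], [8, 5], [10, 7], [0, 1], [9, 6]])
y = np.array([0, 0, 1, 1, 0, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[12, 9]]))  # a link-heavy, shouty email
```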
Supervised Learning Process#
The standard supervised learning workflow:
Collect labeled data: Gather (X, y) pairs
Split the data: Training set and test set
Train the model: Model learns f: X → y
Evaluate: Test on unseen data
Predict: Use model on new inputs
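The five steps above can be sketched end to end with scikit-learn; this uses the built-in Iris dataset and a Random Forest purely as an example, not as a recommendation for any particular problem:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                     # 1. collect labeled data
X_train, X_test, y_train, y_test = train_test_split(  # 2. split the data
    X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)                           # 3. train the model

acc = accuracy_score(y_test, model.predict(X_test))   # 4. evaluate on unseen data
new_pred = model.predict(X_test[:1])                  # 5. predict on a new input
```

Note that the test set is held out until step 4 — evaluating on data the model saw during training would give a misleadingly optimistic score.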
6.1.2.3. Unsupervised Learning: Learning Without a Teacher#
Unsupervised learning is like exploring a new city without a guide. You discover patterns, neighborhoods, and structure on your own.
The Setup#
In unsupervised learning, you only have X (input features) with no labels.
The model tries to find hidden structure in the data:
Group similar items (clustering)
Reduce dimensions (compression)
Detect anomalies (outliers)
Three Main Types of Unsupervised Learning#
1. Clustering: Discovering Groups#
Group similar data points together without being told the categories.
Examples:
Customer segmentation (group similar customers)
Document clustering (group similar articles)
Image segmentation (group similar pixels)
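A minimal clustering sketch with K-Means: the model is given points with no labels and must discover the groups itself. The two blobs below are invented and clearly separated so the result is easy to check:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two visibly separated blobs of 2-D points — no labels provided
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # the first three points share one label, the last three the other
```

Notice that K-Means only assigns arbitrary cluster IDs (0 and 1); unlike classification, nothing tells the algorithm what the groups *mean*.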
2. Dimensionality Reduction: Finding Compressed Representations#
Reduce the number of features while preserving important information.
Examples:
Visualizing high-dimensional data
Data compression
Noise reduction
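As a sketch, PCA can compress the four Iris features down to two while keeping most of the variance — the labels are never used, which is what makes this unsupervised:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)   # 150 samples, 4 features; labels ignored
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)         # shape (150, 2)

print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```

The 2-D result is what you would plot to visualize the high-dimensional data, the first use case listed above.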
3. Anomaly Detection: Finding Outliers#
Identify rare or unusual data points that don’t fit the pattern.
Examples:
Fraud detection
Quality control
Network intrusion detection
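A minimal anomaly-detection sketch with scikit-learn's `IsolationForest`, on synthetic data: 200 ordinary points plus one planted outlier far from the rest.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(200, 2))  # typical points around the origin
outlier = np.array([[8.0, 8.0]])          # one point far from the rest
X = np.vstack([normal, outlier])

iso = IsolationForest(random_state=0).fit(X)
pred = iso.predict(X)  # +1 = inlier, -1 = outlier
print(pred[-1])        # the distant point is flagged as an outlier
```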
6.1.2.4. Side-by-Side Comparison#
Let’s compare the two approaches directly:
| Aspect | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Data | (X, y) — features + labels | X — features only |
| Goal | Predict y for new X | Find patterns in X |
| Examples | Regression, Classification | Clustering, PCA, Anomaly Detection |
| When to use | You have labeled data | No labels available, or labeling is too expensive |
| Evaluation | Compare predictions to true labels | Harder — no ground truth to compare against |
| Algorithms | Linear/Logistic Regression, SVM, Random Forest, Neural Nets | K-Means, DBSCAN, PCA, t-SNE, Autoencoders |
6.1.2.5. Semi-Supervised and Self-Supervised Learning#
Between the two extremes, there are hybrid approaches:
Semi-Supervised Learning#
Uses a small amount of labeled data + lots of unlabeled data.
Why it’s useful:
Labels are expensive (require human experts)
Unlabeled data is cheap (abundant)
Can achieve good performance with minimal labeling
Example: You have 100,000 images but only 1,000 are labeled. Semi-supervised methods use both.
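A small sketch of this idea using scikit-learn's `LabelSpreading`, which follows the library's convention of marking unlabeled points with -1. The two clusters below are invented; only one point per cluster carries a label, and the model propagates labels to the rest:

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

# Two clusters; only one point in each is labeled
X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.1]])
y = np.array([0, -1, -1, 1, -1, -1])  # -1 = unlabeled

model = LabelSpreading().fit(X, y)
print(model.transduction_)  # labels inferred for every point, labeled or not
```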
Self-Supervised Learning#
The model creates its own labels from the data.
Common in:
Natural language processing (predict next word)
Computer vision (predict image rotations)
Example: Given “The cat sat on the ___”, predict the missing word using the context.
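The "predict the missing word" idea can be sketched in a few lines of plain Python: the raw text supplies its own labels, with each next word serving as the label for the word before it. (This toy bigram counter stands in for what large language models do at vastly greater scale.)

```python
from collections import Counter, defaultdict

text = "the cat sat on the mat"
words = text.split()

# The data labels itself: the word at position i+1 is the "label"
# for the word at position i — no human annotation needed.
pairs = [(words[i], words[i + 1]) for i in range(len(words) - 1)]
print(pairs[0])  # ('the', 'cat')

# Count which words follow which, to "fill in the blank"
counts = defaultdict(Counter)
for prev, nxt in pairs:
    counts[prev][nxt] += 1
```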
6.1.2.6. When to Use Each Approach#
Use Supervised Learning When:#
You have labeled data (or can afford to label it)
You need to make specific predictions
You can clearly define the target
Accuracy is critical
Example scenarios:
Medical diagnosis (disease present/absent)
Price prediction (specific number)
Spam detection (spam/not spam)
Use Unsupervised Learning When:#
You don’t have labels (or labeling is too expensive)
You want to explore data structure
You’re looking for hidden patterns
You want to reduce dimensionality
Example scenarios:
Customer segmentation (don’t know groups upfront)
Data exploration (understand data structure)
Anomaly detection (normal patterns unknown)
Feature engineering (create better features)