6.4.1. Baseline Models#
Before you train a sophisticated model, you need something to beat.
A baseline model makes predictions using the simplest possible rule — no learning, no patterns, no features. Its only job is to answer the question: what can you achieve by being completely naive?
If your trained model cannot outperform a baseline, something is wrong: the features carry no signal, the labels are corrupted, or there is a bug in your pipeline. The baseline is not a competitor — it is a sanity check.
6.4.1.1. Why You Need a Baseline#
It is easy to see 85% accuracy and think the model is working well. But if always predicting the majority class also gives 85%, the model has learned nothing at all.
On the 900 / 100 split below, a rule that always predicts class 0 achieves 90% accuracy. Any model that scores at or below that number has learned nothing useful.
import numpy as np
import pandas as pd
from myst_nb import glue
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
import warnings
warnings.filterwarnings('ignore')
np.random.seed(42)
# Imbalanced dataset: 90% class 0, 10% class 1
y_imbalanced = np.array([0] * 900 + [1] * 100)
X_imbalanced = np.random.randn(1000, 10)
always_zero_acc = (y_imbalanced == 0).mean()
glue('always-zero-acc', f'{always_zero_acc:.0%}', display=False)
6.4.1.2. Baseline Options for Classification#
Scikit-learn’s DummyClassifier implements all common baselines. The strategy parameter controls the rule:
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
strategies = {
'most_frequent': 'Always predict the most common class',
'stratified': 'Sample randomly, respecting class proportions',
'uniform': 'Sample uniformly at random from all classes',
'prior': 'Always predict the class with highest prior',
}
rows = []
for strategy, description in strategies.items():
dummy = DummyClassifier(strategy=strategy, random_state=42)
dummy.fit(X_train, y_train)
acc = dummy.score(X_test, y_test)
rows.append({'Strategy': strategy, 'Accuracy': round(acc, 3), 'Rule': description})
# Compare to a real model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
rows.append({'Strategy': 'RandomForest', 'Accuracy': round(rf.score(X_test, y_test), 3), 'Rule': 'Trained model'})
display(pd.DataFrame(rows))
| | Strategy | Accuracy | Rule |
|---|---|---|---|
| 0 | most_frequent | 0.632 | Always predict the most common class |
| 1 | stratified | 0.596 | Sample randomly, respecting class proportions |
| 2 | uniform | 0.553 | Sample uniformly at random from all classes |
| 3 | prior | 0.632 | Always predict the class with highest prior |
| 4 | RandomForest | 0.956 | Trained model |
Which strategy to use as your reference?
- `most_frequent` — the most common baseline for classification. Directly answers: “what if we always predict the majority?” Essential on imbalanced datasets.
- `stratified` — useful when you care about performance across classes, not just overall accuracy.
- `prior` — predicts the same class as `most_frequent`; the two differ only in `predict_proba`, which returns the empirical class priors instead of a one-hot vector.
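Where `most_frequent` and `prior` actually diverge is easiest to see in `predict_proba`. A minimal sketch on synthetic labels (the 6/3/1 class split is invented for illustration):

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Toy multi-class labels with priors 0.6 / 0.3 / 0.1 (invented for illustration)
y = np.array([0] * 6 + [1] * 3 + [2] * 1)
X = np.zeros((10, 2))  # features are ignored by dummy strategies

mf = DummyClassifier(strategy='most_frequent').fit(X, y)
pr = DummyClassifier(strategy='prior').fit(X, y)

print(mf.predict(X[:1]), pr.predict(X[:1]))  # identical: both predict class 0
print(mf.predict_proba(X[:1]))               # degenerate one-hot distribution
print(pr.predict_proba(X[:1]))               # empirical class priors [0.6 0.3 0.1]
```

If you compare models by a probability-based metric such as log loss, the two baselines give different reference numbers even though their hard predictions match.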
Tip
For imbalanced classification, also check baseline F1 score and AUC, not just accuracy. A model that always predicts the majority class scores 0.0 F1 on the minority — which often reflects the actual problem better than accuracy does.
from sklearn.metrics import f1_score, roc_auc_score
# Simulate imbalanced data
np.random.seed(42)
y_imb = np.array([0] * 450 + [1] * 50)
X_imb = np.random.randn(500, 10)
X_tr, X_te, y_tr, y_te = train_test_split(X_imb, y_imb, test_size=0.2, stratify=y_imb, random_state=42)
dummy = DummyClassifier(strategy='most_frequent')
dummy.fit(X_tr, y_tr)
y_pred_dummy = dummy.predict(X_te)
rf2 = RandomForestClassifier(n_estimators=100, random_state=42)
rf2.fit(X_tr, y_tr)
y_pred_rf = rf2.predict(X_te)
metrics_df = pd.DataFrame({
'Metric': ['Accuracy', 'F1 (minority class)'],
'Baseline': [round((y_pred_dummy == y_te).mean(), 3),
round(f1_score(y_te, y_pred_dummy, pos_label=1), 3)],
'RandomForest': [round((y_pred_rf == y_te).mean(), 3),
round(f1_score(y_te, y_pred_rf, pos_label=1), 3)],
})
display(metrics_df)
| | Metric | Baseline | RandomForest |
|---|---|---|---|
| 0 | Accuracy | 0.9 | 0.9 |
| 1 | F1 (minority class) | 0.0 | 0.0 |
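The Tip above also mentions AUC. A constant predictor ranks every sample identically, so its ROC AUC is exactly 0.5. A minimal sketch on synthetic data (mirroring the 450/50 split, with pure-noise features assumed for illustration):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y = np.array([0] * 450 + [1] * 50)   # same 450/50 imbalance as above
X = rng.standard_normal((500, 10))   # pure noise features

dummy = DummyClassifier(strategy='most_frequent').fit(X, y)
scores = dummy.predict_proba(X)[:, 1]  # constant score for every sample
print(roc_auc_score(y, scores))        # 0.5: no ranking ability at all
```

Any model worth keeping should push AUC meaningfully above that 0.5 floor, in addition to beating the accuracy baseline.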
6.4.1.3. Baseline Options for Regression#
For regression, the baseline predicts a fixed constant for every input.
diabetes = load_diabetes()
X_reg, y_reg = diabetes.data, diabetes.target
X_tr_r, X_te_r, y_tr_r, y_te_r = train_test_split(
X_reg, y_reg, test_size=0.2, random_state=42
)
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
strategies_reg = {
'mean': 'Predict the training mean for every sample',
'median': 'Predict the training median for every sample',
}
reg_rows = []
for strategy, description in strategies_reg.items():
dummy = DummyRegressor(strategy=strategy)
dummy.fit(X_tr_r, y_tr_r)
y_pred = dummy.predict(X_te_r)
reg_rows.append({
'Model': strategy,
'RMSE': round(np.sqrt(mean_squared_error(y_te_r, y_pred)), 1),
'MAE': round(mean_absolute_error(y_te_r, y_pred), 1),
'R²': round(r2_score(y_te_r, y_pred), 3),
'Rule': description,
})
lr = LinearRegression()
lr.fit(X_tr_r, y_tr_r)
y_pred_lr = lr.predict(X_te_r)
reg_rows.append({
'Model': 'LinearRegression',
'RMSE': round(np.sqrt(mean_squared_error(y_te_r, y_pred_lr)), 1),
'MAE': round(mean_absolute_error(y_te_r, y_pred_lr), 1),
'R²': round(r2_score(y_te_r, y_pred_lr), 3),
'Rule': 'Trained model',
})
display(pd.DataFrame(reg_rows))
| | Model | RMSE | MAE | R² | Rule |
|---|---|---|---|---|---|
| 0 | mean | 73.2 | 64.0 | -0.012 | Predict the training mean for every sample |
| 1 | median | 72.9 | 62.7 | -0.003 | Predict the training median for every sample |
| 2 | LinearRegression | 53.9 | 42.8 | 0.453 | Trained model |
Which strategy for regression?
- `mean` — the standard baseline. In-sample, \(R^2 = 0\) for this baseline by definition; on held-out data it lands near zero (here −0.012). Any clearly positive \(R^2\) means your model explains variance beyond the mean.
- `median` — preferred when outliers are present; more robust than the mean.
Note
\(R^2 = 0\) for the mean-predicting baseline is not a coincidence. \(R^2\) is defined as \(1 - \frac{SS_{res}}{SS_{tot}}\), where \(SS_{tot}\) is the total sum of squared deviations from the mean. Predicting the training mean makes \(SS_{res} = SS_{tot}\) on the training data, so \(R^2 = 0\) there (on a held-out split it is merely close to zero). A negative \(R^2\) means your model is worse than always guessing the mean — a serious warning sign.
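The identity in the Note is easy to verify numerically. A minimal sketch with made-up data:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = 50 + 10 * rng.standard_normal(100)  # arbitrary synthetic target

# In-sample, the predictions equal the mean of y, so SS_res == SS_tot
dummy = DummyRegressor(strategy='mean').fit(X, y)
print(r2_score(y, dummy.predict(X)))  # 0.0 (up to floating point)
```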
6.4.1.4. Using Baselines in Practice#
Establishing a baseline is the first step in any modelling workflow. Run it once, record the number, and treat it as the minimum bar your model must clear.
# Step 1: establish baseline
baseline = DummyClassifier(strategy='most_frequent', random_state=42)
baseline_scores = cross_val_score(baseline, X, y, cv=5, scoring='accuracy')
# Step 2: first real model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_scores = cross_val_score(rf, X, y, cv=5, scoring='accuracy')
lift = rf_scores.mean() - baseline_scores.mean()
glue('baseline-acc', f"{baseline_scores.mean():.3f} ± {baseline_scores.std():.3f}", display=False)
glue('rf-acc', f"{rf_scores.mean():.3f} ± {rf_scores.std():.3f}", display=False)
glue('lift-pp', f"{lift*100:.1f}", display=False)
workflow_df = pd.DataFrame({
'Step': ['Baseline (most_frequent)', 'RandomForest'],
'CV Accuracy (mean ± std)': [
f"{baseline_scores.mean():.3f} ± {baseline_scores.std():.3f}",
f"{rf_scores.mean():.3f} ± {rf_scores.std():.3f}",
],
})
display(workflow_df)
| | Step | CV Accuracy (mean ± std) |
|---|---|---|
| 0 | Baseline (most_frequent) | 0.627 ± 0.004 |
| 1 | RandomForest | 0.956 ± 0.023 |
RandomForest lifts cross-validated accuracy by 32.9 percentage points over the majority-class baseline. The lift, not the raw accuracy, is the real measure of progress.
6.4.1.5. Summary#
| Problem type | Recommended baseline | Key metric to compare |
|---|---|---|
| Binary classification | `most_frequent` | F1 on minority class, AUC |
| Multi-class classification | `stratified` | Macro F1 |
| Regression | `mean` | RMSE, R² |
| Regression with outliers | `median` | MAE |
Every trained model you build in this chapter will be measured against a baseline first. If it cannot beat a dummy that predicts the mean or the majority class, there is no point tuning it.
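The workflow above can be wrapped in a small helper. `lift_over_baseline` is a hypothetical convenience function (not part of scikit-learn), sketched here under the same cross-validation setup used in this section:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def lift_over_baseline(model, X, y, cv=5, scoring='accuracy'):
    """Cross-validated lift of `model` over a majority-class dummy,
    in percentage points of the chosen scoring metric."""
    base = cross_val_score(DummyClassifier(strategy='most_frequent'),
                           X, y, cv=cv, scoring=scoring).mean()
    real = cross_val_score(model, X, y, cv=cv, scoring=scoring).mean()
    return (real - base) * 100

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
print(f"{lift_over_baseline(rf, X, y):.1f} pp")  # ≈ 32.9 pp, as computed above
```

Reporting lift rather than raw accuracy keeps the baseline comparison front and centre in every experiment log.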