6.2.1.6. Ensemble Methods#

A single model - however well-tuned - is limited by its assumptions and the particular random choices made during training. Ensemble methods combine multiple models so that their errors partially cancel out, producing a stronger predictor than any individual learner.

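The statistical argument is simple: if \(B\) models each make independent errors with mean zero and variance \(\sigma^2\), averaging their predictions reduces the variance to \(\sigma^2 / B\). In practice, model errors are correlated, so the gain is smaller: with pairwise correlation \(\rho\) between errors, the variance of the average is \(\rho\sigma^2 + (1-\rho)\sigma^2 / B\), which cannot fall below \(\rho\sigma^2\) however many models are added. The less correlated the models, the more averaging helps.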
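The variance argument can be checked numerically. The sketch below is illustrative only: it assumes Gaussian errors and builds a pairwise correlation `rho` from a shared noise component plus an independent one (a construction chosen here for convenience, not taken from this page). It compares the simulated variance of the averaged error with \(\rho\sigma^2 + (1-\rho)\sigma^2 / B\), which reduces to \(\sigma^2 / B\) when `rho` is zero.

```python
import numpy as np

rng = np.random.default_rng(0)
B, sigma, n_trials = 10, 2.0, 200_000   # illustrative values

def variance_of_average(rho):
    """Empirical variance of the mean of B errors with pairwise correlation rho."""
    shared = rng.standard_normal((n_trials, 1))   # component common to all models
    own = rng.standard_normal((n_trials, B))      # component unique to each model
    errors = sigma * (np.sqrt(rho) * shared + np.sqrt(1 - rho) * own)
    return errors.mean(axis=1).var()

for rho in (0.0, 0.5):
    theory = rho * sigma**2 + (1 - rho) * sigma**2 / B
    print(f"rho={rho}: simulated={variance_of_average(rho):.3f}, theory={theory:.3f}")
```

With independent errors the variance shrinks by a factor of \(B\); with correlated errors most of the shared component survives the averaging.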

There are three complementary strategies covered on this page:

| Strategy | Models trained | How predictions are combined |
|---|---|---|
| Voting | Different model types, in parallel | Average (or weighted average) of all predictions |
| Bagging | Same model type, on different bootstrap samples, in parallel | Average of all predictions |
| Stacking | Different base models + a meta-model | Meta-model learns from base-model outputs |

Boosting - the fourth major ensemble strategy - is treated separately on the Boosting page because its sequential training logic is fundamentally different.


1. Voting (Averaging)#

Intuition#

The simplest ensemble: train several diverse models independently and average their predictions. Diversity is the key - models that make different kinds of errors benefit most from averaging. Using a mix of model families (linear, tree-based, kernel-based) is a common strategy.

In scikit-learn#

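from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Three model families; SVR is scale-sensitive, so it is wrapped with its own scaler
voting = VotingRegressor(estimators=[
    ('ridge', Ridge(alpha=1.0)),
    ('dt',    DecisionTreeRegressor(max_depth=5)),
    ('svr',   Pipeline([('sc', StandardScaler()), ('svr', SVR(C=50))])),
])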

Weights can be assigned via the weights parameter to give stronger models more influence.
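As a minimal sketch of what the weights do, the example below fits a two-model `VotingRegressor` on synthetic data and checks that its prediction equals the weighted average of the members' predictions. The weight values `[3, 1]` and the dataset are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

weights = [3, 1]  # illustrative only: Ridge counts three times as much as the tree
voting = VotingRegressor(
    estimators=[
        ('ridge', Ridge(alpha=1.0)),
        ('dt', DecisionTreeRegressor(max_depth=5, random_state=0)),
    ],
    weights=weights,
).fit(X, y)

# Recompute the ensemble output by hand from the fitted members.
member_preds = np.column_stack([est.predict(X) for est in voting.estimators_])
manual = np.average(member_preds, axis=1, weights=weights)
print(np.allclose(voting.predict(X), manual))
```

In practice the weights would be chosen from validation performance rather than fixed by hand.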


2. Bagging (Bootstrap Aggregating)#

Intuition#

Bagging trains the same model type on many different bootstrap samples of the training data. Each bootstrap sample draws \(n\) points with replacement, so each model sees a slightly different view of the data.

High-variance models like deep decision trees make quite different predictions on different bootstrap samples - averaging smooths out those idiosyncratic errors. Bagging primarily reduces variance without increasing bias.

Note

If sampling is done without replacement instead of with replacement, the technique is called Pasting. With Pasting, each sub-sample contains only distinct points, so the sub-samples must be smaller than the training set for the models to differ at all. Bagging (with replacement) introduces more variation between sub-samples, which leads to less correlated models and often to slightly better generalisation.

The Math#

\[\hat{y}_{\text{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B} \hat{f}_b(x)\]

Each \(\hat{f}_b\) is trained on a bootstrap sample \(\mathcal{D}_b \sim \mathcal{D}\) (sampled with replacement). Roughly 37% of the original points are left out of each bootstrap sample - this out-of-bag (OOB) set can be used as a free validation set.
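The 37% figure follows from the chance that a given point is missed by all \(n\) draws, \((1 - 1/n)^n\), which approaches \(1/e \approx 0.368\) as \(n\) grows. A quick simulation, with \(n\) and the number of repetitions picked arbitrarily, agrees with this:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_bootstraps = 1_000, 200   # illustrative sizes

oob_fractions = []
for _ in range(n_bootstraps):
    sample = rng.integers(0, n, size=n)       # draw n indices with replacement
    in_bag = np.zeros(n, dtype=bool)
    in_bag[sample] = True
    oob_fractions.append(1 - in_bag.mean())   # share of points never drawn

print(f"simulated OOB fraction: {np.mean(oob_fractions):.3f}")
print(f"(1 - 1/n)^n           : {(1 - 1/n)**n:.3f}")
print(f"limit 1/e             : {np.exp(-1):.3f}")
```

Note that the figure applies to bootstrap samples of size \(n\); drawing fewer points per sample leaves a larger share out of bag.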

In scikit-learn#

from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

bag = BaggingRegressor(
    estimator=DecisionTreeRegressor(max_depth=8),
    n_estimators=50,
    max_samples=0.8,    # each bootstrap sample draws 80% of n points
    random_state=42
)

3. Stacking#

Intuition#

Stacking is a two-level ensemble. Level-0 (base) models are trained first; their predictions are then used as input features for a level-1 (meta) model that learns the best way to combine them.

The critical design challenge is preventing leakage: if the meta-model is trained on predictions that the base models make for the very data they were fitted on, those predictions look better than they would on unseen data. The meta-model then learns to trust the base models that memorise the training set best, and the stack overfits badly.

The two standard solutions are:

  1. Hold-out split - Split the training data into two parts. Train base models on the first part, then generate predictions on the second (held-out) part. Feed those held-out predictions as features to the meta-model. This is simple and fast, but wastes training data.

  2. Out-of-fold (OOF) cross-validation (recommended) - Use \(k\)-fold CV: for each fold, train base models on the other \(k-1\) folds and predict on the held-out fold. This produces a full set of leak-free predictions for every training sample, which the meta-model then learns from. No data is wasted.
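The out-of-fold procedure can be sketched by hand with `cross_val_predict`. This is an illustrative outline under simple assumptions (two base models, a Ridge meta-model, synthetic data), and `stacked_predict` is a helper name invented for the example; it is not meant as a replacement for a library implementation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

base_models = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=5, random_state=0)]

# One column per base model; each entry comes from a model that never saw that row.
oof_features = np.column_stack(
    [cross_val_predict(model, X, y, cv=5) for model in base_models]
)

# The meta-model is trained only on the out-of-fold predictions.
meta_model = Ridge(alpha=1.0).fit(oof_features, y)

# For new data, the base models are refit on the full training set.
fitted_base = [model.fit(X, y) for model in base_models]

def stacked_predict(X_new):
    """Hypothetical helper: base-model predictions fed through the meta-model."""
    level0 = np.column_stack([m.predict(X_new) for m in fitted_base])
    return meta_model.predict(level0)

print(oof_features.shape, stacked_predict(X[:5]).shape)
```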

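scikit-learn's StackingRegressor implements the OOF approach automatically via the cv parameter: the meta-model is trained on cross-validated predictions of the base models, and the base models are then refit on the full training set for use at prediction time.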

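The meta-model exploits the fact that the base models have different strengths. A linear meta-model such as Ridge learns one global weight per base model; a non-linear meta-model can go further and favour different base models in different regions of the input space.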

In scikit-learn#

from sklearn.ensemble import BaggingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

stacking = StackingRegressor(
    estimators=[                                    # base models
        ('ridge', Ridge(alpha=1.0)),
        ('dt',    DecisionTreeRegressor(max_depth=5)),
        ('bag',   BaggingRegressor(n_estimators=20)),
    ],
    final_estimator=Ridge(alpha=1.0),               # meta-model
    cv=5                                            # OOF folds
)
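After fitting, the meta-model can be inspected to see how much it relies on each base model. The sketch below assumes the same three base models and a synthetic dataset generated with `make_regression`; with a Ridge meta-model, `final_estimator_.coef_` holds one coefficient per base model, in the order the estimators were listed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=10, n_informative=6,
                       noise=25, random_state=42)

stack = StackingRegressor(
    estimators=[
        ('ridge', Ridge(alpha=1.0)),
        ('dt', DecisionTreeRegressor(max_depth=5, random_state=42)),
        ('bag', BaggingRegressor(n_estimators=20, random_state=42)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=5,
).fit(X, y)

# One learned weight per base model.
for (name, _), coef in zip(stack.estimators, stack.final_estimator_.coef_):
    print(f"{name:>5}: {coef:+.3f}")
```

A coefficient near zero suggests that the corresponding base model adds little beyond what the others already provide.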

Example#

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from myst_nb import glue
from sklearn.ensemble import VotingRegressor, BaggingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import make_regression

np.random.seed(42)

X, y = make_regression(n_samples=300, n_features=10, n_informative=6,
                        noise=25, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

def fit_eval(name, model):
    # Fit on the training split, then report train/test R² and test RMSE
    model.fit(X_train, y_train)
    test_pred = model.predict(X_test)
    tr   = round(r2_score(y_train, model.predict(X_train)), 3)
    te   = round(r2_score(y_test,  test_pred), 3)
    rmse = round(np.sqrt(mean_squared_error(y_test, test_pred)), 1)
    return {"Model": name, "Train R²": tr, "Test R²": te, "Test RMSE": rmse}, model

voting = VotingRegressor(estimators=[
    ('ridge', Ridge(alpha=1.0)),
    ('dt',    DecisionTreeRegressor(max_depth=5, random_state=42)),
    ('svr',   Pipeline([('sc', StandardScaler()), ('svr', SVR(C=50))])),
])

bagging = BaggingRegressor(
    estimator=DecisionTreeRegressor(max_depth=8),
    n_estimators=50, max_samples=0.8,
    random_state=42, n_jobs=-1
)

stacking = StackingRegressor(
    estimators=[
        ('ridge', Ridge(alpha=1.0)),
        ('dt',    DecisionTreeRegressor(max_depth=5, random_state=42)),
        ('bag',   BaggingRegressor(n_estimators=20, random_state=42)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=5
)

rows = []
models_fitted = {}
for name, model in [("Voting", voting), ("Bagging (50 trees)", bagging),
                    ("Stacking", stacking)]:
    row, m = fit_eval(name, model)
    rows.append(row)
    models_fitted[name] = m

results_df = pd.DataFrame(rows)
results_df
| Model | Train R² | Test R² | Test RMSE |
|---|---|---|---|
| Voting | 0.960 | 0.840 | 71.0 |
| Bagging (50 trees) | 0.949 | 0.765 | 86.2 |
| Stacking | 0.979 | 0.977 | 27.0 |

On the test set, Voting achieves \(R^2\) = 0.84, Bagging 0.765, and Stacking 0.977. Each improves over the single decision tree baseline computed in the next section (see also Decision Tree Regression).

Comparing Against a Single Decision Tree Baseline#

single_dt = DecisionTreeRegressor(max_depth=8, random_state=42)
single_dt.fit(X_train, y_train)
dt_r2 = round(r2_score(y_test, single_dt.predict(X_test)), 3)

all_names  = ["Single DT (depth=8)"] + results_df["Model"].tolist()
all_scores = [dt_r2] + results_df["Test R²"].tolist()

colors = ["#d9534f"] + ["#5bc0de", "#5cb85c", "#f0ad4e"]

fig, ax = plt.subplots(figsize=(9, 4))
bars = ax.bar(all_names, all_scores, color=colors, edgecolor="black", linewidth=0.7, alpha=0.85)
ax.axhline(dt_r2, color="grey", linestyle="--", linewidth=1.2, label=f"Single DT  R²={dt_r2}")
ax.set_ylabel("Test R²  (higher is better)", fontsize=12)
ax.set_title("Ensemble Strategies vs Single Model", fontsize=13, fontweight="bold")
ax.set_ylim(0, 1.05)
for bar, val in zip(bars, all_scores):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
            f"{val:.3f}", ha="center", fontsize=10)
ax.grid(True, alpha=0.3, axis="y")
plt.tight_layout()
plt.show()
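[Figure: "Ensemble Strategies vs Single Model" - bar chart of test R² for the single decision tree (depth 8) and the Voting, Bagging, and Stacking ensembles, with a dashed line at the single-tree score.]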

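The single decision tree baseline gives \(R^2\) = 0.445. All three ensemble strategies outperform it. Stacking provides by far the largest gain here, most likely because make_regression generates a linear target: the Ridge base model fits it well, and the meta-model can learn to weight it heavily. On other datasets the ranking may differ.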

How Many Bagging Estimators Are Enough?#

n_list = [1, 5, 10, 20, 50, 100, 200]
bag_scores = []
for n in n_list:
    m = BaggingRegressor(
        estimator=DecisionTreeRegressor(max_depth=8),
        n_estimators=n, random_state=42, n_jobs=-1
    )
    m.fit(X_train, y_train)
    bag_scores.append(r2_score(y_test, m.predict(X_test)))

plt.figure(figsize=(9, 4))
plt.plot(n_list, bag_scores, "o-", linewidth=2, markersize=7, color="steelblue")
plt.xlabel("Number of estimators", fontsize=12)
plt.ylabel("Test R²", fontsize=12)
plt.title("Bagging - Effect of Number of Estimators", fontsize=13, fontweight="bold")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
ValueError: Cannot register /dev/shm/joblib_memmapping_folder_2612_2bb2a4e7365942eb8e70324008b7050d_94a9e3be26414a8987f2fcb49aa2902c for automatic cleanup: unknown resource type folder
Traceback (most recent call last):
  File "/home/runner/.local/share/uv/python/cpython-3.13.12-linux-x86_64-gnu/lib/python3.13/multiprocessing/resource_tracker.py", line 371, in main
    raise ValueError(
        f'Cannot register {name} for automatic cleanup: '
        f'unknown resource type {rtype}')
ValueError: Cannot register /dev/shm/joblib_memmapping_folder_2612_2bb2a4e7365942eb8e70324008b7050d_94a9e3be26414a8987f2fcb49aa2902c for automatic cleanup: unknown resource type folder
Traceback (most recent call last):
  File "/home/runner/.local/share/uv/python/cpython-3.13.12-linux-x86_64-gnu/lib/python3.13/multiprocessing/resource_tracker.py", line 371, in main
    raise ValueError(
        f'Cannot register {name} for automatic cleanup: '
        f'unknown resource type {rtype}')
ValueError: Cannot register /dev/shm/joblib_memmapping_folder_2612_3f807f63b42a44709b005a6603dacd7e_d982338b10f74d2892010029ee142dab for automatic cleanup: unknown resource type folder
Traceback (most recent call last):
  File "/home/runner/.local/share/uv/python/cpython-3.13.12-linux-x86_64-gnu/lib/python3.13/multiprocessing/resource_tracker.py", line 371, in main
    raise ValueError(
        f'Cannot register {name} for automatic cleanup: '
        f'unknown resource type {rtype}')
ValueError: Cannot register /dev/shm/joblib_memmapping_folder_2612_2bb2a4e7365942eb8e70324008b7050d_9e34409b426c40f29641645d7eb226da for automatic cleanup: unknown resource type folder
Traceback (most recent call last):
  File "/home/runner/.local/share/uv/python/cpython-3.13.12-linux-x86_64-gnu/lib/python3.13/multiprocessing/resource_tracker.py", line 371, in main
    raise ValueError(
        f'Cannot register {name} for automatic cleanup: '
        f'unknown resource type {rtype}')
ValueError: Cannot register /dev/shm/joblib_memmapping_folder_2612_2bb2a4e7365942eb8e70324008b7050d_9e34409b426c40f29641645d7eb226da for automatic cleanup: unknown resource type folder
../../../../_images/10b17690cb5171c7dfdda440ee99881bc26af034d847a670a797313822054515.png

Performance improves rapidly up to ~20–50 estimators and then plateaus. Beyond that point, adding more models costs computation without meaningful gains.
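The plateau described above can be checked with a small experiment. The following is a minimal sketch, not the code behind the figure: the synthetic dataset from `make_regression`, the train/test split, and the grid of ensemble sizes are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration, not the page's dataset).
X, y = make_regression(n_samples=400, n_features=10, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit bagged, fully grown decision trees for several ensemble sizes
# and record the test-set R^2 of each ensemble.
scores = {}
for n in [1, 5, 20, 50, 100]:
    bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=n, random_state=0)
    bag.fit(X_train, y_train)
    scores[n] = bag.score(X_test, y_test)

for n, r2 in scores.items():
    print(f"n_estimators={n:>3}  test R^2={r2:.3f}")
```

On data like this, most of the improvement typically arrives within the first few dozen estimators; the exact point where the curve flattens depends on the dataset and the base model.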


Strengths and Weaknesses#

| Strategy | Best for | Watch out for |
|---|---|---|
| Voting | Quick ensemble of diverse models | All base models must be independently strong |
| Bagging | Reducing variance of high-variance models | Slow to train with many estimators; doesn’t help low-variance models much |
| Stacking | Squeezing out maximum performance | Risk of leakage if OOF predictions are not used correctly; slow; harder to interpret |
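The leakage risk noted for stacking arises when the meta-model is trained on predictions the base models made for their own training points. A minimal sketch of the safe pattern follows, assuming synthetic data and an illustrative pair of base models; scikit-learn's `StackingRegressor` builds the meta-model's training features from cross-validated (out-of-fold) predictions, controlled by its `cv` parameter.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration).
X, y = make_regression(n_samples=400, n_features=10, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

stack = StackingRegressor(
    estimators=[
        ("ridge", Ridge(alpha=1.0)),
        ("dt", DecisionTreeRegressor(max_depth=5, random_state=0)),
    ],
    final_estimator=Ridge(),  # meta-model fitted on base-model outputs
    cv=5,  # meta-model features are out-of-fold predictions from 5 folds
)
stack.fit(X_train, y_train)

stack_r2 = stack.score(X_test, y_test)
print(f"Stacking test R^2: {stack_r2:.3f}")
```

A hand-rolled stack that fits the base models on the full training set and then feeds their in-sample predictions to the meta-model would skip the out-of-fold step and tend to overstate how reliable the base models are.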

Tip

Stacking is often the most powerful but also the most expensive. Use it when you already have well-tuned base models and want a final performance boost. For everyday use, Random Forest - which is essentially optimised Bagging with additional decorrelation - is the better default.
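As a rough illustration of the "everyday default", the sketch below cross-validates a Random Forest on synthetic data (the dataset and hyperparameter values are assumptions). Note that in recent scikit-learn versions `RandomForestRegressor` considers all features at each split by default, so `max_features` is set explicitly here to obtain the extra decorrelation between trees.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic data (an assumption for illustration).
X, y = make_regression(n_samples=400, n_features=10, noise=20.0, random_state=0)

rf = RandomForestRegressor(
    n_estimators=100,   # number of bootstrap-trained trees
    max_features=0.5,   # each split considers a random half of the features
    random_state=0,
)

# 5-fold cross-validated R^2 (the default scorer for regressors).
cv_scores = cross_val_score(rf, X, y, cv=5)
print(f"Random Forest CV R^2: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

A baseline like this is a useful reference point: a stacked ensemble is worth its extra cost only if it beats the forest by a margin that matters for the task.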