6.2.1.6. Ensemble Methods#

A single model - however well-tuned - is limited by its assumptions and the particular random choices made during training. Ensemble methods combine multiple models so that their errors partially cancel out, producing a stronger predictor than any individual learner.

The statistical argument is simple: if \(B\) models each make independent errors with mean zero and variance \(\sigma^2\), averaging their predictions reduces the variance to \(\sigma^2 / B\). In practice, model errors are correlated, so the gain is smaller - but it is still substantial.
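The effect of correlation can be checked numerically. For \(B\) errors with common variance \(\sigma^2\) and equal pairwise correlation \(\rho\), the variance of their average is \(\rho\sigma^2 + (1-\rho)\sigma^2/B\), which reduces to \(\sigma^2/B\) when \(\rho = 0\). The following is a minimal simulation sketch, not part of the original page; the helper name variance_of_average and the choice of Gaussian errors are assumptions made for illustration.

```python
import numpy as np

def variance_of_average(B, sigma=1.0, rho=0.0, n_trials=200_000, seed=0):
    """Empirical variance of the mean of B errors with pairwise correlation rho."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=(n_trials, 1))   # error component common to all models
    own = rng.normal(size=(n_trials, B))      # error component unique to each model
    # Each column has variance sigma^2; any two columns have correlation rho.
    errors = sigma * (np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own)
    return errors.mean(axis=1).var()

B = 10
var_independent = variance_of_average(B, rho=0.0)  # theory: 1/10 = 0.1
var_correlated = variance_of_average(B, rho=0.5)   # theory: 0.5 + 0.5/10 = 0.55
print(var_independent, var_correlated)
```

With correlated errors the variance cannot fall below \(\rho\sigma^2\) no matter how many models are averaged, which is why diversity between ensemble members matters.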

There are three complementary strategies covered on this page:

| Strategy | Models trained | How predictions are combined |
|---|---|---|
| Voting | Different model types, in parallel | Average (or weighted average) of all predictions |
| Bagging | Same model type, on different bootstrap samples, in parallel | Average of all predictions |
| Stacking | Different base models + a meta-model | Meta-model learns from base-model outputs |

Boosting - the fourth major ensemble strategy - is treated separately on the Boosting page because its sequential training logic is fundamentally different.


1. Voting (Averaging)#

Intuition#

The simplest ensemble: train several diverse models independently and average their predictions. Diversity is the key - models that make different kinds of errors benefit most from averaging. Using a mix of model families (linear, tree-based, kernel-based) is a common strategy.

In scikit-learn#

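from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

voting = VotingRegressor(estimators=[
    ('ridge', Ridge(alpha=1.0)),
    ('dt',    DecisionTreeRegressor(max_depth=5)),
    ('svr',   Pipeline([('sc', StandardScaler()), ('svr', SVR(C=50))])),  # SVR needs scaled inputs
])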

Weights can be assigned via the weights parameter to give stronger models more influence.
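As a small illustration of weighting, the sketch below fits a two-model VotingRegressor with weights 2 and 1 and checks that its prediction equals the weighted average of the individual fitted models. The synthetic dataset, the specific weights, and the variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only for illustration.
X, y = make_regression(n_samples=200, n_features=10, noise=25, random_state=0)

weights = [2, 1]  # Ridge counts twice as much as the tree
voting_w = VotingRegressor(
    estimators=[
        ('ridge', Ridge(alpha=1.0)),
        ('dt', DecisionTreeRegressor(max_depth=5, random_state=0)),
    ],
    weights=weights,
).fit(X, y)

# Recompute the ensemble prediction by hand from the fitted base models.
individual = np.column_stack([est.predict(X) for est in voting_w.estimators_])
manual = np.average(individual, axis=1, weights=weights)
matches = np.allclose(manual, voting_w.predict(X))
print(matches)
```

In practice the weights are usually chosen from validation performance rather than fixed by hand.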


2. Bagging (Bootstrap Aggregating)#

Intuition#

Bagging trains the same model type on many different bootstrap samples of the training data. Each bootstrap sample draws \(n\) points with replacement, so each model sees a slightly different view of the data.

High-variance models like deep decision trees make quite different predictions on different bootstrap samples - averaging smooths out those idiosyncratic errors. Bagging primarily reduces variance while leaving bias roughly unchanged, so it helps most when the base model is flexible and unstable.

Note

If sampling is done without replacement instead of with replacement, the technique is called Pasting. The key difference: with Pasting, each sub-sample contains unique points, so individual models see less diversity between them. Bagging (with replacement) typically generalises better because the overlapping bootstrap samples force more variation, leading to more decorrelated models.
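In scikit-learn the two variants differ only in the bootstrap flag of BaggingRegressor. The sketch below, on an assumed synthetic dataset and with a hypothetical helper fit_ensemble, fits both and inspects the row indices each base model was trained on: pasting sub-samples contain no repeated rows, while bootstrap samples do.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only for illustration.
X, y = make_regression(n_samples=200, n_features=10, noise=25, random_state=0)

def fit_ensemble(bootstrap):
    """Fit 10 trees on 80% sub-samples, drawn with or without replacement."""
    return BaggingRegressor(
        DecisionTreeRegressor(max_depth=8),  # base model, passed positionally
        n_estimators=10,
        max_samples=0.8,
        bootstrap=bootstrap,                 # True = bagging, False = pasting
        random_state=0,
    ).fit(X, y)

bagging_model = fit_ensemble(bootstrap=True)
pasting_model = fit_ensemble(bootstrap=False)

# estimators_samples_ holds the row indices drawn for each base model.
for label, model in [("bagging", bagging_model), ("pasting", pasting_model)]:
    idx = model.estimators_samples_[0]
    print(label, "drawn:", len(idx), "unique:", len(np.unique(idx)))
```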

The Math#

\[\hat{y}_{\text{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B} \hat{f}_b(x)\]

Each \(\hat{f}_b\) is trained on a bootstrap sample \(\mathcal{D}_b \sim \mathcal{D}\) (sampled with replacement). Roughly 37% of the original points are left out of each bootstrap sample - this out-of-bag (OOB) set can be used as a free validation set.
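The 37% figure follows from the probability that a given point is never drawn in \(n\) draws with replacement, \((1 - 1/n)^n\), which tends to \(1/e \approx 0.368\). The sketch below checks this with one simulated bootstrap sample and then asks BaggingRegressor for an OOB estimate via oob_score=True, which exposes the OOB \(R^2\) as oob_score_. The dataset and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Part 1: fraction of points left out of one bootstrap sample of size n.
n = 10_000
rng = np.random.default_rng(0)
sample = rng.integers(0, n, size=n)            # n indices drawn with replacement
oob_fraction = 1 - len(np.unique(sample)) / n  # share of points never drawn
theory = (1 - 1 / n) ** n                      # tends to 1/e as n grows

# Part 2: OOB score as a validation estimate without a separate hold-out set.
X, y = make_regression(n_samples=300, n_features=10, n_informative=6,
                       noise=25, random_state=42)
bag_oob = BaggingRegressor(
    DecisionTreeRegressor(max_depth=8),  # base model, passed positionally
    n_estimators=50,
    oob_score=True,                      # score each point using only trees that did not see it
    random_state=42,
).fit(X, y)

print(round(oob_fraction, 3), round(theory, 3), round(bag_oob.oob_score_, 3))
```

Note that the 37% figure assumes each bootstrap sample has the same size as the training set; with a smaller max_samples a larger share of points is out-of-bag.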

In scikit-learn#

from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

bag = BaggingRegressor(
    estimator=DecisionTreeRegressor(max_depth=8),
    n_estimators=50,
    max_samples=0.8,    # fraction of training data per bootstrap
    random_state=42
)

3. Stacking#

Intuition#

Stacking is a two-level ensemble. Level-0 (base) models are trained first; their predictions are then used as input features for a level-1 (meta) model that learns the best way to combine them.

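The critical design challenge is preventing leakage: if the meta-model is trained on predictions that the base models made for their own training data, those predictions look far more accurate than they would be on unseen data. The meta-model then learns to trust whichever base model has memorised the training set most closely, and the stacked ensemble overfits badly.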

The two standard solutions are:

  1. Hold-out split - Split the training data into two parts. Train base models on the first part, then generate predictions on the second (held-out) part. Feed those held-out predictions as features to the meta-model. This is simple and fast, but wastes training data.

  2. Out-of-fold (OOF) cross-validation (recommended) - Use \(k\)-fold CV: for each fold, train base models on the other \(k-1\) folds and predict on the held-out fold. This produces a full set of leak-free predictions for every training sample, which the meta-model then learns from. No data is wasted.
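The out-of-fold procedure in option 2 can be sketched by hand with cross_val_predict. This is a minimal illustration under assumed choices (synthetic data, two base models, a Ridge meta-model, and a hypothetical helper stacked_predict), not a replacement for a library implementation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only for illustration.
X, y = make_regression(n_samples=200, n_features=10, noise=25, random_state=0)

base_models = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=5, random_state=0)]
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Level 0: each row's prediction comes from a model that never saw that row.
oof = np.column_stack([cross_val_predict(m, X, y, cv=cv) for m in base_models])

# Level 1: the meta-model learns how to combine the leak-free predictions.
meta = Ridge(alpha=1.0).fit(oof, y)

# For prediction time, refit every base model on the full training set.
fitted = [m.fit(X, y) for m in base_models]

def stacked_predict(X_new):
    """Feed base-model predictions for X_new through the meta-model."""
    features = np.column_stack([m.predict(X_new) for m in fitted])
    return meta.predict(features)

print(oof.shape, stacked_predict(X[:5]).shape)
```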

scikit-learn’s StackingRegressor implements the OOF approach automatically via the cv parameter.

The meta-model exploits the fact that different base models are strong in different regions of the input space.

In scikit-learn#

from sklearn.ensemble import BaggingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

stacking = StackingRegressor(
    estimators=[                                    # base models
        ('ridge', Ridge(alpha=1.0)),
        ('dt',    DecisionTreeRegressor(max_depth=5)),
        ('bag',   BaggingRegressor(n_estimators=20)),
    ],
    final_estimator=Ridge(alpha=1.0),               # meta-model
    cv=5                                            # OOF folds
)
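After fitting, the meta-model can be inspected to see how much weight each base model receives. The sketch below assumes a linear meta-model (so that coef_ is available) and a synthetic dataset; the names stack and meta_weights are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only for illustration.
X, y = make_regression(n_samples=200, n_features=10, noise=25, random_state=0)

stack = StackingRegressor(
    estimators=[
        ('ridge', Ridge(alpha=1.0)),
        ('dt', DecisionTreeRegressor(max_depth=5, random_state=0)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=5,
).fit(X, y)

# One coefficient per base model, in the order the estimators were given.
names = [name for name, _ in stack.estimators]
meta_weights = dict(zip(names, stack.final_estimator_.coef_))
print(meta_weights)
```

A linear meta-model applies the same weights everywhere in the input space; a non-linear final_estimator would be needed for the combination to vary with the base-model outputs.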

Example#

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from myst_nb import glue
from sklearn.ensemble import VotingRegressor, BaggingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import make_regression

np.random.seed(42)

X, y = make_regression(n_samples=300, n_features=10, n_informative=6,
                        noise=25, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

def fit_eval(name, model):
    model.fit(X_train, y_train)
    tr   = round(r2_score(y_train, model.predict(X_train)), 3)
    te   = round(r2_score(y_test,  model.predict(X_test)),  3)
    rmse = round(np.sqrt(mean_squared_error(y_test, model.predict(X_test))), 1)
    return {"Model": name, "Train R²": tr, "Test R²": te, "Test RMSE": rmse}, model

voting = VotingRegressor(estimators=[
    ('ridge', Ridge(alpha=1.0)),
    ('dt',    DecisionTreeRegressor(max_depth=5, random_state=42)),
    ('svr',   Pipeline([('sc', StandardScaler()), ('svr', SVR(C=50))])),
])

bagging = BaggingRegressor(
    estimator=DecisionTreeRegressor(max_depth=8),
    n_estimators=50, max_samples=0.8,
    random_state=42, n_jobs=-1
)

stacking = StackingRegressor(
    estimators=[
        ('ridge', Ridge(alpha=1.0)),
        ('dt',    DecisionTreeRegressor(max_depth=5, random_state=42)),
        ('bag',   BaggingRegressor(n_estimators=20, random_state=42)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=5
)

rows = []
models_fitted = {}
for name, model in [("Voting", voting), ("Bagging (50 trees)", bagging),
                    ("Stacking", stacking)]:
    row, m = fit_eval(name, model)
    rows.append(row)
    models_fitted[name] = m

results_df = pd.DataFrame(rows)
results_df
| | Model | Train R² | Test R² | Test RMSE |
|---|---|---|---|---|
| 0 | Voting | 0.960 | 0.840 | 71.0 |
| 1 | Bagging (50 trees) | 0.949 | 0.765 | 86.2 |
| 2 | Stacking | 0.979 | 0.977 | 27.0 |

On the test set, Voting achieves \(R^2\) = 0.84, Bagging 0.765, and Stacking 0.977. Each improves over the single decision tree (see Decision Tree Regression).

Comparing Against a Single Decision Tree Baseline#

single_dt = DecisionTreeRegressor(max_depth=8, random_state=42)
single_dt.fit(X_train, y_train)
dt_r2 = round(r2_score(y_test, single_dt.predict(X_test)), 3)

all_names  = ["Single DT (depth=8)"] + results_df["Model"].tolist()
all_scores = [dt_r2] + results_df["Test R²"].tolist()

colors = ["#d9534f"] + ["#5bc0de", "#5cb85c", "#f0ad4e"]

fig, ax = plt.subplots(figsize=(9, 4))
bars = ax.bar(all_names, all_scores, color=colors, edgecolor="black", linewidth=0.7, alpha=0.85)
ax.axhline(dt_r2, color="grey", linestyle="--", linewidth=1.2, label=f"Single DT  R²={dt_r2}")
ax.set_ylabel("Test R²  (higher is better)", fontsize=12)
ax.set_title("Ensemble Strategies vs Single Model", fontsize=13, fontweight="bold")
ax.set_ylim(0, 1.05)
for bar, val in zip(bars, all_scores):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
            f"{val:.3f}", ha="center", fontsize=10)
ax.grid(True, alpha=0.3, axis="y")
plt.tight_layout()
plt.show()
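[Figure: bar chart "Ensemble Strategies vs Single Model" showing test R² for the single decision tree (depth=8), Voting, Bagging (50 trees), and Stacking, with a dashed line at the single-tree score.]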

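The single decision tree baseline gives a test \(R^2\) of 0.445. All three ensemble strategies outperform it, and Stacking provides the largest gain here because the meta-model learns how much to trust each base model. Note that make_regression generates a linear target, so the Ridge base model is well suited to this data and the meta-model can lean on it heavily; on other datasets the ranking may differ.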

How Many Bagging Estimators Are Enough?#

n_list = [1, 5, 10, 20, 50, 100, 200]
bag_scores = []
for n in n_list:
    m = BaggingRegressor(
        estimator=DecisionTreeRegressor(max_depth=8),
        n_estimators=n, random_state=42, n_jobs=-1
    )
    m.fit(X_train, y_train)
    bag_scores.append(r2_score(y_test, m.predict(X_test)))

plt.figure(figsize=(9, 4))
plt.plot(n_list, bag_scores, "o-", linewidth=2, markersize=7, color="steelblue")
plt.xlabel("Number of estimators", fontsize=12)
plt.ylabel("Test R²", fontsize=12)
plt.title("Bagging - Effect of Number of Estimators", fontsize=13, fontweight="bold")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
ValueError: Cannot register /dev/shm/joblib_memmapping_folder_3376_ea3967c7bdb14f9191c2d76856bb62cf_36c1c81565d84f7ea3dd91a44134de73 for automatic cleanup: unknown resource type folder
Traceback (most recent call last):
  File "/home/runner/.local/share/uv/python/cpython-3.13.12-linux-x86_64-gnu/lib/python3.13/multiprocessing/resource_tracker.py", line 371, in main
    raise ValueError(
        f'Cannot register {name} for automatic cleanup: '
        f'unknown resource type {rtype}')
ValueError: Cannot register /dev/shm/joblib_memmapping_folder_3376_ea3967c7bdb14f9191c2d76856bb62cf_36c1c81565d84f7ea3dd91a44134de73 for automatic cleanup: unknown resource type folder
Traceback (most recent call last):
  File "/home/runner/.local/share/uv/python/cpython-3.13.12-linux-x86_64-gnu/lib/python3.13/multiprocessing/resource_tracker.py", line 371, in main
    raise ValueError(
        f'Cannot register {name} for automatic cleanup: '
        f'unknown resource type {rtype}')
ValueError: Cannot register /dev/shm/joblib_memmapping_folder_3376_b3291b8ac4064a2aad68a3046f29c1ed_f7aa891d75d44eb1a69fe9d4d5270bc0 for automatic cleanup: unknown resource type folder
Traceback (most recent call last):
  File "/home/runner/.local/share/uv/python/cpython-3.13.12-linux-x86_64-gnu/lib/python3.13/multiprocessing/resource_tracker.py", line 371, in main
    raise ValueError(
        f'Cannot register {name} for automatic cleanup: '
        f'unknown resource type {rtype}')
ValueError: Cannot register /dev/shm/joblib_memmapping_folder_3376_ea3967c7bdb14f9191c2d76856bb62cf_c960cb9c3e7b4419b89ee0ba8ef21a8d for automatic cleanup: unknown resource type folder
Traceback (most recent call last):
  File "/home/runner/.local/share/uv/python/cpython-3.13.12-linux-x86_64-gnu/lib/python3.13/multiprocessing/resource_tracker.py", line 371, in main
    raise ValueError(
        f'Cannot register {name} for automatic cleanup: '
        f'unknown resource type {rtype}')
ValueError: Cannot register /dev/shm/joblib_memmapping_folder_3376_ea3967c7bdb14f9191c2d76856bb62cf_c960cb9c3e7b4419b89ee0ba8ef21a8d for automatic cleanup: unknown resource type folder
[Figure: ensemble performance as a function of the number of estimators]

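Performance improves rapidly up to roughly 20–50 estimators and then plateaus: each additional model removes less variance than the one before, and the correlation between models limits how far averaging can go. Beyond that point, adding more models costs computation without meaningful gains.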
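The plateau can be checked with a small sweep over `n_estimators`. The sketch below is illustrative only: the synthetic `make_friedman1` data, the train/test split, and the list of ensemble sizes are assumptions, not the setup behind the figure on this page, so the exact numbers will differ.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (an assumption, chosen only for illustration)
X, y = make_friedman1(n_samples=600, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

sizes = [1, 5, 20, 50, 100]  # hypothetical ensemble sizes to compare
scores = {}
for n in sizes:
    # Bag fully grown trees; the base model is passed positionally
    bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=n, random_state=0)
    bag.fit(X_train, y_train)
    scores[n] = bag.score(X_test, y_test)  # R^2 on the held-out split

for n, r2 in scores.items():
    print(f"n_estimators={n:>3}  test R^2={r2:.3f}")
```

On data like this, most of the improvement should come from the first few dozen estimators, with only small changes afterwards.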


Strengths and Weaknesses#

| Strategy | Best for | Watch out for |
|---|---|---|
| Voting | Quick ensemble of diverse models | All base models must be independently strong |
| Bagging | Reducing variance of high-variance models | Slow to train with many estimators; doesn’t help low-variance models much |
| Stacking | Squeezing out maximum performance | Risk of leakage if OOF predictions are not used correctly; slow; harder to interpret |
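The leakage risk listed for stacking is easiest to see by comparing in-sample predictions with out-of-fold (OOF) predictions from the same base model. The following is a minimal sketch on assumed synthetic data: a fully grown tree reproduces its own training targets almost perfectly, so a meta-model trained on those in-sample outputs would learn to trust the tree far too much.

```python
from sklearn.datasets import make_friedman1
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration)
X, y = make_friedman1(n_samples=400, noise=1.0, random_state=0)
tree = DecisionTreeRegressor(random_state=0)

# In-sample: the tree predicts rows it was trained on (leaky meta-feature)
in_sample = tree.fit(X, y).predict(X)

# Out-of-fold: every row is predicted by a model that never saw it
oof = cross_val_predict(tree, X, y, cv=5)

r2_in = r2_score(y, in_sample)
r2_oof = r2_score(y, oof)
print(f"in-sample R^2:   {r2_in:.3f}")
print(f"out-of-fold R^2: {r2_oof:.3f}")
```

The OOF score is the honest estimate of how the base model behaves on unseen data, which is what the meta-model should be trained on. scikit-learn's `StackingRegressor` builds its meta-features from cross-validated predictions controlled by its `cv` parameter, so the risk mainly arises when stacking is assembled by hand.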

Tip

Stacking is often the most powerful strategy but also the most expensive. Use it when you already have well-tuned base models and want a final performance boost. For everyday use, Random Forest, which is essentially bagging of decision trees with extra decorrelation from random feature subsets at each split, is the better default.
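As a rough starting point for the Random Forest default mentioned in the tip, a sketch like the one below is usually enough. The dataset, `n_estimators=100`, and `max_features=0.5` are illustrative assumptions rather than tuned recommendations.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic data (an assumption for illustration)
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)

# max_features < 1.0 makes each split consider a random subset of features,
# which is the decorrelation step on top of plain bagging
forest = RandomForestRegressor(n_estimators=100, max_features=0.5, random_state=0)

cv_r2 = cross_val_score(forest, X, y, cv=5, scoring="r2")
print(f"5-fold R^2: {cv_r2.mean():.3f} +/- {cv_r2.std():.3f}")
```

In recent scikit-learn versions `RandomForestRegressor` considers all features at each split by default, which makes it behave much like bagged trees, so setting `max_features` explicitly is what switches on the extra decorrelation.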