4.2.2. Numeric Transformations#

Numeric features in a dataset often need scaling, normalization, or transformation before being fed into machine learning models. Proper transformation can improve model performance, stability, and convergence in optimization algorithms.

4.2.2.1. Min-Max Scaling#

Min-max scaling rescales values to a fixed range, usually [0, 1].

\[ x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \]

Example: We can use MinMaxScaler from sklearn.preprocessing to do this.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

data = {"Salary": [50000, 70000, 120000, 150000, 200000]}
df = pd.DataFrame(data)

scaler = MinMaxScaler()
df["Salary_minmax"] = scaler.fit_transform(df[["Salary"]])
display(df)
Salary Salary_minmax
0 50000 0.000000
1 70000 0.133333
2 120000 0.466667
3 150000 0.666667
4 200000 1.000000

This is useful for algorithms like neural networks or distance-based models (KNN, K-Means) that are sensitive to magnitude.
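The formula can be verified by hand on the same Salary values, which also makes the mechanism explicit: the minimum maps to 0, the maximum maps to 1, and everything else lands proportionally in between.

```python
import numpy as np

salary = np.array([50000, 70000, 120000, 150000, 200000], dtype=float)

# Apply the min-max formula directly
scaled = (salary - salary.min()) / (salary.max() - salary.min())
print(scaled)
```

Note that because the formula depends on the observed min and max, a single extreme outlier can compress all other values into a narrow band near 0.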

4.2.2.2. Standardization (Z-score)#

Standardization centers the data around the mean and scales by the standard deviation:

\[ x' = \frac{x - \mu}{\sigma} \]

Example: We can use StandardScaler from sklearn.preprocessing to do this.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
df["Salary_zscore"] = scaler.fit_transform(df[["Salary"]])
display(df)
Salary Salary_minmax Salary_zscore
0 50000 0.000000 -1.254963
1 70000 0.133333 -0.885856
2 120000 0.466667 0.036911
3 150000 0.666667 0.590571
4 200000 1.000000 1.513338

Standardization is generally preferred for models that assume zero-centered inputs or rely on variance structure, such as SVM, logistic regression, and PCA.
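As a sanity check, the z-score formula can be computed manually. `np.std` defaults to the population standard deviation (dividing by n, not n - 1), which matches what `StandardScaler` uses, so the results agree with the table above.

```python
import numpy as np

salary = np.array([50000, 70000, 120000, 150000, 200000], dtype=float)

# Standardize manually: subtract the mean, divide by the population std
z = (salary - salary.mean()) / salary.std()

print(z.mean())  # effectively 0
print(z.std())   # effectively 1
print(z)
```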

4.2.2.3. Log Transformation#

Log transformation reduces the impact of skewed distributions and extreme values.

\[ x' = \log(x + 1) \]

Example:

import numpy as np

df["Salary_log"] = np.log1p(df["Salary"])
display(df)
Salary Salary_minmax Salary_zscore Salary_log
0 50000 0.000000 -1.254963 10.819798
1 70000 0.133333 -0.885856 11.156265
2 120000 0.466667 0.036911 11.695255
3 150000 0.666667 0.590571 11.918397
4 200000 1.000000 1.513338 12.206078

Log transformation is especially helpful for income, population, or any highly skewed features.
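`np.log1p` computes log(x + 1) and is paired with `np.expm1` as its exact inverse, so the original scale can always be recovered. A small sketch with a hypothetical skewed array shows how the transform tames an extreme value:

```python
import numpy as np

# Hypothetical skewed data with one extreme value
values = np.array([1_000, 2_000, 3_000, 5_000, 1_000_000], dtype=float)

logged = np.log1p(values)      # log(x + 1), safe even at x = 0
recovered = np.expm1(logged)   # exact inverse of log1p

print(logged)
print(np.allclose(recovered, values))
```

On the raw scale the largest value is 1000x the smallest; after the transform the spread is roughly a factor of two, so the outlier no longer dominates.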

4.2.2.4. Demo: How Normalization Improves Computation#

Algorithms like gradient descent converge faster when features are on similar scales. Let’s demonstrate with a simple linear regression.

from sklearn.linear_model import SGDRegressor
import time

# Large synthetic dataset
np.random.seed(0)
X = np.random.randint(0, 1000, size=(10000, 1))
y = 3 * X.squeeze() + 500 + np.random.randn(10000) * 100

# Without normalization
start = time.time()
model = SGDRegressor(max_iter=1000, tol=1e-3)
model.fit(X, y)
time_unscaled = time.time() - start

# With Min-Max scaling
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

start = time.time()
model = SGDRegressor(max_iter=1000, tol=1e-3)  # fresh estimator for a fair comparison
model.fit(X_scaled, y)
time_scaled = time.time() - start

print(f"Training time without scaling: {time_unscaled:.4f} s")
print(f"Training time with scaling: {time_scaled:.4f} s")
print(f"Boost: {time_unscaled / time_scaled:.2f}x faster")
Training time without scaling: 0.1446 s
Training time with scaling: 0.0114 s
Boost: 12.68x faster

Scaling the features reduces the number of iterations required for gradient descent to converge, thus improving computation speed and stability.
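One practical caveat: the scaler should be fit on training data only, then reused to transform test data, otherwise information leaks from the test set. scikit-learn's `Pipeline` handles this automatically. A minimal sketch, using the same synthetic data recipe as the demo above:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split

np.random.seed(0)
X = np.random.randint(0, 1000, size=(1000, 1)).astype(float)
y = 3 * X.squeeze() + 500 + np.random.randn(1000) * 100

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline fits the scaler on the training split only,
# then applies that same transform when scoring the test split.
pipe = make_pipeline(MinMaxScaler(), SGDRegressor(max_iter=1000, tol=1e-3))
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))  # R^2 on held-out data
```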