Implement and understand the two most fundamental ML algorithms — linear regression for predicting continuous values and logistic regression for binary classification — both from scratch using NumPy and via scikit-learn.
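To make the "from scratch using NumPy" part concrete, here is a minimal sketch of linear regression trained with batch gradient descent on synthetic data. The data, learning rate, and iteration count are illustrative choices, not values from the lesson.

```python
import numpy as np

# Synthetic data: y = X @ [2, -1, 0.5] + 3 + small noise (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 3.0 + rng.normal(scale=0.1, size=200)

w = np.zeros(3)            # weights
b = 0.0                    # intercept
lr, n_iters = 0.1, 500     # hyperparameters chosen for this sketch
n = len(y)

for _ in range(n_iters):
    err = (X @ w + b) - y
    w -= lr * (2 / n) * (X.T @ err)  # gradient of MSE w.r.t. weights
    b -= lr * (2 / n) * err.sum()    # gradient of MSE w.r.t. intercept

print(np.round(w, 2), round(b, 2))  # should approach [2, -1, 0.5] and 3.0
```

The same update rule, with the sigmoid applied to the linear output and log-loss in place of MSE, gives logistic regression.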
💡 Late payments and debt ratio are the strongest default predictors — matching real-world credit risk intuition. Negative coefficients (credit score, income) reduce default probability.
Summary
| Algorithm | Use Case | Key Hyperparameter | Metric |
|---|---|---|---|
| Linear Regression | Continuous output | none (plain OLS) | R², RMSE |
| Logistic Regression | Binary classification | C (inverse regularisation) | Accuracy, ROC-AUC |
| Ridge | Regression + L2 penalty | alpha | RMSE |
| Lasso | Regression + feature selection | alpha | RMSE, non-zero coefs |
Key Takeaways:

- Gradient descent is the engine behind all of these models
- Always apply StandardScaler before logistic regression
- Use classification_report, not just accuracy
- Regularisation prevents overfitting — always tune C or alpha
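The scaling and evaluation takeaways fit naturally into one pipeline. A short sketch, using synthetic data as a stand-in for the credit-default dataset (the feature counts and class weights here are assumptions, not the lesson's data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced binary problem, roughly mimicking a default-prediction task
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.8, 0.2],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

# Pipeline fits the scaler on training data only, so there is no leakage
clf = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))
clf.fit(X_tr, y_tr)

# Per-class precision/recall/F1, not just overall accuracy
print(classification_report(y_te, clf.predict(X_te)))
```

Wrapping the scaler in a pipeline also means cross-validation over C re-scales correctly inside each fold.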
```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# High-noise regression with many features (only 10 of 50 are informative)
X, y = make_regression(n_samples=500, n_features=50, n_informative=10,
                       noise=30, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    'No regularisation': LinearRegression(),
    'Ridge (L2)': Ridge(alpha=1.0),
    'Lasso (L1)': Lasso(alpha=1.0, max_iter=5000),
}

for name, m in models.items():
    m.fit(X_tr, y_tr)
    train_r2 = r2_score(y_tr, m.predict(X_tr))
    test_r2 = r2_score(y_te, m.predict(X_te))
    # Coefficients that Lasso has driven (essentially) to zero
    n_zero = np.sum(np.abs(m.coef_) < 0.001)
    print(f"{name:<25} Train R²={train_r2:.3f} Test R²={test_r2:.3f} Zero coefs={n_zero}")
```
```
No regularisation         Train R²=0.992 Test R²=0.947 Zero coefs=0
Ridge (L2)                Train R²=0.982 Test R²=0.955 Zero coefs=0
Lasso (L1)                Train R²=0.964 Test R²=0.960 Zero coefs=40
```
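The comparison above uses a fixed alpha=1.0. Per the takeaway on tuning, scikit-learn's RidgeCV and LassoCV can pick alpha by cross-validation instead; a sketch on the same synthetic data (the alpha grids below are assumed, not from the lesson):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV

X, y = make_regression(n_samples=500, n_features=50, n_informative=10,
                       noise=30, random_state=42)

# Cross-validated alpha selection over log-spaced grids (grid choice is ours)
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)
lasso = LassoCV(alphas=np.logspace(-3, 1, 20), cv=5, max_iter=5000).fit(X, y)

print(f"Best Ridge alpha: {ridge.alpha_:.3g}")
print(f"Best Lasso alpha: {lasso.alpha_:.3g}")
```

The selected alphas are available as the `alpha_` attribute after fitting, and the refit models can be used directly for prediction.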