This project builds a machine learning model to predict whether a customer is likely to default on a loan. It covers the complete ML pipeline from data preprocessing to model evaluation and hyperparameter tuning.
The objective is to help financial institutions identify high-risk customers and make better lending decisions.
The dataset contains customer financial and demographic details such as:
- Age, Income, Savings
- Monthly Expenses
- Credit Score
- Loan Amount & Loan Term
- Employment Years
- Home Ownership
- Education
- Marital Status
- Region
- Recent Default History
Target variable: target_default_risk
- 0 → No Default
- 1 → Default
- Handled missing values using median imputation
- Fixed categorical inconsistencies (e.g., spelling issues)
- Removed irrelevant columns such as customer_id
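The cleaning steps above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the toy rows, the column names other than customer_id, and the specific spelling fixes are all assumptions.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (values and most column names are assumed).
df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "income": [52000.0, np.nan, 61000.0, 45000.0],
    "credit_score": [680.0, 720.0, np.nan, 590.0],
    "home_ownership": ["Rent", "rent ", "OWN", "Own"],
})

# Median imputation: fill each numeric column's gaps with that column's median.
num_cols = ["income", "credit_score"]
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# Fix categorical inconsistencies by trimming whitespace and unifying case.
df["home_ownership"] = df["home_ownership"].str.strip().str.lower()

# Drop the identifier column, which carries no predictive signal.
df = df.drop(columns=["customer_id"])
```

The median is used rather than the mean because it is less sensitive to extreme incomes or loan amounts.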
- Value counts and countplots for categorical features
- Distribution analysis of numerical features
- Correlation heatmap
- Outlier detection using boxplots
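The numeric side of these checks can be reproduced without plotting. The sketch below computes value counts, a correlation matrix, and the 1.5 × IQR rule that boxplot whiskers conventionally use to flag outliers. The data and column names are illustrative assumptions, and iqr_outlier_mask is a hypothetical helper.

```python
import pandas as pd

# Toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "income": [30000, 42000, 45000, 47000, 52000, 58000, 250000],
    "loan_amount": [5000, 8000, 9000, 9500, 12000, 15000, 60000],
    "region": ["north", "south", "south", "east", "north", "south", "east"],
})

# Category frequencies (the numbers behind a countplot).
region_counts = df["region"].value_counts()

# Pairwise Pearson correlations (the matrix behind a correlation heatmap).
corr = df[["income", "loan_amount"]].corr()


def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Rows a boxplot would draw as individual outlier points.
income_outliers = df.loc[iqr_outlier_mask(df["income"]), "income"]
```

With Seaborn, the matching plots would typically come from countplot, histplot, heatmap, and boxplot.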
Created new meaningful features:
- income_per_dependent
- savings_to_income_ratio
These features help capture financial pressure and stability.
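One plausible way to derive the two features is sketched below. The exact formulas are not stated here, so the input column names (income, savings, dependents) and the +1 in the denominator are assumptions.

```python
import numpy as np
import pandas as pd

# Toy data; the input column names are assumptions.
df = pd.DataFrame({
    "income": [60000.0, 45000.0, 30000.0],
    "savings": [12000.0, 4500.0, 0.0],
    "dependents": [2, 0, 3],
})

# Assumed formula: income spread across the customer plus dependents.
# The +1 also avoids division by zero when there are no dependents.
df["income_per_dependent"] = df["income"] / (df["dependents"] + 1)

# Savings relative to income; a zero income becomes NaN instead of inf.
df["savings_to_income_ratio"] = df["savings"] / df["income"].replace(0, np.nan)
```

A low income_per_dependent suggests higher financial pressure, while a higher savings_to_income_ratio suggests more of a buffer.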
- Train-test split (80-20)
- Ordinal Encoding for education
- One-Hot Encoding for categorical variables
- StandardScaler for numerical features
- Outlier treatment using IQR-based capping (Winsorization)
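The preprocessing steps above could be wired together roughly as shown below. This is a sketch under assumptions: the synthetic data, column names, and education ordering are invented. The stratified split and the choice to cap before scaling are also assumptions. Capping bounds and scaler statistics are learned from the training split only, to avoid leaking test information.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Synthetic stand-in for the dataset (all names and values are assumptions).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "income": rng.normal(50000, 15000, n),
    "credit_score": rng.normal(650, 60, n),
    "education": rng.choice(["high_school", "bachelor", "master"], n),
    "region": rng.choice(["north", "south", "east", "west"], n),
    "target_default_risk": rng.integers(0, 2, n),
})
df.loc[0, "income"] = 1_000_000  # plant one extreme value

X = df.drop(columns=["target_default_risk"])
y = df["target_default_risk"]

# 80-20 split; stratifying on the target is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_train, X_test = X_train.copy(), X_test.copy()

# IQR-based capping: clip values to [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
num_cols = ["income", "credit_score"]
q1 = X_train[num_cols].quantile(0.25)
q3 = X_train[num_cols].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
X_train[num_cols] = X_train[num_cols].clip(lower=lower, upper=upper, axis=1)
X_test[num_cols] = X_test[num_cols].clip(lower=lower, upper=upper, axis=1)

# Scale numeric columns, ordinal-encode education, one-hot encode region.
preprocessor = ColumnTransformer(
    [
        ("num", StandardScaler(), num_cols),
        ("edu", OrdinalEncoder(categories=[["high_school", "bachelor", "master"]]),
         ["education"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["region"]),
    ],
    sparse_threshold=0,  # always return a dense array
)
X_train_prepared = preprocessor.fit_transform(X_train)
X_test_prepared = preprocessor.transform(X_test)
```

Ordinal encoding suits education because its categories have a natural order, whereas one-hot encoding avoids implying an order among regions.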
The following models were trained and evaluated:
- Logistic Regression
- Decision Tree
- Support Vector Machine (SVM)
- Random Forest
- XGBoost
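A compact way to train and compare these models is a loop over a dictionary of estimators. The sketch below uses synthetic data from make_classification in place of the loan dataset, and the hyperparameters are illustrative defaults rather than the project's settings. XGBoost is added only if the package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the loan dataset.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": SVC(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# XGBoost is an optional dependency in this sketch.
try:
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier(n_estimators=100, random_state=42)
except ImportError:
    pass

# Fit each model and record its test-set accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
```

On the real data, Logistic Regression and SVM would normally be fitted on the scaled features, while tree-based models are largely insensitive to scaling.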
Hyperparameters were tuned using RandomizedSearchCV and GridSearchCV, focusing on improving:
- Random Forest
- XGBoost
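One common way to combine the two search strategies is a coarse randomized search followed by a small exhaustive grid around the best result. Whether the project used them in this order is an assumption. The parameter ranges, the F1 scoring choice, and the synthetic data below are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic data standing in for the preprocessed training set.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=42
)

# Stage 1: sample a handful of combinations from a wider space.
param_distributions = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="f1",
    random_state=42,
)
random_search.fit(X, y)

# Stage 2: exhaustive grid over a narrow neighbourhood of the best sample.
best = random_search.best_params_
param_grid = {
    "n_estimators": [best["n_estimators"]],
    "max_depth": [best["max_depth"]],
    "min_samples_split": sorted(
        {2, best["min_samples_split"], best["min_samples_split"] + 2}
    ),
}
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=3,
    scoring="f1",
)
grid_search.fit(X, y)
```

The same pattern applies to XGBoost by swapping in XGBClassifier and its own parameters, such as learning_rate and max_depth.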
Models were evaluated using:
- Accuracy
- Precision
- Recall
- F1 Score
- Confusion Matrix
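These metrics are all available in sklearn.metrics. The small hand-made example below (the labels are invented, not project results) shows how they relate to the confusion matrix, with class 1 meaning default.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Invented labels: 6 non-defaulters and 4 defaulters.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)            # [[5, 1], [1, 3]]

accuracy = accuracy_score(y_true, y_pred)        # (TN + TP) / total = 8 / 10 = 0.8
precision = precision_score(y_true, y_pred)      # TP / (TP + FP) = 3 / 4 = 0.75
recall = recall_score(y_true, y_pred)            # TP / (TP + FN) = 3 / 4 = 0.75
f1 = f1_score(y_true, y_pred)                    # harmonic mean of the two = 0.75
```

Recall on class 1 shows what share of actual defaulters the model caught, which accuracy alone can hide when defaults are rare.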
- Random Forest and XGBoost performed the best
- XGBoost achieved the highest accuracy after tuning
- Feature engineering significantly improved model performance
- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- XGBoost
- Jupyter Notebook
- Clone the repository

      git clone https://github.com/prabandkumar/loan-default-risk-prediction.git

- Install dependencies

      pip install -r requirements.txt

- Run the notebook

      jupyter notebook
- Build a web app using Flask or FastAPI
- Add model explainability (SHAP values)
- Deploy model on cloud
- Improve feature engineering