
Machine learning project analyzing 918 patient records using K-Means clustering and Decision Tree classification to identify cardiovascular risk patterns. Achieved 81% test accuracy with 88% precision.
Overview
Two-phase analysis: unsupervised clustering to segment patients into risk groups, then supervised classification to predict heart disease presence.
Approach
K-Means optimization (k=2 to k=15 via silhouette score), Decision Tree with 10-fold cross-validation, feature engineering for risk categories (age groups, cholesterol levels, MaxHR).
Results
Two distinct patient clusters identified (26% vs 79% disease prevalence). Decision Tree: 81.2% accuracy, 88% precision, 76% recall. ST_Slope identified as dominant predictor (70.6% feature importance).
Tools
Pythonpandasscikit-learnmatplotlibseaborn