based on various risk factors such as age, diseases, and smoking status
using supervised machine learning
According to the World Health Organization (WHO), stroke is the second leading cause of death globally, responsible for approximately 11% of total deaths. A stroke happens when the blood supply to part of the brain is blocked or reduced. Brain cells then begin to die within minutes because they no longer receive the oxygen and nutrients they need to survive.
Many factors can increase the risk of stroke, such as:
To visualise the relationship between various risk factors and the probability of having a stroke, and to design a predictive model that determines whether a patient is likely to have a stroke based on risk factors such as gender, age, various diseases, and smoking status.
The analysis comprised four main stages: understanding the data, data cleaning, exploratory data analysis (EDA), and classification of patients by building supervised machine learning models. I trained five classification models: logistic regression, decision tree, random forest, support vector classifier (SVC), and k-nearest neighbours (KNN). Because the data has a large class imbalance, I oversampled the minority class using SMOTE. The oversampling and scaling are done within a pipeline, so both are fitted on the training data only and no information leaks from the held-out data.
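To illustrate why the order of these steps matters, here is a minimal sketch in plain Python. It is not the project's code: the toy data, the feature names, and the helper functions `stratified_split`, `fit_scaler`, `scale`, and `smote_like` are all hypothetical, and `smote_like` is a simplified stand-in for SMOTE (it interpolates between a minority sample and one of its nearest minority neighbours). The point it shows is that the split happens first, and the scaler and the oversampler only ever see the training rows.

```python
import random
from statistics import mean, pstdev


def stratified_split(y, test_fraction=0.2):
    """Return (train_idx, test_idx), keeping class proportions in both parts."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        cut = int(round(len(idx) * (1 - test_fraction)))
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def fit_scaler(rows):
    """Compute per-column mean and standard deviation from the given rows."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # fall back to 1.0 for constant columns
    return means, stds


def scale(rows, means, stds):
    """Standardise rows using statistics that were fitted elsewhere."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def smote_like(X, y, minority=1, k=3, seed=0):
    """Simplified SMOTE-style oversampling for a binary problem.

    Adds synthetic minority rows, each placed at a random point on the line
    between a minority row and one of its k nearest minority neighbours,
    until both classes have the same number of rows.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_rows)
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        X_new.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new


# Hypothetical toy data: [age, average glucose level]; label 1 (stroke) is rare.
data_rng = random.Random(42)
X = [[data_rng.uniform(20, 80), data_rng.uniform(70, 200)] for _ in range(100)]
y = [1 if i % 10 == 0 else 0 for i in range(100)]

# 1. Split first, so the test rows are never seen by the steps below.
train_idx, test_idx = stratified_split(y)
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

# 2. Fit the scaler on the training rows only, then apply it to both parts.
means, stds = fit_scaler(X_train)
X_train_s = scale(X_train, means, stds)
X_test_s = scale(X_test, means, stds)

# 3. Oversample the training rows only; the test set keeps its real imbalance.
X_res, y_res = smote_like(X_train_s, y_train)

print("train before:", y_train.count(0), "vs", y_train.count(1))
print("train after: ", y_res.count(0), "vs", y_res.count(1))
print("test (untouched):", y_test.count(0), "vs", y_test.count(1))
```

With cross-validation, the same three steps would be repeated inside every fold, which is what wrapping the scaler and the oversampler in a pipeline takes care of automatically.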
You can find the full project details and code on my GitHub account by clicking the button below.