Predicting the Onset of Diabetes using Clinical and Demographic Features

Authors: Sandagiri Gayathri
DIN: IMJH-SVU-JAN-2025-16
Abstract

The rising global prevalence of diabetes calls for effective diagnostic tools. This study explores the prediction of diabetes onset from clinical and demographic variables, including glucose level, BMI, age, and family history. Using a well-established dataset of 768 women from the Pima Indian population, we perform exploratory analysis and build a logistic regression model that estimates the probability of diabetes. The model reaches approximately 75% accuracy, with glucose level and BMI emerging as strong predictors. These findings highlight the potential of machine learning to improve early diabetes detection and prevention strategies.

Keywords
Diabetes Onset Prediction; Clinical and Demographic Features; Logistic Regression Model; Pima Indian Dataset Analysis; Early Disease Detection Analytics
Introduction

Diabetes mellitus, especially type 2 diabetes, is a chronic metabolic disorder characterized by elevated blood glucose levels. It often goes undetected until complications arise, which makes better early-warning mechanisms necessary. This study applies machine learning techniques to a clinical dataset to identify the features that contribute to diabetes and to predict its onset. Such predictions can support timely interventions and potentially reduce healthcare burdens.

Conclusion

This study used a logistic regression model to predict diabetes onset based on clinical and demographic data. Key takeaways include: 

Glucose, BMI, and age are the features most strongly associated with diabetes.

Logistic regression achieved approximately 75% accuracy, offering a simple yet effective tool for early screening.

More advanced models or ensemble methods may further boost prediction quality.
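
The study's code is not reproduced here, so the following is only a rough sketch of the kind of model described above: a logistic regression fitted by plain batch gradient descent. The data are synthetic stand-ins, not the Pima records. The generating coefficients, sample size, learning rate, train/test split and all function names are assumptions chosen for illustration and do not come from the study.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, row):
    """Modelled probability of diabetes for one standardized feature row."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = predict_proba(weights, bias, row) - label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def column_stats(X):
    """Per-column mean and (population) standard deviation."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
            for col, m in zip(zip(*X), means)]
    return means, stds


def standardize(X, means, stds):
    """Rescale each column to zero mean and unit variance."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


# Synthetic stand-in data (assumption): glucose, BMI and age are drawn at
# random, and labels follow a made-up logistic rule. None of these numbers
# are taken from the study or from the Pima dataset.
rng = random.Random(0)
rows, labels = [], []
for _ in range(400):
    glucose = rng.gauss(120, 30)
    bmi = rng.gauss(32, 7)
    age = rng.uniform(21, 70)
    logit = 0.04 * (glucose - 120) + 0.08 * (bmi - 32) + 0.02 * (age - 35) - 0.5
    rows.append([glucose, bmi, age])
    labels.append(1 if rng.random() < sigmoid(logit) else 0)

# Hold out the last 100 rows; scale both parts with training statistics only.
means, stds = column_stats(rows[:300])
X_train, y_train = standardize(rows[:300], means, stds), labels[:300]
X_test, y_test = standardize(rows[300:], means, stds), labels[300:]

weights, bias = fit_logistic(X_train, y_train)
predictions = [1 if predict_proba(weights, bias, r) >= 0.5 else 0 for r in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)

print("weights:", dict(zip(["glucose", "bmi", "age"], weights)))
print("held-out accuracy:", accuracy)
```

Because the features are standardized before fitting, the magnitudes of the learned weights can be compared with one another, which is one simple way to see which variables dominate. To apply the sketch to real records, the synthetic generator would be replaced by code that loads the actual dataset. The accuracy printed here reflects only the synthetic data and says nothing about the study's result.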
