A Data-Driven Analysis for Predicting Polycystic Ovary Syndrome (PCOS) Using Clinical and Hormonal Indicators
Abstract
Polycystic Ovary Syndrome (PCOS) is a prevalent endocrine disorder among women of reproductive age, characterized by hormonal imbalance and irregular menstrual cycles. Early diagnosis is crucial to prevent long-term complications such as infertility, diabetes, and cardiovascular disease. This study analyzes a clinical dataset of 1,000 women to develop predictive models for PCOS from features such as BMI, testosterone level, menstrual irregularity, and antral follicle count. Machine learning classifiers are trained on these features and used to identify the most influential predictors. Results indicate that antral follicle count and testosterone level are the most important features and that the models achieve over 90% classification accuracy, supporting the viability of automated diagnostic tools in clinical practice.
Introduction
PCOS affects millions of women globally, yet it remains underdiagnosed because its symptoms vary and standardized screening protocols are lacking. Key manifestations include irregular periods, elevated androgen levels, and polycystic ovaries. Applying machine learning to clinical data offers a promising route to timely and accurate diagnosis.
This research analyzes structured clinical data to uncover patterns and risk factors associated with PCOS and to develop predictive models that assist in early identification and intervention.
Conclusion
This study supports the utility of machine learning models for diagnosing PCOS from basic clinical data. The Random Forest model achieved 91% accuracy, indicating strong performance, and its feature ranking aids interpretability: antral follicle count, testosterone levels, and menstrual history emerged as the most influential features. These models could serve as decision-support tools in gynecology clinics, improving early detection and management of PCOS.
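The workflow summarized here (train an ensemble classifier, measure held-out accuracy, rank the features) can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic, generated with assumed distributions chosen purely for illustration, and the model is a small ensemble of bagged decision stumps written with the Python standard library as a simplified stand-in for a Random Forest. All function names are hypothetical, and the importance score (the share of stumps that split on each feature) is only a rough proxy for the feature importances a full Random Forest implementation would report.

```python
import random

FEATURES = ["bmi", "testosterone", "menstrual_irregularity", "antral_follicle_count"]


def make_synthetic(n, rng):
    """Generate illustrative records. These are NOT the study's data."""
    rows, labels = [], []
    for _ in range(n):
        pcos = rng.random() < 0.4  # assumed prevalence, for illustration only
        rows.append([
            rng.gauss(27 if pcos else 24, 4),              # BMI
            rng.gauss(70 if pcos else 40, 12),             # testosterone
            float(rng.random() < (0.8 if pcos else 0.2)),  # irregular cycles (0/1)
            rng.gauss(22 if pcos else 10, 4),              # antral follicle count
        ])
        labels.append(int(pcos))
    return rows, labels


def fit_stump(rows, labels, feature_ids):
    """Return the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for j in feature_ids:
        values = sorted(r[j] for r in rows)
        step = max(1, len(values) // 20)  # try roughly 20 candidate thresholds
        for t in values[::step]:
            # Errors when predicting PCOS for value > t (polarity 1);
            # the flipped rule (polarity 0) makes the complementary errors.
            errors = sum((r[j] > t) != bool(y) for r, y in zip(rows, labels))
            for polarity, e in ((1, errors), (0, len(rows) - errors)):
                if best is None or e < best[0]:
                    best = (e, j, t, polarity)
    return best[1:]


def fit_forest(rows, labels, n_trees, rng):
    """Train each stump on a bootstrap sample and a random pair of features."""
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(len(FEATURES)), 2)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = sum(int((row[j] > t) == bool(p)) for j, t, p in forest)
    return int(votes * 2 > len(forest))


def run_demo(seed=0, n=1000, n_trees=51):
    rng = random.Random(seed)
    rows, labels = make_synthetic(n, rng)
    split = int(0.7 * n)  # 70/30 train/test split
    forest = fit_forest(rows[:split], labels[:split], n_trees, rng)
    hits = sum(predict(forest, r) == y
               for r, y in zip(rows[split:], labels[split:]))
    accuracy = hits / (n - split)
    # Importance proxy: fraction of stumps that chose each feature.
    importance = {name: 0.0 for name in FEATURES}
    for j, _, _ in forest:
        importance[FEATURES[j]] += 1 / n_trees
    return accuracy, importance


accuracy, importance = run_demo()
print(f"held-out accuracy: {accuracy:.2f}")
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {score:.2f}")
```

Because the synthetic labels are driven mostly by follicle count and testosterone, those features tend to dominate the ranking; that reflects how the data were generated and is not independent evidence for the study's findings. With real clinical data, an established library implementation of Random Forest, together with cross-validation and metrics beyond accuracy such as sensitivity and specificity, would be the more appropriate choice.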