AI Approaches for Imputing Missing Values in Mammogram Mass Data
Abstract
Handling missing values is one of the main challenges of data preprocessing in data mining. Imputation, the process of replacing missing data with substituted values, is crucial for accurate analysis. Clinical diagnostic datasets often contain missing values, and discarding the incomplete records can introduce more problems than it solves. Traditional imputation methods are easy to implement but may bias the data. This paper proposes a data imputation technique based on K-Nearest Neighbors (KNN) to address the issue of missing data. The method combines KNN predictive modeling with a Support Vector Machine (SVM) for improved imputation. The aim of this study is to assess the impact of missing data on the knowledge discovery process in data mining. We explore AI techniques for missing value imputation using mammogram mass data from the UCI repository. The findings indicate that classifier performance improves when SVM is employed.
Introduction
Missing data, a common issue in data analysis, refers to the absence of data values for a variable of interest. It poses challenges for AI experts across various fields, from computational science to the social sciences. Managing missing data carefully is crucial because missing values can degrade the quality of analysis and modeling. Various techniques for handling missing values have been proposed, but no method is universally best. The goal of these techniques is to impute missing values from the available data, and this must be done before data mining methods are applied if the analysis is to be accurate. This study explores the performance of the KNN algorithm for imputing missing data and its impact on model accuracy, using SVM for classification on the imputed dataset [1].
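The KNN imputation step can be sketched as follows. This is a minimal illustration in standard-library Python, not the implementation used in the study: the function names `nan_euclidean` and `knn_impute`, the choice of k = 3, the use of `None` as the missing-value marker, and the toy rows are all assumptions made for the example. Each missing entry is replaced by the mean of that feature over the k nearest rows in which the feature is observed, with distances computed only over the features both rows share.

```python
import math


def nan_euclidean(a, b):
    """Euclidean distance over features observed in both rows,
    rescaled to compensate for the features that are missing."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common features: rows cannot be compared
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=3):
    """Return a copy of rows in which each None is replaced by the mean of
    that feature over the k nearest rows that have the feature observed."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where feature j is observed.
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for t, other in enumerate(rows)
                if t != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:  # leave the gap if no comparable donor exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy numeric rows invented for illustration (not taken from the UCI data).
rows = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],   # third feature is missing
    [1.1, 2.1, 3.2],
    [8.0, 9.0, 10.0],   # distant row that should not influence the estimate
    [0.9, 1.9, 2.8],
]

imputed = knn_impute(rows, k=3)
print(imputed[1])
```

For this toy input the KNN estimate of the missing entry is approximately 3.0, whereas filling with the column mean would give 4.75 because the distant fourth row pulls the mean upward. The imputed matrix would then be passed to the SVM classifier. In practice, libraries such as scikit-learn provide `KNNImputer` and `SVC` for these two steps.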
Conclusion
This paper evaluates approaches used to fill in missing values and proposes a KNN-based approach for handling them, so that complete input can be fed to the SVM classifier to support better prediction, diagnosis and treatment decisions based on the mammographic data. The proposed KNN method serves as an effective data imputation method for SVM classification when information is missing.