Investigating Clinical Data Integration for Enhanced Healthcare Management: A Focus on Feature Determination Based on Mutual Information for Multiclassification Performance

Authors: Galeti Sivaprasad
DIN: IMJH-SVU-MAY-2024-5
Abstract

This paper introduces a novel feature selection algorithm based on Mutual Information (MI) that handles both continuous and discrete-valued features. Rather than scoring features one at a time, the algorithm evaluates the MI between combinations of features and the class labels, aiming to retain class-discriminative information while shrinking the feature set. The method is applied to the Vehicle dataset from the UCI Machine Learning Repository: dimensionality is first reduced by MI-based feature selection and then by a covering technique, and the resulting feature subset is preprocessed to ensure a consistent data distribution. Classification on the reduced set, particularly with a Multilayer Perceptron (MLP), shows promising results compared with conventional classifiers and with approaches that use only individual-feature information.

Keywords
Mutual Information (MI); Feature Selection; Multiclass Classification; Multilayer Perceptron (MLP); Clinical Data Integration
Introduction

Efficient feature selection for classification has a long history in pattern recognition. Reducing the input feature set simplifies both the construction and the operation of a classifier. Mutual Information (MI) quantifies the information shared between two variables; it is symmetric and non-negative, and it is zero exactly when the variables are independent, which makes it well suited to detecting dependencies. This paper introduces a feature selection algorithm based on MI that aims for broader applicability, better classification performance, lower computational complexity, and more insight into feature interactions within classification problems.
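To make the role of feature combinations concrete, the following is a minimal sketch and not the paper's implementation. It assumes discrete-valued features and a plug-in (frequency-count) estimate of I(X; Y) = sum over x, y of p(x, y) log2[p(x, y) / (p(x) p(y))]. The function names and the XOR toy data are hypothetical, chosen only for illustration; continuous features would first need discretization or a density-based estimator.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def joint_mutual_information(columns, labels):
    """I((X1, ..., Xk); Y): each row across the k columns is one symbol."""
    combined = list(zip(*columns))
    return mutual_information(combined, labels)


# Hypothetical toy data: the class is the XOR of two binary features.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(f1, f2)]

mi_f1 = mutual_information(f1, label)                 # feature 1 alone
mi_f2 = mutual_information(f2, label)                 # feature 2 alone
mi_pair = joint_mutual_information([f1, f2], label)   # both together

print(mi_f1, mi_f2, mi_pair)
```

On this toy data each feature alone carries no information about the class, while the pair determines it completely (one full bit). A ranking based on individual MI scores would discard both features, which is the situation that scoring combinations is meant to avoid.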

Conclusion

This study addresses the task of feature selection in classification. It introduces an approach that uses Mutual Information (MI) for feature elimination, together with MLP and Logistic Regression classifiers, to identify key features. Combining feature selection, preprocessing, and classification yields promising classification outcomes. Evaluation with several classification metrics indicates that reducing the feature count improves model accuracy. Efficient detection algorithms and feature selection methods are essential for identifying classes reliably within large datasets.
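One way subset-level MI could drive feature elimination is sketched below. This is an illustration under stated assumptions, not the procedure used in the study: it uses a simple greedy forward search, discrete features, and a frequency-count MI estimate, and it omits the covering technique, the preprocessing, and the classifier stage. The names greedy_select, a_copy, and noise, and the toy four-class data, are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def greedy_select(features, labels, k, tol=1e-12):
    """Greedy forward selection (an assumed strategy): at each step add the
    feature that most increases the joint MI between the selected subset
    and the labels; stop when no candidate adds information."""
    selected, best_score = [], 0.0
    while len(selected) < k:
        best_name, step_score = None, best_score
        for name in features:
            if name in selected:
                continue
            cols = [features[s] for s in selected + [name]]
            score = mutual_information(list(zip(*cols)), labels)
            if score > step_score + tol:  # strict gain; ties keep the earlier feature
                best_name, step_score = name, score
        if best_name is None:
            break
        selected.append(best_name)
        best_score = step_score
    return selected, best_score


# Hypothetical four-class toy data: the class is fixed by features a and b.
features = {
    "a":      [0, 0, 0, 0, 1, 1, 1, 1],
    "b":      [0, 0, 1, 1, 0, 0, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # redundant duplicate of a
    "noise":  [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the class
}
labels = [2 * a + b for a, b in zip(features["a"], features["b"])]

selected, score = greedy_select(features, labels, k=3)
print(selected, score)
```

Although up to three features are allowed, the search stops after two here: the duplicate and the noise column add nothing once a and b are selected. A greedy forward search cannot detect purely interactive features such as an XOR pair, so a practical method would need to evaluate larger combinations directly.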
