Comparative Analysis of Supervised Learning Algorithms for Predicting the Cellular Localization Sites of Proteins with Yeast Dataset

Authors: D. Induja
DIN: IMJH-SVU-MAY-2023-4
Abstract

The analysis of protein localization sites is an important task in bioinformatics. Predicting yeast protein localization sites is a promising area for many research methods because the yeast protein measurement data contain many records and features. Proteins are an essential part of every organism and are involved in nearly every process in the cell. This paper presents a comparative analysis of two supervised learning algorithms, Multilayer Perceptron (MLP) and Logistic Regression, for predicting the cellular localization sites of proteins. The algorithms were evaluated on accuracy, precision, and recall. The results and their implications are discussed, leading to a conclusion about the effectiveness of each algorithm for this prediction task.

Keywords
Protein Cellular Localization Prediction; Yeast Dataset Analysis; Multilayer Perceptron (MLP); Logistic Regression Classifier; Supervised Learning in Bioinformatics
Introduction

A protein is a biomolecule made of one or more polypeptides folded into a fibrous or globular form, and it carries out biological functions. Polypeptides are linear chains of amino acids held together by peptide bonds between the carboxyl and amino groups of adjacent residues [5]. The Yeast Protein Localization data describe protein localization patterns in yeast (Saccharomyces cerevisiae). Predicting yeast protein localization sites is essential for understanding the functions and roles of the proteins involved in cellular processes. The localization sites can also be used to evaluate protein information inferred from gene data. Additionally, the correct localization site of an enzyme indicates which pathway it belongs to [6]. Given the importance of this prediction task, many researchers in biology and computer science have explored prediction methods in this domain, and a large number of methods and results have emerged. Despite recent technical advances, prediction accuracy remains low, and experimental determination of localization sites remains time-consuming and labour-intensive.

The nucleotide sequence of a gene determines the amino acid sequence of the protein it encodes. The genetic code typically specifies the twenty standard amino acids [6], and the pattern of protein folding varies with the organism and the role of the cell. Importantly, knowledge of protein sorting can also be applied in medical settings. Additionally, the information can be used to support genetic modification, improve protein quality, or meet other specific requirements.

Conclusion

In conclusion, the comparative analysis of the Multilayer Perceptron and Logistic Regression for predicting the cellular localization sites of proteins indicates that the Multilayer Perceptron outperforms Logistic Regression in terms of accuracy, precision, and recall. The results affirm the effectiveness of the Multilayer Perceptron as a robust algorithm for this specific predictive modeling task. Researchers and practitioners in bioinformatics can benefit from leveraging the Multilayer Perceptron for accurate protein localization predictions, contributing to advancements in protein research and related fields.
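
As an illustration of the three evaluation metrics named above, the following is a minimal sketch in plain Python. It is not the code used in this study. The function name `evaluate`, the use of macro-averaging across classes, and the toy label lists are all assumptions made for the example; the predictions belong to two hypothetical classifiers and are not results from the paper.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy, macro-averaged precision, and macro-averaged recall.

    Macro-averaging (an unweighted mean over classes) is an assumption of
    this sketch; the paper does not state which averaging it used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)   # how often each class actually occurs
    pred_count = Counter(y_pred)   # how often each class is predicted
    # True positives per class: positions where prediction equals truth.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(tp.values()) / len(y_true)
    # A class that is never predicted (or never occurs) contributes 0.0.
    precision = [tp[c] / pred_count[c] if pred_count[c] else 0.0 for c in labels]
    recall = [tp[c] / true_count[c] if true_count[c] else 0.0 for c in labels]

    return {
        "accuracy": accuracy,
        "precision": sum(precision) / len(labels),
        "recall": sum(recall) / len(labels),
    }


# Toy data only: three localization-site labels and the outputs of two
# hypothetical classifiers, A and B.
y_true = ["CYT", "CYT", "NUC", "NUC", "MIT", "MIT"]
pred_a = ["CYT", "CYT", "NUC", "CYT", "MIT", "MIT"]
pred_b = ["CYT", "NUC", "NUC", "CYT", "MIT", "CYT"]

scores_a = evaluate(y_true, pred_a)
scores_b = evaluate(y_true, pred_b)

for name, scores in (("A", scores_a), ("B", scores_b)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Scoring both classifiers with the same function on the same held-out labels keeps the comparison consistent. With imbalanced classes, macro-averaged and weighted metrics can differ noticeably, so the averaging choice should be reported alongside the scores.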
