Identification of Biomarkers for Determining the Severity of SARS-CoV-2 Infection Using Deep Learning and RNAseq Data
Abstract
The COVID-19 pandemic, caused by SARS-CoV-2, poses a significant global health challenge, and disease severity varies widely among infected individuals. Understanding the molecular mechanisms of disease severity and identifying reliable prognostic biomarkers are crucial for effective management. This study aimed to investigate biomarkers associated with the severity of SARS-CoV-2 Beta infection using Deep Learning and to elucidate the underlying molecular mechanisms. RNAseq data from SARS-CoV-2 Beta- and Omicron-infected samples, obtained from GEO, were examined to identify differentially expressed genes (DEGs). Data augmentation strategies were used to increase dataset size and improve Machine Learning model accuracy for biomarker discovery.
Comparative analysis of the Beta and Omicron variants revealed shared molecular signatures, such as IFI27 and OTOF, and biological processes associated with type I IFN response, defense response to the virus, and regulation of viral life cycle, suggesting a common response to both virus strains. Beta severity analysis identified highly upregulated genes related to immune response, such as C8B, DEFA1, and CD177, which are involved in complement activation and neutrophil degranulation. Genes related to the Rh (RHAG), Dombrock (ART4), and MNS (GYPA) blood groups were also highly upregulated, suggesting potential associations with SARS-CoV-2 infection severity. Feature selection with the Recursive Feature Elimination with Cross-Validation (RFECV) method identified two sets of optimal gene features, one on the cGAN-augmented dataset and one on the cwGAN-augmented dataset. Gene Ontology enrichment analysis of the overlapping features revealed upregulation of neutrophil degranulation and downregulation of T-cell activity, consistent with previous findings. ROC analysis using a Random Forest machine learning model and the five most important biomarkers (CCDC65, ZNF239, OTUD7A, CEP126, and TCTN2) achieved high accuracy (AUC: 0.98, Acc: 0.94) in predicting disease severity. These genes are associated with biological processes such as cilium assembly, IFN activation, and NF-κB pathway suppression. These findings concur with previously identified mechanisms and offer novel insights into the transcriptional host response in severe COVID-19 cases. However, further experimental validation is required to assess the applicability of the identified biomarkers in diverse patient populations.
Publisher
University of Nairobi
Rights
Attribution-NonCommercial-NoDerivs 3.0 United States
Usage Rights
http://creativecommons.org/licenses/by-nc-nd/3.0/us/