An Excited Cuckoo Search-Grey Wolf Adaptive Kernel SVM for Effective Pattern Recognition in DNA Microarray Cancer Chips

Segera, Rene D

Date

2021

Type

Thesis

Abstract

The scarcity of patient samples, the curse of dimensionality and the class imbalance of the available DNA microarray chips remain major obstacles to classifying cancerous tissues accurately and reliably without overfitting. These challenges are magnified when resource-constrained devices (limited computational power and memory) such as smartphones, tablets and personal digital assistants are used to mine these datasets, which makes effective portable microarray data mining very difficult to achieve. Gene selection and classification have therefore become the most researched topics in DNA microarray-based cancer diagnosis. An effective gene selection phase derives an informative gene subset from an otherwise high-dimensional dataset to reduce noise, computational overhead and model overfitting. An enhanced learning and classification phase, in turn, builds a model that accurately and reliably classifies a given patient DNA sample. This research formulates a novel memetic approach for optimal gene selection: Excited (E) Adaptive Cuckoo Search (ACS) Intensification Dedicated Grey Wolf Optimization (IDGWO), i.e. EACSIDGWO. In EACSIDGWO, the step size of the ACS and the nonlinear control strategy of the parameter a of the IDGWO are made adaptive using the complete voltage and current responses of a direct current (DC) excited resistor-capacitor (RC) circuit. Since the population has a higher diversity at the early stages of the proposed EACSIDGWO algorithm, both the ACS and the IDGWO are jointly involved in local exploitation. To enhance mature convergence at the later stages of the algorithm, the role of the ACS is switched to global exploration while the IDGWO continues to conduct local exploitation.
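The RC-circuit idea above can be illustrated with a short sketch. The abstract does not give the thesis's actual update equations, so everything here is an assumption made for illustration: the function names, the `tau_fraction` parameter, and the choice to map the iteration counter onto circuit time. The only fixed ingredients are the textbook complete response of a DC-excited RC circuit, v(t) = Vs + (V0 - Vs) exp(-t/tau), and its current, i(t) = ((Vs - V0)/R) exp(-t/tau). For comparison, the standard grey wolf optimizer decreases its control parameter a linearly from 2 to 0.

```python
import math

def rc_voltage(t, v0, vs, tau):
    """Complete voltage response of a DC-excited RC circuit.

    v0 is the initial capacitor voltage, vs the source (final) voltage,
    tau = R * C the time constant.
    """
    return vs + (v0 - vs) * math.exp(-t / tau)

def rc_current(t, v0, vs, r, tau):
    """Current through the resistor for the same circuit."""
    return (vs - v0) / r * math.exp(-t / tau)

def rc_schedule(iteration, max_iter, start, end, tau_fraction=0.2):
    """Hypothetical schedule: treat the iteration index as circuit time.

    The value starts at `start` and relaxes exponentially towards `end`.
    tau_fraction (an assumed knob, not from the thesis) sets the time
    constant as a fraction of the iteration budget.
    """
    tau = tau_fraction * max_iter
    return rc_voltage(iteration, start, end, tau)

if __name__ == "__main__":
    # A GWO-style control parameter relaxing from 2 towards 0.
    for it in range(0, 101, 25):
        print(it, round(rc_schedule(it, 100, 2.0, 0.0), 4))
```

A schedule of this shape changes quickly in early iterations and flattens later, which is one plausible reading of how an RC response could drive both a step size and a control parameter nonlinearly.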
The performance of EACSIDGWO as a gene selector is evaluated on six standard DNA microarray chips drawn from the University of California, Irvine (UCI) repository, namely Ovarian Cancer (4000 genes), Central Nervous System Cancer (7129 genes), Colon Cancer (2000 genes), Breast Cancer Wisconsin (prognosis) (33 genes), Breast Cancer Wisconsin (diagnostic) (30 genes) and SPECTF Heart Cancer (44 genes). EACSIDGWO achieved the most compact informative gene subsets along with the highest classification accuracies, as follows: Ovarian Cancer (274 genes, 100%), Central Nervous System Cancer (1208 genes, 72%), Colon Cancer (538 genes, 91%), Breast Cancer Wisconsin (prognosis) (5 genes, 87%), Breast Cancer Wisconsin (diagnostic) (3 genes, 98%) and SPECTF Heart Cancer (4 genes, 88%). Extended Binary Cuckoo Search (EBCS), the second-best published state-of-the-art algorithm, attained the following: Ovarian Cancer (1811 genes, 99%), Central Nervous System Cancer (3446 genes, 67%), Colon Cancer (988 genes, 89%), Breast Cancer Wisconsin (prognosis) (6 genes, 86%), Breast Cancer Wisconsin (diagnostic) (3 genes, 97%) and SPECTF Heart Cancer (6 genes, 86%). The results indicate that the proposed technique is superior both in reducing the size of informative gene subsets and in locating the most significant gene subsets.
To improve the performance of the classification phase (the last stage of DNA microarray-based cancer analysis), another novel hybrid model is proposed. This model is based on particle swarm optimization (PSO), principal component analysis (PCA) and a multiclass support vector machine (MCSVM), i.e. PSO-PCA-LGP-MCSVM. The MCSVM adopts a novel hybrid Linear-Gaussian-Polynomial (LGP) kernel formulated in this research. The hybrid LGP kernel combines the advantages of three standard kernels (Linear, Gaussian and Polynomial): a Gaussian kernel embedding a Polynomial kernel is linearly combined with a Linear kernel.
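The abstract describes the LGP kernel only in words, so the following is one possible reading, not the thesis's formula. It assumes that "embedding" means evaluating the Gaussian kernel on the squared distance induced by the polynomial kernel, d²(x, y) = Kp(x, x) - 2 Kp(x, y) + Kp(y, y), and that the linear combination is a convex mix controlled by a hypothetical `weight`. The names `gamma`, `degree`, `coef0` and `weight` are illustrative parameters.

```python
import numpy as np

def linear_kernel(X, Y):
    """Gram matrix of plain inner products."""
    return X @ Y.T

def polynomial_kernel(X, Y, degree=2, coef0=1.0):
    """Standard polynomial kernel (x . y + coef0) ** degree."""
    return (X @ Y.T + coef0) ** degree

def gaussian_of_polynomial(X, Y, gamma=0.1, degree=2, coef0=1.0):
    """Gaussian kernel on the distance induced by the polynomial kernel."""
    kxy = polynomial_kernel(X, Y, degree, coef0)
    kxx = (np.einsum("ij,ij->i", X, X) + coef0) ** degree
    kyy = (np.einsum("ij,ij->i", Y, Y) + coef0) ** degree
    dist2 = kxx[:, None] - 2.0 * kxy + kyy[None, :]
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(dist2, 0.0))

def lgp_kernel(X, Y, weight=0.5, gamma=0.1, degree=2, coef0=1.0):
    """Assumed hybrid: weight * Linear + (1 - weight) * Gaussian(Polynomial)."""
    return (weight * linear_kernel(X, Y)
            + (1.0 - weight) * gaussian_of_polynomial(X, Y, gamma, degree, coef0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 4))
    print(lgp_kernel(X, X).shape)
```

With `coef0 >= 0` and `0 <= weight <= 1`, each term is a positive semidefinite kernel, so the combination is a valid SVM kernel. A function with this `(X, Y)` signature can be passed as a callable `kernel` to scikit-learn's `SVC`; in the proposed model, parameters of this kind would be the ones tuned by PSO.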
To demonstrate the superior global gene extraction, prediction and learning ability of this model against three single-kernel models, PSO-PCA-L-MCSVM (using a single Linear kernel), PSO-G-MCSVM (using a single Gaussian kernel) and PSO-P-MCSVM (using a single Polynomial kernel), four datasets were used: Colon Cancer, Acute Lymphoblastic Leukemia-Acute Myeloid Leukemia (ALL-AML), the St. Jude Leukemia dataset and Lung Cancer. Using three extended evaluation metrics (G-mean, Accuracy (Acc) and F-score), the proposed model achieved the following: Colon Cancer (G-mean: 0.88, Acc: 0.88, F-score: 0.87), ALL-AML (G-mean: 0.94, Acc: 0.94, F-score: 0.94), Lung Cancer (G-mean: 0.99, Acc: 0.97, F-score: 0.96) and the St. Jude Leukemia dataset (G-mean: 0.97, Acc: 0.96, F-score: 0.90). PSO-G-MCSVM, the second-best published model, attained the following: Colon Cancer (G-mean: 0.82, Acc: 0.82, F-score: 0.82), ALL-AML (G-mean: 0.94, Acc: 0.94, F-score: 0.94), Lung Cancer (G-mean: 0.98, Acc: 0.96, F-score: 0.93) and the St. Jude Leukemia dataset (G-mean: 0.97, Acc: 0.95, F-score: 0.85). Given the compact informative gene subsets selected and the very high classification accuracy reported, the proposed models are promising DNA microarray data mining tools for cost-effective computers and online servers as well as resource-constrained mobile devices.
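The three metrics named above can be computed from predicted and true labels as in the sketch below. The abstract does not say how the thesis averages over classes, so this assumes common multiclass conventions: G-mean as the geometric mean of per-class recall, and F-score as the unweighted (macro) mean of per-class F1. The function name `evaluate` is illustrative.

```python
import math

def evaluate(y_true, y_pred):
    """Return (g_mean, accuracy, macro_f_score) for label sequences."""
    labels = sorted(set(y_true))
    recalls, f_scores = [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        f_scores.append(f1)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    # Geometric mean of per-class recall: drops to 0 if any class is missed.
    g_mean = math.prod(recalls) ** (1.0 / len(recalls))
    macro_f = sum(f_scores) / len(f_scores)
    return g_mean, accuracy, macro_f

if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 1, 1, 0]
    print(evaluate(y_true, y_pred))
```

G-mean is useful alongside accuracy on imbalanced data because a classifier that ignores a minority class can still score high accuracy while its G-mean collapses to zero.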