dc.description.abstract | Near-infrared Raman spectroscopy is a spectroscopic technique capable of providing
fingerprint-type information on biochemical molecules. For the early detection of cancer, specific
biomarkers, e.g., biofluids’ biomarkers, need to be detected with high sensitivity. This enhances
diagnostic accuracy in detecting biochemical fingerprints that would point to onset of cancer
development. The aim of this study was to test and evaluate novel machine learning techniques
for detection and identification of trace biomarker alterations in saliva and blood pointing to the
onset and progression of leukemia and breast cancers via a laser Raman spectral analysis approach.
The spectral measurements were performed in the 393-2063 cm-1 region using a 785 nm excitation laser.
The spectral data analysis was performed in the 500-1800 cm-1 region, which is considered the fingerprint
region for biological specimens.
Trace biomarkers were studied by analysis of intermediate and higher-order principal
components. The utility of intermediate and higher-order principal components in revealing trace
biochemical alterations (trace biomarkers) in biological samples was first tested on
spectral data from prostatic cells. The statistical relevance of the principal components was determined
using the two-sample t-test and effect size criteria. For breast cancer and
leukemia studies, the concentrations of trace biomarkers were estimated using the partial least
squares regression model applied to the spectra of pure compounds and the biofluids spectrum.
Whole blood and saliva simulants spiked with prepared concentrations of the various biochemical
components ranging from 1 ppm to 500 ppm were used for method development. Then, various
optimized machine learning techniques that included independent component analysis (ICA),
multidimensional scaling (MDS), partial least squares discriminant analysis (PLS-DA), kernel
density estimators, support vector machines (SVM), and backpropagation neural networks
(BPNN) were utilized to analyze and classify the blood and saliva trace biomarkers’ Raman spectra
from healthy and diseased subjects.
Pairwise comparison of mean intensities (peak intensity ratios) and
multivariate statistical techniques showed that biochemical changes of proteins, lipids, and
nucleic acid components can be associated with prostate cancer, breast cancer, and leukemia
progression. Four prominent regions: cytosine / guanine (566 ± 0.70 cm-1), glycerol (630 cm-1),
saccharides (1370 ± 0.86 cm-1), tryptophan (1618 ± 1.73 cm-1); and six subtle regions:
phospholipids (1076 cm-1), amide III (1232, 1234 cm-1), amide III (1276, 1278 cm-1),
phospholipids / nucleic acids (1330, 1333 cm-1), lipids (1434, 1442 cm-1), amide II (1471, 1479
cm-1) were identified, which can be regarded as useful biomarkers for prostate cancer diagnosis.
Six spectral bands were identified: glycerol (589 cm-1), tryptophan / phosphatidylinositol (594
cm-1), glutamate / tryptophan (630 cm-1), glutamate (1626 cm-1), glycine / valine (1630 cm-1), and
amide I / β-carotene (1638 cm-1); these can be regarded as new biomarkers for blood-based
Raman spectroscopy of breast cancer.
The fitting model revealed that trace proteins, nucleic acids, and lipid biochemicals in
blood and saliva increased with breast malignancy, whereas amounts of glycogen decreased with
progression of breast malignancy. For blood samples, the determined concentrations of proteins,
saccharides, amino acids, nucleic acids, and lipid components in diseased patients were in the
range of 237.82-384.96 ppm, 36.4-84.3 ppm, 14.31-83.69 ppm, 66.4-96.8 ppm, and 71.95-297
ppm, respectively, whereas respective concentrations in control samples were 233.86 ppm, 73.7
ppm, 10.48 ppm, 62.1 ppm, and 18-190 ppm. For saliva samples, concentrations of 62.5-126.3
ppm, 11.5-33.9 ppm, 4.90-20.6 ppm, 7.60-9.16 ppm, and 359.6 ppm representing trace proteins,
saccharides, amino acids, nucleic acids and lipids in diseased patients were obtained. The
respective concentrations in control samples were 27.7 ppm, 33.9 ppm, 2.17-3.66 ppm, 7.35 ppm,
and 43.9-145.2 ppm.
The quantitative analysis based on the selected trace biomarker regions suggested that
biochemical changes of proteins and membranous lipids increased with leukemia malignancy
whereas biochemical changes of nucleic acids, glycogen, and non-membranous lipids decreased
with leukemia malignancy. For blood samples, the determined concentrations of proteins,
saccharides, amino acids, nucleic acids, and lipid components in diseased patients were 6.14 ppm,
2.8 ppm, 1.89-11.1 ppm, 32.25 ppm, and 2.21-3.9135 ppm, respectively, whereas respective
concentrations in control samples were 4.04 ppm, 2.72 ppm, 2.29-14.7 ppm, 15.61 ppm, and 4.32-
7.1565 ppm. For saliva samples, concentrations of 8.737 ppm, 7.82 ppm, 15.88-17.80 ppm, 5.077
ppm, 0.282-3.645 ppm representing trace proteins, saccharides, amino acids, nucleic acids and
lipids in diseased patients were obtained. The respective concentrations in control samples were
11.39 ppm, 14.90 ppm, 1.72-5.04 ppm, 1.069 ppm, and 1.81-4.769 ppm.
The cross-validated models utilized to analyze and classify the blood and saliva Raman
spectra from healthy subjects, breast tumor patients, and leukemia patients yielded diagnostic
sensitivities of 46% to 100%, as well as specificities of 71% to 100%. Although the number of
samples involved in this study was small, the results demonstrate that analysis of Raman spectra of
blood and saliva using optimized machine learning diagnostic algorithms has great potential for
the noninvasive and label-free detection of breast cancer and leukemia. | en_US |
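
The diagnostic sensitivities and specificities quoted in the abstract are ratios taken from a classifier's confusion counts. The sketch below illustrates that arithmetic only. The function name `sensitivity_specificity`, the label coding (1 = diseased, 0 = healthy), and the example labels are assumptions made for illustration; they are not the study's code or data.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary class labels.

    sensitivity = TP / (TP + FN): fraction of diseased samples correctly flagged
    specificity = TN / (TN + FP): fraction of healthy samples correctly cleared
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("y_true must contain both classes")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels for illustration only (1 = diseased, 0 = healthy).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a cross-validated setting, `y_pred` would be the held-out predictions collected across folds, so each sample is scored by a model that did not see it during training.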