Assignment of operational taxonomic units for metagenomic experiments
Abstract
Species characterization is a challenging task in metagenomics mainly due to data complexity
and sequence fragmentation. Reliance on broadly conserved genes as phylotyping markers for
assigning sequences to their operational taxonomic units is limited by the high cost of sequencing
and the fact that some markers do not span the entire phylogenetic range. To expand the
phylotyping candidate loci for both microbial and viral communities, a set of markers including
narrowly conserved genes was assessed against the genes used in an automated pipeline for
phylogenomic analysis (AMPHORA). The study assessed the clustering of orthologs from bacterial and
archaeal (microbial) and viral genomes, identified suitable phylotyping markers in the microbial and
viral genomes, and compared the identified markers against AMPHORA. OrthoMCL analysis was
employed for cluster generation, followed by selection of suitable phylotyping candidates on the
basis of sequence identity. Sequences with at least 70% and 55% identity for the microbial
and viral databases, respectively, were selected and their phylotyping accuracy was determined on
simulated pyrosequencing datasets. Up to a fourfold increase in sensitivity was achieved when the
number of markers was increased from 31 to 145, and a high specificity of 0.99 was recorded at all
taxonomic ranks assessed. Nevertheless, the number of misclassified reads increased notably as
reference markers were added. Viral markers, on the other hand, showed average
sensitivity and a high specificity of 0.97. The recorded increase in sensitivity indicates that the use of
narrowly conserved genes will improve the accuracy with which metagenomics reads are placed
into their respective taxa. The results will hence equip medical practitioners with disease outbreak
preparedness and will facilitate pathogen management. In addition, accurate large-scale analysis
of environmental samples to determine composition of microbial and viral communities in an
ecosystem will be enhanced.
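
The selection and evaluation steps summarized above can be illustrated with a minimal sketch. It is not the study's actual pipeline: the function names, the cluster names, and the input layout (a mapping from cluster name to percent identity) are hypothetical, and sensitivity and specificity are assumed to follow the standard definitions TP/(TP+FN) and TN/(TN+FP), which the abstract does not spell out. Only the 70% and 55% identity thresholds come from the text.

```python
# Identity thresholds (percent) stated in the abstract.
MIN_IDENTITY = {"microbial": 70.0, "viral": 55.0}


def select_markers(clusters, database):
    """Return names of clusters meeting the identity threshold for a database.

    `clusters` is a hypothetical mapping: cluster name -> percent identity.
    """
    threshold = MIN_IDENTITY[database]
    return sorted(name for name, identity in clusters.items()
                  if identity >= threshold)


def sensitivity(tp, fn):
    """Assumed standard definition: fraction of true members recovered."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Assumed standard definition: fraction of non-members correctly rejected."""
    return tn / (tn + fp)


# Illustrative values only; these are not data from the study.
example_clusters = {"cluster_a": 72.5, "cluster_b": 69.9, "cluster_c": 70.0}
microbial_markers = select_markers(example_clusters, "microbial")
viral_markers = select_markers(example_clusters, "viral")
```

With these made-up inputs, `cluster_b` falls below the microbial threshold but passes the lower viral one, which shows how the two databases are filtered differently.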
Citation
Master of Science degree in Bioinformatics.
Publisher
University of Nairobi Centre for Biotechnology and Bioinformatics