Tuesday, July 28 • 18:50 - 18:55
Black cat in a dark room: search for new viruses in metagenomes




Detecting hidden viral diversity is a challenging task that goes beyond the standard protocols for processing metagenomic data. Meanwhile, publicly available databases contain a large amount of metagenomic data, a promising source of novel viral genomes that remains largely understudied. Here we present a new pipeline for detecting full-length viral genomes in assembled metagenomes.
Viral genomes are circular or linear molecules whose ends contain repeated sequences. In an assembly, both types can be recognized as circular sequences. We detect such contigs by searching for terminal repeats of 50 to 200 bp using the Knuth-Morris-Pratt algorithm. The running time of this search is linear in the maximum allowed repeat length, which makes it possible to process large amounts of data and to reduce their volume. We classify circular sequences as viral or non-viral based on predicted gene content using the viralVerify tool (github.com/ablab/viralVerify). For each selected viral contig, we identify the capsid and terminase genes using HMM profiles. The protein sequences found were aligned against the NCBI nr database with DIAMOND. The protein sequences matching each HMM profile, both queries and hits, were clustered with CD-HIT v4.8.1 (span 80%, identity 50%). The resulting centroid sequences were aligned using MAFFT v7.310 with default parameters, and phylogenies were then reconstructed separately with UPGMA and RAxML v8.2.11. Clusters that did not contain any hits were classified as previously unknown. Selected contigs were annotated with the VGAS tool. The completeness of viral contigs was assessed with viralComplete (https://github.com/mikeraiko/viralComplete).
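The repeat search described above can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the function names, the `#` separator, and the choice to compare only the first and last 200 bp of a contig are assumptions made for illustration. The sketch uses the Knuth-Morris-Pratt prefix function to find the longest contig prefix that also occurs as its suffix, and reports it only if it falls within the 50 to 200 bp range given in the abstract.

```python
def prefix_function(s):
    """KMP prefix function: pi[i] is the length of the longest proper
    prefix of s[:i + 1] that is also a suffix of s[:i + 1]."""
    pi = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def terminal_repeat_length(contig, min_len=50, max_len=200):
    """Return the length of the longest repeat shared by the start and the
    end of the contig, or 0 if it is shorter than min_len.

    Only the first and last max_len bases are inspected, so the work per
    contig is linear in max_len, not in the contig length.
    """
    n = min(max_len, len(contig) // 2)
    if n < min_len:
        return 0  # contig too short to carry a repeat of the required size
    head, tail = contig[:n], contig[-n:]
    # '#' is assumed not to occur in the sequence alphabet, so any match
    # found is a prefix of head that equals a suffix of tail.
    border = prefix_function(head + "#" + tail)[-1]
    return border if border >= min_len else 0


if __name__ == "__main__":
    repeat = "ACGGTCA" * 10                  # 70 bp toy repeat
    contig = repeat + "T" * 300 + repeat     # repeat at both ends
    print(terminal_repeat_length(contig))
```

Contigs for which the function returns a nonzero length would be the candidates passed on to the viral classification step; handling of reverse-complement or inexact repeats is outside the scope of this sketch.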
We tested our pipeline on assembled metagenomes from the NCBI Assembly database. More than 170 Gb of data, representing about 1300 metagenomes from seawater, soil, and biofilm habitats, were analyzed. Our analysis revealed that the diversity of viruses is much greater than currently known. Hundreds of new viral clusters were detected. For example, we identified three new representatives of the Siphoviridae and Podoviridae bacteriophage families in 10 biofilm-derived metagenomes. Our approach detects full-length viral genomes with a lower chance of false-positive results. In the future, users of the pipeline will be able to submit metagenome assemblies or raw reads as input and receive annotated viral genomes. Further analysis of metagenomes from other habitats is needed.
This work was supported by St. Petersburg State University (project ID 51555639).

Speakers

Yulia Yakovleva

Department of Cytology and Histology, Saint Petersburg State University, Bioinformatics Institute
Saint Petersburg State University, Russia


Tuesday July 28, 2020 18:50 - 18:55 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09