

Monday, July 27
 

13:45 MSK

Opening notes
Speakers

Anton Korobeynikov

Associate Professor, Center for Algorithmic Biotechnology, Saint Petersburg State University, 6 linia V.O., 11/21d, 1990034 St Petersburg, Russia

Alla Lapidus

Professor, Center for Algorithmic Biotechnology, Saint Petersburg State University, 6 linia V.O., 11/21d, 1990034 St Petersburg, Russia



Monday July 27, 2020 13:45 - 14:00 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

14:00 MSK

MGnify: Introduction to course organisation
Speakers

Alexandre Almeida

Postdoctoral Fellow (ESPOD), EMBL-EBI
I am an EBI-Sanger Postdoctoral Fellow focusing on the study of the human gut microbiome using genome-resolved metagenomics. My main research interest is understanding the role of the large uncultured diversity of the gut microbiome in human health and disease.

Rob Finn

Team Leader, Sequence Families, EMBL-EBI
Dr Rob Finn leads EMBL-EBI’s Microbiome Informatics team, which is responsible for the MGnify resource, which provides access to the metagenomics, metatranscriptomics and assembly analysis services. The functional and taxonomic profiles of these datasets, once made public, can be...

Lorna Richardson

Microbiome Resources Co-ordinator, EMBL-EBI

Ekaterina Sakharova

Bioinformatician, EMBL-EBI


Monday July 27, 2020 14:00 - 14:15 MSK
Zoom Mgnify https://zoom.us/j/93441398259?pwd=ZVRiWWl5ZWFpNlVQZUhVcDB0aTBndz09

14:15 MSK

MGnify: services offered (Part 1)
Speakers

Alexandre Almeida

Postdoctoral Fellow (ESPOD), EMBL-EBI
I am an EBI-Sanger Postdoctoral Fellow focusing on the study of the human gut microbiome using genome-resolved metagenomics. My main research interest is understanding the role of the large uncultured diversity of the gut microbiome in human health and disease.

Rob Finn

Team Leader, Sequence Families, EMBL-EBI
Dr Rob Finn leads EMBL-EBI’s Microbiome Informatics team, which is responsible for the MGnify resource, which provides access to the metagenomics, metatranscriptomics and assembly analysis services. The functional and taxonomic profiles of these datasets, once made public, can be...

Lorna Richardson

Microbiome Resources Co-ordinator, EMBL-EBI

Ekaterina Sakharova

Bioinformatician, EMBL-EBI


Monday July 27, 2020 14:15 - 15:15 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

15:30 MSK

MGnify: assembly pipeline and website (Part 2)
Speakers

Alexandre Almeida

Postdoctoral Fellow (ESPOD), EMBL-EBI
I am an EBI-Sanger Postdoctoral Fellow focusing on the study of the human gut microbiome using genome-resolved metagenomics. My main research interest is understanding the role of the large uncultured diversity of the gut microbiome in human health and disease.

Rob Finn

Team Leader, Sequence Families, EMBL-EBI
Dr Rob Finn leads EMBL-EBI’s Microbiome Informatics team, which is responsible for the MGnify resource, which provides access to the metagenomics, metatranscriptomics and assembly analysis services. The functional and taxonomic profiles of these datasets, once made public, can be...

Lorna Richardson

Microbiome Resources Co-ordinator, EMBL-EBI

Ekaterina Sakharova

Bioinformatician, EMBL-EBI


Monday July 27, 2020 15:30 - 16:30 MSK
Zoom Mgnify https://zoom.us/j/93441398259?pwd=ZVRiWWl5ZWFpNlVQZUhVcDB0aTBndz09

17:00 MSK

Indexing large and numerous sequencing datasets
Genomic analyses often rely on sequence comparisons. The exponential growth of sequencing data repositories prompts the development of ever-faster algorithms for sequence search: starting from the Smith-Waterman algorithm for pairwise alignment [1], then BLAST-like approaches for searching sequence databases [2], and more recent breakthroughs in database indexing strategies (e.g. DIAMOND [3] or BIGSI [4]). However, the recent data deluge means that even these latest tools cannot be used to screen the full set of sequencing experiments available today.
In this talk, I propose to focus on the problem of querying large collections of unassembled raw sequencing data on the fly, for instance with the goal of searching for a sequence of interest in all publicly available metagenomes. I will therefore give an overview of current methods dedicated to indexing large and numerous genomic datasets. These methods are mainly based on indexing k-mers, i.e. words of length k.
Finally, I will focus on a novel strategy for constructing a Bloom-filter-based data structure, HowDe-SBT [5], one of the state-of-the-art index data structures. I will present its algorithmic foundations and current results.

[1] Smith, T. F., & Waterman, M. S. (1981). Identification of common molecular subsequences. Journal of Molecular Biology.
[2] Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology.
[3] Buchfink, B., Xie, C., & Huson, D. H. (2015). Fast and sensitive protein alignment using DIAMOND. Nature Methods.
[4] Bradley, P., den Bakker, H. C., Rocha, E. P., McVean, G., & Iqbal, Z. (2019). Ultrafast search of all deposited bacterial and viral genomic data. Nature Biotechnology.
[5] Harris, R. S., & Medvedev, P. (2019). Improved representation of sequence Bloom trees. Bioinformatics.
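The k-mer indexing idea behind these methods can be illustrated with its simplest building block: one Bloom filter per dataset, queried by the fraction of a sequence's k-mers it reports as present, with a dataset counted as a hit when that fraction reaches a threshold. This is a minimal sketch, not HowDe-SBT: a sequence Bloom tree arranges many such filters hierarchically, and HowDe-SBT uses a more compact per-node bit representation [5]. The class name, hash scheme (SHA-256 with a seed prefix), filter size, k, the 0.9 threshold and the toy "datasets" are all illustrative assumptions, and only forward-strand k-mers are indexed.

```python
import hashlib
from typing import List


class KmerBloomFilter:
    """Toy Bloom filter over the k-mers of one dataset (illustration only)."""

    def __init__(self, k: int, num_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.k = k
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer: str) -> List[int]:
        # Derive several bit positions by hashing the k-mer with different seeds.
        positions = []
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "little") % self.num_bits)
        return positions

    def _kmers(self, seq: str) -> List[str]:
        # Forward-strand k-mers only; a real index would use canonical k-mers.
        seq = seq.upper()
        return [seq[i:i + self.k] for i in range(len(seq) - self.k + 1)]

    def add_sequence(self, seq: str) -> None:
        for kmer in self._kmers(seq):
            for pos in self._positions(kmer):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, kmer: str) -> bool:
        # May give false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer.upper()))

    def query(self, seq: str) -> float:
        """Fraction of the query's k-mers reported as present."""
        kmers = self._kmers(seq)
        if not kmers:
            return 0.0
        return sum(self.contains(km) for km in kmers) / len(kmers)


# Hypothetical "datasets", each reduced to a single short read.
datasets = {"sample_A": "ACGTACGTGGCTAGCTAGGATCC", "sample_B": "TTTTGGGGCCCCAAAATTTTGGGG"}
filters = {}
for name, reads in datasets.items():
    bf = KmerBloomFilter(k=11)
    bf.add_sequence(reads)
    filters[name] = bf

query = "ACGTACGTGGCTAG"
hits = [name for name, bf in filters.items() if bf.query(query) >= 0.9]
```

Because a Bloom filter never loses an inserted k-mer, `sample_A` (which contains the query) always scores 1.0; `sample_B` can only match through false positives, whose rate grows as the filter fills up.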

Speakers

Pierre Peterlongo

Research scientist, Inria Rennes Bretagne Atlantique, GenScale team


Monday July 27, 2020 17:00 - 17:15 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:00 MSK

Q & A: keynotes
Speakers

Pierre Peterlongo

Research scientist, Inria Rennes Bretagne Atlantique, GenScale team

Robert Fulton

Director of Technical Development, McDonnell Genome Institute
I'm the Director of Technology Development and have >25 years of Genomics experience. I'm happy to discuss a broad range of wet lab operations associated with genomics.

Tatiana Tatarinova

Fletcher Jones Endowed Chair in Computational Biology, University of LaVerne
Professor of Computational Biology moonlighting as a rock musician: https://soundcloud.com/tatiana-tatarinova-378061263/zdes-314-zdes

Inna Dubchak

Affiliate, Lawrence Berkeley National Laboratory

Terry Gaasterland

Professor of Computational Biology and Genomics; Director, Bioinformatics & Systems Biology Program, University of California, San Diego
Trained originally as a computer scientist, I transitioned into Computational Biology as an application area for logic-based data and query integrity-checking methods (used in my early, purely CS work for Cooperative Query Answering). I quickly became fascinated with the idea of...

Stephen Nayfach

Research Scientist, Joint Genome Institute


Monday July 27, 2020 17:00 - 19:00 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:15 MSK

A scalable prokaryotic taxonomy in the age of big data
The great majority of microorganisms have yet to be cultured and characterised. This so-called “microbial dark matter” is now being revealed at an ever-increasing rate by sequence-based, culture-independent methods. In the past few years, thousands of near-complete genomes of uncultured microbes have been assembled from sequence data obtained directly from environmental and clinical sources, providing the opportunity to fully articulate microbial diversity for the first time. Current estimates suggest that cultured microorganisms capture only ~15% of total microbial diversity, based on the evolutionary divergence of marker genes. We propose a genome-based taxonomy founded on the existing classification of cultured organisms, but corrected for polyphyletic groups and calibrated to take into account relative evolutionary divergence. The result is a fully systematized classification of Bacteria in an evolutionary framework. Of ~100,000 publicly available bacterial genomes, over half required one or more changes to their existing taxonomy. These include extensive changes at both high ranks, such as the amalgamation of the Candidate Phyla Radiation into one phylum, and low ranks, including the subdivision of the genus Clostridium into more than 100 distinct genera.
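Relative evolutionary divergence (RED) can be sketched on a toy rooted tree. As we understand the definition used for the Genome Taxonomy Database, the root has RED 0, extant taxa (leaves) have RED 1, and an internal node n gets RED(n) = p + (d / u) * (1 - p), where p is the parent's RED, d is the branch length from n to its parent, and u is the mean distance from the parent to the leaves below n. The sketch below implements only that recursion on a single fixed rooting; it is not the authors' pipeline, and the `Node` class, tree and node names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    length: float = 0.0  # branch length to the parent node
    children: List["Node"] = field(default_factory=list)


def leaf_depths(node: Node) -> List[float]:
    """Distances from `node` down to every leaf beneath it."""
    if not node.children:
        return [0.0]
    return [child.length + d for child in node.children for d in leaf_depths(child)]


def relative_evolutionary_divergence(root: Node) -> Dict[str, float]:
    red = {root.name: 0.0}  # the root is fixed at RED = 0

    def visit(parent: Node) -> None:
        p = red[parent.name]
        for child in parent.children:
            if not child.children:
                red[child.name] = 1.0  # extant taxa (leaves) are fixed at RED = 1
            else:
                d = child.length
                # u: mean distance from the parent to the leaves below `child`
                dists = [d + x for x in leaf_depths(child)]
                u = sum(dists) / len(dists)
                red[child.name] = p + (d / u) * (1.0 - p) if u > 0 else p
                visit(child)

    visit(root)
    return red


tree = Node("root", children=[
    Node("clade_X", 0.2, [Node("leaf_1", 0.3), Node("leaf_2", 0.5)]),
    Node("leaf_3", 0.6),
])
red = relative_evolutionary_divergence(tree)
```

Here `clade_X` sits 0.2 below the root while its leaves are on average 0.6 below it, so its RED is 0.2 / 0.6, about 0.333. Ranks can then be compared across lineages by where their nodes fall on this 0 to 1 scale rather than by raw branch length.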



Speakers

Phil Hugenholtz

Professor of Microbiology; Director, Australian Centre for Ecogenomics, The University of Queensland, Australia


Monday July 27, 2020 17:15 - 17:30 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:30 MSK

The McDonnell Genome Institute at Washington University: Contributions towards the Human Reference Genome
TBA

Speakers

Robert Fulton

Director of Technical Development, McDonnell Genome Institute
I'm the Director of Technology Development and have >25 years of Genomics experience. I'm happy to discuss a broad range of wet lab operations associated with genomics.


Monday July 27, 2020 17:30 - 17:45 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:45 MSK

Turning systems off: Negative regulators in gene expression programs
Stem cells balance the maintenance of pluripotency and readiness to differentiate. Upon initiation of differentiation, they must be ready to proceed along a multiplicity of fates. Conflicting signals between states must be regulated during transitions, and this regulation may be critical to completing the transition. Wnt signaling governs both maintenance of pluripotency (at low doses) and differentiation (at higher doses) in stem cells. Our recently reported discovery and characterization of a new, essential regulator of the Wnt signaling pathway provided a missing piece in how stem cells initiate differentiation. Time-series RNA-seq data of WNT3a-treated human embryonic stem cells revealed a new transcription factor that regulates the Wnt response by displacing a ubiquitous transcription factor to repress gene expression. ChIP-seq data for each transcription factor revealed the specific genes targeted by this negative feedback system. Genome editing disabled the new transcription factor and confirmed the negative feedback loop. This talk will (1) trace how the regulatory loop emerged from the multiple 'omics datasets and also cover two related studies that used the interpretation of gene expression patterns through RNA-seq to explore (2) Wnt-pathway regulation of "head-tail" organization during differentiation, and (3) dose-dependent Wnt regulation of lineage specificity. The latter study engineered and characterized a new line of intermediate mesodermal progenitor cells positioned to incorporate into, and contribute to, the kidney.

Speakers

Terry Gaasterland

Professor of Computational Biology and Genomics; Director, Bioinformatics & Systems Biology Program, University of California, San Diego
Trained originally as a computer scientist, I transitioned into Computational Biology as an application area for logic-based data and query integrity-checking methods (used in my early, purely CS work for Cooperative Query Answering). I quickly became fascinated with the idea of...



Monday July 27, 2020 17:45 - 18:00 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:00 MSK

Assessing the quality of metagenome-assembled viral genomes
Over the last several years, millions of new viral sequences have been identified from metagenomes that have vastly expanded knowledge of Earth’s viral diversity. However, these sequences range from small fragments to complete genomes and can be flanked by host DNA for integrated proviruses. To address these problems, we developed CheckV, which is an automated pipeline for estimating the completeness of viral genomes as well as the identification and removal of contamination from the host organism. After validation on mock datasets, CheckV was applied to large and diverse viral genome collections, including IMG/VR and the Global Ocean Virome. This revealed >40,000 high-quality genomes with >90% completeness, though the vast majority of sequences were small genome fragments. Additionally, we found that removal of host contamination significantly improved identification of auxiliary metabolic genes and interpretation of viral-encoded functions. We expect CheckV will be broadly useful for all researchers studying and reporting viral genomes assembled from metagenomes. CheckV is freely available at: http://bitbucket.org/berkeleylab/CheckV.
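A simplified view of length-based completeness estimation: compare a viral contig's length with the lengths of closely related reference genomes and cap the result at 100%. This is a hedged sketch, not CheckV's implementation, which selects and weights reference hits by amino-acid identity and reports a confidence level. The function names and reference lengths are hypothetical; only the >90% completeness threshold for high-quality genomes comes from the abstract, and the 50% cut-off is an assumption modeled on MIUViG-style quality tiers.

```python
from statistics import mean
from typing import Sequence


def estimate_completeness(contig_length: int, reference_lengths: Sequence[int]) -> float:
    """Percent completeness: contig length relative to related reference genome lengths."""
    if not reference_lengths:
        raise ValueError("need at least one related reference genome")
    expected_length = mean(reference_lengths)
    # A contig longer than expected is reported as 100%, not more.
    return min(100.0, 100.0 * contig_length / expected_length)


def quality_tier(completeness: float) -> str:
    # >90% -> high-quality (from the abstract); 50% cut-off is an assumed placeholder.
    if completeness > 90.0:
        return "high-quality"
    if completeness >= 50.0:
        return "medium-quality"
    return "low-quality"


score = estimate_completeness(47_000, [48_000, 50_000, 52_000])
tier = quality_tier(score)
```

With related references averaging 50 kb, a 47 kb contig scores 94% and lands in the high-quality tier. Host-contamination removal would have to run first, since flanking host DNA inflates contig length and therefore the estimate.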

Speakers

Stephen Nayfach

Research Scientist, Joint Genome Institute


Monday July 27, 2020 18:00 - 18:15 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:15 MSK

VISTA family of tools for comparative genomics
Comparing genomic sequences across related species is a fruitful source of biological insight. The first step in comparing genomic sequences is to align them — to map the letters of the sequences to each other. After an alignment is computed, visualization frameworks become essential to enable users to interact with the sequence and conservation data, especially in the context of longer DNA sequences or whole genomes. Visualization frameworks should be easy to understand by a biologist and provide insight into the mutations that a particular genomic locus has undergone.

The VISTA portal is a comprehensive comparative genomics resource that provides biomedical scientists with a single unified framework to generate and download multiple sequence alignments, visualize the results in the context of existing annotations, and analyze comparative results in search of important sequence signals in alignments. The VISTA suite of programs has been in development and continued use since 2000. VISTA has popularized the visualization of the level of conservation as a continuous curve based on the conservation in a sliding window. These concepts proved to be extremely successful due to the easy interpretation of the resulting plots.
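The sliding-window conservation curve can be sketched as percent identity computed over a fixed-size window moved along a pairwise alignment. This is a minimal illustration, not VISTA's code: it assumes two gapped alignment rows of equal length, counts gap columns as non-matching, and reports window positions in alignment columns, whereas a plotting tool would normally use base-sequence coordinates. The function names and the threshold value are hypothetical.

```python
from typing import List, Tuple


def conservation_curve(aligned_a: str, aligned_b: str,
                       window: int = 100, step: int = 1) -> List[Tuple[int, float]]:
    """(window start, percent identity) pairs along a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # 1 where both rows carry the same base, 0 for mismatches and gap columns.
    match = [int(a == b and a != "-") for a, b in zip(aligned_a.upper(), aligned_b.upper())]
    # Prefix sums make each window's match count an O(1) lookup.
    prefix = [0]
    for m in match:
        prefix.append(prefix[-1] + m)
    return [
        (start, 100.0 * (prefix[start + window] - prefix[start]) / window)
        for start in range(0, len(match) - window + 1, step)
    ]


def conserved_windows(curve: List[Tuple[int, float]], threshold: float) -> List[int]:
    """Start positions of windows at or above an identity threshold."""
    return [start for start, identity in curve if identity >= threshold]


curve = conservation_curve("ACGTACGTAC", "ACGTTCGTAC", window=5, step=5)
```

The single mismatch at column 4 drops the first window to 80% while the second stays at 100%. Plotting `curve` and shading windows above a cut-off such as 70% gives the kind of plot described above.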

Speakers

Inna Dubchak

Affiliate, Lawrence Berkeley National Laboratory


Monday July 27, 2020 18:15 - 18:30 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:30 MSK

Local Ancestry
In association studies, researchers combine samples with different (usually opposing) phenotypes and compare the frequencies of Single Nucleotide Polymorphisms (SNPs) between the two groups. When a sufficient number of samples is available (typically thousands of samples for complex traits), a significant difference in frequencies between the groups indicates an association between that position on the genome and the studied phenotype.
However, an association can also arise spuriously when the study group is not homogeneous in terms of provenance/origin (for example, all people with the disease are of French origin, and the healthy cohort is Bulgarian). In such a case, the two populations may differ in the frequencies of ancestry-informative markers (AIMs) that are not causal to the phenotype. This situation is particularly likely if the phenotype/disease is more common in one population than in another, which is typical in crop breeding and human GWAS. In such cases, the samples are almost certainly biased. We developed a new tool for determining local origin along a genome from whole-genome sequencing or high-density genotyping experiments.
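The allele-frequency comparison described above is commonly done as an allelic chi-square test on a 2x2 table of allele counts (cases vs. controls, alternate vs. reference allele). The sketch below implements that standard test without continuity correction; it is not the authors' local-ancestry tool. The counts are hypothetical and chosen to show the pitfall: if all cases come from one population and all controls from another, a marker whose frequency differs only because of ancestry still looks highly significant.

```python
import math
from typing import Tuple


def allelic_chi_square(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> Tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) on allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical: 500 diploid cases from population 1 and 500 controls from
# population 2, at a marker with alt-allele frequency 0.6 vs 0.4 due to ancestry only.
chi2, p = allelic_chi_square(600, 400, 400, 600)
```

Here chi2 is 80 and the p-value is far below any genome-wide threshold, even though the marker is not causal. Knowing each sample's ancestry along the genome is what allows such signals to be recognized and corrected.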

Speakers

Tatiana Tatarinova

Fletcher Jones Endowed Chair in Computational Biology, University of LaVerne
Professor of Computational Biology moonlighting as a rock musician: https://soundcloud.com/tatiana-tatarinova-378061263/zdes-314-zdes



Monday July 27, 2020 18:30 - 18:45 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

19:00 MSK

MGnify: Hands-on review (parts 1-2)
Speakers

Alexandre Almeida

Postdoctoral Fellow (ESPOD), EMBL-EBI
I am an EBI-Sanger Postdoctoral Fellow focusing on the study of the human gut microbiome using genome-resolved metagenomics. My main research interest is understanding the role of the large uncultured diversity of the gut microbiome in human health and disease.

Rob Finn

Team Leader, Sequence Families, EMBL-EBI
Dr Rob Finn leads EMBL-EBI’s Microbiome Informatics team, which is responsible for the MGnify resource, which provides access to the metagenomics, metatranscriptomics and assembly analysis services. The functional and taxonomic profiles of these datasets, once made public, can be...

Lorna Richardson

Microbiome Resources Co-ordinator, EMBL-EBI

Ekaterina Sakharova

Bioinformatician, EMBL-EBI


Monday July 27, 2020 19:00 - 20:00 MSK
Zoom Mgnify https://zoom.us/j/93441398259?pwd=ZVRiWWl5ZWFpNlVQZUhVcDB0aTBndz09
 
Tuesday, July 28
 

11:00 MSK

Installing and Searching BLAST Databases in a Data Science Framework
Data science embodies a pipeline of processes: acquisition, cleaning and organization of data, quality control and assurance, validation, and downstream visualization and analytics. Because of the overwhelming number of tools for each of these steps, the greatest challenge is often making those tools work in concert to facilitate a thorough and insightful analysis.
The BIRCH system (http://home.cc.umanitoba.ca/~psgendb/) is a framework consisting of hundreds of bioinformatics tools, unified through the BioLegato family of programmable graphical applications. Each BioLegato application represents a specific class of biological objects, packaging together the data and the methods for each class of objects. We describe BioLegato applications for BLAST searches that implement data science principles. For example, in blncbi the user retrieves sequences from NCBI using a graphical Entrez query builder. Amino acid sequences matching the query pop up in blprotein, a BioLegato application that displays proteins and lets the user run protein-specific tasks. A protein can be selected for a BLAST search, and the output will appear in blpfetch, a BioLegato spreadsheet object for protein hits. The blpfetch spreadsheet makes it easy to scan hundreds of hits, refining the list into one or more subsets for retrieval. Sequences are retrieved to a new blprotein object for downstream analysis. Because each object is a separate window with a small screen footprint, the user has more of a sense of working directly with the data than in typical web interfaces.
BioLegato gives the user flexibility at all steps in a pipeline. Because the output of each step appears in a new BioLegato object, there are no dead ends. Output from one step can be used directly as input for subsequent steps because BioLegato takes care of things like file format conversion, which is a tedious and sometimes error-prone part of using tools at the command line. We call this process ad hoc pipelining. Ad hoc pipelining enables the user to learn from each step before going to the next. We also describe blastdbkit, a Python script run from BioLegato, for downloading and managing BLAST databases on the user's computer.
Together, these tools provide an integrated point-and-click pipeline for sequence database searches, within the context of the larger BIRCH system. New programs can be added to any BioLegato application by creating a file in BioLegato's PCD language, which specifies the parameters to be set and a shell command to run the program. In this way, the core BIRCH functions can be integrated seamlessly with locally installed bioinformatics software.

Posters

Brian Fristensky

Associate Professor, University of Manitoba
Research: Phylogenomics of plant-pathogen interactions; development of bioinformatics software
Teaching: Cytogenetics; Plant Biotechnology; Bioinformatics



Tuesday July 28, 2020 11:00 - 11:05 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

11:00 MSK

MGnify: Hands-on review (parts 1-2)
Speakers

Alexandre Almeida

Postdoctoral Fellow (ESPOD), EMBL-EBI
I am an EBI-Sanger Postdoctoral Fellow focusing on the study of the human gut microbiome using genome-resolved metagenomics. My main research interest is understanding the role of the large uncultured diversity of the gut microbiome in human health and disease.

Rob Finn

Team Leader, Sequence Families, EMBL-EBI
Dr Rob Finn leads EMBL-EBI’s Microbiome Informatics team, which is responsible for the MGnify resource, which provides access to the metagenomics, metatranscriptomics and assembly analysis services. The functional and taxonomic profiles of these datasets, once made public, can be...

Lorna Richardson

Microbiome Resources Co-ordinator, EMBL-EBI

Ekaterina Sakharova

Bioinformatician, EMBL-EBI


Tuesday July 28, 2020 11:00 - 12:00 MSK
Zoom Mgnify https://zoom.us/j/93441398259?pwd=ZVRiWWl5ZWFpNlVQZUhVcDB0aTBndz09

11:00 MSK

Q & A: posters
Tuesday July 28, 2020 11:00 - 14:00 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

11:05 MSK

Using comparative transcriptomics to dissect the mechanisms of early maturation trait in garden pea (Pisum sativum L.)
Seed development remains a crucial topic in modern molecular plant science. Although the mechanisms underlying seed development have been thoroughly studied, intraspecies differences in seed developmental rate are poorly dissected in most plant species. This problem is especially urgent for crop species, whose seeds are of agricultural and economic value. In this regard, comparative transcriptomic and/or proteomic studies of plant accessions belonging to the same species yet differing in their seed development speed offer great opportunities for elucidating the respective mechanisms; however, only a few such studies exist to date. In this work, we used whole transcriptome sequencing (RNA-Seq) to study the mechanisms underlying the early seed maturation trait in garden pea (Pisum sativum L.). An early maturing pea line, Sprint-2, was chosen for this study. First, differential expression was estimated between Sprint-2 seeds at 10 and 20 days after pollination, indicating features of late embryogenesis at the former time point and seed desiccation at the latter. Further analysis of differential expression between Sprint-2 and two pea cultivars with an intermediate maturation rate at the respective time points showed that Sprint-2 is developmentally retarded in the initial phases of development yet undergoes acceleration between 10 and 20 days after pollination. Further analysis of differentially expressed genes revealed an altered gibberellin/abscisic acid ratio as a putative mechanism underlying Sprint-2 developmental acceleration. We also found several features of early maturation in Sprint-2, including a successive switch between DNA methylation programs, premature activity of amylase genes, enhanced mobile element propagation, and retarded storage protein accumulation.
Finally, several point mutations absent in other pea cultivars were found in Sprint-2 affecting transcription factors belonging to the LAFL group of maturation regulators. We believe that, coupled with the thorough whole genome association analysis, these results may shed light on the mechanisms of seed maturation speed control and could be further utilized in pea breeding programs.
This work was financially supported by the Russian Science Foundation (grant No 17-16-01100).

Posters

Yury V. Malovichko

All-Russia Research Institute for Agricultural Microbiology (ARRIAM), Pushkin, St. Petersburg, Russia, St. Petersburg State University, St. Petersburg, Russia



Tuesday July 28, 2020 11:05 - 11:10 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

11:10 MSK

Comparative in silico miRNA and their target analysis in Cestrum nocturnum L. and Cestrum diurnum L. in response to stress and flower-anthesis
The flowers of Cestrum nocturnum L. (CS1) and Cestrum diurnum L. (CS2) bloom at night and during the day, respectively. It is therefore interesting to identify the responsible molecular factors, especially miRNAs, and the underlying mechanisms governing flower anthesis in these two species. Homology-search-based computational analysis was employed to identify miRNAs in CS1 and CS2. A total of twenty-six miRNA families (miR5218, miR3513-3p, miR1089, miR815a, miR815b, miR815c, miR1023a-5p, miR2673a, miR2673b, miR1095a, miR1095b, miR172, miR849, miR1063a, miR1063b, miR1063c, miR1063d, miR1063e, miR1063f, miR1063g, miR1063h, miR1070, miR172d, miR1023a-3p, miR5021, miR2919) were predicted in CS1 and ten miRNA families (miR5658, miR1533, miR5021, miR5256, miR167a, miR167b, miR167h, miR1436, miR5205a, miR5303) in CS2. Of these, fifteen and nine miRNA families in CS1 and CS2 were identified as regulating 1024 and 1007 target transcripts, respectively. These miRNAs showed significant roles in the plant circadian rhythm and in several stress responses. A biological network of CS1 and CS2 was built based on the role of the miRNAs, the corresponding gene regulation, and its validation. In parallel, gene ontology and pathway analysis indicated that miR815a, miR849, and miR172d had regulatory control over the nocturnal rhythm in CS1 by regulating PIF1, UBP12, and EFS, while miR5021, miR5658, miR1533, miR5205a, and miR1436 had regulatory control over the diurnal rhythm of the FT, COP, FUS9, GI-FB, PLP6, and SCL13 genes in CS2. MiR2919, miR849, miR2673a, miR5021, miR815a, miR172, and miR1089 had a regulatory role over LRR, GRAS, and Hsp70 in CS1, and miR5021, miR5303, miR5205a, and miR1436 had a regulatory role over LRR, HSP70, MYB48, SOS1, and NPC2 in CS2. Some of the corresponding genes were found to be involved in stress responses. The phylogenetic analysis indicates that the predicted miRNA families display maximum sequence similarity with the reported miRNAs of the same family.

Posters

Nasreen Bano

PhD Scholar (UGC-SRF), CSIR-National Botanical Research Institute, Lucknow



Tuesday July 28, 2020 11:10 - 11:15 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

11:15 MSK

Bioinformatics analysis of short-chain fatty acid production potential in the human
Background: Short-chain fatty acids (SCFAs) are the end products of the fermentation of non-digestible dietary fiber by the human intestinal microbiota. The main SCFAs produced and absorbed in the large intestine include acetate, propionate, and butyrate. Microbiota-produced SCFAs have a regulatory role in host glucose homeostasis, lipid metabolism, and adipose tissue; they fuel intestinal epithelial cells, promoting a decrease in oxygen concentration in the lumen, and are involved in the regulation of the immune system and the inflammatory response. Many factors, including diet and nutritional status, medical conditions, and drugs, affect the composition of the human gut microbiota and the amount of SCFAs produced in the colon.
Methods and Results: We previously developed the Phenotype Profiler tool, which predicts the metabolic capabilities of the human gut microbiome (HGM) from 16S taxonomic abundance profiles. The tool uses the concept of a binary metabolic phenotype assigned to each genome in a reference collection of HGM genomes. We performed metabolic reconstruction of the butyrate and propionate fermentation pathways in the reference set of >2,600 microbial genomes using a subsystems-based approach implemented in the SEED genomic platform. As a result, each reference genome was assigned a binary (“1” or “0”) phenotype reflecting the presence/absence of at least one functional variant out of at least four different metabolic pathway variants, for both butyrate and propionate synthesis. The resulting binary phenotype matrix (BPM) for reference genomes was used to calculate a community phenotype matrix (CPM) for all mapped taxa obtained from 16S analysis. The community phenotype index (CPI) for each 16S sample was calculated as the sum, over all taxa, of each taxon's CPM value multiplied by its relative abundance. The CPI gives a probabilistic estimate of the fraction of HGM cells possessing a specific metabolic pathway (such as butyrate or propionate synthesis).
We collected nine previously published datasets, obtained in either in vivo (human, mice, rats, piglets) or in vitro fermentation studies and containing both 16S metagenomics data and SCFA metabolomic measurements, to investigate whether the predicted metabolic potentials (calculated as CPI values) correlate with measured metabolite concentrations. Each 16S dataset was analyzed using QIIME2, and the resulting amplicon sequence variants (ASVs) were annotated using a multi-taxonomic assignment (MTA) approach based on 16S sequence identity against the NCBI and RDP databases. We then applied the Phenotype Profiler tool to the annotated ASV abundance tables and calculated CPI values for butyrate and propionate production in each sample. The CPI values showed no correlation with experimentally measured concentrations of butyrate and propionate in fecal samples from five in vivo studies, which can be explained by the highly efficient absorption of SCFAs in the large intestine. In contrast, the CPI values correlated with SCFA levels obtained from in vitro bacterial fermentation experiments with fecal inoculum.
Conclusion: The high concordance between in silico predicted and in vitro measured butyrate and propionate levels provides strong validation of the genomic-based phenotype profiling approach.
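The CPI calculation above, CPI = sum over taxa t of CPM(t) x a(t) with a(t) the relative abundance of taxon t, can be sketched as follows. The abstract does not say how the binary genome-level phenotypes are collapsed into the per-taxon CPM, so this sketch assumes the CPM value is the fraction of a taxon's reference genomes with phenotype "1"; it also assumes unmapped taxa contribute 0 and that abundances are renormalized to sum to 1. Function names, genome IDs and taxa are hypothetical; this is not the Phenotype Profiler code.

```python
from typing import Dict, List


def community_phenotype_matrix(bpm: Dict[str, int],
                               taxon_to_genomes: Dict[str, List[str]]) -> Dict[str, float]:
    """Per-taxon phenotype value: share of the taxon's reference genomes scored 1 (assumption)."""
    return {
        taxon: sum(bpm[g] for g in genomes) / len(genomes)
        for taxon, genomes in taxon_to_genomes.items()
        if genomes
    }


def community_phenotype_index(cpm: Dict[str, float], abundances: Dict[str, float]) -> float:
    """CPI = sum over taxa of CPM value times relative abundance."""
    total = sum(abundances.values())
    # Taxa without a CPM value (unmapped) contribute 0.
    return sum(cpm.get(taxon, 0.0) * a / total for taxon, a in abundances.items())


# Hypothetical binary phenotypes (e.g. butyrate production) for four reference genomes.
bpm = {"genome_1": 1, "genome_2": 0, "genome_3": 1, "genome_4": 1}
taxon_to_genomes = {
    "taxon_A": ["genome_1"],
    "taxon_B": ["genome_2", "genome_3"],
    "taxon_C": ["genome_4"],
}
cpm = community_phenotype_matrix(bpm, taxon_to_genomes)
cpi = community_phenotype_index(cpm, {"taxon_A": 0.2, "taxon_B": 0.5, "taxon_C": 0.3})
```

In this toy sample the CPI is about 0.75 (0.2 x 1 + 0.5 x 0.5 + 0.3 x 1), read as roughly three quarters of cells being predicted to carry the pathway.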

Acknowledgments
This research was supported by the Russian Science Foundation (grant #19-14-00305).

Posters

Maria Frolova

Institute of Cell Biophysics, Russian Academy of Sciences



Tuesday July 28, 2020 11:15 - 11:20 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

11:20 MSK

Pseudomonas Phage PaBG — A Jumbo Member of an Old Parasite Family
Bacteriophage PaBG is a Jumbo Myoviridae phage isolated from water of Lake Baikal. This phage has limited diffusion ability and thermal stability and infects a narrow range of Pseudomonas aeruginosa strains. Therefore, it is hardly suitable for phage therapy applications. However, the analysis of the genome of PaBG presents a number of insights into the evolutionary history of this phage and Jumbo phages in general. We suggest that PaBG represents an ancient group distantly related to all known classified families of phages.



Posters

Peter Evseev

Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russia



Tuesday July 28, 2020 11:20 - 11:25 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

11:25 MSK

Secondary Structure Prediction by Combination of Formal Grammars and Neural Networks
An approach to sequence secondary structure analysis that combines formal grammars and neural networks was proposed recently. In this work, we investigate the applicability of this approach to RNA secondary structure prediction. We show that residual networks can be used to correct secondary structure features extracted by context-free grammars.

Posters


Tuesday July 28, 2020 11:25 - 11:30 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

11:30 MSK

iJump: a fast tool for tracking rearrangements of bacterial mobile elements in the course of adaptive laboratory evolution
Rearrangements of mobile elements in bacteria may lead to gene inactivation or deregulation, making an important contribution to adaptation. While the challenge of mapping these rearrangements has been addressed for individual genomes, no efficient tools are available for tracking their dynamics in evolving populations, such as in adaptive laboratory evolution (ALE).
We use ALE in a custom-engineered continuous culture device (morbidostat) to study the dynamics and mechanisms of antibiotic resistance in major Gram-negative bacterial pathogens. The acquisition of mutations in evolving populations is monitored by deep sequencing of time-series samples.
To observe evolutionary paths driven by the “jumping” of IS elements, we have developed the iJump software, which uses soft-clipped reads from SAM/BAM alignments at the boundaries of known mobile elements to find new junctions and estimate their frequencies. The performance of iJump was first tested on a simulated dataset, where it showed a 1–4% error in frequency estimation. Application of iJump to our ALE studies with Escherichia coli, Acinetobacter baumannii and Pseudomonas aeruginosa confirmed its practical utility and revealed IS-driven bacterial adaptations to known antibiotics and novel drug candidates. The results were verified by Nanopore-based sequencing and MIC determination of selected individual clones. Software available at https://github.com/sleyn/ijump
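The general idea of using soft-clipped reads to estimate a junction frequency can be sketched as follows. This is not iJump's implementation: it assumes a single known breakpoint, takes SAM-style 1-based leftmost positions and CIGAR strings, and estimates the frequency as the share of reads clipped exactly at the breakpoint among clipped plus spanning reads. The function names and the min_flank parameter are hypothetical.

```python
"""Estimate the frequency of a new mobile-element junction from soft-clipped reads.

Hypothetical sketch, not iJump's algorithm. Reads are (pos, cigar) pairs with SAM
semantics: pos is the 1-based reference coordinate of the first aligned base.
"""
import re
from typing import Iterable, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, op) pairs, dropping hard clips (H)."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]


def ref_span(cigar: str) -> int:
    """Reference bases consumed by the alignment (M, D, N, = and X operations)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def junction_frequency(reads: Iterable[Tuple[int, str]], breakpoint: int,
                       min_flank: int = 5) -> float:
    """Fraction of reads supporting a junction located just before `breakpoint`.

    A read supports the junction if it is soft-clipped exactly at the break:
    a left clip whose first aligned base is `breakpoint`, or a right clip whose
    last aligned base is `breakpoint - 1`. A read supports the reference if it
    aligns across the break with at least `min_flank` bases on each side.
    """
    clipped = spanning = 0
    for pos, cigar in reads:
        ops = parse_cigar(cigar)
        if not ops:
            continue
        start, end = pos, pos + ref_span(cigar) - 1
        left_clip, right_clip = ops[0][1] == "S", ops[-1][1] == "S"
        if (left_clip and start == breakpoint) or (right_clip and end == breakpoint - 1):
            clipped += 1
        elif start <= breakpoint - min_flank and end >= breakpoint + min_flank - 1:
            spanning += 1
    total = clipped + spanning
    return clipped / total if total else 0.0
```

With two reads clipped at position 1000 and two unclipped reads spanning it, the estimate is 0.5. A production tool would also check that the clipped sequence matches the mobile element and would handle both element ends; those steps are omitted here.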

Posters

Semen Leyn

Postdoctoral Associate, Sanford Burnham Prebys Medical Discovery Institute
Doing bioinformatics for several bacterial genomics projects, including the study of antibiotic resistance development for novel drug candidates and metabolic reconstruction of the human gut microbiome.



Tuesday July 28, 2020 11:30 - 11:35 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

11:35 MSK

Dissecting the evolutionary mechanisms of 3-domain Cry toxin diversity with CryProcessor
Biologicals based on the entomopathogenic Gram-positive bacterium Bacillus thuringiensis are among the most widespread biopesticides. The potency and specificity of their insecticidal action are mostly determined by a set of insecticidal moieties with various activities that accumulate in crystalline inclusions during the sporulation phase of the bacterium. Although B. thuringiensis produces other virulence factors, such as chitinases, Cry toxins are the most useful and agriculturally applicable biopesticides. Cry toxins, and in particular their subset of 3-D (three-domain) Cry toxins, not only affect a broad range of hosts but also exhibit high specificity. Unfortunately, emerging insect resistance to these toxins, caused by mutations in host receptor genes, hinders effective pest management. Two strategies could help solve this issue: a large-scale search for novel toxins and the construction of artificial toxins by domain shuffling. To discover new 3-D Cry toxins in continually appearing genomic data, we developed an HMM-based tool called CryProcessor that retrieves 3-D Cry toxin sequences from large genomic datasets and reports the layout of their individual domains. This tool outperforms its analogs in terms of accuracy, speed and throughput. The domain layout provided by CryProcessor will significantly facilitate the development of chimeric toxins by accelerating their in silico construction. Regarding the diversity of Cry toxins, one generally accepted yet not rigorously validated hypothesis links the diversification of 3-D Cry toxins to domain exchange between different toxins. However, the only evidence supporting this idea comes from comparisons of protein sequences within small groups of toxins. To fill this gap, we conducted the first large-scale phylogenetic study of 3-D Cry toxins.
Using CryProcessor, we screened the IPG and GenBank databases and identified approximately 600 novel toxins, which were then merged with the toxins from the Bt-Nomenclature. We then constructed phylogenetic trees based on both full-length sequences and individual domains. Evaluation of the topological differences between the trees revealed a noticeable discrepancy between the full-sequence tree and the domain-only trees. We then screened the sequences for signals of recombination and detected 50 recombination events; notably, such events were found in each of the domains. Our results indicate that recombination is a pivotal mechanism in the evolution and diversification of 3-D Cry toxins. A clearer understanding of this process and a more in-depth look into the history of recombination events would allow us to design new specific toxins precisely and efficiently.
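The abstract does not name the measure used to compare tree topologies. One common option is the Robinson–Foulds distance; the sketch below computes its rooted, cluster-based variant for trees written as nested tuples of leaf names. The tree encoding, function names and example labels are hypothetical.

```python
"""Rooted Robinson–Foulds distance between two trees on the same leaf set.

Trees are nested tuples of leaf-name strings, e.g. (("A", "B"), "C").
"""
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets below each internal node, excluding the trivial root cluster."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    found.discard(walk(tree))  # every tree on these leaves shares the root cluster
    return found


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Number of clusters present in exactly one of the two trees."""
    return len(clusters(t1) ^ clusters(t2))
```

For (((A, B), C), (D, E)) versus (((A, C), B), (D, E)) the distance is 2, because each tree has one cluster the other lacks. Phylogenetic programs usually output unrooted trees, so they would need to be rooted consistently (for example on the same outgroup) before using this rooted variant, or compared by bipartitions instead.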

Posters

Anton E. Shikov

All-Russia Research Institute for Agricultural Microbiology (ARRIAM), Pushkin, St. Petersburg, Russia; St. Petersburg State University, St. Petersburg, Russia



Tuesday July 28, 2020 11:35 - 11:40 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

11:40 MSK

Metatranscriptome analysis of an endemic Lake Baikal amphipod microbiome
The littoral amphipod Eulimnogammarus verrucosus is endemic to Lake Baikal. Its exoskeleton is known to be a habitat for epibiont ciliates.
In the late summer and autumn of 2019, intensive fouling of amphipods by ciliates was observed, which led to the death of E. verrucosus individuals in the laboratory. We performed transcriptome sequencing of E. verrucosus individuals caught in September 2019. We hypothesized that a change in the composition of the symbiotic community could be the cause of the animals' death. Preliminary microscopic analysis of the fouling confirmed that the epibionts were ciliates.
The aim of this work was to analyze the symbiotic community of E. verrucosus.
We assessed the microbiome composition using (1) 18S rRNA as a marker of biodiversity and (2) coding RNA (metatranscriptome assembly). Transcriptome assemblies were generated with three different tools (MEGAHIT, rnaSPAdes and Trinity); the best assembly, obtained with rnaSPAdes, was used for further analysis. Annotation and taxonomic analysis were performed with MMseqs2.
As a result, (1) according to the rRNA analysis, the abundance of ciliates did not change; (2) the diversity of ciliates in the fouled animal was higher and included the following species: Glaucoma chattoni, Paramecium caudatum, Didinium nasutum, Condylostoma magnum and Blepharisma musculus. These species were not found in the animal sampled earlier. Overall, however, the changes in microbiome composition were minor.
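The abstract does not specify how ciliate diversity was quantified. As a minimal sketch, assuming per-species read counts assigned to ciliates (for example from the 18S rRNA analysis), species richness and the Shannon index could be compared between the fouled animal and the earlier sample; the counts and species labels below are hypothetical.

```python
"""Richness and Shannon diversity from per-species read counts (hypothetical data)."""
import math
from typing import Mapping


def richness(counts: Mapping[str, int]) -> int:
    """Number of species with at least one read."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_index(counts: Mapping[str, int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over species with non-zero counts."""
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


if __name__ == "__main__":
    earlier = {"species_a": 120, "species_b": 30}                  # hypothetical counts
    fouled = {"species_a": 60, "species_c": 40, "species_d": 50}  # hypothetical counts
    print(richness(earlier), richness(fouled))
    print(shannon_index(fouled) > shannon_index(earlier))
```

Richness only counts species, while the Shannon index also rewards evenness, so reporting both helps separate "more species" from "more balanced community".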
We plan to continue studying the microbiome composition of endemic Baikal amphipods and to analyze the microbiomes of other species, as well as of amphipods inhabiting different locations.
The study was supported by the Russian Science Foundation/Helmholtz Association of German Research Centres (RSF grant number 18-44-06201).

Posters

Julie Ozerova

Bioinformatics Institute, St. Petersburg, Russia



Tuesday July 28, 2020 11:40 - 11:45 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

11:45 MSK

Whole Genome Sequencing Of 101 Neglected African Crops
The goal of the African Orphan Crops Consortium (AOCC) is to sequence, assemble and annotate the genomes of 101 traditional African food crops in order to improve their nutritional content. This will provide long-lasting solutions for Africa’s nutritional security, ensuring quality nutrition and increased food access through higher yields and sustainable national breeding programs that use smarter, faster and more economical technologies. The resulting information will be put in the public domain with the endorsement of the African Union, with the aim of providing localized solutions to hunger, stunting and chronic malnutrition among African children and the wider population.

Posters

Samuel Muthemba

Lab Technician, ICRAF



Tuesday July 28, 2020 11:45 - 11:50 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

11:50 MSK

Genome assembly and search for genetic markers of adaptation of shore flies (Diptera, Ephydridae) to extreme habitats
Dipteran insects are characterized by huge taxonomic diversity (more than 160 thousand species as of 2013 (Zhang, 2013)), a diversity of ecological niches, and adaptation to various extreme environmental conditions. At the same time, the mechanisms of these adaptations are poorly studied, especially at the genomic level. The NCBI Genome database currently contains only 165 dipteran genomes (by comparison, more than 1000 genomes have been sequenced for the 40 thousand vertebrate species). Species of the large family Ephydridae (shore flies, about 2000 species) are adapted to the most adverse environmental conditions, from saline and alkaline impoundments and hot springs to oil puddles (Kadavy et al., 2020), but the genomes of only two species of this curious family, Ephydra gracilis and E. hians (syn. Cirrula hians), have been sequenced. The genome of a third member of the family Ephydridae, E. riparia, was sequenced at the Department of Biological Evolution of Lomonosov Moscow State University.
In this study, we assembled the E. riparia genome de novo, assessed the assembly quality and compared it with the assemblies of two related species.
Raw reads were filtered with Trimmomatic, and contamination was removed with Kraken2. The E. riparia genome was assembled with SPAdes, SOAPdenovo and Platanus; the best assembly was produced by SPAdes. Assembly quality was assessed with QUAST and BUSCO. We have performed structural annotation of the genome and have moved on to functional annotation. See more details in our GitHub repository: https://github.com/Terraslavonica/E_riparia.
The E. riparia genome (about 600 Mbp) was assembled into scaffolds with an average coverage of 11.2x, an N50 of 3.5 kbp and an L50 of 53K contigs. BUSCO detected 53% of the genes typical for Diptera in the assembly. Compared with the assemblies of close relatives, E. gracilis (410 Mbp, coverage 9.4x, N50 2.1K, L50 46K, BUSCO 81%) and E. hians (399 Mbp, coverage 27.0x, N50 1.8K, L50 53K, BUSCO 37%), we conclude that the E. riparia assembly is good enough for further analyses.
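N50 and L50 are easy to confuse: N50 is a length, L50 a count. A minimal sketch of the standard definitions (the same ones QUAST reports), applied to a hypothetical list of contig lengths, is below; by the same definitions, an L50 of 53K means that about 53,000 of the longest contigs are needed to cover half of the assembly.

```python
"""N50 and L50 of an assembly from its contig (or scaffold) lengths."""
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50).

    Contigs are sorted from longest to shortest and their lengths summed; N50 is
    the length of the contig at which the running sum first reaches half of the
    total assembly length, and L50 is how many contigs that took.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs given")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: the running sum always reaches half")
```

For contig lengths 2, 3, ..., 10 (total 54), the three longest contigs (10 + 9 + 8 = 27) reach half of the total, so N50 is 8 and L50 is 3.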
Based on these three assemblies, we are carrying out functional annotation of the genomes and searching for genes that may contribute to adaptation to extreme habitats and stressful conditions, such as genes encoding LEA proteins, heat shock proteins, aquaporins, metal-transport proteins, and components of the p38 and JNK MAPK signalling cascades (Craig et al., 2004; Xu et al., 2013; Davies et al., 2014; Benoit et al., 2014; Reidl et al., 2016; Huang et al., 2016; Muthusamy et al., 2017; Pawłowicz, Masajada, 2019; Das et al., 2020).

Posters

Ekaterina Yakovleva

Lomonosov Moscow State University, Bioinformatics Institute
Evolutionary biologist and, to some extent, bioinformatician



Tuesday July 28, 2020 11:50 - 11:55 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

11:55 MSK

Predicting CAZy profiles in wood-decay fungal communities with molecular ecological networks
Understanding the functional organization of ecological communities is essential for predicting their succession in stable and changing environments and for designing effective strategies to control it. The functioning of terrestrial ecosystems strongly depends on the rate of organic matter turnover, and fungi are the main drivers of this complex process in forests. In this work, based on ecological network analysis and functional predictions from amplicon sequences, we characterized interspecies interactions in wood-decay fungal communities and mapped functional attributes onto their biological networks.
The analysis is based on fungal abundance profiles obtained by high-throughput sequencing of the rRNA gene internal transcribed spacer (ITS2). Ecological networks were inferred with SPRING (semi-parametric rank-based correlation and partial correlation estimation). Copy numbers of gene families encoding extracellular enzymes involved in the decomposition of plant biopolymers (e.g., cellulose-, hemicellulose- and lignin-degrading CAZymes) were reconstructed with PICRUSt2 based on the JGI MycoCosm database of reference genomes.
We compare the predicted functional profiles of undisturbed and degraded communities of wood-decay fungi and estimate the consequences of species loss for biotic interactions. We classify functional elements by their vulnerability to chemical pollution and by their importance in wood decomposition.
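The principle behind this kind of functional prediction is an abundance-weighted sum: each taxon's abundance is multiplied by the predicted copy number of each gene family in its reference genome, and the products are summed per family. The sketch below shows only that final step under simplifying assumptions (copy numbers already predicted, optional division by marker-gene copy number); it is not PICRUSt2's code, and the taxon labels and CAZy family labels (GH7, AA1) are used purely as illustrative keys.

```python
"""Abundance-weighted prediction of a community gene-family profile (simplified sketch)."""
from collections import defaultdict
from typing import Dict, Optional


def predict_profile(abundance: Dict[str, float],
                    copy_numbers: Dict[str, Dict[str, float]],
                    marker_copies: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Sum abundance x copy number over taxa for every gene family.

    abundance: reads per taxon (e.g. ITS2-based OTU counts).
    copy_numbers: predicted gene-family copy numbers per taxon.
    marker_copies: optional marker copy number per taxon, used to convert
        read counts into approximate genome counts before weighting.
    """
    profile: Dict[str, float] = defaultdict(float)
    for taxon, reads in abundance.items():
        genomes = reads / marker_copies[taxon] if marker_copies else reads
        # Taxa without predicted copy numbers contribute nothing.
        for family, copies in copy_numbers.get(taxon, {}).items():
            profile[family] += genomes * copies
    return dict(profile)
```

Dropping a taxon from the abundance table and recomputing the profile is one simple way to see how the loss of a species would shift the predicted CAZyme repertoire.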

The work was funded by Russian Foundation for Basic Research (grant 18-29-05042).

Posters

Vladimir Mikryukov

Senior researcher, Institute of plant and animal ecology UB RAS, Ekaterinburg
Institute of Plant and Animal Ecology, Ural Branch, Russian Academy of Sciences, Russia



Tuesday July 28, 2020 11:55 - 12:00 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

12:00 MSK

Metagenomic analysis of bacterial communities associated with sponges from the White Sea
Sponges (phylum Porifera) form symbiotic relationships with communities of microorganisms. Sponges and their symbionts produce various pharmacologically active substances, including compounds with antibacterial properties. Metagenomic analysis of the microbiome not only reveals the taxonomic diversity and properties of the microbial community but also opens up opportunities in the search for new secondary metabolites. We collected and analyzed metagenomes of four marine sponges, Homoeodictya (Isodictya) palmata, Halichondria panicea, Halichondria sitiens and Myxilla incrustans, and of the surrounding seawater, collected in Kandalaksha Bay (White Sea) in August 2016 and 2018. In 2018, the metaviromes of the sponges and seawater were also studied. Sequencing was performed on an Illumina NextSeq, with approximately 100 million 150+150 bp paired-end reads per sample. Raw reads were analyzed and filtered with FastQC and Trimmomatic. Metagenomes were assembled with metaSPAdes, and assembly quality was assessed with MetaQUAST. Contigs longer than 5 kb were used for further analyses. ORFs were predicted with MetaGeneMark. Analysis of secondary metabolite biosynthetic gene clusters (BGCs) in the metagenomes with antiSMASH revealed NRPS, bacteriocin, lanthipeptide, LAP, PKS and other BGC types (649 clusters in total). CRISPR-Cas systems were detected and classified with CRISPRCasTyper. The most common systems in sponges belong to class 1, type I, subtypes I-C and I-F. Taxonomic annotation of contigs was performed with DIAMOND (blastx) against the NCBI nr database, and the results were imported into MEGAN6. The metagenomes are dominated by the classes Gammaproteobacteria and Alphaproteobacteria. In addition, we observed an increase in the abundance of Gammaproteobacteria in all samples from 2018, especially of the genera Alteromonas and Pseudoalteromonas. We suggest that this may be related to the anomalously high seawater temperature in summer 2018 (Ereskovsky et al. 2019).
Based on Bray–Curtis dissimilarity and PCoA, the bacterial communities of sponges and seawater differ in composition and species diversity. For H. palmata and M. incrustans, the 2016 and 2018 microbiomes are similar to each other, whereas for H. sitiens and H. panicea they differ. In the metaviromes, the most abundant families are Myoviridae, Podoviridae and Siphoviridae, together with the group Prokaryotic dsDNA virus sp. We obtained 10 contigs identified as large DNA viruses, each longer than 100 kb, and one contig with a length of 227 kb.
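Bray–Curtis dissimilarity between two samples can be computed directly from taxon abundances; PCoA is then an eigendecomposition of the resulting distance matrix (not shown). The sketch below assumes abundance tables stored as dictionaries and uses hypothetical sample and taxon names.

```python
"""Bray–Curtis dissimilarity between taxon abundance profiles (hypothetical data)."""
from itertools import combinations
from typing import Dict, Tuple


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Sum of |a_i - b_i| divided by the sum of a_i + b_i over all taxa."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return num / den if den else 0.0


def distance_matrix(samples: Dict[str, Dict[str, float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise Bray–Curtis dissimilarities, keyed by sorted sample-name pairs."""
    return {(s, t): bray_curtis(samples[s], samples[t])
            for s, t in combinations(sorted(samples), 2)}
```

The value is 0 for identical profiles and 1 for samples that share no taxa. Because it depends on absolute counts, samples are usually rarefied or converted to relative abundances first.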

Posters

Anastasia Rusanova

Institute of Molecular Genetics of the Russian Academy of Sciences



Tuesday July 28, 2020 12:00 - 12:05 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

12:05 MSK

Genomic insights into the carbon and energy metabolism of the first obligate autotrophic acetogenic bacterium
Acetogenic bacteria are a specialized group of strictly anaerobic microorganisms that produce acetate from two molecules of carbon dioxide via the Wood–Ljungdahl pathway during anaerobic respiration. The most prominent metabolic feature of acetogenic bacteria is chemolithoautotrophic growth with H2 and CO2 as the substrates. However, acetogens have to grow under tough competitive conditions, since methanogens are the dominant hydrogenotrophs in many anaerobic environments. Therefore, many acetogens show outstanding metabolic flexibility and can switch from autotrophic to heterotrophic metabolism, utilizing a wide variety of organic substrates such as sugars, alcohols, organic acids, aldehydes and aromatic compounds. As a consequence, only facultatively autotrophic acetogens are currently known.
The thermophilic acetogenic bacterium strain 3443-3Ac was isolated from a sediment sample collected from a terrestrial hot spring in Uzon Caldera, Kamchatka, Russia. Strain 3443-3Ac grew chemolithoautotrophically with H2 or formate as the energy source and HCO3−/CO2 as the carbon source and electron acceptor. The isolate was unable to utilize organic compounds, making it the first example of an obligate autotroph among acetogens. To support this finding, the genome of 3443-3Ac was sequenced and analyzed. The genomic data allowed us to propose a model of electron and carbon flow during acetogenesis from H2/CO2 in strain 3443-3Ac. Electrons from molecular hydrogen are transferred to the electron carriers NAD+ and ferredoxin by the soluble electron-bifurcating hydrogenase HydABC. Reduced ferredoxin is used in the reductive reactions of acetogenesis (Wood–Ljungdahl pathway) as well as by the Ech complex to generate a proton motive force (pmf). The H+ gradient is harnessed for energy conservation by the H+-dependent F1F0-ATPase. Thus, strain 3443-3Ac is an ‘Ech-acetogen’ and uses a two-module respiration system, comprising the Ech complex and the H+-dependent F1F0-ATPase, for energy conservation.
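For reference, the overall stoichiometry of acetogenesis on the two energy sources used by strain 3443-3Ac is given by the standard textbook equations below (they are not stated in the abstract itself):

```latex
\begin{align}
4\,\mathrm{H_2} + 2\,\mathrm{CO_2} &\rightarrow \mathrm{CH_3COOH} + 2\,\mathrm{H_2O} \\
4\,\mathrm{HCOOH} &\rightarrow \mathrm{CH_3COOH} + 2\,\mathrm{CO_2} + 2\,\mathrm{H_2O}
\end{align}
```

In both cases, two CO2 equivalents are reduced to acetate with eight electrons supplied by the oxidation of H2 or formate; the energy conserved along the way is what the HydABC, Ech and F1F0-ATPase model has to account for.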
Phylogenetic analysis showed that strain 3443-3Ac is related to representatives of the genera Thermoanaerobacterium, Thermoanaerobacter and Caldanaerobacter (the TTC group), representing a new genus within this group. However, with the exception of Thermoanaerobacter kivui, all TTC representatives are chemoorganoheterotrophic bacteria that grow solely by fermentation and are incapable of acetogenesis. Despite the inability of strain 3443-3Ac to grow organoheterotrophically, its genome encodes all the enzymes of glucose fermentation via glycolysis, and the closest homologs of these proteins are the corresponding proteins of organoheterotrophic TTC members. It can therefore be suggested that the inability of strain 3443-3Ac to grow on sugars is linked to the loss of sugar transporters and carbohydrate-specific enzymes. Indeed, genome analysis of TTC representatives revealed genes encoding the phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS system) in all TTC members except 3443-3Ac. The set of genes encoding ATP-binding cassette (ABC) transporters is also significantly reduced in strain 3443-3Ac compared with other TTC representatives.
The likely evolutionary events that led to the emergence of the obligately autotrophic acetogenic mode of life (a previously unknown lifestyle) will be discussed in our presentation, based on the results of comparative genomic analysis.
The work was supported by the Ministry of Science and Higher Education of the Russian Federation, grant # 05.616.21.0124 (unique identifier RFMEFI61619X0124).

Posters

Frolov E.N.

Winogradsky Institute of Microbiology, Federal Research Center of Biotechnology, Russian Academy of Sciences, Leninsky prospect 33 Bldg 2, 119071, Moscow, Russia



Tuesday July 28, 2020 12:05 - 12:10 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

12:10 MSK

Preliminary analysis of the soil microbiota associated with the gigantism of the plants of the Chernevaya Taiga of Siberia
Chernevaya taiga can be described as a boreal forest formation whose distribution is limited to the hyperhumid parts of the Altai-Sayan mountain region. It is characterized by a series of unique ecological traits, the most notable of which is the gigantism of perennial herbaceous plants and shrubs.
The main goal of the study is to identify and parameterize the main factors responsible for the anomalously high effective fertility of Chernevaya taiga soils, with a focus on the microbial communities. We aim to link the distinct properties of Chernevaya taiga to the chemical parameters of the soil, the degree of moisture, the unique composition of the microbiota and/or a combination of all of these factors.
Based on 16S analysis of soils from two Chernevaya taiga locations (Novosibirsk and Tomsk regions) and of control soils, we found that the richness of the soil microbiota decreases significantly with sampling depth. The taxonomic structure of the microbiota of the top layers (0–15 cm) is similar across the different geographical locations of the Chernevaya taiga. The most prevalent phyla in the top layers of Chernevaya taiga soils are Proteobacteria, Acidobacteria and Verrucomicrobia.
We also investigated differences in the microbiota composition of the Crepis sibirica rhizosphere between Chernevaya taiga and control regions, using the linear discriminant analysis effect size (LEfSe) approach. We found bacterial taxa that were differentially abundant between the two groups: Bacteroidetes (in particular Sphingobacteria and Cytophagia) were more abundant in the control group, whereas Actinobacteria (mostly Thermoleophilia) and Verrucomicrobia (Chthoniobacterales) were more abundant in the Chernevaya taiga samples. This may indicate the specificity of the Chernevaya taiga microbiome and its importance for the features of this biotope.
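LEfSe first screens each taxon with a Kruskal–Wallis test across classes, then applies optional pairwise Wilcoxon tests and linear discriminant analysis to estimate the effect size. As a minimal sketch of only that first step, the H statistic (without tie correction and without a p-value) for one taxon's relative abundances in two groups could be computed as follows; the function name and abundances are hypothetical.

```python
"""Kruskal–Wallis H statistic, the first screening step in LEfSe-style analyses.

Sketch only: no tie correction and no p-value.
"""
from typing import List, Sequence


def kruskal_wallis_h(*groups: Sequence[float]) -> float:
    """H = 12 / (n (n + 1)) * sum(R_g^2 / n_g) - 3 (n + 1), using average ranks."""
    values: List[float] = [v for g in groups for v in g]
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # groups are stored contiguously
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


if __name__ == "__main__":
    control = [0.08, 0.11, 0.09, 0.12]  # hypothetical relative abundance of one taxon
    taiga = [0.02, 0.03, 0.05, 0.04]
    print(round(kruskal_wallis_h(control, taiga), 3))
```

In a full analysis, H would be converted to a p-value (chi-squared with groups minus one degrees of freedom) and only significant taxa would be passed on to the LDA step.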
This work was supported by the Russian Science Foundation (grant ID 19-16-00049).

Posters

Mikhail Rayko

Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University



Tuesday July 28, 2020 12:10 - 12:15 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

12:15 MSK

Comparative analysis of pangenome structure in generalist and specialist bacterial species
A pangenome is the total set of protein-coding genes present in a collection of genomes belonging to one bacterial taxon. The pangenome is commonly divided into three parts: the “core”, genes present in all given genomes; the “unique genome”, strain-specific genes; and the “periphery”, or “accessory genome”, genes present in two or more, but not all, given genomes. Bacterial species differ in their core-to-periphery ratio; however, the source of this difference is still unclear.
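The three-way partition can be computed directly from a gene-family presence/absence table. A minimal sketch, assuming gene families have already been clustered across genomes and using the conventional rule that accessory families occur in at least two but not all genomes; the function names, family names and genome names are hypothetical.

```python
"""Partition gene families into core, accessory and unique pangenome parts."""
from typing import Dict, Set


def partition_pangenome(presence: Dict[str, Set[str]], n_genomes: int) -> Dict[str, str]:
    """Label each gene family by how many of the n_genomes genomes contain it.

    core: all genomes; accessory: at least two but not all; unique: exactly one.
    """
    parts: Dict[str, str] = {}
    for family, genomes in presence.items():
        k = len(genomes)
        if k == n_genomes:
            parts[family] = "core"
        elif k >= 2:
            parts[family] = "accessory"
        elif k == 1:
            parts[family] = "unique"
    return parts


def core_to_accessory_ratio(parts: Dict[str, str]) -> float:
    """Core-to-periphery ratio; infinite if there are no accessory families."""
    core = sum(1 for label in parts.values() if label == "core")
    accessory = sum(1 for label in parts.values() if label == "accessory")
    return core / accessory if accessory else float("inf")
```

Computing this ratio per species and relating it to the number of EMP habitats in which the species is detected would be one way to test the generalist/specialist hypothesis stated in this abstract.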

It has been suggested that the habitat ubiquity of a bacterial species could be a crucial factor defining its pangenome structure. Data on the presence of bacteria in various environments can be obtained from metagenomic experiments. We use metagenomic data from the Earth Microbiome Project (EMP) [1], which is to date the largest collection of metagenomic samples from a wide range of environments and geographic regions, processed according to a unified pipeline.

It is currently debated which part of the pangenome (core or accessory genome) is responsible for bacterial adaptation to a broad spectrum of ecological niches. We hypothesized that members of a bacterial species present in many habitats (a generalist species) need specific accessory genes in order to adapt to the varying conditions of different environments. Conversely, specialist species are restricted to few habitats and are therefore probably more genetically homogeneous, with the core genome predominating in their pangenomes.

This work aims to determine whether bacterial pangenome structure is associated with the number of habitats in which a bacterial species is present. Particular attention is devoted to developing a classification of environments based on the similarity of EMP samples.

1. Thompson L. R. et al. (2017) A communal catalogue reveals Earth’s multiscale microbial diversity. Nature, 551(7681).

Posters

Daria Nikolaeva

Kharkevich Institute for Information Transmission Problems (IITP RAS)



Tuesday July 28, 2020 12:15 - 12:20 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

12:20 MSK

RAD54L mutations lead to adenocarcinomas due to the formation of recombinant DNA intermediates
In a 2005 study, Gary LeRoy et al. demonstrated ATP-dependent Holliday junction (HJ) branch migration activity in an immunopurified FLAG-tagged RecQL1 eluate, using the example of the human gene RAD54L. In addition, the UCSC Genome Browser lists many publications related to the activity of the mutant gene, which indicates the biomedical potential of studying this gene.
Data were obtained from the UCSC Genome Browser. For the gene under consideration, most matches with human mRNAs were found in the NCBI RefSeq, UCSC Genes and GENCODE databases. The arrangement and lengths of most introns and exons (only the initial ones differ in length) correspond to the Human mRNA and Human EST tracks. For the different RefSeq annotations of RAD54L, an accurate comparison with experimental and predicted polyadenylated RNAs is impossible, because no such RNAs are available. For RAD54L, PolyA+ sequencing data from the K562 cell line predominate, and many pseudogenes were found. Most of the transcript is located in the cytosol and the nucleus, and CAGE expression levels are also highest in the nucleus and cytosol.
The results showed that this gene is annotated reasonably correctly. A large number of peptides are observed for RAD54L, and a high level of ribosome profiling signal coincides with the exon structure of the gene; therefore, it is a protein-coding gene. According to COSMIC, multiple mutations covering all exons of the gene have been identified in endometrial and lung tumors. The OMIM database lists the phenotypes adenocarcinoma, colonic, somatic; lymphoma, non-Hodgkin; and breast cancer, invasive ductal.
Based on these results, it can be argued that RAD54L encodes a protein that is an important link in many physiological, functional and pathological processes in the human body. Mutations in this gene are highly likely to lead to pathologies of different origins: the analysis showed an effect on oncogenesis and on the development of somatic diseases. The gene also plays a large role in the development of hereditary diseases. In conclusion, a substantial number of publications describe this gene, which points to promising biomedical research on the prevention and treatment of somatic and oncological diseases in humans.

Posters

Bogdan Shcheglov

Student, Far Eastern Federal University



Tuesday July 28, 2020 12:20 - 12:25 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

12:25 MSK

Comparative analysis of Rhizobiales genomes using ANIClust: species reclassifications and identification of unauthentic genomes and false type strains
Taxonomic decisions in the order Rhizobiales have rested heavily on the interpretation of highly conserved 16S rRNA gene sequences and on error-prone DNA–DNA hybridization experiments. The current pragmatic bacterial species definition considers strains that share at least 95% average nucleotide identity (ANI) to belong to the same species. Here, we computed ANI for 520 genome assemblies of type strains of species from the order Rhizobiales. Thresholds of ANI ≥95% and a fraction of conserved DNA ≥0.7 were used to group the genomes by Maximal Clique Enumeration (MCE). Using this approach, we found that: i) there is synonymy between Aurantimonas manganoxydans and Aurantimonas coralicida, Chelativorans oligotrophicus and Chelativorans multitrophicus, Methylobacterium phyllosphaerae and Methylobacterium oryzae, Rhizobium fabae and Rhizobium pisi, Rhizobium azibense and Rhizobium gallicum, Rhizobium favelukesii and Rhizobium tibeticum, Rhodoplanes piscinae and Rhodoplanes serenus, and among Brucella ceti, Brucella inopinata, Brucella microti, Brucella vulpis and Brucella melitensis; ii) Chelatobacter heintzii is not a synonym of Aminobacter aminovorans; iii) “Rhizobium halotolerans” and “Bartonella mastomydis” should remain not validly published species, as they are synonyms of other, already validated species; iv) Bartonella vinsonii subsp. arupensis and Bartonella vinsonii subsp. berkhoffii do not belong to the same species; v) the genome accessions GCF_003024615.1 for Mesorhizobium loti LMG 6125T, GCF_003024595.1 for Mesorhizobium plurifarium LMG 11892T, GCF_003096615.1 for Methylobacterium organophilum DSM 760T, and GCF_000373025.1 for Rhizobium gallicum R-602 spT lack evidence of authenticity; additionally, we provide an authentic genome for a type strain of R. gallicum, SEMIA 4085T; vi) “Methylobacterium platani” SE2.11, “Xanthobacter autotrophicus” Py2 and “Aminobacter aminovorans” KCTC 2477 represent false type strains.
Information on the general authenticity, completeness and contamination of the entire genome set is provided, which may help researchers select the best genome assembly for their purposes.
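The grouping step can be illustrated with a small sketch: genomes become graph nodes, an edge joins two genomes when both thresholds are met, and maximal cliques are enumerated with the Bron–Kerbosch algorithm with pivoting. ANIClust's actual implementation is not described in the abstract; the input format (one symmetric ANI value and conserved-DNA fraction per pair), the function names and the genome IDs are assumptions.

```python
"""Group genomes into putative species by maximal cliques of an ANI graph (sketch)."""
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(pairs: Iterable[Tuple[str, str, float, float]],
                ani_min: float = 95.0, af_min: float = 0.7) -> Dict[str, Set[str]]:
    """Nodes are genomes; edges join pairs with ANI >= ani_min and aligned fraction >= af_min."""
    graph: Dict[str, Set[str]] = {}
    for a, b, ani, aligned_fraction in pairs:
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if ani >= ani_min and aligned_fraction >= af_min:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Bron–Kerbosch with pivoting; isolated genomes come out as singleton cliques."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        # Choose the pivot that covers most candidates, so fewer branches are explored.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques
```

Because maximal cliques can overlap, a genome may fall into more than one group; such cases are natural candidates for manual inspection.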



Posters

Camila Gazolla Volpiano

Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil



Tuesday July 28, 2020 12:25 - 12:30 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

12:30 MSK

Metagenomic Analysis Using k-mer-based Tools Reveal the Presence of Heavy Metal Response Genes in Cyanobacteria found in Copper Mining Sites of Benguet Province, Philippines
Heavy metal contamination in mining sites inhibits the growth of green vegetation. Fortunately, some photosynthetic autotrophs, the cyanobacteria, can survive the extreme conditions of mine tailings. Surface water samples were collected from three sampling points in each Tailings Storage Facility (TSF) of Philex mines in Benguet Province, Philippines, namely the re-vegetated Philex TSF1 and the currently active Philex TSF3. Genomic DNA was extracted from all water samples and subjected to shotgun sequencing. A total of 72.87 Gbases of raw reads were successfully assembled using the St. Petersburg genome assembler (SPAdes). Default and custom-based approaches for both the CLARK v1.2.5 and Kraken2 metagenomic classifiers were used to assign taxonomy to contigs using k-mer matches. Prokka was used for rapid annotation, and its output coding sequences were submitted to the evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG) mapper for gene ontology analysis. The default CLARK classified a large number of sequences across all sampling points in both re-vegetated and active mining sites. Taxonomic assignments revealed the top five cyanobacteria, namely the unicellular Synechococcus sp., Cyanobium sp., and Gloeobacter sp., the filamentous, non-heterocystous Leptolyngbya sp., and the filamentous, heterocystous Nostoc sp. In contrast, the custom-based CLARK classified only Leptolyngbya sp., which accounted for about 3% to 4% of the assembled contigs. Kraken2 results, on the other hand, revealed the order Nostocales as most dominant, ranging from 0.05% to 0.63% of the classified sequences. The cyanobacterial custom-based Kraken2 revealed a large number of sequences belonging to the filamentous Fischerella sp. and Trichodesmium sp. in Philex TSF1. The unicellular Microcystis sp. and the filamentous Nostoc sp., Spirulina sp., and Pseudanabaena sp. dominated the active Philex TSF3 site.
CLARK was able to discriminate cyanobacteria down to the species level, while the default Kraken2 classifier could distinguish them only down to the dominant order. Although the custom-based CLARK detected more cyanobacteria at the order level than Kraken2, it was able to determine only a single cyanobacterium at the genus level. Kraken2 revealed varying cyanobacterial identifications across sites, while CLARK consistently identified the same cyanobacterial species at all sites. Evaluation of the Prokka protein-coding sequences with eggNOG revealed genes conferring stress response to Cu2+, Zn2+, Pb2+, Cd2+ and Ca2+ metal ions, as well as the smt metallothionein. These genes are reported to be responsible for the efflux/transport functions and heavy metal resistance that may be major attributes enabling cyanobacterial species to survive extreme metal conditions. Enhanced growth of Leptolyngbya sp. might also lead to the formation of viable biological crusts, initiating a re-vegetation process. This is the first report, based on successfully assembled and analyzed shotgun metagenomic data, of filamentous cyanobacteria dominating the copper and gold mine tailings in Benguet Province.
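Both CLARK and Kraken2 assign sequences by exact k-mer matching against a reference database. The minimal sketch below illustrates the CLARK-style idea under simplifying assumptions: a toy in-memory reference set, forward-strand k-mers only, and a "discriminative" index that keeps only k-mers unique to a single target. It is not the implementation of either tool (Kraken2, for example, resolves shared k-mers to a lowest common ancestor instead of discarding them), and the reference sequences are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_discriminative_index(references, k):
    """Map k-mer -> target label, keeping only k-mers that occur in exactly one target."""
    owners = defaultdict(set)
    for label, seq in references.items():
        for km in kmers(seq, k):
            owners[km].add(label)
    return {km: next(iter(labels)) for km, labels in owners.items() if len(labels) == 1}


def classify(contig, index, k):
    """Return the target with the most discriminative k-mer hits, or None if there are none."""
    hits = Counter(index[km] for km in kmers(contig, k) if km in index)
    return hits.most_common(1)[0][0] if hits else None


# Toy references (hypothetical sequences, not real genomes).
refs = {"taxonA": "ACGTACGTTT", "taxonB": "GGGCCCAAAT"}
idx = build_discriminative_index(refs, k=4)
print(classify("ACGTACG", idx, k=4))  # taxonA
```

Real classifiers use much longer, canonical (strand-independent) k-mers and compact on-disk databases; the sketch only shows why a custom database of fewer targets changes which sequences can be assigned at all.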



Posters
Libertine Rose S. Sanchez

Postdoctoral Research Fellow (Metagenomics, Metabarcoding), Institute of Biology, University of the Philippines Diliman
Plant Genetics and Cyanobacterial Biotechnology Laboratory; CIP Researcher, National Institute of Molecular Biology and Biotechnology



Tuesday July 28, 2020 12:30 - 12:35 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

12:35 MSK

Analysis of genetic variants associated with infection of Mycobacterium leprae in exomes of Mexican mestizo population
Leprosy, also known as Hansen's disease, is a chronic infectious disease caused by the bacillus Mycobacterium leprae that mainly affects the peripheral nerves and the skin. The disease has accompanied humankind for at least 4,000 years and throughout that time has been one of the most dangerous diseases worldwide.

M. leprae has low virulence: its transmission requires close and prolonged contact with a patient, and acquiring the disease also requires a genetic predisposition. Mira and collaborators identified a locus within the PARK/PACRG gene region that is associated with susceptibility to leprosy in human populations. In mice, the NRAMP1 gene on chromosome 1 has been identified as controlling both susceptibility and resistance to intracellular pathogens.

Several reports of single nucleotide polymorphisms (SNPs) associated with resistance or susceptibility to M. leprae pathogenesis have emerged worldwide. In this project, we therefore compile a database of SNPs associated with M. leprae pathogenesis identified in all populations and compare it with the SNPs identified in the "SIGMA T2D" project, a study of type 2 diabetes that sequenced the exomes of around 3,700 individuals of Latin American, specifically Mexican, descent. This comparison selects the M. leprae-associated SNPs most likely to be present in exomes of the Mexican population for in vitro study, without the need to sequence complete exomes.

Mexico is currently among the 15 countries with the highest incidence of leprosy worldwide, and Sinaloa is one of the most affected states, ranking first in the nation with 150 registered leprosy cases at the end of 2019. Identifying genetic markers that aid disease prevention is therefore vital for developing eradication strategies.

Posters
Miguel Elenes

PhD Student, Universidad Autónoma de Sinaloa



Tuesday July 28, 2020 12:35 - 12:40 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

12:40 MSK

Genome assembly of heterozygous tropical trees - will the real (pan)genome stand up?
High-throughput sequencing has the potential to greatly enhance population genetics studies through its power to genotype individuals at multiple loci. Although methods exist for obtaining genotypes and polymorphism data without a reference genome, having a reference sequence, at least for the single-copy regions of the genome, helps when comparing diversity parameters across species. Because the individual genomes are highly heterozygous, genome assembly programmes struggle to deliver an acceptable reference sequence even for the non-repetitive part of the genome. Platanus, Platanus-allee, SPAdes and Meraculous were compared for their genome assembly capabilities. None of the programmes delivered a usable reference genome from the sequence data, which consisted of approximately 12-25× genome coverage of tropical tree species sequenced as paired-end Illumina reads of 101 or 150 nucleotides.
Through k-mer frequency analysis at multiple k-mer lengths, the genomes are shown to be highly heterozygous. The k-mer length at which the peak frequency of homozygous k-mers equals the peak frequency of heterozygous k-mers is proposed as a reliable measure for comparing the level of polymorphism across heterozygous species. This measure fails, however, when heterozygosity is lower and a k-mer longer than the read length would be needed to detect equal peak frequencies.
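The proposed measure can be sketched in three steps: build a k-mer spectrum (how many distinct k-mers occur at each multiplicity), locate its peaks, and find the k at which the heterozygous and homozygous peak heights cross. The sketch below is a minimal illustration under stated assumptions: forward-strand k-mers only, a simple local-maximum peak finder, and linear interpolation between tested k values. The function names are hypothetical, and reading peak heights off each spectrum is left to the user.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Return {multiplicity: number of distinct k-mers observed that many times}."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return Counter(counts.values())


def local_maxima(spectrum, min_mult=2):
    """Multiplicities whose height exceeds both neighbours; m < min_mult (error k-mers) is ignored."""
    return [m for m in sorted(spectrum)
            if m >= min_mult
            and spectrum[m] > spectrum.get(m - 1, 0)
            and spectrum[m] > spectrum.get(m + 1, 0)]


def equal_peak_k(ks, het_heights, hom_heights):
    """Linearly interpolate the k at which heterozygous and homozygous peak heights are equal.

    Returns None when the heights never cross over the tested k values, which
    mirrors the case where a k longer than the read length would be needed.
    """
    diffs = [h - o for h, o in zip(het_heights, hom_heights)]
    for i in range(len(ks) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return float(ks[i])
        if d0 * d1 < 0:
            return ks[i] + (ks[i + 1] - ks[i]) * d0 / (d0 - d1)
    return float(ks[-1]) if diffs and diffs[-1] == 0 else None


print(equal_peak_k([21, 41], [100, 300], [300, 100]))  # 31.0
```

Longer k-mers are more likely to span a heterozygous site, so the heterozygous peak grows with k; the crossing point therefore shifts to shorter k as heterozygosity increases.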
Using “haplotype-specific k-mer walking”, contigs longer than 10,000 bp could be reconstructed for the two haplotypes at a number of loci in several species (Xylia xylocarpa, Gluta usitata, Dipterocarpus tuberculatus). The two haplotypes in a single individual generally differ by about 1 polymorphism per 100 bp, with more in the intergenic regions, an intermediate level in the introns and fewer in the exons. About 70% of polymorphisms are simple SNPs, about 5% are insertions/deletions of 1 to several hundred nucleotides, and the rest are variations in repeat number, mostly of mononucleotide repeats. A sufficient number of read pairs contain two polymorphisms to phase most polymorphic sites. Because the sequencing reads are derived from fragments with a target insert size of 500 bp, phasing breaks when polymorphisms are more than 500 bp apart, although the sequences still connect. The walking process needs to be further automated and parallelized to have any chance of building a more or less complete view of the single-copy region of the genome.
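Phasing from read pairs that span two polymorphisms can be illustrated with a small sketch. It assumes, hypothetically, that each read pair has already been reduced to a dictionary of {variant position: allele} with alleles coded 0/1; this is not the authors' data structure. Consecutive variant sites are linked when fragments covering both vote for the same (cis) or opposite (trans) allele, and a site with no net linking evidence starts a new phase block, mirroring how phasing breaks when polymorphisms lie farther apart than the insert size.

```python
from collections import defaultdict


def phase(fragments):
    """Greedy phasing of consecutive variant sites.

    fragments: list of {position: allele} dicts, allele in {0, 1}.
    Returns a list of phase blocks, each a list of (position, allele on haplotype 1).
    """
    votes = defaultdict(int)  # (site_a, site_b) -> net vote: +1 same allele, -1 different
    sites = set()
    for frag in fragments:
        pos = sorted(frag)
        sites.update(pos)
        for a, b in zip(pos, pos[1:]):
            votes[(a, b)] += 1 if frag[a] == frag[b] else -1

    blocks, current = [], []
    for s in sorted(sites):
        if not current:
            current = [(s, 0)]  # arbitrarily put allele 0 on haplotype 1
            continue
        prev, prev_allele = current[-1]
        v = votes.get((prev, s), 0)
        if v == 0:  # no (or tied) linking evidence: close the block
            blocks.append(current)
            current = [(s, 0)]
        else:
            current.append((s, prev_allele if v > 0 else 1 - prev_allele))
    if current:
        blocks.append(current)
    return blocks


reads = [{100: 0, 150: 0}, {150: 1, 220: 0}, {900: 1}]
print(phase(reads))  # [[(100, 0), (150, 0), (220, 1)], [(900, 0)]]
```

Only adjacent sites are linked here; a fuller approach would also use fragments that skip intermediate sites and weigh conflicting votes.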
The data from moderate to low coverage Illumina genome sequencing contain sufficient information for the assembly of long contigs representing the two haplotypes derived from the two genomes in heterozygous individuals.

Posters
Hugo Volkaert

Principal investigator, Center for Agricultural Biotechnology, Kamphaeng Saen Campus, Nakhon Pathom 73140, Thailand
I am a forest ecologist and population geneticist hoping to use DNA sequence data to study the evolution and adaptation of tropical trees to their environment. Currently trying to assemble tree genomes on a shoestring budget, using low-coverage shotgun sequencing of single libraries...



Tuesday July 28, 2020 12:40 - 12:45 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

12:45 MSK

Genome-wide characterization of postinfectious functional dyspepsia-associated antibiotic-resistant Escherichia coli isolates from Mexico
Functional dyspepsia (FD) is one of the most common functional gastrointestinal disorders and affects more than 20% of the global population. FD is defined by the presence of fullness, epigastric pain or a burning sensation, with no evidence of organic, metabolic or systemic disease that would explain these symptoms. The exact etiology of FD is not clearly understood. However, gastrointestinal infection is one of the risk factors associated with developing this condition, and several pathogens have been implicated, Escherichia coli among them. E. coli is a rod-shaped Gram-negative bacterium commonly found as a commensal in the human microbiota; however, its genome plasticity has driven the evolution of pathogenic strains and the acquisition of antibiotic-resistance properties. In this study, whole-genome sequencing (WGS) was used for the molecular characterization of two antibiotic-resistant E. coli isolates collected from postinfectious FD patients. Genomic DNA was extracted using a ZymoBIOMICS DNA Miniprep Kit and sequenced on an Illumina MiniSeq (2x150 PE). De novo assemblies generated by the SPAdes v.3.12 and A5 assemblers were combined with the Mix tool to generate final draft assemblies, which were then scaffolded using the Medusa server. The draft genome sequences were annotated using Prokka v.1.12 and analyzed for phylogroup, multilocus sequence typing (MLST), serotype, plasmid replicons, acquired antimicrobial resistance and virulence-associated genes using MLST v.2.0, SerotypeFinder v.2.0, PlasmidFinder v.2.0, ResFinder v.3.2 and a BLASTn search against the Virulence Factors Database (VFDB), respectively. According to in silico typing, strains EC-FD20-2 and EC-FD21-2 were classified as ST399-O13:H30 and ST69-O17/O77:H18, respectively. A total of 9 genes conferring resistance to aminoglycosides, quinolones, macrolides, phenicols, sulphonamides, tetracycline and trimethoprim were identified.
Neither β-lactamase genes nor mutations in the quinolone resistance-determining region (QRDR) were detected. A class 1 integron linked to an IncFII-type plasmid was identified in the EC-FD21-2 genome. The WGS analysis revealed that both E. coli strains harbored virulence-associated genes; nonetheless, the EC-FD21-2 genome encoded different adherence and iron uptake systems compared with the EC-FD20-2 genome. Additionally, EC-FD21-2 carried the virulence factors increased serum survival protein (iss), endonuclease colicin E2 (celb) and enteroaggregative immunoglobulin repeat protein (air), giving insight into its advantages in host colonization and adaptation. To the best of our knowledge, this is the first WGS characterization of antibiotic-resistant E. coli isolates recovered from postinfectious FD patients in Mexico. The genomic data revealed the basis of antibiotic resistance and the pathogenic potential of these E. coli strains, allowing their accurate characterization.

Posters
José Antonio Magaña-Lizárraga

Doctoral student, Unidad de Investigaciones en Salud Pública “Dra. Kaethe Willms”, Facultad de Ciencias Químico Biológicas, Universidad Au



Tuesday July 28, 2020 12:45 - 12:50 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

12:50 MSK

Strain specific traits in the protein production associated with vegetative cells-to-spores transition in Bacillus thuringiensis
Organic agriculture and the trend toward reducing the use of chemical pesticides require the development of biological pest-control methods. These include using natural insect pathogens such as Bacillus thuringiensis, a Gram-positive bacterium that produces a great variety of toxins of proteinaceous and non-proteinaceous nature. Among them, the highly specific crystal-forming Cry-toxins accumulate upon the transition from vegetative cells to spores. The set of Cry-toxins produced by each B. thuringiensis strain is remarkably diverse and determines the host specificity of the strain. The Cry-toxin genes are harbored on different plasmids, which also contain genes encoding proteins involved in sporulation. B. thuringiensis strains differ in the number of plasmids in their genomes, but the strain specificity of the proteins produced in spores and vegetative cells remains poorly studied at the proteomic level. In this study, we used HPLC-Orbitrap-MS proteomics to quantitatively compare protein production at two stages, vegetative cells and spores, in three B. thuringiensis serovars: var. thuringiensis, var. darmstadiensis and var. israelensis. We also compared B. thuringiensis var. israelensis with a strain of the same serovar that lacked the ability to produce Cry-toxins. As expected, Cry-toxins were identified at the spore stage in all strains except the one that could not produce them. We also identified a set of proteins differentially expressed at the spore stage, including spore coat proteins, flotillin-like proteins and exosporium proteins. These proteins participate in cell differentiation and exosporium attachment to the spore. Taken together, the data obtained in this study revealed differences between the proteomes of B. thuringiensis strains at the vegetative cell and spore stages and showed similar patterns of protein production across different serovars.
This work was supported by the Russian Foundation for Basic Research (Grant No 20-316-70020).

Posters
Kirill Antonets

All-Russia Research Institute for Agricultural Microbiology, Saint Petersburg State University



Tuesday July 28, 2020 12:50 - 12:55 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

12:55 MSK

Metagenomic analysis of virus diversity in cave water habitats
Aquatic viruses have been extensively studied over the past decade, yet virus communities in cave waters remain poorly described. Our goal was to characterize the viromes of cave water sampled in oligotrophic environments inhabited by Proteus anguinus, also known as the olm or European cave salamander. Because they depend on water habitats that are often sensitive, amphibian species are vulnerable to a variety of threats, including viral infections.
Water samples (5 litres) from 7 different locations in an underground cave system in Slovenia were first concentrated using CIM monolithic chromatography, a method that can efficiently concentrate viruses from high-volume water samples. We then used shotgun high-throughput sequencing, followed by a direct similarity search of the sequencing reads against a comprehensive protein database and subsequent taxonomic classification and visualization. Reads classified as Caudovirales bacteriophages were the most abundant in all cave water samples. Nucleocytoplasmic large DNA viruses from the Asfarviridae, Iridoviridae, Mimiviridae and Phycodnaviridae families were also abundantly detected, together with virophages (Lavidaviridae), which require coinfection with a giant DNA virus. ssDNA phages from the Inoviridae and Microviridae families were detected, as well as sequences of eukaryotic circular Rep-encoding single-stranded (CRESS) DNA viruses. Sequences of ssRNA plant-infecting viruses were abundant in some cave water samples, some of them possibly reflecting anthropogenic contamination. Targeted qPCR detection of ranaviruses (family Iridoviridae), the main viral threat to amphibian diversity, gave negative results, and these viruses were also not detected in the metagenomic analysis of the cave water samples.
Overall, our findings provide insight into cave water viromes, describing the common virus community of a karstic underground cave system and identifying site-specific differences in the pathogens and viral indicators detected.

Posters
Katarina Bačnik

National Institute of Biology, IPS "Jožef Stefan" Slovenia



Tuesday July 28, 2020 12:55 - 13:00 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

13:00 MSK

Reducing redundancy of input data sets to improve inference of transcription factor binding sites
The majority of bacterial genome annotations lack information about transcription factor (TF) binding sites (operators), which control how genomic information is expressed. We are developing an application, SigmoID, to solve this problem in a highly automated fashion. SigmoID can both discover unknown operator motifs and annotate matching operators in genomic sequences. In brief, the motif discovery algorithm involves analysing 3D structures of TF-operator complexes, finding TFs with the same contacts between operators and DNA-binding domains, and then looking for autoregulatory operator motifs in the promoter regions surrounding the genes encoding these TFs (more detail will be provided in the accompanying talk).
The success of motif discovery strongly depends on the diversity of the promoter region dataset. Assembling appropriate datasets proved challenging due to the large size and rapid expansion of protein databases. This report describes our solution to this problem.
The first step of our pipeline is finding TFs homologous to the one being studied and selecting the homologues with an identical specificity determinant, or CR-tag (the amino acid residues that specifically contact operator bases). This stage proved highly unreliable when public phmmer or blastp servers were used. Local searches require a fast workstation and maintaining large databases, which is undesirable given the target audience (bench scientists). Also, many thousands of homologous proteins with a matching CR-tag are expected for many TFs, while usually no more than 30-50 are required. We have therefore replaced the problematic database search step with fast lookup tables. The tables match each CR-tag to the IDs of all proteins carrying that tag. They are generated once for each protein family by running hmmsearch and determining the CR-tag of each hit. The excessive redundancy problem was mostly solved by using the reference proteome databases provided by PIR. For each protein family, five lookup tables were built: from the full protein database and from reference proteomes at 75%, 55%, 35% and 15% co-membership thresholds.
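The lookup-table idea can be sketched as follows, assuming that each hmmsearch hit has already been aligned to a common profile so that the DNA-contacting residues sit at known alignment columns. The column indices, protein IDs and sequences below are hypothetical, and the function names are illustrative rather than SigmoID's actual code.

```python
from collections import defaultdict


def cr_tag(aligned_seq, contact_columns):
    """Concatenate the residues at the DNA-contacting alignment columns."""
    return "".join(aligned_seq[c] for c in contact_columns)


def build_lookup(hits, contact_columns):
    """hits: {protein_id: aligned sequence}. Returns {CR-tag: [protein IDs]}."""
    table = defaultdict(list)
    for pid, aln in hits.items():
        table[cr_tag(aln, contact_columns)].append(pid)
    return dict(table)


def homologues_for(query_aln, table, contact_columns, max_n=50):
    """Return up to max_n protein IDs that share the query's CR-tag."""
    return table.get(cr_tag(query_aln, contact_columns), [])[:max_n]


# Hypothetical aligned hits; columns 2 and 3 stand in for the contact positions.
columns = [2, 3]
table = build_lookup({"P1": "MKRLAS", "P2": "MKQLAT", "P3": "AKRLGS"}, columns)
print(table)                                     # {'RL': ['P1', 'P3'], 'QL': ['P2']}
print(homologues_for("XXRLXX", table, columns))  # ['P1', 'P3']
```

Because the table is computed once per protein family, a query then costs a single dictionary lookup instead of a database search.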
The optimal number of homologues can often be achieved simply by taking protein IDs from one of the five lookup tables. When the number of homologues is still excessive, an additional clustering stage is performed after the corresponding promoter regions are extracted. We found MeShClust (doi:10.1093/nar/gky315) to be the optimal tool at this stage.
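Choosing among the five tables can be sketched as below. This assumes, hypothetically, that the tables are ordered from least to most reduced (full database, then 75%, 55%, 35% and 15% reference proteomes) and that the goal is the largest homologue set not exceeding an upper bound such as 50; the function name and table labels are illustrative, not SigmoID's API.

```python
def pick_homologues(tables, tag, max_n=50):
    """Return (table_name, ids, needs_clustering) for a CR-tag.

    tables: list of (name, {CR-tag: [protein IDs]}) ordered from least reduced
    to most reduced. The first table whose hit list fits within max_n is used;
    if even the most reduced table is too large, its IDs are returned and the
    extra clustering stage is flagged.
    """
    for name, table in tables:
        ids = table.get(tag, [])
        if len(ids) <= max_n:
            return name, ids, False
    name, table = tables[-1]
    return name, table.get(tag, []), True


# Hypothetical tables for one TF family.
tables = [
    ("full", {"RL": [f"P{i}" for i in range(200)]}),
    ("rp75", {"RL": [f"P{i}" for i in range(80)]}),
    ("rp55", {"RL": [f"P{i}" for i in range(40)]}),
]
name, ids, cluster = pick_homologues(tables, "RL")
print(name, len(ids), cluster)  # rp55 40 False
```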
The efficiency of different clustering approaches and database search options was tested by inferring operator motifs for E. coli TFs from several protein families. The double clustering approach proved to be the fastest and produced better motifs in some cases, as it did not have to resort to random selection of suboptimal promoter regions when their number was excessive. We also noticed many cases in which SigmoID produced realistic motifs (matching experimental data and suitable for genome-wide search) where such a motif was absent from the RegulonDB database or was incorrect there.
The SigmoID v2 software with CR-tag lookup tables for 13 TF families is available at github.com/nikolaichik/SigmoID.

Posters
Pavel Vychik

Belarusian State University



Tuesday July 28, 2020 13:00 - 13:05 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

13:05 MSK

Computational tools for the de novo assembly of bacterial genomes: a comparative study. 
The emergence of next-generation sequencing (NGS) technologies has radically transformed the techniques for identifying the nitrogenous bases of DNA and has boosted scientific research across the globe, especially the sequencing and genetic analysis of bacteria.
The post-sequencing step, the computational assembly of DNA fragments, is a complex process that depends heavily on the platform used to sequence the organisms, especially on the methodology each platform adopts to obtain the biological data. This dependence, combined with several other factors, has led to the large-scale development of bacterial genome assemblers, each with its own configuration, instructions and operating parameters. In addition, some of these programs are difficult to install, their long manuals take time to understand, and users are sometimes unable to achieve the expected result, either because of the input data set or because they do not know how the program operates. Thus, besides dealing with complex biological factors, bioinformatics professionals also need technical computing knowledge to choose the ideal tool for their project.
Furthermore, the correct choice of assembly tool is critical to the success of a study. However, the scarcity of recent work addressing the performance and usability of these programs makes choosing an assembler difficult. This work therefore provides information on the performance, precision and usability of 7 bacterial genome assemblers, which were compared using six SRA samples from the second-generation Ion Personal Genome Machine (Ion PGM) platform.
In our study, we describe in detail how the input data influence the performance of the programs and the final quality of their assemblies. To this end, we used quality metrics that allowed us to assess the accuracy of the results and to analyze the performance and behavior of the software. Moreover, when evaluating overall usability and ease of deployment, we found that programs run directly from the terminal are easier to use than those that rely on configuration files, as the latter require more time to understand the workflow.
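The abstract does not list the quality metrics used. As an illustration only, N50, a contiguity metric commonly reported by assembly-evaluation tools such as QUAST, can be computed as below; whether the authors relied on N50 is an assumption.

```python
def n50(lengths):
    """Length of the contig at which the cumulative length, summed from the
    longest contig down, first reaches at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy contig lengths (total 54)
print(n50(contigs))  # 8, since 10 + 9 + 8 = 27 is half of 54
```

N50 captures contiguity only; a fragmented but correct assembly can score lower than a misassembled one, which is why accuracy metrics are usually reported alongside it.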
Ultimately, this research will help bioinformatics professionals choose the tool that best meets the needs of their project and will contribute to the advancement of bacterial genome assembly techniques. The study thus provides a useful source of information for researchers in computational biology.

Posters
Gustavo Silva

Instituto Federal de Educação, Ciência e Tecnologia da Bahia (IFBA) - Campus Seabra
Matheus Brito de Oliveira

Teacher, IFBA
Master in Applied Computing



Tuesday July 28, 2020 13:05 - 13:10 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

13:10 MSK

First insights into transcriptome-derived SNPs of two cryptic species, Lasiopodomys gregalis and L. raddei (Cricetidae, Rodentia)
The recently described narrow-headed vole species complex consists of two cryptic species: Lasiopodomys gregalis, widespread in the Palearctic realm, and Lasiopodomys raddei, whose distribution range is confined to south-eastern Transbaikalia.
Next-generation sequencing is a powerful tool that could provide clues to cryptic speciation processes. In this study, we used 6 RNA-seq data sets (3 for each species) to find and annotate variants that may contribute to the speciation of L. raddei and L. gregalis.
The raw reads were assessed with FastQC and trimmed with Trimmomatic. The reference transcriptome was built from one L. raddei RNA-seq data set assembled with Trinity. For the final reference, we chose contigs that matched any mammalian gene (DIAMOND BLASTx against NCBI nr, E-value < 10⁻⁵). Reads were aligned with the bwa-mem algorithm, resulting in a mean coverage of 96%. The .bam files were sorted and filtered with Picard, and variant calling was performed with GATK HaplotypeCaller.
The final list included 904 SNPs contrasting between L. gregalis and L. raddei. Contigs carrying these SNPs were annotated using eggNOG mapper and the Gene Ontology Resource (http://geneontology.org/).
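The abstract does not define "contrasting" precisely. A minimal sketch, assuming it means fixed homozygous differences (all samples of one species share one homozygous genotype and all samples of the other species share the other), could filter VCF-style genotype strings as follows; the site IDs and sample names are hypothetical.

```python
HOMOZYGOUS = {"0/0", "1/1"}


def contrasting_sites(genotypes, group_a, group_b):
    """Return sites where every sample in each group shares one homozygous
    genotype and the two groups carry different alleles.

    genotypes: {site_id: {sample_name: 'GT'}}; a missing call counts as None
    and therefore never passes the filter.
    """
    kept = []
    for site, calls in genotypes.items():
        a = {calls.get(s) for s in group_a}
        b = {calls.get(s) for s in group_b}
        if len(a) == 1 and len(b) == 1:
            ga, gb = next(iter(a)), next(iter(b))
            if ga in HOMOZYGOUS and gb in HOMOZYGOUS and ga != gb:
                kept.append(site)
    return kept


gregalis, raddei = ["g1", "g2", "g3"], ["r1", "r2", "r3"]
genotypes = {
    "contig1:100": {"g1": "0/0", "g2": "0/0", "g3": "0/0", "r1": "1/1", "r2": "1/1", "r3": "1/1"},
    "contig1:200": {"g1": "0/0", "g2": "0/1", "g3": "0/0", "r1": "1/1", "r2": "1/1", "r3": "1/1"},
    "contig2:50": {"g1": "0/0", "g2": "0/0", "g3": "0/0", "r1": "0/0", "r2": "0/0", "r3": "0/0"},
}
print(contrasting_sites(genotypes, gregalis, raddei))  # ['contig1:100']
```

With only three samples per species, a stricter or looser definition (for example, allowing one missing call) would change the count, so the filter should match whatever rule was actually applied.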
Our search revealed at least 73 genes involved in reproduction, cell division, and developmental and metabolic processes. For example, the identified genes included Bag6, involved in spermatogenesis; the replication protein RPA2, which also plays a role in reproductive processes; KLHL21 (Kelch-like protein), required for efficient chromosome alignment and cytokinesis; and Dnm1l (dynamin-1-like protein gene), which functions in mitochondrial and peroxisomal division.
The study was supported by the Russian Foundation for Basic Research grant No 18-34-20118 and Russian Academy of Sciences research program No АААА-А19-119020790106-0.

Posters
Maria Skazina

Researcher, Bioinformatics Institute; Saint-Petersburg State University, Department of Applied Ecology



Tuesday July 28, 2020 13:10 - 13:15 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

13:15 MSK

Transcriptome analysis of the grass frog Rana temporaria
Studies of the oogenesis stage in cloning experiments have shown that genome reprogramming depends on the composition of the ooplasm and nucleoplasm. A comparison of X. laevis nuclear transcripts (without the karyosphere) with ooplasm transcripts revealed that coding genes predominate in the ooplasm, whereas transcripts from repeated DNA predominate in the nucleoplasm. Many DNA repeats are actively transcribed in early embryogenesis, when the parental genome is activated, and when their transcription is inhibited, further embryonic development is impossible. The adhesiveness of tandem repeats is assumed to play a crucial role in the movement of chromosomal regions in the interphase nucleus.
The cDNA obtained from the extracted RNA was sequenced on an Illumina HiSeq 2500, yielding 11,552,749 read pairs. The reads were cleaned of adapters and optical duplicates and then assembled using the Transdecoder program. Of the assembled transcripts, 3,961 (14.6%) were annotated. Most (93.48%) of the unassembled reads were mapped with STAR to the draft R. temporaria genome assembly (NCBI assembly: GCA_009802015.1). We also compared all assembled contigs with the NCBI Nucleotide database but found matches for only 4,078 sequences (9.91%). We found that, despite the availability of a draft genome assembly, most of the transcripts are absent from nucleotide databases. The most frequent repeats (including satellite DNA) had predicted sizes of 5 to 114 Mb; most of them are not described in databases, except for CR1 and the frog major satellite S1a. We present the first qualitative and quantitative analysis of the karyosphere transcriptome.

Posters
Victoria Dikaya

Applied Genomics Laboratory, SCAMT Institute, ITMO University, Saint Petersburg, Russia



Tuesday July 28, 2020 13:15 - 13:20 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

13:20 MSK

Genomic diversity and phenotypic characterization of drug resistance in Mycobacterium abscessus complex between Colombian and U.S. strains
Background: Mycobacterium abscessus complex (MABC) is the most common etiological agent of lung disease caused by rapidly growing mycobacteria. MABC infection is particularly important in patients with cystic fibrosis, leading to an accelerated decline in lung function and poor clinical outcomes. The MABC comprises M. abscessus subsp. abscessus, M. abscessus subsp. massiliense and M. abscessus subsp. bolletii; differentiating them is important because their susceptibility profiles differ. MABC is considered a multidrug-resistant pathogen, with cure rates of 30 to 50%. The aim of this study was to determine the antimicrobial susceptibility profiles of Colombian isolates and to compare their genomic variants related to resistance and diversity with those of U.S. isolates of previously known subspecies.

Methods: Seventeen respiratory isolates were selected and cultured in Middlebrook 7H9/11. Drug susceptibility testing (DST) for clarithromycin, amikacin and cefoxitin was performed using the broth microdilution method; the minimum inhibitory concentration was determined on day 3 and day 14 for clarithromycin, and on day 3 for amikacin and cefoxitin. DNA was isolated using the FastDNA Spin kit (MP Biomedicals). DNA samples were sent to Yale University for whole genome sequencing on an Illumina HiSeq 2500 (2x150 bp). Reads were preprocessed with Trimmomatic, and their quality was assessed with FastQC. We randomly selected genomic data from eleven respiratory isolates from the U.S. Reads were mapped to ATCC 19977 with Bowtie2. SNPs and indels were called using Samtools and Bcftools. Multiple genome alignment was performed with Mauve, and a maximum-likelihood SNP phylogenomic tree was built using snp-sites and IQtree.

Results: We differentiated the MABC members into the three subspecies by building a midpoint-rooted tree from the 232,501 SNPs detected. The tree revealed the presence of all three subspecies among the Colombian isolates: two M. abscessus subsp. massiliense, four M. abscessus subsp. bolletii and 11 M. abscessus subsp. abscessus. DST for clarithromycin showed that 5/17 (29%) isolates were susceptible and 8/17 (47%) were resistant on day 3; on day 14, only 4/17 (24%) remained susceptible and 7/17 (41%) were resistant. For amikacin, 14/17 (82%) were susceptible and none of the isolates were resistant. Using IGV to inspect variants, we observed a 274 bp deletion (positions 159-432 of the erm(41) gene) in isolates MAB205 and MAB1050 (Colombia) and FLAC005, FLAC006, FALC054, FLAC055 and FLAC030 (U.S.). Additionally, we identified different SNP patterns between the two populations, e.g., a new allele at position 2,346,018 in the erm(41) gene and deletions at position 4,080 in the MAB_4039c gene, both in U.S. strains.
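The susceptibility percentages are simple proportions of the 17 Colombian isolates. A quick arithmetic check, rounding to the nearest whole percent, is sketched below; note that 4/17 is 23.5%, which rounds to 24%. The labels are descriptive only.

```python
def pct(n, total=17):
    """Share of the 17 Colombian isolates, as a whole-number percentage."""
    return round(100 * n / total)


counts = {
    "clarithromycin, day 3, susceptible": 5,
    "clarithromycin, day 3, resistant": 8,
    "clarithromycin, day 14, susceptible": 4,
    "clarithromycin, day 14, resistant": 7,
    "amikacin, susceptible": 14,
}
for label, n in counts.items():
    print(f"{label}: {n}/17 = {pct(n)}%")
```

Susceptible and resistant counts do not sum to 17 on either day, so the remaining isolates (4 on day 3, 6 on day 14) fall in neither category.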

Conclusions: Our findings support the differentiation of MABC into three subspecies. Amikacin was the most effective of the three antimicrobials tested. Susceptibility to clarithromycin was very low, which could be explained by the low prevalence of M. abscessus subsp. massiliense among our isolates. We found SNPs and indels related to resistance and to population traits; the latter could be used as geographic markers. However, given the small number of isolates studied, further studies are needed to confirm these findings and to explore additional differences between MABC from the two regions.

Posters
Laura Natalia Victoria

Corporaciones para investigaciones biológicas CIB



Tuesday July 28, 2020 13:20 - 13:25 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

13:25 MSK

Transcriptome analysis revealed abnormalities in skeletal muscle regeneration affected by mutations in LMNA gene
Introduction. The LMNA gene encodes lamins A and C, which form the nuclear lamina. The lamina lies at the nuclear periphery, maintains nuclear structure, controls chromatin organization, and participates in gene expression and cell division. Mutations in LMNA cause diseases called laminopathies, which include muscular dystrophies, cardiomyopathies, lipodystrophies, neuropathies and premature aging syndromes. Interestingly, many laminopathies are muscle-specific and manifest in adulthood. The molecular mechanisms of disease development and progression remain unknown despite a large number of publications on this topic. In this work, we investigate the lamin mutations G232E, which results in Emery-Dreifuss muscular dystrophy type 2, and R482L, which is associated with familial partial lipodystrophy type 2. The aim of our study was to investigate the effect of the G232E and R482L mutations in the LMNA gene on skeletal muscle regeneration and function in vitro using transcriptome sequencing.
Methods. Mouse C2C12 myoblasts (skeletal muscle stem cells) were transduced with lentiviruses carrying human LMNA variants: WT (control), G232E and R482L. Infection efficiency was assessed by immunocytochemical staining with antibodies against human lamin. Myoblasts were differentiated in the myogenic direction using low-serum medium (2% HS). RNA was collected on days 0, 2 and 4 (d0, d2, d4) of differentiation, with each time point in triplicate. Libraries for RNA sequencing were prepared using the TruSeq kit and sequenced on an Illumina HiSeq 2500. Raw data were aligned to the mouse genome GRCm38 with the GENCODE vM22 annotation, and reads were counted with featureCounts. Differential expression (DE) and pathway analyses were performed in R using the DESeq2 and fgsea packages. Significance thresholds were FDR = 1% and log2FC > 1 for differentially expressed genes (DEGs), and FDR = 5% for pathways.
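The DEG selection rule can be sketched as a filter over a DESeq2 results table. This is a minimal illustration, assuming the results have been exported as rows carrying DESeq2's `log2FoldChange` and `padj` columns plus a gene identifier, and assuming the fold-change threshold applies to its absolute value (the abstract's "log2FC > 1" could also mean upregulated genes only). Gene names in the example are hypothetical.

```python
def select_degs(rows, fdr=0.01, min_abs_lfc=1.0):
    """Return gene IDs passing the FDR and |log2 fold change| thresholds.

    rows: iterable of dicts with keys 'gene', 'padj', 'log2FoldChange'.
    padj may be None, because DESeq2 reports NA for some genes
    (for example after independent filtering); such genes are skipped.
    """
    return [r["gene"] for r in rows
            if r["padj"] is not None
            and r["padj"] < fdr
            and abs(r["log2FoldChange"]) > min_abs_lfc]


results = [
    {"gene": "geneA", "padj": 0.001, "log2FoldChange": 2.0},
    {"gene": "geneB", "padj": 0.001, "log2FoldChange": -1.5},
    {"gene": "geneC", "padj": 0.2, "log2FoldChange": 3.0},
    {"gene": "geneD", "padj": 0.005, "log2FoldChange": 0.5},
    {"gene": "geneE", "padj": None, "log2FoldChange": 4.0},
]
print(select_degs(results))  # ['geneA', 'geneB']
```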
Results. The structure of the nuclear lamina in undifferentiated myoblasts with the G232E and R482L mutations was disrupted: the lamina was condensed and formed aggregates. However, all three transgenic cell lines successfully differentiated and formed myotubes. We found a significant decrease in the fusion coefficient of mutant cells. Accordingly, the expression of the myoblast fusion regulators Myom and Mymx was higher in WT cells. In undifferentiated myoblasts carrying the mutations, we found differentially expressed genes and pathways associated with activation of myogenesis and with cell-cycle arrest signatures. However, we did not observe spontaneous differentiation of the myoblasts. We conclude that cells with LMNA mutations are more committed to the myogenic direction than WT cells. We also found upregulation of myogenic and mitotic pathways in mutant cells on d2 and d4 relative to WT. This indicates that the balance between differentiation and proliferation is impaired in G232E and R482L cells. In G232E cells, despite increased OXPHOS parameters and upregulation of pathways responsible for mitochondrial respiration, the increased respiration is most likely a result of incomplete substrate oxidation. In R482L cells, both glycolysis and OXPHOS were suppressed.
Conclusion. We showed that the G232E and R482L mutations in the LMNA gene change nuclear morphology, myoblast commitment and myotube metabolism.
The work was carried out with Russian Science Foundation grant #16-15-10178-П.

Posters

Oksana Ivanova

Junior Researcher, Almazov National Medical Research Centre
Hi! My name is Oksana and I am from ITMO University, St. Petersburg. I have recently finished my master's studies in bioinformatics and systems biology. Now I investigate the effects of different mutations on muscle and heart at the Almazov Centre.



Tuesday July 28, 2020 13:25 - 13:30 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

13:30 MSK

Comparison of extraction methods and sequencing platforms for a complex water sample
Viruses are one of the most important threats to agriculture. The diversity and impact of viruses present in various sources of irrigation water, such as rivers or processed wastewater, remain understudied. High-throughput sequencing is the most comprehensive approach for exploring the virome of irrigation water. However, water is an incredibly complex sample for metagenomic analysis. It can contain large amounts of very diverse genetic material that is not limited to viruses, which makes it challenging to detect individual viral species with certainty. At the same time, the amount of plant virus genomic material present in water samples is very low, so multiple concentration and purification steps are needed to maximize sequence yield and quality without introducing too much bias. In preparation for large-scale irrigation water sampling, we have been optimizing individual steps of our analysis workflow. Here, we describe a comparative analysis of a single sample of Serbian river water used for irrigation. We compared two nucleic acid extraction protocols on the same original sample. The modified Trizol protocol for RNA extraction produces high quantities of good-quality RNA with relatively long nucleic acid fragments; however, it can be time-consuming. Qiagen's QIAamp MinElute Virus Spin Kit, by contrast, is a faster, more user-friendly alternative with a well-standardized extraction method that targets both DNA and RNA. The extraction methods were assessed using the Illumina MiSeq platform. Samples were normalized based on read count and compared in terms of viral family richness, with special consideration for RNA viruses, as the majority of plant viruses have RNA genomes. We identified several viral species in these samples and compared the genome coverage obtained with the two extraction methods.
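The abstract does not say how read-count normalization was done. One common option is rarefaction: every sample is randomly subsampled, without replacement, to the size of the smallest library before richness is compared. The sketch below assumes that approach and a hypothetical per-read family assignment (for example, one derived from the similarity search described next); it is not the authors' pipeline.

```python
import random


def rarefied_family_richness(read_to_family, depth, seed=0):
    """Count distinct viral families among `depth` reads drawn without replacement.

    read_to_family: dict read_id -> family name, or None for unassigned reads
    (a hypothetical input format).
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    drawn = rng.sample(list(read_to_family), depth)
    return len({read_to_family[r] for r in drawn if read_to_family[r] is not None})


def compare_samples(samples, seed=0):
    """Rarefy every sample to the smallest library size, then report family richness."""
    depth = min(len(reads) for reads in samples.values())
    return {name: rarefied_family_richness(reads, depth, seed)
            for name, reads in samples.items()}
```

Because a single random subsample can be noisy, repeating the draw with several seeds and averaging the richness is a reasonable refinement.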
The bioinformatic analysis was done using Qiagen's CLC Genomics Workbench software for read pre-processing and most of the individual species mappings, Diamond blastx for similarity search of the sequencing reads against the GenBank non-redundant database, and MEGAN6 for visualization and comparison of the Diamond results. The results indicated an advantage of the modified Trizol extraction protocol for the detection of plant viruses. With this in mind, we also assessed the performance of the Oxford Nanopore Technologies MinION platform, using a Flongle flow cell with the Ligation Sequencing Kit for library preparation, on Trizol-extracted RNA from the same sample, to evaluate whether a long-read sequencing platform can increase the reliability of detection.
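To give a feel for what the Diamond step produces, the sketch below reduces tabular output to one best hit per read. It assumes Diamond was run with `--outfmt 6` and its default 12 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the e-value cut-off is a hypothetical choice. MEGAN6 itself assigns reads with a lowest-common-ancestor (LCA) algorithm rather than a single best hit, so this is a simplification.

```python
def best_hits(lines, max_evalue=1e-5):
    """Keep the highest-bitscore subject per query from DIAMOND/BLAST tabular output.

    Assumes the default 12-column format 6 layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    `max_evalue` is an illustrative threshold, not one taken from the study.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        # Replace the stored hit only if this one scores higher.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}
```

An open file handle can be passed directly, e.g. `best_hits(open("reads_vs_nr.tsv"))`, where the file name is hypothetical.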

Posters

Olivera Maksimović Carvalho Ferreira

Young researcher, National Institute of Biology Slovenia, IPS "Jožef Stefan" Slovenia
Hello everyone, I am a young researcher at the National Institute of Biology in Slovenia working on the INEXTVIR project. For the most part I work with plant viruses and water and their mutual relationship. At the moment we are predominantly testing irrigation water (via HTS sequencing...



Tuesday July 28, 2020 13:30 - 13:35 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

13:35 MSK

Elements of a CRISPR-Cas-like system in the genomes of three ecotypes of Arabidopsis thaliana
It is well known that the mitochondria of higher plant species have extremely large genomes compared with the small genomes of some bacterial species. The mitochondrial genome of higher plants is actively involved in horizontal gene transfer, where it can act both as a donor and as an acceptor of genes. Another important feature of the mitochondrial genome of higher plants is the presence of species-specific sets of linear and circular plasmids in the mitochondria of many of the plant species studied in this regard. These plasmids behave like typical mobile genetic elements in their ability to carry out gene transfer. It was shown earlier that a mitochondrial plasmid of Vicia faba contains a canonical CRISPR (clustered regularly interspaced short palindromic repeats) locus (Mojica et al., 2000). Taking into account the evolutionary origin of mitochondria and the structure of the plant mitochondrial genome, we used in silico methods to search for genetic elements similar to those of bacterial and archaeal CRISPR-Cas systems in the nuclear genomes of three ecotypes (Col-0, Ler, C24) of the model plant Arabidopsis thaliana.
We found sites whose organization corresponds to prokaryotic-type CRISPR loci in the mitochondrial and nuclear genomes of A. thaliana. Contextual analysis of the complete mitochondrial genome sequence of A. thaliana (ecotype C24, GenBank accession number JF729200) revealed a site whose structure fully corresponds to the organization of prokaryotic CRISPR loci. This CRISPR locus consists of three perfect direct repeats separated by two spacer sequences. Analysis of these sequences against a database of plant viruses showed that the detected spacers are homologous to the DNA of two strains (isolate Cabb B-JI and altered virulence isolate D/H) of cauliflower mosaic virus, which is able to infect A. thaliana.
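The locus structure described above (identical direct repeats alternating with spacers) can be made concrete with a deliberately simple Python sketch. It looks only for exact repeats of one fixed length and accepts a run of at least three copies whose spacers fall in a length window. The repeat length and spacer bounds are illustrative assumptions, not the parameters used in this work, and dedicated CRISPR-detection tools tolerate mismatches and variable repeat lengths, which this toy does not.

```python
def find_crispr_like_arrays(seq, repeat_len=30, min_spacer=20, max_spacer=60, min_repeats=3):
    """Find exact direct repeats that form a CRISPR-like array.

    Returns a list of (repeat_sequence, [start positions]) for every run of at
    least `min_repeats` identical repeats separated by spacers of
    `min_spacer`..`max_spacer` nt. All default parameters are illustrative.
    """
    seq = seq.upper()
    occurrences = {}
    # Index every substring of the chosen repeat length by its start positions.
    for i in range(len(seq) - repeat_len + 1):
        occurrences.setdefault(seq[i:i + repeat_len], []).append(i)

    arrays = []
    for repeat, starts in occurrences.items():
        if len(starts) < min_repeats:
            continue
        # Chain consecutive occurrences whose gap (the putative spacer) fits the window.
        chain = [starts[0]]
        for start in starts[1:]:
            spacer_len = start - (chain[-1] + repeat_len)
            if min_spacer <= spacer_len <= max_spacer:
                chain.append(start)
            else:
                if len(chain) >= min_repeats:
                    arrays.append((repeat, chain))
                chain = [start]
        if len(chain) >= min_repeats:
            arrays.append((repeat, chain))
    return arrays
```

Indexing every substring keeps the logic transparent but uses a lot of memory on whole chromosomes; it is meant to show the repeat-spacer-repeat pattern, not to replace a production scanner.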
The search for prokaryotic-type adaptive immunity elements in the nuclear genome of A. thaliana revealed elements of the CRISPR-Cas system on all five chromosomes of this species, in the form of relatively numerous CRISPR loci and several putative cas genes. The number of CRISPR loci ranged from 16 on chromosome 3 to 23 on chromosome 5.
We suggest that the main function of the CRISPR-Cas-like system elements found in A. thaliana may be protection not only against viral and plasmid DNA but possibly against any DNA of foreign origin. There is currently no specific hypothesis about the origin of CRISPR-Cas-like elements in the plant genetic apparatus. We believe that such elements may have appeared and then remained partially conserved during eukaryogenesis, since ancestors of eukaryotes such as archaea and alphaproteobacteria possessed them.
The discovery of components of adaptive immunity in plants opens the way to a novel genome-editing method, complementing existing ones, that uses the plants' native CRISPR-Cas-like system and could allow the creation of transgenic plants with a much wider spectrum of economically valuable properties for general consumption.

Posters

Ivan Petrushin

Assistant Professor, Irkutsk State University



Tuesday July 28, 2020 13:35 - 13:40 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

13:40 MSK

Genome Assembly of Microbes By Leveraging Evolutionary Relationships
Microbial genomics has seen rapid improvements in the past decade, primarily due to the development of novel algorithms capable of assembling data generated by a variety of next-generation sequencing technologies into high-quality genomes. Depending on the sequencing technology, the type of libraries and the complexity of the genome, this has most often resulted in draft genomes. However, completing these microbial genomes has remained a challenge. Recent technologies capable of producing extremely long reads allow the complete genomes of microbes to be determined. However, the cost-effectiveness of short-read technologies has resulted in the deposition of 468,154 (as of Dec 2019) permanent-draft genomes (i.e., genomes unlikely ever to be completed) in the NCBI database, while the number of complete genomes is only 16,814. Of these 468,154 genomes, 262,766 were obtained from the surveillance project, whose contributions increased drastically from 13 in 2017 to 5,883 in 2018 and 256,860 in 2019. Unfortunately, many of these organisms are unavailable in any culture collection for resequencing with long-read technologies to complete the genome. With some exceptions, the short-read data of these genomes available in the Sequence Read Archive (SRA) contain information corresponding to the entire genome. When a closely related genome is available, it can be used as a reference onto which the short reads are mapped to determine the genome, and this often performs better than a de novo assembly. We propose a workflow that uses information from multiple reference genomes to obtain an improved assembly of microbial genomes from SRA short-read data (compared with single-reference mapping, single-reference-guided assembly or de novo assembly).
We envisage that, as the number of complete genomes of a given microbial genus in NCBI increases, the information contained in the genomes of related microbes can be exploited to obtain an assembly with improved contiguity and no loss of strain-specific information, using the original short-read data from the SRA. A proof of concept using simulated short-read data sets of E. coli is presented to highlight the improvements in the final assembly guided by multiple reference genomes.
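A multi-reference workflow first needs to decide which complete genomes are close enough to the sequenced strain to be useful. The abstract does not describe how references are chosen, so the sketch below swaps in a generic, Mash-style heuristic: it ranks candidate references by the fraction of the reads' canonical k-mers they contain (k-mer containment). The function names, `k`, and the in-memory inputs are illustrative assumptions, not part of the proposed workflow.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Set of canonical k-mers (lexicographic minimum of a k-mer and its reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other ambiguity codes
            rc = kmer.translate(COMPLEMENT)[::-1]
            kmers.add(min(kmer, rc))
    return kmers


def rank_references(reads, references, k=21):
    """Rank reference genomes by the fraction of read k-mers they contain.

    reads: iterable of read sequences; references: dict name -> genome sequence.
    Returns [(name, containment)] sorted from the closest reference down.
    """
    read_kmers = set()
    for read in reads:
        read_kmers |= canonical_kmers(read, k)
    if not read_kmers:
        return []
    ranking = []
    for name, genome in references.items():
        shared = len(read_kmers & canonical_kmers(genome, k))
        ranking.append((name, shared / len(read_kmers)))
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking
```

Containment is used instead of Jaccard similarity so that larger reference genomes are not penalised simply for having more k-mers than the reads cover.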



Posters

Urmi Shah

Research Intern, Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh.
Working as a research intern for the first time under the guidance of Dr. Srikrishna Subramanian has been a great opportunity to learn and expand my horizons in the field of genomics. I have worked on a project on improving genome assemblies of microbes. Looking at the increased...



Tuesday July 28, 2020 13:40 - 13:45 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

13:45 MSK

The diversity of Methanogenic and Methanotrophic Archaea in bottom sediments of the Yenisei River
New knowledge about the diversity and activity of methanogenic and methanotrophic Archaea in bottom sediments is important for understanding the processes involved in the transport of carbon and biogenic compounds from terrestrial to aquatic ecosystems and in their biogeochemical transformation. The bottom sediments of lakes and rivers are hotspots of methane production, but the composition and activity of methanogenic and methanotrophic microorganisms remain poorly described for the bottom sediments of the great Arctic rivers. In our study, we collected bottom sediments accumulated along the left and right banks of the Yenisei River at 18 sites located between 56.0ºN and 67.4ºN. All sediment samples were immediately frozen (-18 °C) and kept frozen during transport to the laboratory. In parallel with sediment sampling, we collected samples of dissolved greenhouse gases by the headspace method and measured CO2 and CH4 concentrations as well as δ13C-CH4 and δ13C-CO2 (Picarro 2201-i, USA).
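For readers less familiar with the delta notation used for the δ13C-CH4 and δ13C-CO2 values here and below, the standard definition is given as a short reminder; R is the ratio of 13C to 12C, and the result is expressed in per mil (‰) relative to the VPDB standard.

```latex
\delta^{13}\mathrm{C} = \left( \frac{R_{\mathrm{sample}}}{R_{\mathrm{VPDB}}} - 1 \right) \times 1000,
\qquad
R = \frac{{}^{13}\mathrm{C}}{{}^{12}\mathrm{C}}
```

More negative values therefore mean that the methane is more depleted in 13C relative to the standard, and less negative values mean that it is enriched.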
Metagenomic studies were performed at the Core Centrum 'Genomic Technologies, Proteomics and Cell Biology' (All-Russia Research Institute for Agricultural Microbiology). DNA was isolated using the MN NucleoSpin Kit (MN, Germany), with mechanical disruption performed in a Precellys 24 homogenizer (Bertin, USA). The quality of DNA isolation was checked electrophoretically (1% agarose gel, Bio-Rad, USA) and by PCR (Bio-Rad T100 Thermal Cycler). DNA sequencing was performed on an Illumina MiSeq system (USA) with primers F515 (GTGCCAGCMGCCGCGGTAA) and R806 (GGACTACVSGGGTATCTAAT) targeting the V4 variable region of the 16S rRNA gene, according to the manufacturer's manual. Sequence processing was performed in R using the following libraries: phyloseq, dada2, ShortRead, Biostrings and ggplot2.
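The primers above contain IUPAC degenerate bases (M, V, S). In amplicon workflows, primer sequences are usually trimmed from reads before denoising, and matching them requires expanding each degenerate code into the set of bases it stands for. The Python sketch below does that with a regular expression. The function names are hypothetical, it only checks the 5' end of each read, and it allows no mismatches, so it illustrates the matching rule rather than the trimming actually done in this study.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer into a regex matching any concrete sequence it encodes."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def trim_primer(read, primer):
    """Remove the primer from the 5' end of `read`; return None if the read does not start with it."""
    match = primer_regex(primer).match(read.upper())
    return read[match.end():] if match else None
```

For example, with R806, a read starting with `GGACTACACGGG...` matches because A is allowed at V and C at S, whereas a T at the V position is rejected.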
In total, 29 archaeal taxa were identified in the sediment samples, including 15 methanogenic archaea, 1 anaerobic methanotroph (Candidatus Methanoperedens), 5 members of Thermoplasmata, 5 of the phylum Thaumarchaeota, 1 of Bathyarchaeia, 1 of Woesearchaeia and 1 of Altiarchaeia.
The methanogenic community of the Yenisei River bottom sediments was dominated by archaea of the genera Methanosarcina, Methanosaeta and Methanoregula. The OTU abundance of these archaea was higher in sediments collected between 56ºN and 61ºN. Along this channel segment, δ13C-CH4 of the dissolved methane increased from -54 to -43‰ VPDB, indicating methylotrophic and acetoclastic methanogenesis. In the segment between 61ºN and 64ºN, the OTU abundance of methanogenic Archaea decreased dramatically (5- to 190-fold), accompanied by a sharp depletion of δ13C-CH4 to between -60 and -80‰ VPDB, indicating a shift to the hydrogenotrophic pathway of methane production. In this river section we also observed an increase in the OTU abundance of anaerobic methanotrophs belonging to Candidatus Methanoperedens. Further north (64-67ºN), we observed enrichment of δ13C-CH4 and an increasing share of Methanosarcina and Methanoregula in the methanogenic community.
Acknowledgments: this study was supported by RFBR grants № 18-05-60203_Arctica and 19-05-50107.

Posters

Svetlana Evgrafova

Senior Researcher, V.N. Sukachev Institute of Forest FRC KSC SB RAS, Krasnoyarsk, Russia



Tuesday July 28, 2020 13:45 - 13:50 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

13:50 MSK

Microbiome in tundra and forest tundra permafrost soils, southern Yamal, Russia
Permafrost soils differ significantly from other soils because they serve as a huge reservoir of organic carbon accumulated during the Quaternary Period, which is at risk of being released under the ongoing Arctic warming. This study aims to characterize the existing carbon pools and the possible risks of soil organic matter mineralization, and to assess the microbial communities in tundra and forest tundra permafrost soils of the southern Yamal region. The profile distribution of carbon, nitrogen and the C:N ratio changed non-gradually with depth owing to cryopedogenesis, which leads to cryogenic mass transfer in the soil profiles. Mean carbon stocks for the study area were 7.85±2.24 kg m⁻² (0-10 cm layer), 14.97±5.53 kg m⁻² (0-30 cm) and 23.99±8.00 kg m⁻² (0-100 cm). Analysis of humus type revealed a predominance of highly mineralizable low-molecular-weight fragments, indicating a high risk of mineralization of humic substances under Arctic warming. Taxonomic analysis of the soil microbiomes revealed 48 bacterial and archaeal phyla, among which Proteobacteria (27% on average), Actinobacteria (20%), Acidobacteria (13%), Chloroflexi (12%), Gemmatimonadetes (7%), Verrucomicrobia (7%), Planctomycetes (6%), Bacteroidetes (2%), AD3 (3%) and Nitrospirae (3%) constituted the majority (more than 95% of sequences in the amplicon libraries). The number of OTUs was higher in topsoils with more decomposed Histic horizons. This is probably due to the less acidic (acidoneutral) pH of these topsoil horizons compared with the other studied sites, which were exclusively acidic, and is in line with the basal respiration analysis, which showed higher values in the corresponding samples. To estimate alpha diversity, indices of richness (observed species, Chao1), phylogenetic diversity (Faith's index) and evenness (Shannon evenness) were calculated.
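The non-phylogenetic alpha diversity indices named above can be computed directly from an OTU count vector. The abstract does not state which software or which Chao1 variant was used, so this sketch assumes the bias-corrected Chao1 formula, the natural logarithm for Shannon's H', and Pielou's J (H'/ln S) as Shannon evenness. Faith's phylogenetic diversity is omitted because it requires a phylogenetic tree.

```python
import math


def observed(counts):
    """Observed richness: number of OTUs with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1*(F1-1) / (2*(F2+1)), with F1/F2 = singletons/doubletons."""
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over non-zero OTUs."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def shannon_evenness(counts):
    """Pielou's evenness J = H' / ln(S_obs); undefined (nan) for fewer than two OTUs."""
    s = observed(counts)
    return shannon(counts) / math.log(s) if s > 1 else float("nan")
```

For the counts `[10, 5, 1, 1, 2]` there are five observed OTUs, two singletons and one doubleton, so the bias-corrected Chao1 estimate is 5 + 2·1/(2·2) = 5.5.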
pH and nitrogen accumulation were identified as the main environmental factors explaining microbial community diversity and composition in the studied soils. pH was strongly positively correlated with the Shannon index and moderately positively correlated with the phylogenetic diversity index. While soil microbial communities have been studied in considerable detail in temperate environments, little has been published on permafrost environments. Of particular interest, and still poorly understood, are a comprehensive study of microbial diversity and its effects on the functioning and stability of Arctic and sub-Arctic ecosystems, the carbon dynamics controlled by the microbiome, and the significance of changing environmental conditions for microbiome functioning in the Arctic.
This work is supported by the Russian Science Foundation, project № 17-16-01030.

Posters

Ivan Alekseev

Saint Petersburg State University



Tuesday July 28, 2020 13:50 - 13:55 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

13:55 MSK

Computer-aided screening of LDH inhibitors via structural filtration
Human lactate dehydrogenase A (LDH-A) plays a significant role in the glucose metabolism of cancer cells [1] and is an attractive target for chemotherapy [2]. The active site of LDH-A includes substrate-binding and coenzyme-binding sites. Moreover, a further binding site is formed when the mobile loop 96-111 moves into the open conformation, which may be important for the design of new inhibitors [3].
In this work, we established structural criteria for selecting promising inhibitors and performed virtual screening of a large library of chemical compounds. By analyzing the binding of substrates and known inhibitors, we identified the residues forming the most important interactions: Arg168, Arg105 and Thr247. Models of the closed and open forms of human LDH-A were constructed based on the crystal structures 1i10 and 4l4s using the Amber package. Molecular docking of 2379 ZINC compounds bearing a sulfo group (an isosteric analogue of the substrate's carboxyl group) into the active site of the protein models was performed using Lead Finder. We then applied the established structural criteria (hydrogen bonds with Arg168, Arg105 and Thr247, as well as additional interactions with His192, Asn137, Gln99 and Ile241), which allowed us to select 4% of the compounds as putative inhibitors of the closed LDH-A form and 5% as putative inhibitors of the open form [4].
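The structural filtration step can be expressed as a simple rule over per-pose interaction lists. The sketch below is a hypothetical reconstruction: it assumes that, for each docked compound, the residues forming hydrogen bonds and other contacts have already been extracted from the docking poses, and that "additional interactions" means at least `min_additional` of the four extra residues (a threshold not stated in the abstract). The residue names are taken from the text; the data layout and thresholds are ours.

```python
# Residues named in the abstract; the pose data layout below is hypothetical.
REQUIRED_HBOND_RESIDUES = {"Arg168", "Arg105", "Thr247"}
ADDITIONAL_RESIDUES = {"His192", "Asn137", "Gln99", "Ile241"}


def passes_structural_filter(hbond_residues, contact_residues, min_additional=1):
    """True if a pose hydrogen-bonds all key residues and makes at least
    `min_additional` of the additional interactions (as H-bond or contact)."""
    hbonds = set(hbond_residues)
    if not REQUIRED_HBOND_RESIDUES <= hbonds:
        return False
    extra = ADDITIONAL_RESIDUES & (hbonds | set(contact_residues))
    return len(extra) >= min_additional


def select_hits(poses, min_additional=1):
    """Return IDs of poses that pass the filter, preserving input order.

    poses: list of dicts with keys 'id', 'hbonds' and 'contacts' (hypothetical format).
    """
    return [p["id"] for p in poses
            if passes_structural_filter(p["hbonds"], p["contacts"], min_additional)]
```

The fraction of selected compounds, `len(select_hits(poses)) / len(poses)`, corresponds to the 4% and 5% figures reported for the closed and open forms.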
This work was supported by the Foundation for Assistance to Small Innovative Enterprises in Science and Technology (FASIE) “UMNIK” (contract № 12449GU/2017).
1. Warburg et al. (1956) On respiratory impairment in cancer cells, Science, 124: 267-272.
2. Hamanaka R. B., Chandel N. S. (2012) Targeting glucose metabolism for cancer therapy, Journal of Experimental Medicine, 209: 211-215.
3. Nilov D. K., Prokhorova E. A., Svedas V. K. (2015) Search for human lactate dehydrogenase A inhibitors using structure-based modeling, Acta Naturae, 7: 62-68.
4. Gushchina I., Svedas V., Nilov D. (2018) Selection of bifunctional tyrosyl-DNA phosphodiesterase 1 inhibitors through molecular modeling, FEBS OPEN BIO, 8: 455.

Posters

Elina Smolkina

student, Moscow State University



Tuesday July 28, 2020 13:55 - 14:00 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

14:00 MSK

MGnify: Viral annotation (Part 3)
In the initial lecture session we will cover:

- The importance of viral metagenomic analysis
- Detection of viral sequences using VirSorter and VirFinder
- Taxonomic assignment
- Using VIRify to explore the human gut virome

In the hands-on we will be using VIRify to predict and classify viral genomes in a mock microbiome community.

Speakers

Alexandre Almeida

Postdoctoral Fellow (ESPOD), EMBL-EBI
I am an EBI-Sanger Postdoctoral Fellow focusing on the study of the human gut microbiome using genome-resolved metagenomics. My main research interest is understanding the role of the large uncultured diversity of the gut microbiome in human health and disease.

Rob Finn

Team Leader, Sequence Families, EMBL-EBI
Dr Rob Finn leads EMBL-EBI’s Microbiome Informatics team, which is responsible for the MGnify resource, which provides access to the metagenomics, metatranscriptomics and assembly analysis services. The functional and taxonomic profiles of these datasets, once made public, can be...

Lorna Richardson

Microbiome Resources Co-ordinator, EMBL-EBI

Ekaterina Sakharova

Bioinformatician, EMBL-EBI



Tuesday July 28, 2020 14:00 - 15:15 MSK
Zoom Mgnify https://zoom.us/j/93441398259?pwd=ZVRiWWl5ZWFpNlVQZUhVcDB0aTBndz09

15:15 MSK

MGnify: MAG generation (Part 4)
Speakers

Alexandre Almeida

Postdoctoral Fellow (ESPOD), EMBL-EBI
I am an EBI-Sanger Postdoctoral Fellow focusing on the study of the human gut microbiome using genome-resolved metagenomics. My main research interest is understanding the role of the large uncultured diversity of the gut microbiome in human health and disease.

Rob Finn

Team Leader, Sequence Families, EMBL-EBI
Dr Rob Finn leads EMBL-EBI’s Microbiome Informatics team, which is responsible for the MGnify resource, which provides access to the metagenomics, metatranscriptomics and assembly analysis services. The functional and taxonomic profiles of these datasets, once made public, can be...

Lorna Richardson

Microbiome Resources Co-ordinator, EMBL-EBI

Ekaterina Sakharova

Bioinformatician, EMBL-EBI


Tuesday July 28, 2020 15:15 - 16:30 MSK
Zoom Mgnify https://zoom.us/j/93441398259?pwd=ZVRiWWl5ZWFpNlVQZUhVcDB0aTBndz09

17:00 MSK

The 3C criterion: Contiguity, Completeness and Correctness to assess de novo genome assemblies
De novo genome assembly is an open challenge in bioinformatic analysis. Although the genome of an organism is "unique", different assemblies can be obtained depending on the DNA sequencing technology, the algorithms and parameters, and the complexity of the genome.
To select the reconstructed sequence that is closest to the real genome, different approaches to evaluating and selecting assemblies have been implemented. First, metrics such as N50, L50 and NG50 have been used; these relate to the number and size of the pieces obtained with respect to the expected sequence, that is, contiguity. Other comparison strategies have focused on the ability to reconstruct essential genes and known elements of genomes, referring to completeness (how much of the genome is represented by the pieces of the assembly), a requirement with more biological meaning than just the number of fragments in the assembly. Additionally, the agreement between the sequenced and the expected bases has been a matter of discussion, owing to differences in the performance of DNA sequencing technologies. This can be referred to as correctness: how accurately those pieces represent the sequenced genome.
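The contiguity metrics mentioned above have simple definitions: sort contigs from longest to shortest and accumulate their lengths; N50 is the length of the contig at which the running total first reaches half of the assembly size, and L50 is the number of contigs needed to get there. NG50 uses half of the (estimated) genome size instead of the assembly size. A minimal Python sketch of these standard definitions (the function name is ours):

```python
def nx_lx(contig_lengths, fraction=0.5, genome_size=None):
    """Return (Nx, Lx) for the given fraction, e.g. (N50, L50) by default.

    If `genome_size` is given, the target is computed from the (estimated)
    genome size instead of the assembly size, giving NGx/LGx (e.g. NG50).
    Returns (None, None) when the contigs never reach the target.
    """
    total = genome_size if genome_size is not None else sum(contig_lengths)
    target = fraction * total
    running = 0
    # Walk through contigs from longest to shortest, accumulating length.
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    return None, None
```

For contigs of 100, 80, 50, 30 and 20 bp (280 bp in total), N50 = 80 and L50 = 2; with an assumed genome size of 500 bp, NG50 = 30. This also shows why NG50 is more comparable across assemblies: an assembly that is missing sequence cannot inflate it.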
For these reasons, we recently conceptualized the 3C criterion (contiguity, completeness and correctness) as a set of metrics that can be used to benchmark genome assemblies (Molina-Mora, et al., 2020). This allows assembly selection to consider different aspects at the same time. We assessed this concept with the assembly of a bacterial genome, using Pseudomonas aeruginosa AG1 as a study model. Compared with the reference genome (P. aeruginosa PAO1), P. aeruginosa AG1 was initially estimated to have ~1 Mb of additional DNA sequence, so a de novo assembly was required. To do this, we used ultra-deep sequencing with short-read (Illumina) and long-read (Nanopore) technologies. An exhaustive comparison of different algorithms and technology combinations was performed, resulting in the selection of a candidate assembly using the 3C criterion.
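One way to "consider different aspects at the same time" is to keep only the assemblies that no other assembly beats on all three Cs at once (a Pareto front), and then inspect those candidates by hand. The abstract does not say how the 3C metrics were combined, so this is a swapped-in illustration, not the published procedure. The metric fields and their "better" directions are assumptions: contiguity as, e.g., N50; completeness as, e.g., the fraction of expected genes recovered; correctness as an error rate where lower is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssemblyMetrics:
    name: str
    contiguity: float    # e.g. N50 in bp; higher is better (assumed metric)
    completeness: float  # e.g. fraction of expected genes recovered; higher is better
    error_rate: float    # correctness proxy, e.g. mismatches per kb; lower is better


def dominates(a: AssemblyMetrics, b: AssemblyMetrics) -> bool:
    """True if `a` is at least as good as `b` on all three Cs and strictly better on one."""
    no_worse = (a.contiguity >= b.contiguity
                and a.completeness >= b.completeness
                and a.error_rate <= b.error_rate)
    better = (a.contiguity > b.contiguity
              or a.completeness > b.completeness
              or a.error_rate < b.error_rate)
    return no_worse and better


def pareto_front(assemblies):
    """Assemblies not dominated by any other: the candidates worth inspecting by hand."""
    return [a for a in assemblies
            if not any(dominates(b, a) for b in assemblies if b is not a)]
```

A Pareto front avoids choosing arbitrary weights for the three criteria; its drawback is that it may return several non-comparable candidates, which is where manual curation comes in.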
Thus, in this talk we will delve into the comparison of different assemblies, highlighting: (i) the definitions and relevance of contiguity, completeness and correctness metrics, (ii) the results obtained by sequencing technologies in hybrid or non-hybrid approaches based on these metrics, (iii) aspects of the use of guide genomes for scaffolding, assembly polishing and manual curation, and (iv) the challenges that still persist in the field of genome assembly. For this, we used the model described above and two new isolates of P. aeruginosa (strains C25 and C50) that we sequenced in the same way. For each genome, 10 approaches (hybrid or not) were implemented using different assemblers (Unicycler, SPAdes, IDBA, SKESA, Canu and Flye).
The benchmarking results highlight the well-known advantage of long-read technologies in resolving repeated regions and the fidelity of short-read technology. Although some assembly algorithms produced a single contig as expected, surprisingly many fragmented genes were identified in the assemblies based on long-read data. Thus, assessment using the 3C criterion showed substantially improved performance for a hybrid assembly approach, which combines the best advantages of each sequencing technology.

Reference:
Molina-Mora, J.-A., Campos-Sánchez, R., Rodríguez, C., Shi, L., & García, F. (2020). High quality 3C de novo assembly and annotation of a multidrug resistant ST-111 Pseudomonas aeruginosa genome: Benchmark of hybrid and non-hybrid assemblers. Scientific Reports, 10(1), 1392. https://doi.org/10.1038/s41598-020-58319-6

Speakers

Jose Arturo Molina-Mora

Microbiologist bioinformatician, Universidad de Costa Rica
Nature, trips and bioinformatics!


Tuesday July 28, 2020 17:00 - 17:05 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:00 MSK

Q & A: talks
Tuesday July 28, 2020 17:00 - 19:00 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:05 MSK

Impact of genetic variation on the association rate constant of von Willebrand factor and GPIbα platelet receptor
Von Willebrand factor (VWF) is a large multimeric protein involved in platelet adhesion and activation. The A1 domain of the VWF subunit interacts with GPIb-V-IX, a platelet transmembrane receptor complex, via the GPIbα receptor.

Information on VWF and GPIbα genetic variation is available in the open ClinVar database. In the present work, we analysed the impact of genetic variation on the association rate constant of von Willebrand factor and the GPIbα platelet receptor. Based on PDB structures, the association rate constants ka of VWF-A1 and GPIbα were determined for a series of genetic variants (Alsallaq & Zhou, 2008). The work focused on clinically significant genetic variants of both VWF-A1 and GPIbα (Landrum et al., 2018).
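To give a sense of scale for the fold changes reported below, we assume (as we understand the cited approach of Alsallaq & Zhou, 2008) that the association rate is modelled as ka = ka0 · exp(-ΔG_el / kBT), where ΔG_el is the electrostatic interaction energy of the transient complex. If we further assume that the basal rate ka0 is unaffected by a point mutation (our simplification), the mutant-to-wild-type ratio depends only on the change in ΔG_el. The sketch below computes that ratio. It is an illustration of the formula, not the software used in this work.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # R in kcal/(mol*K), i.e. Boltzmann's constant per mole


def thermal_energy(temperature_k=298.15):
    """k_B*T expressed per mole (R*T) in kcal/mol."""
    return GAS_CONSTANT_KCAL * temperature_k


def ka_fold_change(ddg_el, temperature_k=298.15):
    """ka(mutant) / ka(wild type) for a change ddg_el = dG_el(mut) - dG_el(wt) in kcal/mol.

    Assumes ka = ka0 * exp(-dG_el / kBT) with ka0 unchanged by the mutation.
    Positive ddg_el (less favourable electrostatics) slows association.
    """
    return math.exp(-ddg_el / thermal_energy(temperature_k))
```

At room temperature kBT is about 0.59 kcal/mol, so a change in ΔG_el of well under 1 kcal/mol is already enough to shift ka by a factor of two.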

Certain mutations in the VWF A1 domain (Trp1313Cys, Arg1379Cys) caused a several-fold decrease in the association rate constant ka, whereas the Gly249Val mutation led to a significant increase in ka.

Models of the interaction of the VWF A3 domain with collagen III, and of the VWF A1 domain with bitiscetin, were studied in a similar manner. The Ser1783Ala mutation in the VWF A3 domain caused a nearly two-fold increase in ka in the model of the interaction with collagen III.

The results obtained may be important for the interpretation of clinical data on Bernard-Soulier syndrome, von Willebrand disease (VWD) and pseudo-VWD.

The work has been supported by Russian Science Foundation (Grant 19-11-00260).

Speakers

Maria Gefen

National Research Center for Hematology & Moscow Institute of Physics and Technology


Tuesday July 28, 2020 17:05 - 17:10 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:10 MSK

Network perspective on metabolic diversity among mononuclear phagocytes
The diversity of myeloid cells across different tissues is truly astonishing, both in function and in developmental trajectory. An additional dimension of this diversity is manifested in the metabolic characteristics of individual phagocytes, which can vary significantly depending on cell type and location. At present, direct metabolomic profiling of tissue-resident subpopulations is not feasible, as ex vivo sorting can be lengthy and can cause significant metabolic perturbations. However, RNA levels are considerably more stable during sorting and can serve as a reasonably reliable proxy for the activity of metabolic pathways. In this work, we focus on understanding metabolic variability across phagocyte subpopulations through an integrated examination of several large-scale datasets that transcriptionally profiled subsets of myeloid cells.
Specifically, we assembled a compendium of three datasets, including the first public release of a new dataset generated by the Mononuclear Phagocytes Open Source ImmGen project. This dataset totals 337 samples and provides a unique source of information about individual cell subpopulations. It extends a previous ImmGen effort that included 202 samples of various mononuclear phagocytes, which we also analysed in this study. Furthermore, we leveraged recently released single-cell RNA-seq profiling of multiple murine organs and reanalysed those data focusing only on mononuclear phagocyte populations, comprising 36,480 cells across 18 tissues.
Using these transcriptional data, we sought to identify major metabolic features characteristic of different phagocyte populations and to define how these features vary across cell types and locations. This computational task has not previously been addressed for datasets of this scale. A previously described computational approach, GAM (PMID: 27098040), uses metabolic networks as the backbone for the analysis of transcriptional data and provides a verifiable and systematic description of metabolic differences. However, the datasets in question contain hundreds of individual profiles, whereas GAM is designed to compare two conditions. Therefore, in this work we developed a novel computational approach, GAM-clustering, which performs an unbiased search for a collection of metabolic subnetworks that jointly define metabolic variability across large datasets. In doing so, GAM-clustering reveals metabolically similar subpopulations without requiring explicit annotation or pairwise comparison of individual samples. Our analysis revealed major metabolic features associated with different cell subpopulations and highlighted a number of metabolic modules specific to individual cell types, tissues of residence or developmental stages. For example, GAM-clustering analysis revealed that the cholesterol pathway might play an important role in migratory dendritic cells (DC), which we validated by in vivo pharmacological inhibition of this pathway followed by tracking of DC migration. Consistent with the analysis, statins inhibited the migratory ability of DC, a finding that has not been reported previously.
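GAM-style analyses score nodes of a metabolic network (for example, by differential expression) and then look for a connected subnetwork with a high total score; to our understanding, GAM formulates this as a maximum-weight connected subgraph (MWCS) problem and solves it with an exact solver. The sketch below shows only a simple greedy heuristic for MWCS to illustrate what "finding a metabolic subnetwork" means. It is not GAM or GAM-clustering, and the graph and scores are hypothetical inputs.

```python
def greedy_connected_module(graph, scores):
    """Greedy heuristic for a maximum-weight connected subgraph (MWCS).

    graph:  dict mapping each node to the set of its neighbours (undirected)
    scores: dict mapping each node to a weight (positive = interesting,
            negative = penalised), e.g. derived from differential expression
    Starts at the best-scoring node and repeatedly adds the adjacent node with
    the highest positive score. Unlike exact MWCS solvers, it never accepts a
    negative node as a bridge to reach a better region of the network.
    """
    seed = max(scores, key=scores.get)
    module = {seed}
    total = scores[seed]
    while True:
        # Nodes adjacent to the current module that are not yet in it.
        frontier = {n for m in module for n in graph.get(m, ()) if n not in module}
        if not frontier:
            break
        best = max(frontier, key=lambda n: scores.get(n, float("-inf")))
        if scores.get(best, float("-inf")) <= 0:
            break
        module.add(best)
        total += scores[best]
    return module, total
```

The bridging limitation is exactly why exact solvers are preferred in practice: a high-scoring region reachable only through a mildly negative node is missed by this greedy approach.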
Taken together, our work provides both (1) a unique data and analysis resource for studying the variability of phagocytes and (2) a validated computational approach that can analyse, in an unbiased manner, both single-cell RNA-seq data and multi-sample bulk RNA-seq datasets in terms of their underlying metabolic features.

Speakers
Anastasiia Gainullina

PhD student, ITMO University
Gene Expression Analysis, Biological Networks (Metabolic, etc), Teaching



Tuesday July 28, 2020 17:10 - 17:15 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:15 MSK

Shifts in the microbial community of soil in long-term burial conditions
Paleosols (buried soils) are generally formed when undisturbed soil is covered by mounds of various origins. After burial, fresh organic matter no longer reaches the soil for a long time, and its moisture, temperature and air regimes change. This leads to various diagenetic processes and to shifts in the structure of the soil microbiome. Microbial communities of paleosols are considered to be partially conserved and to serve as sources of information on soil conditions before burial; however, this issue remains unclear. On the one hand, a number of chemical and morphological properties and their profile stratification persist in buried soils; on the other hand, a decrease in the number of microorganisms and shifts in the trophic and taxonomic structure of the microbiome are observed. To assess changes in the prokaryotic community during burial, we performed a comparative analysis of the microbiome of a dark chestnut soil buried under a mound dating to 500 B.C. and the adjacent surface dark chestnut soil located in the same landscape conditions. To gauge the scale of this difference, other soil types (chernozem, sod-podzolic and gray soil) were included for comparison. The abundance of 16S rRNA gene copies was assessed by qPCR, and taxonomic structure was analyzed by high-throughput sequencing of V4 16S rRNA amplicon libraries, processed with the dada2 package and QIIME2. The significance of differences in the representation and abundance of phylotypes was assessed with the DESeq2 package. Metabolic pathways were reconstructed using PICRUSt2.
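DESeq2 itself is an R package. As a hedged illustration of the normalization step it applies by default before testing phylotype abundances, the Python sketch below computes median-of-ratios size factors from a small count table. The function name and the toy counts are hypothetical, and phylotypes with a zero count in any sample are simply left out of the calculation.

```python
import math
from statistics import median

def size_factors(samples):
    """Median-of-ratios size factors (the normalization DESeq2 applies by default).

    samples: list of per-sample count lists, all in the same phylotype order.
    """
    n_taxa = len(samples[0])
    # Only phylotypes observed in every sample have a finite geometric mean.
    shared = [t for t in range(n_taxa) if all(s[t] > 0 for s in samples)]
    if not shared:
        raise ValueError("no phylotype has non-zero counts in every sample")
    log_geo_mean = {t: sum(math.log(s[t]) for s in samples) / len(samples) for t in shared}
    # Each sample's factor is the median ratio of its counts to the geometric means.
    return [math.exp(median(math.log(s[t]) - log_geo_mean[t] for t in shared))
            for s in samples]
```

Dividing each sample's counts by its factor puts samples on a common scale: in `size_factors([[10, 20, 30], [20, 40, 60]])` the second sample, sequenced twice as deeply, receives a factor twice as large as the first.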
The buried soil retained its profile stratification, with corresponding differentiation of microbial communities. Compared with the surface soil, it showed a 1.8- to 15.7-fold decrease in total bacterial numbers, depending on the horizon, as well as significant differentiation between the A and B horizons.
Significant differences between the microbiomes of different horizons were revealed even at the phylum level (especially Actinobacteria, Proteobacteria, Firmicutes, Thaumarchaeota (Archaea), Acidobacteria, Chloroflexi, Bacteroidetes and Planctomycetes). We determined significant changes in the soil microbiome, and the scale of these changes was comparable to the differences between soils of different types. In the buried soil, we observed a decrease in the genus Gaiella and the orders Rubrobacterales, Solirubrobacterales, Nitrososphaerales (Archaea) and Frankiales, and an increase in the class Acidimicrobiia and the phyla Firmicutes (Bacillales) and Chloroflexi. In the upper horizons, the shares of Bacteroidetes and Verrucomicrobia increased. Thus, burial increases the proportion of microorganisms capable of surviving under adverse environmental conditions and of oligotrophic nutrition. The presence of microorganisms participating in the nitrogen cycle in the buried soil (Nitrolancea, Candidatus Alisiosphaera, Rhizobiales, Candidatus Nitrososphaera) may indicate that its environment remains stable after burial and maintains the cycling of the main biogenic elements. However, cluster analysis showed that the microbiomes of the A and B horizons of the buried soil group with those of the C horizons, which may indicate a greater degree of their “mineralization”. This is supported by the analysis of potential metabolic pathways, which shows a predominance of degradation processes in horizons A and B of the buried soil.
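The abstract does not say which dissimilarity measure or clustering method was used. Bray-Curtis dissimilarity is one common choice for community profiles, so the sketch below assumes it and shows how a buried-horizon profile could be compared against reference horizons; the function names, horizon labels and abundances are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    if len(u) != len(v):
        raise ValueError("profiles must cover the same taxa")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("both profiles are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / total

def closest_reference(profile, references):
    """Name of the reference profile with the lowest Bray-Curtis dissimilarity."""
    return min(references, key=lambda name: bray_curtis(profile, references[name]))
```

For example, a buried-horizon profile `[12, 25, 63]` lies much closer to a hypothetical C-horizon profile `[10, 20, 70]` (dissimilarity 0.07) than to an A-horizon profile `[50, 30, 20]` (0.43). A full cluster analysis would compute all pairwise dissimilarities and pass them to a hierarchical clustering routine.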

This work was supported by the Russian Science Foundation, № 18-16-00073.

Speakers
Kichko A.A.

All-Russian Research Institute for Agricultural Microbiology


Tuesday July 28, 2020 17:15 - 17:20 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:20 MSK

Specialized Metabolism Gene Clusters from Red Sea Brine Pool Microbial Metagenomes
Mining for specialized metabolism gene clusters (SMGCs) is one approach to finding new antibacterial and anticancer natural products, especially in under-explored environments. Microbial metagenomes from the Atlantis II Deep, Discovery Deep and Kebrit Deep Red Sea brine pools were shotgun sequenced, and 2,751 Red Sea brine SMGCs were detected. These SMGCs potentially encode natural products belonging to 28 classes, which were functionally grouped into three main categories comprising the following diverse chemistries, in addition to hybrid clusters: (1) saccharides, fatty acids, aryl polyenes and acyl-homoserine lactones; (2) terpenes, ribosomal peptides, non-ribosomal peptides, polyketides and phosphonates; and (3) polyunsaturated fatty acids, ectoine, ladderane and others. We recently reported these findings; here we focus on the specific methodology of SMGC detection in metagenomic samples and on a particular group of natural products, the ribosomally synthesized and post-translationally modified peptides (RiPPs). Although RiPPs constitute only 0.78% of all Red Sea brine SMGCs, they are technically feasible to test in the lab and can therefore be prioritized for downstream experimentation. Moreover, several earlier studies have reported RiPPs belonging to similar classes that exhibited antibacterial and/or anticancer effects. The detected Red Sea brine RiPPs comprise bacteriocins (17 SMGCs), saccharide-bacteriocin hybrid clusters (3 SMGCs), microcins (3 SMGCs) and lanthipeptides (2 SMGCs). Beyond our earlier reported results, we provide recommendations for optimally mining microbial metagenomes for SMGCs and prioritize this additional group (RiPPs) for experimental work that would validate the implemented methodology and highlight its importance.

Speakers
Laila Ziko

Postdoctoral Researcher, Adjunct Assistant Professor, Biology Department, American University in Cairo
I'm a Postdoc interested in different topics that I really like to work on. Natural products from microbes, Metagenomics, antibacterial compounds, anticancer compounds, all are very interesting to me. Reach out for discussion & possible collaboration, look out for my talk on our recent...


Tuesday July 28, 2020 17:20 - 17:25 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:25 MSK

A platform for genomic characterization of Enterococcus spp.
The growing demand for genomic data analysis has made the development of scalable and robust bioinformatic workflows essential. Such workflows are valuable because they reduce researchers' effort by automating task execution and ensure the reproducibility of the data analysis. To date, few genomic analysis workflows have been designed for specific bacterial genera. Here we present JAMIRA, a reproducible and scalable pipeline for prokaryotic genomic data analysis designed for the genus Enterococcus. In the last decade, enterococci have emerged as one of the main bacterial genera of clinical relevance, as they are important carriers of virulence genes and possess intrinsic resistance to commonly used antimicrobial agents, including most cephalosporins, all semi-synthetic penicillins and clindamycin. The proposed workflow integrates a comprehensive set of genomic analysis tools for the prediction of phages, plasmids, genomic islands, antimicrobial resistance genes and virulence factors that may be associated with the adaptation of commensal and clinical bacteria. Our pipeline thus automates several tasks commonly performed in comparative genomic studies, in order to help elucidate the biological mechanisms that associate enterococcal isolates with public health outcomes. Pipeline development began with the selection of bioinformatic tools used to identify elements associated with successful colonization and genomic plasticity of prokaryotes. Freely available tools were compared in order to select the most appropriate ones for the genetic study of the genus Enterococcus. To facilitate installation of each tool's software dependencies and its subsequent integration into the pipeline, tools available on the Bioconda platform were used.
To ensure reproducibility of the data analysis, the workflow was built on the Snakemake framework, which has a readable definition language, and integrated with the Conda package manager, which encapsulates all software dependencies needed to run each tool. Currently, the JAMIRA platform includes the following genomic analysis tools: Abricate, RGI, PlasmidFinder, IslandPath-DIMOB and PhiSpy. A web application for JAMIRA is being implemented using the PHP Laravel framework for the program's internal structure and JavaScript, HTML and CSS for the graphical interface. Initially, the MySQL database management system is used for data storage; this can be changed according to user demand. The graphical interface allows genomic data files in FASTA format to be analysed without installing software or using the command line to run analyses and configure the workspace. JAMIRA is therefore an automated, easy-to-use workflow that will allow scientists with no background in bioinformatics to perform reproducible and trustworthy genomic data analyses, contributing to the understanding of the differences between commensal and clinical strains and to the elucidation of the biological mechanisms that associate enterococci with public health risks.
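Snakemake derives the order in which rules run from the files each rule consumes and produces. The plain-Python sketch below (Python 3.9+ for `graphlib`) illustrates that idea using the five tools listed above. The file names and the extra `annotate` and `report` rules are hypothetical, IslandPath-DIMOB and PhiSpy are assumed here to need an annotated GenBank file, and this is neither Snakemake's API nor JAMIRA's actual Snakefile.

```python
from graphlib import TopologicalSorter

# Hypothetical rule table: rule name -> (input files, output files).
RULES = {
    "annotate":      (["genome.fasta"], ["genome.gbk"]),
    "abricate":      (["genome.fasta"], ["abricate.tsv"]),
    "rgi":           (["genome.fasta"], ["rgi.txt"]),
    "plasmidfinder": (["genome.fasta"], ["plasmids.tsv"]),
    "islandpath":    (["genome.gbk"], ["islands.gff3"]),
    "phispy":        (["genome.gbk"], ["prophages.tsv"]),
    "report": (["abricate.tsv", "rgi.txt", "plasmids.tsv",
                "islands.gff3", "prophages.tsv"], ["report.html"]),
}

def execution_order(rules):
    """Order rules so that each runs after the rules producing its inputs."""
    producer = {out: name for name, (_, outs) in rules.items() for out in outs}
    # Files with no producing rule (e.g. genome.fasta) are treated as raw inputs.
    graph = {name: {producer[f] for f in ins if f in producer}
             for name, (ins, _) in rules.items()}
    return list(TopologicalSorter(graph).static_order())
```

`execution_order(RULES)` always places `annotate` before `islandpath` and `phispy`, and `report` last; independent rules such as `abricate` and `rgi` may appear in any relative order, which is what lets a workflow engine run them in parallel.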

Speakers
Rafaella Santana Bueno

Undergraduate in Biomedical Informatics, UFCSPA - Federal University of Health Sciences of Porto Alegre


Tuesday July 28, 2020 17:25 - 17:30 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:30 MSK

A rigorous approach to UPGMA phylogeny by multidimensional scaling of pairwise distances and bioinformatic outcomes for a commercially significant geneset
This study aimed to rigorously determine the branching order for a set of interrelated housekeeping genes. Following extensive genomic sequencing of the algal triterpenoid-biofuel producer Botryococcus braunii, we rapidly obtained the target set of genes related to squalene synthase by using selective BLAST searches and then iterative SOAP assembly of overlapping BLAST hits. In this way we exhaustively ascertained that only four homologues were present. Introns were spanned using 2–4 kb paired-end reads. Full-length genes were annotated, including potential alternative C termini. We hypothesised that these four key biofuel genes had evolved by two successive gene duplications from squalene synthase, and that two of them encode proteins tethered to a membrane by their C-termini. In a novel approach to phylogeny, we first used MATLAB to obtain Needleman-Wunsch protein alignments that minimized the PAM-250 genetic distances between each pair of sequences. We stored each pairwise alignment in a matrix and selected the case with the minimal normalised penalty. Multiple alignment was intentionally omitted: because all genes were true homologues, every pair of genes was a valid comparison. After pairwise alignment, the distance between sequences was computed using the Poisson model. To visualize these distances, multidimensional scaling (MDS) was used to create an optimally distance-preserving projection onto two axes, allowing direct visualization of the relative genetic distances between sequences. This novel use of MDS critically informs the subsequent steps of tree generation and differs from prior applications of MDS to tree comparison. The Poisson-corrected distances were then analysed by hierarchical clustering with the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), which by construction yields an ultrametric tree.
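The Poisson correction converts the observed proportion p of differing aligned residues into an estimated number of substitutions per site, d = -ln(1 - p), which grows faster than p as multiple substitutions at the same site become likely. The authors worked in MATLAB; the Python sketch below only illustrates the formula, and ignoring gap columns is an assumption on our part.

```python
import math

def p_distance(a, b):
    """Proportion of differing residues between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def poisson_distance(a, b):
    """Poisson-corrected distance d = -ln(1 - p): estimated substitutions per site."""
    p = p_distance(a, b)
    if p >= 1.0:
        return math.inf  # saturated: every compared residue differs
    return -math.log(1.0 - p)
```

For two aligned peptides differing at one of four positions, p = 0.25 and the corrected distance is -ln(0.75), about 0.288.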
UPGMA merges the two nearest clusters (initially single sequences) into one cluster C and computes the new distance d(C,K) between C and each remaining cluster K as the average of the distances between all sequences in C and all sequences in K. The algorithm iterates until all sequences are merged into a single cluster, which becomes the root of the generated tree. Each merge operation creates one branch point of the resulting phylogenetic tree, and the root node is the one created by the last merge. We suggest that the MDS method may capture bioinformatic richness present in the sequences, relative to other phylogenetic methods that tend to treat amino-acid or nucleotide columns as if they did not form part of a whole gene. By considering the gene as the critical unit upfront, defining each gene only by its relationship to every other gene, and then allowing distance to be multidimensional, the algorithm presented may respond to uniquely conserved areas sampled across two genes at a time, which may represent potential ancestral richness predating the pair. The phylogenetic branching order obtained from the tree for this gene set correlates well with observed synapomorphies (here, motifs and introns) present across the gene set, giving us high confidence in the order of duplication of the constituent genes and allowing us to infer biochemical signatures in the active-site pockets of this wider set of triterpenoid biosynthesis proteins.
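The merge rule described above can be sketched in Python. When clusters A and B merge into C, the average over all sequence pairs equals the size-weighted mean (|A|·d(A,K) + |B|·d(B,K)) / (|A| + |B|), and each new node is placed at half the merge distance, which is what makes the tree ultrametric. The function name, the Newick output and the toy distances are illustrative choices, not the authors' MATLAB code; ties are broken by taking the first closest pair found.

```python
from itertools import combinations

def upgma(labels, dist):
    """Build a UPGMA tree and return it as a Newick string.

    labels: leaf names (assumed not to start with '#node');
    dist: dict mapping frozenset({x, y}) -> distance between leaves x and y.
    """
    # cluster key -> (Newick subtree, number of leaves, node height)
    clusters = {name: (name, 1, 0.0) for name in labels}
    d = dict(dist)
    next_id = 0
    while len(clusters) > 1:
        # 1. Merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2  # new node sits halfway between the pair
        tree_a, n_a, h_a = clusters.pop(a)
        tree_b, n_b, h_b = clusters.pop(b)
        new = f"#node{next_id}"
        next_id += 1
        # 2. Distance to each remaining cluster: mean over all leaf pairs (size-weighted).
        for k in clusters:
            d[frozenset((new, k))] = (n_a * d[frozenset((a, k))]
                                      + n_b * d[frozenset((b, k))]) / (n_a + n_b)
        subtree = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        clusters[new] = (subtree, n_a + n_b, height)
    # 3. The last remaining cluster is the root.
    return next(iter(clusters.values()))[0] + ";"
```

With four hypothetical genes where d(A,B) = 2, d(A,C) = d(B,C) = 6 and every distance to D is 10, the sketch returns `(D:5,(C:3,(A:1,B:1):2):2);`: A and B merge first, C joins them next, and D joins last at the root.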

Speakers
Robert Moore

There are two speakers sharing the talk. This speaker, Robert, has experience in molecular microbiology, gene annotation, genome mining, phylogeny and taxonomy, and has worked in the plant science, microbiology and genetics fields. Currently in the environmental microbiology industry...
Michael Barnathan

Temple University


Tuesday July 28, 2020 17:30 - 17:35 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:35 MSK

Do multiple long-distance transfers shape TBEV spread pattern?
Tick-borne encephalitis (TBE) is a viral zoonosis transmitted by the bite of infected ticks. About 20 years ago, TBE virus (TBEV) was divided into three main subtypes based on phylogenetic analysis: European, Siberian and Far-Eastern. The geographic distribution of the subtypes mostly corresponds to their nominal regions, although some exceptions are known. Here, 848 TBEV sequences (1028-nt E-gene fragments) were analyzed to identify all long-distance virus transfers that can be inferred from the sequence data. A threshold of 500 km was used to define long-distance transfers, and the timing of these events was estimated using Bayesian evolutionary analysis. Notably, ticks cannot spread the infection over such distances on their own; these long-distance transmissions must therefore have involved some other means of carrying the infected ticks or the virus. Numerous recent long-distance transfers were found in all subtypes and in most of the smaller groups within them, suggesting a systematic pattern rather than anecdotal events. The largest numbers of known European-subtype sequences were obtained in Switzerland (n=41 of 178) and the Czech Republic (n=35 of 178), and the genetic diversity of viruses found within each of these two countries was comparable to the diversity of the whole subtype (n=178). At the same time, this subtype is distributed throughout Central and Eastern Europe, the Altai, the Irkutsk Region (Russia) and South Korea. These arguments allow us to state that long-distance transfers may be considered a normal and frequent pattern in TBEV spread.
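In the study, a transfer is inferred from the phylogeny, when closely related sequences were sampled far apart. The sketch below covers only the geographic part: it computes great-circle (haversine) distances on a spherical Earth and flags isolate pairs more than 500 km apart. The function names, isolate names and coordinates are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def long_distance_pairs(sites, threshold_km=500.0):
    """Pairs of isolates whose sampling sites are more than threshold_km apart.

    sites: dict mapping isolate name -> (latitude, longitude).
    """
    return [(a, b) for a, b in combinations(sites, 2)
            if haversine_km(*sites[a], *sites[b]) > threshold_km]
```

One degree of longitude along the equator is about 111 km, so of three sites at longitudes 0°, 1° and 10° on the equator only the two pairs involving the 10° site exceed the 500 km threshold. Only such pairs whose sequences also cluster closely in the tree would count as candidate transfers.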

Speakers
Andrei A. Deviatkin

Sechenov First Moscow State Medical University


Tuesday July 28, 2020 17:35 - 17:40 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:40 MSK

Comparative genome analyses of an extensively drug-resistant (XDR), uropathogenic Morganella morganii SMM01
Morganella morganii is an opportunistic Gram-negative, facultative bacterial pathogen belonging to the tribe Proteeae of the family Morganellaceae. It has been implicated in a wide range of clinical and community-acquired infections. An extensively drug-resistant (XDR) strain, M. morganii SMM01, was isolated from the urine of a male patient with urinary and faecal incontinence at the Sri Sathya Sai Institute of Higher Medical Sciences, Puttaparthi, India. Antibiotic susceptibility testing (AST) showed that SMM01 was non-susceptible to Ticarcillin/Clavulanic Acid, Piperacillin/Tazobactam, Ceftazidime, Cefoperazone/Sulbactam, Cefepime, Aztreonam, Doripenem, Imipenem, Ciprofloxacin, Levofloxacin, Minocycline, Tigecycline, Colistin and Trimethoprim/Sulfamethoxazole. Of the antibiotics tested, the isolate was susceptible only to Meropenem, Amikacin and Gentamicin. The complete genome of M. morganii SMM01 was assembled with Unicycler from raw Nanopore and Illumina reads. The genome is 3,930,130 bp long, with a GC content of 51.5%. RAST annotated 3,972 protein-coding genes (CDS) and 97 RNAs. In silico analysis of antibiotic resistance genes in M. morganii SMM01 revealed the presence of SAT-2, DHA-22, AAC(6')-Ib-cr, OXA-1, aadA, CRP, catII, tet(B), KpnH (protein homolog model) and tetR (protein overexpression model) genes. Mutations in some genes, detected with the protein variant model, confer resistance to pulvomycin (EF-Tu), beta-lactam antibiotics (PBP3), sulfonamides (folP) and fluoroquinolones (gyrB). A total of 63 different virulence factors were identified in SMM01, involved in virulence, invasion, toxicity, regulation of gene expression, intracellular survival and replication, endotoxin, cellular metabolism and biofilm formation.
A complete hemolysin cassette was specifically detected in the study genome; hemolysin is an exotoxin of the RTX superfamily that allows bacterial pathogens to tamper effectively with normal host cell processes, enhancing virulence and pathogenesis. Eight prophages and 19 genomic islands were identified in the SMM01 genome. Comparative genome analysis of SMM01 against all 66 available M. morganii genomes was then performed to understand the plasticity of the pathogen's genome. The pan-genome contains 10,352 CDS, of which 1,581 represent the core genome. The accessory genome ranged from 613 (SA36) to 2,238 (INSRALV892) genes across genomes. The pan-genome of M. morganii was determined to be "open". These findings further expand the current understanding of the genomic basis of M. morganii's adaptability to a variety of niches, its intrinsic resistance to antimicrobial drugs and its ability to rapidly acquire AMR and virulence determinants.
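The pan-, core- and accessory-genome figures above follow from simple set operations once genes have been clustered into orthologous groups. The sketch below assumes that clustering is already done and uses a strict definition of the core (present in every genome); some pan-genome tools use a relaxed threshold instead. The function name and gene IDs are hypothetical.

```python
def pangenome_summary(genomes):
    """Summarize a pan-genome from per-genome sets of gene-cluster IDs.

    genomes: dict genome name -> set of orthologous gene-cluster IDs.
    Returns (pan-genome size, core-genome size, {genome: accessory-gene count}).
    """
    if not genomes:
        raise ValueError("need at least one genome")
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)                         # clusters present in any genome
    core = set(gene_sets[0]).intersection(*gene_sets[1:])  # clusters present in every genome
    accessory = {name: len(set(genes) - core) for name, genes in genomes.items()}
    return len(pan), len(core), accessory
```

For three toy genomes `{a, b, c}`, `{a, b, d}` and `{a, b, c, e}`, the pan-genome has 5 clusters, the core has 2 (`a`, `b`), and the accessory counts are 1, 1 and 2.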

Speakers
Chanakya Pachi Pulusu

Doctoral Research Scholar, Department of Biosciences, Sri Sathya Sai Institute of Higher Learning, Prasanthi Nilayam, India. 515134.


Tuesday July 28, 2020 17:40 - 17:45 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:50 MSK

Simple sequence repeats (SSRs) indicate evolution hotspots in goldfish (Carassius auratus) and Cyprinidae fish
Simple sequence repeats (SSRs), also known as microsatellites, are known to be related to active chromosome evolution, including rearrangement events and regions under selection pressure. Accordingly, here we consider an SSR dataset for a variety of Cyprinidae representatives. Analysis of SSR lengths and oligonucleotide composition revealed differences between domesticated and free-living fish. Cyprinidae representatives are characterized by a complex picture of polyploidy, whole-genome duplication, interspecies as well as intergeneric hybridization, and chromosome rearrangement events [1]. Cyprinidae are therefore an exceptional subject of study in this field. Moreover, the lengths of SSRs in genes encoding artificially selected traits have been shown to correlate well with trait values [2]. SSRs mutate readily through expansions (e.g., trinucleotide expansions) and contractions, which are connected with the evolutionary rate [3]. Here we describe SSRs for 28 Cyprinidae fish (including the goldfish Carassius auratus, which has been bred over centuries) taken from the FishMicrosat database [4]. For each species set we considered SSR lengths and oligonucleotide composition. The results were clustered using Ward's method, and domesticated and free-living fish mostly clustered separately. The same oligonucleotide frequency dataset was analysed using principal component analysis (PCA), which reinforced this result and also pointed to a significant difference between the clusters. In particular, TG/CA dinucleotide repeats were more abundant, and AG/TC dinucleotide repeats less abundant, in domesticated fish. This might suggest the presence of specific SSR regions related to genes under strict evolutionary pressure.
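Pooling TG with CA and AG with TC reflects the fact that a repeat tract reads differently depending on strand and starting phase. A common convention, assumed here rather than taken from the abstract, is to map each motif to the lexicographically smallest rotation of itself or its reverse complement before computing frequencies. The function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical_motif(motif):
    """Smallest rotation of the motif or of its reverse complement (TG, GT, CA, AC -> AC)."""
    candidates = []
    for m in (motif.upper(), revcomp(motif.upper())):
        candidates += [m[i:] + m[:i] for i in range(len(m))]
    return min(candidates)

def motif_frequencies(motifs):
    """Relative frequency of each canonical motif class in a list of SSR motifs."""
    counts = Counter(canonical_motif(m) for m in motifs)
    total = sum(counts.values())
    return {m: n / total for m, n in counts.items()}
```

Under this convention TG and CA both map to the class `AC`, while AG and TC both map to `AG`, so per-species frequency vectors built this way can be fed directly into Ward clustering or PCA.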
By examining the longest SSRs in Carassius auratus case by case, we were able to note individual features putatively connected to distinct evolutionary events. The most striking example is a ca. 100 bp trinucleotide SSR flanked by nearly perfectly homologous sequences from 1) Cyprinus carpio chromosome 43 and 2) a known Carassius gibelio microsatellite sequence [5]. Moreover, this SSR tract is perfect in C. auratus but not in the homologous sequences. Such cases point to particular genomic coordinates that might be important for the evolution of the fish.
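The distinction between a perfect tract and an interrupted one can be made concrete with a regular expression that finds uninterrupted runs of one trinucleotide unit. The sketch below is illustrative only: the function name and the default minimum of five copies are hypothetical, homopolymer runs are skipped, and the reported unit reflects the phase in which the tract is first matched.

```python
import re

def perfect_trinucleotide_ssrs(seq, min_copies=5):
    """Find perfect (uninterrupted) trinucleotide repeat tracts.

    Returns (repeat unit, 0-based start, tract length in bp) for each tract
    with at least min_copies consecutive copies of one unit.
    """
    # A 3-nt unit followed by at least (min_copies - 1) exact copies of itself.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) > 1:  # skip homopolymer runs such as AAAAAA...
            hits.append((unit, m.start(), m.end() - m.start()))
    return hits
```

A single substitution breaks the tract: `GGCAGCAGCAGCAGTT` contains a perfect 12 bp run of four `GCA` copies starting at position 1, whereas `CAGCAGCATCAGCAG` contains no run of four copies at all. A ca. 100 bp perfect tract would correspond to roughly 33 uninterrupted copies.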
References
1) Boron A., Spoz A., Porycka K., et al. Comparative Cytogenetics. 2014, V. 8(3), P. 233–248.
2) Fondon J. W., Garner H. R. Proc. Natl. Acad. Sci. USA. 2004, V. 101(52), P. 18058–18063.
3) Xie K. T., Wang G., Thompson A. C., et al. Science. 2019, V. 363, P. 81–84. DOI: 10.1126/science.aan1425.
4) Nagpure N. S., Rashid I., Pati R., et al. BMC Genomics. 2013, V. 14, P. 630.
5) https://blast.ncbi.nlm.nih.gov/Blast.cgi

Speakers
Mikhail Orlov

ICB RAS
PhD student doing bioinformatics, biophysics and ecological modelling


Tuesday July 28, 2020 17:50 - 17:55 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

17:55 MSK

Pannopi: prokaryotic genome assembly and annotation pipeline.
The emergence of next-generation sequencing has, for the first time, made it possible to significantly accelerate and reduce the cost of determining the complete sequences of millions of genomes of organisms, from bacteria to humans. Scientists now aim to miniaturize and automate the sequencing process, increase the amount of data obtained and reduce its cost. From a bioinformatics standpoint, it is clear that as the cost of sequencing decreases, the amount of data to be processed will increase, so the routine parts of the analysis need to be identified and automated.
We created Pannopi, a scalable, easy-to-use assembly and annotation pipeline based on a hierarchical pan-genome graph. It performs large-scale analysis of bacterial nucleic acid sequences, running from the preparation of sequence reads through genome assembly and removal of external contamination to structural and functional annotation. Quality control is carried out throughout the process. Pannopi includes tests and benchmarks not only for genome assembly but also for annotation, using eight genomes from different taxonomic groups. This allows new annotation methods to be tested and benchmarked quickly.
The pipeline includes the most advanced and effective tools for genome annotation and allows flexible customization of their use: the user can choose among several genome assemblers and annotators, or run all of them for subsequent comparison. Pannopi also lets users select the taxa for pan-genome comparative genomics and the required modules; it can be used as a standalone command-line program or through a web interface. Pannopi output includes: raw data quality control; assembly; contamination-cleaned assembly; assembly quality control; structural annotation; functional annotation; pan-genome-based comparative annotation; and lists of antibiotic resistance and virulence genes, plasmids, phages, IS elements, tandem repeats, MLST type and serotype.
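The abstract lists assembly quality control among Pannopi's outputs without naming specific metrics. N50 is one of the most commonly reported: the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. A minimal Python sketch follows; the function name is hypothetical and this is not Pannopi's code.

```python
def n50(contig_lengths):
    """N50: length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("empty assembly")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length  # accumulate from the longest contig down
        if running >= half:
            return length
```

For contigs of 100, 200, 300 and 400 bp (1,000 bp in total), the 400 bp and 300 bp contigs together pass the 500 bp midpoint, so the N50 is 300 bp.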

Speakers
Danil Zilov

First-year master's degree student, Applied Genomics Laboratory, SCAMT Institute, ITMO University


Tuesday July 28, 2020 17:55 - 18:00 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:00 MSK

REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets
In this work we present REINDEER, a novel computational method that indexes sequences and records their abundances in each dataset across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large collections of datasets.
Results:
We used REINDEER to index the abundances of sequences within 2,585 human RNA-seq experiments in 45 hours using only 56 GB of RAM. This makes REINDEER the first method able to record abundances at the scale of ~4 billion distinct k-mers across 2,585 datasets. REINDEER also supports exact presence/absence queries of k-mers. Briefly, REINDEER constructs the compacted de Bruijn graph (DBG) of each dataset, then conceptually merges those DBGs into a single global one. REINDEER then constructs and indexes monotigs, which, in a nutshell, are groups of k-mers with similar abundances.
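The idea behind grouping k-mers by abundance can be illustrated on a toy scale: count every k-mer in each dataset, attach to each k-mer its vector of per-dataset counts, and cut a unitig wherever that vector changes. This is a deliberately simplified sketch, not REINDEER's implementation: it ignores reverse complements, groups only k-mers with identical (rather than similar) count vectors, takes the unitig as given instead of building a compacted de Bruijn graph, and stores everything in a plain dictionary. All function names are hypothetical.

```python
from collections import Counter

def kmers(seq, k):
    """All overlapping k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def abundance_vectors(datasets, k):
    """Map every k-mer to its tuple of counts, one count per dataset.

    datasets: list of datasets, each a list of read strings.
    """
    per_dataset = [Counter(km for read in reads for km in kmers(read, k)) for reads in datasets]
    all_kmers = set().union(*per_dataset)
    return {km: tuple(counts[km] for counts in per_dataset) for km in all_kmers}

def split_by_abundance(unitig, k, vectors):
    """Cut a unitig into maximal runs of consecutive k-mers with identical count vectors."""
    runs = []  # each run: [index of first k-mer, number of k-mers, count vector]
    for i, km in enumerate(kmers(unitig, k)):
        vec = vectors[km]
        if runs and runs[-1][2] == vec:
            runs[-1][1] += 1
        else:
            runs.append([i, 1, vec])
    # A run of n consecutive k-mers spans n + k - 1 bases.
    return [(unitig[start:start + n + k - 1], vec) for start, n, vec in runs]
```

With k = 3, one dataset containing the read `ACGTA` and another containing `ACG`, the unitig `ACGTA` splits into `ACG` with counts (1, 1) and `CGTA` with counts (1, 0); each piece then needs only one stored count vector instead of one per k-mer.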
Availability:
-software: https://github.com/kamimrcht/REINDEER
-preprint: https://www.biorxiv.org/content/10.1101/2020.03.29.014159v2

Speakers
Camille Marchet

Lille University, FR


Tuesday July 28, 2020 18:00 - 18:05 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:05 MSK

Metagenomic analysis of colon biopsy samples in patients with ulcerative colitis
Introduction
Ulcerative colitis belongs to the group of multifactorial diseases of unexplained aetiology. Impaired composition and reduced diversity of the intestinal microbiota are assumed to be factors in inflammatory bowel diseases that affect their pathogenesis.
The relevance of this research stems from the development of a personalized approach in medicine and from the collection of material before medical intervention in the course of the disease.

Methods
The study involved 25 patients with ulcerative colitis, from whom biopsies were taken from both an inflamed and a non-inflamed region of the large intestine.
16S rRNA data processing
The data were classified using Kraken 2, which has been shown to be well suited for taxonomic annotation of intestinal microbiota data.
When processing paired-end reads, Kraken 2 takes the mate-pair information into account, which is important when processing 16S data. The data were analysed with Pavian and visualized for comparative analysis in Recentrifuge.

Results
Kraken 2 identified and filtered out human DNA, which accounted for 0.7% of the data. Percentages of microorganisms found across all samples: Bacteria 99.936%, Viruses 0.055%, Archaea 0.009%. Viral DNA was filtered out in Pavian when building the OTU table.
Taxonomic analysis revealed the following changes in the 4 main bacterial phyla in biopsies from the affected area of the large intestine compared with the unaffected area: a decrease in Firmicutes by 2.43% and Proteobacteria by 2.70%, and an increase in Bacteroidetes by 5.62% and Actinobacteria by 0.09%. These changes are minor, as confirmed by the Mann-Whitney U test of sample homogeneity.
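For readers unfamiliar with the Mann-Whitney U test, the sketch below shows how the statistic is computed from pooled ranks, with tied values sharing their average rank. The p-value uses the plain normal approximation without tie or continuity corrections, so for small samples it will differ from the exact p-values reported by statistical packages. The function name is hypothetical, and the abstract does not give the per-sample values the test was applied to.

```python
import math

def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided normal-approximation p-value).

    x, y: non-empty lists of numeric values from two independent groups.
    """
    pooled = sorted((v, g) for g, vals in ((0, x), (1, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # tied values share the average of 1-based ranks
        for t in range(i, j + 1):
            ranks[t] = mid_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2))
```

When every value in one group is smaller than every value in the other, U reaches its extreme (0 or n1·n2); U near n1·n2/2, as for groups with heavily overlapping values, gives a large p-value.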
The Shannon index was 3.2597 in samples from the inflamed region and 3.0905 in samples from the non-inflamed region; the difference is not significant. Richness was 267.40 and 335.28, respectively, indicating markedly greater species richness in the non-inflamed regions.
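The two diversity measures reported above can be computed from a per-sample taxon count vector as sketched below. The abstract does not state the logarithm base used for the Shannon index, so the natural logarithm is an assumption here, and richness is taken as the number of taxa observed at least once; group-level values would then be obtained by averaging per-sample results. The function names are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("no reads counted")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def observed_richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)
```

The two measures can move in opposite directions: a sample with many rare taxa has high richness, but if a few taxa dominate its Shannon index can still be lower than that of a sample with fewer, more evenly represented taxa. Four equally abundant taxa give H' = ln 4, about 1.386.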

Discussion
Similar results have been obtained in other studies: no significant taxonomic changes were found, and no differences were found in the predicted encoded functions, indicating that phylogenetic analysis is not necessary. Significant changes were observed only between individuals or among samples taken from the same patient; this is attributable to the peculiarities of metagenomic analysis, namely individual differences in microbiota and differences in microbiota ratios across intestinal sections.
The use of a third-party database of healthy individuals was not part of the study design. Selecting such a database is complicated by the large number of parameters involved. Significant parameters for choosing a metagenomic database include the features of the samples taken for analysis (biopsy or faeces), the age of the patients, the read length, the sequencing method, the metagenomic pipeline, the number of samples, the intestinal region biopsied and the method of sample collection.

Speakers
Nikita Bulantsev

First-year master's degree student, Applied Genomics Laboratory, SCAMT Institute, ITMO University, Saint Petersburg, Russia
I'm a biotechnologist and a first-year master's student in molecular biology at ITMO University (bioinformatics cluster). My research areas are metagenomics and personalized medicine. I'm also interested in AI and machine learning.


Tuesday July 28, 2020 18:05 - 18:10 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:10 MSK

The search for genetic risk factors of ischemic stroke with the genome-wide association study and machine learning methods
Ischemic stroke (IS) is a neurological deficit of sudden onset caused by brain infarction. It is the primary cause of acquired disability in adults and a leading cause of death. The disease is multifactorial, and genetic factors make a certain contribution. Numerous genetic polymorphisms are believed to increase the risk of IS, each with a small effect size. The advent of genome-wide genotyping caused a wave of genome-wide association studies (GWAS). At least ten candidate genes associated with IS have been described and verified in different studies, but many additional genomic regions are expected to be confirmed or identified. Machine learning (ML) approaches look promising here. In this research we present single nucleotide polymorphisms (SNPs) associated with the development of IS in individuals of Eastern Slavic ancestry, identified using GWAS and ML approaches.

The case and control groups consisted of 1051 and 421 individuals, respectively, genotyped with different types of DNA microarrays. After combining the genotype data and applying quality control, we obtained about 82,000 SNPs for the investigation. The GWAS included an association test, Fisher's exact test and the Bayes factor method. The machine learning approaches included Support Vector Machine, k-Nearest Neighbors, Random Forest, Logistic Regression (LR), Gradient Boosting and Neural Network (NN), which were used to classify patients and healthy individuals from their genotype data. The best performance (ROC-AUC of 0.697) was achieved with the NN method. The effect of each SNP on the outcome variable was estimated with SHAP values for the LR model. The top-ranked SNPs were in good agreement with the GWAS results.
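Fisher's exact test, one of the GWAS tests listed above, can be applied per SNP to a 2×2 table of allele counts. The table layout (rows for cases and controls, columns for the two alleles) is a common choice assumed here rather than stated in the abstract, and the function name is hypothetical; in practice one would call an existing implementation such as `scipy.stats.fisher_exact`. The sketch sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, using exact integer arithmetic for the comparison.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1 = a + b, a + c
    n = a + b + c + d

    def weight(x):
        # Numerator of the hypergeometric probability of top-left cell x (margins fixed).
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, row1)
```

For the table `[[1, 9], [11, 3]]`, only the observed table, its mirror image and the two most extreme tables qualify, giving p = 5412 / 1961256, about 0.0028.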

In this research, we also assessed the influence of missing genotypes on the results of both the GWAS and ML methods. We compared different strategies for combining genotypes obtained with different DNA microarrays and provide recommendations on the appropriate way to do this. We also annotated the SNPs found in the GWAS in terms of genes and speculate that the candidate genes may be associated not only with IS but also with other diseases (e.g., Alzheimer's disease, Parkinson's disease), suggesting common underlying mechanisms of brain injury.

The study was funded by RFBR (Russian Foundation for Basic Research) according to the research project No 19-29-01151.

Speakers
Gennady V. Khvorykh

bioinformatician, Department of Molecular Bases of Human Genetics, Institute of Molecular Genetics of the Russian Academy of Sciences, Moscow, Russia
Medical and population genetics are within the scope of my interests. I search for genetic variants that contribute to ischemic stroke using GWAS and AI. In addition, I search for signals of natural selection, applying statistical approaches to the genotypes of several populations...


Tuesday July 28, 2020 18:10 - 18:15 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:15 MSK

VarQuest+: modification-tolerant database search of secondary metabolites mass spectra
Secondary metabolites (SMs) are at the center of attention for a wide range of researchers, from biologists and ecologists to pharmacologists and biomedical scientists [1]. Modern mass spectrometry instruments allow rapid, low-cost scanning of thousands of metabolites, resulting in huge amounts of high-resolution data. Although these data represent a gold mine for future discoveries, their interpretation remains a bottleneck and requires appropriate computational methods [2]. Current software is either limited to specific classes of SMs, for example, peptidic natural products (VarQuest [3]), or can perform only a standard database search, which allows identification of known SMs but fails to discover their novel variants (Dereplicator+ [4]).

Here we present VarQuest+, a database search tool capable of identifying novel variants of a wide range of known SMs, including polyketides, alkaloids, flavonoids, saponins, and many others. Algorithmic and software innovations make VarQuest+ much more efficient in running time and memory consumption than existing analogs. This efficiency made it possible to implement a modification-tolerant search mode in VarQuest+, which is more challenging than a regular database search.
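The core idea of a modification-tolerant search can be illustrated on a single spectrum-compound pair. The sketch below is a simplified illustration, not the actual VarQuest+ scoring: it assumes neutral (uncharged) masses, a precomputed list of theoretical fragment masses for the database compound, and a single modification whose mass shift equals the precursor mass difference. A theoretical fragment counts as explained if an observed peak matches it either unshifted (fragment lacks the modified site) or shifted (fragment carries it). All masses are hypothetical.

```python
from bisect import bisect_left


def has_peak(sorted_peaks, mass, tol):
    """True if some observed peak lies within +/- tol (Da) of mass."""
    i = bisect_left(sorted_peaks, mass - tol)
    return i < len(sorted_peaks) and sorted_peaks[i] <= mass + tol


def variant_score(precursor_mass, peaks, compound_mass, fragments, tol=0.02):
    """Score a spectrum against a known compound, allowing one unknown modification.

    The modification's mass shift (delta) is taken as the precursor mass
    difference. Returns (number of explained fragments, delta).
    """
    sorted_peaks = sorted(peaks)
    delta = precursor_mass - compound_mass
    score = sum(
        1
        for f in fragments
        if has_peak(sorted_peaks, f, tol) or has_peak(sorted_peaks, f + delta, tol)
    )
    return score, delta


# Hypothetical compound (500.0 Da) and a spectrum of a variant carrying +14.016 Da.
fragments = [100.0, 200.0, 300.0, 400.0]
peaks = [100.0, 200.0, 250.0, 314.016, 414.016]
print(variant_score(514.016, peaks, 500.0, fragments))
```

With delta = 0 (a standard database search), only two of the four fragments would be explained in this example; allowing the shift explains all four, which is why a modification-tolerant mode can recover variants that a standard search misses.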

We benchmarked VarQuest+ on a Korean medicinal plants dataset (2.5 million mass spectra collected from 337 samples). The standard search of the KNApSAcK database (51,179 plant SMs [5]) identified 349 compounds. The VarQuest+ modification-tolerant search identified 4,253 SMs, an order of magnitude more than Dereplicator+. Using the same search parameters, VarQuest+ is twenty times faster than Dereplicator+ and uses four times less memory.

The reported study was funded by RFBR, project number 20-04-01096.

References
[1] Cragg, G.M. and Newman, D.J. (2013) Natural products: a continuing source of novel drug leads. Biochim. Biophys. Acta (BBA) - General Subjects, 1830(6), 3670-3695.
[2] Wang, M. et al. (2016) Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat. Biotechnol., 34, 828.
[3] Gurevich, A. et al. (2018) Increased diversity of peptidic natural products revealed by modification-tolerant database search of mass spectra. Nat. Microbiol., 3, 319.
[4] Mohimani, H. et al. (2018) Dereplication of microbial metabolites through database search of mass spectra. Nat. Commun., 9, 4035.
[5] Afendi, F.M. et al. (2012) KNApSAcK Family Databases: Integrated Metabolite–Plant Species Databases for Multifaceted Plant Research. Plant Cell Physiol., 53(2), e1.

Speakers
Alexey Gurevich

Senior Research Scientist, Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg, Russia
I lead the Natural Product Discovery research direction at CAB (http://cab.spbu.ru/research/antibiotics-discovery/). Together with the Center for Computational Mass Spectrometry at UCSD and the Mohimani Lab at Carnegie Mellon University, we are creating software for identification of...



Tuesday July 28, 2020 18:15 - 18:20 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:20 MSK

A zero inflated log-normal model for inference of sparse microbial association networks
The advent of metagenomics has prompted the development of efficient taxonomic profiling methods that measure the abundance of organisms in a wide range of environments. Multivariate abundance data also have the potential to enable inference of associations between microbial populations, but several technical issues need to be accounted for, such as the compositional nature of the data and its extreme sparsity.

The ecological network reconstruction problem is frequently cast into the paradigm of Gaussian graphical models (GGMs), for which efficient structure inference algorithms are available. Unfortunately, GGMs cannot properly account for the extremely sparse patterns occurring in real-world datasets. In particular, structural zeros corresponding to true absences of biological signals are not properly handled by most statistical methods.

We present here a zero-inflated log-normal graphical model specifically designed to handle such "biological" zeros, and demonstrate significant performance gains over state-of-the-art statistical methods for inferring association networks.
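To make the zero-inflated log-normal idea concrete, here is a minimal sketch of the marginal model for one taxon: a point mass at zero with probability p_zero, and a log-normal distribution for the non-zero abundances. Under this assumed model the likelihood factorises, so the maximum-likelihood fit separates into the zero fraction and a Gaussian fit on the logs. The pairwise statistic that follows simply restricts attention to samples where both taxa are present; it is an illustrative simplification, not the authors' graphical model, and it ignores compositionality and any sparsity-inducing penalty. The abundances are hypothetical.

```python
import math


def fit_ziln(x):
    """Maximum-likelihood fit of a zero-inflated log-normal marginal.

    Assumed model: P(X = 0) = p_zero; otherwise log X ~ Normal(mu, sigma^2).
    Returns (p_zero, mu, sigma), with sigma using the 1/n (ML) denominator.
    """
    logs = [math.log(v) for v in x if v > 0]
    if not logs:
        raise ValueError("need at least one non-zero abundance")
    p_zero = 1 - len(logs) / len(x)
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return p_zero, mu, sigma


def co_present_log_correlation(x, y):
    """Pearson correlation of log abundances over samples where both taxa are non-zero.

    Zeros are treated as absences and excluded rather than imputed.
    Returns NaN when fewer than 3 co-present samples or zero variance.
    """
    pairs = [(math.log(a), math.log(b)) for a, b in zip(x, y) if a > 0 and b > 0]
    n = len(pairs)
    if n < 3:
        return float("nan")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical abundances of two taxa across five samples.
taxon_a = [0, 1.0, math.e, math.e ** 2, 0]
taxon_b = [5.0, 2.0, 2 * math.e, 2 * math.e ** 2, 0]
print(fit_ziln(taxon_a), co_present_log_correlation(taxon_a, taxon_b))
```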

Tuesday July 28, 2020 18:20 - 18:25 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:25 MSK

Single-cell ChIP-seq imputation with machine learning models leveraging bulk ENCODE data
Next-generation sequencing is routinely used in biomedical research and the pharmaceutical industry. Applied in combination with chromatin immunoprecipitation (ChIP-seq), it provides detailed insights into genomic properties of cells, such as chromatin accessibility and protein-DNA interactions, that play a key role in gene regulation and chromatin structure (ENCODE project consortium, 2012). Recently developed assays for single-cell ChIP-seq (scChIP-seq) enable the characterization of these molecular events at single-cell resolution. This allows the investigation of cell differentiation processes that are of crucial interest in many research fields, especially in cancer studies. Although sequencing coverage can be as low as 1000 reads per cell (Rotem, Assaf, et al. 2015), it was nevertheless possible to investigate relationships between drug-sensitive and drug-resistant breast cancer cells (Grosselin, Kevin, et al. 2019). Such findings would not have been possible with bulk ChIP-seq data. However, the sparsity caused by the low signal from each individual cell hampers further investigation, and a dedicated imputation method for scChIP-seq is needed. Furthermore, previous studies of sparse data from the more established single-cell RNA-seq demonstrate that imputation methods strongly enhance research on such data (Peng, Tao, et al. 2019). Ultimately, the full potential of future scChIP-seq studies will not be realized without a dedicated imputation method to complete the data. To address this need, we developed SIMPA, an algorithm for Single-cell chIp-seq iMPutAtion.

Based on a large collection of more than 2250 preprocessed bulk ChIP-seq datasets from the ENCODE data portal, SIMPA leverages statistical patterns within a reference set selected according to the target, i.e. the histone mark or transcription factor investigated in the scChIP-seq experiment. The existence of these patterns was confirmed by cross-validation of classification models. For each single cell, SIMPA trains numerous machine learning models (~120,000 at 5kb resolution) to impute missing genomic regions while remaining sensitive to the sparse signal of the individual cell. Compared to another imputation strategy (Xiong, Lei, et al. 2019) that does not involve reference bulk data, SIMPA achieves better clustering by cell type. Using a KEGG pathway enrichment tool (Li, Shaojuan, et al. 2019), we showed that functionally related pathways were recovered in a cell-type-specific manner, but only in the imputed results from SIMPA. Finally, randomization tests confirmed that SIMPA uses both the single-cell signal and the target-specific reference data to achieve these meaningful imputations.
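The general idea of letting bulk reference data fill in a sparse single-cell profile can be sketched with a simple nearest-reference vote. This is a deliberately simplified stand-in, not SIMPA's method (which trains many per-region machine learning models): cells and references are represented as sets of hypothetical genomic bin ids, each reference is weighted by how much of the cell's observed signal it contains, and a bin is imputed when a weighted majority of the top k references carry it.

```python
def impute_cell(cell_bins, references, k=3, threshold=0.5):
    """Impute genomic bins for one sparse single cell from bulk reference profiles.

    cell_bins:  set of bin ids with at least one read in the single cell.
    references: list of sets of bin ids called as signal in bulk datasets
                for the same target (histone mark or TF).
    Returns the observed bins plus bins supported by a weighted vote of the
    k references that best cover the cell's observed signal.
    """
    if not cell_bins:
        return set()
    # Weight each reference by the fraction of the cell's observed bins it contains.
    scored = sorted(
        ((len(cell_bins & ref) / len(cell_bins), ref) for ref in references),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(weight for weight, _ in scored)
    if total == 0:
        return set(cell_bins)
    candidates = set().union(*(ref for _, ref in scored))
    imputed = {
        b
        for b in candidates
        if sum(weight for weight, ref in scored if b in ref) / total >= threshold
    }
    return imputed | cell_bins


# Hypothetical cell with reads in bins 1 and 2, and three bulk references.
cell = {1, 2}
refs = [{1, 2, 3, 4}, {1, 2, 3}, {7, 8, 9}]
print(impute_cell(cell, refs, k=2))
```

Raising `threshold` makes the imputation more conservative: bins carried by only some of the best-matching references are dropped.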

Our new imputation algorithm was validated on a set of more than 2600 single B cells and T cells for two different histone marks, H3K4me3 and H3K27me3, at 5kb and 50kb resolution, respectively. So far, this is the only scChIP-seq dataset available for human cells. To use resources efficiently, SIMPA was implemented with an MPI interface that distributes the computations across many cores, possibly on different compute nodes. Software is available at https://github.com/salbrec/SIMPA

In conclusion, to address problems related to data sparsity in single-cell ChIP-seq, we developed the first dedicated imputation method that generates accurate and biologically relevant results.

Speakers
Steffen Albrecht

PhD Student, Johannes Gutenberg University Mainz
Hello, my name is Steffen Albrecht and I am from Mainz, Germany. Currently, I am a PhD student in the group Computational Biology and Data Mining, and my main topics are machine learning and bioinformatics data integration. The application fields are imputation, e.g. for sparse data...


Tuesday July 28, 2020 18:25 - 18:30 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:30 MSK

Preliminary Analysis of Resistome in Mycobacterium abscessus
Mycobacterium abscessus (Mab), a complex of rapidly growing non-tuberculous mycobacteria, causes human infections that are difficult to treat because of its resistance to multiple antibiotics. Whole-genome sequences of 1,581 Mab isolates downloaded from the NCBI FTP site were used to infer phylogenetic relationships and to investigate the resistome in silico. A total of 2,975 putative protein sequences of resistance genes from 32 distinct drug classes were detected using the Comprehensive Antibiotic Resistance Database (CARD) and ARG-ANNOT databases. The most abundant resistance genes detected were related to beta-lactams (1,962 genes), aminoglycosides (258 genes), and fluoroquinolones (205 genes). These genes encoded (i) many multidrug efflux pumps, such as a homolog of the Pseudomonas aeruginosa MexAB-OprM involved in resistance to macrolides, fluoroquinolones, monobactams, carbapenems, cephalosporins, cephamycins, penams, tetracyclines, peptides, aminocoumarins, diaminopyrimidines, sulfonamides, phenicols, and penems; (ii) different types of beta-lactamases, for instance, KPC-type beta-lactamases that decrease susceptibility to monobactams, carbapenems, cephalosporins, and penams; and (iii) various transferases, such as a homolog of the mph(B) phosphotransferase from Escherichia coli that decreases susceptibility to macrolides. These findings provide insight into the mechanisms of antibiotic resistance in Mab, especially to the antibiotics commonly used to treat Mab infections.

Speakers
Shay Lee Chong

Faculty of Information Science and Technology, Multimedia University, Melaka, Malaysia


Tuesday July 28, 2020 18:30 - 18:35 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:35 MSK

Genome-wide inference of bacterial transcription factor binding sites: new method and its applications
None of the current bacterial genome annotation pipelines handles regulatory sequences. Transcription factor binding sites (TFBSs, or operators) are the most abundant regulatory elements and are critical for understanding genome function, yet methods for their fast genome-wide inference are currently lacking.
The method of bacterial TFBS inference we are developing is based on the analysis of 3D structures of transcription factor (TF)-operator complexes. We use the TF residues contacting DNA bases as a tag (CR-tag) to link TFs with their operators. TFBSs can be inferred genome-wide via either (1) a fast, automated, CR-tag-based genome scan with a library of CR-tagged, experimentally characterised TFBS motifs or (2) a slow, semi-automated de novo TFBS inference protocol combining CR-tag information with genome structure analysis.
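A CR-tag can be pictured as the residues found at a fixed set of DNA-contacting columns in a TF family alignment. The sketch below assumes such an alignment already exists and that the contacting columns were determined from 3D structures; the column indices, sequences, and motif label are hypothetical. Two TFs with identical tags are assigned the same operator motif, mirroring the transfer of regulatory information described above.

```python
def cr_tag(aligned_seq, contact_columns):
    """Concatenate the residues found at DNA-base-contacting alignment columns."""
    return "".join(aligned_seq[i] for i in contact_columns)


def transfer_motifs(tfs, motif_library, contact_columns):
    """Assign each TF the motif of a characterised TF sharing its CR-tag, or None.

    tfs:           dict TF name -> sequence aligned to the family alignment.
    motif_library: dict CR-tag -> motif of an experimentally characterised TF.
    """
    return {
        name: motif_library.get(cr_tag(seq, contact_columns))
        for name, seq in tfs.items()
    }


# Hypothetical contacting columns and aligned sequences, for illustration only.
CONTACTS = [2, 5, 6, 9]
library = {cr_tag("MKRLSQNTRAE", CONTACTS): "hypothetical_motif_1"}
query = {"tfA": "MARLSQNTRAE", "tfB": "MKRLSENTKAE"}
print(transfer_motifs(query, library, CONTACTS))
# {'tfA': 'hypothetical_motif_1', 'tfB': None}
```

Note that tfA differs from the characterised TF outside the contacting columns yet receives its motif, while tfB differs at one contacting column and does not.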
The first approach allows regulatory information to be reliably transferred between different species that are not necessarily closely related. Even distantly related TFs of Gram-negative and Gram-positive bacteria can have the same CR-tags and hence recognise the same operators. However, direct transfer of regulatory information is most efficient within the same taxonomic order (e.g. over 50% of TF orthologue pairs within Enterobacteriales have identical CR-tags).
The de novo protocol builds upon the well-established phylogenetic footprinting approach, replacing the assumption that similar TFs recognise similar operators with a strict 3D-structure-based criterion (the CR-tag), and is universally applicable to any bacterial species.
We illustrate the following applications of our approach:
1) Correcting poorly defined motifs.
For most TFs in a given species, just one or very few targets exist, and proper TFBS models cannot be built. With our de novo TFBS inference protocol, orthologous operator sequences can be collected from other species that have TFs with the same CR-tag. This usually provides enough information to properly define the motif and build a high-quality operator model. This approach can vastly improve the usability of data from single-organism TFBS databases such as RegulonDB.
2) Resolving regulation details for paralogous TFs.
Using our CR-tag-based approach and experimental evidence, we show that paralogous quorum-sensing regulators in Pectobacterium spp. recognise the same operator sequence, although completely different operators had been suggested previously.
3) The advantages of full-scale genome-wide TFBS inference.
With the current collection of TFBS profiles, a genome-wide scan finds operators for the majority of transcription units in a typical enterobacterial genome. This helps to reveal unexpected regulators for many transcription units and allows regulatory cascades to be deciphered. We will provide examples of such inferred transcriptional cascades supported by experimental data.
4) Genome-wide TFBS scan can also be useful when correcting automated genome annotation, since finding an operator for a well-characterised TF can suggest functions for the downstream genes (doi:10.7717/peerj.2056).
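The genome-wide scan in point 3 amounts to scoring every window of a genome against a library of motif models on both strands. As a minimal sketch, the code below uses a log-odds position weight matrix with a uniform background, a pseudocount, and an arbitrary threshold; this is a common, simpler stand-in and not necessarily the profile format SigmoID uses. The training sites and the scanned sequence are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(sites, pseudocount=0.5, background=0.25):
    """Build a log2-odds position weight matrix from aligned, equal-length sites."""
    pwm = []
    for j in range(len(sites[0])):
        column = [s[j] for s in sites]
        pwm.append({
            b: math.log2(
                ((column.count(b) + pseudocount) / (len(sites) + 4 * pseudocount))
                / background
            )
            for b in BASES
        })
    return pwm


def window_score(window, pwm):
    """Sum of per-position log-odds scores for one window."""
    return sum(pwm[j][c] for j, c in enumerate(window))


def scan(sequence, pwm, threshold):
    """Report (position, strand, score) for windows scoring >= threshold on either strand."""
    w = len(pwm)
    hits = []
    for i in range(len(sequence) - w + 1):
        window = sequence[i:i + w]
        if any(c not in BASES for c in window):
            continue  # skip windows containing N or other ambiguity codes
        reverse = window.translate(COMPLEMENT)[::-1]
        for strand, site in (("+", window), ("-", reverse)):
            score = window_score(site, pwm)
            if score >= threshold:
                hits.append((i, strand, score))
    return hits


# Hypothetical operator sites and a short sequence to scan.
pwm = log_odds_matrix(["ACGTTA", "ACGTTA", "ACGATA"])
print(scan("GGGACGTTAGGG", pwm, threshold=6.0))
```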
The TFBS inference method described here has been added to version 2 of our existing application for TFBS analysis, which, together with a collection of TFBS profiles, is available at github.com/nikolaichik/SigmoID.

Speakers
Yevgeny Nikolaichik

Associate Professor, Belarusian State University


Tuesday July 28, 2020 18:35 - 18:40 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:40 MSK

Metagenomic Analysis Using k-mer-based Tools Reveals the Presence of Heavy Metal Response Genes in Cyanobacteria Found in Copper Mining Sites of Benguet Province, Philippines
Heavy metal contamination in mining sites inhibits the growth of green vegetation. However, some photosynthetic autotrophs, the cyanobacteria, can survive the extreme conditions of mine tailings. Surface water samples were collected from three sampling points in each Tailings Storage Facility (TSF) of the Philex mines in Benguet Province, Philippines, including the re-vegetated Philex TSF1 and the currently active Philex TSF3. Genomic DNA was extracted from all water samples and subjected to shotgun sequencing. A total of 72.87 Gbases of raw reads were successfully assembled using the St. Petersburg genome assembler (SPAdes). Default and custom-based approaches for both the CLARK v1.2.5 and Kraken2 metagenomic classifiers were used to assign taxonomy to contigs using k-mer matches. Prokka was used for rapid annotation, and its output coding sequences were submitted to the evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG) mapper for gene ontology analysis. The default CLARK classified a large number of sequences across all sampling points in both the re-vegetated and active mining sites. Taxonomic assignments revealed the top five cyanobacteria, namely the unicellular Synechococcus sp., Cyanobium sp., and Gloeobacter sp., the filamentous, non-heterocystous Leptolyngbya sp., and the filamentous, heterocystous Nostoc sp. The custom-based CLARK, in contrast, classified Leptolyngbya sp., which accounts for about 3% to 4% of the assembled contigs. Kraken2 results, on the other hand, revealed the order Nostocales as the most dominant, ranging from 0.05% to 0.63% of the classified sequences. The cyanobacterial custom-based Kraken2 revealed a large number of sequences belonging to the filamentous Fischerella sp. and Trichodesmium sp. in Philex TSF1. A unicellular Microcystis and the filamentous Nostoc sp., Spirulina sp., and Pseudanabaena sp. dominated the active Philex TSF3 site.
CLARK was able to discriminate cyanobacteria down to the species level, while the default Kraken2 classifier could distinguish them only down to the dominant order. Although the custom-based CLARK detected more cyanobacteria at the order level than Kraken2, it was able to determine only a single cyanobacterium at the genus level. Kraken2 identified different cyanobacteria at different sites, whereas CLARK consistently identified the same cyanobacterial species at all sites. eggNOG analysis of the protein-coding sequences output by Prokka revealed genes conferring stress responses to Cu²⁺, Zn²⁺, Pb²⁺, Cd²⁺, and Ca²⁺ metal ions, as well as the smt metallothionein. These genes are reported to be responsible for efflux/transport functions and heavy metal resistance, which may be major attributes enabling cyanobacterial species to survive extreme metal conditions. Enhanced growth of Leptolyngbya sp. might also lead to the formation of viable biological crusts, initiating a re-vegetation process. This is the first report, based on shotgun metagenomic assembly and analysis, of filamentous cyanobacteria dominating the copper and gold mine tailings in Benguet Province.
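The k-mer matching behind such classifiers can be sketched in a few lines. The code below follows the discriminative k-mer idea associated with CLARK (keep only k-mers unique to one target, then assign a sequence to the target with the most hits); it is a toy illustration with hypothetical reference sequences and a tiny k, not CLARK's or Kraken2's implementation, and it omits Kraken2's lowest-common-ancestor logic entirely.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield canonical k-mers (lexicographic min of a k-mer and its reverse complement)."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield min(kmer, kmer.translate(COMPLEMENT)[::-1])


def discriminative_index(references, k):
    """Map each k-mer found in exactly one target to that target.

    references: dict target name -> reference sequence (hypothetical input).
    """
    owner = {}
    for target, seq in references.items():
        for kmer in set(canonical_kmers(seq, k)):
            # A k-mer already seen in another target is no longer discriminative.
            owner[kmer] = None if kmer in owner else target
    return {kmer: t for kmer, t in owner.items() if t is not None}


def classify(seq, index, k):
    """Return (best target, number of discriminative k-mer hits), or (None, 0)."""
    hits = Counter(index[km] for km in canonical_kmers(seq, k) if km in index)
    return hits.most_common(1)[0] if hits else (None, 0)


# Hypothetical two-target reference set and a short query contig.
index = discriminative_index({"A": "AAAAACCCCC", "B": "GAGAGAGAGA"}, k=4)
print(classify("AAACCCC", index, k=4))
```

Real classifiers use much larger k (for example 31) and references for thousands of taxa; a larger k makes k-mers more specific but requires closer matches between contigs and references.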

Speakers
Libertine Rose S. Sanchez

Institute of Biology Postdoctoral Research Fellow (Metagenomics, Metabarcoding), University of the Philippines Diliman
Plant Genetics and Cyanobacterial Biotechnology Laboratory; CIP Researcher, National Institute of Molecular Biology and Biotechnology


Tuesday July 28, 2020 18:40 - 18:45 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:45 MSK

Efficient dynamic associative dictionary for large k-mer sets
Motivation:
Since BLAST introduced the seed-and-extend paradigm, indexing fixed-length words (k-mers) from a set of sequences has been the bread and butter of most algorithms and methods relying on sequence similarity. Due to the ever-increasing number of available reference genomes, there is growing interest in global approaches able to take a very broad range of sequences into account. Ambitious applications such as pangenomics or metagenomics require indexing billions of distinct k-mers and would benefit from incorporating as many reference genomes as possible. Recently, the problem of representing huge k-mer sets with low memory usage and high throughput has caught the interest of the community. In the last few years, several efficient methods (Pufferfish[1], Bifrost[2], BLight[3], REINDEER[4], Kallisto[5], Jellyfish[6], SRC[7]) have been proposed for various applications: k-mer counting, quantification, assembly, ...
Some implementations are specific to their main application, while others are generic libraries that can serve various purposes. Jellyfish indexes k-mers using an efficient lock-free dynamic hash table to enable fast k-mer counting. Such a scheme needs to store each k-mer in memory, at a cost of several bytes per k-mer (4 bytes for 31-mers). Probabilistic dictionaries[7] can use less than 2 bytes per k-mer at the cost of a low false positive rate. Recent improvements have provided efficient deterministic k-mer set representations, exploiting nucleotide redundancy in k-mer sets to lower the memory cost[1], and k-mer partitioning to further reduce the storage cost and improve cache coherence[3]. However, the efficiency of some of these methods relies on their static nature. Large construction or update costs make them unsuitable for applications that require insertions or deletions. For instance, rapid acquisition of new data for microbial pangenomes could benefit from dynamic structures. Large-scale dynamic de Bruijn graphs [2,8] are another application that is gaining traction.

Results:
We present BRISK (Brisk Reduced Index for sequence of k-mers), a resource-efficient dynamic dictionary able to associate values with k-mers without false positives. It relies on three main ideas.
First, instead of storing k-mers independently, we store super-k-mers (sequences of consecutive k-mers that share the same minimizer) to reduce the number of nucleotides required to encode overlapping k-mers. We partition super-k-mers according to their minimizers, which allows us to work on smaller structures and improves cache coherence. Second, we represent each partition as a sorted list of super-k-mers to ensure fast retrieval of k-mers. Lastly, we use fewer nucleotides by encoding only the prefix and the suffix of a super-k-mer, without the minimizer itself. In practice, using this scheme we are able to encode on average[9] eight 31-mers into a single super-k-mer that fits in a 64-bit integer. The larger the minimizer size, the faster the queries, but also the larger the space overhead, so the index can be tuned for different space/time trade-offs. Furthermore, index usage is highly cache coherent, as querying several k-mers sharing the same minimizer requires only one random memory access. For instance, we are able to index 3.2 billion k-mers in 2 minutes using less than 10 GB of RAM.
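The super-k-mer decomposition described above can be sketched directly. This minimal version assumes lexicographic, non-canonical minimizers and naive recomputation of each k-mer's minimizer, both simplifications relative to a production index such as BRISK. It splits a sequence into super-k-mers, buckets them by minimizer, keeps each bucket sorted, and stores only the prefix and suffix around the minimizer, mirroring the three ideas listed. Bit-packing into 64-bit integers is omitted.

```python
from collections import defaultdict


def minimizer(kmer, m):
    """Lexicographically smallest m-mer of kmer (a simplification; real indexes
    usually order m-mers by a hash and use canonical m-mers)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def super_kmers(seq, k, m):
    """Split seq into maximal runs of consecutive k-mers sharing a minimizer.

    Returns a list of (minimizer, super_kmer_string) pairs.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        return []
    result = []
    start, current = 0, minimizer(seq[:k], m)
    for i in range(1, n_kmers):
        mini = minimizer(seq[i:i + k], m)
        if mini != current:
            # k-mers start..i-1 span seq[start : i - 1 + k]
            result.append((current, seq[start:i - 1 + k]))
            start, current = i, mini
    result.append((current, seq[start:]))
    return result


def partition(seq, k, m):
    """Bucket super-k-mers by minimizer, storing (prefix, suffix) around the
    minimizer; each bucket is kept sorted for fast lookup."""
    buckets = defaultdict(list)
    for mini, sk in super_kmers(seq, k, m):
        pos = sk.find(mini)  # the minimizer occurs in every k-mer of sk
        buckets[mini].append((sk[:pos], sk[pos + m:]))
    for bucket in buckets.values():
        bucket.sort()
    return buckets


# Hypothetical sequence, with small k and m for readability.
for mini, entries in partition("ACGTTGCATGCCATAGGATC", k=5, m=3).items():
    print(mini, entries)
```

Because the minimizer is implied by the bucket it sits in, each stored entry can be turned back into its super-k-mer as prefix + minimizer + suffix.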
[1]Almodaresi, F., Sarkar, H., Srivastava, A. and Patro, R., 2018. A space and time-efficient index for the compacted colored de Bruijn graph. Bioinformatics, 34(13), pp.i169-i177.
[2]Holley, G. and Melsted, P., 2019. Bifrost–Highly parallel construction and indexing of colored and compacted de Bruijn graphs. bioRxiv, p.695338.
[3]Marchet, C., Kerbiriou, M. and Limasset, A., 2019, April. Indexing De Bruijn graphs with minimizers. In RECOMB-Seq.
[4]Marchet, C., Iqbal, Z., Gautheret, D., Salson, M. and Chikhi, R., 2020. REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets. ISMB.
[5]Bray, N.L., Pimentel, H., Melsted, P. and Pachter, L., 2016. Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology, 34(5), pp.525-527.
[6]Marçais, G. and Kingsford, C., 2011. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 27(6), pp.764-770.
[7]Marchet, C., Lecompte, L., Limasset, A., Bittner, L. and Peterlongo, P., 2020. A resource-frugal probabilistic dictionary and applications in bioinformatics. Discrete Applied Mathematics, 274, pp.92-102.
[8]Crawford, V.G., Kuhnle, A., Boucher, C., Chikhi, R. and Gagie, T., 2018. Practical dynamic de Bruijn graphs. Bioinformatics, 34(24), pp.4189-4195.
[9]Baharav, T.Z., Kamath, G.M., David, N.T. and Shomorony, I., 2020, May. Spectral Jaccard Similarity: A new approach to estimating pairwise sequence alignments. In International Conference on Research in Computational Molecular Biology (pp. 223-225). Springer, Cham.



Speakers
Yoann Dufresne

Institut Pasteur


Tuesday July 28, 2020 18:45 - 18:50 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:50 MSK

Black cat in a dark room: search for new viruses in metagenomes
Detecting hidden viral diversity is a challenging task that goes beyond the standard protocol for processing metagenomic data. Meanwhile, public databases contain a large amount of metagenomic data, a promising but largely understudied source of novel viral genomes. Here we present a new pipeline for detecting full-length viral genomes in assembled metagenomes.
Viral genomes are either circular molecules or linear molecules whose ends contain repeated sequences; both types can be recognized as cyclic sequences. We detect such contigs by searching for repeats of 50 to 200 bp using the Knuth-Morris-Pratt algorithm. For each contig, this search takes time linear in the maximum allowed repeat length, which makes it possible to process large amounts of data and reduce their dimensionality. We classify cyclic sequences as viral or non-viral based on predicted gene content using the viralVerify tool (github.com/ablab/viralVerify). For each selected viral contig, we identify the capsid and terminase genes based on HMM profiles. The identified protein sequences were aligned against the NCBI nr database with Diamond. The protein sequences (both queries and hits) belonging to each HMM profile were clustered with CD-HIT v4.8.1 (span 80%, identity 50%). The resulting centroid sequences were aligned using MAFFT v7.310 with default parameters, followed by phylogeny reconstruction using UPGMA and RAxML v8.2.11 separately. Clusters that did not contain any database hits were classified as previously unknown. Selected contigs were annotated with the VGAS tool. The completeness of viral contigs was assessed with viralComplete (https://github.com/mikeraiko/viralComplete).
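Detecting a cyclic contig reduces to finding the longest prefix of the contig that reappears as its suffix. A minimal sketch of that check with the Knuth-Morris-Pratt prefix function is shown below, using the 50 to 200 bp bounds from the text. It assumes exact repeats and examines only the first and last `max_len` bases, which is what keeps the cost linear in the maximum repeat length; it is an illustration, not the pipeline's own code, and the test sequences are hypothetical.

```python
def failure_function(pattern):
    """KMP prefix function: fail[i] = length of the longest proper border of pattern[:i+1]."""
    fail = [0] * len(pattern)
    j = 0
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = fail[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        fail[i] = j
    return fail


def end_overlap(contig, max_len=200):
    """Length of the longest contig prefix (at most max_len) that is also its suffix.

    Only the first and last max_len bases are examined, so the cost is linear
    in max_len rather than in the contig length.
    """
    size = min(max_len, len(contig) - 1)
    if size <= 0:
        return 0
    pattern = contig[:size]
    text = contig[len(contig) - size:]
    fail = failure_function(pattern)
    j = 0  # length of the current match between a suffix of text and a prefix of pattern
    for c in text:
        while j > 0 and (j == size or c != pattern[j]):
            j = fail[j - 1]
        if c == pattern[j]:
            j += 1
    return j


def looks_cyclic(contig, min_len=50, max_len=200):
    """Flag contigs whose ends share an exact repeat of at least min_len bases."""
    return end_overlap(contig, max_len) >= min_len


# Hypothetical contig whose 60 bp start is repeated at its end.
repeat = "GATTACAGGC" * 6
print(end_overlap(repeat + "T" * 300 + repeat), looks_cyclic(repeat + "T" * 300 + repeat))
```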
We tested our pipeline on assembled metagenomes from the NCBI Assembly database. More than 170 Gb of data, representing about 1,300 metagenomes derived from seawater, soil, and biofilm habitats, were analyzed. Our analysis revealed that viral diversity is much greater than currently known: hundreds of new virus clusters were detected. For example, we identified 3 new representatives of the Siphoviridae and Podoviridae bacteriophage families in 10 biofilm-derived metagenomes. Our approach detects full-length viral genomes with a lower chance of false-positive results. In the future, users of our pipeline will be able to submit metagenome assemblies or raw reads as input and receive annotated viral genomes. Further analysis of metagenomes from other habitats is needed.
This work was supported by St. Petersburg State University (project ID 51555639).

Speakers
Yulia Yakovleva

Department of Cytology and Histology, Saint Petersburg State University, Bioinformatics Institute
Saint Petersburg State University, Russia


Tuesday July 28, 2020 18:50 - 18:55 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

18:55 MSK

Transcriptome meta-analysis of low phosphate-treated rice roots reveals novel players in phosphate starvation response of crops
Phosphorus (P) is a major essential macronutrient for plants; it is scarce in soil and is a non-renewable resource. Molecular responses to low concentrations of inorganic phosphate (Pi) are better understood in the model plant Arabidopsis thaliana L., whereas many knowledge gaps remain for crops. Rice is of invaluable socioeconomic importance and is also a crop model system for studies on Pi starvation responses, owing to its high tolerance to Pi deprivation. Unraveling the mechanisms related to Pi starvation in rice is therefore extremely important for plant breeding and for agronomic productivity and sustainability. In this work, we analyzed transcriptomic data available in the Gene Expression Omnibus (GEO) public repository to identify new components involved in Pi starvation responses in rice roots. Our analysis included all available datasets that unambiguously compared the root transcriptomes of rice (Oryza sativa L. cv. Nipponbare) after a Pi-starvation treatment with those of roots grown under Pi-sufficient conditions over the same period. A total of seven datasets from four independent experiments matched these criteria. To focus on the candidate genes most likely to be directly related to Pi starvation responses, we considered only the top group of genes, those significantly responsive in at least six datasets. Specifically, 8 genes were up-regulated and 3 down-regulated in all seven datasets, while 25 genes were up-regulated and 34 down-regulated in six of the seven datasets. The selected genes, in both the up- and down-regulated groups, displayed different expression levels, although these were generally consistent across datasets.
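The gene selection step above is a vote-counting meta-analysis: a gene is retained when at least six of the seven datasets call it significant in the same direction. The sketch below assumes each dataset has already been reduced to a dict of significant genes and their log2 fold changes; the gene identifiers in the usage example are hypothetical.

```python
from collections import Counter


def consistent_degs(datasets, min_datasets=6):
    """Vote-count meta-analysis of differentially expressed genes.

    datasets: list of dicts mapping gene id -> log2 fold change, each holding
    only the genes called significant in that dataset.
    Returns (up, down): dicts gene -> number of datasets agreeing on the
    direction, keeping genes that agree in at least min_datasets datasets.
    """
    up, down = Counter(), Counter()
    for degs in datasets:
        for gene, lfc in degs.items():
            if lfc > 0:
                up[gene] += 1
            elif lfc < 0:
                down[gene] += 1
    keep_up = {g: n for g, n in up.items() if n >= min_datasets}
    keep_down = {g: n for g, n in down.items() if n >= min_datasets}
    return keep_up, keep_down


# Seven hypothetical datasets: G1 up in all 7, G2 up in 6, G3 down in 6, G4 up in 5.
datasets = [{"G1": 2.0, "G2": 1.5, "G3": -1.0, "G4": 1.0} for _ in range(5)]
datasets += [{"G1": 2.0, "G2": 1.5, "G3": -1.0}, {"G1": 1.0}]
print(consistent_degs(datasets))
```

Splitting the result by vote count (7 versus 6) reproduces the two tiers reported above.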
By analysing the information available in the literature, we grouped the Pi-responsive genes into two categories: well-known players in Pi metabolism or related processes, and genes with unknown functions or with known functions in processes or pathways other than those related to Pi metabolism. We present these genes and discuss their known roles and possible relationships with Pi metabolism. Moreover, we pinpoint genes that are probably important for Pi metabolism but not yet explored, and offer new insights into known genes not yet studied in the context of Pi starvation. Orthologs of the promising Pi-responsive genes of unclear function identified in rice roots are also discussed in the context of other crops. Our analysis and discussion offer new alternatives for developing crops, mainly rice, with improved phosphate acquisition and use efficiency, helping to cope with the global food crisis in the context of depleting phosphorus resources. This work is part of our review entitled “Phosphate starvation responses in crop roots: from well-known players to novel candidates”, currently in the publication process.

Speakers
Yugo Lima-Melo

Postdoctoral researcher, Universidade Federal do Rio Grande do Sul


Tuesday July 28, 2020 18:55 - 19:00 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

19:00 MSK

Assembly and Annotation of Ashkenazi Reference Genome
We describe the assembly and annotation of a new, population-specific human reference genome. We used publicly available data for the HG002 individual from the Ashkenazi Jewish trio, available from the Genome in a Bottle (GIAB) project. The new reference, which we call Ash1, is more complete than the human reference GRCh38. While GRCh38 is a mosaic of five different individual genomes, our reference represents a single individual. The Ashkenazi reference genome has 2,973,118,650 nucleotides placed on the chromosomes, compared with 2,937,639,212 in GRCh38. We annotated the genome by transferring the CHESS annotation from the GRCh38 genome. The new annotation identified 20,157 protein-coding genes, of which 19,563 are >99% identical to their counterparts in GRCh38. Forty of the protein-coding genes in GRCh38 are missing from Ash1; however, all of these genes are members of multi-gene families for which Ash1 contains other copies. Alignment of DNA sequences from an unrelated, part-Ashkenazi (~70%) individual to Ash1 identified ~1 million fewer homozygous SNPs than alignment of the same sequences to the more distant GRCh38 genome, illustrating one of the benefits of population-specific reference genomes.

Speakers
Alexey Zimin

Johns Hopkins University, Baltimore, MD, USA


Tuesday July 28, 2020 19:00 - 19:05 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

19:00 MSK

MGnify: Hands-on review (parts 3-4)
Speakers
Alexandre Almeida

Postdoctoral Fellow (ESPOD), EMBL-EBI
I am an EBI-Sanger Postdoctoral Fellow focusing on the study of the human gut microbiome using genome-resolved metagenomics. My main research interest is understanding the role of the large uncultured diversity of the gut microbiome in human health and disease.
Rob Finn

Team Leader, Sequence Families, EMBL-EBI
Dr Rob Finn leads EMBL-EBI’s Microbiome Informatics team, which is responsible for the MGnify resource. MGnify provides metagenomic, metatranscriptomic and assembly analysis services. The functional and taxonomic profiles of these datasets, once made public, can be...
Lorna Richardson

Microbiome Resources Co-ordinator, EMBL-EBI
Ekaterina Sakharova

Bioinformatician, EMBL-EBI


Tuesday July 28, 2020 19:00 - 20:00 MSK
Zoom Mgnify https://zoom.us/j/93441398259?pwd=ZVRiWWl5ZWFpNlVQZUhVcDB0aTBndz09

19:05 MSK

Serratus: Ultra-deep search to discover novel coronaviruses
Speakers
Artem Babaian

University of British Columbia


Tuesday July 28, 2020 19:05 - 19:10 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09

19:15 MSK

Closing Remarks
Speakers
Anton Korobeynikov

Associate Professor, Center for Algorithmic Biotechnology, Saint Petersburg State University, 6 linia V.O., 11/21d, 1990034 St Petersburg, Russia
Alla Lapidus

Professor, Center for Algorithmic Biotechnology, Saint Petersburg State University, 6 linia V.O., 11/21d, 1990034 St Petersburg, Russia


Tuesday July 28, 2020 19:15 - 19:30 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09
 
Wednesday, July 29
 

11:00 MSK

MGnify: Hands-on review (parts 3-4)
Speakers
Alexandre Almeida

Postdoctoral Fellow (ESPOD), EMBL-EBI
I am an EBI-Sanger Postdoctoral Fellow focusing on the study of the human gut microbiome using genome-resolved metagenomics. My main research interest is understanding the role of the large uncultured diversity of the gut microbiome in human health and disease.
Rob Finn

Team Leader, Sequence Families, EMBL-EBI
Dr Rob Finn leads EMBL-EBI’s Microbiome Informatics team, which is responsible for the MGnify resource. MGnify provides metagenomic, metatranscriptomic and assembly analysis services. The functional and taxonomic profiles of these datasets, once made public, can be...
Lorna Richardson

Microbiome Resources Co-ordinator, EMBL-EBI
Ekaterina Sakharova

Bioinformatician, EMBL-EBI


Wednesday July 29, 2020 11:00 - 12:00 MSK
Zoom Mgnify https://zoom.us/j/93441398259?pwd=ZVRiWWl5ZWFpNlVQZUhVcDB0aTBndz09
 
  • Event: Bioinformatics: from Algorithms to Applications 2020, Jul 27-29, 2020
  • Venue: Virtual
  • Session types: MGnify Workshop; Opening / closing; Q & A: Keynotes; Q & A: Posters; Q & A: Talks

