Secondary metabolites (SMs) are of central interest to a wide range of researchers, from biologists and ecologists to pharmacologists and biomedical scientists [1]. Modern mass spectrometry instruments allow rapid and low-cost scanning of thousands of metabolites, which results in huge amounts of high-resolution data. Although these data represent a gold mine for future discoveries, their interpretation remains a bottleneck and requires appropriate computational methods [2]. Current software is either limited to specific classes of SMs, for example peptidic natural products (VarQuest [3]), or performs only a standard database search, which identifies known SMs but fails to discover their novel variants (Dereplicator+ [4]).
Here we present VarQuest+, a database search tool capable of identifying novel variants of a wide range of known SMs, including polyketides, alkaloids, flavonoids, saponins, and many others. Algorithmic and software innovations make VarQuest+ considerably more efficient in running time and memory consumption than existing tools. This efficiency made it feasible to implement a modification-tolerant search mode in VarQuest+, which is computationally more challenging than a regular database search.
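To illustrate why a modification-tolerant search is more demanding than a regular one, the candidate-selection step can be sketched as follows. This is a minimal illustration and not the VarQuest+ algorithm: it assumes that a regular search keeps only database compounds whose mass matches the precursor mass within a small tolerance, while a modification-tolerant search also admits compounds whose mass differs by up to a maximum modification mass. The function name `candidate_indices` and the parameters `tol` and `max_mod` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def candidate_indices(precursor_mass, db_masses, tol=0.02, max_mod=0.0):
    """Return indices of database masses compatible with a precursor mass.

    db_masses must be sorted in ascending order (masses in Da).
    max_mod == 0 models a regular search: the mass must match within tol.
    max_mod > 0 models a modification-tolerant search: a mass shift of up
    to max_mod Da in either direction is accepted (illustrative assumption).
    """
    window = tol + max_mod
    lo = bisect_left(db_masses, precursor_mass - window)
    hi = bisect_right(db_masses, precursor_mass + window)
    return list(range(lo, hi))


# Toy database of sorted compound masses (made-up values).
db = [100.0, 250.0, 264.0, 400.0]

regular = candidate_indices(264.01, db)               # narrow mass window
tolerant = candidate_indices(264.01, db, max_mod=20)  # wide mass window
print(regular, tolerant)
```

In this toy setting the wider window returns more candidates per spectrum, each of which would then have to be scored against the spectrum; with millions of spectra, this growth in candidates is one reason running time and memory become the limiting factors.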
We benchmarked VarQuest+ on a Korean medicinal plants dataset (2.5 million mass spectra collected from 337 samples). The standard search of the KNApSAcK database (51,179 plant SMs [5]) identified 349 compounds. The VarQuest+ modification-tolerant search identified 4,253 SMs, an order of magnitude more than Dereplicator+. With the same search parameters, VarQuest+ is twenty times faster than Dereplicator+ and uses four times less memory.
The reported study was funded by RFBR, project number 20-04-01096.
References
[1] Cragg, G. M. & Newman, D. J. (2013) Natural products: a continuing source of novel drug leads. Biochimica et Biophysica Acta (BBA) - General Subjects, 1830(6), 3670-3695.
[2] Wang, M. et al. (2016) Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat. Biotechnol., 34, 828.
[3] Gurevich, A. et al. (2018) Increased diversity of peptidic natural products revealed by modification-tolerant database search of mass spectra. Nat. Microbiol., 3, 319.
[4] Mohimani, H. et al. (2018) Dereplication of microbial metabolites through database search of mass spectra. Nat. Commun., 9, 4035.
[5] Afendi, F. M. et al. (2012) KNApSAcK Family Databases: Integrated Metabolite–Plant Species Databases for Multifaceted Plant Research. Plant and Cell Physiology, 53(2), e1.
Senior Research Scientist, Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg, Russia
I lead the Natural Product Discovery research direction at CAB (http://cab.spbu.ru/research/antibiotics-discovery/). Together with the Center for Computational Mass Spectrometry at UCSD and the Mohimani Lab at Carnegie Mellon University, we are creating software for identification of...