Tuesday, July 28 • 18:35 - 18:40
Genome-wide inference of bacterial transcription factor binding sites: new method and its applications

None of the current bacterial genome annotation pipelines handles regulatory sequences. Transcription factor binding sites (TFBSs, or operators) are the most abundant regulatory elements and are critical for understanding genome function, yet methods for their fast genome-wide inference are currently lacking.
The method of bacterial TFBS inference we are developing is based on the analysis of 3D structures of transcription factor (TF)-operator complexes. We use the TF residues that contact DNA bases as a tag (CR-tag) to link TFs with their operators. TFBSs can be inferred genome-wide via either (1) a fast automated CR-tag based genome scan with a library of CR-tagged, experimentally characterised TFBS motifs or (2) a slower semi-automated de novo TFBS inference protocol that combines CR-tag information with genome structure analysis.
The first approach allows regulatory information to be transferred reliably between species that are not necessarily closely related. Even distantly related TFs of Gram-negative and Gram-positive bacteria can have the same CR-tags and hence recognise the same operators. However, direct regulatory information transfer is most efficient within the same taxonomic order (e.g. over 50% of TF orthologue pairs within Enterobacteriales have identical CR-tags).
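The first approach can be pictured with a minimal Python sketch. It is not the SigmoID implementation: the library contents, the CR-tag string `RSQTH`, the scoring scheme (a log-odds position weight matrix with a pseudocount and a uniform background) and the threshold are all assumptions made for illustration. The sketch looks up the motif stored under a TF's CR-tag and scans the forward strand of a genome sequence with it.

```python
import math

# Hypothetical library: CR-tag -> aligned, experimentally characterised
# operator sequences. The tag and the sequences are invented examples.
LIBRARY = {
    "RSQTH": ["TGTGATCTAGATCACA", "TGTGACCTAGATCACA", "TGTGATCTAGGTCACA"],
}


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for column in zip(*sites):
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(genome, pwm, threshold):
    """Return (start, window, score) for forward-strand windows at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        # Unknown symbols (e.g. N) get the lowest score of the column.
        score = sum(col.get(base, min(col.values())) for col, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


def infer_sites(tf_cr_tag, genome, threshold):
    """Transfer a motif to a genome if the TF's CR-tag is present in the library."""
    sites = LIBRARY.get(tf_cr_tag)
    if sites is None:
        return []  # no characterised TF shares this CR-tag
    return scan(genome, build_pwm(sites), threshold)


genome = "AAAA" + "TGTGATCTAGATCACA" + "CCCC"
hits = infer_sites("RSQTH", genome, threshold=10.0)
```

Keying the library by CR-tag rather than by TF name or species is what lets the same motif be reused for any TF with an identical tag, regardless of how closely the source and target organisms are related. A practical scan would also cover the reverse strand and restrict hits to plausible regulatory regions.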
The de novo protocol builds upon the well-established phylogenetic footprinting approach, replacing the assumption that similar TFs recognise similar operators with a strict 3D-structure-based criterion (the CR-tag), and is applicable to any bacterial species.
We illustrate the following applications of our approach:
1) Correcting poorly defined motifs.
For most TFs in a given species, only one or very few targets exist, so proper TFBS models cannot be built from that species alone. With our de novo TFBS inference protocol, orthologous operator sequences can be collected from other species that have TFs with the same CR-tag. This usually provides enough information for properly defining the motif and building a high-quality operator model. This approach can substantially improve the usability of data from single-organism TFBS databases such as RegulonDB.
2) Resolving regulation details for paralogous TFs.
Using our CR-tag based approach and experimental evidence, we show that paralogous quorum-sensing regulators in Pectobacterium spp. recognise the same operator sequence, although completely different operators have been suggested previously.
3) The advantages of full-scale genome-wide TFBS inference.
With the current collection of TFBS profiles, a genome-wide scan finds operators for the majority of transcription units in a typical enterobacterial genome. This helps to reveal unexpected regulators for many transcription units and allows regulatory cascades to be deciphered. We will provide examples of such inferred transcriptional cascades supported by experimental data.
4) A genome-wide TFBS scan can also be useful for correcting automated genome annotation, since finding an operator for a well-characterised TF can suggest functions for the downstream genes (doi:10.7717/peerj.2056).
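The pooling step behind application 1 can be sketched as follows. This is only an illustration under stated assumptions: the record layout, species labels, CR-tags and operator sequences are hypothetical, and the sequences are assumed to be already aligned and of equal length. Operators are pooled across species only when the regulating TFs share a CR-tag, and the pooled sites are turned into a per-position count matrix.

```python
from collections import Counter

# Hypothetical records: (species, CR-tag of the regulating TF, aligned operator).
RECORDS = [
    ("species_A", "RSQTH", "TTGACA"),
    ("species_B", "RSQTH", "TTGACA"),
    ("species_C", "RSQTH", "TTGATA"),
    ("species_D", "KNQSY", "AACGTT"),  # different CR-tag, must not be pooled
]


def pool_by_cr_tag(records, cr_tag):
    """Collect operator sequences from all species whose TF carries cr_tag."""
    return [seq for _, tag, seq in records if tag == cr_tag]


def count_matrix(sites):
    """Per-position base counts for a set of equal-length aligned sites."""
    if len({len(site) for site in sites}) != 1:
        raise ValueError("sites must be non-empty and of equal length")
    return [Counter(column) for column in zip(*sites)]


def consensus(matrix):
    """Most frequent base at each position."""
    return "".join(counts.most_common(1)[0][0] for counts in matrix)


pooled = pool_by_cr_tag(RECORDS, "RSQTH")
matrix = count_matrix(pooled)
```

With a single site per species no variation is visible at any position; pooling by CR-tag exposes which positions tolerate substitutions (here position 5, where both C and T occur), which is the information a usable operator model needs.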
The TFBS inference method described here has been added to version 2 of our existing application for TFBS analysis, which, together with a collection of TFBS profiles, is available at github.com/nikolaichik/SigmoID.

Yevgeny Nikolaichik

Associate Professor, Belarusian State University

Tuesday July 28, 2020 18:35 - 18:40 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09