Tuesday, July 28 • 18:25 - 18:30
Single-cell ChIP-seq imputation with machine learning models leveraging bulk ENCODE data


Next-generation sequencing is routinely used in biomedical research and the pharmaceutical industry. Applied in combination with chromatin immunoprecipitation (ChIP-seq), it provides detailed insights into genomic properties of cells, such as chromatin accessibility and protein-DNA interactions, that play a key role in gene regulation and chromatin structure (ENCODE project consortium, 2012). Recently developed assays for single-cell ChIP-seq (scChIP-seq) enable the characterization of these molecular events at single-cell resolution. This allows the investigation of cell differentiation processes that are of crucial interest in many research fields, especially in cancer studies. Although the sequencing coverage can be as low as 1000 reads per single cell (Rotem, Assaf, et al. 2015), it was nevertheless possible to investigate relationships between drug-sensitive and drug-resistant breast cancer cells (Grosselin, Kevin, et al. 2019). Such findings would not have been possible with bulk ChIP-seq data. However, the sparsity caused by the low signal obtained from an individual cell hampers further investigations, and there is a need for a dedicated imputation method for scChIP-seq. Furthermore, past publications based on sparse datasets from single-cell RNA-seq, which is more established, demonstrate that imputation methods strongly enhance research on such data (Peng, Tao, et al. 2019). Ultimately, the full potential of future scChIP-seq studies will not be realized without a dedicated imputation method to complete the data. To address this need, we developed SIMPA, an algorithm for Single-cell chIp-seq iMPutAtion.

Drawing on a collection of more than 2250 preprocessed bulk ChIP-seq datasets from the ENCODE data portal, SIMPA leverages statistical patterns within a reference set selected by the target, that is, the histone mark or transcription factor investigated in the scChIP-seq experiment. The existence of these patterns was confirmed by a cross-validation analysis of classification models. For each single cell, SIMPA trains numerous machine learning models (~120,000 at 5kb resolution) to impute missing genomic regions while remaining sensitive to the sparse signal of the individual cell. Compared to another imputation strategy (Xiong, Lei, et al. 2019) that does not involve reference bulk data, SIMPA achieves better clustering by cell type. Using a KEGG pathway enrichment tool (Li, Shaojuan, et al. 2019), we could show that functionally related pathways were recovered in a cell-type-specific manner, but only in the imputed results from SIMPA. Finally, randomization tests confirmed that both the single-cell signal and the target-specific reference data are used by SIMPA to achieve these meaningful imputations.
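
The idea of training one model per missing region on bulk reference data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `impute_cell`, the representation of each profile as a set of bin indices, and the Laplace-smoothed Bernoulli naive Bayes classifier, which merely stands in for whatever model SIMPA actually trains. For every candidate bin that is absent from the single cell, the bins observed in that cell serve as features, the bulk profiles serve as training instances, and the resulting model is applied to the cell itself.

```python
import math


def impute_cell(reference, cell_bins, alpha=1.0):
    """Score each bin missing from one single cell (illustrative only).

    reference: list of bulk profiles, each a set of bin indices with signal.
    cell_bins: set of bin indices observed in the single cell.
    Returns {bin: probability} for reference bins not observed in the cell.
    """
    features = sorted(cell_bins)
    candidates = set().union(*reference) - cell_bins
    n_ref = len(reference)

    def log_joint(group):
        # log P(class) + sum over features of log P(feature present | class),
        # with Laplace smoothing so that zero counts stay finite.
        lp = math.log((len(group) + alpha) / (n_ref + 2 * alpha))
        for f in features:
            hits = sum(1 for profile in group if f in profile)
            lp += math.log((hits + alpha) / (len(group) + 2 * alpha))
        return lp

    scores = {}
    for target in sorted(candidates):
        # One small model per candidate bin: bulk profiles with the bin
        # are positives, profiles without it are negatives.
        pos = [p for p in reference if target in p]
        neg = [p for p in reference if target not in p]
        lp_pos, lp_neg = log_joint(pos), log_joint(neg)
        top = max(lp_pos, lp_neg)
        e_pos, e_neg = math.exp(lp_pos - top), math.exp(lp_neg - top)
        scores[target] = e_pos / (e_pos + e_neg)
    return scores


# Toy reference: two bulk profiles resemble the cell, two do not.
reference = [{0, 1, 2}, {0, 1, 2}, {3, 4}, {3, 4, 5}]
scores = impute_cell(reference, cell_bins={0, 1})
```

In this toy example, bin 2 co-occurs with the cell's observed bins in the reference and therefore receives a higher score than bins 3, 4 and 5. Because the features are defined by the individual cell, a different cell yields a different set of models, which is one way to read "sensitive to the sparse signal of the individual cell".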

Our new imputation algorithm was validated on a set of more than 2600 single B cells and T cells for two different histone marks, H3K4me3 and H3K27me3, at 5kb and 50kb resolution, respectively. This is so far the only scChIP-seq dataset available for human cells. To use resources efficiently, SIMPA was implemented with an MPI interface that distributes the computations across many cores, possibly on different compute nodes. The software is available at https://github.com/salbrec/SIMPA
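
Because each genomic region gets its own model, the work is naturally parallel. The sketch below shows one way the regions could be divided among MPI processes. The block-partitioning scheme and the helper name `bins_for_rank` are assumptions for illustration and are not taken from the SIMPA implementation. In an actual MPI program, `rank` and `size` would come from the communicator (for example `MPI.COMM_WORLD.Get_rank()` and `Get_size()` in mpi4py, which is itself an assumed choice of library).

```python
def bins_for_rank(n_bins, rank, size):
    """Return the contiguous block of bin indices assigned to one process.

    n_bins: total number of genomic bins (one model each).
    rank:   index of this process, 0 <= rank < size.
    size:   total number of processes.
    """
    base, extra = divmod(n_bins, size)
    # The first `extra` ranks take one additional bin so that block sizes
    # differ by at most one.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


# Example: 10 bins split across 4 processes.
blocks = [list(bins_for_rank(10, r, 4)) for r in range(4)]
```

Each process would then train and apply the models for its own block, and the per-bin results would be collected on one process at the end.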

In conclusion, to address problems related to data sparsity in single-cell ChIP-seq, we developed the first dedicated imputation method that generates accurate and biologically relevant results.

Speakers
Steffen Albrecht

PhD Student, Johannes Gutenberg University Mainz
Hello, my name is Steffen Albrecht and I am from Mainz in Germany. Currently, I am a PhD student in the group Computational Biology and Data Mining, and my main topics are machine learning and bioinformatics data integration. The application fields are imputation, e.g. for sparse data...


Tuesday July 28, 2020 18:25 - 18:30 MSK
Zoom Conference https://zoom.us/j/94321101353?pwd=QlJBb09uM0NVVnVyK0FkbTJ3Nkcrdz09