Simultaneous deep generative modelling and clustering of single-cell genomic data

ABSTRACT Recent advances in single-cell technologies, including single-cell ATAC-seq (scATAC-seq), have enabled large-scale profiling of the chromatin accessibility landscape at the single-cell level. However, the characteristics of scATAC-seq data, including high sparsity and high dimensionality, have greatly complicated the computational analysis. Here, we propose scDEC, a computational tool for scATAC-seq analysis with deep generative neural networks. scDEC is built on a pair of generative adversarial networks, and is capable of simultaneously learning the latent representation and inferring cell labels. In a series of experiments, scDEC demonstrates superior performance over other tools in scATAC-seq analysis across multiple datasets and experimental settings. In downstream applications, we demonstrate that the generative power of scDEC helps to infer the trajectory and intermediate state of cells during differentiation, and the latent features learned by scDEC can potentially reveal both biological cell types and within-cell-type variations. We also show that it is possible to extend scDEC for the integrative analysis of multi-modal single-cell data.
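
The abstract does not say how a single model can return both an embedding and a cell label. The sketch below illustrates one possible arrangement, assumed here for illustration and not taken from the text above: the latent code is split into a continuous Gaussian part and a one-hot categorical part, and the cluster label is read off as the argmax of the categorical part. The function names `sample_latent` and `infer_labels`, the dimensions, and the use of NumPy are all hypothetical choices; no neural network is trained here.

```python
import numpy as np


def sample_latent(n_cells, n_continuous, n_clusters, rng):
    """Draw a hypothetical latent code: Gaussian part z plus one-hot part c."""
    z = rng.standard_normal((n_cells, n_continuous))    # continuous embedding
    labels = rng.integers(0, n_clusters, size=n_cells)  # uniform cluster draw
    c = np.eye(n_clusters)[labels]                       # one-hot encoding
    return np.concatenate([z, c], axis=1), labels


def infer_labels(encoded, n_continuous):
    """Split an encoded vector into its embedding and a hard cluster label."""
    embedding = encoded[:, :n_continuous]
    categorical = encoded[:, n_continuous:]
    return embedding, categorical.argmax(axis=1)


rng = np.random.default_rng(0)
latent, true_labels = sample_latent(n_cells=5, n_continuous=10, n_clusters=3, rng=rng)
# In a trained model the encoded vector would come from a network applied to
# scATAC-seq profiles; here the sampled latent code stands in for it.
embedding, predicted = infer_labels(latent, n_continuous=10)
```

In this toy setting the predicted labels match the sampled ones by construction, which only shows that the split and the argmax are consistent with each other.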


DATA AVAILABILITY The InSilico dataset was collected from
the GEO database with accession number GSE65360. The mouse Forebrain dataset was downloaded from the GEO database with accession number GSE100033. The Splenocyte dataset can be accessed at the ArrayExpress database with accession number E-MTAB-6714. The All blood dataset can be accessed at the GEO database with accession number GSE96772. The mouse atlas data are available at http://atlas.gs.washington.edu/mouse-atac. The human PBMCs dataset used in the multi-modal single-cell analysis was downloaded from 10x Genomics (https://support.10xgenomics.com/single-cell-multiome-atac-gex) with entry ‘pbmc_granulocyte_sorted_10k’. The preprocessed scATAC-seq data used as input for the scDEC model in this study can be downloaded from https://doi.org/10.5281/zenodo.3977858 (ref. 56).

CODE AVAILABILITY scDEC is open-source software based on the TensorFlow library (ref. 57), which is available on GitHub (https://github.com/kimmo1019/scDEC) and Zenodo (https://doi.org/10.5281/zenodo.4560834; ref. 58). A CodeOcean capsule with several example datasets is available at https://codeocean.com/capsule/0746056/tree/v1 (ref. 59). Pretrained models for both the benchmark single-cell datasets and the 10x Genomics PBMCs multi-modal single-cell dataset are provided.

REFERENCES


1. Klemm, S. L., Shipony, Z. & Greenleaf, W. J. Chromatin accessibility and the regulatory epigenome. _Nat. Rev. Genet._ 20, 207–220 (2019).
2. Corces, M. R. et al. The chromatin accessibility landscape of primary human cancers. _Science_ 362, eaav1898 (2018).
3. Stuart, T. & Satija, R. Integrative single-cell analysis. _Nat. Rev. Genet._ 20, 257–272 (2019).
4. Cusanovich, D. A. et al. Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. _Science_ 348, 910–914 (2015).
5. Buenrostro, J. D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. _Nature_ 523, 486–490 (2015).
6. Chen, H. et al. Assessment of computational methods for the analysis of single-cell ATAC-seq data. _Genome Biol._ 20, 241 (2019).
7. Zamanighomi, M. et al. Unsupervised clustering and epigenetic classification of single cells. _Nat. Commun._ 9, 2410 (2018).
8. González-Blas, C. B. et al. cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data. _Nat. Methods_ 16, 397–400 (2019).
9. Cusanovich, D. A. et al. A single-cell atlas of in vivo mammalian chromatin accessibility. _Cell_ 174, 1309–1324.e1318 (2018).
10. Baker, S. M., Rogerson, C., Hayes, A., Sharrocks, A. D. & Rattray, M. Classifying cells with Scasat, a single-cell ATAC-seq analysis tool. _Nucleic Acids Res._ 47, e10 (2019).
11. Fang, R. et al. Comprehensive analysis of single cell ATAC-seq data with SnapATAC. _Nat. Commun._ 12, 1337 (2021).
12. Goodfellow, I. et al. Generative adversarial nets. In _Proceedings of Advances in Neural Information Processing Systems_ (_NeurIPS_) 2672–2680 (NIPS, 2014).
13. Kingma, D. P. & Welling, M. Auto-encoding variational bayes. In _Proceedings of International Conference on Learning Representations_ (ICLR, 2014).
14. Liu, Q., Lv, H. & Jiang, R. hicGAN infers super resolution Hi-C data with generative adversarial networks. _Bioinformatics_ 35, i99–i107 (2019).
15. Xiong, L. et al. SCALE method for single-cell ATAC-seq analysis via latent feature extraction. _Nat. Commun._ 10, 4576 (2019).
16. Zhu, J.-Y., Park, T., Isola, P. & Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In _Proceedings of the IEEE International Conference on Computer Vision_ 2223–2232 (ICCV, 2017).
17. Liu, Q., Xu, J., Jiang, R. & Wong, W. H. Density estimation using deep generative neural networks. _Proc. Natl Acad. Sci. USA_ 118, e2101344118 (2021).
18. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. _J. Mach. Learn. Res._ 9, 2579–2605 (2008).
19. McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection. _J. Open Source Software_ 3, 861 (2018).
20. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. _J. Stat. Mech._ 2008, P10008 (2008).
21. Preissl, S. et al. Single-nucleus analysis of accessible chromatin in developing mouse forebrain reveals cell-type-specific transcriptional regulation. _Nat. Neurosci._ 21, 432–439 (2018).
22. Chen, X., Miragaia, R. J., Natarajan, K. N. & Teichmann, S. A. A rapid and robust method for single cell chromatin accessibility profiling. _Nat. Commun._ 9, 5345 (2018).
23. Buenrostro, J. D. et al. Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation. _Cell_ 173, 1535–1548 (2018).
24. Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. _Nat. Methods_ 14, 975–978 (2017).
25. Mathelier, A. et al. JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. _Nucleic Acids Res._ 44, D110–115 (2016).
26. Shaltouki, A., Peng, J., Liu, Q., Rao, M. S. & Zeng, X. Efficient generation of astrocytes from human pluripotent stem cells in defined conditions. _Stem Cells_ 31, 941–952 (2013).
27. Bayam, E. et al. Genome-wide target analysis of NEUROD2 provides new insights into regulation of cortical projection neuron migration and differentiation. _BMC Genomics_ 16, 681 (2015).
28. Owa, T. et al. Meis1 coordinates cerebellar granule cell development by regulating Pax6 transcription, BMP signaling and Atoh1 degradation. _J. Neurosci._ 38, 1277–1294 (2018).
29. Hallonet, M., Hollemann, T., Pieler, T. & Gruss, P. _Vax1_, a novel homeobox-containing gene, directs development of the basal forebrain and visual system. _Genes Dev._ 13, 3106–3114 (1999).
30. Cesari, F. et al. Mice deficient for the Ets transcription factor Elk-1 show normal immune responses and mildly impaired neuronal gene activation. _Mol. Cell. Biol._ 24, 294–305 (2004).
31. Stolt, C. C. et al. The Sox9 transcription factor determines glial fate choice in the developing spinal cord. _Genes Dev._ 17, 1677–1689 (2003).
32. Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. _BMC Genomics_ 19, 477 (2018).
33. Iwasaki, H. & Akashi, K. Myeloid lineage commitment from the hematopoietic stem cell. _Immunity_ 26, 726–740 (2007).
34. Gilmour, J. et al. A crucial role for the ubiquitously expressed transcription factor Sp1 at early stages of hematopoietic specification. _Development_ 141, 2391–2401 (2014).
35. Anderson, K. C. et al. Expression of human B cell-associated antigens on leukemias and lymphomas: a model of human B cell differentiation. _Blood_ 63, 1424–1433 (1984).
36. Villani, A.-C. et al. Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors. _Science_ 356, eaah4573 (2017).
37. Argelaguet, R. et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. _Genome Biol._ 21, 111 (2020).
38. Jin, S., Zhang, L. & Nie, Q. scAI: an unsupervised approach for the integrative analysis of parallel single-cell transcriptomic and epigenomic profiles. _Genome Biol._ 21, 25 (2020).
39. Stuart, T. et al. Comprehensive integration of single-cell data. _Cell_ 177, 1888–1902.e1821 (2019).
40. Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. _Nat. Methods_ 16, 1289–1296 (2019).
41. Teller, V. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. _Comput. Linguist._ 26, 638–641 (2000).
42. Chowdhury, G. G. _Introduction to Modern Information Retrieval_ (Facet, 2010).
43. Halko, N., Martinsson, P.-G. & Tropp, J. A. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. _SIAM Rev._ 53, 217–288 (2011).
44. Pedregosa, F. et al. Scikit-learn: machine learning in Python. _J. Mach. Learn. Res._ 12, 2825–2830 (2011).
45. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V. & Courville, A. C. Improved training of Wasserstein GANs. In _Proceedings of Advances in Neural Information Processing Systems_ 5767–5777 (NIPS, 2017).
46. Yi, Z., Zhang, H., Tan, P. & Gong, M. DualGAN: unsupervised dual learning for image-to-image translation. In _Proceedings of the IEEE International Conference on Computer Vision_ 2849–2857 (ICCV, 2017).
47. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In _Proceedings of International Conference on Learning Representations_ (ICLR, 2014).
48. Mukherjee, S., Asnani, H., Lin, E. & Kannan, S. In _Proceedings of the AAAI Conference on Artificial Intelligence_ Vol. 33, 4610–4617 (AAAI, 2019).
49. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In _Proceedings of the 32nd International Conference on Machine Learning_ 448–456 (ICML, 2015).
50. Strehl, A. & Ghosh, J. Cluster ensembles—a knowledge reuse framework for combining multiple partitions. _J. Mach. Learn. Res._ 3, 583–617 (2002).
51. Hubert, L. & Arabie, P. Comparing partitions. _J. Classification_ 2, 193–218 (1985).
52. Rosenberg, A. & Hirschberg, J. V-measure: a conditional entropy-based external cluster evaluation measure. In _Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning_ 410–420 (EMNLP-CoNLL, 2007).
53. Rand, W. M. Objective criteria for the evaluation of clustering methods. _J. Am. Stat. Assoc._ 66, 846–850 (1971).
54. Tibshirani, R., Walther, G. & Hastie, T. Estimating the number of clusters in a data set via the gap statistic. _J. R. Stat. Soc. B_ 63, 411–423 (2001).
55. Mann, H. B. & Whitney, D. R. On a test of whether one of two random variables is stochastically larger than the other. _Ann. Math. Stat._ 18, 50–60 (1947).
56. Liu, Q. et al. scDEC: data for simultaneous deep generative modeling and clustering of single cell genomic data. _Zenodo_ https://doi.org/10.5281/zenodo.3984189 (2020).
57. Abadi, M. et al. Tensorflow: a system for large-scale machine learning. In _Proceedings of 12th USENIX Symposium on Operating Systems Design and Implementation_ 265–283 (OSDI, 2016).
58. Liu, Q. et al. scDEC: code for simultaneous deep generative modeling and clustering of single cell genomic data. _Zenodo_ https://doi.org/10.5281/zenodo.4560834 (2021).
59. Liu, Q. et al. scDEC: simultaneous deep generative modeling and clustering of single cell genomic data. _CodeOcean_ https://doi.org/10.24433/CO.3347162.v1 (2020).

ACKNOWLEDGEMENTS This work was supported by NIH grants R01 HG010359 (W.H.W.) and P50 HG007735 (W.H.W.). This work was also
supported by the National Key Research and Development Program of China no. 2018YFC0910404 (R.J.) and the National Natural Science Foundation of China nos 61873141 (R.J.), 61721003 (R.J.) and 61573207 (R.J.).

AUTHOR INFORMATION

AUTHORS AND AFFILIATIONS
* Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing, China: Qiao Liu, Shengquan Chen & Rui Jiang
* Department of Statistics, Stanford University, Stanford, CA, USA: Qiao Liu & Wing Hung Wong
* Department of Biomedical Data Science, Bio-X Program, Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA: Wing Hung Wong

CONTRIBUTIONS W.H.W., R.J. and Q.L. conceived the study. Q.L. designed and implemented scDEC. Q.L.,
S.C. and W.H.W. performed the data analysis. Q.L. and W.H.W. interpreted the results. Q.L., R.J. and W.H.W. wrote the manuscript.

CORRESPONDING AUTHORS Correspondence to Rui Jiang or Wing Hung Wong.

ETHICS DECLARATIONS

COMPETING INTERESTS The authors declare no competing interests.

ADDITIONAL INFORMATION

PEER REVIEW INFORMATION _Nature Machine Intelligence_ thanks the anonymous reviewers for their contribution to the peer review of this work.

PUBLISHER’S NOTE Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

SUPPLEMENTARY INFORMATION Supplementary Figs. 1–18 and Tables 1–6.


ABOUT THIS ARTICLE

CITE THIS ARTICLE Liu, Q., Chen, S., Jiang, R. _et al._ Simultaneous deep generative modelling and clustering of single-cell genomic data. _Nat Mach Intell_ 3, 536–544 (2021). https://doi.org/10.1038/s42256-021-00333-y

* Received: 14 August 2020
* Accepted: 14 March 2021
* Published: 10 May 2021
* Issue Date: June 2021
* DOI: https://doi.org/10.1038/s42256-021-00333-y