A robust benchmark for detection of germline large deletions and insertions


ABSTRACT

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and contain 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.
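As a rough illustration of how a benchmark like this is used to count true positives, false positives and false negatives, the sketch below matches query SV calls to benchmark calls. It is a minimal sketch under stated assumptions, not the comparison method used in this paper: the `SV` class, `matches`, `compare` and the thresholds `max_dist=500` and `min_size_ratio=0.7` are hypothetical; calls are assumed to be sequence-resolved, so type and length come from the REF/ALT alleles; and both inputs are assumed to be already restricted to the Tier 1 regions, where any unmatched query call counts as a putative false positive. The `size_bin` helper reproduces the size strata listed for Extended Data Fig. 3.

```python
from dataclasses import dataclass

MIN_SV_LEN = 50  # benchmark calls are >= 50 bp


@dataclass(frozen=True)
class SV:
    """A sequence-resolved SV call (hypothetical minimal record)."""
    chrom: str
    pos: int  # 1-based VCF POS
    ref: str
    alt: str

    @property
    def svtype(self) -> str:
        # A longer ALT allele is treated as an insertion, a longer REF as a deletion.
        return "INS" if len(self.alt) > len(self.ref) else "DEL"

    @property
    def svlen(self) -> int:
        return abs(len(self.alt) - len(self.ref))


def size_bin(length: int) -> str:
    """Size strata as listed for Extended Data Fig. 3."""
    if length < 100:
        return "50-99"
    if length < 300:
        return "100-299"
    if length < 1000:
        return "300-999"
    return ">=1000"


def matches(a: SV, b: SV, max_dist: int = 500, min_size_ratio: float = 0.7) -> bool:
    """Hypothetical rule: same chromosome and type, nearby POS, similar length."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    small, large = sorted((a.svlen, b.svlen))
    return small / large >= min_size_ratio


def compare(benchmark: list, query: list):
    """Greedy one-to-one matching of query calls to benchmark calls.

    Both lists are assumed to be pre-filtered to the Tier 1 regions, so any
    unmatched query call there is a putative false positive.
    """
    bench = [v for v in benchmark if v.svlen >= MIN_SV_LEN]
    calls = [v for v in query if v.svlen >= MIN_SV_LEN]
    matched = set()  # indices of benchmark calls already paired
    tp, fp = 0, []
    for call in calls:
        hit = next(
            (i for i, b in enumerate(bench) if i not in matched and matches(call, b)),
            None,
        )
        if hit is None:
            fp.append(call)
        else:
            matched.add(hit)
            tp += 1
    fn = [b for i, b in enumerate(bench) if i not in matched]
    return tp, fp, fn


if __name__ == "__main__":
    truth = [SV("1", 1000, "A", "A" + "T" * 60), SV("1", 5000, "G" + "C" * 120, "G")]
    calls = [SV("1", 1010, "A", "A" + "T" * 58), SV("2", 200, "C", "C" + "G" * 80)]
    tp, fp, fn = compare(truth, calls)
    print(tp, len(fp), len(fn), size_bin(fn[0].svlen))  # 1 1 1 100-299
```

Greedy first-match assignment keeps the sketch short but is order dependent; dedicated SV comparison tools use more careful matching of breakpoints, sizes, allele sequences and genotypes.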




DATA AVAILABILITY

Raw sequence data were previously published in Scientific Data (https://doi.org/10.1038/sdata.2016.25) and deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive with the accession codes SRX847862 to SRX848317, SRX1388732 to SRX1388743, SRX852933, SRX5527202, SRX5327410 and SRX1033793 to SRX1033798. 10× Genomics Chromium BAM files used are available at ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/10XGenomics_ChromiumGenome_LongRanger2.2_Supernova2.0.1_04122018/. The data used in this paper and other data sets for these genomes are available at ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/ and in the NCBI BioProject PRJNA200694. The v0.6 SV benchmark set for HG002 on GRCh37 is available in dbVar accession nstd175 and at ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/. When using it, compare only to variants in the Tier 1 vcf that lie inside the Tier 1 bed and have the FILTER value ‘PASS’. Input SV callsets, assemblies and other analyses for this trio are available at ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/analysis/.

CODE AVAILABILITY

Scripts for integrating candidate structural variants to form the benchmark set in this paper are available in a GitHub repository at https://github.com/jzook/genome-data-integration/tree/master/StructuralVariants/NISTv0.6. This repository includes Jupyter notebooks for the comparisons to HGSVC, GRC, vg, paragraph and Bionano. Publicly available software used to generate input callsets is described in the Methods.

CHANGE HISTORY

* 22 July 2020: An amendment to this paper has been published.
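The Data Availability note restricts comparisons to PASS variants of the Tier 1 vcf that fall inside the Tier 1 bed. A minimal sketch of that selection follows. It assumes plain-text (uncompressed) VCF and BED lines, non-overlapping BED intervals on each chromosome, and that "inside" means the reference span of the REF allele lies entirely within one BED interval; the function names `load_bed`, `contained` and `tier1_pass_records` are hypothetical, and in practice established VCF/BED tooling would typically be used instead.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into per-chromosome sorted (start, end) intervals.

    BED coordinates are 0-based and half-open.
    """
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        regions[chrom].append((int(start), int(end)))
    for intervals in regions.values():
        intervals.sort()
    return regions


def contained(regions, chrom, start, end):
    """True if [start, end) lies entirely within one interval.

    Assumes the intervals on each chromosome do not overlap.
    """
    intervals = regions.get(chrom, [])
    # Index of the last interval whose start is <= start (-1 if none).
    i = bisect.bisect_right(intervals, (start, float("inf"))) - 1
    return i >= 0 and end <= intervals[i][1]


def tier1_pass_records(vcf_lines, regions):
    """Yield VCF data lines with FILTER == PASS whose REF span is inside the BED."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, filt = fields[0], int(fields[1]), fields[3], fields[6]
        if filt != "PASS":
            continue
        start = pos - 1          # VCF POS is 1-based; BED is 0-based
        end = start + len(ref)   # reference bases spanned by the REF allele
        if contained(regions, chrom, start, end):
            yield line
```

The coordinate conversion is the easy part to get wrong: a VCF record at POS covers the half-open BED-style interval [POS-1, POS-1+len(REF)), so a deletion whose REF allele runs past the end of a Tier 1 interval is excluded even though its POS is inside it.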




ACKNOWLEDGEMENTS

We thank many GIAB Consortium Analysis Team members for helpful discussions about the design of this benchmark. We thank J. Monlong and G. Hickey for sharing genotypes for HG002 from vg and paragraph. We thank T. Hefferon at NIH/NCBI for assistance with the dbVar submission. Certain commercial equipment, instruments or materials are identified to specify adequately experimental conditions or reported results. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the equipment, instruments or materials identified are necessarily the best available for the purpose. C.X. and S.S. were supported by the Intramural Research Program of the National Library of Medicine, National Institutes of Health. N.F.H., J.C.M., S.K. and A.M.P. were supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. J.M.Z. and N.D.O. were supported by the National Institute of Standards and Technology and an interagency agreement with the Food and Drug Administration. C.E.M. acknowledges the XSEDE Supercomputing Resources, STARR I13-0052 and NIH R01AI151059.

AUTHOR INFORMATION


AUTHORS AND AFFILIATIONS

* Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA: Justin M. Zook, Nathan D. Olson & Lesley Chapman
* National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA: Nancy F. Hansen, James C. Mullikin, Sergey Koren & Adam M. Phillippy
* National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA: Chunlin Xiao & Stephen Sherry
* Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA: Paul C. Boutros
* Roche Sequencing Solutions, Belmont, CA, USA: Sayed Mohammad E. Sahraeian
* Ontario Institute for Cancer Research, Toronto, Ontario, Canada: Vincent Huang
* Charles-Bruneau Cancer Centre, Division of Hematology-Oncology, CHU Sainte-Justine, Montreal, Quebec, Canada: Alexandre Rouette
* Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA, USA: Noah Alexander
* Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA: Christopher E. Mason, Iman Hajirasouliha & Camir Ricketts
* The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA: Christopher E. Mason
* The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA: Christopher E. Mason
* The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA: Christopher E. Mason
* Bionano Genomics, Inc., San Diego, CA, USA: Joyce Lee
* Davies Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, SA, Australia: Rick Tearle
* 10× Genomics, Pleasanton, CA, USA: Ian T. Fiddes & Alvaro Martinez Barrio
* Broad Institute of Harvard and MIT, Cambridge, MA, USA: Jeremiah Wala
* Google, Mountain View, CA, USA: Andrew Carroll
* Department of Computer Science, Roy G. Perry College of Engineering, Prairie View A&M University, Prairie View, TX, USA: Noushin Ghaffari
* Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA: Oscar L. Rodriguez & Ali Bashir
* BC Cancer Genome Sciences Centre, Vancouver, British Columbia, Canada: Shaun Jackman
* Biomedical Genetics, Department of Medicine, Boston University Medical School, Boston, MA, USA: John J. Farrell
* Pacific Biosciences, Menlo Park, CA, USA: Aaron M. Wenger
* Department of Computer Engineering, Bilkent University, Ankara, Turkey: Can Alkan
* Department of Computer Engineering, Konya Food and Agriculture University, Konya, Turkey: Arda Soylev
* Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA: Michael C. Schatz
* Department of Genetics, Harvard Medical School, Boston, MA, USA: Shilpa Garg & George Church
* Heinrich Heine University, Medical Faculty, Düsseldorf, Germany: Tobias Marschall
* Department of Bioinformatics and Computational Biology, MD Anderson Cancer Center, Houston, TX, USA: Ken Chen
* Department of Computer Science, Rice University, Houston, TX, USA: Xian Fan
* Bioinformatics R&D, Spiral Genetics, Seattle, WA, USA: Adam C. English
* Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA: Jeffrey A. Rosenfeld
* Department of Pathology, Robert Wood Johnson Medical School, New Brunswick, NJ, USA: Jeffrey A. Rosenfeld
* Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA: Weichen Zhou & Ryan E. Mills
* Nabsys 2.0, LLC, Providence, RI, USA: Jay M. Sage, Jennifer R. Davis, Michael D. Kaiser, John S. Oliver & Anthony P. Catalano
* Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA: Mark J. P. Chaisson
* Joint Initiative for Metrology in Biology, SLAC National Accelerator Lab, Stanford University, Stanford, CA, USA: Noah Spies & Marc Salit
* Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA: Fritz J. Sedlazeck




CONTRIBUTIONS

J.M.Z. contributed project design, manuscript writing, generating SV input callsets and integrating SV calls. N.D.O. contributed SV integration and figures. L.M.C. contributed benchmark evaluation. N.F.H. contributed SV callsets, benchmark evaluation, SV integration and manuscript editing. J.C.M. contributed SV callsets and SV integration. C.X. contributed data management, SV callsets, benchmark evaluation and manuscript editing. S.S. contributed data management and SV callsets. S.K. contributed de novo assemblies. A.M.P. contributed de novo assemblies. P.C.B. contributed manuscript writing, SV callsets and benchmark evaluation. S.M.E.S. contributed SV input callsets, benchmark evaluation and manuscript editing. V.H. contributed SV callsets and benchmark evaluation. A.R. contributed SV callsets and benchmark evaluation. N.A. contributed benchmark evaluation. C.E.M. contributed project design, manuscript editing and benchmark evaluation. I.H. contributed project design, manuscript editing and SV callsets. C.R. contributed SV callsets. J.L. contributed SV callsets and benchmark evaluation. R.T. contributed provision and interpretation of Complete Genomics data and formats. I.T.F. contributed SV callsets, benchmark evaluation and de novo assemblies. A.M.B. contributed SV callsets, benchmark evaluation and de novo assemblies. J.W. contributed SV callsets. A.C. contributed SV callsets and benchmark evaluation. N.G. contributed genome assembly of the Ashkenazi trio (DISCOVAR de novo) and manuscript editing. O.L.R. contributed SV callsets and de novo assemblies. A.B. contributed SV callsets and de novo assemblies. S.J. contributed de novo assemblies. J.J.F. contributed SV callsets. A.M.W. contributed SV callsets and benchmark evaluation. C.A. contributed SV callsets. A.S. contributed SV callsets. M.C.S. contributed project design and manuscript editing. S.G. contributed integrative phasing of short variant calls. G.C. contributed integrative phasing of short variant calls. T.M. contributed haplotype phasing. K.C. contributed SV callsets. X.F. contributed SV callsets. A.C.E. contributed SV callsets, benchmark evaluation and SV integration. J.A.R. contributed SV callsets and project design. W.Z. contributed SV callsets. R.E.M. contributed SV callsets. J.M.S. contributed data collection, SV callsets and benchmark evaluation. J.R.D. contributed data collection, SV callsets and benchmark evaluation. M.D.K. contributed SV callsets, benchmark evaluation and SV-Verify development. J.S.O. contributed SV callsets and benchmark evaluation. A.P.C. contributed data collection. N.S. contributed SV integration (svviz2 development). M.J.P.C. contributed SV callsets. F.J.S. contributed SV callsets, manuscript editing and SV integration. M.S. contributed project design and manuscript writing.

CORRESPONDING AUTHOR

Correspondence to Justin M. Zook.

ETHICS DECLARATIONS

COMPETING INTERESTS

A.M.W. is an employee and shareholder of Pacific Biosciences. A.M.B. and I.T.F. are employees and shareholders of 10× Genomics. G.M.C. is the founder of, and holds leadership positions in, many companies described at http://arep.med.harvard.edu/gmc/tech.html. F.J.S. has received sponsored travel from Oxford Nanopore and Pacific Biosciences and received a 2018 sequencing grant from Pacific Biosciences. J.L. is an employee and shareholder of Bionano Genomics. A.C. is an employee of Google and a former employee of DNAnexus. J.M.S., J.R.D., M.D.K., J.S.O. and A.P.C. are employees of Nabsys 2.0. A.C.E. is an employee and shareholder of Spiral Genetics. S.M.E.S. is an employee of Roche.

ADDITIONAL INFORMATION

PUBLISHER’S NOTE

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

EXTENDED DATA

Extended Data Fig. 1. Number of long reads supporting the SV allele vs. the reference allele in the benchmark set. Variants are colored by heterozygous (blue) and homozygous (dark orange) genotype, and are stratified into deletions and insertions, and into SVs overlapping and not overlapping tandem repeats longer than 100 bp in the reference.

Extended Data Fig. 2. Mendelian contingency table for sites with consensus genotypes from svviz in the son, father and mother.


SVs in boxes highlighted in red violate the expected Mendelian inheritance pattern. Variants on chromosomes X and Y are excluded.

Extended Data Fig. 3. Comparison of false negative rates for the union of all long read-based SV discovery methods, the union of all short read-based discovery methods, and paired-end and mate-pair short read genotyping of known SVs. Variants are stratified into deletions (top) and insertions (bottom), and into SVs overlapping (right) and not overlapping (left) tandem repeats longer than 100 bp in the reference. SVs are also stratified by size into 50 bp to 99 bp, 100 bp to 299 bp, 300 bp to 999 bp, and ≥1000 bp.

Extended Data Fig. 4. Known limitations of the v0.6 benchmark. It is important to understand the limitations of any benchmark, such as the limitations below for v0.6, when interpreting the resulting performance metrics.

SUPPLEMENTARY INFORMATION

* Supplementary Information: Supplementary Notes 1–4.
* Reporting Summary
* Supplementary Table 1: Variant callsets used to develop the benchmark (‘discovery’) and to evaluate the benchmark’s reliability in identifying false positives and false negatives (‘evaluation’).
* Supplementary Table 2: Detailed results from manual curation of putative false positives and false negatives from evaluation of the benchmark set, and of deletions not in v0.6 that were in the population-based gnomAD-SV v2.1 callset, were homozygous reference in less than 5% of individuals of European ancestry, and were carried by at least 1,000 Europeans.

ABOUT THIS ARTICLE

CITE THIS ARTICLE

Zook, J.M., Hansen, N.F., Olson, N.D. _et al._ A robust benchmark for detection of germline large deletions and insertions. _Nat Biotechnol_ 38, 1347–1355 (2020). https://doi.org/10.1038/s41587-020-0538-8

* Received: 16 July 2019
* Accepted: 28 April 2020
* Published: 15 June 2020
* Issue Date: November 2020
* DOI: https://doi.org/10.1038/s41587-020-0538-8