Analysis and benchmarking of small and large genomic variants across tandem repeats


ABSTRACT
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies. We curated variants from the Genome in a Bottle (GIAB) HG002 individual to create a TR dataset to benchmark existing and future TR analysis methods. We also present an improved variant comparison method that handles variants greater than 4 bp in length and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 ‘truth-set’ TR benchmark. We demonstrate the utility of this pipeline across short-read and long-read technologies.
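The abstract's point about varying allelic representation can be made concrete with a toy example. The Python sketch below is not Truvari's algorithm. Under simplified assumptions (0-based positions, non-overlapping variants, a made-up CAG repeat region), it only shows that differently placed or split calls for the same 6 bp expansion reconstruct the same haplotype, which is why a purely position-based comparison can undercount matches inside TRs. All names (`Variant`, `apply_variants`, `same_haplotype`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Simplified variant record: 0-based position plus REF and ALT alleles."""
    pos: int
    ref: str
    alt: str


def apply_variants(reference: str, variants: list) -> str:
    """Return the haplotype produced by applying non-overlapping variants."""
    pieces = []
    cursor = 0
    for var in sorted(variants, key=lambda v: v.pos):
        if var.pos < cursor:
            raise ValueError("variants overlap")
        if reference[var.pos:var.pos + len(var.ref)] != var.ref:
            raise ValueError(f"REF allele mismatch at position {var.pos}")
        pieces.append(reference[cursor:var.pos])  # unchanged sequence
        pieces.append(var.alt)                    # substituted allele
        cursor = var.pos + len(var.ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


def same_haplotype(reference: str, calls_a: list, calls_b: list) -> bool:
    """Two call sets agree if they reconstruct the same sequence."""
    return apply_variants(reference, calls_a) == apply_variants(reference, calls_b)


# Toy region (an assumption for illustration): flank + (CAG)x4 + flank.
REFERENCE = "TT" + "CAG" * 4 + "GA"

# Three ways to describe the same expansion by two CAG units.
left_aligned = [Variant(pos=1, ref="T", alt="TCAGCAG")]
right_aligned = [Variant(pos=13, ref="G", alt="GCAGCAG")]
split_calls = [
    Variant(pos=4, ref="G", alt="GCAG"),
    Variant(pos=10, ref="G", alt="GCAG"),
]

if __name__ == "__main__":
    print(same_haplotype(REFERENCE, left_aligned, right_aligned))
    print(same_haplotype(REFERENCE, left_aligned, split_calls))
```

Comparing reconstructed sequences rather than positions is only one generic way to reconcile representations; real tools also have to handle genotypes, phasing and partial sequence similarity, none of which this sketch attempts.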




DATA AVAILABILITY
The TR catalog (version 1.2) can be found at https://zenodo.org/records/8387564 (ref. 74). Supplementary Table 4 holds the paths to the input assemblies used to create the pVCF. The pVCF can be found at https://zenodo.org/records/6975244 (ref. 76). The TandemRepeat benchmark is hosted at https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/TandemRepeats_v1.0 (ref. 83). Comparison VCFs from TR callers HipSTR, GangSTR, Medaka and TRGT and whole-genome VCFs from DeepVariant, BioGraph and Sniffles are available at https://zenodo.org/records/10724503 (ref. 84).

CODE AVAILABILITY
All code created for this project is available under an open-source license. Analysis scripts for this project are hosted at https://github.com/ACEnglish/adotto/ (ref. 85). Truvari can be found at https://github.com/ACEnglish/truvari/ (ref. 86). Laytr can be found at https://github.com/ACEnglish/laytr/ (ref. 87). A lightweight version of the TR catalog creation process is available as a snakemake pipeline at https://github.com/nate-d-olson/adotto-smk (ref. 88). The overlap permutation tool regioners can be downloaded from https://github.com/ACEnglish/regioners (ref. 89).
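The regioners tool listed above is described as an overlap permutation tool. As a rough illustration of that general technique, and not of the regioners implementation or its command-line interface, the Python sketch below counts how many query intervals overlap a target set and compares that count with counts obtained after randomly relocating the query intervals. It assumes a single chromosome, half-open `(start, end)` intervals and uniform random placement. The function names and the example coordinates are hypothetical.

```python
import random


def count_overlaps(query, targets):
    """Count query intervals that overlap at least one target interval.

    Intervals are half-open (start, end) tuples on a single chromosome.
    """
    return sum(
        any(q_start < t_end and t_start < q_end for t_start, t_end in targets)
        for q_start, q_end in query
    )


def shuffle_intervals(intervals, chrom_length, rng):
    """Move each interval to a uniformly random start, keeping its length."""
    shuffled = []
    for start, end in intervals:
        length = end - start
        new_start = rng.randrange(0, chrom_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def overlap_permutation_test(query, targets, chrom_length, n_perm=1000, seed=0):
    """Return the observed overlap count and an empirical one-sided p-value."""
    rng = random.Random(seed)
    observed = count_overlaps(query, targets)
    null_counts = [
        count_overlaps(shuffle_intervals(query, chrom_length, rng), targets)
        for _ in range(n_perm)
    ]
    # Add-one correction keeps the empirical p-value strictly positive.
    n_extreme = sum(count >= observed for count in null_counts)
    p_value = (n_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Made-up example data: two target regions and three query intervals.
TARGETS = [(100, 200), (5000, 5100)]
QUERY = [(150, 160), (5050, 5060), (120, 130)]

if __name__ == "__main__":
    observed, p_value = overlap_permutation_test(
        QUERY, TARGETS, chrom_length=1_000_000, n_perm=200, seed=1
    )
    print(observed, p_value)
```

Uniform relocation is the simplest null model; a realistic analysis would usually restrict placement to callable regions and work per chromosome, which this sketch leaves out.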


REFERENCES
* Levinson, G. & Gutman, G. A. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. _Mol. Biol. Evol._ 4, 203–221 (1987).
* Fan, H. & Chu, J.-Y. A brief review of short tandem repeat mutation. _Genom. Proteom. Bioinform._ 5, 7–14 (2007).
* Shriver, M. D., Jin, L., Chakraborty, R. & Boerwinkle, E. VNTR allele frequency distributions under the stepwise mutation model: a computer simulation approach. _Genetics_ 134, 983–993 (1993).
* Wright, J. M. Mutation at VNTRs: are minisatellites the evolutionary progeny of microsatellites? _Genome_ 37, 345–347 (1994).
* Willems, T. et al. The landscape of human STR variation. _Genome Res._ 24, 1894–1904 (2014).
* Ren, J., Gu, B. & Chaisson, M. J. P. vamos: variable-number tandem repeats annotation using efficient motif sets. _Genome Biol._ 24, 175 (2023).
* Noyes, M. D. et al. Familial long-read sequencing increases yield of de novo mutations. _Am. J. Hum. Genet._ 109, 631–646 (2022).
* DeJesus-Hernandez, M. et al. Expanded GGGGCC hexanucleotide repeat in noncoding region of _C9ORF72_ causes chromosome 9p-linked FTD and ALS. _Neuron_ 72, 245–256 (2011).
* Depienne, C. & Mandel, J.-L. 30 years of repeat expansion disorders: what have we learned and what are the remaining challenges? _Am. J. Hum. Genet._ 108, 764–785 (2021).
* Mirceta, M., Shum, N., Schmidt, M. H. M. & Pearson, C. E. Fragile sites, chromosomal lesions, tandem repeats, and disease. _Front. Genet._ 13, 985975 (2022).
* Hannan, A. J. Repeat DNA expands our understanding of autism spectrum disorder. _Nature_ 589, 200–202 (2021).
* Hannan, A. J. Tandem repeats mediating genetic plasticity in health and disease. _Nat. Rev. Genet._ 19, 286–298 (2018).
* Stanley, U. et al. Forensic DNA profiling: autosomal short tandem repeat as a prominent marker in crime investigation. _Malays. J. Med. Sci._ 27, 22–35 (2020).
* Hall, C. L. et al. Accurate profiling of forensic autosomal STRs using the Oxford Nanopore Technologies MinION device. _Forensic Sci. Int. Genet._ 56, 102629 (2022).
* Warner, J. P. et al. A general method for the detection of large CAG repeat expansions by fluorescent PCR. _J. Med. Genet._ 33, 1022–1026 (1996).
* Jeffreys, A. J., Wilson, V. & Thein, S. L. Hypervariable ‘minisatellite’ regions in human DNA. _Nature_ 314, 67–73 (1985).
* Dolzhenko, E. et al. ExpansionHunter: a sequence-graph based tool to analyze variation in short tandem repeat regions. _Bioinformatics_ 35, 4754–4756 (2019).
* Willems, T. et al. Genome-wide profiling of heritable and de novo STR variations. _Nat. Methods_ 14, 590–592 (2017).
* Mousavi, N., Shleizer-Burko, S., Yanicky, R. & Gymrek, M. Profiling the genome-wide landscape of tandem repeat expansions. _Nucleic Acids Res._ 47, e90 (2019).
* Dolzhenko, E. et al. Characterization and visualization of tandem repeats at genome scale. _Nat. Biotechnol._ https://doi.org/10.1038/s41587-023-02057-3 (2024).
* Chiu, R., Rajan-Babu, I.-S., Friedman, J. M. & Birol, I. Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences. _Genome Biol._ 22, 224 (2021).
* Nurk, S. et al. The complete sequence of a human genome. _Science_ 376, 44–53 (2022).
* Aganezov, S. et al. A complete reference genome improves analysis of human genetic variation. _Science_ 376, eabl3533 (2022).
* Rhie, A. et al. The complete sequence of a human Y chromosome. _Nature_ 621, 344–354 (2023).
* Olson, N. D. et al. Variant calling and benchmarking in an era of complete human genome sequences. _Nat. Rev. Genet._ 24, 464–483 (2023).
* Majidian, S., Agustinho, D. P., Chin, C.-S., Sedlazeck, F. J. & Mahmoud, M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. _Genome Biol._ 24, 221 (2023).
* Wagner, J. et al. Benchmarking challenging small variants with linked and long reads. _Cell Genom._ 2, 100128 (2022).
* Zook, J. M. et al. A robust benchmark for detection of germline large deletions and insertions. _Nat. Biotechnol._ 38, 1347–1355 (2020).
* Wagner, J. et al. Curated variation benchmarks for challenging medically relevant autosomal genes. _Nat. Biotechnol._ 40, 672–680 (2022).
* English, A. C., Menon, V. K., Gibbs, R. A., Metcalf, G. A. & Sedlazeck, F. J. Truvari: refined structural variant comparison preserves allelic diversity. _Genome Biol._ 23, 271 (2022).
* Yang, J. & Chaisson, M. J. P. TT-Mars: structural variants assessment based on haplotype-resolved assemblies. _Genome Biol._ 23, 110 (2022).
* Audano, P. A. & Beck, C. R. Small polymorphisms are a source of ancestral bias in structural variant breakpoint placement. _Genome Res._ 34, 7–19 (2024).
* Fu, Y., Mahmoud, M., Muraliraman, V. V., Sedlazeck, F. J. & Treangen, T. J. Vulcan: improved long-read mapping and structural variant calling via dual-mode alignment. _GigaScience_ 10, giab063 (2021).
* Gelfand, Y., Rodriguez, A. & Benson, G. TRDB—the Tandem Repeats Database. _Nucleic Acids Res._ 35, D80–D87 (2007).
* Halman, A., Dolzhenko, E. & Oshlack, A. STRipy: a graphical application for enhanced genotyping of pathogenic short tandem repeats in sequencing data. _Hum. Mutat._ 43, 859–868 (2022).
* Kent, W. J. et al. The human genome browser at UCSC. _Genome Res._ 12, 996–1006 (2002).
* Saini, S., Mitra, I., Mousavi, N., Fotsing, S. F. & Gymrek, M. A reference haplotype panel for genome-wide imputation of short tandem repeats. _Nat. Commun._ 9, 4397 (2018).
* Benson, G. Tandem Repeats Finder: a program to analyze DNA sequences. _Nucleic Acids Res._ 27, 573–580 (1999).
* Smit, A., Hubley, R. & Green, P. RepeatMasker. http://www.repeatmasker.org (2013).
* Wlodzimierz, P., Hong, M. & Henderson, I. R. TRASH: tandem repeat annotation and structural hierarchy. _Bioinformatics_ 39, btad308 (2023).
* Novák, P., Neumann, P. & Macas, J. Global analysis of repetitive DNA from unassembled sequence reads using RepeatExplorer2. _Nat. Protoc._ 15, 3745–3776 (2020).
* Delucchi, M., Näf, P., Bliven, S. & Anisimova, M. TRAL 2.0: tandem repeat detection with circular profile hidden Markov models and evolutionary aligner. _Front. Bioinform._ 1, 691865 (2021).
* El-Sawy, M. & Deininger, P. Tandem insertions of Alu elements. _Cytogenet. Genome Res._ 108, 58–62 (2004).
* Moretti, T. R. et al. Population data on the expanded CODIS core STR loci for eleven populations of significance for forensic DNA analyses in the United States. _Forensic Sci. Int. Genet._ 25, 175–181 (2016).
* Collins, R. L. et al. A structural variation reference for medical and population genetics. _Nature_ 581, 444–451 (2020).
* Stevanovski, I. et al. Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing. _Sci. Adv._ 8, eabm5386 (2022).
* Pellerin, D. et al. Deep intronic _FGF14_ GAA repeat expansion in late-onset cerebellar ataxia. _N. Engl. J. Med._ 388, 128–141 (2022).
* Tan, D. et al. CAG repeat expansion in _THAP11_ is associated with a novel spinocerebellar ataxia. _Mov. Disord._ 38, 1282–1293 (2023).
* Mukamel, R. E. et al. Protein-coding repeat polymorphisms strongly shape diverse human phenotypes. _Science_ 373, 1499–1505 (2021).
* Liu, Z. et al. Inconsistent genotyping call at _DYS389_ locus and implications for interpretation. _Int. J. Legal Med._ 132, 1043–1048 (2018).
* White, P. S., Tatum, O. L., Deaven, L. L. & Longmire, J. L. New, male-specific microsatellite markers from the human Y chromosome. _Genomics_ 57, 433–437 (1999).
* Vinces, M. D., Legendre, M., Caldara, M., Hagihara, M. & Verstrepen, K. J. Unstable tandem repeats in promoters confer transcriptional evolvability. _Science_ 324, 1213–1216 (2009).
* Sulovari, A. et al. Human-specific tandem repeat expansion and differential gene expression during primate evolution. _Proc. Natl Acad. Sci. USA_ 116, 23243–23253 (2019).
* Annear, D. J. et al. Abundancy of polymorphic CGG repeats in the human genome suggest a broad involvement in neurological disease. _Sci. Rep._ 11, 2515 (2021).
* Liao, W.-W. et al. A draft human pangenome reference. _Nature_ 617, 312–324 (2023).
* Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. _Science_ 372, eabf7117 (2021).
* Garg, S. et al. Chromosome-scale, haplotype-resolved assembly of human genomes. _Nat. Biotechnol._ 39, 309–312 (2021).
* Li, H. Minimap2: pairwise alignment for nucleotide sequences. _Bioinformatics_ 34, 3094–3100 (2018).
* Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. _Nat. Methods_ 18, 170–175 (2021).
* Jarvis, E. D. et al. Semi-automated assembly of high-quality diploid human reference genomes. _Nature_ 611, 519–531 (2022).
* Dunn, T. & Narayanasamy, S. vcfdist: accurately benchmarking phased small variant calls in human genomes. _Nat. Commun._ 14, 8149 (2023).
* Cleary, J. G. et al. Comparing variant call files for performance benchmarking of next-generation sequencing variant calling pipelines. Preprint at _bioRxiv_ https://doi.org/10.1101/023754 (2015).
* Tan, A., Abecasis, G. R. & Kang, H. M. Unified representation of genetic variants. _Bioinformatics_ 31, 2202–2204 (2015).
* Marco-Sola, S., Moure, J. C., Moreto, M. & Espinosa, A. Fast gap-affine pairwise alignment using the wavefront algorithm. _Bioinformatics_ 37, btaa777 (2020).
* Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. _Nat. Methods_ 15, 461–468 (2018).
* Park, J., Kaufman, E., Valdmanis, P. N. & Bafna, V. TRviz: a Python library for decomposing and visualizing tandem repeat sequences. _Bioinform. Adv._ 3, vbad058 (2023).
* Krause, A. et al. Junctophilin 3 (_JPH3_) expansion mutations causing Huntington disease like 2 (HDL2) are common in South African patients with African ancestry and a Huntington disease phenotype. _Am. J. Med. Genet. B_ 168, 573–585 (2015).
* Wieben, E. D. et al. A common trinucleotide repeat expansion within the transcription factor 4 (_TCF4_, E2-2) gene predicts Fuchs corneal dystrophy. _PLoS ONE_ 7, e49083 (2012).
* Jam, H. Z. et al. A deep population reference panel of tandem repeat variation. _Nat. Commun._ 14, 6711 (2023).
* Bakhtiari, M., Shleizer-Burko, S., Gymrek, M., Bansal, V. & Bafna, V. Targeted genotyping of variable number tandem repeats with adVNTR. _Genome Res._ 28, 1709–1719 (2018).
* Sonay, T. B. et al. Tandem repeat variation in human and great ape populations and its impact on gene expression divergence. _Genome Res._ 25, 1591–1599 (2015).
* Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. _Bioinformatics_ 26, 841–842 (2010).
* Howe, K. L. et al. Ensembl 2021. _Nucleic Acids Res._ 49, D884–D891 (2020).
* English, A. Project Adotto tandem-repeat regions and annotations. _Zenodo_ https://doi.org/10.5281/zenodo.8387564 (2022).
* Danecek, P. et al. Twelve years of SAMtools and BCFtools. _GigaScience_ 10, giab008 (2021).
* English, A. Project Adotto whole-genome variants. _Zenodo_ https://doi.org/10.5281/zenodo.6975244 (2022).
* Li, H. et al. A synthetic-diploid benchmark for accurate variant-calling evaluation. _Nat. Methods_ 15, 595–597 (2018).
* Chin, C.-S. et al. A diploid assembly-based benchmark for variants in the major histocompatibility complex. _Nat. Commun._ 11, 4794 (2020).
* Wootton, J. C. & Federhen, S. Statistics of local complexity in amino acid sequences and sequence databases. _Comput. Chem._ 17, 149–163 (1993).
* Šošić, M. & Šikić, M. Edlib: a C/C++ library for fast, exact sequence alignment using edit distance. _Bioinformatics_ 33, btw753 (2016).
* Bonfield, J. K. et al. HTSlib: C library for reading/writing high-throughput sequencing data. _GigaScience_ 10, giab007 (2021).
* Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. _Mol. Biol. Evol._ 30, 772–780 (2013).
* English, A. et al. GIAB TandemRepeats benchmark v1.0. https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/AshkenazimTrio/HG002_NA24385_son/TandemRepeats_v1.0 (2023).


* English, A. et al. GIAB TR comparison VCFs. _Zenodo_ https://doi.org/10.5281/zenodo.10724503 (2024).
* English, A. et al. Working space for the GIAB TR benchmarking project. _GitHub_ https://github.com/ACEnglish/adotto (2023).
* English, A. Structural variant toolkit for VCFs. _GitHub_ https://github.com/ACEnglish/truvari (2023).
* English, A. et al. Library for variant benchmarking stratification. _GitHub_ https://github.com/ACEnglish/laytr (2023).
* Olson, N. A snakemake based pipeline to build Adotto TR databases. _GitHub_ https://github.com/nate-d-olson/adotto-smk (2023).
* English, A. A rust implementation of regioneR for interval overlap permutation testing. _GitHub_ https://github.com/ACEnglish/regioners (2023).

ACKNOWLEDGEMENTS
We would like to thank the GIAB community for constant support. We thank J. McDaniel for very helpful comments on the paper, M. Wykes and S. Nurk for assistance in processing Medaka results and V. Bafna for contributions to the TR catalog. A.C.E. and F.J.S. were supported by HHSN268201800002I, U01AG058589, 1U01HG011758-01 and 1UG3NS132105-01. H.Z.J. was supported by NIH/NHGRI R01HG010149. M.J.P.C. and B.G. were supported by R01HG011649 and 5U24HG007497, respectively. J.P. was supported in part by HG010149. Certain commercial equipment, instruments or materials are identified to adequately specify the experimental conditions or reported results. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the equipment, instruments or materials identified are necessarily the best available for the purpose.

AUTHOR INFORMATION
Author notes
* These authors contributed equally: Justin M. Zook, Fritz J. Sedlazeck.

AUTHORS AND AFFILIATIONS
* Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA: Adam C. English & Fritz J. Sedlazeck
* Pacific Biosciences of California, Menlo Park, CA, USA: Egor Dolzhenko & Michael A. Eberle
* Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA: Helyaneh Ziaei Jam, Jonghun Park & Melissa Gymrek
* Oxford Nanopore Technologies, Inc., New York, NY, USA: Sean K. McKenzie
* Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA: Nathan D. Olson, Justin Wagner & Justin M. Zook
* Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium: Wouter De Coster
* Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium: Wouter De Coster
* Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA: Bida Gu & Mark J. P. Chaisson
* Department of Medicine, University of California, San Diego, La Jolla, CA, USA: Melissa Gymrek
* Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA: Fritz J. Sedlazeck
* Department of Computer Science, Rice University, Houston, TX, USA: Fritz J. Sedlazeck




CONTRIBUTIONS
A.C.E. performed data analysis and software development. E.D., H.Z.J., N.D.O., S.K.M., J.P., B.G., J.W., M.G. and M.J.P.C. contributed to testing and data processing. A.C.E., J.M.Z. and F.J.S. designed the study. A.C.E., E.D., H.Z.J., N.D.O., S.K.M., J.P., W.D.C., M.A.E., B.G., J.W., M.G., M.J.P.C., J.M.Z. and F.J.S. reviewed and edited the paper.

CORRESPONDING AUTHORS
Correspondence to Adam C. English or Fritz J. Sedlazeck.

ETHICS DECLARATIONS
COMPETING INTERESTS
F.J.S. receives research support from Illumina, Genentech, PacBio and ONT. E.D. and M.A.E. are employees and shareholders of PacBio. S.K.M. is an employee and shareholder of ONT. W.D.C. has received free consumables from ONT. The other authors declare no competing interests.

PEER REVIEW
PEER REVIEW INFORMATION
_Nature Biotechnology_ thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

ADDITIONAL INFORMATION
PUBLISHER’S NOTE
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

SUPPLEMENTARY INFORMATION
* Supplementary Information: Supplementary Figs. 1–7, Methods and Tables 1–3, 6 and 8–13.
* Reporting Summary
* Supplementary Tables: Supplementary Tables 4 (assembly sources), 5 (assembly statistics), 7 (replicate tiers) and 14 (pathogenic and phenotypic TRs).
* Supplementary Material 1: Laytr HTML report for TRGT.
* Supplementary Material 2: Laytr HTML report for Sniffles.

RIGHTS AND PERMISSIONS
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

ABOUT THIS ARTICLE
CITE THIS ARTICLE
English, A.C., Dolzhenko, E., Ziaei Jam, H. _et al._ Analysis and benchmarking of small and large genomic variants across tandem repeats. _Nat Biotechnol_ 43, 431–442 (2025). https://doi.org/10.1038/s41587-024-02225-z
* Received: 30 October 2023
* Accepted: 28 March 2024
* Published: 26 April 2024
* Issue Date: March 2025
* DOI: https://doi.org/10.1038/s41587-024-02225-z