Nanopore-based single molecule sequencing of the D4Z4 array responsible for facioscapulohumeral muscular dystrophy


ABSTRACT

Subtelomeric macrosatellite repeats are difficult to sequence using conventional sequencing methods owing to the high similarity among repeat units and high GC content. Sequencing


these repetitive regions is challenging, even with recent improvements in sequencing technologies. Among these repeats, a haplotype carrying a particular sequence and shortening of the D4Z4


array on human chromosome 4q35 causes one of the most prevalent forms of muscular dystrophy with autosomal-dominant inheritance, facioscapulohumeral muscular dystrophy (FSHD). Here, we


applied a nanopore-based ultra-long read sequencer to sequence a BAC clone containing 13 D4Z4 repeats and flanking regions. We successfully obtained the whole D4Z4 repeat sequence, including


the pathogenic gene _DUX4_ in the last D4Z4 repeat. The estimated sequence accuracy of the total repeat region was 99.8% based on a comparison with the reference sequence. Errors were


typically observed between purine or between pyrimidine bases. Further, we analyzed the D4Z4 sequence from publicly available ultra-long whole human genome sequencing data obtained by


nanopore sequencing. This technology may become a new tool for studying D4Z4 repeats and the pathomechanism of FSHD and has the potential to widen our understanding of subtelomeric


regions.

INTRODUCTION

Facioscapulohumeral muscular dystrophy (FSHD) is one of the most prevalent adult-onset muscular


dystrophies. The genomes of most patients with FSHD share a common feature: a contracted subtelomeric macrosatellite repeat array called D4Z4 on chromosome 4q35. The D4Z4 array consists
of highly similar 3.3-kb repeat units. Normally, the D4Z4 array is highly methylated and forms heterochromatin. Patients with FSHD have fewer than 11 D4Z4 repeats1,2,3. In Japan, the
majority of patients with FSHD have fewer than 7 repeats4. Shortening of the D4Z4 array causes the de-repression of the flanking genes as well as _DUX4_, located in the last D4Z4 repeat. The


ectopic expression of _DUX4_ is toxic in muscle tissues and is thought to be a causal factor for FSHD5,6,7,8,9. In addition to the repeat number, the haplotype of the last D4Z4 repeat is


important for the development of FSHD1,2. The telomeric flanking region of D4Z4 contains the 3′ UTR of _DUX4_ and is called the pLAM region. The presence of a polyadenylation signal in this


region allows _DUX4_ expression and disease manifestation10. In contrast, individuals without polyadenylation signals do not manifest the disease2. Molecular diagnosis of FSHD is commonly


made by Southern blotting of genomic DNA after restriction enzyme digestion to measure the D4Z4 array length and estimate the number of repeats. Haplotype analysis requires a different


probe1. Sequencing of this D4Z4 array using Sanger sequencing or short-read sequencers (up to 300 bp for Illumina and Ion Torrent) is technically difficult owing to the high similarity and


the high GC content of the repeats. The Oxford Nanopore Technologies MinION (Oxford, UK) is a single-molecule sequencer that can produce long reads exceeding 100 kbp11. Therefore, MinION


sequencing may enable the determination of pathogenicity by sequencing the complete D4Z4 array.

RESULTS

NANOPORE-BASED D4Z4 SEQUENCING USING A BAC CLONE

The D4Z4 array on 4q35 has EcoRI


sites in its flanking region. We took advantage of this restriction enzyme to excise the full-length D4Z4 repeats with flanking sequences, for a total of 49,877 bp. Both sides of the


EcoRI-digested DNA fragment had unique sequences that are not found in the D4Z4 repeats (4,823 bp on the centromeric side and 865 bp on the telomeric side). We used a bacterial artificial
chromosome (BAC) clone containing 13 D4Z4 repeats with flanking regions on chr4 (RP11-242C23) cloned into the pBACe3.6 backbone. RP11-242C23 contained multiple EcoRI sites (Fig. 1a). pBACe3.6
vector-derived DNA was digested, yielding fragments of less than 10 kb (Fig. 1a). We were able to easily separate the D4Z4-containing DNA fragment (49,877 bp) from vector-derived DNA by


agarose gel electrophoresis and gel extraction (Fig. 1b). We extracted the D4Z4 array-containing DNA and subjected it to MinION 1D sequencing (Oxford Nanopore Technologies, Oxford, UK).
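The expected EcoRI fragment sizes can be checked in silico before running the gel. The following is a minimal sketch, not the authors' procedure: it assumes the clone sequence is available as a plain string, uses the known EcoRI recognition site GAATTC (cut after the G), and the function names and the 10-kb cutoff helper are hypothetical illustrations.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (palindromic, so one strand is enough)
CUT_OFFSET = 1         # EcoRI cuts between G and AATTC


def ecori_fragment_lengths(seq, circular=True):
    """Return fragment lengths (bp) from a complete EcoRI digest of seq.

    Sketch only: a site spanning the origin of a circular sequence is not
    detected, and an uncut circle is reported as one fragment of full length.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(ECORI_SITE)
    while pos != -1:
        cuts.append(pos + CUT_OFFSET)
        pos = seq.find(ECORI_SITE, pos + 1)
    if not cuts:
        return [len(seq)]
    if circular:
        # Fragments run from each cut to the next, wrapping around the origin.
        lengths = [b - a for a, b in zip(cuts, cuts[1:])]
        lengths.append(len(seq) - cuts[-1] + cuts[0])
        return lengths
    # Linear molecule: both ends also produce fragments.
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def fragments_above(lengths, min_len=10_000):
    """Keep only fragments longer than min_len bp (hypothetical gel cutoff)."""
    return [n for n in lengths if n > min_len]
```

Applied to a circular BAC sequence, `fragments_above(ecori_fragment_lengths(bac_seq))` would list the fragments expected to run above the 10-kb marker; `bac_seq` is a placeholder for a sequence the reader supplies.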


Base-calling was initially performed using MinKNOW ver. 1.5.12, and fastq conversion with poretools yielded 20,761 reads12. Base-calling was not possible for 87,410 reads
using the real-time MinKNOW basecaller, probably because the computer ran out of memory; for these reads, we used Albacore (v.1.1.0) to obtain the fastq sequences. A total of 108,171 reads were
obtained, with an average read length of 7,577 bp (Supplemental Table 1). We mapped the reads to the reference BAC clone sequence (GenBank accession number CT476828.7) using LAST13.
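Summary statistics such as the average read length quoted here, and the N50 used for the whole-genome datasets, can be computed directly from a fastq file. The sketch below is an illustration under stated assumptions rather than the pipeline used in this study: it assumes strict four-line fastq records, and the function names are hypothetical.

```python
def fastq_lengths(lines):
    """Yield the read length of each record in a 4-line-per-record fastq stream."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence is the second line of every record
            yield len(line.strip())


def mean_length(lengths):
    """Arithmetic mean of the read lengths."""
    lengths = list(lengths)
    return sum(lengths) / len(lengths)


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    lengths = sorted(lengths, reverse=True)
    if not lengths:
        raise ValueError("no reads given")
    total = sum(lengths)
    running = 0
    for n in lengths:
        running += n
        if running * 2 >= total:
            return n
```

With an open file handle `fh`, `lengths = list(fastq_lengths(fh))` followed by `mean_length(lengths)` and `n50(lengths)` gives both figures in one pass over the data.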


Visualization of mapped reads using IGV showed coverage of the whole D4Z4 array (Fig. 2). The longest read mapped to the D4Z4 repeat was 29,060 bp. The consensus sequence had an accuracy of


99.85% using last-genotype (https://github.com/mcfrith/last-genotype). The identity of the consensus sequence was 99.72% when simply employing the most common base. We also used BWA-MEM for


mapping and found that the consensus sequence had a lower accuracy (99.18%). Thus, we decided to use LAST and last-genotype for subsequent analyses. The haplotype of the telomeric flanking


region of the last D4Z4 repeat is important for disease manifestation. There are two equally common haplotypes, A and B (Supplemental Fig. 1). Haplotype A, which includes the pLAM sequence, has
an added polyadenylation signal at the 3′ UTR of the _DUX4_ gene10. This polyadenylation signal allows the ectopic expression of _DUX4_, which is toxic in muscle cells of patients with FSHD
with the contracted D4Z4 array14. Haplotype B lacks a sequence homologous to pLAM. Individuals with haplotype B do not manifest the disease, despite having the contracted D4Z4 allele. Thus, it


is important to identify the pLAM sequence for the molecular diagnosis of FSHD. Using MinION, we successfully sequenced the whole pLAM region with an accuracy of 100% (Fig. 3). In total, 75


bases were different from the reference BAC clone sequence among the whole D4Z4 array sequence of 49,877 bp (0.15%) (Supplemental Fig. 2a). Of these 75 differences, 53 (70.7%) were substitutions
between purines or between pyrimidines (Supplemental Fig. 2b). Most of these errors were repeatedly detected at the same position in the repeats (indicated by asterisks in Supplemental Fig. 2a).
Interestingly, 10 out of 12 recurrent errors were seen in the CCXGG sequence at the X position. We suspect that most of these differences are base-calling errors rather than


random mutations in the BAC clone. We also compared the nanopore-sequenced _DUX4_ open-reading frame (ORF) to the reference and the Sanger sequencing results for the subcloned _DUX4_ ORF.


The accuracy of the _DUX4_ ORF sequence was 99.95% (Supplemental Fig. 3).

D4Z4 DETECTION USING NANOPORE-BASED WHOLE HUMAN GENOME SEQUENCING

We tested whether we could identify the D4Z4 array


from whole genome sequencing data obtained from the MinION sequencer. We used the publicly available human reference standard genome NA12878 with R9.4 chemistry11. This project contains two


sets of data. The rel3 dataset had approximately 26× coverage with an N50 length of 10.6 kb. Rel4 had 5× coverage of ultra-long reads with an N50 of 99.7 kb, indicating that rel4 contained
reads that could span the whole D4Z4 region. Using the ultra-long read dataset, rel4, 8 reads were aligned to the 5′ sequence of the D4Z4 repeat including p13E-11, the centromeric flanking
sequence (5,734 bp), with high confidence (mismap = 1e-10, alignment length > 2,000 bp) using LAST. These reads were extracted and then aligned to the whole human genome (GRCh38). Two reads
(read1–2) were mapped to the chromosome 10 (chr10) D4Z4 region, while two reads (read3–4) were mapped to the chromosome 4 (chr4) D4Z4 region with high confidence (error probability ≤ 1e-5;
Fig. 4a, Supplemental Table 3). Reads that mapped to chr4 were also aligned to the reference sequences for 4qA (CT476828.7), 4qAL (KQ983258.1) and 4qB (AC225782.3), and these 2 reads aligned only
to the last D4Z4 of the 4qB haplotype (Supplemental Fig. 4, Supplemental Fig. 5). To determine the number of D4Z4 repeats, those 4 reads were aligned to a single D4Z4 repeat. Both
reads mapped to chr4 have 17 D4Z4 repeats and those mapped to chr10 have 20 D4Z4 repeats (Fig. 4b).

DISCUSSION

Sequencing a highly repetitive subtelomeric region is extremely challenging.
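The repeat counts reported in the Results rest on aligning each read to a single D4Z4 unit and counting the hits. A minimal sketch of that counting step follows. It is not the authors' code: the input format (unit-reference coordinates of each alignment), the 80% coverage cutoff, and the function names are assumptions made for illustration, while the 3,306-bp unit length and the threshold of 11 repeats are taken from the text.

```python
D4Z4_UNIT_BP = 3306         # unit length given in the Methods for RP11-242C23
CONTRACTION_THRESHOLD = 11  # the text describes FSHD arrays as having fewer than 11 repeats


def count_repeat_units(unit_hits, unit_len=D4Z4_UNIT_BP, min_fraction=0.8):
    """Count alignments of one read that cover most of a single repeat unit.

    unit_hits: iterable of (start, end) coordinates on the single-unit
    reference, 0-based and half-open. min_fraction is a hypothetical cutoff
    that discards partial hits at the edges of the array.
    """
    return sum(1 for start, end in unit_hits
               if (end - start) >= min_fraction * unit_len)


def is_contracted(n_repeats, threshold=CONTRACTION_THRESHOLD):
    """True if the array is shorter than the threshold stated in the text."""
    return n_repeats < threshold
```

A contracted array alone does not establish pathogenicity in this framework, since the text also requires a permissive haplotype carrying the pLAM polyadenylation signal.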


There is variation in the number of repeats among individuals and sometimes within individuals, i.e., somatic mosaicism. It has been reported that subtelomeric regions form heterochromatin,


functioning as an insulator or repressor of nearby genes or preventing telomere shortening15,16. It is important to determine the relationship between phenotypic differences and either


sequence or structural variation in repeats not only to decipher the pathomechanisms of the disease, but also to obtain a deeper understanding of human genomes. Here, we applied a


nanopore-based sequencer to investigate the subtelomeric repeat array associated with FSHD for the first time. In the near future, it will be feasible to search for these difficult regions


to find causal relationships between the human genome and genetic diseases; even given the prevailing use of high-throughput sequencing of coding regions, the genetic causes of many diseases


remain unsolved. The disease locus of FSHD was identified at 4q35 more than 20 years ago; however, the mechanism underlying the disease remained a mystery for years, and the causative genes
were not identified until recently, when accumulating evidence showed that the misexpression of _DUX4_ is associated with the disease. Further, it is still unclear whether there is
any sequence polymorphism in the _DUX4_ gene or flanking regions, as it is difficult to sequence the gene and the _DUX4_ transcript, which is expressed at very low levels even in the


muscle tissues of patients14,17. Since therapeutic approaches including nucleic acid drugs targeting _DUX4_ mRNA are being studied18,19, it may be useful to determine the exact _DUX4_


sequence of patients for the development of effective therapies as well as an integrative diagnostic method. Currently, the number of D4Z4 repeats is usually determined by Southern blotting


using a probe that hybridizes to the centromeric flanking sequence, p13E-11 (ref. 1). If the patient has a deletion at this probe site, it is not possible to detect the D4Z4 repeat by Southern
blotting. The Southern blotting technique is complicated and time-consuming. Alternative methods have been investigated, but are not widely used4,20. The cost-effectiveness of nanopore
sequencers is currently uncertain. If the D4Z4-containing DNA could be enriched, one flow cell (US$900 as of August 2017) should be sufficient to determine the repeat
number and haplotype for many samples. Even a dataset with 1% of the reads from our BAC clone sequencing was enough to obtain a consensus sequence of similar quality (data not shown).
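A 1% downsampling of this kind could be reproduced along the following lines. The source does not state how the subset was drawn, so this is only a sketch under the assumption of simple random sampling, where each read is kept independently with a fixed probability; the function name and the seed parameter are hypothetical.

```python
import random


def subsample_reads(records, fraction=0.01, seed=0):
    """Keep each record independently with probability `fraction`.

    records can be any iterable (read names, fastq records, ...). A fixed
    seed makes the subset reproducible. The number of reads kept varies
    around fraction * N rather than being exactly that value.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < fraction]
```

The subset would then go through the same alignment and consensus steps as the full dataset, and the two consensus sequences could be compared against the reference.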


The MinION sequencer does not require capital investment, as the device is provided by the company at no cost. Unlike Southern blotting, it does not require specially trained personnel. In
addition, it produces the data within 48 hours. Although this remains uncertain, these advantages suggest that nanopore sequencing may be competitive with conventional techniques such as Southern


blotting. Morioka _et al_. sequenced D4Z4 using the PacBio sequencer21 and analyzed random fragments from the BAC clone. The advantage of the nanopore sequencer over the PacBio sequencer is


the ultra-long read capability11. It has the potential to obtain reads of more than 100 kbp, the approximate mean size of the D4Z4 array in healthy individuals. Currently, we could obtain only two
reads that potentially cover all chr4 D4Z4 repeats from human genome data with 5× coverage generated using 14 flow cells11. In that paper, the authors were not able to complete an alignment of the
ultra-long reads using BWA-MEM. However, using the LAST aligner, we could map the ultra-long reads to chr4 and chr10 and could differentiate the D4Z4 repeats from each chromosome. The advantage


of the ultra-long reads is that even a single read can differentiate the highly homologous D4Z4 arrays on chr4 and chr10 because it has enough length to find the preferable alignment using


the unique flanking sequence of the repeats. In addition, we could successfully align nanopore reads to a single D4Z4 repeat using the LAST aligner and last-split22, which reveals the
number of repeats. This individual has 17 D4Z4 repeats with haplotype 4qB on chr4 on at least one allele, which is a normal size observed in non-FSHD individuals. As the NA12878
standard DNA originated from an individual without FSHD, this repeat number is reasonable. As patients with FSHD have fewer than 11 D4Z4 repeats, we think this ultra-long read sequencing may be
usable to detect the disease-causing contracted D4Z4 array. If the data output of the MinION sequencer improves, it will be possible to obtain sequence data with better resolution. This


approach is potentially applicable to subtelomeric regions of other chromosomes or even to centromere sequences. Interestingly, small portions of the 2 reads mapped to chr4 (read3 and read4)


were also mapped to chr10 (Fig. 4a, arrow). We do not know the exact reason for this. One possibility is that GRCh38 does not represent the sequence of chr4 well in this region. The other
possible reason is chromosomal translocation between chr4 and chr10, which is known to be frequent23,24,25. In nanopore-based sequencing, changes in electric current are detected as


nucleotides pass through the pore. We observed that errors tend to occur between purines or between pyrimidines, probably because they have similar chemical structures11. In addition, we
observed that substitution errors tend to occur at the same nucleotide position across repeats (Supplemental Fig. 2, asterisk). This may reflect the fact that the nanopore detects
combinations of nucleotides and that the specific combination CCXGG is prone to being misread. We anticipate further improvements of the base-calling algorithm, which will make MinION more
beneficial for medical applications. Sequencing technologies are under continuous development. During the preparation of this manuscript, the new chemistry R9.5 with the new flow-cell FLO-MIN107
was released. Considering the rapid improvements in this technique, it may not be very long before this sequencing technology is used for D4Z4 repeat analyses for patients with FSHD.
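The error pattern discussed in this section, substitutions falling mostly between purines or between pyrimidines, corresponds to what is conventionally called a transition, as opposed to a transversion. The sketch below shows one way to classify a list of reference/consensus mismatches; it is an illustration with hypothetical function names, not the analysis code behind Supplemental Fig. 2.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical, not a substitution")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def transition_fraction(substitutions):
    """Fraction of (ref, alt) substitution pairs that are transitions."""
    subs = list(substitutions)
    if not subs:
        raise ValueError("no substitutions given")
    return sum(1 for ref, alt in subs if is_transition(ref, alt)) / len(subs)
```

For the figures reported in the Results, 53 such substitutions among 75 differing bases give 53/75, or about 70.7%.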


CONCLUSIONS

Using MinION with an R9.4 flow-cell and 1D sequencing chemistry, we successfully sequenced the complete EcoRI-digested D4Z4 array from a BAC clone that contained the D4Z4 repeat
region of human chromosome 4. Our deep sequencing results had an accuracy of 99.8% for the whole D4Z4 array and flanking region. This includes the pLAM region, with an accuracy of 100%, and
the whole ORF of the pathogenic gene _DUX4_, with an accuracy of 99.95%, which are important regions for determining the pathogenesis. This short report may provide a basis for the future
use of nanopore sequencing to deepen our understanding of highly heterogeneous subtelomeric regions that may contribute to human disease.

MATERIALS AND METHODS

BAC CLONE

The RP11-242C23


human BAC clone was obtained from BAC PAC Resources Center (https://bacpacresources.org). This BAC clone was sequenced and deposited at GenBank under accession number CT476828.7 by the
Wellcome Trust Sanger Institute. It contained 13 D4Z4 repeats of 3,306 bp each.

PREPARATION OF D4Z4 REPEATS FROM THE BAC CLONE

RP11-242C23 was digested using EcoRI and treated with Klenow Fragment
DNA Polymerase (Takara, Shiga, Japan) at 37 °C for 20 min. DNA was subjected to electrophoresis on a 0.5% agarose gel. Bands larger than the 10-kb marker (GeneRuler 1 kb DNA Ladder; Thermo
Fisher Scientific, Waltham, MA, USA) were excised using a razor under ultraviolet light. The DNA fragments larger than 10 kb were subjected to phenol-chloroform DNA preparation. Agarose gels
were soaked in phenol and incubated for 30 min at −80 °C. Then, the aqueous phase was collected and phenol-chloroform DNA preparation was performed. The EcoRI-digested whole D4Z4 repeat was
enriched in the DNA sample.

MINION 1D SEQUENCING

Library preparation was performed using an SQK-LSK108 Sequencing Kit R9.4 version (Oxford Nanopore Technologies, Oxford, UK) using 500 ng of
DNA. MinION sequencing was performed using one FLO-MIN106 (R9.4) flow cell with the MinION Mk1B sequencer (Oxford Nanopore Technologies). Base-calling and fastq conversion were performed
with MinKNOW ver. 1.5.12 followed by poretools or Albacore.

SEQUENCE ALIGNMENT BY LAST AND BWA-MEM

Sequence reads were aligned to the EcoRI-digested D4Z4 repeat reference (Fig. 1b,


Supplemental material) using LAST (version 847) with the commands below13,22,26:

    lastdb -P8 -uNEAR -R01 rdb reference.fasta
    last-train -P8 rdb reads.fasta > train.out
    lastal -P8 -p train.out -m20 -j4 rdb reads.fasta | last-split > out.maf
    last-genotype train.out out.maf -p 1 > genotype.txt

Sequences were also mapped to the reference genome hg19 using BWA-MEM with
default settings27. Consensus sequences were obtained and sequence identity was calculated using UGENE28. Mapped reads were visualized using IGV software29.

SUBCLONING OF THE LAST D4Z4 REPEAT

An _Escherichia coli_ transformant with the RP11-242C23 human BAC clone was cultured in LB medium containing 12.5 µg/ml chloramphenicol at 37 °C. The human BAC clone DNA was purified


using the QIAGEN Plasmid Midi Kit (Hilden, Germany) according to the “User-Developed Protocol (QP01).” Briefly, bacterial lysate from a 100-ml scale culture was passed through a QIAGEN-tip


100 column. The BAC clone DNA was eluted with buffer QF prewarmed to 65 °C and concentrated by isopropanol precipitation. To obtain the DNA clone containing the last D4Z4 repeat with the


pLAM region, 50 ng of the purified BAC clone was used as a template for PCR with the forward primer 5′-cgcgtccgtccgtgaaattcc-3′ and the reverse primer 5′-caggggatattgtgacatatctctgcac-3′. PCR


was performed with PrimeSTAR GXL DNA Polymerase (Takara) with the following cycling conditions: 98 °C for 2 min and 30 cycles of 98 °C for 10 s, 60 °C for 15 s, and 68 °C for 30 min. PCR


products were gel-purified and cloned into a pCR blunt vector (Thermo Fisher Scientific) with the Mighty Mix DNA Ligation Kit (Takara). The sequence of the resulting plasmid was confirmed by


Sanger sequencing with M13 forward and M13 reverse primers.

D4Z4 SEQUENCE ANALYSIS USING THE ULTRA-LONG HUMAN WHOLE GENOME SEQUENCE

The human whole genome sequenced by MinION sequencers was
downloaded (https://github.com/nanopore-wgs-consortium/NA12878)11. Using the ultra-long read dataset, rel4, reads were aligned to the 5′ sequence of the D4Z4 repeat (5,734 bp), and the 8 reads that
aligned with high confidence (mismap = 1e-10, alignment length > 2,000 bp) were extracted and then aligned to the whole human genome (GRCh38) as follows22,26:

    lastdb -P8 -uNEAR -R01 db GRCh38.fasta
    last-train -P8 db reads.fasta > train.out
    lastal -P8 -p train.out -m50 -D1e9 db reads.fasta | last-split -m1e-5 | last-postmask > alns.maf

Extracted reads that mapped to
chr4 were also aligned to the reference sequences for 4qA (CT476828.7), 4qAL (KQ983258.1) and 4qB (AC225782.3) to determine the haplotype. To determine the number of D4Z4 repeats, those 4 reads


were aligned to the single D4Z4 repeat. Alignment was visualized using last-dotplot (http://last.cbrc.jp/doc/last-dotplot.html).

REFERENCES

1. Wijmenga, C. _et al_. Chromosome 4q DNA rearrangements associated with facioscapulohumeral muscular dystrophy. _Nat Genet_ 2, 26–30 (1992).
2. Lemmers, R. J. _et al_. Facioscapulohumeral muscular dystrophy is uniquely associated with one of the two variants of the 4q subtelomere. _Nat Genet_ 32, 235–236 (2002).
3. van Deutekom, J. C. _et al_. FSHD associated DNA rearrangements are due to deletions of integral copies of a 3.2 kb tandemly repeated unit. _Hum Mol Genet_ 2, 2037–2042 (1993).
4. Goto, K., Nishino, I. & Hayashi, Y. K. Rapid and accurate diagnosis of facioscapulohumeral muscular dystrophy. _Neuromuscul Disord_ 16, 256–261 (2006).
5. Mitsuhashi, H., Mitsuhashi, S., Lynn-Jones, T., Kawahara, G. & Kunkel, L. M. Expression of DUX4 in zebrafish development recapitulates facioscapulohumeral muscular dystrophy. _Hum Mol Genet_ 22, 568–577 (2013).
6. Kowaljow, V. _et al_. The DUX4 gene at the FSHD1A locus encodes a pro-apoptotic protein. _Neuromuscul Disord_ 17, 611–623 (2007).
7. Snider, L. _et al_. RNA transcripts, miRNA-sized fragments and proteins produced from D4Z4 units: new candidates for the pathophysiology of facioscapulohumeral dystrophy. _Hum Mol Genet_ 18, 2414–2430 (2009).
8. Wallace, L. M. _et al_. DUX4, a candidate gene for facioscapulohumeral muscular dystrophy, causes p53-dependent myopathy _in vivo_. _Ann Neurol_ 69, 540–552 (2011).
9. Wuebbles, R. D., Long, S. W., Hanel, M. L. & Jones, P. L. Testing the effects of FSHD candidate gene expression in vertebrate muscle development. _Int J Clin Exp Pathol_ 3, 386–400 (2010).
10. Lemmers, R. J. _et al_. A unifying genetic model for facioscapulohumeral muscular dystrophy. _Science_ 329, 1650–1653 (2010).
11. Jain, M. _et al_. Nanopore sequencing and assembly of a human genome with ultra-long reads. _BioRxiv_ (2017).
12. Loman, N. J. & Quinlan, A. R. Poretools: a toolkit for analyzing nanopore sequence data. _Bioinformatics_ 30, 3399–3401 (2014).
13. Kielbasa, S. M., Wan, R., Sato, K., Horton, P. & Frith, M. C. Adaptive seeds tame genomic sequence comparison. _Genome Res_ 21, 487–493 (2011).
14. Snider, L. _et al_. Facioscapulohumeral dystrophy: incomplete suppression of a retrotransposed gene. _PLoS Genet_ 6, e1001181 (2010).
15. Ottaviani, A. _et al_. The D4Z4 macrosatellite repeat acts as a CTCF and A-type lamins-dependent insulator in facio-scapulo-humeral dystrophy. _PLoS Genet_ 5, e1000394 (2009).
16. Stadler, G. _et al_. Telomere position effect regulates DUX4 in human facioscapulohumeral muscular dystrophy. _Nat Struct Mol Biol_ 20, 671–678 (2013).
17. Jones, T. I. _et al_. Facioscapulohumeral muscular dystrophy family studies of DUX4 expression: evidence for disease modifiers and a quantitative model of pathogenesis. _Hum Mol Genet_ 21, 4419–4430 (2012).
18. Ansseau, E. _et al_. Antisense Oligonucleotides Used to Target the DUX4 mRNA as Therapeutic Approaches in FacioScapuloHumeral Muscular Dystrophy (FSHD). _Genes (Basel)_ 8 (2017).
19. Wallace, L. M. _et al_. RNA interference inhibits DUX4-induced muscle toxicity _in vivo_: implications for a targeted FSHD therapy. _Mol Ther_ 20, 1417–1423 (2012).
20. Vasale, J. _et al_. Molecular combing compared to Southern blot for measuring D4Z4 contractions in FSHD. _Neuromuscul Disord_ 25, 945–951 (2015).
21. Morioka, M. S., Kitazume, M., Osaki, K., Wood, J. & Tanaka, Y. Filling in the Gap of Human Chromosome 4: Single Molecule Real Time Sequencing of Macrosatellite Repeats in the Facioscapulohumeral Muscular Dystrophy Locus. _PLoS One_ 11, e0151963 (2016).
22. Frith, M. C. & Kawaguchi, R. Split-alignment of genomes finds orthologies more accurately. _Genome Biol_ 16, 106 (2015).
23. Lemmers, R. J. _et al_. Inter- and intrachromosomal sub-telomeric rearrangements on 4q35: implications for facioscapulohumeral muscular dystrophy (FSHD) aetiology and diagnosis. _Hum Mol Genet_ 7, 1207–1214 (1998).
24. Linardopoulou, E. V. _et al_. Human subtelomeres are hot spots of interchromosomal recombination and segmental duplication. _Nature_ 437, 94–100 (2005).
25. van Deutekom, J. C. _et al_. Evidence for subtelomeric exchange of 3.3 kb tandemly repeated units between chromosomes 4q35 and 10q26: implications for genetic counselling and etiology of FSHD1. _Hum Mol Genet_ 5, 1997–2003 (1996).
26. Hamada, M., Ono, Y., Asai, K. & Frith, M. C. Training alignment parameters for arbitrary sequencers with LAST-TRAIN. _Bioinformatics_ 33, 926–928 (2017).
27. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. _arXiv_ (2013).
28. Okonechnikov, K., Golosova, O., Fursov, M. & UGENE team. Unipro UGENE: a unified bioinformatics toolkit. _Bioinformatics_ 28, 1166–1167 (2012).
29. Robinson, J. T. _et al_. Integrative genomics viewer. _Nat Biotechnol_ 29, 24–26 (2011).

ACKNOWLEDGEMENTS

This study was supported by MEXT-Supported


Program for the Strategic Research Foundation at Private Universities (to SN and MTU). This work was supported by JSPS KAKENHI Grant Numbers JP15K19477 (to HM) and JP26700030 (to MCF).

AUTHOR INFORMATION

AUTHORS AND AFFILIATIONS

* Biomedical Informatics Laboratory, Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa, 259-1193, Japan: Satomi Mitsuhashi, So Nakagawa & Tadashi Imanishi
* Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama, Kanagawa, 236-0004, Japan: Satomi Mitsuhashi
* Micro/Nano Technology Center, Tokai University, Hiratsuka, Kanagawa, 259-1291, Japan: So Nakagawa & Mahoko Takahashi Ueda
* Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, 135-0064, Japan: Martin C. Frith
* Graduate School of Frontier Sciences, University of Tokyo, Chiba, 277-8562, Japan: Martin C. Frith
* Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, 169-8555, Japan: Martin C. Frith
* Department of Applied Biochemistry, School of Engineering, Tokai University, Hiratsuka, Kanagawa, 259-1292, Japan: Hiroaki Mitsuhashi

CONTRIBUTIONS

S.M. and H.M. designed the study and collected experimental materials. S.M., S.N., M.T.U., H.M., M.C.F. and T.I. analyzed and interpreted the


data. S.M. drafted the original manuscript.

CORRESPONDING AUTHOR

Correspondence to Satomi Mitsuhashi.

ETHICS DECLARATIONS

COMPETING INTERESTS

The authors declare that they have no competing interests.

ADDITIONAL INFORMATION

PUBLISHER'S NOTE: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

ELECTRONIC SUPPLEMENTARY MATERIAL

Supplemental Data

RIGHTS AND PERMISSIONS

OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use,


sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative


Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated


otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds


the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

ABOUT THIS ARTICLE

CITE THIS ARTICLE

Mitsuhashi, S., Nakagawa, S., Takahashi Ueda, M. _et al._ Nanopore-based single molecule sequencing of the D4Z4 array responsible for facioscapulohumeral muscular dystrophy. _Sci Rep_ 7, 14789 (2017). https://doi.org/10.1038/s41598-017-13712-6

* Received: 03 July 2017
* Accepted: 25 September 2017
* Published: 01 November 2017
* DOI: https://doi.org/10.1038/s41598-017-13712-6