Detection of ribonucleotides embedded in DNA by nanopore sequencing


ABSTRACT


Ribonucleotides represent the most common non-canonical nucleotides found in eukaryotic genomes. The sources of chromosome-embedded ribonucleotides and the mechanisms by which


unrepaired rNMPs trigger genome instability and human pathologies are not fully understood. The available sequencing technologies only allow the genomic location of


rNMPs to be deduced indirectly. Oxford Nanopore Technologies (ONT) sequencing may overcome this limitation by revealing the sites of rNMP incorporation in genomic DNA directly from raw sequencing signals. We synthesized two


types of DNA molecules containing rNMPs at known or random positions and we developed data analysis pipelines for the detection of DNA-embedded ribonucleotides by ONT. We report that ONT can


identify all four ribonucleotides incorporated in DNA by capturing rNMP-specific alterations in nucleotide alignment features, current intensity, and dwell time. We propose that ONT may be


successfully employed to directly map rNMPs in genomic DNA and we suggest a strategy to build an ad hoc basecaller to analyse native genomes.


INTRODUCTION


Due


to the inherent chemical instability of RNA molecules, living organisms usually store their genetic information in DNA. DNA, indeed, lacks the reactive 2′-OH group of the ribose ring, which


can attack the sugar-phosphate backbone, generating breaks with genotoxic outcomes1. To guarantee the proper transmission of genetic information, cells must duplicate their DNA extremely


faithfully, avoiding mutations that can promote genome instability, leading to pathologies like cancer and neurodegenerative diseases2,3. Nevertheless, DNA integrity is constantly challenged


by a variety of exogenous and endogenous sources of damage and replication stress4,5,6. Ribonucleotides represent the most common non-canonical nucleotides found in the eukaryotic


genome7,8,9,10,11. Single rNMP insertions in the genome primarily result from misincorporation by replicative DNA polymerases during the duplication of chromosomal DNA, despite their high-fidelity


rates7,8,10,12. Other cellular processes potentially contributing to the incorporation of ribonucleotides in DNA are Okazaki fragment priming, R-loop formation and reparative DNA


synthesis13. The abundance and, to a certain degree, biased distribution of ribonucleotides in the eukaryotic genome14,15,16 imply a biological relevance in specific cellular contexts.


For example, it was demonstrated that chromosome-embedded rNMPs provide sites of incision to initiate the mismatch repair (MMR) pathway in the leading strand of DNA17,18. Despite these


physiological functions, ribonucleotides must be present only transiently in the genome, and their prompt removal from DNA is fundamental to prevent negative consequences19,20,21. Unrepaired


ribonucleotides, especially in stretches of multiple insertions, can affect the structure of the DNA double helix22,23,24,25 and the assembly of nucleosomes26,27, they can hamper the


progression of the replicative DNA polymerases10,12,28 and they can favour mutagenesis and genomic instabilities29,30,31. To restore the correct DNA:DNA composition, cells are equipped with


ribonucleases H (RNase H1 and H2 in eukaryotes), endonucleolytic enzymes specialised in the removal of ribonucleotides from double-stranded DNA (dsDNA) molecules32. RNase H2 mutations are


associated with Aicardi–Goutières syndrome (AGS)33, a rare autoinflammatory disorder that mainly affects the brain, and patient-derived cells show an accumulation of rNMPs in their


genome34,35,36. RNase H2 dysfunctions have also been linked to some cancers37,38,39,40,41 and to systemic lupus erythematosus (SLE)42. In order to fully unravel the mechanisms responsible


for embedding rNMPs in chromosomal DNA, to define the molecular details of ribonucleotide-induced genome instability and to determine the detrimental impact that DNA-embedded ribonucleotides


have on cells and patients, it is crucial to identify exactly how and where rNMPs are localised in the genome. Different high-throughput sequencing techniques have been developed to map


ribonucleotides at the genomic level with single-nucleotide resolution14,16,43,44,45,46,47. All these methods are based on the generation of breaks in genomic DNA by either enzymatic or


chemical digestion at the DNA-RNA junction. Hence, they share some limitations: they only allow the location of ribonucleotides in the genome to be deduced indirectly, and they cannot


discriminate between a single rNMP and a potentially more harmful stretch of consecutive rNMPs at a certain position13. Additionally, they were applied to


constitutively/permanently RNase H-deficient yeast strains, which accumulate thousands of ribonucleotides in the template DNA. As mentioned above, this can compromise the progression and


fidelity of the replicative DNA polymerases10,12,28, conceivably altering the real sites of rNMP incorporation in a single round of DNA replication. Direct sequencing would be the best


approach to identify the location of single and multiple consecutive rNMPs in genomic DNA. In this context, the sequencing technologies developed by Oxford Nanopore Technologies (ONT) may


provide an appealing solution. Nanopore sequencing (extensively reviewed in48,49,50) is based on the use of engineered nanopores serving as biosensors embedded in a membrane of electrically


resistant synthetic polymers. A voltage bias of about 200 mV is usually clamped across the two sides of the membrane so that ions in an electrolytic solution flow freely through the pores,


generating an open current that is measured over time. A motor protein with helicase activity enables dsDNA or RNA:DNA duplexes to unwind and controls the translocation of single-stranded


DNA or RNA molecules through the nanopore from the negatively to the positively charged side at a translocation speed ranging from 260 to 520 bp/s for DNA strands. Each nucleotide crossing


the sensing region generates a characteristic disruption in the ion flow, detectable as a distinct change in the open current. Nucleotide identity is decoded using specific machine


learning-based algorithms, allowing real-time sequencing of single molecules. The sensitivity and versatility of Nanopore sequencing allow not only the detection of the four canonical bases in


a nucleic acid filament but also the identification of base analogues51,52,53, nucleotide modifications as small as methylation52, and other structures like DNA adducts54,55 or


non-B DNA structures (such as G-quadruplexes)56. These features make Nanopore sequencing an ideal candidate tool for directly identifying and mapping rNMPs embedded in genomic DNA. In this


study, we assessed the feasibility of directly detecting rNMPs embedded in DNA molecules by Nanopore sequencing. We sequenced and analysed two types of rNMPs-containing synthetic DNA


molecules and we developed dedicated bioinformatic tools for their recognition. First, by using rNMPs-containing synthetic DNA primers complementary to the viral M13mp18 circular


single-stranded DNA (ssDNA), we generated linear dsDNA fragments containing single rNMPs at known positions. Second, taking advantage of a _Taq_-I614K DNA polymerase mutant produced in our


laboratory25,57, we obtained, from the same template, dsDNA molecules with randomly incorporated rNMPs. Nanopore libraries were constructed without prior chemical processing of the


ribonucleotides so that rNMPs were in their unmodified native state. We developed in-house data analysis pipelines based on the assessment of nucleotide alignment errors together with alterations


in reference-anchored current intensity values and dwell times, following the approach used for the detection of other nucleotide modifications by


the ONT sequencing platform58,59,60,61,62. Our results show a previously unexplored ability of Nanopore sequencing not only to identify the occurrence of all four ribonucleotides incorporated in DNA


molecules at known positions but also to recognise sites where rNMPs were randomly incorporated in DNA. These findings indicate that Nanopore sequencing may successfully be employed to


directly detect and map rNMPs embedded in native genomic DNA. Moreover, we identified the tools that may be exploited to build a specialised basecaller able to reveal the presence and exact


positions of rNMPs in chromosomal DNA.


RESULTS


CONSTRUCTION AND SEQUENCING OF SYNTHETIC DSDNA SUBSTRATES CONTAINING SINGLE RNMPS AT KNOWN POSITIONS


To determine whether Nanopore sequencing


technologies are suitable for direct detection of ribonucleotides embedded in DNA, we needed special DNA substrates containing the four rNMPs at known positions so that, if a specific


sequencing signal was detected, it could be traced back to the presence of a specific rNMP. To this aim, we designed DNA oligonucleotides containing single rNMPs, complementary to the viral


M13mp18 circular ssDNA, which we exploited as primers and templates, respectively, for in vitro extension reactions (Fig. 1a). In particular, we designed three distinct oligonucleotides


(Ribo1A, Ribo1B and Ribo1C) complementary to the same M13mp18 region, each containing different combinations of 2–4 single rNMPs embedded in different DNA sequence contexts, for a total of


nine single rNMP substitutions (Fig. 1b). These oligonucleotides were used to perform three independent in vitro extension reactions. The integrity of the generated M13mp18 circular dsDNA


was ensured by enzymatic treatment to obtain covalently closed molecules, which were then cleaved with a combination of restriction enzymes to produce linear dsDNA fragments (Fig. 1a). These


synthetic dsDNA substrates, containing rNMPs near the 5′ end of the strand complementary to the original M13mp18 ssDNA template, were ligated to Nanopore adapters (Fig. 1a),


according to the standard Nanopore library construction protocol. The resulting “Ribo1A”, “Ribo1B” and “Ribo1C” libraries were sequenced on R9.4.1 flow cells (Fig. 1a). A “DNA-only” control


library of dsDNA fragments having the same sequence as the rNMP-containing dsDNA fragments, but without rNMPs, was obtained with the same procedure, using an oligonucleotide entirely made


of dNMPs. A total of 1.86E+05 reads for “DNA-only”, 5.34E+05 for “Ribo1A”, 5.75E+06 for “Ribo1B” and 4.81E+06 for “Ribo1C” passed the quality control checks and were


successfully basecalled by the Guppy basecaller.


RIBONUCLEOTIDES EMBEDDED IN DNA AT KNOWN POSITIONS CAN BE IDENTIFIED BY BASECALLING ERRORS AND BY ALTERATIONS IN CURRENT INTENSITY AND DWELL TIME PROFILES


Current intensity signals generated by Nanopore sequencing are recorded in real time with a sampling frequency of 4 kHz and provide information about the 5- to 6-mer sequence


context inside the pocket of the pore at a given time _t_58,61,63,64,65. These signals are stored as time-series data in fast5 files and need to be translated into nucleotide sequences to


allow downstream analysis, such as basic alignment to a reference sequence. This conversion is a very demanding computational problem, as the same nucleotide can be responsible for


completely different current intensity signals depending on the surrounding sequence context. The process of translating raw ONT data into nucleotide sequences is known as “basecalling”, and


it is presently achieved through sequence-to-sequence deep neural network (DNN) algorithms, even though it represents a continuously evolving research field. Since specific nucleotide


modifications in both DNA and RNA alter the raw signal generated by the sequencing machinery in a peculiar manner, they can be detected either by expanding the basecaller vocabulary with


additional nucleotides or by searching for systematic, reproducible and non-random “errors” in basecalling features and nucleotide alignment profiles, probably due to misinterpretations of


the signal deriving from a modified nucleotide that is not included in the basecaller training dataset52,58,63. Based on these assumptions, we initially attempted to recognise DNA-embedded


ribonucleotides from alterations in the nucleotide alignment profiles. As indicated in the workflow summary of Supplementary Fig. 1, the fast5 files generated during sequencing were


converted to FASTQ files and aligned to the M13mp18 ssDNA reference sequence by minimap2. The generated BAM files were split into forward (+) strand sequences, directly mapping to the


M13mp18 ssDNA sequence, and reverse (−) strand sequences, mapping to the sequence complementary to the M13mp18 ssDNA and expected to contain the single rNMP substitutions in the primer


region. Nucleotide sequences were independently retrieved from “DNA-only”, “Ribo1A”, “Ribo1B” and “Ribo1C” runs. For the “DNA-only” run, 83.44% of the passed reads were successfully mapped


against the reference sequence by the aligner, of which 46.77% were mapped on the reverse (−) strand. An average of 91.72% of the passed reads of the three “Ribo” runs were aligned to the


reference sequence, of which an average of 49.96% mapped on the reverse (−) strand. The nucleotide alignment profiles of forward and reverse strands were separately obtained by plotting the


difference in the frequency of detection of A, C, G, T, deletions, and insertions measured at each M13mp18 genomic coordinate for each “Ribo” sample with respect to the “DNA-only” control (Fig. 2).
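
The nucleotide alignment profile described above can be illustrated with a minimal sketch. It assumes that the calls observed at each reference coordinate have already been extracted from the strand-split BAM files into plain Python lists; the function names, the input layout and the toy values are hypothetical and are not taken from the pipeline used in the study.

```python
from collections import Counter

# Categories tallied at each reference coordinate (assumed encoding).
CATEGORIES = ("A", "C", "G", "T", "del", "ins")


def call_frequencies(calls):
    """Return the fraction of each category among the calls at one position."""
    counts = Counter(calls)
    total = sum(counts[c] for c in CATEGORIES)
    if total == 0:
        return {c: 0.0 for c in CATEGORIES}
    return {c: counts[c] / total for c in CATEGORIES}


def alignment_profile_delta(sample, control):
    """Per-position difference (sample minus control) in call frequencies.

    Both arguments map a reference coordinate to the list of calls observed
    at that coordinate on one strand.
    """
    delta = {}
    for pos in sorted(set(sample) & set(control)):
        fs = call_frequencies(sample[pos])
        fc = call_frequencies(control[pos])
        delta[pos] = {c: fs[c] - fc[c] for c in CATEGORIES}
    return delta


# Toy, made-up pileup columns (not data from the study).
control = {4984: ["G"] * 9 + ["A"], 4985: ["A"] * 10}
ribo = {4984: ["G"] * 6 + ["A"] * 3 + ["del"], 4985: ["A"] * 10}

profile = alignment_profile_delta(ribo, control)
for pos, values in profile.items():
    print(pos, {c: round(v, 2) for c, v in values.items()})
```

In this toy example, the drop in the frequency of the expected base at one coordinate is balanced by a rise in mismatches and deletions, which is the kind of pattern the profiles are meant to expose.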


Strikingly, when comparing each “Ribo” sample to the control, numerous reproducible alterations, resulting in a noisier alignment profile, were detected only on the reverse strand at


the regions known to contain ribonucleotides (Fig. 2a–c, bottom graphs). On the other hand, no clear and reproducible alterations were detected on the forward strand (Fig.


2a–c, upper graphs). According to our experimental design, we expected four rNMP substitutions on the reverse strand of the “Ribo1A” sample (rA at position 4985, rG at position 4997, rC at


position 5004 and U at position 5015), three rNMP substitutions on the reverse strand of the “Ribo1B” sample (rG at position 4984, U at position 4998 and rA at position 5016), and two rNMP


substitutions on the reverse strand of the “Ribo1C” sample (rC at position 4996, and rA at position 5007) (Fig. 1b). At all these locations, indeed, we generally observed a decrease in the


frequency of detection of the predicted base, compensated by an increase in the rate of mismatches or indels (Fig. 2a–c, bottom graphs). Interestingly, similar perturbations were observed


not only at the exact coordinates where ribonucleotides were included in the “Ribo” primers but also at some nearby positions (Fig. 2a–c, bottom graphs). This is consistent


with the fact that, as already pointed out, ONT electric signals, and thus basecalling features, depend on a stretch of 5–6 nucleotides passing through nanopore channels at a given time. A


single rNMP is, therefore, presumed to affect the signals generated by the surrounding nucleotides. The alterations due to the presence of rA, except for rA at position 5016, and rG were


generally more evident, while the ones induced by rC and U were less pronounced but still clearly visible (Fig. 2a–c, bottom graphs). In summary, we demonstrated that the presence of all


four rNMPs embedded in DNA caused an increased rate of basecalling errors not only at the ribonucleotide itself but also in the immediately surrounding area. Many


computational tools available for the detection of base modifications by ONT use current intensity-based methods59,60,62,66,67. Indeed, the starting point of every ONT sequencing experiment is


the set of electric signals stored inside the fast5 files generated by the sequencer. When the final goal is to look for dissimilarities between different samples, as in our


case, an effective approach is to directly investigate the current intensity signals. Therefore, in addition to exploiting mistakes in basecalling and nucleotide alignment profiles, we


compared the current intensity profile of each “Ribo” run with that of the “DNA-only” control. A preliminary step for the analysis of current intensity profiles consists of anchoring


the electric events contained in the raw data files to the reference genome. As reported in the workflow summary of Supplementary Fig. 1, the BAM files generated by minimap2 were, thus,


used for “re-squiggling” the electric events against the M13mp18 reference sequence with the f5c eventalign software in order to compare the forward and reverse current distributions derived


from “Ribo1A”, “Ribo1B” and “Ribo1C” samples to the ones derived from the “DNA-only” sample. As in the analysis of nucleotide alignment profiles, numerous


reproducible current intensity alterations were clearly visible only on the reverse strand around the regions containing the 9 single ribonucleotides (Fig. 3a–i, bottom graphs). In this


case, the strongest perturbations consisted of bimodal current distributions at some positions of the “Ribo” samples, not noticeable at the corresponding positions of the


“DNA-only” control (Fig. 3a, b, e, f, h, i, bottom graphs, dotted, red circles). Again, these alterations were not limited to the genomic coordinate where each rNMP was included, but they


were extended to the surrounding nucleotides. As expected, no clear signs of alteration were detected on the forward strand (Fig. 3a–i, upper graphs) when comparing “Ribo1A”, “Ribo1B” and


“Ribo1C” runs to the “DNA-only” control. The presence of all these electric signal changes only on the reverse strand confirmed that the perturbations we observed were caused by the


presence of ribonucleotides in DNA. Another strategy used for direct detection of nucleotide modifications by Nanopore sequencing is based on the analysis of the dwell time68,69, which is


the time a nucleotide spends inside the nanopore channel during sequencing. Dwell times were again extracted with the f5c eventalign software. The dwell time profiles of the forward


and reverse strands were separately obtained by calculating the difference between the mean dwell time value of each “Ribo” run and the mean dwell time value of the “DNA-only” run at each


position (Fig. 4). Analysis of the dwell time profiles of the reverse strand revealed clear alterations related to the presence of all rNMPs embedded in “Ribo1A”, “Ribo1B” and “Ribo1C” samples,


except for rA at position 5016 (Fig. 4, orange lines). By contrast, dwell time profiles of the forward strand showed no clear signs of alteration (Fig. 4, blue lines). Once more, these


perturbations were not restricted to the genomic coordinates corresponding to each rNMP, but spanned several proximal upstream and downstream positions. These findings are in


accordance with the observations previously made by analysing nucleotide alignment profiles and current intensities. Taken together, these results indicate that all four rNMPs embedded in


DNA can be successfully identified by Nanopore sequencing by searching for rNMP-related basecalling errors and for perturbations in current distributions and dwell time profiles.
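
The dwell time profile used in this section (the difference between the mean dwell time of a “Ribo” run and that of the “DNA-only” run at each position) can be sketched as follows. The sketch assumes that reference-anchored events have already been parsed into (position, dwell time) pairs; this input layout, the function names and the numbers are illustrative assumptions and do not reproduce the actual f5c eventalign output format.

```python
from collections import defaultdict
from statistics import mean


def mean_by_position(events):
    """Average a per-event value over all events anchored to each position.

    `events` is an iterable of (reference_position, value) pairs.
    """
    grouped = defaultdict(list)
    for pos, value in events:
        grouped[pos].append(value)
    return {pos: mean(values) for pos, values in grouped.items()}


def dwell_profile(sample_events, control_events):
    """Mean dwell time of the sample minus that of the control, per position."""
    sample_mean = mean_by_position(sample_events)
    control_mean = mean_by_position(control_events)
    shared = sorted(set(sample_mean) & set(control_mean))
    return {pos: sample_mean[pos] - control_mean[pos] for pos in shared}


# Toy, made-up dwell times in seconds (not data from the study).
control_events = [(4984, 0.004), (4984, 0.006), (4985, 0.005)]
ribo_events = [(4984, 0.005), (4984, 0.005), (4985, 0.009), (4985, 0.011)]

dwell_delta = dwell_profile(ribo_events, control_events)
for pos, diff in dwell_delta.items():
    print(pos, round(diff, 4))
```

The same grouping step could be reused for current intensity by passing event-level current values instead of dwell times, although the study compares current distributions rather than means alone.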


RIBONUCLEOTIDES EMBEDDED IN DNA AT KNOWN POSITIONS CAN BE EFFICIENTLY RECOGNISED FROM A MIXTURE OF DNMPS- AND RNMPS-CONTAINING READS IN SILICO


As mentioned in the introduction, single rNMP incorporations are mostly caused by replicative DNA polymerase misinsertions7,8,10,12. Consequently, not every molecule in a genomic DNA sample would contain an rNMP at a given position. We, therefore,


wondered whether the signal of a specific rNMP would still be recognisable in a sample that contained molecules with and without rNMPs at a given location. To investigate this, we performed


an in silico simulated washout assay (Fig. 5), in which we virtually mixed at different ratios a fixed total number of reads mapping on the reverse strand, randomly extracted from the


“DNA-only” control and the “Ribo1B” sample. For each ratio, we evaluated all the previously analysed features around the site of rG incorporation at position 4984. The analysis of nucleotide


alignment profiles of “Ribo1B” compared to “DNA-only” showed that, on the reverse strand, rG:4984 was responsible for very pronounced alterations at its genomic coordinate (Fig. 2b). The


analysis of basecalling features at position 4984, indeed, revealed that when reads were derived exclusively from the “DNA-only” sample (0% of “Ribo1B” reads on total reads), the vast majority of


basecalled events corresponded to G (Fig. 5a). When “Ribo1B” derived reads represented 20% of the total number of reads, the frequency of detection of A and, to a lesser extent, T, deletions


and insertions was increased to a total error frequency higher than 10% (Fig. 5a). The frequency of detection of A, T, deletions and insertions kept rising together with the fraction of


“Ribo1B” derived reads. At 100% of “Ribo1B” derived reads, the total frequency of errors was higher than 40% (Fig. 5a). When looking at the current distributions of “Ribo1B” and “DNA-only”


on the reverse strand, we observed that rG:4984 generated a peculiar bimodal distribution of the current at position T:4980 (Fig. 3e). For this reason, we evaluated the effect of increasing


fractions of “Ribo1B” derived reads on the current intensity at this position. In accordance with what was described above, the effect of rG on the electric signals was detectable when


“Ribo1B” derived reads constituted 20% of the total reads, and it increased with the fraction of “Ribo1B” derived reads (Fig. 5b). As already shown in Fig. 4e, reproducible dwell time


perturbations generated by rG:4984 were observed at coordinate A:4985, when comparing the reverse strands of “Ribo1B” and “DNA-only” runs. The impact of increasing percentages of “Ribo1B”


reads on dwell time at that position was consistent with the previous results: dwell time values went up along with the fraction of “Ribo1B” reads over the total number of reads (Fig. 5c).
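
The in silico washout design can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used in the study: each read is reduced to a hypothetical (source, call) pair for a single position of interest, the function names are invented for the example, and the error rates in the toy pools are made up.

```python
import random


def washout_mix(control_reads, ribo_reads, ribo_fraction, total, seed=0):
    """Draw a fixed total number of reads, a given fraction from the ribo pool."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_ribo = round(total * ribo_fraction)
    mixed = rng.sample(ribo_reads, n_ribo)
    mixed += rng.sample(control_reads, total - n_ribo)
    rng.shuffle(mixed)
    return mixed


def error_frequency(reads, expected_base):
    """Fraction of reads whose call differs from the expected base."""
    return sum(1 for _, call in reads if call != expected_base) / len(reads)


# Toy read pools for one position where the reference base is G.
# Each read is a (source, call) pair; the error rates are made up.
control_reads = [("DNA-only", "G")] * 1000
ribo_reads = [("Ribo1B", "G")] * 600 + [("Ribo1B", "A")] * 400

results = {}
for pct in (0, 20, 40, 60, 80, 100):
    mixed = washout_mix(control_reads, ribo_reads, pct / 100, total=500)
    results[pct] = error_frequency(mixed, "G")
    print(f"{pct:3d}% Ribo1B reads: error frequency {results[pct]:.2f}")
```

Keeping the total number of reads fixed across ratios, as in the study, means that changes in the aggregated features reflect the mixing ratio rather than differences in coverage.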


Interestingly, in silico simulated washout experiments conducted on all the other rNMPs, excluding rA at position 5016, which did not show evident perturbations, led to similar alteration


patterns (Supplementary Fig. 2). In light of these observations, we can conclude that ONT allows the detection of DNA-embedded ribonucleotides even in a sample where, at a given position,


dNMP- and rNMP-containing reads are mixed. These analyses were performed in situations in which features were aggregated onto genomic positions. Strategies tailored to increase the


granularity to a per-read level would substantially lower the minimal percentage of ribonucleotides needed at a certain position for their successful detection, which would be relevant for


their visualisation inside genomic DNA samples.


CONSTRUCTION AND SEQUENCING OF DSDNA SUBSTRATES CONTAINING RANDOMLY INCORPORATED RCMPS


To assess the performance of ONT sequencing on


substrates more similar to genomic DNA samples, we generated dsDNA molecules containing rCMPs at unknown, random positions. We took advantage of a _Taq_-I614K DNA Polymerase mutant25,57 that


was produced in our laboratory and was extensively characterised for its ability to synthesise hundreds-of-bp-long, rCMP-containing DNA molecules in the presence of all four dNTPs and


rCTP25. _Taq_-I614K DNA Polymerase, although designed to incorporate rCTPs, cannot function in the absence of dCTP during the synthesis; the addition of dCTP to the reaction is thus


necessary. Therefore, the incorporation of rCMPs will be random25. Using the M13mp18 DNA as a template, we employed _Taq_-I614K to generate dsDNA fragments of 525 bp in the presence of 400,


600, or 800 μM rCTP. Such synthetic dsDNA substrates containing rCMPs in both strands were subsequently ligated to Nanopore barcodes and adapters (Fig. 1c). The resulting Nanopore “rC 400 


μM”, “rC 600 μM” and “rC 800 μM” libraries, consisting of dsDNA fragments containing a different combination of rCMP incorporations at unknown positions on both strands, were finally


sequenced on R9.4.1 flow cells (Fig. 1c). The same procedure was followed in a PCR reaction in which _Taq_-I614K was not provided with rCTP, to obtain a “DNA-only” control library.


Importantly, the 525 bp fragments containing randomly incorporated rCMPs included the region used to produce the DNA fragments with ribonucleotides at known positions described previously.


In this way, the two types of rNMPs-containing synthetic dsDNA substrates shared the same sequence (Fig. 1a, c, grey boxes), which would allow us to eventually compare them in the following


analyses.


RIBONUCLEOTIDES EMBEDDED IN DNA AT RANDOM POSITIONS INDUCE ALTERATIONS IN NUCLEOTIDE SEQUENCE ALIGNMENT, CURRENT INTENSITY AND DWELL TIME PROFILES


As for the substrates containing


rNMPs at defined positions, we started by evaluating the effect of randomly incorporated rCMPs on nucleotide alignment profiles. Following the pipeline described, the BAM files generated by


minimap2 were split into forward (+) and reverse (−) strand sequences with respect to the M13mp18 reference. Nucleotide sequences were independently retrieved from “DNA-only”, “rC 400 μM”,


“rC 600 μM” and “rC 800 μM” runs, and the nucleotide alignment profiles of forward and reverse strands were separately obtained by plotting the difference in the frequency of detection of A,


C, G, T, deletions, and insertions measured at each genomic coordinate for each “rC” sample with respect to the “DNA-only” control (Supplementary Fig. 3). Two different “DNA-only” technical


replicates (“DNA-only” and “DNA-only2”) were compared to evaluate the basal instrumental bias (Supplementary Fig. 3a), revealing a quite “flat” distribution of the signal, indicative of


consistency between the runs. When comparing “rC 400 μM” (Supplementary Fig. 3b), “rC 600 μM” (Supplementary Fig. 3c), and “rC 800 μM” (Supplementary Fig. 3d) to the internal “DNA-only”


control, instead, both the forward and reverse strands showed much noisier nucleotide alignment profiles across the entire length of the substrate, consistent with the possibility of rCMP


being incorporated at different positions in both strands with variable stoichiometry. To make the interpretation of results easier, basecalling features were plotted in graphs as deltas of


the total error detected (sum of mismatch and indel frequencies) at each position of each “rC” sample with respect to the “DNA-only” control on the forward (Fig. 6a–d, left panels) and


reverse (Fig. 6a–d, right panels) strands separately. In accordance with what is described above, the comparison between the two “DNA-only” replicates exhibited a fairly compact distribution


around zero of the values of delta total error on both strands (Fig. 6a). On the other hand, the values of delta total error for internal “DNA-only” vs “rC 400 μM”, “rC 600 μM” and “rC 800 


μM” on both strands generally oscillated above or below zero (Fig. 6b–d). We subsequently collected all the values obtained from the previous analysis in a box plot in which deltas were


grouped for comparison on each strand (Fig. 6e). This representation of the data allowed us to observe that the mean delta total error on both strands significantly increased together with


the increasing concentration of rC from 400 to 800 μM, used in the PCR step with _Taq_-I614K (Fig. 6e). The average percentage variations on both strands with respect to the internal control were


+164.57%, +150.52% and +223.17% for “rC 400 μM”, “rC 600 μM” and “rC 800 μM”, respectively, with a Pearson correlation coefficient computed between rCMP concentrations and average delta


total error equal to 0.90 (_p_ = 0.002, divided by strand). Ionic current values were retrieved again by using the f5c eventalign software. Current variations were separately evaluated for


forward and reverse strands and represented in scatter plots, where each point corresponds to a genomic coordinate, whose value was calculated as the sum of the absolute values of the delta


between the current distributions of the “rC” sample with respect to the “DNA-only” control (Fig. 7a–d). In line with what was observed for the basecalling features, in the scatter plot deriving


from the comparison between the two “DNA-only” replicates, dots are generally very close to zero for both strands (Fig. 7a), while in the scatter plots where “rC” samples were compared to


the “DNA-only” control, many points were located above zero, reaching values between 0.6 and 0.8 (Fig. 7b–d). Data were also represented in a box plot grouped for comparison and strand,


where, especially for the reverse strand, a positive correlation between rC concentrations and current perturbations is clearly observable (Fig. 7e). The average percentage variations on


both strands with respect to the internal control were +97.66%, +101.03% and +102.04% for “rC 400 μM”, “rC 600 μM” and “rC 800 μM”, respectively, with a high linear correlation between rCMP


concentrations and average current variations with _r_ = 0.82 (_p_ = 0.012, divided by strand). Dwell times were measured with the f5c eventalign software and dwell time variations were


computed similarly to what was described above. Dwell times were represented in line plots, where each position corresponding to a genomic coordinate shows the value of the sum of the


absolute values of the differences between the dwell time distributions of the “rC” sample with respect to the “DNA-only” control (Fig. 8a–d). Also in this case, the analysis of “DNA-only”


vs “DNA-only2” showed signal alterations close to zero in both strands (Fig. 8a), while each “rC” sample vs “DNA-only” showed an increase in the differences in dwell time distributions on


both the forward and reverse strands (Fig. 8b–d). The data collected in a box plot grouped for comparison and strand showed the same behaviour already described for the previous features,


confirming that alterations in the dwell times also increased together with the concentration of rC employed to obtain the substrates (Fig. 8e). The average percentage variations on both


strands with respect to the internal control, in this case, were lower in magnitude, being +11.16%, +14.11% and +28.98% for “rC 400 μM”, “rC 600 μM” and “rC 800 μM”, respectively, with a


significant positive Pearson correlation coefficient equal to 0.87 (_p_ = 0.005, divided by strand).


SYNTHETIC DSDNA SUBSTRATES CONTAINING RCMPS AT KNOWN POSITIONS OR RANDOMLY DISTRIBUTED RCMPS SHOW SIMILARLY PERTURBED PROFILES


The _Taq_-I614K DNA polymerase mutant is reported to incorporate 1 rCTP every 19 dCTPs in the presence of 800 μM rCTP25, which accounts for a


probability of about 5% of having an rCMP inserted at a certain position, a value that is far below the 20% level that we analysed in the in silico washout experiments. Since the analysis of “rC”


samples compared to the “DNA-only” control revealed a statistically significant increase in noise in basecalling features due to perturbations of the raw current signals, which strictly correlated


with the concentration of rCTP used for the step of PCR amplification with _Taq_-I614K (Figs. 6e, 7e and 8e), we tested the possibility of specifically recognising rCMP-related alterations at


a certain position, even in a sample with such low ribonucleotide levels. We started by looking for a strategy to identify anomalous reads. To do that, we verified the existence of outliers


along all the 525 reference positions by selecting all the genomic coordinates with ionic current or dwell time alterations outside a confidence interval of the mean ±2 std. dev. We then


computed a general overall “anomaly” index for both features as the sum of the differences of outlier data points. In this way, we detected strong anomalies in both ionic current scores,


with a fold-change for the computed indexes of +6.27, +7.22 and +5.32 for the “rC” runs with respect to the “DNA-only” run and dwell time scores with a fold-change of +0.94, +1.35 and +0.67


for “rC” samples compared to “DNA-only”. Even if these investigations indicated that the analysed features were non-uniformly affected, the presence of outliers generally appeared to be more


pronounced for “rC” runs than “DNA-only” runs. We thus tested a more sophisticated approach to discern anomalous reads at a per-read level, based on an unsupervised machine learning


algorithm, the Isolation Forest (iForest)70,71. Indeed, the identification of randomly incorporated rCMPs, which can be considered a relatively rare event occurring during the synthetic


activity of _Taq_-I614K, is exactly the type of task that could easily be tackled by using models suited for the detection of anomalies like the iForest, which has been recently exploited to


successfully address similar problems72. For this investigation, we focused our attention on the “rC 800 μM” run, which was most affected, and we selected all the reads that mapped on the


reverse strand and covered a ±5 nt-long region surrounding the M13mp18:4996 site, where rCMP was known to be incorporated in the previously analysed “Ribo1C” sample. In this context, a total


of about 427k chunks of reads were retrieved using a custom in-house Python script, leveraging the Pysam library (see the Methods section for further details). Each read was then encoded as


a 12-element feature vector representing the disposition of matches, mismatches and indels, as described in Fig. 9a. After the standardisation of these encoded chunks of reads, an iForest


object was fitted using a “contamination” parameter equal to 0.05, which is roughly the expected ratio of reads containing at least one rCMP in the region of interest, according to


our knowledge and expectation. We then applied the trained model to classify each read as an outlier (high anomaly score, probably carrying at least one rCMP) or an inlier (a read showing a


pattern shared with the majority of the other reads, with putatively no rCMPs). A wide proportion of the variance in our dataset was due to variations between inlier and outlier reads (Fig. 


9b), suggesting that the presence of a relatively low quantity of rCMPs was sufficient to generate alterations in basecalling features, distinguishable from basal instrumental noise, even


at per-read resolution. To further confirm that the majority of the 21,310 detected outlier reads had at least one rCMP, we analysed basecalling and current features as we


did for the analysis of the “Ribo1C” sample, but stratifying the reads based on the iForest prediction. We found that the patterns of the outlier reads almost recapitulated the ones


observed for the “Ribo1C” run at position 4996 corresponding to rC for all the investigated features (Fig. 9c–e), while the inlier reads appeared to be mainly unaffected. The existence of


some differences between these samples can be explained by the presence of two positions in close proximity to rC:4996 (C:4994 and C:4990), where rCMP might have been incorporated by the


_Taq_-I614K polymerase in some of the outlier reads, producing a more complex mixture of DNA strands. These results indicate that the iForest model trained on alignment errors can


efficiently select reads carrying rNMPs-related signals and they strongly suggest that similar machine learning models tailored for a per-read inference may be the most suitable strategy for


direct identification and mapping of ribonucleotides in genomic DNA by ONT. DISCUSSION More than 10,000 and more than 1,000,000 rNMPs have been estimated to be inserted into yeast and mouse


genomes, respectively, in each cell cycle, making ribonucleotides the most probable source of DNA alteration in eukaryotic cells73. Despite their physiological functions in specific


circumstances16,17, the presence of chromosome-embedded ribonucleotides erroneously left in DNA is generally detrimental, as they affect DNA replication10,12,28 and other processes, leading


to genome instability18,19,20,28,29,30. Different pathologies are linked to mutations in RNase H enzymes33,37,38,39,40,41,42, which are normally responsible for the removal of


ribonucleotides from DNA32. It thus becomes crucial to extensively comprehend the mechanisms leading to rNMPs incorporation in chromosomes and the molecular details of ribonucleotide-induced


genome instability in eukaryotic cells. To this end, high-throughput sequencing techniques have already been developed to try to map ribonucleotides at the genomic level with


single-nucleotide resolution. These methods entail an enzymatic or chemical generation of breaks at DNA–RNA junctions in genomic DNA, allowing only indirect inference of the position of


ribonucleotides in the genome. Moreover, they were applied to RNase H-depleted cells accumulating thousands of rNMPs in their template DNA, which may have altered the real sites of


incorporation during a single round of DNA replication. A tempting solution to these limitations may come from the direct sequencing strategies developed by Oxford Nanopore Technologies.


Nanopore sequencing has already been successfully exploited to recognise a series of modifications both in DNA and RNA. We therefore wondered whether Nanopore sequencing could be exploited


for direct identification of rNMPs embedded in DNA substrates. To evaluate the feasibility of this approach, we generated synthetic DNA molecules containing the four different


ribonucleotides at known positions so that if a specific sequencing signal change was observed, it could be linked to the occurrence of a specific rNMP. Thanks to the optimisation of


in-house ad hoc data analysis pipelines based on existing bioinformatic tools for ONT data collection and manipulation, we pioneered the direct detection of ribonucleotides embedded in DNA


molecules by Oxford Nanopore sequencing. We assessed this either by searching for systematic, consistent, non-random “errors” in nucleotide alignments and basecalling features or by directly


investigating current intensity signals and dwell times. In order to assess Nanopore sequencing performance on samples whose characteristics were more similar to real genomes, we


investigated its ability to recognise rNMPs in heterogeneous samples consisting of reads containing both rNMPs and dNMPs at a certain position. This was done initially by mixing in silico


different percentages of reads containing rNMPs and dNMPs at a known position and then by analysing DNA molecules where rCMPs were randomly incorporated by PCR. Exploiting the iForest


machine learning algorithm, we verified that, as for single rNMP substitutions at fixed positions, randomly incorporated rCMPs also induced alterations in the same features previously


analysed. Bimodal distributions in the current intensity profiles, similar to the ones observed in our samples, were previously demonstrated to be an effective index for the quantification


of modified nucleotides in single reads in direct RNA sequencing datasets67. Even if our experimental set-up did not yet cover all possible k-mers, our findings confirmed that


alterations due to rNMP insertions in DNA can be efficiently detected by ONT, even in a sample with low ribonucleotide levels. Therefore, our data provide a robust proof of concept that


machine learning models customised to get per-read inferences may be the best approach to directly visualise and map ribonucleotides in genomic DNA through the Nanopore system. These results


encourage us to build a machine learning-based model for the detection of embedded rNMPs that will require the generation of an ad-hoc comprehensive training dataset covering all the


possible rNTP incorporation contexts. Similarly to what was done in ref. 58, this approach will allow us to gain a deeper understanding of the relevance of each variable used. Although


deoxyribonucleotides and ribonucleotides only differ in the presence of the OH group on the 2′ carbon of ribose, we showed that Nanopore sequencing can detect such small structural


dissimilarity. Finding a way to increase the difference between the two types of nucleotides would possibly make the identification of rNMPs incorporated in DNA even easier. The existence of


stretches of consecutive rNMPs embedded in genomic DNA is still ambiguous, although a growing body of evidence supports the idea that they may arise from aberrations in Okazaki fragments


priming or joining, R-loops formation or processing, and reparative DNA synthesis13. Unlike single rNMPs, which are relatively tolerated up to a certain level, rNMPs stretches were reported


to be more dangerous for cell viability, aggravating replicative DNA polymerase stalling, DNA double helix distortions, and genomic instability25,28. The identification of multiple genomic


rNMPs with the available technologies has proven to be extremely challenging. The enzymatic or chemical digestion at DNA-RNA junctions, indispensable to the sequencing techniques developed


up to now, makes it impossible to discern the presence of single or multiple rNMPs at given positions in the genome. Furthermore, the most common strategies to study RNA:DNA


hybrids rely on the S9.6 monoclonal antibody or on catalytically inactive RNase H1, which both indistinctly bind any type of hybrid present in the genome (R-loops, DNA replication primers,


hybrids at DSBs and eventually stretches of consecutive rNMPs embedded in DNA)13. ONT might provide not only a solution to directly map rNMPs in chromosomes but also to distinguish sites of


single and multiple ribonucleotide insertions in the genome. Demonstrating the occurrence of stretches of consecutive rNMPs embedded in eukaryotic genomes would help to clarify the


contribution of the two RNase H enzymes to the recognition and processing of the different RNA substrates found in DNA, as well as the unconventional ability to synthesise multiple rNMPs


insertions that certain DNA polymerases, like the Y-family polymerase η, seem to possess at least in some peculiar conditions74,75,76,77. This would ultimately help to shed light on


the molecular details linking ribonucleotides, replication stress, genome instability and severe human pathologies. Naturally, the investigation of multiple ribonucleotides in DNA by


Nanopore sequencing will require extra experimental and bioinformatic strategies. The raw sequencing signals deriving from multiple ribonucleotides might require a greater effort to be


decoded, due to several overlapping alterations coming from each single rNMP in the stretch. In conclusion, our work provides the first evidence that the Oxford Nanopore sequencing platform


can directly distinguish ribonucleotides included in DNA molecules and the proof of concept that Nanopore sequencing may successfully be employed to directly detect and map rNMPs embedded in


genomic DNA, giving further proof of the potentialities of third-generation sequencing platforms. Moreover, the basecalling alterations we detect due to embedded rNMPs may explain at least


some of the basecalling errors that have been reported in ONT genomic sequencing efforts. Our work may thus contribute to the further development of ONT approaches. METHODS


PREPARATION OF SYNTHETIC DSDNA SUBSTRATES CONTAINING RNMPS AT KNOWN POSITIONS Each in vitro extension reaction was carried out with 1 μg of M13mp18 ssDNA (New England Biolabs, catalogue #


N4040) and about 25 nM of complementary oligonucleotide, exploiting the Phusion™ High-Fidelity DNA Polymerase (ThermoFisher Scientific, catalogue # F530) in the presence of 200 μM of each


dNTP in a final volume of 50 μL. Oligos were annealed to the template at 60 °C for 30 s and then extended by the polymerase at 72 °C for 10 min. “Ribo1A” samples were obtained starting from


oligo 5′-Phosphate-AAU GGC TAT TAG TrCT TTA ATrG CGC GAA CTG ATrA GCC CT-3′, “Ribo1B” samples were obtained starting from oligo 5′-Phosphate-ArAT GGC TAT TAG TCT TTA AUG CGC GAA CTG ATA rGCC


CT-3′, “Ribo1C” samples were obtained starting from oligo 5′-Phosphate-AAT GGC TAT TrAG TCT TTA ATG rCGC GAA CTG ATA GCC CT-3′, containing single rNMPs (underlined NMPs) at different


positions, while “DNA-only” samples were obtained starting from oligo 5′-Phosphate-AAT GGC TAT TAG TCT TTA ATG CGC GAA CTG ATA GCC CT-3′ that does not contain rNMPs. All oligos were


complementary to the M13mp18 sequence from position 4980 to 5017. Extension reaction products were then treated with the NEBNext® FFPE DNA Repair Mix (New England Biolabs, catalogue #


M6630), assembling the reaction according to the “Protocol for use with Other User-supplied Library Construction Reagents” reported on the product webpage, adding 3U of T4 DNA Polymerase


(New England Biolabs catalogue # M0203), and incubated at 20 °C for 3 h. The covalently closed circular dsDNA molecules obtained were O/N digested at 25 °C with MscI (New England Biolabs


catalogue # R0534) and SwaI (New England Biolabs catalogue # R0604) restriction enzymes in NEB buffer 3.1. Digestion products were run on a 0.8% agarose gel, and only DNA fragments of 5545 


bp were extracted from the gel and purified with the NucleoSpin Gel and PCR Clean-up kit from Macherey-Nagel, according to the manufacturer’s instructions. PREPARATION OF DSDNA SUBSTRATES


CONTAINING RANDOMLY INCORPORATED RNMPS A fragment of 2026 bp was PCR-amplified from the M13mp18 ssDNA (New England Biolabs, catalogue # N4040) with forward DNA primer 5′-GAA GAA CTC AAA CTA


TCG GC-3′ and reverse DNA primer 5′-GAT ATT AGC GCT CAA TTA CC-3′, by using the Phusion™ High-Fidelity DNA Polymerase (ThermoFisher Scientific, catalogue # F530) according to the manufacturer’s


instructions, and employed as template to PCR-amplify a fragment of 525 bp with forward DNA primer 5′-Phosphate-CCT GAA AGC GTA AGA ATA CG-3′ and reverse DNA primer 5′-Phosphate-GCC ATC ATC


TGA TAA TCA GG-3′, by using the mutant _Taq_-I614K DNA Polymerase (GeneSpin Srl, www.genespin.com). The resulting 525 bp amplicon maps on the M13mp18 reference sequence at position


4533-5057, so it includes the region of the “Ribo” and “DNA-only” oligos. Reactions were carried out as described25 in the presence of 200 μM of each dNTP for “DNA-only” samples, and in the


presence of 200 μM dATP, dGTP, and dTTP, 100 μM dCTP, and 400 μM, 600 μM or 800 μM rCTP for “rC” samples, in a final volume of 50 μL × 48 reactions. Oligos were annealed to the template


at 50 °C for 30 s and then extended by the polymerase at 72 °C for 2 min for 45 cycles. PCR products were pooled together, concentrated by precipitation in absolute ethanol with 3 M sodium


acetate at pH 5.2, run on a 0.8% agarose gel, extracted from the gel, and purified with the NucleoSpin Gel and PCR Clean-up kit from Macherey–Nagel, according to manufacturer’s instructions.


DIRECT DNA LIBRARY PREPARATION AND SEQUENCING 1 μg of “Ribo1A”, “Ribo1B”, “Ribo1C” or “DNA-only” dsDNA fragments obtained as described in “Preparation of synthetic dsDNA substrates


containing rNMPs at known positions” were used for Nanopore library preparation with the SQK-LSK109 ligation sequencing kit, following the protocol “Ligation sequencing gDNA - Version:


GDE_9063_v109_revAP_25May2022”. Each library was loaded on a single R9.4.1 MinION flow cell using the EXP-FLP002 Flow Cell Priming Kit and a MinION Mk1B or GridION device. Two independent


libraries for each of the above samples were sequenced and analysed. In total, 250 ng of “rC 400 μM”, “rC 600 μM”, “rC 800 μM” and “DNA-only” dsDNA fragments, obtained as described in


“Preparation of synthetic dsDNA substrates containing randomly incorporated rNMPs”, were used for preparation of a single barcoded Nanopore library with the EXP-NBD104 native barcoding kit


and the SQK-LSK109 ligation sequencing kit, following the protocol “Ligation sequencing amplicons—native barcoding—Version: NBA_9093_v109_revO_12Nov2019—Last update: 10/03/2023”. The


barcoded library was loaded on two different MinION R9.4.1 flow cells using the EXP-FLP002 Flow Cell Priming Kit and a GridION device. BASECALLING PROCEDURES AND MAPPING Reads were locally


basecalled by Guppy 5.0.1 with GPU acceleration (NVIDIA A100-40GB) using the dna_r9.4.1_450bps_hac.cfg configuration file. The M13mp18 reference sequence was downloaded from the New England


Biolabs Inc. website, and it is available at the url:


https://international.neb.com/-/media/nebus/page-images/tools-and-resources/interactive-tools/dna-sequences-and-maps/text-documents/m13mp18fsa.txt?rev=187bdc8b92314f13ba46d107b5b5553d&hash=6F212E5A79D842E6A911DF43AFAA9C07.


The fastQ files containing reads passing the Guppy quality control check, generated in the basecalling step, were merged into a single fastQ file per run and then mapped to the reference


sequence by minimap2 v.2.22 (ref. 78), using standard presets and setting the -ax flag to the recommended value “map-ont”. Samtools 1.13 was used to sort, index, and filter the produced BAM files,


which were then split into forward (+) and reverse (−) strand-related files using the SAM flag 16 since ribonucleotides were expected only on the reverse strand for the “Ribo1A”, “Ribo1B”


and “Ribo1C” runs with single rNMPs. The same strategy was also used for the analysis of randomly incorporated rNMPs. The subsequent analysis workflow was separated into two branches


conducted for both the strands of each run. The first branch aimed at retrieving and analysing the differences between “DNA-only” and “Ribo1” runs at the level of the basecalling features


and nucleotide alignment profiles, while the second one focused on ionic current intensities aligned back to the reference and on related dwell times. A general overview of the workflow is


schematized in Supplementary Fig. 1. For in silico simulated washout experiments, a fixed total amount of 100k reads mapping on the reverse strand were retrieved from whole BAM files from


both “DNA-only” and “Ribo1A” (or “Ribo1B”, or “Ribo1C”) runs and mixed in different proportions, with an increasing share of reads containing single rNMPs. The “DNA-only”:“Ribo” read proportions used in


these mixed and filtered BAM files were 100:0, 80:20, 60:40, 40:60, 20:80 and 0:100. The generated BAM files were then analysed in depth at the level of all the features of interest. All


data analysis procedures were run on HPC-HTC clusters equipped with up to 40 cores, 256 GB of RAM, and several TB of disk space. ALIGNMENT PROFILES ANALYSIS Thanks to our in-house Python


scripts, the split BAM files on forward (+) and reverse (−) strands were further investigated to retrieve their alignment profiles (basecalling features) using the Pysam software v.0.19.0


(https://github.com/pysam-developers/pysam; refs. 79, 80, 81), which leverages the htslib C API and the pileup engine. For each reference site where a ribonucleotide was expected, an interval


covering the entire primer region (or the whole 525 bp-long PCR product for random rNMPs) was inspected and the frequencies of the aligned A, T, C, G, insertions, and deletions were


retrieved. These alignment profiles from forward and reverse BAM files were analysed separately and, for the sake of clarity, alignment profiles were shown in a simplified version (Fig. 2


and S3), as differences in frequencies for each analysed feature between the two conditions/runs “DNA-only” and “Ribo”. For the in silico simulated washout experiments, mixed BAM files with


increasing ratios of “Ribo” reads were investigated with the same approach, focusing on the sites of interest in search of alterations due to rNMP incorporation. For the evaluation of


alterations in basecalling features due to the random incorporation of rCMPs, alignment profiles were retrieved analogously and summarised as a total error for every genomic position by computing


the sum of frequencies of unexpected aligned bases, deletions, and insertions, related to each run. Also in this case, the difference between the total error frequencies computed on the


“DNA-only” control run and the “rC” run was calculated. All data analysis procedures were run on HPC-HTC clusters equipped with up to 40 cores, 256 GB of RAM and several TB of disk space.
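As an illustration of the total-error summary described above, the following minimal Python sketch computes, for each reference position, the fraction of observations that differ from the expected base, and then the per-position difference between an “rC” run and the “DNA-only” control. It is not the code from the repository: the function names (`total_error`, `error_difference`), the dictionary-based input layout, the choice of normalising by all observations at a position (bases, insertions and deletions together) and the sign convention (“rC” minus control) are assumptions made for this example, and the counts are invented toy values.

```python
# Illustrative sketch only: names, input layout, normalisation and sign
# convention are assumptions, not the published pipeline.

# Alignment features counted at each reference position.
FEATURES = ("A", "C", "G", "T", "ins", "del")


def total_error(counts, ref_base):
    """Return the fraction of observations that are not the expected base.

    counts   -- dict mapping a feature name to the number of reads showing it
    ref_base -- expected (reference) base at this position
    """
    total = sum(counts.get(feature, 0) for feature in FEATURES)
    if total == 0:
        # No coverage at this position: report zero error rather than divide by 0.
        return 0.0
    return (total - counts.get(ref_base, 0)) / total


def error_difference(control, sample, reference):
    """Per-position total-error difference (sample minus control)."""
    if not (len(control) == len(sample) == len(reference)):
        raise ValueError("control, sample and reference must have the same length")
    return [
        total_error(s, base) - total_error(c, base)
        for c, s, base in zip(control, sample, reference)
    ]


# Toy data for a two-position interval (values invented for the example).
reference = "CG"
control = [{"C": 98, "T": 1, "del": 1}, {"G": 100}]
sample = [{"C": 90, "T": 4, "ins": 1, "del": 5}, {"G": 99, "A": 1}]

for pos, diff in enumerate(error_difference(control, sample, reference)):
    print(f"position {pos}: total-error difference = {diff:+.3f}")
```

In a real analysis the per-position counts would be collected from the strand-specific BAM files, for instance through a pileup; the toy dictionaries above only stand in for that step.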


All in-house scripts used are publicly available at https://github.com/F0nz0/nanopore-ribos. CURRENT PROFILES ANALYSIS For the second workflow branch, related to ionic current intensities


and dwell times analysis, the f5c software v.0.7 (refs. 59, 82) was exploited. The f5c software uses raw signals stored in fast5 files, together with BAM files, a reference genome sequence, and


basecalled reads inside fastQ files, to detect events occurring in raw signals related to nucleotide movements inside the pore and to align these back to the reference. Since


outputs were generally very large, the extraction of events was limited to the same intervals analysed for the basecalling features via the setting of the -w flag of f5c. This allowed us to


retrieve information about raw signal levels and their mapping positions with respect to each reference coordinate for a given interval. In-house Python scripts were written to pre-process


these f5c events tables. In particular, following the indexing procedure required on fast5 files, the first step was to split the f5c table into two tables of events mapping either on the


reverse or on the forward strand. Then, unrecognised events were filtered out and all the remaining events mapping to the same genomic position and belonging to the same read were


collapsed together, calculating the mean current intensity value and the total dwell time. Analogously to what was done for the basecalling features, forward and reverse strand related


collapsed f5c tables were used to plot and analyse ionic current intensities along with dwell times on regions around the expected ribonucleotide incorporation sites to compare “DNA-only”


and “Ribo” runs. Dwell times analysis was shown as the difference of dwell times means between the two conditions (Figs. 4 and 9) when comparing control and “Ribo1” runs or as actual values


(Fig. 5 and Supplementary Fig. 2) varying as a function of the ratio of “Ribo1” reads used for the in silico simulated washout analysis. For the _Taq_-I614K-related set of experiments, where


the incorporation of rCMPs was expected to be random and with a varying and low stoichiometry, a different approach was used to allow the difference in currents and dwell times to emerge


from the basal level of noise. F5c eventalign tables for the region of interest were retrieved and pre-processed for forward and reverse strands using the same strategy and, for each genomic


position and for both ionic currents and dwell times, the sum of the absolute values of the differences between the “DNA-only” control and “rC” runs was computed and shown on the whole 525


bp-long PCR products or in an aggregated manner, stratifying per comparison and strand. All data analysis procedures were run on HPC-HTC clusters equipped with up to 40 cores, 256 GB of RAM


and several TB of disk space. All in-house scripts used here are publicly available at https://github.com/F0nz0/nanopore-ribos. IFOREST CLUSTERING ON THE “RC 800 ΜM” RUN For the clustering


analysis, all the reads produced by sequencing “rC 800 μM—_Taq_-I614K” DNA strands mapping on a ±5 nt-long region overlapping the site M13mp18:4996 were considered. Using custom in-house


Python scripts and functions, basecalling features for the iForest model were extracted from the related BAM file leveraging on the Pysam (v. 0.18) module. Aligned reads were traversed


individually, and information about mismatches and indels in the surrounding interval of ±5 bases was collected and encoded using a custom vectorisation strategy (consisting of a 12-element


vector), as shown in Fig. 9a, similarly to what was done for other related tasks72. In more detail, vectorised basecalling features were encoded for each read and for each position within


the explored interval as follows: +1 for matched bases, −1 for mismatched bases and 0 for deletions, while the last integer of the vector was the insertions count. Encoded basecalling


features vectors were standardised and used to fit an iForest model via the scikit-learn package and the use of the _sklearn.ensemble.IsolationForest_ class, setting the maximum number of


available threads (32 for our machine setup). The contamination parameter for this type of unsupervised machine learning model is of pivotal importance since it sets a threshold on the


anomaly score used to classify observations as anomalies-outliers or normal ones. Based on the available literature25, we set the contamination parameter to 0.05 and trained the model with a


number of estimators equal to 200, selecting all the samples with a bootstrap strategy to build individual trees. By means of our trained model, each basecalling features vector was


classified as an outlier or inlier and visualised via principal component analysis (PCA) focusing on the first three principal components, subsampling a number of inlier reads equal to the


number of predicted outliers, only for visualisation purposes. In addition to PCA visualisation, the iForest predictions were used to analyse more deeply basecalling, ionic currents, and


dwell-times features, separating inliers from the outliers and comparing these two groups against the “Ribo1C” reads, looking for similar patterns. STATISTICS AND REPRODUCIBILITY For


substrates with ribonucleotides at known positions, a single “DNA-only” control library was prepared and sequenced, while two different libraries were independently prepared and


sequenced for “Ribo1A”, “Ribo1B” and “Ribo1C”. The two “Ribo” replicates gave reproducible and consistent results when compared to the “DNA-only” run. In order to avoid redundancy, only


results related to the comparisons between “DNA-only” and the first “Ribo” replicate were shown in the figures. Regarding substrates with randomly incorporated rCMPs, a single barcoded library


was prepared and sequenced twice on two different flow cells. For each flow cell, “rC” samples were compared to the internal “DNA-only” control, giving almost identical results. As shown in


Figs. 6–8, we also compared the two DNA-only technical replicates to estimate instrumental sources of variation. In order to avoid redundancy, only results related to the internal


comparisons between “DNA-only” and the “rC” samples loaded on the first flow cell were shown in the figures. When indicated in the figure caption, statistical analysis via two-way ANOVA was performed


using the Python3 statsmodels library. All data are available from the authors upon reasonable request. REPORTING SUMMARY Further information on research design is available in the Nature


Portfolio Reporting Summary linked to this article. DATA AVAILABILITY FASTQ and FAST5 files can be retrieved from the SRA database at the BioProject with accession code PRJNA928310 or at the


URL: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA928310. The list of all SRA accession numbers, their corresponding URLs and the numeric sources of all data are available within the file


Supplementary Data 1. All other data are available from the authors upon request. CODE AVAILABILITY All source code and in-house scripts used in this research work are publicly available at


the GitHub repository and at https://doi.org/10.5281/zenodo.7709403. REFERENCES


* Li, Y. & Breaker, R. R. Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2′-hydroxyl group. _J. Am. Chem. Soc._ 121, 5364–5372 (1999).
* Yao, Y. & Dai, W. Genomic instability and cancer. _J. Carcinog. Mutagen._ 5, 1–3 (2014).
* Yurov, Y. B., Vorsanova, S. G. & Iourov, I. Y. Chromosome instability in the neurodegenerating brain. _Front. Genet._ 10, 892 (2019).
* Hoeijmakers, J. H. J. Genome maintenance mechanisms for preventing cancer. _Nature_ 411, 366–374 (2001).
* Aguilera, A. & Gómez-González, B. Genome instability: a mechanistic view of its causes and consequences. _Nat. Rev. Genet._ 9, 204–217 (2008).
* Sertic, S. et al. Non-canonical CRL4A/4BCDT2 interacts with RAD18 to modulate post replication repair and cell survival. _PLoS ONE_ 8, e60000 (2013).
* McElhinny, S. A. N. et al. Abundant ribonucleotide incorporation into DNA by yeast replicative polymerases. _Proc. Natl Acad. Sci. USA_ 107, 4949–4954 (2010).
* Sparks, J. L. et al. RNase H2-initiated ribonucleotide excision repair. _Mol. Cell_ 47, 980–986 (2012).
* Reijns, M. A. M. et al. Enzymatic removal of ribonucleotides from DNA is essential for mammalian genome integrity and development. _Cell_ 149, 1008–1022 (2012).
* Clausen, A. R. et al. Ribonucleotide incorporation, proofreading and bypass by human DNA polymerase δ. _DNA Repair_ 12, 121–127 (2013).
* Williams, J. S., Lujan, S. A. & Kunkel, T. A. Processing ribonucleotides incorporated during eukaryotic DNA replication. _Nat. Rev. Mol. Cell Biol._ 17, 350–363 (2016).
* Clausen, A. R. et al. Structure-function analysis of ribonucleotide bypass by B family DNA replicases. _Proc. Natl Acad. Sci. USA_ 110, 16802–16807 (2013).
* Nava, G. M. et al. One, no one, and one hundred thousand: the many forms of ribonucleotides in DNA. _Int. J. Mol. Sci._ 21, 1–23 (2020).
* Koh, K. D., Balachander, S., Hesselberth, J. R. & Storici, F. Ribose-seq: global mapping of ribonucleotides embedded in genomic DNA. _Nat. Methods_ 12, 251–257 (2015).
* Balachander, S. et al. Ribonucleotide incorporation in yeast genomic DNA shows preference for cytosine and guanosine preceded by deoxyadenosine. _Nat. Commun._ 11, 1–14 (2020).
* Iida, T., Iida, N., Sese, J. & Kobayashi, T. Evaluation of repair activity by quantification of ribonucleotides in the genome. _Genes Cells_ 26, 555–569 (2021).
* Ghodgaonkar, M. M. et al. Ribonucleotides misincorporated into DNA act as strand-discrimination signals in eukaryotic mismatch repair. _Mol. Cell_ 50, 323–332 (2013).
* Lujan, S. A. et al. Ribonucleotides are signals for mismatch repair of leading-strand replication errors. _Mol. Cell_ 50, 437–443 (2013).
* Potenski, C. J. & Klein, H. L. How the misincorporation of ribonucleotides into genomic DNA can be both harmful and helpful to cells. _Nucleic Acids Res._ 42, 10226–10234 (2014).
* Williams, J. S. & Kunkel, T. A. Ribonucleotides in DNA: origins, repair and consequences. _DNA Repair_ 19, 27–37 (2014).
* Kellner, V. & Luke, B. Molecular and physiological consequences of faulty eukaryotic ribonucleotide excision repair. _EMBO J._ 39, e102309 (2020).
* Jaishree, T. N., Wang, A. H. J., van der Marel, G. A. & van Boom, J. H. Structural influence of RNA incorporation in DNA: quantitative nuclear magnetic resonance refinement of d(CG)r(CG)d(CG) and d(CG)r(C)d(TAGCG). _Biochemistry_ 32, 4903–4911 (1993).
* Egli, M., Usman, N. & Rich, A. Conformational influence of the ribose 2′-hydroxyl group: crystal structures of DNA-RNA chimeric duplexes. _Biochemistry_ 32, 3221–3237 (1993).
* Derose, E. F. et al. Solution structure of the Dickerson DNA dodecamer containing a single ribonucleotide. _Biochemistry_ 51, 2407–2416 (2012).
* Meroni, A. et al. The incorporation of ribonucleotides induces structural and conformational changes in DNA. _Biophys. J._ 113, 1373–1382 (2017).
* Hovatter, K. R. & Martinson, H. G. Ribonucleotide-induced helical alteration in DNA prevents nucleosome formation. _Proc. Natl Acad. Sci. USA_ 84, 1162–1166 (1987).
* Fu, I., Smith, D. J. & Broyde, S. Rotational and translational positions determine the structural and dynamic impact of a single ribonucleotide incorporated in the nucleosome. _DNA Repair_ 73, 155–163 (2019).
* Lazzaro, F. et al. RNase H and postreplication repair protect cells from ribonucleotides incorporated in DNA. _Mol. Cell_ 45, 99–110 (2012).
* Kim, N. et al. Mutagenic processing of ribonucleotides in DNA


by yeast topoisomerase I. _Science_ 332, 1561–1564 (2011). Article  CAS  PubMed  PubMed Central  Google Scholar  * Conover, H. N. et al. Stimulation of chromosomal rearrangements by


ribonucleotides. _Genetics_ 201, 951–961 (2015). Article  CAS  PubMed  PubMed Central  Google Scholar  * Klein, H. L. Genome instabilities arising from ribonucleotides in DNA. _DNA Repair_


56, 26–32 (2017). Article  CAS  PubMed  PubMed Central  Google Scholar  * Cerritelli, S. M. & Crouch, R. J. Ribonuclease H: the enzymes in eukaryotes. _FEBS J._ 276, 1494–1505 (2009).


Article  CAS  PubMed  Google Scholar  * Crow, Y. J. et al. Mutations in genes encoding ribonuclease H2 subunits cause Aicardi-Goutières syndrome and mimic congenital viral brain infection.


_Nat. Genet._ 38, 910–916 (2006). Article  CAS  PubMed  Google Scholar  * Pizzi, S. et al. Reduction of hRNase H2 activity in Aicardi-Goutières syndrome cells leads to replication stress and


genome instability. _Hum. Mol. Genet._ 24, 649–658 (2015). Article  CAS  PubMed  Google Scholar  * Kind, B. et al. Altered spatio-temporal dynamics of RNase H2 complex assembly at


replication and repair sites in Aicardi-Goutiéres syndrome. _Hum. Mol. Genet._ 23, 5950–5960 (2014). Article  CAS  PubMed  Google Scholar  * Giordano, A. M. S. et al. DNA damage contributes


to neurotoxic inflammation in Aicardi-Goutières syndrome astrocytes. _J. Exp. Med._ 219, e20211121 (2022). Article  CAS  PubMed  PubMed Central  Google Scholar  * Shah, S. P. et al.


Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. _Nature_ 461, 809–813 (2009). Article  CAS  PubMed  Google Scholar  * Williams, K. A. et al. A


systems genetics approach identifies CXCL14, ITGAX, and LPCAT2 as novel aggressive prostate cancer susceptibility genes. _PLoS Genet._ 10, e1004809 (2014). Article  PubMed  PubMed Central 


Google Scholar  * Mottaghi-Dastjerdi, N. et al. Identification of novel genes involved in gastric carcinogenesis by suppression subtractive hybridization. _Hum. Exp. Toxicol._ 34, 3–11


(2015). Article  CAS  PubMed  Google Scholar  * Dai, B. et al. RNaseH2A is involved in human gliomagenesis through the regulation of cell proliferation and apoptosis. _Oncol. Rep._ 36,


173–180 (2016). Article  CAS  PubMed  Google Scholar  * Beyer, U. et al. Rare ADAR and RNASEH2B variants and a type I interferon signature in glioma and prostate carcinoma risk and


tumorigenesis. _Acta Neuropathol._ 134, 905–922 (2017). Article  CAS  PubMed  Google Scholar  * Günther, C. et al. Defective removal of ribonucleotides from DNA promotes systemic


autoimmunity. _J. Clin. Invest._ 125, 413–424 (2015). Article  PubMed  Google Scholar  * Clausen, A. R. et al. Tracking replication enzymology in vivo by genome-wide mapping of


ribonucleotide incorporation. _Nat. Struct. Mol. Biol._ 22, 185–191 (2015). Article  CAS  PubMed  PubMed Central  Google Scholar  * Reijns, M. A. M. et al. Lagging-strand replication shapes


the mutational landscape of the genome. _Nature_ 518, 502–506 (2015). Article  CAS  PubMed  PubMed Central  Google Scholar  * Daigaku, Y. et al. A global profile of replicative polymerase


usage. _Nat. Struct. Mol. Biol._ 22, 192–198 (2015). Article  CAS  PubMed  PubMed Central  Google Scholar  * Zatopek, K. M. et al. RADAR-seq: a RAre DAmage and Repair sequencing method for


detecting DNA damage on a genome-wide scale. _DNA Repair_ 80, 36–44 (2019). Article  CAS  PubMed  Google Scholar  * Sriramachandran, A. M. et al. Genome-wide nucleotide-resolution mapping of


DNA replication patterns, single-strand breaks, and lesions by GLOE-Seq. _Mol. Cell_ 78, 975–985.e7 (2020). Article  CAS  PubMed  PubMed Central  Google Scholar  * Deamer, D., Akeson, M.


& Branton, D. Three decades of nanopore sequencing. _Nat. Biotechnol._ 34, 518–524 (2016). Article  CAS  PubMed  PubMed Central  Google Scholar  * Wang, Y. et al. Nanopore sequencing


technology, bioinformatics and applications. _Nat. Biotechnol._ 39, 1348–1365 (2021). Article  CAS  PubMed  PubMed Central  Google Scholar  * Lin, B., Hui, J. & Mao, H. Nanopore


technology and its applications in gene sequencing. _Biosensors_ 11, 214 (2021). Article  CAS  PubMed  PubMed Central  Google Scholar  * Georgieva, D., Liu, Q., Wang, K. & Egli, D.


Detection of base analogs incorporated during DNA replication by nanopore sequencing. _Nucleic Acids Res._ 48, e88–e88 (2020). Article  CAS  PubMed  PubMed Central  Google Scholar  * Xu, L.


& Seki, M. Recent advances in the detection of base modifications using the Nanopore sequencer. _J. Hum. Genet._ 65, 25–33 (2020). Article  CAS  PubMed  Google Scholar  * Müller, C. A.


et al. Capturing the dynamics of genome replication on individual ultra-long nanopore sequence reads. _Nat. Methods_ 16, 429–436 (2019). Article  PubMed  Google Scholar  * Nookaew, I. et al.


Detection and Discrimination of DNA Adducts Differing in Size, Regiochemistry, and Functional Group by Nanopore Sequencing. _Chem. Res. Toxicol._ 33, 2944–2952 (2020). Article  CAS  PubMed


  PubMed Central  Google Scholar  * Zhao, X. et al. Detection and characterization of single cisplatin adducts on DNA by nanopore sequencing. _ACS Omega_ 6, 17027–17034 (2021). Article  CAS


  PubMed  PubMed Central  Google Scholar  * Hosseini, M. et al. Deep statistical modelling of nanopore sequencing translocation times reveals latent non-B DNA structures. _Bioinformatics_


39, i242–i251 (2023). Article  PubMed  PubMed Central  Google Scholar  * Patel, P. H. & Loeb, L. A. Multiple Amino Acid Substitutions Allow DNA Polymerases to Synthesize RNA. _J. Biol.


Chem._ 275, 40266–40272 (2000). Article  CAS  PubMed  Google Scholar  * Liu, H. et al. Accurate detection of m6A RNA modifications in native RNA sequences. _Nat. Commun._ 10, 4079 (2019).


Article  PubMed  PubMed Central  Google Scholar  * Gamaarachchi, H. et al. GPU accelerated adaptive banded event alignment for rapid comparative nanopore signal analysis. _BMC Bioinforma._


21, 343 (2020). Article  CAS  Google Scholar  * Stoiber, M. et al. De novo identification of DNA modifications enabled by genome-guided nanopore signal processing. _bioRxiv_


https://doi.org/10.1101/094672 (2017). * Begik, O. et al. Quantitative profiling of pseudouridylation dynamics in native RNAs with nanopore sequencing. _Nat. Biotechnol._ 39, 1278–1291


(2021). Article  CAS  PubMed  Google Scholar  * Leger, A. et al. RNA modifications detection by comparative nanopore direct RNA sequencing. _Nat. Commun._ 12, 17 (2021). Article  Google


Scholar  * Rang, F. J., Kloosterman, W. P. & De Ridder, J. From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracy. _Genome Biol._ 19, 90


(2018). Article  PubMed  PubMed Central  Google Scholar  * Teng, H. et al. Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning. _GigaScience_ 7,


giy037 (2018). Article  PubMed  PubMed Central  Google Scholar  * Noakes, M. T. et al. Increasing the accuracy of nanopore DNA sequencing using a time-varying cross membrane voltage. _Nat.


Biotechnol._ 37, 651–656 (2019). Article  CAS  PubMed  PubMed Central  Google Scholar  * Gamaarachchi, H. et al. Fast nanopore sequencing data analysis with SLOW5. _Nat. Biotechnol._ 40,


1026–1029 (2022). Article  CAS  PubMed  PubMed Central  Google Scholar  * Pratanwanich, P. N. et al. Identification of differential RNA modifications from nanopore direct RNA sequencing with


xPore. _Nat. Biotechnol._ 39, 1394–1402 (2021). Article  CAS  PubMed  Google Scholar  * Stephenson, W. et al. Direct detection of RNA modifications and structure using single-molecule


nanopore sequencing. _Cell Genomics_ 2, 100097 (2022). Article  CAS  PubMed  PubMed Central  Google Scholar  * Fleming, A. M., Mathewson, N. J., Howpay Manage, S. A. & Burrows, C. J.


Nanopore dwell time analysis permits sequencing and conformational assignment of pseudouridine in SARS-CoV-2. _ACS Cent. Sci._ 7, 1707–1717 (2021). Article  CAS  PubMed  PubMed Central 


Google Scholar  * Liu, F. T., Ting, K. M. & Zhou, Z.-H. Isolation forest. In _2008 Eighth IEEE International Conference on Data Mining (IEEE)_, pp. 413–422.


https://doi.org/10.1109/ICDM.2008.17 (2008). * Liu, F. T., Ting, K. M. & Zhou, Z.-H. Isolation-based anomaly detection. _ACM Trans. Knowl. Discov. Data_ 6, 1–39 (2012). Article  Google


Scholar  * Fonzino, A. et al. Unraveling C-to-U RNA editing events from direct RNA sequencing. _RNA Biol._ 21, 1–14 (2024). Article  CAS  PubMed  Google Scholar  * Caldecott, K. W. Ribose—an


internal threat to DNA. _Science_ 343, 260–261 (2014). Article  CAS  PubMed  Google Scholar  * Su, Y., Egli, M. & Guengerich, F. P. Mechanism of ribonucleotide incorporation by human


DNA polymerase η. _J. Biol. Chem._ 291, 3747–3756 (2016). Article  CAS  PubMed  PubMed Central  Google Scholar  * Mentegari, E. et al. Ribonucleotide incorporation by human DNA polymerase η


impacts translesion synthesis and RNase H2 activity. _Nucleic Acids Res._ 45, 2600–2614 (2017). CAS  PubMed  Google Scholar  * Gali, V. K. et al. Translesion synthesis DNA polymerase η


exhibits a specific RNA extension activity and a transcription-associated function. _Sci. Rep._ 7, 1–17 (2017). Article  CAS  Google Scholar  * Meroni, A. et al. RNase H activities


counteract a toxic effect of Polymerase η in cells replicating with depleted dNTP pools. _Nucleic Acids Res._ 47, 4612–4623 (2019). Article  CAS  PubMed  PubMed Central  Google Scholar  *


Li, H. Minimap2: pairwise alignment for nucleotide sequences. _Bioinformatics_ 34, 3094–3100 (2018). Article  CAS  PubMed  PubMed Central  Google Scholar  * Li, H. et al. The Sequence


Alignment/Map format and SAMtools. _Bioinformatics_ 25, 2078–2079 (2009). Article  PubMed  PubMed Central  Google Scholar  * Bonfield, J. K. et al. HTSlib: C library for reading/writing


high-throughput sequencing data. _GigaScience_ 10, giab007 (2021). Article  PubMed  PubMed Central  Google Scholar  * Danecek, P. et al. Twelve years of SAMtools and BCFtools. _GigaScience_


10, giab008 (2021). Article  PubMed  PubMed Central  Google Scholar  * Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. _Nat. Methods_ 14, 407–410 (2017).


ACKNOWLEDGEMENTS
This work was supported by the Associazione Italiana Ricerca sul Cancro (AIRC; IG-21806 to M.M.F.), Ministero dell’Università e della Ricerca (PRIN2017_2022KJHC7S to M.M.F. and PRIN_2022JA8JY5 to F.L.), and the National Center for Gene Therapy and Drugs Based on RNA Technology, MUR (Project no. CN_00000041 to G.P.). The authors are also grateful to the following National Research Centers: “High-Performance Computing, Big Data and Quantum Computing” (Project no. CN_00000013 to E.P.); and Extended Partnerships: MNESYS (Project no. PE_0000006 to G.P.) and Age-It (Project no. PE_00000015 to G.P.). This work was also supported by ELIXIR-IT through the empowering project ELIXIRNextGenIT (Grant Code IR0000010 to G.P.). Figure 1 was created with Biorender.com.

AUTHOR INFORMATION
Author notes
* These authors contributed equally: Lavinia Grasso, Adriano Fonzino.

AUTHORS AND AFFILIATIONS
* Dipartimento di Bioscienze, Università degli Studi di Milano, Via Celoria 26, 20133, Milano, Italy: Lavinia Grasso, Federico Lazzaro & Marco Muzi-Falconi
* Dipartimento di Bioscienze, Biotecnologie e Ambiente, Università di Bari A. Moro, Via Orabona 4, 70126, Bari, Italy: Adriano Fonzino, Caterina Manzari, Ernesto Picardi, Carmela Gissi & Graziano Pesole
* Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia, Via Adamello 16, 20139, Milano, Italy: Tommaso Leonardi
* Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via Amendola 122/O, 70126, Bari, Italy: Ernesto Picardi, Carmela Gissi & Graziano Pesole

CONTRIBUTIONS
L.G., C.M. and C.G. conducted the biological experiments. A.F., T.L. and E.P. conducted the computational analyses. F.L., G.P. and M.M-F. conceived the study and oversaw the research. L.G. and A.F. wrote the first draft of the paper. All authors contributed to reviewing and editing the paper.

CORRESPONDING AUTHORS
Correspondence to Federico Lazzaro, Graziano Pesole or Marco Muzi-Falconi.

ETHICS DECLARATIONS


COMPETING INTERESTS
The authors declare no competing interest.

PEER REVIEW
PEER REVIEW INFORMATION
_Communications Biology_ thanks Ganesh N. Pandian and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Tobias Goris.

ADDITIONAL INFORMATION
PUBLISHER’S NOTE
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

SUPPLEMENTARY INFORMATION
* Supplementary Information
* Description of Additional Supplementary Files
* Supplementary Data 1
* Reporting Summary

RIGHTS AND PERMISSIONS
OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

ABOUT THIS ARTICLE
CITE THIS ARTICLE
Grasso, L., Fonzino, A., Manzari, C. _et al._ Detection of ribonucleotides embedded in DNA by Nanopore sequencing. _Commun Biol_ 7, 491 (2024). https://doi.org/10.1038/s42003-024-06077-w

* Received: 13 March 2023
* Accepted: 20 March 2024
* Published: 23 April 2024
* DOI: https://doi.org/10.1038/s42003-024-06077-w

