Multicenter evaluation of gut microbiome profiling by next-generation sequencing reveals major biases in partial-length metabarcoding approach

ABSTRACT Next-generation sequencing workflows, using either metabarcoding or metagenomic approaches, have massively contributed to expanding knowledge of the human gut microbiota, but

methodological bias compromises reproducibility across studies. Although these biases have been quantified individually in several comparative analyses, none have measured inter-laboratory

reproducibility using similar DNA material. Here, we designed a multicenter study involving seven participating laboratories dedicated to partial- (P1 to P5), full-length (P6) metabarcoding,

or metagenomic profiling (MGP) using DNA from a mock microbial community or extracted from 10 fecal samples collected at two time points from five donors. Fecal material was collected, and

the DNA was extracted according to the IHMS protocols. The mock and isolated DNA were then provided to the participating laboratories for sequencing. Following sequencing analysis according

to the laboratories’ routine pipelines, relative taxonomic-count tables defined at the genus level were provided and analyzed. Large variations in alpha-diversity between laboratories,

uncorrelated with sequencing depth, were detected among the profiles. Half of the genera identified by P1 were unique to this partner and two-thirds of the genera identified by MGP were not

detected by P3. Analysis of beta-diversity revealed lower inter-individual variance than inter-laboratory variances. The taxonomic profiles of P5 and P6 were more similar to those of MGP

than those obtained by P1, P2, P3, and P4. Reanalysis of the raw sequences obtained by partial-length metabarcoding profiling, using a single bioinformatic pipeline, harmonized the

description of the bacterial profiles, which were more similar to each other, except for P3, and closer to the profiles obtained by MGP. This study highlights the major impact of the

bioinformatics pipeline, and primarily the database used for taxonomic annotation. Laboratories need to benchmark and optimize their bioinformatic pipelines using standards to monitor their

effectiveness in accurately detecting taxa present in gut microbiota.

The development of next-generation high-throughput sequencing

technologies has facilitated significant advances in microbial ecology, allowing the study of microbial communities at an unprecedented level of resolution, through the ability to profile

their diversity and characterize their genetic information, without prior cultivation1. In addition to their significant impact on our understanding of life forms, DNA sequencing

technologies, including metabarcoding and metagenomics, have opened up a new area of genomics for studying microbial ecosystems. Metabarcoding uses amplicon PCR sequencing, most often of the

16S rRNA gene as a phylogenetic marker that is restricted to bacteria and archaea. Metagenomics allows analysis of collective microbial genomes in their natural habitat, using shotgun

sequencing, which captures the entire genetic information of a sample. While single-gene amplicon sequencing only allows exploration of the taxonomic diversity of prokaryotic taxa,

sequencing the entire genomic content allows exploration of gene-encoded functions as well as information about the genomes of microorganisms from multiple prokaryotic taxa. Both sequencing

approaches have common biases and limitations, which have been reviewed2, such as those linked to sample collection3, biobanking4, contaminants5, the selection of DNA extraction

protocols6,7, library preparation methods8,9,10 approach over metabarcoding in detecting low-abundance bacteria36. None of these studies measured inter-laboratory reproducibility, using

similar DNA material to measure biases generated by sequencing protocols and bioinformatics pipelines or sequencing protocols only. Here, we report an inter-laboratory reproducibility study

of metabarcoding (P1 to P6) versus metagenomic profiling (MGP) approaches using similar DNA material extracted in triplicate following the International Human Microbiome Standards (IHMS) protocol from 10

fecal samples collected from five donors at two time points as well as one mock DNA community sample. In order to assess specific biases due to sequencing protocols, raw sequences delivered

by partners were reprocessed using a single bioinformatics pipeline.

METHODS

STUDY DESIGN, PARTICIPANTS, AND SAMPLING

Five healthy adult subjects (S1–S5) were enrolled. All research was

performed in accordance with the relevant guidelines. Approval for the Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRAE) to manage human-derived

biological samples was granted by the Ministry of Research and Education under approval number DC-2012-1728. Informed consent was obtained from all five donors. Their fecal samples were

collected at two time points in October 2018 (Sx_1) and January 2019 (Sx_2) following the IHMS protocols (SOP 05_V2) (Fig. 1A). The samples were collected by the subjects at home and stored

at room temperature in a stabilizing solution (RNAlater® Stabilization Solution, Thermo Fisher Scientific, Waltham, USA), and transported within 24 h to our facilities.

DNA EXTRACTION, STANDARD SOLUTION, AND TRANSFER TO PARTNER LABORATORIES

DNA extraction from stool samples was performed as recommended by IHMS SOP 06_V2 (MGP SOP 06_V3), in triplicate using a QIAsymphony®

DSP Virus/Pathogen Midi Kit (Qiagen). ZymoBIOMICS™ Microbial Community DNA Standard was used as an internal positive control (ref. D6306; Zymo Research) (Fig. 1A). It contains a mixture of

genomic DNA extracted from 10 microbial species comprising 8 bacteria (12% genomic DNA abundance each), a yeast, and a protist (2% genomic DNA abundance each), altogether covering a wide

range of GC contents (from 15 to 85%). Before further processing, the extracted DNA was quantified using Qubit™ Fluorometric Quantitation (Qubit™ dsDNA HS kit, ref Q32851, Thermo Fisher

Scientific) and qualified using DNA size profiling on a Fragment Analyzer™ (Genomic DNA 50 kb kit, ref DNF-467-O500, Agilent Technologies, Santa Clara, USA). Multiple DNA extracts obtained

from similar fecal samples were pooled and mixed into a single solution that was shared in equal amounts among partner laboratories. Partner laboratories (P1 to P6) received 1 µg of the DNA

extracted in triplicate from the 10 fecal samples as well as triplicate solutions of the mock community DNA. Partner laboratories were not able to visually differentiate mock community DNA

from human fecal DNA samples, allowing us to use this mock community DNA as an internal control. The samples were shipped by mail in 2 mL Eppendorf tubes, in a box maintained at 4 °C using cold

packs.

METAGENOMIC AND METABARCODING DNA SEQUENCING

Metagenomic analysis was performed in the MetaGenoPolis unit at the INRAE facilities by shotgun DNA sequencing (Fig. 1B, MGP). The libraries

were generated from 1 µg of high-molecular-weight DNA (> 20 kbp). Shearing of the DNA into fragments of approximately 150 bp was performed using an ultrasonicator E220 system (Covaris,

Woburn, USA), and DNA fragment library construction was performed using the 5500 SOLiD™ Fragment 48 Library Core Kit (Thermo Fisher Scientific). Purified and amplified DNA fragment libraries

were sequenced using an Ion Proton™ Sequencer (Thermo Fisher Scientific), with a minimum of 20 million high-quality single-end reads (150 bp) per library. Metabarcoding sequencing was

performed in the partner laboratories (P1 to P6). P1 to P5 performed partial-length (Fig. 1B, P1 to P5) and P6 full-length metabarcoding sequencing of the 16S rRNA gene (Fig. 1B, P6).

Details of the metabarcoding materials and methods used by each partner are provided in Supplementary Table 1. Upon reception, all partners performed DNA quantity control; P1 and P2

additionally performed DNA purity control, which also included DNA size control using dedicated methodologies. P1 to P5 sequenced the V3–V4 regions of the 16S rRNA gene 2 × 250 bp, or 2 ×

300 bp for P2, using a MiSeq sequencer (Illumina, San Diego, USA). P1 and P2 used the same pair of primers, while the other partners (P3, P4, and P5) used different primer pairs but

with few differences (Supplementary Table 2). P6 built libraries using the LUMI-Seq® methodology, enabling the sequencing of full-length 16S rRNA genes using the Illumina short-read

platform. The method incorporated randomized unique molecular barcodes on the 5ʹ ends of individual 16S rRNA gene template molecules. After molecular barcoding, the full-length 16S gene

V1–V9 was PCR amplified in order to make multiple copies and to increase the signal. The full-length fragments were then enzymatically tagmented while keeping the UMI tag on all pieces. The

library was then sequenced using a NextSeq500 sequencer, in 2 × 150 bp, using the classical Illumina sequencing workflow. Each partner committed to produce a minimum of 40 k reads per DNA

sample. Following sequencing, the number of total raw reads per partial-length metabarcoding partner varied from 3.64 (P3) to 18.99 (P5) million reads (Supplementary Fig. 1, raw reads

number). P6 sequenced an average of 6,625 full-length 16S rRNA reads per sample.

BIOINFORMATIC DATA ANALYSIS

Metagenomic reads were cleaned using AlienTrimmer v0.2.4 (ref. 37) to remove resilient

sequencing adapters and trim low-quality nucleotides at the 3ʹ side using a quality and length cut-off of 20 and 45 bp, respectively. Cleaned reads were subsequently filtered from human and

other possible food contaminant DNA (using human genome GRCh37-p10, _Bos taurus_, and _Arabidopsis thaliana_ with an identity score threshold of 97%). Filtered high-quality reads were mapped,

with an identity threshold of 95%, to the 10.4-million-gene Integrated Gut Catalogue 2 (IGC2)38, using Bowtie v2.2.6 (ref. 39) included in METEOR v3.2 software40. A table of the gene

abundance was generated by means of a two-step procedure using METEOR. First, the unique mapped reads (reads mapped to a unique gene in the catalogue) were attributed to their corresponding

genes. Second, shared reads (reads that mapped with the same alignment score to multiple genes in the catalogue) were attributed according to the ratio of their unique mapping counts. The

gene abundance table was processed for rarefaction and normalization and further analysis using the MetaOMineR v1.2 (momr) R package41. To decrease technical bias due to different sequencing

depths and avoid any artifacts of sample size on low-abundance genes, read counts were rarefied. The gene abundance table was rarefied to 14 million reads per sample by random sampling

without replacement. The resulting rarefied gene abundance table was normalized according to the FPKM strategy (normalization by the gene size and the number of total mapped reads

reported in frequency) to generate the gene abundance profile table. The gene count was computed as the number of genes detected (i.e., with a strictly positive abundance) in a given sample

after downsizing. For taxonomic profiling, the IGC2 catalogue was organized into 1990 Metagenomic Species (MSP), i.e., clusters with a minimum of 100 genes, using MSPminer42. MSP taxonomy

was assigned with the Genome Taxonomy Database43. The relative abundance of an MSP was computed as the mean abundance of its 100 ‘marker’ genes (i.e., the genes that correlate most strongly

with one another). If fewer than 10% of ‘marker’ genes were seen in a sample, the abundance of the MSP was set to 0. The relative abundances at higher taxonomical ranks were computed as the sum of

the relative abundances of the MSP that belong to a given taxon. The MSP count was assessed as the number of MSP present in a sample (i.e., with a strictly positive abundance). The MSP table

was then resolved at the genus taxonomical level. For DNA sequences obtained from mock community samples, genomes of microbial strains present in the reagent and downloaded from a link

specified in the instruction manual of the provider (ZymoResearch) were considered to build a catalogue that was further used as a reference database to obtain relative abundances of

microbial species using Kraken v2.1.0 (ref. 44). We acknowledge that the choice to use a genome catalogue of the ten species present in the mock community, according to the manufacturer’s

recommendation, makes it impossible to identify other species. The key steps of the bioinformatic pipeline used to analyze the metabarcoding data are described in Supplementary Table 2. P1

and P5 used mothur45 as the data analysis software, while P3 and P4 used QIIME46, P2 used FROGS47 and P6 DADA248. Only DADA2 used ASV as the clustering strategy, while the others used OTU.
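
As an aside on the metagenomic branch described earlier in this section, the two-step read attribution (unique reads first, then shared reads split according to the unique counts) can be sketched in a few lines of Python. This is only an illustration and not METEOR's actual code: the function name `attribute_reads`, the input layout, and the even split applied when none of the candidate genes has unique reads are assumptions made for the example.

```python
from collections import defaultdict


def attribute_reads(unique_counts, shared_reads):
    """Two-step gene counting (illustrative sketch, not METEOR itself).

    unique_counts: dict mapping gene id -> number of uniquely mapped reads.
    shared_reads:  list of tuples; each tuple lists the genes that one
                   shared read maps to with the same alignment score.
    """
    # Step 1: uniquely mapped reads go straight to their gene.
    abundance = defaultdict(float)
    for gene, count in unique_counts.items():
        abundance[gene] = float(count)

    # Step 2: each shared read is split in proportion to the unique counts.
    for genes in shared_reads:
        total = sum(unique_counts.get(g, 0) for g in genes)
        for g in genes:
            if total > 0:
                weight = unique_counts.get(g, 0) / total
            else:
                # Assumption: no unique evidence at all, so split evenly.
                weight = 1.0 / len(genes)
            abundance[g] += weight
    return dict(abundance)


example = attribute_reads({"A": 3, "B": 1}, [("A", "B")])
print(example)  # {'A': 3.75, 'B': 1.25}
```

With three unique reads on gene A and one on gene B, a read shared by both contributes 0.75 to A and 0.25 to B, so total read mass is conserved.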

Among the partners using OTU clustering, we noticed the use of different software, except for P1 and P5, who both used OptiClust49. P2 used Swarm50, P3 used UCLUST51, and P4 used

USEARCH51. OptiClust, UCLUST, and USEARCH use a clustering approach based on centroid selection and a global clustering threshold (set to 97% similarity), where closely related amplicons can

be placed into different OTUs, while Swarm clusters iteratively by using a small user-chosen local clustering threshold, allowing OTUs to reach their natural limits49. All partners used

FastQC for read quality control. The partners used different read merging software, except P1 and P3, who used FLASH52. Following read merging, we noticed that P1 obtained merged reads

shorter (~ 150 bp) than those obtained by the other partners (~ 450 bp). Different thresholds were used for sequence removal (Supplementary Table 1). P1, P3, and P5 removed chimeric

sequences, P2, P4, and P5 removed rare OTUs, P2 and P6 removed sequences based on read size, P3 based on read quality, and P1 was the only partner removing homopolymers. P3 and P4 did not

use sequence denoising software. Except for P3 and P4, who used the same RDP classifier tools53, the other partners used different taxonomic affiliation methods. P5 used Greengenes54, P2

used NCBI BLAST+ (ref. 55), and P6 a naive Bayesian classifier56. For partial-length metabarcoding, three different reference databases for taxonomic annotation were used. P1 and P3 used

Greengenes54, P2 and P4 used SILVA57, and P5 used RDP. P2 returned all taxonomies considering all BLAST best hits and a consensus taxonomy, tagging ambiguous taxa as

“Multi-affiliation”; these were treated as unclassified taxa in the following analysis. P6, which performed full-length metabarcoding, used an in-house database. Bacterial and archaeal genomes at

all assembly levels (Complete, Chromosome, Scaffold, Contig) were downloaded from the RefSeq database58 in May 2019, using ncbi-genome-download version 0.2.9 (ref. 59). The taxonomic information of

the genomes was retrieved from the NCBI Taxonomy database taxdump files downloaded from the Taxonomy FTP site60. The 16S rRNA sequences were extracted from the genomes using barrnap version

0.9 (ref. 61). For each genome, the extracted sequences were compared with each other using the clustering tool CD-HIT version 3.1 (refs. 62,63) to keep only the unique sequences in each genome. The

database was complemented by a curated collection of prokaryotic 16S rRNA sequences from the 16S rRNA RefSeq Targeted Loci Project64 downloaded in April 2019. Sequences containing non-standard

nucleotides or having unclear species identity (e.g., containing “sp.”) were removed. All the remaining sequences were then clustered at 100% using CD-HIT. For each cluster, the longest

sequence was defined as the representative sequence, and the taxonomy of the sequences making up the cluster was checked for a majority (with a threshold of 90%). If one existed, the majority

taxonomy was selected as the cluster annotation. Otherwise, the cluster annotation was moved up in taxonomic rank. For example, if 50% of the sequences in a cluster are annotated to

the species _Staphylococcus epidermidis_ and 50% to _Staphylococcus aureus_, the annotation moves up to the genus level by assigning _Staphylococcus_ to the cluster, leaving the annotation of

the species level empty. The final database contains 72,954 sequences that represent 15,041 species, 3151 genera, 510 families, and 52 phyla. Following data trimming with each

metabarcoding partner’s own bioinformatic protocol, the number of sequences obtained by the partners varied from 1.7 to 8.2 million sequences (Supplementary Fig. 1, trimmed read). Sequence

trimming removed more than 43% of the generated raw sequences in P2, which used stricter trimming conditions (read size, chimeric sequences, and rare OTUs) than the other partners:

P1 and P5 removed 25% of the original raw sequences, and P3 and P4 less than 2%. In order to measure biases with partial-length metabarcoding partners due to the

bioinformatics pipeline only, the data analysis was repeated from demultiplexed raw sequence data provided by P1 to P5, using a single and new bioinformatics pipeline. The key steps of the

bioinformatics pipeline used to analyze the metabarcoding data are described in Supplementary Table 3. Briefly, we used cutadapt65 to remove primers and SPAdes66 to correct for sequencing

errors. The paired-end reads were merged using PEAR67, chimeric sequences were removed using UCHIME 3 (ref. 68), and the remaining sequences were clustered into ASV using Vsearch (v2.15.1).
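
The 90% majority rule used to annotate clusters in the P6 in-house database, described earlier in this section, can be sketched as follows. This is a minimal illustration under stated assumptions, not the partner's implementation: the function name `consensus_taxonomy`, the four ranks tracked, the inclusive threshold, the use of empty strings for unresolved ranks, and the example lineages are all hypothetical choices made for the sketch.

```python
from collections import Counter

# Ranks tracked in this sketch (an assumption; the text reports phyla,
# families, genera, and species for the final database).
RANKS = ("phylum", "family", "genus", "species")


def consensus_taxonomy(lineages, threshold=0.9):
    """Return the deepest lineage shared by at least `threshold` of a cluster.

    `lineages` is a list of tuples ordered from phylum down to species.
    Ranks that cannot be resolved are returned as empty strings.
    """
    if not lineages:
        raise ValueError("a cluster needs at least one sequence")
    consensus = ()
    for depth in range(1, len(RANKS) + 1):
        # Most frequent lineage prefix at this depth and its share of the cluster.
        prefix, count = Counter(l[:depth] for l in lineages).most_common(1)[0]
        if count / len(lineages) < threshold:
            break  # no majority here: keep the annotation from the rank above
        consensus = prefix
    return consensus + ("",) * (len(RANKS) - len(consensus))


# Illustrative lineages only (rank names above genus are placeholders).
epidermidis = ("Firmicutes", "Staphylococcaceae", "Staphylococcus",
               "Staphylococcus epidermidis")
aureus = ("Firmicutes", "Staphylococcaceae", "Staphylococcus",
          "Staphylococcus aureus")

# A 50/50 split at species level falls back to the genus annotation.
print(consensus_taxonomy([epidermidis] * 5 + [aureus] * 5))
```

Because the share of the most frequent prefix can only shrink as the lineage gets deeper, walking down from the phylum and stopping at the first rank without a majority gives the same answer as starting at the species and moving up.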

Importantly, ASVs with fewer than 8 sequences were removed (default parameter). For phylogeny identification, RDPTools suite v2.11 (ref. 69) was used. Following data trimming,

approximately 30% of the original raw sequences were removed for P5, which is in a similar range to what the partner previously obtained, and 50% for P1, P2, and P3, which is two times higher

for P1 and twenty-five times higher for P3 than what these partners previously obtained, and in a similar range for P2. More surprisingly, 85% of the raw sequences generated by P4 were

removed (Supplementary Fig. 1, reprocess trimmed read), whereas this partner had originally removed less than 2% of its sequences.

STATISTICAL AND DATA ANALYSIS

The total number of bacterial genera

identified in all samples, by each partner, was summed to calculate the genus richness. A genus was counted only if it was present in at least two replicates out of three. The

partners provided the Shannon diversity index, calculated at the species taxonomic or OTU level, as a measure of the alpha-diversity. Both measurements of alpha-diversity were used to draw

the boxplot visualization carried out with the _ggplot_ function from the ggplot2 R package. Venn diagrams were plotted using the _draw.pairwise.venn_ function of the VennDiagram R package.
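
For reference, the Shannon diversity index used here as the alpha-diversity measure is H = -Σ p_i ln p_i, where p_i is the proportion of taxon i in a sample. The sketch below is a plain-Python illustration, not the partners' code: the function name `shannon_index` is hypothetical, and the natural logarithm is an assumption, since the base used by each partner is not stated in this section.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with p_i > 0.

    `counts` may hold raw counts or relative abundances; they are
    renormalized to proportions. Natural log is an assumption here.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Eight equally abundant taxa give ln(8) under this convention.
print(round(shannon_index([1] * 8), 2))  # 2.08
```

Taxa with zero abundance are skipped, which follows the usual convention that 0 · ln 0 is taken as 0.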

The bacterial genera relative abundance count tables provided by the partners were concatenated into a single table (Fig. 1C). A single list of all bacterial genera present in the partners’

relative count tables was compiled and then re-associated with the relative abundances provided by individual partners using the VLOOKUP function in Excel. Bray–Curtis indexes

were calculated using the _vegdist_ function of the vegan R package as a measure of the beta-diversity. Stacked bar plots with hierarchical clustering for visualization in a dendrogram form

were drawn based on the Bray–Curtis index values, using the _as.dendrogram_, _upgma_, and _ggplot_ functions of the ggplot2 R package. PCoA (principal coordinate analysis) visualizations were

carried out using the _pcoa_ and _s.class_ functions of the ape and ade4 R packages, respectively, from Bray–Curtis dissimilarity matrices. Common and partner-specific bacterial genera were visualized using

_UpSet_ plots with the UpSetR R package. All statistical analyses were performed using R v.4.1.2 software (http://cran.r-project.org/). For phylogenetic tree visualization, 16S rRNA gene

sequence alignment was carried out with _ClustalOmega_ (using default parameters), and the alignment files were then submitted to a phylogenetic analysis using Phylogeny.fr customized

workflow service70 including alignment curation with _Gblocks_ (using default parameters)71, tree construction with _PhyML_ (bootstrap 100)72, and visualization by _TreeDyn_73.

ETHICAL APPROVAL AND CONSENT TO PARTICIPATE

Each donor consented to participate in the protocol by signing a consent form and agreed to their samples being conserved in our biobank. Approval for the

Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRAE) to manage human-derived biological samples was granted by the Ministry of

Research and Education under approval number DC-2012-1728, updated DC-2020-1728.

RESULTS

BACTERIAL PROFILE VARIATIONS IN MOCK COMMUNITIES AT THE GENUS LEVEL

While 16S rRNA metabarcoding only

identified bacteria and archaea, _Saccharomyces cerevisiae_ and _Cryptosporidium_ were only identified by a shotgun metagenomics approach. Only three metabarcoding partners (P1, P4, and

P5), out of six, detected all eight bacterial species present in the mock sample. Partner P2 missed _Escherichia_, P3 missed _Limosilactobacillus_, and P6 with LUMI-Seq® missed

_Pseudomonas_. P2 and P6 had the highest count of unclassified genera, 10.8% ± 0.4 and 44.2% ± 0.6, respectively. The high proportion of unclassified genera identified by P6 led to

underestimation of _Escherichia_, _Limosilactobacillus_, and _Salmonella_ abundances. P2 and P3 overestimated the relative proportion of _Bacillus_ and underestimated the proportion of

_Pseudomonas_. P3 overestimated the relative proportion of _Listeria_ and underestimated the relative abundance of _Staphylococcus_. Overall, from beta-diversity analysis based on

Bray–Curtis dissimilarity indexes (Fig. 2), MGP was the closest to the theoretical profile, a result that was expected because identification of other species was not possible,

followed by P1, P4, and P5, while P2, P3, and P6 stood out as outliers. Thus, beta-diversity analysis highlighted the lower ability of P2, P3, and P6 to correctly profile the reference

sample at the bacterial genus level as well as the good performance of P1.

COMPARATIVE ALPHA-DIVERSITY AND GENUS RICHNESS ANALYSIS IN MOCK AND HUMAN FECAL SAMPLES

In the mock sample, all

metabarcoding partners overestimated the alpha-diversity, due to the identification of additional bacterial species, compared to the theoretical value (Fig. 3A). For the human fecal samples,

P1, P2, and P5 underestimated while P3, P4, and P6 overestimated the alpha-diversity compared to the indexes calculated by MGP. P1 and P6 were outliers in their respective groups. For the

human fecal samples, we noticed a substantial degree of variation between metabarcoding partners in the genus richness measured from identical DNA samples: for sample S3_1, partner P3

identified an average of 18.7 ± 0.6 genera while P4 identified 103 ± 4 genera (Fig. 3B). Overall, for any sample considered, the difference in the bacterial genus richness between P3 and P4

was the highest. P1, P2, P4, and P5 overestimated the bacterial genus richness compared to MGP and P6, who identified comparable genus richness, while it was underestimated by partner P3.
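
The genus richness compared here depends on the replicate rule given in the Methods, under which a genus is counted only if it is present in at least two of the three replicates. A minimal sketch of that rule for one sample follows. The function name `genus_richness`, the dict-per-replicate input layout, and the reading of "present" as a strictly positive relative abundance are assumptions made for illustration, and the abundances are invented.

```python
def genus_richness(replicates, min_replicates=2):
    """Count genera detected in at least `min_replicates` replicates.

    `replicates` is a list of dicts mapping genus name -> relative
    abundance, one dict per technical replicate of the same sample.
    """
    detections = {}
    for replicate in replicates:
        for genus, abundance in replicate.items():
            if abundance > 0:  # assumption: present means abundance > 0
                detections[genus] = detections.get(genus, 0) + 1
    return sum(1 for n in detections.values() if n >= min_replicates)


# Invented abundances for three replicates of one sample.
sample = [
    {"Bacteroides": 0.4, "Prevotella": 0.1},
    {"Bacteroides": 0.5, "Prevotella": 0.0, "Blautia": 0.1},
    {"Bacteroides": 0.3, "Blautia": 0.2},
]
print(genus_richness(sample))  # 2
```

In this toy sample _Prevotella_ is seen in a single replicate and is therefore excluded, which shows how the replicate rule damps the contribution of sporadic, low-abundance assignments to richness.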

While all partners identified unclassified bacterial genera, their average relative abundances varied from 7.9% for P4 to 61.9% for P6 (Supplementary Fig. 2A). Among the partial-length

metabarcoding partners, P1, P2, and P4 had the lowest average relative abundances of unclassified bacterial genera, ranging from 7.9% for P4 to 21.9% for P2, followed by P3 and P5 with 47.7%

and 34%, respectively. Unexpectedly, these proportions of unclassified bacterial genera among partial-length 16S rRNA partners were lower than those obtained by laboratories using approaches with higher

taxonomic resolution, such as MGP or full-length sequencing using LUMI-Seq®, with 51.4% and 61.9%, respectively. Thus, P1, P2, and P4 were using less stringent conditions for classification of OTU

clusters compared to the other partners. By contrast, consistent with its performance in bacterial genus profiling of the mock samples, P3 missed the most abundant human gut

genera such as _Bacteroides_, _Parabacteroides_, and _Prevotella_, which were not detected or were identified at low relative abundance (e.g., _Faecalibacterium_, < 0.05%; Supplementary Fig. 3).

These bacterial genera are core members of the human gut microbiota, and analytical pipelines missing them may be identified and flagged as poor service providers in gut microbiota analysis

by regulatory and legal agencies.

COMPARATIVE BETA-DIVERSITY ANALYSIS IN HUMAN FECAL SAMPLES

Following the aggregation of the count tables, at the bacterial genus level, provided by the

partners after sequencing of DNA isolated from stool samples, a total of 429 unique genera were identified. For human fecal samples, the dissimilarity between all pairs of partners, as

measured by Bray–Curtis indexes at the bacterial genus level, was such that we observed lower inter-individual variance than inter-partner variance (Fig. 4). The microbiota profiles depicted

by partners P5 and P6 were the most similar to those obtained by MGP. P2 and P4, which clustered together, as well as the P3 and P1 partners, provided microbiota profiles that were the most

dissimilar to those of the other partners. We also noticed that this clustering was markedly influenced by the taxonomic database that was used for the phylogenetic annotation

(Supplementary Table 1). Because the number of confounding factors exceeds the number of laboratories, all conclusions should be taken with caution. Considering bacterial genera identified as

shared between the metabarcoding partners and MGP or exclusively identified by a single partner (Supplementary Fig. 4A), we observed three groups of partners. The first cluster comprised P1,

P2, and P4, for which most of the identified bacterial genera were only identified by metabarcoding. The average total relative abundance of these partner-exclusive bacterial genera ranged

from 10.6 to 29.8% for P1 and 30.8 to 65% for P2 and P4, with a high variance between samples. For P1 and P4, these results confirm their use of weaker parameters for OTU classification at

the genus level, as described earlier. The second cluster comprised P3, for which most of the identified bacterial genera were only identified by MGP, and these bacterial genera represented

an average total relative abundance between 20.1 and 28.4%, depending on the samples. This confirms the tendency of P3 to miss bacterial genera. The third cluster, comprising

P5 and P6, had most of its identified bacterial genera shared between the partners and MGP, which explained their similarity as measured by the Bray–Curtis distance. For the two latter

partners (P5 and P6), the bacterial genera identified by MGP only represented a low average total relative abundance, ranging from 2.2 to 8.2% for P5 and 0.3% to 5.7% for P6. Overall, the

bacterial genera exclusively identified by the metabarcoding partners represented a higher average total relative abundance compared to those exclusively identified by MGP. In a genus

intersection count analysis performed between all partners (Fig. 5), P1 had the highest number of genera exclusively identified by a single partner, between 27 and 45, which accounted for

approximately 34.6% (S3_1) to 47.4% (S1_2) of the total number of genera identified by this partner and represented between 2.9 and 10.2% of the total relative abundances of the bacterial

genera identified by this partner. P1 thus accounted for a total of 159 partner-exclusive bacterial genera, representing a low proportion of the total relative abundance (Supplementary Table

4). The tendency of P1 to identify such a high number of partner-exclusive bacterial genera contributes to the high genus richness and β-diversity dissimilarity measured compared with the

other partners (Figs. 3B and 4). On the other hand, P3 did not identify any partner-exclusive bacterial genera, except one in S4_2. This observation confirms the low genus richness

previously reported (Fig. 3B) as well as the tendency of this partner to misidentify the core gut bacterial genera. In most samples, P5, P6, and P4 identified between 12 and 26

partner-exclusive bacterial genera, which is more than MGP and more than P2, which identified 10 partner-exclusive bacterial genera. P2 and P4 shared the highest number of bacterial genera (32 to

37), varying between 24.3 and 60.3% of the total relative abundances of the bacterial genera identified by these partners. P2 and P4 thus accounted for a total of 43 partner-exclusive

bacterial genera, representing a high proportion of the total relative abundance (Supplementary Table 5). We noticed that most of the exclusive bacterial genera common to P2 and P4 were

different taxonomic sub-divisions of genera, such as _Ruminococcus_ sub-divided into Ruminococcaceae _NK4A214_ or _UCG-002_, _-003_, _-004_, _-005_, _-009_, _-010_, _-013_, _-014_,

_Ruminococcus 1_ and _2_ or _Prevotella_, _Ruminiclostridium_, or _Lachnospira_. All these intermediate taxonomic names present in the SILVA database used by both partners (Supplementary Table

1) largely contributed to the β-diversity similarity of the bacterial genera profiles observed between P2 and P4 and their dissimilarity with the other partners (Fig. 4). This also explains

why P2 and P4 overestimated the bacterial genus richness. However, these sequences associated with genera sub-groups or intermediate taxonomic ranks mainly corresponded to yet uncultured

bacterial groups and were not well defined at the genus level (Supplementary Fig. 5). Partner P5 identified between 14 and 22 partner-exclusive bacterial genera, which represented between

1.5 and 8.1% of the total diversity. P6 identified between 12 and 21 partner-exclusive bacterial genera, which represented between 4.9 and 9.7% of the total diversity. MGP identified between

6 and 14 partner-exclusive bacterial genera, which represented between 0.5 and 6.7% of the total diversity. This relatively low amount of partner-exclusive bacterial genera accounting for a

low total relative abundance contributed to the low dissimilarity of the bacterial genera profile as measured by β-diversity analysis between P5, P6, and MGP (Fig. 4). The number of shared

bacterial genera among all partners was between 7 and 15, depending on the samples, representing from 3.1 to 34.8% of the total relative abundances of bacterial genera, depending on the

partners. A summary of the criteria used to classify the metabarcoding partners according to their capacity to approximate the bacterial genus profile measured by metagenomics

is presented in Table 1.

COMPARATIVE ANALYSIS OF THE BACTERIAL PROFILES OBTAINED AFTER REANALYSIS OF PARTIAL-LENGTH METABARCODING DATASETS

To measure the specific impact of bioinformatic

pipelines on the bacterial genera profiles obtained from the partial-length metabarcoding partners (P1 to P5), we reanalyzed the raw sequence datasets issued from all partners with the use

of a single bioinformatic pipeline (Supplementary Table 3). For the mock samples, the bacterial genus profiles were similar to those obtained previously, except that P2 displayed a lower

prevalence of unclassified genus and identified _Escherichia_. P3 still missed the _Limosilactobacillus_ genus (data not shown). For the human fecal samples, the average relative abundances

of the unclassified bacterial genera varied from 20.4 to 22% for P1, P2, P4, and P5 and 37.2% for P3 (Supplementary Fig. 2B). A higher proportion of the sequences provided by P5 was

classified following the new analysis. Only the sequencing data provided by P3 had a high proportion of unclassified bacterial genera, while for the other metabarcoding sequencing partners

the proportion of unclassified bacterial genera was rather uniform. The reprocessing of sequencing data identified a total of 180 bacterial genera, so less than half of those identified

previously. Variation of the alpha-diversity, measured by the Shannon diversity index at the species level and genus richness, between the partners was lower than that measured previously.

The average Shannon diversity index varied between 3.5 and 6 in the original datasets and between 4 and 5 upon reanalysis, depending on the partners (Supplementary Fig. 6A). The average genus richness varied between 23.2 and 102.1 in the original datasets and between 51.0 and 91.5 upon reanalysis, depending on the partners (Supplementary Fig. 6B). The bacterial species profiles obtained by P1 had the lowest average Shannon diversity index in the original datasets (3.7 ± 0.2). Following reanalysis, they had one of the highest (4.8 ± 0.2). This increase in species alpha-diversity was also observed for P2 and P5, albeit to a lesser extent. For P1 and P2, however, the reanalysis measured a lower richness at the genus level: the increase in the bacterial species diversity index did not translate into an increase in genus richness. In contrast, the bacterial species profiles obtained by P3 had the highest average Shannon diversity index in the original datasets (5.5 ± 0.4), whereas reanalysis led to the lowest (4.3 ± 0.3). This decrease in species diversity following reanalysis was associated with an increase in genus richness, from 23.2 ± 3.1 to 50.9 ± 3.8. Thus, for this partner, a decrease in the bacterial species diversity index did not translate into a decreased genus richness. A decrease in species alpha-diversity was also observed, albeit to a lesser extent, for P4, where it did translate into a decreased genus richness upon reanalysis. The changes in the alpha-diversity index and genus richness for partner P5 were very small.
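The observation that the Shannon index and richness can move in opposite directions follows from their definitions: richness only counts the taxa present, whereas the Shannon index also weights how evenly the abundances are spread. The sketch below illustrates this with two invented profiles. It assumes the natural logarithm (the log base used by the partners is not stated here), and the function names and toy abundances are hypothetical.

```python
import math


def shannon_index(abundances, base=math.e):
    """Shannon diversity H = -sum(p_i * log(p_i)) over taxa with p_i > 0.

    `abundances` may be counts or relative abundances; they are normalized here.
    The log base is an assumption of this sketch (natural log by default).
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p, base)
    return h


def genus_richness(abundances):
    """Richness: the number of taxa with a non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


# Toy profiles (invented values, not data from the study).
even_profile = [25, 25, 25, 25]          # 4 taxa, perfectly even
skewed_profile = [95, 1, 1, 1, 1, 1]     # 6 taxa, one dominant

for name, profile in [("even", even_profile), ("skewed", skewed_profile)]:
    print(name, genus_richness(profile), round(shannon_index(profile), 3))
```

The skewed profile has the higher richness but the lower Shannon index, which is the same kind of decoupling reported above for P3 after reanalysis.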

Following reanalysis and measurement of the beta-diversity as defined by the Bray–Curtis index, we found a higher similarity of the profiles obtained at the genus level between P1, P2, P4, and P5, but P3 remained an outlier (Fig. 6). However, when comparing the metabarcoding and metagenomic partners, we still found lower inter-individual variance than inter-laboratory variance, as previously observed.

Considering the richness of bacterial genera identified as shared between metabarcoding and metagenomics or exclusively identified by a single partner (Supplementary Fig. 4B), the first group of partners (P1, P2, and P4), previously identified as the one with the highest number of bacterial genera exclusively identified by a single metabarcoding partner, was now the group with the most abundant shared bacterial genera. The metabarcoding partners-exclusive bacterial genera represented an average total relative abundance ranging from 8.4 to 26.9%, depending on the partners and samples. The values were of the same order of magnitude as those previously measured for P1, but they were lower for P2 and P4. The second group, represented by P3, was still dominated by bacterial genera exclusively identified by MGP, but their average total relative abundance, ranging from 13.9 to 25.1% depending on the samples, was slightly lower. These genera, absent from the P3 profiles, still included the most abundant human gut bacterial genera, such as _Bacteroides_ and _Prevotella_. The third group, represented by P5, was still dominated by bacterial genera shared between the metabarcoding and metagenomic partners. For this partner, the bacterial genera exclusively identified by metagenomics still represented a low average total relative abundance, ranging from 2.5 to 7.8% depending on the samples.

In this reanalysis, the number of bacterial genera shared between all partners was higher, between 21 and 33 (Supplementary Fig. 7), representing 20.4% to 62.1% of the total relative abundances of the bacterial genera, followed by the metabarcoding partners-exclusive genera, accounting for 16 to 22 members or 7.1% to 29.6% of the total relative abundances. We noticed that exclusive bacterial genera were dominant in P5 and MGP, with 8 to 19 genera, representing less than 0.2% of the relative abundances, and 14 to 20 genera, representing between 2.4 and 7.8% of the relative abundances, respectively. The data reprocessing did not shift the bacterial genus profiles obtained by P5 compared to the other partners. This may be because the taxonomic annotation database used in this reprocessing analysis was the same as the one used by P5. All partners, excluding P3, tended to share between 6 and 10 genera, accounting for 6.3% to 45.4% of the total relative abundances.
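The beta-diversity comparison used in this section (inter-individual versus inter-laboratory differences under the Bray–Curtis index) can be illustrated with a minimal sketch. It is a simplification: it compares mean pairwise dissimilarities rather than variances, and the laboratory names, sample names, and abundances are invented for illustration only.

```python
from itertools import combinations


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two {genus: abundance} profiles."""
    genera = set(p) | set(q)
    num = sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in genera)
    den = sum(p.get(g, 0.0) + q.get(g, 0.0) for g in genera)
    if den == 0:
        raise ValueError("both profiles are empty")
    return num / den


def mean_dissimilarity(profiles, fixed):
    """Mean Bray-Curtis over pairs that share key part `fixed` and differ in the other.

    profiles: dict mapping (lab, sample) -> {genus: abundance}
    fixed=0 holds the laboratory constant (inter-individual differences);
    fixed=1 holds the sample constant (inter-laboratory differences).
    """
    other = 1 - fixed
    values = [
        bray_curtis(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
        if a[fixed] == b[fixed] and a[other] != b[other]
    ]
    if not values:
        raise ValueError("no comparable pairs")
    return sum(values) / len(values)


# Hypothetical genus-level relative abundances (not data from the study).
profiles = {
    ("labA", "S1"): {"Bacteroides": 0.6, "Faecalibacterium": 0.4},
    ("labA", "S2"): {"Bacteroides": 0.5, "Faecalibacterium": 0.5},
    ("labB", "S1"): {"Bacteroides": 0.1, "Faecalibacterium": 0.3, "Unclassified": 0.6},
    ("labB", "S2"): {"Bacteroides": 0.1, "Faecalibacterium": 0.2, "Unclassified": 0.7},
}

inter_individual = mean_dissimilarity(profiles, fixed=0)
inter_laboratory = mean_dissimilarity(profiles, fixed=1)
print(inter_individual, inter_laboratory)
```

In this toy case the same sample looks more different across the two laboratories than two different samples look within one laboratory, which is the pattern the reanalysis did not eliminate.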

DISCUSSION

In this multicenter study, a similar DNA standard solution and DNA isolated from human fecal samples were provided in triplicate to partners who are experts in gut microbiome profiling by shotgun metagenomics or metabarcoding, with the aim of comparing the impact of their routine sequencing methods and bioinformatic pipelines in resolving bacterial profiles at the genus and OTU, ASV, or species level. To differentiate the impact of sequencing protocols from that of bioinformatic strategies, all raw sequencing data from the partial-length metabarcoding partners were reanalyzed using a single bioinformatic pipeline. The reasonable expectation is that inter-individual (inter-sample) differences should be the primary driver of microbiome profile stratification. Our work shows that this is not the case for metabarcoding pipelines, hence questioning the possibility of standardization of 16S rRNA-based approaches. Our findings highlight significant differences in the bacterial species alpha-diversity index and genus richness, as well as significant differences in the beta-diversity index, showing a lower inter-individual variance than inter-laboratory variance. Our study also reports a dominance of genera exclusively identified by a single metabarcoding partner. These differences in the bacterial genus profiles between partners, which quantify methodological biases, are seldom documented and appear greater in magnitude than expected and than reported to date in the literature.

The sequencing strategy, including the choice of primers for metabarcoding, greatly affects the analysis outcomes. P3 failed to identify a few common bacterial genera, such as _Limosilactobacillus_ or the predominant human gut commensals _Bacteroides_ and _Prevotella_, or identified them at too low a proportion, such as _Faecalibacterium_, even following data reprocessing with a single and different bioinformatic pipeline. In the original dataset, while providing a high species α-diversity index, this partner was also the one with the lowest genus richness. The USEARCH website (https://drive5.com/usearch/manual/uclust_algo.html) states that UCLUST, the clustering method used by this partner, is not designed for OTU clustering; this has also been reported in an empirical study and may explain the inflated α-diversity. Following data reprocessing, these two alpha-diversity indices became among the lowest and more consistent with each other. This partner, which provided the lowest number of raw sequences, also accounted for the highest proportion of unclassified sequences compared to the other partial-length metabarcoding partners, which remained the case after data reprocessing. Unlike for the other partial-length metabarcoding partners, changing the bioinformatic pipeline did not correct the outlier position highlighted by beta-diversity analysis of the original dataset. The absence or underestimation of a few of the most abundant human gut bacterial genera explains the outlier position of this partner and highlights the importance of the sequencing strategy. We can hypothesize that the primers and/or the PCR amplification protocol chosen by P3 do not amplify the V3-V4 region of the 16S rRNA gene of these bacterial genera.
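A hypothesis of this kind can be screened in silico by checking whether a primer pair matches the target region of a reference 16S rRNA sequence. The sketch below is a deliberately simple illustration, assuming a fixed mismatch tolerance and ignoring 3'-end effects, melting temperature, and amplicon length. The primers and templates are short invented strings, not the primers used by any partner; only the IUPAC ambiguity codes are standard.

```python
# IUPAC nucleotide codes: each primer symbol maps to the bases it tolerates.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each IUPAC symbol (e.g. R = A/G pairs with Y = T/C).
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def reverse_complement(primer):
    """Reverse complement of a (possibly degenerate) primer."""
    return "".join(COMPLEMENT[base] for base in reversed(primer))


def mismatches(primer, site):
    """Number of positions where the site base is not allowed by the primer."""
    return sum(1 for p, s in zip(primer, site) if s not in IUPAC[p])


def best_match(primer, template):
    """Slide the primer along the template; return (fewest mismatches, position)."""
    n = len(primer)
    if len(template) < n:
        raise ValueError("template shorter than primer")
    return min(
        (mismatches(primer, template[i:i + n]), i)
        for i in range(len(template) - n + 1)
    )


def is_amplifiable(forward, reverse, template, max_mismatches=1):
    """True if both primers bind within tolerance, forward site upstream of reverse."""
    fwd_mm, fwd_pos = best_match(forward, template)
    rev_mm, rev_pos = best_match(reverse_complement(reverse), template)
    return fwd_mm <= max_mismatches and rev_mm <= max_mismatches and fwd_pos < rev_pos


# Hypothetical toy primers and templates (illustration only).
FORWARD = "ACGWNGG"
REVERSE = "TTCCRA"
template_ok = "GGACGTCGGTTTTAAAACCCCTCGGAAGG"
template_divergent = "AAAAAAAAAATCGGAAAA"  # forward binding site mutated

print(is_amplifiable(FORWARD, REVERSE, template_ok))
print(is_amplifiable(FORWARD, REVERSE, template_divergent))
```

Running such a check against reference sequences of the genera of interest, with the actual primers, would indicate which taxa a given primer pair is likely to miss before any sequencing is done.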

The settings for genus-level sequence assignment matter greatly for the final outcome of metabarcoding analysis. P1 had the particularity of identifying many partner-exclusive bacterial genera, most at a low proportion, but together accounting for up to 10% of the total bacterial diversity. These were not identified following reprocessing of the raw sequences with a new bioinformatic pipeline. We hypothesize that the sequence homology threshold for bacterial genus identification in the bioinformatic pipeline was set too low to allow accurate bacterial genus identification, hence resulting in overestimation of the diversity.

The choice of the reference database markedly influenced the genus-level sequence assignment, and thereby the outcome of metabarcoding analysis. P1 and P3 both used the Greengenes database for taxonomic classification. Unlike the other databases used in this study, such as SILVA and RDP, Greengenes taxonomy is assigned based on automatic de novo 16S rRNA gene tree construction and rank mapping from other taxonomic sources, mainly the NCBI, which is not curated. Although still included in some metabarcoding analysis packages, such as QIIME, the database has not been updated for the past ten years [74]. Although the Greengenes database website recommends the use of more up-to-date sources for taxonomic annotation, this database is still used and referenced in more than a thousand publications each year. The recently published Greengenes2 [75], a reference tree that unifies genomic and 16S rRNA databases, should give a real chance of standardization across methods.

In beta-diversity analysis, P2 and P4 displayed similar microbiota profiles, which may be partly due to the use of a common database for taxonomic identification. The use of the SILVA database and the presence of numerous sub-genera or intermediate taxonomic groups explain the high number of partner-exclusive bacterial genera identified for these two partners, accounting for almost half of the total bacterial genera diversity retrieved. Most of the

observed differences were due to the identification of bacterial genera sub-taxa, belonging to _Ruminococcus_, _Prevotella_, _Ruminiclostridium_, or _Lachnospira_, which are mostly represented by uncultured bacteria. As previously reported, this observation may be due to the large number of sequences present in the SILVA database, a few of them being associated with intermediate taxonomic ranks that are not present in other databases [76]. In this case, we assume that the presence of hypothetical sub-groups of bacterial genera in the reference database is the main driver of the dissimilarity measured. To a lesser extent, a similar trend was also observed for partner P5, which used the RDP database. However, only the genus _Clostridium_ was divided into different sub-genera or intermediate ranks, which impacted the measurement of beta-diversity compared to MGP. Use of the RDP database nevertheless yielded the highest level of similarity to the bacterial profiles obtained by MGP or full-length metabarcoding.

Separating the impact of sequencing platforms and bioinformatic pipelines in metabarcoding analysis shows that both influence the outcome. Although the analysis of sequences derived from various sequencing platforms using a single pipeline yielded greater similarities and diminished the “laboratory effect” (inter-laboratory differences), it still did not completely mask inter-platform differences. Overall, both sequence

production and bioinformatics influence the distribution of samples, and both should be rigorously standardized if we are to expect distributions whereby samples from the same individual

cluster irrespective of who runs the analysis. Reprocessing of partial-length metabarcoding partner 16S rRNA sequence files with a single bioinformatic pipeline highlighted the biases due to

the use of different databases for taxonomic annotation. The results highlighted that the profile dissimilarities between P1, P2, P4, and P5 are due to bioinformatic differences in the way taxonomic annotation is carried out, which explains why the use of a single bioinformatic procedure homogenizes the alpha- and beta-diversity outputs. It has been previously shown that in many instances Greengenes, SILVA, and RDP cannot be mapped reliably to one another [74], thus explaining much of the dissimilarity observed before reprocessing of sequences using only the RDP database for taxonomic annotation. However, despite the use of a single bioinformatic pipeline and similar taxonomic annotation databases, none of the bacterial genus profiles reached an inter-laboratory variance lower than the inter-individual variance.

The use of the full-length 16S rRNA gene in LUMI-Seq® minimizes the presence of exclusive bacterial genera and improves the identification of unclassified reads, which tends to provide profiles more similar to those provided by MGP and P5 than to those of the other partial-length metabarcoding partners. As previously demonstrated, the average bacterial species alpha-diversity as measured by the Shannon index was higher in full-length than in partial-length metabarcoding [23]. However, the relative proportion of unclassified bacterial sequences was high, which may be due to the missing information in the full-length

16S rRNA gene database used. We also noticed the absence of the genus _Pseudomonas_ in the mock samples as well as _Alistipes_ in S1_1 and _Bifidobacterium_ in all samples except for S1_1

and S5_2, while they were identified by all other partners. These latter results contradict what has been reported by Jeong et al. using a similar methodology applied to gut microbiota

profiling. Here again, the primer choice may have had a strong impact.

CONCLUSIONS

Previous inter-laboratory studies reported biases in bacterial taxonomic profiling following the transfer of materials ranging from raw biospecimens [77] and human stool samples [36] to raw 16S rRNA sequences obtained from mock microbial community samples [78]. Here, technical replicates of DNA extracted from human stool samples were transferred as aliquots to different laboratories for sequencing and analysis. This approach allowed us, for the first time, to characterize specific biases arising from library preparation and sequence production all the way to the bioinformatic steps, without having to consider the sample collection or DNA extraction protocols. Reprocessing of raw 16S

rRNA sequences using a single bioinformatic pipeline also allowed measurement of biases specifically due to the bioinformatics pipeline. This multi-center evaluation study of gut microbiome

profiling reveals major biases mainly due to library preparation and the databases used for taxonomic annotation in bioinformatic pipelines for partial-length metabarcoding. Whereas biases

due to library preparation have been evaluated [8], the impact of database choice alone on bacterial genus taxonomic annotation has been little investigated to date. To our knowledge, studies reporting bioinformatic pipeline benchmarks for metabarcoding profiling of bacteria from mock microbial communities [79] or human stool samples [14] were performed using similar reference databases. Yet it is known that databases cannot be mapped reliably onto one another [74], and differences between bacterial profiling using mock communities have been evaluated [76]. This study highlights major differences in bacterial genus identification and relative abundance assessment due to the bioinformatic pipeline used, primarily due to the choice of the database used for taxonomic annotation. Given the choices to be made concerning primer design and PCR amplification protocol in the sequencing strategy, as well as the taxonomic reference database and sequence homology threshold in the bioinformatic pipeline, we recommend the systematic use of _in silico_ methods to test the relevance of these combinations of choices against the objective of the study. This study also reveals that the use of a single bioinformatic pipeline does not reduce the proportion of partner-exclusive bacterial genera enough to bring the inter-laboratory variance below the inter-individual variance between metabarcoding and metagenomic partners.

For laboratories to control for the presence of false-positive or false-negative bacterial genera, to accurately evaluate their pipeline, or to set up standard next-generation sequencing protocols and bioinformatic pipelines, publicly or commercially available reference biospecimens, cells, and DNA reagents should be used. Such gut-representative DNA mock community standards have recently been developed for the microbiome field [80], and others have been made commercially available by companies such as ATCC (MSA-1006). Other companies, such as Zymo Research, also provide gut cell microbiome standards and fecal sample references (ZymoBIOMICS). As previously highlighted, the use of these standards is critical for building clinical microbiota profiling and, in research laboratories, for improving publication reproducibility as well as the transportability of methods and results to routine practice [81]. In the near future, raw metabarcoding or metagenomic DNA sequences obtained from mock microbial communities representative of stool specimens or fecal samples, made available in publications with the pipeline reported as set out by the STORMS initiative [82], may also be systematically used by laboratories for pipeline evaluation. If microbiome profiling is ever to be made available on the dashboard of clinicians, standardization and inter-laboratory homogeneity of outputs will be crucial.

DATA AVAILABILITY

The datasets supporting the conclusions of this article (metagenomic and metabarcoding FASTQ files) are available in NCBI BioProject under

accession number PRJNA911046.

ABBREVIATIONS

* IGC2: Integrated gene catalogue 2
* IHMS: International human microbiome standards
* INRAE: Institut national de recherche pour l'agriculture, l'alimentation et l'environnement
* MGP: Metagenomic profiling
* MSP: Metagenomic species
* PCoA: Principal coordinate analysis

REFERENCES

* Vincent, A. T., Derome, N., Boyle, B., Culley, A. I. & Charette, S. J. Next-generation sequencing (NGS) in the microbiological world: How to make the most of your money. _J. Microbiol. Methods_ 138, 60–71 (2017).
* Nearing, J. T., Comeau, A. M. & Langille, M. G. I. Identifying biases and their potential solutions in human microbiome studies. _Microbiome_ https://doi.org/10.1186/s40168-021-01059-0 (2021).
* Penington, J. S. _et al._ Influence of fecal collection conditions and 16S rRNA gene sequencing at two centers on human gut microbiota analysis. _Sci. Rep._ 8, 4386 (2018).
* Ilett, E. E. _et al._ Gut microbiome comparability of fresh-frozen versus stabilized-frozen samples from hospitalized patients using 16S rRNA gene and shotgun metagenomic sequencing. _Sci. Rep._ 9, 13351 (2019).
* Salter, S. J. _et al._ Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. _BMC Biol._ 12, 87 (2014).
* Costea, P. I. _et al._ Towards standards for human fecal sample processing in metagenomic studies. _Nat. Biotechnol._ 35, 1069–1076 (2017).
* Lim, M. Y., Song, E.-J., Kim, S. H., Lee, J. & Nam, Y.-D. Comparison of DNA extraction methods for human gut microbial community profiling. _Syst. Appl. Microbiol._ 41, 151–157 (2018).
* Sze, M. A. & Schloss, P. D. The impact of DNA polymerase and number of rounds of amplification in PCR on 16S rRNA gene sequence data. _mSphere_ https://doi.org/10.1128/mSphere.00163-19 (2019).
* Jones, M. B. _et al._ Library preparation methodology can influence genomic and functional predictions in human microbiome research. _Proc. Natl. Acad. Sci. U.S.A._ 112, 14024–14029 (2015).
* Schirmer, M. _et al._ Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. _Nucl. Acids Res._ 43, e37 (2015).
* Thorsen, J. _et al._ Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies. _Microbiome_ 4, 62 (2016).
* Hillmann, B. _et al._ Evaluating the information content of shallow shotgun metagenomics. _mSystems_ https://doi.org/10.1128/mSystems.00069-18 (2018).
* Whon, T. W. _et al._ The effects of sequencing platforms on phylogenetic resolution in 16S rRNA gene profiling of human feces. _Sci. Data_ 5, 180068 (2018).
* Marizzoni, M. _et al._ Comparison of bioinformatics pipelines and operating systems for the analyses of 16S rRNA gene amplicon sequences in human fecal samples. _Front. Microbiol._ 11, 1262 (2020).
* Weiss, S. _et al._ Normalization and microbial differential abundance strategies depend upon data characteristics. _Microbiome_ 5, 27 (2017).
* Lynch, M. D. J. & Neufeld, J. D. Ecology and exploration of the rare biosphere. _Nat. Rev. Microbiol._ 13, 217–229 (2015).
* Abellan-Schneyder, I. _et al._ Primer, pipelines, parameters: Issues in 16S rRNA gene sequencing. _mSphere_ https://doi.org/10.1128/mSphere.01202-20 (2021).
* Wei, Z.-G. _et al._ Comparison of methods for picking the operational taxonomic units from amplicon sequences. _Front. Microbiol._ 12, 644012 (2021).
* Nearing, J. T. _et al._ Microbiome differential abundance methods produce different results across 38 datasets. _Nat. Commun._ https://doi.org/10.1038/s41467-022-28034-z (2022).
* Caruso, V., Song, X., Asquith, M. & Karstens, L. Performance of microbiome

sequence inference methods in environments with varying biomass. _mSystems_ https://doi.org/10.1128/mSystems.00163-18 (2019).
* Acinas, S. G. _et al._ Fine-scale phylogenetic architecture of a complex bacterial community. _Nature_ 430, 551–554 (2004).
* Větrovský, T. & Baldrian, P. The variability of the 16S rRNA gene in bacterial genomes and its consequences for bacterial community analyses. _PLoS ONE_ 8, e57923 (2013).
* Jeong, J. _et al._ The effect of taxonomic classification by full-length 16S rRNA sequencing with a synthetic long-read technology. _Sci. Rep._ 11, 1727 (2021).
* Hassler, H. B. _et al._ Phylogenies of the 16S rRNA gene and its hypervariable regions lack concordance with core genome phylogenies. _Microbiome_ https://doi.org/10.1186/s40168-022-01295-y (2022).
* Pereira-Marques, J. _et al._ Impact of host DNA and sequencing depth on the taxonomic resolution of whole metagenome sequencing for microbiome analysis. _Front. Microbiol._ 10, 1277 (2019).
* Gweon, H. S. _et al._ The impact of sequencing depth on the inferred taxonomic composition and AMR gene content of metagenomic samples. _Environ. Microbiome_ https://doi.org/10.1186/s40793-019-0347-1 (2019).
* Laudadio, I. _et al._ Quantitative assessment of shotgun metagenomics and 16S rDNA amplicon sequencing in the study of human gut microbiome. _OMICS_ 22, 248–254 (2018).
* Park, S.-Y., Ufondu, A., Lee, K. & Jayaraman, A. Emerging computational tools and models for studying gut microbiota composition and function. _Curr. Opin. Biotechnol._ 66, 301–311 (2020).
* Jovel, J. _et al._ Characterization of the gut microbiome using 16S or shotgun metagenomics. _Front. Microbiol._ 7, 459 (2016).
* Mitra, S. _et al._ Analysis of the intestinal microbiota using SOLiD 16S rRNA gene sequencing and SOLiD shotgun sequencing. _BMC Genomics_ 14(Suppl 5), S16 (2013).
* Rausch, P. _et al._ Comparative analysis of amplicon and metagenomic sequencing methods reveals key features in the evolution of animal metaorganisms. _Microbiome_ 7, 133 (2019).
* Biegert, G., Karpinets, T., Wu, X., Alam, M.B.E., Sims, T.T., Yoshida-Court, K., _et al._ Diversity and composition of gut microbiome of cervical cancer patients by 16S rRNA and whole-metagenome sequencing (2020).
* Vogtmann, E. _et al._ Colorectal cancer and the human gut microbiome: Reproducibility with whole-genome shotgun sequencing. _PLoS ONE_ 11, e0155362 (2016).
* Ranjan, R., Rani, A., Metwally, A., McGee, H. S. & Perkins, D. L. Analysis of the microbiome: Advantages of whole genome shotgun versus 16S amplicon sequencing. _Biochem. Biophys. Res. Commun._ 469, 967–977 (2016).
* Clooney, A. G. _et al._ Comparing apples and oranges? Next generation sequencing and its impact on microbiome analysis. _PLoS ONE_ 11, e0148028 (2016).
* Han, D. _et al._ Multicenter assessment of microbial community profiling using 16S rRNA gene sequencing and shotgun metagenomic sequencing. _J. Adv. Res._ 26, 111–121 (2020).
* Criscuolo, A. & Brisse, S. AlienTrimmer: A tool to quickly and accurately trim off multiple short contaminant sequences from high-throughput sequencing reads. _Genomics_ 102, 500–506 (2013).
* Wen, C. _et al._ Quantitative metagenomics reveals unique gut microbiome biomarkers in ankylosing spondylitis. _Genome Biol._ 18, 142 (2017).
* Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. _Nat. Methods_ 9, 357–359 (2012).
* Cotillard, A. _et al._ Dietary intervention impact on gut microbial gene richness. _Nature_ 500, 585–588 (2013).
* Le Chatelier, E. _et al._ Richness of human gut microbiome correlates with metabolic markers. _Nature_ 500, 541–546 (2013).
* Plaza Oñate, F. _et al._ MSPminer: Abundance-based

reconstitution of microbial pan-genomes from shotgun metagenomic data. _Bioinformatics_ 35, 1544–1552 (2019).
* Parks, D. H. _et al._ A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. _Nat. Biotechnol._ 36, 996–1004 (2018).
* Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. _Genome Biol._ 20, 257 (2019).
* Schloss, P. D. _et al._ Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. _Appl. Environ. Microbiol._ 75, 7537–7541 (2009).
* Caporaso, J. G. _et al._ QIIME allows analysis of high-throughput community sequencing data. _Nat. Methods_ 7, 335–336 (2010).
* Escudié, F. _et al._ FROGS: Find, rapidly, OTUs with galaxy solution. _Bioinformatics_ 34, 1287–1294 (2018).
* Callahan, B. J. _et al._ DADA2: High-resolution sample inference from Illumina amplicon data. _Nat. Methods_ 13, 581–583 (2016).
* Westcott, S. L. & Schloss, P. D. OptiClust, an improved method for assigning amplicon-based sequence data to operational taxonomic units. _mSphere_ https://doi.org/10.1128/mSphereDirect.00073-17 (2017).
* Mahé, F., Rognes, T., Quince, C., de Vargas, C. & Dunthorn, M. Swarm: Robust and fast clustering method for amplicon-based studies. _PeerJ_ 2, e593 (2014).
* Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. _Bioinformatics_ 26, 2460–2461 (2010).
* Magoč, T. & Salzberg, S. L. FLASH: Fast length adjustment of short reads to improve genome assemblies. _Bioinformatics_ 27, 2957–2963 (2011).
* Maidak, B. L. _et al._ The RDP (Ribosomal Database Project) continues. _Nucl. Acids Res._ 28, 173–174 (2000).
* DeSantis, T. Z. _et al._ Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. _Appl. Environ. Microbiol._ 72, 5069–5072 (2006).
* Camacho, C. _et al._ BLAST+: Architecture and applications. _BMC Bioinform._ 10, 421 (2009).
* Wang, Q., Garrity, G. M., Tiedje, J. M. & Cole, J. R. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. _Appl. Environ. Microbiol._ 73, 5261–5267 (2007).
* Quast, C. _et al._ The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. _Nucl. Acids Res._ 41, D590–D596 (2013).
* O’Leary, N. A. _et al._ Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation. _Nucl. Acids Res._ 44, D733–D745 (2016).
* Blin, K. ncbi-genome-download: Zenodo (2023).
* Schoch, C. L. _et al._ NCBI Taxonomy: A comprehensive update on curation, resources and tools. _Database (Oxford)_ https://doi.org/10.1093/database/baaa062 (2020).
* Seemann, T. barrnap 0.9: Rapid ribosomal RNA prediction (2013). https://github.com/tseemann/barrnap.
* Li, W. & Godzik, A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. _Bioinformatics_ 22, 1658–1659 (2006).
* Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: Accelerated for clustering the next-generation sequencing data. _Bioinformatics_ 28, 3150–3152 (2012).
* Sayers, E. W. _et al._ Database resources of the National Center for Biotechnology Information. _Nucl. Acids Res._ 47, D23–D28 (2019).
* Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. _EMBnet J._ 17, 10 (2011).
* Bankevich, A. _et al._ SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. _J. Comput. Biol._ 19, 455–477 (2012).
* Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: A fast and accurate Illumina Paired-End reAd mergeR. _Bioinformatics_ 30, 614–620 (2014).
* Edgar, R. C., Haas, B. J., Clemente, J. C., Quince, C. & Knight, R. UCHIME improves sensitivity and speed of chimera detection. _Bioinformatics_ 27, 2194–2200 (2011).
* Cole, J. R. _et al._ Ribosomal Database Project: Data and tools for high throughput rRNA analysis. _Nucl. Acids Res._ 42, D633–D642 (2014).
* Dereeper, A. _et al._ Phylogeny.fr: Robust phylogenetic analysis for the non-specialist. _Nucl. Acids Res._ 36, W465–W469 (2008).
* Castresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. _Mol. Biol. Evol._ 17, 540–552 (2000).
* Guindon, S. _et al._ New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. _Syst. Biol._ 59, 307–321 (2010).
* Chevenet, F., Brun, C., Bañuls, A.-L., Jacq, B. & Christen, R. TreeDyn: Towards dynamic graphics and annotations for analyses of trees. _BMC Bioinform._ 7, 439 (2006).
* Balvočiūtė, M. & Huson, D. H. SILVA, RDP, Greengenes, NCBI and OTT: How do these taxonomies compare? _BMC Genomics_ https://doi.org/10.1186/s12864-017-3501-4 (2017).
*

McDonald, D. _et al._ Greengenes2 unifies microbial data in a single reference tree. _Nat. Biotechnol._ https://doi.org/10.1038/s41587-023-01845-1 (2023).
* Park, S.-C. & Won, S. Evaluation of 16S rRNA databases for taxonomic assignments using a mock community. _Genomics Inform._ 16, e24 (2018).
* Sinha, R. _et al._ Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. _Nat. Biotechnol._ 35, 1077–1086 (2017).
* O’Sullivan, D. M. _et al._ An inter-laboratory study to investigate the impact of the bioinformatics component on microbiome analysis using mock communities. _Sci. Rep._ 11, 10590 (2021).
* Straub, D. _et al._ Interpretations of environmental microbial community studies are biased by the selected 16S rRNA (Gene) amplicon sequencing pipeline. _Front. Microbiol._ 11, 550420 (2020).
* Amos, G. C. A. _et al._ Developing standards for the microbiome field. _Microbiome_ 8, 98 (2020).
* Scherz, V., Greub, G. & Bertelli, C. Building up a clinical microbiota profiling: A quality framework proposal. _Crit. Rev. Microbiol._ 48(3), 356–375 (2021).
* Mirzayi, C. _et al._ Reporting guidelines for human microbiome research: The STORMS checklist. _Nat. Med._ 27, 1885–1892 (2021).
ACKNOWLEDGEMENTS The authors thank the five healthy human donors as well as all external partners involved in this multicenter study. Particularly, François Le

Vacon, Yao Amouzou, Morgane Pierre, Younous Adrouji, Thomas Carton and Sébastien Leuillet from Biofortis SAS, are acknowledged for their participation as partners as well as their support in

reviewing the article. Sophie Domingues is thanked for professional English editing of the article.
FUNDING Joël Doré’s contribution was facilitated in part by funding from the European

Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement ERC-2017-AdG No. 788191—Homo.symbiosus). AUTHOR INFORMATION AUTHORS AND

AFFILIATIONS
* Université Paris-Saclay, INRAE, MetaGenoPolis, 78350, Jouy-en-Josas, France: Hugo Roume & Joël Doré
* Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France: Stanislas Mondot & Joël Doré
* Discovery & Front End Innovation, Lesaffre Institute of Science & Technology, Lesaffre International, 101 rue de Menin, 59700, Marcq-en-Barœul, France: Hugo Roume
* BIOASTER, Microbiology Technology Institute, 40 Avenue Tony Garnier, 69007, Lyon, France: Adrien Saliou
* Biofortis SAS, 3 Route de la Chatterie, Saint-Herblain, 44800, Nantes, France: Sophie Le Fresne-Languille
CONTRIBUTIONS H.R. and J.D. designed the study. J.D. supervised the collection of human fecal samples. H.R. isolated the DNA, transferred it to the

partner laboratories, performed the shotgun sequencing, and analyzed the metabarcoding and metagenomic results provided by the partners. S.M. performed the reprocessing of all partial-length

16S rRNA sequences using his own bioinformatic pipeline. H.R., S.M., and J.D. wrote the manuscript. All authors and partner laboratories discussed the results and commented on the

manuscript.
CORRESPONDING AUTHOR Correspondence to Joël Doré.
ETHICS DECLARATIONS
COMPETING INTERESTS Adrien Saliou and Sophie Le Fresne-Languille declare a conflict of interest. The other authors declare that they have no competing interests. We acknowledge a link of interest for Bioaster and Biofortis, which may provide microbiome analysis as a service. However, they in no way influenced the main message of the publication, which merely states that 16S rDNA-based metabarcoding approaches suffer from major limitations and may very likely never be standardizable. This information is aimed at the scientific community, to help promote more robust microbiome work in the future, be it by academic or by private stakeholders. It should be noted that neither of these project partners paid for the implementation of the project. Each of these partners financed the conduct of the experiments for which it was responsible, and the project as a whole was designed, supervised and managed by the academic partner, MetaGenoPolis.
ADDITIONAL INFORMATION
PUBLISHER'S NOTE Springer Nature remains neutral with regard to jurisdictional claims

in published maps and institutional affiliations.
SUPPLEMENTARY INFORMATION Supplementary Information 1. Supplementary Table 4. Supplementary Table 5.
RIGHTS AND PERMISSIONS
OPEN ACCESS

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as

long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third

party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the

article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the

copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
ABOUT THIS ARTICLE
CITE THIS ARTICLE Roume, H., Mondot, S., Saliou, A. _et al._ Multicenter evaluation of gut microbiome profiling by next-generation sequencing reveals major biases in partial-length metabarcoding approach. _Sci Rep_ 13, 22593 (2023). https://doi.org/10.1038/s41598-023-46062-7
* Received: 27 February 2023
* Accepted: 27 October 2023
* Published: 18 December 2023
* DOI: https://doi.org/10.1038/s41598-023-46062-7
