
ABSTRACT

Next-generation sequencing workflows, using either metabarcoding or metagenomic approaches, have contributed massively to expanding knowledge of the human gut microbiota, but
methodological bias compromises reproducibility across studies. Although these biases have each been quantified in several comparative analyses, none of these analyses has measured inter-laboratory
reproducibility using similar DNA material. Here, we designed a multicenter study involving seven participating laboratories dedicated to partial-length (P1 to P5) or full-length (P6) metabarcoding,
or metagenomic profiling (MGP) using DNA from a mock microbial community or extracted from 10 fecal samples collected at two time points from five donors. Fecal material was collected, and
the DNA was extracted according to the IHMS protocols. The mock and isolated DNA were then provided to the participating laboratories for sequencing. Following sequencing analysis according
to the laboratories’ routine pipelines, relative taxonomic-count tables defined at the genus level were provided and analyzed. Large variations in alpha-diversity between laboratories,
uncorrelated with sequencing depth, were detected among the profiles. Half of the genera identified by P1 were unique to this partner and two-thirds of the genera identified by MGP were not
detected by P3. Analysis of beta-diversity revealed lower inter-individual variance than inter-laboratory variance. The taxonomic profiles of P5 and P6 were more similar to those of MGP
than those obtained by P1, P2, P3, and P4. Reanalysis of the raw sequences obtained by partial-length metabarcoding profiling, using a single bioinformatic pipeline, harmonized the
description of the bacterial profiles, which were more similar to each other, except for P3, and closer to the profiles obtained by MGP. This study highlights the major impact of the
bioinformatics pipeline, and primarily the database used for taxonomic annotation. Laboratories need to benchmark and optimize their bioinformatic pipelines using standards to monitor their
effectiveness in accurately detecting taxa present in gut microbiota.

The development of next-generation high-throughput sequencing
technologies has facilitated significant advances in microbial ecology, allowing the study of microbial communities at an unprecedented level of resolution, through the ability to profile
their diversity and characterize their genetic information, without prior cultivation1. In addition to their significant impact on our understanding of life forms, DNA sequencing
technologies, including metabarcoding and metagenomics, have opened up a new area of genomics for studying microbial ecosystems. Metabarcoding uses amplicon PCR sequencing, most often of the
16S rRNA gene as a phylogenetic marker that is restricted to bacteria and archaea. Metagenomics allows analysis of collective microbial genomes in their natural habitat, using shotgun
sequencing, which captures the entire genetic information of a sample. While single-gene amplicon sequencing only allows exploration of the taxonomic diversity of prokaryotic taxa,
sequencing the entire genomic content allows exploration of gene-encoded functions as well as information about the genomes of microorganisms from multiple prokaryotic taxa. Both sequencing
approaches have common biases and limitations, which have been reviewed2, such as those linked to sample collection3, biobanking4, contaminants5, the selection of DNA extraction
protocols6,7, library preparation methods8,9,10 approach over metabarcoding in detecting low-abundance bacteria36. None of these studies measured inter-laboratory reproducibility, using
similar DNA material to measure biases generated by sequencing protocols and bioinformatics pipelines or sequencing protocols only. Here, we report an inter-laboratory reproducibility study
of metabarcoding (P1 to P6) versus metagenomic profiling (MGP) approaches using similar DNA material extracted in triplicate following the International Human Microbiome Standards (IHMS) protocol from 10
fecal samples collected from five donors at two time points as well as one mock DNA community sample. In order to assess specific biases due to sequencing protocols, raw sequences delivered
by partners were reprocessed using a single bioinformatics pipeline.

METHODS

STUDY DESIGN, PARTICIPANTS, AND SAMPLING

Five healthy adult subjects (S1–S5) were enrolled. All research was
performed in accordance with the relevant guidelines. Approval for the Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRAE) to manage human-derived
biological samples was granted by the Ministry of Research and Education under approval number DC-2012-1728. Informed consent was obtained from all five donors. Their fecal samples were
collected at two time points in October 2018 (Sx_1) and January 2019 (Sx_2) following the IHMS protocols (SOP 05_V2) (Fig. 1A). The samples were collected by the subjects at home and stored
at room temperature in a stabilizing solution (RNAlater® Stabilization Solution, Thermo Fisher Scientific, Waltham, USA), and transported within 24 h to our facilities.

DNA EXTRACTION, STANDARD SOLUTION, AND TRANSFER TO PARTNER LABORATORIES

DNA extraction from stool samples was performed as recommended by IHMS SOP 06_V2 (MGP SOP 06_V3), in triplicate using a QIAsymphony®
DSP Virus/Pathogen Midi Kit (Qiagen). ZymoBIOMICS™ Microbial Community DNA Standard was used as an internal positive control (ref. D6306; Zymo Research) (Fig. 1A). It contains a mixture of
genomic DNA extracted from 10 microbial species comprising 8 bacteria (12% genomic DNA abundance each), a yeast, and a protist (2% genomic DNA abundance each), altogether covering a wide
range of GC contents (from 15 to 85%). Before further processing, the extracted DNA was quantified using Qubit™ Fluorometric Quantitation (Qubit™ dsDNA HS kit, ref Q32851, Thermo Fisher
Scientific) and qualified using DNA size profiling on a Fragment Analyzer™ (Genomic DNA 50 kb kit, ref DNF-467-O500, Agilent Technologies, Santa Clara, USA). Multiple DNA extracts obtained
from similar fecal samples were pooled and mixed into a single solution that was shared in equal amounts among the partner laboratories. Partner laboratories (P1 to P6) received 1 µg of the DNA
extracted in triplicate from the 10 fecal samples as well as triplicate solutions of the mock community DNA. Partner laboratories were not able to visually differentiate mock community DNA
from human fecal DNA samples, allowing us to use this mock community DNA as an internal control. The samples were shipped by mail, in a box, in 2 mL Eppendorf tubes maintained at 4 °C using cold
packs.

METAGENOMIC AND METABARCODING DNA SEQUENCING

Metagenomic analysis was performed in the MetaGenoPolis unit at the INRAE facilities by shotgun DNA sequencing (Fig. 1B, MGP). The libraries
were generated from 1 µg of high-molecular-weight DNA (> 20 kbp). Shearing of the DNA into fragments of approximately 150 bp was performed using an ultrasonicator E220 system (Covaris,
Woburn, USA), and DNA fragment library construction was performed using the 5500 SOLiD™ Fragment 48 Library Core Kit (Thermo Fisher Scientific). Purified and amplified DNA fragment libraries
were sequenced using an Ion Proton™ Sequencer (Thermo Fisher Scientific), with a minimum of 20 million high-quality single-end reads (150 bp) per library. Metabarcoding sequencing was
performed in the partner laboratories (P1 to P6). P1 to P5 performed partial-length (Fig. 1B, P1 to P5) and P6 full-length metabarcoding sequencing of the 16S rRNA gene (Fig. 1B, P6).
Details of the metabarcoding materials and methods used by each partner are provided in Supplementary Table 1. Upon reception, all partners performed DNA quantity control; P1 and P2
additionally performed DNA purity control, which also included DNA size control using dedicated methodologies. P1 to P5 sequenced the V3–V4 regions of the 16S rRNA gene in 2 × 250 bp (2 ×
300 bp for P2) using a MiSeq sequencer (Illumina, San Diego, USA). P1 and P2 used the same pair of primers, while P3, P4, and P5 used different primer pairs that
differed only slightly (Supplementary Table 2). P6 built libraries using the LUMI-Seq® methodology, enabling the sequencing of full-length 16S rRNA genes using the Illumina short-read
platform. The method incorporated randomized unique molecular barcodes on the 5ʹ ends of individual 16S rRNA gene template molecules. After molecular barcoding, the full-length 16S gene
V1–V9 was PCR amplified in order to make multiple copies and to increase the signal. The full-length fragments were then enzymatically tagmented while keeping the UMI tag on all pieces. The
library was then sequenced using a NextSeq500 sequencer, in 2 × 150 bp, using the classical Illumina sequencing workflow. Each partner committed to produce a minimum of 40 k reads per DNA
sample. Following sequencing, the number of total raw reads per partial-length metabarcoding partner varied from 3.64 (P3) to 18.99 (P5) million reads (Supplementary Fig. 1, raw reads
number). P6 sequenced an average of 6,625 full-length 16S rRNA reads per sample.

BIOINFORMATIC DATA ANALYSIS

Metagenomic reads were cleaned using Alien Trimmer v0.2.437 to remove resilient
sequencing adapters and trim low-quality nucleotides at the 3ʹ end, using a quality cut-off of 20 and a length cut-off of 45 bp. Cleaned reads were subsequently filtered to remove human DNA and
possible food-derived contaminant DNA (using the human genome GRCh37-p10, _Bos taurus_, and _Arabidopsis thaliana_ with an identity score threshold of 97%). Filtered high-quality reads were mapped,
with an identity threshold of 95%, onto the 10.4 million genes of the Integrated Gut Catalogue 2 (IGC2)38, using Bowtie v2.2.639 included in METEOR v3.2 software40. A table of the gene
abundance was generated by means of a two-step procedure using METEOR. First, the unique mapped reads (reads mapped to a unique gene in the catalogue) were attributed to their corresponding
genes. Second, shared reads (reads that mapped with the same alignment score to multiple genes in the catalogue) were attributed according to the ratio of their unique mapping counts. The
gene abundance table was processed for rarefaction and normalization and further analysis using the MetaOMineR v1.2 (momr) R package41. To decrease technical bias due to different sequencing
depths and avoid any artifacts of sample size on low-abundance genes, read counts were rarefied. The gene abundance table was rarefied to 14 million reads per sample by random sampling
without replacement. The resulting rarefied gene abundance table was normalized according to the FPKM strategy (normalization by the gene size and the total number of mapped reads
reported in frequency) to generate the gene abundance profile table. The gene count was computed as the number of genes detected (i.e., with a strictly positive abundance) in a given sample
after downsizing. For taxonomic profiling, the IGC2 catalogue was organized into 1990 Metagenomic Species (MSP), i.e., clusters with a minimum of 100 genes, using MSPminer42. MSP taxonomy
was assigned with the Genome Taxonomy Database43. The relative abundance of an MSP was computed as the mean abundance of its 100 ‘marker’ genes (i.e., the genes that correlate most strongly
with one another). If fewer than 10% of the ‘marker’ genes were detected in a sample, the abundance of the MSP was set to 0. The relative abundances at higher taxonomic ranks were computed as the sum of
the relative abundances of the MSP that belong to a given taxon. The MSP count was assessed as the number of MSP present in a sample (i.e., with a strictly positive abundance). The MSP table
was then resolved at the genus taxonomic level. For DNA sequences obtained from mock community samples, the genomes of the microbial strains present in the reagent, downloaded from a link
specified in the provider’s instruction manual (Zymo Research), were used to build a catalogue that then served as a reference database to obtain relative abundances of
microbial species using Kraken v.2.1.044. We acknowledge that the choice to use a genome catalogue of the ten species present in the mock community, according to the manufacturer’s
recommendation, makes it impossible to identify other species. The key steps of the bioinformatic pipeline used to analyze the metabarcoding data are described in Supplementary Table 2. P1
and P5 used mothur45 as the data analysis software, while P3 and P4 used QIIME46, P2 used FROGS47 and P6 DADA248. Only DADA2 used ASV as the clustering strategy, while the others used OTU.
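To make the OTU versus ASV distinction concrete, the following minimal Python sketch contrasts exact-sequence units with greedy centroid clustering at a similarity threshold. It is an illustration only and does not reproduce the algorithm of any partner's software: the function names (`identity`, `asv_units`, `greedy_otus`) are hypothetical, identity is computed position by position on equal-length toy sequences rather than from an alignment, and real ASV tools such as DADA2 additionally model and correct sequencing errors.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences.

    Toy measure (assumption): real tools compute identity from an alignment.
    """
    if len(a) != len(b):
        raise ValueError("toy identity requires equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def asv_units(reads):
    """ASV-style units: every distinct sequence is kept as its own unit."""
    return dict(Counter(reads))


def greedy_otus(reads, threshold=0.97):
    """Greedy centroid clustering at a global similarity threshold.

    Unique sequences are visited by decreasing abundance; each one joins the
    first existing centroid it matches at >= threshold, otherwise it becomes
    a new centroid. Returns a mapping centroid -> total read count.
    """
    counts = Counter(reads)
    otus = {}
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n
                break
        else:
            otus[seq] = n
    return otus
```

With five copies of a 100-nt sequence, two copies of a variant differing at one position, and three copies of a variant differing at ten positions, `asv_units` keeps three units, whereas `greedy_otus` at 97% merges the one-mismatch variant into the most abundant centroid and keeps the ten-mismatch variant separate.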
Among the partners using OTU clustering, different software was used, except for P1 and P5, which both used OptiClust49. P2 used Swarm50, P3 used UCLUST51, and P4 used
USEARCH51. OptiClust, UCLUST, and USEARCH use a clustering approach based on centroid selection and a global clustering threshold (set to 97% similarity), where closely related amplicons can
be placed into different OTUs, while Swarm clusters iteratively by using a small user-chosen local clustering threshold, allowing OTUs to reach their natural limits49. All partners used
FastQC for read quality control. The partners used different read merging software, except P1 and P3, which both used FLASH52. Following read merging, we noticed that P1 obtained merged reads
that were shorter (~ 150 bp) than those obtained by the other partners (~ 450 bp). Different thresholds were used for sequence removal (Supplementary Table 1). P1, P3, and P5 removed chimeric
sequences; P2, P4, and P5 removed rare OTUs; P2 and P6 removed sequences based on read size; P3 removed sequences based on read quality; and P1 was the only partner to remove homopolymers. P3 and P4 did not
use sequence denoising software. Except for P3 and P4, which used the same RDP classifier tools53, the partners used different taxonomic affiliation methods. P5 used Greengenes54, P2
used NCBI BLAST+55, and P6 a naive Bayesian classifier56. For partial-length metabarcoding, three different reference databases for taxonomic annotation were used. P1 and P3 used
Greengenes54, P2 and P4 used SILVA57, and P5 used RDP. P2 returned all taxonomies considering all BLAST best hits, together with a consensus taxonomy in which ambiguous taxa were tagged as
“Multi-affiliation”; these were considered unclassified taxa in the following analysis. P6, which used full-length metabarcoding, relied on an in-house database. Bacterial and archaeal genomes at
all assembly levels (Complete, Chromosome, Scaffold, Contig) were downloaded from the RefSeq database58 in May 2019, using ncbi-genome-download version 0.2.959. The taxonomic information of
the genomes was retrieved from the NCBI Taxonomy database taxdump files downloaded from the Taxonomy FTP site60. The 16S rRNA sequences were extracted from the genomes using barrnap version
0.961. For each genome, the extracted sequences were compared with each other using the clustering tool CD-HIT version 3.162,63 to keep only the unique sequences in each genome. The
database was complemented by a curated collection of prokaryotic 16S rRNA sequences from the 16S rRNA RefSeq Targeted Loci Project64, downloaded in April 2019. Sequences containing non-standard
nucleotides or having unclear species identity (e.g., containing “sp.”) were removed. All the remaining sequences were then clustered at 100% identity using CD-HIT. For each cluster, the longest
sequence was defined as the representative sequence, and the taxonomy of the sequences making up the cluster was checked for a majority (with a threshold of 90%). When a majority existed, the majority
taxonomy was selected as the cluster annotation. Otherwise, the cluster annotation was moved up the taxonomic ranks. For example, if 50% of the sequences in a cluster are annotated to
the species _Staphylococcus epidermidis_ and 50% to _Staphylococcus aureus_, the cluster is assigned to the genus _Staphylococcus_ and the annotation at
the species level is left empty. The final database contains 72,954 sequences that represent 15,041 species, 3151 genera, 510 families, and 52 phyla. Following data trimming using a dedicated
metabarcoding partner bioinformatic protocol, the number of sequences obtained by the partners varied from 1.7 to 8.2 million sequences (Supplementary Fig. 1, trimmed read). Sequence
trimming removed more than 43% of the raw sequences generated by P2, which used stricter trimming conditions (read size, chimeric sequences, and rare OTUs) than the other partners:
P1 and P5 removed 25% of their original raw sequences, and P3 and P4 less than 2%. To measure the biases among the partial-length metabarcoding partners that were due to the
bioinformatics pipeline only, the data analysis was repeated from the demultiplexed raw sequence data provided by P1 to P5, using a single, new bioinformatics pipeline. The key steps of the
bioinformatics pipeline used to analyze the metabarcoding data are described in Supplementary Table 3. Briefly, we used cutadapt65 to remove primers and SPAdes66 to correct for sequencing
errors. The paired-end reads were merged using PEAR67, chimeric sequences were removed using UCHIME 368, and the remaining sequences were clustered into ASV using Vsearch (v2.15.1).
Importantly, ASVs supported by fewer than 8 sequences were removed (default parameter). For phylogeny identification, RDPTools suite v2.1169 was used. Following data trimming,
approximately 30% of the original raw sequences were removed for P5, which is in a similar range to what this partner previously obtained, and 50% for P1, P2, and P3, which is two times higher
for P1 and twenty-five times higher for P3 than what these partners previously obtained, and in a similar range for P2. More surprisingly, 85% of the raw sequences generated by P4 were
removed (Supplementary Fig. 1, reprocess trimmed read), whereas this partner had originally removed less than 2% of its sequences.

STATISTICAL AND DATA ANALYSIS

The total number of bacterial genera
identified in all samples, by each partner, was summed to calculate the genus richness. A genus was counted only if it was present in at least two replicates out of three. The
partners provided the Shannon diversity index, calculated at the species taxonomic or OTU level, as a measure of the alpha-diversity. Both measurements of alpha-diversity were used to draw
the boxplot visualization carried out with the _ggplot_ function from the ggplot2 R package. Venn diagrams were plotted using the _draw.pairwise.venn_ function of the VennDiagram R package.
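The two alpha-diversity measures described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the partners' code: the function names and the toy replicate profiles are hypothetical, and the natural logarithm is assumed for the Shannon index because the text does not state which base the partners used.

```python
import math
from collections import defaultdict


def shannon_index(abundances):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with p_i > 0.

    Abundances are renormalized to proportions; natural log is an assumption.
    """
    total = sum(abundances)
    if total <= 0:
        return 0.0
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def genus_richness(replicates, min_replicates=2):
    """Number of genera with positive abundance in at least `min_replicates`
    of the replicate profiles (each profile maps genus -> relative abundance).
    """
    seen = defaultdict(int)
    for profile in replicates:
        for genus, abundance in profile.items():
            if abundance > 0:
                seen[genus] += 1
    return sum(1 for n in seen.values() if n >= min_replicates)


# Toy triplicate (hypothetical values): Blautia appears in one replicate only,
# so it is not counted under the two-out-of-three rule.
replicates = [
    {"Bacteroides": 0.5, "Prevotella": 0.5, "Blautia": 0.0},
    {"Bacteroides": 0.6, "Prevotella": 0.0, "Blautia": 0.4},
    {"Bacteroides": 0.7, "Prevotella": 0.3, "Blautia": 0.0},
]
richness = genus_richness(replicates)  # 2 genera pass the replicate filter
```

Requiring a genus in two of three replicates trades some sensitivity for robustness against spurious single-replicate detections, which matters when richness is compared across laboratories.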
The bacterial genera relative abundance count tables provided by the partners were concatenated into a single table (Fig. 1C). The bacterial genera listed in the partners’
relative count tables were compiled into a single list and then re-associated with the relative abundances provided by individual partners using the VLOOKUP function in Excel. Bray–Curtis indexes
were calculated using the _vegdist_ function of the vegan R package as a measure of the beta-diversity. Stacked bar plots with hierarchical clustering, visualized as a dendrogram,
were drawn based on the Bray–Curtis index values, using the _as.dendrogram_ and _upgma_ functions together with the _ggplot_ function of the ggplot2 R package. PCoA (principal coordinate analysis) visualizations were
carried out using the _pcoa_ and _s.class_ functions of the ape and ade4 R packages, respectively, from Bray–Curtis dissimilarity matrices. Common and partner-specific bacterial genera were visualized using
_UpSet_ plots with the UpSetR R package. All statistical analyses were performed using R v.4.1.2 software (http://cran.r-project.org/). For phylogenetic tree visualization, 16S rRNA gene
sequence alignment was carried out with _ClustalOmega_ (using default parameters), and the alignment files were then submitted to a phylogenetic analysis using Phylogeny.fr customized
workflow service70 including alignment curation with _Gblocks_ (using default parameters)71, tree construction with _PhyML_ (bootstrap 100)72, and visualization by _TreeDyn_73.

ETHICAL APPROVAL AND CONSENT TO PARTICIPATE

Each donor consented to participate in the protocol by signing a consent form and agreed to their samples being conserved in our biobank. Approval for the
Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRAE) to manage human-derived biological samples was granted by the Ministry of
Research and Education under approval number DC-2012-1728, updated DC-2020-1728.

RESULTS

BACTERIAL PROFILE VARIATIONS IN MOCK COMMUNITIES AT THE GENUS LEVEL

While 16S rRNA metabarcoding only
identified bacteria and archaea, _Saccharomyces cerevisiae_ and _Cryptosporidium_ were only identified by a shotgun metagenomics approach. Only three metabarcoding partners (P1, P4, and
P5), out of six, detected all eight bacterial species present in the mock sample. Partner P2 missed _Escherichia_, P3 missed _Limosilactobacillus_, and P6 with LUMI-Seq® missed
_Pseudomonas_. P2 and P6 had the highest count of unclassified genera, 10.8% ± 0.4 and 44.2% ± 0.6, respectively. The high proportion of unclassified genera identified by P6 led to
underestimation of _Escherichia_, _Limosilactobacillus_, and _Salmonella_ abundances. P2 and P3 overestimated the relative proportion of _Bacillus_ and underestimated the proportion of
_Pseudomonas_. P3 overestimated the relative proportion of _Listeria_ and underestimated the relative abundance of _Staphylococcus_. Overall, from beta-diversity analysis based on
Bray–Curtis dissimilarity indexes (Fig. 2), MGP was the closest to the theoretical profile (an expected result, since identification of other species was not possible),
followed by P1, P4, and P5, while P2, P3, and P6 stood out as outliers. Thus, beta-diversity analysis highlighted the lower ability of P2, P3, and P6 to correctly profile the reference
sample at the bacterial genus level as well as the good performance of P1.

COMPARATIVE ALPHA-DIVERSITY AND GENUS RICHNESS ANALYSIS IN MOCK AND HUMAN FECAL SAMPLES

In the mock sample, all
metabarcoding partners overestimated the alpha-diversity, due to the identification of additional bacterial species, compared to the theoretical value (Fig. 3A). For the human fecal samples,
P1, P2, and P5 underestimated while P3, P4, and P6 overestimated the alpha-diversity compared to the indexes calculated by MGP. P1 and P6 were outliers in their respective groups. For the
human fecal samples, we noticed substantial variation between metabarcoding partners in the genus richness measured from identical DNA samples; for sample S3_1, for example, partner P3
identified an average of 18.7 ± 0.6 genera while P4 identified 103 ± 4 genera (Fig. 3B). Overall, for any sample considered, the difference in the bacterial genus richness between P3 and P4
was the highest. P1, P2, P4, and P5 overestimated the bacterial genus richness compared to MGP and P6, who identified comparable genus richness, while it was underestimated by partner P3.
While all partners identified unclassified bacterial genera, their average relative abundances varied from 7.9% for P4 to 61.9% for P6 (Supplementary Fig. 2A). Among the partial-length
metabarcoding partners, P1, P2, and P4 had the lowest average relative abundances of unclassified bacterial genera, ranging from 7.9% for P4 to 21.9% for P2, followed by P3 and P5 with 47.7%
and 34%, respectively. Unexpectedly, these proportions of unclassified bacterial genera obtained with partial-length 16S rRNA sequencing were lower than those obtained by laboratories using methods with higher
taxonomic resolution, such as MGP or full-length sequencing using LUMI-Seq®, with 51.4% and 61.9%, respectively. Thus, P1, P2, and P4 were using less stringent conditions for classification of OTU
clusters than the other partners. By contrast, with a performance similar to that obtained for bacterial genus profiling using mock samples, P3 missed the most abundant human gut
genera such as _Bacteroides_, _Parabacteroides_, and _Prevotella_, which were not detected or were identified at low relative abundance (e.g., _Faecalibacterium_, < 0.05%; Supplementary Fig. 3).
These bacterial genera are core members of the human gut microbiota, and analytical pipelines missing them may be identified and flagged as poor service providers in gut microbiota analysis
by regulatory and legal agencies.

COMPARATIVE BETA-DIVERSITY ANALYSIS IN HUMAN FECAL SAMPLES

Following the aggregation of the genus-level count tables provided by the
partners after sequencing of DNA isolated from stool samples, a total of 429 unique genera were identified. For human fecal samples, the dissimilarity between all pairs of partners, as
measured by Bray–Curtis indexes at the bacterial genus level, was such that we observed lower inter-individual variance than inter-partner variance (Fig. 4). The microbiota profiles depicted
by partners P5 and P6 were the most similar to those obtained by MGP. P2 and P4, which clustered together, as well as the P3 and P1 partners, provided microbiota profiles that were the most
dissimilar to those of the other partners. We also noticed that this clustering was markedly influenced by the taxonomic database that was used for the phylogenetic annotation
(Supplementary Table 1). Because the number of confounding factors exceeds the number of laboratories, all conclusions should be treated with caution. Considering the bacterial genera identified as
shared between the metabarcoding partners and MGP or exclusively identified by a single partner (Supplementary Fig. 4A), we observed three groups of partners. The first cluster comprised P1,
P2, and P4, for which most of the identified bacterial genera were only identified by metabarcoding. The average total relative abundance of these partner-exclusive bacterial genera ranged
from 10.6 to 29.8% for P1 and 30.8 to 65% for P2 and P4, with a high variance between samples. For P1 and P4, these results confirm their use of weaker parameters for OTU classification at
the genus level, as described earlier. The second cluster comprised P3, for which most of the identified bacterial genera were only identified by MGP, and these bacterial genera represented
an average total relative abundance between 20.1 and 28.4%, depending on the samples. This confirms the tendency of P3 to miss bacterial genera. The third cluster, comprising
P5 and P6, had most of its identified bacterial genera shared between the partners and MGP, which explains their similarity as measured by the Bray–Curtis distance. For these two
partners (P5 and P6), the bacterial genera identified only by MGP represented a low average total relative abundance, ranging from 2.2 to 8.2% for P5 and from 0.3 to 5.7% for P6. Overall, the
bacterial genera exclusively identified by the metabarcoding partners represented a higher average total relative abundance compared to those exclusively identified by MGP. In a genus
intersection count analysis performed between all partners (Fig. 5), P1 had the highest number of genera exclusively identified by a single partner, between 27 and 45, which accounted for
approximately 34.6% (S3_1) to 47.4% (S1_2) of the total number of genera identified by this partner and represented between 2.9 and 10.2% of the total relative abundances of the bacterial
genera identified by this partner. P1 thus accounted for a total of 159 partner-exclusive bacterial genera, representing a low proportion of the total relative abundance (Supplementary Table
4). The tendency of P1 to identify such a high number of partner-exclusive bacterial genera contributes to the high genus richness and β-diversity dissimilarity measured compared with the
other partners (Figs. 3B and 4). On the other hand, P3 did not identify any partner-exclusive bacterial genera, except one in S4_2. This observation confirms the low genus richness
previously reported (Fig. 3B) as well as the tendency of this partner to misidentify the core gut bacterial genera. In most samples, P5, P6, and P4 identified between 12 and 26
partner-exclusive bacterial genera, which is higher than for MGP and for P2, which identified 10 partner-exclusive bacterial genera. P2 and P4 shared the highest number of bacterial genera (32 to
37), representing between 24.3 and 60.3% of the total relative abundances of the bacterial genera identified by these partners. P2 and P4 thus accounted for a total of 43 partner-exclusive
bacterial genera, representing a high proportion of the total relative abundance (Supplementary Table 5). We noticed that most of the exclusive bacterial genera common to P2 and P4 were
different taxonomic sub-divisions of genera, such as _Ruminococcus_ sub-divided in Ruminococcaceae _NK4A214_ or _UCG-002_, _-003_, _-004_, _-005_, _-009_, _-010_, _-013_, _-014_,
_Ruminococcus 1_ and _2_, or _Prevotella_, _Ruminiclostridium_, or _Lachnospira_. All these intermediate taxonomic names present in the SILVA database used by both partners (Supplementary Table
1) largely contributed to the β-diversity similarity of the bacterial genera profiles observed between P2 and P4 and their dissimilarity with the other partners (Fig. 4). This also explains
why P2 and P4 overestimated the bacterial genus richness. However, these sequences associated with genera sub-groups or intermediate taxonomic ranks mainly corresponded to yet uncultured
bacterial groups and were not well defined at the genus level (Supplementary Fig. 5). Partner P5 identified between 14 and 22 partner-exclusive bacterial genera, which represented between
1.5 and 8.1% of the total diversity. P6 identified between 12 and 21 partner-exclusive bacterial genera, which represented between 4.9 and 9.7% of the total diversity. MGP identified between
6 and 14 partner-exclusive bacterial genera, which represented between 0.5 and 6.7% of the total diversity. This relatively low number of partner-exclusive bacterial genera, accounting for a
low total relative abundance, contributed to the low dissimilarity of the bacterial genera profiles between P5, P6, and MGP as measured by β-diversity analysis (Fig. 4). The number of shared
bacterial genera among all partners was between 7 and 15, depending on the samples, representing from 3.1 to 34.8% of the total relative abundances of bacterial genera, depending on the
partners. A summary of the criteria used to classify the metabarcoding partners according to their capacity to approximate the bacterial genus profile measured by metagenomics
is presented in Table 1.

COMPARATIVE ANALYSIS OF THE BACTERIAL PROFILES OBTAINED AFTER REANALYSIS OF PARTIAL-LENGTH METABARCODING DATASETS

To measure the specific impact of bioinformatic
pipelines on the bacterial genera profiles obtained from the partial-length metabarcoding partners (P1 to P5), we reanalyzed the raw sequence datasets issued from all partners with the use
of a single bioinformatic pipeline (Supplementary Table 3). For the mock samples, the bacterial genus profiles were similar to those obtained previously, except that P2 displayed a lower
proportion of unclassified genera and identified _Escherichia_. P3 still missed the _Limosilactobacillus_ genus (data not shown). For the human fecal samples, the average relative abundances
of the unclassified bacterial genera ranged from 20.4 to 22% for P1, P2, P4, and P5 and reached 37.2% for P3 (Supplementary Fig. 2B). A higher proportion of the sequences provided by P5 was
classified following the new analysis. Only the sequencing data provided by P3 retained a high proportion of unclassified bacterial genera, whereas this proportion was rather uniform across the other metabarcoding
partners. The reprocessing of sequencing data identified a total of 180 bacterial genera, less than half of the number identified
previously. The variation in alpha-diversity between partners, measured by the Shannon diversity index at the species level and by genus richness, was lower than that measured previously.
The average Shannon diversity index varied between 3.5 and 6 in the original datasets and between 4 and 5 upon reanalysis, depending on the partners (Supplementary Fig. 6A). The average
genus richness varied between 23.2 and 102.1 in the original datasets, while the new assessments varied from 51.0 to 91.5, depending on the partners (Supplementary Fig. 6B). The bacterial
species profiles obtained by P1 had the lowest average Shannon diversity index in the original datasets (3.7 ± 0.2) but one of the highest following reanalysis
(4.8 ± 0.2). This increase in species alpha-diversity was also observed for P2 and P5, albeit to a lesser extent. For P1 and P2, however, the reanalysis measured a lower richness at the genus
level, so the increase in the bacterial species diversity index did not translate into an increase in genus richness. In contrast, the bacterial species
profiles obtained by P3 had the highest average Shannon diversity index in the original datasets (5.5 ± 0.4), whereas reanalysis led to the lowest (4.3 ±
0.3). This decrease in species diversity was accompanied by an increase in genus richness, from 23.2 ± 3.1 to 50.9 ± 3.8. Thus, for this partner, a
decrease in the bacterial species diversity index did not translate into a decrease in genus richness. A decrease in species alpha-diversity was also observed, to a lesser extent,
for P4; in this case, it was accompanied by a decrease in genus richness upon reanalysis. The changes in the alpha-diversity index and genus richness for partner P5 were very small.
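The two alpha-diversity measures compared above can move in opposite directions, because the Shannon index depends on how evenly reads are spread across species, whereas genus richness only counts the genera detected. The following minimal sketch illustrates this with two hypothetical samples. The read counts and taxon names are invented, and the natural logarithm is an assumption, since the logarithm base used by the partners is not stated here.

```python
import math


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts.

    The natural logarithm is an assumption made for this sketch.
    """
    total = sum(counts)
    if total <= 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def genus_richness(genus_counts):
    """Number of genera with at least one assigned read."""
    return sum(1 for c in genus_counts.values() if c > 0)


def genus_totals(sample):
    """Collapse {genus: {species: count}} into {genus: count}."""
    return {genus: sum(species.values()) for genus, species in sample.items()}


def species_counts(sample):
    """Flatten {genus: {species: count}} into a list of species counts."""
    return [c for species in sample.values() for c in species.values()]


# Hypothetical read counts, structured as {genus: {species: count}}.
even_sample = {
    "Genus_A": {"sp1": 25, "sp2": 25},
    "Genus_B": {"sp3": 25, "sp4": 25},
}
skewed_sample = {
    "Genus_A": {"sp1": 97},
    "Genus_B": {"sp3": 1},
    "Genus_C": {"sp5": 1},
    "Genus_D": {"sp6": 1},
}

for name, sample in (("even", even_sample), ("skewed", skewed_sample)):
    h = shannon_index(species_counts(sample))
    r = genus_richness(genus_totals(sample))
    print(f"{name}: Shannon = {h:.2f}, genus richness = {r}")
```

In this toy case the skewed sample contains twice as many genera as the even sample but has a much lower Shannon index, which is the kind of decoupling described for P3 after reanalysis.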
Following reanalysis, beta-diversity measured by the Bray–Curtis index showed a higher similarity between the genus-level profiles obtained by P1, P2, P4,
and P5, whereas P3 remained an outlier (Fig. 6). However, when the metabarcoding and metagenomic partners were compared, we still found lower inter-individual variance
than inter-laboratory variances, as previously observed. Considering the richness of bacterial genera shared between metabarcoding and metagenomics or exclusively identified by a
single partner (Supplementary Fig. 4B), the first group of partners (P1, P2, and P4), previously characterized by the highest number of bacterial genera exclusively identified by
a single metabarcoding partner, was now the group with the most abundant shared bacterial genera. The bacterial genera exclusive to the metabarcoding partners represented an average total relative
abundance ranging from 8.4 to 26.9%, depending on the partner and sample. These values were of the same order of magnitude as those previously measured for P1, but lower for P2 and
P4. The second group, represented by P3, was still dominated by bacterial genera exclusively identified by MGP, although their average total relative abundance, ranging from 13.9 to 25.1%
depending on the sample, was slightly lower. These genera still included some of the most abundant human gut bacterial genera, such as _Bacteroides_ and _Prevotella_, which P3 failed to detect.
The third group, represented by P5, was still dominated by bacterial genera shared between the metabarcoding and metagenomic partners. For this partner, the bacterial genera exclusively
identified by metagenomics remained low, representing an average total relative abundance ranging from 2.5 to 7.8%, depending on the sample. In this reanalysis, the number of bacterial
genera shared between all partners was higher, between 21 and 33 (Supplementary Fig. 7), representing 20.4% to 62.1% of the total relative abundances of the bacterial genera. These were followed by the
genera exclusive to the metabarcoding partners, accounting for 16 to 22 genera or 7.1% to 29.6% of the total relative abundances. Partner-exclusive bacterial genera were found mainly for P5
and MGP, with 8 to 19 genera representing less than 0.2% of the relative abundances and 14 to 20 genera representing between 2.4 and 7.8% of the relative abundances, respectively. Data
reprocessing did not shift the bacterial genus profiles obtained by P5, in contrast to those of the other partners. This may be because the taxonomic annotation database used in this
reprocessing analysis was the same as the one used by P5. All partners, excluding P3, tended to share between 6 and 10 genera, accounting for 6.3% to 45.4% of the total relative abundances.
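The comparisons reported in this section rest on two simple operations: partitioning genera into those shared by all partners and those exclusive to one partner, and comparing Bray–Curtis dissimilarities between donors within a laboratory with those between laboratories for the same donor. The following minimal sketch shows both on invented relative abundances. The laboratory labels, donor labels, and abundance values are hypothetical, and averaging pairwise dissimilarities is an assumption of this sketch, not necessarily the variance analysis used in the study.

```python
from itertools import combinations
from statistics import mean


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two {genus: abundance} profiles."""
    genera = set(p) | set(q)
    num = sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in genera)
    den = sum(p.get(g, 0.0) + q.get(g, 0.0) for g in genera)
    return num / den if den else 0.0


def partition_genera(profiles):
    """Return (genera detected by every lab, {lab: genera detected only by that lab})."""
    detected = {lab: {g for g, a in prof.items() if a > 0} for lab, prof in profiles.items()}
    shared = set.intersection(*detected.values())
    exclusive = {}
    for lab, genera in detected.items():
        others = set().union(*(d for other, d in detected.items() if other != lab))
        exclusive[lab] = genera - others
    return shared, exclusive


# Hypothetical relative abundances keyed by (laboratory, donor).
profiles = {
    ("lab_A", "d1"): {"Bacteroides": 0.6, "Faecalibacterium": 0.3, "GenusX": 0.1},
    ("lab_A", "d2"): {"Bacteroides": 0.5, "Faecalibacterium": 0.4, "GenusX": 0.1},
    ("lab_B", "d1"): {"Bacteroides": 0.2, "Faecalibacterium": 0.3, "Prevotella": 0.5},
    ("lab_B", "d2"): {"Bacteroides": 0.1, "Faecalibacterium": 0.4, "Prevotella": 0.5},
}

# Same laboratory, different donors: inter-individual dissimilarity.
inter_individual = mean(
    bray_curtis(profiles[a], profiles[b])
    for a, b in combinations(profiles, 2)
    if a[0] == b[0] and a[1] != b[1]
)
# Same donor, different laboratories: inter-laboratory dissimilarity.
inter_laboratory = mean(
    bray_curtis(profiles[a], profiles[b])
    for a, b in combinations(profiles, 2)
    if a[1] == b[1] and a[0] != b[0]
)

# Shared and exclusive genera for donor d1 across the two laboratories.
d1_profiles = {lab: prof for (lab, donor), prof in profiles.items() if donor == "d1"}
shared, exclusive = partition_genera(d1_profiles)

print(f"inter-individual: {inter_individual:.2f}, inter-laboratory: {inter_laboratory:.2f}")
print("shared:", sorted(shared), "exclusive:", {k: sorted(v) for k, v in exclusive.items()})
```

In this invented example the two laboratories disagree more about the same donor than one laboratory does about two different donors, which mirrors the pattern reported above.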
DISCUSSION

In this multicenter study, the same DNA standard solution and DNA isolated from human fecal samples were provided in triplicate to partners who are experts in gut microbiome
profiling by shotgun metagenomics or metabarcoding, with the aim of comparing the impact of their routine sequencing methods and bioinformatic pipelines on the resolution of bacterial profiles at the
genus level and at the OTU, ASV, or species level. To differentiate the impact of sequencing protocols from that of bioinformatic strategies, all raw sequencing data from the partial-length metabarcoding
partners were reanalyzed using a single bioinformatic pipeline. The reasonable expectation is that inter-individual (inter-sample) differences should be the primary driver of microbiome
profile stratification. Our work shows that this is not the case for metabarcoding pipelines, which calls into question the possibility of standardizing 16S rRNA-based approaches. Our findings
highlight significant differences in the bacterial species alpha-diversity index and genus richness, as well as significant differences in the beta-diversity index, with a lower
inter-individual variance than inter-laboratory variances. Our study also reports a dominance of genera exclusively identified by a single metabarcoding partner. These differences in the
bacterial genus profiles between partners, which quantify methodological biases, are seldom documented and appear greater in magnitude than commonly expected and than what has
been reported to date in the literature. The sequencing strategy, including the choice of primers for metabarcoding, greatly affects the analysis outcomes. P3 failed to identify a few common
bacterial genera, such as _Limosilactobacillus_ and the predominant human gut commensals _Bacteroides_ and _Prevotella_, or identified them at too low a proportion, as for
_Faecalibacterium_, even after data reprocessing with a single, different bioinformatic pipeline. In the original dataset, this partner provided a high species α-diversity index
but also the lowest genus richness of all partners. The USEARCH website (https://drive5.com/usearch/manual/uclust_algo.html) states that UCLUST,
the clustering method used by this partner, is not designed for OTU clustering; a similar observation has been reported in an empirical study and may explain this inflated
α-diversity. Following data reprocessing, these two alpha-diversity measures were among the lowest and more consistent with each other. This partner, which provided the lowest number of raw
sequences, also had the highest proportion of unclassified sequences among the partial-length metabarcoding partners, an effect that persisted after data reprocessing.
Unlike for the other partial-length metabarcoding partners, the change of bioinformatic pipeline did not correct the outlier position highlighted by beta-diversity analysis of the
original dataset. The absence or underestimation of a few of the most abundant human gut bacterial genera explains the outlier position of this partner and highlights the importance of the
sequencing strategy. We hypothesize that the primers and/or the PCR amplification protocol chosen by P3 do not amplify the V3-V4 region of the 16S rRNA gene of these bacterial genera.
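A hypothesis of this kind can be screened in silico by checking whether a primer pair matches a reference 16S rRNA gene sequence. The following is a minimal sketch under strong assumptions: the primers and template sequences are short invented examples, not those used by P3, and only exact matches with IUPAC degenerate bases are considered, whereas real primer evaluation would also tolerate mismatches and use a curated reference database.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_pattern(primer):
    """Compile a degenerate primer (5'->3') into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplifies(template, forward, reverse):
    """True if the forward primer matches the template and the reverse primer
    matches the opposite strand downstream of it (exact matching only)."""
    template = template.upper()
    n = len(template)
    fwd_ends = [m.end() for m in primer_pattern(forward).finditer(template)]
    # A reverse-primer match at [s, e) on the reverse complement corresponds
    # to top-strand positions [n - e, n - s).
    rev_starts = [
        n - m.end()
        for m in primer_pattern(reverse).finditer(reverse_complement(template))
    ]
    return any(start >= end for end in fwd_ends for start in rev_starts)


# Hypothetical primers and templates, for illustration only.
FORWARD = "GGNCA"
REVERSE = "TTYGC"
matching_template = "AAGGTCATTTTTTGCGAATT"
mutated_template = "AAGATCATTTTTTGCGAATT"  # one mismatch in the forward site

print(amplifies(matching_template, FORWARD, REVERSE))
print(amplifies(mutated_template, FORWARD, REVERSE))
```

With these toy sequences the first template is reported as amplifiable and the second is not, because a single substitution in the forward priming site abolishes the exact match. Applied to reference sequences of the genera of interest, the same idea would indicate which taxa a given primer pair is likely to miss.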
The settings for genus-level sequence assignment matter greatly for the final outcome of metabarcoding analysis. P1 was notable for identifying many partner-exclusive
bacterial genera, most of them at a low proportion, which together accounted for up to 10% of the total bacterial diversity. These were not identified following reprocessing of the raw
sequences with a new bioinformatic pipeline. We hypothesize that the sequence homology threshold for bacterial genus identification in this partner’s bioinformatic pipeline was set too low to allow
accurate bacterial genus identification, resulting in an overestimation of the diversity. The choice of the reference database markedly influenced the genus-level sequence assignment,
and thereby the outcome of metabarcoding analysis. P1 and P3 both used the Greengenes database for taxonomic classification. Unlike the other databases used in this study, such as SILVA and
RDP, Greengenes assigns taxonomy by automatic de novo 16S rRNA gene tree construction and rank mapping from other taxonomic sources, mainly the NCBI, which is not curated. Although
still included in some metabarcoding analysis packages, such as QIIME, the database has not been updated for the past ten years [74]. Although the Greengenes database website recommends the use of
more up-to-date sources for taxonomic annotation, this database is still used and referenced in more than a thousand publications each year. The recently published Greengenes2 [75], a reference tree that unifies
genomic and 16S rRNA databases, should offer a real chance of standardization across methods. In beta-diversity analysis, P2 and P4 displayed similar microbiota profiles,
which may be partly due to the use of a common database for taxonomic identification. The use of the SILVA database, with its numerous sub-genera or intermediate taxonomic groups,
explains the high number of partner-exclusive bacterial genera identified for these two partners, accounting for almost half of the total bacterial genus diversity retrieved. Most of the
observed differences were due to the identification of sub-taxa of bacterial genera belonging to _Ruminococcus_, _Prevotella_, _Ruminiclostridium_, or _Lachnospira_, which are mostly
represented by uncultured bacteria. As previously reported, this observation may be due to the large number of sequences present in the SILVA database, a few of which are associated with
intermediate taxonomic ranks that are not present in other databases [76]. In this case, we assume that the presence of hypothetical sub-groups of bacterial genera in the reference database
is the main driver of the dissimilarity measured. To a lesser extent, a similar trend was also observed for partner P5, which used the RDP database. However, only the genus _Clostridium_ was
divided into different sub-genera or intermediate ranks, which affected the measurement of beta-diversity compared to MGP. Use of the RDP database nevertheless yielded the highest level of
similarity to the bacterial profiles obtained by MGP or full-length metabarcoding. Separating the impact of sequencing platforms and bioinformatic pipelines in metabarcoding
analysis shows that both influence the outcome. Although analyzing sequences derived from various sequencing platforms with a single pipeline yielded greater similarities
and diminished the “laboratory effect” (inter-laboratory differences), it did not completely mask inter-platform differences. Overall, both sequence
production and bioinformatics influence the distribution of samples, and both should be rigorously standardized if samples from the same individual are expected to
cluster irrespective of who runs the analysis. Reprocessing of the partial-length metabarcoding partners’ 16S rRNA sequence files with a single bioinformatic pipeline highlighted the biases due to
the use of different databases for taxonomic annotation. The results showed that the profile dissimilarities between P1, P2, P4, and P5 are due to bioinformatic differences in the
way taxonomic annotation is carried out, which explains why the use of a single bioinformatic procedure homogenizes the alpha- and beta-diversity outputs. It has been previously shown
that in many instances Greengenes, SILVA, and RDP cannot be mapped reliably to one another [74], which explains much of the dissimilarity observed before the sequences were reprocessed using only
the RDP database for taxonomic annotation. However, despite the use of a single bioinformatic pipeline and similar taxonomic annotation databases, none of the bacterial genus profiles yielded
an inter-laboratory variance lower than the inter-individual variance. The use of the full-length 16S rRNA gene in LUMI-Seq® allows minimization of the presence of
exclusive bacterial genera and improves the identification of unclassified reads, which tends to provide profiles more similar to those provided by MGP and P5 than to those of the
other partial-length metabarcoding partners. As previously demonstrated, the average bacterial species alpha-diversity, as measured by the Shannon index, was higher with full-length than
with partial-length metabarcoding [23]. However, the relative proportion of unclassified bacterial sequences was high, which may be due to missing information in the full-length
16S rRNA gene database used. We also noticed the absence of the genus _Pseudomonas_ in the mock samples, of _Alistipes_ in S1_1, and of _Bifidobacterium_ in all samples except S1_1
and S5_2, although these genera were identified by all other partners. These latter results contradict those reported by Jeong et al., who applied a similar methodology to gut microbiota
profiling. Here again, the primer choice may have had a strong impact.

CONCLUSIONS

Previous inter-laboratory studies reported biases in bacterial taxonomic profiling following the
transfer of materials ranging from different raw biospecimens [77] and human stool samples [36] to raw 16S rRNA sequences obtained from mock microbial community samples [78]. Here, technical replicates of DNA extracted from
human stool samples were transferred as aliquots to different laboratories for sequencing and analysis. This approach allowed us, for the first time, to characterize specific biases arising from
library preparation and sequence production all the way to the bioinformatics steps, without having to consider the sample collection or DNA extraction protocols. Reprocessing of raw 16S
rRNA sequences using a single bioinformatic pipeline also allowed measurement of the biases specifically due to the bioinformatic pipeline. This multicenter evaluation study of gut microbiome
profiling reveals major biases, mainly due to library preparation and to the databases used for taxonomic annotation in bioinformatic pipelines for partial-length metabarcoding. Whereas biases
due to library preparation have been evaluated [8], the specific impact of the choice of database for bacterial genus taxonomic annotation has not been investigated much to date. To our knowledge,
studies reporting bioinformatic pipeline benchmarks for metabarcoding profiling of bacteria from mock microbial communities [79] or human stool samples [14] were performed using similar reference
databases. Yet it is known that databases cannot be mapped reliably onto one another [74], and differences in bacterial profiling between databases have been evaluated using mock communities [76]. This study
highlights major differences in bacterial genus identification and relative abundance assessment due to the bioinformatic pipeline used, primarily due to the choice of the database used for
taxonomic annotation. Given the choices to be made regarding primer design and PCR amplification protocol in the sequencing strategy, as well as the taxonomic reference database and sequence homology
threshold used in the bioinformatic pipeline, we recommend the systematic use of _in silico_ methodology to test the relevance of these combinations of choices according to the objective
of the study. This study also reveals that the use of a single bioinformatic pipeline does not reduce the proportion of partner-exclusive bacterial genera sufficiently to yield
an inter-laboratory variance lower than the inter-individual variance between metabarcoding and metagenomic partners. For laboratories to control for the presence of false-positive or
false-negative bacterial genera, and to accurately evaluate their pipelines or set up standard next-generation sequencing protocols and bioinformatic pipelines, publicly or commercially available
reference biospecimens, cells, and DNA reagents should be used. Such gut-representative DNA mock community standards have recently been developed for the microbiome field [80], and others have
been made commercially available by companies such as ATCC (MSA-1006). Other companies, such as Zymo Research, can also provide gut cell microbiome standards and fecal sample references
(ZymoBIOMICS). As previously highlighted, the use of these standards is critical for establishing clinical microbiota profiling and for research laboratories seeking to improve the
reproducibility of publications as well as the transferability of methods and results to routine practice [81]. In the near future, raw metabarcoding or metagenomic DNA sequences obtained from mock microbial
communities representative of stool specimens or fecal samples, made available in publications with the pipeline reported as set out by the STORMS initiative [82], may also be systematically
used by laboratories for pipeline evaluation. If microbiome profiling is ever to be made available on the dashboard of clinicians, standardization and inter-laboratory homogeneity of outputs
will be crucial features.

DATA AVAILABILITY

The datasets supporting the conclusions of this article, metagenomic and metabarcoding FastQ files, are available in NCBI BioProject under
accession number PRJNA911046.

ABBREVIATIONS

* IGC2: Integrated gene catalogue 2
* IHMS: International human microbiome standards
* INRAE: Institut national de recherche pour l'agriculture, l'alimentation et l'environnement
* MGP: Metagenomic profiling
* MSP: Metagenomic species
* PCoA: Principal coordinate analysis

REFERENCES

* Vincent, A. T., Derome, N., Boyle, B., Culley, A. I. & Charette, S. J. Next-generation sequencing (NGS) in the microbiological world: How to make the most of your money. _J. Microbiol. Methods_ 138, 60–71 (2017).
* Nearing, J. T., Comeau, A. M. & Langille, M. G. I. Identifying biases and their potential solutions in human microbiome studies. _Microbiome_ https://doi.org/10.1186/s40168-021-01059-0 (2021).
* Penington, J. S. _et al._ Influence of fecal collection conditions and 16S rRNA gene sequencing at two centers on human gut microbiota analysis. _Sci. Rep._ 8, 4386 (2018).
* Ilett, E. E. _et al._ Gut microbiome comparability of fresh-frozen versus stabilized-frozen samples from hospitalized patients using 16S rRNA gene and shotgun metagenomic sequencing. _Sci. Rep._ 9, 13351 (2019).
* Salter, S. J. _et al._ Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. _BMC Biol._ 12, 87 (2014).
* Costea, P. I. _et al._ Towards standards for human fecal sample processing in metagenomic studies. _Nat. Biotechnol._ 35, 1069–1076 (2017).
* Lim, M. Y., Song, E.-J., Kim, S. H., Lee, J. & Nam, Y.-D. Comparison of DNA extraction methods for human gut microbial community profiling. _Syst. Appl. Microbiol._ 41, 151–157 (2018).
* Sze, M. A. & Schloss, P. D. The impact of DNA polymerase and number of rounds of amplification in PCR on 16S rRNA gene sequence data. _mSphere_ https://doi.org/10.1128/mSphere.00163-19 (2019).
* Jones, M. B. _et al._ Library preparation methodology can influence genomic and functional predictions in human microbiome research. _Proc. Natl. Acad. Sci. U.S.A._ 112, 14024–14029 (2015).
* Schirmer, M. _et al._ Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. _Nucl. Acids Res._ 43, e37 (2015).
* Thorsen, J. _et al._ Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies. _Microbiome_ 4, 62 (2016).
* Hillmann, B. _et al._ Evaluating the information content of shallow shotgun metagenomics. _mSystems_ https://doi.org/10.1128/mSystems.00069-18 (2018).
* Whon, T. W. _et al._ The effects of sequencing platforms on phylogenetic resolution in 16S rRNA gene profiling of human feces. _Sci. Data_ 5, 180068 (2018).
* Marizzoni, M. _et al._ Comparison of bioinformatics pipelines and operating systems for the analyses of 16S rRNA gene amplicon sequences in human fecal samples. _Front. Microbiol._ 11, 1262 (2020).
* Weiss, S. _et al._ Normalization and microbial differential abundance strategies depend upon data characteristics. _Microbiome_ 5, 27 (2017).
* Lynch, M. D. J. & Neufeld, J. D. Ecology and exploration of the rare biosphere. _Nat. Rev. Microbiol._ 13, 217–229 (2015).
* Abellan-Schneyder, I. _et al._ Primer, pipelines, parameters: Issues in 16S rRNA gene sequencing. _mSphere_ https://doi.org/10.1128/mSphere.01202-20 (2021).
* Wei, Z.-G. _et al._ Comparison of methods for picking the operational taxonomic units from amplicon sequences. _Front. Microbiol._ 12, 644012 (2021).
* Nearing, J. T. _et al._ Microbiome differential abundance methods produce different results across 38 datasets. _Nat. Commun._ https://doi.org/10.1038/s41467-022-28034-z (2022).
* Caruso, V., Song, X., Asquith, M. & Karstens, L. Performance of microbiome sequence inference methods in environments with varying biomass. _mSystems_ https://doi.org/10.1128/mSystems.00163-18 (2019).
* Acinas, S. G. _et al._ Fine-scale phylogenetic architecture of a complex bacterial community. _Nature_ 430, 551–554 (2004).
* Větrovský, T. & Baldrian, P. The variability of the 16S rRNA gene in bacterial genomes and its consequences for bacterial community analyses. _PLoS ONE_ 8, e57923 (2013).
* Jeong, J. _et al._ The effect of taxonomic classification by full-length 16S rRNA sequencing with a synthetic long-read technology. _Sci. Rep._ 11, 1727 (2021).
* Hassler, H. B. _et al._ Phylogenies of the 16S rRNA gene and its hypervariable regions lack concordance with core genome phylogenies. _Microbiome_ https://doi.org/10.1186/s40168-022-01295-y (2022).
* Pereira-Marques, J. _et al._ Impact of host DNA and sequencing depth on the taxonomic resolution of whole metagenome sequencing for microbiome analysis. _Front. Microbiol._ 10, 1277 (2019).
* Gweon, H. S. _et al._ The impact of sequencing depth on the inferred taxonomic composition and AMR gene content of metagenomic samples. _Environ. Microbiome_ https://doi.org/10.1186/s40793-019-0347-1 (2019).
* Laudadio, I. _et al._ Quantitative assessment of shotgun metagenomics and 16S rDNA amplicon sequencing in the study of human gut microbiome. _OMICS_ 22, 248–254 (2018).
* Park, S.-Y., Ufondu, A., Lee, K. & Jayaraman, A. Emerging computational tools and models for studying gut microbiota composition and function. _Curr. Opin. Biotechnol._ 66, 301–311 (2020).
* Jovel, J. _et al._ Characterization of the gut microbiome using 16S or shotgun metagenomics. _Front. Microbiol._ 7, 459 (2016).
* Mitra, S. _et al._ Analysis of the intestinal microbiota using SOLiD 16S rRNA gene sequencing and SOLiD shotgun sequencing. _BMC Genomics_ 14(Suppl 5), S16 (2013).
* Rausch, P. _et al._ Comparative analysis of amplicon and metagenomic sequencing methods reveals key features in the evolution of animal metaorganisms. _Microbiome_ 7, 133 (2019).
* Biegert, G. _et al._ Diversity and composition of gut microbiome of cervical cancer patients by 16S rRNA and whole-metagenome sequencing (2020).
* Vogtmann, E. _et al._ Colorectal cancer and the human gut microbiome: Reproducibility with whole-genome shotgun sequencing. _PLoS ONE_ 11, e0155362 (2016).
* Ranjan, R., Rani, A., Metwally, A., McGee, H. S. & Perkins, D. L. Analysis of the microbiome: Advantages of whole genome shotgun versus 16S amplicon sequencing. _Biochem. Biophys. Res. Commun._ 469, 967–977 (2016).
* Clooney, A. G. _et al._ Comparing apples and oranges? Next generation sequencing and its impact on microbiome analysis. _PLoS ONE_ 11, e0148028 (2016).
* Han, D. _et al._ Multicenter assessment of microbial community profiling using 16S rRNA gene sequencing and shotgun metagenomic sequencing. _J. Adv. Res._ 26, 111–121 (2020).
* Criscuolo, A. & Brisse, S. AlienTrimmer: A tool to quickly and accurately trim off multiple short contaminant sequences from high-throughput sequencing reads. _Genomics_ 102, 500–506 (2013).
* Wen, C. _et al._ Quantitative metagenomics reveals unique gut microbiome biomarkers in ankylosing spondylitis. _Genome Biol._ 18, 142 (2017).
* Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. _Nat. Methods_ 9, 357–359 (2012).
* Cotillard, A. _et al._ Dietary intervention impact on gut microbial gene richness. _Nature_ 500, 585–588 (2013).
* Le Chatelier, E. _et al._ Richness of human gut microbiome correlates with metabolic markers. _Nature_ 500, 541–546 (2013).
* Plaza Oñate, F. _et al._ MSPminer: Abundance-based reconstitution of microbial pan-genomes from shotgun metagenomic data. _Bioinformatics_ 35, 1544–1552 (2019).
* Parks, D. H. _et al._ A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. _Nat. Biotechnol._ 36, 996–1004 (2018).
* Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. _Genome Biol._ 20, 257 (2019).
* Schloss, P. D. _et al._ Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. _Appl. Environ. Microbiol._ 75, 7537–7541 (2009).
* Caporaso, J. G. _et al._ QIIME allows analysis of high-throughput community sequencing data. _Nat. Methods_ 7, 335–336 (2010).
* Escudié, F. _et al._ FROGS: Find, rapidly, OTUs with galaxy solution. _Bioinformatics_ 34, 1287–1294 (2018).
* Callahan, B. J. _et al._ DADA2: High-resolution sample inference from Illumina amplicon data. _Nat. Methods_ 13, 581–583 (2016).
* Westcott, S. L. & Schloss, P. D. OptiClust, an improved method for assigning amplicon-based sequence data to operational taxonomic units. _mSphere_ https://doi.org/10.1128/mSphereDirect.00073-17 (2017).
* Mahé, F., Rognes, T., Quince, C., de Vargas, C. & Dunthorn, M. Swarm: Robust and fast clustering method for amplicon-based studies. _PeerJ_ 2, e593 (2014).
* Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. _Bioinformatics_ 26, 2460–2461 (2010).
* Magoč, T. & Salzberg, S. L. FLASH: Fast length adjustment of short reads to improve genome assemblies. _Bioinformatics_ 27, 2957–2963 (2011).
* Maidak, B. L. _et al._ The RDP (Ribosomal Database Project) continues. _Nucl. Acids Res._ 28, 173–174 (2000).
* DeSantis, T. Z. _et al._ Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. _Appl. Environ. Microbiol._ 72, 5069–5072 (2006).
* Camacho, C. _et al._ BLAST+: Architecture and applications. _BMC Bioinform._ 10, 421 (2009).
* Wang, Q., Garrity, G. M., Tiedje, J. M. & Cole, J. R. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. _Appl. Environ. Microbiol._ 73, 5261–5267 (2007).
* Quast, C. _et al._ The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. _Nucl. Acids Res._ 41, D590–D596 (2013).
* O’Leary, N. A. _et al._ Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation. _Nucl. Acids Res._ 44, D733–D745 (2016).
* Blin, K. ncbi-genome-download: Zenodo (2023).
* Schoch, C. L. _et al._ NCBI Taxonomy: A comprehensive update on curation, resources and tools. _Database (Oxford)_ https://doi.org/10.1093/database/baaa062 (2020).
* Seemann, T. barrnap 0.9: Rapid ribosomal RNA prediction (2013). https://github.com/tseemann/barrnap.
* Li, W. & Godzik, A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. _Bioinformatics_ 22, 1658–1659 (2006).
* Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: Accelerated for clustering the next-generation sequencing data. _Bioinformatics_ 28, 3150–3152 (2012).
* Sayers, E. W. _et al._ Database resources of the National Center for Biotechnology Information. _Nucl. Acids Res._ 47, D23–D28 (2019).
* Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. _EMBnet J._ 17, 10 (2011).
* Bankevich, A. _et al._ SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. _J. Comput. Biol._ 19, 455–477 (2012).
* Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: A fast and accurate Illumina Paired-End reAd mergeR. _Bioinformatics_ 30, 614–620 (2014).
* Edgar, R. C., Haas, B. J., Clemente, J. C., Quince, C. & Knight, R. UCHIME improves sensitivity and speed of chimera detection. _Bioinformatics_ 27, 2194–2200 (2011).
* Cole, J. R. _et al._ Ribosomal Database Project: Data and tools for high throughput rRNA analysis. _Nucl. Acids Res._ 42, D633–D642 (2014).
* Dereeper, A. _et al._ Phylogeny.fr: Robust phylogenetic analysis for the non-specialist. _Nucl. Acids Res._ 36, W465–W469 (2008).
* Castresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. _Mol. Biol. Evol._ 17, 540–552 (2000).
* Guindon, S. _et al._ New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0. _Syst. Biol._ 59, 307–321 (2010).
* Chevenet, F., Brun, C., Bañuls, A.-L., Jacq, B. & Christen, R. TreeDyn: Towards dynamic graphics and annotations for analyses of trees. _BMC Bioinform._ 7, 439 (2006).
* Balvočiūtė, M. & Huson, D. H. SILVA, RDP, Greengenes, NCBI and OTT—How do these taxonomies compare? _BMC Genomics_ https://doi.org/10.1186/s12864-017-3501-4 (2017).
* McDonald, D. _et al._ Greengenes2 unifies microbial data in a single reference tree. _Nat. Biotechnol._ https://doi.org/10.1038/s41587-023-01845-1 (2023).
* Park, S.-C. & Won, S. Evaluation of 16S rRNA databases for taxonomic assignments using a mock community. _Genomics Inform._ 16, e24 (2018).
* Sinha, R. _et al._ Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. _Nat. Biotechnol._ 35, 1077–1086 (2017).
* O’Sullivan, D. M. _et al._ An inter-laboratory study to investigate the impact of the
bioinformatics component on microbiome analysis using mock communities. _Sci. Rep._ 11, 10590 (2021). Article PubMed PubMed Central ADS CAS Google Scholar * Straub, D. _et al._
Interpretations of environmental microbial community studies are biased by the selected 16S rRNA (Gene) amplicon sequencing pipeline. _Front. Microbiol._ 11, 550420 (2020). Article PubMed
PubMed Central Google Scholar * Amos, G. C. A. _et al._ Developing standards for the microbiome field. _Microbiome._ 8, 98 (2020). Article PubMed PubMed Central Google Scholar *
Scherz, V., Greub, G. & Bertelli, C. Building up a clinical microbiota profiling: A quality framework proposal. _Crit. Rev. Microbiol._ 48(3), 356–375 (2021). Article PubMed Google
Scholar * Mirzayi, C. _et al._ Reporting guidelines for human microbiome research: The STORMS checklist. _Nat. Med._ 27, 1885–1892 (2021). Article PubMed PubMed Central CAS Google
ACKNOWLEDGEMENTS

The authors thank the five healthy human donors as well as all external partners involved in this multicenter study. In particular, François Le Vacon, Yao Amouzou, Morgane Pierre, Younous Adrouji, Thomas Carton and Sébastien Leuillet from Biofortis SAS are acknowledged for their participation as partners and for their support in reviewing the article. Sophie Domingues is thanked for professional English editing of the article.

FUNDING

Joël Doré’s contribution was facilitated in part by funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement ERC-2017-AdG No. 788191, Homo.symbiosus).

AUTHOR INFORMATION AUTHORS AND
AFFILIATIONS

* Université Paris-Saclay, INRAE, MetaGenoPolis, 78350, Jouy-en-Josas, France
  Hugo Roume & Joël Doré
* Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350, Jouy-en-Josas, France
  Stanislas Mondot & Joël Doré
* Discovery & Front End Innovation, Lesaffre Institute of Science & Technology, Lesaffre International, 101 rue de Menin, 59700, Marcq-en-Barœul, France
  Hugo Roume
* BIOASTER, Microbiology Technology Institute, 40 Avenue Tony Garnier, 69007, Lyon, France
  Adrien Saliou
* Biofortis SAS, 3 Route de la Chatterie, Saint-Herblain, 44800, Nantes, France
  Sophie Le Fresne-Languille

CONTRIBUTIONS

H.R. and J.D. designed the study. J.D. supervised the collection of human fecal samples. H.R. isolated the DNA, transferred it to the
partner laboratories, performed the shotgun sequencing, and analyzed the metabarcoding and metagenomic results provided by the partners. S.M. performed the reprocessing of all partial-length
16S rRNA sequences using his own bioinformatic pipeline. H.R., S.M., and J.D. wrote the manuscript. All authors and partner laboratories discussed the results and commented on the
manuscript.

CORRESPONDING AUTHOR

Correspondence to Joël Doré.

ETHICS DECLARATIONS

COMPETING INTERESTS

Adrien Saliou and Sophie Le Fresne-Languille declare a conflict of interest. The other authors declare that they have no competing interests. We acknowledge a link of interest for Bioaster and Biofortis, which may provide microbiome analysis as a service. However, they in no way influenced the main message of the publication, which merely states that 16S rDNA-based metabarcoding approaches suffer major limitations and may very likely never be standardizable. This information, aimed at the scientific community, is intended to help promote more robust microbiome work in the future, whether by academic or by private stakeholders. It should be noted that neither of these project partners paid for the implementation of the project. Each of these partners financed the conduct of the experiments for which it was responsible, and the project as a whole was designed, supervised and managed by the academic partner, MetaGenoPolis.

ADDITIONAL INFORMATION

PUBLISHER'S NOTE

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

SUPPLEMENTARY INFORMATION

* Supplementary Information 1.
* Supplementary Table 4.
* Supplementary Table 5.

RIGHTS AND PERMISSIONS

OPEN ACCESS
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as
long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third
party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the
article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

ABOUT THIS ARTICLE

CITE THIS ARTICLE

Roume, H., Mondot, S., Saliou, A. _et al._ Multicenter evaluation of gut microbiome profiling by next-generation sequencing reveals major biases in partial-length metabarcoding approach. _Sci Rep_ 13, 22593 (2023). https://doi.org/10.1038/s41598-023-46062-7

* Received: 27 February 2023
* Accepted: 27 October 2023
* Published: 18 December 2023
* DOI: https://doi.org/10.1038/s41598-023-46062-7