Ecological insights into soil health according to the genomic traits and environment-wide associations of bacteria in agricultural soils

Soil microbiomes are sensitive to current and previous soil conditions, and bacterial ‘bioindicators’ of biological, physical, and chemical soil properties have considerable potential for


soil health assessment. However, the lack of ecological or physiological information for most soil microorganisms limits our ability to interpret the associations of bioindicators and, thus,


their utility for guiding management. We identified bioindicators of tillage intensity and twelve soil properties used to rate soil health using a 16S rRNA gene-based survey of farmland


across North America. We then inferred the genomic traits of bioindicators and evaluated their environment-wide associations (EWAS) with respect to agricultural management practice,


disturbance, and plant associations with 89 studies from agroecosystems. Most bioindicators were either positively correlated with biological properties (e.g., organic matter) or negatively


correlated with physical and chemical properties. Higher soil health ratings corresponded with smaller genome size and higher coding density, while lower ratings corresponded with larger


genomes and higher rrn copy number. Community-weighted genome size explained most variation in health ratings. EWAS linked prominent bioindicators with the impacts of environmental


disturbances. Our findings provide ecological insights into bioindicators of soil properties relevant to soil health management, illustrating the tight coupling of microbiome and soil


function.


Managing soil health promotes the long-term fertility and ecological integrity of agricultural lands [1, 2]. Soil health encompasses a range of soil properties that contribute value to


agroecosystems, including nutrient and water cycling, biodiversity, plant pathogen suppression, and pollution mitigation. Soil health is monitored using biological, physical, and chemical


indicators that correspond with these functions [3,4,5]. Ideally, indicators should be directly linked to soil function, interpretable, and exhibit a dynamic response to management practices


[6,7,8,9,10]. The soil microbiome has considerable potential to serve in this capacity. Microbial communities are highly sensitive to management practices [11,12,13,14], including those


that shape properties that determine soil health in agricultural systems [15,16,17,18,19,20]. The broad ecological and functional diversity of bacteria in soil provides rich information


about soil conditions, which was recently used to predict soil health status [21]. However, our ability to interpret the responses of bacterial ‘bioindicators’ is limited by our sparse


understanding of the ecology and function of most bacteria in soil. Bridging this gap between soil microbial ecology and soil health will improve the use of microbiome data in soil health


monitoring.


Ecological insight into soil microbiome structure and function can be derived by leveraging the large amounts of DNA sequencing data available in public repositories. One form of ecological


inference can be derived from genomic data, whereby microbial traits can be estimated from representative genomes that are close relatives of taxa observed in phylogenetic gene marker


surveys [22]. Genomic traits, such as genome size, codon usage bias, and rrn copy number, can be used to derive ecological information from trends in soil microbiome composition [23, 24]


based on the evolutionary tradeoffs between growth, survival, and reproduction shaping these traits [25,26,27]. Genomic traits form the basis of several life-history frameworks that group


bacteria by ecological strategies (e.g., ‘generalist’ vs. ‘specialist’) [28]; adaptive tradeoffs between growth rate, yield, and stress tolerance [26, 29, 30]; or metabolic dependency (e.g.,


‘prototrophic’ vs. ‘auxotrophic’) [31]. These frameworks have been used to interpret microbiome trends associated with agricultural management practices, such as tillage intensity and


nutrient management [32, 33].


While promising, the genomic inference of ecological traits has notable limitations. For example, many of the most active and abundant microorganisms in agricultural soils lack


representative genomes from which traits might be predicted [34,35,36,37]. Ecological information can still be derived for these non-cultivated organisms by profiling their phylogenetic gene


markers across the growing number of publicly available amplicon sequencing projects [38, 39]. An ‘environment-wide association survey’ (EWAS) approach follows the principle of reverse


ecology, where information is inferred from changes in the abundance and distribution of genes across sites [40], in our case the 16S rRNA phylogenetic marker gene across environmental


conditions. Traditional approaches assign a trait using curated databases [41, 42], which tend to exclude uncultured or poorly characterized taxa. This is problematic since unclassified taxa


are often indicative of soil properties relevant to soil health management [21, 37, 43, 44]. In contrast, EWAS requires no prior knowledge, given the capacity to obtain information for any


organism with a phylogenetic gene marker present in sequencing databases [45,46,47,48]. An EWAS approach is primarily limited by the poor quality of metadata reported for most sequencing


projects [49] and a historical lack of standardization in sequencing workflows. These drawbacks are partially compensated for by the sheer volume of available sequencing projects, and renewed


efforts to systematize data publishing will improve the efficacy of EWAS over time [50].
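
The reverse-ecology logic behind an EWAS can be illustrated with a deliberately simplified sketch. It assumes a hypothetical table of records for a single OTU, in which each record gives the study, the study-factor category, the sample group ('treatment' or 'control'), and the relative abundance of the OTU. The field names and the direction-only comparison are illustrative assumptions and do not represent the statistical workflow used in this study.

```python
from collections import Counter, defaultdict
from statistics import mean


def ewas_tally(records):
    """Tally, per study-factor category, the direction in which one OTU differs
    between treatment and control samples within each study.

    records: iterable of dicts with the hypothetical keys
    'study', 'category', 'group' ('treatment' or 'control'), 'rel_abundance'.
    """
    by_study = defaultdict(lambda: {"treatment": [], "control": []})
    category_of = {}
    for rec in records:
        by_study[rec["study"]][rec["group"]].append(rec["rel_abundance"])
        category_of[rec["study"]] = rec["category"]

    tally = defaultdict(Counter)
    for study, groups in by_study.items():
        # Skip studies that lack one of the two groups for this OTU.
        if not groups["treatment"] or not groups["control"]:
            continue
        diff = mean(groups["treatment"]) - mean(groups["control"])
        if diff > 0:
            direction = "higher"
        elif diff < 0:
            direction = "lower"
        else:
            direction = "no_change"
        tally[category_of[study]][direction] += 1
    return {cat: dict(counts) for cat, counts in tally.items()}


# Illustrative (made-up) records for one OTU across two studies.
example_records = [
    {"study": "A", "category": "tillage", "group": "treatment", "rel_abundance": 0.02},
    {"study": "A", "category": "tillage", "group": "treatment", "rel_abundance": 0.04},
    {"study": "A", "category": "tillage", "group": "control", "rel_abundance": 0.01},
    {"study": "B", "category": "drought", "group": "treatment", "rel_abundance": 0.01},
    {"study": "B", "category": "drought", "group": "control", "rel_abundance": 0.03},
]
example_tally = ewas_tally(example_records)
```

A real analysis would replace the direction-only comparison with a within-study statistical test and would need to account for differences in sequencing workflows between studies.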


Our study identified and characterized bacterial bioindicators of soil properties used in soil health assessment using a large amplicon sequencing survey of farmland across North America.


Our first objective was to utilize 16S rRNA gene sequencing data to identify bioindicators that correlate with twelve biological (e.g., organic matter), physical, and chemical soil


properties used in soil health assessment. We focused on profiling specific bioindicator species given the relatively minor differences observed in diversity metrics reported for our dataset


[21]. Our second objective was to evaluate trends in bioindicators using (i) inferred genomic traits and (ii) a 16S rRNA gene-based EWAS to understand the ecological basis for their


associations with soil health. For (i), we tested whether trends in community-weighted genomic traits corresponded to variation in soil health ratings. For (ii), we explored the


environment-wide associations (EWAS) of key bioindicators using a database composed of agricultural microbiomes (derived from 89 prior studies) that included diverse metadata grouped by


study factors into broad (management practice, disturbance, and plant association) and specific categories (fertilization, land use, tillage, drought, etc.). This combined approach yielded


ecological information about the most abundant bioindicators of soil health and provided new perspectives on the relationships between the soil microbiome and properties related to healthy


soil function.
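
A community-weighted genomic trait, as referred to in objective (i), is commonly computed as the abundance-weighted mean of a per-taxon trait value within each sample. The sketch below is a minimal illustration under stated assumptions: the function name, the input layout (samples in rows, OTUs in columns), and the choice to renormalize over OTUs with a trait estimate are ours and are not taken from the study.

```python
import numpy as np


def community_weighted_mean(abundance, trait):
    """Abundance-weighted mean trait value per sample.

    abundance: samples x OTUs array of counts or relative abundances.
    trait: per-OTU trait estimates (e.g., inferred genome size); NaN marks
           OTUs without an estimate.
    Assumption: OTUs without an estimate are dropped and the remaining
    abundances are renormalized to sum to one within each sample.
    """
    abundance = np.asarray(abundance, dtype=float)
    trait = np.asarray(trait, dtype=float)

    known = ~np.isnan(trait)
    sub = abundance[:, known]
    totals = sub.sum(axis=1, keepdims=True)

    # Relative abundance among OTUs with a trait estimate; zero where a
    # sample contains none of them (set to NaN below).
    rel = np.divide(sub, totals, out=np.zeros_like(sub), where=totals > 0)
    cwm = rel @ trait[known]
    cwm[totals.ravel() == 0] = np.nan
    return cwm


# Illustrative (made-up) counts for two samples and three OTUs; the third
# OTU has no genome size estimate.
counts = [[10, 30, 60],
          [50, 50, 0]]
genome_size_mbp = [4.0, 6.0, np.nan]
cwm_genome_size = community_weighted_mean(counts, genome_size_mbp)
```

Renormalizing over OTUs with an estimate keeps the result on the scale of the trait, but it also means the value describes only the fraction of the community that could be matched to a representative genome.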


Our primary dataset consisted of 778 soil samples sourced from farmland across the USA, representing diverse cropping systems, as part of a soil health initiative led by Cornell University


and the USDA Natural Resources Conservation Service. Soils originated from 191 unique locations that differed in agricultural management practices and soil health ratings. This dataset was


used in a separate study to test the accuracy of microbiome-based machine learning for predicting soil health [21]. Our study aims to identify bioindicators and explore the underlying


ecological basis for their association with soil health ratings, neither of which has yet been examined.


The soil properties of each sample were collected using the Comprehensive Assessment of Soil Health (CASH) framework (Table S1), which uses biological (soil organic matter, respiration, ACE


protein, and active carbon, also known as ‘permanganate oxidizable organic carbon’), chemical (pH, phosphorus, potassium, and minor elements), and physical ratings (aggregate stability,


available water capacity, soil texture, and surface and sub-surface hardness) to assess soil health [7]. Tillage data was collected for most soils (n = 599) and was coded as ‘till’ vs. ‘no


till.’ Surface and sub-surface hardness ratings were inverted so that more compacted soils corresponded with higher ratings (opposite of CASH framework); these ratings were present for a


subset of samples (n = 309 and 292, respectively). Measurements for each soil property were transformed using a scoring function to create a normalized rating that accounts for differences


in soil texture [7]. A total health score was then calculated from the unweighted mean of all twelve ratings. Perspectives on the nature of soil health assessment and health indicators


continue to evolve [10]. The soil properties in the CASH framework have been used extensively to assess the impacts of soil management practices on soil function [7].
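
The arithmetic of the total health score and the inversion of the hardness ratings can be sketched as follows. This is an illustration under stated assumptions: ratings are taken to lie on a 0 to 100 scale, missing ratings (such as hardness in samples where it was not measured) are skipped when averaging, and the property names are placeholders. The CASH scoring functions that convert raw measurements into ratings are not reproduced here.

```python
import math


def invert_rating(rating, scale_max=100.0):
    """Flip a rating so that a high value becomes low and vice versa.

    Assumption: ratings run from 0 to scale_max.
    """
    return scale_max - rating


def total_health_score(ratings):
    """Unweighted mean of the available ratings for one sample.

    ratings: dict mapping a property name to its rating, or to None if the
    property was not measured. Assumption: missing ratings are skipped.
    """
    values = [value for value in ratings.values() if value is not None]
    if not values:
        return math.nan
    return sum(values) / len(values)


# Illustrative (made-up) ratings for one sample with one missing property.
sample_ratings = {"organic_matter": 80.0, "aggregate_stability": 60.0, "surface_hardness": None}
sample_score = total_health_score(sample_ratings)
inverted_hardness = invert_rating(30.0)
```

Because the mean is unweighted, each measured property contributes equally to the total score regardless of whether it is biological, chemical, or physical.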


Total DNA was extracted from soils to determine bacterial community composition and was also used to estimate microbial biomass [51]. DNA was extracted using the DNeasy PowerSoil Kit, as per


the manufacturer's recommendations (QIAGEN, Germantown, MD, USA). DNA concentration was quantified in triplicate using the Quant-iT™ PicoGreen™ dsDNA Assay Kit (Thermo Fisher Scientific, Inc.,


Waltham, MA, USA). Bacterial community composition was determined through amplicon sequencing of the V4 region of the 16S rRNA gene using Illumina MiSeq (2 × 250 paired-end) and dual-indexed


barcoded primers (515f/806r; sequences provided in Table S2) as previously described [21, 52]. Demultiplexing, filtering and trimming, and chimera removal were performed with QIIME2 (v.


2020.2) [53] using default parameters and trimming left and right by 5 bp. Operational taxonomic units (OTUs) were defined as amplicon sequence variants and assigned taxonomic


classifications using QIIME2 with dependencies on DADA2 [38] and the Silva database (nr_v132) [54], respectively. Raw sequencing data was archived at the National Center for Biotechnology


Information (BioProject: PRJEB35975).


Bacterial OTUs indicative of soil health were determined by Spearman rank correlations using the ‘rcorr’ function in the R package Hmisc (v. 1.34.0) [55]. Prior to correlation analyses, OTUs


occurring at low frequency (fewer than 10 samples), and at low relative abundance (