Microbiol Spectr. 2022 Oct 26;10(5):e0077022.
doi: 10.1128/spectrum.00770-22. Epub 2022 Aug 18.

Geochemistry and Multiomics Data Differentiate Streams in Pennsylvania Based on Unconventional Oil and Gas Activity

Maria Fernanda Campa et al. Microbiol Spectr.

Abstract

Unconventional oil and gas (UOG) extraction is increasing exponentially around the world, as new technological advances have provided cost-effective methods to extract hard-to-reach hydrocarbons. While UOG has increased the energy output of some countries, past research indicates potential impacts in nearby stream ecosystems as measured by geochemical and microbial markers. Here, we utilized a robust data set that combines 16S rRNA gene amplicon sequencing (DNA), metatranscriptomics (RNA), geochemistry, and trace element analyses to establish the impact of UOG activity in 21 sites in northern Pennsylvania. These data were also used to design predictive machine learning models to determine the UOG impact on streams. We identified multiple biomarkers of UOG activity and contributors of antimicrobial resistance within the order Burkholderiales. Furthermore, we identified expressed antimicrobial resistance genes, land coverage, geochemistry, and specific microbes as strong predictors of UOG status. Of the predictive models constructed (n = 30), 15 had accuracies higher than expected by chance and area under the curve values above 0.70. The supervised random forest models with the highest accuracy were constructed with 16S rRNA gene profiles, metatranscriptomics active microbial composition, metatranscriptomics active antimicrobial resistance genes, land coverage, and geochemistry (n = 23). The models identified the most important features within those data sets for classifying UOG status. These findings identified specific shifts in gene presence and expression, as well as geochemical measures, that can be used to build robust models to identify impacts of UOG development.

IMPORTANCE The environmental implications of unconventional oil and gas extraction are only recently starting to be systematically recorded. Our research shows the utility of microbial communities paired with geochemical markers to build strong predictive random forest models of unconventional oil and gas activity and the identification of key biomarkers. Microbial communities, their transcribed genes, and key biomarkers can be used as sentinels of environmental changes. Slight changes in microbial function and composition can be detected before chemical markers of contamination. Potential contamination, specifically from biocides, is especially concerning due to its potential to promote antibiotic resistance in the environment. Additionally, as microbial communities facilitate the bulk of nutrient cycling in the environment, small changes may have long-term repercussions. Supervised random forest models can be used to identify changes in those communities, greatly enhance our understanding of what such impacts entail, and inform environmental management decisions.
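
The abstract screens models by two criteria: accuracy above chance and an area under the curve (AUC) above 0.70. The sketch below illustrates how these two quantities can be computed for a binary classifier. It is not the authors' pipeline: the labels and scores are hypothetical, and "chance" is assumed here to mean the majority-class (no-information) rate, which the abstract does not specify.

```python
from collections import Counter


def roc_auc(labels, scores):
    """AUC as the probability that a random positive sample
    scores higher than a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def no_information_rate(labels):
    """Accuracy obtained by always predicting the most common class
    (assumed definition of 'chance' for this sketch)."""
    return max(Counter(labels).values()) / len(labels)


# Hypothetical data: 1 = UOG+ site, 0 = UOG- site.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]  # hypothetical predicted probabilities

auc = roc_auc(labels, scores)
baseline = no_information_rate(labels)
print(f"AUC = {auc:.3f}, chance accuracy = {baseline:.2f}")
```

A model would pass the screen described above only if its accuracy exceeded `baseline` and `auc` exceeded 0.70.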

Keywords: 16S rRNA; Marcellus shale; geochemistry; hydraulic fracturing; metatranscriptomics; natural gas.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

FIG 1
Map of all streams sampled in Pennsylvania for this study. The map shows sampling location, well pads location, cleared area, location of compressor stations, the pipeline right-of-way, wastewater pond, and watershed boundaries.
FIG 2
Correlogram of sample metadata showing significant Spearman rank correlations. Correlations were calculated using the water UOG data set. The square colors show whether the correlation was positive (blue) or negative (red), with darker squares indicating stronger correlations.
FIG 3
Principal coordinates analyses (PCoA) with vectors, indicating significant environmental parameters and 95% confidence intervals around the centroids, with lines connecting the centroids to their respective samples. Dark gray, UOG+ samples; light gray, UOG− samples. Circles, UOG− samples; squares, UOG+ samples, with the exception of samples from the two spill sites (ABR and LLR), which are shown with diamonds. UOG data sets are visualized in all panels. (A) Weighted Unifrac distance of total bacterial community, 16S rRNA gene-derived ASVs from streambed sediment. (B) Weighted Unifrac distance of 16S rRNA gene-derived ASVs from stream water. (C) Bray-Curtis dissimilarity of active microbial community, metatranscriptomics of streambed sediment. (D) Bray-Curtis dissimilarity of active genes, metatranscriptomics of streambed sediment. (E) Bray-Curtis dissimilarity of active antimicrobial resistance genes, metatranscriptomics of streambed sediment.
FIG 4
Burkholderiales top 20 most expressed genes in UOG+ (left) and UOG− (right), with genes highly expressed in both shown in the middle. The difference between each gene’s average normalized (based on counts per minute) expression in UOG− from its average in UOG+ is shown on the x axis. Therefore, negative values indicate higher expression in UOG+, while positive values indicate higher expression in UOG−. Axes are not consistent across panels. Several of the differences in expression were significant (see Table S7 in the supplemental material).
FIG 5
The four random forest models with the highest overall predictive accuracy for their input data. The top predictors of unconventional oil and gas status for each model are listed; the x axis represents the mean decrease in Gini index for each predictor, and the y axis lists the top 10 predictors for each model. (A) The BALANCED land cover and geochemistry model with metadata, overall accuracy 100%. (B) The PAIRED active antimicrobial resistance genes and metadata model, overall accuracy of 96.27%. (C) The BALANCED active microbial composition model, overall accuracy 93.45%. (D) The 16S rRNA gene amplicon ASVs UOG sediment model, overall accuracy 89.87%. These four models had AUC values >0.89.
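
The predictor rankings in FIG 5 are based on the mean decrease in Gini index. As a minimal sketch of the underlying quantity, the code below computes the decrease in Gini impurity produced by a single threshold split on one feature; a random forest accumulates such decreases over every split that uses the feature and averages across trees. The feature values, threshold, and function names are hypothetical and are not taken from the study.

```python
def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_c ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def gini_decrease(values, labels, threshold):
    """Impurity of the parent node minus the size-weighted impurity
    of the two children created by splitting at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Hypothetical measurements of one predictor at four sites.
values = [1.0, 2.0, 3.0, 4.0]

# A predictor that separates the classes perfectly at 2.5.
informative = gini_decrease(values, ["UOG-", "UOG-", "UOG+", "UOG+"], 2.5)

# A predictor whose split leaves both children as mixed as the parent.
uninformative = gini_decrease(values, ["UOG-", "UOG+", "UOG-", "UOG+"], 2.5)

print(informative, uninformative)
```

Predictors with larger accumulated decreases rank higher on the x axis of each panel.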

