BMC Bioinformatics. 2016 Jun 23:17:249.
doi: 10.1186/s12859-016-1134-2.

PRECOG: a tool for automated extraction and visualization of fitness components in microbial growth phenomics

Luciano Fernandez-Ricaud et al. BMC Bioinformatics. 2016.

Abstract

Background: Phenomics is a field in functional genomics that records variation in organismal phenotypes in the genetic, epigenetic or environmental context at a massive scale. For microbes, the key phenotype is the growth in population size because it contains information that is directly linked to fitness. Due to technical innovations and extensive automation, our capacity to record complex and dynamic microbial growth data is rapidly outpacing our capacity to dissect and visualize these data and extract the fitness components they contain, hampering progress in all fields of microbiology.

Results: To automate visualization, analysis and exploration of complex and highly resolved microbial growth data as well as standardized extraction of the fitness components it contains, we developed the software PRECOG (PREsentation and Characterization Of Growth-data). PRECOG allows the user to quality control, interact with and evaluate microbial growth data with ease, speed and accuracy, also in cases of non-standard growth dynamics. Quality indices filter high- from low-quality growth experiments, reducing false positives. The pre-processing filters in PRECOG are computationally inexpensive and yet functionally comparable to more complex neural network procedures. We provide examples where data calibration, project design and feature extraction methodologies have a clear impact on the estimated growth traits, emphasising the need for proper standardization in data analysis.

Conclusions: PRECOG is a tool that streamlines growth data pre-processing, phenotypic trait extraction, visualization, distribution and the creation of vast and informative phenomics databases.

Keywords: Automation; Data pre-processing; Data presentation; Fitness components; Phenomics; Yeast; growth.

Figures

Fig. 1
PRECOG’s overall design. a The functionality that PRECOG provides is organized as a pipeline that follows four basic steps: step 1 - data import, step 2 - data processing, step 3 - data visualization, and step 4 - data export. b Screenshots from the PRECOG desktop application. PRECOG’s user interface is divided into two zones: actions and views. The action zone controls the program’s functions: i) uploading data files, ii) setting parameters, iii) selecting experiments, iv) controlling graphs, and v) saving data. The views zone presents the data in different displays: vi) table view, vii) thumbnail view and viii) detailed curve view.
Fig. 2
Effects of data pre-processing. Effects of different types of noise in the raw data (red line) on the fully (black line) or partially (green line) processed data, the latter without the mean filter that removes spikes. If spikes are not removed, as in the case of wide spikes consisting of more than one data point, the processed data will be distorted (as seen in the lower two graphs). Figures are screenshots from PRECOG.
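The effect of spike width can be illustrated with a minimal sketch. It is not PRECOG's implementation: it uses a sliding median as a stand-in for the spike-removing filter, and the window size of 3, the function name `sliding_median` and the optical density values are all assumptions made for illustration.

```python
from statistics import median

def sliding_median(values, window=3):
    """Replace each point by the median of a centred window.

    Near the edges the window shrinks to the points that exist.
    The window size is an illustrative assumption, not a PRECOG setting.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(median(values[lo:hi]))
    return out

# Hypothetical optical density readings, not data from the paper.
narrow_spike = [0.10, 0.11, 0.90, 0.13, 0.14, 0.15]  # spike of one data point
wide_spike = [0.10, 0.11, 0.90, 0.90, 0.14, 0.15]    # spike of two data points

cleaned_narrow = sliding_median(narrow_spike)
cleaned_wide = sliding_median(wide_spike)

print(max(cleaned_narrow))  # the single-point spike is gone
print(max(cleaned_wide))    # the two-point spike survives a window of 3
```

In this sketch a spike wider than half the window passes through unchanged, which is one way a wide spike could end up distorting later processing steps.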
Fig. 3
Filtering data using PRECOG’s quality indices. a Data is filtered using four quality indices, QI1 - “overall noisiness”, QI2 - “local noisiness”, QI3 - “number of spikes”, and QI4 - “curve collapses”. Upper panels: performance of each quality filter on the “aggregated 90 k set”, including almost 90,000 growth curves. x-axis shows QI score, y-axis shows number of growth curves flagged at each QI score setting. Blue bars = non-cumulative flagging, red line = cumulative flagging, dashed black line = selected QI score setting that flags a cumulative 5 % of curves. Lower panels: performance of the QI filter (QI score, y-axis) on the two selected benchmarking sets of 100 high- and 100 low-quality curves (x-axis). Dashed horizontal lines: performance at the 5 % rejection threshold selected based on the “aggregated 90 k set” growth curves. b Summary performance of all quality indices. Number of curves that obtain 0, 1, 2, 3 and 4 flags in the “aggregated 90 k set” at the selected threshold, where each quality index flags the worst 5 % of growth curves, i.e. 90 % of all curves were not flagged by any of the quality indices while 2 % were flagged by all four. c Summary performance of all quality indices. Number of QI flags in the high- and low-quality benchmarking sets. Colours indicate the quality index responsible for the flagging, with blue = QI1, red = QI2, green = QI3 and purple = QI4. d Number of false positives and negatives in the two benchmarking sets, as a function of using various thresholds from the “aggregated 90 k set”.
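The thresholding described in this caption, where each quality index flags its worst 5 % of curves and flags are then counted per curve, can be sketched as follows. This is a hedged illustration only: the scores are random placeholders, the function names are hypothetical, and the sketch assumes that a higher score means a worse curve, which the caption does not state.

```python
import random

def threshold_for_worst_fraction(scores, fraction=0.05):
    """Return the score that the worst `fraction` of curves meet or exceed.

    Assumes higher score = worse curve (an assumption of this sketch).
    """
    ranked = sorted(scores, reverse=True)
    n_flagged = max(1, round(len(ranked) * fraction))
    return ranked[n_flagged - 1]

def count_flags(score_table, thresholds):
    """Count, per curve, how many quality indices meet or exceed their threshold."""
    return [
        sum(score >= thresholds[name] for name, score in curve.items())
        for curve in score_table
    ]

# Random placeholder scores for 1000 hypothetical curves, not PRECOG output.
random.seed(0)
names = ["QI1", "QI2", "QI3", "QI4"]
curves = [{name: random.random() for name in names} for _ in range(1000)]

thresholds = {
    name: threshold_for_worst_fraction([c[name] for c in curves])
    for name in names
}
flags = count_flags(curves, thresholds)
flagged_by_qi1 = sum(c["QI1"] >= thresholds["QI1"] for c in curves)

# Distribution of curves receiving 0, 1, 2, 3 or 4 flags, as in panel b.
flag_histogram = [flags.count(k) for k in range(5)]
print(flag_histogram)
```

With independent random scores, almost no curve collects several flags. The caption reports that 2 % of real curves were flagged by all four indices, which suggests that the indices overlap strongly on the poorest curves.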
Fig. 4
Benchmarking of PRECOG’s default data cleaning algorithm. We compared PRECOG’s pre-processing filters against a computationally demanding neural network pre-processing procedure. After the pre-processing, the fitness components growth lag, growth rate and growth efficiency were extracted by PRECOG’s standard procedure from the high- (left panel) and low-quality (right panel) benchmarking sets, and compared. In the right panel, low-quality curves flagged by all four quality indices, i.e. curves of very low quality, are marked (green with a red mark).
Fig. 5
Fitness component extraction from calibrated and non-calibrated growth curves. Fitness components were extracted from the high-quality benchmarking set of growth curves. a Example curves for each of the fitness components extracted (growth lag, rate and efficiency). For each estimated fitness component, markers (red triangles = non-calibrated data, black circles = calibrated data) indicate the data underlying that estimate. b Correlation between calibrated and non-calibrated data. Dotted line indicates the 1:1 relation. c Calibration function for different organisms. The recorded optical density (OD; x-axis) is shown against the actual population density (y-axis), estimated as the OD recorded for a diluted cell suspension multiplied by the dilution factor.
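The calibration idea in panel c, where the actual density is taken as the OD of a diluted suspension multiplied by its dilution factor, can be sketched as below. The measurement triples are invented for illustration, the function names are hypothetical, and piecewise-linear interpolation with clamping at the table ends is an assumption of this sketch; the caption does not say which functional form PRECOG fits.

```python
from bisect import bisect_right

def build_calibration(points):
    """Turn (recorded_od, diluted_od, dilution_factor) triples into a sorted
    table of (recorded_od, actual_density) pairs, where
    actual_density = diluted_od * dilution_factor."""
    return sorted((rec, dil * factor) for rec, dil, factor in points)

def calibrate(od, table):
    """Map a recorded OD to an actual density by piecewise-linear interpolation.

    Values outside the table are clamped to its end points (sketch assumption).
    """
    xs = [x for x, _ in table]
    ys = [y for _, y in table]
    if od <= xs[0]:
        return ys[0]
    if od >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, od)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (od - x0) / (x1 - x0)

# Hypothetical dilution series, not measurements from the paper.
measurements = [
    (0.2, 0.20, 1),   # undiluted: recorded OD equals actual density
    (1.0, 0.15, 10),  # actual density 1.5
    (2.0, 0.10, 50),  # actual density 5.0
]
table = build_calibration(measurements)

raw_curve = [0.2, 0.6, 1.0, 1.5, 2.0]
calibrated_curve = [calibrate(od, table) for od in raw_curve]
print(calibrated_curve)
```

The invented numbers make the recorded OD underestimate dense cultures, so the calibrated curve rises more steeply at high densities than the raw one.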
Fig. 6
Comparing two algorithms for extracting doubling time. a Effect of sampling frequency on doubling time. Sampling frequency denotes the fixed time interval between consecutive measurements, which the user sets at the start of the experiment. Upper panel: PRECOG’s default algorithm; lower panel: the algorithm based on linear regression. Averages from the high- and low-quality sets are indicated. b Doubling times extracted from data with 20-minute sampling intervals (our default value) for the high- (upper panel) and low-quality (lower panel) benchmarking sets are shown for the two algorithms.
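A regression-based doubling time estimate of the general kind named in this caption can be sketched as follows. The caption does not specify the regression window or any other detail of either algorithm, so this is a generic least-squares fit of log2(density) against time over all supplied points. The function name and the synthetic data are assumptions; only the 20-minute interval comes from the caption.

```python
import math

def doubling_time_by_regression(times_h, densities):
    """Fit log2(density) against time by ordinary least squares.

    The slope is in doublings per hour, so the doubling time is 1 / slope.
    Assumes all supplied points lie in the exponential phase.
    """
    logs = [math.log2(d) for d in densities]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx
    return 1.0 / slope

# Synthetic, noise-free exponential growth with a true doubling time of 2 h,
# sampled every 20 minutes. These are not data from the paper.
interval_h = 20 / 60
times = [i * interval_h for i in range(10)]
densities = [0.05 * 2 ** (t / 2.0) for t in times]

estimate = doubling_time_by_regression(times, densities)
print(round(estimate, 3))
```

On noise-free data the estimate does not depend on the sampling interval. Panel a concerns real curves, where noise and the choice of points entering the fit can make the extracted doubling time sensitive to how often the population is sampled.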
