BMC Bioinformatics. 2018 Oct 22;19(Suppl 11):358.
doi: 10.1186/s12859-018-2329-5.

Automated quality control for a molecular surveillance system

Seth Sims et al. BMC Bioinformatics.

Abstract

Background: Molecular surveillance and outbreak investigation are important for elimination of hepatitis C virus (HCV) infection in the United States. A web-based system, Global Hepatitis Outbreak and Surveillance Technology (GHOST), has been developed using Illumina MiSeq-based amplicon sequence data derived from the HCV E1/E2-junction genomic region to enable public health institutions to conduct cost-effective and accurate molecular surveillance, outbreak detection and strain characterization. However, many factors can degrade input data quality, and the GHOST system is not completely immune to them, so the accuracy of epidemiological inferences generated by GHOST may be affected. Here, we analyze the data submitted to the GHOST system during its pilot phase to assess the nature of the data and to identify common quality concerns that can be detected and corrected automatically.

Results: The GHOST quality control filters were individually examined, and quality failure rates were measured for all samples, including negative controls. New filters were developed and introduced to detect primer dimers, loss of specimen-specific product, or short products. The genotyping tool was adjusted to improve the accuracy of subtype calls. The identification of "chordless" cycles in a transmission network from data generated with known laboratory-based quality concerns allowed for further improvement of transmission detection by GHOST in surveillance settings. Parameters derived to detect actionable common quality control anomalies were incorporated into the automatic quality control module that rejects data depending on the magnitude of a quality problem, and warns and guides users in performing corrective actions. The guiding responses generated by the system are tailored to the GHOST laboratory protocol.

Conclusions: Several new quality control problems were identified in MiSeq data submitted to GHOST and used to improve protection of the system from erroneous data and users from erroneous inferences. The GHOST system was upgraded to include identification of causes of erroneous data and recommendation of corrective actions to laboratory users.

Keywords: HCV; HVR1; Molecular surveillance; Outbreak detection; Quality control; Transmission.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable since no personally identifiable information (PII) is contained in this study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests. The findings and conclusions in this article are those of the authors and do not necessarily represent the official position of the U.S. Centers for Disease Control and Prevention (CDC).

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Sankey diagram of read pair allocation for all samples after deduplication. Arrow thickness represents the proportion of read pairs removed by the filter step. The “not sampled” step represents those reads not used after random sampling of 20,000 read pairs
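
The sampling step named in this caption can be sketched as follows. This is a minimal illustration, not the GHOST implementation: the function name `subsample_read_pairs` and the fixed seed are assumptions, and only the 20,000 read pair limit comes from the caption.

```python
import random

def subsample_read_pairs(read_pairs, limit=20_000, seed=0):
    """Keep at most `limit` read pairs, chosen uniformly without replacement.

    Hypothetical helper: the name and seed are illustrative assumptions.
    Read pairs left out correspond to the "not sampled" step.
    """
    pool = list(read_pairs)
    if len(pool) <= limit:
        return pool
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return rng.sample(pool, limit)

kept = subsample_read_pairs(range(50_000))
not_sampled = 50_000 - len(kept)
```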
Fig. 2
Performance of primer dimer filter. a) Mean values of filters before the introduction of the primer dimer filter. b) Mean values of filters after the introduction of the primer dimer filter
Fig. 3
Histogram of primer dimer filter normalized values. Normalization is calculated with respect to the number of read pairs entering into the filter
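
The normalization described in this caption can be written as a one-line ratio. The sketch below assumes that a filter's raw value is the number of read pairs it removes; the function name and the zero-input convention are illustrative assumptions.

```python
def normalized_filter_value(removed, entering):
    """Fraction of the read pairs entering a filter that the filter removes.

    Assumption: the raw filter value is a count of removed read pairs.
    A filter that receives no read pairs is given the value 0.0.
    """
    if entering == 0:
        return 0.0
    return removed / entering

# Example: 1,500 of 20,000 entering read pairs flagged as primer dimers.
value = normalized_filter_value(1_500, 20_000)
```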
Fig. 4
Scatter plots of the primer dimer filter compared against 3 other filters
Fig. 5
Mean values (before normalization) of all GHOST QC task filters with respect to the 4 mutually exclusive sample sets
Fig. 6
Boxplots of filter distributions after normalization for all deduplicated samples
Fig. 7
Scatter plot showing samples in categories PNN, FN, and PN. Box shows the application of the three-threshold combination using minimization of Gini impurity index
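
Threshold selection by Gini impurity minimization can be illustrated for a single filter. The sketch below is a simplification under stated assumptions: the study combines three thresholds, whereas this example scans one normalized filter value, and the function names and toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of category labels (0.0 for a pure or empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Scan midpoints between sorted values and return (threshold, impurity)
    for the split with the lowest size-weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_score, best_t = float("inf"), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_score = score
            best_t = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_t, best_score

# Toy data: normalized filter values with hypothetical category labels.
threshold, impurity = best_threshold([0.1, 0.2, 0.8, 0.9], ["PN", "PN", "FN", "FN"])
```

Extending this to three filters would mean searching over combinations of per-filter thresholds and scoring the resulting box in the same way.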
Fig. 8
Breakdown of data categorizations using parameters from the three-filter threshold combination. Top row shows histograms of each of the three filters. Bottom row shows results of using any two of the filters alone
Fig. 9
Histogram of the ratio of bit score-derived log probabilities of best to second-best subtype matches of the sequences in all deduplicated samples submitted to GHOST. Solid line indicates the cutoff ratio of 2, with the area under the curve to the left of the cutoff representing unique sequences that are classified only at the genotype level
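
The cutoff rule in this caption can be sketched as below. Several details are assumptions rather than facts from the source: the log probabilities are taken as already computed from bit scores, the ratio is formed from their magnitudes, a ratio exactly equal to the cutoff is treated as a subtype call, and the genotype is taken to be the subtype name without its trailing letters.

```python
def call_type(best_subtype, best_logp, second_logp, cutoff=2.0):
    """Return the subtype when the best match clearly beats the runner-up,
    otherwise only the genotype.

    Hypothetical helper: inputs are precomputed log probabilities for the
    best and second-best subtype matches of one unique sequence.
    """
    if second_logp == 0:
        ratio = float("inf")  # no meaningful runner-up
    else:
        ratio = abs(best_logp) / abs(second_logp)
    if ratio < cutoff:
        # Left of the cutoff: classify at the genotype level only.
        return best_subtype.rstrip("abcdefghijklmnopqrstuvwxyz")
    return best_subtype

clear_call = call_type("1a", -120.0, -40.0)     # ratio 3.0, subtype kept
ambiguous_call = call_type("1b", -60.0, -40.0)  # ratio 1.5, genotype only
```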
Fig. 10
Histogram of prevalence ratios for all non-dominant subtypes where prevalence ratio is defined as the total frequency of the subtype divided by the total frequency of the dominant type
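
The prevalence ratio defined in this caption can be computed as follows. This is a minimal sketch: the function name, the input layout (a mapping from subtype to total frequency within one sample), and the example counts are assumptions.

```python
def prevalence_ratios(freq_by_subtype):
    """Map each non-dominant subtype to its total frequency divided by the
    total frequency of the dominant subtype in the same sample."""
    dominant = max(freq_by_subtype, key=freq_by_subtype.get)
    dominant_freq = freq_by_subtype[dominant]
    return {
        subtype: freq / dominant_freq
        for subtype, freq in freq_by_subtype.items()
        if subtype != dominant
    }

# Hypothetical sample dominated by subtype 1a.
ratios = prevalence_ratios({"1a": 900, "1b": 90, "3a": 10})
```

A very small ratio may point to low-level contamination rather than a true mixed infection, which is why the distribution of these ratios is worth inspecting.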
Fig. 11
All deduplicated samples submitted to GHOST, including artificially created panel verification samples and non-linking samples. Node and link colors were arbitrarily assigned to clusters
Fig. 12
All links found in GHOST. Nodes representing samples artificially created for panel verifications by state pilot participants were removed, along with non-linking samples. Box encloses a chordless cycle. Node and link colors were arbitrarily assigned to clusters
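
A chordless cycle is a cycle in which no two non-consecutive nodes are linked. The check below is a sketch under assumptions: links are undirected pairs of sample identifiers, the cycle has at least four nodes, and the function name is hypothetical. It tests a candidate cycle rather than searching the network for one.

```python
def is_chordless_cycle(links, cycle):
    """Return True if `cycle` (an ordered list of nodes) is a cycle of four or
    more distinct nodes in which only consecutive nodes are linked."""
    link_set = {frozenset(link) for link in links}
    n = len(cycle)
    if n < 4 or len(set(cycle)) != n:
        return False
    for i in range(n):
        for j in range(i + 1, n):
            consecutive = (j - i == 1) or (i == 0 and j == n - 1)
            linked = frozenset((cycle[i], cycle[j])) in link_set
            if consecutive != linked:
                return False  # missing cycle edge, or a chord is present
    return True

square = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
chordless = is_chordless_cycle(square, ["A", "B", "C", "D"])
with_chord = is_chordless_cycle(square + [("A", "C")], ["A", "B", "C", "D"])
```

In a transmission network built from genetic relatedness, samples in a true cluster tend to link densely, so a ring of samples without cross-links is an unusual pattern worth reviewing for quality problems.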
Fig. 13
All links found in GHOST with removal of nodes representing samples artificially created for panel verifications by state pilot participants and nodes representing samples associated with a project with known quality control issues. Non-linking samples removed. Node and link colors were arbitrarily assigned to clusters
