Viruses. 2016 Feb 19;8(2):53.
doi: 10.3390/v8020053.

Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers

Jens Friis-Nielsen et al. Viruses.

Abstract

Virus discovery from high-throughput sequencing data often follows a bottom-up approach in which taxonomic annotation takes place before association with disease. Although effective in some cases, this approach fails to detect novel pathogens and remote variants that are not present in reference databases. We have developed a species-independent pipeline that uses sequence clustering to identify nucleotide sequences that co-occur across multiple sequencing data sets. We applied the workflow to 686 sequencing libraries from 252 cancer samples of different cancer and tissue types, 32 non-template controls, and 24 test samples. Recurrent sequences were statistically associated with biological, methodological or technical features, with the aim of identifying novel pathogens or plausible contaminants that may be associated with a particular kit or method. We provide examples of identified inhabitants of the healthy tissue flora as well as experimental contaminants. Unmapped sequences that co-occur with high statistical significance potentially represent the unknown sequence space in which novel pathogens can be identified.
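The abstract does not state which statistical test links a recurrent sequence cluster to a feature. As a minimal sketch, assuming a one-sided Fisher's exact test on a 2x2 table of libraries (feature present or absent, cluster present or absent), the association could be computed as follows. The function names and the toy library sets are hypothetical and are not taken from the published pipeline.

```python
from math import comb


def contingency(cluster_libs, feature_libs, all_libs):
    """Build a 2x2 table from sets of library identifiers (hypothetical layout).

    Returns (a, b, c, d):
      a = libraries with the feature that contain the cluster
      b = libraries with the feature that lack the cluster
      c = libraries without the feature that contain the cluster
      d = libraries with neither
    """
    a = len(cluster_libs & feature_libs)
    b = len(feature_libs - cluster_libs)
    c = len(cluster_libs - feature_libs)
    d = len(all_libs - cluster_libs - feature_libs)
    return a, b, c, d


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test: P(X >= a) under the hypergeometric null.

    This is an assumed choice of test for illustration only.
    """
    n_with = a + b       # libraries carrying the feature
    n_without = c + d    # libraries not carrying the feature
    k = a + c            # libraries in which the cluster was observed
    denom = comb(n_with + n_without, k)
    upper = min(n_with, k)
    tail = sum(comb(n_with, x) * comb(n_without, k - x)
               for x in range(a, upper + 1))
    return tail / denom


# Toy data: 10 libraries, 5 prepared with some kit, cluster seen in 4 of those.
all_libs = set(range(10))
feature_libs = {0, 1, 2, 3, 4}
cluster_libs = {0, 1, 2, 3}
p_value = fisher_exact_greater(*contingency(cluster_libs, feature_libs, all_libs))
```

With thousands of clusters tested against many features, a multiple-testing correction would normally be applied to such p-values; which correction the authors used is not stated here.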

Keywords: assay contamination; cancer causing viruses; next generation sequencing; novel sequence identification; oncoviruses; sequence clustering; taxonomic characterisation.

Figures

Figure 1
Schematic representation of the bioinformatics pipeline used to process sequencing reads from all data sets. The ‘preprocessing’ step includes removal of adapter sequences, trimming of low-quality sequences, and merging of paired-end reads. Data sets progress in parallel until the ‘clustering’ step, where contigs from all data sets are pooled and grouped.
Figure 2
p-values of all significant associations. Rows describe features, with biological features in red, methodological in green and technical in blue. There are 73 features significantly associated with one or more clusters. Columns describe all significant associations of each of the 6165 unique clusters. The cluster identifiers have been omitted to avoid clutter.
Figure 3
Lowest p-values of clusters established by the pipeline. The p-values are arranged by the feature of the strongest significant association of each of the 6165 clusters. The 50 features involved as strongest associations have been coloured by type: biological (red), methodological (green), and technical (blue). The boxes span the first and third quartiles. The dark band inside each box represents the median. The whiskers of the boxes extend to the lowest and highest values within a distance of 1.5 times the interquartile range. As can be seen, most p-values were above 1e-24, but a few methodological features have associated clusters with very low p-values, such as f056, f068, f069, f076, f079, and f084. The library preparation kit ScriptSeq v2 RNA-Seq, Illumina (f084) displays strongly associated clusters, with p-values as low as 3.04e-89, that mapped to the species Avian myeloblastosis-associated virus. Clusters annotated as the NCBI species Parvovirus NIH/CQV were associated with the laboratory kit RNeasy MinElute, Qiagen (f076), with a minimum p-value of 5.48e-38. Finally, a cluster annotated as Acanthocystis turfacea chlorella virus MN0810.1 (ATCV) was associated with DNase/RNase: Promega DNase stop solution (f069) with p-value = 4.19e-12.
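The box-plot convention described in this caption (quartile box, median band, whiskers reaching the most extreme values within 1.5 times the interquartile range) can be sketched in a few lines. This is an illustration only: the quartile interpolation method is an assumption (plotting libraries differ), the function name is hypothetical, and the input values are made up rather than taken from the figure.

```python
from statistics import quantiles


def box_stats(values):
    """Summary statistics for a box plot with 1.5 x IQR whiskers.

    Quartiles use the 'inclusive' interpolation method, which is an
    assumption; other software may use a different definition.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme data points inside the fences.
        "lower_whisker": min(v for v in data if v >= low_fence),
        "upper_whisker": max(v for v in data if v <= high_fence),
        # Points beyond the fences would be drawn individually.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Made-up values; for p-values spanning many orders of magnitude one would
# typically pass log10-transformed values instead.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this toy input the value 100 lies beyond the upper fence, so the upper whisker ends at 9 and 100 is reported as an outlier.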
Figure 4
Unmapped clusters. The clusters are placed by their most strongly associated feature. Feature types are marked in colour as follows: biological (red), methodological (green), and technical (blue). Top: Number of clusters associated with each feature on a log-10 scaled axis. There are 648 clusters associated with the feature DNase/RNase: Promega DNase stop solution (f069), and 1 cluster associated with the feature Polymerases: Phusion HF, NEB (f086). Bottom: Base-pair length (bp) of all cluster representatives (the longest contig of each cluster) on a log-10 scaled axis. The N50 of all unmapped cluster representatives is marked by a brown dot. The longest cluster representative is 33.6 kb, with N50 = 617 bp.
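For readers unfamiliar with the N50 statistic quoted in this caption, the sketch below uses the common definition: the length L such that sequences of length L or longer together contain at least half of all bases. The function name and the example lengths are hypothetical; the caption does not say which tool the authors used to compute N50.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half of all
        # bases is the N50.
        if running >= half_total:
            return length


# Toy example: total = 54 bp, half = 27 bp; 10 + 9 + 8 = 27, so N50 is 8.
example = n50([10, 9, 8, 7, 6, 5, 4, 3, 2])
```

Because N50 is weighted by length, a single very long representative (such as the 33.6 kb one) can coexist with a much smaller N50 when most of the bases sit in short sequences.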
