BMC Bioinformatics. 2012 Aug 17;13:206.
doi: 10.1186/1471-2105-13-206.

CaPSID: a bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes

Ivan Borozan et al. BMC Bioinformatics.

Abstract

Background: It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies presents an unparalleled opportunity for pathogen detection and discovery in human tissues but requires development of new genome-wide bioinformatics tools.

Results: Here we present CaPSID (Computational Pathogen Sequence IDentification), a comprehensive bioinformatics platform for identifying, querying and visualizing both exogenous and endogenous pathogen nucleotide sequences in tumor genomes and transcriptomes. CaPSID includes a scalable, high-performance database for data storage and a web application that integrates the genome browser JBrowse. CaPSID also provides useful metrics for sequence analysis of pre-aligned BAM files, such as gene and genome coverage, and is optimized to run efficiently on multiprocessor computers with low memory usage.
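
The abstract does not spell out how gene and genome coverage are computed. One common reading, taken here as an assumption, is the fraction of reference positions overlapped by at least one aligned read. The following is a minimal sketch of that calculation and not CaPSID's actual code: the names `merge_intervals` and `coverage_fraction` are hypothetical, and alignments are assumed to be already extracted from a BAM file as 0-based, half-open (start, end) intervals.

```python
from typing import Iterable, List, Tuple

# Assumption: each alignment is a 0-based, half-open interval [start, end)
# on a single reference sequence, already parsed from a BAM file.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into disjoint ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_fraction(alignments: Iterable[Interval],
                      region_start: int,
                      region_end: int) -> float:
    """Fraction of [region_start, region_end) covered by at least one read.

    Pass the whole genome as the region for genome coverage, or a gene's
    coordinates for gene coverage.
    """
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region must have positive length")
    covered = 0
    for start, end in merge_intervals(alignments):
        # Clip each merged block to the region before counting bases.
        lo = max(start, region_start)
        hi = min(end, region_end)
        if hi > lo:
            covered += hi - lo
    return covered / length


if __name__ == "__main__":
    reads = [(0, 50), (25, 75), (90, 100)]   # illustrative values only
    print(coverage_fraction(reads, 0, 100))  # genome of length 100
    print(coverage_fraction(reads, 60, 100))  # hypothetical gene at 60..100
```

Merging intervals first keeps positions covered by several reads from being counted more than once, and memory use depends on the number of merged blocks rather than on genome length.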

Conclusions: To demonstrate the usefulness and efficiency of CaPSID, we carried out a comprehensive analysis of both a simulated dataset and transcriptome samples from ovarian cancer. CaPSID correctly identified all of the human and pathogen sequences in the simulated dataset, while in the ovarian dataset CaPSID's predictions were successfully validated in vitro.

Figures

Figure 1
CaPSID platform. The CaPSID platform is made of three components: a computational pipeline written in Python for executing digital subtraction, a core MongoDB database for storing reference sequences and alignment results, and a web application in Grails for visualizing and querying the data.
Figure 2
Sortable tables of coverage statistics for a sample, as displayed by CaPSID.
Figure 3
CaPSID’s integrated genome browser JBrowse. CaPSID’s integrated genome browser JBrowse, displaying the distribution of read alignments from different samples for a given genome (reads aligning simultaneously to the human reference are shown in red).
Figure 4
The top four pathogen genomes hit in the 293T cells as calculated by CaPSID. The top four genomes hit with maximum coverage greater than 90%, ranked by their maximum gene coverage.
Figure 5
The distribution of aligned reads across the top four pathogen genomes hit in the OVCA0016 cells. A: The distribution of hits across the SV40 viral genome, with aligned reads concentrated almost entirely across its small and large T-antigens. B: The E1A and E1B genes are expressed in all three adenoviruses.
Figure 6
Analysis of adenovirus E1A/E1B and SV40 T antigen expression in OVCA0016 cultures. A: Western blotting analysis. Extracts from OVCA0016, 293T and H1299 cells were analyzed by western blotting using M73 (E1A), 2A6 (E1B55K) and Pab101 (SV40 T antigen, BD Pharmingen) antibodies, as described previously [21]. B: Immunofluorescence microscopy. The same cell types were grown on coverslips and analyzed by confocal immunofluorescence microscopy [22]. Cells not expressing T antigen are indicated with arrows on the figure.
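
The selection rule stated in the Figure 4 caption (keep genomes whose maximum coverage exceeds 90%, then rank by maximum gene coverage and take the top four) can be sketched as below. This is an illustration under assumptions, not CaPSID's implementation: the `GenomeHit` record, its field names and the `top_hits` function are hypothetical, and coverage is assumed to be stored as a fraction between 0 and 1.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GenomeHit:
    """Hypothetical per-genome summary; field names are assumptions."""
    genome: str
    max_coverage: float        # highest genome coverage seen, 0..1
    max_gene_coverage: float   # highest coverage of any single gene, 0..1


def top_hits(hits: Iterable[GenomeHit],
             min_coverage: float = 0.90,
             limit: int = 4) -> List[GenomeHit]:
    """Keep hits above the coverage threshold, ranked by gene coverage."""
    kept = [h for h in hits if h.max_coverage > min_coverage]
    # Highest maximum gene coverage first; sorted() is stable, so ties
    # keep their input order.
    ranked = sorted(kept, key=lambda h: h.max_gene_coverage, reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    # Placeholder values for illustration only; these are not results
    # from the paper.
    example = [
        GenomeHit("genome_A", 0.95, 0.80),
        GenomeHit("genome_B", 0.99, 0.97),
        GenomeHit("genome_C", 0.50, 1.00),  # dropped: coverage too low
    ]
    for hit in top_hits(example):
        print(hit.genome, hit.max_gene_coverage)
```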

References

    1. zur Hausen H. Infections Causing Human Cancer. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim; 2006.
    2. Javier RT, Butel JS. The history of tumor virology. Cancer Res. 2008;68:7693–7706.
    3. Hudson TJ. et al. International network of cancer genome projects. Nature. 2010;464:993–998.
    4. Feng H, Shuda M, Chang Y, Moore PS. Clonal Integration of a Polyomavirus in Human Merkel Cell Carcinoma. Science. 2009;319:1096–1100.
    5. Palacios G, Druce J, Du L, Tran T, Birch C, Briese T, Conlan S, Quan PL, Hui J, Marshall J, Simons JF, Egholm M, Paddock CD, Shieh WJ, Goldsmith CS, Zaki SR, Catton M, Lipkin WI. A new arenavirus in a cluster of fatal transplant-associated diseases. N Engl J Med. 2008;358:991–998.