J Comput Sci Syst Biol. 2008 Dec 26:1:132.
doi: 10.4172/jcsb.1000013.

Management of High-Throughput DNA Sequencing Projects: Alpheus

Neil A Miller et al. J Comput Sci Syst Biol.

Abstract

High-throughput DNA sequencing has enabled systems biology to begin to address areas in health, agricultural and basic biological research. Concomitant with these opportunities is an absolute necessity to manage significant volumes of high-dimensional and inter-related data and analyses. Alpheus is an analysis pipeline, database and visualization software for use with massively parallel DNA sequencing technologies that feature multi-gigabase throughput characterized by relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis), Roche-454 (pyrosequencing) and Applied Biosystems' SOLiD (sequencing-by-ligation). Alpheus enables alignment to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression levels in transcriptome sequence. Alpheus is able to detect several types of variants, including non-synonymous and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls based on consistency, expected allele frequency, sequence quality, coverage, and variant type in order to minimize false positives while maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between cases and controls or bulk segregant pools. Sequence-based differential expression comparisons can be developed, with data export to SAS JMP Genomics for statistical analysis.
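The filtering of variant calls described in the abstract can be illustrated with a minimal sketch. This is not Alpheus code: the `VariantCall` record, its field names, and every threshold below are hypothetical, chosen only to show how coverage, sequence quality, allele frequency, and variant type might be combined into a single pass/fail test. The consistency criterion is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical record for one candidate variant (not the Alpheus schema)."""
    gene: str
    position: int
    variant_type: str    # e.g. "snp" or "indel"
    variant_reads: int   # reads supporting the variant allele
    total_reads: int     # coverage at the position
    mean_quality: float  # mean base quality of the supporting reads

    @property
    def allele_frequency(self) -> float:
        # Fraction of reads at this position that carry the variant allele.
        return self.variant_reads / self.total_reads if self.total_reads else 0.0


def passes_filters(call, min_coverage=4, min_quality=20.0,
                   min_allele_freq=0.3, allowed_types=("snp", "indel")):
    """Return True if the call meets every (assumed) threshold."""
    return (call.total_reads >= min_coverage
            and call.mean_quality >= min_quality
            and call.allele_frequency >= min_allele_freq
            and call.variant_type in allowed_types)


def filter_calls(calls, **thresholds):
    """Keep only the calls that pass all filters, preserving input order."""
    return [c for c in calls if passes_filters(c, **thresholds)]


calls = [
    VariantCall("G1", 100, "snp", 8, 10, 30.0),   # passes all defaults
    VariantCall("G2", 5, "snp", 1, 10, 30.0),     # allele frequency too low
    VariantCall("G3", 7, "indel", 3, 3, 35.0),    # coverage too low
    VariantCall("G4", 9, "snp", 5, 8, 10.0),      # quality too low
]
kept = filter_calls(calls)
```

Tightening the thresholds trades sensitivity for fewer false positives, which is the balance the abstract describes; relaxing `min_coverage` to 3 in this toy data would additionally retain the `G3` indel.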

Figures

Figure 1
Key components of Alpheus and data flow through the system.
Figure 2
Partial schema of Alpheus. Transcriptome alignments and substitution sequence variants are stored in this core schema, as described in detail in Materials and Methods.
Figure 3
Overlaid kernel density estimates of gene expression by sequence read frequencies. Gene expression of whole blood mRNA for normal Illumina library prep (red), fragmented after poly-A selection, and with Ribo(−) exclusion. The X-axis shows log2-transformed gene expression values, while the Y-axis shows kernel densities. Without log transformation, samples showed greater variability in kernel densities and sequence read frequencies showed near-exponential decay.
Figure 4
Unsupervised PCA of expression data. Three-dimensional plot of unsupervised PCA by Pearson product-moment correlation of log sequence expression. Normal (Red) and fragmented (Blue) libraries are more similar to each other than to the Ribo(−) prepped libraries (blue).
Figure 5
Hierarchical clustering of log-transformed expression data. 13,791 of 33,887 total genes were at least two-fold different. Most genes had much higher expression in both normal and fragmented library preps than in Ribo(−). Normal and fragmented preps had 6,577 genes that were at least two-fold different. 10,404 genes were at least two-fold different between normal and Ribo(−), while 11,104 genes were at least two-fold different between fragmented and Ribo(−) library preps.
Figure 6
Pairwise sample correlations of log2-transformed read frequencies, showing pairwise correlation coefficients. Pairwise comparisons suggest a fairly linear relationship between gene expression under the normal library technique and the fragmented technique, while there is a much greater spread in the frequency distribution between Ribo(−) and the normal and fragmented techniques.
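
The captions for Figures 3 and 5 refer to log2-transformed expression values and to genes that are "at least two-fold different". As a rough sketch of that arithmetic, a two-fold difference corresponds to an absolute difference of at least 1 on the log2 scale. The function names and the pseudocount of 1 (added so that zero read counts can be transformed) are assumptions for illustration; the source does not state how zero counts or normalization were handled.

```python
import math


def log2_expression(read_count, pseudocount=1.0):
    """log2-transform a read count; the pseudocount is an assumed convention."""
    return math.log2(read_count + pseudocount)


def is_two_fold_different(count_a, count_b, pseudocount=1.0):
    """True if the two counts differ by at least two-fold after transformation."""
    diff = (log2_expression(count_a, pseudocount)
            - log2_expression(count_b, pseudocount))
    return abs(diff) >= 1.0


def count_two_fold(counts_a, counts_b):
    """Count genes present in both samples that differ by at least two-fold."""
    shared = counts_a.keys() & counts_b.keys()
    return sum(1 for gene in shared
               if is_two_fold_different(counts_a[gene], counts_b[gene]))


# Toy read counts per gene for two hypothetical library preps.
normal = {"geneA": 3, "geneB": 3, "geneC": 7}
ribo_minus = {"geneA": 1, "geneB": 2, "geneC": 0}
n_different = count_two_fold(normal, ribo_minus)
```

In this toy data, `geneA` and `geneC` meet the threshold while `geneB` does not. Real comparisons would also need to account for differences in sequencing depth between libraries, which this sketch ignores.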
