Management of High-Throughput DNA Sequencing Projects: Alpheus

PMID: 20151039
PMCID: PMC2819532
DOI: 10.4172/jcsb.1000013

Neil A Miller et al. J Comput Sci Syst Biol. 2008 Dec 26:1:132. doi: 10.4172/jcsb.1000013.

Affiliation

¹ National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA.

Abstract

High-throughput DNA sequencing has enabled systems biology to begin to address areas in health, agricultural and basic biological research. Concomitant with the opportunities is an absolute necessity to manage significant volumes of high-dimensional and inter-related data and analysis. Alpheus is an analysis pipeline, database and visualization software for use with massively parallel DNA sequencing technologies that feature multi-gigabase throughput characterized by relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis), Roche-454 (pyrosequencing) and Applied Biosystems' SOLiD (sequencing-by-ligation). Alpheus enables alignment to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression levels in transcriptome sequence. Alpheus is able to detect several types of variants, including non-synonymous and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls based on consistency, expected allele frequency, sequence quality, coverage, and variant type in order to minimize false positives while maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between cases and controls or bulk segregant pools. Sequence-based differential expression comparisons can be developed, with data export to SAS JMP Genomics for statistical analysis.
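The variant-call filtering described in the abstract can be illustrated with a minimal Python sketch. This is not the Alpheus implementation: the `VariantCall` record, the `passes_filters` function, the field names and every threshold below are hypothetical, and "consistency" is approximated here, as an assumption, by a minimum number of reads supporting the variant.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical record for one candidate variant (not the Alpheus schema)."""
    gene: str
    kind: str            # e.g. "nsSNP", "sSNP", "indel", "stop", "splice"
    variant_reads: int   # reads supporting the variant allele
    total_reads: int     # coverage at the variant position
    mean_quality: float  # mean base quality of the supporting reads

    @property
    def allele_frequency(self) -> float:
        # Fraction of reads at this position that carry the variant allele.
        return self.variant_reads / self.total_reads if self.total_reads else 0.0


def passes_filters(call, *, min_coverage=8, min_variant_reads=2,
                   min_quality=20.0, min_allele_freq=0.2, allowed_kinds=None):
    """Return True if the call survives every filter (thresholds are illustrative)."""
    if call.total_reads < min_coverage:          # coverage filter
        return False
    if call.variant_reads < min_variant_reads:   # stand-in for "consistency"
        return False
    if call.mean_quality < min_quality:          # sequence quality filter
        return False
    if call.allele_frequency < min_allele_freq:  # expected allele frequency filter
        return False
    if allowed_kinds is not None and call.kind not in allowed_kinds:
        return False                             # variant type filter
    return True


# Made-up example calls, for illustration only.
calls = [
    VariantCall("GENE1", "nsSNP", variant_reads=6, total_reads=20, mean_quality=30.0),
    VariantCall("GENE2", "indel", variant_reads=2, total_reads=4, mean_quality=35.0),
    VariantCall("GENE3", "sSNP", variant_reads=3, total_reads=40, mean_quality=28.0),
]
kept = [c.gene for c in calls if passes_filters(c)]
```

Tightening the thresholds trades sensitivity for specificity, which is the balance between false positives and true positives that the abstract refers to.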

Figures

**Figure 1**
Key components of *Alpheus* and data flow through the system.

**Figure 2**
Partial schema of *Alpheus*. Transcriptome alignments and substitution sequence variants are stored in this core schema, as described in detail in Materials and Methods.

**Figure 3**
Overlaid kernel density estimates of gene expression by sequence read frequencies. Gene expression of whole blood mRNA is shown for three library preps: normal Illumina (red), fragmented after poly-A selection, and with Ribo(−) exclusion. The X-axis shows log2-transformed gene expression values, while the Y-axis shows kernel densities. Without log transformation, samples showed greater variability in kernel densities, and sequence read frequencies showed near-exponential decay.

**Figure 4**
Unsupervised PCA of expression data. Three-dimensional plot of unsupervised PCA by Pearson product-moment correlation of log sequence expression. Normal (red) and fragmented (blue) libraries are more similar to each other than to the Ribo(−)-prepped libraries (blue).

**Figure 5**
Hierarchical clustering of log-transformed expression data. Of 33,887 total genes, 13,791 were at least two-fold different. Most genes had much higher expression in both normal and fragmented library preps than in Ribo(−). Normal and fragmented preps had 6,577 genes that were at least two-fold different. 10,404 genes were different between normal and Ribo(−), while 11,104 genes were two-fold different between fragmented and Ribo(−) library preps.

**Figure 6**
Pairwise sample correlations of log2-transformed read frequencies, showing pairwise correlation coefficients. Pairwise comparisons suggest a fairly linear distribution of gene expression between the normal library technique and the fragmented technique, while there is a much wider frequency distribution between Ribo(−) and the normal and fragmented techniques.
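The comparisons summarized in the figure captions above (log2 transformation of read frequencies, pairwise Pearson correlation, and counting genes that differ at least two-fold) can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis performed in the paper: the function names, the pseudocount of 1 added before the log transform, and the toy read counts are all assumptions.

```python
import math


def log2_expr(counts, pseudocount=1.0):
    """Log2-transform read counts; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(c + pseudocount) for c in counts]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def count_two_fold(x_log2, y_log2):
    """Count genes whose log2 expression differs by at least 1 (two-fold or more)."""
    return sum(1 for a, b in zip(x_log2, y_log2) if abs(a - b) >= 1.0)


# Toy per-gene read counts for two library preps (made-up numbers).
normal_counts = [1, 3, 7, 15, 31]
ribo_counts = [3, 3, 1, 15, 127]

normal_log2 = log2_expr(normal_counts)
ribo_log2 = log2_expr(ribo_counts)

r = pearson(normal_log2, ribo_log2)
n_diff = count_two_fold(normal_log2, ribo_log2)
```

On the log2 scale a two-fold change corresponds to a difference of exactly 1, which is why the threshold in `count_two_fold` is 1.0.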

Grants and funding

P20 RR016480/RR/NCRR NIH HHS/United States
