Review
Virulence. 2013 Jan 1;4(1):97-106.
doi: 10.4161/viru.23161.

Bioinformatics analysis of large-scale viral sequences: from construction of data sets to annotation of a phylogenetic tree

Muhammad Munir. Virulence.

Abstract

Due to a significant decrease in the cost of DNA sequencing, the number of sequences submitted to public databases has increased dramatically in recent years. Efficient analysis of these data sets may substantially improve our understanding of the nature of pathogens such as bacteria, viruses and parasites. However, this growth has raised questions about the efficacy of currently available algorithms for studying pathogen evolution and constructing phylogenetic trees. While more advanced algorithms and corresponding programs are being developed, it is crucial to optimize the available ones to meet current needs. The protocol presented in this study is optimized using a number of strategies currently being proposed for handling large-scale DNA sequence data sets, and offers a highly efficacious and accurate method for computing phylogenetic trees with limited computer resources. Construction and annotation of a final tree of about 20,000 sequences may take up to 36 h.
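
The abstract does not list the individual strategies used to handle large data sets. As an illustration only, one widely used reduction step is to collapse identical sequences before alignment and tree building, so that each distinct sequence is analyzed once. The sketch below assumes plain FASTA input; the function names `parse_fasta` and `collapse_identical` and the example records are hypothetical and are not taken from the protocol.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def collapse_identical(records):
    """Map each distinct sequence (case-insensitive) to the headers sharing it."""
    groups = {}
    for header, seq in records:
        groups.setdefault(seq.upper(), []).append(header)
    return groups


# Hypothetical toy input: seqA and seqB differ only in letter case.
example = """>seqA
ATGGATTCC
>seqB
atggattcc
>seqC
ATGGACTCC
"""

groups = collapse_identical(parse_fasta(example))
for seq, headers in groups.items():
    print(len(headers), headers[0], seq)
```

Keeping the list of headers for each distinct sequence means the collapsed records can be mapped back onto the tips of the final tree during annotation.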

Figures

Figure 1. An illustration outlining the procedure demonstrated in this protocol. All the tools used in this protocol are listed on the left side of the figure, whereas tools that could be used in the future are listed on the right side.
Figure 2. A twenty-year perspective of non-structural (NS) gene sequence submissions to GenBank. The green bar represents the upsurge in sequence submissions during the 2009 swine flu pandemic.
Figure 3. An overview of the sequence alignment as seen in the CodonCode Aligner. The floating window shows the progress of the completion of the four fundamental steps: Initialization, Overlap Detection, Alignment and Data Model Update.
Figure 4. Processing of the data set for the construction of a phylogenetic tree in the raxmlGUI program. Note the two different windows, raxmlGUI 1.1 and raxmlGUI 1.1—Python—132 by 15. The progress of the tree construction is displayed in the latter window.
Figure 5. An overview of the tree annotated in FigTree. The clustering pattern of different subtypes of influenza viruses is highlighted with different colors. Subtypes such as H3, H5 and H6 formed larger clusters owing to their similar genetic nature. All other subtypes showed a diffuse pattern within the tree. For clarity, only a tree of the avian influenza NS1 gene is displayed.
Figure 6. A presentation of the phylogenetic tree (A) in radial view and (B) in a circle. These different display patterns are crucial to the final interpretation of results. The tree in radial view displays a clear division of the tree into two sub-trees.
