Mol Biol Evol. 2018 Jan 1;35(1):252-255.
doi: 10.1093/molbev/msx283.

CompositeSearch: A Generalized Network Approach for Composite Gene Families Detection


Jananan Sylvestre Pathmanathan et al. Mol Biol Evol.

Abstract

Genes evolve by point mutations, but also by shuffling, fusion, and fission of genetic fragments. Therefore, similarity between two sequences can be due to common ancestry producing homology, and/or partial sharing of component fragments. Disentangling these processes is especially challenging in large molecular data sets, because of computational time. In this article, we present CompositeSearch, a memory-efficient, fast, and scalable method to detect composite gene families in large data sets (typically in the range of several million sequences). CompositeSearch generalizes the use of similarity networks to detect composite and component gene families with a greater recall, accuracy, and precision than recent programs (FusedTriplets and MosaicFinder). Moreover, CompositeSearch provides user-friendly quality descriptions regarding the distribution and primary sequence conservation of these gene families allowing critical biological analyses of these data.

Keywords: bioinformatics; evolution; molecular evolution; network analysis; protein sequence analysis.


Figures

Fig. 1
(A) Top: Example of a composite gene. Gene family 3 evolved from a composite of families 1 and 2. Bottom: Sequences from family 3 partially align with sequences from families 1 and 2. (B) Similarity network of a composite gene family (red) and its component gene families (green and purple). MosaicFinder will detect only the top case where composite genes form a clique, whereas CompositeSearch detects composite gene families forming a clique (top) or quasi-clique (bottom).
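
The partial-alignment pattern in panel A can be illustrated with a minimal sketch. This is not the CompositeSearch algorithm, which works at the level of gene families in a similarity network. It is a simple triplet-style check on individual sequences, written under stated assumptions: the function name `find_composite_candidates`, the hit tuple layout `(query, subject, q_start, q_end)`, and the `max_overlap` parameter are all hypothetical and chosen only for illustration. A sequence is flagged when it aligns with two other sequences over non-overlapping regions and those two sequences share no similarity with each other.

```python
from itertools import combinations


def find_composite_candidates(hits, max_overlap=0):
    """Flag sequences that partially align with two unrelated sequences.

    hits: iterable of (query, subject, q_start, q_end), where q_start and
    q_end are 1-based inclusive alignment coordinates on the query.
    This tuple layout is an assumption made for this sketch.
    """
    regions = {}     # query -> {subject: (start, end) on the query}
    similar = set()  # unordered pairs of sequences sharing any hit
    for query, subject, start, end in hits:
        if query == subject:
            continue  # ignore self-hits
        regions.setdefault(query, {})[subject] = (start, end)
        similar.add(frozenset((query, subject)))

    candidates = {}
    for gene, partners in regions.items():
        for a, b in combinations(sorted(partners), 2):
            # The two putative components must not be similar to each other.
            if frozenset((a, b)) in similar:
                continue
            (s1, e1), (s2, e2) = partners[a], partners[b]
            # Length of the shared region on the gene (negative if disjoint).
            overlap = min(e1, e2) - max(s1, s2) + 1
            if overlap <= max_overlap:
                candidates.setdefault(gene, []).append((a, b))
    return candidates


# Toy data: C aligns with A on positions 1-100 and with B on 120-250,
# while A and B share no hit. A2 is a plain homolog of A.
hits = [
    ("C", "A", 1, 100), ("A", "C", 1, 100),
    ("C", "B", 120, 250), ("B", "C", 1, 130),
    ("A", "A2", 1, 100), ("A2", "A", 1, 100),
]
result = find_composite_candidates(hits)
print(result)
```

In the toy data only `C` is reported, with `A` and `B` as its putative components. `A` is not flagged because its hits to `C` and `A2` cover the same region.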

