Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2017 May 26:12:14.
doi: 10.1186/s13015-017-0106-z. eCollection 2017.

The gene family-free median of three

Affiliations

The gene family-free median of three

Daniel Doerr et al. Algorithms Mol Biol. .

Abstract

Background: The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes.

Methods: We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We study its computational complexity and we describe an integer linear program (ILP) for its exact solution. We further discuss a related problem called family-free adjacencies for k genomes for the special case of [Formula: see text] and present an ILP for its solution. However, for this problem, the computation of exact solutions remains intractable for sufficiently large instances. We then proceed to describe a heuristic method, FFAdj-AM, which performs well in practice.

Results: The developed methods compute accurate positional orthologs for genomes comparable in size of bacterial genomes on simulated data and genomic data acquired from the OMA orthology database. In particular, FFAdj-AM performs equally or better when compared to the well-established gene family prediction tool MultiMSOAR.

Conclusions: We study the computational complexity of a new family-free model and present algorithms for its solution. With FFAdj-AM, we propose an appealing alternative to established tools for identifying higher confidence positional orthologs.

Keywords: Breakpoint median; Family-free genome comparison; Positional orthology.

PubMed Disclaimer

Figures

Fig. 1
Fig. 1
a Illustration of the score of a candidate median gene. b Gene similarity graph of three genomes G, H, and I. Colored components indicate candidate median genes m1=(g1,h1,i2), m2=(g2,h2,i1), m3=(g3,h3,i2), and m4=(g4,h3,i3). Median gene pairs m1,m3 and m3,m4 are conflicting
Fig. 2
Fig. 2
The effect of duplication and deletion of a single gene in problem FF-Median. Colored arcs correspond to potential median adjacencies
Fig. 3
Fig. 3
The seven valid types of components of a partial 3-matching
Fig. 4
Fig. 4
Program FF-Median, an ILP for solving problem FF-Median
Fig. 5
Fig. 5
Algorithm ICF-SEG
Fig. 6
Fig. 6
Program FFAdj-3G, an ILP for solving FF-Adjacencies for three genomes
Fig. 7
Fig. 7
The implications of Constraint (C.02) on combinations of saturated edges. Parts ap visualize all 16 possibilities that are valid under Constraint (C.01). The parts show how edges incident to genes i and h are effected by the first case of Constraint (C.02) that acts on edges {g,h} and {g,i} (green lines). Saturated edges are indicated by thick continuous lines, unsaturated edges by dashed lines. Dotted gray lines are not considered by the constraint and can be either saturated or unsaturated. Only combinations shown in Parts h, l, and p violate constraint (C.02)
Fig. 8
Fig. 8
Top Precision and recall of a FF-Median and b FFAdj-AM in comparison with MultiMSOAR in simulations; Middle agreement, compatibility and disagreement of positional orthologs inferred by c FFAdj-AM and d MultiMSOAR with the OMA database; Bottom e statistical assessment of CARs and median genes inferred by FF-Median on real datasets; f histogram of fragile orthologies in results obtained by FFAdj-AM and MultiMSOAR

References

    1. Tannier E, Zheng C, Sankoff D. Multichromosomal median and halving problems under different genomic distances. BMC Bioinform. 2009;10:120. doi: 10.1186/1471-2105-10-120. - DOI - PMC - PubMed
    1. Braga MDV, Chauve C, Doerr D, Jahn K, Stoye J, Thévenin A, Wittler R. The potential of family-free genome comparison. In: Models and algorithms for genome evolution. London: Springer; 2013. p. 287–323.
    1. Dörr D. Gene family-free genome comparison. PhD thesis, Universität Bielefeld, Bielefeld, Germany. 2016. https://pub.uni-bielefeld.de/publication/2902049.
    1. Doerr D, Thévenin A, Stoye J. Gene family assignment-free comparative genomics. BMC Bioinform. 2012;13(Suppl 19):3. doi: 10.1186/1471-2105-13-S19-S3. - DOI - PMC - PubMed
    1. Martinez FV, Feijão P, Braga MDV, Stoye J. On the family-free DCJ distance and similarity. Algorithms Mol Biol. 2015;10:13. doi: 10.1186/s13015-015-0041-9. - DOI - PMC - PubMed

LinkOut - more resources