Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2022 Sep;29(9):961-973.
doi: 10.1089/cmb.2021.0399. Epub 2022 May 19.

An Integer Linear Programming Approach for Scaffolding Based on Exemplar Breakpoint Distance

Affiliations

An Integer Linear Programming Approach for Scaffolding Based on Exemplar Breakpoint Distance

Yi-Kung Shieh et al. J Comput Biol. 2022 Sep.

Abstract

Reference-based scaffolding is an important process used in genomic sequencing to order and orient the contigs in a draft genome based on a reference genome. In this study, we utilize the concept of genome rearrangement to formulate this process as an exemplar breakpoint distance (EBD)-based scaffolding problem, whose aim is to scaffold the contigs of two given draft genomes, both containing duplicate genes (or sequence markers) and acting with each other as a reference, such that the EBD between the scaffolded genomes is minimized. The EBD-based scaffolding problem is difficult to solve because it is non-deterministic polynomial-time (NP)-hard. In this work, we design an integer linear programming (ILP)-based algorithm to exactly solve the EBD-based scaffolding problem. Our experimental results on both simulated and biological data sets show that our ILP-based scaffolding algorithm can accurately and efficiently use a reference genome to scaffold the contigs of a draft genome. Moreover, our ILP-based scaffolding algorithm with considering duplicate genes indeed has better accuracy performance than that without considering duplicate genes, suggesting that duplicate genes and their exemplars are helpful for the application of genome rearrangement in the study of the reference-based scaffolding problem. When compared with RaGOO, a current state-of-the-art alignment-based scaffolder, our ILP-based scaffolding algorithm still has better accuracy performance on the biological data sets.

Keywords: algorithm; exemplar breakpoint distance; integer linear programming; scaffolding; sequencing.

PubMed Disclaimer

Publication types

LinkOut - more resources