PLoS One. 2021 Nov 5;16(11):e0259670.
doi: 10.1371/journal.pone.0259670. eCollection 2021.

Detection of structural variations in densely-labelled optical DNA barcodes: A hidden Markov model approach

Albertas Dvirnas et al. PLoS One.

Abstract

Large-scale genomic alterations play an important role in disease, gene expression, and chromosome evolution. Optical DNA mapping (ODM), commonly categorized into sparsely-labelled ODM and densely-labelled ODM, provides sequence-specific continuous intensity profiles (DNA barcodes) along single DNA molecules and is a technique well-suited for detecting such alterations. For sparsely-labelled barcodes, the possibility to detect large genomic alterations has been investigated extensively, while densely-labelled barcodes have not received as much attention. In this work, we introduce HMMSV, a hidden Markov model (HMM) based algorithm for detecting structural variations (SVs) directly in densely-labelled barcodes without access to sequence information. We evaluate our approach using simulated data-sets with 5 different types of SVs, and combinations thereof, and demonstrate that the method reaches a true positive rate greater than 80% for randomly generated barcodes with single variations of size 25 kilobases (kb). Increasing the length of the SV further leads to larger true positive rates. For a real data-set with experimental barcodes on bacterial plasmids, we successfully detect matching barcode pairs and SVs without any particular assumption of the types of SVs present. Instead, our method effectively goes through all possible combinations of SVs. Since ODM works on length scales typically not reachable with other techniques, our methodology is a promising tool for identifying arbitrary combinations of genomic alterations.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Schematics of the structural variations (SVs) problem using DNA barcodes.
To illustrate the different types of SVs, six pairs of stacked barcodes are shown (above: reference barcode, below: query barcode): (A) Insertion: a sub-barcode is inserted in the query barcode. (B) Deletion: a sub-barcode is deleted in the query barcode. (C) Inversion: a sub-barcode is flipped in the query barcode. (D) Repeat: a sub-barcode is repeated two (or more) times. (E) Translocation: a sub-barcode in the query barcode is moved to a different place compared to the reference barcode. (F) Inversion+Translocation: a complex SV in which a sub-barcode in the query barcode is both flipped and moved to a different place compared to the reference barcode. In these examples all query barcodes are random barcodes (see Table 1) of length 500 pixels (≈250 kb), and the SVs are 100 pixels (≈50 kb) long. Matching sub-barcodes are enclosed in boxes of the same colour.
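The SV types in Fig 1 can be sketched as simple list operations on a pixel-intensity array. This is a minimal illustration, not the authors' simulation code: `random_barcode` is a hypothetical stand-in that draws independent Gaussian intensities (the paper's random barcodes are defined in its Table 1, which is not reproduced here), and all function and parameter names are assumptions.

```python
import random

def random_barcode(n_pixels, seed=0):
    """Hypothetical stand-in for a random barcode: i.i.d. Gaussian intensities."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_pixels)]

def insertion(ref, start, segment):
    """Insert `segment` before pixel `start`."""
    return ref[:start] + segment + ref[start:]

def deletion(ref, start, length):
    """Remove `length` pixels beginning at `start`."""
    return ref[:start] + ref[start + length:]

def inversion(ref, start, length):
    """Flip the sub-barcode ref[start:start+length] in place."""
    return ref[:start] + ref[start:start + length][::-1] + ref[start + length:]

def repeat(ref, start, length, copies=2):
    """Replace the sub-barcode by `copies` consecutive copies of itself."""
    seg = ref[start:start + length]
    return ref[:start] + seg * copies + ref[start + length:]

def translocation(ref, start, length, new_start, flip=False):
    """Cut out a sub-barcode and re-insert it at `new_start`.

    `new_start` indexes the barcode after the cut. With flip=True the
    segment is also reversed (inversion + translocation, panel F).
    """
    seg = ref[start:start + length]
    if flip:
        seg = seg[::-1]
    rest = ref[:start] + ref[start + length:]
    return rest[:new_start] + seg + rest[new_start:]

# Example with the sizes quoted in the caption: 500 pixels, 100-pixel SV.
reference = random_barcode(500, seed=1)
query = inversion(reference, 200, 100)
```

Real barcodes also differ in noise and length scaling, which this sketch ignores.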
Fig 2. Hidden Markov Model (HMM) approach for detecting SVs in barcodes.
The method consists of 5 steps: 1) The query barcode (the barcode with SVs) is rescaled in length over a range of length re-scaling factors around an initial estimate of the factor. 2) The most likely path through the states, which defines the final alignment, is found using the Viterbi algorithm. This path corresponds to pairs of indices of sub-barcodes between the query and reference barcodes. 3) Sub-barcodes based on the most likely length re-scaling factor are selected. 4) Gaps and overlaps separated by a distance of no more than g are closed (sub-barcodes are merged). 5) Unlikely matches are filtered out using a p-value threshold p_thresh. Finally, an output table with the detected matching sub-barcode pairs is produced.
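Steps 2 and 4 of this caption can be illustrated with a generic sketch. The code below is a textbook log-space Viterbi decoder plus a one-dimensional interval merge; it is not the HMMSV implementation. The state space, transition and emission scores of the actual model are not given in this text, so `log_init`, `log_trans`, `log_emit`, and `merge_close_segments` are hypothetical names. The merge assumes segments on a single barcode and treats any overlap as mergeable, which simplifies the paired query/reference merge described above.

```python
import math

def viterbi(log_init, log_trans, log_emit):
    """Most likely state path of a generic HMM, computed in log space.

    log_init[s]     : log-probability of starting in state s
    log_trans[p][s] : log-probability of moving from state p to state s
    log_emit[t][s]  : log-likelihood of observation t under state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at step t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best)
            new_score.append(score[best] + log_trans[best][s] + log_emit[t][s])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def merge_close_segments(segments, g):
    """Merge (start, end) intervals whose gap is at most g pixels (assumed rule)."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= g:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Toy two-state example: observations favour state 0 first, then state 1.
log = math.log
toy_init = [log(0.5), log(0.5)]
toy_trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
toy_emit = [[log(0.9), log(0.1)]] * 3 + [[log(0.1), log(0.9)]] * 3
toy_path = viterbi(toy_init, toy_trans, toy_emit)
toy_merged = merge_close_segments([(0, 100), (103, 200), (260, 300)], g=5)
```

In the toy example the decoded path switches state once, and the first two intervals are merged because their 3-pixel gap is below g = 5.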
Fig 3. SV-detection for noisified random SV barcodes.
(Top) HMM output for comparison of two noisified random SV barcodes with a single 50 pixel (25 kb) insertion. (Bottom) HMM output for comparison of two noisified random SV barcodes with a 50 pixel (25 kb) inversion and a 50 pixel (25 kb) translocation. Sub-barcode pairs that did not pass the p-value threshold are shown in dashed boxes. The tables next to each figure report the dist scores C_i, p-values p_i, and lengths l_i of the sub-barcodes. The noise level, 1 − dist, was set to 0.1 here.
Fig 4. Dependence of true positive rate on noise in noisified random SV barcodes of different SVs.
We evaluate the five different SVs (insertion, deletion, inversion, repeat, and translocation) with random query and reference barcodes to test how the true positive rate (TPR) depends on the noise level. The associated figure showing the TPR as a function of SV length is S7 Fig in S1 Text. The noise is quantified by the dist value between the noisified random SV barcode and the random SV barcode without noise. After applying the p-value threshold, the success rate (here measured by the TPR) is close to 0 for smaller values of dist but approaches 1 for larger values of dist. We used 100 pairs of random query barcodes (250 kb) and noisified random SV barcodes with SVs of length 25 kb, for dist ranging from 0.75 to 0.95.
Fig 5. HMM output for real data from a neonatal outbreak.
(Top) Output of the HMM method for comparison of two experimental ESBL-KP 80 kb consensus barcodes. The detected sub-barcode pairs suggest a roughly 33 kb inversion in the middle. (Middle) Output of the HMM method for comparison of two experimental 215 kb consensus barcodes from different patients taken at approximately the same time. All smaller sub-barcodes have been merged together, and there is a 30 kb deletion on the reference barcode. (Bottom) Output of the HMM method for comparison of two experimental 215 kb consensus barcodes, showing a change that occurred within a patient over a 2-year period. Boxes of the same colour contain significantly matching sub-barcodes. Each detected sub-barcode has a dist score C_i, a p-value p_i, and a length l_i.
Fig 6. HMM output for plasmid experiment against an ancestor plasmid DNA sequence of the bacterial resistance plasmid.
(Top) HMM output for an experimental consensus barcode of the pUUH239.2 plasmid compared to the theoretical DNA barcode of its ancestor (the pKPN3 plasmid). Note that we successfully identified the matching barcode-pair regions predicted by the BLAST alignment. (Bottom) BLAST output of the 12 longest sub-sequence pairs with a matching similarity of at least 90%.
