PLoS One. 2016 Dec 1;11(12):e0164540.
doi: 10.1371/journal.pone.0164540. eCollection 2016.

Nullomers and High Order Nullomers in Genomic Sequences

Davide Vergni et al. PLoS One.

Abstract

A nullomer is an oligomer that does not occur as a contiguous subsequence of a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in applications ranging from drug discovery to forensic practice is currently debated in the literature. Here, we investigated the nature of nullomers, asking whether their absence from genomes has a purely statistical explanation or is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied several of their properties: how they compare with the nullomers of random sequences, their CpG distribution, and their mean helical rise. In agreement with previous results, we found that the number of nullomers in the human genome is much larger than expected by chance. However, the opposite result was found when the reference was a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content, but it also highlighted a strong dependence of CpG frequency on dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences with a biased CpG frequency. Furthermore, we built phylogenetic trees of eleven species from both the similarity of their dinucleotide frequencies and the number of nullomers shared by each pair of species; the trees show that nullomers are fairly conserved among closely related species. Finally, the study of the mean helical rise of nullomer sequences revealed significantly high mean rise values, reinforcing the hypothesis that these sequences have peculiar structural features. These results show that nullomers are a consequence of the peculiar structure of DNA (including biased CpG frequency and CpG islands), so the hypermutability model, even when CpG islands are taken into account, does not seem sufficient to explain the nullomer phenomenon.
Moreover, high order nullomers could accentuate the features that already make simple nullomers useful in several applications.
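The two definitions above can be made concrete with a minimal Python sketch. It rests on assumptions this record does not spell out: words are read from one strand only (reverse complements are not merged), "mutated" means single-nucleotide substitutions, and a nullomer of order k is taken to be one for which every word within k substitutions is also absent. All function names are hypothetical, not the authors' code, and the brute-force enumeration is meant for small word lengths only.

```python
from itertools import combinations, product

ALPHABET = "ACGT"
VALID = set(ALPHABET)


def present_kmers(seq, k):
    """Set of k-mers occurring in `seq` (one strand only); windows with non-ACGT symbols are skipped."""
    seq = seq.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= VALID:
            found.add(word)
    return found


def nullomers(seq, k):
    """All 4**k words of length k that never occur in `seq`."""
    present = present_kmers(seq, k)
    return {"".join(p) for p in product(ALPHABET, repeat=k)} - present


def hamming_ball(word, radius):
    """`word` together with every word reachable by at most `radius` substitutions."""
    ball = {word}
    for d in range(1, radius + 1):
        for positions in combinations(range(len(word)), d):
            # each chosen position must change to one of the 3 other bases
            choices = [[b for b in ALPHABET if b != word[p]] for p in positions]
            for subs in product(*choices):
                mutant = list(word)
                for p, b in zip(positions, subs):
                    mutant[p] = b
                ball.add("".join(mutant))
    return ball


def high_order_nullomers(seq, k, order):
    """Nullomers whose every variant within `order` substitutions is also absent.

    Assumed definition (see lead-in); order=0 returns the ordinary nullomers."""
    present = present_kmers(seq, k)
    return {w for w in nullomers(seq, k) if hamming_ball(w, order).isdisjoint(present)}
```

For example, in the toy sequence ACGT, 13 of the 16 dinucleotides are nullomers, but only TA is a first order nullomer under this definition, because every other absent dinucleotide is one substitution away from AC, CG or GT.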

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Number of first order nullomers of size 14 (black filled circles, ⚫) compared with the expected number of first order nullomers (red empty circles, ⚪), as a function of the number of CpGs occurring in the sequences.
The expected number of nullomers is computed for random sequences with the same length as the human genome that preserve its dinucleotide frequencies.
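One way to obtain an expected count like the one in Fig 1 is sketched below, under assumptions not stated in this record. The dinucleotide-preserving random sequence is modelled as a first-order Markov chain estimated from the genome, and absence is treated as a Poisson event: if the words in the Hamming ball of radius r around a word w have total probability P, then all of them are absent from a sequence of length L with probability about exp(-(L - k + 1)P). Summing over all 4^k words, grouped by their CpG count, gives the expected curve. The paper may instead have generated random sequences explicitly; all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations, product

ALPHABET = "ACGT"


def markov_model(seq):
    """Start and transition probabilities of a first-order Markov chain,
    estimated from the mono- and dinucleotide counts of `seq` (non-ACGT skipped)."""
    seq = seq.upper()
    mono = Counter(b for b in seq if b in ALPHABET)
    di = Counter(
        seq[i:i + 2] for i in range(len(seq) - 1)
        if seq[i] in ALPHABET and seq[i + 1] in ALPHABET
    )
    total = sum(mono.values())
    start = {b: mono[b] / total for b in ALPHABET}
    trans = {}
    for a in ALPHABET:
        row = sum(di[a + b] for b in ALPHABET)
        trans[a] = {b: (di[a + b] / row if row else 0.0) for b in ALPHABET}
    return start, trans


def word_probability(word, start, trans):
    """Approximate probability that a given window of the random sequence spells `word`."""
    p = start[word[0]]
    for a, b in zip(word, word[1:]):
        p *= trans[a][b]
    return p


def hamming_ball(word, radius):
    """`word` plus every word reachable by at most `radius` substitutions."""
    ball = {word}
    for d in range(1, radius + 1):
        for positions in combinations(range(len(word)), d):
            choices = [[b for b in ALPHABET if b != word[p]] for p in positions]
            for subs in product(*choices):
                mutant = list(word)
                for p, b in zip(positions, subs):
                    mutant[p] = b
                ball.add("".join(mutant))
    return ball


def expected_nullomers_by_cpg(genome_length, k, start, trans, order=0):
    """Expected number of order-`order` nullomers of length k, keyed by CpG count.

    Poisson approximation: the ball around each word is absent with
    probability exp(-windows * total probability of the ball)."""
    n_windows = genome_length - k + 1
    expected = defaultdict(float)
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        mass = sum(word_probability(v, start, trans) for v in hamming_ball(word, order))
        expected[word.count("CG")] += math.exp(-n_windows * mass)
    return dict(expected)
```

Enumerating all 4^14 (about 2.7 × 10^8) words in pure Python is slow, so for size 14 this sketch would need vectorisation; it is meant to show the calculation, not to reproduce the figure.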
Fig 2. CpG frequencies at each dinucleotide position (black line) for $H^{0}_{11}$ (panel a), $H^{1}_{14}$ (panel b) and $H^{2}_{16}$ (panel c) of the human genome.
CpG frequencies of present sequences (words that do occur in the genome) of the same length (green line) are also shown in each panel.
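A per-position CpG profile of the kind plotted in Fig 2 can be sketched as follows, assuming that "CpG frequency at position i" means the fraction of words in a set that carry C at position i and G at position i + 1 (the record does not give the exact normalisation). Computing it once for a nullomer set and once for a set of present words of the same length gives the two curves being compared. The function name is hypothetical.

```python
def cpg_position_profile(words):
    """Fraction of words with a CpG (C followed by G) starting at each position.

    Words of length n have n - 1 dinucleotide positions (0-based here)."""
    words = list(words)
    if not words:
        raise ValueError("need at least one word")
    n = len(words[0])
    if any(len(w) != n for w in words):
        raise ValueError("all words must have the same length")
    return [
        sum(w[i:i + 2].upper() == "CG" for w in words) / len(words)
        for i in range(n - 1)
    ]
```

For the two toy words CGAA and ACGA the profile is [0.5, 0.5, 0.0]: each of the first two positions carries a CpG in one word out of two, and the last position carries none.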
Fig 3. CpG frequencies at each dinucleotide position for first order nullomers ($H^{1}_{14}$) in eleven species: panel a) Human, Chimpanzee and Gorilla (yellow, dark yellow and light orange, respectively); panel b) Rat and Mouse (orange and light red, respectively); panel c) Opossum, Bovine, Goat and Lemur (red, dark red, light brown and brown, respectively); panel d) Chicken and Rabbit (very dark brown and black, respectively).
Panel e shows all species together.
Fig 4. Phylogenetic trees of the 11 species obtained with the DC distance (first row) for nullomers (T1, left) and first order nullomers (T2, right), and with the DJ distance (second row) for nullomers (T3, left) and first order nullomers (T4, right).
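A minimal sketch of building a tree from shared nullomers. This record does not define the DC and DJ distances or name the tree-building method, so two stand-ins are used and should be read as assumptions: the Jaccard distance between two species' nullomer sets (1 minus shared over total distinct nullomers) and UPGMA (average-linkage) clustering. The output is a Newick string with topology only, no branch lengths; any other distance matrix, for example one computed from dinucleotide frequencies, can be passed to the same function. Names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two nullomer sets (0.0 when both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix.

    Returns a Newick string giving the tree topology only."""
    clusters = {i: (label, 1) for i, label in enumerate(labels)}  # id -> (newick, size)
    d = {(i, j): dist[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair, keys always stored as (smaller, larger)
        (ni, si), (nj, sj) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            # size-weighted average distance from the merged cluster to k
            d[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        del d[(i, j)]
        clusters[next_id] = (f"({ni},{nj})", si + sj)
        next_id += 1
    ((newick, _),) = clusters.values()
    return newick + ";"
```

With toy nullomer sets in which Human and Chimp share most words and Mouse and Rat share most words, the sketch returns ((Human,Chimp),(Mouse,Rat));.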
Fig 5. Distribution of average rise values (black line) for $H^{0}_{11}$ (panel a), $H^{1}_{14}$ (panel b) and $H^{2}_{16}$ (panel c). Average rise values of present sequences (green) are also shown in each panel.
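The average rise of a word can be sketched as the mean of a per-dinucleotide-step rise parameter over its n - 1 steps, assuming a step-based scale is used (this record does not say which published scale the authors took). The table below is a deliberately fake placeholder, not real rise parameters; a real analysis would substitute a published dinucleotide rise table. Names are hypothetical.

```python
from collections import Counter


def mean_rise(word, rise_table):
    """Average rise over the len(word) - 1 dinucleotide steps of `word`."""
    steps = [word[i:i + 2] for i in range(len(word) - 1)]
    return sum(rise_table[s] for s in steps) / len(steps)


def rise_distribution(words, rise_table, ndigits=2):
    """Histogram of mean rise values, binned by rounding to `ndigits` decimals."""
    return Counter(round(mean_rise(w, rise_table), ndigits) for w in words)


# PLACEHOLDER values for illustration only, NOT real rise parameters:
# every step gets 3.0 except CpG, which gets 4.0.
TOY_RISE = {a + b: 3.0 for a in "ACGT" for b in "ACGT"}
TOY_RISE["CG"] = 4.0
```

Comparing `rise_distribution` for a nullomer set with that for present words of the same length mirrors the black and green curves of Fig 5.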
