Sci Rep. 2018 Jan 22;8(1):1300.
doi: 10.1038/s41598-017-14403-y.

Telomerecat: A ploidy-agnostic method for estimating telomere length from whole genome sequencing data


James H R Farmery et al. Sci Rep. 2018.

Erratum in

Abstract

Telomere length is a risk factor in disease, and the dynamics of telomere length are crucial to our understanding of cell replication and vitality. The proliferation of whole genome sequencing represents an unprecedented opportunity to glean new insights into telomere biology on a previously unimaginable scale. To this end, a number of approaches for estimating telomere length from whole-genome sequencing data have been proposed. Here we present Telomerecat, a novel approach to the estimation of telomere length. Previous methods have been dependent on the number of telomeres present in a cell being known, which may be problematic when analysing aneuploid cancer data and non-human samples. Telomerecat is designed to be agnostic to the number of telomeres present, making it suited for the purpose of estimating telomere length in cancer studies. Telomerecat also accounts for interstitial telomeric reads and presents a novel approach to dealing with sequencing errors. We show that Telomerecat performs well at telomere length estimation when compared to leading experimental and computational methods. Furthermore, we show that it detects expected patterns in longitudinal data, repeated measurements, and cross-species comparisons. We also apply the method to cancer cell data, uncovering an interesting relationship with the underlying telomerase genotype.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1
Scatter plots describing the relationship between Telomerecat, mTRF, and TelSeq estimates of telomere length (TL).
Figure 2
Estimates for the MSC samples produced by Telomerecat (left) and TelSeq (right). We expect to see a decrease in telomere length with additional passaging (P1 to P13), but consistently high telomere lengths in the two iPSC samples (iPSC1 and iPSC2).
Figure 3
Telomerecat and TelSeq estimates for the HCC cell line dataset.
Figure 4
A plot of telomere length (TL) estimates for repeated measurement pairs. Colours correspond to the sequencing platform of each sample in the pair.
Figure 5
Telomere length estimates by Telomerecat for 10 mouse samples from the Mouse Genomes Project.
Figure 6
An overview of the Telomerecat length estimation process.
Figure 7
The algorithm that determines the indices of divergence from the telomere sequence. 0: We observe a sequencing read. 1: We split the read into ‘segments’ (11 in total in our example) such that each segment is a substring of the original sequence and every other segment consists of unbroken telomere sequence. In our example, segments 1, 3, 5, 7, 9 and 11 contain unbroken telomere sequence. 2: Each segment containing a telomere hexamer is ‘expanded’ to capture the full extent of the surrounding telomere sequence. The number of segments is reduced by 2. 3: When two segments that both contain the telomere hexamer are adjacent after Step 2, this indicates a deletion event. We take the locus with the lowest corresponding Phred score. For any segment that does not contain a telomere hexamer and whose length is greater than or equal to 4, we conduct a basic alignment against all possible offsets of the telomere sequence. The telomere sequence with the lowest Hamming distance is taken as a local alignment for that segment. Where two alignments are equal, the one with the lowest average Phred score is preferred. 4: Sequence loci that are not in a complete hexamer, or that were mismatched in the Hamming alignment step, are taken as mismatching loci. m for this example is given in the final line of the diagram.
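The Hamming alignment in Step 3 can be illustrated with a minimal Python sketch. It is not Telomerecat's own code: the function names, the choice of CCCTAA as the reference hexamer, and the reading of "lowest average Phred score" as the mean Phred score over the mismatching loci are all assumptions made for illustration.

```python
TELOMERE_HEXAMER = "CCCTAA"  # assumed reference orientation for this sketch


def telomere_offsets(hexamer, length):
    """Return the telomere repeat starting at each of its offsets, cut to `length`."""
    n = len(hexamer)
    tiled = hexamer * (length // n + 2)  # long enough for every offset
    return [tiled[offset:offset + length] for offset in range(n)]


def best_offset_alignment(segment, phred, hexamer=TELOMERE_HEXAMER):
    """Align `segment` against every telomere offset by Hamming distance.

    Returns (offset, mismatch_positions). Ties on Hamming distance are broken
    by the lower mean Phred score at the mismatching positions (an assumed
    interpretation of the tie-break described in the caption).
    """
    best = None
    for offset, candidate in enumerate(telomere_offsets(hexamer, len(segment))):
        mismatches = [i for i, (x, y) in enumerate(zip(segment, candidate)) if x != y]
        mean_q = sum(phred[i] for i in mismatches) / len(mismatches) if mismatches else 0.0
        key = (len(mismatches), mean_q)
        if best is None or key < best[0]:
            best = (key, offset, mismatches)
    return best[1], best[2]


# A six-base segment with one substitution relative to CCCTAA.
print(best_offset_alignment("CCCTGA", [30] * 6))  # (0, [4])
```

The positions returned for each segment would then contribute to the set of mismatching loci described in Step 4.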
Figure 8
(A) A heatmap of the joint distribution of Phred scores at mismatching loci and the number of mismatching loci (X). The intensities in the top left corner of the heatmap indicate an association between fewer mismatches and lower Phred scores. We observe that the maximum number of mismatching loci is commonly ~75% of the read length. This effect is caused by non-telomere reads matching the telomere sequence simply by chance. (B) A heatmap of the joint distribution of random loci in reads and the associated Phred score (Y). We note that the joint distribution of reads in the upper half of the matrix is different to that in X, while the lower portion is identical. (C) The difference between X and Y, referred to as D in the text. (D) A binary heatmap showing all cells in D that are greater than the threshold k. We note the preponderance of cells in the upper left-hand corner of the figure. (E) We remove noise from the figure using the methods detailed in Supplementary Algorithm 1. (F) We apply a final rule to ensure cells associated with low Phred scores are captured in the error profile (Supplementary Algorithm 2).
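Panels C and D amount to an element-wise difference followed by a threshold. The sketch below shows that arithmetic on made-up 2×2 matrices; the sign convention (X minus Y), the function names, and the toy values are assumptions, and the noise removal and final rule of panels E and F (Supplementary Algorithms 1 and 2) are not reproduced.

```python
def difference_matrix(x, y):
    """Element-wise difference D = X - Y (sign convention assumed)."""
    return [[xv - yv for xv, yv in zip(x_row, y_row)] for x_row, y_row in zip(x, y)]


def threshold_mask(d, k):
    """Binary matrix marking every cell of D strictly greater than k."""
    return [[1 if value > k else 0 for value in row] for row in d]


# Toy counts only; real matrices would be indexed by mismatch count and Phred score.
X = [[5, 1], [2, 2]]
Y = [[1, 1], [2, 3]]
D = difference_matrix(X, Y)
mask = threshold_mask(D, k=0)
```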
Figure 9
(A) The read-pair types at the boundary between telomere and subtelomere. F2a reads stem from the boundary, whereas F1 reads stem from anywhere within the telomere proper. F3 are read-pairs where neither read in the pair is complete telomere. (B) Detail of the F1 and F2a read types. F1 read-pairs are comprised of two complete telomere reads. F2a read-pairs are comprised of one read that is complete telomere and one that is not. Crucially, the complete telomere read is comprised of CCCTAA. (C) The read-pair types at an ITR. (D) Detail of the F2b and F4 read types. Note that an F2b read-pair is physically indistinguishable from an F2a read-pair. An F4 read-pair is one where one read is complete telomere and the other is not. The complete end is comprised of TTAGGG.
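The read-pair categories in this caption can be expressed as a small classifier. This is a hedged sketch rather than Telomerecat's implementation: it assumes error-free reads supplied as plain strings in the orientation the caption describes, treats a read as "complete telomere" only if it is an exact run of the hexamer at some offset, and reports F2a and F2b together as "F2" because the caption states they cannot be told apart. The function names are hypothetical.

```python
def is_complete(read, hexamer):
    """True if `read` is an unbroken run of `hexamer` starting at any offset."""
    if not read:
        return False
    n = len(hexamer)
    tiled = hexamer * (len(read) // n + 2)
    return any(tiled[offset:offset + len(read)] == read for offset in range(n))


def classify_pair(read1, read2):
    """Classify a read-pair as F1, F2 (F2a or F2b), F3 or F4."""
    c_rich = [is_complete(r, "CCCTAA") for r in (read1, read2)]
    g_rich = [is_complete(r, "TTAGGG") for r in (read1, read2)]
    complete = [c or g for c, g in zip(c_rich, g_rich)]
    if all(complete):
        return "F1"  # both reads are complete telomere
    if not any(complete):
        return "F3"  # neither read is complete telomere
    i = complete.index(True)
    # Exactly one complete read: CCCTAA gives F2a/F2b, TTAGGG gives F4.
    return "F2" if c_rich[i] else "F4"


print(classify_pair("CCCTAACCCTAA", "ACGTACGTACGT"))  # F2
```

A real pipeline would also need to tolerate sequencing errors when deciding whether a read is complete telomere, which this sketch deliberately ignores.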
Algorithm 1
Telomerecat length estimation simulation algorithm.

