BMC Genomics. 2020 Apr 3;21(1):279.
doi: 10.1186/s12864-020-6683-0.

Re-annotation of the Theileria parva genome refines 53% of the proteome and uncovers essential components of N-glycosylation, a conserved pathway in many organisms

Kyle Tretina et al. BMC Genomics. 2020.

Abstract

Background: The apicomplexan parasite Theileria parva causes a livestock disease called East coast fever (ECF), with millions of animals at risk in sub-Saharan East and Southern Africa, the geographic distribution of T. parva. Over a million bovines die each year of ECF, with a tremendous economic burden to pastoralists in endemic countries. Comprehensive, accurate parasite genome annotation can facilitate the discovery of novel chemotherapeutic targets for disease treatment, as well as elucidate the biology of the parasite. However, genome annotation remains a significant challenge because of limitations in the quality and quantity of the data being used to inform the location and function of protein-coding genes and, when RNA data are used, the underlying biological complexity of the processes involved in gene expression. Here, we apply our recently published RNAseq dataset derived from the schizont life-cycle stage of T. parva to update structural and functional gene annotations across the entire nuclear genome.

Results: The re-annotation effort led to evidence-supported updates in over half of all protein-coding sequence (CDS) predictions, including exon changes, gene merges and gene splitting, an increase in average CDS length of approximately 50 base pairs, and the identification of 128 new genes. Among the new genes identified were those involved in N-glycosylation, a process previously thought not to exist in this organism and a potentially new chemotherapeutic target pathway for treating ECF. Alternatively spliced genes were identified, and antisense and multi-gene family transcription were extensively characterized.

Conclusions: The process of re-annotation led to novel insights into the organization and expression profiles of protein-coding sequences in this parasite, and uncovered a minimal N-glycosylation pathway that changes our current understanding of the evolution of this post-translational modification in apicomplexan parasites.

Keywords: East coast fever; Genome; N-glycosylation; Re-annotation; Theileria.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Manual gene model curation examples. Several tracks are shown: updated gene model (beige background), original (2005) gene annotation (grey background), RNAseq data (white background), transcript assembly (dark green, on green background), and EVM predictions (orange, on green background). a A new gene discovered on the basis of RNAseq data (TpMuguga_03g02005). b A case where two genes in the 2005 annotation merge in the new annotation on the basis of RNAseq read coverage (TpMuguga_04g02435). c A case where a gene in the 2005 annotation has been split into two genes in the new annotation (TpMuguga_04g02190 and TpMuguga_04g02185). d A case where a gene has been reversed in orientation on the basis of RNAseq data (TpMuguga_02g02095). e A case where overlapping genes led to ambiguity in UTR coordinates, and so the UTRs were not defined in this intergenic region (TpMuguga_01g00527 and TpMuguga_01g00528). f A case of a single gene where alternative splicing exists (as seen by significant read coverage in at least one intronic region), but there is one most prevalent isoform (TpMuguga_03g00622). g A case of two genes that overlap by coding sequences. Coding exons are colored by reading frame (TpMuguga_05g00017 and TpMuguga_05g00018)
Fig. 2
Comparative metrics of original and new T. parva annotations. a The percentage of proteins with at least one PFAM domain found by Hidden Markov Model searches of the predicted proteomes of the new T. parva Muguga annotation was 2% higher than those in the 2005 annotation, implying that the new annotation captures functional elements that were previously missed. b The new T. parva Muguga annotation has more reciprocal best-hit orthologs (N) with T. annulata Ankara than the 2005 T. parva Muguga annotation. The variation in protein length (SD) between T. parva and T. annulata ortholog pairs is greatly reduced in the new relative to the original T. parva annotation. Only nuclear genes were used for this analysis. The x-axis was limited to the range − 300 to + 300 for easy visual interpretation. c The number of canonical GT/AG intron splice sites increased and the number of non-canonical intron splice site combinations decreased in the new T. parva Muguga annotation compared to the 2005 annotation. d The number and proportion of introns validated by at least one RNAseq read increased in the new T. parva Muguga annotation compared to the 2005 annotation. These lines of evidence suggest that the new annotation is more accurate, and also considerably more consistent with the RNAseq data, as expected
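The splice-site comparison in panel c can be illustrated with a minimal sketch. It assumes intron sequences are already extracted as DNA strings on the sense strand of their gene; the function names and the example sequences are hypothetical and are not taken from the paper's pipeline.

```python
from collections import Counter


def splice_site_pair(intron_seq: str) -> str:
    """Return the donor/acceptor dinucleotide pair of an intron, e.g. 'GT/AG'."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to contain both splice sites")
    # Donor site = first two bases, acceptor site = last two bases.
    return f"{seq[:2]}/{seq[-2:]}"


def count_splice_sites(introns):
    """Tally donor/acceptor pairs across a collection of intron sequences."""
    return Counter(splice_site_pair(s) for s in introns)


def canonical_fraction(introns):
    """Fraction of introns carrying the canonical GT/AG pair (0.0 if empty)."""
    counts = count_splice_sites(introns)
    total = sum(counts.values())
    return counts["GT/AG"] / total if total else 0.0


# Hypothetical introns: two canonical (GT/AG), one non-canonical (GC/AG).
example_introns = ["GTAAGTTTTAG", "gtatgcccag", "GCAAGTTTAG"]
print(count_splice_sites(example_introns))
print(canonical_fraction(example_introns))
```

Running the same tally on the introns of two annotation versions would give the kind of canonical versus non-canonical comparison the panel describes.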
Fig. 3
Distribution of RNAseq RPKM values for T. parva Muguga genes. a A histogram of sense RPKM values after logarithmic transformation of the data. Frequencies on the y-axis correspond to probability density. The blue line shows a normal distribution around the same median, while the red line shows a more reliable fixed-width, Gaussian, kernel-smoothed estimate of the probability density. b The sense (green) and antisense (red) reads per kilobase transcript per million reads (RPKM) after fourth-root transformation of the data. Genes are sorted by position on the chromosome for all four nuclear chromosomes of T. parva Muguga
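For readers unfamiliar with the units in this figure, the following is a minimal sketch of the standard RPKM definition and of the two transformations the caption names. The logarithm base and the pseudocount are assumptions (the caption does not state them), and the function names and numbers are illustrative only.

```python
import math


def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    # reads / (length / 1e3) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def log_transform(value: float, pseudocount: float = 1.0) -> float:
    """Base-10 log with a pseudocount so zero expression stays defined.

    Both the base and the pseudocount are assumptions for illustration.
    """
    return math.log10(value + pseudocount)


def fourth_root(value: float) -> float:
    """Fourth-root transform; compresses large values and is defined at zero."""
    return value ** 0.25


# Hypothetical gene: 500 reads on a 2,000 bp transcript, 10 million mapped reads.
expression = rpkm(500, 2000, 10_000_000)
print(expression, log_transform(expression), fourth_root(expression))
```

A fourth-root transform is convenient for plotting sense and antisense values together because, unlike a plain logarithm, it needs no pseudocount for genes with zero reads.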
Fig. 4
The uncovered Theileria parva Alg14 shows a similar predicted structure to the empirically determined Saccharomyces cerevisiae Alg14 protein structure, and is syntenic in multiple piroplasms. a A Phyre2 prediction of T. parva Alg14 (TpAlg14; green; TpMuguga_01g02045) and the Protein Data Bank (http://www.rcsb.org/) [34] nuclear magnetic resonance structure of Saccharomyces cerevisiae Alg14 (ScAlg14; teal; PDB 2JZC) were aligned in MacPyMol (https://pymol.org/2/) [35]. b Shown are the syntenic regions around Alg14 orthologs (synteny in grey), using the adjacent gene EngB as an anchor (synteny in red) in the Sybil software package [36]. c Shown are the syntenic regions around STT3 orthologs (synteny in grey), using a B. bovis STT3-adjacent gene (BBOV_II000210) as an anchor (synteny in red) in the Sybil software package

References

    1. Spielman DJ. XVI public-private partnerships and pro-poor livestock research: the search for an East Coast fever vaccine. Washington, D.C.: The National Academies Press; 2009.
    2. Herrero M, Thornton PK, Notenbaert AM, Wood S, Msangi S, Freeman HA, Bossio D, Dixon J, Peters M, van de Steeg J, et al. Smart investments in sustainable food production: revisiting mixed crop-livestock systems. Science. 2010;327(5967):822–825. doi: 10.1126/science.1183725.
    3. Nkedianye D, Radeny M, Kristjanson P, Herrero M. Assessing returns to land and changing livelihood strategies in Kitengela. In: Homewood K, Kristjanson P, Chevenix Trench P, editors. Staying Maasai? Livelihoods, conservation and development in East African Rangelands. Dordrecht: Springer; 2009. pp. 115–150.
    4. Baldwin CL, Black SJ, Brown WC, Conrad PA, Goddeeris BM, Kinuthia SW, Lalor PA, MacHugh ND, Morrison WI, Morzaria SP, et al. Bovine T cells, B cells, and null cells are transformed by the protozoan parasite Theileria parva. Infect Immun. 1988;56(2):462–467. doi: 10.1128/IAI.56.2.462-467.1988.
    5. Tindih HS, Geysen D, Goddeeris BM, Awino E, Dobbelaere DA, Naessens J. A Theileria parva isolate of low virulence infects a subpopulation of lymphocytes. Infect Immun. 2012;80(3):1267–1273. doi: 10.1128/IAI.05085-11.