Review
J Biotechnol. 2016 Oct 10;235:121-31.
doi: 10.1016/j.jbiotec.2016.04.023. Epub 2016 Apr 12.

Refined Pichia pastoris reference genome sequence

Lukas Sturmberger et al. J Biotechnol.

Abstract

Strains of the species Komagataella phaffii are the most frequently used "Pichia pastoris" strains employed for recombinant protein production as well as for studies on peroxisome biogenesis, autophagy and the secretory pathway. Genome sequencing of several different P. pastoris strains has provided the foundation for understanding these cellular functions in recent genomics, transcriptomics and proteomics experiments. This experimentation has identified mistakes, gaps and incorrectly annotated open reading frames in the previously published draft genome sequences. Here, a refined reference genome is presented, generated with genome and transcriptome sequencing data from multiple P. pastoris strains. Twelve major sequence gaps of 20 to 6000 base pairs were closed, and 5111 out of 5256 putative open reading frames (ORFs) were manually curated and confirmed by RNA-seq and published LC-MS/MS data, including the addition of new ORFs and a reduction in the number of spliced genes from 797 to 571. One chromosomal fragment of 76 kbp between two previous gaps on chromosome 1 and another 134 kbp fragment at the end of chromosome 4, as well as several shorter fragments, needed re-orientation. In total, more than 500 positions in the genome have been corrected. This reference genome is presented with new chromosomal numbering, positioning ribosomal repeats at the distal ends of the four chromosomes, and includes predicted chromosomal centromeres as well as the sequences of two linear cytoplasmic plasmids of 13.1 and 9.5 kbp found in some strains of P. pastoris.

Keywords: Centromere; Genome; Killer plasmid; P. pastoris; RNA-seq; Splicing.

Figures

Figure 1. Open reading frames (ORFs) identified in the closed gaps of the P. pastoris chromosomes
Six small gaps of 60–200 bp as well as six larger gaps of 2–6 kbp present in the CBS7435 genome sequence could be closed. The ORFs found in these regions are marked with dark triangles. In addition, the orientation of the four chromosomes was standardized to show the ribosomal clusters at the 3′ ends. The vertical lines marked on all four chromosomes represent the annotation of ORFs.
Figure 2. Multiple sequence alignment of P. pastoris CBS7435 genome sequence (2011) and the P. pastoris CBS7435 reference genome sequence presented here
All alignments were performed using CLC Bio’s proprietary alignment algorithm. The dark areas correspond to perfect nucleotide matches whereas white areas denote mismatches or missing bases. N denotes bases missing in the 2011 genome sequence. A. The genes at location 475309..477983 (chr2) of the 2011 sequence (top) and 475237..478581 (chr2) of the 2016 sequence (bottom) were aligned against each other. B. The genes at location 2208981..2209753 (chr3) of the 2011 sequence (top) and 2214003..2221310 (chr3) of the 2016 sequence (bottom) were aligned against each other.
Figure 3. Exemplary intron splicing site prediction as identified by mapping RNA-seq reads to the P. pastoris reference sequence
A. Gene at location 2262918..2263848 (chr1). Mapped reads verify the automated annotation. B. Gene at location 586755..589073 (chr4). Because reads mapped within the intron sequence, the automated annotation had to be corrected. C. Gene at location 579086..582205 (chr3). The intron identified in the RNA-seq data was incorporated into the automated annotation. The bottom part of each figure shows the gene as present in the genome sequence of P. pastoris CBS7435 published in 2011. The middle part corresponds to the RNA-seq reads. CLC Genomics Workbench version 7 was used for visualizations and manual corrections.
Figure 4. P. pastoris chr1 with the predicted ORFs annotated
The annotated ORFs are marked in dark gray. The putative centromere unique region is marked in light gray.
Figure 5. Putative location of P. pastoris centromeres indicated by RNA-seq reads mapped to this reference sequence
Panels A–D correspond to chromosomes 1–4. The putative centromere regions are largely devoid of transcribed genes, as can be seen from the marked drop in RNA-seq signal strength. The dark triangles mark the location of the putative centromere on each chromosome. The 138 kbp mating type chromosomal inversion region is indicated by the dark bar on chr4. The log-scale plot shows the transcriptome density in 4 kbp windows at 100 bp intervals, normalized to the maximum density window of 900,000 for chr1.
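The windowed density described in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `windowed_density`, the per-base coverage list used as input, and the choice to normalize to the largest window within the supplied data are all assumptions made for illustration. Only the window size (4 kbp) and step (100 bp) defaults come from the caption.

```python
from typing import List, Sequence, Tuple


def windowed_density(
    coverage: Sequence[float],
    window: int = 4000,  # 4 kbp window, as stated in the caption
    step: int = 100,     # 100 bp interval, as stated in the caption
) -> Tuple[List[int], List[float]]:
    """Sum coverage in sliding windows and scale to the largest window.

    `coverage` is a hypothetical per-base read depth for one chromosome.
    Returns the 0-based window start positions and the normalized densities.
    """
    n = len(coverage)
    if n < window:
        return [], []

    # Prefix sums make each window total an O(1) lookup.
    prefix = [0.0]
    for depth in coverage:
        prefix.append(prefix[-1] + depth)

    starts = list(range(0, n - window + 1, step))
    totals = [prefix[s + window] - prefix[s] for s in starts]

    # Assumption: normalize to the maximum window in this input.
    peak = max(totals)
    if peak == 0:
        return starts, [0.0] * len(totals)
    return starts, [t / peak for t in totals]


if __name__ == "__main__":
    # Toy data: covered flanks with an untranscribed stretch in the middle.
    toy = [1] * 10 + [0] * 10 + [1] * 10
    starts, density = windowed_density(toy, window=5, step=5)
    for s, d in zip(starts, density):
        print(s, d)
```

In a plot of this kind, a run of windows with density near zero would correspond to the drop in signal that the caption associates with putative centromeres. A logarithmic axis would need a small pseudocount or masking for windows with zero density.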
Figure 6. Visualization of clustered centromeres in P. pastoris by confocal microscopy
This strain expressed DsRed-HDEL to label the endoplasmic reticulum in red. The ring visible in each cell is the nuclear envelope. In addition, the strain expressed Cse4-GFP to label centromeres in green. The merged image shows the two fluorescence signals overlaid on a transmitted light image of the cells. A cluster of centromeres is visible at the nuclear periphery in each cell. Scale bar, 2 μm.
Figure 7. Genetic organization of the two linear plasmids identified in P. pastoris
Based on the plasmid sequences, we identified 8 open reading frames on the 13.1 kbp killer plasmid and 6 open reading frames on the 9.5 kbp killer plasmid. Both plasmids are flanked by long terminal repeat (LTR) sequences.
