Nucleic Acids Res. 2010 Nov;38(21):7388-99.
doi: 10.1093/nar/gkq653. Epub 2010 Jul 26.

Characterization and prediction of protein nucleolar localization sequences

Michelle S Scott et al. Nucleic Acids Res. 2010 Nov.

Abstract

Although the nucleolar localization of proteins is often believed to be mediated primarily by non-specific retention to core nucleolar components, many examples of short nucleolar targeting sequences have been reported in recent years. In this article, 46 human nucleolar localization sequences (NoLSs) were collated from the literature and subjected to statistical analysis. Of the residues in these NoLSs, 48% are basic, whereas 99% of the residues are predicted to be solvent-accessible, with 42% in α-helix and 57% in coil. The sequence and predicted protein secondary structure of the 46 NoLSs were used to train an artificial neural network to identify NoLSs. At a true positive rate of 54%, the predictor's overall false positive rate (FPR) is estimated to be 1.52%, which can be broken down to FPRs of 0.26% for randomly chosen cytoplasmic sequences, 0.80% for randomly chosen nucleoplasmic sequences and 12% for nuclear localization signals. The predictor was used to predict NoLSs in the complete human proteome and 10 of the highest scoring previously unknown NoLSs were experimentally confirmed. NoLSs are a prevalent type of targeting motif that is distinct from nuclear localization signals and that can be computationally predicted.

Figures

Figure 1.
NoLS characteristics. (A) NoLSs are predominantly found in regions predicted by Jpred (24) as α-helices or coils and very rarely in regions predicted as extended β-strands. (B) NoLSs localize predominantly at the surface of proteins as predicted by Jnet (24) either at relative solvent accessibility thresholds <25% (JnetSol25), <5% (JnetSol5) or 0% (JnetSol0). (C) NoLSs are found predominantly at the ends of proteins. The error bars represent standard deviation.
Figure 2.
Prediction of NoLSs using an ANN. (A) Sequence windows of size 13 overlapping with an offset of 1 are sparsely encoded into binary vectors of size 165 based on their amino acid sequence, position within the full-length protein sequence and elements of secondary structure. (B) The encoded vectors are fed to the ANN which outputs one score for each input window, attributed to the central residue of the window. (C) Peptides of length 20 are predicted as NoLSs if the average score of the 8 windows of size 13 they contain is >0.8.
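The windowing and averaging steps described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the per-window scores are assumed to come from some already-trained scorer, and the 165-bit encoding and the ANN itself are not reproduced here. Only the window size (13), offset (1), peptide length (20) and cut-off (0.8) are taken from the caption.

```python
from typing import List, Tuple

WINDOW_SIZE = 13      # residues per window (from the caption)
PEPTIDE_LENGTH = 20   # residues per candidate peptide (from the caption)
SCORE_CUTOFF = 0.8    # average-score cut-off (from the caption)


def sliding_windows(sequence: str, size: int = WINDOW_SIZE) -> List[str]:
    """Return all overlapping windows of `size` residues, offset by 1."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


def predict_peptides(window_scores: List[float]) -> List[Tuple[int, int, float]]:
    """Flag 20-residue peptides whose contained windows average above the cut-off.

    `window_scores[i]` is assumed to be the score of the window starting at
    residue i. A peptide starting at residue s fully contains the windows
    starting at s .. s+7, i.e. 20 - 13 + 1 = 8 windows.
    Returns (start, end_exclusive, mean_score) for each flagged peptide.
    """
    per_peptide = PEPTIDE_LENGTH - WINDOW_SIZE + 1  # 8 windows
    hits = []
    for start in range(len(window_scores) - per_peptide + 1):
        chunk = window_scores[start:start + per_peptide]
        mean_score = sum(chunk) / len(chunk)
        if mean_score > SCORE_CUTOFF:
            hits.append((start, start + PEPTIDE_LENGTH, mean_score))
    return hits
```

For a 20-residue sequence, `sliding_windows` yields 8 windows, so `predict_peptides` evaluates exactly one candidate peptide. Longer sequences yield one candidate per start position.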
Figure 3.
ROC plots. The predictor was trained by 3-fold cross-validation using all types of negatives combined. The true positive rates (TPRs) versus false positive rates (FPRs) are plotted for the three different types of negatives tested collectively (allNegativeTypesCombined) and separately: randomly chosen cytoplasmic sequences (referred to as cyto), randomly chosen nucleoplasmic sequences (referred to as nuc) and curated non-NoLS NLSs (labelled nls). The accuracy measures of two encodings are shown: encodings based only on sequence (Seq) and encodings based on both sequence and additional structure elements (Seq-Struct). The diagonal line indicates the performance that would be expected at random.
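As a rough illustration of how the points on such a ROC plot are obtained, the sketch below computes the TPR and FPR at a given score cut-off, and a list of (FPR, TPR) points over several cut-offs. The function names and the example scores are hypothetical and do not come from the paper. To get separate curves per negative type (cyto, nuc, nls), the same functions would be called once per set of negative scores.

```python
from typing import List, Sequence, Tuple


def rates_at(cutoff: float,
             positive_scores: Sequence[float],
             negative_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (TPR, FPR) when scores strictly above `cutoff` are called positive."""
    tpr = sum(s > cutoff for s in positive_scores) / len(positive_scores)
    fpr = sum(s > cutoff for s in negative_scores) / len(negative_scores)
    return tpr, fpr


def roc_points(positive_scores: Sequence[float],
               negative_scores: Sequence[float],
               cutoffs: Sequence[float]) -> List[Tuple[float, float]]:
    """Return one (FPR, TPR) point per cut-off, ready to plot as a ROC curve."""
    points = []
    for cutoff in cutoffs:
        tpr, fpr = rates_at(cutoff, positive_scores, negative_scores)
        points.append((fpr, tpr))
    return points


# Made-up scores for illustration only (not data from the paper).
example_positives = [0.9, 0.6, 0.4, 0.2]
example_negatives = [0.7, 0.1, 0.1, 0.1]
print(rates_at(0.5, example_positives, example_negatives))
```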
Figure 4.
Experimental validation by microscopy. (A) Fusion constructs of NoLSs chosen for experimental validation and successfully cloned downstream of GFP (Table 3) were transfected into U2OS cells and the resulting proteins were visualized by microscopy [GFP-NoLS() labelled columns]. The DAPI columns show staining of the DNA in these cells. (B) GFP and GFP-RBM34(324–345) were used as negative controls. The bars represent 15 µm.
Figure 5.
Characteristics of predicted NoLS-containing proteins. For all cellular compartments considered, the fraction of proteins predicted to harbour a NoLS is shown. Protein counts for each compartment are indicated in parentheses beside the compartment name. The compartment groups labelled with an asterisk include proteins annotated as being in this and any other compartment except the nucleolus. The 261 proteins in the nucleolus group represent all proteins annotated as being nucleolar regardless of any other localization annotations they may have (indicated by double asterisks). The error bars were determined by bootstrap.

References

    1. Scheer U, Hock R. Structure and function of the nucleolus. Curr. Opin. Cell Biol. 1999;11:385–390.
    2. Boisvert FM, van Koningsbruggen S, Navascues J, Lamond AI. The multifunctional nucleolus. Nat. Rev. Mol. Cell Biol. 2007;8:574–585.
    3. Olson MO, Dundr M, Szebeni A. The nucleolus: an old factory with unexpected capabilities. Trends Cell Biol. 2000;10:189–196.
    4. Olson MO, Hingorani K, Szebeni A. Conventional and nonconventional roles of the nucleolus. Int. Rev. Cytol. 2002;219:199–266.
    5. Pederson T. The plurifunctional nucleolus. Nucleic Acids Res. 1998;26:3871–3876.
