
Specific signals (degrons) regulate protein turnover mediated by the ubiquitin-proteasome system. Here we systematically analyse known degrons and propose a tripartite model comprising the
following: (1) a primary degron (peptide motif) that specifies substrate recognition by cognate E3 ubiquitin ligases, (2) secondary site(s) comprising a single or multiple neighbouring
ubiquitinated lysine(s) and (3) a structurally disordered segment that initiates substrate unfolding at the 26S proteasome. Primary degron sequences are conserved among orthologues and occur
in structurally disordered regions that undergo E3-induced folding-on-binding. Posttranslational modifications can switch primary degrons into E3-binding-competent states, thereby
integrating degradation with signalling pathways. Degradation-linked lysines tend to be located within disordered segments that also initiate substrate degradation by effective proteasomal
engagement. Many characterized mutations and alternative isoforms with abrogated degron components are implicated in disease. These effects result from increased protein stability and
interactome rewiring. The distributed nature of degrons ensures regulation, specificity and combinatorial control of degradation.
Regulated degradation (Deg) of proteins via the ubiquitin-proteasome system (UPS) is critical for diverse cellular processes such as cell cycle progression, transcription, immune response,
signalling, differentiation and growth1. Deg is spatio-temporally controlled and needs to be harmonized with protein synthesis and functionality to maintain proteostatic balance and to
achieve precise proteome remodelling in response to environmental and intracellular cues. This necessitates an intricate monitoring system capable of recognizing specific signals that mark
proteins for Deg (degrons).
A degron has been defined as a protein element that confers metabolic instability2. Although seemingly straightforward, its molecular correlates are often difficult to define and the term
degron has been used inconsistently in the literature. For example, the degron is often defined as the substrate site that is recognized by E3 ubiquitin ligases and a variety of such degrons
(short peptide motifs and specific structural elements) have been characterized3. The eukaryotic linear motif resource4 also classifies several short, linear motifs (SLiMs) as degrons. In
contrast, other studies have indicated the site of polyubiquitination or the polyubiquitin chain itself as the degron5. Lys48-ubiquitin linkages form the canonical signal for Deg by the 26S
proteasome, although other linkage types may also be recognized6. Recently, Matouschek and colleagues7 suggested that successful Deg requires an additional element, an intrinsically
disordered Deg initiation site on the substrate that facilitates substrate unfolding and entry into the proteasome catalytic core, which is accessible only through a narrow channel8. They
subsequently suggested that degrons are bipartite, composed of the substrate-bound polyubiquitin tag and an appropriately spaced disordered Deg initiation site9.
In this study we focus on signals that activate proteolysis when protein function is no longer required, that is, regulated Deg. To understand substrate commitment and entry into the Deg
pathway, here we have systematically analysed known degrons based on multiple data sets and hypothesize that the minimal region necessary and sufficient for UPS-mediated regulated Deg is
composed of three substrate elements. The primary degron is a short, linear (peptide) motif located mostly within structurally disordered regions (less often within surface-exposed segments
of structured domains) and contains a specific sequence pattern that is recognizable by cognate E3 ligases. The secondary degron is one (or multiple neighbouring) substrate lysine(s) present
on a defined surface region of the substrate (ubiquitination zone10). These lysines possess certain contextual preferences that favour (poly)ubiquitin conjugation, such as a moderate-to-high
degree of local structural flexibility and a biased amino acid composition in their neighbourhood. Finally, the tertiary degron is a disordered/locally flexible Deg initiation site located
proximal to (or overlapping with) the secondary degron. We demonstrate that known degron components show a significant correlation with intrinsically disordered regions (IDRs) and highly
flexible substrate segments. We had previously analysed the manifold regulatory advantages of structural disorder in enzymatic components involved in ubiquitination11 and here we suggest
that the multi-layered substrate degron architecture reflects the complexity of proteostatic regulation. Thus, this study lays a solid foundation for a model where the blueprint for protein
Deg is encoded in a distributed (combinatorial) architecture, determining the diversity and specificity of Deg, and thereby enabling a complex and spatiotemporal rewiring of the interactome.
Proteostasis entails a balance between protein synthesis, functional regulation and Deg. The regulatory complexity is clearly apparent at the synthesis and functional levels, but much less
data are available detailing the regulatory elements at the level of Deg (Supplementary Fig. 1). The control of protein production at the transcriptional stage involves a complex interplay
between transcription factors (TFs) and DNA regulatory elements (for example, promoters, enhancers and silencers). The transcription of ∼20,000 human genes is regulated by ∼1,800 TFs, and
∼636,000 genomic binding sites have been mapped for 119 TFs. Overall, the number of DNA regulatory elements in the genome may reach into millions12. Following synthesis, a large variety
(∼300 types) of posttranslational modifications (PTMs) regulate protein localization, activity and interactions, in a synergistic and combinatorial manner13. The numbers of enzymes for
certain modifications can be in the hundreds (for example, several hundred human kinases modifying ∼100,000 phosphosites13). Further, a plethora of peptide (sequence) motifs are known to
direct a diverse range of protein functions (binding, modifications, localization, proteolytic cleavage and so on) and their total number has been estimated to be around one million in the
human proteome13.
E3 ubiquitin ligases confer specificity in selecting substrates for UPS-mediated Deg. Consistent with a balance between the regulatory complexity of protein synthesis and function on the one
hand and that of Deg on the other, the estimated number of human E3s is ∼600 (ref. 11), commensurate with the numbers of TFs and of modification enzymes such as kinases. However, the balance breaks
down when we observe the paucity of characterized E3-specific primary degrons. The number of SLiM-type degrons identified thus far is only 28, of which 25 types are found in human proteins
(Table 1 and Supplementary Table 1). Experimental validation is available only for a limited number of corresponding substrates (93 human substrates; Supplementary Table 2) and even with
predicted substrates (based on permissive criteria) no more than 30% of the human proteome can be putatively covered (Supplementary Fig. 1). Further, some of these motifs are highly degenerate (for
example, the DBOX and SPOP sequence patterns); thus, even after filtering, the number of false predictions is likely to be high, and substantial exploratory research will be required to
validate these predicted candidate proteins as true substrates.
In principle, one might expect many more E3 ligases to function by the recognition of a specific SLiM as a primary degron; therefore, the number of unique degrons should be much larger than
the current number (that is, 28). Thus, a large part of the ‘degrome’ (that is, the full complement of degrons) remains to be explored. We propose that (in addition to as-yet uncharacterized
primary degron types and uncharacterized substrates carrying known primary degrons), the regulatory complexity in Deg arises from a tripartite (distributed) degron architecture (Fig. 1),
which enables combinatorial use of degron components, making the degrome commensurate in complexity with transcriptional and posttranslational regulatory elements in the proteome.
Figure 1. Schematic of a substrate protein (box) with the degron components as indicated: primary degron (yellow), lysine(s) and ubiquitin chain(s) (orange), and the disordered Deg initiation site
(red). Lysines (K) are indicated. (a) Specific recognition of substrate by its cognate E3 ligase (pale green) mediated by the primary degron. (b) The E3–E2 (light pink) complex catalyses
formation of poly/mono-Ub chain(s) (orange circles) on appropriate acceptor lysine(s). (c) The ubiquitinated substrate is recognized by the regulatory subunit(s) of the 26S proteasome. This
involves simultaneous recognition of ubiquitin chain(s) and binding to a disordered Deg initiation site that is required for (d) local unfolding and transfer into the proteolytic core of the
proteasome. The model of the proteasome has been adapted from ref. 67.
However, it should be clarified that although many more peptide (SLiM type) primary degrons may be anticipated, not all E3 ligases necessarily bind to peptide degrons. For example, the E3
ligase listerin/Ltn1 forms part of the large ribosomal subunit-associated quality control complex that facilitates translational surveillance in eukaryotes by ubiquitin-tagging defective
polypeptides from stalled ribosomes14. In another example, a distributed structural degron dispersed across 523 residues of the amino-terminal transmembrane domain of yeast
3-hydroxy-3-methylglutaryl-coenzyme A reductase isozyme Hmg2p is required for its Hrd1-dependent regulated Deg15. It is not clear at the moment what fraction of E3 ligases use peptide
degrons for substrate recognition.
E3 ligases target specific substrates for Deg by recognizing their primary degron (Fig. 1a). We catalogued 28 primary degron motifs encompassing a broad functional range, from 171
experimentally validated instances in 157 diverse substrates (Table 1 and Supplementary Tables 1 and 2). Analysis of their properties revealed that primary degrons resemble typical SLiMs.
SLiMs are short (typically 5–10 residues), evolutionarily conserved functional peptide segments present within IDRs that mediate interactions with partner proteins16. Functional regions
(such as binding motifs) behave as islands of sequence conservation within fast-diverging IDRs. Because E3 binding to a primary degron is a critical step in deciding the protein’s fate, the
functional importance of degrons is clearly reflected in their highly significant sequence conservation across orthologues (Fig. 2a).
Figure 2. (a) Plot of the average sequence entropy of primary degron sequences ($\langle S\rangle_{\text{degron}}$) versus that of their flanking residues ($\langle S\rangle_{\text{flanking}}$) for proteins belonging to the primary degron data set. Points above
the diagonal indicate proteins for which $\langle S\rangle_{\text{degron}} < \langle S\rangle_{\text{flanking}}$, that is, the degron sequence is more conserved than the degron-flanking sequences. (b) Box plot
showing the average IUPred disorder scores of primary degron residues, flanking residues and the remaining protein sequences (for all the primary degron instances). IUPred scores of 0.5 and
above indicate disordered residues. (c) Protein Data Bank (PDB) structure of the inhibitor of nuclear factor-κB kinase subunit β, IKKB (PDBid: 4e3c), with the primary degron indicated. The
residues are coloured according to IUPred disorder scores (colour scale is shown, red indicates disordered residues). (d) PDB structure of unbound ZAP-70, an essential tyrosine kinase
important in immune response (PDBid: 4k2r). In the structure, a 45-residue segment containing the 7-residue degron is missing from the electron density (red dots). The residues are coloured
according to IUPred disorder scores (colour scale is shown). (e) Average ASA Z-scores for primary degrons versus their flanking residues (see Methods). Points lying above the diagonal
indicate proteins for which the average Z-ASA for the degron segment is lower than that of the flanking regions. Flanking regions in a,b and e include ten residues on the N-terminal and ten
residues on the C-terminal side neighbouring the primary degron.
In terms of their structural preferences, primary degrons tend to be located within segments that are predicted to be intrinsically disordered (IUPred17, Fig. 2b), with high local backbone
flexibility (DynaMine18, Supplementary Fig. 2), as compared with the overall substrate sequences in which they occur (P0.8 indicates ordered residues. Using the IUPred scores, LDRs were
defined as consecutive stretches of at least 20 disordered residues (breaks of up to three consecutive ordered residues within an LDR were permitted).
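The LDR definition above can be expressed as a short scan over per-residue IUPred scores. This is a minimal sketch, not the authors' code: it assumes the scores arrive as a list of floats, treats a score of 0.5 or above as disordered (the cutoff quoted in the Fig. 2 legend), counts only disordered residues towards the 20-residue minimum, and ends each LDR on its last disordered residue. The function and constant names are hypothetical.

```python
from typing import List, Sequence, Tuple

DISORDER_CUTOFF = 0.5   # IUPred score >= 0.5 treated as disordered
MIN_DISORDERED = 20     # minimum number of disordered residues per LDR
MAX_ORDERED_BREAK = 3   # longest tolerated run of ordered residues inside an LDR


def find_ldrs(scores: Sequence[float],
              cutoff: float = DISORDER_CUTOFF,
              min_disordered: int = MIN_DISORDERED,
              max_break: int = MAX_ORDERED_BREAK) -> List[Tuple[int, int]]:
    """Return LDRs as 0-based, inclusive (start, end) index pairs."""
    disordered = [s >= cutoff for s in scores]
    n = len(disordered)
    ldrs: List[Tuple[int, int]] = []
    i = 0
    while i < n:
        if not disordered[i]:
            i += 1
            continue
        # Extend a stretch that starts at a disordered residue.
        start = end = i
        count = gap = 0
        j = i
        while j < n:
            if disordered[j]:
                count += 1
                end = j          # the stretch always ends on a disordered residue
                gap = 0
            else:
                gap += 1
                if gap > max_break:   # too many ordered residues in a row
                    break
            j += 1
        if count >= min_disordered:
            ldrs.append((start, end))
        i = end + 1
    return ldrs
```

For example, ten disordered residues, a three-residue ordered break and ten more disordered residues form a single LDR, whereas a four-residue break splits them into two stretches that are each too short.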
Secondary structure propensities were predicted from the primary sequence with PSIPRED20 using default settings. For each residue, the output gives one of three classes: C (coil), H
(helix) or E (strand).
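To illustrate how the per-residue PSIPRED classes can be summarized for a region (for example, a primary degron or its flanks), the sketch below computes the fraction of coil, helix and strand residues. It assumes the PSIPRED predictions have already been reduced to a string of C/H/E labels; `ss_fractions` is a hypothetical helper, not part of PSIPRED.

```python
from collections import Counter
from typing import Dict

SS_CLASSES = "CHE"  # coil, helix, strand: the three PSIPRED output classes


def ss_fractions(ss: str, start: int, end: int) -> Dict[str, float]:
    """Fraction of coil/helix/strand residues in ss[start:end] (0-based, end-exclusive)."""
    region = ss[start:end]
    if not region:
        raise ValueError("empty region")
    unexpected = set(region) - set(SS_CLASSES)
    if unexpected:
        raise ValueError(f"unexpected secondary-structure labels: {sorted(unexpected)}")
    counts = Counter(region)  # missing labels count as 0
    return {label: counts[label] / len(region) for label in SS_CLASSES}
```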
SPINE-X predicts accessible surface areas (ASAs) from amino acid sequence input with >80% accuracy22 and outputs an ASA value for every residue in the input sequence. Absolute ASAs were
converted into Z-scores to facilitate comparison of relative solvent accessibility between regions consisting of different residue types. The protocol used was as follows: using SPINE-X
predictions for all 157 proteins in our primary degron data set, we built ASA distributions for each of the 20 amino acid types (Supplementary Fig. 4). Next, for a specific motif instance,
the absolute ASA of each motif residue was converted into a Z-score (using the ASA distribution corresponding to its amino acid type) and the average Z-score calculated for that motif. The
same protocol was followed when estimating the average Z-score for motif-flanking regions.
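The Z-score protocol above can be sketched as follows, assuming the SPINE-X output has already been parsed into one list of absolute ASA values per protein. The helper names are hypothetical, and because the text does not state whether population or sample standard deviations were used, this sketch assumes the population form. The flank helper follows the ten-residue flanks described in the Fig. 2 legend, truncated at the chain termini.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (mean, population standard deviation) of absolute ASA for one amino acid type
Dist = Tuple[float, float]


def build_asa_distributions(
        proteins: Sequence[Tuple[str, Sequence[float]]]) -> Dict[str, Dist]:
    """Pool predicted absolute ASAs by amino acid type across all proteins."""
    pooled: Dict[str, List[float]] = defaultdict(list)
    for seq, asa in proteins:
        if len(seq) != len(asa):
            raise ValueError("sequence and ASA list differ in length")
        for aa, value in zip(seq, asa):
            pooled[aa].append(value)
    return {aa: (statistics.mean(v), statistics.pstdev(v)) for aa, v in pooled.items()}


def mean_z(seq: str, asa: Sequence[float], ranges: Sequence[Tuple[int, int]],
           dists: Dict[str, Dist]) -> float:
    """Average ASA Z-score over residues in 0-based, end-exclusive ranges."""
    zs = []
    for start, end in ranges:
        for aa, value in zip(seq[start:end], asa[start:end]):
            mu, sd = dists[aa]
            # A residue type with zero spread carries no relative information.
            zs.append((value - mu) / sd if sd > 0 else 0.0)
    return statistics.mean(zs)  # raises StatisticsError if the ranges are empty


def flanking_ranges(start: int, end: int, length: int,
                    width: int = 10) -> List[Tuple[int, int]]:
    """Up to `width` residues on each side of a motif, truncated at the termini."""
    return [(max(0, start - width), start), (end, min(length, end + width))]
```

A degron's value is then `mean_z(seq, asa, [(start, end)], dists)`, and the flank value is `mean_z(seq, asa, flanking_ranges(start, end, len(seq)), dists)`.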
Pre-computed multiple sequence alignments of orthologues were obtained from Discovery@Bioware (http://bioware.ucd.ie/~compass/biowareweb/) and used to calculate a Shannon entropy score for
each aligned position $i$ using the equation

$$S(i) = -\sum_{k} p(k)\,\ln p(k),$$

where $p(k)$ is the probability of the $i$th position in the sequence alignment being occupied by a residue of class $k$. The
classifications of residues used were as follows: [(Ala, Val, Leu, Ile, Met, Cys), (Gly, Ser, Thr), (Asp, Glu), (Asn, Gln), (Arg, Lys), (Pro, Phe, Tyr, Trp), (His)]60. Substitutions within a
group were considered conservative. The lower the sequence entropy at a given alignment position, the higher its evolutionary conservation. For a given region (for example, a primary degron
sequence), the sequence entropy values for each motif position, $S(i)$, were calculated and then averaged:

$$\langle S\rangle_{\text{motif}} = \frac{1}{n_{\text{motif}}} \sum_{i \in \text{motif}} S(i),$$

where $n_{\text{motif}}$ is the number of motif residues.
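The entropy calculation above translates directly into code. This is a minimal sketch under stated assumptions: the alignment is a list of equal-length strings, `start`/`end` are alignment column indices (mapping protein residues to columns is left out), gaps and unrecognized symbols are simply skipped, all sequences are weighted equally, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence

# The seven residue classes listed in the text; substitutions within a class are conservative.
RESIDUE_CLASSES = ("AVLIMC", "GST", "DE", "NQ", "RK", "PFYW", "H")
CLASS_OF = {aa: idx for idx, group in enumerate(RESIDUE_CLASSES) for aa in group}


def column_entropy(column: str) -> float:
    """S(i) = -sum_k p(k) ln p(k) over residue classes; gaps/unknown symbols are skipped."""
    classes = [CLASS_OF[aa] for aa in column.upper() if aa in CLASS_OF]
    if not classes:
        return math.nan
    n = len(classes)
    return -sum((c / n) * math.log(c / n) for c in Counter(classes).values())


def mean_region_entropy(alignment: Sequence[str], start: int, end: int) -> float:
    """Average S(i) over alignment columns start..end-1 (0-based, end-exclusive)."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("aligned sequences must have equal length")
    values = [column_entropy("".join(row[i] for row in alignment))
              for i in range(start, end)]
    values = [v for v in values if not math.isnan(v)]
    return sum(values) / len(values) if values else math.nan
```

Because Lys and Arg share a class, a column containing only K and R has zero entropy, while an even K/D split gives ln 2.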
We used the following data sets of ubiquitinated lysines for analysis (Supplementary Data 3):
A set of 42 mammalian proteins where the ubiquitination of 108 lysines had been linked to their Deg (this set is referred to as Deg). This set had been compiled (and used in an earlier
publication) based on the following criteria39: (a) all the Ubsites had been studied in vivo; (b) existence of literature and database (UniProt61, UbiProt62, Phosphosite63) evidence of
ubiquitination, detected either by high-throughput mass spectrometry or by point mutations of specific lysines that abolish ubiquitination; (c) proteins for which the data quality precluded
complete detection of Ubsites or proteins with ambiguous sites were excluded; and, importantly, (d) the 42 proteins have experimental evidence of undergoing UPS-mediated Deg (or processing)
after ubiquitination. Deg lysines were compared with a control data set comprising the remaining 1,024 lysines (Others) from the same 42-protein set.
A large-scale set of experimentally validated Ubsites in human proteins had previously been collected and used to train a human-specific ubiquitin site predictor (hCKSAAP_UbSite)64. The
outcome of ubiquitination was unknown for this set of proteins. These data had been compiled from two recent proteomics-scale assays (in which the Ubsites were assigned based on enrichment of
endogenous ubiquitinated peptides using affinity purification followed by high-resolution mass spectrometry35,36) and from literature-derived UniProt annotations; the final set was prepared
by filtering the proteins for sequence redundancy (using a 30% identity cutoff). This list was matched against the current UniProt release and contained 9,323 Ubsites (from 3,756 human
proteins) that we refer to as Ubsites. For comparison, we used a set of 9,318 non-ubiquitinated lysines assembled from the same set of proteins (Non-Ubsites).
Sequence windows of 21 residues centred on the lysines were created for analyses of their features. In cases where the Lys was located near the termini of the protein chain, truncated
sequence windows were used.
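The window construction is simple enough to show directly. This is a minimal sketch, assuming a plain one-letter protein sequence. `lysine_windows` is a hypothetical helper that reports 1-based positions to match UniProt numbering.

```python
from typing import Dict


def lysine_windows(seq: str, half_width: int = 10) -> Dict[int, str]:
    """Map each lysine's 1-based position to its sequence window.

    The window spans half_width residues on either side of the lysine
    (21 residues for the default) and is truncated near the termini.
    """
    windows = {}
    for i, aa in enumerate(seq):
        if aa == "K":
            # Python slicing stops at the sequence end, so only the start needs clamping.
            windows[i + 1] = seq[max(0, i - half_width): i + half_width + 1]
    return windows
```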
iRefWeb65 was used to retrieve data for known protein–protein interactions. The following filters were applied: (i) only physical interactions based on experimental validation; (ii) the
interaction had been described in at least one publication; (iii) single-organism interactions only (that is, both proteins were from the same organism); and (iv) MI (MINT-inspired) score of
0.4 or more. This score is a measure of confidence in the observed interaction.
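The four iRefWeb filters can be written as a single predicate over interaction records. The record layout below (the `Interaction` class and its field names) is hypothetical and does not reproduce iRefWeb's export format. It only mirrors the four criteria listed above, with the MI score threshold of 0.4.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_MI_SCORE = 0.4  # MINT-inspired confidence threshold from the text


@dataclass(frozen=True)
class Interaction:
    # Hypothetical fields, chosen to mirror the filtering criteria.
    protein_a: str
    protein_b: str
    physical: bool        # physical (not genetic/functional) interaction
    experimental: bool    # supported by experimental validation
    n_publications: int   # number of publications describing it
    taxon_a: int          # NCBI taxonomy ID of each partner
    taxon_b: int
    mi_score: float


def passes_filters(x: Interaction) -> bool:
    """Apply criteria (i) to (iv): physical + experimental, >=1 paper, same organism, MI >= 0.4."""
    return (x.physical and x.experimental
            and x.n_publications >= 1
            and x.taxon_a == x.taxon_b
            and x.mi_score >= MIN_MI_SCORE)


def filter_interactions(records: Iterable[Interaction]) -> List[Interaction]:
    return [r for r in records if passes_filters(r)]
```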
Two data sets were used for cataloguing instances where degron elements were adversely affected: the primary degron data set of 157 proteins (Supplementary Table 2) and the Deg data set of
42 proteins (Supplementary Data 3). The Deg data set contains secondary degron instances (that is, Deg-linked lysines) and tertiary degrons were also defined using the same data set (that
is, LDRs nearest to each Deg lysine). We searched UniProt annotations (feature tables, denoted ‘FT’) for each of these proteins for cases where any of the degron components were
missing/altered:
Sequence variants (alternative sequence or isoforms, denoted ‘VAR_SEQ’ in UniProt annotation) resulting from alternative splicing, alternative promoter usage, alternative initiation and
ribosomal frameshifting; alternative splicing was the most abundant.
Mutations (denoted ‘MUTAGEN’) corresponding to site(s) that have been experimentally altered by mutagenesis and their effects studied.
Sequence variations (position specific, denoted ‘VARIANT’) as reported by authors; validated human polymorphisms are linked to entries in the Single Nucleotide Polymorphism database66.
Entries in this category also include disease-associated mutations.
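The cataloguing step amounts to interval overlap between UniProt feature positions and degron component positions, plus a nearest-LDR lookup to define tertiary degrons. The sketch below assumes the features have already been extracted from the UniProt entry as (key, start, end) tuples and that all spans use the same 1-based, inclusive numbering. The helper names are hypothetical. Simple overlap is only a proxy: a VAR_SEQ entry that overlaps a component may alter it rather than remove it, so hits still need manual inspection.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Span = Tuple[int, int]          # 1-based, inclusive positions, as in UniProt feature tables
Feature = Tuple[str, int, int]  # (feature key, start, end)

# Feature keys considered in the text; other keys (e.g. CHAIN) are ignored.
FEATURE_KEYS = ("VAR_SEQ", "MUTAGEN", "VARIANT")


def overlaps(a: Span, b: Span) -> bool:
    """True if two inclusive spans share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def nearest_ldr(lys_pos: int, ldrs: Sequence[Span]) -> Optional[Span]:
    """LDR closest in sequence to a lysine (distance 0 if the lysine lies inside it)."""
    def distance(span: Span) -> int:
        start, end = span
        if lys_pos < start:
            return start - lys_pos
        if lys_pos > end:
            return lys_pos - end
        return 0
    return min(ldrs, key=distance) if ldrs else None


def affected_components(features: Sequence[Feature],
                        components: Dict[str, List[Span]]) -> List[Tuple[str, int, int, str]]:
    """List (key, start, end, component) for each feature overlapping a degron component."""
    hits = []
    for key, start, end in features:
        if key not in FEATURE_KEYS:
            continue
        for name, spans in components.items():
            if any(overlaps((start, end), span) for span in spans):
                hits.append((key, start, end, name))
    return hits
```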
All statistical tests for calculating P-values were carried out using the Mann–Whitney U-test, unless otherwise specified.
How to cite this article: Guharoy, M. et al. Tripartite degrons confer diversity and specificity on regulated protein degradation in the ubiquitin-proteasome system. Nat. Commun. 7:10239
doi: 10.1038/ncomms10239 (2016).
We thank Professors Shoshana Wodak, Joël Janin and Madan Babu for advice and comments on the work, and Dr Rita Pancsa for help with DynaMine calculations. This work was supported by the
Odysseus grant G.0029.12 from Research Foundation Flanders (FWO) to P.T. and a fellowship from the Marie Curie Initial Training Network project 264257 (IDPbyNMR) from the European Commission
to P.B. M.G. is the recipient of a VIB/Marie Curie COFUND Postdoctoral (omics@VIB) fellowship.
Mainak Guharoy and Pallab Bhowmick: These authors contributed equally to this work
VIB Structural Biology Research Center (SBRC), Vrije Universiteit Brussel (VUB), Building E, Pleinlaan 2, Brussels, 1050, Belgium
Mainak Guharoy, Pallab Bhowmick, Mohamed Sallam & Peter Tompa
Institute of Enzymology, Research Center for Natural Sciences, Hungarian Academy of Sciences, Budapest, 1117, Hungary
M.G. and P.T. conceived the study and wrote the paper. M.G. and P.B. performed the research with assistance from M.S. All authors analysed the results.
Supplementary Figures 1-9, Supplementary Tables 1-5 and Supplementary References (PDF 2751 kb)
List of experimentally observed PTMs within primary degrons and degron flanking residues annotated in UniProt. (XLSX 44 kb)
List of residues flanking primary degrons that undergo post-translational modifications with experimental annotation about the effect of mutations. (XLSX 13 kb)
Lists of lysine residues for the four datasets used (Deg, Others, Ubsites and Non-Ubsites). (XLSX 188 kb)
Lists of proteins with characterized isoforms, variants and mutants that affect primary degrons. (XLSX 75 kb)
Lists of proteins with characterized isoforms, variants and mutants that affect secondary degrons. (XLSX 77 kb)
Lists of proteins with characterized isoforms, variants and mutants that affect tertiary degrons. (XLSX 62 kb)
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons
license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to
reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/