
12.08.2019

Protein Stability Prediction Tool



Acta Crystallogr F Struct Biol Commun. 2016 Feb 1; 72(Pt 2): 72–95.


Eris is a protein stability prediction server. It calculates the change in protein stability induced by mutations (ΔΔG) using the recently developed Medusa modeling suite. In our test study, the ΔΔG values of a large dataset (500) were calculated and compared with the experimental data, and significant correlations were found. CUPSAT (Cologne University Protein Stability Analysis Tool) is a web tool to analyze and predict protein stability changes upon point mutations (sin.

Published online 2016 Jan 26. doi: 10.1107/S2053230X15024619

PMID: 26841758


Abstract

Protein stability is a topic of major interest for the biotechnology, pharmaceutical and food industries, in addition to being a daily consideration for academic researchers studying proteins. An understanding of protein stability is essential for optimizing the expression, purification, formulation, storage and structural studies of proteins. In this review, discussion will focus on factors affecting protein stability, on a somewhat practical level, particularly from the view of a protein crystallographer. The differences between protein conformational stability and protein compositional stability will be discussed, along with a brief introduction to key methods useful for analyzing protein stability. Finally, tactics for addressing protein-stability issues during protein expression, purification and crystallization will be discussed.

Keywords: protein stability, protein crystallization, protein disorder, crystallizability

1. Introduction

The main purpose of this review is to introduce the reader to the concepts of protein stability from the viewpoint of a structural biologist, a structural biologist being defined as a scientist who determines the detailed molecular structure of a protein using methods such as crystallography, NMR spectroscopy or cryo-EM. Particular emphasis will be given to crystallographic techniques, as protein stability, or the lack thereof, represents a substantial challenge in the crystallization of many proteins. Protein stability is a wide-ranging topic including aspects of physical chemistry, thermodynamics, entropy, computational chemistry, protein folding and dynamics. For the purposes of this review, many of the computational and theoretical aspects are skipped over and the reader is referred to other excellent reviews on this topic (Compiani & Capriotti, 2013; Lazaridis & Karplus, 2002).

Stability is the potential of a pattern to survive over time, and therefore is integral to our understanding of biological systems and their evolution (Schrödinger, 1945). Clearly, the exact meaning of a ‘pattern’ for a protein molecule is somewhat vague, but we know that processes such as protein unfolding, denaturation, degradation, conformational change, enzymatic modification and proteolytic cleavage may transform this ‘pattern’. These transformations are generally considered, or analyzed, with respect to the integrity of the primary and conformational structure of the fully folded protein. Additionally, protein stability means different things to different people. For example, a pharmacologist, biotechnologist or food scientist may primarily consider the half-life of a protein’s activity as a measure of its stability. However, a protein chemist or a structural biologist may concern themselves with changes in the primary, secondary, tertiary or quaternary structure of a protein as a measure of its stability. Again, for the purposes of this review, we will focus on the structural aspects of protein stability and will refer the reader to other excellent reviews on protein stability from a pharmacological and biotechnological perspective (Hall, 2014).

We will first discuss protein stability as a fundamental prerequisite for crystallization (see §2) and then some important aspects of stability on a higher, structural level (see §3). At this stage it is important to discuss the differences between thermodynamic protein stability and conformational protein disorder, especially given some of the unique parameters that structural biologists use to describe and analyze their structures. For example, NMR spectroscopists often report root-mean-squared deviation (r.m.s.d.) values between their ensemble structures, whereas crystallographers report B factors as a measure of the positional uncertainty in a given protein crystal structure model. Both of these parameters represent displacements and disorder within a structure and can be reflective of the level of conformational stability. We will then discuss some important factors to consider when expressing and purifying proteins for structural studies. Structural genomics efforts have alleviated many of the bottlenecks of a traditional structure-determination pipeline, but researchers are still all too aware of the difficulties of expressing and purifying challenging protein targets. Careful consideration of the primary structure, construct design, expression conditions and host cells can all be used to mitigate many of the protein-stability issues observed during expression and purification (see §4). We will then discuss some common methods used to analyze protein stability, with a focus on methods routinely used to assess protein stability, including protein melting temperature analysis (T_m), NMR and cryo-EM (see §5).

2. Stability is a fundamental prerequisite for crystallization

Biomolecular crystallization can be described as the self-organization of macromolecules into a translationally periodic arrangement with long-range order. In order to achieve this goal optimally, the moieties within each asymmetric unit of a crystal should be of the same kind and of the same shape. If a protein cannot form such stable entities per se, a fundamental and primary requirement for crystallization is not met, and no effort to find suitable thermodynamic and kinetic conditions will lead to crystals of such a protein construct (Fig. 1a).

Figure 1

Factors influencing protein stability. (a) Protein compositional stability and conformational stability as key determining factors for successful crystallization. The stability properties of the protein determine whether the process of crystal formation is possible. Thermodynamics establish the necessary conditions for crystallization, and the kinetics and dynamics of the processes determine whether a possible scenario actually becomes reality. Only if all of the parameters are satisfied will crystal formation proceed. Figure adapted from Rupp (2015). (b) The marginal net stability of a folded protein is highlighted with respect to the contributing factors; the overwhelming lack of conformational stability is only marginally balanced by the contribution of van der Waals (vdW), hydrogen-bonding (H-bonds) and hydrophobic forces. Figure adapted from http://bit.ly/1L921Oi.

It is important to note that from a crystallization perspective, there are at least two major flavors of protein stability: compositional stability and conformational stability (Table 1). The crystallographer must carefully assess both types of stability in order to enable crystallization of the target protein.

Table 1

Compositional stability versus conformational stability: some important questions to ask when embarking on the crystallization of a protein and some factors to investigate for problem proteins

Global types of protein stability	Key questions to ask when trying to crystallize a protein	Required answers for successful crystallization	Factors to investigate if the answer is ‘No’
Compositional stability	Is the chemical makeup of the protein well defined?	Yes	Check amino-acid sequence. Check for PTMs, especially proteolysis. Purify protein more. Carry out more rigorous bioanalytical methods such as mass spectrometry and light scattering.
	Does the protein have a high level of chemical homogeneity?	Yes
	Is the protein chemically stable in the crystallization conditions?	Yes	Use customized or less harsh crystallization screens. Explore different temperatures for screening.
	Is the protein stable over the course of the crystallization experiment?	Yes
Conformational stability	Are there minimal disordered regions in the protein?	Yes	Redesign expression constructs to engineer out disordered or dynamic regions. Identify stabilizing protein partners or ligands.
	Does the protein have a minimal content of domains or regions that undergo dynamic variability over time?	Yes

2.1. Compositional stability

During the process of crystallization it is essential to maintain the same species within the crystallization experiment; there needs to be some form of compositional stability. On a simple level this means that the protein molecules must have the same chemical makeup. The chemical homogeneity of a sample can often be determined using mass spectrometry or an SDS–PAGE gel. Compositional homogeneity is typically compromised by post-translational modifications, such as glycosylation and proteolysis, which can affect the primary structure of the protein molecules and generate compositional variability (see §3.1). Because protein crystallization takes time, the primary requirement for compositional stability must be maintained over a period of time, and preferably within a reasonable range of environmental conditions. It is important to note that there is no such thing as absolute stability. For example, a protein that is compositionally stable enough to produce a single band on an SDS–PAGE gel may still not be stable enough over the timeframe of a crystallization experiment.

2.2. Conformational stability

Assuming that the protein sample has a degree of compositional homogeneity, it will still likely not crystallize unless it possesses conformational stability. A large number of proteins fall into the category of conformationally disordered proteins displaying little or no conformational order (Longhi et al., 2010). A protein with substantial disordered regions, or separate domains exhibiting dynamic variability, will be less likely to self-organize into a crystal. This can be the case even if the sample has perfect compositional stability. The strict requirement for limited conformational variability is a unique problem that a crystallographer faces when trying to crystallize a protein sample. The problem is compounded by the fact that the conformation of flexible regions of a protein is a context-driven property. For example, conformations may be quite different in the cellular context, in an NMR tube or in a macromolecular crystal. While structural methods can be used to probe conformational homogeneity, it is important to realize that the results are only meaningful within the context and conditions of that particular method (see §5). For example, analysis of conformational stability and dynamics is often limited using crystallographic methods as the crystal packing can hinder such movements. In these cases NMR solution methods can provide complementary information.

Structures determined using X-ray crystallography provide limited information regarding the dynamics of the protein structure. Nonetheless, some dynamics information is included in the atomic model in the form of the atomic displacement parameter (ADP) or B factor. The B factor is expressed in units of Å² and is essentially a statistical measure describing the probability of finding an atom at that particular mean position in the structure (Willis & Pryor, 1975). If the B factor of a particular atom is high then it suggests that the certainty of finding the atom at that position in the structure is low. Atoms in regions of high B factor can be displaced as a result of dynamic disorder of the polypeptide chain or as a result of short-range or long-range disorder within the crystal. Such flexible or dynamic regions can often be identified in a crystal structure and engineered out at the cloning stage to produce protein samples with better conformational stability (see §§4.1 and 6.1). Not only do such modifications result in better protein stability during expression and purification, but they also increase the probability that the molecules will pack within the crystal lattice in a more orderly fashion. As a consequence, such efforts often result in better diffracting crystals and higher-resolution X-ray data.
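For an isotropic B factor, the standard relation B = 8π²⟨u²⟩ links the reported value to the mean-square atomic displacement ⟨u²⟩. The short Python sketch below applies that relation; the function name and the example B values are illustrative assumptions and are not taken from any particular structure.

```python
import math

def b_factor_to_rms_displacement(b_factor):
    """Convert an isotropic B factor (Å^2) to an r.m.s. displacement (Å).

    Uses B = 8 * pi^2 * <u^2>, so u_rms = sqrt(B / (8 * pi^2)).
    """
    if b_factor < 0:
        raise ValueError("B factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))

# Illustrative values only: a well-ordered atom versus a more mobile atom.
for b in (20.0, 80.0):
    rms = b_factor_to_rms_displacement(b)
    print(f"B = {b:5.1f} Å^2 -> r.m.s. displacement ≈ {rms:.2f} Å")
```

Under this relation a B factor of 20 Å² corresponds to a displacement of roughly 0.5 Å and 80 Å² to roughly 1 Å, which gives a feel for why high-B regions are often candidates for removal at the construct-design stage.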

Comparison of ensembles of structures, as typically generated by NMR spectroscopy, can be used to provide a measure analogous to the crystallographic B factor in the form of a root-mean-squared deviation (r.m.s.d.) between corresponding atoms of the ensemble members. This measure can be used to assess the flexibility, dynamics, disorder or stereochemical variability across a set of structural models. The r.m.s.d. value is complementary to the crystallographic B factor and because the structure is in solution it is not perturbed or influenced by crystal packing (Sikic & Carugo, 2009).
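One common way to obtain such a per-atom measure is to compute, for each atom, the r.m.s. deviation of its position from the ensemble-mean position. The sketch below does this in plain Python. It assumes the models have already been superposed on a common frame and that atoms appear in the same order in every model; the coordinates are a hypothetical toy ensemble, not real data.

```python
import math

def per_atom_rmsd(ensemble):
    """Per-atom r.m.s. deviation from the ensemble-mean position.

    `ensemble` is a list of models; each model is a list of (x, y, z)
    tuples in Å, with atoms in the same order in every model. The models
    are assumed to be already superposed.
    """
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i across all models.
        mean = [sum(model[i][k] for model in ensemble) / n_models
                for k in range(3)]
        # Mean squared distance of atom i from its mean position.
        msd = sum(
            sum((model[i][k] - mean[k]) ** 2 for k in range(3))
            for model in ensemble
        ) / n_models
        result.append(math.sqrt(msd))
    return result

# Hypothetical two-atom, three-model ensemble (coordinates in Å).
toy_ensemble = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (5.0, 2.0, 0.0)],
    [(-0.1, 0.0, 0.0), (5.0, -2.0, 0.0)],
]
print(per_atom_rmsd(toy_ensemble))
```

In this toy case the first atom barely moves between models while the second varies by about 1.6 Å, the kind of contrast that flags a flexible region.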

A large class of proteins, referred to as intrinsically disordered proteins (IDPs), contain significant levels of conformational disorder and, in some cases, have no discernible three-dimensional structure at all. It is estimated that ∼40% of all human proteins contain at least one disordered segment and ∼25% are completely disordered (Uversky & Dunker, 2010). These proteins have largely been avoided by the crystallographic community owing to the expected difficulties in crystallizing them. However, NMR techniques have been central to unraveling how these unstructured proteins function. Such studies have led to a paradigm shift in our understanding of protein structure and function (Wright & Dyson, 1999). Traditional theory dictates that proteins function by adopting a rigid, preformed structure that binds to a target ligand or protein in a fashion analogous to a lock and key. However, NMR studies on IDPs such as CREB, p53 and 14-3-3 have revealed that these disordered regions allow plasticity and flexibility and often only form structure upon binding of the partner protein (Oldfield et al., 2008; Sugase et al., 2007; Mujtaba et al., 2004). These so-called ‘hub proteins’ are capable of interacting with many different protein partners in a context-sensitive manner, and this is only possible as a result of the plasticity and initial lack of conformational stability. High-resolution crystal structures of complexes of these vital ‘hub proteins’ will be essential for understanding their role in human diseases such as Parkinson’s disease and Alzheimer’s disease (Wang et al., 2011). Although the poor conformational stability of these proteins poses challenges for the protein crystallographer, in a cellular context IDPs offer many advantages over more traditional single-function folded proteins, including the ability to bind to many different protein partners (Liu & Huang, 2014).

In addition to IDPs, many proteins contain aggregation-prone regions (APRs) that typically contain a run of 5–15 amino acids with a propensity for forming extended β-sheet structures. For example, APR segments are observed in β₂-microglobulin and are responsible for aggregation into amyloid fibers in diseases such as amyloidosis (De Baets et al., 2014). Another group of proteins referred to as intrinsically insoluble proteins (IIPs) are completely insoluble and cannot be refolded in traditional buffer solutions (Goyal et al., 2015; Liu & Song, 2009). For example, naturally occurring mutants of SH3, such as V22-SH3, are insoluble in the presence of ions, but they can be resurrected and solubilized in pure water, allowing further study of the unstructured proteins in solution using NMR spectroscopy (Liu & Song, 2009).

3. Stability of the protein on a structural level

One simple way of conceptualizing protein stability from a structural perspective is to consider stability at each level of protein structure: primary structure, secondary structure, tertiary structure and quaternary structure (Table 2). Protein stability with respect to each of the structural levels will now be discussed in turn. Wherever possible, we will emphasize aspects of particular importance to the structural biologist, with a particular focus on protein crystallization.

Table 2

Common measures of protein stability

Definitions of protein stability at each structural level are shown along with common methods used to analyze the degree of stability. Asterisks denote the relative merits of the three main structure-determination techniques, with five asterisks denoting the optimal method. For example, NMR solution methods are often more favorable for studying dynamic processes and quaternary states as they are not influenced by crystal packing.

Relative merits of structural methods
Structural level	Definition of stability	Example biochemical processes or features	Common methods	Crystallography	NMR	EM
Primary	Change of amino-acid sequence or modification of amino acids	PTM Proteolysis Protein splicing	Half-life analysis SDS–PAGE Mass spectrometry Eastern and Western blotting	*****	****	*
Secondary	Change of α-helix, β-sheet and loop content	Secondary-structure formation Racemization Aromatic side-chain interactions Ligands	Circular dichroism (CD) Synchrotron-radiation CD UV-CD FT-IR 2D-IR Deuterium-exchange mass spectrometry (DXMS)	*****	*****	*
Tertiary	Change of overall fold or protein conformation	Hydrogen bonding Hydrophobic interactions Conformational change Disulfide bonding Topology	ITC DSC Thermofluor	****	****	**
Quaternary	Change in oligomeric state	Protein–protein interactions Oligomerization	Size-exclusion chromatography Native gel electrophoresis	*	*****	*****

3.1. Primary structure

The primary structure of the protein, or the sequence of the amino acids in the polypeptide chain, can be modified in several ways by post-translational modifications (PTMs). PTMs result in alteration of the structure and function of a protein and for this reason are central to any discussion of protein stability. As discussed above (see §2), PTMs can affect both compositional stability, as the modifications may be non-uniform or incomplete, and also conformational stability, as the modifications may affect protein disorder and dynamics. This is illustrated by glycoproteins, which are often not uniformly glycosylated at all possible glycosylation sites, therefore leading to compositional heterogeneity. Furthermore, complex carbohydrate chains tend to have a greater degree of conformational freedom. This conformational freedom results in an increase in disorder on the protein surface, while at the same time shielding polar or charged residues on the protein surface required for intermolecular crystal contact formation (see §6.6). Although the heterogeneity of glycosylation tends to impair crystallization, its variability can have important functional implications for a protein.

PTMs are the result of many different changes to the primary structure of a protein, including proteolytic processing, protein splicing and the addition of other functional groups to the amino acids. PTMs are often used for targeting of the protein to a specific region of the cell or modification of the activity or specificity of an enzyme. For example, functional groups such as myristate, palmitate, isoprenoid and glycosylphosphatidylinositol (GPI) are often attached to the protein and used for targeting of the protein to the membrane (Chatterjee & Mayor, 2001). Other functional groups such as carboxylate (Walker et al., 2001), ethanolamine phosphoglycerol (Whiteheart et al., 1989) and hypusine (Park et al., 2010) can be added to proteins to regulate their activity. Additionally, larger peptides and proteins can also be covalently added to proteins, including ubiquitin (Komander & Rape, 2012), SUMO (Hay, 2005), ISG15 (Malakhova et al., 2003), PUP (Striebel et al., 2014) and NEDD (Rabut & Peter, 2008). Of the 821 182 proteins that were experimentally analyzed by Khoury and coworkers, the top ten observed PTMs are phosphorylation (58383), acetylation (6751), N-linked glycosylation (5526), amidation (2844), hydroxylation (1619), methylation (1523), O-linked glycosylation (1133), ubiquitylation (878), pyrrolidone carboxylic acid (826) and sulfation (504) (http://bit.ly/1jdfXR8; Khoury et al., 2011). Key methods used to analyze and identify changes at the primary-structure level include mass spectrometry and Eastern and Western blots (Liu et al., 2014; Towbin et al., 1979; Table 2).

It is important to note that many PTMs play a role in stabilizing proteins, particularly with respect to the half-life and turnover of the protein within the cell. For example, PTMs such as ubiquitination target proteins to the proteasome for degradation and recycling, and therefore directly affect the half-life of the protein and its stability within the cell (Komander & Rape, 2012). A myriad of other PTMs exist, including acylation, alkylation, arginylation, butyrylation, malonylation, ADP-ribosylation, iodination, oxidation, succinylation, S-nitrosylation, S-glutathionylation and glycosylation. Currently, just under 500 PTMs have been identified in the SWISS-PROT and TrEMBL databases (for a full list, see http://bit.ly/1P6Rbj3). All of these modifications play a role in the structure and the function of the target protein. However, some PTMs, such as proteolytic cleavage and protein splicing, significantly influence protein structure at the primary level and can lead to drastic changes in compositional stability.

Protein splicing occurs in proteins called inteins (or protein introns), which are a large class of self-cleaving proteins found in all domains of life (Paulus, 2000; Novikova et al., 2014). One of the first examples identified was the VMA1 protein, a yeast vacuolar membrane H⁺-ATPase, which was shown to undergo protein splicing (Hirata et al., 1990). Protein splicing is a naturally occurring process analogous to the splicing of introns from RNA. A precursor polypeptide is processed into a mature and functional protein. The intein is autocatalytically excised from the precursor protein and the flanking exteins are ligated together, producing two new polypeptides (Mills et al., 2014). Inteins are of great importance for the stability of proteins, but they are also of interest from a protein-engineering perspective (Aranko et al., 2014). For example, inteins can be used for the preparation of isotope-labeled proteins for NMR spectroscopy, for site-specific fluorescent labeling and as self-cleaving affinity purification tags such as cSAT and intein-CBD (commercially available as the IMPACT system from NEB; Chong et al., 1997; Volkmann & Iwaï, 2010; Lin et al., 2015; see §4.1).

3.2. Secondary structure

Protein secondary structure is the localized three-dimensional structure of the polypeptide chain. Secondary structure can be described in terms of the pattern of hydrogen bonding between amide H atoms and carbonyl O atoms of the backbone (Pauling et al., 1951) or by the stereochemistry adopted by the polypeptide backbone (Ramachandran et al., 1963). On a somewhat simplified level, the primary driving forces behind the formation of secondary structure, and in turn tertiary structure, are hydrogen bonding and hydrophobic interaction (Pace, Scholtz et al., 2014; see §3.3).

The α-helix is the predominant type of secondary structure, accounting for approximately one-third of all secondary-structure elements (Stickle et al., 1992). Analysis of the first crystal structures suggested that certain residues including alanine, leucine and glutamate are found frequently in α-helices. In contrast, other residues such as proline, glycine and aspartic acid are found less frequently (Davies, 1964; Prothero, 1966; Guzzo, 1965). This information has been used to develop many algorithms for the prediction of protein secondary structure, including the popular Chou and Fasman method (Chou & Fasman, 1974). Secondary-structure propensity data have been expanded using mutagenesis data, and tables of α-helical (Pace & Scholtz, 1998) and β-sheet (Smith et al., 1994) propensity have been compiled.
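The frequency-ratio idea behind such propensity tables can be sketched in a few lines of Python: the propensity of a residue for the helix is its frequency among helical positions divided by its frequency overall. This is only the underlying statistic, not the full Chou and Fasman prediction procedure, and the sequence and the helix/coil annotation below are hypothetical toy inputs.

```python
from collections import Counter

def helix_propensity(sequence, secondary_structure):
    """Return {residue: propensity} for helix ('H') positions.

    Propensity = (fraction of helical residues that are this amino acid)
               / (fraction of all residues that are this amino acid).
    Values above 1 indicate a residue enriched in helices.
    """
    if len(sequence) != len(secondary_structure):
        raise ValueError("sequence and annotation must be the same length")
    overall = Counter(sequence)
    in_helix = Counter(
        aa for aa, ss in zip(sequence, secondary_structure) if ss == "H"
    )
    n_total = len(sequence)
    n_helix = sum(in_helix.values())
    if n_helix == 0:
        raise ValueError("no helical residues in annotation")
    return {
        aa: (in_helix[aa] / n_helix) / (overall[aa] / n_total)
        for aa in overall
    }

# Hypothetical toy data: 'H' marks helix, 'C' marks everything else.
toy_sequence = "AALEAGPGAL"
toy_annotation = "HHHHHCCCCC"
for aa, p in sorted(helix_propensity(toy_sequence, toy_annotation).items()):
    print(aa, round(p, 2))
```

With a realistic, non-redundant set of annotated structures in place of the toy strings, the same ratio yields the kind of residue ranking described above, with alanine, leucine and glutamate above 1 and proline and glycine well below it.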

One overwhelming consensus of the amino-acid propensity rules is the destabilizing effect that proline has on the α-helix (ΔΔG of 3.16 kcal mol⁻¹ cf. alanine at 0 kcal mol⁻¹; Pace & Scholtz, 1998; see §3.3). This destabilization is a result of the missing backbone amide H atom, which prevents proline from participating in stabilizing hydrogen bonding. Additionally, the bulky cyclic side chain of proline results in a ∼30° kink in the α-helix backbone as a result of steric hindrance (Richardson, 1981; Yun et al., 1991). Glycine has the next lowest propensity for forming α-helices as a result of enhanced conformational flexibility upon folding to form an α-helix (Hermans et al., 1992). It is important to note that many of these secondary-structure propensities are highly context-dependent. For example, proline occurs widely in the transmembrane helices of integral membrane proteins and has been shown to have a stabilizing effect on α-helices in such environments (Li et al., 1996).
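To put a ΔΔG of this size in perspective, the standard relation K = exp(−ΔG/RT) implies that a free-energy difference ΔΔG changes an equilibrium constant by a factor of exp(ΔΔG/RT). The sketch below evaluates that factor; the temperature of 298.15 K and the function name are assumptions made for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1

def equilibrium_fold_change(ddg_kcal_per_mol, temperature_k=298.15):
    """Factor by which an equilibrium constant changes for a given ΔΔG."""
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))

# Helix propensity penalty quoted for proline relative to alanine.
print(f"{equilibrium_fold_change(3.16):.0f}-fold")
```

At room temperature a penalty of 3.16 kcal mol⁻¹ corresponds to a shift on the order of 200-fold in the relevant equilibrium, which is why a single proline can be so disruptive in a soluble α-helix.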

Clearly, such findings are in support of the hypothesis that the stability of the folded protein is largely dictated by the amino-acid composition and, as such, the primary structure results in a unique, kinetically accessible minimum of free energy as first suggested by Anfinsen (1973). These simple principles have been expanded into complex algorithms that can be used to design both stable α-helices and β-sheets (Jiménez, 2014; Yakimov et al., 2014). Furthermore, comparative modeling can be used to design proteins with a greater degree of thermal stability, and similar models can be used to predict the crystallizability of a protein (Olson et al., 2015; Smialowski & Frishman, 2010; see §6).

3.3. Tertiary structure

The tertiary structure of a protein is the overall shape, or fold, adopted by the polypeptide chain. Many factors affect the process of protein folding, including conformational and compositional stability, cellular environment including temperature and pH, primary and secondary structure, solvation, hydrogen bonding, salt bridges, hydrophobic effects, van der Waals (vdW) forces, ligand binding, cofactor binding, ion binding, chaperones and PTMs, to name just a few.

Constraining the polypeptide chain to a single folded conformation incurs a significant entropic penalty (−TΔS >> 0), and under normal cellular conditions a folded protein is only marginally stable (∼10 kcal mol⁻¹ for a 10 kDa protein; Fig. 1b). In order to overcome this entropic penalty, all of the other factors influencing protein folding must outweigh the loss of conformational entropy (Dill, 1990). A series of studies by Pace and coworkers have recently quantified some of these influences (Pace et al., 2011; Pace, Fu et al., 2014; Pace, Scholtz et al., 2014). These studies suggest that the hydrophobic effect contributes ∼60% to the stability of the protein and hydrogen bonding contributes ∼40% (Pace et al., 2011). Specifically, the burial of a single methyl group contributes ∼1.1 kcal mol⁻¹ to net protein stability and loss of its conformational entropy contributes ∼2.4 kcal mol⁻¹ to net protein instability (Pace et al., 2011). The net contribution of hydrogen bonding to overall protein stability is also ∼1.1 kcal mol⁻¹ per hydrogen bond and is largely independent of the size of the protein (Stickle et al., 1992; Pace, Scholtz et al., 2014). However, in contrast, hydrophobic interactions typically contribute less to the stability of small proteins (Pace et al., 2011; Pace, Fu et al., 2014).
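What a marginal net stability means in terms of populations can be illustrated with a simple two-state (folded/unfolded) model, in which the folded fraction is 1/(1 + exp(−ΔG/RT)) for an unfolding free energy ΔG. The two-state assumption, the temperature and the function name are illustrative choices and are not taken from the studies cited above.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1

def fraction_folded(dg_unfold_kcal_per_mol, temperature_k=298.15):
    """Folded fraction in a two-state model.

    `dg_unfold_kcal_per_mol` is the free energy of unfolding; a positive
    value means the folded state is favored.
    """
    x = dg_unfold_kcal_per_mol / (R_KCAL * temperature_k)
    return 1.0 / (1.0 + math.exp(-x))

# Net stabilities from zero up to the ~10 kcal/mol quoted in the text.
for dg in (0.0, 1.0, 3.0, 10.0):
    print(f"ΔG = {dg:4.1f} kcal/mol -> folded fraction {fraction_folded(dg):.8f}")
```

Because the dependence is exponential, the loss of only a few kcal mol⁻¹, comparable to a handful of hydrogen bonds or buried methyl groups, is enough to move a protein from almost fully folded to substantially unfolded.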

The stability of the protein fold is of particular interest for the design of thermally stable proteins for industrial uses such as biofuel production and as proteases for laundry detergents. Thermophilic organisms such as Thermotoga maritima, which thrives in hot deep-sea vents in the Sargasso Sea, require proteins that maintain fold and structure under such extreme conditions. The study of these thermophilic proteins suggests that the protein structures are similar to their mesophilic counterparts and thermal stability is conferred by subtle changes in the amino-acid composition. On comparing thermophilic proteins with their mesophilic counterparts, certain patterns are observed including an increase in the number of salt bridges, an increase in hydrophobicity and an increase in the number of aromatic residues (Dekker et al., 1991; Tanner et al., 1996; Zhou et al., 2008; Fields et al., 2015; Somero, 2004).

In contrast to the IDPs discussed above (see §2.2), the stability of the protein tertiary structure is often considered to be essential for the maintenance of protein function. However, many proteins undergo an overall change of protein fold as part of their mechanism of action. For example, serine protease inhibitors (serpins) undergo a transformation from a metastable native form (S, stressed) into a more stable folded form (R, relaxed) upon interaction with the proteinase (Whisstock & Bottomley, 2006; Whisstock et al., 2000). These structural rearrangements include the insertion of a loop into the center of a core β-sheet or the insertion of a β-strand to form a domain-swapped dimer that can initiate polymerization (Mottonen et al., 1992; Yamasaki et al., 2008; see Fig. 2c). Large conformational changes such as this are commonplace and are observed in many proteins including influenza virus hemagglutinin, lymphotactin, Mad2 spindle checkpoint protein and chloride intracellular channel 1 (CLIC1; Bryan & Orban, 2010). Therefore, it is important that any discussion on stability carefully considers the mechanism of the protein under study, as some protein folds are designed to be inherently unstable.

Figure 2

Matrix of examples of protein stability and disorder. (a) Examples of proteins with high conformational stability include the protein–protein stabilizing compound cyclosporin in complex with calcineurin and cyclophilin (Huai et al., 2002) and (b) the protein–protein stabilizing drug Tafamidis in combination with transthyretin (Bulawa et al., 2012). (c) Examples of proteins with low conformational stability include the serpins, which undergo large changes in fold and oligomerization state (Yamasaki et al., 2008), and (d) intrinsically disordered proteins (IDPs) such as the tumor suppressor protein p53 (Mujtaba et al., 2004).

3.4. Quaternary structure

Quaternary structure is the arrangement of the folded protein subunits into a multi-subunit complex. The stability of such complexes is of importance for the regulation of allostery and cooperativity, which often result from conformational changes within individual polypeptide chains. One of the classic models used to describe allosteric transitions in proteins is the Monod–Wyman–Changeux (MWC) model (Monod et al., 1965). In this model proteins may exist in one of two states: tense (T) and relaxed (R). One of the key features of this model is that ligands may bind to either the T or the R state; if a ligand binds preferentially to the R state then the equilibrium shifts towards R and the apparent affinity increases. However, if binding to the T state is favored then the apparent affinity decreases. A substance that shifts the equilibrium in this way is described as an allosteric modulator. One of the best-studied examples is hemoglobin, with the T state representing deoxyhemoglobin and the R state representing oxyhemoglobin (Brunori, 2014; Ronda et al., 2013). As discussed above for IDPs and metastable proteins (see §§2.2 and 3.3), it is not sufficient to consider protein stability in isolation from function. Many proteins undergo large conformational changes involving both secondary and tertiary structure, and each state may have a different conformational or compositional stability (Fig. 2).
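The MWC model has a compact closed form for the fractional saturation Y of an oligomer with n equivalent sites, written in terms of the normalized ligand concentration α = [X]/K_R, the allosteric constant L = [T₀]/[R₀] and the affinity ratio c = K_R/K_T. The sketch below evaluates that expression; the parameter values are hypothetical and were chosen only to show cooperative behavior, not fitted to hemoglobin or any other protein.

```python
def mwc_saturation(alpha, n_sites, L, c):
    """Fractional saturation Y for the MWC two-state model.

    alpha   : ligand concentration divided by K_R (dissociation constant
              for the R state)
    n_sites : number of equivalent binding sites per oligomer
    L       : [T0]/[R0], the T/R equilibrium constant without ligand
    c       : K_R / K_T, ratio of R- and T-state dissociation constants
    """
    r_term = alpha * (1.0 + alpha) ** (n_sites - 1)
    t_term = L * c * alpha * (1.0 + c * alpha) ** (n_sites - 1)
    denom = (1.0 + alpha) ** n_sites + L * (1.0 + c * alpha) ** n_sites
    return (r_term + t_term) / denom

# Hypothetical four-site protein that favors T when unliganded and binds
# ligand much more tightly in the R state.
for alpha in (0.1, 1.0, 10.0, 100.0):
    y = mwc_saturation(alpha, n_sites=4, L=1000.0, c=0.01)
    print(alpha, round(y, 3))
```

When c = 1 (no preference between states) or L = 0 (only R populated), the expression reduces to the simple hyperbola α/(1 + α); cooperativity appears only when the ligand discriminates between two states of differing stability.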

Modulation of the quaternary structure, and more specifically the protein–protein interactions responsible for quaternary structure, has long been a goal for the pharmaceutical industry. Such efforts present considerable challenges as a result of the large surface areas involved. Small molecules that destabilize protein–protein interactions have been demonstrated, but stabilizing examples are somewhat scarce (Giordanetto et al., 2014; Wells & McClendon, 2007). Destabilizing examples include the Abbott drug Navitoclax, which destabilizes the interaction between the anti-apoptotic protein Bcl-2 and Bad/Bid/Bak (Oltersdorf et al., 2005), and the Roche drug Nutlin-3, which inhibits the interaction between the tumor suppressor p53 and MDM2 (Secchiero et al., 2011). Examples of compounds that stabilize protein–protein interactions include natural products such as cyclosporin A, which stabilizes the interaction between calcineurin and cyclophilin (Huai et al., 2002), and the drug Tafamidis, which binds to a pocket at the interface of the transthyretin dimer (Bulawa et al., 2012) (see Figs. 2a and 2b, respectively). In the case of Tafamidis, stabilization of the dimerized form of transthyretin prevents the aggregation and misfolding which has been shown to be the mechanism leading to transthyretin amyloidosis diseases such as peripheral neuropathy.

4. Protein stability during protein expression and purification

The stability of the protein during the course of expression and purification is often an issue. In order to obtain sufficient quantities of protein for crystallization screening, formulation, vaccine development or therapeutic use, it is essential that intact, stable and folded protein is produced. Many proteins are unstable, unfolded or proteolytically cleaved under the conditions used for protein expression; again it is important to emphasize that these factors can lead to poor stability on both the conformational and compositional levels (see §2 and Table 1). Factors giving rise to poor protein stability during expression and purification may include the primary structure of the protein, the construction of the recombinant expression plasmid, the temperature and the expression medium used and the toxicity of the protein to the host organism (Table 3). There are therefore many factors to test during expression and purification, and combinatorial design approaches are often used together with high-throughput methods to find the appropriate combination of conditions (Papaneophytou & Kontopidis, 2014). The use of advanced genetic engineering methods to modify both cells and expression plasmids is covered in more detail in reviews such as Sørensen & Mortensen (2005).

Table 3

Methods for improving protein stability during expression and purification

Category	Construct design			Expression conditions	
Method	Removal of degradation-prone and protease sites	Optimized codon usage	Truncations and point mutations	Chaperones and co-expression	Affinity tags for improved solubility
Examples	N-end rule, PEST sequences and specific protease sites	Synthetic gene synthesis offered by many companies	Stabilizing mutations for membrane proteins	Coexpression and fusion vectors with DnaK and GroEL	GST, MBP, Halo tags
Key references	Bachmair et al. (1986), Rogers et al. (1986), Spiegel et al. (2015)	Daniel et al. (2015), Kane (1995)	Gräslund et al. (2008), Klock et al. (2008)	Kyratsous & Panagiotidis (2012), Kyratsous et al. (2009)	Walls & Loughran (2011), Kapust & Waugh (1999)

Category	Expression conditions		Host cells	
Method	Cofactors and ligands	Low-temperature expression	Codon-optimized cells	Reduced-toxicity and reduced-protease cells
Examples	Metal ions and cofactors essential for folding, also other stabilizing cofactors	Cold-shock, low-temperature inducible promoters such as pCold	E. coli Rosetta	E. coli BL21 Star and pLysS
Key references	Leibly et al. (2012)	Qing et al. (2004), Vasina & Baneyx (1997)	Tegel et al. (2010)	Studier (1991)

4.1. Construct design, sequence, expression tags and protein stability

The design of the expression construct is one of the primary decisions that a structural biologist must make to ensure the efficient expression and purification of the target protein. For example, the compositional stability of a protein can be severely affected by the presence of protease cleavage sites within the target protein. These sites can result in cleavage of the target protein by endogenous proteases produced by the expression host. Proteolysis can be mitigated by the removal of cleavage sites from the expression construct during recombinant assembly. For example, such an approach was used to remove two protease sites from a malarial vaccine candidate that was proteolytically degraded by endogenous KEX2 protease during expression. Removal of these sites enabled the production of full-length protein on a large scale (Spiegel et al., 2015). Additionally, the primary structures of some proteins are inherently unstable, with unusually short half-lives. For example, proteins containing sequences rich in proline, glutamate, serine and threonine (PESTs) often have half-lives of less than 2 h (Rogers et al., 1986). It is suggested that PEST sequences target the protein for intracellular degradation via the proteasome machinery (Spencer et al., 2004) or via more traditional proteolysis pathways utilizing calpain (Shumway et al., 1999). Furthermore, the N-end rule is a strong predictor of protein half-life in vivo. For example, if the N-terminal residue of a protein is methionine, serine, alanine, glycine, threonine, valine or proline the protein is stabilized (half-life >20 h). In contrast, if the N-terminal residue is phenylalanine, aspartate, lysine or arginine the protein is destabilized (half-life <3 min; Bachmair et al., 1986). In eukaryotes, proteins destabilized in this way are targeted for degradation via the ubiquitination pathway; bacteria lack ubiquitin but have a related N-end rule pathway of their own, so the identity of the N-terminal residue should not be ignored when using bacterial expression systems. It is important to note that the N-terminal residue is ‘masked’ by the inclusion of an N-terminal affinity purification tag, as is typically used in most laboratories today.
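The residue lists above lend themselves to a quick construct check. The sketch below is illustrative only: the function names are hypothetical, the N-end classification uses just the residues listed in the text, and the P/E/S/T window score is a crude stand-in for published PEST-finding algorithms, which use more elaborate scoring.

```python
# Residues taken from the lists in the text (Bachmair et al., 1986)
STABILIZING = set("MSAGTVP")
DESTABILIZING = set("FDKR")


def n_end_class(sequence):
    """Classify the N-terminal residue of the expressed construct."""
    if not sequence:
        raise ValueError("empty sequence")
    first = sequence[0].upper()
    if first in STABILIZING:
        return "stabilizing"
    if first in DESTABILIZING:
        return "destabilizing"
    return "not listed"


def max_pest_fraction(sequence, window=12):
    """Highest fraction of P, E, S or T residues in any window of the sequence.

    The window length is an arbitrary assumption, not a published threshold.
    """
    if not sequence:
        raise ValueError("empty sequence")
    seq = sequence.upper()
    window = min(window, len(seq))
    best = 0.0
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        fraction = sum(aa in "PEST" for aa in segment) / window
        best = max(best, fraction)
    return best
```

Because an N-terminal tag masks the native N-terminus, `n_end_class` should be applied to the sequence as it is actually expressed, tag included, and again to the product expected after tag removal.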

In addition to the half-life stabilizing effects of affinity purification tags, they are also of considerable interest for improving the solubility of a target protein (Amarasinghe & Jin, 2015; Wood, 2014). This is particularly true for maltose-binding protein (MBP), which has a strong effect in solubilizing the protein to which it is attached. MBP has also been shown to promote the correct folding of the protein target to which it is attached, suggesting that it can act as a form of molecular chaperone (Kapust & Waugh, 1999). Other chaperones that can be used to assist protein stability and folding during expression include DnaK and GroEL (Kyratsous & Panagiotidis, 2012). Some proteins are simply not stable in the cytoplasm and they can be redirected to other compartments of the host cell using an affinity tag. For example, the pMal vector system (NEB) incorporates the malE signal sequence and can be used to direct the protein of interest across the plasma membrane and into the periplasm. This has the added advantage of keeping the target protein away from cytoplasmic proteases during subsequent purification steps, thus further enhancing the compositional stability.

Self-cleaving affinity purification tags can be applied to carefully control the compositional stability. Traditional affinity-tag removal procedures often use proteases, such as thrombin or factor Xa, which can result in nonspecific degradation of the target protein. However, the use of highly specific self-cleaving tags prevents this issue as exogenous protease addition is not required. In addition to the self-cleaving intein-based tags discussed above (see §3.1), it has also been reported that nickel ions can be used to cleave an affinity tag. In the example of the GmSPI-2 inhibitor structure the peptide bond preceding the serine or threonine residue in the (S/T)XHZ motif was cleaved by nickel ions (Kopera et al., 2014; Krężel et al., 2010).

4.2. Expression conditions and protein stability

In order to ensure proper protein folding and stability, it is essential that the expression host is provided with the necessary prosthetic groups, cofactors and ligands as required by the target protein. Many of these are provided by the expression medium and are scavenged by the host cells during the course of expression. However, some cofactors and prosthetic groups cannot be synthesized by the host cell and others are not available in sufficient quantities. For example, heme incorporation is often low when heme-containing proteins are expressed in Escherichia coli. In these cases the expression medium must be supplemented with δ-aminolevulinate in order to achieve satisfactory levels of heme incorporation (Kery et al., 1995). Additionally, the solubility of the protein can also be significantly improved during expression by the addition of additives such as trehalose, glycine betaine, mannitol, l-arginine, potassium citrate, CuCl₂, proline, xylitol, NDSB 201, CTAB and K₂PO₄ (Leibly et al., 2012; see §6.2 for more on buffer screening).

Varying the temperature at which the expression is carried out can also be used to control protein stability and solubility. For example, cold-shock induction systems such as pCold (Takara/Clontech) can be used to improve the overall stability of the target protein (Qing et al., 2004). As an added benefit, at lower temperatures, cell proliferation is halted and the expression of endogenous proteins such as proteases is reduced. Therefore, the target protein is further protected from degradation and purity is improved. To assist in low-temperature expression, cold-adapted E. coli cells, for example ArcticExpress (Agilent Technologies), have been developed. These cells co-express the cold-adapted chaperonins Cpn10 and Cpn60 from the psychrophilic bacterium Oleispira antarctica (see §4.3 for more on host-cell selection).

4.3. Host cells and protein stability

The choice of the host cells that are used for the expression of recombinant proteins has an important influence on protein stability. For example, the protein may be toxic to the host cell or the protein may be cleaved by endogenous proteases made by the cell. Clearly, such issues can be a primary source of compositional instability in proteins. Toxicity can be controlled by tight regulation of the expression level using promoters that respond in a concentration-dependent manner to the inducer. Examples of tightly controlled expression vectors include pBAD, which responds to the inducer l-arabinose (Guzman et al., 1995). The background expression of proteases (and proteins in general) can also be controlled using plasmids that express T7 lysozyme, such as pLysS. T7 lysozyme is a natural inhibitor of T7 RNA polymerase, the polymerase that drives transcription from the T7 promoter utilized in the pET vector system, and can be used to reduce background levels of protein expression (Studier, 1991). Background levels of protease activity can also be reduced using OmpT⁻ bacterial strains, which do not express the outer membrane aspartyl protease. Such systems are commercially available as BL21 Star strains of E. coli (Invitrogen/ThermoFisher Scientific).

Host-cell selection is particularly important when expressing mammalian proteins in bacterial cells, as the codon usage between the organisms is different. For example, the AGA codon for arginine is particularly rare in E. coli and can result in premature chain termination, frame-shifting and incorrect amino-acid insertion (Calderone et al., 1996; Kane, 1995). This issue can be addressed in a number of ways, including the generation of a synthetic gene reflecting the codon usage of the host organism or co-transformation of the host with a plasmid that supplies the tRNAs for the rare codons (e.g. CodonPlus from Stratagene and pRARE from EMD Millipore/Novagen; Dieci et al., 2000; Fu et al., 2007). Competent E. coli BL21 cells containing the pRARE plasmid are commercially available under the trade name Rosetta. These cells have been used to optimize the expression of many human proteins in E. coli. For example, the Swedish Human Protein Atlas project has been successful in improving both the level of expression and the purity of proteins using the Rosetta E. coli strain (Tegel et al., 2010).
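Before choosing between gene synthesis and a tRNA-supplemented strain, it can help to see where the rare codons fall in the open reading frame. The following is a minimal sketch; the function name is hypothetical and the default rare-codon set contains only AGA, the codon named above. Other codons that are rare in the chosen host can be passed in by the user.

```python
def rare_codon_positions(cds, rare=("AGA",)):
    """Return 1-based codon indices at which an in-frame rare codon occurs.

    cds  : coding sequence starting at the first base of the start codon
    rare : codons to flag (DNA alphabet); only AGA is taken from the text
    """
    # Accept RNA or DNA input, in either case
    cds = cds.upper().replace("U", "T")
    rare_set = {codon.upper().replace("U", "T") for codon in rare}
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return [
        i // 3 + 1
        for i in range(0, usable, 3)
        if cds[i:i + 3] in rare_set
    ]
```

Only in-frame codons are reported, so an AGA triplet that straddles two codons is ignored. Clusters of flagged positions, particularly near the 5′ end, are the usual reason to consider a synthetic gene or a strain such as Rosetta.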

Finally, mammalian expression systems such as Chinese hamster ovary (CHO) cells (Fischer et al., 2015) and human embryonic kidney cells (e.g. HEK 293T and 293F; Nettleship et al., 2015) are often essential for the expression of mammalian or human proteins (for a discussion of the merits of the various expression systems, see Brondyk, 2009). In addition to addressing the codon-usage issue, expression in mammalian cells is often required to ensure that PTMs are correctly added and the protein is correctly folded and active (see §6.6 for a discussion of GnTI and lec1 glycosylation-deficient mammalian cells). Alternatively, insect cells such as Spodoptera frugiperda (e.g. Sf9 and Sf21) and Trichoplusia ni can be used (Altmann et al., 1999; Jarvis, 2009).

5. Key techniques for determining protein stability

The relative merits of the three main structural methods for assessing protein stability are shown in Table 2. Given the solid-state nature of protein crystallography it is often difficult to crystallize dynamic and disordered proteins, and for this reason NMR spectroscopy has been one of the main tools used to study IDPs such as p53 and CREB (Brutscher et al., 2015; Dunker & Oldfield, 2015; Mujtaba et al., 2004; Fig. 2d). NMR is extremely useful for assessing both secondary and tertiary structure in dynamic and disordered systems (see §5.1). The higher resolution of crystal structures makes them a particularly attractive method for determining changes at the primary-structure level. For example, uncharacterized PTMs, such as glycosylation, can often be interpreted directly from the electron-density maps if the experimental data are of sufficiently high resolution. Similarly, given sufficiently high-resolution maps, cryo-EM can be a powerful technique for determining protein stability and dynamics at the quaternary level (see §5.2).

5.1. Nuclear magnetic resonance (NMR)

NMR spectroscopy is a powerful method for the determination of the stability of proteins in solution (Bieri et al., 2011; Kwan et al., 2011). The method is highly complementary to X-ray structure analysis and, given its ability to analyze structures in the solution state, it is of tremendous value for assessing protein conformational stability (Krishnan & Rupp, 2012).

The fact that NMR can readily distinguish between folded and unfolded proteins, and detect the presence of disordered and unstructured regions, makes it inherently useful as a diagnostic tool for crystallization experiments (Fig. 3a). Modern instruments can extract this information with minimal sample requirement (∼10 µM) and a simple one-dimensional proton NMR spectrum can provide information on the conformational stability of the macromolecule. Specifically, as a result of the principal inverse relation between spin–spin relaxation time and the peak width, large soluble aggregates will not yield an interpretable high-resolution NMR spectrum. For non-aggregated protein samples that yield usable one-dimensional NMR spectra, good discrimination in the backbone amide region below 8.3 p.p.m., as well as peaks at around 1 p.p.m., are indicative of folded protein (Rehm et al., 2002). Furthermore, two-dimensional heteronuclear single-quantum coherence (HSQC) NMR spectra can be used to analyze the difference between folded and unstructured protein (Fig. 3a) and also to compare apoprotein and ligand-bound complexes (Figs. 3b and 3c). Such a two-dimensional spectrum maps the backbone amide groups according to their ¹H and ¹⁵N resonance frequencies. This method necessitates the production of ¹⁵N-labeled protein and requires larger amounts of sample compared with the more qualitative one-dimensional spectral analysis (Zhao et al., 2004).

HSQC spectrum of folded, unstructured and apo and ligand-bound proteins. (a) Two-dimensional ¹H–¹⁵N heteronuclear single-quantum coherence (HSQC) NMR spectrum showing the distinct discrimination in the region below 8.3 p.p.m. in ω₁ identifying a folded protein (red, sharp peak contours) compared with the wide and unresolved peaks for disordered protein sample (blue contours). Image courtesy of Simon Colebrook, Department of Biochemistry, Oxford University and Joanne Nettleship, Oxford Protein Production facility. (b) HSQC spectrum of apo and ligand-bound protein. The two-dimensional ¹H–¹⁵N HSQC NMR spectrum of bacterial methionine aminopeptidase (bMAP) with (right) and without (left) a tightly bound novel inhibitor (Evdokimov et al., 2007). Note the drastic improvement in the discrimination of the spectrum for the bMAP–ligand complex compared with the apoprotein. The crystals of the bMAP–ligand complex diffracted to 0.9 Å resolution. Image courtesy of Artem Evdokimov, Procter & Gamble Pharmaceuticals, Mason, Ohio, USA. Figure adapted from Rupp (2015).

One of the main benefits of NMR methods is that the effect of environmental conditions, such as pH, temperature or ligand binding, can be readily varied and studied in a near-native solution state. Additionally, the nondestructive nature of NMR spectroscopy means that the samples can also be used for subsequent crystallization experiments, and high-throughput structure-determination facilities often combine NMR screening with crystallization experiments.

5.2. Electron microscopy

The high conformational heterogeneity of proteins, especially of large multi-domain protein–protein complexes, can often hinder crystallization efforts. Mutational variants and different combinations of protein partners may need to be screened for suitability for crystallization. To facilitate this, electron microscopy (EM) can be used to directly visualize the sample and assess the level of heterogeneity. In the best-case scenario three-dimensional cryo-EM reconstructions can be generated, but this can be time-consuming and often requires substantial efforts in screening for suitable

Protein crystallizability- and stability-modifying methods
Method	Truncations and domain selection	Buffer screening	Ligands and additive screening	Fused affinity tags and crystallization chaperones	Reductive alkylation
Compositional stability	Yes	Yes	Yes	Yes	Yes
Conformational stability	Yes	Yes	Yes	Yes	Yes
Example	Removal of disordered regions	Stabilize buffers via ionic changes	Binding to stabilize protein	Stabilize protein with rigid fusion of tag	Provides entropic benefit upon crystallization
Key references	Yumerefendi et al. (2010), Reich et al. (2006), Klock et al. (2008), Gräslund et al. (2008)	Reinhard et al. (2013)	Chung (2007), Hassell et al. (2007)	Smyth et al. (2003)	Rice et al. (1977), Tan et al. (2014), Walter et al. (2006)

Protein crystallizability- and stability-modifying methods (continued)
Method	Surface mutagenesis, surface-entropy reduction (SER) and deglycosylation	In situ proteolysis	Thermofluor	Deuterium-exchange mass spectrometry (DXMS)	Disulfide engineering
Compositional stability	Yes	Yes	No	No	Yes
Conformational stability	Yes	Yes	Yes	Yes	Yes
Example	Entropic stabilization of large, flexible, surface-exposed residues	Removal of flexible loops	Analysis of thermal stability (T_m)	Identification of flexible regions	Stabilization of quaternary structure of crystal packing
Key references	Cooper et al. (2007), Goldschmidt et al. (2014)	Dong et al. (2007), Wernimont & Edwards (2009)	Reinhard et al. (2013), Ristic et al. (2015)	Englander (2006), Englander & Kallenbach (1983)	Forse et al. (2011), Quistgaard (2014)

Several software packages and algorithms have been developed to assess the so-called crystallizability of a protein (Smialowski & Frishman, 2010; Derewenda, 2010; Ruggiero et al., 2012). These include DisMeta (Huang et al., 2014), XtalPred (Jahandideh et al., 2014; Slabinski et al., 2007), POODLE (Shimizu, 2014), MFDp2 (Mizianty et al., 2014), MoRFpred (Disfani et al., 2012), RFCRYS (Jahandideh & Mahdavi, 2012), XANNpred (Overton et al., 2011), SVMCRYS (Kandaswamy et al., 2010), SCMCRYS (Charoenkwan et al., 2013), CRYSTALP2 (Kurgan et al., 2009), MetaPPCP (Mizianty & Kurgan, 2009) and ParCrys (Overton et al., 2008). Many of these packages use a template-based approach to analyze the propensity of a protein to crystallize by comparison with known crystal structures. However, some of these packages, including XtalPred, POODLE and DisMeta, utilize sequence-based predictions to identify regions of low complexity, disorder, transmembrane and signal peptides.

It is important to note that the propensity for disorder calculated by most of these methods, and in turn the propensity for crystallization, is largely predicted on the basis of a single polypeptide chain in isolation. Clearly, protein–protein complexes can often result in stabilization of the constituent proteins, as is commonly observed for IDPs (see §2.2). Therefore, particularly for crystallographic studies, it is often essential to study the protein–protein complex as a whole; the individual proteins are often too disordered or unstable when uncomplexed.

6.1. Truncations and domain selection

Selection of the shortest possible domain is often preferable and library-based screening tools such as Expression of Soluble Proteins by Random Incremental Truncation (ESPRIT) and combinatorial domain hunting (CDH) are available to assist in this effort (Reich et al., 2006; Yumerefendi et al., 2010). Truncation of a protein to the shortest possible fragment can often be a key factor in successful structure solution. For example, amyloid fibers are of tremendous medical importance in diseases such as Alzheimer’s and prion diseases and their partially disordered structure has traditionally hindered structural analysis using crystallographic techniques. However, shorter fragments of only 6–7 amino acids in length, which also form fibrils, were used to produce microcrystals and to determine the structure (Moshe et al., 2016; Sawaya et al., 2007). Another example is the production of structured truncation arrays of a target protein generated using polymerase incomplete primer extension cloning methods (Klock & Lesley, 2009). Using this technique, structural genomics consortia such as the Joint Center for Structural Genomics (JCSG) and the Structural Genomics Institute, Karolinska Institutet have been able to generate several thousands of truncations for targets recalcitrant to crystallization (Klock et al., 2008; Gräslund et al., 2008).

In addition to truncations of the protein, it is also important to consider other mutations of the protein. Stability-enhancing mutations in membrane proteins are surprisingly common and some estimates suggest that ∼10% of random mutations will confer some level of stability on the protein (Bowie, 2001). For example, two valine-to-alanine substitutions in the transmembrane portion of the M13 coat protein were found to enhance thermal stability (Deber et al., 1993). The reasons for the stability-enhancing effects of mutations are not always immediately obvious from analysis of the structure. It has been suggested that membrane proteins are required to be inherently flexible, and therefore conformationally unstable, in order to maintain function in the restricted environment of the membrane (Bowie, 2001).

6.2. Buffer screening

The buffer in which a protein is solubilized exerts an influence on its stability (Davis-Searles et al., 2001). Therefore, buffer screening is a powerful method for stabilizing proteins for crystallographic applications and also for the formulation of biologics. One of the primary methods used for high-throughput buffer screening is Thermofluor (see §5.4). Using such approaches it is possible to screen libraries of hundreds of different buffers and pH combinations, and the stabilizing effect can be easily inferred from the change in T_m (ΔT_m; Reinhard et al., 2013; Ristic et al., 2015). Using these approaches, interesting protein-stabilizing buffers have been identified. For example, citrate, bis-tris and N-(2-acetamido)iminodiacetic acid (ADA) have all been identified as having statistically significant stabilizing effects on the proteins tested (Ristic et al., 2015). As a more extreme example of buffer screening, IIPs (see §2.2) such as V22-SH3 are insoluble in traditional buffer systems and can only be solubilized in pure water (Liu & Song, 2009).
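The ΔT_m comparison described above amounts to subtracting the melting temperature measured in a reference condition from that measured in each screened condition and ranking the results. The following is a minimal sketch; the function name and the dictionary layout are assumptions, and any values supplied to it would come from the user's own Thermofluor data.

```python
def rank_by_delta_tm(tm_by_condition, reference):
    """Rank screened conditions by their shift in melting temperature.

    tm_by_condition : dict mapping condition name -> measured T_m (degrees C)
    reference       : name of the condition used as the baseline
    Returns a list of (condition, delta_tm) pairs, most stabilizing first.
    """
    reference_tm = tm_by_condition[reference]
    deltas = {
        name: tm - reference_tm
        for name, tm in tm_by_condition.items()
        if name != reference
    }
    # Positive delta_tm means the condition raised T_m relative to the baseline
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)
```

A positive ΔT_m indicates stabilization relative to the reference buffer. Whether a given shift is meaningful depends on the reproducibility of the assay, so replicate measurements are advisable before acting on small differences.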

6.3. Ligands and additive screening

Ligand binding can also significantly help to stabilize the protein, particularly from the perspective of conformational stability. Co-crystal structures of proteins bound to cofactors, prosthetic groups, substrates, drugs and inhibitors are often the holy grail of structural biology; somewhat fortunately for the structural biologist, ligands often have a stabilizing effect on the protein and can increase the chances of successful crystallization. It is important to remember that although soaking of compounds through the crystal lattice to the active site is often possible, it may also bring about conformational changes in the protein on binding of the ligand. Therefore, ab initio crystal screening in the presence of the ligand may be required in order to obtain crystals (Hassell et al., 2007). The judicious use of bioanalytical techniques, such as Thermofluor and NMR, is key for guiding the successful production of a ligand-bound crystal structure (see §5).

In addition to the stabilizing effects of small-molecule ligands, it is also possible to identify ions and other organic additives that stabilize the protein or even the crystal. For example, Thermofluor was used to identify magnesium ions as a stabilizing influence on the enzyme DapD from Mycobacterium tuberculosis, and the subsequent addition of MgCl₂ to the crystallization solution resulted in larger crystals (Reinhard et al., 2013; Schuldt et al., 2009). In this case, stabilization of the quaternary structure results from the addition of magnesium ions, and two tightly coordinated Mg²⁺ ions were identified in the homotrimer interface of the crystal structure (Schuldt et al., 2009). Other examples of stabilizing additives include the commonly used precipitants polyethylene glycol (PEG) and 2-methyl-2,4-pentanediol (MPD), which are both often observed bound to crystal structures. In the case of MPD it has been proposed that it stabilizes the protein by promoting the hydration of the protein surface by binding to exposed hydrophobic surface residues such as leucine (Anand et al., 2002). Building on this theme, additive screens such as ‘Silver Bullets’ (Hampton Research) have been assembled that can stabilize or initiate cross-linking between proteins further promoting crystal lattice formation (McPherson & Cudney, 2006).

6.4. Fused affinity tags and crystallization chaperones

As discussed in §4.1, affinity purification tags such as MBP are often used to aid in both protein stability and solubility during the course of protein expression and purification. Larger tags such as MBP are usually removed prior to crystallization trials, as the flexibility of the tag can interfere with crystal packing. However, smaller tags such as His can often be left on without unduly affecting crystal packing or protein function (Bucher et al., 2002). The primary reason for removing large tags is the reduced conformational stability resulting from the flexible linker between the target protein and its affinity tag. Several groups have successfully ‘engineered out’ this linker flexibility by inserting a string of alanine residues in place of the usual protease cleavage site in the linker (Smyth et al., 2003). This concept has resulted in several crystal structures of proteins fused to MBP, including gp21 (Kobe et al., 1999), SarR (Liu et al., 2001), MATa1 (Ke & Wolberger, 2003) and MCL1 (Clifton et al., 2015). It has also led to the idea of crystallization chaperones (Bukowska & Grütter, 2013). Example uses of crystallization chaperones include the application of Fab antibody fragments to study the neurotransmitter sodium symporter LeuT in various conformational states (Krishnamurthy & Gouaux, 2012) and the fusion of T4 lysozyme to a G-protein-coupled receptor (Zou et al., 2012). These approaches are proving to be useful for the stabilization of membrane proteins and as aids in their structure determination (Lieberman et al., 2011).

6.5. Reductive alkylation

Modification of the protein surface is a well established strategy for enhancing protein crystallization and can be achieved using site-directed mutagenesis (see §6.6) or chemical modification (Derewenda, 2004). A common method of chemical modification is the reductive methylation of the ε-amino groups of solvent-exposed lysine residues. This is performed using formaldehyde together with the reducing agent dimethylamine–borane (Means, 1977). In recent years, this technique has become one of the workhorse ‘salvage’ techniques of structural genomics consortia (Tan et al., 2014; Walter et al., 2006; Sledz et al., 2010). Reductive methylation is believed to function via the introduction of new surface contacts, thereby promoting crystal lattice formation (Sledz et al., 2010).

Recent developments of the technique include the use of ethylation and isopropylation, although fewer targets are available to assess the performance of such techniques (Tan et al., 2014). Another important development of the method is the use of cysteine alkylation for the structure determination of membrane proteins. This method was first used for the determination of the β₁-adrenergic G-protein-coupled receptor structure (Warne et al., 2008) and is in common use for many GPCR studies (Columbus, 2015). Cysteine alkylation stabilizes the protein by preventing the formation of disulfide bonds, and in the case of the β₁-adrenergic receptor functions by stabilizing the monomers and preventing oligomers forming (Mathiasen et al., 2014).

In addition to alkylation, other chemical modifications such as fluorination can be carried out. Fluorine is all but absent from biological systems, but stabilizes proteins as a result of the ‘fluorous effect’ (Buer & Marsh, 2014; Marsh, 2014). This effect results in an unusual propensity to undergo phase separation and causes an increase in the buried surface area in the hydrophobic core of fluorinated proteins. Fluorinated proteins can be generated using the highly fluorinated amino acid hexafluoroleucine. The crystal structure of a designed four-helical bundle protein, α4H, reveals that the fluorinated residues pack well into the hydrophobic core of the protein with little perturbation of the structure (Buer et al., 2012). This method would clearly perturb the structure and the function of some proteins, but may be a method worthy of further investigation for enabling structural studies of very unstable proteins.

6.6. Surface mutagenesis, surface-entropy reduction (SER) and deglycosylation

Mutagenesis of surface-exposed amino acids is a proven method for engineering proteins with improved stability and chance of crystallization (Derewenda, 2004, 2010; Derewenda & Vekilov, 2006). Amongst the first uses of mutagenesis to enhance crystallizability was the transplant of key crystal contacts from the rat ferritin structure onto the human ortholog, which was previously unsolved (Lawson et al., 1991). In this example, Lys86 of the human protein was mutated to Gln to mimic the Ca²⁺-binding site observed in the crystal contacts of the rat ortholog. Clearly, this technique is useful if an orthologous structure is available. However, this is often not the case and one is shooting in the dark when choosing surface residues to mutate.

To address this issue, Derewenda and coworkers made a series of mutations to RhoGDI targeting glutamate and lysine residues on the surface of the protein (Longenecker et al., 2001; Mateja et al., 2002; Czepas et al., 2004). These residues have high conformational entropy and rarely participate in protein–protein interfaces within the crystal (Lo Conte et al., 1999; Baud & Karlin, 1999). Therefore, they represent attractive targets for modification of crystal contacts. In an attempt to reduce the conformational entropy on the protein surface, these residues were mutated to either alanine, arginine or aspartate. The original set of mutable residues has more recently been expanded to include glutamine. The SERp server is an excellent resource for identifying such residues (Goldschmidt et al., 2007, 2014; http://bit.ly/1LFjdyk). This site evaluates the solvent exposure, secondary structure, surface entropy and evolutionary conservation for sets of glutamate, glutamine or lysine residues. Analysis of the evolutionary conservation is used as a guide to avoid the mutation of residues that may be structurally or functionally significant. Finally, the site suggests appropriate clusters of residues matching these criteria as suitable for mutagenesis. These techniques have been used in conjunction with the fused affinity-tag and molecular-chaperone approach discussed above (see §6.4) to determine the structure of a RACK1A-MBP fusion protein (Moon et al., 2010).
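As a rough illustration of the cluster-selection idea described above, the following Python sketch scans a sequence for short stretches rich in glutamate, glutamine and lysine. It is not the SERp algorithm: it ignores solvent exposure, secondary structure and evolutionary conservation, and the function name, window size and hit threshold are assumptions chosen only for this example.

```python
# Residue types named in the text as high-entropy SER targets: Glu, Lys, Gln.
HIGH_ENTROPY = set("EKQ")


def find_ser_clusters(sequence, window=5, min_hits=3):
    """Return (start, end, residues) for stretches rich in Glu/Lys/Gln.

    Positions are 1-based and inclusive. A sliding window is flagged when it
    contains at least `min_hits` target residues; overlapping flagged windows
    are merged into a single cluster. Both parameters are illustrative.
    """
    seq = sequence.upper()
    clusters = []  # list of (start, end) pairs, 0-based inclusive
    for i in range(len(seq) - window + 1):
        hits = [j for j in range(i, i + window) if seq[j] in HIGH_ENTROPY]
        if len(hits) < min_hits:
            continue
        start, end = hits[0], hits[-1]
        if clusters and start <= clusters[-1][1]:
            # Overlaps the previous cluster: extend it instead of duplicating.
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
        else:
            clusters.append((start, end))
    return [(s + 1, e + 1, seq[s:e + 1]) for s, e in clusters]


if __name__ == "__main__":
    # Hypothetical sequence used only to exercise the function.
    for start, end, residues in find_ser_clusters("MGSAEKKLAAGGQEELPW"):
        print(start, end, residues)
```

Any cluster reported by such a scan would still need to be checked against the structure or a homology model, since buried or conserved residues are poor candidates for mutation.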

Finally, the presence of PTMs (see §3.1) on the surface of the protein, in particular glycosylation, must be carefully considered. N- and O-linked sugars occur frequently on the surface of eukaryotic proteins and their chemical heterogeneity and conformational freedom can result in significant conformational variability. Strategies to deal with glycosylation on the protein surface include expression using glycosylation-deficient CHO cell strains such as lec1 (Puthalakath et al., 1996) or an HEK293S strain such as GnTI⁻ (Reeves et al., 2002). Another strategy is the mutagenesis of consensus glycosylation sites, such as the Asn-X-Ser/Thr sequon, which is targeted for N-linked glycosylation (Mellquist et al., 1998), and Ser or Thr residues, which are targeted for O-linked glycosylation. Alternatively, endoglycosidases such as endoglycosidase H (Endo H) and PNGase F can be used. In the case of PNGase F the glycosylated asparagine residue is converted to an aspartic acid, thus removing the sugar completely. Another approach is the use of N-glycosylation inhibitors such as swainsonine and kifunensine (Elbein, 1987). An effective combination approach uses glycosylation inhibitors during expression, followed by treatment with Endo H. Such an approach has been used to generate diffraction-quality crystals of sRPTPμ and s19A (Chang et al., 2007). Also, prudent selection of the expression system can be deployed to vary the glycosylation patterns. For example, Sf insect cells generally produce simpler glycosylation patterns of the GlucNAcMan₅ type and often glycosylate at fewer sites, whereas yeast cells can often hyperglycosylate proteins.
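Candidate sites for sequon mutagenesis can be listed directly from the sequence. The sketch below is a minimal Python example that searches for the Asn-X-Ser/Thr motif mentioned above. Excluding proline at the X position is a commonly applied refinement that is assumed here rather than taken from the text, and the function name and test sequence are hypothetical. A match indicates only a potential site; whether it is actually glycosylated depends on the expression host and on local structure.

```python
import re

# Asn-X-Ser/Thr, with X assumed to be any residue except proline.
# The lookahead makes overlapping sequons (e.g. N-N-S-T) all reportable.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_n_glyc_sequons(sequence):
    """Return (position, motif) for each potential N-linked sequon.

    `position` is the 1-based index of the asparagine residue.
    """
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence used only to exercise the function.
    for position, motif in find_n_glyc_sequons("MNGTANPSQNNSTK"):
        print(f"Asn{position}: {motif}")
```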

It must be noted that the presence of glycans on the protein surface can sometimes aid in crystallization by mediating important crystal contacts. For example, a complex of the densely glycosylated Hepatitis C virus E2 (HCV E2) glycoprotein bound to a broadly neutralizing antibody failed to crystallize using numerous deglycosylation strategies, including Endo H treatment or N-linked glycan-sequon mutagenesis. In this example, only the fully glycosylated HCV E2 protein crystallized, and the resulting structure revealed a key crystal contact mediated by an N-linked glycan interacting nonspecifically with a neighboring symmetry mate within the crystal (Kong et al., 2013). Full or partial glycosylation may also be necessary to crystallize complexes involving glycan-dependent protein interactions (Kong et al., 2014). These examples illustrate that glycosylation may also have a stabilizing effect on proteins, especially for crystallization, and the presence of glycans is not always deleterious.

6.7. In situ proteolysis

The presence of proteases in a protein sample submitted for crystallization trials can lead to significant compositional instability over the time course of the experiment. For example, it was serendipitously discovered that a Penicillium fungus growing in a crystallization drop was responsible for cleaving ∼200 residues off the yeast CPSF-100 protein and was essential for successful crystallization (Mandel et al., 2006; Bai et al., 2007). This finding initiated so-called in situ proteolysis, in which protein samples are crystallized in the presence of a panel of proteases such as trypsin and chymotrypsin (Dong et al., 2007; Wernimont & Edwards, 2009). More recently, these techniques have been combined with mass-spectrometric analysis to help identify peptide fragments that are stable over the time frame of a crystallization experiment (Gheyi et al., 2010). In a similar fashion, limited proteolysis can be used to identify stable domains of membrane and globular proteins. The membrane protects the transmembrane portions of the protein from protease cleavage and more compact forms of the membrane protein can be produced. Such an approach was used to generate a stable 55 kDa construct of the P pilus PapC that was subsequently used for crystal structure determination (Remaut et al., 2008).

6.8. Thermal stability screening

The thermal stability of a protein correlates reasonably well with its crystallizability. For example, a large-scale Thermofluor study of 657 protein samples showed that only 23% of the proteins with a T_m of <43°C yielded crystals. In contrast, 49% of the proteins with a T_m of >45°C yielded crystals (Dupeux et al., 2011; see §5.4 and Fig. 4). However, beyond 45°C the T_m does not appear to be particularly predictive of crystallizability; there is no significant increase in crystallization frequency for proteins with a T_m of between 45 and 96°C (Dupeux et al., 2011). Furthermore, there are no known structural features that correlate well with T_m (Kumar et al., 2000), and the T_m appears to be highly solvent-dependent (Faria et al., 2004). It is important to note that other features of the melting curve, aside from the T_m, may be useful to the crystallographer. For example, a high initial signal generated by a Thermofluor assay may indicate that an exposed hydrophobic surface of the native protein is interacting with the dye used in the assay. Proteins displaying such profiles appear to have a lower rate of crystallization (36.6%; Dupeux et al., 2011). Similarly, a wide thermal denaturation peak can be indicative of noncooperative unfolding behavior, which might be a useful consideration for construct modification (Morar-Mitrica et al., 2013). Ultimately, measures of thermal stability may be useful as a broad, qualitative assessment of the suitability of a protein for crystallization.
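To make the T_m and ΔT_m quantities concrete, the following Python sketch estimates an apparent T_m from a melting curve as the midpoint of the temperature interval with the steepest increase in fluorescence, and then compares a ligand-bound sample with a control. The maximum-slope estimate is one simple choice, not necessarily the procedure used in the studies cited above, and the curves are synthetic two-state sigmoids generated by a hypothetical helper, not experimental data.

```python
import math


def melting_temperature(temps, signal):
    """Estimate the apparent T_m as the midpoint of the steepest rising interval."""
    if len(temps) != len(signal) or len(temps) < 2:
        raise ValueError("need two equally long series with at least two points")
    best_slope, best_tm = None, None
    for i in range(len(temps) - 1):
        slope = (signal[i + 1] - signal[i]) / (temps[i + 1] - temps[i])
        if best_slope is None or slope > best_slope:
            best_slope = slope
            best_tm = (temps[i] + temps[i + 1]) / 2.0
    return best_tm


def synthetic_melt_curve(temps, tm, width=2.0):
    """Idealized two-state unfolding curve (hypothetical helper for testing)."""
    return [1.0 / (1.0 + math.exp((tm - t) / width)) for t in temps]


if __name__ == "__main__":
    temps = list(range(25, 96))                      # 25 to 95 °C in 1 °C steps
    control = synthetic_melt_curve(temps, tm=50.5)   # no ligand
    liganded = synthetic_melt_curve(temps, tm=58.5)  # with a stabilizing ligand
    tm_control = melting_temperature(temps, control)
    tm_liganded = melting_temperature(temps, liganded)
    print("T_m control:", tm_control)
    print("T_m liganded:", tm_liganded)
    print("delta T_m:", tm_liganded - tm_control)
```

Real Thermofluor traces are noisy and often decline after the transition because of aggregation, so smoothing or a sigmoid fit restricted to the transition region is usually preferable to a raw finite difference.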

Figure 4. Thermofluor assay of protein melting temperature in the presence of stabilizing ligands. (a) Example melting curves for a protein of unknown function from Eubacterium siraeum (ZP_02421384.1) in the presence of various adenosine- and ribose-containing ligands (grey), adenosine diphosphate (ADP, green), adenosine triphosphate (ATP, red) and control sample with no ligand (blue dashed line). ADP and ATP result in a shift in melting temperature (ΔT_m) of 3 and 8°C, respectively. (b) Matrix of 72 proteins of unknown function screened in a Thermofluor assay against a panel of 327 ligands. The ΔT_m is indicated by the size and color of the data points ranging from 10 to 70°C. Data kindly provided by Anna Grezchnik of the Joint Center for Structural Genomics (JCSG).

6.9. Deuterium-exchange mass spectrometry (DXMS)

Prior to crystallization, it is often desirable to remove protein termini and low-complexity regions owing to their inherent flexibility (Derewenda, 2010). Disordered regions of the protein often hamper crystallization and the design of protein constructs trimmed of such regions can be guided using deuterium-exchange mass spectrometry (DXMS; Englander, 2006; Konermann et al., 2011; Spraggon et al., 2004; Figs. 5a and 5b).

Figure 5. Deuterium-exchange mass spectrometry (DXMS) analysis. DXMS was used to guide the construct design used for determining the crystal structure of a putative ethanolamine-utilization protein from Salmonella typhimurium. (a) The left side of the figure shows that the N-terminal portion of the protein is more disordered, or unstructured in solution, as the backbone amide protons are more susceptible to exchange. Deuterium-labelled proteolytic fragments are highlighted in red. (b) The right side of the figure shows that the C-terminal portion of the protein is more ordered, or structured in solution, as the backbone-amide protons are less susceptible to exchange. The region selected for truncation is denoted by a blue arrow. (c) Ribbon diagram of the final crystal structure determined for residues 98–229 showing the compact and ordered structure (loops are shown in green, α-helices in red and β-strands in yellow; PDB entry 2pyt; Joint Center for Structural Genomics, unpublished work). (d) The same structure colored according to the B-factor value, highlighting the stable core of the protein in blue and the more flexible outer regions in green through red. Data kindly provided by Scott Lesley of the Joint Center for Structural Genomics (JCSG).

Using this technique, proteins are exposed to deuterated solvent and the deuterium is allowed to exchange with the backbone amide protons of the protein. The exchange rates are dependent on the degree of solvent exposure and the integrity of the local secondary structure. Therefore, this method gives a measure of the conformational flexibility of the protein in solution (Englander & Kallenbach, 1983). These rates can be obtained by measuring the change in the mass of the deuterated peptides using proteolytic mass spectrometry. DXMS has been successfully applied to structural genomics targets, guiding the deletion of flexible regions that previously hindered the growth of diffraction-quality crystals (Spraggon et al., 2004; Pantazatos et al., 2004; Fig. 5). Although the utility of DXMS is often limited by the lengths of the proteolytic fragments that can be produced, sometimes restricting the area of investigation to less than 50% of the protein sequence, it can often provide a key insight for the design of crystallizable constructs. Furthermore, when a structure is available, DXMS data can be mapped onto it to provide a three-dimensional map of flexibility that can further guide protein-engineering efforts and mechanistic studies (Guttman et al., 2012; Kong et al., 2010; Zhang et al., 2012; Chung et al., 2011).
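The mass-shift measurement described above can be turned into a per-peptide measure of exchange with a few lines of Python. The sketch below divides the observed mass gain by the number of backbone amide hydrogens available for exchange. It assumes that every residue after the first, except proline, contributes one exchangeable amide, and it applies no back-exchange correction; both are simplifications for illustration. The peptide sequences and masses in the usage example are hypothetical.

```python
# Mass gained when one hydrogen (1.007825 Da) is replaced by deuterium (2.014102 Da).
DEUTERIUM_MASS_SHIFT = 2.014102 - 1.007825


def exchangeable_amides(peptide):
    """Count backbone amide hydrogens: every residue after the first, except proline."""
    return sum(1 for residue in peptide.upper()[1:] if residue != "P")


def fractional_uptake(peptide, mass_undeuterated, mass_deuterated):
    """Fraction of exchangeable backbone amides that carry deuterium.

    Masses are monoisotopic (or centroid) peptide masses in Da, measured
    before and after labelling. No back-exchange correction is applied.
    """
    n_amides = exchangeable_amides(peptide)
    if n_amides == 0:
        raise ValueError("peptide has no exchangeable backbone amides")
    deuterons = (mass_deuterated - mass_undeuterated) / DEUTERIUM_MASS_SHIFT
    return deuterons / n_amides


def rank_by_uptake(measurements):
    """Sort (start_residue, peptide, m0, mD) records from most to least exchanged."""
    scored = [(start, pep, fractional_uptake(pep, m0, md))
              for start, pep, m0, md in measurements]
    return sorted(scored, key=lambda record: record[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical peptides and masses, for illustration only.
    data = [
        (5, "AGPLK", 500.0, 500.0 + 2.7 * DEUTERIUM_MASS_SHIFT),
        (120, "VLIEF", 620.0, 620.0 + 0.4 * DEUTERIUM_MASS_SHIFT),
    ]
    for start, peptide, uptake in rank_by_uptake(data):
        print(start, peptide, round(uptake, 2))
```

Peptides with high fractional uptake at short labelling times point to flexible or unstructured regions, which are the natural candidates for truncation.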

6.10. Disulfide engineering

Disulfide engineering is a well established method for stabilizing proteins and for studying and modifying protein function and dynamics (Dombkowski et al., 2014). Intermolecular disulfide bonds between proteins in the crystal lattice have been observed and have led to the coining of the phrase ‘spontaneously polymerizing protein crystals’ (Quistgaard, 2014). Disulfide engineering can be used to introduce these intermolecular disulfides into the protein in an attempt to stabilize the lattice and promote crystallization. Furthermore, studies have shown that symmetrical proteins, such as homodimers, tend to crystallize more readily (Wukovitz & Yeates, 1995). Using T4 lysozyme as a model system, it has been demonstrated that the introduction of disulfide bonds can be used to make monomeric proteins dimerize, thereby increasing the chance of lattice formation (Banatao et al., 2006; Heinz & Matthews, 1994). This has been termed ‘synthetic symmetrization’ and can be a useful tool for assisting in the crystallization of monomeric proteins and protein–protein complexes which display asymmetry. Proteins that have been successfully crystallized using this approach include CelA from Thermotoga maritima (Forse et al., 2011). Tools for identifying residues suitable for disulfide engineering are available, including Disulfide by Design 2.0 (Craig & Dombkowski, 2013; http://bit.ly/1NbV2tQ). One caveat of this approach is the potential of disulfide bonds to adopt different conformations that may promote conformational flexibility. For example, the disulfide bond connecting the I-EGF1 and I-EGF2 domains of β₂ integrin is able to accommodate a >20 Å hinge motion between the domains (Shi et al., 2007; Smagghe et al., 2010).
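A first-pass geometric screen for residue pairs that might accept an engineered disulfide can be sketched as follows. This is not the Disulfide by Design 2.0 method, which evaluates more detailed bond geometry; the sketch simply lists residue pairs whose Cβ atoms lie within a distance cutoff. The 4.5 Å cutoff, the minimum sequence separation, the function name and the coordinates are all assumptions made for illustration.

```python
import math
from itertools import combinations


def candidate_disulfide_pairs(cb_coords, max_cb_distance=4.5, min_separation=2):
    """List residue pairs whose C-beta atoms are close enough to consider.

    cb_coords maps residue number -> (x, y, z) of the C-beta atom in angstroms.
    Both thresholds are illustrative defaults, not validated criteria.
    Returns (residue_i, residue_j, distance) tuples with i < j.
    """
    pairs = []
    for (i, a), (j, b) in combinations(sorted(cb_coords.items()), 2):
        if j - i < min_separation:
            continue  # skip sequence neighbours
        distance = math.dist(a, b)
        if distance <= max_cb_distance:
            pairs.append((i, j, round(distance, 2)))
    return pairs


if __name__ == "__main__":
    # Hypothetical C-beta coordinates, for illustration only.
    coords = {
        10: (0.0, 0.0, 0.0),
        11: (1.0, 0.0, 0.0),
        45: (6.0, 0.0, 4.0),
        60: (0.0, 4.0, 0.0),
    }
    for pair in candidate_disulfide_pairs(coords):
        print(pair)
```

For intermolecular disulfides of the kind used in synthetic symmetrization, the same distance test would be applied between a residue and its counterpart in a modelled partner molecule rather than within one chain.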

7. Future outlook

Protein stability means many different things to many different scientists. However, on a global level, it can be considered as the ability of a protein to maintain structure and function in a particular environment. If the environment of interest is normal physiological conditions then the net summation of all contributing forces must add up to provide a small negative ΔG, therefore favoring a stable folded protein. However, not all proteins operate in standard physiological environments and many other factors must be considered. For example, many proteins, such as FeFe hydrogenase, are sensitive to oxygen and changes to the structure must be considered under an oxygen-free environment (Mulder et al., 2011). However, the production of such enzymes under anaerobic conditions is both costly and difficult to achieve on a large scale, and considerable effort is being made to generate oxygen-tolerant hydrogenases for use on an industrial scale (Fritsch et al., 2013). The stability of such enzymes is of great interest to the biofuels industry, with the potential for biological hydrogen-gas production (Kim & Kim, 2011). Similarly, light-sensitive proteins such as opsins, photolyase and photosystems I and II all have unique structural features that enable them to utilize the energy of photons to carry out biological functions. Other interesting examples include the light-, oxygen- and voltage-sensitive domains (LOVs) found in plants and algae that undergo conformational changes and covalent binding of an FMN moiety under illumination with blue light (Briggs, 2007; Kottke et al., 2006). These examples illustrate how nonstandard environments must be considered in any discussion of protein stability.
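The statement that a small negative ΔG favors the folded state can be made quantitative under a simple two-state assumption, in which the folding equilibrium constant is K = exp(−ΔG/RT) and the folded fraction is K/(1 + K). The Python sketch below evaluates this relationship. The two-state model, the default temperature and the example ΔG values are assumptions for illustration and are not taken from the review.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal mol^-1 K^-1


def fraction_folded(delta_g_folding, temperature=298.15):
    """Folded fraction for a two-state protein.

    delta_g_folding is the free energy of folding (unfolded -> folded) in
    kcal/mol; negative values favor the folded state. Temperature is in kelvin.
    """
    rt = GAS_CONSTANT * temperature
    # K/(1 + K) with K = exp(-dG/RT), rearranged to a single exponential.
    return 1.0 / (1.0 + math.exp(delta_g_folding / rt))


if __name__ == "__main__":
    # Illustrative free energies of folding in kcal/mol.
    for delta_g in (0.0, -1.0, -5.0):
        print(delta_g, fraction_folded(delta_g))
```

Because the dependence is exponential, a modest change in ΔG caused by a mutation, ligand or change of solvent can shift the folded population substantially, which is why marginally stable proteins respond so strongly to their environment.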

Traditionally, discussions of protein stability have focused on soluble, folded and more classical globular proteins. However, large percentages of the proteome are predicted to contain unstructured, insoluble and aggregated proteins in the form of IDPs, IIPs and APRs (see §2.2). Such proteins, and their unwieldy conformational stability, represent a challenge for protein crystallographers, who usually simply remove such regions during the cloning stage. However, these families of proteins are of tremendous medical importance and deletion of these regions at the cloning stage is no longer meaningful. Crystallographic techniques are evolving to help to deal with such proteins, particularly in the areas of microcrystallography, next-generation synchrotron sources and X-ray free-electron lasers (XFELs) (Neutze & Moffat, 2012; Spence et al., 2012; Gruner & Lattman, 2015; Weckert, 2015; Moukhametzianov et al., 2008; Igarashi et al., 2008; Fischetti et al., 2009).

Recent developments in XFEL light sources are enabling higher resolution studies of protein dynamics on a femtosecond timescale (Uervirojnangkoorn et al., 2015). From a crystallization perspective, such light sources present a unique set of challenges to the crystallographer that are somewhat orthogonal to the problems faced when using more traditional diffraction methods. One such requirement is the need for microcrystalline sample material that can be injected into the laser path. Additionally, such techniques also allow larger and more dynamic protein structures to be determined, particularly those which possess conformational disorder and thus hinder the formation of larger, more ordered crystals. Recent examples using XFEL techniques include structures of the complex between synaptotagmin 1 and the neuronal SNARE (Zhou et al., 2015). Clearly, XFEL technology, especially when coupled with associated hybrid methods, such as cryo-EM, ultrafast electron diffraction (UED) and double electron–electron resonance (DEER), will help mitigate many of the problems associated with protein conformational stability and its effect on protein crystallization (Wakatsuki, 2016). Next-generation synchrotron sources will be capable of providing smaller and brighter X-ray beams and, most importantly, a higher coherence fraction (Weckert, 2015). Such coherent beams will provide exciting opportunities for the study of protein dynamics on ever smaller timescales and will be essential for the study of unstable proteins that are currently inaccessible using traditional crystallographic techniques.

In summary, we will end this review with the title of an excellent paper by the late Fred Richards: Protein stability: still an unsolved problem (Richards, 1997). Although the problem is still largely unsolved, considerable progress has been made towards the study of protein stability, disorder and dynamics. Structural methods such as crystallography, NMR and cryo-EM are central to this endeavor and the exploration of innovative hybrid methods will be vital.

Acknowledgments

This work was sponsored in part by contributions from k.-k. Hofkristallamt, Vista, California, USA and Austrian Science Fund (FWF) project P28395-B26. We would like to thank members of the Joint Center for Structural Genomics for fruitful discussions and contribution of the Thermofluor and DXMS data.

References

Altmann, F., Staudacher, E., Wilson, I. B. & März, L. (1999). Insect cells as hosts for the expression of recombinant glycoproteins. Glycoconj. J. 16, 109–123.
Alushin, G. M., Lander, G. C., Kellogg, E. H., Zhang, R., Baker, D. & Nogales, E. (2014). High-resolution microtubule structures reveal the structural transitions in αβ-tubulin upon GTP hydrolysis. Cell, 157, 1117–1129.
Amarasinghe, C. & Jin, J.-P. (2015). The use of affinity tags to overcome obstacles in recombinant protein expression and purification. Protein Pept. Lett. 22, 885–892.
Anand, K., Pal, D. & Hilgenfeld, R. (2002). An overview on 2-methyl-2,4-pentanediol in crystallization and in crystals of biological macromolecules. Acta Cryst. D58, 1722–1728.
Anfinsen, C. B. (1973). Principles that govern the folding of protein chains. Science, 181, 223–230.
Aranko, A. S., Wlodawer, A. & Iwai, H. (2014). Nature’s recipe for splitting inteins. Protein Eng. Des. Sel. 27, 263–271.
Bachmair, A., Finley, D. & Varshavsky, A. (1986). In vivo half-life of a protein is a function of its amino-terminal residue. Science, 234, 179–186.
Bai, Y., Auperin, T. C. & Tong, L. (2007). The use of in situ proteolysis in the crystallization of murine CstF-77. Acta Cryst. F63, 135–138.
Banatao, D. R., Cascio, D., Crowley, C. S., Fleissner, M. R., Tienson, H. L. & Yeates, T. O. (2006). An approach to crystallizing proteins by synthetic symmetrization. Proc. Natl Acad. Sci. USA, 103, 16230–16235.
Barth, A. (2007). Infrared spectroscopy of proteins. Biochim. Biophys. Acta, 1767, 1073–1101.
Baud, F. & Karlin, S. (1999). Measures of residue density in protein structures. Proc. Natl Acad. Sci. USA, 96, 12494–12499.
Bieri, M., Kwan, A. H., Mobli, M., King, G. F., Mackay, J. P. & Gooley, P. R. (2011). Macromolecular NMR spectroscopy for the non-spectroscopist: beyond macromolecular solution structure determination. FEBS J. 278, 704–715.
Boivin, S., Kozak, S. & Meijers, R. (2013). Optimization of protein purification and characterization using Thermofluor screens. Protein Expr. Purif. 91, 192–206.
Bowie, J. U. (2001). Stabilizing membrane proteins. Curr. Opin. Struct. Biol. 11, 397–402.
Briggs, W. R. (2007). The LOV domain: a chromophore module servicing multiple photoreceptors. J. Biomed. Sci. 14, 499–504.
Brondyk, W. H. (2009). Selecting an appropriate method for expressing a recombinant protein. Methods Enzymol. 463, 131–147.
Brunori, M. (2014). Variations on the theme: allosteric control in hemoglobin. FEBS J. 281, 633–643.
Brutscher, B., Felli, I. C., Gil-Caballero, S., Hošek, T., Kümmerle, R., Piai, A., Pierattelli, R. & Sólyom, Z. (2015). NMR methods for the study of instrinsically disordered proteins structure, dynamics, and interactions: general overview and practical guidelines. Adv. Exp. Med. Biol. 870, 49–122.
Bruylants, G., Wouters, J. & Michaux, C. (2005). Differential scanning calorimetry in life science: thermodynamics, stability, molecular recognition and application in drug design. Curr. Med. Chem. 12, 2011–2020.
Bryan, P. N. & Orban, J. (2010). Proteins that switch folds. Curr. Opin. Struct. Biol. 20, 482–488.
Bucher, M. H., Evdokimov, A. G. & Waugh, D. S. (2002). Differential effects of short affinity tags on the crystallization of Pyrococcus furiosus maltodextrin-binding protein. Acta Cryst. D58, 392–397.
Buer, B. C. & Marsh, E. N. (2014). Design, synthesis, and study of fluorinated proteins. Methods Mol. Biol. 1216, 89–116.
Buer, B. C., Meagher, J. L., Stuckey, J. A. & Marsh, E. N. (2012). Structural basis for the enhanced stability of highly fluorinated proteins. Proc. Natl Acad. Sci. USA, 109, 4810–4815.
Bukowska, M. A. & Grütter, M. G. (2013). New concepts and aids to facilitate crystallization. Curr. Opin. Struct. Biol. 23, 409–416.
Bulawa, C. E., Connelly, S., Devit, M., Wang, L., Weigel, C., Fleming, J. A., Packman, J., Powers, E. T., Wiseman, R. L., Foss, T. R., Wilson, I. A., Kelly, J. W. & Labaudinière, R. (2012). Tafamidis, a potent and selective transthyretin kinetic stabilizer that inhibits the amyloid cascade. Proc. Natl Acad. Sci. USA, 109, 9629–9634.
Calderone, T. L., Stevens, R. D. & Oas, T. G. (1996). High-level misincorporation of lysine for arginine at AGA codons in a fusion protein expressed in Escherichia coli. J. Mol. Biol. 262, 407–412.
Carver, T. E. et al. (2005). Decrypting the biochemical function of an essential gene from Streptococcus pneumoniae using Thermofluor technology. J. Biol. Chem. 280, 11704–11712.
Chang, V. T., Crispin, M., Aricescu, A. R., Harvey, D. J., Nettleship, J. E., Fennelly, J. A., Yu, C., Boles, K. S., Evans, E. J., Stuart, D. I., Dwek, R. A., Jones, E. Y., Owens, R. J. & Davis, S. J. (2007). Glycoprotein structural genomics: solving the glycosylation problem. Structure, 15, 267–273.
Charoenkwan, P., Shoombuatong, W., Lee, H.-C., Chaijaruwanich, J., Huang, H.-L. & Ho, S.-Y. (2013). SCMCRYS: predicting protein crystallization using an ensemble scoring card method with estimating propensity scores of P-collocated amino acid pairs. PLoS One, 8, e72368.
Chatterjee, S. & Mayor, S. (2001). The GPI-anchor and protein sorting. Cell. Mol. Life Sci. 58, 1969–1987.
Chong, S., Mersha, F. B., Comb, D. G., Scott, M. E., Landry, D., Vence, L. M., Perler, F. B., Benner, J., Kucera, R. B., Hirvonen, C. A., Pelletier, J. J., Paulus, H. & Xu, M.-Q. (1997). Single-column purification of free recombinant proteins using a self-cleavable affinity tag derived from a protein splicing element. Gene, 192, 271–281.
Chou, P. Y. & Fasman, G. D. (1974). Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry, 13, 211–222.
Chung, C. (2007). The use of biophysical methods increases success in obtaining liganded crystal structures. Acta Cryst. D63, 62–71.
Chung, K. Y., Rasmussen, S. G. F., Liu, T., Li, S., DeVree, B. T., Chae, P. S., Calinski, D., Kobilka, B. K., Woods, V. L. Jr & Sunahara, R. K. (2011). Conformational changes in the G protein Gs induced by the β₂ adrenergic receptor. Nature (London), 477, 611–615.
Clifton, M. C. et al. (2015). A maltose-binding protein fusion construct yields a robust crystallography platform for MCL1. PLoS One, 10, e0125010.
Columbus, L. (2015). Post-expression strategies for structural investigations of membrane proteins. Curr. Opin. Struct. Biol. 32, 131–138.
Compiani, M. & Capriotti, E. (2013). Computational and theoretical methods for protein folding. Biochemistry, 52, 8601–8624.
Conte, L. L., Chothia, C. & Janin, J. (1999). The atomic structure of protein–protein recognition sites. J. Mol. Biol. 285, 2177–2198.
Cooper, D. R., Boczek, T., Grelewska, K., Pinkowska, M., Sikorska, M., Zawadzki, M. & Derewenda, Z. (2007). Protein crystallization by surface entropy reduction: optimization of the SER strategy. Acta Cryst. D63, 636–645.
Craig, D. B. & Dombkowski, A. A. (2013). Disulfide by Design 2.0: a web-based tool for disulfide engineering in proteins. BMC Bioinformatics, 14, 346.
Czepas, J., Devedjiev, Y., Krowarsch, D., Derewenda, U., Otlewski, J. & Derewenda, Z. S. (2004). The impact of Lys→Arg surface mutations on the crystallization of the globular domain of RhoGDI. Acta Cryst. D60, 275–280.
Daniel, E., Onwukwe, G. U., Wierenga, R. K., Quaggin, S. E., Vainio, S. J. & Krause, M. (2015). ATGme: open-source web application for rare codon identification and custom DNA sequence optimization. BMC Bioinformatics, 16, 303.
Davies, D. R. (1964). A correlation between amino acid composition and protein structure. J. Mol. Biol. 9, 605–609.
Davis-Searles, P. R., Saunders, A. J., Erie, D. A., Winzor, D. J. & Pielak, G. J. (2001). Interpreting the effects of small uncharged solutes on protein-folding equilibria. Annu. Rev. Biophys. Biomol. Struct. 30, 271–306.
De Baets, G., Schymkowitz, J. & Rousseau, F. (2014). Predicting aggregation-prone sequences in proteins. Essays Biochem. 56, 41–52.
Deber, C. M., Khan, A. R., Li, Z., Joensson, C., Glibowicka, M. & Wang, J. (1993). Val→Ala mutations selectively alter helix–helix packing in the transmembrane segment of phage M13 coat protein. Proc. Natl Acad. Sci. USA, 90, 11648–11652.
Dekker, K., Yamagata, H., Sakaguchi, K. & Udaka, S. (1991). Xylose (glucose) isomerase gene from the thermophile Thermus thermophilus: cloning, sequencing, and comparison with other thermostable xylose isomerases. J. Bacteriol. 173, 3078–3083.
Demirdöven, N., Cheatum, C. M., Chung, H. S., Khalil, M., Knoester, J. & Tokmakoff, A. (2004). Two-dimensional infrared spectroscopy of antiparallel β-sheet secondary structure. J. Am. Chem. Soc. 126, 7981–7990.
Derewenda, Z. S. (2004). Rational protein crystallization by mutational surface engineering. Structure, 12, 529–535.
Derewenda, Z. S. (2010). Application of protein engineering to enhance crystallizability and improve crystal properties. Acta Cryst. D66, 604–615.
Derewenda, Z. S. & Vekilov, P. G. (2006). Entropy and surface engineering in protein crystallization. Acta Cryst. D62, 116–124.
Dieci, G., Bottarelli, L., Ballabeni, A. & Ottonello, S. (2000). tRNA-assisted overproduction of eukaryotic ribosomal proteins. Protein Expr. Purif. 18, 346–354.
Dill, K. A. (1990). Dominant forces in protein folding. Biochemistry, 29, 7133–7155.
Disfani, F. M., Hsu, W.-L., Mizianty, M. J., Oldfield, C. J., Xue, B., Dunker, A. K., Uversky, V. N. & Kurgan, L. (2012). MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins. Bioinformatics, 28, i75–i83.
Dombkowski, A. A., Sultana, K. Z. & Craig, D. B. (2014). Protein disulfide engineering. FEBS Lett. 588, 206–212.
Dong, A. et al. (2007). In situ proteolysis for protein crystallization and structure determination. Nature Methods, 4, 1019–1021.
Dunker, A. K. & Oldfield, C. J. (2015). Back to the future: nuclear magnetic resonance and bioinformatics studies on intrinsically disordered proteins. Adv. Exp. Med. Biol. 870, 1–34.
Dupeux, F., Röwer, M., Seroul, G., Blot, D. & Márquez, J. A. (2011). A thermal stability assay can help to estimate the crystallization likelihood of biological samples. Acta Cryst. D67, 915–919.
Elbein, A. D. (1987). Inhibitors of the biosynthesis and processing of N-linked oligosaccharide chains. Annu. Rev. Biochem. 56, 497–534.
Englander, S. W. (2006). Hydrogen exchange and mass spectrometry: a historical perspective. J. Am. Soc. Mass Spectrom. 17, 1481–1489.
Englander, S. W. & Kallenbach, N. R. (1983). Hydrogen exchange and structural dynamics of proteins and nucleic acids. Q. Rev. Biophys. 16, 521–655.
Ericsson, U. B., Hallberg, B. M., DeTitta, G. T., Dekker, N. & Nordlund, P. (2006). Thermofluor-based high-throughput stability optimization of proteins for structural studies. Anal. Biochem. 357, 289–298.
Evdokimov, A. G. et al. (2007). Serendipitous discovery of novel bacterial methionine aminopeptidase inhibitors. Proteins, 66, 538–546.
Faria, T. Q., Lima, J. C., Bastos, M., Maçanita, A. L. & Santos, H. (2004). Protein stabilization by osmolytes from hyperthermophiles: effect of mannosylglycerate on the thermal unfolding of recombinant nuclease A from Staphylococcus aureus studied by picosecond time-resolved fluorescence and calorimetry. J. Biol. Chem. 279, 48680–48691.
Fields, P. A., Dong, Y., Meng, X. & Somero, G. N. (2015). Adaptations of protein structure and function to temperature: there is more than one way to ‘skin a cat’. J. Exp. Biol. 218, 1801–1811.
Fischer, S., Handrick, R. & Otte, K. (2015). The art of CHO cell engineering: a comprehensive retrospect and future perspectives. Biotechnol. Adv. 33, 1878–1896.
Fischetti, R. F., Xu, S., Yoder, D. W., Becker, M., Nagarajan, V., Sanishvili, R., Hilgart, M. C., Stepanov, S., Makarov, O. & Smith, J. L. (2009). Mini-beam collimator enables microcrystallography experiments on standard beamlines. J. Synchrotron Rad. 16, 217–225.
Forse, G. J., Ram, N., Banatao, D. R., Cascio, D., Sawaya, M. R., Klock, H. E., Lesley, S. A. & Yeates, T. O. (2011). Synthetic symmetrization in the crystallization and structure determination of CelA from Thermotoga maritima. Protein Sci. 20, 168–178.
Fritsch, J., Lenz, O. & Friedrich, B. (2013). Structure, function and biosynthesis of O₂-tolerant hydrogenases. Nature Rev. Microbiol. 11, 106–114.
Fu, W., Lin, J. & Cen, P. (2007). 5-Aminolevulinate production with recombinant Escherichia coli using a rare codon optimizer host strain. Appl. Microbiol. Biotechnol. 75, 777–782.
Gheyi, T., Rodgers, L., Romero, R., Sauder, J. M. & Burley, S. K. (2010). Mass spectrometry guided in situ proteolysis to obtain crystals for X-ray structure determination. J. Am. Soc. Mass Spectrom. 21, 1795–1801.
Giordanetto, F., Schäfer, A. & Ottmann, C. (2014). Stabilization of protein–protein interactions by small molecules.Drug Discov. Today, 19, 1812–1821. [PubMed]
Goldschmidt, L., Cooper, D. R., Derewenda, Z. S. & Eisenberg, D. (2007). Toward rational protein crystallization: a web server for the design of crystallizable protein variants.Protein Sci.16, 1569–1576. [PMC free article] [PubMed]
Goldschmidt, L., Eisenberg, D. & Derewenda, Z. S. (2014). Salvage or recovery of failed targets by mutagenesis to reduce surface entropy.Methods Mol. Biol.1140, 201–209. [PubMed]
Goyal, S., Qin, H., Lim, L. & Song, J. (2015). Insoluble protein characterization by circular dichroism (CD) spectroscopy and nuclear magnetic resonance (NMR).Methods Mol. Biol.1258, 371–385. [PubMed]
Gräslund, S., Sagemark, J., Berglund, H., Dahlgren, L. G., Flores, A., Hammarström, M., Johansson, I., Kotenyova, T., Nilsson, M., Nordlund, P. & Weigelt, J. (2008). The use of systematic N- and C-terminal deletions to promote production and structural studies of recombinant proteins. Protein Expr. Purif. 58, 210–221.
Gruner, S. M. & Lattman, E. E. (2015). Biostructural science inspired by next-generation X-ray sources. Annu. Rev. Biophys. 44, 33–51.
Guttman, M., Kahn, M., Garcia, N. K., Hu, S.-L. & Lee, K. K. (2012). Solution structure, conformational dynamics, and CD4-induced activation in full-length, glycosylated, monomeric HIV gp120. J. Virol. 86, 8750–8764.
Guzman, L. M., Belin, D., Carson, M. J. & Beckwith, J. (1995). Tight regulation, modulation, and high-level expression by vectors containing the arabinose PBAD promoter. J. Bacteriol. 177, 4121–4130.
Guzzo, A. V. (1965). The influence of amino acid sequence on protein structure. Biophys. J. 5, 809–822.
Hall, M. P. (2014). Biotransformation and in vivo stability of protein biotherapeutics: impact on candidate selection and pharmacokinetic profiling. Drug Metab. Dispos. 42, 1873–1880.
Hassell, A. M. et al. (2007). Crystallization of protein–ligand complexes. Acta Cryst. D63, 72–79.
Hay, R. T. (2005). SUMO. Mol. Cell, 18, 1–12.
Heinz, D. W. & Matthews, B. W. (1994). Rapid crystallization of T4 lysozyme by intermolecular disulfide cross-linking. Protein Eng. Des. Sel. 7, 301–307.
Hermans, J., Anderson, A. G. & Yun, R. H. (1992). Differential helix propensity of small apolar side chains studied by molecular dynamics simulations. Biochemistry, 31, 5646–5653.
Hirata, R., Ohsumi, Y., Nakano, A., Kawasaki, H., Suzuki, K. & Anraku, Y. (1990). Molecular structure of a gene, VMA1, encoding the catalytic subunit of H⁺-translocating adenosine triphosphatase from vacuolar membranes of Saccharomyces cerevisiae. J. Biol. Chem. 265, 6726–6733.
Huai, Q., Kim, H.-Y., Liu, Y., Zhao, Y., Mondragon, A., Liu, J. O. & Ke, H. (2002). Crystal structure of calcineurin–cyclophilin–cyclosporin shows common but distinct recognition of immunophilin–drug complexes. Proc. Natl Acad. Sci. USA, 99, 12037–12042.
Huang, Y. J., Acton, T. B. & Montelione, G. T. (2014). DisMeta: a meta server for construct design and optimization. Methods Mol. Biol. 1091, 3–16.
Igarashi, N., Ikuta, K., Miyoshi, T., Matsugaki, N., Yamada, Y., Yousef, M. S. & Wakatsuki, S. (2008). X-ray beam stabilization at BL-17A, the protein microcrystallography beamline of the Photon Factory. J. Synchrotron Rad. 15, 292–295.
Jahandideh, S., Jaroszewski, L. & Godzik, A. (2014). Improving the chances of successful protein structure determination with a random forest classifier. Acta Cryst. D70, 627–635.
Jahandideh, S. & Mahdavi, A. (2012). RFCRYS: sequence-based protein crystallization propensity prediction by means of random forest. J. Theor. Biol. 306, 115–119.
Jarvis, D. L. (2009). Baculovirus–insect cell expression systems. Methods Enzymol. 463, 191–222.
Jiménez, M. A. (2014). Design of monomeric water-soluble β-hairpin and β-sheet peptides. Methods Mol. Biol. 1216, 15–52.
Julien, J.-P. et al. (2015). Design and structure of two HIV-1 clade C SOSIP.664 trimers that increase the arsenal of native-like Env immunogens. Proc. Natl Acad. Sci. USA, 112, 11947–11952.
Kandaswamy, K. K., Pugalenthi, G., Suganthan, P. N. & Gangal, R. (2010). SVMCRYS: an SVM approach for the prediction of protein crystallization propensity from protein sequence. Protein Pept. Lett. 17, 423–430.
Kane, J. F. (1995). Effects of rare codon clusters on high-level expression of heterologous proteins in Escherichia coli. Curr. Opin. Biotechnol. 6, 494–500.
Kapust, R. B. & Waugh, D. S. (1999). Escherichia coli maltose-binding protein is uncommonly effective at promoting the solubility of polypeptides to which it is fused. Protein Sci. 8, 1668–1674.
Ke, A. & Wolberger, C. (2003). Insights into binding cooperativity of MATa1/MATα2 from the crystal structure of a MATa1 homeodomain-maltose binding protein chimera. Protein Sci. 12, 306–312.
Kery, V., Elleder, D. & Kraus, J. P. (1995). δ-Aminolevulinate increases heme saturation and yield of human cystathionine β-synthase expressed in Escherichia coli. Arch. Biochem. Biophys. 316, 24–29.
Khoury, G. A., Baliban, R. C. & Floudas, C. A. (2011). Proteome-wide post-translational modification statistics: frequency analysis and curation of the Swiss-Prot database. Sci. Rep. 1, 90.
Kim, D.-H. & Kim, M.-S. (2011). Hydrogenases for biological hydrogen production. Bioresour. Technol. 102, 8423–8431.
Klock, H. E., Koesema, E. J., Knuth, M. W. & Lesley, S. A. (2008). Combining the polymerase incomplete primer extension method for cloning and mutagenesis with microscreening to accelerate structural genomics efforts. Proteins, 71, 982–994.
Klock, H. E. & Lesley, S. A. (2009). The polymerase incomplete primer extension (PIPE) method applied to high-throughput cloning and site-directed mutagenesis. Methods Mol. Biol. 498, 91–103.
Kobe, B., Center, R. J., Kemp, B. E. & Poumbourios, P. (1999). Crystal structure of human T cell leukemia virus type 1 gp21 ectodomain crystallized as a maltose-binding protein chimera reveals structural evolution of retroviral transmembrane proteins. Proc. Natl Acad. Sci. USA, 96, 4319–4324.
Komander, D. & Rape, M. (2012). The ubiquitin code. Annu. Rev. Biochem. 81, 203–229.
Konermann, L., Pan, J. & Liu, Y.-H. (2011). Hydrogen exchange mass spectrometry for studying protein structure and dynamics. Chem. Soc. Rev. 40, 1224–1234.
Kong, L., Giang, E., Nieusma, T., Kadam, R. U., Cogburn, K. E., Hua, Y., Dai, X., Stanfield, R. L., Burton, D. R., Ward, A. B., Wilson, I. A. & Law, M. (2013). Hepatitis C virus E2 envelope glycoprotein core structure. Science, 342, 1090–1094.
Kong, L., Huang, C.-C., Coales, S. J., Molnar, K. S., Skinner, J., Hamuro, Y. & Kwong, P. D. (2010). Local conformational stability of HIV-1 gp120 in unliganded and CD4-bound states as defined by amide hydrogen/deuterium exchange. J. Virol. 84, 10311–10321.
Kong, L., Stanfield, R. & Wilson, I. (2014). HIV Glycans in Infection and Immunity, edited by R. Pantophlet, pp. 117–141. New York: Springer.
Kopera, E., Bal, W., Lenarčič Živkovič, M., Dvornyk, A., Kludkiewicz, B., Grzelak, K., Zhukov, I., Zagórski-Ostoja, W., Jaskolski, M. & Krzywda, S. (2014). Atomic resolution structure of a protein prepared by non-enzymatic His-tag removal. Crystallographic and NMR study of GmSPI-2 inhibitor. PLoS One, 9, e106936.
Kottke, T., Hegemann, P., Dick, B. & Heberle, J. (2006). The photochemistry of the light-, oxygen-, and voltage-sensitive domains in the algal blue light receptor phot. Biopolymers, 82, 373–378.
Krężel, A., Kopera, E., Protas, A. M., Poznański, J., Wysłouch-Cieszyńska, A. & Bal, W. (2010). Sequence-specific Ni(II)-dependent peptide bond hydrolysis for protein engineering. Combinatorial library determination of optimal sequences. J. Am. Chem. Soc. 132, 3355–3366.
Krishnamurthy, H. & Gouaux, E. (2012). X-ray structures of LeuT in substrate-free outward-open and apo inward-open states. Nature (London), 481, 469–474.
Krishnan, V. V. & Rupp, B. (2012). Macromolecular structure determination: comparison of X-ray crystallography and NMR spectroscopy. eLS, doi:10.1002/9780470015902.a0002716.pub2.
Kumar, S., Tsai, C.-J. & Nussinov, R. (2000). Factors enhancing protein thermostability. Protein Eng. Des. Sel. 13, 179–191.
Kurgan, L., Razib, A. A., Aghakhani, S., Dick, S., Mizianty, M. & Jahandideh, S. (2009). CRYSTALP2: sequence-based protein crystallization propensity prediction. BMC Struct. Biol. 9, 50.
Kwan, A. H., Mobli, M., Gooley, P. R., King, G. F. & Mackay, J. P. (2011). Macromolecular NMR spectroscopy for the non-spectroscopist. FEBS J. 278, 687–703.
Kyratsous, C. A. & Panagiotidis, C. A. (2012). Heat-shock protein fusion vectors for improved expression of soluble recombinant proteins in Escherichia coli. Methods Mol. Biol. 824, 109–129.
Kyratsous, C. A., Silverstein, S. J., DeLong, C. R. & Panagiotidis, C. A. (2009). Chaperone-fusion expression plasmid vectors for improved solubility of recombinant proteins in Escherichia coli. Gene, 440, 9–15.
Lawson, D. M., Artymiuk, P. J., Yewdall, S. J., Smith, J. M., Livingstone, J. C., Treffry, A., Luzzago, A., Levi, S., Arosio, P., Cesareni, G., Thomas, C. D., Shaw, W. V. & Harrison, P. M. (1991). Solving the structure of human H ferritin by genetically engineering intermolecular crystal contacts. Nature (London), 349, 541–544.
Layton, C. J. & Hellinga, H. W. (2011). Quantitation of protein–protein interactions by thermal stability shift analysis. Protein Sci. 20, 1439–1450.
Lazaridis, T. & Karplus, M. (2002). Thermodynamics of protein folding: a microscopic view. Biophys. Chem. 100, 367–395.
Leibly, D. J., Nguyen, T. N., Kao, L. T., Hewitt, S. N., Barrett, L. K. & Van Voorhis, W. C. (2012). Stabilizing additives added during cell lysis aid in the solubilization of recombinant proteins. PLoS One, 7, e52482.
Li, S.-C., Goto, N. K., Williams, K. A. & Deber, C. M. (1996). α-Helical, but not β-sheet, propensity of proline is determined by peptide environment. Proc. Natl Acad. Sci. USA, 93, 6676–6681.
Lieberman, R. L., Culver, J. A., Entzminger, K. C., Pai, J. C. & Maynard, J. A. (2011). Crystallization chaperone strategies for membrane proteins. Methods, 55, 293–302.
Lin, Z., Zhao, Q., Zhou, B., Xing, L. & Xu, W. (2015). Cleavable self-aggregating tags (cSAT) for protein expression and purification. Methods Mol. Biol. 1258, 65–78.
Liu, J. & Song, J. (2009). Insights into protein aggregation by NMR characterization of insoluble SH3 mutants solubilized in salt-free water. PLoS One, 4, e7805.
Liu, Y., Manna, A., Li, R., Martin, W. E., Murphy, R. C., Cheung, A. L. & Zhang, G. (2001). Crystal structure of the SarR protein from Staphylococcus aureus. Proc. Natl Acad. Sci. USA, 98, 6877–6882.
Liu, Z. & Huang, Y. (2014). Advantages of proteins being disordered. Protein Sci. 23, 539–550.
Liu, Z.-Q., Mahmood, T. & Yang, P.-C. (2014). Western blot: technique, theory and trouble shooting. N. Am. J. Med. Sci. 6, 160.
Longenecker, K. L., Garrard, S. M., Sheffield, P. J. & Derewenda, Z. S. (2001). Protein crystallization by rational mutagenesis of surface residues: Lys to Ala mutations promote crystallization of RhoGDI. Acta Cryst. D57, 679–688.
Longhi, S., Lieutaud, P. & Canard, B. (2010). Conformational disorder. Methods Mol. Biol. 609, 307–325.
Malakhova, O. A., Yan, M., Malakhov, M. P., Yuan, Y., Ritchie, K. J., Kim, K. I., Peterson, L. F., Shuai, K. & Zhang, D.-E. (2003). Protein ISGylation modulates the JAK-Stat signaling pathway. Genes Dev. 17, 455–460.
Mandel, C. R., Gebauer, D., Zhang, H. & Tong, L. (2006). A serendipitous discovery that in situ proteolysis is essential for the crystallization of yeast CPSF-100 (Ydh1p). Acta Cryst. F62, 1041–1045.
Marsh, E. N. (2014). Fluorinated proteins: from design and synthesis to structure and stability. Acc. Chem. Res. 47, 2878–2886.
Mateja, A., Devedjiev, Y., Krowarsch, D., Longenecker, K., Dauter, Z., Otlewski, J. & Derewenda, Z. S. (2002). The impact of Glu→Ala and Glu→Asp mutations on the crystallization properties of RhoGDI: the structure of RhoGDI at 1.3 Å resolution. Acta Cryst. D58, 1983–1991.
Mathiasen, S., Christensen, S. M., Fung, J. J., Rasmussen, S. G. F., Fay, J. F., Jorgensen, S. K., Veshaguri, S., Farrens, D. L., Kiskowski, M., Kobilka, B. & Stamou, D. (2014). Nanoscale high-content analysis using compositional heterogeneities of single proteoliposomes. Nature Methods, 11, 931–934.
McPherson, A. & Cudney, B. (2006). Searching for silver bullets: an alternative strategy for crystallizing macromolecules. J. Struct. Biol. 156, 387–406.
Means, G. E. (1977). Reductive alkylation of amino groups. Methods Enzymol. 47, 469–478.
Mellquist, J. L., Kasturi, L., Spitalnik, S. L. & Shakin-Eshleman, S. H. (1998). The amino acid following an Asn-X-Ser/Thr sequon is an important determinant of N-linked core glycosylation efficiency. Biochemistry, 37, 6833–6837.
Mills, K. V., Johnson, M. A. & Perler, F. B. (2014). Protein splicing: how inteins escape from precursor proteins. J. Biol. Chem. 289, 14498–14505.
Mizianty, M. J. & Kurgan, L. (2009). Meta prediction of protein crystallization propensity. Biochem. Biophys. Res. Commun. 390, 10–15.
Mizianty, M. J., Uversky, V. & Kurgan, L. (2014). Prediction of intrinsic disorder in proteins using MFDp2. Methods Mol. Biol. 1137, 147–162.
Monod, J., Wyman, J. & Changeux, J.-P. (1965). On the nature of allosteric transitions: a plausible model. J. Mol. Biol. 12, 88–118.
Moon, A. F., Mueller, G. A., Zhong, X. & Pedersen, L. C. (2010). A synergistic approach to protein crystallization: combination of a fixed-arm carrier with surface entropy reduction. Protein Sci. 19, 901–913.
Morar-Mitrica, S., Nesta, D. & Crotts, G. (2013). Differential scanning calorimetry (DSC) for biopharmaceutical development: old concepts, new applications. BioPharma Asia, 2(4), 44–55.
Moshe, A., Landau, M. & Eisenberg, D. (2016). Preparation of crystalline samples of amyloid fibrils and oligomers. Methods Mol. Biol. 1345, 201–210.
Mottonen, J., Strand, A., Symersky, J., Sweet, R. M., Danley, D. E., Geoghegan, K. F., Gerard, R. D. & Goldsmith, E. J. (1992). Structural basis of latency in plasminogen activator inhibitor-1. Nature (London), 355, 270–273.
Moukhametzianov, R., Burghammer, M., Edwards, P. C., Petitdemange, S., Popov, D., Fransen, M., McMullan, G., Schertler, G. F. X. & Riekel, C. (2008). Protein crystallography with a micrometre-sized synchrotron-radiation beam. Acta Cryst. D64, 158–166.
Mujtaba, S., He, Y., Zeng, L., Yan, S., Plotnikova, O., Sachchidanand, Sanchez, R., Zeleznik-Le, N. J., Ronai, Z. & Zhou, M.-M. (2004). Structural mechanism of the bromodomain of the coactivator CBP in p53 transcriptional activation. Mol. Cell, 13, 251–263.
Mulder, D. W., Shepard, E. M., Meuser, J. E., Joshi, N., King, P. W., Posewitz, M. C., Broderick, J. B. & Peters, J. W. (2011). Insights into [FeFe]-hydrogenase structure, mechanism, and maturation. Structure, 19, 1038–1052.
Nettleship, J. E., Watson, P. J., Rahman-Huq, N., Fairall, L., Posner, M. G., Upadhyay, A., Reddivari, Y., Chamberlain, J. M., Kolstoe, S. E., Bagby, S., Schwabe, J. W. & Owens, R. J. (2015). Transient expression in HEK 293 cells: an alternative to E. coli for the production of secreted and intracellular mammalian proteins. Methods Mol. Biol. 1258, 209–222.
Neutze, R. & Moffat, K. (2012). Time-resolved structural studies at synchrotrons and X-ray free electron lasers: opportunities and challenges. Curr. Opin. Struct. Biol. 22, 651–659.
Novikova, O., Topilina, N. & Belfort, M. (2014). Enigmatic distribution, evolution, and function of inteins. J. Biol. Chem. 289, 14490–14497.
Oldfield, C. J., Meng, J., Yang, J. Y., Yang, M. Q., Uversky, V. N. & Dunker, A. K. (2008). Flexible nets: disorder and induced fit in the associations of p53 and 14-3-3 with their partners. BMC Genomics, 9, S1.
Olson, M. A., Zabetakis, D., Legler, P. M., Turner, K. B., Anderson, G. P. & Goldman, E. R. (2015). Fusion to a highly stable consensus albumin binding domain allows for tunable pharmacokinetics. Protein Eng. Des. Sel. 28, 395–402.
Oltersdorf, T. et al. (2005). An inhibitor of Bcl-2 family proteins induces regression of solid tumours. Nature (London), 435, 677–681.
Overton, I. M., Padovani, G., Girolami, M. A. & Barton, G. J. (2008). ParCrys: a Parzen window density estimation approach to protein crystallization propensity prediction. Bioinformatics, 24, 901–907.
Overton, I. M., van Niekerk, C. A. & Barton, G. J. (2011). XANNpred: neural nets that predict the propensity of a protein to yield diffraction-quality crystals. Proteins, 79, 1027–1033.
Pace, C. N., Fu, H., Fryar, K. L., Landua, J., Trevino, S. R., Shirley, B. A., Hendricks, M. M., Iimura, S., Gajiwala, K., Scholtz, J. M. & Grimsley, G. R. (2011). Contribution of hydrophobic interactions to protein stability. J. Mol. Biol. 408, 514–528.
Pace, C. N., Fu, H. et al. (2014). Contribution of hydrogen bonds to protein stability. Protein Sci. 23, 652–661.
Pace, C. N. & Scholtz, J. M. (1998). Biophys. J. 75, 422–427.
Pace, C. N., Scholtz, J. M. & Grimsley, G. R. (2014). Forces stabilizing proteins. FEBS Lett. 588, 2177–2184.
Pantazatos, D., Kim, J. S., Klock, H. E., Stevens, R. C., Wilson, I. A., Lesley, S. A. & Woods, V. L. Jr (2004). Rapid refinement of crystallographic protein construct definition employing enhanced hydrogen/deuterium exchange MS. Proc. Natl Acad. Sci. USA, 101, 751–756.
Papaneophytou, C. P. & Kontopidis, G. (2014). Statistical approaches to maximize recombinant protein expression in Escherichia coli: a general review. Protein Expr. Purif. 94, 22–32.
Park, M. H., Nishimura, K., Zanelli, C. F. & Valentini, S. R. (2010). Functional significance of eIF5A and its hypusine modification in eukaryotes. Amino Acids, 38, 491–500.
Pauling, L., Corey, R. B. & Branson, H. R. (1951). The structure of proteins: two hydrogen-bonded helical configurations of the polypeptide chain. Proc. Natl Acad. Sci. USA, 37, 205–211.
Paulus, H. (2000). Protein splicing and related forms of protein autoprocessing. Annu. Rev. Biochem. 69, 447–496.
Pelton, J. T. & McLean, L. R. (2000). Spectroscopic methods for analysis of protein secondary structure. Anal. Biochem. 277, 167–176.
Privalov, P. L. & Dragan, A. I. (2007). Microcalorimetry of biological macromolecules. Biophys. Chem. 126, 16–24.
Prothero, J. W. (1966). Correlation between the distribution of amino acids and alpha helices. Biophys. J. 6, 367–370.
Pugach, P. et al. (2015). A native-like SOSIP.664 trimer based on an HIV-1 subtype B env gene. J. Virol. 89, 3380–3395.
Puthalakath, H., Burke, J. & Gleeson, P. A. (1996). Glycosylation defect in Lec1 Chinese hamster ovary mutant is due to a point mutation in N-acetylglucosaminyltransferase I gene. J. Biol. Chem. 271, 27818–27822.
Qing, G., Ma, L.-C., Khorchid, A., Swapna, G. V. T., Mal, T. K., Takayama, M. M., Xia, B., Phadtare, S., Ke, H., Acton, T., Montelione, G. T., Ikura, M. & Inouye, M. (2004). Cold-shock induced high-yield protein production in Escherichia coli. Nature Biotechnol. 22, 877–882.
Quistgaard, E. M. (2014). A disulfide polymerized protein crystal. Chem. Commun. 50, 14995–14997.
Rabut, G. & Peter, M. (2008). Function and regulation of protein neddylation. EMBO Rep. 9, 969–976.
Ramachandran, G. N., Ramakrishnan, C. & Sasisekharan, V. (1963). Stereochemistry of polypeptide chain configurations. J. Mol. Biol. 7, 95–99.
Reeves, P. J., Callewaert, N., Contreras, R. & Khorana, H. G. (2002). Structure and function in rhodopsin: high-level expression of rhodopsin with restricted and homogeneous N-glycosylation by a tetracycline-inducible N-acetylglucosaminyltransferase I-negative HEK293S stable mammalian cell line. Proc. Natl Acad. Sci. USA, 99, 13419–13424.
Rehm, T., Huber, R. & Holak, T. A. (2002). Application of NMR in structural proteomics: screening for proteins amenable to structural analysis. Structure, 10, 1613–1618.
Reich, S., Puckey, L. H., Cheetham, C. L., Harris, R., Ali, A. A. E., Bhattacharyya, U., Maclagan, K., Powell, K. A., Prodromou, C., Pearl, L. H., Driscoll, P. C. & Savva, R. (2006). Combinatorial domain hunting: an effective approach for the identification of soluble protein domains adaptable to high-throughput applications. Protein Sci. 15, 2356–2365.
Reinhard, L., Mayerhofer, H., Geerlof, A., Mueller-Dieckmann, J. & Weiss, M. S. (2013). Optimization of protein buffer cocktails using Thermofluor. Acta Cryst. F69, 209–214.
Remaut, H., Tang, C., Henderson, N. S., Pinkner, J. S., Wang, T., Hultgren, S. J., Thanassi, D. G., Waksman, G. & Li, H. (2008). Fiber formation across the bacterial outer membrane by the chaperone/usher pathway. Cell, 133, 640–652.
Rice, R. H., Means, G. E. & Brown, W. D. (1977). Stabilization of bovine trypsin by reductive methylation. Biochim. Biophys. Acta, 492, 316–321.
Richards, F. M. (1997). Protein stability: still an unsolved problem. Cell. Mol. Life Sci. 53, 790–802.
Richardson, J. S. (1981). The anatomy and taxonomy of protein structure. Adv. Protein Chem. 34, 167–339.
Ristic, M., Rosa, N., Seabrook, S. A. & Newman, J. (2015). Formulation screening by differential scanning fluorimetry: how often does it work? Acta Cryst. F71, 1359–1364.
Rogers, S., Wells, R. & Rechsteiner, M. (1986). Amino acid sequences common to rapidly degraded proteins: the PEST hypothesis. Science, 234, 364–368.
Ronda, L., Bruno, S. & Bettati, S. (2013). Tertiary and quaternary effects in the allosteric regulation of animal hemoglobins. Biochim. Biophys. Acta, 1834, 1860–1872.
Rosa, N., Ristic, M., Seabrook, S. A., Lovell, D., Lucent, D. & Newman, J. (2015). Meltdown: a tool to help in the interpretation of thermal melt curves acquired by differential scanning fluorimetry. J. Biomol. Screen. 20, 898–905.
Ruggiero, A., Smaldone, G., Squeglia, F. & Berisio, R. (2012). Enhanced crystallizability by protein engineering approaches: a general overview. Protein Pept. Lett. 19, 732–742.
Rupp, B. (2015). Origin and use of crystallization phase diagrams. Acta Cryst. F71, 247–260.
Sanchez-Ruiz, J. M. (1995). Differential scanning calorimetry of proteins. Subcell. Biochem. 24, 133–176.
Sawaya, M. R., Sambashivan, S., Nelson, R., Ivanova, M. I., Sievers, S. A., Apostol, M. I., Thompson, M. J., Balbirnie, M., Wiltzius, J. J., McFarlane, H. T., Madsen, A. O., Riekel, C. & Eisenberg, D. (2007). Atomic structures of amyloid cross-β spines reveal varied steric zippers. Nature (London), 447, 453–457.
Schrödinger, E. (1945). What is Life? The Physical Aspect of the Living Cell. New York: Macmillan.
Schuldt, L., Weyand, S., Kefala, G. & Weiss, M. S. (2009). The three-dimensional structure of a mycobacterial DapD provides insights into DapD diversity and reveals unexpected particulars about the enzymatic mechanism. J. Mol. Biol. 389, 863–879.
Seabrook, S. A. & Newman, J. (2013). High-throughput thermal scanning for protein stability: making a good technique more robust. ACS Comb. Sci. 15, 387–392.
Secchiero, P., Bosco, R., Celeghini, C. & Zauli, G. (2011). Recent advances in the therapeutic perspectives of nutlin-3. Curr. Pharm. Des. 17, 569–577.
Selzer, L., Kant, R., Wang, J. C., Bothner, B. & Zlotnick, A. (2015). Hepatitis B virus core protein phosphorylation sites affect capsid stability and transient exposure of the C-terminal domain. J. Biol. Chem. 290, 28584–28593.
Semisotnov, G. V., Rodionova, N. A., Razgulyaev, O. I., Uversky, V. N., Gripas’, A. F. & Gilmanshin, R. I. (1991). Study of the ‘molten globule’ intermediate state in protein folding by a hydrophobic fluorescent probe. Biopolymers, 31, 119–128.
Shi, M., Foo, S. Y., Tan, S.-M., Mitchell, E. P., Law, S. K. A. & Lescar, J. (2007). A structural hypothesis for the transition between bent and extended conformations of the leukocyte β₂ integrins. J. Biol. Chem. 282, 30198–30206.
Shimizu, K. (2014). POODLE: tools predicting intrinsically disordered regions of amino acid sequence. Methods Mol. Biol. 1137, 131–145.
Shumway, S. D., Maki, M. & Miyamoto, S. (1999). The PEST domain of IκBα is necessary and sufficient for in vitro degradation by μ-calpain. J. Biol. Chem. 274, 30874–30881.
Sikic, K. & Carugo, O. (2009). CARON – average RMSD of NMR structure ensembles. Bioinformation, 4, 132–133.
Slabinski, L., Jaroszewski, L., Rychlewski, L., Wilson, I. A., Lesley, S. A. & Godzik, A. (2007). XtalPred: a web server for prediction of protein crystallizability. Bioinformatics, 23, 3403–3405.
Sledz, P., Zheng, H., Murzyn, K., Chruszcz, M., Zimmerman, M. D., Chordia, M. D., Joachimiak, A. & Minor, W. (2010). New surface contacts formed upon reductive lysine methylation: improving the probability of protein crystallization. Protein Sci. 19, 1395–1404.
Smagghe, B. J., Huang, P.-S., Ban, Y.-E. A., Baker, D. & Springer, T. A. (2010). Modulation of integrin activation by an entropic spring in the β-knee. J. Biol. Chem. 285, 32954–32966.
Smialowski, P. & Frishman, D. (2010). Protein crystallizability. Methods Mol. Biol. 609, 385–400.
Smith, C. K., Withka, J. M. & Regan, L. (1994). A thermodynamic scale for the β-sheet forming tendencies of the amino acids. Biochemistry, 33, 5510–5517.
Smoot, A. L., Panda, M., Brazil, B. T., Buckle, A. M., Fersht, A. R. & Horowitz, P. M. (2001). The binding of bis-ANS to the isolated GroEL apical domain fragment induces the formation of a folding intermediate with increased hydrophobic surface not observed in tetradecameric GroEL. Biochemistry, 40, 4484–4492.
Smyth, D. R., Mrozkiewicz, M. K., McGrath, W. J., Listwan, P. & Kobe, B. (2003). Crystal structures of fusion proteins with large affinity tags. Protein Sci. 12, 1313–1322.
Somero, G. N. (2004). Adaptation of enzymes to temperature: searching for basic ‘strategies’. Comp. Biochem. Physiol. B Biochem. Mol. Biol. 139, 321–333.
Sørensen, H. P. & Mortensen, K. K. (2005). Advanced genetic strategies for recombinant protein expression in Escherichia coli. J. Biotechnol. 115, 113–128.
Spence, J. C., Weierstall, U. & Chapman, H. N. (2012). X-ray lasers for structural and dynamic biology. Rep. Prog. Phys. 75, 102601.
Spencer, M. L., Theodosiou, M. & Noonan, D. J. (2004). NPDC-1, a novel regulator of neuronal proliferation, is degraded by the ubiquitin/proteasome system through a PEST degradation motif. J. Biol. Chem. 279, 37069–37078.
Spiegel, H., Schinkel, H., Kastilan, R., Dahm, P., Boes, A., Scheuermayer, M., Chudobová, I., Maskus, D., Fendel, R., Schillberg, S., Reimann, A. & Fischer, R. (2015). Optimization of a multi-stage, multi-subunit malaria vaccine candidate for the production in Pichia pastoris by the identification and removal of protease cleavage sites. Biotechnol. Bioeng. 112, 659–667.
Spraggon, G., Pantazatos, D., Klock, H. E., Wilson, I. A., Woods, V. L. Jr & Lesley, S. A. (2004). On the use of DXMS to produce more crystallizable proteins: structures of the T. maritima proteins TM0160 and TM1171. Protein Sci. 13, 3187–3199.
Stickle, D. F., Presta, L. G., Dill, K. A. & Rose, G. D. (1992). Hydrogen bonding in globular proteins. J. Mol. Biol. 226, 1143–1159.
Striebel, F., Imkamp, F., Özcelik, D. & Weber-Ban, E. (2014). Pupylation as a signal for proteasomal degradation in bacteria. Biochim. Biophys. Acta, 1843, 103–113.
Studier, F. W. (1991). Use of bacteriophage T7 lysozyme to improve an inducible T7 expression system. J. Mol. Biol. 219, 37–44.
Sugase, K., Dyson, H. J. & Wright, P. E. (2007). Mechanism of coupled folding and binding of an intrinsically disordered protein. Nature (London), 447, 1021–1025.
Surma, M. A., Szczepaniak, A. & Króliczewski, J. (2014). Comparative studies on detergent-assisted apocytochrome b₆ reconstitution into liposomal bilayers monitored by zetasizer instruments. PLoS One, 9, e111341.
Tan, K. et al. (2014). Salvage of failed protein targets by reductive alkylation. Methods Mol. Biol. 1140, 189–200.
Tanner, J. J., Hecht, R. M. & Krause, K. L. (1996). Determinants of enzyme thermostability observed in the molecular structure of Thermus aquaticus D-glyceraldehyde-3-phosphate dehydrogenase at 2.5 Å resolution. Biochemistry, 35, 2597–2609.
Tegel, H., Tourle, S., Ottosson, J. & Persson, A. (2010). Increased levels of recombinant human proteins with the Escherichia coli strain Rosetta(DE3). Protein Expr. Purif. 69, 159–167.
Towbin, H., Staehelin, T. & Gordon, J. (1979). Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: procedure and some applications. Proc. Natl Acad. Sci. USA, 76, 4350–4354.
Uervirojnangkoorn, M., Zeldin, O. B., Lyubimov, A. Y., Hattne, J., Brewster, A. S., Sauter, N. K., Brunger, A. T. & Weis, W. I. (2015). Enabling X-ray free electron laser crystallography for challenging biological systems from a limited number of crystals. eLife, 4, e05421.
Uversky, V. N. & Dunker, A. K. (2010). Understanding protein non-folding. Biochim. Biophys. Acta, 1804, 1231–1264.
Vasina, J. A. & Baneyx, F. (1997). Expression of aggregation-prone recombinant proteins at low temperatures: a comparative study of the Escherichia coli cspA and tac promoter systems. Protein Expr. Purif. 9, 211–218.
Volkmann, G. & Iwaï, H. (2010). Protein trans-splicing and its use in structural biology: opportunities and limitations. Mol. Biosyst. 6, 2110–2121.
Wakatsuki, S. (2016). In Synchrotron Light Sources and Free-Electron Lasers, edited by E. Jaeschke, S. Khan, J. R. Schneider & J. B. Hastings. New York: Springer. In the press.
Walker, C. S., Shetty, R. P., Clark, K., Kazuko, S. G., Letsou, A., Olivera, B. M. & Bandyopadhyay, P. K. (2001). On a potential global role for vitamin K-dependent γ-carboxylation in animal systems: evidence for a γ-glutamyl carboxylase in Drosophila. J. Biol. Chem. 276, 7769–7774.
Wallace, B. A. & Janes, R. W. (2010). Synchrotron radiation circular dichroism (SRCD) spectroscopy: an enhanced method for examining protein conformations and protein interactions. Biochem. Soc. Trans. 38, 861–873.
Walls, D. & Loughran, S. T. (2011). Tagging recombinant proteins to enhance solubility and aid purification. Methods Mol. Biol. 681, 151–175.
Walter, T. S., Meier, C., Assenberg, R., Au, K. F., Ren, J., Verma, A., Nettleship, J. E., Owens, R. J., Stuart, D. I. & Grimes, J. M. (2006). Lysine methylation as a routine rescue strategy for protein crystallization. Structure, 14, 1617–1622.
Wang, J., Cao, Z., Zhao, L. & Li, S. (2011). Novel strategies for drug discovery based on intrinsically disordered proteins (IDPs). Int. J. Mol. Sci. 12, 3205–3219.
Warne, T., Serrano-Vega, M. J., Baker, J. G., Moukhametzianov, R., Edwards, P. C., Henderson, R., Leslie, A. G. W., Tate, C. G. & Schertler, G. F. X. (2008). Structure of a β₁-adrenergic G-protein-coupled receptor. Nature (London), 454, 486–491.
Weckert, E. (2015). The potential of future light sources to explore the structure and function of matter. IUCrJ, 2, 230–245.
Wells, J. A. & McClendon, C. L. (2007). Reaching for high-hanging fruit in drug discovery at protein–protein interfaces. Nature (London), 450, 1001–1009.
Wernimont, A. & Edwards, A. (2009). In situ proteolysis to generate crystals for structure determination: an update. PLoS One, 4, e5094.
Whisstock, J. C. & Bottomley, S. P. (2006). Molecular gymnastics: serpin structure, folding and misfolding. Curr. Opin. Struct. Biol. 16, 761–768.
Whisstock, J. C., Skinner, R., Carrell, R. W. & Lesk, A. M. (2000). Conformational changes in serpins. I. The native and cleaved conformations of α₁-antitrypsin. J. Mol. Biol. 296, 685–699.
Whiteheart, S. W., Shenbagamurthi, P., Chen, L., Cotter, R. J. & Hart, G. W. (1989). Murine elongation factor 1 alpha (EF-1 alpha) is posttranslationally modified by novel amide-linked ethanolamine-phosphoglycerol moieties. Addition of ethanolamine-phosphoglycerol to specific glutamic acid residues on EF-1 alpha. J. Biol. Chem. 264, 14334–14341.
Whitmore, L. & Wallace, B. A. (2008). Protein secondary structure analyses from circular dichroism spectroscopy: methods and reference databases. Biopolymers, 89, 392–400.
Whitmore, L., Woollett, B., Miles, A. J., Klose, D. P., Janes, R. W. & Wallace, B. A. (2011). PCDDB: the Protein Circular Dichroism Data Bank, a repository for circular dichroism spectral and metadata. Nucleic Acids Res. 39, D480–D486.
Willis, B. T. M. & Pryor, A. W. (1975). Thermal Vibrations in Crystallography. Cambridge University Press.
Wood, D. W. (2014). New trends and affinity tag designs for recombinant protein purification.Curr. Opin. Struct. Biol.26, 54–61. [PubMed]
Wright, P. E. & Dyson, H. J. (1999). Intrinsically unstructured proteins: re-assessing the protein structure–function paradigm.J. Mol. Biol.293, 321–331. [PubMed]
Wukovitz, S. W. & Yeates, T. O. (1995). Why protein crystals favour some space groups over others.Nature Struct. Mol. Biol.2, 1062–1067. [PubMed]
Yakimov, A., Rychkov, G. & Petukhov, M. (2014). De novo design of stable α-helices.Methods Mol. Biol.1216, 1–14. [PubMed]
Yamasaki, M., Li, W., Johnson, D. J. & Huntington, J. A. (2008). Crystal structure of a stable dimer reveals the molecular basis of serpin polymerization.Nature (London), 455, 1255–1258. [PubMed]
Yeh, A. P., McMillan, A. & Stowell, M. H. B. (2006). Rapid and simple protein-stability screens: application to membrane proteins.Acta Cryst. D62, 451–457. [PubMed]
Yumerefendi, H., Tarendeau, F., Mas, P. J. & Hart, D. J. (2010). ESPRIT: an automated, library-based method for mapping and soluble expression of protein domains from challenging targets.J. Struct. Biol.172, 66–74. [PubMed]
Yun, R. H., Anderson, A. & Hermans, J. (1991). Proline in α-helix: stability and conformation studied by dynamics simulation.Proteins, 10, 219–228. [PubMed]
Zhang, A. P., Bornholdt, Z. A., Liu, T., Abelson, D. M., Lee, D. E., Li, S., Woods, V. L. Jr & Saphire, E. O. (2012). The ebola virus interferon antagonist VP24 directly binds STAT1 and has a novel, pyramidal fold.PLoS Pathog.8, e1002550. [PMC free article] [PubMed]
Zhao, Q., Frederick, R., Seder, K., Thao, S., Sreenath, H., Peterson, F., Volkman, B. F., Markley, J. L. & Fox, B. G. (2004). Production in two-liter beverage bottles of proteins for NMR structure determination labeled with either ¹⁵N- or ¹³C–¹⁵N.J. Struct. Funct. Genomics, 5, 87–93. [PubMed]
Zhou, Q. et al. (2015). Architecture of the synaptotagmin–SNARE machinery for neuronal exocytosis.Nature (London), 525, 62–67. [PMC free article] [PubMed]
Zhou, X.-X., Wang, Y.-B., Pan, Y.-J. & Li, W.-F. (2008). Differences in amino acids composition and coupling patterns between mesophilic and thermophilic proteins.Amino Acids, 34, 25–33. [PubMed]
Zou, Y., Weis, W. I. & Kobilka, B. K. (2012). N-Terminal T4 lysozyme fusion facilitates crystallization of a G protein coupled receptor.PLoS One, 7, e46039. [PMC free article] [PubMed]

Articles from Acta Crystallographica. Section F, Structural Biology Communications are provided here courtesy of International Union of Crystallography

Published online 2013 Jan 21. doi: 10.1186/1471-2105-14-S2-S5

PMID: 23369171

Associated Data

Supplementary Materials

Additional file 1: M3131_Decreased and M3131_Increased contain the integrated training data M3131 separated into the negative (decreasing stability) and positive (increasing stability) datasets. iStable_Comparison_results presents the results obtained under different training conditions and the comparisons between predictors.

Additional file 2: Superfamily_M1311 and Superfamily_M1820 record the superfamilies corresponding to the PDB IDs in the M1311 and M1820 datasets. SF_DNA BINDING, SF_Enzyme and SF_Protein-protein-interaction list the PDB IDs belonging to the three major categories.

Abstract

Background

Mutation of a single amino acid residue can cause changes in a protein, which could then lead to a loss of protein function. Predicting protein stability changes can provide candidates for novel protein design. Although many prediction tools are available, conflicting results from different tools can confuse users.

Results

We propose an integrated predictor, iStable, built on a grid computing architecture and constructed using sequence information and the prediction results of different element predictors. Several machine learning methods were evaluated for the learning model, and the support vector machine was adopted as the integrator, rather than simply choosing the majority answer given by the element predictors. Furthermore, the role played by sequence information in our model was analyzed, and a window size of 11 was selected. iStable accepts two input types: structural and sequential. After training and cross-validation, iStable performs better than all of the element predictors on several datasets. Under different classifications and validation conditions, this study also shows better overall performance across secondary structure types, relative solvent accessibility classes, protein superfamilies and experimental conditions.

Conclusions

The trained and validated version of iStable provides an accurate approach for prediction of protein stability changes. iStable is freely available online at: http://predictor.nchu.edu.tw/iStable.

Background

Protein structure is closely related to protein function. A single amino acid mutation may cause a severe change in the whole protein structure and thus lead to disruption of function. A well-known example is sickle cell anemia, which is caused by a single mutation from glutamate to valine at the sixth position of the hemoglobin sequence, leading to abnormal polymerization of hemoglobin and distortion of the shape of red blood cells []. A single amino acid mutation can also change the structural stability of a protein by making a smaller free energy change (ΔG, or dG) after folding, while the difference in folding free energy change between wild-type and mutant protein (ΔΔG, or ddG) is often considered as an impact factor of protein stability changes []. From the viewpoint of protein design, it would be very helpful if researchers could accurately predict changes in protein stability resulting from amino acid mutations without actually doing experiments []. If the mechanism by which a single-site mutation influences protein stability could be revealed, protein designers might be able to design novel proteins or modify existing enzymes into more efficient, thermostable forms. Such forms are ideal for biochemical research and industrial applications in two ways: first, a thermostable enzyme can function well in a high-temperature environment and therefore show higher efficiency owing to the higher temperature; second, a structurally stable protein can have a longer half-life than relatively unstable ones, meaning a longer usage time, which economizes the use of enzymes.

As data on protein stability changes caused by residue mutations accumulated, a comprehensive and integrated database of protein thermodynamic parameters, ProTherm, was built and published. ProTherm can be queried through a web-based interface at http://gibk26.bio.kyutech.ac.jp/jouhou/protherm/protherm.html. All of the data in ProTherm have been validated by experiment and collected from published original articles. In this database, researchers can access information on the mutant protein, experimental methods and conditions, thermodynamic parameters, and the literature. Owing to the richness of its data, ProTherm has been a valuable resource for researchers trying to learn more about the protein folding mechanism and protein stability changes []. In the past decades, many methods have been designed for predicting protein stability changes. Some of these are based on physical potentials [-], some on statistical potentials [,-], and some on empirical approaches that combine physical and statistical potentials to infer how protein stability changes upon mutation [-]; still others are based on machine learning, converting energy and environment parameters into numerical inputs for methods such as support vector machines, neural networks, decision trees and random forests [-]. Many web-based prediction tools are now available, and each has its own capabilities and advantages, although none is perfect. When different predictors give conflicting results, it may be difficult for the user to decide which result is correct. An integrated predictor could relieve the user of this dilemma [].

In this study, we construct an integrated predictor, iStable, which uses a support vector machine (SVM) to predict protein stability changes upon single amino acid residue mutations. Integration of predictors helps to combine results from different predictors and use the power of meta predictions to perform better than any single method alone. Considering the effects of nonlocal interactions, most prediction methods need three-dimensional information on the protein in order to predict stability changes; however, recent research has proven that sequence information can also be used to effectively predict a mutation's effects [,-,-,]. We collected the prediction results from different types of predictors used for constructing iStable by submitting a compiled dataset to them, and applied the sequence information together as inputs for SVM training. When the user submits a new prediction task, iStable will determine whether the mutation is a stabilizing or destabilizing mutant. As previous works have mentioned, correctly predicting the direction of the stability change is more relevant than knowing its magnitude [,].

In the construction of iStable, five web-based prediction tools were chosen as element predictors: I-Mutant2.0 [20], MUPRO [22], AUTO-MUTE [30], PoPMuSiC2.0 [31], and CUPSAT [10]. From these predictors, seven models were chosen for in-model training, as described later. During iStable training, we found that the element predictors usually performed well on destabilizing mutations but less satisfactorily on stabilizing mutations, leading to a high specificity combined with a relatively low sensitivity. After training, we designed two different prediction strategies that accept two formats of input data. Both showed better prediction performance than all of the element predictors, which was especially apparent when predicting the effects of stabilizing mutations. Moreover, we undertook various analyses to evaluate iStable in order to make it more precise for user applications. The constructed iStable web-based tool, which provides two strategies for prediction, is available at http://predictor.nchu.edu.tw/iStable/.

Methods

Compilation of training datasets

The compilation of our training dataset can be divided into six steps, which are summarized in Figure 1.

Data processing of iStable. After collecting the two datasets used for training I-Mutant2.0 and PoPMuSiC2.0, we integrated them into a non-redundant dataset of protein stability change data, with the secondary structure and RSA value at the mutation site included.

Step 1 Collection of training data

Two datasets, collected from ProTherm, were used for our model training. The first is Capriotti's training set used for the construction of I-Mutant2.0 (available at http://gpcr2.biocomp.unibo.it/~emidio/I-Mutant2.0/dbMut3D.html), which includes data from 1948 mutation sites of 58 proteins and is referred to as dataset S1948 for convenience. The second is the dataset that Dehouck used to train PoPMuSiC2.0 (available at http://bioinformatics.oxfordjournals.org/cgi/content/full/btp445/DC1), which includes data from 2648 mutation sites of 119 proteins; this dataset is named S2648 for convenience.

Five types of information can be obtained from these two datasets:

1) The ID of the protein corresponds to its protein data bank (PDB) ID, which allows element predictors to obtain 3D information for proteins by getting the structure data (in PDB file format).

2) The site of mutation and the residues at that site in the native and mutant proteins.

3) The temperature used in the experiment.

4) The pH used in the experiment.

5) The relative stability change of mutant proteins (ddG or ΔΔG), an index of stability change that has been used in previous studies.

Step 2 Deletion of redundant data

In dataset S1948, many of the mutations share the same PDB IDs and have the same mutation site and ddG value, resulting in redundant data that may lead to biases in training. In addition to these redundant sites, some data still has the same PDB ID and mutation site, with only the pH and temperature differing slightly. We removed the redundant data and named the resulting dataset M1311, as there remained data from 1,311 mutations of 58 proteins.

The S2648 dataset shares the same PDB ID and mutation site information as M1311 for 815 mutations; we had to remove this data because we needed an unbiased training dataset. After having removed the redundant data, the remaining dataset was named M1820 and contained data from 1,820 mutations in 119 proteins.
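The two filtering operations in this step can be sketched as follows. This is a minimal illustration, not the authors' procedure: the record fields, the toy values and the choice to keep the first record among duplicates are all assumptions, since the text does not say which duplicate was retained.

```python
def mutation_key(record):
    """Identify a mutation by PDB ID and mutation string (assumed key)."""
    return (record["pdb_id"], record["mutation"])


def deduplicate(records):
    """Keep the first record seen for each (pdb_id, mutation) key."""
    unique = {}
    for record in records:
        unique.setdefault(mutation_key(record), record)
    return list(unique.values())


def remove_overlap(records, reference):
    """Drop records whose key already occurs in the reference dataset."""
    seen = {mutation_key(r) for r in reference}
    return [r for r in records if mutation_key(r) not in seen]


# Invented toy records; IDs and values are placeholders, not ProTherm data.
toy_first = [
    {"pdb_id": "xxx1", "mutation": "A10G", "ph": 7.0, "temp": 25.0, "ddg": -1.2},
    {"pdb_id": "xxx1", "mutation": "A10G", "ph": 7.2, "temp": 25.0, "ddg": -1.1},
    {"pdb_id": "xxx1", "mutation": "L45V", "ph": 7.0, "temp": 25.0, "ddg": 0.4},
]
toy_second = [
    {"pdb_id": "xxx1", "mutation": "A10G", "ph": 7.0, "temp": 25.0, "ddg": 1.2},
    {"pdb_id": "xxx2", "mutation": "K3E", "ph": 6.5, "temp": 30.0, "ddg": -0.7},
]

first_unique = deduplicate(toy_first)
second_unique = remove_overlap(deduplicate(toy_second), first_unique)
print(len(first_unique), len(second_unique))
```

Applied to the real data, the same two steps yield disjoint datasets, which is why the sizes of M1311 and M1820 add up to the 3131 entries of the combined dataset.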

Step 3 Definitions of positive and negative data

We defined the stabilizing data as positive (+) with a ddG value > 0, and the destabilizing data as negative (-) with a ddG value < 0; this convention for ddG is consistent with I-Mutant2.0 and AUTO-MUTE. PoPMuSiC2.0 uses a different convention for ddG, so we inverted the sign of ddG in M1820.
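A short sketch of the sign handling described above. The convention names are hypothetical labels chosen for this example, and the treatment of ddG = 0 (returned as unlabeled) is an assumption, because the definition in the text covers only strictly positive and strictly negative values.

```python
def to_common_convention(ddg, convention):
    """Return ddG in the convention where stabilizing mutations are positive.

    `convention` is a hypothetical tag: "stabilizing_positive" for sources
    that already follow this convention, "stabilizing_negative" for sources
    whose sign must be inverted.
    """
    if convention == "stabilizing_positive":
        return ddg
    if convention == "stabilizing_negative":
        return -ddg
    raise ValueError(f"unknown convention: {convention!r}")


def label(ddg):
    """Map ddG to '+' (stabilizing) or '-' (destabilizing)."""
    if ddg > 0:
        return "+"
    if ddg < 0:
        return "-"
    return None  # ddG == 0 is not covered by the definition in the text


print(label(to_common_convention(1.5, "stabilizing_negative")))
```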

Step 4 Correction of sequence information

To make our predictor more adaptable so that it can handle novel protein mutations, we also included sequence data into training datasets M1311 and M1820. The sequence information is presented as a segment of protein sequence centered on the mutated site, with window sizes ranging from 7 to 19 tested separately.

Since the position of a residue can be expressed as either absolute or relative, directly applying the FASTA text can lead to inconsistencies with the training data, which could cause problems when using I-Mutant2.0 and MUPRO. By manually checking the residue at each mutation site against the latest sequence text, we found several cases in which the numbering of the first residue differed between the relative and absolute positions, and corrected them to make the attached sequence information consistent with the training dataset; the final integrated dataset was called M3131. The datasets M1311, M1820 and M3131 are available in Additional file 1.
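The sequence segment centered on the mutated site can be extracted as in the sketch below. The function name and the 1-based site numbering are assumptions for illustration; padding with '-' beyond the sequence ends follows the end-flanking symbol mentioned in the encoding scheme.

```python
def sequence_window(sequence, site, window=11, pad="-"):
    """Return `window` residues centered on a 1-based mutation site.

    Positions that fall outside the sequence are filled with `pad`.
    """
    if window % 2 == 0:
        raise ValueError("window size must be odd so the site is centered")
    if not 1 <= site <= len(sequence):
        raise ValueError("site is outside the sequence")
    half = window // 2
    center = site - 1  # convert to a 0-based index
    residues = []
    for i in range(center - half, center + half + 1):
        residues.append(sequence[i] if 0 <= i < len(sequence) else pad)
    return "".join(residues)


# Toy sequence, not taken from the training data.
print(sequence_window("MKTAYIAKQR", 2, window=7))  # --MKTAY
print(sequence_window("MKTAYIAKQR", 5, window=7))  # KTAYIAK
```

A mismatch between relative and absolute numbering shows up here as a window whose central residue differs from the wild-type residue recorded for the mutation, which is one way to detect the inconsistencies described above.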

Step 5 Classification of secondary structure and relative solvent accessibility

Previous studies have mentioned the secondary structure and relative solvent accessibility (RSA) of the mutation site as factors that affect the accuracy of protein stability-change prediction [,]. We analyzed the distribution of data based on the secondary structure and RSA of the mutation site. Secondary structures were classified as helix (α helix), sheet (β sheet), or other (turn and coil). RSA was classified by range: values between 0% and 20% were classified as 'B' (buried), between 20% and 50% as 'P' (partially buried) and between 50% and 100% as 'E' (exposed). This RSA classification is based upon those used in previous studies [,].
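The RSA binning can be written as a small function. The text does not say which class the boundary values 20% and 50% belong to; assigning them to the upper class, as done here, is an assumption.

```python
def classify_rsa(rsa_percent):
    """Classify relative solvent accessibility (in percent) as B, P or E."""
    if not 0 <= rsa_percent <= 100:
        raise ValueError("RSA must be between 0 and 100 percent")
    if rsa_percent < 20:
        return "B"  # buried
    if rsa_percent < 50:
        return "P"  # partially buried (boundary handling is assumed)
    return "E"  # exposed


print([classify_rsa(v) for v in (5.0, 35.0, 80.0)])
```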

Step 6 Categorization of proteins

The motivation for predicting protein stability changes is to find a mechanism to modify existing enzymes into more stable forms. We accessed the PDB to determine which superfamilies the proteins in the training dataset belonged to and found three major categories: enzymes, nucleic acid binding proteins, and protein-protein interaction related (ubiquitin-related, for example). The dataset can be fetched in Additional file 2.

Element predictors

Five element predictors were chosen:

1. I-Mutant2.0 adopts an SVM model to approximate the ddG value of the protein and predicts the direction of stability change. Both sequence (I-Mutant_SEQ) and structure (I-Mutant_PDB) information is used in iStable construction.

2. AUTO-MUTE computes the environmental disturbance caused by a single amino acid replacement. From the four models of prediction available in AUTO-MUTE, we chose the random forest (RF) (AUTO-MUTE_RF) and support vector machine (AUTO-MUTE_SVM) strategies for our model construction.

3. MUPRO adopts an SVM model to predict stability changes due to single-site mutations, primarily from sequential information, along with the use of optionally provided structural information. The result predicts only whether the change will lead to destabilization or not, without providing an actual ddG value. During the construction of iStable, we found that the regression task and the neural network approaches were broken. We used the SVM model (MUPRO_SVM) as an element predictor.

4. PoPMuSiC2.0 applies an energy-based function and uses the volume change of a protein upon single amino acid mutation to predict the stability change.

5. CUPSAT predicts protein stability changes using structural environment-specific atom potentials and torsion angle potentials. The user can submit predictions by typing in the PDB ID or uploading a custom PDB file.

Summaries of the element predictors are given in Table 1.

Table 1

Predictors	References	URLs
I-Mutant2.0	[20]	http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi
AUTO-MUTE	[30]	http://proteins.gmu.edu/automute/
MUPRO	[22]	http://www.ics.uci.edu/~baldig/mutation.html
PoPMuSiC2.0	[31]	http://babylone.ulb.ac.be/popmusic/
CUPSAT	[10]	http://cupsat.tu-bs.de/

List of chosen predictors used in the construction of iStable with the corresponding references and URLs.

Obtaining prediction results from element predictors

When using I-Mutant2.0, in addition to the PDB ID, the sequential strategy (I-Mutant_SEQ) was also applied, by choosing the direction-deciding prediction strategy; in the output form, we extracted the stability-change direction. When submitting to AUTO-MUTE, we entered the PDB ID, mutation, temperature, pH value, and chain code (if available). The prediction results using RF and SVM were collected separately; we extracted the direction of stability change (decreased/increased) in the output form. Since MUPRO uses protein sequence as its input information, we obtained the sequence from a FASTA file downloaded beforehand, pasted the sequence into the input form, and designated the site of mutation and the mutated amino acid code. The output form gives the user three types of prediction results, and we took all of them into consideration. For some reason, the regression and neural network models on the website did not work when iStable was constructed; the regression model always gave a result of 'INCREASE', and the neural network predictor always gave 'DECREASE'. At present, only the SVM strategy is used in the construction of iStable. PoPMuSiC2.0 accepts PDB ID, chain code (if available), and site information as input data; the predicted ddG is then extracted. CUPSAT accepts either a PDB ID or an uploaded PDB file in order to predict changes in stability, and we chose to upload the PDB file. We obtained the secondary structure, the relative solvent accessibility of the mutated site, and the predicted ddG value. All of the work described was carried out with a Java program.

Encoding schemes of support vector machine

After comparison with various algorithms, SVM was selected as the learning model for iStable. Protein stability changes upon mutation can be predicted using structural and sequential information, as in previous studies. In our research, we used the prediction results from the element predictors as input data, with local sequence information included. The SVM converts the data into a multi-dimensional vector. After distributing the data in multi-dimensional space, the SVM determines a hyperplane that splits the data into different groups. The trained integrated predictor iStable uses the SVM to predict the direction of stability change for the input protein data, that is, to determine whether the target is a stabilizing or a destabilizing mutant. In this work, we used LIBSVM (Library for Support Vector Machines) 2.89 [] to implement the SVMs, with the radial basis function (RBF) as the kernel. During training, two crucial parameters were tuned to optimize prediction performance: the kernel parameter γ and the penalty parameter C. The values of γ and C were tuned to 0.03125 and 2, respectively.
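For reference, the two tuned parameters enter the standard soft-margin SVM with an RBF kernel as shown below. This is the textbook formulation, given here as background rather than as a statement of the authors' exact setup; the symbols $x_i$, $y_i$, $w$, $b$, $\xi_i$ and $\phi$ are the usual ones and do not appear in the original text.

```latex
% RBF kernel with the tuned width parameter \gamma = 0.03125 = 2^{-5}
K(x_i, x_j) = \exp\!\left(-\gamma \,\lVert x_i - x_j \rVert^{2}\right)

% Soft-margin objective with the tuned penalty parameter C = 2
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0
```

A larger C penalizes training errors more heavily, while a larger γ makes the kernel more local; both tuned values are powers of two, consistent with a grid search over exponents, although the text does not state how the search was run.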

When encoding our training data into the form used by the SVM, the input data was constructed using two schemes: the sequence scheme and the website result scheme. In the sequence scheme, we converted sequences into several sets of 21-symbol coded input, namely, the 20 amino acid codes and an extra input representing the end-flanking fragment (e.g., '-'DCAMYW); one further set of 21 inputs was used to represent the mutant residue after the mutation, so the sequence scheme had (21 × ('window size' + 1)) inputs altogether. The website result scheme had seven sets of input (I-Mutant_PDB, I-Mutant_SEQ, AUTO-MUTE_RF, AUTO-MUTE_SVM, MUPRO_SVM, PoPMuSiC2.0 and CUPSAT) representing the prediction results of the element predictors, each shown as a set of three inputs, with destabilizing results represented as '1-0-0' and stabilizing results represented as '0-0-1'. As some prediction queries were not accessible for a specific site, we recorded this type of result as a null prediction, represented as '0-1-0'. The trained predictor was evaluated with 5-fold cross-validation: the training dataset was split into five groups, with four groups used as training sets and one as the testing set by turns.
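The two schemes can be combined into a single input vector as sketched below. The symbol order, the order in which blocks are concatenated, the result labels and the function names are assumptions made for illustration; only the block sizes and the three result codes come from the text.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SYMBOLS = AMINO_ACIDS + "-"  # 20 residues plus the end-flanking symbol

PREDICTORS = [
    "I-Mutant_PDB", "I-Mutant_SEQ", "AUTO-MUTE_RF", "AUTO-MUTE_SVM",
    "MUPRO_SVM", "PoPMuSiC2.0", "CUPSAT",
]

# Three-input codes for each element predictor's result.
RESULT_CODES = {
    "decrease": [1, 0, 0],  # destabilizing
    "null": [0, 1, 0],      # no prediction available
    "increase": [0, 0, 1],  # stabilizing
}


def one_hot(symbol):
    """Encode one residue (or '-') as 21 binary inputs."""
    vector = [0] * len(SYMBOLS)
    vector[SYMBOLS.index(symbol)] = 1
    return vector


def encode(window, mutant_residue, predictions):
    """Build the SVM input from a sequence window and predictor results."""
    features = []
    for symbol in window:
        features.extend(one_hot(symbol))
    features.extend(one_hot(mutant_residue))
    for name in PREDICTORS:
        # A predictor with no result is recorded as a null prediction.
        features.extend(RESULT_CODES[predictions.get(name, "null")])
    return features


example = encode(
    "--MKTAYIAKQ", "V",
    {"I-Mutant_PDB": "decrease", "CUPSAT": "increase"},
)
print(len(example))  # 21 * (11 + 1) + 7 * 3 = 273
```

With an 11-residue window this gives 252 sequence inputs and 21 website result inputs, so each mutation is represented by a 273-dimensional binary vector.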

After iStable was constructed using all of the schemes, we designed another model of predictor integration, named iStable_SEQ, primarily for users handling protein sequences where no PDB ID is available. The iStable_SEQ model was constructed using a sequence scheme and using only the results of I-Mutant_SEQ and MUPRO_SVM of the website scheme, both of which use protein sequences as their inputs for prediction queries. The iStable_SEQ was also trained and validated with 5-fold cross-validation.
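The 5-fold rotation used to evaluate both models can be sketched as follows. Shuffling with a fixed seed and assigning samples to folds by stride are assumptions; the text states only that the data was split into five groups used as the testing set by turns.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(3131))
for train, test in splits:
    print(len(train), len(test))
```

Every sample appears in exactly one testing fold, so the reported measures are computed over predictions for the whole dataset, each made by a model that did not see that sample during training.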

Framework of integrated predictor construction

Figure 2 gives a brief overview of iStable's grid computing architecture. The predictor can be divided into three layers: the predictor layer, the coordinator layer, and the data visualization layer.

Grid computing architecture of iStable. When a user inputs the mutant protein's information through the graphical user interface, the input/output dispatcher passes the relevant information to the element predictors. After the results from the predictors are collected in the repository module, the prediction module activates the prediction program and the output is sent to the data visualization layer through the input/output dispatcher; finally, the integrated result is presented to the user.

Predictor layer

It is the source of data needed for data integration, which, in this article, refers to the element predictors used: I-Mutant_PDB, I-Mutant_SEQ, AUTO-MUTE_RF, AUTO-MUTE_SVM, MUPRO_SVM, PoPMuSiC2.0 and CUPSAT.

A. Adapter: The interface uses the Java HttpUnit suite to convert information between the input data and the predictors' input formats.

B. Website: I-Mutant_PDB, I-Mutant_SEQ, AUTO-MUTE_RF, AUTO-MUTE_SVM, MUPRO_SVM, PoPMuSiC2.0 and CUPSAT.

Data visualization layer

It is the layer to present a graphical user interface (GUI) and output the prediction result, which can be divided into two modules:

A. GUI: Through the use of a JSP website and JavaScript, it provides users with an interface for inputs and results in webpage form.

B. Result visualization: A Java program, responsible for integrating the prediction result and adding webpage tags for result output.

Coordinator layer

It is the coordinator between the predictor and data visualization layers. As users input parameters through the visualization layer GUI, the coordinator layer can receive the parameters and send them to the predictor layer at the same time. It can then receive results from the predictor layer to complete the prediction of stability change. The coordinator layer can be divided into three modules:

A. Prediction: executes prediction mechanism using the SVM method described before.

B. Repository: deposits the prediction results from the element predictors.

C. I/O Dispatcher: responsible for sequential actions after receiving parameters from users; collects results from element predictors, deposits data, and coordinates the prediction work.

Prediction progress of iStable

Figure 3 illustrates the iStable prediction workflow. When a user inputs a query with protein mutant information, the program first accesses the PDB and gets the structure data and the amino acid sequence. After the structural and sequential information is gathered, the program gets an 11-residue sequence window centered on the mutated site and converts it into 11 sets of sequential code with 21 inputs each; the mutant residue is converted into an extra set of sequential code. Meanwhile, the structural (PDB code and PDB file) and sequential (FASTA sequence) information is used to submit prediction queries to the seven element predictors, whose results are then converted into seven sets of 3-input website result schemes. After both parts of the SVM input are converted, the support vector machine processes them and gives the prediction result as the output of iStable.

Workflow of iStable. Illustration of how iStable prediction proceeds after the user has input the data for the target protein of interest.

Performance assessment

Correct predictions of positive and negative data have different meanings because the effects of mutation are not always detrimental to protein function. One of the purposes of predicting protein stability change is to identify mechanisms of structural stability change upon single amino acid mutation; another goal is to apply this knowledge to protein design in order to modify protein into more stable and thermal-tolerant forms. Since it is equally important to understand the mechanisms underlying stabilizing and destabilizing mutations, we expect an integrated predictor to make correct predictions in both cases. Since the minority result could be the right answer, we want to prove that iStable, with SVM training, would know right from wrong and not just pick the majority answer. In addition, Accuracy (Acc), sensitivity (Sn), specificity (Sp), and the Matthews correlation coefficient (MCC) were used to evaluate the predictive ability of each system. Four measures were defined:

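$$\mathrm{Sn} = \frac{TP}{TP + FN}, \qquad \mathrm{Sp} = \frac{TN}{TN + FP},$$

$$\mathrm{Acc} = \frac{TP + TN}{TP + FP + TN + FN}$$

and

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}},$$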

where TP, FP, FN and TN are true positives, false positives, false negatives, and true negatives, respectively. Sn and Sp represent the rate of true positives and true negatives respectively. Acc is the overall accuracy of prediction. Additionally, MCC is a measure of the quality of the classifications, and the value may range between -1 (an inverse prediction) and +1 (a perfect prediction), with 0 denoting a random prediction.
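The four measures can be computed from the confusion-matrix counts as in the sketch below, which uses their standard definitions: Sn = TP/(TP+FN), Sp = TN/(TN+FP), Acc = (TP+TN)/(TP+FP+FN+TN), and MCC as the usual correlation form. Returning 0.0 when a denominator is zero is a convention chosen for this example, and the counts are invented, not taken from the paper.

```python
import math


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is 0."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Compute Sn, Sp, Acc and MCC from confusion-matrix counts."""
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "Sn": safe_ratio(tp, tp + fn),
        "Sp": safe_ratio(tn, tn + fp),
        "Acc": safe_ratio(tp + tn, tp + fp + fn + tn),
        "MCC": safe_ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Invented counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=130)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because stabilizing mutations are the minority class, a predictor that labels almost everything as destabilizing can still reach a high Acc and Sp; Sn and MCC expose that behavior, which is why all four measures are reported.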

Results and discussion

Performance on the M1311, M1820 and M3131 datasets

After construction of the integrated predictor iStable, we first compared the performances of iStable and the element predictors using two different datasets. The results are presented separately in Tables 2 (M1311) and 3 (M1820). In both datasets, iStable showed clear improvement in sensitivity, accuracy and MCC. The performance using dataset M1820 is worth mentioning. While the other predictors showed sensitivity values no higher than 0.370 and MCC values no higher than 0.352, iStable reached a sensitivity score of 0.456 and an MCC score of 0.402. During our observations, we found that the element predictors made many more 'negative' predictions than 'positive' ones, leading to high specificity but universally low sensitivity for the element predictors.
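For comparison, the 'Majority Voting' baseline listed in the tables can be sketched as below. The handling of null predictions (ignored) and of ties (resolved as 'decrease') is an assumption, because the text does not describe these details.

```python
from collections import Counter


def majority_vote(predictions, tie_break="decrease"):
    """Return the direction predicted by most element predictors.

    Entries other than "increase" or "decrease" (for example null
    predictions) are ignored; ties fall back to `tie_break`.
    """
    votes = Counter(p for p in predictions if p in ("increase", "decrease"))
    if votes["increase"] > votes["decrease"]:
        return "increase"
    if votes["decrease"] > votes["increase"]:
        return "decrease"
    return tie_break


print(majority_vote(["decrease"] * 4 + ["increase"] * 3))
print(majority_vote(["increase", "increase", "null", "decrease"]))
```

Since the element predictors lean towards 'negative' predictions, a simple vote inherits that bias, whereas a trained integrator can learn cases in which the minority answer is the correct one.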

Table 2

Predictors	Sn	Sp	Acc	MCC
iStable	0.944	0.981	0.969	0.930
I-Mutant_PDB	0.555	0.922	0.800	0.530
I-Mutant_SEQ	0.702	0.973	0.883	0.734
AUTO-MUTE_RF	0.893	0.991	0.958	0.906
AUTO-MUTE_SVM	0.772	0.975	0.907	0.789
MUPRO_SVM	0.775	0.956	0.896	0.761
PoPMuSiC2.0	0.313	0.941	0.724	0.341
CUPSAT	0.579	0.823	0.742	0.411
Majority Voting	0.737	0.984	0.902	0.779

I-Mutant_PDB: I-Mutant2.0 prediction strategy using PDB ID.

I-Mutant_SEQ: I-Mutant2.0 prediction strategy using protein sequence.

AUTO-MUTE_RF: AUTO-MUTE Random Forest prediction model.

AUTO-MUTE_SVM: AUTO-MUTE SVM prediction model.

MUPRO_SVM: MUPRO SVM prediction model.

Table 3

Predictors	Sn	Sp	Acc	MCC
iStable	0.456	0.900	0.752	0.409
I-Mutant_PDB	0.198	0.906	0.670	0.148
I-Mutant_SEQ	0.212	0.899	0.670	0.155
AUTO-MUTE_RF	0.129	0.985	0.700	0.234
AUTO-MUTE_SVM	0.067	0.965	0.666	0.072
MUPRO_SVM	0.276	0.885	0.682	0.206
PoPMuSiC2.0	0.303	0.952	0.736	0.352
CUPSAT	0.370	0.757	0.628	0.133
Majority Voting	0.113	0.984	0.693	0.212

In line with our objective, we wanted to construct a predictor that performs well on both positive and negative data. The MCC values show that iStable has the best overall performance on M1311; the results obtained from M1820 show that the performances of the element predictors are lower than those on M1311, especially in the case of I-Mutant2.0, AUTO-MUTE and MUPRO. This may be related to the training datasets used in their construction; the training data for MUPRO was extracted from Capriotti's training set S1615 for neural networks, and AUTO-MUTE's training data was extracted and edited from S1948, originally the same as that of I-Mutant2.0. As the M1311 dataset is similar to their training datasets, the three element predictors showed performances consistent with those from their training. The performances on M1820 indicate that these three element predictors might perform less well on new data not employed during previous training. Consistent with the fact that the M1820 dataset was extracted from PoPMuSiC2.0's training data S2648, we observed that the performance of PoPMuSiC2.0 on M1820 was slightly better than on M1311 (MCC 0.352 versus 0.341). We tried different dataset sources, and iStable showed better overall prediction performance than every element predictor. When using the same training data, iStable still showed clear improvements in performance, especially with stabilizing mutants.

After comparing the performances of iStable and the element predictors on two datasets, we wanted to test whether training iStable with a larger amount of data would give the integrated predictor a stronger capacity to deal with new data. We checked the performances of all the predictors with the mixed dataset M3131; the results are shown in Table 4. The specificity of iStable is lower than that of several element predictors; however, its overall performance is still better than that of any element predictor. Table 4 also shows that the integrated predictor iStable performed clearly better on positive data, with the highest sensitivity among all of the predictors.

Table 4

Predictors	Sn	Sp	Acc	MCC
iStable	0.688	0.941	0.857	0.669
I-Mutant_PDB	0.377	0.916	0.736	0.357
I-Mutant_SEQ	0.457	0.934	0.775	0.464
AUTO-MUTE_RF	0.511	0.989	0.829	0.615
AUTO-MUTE_SVM	0.420	0.969	0.786	0.499
MUPRO_SVM	0.526	0.908	0.780	0.480
PoPMuSiC2.0	0.308	0.945	0.733	0.348
CUPSAT	0.474	0.780	0.678	0.261
Majority Voting	0.425	0.980	0.795	0.527

To validate iStable, we compared it with other combination methods, i.e., radial basis function network (RBFN), random forest (RF), neural network (NN), Bayesian network (BN), and majority voting (MV) [33], with respect to predicting protein stability changes in dataset M3131 (Table 5). The MCCs of iStable, RF, and NN are all over 0.6; those of BN and MV are both between 0.5 and 0.6; and that of RBFN is below 0.5. Neither the Sn nor the Sp of iStable is the highest among the combination methods; even so, iStable showed the best overall performance in integrating off-the-shelf predictors for protein stability changes.

Table 5

WS+SEQ				SEQ
Methods	Sn	Sp	Acc	MCC	Sn	Sp	Acc	MCC
iStable	0.688	0.941	0.857	0.669	0.625	0.906	0.812	0.564
RBFN	0.752	0.764	0.760	0.495	0.583	0.759	0.700	0.337
RF	0.694	0.910	0.838	0.627	0.630	0.894	0.806	0.550
NN	0.584	0.965	0.838	0.627	0.741	0.605	0.651	0.327
BN	0.685	0.888	0.820	0.588	0.649	0.868	0.795	0.529
MV	0.425	0.980	0.795	0.527	N/A	N/A	N/A	N/A

SEQ: Sequence scheme; WS: Website result scheme

iStable was also trained and validated, using support vector regression, to predict the value of the free energy change (ddG) by integrating the ddG values fetched from I-Mutant_PDB, AUTO-MUTE, PoPMuSiC, and CUPSAT. The correlation between the predicted and the observed ddG is 0.86, with a standard error of 1.5 kcal/mol, when the method is structure based (Figure 4). In the sequence-based mode, on the other hand, only I-Mutant_SEQ provides a predicted ddG value; therefore, iStable_SEQ simply reports the ddG value generated by I-Mutant_SEQ.

Figure 4. Evaluation of predicted ddG. Correlation plot of the experimentally observed and the predicted values of ddG based on iStable.
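An evaluation of this kind can be sketched as follows. The code computes the Pearson correlation between observed and predicted ddG values and, as an assumed stand-in for the reported standard error, the root-mean-square error; the text does not state which error definition was used. The function names and the four ddG values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(observed, predicted):
    """Root-mean-square error, in the same unit as the inputs (kcal/mol)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical ddG values in kcal/mol, for illustration only.
observed_ddg = [-2.0, -1.0, 0.0, 1.0]
predicted_ddg = [-1.5, -1.0, 0.5, 1.0]
r = pearson_r(observed_ddg, predicted_ddg)
error = rmse(observed_ddg, predicted_ddg)
```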

Evaluation of sequence scheme

After comparing the performances of iStable and the element predictors with the integrated […]

Table 6

Strategies	Sn	Sp	Acc	MCC
SEQ+WS	0.688	0.941	0.857	0.669
WS only	0.627	0.960	0.849	0.652
SEQ+WS, without AUTO-MUTE_RF	0.658	0.925	0.836	0.622
WS only, without AUTO-MUTE_RF	0.701	0.745	0.731	0.484

SEQ: Sequence scheme; WS: Website result scheme

Performance of the iStable_SEQ strategy with M3131

For users with novel proteins that lack available structural information, iStable provides a prediction strategy that takes amino acid sequences as inputs. The prediction results are presented in Table 7. By integrating the results of the sequence-based prediction models of I-Mutant2.0 and MUPRO with an extra sequence scheme, the iStable_SEQ model performed noticeably better than either of the two models it integrates.

Table 7

Performance comparison of iStable_SEQ and sequential models

Predictors	Sn	Sp	Acc	MCC
iStable_SEQ	0.625	0.906	0.812	0.564
I-Mutant_SEQ	0.457	0.934	0.775	0.464
MUPRO_SVM	0.526	0.908	0.780	0.480

Structural analysis of predictors' performances

As mentioned, the secondary structure and RSA of the mutated site could influence a predictor's performance. Therefore, we analyzed the performance of iStable with mutations within different secondary structures and RSA ranges, and compared the results with those of the element predictors. The results obtained from different kinds of mutants are presented in Tables 8 and 9. With respect to secondary structure, iStable showed the best prediction performance among all the predictors. Its performance for mutants in secondary structures other than helices and sheets was relatively lower than for these two structures, which may be due to the irregular structures of loops and turns. Performance with β sheets showed a higher MCC than with helix and coil/turn structures, which is consistent with previous research. This may be caused by the presence of residues in β-strand segments that are close in space but far apart in sequence. When analyzing the performance of iStable for different RSA ranges, we found that iStable had the highest MCC among the predictors in buried (63.4%), partially buried (68.4%) and exposed (71.2%) regions. iStable showed high performance in the partially buried region (68.4%), which is consistent with Dr. Gromiha's previous research: the sequence and structure information of partially buried mutations was very important for predicting stability changes, but did not show a very high correlation for buried mutations. On the other hand, Dr. Gromiha indicated that buried mutations within β-strand segments correlated better than did those in α-helical segments; iStable, therefore, achieved higher sensitivity than the element predictors for buried mutations.

Table 8

Comparison of performance based on secondary structure

Predictors (grouped by secondary structure)	Sn	Sp	Acc	MCC
Helix
iStable	0.702	0.933	0.850	0.666
I-Mutant_PDB	0.415	0.901	0.728	0.371
I-Mutant_SEQ	0.520	0.929	0.784	0.509
AUTO-MUTE_RF	0.563	0.987	0.834	0.647
AUTO-MUTE_SVM	0.495	0.957	0.792	0.536
MUPRO_SVM	0.639	0.915	0.818	0.591
PoPMuSiC2.0	0.250	0.957	0.708	0.311
CUPSAT	0.541	0.778	0.693	0.323
Sheet
iStable	0.691	0.946	0.876	0.676
I-Mutant_PDB	0.348	0.944	0.782	0.385
I-Mutant_SEQ	0.495	0.948	0.825	0.520
AUTO-MUTE_RF	0.455	0.984	0.838	0.567
AUTO-MUTE_SVM	0.297	0.996	0.805	0.426
MUPRO_SVM	0.417	0.904	0.770	0.370
PoPMuSiC2.0	0.310	0.956	0.776	0.363
CUPSAT	0.417	0.796	0.697	0.213
Other
iStable	0.680	0.943	0.847	0.666
I-Mutant_PDB	0.365	0.893	0.699	0.311
I-Mutant_SEQ	0.358	0.924	0.716	0.354
AUTO-MUTE_RF	0.479	0.995	0.805	0.595
AUTO-MUTE_SVM	0.386	0.954	0.745	0.434
MUPRO_SVM	0.485	0.900	0.748	0.433
PoPMuSiC2.0	0.330	0.889	0.688	0.270
CUPSAT	0.474	0.766	0.662	0.249

Helix: α helix; Sheet: β sheet; Other: turns and coil.

Table 9

Predictors (grouped by RSA range)	Sn	Sp	Acc	MCC
Buried
iStable	0.640	0.946	0.869	0.634
I-Mutant_PDB	0.197	0.942	0.757	0.208
I-Mutant_SEQ	0.394	0.947	0.809	0.428
AUTO-MUTE_RF	0.387	0.988	0.839	0.528
AUTO-MUTE_SVM	0.254	0.989	0.806	0.403
MUPRO_SVM	0.445	0.922	0.803	0.423
PoPMuSiC2.0	0.201	0.969	0.778	0.285
CUPSAT	0.381	0.822	0.714	0.209
Partially buried
iStable	0.684	0.954	0.854	0.684
I-Mutant_PDB	0.458	0.911	0.746	0.427
I-Mutant_SEQ	0.537	0.940	0.792	0.542
AUTO-MUTE_RF	0.604	0.981	0.843	0.665
AUTO-MUTE_SVM	0.510	0.967	0.799	0.566
MUPRO_SVM	0.508	0.905	0.759	0.460
PoPMuSiC2.0	0.146	0.963	0.667	0.189
CUPSAT	0.536	0.781	0.692	0.323
Exposed
iStable	0.782	0.920	0.853	0.712
I-Mutant_PDB	0.527	0.818	0.683	0.363
I-Mutant_SEQ	0.502	0.927	0.728	0.480
AUTO-MUTE_RF	0.598	0.993	0.807	0.653
AUTO-MUTE_SVM	0.565	0.933	0.760	0.543
MUPRO_SVM	0.665	0.902	0.788	0.587
PoPMuSiC2.0	0.439	0.857	0.658	0.329
CUPSAT	0.513	0.661	0.592	0.177

The influence of window size on predictor performance

In previous research on constructing novel predictors, investigators have tried different lengths of protein sequence centered on the mutated site. MUPRO chose 7 as the best window size, while I-Mutant2.0 chose 19. We compared the performances of iStable with different window sizes using the sequence scheme. The result of the comparison is shown in Figure 5. As shown, a window size of 11 amino acids centered on the mutated site performed best in terms of both accuracy (85.7%) and MCC (66.9%). Based on this comparison, a window size of 11 was selected for use in the sequence scheme of iStable.

Figure 5. Performance with different window sizes. In terms of both accuracy and MCC, a window size of 11 showed the best performance.

Performance with different protein superfamilies and experimental conditions

Protein structure is closely related to function, and alteration of protein structure as the result of mutation may lead to disruption of biological function. We classified the proteins in our training dataset into their corresponding superfamilies, as previously mentioned. We chose three major categories of protein superfamilies (enzymes, DNA/RNA binding proteins, and protein-protein interaction-related proteins) to determine how well iStable predicts when the training dataset is limited. We used the three categories as independent training sets for iStable. Each set was split into five subsets and used in 5-fold cross-validation. The performance results for the three categories of proteins are shown in Table 10. As shown, iStable performs better than any of the element predictors for the three categories of proteins. In the enzyme and protein-protein interaction categories, with limited data availability, iStable did not perform as well as the M3131-trained model, but in the nucleic acid binding protein category, iStable showed an obvious performance improvement and was clearly superior to the element predictors. Although the performance of iStable is limited by the prediction power of the element predictors, these results demonstrate that the combination of the sequence and website result schemes can provide noticeable improvements in prediction performance.
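The 5-fold cross-validation procedure described above can be sketched as follows. The shuffling, the fixed seed, and the function name `k_fold_splits` are assumptions made for this illustration; the text only states that each set was split into five subsets.

```python
import random


def k_fold_splits(items, k=5, seed=0):
    """Yield (train, test) pairs for k-fold cross-validation.

    Items are shuffled once with a fixed seed (an assumption made here
    for reproducibility) and dealt into k folds of near-equal size.
    Each fold serves as the test set exactly once.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Hypothetical mutation identifiers, for illustration only.
mutations = [f"mutation_{i}" for i in range(23)]
for train, test in k_fold_splits(mutations):
    # Train on `train`, evaluate on `test`, then average the metrics.
    pass
```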

Table 10

Evaluation of iStable prediction results with data from different protein superfamilies

Predictors (grouped by protein category)	Sn	Sp	Acc	MCC
Nucleic acid binding
iStable	0.550	0.943	0.795	0.567
I-Mutant_PDB	0.550	0.852	0.742	0.439
I-Mutant_SEQ	0.300	0.943	0.704	0.343
AUTO-MUTE_RF	0.250	0.971	0.704	0.359
AUTO-MUTE_SVM	0.250	0.943	0.684	0.262
MUPRO_SVM	0.450	0.857	0.704	0.395
PoPMuSiC2.0	0.400	0.910	0.724	0.355
CUPSAT	0.350	0.552	0.476	-0.073
Enzyme
iStable	0.451	0.797	0.720	0.334
I-Mutant_PDB	0.253	0.869	0.756	0.135
I-Mutant_SEQ	0.242	0.878	0.762	0.131
AUTO-MUTE_RF	0.138	0.978	0.825	0.217
AUTO-MUTE_SVM	0.057	0.965	0.800	0.049
MUPRO_SVM	0.281	0.859	0.753	0.144
PoPMuSiC2.0	0.344	0.931	0.824	0.328
CUPSAT	0.390	0.740	0.676	0.112
Protein-protein interaction related
iStable	0.357	0.943	0.831	0.379
I-Mutant_PDB	0.207	0.858	0.733	0.088
I-Mutant_SEQ	0.361	0.798	0.714	0.161
AUTO-MUTE_RF	0.129	0.965	0.805	0.145
AUTO-MUTE_SVM	0.079	0.970	0.799	0.100
MUPRO_SVM	0.204	0.864	0.737	0.076
PoPMuSiC2.0	0.100	0.964	0.798	0.091
CUPSAT	0.461	0.778	0.717	0.216

We also observed the performance of each predictor under a variety of pH and temperature ranges. Table 11 shows that iStable and AUTO-MUTE_RF perform better than the other element predictors when pH ≤ 6 or pH > 8. These two predictors have similar performance; however, iStable has better accuracy than AUTO-MUTE_RF when the temperature is ≤ 37. Finally, it is worth mentioning that iStable is the best predictor for protein stability changes when the pH is between 6 and 8.

Table 11

Accuracy of iStable and the element predictors across pH and temperature ranges

pH	≤ 6			6~8			> 8
Temperature	≤ 37	37~65	> 65	≤ 37	37~65	> 65	≤ 37	37~65	> 65
I-Mutant_PDB	0.38	0.68	0.55	0.22	0.42	0.73	0.18	0.69	0.08
I-Mutant_SEQ	0.46	0.79	0.77	0.25	0.62	0.73	0.18	0.46	0.42
AUTO-MUTE_SVM	0.38	0.81	0.97	0.20	0.43	0.79	0.09	0.69	0.33
AUTO-MUTE_RF	0.48	0.97	1.00	0.28	0.47	0.85	0.09	0.77	0.83
MUPRO	0.46	0.69	0.90	0.40	0.49	0.88	0.27	0.85	0.58
PoPMuSiC	0.13	0.17	0.35	0.29	0.36	0.60	0.09	0.38	0.17
CUPSAT	0.57	0.59	0.55	0.38	0.59	0.54	0.27	0.54	0.50
iStable	0.61	0.94	1.00	0.55	0.77	0.88	0.27	0.77	0.75
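Grouping predictions by experimental condition, as in Table 11, can be sketched as below. The bin edges follow the table headings; treating the boundary values 8 and 65 as part of the middle bins is an assumption inferred from the `> 8` and `> 65` labels, and the function names and example records are hypothetical.

```python
from collections import defaultdict


def ph_bin(ph):
    """Assign a pH value to one of the three ranges used in Table 11."""
    if ph <= 6:
        return "<=6"
    if ph <= 8:          # assumption: pH 8 belongs to the middle bin
        return "6~8"
    return ">8"


def temperature_bin(temperature):
    """Assign a temperature to one of the three ranges used in Table 11."""
    if temperature <= 37:
        return "<=37"
    if temperature <= 65:  # assumption: 65 belongs to the middle bin
        return "37~65"
    return ">65"


def accuracy_by_condition(records):
    """Return accuracy per (pH bin, temperature bin).

    records: iterable of (ph, temperature, predicted_label, observed_label).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ph, temperature, predicted, observed in records:
        key = (ph_bin(ph), temperature_bin(temperature))
        totals[key] += 1
        hits[key] += int(predicted == observed)
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical records, for illustration only (+1 stabilizing, -1 destabilizing).
records = [(5.0, 25, 1, 1), (5.5, 30, -1, 1), (7.0, 50, 1, 1), (9.0, 70, -1, -1)]
table = accuracy_by_condition(records)
```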

Conclusions

The power of the integrated predictor

Compared with various machine learning methods and element predictors, iStable successfully integrated the sequence and website result schemes to improve the prediction of protein stability changes. When adopting a synergistic method, several issues should be considered: 1) the input and output formats differ among element predictors; 2) the prediction results of each element predictor must be evaluated; and 3) the overall performance of the synergistic system must improve. Majority voting is a popular synergistic method and is the strategy biologists frequently adopt when they must obtain an answer from many prediction tools. However, in our study the element predictor AUTO-MUTE_RF and iStable both performed much better than majority voting, whose MCC was just above 50%, because majority voting does not take into account the confidence of the prediction results from the different element predictors. In addition, iStable is a prediction system based on the synergistic method and constructed according to a grid computing architecture; therefore, iStable offers software reusability and reduced computing resource requirements.
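The limitation of majority voting noted above can be illustrated with a small sketch. The confidence-weighted vote shown here is not the iStable integration method, which is trained with support vector machines; it only demonstrates how ignoring confidence can change the outcome. The function names, the tie-breaking rule, and the confidence values are hypothetical.

```python
def majority_vote(labels):
    """Plain majority vote over +1 (stabilizing) / -1 (destabilizing) labels.

    Ties are resolved as destabilizing, an assumption made for this sketch.
    """
    return 1 if sum(labels) > 0 else -1


def confidence_weighted_vote(labels, confidences):
    """Vote in which each label is weighted by its predictor's confidence."""
    score = sum(label * conf for label, conf in zip(labels, confidences))
    return 1 if score > 0 else -1


# Three hypothetical predictors: one confident "stabilizing" call and
# two low-confidence "destabilizing" calls.
labels = [1, -1, -1]
confidences = [0.9, 0.3, 0.2]

plain = majority_vote(labels)                             # two votes against one
weighted = confidence_weighted_vote(labels, confidences)  # 0.9 outweighs 0.5
```

In this example the two schemes disagree: the plain vote follows the two weak predictions, while the weighted vote follows the single confident one.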

On the other hand, the sequence scheme provides information on local interactions, whereas the website result scheme also includes non-local interaction information through the element predictors PoPMuSiC2.0, with folding free energy changes, and CUPSAT, with atom potentials. Because iStable_SEQ takes only the sequence as input, it does not include non-local information; furthermore, only two element predictors can be used. Therefore, the MCC of iStable_SEQ is at least 10% lower than that of iStable.

Prediction tool available on website

The trained predictor iStable is available at http://predictor.nchu.edu.tw/iStable/. Users can access two prediction models: iStable and iStable_SEQ. For predicting mutations in proteins with 3-D structure information available in the PDB, users can input the PDB ID to apply the iStable model. For proteins of interest that have an available sequence but no structure information in the PDB, the iStable_SEQ model is the appropriate choice.

Availability and requirements

• Project name: iStable

• Project home page: http://predictor.nchu.edu.tw/iStable

• Operating system(s): Platform independent (web server)

• Programming language: Java (server interface), PHP (web site)

• Other requirements: LIBSVM

• License: none

• Any restrictions to use by non-academics: none

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

CWC wrote the experimental programs, participated in the experimental design, and constructed the iStable website. JL compiled the data set, participated in the experimental design, and wrote the manuscript. YWC conceived of the study, participated in its design and coordination, and drafted the manuscript. All authors read and approved the manuscript.

Declarations

The publication costs for this article were funded by the corresponding author's institution and the National Science Council, Taiwan, Republic of China.

This article has been published as part of BMC Bioinformatics Volume 14 Supplement 2, 2013: Selected articles from the Eleventh Asia Pacific Bioinformatics Conference (APBC 2013): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S2.

Supplementary Material

Additional file 1:

M3131_Decreased and M3131_Increased contain the integrated training data M3131, separated into a negative (decreasing stability) dataset and a positive (increasing stability) dataset. iStable_Comparison_results presents the results of different training conditions and comparisons of different predictors.

Additional file 2:

Superfamily_M1311 and Superfamily_M1820 record the superfamilies corresponding to the PDB IDs in the M1311 and M1820 datasets. SF_DNA BINDING, SF_Enzyme, and SF_Protein-protein-interaction list the PDB IDs belonging to the three major categories.

Acknowledgements

This work was supported in part by the National Science Council, Taiwan, Republic of China, under grants NSC 100-2221-E-005-073 and NSC 101-2221-E-005-085.

References

1. Mehanna AS. Sickle cell anemia and antisickling agents then and now. Curr Med Chem. 2001;8(2):79–88. doi: 10.2174/0929867013373778.
2. Tokuriki N, Stricher F, Serrano L, Tawfik DS. How Protein Stability and New Functions Trade Off. Plos Computational Biology. 2008;4(2):e1000002. doi: 10.1371/journal.pcbi.1000002.
3. Daggett V, Fersht AR. Is there a unifying mechanism for protein folding? Trends Biochem Sci. 2003;28(1):18–25. doi: 10.1016/S0968-0004(02)00012-9.
4. Gromiha MM, Sarai A. Thermodynamic database for proteins: features and applications. Methods Mol Biol. 2010;609:97–112. doi: 10.1007/978-1-60327-241-4_6.
5. Bash PA, Singh UC, Langridge R, Kollman PA. Free energy calculations by computer simulation. Science. 1987;236(4801):564–568. doi: 10.1126/science.3576184.
6. Lee C, Levitt M. Accurate prediction of the stability and activity effects of site-directed mutagenesis on a protein core. Nature. 1991;352(6334):448–451. doi: 10.1038/352448a0.
7. Pitera JW, Kollman PA. Exhaustive mutagenesis in silico: Multicoordinate free energy calculations on proteins and peptides. Proteins-Structure Function and Genetics. 2000;41(3):385–397. doi: 10.1002/1097-0134(20001115)41:3<385::AID-PROT100>3.0.CO;2-R.
8. Carter CW Jr, LeFebvre BC, Cammer SA, Tropsha A, Edgell MH. Four-body potentials reveal protein-specific correlations to stability changes caused by hydrophobic core mutations. J Mol Biol. 2001;311(4):625–638. doi: 10.1006/jmbi.2001.4906.
9. Gilis D, Rooman M. Predicting protein stability changes upon mutation using database-derived potentials: solvent accessibility determines the importance of local versus non-local interactions along the sequence. J Mol Biol. 1997;272(2):276–290. doi: 10.1006/jmbi.1997.1237.
10. Parthiban V, Gromiha MM, Schomburg D. CUPSAT: prediction of protein stability upon point mutations. Nucleic Acids Research. 2006;34(Web Server):W239–W242. doi: 10.1093/nar/gkl190.
11. Sippl MJ. Knowledge-based potentials for proteins. Curr Opin Struct Biol. 1995;5(2):229–235. doi: 10.1016/0959-440X(95)80081-6.
12. Topham CM, Srinivasan N, Blundell TL. Prediction of the stability of protein mutants based on structural environment-dependent amino acid substitution and propensity tables. Protein Engineering. 1997;10(1):7–21. doi: 10.1093/protein/10.1.7.
13. Zhou HY, Zhou YQ. Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Science. 2002;11(11):2714–2726.
14. Guerois R, Nielsen JE, Serrano L. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol. 2002;320(2):369–387. doi: 10.1016/S0022-2836(02)00442-4.
15. Munoz V, Serrano L. Development of the multiple sequence approximation within the AGADIR model of alpha-helix formation: Comparison with Zimm-Bragg and Lifson-Roig formalisms. Biopolymers. 1997;41(5):495–509. doi: 10.1002/(SICI)1097-0282(19970415)41:5<495::AID-BIP2>3.0.CO;2-H.
16. Takano K, Ota M, Ogasahara K, Yamagata Y, Nishikawa K. et al. Experimental verification of the 'stability profile of mutant protein' (SPMP) data using mutant human lysozymes. Protein Engineering. 1999;12(8):663–672. doi: 10.1093/protein/12.8.663.
17. Villegas V, Viguera AR, Aviles FX, Serrano L. Stabilization of proteins by rational design of alpha-helix stability using helix/coil transition theory. Folding & Design. 1996;1(1):29–34. doi: 10.1016/S1359-0278(96)00009-0.
18. Yin S, Ding F, Dokholyan NV. Modeling backbone flexibility improves protein stability estimation. Structure. 2007;15(12):1567–1576. doi: 10.1016/j.str.2007.09.024.
19. Capriotti E, Fariselli P, Casadio R. A neural-network-based method for predicting protein stability changes upon single point mutations. Bioinformatics. 2004;20(Suppl 1):i63–68. doi: 10.1093/bioinformatics/bth928.
20. Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 2005;33(Web Server):W306–310. doi: 10.1093/nar/gki375.
21. Casadio R, Compiani M, Fariselli P, Vivarelli F. Predicting free energy contributions to the conformational stability of folded proteins from the residue sequence with radial basis function networks. Proc Int Conf Intell Syst Mol Biol. 1995;3:81–88.
22. Cheng J, Randall A, Baldi P. Prediction of protein stability changes for single-site mutations using support vector machines. Proteins. 2006;62(4):1125–1132.
23. Frenz CM. Neural network-based prediction of mutation-induced protein stability changes in Staphylococcal nuclease at 20 residue positions. Proteins. 2005;59(2):147–151. doi: 10.1002/prot.20400.
24. Huang LT, Gromiha MM. Reliable prediction of protein thermostability change upon double mutation from amino acid sequence. Bioinformatics. 2009;25(17):2181–2187. doi: 10.1093/bioinformatics/btp370.
25. Huang LT, Gromiha MM, Ho SY. iPTREE-STAB: interpretable decision tree based method for predicting protein stability changes upon mutations. Bioinformatics. 2007;23(10):1292–1293. doi: 10.1093/bioinformatics/btm100.
26. Huang LT, Gromiha MM, Ho SY. Sequence analysis and rule development of predicting protein stability change upon mutation using decision tree model. J Mol Model. 2007;13(8):879–890. doi: 10.1007/s00894-007-0197-4.
27. Wan J, Kang SL, Tang CN, Yan JH, Ren YL. et al. Meta-prediction of phosphorylation sites with weighted voting and restricted grid search parameter selection. Nucleic Acids Res. 2008;36(4):e22.
28. Bordner AJ, Abagyan RA. Large-scale prediction of protein geometry and stability changes for arbitrary single point mutations. Proteins. 2004;57(2):400–413. doi: 10.1002/prot.20185.
29. Capriotti E, Fariselli P, Rossi I, Casadio R. A three-state prediction of single point mutations on protein stability changes. BMC Bioinformatics. 2008;9(Suppl 2):S6. doi: 10.1186/1471-2105-9-S2-S6.
30. Masso M, Vaisman II. Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis. Bioinformatics. 2008;24(18):2002–2009. doi: 10.1093/bioinformatics/btn353.
31. Dehouck Y. et al. Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0. Bioinformatics. 2009;25(19):2537–2543. doi: 10.1093/bioinformatics/btp445.
32. Chang CC, Hsu CW, Lin CJ. The analysis of decomposition methods for support vector machines. IEEE Trans Neural Netw. 2000;11(4):1003–1008. doi: 10.1109/72.857780.
33. Witten IH, Frank E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. 2. San Francisco: Morgan Kaufmann Publishers; 2005.
34. Gromiha MM, Selvaraj S. Inter-residue interactions in protein folding and stability. Prog Biophys Mol Biol. 2004;86(2):235–277. doi: 10.1016/j.pbiomolbio.2003.09.003.
35. Gromiha MM. et al. Role of structural and sequence information in the prediction of protein stability changes: comparison between buried and partially buried mutations. Protein Eng. 1999;12(7):549–555. doi: 10.1093/protein/12.7.549.

Articles from BMC Bioinformatics are provided here courtesy of BioMed Central
