Amino acids are the building blocks of all proteins. Each amino acid corresponds to a different molecule and these molecules are linked together by peptide bonds to form long polypeptide chains that are called proteins. There are 20 different amino acids encoded in the genomes of all organisms. However, these amino acids can be modified, either temporarily or permanently, in many different ways by post-translational modifications, which increases the possible diversity of molecular building blocks that form proteins.
Amino acids are often designated with a three-letter code (e.g., Alanine = ALA) or with a one-letter code (Alanine = A). See this page or this figure for a summary of the names and codes attributed to natural amino acids.
All amino acids have in common what is called a backbone that consists of four heavy atoms: N-C-C=O. The sidechain corresponds to the part of the molecule that is linked to the first carbon atom of the backbone (often called C alpha atom) and is specific to each amino acid. For instance, Alanine has the shortest sidechain, consisting of only one carbon atom linked to the C alpha atom, while Lysine has a long chain of 4 carbon atoms followed by one nitrogen atom (C-C-C-C-N). Glycine does not have any sidechain.
While nature uses only a limited set of different sidechains (the 20 "natural" ones + the ones obtained by post-translational modifications), the possible chemical diversity of sidechains is nearly infinite. Non-natural sidechains refer to sidechains that are not part of the 20 naturally occurring amino acid sidechains.
Amino acids display a chiral center at the C alpha atom. All natural sidechains are in the so-called L-configuration. A D-amino acid corresponds to the case where the sidechain and the hydrogen atom linked to the C alpha atom have been switched (see this image for a graphical representation). D-amino acids are naturally found almost exclusively in some sea-dwelling organisms or some bacteria, and are the result of a post-translational modification. Peptides containing D-amino acids usually display a much higher resistance to protease-mediated degradation.
Amino acid sidechains are flexible molecules that can adopt distinct conformations corresponding to preferred values of torsion angles along chemical bonds. For instance, carbon chains adopt three main conformations around each bond, corresponding to torsion angles equal to -60, 60 and 180 degrees. These preferred conformations are called rotamers. For amino acid sidechains, the probability of each rotamer depends on the nature of the chemical bonds, interactions with other atoms of the sidechain, interactions with the backbone of the protein, and interactions with other atoms found in the vicinity of the sidechain.
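As a rough illustration of the three preferred torsion values mentioned above, the following Python sketch assigns a measured torsion angle to the closest of -60, 60 and 180 degrees. The function name and the nearest-value rule are assumptions made for this example; they are not part of the SwissSidechain data or tools, and real sidechains may have other preferred values (for instance around aromatic rings).

```python
def nearest_rotamer(chi_deg, centers=(-60.0, 60.0, 180.0)):
    """Return the preferred torsion value (in degrees) closest to chi_deg."""

    def angular_distance(a, b):
        # Smallest absolute difference between two angles, taking the
        # 360-degree periodicity of torsion angles into account.
        d = (a - b) % 360.0
        return min(d, 360.0 - d)

    return min(centers, key=lambda c: angular_distance(chi_deg, c))
```

For example, a torsion angle of -170 degrees is assigned to the 180-degree rotamer, because -170 and 180 are only 10 degrees apart once periodicity is accounted for.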
Backbone dependent rotamer libraries assign probabilities for each rotamer as a function of the phi and psi backbone dihedral angles (typically by using bins of 10 degrees on phi and psi angles). Backbone independent rotamer libraries disregard the phi and psi dependencies and are typically used to model amino acids that are not part of a polypeptide chain, as well as N or C terminal amino acids.
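To make the binning idea concrete, here is a minimal Python sketch that maps a (phi, psi) pair to a 10-degree bin. The function name and the convention of labelling each bin by its lower edge are assumptions for illustration only; an actual rotamer library file may label bins differently (for instance by their centers), so check the file format before reusing this.

```python
def backbone_bin(phi_deg, psi_deg, bin_width=10.0):
    """Map (phi, psi) in degrees to the lower edges of their bins."""

    def lower_edge(angle):
        # Wrap the angle into [-180, 180), then round down to the
        # lower edge of the bin that contains it.
        wrapped = (angle + 180.0) % 360.0 - 180.0
        return int(wrapped // bin_width * bin_width)

    return lower_edge(phi_deg), lower_edge(psi_deg)
```

With this convention, a residue with phi = -63.2 and psi = 147.9 falls into the bin (-70, 140). A backbone-dependent lookup would then use this pair as a key, whereas a backbone-independent library would ignore it altogether.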
For natural sidechains, rotamers have been generated by running statistics on all available X-ray structures in the PDB (as of Jan 2012) with a resolution lower than or equal to 1.75 Ångströms. For non-natural sidechains, we used a combined physics-based and knowledge-based approach. This consists of computing the probability of each rotamer based on MD trajectories and renormalizing the obtained probabilities for the first dihedral angles by the ones obtained for natural sidechains in experimental structures (see our paper (PDF) for more details).
For D-amino acids, we treated chiral and non-chiral sidechains separately. For non-chiral sidechains, rotamer probabilities can be readily computed by using the properties of mirror images, and especially the fact that the probability of a rotamer of a D-sidechain for given phi and psi backbone dihedral angles is equal to the probability of the same rotamer for the L-sidechain given -phi and -psi values for the backbone dihedral angles. For chiral D-amino acids, rotamers were generated using the same strategy as for non-natural L-amino acids (first evaluating probabilities with MD simulations and then renormalizing by experimental rotamer libraries).
The SwissSidechain database contains structural and molecular data for 210 non-natural sidechains, both in L- and D-configurations, in addition to the 20 natural ones. These data are stored in files describing the chemical structure (SMILES) and 3D coordinates (PDB, MOL2) of these sidechains, files describing the physico-chemical properties of these sidechains (partial charges, LogP, bond/angle/torsion constants), as well as files describing the different possible conformations of these sidechains (rotamers). In addition, we provide tools to rapidly integrate non-natural sidechains into existing analysis (CHARMM and GROMACS) and visualization (PyMOL and UCSF Chimera) software.
SwissSidechain can be used to address many important questions in structural biology and drug design. First, it enables visualizing non-natural sidechains in 3D and incorporating them into existing structures. This is useful, for instance, to evaluate which sidechain might optimally fit into a binding pocket of a protein complex to increase the binding affinity, thereby narrowing down the list of molecules to be experimentally tested. Second, it provides physico-chemical information about non-natural sidechains that can be used to select specific sidechains to be inserted into existing peptides to improve their pharmacological properties (e.g., hydrophobicity, protease resistance). Third, it provides all data required to run detailed molecular mechanics calculations that were so far only possible for the 20 natural sidechains with existing software.
The 210 non-natural sidechains present in the SwissSidechain database have been selected according to two main criteria. First, we retrieved all non-natural sidechains with available structural data in the PDB (141 in total). Second, we added more than 70 sidechains that are often used in biochemistry and drug design and are commercially available. Note that non-natural sidechains in the PDB are often also commercially available, so that in total more than 70% of the SwissSidechain non-natural amino acids can be purchased.
In the current version, we focus on sidechains that do not modify the backbone atoms other than the C alpha. Therefore we do not include for instance proline derivatives or beta amino acids.
For all non-natural L-amino acids present in the PDB, the same three-letter code was used in order to maximize compatibility between different resources. For other L-amino acids, a four-letter code was chosen, as close as possible to the full name (e.g., 2-cyano-phenylalanine -> 2CNP). For D-amino acids whose corresponding L-form had a three-letter code, we simply added a D in front of the code (e.g., ABA -> DABA). For other D-amino acids, a new four-letter code was chosen starting with a D, bearing as much similarity as possible with the four-letter code of the L-form (e.g., AZDA -> DZDA). Therefore most four-letter codes starting with a D correspond to D-amino acids, except for DILE (L-diethylalanine), DIPH (L-3,3-diphenylalanine) and DMP3 (3-ethyl-phenylalanine). A table listing codes for all sidechains in L and D configurations, as well as full chemical names is available here.
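The naming rules above can be sketched in a few lines of Python. The function names are hypothetical and the check is only a heuristic built from the rules and exceptions stated in this answer; the table of codes remains the authoritative source, in particular for four-letter L-codes whose D-form was chosen by hand.

```python
# Four-letter codes that start with "D" but denote L-amino acids,
# as listed in the text above.
L_CODES_STARTING_WITH_D = {"DILE", "DIPH", "DMP3"}


def d_code_from_three_letter(l_code):
    """Build the D-form code of an L-amino acid that has a three-letter code."""
    if len(l_code) != 3:
        # Four-letter L-codes do not follow a mechanical rule.
        raise ValueError("only three-letter L-codes follow the 'D' prefix rule")
    return "D" + l_code.upper()


def looks_like_d_amino_acid(code):
    """Heuristic: a four-letter code starting with 'D' that is not a known L exception."""
    code = code.upper()
    return (
        len(code) == 4
        and code.startswith("D")
        and code not in L_CODES_STARTING_WITH_D
    )
```

For instance, `d_code_from_three_letter("ABA")` gives `"DABA"`, and `looks_like_d_amino_acid("DILE")` is `False` because DILE is one of the listed L-amino acid exceptions.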
Volumes have been calculated with CHARMM. Molecular weights have been computed with OpenBabel. Experimental LogP values for both the full amino acid and the sidechain only have been manually collected from literature and existing databases, when available. For the rest of the sidechains, LogP values have been predicted using XlogP3 and are marked as "(predicted)" on the website. pKa values have been manually retrieved from existing literature when available and predicted with MarvinSketch (ChemAxon) otherwise. The latter are indicated as "(predicted)". pKa values are listed in the order they appear in the 2D structure, from left to right: first the pKa of the C-terminus (COO-), then the pKa of the N-terminus (NH3+), then the pKa of each sidechain protonatable group. In case of possible ambiguities, the chemical group is specified in parentheses. Sidechain pKa values have been calculated assuming that the amino acid was part of a longer polypeptide chain (i.e., without charged N- or C-termini). When several groups with similar pKa values are present on the sidechain (e.g., two OH groups attached to a phenyl ring as in 3FG), the highest (resp. lowest) pKa corresponds to the doubly deprotonated (resp. doubly protonated) state. Between these two values, the sidechain is often found as a mixture of different protonation states. We also stress that pKa values are highly dependent on the chemical environment of the sidechains and therefore vary greatly between different systems. pKa values are separated by slashes (' / ') on the webpage and by underscores (_) in the flat file summarizing all sidechain properties.
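If you process the pKa entries programmatically, the ordering and separators described above suggest a simple split. The Python sketch below is only an illustration: the function name is hypothetical, it assumes every entry lists the C-terminus and N-terminus values first, and it keeps each value as text because entries may carry annotations such as "(predicted)" or a chemical group in parentheses.

```python
def split_pka_field(field, separator="_"):
    """Split a pKa entry into C-terminus, N-terminus and sidechain values.

    Values are returned as strings so that any annotation attached to
    them is preserved unchanged.
    """
    values = [v.strip() for v in field.split(separator) if v.strip()]
    if len(values) < 2:
        raise ValueError("expected at least C-terminus and N-terminus pKa values")
    return {
        "c_terminus": values[0],   # listed first (COO-)
        "n_terminus": values[1],   # listed second (NH3+)
        "sidechain": values[2:],   # remaining protonatable groups, left to right
    }
```

Use `separator="_"` for the flat file and `separator="/"` for values copied from the webpage. An entry written as `a_b_c` is split into C-terminus `a`, N-terminus `b` and a single sidechain value `c`.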
SwissSidechain is primarily a database of manually curated non-natural sidechains and not a webservice. Thus it does not allow you to generate parameters for other non-natural sidechains, which is a non-trivial task that often requires manual curation. Should you have another non-natural sidechain for which you would like to generate parameters and topologies for MD simulations, we recommend using the SwissParam website.
We do not provide online tools to predict binding free energy differences when mutating a natural sidechain to a non-natural one in a protein or a peptide. However, you can use the SwissSidechain data with molecular mechanics simulation software, such as CHARMM and GROMACS, which provide tools to make binding free energy predictions. Moreover, visual inspection with our visualization plugins of the structural environment of a non-natural sidechain mutant is often quite informative to assess whether the new sidechain may fit or not into a binding pocket.
The SwissSidechain database provides different ways of querying data. First, you can browse the database according to the different families of derivatives of natural sidechains at this page. Second, you can search for non-natural sidechains based on their physico-chemical properties (volume and logP) using our interactive 2D graph. If you are using Internet Explorer, make sure you have version 9+ to use this browsing option. Third, you can print PDF tables that summarize the different sidechains present in SwissSidechain.
The SwissSidechain database has been published in Nucleic Acids Research: Gfeller D, Michielin O, Zoete V, Nucleic Acids Research, 41, D327-D332 (2013) (PDF).
For more information about how SwissSidechain data have been generated, you can refer to our technical paper describing the new methods that we have developed to generate force-field parameters and rotamer libraries: Gfeller D, Michielin O, Zoete V, Journal of Computational Chemistry, 55, 1525-1535 (2012). (PDF)(Supp)
Yes, all data can be accessed freely by academic users for research purposes only (see our license policy).
The SwissSidechain database is the property of the Swiss Institute of Bioinformatics (SIB) and its exclusive commercial representative GeneBio. For a commercial license agreement, see our commercial license policy.
Visualization tools in SwissSidechain have been developed as plugins for existing visualization software. Currently we provide plugins for PyMOL and UCSF Chimera, which are two of the most widely used molecular visualization programs. Therefore you need to have PyMOL or UCSF Chimera installed on your computer to use the SwissSidechain visualization tools.
UCSF Chimera is developed by the University of California, San Francisco. It can be obtained free of charge for academic users at this link.
Our PyMOL plugin enables you to insert both L and D amino acid sidechains in existing protein structures using a simple command line, or using the wizard tool. Detailed explanations can be obtained at this page.
The UCSF Chimera plugin can be used to insert both L and D amino acid sidechains, using either the command line or the Structure Editing tools. Detailed explanations can be obtained at this page.
MD simulation stands for Molecular Dynamics simulation. MD simulations aim at describing the dynamical behavior of macromolecules by simulating their actual motion based on physical forces (such as electrostatics or van der Waals) between the atoms of the molecule and of the solvent, starting most often from a known crystal structure. MD simulations provide powerful tools to sample the structural environment and predict the molecular rearrangement occurring, for instance, after mutating in silico one residue of a protein or peptide. Moreover, by averaging free-energy calculations along short MD trajectories, one can reliably estimate the binding free energy differences between the WT and different mutants (e.g., Zoete et al.). The data provided in SwissSidechain enable expanding these analyses to hundreds of non-natural sidechains.
Several parameters required to run MD simulations with non-natural sidechains, such as bond, angle or torsion constants, are not part of the standard force-fields. To generate these parameters, we used the SwissParam webserver, as well as similarity with existing force field parameters.
Topology files, including partial charges on the atoms, have also been generated either using direct mapping from natural sidechains (e.g., hydrogen atoms within -CH1, -CH2 or -CH3 groups have been assigned a charge of 0.09), or retrieving them from the SwissParam webserver.
SwissSidechain provides all data required to run MD simulations with CHARMM and GROMACS (using the CHARMM force field).
CHARMM can be obtained here (a license is required).
GROMACS can be obtained here. It is free of charge for academic users.