Proteins


Amino acids

Amino acids are the building blocks of proteins. They all have the basic structure shown below:

L-amino acid: CO-R-N goes anticlockwise

D-amino acid: CO-R-N goes clockwise

As you can see, amino acids are (in general) chiral: only the L form is found in proteins, although the D form of alanine is an important component of bacterial cell walls. All amino acids are chiral except glycine. The L/D descriptor is an older convention than the R/S descriptor, and almost all L-amino acids are the (S)-isomer because the R group (not to be confused with an (R) descriptor!) is priority three. Only cysteine is an exception to this rule, because its R side group is -CH2SH, which has higher priority than COOH, so L-cysteine is the (R)-isomer. The L/D descriptors can be assigned with the 'CO-R-N' rule: if you place the hydrogen behind the chiral centre, and count round from the COOH group, to the R group, to the NH2 group, this goes anticlockwise for the L isomer. For the D isomer it goes clockwise.
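The CO-R-N rule can be checked numerically. The following is a minimal sketch, assuming the α carbon sits at the origin and that the four substituent positions are supplied as 3D coordinates; the function name `corn_descriptor` and the idealised coordinates are hypothetical and chosen only for illustration.

```python
def sub(u, v):
    """Component-wise difference u - v of two 3D vectors."""
    return tuple(a - b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))


def corn_descriptor(h, co, r, n):
    """Return 'L' or 'D' for substituents placed around an alpha carbon at the origin.

    The normal to the CO -> R -> N circuit points towards a viewer who sees
    that circuit running anticlockwise. With the hydrogen placed behind the
    chiral centre, the viewer sits on the side opposite to H, so the circuit
    is anticlockwise (L) when the normal points away from H.
    """
    normal = cross(sub(r, co), sub(n, r))
    facing = dot(normal, h)
    if facing == 0:
        raise ValueError("substituents are coplanar with H; no chirality")
    return "L" if facing < 0 else "D"


# Idealised (hypothetical) tetrahedral geometry: H points away from a viewer on +z.
H = (0.0, 0.0, -1.0)
CO = (0.0, 0.94, 0.33)
R = (-0.82, -0.47, 0.33)
N = (0.82, -0.47, 0.33)

print(corn_descriptor(H, CO, R, N))

# Mirror image (x -> -x) gives the other enantiomer.
mirror = lambda p: (-p[0], p[1], p[2])
print(corn_descriptor(mirror(H), mirror(CO), mirror(R), mirror(N)))
```

Reflecting every coordinate through a mirror plane swaps the answer, which is just what you would expect of a pair of enantiomers.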

There are twenty (or so) amino acids, which we will discuss in groups based on their chemical properties.

The hydrophobic amino acids are:

Aliphatic: glycine (Gly, G), alanine (Ala, A), leucine (Leu, L), isoleucine (Ile, I), valine (Val, V).

Glycine

Alanine

Leucine

Isoleucine

Valine

Sulfur containing: methionine (Met, M).

Methionine

Aromatic: phenylalanine (Phe, F), tryptophan (Trp, W).

Phenylalanine

Tryptophan

Cyclic secondary amine: proline (Pro, P).

Proline

Proline has an unusual ring structure (it is a secondary amine, traditionally called an 'imino acid'), in which the α-amino group is actually incorporated into the side chain. This restricts the backbone and so causes changes to the secondary structure of a protein. Hydrophobic residues are often found in membrane-bound proteins, and the aromatic ones (chiefly tryptophan, together with tyrosine) contribute to protein absorbance at 280 nm, which is the basis of an important method of protein quantification. Alanine is the 'don't care' amino acid, often appearing where nothing interesting is happening!
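To make the 280 nm quantification concrete, here is a rough sketch using the Beer-Lambert law (A = ε × c × l). The per-residue extinction coefficients are commonly quoted approximate values and are an assumption here, not figures from this page; the peptide sequence and function names are hypothetical.

```python
# Approximate molar extinction coefficients at 280 nm, in M^-1 cm^-1.
# Assumed, commonly quoted values; real proteins can deviate from them.
EXTINCTION_280 = {"W": 5500.0, "Y": 1490.0}
CYSTINE_280 = 125.0  # per disulfide bridge


def molar_extinction_280(sequence, disulfide_bonds=0):
    """Estimate the molar extinction coefficient of a protein from its sequence."""
    per_residue = sum(EXTINCTION_280.get(res, 0.0) for res in sequence.upper())
    return per_residue + CYSTINE_280 * disulfide_bonds


def concentration_molar(absorbance, epsilon, path_cm=1.0):
    """Beer-Lambert law rearranged: c = A / (epsilon * l)."""
    if epsilon <= 0:
        raise ValueError("no residues absorbing at 280 nm; method not applicable")
    return absorbance / (epsilon * path_cm)


peptide = "GAWFYLW"  # hypothetical sequence: two Trp, one Tyr
eps = molar_extinction_280(peptide)  # 2 * 5500 + 1490 = 12490
conc = concentration_molar(0.50, eps)
print(f"epsilon = {eps:.0f} M^-1 cm^-1, concentration = {conc * 1e6:.1f} micromolar")
```

A protein with no tryptophan, tyrosine or disulfide bridges gives an estimated coefficient of zero, which is why the sketch refuses to compute a concentration in that case.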

The polar amino acids are those that are not charged at physiological pH, but which are nevertheless quite polar due to their alcohol or amide groups.

Alcohols: threonine (Thr, T), serine (Ser, S).

Threonine

Serine

Amides: asparagine (Asn, N), glutamine (Gln, Q).

Asparagine

Glutamine

The basic amino acids are those that can accept protons. The pKa refers to the dissociation of the proton from a positively charged (protonated) amine group.

Imidazole: histidine (His, H, pKa of protonated form of the group -C=NH+- = 6.04).

Histidine

Amine: lysine (Lys, K, pKa of protonated form of terminal amine group NH3+ = 10.54).

Lysine

Guanidinium: arginine (Arg, R, pKa of protonated form of terminal group -C=NH2+ = 12.48).

Arginine

The acidic amino acids are carboxylic acids, plus a thiol (cysteine) and a phenol (tyrosine) with dissociable S-H or O-H groups:

Carboxylic acids: aspartic acid (Asp, D, pKa of COOH group = 3.90), glutamic acid (Glu, E, pKa of COOH group = 4.07).

Aspartic acid

Glutamic acid

Thiol: cysteine (Cys, C, pKa of SH group = 8.37).

Cysteine

Phenol: tyrosine (Tyr, Y, pKa of OH group = 10.46).

Tyrosine

The sodium salt of glutamic acid is the flavour enhancer MSG.

Cysteine is capable of forming a dimer: Cys-SH + Cys-SH → Cys-S-S-Cys. The dimer is (confusingly) known as cystine, and these disulfide bridges are responsible for a lot of protein tertiary structure.

The charged (acidic/basic) and polar amino acids are often involved in catalysis, sometimes forming covalent intermediates with substrates. The terminal COOH and NH2 groups on an amino acid (and on larger peptides and proteins) may also be charged, as they have their own pKa values.

At physiological pH, amino acids and proteins are usually charged at both ends: a molecule carrying both a positive and a negative charge like this is called a zwitterion.

Zwitterions have charged amine and carboxylate groups.
Alanine zwitterion.
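The pKa values quoted above can be turned into charge states with the Henderson-Hasselbalch relationship: the fraction of a group that is protonated at a given pH is 1 / (1 + 10^(pH - pKa)). The sketch below uses the side chain pKa values from this page; the function names are hypothetical, and the values are those of the free amino acids, so residues buried in a real protein may behave differently.

```python
# Side chain pKa values as quoted in the text above.
SIDE_CHAIN_PKA = {
    "His": 6.04, "Lys": 10.54, "Arg": 12.48,   # basic: protonated form is +1
    "Asp": 3.90, "Glu": 4.07,                  # acidic: deprotonated form is -1
    "Cys": 8.37, "Tyr": 10.46,
}
BASIC = {"His", "Lys", "Arg"}


def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch: fraction of the group carrying its proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def mean_side_chain_charge(residue, ph):
    """Average charge of a side chain at the given pH."""
    f = fraction_protonated(SIDE_CHAIN_PKA[residue], ph)
    # Basic groups are +1 when protonated and neutral otherwise;
    # acidic groups are neutral when protonated and -1 otherwise.
    return f if residue in BASIC else f - 1.0


for residue in SIDE_CHAIN_PKA:
    charge = mean_side_chain_charge(residue, 7.4)
    print(f"{residue}: mean charge at pH 7.4 = {charge:+.3f}")
```

When the pH equals the pKa the group is exactly half protonated, which is also where it resists changes in pH most strongly.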

There are three other amino acids that are incorporated directly into proteins at the ribosome. Each uses a codon otherwise used by another amino acid, or which normally acts as a stop codon. The 21st proteinogenic amino acid is N-formylmethionine, which is the first amino acid in all bacterial proteins. It is encoded by the same codon as methionine (AUG), but because of the interaction between initiation factors and ribosome at the start of translation, only the tRNA for N-formylmethionine is able to bind the start codon. The 22nd proteinogenic amino acid is selenocysteine, which is encoded by a special interpretation of the opal stop codon UGA, and is an important component of some enzymes, such as glutathione peroxidase. The 23rd genetically encoded amino acid is found in the methanogenic Archaea, which can interpret the amber stop codon (UAG) as the amino acid pyrrolysine when it is followed by a sequence in the mRNA that folds into a specific stem-loop structure.

N-formylmethionine

Selenocysteine

Pyrrolysine

In addition to the amino acids found in proteins, the cell also contains a number of other amino acids that are not normally found in peptides. These include ornithine and citrulline, important intermediates in the urea cycle, and homocysteine, an important intermediate in sulfur metabolism.

Ornithine

Homocysteine

Citrulline

In addition to this, many amino acids in proteins are modified after translation to incorporate phosphate groups, hydroxyl groups, etc. These modified amino acid residues include the phosphorylated alcohol amino acids (Ser, Thr, Tyr), which are phosphorylated by kinase enzymes. Histidine can also be phosphorylated. Phosphorylation is a common strategy for protein regulation. The histone proteins, which regulate DNA packaging in the nucleus, are themselves regulated by methylation and acetylation of their arginine and lysine residues, and the characteristic triple helical structure of collagen is caused by its possession of many glycine and hydroxyproline (a post-translational modification of proline) residues.

O-phosphoserine

O-phosphothreonine

O-phosphotyrosine

N-Acetyllysine

Hydroxyproline

N-Methylarginine

Peptides and proteins

Proteins and peptides are just huge polymeric amides (A-CONH-B). Peptide is the name given to small chains of amino acids, polypeptides the name for very long (100+ amino acids long) peptides, and protein is the name given to polypeptides that have been processed to form functional products.

The peptide (amide) bond is formed between two amino acids as follows:

A-COOH + B-NH2 → A-CO-NH-B + H2O.

Acid + amine → amide + water.
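Because each peptide bond is made with the loss of one water molecule, the mass of a peptide is the sum of its free amino acid masses minus one water per bond. A small sketch of that arithmetic follows; the average molar masses for glycine, alanine and water are standard approximate values (an assumption, not taken from this page), and `peptide_mass` is a hypothetical helper.

```python
WATER = 18.015  # g/mol, average mass of H2O

# Average molar masses of the free amino acids in g/mol (approximate values).
FREE_AMINO_ACID_MASS = {"G": 75.07, "A": 89.09}


def peptide_mass(sequence):
    """Mass of a linear peptide: free amino acid masses minus one water per peptide bond."""
    if not sequence:
        raise ValueError("empty sequence")
    total = sum(FREE_AMINO_ACID_MASS[res] for res in sequence)
    peptide_bonds = len(sequence) - 1
    return total - peptide_bonds * WATER


print(f"Gly-Ala:     {peptide_mass('GA'):.2f} g/mol")   # one bond, one water lost
print(f"Gly-Ala-Gly: {peptide_mass('GAG'):.2f} g/mol")  # two bonds, two waters lost
```

Only two amino acids are tabulated to keep the sketch short; other residues would need their own entries before the function could handle them.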

Proteins are peptides, and peptides are amides: the peptide bonds that hold proteins together are just long strings of amide linkages. The peptide bond has a certain degree of double-bond character, with restricted rotation, and hence the ability to show a sort of geometric isomerism. For steric reasons the trans (E) form shown below is favoured by about 1000 times for most amino acids. Proline is an exception: it can adopt the cis form reasonably easily, and this motif is found in some protein turns.

The peptide bond has some double-bond character.
The lone pair on the N is delocalised into the π bond between the C and O of the peptide bond. This gives the C-N bond enough double-bond character to prevent free rotation. Consequently, the peptide backbone is held with the α carbons (those bearing the R group) in trans (E).
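The 'about 1000 times' preference for trans can be put on an energy scale. This is a back-of-the-envelope sketch that assumes the figure is an equilibrium population ratio at room temperature, so that ΔG = RT ln(ratio); the function name is hypothetical.

```python
import math

GAS_CONSTANT = 8.314  # J mol^-1 K^-1


def free_energy_difference(population_ratio, temperature_k=298.15):
    """Free energy gap (J/mol) implied by an equilibrium population ratio."""
    if population_ratio <= 0:
        raise ValueError("ratio must be positive")
    return GAS_CONSTANT * temperature_k * math.log(population_ratio)


delta_g = free_energy_difference(1000.0)
print(f"trans is favoured over cis by roughly {delta_g / 1000:.1f} kJ/mol")
```

On this assumption the gap comes out at roughly 17 kJ/mol, which is of the same order as a hydrogen bond.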

α-Amanitin is a small cyclic peptide, the highly toxic product of Amanita phalloides, the death cap toadstool. It inhibits RNA polymerase II, and as such can be used in molecular biology to block transcription of genes. See if you can identify the amino acids in it.

α-Amanitin

The structure of proteins is covered in all biochemistry textbooks in excruciating detail, so this will be something of a whistle-stop tour. Protein structure can be largely divided into primary, secondary, tertiary and quaternary structure.

Primary structure is simply the ordered list of amino acids that compose the polypeptide chain (conventionally written from the NH2 (N) terminus to the COOH (C) terminus). Determining the primary structure of a protein involves using enzyme digests and the Edman degradation. Firstly, we use chemicals and enzymes that cut at specific places in the peptide chain, to break our protein into more manageable bits: mercaptoethanol can be used to break disulfide bonds, cyanogen bromide cleaves on the C-terminal side of methionine, and the enzyme trypsin cleaves on the C-terminal side of lysine and arginine.

The Edman degradation can then be used to nibble off individual amino acids from the N terminus, and identify them by HPLC. The technique uses phenylisothiocyanate (PhNCS), which reacts with the amino terminus to give an adduct. This is then treated with acid (anhydrous acid such as trifluoroacetic acid is normally used, so that the rest of the chain is not hydrolysed), which causes the adduct to cleave off the end of the chain, yielding a phenylthiohydantoin derivative of the amino acid (which can be identified), and a new N terminus, ready for the next round of degradation. The Edman N-terminal sequence can either be used to discover the whole protein sequence, or the first five to ten residues can be used to design a degenerate gene probe to find the gene responsible for the protein, and then sequence the gene instead, which is often easier.
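The cleavage rules described above are easy to mimic on a sequence written in one-letter code. The sketch below is a simplified illustration: the function names and the example sequence are hypothetical, and real trypsin has extra subtleties (for example, it cuts poorly before proline) that are ignored here.

```python
def cleave_after(sequence, residues):
    """Split a sequence on the C-terminal side of any residue in `residues`."""
    fragments, current = [], ""
    for amino_acid in sequence:
        current += amino_acid
        if amino_acid in residues:
            fragments.append(current)
            current = ""
    if current:  # keep the C-terminal fragment left over after the last cut
        fragments.append(current)
    return fragments


def trypsin_digest(sequence):
    """Trypsin: cut on the C-terminal side of lysine (K) and arginine (R)."""
    return cleave_after(sequence, "KR")


def cnbr_digest(sequence):
    """Cyanogen bromide: cut on the C-terminal side of methionine (M)."""
    return cleave_after(sequence, "M")


def edman_cycles(peptide, cycles):
    """Yield (cycle number, released N-terminal residue, remaining peptide)."""
    remaining = peptide
    for cycle in range(1, cycles + 1):
        if not remaining:
            break
        released, remaining = remaining[0], remaining[1:]
        yield cycle, released, remaining


protein = "GAKMLRWPD"  # hypothetical sequence, N terminus first
print(trypsin_digest(protein))
print(cnbr_digest(protein))
for cycle, released, remaining in edman_cycles("MLR", 3):
    print(cycle, released, remaining or "(nothing left)")
```

Digesting the same protein with two different reagents gives overlapping fragments, which is what lets the individual Edman sequences be stitched back into the right order.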

The secondary structure of a protein is caused by the interactions of the peptide backbone with itself. There are several common motifs, the most famous of which are the α helix and the β pleated sheet. However, there are several other less well known motifs, such as 3-residue and inverse gamma turns. All these structures are stabilised by hydrogen bonds between the peptide backbones.

α-helix

β-sheet

Tertiary structure is that due to interactions between secondary structures and/or the side chains of the amino acid residues. The aforementioned cysteine-cysteine disulfide bridge is such a structure. Other interactions occur between hydrophobic amino acids (actually it's an entropic effect, but whatever) and between oppositely charged amino acids, and H-bonds form between polar and charged amino acids. These interactions often lead to the protein having several distinct domains, each with a more-or-less specific job, such as ATP hydrolysis, glucose binding, etc. Some of these tertiary structures are quite attractive, such as this β-barrel from fatty-acid binding protein:

Fatty acid binding protein

Quaternary structure is the final step in protein building: it is the structure due to modifications, cofactors (like vitamins and nucleotides), prosthetic groups (like haem, and bound metal ions), glycosylation (the addition of sugar residues through serine or asparagine residues), and the agglomeration of separate polypeptide chains to form multimeric proteins. This is haemoglobin: you can see the primary structure (the amino acids), the helices and sheets of the secondary structure, the tertiary arrangement of these helices and sheets, and the quaternary structure: the four separate chains and the prosthetic haem groups.

Haemoglobin (sidechains)

Test yourself

  1. Which of the amino acids will show good pH buffering activity at pH 6?
  2. Show the predominant form of the following peptide at pH 10.

    Peptide.

  3. Why are most amino acids held in the trans conformation in peptides?
  4. Classify the following as primary, secondary, tertiary or quaternary protein structures.
    • Amino acid sequence
    • A haem prosthetic group
    • The multi-subunit nature of phosphofructokinase
    • An antiparallel β sheet
    • An ATP binding domain

Peer Review.
This page has been peer reviewed by 4 people. Thanks to Phillip San Miguel for his feedback and John Garavelli for his corrections and suggestions.