Molecular Biology Methods

Manipulating cells

This page will concentrate on a long-term experiment that tries to find out where DNA polymerase is located in the cell, and how it gets there. We will do this from first principles, just as you would have to if you were working on an entirely novel protein, and explain the techniques required at each stage.

We would like to know how DNA polymerase gets into the nucleus. Questions of where things are located in the cell are the domain of cytochemistry, which investigates what is present in a cell. Cytochemical investigation can answer questions of the "how many cells do we have," and, "what is in them?" sort. Simple staining with dyes can be used to locate some chemicals: haematoxylin and methylene blue bind DNA, RNA, acidic proteins, etc; however, it is rare to find stains specific for a single protein like DNA polymerase. Consequently, we need to rely on molecular biological techniques to manipulate protein and DNA to investigate their localisation and properties.

The first thought for finding DNA polymerase in a cell might be to use fluorescent antibodies against it. Just like immunogold staining, we can use primary antibodies raised to the protein in e.g. a mouse, then use secondary anti-mouse antibodies raised in e.g. a rabbit to stain these. Rather than colloidal gold, we can tag the anti-mouse antibodies with a fluorophore like rhodamine or fluorescein. However, we need lots of purified protein to raise an antibody. If we don't have that, we'll need to make some…

Proteins are usually investigated in model organisms, whose biochemistry and physiology are well understood, and which reproduce rapidly:

Cells are cultured in growth media. Chemically defined media (made from exactly specified ingredients) can be used for yeasts and bacteria; but protein growth factors are required for mammalian cells, so undefined 'serum' is often added for human cell lines. Culture media will usually contain:

Proteins may be secreted extracellularly, but DNA polymerase is not. Consequently, we must disrupt the cells containing it to release it. The method used to break open a cell depends on its robustness. Animal cells have only a weak plasmalemma, which can be easily burst using osmotic stress or sonication. Plants, bacteria and fungi (including yeasts) have a tougher cell wall, which requires enzymatic digestion, or bead-beating, or even the use of a French press, which squeezes the cells at high pressure through a tiny hole to make them explode. PMSF (a protease inhibitor), protamine sulfate (which precipitates nucleic acids), detergent (which disrupts membranes) and dithiothreitol (a reducing agent) may be added at this stage.

The cell lysate contains a vast array of subcellular debris. Centrifugation separates particles by how quickly they sediment: a rapidly spinning rotor pellets particles (e.g. organelles) out of suspension (e.g. in culture medium).

Centrifugation separates particles based on their sedimentation rate.

Simple velocity sedimentation spins particles down: what ends up in the pellet depends on how fast the centrifuge is run, and for how long. Density gradient centrifugation (using gradients of CsCl or sucrose solutions) allows particles to come to equilibrium according to their buoyancy in the gradient. The speed of sedimentation is measured in Svedberg units (1 S = 10⁻¹³ s), the rate of sedimentation scaled for the radial acceleration of the rotor: this is the 'S' in 70S ribosome, etc.
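The definition of the Svedberg unit can be made concrete with a short calculation. This is only a sketch: the function names and the run parameters (rotor speed, radius, observed velocity) are hypothetical, chosen for illustration rather than taken from a real experiment.

```python
import math

SVEDBERG = 1e-13  # one Svedberg unit, in seconds


def angular_velocity(rpm):
    """Convert rotor speed from revolutions per minute to radians per second."""
    return 2 * math.pi * rpm / 60


def sedimentation_coefficient(velocity_m_per_s, rpm, radius_m):
    """Return s = v / (omega^2 * r), expressed in Svedberg units.

    velocity_m_per_s: observed sedimentation velocity of the particle
    radius_m: distance of the particle from the rotor axis
    """
    omega = angular_velocity(rpm)
    return velocity_m_per_s / (omega ** 2 * radius_m) / SVEDBERG


# Hypothetical run: 50,000 rpm, particle 6 cm from the axis,
# moving outwards at 1.15e-5 m/s.
s_value = sedimentation_coefficient(1.15e-5, 50_000, 0.06)
print(f"{s_value:.1f} S")
```

Dividing the velocity by the radial acceleration is what makes the value a property of the particle rather than of a particular centrifuge run.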

Typical centrifugation speeds and times are:

DNA polymerase is found in nuclei, so after lysing our yeast cells, we can spin out the nuclei, break them open with a little detergent, then use a sucrose density gradient to separate the enzymes in the nucleus (left in the supernatant) from the other junk (which will be pelleted down).

Protein purification

Centrifugation will separate organelles from macromolecules, but the supernatant will still contain many, many enzymes other than the one we wish to work on. Consequently, the supernatant will need to be purified to separate these different proteins. Chromatography is the most important method for this, although salting out and dialysis may be used to give an initial cut.

Salting out with ammonium sulfate separates proteins on the basis of their solubility. Adding large quantities of a kosmotropic salt like ammonium sulfate ties up the water that would otherwise solvate the protein surface, so proteins aggregate and drop out of solution, each over a relatively small concentration range, and can then be harvested by centrifugation. This can be used to effect a crude initial fractionation of proteins from a cytosolic extract.

Dialysis separates proteins on the basis of their size. Protein solutions that have been salted out or have come off a chromatography column are usually full of salt, which must be removed. Dialysis is the usual way to do this: the proteins are put in a semi-permeable bag of Visking membrane, which is then put in a big bucket of buffer. Salt and water can cross the membrane, but proteins are retained, hence the protein is desalted. A variation on this is to add the solution to a centrifuge tube containing a dialysis membrane half way down it, and spin it. You can buy different membranes with different sized holes, so you can even achieve a little purification too (small proteins get spun out into the lower fraction, which is discarded).

A dialysed ammonium sulfate cut can then be subjected to several rounds of chromatography. Chromatography is the separation of substances due to their differential solubility in mobile and stationary phases: the retention time (how long a molecule stays stuck to a column) varies, depending on how firmly bound it is to the stationary phase, as the mobile phase washes through the column. The mobile phase may be varied to wash off (elute) strongly bound species. Often, ion exchange chromatography is used for the initial separation, followed by gel filtration and affinity chromatography.

Ion exchange separates proteins by charge: anion exchange groups like DEAE (diethylaminoethyl) are positively charged, so will reversibly bind negatively charged anions. Cation exchange groups like CM (carboxymethyl) are negatively charged, so will bind cations. These groups are covalently bound onto a cellulose or Sephadex (cross-linked dextran) matrix. Proteins can be eluted with increasing salt concentrations, whose ions compete for binding to the stationary phase. The eluate will contain a lot of salt, so needs to be dialysed afterwards.

Cation exchange chromatography would be suitable for the purification of DNA polymerase.

Because DNA polymerase is full of basic residues like arginine, lysine and histidine, at pH 7.2, it is positively charged, so we could use carboxymethylcellulose to purify it.
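The reasoning about charge at pH 7.2 can be sketched with the Henderson-Hasselbalch equation. The pKa values below are approximate textbook figures and are assumptions here (real side-chain pKa values shift with the local environment). Cysteine and tyrosine are ignored for simplicity, and the test peptide is hypothetical.

```python
# Approximate pKa values (assumed, textbook-style figures)
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}  # basic side chains
NEGATIVE_PKA = {"D": 3.9, "E": 4.1}              # acidic side chains
N_TERMINUS_PKA = 9.0
C_TERMINUS_PKA = 2.0


def net_charge(sequence, ph):
    """Estimate the net charge of a peptide (one-letter code) at a given pH."""
    # Fraction of the N-terminal amino group that is protonated (+1)
    charge = 1 / (1 + 10 ** (ph - N_TERMINUS_PKA))
    # Fraction of the C-terminal carboxyl group that is deprotonated (-1)
    charge -= 1 / (1 + 10 ** (C_TERMINUS_PKA - ph))
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            charge += 1 / (1 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1 / (1 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


# Hypothetical lysine/arginine-rich stretch at pH 7.2
print(round(net_charge("GKKRKAHDL", 7.2), 2))
```

A positive result suggests the protein would bind a cation exchanger such as carboxymethylcellulose at that pH; a negative result would point towards an anion exchanger such as DEAE.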

Size exclusion chromatography (a.k.a. gel filtration) uses crosslinked dextran, agarose or polyacrylamide beads with pores that trap proteins below a particular size. Small molecules are retarded, whilst larger molecules fall through more quickly.

Size-exclusion chromatography would be suitable for the purification of DNA polymerase from smaller proteins and salt.

Affinity chromatography separates proteins by specific binding. Antibodies (for known proteins) or substrates (for enzymes) can be used to specifically bind proteins of interest. These can be eluted ionically, or by flooding with other species that compete for binding. For DNA polymerase, cellulose coated with denatured ssDNA will work well.

Affinity chromatography (using ssDNA) would be suitable for the purification of DNA polymerase from all but other DNA binding proteins.

Protein assay

During purification, it is essential to monitor two things: the total quantity of protein in the eluate, and the enzyme activity of the eluate. The whole point of purification is to increase the ratio of the activity to the protein.
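The ratio of activity to protein (the specific activity) is usually tracked in a purification table. The sketch below shows the arithmetic. The step names and all the numbers are made up for illustration and are not measurements from a real purification.

```python
def purification_table(steps):
    """Compute specific activity, fold purification and yield for each step.

    steps: list of (name, total_protein_mg, total_activity_units),
    with the crude extract first.
    """
    _, crude_protein, crude_activity = steps[0]
    crude_specific = crude_activity / crude_protein
    rows = []
    for name, protein_mg, activity_units in steps:
        specific = activity_units / protein_mg
        rows.append({
            "step": name,
            "specific_activity": specific,                   # units per mg
            "fold_purification": specific / crude_specific,  # relative to crude
            "yield_percent": 100 * activity_units / crude_activity,
        })
    return rows


# Hypothetical values, for illustration only
example = [
    ("crude lysate", 1000.0, 2000.0),
    ("ion exchange", 100.0, 1500.0),
    ("ssDNA affinity", 2.0, 1000.0),
]
for row in purification_table(example):
    print(row)
```

Total activity always falls a little at each step (the yield drops), but total protein should fall much faster, so the specific activity rises.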

Identifying a functional enzyme requires an assay. An enzyme is defined by its catalytic activity, so to see if we are actually purifying the enzyme during chromatography, we must measure its activity. DNA polymerase makes dsDNA from ssDNA and dNTPs, and polymerised DNA is insoluble in cold trichloroacetic acid, whereas unincorporated dNTPs stay in solution. Therefore, 3H-dTTP can be used: we measure the radioactivity in the precipitate in cold TCAA after a set time to see how active our enzyme is. Spectrophotometric assays are easier though (e.g. most oxidoreductases can be monitored by their effect on NADH, which has a characteristic absorbance at 340 nm).
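For the spectrophotometric case, the Beer-Lambert law (A = ε·c·l) converts a change in absorbance into a change in concentration. The sketch below uses the commonly quoted molar extinction coefficient of NADH at 340 nm, 6220 M⁻¹ cm⁻¹. The function name and the example reading are hypothetical.

```python
NADH_EXTINCTION_340 = 6220.0  # M^-1 cm^-1, commonly quoted value for NADH at 340 nm


def nadh_rate_micromolar_per_min(delta_a340_per_min, path_length_cm=1.0):
    """Convert a rate of absorbance change at 340 nm into micromolar NADH per minute."""
    molar_per_min = delta_a340_per_min / (NADH_EXTINCTION_340 * path_length_cm)
    return molar_per_min * 1e6


# Hypothetical reading: absorbance falls by 0.0622 per minute in a 1 cm cuvette
print(nadh_rate_micromolar_per_min(0.0622))
```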

Assaying the total mass of proteins in an eluate may be achieved in several ways, depending on the sensitivity and accuracy required:

The UV method is generally used in molecular biology, because it is cheap, quick and non-destructive.

During purification, we would also like to know how many proteins are still contaminating our enzyme preparation. SDS-PAGE (sodium dodecyl sulfate polyacrylamide gel electrophoresis) is used for this.

SDS-PAGE methodology.

Hot mercaptoethanol is used to reduce disulfide bonds and denature the protein. SDS (a detergent) is then used to swamp the protein's natural charge, so that all proteins have the same charge to mass ratio (SDS binds at a roughly constant ratio of about 1.4 g per gram of protein, i.e. around one SDS per two amino acids). When these proteins are loaded onto a gel, and an electric field is applied, the proteins migrate through the pores in the polyacrylamide gel: the gel separates the proteins based on how well they can move, so that the distance migrated falls roughly linearly with the logarithm of their molecular mass alone.
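The log-linear relationship between migration and molecular mass is what lets a ladder of marker proteins calibrate a gel. A minimal sketch, assuming an idealised ladder: the marker masses and relative mobilities (Rf) below are invented so that they fall exactly on a straight line, which real gels only approximate.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def estimate_mass_kda(markers, rf_unknown):
    """Estimate a protein's mass from its relative mobility.

    markers: list of (mass_kda, relative_mobility) for the ladder proteins.
    """
    mobilities = [rf for _, rf in markers]
    log_masses = [math.log10(mass) for mass, _ in markers]
    slope, intercept = fit_line(mobilities, log_masses)
    return 10 ** (slope * rf_unknown + intercept)


# Hypothetical, idealised ladder: each halving of mass adds 0.2 to Rf
ladder = [(200, 0.2), (100, 0.4), (50, 0.6), (25, 0.8)]
print(round(estimate_mass_kda(ladder, 0.5), 1))
```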

At each chromatographic step, the number of protein peaks decreases, both spectrophotometrically, and as measured by SDS-PAGE. The specific activity (activity per unit of protein) also increases, as measured by the cold TCAA assay.

Chromatographic purification of DNA polymerase.

Determining protein and gene sequences

We now have a purified DNA polymerase extract. Now we are in a position to find its amino acid sequence, and thence its gene.

Protein sequencing is performed by the Edman degradation using phenyl isothiocyanate. This can be used to determine the entire 1° structure, but more usually, it is used to get only the N-terminal sequence of the protein, because it is slow and not 100% efficient. Often, it is easier to probe a DNA library for the gene, and then generate the protein sequence from that.

Edman degradation technique.
Edman degradation systematically forms amino acid derivatives from the N-terminal end of the protein, which can be identified by chromatography.

Genomic DNA libraries are made by digesting all the DNA from an organism using restriction enzymes (defensive enzymes from bacteria that cut up DNA at specific sequences).

EcoRI cuts at GAATTC sequences.
Restriction enzymes are dimeric endonucleases that cut DNA at specific (usually palindromic) sequences. This is the EcoRI enzyme from E. coli, which cuts G|AATTC.
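As an illustration of site-specific cutting, the sketch below digests one strand of a linear DNA sequence at every G|AATTC site. It is a simplification: it tracks only the top strand, ignores the sticky ends left on the complementary strand, and the example sequence is made up.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts between the G and the AATTC


def ecori_digest(sequence):
    """Return the top-strand fragments of a linear DNA after EcoRI digestion."""
    sequence = sequence.upper()
    fragments = []
    start = 0
    position = sequence.find(ECORI_SITE)
    while position != -1:
        cut = position + ECORI_CUT_OFFSET
        fragments.append(sequence[start:cut])
        start = cut
        # Look for the next recognition site further along the strand
        position = sequence.find(ECORI_SITE, position + 1)
    fragments.append(sequence[start:])
    return fragments


# Hypothetical sequence with two EcoRI sites
print(ecori_digest("ATGGAATTCCCGTAGAATTCTT"))
```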

The library is made by inserting these millions of fragments of DNA en masse into λ bacteriophage vectors. This allows the genes to be grown up (cloned) in E. coli. If you are interested in what genes are being expressed in a cell, a cDNA library can be made instead by using reverse transcriptase to generate DNA from the total mRNA pool in the cell (cDNA libraries have the additional advantage of not containing introns, so they can be expressed in bacteria directly).

Production of a DNA library.

Libraries are then screened with degenerate probes. If the Edman degradation generated a protein sequence of NH3+-SRVIVHVD, there are many different DNA sequences that could code for this (because of the redundancy of the genetic code). A 'degenerate' mixture of oligonucleotide probes can be synthesised, which would bind to any possible sequences for the DNA polymerase gene.

Note that either of the two sequences below would produce the same SRVIVHVD peptide, so the degenerate mixture must contain both (and many thousands of others!)

5′-AGCAGAGTAATAGTCCACGTTGAC-3′

5′-TCTCGTGTCATTGTACATGTGGAT-3′
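The redundancy can be checked directly. The sketch below builds the standard genetic code, translates the two example sequences, and counts how many distinct DNA sequences encode the peptide. The helper names are illustrative only.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA sequence (5'->3') into one-letter protein code."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def probe_degeneracy(peptide):
    """Count the distinct DNA sequences that could encode the peptide."""
    count = 1
    for amino_acid in peptide:
        count *= sum(1 for encoded in CODON_TABLE.values() if encoded == amino_acid)
    return count


print(translate("AGCAGAGTAATAGTCCACGTTGAC"))
print(translate("TCTCGTGTCATTGTACATGTGGAT"))
print(probe_degeneracy("SRVIVHVD"))
```

Serine and arginine each have six codons, valine four, isoleucine three, and histidine and aspartate two each, so the full mixture for this peptide would contain 6 × 6 × 4 × 3 × 4 × 2 × 4 × 2 = 27,648 different oligonucleotides.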

After growing up the library phages on bacteria, the plaques (areas of dead cells) they produce can be transferred to a nitrocellulose membrane, and washed with the radiolabelled probe mixture. Only plaques that contain the DNA polymerase gene will bind the probe strongly. After washing, the binding can be easily observed by autoradiography onto a photographic film.

Probing a DNA library with a degenerate probe mixture.

So now we have our gene (inside a phage), but we need to find its sequence. To do this, we need a lot of it, so we can use PCR to multiply it up.

PCR is used to exponentially increase a quantity of DNA.
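A rough feel for the exponential increase comes from a one-line model in which every cycle multiplies the number of copies by (1 + efficiency). This is an idealisation: the `efficiency` parameter is an assumption, and real reactions plateau as primers and dNTPs run out.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealised PCR yield: each cycle multiplies the copies by (1 + efficiency).

    efficiency=1.0 means perfect doubling in every cycle.
    """
    return initial_copies * (1 + efficiency) ** cycles


# One template molecule after 30 cycles of perfect doubling
print(pcr_copies(1, 30))
# The same reaction at a hypothetical 90% efficiency per cycle
print(pcr_copies(1, 30, efficiency=0.9))
```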

The polymerase chain reaction (PCR) uses the enzymes involved in cellular DNA replication to multiply up a chosen DNA sequence to many million times its original concentration. DNA polymerase from Thermus aquaticus (Taq polymerase) is used because of its stability at very high temperatures. DNA polymerases require four things:

The procedure is designed to amplify dsDNA exponentially, so two primers are needed: one for the negative DNA strand, and one for the positive. These primers together delimit the portion of the template DNA to be multiplied. The procedure is as follows:

After producing a large amount of DNA, the next objective is often to sequence it. For this, a quantity of ssDNA is prepared from dsDNA, usually by a PCR-like method, but only using a single primer rather than two. ddNTP sequencing uses dideoxynucleotides, which can be incorporated into a DNA strand as usual by DNA polymerase, but due to their lack of the 3′-OH group, they prevent continued polymerisation thereafter. Therefore, they terminate the polymerisation at the base that they complement. ddNTP sequencing requires:

Four reactions are carried out: one contains a trace of ddATP, one a trace of ddTTP, etc. The ddNTPs terminate sequencing at the base they complement. The concentration of the specific ddNTP is carefully set so that polymerisations are terminated at all possible sites along the newly synthesised strand: too high a concentration would terminate all polymerisations at the first complementary base, too low a concentration would cause no terminations at all. So, adding a little ddTTP to the reaction will yield DNA fragments of all lengths up to an A on the parent strand, and likewise for the other three bases. When all four reactions have been carried out, the products are run on a gel: shorter DNA strands migrate to a lower position on the gel, so the bands are effectively sorted by DNA length as you move up the gel. Consequently, in the ddTTP lane, bands will show up wherever an A occurred in the original strand, and so on. The sequence can be read off very simply.
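The logic of reading a four-lane gel can be sketched as a small simulation. It makes simplifying assumptions: the template is written in the order the polymerase reads it (3′→5′), fragment lengths are counted from the end of the primer, and every possible termination product is assumed to be present. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_lanes(template):
    """Return the fragment lengths expected in each ddNTP lane.

    template: parent strand, written in the order the polymerase reads it.
    A ddNTP terminates the new strand wherever its complement occurs in the template.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, template_base in enumerate(template.upper(), start=1):
        lanes[COMPLEMENT[template_base]].append(length)
    return lanes


def read_gel(lanes):
    """Read the gel from the shortest band upwards to get the new strand, 5'->3'."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_lanes("ACGACT")
print(lanes["T"])       # bands in the ddTTP lane mark each A in the template
print(read_gel(lanes))  # sequence of the newly synthesised strand
```

The sequence read from the gel is that of the new strand, so the parent strand is recovered by taking its complement.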

In a useful modification, which can be automated, ddNTPs can be tagged with fluorescent dyes: ddATP can be tagged with fluorescein, etc. The sequencing reactions can be carried out together in the same pot, and the bands identified by their colour rather than by which lane (ddATP, ddTTP, etc.) they came from.

Fluorescent ddNTP sequencing.
ddTTP will terminate the synthesis of a strand complementary to the ACGACT… parental strand when it is incorporated opposite an A. By using low concentrations of ddTTP, some chains are terminated at the first A, some at the second, and some at the third, so we get a range of fragments, whose lengths correspond to the positions of A in the parent strand. The DNA fragments can be separated on the basis of length by denaturing polyacrylamide gel electrophoresis (very similar to SDS-PAGE), which can resolve fragments differing by a single nucleotide. This can be done for all four ddNTPs separately, or together by using ddNTPs that have been linked to fluorescent molecules.

We now have the gene sequence of our DNA polymerase, and can predict the protein 1° structure. We can also use our phage library to make masses of the protein for analysis. If we are lucky, we can make enough of the protein to form a crystal for X-ray crystallography. X-rays are diffracted as they pass through a protein crystal. The diffraction pattern can be number-crunched to produce a 3D electron density map, and further crunching fits the 1° structure into that map.

X-ray crystallography. Human DNA polymerase with possible NLS sequences highlighted.
From protein crystal to diffraction pattern, to protein structure.

Manipulating genes

So now we have the gene sequence and protein structure for DNA polymerase. Now we want to see how it works, and (to answer our original question), where it is located in the cell, and how it gets there.

The process described above is a right pain. Since we now have the gene in a phage, we can now make our lives slightly easier by modifying it:

Fusion proteins can be formed from the gene of a protein of interest and the gene for another protein that is easy to purify or observe.

GFP is an extremely useful protein from a jellyfish that glows green under blue light. It can be used to tag proteins and show where they go; or can be used downstream of a promoter of unknown function, to show where/when the promoter is active.

Green fluorescent protein showing β-barrel.

Cells expressing fluorescent proteins can also be readily automatically sorted by fluorescence-activated cell sorters:

Fluorescence activated cell sorting.

Site-directed mutagenesis can now be used to modify signals and active sites. DNA polymerase gets into the nucleus because it has a nuclear localisation sequence (NLS), which is a stretch of highly basic amino acids that targets it for karyopherin-mediated import into the nucleus. We can identify which of the (probably several) basic stretches on the enzyme is the NLS by modifying them so they are no longer basic. Site-directed mutagenesis uses an oligonucleotide with a mismatch to add specific mutations to a gene. After the oligo is annealed to the DNA we wish to mutate, it is extended using the Klenow fragment of E. coli DNA polymerase I. This allows us to modify our putative nuclear localisation signals, e.g. by converting lysine to isoleucine. If we use a GFP-fusion protein, we can then simply look and see where our tagged DNA polymerase ends up.

5′---GGA GTA AAA AAA TTA ATG AAC---3′

3′---CCT CAT TAT TTT AAT TAC TTG---5′

AAA codes for lysine. By adding an oligo with a deliberate mismatch (TAT), we can introduce a mutation. Klenow fragment extends the gene from the oligo to form the lower partial sequence shown below.

5′---GGA GTA ATA AAA TTA ATG AAC---3′

3′---CCT CAT TAT TTT AAT TAC TTG---5′

After one round of replication, and synthesis of the upper complementary strand, we have a specifically modified gene that codes for isoleucine (ATA) rather than lysine (AAA) at this position.
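The effect of the mismatched oligo can be verified with a short script. It uses the sequences shown above, with the oligo written 3′→5′ so that each position lines up with the base it pairs with. The helper names are illustrative only.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def translate(dna):
    """Translate a coding-strand DNA sequence (5'->3') into one-letter protein code."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mismatch_positions(template, oligo):
    """List 0-based positions where the oligo does not pair with the template.

    Both strands are written so that position i of one pairs with position i of the other.
    """
    return [
        i for i, (t, o) in enumerate(zip(template, oligo)) if COMPLEMENT[t] != o
    ]


wild_type = "GGAGTAAAAAAATTAATGAAC"  # upper strand, 5'->3'
oligo = "CCTCATTATTTTAATTACTTG"      # mutagenic oligo, 3'->5'

# After replication, the new upper strand is the complement of the oligo strand
mutant = "".join(COMPLEMENT[base] for base in oligo)

print(mismatch_positions(wild_type, oligo))
print(translate(wild_type), "->", translate(mutant))
```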

The GFP tag allows us to show very easily that the lysine in the putative NLS is required for nuclear import, by comparing the lysine wildtype to an isoleucine mutant we have created.

Nuclear localisation of lysine NLS wildtype.

Cytoplasmic localisation of isoleucine NLS mutant.

Left of each pair of images shows GFP fluorescence. Right of each pair shows location of nuclei.

Summary

Test yourself

  1. Why is fluorescent ddNTP DNA sequencing more convenient than conventional ddNTP sequencing?

Answers

Peer Review.
This page has been peer reviewed by 1 person.