Transcription

Conventions

Before going any further, it is important to understand a few conventions in molecular biology to do with how we write the sequence of important biopolymers such as RNA and proteins:

Conventions in molecular biology.

RNA (right hand strand) differs from DNA (left hand strand) by its 2′ OH group, and its use of uracil instead of thymine (methyluracil). The 2′ OH groups on the ribose can act as nucleophiles, making RNA much less stable (to alkaline hydrolysis) than DNA. ssRNA is also much more prone to forming complex secondary and tertiary structures; DNA forms essentially bland double helices.

DNA/RNA hybrid duplex.

There are a number of different types of RNA, which play different roles in the cell:

The nucleotide sequence of RNA allows it to carry digital information, as in RNA viruses like HIV; and its base pairing allows it to hybridise with DNA (as seen during transcription by RNA polymerase); with other RNA (as when tRNA binds to mRNA and rRNA in the ribosome); and with itself (as seen in the tRNA cloverleaf). Its catalytic activity allows it to speed up reactions: rRNA and (probably) snRNA are catalytic.

RNA polymerases

RNA polymerase is the enzyme that generates RNA from DNA. Cells contain 20 times more RNA than DNA: in fact, about 5% of the cell is RNA, although only 5% of this 5% is mRNA, because most of the RNA in the cell is rRNA. Since the majority of RNA is rRNA, significantly more RNA is transcribed than translated. This is especially true in eukaryotes, whose mRNA requires processing to remove introns.

The discovery of RNA polymerase is an instructive one. In 1955, Grunberg-Manago and Ochoa found an enzyme catalysing:

(RNA)n + NDP → (RNA)n+1 + Pi.

However, the reaction did not seem to need a template, and the RNA sequence depended on the concentrations of NDPs! This didn't sound like the properties expected of RNA polymerase. In fact, what the group had discovered was polynucleotide phosphorylase running backwards! Oops. The real enzyme was discovered in 1960 by Weiss, Hurwitz, Stevens and Bonner in E. coli.

One somewhat esoteric question that is nonetheless interesting is why the cell shouldn't make polypeptides directly from DNA, rather than going through an RNA intermediate. There are several reasons. The most obvious is historical: the ribosome exists and we're stuck with it. However, as we said above, some RNAs (perhaps 10% of the transcribed genome) are themselves functional (rRNA, tRNA, snRNA, etc.), and, even more importantly, proteins are needed in huge copy numbers compared to the genes that encode them. Transcription allows amplification because mRNA can be worked on by several ribosomes at once, allowing greater rates of protein production. The lack of amplification of rRNA results in the cell having to have many copies of its rRNA genes to produce sufficient quantities for the synthesis of ribosomes.

The primary gene products of RNA polymerase (in eukaryotes) are:

The interactions of RNA after (and indeed before) transcription are now understood to be absolutely fundamental to the functioning of the cell: RNA is no longer considered to be a bland intermediate between DNA and protein, but rather an active participant in its own synthesis, processing and regulation.

RNA polymerase enzymes are very large:

Their speed of transcription is around 50 b s−1, meaning an mRNA for an average protein takes about 20 s (prokaryote) or 3 min (eukaryote) to transcribe - longer in eukaryotes, despite their proteins being of similar size, because they contain introns.
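The arithmetic behind these figures can be checked with a short sketch. It assumes a constant rate of 50 nt per second and borrows the lengths quoted in the Splicing section further down (a 1200 nt message for a 400 aa protein, and a c. 10 000 nt eukaryotic primary transcript). The function name is made up for illustration.

```python
def transcription_time_s(length_nt, rate_nt_per_s=50):
    """Seconds needed to transcribe length_nt at a constant rate."""
    return length_nt / rate_nt_per_s

prokaryote_s = transcription_time_s(1200)    # 24.0 s, i.e. "about 20 s"
eukaryote_s = transcription_time_s(10_000)   # 200.0 s, i.e. just over 3 min

print(prokaryote_s, eukaryote_s / 60)
```

The protein is the same size in both cases; only the length of the transcript that has to be made differs.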

There are two main classes of RNA polymerases: bacteriophage T7's RNA polymerase is a stripped down, single-subunit mutant DNA polymerase, as are plastid and mitochondrial RNA polymerases. All other RNA polymerases are multi-subunit enzymes, unrelated to DNA polymerase. However, despite very different origins, their core mechanisms are similar (analogous).

RNA polymerase is a DNA dependent RNA polymerase (DdRp). We have already met a DdDp (DNA polymerase), and you may have come across reverse transcriptase too (which can act as a RdDp). The enzymes are similar in some ways, but differ in many others: RNA polymerase has no need for double stranded primer, and obviously requires NTPs, not dNTPs, as monomers. The most important difference is that the hybrid DNA/RNA duplex synthesised by RNA polymerase does not persist: the RNA is displaced and the DNA duplex reforms. Both RNA and DNA polymerase require Mg2+ as a cofactor. The error rate of RNA polymerase is much higher: 10−4.
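To get a feel for what an error rate of 10−4 means in practice, here is a rough sketch. It assumes the rate is per nucleotide incorporated, and that errors are independent and equally likely at every position (both simplifications); the 1200 nt length is the typical message size used elsewhere on this page, and the function names are illustrative.

```python
def expected_errors(length_nt, error_rate=1e-4):
    """Mean number of misincorporated nucleotides per transcript."""
    return length_nt * error_rate

def fraction_error_free(length_nt, error_rate=1e-4):
    """Probability that every position in the transcript is correct."""
    return (1 - error_rate) ** length_nt

print(expected_errors(1200))       # roughly 0.12 errors per transcript
print(fraction_error_free(1200))   # roughly 0.89 of transcripts are perfect
```

On these assumptions, most transcripts of this length contain no mistakes at all.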

Bacterial RNA polymerase showing core subunits plus sigma

All multi-subunit RNA polymerases have 5 core subunits. The (true) Bacteria have an additional σ factor subunit that aids regulation and binding to DNA.

Eukaryotic RNA polymerase showing common and core subunits

Eukaryotic RNA polymerases have five core subunits plus five common subunits.

Eukaryotic RNA polymerase II showing common and core subunits and the additional two variable subunits (4 and 7)

Eukaryotes have three varieties of RNA polymerase. These differ slightly from one another: RNA polymerase II has the five core subunits plus the five common subunits, but also has two 'variable' subunits specific to RNA polymerase II only. Note that RNA polymerase II also has a 'tail', which is involved in RNA processing and the initiation of transcription.

Archaeal RNA polymerase showing common and core subunits similar to RNA polymerase II

Archaeal RNA polymerase is more similar to eukaryotic RNA polymerase than to bacterial, and is particularly similar to RNA polymerase II. The archaeal system may have evolved by 'stripping down' the over-complex system of eukaryotes.

Bacterial RNA polymerases

Bacterial RNA polymerases are best characterised in Thermus aquaticus. The five subunits are:

Bacterial RNA polymerase viewed from the 'side' and without the σ factor bound.

RNA polymerase has a number of features that help it bind and process DNA:

Rudders, flaps and clamps help RNA polymerase bind to DNA.

The clamp keeps the polymerase anchored to DNA, whilst the flap ensures that the mRNA is retained. The rudder (and associated structures) prevent the DNA/RNA hybrid duplex from persisting. A zipper (not shown, but close to the rudder) helps reform dsDNA. Note that DNA does not enter the polymerase's 'mouth' directly: it is held sideways (like a rose between the teeth), with a sharp bend to the left as it exits the polymerase. The mRNA is believed to leave via the back of the polymerase; it is not yet clear how NTPs enter the active site: they may come in the same channel as the DNA or maybe via a secondary channel in the β lobes.

Initiation

The promoter consensus sequence (below) is highly conserved in bacteria. This is because the RNA polymerase holoenzyme binds directly to DNA via its σ subunit (which is only found in bacteria). However, even with σ, abortive initiation occurs: tiny pieces of RNA (6 nucleotides long) are made as the polymerase jiggles about on the promoter.

5′      −35      −10   1     3′

5′---TTGACA---TATAAT---NNN---3′
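As a toy illustration of how one might look for this consensus in a sequence, the sketch below searches for an exact TTGACA followed, after a short spacer, by an exact TATAAT. The spacer length (16 to 18 bp) is an assumption that is not given in the text above, and real promoters rarely match the consensus perfectly, so this is a simplification rather than a usable promoter finder. The function name is hypothetical.

```python
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"

def find_consensus_promoters(dna, min_gap=16, max_gap=18):
    """Return (start of -35 box, start of -10 box) for each exact match.

    min_gap and max_gap are the assumed spacer lengths between the boxes.
    """
    hits = []
    start = dna.find(MINUS_35)
    while start != -1:
        box35_end = start + len(MINUS_35)
        for gap in range(min_gap, max_gap + 1):
            box10_start = box35_end + gap
            if dna[box10_start:box10_start + len(MINUS_10)] == MINUS_10:
                hits.append((start, box10_start))
        start = dna.find(MINUS_35, start + 1)
    return hits

example = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GCGCGCA"
print(find_consensus_promoters(example))
```

Only the strand as written is scanned; a promoter on the other strand would need the reverse complement to be searched as well.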

Different σs have different affinities for variant promoters.

Note that the factors listed above correspond to different modes of life: the σ system allows switching between several different strategies: rapid growth, stress metabolism, motility, sporulation, etc. There are also anti-σ factors: anti-σ70 (Rsd) is synthesised as cells enter stationary phase. This inactivates σ70, allowing σS to recruit RNA polymerase.

Elongation

Elongation begins when RNA polymerase makes about 10 nt of RNA. This is long enough for the mRNA to displace σ and tuck behind the 'flap' region. The RNA polymerase then dissociates from σ, and the clamp closes, trapping DNA. The flap also closes, trapping mRNA. Elongation occurs by incorporation of ATP, GTP, CTP and UTP. A temporary (10 bp) RNA/DNA hybrid is formed, which is prevented from becoming permanent by the 'rudder', which wedges into the nascent helix, dislodging the RNA, so the ssDNA can re-anneal with its complement.

Termination

Termination occurs by a very neat system of a self-complementary sequence followed by UUUU. The self-complementary sequence forms an RNA stem-loop hairpin and makes RNA polymerase drop off by wedging open the flap. The self-complementary sequence is CG rich because C≡G pairing is strong; and the sequence is followed by UUUU because A=U pairing between RNA and DNA is weak. In bacteria, many terminators use the ρ-factor instead, a hexameric ring that binds the terminator on the RNA and yanks it out of RNA polymerase.
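The 'self-complementary sequence then UUUU' rule can be sketched as a simple pattern check. The version below assumes a perfectly paired stem of fixed length, a fixed loop length and a run of exactly four U residues directly after the stem; these parameters and the function names are illustrative choices, and real terminators are more variable.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return rna.translate(COMPLEMENT)[::-1]

def looks_like_terminator(rna, stem_len=6, loop_len=4, u_run=4):
    """True if rna contains stem, loop, matching stem, then a run of U."""
    window = 2 * stem_len + loop_len + u_run
    for i in range(len(rna) - window + 1):
        left = rna[i:i + stem_len]
        right_start = i + stem_len + loop_len
        right = rna[right_start:right_start + stem_len]
        tail_start = right_start + stem_len
        tail = rna[tail_start:tail_start + u_run]
        if right == reverse_complement(left) and tail == "U" * u_run:
            return True
    return False

hairpin = "AA" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUU" + "A"
print(looks_like_terminator(hairpin))
```

The two arms of the example stem (GCCGCC and GGCGGC) are reverse complements of one another, which is what lets the transcript fold back on itself.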

RNA polymerase cycle

RNA polymerase cycle.


Clockwise from top: core subunits and σ associate to form holoenzyme, which binds DNA (σ made partially transparent to show what happens inside) to form the 'closed complex'. σ melts the DNA, making a short hairpin loop and exposing the template DNA to the active site ('open complex'). σ dissociates (after a short round of abortive transcription), and the clamps and flaps lock the nucleic acids in position. mRNA is synthesised. The termination sequence causes the mRNA to form a stem-loop hairpin, which wedges open the flap and causes the mRNA to dissociate from the enzyme. The free core then moves off to find another promoter.

Eukaryotic RNA polymerases

Eukaryotes have three RNA polymerases:

All three have the core subunits: αIαII are 3, 11 in eukaryotes (heterodimer), ββ′ become 2, 1 and ω's equivalent is 6. All three also have five common extra subunits: 5, 8, 9, 10 and 12, and a variable number of specific subunits.

RNA polymerase I makes rRNA (except the 5S rRNA). It has 4 extra units. RNA polymerases can be investigated, and individually targeted, using specific inhibitors, the most important of which is α-amanitin, a toxic cyclic peptide extracted from Amanita phalloides (the death cap). RNA polymerase I has a very low sensitivity to α-amanitin. rRNA genes exist in tandem repeats, so when they are transcribed by a battery of RNA polymerase enzymes, they form 'Christmas trees': each branch being a transcript with a nascent ribosome beginning to form at its tip.

RNA polymerase II makes mRNA, some snRNAs and the snoRNAs required for rRNA processing. It has two extra units (4, 7) added to the core and common subunits, and exists in about 40 000 copies per cell. Its sensitivity to α-amanitin is very high. The archaeal RNA polymerase is homologous to RNA polymerase II.

RNA polymerase III makes tRNA, the 5S rRNA, and some snRNAs and other small functional RNAs. Its sensitivity to α-amanitin is moderate.

The three forms of RNA polymerase in eukaryotes allow differential control of transcription at the enzyme level. The mixture of conserved polypeptides and unique subunits allows different promoters to be recognised. However (like much in eukaryotes), it smacks of over-engineering: the system still needs further regulation to differentially transcribe e.g. different mRNAs at different levels.

Transcription in eukaryotes differs markedly from that in bacteria:

Transcription factors

Since the polymerases cannot bind DNA directly, each polymerase needs a different set of general transcription factors (imaginatively titled TFI, TFII, TFIII) to help it bind DNA. Promoters are less conserved in eukaryotes than bacteria, because of the variability of the TF complexes. Many of these DNA-binding proteins have zinc finger motifs, where a helix-turn-sheet is held into a 'finger' by coordination to a zinc ion (via four histidine and cysteine residues). The finger fits into the major groove of the DNA.

Zinc fingers: three fingers coil around a DNA helix's major groove. An isolated finger shows coordination of two cysteine and two histidine residues to a zinc ion, to form the 'finger' structure.

The transcription factors of RNA polymerase II bind in a set sequence. TFIID binds the DNA's TATA box via its integral TBP (TATA binding protein). The TATA binding protein severely deforms DNA, melting the helix. The remaining parts of TFIID are termed TAFs - TBP associated factors.

TATA binding protein complexed to DNA helix: note the severe bending of the DNA, almost through 90°.

The remaining TFs are then recruited sequentially, with TFIIB binding one of the promoter sequences, RNA polymerase and TFIID. RNA polymerase is then recruited. TFIIH melts the DNA using ATP and phosphorylates the tail of the polymerase, causing it to dissociate from the TF complex and begin elongation. When RNA polymerase II starts transcription, the TFs are mostly left behind on the promoter.

Initiation in RNA polymerase II

The transcription factors bind to the promoters, which are not all necessarily on the same strand of the DNA as the transcribed region. Nor are all the promoter sequences necessarily present in every gene. The promoter(s) decide which strand is read (remember that DNA has two strands: genes appear on both the 'upper' and 'lower' strand, in extreme cases even overlapping). The TATA box (TATA{AT}A{AT}) binds TFIID, whilst the BRE element ({GC}{GC}{GA}CGCC) is bound by TFIIB.

5′   −35    −30     1    30   3′

5′---BRE---TATA---INR---DPE---3′

Note that some promoter sequences are within the transcribed portion of the gene.
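The bracketed consensus notation used for the TATA box and BRE element translates almost directly into regular expressions. The sketch below assumes that {XY} means 'either X or Y at this position'; the helper names are invented for illustration, and only the strand as written is scanned.

```python
import re

def consensus_to_regex(consensus):
    """Turn {XY} (assumed to mean X or Y) into a regex character class."""
    return re.compile(consensus.replace("{", "[").replace("}", "]"))

TATA_BOX = consensus_to_regex("TATA{AT}A{AT}")
BRE = consensus_to_regex("{GC}{GC}{GA}CGCC")

def find_elements(dna):
    """Start positions of non-overlapping BRE and TATA matches in dna."""
    return {
        "BRE": [m.start() for m in BRE.finditer(dna)],
        "TATA": [m.start() for m in TATA_BOX.finditer(dna)],
    }

print(find_elements("AAGGACGCCTATAAATACC"))
```

In the made-up example sequence, the BRE match (GGACGCC) sits immediately upstream of the TATA match (TATAAAT), mirroring the order in the diagram.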

Eukaryotic RNA polymerase has a 'tail', which is phosphorylated by TFIIH. This starts the polymerase off. Histone acetylase and chromatin remodelling machines precede the polymerase, doing something to the nucleosomes that (like during DNA replication) is still not well understood.

Chromatin remodelling occurs upstream of transcription.

Enhancers and gene regulatory regions

In eukaryotes, it is not just the promoter that regulates gene expression. Gene regulatory proteins bind enhancers and recruit TFs via mediator proteins, which allows DNA to loop over large distances (1 to 20 kbp). The gene's promoters and enhancers create a 'gene regulatory region' that can be much larger than the gene itself, with many enhancers (both negative and positive) modifying recruitment of RNA polymerase, and allowing very fine tuning of gene expression.

In eukaryotes, DNA distant from the gene being transcribed can influence its expression.

This region can be investigated by putting the regulatory region upstream of reporter genes (lacZ or GFP), and then investigating when (in embryology) and where (in what tissues) the regulatory region is turned on. Detection of lacZ needs autoradiography or fluorescent antisense probe, whereas GFP just needs blue light.

mRNA processing

Eukaryotes perform a great deal of RNA processing. This seems more likely to be the ancestral character state, and bacteria have largely dispensed with RNA processing (for mRNA at least), allowing polycistronic genes, where several related genes are expressed together on a single mRNA with more than one start codon (upper diagram). Eukaryotic mRNA (lower diagram) only contains one coding sequence. Archaea have some RNA processing (self-splicing introns), but much less than eukaryotes.

Polycistronic mRNAs from bacteria have several start codons.

Eukaryotes only have monocistronic mRNAs.

mRNA is produced by RNA polymerase II in eukaryotes. The primary transcript (pre-mRNA) is rapidly modified. Much of this occurs at the same time as transcription:

The roles of this modification are several:

5′ cap

The first modification to mRNA in eukaryotes is the addition of a 5′ cap, which is synthesised in four steps: Pi is cleaved from the end of the pre-mRNA transcript by phosphatase; guanyl transferase then uses GTP to add guanosine to the end of the transcript (using an unusual 5′ to 5′ linkage); the guanosine is then 7-methylated; and zero or more riboses at the 5′ end may also be methylated.

An mRNA cap has methylguanosine attached to the rest of the mRNA by an unusual 5 prime to 5 prime linkage.

Capping only occurs on mRNA. Recombinant loci with an RNA polymerase II promoter but an RNA polymerase I or III sequence are capped, hence the capping is performed by factors associated only with RNA polymerase II.

The cap binding protein complex (CBC) binds the cap: this is needed when the mRNA is exported from the nucleus.

Capped mRNA binds several RNP and protein cofactors, including the CBC.

The 5′ cap is bound by the small subunit of the ribosome. It also prevents degradation of the pre-mRNA in the nucleus. In humans (but not yeasts) RNA polymerase carries on transcribing after the primary transcript has been cleaved. Because this extra transcript has no cap, it is degraded, and eventually the polymerase terminates.

The transcripts that RNA polymerase makes as it (slowly) terminates are degraded because they lack a 5 prime cap.

3′ poly-A tail

The poly-A tail, which almost all mRNAs carry, is added post-transcriptionally: it is not encoded in the gene itself. The pre-mRNA transcript has an AAUAAA sequence that is recognised by CPSF (cleavage and polyadenylation specificity factor) and a CA sequence that is an endonuclease binding site.

5′       c.20 nt           3′

5′---AAUAAA---CA---{GU}n---3′

The poly-A tail is added in two main steps. AAUAAA is bound by cleavage and polyadenylation specificity factor (CPSF), which recruits cleavage stimulating factor (CStF), and the CStF endonuclease cleaves the transcript at the CA site. Poly-A polymerase then adds c. 200 adenosine residues, which are bound by poly-A binding protein.

Addition of poly-A tail to mRNA.
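The cleave-then-extend logic can be mimicked on a string. This sketch assumes that cleavage happens immediately after the first CA found within a short window downstream of AAUAAA (the window size of 30 nt is an arbitrary choice based on the 'c. 20 nt' spacing in the diagram), and that exactly 200 adenosines are added. The function name is hypothetical.

```python
def polyadenylate(pre_mrna, tail_len=200, max_gap=30):
    """Cleave after the first CA downstream of AAUAAA and add a poly-A tail.

    Returns None if either the AAUAAA signal or a nearby CA is missing.
    """
    signal = pre_mrna.find("AAUAAA")
    if signal == -1:
        return None
    search_from = signal + len("AAUAAA")
    ca = pre_mrna.find("CA", search_from, search_from + max_gap + 2)
    if ca == -1:
        return None
    return pre_mrna[:ca + 2] + "A" * tail_len

pre = "GGC" + "AAUAAA" + "GCGU" * 5 + "CA" + "GUGUGUGU"
mature = polyadenylate(pre)
print(len(pre), len(mature))
```

Everything downstream of the CA site, including the GU-rich region, is discarded; the tail itself never appears in the input, just as it is not encoded in the gene.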

The tail is required for export through nuclear pores, and prevents degradation in the cytoplasm (the half-life of poly-A mRNA is c. 10 hr).

Splicing

Transcripts in most eukaryotes are c. 10 000 nt, but polypeptides are c. 400 aa = 1200 nt. Almost 90% of the transcript is not translated, i.e. pre-mRNA is generally 8800 bases longer than a typical 1200 nt mRNA. What happens to all this excess RNA? Most eukaryotic genes contain internal 'junk' (discovered in 1977). Exons (which are expressed, and exit the nucleus) are separated by introns (100 - 100 000 nt long), which are removed from the pre-mRNA and destroyed without ever leaving the nucleus. The number of introns is species-specific: yeasts have very few, whereas some human genes can have 50 in a single gene! Typically there are four introns between five exons, but the introns (as mentioned above) are generally much larger than the exons. Introns must be spliced out from between the exons before the mRNA can leave the nucleus.

5′---E1---I1---E2---I2---E3---I3---E4---I4---E5---3′

As mRNA is transcribed, it is complexed by five small nuclear ribonucleoproteins (snRNPs, also known as 'U' particles). RNAse treatment of pre-mRNA degrades it into heterogeneous ribonucleoprotein (hnRNP) fragments. These proteins have a common RNA binding motif (below). Some of these hnRNPs are U-proteins, directly involved in splicing out introns; others are important for labelling introns and exons.

H3N+---{KR}G{FY}{AG}{FY}VX{FY}---…---COO−

Sufferers of systemic lupus erythematosus (SLE) produce autoimmune anti-snRNP antibodies, which can be used to reveal islands of snRNP in the nucleus, which are known as Cajal bodies.

Splice sequences are quite variable across species, but in general:

5′   donor      branch  acceptor  3′

5′---AGGU------UAAC-----AGG---3′

5′---AGG---3′

The intron sequence is spliced out of the transcript to generate the processed mRNA. This is the human consensus sequence.
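The consensus above can be turned into a toy splicing function. The sketch treats an intron as anything that starts with GU directly after an exonic AG, contains a UAAC branch sequence, and ends with AG directly before an exonic G. This is a deliberate oversimplification of the consensus (real splice-site recognition is much fuzzier and depends on the snRNPs), and the names are illustrative.

```python
import re

# exon...AG | GU ... UAAC ... AG | G...exon
INTRON = re.compile(r"(?<=AG)GU[ACGU]*?UAAC[ACGU]*?AG(?=G)")

def splice(pre_mrna):
    """Remove every stretch matching the simplified intron consensus."""
    return INTRON.sub("", pre_mrna)

pre = "CCAG" + "GUAAGUCCUAACUUUCCCAG" + "GCC"
print(splice(pre))
```

The flanking exon sequences are joined to give AGG at the junction, as in the lower line of the diagram.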

The U-particles combine with one another in a variety of ways to form spliceosomes. In the most typical splicing reaction, U1 binds the 5′ donor. U2 then binds the 3′ acceptor and branch sequence. U4 and U6 act together to form a lariat, and then U5 ligates the exons together. Note that U2, U5 and U6 remain attached to the lariat in U-catalysed splicing. This helps mark the lariat for degradation.

Type II splicing.

Tetrahymena (a ciliate) and some Archaea have self-splicing introns. The snRNP system described above probably evolved from (type-II) self-splicing systems. Note that there are several other varieties of splicing, using the U particles whose existence you might be wondering about ("where's U3?"…).

U based splicing systems evolved from RNA self-splicing systems.

The donor and acceptor splice junctions are correctly paired off in multi-intron genes because splicing is concurrent with transcription. Some specificity may also be allowed by the hnRNPs and SR-rich proteins that bind to the pre-mRNA. Capping, splicing and poly-A factors are associated with the tail of RNA polymerase II: the enzyme complex is a factory, both producing mRNA and processing it ready for export.

Splicing seems to be an expensive waste of time for the most part; however, under certain circumstances, it has advantages in producing more than one gene product from a single stretch of DNA. In B-lymphocytes, early in development, the cell produces membrane-tethered immunoglobulin (Ig) receptors by transcription and splicing of the entire gene:

Long transcript:

5′---Ig---donor---stop---acceptor---hydrophobic---stop---3′

After splicing:

5′---Ig----hydrophobic----stop----3′

Protein:

H3N+-Ig-hydrophobic-COO−

The longer transcript splices out the first stop codon, producing an antibody with a hydrophobic carboxy terminus, which allows it to be tethered into the cell membrane. However, after the cell has recognised an antigen, it begins to produce soluble antibodies instead. Recognition of antigen increases the amount of CStF in the cell, producing a shorter transcript of the gene, which lacks the intron acceptor sequence:

Short transcript:

5′---Ig---donor---stop---3′

No splicing, because of lack of acceptor junction, so the protein is:

H3N+-Ig-COO−

The shorter transcript produces a protein with no hydrophobic peptide, which is plasma-soluble.
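The logic of the two transcripts can be followed with a symbolic model in which each transcript is just a list of labelled segments. The segment labels come from the diagrams above; the two helper functions are invented for illustration and ignore all real sequence detail.

```python
SITES = {"donor", "acceptor"}

def splice_segments(transcript):
    """Drop everything from 'donor' to 'acceptor' if both are present."""
    if "donor" in transcript and "acceptor" in transcript:
        d = transcript.index("donor")
        a = transcript.index("acceptor")
        return transcript[:d] + transcript[a + 1:]
    return list(transcript)   # no acceptor: nothing can be spliced out

def translate_segments(mrna):
    """Read protein-coding segments up to the first stop codon."""
    protein = []
    for segment in mrna:
        if segment == "stop":
            break
        if segment not in SITES:
            protein.append(segment)
    return protein

long_transcript = ["Ig", "donor", "stop", "acceptor", "hydrophobic", "stop"]
short_transcript = ["Ig", "donor", "stop"]

print(translate_segments(splice_segments(long_transcript)))
print(translate_segments(splice_segments(short_transcript)))
```

The first stop codon is only ever read when the acceptor site is missing, which is what switches the product from membrane-bound to soluble.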

We've mostly been speaking about mRNA modification, but the other types of RNA are even more heavily modified (edited) than mRNA…

tRNAs are c. 80 bases long, and in eukaryotes, there are at least 31 in the cytoplasm, 30 in chloroplasts and 22 in mitochondria. All tRNAs have an acceptor stem with a 3′ CCA sequence, which (covalently) binds amino acids. They all also have a triplet anticodon, which (transiently) binds mRNA. The general structure is often termed a 'cloverleaf', which is true of the secondary structure, but somewhat misleading for the tertiary…

tRNA for phenylalanine.

tRNA has 'L'-shaped tertiary structure.

Modification of the tRNA anticodon produces gibberish proteins. Chemical modification of tRNAcys to tRNAala also produces non-functional proteins. The anticodon/amino-acid pairing is essential to correct translation of mRNA.

tRNA contains unusual nucleosides.

Wybutosine.

Chemical modification of tRNAs is performed by proteins, unlike mRNA splicing. One major effect of modification is to allow wobble. Wobble is the non-Watson-Crick pairing at the third base of a codon, which means that fewer anticodons (tRNAs) are needed than the 64 possible codons:

There is generally more wobble in bacteria (see bracketed bases below). Mitochondria even have wobble at the second base of the codon.

Wobble codon base    Possible anticodon bases
U                    (A), G, I
C                    G, I
A                    U, (I)
G                    C, (U)
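The table can be read the other way round, to see how many codons a single anticodon base can serve. The sketch below encodes the table as two dictionaries, treating the bracketed entries as the extra pairings mentioned for bacteria (that reading of the brackets is an assumption); the names are illustrative.

```python
# Third (wobble) codon base -> anticodon bases that can pair with it.
STANDARD = {"U": {"G", "I"}, "C": {"G", "I"}, "A": {"U"}, "G": {"C"}}
BRACKETED = {"U": {"A"}, "A": {"I"}, "G": {"U"}}

def codon_bases_read(anticodon_base, include_bracketed=False):
    """Codon third bases that a given anticodon base can pair with."""
    readable = {c for c, partners in STANDARD.items()
                if anticodon_base in partners}
    if include_bracketed:
        readable |= {c for c, partners in BRACKETED.items()
                     if anticodon_base in partners}
    return readable

print(sorted(codon_bases_read("I")))
print(sorted(codon_bases_read("I", include_bracketed=True)))
```

An anticodon with inosine (I) at the wobble position reads codons ending in C or U, plus A if the bracketed pairing is allowed, so one tRNA can cover two or three codons.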

In trypanosome mitochondria, guide RNA (gRNA) guides excision and addition (insertion/deletion or indel) of bases (mostly U) to mRNA.

gRNA above; mRNA below:

3′---UU AUCUCAACCGACCA---5′

5′---AAGUAGAG~~GGCUGGU---3′

Note that the G is displaced and the AA causes stretching, so the edited RNA looks like:

5′---AA|UAGAGUUGGCUGGU---3′

With the G excised and UU inserted in the stretch.
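One way to convince yourself that the edit is the right one is to check that the edited mRNA pairs with the gRNA at every position, whereas the unedited one cannot. The sketch below uses the sequences shown above with the gap and marker characters removed, and counts only Watson-Crick pairs (G·U pairs are ignored, which is a simplification); the function name is made up.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def fully_paired(grna_3to5, mrna_5to3):
    """True if the strands are the same length and pair at every position."""
    if len(grna_3to5) != len(mrna_5to3):
        return False
    return all((g, m) in PAIRS for g, m in zip(grna_3to5, mrna_5to3))

grna = "UUAUCUCAACCGACCA"       # written 3' to 5', as in the diagram
unedited = "AAGUAGAGGGCUGGU"    # written 5' to 3'
edited = "AAUAGAGUUGGCUGGU"     # G removed, UU inserted

print(fully_paired(grna, unedited), fully_paired(grna, edited))
```

Because the gRNA is written 3′ to 5′ and the mRNA 5′ to 3′, the two strings can be compared position by position without reversing either of them.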

The nucleolus

The nucleolus makes and/or processes rRNA, tRNA, telomerase RNA and some snRNAs. The nucleolus is not membrane-bound: it's 'just' a patch in the nucleus, with three main structures under TEM:

  1. Pale-staining components (loops of nucleolar-organising DNA).
  2. Fibrillar components (45S transcripts).
  3. Granular background (nascent ribosomes).

rRNA in eukaryotes exists as tandem repeats of genes. In humans, ten chromosomes possess rRNA genes, each with tens of copies. These chromosomes are termed nucleolar organisers. The number of tandem repeats tends to increase over time by errors during meiotic crossing over. After mitosis, the patches of chromatin and RNP from each of these chromosomes associate to form the nucleolus. E. coli has seven copies of its rRNA gene; these are not present as tandem repeats, but are instead found at seven different locations on the chromosome.

(Most of the) rRNA in eukaryotes is synthesised as a 45S precursor by RNA polymerase I. This is cleaved into three pieces: 18S, 5.8S and 28S.

45S rRNA is cleaved into the three main rRNAs.

snoRNAs then act as gRNAs, directing RNA-modifying enzymes to specific parts of the pre-rRNA. Many modifications are made: mostly methylation and isomerisation of uridine to pseudouridine.

Uridine and pseudouridine: note the pyrimidine ring has been 'flipped' along the diagonal.

Ribosome assembly occurs by spontaneous assembly of proteins around the RNA cores, which will occur even in vitro. The proteins required are imported into the nucleus via the nuclear pores. The small subunit takes about 30 min from binding of RNA polymerase to export into cytoplasm; the large subunit takes about 60 min. The two subunits are exported fully formed through the nuclear pores, but do not assemble without mRNA.

Why have a nucleolus? Well, rRNA (and other RNAs) have no translational amplification, unlike proteins. Consequently, the cell needs many copies to provide enough ribosomes. There are 200 copies of the 45S rRNA gene in Homo sapiens, plus 2000 copies of the 5S rRNA gene. Similar subnuclear structures (GEMS and Cajal bodies) may be snRNA factories.

If you hadn't realised, the obvious place to go from here is translation.

Summary

Test yourself

  1. How does the initiation of transcription differ between eukaryotes and bacteria?
  2. How do RNA-RNA interactions guide the processing of RNAs?
  3. Why is the error rate of RNA polymerase so much higher than for DNA polymerase? Surely this must be a catastrophe waiting to happen?

Answers

Bibliography

Peer Review.
This page has been peer reviewed by 2 people. Thanks to Wayne Decatur for his correction.