Genomes

Organisation of DNA

The organisation of DNA in a nucleus or nucleoid is critically dependent on the nature and topology of the nucleic acid. Nucleic acids may occur in double or single stranded forms, most typically dsDNA (double stranded DNA) and ssRNA (single stranded RNA). Single stranded pieces of nucleic acid can pair with themselves to form hairpin loops, cruciforms, internal loops and bulges. These are important in the termination of transcription and in recognition by restriction enzymes (palindromic sequences pair readily with themselves). The secondary and tertiary structures of RNAs can give them catalytic activity.
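As a rough illustration of what 'palindromic' means for a nucleic acid, the sketch below checks whether a DNA sequence equals its own reverse complement. The function names are made up for this example, and the EcoRI recognition site GAATTC is used only as a familiar test case.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq: str) -> bool:
    """True if the sequence reads the same 5'->3' on both strands."""
    return seq.upper() == reverse_complement(seq)


print(is_palindromic("GAATTC"))   # EcoRI recognition site
print(is_palindromic("GATTACA"))  # an arbitrary non-palindromic sequence
```

A sequence that passes this test can, when single stranded, fold back and pair with itself, which is the basis of the hairpins and cruciforms described above.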

5S rRNA structure
Predicted 2° structure for human 5S rRNA. Note the internal loops where pairing does not occur, two hairpin loops, and a large bulge in the centre of a 3-way cruciform. Note also that it is actually rather difficult to predict the folding of 5S rRNA with accuracy, as it has unusual non-Watson-Crick basepairs, so this structure probably doesn't correspond too marvellously with reality!

Nucleic acids also exist in at least three helical isomeric forms: A, B and Z.

A-form in dsRNA duplex (left) from phenylalanine tRNA; A-form in an RNA/DNA hybrid (middle) - note the wider, more closely packed helix, with equal minor and major groove, compared to B-form dsDNA. B-form in dsDNA duplex (right) - note the narrower, less closely packed helix (compared to the short hybrid strand), with distinctly wider major groove.

In the A and (particularly) the B forms, it is easier to distinguish bases in the major groove: in the minor groove, a C≡G cannot be distinguished from a G≡C, nor an A=T from a T=A.

Hydrogen bond acceptors and donors in a GC base-pair.

Hydrogen bond acceptors and donors in an AT base-pair
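The difference between the grooves can be made concrete with a small lookup table. This is a minimal sketch using the usual textbook patterns of groups exposed in each groove (they are not taken from the figures on this page); the dictionary and function names are invented for the example.

```python
# Groups exposed on the groove edge of each base pair, read across the pair:
# A = hydrogen bond acceptor, D = hydrogen bond donor,
# H = non-polar hydrogen, M = methyl group.
# These are the standard textbook patterns, used here as an assumption.
MAJOR_GROOVE = {"AT": "ADAM", "TA": "MADA", "GC": "AADH", "CG": "HDAA"}
MINOR_GROOVE = {"AT": "AHA", "TA": "AHA", "GC": "ADA", "CG": "ADA"}


def distinguishable(groove: dict, pair1: str, pair2: str) -> bool:
    """True if the two base pairs present different patterns in this groove."""
    return groove[pair1] != groove[pair2]


for name, groove in (("major", MAJOR_GROOVE), ("minor", MINOR_GROOVE)):
    unique = len(set(groove.values()))
    print(f"{name} groove: {unique} distinct patterns for 4 base pairs")

print(distinguishable(MAJOR_GROOVE, "GC", "CG"))
print(distinguishable(MINOR_GROOVE, "GC", "CG"))
```

Under these assumptions, all four base pairs look different from the major groove, whereas the minor groove only separates A/T pairs from G/C pairs.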

RNA is capable of forming complex secondary and tertiary structures.

An RNA hairpin, showing the unusual base-pairing that can occur when the regularity of a double helix is disturbed by bending.

The self-splicing type I intron is a catalytic RNA (ribozyme) with complex 2° and 3° structures. Secondary structures include A-helices and hairpin loops; tertiary structures include several hairpin-loop to bulge contacts.

The type I self-splicing intron can catalyse its own excision.

Prokaryotic genome structure

Prokaryotic DNA is largely a simple loop of dsDNA, although some viral and other parasitic ssDNA, ssRNA, dsRNA, etc., may be present. The majority of the genome is present on a single chromosome; in Escherichia coli this is about 5 Mbp in size, and supercoiled.

Supercoiling adds (or removes) extra twists to the already coiled DNA. Negative supercoiling (underwinding) makes the duplex part more easily, a reaction that is catalysed by DNA gyrase. Most chromosomes are negatively supercoiled, particularly bacterial ones. To add a single negative supercoil to a DNA molecule requires energy of the same magnitude required to separate one base pair. Supercoiling can be used to part ('melt') the DNA double helix. Supercoiling may be involved in initiation of replication.

DNA has associated proteins in prokaryotes, but these are not structural: they are rather things such as DNA polymerase, RNA polymerase, repressors and activators. A famous example is the catabolite activator protein, which under conditions of nutrient stress binds cyclic AMP, and then binds to the lac operon, promoting the expression of lactase.

Prokaryotes have polycistronic genes, with several related genes in a contiguous block of DNA, under the control of a single promoter. Eukaryotes have discrete genes, each with its own promoter. Polycistrons can be transcribed and translated simultaneously. This is impossible in eukaryotes because mature ribosomes are excluded from the nucleus, and mRNA requires extensive processing before it can be translated.

Eukaryotic genome structure

In eukaryotes, DNA is organised into many linear chromosomes, containing long stretches of dsDNA. Whereas prokaryotes contain nearly 100% 'useful' DNA, eukaryotes vary widely in the quantity of seemingly functionless DNA they contain. The minimum value of C for a given taxon (the size of the haploid genome in base pairs) roughly tracks the complexity of the organism. However, the maximum value is anything up to a thousand times greater than this number, particularly in plants, amphibia, insects and teleost fish. A housefly contains six times as much DNA as a fruitfly!

This extra DNA is mostly selfish, i.e. like all DNA, it exists only because it is good at replicating, and can survive despite having no useful phenotypic effects (or even detrimental ones). This junk DNA falls into many classes:

Some of the DNA is 'useful junk'. This sort of DNA is often highly repetitive and heterochromatic, such as:

Other types of DNA that are not transcribed, but which serve a useful 'purpose' are:

Unsurprisingly therefore, only 1.5% of the human genome actually codes for protein or functional RNA. Even the majority of an mRNA primary transcript is composed of introns, which are discarded.

Only 1.5% of DNA in a eukaryote is actually functional exons.

Typical values for genome size in eukaryotes and prokaryotes:

                               Escherichia coli    Homo sapiens
Size of genome                 4.6 Mbp             3.3 Gbp
Size of typical gene           1 kbp               10 kbp (4 exons : 1350 bp)
Size of typical polypeptide    350 aa              450 aa
Number of genes                4 377               c. 25 000
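The typical values above can be combined into a back-of-envelope estimate of how much of each genome is coding. This is only a sketch: it assumes every gene has the 'typical' size, that the whole of an E. coli gene is coding, and that only the 1350 bp of exons count in the human case. The dictionary keys and function name are invented for the example.

```python
# Typical values from the table above (sizes in base pairs).
GENOMES = {
    "Escherichia coli": {"genome_bp": 4.6e6, "genes": 4377, "coding_bp_per_gene": 1000},
    "Homo sapiens": {"genome_bp": 3.3e9, "genes": 25000, "coding_bp_per_gene": 1350},
}


def coding_fraction(genome: dict) -> float:
    """Fraction of the genome accounted for by coding sequence."""
    return genome["genes"] * genome["coding_bp_per_gene"] / genome["genome_bp"]


for species, genome in GENOMES.items():
    print(f"{species}: {coding_fraction(genome):.1%} coding")
```

With these inputs the estimate comes out at roughly 95% for E. coli and roughly 1% for humans, which is the same order as the figures quoted in the text.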

Packaging of DNA

A Homo sapiens cell contains 3 Gbp per (haploid) genome, so there is c. 2 m of DNA per 5 µm diameter nucleus; however, a mitotic chromosome is only around 2 µm long. This is a pretty impressive feat of packaging, especially when you consider that the volume of DNA in a nucleus is about 6 µm³, and the nucleus itself is only 500 µm³: this is 1% of perfect packing, yet the DNA is readily accessible and untangled.
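The length and volume figures above can be checked with a short calculation. This sketch treats the DNA as a B-form cylinder and assumes a rise of 0.34 nm per base pair and a helix radius of about 1 nm; neither value is stated in the text, and the function names are invented for the example.

```python
import math

RISE_PER_BP_NM = 0.34    # assumed B-form rise per base pair
DUPLEX_RADIUS_NM = 1.0   # assumed B-form helix radius
HAPLOID_BP = 3.0e9       # haploid human genome, as quoted in the text
NUCLEUS_VOLUME_UM3 = 500.0  # nuclear volume, as quoted in the text


def dna_length_m(base_pairs: float) -> float:
    """Contour length of the duplex in metres."""
    return base_pairs * RISE_PER_BP_NM * 1e-9


def dna_volume_um3(base_pairs: float) -> float:
    """Volume of the duplex, modelled as a cylinder, in cubic micrometres."""
    length_um = base_pairs * RISE_PER_BP_NM * 1e-3
    radius_um = DUPLEX_RADIUS_NM * 1e-3
    return math.pi * radius_um ** 2 * length_um


diploid_bp = 2 * HAPLOID_BP
length = dna_length_m(diploid_bp)
volume = dna_volume_um3(diploid_bp)
print(f"length: {length:.2f} m")
print(f"volume: {volume:.1f} um^3")
print(f"fraction of nucleus: {volume / NUCLEUS_VOLUME_UM3:.1%}")
```

Under these assumptions a diploid cell holds about 2 m of DNA occupying about 6 µm³, or roughly 1% of a 500 µm³ nucleus.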

In eukaryotes (and archaea) only, the initial packaging of DNA is done by histones. There are five highly conserved types, with only 2 amino acid differences between the sequences of H3 in cow and pea.

Histones are highly basic (they contain a lot of arginine (R) and lysine (K)) so they can bind readily to DNA, which is an acid. All histones possess a common tertiary structure, the 'histone fold' - a Z-shaped helix-turn-helix-turn-helix motif.

A nucleosome contains eight histone proteins, forming a core around which a double turn of dsDNA is wound.

Histones bind DNA into nucleosomes, which are little beads of histones wrapped by DNA. Each nucleosome has a core of two each of H2a, H2b, H3 and H4, and the DNA wraps round this octamer twice. Nucleosomes are generally spaced every c. 200 bp of DNA in eukaryotes: 146 bp are bound to the histone core, with a 54 bp linker between the nucleosomes. An average eukaryote gene therefore spans c. 50 nucleosomes. The '11 nm fibre' (otherwise known as 'beads on a string' chromatin) that is formed by the DNA/histone complex reduces the length to about one third that of the unpackaged DNA.
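The arithmetic behind the '50 nucleosomes per gene' figure is sketched below, using the 146 bp core and 54 bp linker from the paragraph above and the 10 kbp typical gene size from the genome table. The constant and function names are invented for the example.

```python
CORE_BP = 146     # DNA wrapped around the histone octamer
LINKER_BP = 54    # DNA between neighbouring nucleosomes
REPEAT_BP = CORE_BP + LINKER_BP  # one nucleosome per repeat


def nucleosomes_spanned(dna_bp: int) -> int:
    """Whole number of nucleosome repeats that fit on a stretch of DNA."""
    return dna_bp // REPEAT_BP


typical_gene_bp = 10_000
print(REPEAT_BP)                             # repeat length in bp
print(nucleosomes_spanned(typical_gene_bp))  # nucleosomes per typical gene
print(CORE_BP / REPEAT_BP)                   # fraction of DNA bound to the core
```

On these numbers about 73% of the DNA is wrapped on histone cores at any one time, with the remainder in the more accessible linker.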

The 30 nm fibre is formed by DNA zig-zagging between nucleosomes.

As the DNA bound to the nucleosome is less available to proteins than the linker region, this means that packaging of DNA may help regulate transcription and DNA replication. However, most nucleosomes seem to be randomly arranged on DNA and do not show much evidence of specific placement. In rare cases (such as the 5S rRNA gene), the nucleosomes seem to be in phase (i.e. in exactly the same place on the DNA) across different cells. However, the DNA has to be bent quite violently around the nucleosome core, so this seems to be due to A=T pairs (which are easier to bend) being preferred in the minor groove (indicated by the curly braces below).

Histones have to bend DNA, so there is a preference for the smaller AT base-pairs to occur in the minor groove

Most DNA seems to be bound to the nucleosome during transcription, leading one to wonder how RNA polymerase negotiates the twists and turns of the DNA around the histones without falling off:

Methylation of histone H3 tends to reduce gene expression, whilst phosphorylation and acetylation (which reduce binding of histone to DNA) increase it.

The jury is still out, but it seems 'subtle modification' may be the answer. Histones have 'tails' which are able to interact with other proteins and DNA. These tails can also be acetylated, methylated and/or phosphorylated. Acetylation by histone acetyltransferases reduces binding of tails to DNA, and allows easier access to the DNA by RNA polymerase. This modification can be very specific: gene silencing is caused by methylation of lysine 9 (K9) on histone H3. In addition to modification of the histones, chromatin remodelling 'machines' (large protein complexes) also alter the binding of DNA to the nucleosome core. The fact that nucleosomal DNA with cross-linked histones can still be transcribed indicates that the modification is mostly to DNA topology and supercoiling, not to the histone core itself.

The 11 nm fibre is only the first level in DNA packaging. It is itself coiled to form a higher order structure, the 30 nm fibre, whose length is about one twelfth that of the 11 nm fibre within it.
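The two packing ratios quoted so far (the 11 nm fibre is about one third the length of naked DNA, and the 30 nm fibre about one twelfth the length of the 11 nm fibre) can be chained together. The sketch below does this for 60 000 µm of DNA, roughly one chromosome's worth; the function name is invented, and ratios for higher levels of packing are deliberately left out because the text does not give them.

```python
def packed_lengths(naked_dna_um: float) -> dict:
    """Length of a stretch of DNA at the first two levels of packaging."""
    fibre_11nm = naked_dna_um / 3   # 'beads on a string' chromatin
    fibre_30nm = fibre_11nm / 12    # coiled 11 nm fibre
    return {
        "naked DNA": naked_dna_um,
        "11 nm fibre": fibre_11nm,
        "30 nm fibre": fibre_30nm,
    }


for level, length_um in packed_lengths(60_000).items():
    print(f"{level}: {length_um:,.0f} um")
```

Taken together, the two steps shorten the DNA about 36-fold, which is still a long way from the length of a mitotic chromosome.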

The most recent model of the 30 nm fibre has a double helix of nucleosomes with the DNA zigzagging between them.

30 nm fibre model, with a double helix of nucleosomes with DNA zig-zagging between them.

Histone H1 binds the 11 nm fibre to make the 30 nm fibre. It appears to alter the direction of exit of DNA from the nucleosome, and 'locks it off'. The change of direction allows the nucleosomes to pack more closely, condensing the 11 nm fibre into the 30 nm fibre. H1 exists as at least 7 isotypes (including H5, which is only expressed in certain tissues).

Histone H1 (and the related H5) lock off the nucleosome and encourage supercoiling into the 30 nm fibre.

The binding of H1 reduces gene expression, probably because it slows chromatin remodelling. So called 'active' DNA has a low affinity for H1. In SV40 such active DNA includes the origin of replication, where proteins other than histones need to bind.

Active DNA can be identified because it is more readily degraded by DNase. This can be observed by DNA footprinting, where fragments of a radiolabelled DNA sequence are generated by DNase in the presence or absence of DNA-binding protein. A footprint is formed because the protein protects the DNA it binds to from degradation. DNA that is hypersensitive to DNases can often be shown to be actively transcribed.

Bands are produced by randomly cutting a DNA molecule labelled at one end by radioactive ³²P using DNase. If the DNase is present only at low concentrations, sequences of every possible length will be produced, which will show up on an autoradiograph of an electrophoresis gel as shown in the upper diagram. If part of the DNA is protected by a protein as in the lower diagram, a patch of bands will be missing.
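The logic of a footprint can be mimicked with a toy model. This sketch assumes an idealised experiment in which every unprotected bond is cut in some molecule, each molecule is cut at most once, and only the fragment carrying the end label is visible; the function name and the coordinates are invented for the example.

```python
def footprint_bands(length_bp, protected=None):
    """Return the labelled fragment lengths seen on the gel.

    A cut after base i (counted from the labelled end) gives a labelled
    fragment of length i. `protected` is an optional (first, last) pair of
    positions, inclusive, that the bound protein shields from the nuclease.
    """
    bands = []
    for cut in range(1, length_bp):
        if protected is not None and protected[0] <= cut <= protected[1]:
            continue  # the nuclease cannot reach this bond
        bands.append(cut)
    return bands


free_dna = footprint_bands(20)
bound_dna = footprint_bands(20, protected=(8, 12))
footprint = sorted(set(free_dna) - set(bound_dna))
print(free_dna)    # full ladder of bands
print(bound_dna)   # ladder with a gap
print(footprint)   # the missing bands mark where the protein sits
```

Comparing the two ladders recovers the protected region directly, which is how a footprint is read off a real gel, albeit with much messier data.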

The 30 nm fibre is then packed into looped domains. The 30 nm fibre is looped around a central proteinaceous nuclear scaffold, which contains enzymes such as topoisomerase II, and ribonucleoproteins. Each looped domain is c. 300 kbp long, and contains c. ten eukaryotic genes. It is possible that each domain forms some sort of functional unit. Prokaryotes also seem to have looped domains (of DNA, rather than chromatin).

Looped domains form around a nuclear scaffold.

Looped domains can be seen clearly in certain unusual interphase chromosomes, such as lamp-brush chromosomes in salamander eggs. The giant polytene chromosomes of Drosophila larval salivary glands show 'puffing', where looped domains decondense to allow gene expression.

The chromatin in eukaryotes exists in two main forms:

Telomeres are also heterochromatic. This probably helps prevent damage to the loose (and single-stranded) ends of the chromosome. In yeast (at least), the telomere is bound into a specific structure called the telosome, which lacks nucleosomes. Rap and Sir proteins prevent the ssDNA at the telomere from being degraded.

The telosome is a doubled-back structure that protects the telomere in yeast chromosomes.

Unsurprisingly, the other major feature of a eukaryotic chromosome, the centromere, is also heterochromatic. This is probably to withstand stresses from the kinetochore during mitosis. In yeast, 15 nucleosomes are precisely placed around the centromere sequence, and these nucleosomes also contain a modified histone H3 (called CENP-A). The centromere in humans may form spontaneously on chromosome fragments (microdissected chromosomes may spontaneously form new centromeres), although specified 'α satellite DNA' sequences appear to promote centromere formation.

During mitosis, the interphase chromosomes condense even further. Phosphorylation of five serine residues in the H1 protein appears to promote this. Condensin proteins then hydrolyse ATP to make the looped domains condense into chromatids and bind ribonucleoproteins on their surfaces. The exact nature of higher order packaging is still not clear: rosettes and coils and other higher order structures have been proposed, but none have been well attested yet.

Levels of packaging of DNA in a eukaryotic chromosome.

The highest order packaging of DNA is the synaptonemal complex, which binds together two chromosomes during meiosis and is responsible for crossing over. Lateral elements (cohesin proteins) bind two sister chromatids as a single chromosome, and central elements bind two homologous chromosomes and carry recombination nodules.

Summary

Test yourself

  1. How do transcriptionally active and inactive euchromatin differ?
  2. How does telomeric heterochromatin differ from euchromatin?
  3. What is the approximate length of 60 000 µm of DNA (i.e. about one chromosome's worth) at the following levels of packing: 11 nm fibre, 30 nm fibre, looped domain, metaphase chromosome?
  4. Prove that a diploid human cell contains 2 m of DNA.
  5. Compare and contrast prokaryotic and eukaryotic gene structures.
  6. When is junk DNA useful?

Answers

Peer Review.
This page has been peer reviewed by 2 people. Thanks to Andreas Wilm for his comments about the 5S rRNA structure.