Transcription
Transcription is the transfer of genetic information from DNA to RNA, using DNA as a template. Protein synthesis, by contrast, occurs on ribosomes.
What is a gene? At one level, a gene is an ordered string of nucleotides that encodes a polypeptide. Such genes are "structural" genes. We also know that genes can encode RNA, including messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA) and other RNA types. But something has to turn gene expression on and off, as well as regulate it. The regulatory sequences, which may be "promoters" or "enhancers/silencers", may be located far away from the coding regions. So our view of a gene must now include separate regions of a chromosome. What if the information transcribed onto mRNA doesn't reflect the final protein until it is further modified? This is "posttranscriptional modification", and it makes the concept of a gene even cloudier. What if there are "overlapping" coding regions? Clearly our definition of a gene is not going to be a simple one.
Functionally, though, we can describe a gene as having a distinct coding region and a distinct regulatory region, the latter controlling the rate at which DNA is transcribed into mRNA. We will see that the regulatory units are composed of DNA "motifs" and that every motif will need to be occupied by a regulatory protein if a gene is to be regulated properly. Not only must there be the appropriate attachment of the protein, but the proteins all have to fit together with proteins binding to other nearby motifs in the way that jigsaw puzzle pieces fit together. And there is only one correct way for everything to fit together. So, it's not just a simple matter of DNA directing mRNA synthesis, which then directs protein synthesis; proteins are intrinsically involved in the regulation of protein production at the level of transcription. This can get to be a nightmare if you begin to think about the regulation of the production of regulatory proteins.
We also have to consider an important difference between eukaryotes and prokaryotes with regard to the transcription of structural, or protein-coding, genes. In eukaryotes, genes are transcribed individually, whereas in prokaryotes, genes with related functions can be grouped into "operons" and transcribed together. For example, the lac operon includes three protein-encoding genes as well as their control sequences, and it is transcribed as a single unit, a "polycistronic mRNA". Eukaryotic structural genes are transcribed as monocistronic mRNAs.
DNA-Directed RNA Synthesis
There are three steps that characterize DNA-directed RNA synthesis:
(1) Initiation by binding of the transcription apparatus to the DNA template
(2) Elongation of the mRNA chain
(3) Termination of the mRNA chain
The RNA that results from direct transcription of the DNA that encodes a "gene" is called the "primary transcript", and it often undergoes modification, sometimes quite extensive, before its message can be translated into protein.
The enzymes that synthesize RNA are known as RNA polymerases. The cellular RNA polymerases are multisubunit complexes present in all cells, and they catalyze the reaction:
$$(\text{RNA})_{n\ \text{residues}} + \text{NTP} \rightleftharpoons (\text{RNA})_{n+1\ \text{residues}} + \text{PP}_i$$
Pyrophosphate is then hydrolyzed to 2 Pi by inorganic pyrophosphatase, an essentially irreversible reaction that drives the polymerization to the right. Each nucleotide read off the DNA template strand specifies the complementary nucleotide in the RNA, so the product is a single-stranded polymer, the mRNA, whose sequence is complementary to the template strand with one exception: wherever an A appears in the template strand, a U (rather than a T) appears in the mRNA. (The possible NTPs, then, are ATP, CTP, GTP and UTP.)
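As a minimal sketch of these pairing rules, the Python below derives an mRNA from a DNA template strand. It assumes both strands are written 5' → 3' (the usual convention), so the template has to be walked backwards; the function names and the example sequences are hypothetical, chosen only for illustration.

```python
# RNA base specified by each DNA template base.
# A template A specifies U, because RNA uses U where DNA would use T.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the RNA (written 5'->3') made from a DNA template strand written 5'->3'.

    RNA polymerase reads the template 3'->5' while building RNA 5'->3',
    so the template is traversed in reverse and each base is complemented.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_5to3.upper()))


def coding_to_rna(coding_5to3: str) -> str:
    """The RNA matches the coding (sense) strand, with U in place of T."""
    return coding_5to3.upper().replace("T", "U")


coding = "ATGGCC"    # hypothetical sense-strand sequence
template = "GGCCAT"  # its complementary template strand, written 5'->3'
print(transcribe(template))   # AUGGCC
print(coding_to_rna(coding))  # AUGGCC
```

Both routes give the same RNA, which is why positions are conventionally described from the sense strand.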
Transcription in Prokaryotes
The most studied RNA polymerase is that from E. coli, so we will use it as the prototype of the RNA polymerases. The holoenzyme is a 449 kD protein composed of a "core enzyme" and a "σ subunit", and the entire complex is denoted (core)σ. The core enzyme directs the polymerization reaction and contains four kinds of subunit (five polypeptides): core enzyme = α₂ββ'ω. The metal ions Zn²⁺ (two of them in the β' subunit) and Mg²⁺ are required for catalytic activity. The three-dimensional structure of the enzyme resembles a hand: the thumb can be envisioned as grasping a piece of B DNA that lies in a channel formed by the curved fingers and palm. This channel is cylindrical, with dimensions on the order of 25 Å by 55 Å, which allows it to hold about 16 base pairs of B DNA.
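The figure of about 16 bp follows from the channel length divided by the helical rise of B DNA. The rise of about 3.4 Å per base pair is a standard textbook value assumed here, not one given in these notes:

```latex
\frac{55\ \mathring{\mathrm{A}}}{3.4\ \mathring{\mathrm{A}}\ \text{per bp}} \approx 16\ \text{bp}
```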
The "hand" structure appears in other enzymes that we will study, including DNA polymerase and reverse transcriptase. You can study the hand structure of RNA polymerase further by looking at T7 RNA polymerase, whose structure is available in the Protein Data Bank (PDB).
We will look at transcription from the point of view of the gene, which we have already mentioned is a rather ambiguous entity. Nevertheless, it is clear that there must be a starting point for correct transcription to take place, and it is reasonable to include this as part of the gene, even though it does not get transcribed itself. So, the problem of initiation is really one of recognition of a starting point. But which of the two strands of the DNA serves as the template and how does the polymerase choose?
Either strand can serve as the template, but RNA is always synthesized in the 5' → 3' direction, so the template strand is read from its 3' end toward its 5' end. The template strand (read 3' → 5') is called the "antisense" or "noncoding" strand, and the complementary 5' → 3' strand, which has the same sequence as the transcribed mRNA except for T in place of U, is the "sense" or "coding" strand. To be consistent and clear, we will describe positions along a sequence from the point of view of the sense strand, since this has the same ordering as the mRNA that is transcribed. The part of the gene that serves as the initiation site is called the "promoter", and it is sought out by the RNA polymerase holoenzyme. The holoenzyme binds weakly and nonspecifically to DNA, with a Kdissoc of about 10⁻⁷ M, which allows it to slide along the DNA in search of the promoter. The σ subunit recognizes the promoter sequence, and at the promoter the holoenzyme binds tightly (Kdissoc of about 10⁻¹⁴ M).
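To put the two dissociation constants on an energy scale, this sketch converts each Kdissoc to a standard binding free energy using ΔG° = RT ln Kdissoc (1 M standard state). The temperature of 37 °C is an assumption; the point is that 10⁷-fold tighter binding at the promoter corresponds to roughly 42 kJ/mol of additional binding free energy.

```python
import math

R = 8.314    # gas constant, J/(mol*K)
T = 310.15   # 37 degrees C in kelvin (assumed temperature)


def binding_free_energy(kd_molar: float) -> float:
    """Standard free energy of binding in kJ/mol, from dG = RT ln(Kd), 1 M standard state."""
    return R * T * math.log(kd_molar) / 1000.0


nonspecific = binding_free_energy(1e-7)  # holoenzyme on arbitrary DNA
promoter = binding_free_energy(1e-14)    # holoenzyme at a promoter
print(f"nonspecific binding: {nonspecific:.1f} kJ/mol")
print(f"promoter binding:    {promoter:.1f} kJ/mol")
print(f"extra stabilization: {nonspecific - promoter:.1f} kJ/mol")
```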
The promoter is an approximately 40 bp sequence on the 5' (upstream) side of the initiation site, and within it are two "conserved" sequences. One of these is 6 bp long and is centered about 10 bp upstream of the transcription start site. This is the "Pribnow box", with the consensus sequence TATAAT. The other, less highly conserved sequence is centered about 35 bp upstream and has the consensus sequence TTGACA. The start site is denoted +1 and is almost always A or G.
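The two consensus hexamers suggest a simple way to look for candidate promoters: slide along a sense-strand sequence and count matches to TTGACA and TATAAT. This is a minimal sketch, not a real promoter predictor. The 15 to 19 bp spacer range between the two boxes is an assumption not given in these notes, and real promoters rarely match either consensus exactly.

```python
MINUS35 = "TTGACA"  # -35 consensus
MINUS10 = "TATAAT"  # -10 (Pribnow box) consensus


def matches(site: str, consensus: str) -> int:
    """Count positions where a 6-bp site agrees with a consensus hexamer."""
    return sum(a == b for a, b in zip(site, consensus))


def best_promoter(seq: str, spacers=range(15, 20)):
    """Scan a sense-strand sequence for the best-scoring -35/-10 pair.

    Returns (score, start_of_-35_box, spacer, start_of_-10_box) using 0-based
    string indices, or None if the sequence is too short. The spacer range
    is an assumption of this sketch.
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 5):
        s35 = matches(seq[i:i + 6], MINUS35)
        for spacer in spacers:
            j = i + 6 + spacer  # start of the -10 box
            if j + 6 > len(seq):
                break
            score = s35 + matches(seq[j:j + 6], MINUS10)
            if best is None or score > best[0]:
                best = (score, i, spacer, j)
    return best


# Hypothetical sequence with perfect boxes separated by 17 bp.
example = "CCCC" + "TTGACA" + "C" * 17 + "TATAAT" + "CCCCC"
print(best_promoter(example))
```

A score of 12 means both hexamers match their consensus perfectly; scores for natural promoters are usually lower.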
RNA polymerase holoenzyme contacts the promoter at roughly the centers of the two regions (-10 and -35), and the core enzyme binds tightly to the duplex DNA. It then melts the double-stranded DNA over about 11 bp, from -9 to +2. The σ factor dissociates as transcription begins.
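Position numbers in these notes have no 0: the base just upstream of +1 is -1. A small helper makes this convention explicit and confirms, for example, that -9 to +2 spans 11 bp. The function name is hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length in bp of the region start..end (inclusive) in promoter numbering.

    In this convention the start site is +1, the base just upstream is -1,
    and there is no position 0, so a span that crosses the start site is
    one base shorter than simple subtraction suggests.
    """
    if start == 0 or end == 0:
        raise ValueError("there is no position 0 in this numbering")
    if start > end:
        raise ValueError("start must be upstream of (less than) end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # skip the nonexistent position 0
    return length


print(span_length(-9, 2))       # melted region of the open complex
print(span_length(-31, 6))      # mammalian rRNA core promoter element
print(span_length(-187, -107))  # mammalian rRNA upstream promoter element
```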
The σ factors present in a cell determine which promoters, and therefore which genes, are transcribed. Thus different cells, or the same cell under different conditions, can be characterized by the σ factors they contain.
Chain elongation proceeds in the 5' → 3' direction, and the "transcription bubble" (the stretch of melted DNA) travels with the RNA polymerase. As a consequence, the DNA ahead of the bubble becomes overwound and the DNA behind it becomes underwound; topoisomerases then relax these positive and negative supercoils. Within the bubble, the growing 3' end of the RNA forms a short RNA-DNA hybrid with the template strand, while the rest of the transcript, toward its 5' end, separates from the DNA as a single-stranded "tail". The RNA polymerase does not fall off the DNA during elongation because it binds relatively tightly, but nonspecifically, on both sides of the transcription bubble, an interaction stabilized by its "thumb" wrapping around the DNA.
About 20 to 50 nucleotides are transcribed per second at 37 °C, and about one nucleotide in every 10⁴ is incorporated incorrectly. This error rate is not very deleterious: each gene is transcribed many times, so any one faulty transcript is a small fraction of the total; in addition, most amino acids are specified by several codons ("synonyms"), and a single amino acid substitution usually does not impair a protein's function.
Spontaneous termination of transcription is signaled by "termination sequences". In E. coli, the final signal to stop transcription is a run of 4 to 10 A-T base pairs with the As on the template strand, so for each A in this region the transcript has a U. Just upstream of this run is a G+C-rich region, followed by a spacer of nucleotides and a second G+C-rich region. The two G+C-rich regions are related by a twofold (180°) rotation, so that one can be superimposed on the other; such an arrangement of base pairs around a center of rotational symmetry is called a "palindromic sequence". As a result, the 3' end of the transcript can fold into a hairpin, with Gs pairing with Cs (and As with Us) across the stem.
The 3'-terminal part of the transcript is a series of Us ending in a free 3'-hydroxyl group. As the hairpin forms, the RNA polymerase pauses at the termination site. The terminal oligo(U) tail, which is only weakly base-paired to the DNA template strand, is displaced by the nontemplate DNA strand, freeing the transcript from the template. However, numerous other factors also influence the overall process of termination.
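The hairpin-plus-oligo(U) signature can be searched for directly. This is a minimal sketch under strong assumptions: the stem must be perfectly Watson-Crick paired (no G·U pairs or mismatches), its 3' arm must end immediately before the U tract, and a stem that itself ends in U would be absorbed into the tract. Real terminator finders score folding energies instead; the function name and example sequence are hypothetical.

```python
# Watson-Crick pairs allowed in the stem (G-U wobble pairs are ignored in this sketch).
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def find_terminator_hairpin(rna, min_stem=5, max_stem=12, loops=range(3, 9), min_u=4):
    """Look for a hairpin stem immediately followed by a 3'-terminal run of Us.

    Returns a dict describing the longest perfectly paired stem found, or None.
    """
    rna = rna.upper()
    n_u = len(rna) - len(rna.rstrip("U"))  # length of the 3'-terminal U tract
    if n_u < min_u:
        return None
    end = len(rna) - n_u  # index just past the stem's 3' arm
    for stem in range(max_stem, min_stem - 1, -1):  # prefer longer stems
        for loop in loops:
            start = end - 2 * stem - loop  # first base of the stem's 5' arm
            if start < 0:
                continue
            left = rna[start:start + stem]
            right = rna[end - stem:end]
            # Outermost left base pairs with outermost right base, and so on inward.
            if all((a, b) in PAIRS for a, b in zip(left, reversed(right))):
                return {"hairpin_start": start, "stem": stem, "loop": loop, "u_tract": n_u}
    return None


# Hypothetical terminator: G+C-rich stem, GAAA loop, then a run of six Us.
print(find_terminator_hairpin("AUAC" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUU"))
```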
Nonspontaneous termination of transcription requires the "rho factor" protein, which also improves the efficiency of spontaneous termination. Rho recognizes a sequence on the growing RNA chain upstream of the termination site, attaches there, and moves along the chain in the 5' → 3' direction until it reaches the RNA polymerase paused at the termination site. Rho then unwinds the RNA-DNA duplex, releasing the transcript from its template strand.
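The elongation rate and error rate quoted earlier in this section can be turned into rough per-transcript numbers. This sketch assumes a hypothetical 3000-nt transcript and that errors occur independently at each position; under those assumptions about three-quarters of transcripts contain no misincorporation at all, which is part of why the error rate is tolerable.

```python
ERROR_RATE = 1e-4               # misincorporations per nucleotide (from the text)
RATE_RANGE_NT_PER_S = (20, 50)  # elongation rate at 37 C (from the text)


def transcript_stats(length_nt: int) -> dict:
    """Rough per-transcript numbers for an RNA of the given length."""
    slow, fast = RATE_RANGE_NT_PER_S
    return {
        "seconds_fastest": length_nt / fast,
        "seconds_slowest": length_nt / slow,
        "expected_errors": length_nt * ERROR_RATE,
        # Probability that every nucleotide is correct, assuming independent errors.
        "p_error_free": (1 - ERROR_RATE) ** length_nt,
    }


stats = transcript_stats(3000)  # hypothetical 3000-nt transcript
for key, value in stats.items():
    print(f"{key}: {value:.2f}")
```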
Transcription in Eukaryotes:
Although transcription in eukaryotes is very similar to that in prokaryotes, the "machinery" and control sequences are much more complex, and there are several different RNA polymerases.
Ribosomal RNA (rRNA) constitutes about 95% of all RNA and makes up about two-thirds of the mass of a ribosome. The remainder of the RNA includes transfer RNA (tRNA), messenger RNA (mRNA) and other types present in smaller amounts, such as "small nuclear" RNAs (snRNAs), which are involved in mRNA splicing, and "guide" RNAs, which are involved in RNA editing. These latter two processes occur in the posttranscriptional stage of the life cycle of eukaryotic mRNA. All of these RNAs are encoded by DNA, and eukaryotes use different RNA polymerases for different classes of RNA. In eukaryotes, moreover, the translation of mRNA into protein occurs outside the nucleus, separated from transcription.
Precursors of most rRNAs are synthesized in the nucleolus by RNA polymerase I. Precursors of mRNA are synthesized in the nucleoplasm by RNA polymerase II, while RNA polymerase III, also in the nucleoplasm, synthesizes the precursors of 5S rRNA, tRNAs and other RNAs found in both the nucleus and the cytoplasm. Mitochondria have their own RNA polymerases, and these are analogous to the chloroplast RNA polymerases of plants. We will focus on RNA polymerase II, as it is the enzyme that transcribes the protein-coding (structural) genes of eukaryotes.
As we discuss its structure as a prototype, you can look at the structure of yeast RNA polymerase II, which is available in the Protein Data Bank (PDB). Eukaryotic RNA polymerases are large, multisubunit enzymes, some of whose subunits are homologs of the α, β and β' subunits of the prokaryotic RNA polymerase. The overall shape of the enzyme is similar to that of the prokaryotic RNA polymerase (and of DNA polymerase): a hand with a "thumb" that flanks a channel big enough to hold a piece of B DNA (about 25 Å wide).
We have not yet considered the chemistry of RNA chain elongation, so we do so here. Chains are elongated in the 5' → 3' direction by nucleophilic attack of the 3'-OH group of the growing chain on the α-phosphate of the incoming NTP, which releases pyrophosphate.
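Written out for a single step, the 3'-OH attack joins the α-phosphate to the chain and releases the β- and γ-phosphates as pyrophosphate, whose hydrolysis pulls the reaction forward. Here pppN is used as shorthand for the incoming NTP and pN for the nucleotide residue it adds:

```latex
\text{RNA}_n\text{-}3'\text{-OH} + \text{pppN} \longrightarrow \text{RNA}_n\text{-}3'\text{-O-pN} + \text{PP}_i
\qquad
\text{PP}_i + \text{H}_2\text{O} \longrightarrow 2\,\text{P}_i
```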
As in prokaryotes, eukaryotic transcription begins with the recognition of promoters. The rRNA genes are present in many copies, all with almost identical sequences. This redundancy ensures an adequate supply of rRNA, which, as we mentioned previously, makes up about 95% of cellular RNA. Because the promoters of these genes are likewise essentially identical, RNA polymerase I needs to recognize only one promoter sequence. However, transcription by RNA polymerase I is species-specific (that by RNA polymerases II and III is not).
For transcription of mammalian rRNA genes, there is a "core promoter element" that spans -31 to +6 (note that this overlaps the transcribed region of the gene) and an "upstream promoter element" that spans -187 to -107.
For genes transcribed by RNA polymerase III, the promoter is sometimes located within the transcribed part of the gene, between +40 and +80, but it can also be partially or entirely upstream of the start site.
RNA Polymerase II Promoters and Control Sequences
Promoter sequences for RNA polymerase II are diverse. We can divide them into two classes: those found in genes whose products are made at about the same rate in all cells ("constitutive enzymes") and those found in genes whose production rates vary greatly from one cell type to another and depend on the needs of a differentiated cell at a given time ("inducible enzymes").
Constitutive Gene Promoter Elements:
The GC Box: This is a region containing one or more copies of the sequence GGGCGG (or its complement) upstream of the start site, and it is analogous to the prokaryotic promoter elements.
Other promoter elements are also found in the -50 to -110 region upstream from the GC box.
Selectively Expressed Gene Promoter Elements:
The TATA Box: A region located at about -25 to -30 that is rich in A and T and resembles the Pribnow box (TATAAT). Genes can still be transcribed when the TATA box is defective, and it is thought that the TATA box is involved in choosing the transcription start site.
The CCAAT Box: This sequence is often found upstream of the TATA box, at about -70 to -90. It binds proteins needed for the initiation of transcription.
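The element positions above can be made concrete by scanning the sense strand just upstream of the start site and reporting hits in the same -n numbering. This is a sketch with exact string matching only; "TATAAA" is an assumed representative TATA-box sequence (the notes say only that the box is A/T-rich and resembles TATAAT), and the function name and example region are hypothetical.

```python
# Core sequences from the text, plus an assumed representative TATA-box hexamer.
ELEMENTS = {
    "GC box": ("GGGCGG", "CCGCCC"),  # the motif and its reverse complement
    "CCAAT box": ("CCAAT", "ATTGG"),
    "TATA box": ("TATAAA",),
}


def annotate_upstream(upstream: str):
    """Find promoter elements in the sense-strand sequence just 5' of the start site.

    The last base of `upstream` is position -1, so string index i maps to
    position i - len(upstream). Returns (start, end, element, motif) tuples
    sorted from most upstream to least upstream.
    """
    upstream = upstream.upper()
    n = len(upstream)
    hits = []
    for name, motifs in ELEMENTS.items():
        for motif in motifs:
            i = upstream.find(motif)
            while i != -1:
                hits.append((i - n, i - n + len(motif) - 1, name, motif))
                i = upstream.find(motif, i + 1)
    return sorted(hits)


# Hypothetical upstream region with one copy of each element.
region = "T" * 10 + "CCAAT" + "T" * 20 + "GGGCGG" + "T" * 23 + "TATAAA" + "T" * 24
for hit in annotate_upstream(region):
    print(hit)
```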
Control Sequences for Structural Genes:
Other regions of the chromosome, some far removed from the start site, can affect the binding of RNA polymerase II to promoter elements. These elements are called "enhancers" and "silencers". Proteins called "activators" and "repressors" can bind to the enhancers and silencers, thus affecting polymerase binding to the promoters. Furthermore, the same protein can function as either an activator or a repressor, depending on the specific interaction ("dual-acting" transcription factors).
Recruitment of RNA Polymerase II to the Promoter:
Eukaryotes do not have a single protein that corresponds to the σ factor in prokaryotes. Instead, a set of proteins, the "general transcription factors" ("GTFs"), together perform the function of the σ factor. We have already looked at structures of transcription factors when we discussed DNA-protein interactions in a previous lecture. Otherwise, the general mechanisms of transcription initiation are similar.
There are 6 GTFs that are required for a low and invariant basal rate of transcription, and this rate can be increased by the participation of other protein factors. These GTFs form a "preinitiation complex" that begins when the "TATA binding protein" ("TBP") binds to the TATA box (if there is one) of a promoter. The specific sequence at which it binds identifies the transcription start site. As a result of this binding, the DNA is distorted by kinks at both ends of the TATA box. Other GTFs bind successively, followed by the binding of RNA polymerase. Finally, the remaining GTFs bind.
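After TBP (which is a component of TFIID) binds, the remaining factors are generally described as binding in this order: TFIIA and TFIIB, then RNA polymerase II together with TFIIF, then TFIIE and, finally, TFIIH.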
TFIIH has two important enzyme activities. The first is an ATP-dependent helicase activity that assists the formation of an open complex and the second is a kinase activity that results in the phosphorylation of the largest subunit of RNA polymerase II at its C-terminal end. Now the transcription elongation process can begin, with the various GTFs (except TFIIF) dissociating from the complex as elongation occurs. TFIID remains bound to the promoter so that repeated transcription can occur as GTFs reassemble to form the preinitiation complex.
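The assembly and disassembly steps can be summarized as a small state model. This is a sketch under stated assumptions: the binding order coded here is the one commonly described in textbooks, and the factor names are plain labels, not any real data structure or library API.

```python
# Assumed binding order for preinitiation complex (PIC) assembly,
# following the common textbook description.
ASSEMBLY_ORDER = ["TFIID (TBP)", "TFIIA", "TFIIB", "Pol II + TFIIF", "TFIIE", "TFIIH"]


def assemble_pic(already_bound=()):
    """Return the factors on the promoter, in binding order, after PIC assembly.

    Factors already present (e.g. TFIID left behind from a previous round)
    are kept, and only the missing ones are added.
    """
    bound = list(already_bound)
    for factor in ASSEMBLY_ORDER:
        if factor not in bound:
            bound.append(factor)
    return bound


def initiate(bound):
    """Model promoter escape as described in the notes.

    TFIIH's kinase phosphorylates the C-terminal end of Pol II's largest
    subunit; the polymerase leaves with TFIIF, the other GTFs dissociate,
    and TFIID stays on the promoter. Returns (elongation_complex, still_on_promoter).
    """
    missing = [f for f in ("Pol II + TFIIF", "TFIIH") if f not in bound]
    if missing:
        raise ValueError(f"incomplete preinitiation complex, missing {missing}")
    return ["Pol II (CTD phosphorylated)", "TFIIF"], ["TFIID (TBP)"]


pic = assemble_pic()
elongating, left_behind = initiate(pic)
print(elongating, left_behind)
# Re-initiation only needs the factors that dissociated.
print(assemble_pic(left_behind) == ASSEMBLY_ORDER)
```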
This discussion has focused on RNA polymerase II; different transcription factors are needed for RNA polymerases I and III. However, all three require TBP.
Cells control the transcription of every gene individually. A unique combination of silencers and enhancers for each gene modulates its transcription rate. How do activator and repressor proteins that are bound far from the promoter influence the transcription of genes?
"Specificity protein 1" (Sp1) was the first human transcription factor found to recognize a specific GC-rich regulatory (enhancer) sequence. This protein has two interesting modules:
(1) A module of 3 zinc fingers at one end;
(2) A module at the opposite end with 2 discrete segments rich in Gln.
Mutants that lack the glutamine-rich end can still bind to DNA, but they do not stimulate transcription. The glutamine-rich end must therefore bind to something else for transcription to be activated, and these binding partners are the "coactivators". They are also called "TBP-Associated Factors" or "TAFs", and at least eight of them are important for transcriptional activation. They are not basal factors (GTFs), and they do not bind to specific DNA sequences. Rather, they bind avidly to TBP and provide multiple "docking sites" for the activators. In this sense, they are "adaptor molecules", and a "toolkit" of such adaptors provides a tremendous diversity of options for modulating the transcription of a gene. So, expanding on our earlier comparison of the GTF preinitiation complex with the prokaryotic σ factor, a better comparison would be between the σ factor and the entire activator-coactivator-basal preinitiation complex. As to how this arrangement modulates or influences the rate of transcription, it is probably mediated primarily by distortion of the DNA that facilitates the movement of RNA polymerase II along the coding region.
Latchman (TRENDS in Biochemical Sciences Vol. 26 No. 4, April 2001) has pointed out that the DNA binding site itself plays a key role in transcriptional modulation. The same transcription factor can assume different conformations as a result of binding to different sites. These conformational changes are induced by the DNA-protein interaction, which broadens the range of transcriptional control, since one protein can act like an entire collection of proteins, each having its own effect (activation, inhibition or no effect).
To carry this one step further, one can imagine that a similar phenomenon occurs when coactivators bind to activators. Perhaps different conformational changes are similarly induced in the bound protein depending on the type of protein-protein interaction. Such conformational changes could then result in differing abilities to modulate transcription.
The activation domains of transcription factors are often glutamine-rich, but others are proline-rich or acidic. In some cases, hydrophobic residues are interspersed among the acidic or glutamine residues and are important for activation. Tjian (Cell, Vol. 77, 5-8, April 8, 1994) suggests that hydrophobic forces drive cohesion of activation domains with their targets and that specificity is achieved by the periodicity of the cohesive elements.
Genes are transcribed at measurable rates only if the correct activators are present and are able to overcome the effects of repressors.