Transcription
Transcription is the transfer of genetic information from DNA to RNA, using DNA
as a template. Protein synthesis (translation) then occurs on ribosomes.
What is a gene? At one level, a gene is an ordered string of nucleotides that
encodes a polypeptide. Such genes are "structural" genes. We also know
that genes can encode RNA, including messenger RNA (mRNA), transfer RNA (tRNA),
ribosomal RNA (rRNA), as well as other RNA types. But something has to turn on
and terminate gene expression, as well as regulate it. The regulatory sequences,
which may be "promoters" or "enhancers/silencers", may be
located far away from the coding regions. So now our view of a gene must include
the idea of separate regions of a chromosome. What if the information as
transcribed onto mRNA doesn't reflect the final protein until it is further
modified? This is "posttranscriptional modification". Now the concept
of a gene is becoming even cloudier. What if there are "overlapping"
coding regions? Clearly our definition of a gene is not going to be a simple
one.
Functionally, though, we can describe a gene as having a distinct coding
region and a distinct regulatory region, the latter controlling the rate at
which DNA is transcribed into mRNA. We will see that the regulatory units are
composed of DNA "motifs" and that every motif will need to be occupied
by a regulatory protein if a gene is to be regulated properly. Not only
must there be the appropriate attachment of the protein, but the proteins all
have to fit together with proteins binding to other nearby motifs in the way
that jigsaw puzzle pieces fit together. And there is only one correct way for
everything to fit together. So, it's not just a simple matter of DNA directing
mRNA synthesis, which then directs protein synthesis; proteins are intrinsically
involved in the regulation of protein production at the level of transcription.
This can get to be a nightmare if you begin to think about the regulation of the
production of regulatory proteins.
We also have to consider an important difference between eukaryotes and
prokaryotes with regard to the transcription of structural, or protein-coding,
genes. In eukaryotes, the genes are transcribed individually while in
prokaryotes, genes with related functions ("operons") can be
transcribed together. As an example, the Lac operon includes three
protein-encoding genes as well as their control sequences. The operon is
transcribed as a single unit as a "polycistronic mRNA". Eukaryotic
structural genes are transcribed as monocistronic mRNA.
DNA-Directed RNA Synthesis
There are three steps that characterize DNA-directed RNA synthesis:
(1) Initiation by binding of the
transcription apparatus to the DNA template
(2) Elongation of the mRNA chain
(3) Termination of the mRNA chain
The piece of mRNA that results from the direct transcription of the DNA that
encodes a "gene" is called the "primary transcript" and it
undergoes modification, sometimes quite extensively, before it can translate its
message into protein.
The class of enzymes that synthesize RNAs are known as RNA polymerases. They
are all multisubunit complexes that are present in all cells and they catalyze
the reaction:
(RNA)_n residues + NTP <==> (RNA)_(n+1) residues + PPi
Pyrophosphate is irreversibly hydrolyzed to 2 Pi thus driving the
reaction to the right. The individual nucleotides that are read off of the DNA
template strand are transcribed into the nucleotides of the corresponding RNA,
so the final result is a single-stranded polymer, namely the mRNA, whose
nucleotides are complementary to those of the DNA template strand, with the
exception that wherever an "A" appears in the DNA template strand, a "U"
(rather than a "T") appears in the mRNA. (The possible NTPs, then, are ATP,
CTP, GTP and UTP.)
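The base-pairing rule just described can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are made up for this sketch, and the template is assumed to be written 3' to 5' so that the mRNA reads 5' to 3'.

```python
# RNA base added opposite each DNA template base (Watson-Crick pairing).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# DNA base found opposite each template base on the other DNA strand.
TEMPLATE_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5'->3') for a template strand written 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3_to_5.upper())


def other_strand(template_3_to_5: str) -> str:
    """Return the non-template DNA strand (5'->3') for the same template."""
    return "".join(TEMPLATE_TO_DNA[base] for base in template_3_to_5.upper())


template = "TACGGATTC"          # hypothetical template, written 3'->5'
mrna = transcribe(template)
print(mrna)                      # AUGCCUAAG
# The mRNA matches the non-template strand except that U replaces T.
print(mrna == other_strand(template).replace("T", "U"))  # True
```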
Transcription in Prokaryotes
The most studied RNA polymerase is that from E. coli, so we will study
it as the prototype of the RNA polymerases. The holoenzyme is a 449 kD protein
composed of a "core enzyme" and a "σ subunit",
and the entire complex is denoted (core)σ. The
core enzyme directs the polymerization reaction, and it has 4 kinds of subunits:
core enzyme = α2ββ'ω.
The inorganic ions Zn2+ (two of them in the β'
subunit) and Mg2+ are required for catalytic activity and the
three-dimensional structure of the enzyme resembles a hand. The thumb of the
hand can be envisioned as grasping a piece of B DNA that lies in a channel
represented by the curved fingers and palm of the hand. This channel is
cylindrical, with dimensions on the order of 25 Å by 55 Å. These dimensions
allow a fit of about 16 base pairs of B DNA.
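The figure of about 16 base pairs can be checked with simple arithmetic. The sketch below assumes the usual textbook rise of about 3.4 Å per base pair for B DNA, a value that is not stated in these notes.

```python
RISE_PER_BP_ANGSTROM = 3.4      # assumed axial rise per base pair of B DNA
CHANNEL_LENGTH_ANGSTROM = 55.0  # channel length quoted in the notes


def base_pairs_that_fit(length_angstrom: float,
                        rise: float = RISE_PER_BP_ANGSTROM) -> float:
    """Number of base pairs of B DNA spanning a given length."""
    return length_angstrom / rise


bp = base_pairs_that_fit(CHANNEL_LENGTH_ANGSTROM)
print(f"{bp:.1f} bp")  # 16.2 bp
```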
The "hand" structure appears in other enzymes that we will study,
including DNA polymerase and reverse transcriptase. You can further study the
hand structure of RNA polymerase by looking at T7 RNA polymerase (see PDB
below).

1ARO: T7 RNA Polymerase
We will look at transcription from the point of view of the gene, which we
have already mentioned is a rather ambiguous entity. Nevertheless, it is clear
that there must be a starting point for correct transcription to take place, and
it is reasonable to include this as part of the gene, even though it does not
get transcribed itself. So, the problem of initiation is really one of
recognition of a starting point. But which of the two strands of the DNA serves
as the template and how does the polymerase choose?
Either strand can serve as the template, but the polymerase always reads the
template from its 3' end toward its 5' end, so that the RNA is synthesized in
the 5' to 3' direction. The 3'-5' strand that serves
as the template is called the "antisense" or noncoding strand and the
5'-3' strand (which has the same nucleotide sequence, with the exception of "U"s
for "T"s, as the subsequently transcribed mRNA) is the
"sense" or "coding" strand. To be consistent and clear, we
will use the convention that our description of position along a sequence of
nucleotides will be from the point of view of the sense strand, as this is the
same ordering as that of the mRNA that is transcribed. The part of the gene that
serves as the initiation site is called the "promoter" and it is
sought out by the RNA polymerase holoenzyme. The holoenzyme binds weakly to DNA,
with a Kdissoc of about 10^-7 M, and this allows it
to move along the antisense strand in search of the promoter. The σ
subunit is specific for its promoter sequence, and there tight binding of the
holoenzyme occurs (Kdissoc of about 10^-14 M).
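To get a feel for what the two dissociation constants mean energetically, one can convert them to standard binding free energies with dG = RT ln(Kdissoc). The following is a rough sketch only: it assumes a temperature of 37 °C and a 1 M standard state, neither of which is specified in these notes.

```python
import math

R_KCAL = 1.987e-3   # gas constant, kcal/(mol*K)
T_KELVIN = 310.15   # assumed temperature (37 C)


def binding_free_energy(kd_molar: float, temperature: float = T_KELVIN) -> float:
    """Standard free energy of binding in kcal/mol: dG = RT ln(Kd)."""
    return R_KCAL * temperature * math.log(kd_molar)


dg_nonspecific = binding_free_energy(1e-7)    # weak, nonspecific DNA binding
dg_promoter = binding_free_energy(1e-14)      # tight binding at the promoter
print(f"nonspecific: {dg_nonspecific:.1f} kcal/mol")
print(f"promoter:    {dg_promoter:.1f} kcal/mol")
print(f"ratio of dissociation constants: {1e-7 / 1e-14:.0e}")
```

Under these assumptions, the 10^7-fold tighter binding at the promoter corresponds to roughly doubling the binding free energy, from about -10 to about -20 kcal/mol.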
The promoter is recognized by an approximately 40 bp nucleotide sequence on
the 5' side of the initiation site, and within this sequence are two
"conserved" sequences. One of these is 6 bp in length and is centered
about 10 bp upstream from the starting site of transcription. This is the
"Pribnow Box" and it has a consensus sequence TATAAT.
The other, less highly conserved, sequence is centered about 35 bp
upstream and has a consensus sequence of TTGACA.
The start site is indicated by the notation +1 and is almost always A
or G.
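A simple way to see how these two consensus sequences define a promoter is to scan a sequence for a TTGACA-like hexamer followed, after a spacer, by a TATAAT-like hexamer. The sketch below is illustrative only: the spacer range of 15 to 19 bp, the score threshold and the test sequence are assumptions made for this example, not values from these notes.

```python
MINUS_35 = "TTGACA"   # consensus of the -35 region
MINUS_10 = "TATAAT"   # consensus of the -10 region (Pribnow Box)


def matches(window: str, consensus: str) -> int:
    """Count positions at which a window agrees with a consensus."""
    return sum(a == b for a, b in zip(window, consensus))


def find_promoters(seq, min_score=10, spacers=range(15, 20)):
    """Return (start_of_-35, start_of_-10, score) for candidate promoters.

    seq is the sense strand written 5'->3'; score is the number of
    matches to both hexamers out of 12. Spacer range is an assumption.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        score_35 = matches(seq[i:i + 6], MINUS_35)
        for gap in spacers:
            j = i + 6 + gap
            if j + 6 > len(seq):
                break  # the -10 window would run off the end
            score_10 = matches(seq[j:j + 6], MINUS_10)
            if score_35 + score_10 >= min_score:
                hits.append((i, j, score_35 + score_10))
    return hits


# Hypothetical sequence with perfect hexamers separated by a 17 bp spacer.
example = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GCGCGCA"
print(find_promoters(example))  # [(2, 25, 12)]
```

Real promoters rarely match both hexamers perfectly, which is why the sketch scores partial matches rather than requiring exact copies.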
RNA polymerase holoenzyme contacts the promoter at roughly the centers of the
two regions (-10 and -35) and the core enzyme tightly binds to the duplex DNA.
Its action is that of melting the double-stranded DNA along a sequence of
about 11 bp, from -9 to +2. The σ factor splits off
as transcription begins.
It is the specific σ factors within a cell that
determine which genes will be transcribed. Thus individual cell types are
characterized by their σ factors.
Chain elongation proceeds in the 5'--> 3' direction, and the
"transcription bubble" (the length of "melted" DNA) travels
with the RNA polymerase. As a consequence, the unmelted DNA is overwound in
front of the bubble and underwound behind the bubble. Topoisomerases then act to
relax the positive and negative supercoils. The mRNA that is produced is
hybridized for a short length to the DNA at the downstream position, and exists
separate from the DNA as a "tail", the point of attachment being at
the downstream end. The RNA polymerase does not fall off of the DNA as it is
processing because of its relatively tight, but nonspecific, binding on both
sides of the transcription bubble, stabilized by its "thumb"
wrapping around the DNA. About 20 to 50 nucleotides are transcribed per second
at 37 °C and one nucleotide is incorrectly transcribed in about every 10^4.
As genes are repeatedly transcribed, this error rate is not too deleterious,
especially when coupled with the fact that there are multiple codons
("synonyms") for each amino acid subsequently translated and that
single amino acid substitution errors in a protein usually do not hinder its
function.
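The numbers above can be turned into a quick estimate for a single transcript. In this sketch the transcript length of 1000 nucleotides is a hypothetical example, and errors are assumed to occur independently at each position.

```python
ERROR_RATE = 1e-4   # about one misincorporation per 10^4 nucleotides


def expected_errors(length_nt: int) -> float:
    """Average number of misincorporated nucleotides per transcript."""
    return length_nt * ERROR_RATE


def prob_error_free(length_nt: int) -> float:
    """Probability that a transcript has no errors (independent errors assumed)."""
    return (1 - ERROR_RATE) ** length_nt


def transcription_time_s(length_nt: int, rate_nt_per_s: float) -> float:
    """Seconds needed to transcribe a given length at a given rate."""
    return length_nt / rate_nt_per_s


length = 1000  # hypothetical transcript length in nucleotides
print(f"expected errors: {expected_errors(length):.2f}")
print(f"error-free fraction: {prob_error_free(length):.3f}")
print(f"time: {transcription_time_s(length, 50):.0f} to "
      f"{transcription_time_s(length, 20):.0f} s")
```

For a transcript of this length, roughly nine copies in ten would carry no error at all.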
Spontaneous termination of gene transcription is signaled by "termination
sequences". In E. coli, the final signal to stop transcription is a
series of 4 - 10 A-T base pairs with the As on the template strand. For each A
in this region, the mRNA transcript will have a U. Just upstream from this
sequence is a region rich in G and C bases followed by a spacer of nucleotides
and another region rich in G and C. The two G,C-rich regions are such that one
region can be superimposed upon the other by a symmetry operation of 180°.
This relationship of base pairs around a center of rotational symmetry is
called a "palindromic sequence". The resulting string of nucleotides at the 3'
end of the mRNA is such that a hairpin loop can form, the Gs base-pairing with
the Cs and vice versa, and the As with the Us. The most terminal part of the 3'
end is a series of Us followed by a hydroxyl group.
As the loop is forming, the RNA polymerase pauses at the termination site. The
terminal oligo-U tail, which is only weakly
bound to the DNA template strand, is displaced by the non-template DNA strand.
Now the mRNA strand is free of the DNA template. However, there are numerous
other factors that influence the overall process of termination.
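The sequence features of a spontaneous terminator (a G,C-rich inverted repeat that can fold into a hairpin, followed by a run of Us) can be searched for in an RNA sequence with a short script. This is a deliberately crude sketch: the stem length of 5, the loop sizes of 3 to 8, the minimum of four Us, the G,C threshold and the example sequence are all assumptions chosen for illustration, and only perfect stems are considered.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Sequence that would base-pair with rna, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def gc_fraction(rna: str) -> float:
    return sum(base in "GC" for base in rna) / len(rna)


def find_terminators(rna, stem=5, loops=range(3, 9), min_u=4, min_gc=0.6):
    """Return (stem_start, loop_length) for hairpin-plus-U-run candidates."""
    rna = rna.upper()
    hits = []
    for i in range(len(rna) - stem + 1):
        left = rna[i:i + stem]
        if gc_fraction(left) < min_gc:
            continue  # the stem should be rich in G and C
        for loop in loops:
            r = i + stem + loop                    # start of the right arm
            right = rna[r:r + stem]
            tail = rna[r + stem:r + stem + min_u]  # bases just after the stem
            if len(tail) < min_u:
                break  # not enough sequence left for the U run
            if right == reverse_complement(left) and tail == "U" * min_u:
                hits.append((i, loop))
    return hits


# Hypothetical 3' end of a transcript: stem, 4-base loop, stem, U run.
example = "CC" + "GCCGC" + "UUCG" + "GCGGC" + "UUUUUU"
print(find_terminators(example))  # [(2, 4)]
```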
Nonspontaneous termination of transcription requires a "rho factor"
protein, which also functions to improve the spontaneous termination efficiency.
The rho factor recognizes a sequence on the growing mRNA chain, upstream from
the termination site, after which it attaches and moves along the chain in the
5'-3' direction until it reaches the RNA polymerase that is paused at the
termination site. The transcript is released from its template strand by the
unwinding of RNA-DNA duplex by the rho factor.
Transcription in Eukaryotes:
While very similar to those in prokaryotes, the "machinery" and
control sequences of transcription in eukaryotes are much more complex, and there
are numerous RNA polymerases.
Ribosomal RNA (rRNA) constitutes about 95% of all RNA and about 67% of the
RNA in ribosomes. The remainder of RNA includes transfer RNA (tRNA), messenger
RNA (mRNA) and other types present in smaller amounts, like "small
nuclear" RNAs (snRNAs) involved in mRNA splicing and "guide" RNAs
that are involved in editing of RNA. These latter two processes occur in the
post-transcriptional stage of the life cycle of eukaryotic mRNA. All RNAs are coded
for by DNA, and the different types of RNA polymerase in eukaryotes reflect this
and the fact that, in eukaryotes, translation of mRNA into protein occurs outside of
the nucleus.
Precursors of most rRNA are synthesized in nucleoli with the enzyme RNA
polymerase I. Precursors of mRNA are synthesized in the nucleoplasm by RNA
polymerase II while RNA polymerase III, also in the nucleoplasm, synthesizes
precursors of 5S RNA, tRNAs and other RNAs found both in the nucleus and
cytoplasm. Mitochondria have their own RNA polymerases, and these are analogous
to the chloroplast RNA polymerases found in plants. We will focus on RNA
polymerase II as it is the one that transcribes structural (protein-coding)
genes in eukaryotes.
You can look at the structure of yeast RNA polymerase II (see PDB below) as we
discuss its structure as a prototype. These are large, multisubunit
enzymes, with some of the subunits being homologs of the α, β,
and β' subunits in the prokaryotic RNA polymerase.
The overall shape of the enzyme is similar to that of the prokaryotic RNA
polymerase (and DNA polymerase), namely that of a hand with a "thumb"
motif that flanks a channel big enough to contain a piece of B-DNA (about 25 Å
wide).

1ENO: Yeast RNA Polymerase II
We have not yet considered the chemistry of the elongation of the mRNA
chain, but we will do so here. The chains are elongated in the 5' --> 3'
direction by nucleophilic attack of the 3' OH group of the growing chain on
the α-phosphate of the incoming NTP, with release of pyrophosphate.
As in prokaryotes, eukaryotic transcription begins by recognition of
promoters. There are many copies of the rRNA genes that direct rRNA synthesis,
all with almost identical sequences. This redundancy assures an adequate supply
of rRNA which, as we mentioned previously, comprises about 95% of cellular RNA.
The promoters for these almost identical genes are, therefore, identical, so RNA
polymerase I must only recognize one promoter sequence. However, the RNA
polymerase I is species-specific (RNA poly II and III are not species specific).
For promotion of mammalian rRNA, there is a "core promoter
element" that spans the region -31 to +6 (note that this overlaps a region
of the gene that is transcribed) and an "upstream promoter element"
that spans -187 to -107.
For transcription of genes by RNA polymerase III, the promoter is sometimes
located in a segment within the transcribed part of the gene, between +40 and
+80, but can also be partially upstream or entirely upstream from the start
site.
RNA Polymerase II Promoters and Control Sequences
Promoter sequences for RNA polymerase II are diverse. We can divide these
into two classes: those that are found in genes that produce proteins at
about the same rate in all cells ("constitutive enzymes") and
those for genes whose production rates vary greatly from one cell type to
another and depend upon the needs of a differentiated cell at a given time
("inducible enzymes").
Constitutive Gene Promoter Elements:
The GC Box : This is a
region containing one or more copies of the sequence GGGCGG (or its
complement) in a location upstream from the start site, and it is analogous to
the prokaryotic promoter elements.
Other promoter elements are also
found in the -50 to -110 region upstream from the GC box.
Selectively Expressed Gene Promoter Elements:
The TATA Box : A
region located at about -25 to -30 that is rich in the nucleotides "A"
and "T" and that resembles the Pribnow Box (TATAAT). Genes can still
be transcribed in the presence of a defective TATA box, and it is thought
that the TATA box is involved in choosing the transcription start site.
The CCAAT Box : This is a
sequence that is often found upstream to the TATA box, located at about -70 to
-90. These bind RNA polymerase II as well as other proteins needed for
initiation of transcription.
Control Sequences for Structural Genes:
Other regions of
the chromosome, some far-removed from the start site, can affect the binding of
RNA polymerase II to promoter elements. These gene elements are called
"enhancers" and "silencers". Proteins called
"activators" and "repressors" can bind to the enhancers and
silencers, thus affecting polymerase binding to the promoters. Furthermore,
the same protein can function as either an activator or a repressor, depending
upon the specific interaction ("dual-acting" transcription factors).
Recruitment of RNA Polymerase II to the Promoter:
Eukaryotes do not have a simple protein that corresponds to the σ
factor in prokaryotes. Rather, there is a set of proteins that together perform
the same function as the σ factor, and these are the
"general transcription factors" ("GTFs"). We have
already looked at structures of transcription factors when we discussed
DNA-protein interaction in a previous lecture. Otherwise, the general mechanisms
of transcription initiation are similar.
There are 6 GTFs that are required for a low and invariant basal rate of
transcription, and this rate can be increased by the participation of other
protein factors. These GTFs form a "preinitiation complex" that begins
when the "TATA binding protein" ("TBP") binds to the
TATA box (if there is one) of a promoter. The specific sequence at which it
binds identifies the transcription start site. As a result of this binding, the
DNA is distorted by kinks at both ends of the TATA box. Other GTFs bind
successively, followed by the binding of RNA polymerase. Finally, the remaining
GTFs bind.

1YTB: TBP/TATA Box Complex
After TBP (which is a component of TFIID) binds, the sequence of binding is
as follows:
TFIIA
TFIIB
TFIIF
RNA PII
TFIIE
TFIIH
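The strictly ordered assembly listed above can be modeled as a sequence that must be followed step by step. The sketch below is a toy model for illustration, with TBP represented by TFIID as the first component; the function names are made up here, and the model ignores everything except binding order.

```python
# Binding order of the preinitiation complex as listed in the notes,
# starting with TBP (a component of TFIID) at the TATA box.
ASSEMBLY_ORDER = ("TFIID", "TFIIA", "TFIIB", "TFIIF",
                  "RNA Pol II", "TFIIE", "TFIIH")


def assemble(binding_events):
    """Bind components one at a time; reject any out-of-order event."""
    bound = []
    for factor in binding_events:
        if len(bound) == len(ASSEMBLY_ORDER):
            raise ValueError(f"complex already complete, cannot add {factor}")
        expected = ASSEMBLY_ORDER[len(bound)]
        if factor != expected:
            raise ValueError(f"{factor} cannot bind before {expected}")
        bound.append(factor)
    return bound


def is_complete(bound) -> bool:
    """True once every component has bound in order."""
    return tuple(bound) == ASSEMBLY_ORDER


complex_so_far = assemble(["TFIID", "TFIIA", "TFIIB"])
print(is_complete(complex_so_far))            # False: polymerase not yet bound
print(is_complete(assemble(ASSEMBLY_ORDER)))  # True
```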
TFIIH has two important enzyme activities. The first is an ATP-dependent
helicase activity that assists the formation of an open complex and the second
is a kinase activity that results in the phosphorylation of the largest subunit
of RNA polymerase II at its C-terminal end. Now the transcription elongation
process can begin, with the various GTFs (except TFIIF) dissociating from the
complex as elongation occurs. TFIID remains bound to the promoter so that
repeated transcription can occur as GTFs reassemble to form the preinitiation
complex.
This discussion has focused on RNA polymerase II; different transcription
factors are needed for RNA polymerases I and III. However, all three require TBP.
Cells control the transcription of every gene individually. A unique
combination of silencers and enhancers for each gene modulates the transcription
rate. How do activator and repressor proteins that are bound far from the
promoter influence the transcription of genes?
"Specificity protein 1" (Sp1) was the first human
transcription factor that was found that could recognize a specific GC
regulatory enhancer sequence. This protein has two interesting modules:
(1) A module of 3 zinc fingers at one end;
(2) A module at the opposite end with 2 discrete segments
rich in Gln.
Mutants that do not have the glutamine-rich end can bind to DNA but
transcription is not stimulated. Therefore, the glutamine-rich end must
bind to something else for transcription to occur, and these binding partners
are the "coactivators".
They are also called "TBP-Associated Factors" or "TAFs" and
there are at least eight of them that are important to transcriptional
activation. These are not basal factors (GTFs) and they do not bind to specific DNA
sequences. Rather, they bind avidly to TBP and provide for multiple
"docking sites" to the activators. In this sense, they are
"adaptor molecules". A "toolkit" of such adaptor molecules
provides for tremendous diversity of options to modulate transcription of a
gene. So, expanding on our previous comparison of the preinitiation complex of
GTFs to the prokaryotic σ factor, a better comparison
would be between the σ factor and the entire complex
of activator-coactivator-basal preinitiation complex. As to how this arrangement
modulates or influences the rate of transcription, it is probably mediated
primarily by distortion of DNA that facilitates the movement of RNA polymerase
II along the coding region.
Latchman (TRENDS in Biochemical Sciences Vol. 26 No.4 April 2001) has pointed
out the importance of the DNA binding site itself as playing a key role in
transcriptional modulation. The same transcription factor can assume different
conformations as a result of binding to different sites. The conformational
changes are induced by the DNA-protein interaction, thereby increasing the
flexibility of the spectrum of control of transcription, since one protein can
act like an entire collection of proteins, each having its own effect
(activation, inhibition or no effect).
To carry this one step further, one can imagine that a similar phenomenon can
occur when coactivators bind to activators. Perhaps different conformational
changes are similarly induced in the bound protein depending upon type of
protein-protein interaction. Such conformational changes can then result in
different ability to modulate transcription.
The activation domains of transcription factors are often glutamine-rich, but
others are proline-rich or acidic. In some cases, hydrophobic residues are
interspersed among the acidic or glutamine residues and are important for
activation. Tjian (Cell, Vol. 77, 5-8, April 8, 1994) suggests that
hydrophobic forces drive cohesion of activation domains with their targets and
that specificity is achieved by the periodicity of the cohesive elements.
Genes are transcribed at measurable rates only if the correct activators
are present and are able to overcome the effects of repressors.