DNA - Protein Interactions


We have started to consider how the readout of the genetic information stored in DNA is controlled, first by looking at the flexibility of DNA (and the constraints on that flexibility) that arises from its geometric conformations, and then by studying the topology of DNA supercoiling. The genetic information must be accessible to the proteins that transcribe it into RNA or that assist in duplicating the DNA. Both processes are controlled by varying the accessibility of the encoded information. This determines whether or not a specific gene is active in a particular cell type and, if it is, the rate at which it directs the synthesis of its protein.

Proteins can interact with DNA either specifically or non-specifically. In non-specific interactions, the sequence of nucleotides does not matter as far as binding is concerned. Histone-DNA interactions are an example; they occur between functional groups on the protein and the sugar-phosphate backbone of DNA. Specific DNA-protein interactions, however, depend upon the sequence of bases in the DNA and, as we have already discussed, on the orientation of the bases, which can vary with twisting and writhing (supercoiling). These DNA-protein interactions are strong, and are mediated by:

        Hydrogen bonding: Can be direct H-bonds, or indirect, mediated by water molecules

        Ionic interactions: Salt bridges; protein side chains - DNA backbone interactions

        Other forces: van der Waals, hydrophobic

In this exercise, we will be concerned with specific DNA-protein interactions; in particular, with those involved in the process of DNA transcription. Specifically, we will look at common supersecondary structures ("motifs") that are found in proteins that control transcription in prokaryotes and eukaryotes, and learn how and why these structures interact with DNA.

You have already studied the processes of DNA replication and of transcription into mRNA, but it is worth recalling a few of the highlights here. RNA polymerases are the enzymes that carry out transcription. In order to begin transcription, the RNA polymerase must recognize the beginning of a gene (or of an operon, as found in prokaryotes). It does so by recognizing a region of specific base sequence known as the "promoter". "Repressor proteins" can also bind at the promoter, thus blocking initiation of transcription. In addition, "activator proteins" can bind in regions next to the promoter to increase the rate of transcription of a gene. "Enhancer regions" of DNA are distant from the promoter in terms of the number of nucleotides separating them, but may be close to it in space by virtue of the flexibility of DNA; they can also bind proteins that affect the rate of transcription of a particular gene. These interactions, then, act to regulate the rate of transcription. At the extreme, they can also function as "switches" that turn a gene on and off.

 Prokaryotic Transcription Control Motifs

Proteins can "recognize" specific segments of DNA in the major and minor grooves via the forces mentioned above. The major groove, being wider than the minor groove, can accommodate larger structural motifs. As we saw before, the pattern of functional groups that the base pairs expose on the floor of a groove is more specific and discriminatory in the major groove. Thus, one would expect most of the important protein-DNA interactions to occur in the major groove.
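The claim that the major groove is more discriminatory can be checked with a short tabulation. The sketch below is only an illustration: the donor/acceptor codes are the standard textbook assignment of groups on the groove floor, not something taken from this exercise, and the names MAJOR_GROOVE, MINOR_GROOVE and distinguishable_classes are made up here.

```python
# Groups exposed on the groove floor for each base pair, read across the groove.
# A = H-bond acceptor, D = H-bond donor, H = nonpolar hydrogen, M = methyl (thymine).
# These codes are the usual textbook assignment and are an assumption of this sketch.
MAJOR_GROOVE = {
    "A-T": "ADAM",
    "T-A": "MADA",
    "G-C": "AADH",
    "C-G": "HDAA",
}
MINOR_GROOVE = {
    "A-T": "AHA",
    "T-A": "AHA",
    "G-C": "ADA",
    "C-G": "ADA",
}


def distinguishable_classes(groove):
    """Group together the base pairs that present an identical pattern."""
    classes = {}
    for pair, pattern in groove.items():
        classes.setdefault(pattern, []).append(pair)
    return sorted(classes.values())


major_classes = distinguishable_classes(MAJOR_GROOVE)
minor_classes = distinguishable_classes(MINOR_GROOVE)

print("major groove:", len(major_classes), "distinguishable classes", major_classes)
print("minor groove:", len(minor_classes), "distinguishable classes", minor_classes)
```

In this tabulation every one of the four base pairs looks different from the major groove side, whereas from the minor groove side A-T cannot be told apart from T-A, nor G-C from C-G.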

Study Question:   Explain why DNA-protein interactions primarily occur within the major groove of DNA.

Nature has evolved a collection of "motifs" that provide a scaffold for a piece of protein secondary structure, usually an α-helix, to recognize and bind to DNA. We will consider the "helix-turn-helix" motif and the role that it, or a variation of it, plays in transcriptional control.





2OR1: 434 Phage Repressor with Target DNA




Now, answer the following questions:

(1) Describe in detail the helix-turn-helix motif.

(2) Specific operator regions are often palindromic in nature. What is the relationship between this kind of symmetry and the interaction of the specific region with H-T-H recognition elements?

(3) What is the "recognition helix" and what kinds of interactions, at a molecular level, are involved between the recognition helix and its target?





1TRO: E. coli Trp Repressor-Operator Complex



Now, answer the following questions:

(1) Describe in detail the interaction of the trp repressor with its operator.

(2) What forces are involved at the molecular level in this DNA-protein interaction?

(3) What is the  "indirect readout" phenomenon?  Why is it significant?

(4) Compare the DNA-protein interaction in this complex to that seen in the complex between the 434 Phage Repressor and its target DNA.

(5) What is a corepressor? How does the binding of tryptophan in this complex affect the production of tryptophan by E. coli?


Finally, we'll look at the E. coli met repressor. When the repressor is complexed with S-adenosylmethionine (SAM), the gene that codes for production of methionine is repressed. Thus, SAM acts as a corepressor here.




1CMA: E. coli Met Repressor-SAM Operator Complex


Now, describe the similarities and differences in the 3 DNA-protein interactions that you have just studied. Make a chart with the following headings, and fill it in for the 3 proteins as indicated below. (You will complete this chart after you look at the eukaryotic motifs.)


Prokaryotic | Motif Type | Dimer | Palindrome | Bonding | Contacts | Additional Substrate | Miscellaneous
434 Phage Repressor | H-T-H | YES | YES | H-bonds | α; Same Side | NO | Parallel to major groove
E. coli Trp Repressor | | | | | | |
E. coli Met Repressor | | | | | | |


The helix-turn-helix and met-repressor-like motifs are common among the prokaryotic transcriptional regulatory proteins. Variations on the theme of the H-T-H motif are also seen. 




 Eukaryotic Transcription Control Motifs

Eukaryotic transcription factors promote selective activation and/or repression of genes, as do the prokaryotic transcription factors, but their DNA-binding motifs are more diverse than those we saw in the prokaryotes. We will look at the following motifs, as well as variations on individual themes:


    (1) Zinc Fingers

    (2) Leucine Zippers

    (3) Helix-Loop-Helix



The zinc finger is a motif that is used in a variety of ways in eukaryotic transcription factors. There are at least 10 different types of zinc finger domains and we will look at examples of the more prominent ones:

                (1) Cys2-His2

                (2) Cys4

                (3) Cys6


(1) Cys2-His2  Domains:

    These were first described in 1985, initially in TFIIIA from the frog Xenopus laevis. These domains are the most abundant DNA-binding motifs found in eukaryotic transcription factors.

    In this domain, there are:

        Two invariant Cys residues

        Two invariant His residues

        A tetrahedrally-liganded Zn2+ ion

The general structure of the domain is:

        (Tyr or Phe)-X-Cys-X(2-4)-Cys-X3-Phe-X5-Leu-X2-His-X(3-5)-His

Notice that there are only about 25-30 residues that define this domain.
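The general structure above can be turned into a sequence search and a residue count. What follows is a minimal sketch under stated assumptions: the tuple encoding, the names C2H2_PATTERN, to_regex and length_range, and the test sequence SYNTHETIC_FINGER are all invented for illustration (the sequence was built by hand to fit the pattern and is not a real protein).

```python
import re

# Each element: (allowed residues in one-letter code, min repeats, max repeats).
# "." stands for X, i.e. any amino acid.
C2H2_PATTERN = [
    ("[YF]", 1, 1), (".", 1, 1), ("C", 1, 1), (".", 2, 4), ("C", 1, 1),
    (".", 3, 3), ("F", 1, 1), (".", 5, 5), ("L", 1, 1), (".", 2, 2),
    ("H", 1, 1), (".", 3, 5), ("H", 1, 1),
]


def to_regex(pattern):
    """Translate the (residues, min, max) list into a compiled regular expression."""
    parts = []
    for residues, lo, hi in pattern:
        if lo == hi == 1:
            parts.append(residues)
        elif lo == hi:
            parts.append(f"{residues}{{{lo}}}")
        else:
            parts.append(f"{residues}{{{lo},{hi}}}")
    return re.compile("".join(parts))


def length_range(pattern):
    """Shortest and longest stretch of residues that the pattern can describe."""
    return sum(lo for _, lo, _ in pattern), sum(hi for _, _, hi in pattern)


C2H2_REGEX = to_regex(C2H2_PATTERN)

# Hand-built sequence that fits the pattern; NOT taken from a real zinc finger.
SYNTHETIC_FINGER = "YKCPECGKSFSQKSDLVKHQRTH"
# Same sequence with the first invariant Cys replaced by Ser.
BROKEN_FINGER = "YKSPECGKSFSQKSDLVKHQRTH"

print(C2H2_REGEX.pattern)
print(length_range(C2H2_PATTERN))
print(bool(C2H2_REGEX.fullmatch(SYNTHETIC_FINGER)))
print(bool(C2H2_REGEX.fullmatch(BROKEN_FINGER)))
```

Counting only the residues written in the pattern gives a span of 23 to 27, in line with the rough figure quoted above. Note that a match against this pattern only flags a candidate; it says nothing about whether the segment actually folds around a zinc ion.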

An interesting "general principle" in protein folding: Protein structural domains of fewer than about 50 amino acid residues generally do not fold autonomously.

The purpose of the zinc ion is to stabilize the fold of this small domain.





1ZNF and 1A1F: Cys2-His2 Domains




Several zinc finger domains can be combined in tandem to produce a multi-finger domain in which the individual fingers function as independent "reading heads".


(2) Cys4 Domains

These are most commonly seen in the nuclear hormone receptor family. Gene regulation by these transcription factors requires binding of specific hormones, steroids or vitamins. Each domain consists of about 80 amino acids and is made up of two units, each of which holds a Zn2+ ion bound by four Cys residues. These domains are also referred to as Cys2-Cys2 domains.

The first unit recognizes and binds DNA.

The second unit allows for dimerization of two identical receptor molecules.




1HCQ: Cys4 Estrogen Receptor



(3) The Cys6 Domain

These are "binuclear" zinc fingers, in which two Zn2+ ions are bound by six Cys residues. Each Zn2+ is coordinated tetrahedrally, and two of the Cys residues ligate both metal ions.

There are 6 invariant Cys residues, and the general domain structure is as follows, where "X" represents any amino acid:

    Cys-X2-Cys-X6-Cys-X6-Cys-X2-Cys-X6-Cys

Although the amino acid sequences of the intervening (X) segments may vary, their individual lengths are strictly conserved.
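The point about conserved lengths can be illustrated with a fixed-spacing pattern. This is a sketch only: CYS6_REGEX and make_cys6 are hypothetical names, and the three test sequences are synthetic strings assembled here, not sequences from GAL4 or any other protein.

```python
import re

# Fixed spacing between the six Cys residues: X2, X6, X6, X2, X6.
CYS6_REGEX = re.compile(r"C.{2}C.{6}C.{6}C.{2}C.{6}C")


def make_cys6(spacers):
    """Join six Cys residues with the five given spacer strings (synthetic helper)."""
    if len(spacers) != 5:
        raise ValueError("exactly five spacers are needed between six Cys residues")
    sequence = "C"
    for spacer in spacers:
        sequence += spacer + "C"
    return sequence


# Two synthetic sequences with different spacer residues but the same spacer lengths.
conserved_a = make_cys6(["AA", "KKKKKK", "SSSSSS", "TT", "GGGGGG"])
conserved_b = make_cys6(["RL", "KRSEQN", "PTDAVE", "NW", "LSPKTR"])
# Same as conserved_b, but the third spacer has been lengthened from X6 to X7.
altered = make_cys6(["RL", "KRSEQN", "PTDAVEA", "NW", "LSPKTR"])

for name, seq in [("conserved_a", conserved_a),
                  ("conserved_b", conserved_b),
                  ("altered", altered)]:
    print(name, len(seq), bool(CYS6_REGEX.fullmatch(seq)))
```

Changing the residues inside a spacer leaves the match intact, while adding a single residue to one spacer breaks it, which is the sense in which the lengths, rather than the sequences, define the motif.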


1D66: Cys6 motif Yeast Transcription Factor GAL4







Breast and Ovarian Cancer Development





An interesting statistic: One percent of the human genome specifies zinc fingers!


 The Importance of Zinc in Biochemistry


Link to General Chemical Principles: What is it about the chemistry of zinc that makes it so important in biochemistry?



Leucine zippers are not DNA-binding motifs in and of themselves, but they mediate dimerization of certain DNA-binding proteins. They were first recognized in the amino acid sequence of the mammalian transcription factor C/EBP. "EBP" is the acronym for "Enhancer-Binding Protein" and the "C" refers to the fact that C/EBP recognizes the "C" motif, CCAAT, found in many gene promoters. The yeast transcription factor GCN4 and the three nuclear transforming oncogene products fos, jun and myc have leucine zippers.

In the structure originally proposed by Steven McKnight, who cloned the C/EBP gene, the following features were specified:

    (1) The region of the leucine zipper consists of about 30 residues.

    (2) Every 7th residue is leucine. This is an example of a "heptad repeat". (You have already studied the coiled-coil structure of keratin that arises from its primary structure, a-b-c-d-e-f-g, where "a" and "d" are nonpolar residues.)

    (3) There are no proline or glycine residues. (These are "helix-breakers".)

    (4) The hydrophobic residue leucine is positioned at every second turn of an 8-turn-long α-helix so as to form a ridge of leucines.

    (5) An anti-parallel interfacing of two of these "ridges" forms a dimerization surface, in which the leucines interdigitate to resemble a zipper.

Subsequently, it was shown that the leucines do not interdigitate; rather, the two helices are parallel and the leucine residues lie adjacent to each other.
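The numbers in features (1), (2) and (4) fit together, as a little arithmetic shows. The sketch below assumes the standard value of 3.6 residues per turn for an ideal α-helix, which is not stated in this exercise; the function names and the 30-residue test sequence are made up for illustration.

```python
RESIDUES_PER_TURN = 3.6  # ideal alpha-helix; standard value, assumed here


def turns(n_residues):
    """Number of helical turns spanned by n_residues in an ideal alpha-helix."""
    return n_residues / RESIDUES_PER_TURN


def longest_leucine_heptad_run(seq):
    """Return the indexes of the longest run of Leu residues spaced exactly 7 apart."""
    best = []
    for frame in range(7):
        run = []
        for i in range(frame, len(seq), 7):
            if seq[i] == "L":
                run.append(i)
                if len(run) > len(best):
                    best = list(run)
            else:
                run = []
    return best


# Synthetic 30-residue segment with Leu at every 7th position; not a real zipper.
SYNTHETIC_ZIPPER = ("L" + "EKQAKR") * 4 + "LE"

print(round(turns(7), 2))    # turns between successive leucines
print(round(turns(30), 1))   # turns in a 30-residue helix
print(longest_leucine_heptad_run(SYNTHETIC_ZIPPER))
```

Seven residues correspond to just under two turns, so a leucine at every 7th position does fall at roughly every second turn, and about 30 residues give a helix of roughly 8 turns. Because seven residues is slightly less than two full turns, the ridge of leucines drifts slowly around an isolated helix rather than running perfectly straight.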






A variation on the theme of leucine zippers is that they do not necessarily have to mediate dimerization between two copies of the same protein. If they do, then "homodimeric" zippers result. If the two proteins are different, then "heterodimers" result.

If the zipper is homodimeric, then you would expect to find a corresponding palindromic nucleotide base sequence on its DNA partner. Heterodimeric protein interactions that are mediated by the leucine zipper increase the repertoire of potential DNA-protein interactions via this "motif".
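What "palindromic" means for double-stranded DNA is that the sequence reads the same 5' to 3' on both strands, so that each subunit of a homodimer sees an identical half-site. A minimal sketch of that check follows; the function names are hypothetical and the hexamers are short illustrative sequences chosen here, not binding sites given in this exercise.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the site reads the same 5' to 3' on both strands."""
    return seq.upper() == reverse_complement(seq)


def half_sites(seq):
    """Split an even-length site in two; the second half is read from the other strand."""
    if len(seq) % 2 != 0:
        raise ValueError("this simple helper expects an even-length site")
    mid = len(seq) // 2
    return seq[:mid].upper(), reverse_complement(seq[mid:])


for site in ["CACGTG", "CACGTT"]:
    print(site, reverse_complement(site), is_palindromic(site), half_sites(site))
```

For a palindromic site the two half-sites come out identical, which is what a homodimer needs. For a heterodimer the two half-sites may differ, so the full site need not pass this test.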


 HELIX-LOOP-HELIX: This type of transcription factor has a conserved basic region that binds to the DNA, followed by two amphipathic helices that are connected by a loop (a basic helix-loop-helix, or bHLH, motif). The basic region, along with the N-terminal portion of the first helix, binds DNA in its major groove. The C-terminal helix mediates dimerization of the protein via formation of a coiled-coil. Sometimes a leucine zipper is continuous with the bHLH. This probably assists the dimerization.

Look at the transcription factor Max bound to its DNA target sequence: PDB ID 1AN2


Now that we have looked at some of the more prominent eukaryotic DNA-protein interactions, complete the following chart as you did above for the prokaryotic interactions.

Eukaryotic | Motif Type | Dimer | Palindrome | Bonding | Contacts | Additional Substrate | Miscellaneous
Zinc Finger: | | | | | | |
    Zif 268 | Cys2His2 | | | | | |
    GAL4 | Cys6 | | | | | |
    Estrogen Receptor | Cys4 | | | | | |
Leucine Zipper: | | | | | | |

