CAAT box gene transcriptions: Difference between revisions

Revision as of 15:33, 16 January 2022

Editor-In-Chief: Henry A. Hoff

File:Alosa fallax.jpg
As a representative of the Metazoa, here is an image of a twaite shad. Credit: Hans Hillewaert.

A "CCAAT box (also sometimes abbreviated a CAAT box or CAT box) is a distinct pattern of nucleotides"[1] along the template strand of DNA in eukaryotes.

Boxes

A "repeating sequence of nucleotides that forms a transcription or a regulatory signal"[2] is a box.

Consensus sequences

In the direction of transcription on the template strand, the consensus sequence for a CAAT box is 3'-GGCCAATCT-5'.[1]

On the coding strand "(T/C)G ATTGG (T/C)(T/C)(A/G) was the sequence that favored CBF binding [in the mouse pro-α2(1) collagen promoter]."[3] On the template strand, this is 3'-(C/T)(A/G)(A/G)CCAATC(A/G)-5'. "[T]he favorable sequence for CBF binding was TG ATTGG (T/C)(T/C)(A/G)."[3]
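The conversion between the two strands above can be checked mechanically. The following is a minimal sketch, assuming the IUPAC shorthand Y for (T/C) and R for (A/G); the function name `reverse_complement` is hypothetical and is not taken from the programs described on this page.

```python
# IUPAC complement table: Y = C/T, R = A/G, N = any base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "Y": "R", "R": "Y", "N": "N"}

def reverse_complement(pattern: str) -> str:
    """Return the reverse complement of an IUPAC-coded pattern."""
    return "".join(COMPLEMENT[base] for base in reversed(pattern))

# Coding-strand CBF site from the quotation: (T/C)G ATTGG (T/C)(T/C)(A/G)
coding = "YGATTGGYYR"
print(reverse_complement(coding))  # YRRCCAATCR
```

Written out in full, YRRCCAATCR is (C/T)(A/G)(A/G)CCAATC(A/G), the same string of letters given for the template strand.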

The upstream activating sequence (UAS) for the Hap4p is 5'-CCAAT-3'.[4]

Core promoters

Notation: let the symbol CBF represent the CAAT-box binding factor.

A CAAT box, when present, occurs "upstream by 75-80 bases to the initial transcription site."[1]

"In many eukaryotic class II promoters, CCAAT motifs are often found between 50 and 100 nucleotides upstream of the transcription start site (17-20), and these motifs are recognized by different classes of CCAAT-binding proteins, one of which is CBF."[5]

"In many higher eukaryotic class II promoters, CCAAT motifs (or ATTGG motifs in the opposite strand), are often found between −50 and −110 relative to the start of transcription (1-4). The precise location of these CCAAT motifs and the promoter sequences around the motif of a specific gene are highly conserved during evolution."[3]

"In metazoa, the CBF-DNA complex is characterized by its requirement for a high degree of conservation within the binding motif CCAAT (7, 21, 22), and sequences surrounding the pentameric motif contribute to the binding specificity (Ref. 16 and references therein)."[5]

"Computer analysis of 502 unrelated RNA polymerase II promoter regions showed that approximately 30% of the promoters contained a CCAAT sequence (or ATTGG sequence on the complementary strand) and that in a large number of vertebrate promoters the CCAAT motif was located around nucleotide −80 upstream of the transcription start site (4)."[3]
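A scan of the kind described in the quotation can be sketched as follows. This is only an illustration under stated assumptions: the function name `ccaat_positions`, the `tss_index` parameter and the toy promoter are hypothetical, and positions follow the usual convention that the nucleotide immediately upstream of the transcription start site is −1.

```python
import re

def ccaat_positions(promoter: str, tss_index: int) -> list[tuple[int, str]]:
    """List (position relative to the TSS, motif) for CCAAT/ATTGG hits.

    promoter  : sequence read 5'->3' on the coding strand
    tss_index : 0-based index of the +1 nucleotide within `promoter`
    """
    hits = []
    # The lookahead lets overlapping matches be reported as well.
    for match in re.finditer(r"(?=(CCAAT|ATTGG))", promoter.upper()):
        hits.append((match.start() - tss_index, match.group(1)))
    return hits

# Hypothetical toy promoter: 100 nt upstream followed by 10 transcribed nt.
upstream = "G" * 20 + "CCAAT" + "G" * 75
toy = upstream + "ACGTACGTAC"
print(ccaat_positions(toy, tss_index=100))  # [(-80, 'CCAAT')]
```

Searching for ATTGG on the same strand is equivalent to searching for CCAAT on the complementary strand, which is why both motifs appear in the pattern.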

"[I]n most of these promoters the flanking sequences of ATTGG were TG on the 5′ side and (T/C)(T/C)(A/G) on the 3′ side".[3]

"[T]he CCAAT-flanking sequences [occur] around the CCAAT motifs in most eukaryotic promoters harboring a CCAAT sequence in these proximal promoters."[3]

"In contrast to many animal CCAAT motifs, the majority of the plant sequences contain only one C or lack a CAAT-box completely."[5]

Gene transcriptions

"Genes that have this element seem to require it for the gene to be transcribed in sufficient quantities. It is frequently absent from genes that encode proteins used in virtually all cells. This box along with the GC box is known for binding general transcription factors. CAAT and GC are primarily located in the region from 100-150bp upstream from the TATA box. Both of these consensus sequences belong to the regulatory promoter. Full gene expression occurs when transcription activator proteins bind to each module within the regulatory promoter. Protein specific binding is required for the CCAAT box activation. These proteins are known as CCAAT box binding proteins/CCAAT box binding factors."[1]

Cadherins

"Transcriptional downregulation of E-cadherin appears to be an important event in the progression of various epithelial tumors. SIP1 (ZEB-2) is a Smad-interacting, multi-zinc finger protein that shows specific DNA binding activity. [Expression] of wild-type but not of mutated SIP1 downregulates mammalian E-cadherin transcription via binding to both conserved E2 boxes of the minimal E-cadherin promoter."[6]

"Analysis of mouse and human E-cadherin promoters revealed a conserved modular structure with positive regulatory elements including two E2 boxes (CACCTG) with a potential repressor role Behrens et al. 1991, Giroldi et al. 1997."[6]

"The two E2 boxes in the mouse and human E-cadherin promoter sequences were demonstrated to play a crucial role in the epithelial-specific expression of E-cadherin Behrens et al. 1991, Giroldi et al. 1997. Mutation of these sequence elements results in upregulation of the E-cadherin promoter in dedifferentiated cancer cells, whereas the wild-type promoter shows low activity in such cells. Recently, it was shown that the zinc finger transcriptional repressor Snail can downregulate E-cadherin by binding to the E boxes in the E-cadherin promoter Batlle et al. 2000, Cano et al. 2000. Human Snail belongs to a family of zinc finger proteins, which contain four or five zinc finger domains of the C2H2 type at their C-terminal end. These zinc fingers bind to the CANNTG sequence in E box motifs."[6]

"δEF1 and SIP1 have been shown to bind spaced CACCT DNA sequences, including E2 boxes (CACCTG), by their zinc finger clusters (Remacle et al., 1999)."[6]

"To address the specificity of SIP1 action, mutagenesis of the E-cadherin promoter in either its upstream E2 box 1 (−75) or its downstream E2 box 3 (−25), or in both E2 boxes was performed [...]."[6]

Wild-type "SIP1 represses the E-cadherin promoter, likely through binding via both zinc finger clusters to spaced E2 boxes as demonstrated previously (Remacle et al., 1999) and confirmed here by a DNA-mediated pull-down assay of SIP1 protein [...]. Wild-type but not mutated SIP1 from transfected human cells could be efficiently precipitated by biotinylated E-cadherin promoter oligonucleotides, comprising two wild-type E2 box sequences. Mutation of the E2 boxes resulted in the loss of SIP1 binding."[6]

Human E2 boxes are E2-box 1 (GCAGGTGA), E2-box 2 (TGGCCGGC) and E2-box 3 (TCACCTGG).[6]
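As a quick check of the three human sequences against the CANNTG E box motif quoted above, a minimal sketch (reading N as any of A, C, G or T; the variable names are illustrative only):

```python
import re

# CANNTG from the quotation above, with N as any nucleotide.
E_BOX = re.compile(r"CA[ACGT]{2}TG")

# Human E2-box sequences as listed in the text.
e2_boxes = {
    "E2-box 1": "GCAGGTGA",
    "E2-box 2": "TGGCCGGC",
    "E2-box 3": "TCACCTGG",
}

for name, seq in e2_boxes.items():
    match = E_BOX.search(seq)
    print(name, match.group(0) if match else None)
# E2-box 1 CAGGTG
# E2-box 2 None
# E2-box 3 CACCTG
```

E2-box 1 and E2-box 3 each contain a CANNTG core (CAGGTG is the reverse complement of CACCTG), whereas the listed E2-box 2 sequence does not.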

"Alignment of the E-cadherin promoter sequences of dog, mouse, and man. Conserved regulatory elements are indicated: E2 boxes 1 and 3, CCAAT box, and GC box. The E2 box 2 has been described as part of a palindromic E-pal sequence in the mouse E-cadherin promoter (Behrens et al., 1991), but is conserved neither in canine nor in human sequences."[6]

Human NeuroD (BETA2/BHF1) genes

"There was no consensus CAAT box. [...] In addition, we performed mutation analyses of the E2 box and the E3 box to evaluate whether the E2 and E3 boxes regulate the transcriptional activity of the human NeuroD gene [...]."[7]

Human glucocerebrosidase genes

The "5′ genomic sequences revealed promoter elements containing a TATA box at nucleotides −23 to −27 and a CAAT box between nucleotides [...] and an E2 box [...]."[8]

Cap signal elements

"Studies have reported that the cap signal element with the TATA-box, CAAT-box, and GC-box is the most general element of the POL II promoter and exists in major protein [...]."[9]

Hypotheses

  1. A1BG is not transcribed by a CAAT box.

A1BG samplings

A CCAAT box (also sometimes abbreviated a CAAT box or CAT box) is a distinct pattern of nucleotides along the template strand of DNA in eukaryotes.

On the template strand, the CAAT box consensus sequence is 3'-(C/T)(A/G)(A/G)CCAATC(A/G)-5'.

The Basic programs (starting with SuccessablesCAAT.bas) were written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The sequences the programs are looking for, and the numbers found, are:

  1. CAAT - 3'-(C/T)(A/G)(A/G)CCAATC(A/G)-5', -- there are zero, -+ there are zero, +- there are zero, and ++ there are zero.
  2. CAAT - 3'-(A/G)(C/T)(C/T)GGTTAG(C/T)-5', complement, -- there are zero, -+ there are zero, +- there are zero, and ++ there are zero.
  3. CAAT - 3'-(A/G)CTAACC(A/G)(A/G)(C/T)-5', inverse, -- there are zero, -+ there are zero, +- there are zero, and ++ there are zero.
  4. CAAT - 3'-(C/T)GATTGG(C/T)(C/T)(A/G)-5', complement inverse, -- there are zero, -+ there are zero, +- there are zero, and ++ there are zero.
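The four search patterns in the list above (consensus, complement, inverse, complement inverse) can be derived from one another and matched against a sequence. The sketch below is written in Python rather than Basic and is not the SuccessablesCAAT.bas code; the names `variants` and `count_hits`, the IUPAC shorthand (Y for C/T, R for A/G) and the toy sequence are all assumptions made for illustration.

```python
import re

# Base-by-base complement, including the degenerate codes Y (C/T) and R (A/G).
COMPLEMENT = str.maketrans("ACGTYR", "TGCARY")
# Regular-expression fragment for each IUPAC code used here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}

def variants(consensus: str) -> dict[str, str]:
    """Return the four patterns: consensus, complement, inverse, complement inverse."""
    comp = consensus.translate(COMPLEMENT)
    return {
        "consensus": consensus,
        "complement": comp,
        "inverse": consensus[::-1],
        "complement inverse": comp[::-1],
    }

def count_hits(pattern: str, sequence: str) -> int:
    """Count matches of an IUPAC pattern, overlapping matches included."""
    regex = "".join(IUPAC[ch] for ch in pattern)
    return len(re.findall(f"(?={regex})", sequence.upper()))

caat = "YRRCCAATCR"  # (C/T)(A/G)(A/G)CCAATC(A/G)
for name, pattern in variants(caat).items():
    print(name, pattern)
# consensus YRRCCAATCR
# complement RYYGGTTAGY
# inverse RCTAACCRRY
# complement inverse YGATTGGYYR

# Hypothetical toy sequence, not the ZNF497-A1BG region.
print(count_hits(caat, "TTCAGCCAATCATT"))  # 1
```

Running each of the four patterns over each strand and direction of a region would reproduce the layout of the list above; this sketch makes no claim about the counts for the actual A1BG region.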

With each SuccessablesCAAT.bas extended from 958 to 4445 nts starting just beyond ZNF497, there are no changes in results.

Copying the consensus sequence 5'-CAAT-3' and pasting it into "⌘F" finds seven locations between ZNF497 and A1BG and no locations between ZSCAN22 and A1BG, as can be found by the computer programs.
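A plain-text search of the "⌘F" kind amounts to counting exact substring occurrences on the strand as displayed. A minimal sketch, with a hypothetical helper name `count_occurrences` and a toy sequence that is not the ZNF497-A1BG region:

```python
def count_occurrences(motif: str, sequence: str) -> int:
    """Count every occurrence of `motif`, including overlapping ones."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count

# Hypothetical toy sequence for illustration only.
toy = "GGCCAATCTTTCAATGG"
print(count_occurrences("CAAT", toy))   # 2
print(count_occurrences("CCAAT", toy))  # 1
```

Because every CCAAT contains a CAAT, a search for the four-letter core can never return fewer hits than a search for the five-letter motif in the same region.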

Transcribed CAAT boxes

Gene ID: 1051 is CEBPB CCAAT enhancer binding protein beta aka TCF5; NF-IL6; TF5: "This intronless gene encodes a transcription factor that contains a basic leucine zipper (bZIP) domain. The encoded protein functions as a homodimer but can also form heterodimers with CCAAT/enhancer-binding proteins alpha, delta, and gamma. Activity of this protein is important in the regulation of genes involved in immune and inflammatory responses, among other processes. The use of alternative in-frame AUG start codons results in multiple protein isoforms, each with distinct biological functions."[10]

  1. NP_001272807.1 CCAAT/enhancer-binding protein beta isoform b: "Transcript Variant: This variant (1) encodes multiple isoforms through the use of alternative translation initiation codons. The isoform [b, also known as LAP (liver activating protein)] represented in this RefSeq results from translation initiation at a downstream AUG start codon. Isoform b has a shorter N-terminus, compared to isoform a."[10]
  2. NP_001272808.1 CCAAT/enhancer-binding protein beta isoform c: "Transcript Variant: This variant (1) encodes multiple isoforms through the use of alternative translation initiation codons. The isoform [c, also known as LIP (liver inhibitory protein)] represented in this RefSeq results from translation initiation at a downstream AUG start codon. Isoform c has a shorter N-terminus, compared to isoform a."[10]
  3. NP_005185.2 CCAAT/enhancer-binding protein beta isoform a: "Transcript Variant: This variant (1) encodes multiple isoforms through the use of alternative translation initiation codons. The isoform (a, also known as LAP*) represented in this RefSeq results from translation initiation at the 5' most AUG start codon and is the longest isoform."[10]
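The idea of alternative in-frame AUG start codons described in the RefSeq text can be illustrated with a short sketch. The function name `in_frame_starts` and the toy mRNA are hypothetical; the toy string is not the CEBPB transcript and the sketch says nothing about which start codons are actually used.

```python
def in_frame_starts(mrna: str) -> list[int]:
    """Return 0-based indices of AUG codons in frame with the first AUG."""
    first = mrna.find("AUG")
    if first == -1:
        return []
    starts = []
    for i in range(first, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in {"UAA", "UAG", "UGA"}:
            break  # a stop codon ends the reading frame
        if codon == "AUG":
            starts.append(i)
    return starts

# Hypothetical toy mRNA: three in-frame AUGs and one out-of-frame AUG.
toy = "GGAUGGCCAUGAAAUGCAUGUAA"
print(in_frame_starts(toy))  # [2, 8, 17]
```

Translation beginning at a downstream in-frame AUG yields a protein with the same C-terminus but a shorter N-terminus, which is the relationship the text describes between isoforms b and c and isoform a.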

CCAAT samplings

Copying the Hap4p consensus sequence CCAAT and pasting it into "⌘F" finds one location between ZNF497 and A1BG and no locations between ZSCAN22 and A1BG, as can be found by the computer programs.

The Basic programs testing consensus sequence CCAAT (starting with Successables CCAAT.bas) were written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The sequences the programs are looking for, and the numbers found, are:

  1. negative strand, negative direction, looking for AAAAAAAA, 0.
  2. positive strand, negative direction, looking for AAAAAAAA, 0.
  3. positive strand, positive direction, looking for AAAAAAAA, 0.
  4. negative strand, positive direction, looking for AAAAAAAA, 0.
  5. complement, negative strand, negative direction, looking for TTTTTTTT, 0.
  6. complement, positive strand, negative direction, looking for TTTTTTTT, 0.
  7. complement, positive strand, positive direction, looking for TTTTTTTT, 0.
  8. complement, negative strand, positive direction, looking for TTTTTTTT, 0.
  9. inverse complement, negative strand, negative direction, looking for TTTTTTTT, 0.
  10. inverse complement, positive strand, negative direction, looking for TTTTTTTT, 0.
  11. inverse complement, positive strand, positive direction, looking for TTTTTTTT, 0.
  12. inverse complement, negative strand, positive direction, looking for TTTTTTTT, 0.
  13. inverse negative strand, negative direction, looking for AAAAAAAA, 0.
  14. inverse positive strand, negative direction, looking for AAAAAAAA, 0.
  15. inverse positive strand, positive direction, looking for AAAAAAAA, 0.
  16. inverse negative strand, positive direction, looking for AAAAAAAA, 0.

AAA UTRs

AAA core promoters

AAA proximal promoters

AAA distal promoters

CCAAT random dataset samplings

  1. RDr0: 0.
  2. RDr1: 0.
  3. RDr2: 0.
  4. RDr3: 0.
  5. RDr4: 0.
  6. RDr5: 0.
  7. RDr6: 0.
  8. RDr7: 0.
  9. RDr8: 0.
  10. RDr9: 0.
  11. RDr0ci: 0.
  12. RDr1ci: 0.
  13. RDr2ci: 0.
  14. RDr3ci: 0.
  15. RDr4ci: 0.
  16. RDr5ci: 0.
  17. RDr6ci: 0.
  18. RDr7ci: 0.
  19. RDr8ci: 0.
  20. RDr9ci: 0.
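How the RDr datasets were generated is not stated on this page. As a hedged illustration only, the sketch below builds a uniform random nucleotide sequence and computes how many occurrences of a fixed five-letter motif would be expected by chance on one strand; the function names, the seed and the choice of 4445 nts (the extended length mentioned earlier) are assumptions.

```python
import random

def random_sequence(length: int, seed: int) -> str:
    """Uniform random nucleotide sequence, reproducible via `seed`."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))

def expected_hits(motif_length: int, length: int) -> float:
    """Expected occurrences of one fixed motif on one strand,
    assuming independent, equally likely nucleotides."""
    return (length - motif_length + 1) / 4 ** motif_length

def count_motif(motif: str, sequence: str) -> int:
    """Count occurrences of `motif`, overlapping ones included."""
    return sum(
        sequence.startswith(motif, i)
        for i in range(len(sequence) - len(motif) + 1)
    )

seq = random_sequence(4445, seed=0)  # hypothetical stand-in for one random dataset
print(count_motif("CCAAT", seq))
print(round(expected_hits(5, 4445), 2))  # 4.34
```

Under these assumptions, a uniform random sequence of 4445 nts would contain about four CCAAT occurrences per strand on average, which gives a baseline against which counts in real and random datasets can be compared.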

RDr UTRs

RDr core promoters

RDr proximal promoters

RDr distal promoters

CCAAT analysis and results

Acknowledgements

The content on this page was first contributed by: Henry A. Hoff.

Initial content for this page in some instances came from Wikiversity.

See also

References

  1. "CAAT box". San Francisco, California: Wikimedia Foundation, Inc. April 8, 2013. Retrieved 2013-04-14.
  2. "Box (disambiguation)". San Francisco, California: Wikimedia Foundation, Inc. May 23, 2013. Retrieved 2013-06-15.
  3. Weimin Bi, Ling Wu, Françoise Coustry, Benoit de Crombrugghe and Sankar N. Maity (October 17, 1997). "DNA Binding Specificity of the CCAAT-binding Factor CBF/NF-Y". The Journal of Biological Chemistry. 272 (42): 26562–72. doi:10.1074/jbc.272.42.26562. Retrieved 2013-04-14.
  4. Hongting Tang, Yanling Wu, Jiliang Deng, Nanzhu Chen, Zhaohui Zheng, Yongjun Wei, Xiaozhou Luo, and Jay D. Keasling (6 August 2020). "Promoter Architecture and Promoter Engineering in Saccharomyces cerevisiae". Metabolites. 10 (8): 320–39. doi:10.3390/metabo10080320. PMID 32781665. Retrieved 18 September 2020.
  5. Victor Kusnetsov, Martin Landsberger, Jörg Meurer and Ralf Oelmüller (December 10, 1999). "The Assembly of the CAAT-box Binding Complex at a Photosynthesis Gene Promoter Is Regulated by Light, Cytokinin, and the Stage of the Plastids". The Journal of Biological Chemistry. 274 (50): 36009–14. doi:10.1074/jbc.274.50.36009. Retrieved 2013-04-14.
  6. Joke Comijn, Geert Berx, Petra Vermassen, Kristin Verschueren, Leo van Grunsven, Erik Bruyneel, Marc Mareel, Danny Huylebroeck, Frans van Roy (June 2001). "The Two-Handed E Box Binding Zinc Finger Protein SIP1 Downregulates E-Cadherin and Induces Invasion". Molecular Cell. 7 (6): 1267–78. doi:10.1016/S1097-2765(01)00260-X. Retrieved 11 January 2019.
  7. Takafumi Miyachi, Hirofumi Maruyama, Takeshi Kitamura, Shigenobu Nakamura and Hideshi Kawakami (8 June 1999). "Structure and regulation of the human NeuroD (BETA2/BHF1) gene". Molecular Brain Research. 69 (2): 223–231. doi:10.1016/S0169-328X(99)00112-6. Retrieved 2 February 2019.
  8. Dan Moran, Emilia Galperin and Mia Horowitz (31 July 1997). "Identification of factors regulating the expression of the human glucocerebrosidase gene". Gene. 194 (2): 201–213. Retrieved 2 February 2019.
  9. Hyun-Jun Jang, Jin Won Choi, Young Min Kim, Sang Su Shin, Kichoon Lee and Jae Yong Han (November 2011). "Reactivation of Transgene Expression by Alleviating CpG Methylation of the Rous sarcoma virus Promoter in Transgenic Quail Cells". Molecular Biotechnology. 49 (3): 222–228. doi:10.1007/s12033-011-9393-7. Retrieved 2 February 2019.
  10. RefSeq (October 2013). "CEBPB CCAAT enhancer binding protein [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 2 May 2020.

External links
