C box gene transcriptions

Associate Editor(s)-in-Chief: Henry A. Hoff

GAGGCCATCT is a C-box, [...].[1]

"Members of the box C/D snoRNA family, which are the subject of the present report, possess characteristic sequence elements known as box C (UGAUGA) and box D (GUCUGA)."[2]

The human ribosomal protein L11 gene (HRPL11) has [...] two potential snRNA-coding sequences in intron 4: the C box beginning at +4131 (GGTGATG), [...] a D box beginning at +4237 (TCCTG), [...].[3]

Analysis "of the recombinant (soybean [Glycine max] TGACG-motif binding factor 1) STF1 protein revealed the C-box (nGACGTCn) to be a high-affinity binding site (Cheong et al., 1998)."[4]

Hypotheses

  1. The C boxes are not involved in the transcription of A1BG.

Johnson C-box samplings

The Basic programs (SuccessablesCJbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). For each program, the strand and direction, the sequence sought, and the number found are:

  1. negative strand, negative direction, looking for 5'-GAGGCCATCT-3'[1], 0.
  2. negative strand, positive direction, looking for 5'-GAGGCCATCT-3', 0.
  3. positive strand, negative direction, looking for 5'-GAGGCCATCT-3', 0.
  4. positive strand, positive direction, looking for 5'-GAGGCCATCT-3', 0.
  5. complement, negative strand, negative direction, looking for 5'-CTCCGGTAGA-3', 0.
  6. complement, negative strand, positive direction, looking for 5'-CTCCGGTAGA-3', 0.
  7. complement, positive strand, negative direction, looking for 5'-CTCCGGTAGA-3', 0.
  8. complement, positive strand, positive direction, looking for 5'-CTCCGGTAGA-3', 0.
  9. inverse complement, negative strand, negative direction, looking for 5'-AGATGGCCTC-3', 0.
  10. inverse complement, negative strand, positive direction, looking for 5'-AGATGGCCTC-3', 0.
  11. inverse complement, positive strand, negative direction, looking for 5'-AGATGGCCTC-3', 0.
  12. inverse complement, positive strand, positive direction, looking for 5'-AGATGGCCTC-3', 0.
  13. inverse, negative strand, negative direction, looking for 5'-TCTACCGGAG-3', 0.
  14. inverse, negative strand, positive direction, looking for 5'-TCTACCGGAG-3', 0.
  15. inverse, positive strand, negative direction, looking for 5'-TCTACCGGAG-3', 0.
  16. inverse, positive strand, positive direction, looking for 5'-TCTACCGGAG-3', 0.
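The complement, inverse, and inverse complement forms searched for in the list above can be derived mechanically from the direct sequence. A minimal Python sketch follows; the function name `motif_variants` is an illustrative assumption and is not taken from the Basic programs.

```python
# Map each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_variants(motif: str) -> dict[str, str]:
    """Return the four forms of a motif used in the samplings."""
    complement = motif.translate(COMPLEMENT)
    return {
        "direct": motif,
        "complement": complement,
        "inverse": motif[::-1],                  # same bases, reversed order
        "inverse complement": complement[::-1],  # complement, reversed order
    }


variants = motif_variants("GAGGCCATCT")
for name, sequence in variants.items():
    print(f"{name}: 5'-{sequence}-3'")
```

For GAGGCCATCT this prints CTCCGGTAGA, TCTACCGGAG, and AGATGGCCTC for the complement, inverse, and inverse complement, matching the sequences listed above.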

snoRNA C box

File:RF00071.jpg
This example of a C/D box is small nucleolar RNA 73 (snoRNA U73). Credit: Rfam database (RF00071).
File:U14 snoRNA.png
This U14 snoRNA from Saccharomyces cerevisiae shows structure and genomic organization. Credit: Dmitry A. Samarsky, Maurille J. Fournier, Robert H. Singer and Edouard Bertrand.

For "box C/D snoRNAs, boxes C and D and an adjoining stem form a vital structure, known as the box C/D motif."[2]

"The [C and D] box elements are essential for snoRNA production [transcription] and for snoRNA-directed modification of rRNA nucleotides."[2]

The "motif is necessary and sufficient for nucleolar targeting, both in yeast and mammals. Moreover, in mammalian cells, RNA is targeted to coiled bodies as well. Thus, the box C/D motif is the first intranuclear RNA trafficking signal identified for an RNA family. Remarkably, it also couples snoRNA localization with synthesis and, most likely, function. The distribution of snoRNA precursors in mammalian cells suggests that this coupling is provided by a specific protein(s) which binds the box C/D motif during or rapidly after snoRNA transcription."[2]

In the snoRNA U73 image, the C box, starting from the left side of the stem, consists of the nucleotides ARUGAUGA, and the D box, read from the right side, is AGUCY. In the 5' to 3' direction, the D box is YCUGA.

The second image (U14 snoRNA) shows the C box (3'-AGUAGU-5'). Substituting T for U yields C box = 3'-AGTAGT-5' in the transcription direction on the template strand.
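As a small check of the substitution just described, the sketch below replaces U with T and also reads the 3'-to-5' string in the opposite direction. The helper name `rna_to_dna` is an assumption made for illustration.

```python
def rna_to_dna(rna: str) -> str:
    """Replace uracil with thymine, leaving the other bases unchanged."""
    return rna.upper().replace("U", "T")


box_c_3_to_5 = "AGUAGU"            # written 3' -> 5', as in the second image
box_c_5_to_3 = box_c_3_to_5[::-1]  # the same bases read 5' -> 3'

print(box_c_5_to_3)               # UGAUGA
print(rna_to_dna(box_c_3_to_5))   # AGTAGT
```

Read 5' to 3', the string is UGAUGA, which agrees with the box C sequence quoted from Samarsky et al. earlier on this page.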

Samarsky C box samplings

The Basic programs (starting with SuccessablesCbox.bas or SuccessablesDbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). For each program, the strand and direction, the sequence sought, and the number found are:

  1. negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesCbox--.bas, looking for AGTAGT, 4, AGTAGT at 3521, AGTAGT at 3418, AGTAGT at 2944, AGTAGT at 2888.
  2. negative strand, positive direction: 0.
  3. positive strand, negative direction: 0.
  4. positive strand, positive direction: 1, AGTAGT at 3251.
  5. complement, negative strand, negative direction is SuccessablesCboxc--.bas, looking for 5'-TCATCA-3', 0.
  6. complement, negative strand, positive direction is SuccessablesCboxc-+.bas, looking for 5'-TCATCA-3', 1, 5'-TCATCA-3' at 3251.
  7. complement, positive strand, negative direction is SuccessablesCboxc+-.bas, looking for 5'-TCATCA-3', 4, 5'-TCATCA-3' at 2888, 5'-TCATCA-3' at 2944, 5'-TCATCA-3' at 3418, 5'-TCATCA-3' at 3521.
  8. complement, positive strand, positive direction is SuccessablesCboxc++.bas, looking for 5'-TCATCA-3', 0.
  9. inverse complement, negative strand, negative direction: 0.
  10. inverse complement, negative strand, positive direction: 0.
  11. inverse complement, positive strand, negative direction: 0.
  12. inverse complement, positive strand, positive direction: 1, ACTACT at 2144.
  13. inverse, negative strand, negative direction, is SuccessablesCboxi--.bas, looking for 5'-TGATGA-3', 0.
  14. inverse, negative strand, positive direction, is SuccessablesCboxi-+.bas, looking for 5'-TGATGA-3', 1, 5'-TGATGA-3' at 2144.
  15. inverse, positive strand, negative direction, is SuccessablesCboxi+-.bas, looking for 5'-TGATGA-3', 0.
  16. inverse, positive strand, positive direction, is SuccessablesCboxi++.bas, looking for 5'-TGATGA-3', 0.
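In the list above, AGTAGT on one strand and its complement TCATCA on the other strand are reported at the same positions. The sketch below illustrates why, under two assumptions that are not confirmed by the source: the negative strand is stored as the base-by-base complement of the positive strand at the same coordinates, and positions are reported as 1-based start positions. The sequence used is a made-up toy string, not the A1BG region.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_all(sequence: str, motif: str) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) match."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(motif, start + 1)
    return positions


# Hypothetical positive (coding) strand; not taken from the A1BG region.
positive = "GGTCATCAAATCATCACC"
# Negative (template) strand at the same coordinates (assumed representation).
negative = positive.translate(COMPLEMENT)

hits_negative = find_all(negative, "AGTAGT")
hits_positive = find_all(positive, "TCATCA")
print(hits_negative, hits_positive)
```

Both searches return the same positions for this toy sequence, mirroring how the direct and complement samplings above report matching positions on opposite strands.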

BoxC (4560-2846) UTRs

  1. Negative strand, negative direction: AGTAGT at 3521, AGTAGT at 3418, AGTAGT at 2944, AGTAGT at 2888.

BoxC positive direction (4050-1) distal promoters

  1. Positive strand, positive direction: AGTAGT at 3251, ACTACT at 2144.

Cbox (Samarsky) random samplings

  1. Cboxr0: 0.
  2. Cboxr1: 0.
  3. Cboxr2: 0.
  4. Cboxr3: 0.
  5. Cboxr4: 0.
  6. Cboxr5: 1, AGTAGT at 3259.
  7. Cboxr6: 3, AGTAGT at 3454, AGTAGT at 801, AGTAGT at 531.
  8. Cboxr7: 0.
  9. Cboxr8: 0.
  10. Cboxr9: 1, AGTAGT at 16.
  11. Cboxr0ci: 0.
  12. Cboxr1ci: 1, ACTACT at 374.
  13. Cboxr2ci: 2, ACTACT at 317, ACTACT at 314.
  14. Cboxr3ci: 0.
  15. Cboxr4ci: 1, ACTACT at 3715.
  16. Cboxr5ci: 0.
  17. Cboxr6ci: 0.
  18. Cboxr7ci: 2, ACTACT at 4118, ACTACT at 3925.
  19. Cboxr8ci: 0.
  20. Cboxr9ci: 1, ACTACT at 4090.

BoxCr arbitrary (evens) (4560-2846) UTRs

  1. Cboxr6: AGTAGT at 3454.
  2. Cboxr4ci: ACTACT at 3715.

BoxCr alternate (odds) (4560-2846) UTRs

  1. Cboxr5: AGTAGT at 3259.
  2. Cboxr7ci: ACTACT at 4118, ACTACT at 3925.
  3. Cboxr9ci: ACTACT at 4090.

BoxCr arbitrary positive direction (odds) (4265-4050) proximal promoters

  1. Cboxr7ci: ACTACT at 4118.
  2. Cboxr9ci: ACTACT at 4090.

BoxCr arbitrary negative direction (evens) (2596-1) distal promoters

  1. Cboxr6: AGTAGT at 801, AGTAGT at 531.
  2. Cboxr2ci: ACTACT at 317, ACTACT at 314.

BoxCr alternate negative direction (odds) (2596-1) distal promoters

  1. Cboxr9: AGTAGT at 16.
  2. Cboxr1ci: ACTACT at 374.

BoxCr arbitrary positive direction (odds) (4050-1) distal promoters

  1. Cboxr5: AGTAGT at 3259.
  2. Cboxr9: AGTAGT at 16.
  3. Cboxr1ci: ACTACT at 374.
  4. Cboxr7ci: ACTACT at 3925.

BoxCr alternate positive direction (evens) (4050-1) distal promoters

  1. Cboxr6: AGTAGT at 3454, AGTAGT at 801, AGTAGT at 531.
  2. Cboxr2ci: ACTACT at 317, ACTACT at 314.
  3. Cboxr4ci: ACTACT at 3715.
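The grouping of random-dataset hits into the lists above can be reproduced with a short sketch. It assumes the region bounds given in the headings, assigns a shared boundary position to the first region listed, and treats even-numbered datasets as "arbitrary" in the negative direction and "alternate" in the positive direction (odd-numbered datasets the reverse), as the headings indicate. Core promoter bounds are not stated on this page, so they are omitted; the names `region_of` and `group_of` are illustrative.

```python
from collections import Counter

# Random-dataset hits copied from the Cbox (Samarsky) random samplings list.
HITS = {
    "r5": [3259], "r6": [3454, 801, 531], "r9": [16],
    "r1ci": [374], "r2ci": [317, 314], "r4ci": [3715],
    "r7ci": [4118, 3925], "r9ci": [4090],
}

# Region bounds from the headings; core promoter bounds are not given.
REGIONS = {
    "negative": [("UTR", 2846, 4560), ("proximal", 2596, 2811), ("distal", 1, 2596)],
    "positive": [("proximal", 4050, 4265), ("distal", 1, 4050)],
}


def region_of(position: int, direction: str) -> str:
    """Return the first listed region that contains the position."""
    for name, low, high in REGIONS[direction]:
        if low <= position <= high:
            return name
    return "unassigned"


def group_of(dataset: str, direction: str) -> str:
    """Evens are 'arbitrary' for negative and 'alternate' for positive."""
    is_even = int(dataset[1]) % 2 == 0
    if direction == "negative":
        return "arbitrary" if is_even else "alternate"
    return "alternate" if is_even else "arbitrary"


counts = Counter()
for dataset, positions in HITS.items():
    for direction in REGIONS:
        for position in positions:
            key = (region_of(position, direction), group_of(dataset, direction), direction)
            counts[key] += 1

for key, number in sorted(counts.items()):
    print(key, number)
```

Under these assumptions the counts agree with the lists above: for example, 2 arbitrary and 4 alternate hits in the UTR, and 4 arbitrary and 6 alternate hits in the distal promoter in the positive direction.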

Cbox (Samarsky) analysis and results

AGTAGT.[2]

| Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
|---|---|---|---|---|---|---|
| Reals | UTR | negative | 4 | 2 | 2 | 2 |
| Randoms | UTR | arbitrary negative | 2 | 10 | 0.2 | 0.3 |
| Randoms | UTR | alternate negative | 4 | 10 | 0.4 | 0.3 |
| Reals | Core | negative | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Core | positive | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Proximal | negative | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Proximal | positive | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary positive | 2 | 10 | 0.2 | 0.1 |
| Randoms | Proximal | alternate positive | 0 | 10 | 0 | 0.1 |
| Reals | Distal | negative | 0 | 2 | 0 | 0 |
| Randoms | Distal | arbitrary negative | 4 | 10 | 0.4 | 0.3 |
| Randoms | Distal | alternate negative | 2 | 10 | 0.2 | 0.3 |
| Reals | Distal | positive | 2 | 2 | 1 | 1 |
| Randoms | Distal | arbitrary positive | 4 | 10 | 0.4 | 0.5 |
| Randoms | Distal | alternate positive | 6 | 10 | 0.6 | 0.5 |

Comparison:

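The real Samarsky C boxes occur more often than the randoms in the UTR (2 versus an average of 0.3) and in the distal promoter in the positive direction (1 versus an average of 0.5); neither reals nor randoms occur in the core promoters. This suggests that the real Samarsky C boxes are likely active or activable.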
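The occurrence figures used in this comparison appear to follow a simple calculation. The sketch below assumes that Occurrences equals Numbers divided by the value in the Strands column, and that Averages for the randoms is the mean of the arbitrary and alternate occurrences; both readings are inferred from the table values rather than stated in the source.

```python
def occurrence(number_found: int, strands_or_datasets: int) -> float:
    """Hits per strand (reals) or per random dataset (randoms)."""
    return number_found / strands_or_datasets


# UTR, negative direction, using the Numbers and Strands values from the table.
real = occurrence(4, 2)
arbitrary = occurrence(2, 10)
alternate = occurrence(4, 10)
# Round to avoid floating-point noise in the mean.
random_average = round((arbitrary + alternate) / 2, 2)

print(real, arbitrary, alternate, random_average)
```

With these inputs the real occurrence is 2.0 and the random average is 0.3, matching the UTR rows of the table.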

Voronina C box samplings

The Basic programs (starting with SuccessablesCVbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). For each program, the strand and direction, the sequence sought, and the number found are:

  1. negative strand, negative direction, looking for 5'-GGTGATG-3'[3], 0.
  2. negative strand, positive direction, looking for 5'-GGTGATG-3', 0.
  3. positive strand, negative direction, looking for 5'-GGTGATG-3', 1, 5'-GGTGATG-3' at 3798.
  4. positive strand, positive direction, looking for 5'-GGTGATG-3', 0.
  5. complement, negative strand, negative direction, looking for 5'-CCACTAC-3', 1, 5'-CCACTAC-3' at 3798.
  6. complement, negative strand, positive direction, looking for 5'-CCACTAC-3', 0.
  7. complement, positive strand, negative direction, looking for 5'-CCACTAC-3', 0.
  8. complement, positive strand, positive direction, looking for 5'-CCACTAC-3', 0.
  9. inverse complement, negative strand, negative direction, looking for 5'-CATCACC-3', 0.
  10. inverse complement, negative strand, positive direction, looking for 5'-CATCACC-3', 0.
  11. inverse complement, positive strand, negative direction, looking for 5'-CATCACC-3', 0.
  12. inverse complement, positive strand, positive direction, looking for 5'-CATCACC-3', 0.
  13. inverse, negative strand, negative direction, looking for 5'-GTAGTGG-3', 0.
  14. inverse, negative strand, positive direction, looking for 5'-GTAGTGG-3', 0.
  15. inverse, positive strand, negative direction, looking for 5'-GTAGTGG-3', 0.
  16. inverse, positive strand, positive direction, looking for 5'-GTAGTGG-3', 0.

CV box (4560-2846) UTRs

  1. Positive strand, negative direction: GGTGATG at 3798.

CVbox random dataset samplings

  1. CVboxr0: 0.
  2. CVboxr1: 0.
  3. CVboxr2: 0.
  4. CVboxr3: 0.
  5. CVboxr4: 2, GGTGATG at 2498, GGTGATG at 1821.
  6. CVboxr5: 1, GGTGATG at 2781.
  7. CVboxr6: 0.
  8. CVboxr7: 0.
  9. CVboxr8: 0.
  10. CVboxr9: 0.
  11. CVboxr0ci: 0.
  12. CVboxr1ci: 0.
  13. CVboxr2ci: 0.
  14. CVboxr3ci: 0.
  15. CVboxr4ci: 1, CATCACC at 3456.
  16. CVboxr5ci: 0.
  17. CVboxr6ci: 0.
  18. CVboxr7ci: 0.
  19. CVboxr8ci: 1, CATCACC at 808.
  20. CVboxr9ci: 0.

CVboxr arbitrary (evens) (4560-2846) UTRs

  1. CVboxr4ci: CATCACC at 3456.

CVboxr alternate negative direction (odds) (2811-2596) proximal promoters

  1. CVboxr5: GGTGATG at 2781.

CVboxr arbitrary negative direction (evens) (2596-1) distal promoters

  1. CVboxr4: GGTGATG at 2498, GGTGATG at 1821.
  2. CVboxr8ci: CATCACC at 808.

CVboxr arbitrary positive direction (odds) (4050-1) distal promoters

  1. CVboxr5: GGTGATG at 2781.

CVboxr alternate positive direction (evens) (4050-1) distal promoters

  1. CVboxr4: GGTGATG at 2498, GGTGATG at 1821.
  2. CVboxr4ci: CATCACC at 3456.
  3. CVboxr8ci: CATCACC at 808.

Voronina C box analysis and results

Described by Voronina (GGTGATG, positive strand, negative direction at 3798).[3]

| Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
|---|---|---|---|---|---|---|
| Reals | UTR | negative | 1 | 2 | 0.5 | 0.5 |
| Randoms | UTR | arbitrary negative | 1 | 10 | 0.1 | 0.05 |
| Randoms | UTR | alternate negative | 0 | 10 | 0 | 0.05 |
| Reals | Core | negative | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Core | positive | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Proximal | negative | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0.05 |
| Randoms | Proximal | alternate negative | 1 | 10 | 0.1 | 0.05 |
| Reals | Proximal | positive | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Proximal | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Distal | negative | 0 | 2 | 0 | 0 |
| Randoms | Distal | arbitrary negative | 3 | 10 | 0.3 | 0.15 |
| Randoms | Distal | alternate negative | 0 | 10 | 0 | 0.15 |
| Reals | Distal | positive | 0 | 2 | 0 | 0 |
| Randoms | Distal | arbitrary positive | 1 | 10 | 0.1 | 0.25 |
| Randoms | Distal | alternate positive | 4 | 10 | 0.4 | 0.25 |

Comparison:

The real CV box occurs more often than the randoms in the UTR (0.5 versus an average of 0.05); no real CV boxes occur in the core, proximal, or distal promoters. This suggests that the real CV box is likely active or activable.

Song C-boxes

Analysis "of the recombinant (soybean [Glycine max] TGACG-motif binding factor 1) STF1 protein revealed the C-box (nGACGTCn) to be a high-affinity binding site (Cheong et al., 1998). The HY5 protein interacts with both the G- (CACGTG) and Z- (ATACGTGT) boxes of the light-regulated promoter of RbcS1A (ribulose bisphosphate carboxylase small subunit) and the CHS (chalcone synthase) genes (Ang et al., 1998; Chattopadhyay et al., 1998; Yadav et al., 2002). To test whether STF1 and HY5 have similar DNA-binding properties, the binding properties of each were compared with eight different DNA sequences that represent G-, C-, and C/G-box motifs [TGACGTGT]. C-box sequences carrying the mammalian cAMP responsive element (CRE; TGACGTCA) motif and the Hex sequence (TGACGTGGC), a hybrid C/G-box (Cheong et al., 1998), were high-affinity binding sites for both proteins [...]. No binding or limited binding was observed to as-1 (Lam et al., 1989), nos-1 (Lam et al., 1990), or the AP-1 site (TGACTCA; Kim et al., 1993). Binding to the palindromic G-box (PA G-box, GCCACGTGGC) was moderate. However, binding activity to the G-box of the light-responsive unit 1 (U1) region of the parsley (Petroselinum crispum) CHS promoter (CHS-U1: TCCACGTGGC; Schulze-Lefert et al., 1989) or the G-box of GmAux28 (TCCACGTGTC) was much weaker than to the PA G-box [...]."[4]

The "binding affinities of both bZIP proteins were similar to CREA/T (ATGACGTCAT), a CRE sequence with flanking adenine and thymine (A/T) at positions -4 and +4. [The] bZIP domains of both STF1 and HY5 have similar binding properties for recognizing ACGT-containing elements (ACEs). [Although] the G-box is a known target site for the HY5 protein, the C-box sequences are the preferred binding sites for both STF1 and HY5."[4]

"When analyzed by type of ACE, these sequences can be grouped into four subclasses [...]: C-box, where the C residue comes at the +2 position; a hybrid C/G-box (C/G-box), with G at the +2 position; C/A-box [TGACGTAT], with A at the +2 position; and C/T-box, with T at the +2 position. The C-box subclass contains the largest number of selected binding sites for STF1 (38% at 50 mM KCl and 48% at 150 mM), followed by the C/G- (25.3%) and the C/A-boxes (26%). Only a small number of C/T-boxes [TGACGTTA] (4/100) and non-TGACGT sequences (4/100) were selected."[4]

C-boxes are TCTTACGTCATC, AATGACGTCGAA, TCTCACGTGTGG, TTTGACGTGTGA, GATGACGTCATC, and AGAGACGTCAAC for an apparent consensus sequence of (A/G/T)(A/C/G/T)(A/T)(C/G/T)ACGT(C/G)(A/G/T)(A/G/T)(A/C/G).[4]
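The apparent consensus above can be checked against the six listed C-boxes by converting it to a regular expression. This is a minimal sketch using Python's standard `re` module; the helper name `consensus_to_regex` is an illustrative assumption.

```python
import re

CONSENSUS = "(A/G/T)(A/C/G/T)(A/T)(C/G/T)ACGT(C/G)(A/G/T)(A/G/T)(A/C/G)"


def consensus_to_regex(consensus: str) -> str:
    """Turn each '(A/G/T)' group into a character class such as '[AGT]'."""
    return re.sub(
        r"\(([ACGT/]+)\)",
        lambda match: "[" + match.group(1).replace("/", "") + "]",
        consensus,
    )


pattern = re.compile(consensus_to_regex(CONSENSUS))

C_BOXES = [
    "TCTTACGTCATC", "AATGACGTCGAA", "TCTCACGTGTGG",
    "TTTGACGTGTGA", "GATGACGTCATC", "AGAGACGTCAAC",
]
matches = [bool(pattern.fullmatch(box)) for box in C_BOXES]

print(pattern.pattern)
print(matches)
```

All six sequences match the resulting pattern, so the consensus as written is at least consistent with the sequences it summarizes.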

Song C-box samplings

The Basic programs (starting with SuccessablesC-box.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). For each program, the strand and direction, the sequence sought, and the number found are:

  1. negative strand, negative direction, looking for GACGTC[4], 1, GACGTC at 4316.
  2. negative strand, positive direction, looking for GACGTC, 0.
  3. positive strand, negative direction, looking for GACGTC, 0.
  4. positive strand, positive direction, looking for GACGTC, 9, GACGTC at 4316, GACGTC at 3280, GACGTC at 3231, GACGTC at 2858, GACGTC at 1506, GACGTC at 1120, GACGTC at 532, GACGTC at 437, GACGTC at 193.
  5. inverse complement is the same as the direct consensus sequence.
  6. complement, negative strand, negative direction, looking for 5'-CTGCAG-3', 0.
  7. complement, negative strand, positive direction, looking for 5'-CTGCAG-3', 9, 5'-CTGCAG-3' at 193, 5'-CTGCAG-3' at 437, 5'-CTGCAG-3' at 532, 5'-CTGCAG-3' at 1120, 5'-CTGCAG-3' at 1506, 5'-CTGCAG-3' at 2858, 5'-CTGCAG-3' at 3231, 5'-CTGCAG-3' at 3280, 5'-CTGCAG-3' at 4316.
  8. complement, positive strand, negative direction, looking for 5'-CTGCAG-3', 1, 5'-CTGCAG-3' at 4316.
  9. complement, positive strand, positive direction, looking for 5'-CTGCAG-3', 0.
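The list above has fewer entries than the other samplings because GACGTC is palindromic: its inverse complement equals the direct sequence, and its inverse equals its complement. A short sketch of that check follows; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def inverse_complement(motif: str) -> str:
    """Complement every base, then reverse the order."""
    return motif.translate(COMPLEMENT)[::-1]


def is_palindromic(motif: str) -> bool:
    """True when the motif reads the same on both strands, 5' to 3'."""
    return motif == inverse_complement(motif)


print(is_palindromic("GACGTC"))   # True
print(is_palindromic("AGTAGT"))   # False
# For a palindromic motif the inverse and the complement coincide.
print("GACGTC"[::-1] == "GACGTC".translate(COMPLEMENT))   # True
```

By contrast, the Samarsky sequence AGTAGT is not palindromic, which is why its samplings need all sixteen strand, direction, and form combinations.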

Cbox (4560-2846) UTRs

  1. Negative strand, negative direction: GACGTC at 4316.

Cbox positive direction (4050-1) distal promoters

  1. Positive strand, positive direction: GACGTC at 3280, GACGTC at 3231, GACGTC at 2858, GACGTC at 1506, GACGTC at 1120, GACGTC at 532, GACGTC at 437, GACGTC at 193.

C-box random dataset samplings

  1. C-boxr0: 1, GACGTC at 2538.
  2. C-boxr1: 1, GACGTC at 1185.
  3. C-boxr2: 1, GACGTC at 1604.
  4. C-boxr3: 1, GACGTC at 2584.
  5. C-boxr4: 1, GACGTC at 2314.
  6. C-boxr5: 0.
  7. C-boxr6: 1, GACGTC at 610.
  8. C-boxr7: 1, GACGTC at 697.
  9. C-boxr8: 2, GACGTC at 4066, GACGTC at 3543.
  10. C-boxr9: 0.

C-boxr arbitrary (evens) (4560-2846) UTRs

  1. C-boxr8: GACGTC at 4066, GACGTC at 3543.

C-boxr alternate positive direction (evens) (4265-4050) proximal promoters

  1. C-boxr8: GACGTC at 4066.

C-boxr arbitrary negative direction (evens) (2596-1) distal promoters

  1. C-boxr0: GACGTC at 2538.
  2. C-boxr2: GACGTC at 1604.
  3. C-boxr4: GACGTC at 2314.
  4. C-boxr6: GACGTC at 610.

C-boxr alternate negative direction (odds) (2596-1) distal promoters

  1. C-boxr1: GACGTC at 1185.
  2. C-boxr3: GACGTC at 2584.
  3. C-boxr7: GACGTC at 697.

C-boxr arbitrary positive direction (odds) (4050-1) distal promoters

  1. C-boxr1: GACGTC at 1185.
  2. C-boxr3: GACGTC at 2584.
  3. C-boxr7: GACGTC at 697.

C-boxr alternate positive direction (evens) (4050-1) distal promoters

  1. C-boxr0: GACGTC at 2538.
  2. C-boxr2: GACGTC at 1604.
  3. C-boxr4: GACGTC at 2314.
  4. C-boxr6: GACGTC at 610.
  5. C-boxr8: GACGTC at 3543.

C-box (Song) analysis and results

Analysis "of the recombinant (soybean [Glycine max] TGACG-motif binding factor 1) STF1 protein revealed the C-box (nGACGTCn) to be a high-affinity binding site (Cheong et al., 1998)."[4]

| Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
|---|---|---|---|---|---|---|
| Reals | UTR | negative | 1 | 2 | 0.5 | 0.5 |
| Randoms | UTR | arbitrary negative | 2 | 10 | 0.2 | 0.1 |
| Randoms | UTR | alternate negative | 0 | 10 | 0 | 0.1 |
| Reals | Core | negative | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Core | positive | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Proximal | negative | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Proximal | positive | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary positive | 0 | 10 | 0 | 0.05 |
| Randoms | Proximal | alternate positive | 1 | 10 | 0.1 | 0.05 |
| Reals | Distal | negative | 0 | 2 | 0 | 0 |
| Randoms | Distal | arbitrary negative | 4 | 10 | 0.4 | 0.35 |
| Randoms | Distal | alternate negative | 3 | 10 | 0.3 | 0.35 |
| Reals | Distal | positive | 8 | 2 | 4 | 4 |
| Randoms | Distal | arbitrary positive | 3 | 10 | 0.3 | 0.4 |
| Randoms | Distal | alternate positive | 5 | 10 | 0.5 | 0.4 |

Comparison:

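The real Song C-boxes occur more often than the randoms in the UTR (0.5 versus an average of 0.1) and in the distal promoter in the positive direction (4 versus an average of 0.4). This suggests that the real Song C-boxes are likely active or activable.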

Song C box hybrids

Hybrid C, A boxes

"When analyzed by type of ACE, these sequences can be grouped into four subclasses [...]: C-box, where the C residue comes at the +2 position; a hybrid C/G-box (C/G-box), with G at the +2 position; C/A-box [TGACGTAT], with A at the +2 position; and C/T-box, with T at the +2 position."[4]

Hybrid C, G boxes

"To test whether STF1 and HY5 have similar DNA-binding properties, the binding properties of each were compared with eight different DNA sequences that represent G-, C-, and C/G-box motifs [TGACGTGT]. C-box sequences carrying the mammalian cAMP responsive element (CRE; TGACGTCA) motif and the Hex sequence (TGACGTGGC), a hybrid C/G-box (Cheong et al., 1998), were high-affinity binding sites for both proteins [...]."[4]

Hybrid C, T boxes

"Only a small number of C/T-boxes [TGACGTTA] (4/100) and non-TGACGT sequences (4/100) were selected."[4]

Song hybrid C box samplings

Hybrid C, A box samplings

Searching for the hybrid C, A box consensus sequence TGACGTAT with a find command ("⌘F") locates none between ZSCAN22 and A1BG and none between ZNF497 and A1BG, consistent with the results of the computer programs.

The Basic programs (SuccessablesCAbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). For each program, the strand and direction, the sequence sought, and the number found are:

  1. negative strand, negative direction, looking for TGACGTAT[4], 0.
  2. negative strand, positive direction, looking for TGACGTAT, 0.
  3. positive strand, negative direction, looking for TGACGTAT, 0.
  4. positive strand, positive direction, looking for TGACGTAT, 0.
  5. complement, negative strand, negative direction, looking for ACTGCATA, 0.
  6. complement, negative strand, positive direction, looking for ACTGCATA, 0.
  7. complement, positive strand, negative direction, looking for ACTGCATA, 0.
  8. complement, positive strand, positive direction, looking for ACTGCATA, 0.
  9. inverse complement, negative strand, negative direction, looking for ATACGTCA, 0.
  10. inverse complement, negative strand, positive direction, looking for ATACGTCA, 0.
  11. inverse complement, positive strand, negative direction, looking for ATACGTCA, 0.
  12. inverse complement, positive strand, positive direction, looking for ATACGTCA, 0.
  13. inverse, negative strand, negative direction, looking for TATGCAGT, 0.
  14. inverse, negative strand, positive direction, looking for TATGCAGT, 0.
  15. inverse, positive strand, negative direction, looking for TATGCAGT, 0.
  16. inverse, positive strand, positive direction, looking for TATGCAGT, 0.

CAbox random dataset samplings

  1. CAboxr0: 0.
  2. CAboxr1: 0.
  3. CAboxr2: 0.
  4. CAboxr3: 0.
  5. CAboxr4: 0.
  6. CAboxr5: 0.
  7. CAboxr6: 0.
  8. CAboxr7: 0.
  9. CAboxr8: 0.
  10. CAboxr9: 0.
  11. CAboxr0ci: 0.
  12. CAboxr1ci: 0.
  13. CAboxr2ci: 0.
  14. CAboxr3ci: 0.
  15. CAboxr4ci: 1, ATACGTCA at 901.
  16. CAboxr5ci: 0.
  17. CAboxr6ci: 1, ATACGTCA at 838.
  18. CAboxr7ci: 0.
  19. CAboxr8ci: 0.
  20. CAboxr9ci: 0.

CAboxr distal promoters

  1. CAboxr4ci: 1, ATACGTCA at 901.
  2. CAboxr6ci: 1, ATACGTCA at 838.

Hybrid C, G box samplings

Searching for the hybrid C, G box consensus sequence TGACGTGT with a find command ("⌘F") locates none between ZSCAN22 and A1BG and none between ZNF497 and A1BG, consistent with the results of the computer programs for the direct sequence.

The Basic programs (SuccessablesCGbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). For each program, the strand and direction, the sequence sought, and the number found are:

  1. negative strand, negative direction, looking for TGACGTGT[4], 0.
  2. negative strand, positive direction, looking for TGACGTGT, 0.
  3. positive strand, negative direction, looking for TGACGTGT, 0.
  4. positive strand, positive direction, looking for TGACGTGT, 0.
  5. complement, negative strand, negative direction, looking for ACTGCACA, 0.
  6. complement, negative strand, positive direction, looking for ACTGCACA, 0.
  7. complement, positive strand, negative direction, looking for ACTGCACA, 0.
  8. complement, positive strand, positive direction, looking for ACTGCACA, 0.
  9. inverse complement, negative strand, negative direction, looking for ACACGTCA, 0.
  10. inverse complement, negative strand, positive direction, looking for ACACGTCA, 0.
  11. inverse complement, positive strand, negative direction, looking for ACACGTCA, 0.
  12. inverse complement, positive strand, positive direction, looking for ACACGTCA, 1, ACACGTCA at 3962.
  13. inverse, negative strand, negative direction, looking for TGTGCAGT, 0.
  14. inverse, negative strand, positive direction, looking for TGTGCAGT, 1, TGTGCAGT at 3962.
  15. inverse, positive strand, negative direction, looking for TGTGCAGT, 0.
  16. inverse, positive strand, positive direction, looking for TGTGCAGT, 0.

CGbox positive direction (4050-1) distal promoters

  1. Positive strand, positive direction: ACACGTCA at 3962.

CGbox random dataset samplings

  1. CGboxr0: 0.
  2. CGboxr1: 0.
  3. CGboxr2: 0.
  4. CGboxr3: 0.
  5. CGboxr4: 0.
  6. CGboxr5: 0.
  7. CGboxr6: 0.
  8. CGboxr7: 0.
  9. CGboxr8: 1, TGACGTGT at 915.
  10. CGboxr9: 0.
  11. CGboxr0ci: 0.
  12. CGboxr1ci: 0.
  13. CGboxr2ci: 0.
  14. CGboxr3ci: 0.
  15. CGboxr4ci: 0.
  16. CGboxr5ci: 0.
  17. CGboxr6ci: 0.
  18. CGboxr7ci: 0.
  19. CGboxr8ci: 0.
  20. CGboxr9ci: 0.

CGboxr arbitrary negative direction (evens) (2596-1) distal promoters

  1. CGboxr8: TGACGTGT at 915.

CGboxr alternate positive direction (evens) (4050-1) distal promoters

  1. CGboxr8: TGACGTGT at 915.

CGbox analysis and results

TGACGTGT.[4]

| Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
|---|---|---|---|---|---|---|
| Reals | UTR | negative | 0 | 2 | 0 | 0 |
| Randoms | UTR | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | UTR | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Core | negative | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Core | positive | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Proximal | negative | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Proximal | positive | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Proximal | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Distal | negative | 0 | 2 | 0 | 0 |
| Randoms | Distal | arbitrary negative | 1 | 10 | 0.1 | 0.05 |
| Randoms | Distal | alternate negative | 0 | 10 | 0 | 0.05 |
| Reals | Distal | positive | 1 | 2 | 0.5 | 0.5 |
| Randoms | Distal | arbitrary positive | 0 | 10 | 0 | 0.05 |
| Randoms | Distal | alternate positive | 1 | 10 | 0.1 | 0.05 |

Comparison:

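The real CG boxes occur more often than the randoms in the distal promoter in the positive direction (0.5 versus an average of 0.05); no real CG boxes occur in the other regions. This suggests that the real CG boxes are likely active or activable.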

Hybrid C, T box samplings

Searching for the hybrid C, T box consensus sequence TGACGTTA with a find command ("⌘F") locates none between ZSCAN22 and A1BG and none between ZNF497 and A1BG, consistent with the results of the computer programs.

The Basic programs (SuccessablesCTbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). For each program, the strand and direction, the sequence sought, and the number found are:

  1. negative strand, negative direction, looking for TGACGTTA[4], 0.
  2. negative strand, positive direction, looking for TGACGTTA, 0.
  3. positive strand, negative direction, looking for TGACGTTA, 0.
  4. positive strand, positive direction, looking for TGACGTTA, 0.
  5. complement, negative strand, negative direction, looking for ACTGCAAT, 0.
  6. complement, negative strand, positive direction, looking for ACTGCAAT, 0.
  7. complement, positive strand, negative direction, looking for ACTGCAAT, 0.
  8. complement, positive strand, positive direction, looking for ACTGCAAT, 0.
  9. inverse complement, negative strand, negative direction, looking for TAACGTCA, 0.
  10. inverse complement, negative strand, positive direction, looking for TAACGTCA, 0.
  11. inverse complement, positive strand, negative direction, looking for TAACGTCA, 0.
  12. inverse complement, positive strand, positive direction, looking for TAACGTCA, 0.
  13. inverse, negative strand, negative direction, looking for ATTGCAGT, 0.
  14. inverse, negative strand, positive direction, looking for ATTGCAGT, 0.
  15. inverse, positive strand, negative direction, looking for ATTGCAGT, 0.
  16. inverse, positive strand, positive direction, looking for ATTGCAGT, 0.

CTbox random dataset samplings

  1. CTboxr0: 0.
  2. CTboxr1: 0.
  3. CTboxr2: 0.
  4. CTboxr3: 0.
  5. CTboxr4: 0.
  6. CTboxr5: 0.
  7. CTboxr6: 0.
  8. CTboxr7: 0.
  9. CTboxr8: 0.
  10. CTboxr9: 0.
  11. CTboxr0ci: 0.
  12. CTboxr1ci: 0.
  13. CTboxr2ci: 0.
  14. CTboxr3ci: 0.
  15. CTboxr4ci: 1, TAACGTCA at 2405.
  16. CTboxr5ci: 0.
  17. CTboxr6ci: 1, TAACGTCA at 1638.
  18. CTboxr7ci: 0.
  19. CTboxr8ci: 0.
  20. CTboxr9ci: 0.

CTboxr distal promoters

  1. CTboxr4ci: 1, TAACGTCA at 2405.
  2. CTboxr6ci: 1, TAACGTCA at 1638.

Song C box hybrids analysis and results

The real promoters have no hybrid C/A boxes, and the random datasets had only two in the negative direction, for an occurrence of 0.2.

The real promoters have only one hybrid C/G box, on the positive strand in the positive direction in the distal promoter (ACACGTCA at 3962), for an occurrence of 0.5. The random datasets had only one C/G box, in the arbitrary negative direction in the distal promoter (TGACGTGT at 915), for an occurrence of 0.1.

This suggests that the one hybrid C/G box is likely active or activable.

The real promoters have no hybrid C/T box consensus sequences, and the random datasets had two in the negative direction, for an occurrence of 0.2.
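The C-box and the three hybrid subclasses analyzed above can be told apart by a single base. The sketch below assumes, based only on the bracketed example sequences quoted from Song et al. (TGACGTCA, TGACGTGT, TGACGTAT, TGACGTTA), that the distinguishing base is the one immediately following TGACGT; the function name `ace_subclass` is hypothetical.

```python
SUBCLASSES = {"C": "C-box", "G": "C/G-box", "A": "C/A-box", "T": "C/T-box"}


def ace_subclass(sequence: str) -> str:
    """Classify a sequence by the base directly after the first TGACGT."""
    index = sequence.find("TGACGT")
    if index == -1:
        return "non-TGACGT"
    following = sequence[index + 6:index + 7]  # empty if TGACGT ends the string
    return SUBCLASSES.get(following, "unclassified")


EXAMPLES = ["TGACGTCA", "TGACGTGGC", "TGACGTGT", "TGACGTAT", "TGACGTTA", "TGACTCA"]
for example in EXAMPLES:
    print(example, ace_subclass(example))
```

Under this assumption the CRE sequence TGACGTCA is classed as a C-box, the Hex sequence TGACGTGGC as a C/G-box, and the AP-1 site TGACTCA as a non-TGACGT sequence, in line with the quoted descriptions.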

Acknowledgements

The content on this page was first contributed by: Henry A. Hoff.

Initial content for this page in some instances came from Wikiversity.

References

  1. PA Johnson, D Bunick, NB Hecht (1991). "Protein Binding Regions in the Mouse and Rat Protamine-2 Genes" (PDF). Biology of Reproduction. 44 (1): 127–134. Retrieved 6 April 2019.
  2. Dmitry A. Samarsky, Maurille J. Fournier, Robert H. Singer and Edouard Bertrand (1 July 1998). "The snoRNA box C/D motif directs nucleolar targeting and also couples snoRNA synthesis and localization" (PDF). The European Molecular Biology Organization (EMBO) Journal. 17 (13): 3747–3757. doi:10.1093/emboj/17.13.3747. PMID 9649444. Retrieved 4 February 2017.
  3. E. N. Voronina, T. D. Kolokol’tsova, E. A. Nechaeva, and M. L. Filipenko (2003). "Structural–Functional Analysis of the Human Gene for Ribosomal Protein L11" (PDF). Molecular Biology. 37 (3): 362–371. Retrieved 11 April 2019.
  4. Young Hun Song, Cheol Min Yoo, An Pio Hong, Seong Hee Kim, Hee Jeong Jeong, Su Young Shin, Hye Jin Kim, Dae-Jin Yun, Chae Oh Lim, Jeong Dong Bahk, Sang Yeol Lee, Ron T. Nagao, Joe L. Key, and Jong Chan Hong (April 2008). "DNA-Binding Study Identifies C-Box and Hybrid C/G-Box or C/A-Box Motifs as High-Affinity Binding Sites for STF1 and LONG HYPOCOTYL5 Proteins" (PDF). Plant Physiology. 146 (4): 1862–1877. doi:10.1104/pp.107.113217. Retrieved 26 March 2019.