C box gene transcriptions: Difference between revisions

(24 intermediate revisions by the same user not shown)
Line 90: Line 90:


The Basic programs (starting with SuccessablesCbox.bas or SuccessablesDbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The programs, the sequences they look for, and the occurrences found are:
The Basic programs (starting with SuccessablesCbox.bas or SuccessablesDbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The programs, the sequences they look for, and the occurrences found are:
# negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesCbox--.bas, looking for 5'-AGTAGT-3'<ref name=Samarsky/>, 4, 5'-AGTAGT-3', 2888, 5'-AGTAGT-3', 2944, 5'-AGTAGT-3', 3418, 5'-AGTAGT-3', 3521,
# negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesCbox--.bas, looking for AGTAGT, 4, AGTAGT at 3521, AGTAGT at 3418, AGTAGT at 2944, AGTAGT at 2888.
# negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesCbox-+.bas, looking for 5'-AGTAGT-3', 0,
# negative strand, positive direction: 0.
# positive strand in the negative direction is SuccessablesCbox+-.bas, looking for 5'-AGTAGT-3', 0,
# positive strand, negative direction: 0.
# positive strand in the positive direction is SuccessablesCbox++.bas, looking for 5'-AGTAGT-3', 1, 5'-AGTAGT-3', 3251,
# positive strand, positive direction: 1, AGTAGT at 3251.
# complement, negative strand, negative direction is SuccessablesCboxc--.bas, looking for 5'-TCATCA-3', 0,
# complement, negative strand, negative direction is SuccessablesCboxc--.bas, looking for 5'-TCATCA-3', 0,
# complement, negative strand, positive direction is SuccessablesCboxc-+.bas, looking for 5'-TCATCA-3', 1, 5'-TCATCA-3', 3251,
# complement, negative strand, positive direction is SuccessablesCboxc-+.bas, looking for 5'-TCATCA-3', 1, 5'-TCATCA-3', 3251,
# complement, positive strand, negative direction is SuccessablesCboxc+-.bas, looking for 5'-TCATCA-3', 4, 5'-TCATCA-3', 2888, 5'-TCATCA-3', 2944, 5'-TCATCA-3', 3418, 5'-TCATCA-3', 3521,
# complement, positive strand, negative direction is SuccessablesCboxc+-.bas, looking for 5'-TCATCA-3', 4, 5'-TCATCA-3', 2888, 5'-TCATCA-3', 2944, 5'-TCATCA-3', 3418, 5'-TCATCA-3', 3521,
# complement, positive strand, positive direction is SuccessablesCboxc++.bas, looking for 5'-TCATCA-3', 0,
# complement, positive strand, positive direction is SuccessablesCboxc++.bas, looking for 5'-TCATCA-3', 0,
# inverse complement, negative strand, negative direction is SuccessablesCboxci--.bas, looking for 5'-ACTACT-3', 0,
# inverse complement, negative strand, negative direction: 0.
# inverse complement, negative strand, positive direction is SuccessablesCboxci-+.bas, looking for 5'-ACTACT-3', 0,
# inverse complement, negative strand, positive direction: 0.
# inverse complement, positive strand, negative direction is SuccessablesCboxci+-.bas, looking for 5'-ACTACT-3', 0,
# inverse complement, positive strand, negative direction: 0.
# inverse complement, positive strand, positive direction is SuccessablesCboxci++.bas, looking for 5'-ACTACT-3', 1, 5'-ACTACT-3' at 2144.
# inverse complement, positive strand, positive direction: 1, ACTACT at 2144.
# inverse, negative strand, negative direction, is SuccessablesCboxi--.bas, looking for 5'-TGATGA-3', 0,
# inverse, negative strand, negative direction, is SuccessablesCboxi--.bas, looking for 5'-TGATGA-3', 0,
# inverse, negative strand, positive direction, is SuccessablesCboxi-+.bas, looking for 5'-TGATGA-3', 1, 5'-TGATGA-3', 2144,
# inverse, negative strand, positive direction, is SuccessablesCboxi-+.bas, looking for 5'-TGATGA-3', 1, 5'-TGATGA-3', 2144,
Line 107: Line 107:
# inverse, positive strand, positive direction, is SuccessablesCboxi++.bas, looking for 5'-TGATGA-3', 0.
# inverse, positive strand, positive direction, is SuccessablesCboxi++.bas, looking for 5'-TGATGA-3', 0.


===C box S UTRs===
===BoxC (4560-2846) UTRs===
{{main|UTR promoter gene transcriptions}}
Negative strand, negative direction: AGTAGT at 3521, AGTAGT at 3418, AGTAGT at 2944, AGTAGT at 2888.


===C box S distal promoters===
# Negative strand, negative direction: AGTAGT at 3521, AGTAGT at 3418, AGTAGT at 2944, AGTAGT at 2888.
{{main|Distal promoter gene transcriptions}}
 
Negative strand, positive direction: TGATGA at 2144.
===BoxC positive direction (4050-1) distal promoters===


Positive strand, positive direction: AGTAGT at 3251.
# Positive strand, positive direction: AGTAGT at 3251, ACTACT at 2144.


===Random dataset samplings===
==Cbox (Samarsky) random samplings==


# Cboxr0: 0.
# Cboxr0: 0.
Line 140: Line 138:
# Cboxr9ci: 1, ACTACT at 4090.
# Cboxr9ci: 1, ACTACT at 4090.


===Cboxr UTRs===
===BoxCr arbitrary (evens) (4560-2846) UTRs===
{{main|UTR promoter gene transcriptions}}
 
# Cboxr6: AGTAGT at 3454.
# Cboxr6: AGTAGT at 3454.
# Cboxr4ci: ACTACT at 3715.
# Cboxr4ci: ACTACT at 3715.


===Cboxr proximal promoters===
===BoxCr alternate (odds) (4560-2846) UTRs===
{{main|Proximal promoter gene transcriptions}}
 
# Cboxr5: AGTAGT at 3259.
# Cboxr7ci: ACTACT at 4118, ACTACT at 3925.
# Cboxr9ci: ACTACT at 4090.
 
===BoxCr arbitrary positive direction (odds) (4265-4050) proximal promoters===
 
# Cboxr7ci: ACTACT at 4118.
# Cboxr7ci: ACTACT at 4118.
# Cboxr9ci: ACTACT at 4090.
# Cboxr9ci: ACTACT at 4090.


===Cboxr distal promoters===
===BoxCr arbitrary negative direction (evens) (2596-1) distal promoters===
{{main|Distal promoter gene transcriptions}}
 
Negative direction:
# Cboxr6: AGTAGT at 801, AGTAGT at 531.
# Cboxr6: AGTAGT at 801, AGTAGT at 531.
# Cboxr2ci: ACTACT at 317, ACTACT at 314.
# Cboxr2ci: ACTACT at 317, ACTACT at 314.


Positive direction:
===BoxCr alternate negative direction (odds) (2596-1) distal promoters===
 
# Cboxr9: AGTAGT at 16.
# Cboxr1ci: ACTACT at 374.
 
===BoxCr arbitrary positive direction (odds) (4050-1) distal promoters===
 
# Cboxr5: AGTAGT at 3259.
# Cboxr5: AGTAGT at 3259.
# Cboxr9: AGTAGT at 16.
# Cboxr9: AGTAGT at 16.
Line 162: Line 171:
# Cboxr7ci: ACTACT at 3925.
# Cboxr7ci: ACTACT at 3925.


==Samarsky C box analysis and results==
===BoxCr alternate positive direction (evens) (4050-1) distal promoters===
 
# Cboxr6: AGTAGT at 3454, AGTAGT at 801, AGTAGT at 531.
# Cboxr2ci: ACTACT at 317, ACTACT at 314.
# Cboxr4ci: ACTACT at 3715.
 
==Cbox (Samarsky) analysis and results==
{{main|Complex locus A1BG and ZNF497#C boxes (Samarsky)}}
{{main|Complex locus A1BG and ZNF497#C boxes (Samarsky)}}
The real promoters have four consensus sequences in the ZSCAN22 to A1BG UTR of A1BG (occurrence 2.0). There are no core or proximal promoter consensus sequences. There are two distal promoter sequences only in the positive direction for an occurrence of 1.0.
AGTAGT.<ref name=Samarsky/>


The random datasets had two in the UTR for an occurrence of 0.2. No occurrences in the core promoters. Two in the proximal promoters (arbitrary positive direction) for an occurrence of 0.1. In the distal promoters there were four in the arbitrary negative direction for an occurrence of 0.4 and four in the arbitrary positive direction for an occurrence of 0.4.
{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 4 || 2 || 2 || 2
|-
| Randoms || UTR || arbitrary negative || 2 || 10 || 0.2 || 0.3
|-
| Randoms || UTR || alternate negative || 4 || 10 || 0.4 || 0.3
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 2 || 10 || 0.2 || 0.1  
|-
| Randoms || Proximal || alternate positive || 0 || 10 || 0 || 0.1
|-
| Reals || Distal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary negative || 4 || 10 || 0.4 || 0.3
|-
| Randoms || Distal || alternate negative || 2 || 10 || 0.2 || 0.3
|-
| Reals || Distal || positive || 2 || 2 || 1 || 1
|-
| Randoms || Distal || arbitrary positive || 4 || 10 || 0.4 || 0.5
|-
| Randoms || Distal || alternate positive || 6 || 10 || 0.6 || 0.5
|}


There is a wide discrepancy between the real occurrences and the random occurrences to suggest that the real occurrences are likely active or activable.
Comparison:
 
The occurrences of real Cbox(Samarsky)s are greater than the randoms. This suggests that the real Cbox(Samarsky)s are likely active or activable.


==Voronina C box samplings==
==Voronina C box samplings==
Line 190: Line 252:
# inverse, positive strand, positive direction, looking for 5'-GTAGTGG-3', 0.
# inverse, positive strand, positive direction, looking for 5'-GTAGTGG-3', 0.


===Voronina C box UTRs===
===CV box (4560-2846) UTRs===
{{main|UTR promoter gene transcriptions}}
 
# Positive strand, negative direction: GGTGATG at 3798.
# Positive strand, negative direction: GGTGATG at 3798.


===CVbox random dataset samplings===
==CVbox random dataset samplings==


# CVboxr0: 0.
# CVboxr0: 0.
Line 217: Line 279:
# CVboxr9ci: 0.
# CVboxr9ci: 0.


===CVboxr UTRs===
===CVboxr arbitrary (evens) (4560-2846) UTRs===
{{main|UTR promoter gene transcriptions}}
 
# CVboxr4ci: 1, CATCACC at 3456.
# CVboxr4ci: CATCACC at 3456.
 
===CVboxr alternate negative direction (odds) (2811-2596) proximal promoters===
 
# CVboxr5: GGTGATG at 2781.
 
===CVboxr arbitrary negative direction (evens) (2596-1) distal promoters===


===CVboxr distal promoters===
{{main|Distal promoter gene transcriptions}}
# CVboxr4: GGTGATG at 2498, GGTGATG at 1821.
# CVboxr4: GGTGATG at 2498, GGTGATG at 1821.
# CVboxr8ci: CATCACC at 808.
# CVboxr8ci: CATCACC at 808.


===CVboxr arbitrary positive direction (odds) (4050-1) distal promoters===


# CVboxr5: GGTGATG at 2781.
# CVboxr5: GGTGATG at 2781.
===CVboxr alternate positive direction (evens) (4050-1) distal promoters===
# CVboxr4: GGTGATG at 2498, GGTGATG at 1821.
# CVboxr4ci: CATCACC at 3456.
# CVboxr8ci: CATCACC at 808.


==Voronina C box analysis and results==
==Voronina C box analysis and results==
{{main|Complex locus A1BG and ZNF497#C boxes (Voronina)}}
{{main|Complex locus A1BG and ZNF497#C boxes (Voronina)}}
Described by Voronina (GGTGATG, positive strand, negative direction at 3798).<ref name=Voronina/>
{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || UTR || arbitrary negative || 1 || 10 || 0.1 || 0.05
|-
| Randoms || UTR || alternate negative || 0 || 10 || 0 || 0.05
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 0 || 10 || 0 || 0.05
|-
| Randoms || Proximal || alternate negative || 1 || 10 || 0.1 || 0.05
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Distal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary negative || 3 || 10 || 0.3 || 0.15
|-
| Randoms || Distal || alternate negative || 0 || 10 || 0 || 0.15
|-
| Reals || Distal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary positive || 1 || 10 || 0.1 || 0.25
|-
| Randoms || Distal || alternate positive || 4 || 10 || 0.4 || 0.25
|}
Comparison:
The occurrences of real CV boxes are greater than the randoms. This suggests that the real CV boxes are likely active or activable.


==Song C-boxes==
==Song C-boxes==
Line 268: Line 393:
# complement, positive strand, positive direction, looking for 5'-CTGCAG-3', 0.
# complement, positive strand, positive direction, looking for 5'-CTGCAG-3', 0.


===Song C-box UTRs===
===Cbox (4560-2846) UTRs===
{{main|UTR promoter gene transcriptions}}
Negative strand, negative direction: GACGTC at 4316.


===Song C-box core promoters===
# Negative strand, negative direction: GACGTC at 4316.
{{main|Core promoter gene transcriptions}}
Positive strand, positive direction: GACGTC at 4316.


===Song C-box distal promoters===
===Cbox positive direction (4050-1) distal promoters===
{{main|Distal promoter gene transcriptions}}
 
Positive strand, positive direction: GACGTC at 3280, GACGTC at 3231, GACGTC at 2858, GACGTC at 1506, GACGTC at 1120, GACGTC at 532, GACGTC at 437, GACGTC at 193.
# Positive strand, positive direction: GACGTC at 3280, GACGTC at 3231, GACGTC at 2858, GACGTC at 1506, GACGTC at 1120, GACGTC at 532, GACGTC at 437, GACGTC at 193.


===C-box random dataset samplings===
==C-box random dataset samplings==


# C-boxr0: 1, GACGTC at 2538.
# C-boxr0: 1, GACGTC at 2538.
Line 293: Line 414:
# C-boxr9: 0.
# C-boxr9: 0.


===C-boxr UTRs===
===C-boxr arbitrary (evens) (4560-2846) UTRs===
{{main|UTR promoter gene transcriptions}}
 
# C-boxr8: GACGTC at 4066, GACGTC at 3543.
# C-boxr8: GACGTC at 4066, GACGTC at 3543.


===C-boxr distal promoters===
===C-boxr alternate positive direction (evens) (4265-4050) proximal promoters===
{{main|Distal promoter gene transcriptions}}
 
# C-boxr0: 1, GACGTC at 2538.
# C-boxr8: GACGTC at 4066.
# C-boxr2: 1, GACGTC at 1604.
 
# C-boxr4: 1, GACGTC at 2314.
===C-boxr arbitrary negative direction (evens) (2596-1) distal promoters===
# C-boxr6: 1, GACGTC at 610.
 
# C-boxr0: GACGTC at 2538.
# C-boxr2: GACGTC at 1604.
# C-boxr4: GACGTC at 2314.
# C-boxr6: GACGTC at 610.
 
===C-boxr alternate negative direction (odds) (2596-1) distal promoters===
 
# C-boxr1: GACGTC at 1185.
# C-boxr3: GACGTC at 2584.
# C-boxr7: GACGTC at 697.
 
===C-boxr arbitrary positive direction (odds) (4050-1) distal promoters===
 
# C-boxr1: GACGTC at 1185.
# C-boxr3: GACGTC at 2584.
# C-boxr7: GACGTC at 697.


===C-boxr alternate positive direction (evens) (4050-1) distal promoters===


# C-boxr1: 1, GACGTC at 1185.
# C-boxr0: GACGTC at 2538.
# C-boxr3: 1, GACGTC at 2584.
# C-boxr2: GACGTC at 1604.
# C-boxr7: 1, GACGTC at 697.
# C-boxr4: GACGTC at 2314.
# C-boxr6: GACGTC at 610.
# C-boxr8: GACGTC at 3543.


==Song C box analysis and results==
==C-box (Song) analysis and results==
{{main|Complex locus A1BG and ZNF497#C boxes (Song)}}
{{main|Complex locus A1BG and ZNF497#C boxes (Song)}}
Analysis "of the recombinant (soybean [''Glycine max''] TGACG-motif binding factor 1) STF1 protein revealed the C-box (nGACGTCn) to be a high-affinity binding site (Cheong et al., 1998)."<ref name=Song/>
{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || UTR || arbitrary negative || 2 || 10 || 0.2 || 0.1
|-
| Randoms || UTR || alternate negative || 0 || 10 || 0 || 0.1
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 0 || 10 || 0 || 0.05
|-
| Randoms || Proximal || alternate positive || 1 || 10 || 0.1 || 0.05
|-
| Reals || Distal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary negative || 4 || 10 || 0.4 || 0.35
|-
| Randoms || Distal || alternate negative || 3 || 10 || 0.3 || 0.35
|-
| Reals || Distal || positive || 8 || 2 || 4 || 4
|-
| Randoms || Distal || arbitrary positive || 3 || 10 || 0.3 || 0.4
|-
| Randoms || Distal || alternate positive || 5 || 10 || 0.5 || 0.4
|}
Comparison:
The occurrences of real C-box (Song)s are greater than the randoms. This suggests that the real C-box (Song)s are likely active or activable.


==Song C box hybrids==
==Song C box hybrids==
Line 378: Line 570:
# CAboxr6ci: 1, ATACGTCA at 838.
# CAboxr6ci: 1, ATACGTCA at 838.


===Hybrid C, G box samplings===
==Hybrid C, G box samplings==


Copying a portion of the consensus sequence for the hybrid C, G box of TGACGTGT and putting it in "⌘F" finds none located between ZSCAN22 and A1BG and none between ZNF497 and A1BG as can be found by the computer programs.
Copying a portion of the consensus sequence for the hybrid C, G box of TGACGTGT and putting it in "⌘F" finds none located between ZSCAN22 and A1BG and none between ZNF497 and A1BG as can be found by the computer programs.
Line 400: Line 592:
# inverse, positive strand, positive direction, looking for TGTGCAGT, 0.
# inverse, positive strand, positive direction, looking for TGTGCAGT, 0.


===CGbox distal promoters===
===CGbox positive direction (4050-1) distal promoters===


# Positive strand, positive direction: ACACGTCA at 3962.
# Positive strand, positive direction: ACACGTCA at 3962.


===CGbox random dataset samplings===
==CGbox random dataset samplings==


# CGboxr0: 0.
# CGboxr0: 0.
Line 427: Line 619:
# CGboxr9ci: 0.
# CGboxr9ci: 0.


===CGboxr distal promoters===
===CGboxr arbitrary negative direction (evens) (2596-1) distal promoters===
{{main|Distal promoter gene transcriptions}}
 
# CGboxr8: TGACGTGT at 915.
# CGboxr8: TGACGTGT at 915.


===Hybrid C, T box samplings===
===CGboxr alternate positive direction (evens) (4050-1) distal promoters===
 
# CGboxr8: TGACGTGT at 915.
 
==CGbox analysis and results==
{{main|Complex locus A1BG and ZNF497#C boxes (Song hybrids)}}
TGACGTGT.<ref name=Song/>
 
{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 0 || 2 || 0 || 0
|-
| Randoms || UTR || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || UTR || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Distal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary negative || 1 || 10 || 0.1 || 0.05
|-
| Randoms || Distal || alternate negative || 0 || 10 || 0 || 0.05
|-
| Reals || Distal || positive || 1 || 2 || 0.5 || 0.5
|-
| Randoms || Distal || arbitrary positive || 0 || 10 || 0 || 0.05
|-
| Randoms || Distal || alternate positive || 1 || 10 || 0.1 || 0.05
|}
 
Comparison:
 
The occurrences of real CGboxes are greater than the randoms. This suggests that the real CGboxes are likely active or activable.
 
==Hybrid C, T box samplings==


Copying a portion of the consensus sequence for the hybrid C, T box of TGACGTTA and putting it in "⌘F" finds none located between ZSCAN22 and A1BG and none between ZNF497 and A1BG as can be found by the computer programs.
Copying a portion of the consensus sequence for the hybrid C, T box of TGACGTTA and putting it in "⌘F" finds none located between ZSCAN22 and A1BG and none between ZNF497 and A1BG as can be found by the computer programs.
Line 483: Line 734:
==Song C box hybrids analysis and results==
==Song C box hybrids analysis and results==
{{main|Complex locus A1BG and ZNF497#C boxes (Song hybrids)}}
{{main|Complex locus A1BG and ZNF497#C boxes (Song hybrids)}}
The real promoters have no hybrid C/A boxes and the random datasets only had two in the negative direction for an occurrence of 0.2.
The real promoters have only one hybrid C/G box on the positive strand in the positive direction in the distal promoter ACACGTCA at 3962 for an occurrence of 0.5. The random datasets had only one CG box in the arbitrary negative direction in the distal promoter TGACGTGT at 915 for an occurrence of 0.1.
It is suggested that the one C/G box hybrid is likely active or activable.
The real promoters have no C/T box hybrid consensus sequences and the random datasets had two in the negative direction for an occurrence of 0.2.


==Acknowledgements==
==Acknowledgements==

Latest revision as of 18:35, 15 November 2022

Associate Editor(s)-in-Chief: Henry A. Hoff

GAGGCCATCT is a C-box, [...].[1]

"Members of the box C/D snoRNA family, which are the subject of the present report, possess characteristic sequence elements known as box C (UGAUGA) and box D (GUCUGA)."[2]

The human ribosomal protein L11 gene (HRPL11) has [...] two potential snRNA-coding sequences in intron 4: the C box beginning at +4131 (GGTGATG), [...] a D box beginning at +4237 (TCCTG), [...].[3]

Analysis "of the recombinant (soybean [Glycine max] TGACG-motif binding factor 1) STF1 protein revealed the C-box (nGACGTCn) to be a high-affinity binding site (Cheong et al., 1998)."[4]

Hypotheses

  1. The C boxes are not involved in the transcription of A1BG.

Johnson C-box samplings

The Basic programs (SuccessablesCJbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The search conditions, the sequences looked for, and the numbers found are:

  1. negative strand, negative direction, looking for 5'-GAGGCCATCT-3'[1], 0.
  2. negative strand, positive direction, looking for 5'-GAGGCCATCT-3', 0.
  3. positive strand, negative direction, looking for 5'-GAGGCCATCT-3', 0.
  4. positive strand, positive direction, looking for 5'-GAGGCCATCT-3', 0.
  5. complement, negative strand, negative direction, looking for 5'-CTCCGGTAGA-3', 0.
  6. complement, negative strand, positive direction, looking for 5'-CTCCGGTAGA-3', 0.
  7. complement, positive strand, negative direction, looking for 5'-CTCCGGTAGA-3', 0.
  8. complement, positive strand, positive direction, looking for 5'-CTCCGGTAGA-3', 0.
  9. inverse complement, negative strand, negative direction, looking for 5'-AGATGGCCTC-3', 0.
  10. inverse complement, negative strand, positive direction, looking for 5'-AGATGGCCTC-3', 0.
  11. inverse complement, positive strand, negative direction, looking for 5'-AGATGGCCTC-3', 0.
  12. inverse complement, positive strand, positive direction, looking for 5'-AGATGGCCTC-3', 0.
  13. inverse, negative strand, negative direction, looking for 5'-TCTACCGGAG-3', 0.
  14. inverse, negative strand, positive direction, looking for 5'-TCTACCGGAG-3', 0.
  15. inverse, positive strand, negative direction, looking for 5'-TCTACCGGAG-3', 0.
  16. inverse, positive strand, positive direction, looking for 5'-TCTACCGGAG-3', 0.
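
The sixteen searches above use four forms of one consensus sequence: the sequence as given, its complement, its inverse complement, and its inverse. A minimal Python sketch of how those four forms can be derived and searched for is given here. It is not the SuccessablesCJbox.bas source, which is not shown on this page; the function names `variants` and `find_all` are hypothetical, and the 1-based start positions it reports are an assumption that may not match the numbering used by the Basic programs.

```python
# Hypothetical helper names; not taken from the Basic programs.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def variants(motif):
    """Return the four forms of a consensus sequence used in the searches."""
    comp = motif.translate(COMPLEMENT)
    return {
        "direct": motif,
        "complement": comp,
        "inverse complement": comp[::-1],  # complement read backwards
        "inverse": motif[::-1],            # original read backwards
    }

def find_all(sequence, motif):
    """Return 1-based start positions of every (possibly overlapping) match."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(start + 1)  # assumption: positions are counted from 1
        start = sequence.find(motif, start + 1)
    return hits

if __name__ == "__main__":
    for name, seq in variants("GAGGCCATCT").items():
        print(name, seq)
```

For GAGGCCATCT this reproduces the complement, inverse complement, and inverse listed above.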

snoRNA C box

File:RF00071.jpg
This example of a C/D box is a small nucleolar RNA 73 (snoRNA U73). Credit: Rfam database (RF00071).
File:U14 snoRNA.png
This U14 snoRNA from Saccharomyces cerevisiae shows structure and genomic organization. Credit: Dmitry A. Samarsky, Maurille J. Fournier, Robert H. Singer and Edouard Bertrand.

For "box C/D snoRNAs, boxes C and D and an adjoining stem form a vital structure, known as the box C/D motif."[2]

"The [C and D] box elements are essential for snoRNA production [transcription] and for snoRNA-directed modification of rRNA nucleotides."[2]

The "motif is necessary and sufficient for nucleolar targeting, both in yeast and mammals. Moreover, in mammalian cells, RNA is targeted to coiled bodies as well. Thus, the box C/D motif is the first intranuclear RNA trafficking signal identified for an RNA family. Remarkably, it also couples snoRNA localization with synthesis and, most likely, function. The distribution of snoRNA precursors in mammalian cells suggests that this coupling is provided by a specific protein(s) which binds the box C/D motif during or rapidly after snoRNA transcription."[2]

In snoRNA U73 on the right, the C box starting from the left side of the stem consists of nucleotides: ARUGAUGA, and from the right side the D box is AGUCY. In 5' to 3' direction, the D box is YCUGA.

The second image on the right shows the C box (3'-AGUAGU-5'). Substituting T for U yields C box = 3'-AGTAGT-5' in the transcription direction on the template strand.

Samarsky C box samplings

The Basic programs (starting with SuccessablesCbox.bas or SuccessablesDbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The programs, the sequences they look for, and the occurrences found are:

  1. negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesCbox--.bas, looking for AGTAGT, 4, AGTAGT at 3521, AGTAGT at 3418, AGTAGT at 2944, AGTAGT at 2888.
  2. negative strand, positive direction: 0.
  3. positive strand, negative direction: 0.
  4. positive strand, positive direction: 1, AGTAGT at 3251.
  5. complement, negative strand, negative direction is SuccessablesCboxc--.bas, looking for 5'-TCATCA-3', 0.
  6. complement, negative strand, positive direction is SuccessablesCboxc-+.bas, looking for 5'-TCATCA-3', 1, 5'-TCATCA-3' at 3251.
  7. complement, positive strand, negative direction is SuccessablesCboxc+-.bas, looking for 5'-TCATCA-3', 4, 5'-TCATCA-3' at 2888, 5'-TCATCA-3' at 2944, 5'-TCATCA-3' at 3418, 5'-TCATCA-3' at 3521.
  8. complement, positive strand, positive direction is SuccessablesCboxc++.bas, looking for 5'-TCATCA-3', 0.
  9. inverse complement, negative strand, negative direction: 0.
  10. inverse complement, negative strand, positive direction: 0.
  11. inverse complement, positive strand, negative direction: 0.
  12. inverse complement, positive strand, positive direction: 1, ACTACT at 2144.
  13. inverse, negative strand, negative direction, is SuccessablesCboxi--.bas, looking for 5'-TGATGA-3', 0.
  14. inverse, negative strand, positive direction, is SuccessablesCboxi-+.bas, looking for 5'-TGATGA-3', 1, 5'-TGATGA-3' at 2144.
  15. inverse, positive strand, negative direction, is SuccessablesCboxi+-.bas, looking for 5'-TGATGA-3', 0.
  16. inverse, positive strand, positive direction, is SuccessablesCboxi++.bas, looking for 5'-TGATGA-3', 0.

BoxC (4560-2846) UTRs

  1. Negative strand, negative direction: AGTAGT at 3521, AGTAGT at 3418, AGTAGT at 2944, AGTAGT at 2888.

BoxC positive direction (4050-1) distal promoters

  1. Positive strand, positive direction: AGTAGT at 3251, ACTACT at 2144.

Cbox (Samarsky) random samplings

  1. Cboxr0: 0.
  2. Cboxr1: 0.
  3. Cboxr2: 0.
  4. Cboxr3: 0.
  5. Cboxr4: 0.
  6. Cboxr5: 1, AGTAGT at 3259.
  7. Cboxr6: 3, AGTAGT at 3454, AGTAGT at 801, AGTAGT at 531.
  8. Cboxr7: 0.
  9. Cboxr8: 0.
  10. Cboxr9: 1, AGTAGT at 16.
  11. Cboxr0ci: 0.
  12. Cboxr1ci: 1, ACTACT at 374.
  13. Cboxr2ci: 2, ACTACT at 317, ACTACT at 314.
  14. Cboxr3ci: 0.
  15. Cboxr4ci: 1, ACTACT at 3715.
  16. Cboxr5ci: 0.
  17. Cboxr6ci: 0.
  18. Cboxr7ci: 2, ACTACT at 4118, ACTACT at 3925.
  19. Cboxr8ci: 0.
  20. Cboxr9ci: 1, ACTACT at 4090.

BoxCr arbitrary (evens) (4560-2846) UTRs

  1. Cboxr6: AGTAGT at 3454.
  2. Cboxr4ci: ACTACT at 3715.

BoxCr alternate (odds) (4560-2846) UTRs

  1. Cboxr5: AGTAGT at 3259.
  2. Cboxr7ci: ACTACT at 4118, ACTACT at 3925.
  3. Cboxr9ci: ACTACT at 4090.

BoxCr arbitrary positive direction (odds) (4265-4050) proximal promoters

  1. Cboxr7ci: ACTACT at 4118.
  2. Cboxr9ci: ACTACT at 4090.

BoxCr arbitrary negative direction (evens) (2596-1) distal promoters

  1. Cboxr6: AGTAGT at 801, AGTAGT at 531.
  2. Cboxr2ci: ACTACT at 317, ACTACT at 314.

BoxCr alternate negative direction (odds) (2596-1) distal promoters

  1. Cboxr9: AGTAGT at 16.
  2. Cboxr1ci: ACTACT at 374.

BoxCr arbitrary positive direction (odds) (4050-1) distal promoters

  1. Cboxr5: AGTAGT at 3259.
  2. Cboxr9: AGTAGT at 16.
  3. Cboxr1ci: ACTACT at 374.
  4. Cboxr7ci: ACTACT at 3925.

BoxCr alternate positive direction (evens) (4050-1) distal promoters

  1. Cboxr6: AGTAGT at 3454, AGTAGT at 801, AGTAGT at 531.
  2. Cboxr2ci: ACTACT at 317, ACTACT at 314.
  3. Cboxr4ci: ACTACT at 3715.
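
The groupings above sort each random-dataset hit into a region by its position, using the ranges given in the headings. The Python sketch below shows one way such a binning could be done. It assumes the ranges are inclusive at both ends and that the UTR range belongs to the negative direction, as the results table lists it; neither assumption is stated on this page. The names `REGIONS`, `regions_for`, and `tally` are hypothetical, and the core promoter range is left out because it is not given in this section.

```python
# Ranges copied from the section headings; inclusive bounds are an assumption.
REGIONS = {
    "negative": [("UTR", 2846, 4560), ("proximal", 2596, 2811), ("distal", 1, 2596)],
    "positive": [("proximal", 4050, 4265), ("distal", 1, 4050)],
}

def regions_for(position, direction):
    """Return the names of every region whose range contains the position."""
    return [name for name, low, high in REGIONS[direction] if low <= position <= high]

def tally(positions, direction):
    """Count hits per region for one direction."""
    counts = {}
    for position in positions:
        for name in regions_for(position, direction):
            counts[name] = counts.get(name, 0) + 1
    return counts

if __name__ == "__main__":
    # Hits from the even-numbered datasets (Cboxr6, Cboxr2ci, Cboxr4ci).
    evens = [3454, 801, 531, 317, 314, 3715]
    # Hits from the odd-numbered datasets (Cboxr5, Cboxr9, Cboxr1ci, Cboxr7ci, Cboxr9ci).
    odds = [3259, 16, 374, 4118, 3925, 4090]
    print("evens, negative:", tally(evens, "negative"))
    print("odds, positive:", tally(odds, "positive"))
```

Under these assumptions the tallies agree with the groupings listed above: two UTR and four distal hits for the evens in the negative direction, and two proximal and four distal hits for the odds in the positive direction.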

Cbox (Samarsky) analysis and results

AGTAGT.[2]

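{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 4 || 2 || 2 || 2
|-
| Randoms || UTR || arbitrary negative || 2 || 10 || 0.2 || 0.3
|-
| Randoms || UTR || alternate negative || 4 || 10 || 0.4 || 0.3
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 2 || 10 || 0.2 || 0.1
|-
| Randoms || Proximal || alternate positive || 0 || 10 || 0 || 0.1
|-
| Reals || Distal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary negative || 4 || 10 || 0.4 || 0.3
|-
| Randoms || Distal || alternate negative || 2 || 10 || 0.2 || 0.3
|-
| Reals || Distal || positive || 2 || 2 || 1 || 1
|-
| Randoms || Distal || arbitrary positive || 4 || 10 || 0.4 || 0.5
|-
| Randoms || Distal || alternate positive || 6 || 10 || 0.6 || 0.5
|}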

Comparison:

The real Samarsky C boxes occur more often than those in the random datasets. This suggests that the real Samarsky C boxes are likely active or activable.
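
The figures in the Samarsky results appear to follow a simple rule: each occurrence is the number of sequences found divided by the number of strands or datasets searched, and each random average is the mean of the arbitrary and alternate occurrences for the same promoter and direction. The Python sketch below reproduces that arithmetic as an illustration. The rule is inferred from the tabulated values rather than stated on this page, and the function names `occurrence` and `average` are hypothetical.

```python
def occurrence(numbers, strands):
    """Occurrence as hits divided by strands (or datasets) searched."""
    return numbers / strands

def average(arbitrary, alternate):
    """Mean of the arbitrary and alternate occurrences, rounded to two decimals."""
    return round((arbitrary + alternate) / 2, 2)

if __name__ == "__main__":
    # Real UTR, negative direction: 4 hits on 2 strands.
    print("real UTR:", occurrence(4, 2))
    # Random UTR, negative direction: 2 hits (arbitrary) and 4 hits (alternate) in 10 datasets each.
    print("random UTR average:", average(occurrence(2, 10), occurrence(4, 10)))
```

With the tabulated counts this gives 2.0 for the real UTR and 0.3 for the random UTR average, in line with the values reported above.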

Voronina C box samplings

The Basic programs (starting with SuccessablesCVbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The search conditions, the sequences looked for, and the numbers found are:

  1. negative strand, negative direction, looking for 5'-GGTGATG-3'[3], 0.
  2. negative strand, positive direction, looking for 5'-GGTGATG-3', 0.
  3. positive strand, negative direction, looking for 5'-GGTGATG-3', 1, 5'-GGTGATG-3' at 3798.
  4. positive strand, positive direction, looking for 5'-GGTGATG-3', 0.
  5. complement, negative strand, negative direction, looking for 5'-CCACTAC-3', 1, 5'-CCACTAC-3' at 3798.
  6. complement, negative strand, positive direction, looking for 5'-CCACTAC-3', 0.
  7. complement, positive strand, negative direction, looking for 5'-CCACTAC-3', 0.
  8. complement, positive strand, positive direction, looking for 5'-CCACTAC-3', 0.
  9. inverse complement, negative strand, negative direction, looking for 5'-CATCACC-3', 0.
  10. inverse complement, negative strand, positive direction, looking for 5'-CATCACC-3', 0.
  11. inverse complement, positive strand, negative direction, looking for 5'-CATCACC-3', 0.
  12. inverse complement, positive strand, positive direction, looking for 5'-CATCACC-3', 0.
  13. inverse, negative strand, negative direction, looking for 5'-GTAGTGG-3', 0.
  14. inverse, negative strand, positive direction, looking for 5'-GTAGTGG-3', 0.
  15. inverse, positive strand, negative direction, looking for 5'-GTAGTGG-3', 0.
  16. inverse, positive strand, positive direction, looking for 5'-GTAGTGG-3', 0.
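The four search strings used in the list above (direct, complement, inverse complement, inverse) can all be derived from the one consensus sequence. The following is a minimal Python sketch of that derivation, not the Basic programs themselves; the function name `orientations` is hypothetical.

```python
# Hypothetical helper; the source programs are written in Basic and are not shown.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def orientations(motif):
    """Return the four search strings derived from one consensus sequence."""
    comp = motif.translate(COMPLEMENT)   # base-by-base complement
    return {
        "direct": motif,
        "complement": comp,
        "inverse complement": comp[::-1],  # complement read backwards
        "inverse": motif[::-1],            # direct sequence read backwards
    }

for name, seq in orientations("GGTGATG").items():
    print(f"{name}: {seq}")
```

For GGTGATG this gives CCACTAC, CATCACC, and GTAGTGG, the same strings searched for in items 5 to 16.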

CV box (4560-2846) UTRs

  1. Positive strand, negative direction: GGTGATG at 3798.

CVbox random dataset samplings

  1. CVboxr0: 0.
  2. CVboxr1: 0.
  3. CVboxr2: 0.
  4. CVboxr3: 0.
  5. CVboxr4: 2, GGTGATG at 2498, GGTGATG at 1821.
  6. CVboxr5: 1, GGTGATG at 2781.
  7. CVboxr6: 0.
  8. CVboxr7: 0.
  9. CVboxr8: 0.
  10. CVboxr9: 0.
  11. CVboxr0ci: 0.
  12. CVboxr1ci: 0.
  13. CVboxr2ci: 0.
  14. CVboxr3ci: 0.
  15. CVboxr4ci: 1, CATCACC at 3456.
  16. CVboxr5ci: 0.
  17. CVboxr6ci: 0.
  18. CVboxr7ci: 0.
  19. CVboxr8ci: 1, CATCACC at 808.
  20. CVboxr9ci: 0.

CVboxr arbitrary (evens) (4560-2846) UTRs

  1. CVboxr4ci: CATCACC at 3456.

CVboxr alternate negative direction (odds) (2811-2596) proximal promoters

  1. CVboxr5: GGTGATG at 2781.

CVboxr arbitrary negative direction (evens) (2596-1) distal promoters

  1. CVboxr4: GGTGATG at 2498, GGTGATG at 1821.
  2. CVboxr8ci: CATCACC at 808.

CVboxr arbitrary positive direction (odds) (4050-1) distal promoters

  1. CVboxr5: GGTGATG at 2781.

CVboxr alternate positive direction (evens) (4050-1) distal promoters

  1. CVboxr4: GGTGATG at 2498, GGTGATG at 1821.
  2. CVboxr4ci: CATCACC at 3456.
  3. CVboxr8ci: CATCACC at 808.

Voronina C box analysis and results

The C box described by Voronina et al. is GGTGATG;[3] it occurs on the positive strand in the negative direction at 3798.

| Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
|---|---|---|---|---|---|---|
| Reals | UTR | negative | 1 | 2 | 0.5 | 0.5 |
| Randoms | UTR | arbitrary negative | 1 | 10 | 0.1 | 0.05 |
| Randoms | UTR | alternate negative | 0 | 10 | 0 | 0.05 |
| Reals | Core | negative | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Core | positive | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Proximal | negative | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0.05 |
| Randoms | Proximal | alternate negative | 1 | 10 | 0.1 | 0.05 |
| Reals | Proximal | positive | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Proximal | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Distal | negative | 0 | 2 | 0 | 0 |
| Randoms | Distal | arbitrary negative | 3 | 10 | 0.3 | 0.15 |
| Randoms | Distal | alternate negative | 0 | 10 | 0 | 0.15 |
| Reals | Distal | positive | 0 | 2 | 0 | 0 |
| Randoms | Distal | arbitrary positive | 1 | 10 | 0.1 | 0.25 |
| Randoms | Distal | alternate positive | 4 | 10 | 0.4 | 0.25 |
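The Occurrences and Averages columns in the table above appear to follow a simple rule: an occurrence is Numbers divided by Strands, and the average for the randoms is the mean of the arbitrary and alternate occurrences. The sketch below is a Python illustration under that assumption; the function names are hypothetical and the rule is inferred from the table values, not stated in the source.

```python
def occurrence(hits, strands):
    """Occurrences column: number of hits divided by number of strands or datasets."""
    return hits / strands

def pair_average(arbitrary_hits, alternate_hits, datasets=10):
    """Averages column for the randoms: mean of the arbitrary and alternate occurrences."""
    return (occurrence(arbitrary_hits, datasets)
            + occurrence(alternate_hits, datasets)) / 2

# Rows taken from the Voronina C box table.
print(occurrence(1, 2))               # reals, UTR, negative
print(round(pair_average(1, 0), 2))   # randoms, UTR, negative
print(round(pair_average(3, 0), 2))   # randoms, distal, negative
print(round(pair_average(1, 4), 2))   # randoms, distal, positive
```

Under this reading the four printed values correspond to the table entries 0.5, 0.05, 0.15, and 0.25.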

Comparison:

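The real CV box occurs only in the UTR, where its occurrence (0.5) is greater than the random average (0.05); the core, proximal, and distal promoters contain no real CV boxes. This suggests that the real CV box is likely active or activable.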

Song C-boxes

Analysis "of the recombinant (soybean [Glycine max] TGACG-motif binding factor 1) STF1 protein revealed the C-box (nGACGTCn) to be a high-affinity binding site (Cheong et al., 1998). The HY5 protein interacts with both the G- (CACGTG) and Z- (ATACGTGT) boxes of the light-regulated promoter of RbcS1A (ribulose bisphosphate carboxylase small subunit) and the CHS (chalcone synthase) genes (Ang et al., 1998; Chattopadhyay et al., 1998; Yadav et al., 2002). To test whether STF1 and HY5 have similar DNA-binding properties, the binding properties of each were compared with eight different DNA sequences that represent G-, C-, and C/G-box motifs [TGACGTGT]. C-box sequences carrying the mammalian cAMP responsive element (CRE; TGACGTCA) motif and the Hex sequence (TGACGTGGC), a hybrid C/G-box (Cheong et al., 1998), were high-affinity binding sites for both proteins [...]. No binding or limited binding was observed to as-1 (Lam et al., 1989), nos-1 (Lam et al., 1990), or the AP-1 site (TGACTCA; Kim et al., 1993). Binding to the palindromic G-box (PA G-box, GCCACGTGGC) was moderate. However, binding activity to the G-box of the light-responsive unit 1 (U1) region of the parsley (Petroselinum crispum) CHS promoter (CHS-U1: TCCACGTGGC; Schulze-Lefert et al., 1989) or the G-box of GmAux28 (TCCACGTGTC) was much weaker than to the PA G-box [...]."[4]

The "binding affinities of both bZIP proteins were similar to CREA/T (ATGACGTCAT), a CRE sequence with flanking adenine and thymine (A/T) at positions -4 and +4. [The] bZIP domains of both STF1 and HY5 have similar binding properties for recognizing ACGT-containing elements (ACEs). [Although] the G-box is a known target site for the HY5 protein, the C-box sequences are the preferred binding sites for both STF1 and HY5."[4]

"When analyzed by type of ACE, these sequences can be grouped into four subclasses [...]: C-box, where the C residue comes at the +2 position; a hybrid C/G-box (C/G-box), with G at the +2 position; C/A-box [TGACGTAT], with A at the +2 position; and C/T-box, with T at the +2 position. The C-box subclass contains the largest number of selected binding sites for STF1 (38% at 50 mM KCl and 48% at 150 mM), followed by the C/G- (25.3%) and the C/A-boxes (26%). Only a small number of C/T-boxes [TGACGTTA] (4/100) and non-TGACGT sequences (4/100) were selected."[4]

C-boxes are TCTTACGTCATC, AATGACGTCGAA, TCTCACGTGTGG, TTTGACGTGTGA, GATGACGTCATC, and AGAGACGTCAAC for an apparent consensus sequence of (A/G/T)(A/C/G/T)(A/T)(C/G/T)ACGT(C/G)(A/G/T)(A/G/T)(A/C/G).[4]
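The apparent consensus can be checked by listing, position by position, the bases seen across the six aligned 12-mers. The following Python sketch is one way to do that; the function name `consensus` is hypothetical, and the six sequences are copied from the sentence above.

```python
def consensus(aligned):
    """Build a consensus string from equal-length sequences, column by column."""
    parts = []
    for column in zip(*aligned):
        bases = sorted(set(column))          # distinct bases at this position
        if len(bases) == 1:
            parts.append(bases[0])           # invariant position
        else:
            parts.append("(" + "/".join(bases) + ")")
    return "".join(parts)

c_boxes = [
    "TCTTACGTCATC",
    "AATGACGTCGAA",
    "TCTCACGTGTGG",
    "TTTGACGTGTGA",
    "GATGACGTCATC",
    "AGAGACGTCAAC",
]
print(consensus(c_boxes))
```

The result reproduces the consensus given in the text, with ACGT as the only invariant positions.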

Song C-box samplings

The Basic programs, starting with SuccessablesC-box.bas, compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The searches, the sequences looked for, and the numbers found are:

  1. negative strand, negative direction, looking for GACGTC[4], 1, GACGTC at 4316.
  2. negative strand, positive direction, looking for GACGTC, 0.
  3. positive strand, negative direction, looking for GACGTC, 0.
  4. positive strand, positive direction, looking for GACGTC, 9, GACGTC at 4316, GACGTC at 3280, GACGTC at 3231, GACGTC at 2858, GACGTC at 1506, GACGTC at 1120, GACGTC at 532, GACGTC at 437, GACGTC at 193.
  5. inverse complement is the same as the direct consensus sequence.
  6. complement, negative strand, negative direction, looking for 5'-CTGCAG-3', 0.
  7. complement, negative strand, positive direction, looking for 5'-CTGCAG-3', 9, 5'-CTGCAG-3' at 193, 5'-CTGCAG-3' at 437, 5'-CTGCAG-3' at 532, 5'-CTGCAG-3' at 1120, 5'-CTGCAG-3' at 1506, 5'-CTGCAG-3' at 2858, 5'-CTGCAG-3' at 3231, 5'-CTGCAG-3' at 3280, 5'-CTGCAG-3' at 4316.
  8. complement, positive strand, negative direction, looking for 5'-CTGCAG-3', 1, 5'-CTGCAG-3' at 4316.
  9. complement, positive strand, positive direction, looking for 5'-CTGCAG-3', 0.
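Item 5 in the list above holds because GACGTC is palindromic: reading its complement backwards returns the original sequence, so only the direct and complement searches are distinct. A short Python check, with hypothetical function names, illustrates this and contrasts it with the non-palindromic GGTGATG.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def inverse_complement(motif):
    """Complement each base, then reverse the result."""
    return motif.translate(COMPLEMENT)[::-1]

def is_palindromic(motif):
    """True when the motif equals its own inverse complement."""
    return inverse_complement(motif) == motif

print(is_palindromic("GACGTC"))    # Song C-box core
print(is_palindromic("GGTGATG"))   # Voronina C box, for contrast
print("GACGTC"[::-1])              # the inverse equals the complement, CTGCAG
```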

Cbox (4560-2846) UTRs

  1. Negative strand, negative direction: GACGTC at 4316.

Cbox positive direction (4050-1) distal promoters

  1. Positive strand, positive direction: GACGTC at 3280, GACGTC at 3231, GACGTC at 2858, GACGTC at 1506, GACGTC at 1120, GACGTC at 532, GACGTC at 437, GACGTC at 193.

C-box random dataset samplings

  1. C-boxr0: 1, GACGTC at 2538.
  2. C-boxr1: 1, GACGTC at 1185.
  3. C-boxr2: 1, GACGTC at 1604.
  4. C-boxr3: 1, GACGTC at 2584.
  5. C-boxr4: 1, GACGTC at 2314.
  6. C-boxr5: 0.
  7. C-boxr6: 1, GACGTC at 610.
  8. C-boxr7: 1, GACGTC at 697.
  9. C-boxr8: 2, GACGTC at 4066, GACGTC at 3543.
  10. C-boxr9: 0.

C-boxr arbitrary (evens) (4560-2846) UTRs

  1. C-boxr8: GACGTC at 4066, GACGTC at 3543.

C-boxr alternate positive direction (evens) (4265-4050) proximal promoters

  1. C-boxr8: GACGTC at 4066.

C-boxr arbitrary negative direction (evens) (2596-1) distal promoters

  1. C-boxr0: GACGTC at 2538.
  2. C-boxr2: GACGTC at 1604.
  3. C-boxr4: GACGTC at 2314.
  4. C-boxr6: GACGTC at 610.

C-boxr alternate negative direction (odds) (2596-1) distal promoters

  1. C-boxr1: GACGTC at 1185.
  2. C-boxr3: GACGTC at 2584.
  3. C-boxr7: GACGTC at 697.

C-boxr arbitrary positive direction (odds) (4050-1) distal promoters

  1. C-boxr1: GACGTC at 1185.
  2. C-boxr3: GACGTC at 2584.
  3. C-boxr7: GACGTC at 697.

C-boxr alternate positive direction (evens) (4050-1) distal promoters

  1. C-boxr0: GACGTC at 2538.
  2. C-boxr2: GACGTC at 1604.
  3. C-boxr4: GACGTC at 2314.
  4. C-boxr6: GACGTC at 610.
  5. C-boxr8: GACGTC at 3543.
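The groupings above can be reproduced by sorting each random hit by dataset parity and by coordinate range. The Python sketch below is an illustration under stated assumptions: the coordinate ranges and the evens/odds labels are copied from the section headings, the bounds are treated as inclusive, and all names (`REGIONS`, `RANDOM_HITS`, `label`, `count`) are hypothetical.

```python
# Coordinate ranges copied from the section headings; inclusive bounds are an assumption.
REGIONS = {
    ("UTR", "negative"): (2846, 4560),
    ("proximal", "negative"): (2596, 2811),
    ("distal", "negative"): (1, 2596),
    ("proximal", "positive"): (4050, 4265),
    ("distal", "positive"): (1, 4050),
}

# GACGTC positions reported for the random datasets C-boxr0 to C-boxr9.
RANDOM_HITS = {0: [2538], 1: [1185], 2: [1604], 3: [2584], 4: [2314],
               5: [], 6: [610], 7: [697], 8: [4066, 3543], 9: []}

def label(dataset, direction):
    """Negative direction: evens are arbitrary, odds alternate; positive: the reverse."""
    even = dataset % 2 == 0
    if direction == "negative":
        return "arbitrary" if even else "alternate"
    return "alternate" if even else "arbitrary"

def count(region, direction, wanted):
    """Count hits of the wanted label that fall inside the region's range."""
    low, high = REGIONS[(region, direction)]
    return sum(
        1
        for dataset, positions in RANDOM_HITS.items()
        if label(dataset, direction) == wanted
        for position in positions
        if low <= position <= high
    )

for region, direction in REGIONS:
    for wanted in ("arbitrary", "alternate"):
        print(region, wanted, direction, count(region, direction, wanted))
```

With these inputs the counts agree with the lists above, for example four arbitrary and three alternate hits in the negative-direction distal promoters.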

C-box (Song) analysis and results

Analysis "of the recombinant (soybean [Glycine max] TGACG-motif binding factor 1) STF1 protein revealed the C-box (nGACGTCn) to be a high-affinity binding site (Cheong et al., 1998)."[4]

| Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
|---|---|---|---|---|---|---|
| Reals | UTR | negative | 1 | 2 | 0.5 | 0.5 |
| Randoms | UTR | arbitrary negative | 2 | 10 | 0.2 | 0.1 |
| Randoms | UTR | alternate negative | 0 | 10 | 0 | 0.1 |
| Reals | Core | negative | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Core | positive | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Proximal | negative | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Proximal | positive | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary positive | 0 | 10 | 0 | 0.05 |
| Randoms | Proximal | alternate positive | 1 | 10 | 0.1 | 0.05 |
| Reals | Distal | negative | 0 | 2 | 0 | 0 |
| Randoms | Distal | arbitrary negative | 4 | 10 | 0.4 | 0.35 |
| Randoms | Distal | alternate negative | 3 | 10 | 0.3 | 0.35 |
| Reals | Distal | positive | 8 | 2 | 4 | 4 |
| Randoms | Distal | arbitrary positive | 3 | 10 | 0.3 | 0.4 |
| Randoms | Distal | alternate positive | 5 | 10 | 0.5 | 0.4 |

Comparison:

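The real Song C-boxes occur more often than the randoms in the UTR (0.5 versus an average of 0.1) and in the distal promoters in the positive direction (4 versus 0.4), but not in the distal promoters in the negative direction (0 versus 0.35). This suggests that the real Song C-boxes in the UTR and in the positive-direction distal promoters are likely active or activable.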

Song C box hybrids

Hybrid C, A boxes

"When analyzed by type of ACE, these sequences can be grouped into four subclasses [...]: C-box, where the C residue comes at the +2 position; a hybrid C/G-box (C/G-box), with G at the +2 position; C/A-box [TGACGTAT], with A at the +2 position; and C/T-box, with T at the +2 position."[4]

Hybrid C, G boxes

"To test whether STF1 and HY5 have similar DNA-binding properties, the binding properties of each were compared with eight different DNA sequences that represent G-, C-, and C/G-box motifs [TGACGTGT]. C-box sequences carrying the mammalian cAMP responsive element (CRE; TGACGTCA) motif and the Hex sequence (TGACGTGGC), a hybrid C/G-box (Cheong et al., 1998), were high-affinity binding sites for both proteins [...]."[4]

Hybrid C, T boxes

"Only a small number of C/T-boxes [TGACGTTA] (4/100) and non-TGACGT sequences (4/100) were selected."[4]

Song hybrid C box samplings

Hybrid C, A box samplings

Copying the consensus sequence for the hybrid C, A box, TGACGTAT, into the browser's find field ("⌘F") finds none located between ZSCAN22 and A1BG and none between ZNF497 and A1BG, which agrees with the results of the computer programs.
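A text search of this kind amounts to scanning the sequence for every exact match of the motif. The following is a minimal Python sketch of such a scan; the function name `find_all`, the toy sequence, and the 1-based start-position convention are assumptions, since the source does not state how the Basic programs number positions.

```python
def find_all(sequence, motif):
    """Return the 1-based start position of every match, including overlapping ones."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)              # convert 0-based index to 1-based
        start = sequence.find(motif, start + 1)  # resume one base further along
    return positions

# Made-up toy sequence for illustration only; it is not taken from the A1BG region.
toy = "AATGACGTATCCTGACGTATGG"
print(find_all(toy, "TGACGTAT"))
print(find_all(toy, "TGACGTTA"))   # a motif that is absent gives an empty list
```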

The Basic programs (SuccessablesCAbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The searches, the sequences looked for, and the numbers found are:

  1. negative strand, negative direction, looking for TGACGTAT[4], 0.
  2. negative strand, positive direction, looking for TGACGTAT, 0.
  3. positive strand, negative direction, looking for TGACGTAT, 0.
  4. positive strand, positive direction, looking for TGACGTAT, 0.
  5. complement, negative strand, negative direction, looking for ACTGCATA, 0.
  6. complement, negative strand, positive direction, looking for ACTGCATA, 0.
  7. complement, positive strand, negative direction, looking for ACTGCATA, 0.
  8. complement, positive strand, positive direction, looking for ACTGCATA, 0.
  9. inverse complement, negative strand, negative direction, looking for ATACGTCA, 0.
  10. inverse complement, negative strand, positive direction, looking for ATACGTCA, 0.
  11. inverse complement, positive strand, negative direction, looking for ATACGTCA, 0.
  12. inverse complement, positive strand, positive direction, looking for ATACGTCA, 0.
  13. inverse, negative strand, negative direction, looking for TATGCAGT, 0.
  14. inverse, negative strand, positive direction, looking for TATGCAGT, 0.
  15. inverse, positive strand, negative direction, looking for TATGCAGT, 0.
  16. inverse, positive strand, positive direction, looking for TATGCAGT, 0.

CAbox random dataset samplings

  1. CAboxr0: 0.
  2. CAboxr1: 0.
  3. CAboxr2: 0.
  4. CAboxr3: 0.
  5. CAboxr4: 0.
  6. CAboxr5: 0.
  7. CAboxr6: 0.
  8. CAboxr7: 0.
  9. CAboxr8: 0.
  10. CAboxr9: 0.
  11. CAboxr0ci: 0.
  12. CAboxr1ci: 0.
  13. CAboxr2ci: 0.
  14. CAboxr3ci: 0.
  15. CAboxr4ci: 1, ATACGTCA at 901.
  16. CAboxr5ci: 0.
  17. CAboxr6ci: 1, ATACGTCA at 838.
  18. CAboxr7ci: 0.
  19. CAboxr8ci: 0.
  20. CAboxr9ci: 0.

CAboxr distal promoters

  1. CAboxr4ci: 1, ATACGTCA at 901.
  2. CAboxr6ci: 1, ATACGTCA at 838.

Hybrid C, G box samplings

Copying the consensus sequence for the hybrid C, G box, TGACGTGT, into the browser's find field ("⌘F") finds none located between ZSCAN22 and A1BG and none between ZNF497 and A1BG, which agrees with the computer programs' searches for the direct sequence.

The Basic programs (SuccessablesCGbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The searches, the sequences looked for, and the numbers found are:

  1. negative strand, negative direction, looking for TGACGTGT[4], 0.
  2. negative strand, positive direction, looking for TGACGTGT, 0.
  3. positive strand, negative direction, looking for TGACGTGT, 0.
  4. positive strand, positive direction, looking for TGACGTGT, 0.
  5. complement, negative strand, negative direction, looking for ACTGCACA, 0.
  6. complement, negative strand, positive direction, looking for ACTGCACA, 0.
  7. complement, positive strand, negative direction, looking for ACTGCACA, 0.
  8. complement, positive strand, positive direction, looking for ACTGCACA, 0.
  9. inverse complement, negative strand, negative direction, looking for ACACGTCA, 0.
  10. inverse complement, negative strand, positive direction, looking for ACACGTCA, 0.
  11. inverse complement, positive strand, negative direction, looking for ACACGTCA, 0.
  12. inverse complement, positive strand, positive direction, looking for ACACGTCA, 1, ACACGTCA at 3962.
  13. inverse, negative strand, negative direction, looking for TGTGCAGT, 0.
  14. inverse, negative strand, positive direction, looking for TGTGCAGT, 1, TGTGCAGT at 3962.
  15. inverse, positive strand, negative direction, looking for TGTGCAGT, 0.
  16. inverse, positive strand, positive direction, looking for TGTGCAGT, 0.

CGbox positive direction (4050-1) distal promoters

  1. Positive strand, positive direction: ACACGTCA at 3962.

CGbox random dataset samplings

  1. CGboxr0: 0.
  2. CGboxr1: 0.
  3. CGboxr2: 0.
  4. CGboxr3: 0.
  5. CGboxr4: 0.
  6. CGboxr5: 0.
  7. CGboxr6: 0.
  8. CGboxr7: 0.
  9. CGboxr8: 1, TGACGTGT at 915.
  10. CGboxr9: 0.
  11. CGboxr0ci: 0.
  12. CGboxr1ci: 0.
  13. CGboxr2ci: 0.
  14. CGboxr3ci: 0.
  15. CGboxr4ci: 0.
  16. CGboxr5ci: 0.
  17. CGboxr6ci: 0.
  18. CGboxr7ci: 0.
  19. CGboxr8ci: 0.
  20. CGboxr9ci: 0.

CGboxr arbitrary negative direction (evens) (2596-1) distal promoters

  1. CGboxr8: TGACGTGT at 915.

CGboxr alternate positive direction (evens) (4050-1) distal promoters

  1. CGboxr8: TGACGTGT at 915.

CGbox analysis and results

The hybrid C/G box sequence searched for is TGACGTGT.[4]

| Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
|---|---|---|---|---|---|---|
| Reals | UTR | negative | 0 | 2 | 0 | 0 |
| Randoms | UTR | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | UTR | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Core | negative | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Core | positive | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Proximal | negative | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Proximal | positive | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Proximal | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Distal | negative | 0 | 2 | 0 | 0 |
| Randoms | Distal | arbitrary negative | 1 | 10 | 0.1 | 0.05 |
| Randoms | Distal | alternate negative | 0 | 10 | 0 | 0.05 |
| Reals | Distal | positive | 1 | 2 | 0.5 | 0.5 |
| Randoms | Distal | arbitrary positive | 0 | 10 | 0 | 0.05 |
| Randoms | Distal | alternate positive | 1 | 10 | 0.1 | 0.05 |

Comparison:

The only real CG box, in the distal promoters in the positive direction, gives an occurrence (0.5) greater than the random average (0.05). This suggests that the real CG box is likely active or activable.
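The comparison can be made explicit by checking, region by region, where the real occurrence exceeds the random average. The sketch below is a Python illustration using the values from the CG box table; the function name and the `margin` parameter are hypothetical, and whether the "± 0.1" in the table header is meant as such a margin is not stated in the source.

```python
# Values copied from the CG box table; keys are (promoter region, direction).
real_occurrence = {
    ("UTR", "negative"): 0.0,
    ("Core", "negative"): 0.0,
    ("Core", "positive"): 0.0,
    ("Proximal", "negative"): 0.0,
    ("Proximal", "positive"): 0.0,
    ("Distal", "negative"): 0.0,
    ("Distal", "positive"): 0.5,
}
random_average = {
    ("UTR", "negative"): 0.0,
    ("Core", "negative"): 0.0,
    ("Core", "positive"): 0.0,
    ("Proximal", "negative"): 0.0,
    ("Proximal", "positive"): 0.0,
    ("Distal", "negative"): 0.05,
    ("Distal", "positive"): 0.05,
}

def regions_above_random(reals, randoms, margin=0.0):
    """List the regions where the real occurrence exceeds the random average by more than margin."""
    return [key for key, value in reals.items() if value > randoms[key] + margin]

print(regions_above_random(real_occurrence, random_average))
print(regions_above_random(real_occurrence, random_average, margin=0.1))
```

Both calls return only the distal promoters in the positive direction; in the negative-direction distal promoters the random average is the larger value.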

Hybrid C, T box samplings

Copying the consensus sequence for the hybrid C, T box, TGACGTTA, into the browser's find field ("⌘F") finds none located between ZSCAN22 and A1BG and none between ZNF497 and A1BG, which agrees with the results of the computer programs.

The Basic programs (SuccessablesCTbox.bas) compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The searches, the sequences looked for, and the numbers found are:

  1. negative strand, negative direction, looking for TGACGTTA[4], 0.
  2. negative strand, positive direction, looking for TGACGTTA, 0.
  3. positive strand, negative direction, looking for TGACGTTA, 0.
  4. positive strand, positive direction, looking for TGACGTTA, 0.
  5. complement, negative strand, negative direction, looking for ACTGCAAT, 0.
  6. complement, negative strand, positive direction, looking for ACTGCAAT, 0.
  7. complement, positive strand, negative direction, looking for ACTGCAAT, 0.
  8. complement, positive strand, positive direction, looking for ACTGCAAT, 0.
  9. inverse complement, negative strand, negative direction, looking for TAACGTCA, 0.
  10. inverse complement, negative strand, positive direction, looking for TAACGTCA, 0.
  11. inverse complement, positive strand, negative direction, looking for TAACGTCA, 0.
  12. inverse complement, positive strand, positive direction, looking for TAACGTCA, 0.
  13. inverse, negative strand, negative direction, looking for ATTGCAGT, 0.
  14. inverse, negative strand, positive direction, looking for ATTGCAGT, 0.
  15. inverse, positive strand, negative direction, looking for ATTGCAGT, 0.
  16. inverse, positive strand, positive direction, looking for ATTGCAGT, 0.

CTbox random dataset samplings

  1. CTboxr0: 0.
  2. CTboxr1: 0.
  3. CTboxr2: 0.
  4. CTboxr3: 0.
  5. CTboxr4: 0.
  6. CTboxr5: 0.
  7. CTboxr6: 0.
  8. CTboxr7: 0.
  9. CTboxr8: 0.
  10. CTboxr9: 0.
  11. CTboxr0ci: 0.
  12. CTboxr1ci: 0.
  13. CTboxr2ci: 0.
  14. CTboxr3ci: 0.
  15. CTboxr4ci: 1, TAACGTCA at 2405.
  16. CTboxr5ci: 0.
  17. CTboxr6ci: 1, TAACGTCA at 1638.
  18. CTboxr7ci: 0.
  19. CTboxr8ci: 0.
  20. CTboxr9ci: 0.

CTboxr distal promoters

  1. CTboxr4ci: 1, TAACGTCA at 2405.
  2. CTboxr6ci: 1, TAACGTCA at 1638.

Song C box hybrids analysis and results

The real promoters have no hybrid C/A boxes, and the random datasets have only two, both in the negative direction, for an occurrence of 0.2.

The real promoters have only one hybrid C/G box, ACACGTCA at 3962, on the positive strand in the positive direction in the distal promoter, for an occurrence of 0.5. The random datasets have only one C/G box, TGACGTGT at 915, in the arbitrary negative direction in the distal promoter, for an occurrence of 0.1.

This suggests that the one hybrid C/G box is likely active or activable.

The real promoters have no hybrid C/T boxes, and the random datasets have two, both in the negative direction, for an occurrence of 0.2.
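The hybrid boxes compared here share the TGACGT stem and differ only in the base that follows it: C in the CRE-type C-box (TGACGTCA), G in the C/G box (TGACGTGT), A in the C/A box (TGACGTAT), and T in the C/T box (TGACGTTA). The Python sketch below classifies a site on that basis; it is an illustration with hypothetical function names, not a procedure taken from Song et al.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SUBCLASS = {"C": "C-box", "G": "C/G-box", "A": "C/A-box", "T": "C/T-box"}

def inverse_complement(site):
    """Complement each base, then reverse the result."""
    return site.translate(COMPLEMENT)[::-1]

def classify(site):
    """Name the subclass from the base that follows TGACGT, or None if it cannot be read."""
    start = site.find("TGACGT")
    if start == -1 or start + 6 >= len(site):
        return None                      # no TGACGT stem, or no base after it
    return SUBCLASS.get(site[start + 6])

for site in ("TGACGTCA", "TGACGTGT", "TGACGTAT", "TGACGTTA", "TGACTCA"):
    print(site, classify(site))

# The real hit ACACGTCA at 3962 is the inverse complement of TGACGTGT.
print(classify(inverse_complement("ACACGTCA")))
```

The last line shows why the one real hit counts as a hybrid C/G box even though it was found with the inverse complement search string.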

Acknowledgements

The content on this page was first contributed by: Henry A. Hoff.

Initial content for this page in some instances came from Wikiversity.

References

  1. PA Johnson, D Bunick, NB Hecht (1991). "Protein Binding Regions in the Mouse and Rat Protamine-2 Genes" (PDF). Biology of Reproduction. 44 (1): 127–134. Retrieved 6 April 2019.
  2. Dmitry A. Samarsky, Maurille J. Fournier, Robert H. Singer and Edouard Bertrand (1 July 1998). "The snoRNA box C/D motif directs nucleolar targeting and also couples snoRNA synthesis and localization" (PDF). The European Molecular Biology Organization (EMBO) Journal. 17 (13): 3747–3757. doi:10.1093/emboj/17.13.3747. PMID 9649444. Retrieved 4 February 2017.
  3. E. N. Voronina, T. D. Kolokol’tsova, E. A. Nechaeva, and M. L. Filipenko (2003). "Structural–Functional Analysis of the Human Gene for Ribosomal Protein L11" (PDF). Molecular Biology. 37 (3): 362–371. Retrieved 11 April 2019.
  4. Young Hun Song, Cheol Min Yoo, An Pio Hong, Seong Hee Kim, Hee Jeong Jeong, Su Young Shin, Hye Jin Kim, Dae-Jin Yun, Chae Oh Lim, Jeong Dong Bahk, Sang Yeol Lee, Ron T. Nagao, Joe L. Key, and Jong Chan Hong (April 2008). "DNA-Binding Study Identifies C-Box and Hybrid C/G-Box or C/A-Box Motifs as High-Affinity Binding Sites for STF1 and LONG HYPOCOTYL5 Proteins" (PDF). Plant Physiology. 146 (4): 1862–1877. doi:10.1104/pp.107.113217. Retrieved 26 March 2019.

External links