D box gene transcriptions
Associate Editor(s)-in-Chief: Henry A. Hoff
For "box C/D snoRNAs, boxes C and D and an adjoining stem form a vital structure, known as the box C/D motif."[1]
In snoRNA U73, shown in the image on the right, the D box read from the right side is AGUCY; read in the 5' to 3' direction, it is YCUGA.
Degenerate nucleotides
For transcription, U (in RNA) is read as T (in DNA); Y = (C or T) and R = (A or G).
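As a minimal sketch, and not the author's Basic programs, the substitution rules above can be written as a function that converts an RNA consensus to DNA and expands the degenerate letters Y and R into regular-expression character classes. The function name `consensus_to_regex` is hypothetical.

```python
import re

# Degenerate nucleotide letters used in this article (IUPAC codes).
DEGENERATE = {"Y": "[CT]", "R": "[AG]"}


def consensus_to_regex(consensus: str) -> str:
    """Convert an RNA or DNA consensus into a DNA regular expression.

    U (RNA) is read as T (DNA); Y and R become character classes.
    """
    parts = []
    for base in consensus.upper().replace("U", "T"):
        parts.append(DEGENERATE.get(base, base))
    return "".join(parts)


pattern = consensus_to_regex("AGUCY")
print(pattern)                               # AGTC[CT]
print(bool(re.fullmatch(pattern, "AGTCT")))  # True
```

With this approach, the U73 D box AGUCY matches both AGTCC and AGTCT in DNA, but it does not match AGTCA.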
Consensus sequences
Shown in the image on the right is the D box (3'-AGUCUG-5'). Substituting T for U yields D box = 3'-AGTCTG-5' in the transcription direction on the template strand.
"Members of the box C/D snoRNA family, which are the subject of the present report, possess characteristic sequence elements known as box C (UGAUGA) and box D (GUCUGA)."[1]
D-box (TGAGTGG).[2]
Hypotheses
- The D boxes are not involved in the transcription of A1BG.
- The promoters of A1BG do not contain a Samarsky D box.
Dbox (Samarsky) samplings
For the Basic programs (starting with SuccessablesDbox.bas), written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in either the negative direction (-) or the positive direction (+), the strands and directions searched, the sequences looked for, and the results are:
- Negative strand, negative direction: AGTCTG at 2947.
- Negative strand, positive direction: AGTCTG at 3923.
- Positive strand, negative direction: AGTCTG at 1355.
- Positive strand, positive direction: 0.
- inverse complement, negative strand, negative direction: 0.
- inverse complement, negative strand, positive direction: CAGACT at 1744, CAGACT at 2416.
- inverse complement, positive strand, negative direction: CAGACT at 15, CAGACT at 1616.
- inverse complement, positive strand, positive direction: CAGACT at 2943, CAGACT at 3006, CAGACT at 3924.
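This article does not describe how the Basic programs number positions or handle the two directions. The sketch below therefore makes two assumptions. First, it reports positions as 1-based offsets from the start of a single input string. Second, it represents a "direction" only by the string being searched. For each string, it looks for a motif and for that motif's inverse complement, such as AGTCTG and CAGACT. The function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the inverse (reverse) complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[b] for b in reversed(seq))


def find_all(seq: str, motif: str) -> list[int]:
    """Return every 1-based start position of motif in seq, overlaps included."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start + 1)               # report 1-based positions
        start = seq.find(motif, start + 1)   # resume one base later
    return hits


def sample(seq: str, motif: str) -> dict[str, list[int]]:
    """Search one string for a motif and for its inverse complement."""
    ic = reverse_complement(motif)
    return {motif: find_all(seq, motif), ic: find_all(seq, ic)}


# Toy sequence for illustration only; it is not the A1BG region.
toy = "GGAGTCTGTTCAGACTAAAGTCTG"
print(sample(toy, "AGTCTG"))  # {'AGTCTG': [3, 19], 'CAGACT': [11]}
```

The same `reverse_complement` call turns TCCTG into CAGGA and TGAGTGG into CCACTCA. These are the inverse complements searched for in the Voronina and Motojima sections.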
DboxS (4560-2846) UTRs
- Negative strand, negative direction: AGTCTG at 2947.
DboxS negative direction (2596-1) distal promoters
- Positive strand, negative direction: CAGACT at 1616, AGTCTG at 1355, CAGACT at 15.
DboxS positive direction (4050-1) distal promoters
- Negative strand, positive direction: AGTCTG at 3923, CAGACT at 2416, CAGACT at 1744.
- Positive strand, positive direction: CAGACT at 3924, CAGACT at 3006, CAGACT at 2943.
Samarsky random dataset samplings
- Dboxr0: 1, AGTCTG at 4073.
- Dboxr1: 0.
- Dboxr2: 0.
- Dboxr3: 1, AGTCTG at 1984.
- Dboxr4: 0.
- Dboxr5: 1, AGTCTG at 2334.
- Dboxr6: 1, AGTCTG at 804.
- Dboxr7: 0.
- Dboxr8: 1, AGTCTG at 587.
- Dboxr9: 4, AGTCTG at 3816, AGTCTG at 1207, AGTCTG at 111, AGTCTG at 36.
- Dboxr0ci: 1, CAGACT at 1616.
- Dboxr1ci: 1, CAGACT at 1754.
- Dboxr2ci: 1, CAGACT at 355.
- Dboxr3ci: 0.
- Dboxr4ci: 0.
- Dboxr5ci: 0.
- Dboxr6ci: 0.
- Dboxr7ci: 0.
- Dboxr8ci: 0.
- Dboxr9ci: 0.
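This section does not state how the random datasets were generated. As a rough baseline, assume each base is independent and equally likely to be A, C, G or T. Under that assumption, a fixed motif of k nucleotides is expected to occur (L − k + 1)/4^k times in a sequence of length L. The sketch takes L = 4560, the largest coordinate used in this article; that value is an assumption, and so is the name `expected_hits`.

```python
def expected_hits(length: int, k: int, strands: int = 1) -> float:
    """Expected number of exact matches of one fixed k-mer in random DNA.

    Assumes independent bases, each A, C, G or T with probability 1/4; the
    article's random datasets may have been generated differently.
    """
    positions = length - k + 1          # possible start positions per strand
    return strands * positions / 4 ** k


SEQ_LENGTH = 4560  # assumption: largest coordinate used in this article
for motif in ("AGTCTG", "TCCTG", "TGAGTGG"):
    print(motif, round(expected_hits(SEQ_LENGTH, len(motif)), 2))
```

Under these assumptions, each strand searched is expected to contain about 1.11 matches to AGTCTG, about 4.45 to TCCTG and about 0.28 to TGAGTGG.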
DboxSr arbitrary (evens) (4560-2846) UTRs
- Dboxr0: AGTCTG at 4073.
DboxSr alternate (odds) (4560-2846) UTRs
- Dboxr9: AGTCTG at 3816.
DboxSr alternate positive direction (evens) (4265-4050) proximal promoters
- Dboxr0: AGTCTG at 4073.
DboxSr arbitrary negative direction (evens) (2596-1) distal promoters
- Dboxr6: AGTCTG at 804.
- Dboxr8: AGTCTG at 587.
- Dboxr0ci: CAGACT at 1616.
- Dboxr2ci: CAGACT at 355.
DboxSr alternate negative direction (odds) (2596-1) distal promoters
- Dboxr3: AGTCTG at 1984.
- Dboxr5: AGTCTG at 2334.
- Dboxr9: AGTCTG at 1207, AGTCTG at 111, AGTCTG at 36.
- Dboxr1ci: CAGACT at 1754.
DboxSr arbitrary positive direction (odds) (4050-1) distal promoters
- Dboxr3: AGTCTG at 1984.
- Dboxr5: AGTCTG at 2334.
- Dboxr9: AGTCTG at 3816, AGTCTG at 1207, AGTCTG at 111, AGTCTG at 36.
- Dboxr1ci: CAGACT at 1754.
DboxSr alternate positive direction (evens) (4050-1) distal promoters
- Dboxr6: AGTCTG at 804.
- Dboxr8: AGTCTG at 587.
- Dboxr0ci: CAGACT at 1616.
- Dboxr2ci: CAGACT at 355.
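The lists above assign each hit to a region by its coordinate and its direction. The sketch below shows that assignment, using only the ranges stated in this article's headings. These headings give no range for the negative-direction core promoter, so the sketch leaves it out. The article also does not say which region owns a shared boundary, such as 2596, so a boundary position is reported in both regions. The names `REGIONS` and `classify` are hypothetical.

```python
# Region ranges (name, upper, lower) as given in this article's headings.
# The negative-direction core promoter range is not stated, so it is omitted.
REGIONS = {
    "negative": [("UTR", 4560, 2846), ("proximal", 2811, 2596), ("distal", 2596, 1)],
    "positive": [("core", 4445, 4265), ("proximal", 4265, 4050), ("distal", 4050, 1)],
}


def classify(position: int, direction: str) -> list[str]:
    """Return every region whose inclusive range contains position."""
    return [name for name, upper, lower in REGIONS[direction]
            if lower <= position <= upper]


print(classify(2947, "negative"))  # ['UTR']
print(classify(4073, "positive"))  # ['proximal']
```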
Dbox (Samarsky) analysis and results
The D box (AGUCUG) is determined by substituting T for U to yield D box = AGTCTG in the transcription direction.[1]
Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
---|---|---|---|---|---|---|
Reals | UTR | negative | 1 | 2 | 0.5 | 0.5 |
Randoms | UTR | arbitrary negative | 1 | 10 | 0.1 | 0.1 |
Randoms | UTR | alternate negative | 1 | 10 | 0.1 | 0.1 |
Reals | Core | negative | 0 | 2 | 0 | 0 |
Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
Reals | Core | positive | 0 | 2 | 0 | 0 |
Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0 |
Randoms | Core | alternate positive | 0 | 10 | 0 | 0 |
Reals | Proximal | negative | 0 | 2 | 0 | 0 |
Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0 |
Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0 |
Reals | Proximal | positive | 0 | 2 | 0 | 0 |
Randoms | Proximal | arbitrary positive | 0 | 10 | 0 | 0.05 |
Randoms | Proximal | alternate positive | 1 | 10 | 0.1 | 0.05 |
Reals | Distal | negative | 3 | 2 | 1.5 | 1.5 |
Randoms | Distal | arbitrary negative | 4 | 10 | 0.4 | 0.5 |
Randoms | Distal | alternate negative | 6 | 10 | 0.6 | 0.5 |
Reals | Distal | positive | 6 | 2 | 3 | 3 |
Randoms | Distal | arbitrary positive | 7 | 10 | 0.7 | 0.55 |
Randoms | Distal | alternate positive | 4 | 10 | 0.4 | 0.55 |
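The table does not state how its last two columns are computed, but the rows suggest the following rules. The Occurrences column appears to equal Numbers divided by Strands. Each random Averages entry appears to be the mean of the arbitrary and alternate rows. For the distal positive randoms, for example, (0.7 + 0.4)/2 = 0.55. The sketch below applies that reading; the function names are hypothetical.

```python
def occurrence(numbers: int, strands: int) -> float:
    """Occurrences column: hits divided by the number of strands searched."""
    return numbers / strands


def random_average(arbitrary: float, alternate: float) -> float:
    """Averages column for randoms: mean of the arbitrary and alternate rows."""
    return (arbitrary + alternate) / 2


# Distal, positive-direction rows from the table above.
real = occurrence(6, 2)    # reals: 6 hits over 2 strands
arb = occurrence(7, 10)    # randoms, arbitrary: 7 hits over 10 datasets
alt = occurrence(4, 10)    # randoms, alternate: 4 hits over 10 datasets
print(real, round(random_average(arb, alt), 2))  # 3.0 0.55
```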
Comparison:
The occurrences of real DboxSs are greater than those of the randoms. This suggests that the real DboxSs are likely active or activable.
D boxes
The human ribosomal protein L11 gene (HRPL11) has a potential snRNA-coding sequence in intron 4: a D box beginning at +4237 (TCCTG).[3]
D box (Voronina) samplings
For the Basic programs testing consensus sequence TCCTG (starting with SuccessablesAAA.bas), written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in either the negative direction (-) or the positive direction (+), the strands and directions searched, the sequences looked for, and the results are:
- negative strand, negative direction, looking for TCCTG, 4, TCCTG at 4467, TCCTG at 3755, TCCTG at 3639, TCCTG at 3388, and complements.
- negative strand, positive direction, looking for TCCTG, 10, TCCTG at 4408, TCCTG at 4185, TCCTG at 3621, TCCTG at 3295, TCCTG at 2519, TCCTG at 2500, TCCTG at 2210, TCCTG at 1775, TCCTG at 1117, TCCTG at 143, and complements.
- positive strand, negative direction, looking for TCCTG, 5, TCCTG at 4545, TCCTG at 3905, TCCTG at 1910, TCCTG at 1840, TCCTG at 595, and complements.
- positive strand, positive direction, looking for TCCTG, 4, TCCTG at 4251, TCCTG at 3130, TCCTG at 2459, TCCTG at 1669, and complements.
- inverse complement, negative strand, negative direction, looking for CAGGA, 0.
- inverse complement, negative strand, positive direction, looking for CAGGA, 7, CAGGA at 3869, CAGGA at 3572, CAGGA at 3129, CAGGA at 2746, CAGGA at 2621, CAGGA at 708, CAGGA at 425, and complements.
- inverse complement, positive strand, negative direction, looking for CAGGA, 23, CAGGA at 4437, CAGGA at 4283, CAGGA at 4171, CAGGA at 4139, CAGGA at 3250, CAGGA at 3218, CAGGA at 3111, CAGGA at 2690, CAGGA at 2588, CAGGA at 2368, CAGGA at 2251, CAGGA at 2135, CAGGA at 1942, CAGGA at 1824, CAGGA at 1289, CAGGA at 1276, CAGGA at 998, CAGGA at 985, CAGGA at 851, CAGGA at 832, CAGGA at 715, CAGGA at 579, CAGGA at 442, and complements.
- inverse complement, positive strand, positive direction, looking for CAGGA, 5, CAGGA at 3864, CAGGA at 3620, CAGGA at 2999, CAGGA at 758, CAGGA at 219, and complements.
DboxV (4560-2846) UTRs
- Negative strand, negative direction: TCCTG at 4467, TCCTG at 3755, TCCTG at 3639, TCCTG at 3388.
- Positive strand, negative direction: TCCTG at 4545, CAGGA at 4437, CAGGA at 4283, CAGGA at 4171, CAGGA at 4139, TCCTG at 3905, CAGGA at 3250, CAGGA at 3218, CAGGA at 3111.
DboxV positive direction (4445-4265) core promoters
- Negative strand, positive direction: TCCTG at 4408.
- Positive strand, positive direction: TCCTG at 4251, TCCTG at 3130, TCCTG at 2459, TCCTG at 1669.
DboxV negative direction (2811-2596) proximal promoters
- Positive strand, negative direction: CAGGA at 2690.
DboxV positive direction (4265-4050) proximal promoters
- Negative strand, positive direction: TCCTG at 4185.
- Positive strand, positive direction: TCCTG at 4251.
DboxV negative direction (2596-1) distal promoters
- Positive strand, negative direction: CAGGA at 2588, CAGGA at 2368, CAGGA at 2251, CAGGA at 2135, CAGGA at 1942, TCCTG at 1910, TCCTG at 1840, CAGGA at 1824, CAGGA at 1289, CAGGA at 1276, CAGGA at 998, CAGGA at 985, CAGGA at 851, CAGGA at 832, CAGGA at 715, TCCTG at 595, CAGGA at 579, CAGGA at 442.
DboxV positive direction (4050-1) distal promoters
- Negative strand, positive direction: TCCTG at 3621, TCCTG at 3295, TCCTG at 2519, TCCTG at 2500, TCCTG at 2210, TCCTG at 1775, TCCTG at 1117, TCCTG at 143.
- Negative strand, positive direction: CAGGA at 3869, CAGGA at 3572, CAGGA at 3129, CAGGA at 2746, CAGGA at 2621, CAGGA at 708, CAGGA at 425.
- Positive strand, positive direction: TCCTG at 3130, TCCTG at 2459, TCCTG at 1669.
- Positive strand, positive direction: CAGGA at 3864, CAGGA at 3620, CAGGA at 2999, CAGGA at 758, CAGGA at 219.
D box (Voronina) random dataset samplings
- DVor0: 8, TCCTG at 4018, TCCTG at 2252, TCCTG at 1914, TCCTG at 1550, TCCTG at 1008, TCCTG at 513, TCCTG at 348, TCCTG at 159.
- DVor1: 3, TCCTG at 1801, TCCTG at 1388, TCCTG at 1188.
- DVor2: 4, TCCTG at 2878, TCCTG at 1402, TCCTG at 1203, TCCTG at 724.
- DVor3: 4, TCCTG at 2931, TCCTG at 2508, TCCTG at 2127, TCCTG at 349.
- DVor4: 7, TCCTG at 3918, TCCTG at 3821, TCCTG at 3321, TCCTG at 2668, TCCTG at 2622, TCCTG at 1919, TCCTG at 800.
- DVor5: 3, TCCTG at 4160, TCCTG at 1116, TCCTG at 864.
- DVor6: 5, TCCTG at 4466, TCCTG at 4013, TCCTG at 3240, TCCTG at 3184, TCCTG at 946.
- DVor7: 10, TCCTG at 4133, TCCTG at 4128, TCCTG at 2878, TCCTG at 2785, TCCTG at 2098, TCCTG at 2053, TCCTG at 1578, TCCTG at 1215, TCCTG at 748, TCCTG at 627.
- DVor8: 3, TCCTG at 3448, TCCTG at 3429, TCCTG at 1014.
- DVor9: 2, TCCTG at 1587, TCCTG at 373.
- DVor0ci: 4, CAGGA at 4312, CAGGA at 3138, CAGGA at 1483, CAGGA at 1403.
- DVor1ci: 7, CAGGA at 4309, CAGGA at 3531, CAGGA at 3275, CAGGA at 3139, CAGGA at 2739, CAGGA at 645, CAGGA at 381.
- DVor2ci: 5, CAGGA at 2328, CAGGA at 1600, CAGGA at 985, CAGGA at 574, CAGGA at 492.
- DVor3ci: 5, CAGGA at 2408, CAGGA at 2253, CAGGA at 1525, CAGGA at 1344, CAGGA at 1272.
- DVor4ci: 1, CAGGA at 857.
- DVor5ci: 1, CAGGA at 784.
- DVor6ci: 8, CAGGA at 4256, CAGGA at 4168, CAGGA at 3987, CAGGA at 3260, CAGGA at 2705, CAGGA at 2593, CAGGA at 1223, CAGGA at 419.
- DVor7ci: 2, CAGGA at 4062, CAGGA at 2793.
- DVor8ci: 4, CAGGA at 2161, CAGGA at 868, CAGGA at 372, CAGGA at 22.
- DVor9ci: 2, CAGGA at 3716, CAGGA at 3370.
DboxVr arbitrary (evens) (4560-2846) UTRs
- DVor0: TCCTG at 4018.
- DVor2: TCCTG at 2878.
- DVor4: TCCTG at 3918, TCCTG at 3821, TCCTG at 3321.
- DVor6: TCCTG at 4466, TCCTG at 4013, TCCTG at 3240, TCCTG at 3184.
- DVor8: TCCTG at 3448, TCCTG at 3429.
- DVor0ci: CAGGA at 4312, CAGGA at 3138.
- DVor6ci: CAGGA at 4256, CAGGA at 4168, CAGGA at 3987, CAGGA at 3260.
DboxVr alternate (odds) (4560-2846) UTRs
- DVor3: TCCTG at 2931.
- DVor5: TCCTG at 4160.
- DVor7: TCCTG at 4133, TCCTG at 4128, TCCTG at 2878.
- DVor1ci: CAGGA at 4309, CAGGA at 3531, CAGGA at 3275, CAGGA at 3139.
- DVor7ci: CAGGA at 4062.
- DVor9ci: CAGGA at 3716, CAGGA at 3370.
DboxVr arbitrary positive direction (odds) (4445-4265) core promoters
- DVor1ci: CAGGA at 4309.
DboxVr alternate positive direction (evens) (4445-4265) core promoters
- DVor0ci: CAGGA at 4312.
DboxVr arbitrary negative direction (evens) (2811-2596) proximal promoters
- DVor4: TCCTG at 2668, TCCTG at 2622.
- DVor6ci: CAGGA at 2705.
DboxVr alternate negative direction (odds) (2811-2596) proximal promoters
- DVor7: TCCTG at 2785.
- DVor1ci: CAGGA at 2739.
- DVor7ci: CAGGA at 2793.
DboxVr arbitrary positive direction (odds) (4265-4050) proximal promoters
- DVor5: TCCTG at 4160.
- DVor7: TCCTG at 4133, TCCTG at 4128.
- DVor7ci: CAGGA at 4062.
DboxVr alternate positive direction (evens) (4265-4050) proximal promoters
- DVor6ci: CAGGA at 4256, CAGGA at 4168.
DboxVr arbitrary negative direction (evens) (2596-1) distal promoters
- DVor0: TCCTG at 2252, TCCTG at 1914, TCCTG at 1550, TCCTG at 1008, TCCTG at 513, TCCTG at 348, TCCTG at 159.
- DVor2: TCCTG at 1402, TCCTG at 1203, TCCTG at 724.
- DVor4: TCCTG at 1919, TCCTG at 800.
- DVor6: TCCTG at 946.
- DVor8: TCCTG at 1014.
- DVor0ci: CAGGA at 1483, CAGGA at 1403.
- DVor2ci: CAGGA at 2328, CAGGA at 1600, CAGGA at 985, CAGGA at 574, CAGGA at 492.
- DVor4ci: CAGGA at 857.
- DVor6ci: CAGGA at 2593, CAGGA at 1223, CAGGA at 419.
- DVor8ci: CAGGA at 2161, CAGGA at 868, CAGGA at 372, CAGGA at 22.
DboxVr alternate negative direction (odds) (2596-1) distal promoters
- DVor1: TCCTG at 1801, TCCTG at 1388, TCCTG at 1188.
- DVor3: TCCTG at 2508, TCCTG at 2127, TCCTG at 349.
- DVor5: TCCTG at 1116, TCCTG at 864.
- DVor7: TCCTG at 2098, TCCTG at 2053, TCCTG at 1578, TCCTG at 1215, TCCTG at 748, TCCTG at 627.
- DVor9: TCCTG at 1587, TCCTG at 373.
- DVor1ci: CAGGA at 645, CAGGA at 381.
- DVor3ci: CAGGA at 2408, CAGGA at 2253, CAGGA at 1525, CAGGA at 1344, CAGGA at 1272.
- DVor5ci: CAGGA at 784.
DboxVr arbitrary positive direction (odds) (4050-1) distal promoters
- DVor1: TCCTG at 1801, TCCTG at 1388, TCCTG at 1188.
- DVor3: TCCTG at 2931, TCCTG at 2508, TCCTG at 2127, TCCTG at 349.
- DVor5: TCCTG at 1116, TCCTG at 864.
- DVor7: TCCTG at 2878, TCCTG at 2785, TCCTG at 2098, TCCTG at 2053, TCCTG at 1578, TCCTG at 1215, TCCTG at 748, TCCTG at 627.
- DVor9: TCCTG at 1587, TCCTG at 373.
- DVor1ci: CAGGA at 3531, CAGGA at 3275, CAGGA at 3139, CAGGA at 2739, CAGGA at 645, CAGGA at 381.
- DVor3ci: CAGGA at 2408, CAGGA at 2253, CAGGA at 1525, CAGGA at 1344, CAGGA at 1272.
- DVor5ci: CAGGA at 784.
- DVor7ci: CAGGA at 2793.
- DVor9ci: CAGGA at 3716, CAGGA at 3370.
DboxVr alternate positive direction (evens) (4050-1) distal promoters
- DVor0: TCCTG at 4018, TCCTG at 2252, TCCTG at 1914, TCCTG at 1550, TCCTG at 1008, TCCTG at 513, TCCTG at 348, TCCTG at 159.
- DVor2: TCCTG at 2878, TCCTG at 1402, TCCTG at 1203, TCCTG at 724.
- DVor4: TCCTG at 3918, TCCTG at 3821, TCCTG at 3321, TCCTG at 2668, TCCTG at 2622, TCCTG at 1919, TCCTG at 800.
- DVor6: TCCTG at 4013, TCCTG at 3240, TCCTG at 3184, TCCTG at 946.
- DVor8: TCCTG at 3448, TCCTG at 3429, TCCTG at 1014.
- DVor0ci: CAGGA at 3138, CAGGA at 1483, CAGGA at 1403.
- DVor2ci: CAGGA at 2328, CAGGA at 1600, CAGGA at 985, CAGGA at 574, CAGGA at 492.
- DVor4ci: CAGGA at 857.
- DVor6ci: CAGGA at 3987, CAGGA at 3260, CAGGA at 2705, CAGGA at 2593, CAGGA at 1223, CAGGA at 419.
- DVor8ci: CAGGA at 2161, CAGGA at 868, CAGGA at 372, CAGGA at 22.
DboxV analysis and results
The human ribosomal protein L11 gene (HRPL11) has a potential snRNA-coding sequence in intron 4: a D box beginning at +4237 (TCCTG).[3]
Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
---|---|---|---|---|---|---|
Reals | UTR | negative | 13 | 2 | 6.5 | 6.5 ± 2.5 (--4,+-9) |
Randoms | UTR | arbitrary negative | 17 | 10 | 1.7 | 1.45 ± 0.25 |
Randoms | UTR | alternate negative | 12 | 10 | 1.2 | 1.45 ± 0.25 |
Reals | Core | negative | 0 | 2 | 0 | 0 |
Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
Reals | Core | positive | 5 | 2 | 2.5 | 2.5 ± 1.5 (-+1,+-4) |
Randoms | Core | arbitrary positive | 1 | 10 | 0.1 | 0.1 |
Randoms | Core | alternate positive | 1 | 10 | 0.1 | 0.1 |
Reals | Proximal | negative | 1 | 2 | 0.5 | 0.5 |
Randoms | Proximal | arbitrary negative | 3 | 10 | 0.3 | 0.3 |
Randoms | Proximal | alternate negative | 3 | 10 | 0.3 | 0.3 |
Reals | Proximal | positive | 2 | 2 | 1 | 1 |
Randoms | Proximal | arbitrary positive | 4 | 10 | 0.4 | 0.3 |
Randoms | Proximal | alternate positive | 2 | 10 | 0.2 | 0.3 |
Reals | Distal | negative | 18 | 2 | 9 | 9 |
Randoms | Distal | arbitrary negative | 29 | 10 | 2.9 | 2.65 ± 0.25 |
Randoms | Distal | alternate negative | 24 | 10 | 2.4 | 2.65 ± 0.25 |
Reals | Distal | positive | 23 | 2 | 11.5 | 11.5 ± 3.5 (-+15,++8) |
Randoms | Distal | arbitrary positive | 34 | 10 | 3.4 | 3.95 ± 0.55 |
Randoms | Distal | alternate positive | 45 | 10 | 4.5 | 3.95 ± 0.55 |
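The ± terms in this table appear to be half the difference between the two values that are averaged. In the real UTR row, the negative strand (--, 4) and the positive strand (+-, 9) give 6.5 ± 2.5. In the random UTR rows, 1.7 and 1.2 give 1.45 ± 0.25. The sketch below applies that reading; the function name `mean_with_spread` is hypothetical.

```python
def mean_with_spread(a: float, b: float) -> tuple[float, float]:
    """Mean of two values and half their difference (the apparent ± term)."""
    return (a + b) / 2, abs(a - b) / 2


# (row label, first value, second value) taken from the table above.
rows = [
    ("Reals, UTR (strands -- and +-)", 4, 9),
    ("Randoms, UTR", 1.7, 1.2),
    ("Reals, distal positive (strands -+ and ++)", 15, 8),
    ("Randoms, distal positive", 3.4, 4.5),
]
for label, a, b in rows:
    mean, spread = mean_with_spread(a, b)
    print(f"{label}: {mean:.2f} ± {spread:.2f}")
```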
Comparison:
The occurrences of real DboxVs are greater than those of the randoms. This suggests that the real DboxVs are likely active or activable.
(Johnson) samplings
TCTCACATT(A/C)AATAAGTCA is a D-box.[4]
For the Basic programs testing consensus sequence 5'-TCTCACATT(A/C)AATAAGTCA-3' (starting with SuccessablesAAA.bas), written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in either the negative direction (-) or the positive direction (+), the strands and directions searched, the sequences looked for, and the results are:
- negative strand, negative direction, looking for 5'-TCTCACATT(A/C)AATAAGTCA-3', 0.
- negative strand, positive direction, looking for 5'-TCTCACATT(A/C)AATAAGTCA-3', 0.
- positive strand, negative direction, looking for 5'-TCTCACATT(A/C)AATAAGTCA-3', 0.
- positive strand, positive direction, looking for 5'-TCTCACATT(A/C)AATAAGTCA-3', 0.
- complement, negative strand, negative direction, looking for 5'-AGAGTGTAA(G/T)TTATTCAGT-3', 0.
- complement, negative strand, positive direction, looking for 5'-AGAGTGTAA(G/T)TTATTCAGT-3', 0.
- complement, positive strand, negative direction, looking for 5'-AGAGTGTAA(G/T)TTATTCAGT-3', 0.
- complement, positive strand, positive direction, looking for 5'-AGAGTGTAA(G/T)TTATTCAGT-3', 0.
- inverse complement, negative strand, negative direction, looking for 5'-TGACTTATT(G/T)AATGTGAGA-3', 0.
- inverse complement, negative strand, positive direction, looking for 5'-TGACTTATT(G/T)AATGTGAGA-3', 0.
- inverse complement, positive strand, negative direction, looking for 5'-TGACTTATT(G/T)AATGTGAGA-3', 0.
- inverse complement, positive strand, positive direction, looking for 5'-TGACTTATT(G/T)AATGTGAGA-3', 0.
- inverse negative strand, negative direction, looking for 5'-ACTGAATAA(A/C)TTACACTCT-3', 0.
- inverse negative strand, positive direction, looking for 5'-ACTGAATAA(A/C)TTACACTCT-3', 0.
- inverse positive strand, negative direction, looking for 5'-ACTGAATAA(A/C)TTACACTCT-3', 0.
- inverse positive strand, positive direction, looking for 5'-ACTGAATAA(A/C)TTACACTCT-3', 0.
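The list above searches for four variants of each consensus: the direct sequence, its complement, its inverse complement and its inverse. The sketch below generates all four. Instead of the article's parenthesized alternatives, it writes the degenerate positions with the IUPAC letters M (A or C) and K (G or T); that notation is a choice made here. The name `variants` is hypothetical.

```python
# Base complements, including the IUPAC two-base codes M (A/C) <-> K (G/T)
# and R (A/G) <-> Y (C/T).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C",
              "M": "K", "K": "M", "R": "Y", "Y": "R"}


def variants(motif: str) -> dict[str, str]:
    """Return the four search variants used in the sampling lists."""
    comp = "".join(COMPLEMENT[b] for b in motif)
    return {
        "direct": motif,
        "complement": comp,
        "inverse complement": comp[::-1],
        "inverse": motif[::-1],
    }


# TCTCACATT(A/C)AATAAGTCA written with M for (A/C).
for name, seq in variants("TCTCACATTMAATAAGTCA").items():
    print(f"{name:20} {seq}")
```

Read with K as (G/T) and M as (A/C), the printed variants match the sequences searched above. The same function reproduces the Mracek variants below, for example GTTATACAAC as the inverse complement of GTTGTATAAC.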
(Mracek) samplings
There is another promoter D box, or D-box: "Located in the region [...] is a single D-box element (5′-GTTGTATAAC-3′) with a distinct sequence from that of the functional D-box identified in the per2 promoter (5′-CTTATGTAAA-3′) [21]."[5]
(Mracek1) samplings
For the Basic programs testing consensus sequence 5'-GTTGTATAAC-3' (starting with SuccessablesMra1.bas), written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in either the negative direction (-) or the positive direction (+), the strands and directions searched, the sequences looked for, and the results are:
- negative strand, negative direction, looking for 5'-GTTGTATAAC-3', 0.
- negative strand, positive direction, looking for 5'-GTTGTATAAC-3', 0.
- positive strand, negative direction, looking for 5'-GTTGTATAAC-3', 0.
- positive strand, positive direction, looking for 5'-GTTGTATAAC-3', 0.
- complement, negative strand, negative direction, looking for 5'-CAACATATTG-3', 0.
- complement, negative strand, positive direction, looking for 5'-CAACATATTG-3', 0.
- complement, positive strand, negative direction, looking for 5'-CAACATATTG-3', 0.
- complement, positive strand, positive direction, looking for 5'-CAACATATTG-3', 0.
- inverse complement, negative strand, negative direction, looking for 5'-GTTATACAAC-3', 0.
- inverse complement, negative strand, positive direction, looking for 5'-GTTATACAAC-3', 0.
- inverse complement, positive strand, negative direction, looking for 5'-GTTATACAAC-3', 0.
- inverse complement, positive strand, positive direction, looking for 5'-GTTATACAAC-3', 0.
- inverse negative strand, negative direction, looking for 5'-CAATATGTTG-3', 0.
- inverse negative strand, positive direction, looking for 5'-CAATATGTTG-3', 0.
- inverse positive strand, negative direction, looking for 5'-CAATATGTTG-3', 0.
- inverse positive strand, positive direction, looking for 5'-CAATATGTTG-3', 0.
(Mracek2) samplings
For the Basic programs testing consensus sequence 5'-CTTATGTAAA-3' (starting with SuccessablesMra2.bas), written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in either the negative direction (-) or the positive direction (+), the strands and directions searched, the sequences looked for, and the results are:
- negative strand, negative direction, looking for 5'-CTTATGTAAA-3', 0.
- negative strand, positive direction, looking for 5'-CTTATGTAAA-3', 0.
- positive strand, negative direction, looking for 5'-CTTATGTAAA-3', 0.
- positive strand, positive direction, looking for 5'-CTTATGTAAA-3', 0.
- complement, negative strand, negative direction, looking for 5'-GAATACATTT-3', 0.
- complement, negative strand, positive direction, looking for 5'-GAATACATTT-3', 0.
- complement, positive strand, negative direction, looking for 5'-GAATACATTT-3', 0.
- complement, positive strand, positive direction, looking for 5'-GAATACATTT-3', 0.
- inverse complement, negative strand, negative direction, looking for 5'-TTTACATAAG-3', 0.
- inverse complement, negative strand, positive direction, looking for 5'-TTTACATAAG-3', 0.
- inverse complement, positive strand, negative direction, looking for 5'-TTTACATAAG-3', 0.
- inverse complement, positive strand, positive direction, looking for 5'-TTTACATAAG-3', 0.
- inverse negative strand, negative direction, looking for 5'-AAATGTATTC-3', 0.
- inverse negative strand, positive direction, looking for 5'-AAATGTATTC-3', 0.
- inverse positive strand, negative direction, looking for 5'-AAATGTATTC-3', 0.
- inverse positive strand, positive direction, looking for 5'-AAATGTATTC-3', 0.
Consensus sequence (Motojima)
D-box (TGAGTGG).[2]
(Motojima) samplings
Copying the D-box consensus TGAGTGG and searching for it with "⌘F" finds no locations between ZSCAN22 and A1BG and one between ZNF497 and A1BG, the same result the computer programs give for TGAGTGG itself.
For the Basic programs testing consensus sequence TGAGTGG (starting with SuccessablesMOT.bas), written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in either the negative direction (-) or the positive direction (+), the strands and directions searched, the sequences looked for, and the results are:
- negative strand, negative direction, looking for TGAGTGG, 0.
- negative strand, positive direction, looking for TGAGTGG, 1, TGAGTGG at 3449.
- positive strand, negative direction, looking for TGAGTGG, 0.
- positive strand, positive direction, looking for TGAGTGG, 0.
- inverse complement, negative strand, negative direction, looking for CCACTCA, 1, CCACTCA at 3827.
- inverse complement, negative strand, positive direction, looking for CCACTCA, 0.
- inverse complement, positive strand, negative direction, looking for CCACTCA, 1, CCACTCA at 4487.
- inverse complement, positive strand, positive direction, looking for CCACTCA, 0.
DboxM (4560-2846) UTRs
- Negative strand, negative direction: CCACTCA at 3827.
- Positive strand, negative direction: CCACTCA at 4487.
DboxM positive direction (4050-1) distal promoters
- Negative strand, positive direction: TGAGTGG at 3449.
Motojima random dataset samplings
- MOTr0: 1, TGAGTGG at 4502.
- MOTr1: 1, TGAGTGG at 4148.
- MOTr2: 0.
- MOTr3: 0.
- MOTr4: 0.
- MOTr5: 0.
- MOTr6: 0.
- MOTr7: 0.
- MOTr8: 0.
- MOTr9: 0.
- MOTr0ci: 0.
- MOTr1ci: 0.
- MOTr2ci: 1, CCACTCA at 1365.
- MOTr3ci: 0.
- MOTr4ci: 0.
- MOTr5ci: 0.
- MOTr6ci: 1, CCACTCA at 1766.
- MOTr7ci: 0.
- MOTr8ci: 0.
- MOTr9ci: 0.
DboxMr arbitrary (evens) (4560-2846) UTRs
- MOTr0: TGAGTGG at 4502.
DboxMr alternate (odds) (4560-2846) UTRs
- MOTr1: TGAGTGG at 4148.
DboxMr arbitrary positive direction (odds) (4265-4050) proximal promoters
- MOTr1: TGAGTGG at 4148.
DboxMr arbitrary negative direction (evens) (2596-1) distal promoters
- MOTr2ci: CCACTCA at 1365.
- MOTr6ci: CCACTCA at 1766.
DboxMr alternate positive direction (evens) (4050-1) distal promoters
- MOTr2ci: CCACTCA at 1365.
- MOTr6ci: CCACTCA at 1766.
D-box (Motojima) analysis and results
D-box (TGAGTGG).[2]
Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
---|---|---|---|---|---|---|
Reals | UTR | negative | 2 | 2 | 1 | 1 |
Randoms | UTR | arbitrary negative | 1 | 10 | 0.1 | 0.1 |
Randoms | UTR | alternate negative | 1 | 10 | 0.1 | 0.1 |
Reals | Core | negative | 0 | 2 | 0 | 0 |
Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
Reals | Core | positive | 0 | 2 | 0 | 0 |
Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0 |
Randoms | Core | alternate positive | 0 | 10 | 0 | 0 |
Reals | Proximal | negative | 0 | 2 | 0 | 0 |
Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0 |
Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0 |
Reals | Proximal | positive | 0 | 2 | 0 | 0 |
Randoms | Proximal | arbitrary positive | 1 | 10 | 0.1 | 0.05 |
Randoms | Proximal | alternate positive | 0 | 10 | 0 | 0.05 |
Reals | Distal | negative | 0 | 2 | 0 | 0 |
Randoms | Distal | arbitrary negative | 2 | 10 | 0.2 | 0.1 |
Randoms | Distal | alternate negative | 0 | 10 | 0 | 0.1 |
Reals | Distal | positive | 1 | 2 | 0.5 | 0.5 |
Randoms | Distal | arbitrary positive | 0 | 10 | 0 | 0.1 |
Randoms | Distal | alternate positive | 2 | 10 | 0.2 | 0.1 |
Comparison:
The occurrences of real DboxMs are greater than those of the randoms. This suggests that the real DboxMs are likely active or activable.
(Samarsky) D box analysis and results
For "box C/D snoRNAs, boxes C and D and an adjoining stem form a vital structure, known as the box C/D motif."[1] Box D (GUCUGA) adjoins Domain B and overlaps it by two nucleotides: the "GU" at the start of box D is also the end of Domain B. The inverse of GUCUGA is AGUCUG, and replacing U with T yields AGTCTG, a likely consensus sequence to search for.[1]
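The derivation of the search sequence above can be sketched in a few lines. The box D sequence is read in the opposite direction and U is replaced with T. This is a reversal only, not a reverse complement, and the last line shows the contrast. The function name is hypothetical.

```python
def box_d_search_sequence(box_d_rna: str) -> str:
    """Invert box D (read it in the opposite direction) and replace U with T."""
    return box_d_rna[::-1].replace("U", "T")


print(box_d_search_sequence("GUCUGA"))  # AGTCTG

# For contrast, the reverse complement of the DNA form GTCTGA is different.
pairs = str.maketrans("ACGT", "TGCA")
print("GTCTGA".translate(pairs)[::-1])  # TCAGAC
```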
The real consensus sequences are AGTCTG at 2947 in the UTR between A1BG and ZSCAN22, for an occurrence of 0.5; three in the distal promoters, also in the negative direction, for an occurrence of 1.5; and six in the distal promoters in the positive direction, for an occurrence of 3.0.
The randoms had one in the UTR, AGTCTG at 4073 in the arbitrary negative direction, for an occurrence of 0.1; four in the arbitrary negative direction in the distal promoters, for an occurrence of 0.4; and seven in the arbitrary positive direction, for an occurrence of 0.7.
By comparison, the occurrences are systematically higher for the reals than for the randoms, which suggests that the reals are likely active or activable.
(Voronina) D box analysis and results
The reals have four consensus sequences in the UTR for an occurrence of 2.0.
There is only one core promoter consensus sequence among eight promoters, for an occurrence of 0.125.
Proximal promoters have two occurrences among eight possibilities, for an occurrence of 0.25.
Distal promoters have twenty-eight consensus sequences in the negative direction, for an occurrence of 3.5.
In the positive direction, they have twenty-three consensus sequences, for an occurrence of 2.875.
The randoms had seventeen UTR consensus sequences for an occurrence of 1.7.
The randoms had one core promoter from twenty opportunities for an occurrence of 0.05.
In the proximal promoters, the randoms had three in the arbitrary negative direction and four in the positive direction for occurrences of 0.3 and 0.4.
For the distal promoters, the negative direction had twenty-nine consensus sequences for an occurrence of 2.9.
In the positive direction, the randoms had thirty-four consensus sequences, for an occurrence of 3.4.
In comparison, for the distal promoters the random sequences had approximately the same occurrences as the reals. For the proximal promoters, the randoms had slightly more occurrences than the reals. For the core promoters, the randoms had slightly fewer occurrences. For the UTR, the randoms had slightly fewer occurrences than the reals (1.7 vs. 2.0). Based on the UTR and core promoters, it appears that the reals are likely active or activable.
(Motojima) D-box analysis and results
D-box (TGAGTGG).[2]
The real promoters have two inverse complements in the UTR: CCACTCA at 4487 nucleotides from the end of gene ZSCAN22, on the positive strand in the negative direction, and CCACTCA at 3827, on the negative strand in the negative direction, for an occurrence of 0.5.
In the distal promoters, there is one consensus sequence between ZNF497 and A1BG on the negative strand in the positive direction: TGAGTGG at 3449, for an occurrence of 0.25.
The random datasets had one UTR consensus sequence, TGAGTGG at 4502, for an occurrence of 0.1.
They had one proximal promoter D-box consensus sequence, TGAGTGG at 4148 in the arbitrary positive direction, for an occurrence of 0.05.
The distal promoters had two inverse complements of the consensus sequence, CCACTCA at 1766 and CCACTCA at 1365, for an occurrence of 0.1.
Comparing the two results, the occurrences are higher for the real UTR and distal promoter consensus sequences than for the randoms, suggesting that the reals are likely active or activable.
Destruction box
"The ordered progression through the cell cycle depends on regulating the abundance of several proteins through ubiquitin-mediated proteolysis. Degradation is precisely timed and specific. One key component of the degradation system, the anaphase promoting complex (APC), is a ubiquitin protein ligase. It is activated both during mitosis and late in mitosis/G1, by the WD repeat proteins Cdc20 and Cdh1, respectively. These activators target distinct sets of substrates. Cdc20–APC requires a well-defined destruction box (D box), [...]."[6]
The KEN box (lysine-glutamate-asparagine, encoded by AA(A/G)GA(A/G)AA(C/T)) serves as a general targeting signal for Cdh1–APC.[6]
"The budding yeast homolog of Cdc20 contains two destruction boxes [...], but the vertebrate homologs lack any motif similar to the R-L-N of the D box."[6]
The destruction box R-L-N[6] is encoded by (CGN or AG(A/G)) for arginine, (CTN or TT(A/G)) for leucine, and AA(C/T) for asparagine.
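The codon sets above come from the standard genetic code. A peptide motif such as R-L-N or K-E-N can be turned into a DNA pattern by joining the codon alternatives for each residue. In the minimal sketch below, `fullmatch` tests a single in-frame, coding-strand triplet run. A genome-wide search would also need to consider reading frames and the other strand, and the sketch does not handle either. The names `CODONS` and `peptide_to_dna_regex` are hypothetical.

```python
import re

# Standard genetic code: DNA codons for the amino acids discussed here.
CODONS = {
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],  # arginine
    "L": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],  # leucine
    "N": ["AAT", "AAC"],                              # asparagine
    "K": ["AAA", "AAG"],                              # lysine
    "E": ["GAA", "GAG"],                              # glutamate
}


def peptide_to_dna_regex(peptide: str) -> str:
    """Build a regex matching any coding-strand DNA that encodes peptide."""
    return "".join("(?:" + "|".join(CODONS[aa]) + ")" for aa in peptide)


rln = re.compile(peptide_to_dna_regex("RLN"))
ken = re.compile(peptide_to_dna_regex("KEN"))
print(bool(rln.fullmatch("AGGTTGAAC")))  # True: R-L-N
print(bool(rln.fullmatch("CGTTTTAAT")))  # False: TTT encodes phenylalanine
print(bool(ken.fullmatch("AAGGAAAAT")))  # True: K-E-N
```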
"Selection of APC/C targets is controlled through recognition of short destruction motifs, predominantly the D box and KEN box."[7]
"The classical APC/C degron is the destruction box or D box, a nine-residue motif (RxxLxxI/VxN), first characterized in B-type cyclins as being sufficient for APC/C-mediated ubiquitylation [...], common to most, but not all APC/C substrates. Another APC/C degron, the KEN motif (KENxxxN/D), is often present in APC/C substrates usually in addition to the D box [...]."[7]
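At the protein level, the two degrons quoted above can be written as regular expressions, with "x" (any residue) becoming ".". The D box RxxLxxI/VxN becomes `R..L..[IV].N`, and the KEN motif KENxxxN/D becomes `KEN...[ND]`. The sketch below scans a made-up peptide for both patterns. `finditer` reports only non-overlapping matches. The function name and the test peptide are hypothetical.

```python
import re

# Protein-level degron patterns from the quotation above ("x" = any residue).
D_BOX = re.compile(r"R..L..[IV].N")
KEN_MOTIF = re.compile(r"KEN...[ND]")


def find_degrons(protein: str) -> dict[str, list[int]]:
    """Return 1-based start positions of each degron in a protein sequence."""
    return {
        "D box": [m.start() + 1 for m in D_BOX.finditer(protein)],
        "KEN": [m.start() + 1 for m in KEN_MOTIF.finditer(protein)],
    }


# Made-up peptide for illustration; not a real APC/C substrate.
print(find_degrons("MSRAALGKIANSTKENAAAD"))  # {'D box': [3], 'KEN': [14]}
```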
Acknowledgements
The content on this page was first contributed by: Henry A. Hoff.
Initial content for this page in some instances came from Wikiversity.
References
- [1] Dmitry A. Samarsky, Maurille J. Fournier, Robert H. Singer and Edouard Bertrand (1 July 1998). "The snoRNA box C/D motif directs nucleolar targeting and also couples snoRNA synthesis and localization" (PDF). The European Molecular Biology Organization (EMBO) Journal. 17 (13): 3747–3757. doi:10.1093/emboj/17.13.3747. PMID 9649444. Retrieved 4 February 2017.
- [2] Masaru Motojima, Takao Ando and Toshimasa Yoshioka (10 July 2000). "Sp1-like activity mediates angiotensin-II-induced plasminogen-activator inhibitor type-1 (PAI-1) gene expression in mesangial cells" (PDF). Biochemical Journal. 349 (2): 435–441. doi:10.1042/0264-6021:3490435. PMID 10880342. Retrieved 13 August 2020.
- [3] E. N. Voronina, T. D. Kolokol’tsova, E. A. Nechaeva and M. L. Filipenko (2003). "Structural–Functional Analysis of the Human Gene for Ribosomal Protein L11" (PDF). Molecular Biology. 37 (3): 362–371. Retrieved 11 April 2019.
- [4] P. A. Johnson, D. Bunick and N. B. Hecht (1991). "Protein Binding Regions in the Mouse and Rat Protamine-2 Genes" (PDF). Biology of Reproduction. 44 (1): 127–134. Retrieved 6 April 2019.
- [5] Philipp Mracek, Cristina Santoriello, M. Laura Idda, Cristina Pagano, Zohar Ben-Moshe, Yoav Gothilf, Daniela Vallone and Nicholas S. Foulkes (6 December 2012). "Regulation of per and cry Genes Reveals a Central Role for the D-Box Enhancer in Light-Dependent Gene Expression". PLoS ONE. 7 (12): e51278. doi:10.1371/journal.pone.0051278. Retrieved 10 February 2019.
- [6] Cathie M. Pfleger and Marc W. Kirschner (15 March 2000). "The KEN box: an APC recognition signal distinct from the D box targeted by Cdh1". Genes & Development. 14 (6): 655–665. PMID 10733526. Retrieved 10 May 2023.
- [7] David Barford (27 December 2011). "Structural insights into anaphase-promoting complex function and mechanism". Philosophical Transactions of the Royal Society B: Biological Sciences. 366 (1584): 3605–3624. doi:10.1098/rstb.2011.0069. PMID 22084387. Retrieved 10 May 2023.