Complex locus A1BG and ZNF497: Difference between revisions

 
(256 intermediate revisions by the same user not shown)
Line 9: Line 9:
|title=A1BG alpha-1-B glycoprotein
|title=A1BG alpha-1-B glycoprotein
|url=https://www.ncbi.nlm.nih.gov/gene/1
|url=https://www.ncbi.nlm.nih.gov/gene/1
|accessdate=May 10, 2013 }}</ref> Additionally, A1BG, in current nucleotide numbering (58,345,183-58,353,492), is located adjacent to the ZSCAN22 gene (58,326,994-58,342,332) on the positive DNA strand, as well as the ZNF837 (58,367,623 - 58,381,030, complement) and ZNF497 (58,354,357 - 58,362,751, complement) genes on the negative strand.<ref name=A1BG/> In the current nucleotide numbering, the A1BG untranslated region (UTR) has been expanded so that with ZSCAN22 ending at 58,342,332, the nucleotides used in this study are 58,342,347 to 58,346,897 on both strands, with the current UTR for A1BG beginning at 58,345,183.
|accessdate=May 10, 2013 }}</ref> Additionally, A1BG, in current nucleotide numbering (58,345,183-58,353,492), is located adjacent to the ZSCAN22 gene (58,326,994-58,342,332) on the positive DNA strand, as well as the ZNF837 (58,367,623 - 58,381,030, complement) and ZNF497 (58,354,357 - 58,362,751, complement) genes on the negative strand.<ref name=A1BG/>
 
In the current nucleotide numbering, the A1BG untranslated region (UTR) has been expanded so that with ZSCAN22 ending at 58,342,332, the nucleotides used in this study are 58,342,333 to 58,346,892 on both strands, with the current UTR for A1BG beginning at 58,345,183. On the other side of A1BG ending at 58,353,492, the nucleotides used are 58,353,493 to 58,357,937. With ZNF497 beginning at 58,354,357, this study goes into ZNF497 to 58,357,937 or 3580 nucleotides from its downstream TSS or 4445 nucleotides from the TSS of A1BG downstream from ZNF497.
 
For example, an abscisic acid responsive element (ABRE) with the consensus sequence of ACGTG(G/T)C (Watanabe ''et al''. 2017) occurs in the positive strand in the negative direction from ZSCAN22 to A1BG as ACGTGGC ending at 4239 nucleotides from the end of ZSCAN22 or 58,346,571, where the A is at 58,346,565 inside the UTR of A1BG.
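The coordinate arithmetic in the two paragraphs above can be checked with a short sketch. The coordinates are those quoted in the text; the variable names are illustrative only and are not taken from the study.

```python
# Coordinates quoted in the text (current nucleotide numbering).
ZSCAN22_END = 58_342_332
A1BG_START = 58_345_183
A1BG_END = 58_353_492
ZNF497_START = 58_354_357
SAMPLED_END = 58_357_937  # last nucleotide sampled on the ZNF497 side

# Sampled region between ZSCAN22 and A1BG (inclusive coordinates).
region_start, region_end = 58_342_333, 58_346_892
region_length = region_end - region_start + 1

# Distances on the ZNF497 side of A1BG.
into_znf497 = SAMPLED_END - ZNF497_START
from_a1bg_end = SAMPLED_END - A1BG_END

# ABRE ACGTGGC ending 4239 nucleotides from the end of ZSCAN22.
abre = "ACGTGGC"
abre_end = ZSCAN22_END + 4239
abre_start = abre_end - len(abre) + 1
abre_in_utr = abre_start >= A1BG_START

print(region_length, into_znf497, from_a1bg_end, abre_start, abre_end, abre_in_utr)
```

The sketch reproduces the 3580 and 4445 nucleotide distances and places the ABRE at 58,346,565 to 58,346,571.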


==Introduction==
==Introduction==
Line 76: Line 80:
===WD-40 repeat family===
===WD-40 repeat family===
{{main|WD-40 repeat family}}
{{main|WD-40 repeat family}}
"Receptor for activated C kinase (RACK1) is a highly conserved, eukaryotic protein of the WD-40 repeat family. [...] During ''Phaseolus vulgaris'' root development, RACK1 (PvRACK1) mRNA expression was induced by auxins, abscissic acid, cytokinin, and gibberellic acid."<ref name=Flores>{{ cite journal
"Receptor for activated C kinase (RACK1) is a highly conserved, eukaryotic protein of the WD-40 repeat family. [...] During ''Phaseolus vulgaris'' root development, RACK1 (PvRACK1) mRNA expression was induced by auxins, abscisic acid, cytokinin, and gibberellic acid."<ref name=Flores>{{ cite journal
|author=Tania Islas-Flores, Gabriel Guillén, Xóchitl Alvarado-Affantranger, Miguel Lara-Flores, Federico Sánchez, and Marco A. Villanueva
|author=Tania Islas-Flores, Gabriel Guillén, Xóchitl Alvarado-Affantranger, Miguel Lara-Flores, Federico Sánchez, and Marco A. Villanueva
|title=PvRACK1 Loss-of-Function Impairs Cell Expansion and Morphogenesis in ''Phaseolus vulgaris'' L. Root Nodules
|title=PvRACK1 Loss-of-Function Impairs Cell Expansion and Morphogenesis in ''Phaseolus vulgaris'' L. Root Nodules
Line 91: Line 95:
|accessdate=25 April 2021 }}</ref>
|accessdate=25 April 2021 }}</ref>


====Abscissic acid (ABA) response elements====
====Abscisic acid (ABA) response elements====
{{main|ABA-response element gene transcriptions}}
{{main|ABA-response element gene transcriptions#ABRE samplings}}


====Auxin response factors====
====Auxin response factors====
{{main|Auxin response factor gene transcriptions}}


In [[A1BG response element positive results]], [[Auxin response factor gene transcriptions|auxin response factors (ARFs)]] have been identified in the UTR for A1BG between ZSCAN22 and A1BG and between ZNF497 and A1BG. ARF5s occur in this side's core promoter or proximal promoter. If these response elements are active then A1BG can be transcribed as a regulatory element in auxin signaling. The "genome binding of two ARFs (ARF2 and ARF5/Monopteros [MP]) differ largely because these two factors have different preferred ARF binding site (ARFbs) arrangements (orientation and spacing)."<ref name=Stigliani>{{ cite journal
=====ARFUs=====
|author=Arnaud Stigliani, Raquel Martin-Arevalillo, Jérémy Lucas, Adrien Bessy, Thomas Vinos-Poyo, Victoria Mironova, Teva Vernoux, Renaud Dumas and François Parcy
{{main|Auxin response factor gene transcriptions#TGTCTC (Ulmasov) ARFbs samplings}}
|title=Capturing Auxin Response Factors Syntax Using DNA Binding Models
|journal=Molecular Plant
|date=3 June 2019
|volume=12
|issue=6
|pages=822-832
|url=https://www.sciencedirect.com/science/article/pii/S167420521830306X
|arxiv=
|bibcode=
|doi=10.1016/j.molp.2018.09.010
|pmid=30336329
|accessdate=29 August 2020 }}</ref>


The position weight matrices (PWMs) used to model ARF DNA binding specificity suggest more general consensus sequences may be (C/G/T)N(G/T)G(C/T)(C/T), where ARF2 is (C/G/T)(A/C/T)(G/T)G(C/T)(C/T)(G/T)(C/G)(A/C/T)(A/G/T) and ARF5/MP is (C/G/T)N(G/T)GTC(G/T).<ref name=Stigliani/> The likely consensus sequence for ARF2 would allow 2592 possible response elements, and that for ARF5/MP would be 48.
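The counts of 2592 and 48 follow from multiplying the number of allowed nucleotides at each position. A minimal sketch, assuming the bracketed notation used on this page (the function name <code>count_sequences</code> is hypothetical):

```python
import re
from math import prod


def count_sequences(consensus: str) -> int:
    """Count the distinct sequences a bracketed consensus such as (C/G/T)N(G/T)G can spell."""
    # Each token is either a parenthesised group like (C/G/T) or a single letter.
    tokens = re.findall(r"\(([ACGT/]+)\)|([ACGTN])", consensus)
    sizes = []
    for group, single in tokens:
        if group:
            sizes.append(len(group.split("/")))  # one choice per listed nucleotide
        elif single == "N":
            sizes.append(4)  # N stands for any nucleotide
        else:
            sizes.append(1)  # fixed nucleotide
    return prod(sizes)


ARF2 = "(C/G/T)(A/C/T)(G/T)G(C/T)(C/T)(G/T)(C/G)(A/C/T)(A/G/T)"
ARF5 = "(C/G/T)N(G/T)GTC(G/T)"

print(count_sequences(ARF2), count_sequences(ARF5))
```

This treats every allowed nucleotide as equally weighted, which is the "weight of one" simplification rather than the actual PWM weighting.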
=====ARFBs=====
{{main|Auxin response factor gene transcriptions#TGTCGG (Boer) ARFbs samplings}}


=====Ulmasov ARFbs=====
=====ARF2s=====
{{main|Auxin response factor gene transcriptions#ARF (Stigliani) samplings}}


"ARFbs were originally defined as TGTCTC (Ulmasov et al., 1995, Guilfoyle et al., 1998), [...]."<ref name=Stigliani/>
=====ARF5s=====
{{main|Auxin response factor gene transcriptions#ARF5 samplings}}


The consensus sequence found by Ulmasov (1995) TGTCTC occurs in the negative direction for the UTR (four), proximal promoter (one) and distal promoter (twelve) and in the positive direction only in the distal promoter (ten). But, the random datasets had only one on average in the distal promoter for either direction and only 0.2 in the UTR and none in the proximal promoter. This suggests that the occurrences of the consensus sequences of Ulmasov in the promoters of A1BG are real and likely active or activatable.
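As a rough check on the random baseline for a fixed six-nucleotide motif such as TGTCTC, the expected number of matches can be estimated. This sketch assumes that each random dataset is a sequence of independent, equally likely nucleotides, which the page does not state, and it uses the 4560-nucleotide region sampled between ZSCAN22 and A1BG; the function name is hypothetical.

```python
def expected_occurrences(region_length: int, motif_length: int) -> float:
    """Expected matches of one fixed motif on one strand of a uniform random sequence."""
    positions = region_length - motif_length + 1  # possible start positions
    return positions / 4 ** motif_length


# Region sampled on the ZSCAN22 side: 58,342,333 to 58,346,892 inclusive.
region_length = 58_346_892 - 58_342_333 + 1
per_strand = expected_occurrences(region_length, len("TGTCTC"))

print(round(per_strand, 2))
```

Under these assumptions about one match per strand is expected over the whole region, which is the same order of magnitude as the random averages reported above.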
====CAACTC regulatory elements====
{{main|CARE gene transcriptions}}


=====Boer ARFbs=====
=====CAREs (Fan)=====
{{main|CARE gene transcriptions#CARE (Fan) sampling of A1BG promoters}}


"More recently, protein binding microarray (PBM) experiments suggested that TGTCGG are preferred ARFbs, [...] (Boer et al., 2014, Franco-Zorrilla et al., 2014, Liao et al., 2015)."<ref name=Stigliani/>
=====CAREs (Garaeva)=====
{{main|CARE gene transcriptions#CARE (Garaeva) samplings}}


The consensus sequence of Boer (2014), TGTCGG, occurs only once in the UTR in the negative direction and seven times in the positive direction in the distal promoters. The random datasets had about one per dataset in the UTR and one to three occurrences in the distal promoters for each of the ten datasets. While the occurrence in the UTR about matches a random occurrence, the occurrences in the distal promoters are more frequent in the real promoters than in the random datasets.
====Cytokinins====
{{main|Cytokinin response regulator gene transcriptions}}


=====Stigliani ARF2s=====
=====ARR1s=====
{{main|Cytokinin response regulator gene transcriptions#ARR1 Cytokinin samplings}}


Random sampling (even numbered datasets for ZSCAN22 to A1BG) for UTRs range from two to eight ARFs whereas the two strands have 6 (negative strand) and 12, respectively, in the negative direction. The negative strand, negative direction results (6) for A1BG fall within the range of random results by number of results, whereas the results for the positive strand, negative direction (12) are well outside the number of random results, suggesting they are real. None of the actual nucleotide sequences for either strand, negative direction match any of the random results.
=====ARR10s=====
{{main|Cytokinin response regulator gene transcriptions#ARR10 Cytokinin samplings}}


For ARF core promoters, four of the ten random nucleotide sequences have core promoter occurrences, ranging from one to two each, in the odd numbered random sequences only (those assigned to the positive direction, ZNF497 to A1BG), and none match the nucleotide sequence for the real, negative strand, positive direction.
=====ARR12s=====
{{main|Cytokinin response regulator gene transcriptions#ARR12 Cytokinin samplings}}


Only three of the ten random datasets had results for the proximal promoters, only one result each, in the negative direction and none matched the real result. The positive direction random datasets had results in seven ranging from one to four with no nucleotide sequence matches.
=====ARRFs=====
{{main|Cytokinin response regulator gene transcriptions#ARR (Ferreira) samplings}}


The nucleotide sequences for the distal promoters do not match: the random datasets range in number from 2 to 9 and from 5 to 14, but the real sets range from 14 to 38 and from 15 to 51, respectively. The real occurrences far outnumber the random results.
=====ARRR1s=====
{{main|Cytokinin response regulator gene transcriptions#ARR (Rashotte1) samplings}}


The results in the promoters of A1BG have about 100 unique nucleotide sequences (Stigliani ''et al.'') of the total possible 2592 such sequences per the PWMs, assuming a weight of one. The actual weighting is expected to reduce the number of likely sequences, resulting in few duplicates between the sequences found to occur and those found in the random datasets. Only six nucleotide sequences are common to both results.
=====ARRR2s=====
{{main|Cytokinin response regulator gene transcriptions#ARR (Rashotte2) samplings}}


=====Stigliani ARF5s=====
====Coupling elements====
{{main|Coupling element gene transcriptions}}


The varieties of the short consensus sequence for ARF5/MP (C/G/T)N(G/T)GTC(G/T) have been detected in the UTR (25), proximal (3) and distal (49) promoters for A1BG between ZSCAN22 and the A1BG gene and the core (5), proximal (1) and distal (84) promoters for A1BG on the ZNF497 side.
=====CE3Ws=====
{{main|Coupling element gene transcriptions#CE3 (Watanabe) samplings}}


Random datasets arbitrarily chosen to represent the negative direction (even numbered datasets) and positive direction (odd numbered datasets) have only from one to five for the consensus sequence UTRs and two to nine for the inverse complement sequences. Regarding core promoters, two random datasets had one or two inverse complement nucleotide sequences on the even numbered sites, whereas the positive direction random datasets had two datasets with one and three nucleotide sequences. Random datasets had proximal promoters only on the positive side (one to two). Distal promoter sequences for the negative direction had from two to nine nucleotide sequences. For the positive direction, the consensus sequences ranged from eight to fifteen.
=====CE3Ds=====
{{main|Coupling element gene transcriptions#CE3 (Ding) samplings}}


Starting with the random occurrences among the ARF5 possibilities 68 have duplicates among the random or real datasets. These same duplicates only occur 78 % of the time among the real datasets.
====EREs====
{{main|Ethylene responsive element gene transcriptions#ERE samplings}}


Using the real consensus sequences to look for duplicates first among the other reals then among the randoms found 97 % had duplicates among either the other reals or among the randoms.
====Gibberellic acid response elements====
{{main|GARE gene transcriptions}}


The possible variety of ARF5s within the consensus sequences (C/G/T)N(G/T)GTC(G/T): 3*4*2*1*1*1*2 = 48 plus complement inverses (A/C)GAC(A/C)N(A/C/G): 2*1*1*1*2*4*3 = 48 with duplicates, 96 minus duplicates of some 18 suggests that up to 78 could occur if sampling were large enough.
=====GAREs=====
{{main|GARE gene transcriptions#GARE sampling of A1BG promoters}}


That the randoms and real occurrences do not match up suggests that the reals are not randomly occurring.
=====GAREL1s=====
{{main|GARE gene transcriptions#GARE-like 1 samplings}}


====CAREs (Fan)====
====Hypoxia response elements====
{{main|CARE gene transcriptions}}
{{main|Hypoxia response element gene transcriptions}}


====CAREs (Garaeva)====
=====HIFs=====
{{main|CARE gene transcriptions}}
{{main|Hypoxia response element gene transcriptions#Hypoxia-inducible factor samplings}}


====Cytokinins====
=====HREs=====
{{main|Hypoxia response element gene transcriptions#Hypoxia response element samplings}}


In [[A1BG response element positive results]], several of the [[Cytokinin response regulator gene transcriptions|cytokinin response regulators]]: ARR10s, ARR12s, and those of Rashotte ''et al.'' (2003) have occurrences in the UTR of A1BG from the ZSCAN22 side and in the proximal promoters of A1BG from the ZNF497 side.
=====CACAs=====
{{main|Hypoxia response element gene transcriptions#CACA samplings}}


Any of the randomly generated nucleotide data sets (0 through 9) can be used to represent any of the real data for the response element of interest.
====Pyrimidine boxes====
{{main|Pyrimidine box gene transcriptions|Nuclear factor of activated T cell gene transcriptions (NFAT)}}


ARR1 has six response elements in the distal promoters (two in the negative direction and four in the positive direction). Random samplings range from 0 to 3. For the proximal promoter there is one for two datasets out of 20. There are no random occurrences in the core promoter, and four for two datasets in the UTR out of 20.
====TAT boxes====
{{main|TAT box gene transcriptions}}


ARR10 has only one element in the negative direction in the UTR. Four of the 10 random datasets had 1.25 response elements in the UTR. No sequences in the core promoters, but one sequence in the proximal promoter and many in the distal promoters. The only response element in the A1BG promoters is an inverse complement and five of the six random UTR response elements are inverse complements.
=====TATFs=====
{{main|TAT box gene transcriptions#TAT box (Fan) samplings}}


For ARR12, the random samples have about 2.5 UTR elements, but ARR12 has only one in the UTR. The A1BG has no core promoters on either side but one random sample out of ten has one on the positive direction from ZNF497. The ARR12 in A1BG proximal promoters have none while two out of ten random datasets have one each. In the distal promoters the negative directions averaged four while the random datasets average 1.3. The positive direction has only one and the random datasets averaged 1.3.
=====TATYs=====
{{main|TAT box gene transcriptions#TAT box (Yang) samplings}}


The Rashotte1 ARR (ARRR1) results differ from random datasets: one in the UTR on the ZSCAN22 side versus two (random). No core promoter elements versus one in two of ten (random). Three proximal promoters versus none (random). Distal promoters: one in the negative direction and three in the positive and one to three (random).
===General Regulatory Factors===
{{main|General regulatory factors}}
The following general regulatory factors occur in the promoters between ZSCAN22, A1BG and ZNF497 on human chromosome 19.


For (G/A)GAT(T/C) in ARRR2 (Rashotte2) UTRs, there are none for the negative strand, negative direction, but each random data set finds 2-8. For the positive strand, negative direction, there are seven versus 2-8 from the randomly generated nucleotide data sets (0 through 9).
====Abfms====
{{main|Abf1 regulatory factor gene transcriptions}}


For (A/G)ATC(C/T) in ARRR2 (Rashotte2) UTRs, all four of the negative strand, negative direction results are inverse complements, where the random data sets have (4-9). For the positive strand, negative direction there are eleven, but the random datasets have 2-9.
====Rap1s====
{{main|Rap1 regulatory factor gene transcriptions}}


Neither direction for ARRR2 (Rashotte2) has core promoter elements, while the random results have an average of (1 per data set, 0-1, negative direction, 0-3, positive direction).
====Reb1s====
{{main|Reb1 general regulatory factor gene transcriptions}}


For the negative direction and the proximal promoter, ARRR2 (Rashotte2) produced only one. And, the random datasets have an average of (1 per data set, 0-4).
====Tbf1s====
{{main|Tbf1 regulatory factor gene transcriptions}}


The positive direction, ARRR2 (Rashotte2) has six. The random datasets have about 1 per dataset (0-3).
===Basic leucine zipper (bZIP) class response elements===


The distal promoters, negative direction, ARRR2 (Rashotte2) has 37 and the positive direction 25. The random datasets cover 8-20.
====A-boxes====
{{main|A box gene transcriptions}}


====Coupling elements====
====ACGTs====
{{main|ACGT-containing element gene transcriptions#ACGT samplings}}
"A majority of the plant bZIP proteins isolated to date recognize elements with an ACGT core (Foster et al., 1994)."<ref name=Nijhawan>{{ cite journal | author = Nijhawan A, Jain M, Tyagi AK, Khurana JP
| title = Genomic survey and gene expression analysis of the basic leucine zipper transcription factor family in rice
| journal = Plant Physiology
| volume = 146
| issue = 2
| pages = 333–50
| date = February 2008
| pmid = 18065552
| doi = 10.1104/pp.107.112821 }}</ref>


"In barley, the combination of an ABRE and one of two known coupling elements CE1 (TGCCACCGG) and CE3 (GCGTGTC) constitutes an ABA responsive complex (ABRC) in the regulation of the ABA‐inducible genes HVA1 and HVA22 (Shen and Ho 1995; Shen et al. 1996)."<ref name=Watanabe/>
"Most recombinant bZIP proteins can interact with ACGT elements derived from different plant genes, albeit with different affinity. Systematic protein/DNA binding studies have shown that sequences flanking the ACGT core affect bZIP protein binding specificity. These studies have provided the basis for a concise ACGT nomenclature and defined high-affinity A-box, C-box, and G-box elements."<ref name=Foster>{{ cite journal
 
|author=Randy Foster, Takeshi Izawa and Nam-Hai Chua
"To identify potential ''cis''-regulatory elements in the promoter sequences of ZmGRXCC genes, the 1500 bp sequences of each [maize CC-type glutaredoxin (GRX)] ZmGRXCC gene upstream of the ATG start codon were selected from the maize genome as the promoter, and the promoter sequence was screened using PlantCARE [32]. The elements searched included [...] ABRE (ABA-responsive element, -CACGTG- or -TACGTG-) and CE3 (coupling element 3, -CACGCG-) for ABA responsiveness; [...]."<ref name=Ding>{{ cite journal
|title=Plant bZIP proteins gather at ACGT elements
|author=Shuangcheng Ding, Fengyu He, Wenlin Tang, Hewei Du and Hongwei Wang
|journal=FASEB
|title=Identification of Maize CC-Type Glutaredoxins That Are Associated with Response to Drought Stress
|date=1 February 1994
|journal=Genes
|volume=8
|date=12 August 2019
|issue=2
|volume=10
|pages=192-200
|issue=610
|url=https://faseb.onlinelibrary.wiley.com/doi/pdfdirect/10.1096/fasebj.8.2.8119490
|pages=1-15
|url=https://www.mdpi.com/2073-4425/10/8/610/pdf
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=10.3390/genes10080610
|doi=10.1096/fasebj.8.2.8119490
|pmid=
|pmid=8119490
|accessdate=30 November 2020 }}</ref>
|accessdate=25 June 2021 }}</ref>


The consensus sequence for coupling element 3 (Watanabe) has only one occurrence in the promoters for A1BG: GCGTGTC at 1053 on the positive strand in the positive direction in the distal promoter. For the positive direction, this is an occurrence of 0.5.
"HY5 binds to the promoter of light-responsive genes featuring [[ACGT-containing element gene transcriptions|"ACGT-containing elements"]] such as the G-box (CACGTG), C-box (GACGTC), Z-box (ATACGGT), and A-box (TACGTA) (4, 6)."<ref name=Nawkar>{{ cite journal
|author=Ganesh M. Nawkar, Chang Ho Kanga, Punyakishore Maibam, Joung Hun Park, Young Jun Jung, Ho Byoung Chae, Yong Hun Chi, In Jung Jung, Woe Yeon Kim, Dae-Jin Yun, and Sang Yeol Lee
|title=HY5, a positive regulator of light signaling, negatively controls the unfolded protein response in ''Arabidopsis''
|journal=Proceedings of the National Academy of Sciences USA
|date=21 February 2017
|volume=114
|issue=8
|pages=2084-89
|url=https://www.pnas.org/content/pnas/114/8/2084.full.pdf
|arxiv=
|bibcode=
|doi=10.1073/pnas.1609844114
|pmid=
|accessdate=24 June 2021 }}</ref>


The random datasets had one in the UTR between ZSCAN22 and A1BG: GCGTGTC at 3282, for an occurrence of 0.1. There were two consensus sequences in the distal promoters: GCGTGTC at 1337 and GCGTGTC at 593, both in the negative direction for an overall occurrence of 0.2. These occurrences in the arbitrary negative direction distal promoters are much lower than the real suggesting that the one real occurrence is likely active or activable.
====Activating transcription factors====
{{main|Activating transcription factor gene transcriptions}}


====Ethylene response factors====
=====ATFBs=====
{{main|Activating transcription factor gene transcriptions#Activating transcription factor samplings (Burton)}}


Two [[Ethylene responsive element gene transcriptions|ethylene response factors or ethylene responsive elements]] occur in the promoters of A1BG: positive strand, negative direction: ATTTCAAA at 1383 and negative strand, positive direction: ATTTCAAA at 2648. Both of these occur in the distal promoters on either side of A1BG. Sampling of ten random datasets looking for the ERE and its inverse complement found only one: ATTTCAAA at 934. Half way between ZSCAN22 and A1BG is about 2300 nucleotides and between ZNF497 and A1BG is about 2200. One is closer to ZSCAN22 and the other is closer to A1BG. The random occurrence is closer to the zinc finger than A1BG. The response element closer to A1BG is likely real even if inactive. As the real occurrences are more frequent and nearer mid points than the random result, it is likely that the EREs are not random but real and perhaps active.
=====ATFKs=====
{{main|Activating transcription factor gene transcriptions#Activating transcription factor samplings (Kilberg)}}


====Gibberellic acid response elements====
====Affinity Capture-Western; Two-hybrid transcription factors====
{{main|GARE gene transcriptions}}
{{main|Aft1p gene transcriptions}}
The TAACAAA box (GARE) has an inverse complement TTTGTTA at 230 nucleotides from ZSCAN22 toward A1BG. This is in the distal promoter for A1BG or is a response element for ZSCAN22 rather than A1BG. The GARE-like 1 TAACA(A/G)A occurs as an inverse complement TTTGTTA at 230 nucleotides from the gene end of ZSCAN22 and may be a response element for ZSCAN22.


Random datasets have been sampled using TAAC(A/G)(A/G/T)A as a general form to test for GARE-like 2 (TAACGTA), GARE-like 1 (TAACA(A/G)A) and GARE (TAACAAA). Occurrences of TAACGGA, TAACGAA, and TAACATA have been listed but disallowed from analysis. The same has been done for the inverse complement T(A/C/T)(C/T)GTTA with disallowing TCCGTTA, TTCGTTA, and TATGTTA.
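The allowed and disallowed sequences can be derived by expanding the general form and removing the three named elements. A minimal sketch; the helper names <code>expand</code> and <code>inverse_complement</code> are illustrative and not from the study.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def inverse_complement(seq: str) -> str:
    """Complement each nucleotide, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def expand(choices):
    """Expand per-position nucleotide choices into the set of concrete sequences."""
    return {"".join(p) for p in product(*choices)}


# General form TAAC(A/G)(A/G/T)A.
general = expand(["T", "A", "A", "C", "AG", "AGT", "A"])

# GARE (TAACAAA), GARE-like 1 (TAACA(A/G)A) and GARE-like 2 (TAACGTA).
allowed = {"TAACAAA", "TAACAGA", "TAACGTA"}

disallowed = general - allowed
inverse_disallowed = {inverse_complement(s) for s in disallowed}

print(sorted(disallowed))
print(sorted(inverse_disallowed))
```

The expansion yields six sequences, of which three are disallowed, and their inverse complements are the three disallowed forms of T(A/C/T)(C/T)GTTA.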
=====AFTs=====
{{main|Aft1p gene transcriptions#AFT1 samplings}}


Half of the random datasets (10 for TAAC(A/G)(A/G/T)A) and (10 for T(A/C/T)(C/T)GTTA) produced no results. Individual occurrences of TAACAAA at 3592, TAACAAA at 1376, TAACAGA at 1059, TAACAAA at 961, or TAACGTA at 932 were recorded. For the inverse complements: TACGTTA at 3929, TCTGTTA at 3622, TACGTTA at 3228, TTTGTTA at 3209, TTTGTTA at 1168, TTTGTTA at 453 were recorded. One dataset produced three inverse complements: TACGTTA at 3929, TACGTTA at 3228, TTTGTTA at 1168. All occurrences were in the A1BG negative direction UTR or distal promoters in both directions. None were even close to the real occurrences. This suggests that the few real occurrences were not random.
====Box As====
{{main|A box gene transcriptions#Box A samplings}}


====Hypoxia response elements====
====C-boxes====
{{main|Hypoxia response element gene transcriptions}}
{{main|C box gene transcriptions}}
C-boxes come in several varieties:


====CACA elements====
=====C-boxes (Johnson)=====
{{main|Hypoxia response element gene transcriptions}}
{{main|C box gene transcriptions#Johnson C-box samplings}}


====Pyrimidine boxes====
=====C boxes (Samarsky)=====
{{main|Pyrimidine box gene transcriptions}}
{{main|C box gene transcriptions#Samarsky C box samplings}}


====TAT boxes====
=====C boxes (Voronina)=====
{{main|TAT box gene transcriptions}}
{{main|C box gene transcriptions#Voronina C box samplings}}
 
=====C boxes (Song)=====
{{main|C box gene transcriptions#Song C-box samplings}}
 
=====C boxes (Song hybrids)=====
{{main|C box gene transcriptions#Hybrid C, G box samplings}}
Hybrids: C/A-box (TGACGTAT), C/G-box (TGACGTGT), C/T-box (TGACGTTA).


====TATC boxes====
====CAMPs====
{{main|TATC box gene transcriptions}}
{{main|CRE box gene transcriptions#CRE samplings of the A1BG promoters}}


===General Regulatory Factors===
====ESRE====
{{main|Endoplasmic reticulum stress response element gene transcriptions}}
The endoplasmic reticulum stress response element (ESRE) has two parts: (1) CCAAT and (2) CCACG, which are tested separately and then compared to see whether any pair is separated by exactly nine nucleotides.


"General regulatory factors (GRFs), such as Reb1, Abf1, Rap1, Mcm1, and Cbf1, positionally organize yeast chromatin through interactions with a core consensus DNA sequence."<ref name=Rossi>{{ cite journal
=====CCAAT=====
|author=Matthew J. Rossi
{{main|Endoplasmic reticulum stress response element gene transcriptions#CCAAT samplings}}
|author2=William K.M. Lai
|author3=B. Franklin Pugh
|title=Genome-wide determinants of sequence-specific DNA binding of general regulatory factors
|journal=Genome Research
|date=21 March 2018
|volume=28
|issue=
|pages=497-508
|url=https://genome.cshlp.org/content/28/4/497.full
|arxiv=
|bibcode=
|doi=10.1101/gr.229518.117
|pmid=29563167
|accessdate=31 August 2020 }}</ref>


"Ribosome biogenesis in ''Saccharomyces cerevisiae'' involves a regulon of >200 genes (Ribi genes) coordinately regulated in response to nutrient availability and cellular growth rate. Two ''cis''-acting elements called PAC and RRPE are known to mediate Ribi gene repression in response to nutritional downshift. [Most] Ribi gene promoters also contain binding sites for one or more General Regulatory Factors (GRFs), most frequently Abf1 and Reb1, and that these factors are enriched ''in vivo'' at Ribi promoters. Abf1/Reb1/Tbf1 promoter association was required for full Ribi gene expression in rich medium and for its modulation in response to glucose starvation, characterized by a rapid drop followed by slow recovery. Such a response did not entail changes in Abf1 occupancy, but it was paralleled by a quick increase, followed by slow decrease, in Rpd3L histone deacetylase occupancy. [...] Abf1 site disruption also abolished Rpd3L complex recruitment in response to starvation. Extensive mutational analysis of the ''DBP7'' promoter revealed a complex interplay of Tbf1 sites, PAC and RRPE in the transcriptional regulation of this Ribi gene. [...] GRFs [are] multifaceted players in Ribi gene regulation both during exponential growth and under repressive conditions."<ref name=Bosio>{{ cite journal
=====CCACG=====
|author=Maria Cristina Bosio, Beatrice Fermi, Gloria Spagnoli, Elisabetta Levati, Ludmilla Rubbi, Roberto Ferrari, Matteo Pellegrini, Giorgio Dieci
{{main|Endoplasmic reticulum stress response element gene transcriptions#CCACG samplings}}
|title=Abf1 and other general regulatory factors control ribosome biogenesis gene expression in budding yeast
|journal=Nucleic Acids Research
|date=5 May 2017
|volume=45
|issue=8
|pages=4493-4506
|url=https://academic.oup.com/nar/article/45/8/4493/2965382
|arxiv=
|bibcode=
|doi=10.1093/nar/gkx058
|pmid=
|accessdate=8 June 2021 }}</ref>


====Abfm regulatory factors====
According to So (2018) the endoplasmic reticulum stress response element should be CCAAT-N9-CCACG. Samplings demonstrate that the ideal CCAAT-N9-CCACG or its inverse complement does not occur on either side of A1BG or close to ZSCAN22 or ZNF497.
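A search for the full CCAAT-N9-CCACG element and its inverse complement can be sketched with a regular expression. This is an illustration under stated assumptions, not the sampling program used for this page: the function name <code>find_esre</code> and the toy sequence are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Lookahead so that overlapping candidates are also reported.
ESRE = re.compile(r"(?=(CCAAT[ACGT]{9}CCACG))")


def inverse_complement(seq: str) -> str:
    """Complement each nucleotide, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_esre(seq: str):
    """Return (1-based start, matched text on the given strand, orientation) tuples."""
    hits = [(m.start() + 1, m.group(1), "consensus") for m in ESRE.finditer(seq)]
    flipped = inverse_complement(seq)
    for m in ESRE.finditer(flipped):
        # Map the start on the flipped strand back to the given strand.
        start = len(seq) - (m.start() + len(m.group(1))) + 1
        hits.append((start, inverse_complement(m.group(1)), "inverse complement"))
    return sorted(hits)


# Toy sequence (not real data): CCAAT, nine nucleotides, CCACG.
toy = "GG" + "CCAAT" + "ACGTACGTA" + "CCACG" + "TT"
print(find_esre(toy))
```

A spacer of eight or ten nucleotides produces no hit, which matches the requirement of exactly nine nucleotides between the two parts.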
{{main|Abf1 regulatory factor gene transcriptions}}
The general consensus sequence for Abf1 CGTNNNNN(A/G)(C/T)GA(C/T) occurs on both sides of A1BG but only in the distal promoters. The real consensus sequence yielded only three results: one in the negative direction and two in the positive, all in the distal promoters. The random datasets, with even numbered datasets assigned to the negative direction and odd numbered datasets to the positive direction, yielded four sequences in total: one each in the UTR, proximal promoter and distal promoter for the negative direction and one in the distal promoter for the positive direction. While the differences between real and random are small (three vs. four; all distal vs. UTR, proximal and two distal), they are likely significant, as the random datasets (10) should have encompassed the real (2, each side of A1BG) but this did not occur.


Specific "sequences considered as exact Abf1 motif occurrences": CGTNNNNNACGA(C/T), CGTNNNNNA(C/T)GAC, CGTNNNNNA(C/T)GA(C/T), CGTNNNNN(A/G)(C/T)GA(C/T) (Abfm).<ref name=Rossi>{{ cite journal
====Hap motif====
|author=Matthew J. Rossi, William K.M. Lai and B. Franklin Pugh
{{main|CAAT box gene transcriptions#Heme-activated protein (Hap) samplings|Endoplasmic reticulum stress response element gene transcriptions#CCAAT samplings}}
|title=Genome-wide determinants of sequence-specific DNA binding of general regulatory factors
|journal=Genome Research
|date=21 March 2018
|volume=28
|issue=
|pages=497-508
|url=https://genome.cshlp.org/content/28/4/497.full
|arxiv=
|bibcode=
|doi=10.1101/gr.229518.117
|pmid=29563167
|accessdate=31 August 2020 }}</ref>


{|class="wikitable"
====G-boxes====
|-
{{main|G box gene transcriptions}}
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 0 || 2 || 0 || 0
|-
| Randoms || UTR || arbitrary negative || 1 || 10 || 0.1 || 0.05
|-
| Randoms || UTR || alternate negative || 0 || 10 || 0 || 0.05
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 1 || 10 || 0.1 || 0.05
|-
| Randoms || Proximal || alternate negative || 0 || 10 || 0 || 0.05
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Distal || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || Distal || arbitrary negative || 1 || 10 || 0.1 || 0.1
|-
| Randoms || Distal || alternate negative || 1 || 10 || 0.1 || 0.1
|-
| Reals || Distal || positive || 2 || 2 || 1 || 1
|-
| Randoms || Distal || arbitrary positive || 1 || 10 || 0.1 || 0.2
|-
| Randoms || Distal || alternate positive || 3 || 10 || 0.3 || 0.2
|}


Comparison:
=====G-box (CACGTG)=====
{{main|Phosphate starvation-response transcription factor gene transcriptions#Pho samplings|Complex locus A1BG and ZNF497#Phors}}


The occurrences of real Abfms are greater than the randoms. This suggests that the real Abfms are likely active or activable.
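The comparison can be made explicit from the distal promoter, positive direction rows of the table, where an occurrence is the number of results divided by the number of strands or datasets sampled. The function name below is illustrative only.

```python
def occurrence(count: int, samples: int) -> float:
    """Results per strand (reals) or per dataset (randoms)."""
    return count / samples


# Distal promoter, positive direction, values taken from the table.
real = occurrence(2, 2)               # 2 results on 2 strands
random_arbitrary = occurrence(1, 10)  # 1 result in 10 datasets
random_alternate = occurrence(3, 10)  # 3 results in 10 datasets

random_average = (random_arbitrary + random_alternate) / 2
ratio = real / random_average

print(real, round(random_average, 2), round(ratio, 1))
```

For these rows the real occurrence is about five times the random average.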
====GCN4 motif====
{{main|Gcn4p gene transcriptions}}


====Cbf1 regulatory factors====
=====GCREs (Gcn4)=====
{{main|Cbf1 regulatory factor gene transcriptions}}
{{main|Gcn4p gene transcriptions#GCRE samplings}}
The consensus sequence TCACGTGA<ref name=Rossi/> produced no real or random results.


====Mcm1 regulatory factors====
====Migs====
{{main|Mcm1 regulatory factor gene transcriptions}}
{{main|Mig1p gene transcriptions}}
Neither TT(A/T)CCNN(A/T)TNGG(A/T)AA nor TTNCCNNNTNNGGNAA produced any real or random results.


====Rap1 regulatory factors====
====Nuclear factors====
{{main|Rap1 regulatory factor gene transcriptions}}
{{main|Nuclear factor gene transcriptions}}
When the Rap1 motif was restricted to ACCCRNRCA<ref name=Rossi/>, no real results occurred. However, testing the ten random datasets for ACCCRNRCA and its inverse complement yielded five consensus sequence results and four inverse complements. Two were in the UTR of A1BG from the negative direction. One was in the proximal promoter from the positive direction, and the remaining five were in the distal promoters.


The reduced consensus (A/G)(A/C)ACCC(A/G)N(A/G)C(A/C)(C/T)(A/C)<ref name=Rossi/> had one real result, GAACCCACACCTC, in the positive direction at 1807, less than halfway from ZNF497. Of the ten random datasets, only one had a consensus result, GCACCCGGGCATC at 1454, and only one had an inverse complement, TATGCCTGGGTTT at 1380. Both the real and the random sequences were in the distal promoter, closer to the zinc finger than to A1BG. The occurrence of only about one random result per ten datasets suggests that such a result is rarely random, so the real occurrence is likely active as a regulatory response.
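The matches quoted for the reduced Rap1 consensus can be checked mechanically. The following is a minimal sketch, not the tool used for this study: the helper names <code>consensus_to_regex</code> and <code>inverse_complement</code> are hypothetical, and only the parenthesized (X/Y) notation and N are handled (IUPAC letters such as R are not).

```python
import re

def consensus_to_regex(consensus: str) -> str:
    """Translate notation such as (A/G)N into a regular expression (hypothetical helper)."""
    # (A/G) becomes the character class [AG]
    pattern = re.sub(
        r"\(([ACGT/]+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )
    # N stands for any nucleotide
    return pattern.replace("N", "[ACGT]")

def inverse_complement(seq: str) -> str:
    """Complement each nucleotide, then reverse the sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

RAP1_REDUCED = "(A/G)(A/C)ACCC(A/G)N(A/G)C(A/C)(C/T)(A/C)"
rap1 = re.compile(consensus_to_regex(RAP1_REDUCED))

# Sequences quoted in the text above
real_hit = "GAACCCACACCTC"
random_hit = "GCACCCGGGCATC"
random_inverse = "TATGCCTGGGTTT"

print(bool(rap1.fullmatch(real_hit)))
print(bool(rap1.fullmatch(random_hit)))
# The inverse complement matches only after it is turned back around
print(bool(rap1.fullmatch(inverse_complement(random_inverse))))
```

Under these assumptions all three quoted sequences satisfy the reduced consensus, the last one on the opposite strand.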
=====NFATs=====
{{main|Nuclear factor of activated T cell gene transcriptions (NFAT)#NFAT samplings}}


The full consensus sequence C(A/C/G)(A/C/G)(A/G)(C/G/T)C(A/C/T)(A/G/T)(C/G/T)(A/G/T)(A/C/G)(A/C)(A/C/T)(A/C/T)<ref name=Rossi/> gave four to six results in the UTR negative direction, one in the core promoter in the positive direction, two in the proximal promoter in the negative direction and one in the positive direction. In the distal promoter each direction had eight to nine results.
=====HNF6s=====
{{main|HNF gene transcriptions#HNF6 samplings}}


For the random datasets: the UTR ranged from zero to four, the core promoter produced only zero to one, the proximal promoter produced zero to one, and the distal promoter contained one to seven for either direction.
====T boxes====
{{main|T box gene transcriptions}}


Comparing the two, the real UTR, proximal promoter, and distal promoter usually exceeded the random results. This suggests that some of the real results could just be due to random associations of nucleotides, but the rest are likely real.
=====TboxCs=====
{{main|T box gene transcriptions#T box (Conlon) samplings}}


====Reb1 regulatory factors====
=====TboxZs=====
{{main|Reb1 general regulatory factor gene transcriptions}}
{{main|T box gene transcriptions#T box (Zhang) samplings}}
The Reb1 consensus sequence TTACCC(G/T) has three occurrences in A1BG: UTR at 3661 and two distal promoters at 3170 and 2912 in the positive direction, all more than halfway from either Zn finger.


Using the random datasets: there are three sequences in two datasets within the UTR: TTACCCG at 4135 and CGGGTAA at 3979, with AGGGTAA at 3112; the core promoters contained only TTACCCG at 4416 in the positive direction; the proximal promoters contained only TTACCCT at 4250 in the positive direction; and the distal promoters contained four to five sequences: five in the negative direction all more than half way to ZSCAN22 and four in the positive direction: one TTACCCG at 2965 more than halfway toward A1BG and the remaining three less than halfway.
====Vboxes====
{{main|V box gene transcriptions#V box samplings}}


The extended Reb1 consensus sequence ATTACCCGAA had no locations in either direction or in random datasets for either the extended consensus sequence or its inverse complement.
====Z-boxes====
{{main|Z box gene transcriptions}}


====Tbf1 regulatory factors====
=====ZboxGs=====
{{main|Tbf1 regulatory factor gene transcriptions}}
{{main|Z box gene transcriptions#General Z-box (ZboxG) samplings}}
The usual consensus sequence for Tbf1 ARCCCTAA<ref name=Bosio/> occurs three times around A1BG: once in the negative direction within the UTR: the inverse complement TTAGGGTT at 3978 and twice in the positive direction: negative strand TTAGGGCT at 2768 and positive strand AACCCTAA at 2545. Both are closer to A1BG than either zinc finger.


In the random datasets, only the inverse complements occur in one data set: TTAGGGCT at 3616, TTAGGGTT at 198 representing the positive direction, distal promoter. The second is less than halfway from ZNF497.
=====ZboxSps=====
{{main|Z box gene transcriptions#Z-box (ZboxSp) samplings}}


===Basic leucine zipper (bZIP) class response elements===
===Helix-turn-helix (HTH) transcription factors===
"Most bZIP proteins show high binding affinity for the [[ACGT-containing element gene transcriptions|ACGT motifs]], which include CACGTG (G box), GACGTC (C box), TACGTA (A box), AACGTT (T box), and a GCN4 motif, namely TGA(G/C)TCA (Landschulz et al., 1988; Nijhawan et al., 2008)."<ref name=Zhang/>
{{main|Helix-turn-helix transcription factors}}
Gene ID: 4602 is MYB [myeloblastosis] MYB proto-oncogene, transcription factor on 6q23.3: "This gene encodes a protein with three HTH DNA-binding domains that functions as a transcription regulator. This protein plays an essential role in the regulation of hematopoiesis. This gene may be aberrently expressed or rearranged or undergo translocation in leukemias and lymphomas, and is considered to be an oncogene. Alternative splicing results in multiple transcript variants."<ref name=RefSeq4602>{{ cite web
|author=RefSeq
|title=MYB MYB proto-oncogene, transcription factor [ Homo sapiens (human) ]
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=January 2016
|url=https://www.ncbi.nlm.nih.gov/gene/4602
|accessdate=7 February 2021 }}</ref>
 
====CadC binding domains====
{{main|CadC binding domain gene transcriptions#Cadaverine C samplings}}


"A majority of the plant bZIP proteins isolated to date recognize elements with an ACGT core (Foster et al., 1994)."<ref name=Nijhawan>{{ cite journal | author = Nijhawan A, Jain M, Tyagi AK, Khurana JP
====Factor II B recognition elements====
| title = Genomic survey and gene expression analysis of the basic leucine zipper transcription factor family in rice
{{main|Factor II B recognition element gene transcriptions#BREu samplings}}
| journal = Plant Physiology
 
| volume = 146
====Forkhead boxes====
| issue = 2
{{main|Forkhead box gene transcriptions#Forkhead box samplings}}
| pages = 333–50
 
| date = February 2008
====Homeoboxes====
| pmid = 18065552
{{main|Homeobox gene transcriptions#Homeobox samplings}}
| doi = 10.1104/pp.107.112821 }}</ref>


"Most recombinant bZIP proteins can interact with ACGT elements derived from different plant genes, albeit with different affinity. Systematic protein/DNA binding studies have shown that sequences flanking the ACGT core affect bZIP protein binding specificity. These studies have provided the basis for a concise ACGT nomenclature and defined high-affinity A-box, C-box, and G-box elements."<ref name=Foster>{{ cite journal
====Homeodomains====
|author=Randy Foster, Takeshi Izawa and Nam-Hai Chua
{{main|Homeobox gene transcriptions#Homeodomain samplings}}
|title=Plant bZIP proteins gather at ACGT elements
|journal=FASEB
|date=1 February 1994
|volume=8
|issue=2
|pages=192-200
|url=https://faseb.onlinelibrary.wiley.com/doi/pdfdirect/10.1096/fasebj.8.2.8119490
|arxiv=
|bibcode=
|doi=10.1096/fasebj.8.2.8119490
|pmid=8119490
|accessdate=25 June 2021 }}</ref>


"HY5 binds to the promoter of light-responsive genes featuring [[ACGT-containing element gene transcriptions|"ACGT-containing elements"]] such as the G-box (CACGTG), C-box (GACGTC), Z-box (ATACGGT), and A-box (TACGTA) (4, 6)."<ref name=Nawkar>{{ cite journal
====HSE3 (Eastmond)====
|author=Ganesh M. Nawkar, Chang Ho Kanga, Punyakishore Maibam, Joung Hun Park, Young Jun Jung, Ho Byoung Chae, Yong Hun Chi, In Jung Jung, Woe Yeon Kim, Dae-Jin Yun, and Sang Yeol Lee
{{main|Hsf1p gene transcriptions#HSE3 (Eastmond) samplings}}
|title=HY5, a positive regulator of light signaling, negatively controls the unfolded protein response in ''Arabidopsis''
|journal=Proceedings of the National Academy of Sciences USA
|date=21 February 2017
|volume=114
|issue=8
|pages=2084-89
|url=https://www.pnas.org/content/pnas/114/8/2084.full.pdf
|arxiv=
|bibcode=
|doi=10.1073/pnas.1609844114
|pmid=
|accessdate=24 June 2021 }}</ref>


An [[ACGT-containing element gene transcriptions|ACGT element]] is its own inverse complement. Random datasets had 5 to 9 elements (ACGT) per dataset in the UTR negative direction toward A1BG. The real dataset had 8 in the UTR for the negative strand and one on the positive strand. This suggests that the negative strand, negative direction real occurrences could be random, but the occurrence of only one on the positive strand is likely real.
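The statement that ACGT is its own inverse complement, and its consequence for counting on the two strands, can be illustrated with a short sketch. The toy sequence and the helper names below are hypothetical and are not taken from the A1BG data.

```python
def inverse_complement(seq: str) -> str:
    """Complement each nucleotide, then reverse the sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_all(motif: str, seq: str) -> list[int]:
    """Return every 0-based start position of motif in seq."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]

# ACGT reads the same on both strands
print(inverse_complement("ACGT") == "ACGT")

# Hypothetical toy sequence, for illustration only
toy = "TTACGTGGCACGTA"
forward_hits = find_all("ACGT", toy)
reverse_hits = find_all("ACGT", inverse_complement(toy))

# Each hit on one strand has a mirrored hit on the other strand
mirrored = sorted(len(toy) - len("ACGT") - i for i in forward_hits)
print(forward_hits, reverse_hits, mirrored == reverse_hits)
```

Because the motif is palindromic, the same physical site is found whichever strand is scanned, so a per-strand tally has to decide on one convention to avoid counting a site twice.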
====HSE4 (Eastmond)====
{{main|Hsf1p gene transcriptions#HSE4 (Eastmond) samplings}}


The real positive direction core promoter from ZNF497 toward A1BG has two ACGT strands. In the positive transcription direction three of five random datasets had 1 or 2 ACGT sequences. But, the real sequences are at or within 4445 nucleotides from ZNF497, whereas the three random datasets are between 4445 and 4560: at 4489, 4448, and 4478 with the transcription start site at 4300 nts. The real promoters on the positive direction were only tested to 4445 nts from ZNF497. So the random datasets kept to the same constraint as the real tests only produced one core promoter at 4401. This excess of real core promoters in the positive direction suggests they are real.
====HSE8 GAP1 (Eastmond)====
{{main|Hsf1p gene transcriptions#HSE8 GAP1 (Eastmond) samplings}}


Regarding proximal promoters, only two exist both in the negative direction. The random datasets (3) had five proximal promoters in the negative direction for an average just over one each. One had three sequences suggesting that only two real sequences could be random.
====HSE9 GAP2 (Eastmond)====
{{main|Hsf1p gene transcriptions#HSE9 GAP2 (Eastmond) samplings}}


For distal promoters in the negative direction there are 17 real sequences. The random datasets had between 3 and 11 ACGT sequences in the arbitrary negative direction. For the arbitrary positive direction there were between 9 and 15 sequences.
====Hsf (Tang)====
{{main|Hsf1p gene transcriptions#Hsf (Tang) samplings}}


The positive direction had 44 sequences. Both sides of A1BG are not included in the random results suggesting they are real.
====MREs====
{{main|MYB recognition element gene transcriptions#MRE samplings}}


For total ACGT occurrences: the negative direction between ZSCAN22 and A1BG has 28 and the positive direction has 44, but the random datasets had between 9 and 20. The excess occurrences of ACGT in both directions suggest they are real.
====Tryptophan residues====
{{main|Interferon regulatory factor gene transcriptions#Tryptophan residue samplings}}


====A-boxes====
===Basic helix-loop-helix (bHLH) transcription factors===
{{main|A box gene transcriptions}}
{{main|Basic helix–loop–helix}}
"In particular, HY5 binds to the promoter of light-responsive genes featuring "ACGT-containing elements" such as the G-box (CACGTG), C-box (GACGTC), Z-box (ATACGGT), and A-box (TACGTA) (4, 6)."<ref name=Nawkar/>
"The [palindromic E-box motif (CACGTG)] motif is bound by the transcription factor Pho4, [and has the] class of basic helix-loop-helix DNA binding domain and core recognition sequence (Zhou and O'Shea 2011)."<ref name=Rossi>{{ cite journal
|author=Matthew J. Rossi, William K.M. Lai and B. Franklin Pugh
|title=Genome-wide determinants of sequence-specific DNA binding of general regulatory factors
|journal=Genome Research
|date=21 March 2018
|volume=28
|issue=
|pages=497-508
|url=https://genome.cshlp.org/content/28/4/497.full
|arxiv=
|bibcode=
|doi=10.1101/gr.229518.117
|pmid=29563167
|accessdate=31 August 2020 }}</ref>


{|class="wikitable"
"Pho4 bound to virtually all E-boxes ''in vitro'' (96%) [...]. That was not the case ''in vivo'', where only 5% were bound by Pho4, under activating conditions as determined by ChIP-seq [Zhou and O'Shea 2011]."<ref name=Rossi/>
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)  
|-
| Reals || UTR || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || UTR || arbitrary negative || 0 || 10 || 0 || 0.1
|-
| Randoms || UTR || alternate negative || 2 || 10 || 0.2 || 0.1
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Distal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary negative || 2 || 10 || 0.2 || 0.1
|-
| Randoms || Distal || alternate negative || 0 || 10 || 0 || 0.1
|-
| Reals || Distal || positive || 1 || 2 || 0.5 || 0.5
|-
| Randoms || Distal || arbitrary positive || 2 || 10 || 0.2 || 0.2
|-
| Randoms || Distal || alternate positive || 2 || 10 || 0.2 || 0.2
|}


Comparison:
"Pho4 possesses the intrinsic ability to bind every E-box, but ''in vivo'' is prevented from binding by chromatin unless assisted by chromatin remodelers (Svaren ''et al.'' 1994) that are targeted at promoter regions."<ref name=Rossi/>


The occurrences of real A-boxes are greater than the randoms. This suggests that the real A-boxes are likely active or activable.
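The comparison can be reproduced from the table columns. The sketch below assumes, from reading the tables, that "Occurrences" is "Numbers" divided by "Strands" (2 strands for the real sequences, 10 random datasets) and that "Averages" is the mean of the arbitrary and alternate random rows; the function names are hypothetical.

```python
def occurrence(numbers: int, strands: int) -> float:
    """Occurrences column: hits divided by strands or datasets sampled (assumed reading)."""
    return numbers / strands

def paired_average(arbitrary: float, alternate: float) -> float:
    """Averages column: mean of the arbitrary and alternate random rows (assumed reading)."""
    return round((arbitrary + alternate) / 2, 2)

# A-box rows for the UTR, negative direction, as listed in the table
real_utr = occurrence(1, 2)
random_utr = paired_average(occurrence(0, 10), occurrence(2, 10))

# A-box rows for the distal promoter, positive direction
real_distal = occurrence(1, 2)
random_distal = paired_average(occurrence(2, 10), occurrence(2, 10))

print(real_utr, random_utr, real_utr > random_utr)
print(real_distal, random_distal, real_distal > random_distal)
```

With these values the real occurrences exceed the random averages in both regions where a real A-box was found.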
"On one end of that spectrum, typical transcription factors like Pho4 do not appear to compete with nucleosomes and instead predominantly sample motifs that already exist in the [nucleosome-free promoter regions] NFRs generated by other factors. In vitro (PB-exo), Pho4 bound nearly every instance of an E-box motif across the yeast genome. However, in vivo, Pho4 is a low-abundance protein that is recruited to the nucleus upon phosphate starvation by other factors, to act at a few dozen genes (Komeili and O'Shea 1999; Zhou and O'Shea 2011). Since Pho4 appears unable to compete with nucleosomes, competent sites that are occluded by nucleosomes are invisible to Pho4."<ref name=Rossi/>


====Activating transcription factors====
The Pho4 homodimer binds to DNA sequences containing the bHLH binding site CACGTG.<ref name=Shao>{{ cite journal
{{main|Activating transcription factor gene transcriptions}}
|author=Dalei Shao, Caretha L. Creasy, Lawrence W. Bergman
The activating transcription factor (Burton) has the consensus sequence of (A/C/G)TT(A/G/T)C(A/G)TCA with one in the UTR in the negative direction and one in the negative direction in the distal promoter, whereas there are five consensus sequences in the positive direction.
|title = A cysteine residue in helixII of the bHLH domain is essential for homodimerization of the yeast transcription factor Pho4p
|journal = Nucleic Acids Research
|volume = 26
|issue = 3
|pages = 710–4
|date= 1 February 1998
|pmid = 9443961
|pmc = 147311
|doi = 10.1093/nar/26.3.710
|url = https://academic.oup.com/nar/article/26/3/710/1052045 }}</ref>


The random datasets have one consensus sequence in each direction in the distal promoters.
The upstream activating sequence (UAS) for Pho4p is CAC(A/G)T(T/G) in the promoters of ''HIS4'' and ''PHO5'' regarding phosphate limitation with respect to regulation of the purine and histidine biosynthesis pathways [66].<ref name=Tang>{{ cite journal
|author=Hongting Tang, Yanling Wu, Jiliang Deng, Nanzhu Chen, Zhaohui Zheng, Yongjun Wei, Xiaozhou Luo, and Jay D. Keasling
|title=Promoter Architecture and Promoter Engineering in ''Saccharomyces cerevisiae''
|journal=Metabolites
|date=6 August 2020
|volume=10
|issue=8
|pages=320-39
|url=https://www.mdpi.com/2218-1989/10/8/320/pdf
|arxiv=
|bibcode=
|doi=10.3390/metabo10080320
|pmid=32781665
|accessdate=18 September 2020 }}</ref>


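The (Kilberg) consensus sequence (A/G/T)TT(A/G/T)CATCA largely overlaps (A/C/G)TT(A/G/T)C(A/G)TCA of (Burton): it is a special case of (Burton) when the first nucleotide is A or G, but as written (Burton) does not allow T in the first position.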
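How the two consensus sequences relate can be checked by enumerating every concrete sequence each one allows. This is a minimal sketch using the patterns exactly as written above; the helper name <code>expand</code> is hypothetical.

```python
import re
from itertools import product

def expand(consensus: str) -> set[str]:
    """Enumerate every concrete sequence allowed by a consensus such as (A/G)TT."""
    # Each token is either a parenthesized choice like (A/G/T) or a single fixed nucleotide
    tokens = re.findall(r"\(([ACGT/]+)\)|([ACGT])", consensus)
    choices = [group.split("/") if group else [single] for group, single in tokens]
    return {"".join(combo) for combo in product(*choices)}

kilberg = expand("(A/G/T)TT(A/G/T)CATCA")
burton = expand("(A/C/G)TT(A/G/T)C(A/G)TCA")

shared = kilberg & burton
kilberg_only = kilberg - burton

print(len(kilberg), len(burton), len(shared))
# The Kilberg sequences outside Burton all begin with T
print(sorted(kilberg_only))
```

Under this reading, six of the nine (Kilberg) sequences also satisfy (Burton); the remaining three are those that start with T.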
bHLH proteins typically bind to a consensus sequence called an E-box, CANNTG.<ref name="pmid10319327">{{cite journal |author=Chaudhary J, Skinner MK |title=Basic helix-loop-helix proteins can act at the E-box within the serum response element of the c-fos promoter to influence hormone-induced promoter activation in Sertoli cells |journal=Mol. Endocrinol. |volume=13 |issue=5 |pages=774–86 |date=1999 |pmid=10319327 |doi=10.1210/mend.13.5.0271 }}</ref>


====Affinity Capture-Western; Two-hybrid transcription factors====
"A computer search for transcription promoter elements [...] showed the presence of a prominent TATA box 22 nucleotides upstream of the transcription start site and an [[Sp1]] site at position -42 to -33. The 5'-flanking sequence also contains three E boxes with CANNTG consensus sequences at positions -464 to -459, -90 to -85, and -52 to -47 that have been marked as [[E box]], [[E1 box]], and [[E2 box]], respectively [...]. In addition, the 5'-flanking region contains one or more [[GRE]], [[Aryl hydrocarbon receptor#DNA binding (xenobiotic response element – XRE)|XRE]], [[GATA1|GATA-1]], [[ATF4|GCN-4]], [[ETV4|PEA-3]], [[AP-1 (transcription factor)|AP1]], and [[Activating protein 2|AP2]] consensus motifs and also three imperfect CArG sites [...]."<ref name=Lenka>{{ cite journal
{{main|Aft1p gene transcriptions}}
|author=Nibedita Lenka, Aruna Basu, Jayati Mullick, and Narayan G. Avadhani
The upstream activating sequence (UAS) for Aft1p is PyPuCACCCPu or (C/T)(A/G)CACCC(A/G).<ref name=Tang/>
|title=The role of an E box binding basic helix loop helix protein in the cardiac muscle-specific expression of the rat cytochrome oxidase subunit VIII gene
|journal=The Journal of Biological Chemistry
|date=22 November 1996
|volume=271
|issue=47
|pages=30281–30289
|url=http://www.jbc.org/content/271/47/30281.full.pdf
|arxiv=
|bibcode=
|doi=10.1074/jbc.271.47.30281
|pmid=
|accessdate=7 February 2019 }}</ref>


In the UTR of A1BG in the negative direction there is only one occurrence on the positive strand, the inverse complement TGGGTGTG at 3185. Since a separate one does not occur on the negative strand the likelihood is only 0.5, although its complement does occur on the negative strand.
====AhRYs====
{{main|Xenobiotic response element gene transcriptions#TCDD*AhR DNA-binding consensus sequence sampling}}


From the random datasets, there is no UTR occurrence from the arbitrarily chosen negative direction. If the other dataset had been picked, there would be a UTR occurrence, the inverse complement TGGGTGTA at 4170. It would have the same likelihood of 0.5. The occurrence of the real TGGGTGTG at 3185 is likely random.
====AHRE-IIs====
{{main|Xenobiotic responsive element gene transcriptions#AHRE-II samplings}}


The only other real occurrences are in the distal promoter: negative direction - negative strand (1), positive strand (2), for a likelihood of 1.5 in the negative direction. If the occurrences in the UTR and the distal promoter in the negative direction are linked for transcription activation then both are likely active or activable.
====AEREs====
{{main|Antioxidant-electrophile responsive element gene transcriptions#AERE (Lacher) samplings}}


The random datasets had three from ten in the arbitrarily chosen negative direction for 0.3.
====CAT boxes====
{{main|CAT box gene transcriptions#CAT box samplings}}


The real positive direction had one TGCACCCG at 3324 from the negative strand only for 0.5. This has ACGT in the reverse direction.
====CAT-box-like elements====
{{main|CAT box gene transcriptions#CAT-box-like element samplings}}


The arbitrarily chosen positive direction had one inverse complement in the proximal promoter, TGGGTGTA at 4170, from ten datasets for 0.1.
===="Class C"====
{{main|N box gene transcriptions#"Class C" (Leal) samplings}}


The random datasets have four from ten for 0.4. This is close enough to suggest that the real occurrence in the positive direction is likely random.
===="Class I"====


If the combination of a UTR and distal promoter occurrences can work together to promote transcription in the negative direction on the positive strand which is the strand of A1BG, then the occurrences are likely active or activable, whereas the occurrence in the positive direction may not be unless it can co-activate with other transcription factors.
=====TCFs=====
{{main|Transcription factor 3 gene transcriptions#TCF3 samplings}}


====Box As====
====DIOXs====
{{main|Xenobiotic response element gene transcriptions#DIOX samplings}}


For box A (TGACTCT) which is a GCRE there is a consensus sequence on either side of A1BG whereas the random datasets have only 0.3 on either side.
====Enhancer boxes====
{{main|Enhancer box gene transcriptions#Enhancer box samplings}}


"The human [Transforming growth factor b1] TGFB1 promoter region contains two binding sequences for [Activator protein-1] AP-1, designated AP-1 box A (TGACTCT) and box B (TGTCTCA), which mediate the upregulation of promoter activity via a PKC-dependent pathway after exposure of cells to a high-glucose environment (Refs 37, 38)."<ref name=Paratore>{{ cite journal
=====ChoRE motifs=====
|author=Amber Paratore Sanchez and Kumar Sharma
{{main|Carbohydrate response element gene transcriptions}}
|title=Transcription factors in the pathogenesis of diabetic nephropathy
|journal=Expert Reviews in Molecular Medicine
|date=July 2009
|volume=11
|issue=
|pages=e13
|url=https://www.cambridge.org/core/journals/expert-reviews-in-molecular-medicine/article/transcription-factors-in-the-pathogenesis-of-diabetic-nephropathy/5459130CB955272C047982BE21FEE256
|arxiv=
|bibcode=
|doi=10.1017/S1462399409001057
|pmid=
|accessdate=1 October 2018 }}</ref>


"The human TGF-β1 promoter region contains two binding sequences for AP-1, designated AP-1 box A (TGACTCT) and box B (TGTCTCA), which mediate the up-regulation of promoter activity after [High glucose] HG stimulation."<ref name=Kokoroishi>{{ cite journal
=====CarbE1s=====
|author=Keiko Kokoroishi, Ayumu Nakashima, Shigehiro Doi, Toshinori Ueno, Toshiki Doi, Yukio Yokoyama, Kiyomasa Honda, Masami Kanawa, Yukio Kato, Nobuoki Kohno & Takao Masaki
{{main|Carbohydrate response element gene transcriptions#ATCTTG (CarbE1) samplings}}
|title=High glucose promotes TGF-β1 production by inducing FOS expression in human peritoneal mesothelial cells
|journal=Clinical and Experimental Nephrology
|date=28 May 2015
|volume=20
|issue=1
|pages=30-8
|url=https://link.springer.com/article/10.1007/s10157-015-1128-9
|arxiv=
|bibcode=
|doi=10.1007/s10157-015-1128-9
|pmid=26018137
|accessdate=14 August 2020 }}</ref>


{|class="wikitable"
=====CarbE2s=====
|-
{{main|Carbohydrate response element gene transcriptions#CACGTG (CarbE2) samplings}}
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)  
|-
| Reals || UTR || negative || 0 || 2 || 0 || 0
|-
| Randoms || UTR || arbitrary negative || 2 || 10 || 0.2 || 0.15
|-
| Randoms || UTR || alternate negative || 1 || 10 || 0.1 || 0.15
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Distal || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || Distal || arbitrary negative || 1 || 10 || 0.1 || 0.1
|-
| Randoms || Distal || alternate negative || 1 || 10 || 0.1 || 0.1
|-
| Reals || Distal || positive || 1 || 2 || 0.5 || 0.5
|-
| Randoms || Distal || arbitrary positive || 2 || 10 || 0.2 || 0.15
|-
| Randoms || Distal || alternate positive || 1 || 10 || 0.1 || 0.15
|}


Comparison:
=====CarbE3s=====
{{main|Carbohydrate response element gene transcriptions#TCCGCC (CarbE3) samplings}}


The occurrences of box A consensus sequences are greater than the randoms. This suggests that the real box A consensus sequences are likely active or activable.
=====Phors=====
{{main|Phosphate starvation-response transcription factor gene transcriptions#Pho samplings}}
Palindromic E-box motif (CACGTG).
 
=====E2 boxes=====
{{main|E2 box gene transcriptions#E2 box samplings}}
 
====GATAs====
{{main|GATA gene transcriptions#GATA samplings}}


====C-boxes====
====Gln3s====
{{main|C box gene transcriptions}}
{{main|GATA gene transcriptions#Staschke Gln3 samplings}}
C-boxes come in several varieties:
# described by Johnson (GAGGCCATCT, none occur on either side of A1BG),<ref name=Johnson>{{ cite journal
|author=PA Johnson, D Bunick, NB Hecht
|title=Protein Binding Regions in the Mouse and Rat Protamine-2 Genes
|journal=Biology of Reproduction
|date=1991
|volume=44
|issue=1
|pages=127-134
|url=https://academic.oup.com/biolreprod/article-pdf/44/1/127/10536199/biolreprod0127.pdf
|arxiv=
|bibcode=
|doi=
|pmid=
|accessdate=6 April 2019 }}</ref> none occurred for GAGGCCATCT in the random datasets or its inverse complement AGATGGCCTC,


====C boxes (Samarsky)====
====Glucocorticoid response elements====
# AGTAGT<ref name=Samarsky>{{ cite journal
{{main|Glucocorticoid response element gene transcriptions#Glu samplings}}
|author=Dmitry A. Samarsky, Maurille J.Fournier, Robert H.Singer and Edouard Bertrand
|title=The snoRNA box C/D motif directs nucleolar targeting and also couples snoRNA synthesis and localization
|journal=The European Molecular Biology Organization (EMBO) Journal
|date=1 July 1998
|volume=17
|issue=13
|pages=3747–3757
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1170710/pdf/003747.pdf
|arxiv=
|bibcode=
|doi=10.1093/emboj/17.13.3747
|pmid=9649444
|accessdate=2017-02-04 }}</ref>


The real promoters have four consensus sequences in the ZSCAN22 to A1BG UTR of A1BG (occurrence 2.0). There are no core or proximal promoter consensus sequences. There are two distal promoter sequences only in the positive direction for an occurrence of 1.0.
====ICRE (Lopes)====
{{main|Inositol, choline-responsive element gene transcriptions}}


The random datasets had two in the UTR for an occurrence of 0.2. There were no occurrences in the core promoters. There were two in the proximal promoters (arbitrary positive direction) for an occurrence of 0.1. In the distal promoters there were four in the arbitrary negative direction for an occurrence of 0.4 and four in the arbitrary positive direction for an occurrence of 0.4.
====ICRE (Schwank)====
{{main|Inositol, choline-responsive element gene transcriptions}}


The wide discrepancy between the real occurrences and the random occurrences suggests that the real occurrences are likely active or activable.
====Pho4====
{{main|Phosphate starvation-response transcription factor gene transcriptions#Phop samplings}}


====C boxes (Voronina)====
====QRDREs====
# described by Voronina (GGTGATG, positive strand, negative direction at 3798).<ref name=Voronina>{{ cite journal
{{main|Xenobiotic response element gene transcriptions#QRDRE samplings}}
|author=E. N. Voronina, T. D. Kolokol’tsova, E. A. Nechaeva, and M. L. Filipenko
|title=Structural–Functional Analysis of the Human Gene for Ribosomal Protein L11
|journal=Molecular Biology
|date=2003
|volume=37
|issue=3
|pages=362–371
|url=https://www.researchgate.net/profile/Elena_Voronina3/publication/263607045_Structural-Functional_Analysis_of_the_Human_Gene_for_Ribosomal_Protein_L11/links/5523af480cf27b5dc3795afa.pdf
|arxiv=
|bibcode=
|doi=
|pmid=
|accessdate=11 April 2019 }}</ref>


The real promoters have only one Voronina C box GGTGATG at 3798 on the positive strand in the negative direction in the ZSCAN-A1BG UTR for an occurrence of 0.5.
====Carbon source-responsive elements====
{{main|Carbon source-responsive element gene transcriptions}}


The random datasets also have one Voronina C box in the negative direction in the ZSCAN-A1BG UTR, but it is the inverse complement CATCACC at 3456, for an occurrence of 0.1. Random datasets also had four distal promoters for an occurrence of 0.2.
=====CATTCAs=====
{{main|Carbon source-responsive element gene transcriptions#CATTCA samplings}}


The random results do not contain the real results suggesting that the one real consensus sequence is likely active or activable.
=====TCCGs=====
{{main|Carbon source-responsive element gene transcriptions#TCCG samplings}}


====C boxes (Song)====
====XREs====
# described by Song (GACGTC, both sides of A1BG),<ref name=Song>{{ cite journal
{{main|Xenobiotic response element gene transcriptions#Xenobiotic response element samplings}}
|author=Young Hun Song, Cheol Min Yoo, An Pio Hong, Seong Hee Kim, Hee Jeong Jeong, Su Young Shin, Hye Jin Kim, Dae-Jin Yun, Chae Oh Lim, Jeong Dong Bahk, Sang Yeol Lee, Ron T. Nagao, Joe L. Key, and Jong Chan Hong
|title=DNA-Binding Study Identifies C-Box and Hybrid C/G-Box or C/A-Box Motifs as High-Affinity Binding Sites for STF1 and LONG HYPOCOTYL5 Proteins
|journal=Plant Physiology
|date=April 2008
|volume=146
|issue=4
|pages=1862–1877
|url=http://www.plantphysiol.org/content/plantphysiol/146/4/1862.full.pdf
|arxiv=
|bibcode=
|doi=10.1104/pp.107.113217
|pmid=
|accessdate=26 March 2019 }}</ref>


The real promoters have one UTR consensus sequence on the negative strand in the negative direction GACGTC at 4316 for an occurrence of 0.5. There is one C-box in the core promoter but on the positive strand in the positive direction GACGTC at 4316 for an occurrence of 0.5. There are none in the proximal promoters. In the distal promoters there are eight only on the positive strand in the positive direction for an occurrence of 4.0.
===Basic helix-loop-helix leucine zipper transcription factors===


The random datasets had two in the UTR for an occurrence of 0.2. There were none in the core or proximal promoters. The distal promoters had seven for an occurrence of 0.7 (with 0.8 in the arbitrary negative direction and 0.6 in the positive direction).
Basic helix-loop-helix leucine zipper transcription factors are, as their name indicates, transcription factors containing both [[Basic helix-loop-helix]] and [[leucine zipper]] motifs.


The real consensus sequences are well outside the random results, suggesting that the real sequences are likely active or activable.
Examples include [[Microphthalmia-associated transcription factor]] and [[Sterol regulatory element-binding protein]] (SREBP).


====C boxes (Song hybrids)====
MITF recognizes E-box (CAYRTG) and M-box (TCAYRTG or CAYRTGA) sequences in the promoter regions of target genes.<ref name=Hoek>{{cite journal | author = Hoek KS, Schlegel NC, Eichhoff OM, Widmer DS, Praetorius C, Einarsson SO, Valgeirsdottir S, Bergsteinsdottir K, Schepsky A, Dummer R, Steingrimsson E | title = Novel MITF targets identified using a two-step DNA microarray strategy | journal = Pigment Cell Melanoma Res. | volume = 21 | issue = 6 | pages = 665–76 | date = 2008 | pmid = 19067971 | doi = 10.1111/j.1755-148X.2008.00505.x }}</ref>
# hybrids described by Song:<ref name=Song/> C/A-box (TGACGTAT), C/G-box (TGACGTGT), C/T-box (TGACGTTA).


The real promoters have no hybrid C/A boxes and the random datasets only had two in the negative direction for an occurrence of 0.2.
[[Serum response element gene transcriptions]]: The SRE wild type (SREwt) contains the nucleotide sequence ACAGGATGTCCATATTAGGACATCTGC, of which CCATATTAGG is the CArG box, TTAGGACAT is the C/EBP box, and CATCTG is the E box.<ref name=Misra>{{ cite journal
|author=Ravi P. Misra
|author2=Azad Bonni
|author3=Cindy K. Miranti
|author4=Victor M. Rivera
|author5=Morgan Sheng
|author6=Michael E.Greenberg
|title=L-type Voltage-sensitive Calcium Channel Activation Stimulates Gene Expression by a Serum Response Factor-dependent Pathway
|journal=The Journal of Biological Chemistry
|date=14 October 1994
|volume=269
|issue=41
|pages=25483-25493
|url=http://www.jbc.org/content/269/41/25483.full.pdf
|arxiv=
|bibcode=
|doi=
|pmid=7929249
|accessdate=7 December 2019 }}</ref>


The real promoters have only one hybrid C/G box, on the positive strand in the positive direction in the distal promoter: ACACGTCA at 3962, for an occurrence of 0.5. The random datasets had only one C/G box, in the arbitrary negative direction in the distal promoter: TGACGTGT at 915, for an occurrence of 0.1.
"Serum response factor (SRF) is an important transcription factor that regulates cardiac and skeletal muscle genes during development, maturation and adult aging [17,18]. SRF regulates its target genes by binding to serum response elements (SREs), which contain a consensus CC(A/T)<sub>6</sub>GG (CArG) motif."<ref name=Zhang2017>{{ cite journal
|author=Xiaomin Zhang, Gohar Azhar, Jeanne Y. Wei
|title=SIRT2 gene has a classic SRE element, is a downstream target of serum response factor and is likely activated during serum stimulation
|journal=PLOS One
|date=21 December 2017
|volume=12
|issue=12
|pages=e0190011
|url=https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0190011
|arxiv=
|bibcode=
|doi=10.1371/journal.pone.0190011
|pmid=
|accessdate=23 February 2021 }}</ref>


It is suggested that the one C/G box hybrid is likely active or activable.
====CArG boxes====
{{main|CArG box gene transcriptions#CArG box samplings}}


The real promoters have no C/T box hybrid consensus sequences and the random datasets had two in the negative direction for an occurrence of 0.2.
====MITF E-boxes====
{{main|Enhancer box gene transcriptions#MITF E-box (CAYRTG) samplings}}


====CAMP response elements====
=====RREs=====
{{main|CRE box gene transcriptions}}
{{main|MYB recognition element gene transcriptions#RRE samplings}}
For a particular sequence of eight nucleotides, where each of A, C, G, and T is equally likely with a probability of 0.25, the probability of any specific eight is 0.25<sup>8</sup> or about 1.5 x 10<sup>-5</sup>. Over some 4560 nucleotides, the expected number of occurrences of TGACGTCA is therefore about 0.070.
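The arithmetic for a specific eight-nucleotide sequence can be checked with a short sketch. It assumes, as the text does, independent and equally likely nucleotides and uses the region length itself as the number of possible start positions; the function names are illustrative only and are not part of the original Basic programs.

```python
def specific_sequence_probability(length, p_base=0.25):
    """Probability that one window of `length` bases equals one specific sequence."""
    return p_base ** length


def expected_occurrences(region_length, motif_length, p_base=0.25):
    """Expected count of a specific motif in a region.

    Assumption: the number of start positions is approximated by the
    region length, as in the text above.
    """
    return region_length * specific_sequence_probability(motif_length, p_base)


p8 = specific_sequence_probability(8)
expected = expected_occurrences(4560, 8)
print(f"{p8:.2e}")        # 1.53e-05
print(f"{expected:.3f}")  # 0.070
```

An expected count this far below one is what makes a single real occurrence of TGACGTCA notable.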
Consensus sequence: CATCTG.


The real promoters have one on the negative strand, negative direction: TGACGTCA at 4317 in the UTR, where the occurrence is 0.5 for two strands. This suggests that TGACGTCA at 4317 is likely active or activable.
====M-boxes====
{{main|M box gene transcriptions}}
 
=====M box (Bertolotto)=====
{{main|M box gene transcriptions#M box (Bertolotto) samplings}}
 
=====M-box (Hoek)=====
{{main|M box gene transcriptions#M-box (Hoek) samplings}}


====ERSE====
=====M-box (Ripoll)=====
{{main|M box gene transcriptions#M-box (Ripoll) samplings}}


"The released aminoterminal of ATF6 (ATF6-N) then migrates to the nucleus and binds to the ER stress response element (ERSE) containing the consensus sequence CCAAT-N9-CCACG to activate genes encoding ER chaperones, ERAD components, and XBP1 (Chen et al., 2010; Yamamoto et al., 2004; Yoshida et al., 2001)."<ref name=So>{{ cite journal
====SER elements====
|author=Jae-Seon So
{{main|Serum response element gene transcriptions#SER samplings}}
|title=Roles of Endoplasmic Reticulum Stress in Immune Responses
|journal=Molecules and Cells
|date=31 August 2018
|volume=41
|issue=8
|pages=705-16
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6125421/
|arxiv=
|bibcode=
|doi=10.14348/molcells.2018.0241
|pmid=30078231
|accessdate=5 September 2020 }}</ref>


The endoplasmic reticulum stress response element (ERSE) has two parts: (1) CCAAT and (2) CCACG, which are tested separately and then compared to see whether any pair of parts has exactly nine nucleotides between them.
===Basic helix-span-helix===


{|class="wikitable"
====Activating proteins====
|-
{{main|Activating protein gene transcriptions}}
! Reals or randoms for part 1 !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 2 || 2 || 1.0 || 1.0
|-
| Randoms || UTR || arbitrary negative || 26 || 10 || 2.6 || 2.8
|-
| Randoms || UTR || alternate negative || 30 || 10 || 3.0 || 2.8
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || negative || 0 || 10 || 0 || 0.15
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || positive || 3 || 10 || 0.3 || 0.15
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || negative || 3 || 10 || 0.3 || 0.35
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || positive || 4 || 10 || 0.4 || 0.35
|-
| Reals || Distal || negative || 3 || 2 || 1.5 || 1.5
|-
| Randoms || Distal || negative || 43 || 10 || 4.3 || 4.8
|-
| Reals || Distal || positive || 3 || 2 || 1.5 || 1.5
|-
| Randoms || Distal || positive || 53 || 10 || 5.3 || 4.8
|}


Comparison:
=====AP2as=====
The real occurrences are systematically lower than the randoms for the UTR. For the distals, the real occurrences are also systematically lower than the randoms. These two results suggest that the reals are likely active or activable.
{{main|Activating protein gene transcriptions#AP-2 alpha consensus sequences}}


The results here for part one are in agreement with those for the [[CAAT box gene transcriptions|heme-activated protein]] (Hap).
=====APCo1s=====
{{main|Activating protein gene transcriptions#Activating protein samplings (Cohen)}}


The second part is CCACG.<ref name=So/> This same nucleotide sequence occurs in the perfect palindrome G-boxes known as the G-box motif.<ref name=Oeda>{{ cite journal
=====APCo2s=====
|author=K Oeda, J Salinas, and N H Chua
{{main|Activating protein gene transcriptions#Activating protein (Cohen2) samplings}}
|title=A tobacco bZip transcription activator (TAF-1) binds to a G-box-like motif conserved in plant genes
|journal=The EMBO Journal
|date=July 1991
|volume=10
|issue=7
|pages=1793–1802
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC452853/
|arxiv=
|bibcode=
|doi=
|pmid=2050116
|accessdate=2017-02-13 }}</ref> It also occurs within a second activating protein response element GCCCACGGG.<ref name=Murata>{{ cite journal
|author=Takayuki Murata, Chieko Noda, Yohei Narita1, Takahiro Watanabe, Masahiro Yoshida, Keiji Ashio, Yoshitaka Sato, Fumi Goshima, Teru Kanda, Hironori Yoshiyama, Tatsuya Tsurumi, and Hiroshi Kimura
|title=Induction of Epstein-Barr Virus Oncoprotein Latent Membrane Protein 1 (LMP1) by Transcription Factors Activating Protein 2 (AP-2) and Early B Cell Factor (EBF)
|journal=Journal of Virology
|date=27 January 2016
|volume=
|issue=
|pages=
|url=https://jvi.asm.org/content/jvi/early/2016/01/21/JVI.03227-15.full.pdf
|arxiv=
|bibcode=
|doi=10.1128/JVI.03227-15
|pmid=
|accessdate=4 October 2020 }}</ref>


{|class="wikitable"
=====APM3Ns=====
|-
{{main|Activating protein gene transcriptions#Activating protein samplings (Murata, 3N)}}
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)  
|-
| Reals || UTR || negative || 5 || 2 || 2.5 || 2.5
|-
| Randoms || UTR || arbitrary negative || 11 || 10 || 1.1 || 1.4
|-
| Randoms || UTR || alternate negative || 17 || 10 || 1.7 || 1.4
|-
| Reals || Core || negative || 0 || 2 || 0 || 0.5
|-
| Randoms || Core || negative || 0 || 10 || 0 || 0.1
|-
| Reals || Core || positive || 2 || 2 || 1.0 || 0.5
|-
| Randoms || Core || positive || 2 || 10 || 0.2 || 0.1
|-
| Reals || Proximal || negative || 1 || 2 || 0.5 || 0.25
|-
| Randoms || Proximal || negative || 0 || 10 || 0 || 0.1
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0.25
|-
| Randoms || Proximal || positive || 2 || 10 || 0.2 || 0.1
|-
| Reals || Distal || negative || 9 || 2 || 4.5 || 10
|-
| Randoms || Distal || negative || 20 || 10 || 2.0 || 3.05
|-
| Reals || Distal || positive || 31 || 2 || 15.5 || 10
|-
| Randoms || Distal || positive || 41 || 10 || 4.1 || 3.05
|}
The real occurrences are systematically higher than the randoms, suggesting that the second part of the endoplasmic reticulum stress response element is likely active or activable.


According to So (2018), the endoplasmic reticulum stress response element should be CCAAT-N9-CCACG. Samplings below demonstrate that neither the ideal CCAAT-N9-CCACG nor its inverse complement occurs on either side of A1BG or close to ZSCAN22 or ZNF497.
=====APM4Ns=====
{{main|Activating protein gene transcriptions#Activating protein samplings (Murata, 4N)}}


The Basic programs testing the consensus sequence CCAAT-N9-CCACG (starting with SuccessablesERSECo.bas) or its inverse complement CGTGG-N9-ATTGG (starting with SuccessablesERSECoci.bas) compare these nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The programs looked for, and found:
=====Yao1s=====
# Negative strand, negative direction: 0.
{{main|Activating protein gene transcriptions#Activating protein samplings (Yao1)}}
# Positive strand, negative direction: 0.
# Negative strand, positive direction: 0.
# Positive strand, positive direction: 0.
# inverse complement, negative strand, negative direction: 0.
# inverse complement, positive strand, negative direction: 0.
# inverse complement, negative strand, positive direction: 0.
# inverse complement, positive strand, positive direction: 0.
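The two-part search with a fixed nine-nucleotide spacer can be sketched as a pair of regular expressions. This is a minimal illustration in Python, not the original Basic programs; the helper name and the test sequence are hypothetical, and the test sequence is not taken from the A1BG locus.

```python
import re

# CCAAT-N9-CCACG and its inverse complement CGTGG-N9-ATTGG.
# The lookahead (?=...) lets overlapping sites be reported.
ERSE = re.compile(r"(?=(CCAAT[ACGT]{9}CCACG))")
ERSE_INVERSE_COMPLEMENT = re.compile(r"(?=(CGTGG[ACGT]{9}ATTGG))")


def find_sites(pattern, sequence):
    """Return the 1-based start position of every match in `sequence`."""
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]


# Hypothetical sequence containing one ideal element starting at position 3.
example = "GG" + "CCAAT" + "ACGTACGTA" + "CCACG" + "TT"

forward_sites = find_sites(ERSE, example)
inverse_sites = find_sites(ERSE_INVERSE_COMPLEMENT, example)
print(forward_sites)  # [3]
print(inverse_sites)  # []
```

Running both patterns over each strand and each direction would reproduce the eight cases listed above; an empty list corresponds to a count of 0.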


====Hap motif====
=====Yao2s=====
{{main|Activating protein gene transcriptions#Activating protein samplings (Yao2)}}


The upstream activating sequence (UAS) for the Hap4p is CCAAT.<ref name=Tang/>
=====Yao3s=====
{{main|Activating protein gene transcriptions#Activating protein samplings (Yao3)}}
"[[Pemphigus foliaceus]] (PF) is an autoimmune disease, endemic in Brazilian rural areas, characterized by acantholysis and accompanied by complement activation, with generalized or localized distribution of painful epidermal blisters. [[CD59]] is an essential complement regulator, inhibiting formation of the membrane attack complex, and mediating signal transduction and activation of T lymphocytes. ''CD59'' has different transcripts by alternative splicing, of which only two are widely expressed, suggesting the presence of regulatory sites in their noncoding regions. To date, there is no association study with polymorphisms in ''CD59'' noncoding regions and susceptibility to autoimmune diseases. In this study, we aimed to evaluate if ''CD59'' polymorphisms have a possible regulatory effect on gene expression and susceptibility to PF. Six noncoding polymorphisms were haplotyped in 157 patients and 215 controls by sequence-specific [[polymerase chain reaction|PCR]], and CD59 mRNA levels were measured in 82 subjects, by qPCR. The ''rs861256-allele-G'' (''rs861256*G'') was associated with increased mRNA expression (''p'' = .0113) and PF susceptibility in women (OR = 4.11, ''p'' = .0001), which were also more prone to develop generalized lesions (OR = 4.3, ''p'' = .009) and to resist disease remission (OR = 3.69, ''p'' = .045). Associations were also observed for ''rs831625*G'' (OR = 3.1, ''p'' = .007) and ''rs704697*A'' (OR = 3.4, ''p'' = .006) in Euro-Brazilian women, and for ''rs704701*C'' (OR = 2.33, ''p'' = .037) in Afro-Brazilians. These alleles constitute the ''GGCCAA'' haplotype, which also increases PF susceptibility (OR = 4.9, ''p'' = .045) and marks higher mRNA expression (''p'' = .0025). [...] higher ''CD59'' transcriptional levels may be related with PF susceptibility (especially in women), probably due to the effect of genetic polymorphism and to the CD59 role in T cell signal transduction."<ref name=Silva>{{ cite journal
|author=Amanda Salviano-Silva, Maria Luiza Petzl-Erler & Angelica Beate Winter Boldt
|title=''CD59'' polymorphisms are associated with gene expression and different sexual susceptibility to pemphigus foliaceus
|journal=Autoimmunity
|date=29 April 2017
|volume=50
|issue=6
|pages=377-385
|url=https://www.tandfonline.com/doi/abs/10.1080/08916934.2017.1329830
|arxiv=
|bibcode=
|doi=10.1080/08916934.2017.1329830
|pmid=
|accessdate=27 September 2021 }}</ref>


The real promoters have two consensus sequences in the UTR of A1BG between ZSCAN22 and A1BG for an occurrence of 1.0. The random datasets had twenty-six for ten datasets giving an occurrence of 2.6.
===Stem-loops===
[[Image:Stem-loop.svg|thumb|right|300px|An example of an RNA stem-loop is shown. Credit: [[c:user:Sakurambo|Sakurambo]].{{tlx|free media}}]]
As an important secondary structure of RNA, a stem-loop can direct RNA folding, protect structural stability for messenger RNA (mRNA), provide recognition sites for RNA binding proteins, and serve as a substrate for enzymatic reactions.<ref>Svoboda, P., & Cara, A. (2006). Hairpin RNA: A secondary structure of primary importance. Cellular and Molecular Life Sciences, 63(7), 901-908.</ref>


The reals have none in the core or proximal promoters, but the randoms had four in the core promoters for an occurrence of 0.2 (all in the arbitrary positive direction, two of which were above 4445, leaving an occurrence of 0.1; had the arbitrary negative direction been used instead, the occurrence would have been 0.2) and three in each direction in the proximal promoters for an occurrence of 0.3.
Hairpin loops are often elements found within the 5'UTR of prokaryotes. These structures are often bound by proteins or cause the attenuation of a transcript in order to regulate translation.<ref name=Meyer>{{cite journal|last=Meyer|first=Michelle|author2=Deiorio-Haggar K |author3=Anthony J |title=RNA structures regulating ribosomal protein biosynthesis in bacilli|journal=RNA Biology|date=July 2013|volume=10|series=7|pages=1160–1164|doi=10.4161/rna.24151|pmid=23611891 }}</ref>


In the distal promoters, the randoms had forty-three in the arbitrary negative direction for 4.3, and in the positive direction fifty-six for 5.6. The reals had three in each direction for 1.5 each.
The mRNA stem-loop structure forming at the ribosome binding site may control an initiation of translation.<ref name=Malys2009>{{cite journal | author = Malys N, Nivinskas R | title = Non-canonical RNA arrangement in T4-even phages: accommodated ribosome binding site at the gene 26-25 intercistronic junction |journal = Mol Microbiol |volume = 73 | issue = 6 | pages = 1115–1127 | date = 2009 | pmid = 19708923 | doi =10.1111/j.1365-2958.2009.06840.x }}</ref><ref name=Malys2010>{{ cite journal | author = Malys N, McCarthy JEG | title = Translation initiation: variations in the mechanism can be anticipated |journal = Cellular and Molecular Life Sciences | date = 2010 | doi =10.1007/s00018-010-0588-z | pmid=21076851 | volume = 68 | issue = 6 | pages = 991–1003 }}</ref>
{{clear}}


As the real occurrences are well below the random occurrences, the real consensus sequences are likely active or activable.
====AUREs====
{{main|Adenylate–uridylate rich element gene transcriptions#Adenylate–uridylate rich element (Bakheet) samplings}}


====Gal4 transcription factors====
====Adenylate–uridylate rich elements (Chen and Shyu, Class I)====
{{main|Gal4p gene transcriptions}}
{{main|Adenylate–uridylate rich element gene transcriptions#ATTTA (Chen and Shyu, Class I) samplings}}


====G-boxes====
====Adenylate–uridylate rich elements (Chen and Shyu, Class II)====
{{main|G box gene transcriptions}}
{{main|Adenylate–uridylate rich element gene transcriptions#UUAUUUA(U/A)(U/A) (Chen and Shyu, Class II) samplings}}
There are at least two G-boxes: that of Oeda<ref name=Oeda/> (GCCACGTGGC, none found) and that of Loake<ref name=Loake>{{ cite journal
|author=Gary J. Loake, Ouriel Faktor, Christopher J. Lamb, and Richard A. Dixon
|title=Combination of H-box [CCTACC(N)<sub>7</sub>CT] and G-box (CACGTG) cis elements is necessary for feed-forward stimulation of a chalcone synthase promoter by the phenylpropanoid-pathway intermediate ''p''-coumaricacid
|journal=Proceedings of the National Academy of Sciences USA
|date=October 1992
|volume=89
|issue=
|pages=9230-4
|url=https://www.pnas.org/content/pnas/89/19/9230.full.pdf
|arxiv=
|bibcode=
|doi=
|pmid=
|accessdate=5 May 2020 }}</ref> (CACGTG, positive direction only: negative strand at 570 and positive strand at 3884, at 2961, at 1219, and at 547). The G-box of Loake is the same as the [[Phosphate starvation-response transcription factor gene transcriptions|abscisic acid-responsive element]].<ref name=Zhang>{{ cite journal
| author = Z G E, Zhang YP, Zhou JH, Wang L
| title = Mini review roles of the bZIP gene family in rice
| journal = Genetics and Molecular Research
| volume = 13
| issue = 2
| pages = 3025–36
| date = April 2014
| pmid = 24782137
| doi = 10.4238/2014.April.16.11 }}</ref> The random datasets have the even-numbered sets arbitrarily assigned to the negative direction. Had they been assigned to the positive direction, they would additively appear similar to the Loake G-boxes found on the ZNF497 side promoter of A1BG. However, at most two occur in any one dataset. This suggests that those consensus sequences found in the positive direction are real and likely active.


Binding "activity to the G-box of the light-responsive unit 1 (U1) region of the parsley (''Petroselinum crispum'') ''CHS'' promoter (CHS-U1: TCCACGTGGC; Schulze-Lefert et al., 1989) or the G-box of ''GmAux28'' (TCCACGTGTC) was much weaker than to the PA G-box [...]."<ref name=Song/>
====Adenylate–uridylate rich elements (Chen and Shyu, Class III)====
{{main|Adenylate–uridylate rich element gene transcriptions#ATTT (Chen and Shyu, Class III)}}


(G/T)CCACGTG(G/T)C combines the PA G-box (GCCACGTGGC)<ref name=Song/> with the G-box of ''GmAux28'' (TCCACGTGTC) so that both can be tested together. No examples of either the PA G-box or the G-box of ''GmAux28'' were found in either promoter of A1BG. The random datasets had one occurrence in the designated negative direction: TCCACGTGTC at 2907.
====MERs====
{{main|Adenylate–uridylate rich element gene transcriptions#Overlapping (Siegel) mers}}


====GCN4 motif====
====Constitutive decay elements====
{{main|Gcn4p gene transcriptions}}
{{main|Adenylate–uridylate rich element gene transcriptions#Constitutive decay element (Siegel) samplings}}
"The program DNA-Pattern was used to search for and catalogue occurrences of consensus GCRE (TGABTVW) [TGA(C/G/T)T(A/C/G)(A/T)] and GATA (GATAAG, GATAAH, GATTA) motifs in yeast promoters."<ref name=Staschke>{{ cite journal
|author=Kirk A. Staschke, Souvik Dey, John M. Zaborske, Lakshmi Reddy Palam, Jeanette N. McClintick, Tao Pan, Howard J. Edenberg, and Ronald C. Wek
|title=Integration of General Amino Acid Control and Target of Rapamycin (TOR) Regulatory Pathways in Nitrogen Assimilation in Yeast
|journal=The Journal of Biological Chemistry
|date=May 28, 2010
|volume=285
|issue=22
|pages=16893–16911
|url=https://www.jbc.org/content/285/22/16893.full.pdf
|arxiv=
|bibcode=
|doi=10.1074/jbc.M110.121947
|pmid=
|accessdate=4 January 2021 }}</ref>


"The predicted Gln3p and Gcn4p binding sites in the UGA3 promoter are [...] the consensus Gln3p (GATA) and Gcn4p (GCRE) [TGAGTCA] binding sites present in the minimal UGA3 promoter at -206 and -112, respectively, [...]."<ref name=Staschke/>
==={{chem|Cys|2|His|2}} SP / Kruppel-like factor (KLF) transcription factor family===


For TGA(C/G/T)T(A/C/G)(A/T) there are three UTRs for each strand (3.0), one core promoter for each strand in the positive direction only (1.0), one proximal promoter in the negative direction (0.5) and three in the positive direction (1.5), with five distal promoters on the negative direction (2.5) and seventeen in the positive direction (7.5).
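The expansion of the degenerate GCRE consensus TGABTVW into TGA(C/G/T)T(A/C/G)(A/T) follows the standard IUPAC nucleotide codes and can be sketched as a small translation table. The function names below are illustrative assumptions, not part of the DNA-Pattern program or of the Basic programs used for the samplings.

```python
import re

# Standard IUPAC nucleotide codes expressed as regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus):
    """Translate an IUPAC consensus such as TGABTVW into a regex string."""
    return "".join(IUPAC[symbol] for symbol in consensus.upper())


def count_matches(consensus, sequence):
    """Count matches of the consensus in `sequence`, overlaps included."""
    pattern = re.compile("(?=" + iupac_to_regex(consensus) + ")")
    return sum(1 for _ in pattern.finditer(sequence.upper()))


gcre = iupac_to_regex("TGABTVW")
print(gcre)  # TGA[CGT]T[ACG][AT]
print(re.fullmatch(gcre, "TGAGTCA") is not None)  # True
```

The UGA3 site TGAGTCA quoted above is one instance of the consensus, while a sequence such as TGAATCA is not, because B excludes A.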
The {{chem|Cys|2|His|2}}-like fold group ({{chem|Cys|2|His|2}}) is by far the best-characterized class of zinc fingers, and is common in mammalian transcription factors, where such domains adopt a simple ββα fold and have the amino acid sequence motif:<ref name=Pabo2001>{{cite journal | author = Pabo CO, Peisach E, Grant RA | title = Design and selection of novel Cys2His2 zinc finger proteins | journal = Annual Review of Biochemistry | volume = 70 | pages = 313–40 | date = 2001 | pmid = 11395410 | doi = 10.1146/annurev.biochem.70.1.313 }}</ref>


The random datasets have thirteen UTRs for ten datasets (1.3), three core promoters in the positive direction only for ten datasets (0.3), one proximal promoter in the negative direction for ten datasets (0.1) and seven proximal promoters in the positive direction for ten datasets (0.7), and twenty-three distal promoters for ten datasets in the negative direction (2.3) and twenty-three distal promoters in the positive direction for ten datasets (2.3).
:X<sub>2</sub>-Cys-X<sub>2,4</sub>-Cys-X<sub>12</sub>-His-X<sub>3,4,5</sub>-His


====Maf recognition elements====
====Alcohol dehydrogenase repressor 1====
{{main|Maf recognition element gene transcriptions}}
{{main|Adr1p gene transcriptions#ADR samplings}}


====Nuclear factors====
====SP1M1s====
{{main|Nuclear factor gene transcriptions}}
{{main|Specificity protein gene transcriptions#Sp1-box 1 (Motojima) Samplings}}


=====HNF6s=====
====SP1M2s====
{{main|Specificity protein gene transcriptions#Sp1-box 2 (Motojima) Samplings}}


Hepatic nuclear factors (HNFs) bind through their DNA-binding domain (DBD) to consensus elements (A/G/T)(A/T)(A/G)T(C/T)(A/C/G)AT(A/C/G/T)(A/G/T), resulting in gene transcription.<ref name="Gardmo">{{ cite journal
====SP-1 (Sato)s====
|author=Cissi Gardmo and Agneta Mode
{{main|Specificity protein gene transcriptions#Sp-1 (Sato) samplings}}
|date=1 December 2006
 
|title=In vivo transfection of rat liver discloses binding sites conveying GH-dependent and female-specific gene expression
====SP1 (Yao)s====
|url=http://jme.endocrinology-journals.org/content/37/3/433.full
{{main|Specificity protein gene transcriptions#Sp1 (Yao) samplings}}
|journal=Journal of Molecular Endocrinology
 
|volume=37
====YY1Ts====
|issue=3
{{main|YY1 gene transcriptions#YY1 CCATCTT samplings}}
|pages=433-441
|arxiv=
|bibcode=
|doi=10.1677/jme.1.02116
|pmid=
|accessdate=2017-09-01 }}</ref>


{|class="wikitable"
===AP-2/EREBP-related factors===
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 5 || 2 || 2.5 || 2.5
|-
| Randoms || UTR || arbitrary negative || 16 || 10 || 1.6 || 1.8
|-
| Randoms || UTR || alternate negative || 20 || 10 || 2.0 || 1.8
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 4 || 10 || 0.4 || 0.3
|-
| Randoms || Core || alternate positive || 2 || 10 || 0.2 || 0.3
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 2 || 10 || 0.2 || 0.2
|-
| Randoms || Proximal || alternate negative || 2 || 10 || 0.2 || 0.2
|-
| Reals || Proximal || positive || 3 || 2 || 1.5 || 1.5
|-
| Randoms || Proximal || arbitrary positive || 0 || 10 || 0 || 0.1
|-
| Randoms || Proximal || alternate positive || 2 || 10 || 0.2 || 0.1
|-
| Reals || Distal || negative || 3 || 2 || 1.5 || 1.5
|-
| Randoms || Distal || arbitrary negative || 22 || 10 || 2.2 || 1.85
|-
| Randoms || Distal || alternate negative || 15 || 10 || 1.5 || 1.85
|-
| Reals || Distal || positive || 3 || 2 || 1.5 || 1.5
|-
| Randoms || Distal || arbitrary positive || 28 || 10 || 2.8 || 2.9
|-
| Randoms || Distal || alternate positive || 30 || 10 || 3.0 || 2.9
|}


Comparison:
====AGC boxes====
{{main|AGC box gene transcriptions#AGC box samplings}}


The occurrences of real HNF6 UTRs are greater than the randoms, positive direction proximals are greater than randoms, negative direction distals are at or less than randoms, positive direction distals are less than randoms. This suggests that the real HNF6s are likely active or activable, although the negative direction distals are at or less than randoms.
===AP-1 transcription factor network (Pathway)===


====T boxes====
Sixty-nine genes are included in the AP-1 transcription factor network (Pathway).<ref name=AP-1TFN>{{ cite web
|author=NCBI
|title=AP-1 transcription factor network
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=9 March 2021
|url=https://pubchem.ncbi.nlm.nih.gov/pathway/Pathway%20Interaction%20Database:ap1_pathway
|accessdate=26 October 2021 }}</ref>


"Most bZIP proteins show high binding affinity for the ACGT motifs, which include [...] AACGTT (T box) [...]."<ref name=Zhang/>
====AGCEs====
{{main|AGCE gene transcriptions#AGCE samplings}}


Real promoter sequences on either side of A1BG only have this T box AACGTT at 2691 and 1614 in the positive direction.
===Zinc finger DNA-binding domains===


====X boxes====
====AnRE1s====
{{main|X box gene transcriptions}}
{{main|Androgen response element gene transcriptions#Androgen response element1 (Kouhpayeh) samplings}}


====Z-boxes====
====AnDRE2s====
{{main|Z box gene transcriptions}}
{{main|Androgen response element gene transcriptions#Androgen response element2 (Kouhpayeh) samplings}}
A more general Z-box consensus sequence A(C/T)A(C/G)GT(A/G)T, which includes CAGGTA<ref name=Burk>{{ cite journal
 
|author=Ulrike Burk, Jörg Schubert, Ulrich Wellner, Otto Schmalhofer, Elizabeth Vincan, Simone Spaderna, Thomas Brabletz
====AnREWs====
|title=A reciprocal repression between ZEB1 and members of the miR‐200 family promotes EMT and invasion in cancer cells
{{main|Androgen response element gene transcriptions#Androgen response element (Wilson) samplings}}
|journal=EMBO Reports
|date=1 June 2008
|volume=9
|issue=6
|pages=582-589
|url=http://embor.embopress.org/content/9/6/582
|arxiv=
|bibcode=
|doi=10.1038/embor.2008.74
|pmid=
|accessdate=15 November 2018 }}</ref> and ATACGTGT<ref name=Song/>, has two occurrences only in the positive direction from ZNF497 of ACAGGTGT at 1969 on the negative strand and ACACGTGT at 2962 on the positive strand.


===Helix-turn-helix (HTH) transcription factors===
====B-boxes====
{{main|Helix-turn-helix transcription factors}}
{{main|B box gene transcriptions#B box (Johnson) samplings}}
Gene ID: 4602 is MYB [myeloblastosis] MYB proto-oncogene, transcription factor on 6q23.3: "This gene encodes a protein with three HTH DNA-binding domains that functions as a transcription regulator. This protein plays an essential role in the regulation of hematopoiesis. This gene may be aberrently expressed or rearranged or undergo translocation in leukemias and lymphomas, and is considered to be an oncogene. Alternative splicing results in multiple transcript variants."<ref name=RefSeq4602>{{ cite web
|author=RefSeq
|title=MYB MYB proto-oncogene, transcription factor [ Homo sapiens (human) ]
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=January 2016
|url=https://www.ncbi.nlm.nih.gov/gene/4602
|accessdate=7 February 2021 }}</ref>


====CadC binding domains====
====Box Bs====
{{main|CadC binding domain gene transcriptions}}
{{main|B box gene transcriptions#B1 box (Sanchez) samplings}}
"Altogether, the specific contacts observed suggest a consensus binding motif of 5′-T-T-A-x-x-x-x-T-3′."<ref name=Schlundt>{{ cite journal
|author=Andreas Schlundt, Sophie Buchner, Robert Janowski, Thomas Heydenreich, Ralf Heermann, Jürgen Lassak, Arie Geerlof, Ralf Stehle, Dierk Niessing, Kirsten Jung & Michael Sattler
|title=Structure-function analysis of the DNA-binding domain of a transmembrane transcriptional activator
|journal=Scientific Reports
|date=21 April 2017
|volume=7
|issue=
|pages=1051
|url=https://www.nature.com/articles/s41598-017-01031-9/briefing/signup/
|arxiv=
|bibcode=
|doi=10.1038/s41598-017-01031-9
|pmid=28432336
|accessdate=28 August 2020 }}</ref>


"The DNA consensus sequence 5′-T-T-A-x-x-x-x-T-3′ is present once in the quasi-palindromic Cad1 17-mer DNA, consistent with the formation of a 1:1 complex. However, a second consensus facilitates the formation of the 2:1 complex of CadC with Cad1 41-mer DNA as evidenced by the CadC model with the minimal Cad1 26-mer DNA that spans the two AT-rich regions, i.e. consensus sites."<ref name=Schlundt/>
===β-Scaffold factors===


The CadC binding domains occur in the UTR of A1BG between ZSCAN22 and A1BG, thirty-two on the negative strand, negative direction and three on the positive direction for a total occurrence of 17.5. The random datasets had occurrences of eight, six, thirteen, nine, nine, three, nine, ten, five, and six for a total of 78 for ten datasets yielding an average of 7.8 per strand or less than half that of the real strands in the negative direction. The over-random abundance of CadC DNA binding domains suggests they are likely active or activable.
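The comparison between real and random occurrences in the UTR can be reproduced with a few lines. The counts are those given in the paragraph above; the variable names are illustrative only.

```python
from statistics import mean

# Counts per random dataset in the UTR, as listed in the text.
random_utr_counts = [8, 6, 13, 9, 9, 3, 9, 10, 5, 6]
# The two real counts in the UTR (thirty-two and three), as listed in the text.
real_utr_counts = [32, 3]

random_average = mean(random_utr_counts)  # 78 / 10
real_occurrence = mean(real_utr_counts)   # 35 / 2
ratio = real_occurrence / random_average

print(sum(random_utr_counts))  # 78
print(random_average)          # 7.8
print(real_occurrence)         # 17.5
print(round(ratio, 2))         # 2.24
```

A ratio above two is consistent with the statement that the random average is less than half of the real occurrence.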
"Higher animals have [transcription factor] TF genes for the basic domain, the β-scaffold factor, and other new
 
structures; however, their total proportion is less than 15% and most are [zinc (Zn)-coordinating factor] ZF and [Helix-Turn-Helix] HTH genes."<ref name=Nagata>{{ cite book
No CadC DNA-binding domains occurred in the real core promoters. The random datasets had one in the negative direction for an occurrence of 0.1 and fourteen in the positive direction for an occurrence of 1.4, suggesting that there should have been about one if random.
|author=Toshifumi Nagata, Aeni Hosaka-Sasaki and Shoshi Kikuchi
 
|title=The Evolutionary Diversification of Genes that Encode Transcription Factor Proteins in Plants, In: ''Plant Transcription Factors Evolutionary, Structural and Functional Aspects''
Proximal promoters have one in the negative direction for an occurrence of 0.5, and nine in the positive direction for an occurrence of 4.5. The random datasets had nine in the negative direction for 0.9 and eleven in the designated positive direction for 1.1. The wide disparity in both directions relative to the almost equal results for the random datasets suggests that the CadC DNA-binding domains are likely active or activable.
|publisher=Academic Press
 
|location=
The distal promoters had thirty in the negative direction for an average of 15.0. The positive direction has fourteen for 7.0. The random datasets had 15, 8, 8, 13, 13, 13, 14, 11, 13, 11, for an average of 11.9 in the arbitrary negative direction, and 20, 18, 23, 13, 13, 23, 14, 13, 20, 16, for an average of 17.3 in the arbitrary positive direction. A cumulative average of 14.6 ± 3 encompasses the real result in the negative direction but is well above the real result in the positive direction, suggesting that the results for the positive direction between ZNF497 and A1BG are likely active or activable but those for the negative direction are likely random.
|date=2016
 
|editor=Daniel H. Gonzalez
====Factor II B recognition elements====
|pages=73-97
 
|url=https://www.sciencedirect.com/science/article/pii/B9780128008546000051
"The best known core promoter element is the TATA-box, consisting of an AT-rich sequence located ~27 bp upstream of the TSS, but several other core promoter elements exist, including initiator element (Inr) and X core promoter element 1 (XCPE1) localized around the TSS, the TFIIB recognition elements (BRE) that are positioned upstream of the TSS, and downstream promoter element (DPE), motif ten element (MTE) and downstream core element (DCE) that are situated downstream of TSS. The distal regulatory elements include locus control regions (LCR), enhancers, silencers and insulators. The enhancers and silencers have sites for binding multiple transcription factors and they function in activating and repressing transcription, respectively. Insulators operate by blocking genes from being affected by the regulatory elements of neighbouring genes. The LCR consists of multiple transcription regulatory elements that function together to provide proper expression regulation to a cluster of genes."<ref name= Elsing>{{ cite book
|author=Alexandra Elsing
|title=Regulation of HSF2 and its function in mitosis
|publisher=Department of Biosciences, Åbo Akademi University
|location=Turku, Finland
|date=2014
|editor=
|pages=123
|url=http://www.doria.fi/bitstream/handle/10024/98908/elsing_alexandra.pdf?sequence=2&isAllowed=y
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=
|doi=10.1016/B978-0-12-800854-6.00005-1
|pmid=
|pmid=
|isbn=978-952-12-3105-6
|isbn=978-0-12-800854-6
|accessdate=2018-04-22 }}</ref> The consensus sequence for the TFIIB recognition elements (BRE<sup>u</sup>) is (G/C)(G/C)(G/A)CGCC.<ref name=Kutach>{{ cite journal
|accessdate=28 November 2021 }}</ref>
|author=Alan K. Kutach, James T. Kadonaga
|title=The Downstream Promoter Element DPE Appears To Be as Widely Used as the TATA Box in ''Drosophila'' Core Promoters
|journal=Molecular and Cellular Biology
|date=July 2000
|volume=20
|issue=13
|pages=4754-64
|url=http://www.ncbi.nlm.nih.gov/pmc/articles/PMC85905/pdf/mb004754.pdf
|arxiv=
|bibcode=
|doi=
|pmid=10848601
|accessdate=2012-07-15 }}</ref>


A1BG has one BRE<sup>u</sup> in the UTR between ZSCAN22 and A1BG: GGCGTGG at 3047, on the positive strand in the negative direction, which is an inverse complement of the consensus sequence, for an occurrence of 0.5. The remaining BRE<sup>u</sup>s are in the distal promoters: seven in the negative direction and eleven in the positive direction for occurrences of 3.5 and 5.5, respectively.
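
The inverse-complement reading of GGCGTGG can be checked with a minimal sketch. The helper names <code>reverse_complement</code> and <code>consensus_to_regex</code> are hypothetical and are not part of the sampling procedure used here; only the consensus sequence and the GGCGTGG site are taken from the text.

```python
import re

# Hypothetical helpers for illustration; only BRE_U and SITE come from the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the inverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus: str) -> str:
    """Turn each alternative group such as (G/C) into a character class [GC]."""
    return re.sub(
        r"\(([ACGT](?:/[ACGT])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )


BRE_U = "(G/C)(G/C)(G/A)CGCC"   # consensus sequence quoted above
SITE = "GGCGTGG"                # sequence reported at 3047

pattern = re.compile(consensus_to_regex(BRE_U))
direct_match = pattern.fullmatch(SITE) is not None
inverse_match = pattern.fullmatch(reverse_complement(SITE)) is not None

print(consensus_to_regex(BRE_U), reverse_complement(SITE), direct_match, inverse_match)
```

Under these assumptions the site does not match the consensus as written but its inverse complement does, which is consistent with the description above.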
====ATA boxes====
{{main|ATA box gene transcriptions#ATA box samplings}}


The random datasets had different occurrences: thirteen in the UTR for 1.3, four in the core promoters for 0.2, four in the proximal promoters for 0.2, and thirteen in the negative direction for 1.3 with twenty-nine in the positive direction for 2.9. Based on the disparity, these response elements are likely active or activable.
====Γ-interferon activated sequences====
{{main|Γ-interferon activated sequence gene transcriptions#Γ-interferon activated sequence samplings}}


====Homeoboxes====
====HMG boxes====
{{main|HMG box gene transcriptions#HMG box samplings}}
 
===Zn(II)<sub>2</sub>Cys<sub>6</sub> proteins===


The "binding of wild type and mutants TTF-1 HD to oligonucleotides containing either 5'-TAAT-3' or 5'-CAAG-3' indicate that only in the presence of the latter motif the Gln<sub>50</sub> in TTF-1 HD is utilized for DNA recognition."<ref name=Damante>{{ cite journal
"The transcription factors Uga3, Dal81 and Leu3 belong to the class III family (Zn(II)<sub>2</sub>Cys<sub>6</sub> proteins), and they recognize highly related sequences rich in GGC triplets [15]."<ref name=Ruiz>{{ cite journal
|author=G. Damante, D. Fabbro, L. Pelizari, D. Civitareale, S. Guazzi, M. Polycarpou-Schwartz, S. Cauci, F. Quadrifoglio, S. Formisano and R. Di Lauro
|author=Marcos Palavecino-Ruiz, Mariana Bermudez-Moretti, Susana Correa-Garcia
|title=Sequence-specific DNA recognition by the thyroid transcription factor-1 homeodomain
|title=Unravelling the transcriptional regulation of Saccharomyces cerevisiae UGA genes: the dual role of transcription factor LEU3
|journal=Nucleic Acids Research
|journal=Microbiology
|date=20 June 1994
|date=1 November 2017
|volume=22
|volume=
|issue=15
|issue=
|pages=3075-83
|pages=
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC310278/pdf/nar00039-0221.pdf
|url=https://www.researchgate.net/profile/Mariana_Bermudez3/publication/320571623_Unravelling_the_transcriptional_regulation_of_Saccharomyces_cerevisiae_UGA_genes_the_dual_role_of_transcription_factor_Leu3/links/5c62114c299bf1d14cbf7ade/Unravelling-the-transcriptional-regulation-of-Saccharomyces-cerevisiae-UGA-genes-the-dual-role-of-transcription-factor-Leu3.pdf
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=10.1093/nar/22.15.3075
|doi=10.1099/mic.0.000560
|pmid=7915030
|pmid=
|accessdate=6 May 2020 }}</ref>
|accessdate=21 February 2021 }}</ref>


{|class="wikitable"
====Dal81====
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 32 || 2 || 16 || 16
|-
| Randoms || UTR || arbitrary negative || 74 || 10 || 7.4 || 7.35
|-
| Randoms || UTR || alternate negative || 73 || 10 || 7.3 || 7.35
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 1 || 10 || 0.1 || 0.05
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0.05
|-
| Reals || Core || positive || 3 || 2 || 1.5 || 1.5
|-
| Randoms || Core || arbitrary positive || 6 || 10 || 0.6 || 0.8
|-
| Randoms || Core || alternate positive || 10 || 10 || 1.0 || 0.8
|-
| Reals || Proximal || negative || 2 || 2 || 1 || 1
|-
| Randoms || Proximal || arbitrary negative || 3 || 10 || 0.3 || 0.45
|-
| Randoms || Proximal || alternate negative || 6 || 10 || 0.6 || 0.45
|-
| Reals || Proximal || positive || 4 || 2 || 2 || 2
|-
| Randoms || Proximal || arbitrary positive || 9 || 10 || 0.9 || 0.9
|-
| Randoms || Proximal || alternate positive || 9 || 10 || 0.9 || 0.9
|-
| Reals || Distal || negative || 37 || 2 || 18.5 || 18.5
|-
| Randoms || Distal || arbitrary negative || 88 || 10 || 8.8 || 9.3
|-
| Randoms || Distal || alternate negative || 98 || 10 || 9.8 || 9.3
|-
| Reals || Distal || positive || 54 || 2 || 27 || 27
|-
| Randoms || Distal || arbitrary positive || 124 || 10 || 12.4 || 11.75
|-
| Randoms || Distal || alternate positive || 111 || 10 || 11.1 || 11.75
|}
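
The Occurrences and Averages columns above appear to follow a simple rule: the count divided by the number of strands sampled, and, for the randoms, the mean of the arbitrary and alternate occurrences. A minimal sketch of that arithmetic for the UTR rows, assuming this reading of the table is correct (the function name <code>occurrence</code> is hypothetical):

```python
def occurrence(count: int, strands: int) -> float:
    """Occurrences per strand: total hits divided by the number of strands sampled."""
    return count / strands


# UTR rows of the table above (negative direction).
real_utr = occurrence(32, 2)            # reals: two strands
random_arbitrary = occurrence(74, 10)   # randoms, arbitrary negative: ten strands
random_alternate = occurrence(73, 10)   # randoms, alternate negative: ten strands

# Averages column for the randoms: mean of the two random occurrences.
random_average = round((random_arbitrary + random_alternate) / 2, 2)

print(real_utr, random_arbitrary, random_alternate, random_average)
```

The same calculation applied to the other rows gives a quick consistency check on the Averages column.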


Comparison:
====GCC boxes====
{{main|AGC box gene transcriptions#GCC box samplings}}


The occurrences of real homeoboxes are greater than the randoms. This suggests that the real homeoboxes are likely active or activable.
====GGC triplets====
{{main|GGC triplet gene transcriptions#GGC samplings}}


====Homeodomains====
=====GGCGGC triplets=====
{{main|GGC triplet gene transcriptions#GGCGGC triplet samplings}}


"The Pax-4 homeodomain [HD] was shown to preferentially dimerize on DNA sequences consisting of an inverted TAAT motif, separated by 4-nucleotide spacing."<ref name=Kalousova>{{ cite journal
====Leu3====
|author=Anna Kalousová, Vladimı́r Beneš, Jan Pačes, Václav Pačes and Zbyněk Kozmik
{{main|Leu3 gene transcriptions#Leu samplings|GGC triplet gene transcriptions#Leu3 samplings}}
|title=DNA Binding and Transactivating Properties of the Paired and Homeobox Protein Pax4
|journal=Biochemical and Biophysical Research Communications
|date=June 1999
|volume=259
|issue=3
|pages=510-518
|url=https://www.sciencedirect.com/science/article/abs/pii/S0006291X99908094
|arxiv=
|bibcode=
|doi=
|pmid=10364449
|accessdate=6 May 2020 }}</ref>


{|class="wikitable"
====Uga3====
|-
{{main|Leu3 gene transcriptions}}
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 17 || 2 || 8.5 || 8.5
|-
| Randoms || UTR || arbitrary negative || 84 || 10 || 8.4 || 7.8
|-
| Randoms || UTR || alternate negative || 72 || 10 || 7.2 || 7.8
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 1 || 10 || 0.1 || 0.05
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0.05
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 7 || 10 || 0.7 || 0.6
|-
| Randoms || Core || alternate positive || 5 || 10 || 0.5 || 0.6
|-
| Reals || Proximal || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || Proximal || arbitrary negative || 6 || 10 || 0.6 || 0.6
|-
| Randoms || Proximal || alternate negative || 6 || 10 || 0.6 || 0.6
|-
| Reals || Proximal || positive || 5 || 2 || 2.5 || 2.5
|-
| Randoms || Proximal || arbitrary positive || 12 || 10 || 1.2 || 1.3
|-
| Randoms || Proximal || alternate positive || 14 || 10 || 1.4 || 1.3
|-
| Reals || Distal || negative || 24 || 2 || 12 || 12
|-
| Randoms || Distal || arbitrary negative || 113 || 10 || 11.3 || 10.6
|-
| Randoms || Distal || alternate negative || 99 || 10 || 9.9 || 10.6
|-
| Reals || Distal || positive || 2 || 2 || 1 || 1
|-
| Randoms || Distal || arbitrary positive || 154 || 10 || 15.4 || 15.85
|-
| Randoms || Distal || alternate positive || 163 || 10 || 16.3 || 15.85
|}


Comparison:
===Hairpin-hinge-hairpin-tail===


The occurrences of real UTR homeodomains are greater than the randoms, negative direction proximals are less than the randoms, positive direction proximals are greater than the randoms, the negative direction distals are greater than the randoms, and the positive direction distals are less than the randoms. This suggests that the real homeodomains are likely active or activable.
"In addition to this ACA box, they have the consensus H box sequence (5'-ANANNA-3') but have no other primary sequence identity. Despite this lack of primary sequence conservation, the H and ACA boxes are embedded in an evolutionarily conserved hairpin-hinge-hairpin-tail core secondary structure with the H box in the single-stranded hinge region and the ACA box in the single-stranded tail (5, 16)."<ref name=Mitchell>{{ cite journal
 
|author=James R. Mitchell, Jeffrey Cheng, and Kathleen Collins
====HSE3 (Eastmond)====
|title=A Box H/ACA Small Nucleolar RNA-Like Domain at the Human Telomerase RNA 3' End
 
|journal=Molecular and Cellular Biology
"The GAP HSE consists of an ''n''GAA''n'' repeat, followed by any 5 bp and 2 inverted ''n''GAA''n'' repeats (''n''GAA''n''-(5-bp)-''n''GAA''nn''TTC''n'')".<ref name=Eastmond>{{ cite journal
|date=January 1999
|author=Dawn L. Eastmond and Hillary C. M. Nelson
|volume=19
|title=Genome-wide Analysis Reveals New Roles for the Activation Domains of the ''Saccharomyces cerevisiae'' Heat Shock Transcription Factor (Hsf1) during the Transient Heat Shock Response
|issue=1
|journal=Journal of Biological Chemistry
|pages=567–576
|date=October 27, 2006
|url=http://mcb.asm.org/content/19/1/567.full.pdf
|volume=281
|issue=43
|pages=32909-32921
|url=https://www.jbc.org/article/S0021-9258(20)86866-5/fulltext
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=10.1074/jbc.M602454200
|doi=
|pmid=
|pmid=
|accessdate=19 January 2021 }}</ref>
|accessdate=5 November 2018 }}</ref>


{|class="wikitable"
====H and ACA boxes====
|-
{{main|H and ACA box gene transcriptions#H and ACA boxes in promoters of A1BG}}
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 0 || 2 || 0 || 0
|-
| Randoms || UTR || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || UTR || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Distal || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || Distal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Distal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Distal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Distal || alternate positive || 0 || 10 || 0 || 0
|}


Comparison:
====H-boxes (Grandbastien)====
{{main|H box gene transcriptions#H-box (Grandbastien) samplings}}


The occurrence of a real HSE3 (Eastmond) is greater than the randoms (all zero). This suggests that the real HSE3 (Eastmond) is likely active or activable.
====H-boxes (Lindsay)====
{{main|H box gene transcriptions#H-box (Lindsay) samplings}}


====HSE4 (Eastmond)====
====H boxes (Mitchell)====
{{main|H box gene transcriptions#H boxes (Mitchell) samplings}}


"The STP HSE has a 5-bp insert between each of the 3 ''n''GAA''n'' repeats, yielding the sequences ''n''GAA''n''-(5-bp)-''n''GAA''n''-(5-bp)-''n''GAA''n'' and ''n''TTC''n''-(5-bp)-''n''TTC''n''-(5-bp)-''n''TTC''n'' (30)."<ref name=Eastmond/>
====H boxes (Rozhdestvensky)====
{{main|H box gene transcriptions#H boxes (Rozhdestvensky) in promoters of A1BG}}


{|class="wikitable"
===Unknown response element types===
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 0 || 2 || 0 || 0
|-
| Randoms || UTR || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || UTR || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Distal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Distal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Distal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Distal || alternate positive || 0 || 10 || 0 || 0
|}
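
The heat shock element notations quoted from Eastmond and Nelson can be translated mechanically into search patterns. The following is a minimal sketch, assuming that ''n'' stands for any of the four nucleotides and that "(5-bp)" stands for any five nucleotides. The function names and the test sequences are hypothetical and are not taken from the A1BG promoters.

```python
import re

N = "[ACGT]"  # assumption: "n" means any nucleotide


def hse_regex(notation: str) -> str:
    """Translate notation such as 'nGAAn-(5-bp)-nGAAnnTTCn' into a regex string."""
    parts = re.split(r"-(?=\()|(?<=\))-", notation)  # split around "(5-bp)" spacers
    out = []
    for part in parts:
        gap = re.fullmatch(r"\((\d+)-bp\)", part)
        if gap:
            out.append(f"{N}{{{gap.group(1)}}}")   # spacer of fixed length
        else:
            out.append(part.replace("n", N))       # repeat unit with wildcards
    return "".join(out)


def find_sites(notation: str, seq: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    lookahead = re.compile(f"(?=({hse_regex(notation)}))")
    return [m.start() for m in lookahead.finditer(seq)]


GAP_HSE = "nGAAn-(5-bp)-nGAAnnTTCn"
STP_HSE = "nGAAn-(5-bp)-nGAAn-(5-bp)-nGAAn"

# Made-up test sequences, built to contain one site each.
gap_seq = "TT" + "AGAAT" + "CCCCC" + "TGAAGCTTCA" + "GG"
stp_seq = "TT" + "AGAAT" + "CCCCC" + "AGAAT" + "CCCCC" + "AGAAT" + "GG"

print(find_sites(GAP_HSE, gap_seq), find_sites(STP_HSE, stp_seq))
```

A lookahead is used so that overlapping sites are all reported. Searching the inverse complement of each strand would be a separate pass.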


Comparison:
====ACEs====
{{main|MYB recognition element gene transcriptions#ACE samplings}}


The occurrence of a real HSE4 (Eastmond) is greater than the randoms. This suggests that the real HSE4 (Eastmond) is likely active or activable.
====BBCABW Inrs====
{{main|Initiator element gene transcriptions#BBCABW samplings}}


====HSE8 GAP1 (Eastmond)====
====Calcineurin-responsive transcription factors====
{{main|Calcineurin-responsive transcription factor gene transcriptions#CRT samplings}}


"As per previous studies (27, 30), we also allowed a single mismatch (''n''GAR) in one of the three ''n''GAA''n'' [nGA(A/G)n-(5-bp)-nGAAnnTTCn] repeats for PFT or GAP."<ref name=Eastmond/>
====Carbs====
{{main|Carbohydrate response element gene transcriptions#ACCGG (Carb) samplings}}


{|class="wikitable"
====Carb1s====
|-
{{main|Carbohydrate response element gene transcriptions#CCCAT (Carb1) samplings}}
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)  
|-
| Reals || UTR || negative || 0 || 2 || 0 || 0
|-
| Randoms || UTR || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || UTR || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Distal || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || Distal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Distal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Distal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary positive || 1 || 10 || 0.1 || 0.05
|-
| Randoms || Distal || alternate positive || 0 || 10 || 0 || 0.05
|}


Comparison:
====Cat8s====
{{main|Cat8p gene transcriptions#Cat8p samplings}}


The occurrence of a real HSE8 GAP1 (Eastmond) is greater than the randoms. This suggests that the real HSE8 GAP1 (Eastmond) is likely active or activable.
====Cell-cycle box variants====
{{Main|Cell-cycle box gene transcriptions#CCB variant samplings}}


====HSE9 GAP2 (Eastmond)====
====CGCG boxes====
{{main|CGCG box gene transcriptions#CGCG box samplings}}


"As per previous studies (27, 30), we also allowed a single mismatch (''n''GAR) in one of the three ''n''GAA''n'' [nGAAn-(5-bp)-nGARnnTTCn] repeats for PFT or GAP."<ref name=Eastmond/>
====Circadian control elements====
{{main|Circadian control element gene transcriptions#CCE samplings}}


{|class="wikitable"
====Cold-responsive elements====
|-
{{main|Cold-responsive element gene transcriptions#Cold-responsive element samplings}}
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 0 || 2 || 0 || 0
|-
| Randoms || UTR || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || UTR || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Distal || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || Distal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Distal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Distal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Distal || alternate positive || 0 || 10 || 0 || 0
|}


Comparison:
====Copper response elements====
{{main|Copper response element gene transcriptions}}


The occurrence of a real HSE9 GAP2 (Eastmond) is greater than the randoms. This suggests that the real HSE9 GAP2 (Eastmond) is likely active or activable.
=====CuREQs=====
{{main|Copper response element gene transcriptions#CuRE (Quinn) samplings}}


====Hsf (Tang)====
=====CuREPs=====
{{main|Copper response element gene transcriptions#CuRE (Park) samplings}}


The upstream activating sequence (UAS) for the Hsf1p is NGAAN.<ref name=Tang/>
====Cytoplasmic polyadenylation elements====
{{main|Cytoplasmic polyadenylation element gene transcriptions#CPE samplings}}


{|class="wikitable"
====DAF-16 binding elements====
|-
{{main|DAF-16 binding element gene transcriptions#DBE samplings}}
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 56+41=97 || 2 || 48.5±7 || 48.5 (41.5-55.5)
|-
| Randoms || UTR || arbitrary negative || 292 || 10 || 29.2 || 30.85
|-
| Randoms || UTR || alternate negative || 325 || 10 || 32.5 || 30.85
|-
| Reals || Core || negative || 11 || 2 || 5.5 || 5.5 (5-6)
|-
| Randoms || Core || arbitrary negative || 6 || 10 || 0.6 || 0.85
|-
| Randoms || Core || alternate negative || 11 || 10 || 1.1 || 0.85
|-
| Reals || Core || positive || 5 || 2 || 2.5 || 2.5 (2-3)
|-
| Randoms || Core || arbitrary positive || 32 || 10 || 3.2 || 3.25
|-
| Randoms || Core || alternate positive || 33 || 10 || 3.3 || 3.25
|-
| Reals || Proximal || negative || 14 || 2 || 7 || 7 (6-8)
|-
| Randoms || Proximal || arbitrary negative || 35 || 10 || 3.5 || 3.45
|-
| Randoms || Proximal || alternate negative || 34 || 10 || 3.4 || 3.45
|-
| Reals || Proximal || positive || 16 || 2 || 8 || 8 (7-9)
|-
| Randoms || Proximal || arbitrary positive || 41 || 10 || 4.1 || 3.85
|-
| Randoms || Proximal || alternate positive || 36 || 10 || 3.6 || 3.85
|-
| Reals || Distal || negative || 141 || 2 || 70.5 || 70.5 (66-75)
|-
| Randoms || Distal || arbitrary negative || 483 || 10 || 48.3 || 47.2
|-
| Randoms || Distal || alternate negative || 461 || 10 || 46.1 || 47.2
|-
| Reals || Distal || positive || 226 || 2 || 113 || 113 (111-115)
|-
| Randoms || Distal || arbitrary positive || 742 || 10 || 74.2 || 73.8
|-
| Randoms || Distal || alternate positive || 734 || 10 || 73.4 || 73.8
|}


Comparison:
====D box (Samarsky)====
{{main|D box gene transcriptions#Dbox (Samarsky) samplings}}


The occurrences of real Hsf (Tang) are outside the ranges of the randoms. This suggests that the real responsive element consensus sequences are likely active or activable.
====D box (Voronina)====
{{main|D box gene transcriptions#D box (Voronina) samplings}}


====MYB recognition elements====
====D-box (Motojima)====
{{main|MYB recognition element gene transcriptions}}
{{main|D box gene transcriptions#(Motojima) samplings}}
"These elements fit the type II MYB consensus sequence A(A/C)C(A/T)A(A/C)C, suggesting that they are MYB recognition elements (MREs)."<ref name=Rushton>{{ cite journal
|author=Paul J Rushton and Imre E Somssich
|title=Transcriptional control of plant genes responsive to pathogens
|journal=Current Opinion in Plant Biology
|date=August 1998
|volume=1
|issue=4
|pages=311-5
|url=http://arquivo.ufv.br/dbv/pgfvg/bve684/htms/pdfs_revisao/estresse/transcriptional.pdf
|arxiv=
|bibcode=
|doi=10.1016/1369-5266(88)80052-9
|pmid=
|accessdate=5 November 2018 }}</ref>


===Basic helix-loop-helix (bHLH) transcription factors===
====dBRE====
{{main|Basic helix–loop–helix}}
{{main|Downstream TFIIB recognition element gene transcriptions#dBRE samplings}}
"The [palindromic E-box motif (CACGTG)] motif is bound by the transcription factor Pho4, [and has the] class of basic helix-loop-helix DNA binding domain and core recognition sequence (Zhou and O'Shea 2011)."<ref name=Rossi/>


"Pho4 bound to virtually all E-boxes ''in vitro'' (96%) [...]. That was not the case ''in vivo'', where only 5% were bound by Pho4, under activating conditions as determined by ChIP-seq [Zhou and O'Shea 2011]."<ref name=Rossi/>
====Downstream core elements====
{{main|Downstream core element gene transcriptions}}


"Pho4 possesses the intrinsic ability to bind every E-box, but ''in vivo'' is prevented from binding by chromatin unless assisted by chromatin remodelers (Svaren ''et al.'' 1994) that are targeted at promoter regions."<ref name=Rossi/>
====DCE SI====
{{main|Downstream core element gene transcriptions#Downstream core element SI samplings}}
 
====DCE SII====
{{main|Downstream core element gene transcriptions#Downstream core element SII samplings}}
 
====DCE SIII====
{{main|Downstream core element gene transcriptions#Downstream core element SIII samplings}}


"On one end of that spectrum, typical transcription factors like Pho4 do not appear to compete with nucleosomes and instead predominantly sample motifs that already exist in the [nucleosome-free promoter regions] NFRs generated by other factors. In vitro (PB-exo), Pho4 bound nearly every instance of an E-box motif across the yeast genome. However, in vivo, Pho4 is a low-abundance protein that is recruited to the nucleus upon phosphate starvation by other factors, to act at a few dozen genes (Komeili and O'Shea 1999; Zhou and O'Shea 2011). Since Pho4 appears unable to compete with nucleosomes, competent sites that are occluded by nucleosomes are invisible to Pho4."<ref name=Rossi/>
====DPE (Juven-Gershon)====
{{main|Downstream promoter element gene transcriptions#DPE (Juven-Gershon) samplings}}


The Pho4 homodimer binds to DNA sequences containing the bHLH binding site CACGTG.<ref name=Shao>{{ cite journal
====DPE (Kadonaga)====
|author=Dalei Shao, Caretha L. Creasy, Lawrence W. Bergman
{{main|Downstream promoter element gene transcriptions#DPE (Kadonaga) samplings}}
|title = A cysteine residue in helixII of the bHLH domain is essential for homodimerization of the yeast transcription factor Pho4p
|journal = Nucleic Acids Research
|volume = 26
|issue = 3
|pages = 710–4
|date= 1 February 1998
|pmid = 9443961
|pmc = 147311
|doi = 10.1093/nar/26.3.710
|url = https://academic.oup.com/nar/article/26/3/710/1052045 }}</ref>


The upstream activating sequence (UAS) for Pho4p is CAC(A/G)T(T/G) in the promoters of ''HIS4'' and ''PHO5'' regarding phosphate limitation with respect to regulation of the purine and histidine biosynthesis pathways [66].<ref name=Tang>{{ cite journal
====DPE (Matsumoto)====
|author=Hongting Tang, Yanling Wu, Jiliang Deng, Nanzhu Chen, Zhaohui Zheng, Yongjun Wei, Xiaozhou Luo, and Jay D. Keasling
{{main|Downstream promoter element gene transcriptions#DPE (Matsumoto) samplings}}
|title=Promoter Architecture and Promoter Engineering in ''Saccharomyces cerevisiae''
|journal=Metabolites
|date=6 August 2020
|volume=10
|issue=8
|pages=320-39
|url=https://www.mdpi.com/2218-1989/10/8/320/pdf
|arxiv=
|bibcode=
|doi=10.3390/metabo10080320
|pmid=32781665
|accessdate=18 September 2020 }}</ref>


bHLH proteins typically bind to a consensus sequence called an E-box, CANNTG.<ref name="pmid10319327">{{cite journal |author=Chaudhary J, Skinner MK |title=Basic helix-loop-helix proteins can act at the E-box within the serum response element of the c-fos promoter to influence hormone-induced promoter activation in Sertoli cells |journal=Mol. Endocrinol. |volume=13 |issue=5 |pages=774–86 |date=1999 |pmid=10319327 |doi=10.1210/mend.13.5.0271 }}</ref>
====EIN3 binding sites====
{{main|EIN3 binding site gene transcriptions#EIN3 samplings}}


"A computer search for transcription promoter elements [...] showed the presence of a prominent TATA box 22 nucleotides upstream of the transcription start site and an [[Sp1]] site at position -42 to -33. The 5'-flanking sequence also contains three E boxes with CANNTG consensus sequences at positions -464 to -459, -90 to -85, and -52 to -47 that have been marked as [[E box]], [[E1 box]], and [[E2 box]], respectively [...]. In addition, the 5'-flanking region contains one or more [[GRE]], [[Aryl hydrocarbon receptor#DNA binding (xenobiotic response element – XRE)|XRE]], [[GATA1|GATA-1]], [[ATF4|GCN-4]], [[ETV4|PEA-3]], [[AP-1 (transcription factor)|AP1]], and [[Activating protein 2|AP2]] consensus motifs and also three imperfect CArG sites [...]."<ref name=Lenka>{{ cite journal
====Endosperm expressions====
|author=Nibedita Lenka, Aruna Basu, Jayati Mullick, and Narayan G. Avadhani
{{main|Endosperm expression gene transcriptions#Endosperm expression samplings}}
|title=The role of an E box binding basic helix loop helix protein in the cardiac muscle-specific expression of the rat cytochrome oxidase subunit VIII gene
|journal=The Journal of Biological Chemistry
|date=22 November 1996
|volume=271
|issue=47
|pages=30281–30289
|url=http://www.jbc.org/content/271/47/30281.full.pdf
|arxiv=
|bibcode=
|doi=10.1074/jbc.271.47.30281
|pmid=
|accessdate=7 February 2019 }}</ref>


====Aryl hydrocarbon responsive DNA-binding consensus sequences====
====Estrogen response elements====
{{main|Estrogen response element gene transcriptions}}


The TCDD*AhR DNA-binding consensus sequence is GCGTGNN(A/T)NNN(C/G).<ref name=Yao/>
=====ERE1s=====
{{main|Estrogen response element gene transcriptions#ERE1 (Driscoll) samplings}}


These AhR DNA-binding consensus sequences occur only in the positive direction, eleven sequences all in the distal promoter: three on the negative strand and eight on the positive strand for an occurrence of 5.5. All of the real occurrences were closer to ZNF497 than to A1BG, suggesting promotion of ZNF497. For all four promoters the occurrence would be 2.75.
=====ERE2s=====
{{main|Estrogen response element gene transcriptions#EREs (Driscoll) samplings}}


The random datasets had thirteen sequences in twenty strands for an occurrence of 0.65 per strand independent of direction. One sequence occurred in the arbitrarily chosen negative direction in the UTR out of ten for 0.1, likewise for the core promoter in the positive direction for 0.1. For the distal promoters there were six in the negative direction and five in the positive for occurrences of 0.6 and 0.5 or 0.55.
====GAAC elements====
{{main|GAAC element gene transcriptions#GAAC element samplings}}


The unusual distribution of AhRY elements suggests that they are likely active or activable, even if not for A1BG.
====GC boxes (Briggs)====
{{main|GC box gene transcriptions#GC box (Briggs) samplings}}


====Aryl hydrocarbon responsive elements II====
====GC boxes (Ye)====
{{main|GC box gene transcriptions#GC box (Ye) samplings}}


CATGN<sub>6</sub>C(A/T)TG is the consensus sequence for AHRE-II.<ref name=Boutros>{{ cite journal
====GC boxes (Zhang)====
| author = Boutros PC, Moffat ID, Franc MA, Tijet N, Tuomisto J, Pohjanvirta R, Okey AB
{{main|GC box gene transcriptions#GC box (Zhang) samplings}}
| title = Dioxin-responsive AHRE-II gene battery: identification by phylogenetic footprinting
| journal = Biochemical and Biophysical Research Communications
| volume = 321
| issue = 3
| pages = 707–15
| date = August 2004
| pmid = 15358164
| doi = 10.1016/j.bbrc.2004.06.177 }}</ref><ref name=Sogawa>{{ cite journal
| author = Sogawa K, Numayama-Tsuruta K, Takahashi T, Matsushita N, Miura C, Nikawa J, Gotoh O, Kikuchi Y, Fujii-Kuriyama Y
| title = A novel induction mechanism of the rat CYP1A2 gene mediated by Ah receptor-Arnt heterodimer
| journal = Biochemical and Biophysical Research Communications
| volume = 318
| issue = 3
| pages = 746–55
| date = June 2004
| pmid = 15144902
| doi = 10.1016/j.bbrc.2004.04.090 }}</ref>


Between ZSCAN22 and A1BG (negative direction) on the positive strand is the consensus sequence CATGGTGGCTCATG at 4116. For four strands (2) and directions (2) there is only one occurrence, for 0.25. Using twenty random datasets (ten for the direct and ten for the complement inverse), no consensus sequence for AHRE-II was found, for an occurrence of 0.0. This suggests that the one occurrence is not random but likely active or activable.
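
The match between CATGGTGGCTCATG and the AHRE-II consensus, and the occurrence of 0.25, can be reproduced with a minimal sketch. It assumes that N stands for any nucleotide and that a number after N (written as a subscript above) is a repeat count. The function name <code>to_regex</code> is hypothetical.

```python
import re


def to_regex(consensus: str) -> str:
    """Translate notation such as 'CATGN6C(A/T)TG' into a regex string."""
    # (A/T) -> [AT]
    s = re.sub(
        r"\(([ACGT/]+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )
    # N6 -> [ACGT]{6}; assumption: the digit is a repeat count
    s = re.sub(r"N(\d+)", r"[ACGT]{\1}", s)
    # any remaining bare N -> [ACGT]
    return s.replace("N", "[ACGT]")


AHRE_II = "CATGN6C(A/T)TG"            # AHRE-II consensus, N6 written inline
AHR = "GCGTGNN(A/T)NNN(C/G)"          # TCDD*AhR consensus quoted earlier
SITE = "CATGGTGGCTCATG"               # sequence reported at 4116

is_ahre_ii = re.fullmatch(to_regex(AHRE_II), SITE) is not None

# One real site over two strands and two directions.
real_occurrence = 1 / (2 * 2)
# No sites in twenty random datasets.
random_occurrence = 0 / 20

print(to_regex(AHRE_II), is_ahre_ii, real_occurrence, random_occurrence)
```

The spacer GTGGCT fills the six N positions and CATG satisfies C(A/T)TG, so the reported sequence is a full match under these assumptions.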
====GCR1s====
{{main|Gcr1p gene transcriptions#GCR1 samplings}}


====Antioxidant-electrophile responsive elements====
====GREs====
{{main|Gibberellin responsive element gene transcriptions#GRE samplings}}


Using the ARE Consensus GC(A/C/T)(A/G/T)(A/G/T)(C/G/T)T(A/C)A<ref name=Lacher>{{ cite journal
====GT boxes (Sato)====
|author=Sarah E. Lacher, Daniel C. Levings, Samuel Freeman, Matthew Slattery
{{main|TC element gene transcriptions#GT box (Sato) samplings}}
|title=Identification of a functional antioxidant response element at the HIF1A locus
 
|journal=Redox Biology
====Hex sequences====
|date=October 2018
{{main|Hex sequence gene transcriptions#Hex core samplings}}
|volume=19
|issue=
|pages=401-411
|url=https://www.sciencedirect.com/science/article/pii/S2213231718305391
|arxiv=
|bibcode=
|doi=10.1016/j.redox.2018.08.014
|pmid=
|accessdate=6 October 2020 }}</ref> to look for more general AREs, many occur in the promoters of A1BG, where the putative ARE from Human, Chimp, Gorilla, Rhesus, Mouse, and Rat is TGCTGAGTCAT, inside the outer Ts.<ref name=Lacher/> The GCRE (TGAGTCA) occurs within the ARE (Lacher).


The ARE Consensus (Lacher ''et al.'' 2018) occurs in the negative direction UTR (1.0,2.0) and all four distal promoters (1.0,3.0) and (1.0,1.0).
====HY boxes====
{{main|HY box gene transcriptions#HY box samplings}}


The random datasets had seventeen for ten datasets (1.7) in the UTR. In the distal promoters, the reals have six for four (1.5), whereas the randoms have twenty-three for forty (0.575). While it appears the UTR sequences could be random as they are encompassed by the randoms, the discrepancy between reals in the distal promoters and the randoms is significant, suggesting that overall the sequences are likely active or activable.
====IFNs====
{{main|Interferon regulatory factor gene transcriptions#IFN-stimulated response element samplings}}


The predominant consensus sequence for Human, Chimp, Gorilla, Rhesus, Mouse, and Rat is TGCTGAGTCAT.<ref name=Lacher/> This does not include the outside Ts, one at each end. The predominant consensus sequence does not occur in the promoters of A1BG.
====Inr-like, TCTs====
{{main|Initiator element gene transcriptions#Inr-like, TCTs sampling}}


====CAT boxes====
====IRF3s====
{{main|CAT box gene transcriptions}}
{{main|Interferon regulatory factor gene transcriptions#IRF-3 samplings}}
The "5‘ flanking region of the rat acetylcholine receptor (AChR) ''β'' subunit gene [with] regulatory elements that confer muscle specificity [includes] a minimal TATA-box-less promoter region containing an initiator motif. An 85-bp fragment [promotes] high muscle-specific expression of a chloramphenicol acetyltransferase (CAT) reporter construct upon transfection in primary muscle cells. This sequence can be functionally dissected in a basal muscle-specific promoter element carrying a M-CAT box that is flanked at the 5’ end by an enhancer element with two binding sites for myogenic factors. Point mutations in the M-CAT box cause the loss of transcriptional activity of the basal promoter fragment. The enhancer activity depends on the presence of both E boxes that cooperate in a synergistic fashion. [The] control of muscle-specific and developmental expression of the rat AChR ''β'' subunit gene requires both regulatory elements, the M-CAT box and two adjacent E boxes, located in close proximity to each other."<ref name=Berberich>{{ cite journal
|author=Christof Berberich, Ingolf Dürr, Michael Koenen and Veit Witzemann
|title=Two adjacent E box elements and a M‐CAT box are involved in the muscle‐specific regulation of the rat acetylcholine receptor β subunit gene
|journal=European Journal of Biochemistry
|date=September 1993
|volume=216
|issue=2
|pages=395-404
|url=https://febs.onlinelibrary.wiley.com/doi/pdf/10.1111/j.1432-1033.1993.tb18157.x
|arxiv=
|bibcode=
|doi=10.1111/j.1432-1033.1993.tb18157.x
|pmid=
|accessdate=27 December 2019 }}</ref>


"The M-CAT consensus sequence [is] CATTCCT".<ref name=Berberich/>
====IRSs====
{{main|Interferon regulatory factor gene transcriptions#IRS consensus samplings}}


====CAT-box-like elements====
====KAR2s====
{{main|Hac1p gene transcriptions#KAR2 samplings}}


"A CAT-box-like element, GCCATT [34], adjacent to the GC-box, is conserved in the three promoters."<ref name=Berberich/>
====MBE1s====
{{main|Musashi binding element gene transcriptions#MBE1 samplings}}


The reals have three sequences in the UTR with an occurrence of 1.5. The randoms had eight sequences for an occurrence of 0.8.
====MBE2s====
{{main|Musashi binding element gene transcriptions#MBE2 samplings}}


The reals have none in the core promoters or proximal promoters, whereas the randoms had one in the core promoter for an occurrence of 0.1 and none in the proximal promoters.
====MBE3s====
{{main|Musashi binding element gene transcriptions#MBE3 samplings}}


The reals have one distal promoter sequence for an occurrence of 0.25 in the negative direction out of two for each direction. The randoms had fourteen in the arbitrary negative direction for an occurrence of 1.4. The randoms had sixteen in the positive direction for an occurrence of 1.6.
====NF𝜿BSs====
{{main|Nuclear factor gene transcriptions#NF𝜿B (Sato) samplings}}


The disparity between the reals and randoms suggests that the reals are likely active or activable.
====PREs====
{{main|Polycomb response element gene transcriptions#Core samplings}}


===="Class C"====
====Pribs====
{{main|Pribnow box gene transcriptions#Pribnow box samplings}}


The "Class C" DNA binding site at position -379/-374 in a reverse (-) orientation with a consensus sequence of CACGNG of the bHLH Hey-1 protein had a strong DNA binding activity.<ref name=Leal>{{ cite journal
====RAREs====
|author=María C. Leal, Ezequiel I. Surace, María P. Holgado, Carina C. Ferrari, Rodolfo Tarelli, Fernando Pitossi, Thomas Wisniewski, Eduardo M. Castaño, and Laura Morelli
{{main|Retinoic acid response element gene transcriptions#RARE samplings}}
|title=Notch signaling proteins HES-1 and Hey-1 bind to insulin degrading enzyme (IDE) proximal promoter and repress its transcription and activity: Implications for cellular Aβ metabolism
|journal=Biochim Biophys Acta
|date=19 October 2011
|volume=1823
|issue=2
|pages=227-235
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3307219/
|arxiv=
|bibcode=
|doi=10.1016/j.bbamcr.2011.09.014
|pmid=22036964
|accessdate=21 March 2021 }}</ref>


The real promoters have five consensus sequences in the UTR negative direction with an occurrence of 2.5. In the core promoters there are only two sequences with an occurrence of 0.5. Only one sequence occurs in the proximal promoter for an occurrence of 0.25. The distal promoters have seven in the negative direction for an occurrence of 3.5 and thirty-two in the positive direction for an occurrence of 3.2.
====Rgts====
{{main|Rgt1p gene transcriptions#RGT samplings}}


The random datasets had fifteen consensus sequences for an occurrence of 1.5. There are three in the core promoter for an occurrence of 0.3. None in the proximal promoters. Sixteen sequences were in the negative direction distal promoter for an occurrence of 1.6. Thirty-four sequences were in the arbitrary positive direction for an occurrence of 3.4.
====ROREs====
{{main|ROR-response element gene transcriptions#RORE samplings}}


In each promoter except the arbitrary positive direction distal promoter, the real occurrences exceed the randoms. In the distals, real occurrences are 3.5 (negative) and 3.2 (positive), where the randoms had 3.4. This suggests that the occurrences in the distal promoters may be random but the occurrences of consensus sequences in the other promoters are likely active or activable.
====SERVs====
{{main|Servenius sequence gene transcriptions#Servenius samplings}}


====Dioxin-responsive elements====
====STAT5s====
{{main|STAT5 gene transcription laboratory#STAT5 samplings}}


"The DRE consensus sequence, 5′-TNGCGTG-3′ is conserved throughout most species and occurs in multiples within the promoter of a target gene [1, 4]."<ref name=Pinne>{{ cite book
====STREs====
|author=Marija Pinne and Judy L. Raucy
{{main|Msn2,4p gene transcriptions#Stress-response element samplings}}
|title=Cytochrome P450 Gene Regulation: Reporter Assays to Assess Aryl Hydrocarbon Receptor (HLHE76, AhR) Activation and Antagonism, In: ''Cytochrome P450. Methods in Pharmacology and Toxicology''
|publisher=Humana
|location=New York, NY USA
|date=10 July 2021
|pages=157-174
|url=https://link.springer.com/protocol/10.1007/978-1-0716-1542-3_10
|arxiv=
|bibcode=
|doi=10.1007/978-1-0716-1542-3_10
|{{isbn|978-1-0716-1541-6}}
|pmid=
|accessdate=1 November 2021 }}</ref>


The dioxin response element occurs in the UTR between ZSCAN22 and A1BG on the positive strand, negative direction, as the complement inverse CACGCCA at 3282. It also occurs in the distal promoters, both in the negative direction and, with three occurrences, on the positive strand in the positive direction, for a total of eight occurrences in four strands, or 2.0.
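
The "complement inverse" used above is the reverse complement of a sequence. A minimal sketch follows, assuming only the bases A, C, G, T and the wildcard N; the function name <code>complement_inverse</code> is hypothetical.

```python
# Complement of each base; N (any base) pairs with N.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

def complement_inverse(sequence: str) -> str:
    """Return the complement of the sequence read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))

# DRE consensus 5'-TNGCGTG-3' with N = G gives the CACGCCA found at 3282.
print(complement_inverse("TGGCGTG"))
# The wildcard is preserved in the complement inverse of the consensus itself.
print(complement_inverse("TNGCGTG"))
```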
====Sucroses====
{{main|Sucrose box gene transcriptions#Sucrose box samplings}}


With the twenty random datasets, there are only eleven occurrences for 0.55, which suggests these DREs are likely active or activable. The random occurrences were also in the UTR (one) and distal promoters (10).
====TACTs====
{{main|TACTAAC box gene transcriptions#TACT samplings}}


====E-boxes====
====TAGteams====
{{main|TAGteam gene transcriptions#TAGteam samplings}}


Consensus sequences: CACGTG.<ref name=Chaudhary>{{ cite journal
====TAPs====
|author=Jaideep Chaudhary and Michael K. Skinner
{{main|Tapetum box gene transcriptions#Tapetum box samplings}}
|title=Basic Helix-Loop-Helix Proteins Can Act at the E-Box within the Serum Response Element of the c-fos Promoter to Influence Hormone-Induced Promoter Activation in Sertoli Cells
|journal=Molecular Endocrinology
|date=May 1999
|volume=13
|issue=5
|pages=774-86
|url=http://mend.endojournals.org/content/13/5/774.short
|arxiv=
|bibcode=
|doi=10.1210/me.13.5.774
|pmid=10319327
|accessdate=2013-06-14 }}</ref>


=====ChoRE motifs=====
====TATAs====
{{main|TATA box gene transcriptions#TATA box samplings}}
Examining the promoter regions upstream from ZSCAN22 to A1BG and downstream from ZNF497 to A1BG for TATA boxes has shown that TATA boxes in various forms are present and likely active or activable: (1) TATAAAA (Carninci 2006), (2) TATA(A/T)A(A/T) (Watson 2014), (3) TATA(A/T)AA(A/G) (Juven-Gershon 2010), and (4) TATA(A/T)A(A/T)(A/G) (Basehoar 2004).


ATCTTG and TCCGCC are the two E-boxes in ChoRE motifs.<ref name=Long/>
The TATA boxes have the pattern of appearing in only the negative direction UTRs, proximal and distals. The shorter TATA box: TATAAA does appear as above but also in the positive direction as the complement inverse TTTATA at 2588 in the distal promoter.


For the first enhancer box (ATCTTG) there is one sequence in the UTR for an occurrence of 0.5 and two sequences in the proximal promoter (positive strand, positive direction) for 0.5.
====TATABs====
{{main|TATA box gene transcriptions#TATA box (Butler 2002) samplings}}


In the distal promoter there are two consensus sequences for 0.75.
====TATACs====
{{main|TATA box gene transcriptions#TATA boxes (Carninci 2006) samplings}}


The random sequences had five in the UTR for an occurrence of 2.5, none in the core or proximal promoters, and eleven in the negative direction for an occurrence of 1.1. In the arbitrary positive direction there were eleven for an occurrence of 1.1.
====TATAJs====
{{main|TATA box gene transcriptions#TATA box (Juven-Gershon 2010) samplings}}


There is no apparent match between random datasets and the real promoters. The consensus sequences are likely active or activable.
====TATAWs====
{{main|TATA box gene transcriptions#TATA box (Watson 2014) samplings}}


The real occurrences of Carb E3 (TCCGCC) are in the UTR for 1.0, zero in the core promoters, proximal promoters have an occurrence of 0.5, the distal promoters have thirteen in the negative direction for an occurrence of 6.5 and five in the positive direction for an occurrence of 2.5.
====TEAs====
{{main|TEA consensus sequence gene transcriptions#TEA samplings}}


The random datasets had two occurrences in the UTR for 0.2, one in the core promoters for 0.05, zero in the proximal promoters, and in the distal promoters 0.6 in the negative direction and 2.0 in the arbitrary positive direction, or an overall occurrence of 1.6.
====TECs====
{{main|Tec1p gene transcriptions#Tec1 samplings}}


Comparing the real occurrences of Carb E3 to the random occurrences indicates that the reals are likely active or activable.
====THRs====
{{main|Thyroid hormone response element gene transcriptions#THR samplings}}


=====Phors=====
====TRFs====
{{main|Telomeric repeat DNA-binding factor gene transcriptions#TRF samplings}}


"The [palindromic E-box motif (CACGTG)] motif is bound by the transcription factor Pho4, [and has the] class of basic helix-loop-helix DNA binding domain and core recognition sequence (Zhou and O'Shea 2011)."<ref name=Rossi/>
====UPREs====
{{main|Unfolded protein response element gene transcriptions#UPRE samplings}}


{|class="wikitable"
====UPRE-1s====
|-
{{main|Hac1p gene transcriptions#UPRE-1 samplings}}
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 0 || 2 || 0 || 0
|-
| Randoms || UTR || arbitrary negative || 2 || 10 || 0.2 || 0.15
|-
| Randoms || UTR || alternate negative || 1 || 10 || 0.1 || 0.15
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 0 || 10 || 0 || 0.05
|-
| Randoms || Core || alternate positive || 1 || 10 || 0.1 || 0.05
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Proximal || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Distal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary negative || 2 || 10 || 0.2 || 0.25
|-
| Randoms || Distal || alternate negative || 3 || 10 || 0.3 || 0.25
|-
| Reals || Distal || positive || 5 || 2 || 2.5 || 2.5
|-
| Randoms || Distal || arbitrary positive || 4 || 10 || 0.4 || 0.3
|-
| Randoms || Distal || alternate positive || 2 || 10 || 0.2 || 0.3
|}


Comparison:
====URS (Sumrada, core)====
{{main|DNA damage response element gene transcriptions#URS1 (Sumrada, core) samplings}}


The occurrences of real E-box motif are greater than the randoms. This suggests that the real E-box motifs are likely active or activable.
====VDREs====
{{main|Vitamin D response element gene transcriptions#VDRE samplings}}


====E2 boxes====
====XCPE1s====
{{main|X core promoter element gene transcriptions#XCPE1 samplings}}


"The most dramatic impact on immunoglobulin gene enhancer activity was observed upon mutation of sites that contain an E2-box motif (G/ACAGNTGN)."<ref name=Murre>{{ cite journal
====Yaps====
|author=Cornelis Murre and David Baltimore
{{main|Yap1p,2p gene transcriptions#Yap samplings}}
|title=The Helix-Loop-Helix Motif: Structure and Function, In: ''Transcriptional Regulation''
|publisher=Cold Spring Harbor Laboratory Press
|location=
|date=1992
|volume=22B
|editor=
|pages=861-79
|url=https://cshmonographs.org/csh/index.php/monographs/article/viewPDFInterstitial/3449/2723
|arxiv=
|bibcode=
|doi=10.1101/087969425.22B.861
|pmid=
|isbn=
|accessdate=2017-02-08 }}</ref>


{|class="wikitable"
====YYRNWYY Inrs====
|-
{{main|Initiator element gene transcriptions#YYRNWYY samplings}}
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR (ZSCAN22-A1BG) || negative || 5 || 2 || 2.5 || 2.5
|-
| Randoms || UTR (ZSCAN22-A1BG) || arbitrary negative || 13 || 10 || 1.3 || 0.8
|-
| Randoms || UTR (ZSCAN22-A1BG) || alternate negative || 3 || 10 || 0.3 || 0.8
|-
| Reals || Distal || negative || 7 || 2 || 3.5 || 3.5
|-
| Randoms || Distal || negative || 7 || 10 || 0.7 || 0.9
|-
| Reals || Distal || positive || 1 || 2 || 0.5 || 0.5
|-
| Randoms || Distal || positive || 11 || 10 || 1.1 || 0.9
|}


Comparisons:
==A1BG orthologs==
# For the UTR the reals are systematically higher in occurrence than the randoms.
# For the distals the occurrence in the negative direction is systematically higher and the positive direction is systematically lower than for the randoms.


Conclusion: the reals are likely active or activable.
===''Geotrypetes seraphini''===
[[Image:Geotrypetes seraphini 81151944.jpg|thumb|right|250px|''Geotrypetes seraphini'', the Gaboon caecilian, is a species of amphibian. Credit: [https://www.inaturalist.org/users/7865 Marius Burger].{{tlx|free media}}]]
''Geotrypetes seraphini'', the Gaboon caecilian, is a species of amphibian in the family ''Dermophiidae''.<ref name=IUCN>{{cite journal |author=IUCN SSC Amphibian Specialist Group |date=2019 |title=''Geotrypetes seraphini'' |volume=2019 |page=e.T59557A16957715 |url=https://en.wikipedia.org/wiki/IUCN_Red_List
|doi=10.2305/IUCN.UK.2019-1.RLTS.T59557A16957715.en |accessdate=16 November 2021}}</ref>


====Enhancer boxes====
Its A1BG ortholog has 368 aa vs 495 aa for ''Homo sapiens''.
The consensus sequence for the Enhancer box element is CANNTG, with a palindromic canonical sequence of CACGTG.<ref name=Chaudhary/>
{{clear}}


{|class="wikitable"
==ZSCAN22==
|-
{{main|ZSCAN22}}
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
# Gene ID: 342945 is ZSCAN22 zinc finger and SCAN domain containing 22 on 19q13.43.<ref name=HGNC342945>{{ cite web
|-
|author=HGNC
| Reals || UTR || negative || 13 || 2 || 6.5 || 6.5
|title=ZSCAN22 zinc finger and SCAN domain containing 22 [ Homo sapiens (human) ]
|-
|publisher=National Center for Biotechnology Information
| Randoms || UTR || arbitrary negative || 26 || 10 || 2.6 || 2.4
|location=U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA
|-
|date=13 March 2020
| Randoms || UTR || alternate negative || 22 || 10 || 2.2 || 2.4
|url=https://www.ncbi.nlm.nih.gov/gene/342945
|-
|accessdate=2019-12-18 }}</ref> ZSCAN22 is transcribed in the negative direction from LOC100887072.<ref name=HGNC342945/>
| Reals || Core || negative || 0 || 2 || 0 || 0.5
# Gene ID: 102465484 is MIR6806 microRNA 6806 on 19q13.43: "microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop."<ref name=RefSeq102465484>{{ cite web
|-
|author=RefSeq
| Randoms || Core || negative || 0 || 10 || 0 || 0
|title=MIR6806 microRNA 6806 [ Homo sapiens (human) ]
|-
|publisher=National Center for Biotechnology Information
| Reals || Core || positive || 2 || 2 || 1.0 || 0.5
|location=U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA
|-
|date=10 September 2009
| Randoms || Core || positive || 0 || 10 || 0 || 0
|url=https://www.ncbi.nlm.nih.gov/gene/102465484
|-
|accessdate=2019-12-18 }}</ref> MIR6806 is transcribed in the negative direction from LOC105372480.<ref name=RefSeq102465484/>
| Reals || Proximal || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || Proximal || negative || 5 || 10 || 0.5 || 0.4
|-
| Reals || Proximal || positive || 1 || 2 || 0.5 || 0.5
|-
| Randoms || Proximal || positive || 3 || 10 || 0.3 || 0.4
|-
| Reals || Distal || negative || 17 || 2 || 8.5 || 13.25
|-
| Randoms || Distal || negative || 42 || 10 || 4.2 || 4.85
|-
| Reals || Distal || positive || 36 || 2 || 18.0 || 13.25
|-
| Randoms || Distal || positive || 55 || 10 || 5.5 || 4.85
|}
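
The Occurrences and Averages columns follow from simple arithmetic: each count is divided by the number of strands or datasets, and the negative and positive direction values are averaged. A minimal sketch, assuming that reading of the columns; the function names are hypothetical.

```python
def occurrence(count: int, strands: int) -> float:
    """Occurrences = number of consensus sequences per strand (or per dataset)."""
    return count / strands

def direction_average(neg_count: int, pos_count: int, strands: int):
    """Return (negative occurrence, positive occurrence, their average)."""
    neg = occurrence(neg_count, strands)
    pos = occurrence(pos_count, strands)
    return neg, pos, (neg + pos) / 2

# Distal promoter rows of the enhancer box table above.
print(direction_average(17, 36, 2))    # reals: counts 17 and 36 over 2 strands
print(direction_average(42, 55, 10))   # randoms: counts 42 and 55 over 10 datasets
```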
 
Comparison:
 
The occurrences of real enhancer box consensus sequences are larger than the randoms, except the proximals are within error of the randoms. This suggests that the enhancer box consensus sequences are likely active or activable.
 
====GATAs====
{{main|GATA gene transcriptions}}
Although "the P3, P6 substitutions alter the conserved 'GATAAG' I box motif, a 'GATA' motif is present in the introduced ''EcoRV'' site. This introduced GATA sequence clearly does not serve as a functional I box [...]."<ref name=Donald>{{ cite journal
|author=Robert G. K. Donald and Anthony R. Cashmore
|title=Mutation of either G box or I box sequences profoundly affects expression from the ''Arabidopsis rbcS‐1A'' promoter
|journal=The EMBO Journal
|date=1990
|volume=9
|issue=6
|pages=1717-1726
|url=https://onlinelibrary.wiley.com/doi/pdf/10.1002/j.1460-2075.1990.tb08295.x
|arxiv=
|bibcode=
|doi=10.1002/j.1460-2075.1990.tb08295.x
|pmid=
|accessdate=8 November 2018 }}</ref>


The GATA box is part of the [[DAF-16-associated element gene transcriptions|DAF-16-associated elements]] TGATAAG, the [[DNA replication-related element gene transcriptions|DNA replication-related elements]] TATCGATA, the [[I box gene transcriptions|F boxes]] TGATAAG, the [[I box gene transcriptions|Iboxes]] GATAAG, the [[Shoot specific element gene transcriptions|Shoot specific elements]] GATAATGATG, the [[Cytokinin response regulator gene transcriptions|Cytokinin response regulators]] (ARR10s) (A/G)GATA(A/C)G, the [[Cytokinin response regulator gene transcriptions|Cytokinin response regulators]] (ARR12s) (A/G)AGATA, the [[HNF6 gene transcriptions|HNF6s]] (A/G/T)(A/T)(A/G)T(C/T) (A/C/G)AT(A/C/G/T)(A/G/T) Positive strand, negative direction: TAGTTGATAA at 3527, and the [[Tat box gene transcriptions|TAT Boxes]] TATCCAT Negative strand, negative direction: ATGGATA at 2996.
Of the approximately 111 gaps between genes on chromosome locus 19q13.43 as of 4 August 2020, gap number 88 is between ZSCAN22 and A1BG, but there is no gap between ZNF497 and A1BG.


{|class="wikitable"
==Promoters==
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 15 || 2 || 7.5 || 7.5
|-
| Randoms || UTR || arbitrary negative || 59 || 10 || 5.9 || 5.45
|-
| Randoms || UTR || alternate negative || 50 || 10 || 5.0 || 5.45
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || negative || 0 || 10 || 0 || 0.35
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || positive || 7 || 10 || 0.7 || 0.35
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0.25
|-
| Randoms || Proximal || negative || 4 || 10 || 0.4 || 0.5
|-
| Reals || Proximal || positive || 1 || 2 || 0.5 || 0.25
|-
| Randoms || Proximal || positive || 6 || 10 || 0.6 || 0.5
|-
| Reals || Distal || negative || 21 || 2 || 10.5 || 8.25
|-
| Randoms || Distal || negative || 76 || 10 || 7.6 || 9.4
|-
| Reals || Distal || positive || 12 || 2 || 6 || 8.25
|-
| Randoms || Distal || positive || 112 || 10 || 11.2 || 9.4
|}


Comparison:
The core promoter begins approximately -35 nts upstream from the transcription start site (TSS). For the numbered nucleotides between ZSCAN22 and A1BG the core promoter extends from 4425 nts up to 4460 nts (TSS). The proximal promoter extends from approximately -250 to the TSS or 4210 nts up to 4460 nts. The distal promoter begins at about 2460 nts and extends to about 4210 nts.


The real GATA occurrences in the UTR are systematically larger than the randoms. The proximal promoters are apparently random or just outside the random range. The real GATA occurrences in the distal promoters are systematically lower than the randoms. This suggests that the real GATA consensus sequences are likely active or activable when occurring in the UTR or distals but may be random when occurring in the proximals.
From the ZNF497 side the core promoter begins about 4265 nts up to 4300 nts, the proximal promoter from 4050 nts to 4265 nts, and the distal promoter from 2300 nts to 4050 nts.
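
The promoter boundaries quoted for both sides are consistent with fixed offsets of 35, 250 and 2000 nts upstream of the TSS. The sketch below classifies a numbered nucleotide position under that assumption. The function name, the label "other" for positions outside these ranges, and the choice to assign the last 35 nts to the core rather than the proximal promoter are illustrative choices, not part of the original analysis.

```python
def promoter_region(position: int, tss: int) -> str:
    """Classify a position relative to a TSS using the offsets 35, 250 and 2000."""
    if tss - 35 <= position <= tss:
        return "core"
    if tss - 250 <= position < tss - 35:
        return "proximal"
    if tss - 2000 <= position < tss - 250:
        return "distal"
    return "other"

TSS_ZSCAN22_SIDE = 4460  # numbered nucleotides between ZSCAN22 and A1BG
TSS_ZNF497_SIDE = 4300   # numbered nucleotides from the ZNF497 side

print(promoter_region(4430, TSS_ZSCAN22_SIDE))
print(promoter_region(4239, TSS_ZSCAN22_SIDE))
print(promoter_region(2588, TSS_ZNF497_SIDE))
```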


====Gln3s====
==Alpha-1-B glycoprotein==
{{main|Alpha-1-B glycoprotein}}
'''Def.''' "a substance that induces an immune response, usually foreign"<ref name=AntigenWikt>{{ cite web
|author=[[wikt:User:Jag123|Jag123]]
|title=antigen
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=7 March 2005
|url=https://en.wiktionary.org/wiki/antigen
|accessdate=7 March 2020 }}</ref> is called an '''antigen'''.


"Upstream noncoding regulatory sequences were retrieved and analyzed using Regulatory Sequence Analysis Tools (34). The program DNA-Pattern was used to search for and catalogue occurrences of consensus [[Gcn4p gene transcriptions|GCRE]] (TGABTVW) [TGA(C/G/T)T(A/C/G)(A/T)] and GATA (GATAAG, GATAAH, GATTA) motifs in yeast promoters."<ref name=Staschke>{{ cite journal
'''Def.''' any "substance that elicits [an] immune response"<ref name=ImmunogenWikt>{{ cite web
|author=Kirk A. Staschke, Souvik Dey, John M. Zaborske, Lakshmi Reddy Palam, Jeanette N. McClintick, Tao Pan, Howard J. Edenberg, and Ronald C. Wek
|author=[[wikt:User:SemperBlotto|SemperBlotto]]
|title=Integration of General Amino Acid Control and Target of Rapamycin (TOR) Regulatory Pathways in Nitrogen Assimilation in Yeast
|title=immunogen
|journal=The Journal of Biological Chemistry
|publisher=Wikimedia Foundation, Inc
|date=May 28, 2010
|location=San Francisco, California
|volume=285
|date=21 April 2008
|issue=22
|url=https://en.wiktionary.org/wiki/immunogen
|pages=16893–16911
|accessdate=8 March 2020 }}</ref> is called an '''immunogen'''.
|url=https://www.jbc.org/content/285/22/16893.full.pdf
|arxiv=
|bibcode=
|doi=10.1074/jbc.M110.121947
|pmid=
|accessdate=4 January 2021 }}</ref>
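
The motifs in the quotation are written with IUPAC ambiguity codes (B = C/G/T, V = A/C/G, W = A/T, H = A/C/T). A minimal sketch of expanding such codes into a regular expression follows; it is not the DNA-Pattern program cited above, and the function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_regex(motif: str) -> str:
    """Expand each ambiguity code into a character class, e.g. B -> [CGT]."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

def matches(motif: str, sequence: str) -> bool:
    """True if the whole sequence is one instance of the IUPAC motif."""
    return re.fullmatch(iupac_to_regex(motif), sequence.upper()) is not None

print(iupac_to_regex("TGABTVW"))      # agrees with TGA(C/G/T)T(A/C/G)(A/T)
print(matches("TGABTVW", "TGAGTCA"))  # the GCRE TGAGTCA is one instance
print(matches("GATAAH", "GATAAG"))    # H excludes G, so GATAAG is listed separately
```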


====Glucocorticoid response elements====
An antigen "or immunogen is a molecule that sometimes stimulates an immune system response."<ref name=AntigenWikidoc>{{ cite web
{{main|Glucocorticoid response element gene transcriptions}}
|author=C. Michael Gibson
"DNA-binding by the GR-DBD has been well-characterized; it is highly sequence-specific, directly recognizing invariant guanine nucleotides of two AGAACA [TGTTCT] half sites called the glucocorticoid response element (GRE), and binds as a dimer in head-to-head orientation with mid-nanomolar affinity (4,12–18). [...] The consensus DNA glucocorticoid response element (GRE) is comprised of two half-sites (AGAACA) separated by a three base-pair spacer (13,15,60,61)."<ref name=Parsonnet>{{ cite journal
|title=Antigen
|author=Nicholas V Parsonnet, Nickolaus C Lammer, Zachariah E Holmes, Robert T Batey, Deborah S Wuttke
|publisher=WikiDoc Foundation
|title=The glucocorticoid receptor DNA-binding domain recognizes RNA hairpin structures with high affinity
|location=Boston, Massachusetts
|journal=Nucleic Acids Research
|date=27 April 2008
|date=5 September 2019
|url=https://www.wikidoc.org/index.php/Antigen
|volume=47
|accessdate=8 March 2020 }}</ref> But, "the immune system does not consist of only antibodies",<ref name=AntigenWikidoc/> instead it "encompasses all substances that can be recognized by the [[adaptive immune system]]."<ref name=AntigenWikidoc/>
|issue=15
|pages=8180-8192
|url=https://academic.oup.com/nar/article/47/15/8180/5506867
|arxiv=
|bibcode=
|doi=10.1093/nar/gkz486
|pmid=31147715
|accessdate=28 August 2020 }}</ref>
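
A minimal sketch of searching for the element described in the quotation, assuming that the head-to-head arrangement corresponds to the half-site AGAACA, a three-nucleotide spacer, and the complement inverse half-site TGTTCT. The test sequence is invented for illustration, and the samplings tabulated in this section may have used a different search.

```python
import re

HALF_SITE = "AGAACA"
HALF_SITE_INVERSE = "TGTTCT"  # complement inverse of AGAACA

# Full element: half-site, any three nucleotides, inverted half-site.
FULL_GRE = re.compile(r"(?=(AGAACA[ACGT]{3}TGTTCT))")

def find_full_gres(sequence: str):
    """Return (1-based start, matched text) for each full GRE."""
    return [(m.start() + 1, m.group(1)) for m in FULL_GRE.finditer(sequence.upper())]

def find_half_sites(sequence: str):
    """Return (1-based start, half-site) for each AGAACA or TGTTCT."""
    seq = sequence.upper()
    hits = []
    for motif in (HALF_SITE, HALF_SITE_INVERSE):
        start = seq.find(motif)
        while start != -1:
            hits.append((start + 1, motif))
            start = seq.find(motif, start + 1)
    return sorted(hits)

example = "CCAGAACATCCTGTTCTGG"  # hypothetical sequence, not from the A1BG locus
print(find_full_gres(example))
print(find_half_sites(example))
```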


{|class="wikitable"
'''Def.''' "a protein produced by B-lymphocytes that binds to [a specific antigen or]<ref name=AntibodyWikt1>{{ cite web
|-
|author=[[wikt:User:Williamsayers79|Williamsayers79]]
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|title=antibody
|-
|publisher=Wikimedia Foundation, Inc
| Reals || UTR || negative || 5 || 2 || 2.5 || 2.5
|location=San Francisco, California
|-
|date=26 February 2007
| Randoms || UTR || arbitrary negative || 0 || 10 || 0 || 0.05
|url=https://en.wiktionary.org/wiki/antibody
|-
|accessdate=7 March 2020 }}</ref> an antigen"<ref name=AntibodyWikt>{{ cite web
| Randoms || UTR || alternate negative || 1 || 10 || 0.1 || 0.05
|author=[[wikt:User:Jag123|Jag123]]
|-
|title=antibody
| Reals || Core || negative || 0 || 2 || 0 || 0
|publisher=Wikimedia Foundation, Inc
|-
|location=San Francisco, California
| Randoms || Core || negative || 0 || 10 || 0 || 0
|date=7 March 2005
|-
|url=https://en.wiktionary.org/wiki/antibody
| Reals || Core || positive || 0 || 2 || 0 || 0
|accessdate=7 March 2020 }}</ref> is called an '''[[antibody]]'''.
|-
| Randoms || Core || positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0.25
|-
| Randoms || Proximal || negative || 0 || 10 || 0 || 0.05
|-
| Reals || Proximal || positive || 1 || 2 || 0.5 || 0.25
|-
| Randoms || Proximal || positive || 1 || 10 || 0.1 || 0.05
|-
| Reals || Distal || negative || 3 || 2 || 1.5 || 1.25
|-
| Randoms || Distal || negative || 5 || 10 || 0.5 || 0.5
|-
| Reals || Distal || positive || 2 || 2 || 1.0 || 1.25
|-
| Randoms || Distal || positive || 5 || 10 || 0.5 || 0.5
|}


Comparison:
Five different antibody isotypes are known in mammals, which perform different roles, and help direct the appropriate immune response for each different type of foreign object they encounter.<ref name=Market>{{ cite journal
 
|author=Eleonora Market, F. Nina Papavasiliou
The occurrences of real Glucocorticoid response elements are larger than the randoms. This suggests that the real Glucocorticoid response elements are likely active or activable.
|date=2003
 
|url=http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pbio.0000016
====ICRE (Lopes)====
|title=V(D)J Recombination and the Evolution of the Adaptive Immune System
{{main|Inositol, choline-responsive element gene transcriptions}}
|journal=PLoS Biology
|volume=1
|issue=1
|pages=e16
|doi=10.1371/journal.pbio.0000016 }}</ref>


====ICRE (Schwank)====
Although the general structure of all antibodies is very similar, a small region, known as the hypervariable region, at the tip of the protein is extremely variable, allowing millions of antibodies with slightly different tip structures to exist, where each of these variants can bind to a different target, known as an antigen.<ref name=Janeway5>{{ cite book | author = Charles A Janeway, Jr, Paul Travers, Mark Walport, and Mark J Shlomchik | title = Immunobiology | edition = 5th ed. | publisher = Garland Publishing | date = 2001 | url = http://www.ncbi.nlm.nih.gov/books/bv.fcgi?call=bv.View..ShowTOC&rid=imm.TOC&depth=10 | isbn = 0-8153-3642-X }}</ref>
{{main|Inositol, choline-responsive element gene transcriptions}}


====Pho4====
'''Def.''' "any of the glycoproteins in blood serum that respond to invasion by foreign antigens and that protect the host by removing pathogens;"<ref name=ImmunoglobulinWikt>{{ cite web
|author=[[wikt:User:SemperBlotto|SemperBlotto]]
|title= immunoglobulin
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=25 February 2006
|url=https://en.wiktionary.org/wiki/immunoglobulin
|accessdate=7 March 2020 }}</ref> "an antibody"<ref name=ImmunoglobulinWikt1>{{ cite web
|author=[[wikt:User:SemperBlotto|SemperBlotto]]
|title= immunoglobulin
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=28 April 2008
|url=https://en.wiktionary.org/wiki/immunoglobulin
|accessdate=7 March 2020 }}</ref> is called an '''[[immunoglobulin]]'''.


The upstream activating sequence (UAS) for Pho4p is CAC(A/G)T(T/G) in the promoters of ''HIS4'' and ''PHO5'' regarding phosphate limitation with respect to regulation of the purine and histidine biosynthesis pathways [66].<ref name=Tang/>
Gene ID: 1 is A1BG [[alpha-1-B glycoprotein]] on 19q13.43, a 54.3 kDa [[protein]] in humans that is encoded by the A1BG [[gene]].<ref name=RefSeq1>{{ cite web
 
|author=RefSeq
{|class="wikitable"
|title=A1BG alpha-1-B glycoprotein [ Homo sapiens (human) ]
|-
|publisher=National Center for Biotechnology Information
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|location=U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA
|-
|date=10 December 2019
| Reals || UTR || negative || 4 || 2 || 2 || 2 (1-3)  
|url=https://www.ncbi.nlm.nih.gov/gene/1
|-
|accessdate=2019-12-18 }}</ref> A1BG is transcribed in the positive direction from ZNF497.<ref name=RefSeq1/> "The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins."<ref name=RefSeq1/>
| Randoms || UTR || arbitrary negative || 12 || 10 || 1.2 || 1.05
# NP_570602.2 alpha-1B-glycoprotein precursor, '''cd05751''' Location: 401 → 493 Ig1_LILRB1_like; First immunoglobulin (Ig)-like domain found in Leukocyte Ig-like receptors (LILR)B1 (also known as LIR-1) and similar proteins, '''smart00410''' Location: 218 → 280 IG_like; Immunoglobulin like, '''pfam13895''' Location: 210 → 301 Ig_2; Immunoglobulin domain and '''cl11960''' Location: 28 → 110 Ig; Immunoglobulin domain.<ref name=RefSeq1/>
|-
| Randoms || UTR || alternate negative || 9 || 10 || 0.9 || 1.05
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 2 || 10 || 0.2 || 0.25
|-
| Randoms || Core || alternate positive || 3 || 10 || 0.3 || 0.25
|-
| Reals || Proximal || negative || 1 || 2 || 0.5 || 0.5 (0-1)
|-
| Randoms || Proximal || arbitrary negative || 2 || 10 || 0.2 || 0.15
|-
| Randoms || Proximal || alternate negative || 1 || 10 || 0.1 || 0.15
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 2 || 10 || 0.2 || 0.2  
|-
| Randoms || Proximal || alternate positive || 2 || 10 || 0.2 || 0.2
|-
| Reals || Distal || negative || 10 || 2 || 5 || 5 (1-9)  
|-
| Randoms || Distal || arbitrary negative || 23 || 10 || 2.3 || 2.1
|-
| Randoms || Distal || alternate negative || 19 || 10 || 1.9 || 2.1
|-
| Reals || Distal || positive || 14 || 2 || 7 || 7 (6-8)  
|-
| Randoms || Distal || arbitrary positive || 25 || 10 || 2.5 || 2.85
|-
| Randoms || Distal || alternate positive || 32 || 10 || 3.2 || 2.85
|}
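
The Occurrences column in the table above appears to be the number of hits divided by the number of strands searched, and the Averages column for the randoms appears to be the mean of the arbitrary and alternate occurrences. A minimal Python sketch of that arithmetic, using the distal promoter rows in the negative direction, follows; the function names are illustrative and are not taken from the study.

```python
def occurrences(numbers, strands):
    """Hits per strand: the count found divided by the strands searched."""
    return numbers / strands


def random_average(arbitrary, alternate):
    """Mean of the two random-direction occurrences for one promoter region."""
    return (arbitrary + alternate) / 2


# Distal promoter, negative direction (values copied from the table above).
real = occurrences(10, 2)        # 10 hits on 2 real strands
arbitrary = occurrences(23, 10)  # 23 hits on 10 random strands
alternate = occurrences(19, 10)  # 19 hits on 10 random strands

print(real, round(random_average(arbitrary, alternate), 2))
```

Under these assumptions the real occurrence (5 per strand) can be compared directly with the random average (2.1 per strand) for the same region.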


Comparison:
Patients who have pancreatic ductal [[adenocarcinoma]] show an [[overexpression]] of A1BG in [[pancreatic juice]].<ref name=Tian>{{ cite journal
|author=Mei Tian, Ya-Zhou Cui, Guan-Hua Song, Mei-Juan Zong, Xiao-Yan Zhou, Yu Chen, Jin-Xiang Han
| title = Proteomic analysis identifies MMP-9, DJ-1 and A1BG as overexpressed proteins in pancreatic juice from pancreatic ductal adenocarcinoma patients
| journal = BMC Cancer
| volume = 8
| issue =
| pages = 241
| date = 2008
| pmid = 18706098
| pmc = 2528014
| doi = 10.1186/1471-2407-8-241
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2528014/ }}</ref>


The occurrences of real Pho4p UASs are generally greater than those of the randoms, although the positive strands of the UTRs and the negative-direction distal promoters fall within the random range. This suggests that the real Pho4p UASs are likely active or activable.
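
As an illustration of how a degenerate consensus such as the Pho4p UAS CAC(A/G)T(T/G) can be located on both strands, here is a minimal Python sketch. It assumes a plain A/C/G/T sequence string and reports complementary-strand hits at their leftmost position on the given strand. The function names and the test sequence are hypothetical; this is not the search procedure used for the counts in the table.

```python
import re

# Pho4p UAS consensus CAC(A/G)T(T/G); the lookahead allows overlapping hits.
PHO4_UAS = re.compile(r"(?=(CAC[AG]T[TG]))")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complement inverse of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_uas(seq):
    """Return sorted (1-based start, strand, match) tuples for every hit."""
    seq = seq.upper()
    hits = [(m.start() + 1, "+", m.group(1)) for m in PHO4_UAS.finditer(seq)]
    for m in PHO4_UAS.finditer(reverse_complement(seq)):
        # Convert the position on the complement inverse back to the
        # leftmost nucleotide of the hit on the given strand.
        start = len(seq) - (m.start() + len(m.group(1))) + 1
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


# Made-up sequence for demonstration only, not taken from the A1BG locus.
print(find_uas("GGCACATTCCAACGTGAA"))
```

The same pattern-building approach would apply to the other consensus sequences on this page by changing the regular expression.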
===Immunoglobulin supergene family===
{{main|Immunoglobulin supergene family}}
"𝛂<sub>1</sub>B-glycoprotein(𝛂<sub>1</sub>B) [...] consists of a single polypeptide chain N-linked to four
glucosamine oligosaccharides. The polypeptide has five intrachain disulfide bonds and contains 474 amino acid residues. [...] 𝛂<sub>1</sub>B exhibits internal duplication and consists of five repeating structural domains, each containing about 95 amino acids and one disulfide bond. [...] several domains of 𝛂<sub>1</sub>B, especially the third, show statistically significant homology to variable regions of certain immunoglobulin light and heavy chains. 𝛂<sub>1</sub>B [...] exhibits sequence similarity to other members of the [[immunoglobulin supergene family]] such as the receptor for transepithelial transport of IgA and IgM and the secretory component of human IgA."<ref name=Ishioka/>


====Quinone reductase response elements====
"Some of the domains of 𝛂<sub>1</sub>B show significant homology to variable (V) and constant (C) regions of certain immunoglobulins. Likewise, there is statistically significant homology between 𝛂<sub>1</sub>B and the secretory component (SC) of human IgA (15) and also with the extracellular portion of the rabbit receptor for transepithelial transport of polymeric immunoglobulins (IgA and IgM). Mostov et al. (16) have called the later protein the poly-Ig receptor or poly-IgR and have shown that it is the precursor of SC."<ref name=Ishioka/>


The quinone reductase (QRDRE) gene contains TCCCCTTGCGTG which has the DRE core of TNGCGTG.<ref name=Yao/>
The immunoglobulin supergene family is "the group of proteins that have immunoglobulin-like domains, including histocompatibility antigens, the T-cell antigen receptor, poly-IgR, and other proteins involved in the vertebrate immune response (17)."<ref name=Ishioka/>


Although the QRDRE may be limited to TCCCC, because it also has the fixed T after this sequence, TCCCCT has been used to examine the four possible promoters. The only elements are in the distal promoters in the positive direction: TCCCCT at 1073 on the negative strand, and TCCCCT at 3665, 2657, and 321 on the positive strand, for a response of 1.0.
"The internal homology in primary structure [...] and the presence of an intrasegment disulfide bond suggest that 𝛂<sub>1</sub>B is composed of five structural domains that arose by duplication of a primordial gene coding for about 95 amino acid residues."<ref name=Ishioka/>


The random datasets had a far greater number of sequences. In the UTR between ZSCAN22 and A1BG there were six consensus sequences, three direct and three complement inverses, for a response of 0.6. In the core promoters there were two in the chosen positive direction for a response of 0.2. The proximal promoters had three, two in the chosen negative direction and one in the positive direction, for a response of 0.3. In the distal promoters there were ten in the chosen negative direction for 1.0 and nineteen in the positive direction for 1.9.
"Unlike immunoglobulins (25), ceruloplasmin (6), and hemopexin (7), 𝛂<sub>1</sub>B is not subject to limited interdomain cleavage by proteolytic enzymes. At least, we were not able to produce such fragments by use of a variety of proteases. This stability of 𝛂<sub>1</sub>B is probably associated with the frequency of proline in the sequences linking the domains [...]."<ref name=Ishioka/>


The responses for the UTR, core and proximal promoters were low, but none were zero. None of the QRDREs are linked to the DREs. As the real occurrences were only in the distal promoters and the random occurrences were in the 1.0 to 1.9 range, it is likely that these QRDREs are random.
"A peptide identified in the late and early milk proteomes showed homology to eutherian alpha 1B glycoprotein (A1BG), a plasma protein with unknown function<sup>46</sup>, as well as venom inhibitors characterised in the Southern opossum ''Didelphis marsupialis'' (DM43 and DM46<sup>47,48,49</sup>), all members of the immunoglobulin superfamily. To characterise the relationship between the peptide sequence identified in koala, A1BG, DM43 and DM46, a phylogenetic tree was constructed [...] including all marsupial and monotreme homologs (identified by BLAST), three phylogenetically representative eutherian sequences, with human IGSF1 and TARM1, related members of the immunoglobulin super family, used as outgroups. This phylogeny indicates that A1BG-like proteins in marsupials and the ''Didelphis'' antitoxic proteins are homologs of eutherian A1BG, with excellent bootstrap support (98%). The marsupial A1BG-like sequences and the ''Didelphis'' antitoxic proteins formed a single clade with strong bootstrap support (97%)."<ref name=Morris>{{ cite journal
 
|author=Katrina M. Morris, Denis O’Meally, Thiri Zaw, Xiaomin Song, Amber Gillett, Mark P. Molloy, Adam Polkinghorne, and Katherine Belova
====TCCG elements====
|title=Characterisation of the immune compounds in koala milk using a combined transcriptomic and proteomic approach
 
|journal=Scientific Reports
====Xenobiotic response elements====
|date=7 October 2016
{{main|Xenobiotic response element gene transcriptions}}
|volume=6
"The megalin (LRP2) gene promoter region [shows] eight consensus sequence of XRE 5′-GCGTG-3′."<ref name=Mokhtar>{{ cite journal
|issue=
|author=Mahmoud Mohamed Mokhtar, Emad Gamil Khidr, Hesham Mohamed Shaban, Shady Allam, Bakheet E. M. Elsadek, Salama Abdou Salama & Shawkey Saddik Ali
|pages=35011
|title=The effect of aryl hydrocarbon receptor ligands on gentamicin-induced nephrotoxicity in rats
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5054531/
|journal=Environmental Science and Pollution Research
|date=28 February 2020
|volume=27
|issue=May
|pages=16189–16202
|url=https://link.springer.com/article/10.1007/s11356-020-08073-z
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=10.1007/s11356-020-08073-z
|doi=10.1038/srep35011
|pmid=
|pmid=27713568
|accessdate=16 February 2021 }}</ref>
|accessdate=14 March 2020 }}</ref>


A1BG is not included in any of the aryl hydrocarbon receptor (Gene ID: 196) pathways (19).<ref name=RefSeq196>{{ cite web
"Human TARM1 and IGSF1, related members of the immunoglobulin superfamily are used as outgroups. The tree was constructed using the maximum likelihood approach and the JTT model with bootstrap support values from 500 bootstrap tests. Bootstrap values less than 50% are not displayed. Accession numbers: Tasmanian devil (''Sarcophilus harrisii''; XP_012402143), Wallaby (''Macropus eugenii''; FY619507), Possum (''Trichosurus vulpecula''; DY596639) Virginia opossum (''Didelphis virginiana''; AAA30970, AAN06914), Southern opossum (''Didelphis marsupialis''; AAL82794, P82957, AAN64698), Human (''Homo sapiens''; P04217, B6A8C7, Q8N6C5), Platypus (''Ornithorhychus anatinus''; ENSOANP00000000762), Cow (''Bos taurus''; Q2KJF1), Alpaca (''Vicugna pacos''; XP_015107031)."<ref name=Morris/>
|author=RefSeq
|title=AHR aryl hydrocarbon receptor [ Homo sapiens (human) ]
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=September 2015
|url=https://www.ncbi.nlm.nih.gov/gene/196
|accessdate=5 November 2021 }}</ref> As such the XRE is not expected in the promoters of A1BG.


The XRE is involved in the response of 78 human genes such as Gene ID: 1543 where the "Hypomethylation of the XRE -1383 site is associated with the upregulation of CYP1A1 in gastric adenocarcinoma."<ref name=Sadeghi>{{ cite journal
"The sequences of 𝛂<sub>1</sub>B-glycoprotein (38) and chicken N-CAM (neural cell-adhesion molecule) (39) have been shown to be related to the immunoglobulin supergene family."<ref name=Paxton>{{ cite journal
|author=L Sadeghi-Amiri, A Barzegar, N Nikbakhsh-Zati, P Mehraban
|author=R. J. Paxton, G. Mooser, H. Pande, T. D. Lee, and J. E. Shively
|title=Hypomethylation of the XRE -1383 site is associated with the upregulation of ''CYP1A1'' in gastric adenocarcinoma
|title=Sequence analysis of carcinoembryonic antigen: identification of glycosylation sites and homology with the immunoglobulin supergene family
|journal=Gene
|journal=Proceedings of the National Academy of Sciences USA
|date=15 February 2021
|date=1 February 1987
|volume=769
|volume=84
|issue=145216
|issue=4
|pages=
|pages=920-924
|url=https://www.sciencedirect.com/science/article/abs/pii/S0378111920308854?via%3Dihub
|url=https://www.pnas.org/content/pnas/84/4/920.full.pdf
|arxiv=
|bibcode=
|doi=10.1016/j.gene.2020.145216
|pmid=33069801
|accessdate=5 November 2021 }}</ref> "Bisulfite sequencing and the resulting methylation percentages revealed dynamically methylated CpG sites located within or around xenobiotic response elements (XRE) 4–10, and a region of consistent hypermethylation located near proximal promoter, encompassing XRE2-3."<ref name=Sadeghi/> For example using search concepts ("gastric adenocarcinoma" A1BG) on Google Scholar produced, "Immunohistochemical staining on a tissue microarray was then carried out for alpha-1B-glycoprotein (A1BG), leucine-rich alpha-2-glycoprotein (LRG1), ubiquitin carboxyl-terminal hydrolase 1 (USP1), and mucin-5B as candidate biomarkers. Their levels were significantly elevated in lung cancer tissue. A1BG levels were also determined as significantly elevated with Western blot on sera samples."<ref name=Hudler>{{ cite journal
|author=Petra Hudler, Nina Kocevar, and Radovan Komel
|title=Proteomic Approaches in Biomarker Discovery: New Perspectives in Cancer Diagnostics
|journal=The Scientific World Journal
|date=14 January 2014
|volume=2014
|issue=260348
|pages=18
|url=https://www.hindawi.com/journals/tswj/2014/260348/
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=10.1155/2014/260348
|doi=10.1073/pnas.84.4.920
|pmid=
|pmid=3469650
|accessdate=5 November 2021 }}</ref> Further, "Using sera as samples and multiple fractionation steps (protein depletion, lectin affinity fractionation, IEF separation, and LC-MS analysis), the following candidates were selected as breast cancer-associated proteins: thrombospondin-1 (TSP1) and 5 (TSP5), alpha-1B-glycoprotein (A1BG), serum amyloid P-component (SAP), and tenascin-X (TN-X) [106]. SAP and TSP5 were increased in breast cancer serum, A1BG showed a pI shift and a slight increase in total abundance in the cancer samples, TSP1 showed changes in glycan structure, and TN-X was both increased and showed glycan structure changes."<ref name=Hudler/>
|accessdate=26 March 2020 }}</ref>


The xenobiotic response element (XRE) GCGTG occurs twice, only on the positive strand, in the UTR between ZSCAN22 and A1BG for an occurrence of 1.0, but does not occur in either core promoter or either proximal promoter. In the distal promoters it occurs, along with its complement inverse, ten times between the two strands in the negative direction (three on the negative strand and seven on the positive strand), and in the positive direction seventeen times on the negative strand (8.5) and twenty-seven times on the positive strand (13.5).
A1BG contains the immunoglobulin domain: '''cl11960''' and three immunoglobulin-like domains: '''pfam13895''', '''cd05751''' and '''smart00410'''.


The random datasets contained seven XREs across the ten UTRs for an occurrence of 0.7, two in the core promoter in the arbitrarily chosen positive direction for an occurrence of 1.0, and one in the proximal promoter, also in the positive direction, for 0.5. In the distal promoters there were nineteen in the arbitrarily chosen negative direction for ten strands, yielding 1.9 occurrences per strand, and twenty-seven in the positive direction for ten strands, yielding 2.7 occurrences per strand.
"Immunoglobulin (Ig) domain ['''cl11960'''] found in the Ig superfamily. The Ig superfamily is a heterogenous group of proteins, built on a common fold comprised of a sandwich of two beta sheets. Members of this group are components of immunoglobulin, neuroglia, cell surface glycoproteins, such as, T-cell receptors, CD2, CD4, CD8, and membrane glycoproteins, such as, butyrophilin and chondroitin sulfate proteoglycan core protein. A predominant feature of most Ig domains is a disulfide bridge connecting the two beta-sheets with a tryptophan residue packed against the disulfide bond."<ref name=NCBI386229>{{ cite web
|author=NCBI
|title=Conserved Protein Domain Family cl11960: Ig Superfamily
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=2 February 2016
|url=https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=386229
|accessdate=22 May 2020 }}</ref>


Although the occurrences are close for the UTRs (1.0 vs 0.7), they are not close for the core promoters (0 vs 1.0), the proximal promoters (0 vs 0.5), or the distal promoters in the negative direction (5.0 vs 1.9) and the positive direction (13.5 vs 2.7). This suggests that the XREs are likely active or activable rather than random occurrences.
"This domain ['''pfam13895'''] contains immunoglobulin-like domains."<ref name=NCBI372793>{{ cite web
|author=NCBI
|title=Conserved Protein Domain Family pfam13895: Ig_2
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=5 August 2015
|url=https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=372793
|accessdate=24 May 2020 }}</ref>


===Basic helix-loop-helix leucine zipper transcription factors===
"Ig1_LILR_KIR_like: ['''cd05751'''] domain similar to the first immunoglobulin (Ig)-like domain found in Leukocyte Ig-like receptors (LILRs) and Natural killer inhibitory receptors (KIRs). This group includes LILRB1 (or LIR-1), LILRA5 (or LIR9), an activating natural cytotoxicity receptor NKp46, the immune-type receptor glycoprotein VI (GPVI), and the IgA-specific receptor Fc-alphaRI (or CD89). LILRs are a family of immunoreceptors expressed on expressed on T and B cells, on monocytes, dendritic cells, and subgroups of natural killer (NK) cells. The human LILR family contains nine proteins (LILRA1-3,and 5, and LILRB1-5). From functional assays, and as the cytoplasmic domains of various LILRs, for example LILRB1 (LIR-1), LILRB2 (LIR-2), and LILRB3 (LIR-3) contain immunoreceptor tyrosine-based inhibitory motifs (ITIMs) it is thought that LIR proteins are inhibitory receptors. Of the eight LIR family proteins, only LIR-1 (LILRB1), and LIR-2 (LILRB2), show detectable binding to class I MHC molecules; ligands for the other members have yet to be determined. The extracellular portions of the different LIR proteins contain different numbers of Ig-like domains for example, four in the case of LILRB1 (LIR-1), and LILRB2 (LIR-2), and two in the case of LILRB4 (LIR-5). The activating natural cytotoxicity receptor NKp46 is expressed in natural killer cells, and is organized as an extracellular portion having two Ig-like extracellular domains, a transmembrane domain, and a small cytoplasmic portion. GPVI, which also contains two Ig-like domains, participates in the processes of collagen-mediated platelet activation and arterial thrombus formation. Fc-alphaRI is expressed on monocytes, eosinophils, neutrophils and macrophages; it mediates IgA-induced immune effector responses such as phagocytosis, antibody-dependent cell-mediated cytotoxicity and respiratory burst."<ref name=NCBI319306>{{ cite web
|author=NCBI
|title=Conserved Protein Domain Family cd05751: Ig1_LILR_KIR_like
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=16 August 2016
|url=https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=319306
|accessdate=24 May 2020 }}</ref>


Basic helix-loop-helix leucine zipper transcription factors are, as their name indicates, transcription factors containing both [[Basic helix-loop-helix]] and [[leucine zipper]] motifs.
"IG domains ['''smart00410'''] that cannot be classified into one of IGv1, IGc1, IGc2, IG."<ref name=NCBI214653>{{ cite web
 
|author=NCBI
Examples include [[Microphthalmia-associated transcription factor]] and [[Sterol regulatory element-binding protein]] (SREBP).
|title=Conserved Protein Domain Family smart00410: IG_like
 
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
MITF recognizes E-box (CAYRTG) and M-box (TCAYRTG or CAYRTGA) sequences in the promoter regions of target genes.<ref name=Hoek>{{cite journal | author = Hoek KS, Schlegel NC, Eichhoff OM, Widmer DS, Praetorius C, Einarsson SO, Valgeirsdottir S, Bergsteinsdottir K, Schepsky A, Dummer R, Steingrimsson E | title = Novel MITF targets identified using a two-step DNA microarray strategy | journal = Pigment Cell Melanoma Res. | volume = 21 | issue = 6 | pages = 665–76 | date = 2008 | pmid = 19067971 | doi = 10.1111/j.1755-148X.2008.00505.x }}</ref>
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
 
|date=16 January 2013
[[Serum response element gene transcriptions]]: The SRE wild type (SREwt) contains the nucleotide sequence ACAGGATGTCCATATTAGGACATCTGC, of which CCATATTAGG is the CArG box, TTAGGACAT is the C/EBP box, and CATCTG is the E box.<ref name=Misra>{{ cite journal
|url=https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=214653
|author=Ravi P. Misra
|accessdate=24 May 2020 }}</ref>
|author2=Azad Bonni
"𝛂<sub>1</sub>B-glycoprotein(𝛂<sub>1</sub>B) [...] consists of a single polypeptide chain N-linked to four
|author3=Cindy K. Miranti
glucosamine oligosaccharides. The polypeptide has five intrachain disulfide bonds and contains 474 amino acid residues. [...] 𝛂<sub>1</sub>B exhibits internal duplication and consists of five repeating structural domains, each containing about 95 amino acids and one disulfide bond. [...] several domains of 𝛂<sub>1</sub>B, especially the third, show statistically significant homology to variable regions of certain immunoglobulin light and heavy chains. 𝛂<sub>1</sub>B [...] exhibits sequence similarity to other members of the immunoglobulin supergene family such as the receptor for transepithelial transport of IgA and IgM and the secretory component of human IgA."<ref name=Ishioka>{{ cite journal
|author4=Victor M. Rivera
|author=Noriaki Ishioka, Nobuhiro Takahashi, and Frank W. Putnam
|author5=Morgan Sheng
|title=Amino acid sequence of human plasma 𝛂<sub>1</sub>B-glycoprotein: Homology to the immunoglobulin supergene family
|author6=Michael E.Greenberg
|journal=Proceedings of the National Academy of Sciences USA
|title=L-type Voltage-sensitive Calcium Channel Activation Stimulates Gene Expression by a Serum Response Factor-dependent Pathway
|date=April 1986
|journal=The Journal of Biological Chemistry
|volume=83
|date=14 October 1994
|issue=8
|volume=269
|pages=2363-7
|issue=41
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC323297/pdf/pnas00312-0089.pdf
|pages=25483-25493
|url=http://www.jbc.org/content/269/41/25483.full.pdf
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=
|doi=10.1073/pnas.83.8.2363
|pmid=7929249
|pmid=3458201
|accessdate=7 December 2019 }}</ref>
|accessdate=9 March 2020 }}</ref>


"Serum response factor (SRF) is an important transcription factor that regulates cardiac and skeletal muscle genes during development, maturation and adult aging [17,18]. SRF regulates its target genes by binding to serum response elements (SREs), which contain a consensus CC(A/T)<sub>6</sub>GG (CArG) motif."<ref name=Zhang2017>{{ cite journal
===A1BG protein species===
|author=Xiaomin Zhang, Gohar Azhar, Jeanne Y. Wei
|title=SIRT2 gene has a classic SRE element, is a downstream target of serum response factor and is likely activated during serum stimulation
|journal=PLOS One
|date=21 December 2017
|volume=12
|issue=12
|pages=e0190011
|url=https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0190011
|arxiv=
|bibcode=
|doi=10.1371/journal.pone.0190011
|pmid=
|accessdate=23 February 2021 }}</ref>


====CArG boxes====
'''Def.''' a "group of plants or animals having similar appearance"<ref name=SpeciesWikt>{{ cite web
 
|author=[[wikt:User:24.98.118.180|24.98.118.180]]
Consensus sequences: CC(A/T)<sub>6</sub>GG.
|title=species
 
|publisher=Wikimedia Foundation, Inc
====C/EBP boxes====
|location=San Francisco, California
 
|date=28 February 2007
Consensus sequences: CCATAATAGG.
|url=https://en.wiktionary.org/wiki/species
 
|accessdate=25 March 2020 }}</ref> or "the largest group of organisms in which [any]<ref name=Species1/> two individuals [of the appropriate sexes or mating types]<ref name=Species1>{{ cite web
====E boxes====
|author=[[w:User:Peter coxhead|Peter coxhead]]
 
|title=Species
Consensus sequences: CATCTG.
|publisher=Wikimedia Foundation, Inc
 
|location=San Francisco, California
Consensus sequences: CAYRTG, CA(C/T)(A/G)TG.
|date=22 August 2018
|url=https://en.wikipedia.org/wiki/Species
|accessdate=25 March 2020 }}</ref> can produce fertile offspring, typically by sexual reproduction"<ref name=Species>{{ cite web
|author=[[w:User:Chiswick Chap|Chiswick Chap]]
|title=Species
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=1 December 2016
|url=https://en.wikipedia.org/wiki/Species
|accessdate=25 March 2020 }}</ref> is called a '''species'''.


====M-boxes====
The gene contains 20 distinct introns.<ref name=AceView>{{ cite web
|title=AceView: A1BG
|url=https://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=human&term=a1bg&submit=Go
|accessdate=May 11, 2013 }}</ref> Transcription produces 15 different mRNAs, 10 alternatively spliced variants and 5 unspliced forms.<ref name="AceView"/> There are 4 probable alternative promoters, 4 non overlapping alternative last exons and 7 validated alternative polyadenylation sites.<ref name="AceView"/> The mRNAs appear to differ by truncation of the 5' end, truncation of the 3' end, presence or absence of 4 cassette exons, overlapping exons with different boundaries, splicing versus retention of 3 introns.<ref name="AceView"/>


Consensus sequences: TCAYRTG or CAYRTGA, TCA(C/T)(A/G)TG or CA(C/T)(A/G)TGA<ref name=Hoek/> ~ (T/N)CA(C/T)(A/G)TG(A/N).
====Variants or isoforms====


The M box consensus sequence GTCATGTGCT<ref name=Bertolotto>{{ cite journal
'''Def.''' a "different sequence of a gene (locus)"<ref name=VariantWikt>{{ cite web
|author=Corine Bertolotto, Roser Buscà, Patricia Abbe, Karine Bille, Edith Aberdam, Jean-Paul Ortonne, and Robert Ballotti
|author=[[wikt:User:Pdeitiker|Pdeitiker]]
|title=Different ''cis''-Acting Elements Are Involved in the Regulation of TRP1 and TRP2 Promoter Activities by Cyclic AMP: Pivotal Role of M Boxes (GTCATGTGCT) and of Microphthalmia
|title=variant
|journal=Molecular and Cellular Biology
|publisher=Wikimedia Foundation, Inc
|date=February 1998
|location=San Francisco, California
|volume=18
|date=26 July 2008
|issue=2
|url=https://en.wiktionary.org/wiki/variant
|pages=694–702
|accessdate=25 March 2020 }}</ref> is called a '''variant'''.
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC108780/
|arxiv=
|bibcode=
|doi=
|pmid=9447965
|accessdate=8 December 2018 }}</ref> does not occur on either side of A1BG. The random datasets had only one occurrence, GTCATGTGCT at 1977 in the distal promoter.


Using a more general M-box consensus of (T/N)CA(C/T)(A/G)TG(A/N) yielded four sequences in the negative direction and twelve in the positive direction. Of these only TCACATGA at 325 in the negative direction and TCACATGT at 3957, CCATGTGA at 3903, and CCACATGA at 3708 in the positive direction conform to TCAYRTG or CAYRTGA<ref name=Hoek/>. The random datasets had 25 occurrences of the general consensus but only fifteen fit TCAYRTG or CAYRTGA<ref name=Hoek/>, nine in the arbitrary negative direction and six in the positive direction. The disparity between real occurrences and random occurrences suggests that the real occurrences are likely active or can be activated.
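
The filtering step described above, from the general consensus (T/N)CA(C/T)(A/G)TG(A/N) down to TCAYRTG or CAYRTGA, can be sketched in Python as follows. This is a minimal illustration and assumes each candidate is an eight-nucleotide string already found by the general search. The `classify` helper and the small IUPAC table are hypothetical names introduced here.

```python
import re

# IUPAC codes needed here: Y = C or T, R = A or G, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}


def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus)


# General consensus (T/N)CA(C/T)(A/G)TG(A/N), written as NCAYRTGN.
GENERAL = re.compile(iupac_to_regex("NCAYRTGN"))
# Stricter forms: TCAYRTG or CAYRTGA.
STRICT = re.compile("|".join(iupac_to_regex(c) for c in ("TCAYRTG", "CAYRTGA")))


def classify(octamer):
    """Label an 8-nt candidate as 'M-box', 'general only' or 'no match'."""
    octamer = octamer.upper()
    if not GENERAL.fullmatch(octamer):
        return "no match"
    return "M-box" if STRICT.search(octamer) else "general only"


# The four real sequences named in the paragraph above.
for candidate in ("TCACATGA", "TCACATGT", "CCATGTGA", "CCACATGA"):
    print(candidate, classify(candidate))
```

A sequence such as CCACATGC would pass the general consensus but not the stricter forms, which is the distinction drawn between the total and conforming counts above.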
'''Def.''' any "of several different forms of the same protein, arising from either single nucleotide polymorphisms,<ref name=IsoformWikt1>{{ cite web
|author=[[wikt:User:SemperBlotto|SemperBlotto]]
|title=isoform
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=6 January 2007
|url=https://en.wiktionary.org/wiki/isoform
|accessdate=2 December 2018 }}</ref> differential splicing of mRNA, or post-translational modifications (e.g. sulfation, glycosylation, etc.)"<ref name=IsoformWikt2>{{ cite web
|author=[[wikt:User:72.178.245.181|72.178.245.181]]
|title=isoform
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=30 November 2008
|url=https://en.wiktionary.org/wiki/isoform
|accessdate=2 December 2018 }}</ref> is called an '''isoform'''.


The M-box with the consensus sequence TCACATGA<ref name=Ripoll>{{ cite journal
Regarding additional isoforms, mention has been made of "new genetic variants of A1BG."<ref name=Eiberg>{{ cite journal
|author=Vera M. Ripoll, Nicholas A. Meadows, Liza-Jane Raggatt, Ming K. Chang, Allison R. Pettit, Alan I. Cassady and David A. Hume
|author=H Eiberg, ML Bisgaard, J Mohr
|title=Microphthalmia transcription factor regulates the expression of the novel osteoclast factor GPNMB
|title=Linkage between alpha 1B-glycoprotein (A1BG) and Lutheran (LU) red blood group system: assignment to chromosome 19: new genetic variants of A1BG
|journal=Gene
|journal=Clinical genetics
|date=30 April 2005
|date=1 December 1989
|volume=413
|volume=36
|issue=1-2
|issue=6
|pages=32-41
|pages=415-8
|url=https://research-repository.griffith.edu.au/bitstream/handle/10072/61380/95248_1.pdf?sequence=1
|url=http://europepmc.org/abstract/MED/2591067
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=10.1016/j.gene.2008.01.014
|doi=
|pmid=
|pmid=2591067
|accessdate=18 March 2021 }}</ref> occurred only once, as TCACATGA at 325 in the negative direction. There were no occurrences among the random datasets.
|accessdate=2017-10-08 }}</ref>


An M-box consensus sequence is GGTCATGTGCT.<ref name=Zhao>{{ cite journal
"Proteomic analysis revealed that [a circulating] set of plasma proteins was α 1 B-glycoprotein ('''A1BG''') and its
|author=Yuanyuan Zhao, Jinzhu Meng, Guoqing Cao, Pengfei Gao & Changsheng Dong
post-translationally modified isoforms."<ref name=Stehle>{{ cite journal
|title=Screening the optimal activity region of the dopachrome tautomerase gene promoter in sheep skin melanocytes
|author=John R. Stehle Jr., Mark E. Weeks, Kai Lin, Mark C. Willingham, Amy M. Hicks, John F. Timms, Zheng Cui
|journal=Journal of Applied Animal Research
|title=Mass spectrometry identification of circulating alpha-1-B glycoprotein, increased in aged female C57BL/6 mice
|date=28 August 2018
|journal=Biochimica et Biophysica Acta (BBA) - General Subjects
|volume=46
|date=January 2007
|volume=1770
|issue=1
|issue=1
|pages=1382-1388
|pages=79-86
|url=https://www.tandfonline.com/doi/pdf/10.1080/09712119.2018.1512497
|url=http://www.sciencedirect.com/science/article/pii/S0304416506001826
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=10.1080/09712119.2018.1512497
|doi=10.1016/j.bbagen.2006.06.020
|pmid=
|pmid=16945486
|accessdate=6 August 2021 }}</ref> This contains the core consensus sequence GTCATGTGCT.<ref name=Bertolotto/>
|accessdate=2017-10-08 }}</ref>


====SER elements====
Pharmacogenomic variants have been reported.<ref name=McDonough/>


Consensus sequences: ACAGGATGT.
====Genotypes====


===Basic helix-span-helix===
'''Def.''' the "part (DNA sequence) of the genetic makeup of an organism which determines a specific characteristic (phenotype) of that organism"<ref name=GenotypeWikt1>{{ cite web
|author=[[wikt:User:DTLHS|DTLHS]]
|title=genotype
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=10 January 2018
|url=https://en.wiktionary.org/wiki/genotype
|accessdate=25 March 2020 }}</ref> or a "group of organisms having the same genetic constitution" <ref name=GenotypeWikt>{{ cite web
|author=[[wikt:User:SemperBlotto|SemperBlotto]]
|title=genotype
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=22 October 2005
|url=https://en.wiktionary.org/wiki/genotype
|accessdate=25 March 2020 }}</ref>is called a '''genotype'''.


====Activating proteins====
There are A1BG genotypes.<ref name=McDonough>{{ cite journal
{{main|Activating protein gene transcriptions}}
|author=Caitrin W. McDonough, Yan Gong, Sandosh Padmanabhan, Ben Burkley, Taimour Y. Langaee, Olle Melander, Carl J. Pepine, Anna F. Dominiczak, Rhonda M. Cooper-DeHoff, and Julie A. Johnson
The activating protein GCCTGGCC (Cohen) has eight occurrences on both sides of A1BG in its promoters: five in the negative direction from ZSCAN22 (two in the UTR and three in the distal promoter), and three in the positive direction in the distal promoter. However, sampling the random datasets (ten for GCCTGGCC and the same ten for the inverse complement GGCCAGGC) found no occurrences. This indicates that these sequences in the promoters on either side of A1BG are likely real and activable.
|title=Pharmacogenomic Association of Nonsynonymous SNPs in ''SIGLEC12'', ''A1BG'', and the Selectin Region and Cardiovascular Outcomes
 
|journal=Hypertension
The second activating protein TCCCCCGCCC (Cohen) had only one occurrence, in the positive direction toward A1BG from ZNF497, ending at 4440 nts, inside the A1BG gene downstream from the TSS at 4300 nts. However, sampling the random datasets (ten for TCCCCCGCCC and the same ten for the inverse complement GGGCGGGGGA) found no occurrences. This indicates that this sequence in the promoter on the ZNF497 side of A1BG is likely real and activable.
|date=June 2013
 
|volume=62
The activating protein TCTTCCC (Yao) has three occurrences around A1BG: two in the negative direction (TCTTCCC at 1657 and GGGAAGA at 620, in the distal promoter), and one in the positive direction (CCCTTCT at 4264), which is only one nt from being inside the core promoter and is likely a core promoter transcription factor.
|issue=1
 
|pages=48-54
Testing the random datasets with TCTTCCC<ref name=Yao/> (Yao) and its complement inverse GGGAAGA yielded three sequences in the negative direction: GGGAAGA at 4383, TCTTCCC at 3951, and TCTTCCC at 924, for ten datasets (0.3 per dataset), versus two real sequences. For the positive direction, the random datasets yielded four sequences: TCTTCCC at 1995, TCTTCCC at 1201, GGGAAGA at 1193, and GGGAAGA at 468 (0.4 per dataset), versus one real sequence.
|url=http://hyper.ahajournals.org/content/hypertensionaha/early/2013/05/20/HYPERTENSIONAHA.111.00823.full.pdf
 
In both instances, the sequences found in random datasets were much less common than in the real directions around A1BG.
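A real-versus-random comparison of this kind can be sketched in Python. The sketch is illustrative only: it assumes each random dataset is a string of uniformly random nucleotides with the same length as the region searched (4560 nts is used here, for 58,342,333 to 58,346,892), which may differ from how the random datasets in this study were generated, and the function names are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def inverse_complement(seq):
    """Return the inverse (reverse) complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_sites(sequence, motif):
    """Count overlapping occurrences of motif and of its inverse complement."""
    targets = {motif, inverse_complement(motif)}
    width = len(motif)
    return sum(sequence[i:i + width] in targets
               for i in range(len(sequence) - width + 1))


def random_average(motif, length, datasets=10, seed=0):
    """Average count per dataset over uniformly random sequences (assumed model)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(datasets):
        sequence = "".join(rng.choice("ACGT") for _ in range(length))
        total += count_sites(sequence, motif)
    return total / datasets


# Hypothetical usage: average occurrences of TCTTCCC or GGGAAGA per random dataset.
average = random_average("TCTTCCC", length=4560)
```

The average returned depends on the seed and on the assumed random model, so it should only be compared with the counts reported here in a qualitative way.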
 
The second activating protein consensus sequence CTCCCA<ref name=Yao>{{ cite journal
| author = Yao EF, Denison MS
| title = DNA sequence determinants for binding of transformed Ah receptor to a dioxin-responsive enhancer
| journal = Biochemistry
| volume = 31
| issue = 21
| pages = 5060–7
| date = June 1992
| pmid = 1318077
| doi = 10.1021/bi00136a019 }}</ref> and its inverse complement TGGGAG occur nine times in the UTR of A1BG, whereas four random datasets contain mostly one and once two such sequences. None of either occur in the core promoters. None occur in the proximal promoters in either real direction, but two occur (10 %) in the random datasets. In the distal promoters, nine to ten real sequences occur, whereas one to 1.4 occur per each random dataset. These results indicate that these sequences are likely real and activable.
 
"Pemphigus foliaceus (PF) is an autoimmune disease, endemic in Brazilian rural areas, characterized by acantholysis and accompanied by complement activation, with generalized or localized distribution of painful epidermal blisters. CD59 is an essential complement regulator, inhibiting formation of the membrane attack complex, and mediating signal transduction and activation of T lymphocytes. ''CD59'' has different transcripts by alternative splicing, of which only two are widely expressed, suggesting the presence of regulatory sites in their noncoding regions. To date, there is no association study with polymorphisms in ''CD59'' noncoding regions and susceptibility to autoimmune diseases. In this study, we aimed to evaluate if ''CD59'' polymorphisms have a possible regulatory effect on gene expression and susceptibility to PF. Six noncoding polymorphisms were haplotyped in 157 patients and 215 controls by sequence-specific [polymerase chain reaction (PCR)] PCR, and CD59 mRNA levels were measured in 82 subjects, by qPCR. The ''rs861256-allele-G'' (''rs861256*G'') was associated with increased mRNA expression (''p'' = .0113) and PF susceptibility in women (OR = 4.11, ''p'' = .0001), which were also more prone to develop generalized lesions (OR = 4.3, ''p'' = .009) and to resist disease remission (OR = 3.69, ''p'' = .045). Associations were also observed for ''rs831625*G'' (OR = 3.1, ''p'' = .007) and ''rs704697*A'' (OR = 3.4, ''p'' = .006) in Euro-Brazilian women, and for ''rs704701*C'' (OR = 2.33, ''p'' = .037) in Afro-Brazilians. These alleles constitute the ''GGCCAA'' haplotype, which also increases PF susceptibility (OR = 4.9, ''p'' = .045) and marks higher mRNA expression (''p'' = .0025). [...] higher ''CD59'' transcriptional levels may be related with PF susceptibility (especially in women), probably due to the effect of genetic polymorphism and to the CD59 role in T cell signal transduction."<ref name=Silva>{{ cite journal
|author=Amanda Salviano-Silva, Maria Luiza Petzl-Erler & Angelica Beate Winter Boldt
|title=''CD59'' polymorphisms are associated with gene expression and different sexual susceptibility to pemphigus foliaceus
|journal=Autoimmunity
|date=29 April 2017
|volume=50
|issue=6
|pages=377-385
|url=https://www.tandfonline.com/doi/abs/10.1080/08916934.2017.1329830
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=10.1080/08916934.2017.1329830
|doi=10.1161/HYPERTENSIONAHA.111.00823
|pmid=
|pmid=23690342
|accessdate=27 September 2021 }}</ref>
|accessdate=2017-10-08 }}</ref>


The third activating protein consensus sequence GGCCAA can occur within the "optimal TCDD-AhR DNA-binding consensus sequence of GCGTGNNA/TNNNC/G [...]."<ref name=Yao/> or (C/G)NNN(A/T)NNGTGCG, which yields two occurrences in the UTR (TTGGCC at 4099 and TTGGCC at 3948) and six in the distal promoters (one in the negative direction and five in the positive direction). The random datasets usually yielded one to two occurrences in the UTR for an average of 0.4 per dataset, two in the core promoters for an average of 0.1, two in the proximal promoters, and twenty-one in the distal promoters for 0.9 and 1.2 per direction, whereas the real nucleotides have none in the core promoters and six in the distal promoters (one in the negative direction and five in the positive direction).
A1BG has a genetic risk score of rs893184.<ref name=McDonough/>


Two activating protein response elements have been investigated: [G/C]CCN(3,4)GG[G/C] and GCCCACGGG.<ref name=Murata/> The second does not occur on either side of A1BG and also did not occur in the random datasets. The first one, which is more general, occurred in the core promoter (two) and proximal promoter (two) for the positive direction only. Occurrences were found in the distal promoters in both directions: negative direction (two) and positive direction (eighteen). In the random datasets, twenty-one consensus sequences occurred in the UTR for an average of 3.6 per dataset, three were in core promoters in the arbitrarily chosen positive direction for an average of 0.6, two were in proximal promoters (one negative, one positive direction), and the distal promoters had twenty-two in the negative direction and positive direction for an average of 4.4 in either direction. There was essentially no agreement between real and random consensus sequences. Therefore, the more general activating protein is likely active or activable.
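A degenerate consensus sequence such as [G/C]CCN(3,4)GG[G/C] can be turned into a regular expression before searching. The following Python sketch shows one way to do this; the function names and the convention of reporting the 1-based position at which a match ends are assumptions made for illustration, not the procedure used for the results above.

```python
import re


def consensus_to_regex(consensus):
    """Convert notation like [G/C]CCN(3,4)GG[G/C] or (A/T) into a regex string."""
    # (A/T) or [G/C] -> character class [AT] or [GC]
    pattern = re.sub(r"[\[\(]([ACGT](?:/[ACGT])+)[\]\)]",
                     lambda m: "[" + m.group(1).replace("/", "") + "]",
                     consensus)
    # N(3,4) -> three to four nucleotides of any kind
    pattern = re.sub(r"N\((\d+),(\d+)\)", r"[ACGT]{\1,\2}", pattern)
    # a bare N -> any single nucleotide
    return pattern.replace("N", "[ACGT]")


def find_sites(sequence, consensus):
    """Return (end_position, matched_text) pairs; end_position is 1-based."""
    # The lookahead lets matches that start at adjacent positions overlap.
    lookahead = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(1) + len(m.group(1)), m.group(1))
            for m in lookahead.finditer(sequence.upper())]


sites = find_sites("AAGCCTGAGGCAA", "[G/C]CCN(3,4)GG[G/C]")
```

Because the inverse complement of [G/C]CCN(3,4)GG[G/C] is the same pattern, one pass over a strand covers both orientations for this element; other consensus sequences would need a second search with the inverse complement.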
"A genetic risk score, including rs16982743, rs893184, and rs4525 in F5, was significantly associated with treatment-related adverse cardiovascular outcomes in whites and Hispanics from the INVEST study and in the Nordic Diltiazem study (meta-analysis interaction P=2.39×10<sup>−5</sup>)."<ref name=McDonough/>


===Stem-loops===
====Polymorphs====
[[Image:Stem-loop.svg|thumb|right|300px|An example of an RNA stem-loop is shown. Credit: [[c:user:Sakurambo|Sakurambo]].{{tlx|free media}}]]
As an important secondary structure of RNA, a stem-loop can direct RNA folding, protect structural stability for messenger RNA (mRNA), provide recognition sites for RNA binding proteins, and serve as a substrate for enzymatic reactions.<ref>Svoboda, P., & Cara, A. (2006). Hairpin RNA: A secondary structure of primary importance. Cellular and Molecular Life Sciences, 63(7), 901-908.</ref>


Hairpin loops are often elements found within the 5'UTR of prokaryotes. These structures are often bound by proteins or cause the attenuation of a transcript in order to regulate translation.<ref name=Meyer>{{cite journal|last=Meyer|first=Michelle|author2=Deiorio-Haggar K |author3=Anthony J |title=RNA structures regulating ribosomal protein biosynthesis in bacilli|journal=RNA Biology|date=July 2013|volume=10|series=7|pages=1160–1164|doi=10.4161/rna.24151|pmid=23611891 }}</ref>
'''Def.''' the "regular existence of two or more different genotypes within a given species or population; also, variability of amino acid sequences within a gene's protein"<ref name=PolymorphismWikt>{{ cite web
|author=[[wikt:User:Widsith|Widsith]]
|title=polymorphism
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=28 March 2012
|url=https://en.wiktionary.org/wiki/polymorphism
|accessdate=25 March 2020 }}</ref> is called '''polymorphism'''.


The mRNA stem-loop structure forming at the ribosome binding site may control an initiation of translation.<ref name=Malys2009>{{cite journal | author = Malys N, Nivinskas R | title = Non-canonical RNA arrangement in T4-even phages: accommodated ribosome binding site at the gene 26-25 intercistronic junction |journal = Mol Microbiol |volume = 73 | issue = 6 | pages = 1115–1127 | date = 2009 | pmid = 19708923 | doi =10.1111/j.1365-2958.2009.06840.x }}</ref><ref name=Malys2010>{{ cite journal | author = Malys N, McCarthy JEG | title = Translation initiation: variations in the mechanism can be anticipated |journal = Cellular and Molecular Life Sciences | date = 2010 | doi =10.1007/s00018-010-0588-z | pmid=21076851 | volume = 68 | issue = 6 | pages = 991–1003 }}</ref>
'''Def.''' "one of a number of alternative forms of the same gene occupying a given position, [or locus],<ref name=AlleleWikt1>{{ cite web
{{clear}}
|author=[[wikt:User:217.105.66.98|217.105.66.98]]
|title=allele
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=8 September 2016
|url=https://en.wiktionary.org/wiki/allele
|accessdate=25 March 2020 }}</ref> on a chromosome"<ref name=AlleleWikt>{{ cite web
|author=[[wikt:User:138.130.33.215|138.130.33.215]]
|title=allele
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=7 April 2004
|url=https://en.wiktionary.org/wiki/allele
|accessdate=25 March 2020 }}</ref> is called an '''allele'''.


====Adenylate–uridylate rich elements (Bakheet)====
"rs893184 causes a histidine (His) to arginine (Arg) [nonsynonymous single nucleotide polymorphism (nsSNP), A (minor) for G (major)] substitution at amino acid position 52 in A1BG."<ref name=McDonough/>


"The 3′UTRs were searched for the 13-bp pattern WWWUAUUUAUWW with mismatch=−1 which was computationally derived as previously described ( 2 ). The pattern was further statistically validated against larger sets of mRNA data (10 872 mRNA with 3′UTR; GenBank 119) showing occurrence of the motif in 6.8% of human mRNA."<ref name=Bakheet>{{ cite journal
"Genetic polymorphism of human plasma (serum) alpha 1B-glycoprotein (alpha 1B) was observed using one-dimensional horizontal polyacrylamide gel electrophoresis (PAGE) pH 9.0 of plasma samples followed by Western blotting with specific antiserum to alpha 1B."<ref name=Gahne>{{ cite journal
|author=Tala Bakheet, Bryan R. G. Williams, and Khalid S. A. Khabar
|author=B. Gahne, R. K. Juneja, and A. Stratil
|title=ARED 2.0: an update of AU-rich element mRNA database
|title=Genetic polymorphism of human plasma alpha 1B-glycoprotein: phenotyping by immunoblotting or by a simple method of 2-D electrophoresis
|journal=Nucleic Acids Research
|journal=Human Genetics
|date=1 January 2003
|date=June 1987
|volume=31
|volume=76
|issue=2
|pages=111-5
|url=https://link.springer.com/article/10.1007%2FBF00284904
|arxiv=
|bibcode=
|doi=10.1007/bf00284904
|pmid=3610142
|accessdate=25 March 2020 }}</ref>
 
''A1B*5'' is a "new allele [...] of human plasma 𝜶<sub>1</sub>B-glycoprotein [...]."<ref name=Juneja1989>{{ cite journal
|author=R.K. Juneja, G. Beckman, M. Lukka, B. Gahne, and C. Ehnholm
|title=Plasma α<sub>1</sub>B-Glycoprotein Allele Frequencies in Finns and Swedish Lapps: Evidence for a New α<sub>1</sub>B Allele
|journal=Human Heredity
|date=1989
|volume=39
|issue=1
|issue=1
|pages=421-423
|pages=32-36
|url=https://academic.oup.com/nar/article/31/1/421/2401201?login=true
|url=https://www.karger.com/Article/Abstract/153828
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=10.1093/nar/gkg023
|doi=10.1159/000153828
|pmid=
|pmid=2759622
|accessdate=23 March 2021 }}</ref> This consensus sequence when in a promoter would be WWWUAUUUAUWW=(A/T)(A/T)(A/T)TATTTAT(A/T)(A/T).<ref name=Bakheet/> This sequence occurred only twice in the promoters of A1BG both in the negative direction: negative strand, negative direction TTTTATTTATTA at 4076 and a complement inverse on the positive strand, negative direction AAATAAATAATA at 4077. Both are in the UTR of A1BG in the negative direction.
|accessdate=25 March 2020 }}</ref>


The twenty random datasets yielded a direct sequence AATTATTTATTT at 859 in the arbitrary positive direction and a complement inverse TAATAAATAAAA at 1499 in the arbitrary negative direction, both in the distal promoters. The real sequences are likely active or activable.
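The conversion from the RNA pattern WWWUAUUUAUWW to its promoter (DNA) form can be sketched in Python using the IUPAC nucleotide codes. This is a minimal illustration with hypothetical function names; it does not implement the "mismatch=−1" allowance mentioned in the quotation above.

```python
import re

# IUPAC ambiguity codes used in the motifs of this section (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an RNA or DNA IUPAC motif into a DNA regular expression."""
    dna = motif.upper().replace("U", "T")
    return "".join(IUPAC[code] for code in dna)


are_pattern = iupac_to_regex("WWWUAUUUAUWW")
# The real occurrence reported above matches the pattern exactly.
is_match = re.fullmatch(are_pattern, "TTTTATTTATTA") is not None
```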
"Genetic polymorphism of human plasma 𝜶<sub>1</sub>B-glycoprotein (𝜶<sub>1</sub>B) was reported first, in brief, by Altland ''et al.'' [1983; also given in Altkand and Hacklar, 1984]. A detailed description of human 𝜶<sub>1</sub>B polymorphism was reported in subsequent studies [Gahne ''et al.'', 1987; Juneja ''et al.'', 1988, 1989]. Five different 𝜶<sub>1</sub>B alleles (''A1B*1, A1B*2, A1B*3, A1B*4'' and ''A1B*5'') were reported. In Caucasian whites, the frequencies of ''A1B*1'' and ''''A1B*2'' were about 0.95 and 0.05, respectively. ''A1B*4'' was observed in 2 related Czech individuals. In American blacks, ''A1B*1'' and ''A1B*2'' occurred with a frequency of 0.73 and 0.21, respectively, while a new allele, viz, ''A1B*3'' had a frequency of 0.06. ''A1B*5'' was observed only in Swedish Lapps and in Finns with a frequency of 0.04 and 0.007, respectively."<ref name=Juneja>{{ cite journal
|author=R.K. Juneja, N. Saha, B. Gahne and J.S.H. Tay
|title=Distribution of Plasma Alpha-1-B-Glycoprotein Phenotypes in Several Mongoloid Populations of East Asia
|journal=Human Heredity
|date=1989
|volume=39
|issue=
|pages=218-222
|url=https://www.karger.com/Article/PDF/153863
|arxiv=
|bibcode=
|doi=10.1159/000153863
|pmid=2583734
|accessdate=25 March 2020 }}</ref>


====Adenylate–uridylate rich elements (Chen and Shyu, Class I)====
"The frequency of ''A1B*1'' varied from 0.89 to 0.91 and that of ''A1B*2'' from 0.08 to 0.10. The ''A1B*3'' allele, reported previously only in American blacks, was observed with a frequency range of 0.003-0.01 in 3 of the Chinese populations, in Koreans and in Malays. A new 𝜶<sub>1</sub>B allele (''A1B*6'') was observed in 2 Chinese individuals."<ref name=Juneja/>


"Class I AUUUA-containing AREs had 1-3 copies of scattered AUUUA motifs coupled with a nearby U-rich region or U stretch".<ref name=Siegel/> This consensus sequence when in a promoter would be AUUUA=ATTTA. The real promoters have two sequences in the UTR of A1BG: negative strand, negative direction, ATTTA at 4073 and positive strand, negative direction, ATTTA at 4535; two sequences in the proximal promoters: negative strand, negative direction, ATTTA at 2636 and negative strand, positive direction, ATTTA at 4135; and two sequences in the distal promoters: negative strand, negative direction, ATTTA at 1698, and positive strand, positive direction, ATTTA at 3428.
====Phenotypes====


The random datasets had very different results: of the five arbitrarily chosen negative direction datasets, three had eleven sequences in the UTR for a 2.2 average; the ten datasets had only one sequence in a core promoter, ATTTA at 4287, for an average of 0.1; the ten datasets had three sequences in the proximal promoters for an average of 0.3; and the ten datasets had forty-nine sequences in the distal promoters for an average of 4.9 per direction.
'''Def.''' the "appearance of an organism based on a single trait [multifactorial combination of genetic traits and environmental factors]<ref name=PhenotypeWikt2>{{ cite web
 
|author=[[wikt:User:24.235.196.118|24.235.196.118]]
====Adenylate–uridylate rich elements (Chen and Shyu, Class II)====
|title=phenotype
 
|publisher=Wikimedia Foundation, Inc
Class "II AUUUA-containing AREs had at least two overlapping copies of the nonamer UUAUUUA(U/A)(U/A) in a U-rich region."<ref name=Siegel/> UUAUUUA(U/A)(U/A) in the promoters would be TTATTTA(A/T)(A/T).
|location=San Francisco, California
 
|date=23 September 2007
The real promoters have only one occurrence: in the UTR, negative strand, negative direction: TTATTTATT at 4075.
|url=https://en.wiktionary.org/wiki/phenotype
 
|accessdate=2016-10-04 }}</ref>, especially used in pedigrees"<ref name=PhenotypeWikt1>{{ cite web
The twenty random datasets had four sequences all in the distal promoters for an average of 0.2. The one real occurrence is likely active or activable.
|author=[[wikt:User:SemperBlotto|SemperBlotto]]
|title=phenotype
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=14 February 2005
|url=https://en.wiktionary.org/wiki/phenotype
|accessdate=2016-10-04 }}</ref> or any "observable characteristic of an organism, such as its morphological, developmental, biochemical or physiological properties, or its behavior"<ref name=PhenotypeWikt>{{ cite web
|author=[[wikt:User:N2e|N2e]]
|title=phenotype
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=3 July 2008
|url=https://en.wiktionary.org/wiki/phenotype
|accessdate=2016-10-04 }}</ref> is called a '''phenotype'''.


====Adenylate–uridylate rich elements (Chen and Shyu, Class III)====
"The three different phenotypes of α1B observed (designated 1-1, 1-2, and 2-2) were apparently identical to those reported by Altland et al. (1983), who used double one-dimensional electrophoresis. Family data supported the hypothesis that the three α1B phenotypes are determined by two codominant alleles at an autosomal locus, designated A1B. Allele frequencies in a Swedish population were: A1B *1, 0.937; A1B *2, 0.063; PIC, 0.111."<ref name=Gahne/>


"Subsequent studies based on analyses of a set of 4884 AUUUA-containing AREs led to a new classification based primarily on the number of overlapping AUUUA-repeats [8, 9, 10]."<ref name=Siegel/>
====Protein species====


Both the sequence ATTT and its inverse complement AAAT were searched on both sides of A1BG. An overlap would occur, for example, as follows: ATTT occurs on the negative strand in the negative direction at 4514, i.e. ATTT ends at 4514, ATT ends at 4513, AT ends at 4512, and the A occurs at 4511. A first overlap would be ATTTATTT beginning at 4510, but the next ATTT ends at 4072. In order to overlap, a neighboring occurrence would need to end four nts (-4) before the specific occurrence. For example, ATTT ends at 3014, but the next ARE farther away is ATTT at 3009, which is -5 rather than -4, so there is no overlapping repeat. For the negative strand in the negative direction there are no ATTTATTT overlapping repeats. For each of the direct sequences there are no overlapping repeats. However, for the inverse complements, there is an overlapping sequence on the positive strand, negative direction: AAAT at 4073 and AAAT at 4069, yielding AAATAAAT at 4073.
"Both protein species of [alpha 1-beta glycoprotein] A1B (A1Ba, p = 0.008; f.c.= +1.62, A1Bb, p = 0.003; f.c. = +1.82) [...] were apparently overexpressed in patients with PTCa [...]."<ref name=Abdullah>{{ cite journal
 
|author=Mardiaty Iryani Abdullah, Ching Chin Lee, Sarni Mat Junit, Khoon Leong Ng, and Onn Haji Hashim
For the twenty random datasets, there are (1) AURIIIr5: ATTT at 859 and ATTT at 855 for ATTTATTT at 859, (2) AURIIIr6: ATTT at 1143 and ATTT at 1139 for ATTTATTT at 1143, and (3) AURIIIr7ci: AAAT at 3634 and AAAT at 3630 for AAATAAAT at 3634. This yields a probability of 0.15 for direct and inverse complement whereas the real promoters have one occurrence. The sequence AAATAAAT at 4073 in the UTR of A1BG (negative direction) is likely active or activable. The two that did occur in the random datasets were both in the arbitrary positive direction. Choosing the other datasets would put one in the UTR and the other in the distal promoter.
|title=Tissue and serum samples of patients with papillary thyroid cancer with and without benign background demonstrate different altered expression of proteins
 
|journal=Peer J
====Adenylate–uridylate rich overlapping (Siegel) elements====
|date=13 September 2016
 
|volume=4
"Cluster 1 and 2 motifs total 13 nucleotides, with AU-rich segments flanking one or two AUUUA core motifs, respectively. Clusters 3, 4 and 5 include 3, 4, or 5 exact AUUUA repeats respectively."<ref name=Siegel/> "Naive Effective Length Pentamers: Pentamers classified by the “effective length” according to the formula floor((length(nt) + registration − 2)/4). “Registration” refers to the starting nucleotides of the ARE within the initial AUUUA pentamer: an ARE that starts AUUU*=0, UUUA*=1, UUAU*=2, and UAUU*=3. No mismatches allowed."<ref name=Siegel/>
 
To find possible ATTT regions within the promoters, an algorithm was written to look for sequences of "(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)(A/T)" so that an ATTTA can occur at least once for each of the first five nts. Only the negative strand, negative direction need be considered, as the positive strand contains the complements. Possible ATTT regions with overlaps appear to exist, such as "TTTATTATT at 4224", but continued examination ("TTTTATTAT at 4223") shows that this is not one. Another, "ATTTATTAT at 4077", upon continuation ("ATTTATTAT at 4077, TATTTATTA at 4076, TTATTTATT at 4075, TTTATTTAT at 4074, TTTTATTTA at 4073, TTTTTATTT at 4072") shows that an "ATTTA" is present but that overlapping is unlikely. No other such "ATTTA" sequence was found in either direction or on either side of A1BG.
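A window search of this kind, together with the effective-length formula quoted from Siegel ''et al.'', can be sketched in Python. The sketch is an illustration under stated assumptions and not the algorithm actually used here: the function names are hypothetical, positions are reported as the 1-based end of each window from the start of the supplied string, and registration is read from the first four nucleotides of the ARE.

```python
import re

# Every window of nine consecutive A/T nucleotides, overlapping allowed.
AT_WINDOW = re.compile(r"(?=([AT]{9}))")

# Registration of an ARE by its starting nucleotides (Siegel et al.).
REGISTRATION = {"AUUU": 0, "UUUA": 1, "UUAU": 2, "UAUU": 3}


def at_windows(sequence):
    """Return (end_position, window) for each 9-nt window made only of A or T."""
    return [(m.start(1) + 9, m.group(1))
            for m in AT_WINDOW.finditer(sequence.upper())]


def effective_length(are):
    """floor((length(nt) + registration - 2) / 4) for an ARE in RNA or DNA letters."""
    rna = are.upper().replace("T", "U")
    if rna[:4] not in REGISTRATION:
        raise ValueError("ARE does not start within an AUUUA pentamer")
    return (len(rna) + REGISTRATION[rna[:4]] - 2) // 4


windows = at_windows("GCTTTTATTTATTAGC")
has_pentamer = any("ATTTA" in window for _, window in windows)
```

Consecutive windows whose end positions differ by one, as in the listings in this section, belong to the same A/T-rich stretch.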
 
In the random datasets an "ATTTAAAAA at 2395" was found in ''ARESr0'', as was a complement "TAAATAAAA at 1499, ATAAATAAA at 1498, AATAAATAA at 1497, TAATAAATA at 1496, ATAATAAAT at 1495, AATAATAAA at 1494". Others were found "TTTTATTTA at 3611, ATTTTATTT at 3610" and "AATTTAATT at 1007, AAATTTAAT at 1006, AAAATTTAA at 1005" in ''ARESr1'', "TAAATTTTT at 3953, ATAAATTTT at 3952, AATAAATTT at 3951, TAATAAATT at 3950, ATAATAAAT at 3949", "AAATTAATT at 1556, TAAATTAAT at 1555" and "TTAAATTTA at 1771" in ''ARESr2'', "TAAATTTTA at 4196, TTAAATTTT at 4195, TTTAAATTT at 4194, TTTTAAATT at 4193, TTTTTAAAT at 4192" in ''ARESr3'', "TATATTTAA at 3724, TTATATTTA at 3723" and "TTAATAAAT at 2444, TTTAATAAA at 2443" in ''ARESr4'', "ATTATAAAT at 1564, AATTATAAA at 1563" and "AAATTTATT at 492" in ''ARESr5'', "TAAATATAA at 2239, TTAAATATA at 2238, TTTAAATAT at 2237, TTTTAAATA at 2236, ATTTTAAAT at 2235, AATTTTAAA at 2234" and "ATTTATTTA at 1144, TATTTATTT at 1143, ATATTTATT at 1142, TATATTTAT at 1141", "TATTTATTA at 919" and "AATTATTTA at 187" in ''ARESr6'', "TTAAAAATA at 3920, TTTAAAAAT at 3919, ATTTAAAAA at 3918, AATTTAAAA at 3917, TAATTTAAA at 3916" and "AAATAAATT at 3635, TAAATAAAT at 3634, ATAAATAAA at 3633, TATAAATAA at 3632" in ''ARESr7'', "TTTTAAATA at 3609" in ''ARESr8'', and "ATTATTTAA at 810, AATTATTTA at 809" in ''ARESr9''; for an occurrence of nineteen possible overlaps in ten datasets for 1.9 per dataset compared with one for two or 0.5 for the real promoters which suggests that the occurrence is likely active or activable but insufficient when two or more are needed.
 
====Constitutive decay elements====
 
Constitutive "decay elements (CDEs) [4, 18][...] are conserved stem loop motifs that bind to the proteins Roquin and Roquin2, resulting in increased mRNA decay [18]. CDEs include an upper stem-loop sequence of the form UUCYRYGAA flanked by lower stem sequences. Lower stem sequences are formed by 2-5 nt pairs of reverse-complementary sequences (e.g. CCUUCYRYGAAGG has a lower stem length of 2)."<ref name=Siegel>{{ cite journal
|author=David A. Siegel, Olivier Le Tonqueze, Anne Biton, Noah Zaitlen, and David J. Erle
|title=Massively Parallel Analysis of Human 3′ UTRs Reveals that AU-Rich Element Length and Registration Predict mRNA Destabilization
|journal=bioRxiv
|date=12 February 2020
|volume=
|issue=
|issue=
|pages=
|pages=e2450
|url=https://www.biorxiv.org/content/biorxiv/early/2020/02/12/2020.02.12.945063.full.pdf
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5028788/
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=10.1101/2020.02.12.945063
|doi=10.7717/peerj.2450
|pmid=
|pmid=27672505
|accessdate=23 March 2021 }}</ref>
|accessdate=15 March 2020 }}</ref>


For transcription a CDE would occur as TTC(C/T)(A/G)(C/T)GAA for an upper stem-loop sequence. This would be expected to be flanked by lower stem sequences such as CC...GG. The only consensus sequence found on either side of A1BG is TTCCATGAA at 128 in the distal promoter in the positive direction from ZNF497 toward A1BG for an occurrence of 0.125 (1/8). On the two sides of this sequence are GA...CG up to AGAGA...CGGAA which are not the reverse (inverse)-complement of each other. This CDE does not have a CC...GG on the sides so appears to be incomplete or may be a random occurrence rather than a likely active or activable response element.
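The check for flanking lower stem sequences can be sketched in Python. This is a minimal illustration, assuming only Watson-Crick pairs count toward the lower stem and that at most five pairs are examined; the function names and the 1-based end positions are hypothetical conventions.

```python
import re

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Upper stem-loop of a CDE written for DNA: TTC(C/T)(A/G)(C/T)GAA.
UPPER_STEM_LOOP = re.compile(r"(?=(TTC[CT][AG][CT]GAA))")


def lower_stem_length(sequence, start, end, max_pairs=5):
    """Count complementary nucleotide pairs flanking sequence[start:end]."""
    pairs = 0
    left, right = start - 1, end
    while (pairs < max_pairs and left >= 0 and right < len(sequence)
           and PAIR.get(sequence[left]) == sequence[right]):
        pairs += 1
        left -= 1
        right += 1
    return pairs


def find_cdes(sequence):
    """Return (end_position, upper_stem_loop, lower_stem_pairs) for each match."""
    sequence = sequence.upper()
    results = []
    for m in UPPER_STEM_LOOP.finditer(sequence):
        start, end = m.start(1), m.start(1) + len(m.group(1))
        results.append((end, m.group(1), lower_stem_length(sequence, start, end)))
    return results


with_stem = find_cdes("GTCCTTCCATGAAGGCA")      # CC...GG flanks
without_stem = find_cdes("GATTCCATGAACG")        # GA...CG flanks
```

A result with zero lower stem pairs, as for the GA...CG flanks, corresponds to the incomplete CDE described above.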
A1BG is mainly produced in the liver, and is secreted to plasma to levels of approximately 0.22 mg/mL.<ref name=Ishioka/>


Of the ten random direct datasets, two consensus sequences occurred: CDEr8: TTCCATGAA at 1472 and CDEr9: TTCTATGAA at 2350. Likewise for the reverse-complements: CDEr4ci: TTCGCGGAA at 2553 and CDEr5ci: TTCGTGGAA at 633. The odd random datasets were arbitrarily chosen as the positive direction. Such a choice suggests a probability of 0.2 for a positive direction CDE. All random occurrences were in the distal promoters, though two (TTCGCGGAA at 2553 and TTCTATGAA at 2350) were near the halfway points (2280 nts and 2222.5 nts) between the genes. These results suggest that the upper stem-loop CDE TTCCATGAA at 128 may be a random occurrence, or it may be active or activable, possibly for ZNF497.
===CRISPs===


==={{chem|Cys|2|His|2}}===
The human cysteine-rich secretory protein (CRISP3) "is present in exocrine secretions and in secretory granules of neutrophilic granulocytes and is believed to play a role in innate immunity."<ref name=Udby>{{ cite journal
 
|author=Udby L, Sørensen OE, Pass J, Johnsen AH, Behrendt N, Borregaard N, Kjeldsen L.
The {{chem|Cys|2|His|2}}-like fold group (C2H2) is by far the best-characterized class of zinc fingers, and is common in mammalian transcription factors, where such domains adopt a simple ββα fold and have the amino acid sequence motif:<ref name=Pabo2001>{{cite journal | author = Pabo CO, Peisach E, Grant RA | title = Design and selection of novel Cys2His2 zinc finger proteins | journal = Annual Review of Biochemistry | volume = 70 | pages = 313–40 | date = 2001 | pmid = 11395410 | doi = 10.1146/annurev.biochem.70.1.313 }}</ref>
|title=Cysteine-rich secretory protein 3 is a ligand of alpha1B-glycoprotein in human plasma
 
|journal=Biochemistry
:X<sub>2</sub>-Cys-X<sub>2,4</sub>-Cys-X<sub>12</sub>-His-X<sub>3,4,5</sub>-His
|date=12 October 2004
 
|volume=43
====Alcohol dehydrogenase repressor 1====
|issue=40
 
|pages=12877-86
"''Saccharomyces cerevisiae'' Alcohol dehydrogenase repressor 1 (Adr1p, YDR216W) is the transcription activator of the ADH2 gene (alcohol dehydrogenase 2) [1,2], which participates in the metabolic switch from glucose to ethanol or glycerol as food sources in yeast. Adr1p is involved in the activation of a number of genes of the respiratory metabolism, including those that regulate peroxisomes and phospholipid biosynthesis [3,4]."<ref name=Buttinelli>{{ cite journal
|url=https://pubs.acs.org/doi/10.1021/bi048823e
|author=Memmo Buttinelli, Gianna Panetta, Ambra Bucci, Daniele Frascaria, Veronica Morea and Adriana Erica Miele
|title=Protein Engineering of Multi-Modular Transcription Factor Alcohol Dehydrogenase Repressor 1 (Adr1p), a Tool for Dissecting In Vitro Transcription Activation
|journal=Biomolecules
|date=17 September 2019
|volume=9
|issue=9
|pages=497
|url=https://www.mdpi.com/2218-273X/9/9/497/htm
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=10.3390/biom9090497
|doi=10.1021/bi048823e
|pmid=
|pmid=15461460
|accessdate=30 October 2020 }}</ref> The upstream activating sequence (UAS) for Adr1p is TTGGGG or TTGG(A/G)G.<ref name=Tang/>
|accessdate=2011-11-28 }}</ref> CRISP3 has a relatively high content in human plasma.<ref name=Udby/>


In the real promoters of A1BG, Adr1 occurs in the UTRs six times and in the core promoter once, in the positive direction (TTGGGG at 4302). In the proximal promoters, it occurs once in the negative direction. The only other occurrences are in each direction in the distal promoters: negative direction (seventeen times) and positive direction (eleven times).
"The A1BG-CRISP-3 complex is noncovalent with a 1:1 stoichiometry and is held together by strong electrostatic forces."<ref name=Udby/> "Similar [complex formation] between toxins from snake venom and A1BG-like plasma proteins ... inhibits the toxic effect of snake venom metalloproteinases or myotoxins and protects the animal from envenomation."<ref name=Udby/>


In the random datasets, however, Adr1 occurs sixteen times in the UTR for ten datasets (1.6 per dataset). In the core promoters, it occurs once in the negative direction for ten datasets (0.1) and, in the arbitrary positive direction, two of three occurrences lie at or below 4445 for ten datasets (0.2). In the proximal promoters, it occurs once in the negative direction for ten datasets (0.1) and twice in the positive direction (0.2). In the distal promoters, ten datasets had twenty occurrences in the negative direction (2.0 per dataset) and thirty-eight occurrences in the arbitrary positive direction (3.8).
Opossums have a remarkably robust immune system, and show partial or total immunity to the venom of rattlesnakes, cottonmouths (''Agkistrodon piscivorus''), and other pit vipers (''Crotalinae'').<ref>{{ cite web
|url=http://www.wildliferescueleague.org/report/opossum.html
|title=The Opossum: Our Marvelous Marsupial, The Social Loner
|publisher=Wildlife Rescue League }}</ref><ref>[http://www.scielo.br/scielo.php?pid=S0104-79301999000100005&script=sci_arttext Journal Of Venomous Animals And Toxins – Anti-Lethal Factor From Opossum Serum Is A Potent Antidote For Animal, Plant And Bacterial Toxins]. Retrieved 2009-12-29.</ref>


===AP-2/EREBP-related factors===
"Crisp3 [is] mainly [expressed] in the salivary glands, pancreas, and prostate."<ref name=Haendler>{{ cite journal
 
|author=B Haendler, J Krätzschmar, F Theuring and W D Schleuning
====AGC boxes====
|title=Transcripts for cysteine-rich secretory protein-1 (CRISP-1; DE/AEG) and the novel related CRISP-3 are expressed under androgen control in the mouse salivary gland.
 
|journal=Endocrinology
"cDNA clones have been identified representing 4 novel DNA-binding proteins, called ethylene-responsive element binding proteins (EREBPs), that specifically bind the ERE AGC box".<ref name=Metzger>{{ cite journal
|date=July 1993
|author=Gerhard Leubner-Metzger, Luciana Petruzzelli, Rosa Waldvogel, Regina Vögeli-Lange, and Frederick Meins, Jr.
|volume=133
|title=Ethylene-responsive element binding protein (EREBP) expression and the transcriptional regulation of class I β-1, 3-glucanase during tobacco seed germination
|issue=1
|journal=Plant Molecular Biology
|pages=192-8
|date=November 1998
|url=http://endo.endojournals.org/content/133/1/192.full.pdf+html
|volume=38
|issue=5
|pages=785-95
|url=http://link.springer.com/article/10.1023/A:1006040425383
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=10.1023/A:1006040425383
|doi=10.1210/en.133.1.192
|pmid=
|pmid= 8319566
|accessdate=2014-05-02 }}</ref>
|accessdate=2012-02-20 }}</ref> "CRISP3 is highly expressed in the human cauda epididymidis and ampulla of vas deferens (Udby et al. 2005)."<ref name=Haendler/>


In the real promoters on either side of A1BG only one AGC box occurs on the positive strand, negative direction GGCGGCT at 1754 in the distal promoter. These four strands in two directions yielded only this consensus sequence for a rate of 0.25 per strand.
==A1BG-AS1==


In the twenty random datasets there are ten sequences: three in the UTR (arbitrary negative direction), one in the core promoter (arbitrary positive direction), one in the proximal promoter (arbitrary positive direction), and five in the distal promoters, for a response rate of 0.5 per strand. The striking difference between the real promoters and the random promoters suggests that the limited response of the real promoters is likely active or activable. The A1BG coding strand is on the positive strand, negative direction, which adds some weight to the likelihood that this AGC box is used.
Gene ID: 503538 is [[A1BG-AS1]] A1BG antisense RNA 1.<ref name=HGNC503538>{{ cite web
|author=HGNC
|title=A1BG-AS1 A1BG antisense RNA 1 [ Homo sapiens (human) ]
|publisher=National Center for Biotechnology Information
|location=U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA
|date=10 December 2019
|url=https://www.ncbi.nlm.nih.gov/gene/503538
|accessdate=2019-12-18 }}</ref> A1BG-AS1 is transcribed in the negative direction from ZSCAN22.<ref name=HGNC503538/>


{|class="wikitable"
Gene ID 503538 extends from 58,351,390 to 58,355,183. It is a long, non-coding (lnc) RNA.<ref name=Bai>{{ cite journal
|-
|author=Jigang Bai, Bowen Yao, Liang Wang, Liankang Sun, Tianxiang Chen, Runkun Liu, Guozhi Yin, Qiuran Xu, Wei Yang
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|title=lncRNA A1BG-AS1 suppresses proliferation and invasion of hepatocellular carcinoma cells by targeting miR-216a-5p
|-
|journal=
| Reals || UTR || negative || 0 || 2 || 0 || 0
|date=June 2019
|-
|volume=120
| Randoms || UTR || arbitrary negative || 3 || 10 || 0.3 || 0.25
|issue=6
|-
|pages=10310-10322
| Randoms || UTR || alternate negative || 2 || 10 || 0.2 || 0.25
|url=https://pubmed.ncbi.nlm.nih.gov/30556161/
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || negative || 0 || 10 || 0 || 0.05
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || positive || 1 || 10 || 0.1 || 0.05
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || negative || 0 || 10 || 0 || 0.05
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || positive || 1 || 10 || 0.1 || 0.05
|-
| Reals || Distal || negative || 1 || 2 || 0.5 || 0.25
|-
| Randoms || Distal || negative || 2 || 10 || 0.2 || 0.25
|-
| Reals || Distal || positive || 0 || 2 || 0 || 0.25
|-
| Randoms || Distal || positive || 3 || 10 || 0.3 || 0.25
|}
 
Comparison:
 
The occurrence of a real AGC box is greater than the randoms. This suggests that the real AGC box is likely active or activable.
 
===AP-1 transcription factor network (Pathway)===
 
Sixty-nine genes are included in the AP-1 transcription factor network (Pathway).<ref name=AP-1TFN>{{ cite web
|author=NCBI
|title=AP-1 transcription factor network
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=9 March 2021
|url=https://pubchem.ncbi.nlm.nih.gov/pathway/Pathway%20Interaction%20Database:ap1_pathway
|accessdate=26 October 2021 }}</ref>
 
====Angiotensinogen core promoter element 1====
 
One of the AP-1 transcription factor network genes is Gene ID: 183, AGT (angiotensinogen). The class of transcription factor that binds AGCE1 is not stated; it is likely bZIP, although ACGT does not occur within the response element consensus sequence (A/C)T(C/T)GTG, "located between the TATA box and transcription initiation site (positions −25 to −1)".<ref name=Yanai>{{ cite journal
|author=Kazuyuki Yanai, Tomoko Saito, Keiko Hirota, Hideyuki Kobayashi, Kazuo Murakami and Akiyoshi Fukamizu
|title=Molecular Variation of the Human Angiotensinogen Core Promoter Element Located between the TATA Box and Transcription Initiation Site Affects Its Transcriptional Activity
|journal=The Journal of Biological Chemistry
|date=28 November 1997
|volume=272
|issue=48
|pages=30558-62
|url=http://www.jbc.org/cgi/pmidlookup?view=long&pmid=9374551
|arxiv=
|bibcode=
|doi=
|pmid= 9374551
|accessdate=2012-02-20 }}</ref> The three reported variants are ATCGTG, CTCGTG, and ATTGTG; CTTGTG is not included.<ref name=Yanai/>
 
The AGCE1 "acts as a critical regulator of ''AGT'' transcription".<ref name=Sato>{{ cite journal
|author=Noriyuki Sato; Tomohiro Katsuya; Hiromi Rakugi; Seiju Takami; Yukiko Nakata; Tetsuro Miki; Jitsuo Higaki; Toshio Ogihara
|title=Association of Variants in Critical Core Promoter Element of Angiotensinogen Gene With Increased Risk of Essential Hypertension in Japanese
|journal=Hypertension
|date=September 1997
|volume=30
|issue=3 Pt 1
|pages=321-5
|url=http://www.ncbi.nlm.nih.gov/pubmed/9314411
|arxiv=
|bibcode=
|doi=10.1161/01.HYP.30.3.321
|pmid=9314411
|accessdate=2012-02-20 }}</ref>
 
The consensus sequence occurs seven times in the UTR between ZSCAN22 and A1BG, counting both strands. It occurs once in the core promoter between ZNF497 and A1BG on the negative strand in the positive direction. None occur in either proximal promoter. In the distal promoters, nine occur on both strands in the negative direction for 4.5, and ten occur on both strands in the positive direction for 5.0.
 
Excluding CTTGTG changes the results: the UTR then has six occurrences on both strands for 3.0. In the distal promoters there is one occurrence of CTTGTG.
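The effect of including or excluding CTTGTG can be illustrated with a minimal sketch. It assumes the consensus (A/C)T(C/T)GTG is searched as a regular expression on a sequence and on its inverse complement; the names <code>find_sites</code> and <code>demo</code> and the demo sequence itself are hypothetical and are not the tool or data used for the counts above.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the inverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# (A/C)T(C/T)GTG written as a regular expression: four possible hexamers.
AGCE1 = re.compile(r"[AC]T[CT]GTG")
# Only the three variants reported by Yanai et al.; CTTGTG is left out.
AGCE1_LISTED = re.compile(r"ATCGTG|CTCGTG|ATTGTG")


def find_sites(pattern, seq):
    """Return sorted (start, matched text, strand) tuples for both strands."""
    hits = [(m.start(), m.group(), "+") for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert the position on the inverse complement to forward coordinates.
        hits.append((len(seq) - m.end(), m.group(), "-"))
    return sorted(hits)


# Hypothetical sequence, for illustration only.
demo = "GGATCGTGAACTTGTGCCCACAATCC"
print(find_sites(AGCE1, demo))
print(find_sites(AGCE1_LISTED, demo))
```

With the four-hexamer pattern the demo sequence gives three sites, one of them on the opposite strand; restricting the search to the three listed variants drops the CTTGTG site.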
 
For the random datasets, ten occur per ten datasets in the UTR for an occurrence of 1.0 vs. 3.0 in the real promoter. In the core promoters, it occurs only once in the arbitrary positive direction CACAAG at 4525. As the number of nucleotides in the positive direction was limited to 4445 this occurrence is excluded. For the real promoters the occurrence is 0.25 for four strands in two directions vs. zero in the random datasets. If the arbitrary direction is reversed the occurrence would be 0.1 for ten datasets. No consensus sequences occur in either proximal promoter. For the distal promoters, there were ten in the negative direction for an occurrence of 1.0. For the arbitrary positive direction there were twenty-seven occurrences for ten datasets or 2.7.
 
With the exception of the proximal promoters at zero for all datasets and real promoters, the real occurrences were greater than the random occurrences suggesting that the promoter elements are likely active or activable.
 
===Zinc finger DNA-binding domains===
 
====Androgen response elements (Kouhpayeh)====
 
"Androgen response elements structurally consist of a short DNA motif with base sequence specificity within the promoter upstream of the androgen-responsive genes. The HRE contains a pair of conserved sequences, which are separated by a three-nucleotide spacer. This sequence is determined as 5'-GGTACAnnnTGTTCT-3'<sup>10, 11</sup> with 5'-CGG-3' as the spacer in the androgen response element."<ref name=Kouhpayeh>{{ cite journal
|author=S Kouhpayeh, AR Einizadeh, Z Hejazi, M Boshtam, L Shariati, M Mirian, L Darzi, M Sojoudi, H Khanahmad and A Rezaei
|title=Antiproliferative effect of a synthetic aptamer mimicking androgen response elements in the LNCaP cell line
|journal=Cancer Gene Therapy
|date=1 July 2016
|volume=23
|issue=
|pages=254-257
|url=https://www.researchgate.net/profile/Mina_Mirian/publication/304707422_Antiproliferative_effect_of_a_synthetic_aptamer_mimicking_androgen_response_elements_in_the_LNCaP_cell_line/links/59ffed00458515d0706e4f27/Antiproliferative-effect-of-a-synthetic-aptamer-mimicking-androgen-response-elements-in-the-LNCaP-cell-line.pdf
|arxiv=
|bibcode=
|doi=10.1038/cgt.2016.26
|pmid=
|accessdate=3 October 2020 }}</ref>
 
"ARE half sites, benefit from neighboring motifs or cooperating transcription factors in regulating gene expression."<ref name=Wilson>{{ cite journal
|author=Stephen Wilson, Jianfei Qi & Fabian V. Filipp
|title=Refinement of the androgen response element based on ChIP-Seq in androgen-insensitive and androgen-responsive prostate cancer cell lines
|journal=Scientific Reports
|date=14 September 2016
|volume=6
|issue=
|pages=32611
|url=https://www.nature.com/articles/srep32611
|arxiv=
|bibcode=
|doi=10.1038/srep32611
|pmid=
|accessdate=3 October 2020 }}</ref>
 
The first half to consider is GGTACA and its inverse complement. Of the eight occurrences, only one is in the proximal promoter (negative strand, negative direction) for an occurrence of 0.25. The other seven are in the distal promoters: three in the negative direction (one on the positive strand and two on the negative strand) for an occurrence of 1.5, and four on the positive strand in the positive direction for an occurrence of 2.0.
 
The random datasets had three out of ten occurrences in the arbitrary negative direction UTR for an occurrence of 0.3. The only occurrence in the core promoters was also in the positive direction for an occurrence of 0.1. The proximal promoters had two occurrences in opposite directions for 0.1. The distal promoters had four out of ten in the negative direction for 0.4, and four out of ten in the positive direction for 0.4.
 
The real occurrences do not compare well with the random datasets. The real occurrences are likely active or activable.
 
The second half TGTTCT and its complement inverse have five occurrences in the UTR between ZSCAN22 and A1BG, four on the negative strand (4.0) and one on the positive strand (1.0). One occurrence is in the proximal promoter between ZNF497 and A1BG on the negative strand (0.5). There are five distal promoter sequences, three in the negative direction, negative strand (1.0) and (2.0) and two in the positive direction (1.0).
 
The random data sets had only one in the UTR of ten for (0.1), one in the proximal promoters of twenty for (0.05). The distal promoters had five of ten for (0.5) in the arbitrary negative direction and seven of ten in the positive direction for (0.7).
 
Again there is no agreement between random sequences and real sequences suggesting that the real sequences are likely active or activable.
 
For the full AnRE (Kouhpayeh) there are no real sequences. With the random datasets there are none close enough to form the full AnRE.
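The distinction between half sites and the full element can be sketched as follows. This is an illustration only, assuming the full element is the two halves separated by exactly three arbitrary nucleotides; the function names and both demo sequences are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the inverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


HALF_SITES = ("GGTACA", "TGTTCT")
# Full element: first half, any three nucleotides, second half.
FULL_ARE = re.compile(r"GGTACA[ACGT]{3}TGTTCT")


def count_overlapping(motif, seq):
    """Count motif occurrences, including ones that overlap each other."""
    return sum(1 for _ in re.finditer(f"(?={motif})", seq))


def count_half_sites(seq):
    """Count each half site plus its inverse complement in seq."""
    return {
        half: count_overlapping(half, seq)
        + count_overlapping(reverse_complement(half), seq)
        for half in HALF_SITES
    }


def has_full_are(seq):
    """True if the full element occurs on seq or on its inverse complement."""
    return bool(FULL_ARE.search(seq) or FULL_ARE.search(reverse_complement(seq)))


# Hypothetical sequences, for illustration only.
demo = "AAGGTACACGGTGTTCTAA"         # both halves with a CGG spacer
halves_only = "AAGGTACAAATGTTCTAA"   # both halves, but only two nucleotides apart

print(count_half_sites(halves_only), has_full_are(halves_only))
print(count_half_sites(demo), has_full_are(demo))
```

Both demo sequences contain one copy of each half site, but only the first has the three-nucleotide spacing required for the full element.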
 
====Androgen response elements (Wilson)====
 
The full fifteen nucleotide sequence AGAACANNNTGTTCT does not occur in any promoter for A1BG. The TGTTCT portion was studied as part of "Androgen response elements (Kouhpayeh)"; the other portion, AGAACA, is the complement inverse of TGTTCT and was looked at as part of "Androgen response element2 (Kouhpayeh)". The random occurrences were 0.1 for the one UTR, 0.1 for one of ten, and the distal promoters had two for each direction for 0.2 each. These are much lower than the real results, suggesting that the reals are likely active or activable.
 
====B box (Johnson)====
 
TGGGCA is a B-box.<ref name=Johnson>{{ cite journal
|author=PA Johnson, D Bunick, NB Hecht
|title=Protein Binding Regions in the Mouse and Rat Protamine-2 Genes
|journal=Biology of Reproduction
|date=1991
|volume=44
|issue=1
|pages=127-134
|url=https://academic.oup.com/biolreprod/article-pdf/44/1/127/10536199/biolreprod0127.pdf
|arxiv=
|bibcode=
|doi=
|pmid=
|accessdate=6 April 2019 }}</ref> It shows up in the UTR (occurrence of 3.0), proximal promoter (occurrence of 0.5), and distal promoter (occurrence of 3.0) of A1BG between ZSCAN22 and A1BG (negative direction). In the positive direction, it shows up in the proximal promoter (occurrence of 0.5) and distal promoter (occurrence of 3.0).
 
The random datasets had three in the UTR (occurrence of 0.3) and one for ten in the proximal promoter (occurrence of 0.1); in the distal promoter the occurrence was 0.7 in the negative direction, with seventeen sequences (occurrence of 7.0) in the positive direction.
 
====Box B (Sanchez)====
 
"The human [Transforming growth factor b1] TGFB1 promoter region contains two binding sequences for [Activator protein-1] AP-1, designated AP-1 box A (TGACTCT) and box B (TGTCTCA), which mediate the upregulation of promoter activity via a PKC-dependent pathway after exposure of cells to a high-glucose environment (Refs 37, 38)."<ref name=Paratore/>
 
The real promoters have two box Bs in the UTR (occurrence of 1.0), eight in the negative direction distal promoter (occurrence of 4.0) and three in the positive direction distal promoter (occurrence of 1.5).
 
The random datasets had one either in the UTR or proximal promoter (occurrence of 0.1) and one in the distal promoter (occurrence of 0.1) near either ZSCAN22 or ZNF497. The disparity in occurrences is large indicating that the box Bs are likely active or activable.
 
===β-Scaffold factors===
 
"Higher animals have [transcription factor] TF genes for the basic domain, the β-scaffold factor, and other new structures; however, their total proportion is less than 15% and most are [zinc (Zn)-coordinating factor] ZF and [Helix-Turn-Helix] HTH genes."<ref name=Nagata>{{ cite book
|author=Toshifumi Nagata, Aeni Hosaka-Sasaki and Shoshi Kikuchi
|title=The Evolutionary Diversification of Genes that Encode Transcription Factor Proteins in Plants, In: ''Plant Transcription Factors Evolutionary, Structural and Functional Aspects''
|publisher=Academic Press
|location=
|date=2016
|editor=Daniel H. Gonzalez
|pages=73-97
|url=https://www.sciencedirect.com/science/article/pii/B9780128008546000051
|arxiv=
|bibcode=
|doi=10.1016/B978-0-12-800854-6.00005-1
|pmid=
|isbn=978-0-12-800854-6
|accessdate=28 November 2021 }}</ref>
 
====ATA boxes====
 
In the real promoters for A1BG, nine consensus ATA boxes (direct and complement inverse) occur in the UTR between ZSCAN22 and A1BG for an occurrence of 4.5 (two strands in one direction). No consensus sequences occur in either core promoter, only one occurs in the four proximal promoters for two directions, occurrence 0.25. The distal promoters have only two consensus sequences for an occurrence of 0.5.
 
The random sequences (ten direct and ten complement inverses) had four sequences in the UTR for an occurrence of 0.4 (one of two directions). The core promoters had only two for 0.2, the proximal promoters had two for 0.2, and the distal promoters had eighteen and thirteen per direction for 1.8 and 1.3.
 
Comparing the two, the real sequences are far more common in the UTR and much less common further away from the transcription start sites suggesting that the real sequences are likely active or activable.
Whether the ATA box is bound by transcription factors possessing a β-Scaffold is not yet known, or a source stating it has not yet been found.
 
====Γ-interferon activated sequences====
 
"Computer analysis of the nt −653 to nt −483 region identified two sites that resemble the [γ-interferon activated sequence] GAS consensus sequence, TTNCNNNAA (19). Similar GAS-like sites have been shown to mediate the effects of various cytokines, including [growth hormone] GH, on the transcription of other genes (19, 20). The first site, TTCCTAGAA (ALS-GAS1), is located between nt −633 and nt −625; the second site, TTAGACAAA (ALS-GAS2), is located between nt −553 and nt −545."<ref name=Ooi>{{ cite journal
|author=Guck T. Ooi, Kelley R. Hurst, Matthew N. Poy, Matthew M. Rechler, Yves R. Boisclair
|title=Binding of STAT5a and STAT5b to a Single Element Resembling a γ-Interferon-Activated Sequence Mediates the Growth Hormone Induction of the Mouse Acid-Labile Subunit Promoter in Liver Cells
|journal=Molecular Endocrinology
|date=1 May 1998
|volume=12
|issue=5
|pages=675-687
|url=https://academic.oup.com/mend/article/12/5/675/2754376
|arxiv=
|bibcode=
|doi=10.1210/mend.12.5.0115
|pmid=9605930
|accessdate=9 September 2020 }}</ref>
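As a sketch of how a degenerate consensus such as TTNCNNNAA can be applied, the snippet below expands IUPAC letters into a regular expression and counts the positions at which a site departs from the consensus. The helper names are hypothetical; only the consensus and the two site sequences come from the quotation above.

```python
import re

# IUPAC nucleotide codes used here, expanded to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile a consensus written in IUPAC letters into a regex."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def mismatches(consensus, site):
    """Number of positions where site departs from the consensus."""
    return sum(
        re.fullmatch(IUPAC[c], s) is None
        for c, s in zip(consensus, site)
    )


GAS = consensus_to_regex("TTNCNNNAA")

for name, site in (("ALS-GAS1", "TTCCTAGAA"), ("ALS-GAS2", "TTAGACAAA")):
    print(name, bool(GAS.fullmatch(site)), mismatches("TTNCNNNAA", site))
```

Read strictly, ALS-GAS1 matches the consensus at every position, while ALS-GAS2 differs at the fourth position (G rather than C), which is consistent with the quotation describing the sites as resembling the consensus.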
 
{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 7 || 2 || 3.5 || 3.5
|-
| Randoms || UTR || arbitrary negative || 31 || 10 || 3.1 || 2.75
|-
| Randoms || UTR || alternate negative || 24 || 10 || 2.4 || 2.75
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || negative || 0 || 10 || 0 || 0.25
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || positive || 5 || 10 || 0.5 || 0.25
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0.5
|-
| Randoms || Proximal || negative || 3 || 10 || 0.3 || 0.3
|-
| Reals || Proximal || positive || 2 || 2 || 1.0 || 0.5
|-
| Randoms || Proximal || positive || 3 || 10 || 0.3 || 0.3
|-
| Reals || Distal || negative || 10 || 2 || 5.0 || 3.25
|-
| Randoms || Distal || negative || 41 || 10 || 4.1 || 4.65
|-
| Reals || Distal || positive || 3 || 2 || 1.5 || 3.25
|-
| Randoms || Distal || positive || 52 || 10 || 5.2 || 4.65
|}
 
Comparison:
 
The occurrences of real γ-interferon activated sequences are systematically outside the range of the randoms. This suggests that the real responsive element consensus sequences are likely active or activable.
 
====HMG boxes====
 
"Most HMG box proteins contain two or more HMG boxes and appear to bind DNA in a relatively sequence-aspecific manner (5, 13, 15, 16 and references therein). [...] they all appear to bind to the minor groove of the A/T A/T C A A A G-motif (10, 14, 18-20)."<ref name=Laudet>{{ cite journal
|author=Vincent Laudet, Dominique Stehelin and Hans Clevers
|title=Ancestry and diversity of the HMG box superfamily
|journal=Nucleic Acids Research
|date=1993
|volume=21
|issue=10
|pages=2493-501
|url=https://academic.oup.com/nar/article-pdf/21/10/2493/4086740/21-10-2493.pdf
|arxiv=
|bibcode=
|doi=
|pmid=
|accessdate=2017-04-05 }}</ref>
 
{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || UTR || arbitrary negative || 9 || 10 || 0.9 || 0.6
|-
| Randoms || UTR || alternate negative || 3 || 10 || 0.3 || 0.6
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 1 || 10 || 0.1 || 0.05
|-
| Randoms || Proximal || alternate negative || 0 || 10 || 0 || 0.05
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 0 || 10 || 0 || 0.1
|-
| Randoms || Proximal || alternate positive || 2 || 10 || 0.2 || 0.1
|-
| Reals || Distal || negative || 2 || 2 || 1 || 1
|-
| Randoms || Distal || arbitrary negative || 11 || 10 || 1.1 || 0.8
|-
| Randoms || Distal || alternate negative || 5 || 10 || 0.5 || 0.8
|-
| Reals || Distal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary positive || 8 || 10 || 0.8 || 1.35
|-
| Randoms || Distal || alternate positive || 19 || 10 || 1.9 || 1.35
|}
 
Comparison:
 
The occurrences of real HMG boxes are within the range of the randoms. This suggests that the real HMG boxes are likely random.
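One way to make "within the range of the randoms" explicit is sketched below. It is an assumption, not a stated method: the real occurrence is compared with the lowest and highest random occurrences for the same promoter and direction, widened by the ± 0.1 given in the table header. The function name is hypothetical.

```python
def compare_to_randoms(real, randoms, tolerance=0.1):
    """Classify a real occurrence against the spread of random occurrences.

    Returns "above", "below" or "within". The tolerance widens the random
    range on both sides (assumed here to be the 0.1 from the table header).
    """
    low, high = min(randoms), max(randoms)
    if real > high + tolerance:
        return "above"
    if real < low - tolerance:
        return "below"
    return "within"


# UTR row of the table above: real 0.5, randoms 0.9 and 0.3.
print(compare_to_randoms(0.5, [0.9, 0.3]))
# Distal, negative direction: real 1, randoms 1.1 and 0.5.
print(compare_to_randoms(1.0, [1.1, 0.5]))
```

Under this reading the UTR and the distal negative-direction values both fall within the random range.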
 
===Zn(II)<sub>2</sub>Cys<sub>6</sub> proteins===
 
"The transcription factors Uga3, Dal81 and Leu3 belong to the class III family (Zn(II)<sub>2</sub>Cys<sub>6</sub> proteins), and they recognize highly related sequences rich in GGC triplets [15]."<ref name=Ruiz/>
 
====Dal81====
 
====GCC boxes====
 
"Expression of the osmotin gene is similar to that of the OLP gene. The osmotin gene also has several AGCCGCC sequences; a complete AGCCGCC (from -50 to -44), a slightly modified CGCCGCC (from -144 to -138), and an AGCCGCC sequence in reverse orientation (from -162 to -156)."<ref name=Sato/>
 
GCC boxes occur in the
# AGC boxes: "The GCC box, also referred to as the '''AGC box''' (10), GCC element (11), or AGCCGCC sequence (13), is an ethylene-responsive element found in the promoters of a large number of [pathogenesis related] PR genes whose expression is up-regulated following pathogen attack."<ref name=Buttner>{{ cite journal
|author=Michael Büttner and Karam B. Singh
|title=''Arabidopsis thaliana'' ethylene-responsive element binding protein (AtEBP), an ethylene-inducible, GCC box DNA-binding protein interacts with an ocs element binding protein
|journal=Proceedings of the National Academy of Sciences of the United States of America
|date=May 27, 1997
|volume=94
|issue=11
|pages=5961-6
|url=http://www.pnas.org/content/94/11/5961.long
|arxiv=
|bibcode=
|doi=
|pmid=
|accessdate=2014-05-02 }}</ref>
# DNA damage response elements (DRE) (Sumrada, core): "A consensus sequence, 5'-TAGCCGCCGRRRR-3' (where R = an unspecified purine nucleoside [A/G],was generated from these data."<ref name=Sumrada/>
# GGC triplets: "The transcription factors Uga3, Dal81 and Leu3 belong to the class III family (Zn(II)<sub>2</sub>Cys<sub>6</sub> proteins), and they recognize highly related sequences rich in GGC triplets [15]."<ref name=Ruiz>{{ cite journal
|author=Marcos Palavecino-Ruiz, Mariana Bermudez-Moretti and Susana Correa-Garcia
|title=Unravelling the transcriptional regulation of ''Saccharomyces cerevisiae UGA'' genes: the dual role of transcription factor Leu3
|journal=Microbiology
|date=12 October 2017
|volume=163
|issue=
|pages=1692-1701
|url=https://www.researchgate.net/profile/Mariana-Bermudez-2/publication/320571623_Unravelling_the_transcriptional_regulation_of_Saccharomyces_cerevisiae_UGA_genes_the_dual_role_of_transcription_factor_Leu3/links/5c62114c299bf1d14cbf7ade/Unravelling-the-transcriptional-regulation-of-Saccharomyces-cerevisiae-UGA-genes-the-dual-role-of-transcription-factor-Leu3.pdf
|arxiv=
|bibcode=
|doi=10.1099/mic.0.000560
|pmid=
|accessdate=20 April 2021 }}</ref>
# Kozak sequences: GCCGCC(A/G)CCATGG.<ref name=Kozak1987>{{ cite journal
|author=Kozak Marilyn
|date=October 1987
|title=An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs
|url=http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=3313277
|journal=Nucleic Acids Research
|volume=15
|issue=20
|pages=8125–8148
|doi=10.1093/nar/15.20.8125
|pmid=3313277 }}</ref>
 
{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 0 || 2 || 0 || 0
|-
| Randoms || UTR || arbitrary negative || 7 || 10 || 0.7 || 0.6
|-
| Randoms || UTR || alternate negative || 5 || 10 || 0.5 || 0.6
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || negative || 0 || 10 || 0 || 0.05
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || positive || 1 || 10 || 0.1 || 0.05
|-
| Reals || Proximal || negative || 1 || 2 || 0.5 || 0.25
|-
| Randoms || Proximal || negative || 1 || 10 || 0.1 || 0.1
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0.25
|-
| Randoms || Proximal || positive || 1 || 10 || 0.1 || 0.1
|-
| Reals || Distal || negative || 1 || 2 || 0.5 || 1.75
|-
| Randoms || Distal || negative || 9 || 10 || 0.9 || 1.05
|-
| Reals || Distal || positive || 6 || 2 || 3.0 || 1.75
|-
| Randoms || Distal || positive || 12 || 10 || 1.2 || 1.05
|}
 
Comparison:
 
The occurrences of real GCC boxes are greater than the randoms. This suggests that the real GCC boxes are likely active or activable.
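The Occurrences and Averages columns can be reproduced from the Numbers and Strands columns. The sketch below assumes that an occurrence is the number of sequences divided by the number of strands (or random datasets) searched, and that the average is the mean of the occurrences over the two directions; it uses the distal rows of the table above, and the function names are hypothetical.

```python
# (kind, promoter, direction, number of sequences, strands or datasets searched)
rows = [
    ("Reals", "Distal", "negative", 1, 2),
    ("Reals", "Distal", "positive", 6, 2),
    ("Randoms", "Distal", "negative", 9, 10),
    ("Randoms", "Distal", "positive", 12, 10),
]


def occurrence(number, strands):
    """Occurrences column: sequences found per strand or per random dataset."""
    return number / strands


def average(kind, promoter):
    """Averages column: mean occurrence over the directions listed in rows."""
    values = [
        occurrence(number, strands)
        for k, p, _direction, number, strands in rows
        if k == kind and p == promoter
    ]
    return sum(values) / len(values)


print(round(average("Reals", "Distal"), 2))
print(round(average("Randoms", "Distal"), 2))
```

The same arithmetic can be used to check any row of the tables in this section.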
 
====GGC triplets====
 
"The transcription factors Uga3, Dal81 and Leu3 belong to the class III family (Zn(II)<sub>2</sub>Cys<sub>6</sub> proteins), and they recognize highly related sequences rich in GGC triplets [15]."<ref name=Ruiz/>
 
{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 79 || 2 || 39.5 || 39.5
|-
| Randoms || UTR || arbitrary negative || 283 || 10 || 28.3 || 28.4
|-
| Randoms || UTR || alternate negative || 285 || 10 || 28.5 || 28.4
|-
| Reals || Core || negative || 0 || 2 || 0 || 3.75
|-
| Randoms || Core || negative || 9 || 10 || 0.9 || 3.75
|-
| Reals || Core || positive || 15 || 2 || 7.5 || 3.75
|-
| Randoms || Core || positive || 66 || 10 || 6.6 || 3.75
|-
| Reals || Proximal || negative || 13 || 2 || 6.5 || 6.25
|-
| Randoms || Proximal || negative || 32 || 10 || 3.2 || 2.95
|-
| Reals || Proximal || positive || 12 || 2 || 6.0 || 6.25
|-
| Randoms || Proximal || positive || 27 || 10 || 2.7 || 2.95
|-
| Reals || Distal || negative || 156 || 2 || 78 || 135
|-
| Randoms || Distal || negative || 414 || 10 || 41.4 || 59.6
|-
| Reals || Distal || positive || 384 || 2 || 192 || 135
|-
| Randoms || Distal || positive || 778 || 10 || 77.8 || 59.6
|}
 
Comparison:
 
The occurrences of real GGC triplets are larger than the randoms. This suggests that the real GGC triplets are likely active or activable.
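Counting GGC triplets on both strands can be sketched as below. The demo sequence and function names are hypothetical. A GGC on the opposite strand appears as GCC when read on the given strand, so the two counts are obtained here by counting GGC on the sequence and on its inverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the inverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_ggc(seq):
    """Return (GGC count on seq, GGC count on its inverse complement).

    GGC cannot overlap another GGC, so str.count misses nothing.
    """
    return seq.count("GGC"), reverse_complement(seq).count("GGC")


# Hypothetical sequence, for illustration only.
demo = "GGCGGCTAGCCGGC"
direct, inverse = count_ggc(demo)
print(direct, inverse)
```

In the demo sequence the triplet occurs three times on the given strand and once on the opposite strand.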
 
====Leu3====
 
"Known consensus string Type of motifs LEU3 CCGNNNNCGG or GGCNNNNGCC Gapped motif".<ref name=Reddy>{{ cite journal
|author=Uyyala Srinivasulu Reddy, Michael Arock, A.V. Reddy
|title=Discovering of gapped motifs using particle swarm optimisation
|journal=International Journal of Computational Intelligence in Bioinformatics and Systems Biology
|date=20 April 2020
|volume=2
|issue=1
|pages=1-21
|url=https://www.inderscienceonline.com/doi/abs/10.1504/IJCIBSB.2020.106858
|arxiv=
|bibcode=
|doi=10.1504/IJCIBSB.2020.106858
|pmid=
|accessdate=20 April 2021 }}</ref>
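A gapped motif of this kind can be searched with a regular expression in which the four N positions accept any nucleotide. The sketch below is illustrative only, with a hypothetical demo sequence and function names. Both forms of the motif are their own inverse complement as patterns, so a match on one strand implies a match at the same location on the other.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the inverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# CCGNNNNCGG or GGCNNNNGCC; the lookahead lets overlapping matches be reported.
LEU3 = re.compile(r"(?=(CCG[ACGT]{4}CGG|GGC[ACGT]{4}GCC))")


def find_leu3(seq):
    """Return (start, matched text) for every Leu3-type gapped motif in seq."""
    return [(m.start(), m.group(1)) for m in LEU3.finditer(seq)]


# Hypothetical sequence, for illustration only.
demo = "TTCCGATATCGGAAGGCTTTTGCCTT"
print(find_leu3(demo))
print(find_leu3(reverse_complement(demo)))
```

The demo sequence contains one site of each form, and the same two sites are found when the inverse complement is searched.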
 
{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)  
|-
| Reals || UTR || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || UTR || arbitrary negative || 40 || 10 || 4.0 || 4.45
|-
| Randoms || UTR || alternate negative || 49 || 10 || 4.9 || 4.45
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || negative || 1 || 10 || 0.1 || 0.7
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || positive || 13 || 10 || 1.3 || 0.7
|-
| Reals || Proximal || negative || 1 || 2 || 0.5 || 0.25
|-
| Randoms || Proximal || negative || 2 || 10 || 0.2 || 0.2
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0.25
|-
| Randoms || Proximal || positive || 2 || 10 || 0.2 || 0.2
|-
| Reals || Distal || negative || 7 || 2 || 3.5 || 4.5
|-
| Randoms || Distal || negative || 51 || 10 || 5.1 || 6.9
|-
| Reals || Distal || positive || 11 || 2 || 5.5 || 4.5
|-
| Randoms || Distal || positive || 87 || 10 || 8.7 || 6.9
|}
 
Comparison:
 
The occurrences of real Leu3 sequences in the UTR and distals are systematically lower than the randoms, while the real proximals (0.25) are close to the randoms (0.2). This suggests that the real Leu3s are likely active or activable.
 
====Uga3====
 
===Hairpin-hinge-hairpin-tail===
 
"In addition to this ACA box, they have the consensus H box sequence (5'-ANANNA-3') but have no other primary sequence identity. Despite this lack of primary sequence conservation, the H and ACA boxes are embedded in an evolutionarily conserved hairpin-hinge-hairpin-tail core secondary structure with the H box in the single-stranded hinge region and the ACA box in the single-stranded tail (5, 16)."<ref name=Mitchell/>
 
====H and ACA boxes====
 
The combined consensus sequence is ACAGGA.<ref name=Mitchell>{{ cite journal
|author=James R. Mitchell, Jeffrey Cheng, and Kathleen Collins
|title=A Box H/ACA Small Nucleolar RNA-Like Domain at the Human Telomerase RNA 3' End
|journal=Molecular and Cellular Biology
|date=January 1999
|volume=19
|issue=1
|pages=567–576
|url=http://mcb.asm.org/content/19/1/567.full.pdf
|arxiv=
|bibcode=
|doi=
|pmid=
|accessdate=5 November 2018 }}</ref>
 
{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 3 || 2 || 1.5 || 1.5
|-
| Randoms || UTR || arbitrary negative || 2 || 10 || 0.2 || 0.2
|-
| Randoms || UTR || alternate negative || 2 || 10 || 0.2 || 0.2
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0.25
|-
| Randoms || Proximal || negative || 0 || 10 || 0 || 0.05
|-
| Reals || Proximal || positive || 1 || 2 || 0.5 || 0.25
|-
| Randoms || Proximal || positive || 1 || 10 || 0.1 || 0.05
|-
| Reals || Distal || negative || 2 || 2 || 1.0 || 2.25
|-
| Randoms || Distal || negative || 8 || 10 || 0.8 || 0.65
|-
| Reals || Distal || positive || 7 || 2 || 3.5 || 2.25
|-
| Randoms || Distal || positive || 5 || 10 || 0.5 || 0.65
|}
 
Comparison:
 
The occurrences of real H and ACA box consensus sequences are greater than the randoms. This suggests that the real H and ACA box consensus sequences are likely active or activable.
 
====H-boxes (Grandbastien)====
 
H box in ''Solanaceae'' has the following consensus sequence CC(A/T)ACCNNNNNNN(A/C)T.<ref name=Grandbastien>{{ cite journal
|author=M.-A. Grandbastien, C. Audeon, E. Bonnivard, J.M. Casacuberta, B. Chalhoub, A.-P.P. Costa, Q.H. Le, D. Melayah, M. Petit, C. Poncet, S.M. Tam, M.-A. Van Sluys, C. Mhiri
|title=Stress activation and genomic impact of Tnt1 retrotransposons in Solanaceae
|journal=Cytogenetic and Genomic Research
|date=July 2005
|volume=110
|issue=1-4
|pages=229-41
|url=https://www.researchgate.net/profile/Corinne_Mhiri/publication/7666072_Stress_activation_and_genomic_impact_of_Tnt1_retrotransposons_in_Solanaceae/links/548089040cf20f081e7258e9/Stress-activation-and-genomic-impact-of-Tnt1-retrotransposons-in-Solanaceae.pdf
|arxiv=
|bibcode=
|doi=10.1159/000084957
|pmid=
|accessdate=5 November 2018 }}</ref>
 
{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || UTR || arbitrary negative || 1 || 10 || 0.1 || 0.1
|-
| Randoms || UTR || alternate negative || 1 || 10 || 0.1 || 0.1
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 1 || 10 || 0.1 || 0.1
|-
| Randoms || Core || alternate positive || 1 || 10 || 0.1 || 0.1
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || negative || 0 || 10 || 0 || 0
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || positive || 0 || 10 || 0 || 0
|-
| Reals || Distal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary negative || 1 || 10 || 0.1 || 0.1
|-
| Randoms || Distal || alternate negative || 1 || 10 || 0.1 || 0.1
|-
| Reals || Distal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Distal || arbitrary positive || 1 || 10 || 0.1 || 0.1
|-
| Randoms || Distal || alternate positive || 1 || 10 || 0.1 || 0.1
|}
 
Comparison:
 
The occurrence of the real H-box (Grandbastien) is greater than the randoms in the UTR, the only region where a real sequence occurs. This suggests that the real H-box (Grandbastien) is likely active or activable.
 
====H-boxes (Lindsay)====
 
"The KAP-2 protein [...] binds to the H-box (CCTACC) element in the bean CHS15 chalcone synthase promoter".<ref name=Lindsay>{{ cite journal
|author=William P. Lindsay, Fiona M. McAlister, Qun Zhu, Xian-Zhi He, Wolfgang Dröge-Laser, Susie Hedrick, Peter Doerner, Chris Lamb and Richard A. Dixon
|title=KAP-2, a protein that binds to the H-box in a bean chalcone synthase promoter, is a novel plant transcription factor with sequence identity to the large subunit of human Ku autoantigen
|journal=Plant Molecular Biology
|date=July 2002
|volume=49
|issue=5
|pages=503–514
|url=https://link.springer.com/article/10.1023/A:1015505316379
|arxiv=
|bibcode=
|doi=10.1023/A:1015505316379
|pmid=
|accessdate=5 October 2019 }}</ref>
 
{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 1 || 2 || 0.5 || 0.5
|-
| Randoms || UTR || arbitrary negative || 7 || 10 || 0.7 || 0.7
|-
| Randoms || UTR || alternate negative || 7 || 10 || 0.7 || 0.7
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary negative || 0 || 10 || 0 || 0
|-
| Randoms || Core || alternate negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || arbitrary positive || 3 || 10 || 0.3 || 0.15
|-
| Randoms || Core || alternate positive || 0 || 10 || 0 || 0.15
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary negative || 1 || 10 || 0.1 || 0.05
|-
| Randoms || Proximal || alternate negative || 0 || 10 || 0 || 0.05
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || arbitrary positive || 1 || 10 || 0.1 || 0.1
|-
| Randoms || Proximal || alternate positive || 1 || 10 || 0.1 || 0.1
|-
| Reals || Distal || negative || 2 || 2 || 1 || 1
|-
| Randoms || Distal || arbitrary negative || 10 || 10 || 1 || 0.7
|-
| Randoms || Distal || alternate negative || 4 || 10 || 0.4 || 0.7
|-
| Reals || Distal || positive || 5 || 2 || 2.5 || 2.5
|-
| Randoms || Distal || arbitrary positive || 7 || 10 || 0.7 || 1.15
|-
| Randoms || Distal || alternate positive || 16 || 10 || 1.6 || 1.15
|}
 
Comparison:
 
The occurrences of real H-boxes (Lindsay) are less than the randoms for the UTRs. For the distal promoters, the reals are greater than the randoms in the positive direction and equal to or greater than the randoms in the negative direction. This suggests that the real response element consensus sequences are likely active or activable.
 
====H boxes (Mitchell)====
 
They "have the consensus H box sequence (5'-ANANNA-3') but have no other primary sequence identity."<ref name=Mitchell/>
 
{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 144 || 2 || 72 || 72
|-
| Randoms || UTR || arbitrary negative || 265 || 10 || 26.5 || 28.55
|-
| Randoms || UTR || alternate negative || 306 || 10 || 30.6 || 28.55
|-
| Reals || Core || negative || 21 || 2 || 10.5 || 10.5
|-
| Randoms || Core || arbitrary negative || 5 || 10 || 0.5 || 0.6
|-
| Randoms || Core || alternate negative || 7 || 10 || 0.7 || 0.6
|-
| Reals || Core || positive || 5 || 2 || 2.5 || 2.5
|-
| Randoms || Core || arbitrary positive || 36 || 10 || 3.6 || 3.45
|-
| Randoms || Core || alternate positive || 33 || 10 || 3.3 || 3.45
|-
| Reals || Proximal || negative || 21 || 2 || 10.5 || 10.5
|-
| Randoms || Proximal || arbitrary negative || 19 || 10 || 1.9 || 2.65
|-
| Randoms || Proximal || alternate negative || 34 || 10 || 3.4 || 2.65
|-
| Reals || Proximal || positive || 17 || 2 || 8.5 || 8.5
|-
| Randoms || Proximal || arbitrary positive || 40 || 10 || 4.0 || 3.8
|-
| Randoms || Proximal || alternate positive || 36 || 10 || 3.6 || 3.8
|-
| Reals || Distal || negative || 288 || 2 || 144.0 || 144.0
|-
| Randoms || Distal || arbitrary negative || 478 || 10 || 47.8 || 44.0
|-
| Randoms || Distal || alternate negative || 402 || 10 || 40.2 || 44.0
|-
| Reals || Distal || positive || 130 || 2 || 65.0 || 65.0
|-
| Randoms || Distal || arbitrary positive || 659 || 10 || 65.9 || 66.45
|-
| Randoms || Distal || alternate positive || 670 || 10 || 67.0 || 66.45
|}
 
Comparison:
 
For the occurrences of real H boxes (Mitchell), the UTRs are systematically greater than the randoms, the real core negatives are systematically greater, the real core positives are systematically less than the randoms, real proximals are systematically greater than the randoms and the real distals are systematically outside the range of the randoms. This suggests that the real H boxes (Mitchell) are likely active or activable.
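The Occurrences and Averages columns in these tables follow from simple arithmetic: each count is divided by the number of strands searched, and the two random selections (arbitrary and alternate) are then averaged. A minimal sketch in Python, using the UTR rows of the H box (Mitchell) table; the function names are illustrative assumptions and are not part of the study:

```python
def occurrence(count, strands):
    """Number of consensus sequences found per strand searched."""
    return count / strands


def average(*occurrences):
    """Mean of several occurrences, e.g. the arbitrary and alternate random sets."""
    return sum(occurrences) / len(occurrences)


# UTR rows of the H box (Mitchell) table
real_utr = occurrence(144, 2)            # 144 / 2
random_arbitrary = occurrence(265, 10)   # 265 / 10
random_alternate = occurrence(306, 10)   # 306 / 10
random_utr = average(random_arbitrary, random_alternate)

print(real_utr, round(random_utr, 2))
```

The same two helpers reproduce the other rows when given the corresponding Numbers and Strands values.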
 
====H boxes (Rozhdestvensky)====
 
An H box has a consensus sequence of 3'-ACACCA-5'.<ref name=Rozhdestvensky>{{ cite journal
|author=Timofey S. Rozhdestvensky, Thean Hock Tang, Inna V. Tchirkova, Jürgen Brosius, Jean‐Pierre Bachellerie and Alexander Hüttenhofer
|title=Binding of L7Ae protein to the K‐turn of archaeal snoRNAs: a shared RNA binding motif for C/D and H/ACA box snoRNAs in Archaea
|journal=Nucleic Acids Research
|month=
|year=2003
|volume=31
|issue=3
|pages=869-77
|url=http://nar.oxfordjournals.org/content/31/3/869.long
|arxiv=
|bibcode=
|doi=10.1093/nar/gkg175
|pmid=
|accessdate=2014-06-08 }}</ref>
 
===Unknown response element types===
 
====BBCABW Inrs====
{{main|Initiator element gene transcriptions}}
 
====Calcineurin-responsive transcription factors====
 
The calcineurin-responsive transcription factors (CRTs) have an occurrence of 1.5 in the UTR between ZSCAN22 and A1BG and 2.5 in the distal promoters.
 
The random datasets had an occurrence of 0.4 in the UTR, 0.1 in the core promoters, and 0.2 in the negative direction distal promoters and 0.7 in the positive direction.
 
The disparity indicates that the CRTs are likely active or activable.
 
====Carbohydrate response elements (Carb)====
 
"The putative ChREBP binding sites ChoRE1 [contain the response element here denoted as Carb underlined] (CACGTG<u>ACCGG</u>ATCTTG, -324 to -308)".<ref name=Long>{{ cite journal
|author=Jianyin Long, Daniel L. Galvan, Koki Mise, Yashpal S. Kanwar, Li Li, Naravat Poungavrin, Paul A. Overbeek, Benny H. Chang, and Farhad R. Danesh
|title=Role for carbohydrate response element-binding protein (ChREBP) in high glucose-mediated repression of long noncoding RNA Tug1
|journal=Journal of Biological Chemistry
|date=28 May 2020
|volume=5
|issue=28
|pages=
|url=https://www.jbc.org/content/early/2020/05/28/jbc.RA120.013228.full.pdf
|arxiv=
|bibcode=
|doi=10.1074/jbc.RA120.013228
|pmid=
|accessdate=6 October 2020 }}</ref>
 
The UTR between ZSCAN22 and A1BG has an occurrence of 3.5 response elements per strand and direction, the core promoters have an occurrence of 0.25, the proximal promoters have an occurrence of 0.5, and the distal promoters have an occurrence of 5.0 in the negative direction and 13.0 in the positive direction.

The random datasets had occurrences of 2.1 in the UTR, 0.3 for the core promoters, 0.4 for the proximal promoters, and, for the distal promoters, 3.0 in the negative direction and 4.7 in the positive direction. Except in the core promoters, the real promoters produced higher occurrences than the random datasets, which suggests that the real response elements are likely active or activable.
 
====Carbohydrate response elements (Carb1)====
 
"The putative ChREBP binding sites [are] ChoRE1 (CACGTG<u>ACCGG</u>ATCTTG, -324 to -308) and ChoRE2 (TCCGCC<u>CCCAT</u>CACGTG, -298 to - 282)".<ref name=Long/>
 
Carb1 (CCCAT) occurs in the UTR between A1BG and ZSCAN22 at 2.5, the proximal promoters at 1.0, and the distal promoters at 2.0 in the negative direction and 4.5 in the positive direction.
 
The random datasets had twenty-three in the UTR across ten datasets for an occurrence of 2.3, an occurrence of 0.2 in the core promoters, 0.35 in the proximal promoters, and, in the distal promoters, 31 in the arbitrary negative direction (3.1) and 55 in the arbitrary positive direction (5.5), for an average of 4.3 over both directions.

The occurrences for real and random are close for the UTRs and distal promoters. With an error of about 0.2, this suggests that most of the occurrences are random except in the proximal promoters. The real occurrences may be active or activable.
 
====Cat8s====
 
The upstream activating sequence (UAS) for Cat8p is 5'-CGGNBNVMHGGA-3', where N = A, C, G, T, B = C, G, T, V = A, C, G, M = A, C, and H = A, C, T; i.e. 5'-CGG(A/C/G/T)(C/G/T)(A/C/G/T)(A/C/G)(A/C)(A/C/T)GGA-3'.<ref name=Tang/>
 
The real promoters have only two Cat8s: TCCGTGCCACCG at 2528 and TCCGTGCCACCG at 657, both inverse complements and on the negative strand in the negative direction in the distal promoter for an occurrence of 0.5.
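The degenerate Cat8p consensus can be checked mechanically by translating each IUPAC code into a character class. The sketch below is only an illustration of that idea (the helper names are assumptions, not the tool used in the study); it confirms that the reported sequence TCCGTGCCACCG matches the consensus only as an inverse complement:

```python
import re

# IUPAC nucleotide codes used in the Cat8p consensus, as character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "B": "[CGT]", "V": "[ACG]",
    "M": "[AC]", "H": "[ACT]",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus)


def inverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


cat8 = re.compile(consensus_to_regex("CGGNBNVMHGGA"))

hit = "TCCGTGCCACCG"  # reported at 2528 and 657
print(inverse_complement(hit))
print(cat8.fullmatch(inverse_complement(hit)) is not None)
print(cat8.fullmatch(hit) is not None)
```

The inverse complement is CGGTGGCACGGA, which fits CGG(T)(G)(G)(C)(A)(C)GGA position by position, whereas the sequence as written does not begin with CGG.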
 
The random datasets had response elements in the UTRs (0.1) and proximal promoters (0.1) as well as the distal promoters (0.25), specifically in the negative direction (0.2) and 0.3 in the positive direction.
 
Even though there are only two real Cat8 response elements in the distal promoter, they are likely active or activable.
 
====Cell-cycle box variants====
 
The real promoters have been examined for the CCB variants: CACGAAA, ACGAAA and C-CGAAA, where C-C indicates CC with the likely A being absent (CCGAAA). The inverse complements are TTTCGTG, TTTCGT and TTTCG-G (TTTCGG). The search for these CCB variants was performed using a general consensus sequence of NNCGAAA, so the expected variants should be found if present. The real promoters have CCB variants only in the distal promoters. In the negative direction, the variants that occur are ACGAAA at 494 and ACGAAA at 312. The actual general consensus sequence occurrences are GGCGAAA at 2157, TACGAAA at 494, and GACGAAA at 312. CACGAAA and CCGAAA never occurred. The inverse complements occur in both directions. In the negative direction: TTTCGT at 2479, TTTCGT at 2473 and TTTCGT at 186. In the positive direction: TTTCGGG at 1752 on the negative strand, and TTTCGTG at 3600 and TTTCGT at 2006 on the positive strand. The occurrences are 2.5 in the negative direction and 1.5 in the positive direction.
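Sorting hits of the general consensus NNCGAAA into the named variants can be sketched as follows. This is an illustrative assumption about how the classification could be done (the function name is hypothetical), applied to the three general consensus hits reported for the negative direction:

```python
def classify_ccb(hit):
    """Assign a 7-nucleotide NNCGAAA hit to a CCB variant, if any."""
    if len(hit) != 7 or not hit.endswith("CGAAA"):
        raise ValueError("expected a 7-nucleotide NNCGAAA hit")
    if hit == "CACGAAA":
        return "CACGAAA"
    if hit[1] == "A":
        return "ACGAAA"   # any first base other than the full CACGAAA case
    if hit[1] == "C":
        return "CCGAAA"   # the C-CGAAA variant with the A absent
    return None           # matches only the general consensus


for hit in ("GGCGAAA", "TACGAAA", "GACGAAA"):
    print(hit, classify_ccb(hit))
```

Under this scheme TACGAAA and GACGAAA both classify as ACGAAA, while GGCGAAA matches only the general consensus.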
 
The random datasets had twenty-three UTR general consensus sequences for an occurrence of 2.3. Of these the CCB variants had the following frequencies: CACGAAA (0), ACGAAA (2) and CCGAAA (5). The inverse complements had TTTCGTG (0), TTTCGT (1) and TTTCGG (6). The remaining nine were of the general consensus sequence of NNCGAAA or TTTCGNN.
 
The random datasets had three general consensus sequences in arbitrary positive direction core promoter only for an occurrence of 0.3 (two strands, one direction) or 0.15 (four strands, both directions). The proximal general consensus sequences (five) had occurrences of 0.2 and 0.3.
 
The distal promoters for the random datasets had twenty-six in the arbitrary negative direction for an occurrence of 2.6 and forty-two in the positive direction for an occurrence of 4.2.
 
As the choices for direction are arbitrary for the random datasets an average occurrence would be 3.25. Even separately the real occurrences are lower than the random ones albeit not by much for the real negative direction (2.5) vs. the arbitrarily chosen negative direction (2.6) for the randoms. The randoms also had UTR, core and proximal promoter occurrences where the reals have none.
 
These results suggest that the real variants ACGAAA (2), TTTCGT (5) and TTTCGG (1), are likely active or activable.
 
====CGCG boxes====
 
All of the real CGCG boxes found are more closely associated with ZSCAN22 or ZNF497 than A1BG.
 
The real promoters have only three CGCG boxes closer to ZSCAN22 than A1BG (for an occurrence of 1.5) and thirty-two closer to ZNF497 than A1BG (for an occurrence of 16.0).
 
The random datasets had fifteen CGCG boxes in the A1BG UTR on the ZSCAN22 side for an occurrence of 3.0.
 
The random datasets had two CGCG boxes in the core promoter (arbitrary positive direction) for an occurrence of 0.2.
 
In the proximal promoters the random datasets had one (positive direction) for an occurrence of 0.1.
 
The distal promoters had twenty sequences in the arbitrary negative direction, some closer to A1BG and some to ZSCAN22, for an occurrence of 4.0. There were thirty sequences in the arbitrary positive direction, some closer to A1BG than to ZNF497, for an occurrence of 6.0.

Two disparities stand out: the real consensus sequences occur closer to the zinc fingers than to A1BG, and the occurrences in the distal promoters differ between reals and randoms (1.5 vs 4.0 in the negative direction and 16.0 vs 6.0 in the positive direction). These suggest that the real consensus sequences are likely active or activable, probably for the respective zinc fingers.
 
====Circadian control elements====
 
"The circadian control element [CCE] (circadian; Anderson ''et al.'', 1994) was found in 10 FvTCP genes."<ref name=Wei>{{ cite journal
|author=Wei Wei, Yang Hu, Meng-Yuan Cui, Yong-Tao Han, Kuan Gao, and Jia-Yue Feng
|title=Identification and Transcript Analysis of the TCP Transcription Factors in the Diploid Woodland Strawberry ''Fragaria vesca''
|journal=Frontiers in Plant Science
|date=22 December 2016
|volume=7
|issue=
|pages=1937
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5177655/#B2
|arxiv=
|bibcode=
|doi=10.3389/fpls.2016.01937
|pmid=28066489
|accessdate=29 November 2020 }}</ref> Circadian control elements (CAANNNNATC).<ref name=Sharma>{{ cite journal
|author=Bhaskar Sharma & Joemar Taganna
|title=Genome-wide analysis of the U-box E3 ubiquitin ligase enzyme gene family in tomato
|journal=Scientific Reports
|date=12 June 2020
|volume=10
|issue=9581
|pages=
|url=https://www.nature.com/articles/s41598-020-66553-1
|arxiv=
|bibcode=
|doi=10.1038/s41598-020-66553-1
|pmid=32533036
|accessdate=27 August 2020 }}</ref>
 
In the real promoters there is only one CCE, an inverse complement: GATGGGGTTG at 3804 in the ZSCAN22 - A1BG UTR for an occurrence of 0.5 (its complement CTACCCCAAC is on the negative strand in the negative direction).
 
The random datasets had two CCEs in the UTR out of ten strands for 0.2. There was also one in the core promoters for an occurrence of 0.1, none in the proximal promoters, and five (arbitrary negative direction) and nine (positive direction) in the distal promoters for occurrences of 0.5 and 0.9, respectively.
 
Comparing the real and random results, the one CCE in the real promoters is likely active or activable.
 
====Cold-responsive elements====
 
The randoms have fourteen consensus sequences in the ZSCAN22-A1BG UTR with an occurrence of 1.4. The reals have sixteen sequences with an occurrence of 8.0 which is much larger than the randoms.
 
The randoms had core promoters: one in the arbitrary negative direction and two in the positive direction for occurrences of 0.1 and 0.2, respectively. The reals have no sequences in the core promoters.
 
In the proximal promoters, the randoms had two in the negative direction for an occurrence of 0.2 and two in the positive direction also for an occurrence of 0.2. The reals have three sequences in the negative direction only for an occurrence of 1.5.
 
In the distal promoters, the randoms had twenty-five in the negative direction for an occurrence of 2.5. There were forty-two in the positive direction for an occurrence of 4.2.
 
The reals have twenty-two sequences in the negative direction for an occurrence of 11.0 and thirty-six in the positive direction for an occurrence of 18.0. Both of these are systematically higher than the randoms.
 
With the exceptions of the core promoters, the real UTR, proximal and distal promoters have systematically many more sequences than the randoms. This suggests that these promoters are likely active or activable.
 
====Copper response elements====
 
"A consensus copper-response element [CuRE] TTTGC(T/G)C(A/G) (12) is a binding site for Mac1p."<ref name=Quinn>{{ cite journal
|author=Jeanette M. Quinn, Paola Barraco, Mats Eriksson and Sabeeha Merchant
|title=Coordinate Copper- and Oxygen-responsive ''Cyc6'' and ''Cpx1'' Expression in ''Chlamydomonas'' Is Mediated by the Same Element
|journal=Journal of Biological Chemistry
|date=March 2000
|volume=275
|issue=9
|pages=6080-6089
|url=https://www.sciencedirect.com/science/article/pii/S0021925818304551
|arxiv=
|bibcode=
|doi=10.1074/jbc.275.9.6080
|pmid=
|accessdate=1 April 2021 }}</ref>
 
The only copper response element conforming to the Quinn consensus sequence is an inverse complement CGCGCAAA at 163 in the positive strand, negative direction distal promoter with an occurrence of 0.5.
 
The random datasets had one inverse complement consensus sequence in the UTR, CGAGCAAA at 2440, with an occurrence of 0.1. In the distal promoters there were two inverse complement consensus sequences in the arbitrary negative direction for an occurrence of 0.2. The low occurrences among the randoms suggest that the one real occurrence is likely active or activable even though there is only one. It also occurs in the positive strand, as does A1BG.
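Because the Quinn consensus TTTGC(T/G)C(A/G) has only two ambiguous positions, every concrete sequence it covers can be listed together with its inverse complement. The sketch below is illustrative only (names are assumptions); it shows that both the real hit CGCGCAAA and the random hit CGAGCAAA are among the four possible inverse complements:

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def inverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# TTTGC(T/G)C(A/G) written as the allowed bases at each position
CURE = ["T", "T", "T", "G", "C", "TG", "C", "AG"]

forward = {"".join(bases) for bases in product(*CURE)}
inverse = {inverse_complement(seq) for seq in forward}

print(sorted(forward))
print(sorted(inverse))
print("CGCGCAAA" in inverse, "CGAGCAAA" in inverse)
```

Enumerating the set this way is practical only for short consensus sequences with few ambiguous positions; longer degenerate motifs are better handled with a pattern match.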
 
"An additional EMSA result demonstrated that [''Aspergillus fumigatus'' (Af)] AfMac1 directly binds to a copper response element in the promoter regions of the ''ctrA2'' and ''ctrC'' genes with a defined consensus DNA motif (5′-TGTGCTCA-3′) (Park et al., 2017<ref name=Park>{{ cite journal
|author=Yong-Sung Park, Tae-Hyoung Kim and Cheol-Won Yun
|title=Functional characterization of the copper transcription factor AfMac1 from ''Aspergillus fumigatus''
|journal=Biochemical Journal
|date=3 July 2017
|volume=474
|issue=14
|pages=2365-2378
|url=https://portlandpress.com/biochemj/article-abstract/474/14/2365/49495/Functional-characterization-of-the-copper?redirectedFrom=fulltext
|arxiv=
|bibcode=
|doi=10.1042/BCJ20170191
|pmid=
|accessdate=2 April 2021 }}</ref>), which is strikingly similar to the Mac1-binding motif in ''S. cerevisiae'' (Jamison McDaniels et al., 1999; Keller et al., 2000), suggesting that the mechanism of Mac1-mediated copper homeostasis may be conserved across fungal species."<ref name=JSong>{{ cite journal
|author=Jinxing Song, Rongpeng Li and Jihong Jiang
|title=Copper Homeostasis in ''Aspergillus fumigatus'': Opportunities for Therapeutic Development
|journal=Frontiers in Microbiology
|date=12 April 2019
|volume=10
|issue=
|pages=774
|url=https://www.frontiersin.org/articles/10.3389/fmicb.2019.00774/full
|arxiv=
|bibcode=
|doi=10.3389/fmicb.2019.00774
|pmid=
|accessdate=2 April 2021 }}</ref>
 
The consensus sequence for the copper response element (Park 2017): TGTGCTCA, only occurs as an inverse complement TGAGCACA at 3740 on the negative strand in the positive direction in the distal promoter closer to A1BG than ZNF497. This inverse complement has an occurrence of 0.5.
 
The randoms also have only one occurrence, an inverse complement TGAGCACA at 2259 in the arbitrary positive direction. This inverse complement had an occurrence of 0.1. The difference in occurrences between real and random suggests that the real TGAGCACA at 3740 is likely active or activable.
 
====Cytoplasmic polyadenylation elements====
 
"Cytoplasmic polyadenylation is determined by the cytoplasmic polyadenylation element (CPE; consensus sequence UUUUUAU) that resides in mRNA 3′ untranslated regions (UTRs)."<ref name=Ivshina>{{ cite journal
|author=Maria Ivshina, Paul Lasko, and Joel D. Richter
|title=Cytoplasmic polyadenylation element binding proteins in development, health, and disease
|journal=Annual Review of Cell and Developmental Biology
|date=October 2014
|volume=30
|issue=
|pages=393-415
|url=https://www.annualreviews.org/doi/abs/10.1146/annurev-cellbio-101011-155831
|arxiv=
|bibcode=
|doi=10.1146/annurev-cellbio-101011-155831
|pmid=
|accessdate=17 April 2021 }}</ref>
 
{|class="wikitable"
|-
! Reals or randoms !! Promoters !! direction !! Numbers !! Strands !! Occurrences !! Averages (± 0.1)
|-
| Reals || UTR || negative || 3 || 2 || 1.5 || 1.5
|-
| Randoms || UTR || arbitrary negative || 1 || 10 || 0.1 || 0.1
|-
| Randoms || UTR || alternate negative || 1 || 10 || 0.1 || 0.1
|-
| Reals || Core || negative || 0 || 2 || 0 || 0
|-
| Randoms || Core || negative || 0 || 10 || 0 || 0
|-
| Reals || Core || positive || 0 || 2 || 0 || 0
|-
| Randoms || Core || positive || 0 || 10 || 0 || 0
|-
| Reals || Proximal || negative || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || negative || 0 || 10 || 0 || 0
|-
| Reals || Proximal || positive || 0 || 2 || 0 || 0
|-
| Randoms || Proximal || positive || 0 || 10 || 0 || 0
|-
| Reals || Distal || negative || 6 || 2 || 3 || 1.5
|-
| Randoms || Distal || negative || 8 || 10 || 0.8 || 0.45
|-
| Reals || Distal || positive || 0 || 2 || 0 || 1.5
|-
| Randoms || Distal || positive || 1 || 10 || 0.1 || 0.45
|}
 
Comparison:
 
The occurrences of real cytoplasmic polyadenylation elements are larger than the randoms. This suggests that the real cytoplasmic polyadenylation elements are likely active or activable.
 
====DAF-16 binding elements====
 
"Most paralogous FOX proteins bind to the canonical DNA response element 5′-RYAAAYA-3′ (R = A or G, Y = C or T)<sup>11–13</sup>."<ref name=Li2017>{{ cite journal
|author=Jun Li, Ana Carolina Dantas Machado, Ming Guo, Jared M. Sagendorf, Zhan Zhou, Longying Jiang, Xiaojuan Chen, Daichao Wu, Lingzhi Qu, Zhuchu Chen, Lin Chen, Remo Rohs, and Yongheng Chen
|title=Structure of the forkhead domain of FOXA2 bound to a complete DNA consensus site
|journal=Biochemistry
|date=25 July 2017
|volume=56
|issue=29
|pages=3745-3753
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5614898/
|arxiv=
|bibcode=
|doi=10.1021/acs.biochem.7b00211
|pmid=28644006
|accessdate=28 August 2020 }}</ref>
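Counts for a motif such as RYAAAYA depend on whether overlapping matches are counted once or separately. How overlaps were treated is not stated here, so the following Python sketch simply shows both conventions; the demonstration sequence is hypothetical and is not taken from the A1BG locus:

```python
import re

# RYAAAYA with R = A or G and Y = C or T
DBE = "[AG][CT]AAA[CT]A"


def count_non_overlapping(pattern, seq):
    """Count matches, resuming the scan after the end of each hit."""
    return len(re.findall(pattern, seq))


def count_overlapping(pattern, seq):
    """Count matches at every start position using a zero-width lookahead."""
    return len(re.findall(f"(?=({pattern}))", seq))


demo = "ATAAATAAATA"  # hypothetical: ATAAATA occurs at positions 0 and 4
print(count_non_overlapping(DBE, demo))
print(count_overlapping(DBE, demo))
```

The two counts differ for this sequence because the second ATAAATA begins inside the first.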
 
For the reals there are five in the UTR between ZSCAN22 and A1BG for an occurrence of 2.5.
 
Only one proximal promoter occurs in the positive direction for an occurrence of 0.5.
 
The distal promoters have eight consensus sequences for an occurrence of 4.0.
 
The randoms had ten UTRs for an occurrence of 1.0, three proximal promoters for an occurrence of 0.15, and eighteen arbitrary distal promoters in the negative direction for an occurrence of 1.8, twenty-three in the arbitrary positive direction for an occurrence of 2.3.
 
A comparison shows that the reals are systematically higher in occurrences than the randoms, which suggests that the reals are likely active or activable.
 
====D box (Samarsky)====
 
For "box C/D snoRNAs, boxes C and D and an adjoining stem form a vital structure, known as the box C/D motif."<ref name=Samarsky/> Adjoining Domain B and overlapping for two nucleotides is Box D: GUCUGA from Domain B where "GU" are also at the end of Domain B, with the inverse being AGUCUG and replacing U with T yields a likely consensus sequence to search for AGTCTG.<ref name=Samarsky/>
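The derivation of the search sequence from Box D amounts to two string operations: reversing GUCUGA and replacing U with T. A minimal sketch (the function name is an illustrative assumption):

```python
def search_sequence_from_rna_box(rna_box):
    """Reverse an RNA box sequence and write it in the DNA alphabet."""
    inverse = rna_box[::-1]           # GUCUGA -> AGUCUG
    return inverse.replace("U", "T")  # AGUCUG -> AGTCTG


print(search_sequence_from_rna_box("GUCUGA"))
```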
 
The real consensus sequences are AGTCTG at 2947 in the UTR between A1BG and ZSCAN22 with an occurrence of 0.5, three in the distal promoter also in the negative direction for an occurrence of 1.5, and six in the positive direction for an occurrence of 3.0.
 
The randoms had one in the UTR: AGTCTG at 4073 in the arbitrary negative direction for an occurrence of 0.1, four in the negative direction in the distal promoter for an occurrence of 0.4 and seven in the positive direction for an occurrence of 0.7.
 
By comparison, the occurrences are systematically higher for the reals than the randoms which suggests that the reals are likely active or activable.
 
====D box (Voronina)====
 
The reals have four consensus sequences in the UTR for an occurrence of 2.0.
 
There is only one core promoter of eight promoters for an occurrence of 0.125.
 
Proximal promoters have two occurrences among eight possibilities for an occurrence of 0.25.
 
Distal promoters have twenty-eight consensus sequences in the negative direction for an occurrence of 3.5.

The positive direction has twenty-three consensus sequences for an occurrence of 2.875.
 
The randoms had seventeen UTR consensus sequences for an occurrence of 1.7.
 
The randoms had one core promoter from twenty opportunities for an occurrence of 0.05.
 
In the proximal promoters, the randoms had three in the arbitrary negative direction and four in the positive direction for occurrences of 0.3 and 0.4.
 
For the distal promoters, the negative direction had twenty-nine consensus sequences for an occurrence of 2.9.
 
In the positive direction, the randoms had thirty-four consensus sequences for an occurrence of 3.4.

In comparison for the distal promoters, the random sequences had approximately the same occurrences as the reals. For the proximal promoters the randoms had slightly more occurrences than the reals. For the core promoters, the randoms had slightly fewer occurrences. For the UTR the randoms had slightly fewer occurrences than the reals (1.7 vs. 2.0). Based on the UTR and core promoters it appears that the reals are likely active or activable.
 
====D-box (Motojima)====
 
D-box (TGAGTGG).<ref name=Motojima>{{ cite journal
|author=Masaru Motojima, Takao Ando and Toshimasa Yoshioka
|title=Sp1-like activity mediates angiotensin-II-induced plasminogen-activator inhibitor type-1 (''PAI-1'') gene expression in mesangial cells
|journal=Biochemical Journal
|date=10 July 2000
|volume=349
|issue=2
|pages=435-441
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1221166/pdf/10880342.pdf
|arxiv=
|bibcode=
|doi=10.1042/0264-6021:3490435
|pmid=10880342
|accessdate=13 August 2020 }}</ref>
 
The real promoters have two inverse complements in the UTR positive strand, negative direction: CCACTCA at 4487 nucleotides from the end of gene ZSCAN22 and negative strand, negative direction: CCACTCA at 3827, for an occurrence of 0.5.
 
In the distal promoters, there is an inverse complement (ic) between ZNF497 and A1BG negative strand, positive direction: TGAGTGG at 3449 for an occurrence of 0.25.
 
The random datasets had one UTR TGAGTGG at 4502 for an occurrence of 0.1.
 
They had one proximal promoter D-box consensus sequence: TGAGTGG at 4148 in the arbitrary positive direction for an occurrence of 0.05.
 
The distal promoters had two consensus sequence ics: CCACTCA at 1766 and CCACTCA at 1365 for an occurrence of 0.1.
 
Comparing the two results, the occurrences are higher for the real UTR consensus sequences and the distal promoter consensus sequences than the randoms suggesting that the reals are likely active or activable.
 
====DRE (Sumrada, core)====
 
"A consensus sequence, 5'-TAGCCGCCGRRRR-3' (where R = an unspecified purine nucleoside [A/G],was generated from these data."<ref name=Sumrada>{{ cite journal
|author=Roberta A. Sumrada and Terrance G. Cooper
|title=Ubiquitous upstream repression sequences control activation of the inducible arginase gene in yeast
|journal=Proceedings of the National Academy of Sciences USA
|date=June 1987
|volume=84
|issue=
|pages=3997-4001
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC305008/pdf/pnas00277-0054.pdf
|arxiv=
|bibcode=
|doi=10.1073/pnas.84.12.3997
|pmid=3295874
|accessdate=6 September 2020 }}</ref>
 
"The extent of homology for the entire 13 bp ranged from 56 to 100%. However, for the symmetrical core sequence CCGCC 75 to 100% homology was observed with only conservative substitutions occurring in the nonhomologous positions."<ref name=Sumrada/>
 
The real UTR has two consensus sequences for an occurrence of 1.0.
 
The core promoters are only in the positive direction (three) for an occurrence of 0.75.
 
The proximal promoters have two in the negative direction and one in the positive direction for occurrences of 1.0 and 0.5, respectively.
 
The distal promoters have ten in the negative direction for an occurrence of 5.0. In the positive direction the reals have eleven for an occurrence of 5.5.
 
The random datasets had twenty-nine UTR consensus sequences for an occurrence of 2.9.
 
Randoms had four core promoters for an occurrence of 0.2.
 
Randoms had four proximal promoters for an occurrence of 0.2.
 
Randoms had thirty-five arbitrary negative direction distal promoters for an occurrence of 3.5.
 
In the arbitrary positive direction the randoms had sixty-six distal promoters for an occurrence of 6.6.
 
By comparison, the randoms outnumber the reals for UTRs (2.9 vs. 1.0, respectively), but the reals outnumber the randoms regarding core promoters (0.75 vs. 0.2), proximal promoters (1.5 vs. 0.2), and just for the distal promoters (total of 10.5 vs. 10.1, respectively). The disparity rather than overlap suggests that the reals are likely active or activable.
 
====dBRE====
 
There are two sets of BREs: one (BREu) found immediately upstream of the TATA box, with the consensus SSRCGCC [(C/G)(C/G)(A/G)CGCC]; the other (BREd) found around 7 nucleotides downstream, with the consensus RTDKKKK [(A/G)T(A/G/T)(G/T)(G/T)(G/T)(G/T)].<ref name=WilsonD>{{cite web |last1=Wilson |first1=David B. |title=Drosophila Core Promoter Motifs |url=http://gander.wustl.edu/~wilson/core_promoter_motifs.html |accessdate=2 April 2019}}</ref><ref name=Gershon>{{cite journal |last1=Juven-Gershon |first1=T |last2=Kadonaga |first2=JT |title=Regulation of gene expression via the core promoter and the basal transcriptional machinery. |journal=Developmental Biology |date=15 March 2010 |volume=339 |issue=2 |pages=225–9 |doi=10.1016/j.ydbio.2009.08.009 |pmid=19682982 |pmc=2830304}}</ref>
 
The reals have twenty-eight UTRs on the negative strand in the negative direction and thirty on the positive strand in the negative direction for a total of fifty-eight with an occurrence of 29.0.
 
The randoms had one hundred and nineteen for an occurrence of 11.9. There are more than twice as many reals as randoms.
 
The real core promoters are one on the positive strand in the negative direction for an occurrence of 0.5. There are two on the negative strand in the positive direction and three on the positive strand in the positive direction for an occurrence of 2.5.
 
The randoms had two in the arbitrary negative direction for an occurrence of 0.2. There were twenty-three in the arbitrary positive direction for an occurrence of 2.3, but eight of the twenty-three are out of range for the random data set so the actual occurrence is 1.5. The reals slightly outnumber the randoms.
 
For the proximal promoters, the reals have eight consensus sequences in the negative direction for an occurrence of 4.0, and seven in the positive direction for 3.5. The randoms had twenty in the arbitrary negative direction for an occurrence of 2.0, and twenty-two in the positive direction for an occurrence of 2.2. The reals outnumber the randoms.
 
With the distal promoters: there are a total of one hundred and thirty-one real distal promoters, forty-three on the negative strand in the negative direction, forty on the positive strand in the negative direction (total 83, occurrence is 41.5), eighteen on the negative strand in the positive direction, and thirty on the positive strand in the positive direction (total 48, occurrence is 24.0). The total occurrences are 32.75.
 
The randoms had the following results: twenty-three (dBREr0), twenty (dBREr2), eighteen (dBREr4), seventeen (dBREr6), seventeen (dBREr8), eighteen (dBREr0ci), eighteen (dBREr2ci), twenty-three (dBREr4ci), twenty-two (dBREr6ci), twenty (dBREr8ci) for a total of 196 in the arbitrary negative direction yielding an occurrence 19.6.
 
For the arbitrary positive direction: twenty-seven (dBREr1), twenty-eight (dBREr3), twenty-eight (dBREr5), twenty-two (dBREr7), twenty-six (dBREr9), twenty-five (dBREr1ci), twenty-four (dBREr3ci), thirty (dBREr5ci), thirty (dBREr7ci), thirty-five (dBREr9ci) for a total of 275 in the arbitrary positive direction yielding an occurrence 27.5. The total number of consensus sequences (471) yields an occurrence of 23.55 with an error of ± 4.
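The totals and occurrences quoted for the random dBRE distal promoters can be rechecked directly from the per-dataset counts. In this sketch the half-spread between the two directions is computed as one possible reading of the quoted error of ± 4; that interpretation is an assumption, not something stated in the text:

```python
# Per-dataset counts in the order listed above
negative = [23, 20, 18, 17, 17, 18, 18, 23, 22, 20]  # dBREr0 ... dBREr8ci
positive = [27, 28, 28, 22, 26, 25, 24, 30, 30, 35]  # dBREr1 ... dBREr9ci

occ_negative = sum(negative) / len(negative)
occ_positive = sum(positive) / len(positive)
occ_total = (sum(negative) + sum(positive)) / (len(negative) + len(positive))

# Assumed reading of the quoted error: half the difference between directions
half_spread = (occ_positive - occ_negative) / 2

print(sum(negative), sum(positive))
print(occ_negative, occ_positive)
print(round(occ_total, 2), round(half_spread, 2))
```

The sums reproduce the stated totals of 196 and 275, the overall occurrence of 23.55, and a half-spread of about 3.95.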
 
Comparing the distal promoters, the negative direction is higher than the randoms, whereas the positive direction is comparable to the randoms. This suggests that at least in the negative direction the reals are likely active or activable, but in the positive direction the core and proximal promoter are likely active or activable, while the distal promoter sequences may be random.
 
====Downstream core elements====
 
The consensus sequence for the DCE is CTTC...CTGT...AGC.<ref name=Lee/> These three consensus elements are referred to as subelements: "S<sub>I</sub> is CTTC, S<sub>II</sub> is CTGT, and S<sub>III</sub> is AGC."<ref name=Lee/>
 
A core promoter that contains all three subelements of the downstream core element may be much less common than one containing only one or two.<ref name=Lee/> "S<sub>I</sub> resides approximately from +6 to +11, S<sub>II</sub> from +16 to +21, and S<sub>III</sub> from +30 to +34."<ref name=Lee>{{ cite journal
|author=Dong-Hoon Lee, Naum Gershenzon, Malavika Gupta, Ilya P. Ioshikhes, Danny Reinberg and Brian A. Lewis
|title=Functional Characterization of Core Promoter Elements: the Downstream Core Element Is Recognized by TAF1
|journal=Molecular and Cellular Biology
|date=November 2005
|volume=25
|issue=21
|pages=9674-86
|url=http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1265815/
|arxiv=
|bibcode=
|doi=10.1128/MCB.25.21.9674-9686.2005
|pmid=16227614
|accessdate=2010-10-23 }}</ref>
 
The number of nucleotides between each subelement can apparently vary down to none.
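A search for all three subelements in order, with spacers that may shrink to zero, can be sketched with a regular expression. This is an illustration under stated assumptions: the positional windows (+6 to +11 and so on) are not enforced, the function name is hypothetical, and the test sequences are invented rather than taken from the A1BG locus:

```python
import re

# S_I, S_II and S_III in order, each separated by zero or more nucleotides
DCE = re.compile(r"(CTTC)([ACGT]*?)(CTGT)([ACGT]*?)(AGC)")


def find_dce(seq):
    """Return the two spacer lengths of the first complete DCE, or None."""
    match = DCE.search(seq)
    if match is None:
        return None
    return len(match.group(2)), len(match.group(4))


# Hypothetical test sequences
adjacent = "CTTC" + "CTGT" + "AGC"
spaced = "GG" + "CTTC" + "A" * 6 + "CTGT" + "G" * 10 + "AGC" + "TT"
wrong_order = "CTGT" + "CTTC" + "AGC"

print(find_dce(adjacent))
print(find_dce(spaced))
print(find_dce(wrong_order))
```

A promoter containing only one or two subelements returns None here; each subelement would have to be searched separately to count those cases, as is done in the DCE SI and DCE SII sections.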
 
====DCE SI====
 
The consensus sequence for the DCE is CTTC...CTGT...AGC.<ref name=Lee/> These three consensus elements are referred to as subelements: "S<sub>I</sub> is CTTC, S<sub>II</sub> is CTGT, and S<sub>III</sub> is AGC."<ref name=Lee/>
 
The real downstream core elements in the UTR between ZSCAN22 and A1BG have seven sequences for an occurrence of 3.5. The random data sets had thirty-six sequences for an occurrence of 7.0.
 
The real core promoters have two DCE sequences, for an occurrence of 0.5. The random data sets had one, for an occurrence of 0.2.

The real proximal promoters have seven DCE sequences, for an occurrence of 1.75. The randoms had seven, for an occurrence of 0.7.

The real distal promoters have sixty DCE sequences, for an occurrence of 15. The randoms had one hundred and fifty-one, for an occurrence of 5.1.

The disparity between the reals and randoms was often as much as a factor of 2, which suggests that the real DCE SI is likely active or activable.
 
====DCE SII====
 
The reals have seventeen on the negative strand, in the negative direction in the UTR and twenty-one on the positive strand, in the negative direction, for an average occurrence of 19.0.
 
The randoms had fifty-two in the UTR for an occurrence of 5.2.
 
In the core promoters, the reals have three, all in the positive direction, for an average occurrence of 1.5. Because reals could also have been present in the negative-direction core promoter, the total average occurrence is 0.75.
 
The randoms had two in the arbitrary negative direction in the core promoter for an occurrence of 0.2 and seven in the arbitrary positive direction for an average occurrence of 0.7.
 
For the proximal promoters, the reals have three in the negative direction for an average occurrence of 1.5 and two in the positive direction for an average occurrence of 1.0.
 
The randoms had one in the arbitrary negative direction, for an occurrence of 0.1, and ten in the positive direction, for an occurrence of 1.0, giving an overall average occurrence of 0.55.
 
For the distal promoters, the reals have fifteen on the negative strand, in the negative direction, and thirty-two on the positive strand, in the negative direction for an average occurrence of 23.5. The randoms have seventy-five in the arbitrary negative direction for an average occurrence of 7.5.
 
The reals have fifty-two on the negative strand, in the positive direction and twenty-nine on the positive strand, positive direction, for an average occurrence of 40.5. The randoms had one hundred and nine for an average occurrence of 10.9.
 
Comparatively, the reals in the UTR between ZSCAN22 and A1BG are likely active or activable. For the core promoters, the random occurrences are close to, but lower than, those of the reals, suggesting that the reals are likely active or activable.

For the proximal promoters, the occurrences for the randoms are again at or, on average, below those of the reals, suggesting that the reals are likely active or activable.

The real distal promoters have systematically much higher occurrences than the randoms, suggesting that the reals are likely active or activable.
 
====DCE SIII====
 
In the UTR, the reals have nine DCE SIII consensus sequences (directs or complement inverses) on the negative strand in the negative direction and seventy on the positive strand in the negative direction, for an overall occurrence of 39.5. The randoms have two hundred and twenty-seven DCE SIII consensus sequences, for an occurrence of 22.7, which is much lower than that of the reals, suggesting that the real sequences in the UTR are likely active or activable.

The real core promoters have nine sequences, all in the positive direction (4.5 vs. 0.0 in the negative direction), for an overall occurrence of 2.25. The core promoter randoms had nine in the negative direction (0.9 occurrence) and thirty-nine in the positive direction, of which fifteen are not allowed, leaving twenty-four (2.4 occurrence), for a total of thirty-three consensus sequences and an average occurrence of 1.65. Taken separately by direction, the occurrences for the reals are much higher than those for the randoms, suggesting that they are likely active or activable.

The real proximal promoters have ten sequences in the negative direction (5.0 occurrence) and one in the positive direction (0.5 occurrence), for an overall occurrence of 2.75. The proximal promoter randoms had twenty-one in the arbitrary negative direction (2.1 occurrence) and thirty-eight in the positive direction (3.8 occurrence), for a total of fifty-nine consensus sequences and an overall occurrence of 2.95. The overall real occurrence (2.75) is within 0.2 of the random occurrence (2.95), suggesting that the real proximal promoter sequences are likely random. Taken by direction, however, the reals are outside the range of the randoms, suggesting that they are likely active or activable.
 
The real distal promoters have one hundred and fifteen sequences in the negative direction (57.5 occurrence) and two hundred and fifty-three in the positive direction (126.5 occurrence), for a total of three hundred and sixty-eight and an overall occurrence of 92.0. The randoms had three hundred and seventeen in the negative direction (31.7 occurrence) and five hundred and twenty-three in the positive direction (52.3 occurrence), for a total of eight hundred and forty sequences and an overall occurrence of 42.0, which is systematically lower than that of the reals, suggesting that the reals are likely active or activable.
 
Taking directions into account, the reals generally appear to be likely active or activable.
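The occurrence figures in this section appear to be counts divided by the number of sequence sets searched. The following Python sketch reproduces the DCE SIII distal promoter arithmetic under that reading. The divisors (two real sets and ten random data sets per direction) are inferred from the quoted numbers and are assumptions, as is the helper name <code>occurrence</code>.

```python
def occurrence(count, n_sets):
    """Average number of consensus sequences per searched sequence set."""
    return count / n_sets


# Assumed divisors, inferred from the quoted figures rather than stated here.
REAL_SETS_PER_DIRECTION = 2
RANDOM_SETS_PER_DIRECTION = 10

# DCE SIII distal promoter counts quoted in this section.
real_neg, real_pos = 115, 253
rand_neg, rand_pos = 317, 523

real_overall = occurrence(real_neg + real_pos, 2 * REAL_SETS_PER_DIRECTION)
rand_overall = occurrence(rand_neg + rand_pos, 2 * RANDOM_SETS_PER_DIRECTION)

print("real, negative direction:", occurrence(real_neg, REAL_SETS_PER_DIRECTION))
print("real, positive direction:", occurrence(real_pos, REAL_SETS_PER_DIRECTION))
print("real, overall:", real_overall)
print("random, overall:", rand_overall)
print("real/random ratio:", real_overall / rand_overall)
```

Comparing the ratio of real to random occurrences, rather than the raw counts, is what allows sets of different sizes to be compared.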
 
====DPE (Juven-Gershon)====
 
The reals have thirty-four consensus sequences on the negative strand in the negative direction in the UTR between ZSCAN22 and A1BG for an occurrence of 34.0. In the negative direction on the positive strand there are fourteen sequences for an occurrence of 14.0 and an average occurrence of 24.0.
 
The randoms had eighty-two for an occurrence of 8.2, which is well below that of the reals, suggesting that the reals are likely active or activable.
 
For the core promoters, the negative direction has one for an occurrence of 0.5. In the positive direction the reals have three for an occurrence of 1.5, an average of 1.0. For the randoms, there are no core promoters in the arbitrary negative direction, for an occurrence of 0.0. In the positive direction, the randoms had eleven core promoters for an occurrence of 1.1, giving an average occurrence of 0.55. Each is systematically lower than the reals, suggesting that the reals are likely active or activable.
 
The real proximal promoters have three in the negative direction for an occurrence of 1.5. There are seven in the positive direction for an occurrence of 3.5. The randoms had fifteen in the arbitrary negative direction for an occurrence of 1.5. In the positive direction the randoms had fifteen for an occurrence of 1.5. The reals in the negative direction are likely random, whereas the reals in the positive direction are likely active or activable.
 
The real distal promoters have ninety-three in the positive direction for an occurrence of 46.5, and fifty-seven in the negative direction for an occurrence of 28.5.
 
The randoms had two hundred and eleven in the arbitrary positive direction for an occurrence of 21.1, and the negative direction had one hundred and thirty-three for an occurrence of 13.1. These are also systematically lower than the reals suggesting that the real distal occurrences are likely active or activable.
 
====DPE (Kadonaga)====
 
The early DPE consensus sequence was RGWCGTG.<ref name=Burke1996>{{ cite journal
| author = T.W. Burke and James T. Kadonaga
| date = 15 March 1996
| title = Drosophila TFIID binds to a conserved downstream basal promoter element that is present in many TATA-box-deficient promoters
| journal = Genes & Development
| volume = 10
| issue = 6
| pages = 711–724
| doi = 10.1101/gad.10.6.711
| pmid = 8598298
| url = http://genesdev.cshlp.org/content/10/6/711.full.pdf }}</ref><ref name=Kadonaga2002>{{ cite journal
| author = James T. Kadonaga
| date = September 2002
| title = The DPE, a core promoter element for transcription by RNA polymerase II
| journal = Experimental & Molecular Medicine
| volume = 34
| issue = 4
| pages = 259–264
| doi =
| pmid = 12515390
| url = http://www.e-emm.or.kr/article/article_files/emm34-4-1.pdf }}</ref>
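In the consensus RGWCGTG, R stands for A or G and W for A or T under the standard IUPAC nucleotide codes. A minimal Python sketch of a search for such a degenerate consensus on one strand, reporting both directs and complement inverses, might look like the following. The function names and the test sequences are hypothetical and are not part of the study's method.

```python
import re

# Standard IUPAC nucleotide codes (subset sufficient for common consensus strings).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complement inverse of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_consensus(seq, consensus="RGWCGTG"):
    """Return (directs, complement_inverses) as lists of (start, sequence).

    Starts are 0-based positions on the given strand; each sequence is shown
    as it reads on the given strand.
    """
    seq = seq.upper()
    body = "".join(IUPAC[base] for base in consensus)
    pattern = re.compile("(?=(" + body + "))")  # lookahead keeps overlapping hits
    directs = [(m.start(), m.group(1)) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    inverses = [
        (len(seq) - m.start() - len(consensus), reverse_complement(m.group(1)))
        for m in pattern.finditer(rc)
    ]
    return directs, inverses


# Hypothetical sequences: one carries a direct, the other a complement inverse.
print(find_consensus("TTAGACGTGCC"))
print(find_consensus("GGCACGTCTAA"))
```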
 
The UTR for A1BG from ZSCAN22 has four consensus sequences for an occurrence of 2.0. The randoms for the DPE (Kadonaga) had five for an occurrence of 0.5. The UTR occurrences are likely active or activable.
 
The core promoters on either side of A1BG have no consensus sequences. The randoms had one for an occurrence of 0.1.
 
The proximal promoters on either side of A1BG have none. The randoms had none.
 
The distal promoters on either side of A1BG have nine in the negative direction for an occurrence of 4.5 and nine in the positive direction for an occurrence of 4.5. The randoms had one in the negative direction and nine in the arbitrary positive direction, for occurrences of 0.1 and 0.9, respectively, or 0.5 as an average of both. The distal promoter occurrences are likely active or activable.
 
====DPE (Matsumoto)====
{{main|Downstream promoter element gene transcriptions}}
 
====EIN3 binding sites====
{{main|EIN3 binding site gene transcriptions}}
 
====Endosperm expressions====
{{main|Endosperm expression gene transcriptions}}
 
====GAAC elements====
{{main|GAAC element gene transcriptions}}
 
====GC boxes (Briggs)====
{{main|GC box gene transcriptions}}
 
====GC boxes (Ye)====
{{main|GC box gene transcriptions}}
 
====GCR1s====
{{main|Gcr1p gene transcriptions}}
 
====GT boxes (Sato)====
{{main|TC element gene transcriptions}}
 
====Hex sequences====
{{main|Hex sequence gene transcriptions}}
 
====HY boxes====
{{main|HY box gene transcriptions}}
 
====Inr-like, TCTs====
{{main|Initiator element gene transcriptions}}
 
====KAR2s====
{{main|Hac1p gene transcriptions}}
 
====UPREs====
{{main|Unfolded protein response element gene transcriptions}}
 
====UPRE-1s====
{{main|Hac1p gene transcriptions}}
 
====YYRNWYY Inrs====
{{main|Initiator element gene transcriptions}}
 
==A1BG orthologs==
 
===''Geotrypetes seraphini''===
[[Image:Geotrypetes seraphini 81151944.jpg|thumb|right|250px|''Geotrypetes seraphini'', the Gaboon caecilian, is a species of amphibian. Credit: [https://www.inaturalist.org/users/7865 Marius Burger].{{tlx|free media}}]]
''Geotrypetes seraphini'', the Gaboon caecilian, is a species of amphibian in the family ''Dermophiidae''.<ref name=IUCN>{{cite journal |author=IUCN SSC Amphibian Specialist Group |date=2019 |title=''Geotrypetes seraphini'' |volume=2019 |page=e.T59557A16957715 |url=https://en.wikipedia.org/wiki/IUCN_Red_List
|doi=10.2305/IUCN.UK.2019-1.RLTS.T59557A16957715.en |accessdate=16 November 2021}}</ref>
 
Its A1BG ortholog has 368 aa vs 495 aa for ''Homo sapiens''.
{{clear}}
 
==ZSCAN22==
{{main|ZSCAN22}}
# Gene ID: 342945 is ZSCAN22 zinc finger and SCAN domain containing 22 on 19q13.43.<ref name=HGNC342945>{{ cite web
|author=HGNC
|title=ZSCAN22 zinc finger and SCAN domain containing 22 [ Homo sapiens (human) ]
|publisher=National Center for Biotechnology Information
|location=U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA
|date=13 March 2020
|url=https://www.ncbi.nlm.nih.gov/gene/342945
|accessdate=2019-12-18 }}</ref> ZSCAN22 is transcribed in the negative direction from LOC100887072.<ref name=HGNC342945/>
# Gene ID: 102465484 is MIR6806 microRNA 6806 on 19q13.43: "microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop."<ref name=RefSeq102465484>{{ cite web
|author=RefSeq
|title=MIR6806 microRNA 6806 [ Homo sapiens (human) ]
|publisher=National Center for Biotechnology Information
|location=U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA
|date=10 September 2009
|url=https://www.ncbi.nlm.nih.gov/gene/102465484
|accessdate=2019-12-18 }}</ref> MIR6806 is transcribed in the negative direction from LOC105372480.<ref name=RefSeq102465484/>
 
Of the approximately 111 gaps between genes at chromosome locus 19q13.43 as of 4 August 2020, gap number 88 is between ZSCAN22 and A1BG. However, there is no gap between ZNF497 and A1BG.
 
==Promoters==
 
The core promoter begins approximately 35 nts upstream (-35) from the transcription start site (TSS). For the numbered nucleotides between ZSCAN22 and A1BG, the core promoter extends from 4425 nts up to 4460 nts (TSS). The proximal promoter extends from approximately -250 to the TSS, or 4210 nts up to 4460 nts. The distal promoter begins at about 2460 nts and extends to about 4210 nts.

From the ZNF497 side, the core promoter extends from about 4265 nts up to 4300 nts, the proximal promoter from 4050 nts to 4265 nts, and the distal promoter from 2300 nts to 4050 nts.
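The promoter boundaries above follow from fixed offsets upstream of the TSS. A short Python sketch of that arithmetic is given below. The function name <code>promoter_windows</code> is hypothetical, the distal limit of 2000 nts is inferred from 4460 - 2460, and 4300 is assumed to be the TSS position on the ZNF497 side. The sketch follows the ZSCAN22-side convention of ending the proximal promoter at the TSS, whereas the ZNF497-side figures above end it at 4265, where the core promoter begins.

```python
def promoter_windows(tss, core=35, proximal=250, distal=2000):
    """Return (start, end) nucleotide positions of each promoter region.

    Offsets are distances upstream of the TSS. The default distal limit of
    2000 nts is an inference from the quoted positions, not a stated value.
    """
    return {
        "core": (tss - core, tss),
        "proximal": (tss - proximal, tss),
        "distal": (tss - distal, tss - proximal),
    }


# ZSCAN22 side: TSS at 4460 nts in the numbering used above.
print(promoter_windows(4460))

# ZNF497 side: TSS assumed to be at 4300 nts.
print(promoter_windows(4300))
```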
 
==Alpha-1-B glycoprotein==
{{main|Alpha-1-B glycoprotein}}
'''Def.''' "a substance that induces an immune response, usually foreign"<ref name=AntigenWikt>{{ cite web
|author=[[wikt:User:Jag123|Jag123]]
|title=antigen
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=7 March 2005
|url=https://en.wiktionary.org/wiki/antigen
|accessdate=7 March 2020 }}</ref> is called an '''antigen'''.
 
'''Def.''' any "substance that elicits [an] immune response"<ref name=ImmunogenWikt>{{ cite web
|author=[[wikt:User:SemperBlotto|SemperBlotto]]
|title=immunogen
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=21 April 2008
|url=https://en.wiktionary.org/wiki/immunogen
|accessdate=8 March 2020 }}</ref> is called an '''immunogen'''.
 
An antigen "or immunogen is a molecule that sometimes stimulates an immune system response."<ref name=AntigenWikidoc>{{ cite web
|author=C. Michael Gibson
|title=Antigen
|publisher=WikiDoc Foundation
|location=Boston, Massachusetts
|date=27 April 2008
|url=https://www.wikidoc.org/index.php/Antigen
|accessdate=8 March 2020 }}</ref> However, "the immune system does not consist of only antibodies",<ref name=AntigenWikidoc/> so the term instead "encompasses all substances that can be recognized by the [[adaptive immune system]]."<ref name=AntigenWikidoc/>
 
'''Def.''' "a protein produced by B-lymphocytes that binds to [a specific antigen or]<ref name=AntibodyWikt1>{{ cite web
|author=[[wikt:User:Williamsayers79|Williamsayers79]]
|title=antibody
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=26 February 2007
|url=https://en.wiktionary.org/wiki/antibody
|accessdate=7 March 2020 }}</ref> an antigen"<ref name=AntibodyWikt>{{ cite web
|author=[[wikt:User:Jag123|Jag123]]
|title=antibody
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=7 March 2005
|url=https://en.wiktionary.org/wiki/antibody
|accessdate=7 March 2020 }}</ref> is called an '''[[antibody]]'''.
 
Five different antibody isotypes are known in mammals, which perform different roles, and help direct the appropriate immune response for each different type of foreign object they encounter.<ref name=Market>{{ cite journal
|author=Eleonora Market, F. Nina Papavasiliou
|date=2003
|url=http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pbio.0000016
|title=V(D)J Recombination and the Evolution of the Adaptive Immune System
|journal=PLoS Biology
|volume=1
|issue=1
|pages=e16
|doi=10.1371/journal.pbio.0000016 }}</ref>
 
Although the general structure of all antibodies is very similar, a small region, known as the hypervariable region, at the tip of the protein is extremely variable, allowing millions of antibodies with slightly different tip structures to exist, where each of these variants can bind to a different target, known as an antigen.<ref name=Janeway5>{{ cite book | author = Charles A Janeway, Jr, Paul Travers, Mark Walport, and Mark J Shlomchik | title = Immunobiology | edition = 5th ed. | publisher = Garland Publishing | date = 2001 | url = http://www.ncbi.nlm.nih.gov/books/bv.fcgi?call=bv.View..ShowTOC&rid=imm.TOC&depth=10 | isbn = 0-8153-3642-X }}</ref>
 
'''Def.''' "any of the glycoproteins in blood serum that respond to invasion by foreign antigens and that protect the host by removing pathogens;"<ref name=ImmunoglobulinWikt>{{ cite web
|author=[[wikt:User:SemperBlotto|SemperBlotto]]
|title= immunoglobulin
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=25 February 2006
|url=https://en.wiktionary.org/wiki/immunoglobulin
|accessdate=7 March 2020 }}</ref> "an antibody"<ref name=ImmunoglobulinWikt1>{{ cite web
|author=[[wikt:User:SemperBlotto|SemperBlotto]]
|title= immunoglobulin
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=28 April 2008
|url=https://en.wiktionary.org/wiki/immunoglobulin
|accessdate=7 March 2020 }}</ref> is called an '''[[immunoglobulin]]'''.
 
Gene ID: 1 is A1BG [[alpha-1-B glycoprotein]] on 19q13.43, a 54.3 kDa [[protein]] in humans that is encoded by the A1BG [[gene]].<ref name=RefSeq1>{{ cite web
|author=RefSeq
|title=A1BG alpha-1-B glycoprotein [ Homo sapiens (human) ]
|publisher=National Center for Biotechnology Information
|location=U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA
|date=10 December 2019
|url=https://www.ncbi.nlm.nih.gov/gene/1
|accessdate=2019-12-18 }}</ref> A1BG is transcribed in the positive direction from ZNF497.<ref name=RefSeq1/> "The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins."<ref name=RefSeq1/>
# NP_570602.2  alpha-1B-glycoprotein precursor, '''cd05751''' Location: 401 → 493 Ig1_LILRB1_like; First immunoglobulin (Ig)-like domain found in Leukocyte Ig-like receptors (LILR)B1 (also known as LIR-1) and similar proteins, '''smart00410''' Location: 218 → 280 IG_like; Immunoglobulin like, '''pfam13895''' Location: 210 → 301 Ig_2; Immunoglobulin domain and '''cl11960''' Location: 28 → 110 Ig; Immunoglobulin domain.<ref name=RefSeq1/>
 
Patients who have pancreatic ductal [[adenocarcinoma]] show an [[overexpression]] of A1BG in [[pancreatic juice]].<ref name=Tian>{{ cite journal
|author=Mei Tian, Ya-Zhou Cui, Guan-Hua Song, Mei-Juan Zong, Xiao-Yan Zhou, Yu Chen, Jin-Xiang Han
| title = Proteomic analysis identifies MMP-9, DJ-1 and A1BG as overexpressed proteins in pancreatic juice from pancreatic ductal adenocarcinoma patients
| journal = BMC Cancer
| volume = 8
| issue =
| pages = 241
| date = 2008
| pmid = 18706098
| pmc = 2528014
| doi = 10.1186/1471-2407-8-241
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2528014/ }}</ref>
 
===Immunoglobulin supergene family===
{{main|Immunoglobulin supergene family}}
"𝛂<sub>1</sub>B-glycoprotein(𝛂<sub>1</sub>B) [...] consists of a single polypeptide chain N-linked to four
glucosamine oligosaccharides. The polypeptide has five intrachain disulfide bonds and contains 474 amino acid residues. [...] 𝛂<sub>1</sub>B exhibits internal duplication and consists of five repeating structural domains, each containing about 95 amino acids and one disulfide bond. [...] several domains of 𝛂<sub>1</sub>B, especially the third, show statistically significant homology to variable regions of certain immunoglobulin light and heavy chains. 𝛂<sub>1</sub>B [...] exhibits sequence similarity to other members of the [[immunoglobulin supergene family]] such as the receptor for transepithelial transport of IgA and IgM and the secretory component of human IgA."<ref name=Ishioka/>
 
"Some of the domains of 𝛂<sub>1</sub>B show significant homology to variable (V) and constant (C) regions of certain immunoglobulins. Likewise, there is statistically significant homology between 𝛂<sub>1</sub>B and the secretory component (SC) of human IgA (15) and also with the extracellular portion of the rabbit receptor for transepithelial transport of polymeric immunoglobulins (IgA and IgM). Mostov et al. (16) have called the later protein the poly-Ig receptor or poly-IgR and have shown that it is the precursor of SC."<ref name=Ishioka/>
 
The immunoglobulin supergene family is "the group of proteins that have immunoglobulin-like domains, including histocompatibility antigens, the T-cell antigen receptor, poly-IgR, and other proteins involved in the vertebrate immune response (17)."<ref name=Ishioka/>
 
"The internal homology in primary structure [...] and the presence of an intrasegment disulfide bond suggest that 𝛂<sub>1</sub>B is composed of five structural domains that arose by duplication of a primordial gene coding for about 95 amino acid residues."<ref name=Ishioka/>
 
"Unlike immunoglobulins (25), ceruloplasmin (6), and hemopexin (7), 𝛂<sub>1</sub>B is not subject to limited interdomain cleavage by proteolytic enzymes. At least, we were not able to produce such fragments by use of a variety of proteases. This stability of 𝛂<sub>1</sub>B is probably associated with the frequency of proline in the sequences linking the domains [...]."<ref name=Ishioka/>
 
"A peptide identified in the late and early milk proteomes showed homology to eutherian alpha 1B glycoprotein (A1BG), a plasma protein with unknown function<sup>46</sup>, as well as venom inhibitors characterised in the Southern opossum ''Didelphis marsupialis'' (DM43 and DM46<sup>47,48,49</sup>), all members of the immunoglobulin superfamily. To characterise the relationship between the peptide sequence identified in koala, A1BG, DM43 and DM46, a phylogenetic tree was constructed [...] including all marsupial and monotreme homologs (identified by BLAST), three phylogenetically representative eutherian sequences, with human IGSF1 and TARM1, related members of the immunoglobulin super family, used as outgroups. This phylogeny indicates that A1BG-like proteins in marsupials and the ''Didelphis'' antitoxic proteins are homologs of eutherian A1BG, with excellent bootstrap support (98%). The marsupial A1BG-like sequences and the ''Didelphis'' antitoxic proteins formed a single clade with strong bootstrap support (97%)."<ref name=Morris>{{ cite journal
|author=Katrina M. Morris, Denis O’Meally, Thiri Zaw, Xiaomin Song, Amber Gillett, Mark P. Molloy, Adam Polkinghorne, and Katherine Belov
|title=Characterisation of the immune compounds in koala milk using a combined transcriptomic and proteomic approach
|journal=Scientific Reports
|date=7 October 2016
|volume=6
|issue=
|pages=35011
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5054531/
|arxiv=
|bibcode=
|doi=10.1038/srep35011
|pmid=27713568
|accessdate=14 March 2020 }}</ref>
 
"Human TARM1 and IGSF1, related members of the immunoglobulin superfamily are used as outgroups. The tree was constructed using the maximum likelihood approach and the JTT model with bootstrap support values from 500 bootstrap tests. Bootstrap values less than 50% are not displayed. Accession numbers: Tasmanian devil (''Sarcophilus harrisii''; XP_012402143), Wallaby (''Macropus eugenii''; FY619507), Possum (''Trichosurus vulpecula''; DY596639) Virginia opossum (''Didelphis virginiana''; AAA30970, AAN06914), Southern opossum (''Didelphis marsupialis''; AAL82794, P82957, AAN64698), Human (''Homo sapiens''; P04217, B6A8C7, Q8N6C5), Platypus (''Ornithorhychus anatinus''; ENSOANP00000000762), Cow (''Bos taurus''; Q2KJF1), Alpaca (''Vicugna pacos''; XP_015107031)."<ref name=Morris/>
 
"The sequences of 𝛂<sub>1</sub>B-glycoprotein (38) and chicken N-CAM (neural cell-adhesion molecule) (39) have been shown to be related to the immunoglobulin supergene family."<ref name=Paxton>{{ cite journal
|author=R. J. Paxton, G. Mooser, H. Pande, T. D. Lee, and J. E. Shively
|title=Sequence analysis of carcinoembryonic antigen: identification of glycosylation sites and homology with the immunoglobulin supergene family
|journal=Proceedings of the National Academy of Sciences USA
|date=1 February 1987
|volume=84
|issue=4
|pages=920-924
|url=https://www.pnas.org/content/pnas/84/4/920.full.pdf
|arxiv=
|bibcode=
|doi=10.1073/pnas.84.4.920
|pmid=3469650
|accessdate=26 March 2020 }}</ref>
 
A1BG contains the immunoglobulin domain: '''cl11960''' and three immunoglobulin-like domains: '''pfam13895''', '''cd05751''' and '''smart00410'''.
 
"Immunoglobulin (Ig) domain ['''cl11960'''] found in the Ig superfamily. The Ig superfamily is a heterogenous group of proteins, built on a common fold comprised of a sandwich of two beta sheets. Members of this group are components of immunoglobulin, neuroglia, cell surface glycoproteins, such as, T-cell receptors, CD2, CD4, CD8, and membrane glycoproteins, such as, butyrophilin and chondroitin sulfate proteoglycan core protein. A predominant feature of most Ig domains is a disulfide bridge connecting the two beta-sheets with a tryptophan residue packed against the disulfide bond."<ref name=NCBI386229>{{ cite web
|author=NCBI
|title=Conserved Protein Domain Family cl11960: Ig Superfamily
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=2 February 2016
|url=https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=386229
|accessdate=22 May 2020 }}</ref>
 
"This domain ['''pfam13895'''] contains immunoglobulin-like domains."<ref name=NCBI372793>{{ cite web
|author=NCBI
|title=Conserved Protein Domain Family pfam13895: Ig_2
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=5 August 2015
|url=https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=372793
|accessdate=24 May 2020 }}</ref>
 
"Ig1_LILR_KIR_like: ['''cd05751'''] domain similar to the first immunoglobulin (Ig)-like domain found in Leukocyte Ig-like receptors (LILRs) and Natural killer inhibitory receptors (KIRs). This group includes LILRB1 (or LIR-1), LILRA5 (or LIR9), an activating natural cytotoxicity receptor NKp46, the immune-type receptor glycoprotein VI (GPVI), and the IgA-specific receptor Fc-alphaRI (or CD89). LILRs are a family of immunoreceptors expressed on expressed on T and B cells, on monocytes, dendritic cells, and subgroups of natural killer (NK) cells. The human LILR family contains nine proteins (LILRA1-3,and 5, and LILRB1-5). From functional assays, and as the cytoplasmic domains of various LILRs, for example LILRB1 (LIR-1), LILRB2 (LIR-2), and LILRB3 (LIR-3) contain immunoreceptor tyrosine-based inhibitory motifs (ITIMs) it is thought that LIR proteins are inhibitory receptors. Of the eight LIR family proteins, only LIR-1 (LILRB1), and LIR-2 (LILRB2), show detectable binding to class I MHC molecules; ligands for the other members have yet to be determined. The extracellular portions of the different LIR proteins contain different numbers of Ig-like domains for example, four in the case of LILRB1 (LIR-1), and LILRB2 (LIR-2), and two in the case of LILRB4 (LIR-5). The activating natural cytotoxicity receptor NKp46 is expressed in natural killer cells, and is organized as an extracellular portion having two Ig-like extracellular domains, a transmembrane domain, and a small cytoplasmic portion. GPVI, which also contains two Ig-like domains, participates in the processes of collagen-mediated platelet activation and arterial thrombus formation. Fc-alphaRI is expressed on monocytes, eosinophils, neutrophils and macrophages; it mediates IgA-induced immune effector responses such as phagocytosis, antibody-dependent cell-mediated cytotoxicity and respiratory burst."<ref name=NCBI319306>{{ cite web
|author=NCBI
|title=Conserved Protein Domain Family cd05751: Ig1_LILR_KIR_like
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=16 August 2016
|url=https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=319306
|accessdate=24 May 2020 }}</ref>
 
"IG domains ['''smart00410'''] that cannot be classified into one of IGv1, IGc1, IGc2, IG."<ref name=NCBI214653>{{ cite web
|author=NCBI
|title=Conserved Protein Domain Family smart00410: IG_like
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=16 January 2013
|url=https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=214653
|accessdate=24 May 2020 }}</ref>
"𝛂<sub>1</sub>B-glycoprotein(𝛂<sub>1</sub>B) [...] consists of a single polypeptide chain N-linked to four
glucosamine oligosaccharides. The polypeptide has five intrachain disulfide bonds and contains 474 amino acid residues. [...] 𝛂<sub>1</sub>B exhibits internal duplication and consists of five repeating structural domains, each containing about 95 amino acids and one disulfide bond. [...] several domains of 𝛂<sub>1</sub>B, especially the third, show statistically significant homology to variable regions of certain immunoglobulin light and heavy chains. 𝛂<sub>1</sub>B [...] exhibits sequence similarity to other members of the immunoglobulin supergene family such as the receptor for transepithelial transport of IgA and IgM and the secretory component of human IgA."<ref name=Ishioka>{{ cite journal
|author=Noriaki Ishioka, Nobuhiro Takahashi, and Frank W. Putnam
|title=Amino acid sequence of human plasma 𝛂<sub>1</sub>B-glycoprotein: Homology to the immunoglobulin supergene family
|journal=Proceedings of the National Academy of Sciences USA
|date=April 1986
|volume=83
|issue=8
|pages=2363-7
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC323297/pdf/pnas00312-0089.pdf
|arxiv=
|bibcode=
|doi=10.1073/pnas.83.8.2363
|pmid=3458201
|accessdate=9 March 2020 }}</ref>
 
===A1BG protein species===
 
'''Def.''' a "group of plants or animals having similar appearance"<ref name=SpeciesWikt>{{ cite web
|author=[[wikt:User:24.98.118.180|24.98.118.180]]
|title=species
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=28 February 2007
|url=https://en.wiktionary.org/wiki/species
|accessdate=25 March 2020 }}</ref> or "the largest group of organisms in which [any]<ref name=Species1/> two individuals [of the appropriate sexes or mating types]<ref name=Species1>{{ cite web
|author=[[w:User:Peter coxhead|Peter coxhead]]
|title=Species
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=22 August 2018
|url=https://en.wikipedia.org/wiki/Species
|accessdate=25 March 2020 }}</ref> can produce fertile offspring, typically by sexual reproduction"<ref name=Species>{{ cite web
|author=[[w:User:Chiswick Chap|Chiswick Chap]]
|title=Species
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=1 December 2016
|url=https://en.wikipedia.org/wiki/Species
|accessdate=25 March 2020 }}</ref> is called a '''species'''.
 
The gene contains 20 distinct introns.<ref name=AceView>{{ cite web
|title=AceView: A1BG
|url=https://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=human&term=a1bg&submit=Go
|accessdate=May 11, 2013 }}</ref> Transcription produces 15 different mRNAs: 10 alternatively spliced variants and 5 unspliced forms.<ref name="AceView"/> There are 4 probable alternative promoters, 4 non-overlapping alternative last exons and 7 validated alternative polyadenylation sites.<ref name="AceView"/> The mRNAs appear to differ by truncation of the 5' end, truncation of the 3' end, presence or absence of 4 cassette exons, overlapping exons with different boundaries, and splicing versus retention of 3 introns.<ref name="AceView"/>
 
====Variants or isoforms====
 
'''Def.''' a "different sequence of a gene (locus)"<ref name=VariantWikt>{{ cite web
|author=[[wikt:User:Pdeitiker|Pdeitiker]]
|title=variant
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=26 July 2008
|url=https://en.wiktionary.org/wiki/variant
|accessdate=25 March 2020 }}</ref> is called a '''variant'''.
 
'''Def.''' any "of several different forms of the same protein, arising from either single nucleotide polymorphisms,<ref name=IsoformWikt1>{{ cite web
|author=[[wikt:User:SemperBlotto|SemperBlotto]]
|title=isoform
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=6 January 2007
|url=https://en.wiktionary.org/wiki/isoform
|accessdate=2 December 2018 }}</ref> differential splicing of mRNA, or post-translational modifications (e.g. sulfation, glycosylation, etc.)"<ref name=IsoformWikt2>{{ cite web
|author=[[wikt:User:72.178.245.181|72.178.245.181]]
|title=isoform
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=30 November 2008
|url=https://en.wiktionary.org/wiki/isoform
|accessdate=2 December 2018 }}</ref> is called an '''isoform'''.
 
Regarding additional isoforms, mention has been made of "new genetic variants of A1BG."<ref name=Eiberg>{{ cite journal
|author=H Eiberg, ML Bisgaard, J Mohr
|title=Linkage between alpha 1B-glycoprotein (A1BG) and Lutheran (LU) red blood group system: assignment to chromosome 19: new genetic variants of A1BG
|journal=Clinical genetics
|date=1 December 1989
|volume=36
|issue=6
|pages=415-8
|url=http://europepmc.org/abstract/MED/2591067
|arxiv=
|bibcode=
|doi=
|pmid=2591067
|accessdate=2017-10-08 }}</ref>
 
"Proteomic analysis revealed that [a circulating] set of plasma proteins was α 1 B-glycoprotein ('''A1BG''') and its post-translationally modified isoforms."<ref name=Stehle>{{ cite journal
|author=John R. Stehle Jr., Mark E. Weeks, Kai Lin, Mark C. Willingham, Amy M. Hicks, John F. Timms, Zheng Cui
|title=Mass spectrometry identification of circulating alpha-1-B glycoprotein, increased in aged female C57BL/6 mice
|journal=Biochimica et Biophysica Acta (BBA) - General Subjects
|date=January 2007
|volume=1770
|issue=1
|pages=79-86
|url=http://www.sciencedirect.com/science/article/pii/S0304416506001826
|arxiv=
|bibcode=
|doi=10.1016/j.bbagen.2006.06.020
|pmid=16945486
|accessdate=2017-10-08 }}</ref>
 
Pharmacogenomic variants have been reported.<ref name=McDonough/>
 
====Genotypes====
 
'''Def.''' the "part (DNA sequence) of the genetic makeup of an organism which determines a specific characteristic (phenotype) of that organism"<ref name=GenotypeWikt1>{{ cite web
|author=[[wikt:User:DTLHS|DTLHS]]
|title=genotype
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=10 January 2018
|url=https://en.wiktionary.org/wiki/genotype
|accessdate=25 March 2020 }}</ref> or a "group of organisms having the same genetic constitution"<ref name=GenotypeWikt>{{ cite web
|author=[[wikt:User:SemperBlotto|SemperBlotto]]
|title=genotype
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=22 October 2005
|url=https://en.wiktionary.org/wiki/genotype
|accessdate=25 March 2020 }}</ref> is called a '''genotype'''.
 
A1BG genotypes have been described.<ref name=McDonough>{{ cite journal
|author=Caitrin W. McDonough, Yan Gong, Sandosh Padmanabhan, Ben Burkley, Taimour Y. Langaee, Olle Melander, Carl J. Pepine, Anna F. Dominiczak, Rhonda M. Cooper-DeHoff, and Julie A. Johnson
|title=Pharmacogenomic Association of Nonsynonymous SNPs in ''SIGLEC12'', ''A1BG'', and the Selectin Region and Cardiovascular Outcomes
|journal=Hypertension
|date=June 2013
|volume=62
|issue=1
|pages=48-54
|url=http://hyper.ahajournals.org/content/hypertensionaha/early/2013/05/20/HYPERTENSIONAHA.111.00823.full.pdf
|arxiv=
|bibcode=
|doi=10.1161/HYPERTENSIONAHA.111.00823
|pmid=23690342
|accessdate=2017-10-08 }}</ref>
 
The A1BG single nucleotide polymorphism rs893184 is one component of a genetic risk score.<ref name=McDonough/>
 
"A genetic risk score, including rs16982743, rs893184, and rs4525 in F5, was significantly associated with treatment-related adverse cardiovascular outcomes in whites and Hispanics from the INVEST study and in the Nordic Diltiazem study (meta-analysis interaction P=2.39×10<sup>−5</sup>)."<ref name=McDonough/>
 
====Polymorphs====
 
'''Def.''' the "regular existence of two or more different genotypes within a given species or population; also, variability of amino acid sequences within a gene's protein"<ref name=PolymorphismWikt>{{ cite web
|author=[[wikt:User:Widsith|Widsith]]
|title=polymorphism
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=28 March 2012
|url=https://en.wiktionary.org/wiki/polymorphism
|accessdate=25 March 2020 }}</ref> is called '''polymorphism'''.
 
'''Def.''' "one of a number of alternative forms of the same gene occupying a given position, [or locus],<ref name=AlleleWikt1>{{ cite web
|author=[[wikt:User:217.105.66.98|217.105.66.98]]
|title=allele
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=8 September 2016
|url=https://en.wiktionary.org/wiki/allele
|accessdate=25 March 2020 }}</ref> on a chromosome"<ref name=AlleleWikt>{{ cite web
|author=[[wikt:User:138.130.33.215|138.130.33.215]]
|title=allele
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=7 April 2004
|url=https://en.wiktionary.org/wiki/allele
|accessdate=25 March 2020 }}</ref> is called an '''allele'''.
 
"rs893184 causes a histidine (His) to arginine (Arg) [nonsynonymous single nucleotide polymorphism (nsSNP), A (minor) for G (major)] substitution at amino acid position 52 in A1BG."<ref name=McDonough/>
 
"Genetic polymorphism of human plasma (serum) alpha 1B-glycoprotein (alpha 1B) was observed using one-dimensional horizontal polyacrylamide gel electrophoresis (PAGE) pH 9.0 of plasma samples followed by Western blotting with specific antiserum to alpha 1B."<ref name=Gahne>{{ cite journal
|author=B. Gahne, R. K. Juneja, and A. Stratil
|title=Genetic polymorphism of human plasma alpha 1B-glycoprotein: phenotyping by immunoblotting or by a simple method of 2-D electrophoresis
|journal=Human Genetics
|date=June 1987
|volume=76
|issue=2
|pages=111-5
|url=https://link.springer.com/article/10.1007%2FBF00284904
|arxiv=
|bibcode=
|doi=10.1007/bf00284904
|pmid=3610142
|accessdate=25 March 2020 }}</ref>
 
''A1B*5'' is a "new allele [...] of human plasma 𝜶<sub>1</sub>B-glycoprotein [...]."<ref name=Juneja1989>{{ cite journal
|author=R.K. Juneja, G. Beckman, M. Lukka, B. Gahne, and C. Ehnholm
|title=Plasma α<sub>1</sub>B-Glycoprotein Allele Frequencies in Finns and Swedish Lapps: Evidence for a New α<sub>1</sub>B Allele
|journal=Human Heredity
|date=1989
|volume=39
|issue=1
|pages=32-36
|url=https://www.karger.com/Article/Abstract/153828
|arxiv=
|bibcode=
|doi=10.1159/000153828
|pmid=2759622
|accessdate=25 March 2020 }}</ref>
 
"Genetic polymorphism of human plasma 𝜶<sub>1</sub>B-glycoprotein (𝜶<sub>1</sub>B) was reported first, in brief, by Altland ''et al.'' [1983; also given in Altland and Hackler, 1984]. A detailed description of human 𝜶<sub>1</sub>B polymorphism was reported in subsequent studies [Gahne ''et al.'', 1987; Juneja ''et al.'', 1988, 1989]. Five different 𝜶<sub>1</sub>B alleles (''A1B*1, A1B*2, A1B*3, A1B*4'' and ''A1B*5'') were reported. In Caucasian whites, the frequencies of ''A1B*1'' and ''A1B*2'' were about 0.95 and 0.05, respectively. ''A1B*4'' was observed in 2 related Czech individuals. In American blacks, ''A1B*1'' and ''A1B*2'' occurred with a frequency of 0.73 and 0.21, respectively, while a new allele, viz, ''A1B*3'' had a frequency of 0.06. ''A1B*5'' was observed only in Swedish Lapps and in Finns with a frequency of 0.04 and 0.007, respectively."<ref name=Juneja>{{ cite journal
|author=R.K. Juneja, N. Saha, B. Gahne and J.S.H. Tay
|title=Distribution of Plasma Alpha-1-B-Glycoprotein Phenotypes in Several Mongoloid Populations of East Asia
|journal=Human Heredity
|date=1989
|volume=39
|issue=
|pages=218-222
|url=https://www.karger.com/Article/PDF/153863
|arxiv=
|bibcode=
|doi=10.1159/000153863
|pmid=2583734
|accessdate=25 March 2020 }}</ref>
 
"The frequency of ''A1B*1'' varied from 0.89 to 0.91 and that of ''A1B*2'' from 0.08 to 0.10. The ''A1B*3'' allele, reported previously only in American blacks, was observed with a frequency range of 0.003-0.01 in 3 of the Chinese populations, in Koreans and in Malays. A new 𝜶<sub>1</sub>B allele (''A1B*6'') was observed in 2 Chinese individuals."<ref name=Juneja/>
 
====Phenotypes====
 
'''Def.''' the "appearance of an organism based on a single trait [multifactorial combination of genetic traits and environmental factors]<ref name=PhenotypeWikt2>{{ cite web
|author=[[wikt:User:24.235.196.118|24.235.196.118]]
|title=phenotype
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=23 September 2007
|url=https://en.wiktionary.org/wiki/phenotype
|accessdate=2016-10-04 }}</ref>, especially used in pedigrees"<ref name=PhenotypeWikt1>{{ cite web
|author=[[wikt:User:SemperBlotto|SemperBlotto]]
|title=phenotype
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=14 February 2005
|url=https://en.wiktionary.org/wiki/phenotype
|accessdate=2016-10-04 }}</ref> or any "observable characteristic of an organism, such as its morphological, developmental, biochemical or physiological properties, or its behavior"<ref name=PhenotypeWikt>{{ cite web
|author=[[wikt:User:N2e|N2e]]
|title=phenotype
|publisher=Wikimedia Foundation, Inc
|location=San Francisco, California
|date=3 July 2008
|url=https://en.wiktionary.org/wiki/phenotype
|accessdate=2016-10-04 }}</ref> is called a '''phenotype'''.
 
"The three different phenotypes of α1B observed (designated 1-1, 1-2, and 2-2) were apparently identical to those reported by Altland et al. (1983), who used double one-dimensional electrophoresis. Family data supported the hypothesis that the three α1B phenotypes are determined by two codominant alleles at an autosomal locus, designated A1B. Allele frequencies in a Swedish population were: A1B *1, 0.937; A1B *2, 0.063; PIC, 0.111."<ref name=Gahne/>
 
====Protein species====
 
"Both protein species of [alpha 1-beta glycoprotein] A1B (A1Ba, p = 0.008; f.c.= +1.62, A1Bb, p = 0.003; f.c. = +1.82) [...] were apparently overexpressed in patients with PTCa [...]."<ref name=Abdullah>{{ cite journal
|author=Mardiaty Iryani Abdullah, Ching Chin Lee, Sarni Mat Junit, Khoon Leong Ng, and Onn Haji Hashim
|title=Tissue and serum samples of patients with papillary thyroid cancer with and without benign background demonstrate different altered expression of proteins
|journal=Peer J
|date=13 September 2016
|volume=4
|issue=
|pages=e2450
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5028788/
|arxiv=
|bibcode=
|doi=10.7717/peerj.2450
|pmid=27672505
|accessdate=15 March 2020 }}</ref>

Extensive evidence indicates that long noncoding RNAs (lncRNAs) regulate the tumorigenesis and progression of hepatocellular carcinoma (HCC).<ref name=Bai/>
 
A1BG is mainly produced in the liver and is secreted into plasma at levels of approximately 0.22 mg/mL.<ref name=Ishioka/>
 
===CRISPs===
 
The human cysteine-rich secretory protein 3 (CRISP3) "is present in exocrine secretions and in secretory granules of neutrophilic granulocytes and is believed to play a role in innate immunity."<ref name=Udby>{{ cite journal
|author=Udby L, Sørensen OE, Pass J, Johnsen AH, Behrendt N, Borregaard N, Kjeldsen L.
|title=Cysteine-rich secretory protein 3 is a ligand of alpha1B-glycoprotein in human plasma
|journal=Biochemistry
|date=12 October 2004
|volume=43
|issue=40
|pages=12877-86
|url=https://pubs.acs.org/doi/10.1021/bi048823e
|arxiv=
|bibcode=
|doi=10.1021/bi048823e
|pmid=15461460
|accessdate=2011-11-28 }}</ref> CRISP3 has a relatively high content in human plasma.<ref name=Udby/>
 
"The A1BG-CRISP-3 complex is noncovalent with a 1:1 stoichiometry and is held together by strong electrostatic forces."<ref name=Udby/> "Similar [complex formation] between toxins from snake venom and A1BG-like plasma proteins ... inhibits the toxic effect of snake venom metalloproteinases or myotoxins and protects the animal from envenomation."<ref name=Udby/>


Opossums have a remarkably robust immune system and show partial or total immunity to the venom of rattlesnakes, cottonmouths (''Agkistrodon piscivorus''), and other pit vipers (''Crotalinae'').<ref>{{ cite web
|url=http://www.wildliferescueleague.org/report/opossum.html
|title=The Opossum: Our Marvelous Marsupial, The Social Loner
|publisher=Wildlife Rescue League }}</ref><ref>[http://www.scielo.br/scielo.php?pid=S0104-79301999000100005&script=sci_arttext Journal Of Venomous Animals And Toxins – Anti-Lethal Factor From Opossum Serum Is A Potent Antidote For Animal, Plant And Bacterial Toxins]. Retrieved 2009-12-29.</ref>

The underexpression of A1BG-AS1 was found in HCC via analysis of The Cancer Genome Atlas database.<ref name=Bai/> A1BG-AS1 expression in HCC was markedly lower than that in noncancerous tissues.<ref name=Bai/>
 
"Crisp3 [is] mainly [expressed] in the salivary glands, pancreas, and prostate."<ref name=Haendler>{{ cite journal
|author=B Haendler, J Krätzschmar, F Theuring and W D Schleuning
|title=Transcripts for cysteine-rich secretory protein-1 (CRISP-1; DE/AEG) and the novel related CRISP-3 are expressed under androgen control in the mouse salivary gland.
|journal=Endocrinology
|date=July 1993
|volume=133
|issue=1
|pages=192-8
|url=http://endo.endojournals.org/content/133/1/192.full.pdf+html
|arxiv=
|bibcode=
|doi=10.1210/en.133.1.192
|pmid= 8319566
|accessdate=2012-02-20 }}</ref> "CRISP3 is highly expressed in the human cauda epididymidis and ampulla of vas deferens (Udby et al. 2005)."<ref name=Haendler/>


==ZNF497==
{{main|ZNF497}}
Gene ID: 503538 is [[A1BG-AS1]] A1BG antisense RNA 1.<ref name=HGNC503538>{{ cite web
|author=HGNC
|title=A1BG-AS1 A1BG antisense RNA 1 [ Homo sapiens (human) ]
|publisher=National Center for Biotechnology Information
|location=U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA
|date=10 December 2019
|url=https://www.ncbi.nlm.nih.gov/gene/503538
|accessdate=2019-12-18 }}</ref> A1BG-AS1 is transcribed in the negative direction from ZSCAN22.<ref name=HGNC503538/>


Gene ID: 162968 is [[ZNF497]] zinc finger protein 497.<ref name=HGNC162968>{{ cite web

Latest revision as of 21:09, 16 May 2023

Associate Editor(s)-in-Chief: Henry A. Hoff

Alpha-1-B glycoprotein is a 54.3 kDa protein in humans that is encoded by the A1BG gene.[1] The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins.

A1BG was located on the DNA strand of chromosome 19.[2] Additionally, A1BG, in current nucleotide numbering (58,345,183-58,353,492), is located adjacent to the ZSCAN22 gene (58,326,994-58,342,332) on the positive DNA strand, as well as the ZNF837 (58,367,623 - 58,381,030, complement) and ZNF497 (58,354,357 - 58,362,751, complement) genes on the negative strand.[2]

In the current nucleotide numbering, the A1BG untranslated region (UTR) has been expanded so that with ZSCAN22 ending at 58,342,332, the nucleotides used in this study are 58,342,333 to 58,346,892 on both strands, with the current UTR for A1BG beginning at 58,345,183. On the other side of A1BG ending at 58,353,492, the nucleotides used are 58,353,493 to 58,357,937. With ZNF497 beginning at 58,354,357, this study goes into ZNF497 to 58,357,937 or 3580 nucleotides from its downstream TSS or 4445 nucleotides from the TSS of A1BG downstream from ZNF497.

For example, an abscisic acid responsive element (ABRE) with the consensus sequence of ACGTG(G/T)C (Watanabe et al. 2017) occurs in the positive strand in the negative direction from ZSCAN22 to A1BG as ACGTGGC ending at 4239 nucleotides from the end of ZSCAN22 or 58,346,571, where the A is at 58,346,565 inside the UTR of A1BG.
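The coordinate arithmetic in the two paragraphs above can be checked with a short sketch. The constant and function names below are illustrative assumptions and are not part of the study; the numbers are the ones quoted in the text.

```python
import re

# Coordinates quoted in the text (human chromosome 19, current numbering).
ZSCAN22_END = 58_342_332
A1BG_END = 58_353_492
ZNF497_START = 58_354_357
STUDY_END = 58_357_937  # last nucleotide sampled on the ZNF497 side


def to_genomic(offset: int) -> int:
    """Map a position counted from the end of ZSCAN22 to a genomic coordinate."""
    return ZSCAN22_END + offset


# ABRE consensus ACGTG(G/T)C written as a regular expression.
ABRE = re.compile(r"ACGTG[GT]C")

abre_end = to_genomic(4239)
abre_start = abre_end - len("ACGTGGC") + 1  # coordinate of the leading A

print(abre_start, abre_end)
print(STUDY_END - ZNF497_START, STUDY_END - A1BG_END)
print(ABRE.fullmatch("ACGTGGC") is not None)
```

Under these assumptions the sketch reproduces the quoted values: the ABRE spans 58,346,565 to 58,346,571, and the sampled region ends 3580 nucleotides past the start of ZNF497 and 4445 nucleotides past the end of A1BG.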

Introduction

"Many important disease-related pathways utilize transcription factors that specifically bind DNA (e.g., c-Myc, HIF-1, TCF1, p53) as key nodes or endpoints in complex signaling networks. In such cases the transcription factor itself is often the most attractive target. However, drugging transcription factors is challenging owing to an absence of small ligand binding sites in their DNA-binding domain and the presence of a highly charged DNA-binding surface [1]."[3]

If a specific gene appears to be involved in a disease-related or deleterious pathway, it may be necessary to alter its expression to improve the person's health. Altering its expression constructively may require knowing which regulatory elements exist in the gene's nearby promoters.

Response elements

Identifying a bona fide response element requires more than simple inspection. To attribute response-element activity to a candidate sequence, observations have to be made using molecular, biological, and biophysical methods and functional approaches. Such findings may indicate that a response element in the promoter is a functional element.[4]

A likely response element found by simple inspection may also be inactive due to methylation.

Response Elements: "Nucleotide sequences, usually upstream, which are recognized by specific regulatory transcription factors, thereby causing gene response to various regulatory agents. These elements may be found in both promoter and enhancer regions."[5]

"Under conditions of stress, a transcription activator protein binds to the response element and stimulates transcription. If the same response element sequence is located in the control regions of different genes, then these genes will be activated by the same stimuli, thus producing a coordinated response."[6]

WD-40 repeat family

"Receptor for activated C kinase (RACK1) is a highly conserved, eukaryotic protein of the WD-40 repeat family. [...] During Phaseolus vulgaris root development, RACK1 (PvRACK1) mRNA expression was induced by auxins, abscisic acid, cytokinin, and gibberellic acid."[7]

Abscisic acid (ABA) response elements

Auxin response factors

ARFUs
ARFBs
ARF2s
ARF5s

CAACTC regulatory elements

CAREs (Fan)
CAREs (Garaeva)

Cytokinins

ARR1s
ARR10s
ARR12s
ARRFs
ARRR1s
ARRR2s

Coupling elements

CE3Ws
CE3Ds

EREs

Gibberellic acid response elements

GAREs
GAREL1s

Hypoxia response elements

HIFs
HREs
CACAs

Pyrimidine boxes

TAT boxes

TATFs
TATYs

General Regulatory Factors

The following general regulatory factors occur in the promoters between ZSCAN22, A1BG and ZNF497 on human chromosome 19.

Abfms

Rap1s

Reb1s

Tbf1s

Basic leucine zipper (bZIP) class response elements

A-boxes

ACGTs

"A majority of the plant bZIP proteins isolated to date recognize elements with an ACGT core (Foster et al., 1994)."[8]

"Most recombinant bZIP proteins can interact with ACGT elements derived from different plant genes, albeit with different affinity. Systematic protein/DNA binding studies have shown that sequences flanking the ACGT core affect bZIP protein binding specificity. These studies have provided the basis for a concise ACGT nomenclature and defined high-affinity A-box, C-box, and G-box elements."[9]

"HY5 binds to the promoter of light-responsive genes featuring "ACGT-containing elements" such as the G-box (CACGTG), C-box (GACGTC), Z-box (ATACGGT), and A-box (TACGTA) (4, 6)."[10]

Activating transcription factors

ATFBs
ATFKs

Affinity Capture-Western; Two-hybrid transcription factors

AFTs

Box As

C-boxes

C-boxes come in several varieties:

C-boxes (Johnson)
C boxes (Samarsky)
C boxes (Voronina)
C boxes (Song)
C boxes (Song hybrids)

Hybrids: C/A-box (TGACGTAT), C/G-box (TGACGTGT), C/T-box (TGACGTTA).

CAMPs

ESRE

The endoplasmic reticulum stress response element (ESRE) has two parts: (1) CCAAT and (2) CCACG. These are searched for separately, and the hits are then compared to see whether any pair is separated by exactly nine nucleotides.

CCAAT
CCACG

According to So (2018), the endoplasmic reticulum stress response element should be CCAAT-N9-CCACG. Sampling shows that neither the ideal CCAAT-N9-CCACG nor its complement inverse occurs on either side of A1BG or close to ZSCAN22 or ZNF497.
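A search for the two-part element on both strands might be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the function names and the toy sequence are hypothetical, and only the CCAAT-N9-CCACG pattern comes from the text.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complement inverse of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# CCAAT, exactly nine arbitrary nucleotides, then CCACG.
# The lookahead allows overlapping hits to be reported.
ESRE = re.compile(r"(?=(CCAAT[ACGT]{9}CCACG))")
ESRE_LENGTH = 5 + 9 + 5


def find_esre(seq: str):
    """Return (strand, 0-based start on the given strand) for each ESRE hit."""
    seq = seq.upper()
    hits = [("+", m.start(1)) for m in ESRE.finditer(seq)]
    rc = reverse_complement(seq)
    for m in ESRE.finditer(rc):
        # Convert the start on the reverse complement back to the input strand.
        hits.append(("-", len(seq) - m.start(1) - ESRE_LENGTH))
    return hits


# Hypothetical toy sequence with one element on the given strand.
toy = "GG" + "CCAAT" + "ACGTACGTA" + "CCACG" + "TT"
print(find_esre(toy))
print(find_esre(reverse_complement(toy)))
```

An empty result for a sampled region would correspond to the observation above that the ideal element is absent near A1BG.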

Hap motif

G-boxes

G-box (CACGTG)

GCN4 motif

GCREs (Gcn4)

Migs

Nuclear factors

NFATs
HNF6s

T boxes

TboxCs
TboxZs

Vboxes

Z-boxes

ZboxGs
ZboxSps

Helix-turn-helix (HTH) transcription factors

Gene ID: 4602 is MYB [myeloblastosis] MYB proto-oncogene, transcription factor on 6q23.3: "This gene encodes a protein with three HTH DNA-binding domains that functions as a transcription regulator. This protein plays an essential role in the regulation of hematopoiesis. This gene may be aberrently expressed or rearranged or undergo translocation in leukemias and lymphomas, and is considered to be an oncogene. Alternative splicing results in multiple transcript variants."[11]

CadC binding domains

Factor II B recognition elements

Forkhead boxes

Homeoboxes

Homeodomains

HSE3 (Eastmond)

HSE4 (Eastmond)

HSE8 GAP1 (Eastmond)

HSE9 GAP2 (Eastmond)

Hsf (Tang)

MREs

Tryptophan residues

Basic helix-loop-helix (bHLH) transcription factors

"The [palindromic E-box motif (CACGTG)] motif is bound by the transcription factor Pho4, [and has the] class of basic helix-loop-helix DNA binding domain and core recognition sequence (Zhou and O'Shea 2011)."[12]

"Pho4 bound to virtually all E-boxes in vitro (96%) [...]. That was not the case in vivo, where only 5% were bound by Pho4, under activating conditions as determined by ChIP-seq [Zhou and O'Shea 2011]."[12]

"Pho4 possesses the intrinsic ability to bind every E-box, but in vivo is prevented from binding by chromatin unless assisted by chromatin remodelers (Svaren et al. 1994) that are targeted at promoter regions."[12]

"On one end of that spectrum, typical transcription factors like Pho4 do not appear to compete with nucleosomes and instead predominantly sample motifs that already exist in the [nucleosome-free promoter regions] NFRs generated by other factors. In vitro (PB-exo), Pho4 bound nearly every instance of an E-box motif across the yeast genome. However, in vivo, Pho4 is a low-abundance protein that is recruited to the nucleus upon phosphate starvation by other factors, to act at a few dozen genes (Komeili and O'Shea 1999; Zhou and O'Shea 2011). Since Pho4 appears unable to compete with nucleosomes, competent sites that are occluded by nucleosomes are invisible to Pho4."[12]

The Pho4 homodimer binds to DNA sequences containing the bHLH binding site CACGTG.[13]

The upstream activating sequence (UAS) for Pho4p is CAC(A/G)T(T/G) in the promoters of HIS4 and PHO5 regarding phosphate limitation with respect to regulation of the purine and histidine biosynthesis pathways [66].[14]

bHLH proteins typically bind to a consensus sequence called an E-box, CANNTG.[15]

"A computer search for transcription promoter elements [...] showed the presence of a prominent TATA box 22 nucleotides upstream of the transcription start site and an Sp1 site at position -42 to -33. The 5'-flanking sequence also contains three E boxes with CANNTG consensus sequences at positions -464 to -459, -90 to -85, and -52 to -47 that have been marked as E box, E1 box, and E2 box, respectively [...]. In addition, the 5'-flanking region contains one or more GRE, XRE, GATA-1, GCN-4, PEA-3, AP1, and AP2 consensus motifs and also three imperfect CArG sites [...]."[16]

AhRYs

AHRE-IIs

AEREs

CAT boxes

CAT-box-like elements

"Class C"

"Class I"

TCFs

DIOXs

Enhancer boxes

ChoRE motifs
CarbE1s
CarbE2s
CarbE3s
Phors

Palindromic E-box motif (CACGTG).

E2 boxes

GATAs

Gln3s

Glucocorticoid response elements

ICRE (Lopes)

ICRE (Schwank)

Pho4

QRDREs

Carbon source-responsive elements

CATTCAs
TCCGs

XREs

Basic helix-loop-helix leucine zipper transcription factors

Basic helix-loop-helix leucine zipper transcription factors are, as their name indicates, transcription factors containing both basic helix-loop-helix and leucine zipper motifs.

Examples include Microphthalmia-associated transcription factor and Sterol regulatory element-binding protein (SREBP).

MITF recognizes E-box (CAYRTG) and M-box (TCAYRTG or CAYRTGA) sequences in the promoter regions of target genes.[17]
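Consensus sequences such as CAYRTG use IUPAC ambiguity codes (Y is C or T, R is A or G, N is any nucleotide). A small sketch of how such a consensus could be turned into a regular expression is given here; the helper name `iupac_to_regex` is an assumption made for illustration.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus: str) -> str:
    """Expand an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


mitf_e_box = re.compile(iupac_to_regex("CAYRTG"))
general_e_box = re.compile(iupac_to_regex("CANNTG"))

for candidate in ("CACGTG", "CATGTG", "CATCTG"):
    print(candidate,
          mitf_e_box.fullmatch(candidate) is not None,
          general_e_box.fullmatch(candidate) is not None)
```

With this expansion, CACGTG and CATGTG satisfy both the MITF E-box and the general CANNTG consensus, whereas CATCTG satisfies only CANNTG.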

Serum response element gene transcriptions: The SRE wild type (SREwt) contains the nucleotide sequence ACAGGATGTCCATATTAGGACATCTGC, of which CCATATTAGG is the CArG box, TTAGGACAT is the C/EBP box, and CATCTG is the E box.[18]

"Serum response factor (SRF) is an important transcription factor that regulates cardiac and skeletal muscle genes during development, maturation and adult aging [17,18]. SRF regulates its target genes by binding to serum response elements (SREs), which contain a consensus CC(A/T)6GG (CArG) motif."[19]
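The overlapping boxes within the SRE wild-type sequence can be located with a brief sketch. The variable and function names are illustrative assumptions; the sequences are the ones quoted above.

```python
import re

SRE_WT = "ACAGGATGTCCATATTAGGACATCTGC"

BOXES = {
    "CArG box": "CCATATTAGG",
    "C/EBP box": "TTAGGACAT",
    "E box": "CATCTG",
}

# CArG consensus CC(A/T)6GG written as a regular expression.
CARG = re.compile(r"CC[AT]{6}GG")


def locate(seq: str, motif: str):
    """Return the 1-based (start, end) of the first occurrence, or None."""
    index = seq.find(motif)
    if index < 0:
        return None
    return index + 1, index + len(motif)


for name, motif in BOXES.items():
    print(name, motif, locate(SRE_WT, motif))

print(CARG.fullmatch(BOXES["CArG box"]) is not None)
```

In 1-based numbering the CArG box occupies positions 10 to 19, the C/EBP box 15 to 23, and the E box 21 to 26, so the three boxes overlap one another.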

CArG boxes

MITF E-boxes

RREs

Consensus sequence: CATCTG.

M-boxes

M box (Bertolotto)
M-box (Hoek)
M-box (Ripoll)

SER elements

Basic helix-span-helix

Activating proteins

AP2as
APCo1s
APCo2s
APM3Ns
APM4Ns
Yao1s
Yao2s
Yau3s

"Pemphigus foliaceus (PF) is an autoimmune disease, endemic in Brazilian rural areas, characterized by acantholysis and accompanied by complement activation, with generalized or localized distribution of painful epidermal blisters. CD59 is an essential complement regulator, inhibiting formation of the membrane attack complex, and mediating signal transduction and activation of T lymphocytes. CD59 has different transcripts by alternative splicing, of which only two are widely expressed, suggesting the presence of regulatory sites in their noncoding regions. To date, there is no association study with polymorphisms in CD59 noncoding regions and susceptibility to autoimmune diseases. In this study, we aimed to evaluate if CD59 polymorphisms have a possible regulatory effect on gene expression and susceptibility to PF. Six noncoding polymorphisms were haplotyped in 157 patients and 215 controls by sequence-specific PCR, and CD59 mRNA levels were measured in 82 subjects, by qPCR. The rs861256-allele-G (rs861256*G) was associated with increased mRNA expression (p = .0113) and PF susceptibility in women (OR = 4.11, p = .0001), which were also more prone to develop generalized lesions (OR = 4.3, p = .009) and to resist disease remission (OR = 3.69, p = .045). Associations were also observed for rs831625*G (OR = 3.1, p = .007) and rs704697*A (OR = 3.4, p = .006) in Euro-Brazilian women, and for rs704701*C (OR = 2.33, p = .037) in Afro-Brazilians. These alleles constitute the GGCCAA haplotype, which also increases PF susceptibility (OR = 4.9, p = .045) and marks higher mRNA expression (p = .0025). [...] higher CD59 transcriptional levels may be related with PF susceptibility (especially in women), probably due to the effect of genetic polymorphism and to the CD59 role in T cell signal transduction."[20]

Stem-loops

File:Stem-loop.svg
An example of an RNA stem-loop is shown. Credit: Sakurambo.

As an important secondary structure of RNA, a stem-loop can direct RNA folding, protect structural stability for messenger RNA (mRNA), provide recognition sites for RNA binding proteins, and serve as a substrate for enzymatic reactions.[21]

Hairpin loops are often elements found within the 5'UTR of prokaryotes. These structures are often bound by proteins or cause the attenuation of a transcript in order to regulate translation.[22]

The mRNA stem-loop structure forming at the ribosome binding site may control the initiation of translation.[23][24]

AUREs

Adenylate–uridylate rich elements (Chen and Shyu, Class I)

Adenylate–uridylate rich elements (Chen and Shyu, Class II)

Adenylate–uridylate rich elements (Chen and Shyu, Class III)

MERs

Constitutive decay elements

Cys2His2 SP / Kruppel-like factor (KLF) transcription factor family

The Cys2His2-like fold group (Cys2His2) is by far the best-characterized class of zinc fingers, and is common in mammalian transcription factors, where such domains adopt a simple ββα fold and have the amino acid sequence motif:[25]

X2-Cys-X2,4-Cys-X12-His-X3,4,5-His
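The motif can be expressed as a regular expression over one-letter amino acid codes. The sketch below assumes that "X2,4" means two or four residues and "X3,4,5" means three, four, or five residues; the test peptides are synthetic fillers built only for illustration and are not real zinc-finger sequences.

```python
import re

# X2-Cys-X2,4-Cys-X12-His-X3,4,5-His with X as any residue.
C2H2 = re.compile(
    r"[A-Z]{2}C(?:[A-Z]{2}|[A-Z]{4})C[A-Z]{12}H[A-Z]{3,5}H"
)


def synthetic_finger(cys_gap: int, his_gap: int) -> str:
    """Build a synthetic peptide with alanine filler and the given spacings."""
    return ("AA" + "C" + "A" * cys_gap + "C"
            + "A" * 12 + "H" + "A" * his_gap + "H")


for cys_gap, his_gap in [(2, 3), (4, 5), (3, 3), (2, 6)]:
    peptide = synthetic_finger(cys_gap, his_gap)
    print(cys_gap, his_gap, C2H2.search(peptide) is not None)
```

Only the spacings permitted by the motif (two or four residues between the cysteines, three to five between the histidines) produce a match.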

Alcohol dehydrogenase repressor 1

SP1M1s

SP1M2s

SP-1 (Sato)s

SP1 (Yao)s

YY1Ts

AP-2/EREBP-related factors

AGC boxes

AP-1 transcription factor network (Pathway)

Sixty-nine genes are included in the AP-1 transcription factor network (Pathway).[26]

AGCEs

Zinc finger DNA-binding domains

AnRE1s

AnDRE2s

AnREWs

B-boxes

Box Bs

β-Scaffold factors

"Higher animals have [transcription factor] TF genes for the basic domain, the β-scaffold factor, and other new structures; however, their total proportion is less than 15% and most are [zinc (Zn)-coordinating factor] ZF and [Helix-Turn-Helix] HTH genes."[27]

ATA boxes

Γ-interferon activated sequences

HMG boxes

Zn(II)2Cys6 proteins

"The transcription factors Uga3, Dal81 and Leu3 belong to the class III family (Zn(II)2Cys6 proteins), and they recognize highly related sequences rich in GGC triplets [15]."[28]

Dal81

GCC boxes

GGC triplets

GGCGGC triplets

Leu3

Uga3

Hairpin-hinge-hairpin-tail

"In addition to this ACA box, they have the consensus H box sequence (5'-ANANNA-3') but have no other primary sequence identity. Despite this lack of primary sequence conservation, the H and ACA boxes are embedded in an evolutionarily conserved hairpin-hinge-hairpin-tail core secondary structure with the H box in the single-stranded hinge region and the ACA box in the single-stranded tail (5, 16)."[29]

H and ACA boxes

H-boxes (Grandbastien)

H-boxes (Lindsay)

H boxes (Mitchell)

H boxes (Rozhdestvensky)

Unknown response element types

ACEs

BBCABW Inrs

Calcineurin-responsive transcription factors

Carbs

Carb1s

Cat8s

Cell-cycle box variants

CGCG boxes

Circadian control elements

Cold-responsive elements

Copper response elements

CuREQs
CuREPs

Cytoplasmic polyadenylation elements

DAF-16 binding elements

D box (Samarsky)

D box (Voronina)

D-box (Motojima)

dBRE

Downstream core elements

DCE SI

DCE SII

DCE SIII

DPE (Juven-Gershon)

DPE (Kadonaga)

DPE (Matsumoto)

EIN3 binding sites

Endosperm expressions

Estrogen response elements

ERE1s
ERE2s

GAAC elements

GC boxes (Briggs)

GC boxes (Ye)

GC boxes (Zhang)

GCR1s

GREs

GT boxes (Sato)

Hex sequences

HY boxes

IFNs

Inr-like, TCTs

IRF3s

IRSs

KAR2s

MBE1s

MBE2s

MBE3s

NF𝜿BSs

PREs

Pribs

RAREs

Rgts

ROREs

SERVs

STAT5s

STREs

Sucroses

TACTs

TAGteams

TAPs

TATAs

Examining the promoter regions upstream from ZSCAN22 to A1BG and downstream from ZNF497 to A1BG for TATA boxes has shown that TATA boxes in various forms are present and likely active or activable: (1) TATAAAA (Carninci 2006), (2) TATA(A/T)A(A/T) (Watson 2014), (3) TATA(A/T)AA(A/G) (Juven-Gershon 2010), and (4) TATA(A/T)A(A/T)(A/G) (Basehoar 2004).

The TATA boxes appear only in the negative-direction UTRs, both proximal and distal. The shorter TATA box, TATAAA, appears as above and also in the positive direction as the complement inverse TTTATA at 2588 in the distal promoter.
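A search for the four TATA box forms listed above on both strands might look like the following sketch. The helper names and the toy sequence are assumptions for illustration, and the labels simply repeat the citations given in the text.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complement inverse of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus: str) -> str:
    """Turn '(A/T)' style alternatives into regex character classes."""
    return re.sub(r"\(([ACGT])/([ACGT])\)", r"[\1\2]", consensus)


TATA_VARIANTS = {
    "Carninci 2006": "TATAAAA",
    "Watson 2014": "TATA(A/T)A(A/T)",
    "Juven-Gershon 2010": "TATA(A/T)AA(A/G)",
    "Basehoar 2004": "TATA(A/T)A(A/T)(A/G)",
}


def find_tata(seq: str):
    """Return (label, strand, 1-based leftmost position) for every hit."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    hits = []
    for label, consensus in TATA_VARIANTS.items():
        pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
        for m in pattern.finditer(seq):
            hits.append((label, "+", m.start(1) + 1))
        for m in pattern.finditer(rc):
            end = m.start(1) + len(m.group(1))
            hits.append((label, "-", len(seq) - end + 1))
    return hits


toy = "GGGTATAAAAGGG"  # hypothetical test sequence
print(find_tata(toy))
print(reverse_complement("TATAAA"))
```

The last line confirms that the complement inverse of TATAAA is TTTATA, the form reported in the positive direction.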

TATABs

TATACs

TATAJs

TATAWs

TEAs

TECs

THRs

TRFs

UPREs

UPRE-1s

URS (Sumrada, core)

VDREs

XCPE1s

Yaps

YYRNWYY Inrs

A1BG orthologs

Geotrypetes seraphini

File:Geotrypetes seraphini 81151944.jpg
Geotrypetes seraphini, the Gaboon caecilian, is a species of amphibian. Credit: Marius Burger.

Geotrypetes seraphini, the Gaboon caecilian, is a species of amphibian in the family Dermophiidae.[30]

Its A1BG ortholog has 368 aa vs 495 aa for Homo sapiens.

ZSCAN22

  1. Gene ID: 342945 is ZSCAN22 zinc finger and SCAN domain containing 22 on 19q13.43.[31] ZSCAN22 is transcribed in the negative direction from LOC100887072.[31]
  2. Gene ID: 102465484 is MIR6806 microRNA 6806 on 19q13.43: "microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop."[32] MIR6806 is transcribed in the negative direction from LOC105372480.[32]

Of some 111 gaps between genes at chromosome locus 19q13.43 as of 4 August 2020, gap number 88 lies between ZSCAN22 and A1BG, but there is no gap between ZNF497 and A1BG.

Promoters

The core promoter begins approximately 35 nts upstream (-35) of the transcription start site (TSS). For the numbered nucleotides between ZSCAN22 and A1BG, the core promoter extends from 4425 nts up to 4460 nts (the TSS). The proximal promoter extends from approximately -250 to the TSS, or 4210 nts up to 4460 nts. The distal promoter begins at about 2460 nts and extends to about 4210 nts.

From the ZNF497 side, the core promoter extends from about 4265 nts up to 4300 nts, the proximal promoter from 4050 nts to 4265 nts, and the distal promoter from 2300 nts to 4050 nts.
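The promoter boundaries above follow from fixed offsets relative to the TSS, which the following sketch makes explicit. The offsets of 35 and 250 nts come from the text; the distal offset of 2000 nts is an assumption inferred from 4460 - 2460 and 4300 - 2300. The function name is hypothetical. The sketch runs the proximal promoter up to the TSS, as in the ZSCAN22-side description, whereas the ZNF497-side description ends it at the start of the core promoter (4265).

```python
# Minimal sketch: derive promoter ranges from a TSS position and fixed offsets.
def promoter_windows(tss, core=35, proximal=250, distal=2000):
    """Return (start, end) nucleotide ranges for each promoter class upstream of a TSS."""
    return {
        "core": (tss - core, tss),
        "proximal": (tss - proximal, tss),
        "distal": (tss - distal, tss - proximal),
    }

print(promoter_windows(4460))  # ZSCAN22 side, TSS at 4460 nts
print(promoter_windows(4300))  # ZNF497 side, TSS at 4300 nts
```

For a TSS at 4460 this gives a core promoter of 4425 to 4460, a proximal promoter of 4210 to 4460 and a distal promoter of 2460 to 4210; for a TSS at 4300 it gives 4265 to 4300, 4050 to 4300 and 2300 to 4050.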

Alpha-1-B glycoprotein

Def. "a substance that induces an immune response, usually foreign"[33] is called an antigen.

Def. any "substance that elicits [an] immune response"[34] is called an immunogen.

An antigen "or immunogen is a molecule that sometimes stimulates an immune system response."[35] But, "the immune system does not consist of only antibodies",[35] instead it "encompasses all substances that can be recognized by the adaptive immune system."[35]

Def. "a protein produced by B-lymphocytes that binds to [a specific antigen or][36] an antigen"[37] is called an antibody.

Five different antibody isotypes are known in mammals, which perform different roles, and help direct the appropriate immune response for each different type of foreign object they encounter.[38]

Although the general structure of all antibodies is very similar, a small region, known as the hypervariable region, at the tip of the protein is extremely variable, allowing millions of antibodies with slightly different tip structures to exist, where each of these variants can bind to a different target, known as an antigen.[39]

Def. "any of the glycoproteins in blood serum that respond to invasion by foreign antigens and that protect the host by removing pathogens;"[40] "an antibody"[41] is called an immunoglobulin.

Gene ID: 1 is A1BG alpha-1-B glycoprotein on 19q13.43, a 54.3 kDa protein in humans that is encoded by the A1BG gene.[42] A1BG is transcribed in the positive direction from ZNF497.[42] "The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins."[42]

  1. NP_570602.2 alpha-1B-glycoprotein precursor, cd05751 Location: 401 → 493 Ig1_LILRB1_like; First immunoglobulin (Ig)-like domain found in Leukocyte Ig-like receptors (LILR)B1 (also known as LIR-1) and similar proteins, smart00410 Location: 218 → 280 IG_like; Immunoglobulin like, pfam13895 Location: 210 → 301 Ig_2; Immunoglobulin domain and cl11960 Location: 28 → 110 Ig; Immunoglobulin domain.[42]
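The domain locations listed above can be put in order along the protein and compared with a short sketch. It assumes that the locations are residue positions that are 1-based and inclusive; the dictionary and function names are illustrative only.

```python
# Minimal sketch: order the annotated domains of NP_570602.2 and flag overlaps.
# Assumption: locations are 1-based, inclusive residue positions.
DOMAINS = {
    "cd05751": (401, 493),
    "smart00410": (218, 280),
    "pfam13895": (210, 301),
    "cl11960": (28, 110),
}

def span_length(span):
    """Number of residues covered by an inclusive (start, end) span."""
    start, end = span
    return end - start + 1

def overlaps(a, b):
    """True when two inclusive spans share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]

# Sort by start position so the domains read from N-terminus to C-terminus.
ordered = sorted(DOMAINS.items(), key=lambda item: item[1][0])
for name, span in ordered:
    print(name, span, span_length(span))

overlapping = [(n1, n2) for i, (n1, s1) in enumerate(ordered)
               for n2, s2 in ordered[i + 1:] if overlaps(s1, s2)]
print(overlapping)
```

Under these assumptions the spans cover 83, 92, 63 and 93 positions in order from the N-terminus, and smart00410 (218 → 280) lies entirely within pfam13895 (210 → 301), so the two annotations describe the same stretch of the protein.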

Patients who have pancreatic ductal adenocarcinoma show an overexpression of A1BG in pancreatic juice.[43]

Immunoglobulin supergene family

"𝛂1B-glycoprotein(𝛂1B) [...] consists of a single polypeptide chain N-linked to four glucosamine oligosaccharides. The polypeptide has five intrachain disulfide bonds and contains 474 amino acid residues. [...] 𝛂1B exhibits internal duplication and consists of five repeating structural domains, each containing about 95 amino acids and one disulfide bond. [...] several domains of 𝛂1B, especially the third, show statistically significant homology to variable regions of certain immunoglobulin light and heavy chains. 𝛂1B [...] exhibits sequence similarity to other members of the immunoglobulin supergene family such as the receptor for transepithelial transport of IgA and IgM and the secretory component of human IgA."[44]

"Some of the domains of 𝛂1B show significant homology to variable (V) and constant (C) regions of certain immunoglobulins. Likewise, there is statistically significant homology between 𝛂1B and the secretory component (SC) of human IgA (15) and also with the extracellular portion of the rabbit receptor for transepithelial transport of polymeric immunoglobulins (IgA and IgM). Mostov et al. (16) have called the later protein the poly-Ig receptor or poly-IgR and have shown that it is the precursor of SC."[44]

The immunoglobulin supergene family is "the group of proteins that have immunoglobulin-like domains, including histocompatibility antigens, the T-cell antigen receptor, poly-IgR, and other proteins involved in the vertebrate immune response (17)."[44]

"The internal homology in primary structure [...] and the presence of an intrasegment disulfide bond suggest that 𝛂1B is composed of five structural domains that arose by duplication of a primordial gene coding for about 95 amino acid residues."[44]

"Unlike immunoglobulins (25), ceruloplasmin (6), and hemopexin (7), 𝛂1B is not subject to limited interdomain cleavage by proteolytic enzymes. At least, we were not able to produce such fragments by use of a variety of proteases. This stability of 𝛂1B is probably associated with the frequency of proline in the sequences linking the domains [...]."[44]

"A peptide identified in the late and early milk proteomes showed homology to eutherian alpha 1B glycoprotein (A1BG), a plasma protein with unknown function46, as well as venom inhibitors characterised in the Southern opossum Didelphis marsupialis (DM43 and DM4647,48,49), all members of the immunoglobulin superfamily. To characterise the relationship between the peptide sequence identified in koala, A1BG, DM43 and DM46, a phylogenetic tree was constructed [...] including all marsupial and monotreme homologs (identified by BLAST), three phylogenetically representative eutherian sequences, with human IGSF1 and TARM1, related members of the immunoglobulin super family, used as outgroups. This phylogeny indicates that A1BG-like proteins in marsupials and the Didelphis antitoxic proteins are homologs of eutherian A1BG, with excellent bootstrap support (98%). The marsupial A1BG-like sequences and the Didelphis antitoxic proteins formed a single clade with strong bootstrap support (97%)."[45]

"Human TARM1 and IGSF1, related members of the immunoglobulin superfamily are used as outgroups. The tree was constructed using the maximum likelihood approach and the JTT model with bootstrap support values from 500 bootstrap tests. Bootstrap values less than 50% are not displayed. Accession numbers: Tasmanian devil (Sarcophilus harrisii; XP_012402143), Wallaby (Macropus eugenii; FY619507), Possum (Trichosurus vulpecula; DY596639) Virginia opossum (Didelphis virginiana; AAA30970, AAN06914), Southern opossum (Didelphis marsupialis; AAL82794, P82957, AAN64698), Human (Homo sapiens; P04217, B6A8C7, Q8N6C5), Platypus (Ornithorhychus anatinus; ENSOANP00000000762), Cow (Bos taurus; Q2KJF1), Alpaca (Vicugna pacos; XP_015107031)."[45]

"The sequences of 𝛂1B-glycoprotein (38) and chicken N-CAM (neural cell-adhesion molecule) (39) have been shown to be related to the immunoglobulin supergene family."[46]

A1BG contains the immunoglobulin domain: cl11960 and three immunoglobulin-like domains: pfam13895, cd05751 and smart00410.

"Immunoglobulin (Ig) domain [cl11960] found in the Ig superfamily. The Ig superfamily is a heterogenous group of proteins, built on a common fold comprised of a sandwich of two beta sheets. Members of this group are components of immunoglobulin, neuroglia, cell surface glycoproteins, such as, T-cell receptors, CD2, CD4, CD8, and membrane glycoproteins, such as, butyrophilin and chondroitin sulfate proteoglycan core protein. A predominant feature of most Ig domains is a disulfide bridge connecting the two beta-sheets with a tryptophan residue packed against the disulfide bond."[47]

"This domain [pfam13895] contains immunoglobulin-like domains."[48]

"Ig1_LILR_KIR_like: [cd05751] domain similar to the first immunoglobulin (Ig)-like domain found in Leukocyte Ig-like receptors (LILRs) and Natural killer inhibitory receptors (KIRs). This group includes LILRB1 (or LIR-1), LILRA5 (or LIR9), an activating natural cytotoxicity receptor NKp46, the immune-type receptor glycoprotein VI (GPVI), and the IgA-specific receptor Fc-alphaRI (or CD89). LILRs are a family of immunoreceptors expressed on expressed on T and B cells, on monocytes, dendritic cells, and subgroups of natural killer (NK) cells. The human LILR family contains nine proteins (LILRA1-3,and 5, and LILRB1-5). From functional assays, and as the cytoplasmic domains of various LILRs, for example LILRB1 (LIR-1), LILRB2 (LIR-2), and LILRB3 (LIR-3) contain immunoreceptor tyrosine-based inhibitory motifs (ITIMs) it is thought that LIR proteins are inhibitory receptors. Of the eight LIR family proteins, only LIR-1 (LILRB1), and LIR-2 (LILRB2), show detectable binding to class I MHC molecules; ligands for the other members have yet to be determined. The extracellular portions of the different LIR proteins contain different numbers of Ig-like domains for example, four in the case of LILRB1 (LIR-1), and LILRB2 (LIR-2), and two in the case of LILRB4 (LIR-5). The activating natural cytotoxicity receptor NKp46 is expressed in natural killer cells, and is organized as an extracellular portion having two Ig-like extracellular domains, a transmembrane domain, and a small cytoplasmic portion. GPVI, which also contains two Ig-like domains, participates in the processes of collagen-mediated platelet activation and arterial thrombus formation. Fc-alphaRI is expressed on monocytes, eosinophils, neutrophils and macrophages; it mediates IgA-induced immune effector responses such as phagocytosis, antibody-dependent cell-mediated cytotoxicity and respiratory burst."[49]

"IG domains [smart00410] that cannot be classified into one of IGv1, IGc1, IGc2, IG."[50] "𝛂1B-glycoprotein(𝛂1B) [...] consists of a single polypeptide chain N-linked to four glucosamine oligosaccharides. The polypeptide has five intrachain disulfide bonds and contains 474 amino acid residues. [...] 𝛂1B exhibits internal duplication and consists of five repeating structural domains, each containing about 95 amino acids and one disulfide bond. [...] several domains of 𝛂1B, especially the third, show statistically significant homology to variable regions of certain immunoglobulin light and heavy chains. 𝛂1B [...] exhibits sequence similarity to other members of the immunoglobulin supergene family such as the receptor for transepithelial transport of IgA and IgM and the secretory component of human IgA."[44]

A1BG protein species

Def. a "group of plants or animals having similar appearance"[51] or "the largest group of organisms in which [any][52] two individuals [of the appropriate sexes or mating types][52] can produce fertile offspring, typically by sexual reproduction"[53] is called a species.

The gene contains 20 distinct introns.[54] Transcription produces 15 different mRNAs: 10 alternatively spliced variants and 5 unspliced forms.[54] There are 4 probable alternative promoters, 4 non-overlapping alternative last exons and 7 validated alternative polyadenylation sites.[54] The mRNAs appear to differ by truncation of the 5' end, truncation of the 3' end, presence or absence of 4 cassette exons, overlapping exons with different boundaries, and splicing versus retention of 3 introns.[54]

Variants or isoforms

Def. a "different sequence of a gene (locus)"[55] is called a variant.

Def. any "of several different forms of the same protein, arising from either single nucleotide polymorphisms,[56] differential splicing of mRNA, or post-translational modifications (e.g. sulfation, glycosylation, etc.)"[57] is called an isoform.

Regarding additional isoforms, mention has been made of "new genetic variants of A1BG."[58]

"Proteomic analysis revealed that [a circulating] set of plasma proteins was α 1 B-glycoprotein (A1BG) and its post-translationally modified isoforms."[59]

Pharmacogenomic variants have been reported.[60]

Genotypes

Def. the "part (DNA sequence) of the genetic makeup of an organism which determines a specific characteristic (phenotype) of that organism"[61] or a "group of organisms having the same genetic constitution"[62] is called a genotype.

A1BG genotypes have been reported.[60]

The A1BG variant rs893184 is one component of a genetic risk score.[60]

"A genetic risk score, including rs16982743, rs893184, and rs4525 in F5, was significantly associated with treatment-related adverse cardiovascular outcomes in whites and Hispanics from the INVEST study and in the Nordic Diltiazem study (meta-analysis interaction P=2.39×10−5)."[60]

Polymorphs

Def. the "regular existence of two or more different genotypes within a given species or population; also, variability of amino acid sequences within a gene's protein"[63] is called polymorphism.

Def. "one of a number of alternative forms of the same gene occupying a given position, [or locus],[64] on a chromosome"[65] is called an allele.

"rs893184 causes a histidine (His) to arginine (Arg) [nonsynonymous single nucleotide polymorphism (nsSNP), A (minor) for G (major)] substitution at amino acid position 52 in A1BG."[60]

"Genetic polymorphism of human plasma (serum) alpha 1B-glycoprotein (alpha 1B) was observed using one-dimensional horizontal polyacrylamide gel electrophoresis (PAGE) pH 9.0 of plasma samples followed by Western blotting with specific antiserum to alpha 1B."[66]

A1B*5 is a "new allele [...] of human plasma 𝜶1B-glycoprotein [...]."[67]

"Genetic polymorphism of human plasma 𝜶1B-glycoprotein (𝜶1B) was reported first, in brief, by Altland et al. [1983; also given in Altkand and Hacklar, 1984]. A detailed description of human 𝜶1B polymorphism was reported in subsequent studies [Gahne et al., 1987; Juneja et al., 1988, 1989]. Five different 𝜶1B alleles (A1B*1, A1B*2, A1B*3, A1B*4 and A1B*5) were reported. In Caucasian whites, the frequencies of A1B*1 and ''A1B*2 were about 0.95 and 0.05, respectively. A1B*4 was observed in 2 related Czech individuals. In American blacks, A1B*1 and A1B*2 occurred with a frequency of 0.73 and 0.21, respectively, while a new allele, viz, A1B*3 had a frequency of 0.06. A1B*5 was observed only in Swedish Lapps and in Finns with a frequency of 0.04 and 0.007, respectively."[68]

"The frequency of A1B*1 varied from 0.89 to 0.91 and that of A1B*2 from 0.08 to 0.10. The A1B*3 allele, reported previously only in American blacks, was observed with a frequency range of 0.003-0.01 in 3 of the Chinese populations, in Koreans and in Malays. A new 𝜶1B allele (A1B*6) was observed in 2 Chinese individuals."[68]

Phenotypes

Def. the "appearance of an organism based on a single trait [multifactorial combination of genetic traits and environmental factors][69], especially used in pedigrees"[70] or any "observable characteristic of an organism, such as its morphological, developmental, biochemical or physiological properties, or its behavior"[71] is called a phenotype.

"The three different phenotypes of α1B observed (designated 1-1, 1-2, and 2-2) were apparently identical to those reported by Altland et al. (1983), who used double one-dimensional electrophoresis. Family data supported the hypothesis that the three α1B phenotypes are determined by two codominant alleles at an autosomal locus, designated A1B. Allele frequencies in a Swedish population were: A1B *1, 0.937; A1B *2, 0.063; PIC, 0.111."[66]

Protein species

"Both protein species of [alpha 1-beta glycoprotein] A1B (A1Ba, p = 0.008; f.c.= +1.62, A1Bb, p = 0.003; f.c. = +1.82) [...] were apparently overexpressed in patients with PTCa [...]."[72]

A1BG is mainly produced in the liver, and is secreted to plasma to levels of approximately 0.22 mg/mL.[44]

CRISPs

The human cysteine-rich secretory protein (CRISP3) "is present in exocrine secretions and in secretory granules of neutrophilic granulocytes and is believed to play a role in innate immunity."[73] CRISP3 has a relatively high content in human plasma.[73]

"The A1BG-CRISP-3 complex is noncovalent with a 1:1 stoichiometry and is held together by strong electrostatic forces."[73] "Similar [complex formation] between toxins from snake venom and A1BG-like plasma proteins ... inhibits the toxic effect of snake venom metalloproteinases or myotoxins and protects the animal from envenomation."[73]

Opossums have a remarkably robust immune system and show partial or total immunity to the venom of rattlesnakes, cottonmouths (Agkistrodon piscivorus), and other pit vipers (Crotalinae).[74][75]

"Crisp3 [is] mainly [expressed] in the salivary glands, pancreas, and prostate."[76] "CRISP3 is highly expressed in the human cauda epididymidis and ampulla of vas deferens (Udby et al. 2005)."[76]

A1BG-AS1

Gene ID: 503538 is A1BG-AS1 A1BG antisense RNA 1.[77] A1BG-AS1 is transcribed in the negative direction from ZSCAN22.[77]

Gene ID 503538 extends from 58,351,390 to 58,355,183. It is a long, non-coding (lnc) RNA.[78] Extensive evidence indicates that long noncoding RNAs (lncRNAs) regulate the tumorigenesis and progression of hepatocellular carcinoma (HCC).[78]
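The extent of A1BG-AS1 can be compared with its neighbors using simple interval arithmetic. The sketch below takes the A1BG and ZNF497 coordinates from the current nucleotide numbering given at the start of this page and treats all coordinates as inclusive; the helper names are illustrative assumptions.

```python
# Minimal sketch: lengths and overlaps of gene spans on chromosome 19 (inclusive coordinates).
GENES = {
    "A1BG": (58_345_183, 58_353_492),
    "A1BG-AS1": (58_351_390, 58_355_183),
    "ZNF497": (58_354_357, 58_362_751),
}

def length(span):
    """Number of nucleotides in an inclusive (start, end) span."""
    return span[1] - span[0] + 1

def overlap(a, b):
    """Return the shared inclusive (start, end) of two spans, or None if disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None

print(length(GENES["A1BG-AS1"]))
for first, second in (("A1BG", "A1BG-AS1"), ("A1BG-AS1", "ZNF497"), ("A1BG", "ZNF497")):
    print(first, second, overlap(GENES[first], GENES[second]))
```

On these coordinates A1BG-AS1 spans 3794 nucleotides, shares 2103 nucleotides with A1BG (58,351,390 to 58,353,492) and 827 nucleotides with ZNF497 (58,354,357 to 58,355,183), while A1BG and ZNF497 themselves do not overlap.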

The underexpression of A1BG-AS1 was found in HCC via analysis of The Cancer Genome Atlas database.[78] A1BG-AS1 expression in HCC was markedly lower than that in noncancerous tissues.[78]

ZNF497

Gene ID: 162968 is ZNF497 zinc finger protein 497.[79] ZNF497 is transcribed in the positive direction from RNA5SP473.[79]

  1. NP_001193938.1 zinc finger protein 497: "Transcript Variant: This variant (2) lacks an alternate exon in the 5' UTR, compared to variant 1. Variants 1 and 2 encode the same protein."[79]
  2. NP_940860.2 zinc finger protein 497: "Transcript Variant: This variant (1) is the longer transcript. Variants 1 and 2 encode the same protein."[79]

Gene ID: 100419840 is LOC100419840 zinc finger protein 446 pseudogene.[80] LOC100419840 may be transcribed in the positive direction from LOC105372483.[80]

Gene ID: 105372483 is LOC105372483 uncharacterized LOC105372483 ncRNA.[81] LOC105372483 is transcribed in the negative direction from LOC100419840.[81]

Gene ID: 106479017 is RNA5SP473 RNA, 5S ribosomal pseudogene 473.[82] RNA5SP473 may be transcribed in the negative direction from ZNF497.[82]

GC contents

Approximately "76% of human core promoters lack TATA-like elements, have a high GC content, and are enriched in Sp1 binding sites."[83]

CpG islands typically occur at or near the transcription start site of genes, particularly housekeeping genes, in vertebrates.[84]

The number of CG or GC pairs near the TSS for A1BG appears to be low: the region between ZSCAN22 and A1BG is 8.2% CG/GC and the region between ZNF497 and A1BG is 15% CG/GC.
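One way to obtain a CG/GC percentage is to count the adjacent nucleotide pairs that read CG or GC and divide by the total number of adjacent pairs. The text does not state how the 8.2% and 15% figures were calculated, so the definition used in this sketch is an assumption and the function name is hypothetical. The ABRE sequence ACGTGGC mentioned earlier on this page serves as a small test case.

```python
# Minimal sketch: percentage of adjacent nucleotide pairs that read CG or GC.
# Assumption: pairs are counted in an overlapping sliding window of width 2.
def cg_gc_percent(seq):
    """Return the percentage of adjacent pairs in seq that are CG or GC."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return 0.0
    hits = sum(1 for pair in pairs if pair in ("CG", "GC"))
    return 100.0 * hits / len(pairs)

# ACGTGGC has six adjacent pairs: AC, CG, GT, TG, GG, GC.
print(round(cg_gc_percent("ACGTGGC"), 1))  # 33.3
```

A different convention, such as counting the nucleotides covered by CG or GC pairs rather than the pairs themselves, would give different percentages for the same sequence.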

19q13.43

Regulatory elements and regions

Functions of A1BG

"Receptors of the leukocyte receptor cluster (LRC) play a range of important functions in the human immune system."[85]

"The leukocyte receptor cluster (LRC) is a family of structurally related genes for immunoregulatory receptors. Originally, the term LRC was introduced to emphasize the linkage of the genes encoding killer immunoglobulin-like receptors (KIRs), leukocyte Ig-like receptors (LILRs), and FcαR on human chromosome 19q13.4 (Wagtmann et al. 1997; Wende et al. 1999). Subsequently, it has been found that the region contains some other structurally related genes, such as NCR1, GPVI, LAIR1, LAIR2, and OSCAR (Meyaard et al. 1997; Sivori et al. 1997; Clemetson et al. 1999; Kim et al. 2002). Most recently, the LRC has been further extended by adding two more genes named VSTM1/SIRL1 and TARM1 (Steevels et al. 2010; Radjabova et al. 2015)."[85]

"Except for LAIR2, which is a secreted protein, all human LRC products are type I cell surface receptors with extracellular regions composed of 1–4 C2-type Ig-like domains."[85]

The "eutherian LRC family, in addition to commonly recognized members, includes two new, IGSF1 and alpha-1-B glycoprotein (A1BG)."[85]

"Nucleotide sequences were retrieved and analyzed using utilities at the NCBI (https://www.ncbi.nlm.nih.gov/, last accessed May 20, 2019) and Ensemble (http://www.ensembl.org, last accessed May 20, 2019) websites."[85]

"In our previous studies, it was observed that the Ig-like domains of the frog and chicken LRC proteins reproducibly showed homology not only to known LRC members but also to the products of four mammalian genes that to our knowledge have never been considered in the phylogenetic analyses of LRC. These genes are VSTM1, TARM1, A1BG, and IGSF1. VSTM1 and TARM1 are the most recently identified members of the human LRC (Steevels et al. 2010; Radjabova et al. 2015). A1BG encodes alpha-1 B glycoprotein, a soluble component of mammalian blood plasma that is known for half a century (Schultze et al. 1963). The protein is composed of five Ig-like domains and has been shown to bind to CRISP-3, a small polypeptide that is present in exocrine secretions of neutrophilic granulocytes and that is believed to play a role in innate immunity (Udby et al. 2004). In the human genome, A1BG maps to 19q13.4 some 3.3 Mb away from GPVI [...]."[85]

"The attribution of IGSF1 and A1BG domains to the LRC was supported by their 3D structures predicted using homology modeling [...]."[85]

"Noteworthy is that the D1 and D6 domains of IgSF1 fall into one clade with the N-terminal (d1) domains of A1BG and OSCAR (cluster B1). Closer relationship of A1BG and OSCAR was supported by clustering of the d2–d5 domains of A1BG with membrane-proximal (d2) domain of OSCAR (cluster B2)."[85]

"Altogether, these results support the attribution of IGSF1 and A1BG to the LRC and suggest their relatedness to OSCAR, TARM1, and VSTM1."[85]

"Clustering of the N-terminal domains of OSCAR, IGSF1, and A1BG with each other and with IGSF1 d6 was also reproduced. Finally, the d2 domains of OSCAR cluster with the d2–d5 domains of A1BG (fig. 5). These results further justify grouping IGSF1, A1BG, OSCAR, TARM1, and VSTM1 into a distinct group B."[85]

Hypotheses

  1. Downstream core promoters may work as transcription factors even as their complements or inverses.
  2. In addition to the DNA binding sequences listed above, the transcription factors that can open up and attach through the local epigenome need to be known and specified.
  3. Each DNA binding domain serving as a transcription factor for the promoter of any immunoglobulin supergene family member also serves or is present in the promoters for A1BG.
  4. The function of A1BG is the same as other immunoglobulin genes possessing the immunoglobulin domain cl11960 and/or any of three immunoglobulin-like domains: pfam13895, cd05751 and smart00410 in the order and nucleotide sequence: cd05751 Location: 401 → 493, smart00410 Location: 218 → 280, pfam13895 Location: 210 → 301 and cl11960 Location: 28 → 110.

See also

References

  1. "Entrez Gene: Alpha-1-B glycoprotein". Retrieved 2012-11-09.
  2. "A1BG alpha-1-B glycoprotein". Retrieved May 10, 2013.
  3. Qingliang Li, Rezaul M. Karim, Mo Cheng, Mousumi Das, Lihong Chen, Chen Zhang, Harshani R. Lawrence, Gary W. Daughdrill, Ernst Schonbrunn, Haitao Ji and Jiandong Chen (July 2020). "Inhibition of p53 DNA binding by a small molecule protects mice from radiation toxicity". Oncogene. 39 (29): 5187–5200. doi:10.1038/s41388-020-1344-y. PMID 32555331. Retrieved 29 August 2020.
  4. Ruoyi Gu, Jun Xu, Yixiang Lin, Jing Zhang, Huijun Wang, Wei Sheng, Duan Ma, Xiaojing Ma & Guoying Huang (July 2016). "Liganded retinoic acid X receptor α represses connexin 43 through a potential retinoic acid response element in the promoter region". Pediatric Research. 80 (1): 159–168. doi:10.1038/pr.2016.47. PMID 26991262. Retrieved 7 September 2020.
  5. U.S. National Library of Medicine (8 July 2008). "Response Elements MeSH Descriptor Data 2021". 8600 Rockville Pike, Bethesda, MD 20894: National Institutes of Health. Retrieved 22 April 2021.
  6. Benjamin A. Pierce (24 December 2004). Control of Gene Expression, In: Genetics Solutions and Problem Solving MegaManual. Macmillan. p. 221. Retrieved 22 April 2021.
  7. Tania Islas-Flores, Gabriel Guillén, Xóchitl Alvarado-Affantranger, Miguel Lara-Flores, Federico Sánchez, and Marco A. Villanueva (2011). "PvRACK1 Loss-of-Function Impairs Cell Expansion and Morphogenesis in Phaseolus vulgaris L. Root Nodules". Molecular Plant-Microbe Interactions. 24 (7): 819–826. doi:10.1094/MPMI-11-10-0261. Retrieved 25 April 2021.
  8. Nijhawan A, Jain M, Tyagi AK, Khurana JP (February 2008). "Genomic survey and gene expression analysis of the basic leucine zipper transcription factor family in rice". Plant Physiology. 146 (2): 333–50. doi:10.1104/pp.107.112821. PMID 18065552.
  9. Randy Foster, Takeshi Izawa and Nam-Hai Chua (1 February 1994). "Plant bZIP proteins gather at ACGT elements". FASEB. 8 (2): 192–200. doi:10.1096/fasebj.8.2.8119490. PMID 8119490. Retrieved 25 June 2021.
  10. Ganesh M. Nawkar, Chang Ho Kang, Punyakishore Maibam, Joung Hun Park, Young Jun Jung, Ho Byoung Chae, Yong Hun Chi, In Jung Jung, Woe Yeon Kim, Dae-Jin Yun, and Sang Yeol Lee (21 February 2017). "HY5, a positive regulator of light signaling, negatively controls the unfolded protein response in Arabidopsis" (PDF). Proceedings of the National Academy of Sciences USA. 114 (8): 2084–89. doi:10.1073/pnas.1609844114. Retrieved 24 June 2021.
  11. RefSeq (January 2016). "MYB MYB proto-oncogene, transcription factor [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 7 February 2021.
  12. Matthew J. Rossi, William K.M. Lai and B. Franklin Pugh (21 March 2018). "Genome-wide determinants of sequence-specific DNA binding of general regulatory factors". Genome Research. 28: 497–508. doi:10.1101/gr.229518.117. PMID 29563167. Retrieved 31 August 2020.
  13. Dalei Shao, Caretha L. Creasy, Lawrence W. Bergman (1 February 1998). "A cysteine residue in helixII of the bHLH domain is essential for homodimerization of the yeast transcription factor Pho4p". Nucleic Acids Research. 26 (3): 710–4. doi:10.1093/nar/26.3.710. PMC 147311. PMID 9443961.
  14. Hongting Tang, Yanling Wu, Jiliang Deng, Nanzhu Chen, Zhaohui Zheng, Yongjun Wei, Xiaozhou Luo, and Jay D. Keasling (6 August 2020). "Promoter Architecture and Promoter Engineering in Saccharomyces cerevisiae". Metabolites. 10 (8): 320–39. doi:10.3390/metabo10080320. PMID 32781665. Retrieved 18 September 2020.
  15. Chaudhary J, Skinner MK (1999). "Basic helix-loop-helix proteins can act at the E-box within the serum response element of the c-fos promoter to influence hormone-induced promoter activation in Sertoli cells". Mol. Endocrinol. 13 (5): 774–86. doi:10.1210/mend.13.5.0271. PMID 10319327.
  16. Nibedita Lenka, Aruna Basu, Jayati Mullick, and Narayan G. Avadhani (22 November 1996). "The role of an E box binding basic helix loop helix protein in the cardiac muscle-specific expression of the rat cytochrome oxidase subunit VIII gene" (PDF). The Journal of Biological Chemistry. 271 (47): 30281–30289. doi:10.1074/jbc.271.47.30281. Retrieved 7 February 2019.
  17. Hoek KS, Schlegel NC, Eichhoff OM, Widmer DS, Praetorius C, Einarsson SO, Valgeirsdottir S, Bergsteinsdottir K, Schepsky A, Dummer R, Steingrimsson E (2008). "Novel MITF targets identified using a two-step DNA microarray strategy". Pigment Cell Melanoma Res. 21 (6): 665–76. doi:10.1111/j.1755-148X.2008.00505.x. PMID 19067971.
  18. Ravi P. Misra; Azad Bonni; Cindy K. Miranti; Victor M. Rivera; Morgan Sheng; Michael E.Greenberg (14 October 1994). "L-type Voltage-sensitive Calcium Channel Activation Stimulates Gene Expression by a Serum Response Factor-dependent Pathway" (PDF). The Journal of Biological Chemistry. 269 (41): 25483–25493. PMID 7929249. Retrieved 7 December 2019.
  19. Xiaomin Zhang, Gohar Azhar, Jeanne Y. Wei (21 December 2017). "SIRT2 gene has a classic SRE element, is a downstream target of serum response factor and is likely activated during serum stimulation". PLOS One. 12 (12): e0190011. doi:10.1371/journal.pone.0190011. Retrieved 23 February 2021.
  20. Amanda Salviano-Silva, Maria Luiza Petzl-Erler & Angelica Beate Winter Boldt (29 April 2017). "CD59 polymorphisms are associated with gene expression and different sexual susceptibility to pemphigus foliaceus". Autoimmunity. 50 (6): 377–385. doi:10.1080/08916934.2017.1329830. Retrieved 27 September 2021.
  21. Svoboda, P., & Cara, A. (2006). Hairpin RNA: A secondary structure of primary importance. Cellular and Molecular Life Sciences, 63(7), 901-908.
  22. Meyer, Michelle; Deiorio-Haggar K; Anthony J (July 2013). "RNA structures regulating ribosomal protein biosynthesis in bacilli". RNA Biology. 10 (7): 1160–1164. doi:10.4161/rna.24151. PMID 23611891.
  23. Malys N, Nivinskas R (2009). "Non-canonical RNA arrangement in T4-even phages: accommodated ribosome binding site at the gene 26-25 intercistronic junction". Mol Microbiol. 73 (6): 1115–1127. doi:10.1111/j.1365-2958.2009.06840.x. PMID 19708923.
  24. Malys N, McCarthy JEG (2010). "Translation initiation: variations in the mechanism can be anticipated". Cellular and Molecular Life Sciences. 68 (6): 991–1003. doi:10.1007/s00018-010-0588-z. PMID 21076851.
  25. Pabo CO, Peisach E, Grant RA (2001). "Design and selection of novel Cys2His2 zinc finger proteins". Annual Review of Biochemistry. 70: 313–40. doi:10.1146/annurev.biochem.70.1.313. PMID 11395410.
  26. NCBI (9 March 2021). "AP-1 transcription factor network". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 26 October 2021.
  27. Toshifumi Nagata, Aeni Hosaka-Sasaki and Shoshi Kikuchi (2016). Daniel H. Gonzalez, ed. The Evolutionary Diversification of Genes that Encode Transcription Factor Proteins in Plants, In: Plant Transcription Factors Evolutionary, Structural and Functional Aspects. Academic Press. pp. 73–97. doi:10.1016/B978-0-12-800854-6.00005-1. ISBN 978-0-12-800854-6. Retrieved 28 November 2021.
  28. Marcos Palavecino-Ruiz, Mariana Bermudez-Moretti, Susana Correa-Garcia (1 November 2017). "Unravelling the transcriptional regulation of Saccharomyces cerevisiae UGA genes: the dual role of transcription factor LEU3" (PDF). Microbiology. doi:10.1099/mic.0.000560. Retrieved 21 February 2021.
  29. James R. Mitchell, Jeffrey Cheng, and Kathleen Collins (January 1999). "A Box H/ACA Small Nucleolar RNA-Like Domain at the Human Telomerase RNA 3' End" (PDF). Molecular and Cellular Biology. 19 (1): 567–576. Retrieved 5 November 2018.
  30. IUCN SSC Amphibian Specialist Group (2019). "Geotrypetes seraphini". 2019: e.T59557A16957715. doi:10.2305/IUCN.UK.2019-1.RLTS.T59557A16957715.en. Retrieved 16 November 2021.
  31. HGNC (13 March 2020). "ZSCAN22 zinc finger and SCAN domain containing 22 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  32. RefSeq (10 September 2009). "MIR6806 microRNA 6806 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  33. Jag123 (7 March 2005). "antigen". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  34. SemperBlotto (21 April 2008). "immunogen". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 8 March 2020.
  35. C. Michael Gibson (27 April 2008). "Antigen". Boston, Massachusetts: WikiDoc Foundation. Retrieved 8 March 2020.
  36. Williamsayers79 (26 February 2007). "antibody". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  37. Jag123 (7 March 2005). "antibody". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  38. Eleonora Market, F. Nina Papavasiliou (2003). "V(D)J Recombination and the Evolution of the Adaptive Immune System". PLoS Biology. 1 (1): e16. doi:10.1371/journal.pbio.0000016.
  39. Charles A Janeway, Jr, Paul Travers, Mark Walport, and Mark J Shlomchik (2001). Immunobiology (5th ed.). Garland Publishing. ISBN 0-8153-3642-X.
  40. SemperBlotto (25 February 2006). "immunoglobulin". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  41. SemperBlotto (28 April 2008). "immunoglobulin". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  42. 42.0 42.1 42.2 42.3 RefSeq (10 December 2019). "A1BG alpha-1-B glycoprotein [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  43. Mei Tian, Ya-Zhou Cui, Guan-Hua Song, Mei-Juan Zong, Xiao-Yan Zhou, Yu Chen, Jin-Xiang Han (2008). "Proteomic analysis identifies MMP-9, DJ-1 and A1BG as overexpressed proteins in pancreatic juice from pancreatic ductal adenocarcinoma patients". BMC Cancer. 8: 241. doi:10.1186/1471-2407-8-241. PMC 2528014. PMID 18706098.
  44. 44.0 44.1 44.2 44.3 44.4 44.5 44.6 Noriaki Ishioka, Nobuhiro Takahashi, and Frank W. Putnam (April 1986). "Amino acid sequence of human plasma α1B-glycoprotein: Homology to the immunoglobulin supergene family" (PDF). Proceedings of the National Academy of Sciences USA. 83 (8): 2363–7. doi:10.1073/pnas.83.8.2363. PMID 3458201. Retrieved 9 March 2020.
  45. 45.0 45.1 Katrina M. Morris, Denis O’Meally, Thiri Zaw, Xiaomin Song, Amber Gillett, Mark P. Molloy, Adam Polkinghorne, and Katherine Belov (7 October 2016). "Characterisation of the immune compounds in koala milk using a combined transcriptomic and proteomic approach". Scientific Reports. 6: 35011. doi:10.1038/srep35011. PMID 27713568. Retrieved 14 March 2020.
  46. R. J. Paxton, G. Mooser, H. Pande, T. D. Lee, and J. E. Shively (1 February 1987). "Sequence analysis of carcinoembryonic antigen: identification of glycosylation sites and homology with the immunoglobulin supergene family" (PDF). Proceedings of the National Academy of Sciences USA. 84 (4): 920–924. doi:10.1073/pnas.84.4.920. PMID 3469650. Retrieved 26 March 2020.
  47. NCBI (2 February 2016). "Conserved Protein Domain Family cl11960: Ig Superfamily". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 22 May 2020.
  48. NCBI (5 August 2015). "Conserved Protein Domain Family pfam13895: Ig_2". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 24 May 2020.
  49. NCBI (16 August 2016). "Conserved Protein Domain Family cd05751: Ig1_LILR_KIR_like". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 24 May 2020.
  50. NCBI (16 January 2013). "Conserved Protein Domain Family smart00410: IG_like". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 24 May 2020.
  51. 24.98.118.180 (28 February 2007). "species". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  52. 52.0 52.1 Peter coxhead (22 August 2018). "Species". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  53. Chiswick Chap (1 December 2016). "Species". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  54. 54.0 54.1 54.2 54.3 "AceView: A1BG". Retrieved May 11, 2013.
  55. Pdeitiker (26 July 2008). "variant". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  56. SemperBlotto (6 January 2007). "isoform". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2 December 2018.
  57. 72.178.245.181 (30 November 2008). "isoform". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2 December 2018.
  58. H Eiberg, ML Bisgaard, J Mohr (1 December 1989). "Linkage between alpha 1B-glycoprotein (A1BG) and Lutheran (LU) red blood group system: assignment to chromosome 19: new genetic variants of A1BG". Clinical Genetics. 36 (6): 415–8. PMID 2591067. Retrieved 2017-10-08.
  59. John R. Stehle Jr., Mark E. Weeks, Kai Lin, Mark C. Willingham, Amy M. Hicks, John F. Timms, Zheng Cui (January 2007). "Mass spectrometry identification of circulating alpha-1-B glycoprotein, increased in aged female C57BL/6 mice". Biochimica et Biophysica Acta (BBA) - General Subjects. 1770 (1): 79–86. doi:10.1016/j.bbagen.2006.06.020. PMID 16945486. Retrieved 2017-10-08.
  60. 60.0 60.1 60.2 60.3 60.4 Caitrin W. McDonough, Yan Gong, Sandosh Padmanabhan, Ben Burkley, Taimour Y. Langaee, Olle Melander, Carl J. Pepine, Anna F. Dominiczak, Rhonda M. Cooper-DeHoff, and Julie A. Johnson (June 2013). "Pharmacogenomic Association of Nonsynonymous SNPs in SIGLEC12, A1BG, and the Selectin Region and Cardiovascular Outcomes" (PDF). Hypertension. 62 (1): 48–54. doi:10.1161/HYPERTENSIONAHA.111.00823. PMID 23690342. Retrieved 2017-10-08.
  61. DTLHS (10 January 2018). "genotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  62. SemperBlotto (22 October 2005). "genotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  63. Widsith (28 March 2012). "polymorphism". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  64. 217.105.66.98 (8 September 2016). "allele". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  65. 138.130.33.215 (7 April 2004). "allele". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  66. 66.0 66.1 B. Gahne, R. K. Juneja, and A. Stratil (June 1987). "Genetic polymorphism of human plasma alpha 1B-glycoprotein: phenotyping by immunoblotting or by a simple method of 2-D electrophoresis". Human Genetics. 76 (2): 111–5. doi:10.1007/bf00284904. PMID 3610142. Retrieved 25 March 2020.
  67. R.K. Juneja, G. Beckman, M. Lukka, B. Gahne, and C. Ehnholm (1989). "Plasma α1B-Glycoprotein Allele Frequencies in Finns and Swedish Lapps: Evidence for a New α1B Allele". Human Heredity. 39 (1): 32–36. doi:10.1159/000153828. PMID 2759622. Retrieved 25 March 2020.
  68. 68.0 68.1 R.K. Juneja, N. Saha, B. Gahne and J.S.H. Tay (1989). "Distribution of Plasma Alpha-1-B-Glycoprotein Phenotypes in Several Mongoloid Populations of East Asia". Human Heredity. 39: 218–222. doi:10.1159/000153863. PMID 2583734. Retrieved 25 March 2020.
  69. 24.235.196.118 (23 September 2007). "phenotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-10-04.
  70. SemperBlotto (14 February 2005). "phenotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-10-04.
  71. N2e (3 July 2008). "phenotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-10-04.
  72. Mardiaty Iryani Abdullah, Ching Chin Lee, Sarni Mat Junit, Khoon Leong Ng, and Onn Haji Hashim (13 September 2016). "Tissue and serum samples of patients with papillary thyroid cancer with and without benign background demonstrate different altered expression of proteins". PeerJ. 4: e2450. doi:10.7717/peerj.2450. PMID 27672505. Retrieved 15 March 2020.
  73. 73.0 73.1 73.2 73.3 Udby L, Sørensen OE, Pass J, Johnsen AH, Behrendt N, Borregaard N, Kjeldsen L. (12 October 2004). "Cysteine-rich secretory protein 3 is a ligand of alpha1B-glycoprotein in human plasma". Biochemistry. 43 (40): 12877–86. doi:10.1021/bi048823e. PMID 15461460. Retrieved 2011-11-28.
  74. "The Opossum: Our Marvelous Marsupial, The Social Loner". Wildlife Rescue League.
  75. Journal Of Venomous Animals And Toxins – Anti-Lethal Factor From Opossum Serum Is A Potent Antidote For Animal, Plant And Bacterial Toxins. Retrieved 2009-12-29.
  76. 76.0 76.1 B Haendler, J Krätzschmar, F Theuring and W D Schleuning (July 1993). "Transcripts for cysteine-rich secretory protein-1 (CRISP-1; DE/AEG) and the novel related CRISP-3 are expressed under androgen control in the mouse salivary gland". Endocrinology. 133 (1): 192–8. doi:10.1210/en.133.1.192. PMID 8319566. Retrieved 2012-02-20.
  77. 77.0 77.1 HGNC (10 December 2019). "A1BG-AS1 A1BG antisense RNA 1 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  78. 78.0 78.1 78.2 78.3 Jigang Bai, Bowen Yao, Liang Wang, Liankang Sun, Tianxiang Chen, Runkun Liu, Guozhi Yin, Qiuran Xu, Wei Yang (June 2019). "lncRNA A1BG-AS1 suppresses proliferation and invasion of hepatocellular carcinoma cells by targeting miR-216a-5p". Journal of Cellular Biochemistry. 120 (6): 10310–10322. doi:10.1002/jcb.28315. PMID 30556161. Retrieved 16 May 2023.
  79. 79.0 79.1 79.2 79.3 HGNC (10 December 2019). "ZNF497 zinc finger protein 497 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  80. 80.0 80.1 HGNC (10 December 2019). "LOC100419840 zinc finger protein 446 pseudogene [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  81. 81.0 81.1 HGNC (10 December 2019). "LOC105372483 uncharacterized LOC105372483 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  82. 82.0 82.1 HGNC (10 December 2019). "RNA5SP473 RNA, 5S ribosomal pseudogene 473 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  83. Chuhu Yang, Eugene Bolotin, Tao Jiang, Frances M. Sladek, Ernest Martinez (2007). "Prevalence of the initiator over the TATA box in human and yeast genes and identification of DNA motifs enriched in human TATA-less core promoters". Gene. 389 (1): 52–65. doi:10.1016/j.gene.2006.09.029. PMID 17123746.
  84. Saxonov S, Berg P, Brutlag DL (2006). "A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters". Proc Natl Acad Sci USA. 103 (5): 1412–1417. doi:10.1073/pnas.0510310103. PMC 1345710. PMID 16432200.
  85. 85.0 85.1 85.2 85.3 85.4 85.5 85.6 85.7 85.8 85.9 Sergey V Guselnikov and Alexander V Taranin (1 June 2019). "Unraveling the LRC Evolution in Mammals: IGSF1 and A1BG Provide the Keys". Genome Biology and Evolution. 11 (6): 1586–1601. doi:10.1093/gbe/evz102. PMID 31106814.

External links

{{Phosphate biochemistry}}