Complex locus A1BG and ZNF497

| accessdate = 2012-11-09 }}</ref> The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins.


A1BG is located on the negative DNA strand of [[chromosome 19]].<ref name=A1BG>{{ cite web
|title=A1BG alpha-1-B glycoprotein
|url=https://www.ncbi.nlm.nih.gov/gene/1
|accessdate=May 10, 2013 }}</ref> Additionally, A1BG, in current nucleotide numbering (58,345,183–58,353,492), is located adjacent to the ZSCAN22 gene (58,326,994–58,342,332) on the positive DNA strand, as well as the ZNF837 (58,367,623–58,381,030, complement) and ZNF497 (58,354,357–58,362,751, complement) genes on the negative strand.<ref name=A1BG/>
 
In the current nucleotide numbering, the A1BG untranslated region (UTR) has been expanded. With ZSCAN22 ending at 58,342,332, the nucleotides used in this study on that side are 58,342,333 to 58,346,892 on both strands; the current UTR for A1BG begins at 58,345,183. On the other side, with A1BG ending at 58,353,492, the nucleotides used are 58,353,493 to 58,357,937. Because ZNF497 begins at 58,354,357, this study extends into ZNF497 as far as 58,357,937, which is 3580 nucleotides from its downstream TSS, or 4445 nucleotides from the TSS of A1BG downstream from ZNF497.
 
For example, an abscisic acid responsive element (ABRE) with the consensus sequence ACGTG(G/T)C (Watanabe ''et al''. 2017) occurs on the positive strand, in the negative direction from ZSCAN22 to A1BG, as ACGTGGC. It ends 4239 nucleotides from the end of ZSCAN22, at 58,346,571, and its first nucleotide (A) lies at 58,346,565, inside the UTR of A1BG.
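
The coordinate arithmetic in the two preceding paragraphs can be checked with a short sketch. The coordinates are the ones quoted in the text; the constant and function names are illustrative assumptions and do not come from any published tool.

```python
# Coordinates quoted in the text (current chromosome 19 numbering).
ZSCAN22_END = 58_342_332
A1BG_UTR_START = 58_345_183   # start of the expanded A1BG UTR
A1BG_END = 58_353_492
ZNF497_START = 58_354_357
SAMPLED_END = 58_357_937      # last nucleotide sampled on the ZNF497 side


def offset_to_coordinate(anchor: int, offset: int) -> int:
    """Convert an offset counted from an anchor nucleotide into a chromosome coordinate."""
    return anchor + offset


def motif_start(end_coordinate: int, motif: str) -> int:
    """Return the coordinate of a motif's first nucleotide, given the coordinate of its last."""
    return end_coordinate - len(motif) + 1


# ABRE instance ACGTGGC ending 4239 nucleotides past the end of ZSCAN22.
abre_end = offset_to_coordinate(ZSCAN22_END, 4239)
abre_start = motif_start(abre_end, "ACGTGGC")

print(abre_end, abre_start)            # 58346571 58346565
print(SAMPLED_END - ZNF497_START)      # 3580
print(SAMPLED_END - A1BG_END)          # 4445
print(abre_start >= A1BG_UTR_START)    # True: the motif starts inside the A1BG UTR
```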
 
==Introduction==
 
"Many important disease-related pathways utilize transcription factors that specifically bind DNA (e.g., c-Myc, HIF-1, TCF1, p53) as key nodes or endpoints in complex signaling networks. In such cases the transcription factor itself is often the most attractive target. However, drugging transcription factors is challenging owing to an absence of small ligand binding sites in their DNA-binding domain and the presence of a highly charged DNA-binding surface [1]."<ref name=Li2020>{{ cite journal
|author=Qingliang Li, Rezaul M. Karim, Mo Cheng, Mousumi Das, Lihong Chen, Chen Zhang, Harshani R. Lawrence, Gary W. Daughdrill, Ernst Schonbrunn, Haitao Ji and Jiandong Chen
|title=Inhibition of p53 DNA binding by a small molecule protects mice from radiation toxicity
|journal=Oncogene
|date=July 2020
|volume=39
|issue=29
|pages=5187-5200
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7398576/
|arxiv=
|bibcode=
|doi=10.1038/s41388-020-1344-y
|pmid=32555331
|accessdate=29 August 2020 }}</ref>
 
If a specific gene appears to be involved in a disease-related or deleterious pathway, it may be necessary to alter its expression to improve the person's health. Altering its expression constructively may require knowing which regulatory elements exist in the gene's nearby promoters.
 
==Response elements==
{{main|A1BG response element gene transcriptions}}
Identifying a bona fide response element takes more than simple inspection of the sequence. To attribute a response element to a candidate sequence, observations have to be made using molecular, biological and biophysical methods together with functional approaches. Such findings may indicate that a response element in the promoter is a functional element.<ref name=Gu>{{ cite journal
|author=Ruoyi Gu, Jun Xu, Yixiang Lin, Jing Zhang, Huijun Wang, Wei Sheng, Duan Ma, Xiaojing Ma & Guoying Huang
|title=Liganded retinoic acid X receptor α represses connexin 43 through a potential retinoic acid response element in the promoter region
|journal=Pediatric Research
|date=July 2016
|volume=80
|issue=1
|pages=159-168
|url=https://www.nature.com/articles/pr201647
|arxiv=
|bibcode=
|doi=10.1038/pr.2016.47
|pmid=26991262
|accessdate=7 September 2020 }}</ref>
 
A likely response element found by simple inspection may also be inactive due to methylation.
 
Response Elements: "Nucleotide sequences, usually upstream, which are recognized by specific regulatory transcription factors, thereby causing gene response to various regulatory agents. These elements may be found in both promoter and enhancer regions."<ref name=MeSHNote99>{{ cite web
|author=U.S. National Library of Medicine
|title=Response Elements MeSH Descriptor Data 2021
|publisher=National Institutes of Health
|location=8600 Rockville Pike, Bethesda, MD 20894
|date=8 July 2008
|url=https://meshb.nlm.nih.gov/record/ui?name=response%20element
|accessdate=22 April 2021 }}</ref>
 
"Under conditions of stress, a transcription activator protein binds to the response element and stimulates transcription. If the same response element sequence is located in the control regions of different genes, then these genes will be activated by the same stimuli, thus producing a coordinated response."<ref name=Pierce>{{ cite book
|author=Benjamin A. Pierce
|title=Control of Gene Expression, In: ''Genetics Solutions and Problem Solving MegaManual''
|publisher=Macmillan
|location=
|date=24 December 2004
|editor=
|pages=221
|url=https://books.google.com/books?id=sUaIpEvX9noC&pg=PA221&lpg=PA221&source=bl&ots=14s3Xszdsw&sig=ACfU3U0VV4HsN4ekDRZJO83hxj9QidiZ8w&hl=en&sa=X&ved=2ahUKEwisu8jByJPwAhXI8p4KHRgzDx4Q6AEwBHoECAEQAw#v=onepage&f=false
|arxiv=
|bibcode=
|doi=
|pmid=
|isbn=
|accessdate=22 April 2021 }}</ref>
 
===WD-40 repeat family===
{{main|WD-40 repeat family}}
"Receptor for activated C kinase (RACK1) is a highly conserved, eukaryotic protein of the WD-40 repeat family. [...] During ''Phaseolus vulgaris'' root development, RACK1 (PvRACK1) mRNA expression was induced by auxins, abscisic acid, cytokinin, and gibberellic acid."<ref name=Flores>{{ cite journal
|author=Tania Islas-Flores, Gabriel Guillén, Xóchitl Alvarado-Affantranger, Miguel Lara-Flores, Federico Sánchez, and Marco A. Villanueva
|title=PvRACK1 Loss-of-Function Impairs Cell Expansion and Morphogenesis in ''Phaseolus vulgaris'' L. Root Nodules
|journal=Molecular Plant-Microbe Interactions
|date=2011
|volume=24
|issue=7
|pages=819-826
|url=https://apsjournals.apsnet.org/doi/pdfplus/10.1094/MPMI-11-10-0261
|arxiv=
|bibcode=
|doi=10.1094/MPMI-11-10-0261
|pmid=
|accessdate=25 April 2021 }}</ref>
 
====Abscisic acid (ABA) response elements====
{{main|ABA-response element gene transcriptions#ABRE samplings}}
 
====Auxin response factors====
{{main|Auxin response factor gene transcriptions}}
 
=====ARFUs=====
{{main|Auxin response factor gene transcriptions#TGTCTC (Ulmasov) ARFbs samplings}}
 
=====ARFBs=====
{{main|Auxin response factor gene transcriptions#TGTCGG (Boer) ARFbs samplings}}
 
=====ARF2s=====
{{main|Auxin response factor gene transcriptions#ARF (Stigliani) samplings}}
 
=====ARF5s=====
{{main|Auxin response factor gene transcriptions#ARF5 samplings}}
 
====CAACTC regulatory elements====
{{main|CARE gene transcriptions}}
 
=====CAREs (Fan)=====
{{main|CARE gene transcriptions#CARE (Fan) sampling of A1BG promoters}}
 
=====CAREs (Garaeva)=====
{{main|CARE gene transcriptions#CARE (Garaeva) samplings}}
 
====Cytokinins====
{{main|Cytokinin response regulator gene transcriptions}}
 
=====ARR1s=====
{{main|Cytokinin response regulator gene transcriptions#ARR1 Cytokinin samplings}}
 
=====ARR10s=====
{{main|Cytokinin response regulator gene transcriptions#ARR10 Cytokinin samplings}}
 
=====ARR12s=====
{{main|Cytokinin response regulator gene transcriptions#ARR12 Cytokinin samplings}}
 
=====ARRFs=====
{{main|Cytokinin response regulator gene transcriptions#ARR (Ferreira) samplings}}
 
=====ARRR1s=====
{{main|Cytokinin response regulator gene transcriptions#ARR (Rashotte1) samplings}}
 
=====ARRR2s=====
{{main|Cytokinin response regulator gene transcriptions#ARR (Rashotte2) samplings}}
 
====Coupling elements====
{{main|Coupling element gene transcriptions}}
 
=====CE3Ws=====
{{main|Coupling element gene transcriptions#CE3 (Watanabe) samplings}}
 
=====CE3Ds=====
{{main|Coupling element gene transcriptions#CE3 (Ding) samplings}}
 
====EREs====
{{main|Ethylene responsive element gene transcriptions#ERE samplings}}
 
====Gibberellic acid response elements====
{{main|GARE gene transcriptions}}
 
=====GAREs=====
{{main|GARE gene transcriptions#GARE sampling of A1BG promoters}}
 
=====GAREL1s=====
{{main|GARE gene transcriptions#GARE-like 1 samplings}}
 
====Hypoxia response elements====
{{main|Hypoxia response element gene transcriptions}}
 
=====HIFs=====
{{main|Hypoxia response element gene transcriptions#Hypoxia-inducible factor samplings}}
 
=====HREs=====
{{main|Hypoxia response element gene transcriptions#Hypoxia response element samplings}}
 
=====CACAs=====
{{main|Hypoxia response element gene transcriptions#CACA samplings}}
 
====Pyrimidine boxes====
{{main|Pyrimidine box gene transcriptions|Nuclear factor of activated T cell gene transcriptions (NFAT)}}
 
====TAT boxes====
{{main|TAT box gene transcriptions}}
 
=====TATFs=====
{{main|TAT box gene transcriptions#TAT box (Fan) samplings}}
 
=====TATYs=====
{{main|TAT box gene transcriptions#TAT box (Yang) samplings}}
 
===General Regulatory Factors===
{{main|General regulatory factors}}
The following general regulatory factors occur in the promoters between ZSCAN22, A1BG and ZNF497 on human chromosome 19.
 
====Abfms====
{{main|Abf1 regulatory factor gene transcriptions}}
 
====Rap1s====
{{main|Rap1 regulatory factor gene transcriptions}}
 
====Reb1s====
{{main|Reb1 general regulatory factor gene transcriptions}}
 
====Tbf1s====
{{main|Tbf1 regulatory factor gene transcriptions}}
 
===Basic leucine zipper (bZIP) class response elements===
 
====A-boxes====
{{main|A box gene transcriptions}}
 
====ACGTs====
{{main|ACGT-containing element gene transcriptions#ACGT samplings}}
"A majority of the plant bZIP proteins isolated to date recognize elements with an ACGT core (Foster et al., 1994)."<ref name=Nijhawan>{{ cite journal | author = Nijhawan A, Jain M, Tyagi AK, Khurana JP
| title = Genomic survey and gene expression analysis of the basic leucine zipper transcription factor family in rice
| journal = Plant Physiology
| volume = 146
| issue = 2
| pages = 333–50
| date = February 2008
| pmid = 18065552
| doi = 10.1104/pp.107.112821 }}</ref>
 
"Most recombinant bZIP proteins can interact with ACGT elements derived from different plant genes, albeit with different affinity. Systematic protein/DNA binding studies have shown that sequences flanking the ACGT core affect bZIP protein binding specificity. These studies have provided the basis for a concise ACGT nomenclature and defined high-affinity A-box, C-box, and G-box elements."<ref name=Foster>{{ cite journal
|author=Randy Foster, Takeshi Izawa and Nam-Hai Chua
|title=Plant bZIP proteins gather at ACGT elements
|journal=The FASEB Journal
|date=1 February 1994
|volume=8
|issue=2
|pages=192-200
|url=https://faseb.onlinelibrary.wiley.com/doi/pdfdirect/10.1096/fasebj.8.2.8119490
|arxiv=
|bibcode=
|doi=10.1096/fasebj.8.2.8119490
|pmid=8119490
|accessdate=25 June 2021 }}</ref>
 
"HY5 binds to the promoter of light-responsive genes featuring [[ACGT-containing element gene transcriptions|"ACGT-containing elements"]] such as the G-box (CACGTG), C-box (GACGTC), Z-box (ATACGGT), and A-box (TACGTA) (4, 6)."<ref name=Nawkar>{{ cite journal
|author=Ganesh M. Nawkar, Chang Ho Kang, Punyakishore Maibam, Joung Hun Park, Young Jun Jung, Ho Byoung Chae, Yong Hun Chi, In Jung Jung, Woe Yeon Kim, Dae-Jin Yun, and Sang Yeol Lee
|title=HY5, a positive regulator of light signaling, negatively controls the unfolded protein response in ''Arabidopsis''
|journal=Proceedings of the National Academy of Sciences USA
|date=21 February 2017
|volume=114
|issue=8
|pages=2084-89
|url=https://www.pnas.org/content/pnas/114/8/2084.full.pdf
|arxiv=
|bibcode=
|doi=10.1073/pnas.1609844114
|pmid=
|accessdate=24 June 2021 }}</ref>
 
====Activating transcription factors====
{{main|Activating transcription factor gene transcriptions}}
 
=====ATFBs=====
{{main|Activating transcription factor gene transcriptions#Activating transcription factor samplings (Burton)}}
 
=====ATFKs=====
{{main|Activating transcription factor gene transcriptions#Activating transcription factor samplings (Kilberg)}}
 
====Affinity Capture-Western; Two-hybrid transcription factors====
{{main|Aft1p gene transcriptions}}
 
=====AFTs=====
{{main|Aft1p gene transcriptions#AFT1 samplings}}
 
====Box As====
{{main|A box gene transcriptions#Box A samplings}}
 
====C-boxes====
{{main|C box gene transcriptions}}
C-boxes come in several varieties:
 
=====C-boxes (Johnson)=====
{{main|C box gene transcriptions#Johnson C-box samplings}}
 
=====C boxes (Samarsky)=====
{{main|C box gene transcriptions#Samarsky C box samplings}}
 
=====C boxes (Voronina)=====
{{main|C box gene transcriptions#Voronina C box samplings}}
 
=====C boxes (Song)=====
{{main|C box gene transcriptions#Song C-box samplings}}
 
=====C boxes (Song hybrids)=====
{{main|C box gene transcriptions#Hybrid C, G box samplings}}
Hybrids: C/A-box (TGACGTAT), C/G-box (TGACGTGT), C/T-box (TGACGTTA).
 
====CAMPs====
{{main|CRE box gene transcriptions#CRE samplings of the A1BG promoters}}
 
====ESRE====
{{main|Endoplasmic reticulum stress response element gene transcriptions}}
The endoplasmic reticulum stress response element (ESRE) has two parts, (1) CCAAT and (2) CCACG, which are sampled separately and then compared to see whether any pair is separated by exactly nine nucleotides.
 
=====CCAAT=====
{{main|Endoplasmic reticulum stress response element gene transcriptions#CCAAT samplings}}
 
=====CCACG=====
{{main|Endoplasmic reticulum stress response element gene transcriptions#CCACG samplings}}
 
According to So (2018), the endoplasmic reticulum stress response element should be CCAAT-N9-CCACG. Samplings show that neither the ideal CCAAT-N9-CCACG nor its inverse complement occurs on either side of A1BG or close to ZSCAN22 or ZNF497.
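
A search for the full CCAAT-N9-CCACG element on both strands can be sketched as below. This is a minimal illustration, assuming an uppercase A/C/G/T input string; the function names and the toy sequence are hypothetical and are not the sampling procedure used for the linked pages.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
ESRE_LENGTH = 5 + 9 + 5  # CCAAT + nine arbitrary nucleotides + CCACG
# A lookahead is used so that overlapping candidates are all reported.
ESRE_PATTERN = re.compile(r"(?=CCAAT[ACGT]{9}CCACG)")


def reverse_complement(seq: str) -> str:
    """Return the inverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_esre(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start in seq, strand) for every CCAAT-N9-CCACG found."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in ESRE_PATTERN.finditer(seq)]
    for m in ESRE_PATTERN.finditer(reverse_complement(seq)):
        # Map the hit on the inverse complement back to forward coordinates.
        hits.append((len(seq) - m.start() - ESRE_LENGTH, "-"))
    return sorted(hits)


# Made-up test sequence, not taken from the A1BG locus.
toy = "T" + "CCAAT" + "ACGTACGTA" + "CCACG" + "GGG"
print(find_esre(toy))
print(find_esre(reverse_complement(toy)))
```

An empty list from `find_esre` for a promoter region would correspond to the negative result reported above.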
 
====Hap motif====
{{main|CAAT box gene transcriptions#Heme-activated protein (Hap) samplings|Endoplasmic reticulum stress response element gene transcriptions#CCAAT samplings}}
 
====G-boxes====
{{main|G box gene transcriptions}}
 
=====G-box (CACGTG)=====
{{main|Phosphate starvation-response transcription factor gene transcriptions#Pho samplings|Complex locus A1BG and ZNF497#Phors}}
 
====GCN4 motif====
{{main|Gcn4p gene transcriptions}}
 
=====GCREs (Gcn4)=====
{{main|Gcn4p gene transcriptions#GCRE samplings}}
 
====Migs====
{{main|Mig1p gene transcriptions}}
 
====Nuclear factors====
{{main|Nuclear factor gene transcriptions}}
 
=====NFATs=====
{{main|Nuclear factor of activated T cell gene transcriptions (NFAT)#NFAT samplings}}
 
=====HNF6s=====
{{main|HNF gene transcriptions#HNF6 samplings}}
 
====T boxes====
{{main|T box gene transcriptions}}
 
=====TboxCs=====
{{main|T box gene transcriptions#T box (Conlon) samplings}}
 
=====TboxZs=====
{{main|T box gene transcriptions#T box (Zhang) samplings}}
 
====Vboxes====
{{main|V box gene transcriptions#V box samplings}}
 
====Z-boxes====
{{main|Z box gene transcriptions}}
 
=====ZboxGs=====
{{main|Z box gene transcriptions#General Z-box (ZboxG) samplings}}
 
=====ZboxSps=====
{{main|Z box gene transcriptions#Z-box (ZboxSp) samplings}}
 
===Helix-turn-helix (HTH) transcription factors===
{{main|Helix-turn-helix transcription factors}}
Gene ID 4602 is MYB (myeloblastosis), the MYB proto-oncogene, transcription factor, located on 6q23.3: "This gene encodes a protein with three HTH DNA-binding domains that functions as a transcription regulator. This protein plays an essential role in the regulation of hematopoiesis. This gene may be aberrently expressed or rearranged or undergo translocation in leukemias and lymphomas, and is considered to be an oncogene. Alternative splicing results in multiple transcript variants."<ref name=RefSeq4602>{{ cite web
|author=RefSeq
|title=MYB MYB proto-oncogene, transcription factor [ Homo sapiens (human) ]
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=January 2016
|url=https://www.ncbi.nlm.nih.gov/gene/4602
|accessdate=7 February 2021 }}</ref>
 
====CadC binding domains====
{{main|CadC binding domain gene transcriptions#Cadaverine C samplings}}
 
====Factor II B recognition elements====
{{main|Factor II B recognition element gene transcriptions#BREu samplings}}
 
====Forkhead boxes====
{{main|Forkhead box gene transcriptions#Forkhead box samplings}}
 
====Homeoboxes====
{{main|Homeobox gene transcriptions#Homeobox samplings}}
 
====Homeodomains====
{{main|Homeobox gene transcriptions#Homeodomain samplings}}
 
====HSE3 (Eastmond)====
{{main|Hsf1p gene transcriptions#HSE3 (Eastmond) samplings}}
 
====HSE4 (Eastmond)====
{{main|Hsf1p gene transcriptions#HSE4 (Eastmond) samplings}}
 
====HSE8 GAP1 (Eastmond)====
{{main|Hsf1p gene transcriptions#HSE8 GAP1 (Eastmond) samplings}}
 
====HSE9 GAP2 (Eastmond)====
{{main|Hsf1p gene transcriptions#HSE9 GAP2 (Eastmond) samplings}}
 
====Hsf (Tang)====
{{main|Hsf1p gene transcriptions#Hsf (Tang) samplings}}
 
====MREs====
{{main|MYB recognition element gene transcriptions#MRE samplings}}
 
====Tryptophan residues====
{{main|Interferon regulatory factor gene transcriptions#Tryptophan residue samplings}}
 
===Basic helix-loop-helix (bHLH) transcription factors===
{{main|Basic helix–loop–helix}}
"The [palindromic E-box motif (CACGTG)] motif is bound by the transcription factor Pho4, [and has the] class of basic helix-loop-helix DNA binding domain and core recognition sequence (Zhou and O'Shea 2011)."<ref name=Rossi>{{ cite journal
|author=Matthew J. Rossi, William K.M. Lai and B. Franklin Pugh
|title=Genome-wide determinants of sequence-specific DNA binding of general regulatory factors
|journal=Genome Research
|date=21 March 2018
|volume=28
|issue=
|pages=497-508
|url=https://genome.cshlp.org/content/28/4/497.full
|arxiv=
|bibcode=
|doi=10.1101/gr.229518.117
|pmid=29563167
|accessdate=31 August 2020 }}</ref>
 
"Pho4 bound to virtually all E-boxes ''in vitro'' (96%) [...]. That was not the case ''in vivo'', where only 5% were bound by Pho4, under activating conditions as determined by ChIP-seq [Zhou and O'Shea 2011]."<ref name=Rossi/>
 
"Pho4 possesses the intrinsic ability to bind every E-box, but ''in vivo'' is prevented from binding by chromatin unless assisted by chromatin remodelers (Svaren ''et al.'' 1994) that are targeted at promoter regions."<ref name=Rossi/>
 
"On one end of that spectrum, typical transcription factors like Pho4 do not appear to compete with nucleosomes and instead predominantly sample motifs that already exist in the [nucleosome-free promoter regions] NFRs generated by other factors. In vitro (PB-exo), Pho4 bound nearly every instance of an E-box motif across the yeast genome. However, in vivo, Pho4 is a low-abundance protein that is recruited to the nucleus upon phosphate starvation by other factors, to act at a few dozen genes (Komeili and O'Shea 1999; Zhou and O'Shea 2011). Since Pho4 appears unable to compete with nucleosomes, competent sites that are occluded by nucleosomes are invisible to Pho4."<ref name=Rossi/>
 
The Pho4 homodimer binds to DNA sequences containing the bHLH binding site CACGTG.<ref name=Shao>{{ cite journal
|author=Dalei Shao, Caretha L. Creasy, Lawrence W. Bergman
|title = A cysteine residue in helixII of the bHLH domain is essential for homodimerization of the yeast transcription factor Pho4p
|journal = Nucleic Acids Research
|volume = 26
|issue = 3
|pages = 710–4
|date= 1 February 1998
|pmid = 9443961
|pmc = 147311
|doi = 10.1093/nar/26.3.710
|url = https://academic.oup.com/nar/article/26/3/710/1052045 }}</ref>
 
The upstream activating sequence (UAS) for Pho4p, CAC(A/G)T(T/G), occurs in the promoters of ''HIS4'' and ''PHO5'' and is associated with phosphate limitation and with regulation of the purine and histidine biosynthesis pathways.<ref name=Tang>{{ cite journal
|author=Hongting Tang, Yanling Wu, Jiliang Deng, Nanzhu Chen, Zhaohui Zheng, Yongjun Wei, Xiaozhou Luo, and Jay D. Keasling
|title=Promoter Architecture and Promoter Engineering in ''Saccharomyces cerevisiae''
|journal=Metabolites
|date=6 August 2020
|volume=10
|issue=8
|pages=320-39
|url=https://www.mdpi.com/2218-1989/10/8/320/pdf
|arxiv=
|bibcode=
|doi=10.3390/metabo10080320
|pmid=32781665
|accessdate=18 September 2020 }}</ref>
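
The degenerate UAS CAC(A/G)T(T/G) stands for four concrete hexamers, one of which is the palindromic E-box CACGTG. The sketch below enumerates them; the position-list representation and the function names are assumptions made for illustration only.

```python
from itertools import product

# CAC(A/G)T(T/G) written as the set of allowed bases at each position.
PHO4_UAS = ["C", "A", "C", "AG", "T", "TG"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand(positions: list[str]) -> list[str]:
    """List every concrete sequence allowed by a per-position base set."""
    return ["".join(bases) for bases in product(*positions)]


def is_palindromic(seq: str) -> bool:
    """True if the sequence equals its own inverse complement."""
    return seq == seq.translate(COMPLEMENT)[::-1]


variants = expand(PHO4_UAS)
print(variants)                                     # ['CACATT', 'CACATG', 'CACGTT', 'CACGTG']
print([v for v in variants if is_palindromic(v)])   # ['CACGTG']
```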
 
bHLH proteins typically bind to a consensus sequence called an E-box, CANNTG.<ref name="pmid10319327">{{cite journal |author=Chaudhary J, Skinner MK |title=Basic helix-loop-helix proteins can act at the E-box within the serum response element of the c-fos promoter to influence hormone-induced promoter activation in Sertoli cells |journal=Mol. Endocrinol. |volume=13 |issue=5 |pages=774–86 |date=1999 |pmid=10319327 |doi=10.1210/mend.13.5.0271 }}</ref>
 
"A computer search for transcription promoter elements [...] showed the presence of a prominent TATA box 22 nucleotides upstream of the transcription start site and an [[Sp1]] site at position -42 to -33. The 5'-flanking sequence also contains three E boxes with CANNTG consensus sequences at positions -464 to -459, -90 to -85, and -52 to -47 that have been marked as [[E box]], [[E1 box]], and [[E2 box]], respectively [...]. In addition, the 5'-flanking region contains one or more [[GRE]], [[Aryl hydrocarbon receptor#DNA binding (xenobiotic response element – XRE)|XRE]], [[GATA1|GATA-1]], [[ATF4|GCN-4]], [[ETV4|PEA-3]], [[AP-1 (transcription factor)|AP1]], and [[Activating protein 2|AP2]] consensus motifs and also three imperfect CArG sites [...]."<ref name=Lenka>{{ cite journal
|author=Nibedita Lenka, Aruna Basu, Jayati Mullick, and Narayan G. Avadhani
|title=The role of an E box binding basic helix loop helix protein in the cardiac muscle-specific expression of the rat cytochrome oxidase subunit VIII gene
|journal=The Journal of Biological Chemistry
|date=22 November 1996
|volume=271
|issue=47
|pages=30281–30289
|url=http://www.jbc.org/content/271/47/30281.full.pdf
|arxiv=
|bibcode=
|doi=10.1074/jbc.271.47.30281
|pmid=
|accessdate=7 February 2019 }}</ref>
 
====AhRYs====
{{main|Xenobiotic response element gene transcriptions#TCDD*AhR DNA-binding consensus sequence sampling}}
 
====AHRE-IIs====
{{main|Xenobiotic responsive element gene transcriptions#AHRE-II samplings}}
 
====AEREs====
{{main|Antioxidant-electrophile responsive element gene transcriptions#AERE (Lacher) samplings}}
 
====CAT boxes====
{{main|CAT box gene transcriptions#CAT box samplings}}
 
====CAT-box-like elements====
{{main|CAT box gene transcriptions#CAT-box-like element samplings}}
 
===="Class C"====
{{main|N box gene transcriptions#"Class C" (Leal) samplings}}
 
===="Class I"====
 
=====TCFs=====
{{main|Transcription factor 3 gene transcriptions#TCF3 samplings}}
 
====DIOXs====
{{main|Xenobiotic response element gene transcriptions#DIOX samplings}}
 
====Enhancer boxes====
{{main|Enhancer box gene transcriptions#Enhancer box samplings}}
 
=====ChoRE motifs=====
{{main|Carbohydrate response element gene transcriptions}}
 
=====CarbE1s=====
{{main|Carbohydrate response element gene transcriptions#ATCTTG (CarbE1) samplings}}
 
=====CarbE2s=====
{{main|Carbohydrate response element gene transcriptions#CACGTG (CarbE2) samplings}}
 
=====CarbE3s=====
{{main|Carbohydrate response element gene transcriptions#TCCGCC (CarbE3) samplings}}
 
=====Phors=====
{{main|Phosphate starvation-response transcription factor gene transcriptions#Pho samplings}}
Palindromic E-box motif (CACGTG).
 
=====E2 boxes=====
{{main|E2 box gene transcriptions#E2 box samplings}}
 
====GATAs====
{{main|GATA gene transcriptions#GATA samplings}}
 
====Gln3s====
{{main|GATA gene transcriptions#Staschke Gln3 samplings}}
 
====Glucocorticoid response elements====
{{main|Glucocorticoid response element gene transcriptions#Glu samplings}}
 
====ICRE (Lopes)====
{{main|Inositol, choline-responsive element gene transcriptions}}
 
====ICRE (Schwank)====
{{main|Inositol, choline-responsive element gene transcriptions}}
 
====Pho4====
{{main|Phosphate starvation-response transcription factor gene transcriptions#Phop samplings}}
 
====QRDREs====
{{main|Xenobiotic response element gene transcriptions#QRDRE samplings}}
 
====Carbon source-responsive elements====
{{main|Carbon source-responsive element gene transcriptions}}
 
=====CATTCAs=====
{{main|Carbon source-responsive element gene transcriptions#CATTCA samplings}}
 
=====TCCGs=====
{{main|Carbon source-responsive element gene transcriptions#TCCG samplings}}
 
====XREs====
{{main|Xenobiotic response element gene transcriptions#Xenobiotic response element samplings}}
 
===Basic helix-loop-helix leucine zipper transcription factors===
 
Basic helix-loop-helix leucine zipper transcription factors are, as their name indicates, transcription factors containing both [[Basic helix-loop-helix]] and [[leucine zipper]] motifs.
 
Examples include [[Microphthalmia-associated transcription factor]] and [[Sterol regulatory element-binding protein]] (SREBP).
 
MITF recognizes E-box (CAYRTG) and M-box (TCAYRTG or CAYRTGA) sequences in the promoter regions of target genes.<ref name=Hoek>{{cite journal | author = Hoek KS, Schlegel NC, Eichhoff OM, Widmer DS, Praetorius C, Einarsson SO, Valgeirsdottir S, Bergsteinsdottir K, Schepsky A, Dummer R, Steingrimsson E | title = Novel MITF targets identified using a two-step DNA microarray strategy | journal = Pigment Cell Melanoma Res. | volume = 21 | issue = 6 | pages = 665–76 | date = 2008 | pmid = 19067971 | doi = 10.1111/j.1755-148X.2008.00505.x }}</ref>
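
Consensus sequences written with IUPAC ambiguity codes, such as CAYRTG or CANNTG, can be turned into regular expressions for scanning. The following is a minimal sketch under that assumption; the helper names and the toy sequence are hypothetical, and only the forward strand is scanned.

```python
import re

# Subset of IUPAC nucleotide codes needed for the motifs quoted above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any base
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate an IUPAC consensus into a lookahead regex (reports overlapping hits)."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?=({body}))")


def scan(seq: str, consensus: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched sequence) for each forward-strand occurrence."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


toy = "GGTCACGTGACC"  # made-up sequence, not taken from the A1BG locus
for name, consensus in [("E-box", "CAYRTG"), ("M-box", "TCAYRTG"), ("M-box", "CAYRTGA")]:
    print(name, consensus, scan(toy, consensus))
```

The two M-box forms, TCAYRTG and CAYRTGA, are inverse complements of each other, so a hit for one on the forward strand corresponds to a hit for the other on the opposite strand.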
 
[[Serum response element gene transcriptions]]: The SRE wild type (SREwt) contains the nucleotide sequence ACAGGATGTCCATATTAGGACATCTGC, of which CCATATTAGG is the CArG box, TTAGGACAT is the C/EBP box, and CATCTG is the E box.<ref name=Misra>{{ cite journal
|author=Ravi P. Misra
|author2=Azad Bonni
|author3=Cindy K. Miranti
|author4=Victor M. Rivera
|author5=Morgan Sheng
|author6=Michael E. Greenberg
|title=L-type Voltage-sensitive Calcium Channel Activation Stimulates Gene Expression by a Serum Response Factor-dependent Pathway
|journal=The Journal of Biological Chemistry
|date=14 October 1994
|volume=269
|issue=41
|pages=25483-25493
|url=http://www.jbc.org/content/269/41/25483.full.pdf
|arxiv=
|bibcode=
|doi=
|pmid=7929249
|accessdate=7 December 2019 }}</ref>
 
"Serum response factor (SRF) is an important transcription factor that regulates cardiac and skeletal muscle genes during development, maturation and adult aging [17,18]. SRF regulates its target genes by binding to serum response elements (SREs), which contain a consensus CC(A/T)<sub>6</sub>GG (CArG) motif."<ref name=Zhang2017>{{ cite journal
|author=Xiaomin Zhang, Gohar Azhar, Jeanne Y. Wei
|title=SIRT2 gene has a classic SRE element, is a downstream target of serum response factor and is likely activated during serum stimulation
|journal=PLOS One
|date=21 December 2017
|volume=12
|issue=12
|pages=e0190011
|url=https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0190011
|arxiv=
|bibcode=
|doi=10.1371/journal.pone.0190011
|pmid=
|accessdate=23 February 2021 }}</ref>
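
The overlapping elements within the SRE wild-type sequence can be located programmatically. In this sketch the sequence and sub-elements are those quoted above; the function name and the 1-based inclusive position convention are illustrative choices.

```python
import re

SRE_WT = "ACAGGATGTCCATATTAGGACATCTGC"
ELEMENTS = {
    "CArG box": "CCATATTAGG",
    "C/EBP box": "TTAGGACAT",
    "E box": "CATCTG",
}
# CC(A/T)6GG consensus for the CArG motif.
CARG_CONSENSUS = re.compile(r"CC[AT]{6}GG")


def locate(seq: str, elements: dict[str, str]) -> dict[str, tuple[int, int]]:
    """Map each element name to its 1-based inclusive (start, end) position in seq."""
    spans = {}
    for name, element in elements.items():
        start = seq.find(element)
        if start == -1:
            raise ValueError(f"{name} not found in sequence")
        spans[name] = (start + 1, start + len(element))
    return spans


spans = locate(SRE_WT, ELEMENTS)
print(spans)  # {'CArG box': (10, 19), 'C/EBP box': (15, 23), 'E box': (21, 26)}
print(CARG_CONSENSUS.search(SRE_WT).group())  # CCATATTAGG
```

The positions show that the CArG box and the C/EBP box share TTAGG, and that the C/EBP box and the E box share CAT.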
 
====CArG boxes====
{{main|CArG box gene transcriptions#CArG box samplings}}
 
====MITF E-boxes====
{{main|Enhancer box gene transcriptions#MITF E-box (CAYRTG) samplings}}
 
=====RREs=====
{{main|MYB recognition element gene transcriptions#RRE samplings}}
Consensus sequence: CATCTG.
 
====M-boxes====
{{main|M box gene transcriptions}}
 
=====M box (Bertolotto)=====
{{main|M box gene transcriptions#M box (Bertolotto) samplings}}
 
=====M-box (Hoek)=====
{{main|M box gene transcriptions#M-box (Hoek) samplings}}
 
=====M-box (Ripoll)=====
{{main|M box gene transcriptions#M-box (Ripoll) samplings}}
 
====SER elements====
{{main|Serum response element gene transcriptions#SER samplings}}
 
===Basic helix-span-helix===
 
====Activating proteins====
{{main|Activating protein gene transcriptions}}
 
=====AP2as=====
{{main|Activating protein gene transcriptions#AP-2 alpha consensus sequences}}
 
=====APCo1s=====
{{main|Activating protein gene transcriptions#Activating protein samplings (Cohen)}}
 
=====APCo2s=====
{{main|Activating protein gene transcriptions#Activating protein (Cohen2) samplings}}
 
=====APM3Ns=====
{{main|Activating protein gene transcriptions#Activating protein samplings (Murata, 3N)}}
 
=====APM4Ns=====
{{main|Activating protein gene transcriptions#Activating protein samplings (Murata, 4N)}}
 
=====Yao1s=====
{{main|Activating protein gene transcriptions#Activating protein samplings (Yao1)}}
 
=====Yao2s=====
{{main|Activating protein gene transcriptions#Activating protein samplings (Yao2)}}
 
=====Yao3s=====
{{main|Activating protein gene transcriptions#Activating protein samplings (Yao3)}}
"[[Pemphigus foliaceus]] (PF) is an autoimmune disease, endemic in Brazilian rural areas, characterized by acantholysis and accompanied by complement activation, with generalized or localized distribution of painful epidermal blisters. [[CD59]] is an essential complement regulator, inhibiting formation of the membrane attack complex, and mediating signal transduction and activation of T lymphocytes. ''CD59'' has different transcripts by alternative splicing, of which only two are widely expressed, suggesting the presence of regulatory sites in their noncoding regions. To date, there is no association study with polymorphisms in ''CD59'' noncoding regions and susceptibility to autoimmune diseases. In this study, we aimed to evaluate if ''CD59'' polymorphisms have a possible regulatory effect on gene expression and susceptibility to PF. Six noncoding polymorphisms were haplotyped in 157 patients and 215 controls by sequence-specific [[polymerase chain reaction|PCR]], and CD59 mRNA levels were measured in 82 subjects, by qPCR. The ''rs861256-allele-G'' (''rs861256*G'') was associated with increased mRNA expression (''p'' = .0113) and PF susceptibility in women (OR = 4.11, ''p'' = .0001), which were also more prone to develop generalized lesions (OR = 4.3, ''p'' = .009) and to resist disease remission (OR = 3.69, ''p'' = .045). Associations were also observed for ''rs831625*G'' (OR = 3.1, ''p'' = .007) and ''rs704697*A'' (OR = 3.4, ''p'' = .006) in Euro-Brazilian women, and for ''rs704701*C'' (OR = 2.33, ''p'' = .037) in Afro-Brazilians. These alleles constitute the ''GGCCAA'' haplotype, which also increases PF susceptibility (OR = 4.9, ''p'' = .045) and marks higher mRNA expression (''p'' = .0025). [...] higher ''CD59'' transcriptional levels may be related with PF susceptibility (especially in women), probably due to the effect of genetic polymorphism and to the CD59 role in T cell signal transduction."<ref name=Silva>{{ cite journal
|author=Amanda Salviano-Silva, Maria Luiza Petzl-Erler & Angelica Beate Winter Boldt
|title=''CD59'' polymorphisms are associated with gene expression and different sexual susceptibility to pemphigus foliaceus
|journal=Autoimmunity
|date=29 April 2017
|volume=50
|issue=6
|pages=377-385
|url=https://www.tandfonline.com/doi/abs/10.1080/08916934.2017.1329830
|arxiv=
|bibcode=
|doi=10.1080/08916934.2017.1329830
|pmid=
|accessdate=27 September 2021 }}</ref>
 
===Stem-loops===
[[Image:Stem-loop.svg|thumb|right|300px|An example of an RNA stem-loop is shown. Credit: [[c:user:Sakurambo|Sakurambo]].{{tlx|free media}}]]
As an important secondary structure of RNA, a stem-loop can direct RNA folding, protect structural stability for messenger RNA (mRNA), provide recognition sites for RNA binding proteins, and serve as a substrate for enzymatic reactions.<ref>Svoboda, P., & Cara, A. (2006). Hairpin RNA: A secondary structure of primary importance. Cellular and Molecular Life Sciences, 63(7), 901-908.</ref>
 
Hairpin loops are often found within the 5' UTRs of prokaryotic transcripts. These structures are frequently bound by proteins or cause attenuation of a transcript, thereby regulating translation.<ref name=Meyer>{{cite journal|last=Meyer|first=Michelle|author2=Deiorio-Haggar K |author3=Anthony J |title=RNA structures regulating ribosomal protein biosynthesis in bacilli|journal=RNA Biology|date=July 2013|volume=10|issue=7|pages=1160–1164|doi=10.4161/rna.24151|pmid=23611891 }}</ref>
 
The mRNA stem-loop structure forming at the ribosome binding site may control the initiation of translation.<ref name=Malys2009>{{cite journal | author = Malys N, Nivinskas R | title = Non-canonical RNA arrangement in T4-even phages: accommodated ribosome binding site at the gene 26-25 intercistronic junction |journal = Mol Microbiol |volume = 73 | issue = 6 | pages = 1115–1127 | date = 2009 | pmid = 19708923 | doi =10.1111/j.1365-2958.2009.06840.x }}</ref><ref name=Malys2010>{{ cite journal | author = Malys N, McCarthy JEG | title = Translation initiation: variations in the mechanism can be anticipated |journal = Cellular and Molecular Life Sciences | date = 2010 | doi =10.1007/s00018-010-0588-z | pmid=21076851 | volume = 68 | issue = 6 | pages = 991–1003 }}</ref>
{{clear}}
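
As a rough illustration of the stem-loop idea above, the following Python sketch scans an RNA sequence for candidate hairpins: a short stem whose right arm is the reverse complement of its left arm, separated by a small loop. The function name <code>find_stem_loops</code>, the stem length, and the loop-size limits are illustrative assumptions, not values from the cited sources, and only Watson-Crick pairs are considered (no G-U wobble pairs, no energy model).

```python
# Hypothetical helper: naive search for perfect-stem hairpin candidates.
COMP = str.maketrans("ACGU", "UGCA")

def find_stem_loops(rna, stem=4, min_loop=3, max_loop=8):
    """Return (start, left_arm, loop, right_arm) tuples, 0-based start."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    for i in range(len(rna) - 2 * stem - min_loop + 1):
        left = rna[i:i + stem]
        # The right arm must pair with the left arm read in reverse.
        target = left.translate(COMP)[::-1]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if rna[j:j + stem] == target:
                hits.append((i, left, rna[i + stem:j], target))
    return hits

if __name__ == "__main__":
    print(find_stem_loops("GGGCAAAAGCCC"))
```

Real secondary-structure prediction weighs thermodynamic stability and allows mismatches and bulges, so this sketch should be read only as a way to make the definition concrete.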
 
====AUREs====
{{main|Adenylate–uridylate rich element gene transcriptions#Adenylate–uridylate rich element (Bakheet) samplings}}
 
====Adenylate–uridylate rich elements (Chen and Shyu, Class I)====
{{main|Adenylate–uridylate rich element gene transcriptions#ATTTA (Chen and Shyu, Class I) samplings}}
 
====Adenylate–uridylate rich elements (Chen and Shyu, Class II)====
{{main|Adenylate–uridylate rich element gene transcriptions#UUAUUUA(U/A)(U/A) (Chen and Shyu, Class II) samplings}}
 
====Adenylate–uridylate rich elements (Chen and Shyu, Class III)====
{{main|Adenylate–uridylate rich element gene transcriptions#ATTT (Chen and Shyu, Class III)}}
 
====MERs====
{{main|Adenylate–uridylate rich element gene transcriptions#Overlapping (Siegel) mers}}
 
====Constitutive decay elements====
{{main|Adenylate–uridylate rich element gene transcriptions#Constitutive decay element (Siegel) samplings}}
 
==={{chem|Cys|2|His|2}} SP / Kruppel-like factor (KLF) transcription factor family===
 
The {{chem|Cys|2|His|2}}-like fold group ({{chem|Cys|2|His|2}}) is by far the best-characterized class of zinc fingers, and is common in mammalian transcription factors, where such domains adopt a simple ββα fold and have the amino acid sequence motif:<ref name=Pabo2001>{{cite journal | author = Pabo CO, Peisach E, Grant RA | title = Design and selection of novel Cys2His2 zinc finger proteins | journal = Annual Review of Biochemistry | volume = 70 | pages = 313–40 | date = 2001 | pmid = 11395410 | doi = 10.1146/annurev.biochem.70.1.313 }}</ref>
 
:X<sub>2</sub>-Cys-X<sub>2,4</sub>-Cys-X<sub>12</sub>-His-X<sub>3,4,5</sub>-His
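
The motif above can be written as a regular expression for scanning a protein sequence. The Python sketch below is a minimal illustration; it assumes that "X<sub>2,4</sub>" means a spacer of 2 or 4 residues and "X<sub>3,4,5</sub>" a spacer of 3, 4 or 5 residues. If the spacer is meant as a range of 2 to 4, the alternation would become <code>.{2,4}</code>. The names <code>C2H2_PATTERN</code> and <code>find_c2h2_motifs</code> are hypothetical.

```python
import re

# X2-Cys-X(2 or 4)-Cys-X12-His-X(3, 4 or 5)-His, one-letter amino acid codes.
# The lookahead lets overlapping candidates be reported.
C2H2_PATTERN = re.compile(r"(?=(.{2}C(?:.{2}|.{4})C.{12}H.{3,5}H))")

def find_c2h2_motifs(protein):
    """Return (start, matched_text) for each candidate motif, 0-based start."""
    return [(m.start(), m.group(1)) for m in C2H2_PATTERN.finditer(protein.upper())]

if __name__ == "__main__":
    example = "AA" + "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H"
    print(find_c2h2_motifs(example))
```

A sequence match of this kind only flags candidates; it says nothing about whether the residues actually coordinate zinc or adopt the ββα fold.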
 
====Alcohol dehydrogenase repressor 1====
{{main|Adr1p gene transcriptions#ADR samplings}}
 
====SP1M1s====
{{main|Specificity protein gene transcriptions#Sp1-box 1 (Motojima) Samplings}}
 
====SP1M2s====
{{main|Specificity protein gene transcriptions#Sp1-box 2 (Motojima) Samplings}}
 
====SP-1 (Sato)s====
{{main|Specificity protein gene transcriptions#Sp-1 (Sato) samplings}}
 
====SP1 (Yao)s====
{{main|Specificity protein gene transcriptions#Sp1 (Yao) samplings}}
 
====YY1Ts====
{{main|YY1 gene transcriptions#YY1 CCATCTT samplings}}
 
===AP-2/EREBP-related factors===
 
====AGC boxes====
{{main|AGC box gene transcriptions#AGC box samplings}}
 
===AP-1 transcription factor network (Pathway)===
 
Sixty-nine genes are included in the AP-1 transcription factor network (Pathway).<ref name=AP-1TFN>{{ cite web
|author=NCBI
|title=AP-1 transcription factor network
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=9 March 2021
|url=https://pubchem.ncbi.nlm.nih.gov/pathway/Pathway%20Interaction%20Database:ap1_pathway
|accessdate=26 October 2021 }}</ref>
 
====AGCEs====
{{main|AGCE gene transcriptions#AGCE samplings}}
 
===Zinc finger DNA-binding domains===
 
====AnRE1s====
{{main|Androgen response element gene transcriptions#Androgen response element1 (Kouhpayeh) samplings}}
 
====AnRE2s====
{{main|Androgen response element gene transcriptions#Androgen response element2 (Kouhpayeh) samplings}}
 
====AnREWs====
{{main|Androgen response element gene transcriptions#Androgen response element (Wilson) samplings}}
 
====B-boxes====
{{main|B box gene transcriptions#B box (Johnson) samplings}}
 
====Box Bs====
{{main|B box gene transcriptions#B1 box (Sanchez) samplings}}
 
===β-Scaffold factors===
 
"Higher animals have [transcription factor] TF genes for the basic domain, the β-scaffold factor, and other new structures; however, their total proportion is less than 15% and most are [zinc (Zn)-coordinating factor] ZF and [Helix-Turn-Helix] HTH genes."<ref name=Nagata>{{ cite book
|author=Toshifumi Nagata, Aeni Hosaka-Sasaki and Shoshi Kikuchi
|title=The Evolutionary Diversification of Genes that Encode Transcription Factor Proteins in Plants, In: ''Plant Transcription Factors Evolutionary, Structural and Functional Aspects''
|publisher=Academic Press
|location=
|date=2016
|editor=Daniel H. Gonzalez
|pages=73-97
|url=https://www.sciencedirect.com/science/article/pii/B9780128008546000051
|arxiv=
|bibcode=
|doi=10.1016/B978-0-12-800854-6.00005-1
|pmid=
|isbn=978-0-12-800854-6
|accessdate=28 November 2021 }}</ref>
 
====ATA boxes====
{{main|ATA box gene transcriptions#ATA box samplings}}
 
====Γ-interferon activated sequences====
{{main|Γ-interferon activated sequence gene transcriptions#Γ-interferon activated sequence samplings}}
 
====HMG boxes====
{{main|HMG box gene transcriptions#HMG box samplings}}
 
===Zn(II)<sub>2</sub>Cys<sub>6</sub> proteins===
 
"The transcription factors Uga3, Dal81 and Leu3 belong to the class III family (Zn(II)<sub>2</sub>Cys<sub>6</sub> proteins), and they recognize highly related sequences rich in GGC triplets [15]."<ref name=Ruiz>{{ cite journal
|author=Marcos Palavecino-Ruiz, Mariana Bermudez-Moretti, Susana Correa-Garcia
|title=Unravelling the transcriptional regulation of Saccharomyces cerevisiae UGA genes: the dual role of transcription factor LEU3
|journal=Microbiology
|date=1 November 2017
|volume=
|issue=
|pages=
|url=https://www.researchgate.net/profile/Mariana_Bermudez3/publication/320571623_Unravelling_the_transcriptional_regulation_of_Saccharomyces_cerevisiae_UGA_genes_the_dual_role_of_transcription_factor_Leu3/links/5c62114c299bf1d14cbf7ade/Unravelling-the-transcriptional-regulation-of-Saccharomyces-cerevisiae-UGA-genes-the-dual-role-of-transcription-factor-Leu3.pdf
|arxiv=
|bibcode=
|doi=10.1099/mic.0.000560
|pmid=
|accessdate=21 February 2021 }}</ref>
 
====Dal81====
 
====GCC boxes====
{{main|AGC box gene transcriptions#GCC box samplings}}
 
====GGC triplets====
{{main|GGC triplet gene transcriptions#GGC samplings}}
 
=====GGCGGC triplets=====
{{main|GGC triplet gene transcriptions#GGCGGC triplet samplings}}
 
====Leu3====
{{main|Leu3 gene transcriptions#Leu samplings|GGC triplet gene transcriptions#Leu3 samplings}}
 
====Uga3====
{{main|Leu3 gene transcriptions}}
 
===Hairpin-hinge-hairpin-tail===
 
"In addition to this ACA box, they have the consensus H box sequence (5'-ANANNA-3') but have no other primary sequence identity. Despite this lack of primary sequence conservation, the H and ACA boxes are embedded in an evolutionarily conserved hairpin-hinge-hairpin-tail core secondary structure with the H box in the single-stranded hinge region and the ACA box in the single-stranded tail (5, 16)."<ref name=Mitchell>{{ cite journal
|author=James R. Mitchell, Jeffrey Cheng, and Kathleen Collins
|title=A Box H/ACA Small Nucleolar RNA-Like Domain at the Human Telomerase RNA 3' End
|journal=Molecular and Cellular Biology
|date=January 1999
|volume=19
|issue=1
|pages=567–576
|url=http://mcb.asm.org/content/19/1/567.full.pdf
|arxiv=
|bibcode=
|doi=
|pmid=
|accessdate=5 November 2018 }}</ref>
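
The two consensus sequences quoted above are short enough to search for directly. The following Python sketch, offered as an illustration only, reports every occurrence of the H box consensus (5'-ANANNA-3') and the ACA box in a sequence; the function name <code>find_boxes</code> is hypothetical, and the sketch does not test for the hairpin-hinge-hairpin-tail secondary structure in which the quoted source says the boxes are embedded.

```python
import re

# H box consensus 5'-ANANNA-3' (N = any nucleotide) and the ACA box.
# Lookaheads allow overlapping occurrences to be reported.
H_BOX = re.compile(r"(?=(A[ACGTU]A[ACGTU]{2}A))")
ACA_BOX = re.compile(r"(?=(ACA))")

def find_boxes(seq):
    """Return 0-based positions of H box and ACA box matches."""
    seq = seq.upper()
    return {
        "H": [(m.start(), m.group(1)) for m in H_BOX.finditer(seq)],
        "ACA": [m.start() for m in ACA_BOX.finditer(seq)],
    }

if __name__ == "__main__":
    print(find_boxes("GGAGAGGAGGACAGG"))
```

Because the H box consensus is so degenerate, matches are expected by chance in almost any sequence of moderate length, so hits are candidates rather than evidence of function.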
 
====H and ACA boxes====
{{main|H and ACA box gene transcriptions#H and ACA boxes in promoters of A1BG}}
 
====H-boxes (Grandbastien)====
{{main|H box gene transcriptions#H-box (Grandbastien) samplings}}
 
====H-boxes (Lindsay)====
{{main|H box gene transcriptions#H-box (Lindsay) samplings}}
 
====H boxes (Mitchell)====
{{main|H box gene transcriptions#H boxes (Mitchell) samplings}}
 
====H boxes (Rozhdestvensky)====
{{main|H box gene transcriptions#H boxes (Rozhdestvensky) in promoters of A1BG}}
 
===Unknown response element types===
 
====ACEs====
{{main|MYB recognition element gene transcriptions#ACE samplings}}
 
====BBCABW Inrs====
{{main|Initiator element gene transcriptions#BBCABW samplings}}
 
====Calcineurin-responsive transcription factors====
{{main|Calcineurin-responsive transcription factor gene transcriptions#CRT samplings}}
 
====Carbs====
{{main|Carbohydrate response element gene transcriptions#ACCGG (Carb) samplings}}
 
====Carb1s====
{{main|Carbohydrate response element gene transcriptions#CCCAT (Carb1) samplings}}
 
====Cat8s====
{{main|Cat8p gene transcriptions#Cat8p samplings}}
 
====Cell-cycle box variants====
{{Main|Cell-cycle box gene transcriptions#CCB variant samplings}}
 
====CGCG boxes====
{{main|CGCG box gene transcriptions#CGCG box samplings}}
 
====Circadian control elements====
{{main|Circadian control element gene transcriptions#CCE samplings}}
 
====Cold-responsive elements====
{{main|Cold-responsive element gene transcriptions#Cold-responsive element samplings}}
 
====Copper response elements====
{{main|Copper response element gene transcriptions}}
 
=====CuREQs=====
{{main|Copper response element gene transcriptions#CuRE (Quinn) samplings}}
 
=====CuREPs=====
{{main|Copper response element gene transcriptions#CuRE (Park) samplings}}
 
====Cytoplasmic polyadenylation elements====
{{main|Cytoplasmic polyadenylation element gene transcriptions#CPE samplings}}
 
====DAF-16 binding elements====
{{main|DAF-16 binding element gene transcriptions#DBE samplings}}
 
====D box (Samarsky)====
{{main|D box gene transcriptions#Dbox (Samarsky) samplings}}
 
====D box (Voronina)====
{{main|D box gene transcriptions#D box (Voronina) samplings}}
 
====D-box (Motojima)====
{{main|D box gene transcriptions#(Motojima) samplings}}
 
====dBRE====
{{main|Downstream TFIIB recognition element gene transcriptions#dBRE samplings}}
 
====Downstream core elements====
{{main|Downstream core element gene transcriptions}}
 
====DCE SI====
{{main|Downstream core element gene transcriptions#Downstream core element SI samplings}}
 
====DCE SII====
{{main|Downstream core element gene transcriptions#Downstream core element SII samplings}}
 
====DCE SIII====
{{main|Downstream core element gene transcriptions#Downstream core element SIII samplings}}
 
====DPE (Juven-Gershon)====
{{main|Downstream promoter element gene transcriptions#DPE (Juven-Gershon) samplings}}
 
====DPE (Kadonaga)====
{{main|Downstream promoter element gene transcriptions#DPE (Kadonaga) samplings}}
 
====DPE (Matsumoto)====
{{main|Downstream promoter element gene transcriptions#DPE (Matsumoto) samplings}}
 
====EIN3 binding sites====
{{main|EIN3 binding site gene transcriptions#EIN3 samplings}}
 
====Endosperm expressions====
{{main|Endosperm expression gene transcriptions#Endosperm expression samplings}}
 
====Estrogen response elements====
{{main|Estrogen response element gene transcriptions}}
 
=====ERE1s=====
{{main|Estrogen response element gene transcriptions#ERE1 (Driscoll) samplings}}
 
=====ERE2s=====
{{main|Estrogen response element gene transcriptions#EREs (Driscoll) samplings}}
 
====GAAC elements====
{{main|GAAC element gene transcriptions#GAAC element samplings}}
 
====GC boxes (Briggs)====
{{main|GC box gene transcriptions#GC box (Briggs) samplings}}
 
====GC boxes (Ye)====
{{main|GC box gene transcriptions#GC box (Ye) samplings}}
 
====GC boxes (Zhang)====
{{main|GC box gene transcriptions#GC box (Zhang) samplings}}
 
====GCR1s====
{{main|Gcr1p gene transcriptions#GCR1 samplings}}
 
====GREs====
{{main|Gibberellin responsive element gene transcriptions#GRE samplings}}
 
====GT boxes (Sato)====
{{main|TC element gene transcriptions#GT box (Sato) samplings}}
 
====Hex sequences====
{{main|Hex sequence gene transcriptions#Hex core samplings}}
 
====HY boxes====
{{main|HY box gene transcriptions#HY box samplings}}
 
====IFNs====
{{main|Interferon regulatory factor gene transcriptions#IFN-stimulated response element samplings}}
 
====Inr-like, TCTs====
{{main|Initiator element gene transcriptions#Inr-like, TCTs sampling}}
 
====IRF3s====
{{main|Interferon regulatory factor gene transcriptions#IRF-3 samplings}}
 
====IRSs====
{{main|Interferon regulatory factor gene transcriptions#IRS consensus samplings}}
 
====KAR2s====
{{main|Hac1p gene transcriptions#KAR2 samplings}}
 
====MBE1s====
{{main|Musashi binding element gene transcriptions#MBE1 samplings}}
 
====MBE2s====
{{main|Musashi binding element gene transcriptions#MBE2 samplings}}
 
====MBE3s====
{{main|Musashi binding element gene transcriptions#MBE3 samplings}}
 
====NF𝜿BSs====
{{main|Nuclear factor gene transcriptions#NF𝜿B (Sato) samplings}}
 
====PREs====
{{main|Polycomb response element gene transcriptions#Core samplings}}
 
====Pribs====
{{main|Pribnow box gene transcriptions#Pribnow box samplings}}
 
====RAREs====
{{main|Retinoic acid response element gene transcriptions#RARE samplings}}
 
====Rgts====
{{main|Rgt1p gene transcriptions#RGT samplings}}
 
====ROREs====
{{main|ROR-response element gene transcriptions#RORE samplings}}
 
====SERVs====
{{main|Servenius sequence gene transcriptions#Servenius samplings}}
 
====STAT5s====
{{main|STAT5 gene transcription laboratory#STAT5 samplings}}
 
====STREs====
{{main|Msn2,4p gene transcriptions#Stress-response element samplings}}
 
====Sucroses====
{{main|Sucrose box gene transcriptions#Sucrose box samplings}}
 
====TACTs====
{{main|TACTAAC box gene transcriptions#TACT samplings}}
 
====TAGteams====
{{main|TAGteam gene transcriptions#TAGteam samplings}}
 
====TAPs====
{{main|Tapetum box gene transcriptions#Tapetum box samplings}}
 
====TATAs====
{{main|TATA box gene transcriptions#TATA box samplings}}
Examination of the promoter regions upstream from ZSCAN22 to A1BG and downstream from ZNF497 to A1BG shows that TATA boxes in several forms are present and likely active or activatable: (1) TATAAAA (Carninci 2006), (2) TATA(A/T)A(A/T) (Watson 2014), (3) TATA(A/T)AA(A/G) (Juven-Gershon 2010), and (4) TATA(A/T)A(A/T)(A/G) (Basehoar 2004).
 
The TATA boxes appear only in the negative-direction UTRs, both proximal and distal. The shorter TATA box, TATAAA, appears as above but also in the positive direction as its inverse complement TTTATA at 2588 in the distal promoter.
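
The four TATA box forms listed above can be expressed as regular expressions and searched on both strands. The Python sketch below is a minimal illustration, not the procedure used for the samplings; the names <code>TATA_PATTERNS</code>, <code>reverse_complement</code> and <code>find_tata_boxes</code> are hypothetical. Positions on the "-" strand are 0-based offsets within the reverse-complemented string, which is an assumption about how one might choose to number them.

```python
import re

# The four consensus forms named in the text; (A/T) becomes [AT], (A/G) becomes [AG].
TATA_PATTERNS = {
    "Carninci 2006": "TATAAAA",
    "Watson 2014": "TATA[AT]A[AT]",
    "Juven-Gershon 2010": "TATA[AT]AA[AG]",
    "Basehoar 2004": "TATA[AT]A[AT][AG]",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the inverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_tata_boxes(seq):
    """Return (label, strand, start, match) for every hit on either strand."""
    seq = seq.upper()
    hits = []
    for label, pattern in TATA_PATTERNS.items():
        for strand, text in (("+", seq), ("-", reverse_complement(seq))):
            for m in re.finditer(f"(?=({pattern}))", text):
                hits.append((label, strand, m.start(), m.group(1)))
    return hits

if __name__ == "__main__":
    print(reverse_complement("TATAAA"))
    for hit in find_tata_boxes("GGTATAAAAGG"):
        print(hit)
```

The helper also confirms the relationship stated above: the inverse complement of TATAAA is TTTATA.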
 
====TATABs====
{{main|TATA box gene transcriptions#TATA box (Butler 2002) samplings}}
 
====TATACs====
{{main|TATA box gene transcriptions#TATA boxes (Carninci 2006) samplings}}
 
====TATAJs====
{{main|TATA box gene transcriptions#TATA box (Juven-Gershon 2010) samplings}}
 
====TATAWs====
{{main|TATA box gene transcriptions#TATA box (Watson 2014) samplings}}
 
====TEAs====
{{main|TEA consensus sequence gene transcriptions#TEA samplings}}
 
====TECs====
{{main|Tec1p gene transcriptions#Tec1 samplings}}
 
====THRs====
{{main|Thyroid hormone response element gene transcriptions#THR samplings}}
 
====TRFs====
{{main|Telomeric repeat DNA-binding factor gene transcriptions#TRF samplings}}
 
====UPREs====
{{main|Unfolded protein response element gene transcriptions#UPRE samplings}}
 
====UPRE-1s====
{{main|Hac1p gene transcriptions#UPRE-1 samplings}}
 
====URS (Sumrada, core)====
{{main|DNA damage response element gene transcriptions#URS1 (Sumrada, core) samplings}}
 
====VDREs====
{{main|Vitamin D response element gene transcriptions#VDRE samplings}}
 
====XCPE1s====
{{main|X core promoter element gene transcriptions#XCPE1 samplings}}
 
====Yaps====
{{main|Yap1p,2p gene transcriptions#Yap samplings}}
 
====YYRNWYY Inrs====
{{main|Initiator element gene transcriptions#YYRNWYY samplings}}
 
==A1BG orthologs==
 
===''Geotrypetes seraphini''===
[[Image:Geotrypetes seraphini 81151944.jpg|thumb|right|250px|''Geotrypetes seraphini'', the Gaboon caecilian, is a species of amphibian. Credit: [https://www.inaturalist.org/users/7865 Marius Burger].{{tlx|free media}}]]
''Geotrypetes seraphini'', the Gaboon caecilian, is a species of amphibian in the family ''Dermophiidae''.<ref name=IUCN>{{cite journal |author=IUCN SSC Amphibian Specialist Group |date=2019 |title=''Geotrypetes seraphini'' |volume=2019 |page=e.T59557A16957715 |url=https://en.wikipedia.org/wiki/IUCN_Red_List
|doi=10.2305/IUCN.UK.2019-1.RLTS.T59557A16957715.en |accessdate=16 November 2021}}</ref>
 
Its A1BG ortholog has 368 amino acids (aa) vs. 495 aa for ''Homo sapiens''.
{{clear}}


==ZSCAN22==
Line 31: Line 1,106:


Of the some 111 gaps between genes on chromosome locus 19q13.43 as of 4 August 2020, gap number 88 is between ZSCAN22 and A1BG. But, there is no gap between ZNF497 and A1BG.
==Promoters==
The core promoter begins approximately 35 nts upstream (-35) from the transcription start site (TSS). For the numbered nucleotides between ZSCAN22 and A1BG, the core promoter extends from 4425 nts to the TSS at 4460 nts. The proximal promoter extends from approximately -250 to the TSS, or from 4210 nts to 4460 nts. The distal promoter begins at about 2460 nts and extends to about 4210 nts.
On the ZNF497 side, the core promoter extends from about 4265 nts to 4300 nts, the proximal promoter from 4050 nts to 4265 nts, and the distal promoter from 2300 nts to 4050 nts.
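
The promoter boundaries quoted here follow from simple offsets relative to the TSS. The Python sketch below reproduces the arithmetic; the function name <code>promoter_ranges</code> is hypothetical, and the 2000 nt distal offset is inferred from the quoted numbers (4460 - 2460 and 4300 - 2300) rather than stated in the text.

```python
def promoter_ranges(tss, core=35, proximal=250, distal=2000):
    """Return (start, end) nucleotide numbers for each promoter region.

    Offsets are distances upstream of the TSS; the distal offset of 2000
    is an inferred assumption, not a value given in the text.
    """
    return {
        "core": (tss - core, tss),
        "proximal": (tss - proximal, tss),
        "distal": (tss - distal, tss - proximal),
    }

if __name__ == "__main__":
    print(promoter_ranges(4460))  # ZSCAN22 side
    print(promoter_ranges(4300))  # ZNF497 side
```

For the ZSCAN22 side (TSS at 4460) this gives the three ranges quoted above. For the ZNF497 side (TSS at 4300) the core and distal ranges match, while the text ends the proximal promoter at the start of the core promoter (4265) rather than at the TSS.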


==Alpha-1-B glycoprotein==
Line 53: Line 1,134:


An antigen "or immunogen is a molecule that sometimes stimulates an immune system response."<ref name=AntigenWikidoc>{{ cite web
An antigen "or immunogen is a molecule that sometimes stimulates an immune system response."<ref name=AntigenWikidoc>{{ cite web
|author=[[User:C. Michael Gibson|C. Michael Gibson]]
|author=C. Michael Gibson
|title=Antigen
|title=Antigen
|publisher=WikiDoc Foundation
|publisher=WikiDoc Foundation
Line 77: Line 1,158:
|accessdate=7 March 2020 }}</ref> is called an '''[[antibody]]'''.
|accessdate=7 March 2020 }}</ref> is called an '''[[antibody]]'''.


Five different antibody isotypes are known in mammals, which perform different roles, and help direct the appropriate immune response for each different type of foreign object they encounter.<ref name=Market>Eleonora Market, F. Nina Papavasiliou (2003) [http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pbio.0000016 ''V(D)J Recombination and the Evolution of the Adaptive Immune System''] PLoS Biology 1(1): e16.</ref>
Five different antibody isotypes are known in mammals, which perform different roles, and help direct the appropriate immune response for each different type of foreign object they encounter.<ref name=Market>{{ cite journal
|author=Eleonora Market, F. Nina Papavasiliou
|date=2003
|url=http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pbio.0000016
|title=V(D)J Recombination and the Evolution of the Adaptive Immune System
|journal=PLoS Biology
|volume=1
|issue=1
|pages=e16
|doi=10.1371/journal.pbio.0000016 }}</ref>


Although the general structure of all antibodies is very similar, a small region, known as the hypervariable region, at the tip of the protein is extremely variable, allowing millions of antibodies with slightly different tip structures to exist, where each of these variants can bind to a different target, known as an antigen.<ref name=Janeway5>{{ cite book | author = Charles A. Janeway, Jr ''et al'' | title = Immunobiolog. | edition = 5th ed. | publisher = Garland Publishing | date = 2001 | url = http://www.ncbi.nlm.nih.gov/books/bv.fcgi?call=bv.View..ShowTOC&rid=imm.TOC&depth=10 | isbn = 0-8153-3642-X }}</ref>
Although the general structure of all antibodies is very similar, a small region, known as the hypervariable region, at the tip of the protein is extremely variable, allowing millions of antibodies with slightly different tip structures to exist, where each of these variants can bind to a different target, known as an antigen.<ref name=Janeway5>{{ cite book | author = Charles A Janeway, Jr, Paul Travers, Mark Walport, and Mark J Shlomchik | title = Immunobiolog. | edition = 5th ed. | publisher = Garland Publishing | date = 2001 | url = http://www.ncbi.nlm.nih.gov/books/bv.fcgi?call=bv.View..ShowTOC&rid=imm.TOC&depth=10 | isbn = 0-8153-3642-X }}</ref>


'''Def.''' "any of the glycoproteins in blood serum that respond to invasion by foreign antigens and that protect the host by removing pathogens;"<ref name=ImmunoglobulinWikt>{{ cite web
'''Def.''' "any of the glycoproteins in blood serum that respond to invasion by foreign antigens and that protect the host by removing pathogens;"<ref name=ImmunoglobulinWikt>{{ cite web
Line 107: Line 1,197:
# NP_570602.2  alpha-1B-glycoprotein precursor, '''cd05751''' Location: 401 → 493 Ig1_LILRB1_like; First immunoglobulin (Ig)-like domain found in Leukocyte Ig-like receptors (LILR)B1 (also known as LIR-1) and similar proteins, '''smart00410''' Location: 218 → 280 IG_like; Immunoglobulin like, '''pfam13895''' Location: 210 → 301 Ig_2; Immunoglobulin domain and '''cl11960''' Location: 28 → 110 Ig; Immunoglobulin domain.<ref name=RefSeq1/>


Patients who have pancreatic ductal [[adenocarcinoma]] show an [[overexpression]] of A1BG in [[pancreatic juice]].<ref name="pmid18706098">{{ cite journal |vauthors=Tian M, Cui YZ, Song GH, Zong MJ, Zhou XY, Chen Y, Han JX | title = Proteomic analysis identifies MMP-9, DJ-1 and A1BG as overexpressed proteins in pancreatic juice from pancreatic ductal adenocarcinoma patients | journal = BMC Cancer | volume = 8 | issue = | pages = 241 | date = 2008 | pmid = 18706098 | pmc = 2528014 | doi = 10.1186/1471-2407-8-241 }}</ref>
Patients who have pancreatic ductal [[adenocarcinoma]] show an [[overexpression]] of A1BG in [[pancreatic juice]].<ref name=Tian>{{ cite journal
|author=Mei Tian, Ya-Zhou Cui, Guan-Hua Song, Mei-Juan Zong, Xiao-Yan Zhou, Yu Chen, Jin-Xiang Han
| title = Proteomic analysis identifies MMP-9, DJ-1 and A1BG as overexpressed proteins in pancreatic juice from pancreatic ductal adenocarcinoma patients
| journal = BMC Cancer
| volume = 8
| issue =
| pages = 241
| date = 2008
| pmid = 18706098
| pmc = 2528014
| doi = 10.1186/1471-2407-8-241
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2528014/ }}</ref>


===Immunoglobulin supergene family===
Line 140: Line 1,241:


"The sequences of 𝛂<sub>1</sub>B-glycoprotein (38) and chicken N-CAM (neural cell-adhesion molecule) (39) have been shown to be related to the immunoglobulin supergene family."<ref name=Paxton>{{ cite journal
"The sequences of 𝛂<sub>1</sub>B-glycoprotein (38) and chicken N-CAM (neural cell-adhesion molecule) (39) have been shown to be related to the immunoglobulin supergene family."<ref name=Paxton>{{ cite journal
|author=R J Paxton, G Mooser, H Pande, T D Lee, and J E Shively
|author=R. J. Paxton, G. Mooser, H. Pande, T. D. Lee, and J. E. Shively
|title=Sequence analysis of carcinoembryonic antigen: identification of glycosylation sites and homology with the immunoglobulin supergene family
|title=Sequence analysis of carcinoembryonic antigen: identification of glycosylation sites and homology with the immunoglobulin supergene family
|journal=Proceedings of the National Academy of Sciences USA
|journal=Proceedings of the National Academy of Sciences USA
Line 291: Line 1,392:
|arxiv=
|arxiv=
|bibcode=
|bibcode=
|doi=
|doi=10.1016/j.bbagen.2006.06.020
|pmid=
|pmid=16945486
|accessdate=2017-10-08 }}</ref>
|accessdate=2017-10-08 }}</ref>


Line 316: Line 1,417:


There are A1BG genotypes.<ref name=McDonough>{{ cite journal
There are A1BG genotypes.<ref name=McDonough>{{ cite journal
|author=Caitrin W. McDonough, Yan Gong, Sandosh Padmanabhan, Ben Burkley, Taimour Y. Langaee, Olle Melander, Carl J. Pepine, Anna F. Dominiczak, Rhonda M. Cooper-DeHoff, Julie A. Johnson
|author=Caitrin W. McDonough, Yan Gong, Sandosh Padmanabhan, Ben Burkley, Taimour Y. Langaee, Olle Melander, Carl J. Pepine, Anna F. Dominiczak, Rhonda M. Cooper-DeHoff, and Julie A. Johnson
|title=Pharmacogenomic Association of Nonsynonymous SNPs in SIGLEC12, A1BG, and the Selectin Region and Cardiovascular Outcomes
|title=Pharmacogenomic Association of Nonsynonymous SNPs in ''SIGLEC12'', ''A1BG'', and the Selectin Region and Cardiovascular Outcomes
|journal=Hypertension
|journal=Hypertension
|date=June 2013
|date=June 2013
Line 390: Line 1,491:
|bibcode=
|bibcode=
|doi=10.1159/000153828
|doi=10.1159/000153828
|pmid=
|pmid=2759622
|accessdate=25 March 2020 }}</ref>
|accessdate=25 March 2020 }}</ref>


Line 405: Line 1,506:
|bibcode=
|bibcode=
|doi=10.1159/000153863
|doi=10.1159/000153863
|pmid=
|pmid=2583734
|accessdate=25 March 2020 }}</ref>
|accessdate=25 March 2020 }}</ref>


Line 462: Line 1,563:
|title=Cysteine-rich secretory protein 3 is a ligand of alpha1B-glycoprotein in human plasma
|title=Cysteine-rich secretory protein 3 is a ligand of alpha1B-glycoprotein in human plasma
|journal=Biochemistry
|journal=Biochemistry
|date=October 2004
|date=12 October 2004
|volume=43
|volume=43
|issue=40
|issue=40
|pages=12877-86
|pages=12877-86
|url=
|url=https://pubs.acs.org/doi/10.1021/bi048823e
|arxiv=
|arxiv=
|bibcode=
|bibcode=
Line 495: Line 1,596:
|accessdate=2012-02-20 }}</ref> "CRISP3 is highly expressed in the human cauda epididymidis and ampulla of vas deferens (Udby et al. 2005)."<ref name=Haendler/>
|accessdate=2012-02-20 }}</ref> "CRISP3 is highly expressed in the human cauda epididymidis and ampulla of vas deferens (Udby et al. 2005)."<ref name=Haendler/>


==ZNF497==
==A1BG-AS1==
{{main|ZNF497}}
 
Gene ID: 503538 is [[A1BG-AS1]] A1BG antisense RNA 1.<ref name=HGNC503538>{{ cite web
Gene ID: 503538 is [[A1BG-AS1]] A1BG antisense RNA 1.<ref name=HGNC503538>{{ cite web
|author=HGNC
|author=HGNC
Line 505: Line 1,606:
|url=https://www.ncbi.nlm.nih.gov/gene/503538
|url=https://www.ncbi.nlm.nih.gov/gene/503538
|accessdate=2019-12-18 }}</ref> A1BG-AS1 is transcribed in the negative direction from ZSCAN22.<ref name=HGNC503538/>
|accessdate=2019-12-18 }}</ref> A1BG-AS1 is transcribed in the negative direction from ZSCAN22.<ref name=HGNC503538/>
Gene ID 503538 extends from 58,351,390 to 58,355,183. It is a long, non-coding (lnc) RNA.<ref name=Bai>{{ cite journal
|author=Jigang Bai, Bowen Yao, Liang Wang, Liankang Sun, Tianxiang Chen, Runkun Liu, Guozhi Yin, Qiuran Xu, Wei Yang
|title=lncRNA A1BG-AS1 suppresses proliferation and invasion of hepatocellular carcinoma cells by targeting miR-216a-5p
|journal=Journal of Cellular Biochemistry
|date=June 2019
|volume=120
|issue=6
|pages=10310-10322
|url=https://pubmed.ncbi.nlm.nih.gov/30556161/
|arxiv=
|bibcode=
|doi=10.1002/jcb.28315
|pmid=30556161
|accessdate=16 May 2023 }}</ref> Extensive evidence indicates that long noncoding RNAs (lncRNAs) regulate the tumorigenesis and progression of hepatocellular carcinoma (HCC).<ref name=Bai/>
Analysis of The Cancer Genome Atlas database found that A1BG-AS1 is underexpressed in HCC.<ref name=Bai/> A1BG-AS1 expression in HCC was markedly lower than that in noncancerous tissues.<ref name=Bai/>
==ZNF497==
{{main|ZNF497}}


Gene ID: 162968 is [[ZNF497]] zinc finger protein 497.<ref name=HGNC162968>{{ cite web
Gene ID: 162968 is [[ZNF497]] zinc finger protein 497.<ref name=HGNC162968>{{ cite web
Line 544: Line 1,665:
|accessdate=2019-12-18 }}</ref> RNA5SP473 may be transcribed in the negative direction from ZNF497.<ref name=HGNC106479017/>
|accessdate=2019-12-18 }}</ref> RNA5SP473 may be transcribed in the negative direction from ZNF497.<ref name=HGNC106479017/>


==19q13.43==
==GC contents==
{{main|Genes on 19q13.43}}


==Regulatory elements and regions==
Approximately "76% of human core promoters lack TATA-like elements, have a high GC content, and are enriched in Sp1 binding sites."<ref name=Yang2>{{ cite journal
|author=Chuhu Yang, Eugene Bolotin, Tao Jiang, Frances M. Sladek, Ernest Martinez.
|title=Prevalence of the initiator over the TATA box in human and yeast genes and identification of DNA motifs enriched in human TATA-less core promoters
|journal=Gene
|month=March 7,
|year=2007
|volume=389
|issue=1
|pages=52-65
|pmid=17123746
|doi=10.1016/j.gene.2006.09.029
|url=http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1955227/?tool=pubmed }}</ref>


It may be still fair to say that in the apparent present era of functional genomics, the challenge is to elucidate gene function such as that of A1BG, its likely regulatory networks and signaling pathways.<ref name=Collins>{{ cite journal
CpG islands typically occur at or near the transcription start site of genes, particularly housekeeping genes, in vertebrates.<ref name="Saxonov2006">{{ cite journal
|author=Francis S Collins, Eric D Green, Alan E Guttmacher, Mark S Guyer
|author=Saxonov S, Berg P, Brutlag DL
|title=A vision for the future of genomics research
|title=A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters
|journal=Nature
|journal=Proc Natl Acad Sci USA
|date=24 April 2003
|volume=103
|volume=422
|issue=5
|issue=6934
|pages=1412–1417
|pages=835-47
|date=2006
|url=https://www.ncbi.nlm.nih.gov/pubmed/12695777
|pmid=16432200
|arxiv=
|pmc=1345710
|bibcode=
|doi=10.1073/pnas.0510310103 }}</ref>
|doi=10.1038/nature01626
|pmid=12695777
|accessdate=9 August 2020 }}</ref> "Since regulation of gene expression ''in vivo'' mainly occurs at the transcriptional level, identifying the location of genetic regulatory elements is a key to understanding the machinery regulating gene transcription. A major goal of current genome research is to identify the locations of all gene regulatory elements, including promoters, enhancers, silencers, insulators and boundary elements, and to analyze their relationship to the current annotation of human genes."<ref name=ENCODE>{{ cite journal
|author=The ENCODE Project Consortium
|title=The ENCODE (ENCyclopedia of DNA Elements) Project
|journal=Science
|date=22 October 2004
|volume=306
|issue=5696
|pages=636-640
|url=https://www.ncbi.nlm.nih.gov/pubmed/15499007
|arxiv=
|bibcode=
|doi=10.1126/science.1105136
|pmid=15499007
|accessdate=9 August 2020 }}</ref><ref name=ENCODE1>{{ cite journal
|author=The ENCODE Project Consortium
|title=Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project
|journal=Nature
|date=14 June 2007
|volume=447
|issue=7146
|pages=799-816
|url=https://www.ncbi.nlm.nih.gov/pubmed/17571346
|arxiv=
|bibcode=
|doi=10.1038/nature05874
|pmid=17571346
|accessdate=9 August 2020 }}</ref> Although "many genome-wide strategies have been developed for identifying functional elements", "no method yet has the resolution to precisely identify all regulatory elements or can be readily applied to the entire human genome."<ref name=Wang>{{ cite journal
|author=Ya-Mei Wang, Ping Zhou, Li-Yong Wang, Zhen-Hua Li, Yao-Nan Zhang, and Yu-Xiang Zhang
|title=Correlation Between DNase I Hypersensitive Site Distribution and Gene Expression in HeLa S3 Cells
|journal=PLoS One
|date=10 August 2012
|volume=7
|issue=8
|pages=e42414
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3416863/#pone.0042414-The1
|arxiv=
|bibcode=
|doi=10.1371/journal.pone.0042414
|pmid=22900019
|accessdate=9 August 2020 }}</ref>


There is one CRISPRi-validated cis-regulatory element on 19q13.43: Gene ID: 116286197 (LOC116286197). There are also four Sharpr-MPRA regulatory regions: Gene ID: 112553117 (LOC112553117, Sharpr-MPRA regulatory region 1998), Gene ID: 112553119 (LOC112553119, Sharpr-MPRA regulatory region 10473), Gene ID: 112577453 (LOC112577453, Sharpr-MPRA regulatory region 7872), and Gene ID: 112577454 (Sharpr-MPRA regulatory region 9894).
The number of CG or GC pairs near the TSS for A1BG appears to be low: between ZSCAN22 and A1BG are 8.2 % CG/GC and between ZNF497 and A1BG are 15 % CG/GC.
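
A percentage of this kind can be computed with a few lines of code. The following is a minimal sketch, assuming that "% CG/GC" means the share of overlapping dinucleotide positions that read CG or GC; the function name <code>cg_gc_percent</code> and the toy sequence are hypothetical and are not taken from the study.

```python
def cg_gc_percent(seq: str) -> float:
    """Percent of overlapping dinucleotide positions reading CG or GC.

    This definition is an assumption; the text does not state how its
    percentages were calculated.
    """
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    # Every pair of adjacent nucleotides, overlapping by one.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    hits = sum(1 for pair in pairs if pair in ("CG", "GC"))
    return 100.0 * hits / len(pairs)


toy = "ACGTGCAA"  # illustrative sequence only, not A1BG promoter sequence
print(round(cg_gc_percent(toy), 1))
```

A different convention, such as counting non-overlapping pairs or dividing by sequence length, would give different numbers, so the definition should be fixed before comparing regions.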


==19q13.43==
{{main|Genes on 19q13.43}}

==Regulatory elements and regions==
{{main|A1BG regulatory elements and regions}}

===DNase I hypersensitive sites===

"This genomic region represents a DNase I hypersensitive site (DHS) that was predicted to be an enhancer by the ENCODE (ENCyclopedia Of DNA Elements) project based on various combinations of H3K27 acetylation and binding of p300, GATA1 and RNA polymerase II in K562 erythroleukemia cells. It was validated as a high-confidence cis-regulatory element for the ZNF582 (zinc finger protein 582) gene on chromosome 19 based on multiplex CRISPR/Cas9-mediated perturbation in K562 cells."<ref name=RefSeq116286197>{{ cite web
|author=RefSeq
|title=LOC116286197 CRISPRi-validated cis-regulatory element chr19.6329 [ Homo sapiens (human) ]
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=November 2019
|url=https://www.ncbi.nlm.nih.gov/gene/116286197
|accessdate=25 July 2020 }}</ref>
 
Gene ID: 116286197 CRISPRi-validated cis-regulatory element chr19.6329 is at NC_000019.10 (56186901..56187499).<ref name=RefSeq116286197/> Gene ID: 147948 ZNF582 is at NC_000019.10 (56382751..56393585, complement).<ref name=RefSeq147948/> The CRISPRi-validated cis-regulatory element chr19.6329 is (56382751 - 56186901) = 195850 nts from the beginning of ZNF582.
 
===Transcriptional regulatory regions===
 
"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in K562 erythroleukemia cells (group: K562 Activating DNase unmatched - State 1:Tss, active promoter, TSS/CpG island region), with weaker activation in HepG2 liver carcinoma cells (group: HepG2 Activating DNase matched - State 1:Tss)."<ref name=RefSeq112553117>{{ cite web
|author=RefSeq
|title=LOC112553117 Sharpr-MPRA regulatory region 1998 [ Homo sapiens (human) ]
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=June 2018
|url=https://www.ncbi.nlm.nih.gov/gene/112553117
|accessdate=25 July 2020 }}</ref>


"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in HepG2 liver carcinoma cells (group: HepG2 Activating DNase matched - State 5:Enh, candidate strong enhancer, open chromatin). It also displayed weak repressive activity by Sharpr-MPRA in K562 erythroleukemia cells (group: K562 Repressive non-DNase unmatched - State 24:Quies, heterochromatin/dead zone)."<ref name=RefSeq112553119>{{ cite web
|author=RefSeq
|title=Sharpr-MPRA regulatory region 10473 [ Homo sapiens (human) ]
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=June 2018
|url=https://www.ncbi.nlm.nih.gov/gene/112553119
|accessdate=16 July 2020 }}</ref>

"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in both HepG2 liver carcinoma cells (group: HepG2 Activating DNase unmatched - State 1:Tss, active promoter, TSS/CpG island region) and K562 erythroleukemia cells (group: K562 Activating DNase unmatched - State 1:Tss)."<ref name=RefSeq112577453>{{ cite web
|author=RefSeq
|title=Sharpr-MPRA regulatory region 7872 [ Homo sapiens (human) ]
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=June 2018
|url=https://www.ncbi.nlm.nih.gov/gene/112577453
|accessdate=1 August 2020 }}</ref>

"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in K562 erythroleukemia cells (group: K562 Activating DNase unmatched - State 1:Tss, active promoter, TSS/CpG island region), with weaker activation in HepG2 liver carcinoma cells (group: HepG2 Activating DNase matched - State 1:Tss)."<ref name=RefSeq112577454>{{ cite web
|author=RefSeq
|title=Sharpr-MPRA regulatory region 9894 [ Homo sapiens (human) ]
|publisher=National Center for Biotechnology Information, U.S. National Library of Medicine
|location=8600 Rockville Pike, Bethesda MD, 20894 USA
|date=June 2018
|url=https://www.ncbi.nlm.nih.gov/gene/112577454
|accessdate=16 July 2020 }}</ref>

==Functions of A1BG==

"Receptors of the leukocyte receptor cluster (LRC) play a range of important functions in the human immune system."<ref name=Guselnikov>{{ cite journal
|author=Sergey V Guselnikov and Alexander V Taranin
|title=Unraveling the LRC Evolution in Mammals: IGSF1 and A1BG Provide the Keys
|journal=Genome Biology and Evolution
|date=1 June 2019
|volume=11
|issue=6
|pages=1586-1601 }}</ref>
 
"The growth hormone-regulated transcription factors STAT5 and BCL6 coordinately regulate sex differences in mouse liver, primarily through effects in male liver, where male-biased genes are upregulated and many female-biased genes are actively repressed."<ref name=Conforto>{{ cite journal
|author=Tara L. Conforto, Yijing Zhang, Jennifer Sherman, and David J. Waxman
|title=Impact of CUX2 on the Female Mouse Liver Transcriptome: Activation of Female-Biased Genes and Repression of Male-Biased Genes
|journal=Molecular and Cellular Biology
|date=November 2012
|volume=32
|issue=22
|pages=4611–4627
|url=https://mcb.asm.org/content/mcb/32/22/4611.full.pdf
|arxiv=
|bibcode=
|doi=10.1128/MCB.00886-12
|pmid=
|accessdate=8 August 2020 }}</ref> "CUX2, a highly female-specific liver transcription factor, contributes to an analogous regulatory network in female liver. Adenoviral overexpression of CUX2 in male liver induced 36% of female-biased genes and repressed 35% of male-biased genes. In female liver, CUX2 small interfering RNA (siRNA) preferentially induced genes repressed by adenovirus expressing CUX2 (adeno-CUX2) in male liver, and it preferentially repressed genes induced by adeno-CUX2 in male liver. CUX2 binding in female liver chromatin was enriched at sites of male-biased DNase hypersensitivity and at genomic regions showing male-enriched STAT5 binding. CUX2 binding was also enriched near genes repressed by adeno-CUX2 in male liver or induced by CUX2 siRNA in female liver but not at genes induced by adeno-CUX2, indicating that CUX2 binding is preferentially associated with gene repression. Nevertheless, direct CUX2 binding was seen at several highly female-specific genes that were positively regulated by CUX2, including A1bg [A1BG in humans], Cyp2b9, Cyp3a44, Tox [TOX in humans], and Trim24 [TRIM24 in humans]."<ref name=Conforto/>
 
==A boxes==
{{main|A box gene transcriptions}}
There is one A box on the positive strand in the negative direction (from ZSCAN22 to A1BG): 3'-TGACTCT-5' at 2788.
 
There is one A box complement on the negative strand in the negative direction: 3'-ACTGAGA-5' at 2788.
 
There is one A box inverse complement on the negative strand in the positive direction: 3'-AGAGTCA-5' at 2613.
 
There is one A box inverse on the positive strand in the positive direction: 3'-TCTCAGT-5' at 2613.
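
The four forms of a motif used throughout these lists (the motif itself, its complement, its inverse, and its inverse complement) can be derived mechanically. The following is a minimal sketch; the function names are hypothetical, and the only data taken from the text is the A box sequence and its three variants listed above.

```python
# Watson-Crick pairing for DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(motif: str) -> str:
    """Base-by-base pairing partner, read in the same orientation."""
    return motif.translate(COMPLEMENT)


def inverse(motif: str) -> str:
    """The same bases in reverse order."""
    return motif[::-1]


def inverse_complement(motif: str) -> str:
    """Complement, then reverse the order."""
    return complement(motif)[::-1]


a_box = "TGACTCT"  # written 3'-TGACTCT-5' in the text
for label, form in [
    ("A box", a_box),
    ("complement", complement(a_box)),
    ("inverse", inverse(a_box)),
    ("inverse complement", inverse_complement(a_box)),
]:
    print(f"{label}: 3'-{form}-5'")
```

Each later section searches for these four forms on both strands and in both directions, which is why most motifs appear in up to sixteen list entries.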
 
==ACGT-containing elements==
{{main|ACGT-containing element gene transcriptions}}
# ACGT elements, negative strand, negative direction: 24, 3'-ACGT-5' at 150, 3'-ACGT-5' at 1030, 3'-ACGT-5' at 1321, 3'-ACGT-5' at 1337, 3'-ACGT-5' at 1345, 3'-ACGT-5' at 1470, 3'-ACGT-5' at 1494, 3'-ACGT-5' at 1535, 3'-ACGT-5' at 1717, 3'-ACGT-5' at 1974, 3'-ACGT-5' at 1998, 3'-ACGT-5' at 2081, 3'-ACGT-5' at 2400, 3'-ACGT-5' at 2424, 3'-ACGT-5' at 2735, 3'-ACGT-5' at 2759, 3'-ACGT-5' at 2863, 3'-ACGT-5' at 3287, 3'-ACGT-5' at 3429, 3'-ACGT-5' at 3771, 3'-ACGT-5' at 4245, 3'-ACGT-5' at 4315, 3'-ACGT-5' at 4330, 3'-ACGT-5' at 4338.
# ACGT elements, negative strand, positive direction: 2, 3'-ACGT-5' at 569, 3'-ACGT-5' at 3254.
# ACGT elements, positive strand, negative direction: 4, 3'-ACGT-5' at 342, 3'-ACGT-5' at 531, 3'-ACGT-5' at 1772, 3'-ACGT-5' at 4236.
# ACGT elements, positive strand, positive direction: 44, 3'-ACGT-5' at 192, 3'-ACGT-5' at 224, 3'-ACGT-5' at 436, 3'-ACGT-5' at 531, 3'-ACGT-5' at 546, 3'-ACGT-5' at 656, 3'-ACGT-5' at 783, 3'-ACGT-5' at 1119, 3'-ACGT-5' at 1218, 3'-ACGT-5' at 1370, 3'-ACGT-5' at 1470, 3'-ACGT-5' at 1505, 3'-ACGT-5' at 1613, 3'-ACGT-5' at 1786, 3'-ACGT-5' at 1820, 3'-ACGT-5' at 1935, 3'-ACGT-5' at 2063, 3'-ACGT-5' at 2204, 3'-ACGT-5' at 2326, 3'-ACGT-5' at 2334, 3'-ACGT-5' at 2350, 3'-ACGT-5' at 2681, 3'-ACGT-5' at 2690, 3'-ACGT-5' at 2719, 3'-ACGT-5' at 2743, 3'-ACGT-5' at 2800, 3'-ACGT-5' at 2857, 3'-ACGT-5' at 2960, 3'-ACGT-5' at 3061, 3'-ACGT-5' at 3070, 3'-ACGT-5' at 3142, 3'-ACGT-5' at 3230, 3'-ACGT-5' at 3268, 3'-ACGT-5' at 3279, 3'-ACGT-5' at 3320, 3'-ACGT-5' at 3341, 3'-ACGT-5' at 3400, 3'-ACGT-5' at 3459, 3'-ACGT-5' at 3464, 3'-ACGT-5' at 3829, 3'-ACGT-5' at 3883, 3'-ACGT-5' at 3960, 3'-ACGT-5' at 4315, 3'-ACGT-5' at 4341.
 
ACGT-containing elements include these metal responsive elements:
# complement, negative strand, negative direction: 6, 3'-ACGTGAG-5' at 1348, 3'-ACGTGAG-5' at 2001, 3'-ACGTGAG-5' at 2427, 3'-ACGTGGG-5' at 2762, 3'-ACGTGAG-5' at 3290, and 3'-ACGTGAG-5' at 4341.
# complement, positive strand, negative direction: 6, 3'-ACGTGTG-5' at 549, 3'-ACGTGTG-5' at 1221, 3'-ACGTGAG-5' at 1373, 3'-ACGTGAG-5' at 1473, 3'-ACGTGTG-5' at 2963, 3'-ACGTGGG-5' at 3323.
# inverse, negative strand, negative direction: 2, 3'-CTCACGT-5' at 1470, 3'-CACACGT-5' at 2863.
# inverse, positive strand, negative direction: 2, 3'-CACACGT-5' at 531, 3'-CTCACGT-5' at 1772.
# inverse, positive strand, positive direction: 6, 3'-CGCACGT-5' at 546, 3'-CGCACGT-5' at 1218, 3'-CTCACGT-5' at 1786, 3'-CTCACGT-5' at 2326, 3'-CCCACGT-5' at 2800, 3'-CCCACGT-5' at 3883.
 
ACGT-containing elements include these cAMP response elements (CRE):
# negative strand in the negative direction (from ZSCAN22 to A1BG): 1, 3'-TGACGTCA-5' at 4317.
 
==AGC boxes==
{{Main|AGC box gene transcriptions}}
An inverse AGC box occurs on the negative strand in the negative direction, 3'-CCGCCGA-5' at 1754 nts from ZSCAN22 toward A1BG, in the distal promoter, with its complement on the positive strand in the negative direction.
 
==Angiotensinogen core promoter elements==
{{main|Angiotensinogen core promoter element gene transcriptions}}
# AGCE, negative strand, negative direction, looking for 3'-A/C-T-C/T-G-T-G-5': 4, 3'-ATTGTG-5' at 340, 3'-ATCGTG-5' at 2096, 3'-CTTGTG-5' at 3669, 3'-CTCGTG-5' at 3914.
# AGCE, negative strand, positive direction, looking for 3'-A/C-T-C/T-G-T-G-5': 2, 3'-ATTGTG-5' at 2679, 3'-CTCGTG-5' at 4376.
# AGCE, positive strand, negative direction, looking for 3'-A/C-T-C/T-G-T-G-5': 0.
# AGCE, positive strand, positive direction, looking for 3'-A/C-T-C/T-G-T-G-5': 6, 3'-CTCGTG-5' at 855, 3'-CTCGTG-5' at 955, 3'-CTCGTG-5' at 1207, 3'-CTCGTG-5' at 1627, 3'-CTTGTG-5' at 3095, 3'-CTCGTG-5' at 3739.
# AGCEc, negative strand, negative direction, looking for 3'-G/T-A-A/G-C-A-C-5': 0.
# AGCEc, negative strand, positive direction, looking for 3'-G/T-A-A/G-C-A-C-5': 6, 3'-GAGCAC-5' at 855, 3'-GAGCAC-5' at 955, 3'-GAGCAC-5' at 1207, 3'-GAGCAC-5' at 1627, 3'-GAACAC-5' at 3095, 3'-GAGCAC-5' at 3739.
# AGCEc, positive strand, negative direction, looking for 3'-G/T-A-A/G-C-A-C-5': 4, 3'-TAACAC-5' at 340, 3'-TAGCAC-5' at 2096, 3'-GAACAC-5' at 3669, 3'-GAGCAC-5' at 3914.
# AGCEc, positive strand, positive direction, looking for 3'-G/T-A-A/G-C-A-C-5': 2, 3'-TAACAC-5' at 2679, 3'-GAGCAC-5' at 4376.
# AGCEci, negative strand, negative direction, looking for 3'-C-A-C-A/G-A-G/T-5': 2, 3'-CACGAT-5' at 336, 3'-CACGAG-5' at 4403.
# AGCEci, negative strand, positive direction, looking for 3'-C-A-C-A/G-A-G/T-5': 1, 3'-CACGAG-5' at 243.
# AGCEci, positive strand, negative direction, looking for 3'-C-A-C-A/G-A-G/T-5': 10, 3'-CACGAG-5' at 435, 3'-CACGAG-5' at 572, 3'-CACGAG-5' at 708, 3'-CACGAG-5' at 1182, 3'-CACAAT-5' at 1721, 3'-CACAAG-5' at 2244, 3'-CACGAG-5' at 3232, 3'-CACAAT-5' at 3515, 3'-CACAAG-5' at 3634, 3'-CACGAG-5' at 4472.
# AGCEci, positive strand, positive direction, looking for 3'-C-A-C-A/G-A-G/T-5': 3, 3'-CACAAG-5' at 107, 3'-CACGAG-5' at 2090, 3'-CACGAG-5' at 3152.
# AGCEi, negative strand, negative direction, looking for 3'-G-T-G-C/T-T-A/C-5': 10, 3'-GTGCTC-5' at 435, 3'-GTGCTC-5' at 572, 3'-GTGCTC-5' at 708, 3'-GTGCTC-5' at 1182, 3'-GTGTTA-5' at 1721, 3'-GTGTTC-5' at 2244, 3'-GTGCTC-5' at 3232, 3'-GTGTTA-5' at 3515, 3'-GTGTTC-5' at 3634, 3'-GTGCTC-5' at 4472.
# AGCEi, negative strand, positive direction, looking for 3'-G-T-G-C/T-T-A/C-5': 3, 3'-GTGTTC-5' at 107, 3'-GTGCTC-5' at 2090, 3'-GTGCTC-5' at 3152.
# AGCEi, positive strand, negative direction, looking for 3'-G-T-G-C/T-T-A/C-5': 2, 3'-GTGCTA-5' at 336, 3'-GTGCTC-5' at 4403.
# AGCEi, positive strand, positive direction, looking for 3'-G-T-G-C/T-T-A/C-5': 0.
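
Consensus sequences with alternatives at a position, such as 3'-A/C-T-C/T-G-T-G-5', map naturally onto regular expressions. The following is a minimal sketch, not the program used for the searches above; all function names and the toy sequence are hypothetical. It assumes that listed positions are 1-based positions of the last nucleotide of a match, which is consistent with 3'-ACGT-5' at 1345 and 3'-ACGTGAG-5' at 1348 in the ACGT lists but is not stated in the text.

```python
import re

PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def parse_consensus(text):
    """Split 'A/C-T-C/T-G-T-G' into one sorted list of allowed bases per position."""
    return [sorted(position.split("/")) for position in text.split("-")]


def complement(positions):
    """Swap each allowed base for its pairing base, position by position."""
    return [sorted(PAIR[base] for base in position) for position in positions]


def inverse(positions):
    """Reverse the order of the positions."""
    return positions[::-1]


def to_regex(positions):
    """Render the positions as a regular expression such as '[AC]T[CT]GTG'."""
    return "".join(p[0] if len(p) == 1 else "[" + "".join(p) + "]" for p in positions)


def find_ends(positions, sequence):
    """1-based end positions of all matches; the lookahead keeps overlapping hits."""
    pattern = re.compile("(?=(" + to_regex(positions) + "))")
    return [m.start() + len(positions) for m in pattern.finditer(sequence)]


agce = parse_consensus("A/C-T-C/T-G-T-G")
variants = {
    "AGCE": agce,
    "AGCEc": complement(agce),
    "AGCEi": inverse(agce),
    "AGCEci": inverse(complement(agce)),
}
for label, variant in variants.items():
    print(label, to_regex(variant))

print(find_ends(agce, "GGATTGTGCC"))  # toy sequence, illustrative only
```

The derived patterns correspond to the four consensus forms named in the list above (AGCE, AGCEc, AGCEi and AGCEci).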
 
==ATA boxes==
{{main|ATA box gene transcriptions}}
 
===Core promoters===
 
There is one inverse ATA box on the negative strand in the negative direction: 3'-AAATAA-5' at 4537, which lies inside A1BG because the TSS is at 4460 nts from ZSCAN22.
 
===Proximal promoters===
 
There is the following inverse ATA box on the positive strand, negative direction: 3'-AAATAA-5' at 4221.
 
There is one inverse and inverse complement between 4050 and 4300 in the positive direction: 3'-AAATAA-5' at 4142, and 3'-TTTATT-5' at 4142.
 
===Distal promoters===
 
There is the following ATA box on the negative strand in the negative direction: 1, 3'-AATAAA-5' at 1726 nts from ZSCAN22.
 
There are the following ATA boxes on the positive strand in the negative direction: 3, 3'-AATAAA-5' at 3014, 3'-AATAAA-5' at 3335, and 3'-AATAAA-5' at 4072.
 
There are the following inverse ATA boxes on the positive strand, negative direction: 4, 3'-AAATAA-5' at 3013, 3'-AAATAA-5' at 3334, 3'-AAATAA-5' at 4071, 3'-AAATAA-5' at 4075.
 
There is the following ATA box on the negative strand in the positive direction: 1, 3'-AATAAA-5' at 3427. It has a complement on the positive strand in the positive direction: 1, 3'-TTATTT-5' at 3427.
 
There is another inverse complement ATA box on the negative strand in the positive direction in distal promoter: 3'-TTTATT-5' at 2347. It also has an inverse in the distal promoter: 3'-AAATAA-5' at 2347.
 
==B boxes==
{{main|B box gene transcriptions}}
While there appear to be at least two B boxes, TGGGCA is one B-box,<ref name=Johnson>{{ cite journal
|author=PA Johnson, D Bunick, NB Hecht
|title=Protein Binding Regions in the Mouse and Rat Protamine-2 Genes
|journal=Biology of Reproduction
|date=1991
|volume=44
|issue=1
|pages=127-134
|url=https://academic.oup.com/biolreprod/article-pdf/44/1/127/10536199/biolreprod0127.pdf
|arxiv=
|bibcode=
|doi=
|pmid=
|accessdate=6 April 2019 }}</ref> where the "mP2 EB fragment used for binding was the 118 nucleotide fragment extending from the ''Dde'' I site at position -140 to the ''Dde'' I site at position -23 [...]. This fragment contains the GC, E, B, CAAT, and TATA boxes."<ref name=Johnson/>
 
# negative strand in the negative direction, looking for 3'-TGGGCA-5', 0.
# negative strand in the positive direction, looking for 3'-TGGGCA-5', 4, 3'-TGGGCA-5' at 27, 3'-TGGGCA-5' at 1945, 3'-TGGGCA-5' at 2894, 3'-TGGGCA-5' at 4180.
# positive strand in the negative direction, looking for 3'-TGGGCA-5', 9, 3'-TGGGCA-5' at 462, 3'-TGGGCA-5' at 902, 3'-TGGGCA-5' at 1114, 3'-TGGGCA-5' at 1359, 3'-TGGGCA-5' at 2438, 3'-TGGGCA-5' at 2773, 3'-TGGGCA-5' at 3301, 3'-TGGGCA-5' at 4040, 3'-TGGGCA-5' at 4191.
# positive strand in the positive direction, looking for 3'-TGGGCA-5', 0.
# complement, negative strand, negative direction, looking for 3'-ACCCGT-5', 9, 3'-ACCCGT-5' at 462, 3'-ACCCGT-5' at 902, 3'-ACCCGT-5' at 1114, 3'-ACCCGT-5' at 1359, 3'-ACCCGT-5' at 2438, 3'-ACCCGT-5' at 2773, 3'-ACCCGT-5' at 3301, 3'-ACCCGT-5' at 4040, 3'-ACCCGT-5' at 4191.
# complement, negative strand, positive direction, looking for 3'-ACCCGT-5', 0.
# complement, positive strand, negative direction, looking for 3'-ACCCGT-5', 0.
# complement, positive strand, positive direction, looking for 3'-ACCCGT-5', 4, 3'-ACCCGT-5' at 27, 3'-ACCCGT-5' at 1945, 3'-ACCCGT-5' at 2894, 3'-ACCCGT-5' at 4180.
# inverse complement, negative strand, negative direction, looking for 3'-TGCCCA-5', 0.
# inverse complement, negative strand, positive direction, looking for 3'-TGCCCA-5', 2, 3'-TGCCCA-5' at 3237, 3'-TGCCCA-5' at 3377.
# inverse complement, positive strand, negative direction, looking for 3'-TGCCCA-5', 4, 3'-TGCCCA-5' at 1458, 3'-TGCCCA-5' at 3854, 3'-TGCCCA-5' at 3883, 3'-TGCCCA-5' at 4251.
# inverse complement, positive strand, positive direction, looking for 3'-TGCCCA-5', 1, 3'-TGCCCA-5' at 3750.
# inverse, negative strand, negative direction, looking for 3'-ACGGGT-5', 4, 3'-ACGGGT-5' at 1458, 3'-ACGGGT-5' at 3854, 3'-ACGGGT-5' at 3883, 3'-ACGGGT-5' at 4251.
# inverse, negative strand, positive direction, looking for 3'-ACGGGT-5', 1, 3'-ACGGGT-5' at 3750.
# inverse, positive strand, negative direction, looking for 3'-ACGGGT-5', 0.
# inverse, positive strand, positive direction, looking for 3'-ACGGGT-5', 2, 3'-ACGGGT-5' at 3237, 3'-ACGGGT-5' at 3377.
 
The other is associated with the human transforming growth factor b1 binding sequences.<ref name=Paratore>{{ cite journal
|author=Amber Paratore Sanchez and Kumar Sharma
|title=Transcription factors in the pathogenesis of diabetic nephropathy
|journal=Expert Reviews in Molecular Medicine
|date=July 2009
|volume=11
|issue=
|pages=e13
|url=https://www.cambridge.org/core/journals/expert-reviews-in-molecular-medicine/article/transcription-factors-in-the-pathogenesis-of-diabetic-nephropathy/5459130CB955272C047982BE21FEE256
|arxiv=
|bibcode=
|doi=10.1017/S1462399409001057
|pmid=
|accessdate=1 October 2018 }}</ref>
 
This second B box has the consensus sequence 3'-TGTCTCA-5' and is designated B1box here.
 
# negative strand in the negative direction, looking for 3'-TGTCTCA-5', 2, 3'-TGTCTCA-5' at 1075, 3'-TGTCTCA-5' at 2445.
# negative strand in the positive direction, looking for 3'-TGTCTCA-5', 2, 3'-TGTCTCA-5' at 2174, 3'-TGTCTCA-5' at 2468.
# positive strand in the negative direction, looking for 3'-TGTCTCA-5', 5, 3'-TGTCTCA-5' at 923, 3'-TGTCTCA-5' at 1089, 3'-TGTCTCA-5' at 2033, 3'-TGTCTCA-5' at 3323, 3'-TGTCTCA-5' at 4373.
# positive strand in the positive direction, looking for 3'-TGTCTCA-5', 0.
# complement, negative strand, negative direction, looking for 3'-ACAGAGT-5', 5, 3'-ACAGAGT-5' at 923, 3'-ACAGAGT-5' at 1089, 3'-ACAGAGT-5' at 2033, 3'-ACAGAGT-5' at 3323, 3'-ACAGAGT-5' at 4373.
# complement, negative strand, positive direction, looking for 3'-ACAGAGT-5', 0.
# complement, positive strand, negative direction, looking for 3'-ACAGAGT-5', 2, 3'-ACAGAGT-5' at 1075, 3'-ACAGAGT-5' at 2445.
# complement, positive strand, positive direction, looking for 3'-ACAGAGT-5', 2, 3'-ACAGAGT-5' at 2174, 3'-ACAGAGT-5' at 2468.
# inverse complement, negative strand, negative direction, looking for 3'-TGAGACA-5', 3, 3'-TGAGACA-5' at 919, 3'-TGAGACA-5' at 1085, 3'-TGAGACA-5' at 2029.
# inverse complement, negative strand, positive direction, looking for 3'-TGAGACA-5', 0.
# inverse complement, positive strand, negative direction, looking for 3'-TGAGACA-5', 0.
# inverse complement, positive strand, positive direction, looking for 3'-TGAGACA-5', 1, 3'-TGAGACA-5' at 2308.
# inverse, negative strand, negative direction, looking for 3'-ACTCTGT-5', 0.
# inverse, negative strand, positive direction, looking for 3'-ACTCTGT-5', 1, 3'-ACTCTGT-5' at 2308.
# inverse, positive strand, negative direction, looking for 3'-ACTCTGT-5', 3, 3'-ACTCTGT-5' at 919, 3'-ACTCTGT-5' at 1085, 3'-ACTCTGT-5' at 2029.
# inverse, positive strand, positive direction, looking for 3'-ACTCTGT-5', 0.
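
In the lists above, a motif found on one strand always has its complement at the same positions on the other strand (for example, 3'-TGTCTCA-5' on the positive strand and 3'-ACAGAGT-5' on the negative strand, both in the negative direction). The following is a minimal sketch of why this holds, assuming both strands are indexed in the same coordinate frame and that positions mark the last nucleotide of a match; the names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def paired_strand(strand: str) -> str:
    """The opposite strand, kept in the same coordinate frame (no reversal)."""
    return strand.translate(COMPLEMENT)


def end_positions(motif: str, strand: str):
    """1-based end positions of every occurrence, overlapping ones included."""
    hits = []
    start = strand.find(motif)
    while start != -1:
        hits.append(start + len(motif))
        start = strand.find(motif, start + 1)
    return hits


toy_positive = "AATGTCTCAGGTGTCTCA"  # illustrative only
toy_negative = paired_strand(toy_positive)

b1box = "TGTCTCA"
b1box_complement = b1box.translate(COMPLEMENT)

print(end_positions(b1box, toy_positive))
print(end_positions(b1box_complement, toy_negative))
```

Because the two result lists are always identical, half of the sixteen searches per motif are redundant as a cross-check rather than as new information.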
 
==B recognition elements==
{{main|Factor II B recognition element gene transcriptions}}
The factor II B recognition element is BRE<sup>u</sup>.
 
Negative strand in the negative direction there are 3: 3'-CCACGCC-5' at 380, 3'-CCGCGCC-5' at 1762, and 3'-CCACGCC-5' at 2197 in the distal promoter.

Complement, negative strand, negative direction there is 1: 3'-CCTGCGG-5' at 1153.
 
Inverse complement, positive strand, negative direction there are 4: 3'-GGCGTGG-5' at 1244, 3'-GGCGCGG-5' at 1762, 3'-GGCGTGG-5' at 1897, and 3'-GGCGTGG-5' at 3047.
 
Negative strand in the positive direction there are 3: 3'-GCACGCC-5', 1302, 3'-GGACGCC-5', 1672, 3'-GGGCGCC-5', 1769.
 
Positive strand in the positive direction there are 3: 3'-CCACGCC-5', 489, 3'-CGACGCC-5', 1033, 3'-CCACGCC-5', 1764.
 
Inverse complement, negative strand, positive direction there is 1: 3'-GGCGCCC-5', 1770.
 
Inverse complement, positive strand, positive direction there are 4: 3'-GGCGCGC-5', 682, 3'-GGCGCCG-5', 1338, 3'-GGCGCCG-5', 1438, 3'-GGCGTGG-5', 2566.
 
==CAAT boxes==
{{Main|CAAT box gene transcriptions}}
There are no CAAT boxes in either promoter.
 
==CAREs==
{{main|CARE gene transcriptions}}
A CARE occurs in the negative direction: 3'-CAACTC-5' at 86, possibly associated with ZSCAN22. Inverse CAREs also occur in the negative direction: 3'-CTCAAC-5' at 1406, 3'-CTCAAC-5' at 2592, 3'-CTCAAC-5' at 2704, 3'-CTCAAC-5' at 3115, and 3'-CTCAAC-5' at 4096.

A CARE occurs in the positive direction: 3'-CAACTC-5' at 3292. Inverse CAREs also occur in the positive direction: 3'-CTCAAC-5' at 1406, 3'-CTCAAC-5' at 1621, and 3'-CTCAAC-5' at 3290.
 
==CArG boxes==
{{main|CArG box gene transcriptions}}
 
There is a more general CArG box, 3'-CATTAAAAGG-5', at 3441 from ZSCAN22, or -1019 nts from the TSS of A1BG in the negative direction on the positive strand in the distal promoter.
 
A second more general CArG box, 3'-CAAAAAAAAG-5', at 1399 from ZSCAN22, or -3061 nts from the A1BG TSS may be a CArG box for ZSCAN22 in the negative direction on the positive strand in the distal promoter.
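
The offsets relative to the TSS quoted above follow from subtracting the TSS position from the motif position. The following is a minimal sketch using the value of 4460 nts given in the text for the A1BG TSS in the negative direction; the constant and function names are hypothetical.

```python
# A1BG TSS position counted from the last nucleotide of ZSCAN22 (from the text).
A1BG_TSS_FROM_ZSCAN22 = 4460


def offset_from_tss(position: int, tss: int = A1BG_TSS_FROM_ZSCAN22) -> int:
    """Negative values lie before the TSS, positive values lie past it."""
    return position - tss


for position in (3441, 1399):
    print(position, offset_from_tss(position))
```

The same subtraction shows that a motif at 4537, such as the inverse ATA box listed under core promoters, lies 77 nts past the TSS and therefore inside A1BG.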
 
==C boxes==
{{main|C box gene transcriptions}}
 
===Proximal promoters===
 
Inverse complement, negative strand, negative direction there is 1: 3'-ACATCA-5', 4124.
 
There is one C box 3'-ACATCA-5' at 4116 nts in the positive direction.
 
===Distal promoters===
 
There are four C boxes: 3'-AGTAGT-5' at 2888, 3'-AGTAGT-5' at 2944, 3'-AGTAGT-5' at 3418, and 3'-AGTAGT-5' at 3521 on the negative strand in the negative direction and its complement on the positive strand.
 
Inverse complement, negative strand, negative direction there are 2: 3'-ACATCA-5', 2340, 3'-ACATCA-5', 2541.
 
There is one complement C box: 3'-TCATCA-5' at 3251 on the negative strand in the positive direction and its complement on the positive strand.
 
Inverse, negative strand, positive direction, there is 1: 3'-TGATGA-5', 2144.
 
Positive strand in the positive direction there is 1: 3'-AGTAGT-5', 3251.
 
==CENP-B boxes==
{{main|CENP-B box gene transcriptions}}
There are no CENP-B boxes in either promoter.
 
==CGCG boxes==
{{main|CGCG box gene transcriptions}}
Negative strand in the negative direction there are 2: 3'-GCGCGT-5', 161, 3'-CCGCGC-5', 1761, in the distal promoter.
 
Positive strand in the negative direction there is 1: 3'-GCGCGG-5', 1762, in the distal promoter.
 
Negative strand in the positive direction there are 8: 3'-GCGCGT-5', 543, 3'-CCGCGC-5', 681, 3'-GCGCGC-5', 683, 3'-ACGCGG-5', 871, 3'-ACGCGG-5', 971, 3'-CCGCGG-5', 1337, 3'-CCGCGG-5', 1437, 3'-CCGCGC-5', 1650, in the distal promoter.
 
Positive strand in the positive direction there are 22: 3'-CCGCGC-5', 161, 3'-ACGCGG-5', 452, 3'-CCGCGC-5', 542, 3'-GCGCGC-5', 682, 3'-GCGCGT-5', 684, 3'-CCGCGT-5', 876, 3'-CCGCGT-5', 976, 3'-CCGCGT-5', 1046, 3'-ACGCGG-5', 1078, 3'-ACGCGG-5', 1162, 3'-CCGCGC-5', 1214, 3'-ACGCGG-5', 1246, 3'-CCGCGT-5', 1298, 3'-ACGCGT-5', 1314, 3'-ACGCGG-5', 1354, 3'-ACGCGG-5', 1398, 3'-ACGCGT-5', 1414, 3'-ACGCGG-5', 1454, 3'-ACGCGG-5', 1498, 3'-ACGCGT-5', 1523, 3'-CCGCGT-5', 1550, 3'-CCGCGG-5', 1769, in the distal promoter.
 
==CRE boxes==
{{main|CRE box gene transcriptions}}
Negative strand in the negative direction there is 1: 3'-TGACGTCA-5', 4317, and its complement in the proximal promoter.
 
==D boxes==
{{main|C and D box gene transcriptions}}
There is one D box in the distal promoter: 3'-AGTCTG-5' at 2947 on the negative strand in the negative direction and its complement on the positive strand.
 
Positive strand in the negative direction there is 1: 3'-AGTCTG-5', 1355.
 
Inverse complement, positive strand, negative direction there are 2: 3'-CAGACT-5', 15, 3'-CAGACT-5', 1616.
 
There is one D box in the distal promoter: 3'-AGTCTG-5' at 3923 on the negative strand in the positive direction and its complement on the positive strand.
 
Inverse complement, negative strand, positive direction there are 2: 3'-CAGACT-5', 1744, 3'-CAGACT-5', 2416.
 
Inverse complement, positive strand, positive direction there are 3: 3'-CAGACT-5', 2943, 3'-CAGACT-5', 3006, 3'-CAGACT-5', 3924.
 
==Downstream B recognition elements==
{{main|Downstream TFIIB recognition element gene transcriptions}}
# negative strand in the negative direction, looking for 3'-A/G-T-A/G/T-G/T-G/T-G/T-G/T-5', 59, 3'-ATTTTGT-5' at 68, 3'-ATATGTT-5' at 113, 3'-GTTTTGT-5' at 166, 3'-ATATTTT-5' at 183, 3'-ATATTTT-5' at 222, 3'-GTTTTGG-5' at 259, 3'-ATGTTTT-5' at 485, 3'-GTTTTTT-5' at 487, 3'-ATTGGGG-5' at 616, 3'-ATGTTTT-5' at 637, 3'-GTTTTTT-5' at 639, 3'-ATGTTTT-5' at 771, 3'-GTTTTTT-5' at 773, 3'-GTGTGGT-5' at 883, 3'-GTTTTTT-5' at 928, 3'-GTTTTTT-5' at 1094, 3'-ATGTTTT-5' at 1228, 3'-GTTTTTT-5' at 1230, 3'-GTTTTTG-5' at 1386, 3'-GTTTGTT-5' at 1392, 3'-GTTTTTT-5' at 1396, 3'-GTTGGGT-5' at 1409, 3'-GTTGGGT-5' at 1516, 3'-GTTTGTG-5' at 1540, 3'-ATGTTTT-5' at 1880, 3'-GTTTTTT-5' at 1882, 3'-GTTTTTT-5' at 2038, 3'-ATGTTTT-5' at 2182, 3'-GTTTTTT-5' at 2184, 3'-ATGTTTT-5' at 2307, 3'-GTTTTTT-5' at 2309, 3'-GTGTGGT-5' at 2419, 3'-GTTTGTT-5' at 2484, 3'-GTTTGTT-5' at 2488, 3'-ATATGTT-5' at 2642, 3'-ATGTTTT-5' at 2644, 3'-GTGGGGT-5' at 2764, 3'-GTTGGGT-5' at 2846, 3'-ATATTTG-5' at 2875, 3'-GTAGTTT-5' at 2890, 3'-ATTTTTT-5' at 3026, 3'-GTGGGTT-5' at 3136, 3'-ATTTTTG-5' at 3165, 3'-GTATTTT-5' at 3171, 3'-GTTTTTG-5' at 3328, 3'-ATTTGTT-5' at 3338, 3'-ATTTGGT-5' at 3365, 3'-ATTTGGT-5' at 3484, 3'-GTAGTTG-5' at 3523, 3'-ATGGTGG-5' at 3740, 3'-GTGTTTT-5' at 3767, 3'-ATGTTTT-5' at 4066, 3'-GTTTTTT-5' at 4068, 3'-GTTGTGT-5' at 4196, 3'-ATGTTTT-5' at 4216, 3'-GTTTTTT-5' at 4218, 3'-GTTTTTT-5' at 4378, 3'-GTGGGGT-5' at 4446, 3'-GTAGGTG-5' at 4458 and their complements.
# negative strand in the positive direction, looking for 3'-A/G-T-A/G/T-G/T-G/T-G/T-G/T-5', 11, 3'-GTGGGGG-5' at 56, 3'-ATTTTTT-5' at 2451, 3'-GTGTTGG-5' at 2816, 3'-ATGTTTG-5' at 3339, 3'-GTGGTGG-5' at 3816, 3'-GTGTGGT-5' at 3967, 3'-GTGGTGT-5' at 3969, 3'-GTGGTTT-5' at 4108, 3'-ATTGTTG-5' at 4173, 3'-ATGGGGG-5' at 4225, 3'-GTGGGGT-5' at 4397 and their complements.
# positive strand in the negative direction, looking for 3'-A/G-T-A/G/T-G/T-G/T-G/T-G/T-5', 31, 3'-ATATGTT-5' at 43, 3'-ATATGGG-5' at 78, 3'-ATGGGGT-5' at 204, 3'-ATGTTTT-5' at 215, 3'-ATATGGT-5' at 606, 3'-ATGGTGT-5' at 608, 3'-ATGTGGT-5' at 788, 3'-GTGGTGG-5' at 790, 3'-GTGGTGT-5' at 793, 3'-ATTGGGT-5' at 1047, 3'-GTGGGTG-5' at 1163, 3'-GTGGTGG-5' at 1247, 3'-GTGGTGT-5' at 1477, 3'-GTGGTGG-5' at 1900, 3'-GTGGTGG-5' at 1903, 3'-GTGGGTG-5' at 2332, 3'-GTGTGGT-5' at 2659, 3'-GTGGTGG-5' at 2661, 3'-ATATTTT-5' at 2853, 3'-GTGGTGG-5' at 3050, 3'-GTGTGGT-5' at 3187, 3'-GTGGTGG-5' at 3189, 3'-GTGGTGG-5' at 3192, 3'-GTGGGTG-5' at 3195, 3'-ATTGGTT-5' at 3531, 3'-GTGGTTG-5' at 3605, 3'-ATGGGGT-5' at 3802, 3'-ATGTGGT-5' at 3811, 3'-GTGTTGG-5' at 3942, 3'-GTTGGTT-5' at 3944, 3'-ATGGTGG-5' at 4110 and their complements.
# positive strand in the positive direction, looking for 3'-A/G-T-A/G/T-G/T-G/T-G/T-G/T-5', 19, 3'-GTGGGTG-5' at 72, 3'-GTAGGTG-5' at 631, 3'-GTAGGTG-5' at 700, 3'-GTGGTGG-5' at 704, 3'-ATGGGGT-5' at 1891, 3'-GTTGGGT-5' at 2015, 3'-GTGGGGG-5' at 2020, 3'-GTTGGTG-5' at 2122, 3'-ATATGGT-5' at 2591, 3'-ATGGTGT-5' at 2600, 3'-GTGTGGT-5' at 2603, 3'-ATGGTGG-5' at 2759, 3'-GTGTGGG-5' at 2965, 3'-ATAGGGT-5' at 3386, 3'-GTAGGGT-5' at 3631, 3'-GTGTGGT-5' at 3825, 3'-GTTTGTG-5' at 4257, 3'-GTGGGGT-5' at 4286, 3'-GTGGGGT-5' at 4328 and their complements.
# inverse, negative strand, negative direction, is SuccessablesdBREi--.bas, looking for 3'-G/T-G/T-G/T-G/T-A/G/T-T-A/G-5', 44, 3'-TTTGTTA-5' at 230, 3'-TTTTGTA-5' at 361, 3'-TTTTTTA-5' at 488, 3'-TTTTATG-5' at 633, 3'-TTTTATG-5' at 767, 3'-TGTGGTA-5' at 884, 3'-GGTTGTA-5' at 1205, 3'-TTTTTTA-5' at 1231, 3'-GTTTTTG-5' at 1386, 3'-GTTTGTG-5' at 1540, 3'-TTTTATG-5' at 1564, 3'-TTGTTTG-5' at 1587, 3'-TTTTATA-5' at 1740, 3'-TGGGGTA-5' at 1861, 3'-TTTTATG-5' at 1876, 3'-TTTTTTA-5' at 2061, 3'-GGTTGTA-5' at 2150, 3'-TTTTTTA-5' at 2185, 3'-TGGGGTA-5' at 2288, 3'-TTTTATG-5' at 2303, 3'-TGTGGTG-5' at 2420, 3'-TTGTTTG-5' at 2486, 3'-TTGTTTG-5' at 2511, 3'-GGTTGTG-5' at 2549, 3'-GGTTGTA-5' at 2612, 3'-GTTTTTA-5' at 2646, 3'-TTTGTTG-5' at 2843, 3'-TTTTATA-5' at 2869, 3'-TTTTTTA-5' at 2930, 3'-TTTGGTG-5' at 2972, 3'-TTTTTTG-5' at 3027, 3'-TGGGTTG-5' at 3137, 3'-TGGGGTA-5' at 3152, 3'-TTTTGTA-5' at 3167, 3'-GTTTTTG-5' at 3328, 3'-TTTGGTG-5' at 3366, 3'-TTTTGTG-5' at 3512, 3'-GTTGATA-5' at 3526, 3'-TGTTTTA-5' at 3768, 3'-GGGTATG-5' at 3857, 3'-GGTTGTG-5' at 3981, 3'-TTTTTTA-5' at 4069, 3'-TTTTTTA-5' at 4219, 3'-TTGGGTA-5' at 4454 and their complements.
# inverse, negative strand, positive direction, is SuccessablesdBREi-+.bas, looking for 3'-G/T-G/T-G/T-G/T-A/G/T-T-A/G-5', 16, 3'-GGGGATG-5' at 59, 3'-TGTTTTA-5' at 148, 3'-TTGGGTG-5' at 1802, 3'-TTTTTTG-5' at 2282, 3'-TGGGATG-5' at 2409, 3'-TTTTTTG-5' at 2452, 3'-GGGGATA-5' at 2659, 3'-GGTTTTG-5' at 2688, 3'-GTGGATG-5' at 2714, 3'-GGTGTTG-5' at 2815, 3'-GGTTATG-5' at 3026, 3'-TGTGGTG-5' at 3644, 3'-TTTGGTG-5' at 3949, 3'-TGTGGTG-5' at 3968, 3'-GGTTTTA-5' at 4110, 3'-TGGGGTG-5' at 4398 and their complements.
# inverse, positive strand, negative direction, is SuccessablesdBREi+-.bas, looking for 3'-G/T-G/T-G/T-G/T-A/G/T-T-A/G-5', 16, 3'-GTTTTTA-5' at 217, 3'-TGGTGTG-5' at 609, 3'-TGTGGTG-5' at 789, 3'-TTGGGTG-5' at 1048, 3'-GTGGGTG-5' at 1163, 3'-TTTTTTG-5' at 1433, 3'-TGGTGTG-5' at 1478, 3'-GGTGGTG-5' at 1902, 3'-GTGGGTG-5' at 2332, 3'-TGTGGTG-5' at 2660, 3'-GGGTGTG-5' at 3185, 3'-GGTTTTA-5' at 3350, 3'-TTGGTTG-5' at 3532, 3'-GTGGTTG-5' at 3605, 3'-GGTGATG-5' at 3798, 3'-TTGGTTG-5' at 3945 and their complements.
# inverse, positive strand, positive direction, is SuccessablesdBREi++.bas, looking for 3'-G/T-G/T-G/T-G/T-A/G/T-T-A/G-5', 14, 3'-GTGGGTG-5' at 72, 3'-GGTGGTG-5' at 703, 3'-TTGGATG-5' at 1283, 3'-TTGGGTG-5' at 2016, 3'-GTTGGTG-5' at 2122, 3'-TGGTGTG-5' at 2601, 3'-TTTGGTG-5' at 2633, 3'-TTGTGTG-5' at 3097, 3'-TGGTTTG-5' at 3176, 3'-TGTGGTA-5' at 3826, 3'-TGGGGTG-5' at 3941, 3'-TGGGGTA-5' at 4220, 3'-GTTTGTG-5' at 4257, 3'-TGGGGTG-5' at 4287 and their complements.
 
==Downstream core elements==
{{main|Downstream core element gene transcriptions}}
 
In the negative direction on the negative strand, the A1BG transcription start site (TSS) is 4460 nucleotides from the last nucleotide of the gene ZSCAN22. In the positive direction on the negative strand, the A1BG TSS is at 4300 nucleotides from a starting point well within the gene ZNF497. Downstream core elements (DCEs) are expected downstream of these TSSs. Occurrences before the TSSs can be found on [[Downstream core element gene transcriptions]].
 
# negative strand, negative direction, looking for DCE SI: 3'-CTTC-5', 0.
# negative strand, positive direction, looking for DCE SI: 3'-CTTC-5', 0.
# positive strand, negative direction, looking for DCE SI: 3'-CTTC-5', 1, 3'-CTTC-5' at 4528.
# positive strand, positive direction, looking for DCE SI: 3'-CTTC-5', 0.
 
# negative strand, negative direction, looking for DCE SII: 3'-CTGT-5', 2, 3'-CTGT-5' at 4468, 3'-CTGT-5' at 4507.
# negative strand, positive direction, looking for DCE SII: 3'-CTGT-5', 1, 3'-CTGT-5' at 4392.
# positive strand, negative direction, looking for DCE SII: 3'-CTGT-5', 0.
# positive strand, positive direction, looking for DCE SII: 3'-CTGT-5', 1, 3'-CTGT-5' at 4332.
 
# negative strand, negative direction, looking for DCE SIII: 3'-AGC-5', 0.
# negative strand, positive direction, looking for DCE SIII: 3'-AGC-5', 1, 3'-AGC-5' at 4352.
# positive strand, negative direction, looking for DCE SIII: 3'-AGC-5', 3, 3'-AGC-5' at 4480, 3'-AGC-5' at 4489, 3'-AGC-5' at 4520.
# positive strand, positive direction, looking for DCE SIII: 3'-AGC-5', 1, 3'-AGC-5' at 4374.
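
The restriction to occurrences downstream of a TSS can be sketched in Python. This is only an illustration and not the programs used for this study: the function names <code>find_motif</code> and <code>downstream_hits</code>, the convention that a position is the 1-based index of the first nucleotide of a match, and the toy sequence are all assumptions.

```python
def find_motif(sequence, motif):
    """Return 1-based start positions of every exact match, overlaps included."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = sequence.find(motif, start + 1)
    return positions


def downstream_hits(sequence, motif, tss):
    """Keep only the matches that begin after the transcription start site."""
    return [p for p in find_motif(sequence, motif) if p > tss]


# Toy sequence (hypothetical, not A1BG data) with an assumed TSS at position 10.
toy = "CTGT" + "A" * 6 + "CTGT" + "A" + "CTGT"
print(find_motif(toy, "CTGT"))           # [1, 11, 16]
print(downstream_hits(toy, "CTGT", 10))  # [11, 16]
```

With the real sequences, the same filter would use 4460 as the TSS in the negative direction and 4300 in the positive direction.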
 
===Complements===
# negative strand, negative direction, looking for DCE SIc: 3'-GAAG-5', 1, 3'-GAAG-5' at 4528.
# negative strand, positive direction, looking for DCE SIc: 3'-GAAG-5', 0.
# positive strand, negative direction, looking for DCE SIc: 3'-GAAG-5', 0.
# positive strand, positive direction, looking for DCE SIc: 3'-GAAG-5', 0.
 
# negative strand, negative direction, looking for DCE SIIc: 3'-GACA-5', 0.
# negative strand, positive direction, looking for DCE SIIc: 3'-GACA-5', 1, 3'-GACA-5' at 4332.
# positive strand, negative direction, looking for DCE SIIc: 3'-GACA-5', 2, 3'-GACA-5' at 4468, 3'-GACA-5' at 4507.
# positive strand, positive direction, looking for DCE SIIc: 3'-GACA-5', 1, 3'-GACA-5' at 4392.
 
# negative strand, negative direction, looking for DCE SIIIc: 3'-TCG-5', 3, 3'-TCG-5' at 4480, 3'-TCG-5' at 4489, 3'-TCG-5' at 4520.
# negative strand, positive direction, looking for DCE SIIIc: 3'-TCG-5', 1, 3'-TCG-5' at 4374.
# positive strand, negative direction, looking for DCE SIIIc: 3'-TCG-5', 0.
# positive strand, positive direction, looking for DCE SIIIc: 3'-TCG-5', 1, 3'-TCG-5' at 4352.
 
===Inverse complements===
 
# looking for DCE SIci: 3'-GAAG-5', same as the complements.
 
# negative strand, negative direction, looking for DCE SIIci: 3'-ACAG-5', 0.
# negative strand, positive direction, looking for DCE SIIci: 3'-ACAG-5', 0.
# positive strand, negative direction, looking for DCE SIIci: 3'-ACAG-5', 1, 3'-ACAG-5' at 4517.
# positive strand, positive direction, looking for DCE SIIci: 3'-ACAG-5', 1, 3'-ACAG-5' at 4366.
 
# negative strand, negative direction, looking for DCE SIIIci: 3'-GCT-5', 1, 3'-GCT-5' at 4471.
# negative strand, positive direction, looking for DCE SIIIci: 3'-GCT-5', 4, 3'-GCT-5' at 4312, 3'-GCT-5' at 4321, 3'-GCT-5' at 4372, 3'-GCT-5' at 4390.
# positive strand, negative direction, looking for DCE SIIIci: 3'-GCT-5', 0.
# positive strand, positive direction, looking for DCE SIIIci: 3'-GCT-5', 1, 3'-GCT-5' at 4356.
 
===Inverses===
 
# looking for DCE SIi: 3'-CTTC-5', same as the direct transcript.
 
# negative strand, negative direction, looking for DCE SIIi: 3'-TGTC-5', 1, 3'-TGTC-5' at 4517.
# negative strand, positive direction, looking for DCE SIIi: 3'-TGTC-5', 1, 3'-TGTC-5' at 4366.
# positive strand, negative direction, looking for DCE SIIi: 3'-TGTC-5', 0.
# positive strand, positive direction, looking for DCE SIIi: 3'-TGTC-5', 0.
 
# negative strand, negative direction, looking for DCE SIIIi: 3'-CGA-5', 0.
# negative strand, positive direction, looking for DCE SIIIi: 3'-CGA-5', 1, 3'-CGA-5' at 4356.
# positive strand, negative direction, looking for DCE SIIIi: 3'-CGA-5', 1, 3'-CGA-5' at 4471.
# positive strand, positive direction, looking for DCE SIIIi: 3'-CGA-5', 4, 3'-CGA-5' at 4312, 3'-CGA-5' at 4321, 3'-CGA-5' at 4372, 3'-CGA-5' at 4390.
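
The relation between a direct motif and its complement, inverse, and inverse complement, as used in the lists above, can be sketched as follows. The function name <code>motif_variants</code> is hypothetical; the three DCE motifs are the ones searched for in this section.

```python
# Translation table for Watson-Crick base pairing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_variants(motif):
    """Return the four orientations of a motif searched for in this section."""
    complement = motif.translate(COMPLEMENT)
    return {
        "direct": motif,
        "complement": complement,
        "inverse": motif[::-1],                  # same bases, reversed order
        "inverse complement": complement[::-1],  # complement, reversed order
    }


for dce in ("CTTC", "CTGT", "AGC"):
    print(dce, motif_variants(dce))
```

For DCE SI the inverse equals the direct motif and the inverse complement equals the complement, which is why those two cases are not listed separately.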
 
==Downstream promoter elements==
{{main|Downstream promoter element gene transcriptions}}
# negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesDPE--.bas, looking for 3'-A/G-G-A/T-C/T-A/C/G-5', 163, 3'-GGTCG-5', 35, 3'-AGATA-5', 234, 3'-GGTCC-5', 262, 3'-GGACA-5', 394, 3'-GGTCG-5', 403, 3'-GGTTC-5', 419, 3'-AGTCC-5', 441, 3'-GGACC-5', 459, 3'-AGATG-5', 481, 3'-GGTCG-5', 504, 3'-GGACC-5', 508, 3'-GGTCG-5', 540, 3'-GGTTC-5', 556, 3'-AGTCC-5', 578, 3'-GGACC-5', 596, 3'-AGATG-5', 624, 3'-GGTCC-5', 648, 3'-GGACA-5', 667, 3'-GGTCG-5', 676, 3'-GGTTC-5', 692, 3'-AGTCC-5', 714, 3'-GGTCG-5', 728, 3'-GGTCG-5', 737, 3'-AGATG-5', 758, 3'-GGACA-5', 801, 3'-GGTCG-5', 810, 3'-GGTCC-5', 850, 3'-GGTTC-5', 874, 3'-GGTCG-5', 895, 3'-GGACC-5', 899, 3'-AGACA-5', 919, 3'-GGTCC-5', 948, 3'-GGACA-5', 967, 3'-GGTCG-5', 976, 3'-AGTCC-5', 984, 3'-GGACC-5', 1015, 3'-GGTCG-5', 1061, 3'-AGACA-5', 1085, 3'-GGACA-5', 1131, 3'-GGTCG-5', 1140, 3'-GGTCG-5', 1194, 3'-GGACC-5', 1198, 3'-GGTTG-5', 1203, 3'-AGATG-5', 1224, 3'-GGACA-5', 1258, 3'-GGTCG-5', 1267, 3'-AGTCC-5', 1275, 3'-GGATC-5', 1306, 3'-GGTCA-5', 1352, 3'-AGACC-5', 1356, 3'-AGTTG-5', 1406, 3'-AGACA-5', 1452, 3'-GGTCC-5', 1460, 3'-AGTCG-5', 1486, 3'-AGTTG-5', 1513, 3'-AGATA-5', 1525, 3'-GGTCA-5', 1532, 3'-GGTCG-5', 1611, 3'-AGACA-5', 1776, 3'-GGTCG-5', 1785, 3'-GGTTC-5', 1817, 3'-GGACC-5', 1841, 3'-AGATG-5', 1867, 3'-GGACA-5', 1911, 3'-GGTCG-5', 1920, 3'-GGACC-5', 1959, 3'-GGTCG-5', 2005, 3'-GGACC-5', 2009, 3'-AGACA-5', 2029, 3'-GGTCC-5', 2077, 3'-GGATC-5', 2093, 3'-AGTCC-5', 2134, 3'-GGTTG-5', 2148, 3'-AGATG-5', 2169, 3'-GGTCA-5', 2211, 3'-AGTCC-5', 2250, 3'-GGTCG-5', 2264, 3'-GGACC-5', 2268, 3'-AGATG-5', 2294, 3'-GGACA-5', 2337, 3'-GGTCG-5', 2346, 3'-GGACC-5', 2385, 3'-GGTCG-5', 2431, 3'-GGACC-5', 2435, 3'-AGTTA-5', 2496, 3'-GGTCC-5', 2519, 3'-GGACA-5', 2538, 3'-GGTTG-5', 2547, 3'-AGTCC-5', 2587, 3'-GGTCA-5', 2601, 3'-GGTTG-5', 2610, 3'-AGTCG-5', 2650, 3'-GGTCA-5', 2654, 3'-GGACA-5', 2672, 3'-GGTCG-5', 2681, 3'-GGACC-5', 2720, 3'-GGTCG-5', 2766, 3'-GGACC-5', 2770, 3'-GGTTA-5', 2848, 3'-AGATG-5', 
2988, 3'-GGATA-5', 2996, 3'-GGACA-5', 3061, 3'-GGTCG-5', 3070, 3'-AGTCC-5', 3110, 3'-GGTCG-5', 3124, 3'-GGACC-5', 3128, 3'-GGTTG-5', 3137, 3'-AGATG-5', 3158, 3'-GGACA-5', 3200, 3'-AGTCG-5', 3204, 3'-GGTCG-5', 3209, 3'-AGTCC-5', 3217, 3'-GGTCC-5', 3249, 3'-GGTTC-5', 3273, 3'-GGTCG-5', 3294, 3'-GGACC-5', 3298, 3'-AGACA-5', 3319, 3'-AGTCC-5', 3396, 3'-AGTTG-5', 3523, 3'-AGACA-5', 3556, 3'-GGTCC-5', 3564, 3'-GGACG-5', 3579, 3'-GGTCC-5', 3585, 3'-GGTCG-5', 3682, 3'-GGTCG-5', 3701, 3'-AGACG-5', 3706, 3'-GGTCG-5', 3731, 3'-GGACC-5', 3744, 3'-AGACC-5', 3835, 3'-AGTTC-5', 3844, 3'-GGACG-5', 3861, 3'-GGTCC-5', 3871, 3'-GGTCC-5', 3885, 3'-GGACC-5', 3906, 3'-GGTCC-5', 3951, 3'-GGACA-5', 3970, 3'-GGTTG-5', 3979, 3'-GGTTC-5', 4019, 3'-AGTTC-5', 4027, 3'-GGTCG-5', 4033, 3'-GGACC-5', 4037, 3'-AGATG-5', 4062, 3'-GGTCC-5', 4102, 3'-GGACA-5', 4121, 3'-GGTCG-5', 4130, 3'-AGTCC-5', 4138, 3'-GGTCC-5', 4170, 3'-AGTTC-5', 4178, 3'-GGACA-5', 4208, 3'-AGATG-5', 4212, 3'-GGTCC-5', 4253, 3'-GGTCG-5', 4261, 3'-GGACC-5', 4300, 3'-GGTCG-5', 4345, 3'-GGACC-5', 4349, 3'-GGACA-5', 4369, 3'-GGTCA-5', 4415, 3'-AGATG-5', 4430, 3'-AGTCC-5', 4436, 3'-GGTCG-5', 4480, 3'-AGTCG-5', 4489, 3'-GGACC-5', 4494, 3'-GGACC-5', 4546, and their complements.
# negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesDPE-+.bas, looking for 3'-A/G-G-A/T-C/T-A/C/G-5', 73, 3'-GGACC-5' at 37, 3'-GGATG-5' at 59, 3'-GGTCA-5' at 153, 3'-AGATG-5' at 166, 3'-AGTCC-5' at 172, 3'-GGACC-5' at 187, 3'-GGTCC-5' at 218, 3'-GGTTC-5' at 305, 3'-GGACG-5' at 323, 3'-GGACG-5' at 359, 3'-AGACG-5' at 398, 3'-GGACG-5' at 410, 3'-AGACC-5' at 440, 3'-AGACA-5' at 712, 3'-AGTCC-5' at 757, 3'-AGATC-5' at 864, 3'-AGATC-5' at 964, 3'-AGTCG-5' at 1528, 3'-GGACG-5' at 1670, 3'-GGTCG-5' at 1687, 3'-GGACA-5' at 1693, 3'-AGTCC-5' at 1826, 3'-AGTCC-5' at 1841, 3'-GGACA-5' at 1869, 3'-GGATG-5' at 1878, 3'-GGTTC-5' at 1926, 3'-AGTTC-5' at 1987, 3'-AGTCC-5' at 2026, 3'-GGTCA-5' at 2035, 3'-AGTCA-5' at 2100, 3'-AGTTA-5' at 2134, 3'-GGTCA-5' at 2220, 3'-AGATC-5' at 2230, 3'-GGATG-5' at 2409, 3'-GGACA-5' at 2460, 3'-AGTCA-5' at 2607, 3'-AGTCA-5' at 2613, 3'-AGTCA-5' at 2618, 3'-GGATA-5' at 2659, 3'-AGTTA-5' at 2666, 3'-GGATG-5' at 2714, 3'-GGATA-5' at 2737, 3'-AGACC-5' at 2861, 3'-GGTTC-5' at 2922, 3'-AGTTC-5' at 2954, 3'-AGTCC-5' at 2998, 3'-GGTTA-5' at 3024, 3'-GGTTG-5' at 3050, 3'-AGTCC-5' at 3084, 3'-GGACA-5' at 3131, 3'-GGACC-5' at 3172, 3'-AGTCG-5' at 3283, 3'-AGTTA-5' at 3381, 3'-AGATG-5' at 3418, 3'-GGATG-5' at 3457, 3'-AGATG-5' at 3475, 3'-GGTTG-5' at 3490, 3'-GGACA-5' at 3530, 3'-GGACC-5' at 3545, 3'-AGACC-5' at 3550, 3'-GGATG-5' at 3574, 3'-GGTCA-5' at 3820, 3'-AGTCC-5' at 3863, 3'-AGACA-5' at 3893, 3'-GGTTC-5' at 4073, 3'-GGATC-5' at 4080, 3'-GGATG-5' at 4099, 3'-AGTTC-5' at 4200, 3'-GGACA-5' at 4252, 3'-GGTCA-5' at 4269, 3'-AGACG-5' at 4319, 3'-AGACA-5' at 4332, 3'-GGTCC-5' at 4420, and their complements.
# positive strand in the negative direction is SuccessablesDPE+-.bas, looking for 3'-A/G-G-A/T-C/T-A/C/G-5', 101, 3'-GGACC-5', 32, 3'-AGATA-5', 57, 3'-GGATA-5', 74, 3'-AGTTG-5', 84, 3'-GGATA-5', 98, 3'-GGATA-5', 108, 3'-AGTCG-5', 157, 3'-AGACA-5', 170, 3'-GGTCA-5', 206, 3'-AGATG-5', 244, 3'-AGTTC-5', 253, 3'-AGACA-5', 422, 3'-GGATC-5', 430, 3'-GGTCA-5', 439, 3'-GGATC-5', 525, 3'-AGACA-5', 559, 3'-GGTCA-5', 568, 3'-GGTCA-5', 576, 3'-AGATC-5', 589, 3'-GGATC-5', 703, 3'-GGTCA-5', 712, 3'-AGTTC-5', 719, 3'-AGACC-5', 725, 3'-GGATG-5', 784, 3'-GGTTG-5', 862, 3'-AGATC-5', 877, 3'-AGATC-5', 972, 3'-GGTTG-5', 1028, 3'-GGACG-5', 1151, 3'-GGATC-5', 1167, 3'-AGTTC-5', 1177, 3'-GGTTG-5', 1319, 3'-AGATG-5', 1438, 3'-AGACA-5', 1569, 3'-AGATA-5', 1595, 3'-GGATC-5', 1812, 3'-AGATG-5', 1828, 3'-AGACC-5', 1834, 3'-AGATC-5', 1987, 3'-GGACA-5', 2117, 3'-AGACC-5', 2121, 3'-AGACC-5', 2145, 3'-AGATA-5', 2177, 3'-GGTTG-5', 2234, 3'-GGATC-5', 2239, 3'-GGTCA-5', 2248, 3'-AGACC-5', 2261, 3'-GGACA-5', 2271, 3'-GGTTG-5', 2398, 3'-AGATC-5', 2413, 3'-AGTCC-5', 2543, 3'-GGATC-5', 2574, 3'-GGTCA-5', 2585, 3'-AGTTG-5', 2592, 3'-AGACC-5', 2598, 3'-AGTTG-5', 2704, 3'-AGTTG-5', 2733, 3'-AGACA-5', 2880, 3'-AGATG-5', 2894, 3'-AGATG-5', 2905, 3'-AGACA-5', 2948, 3'-AGATA-5', 2981, 3'-GGATC-5', 3097, 3'-AGTTG-5', 3115, 3'-AGACC-5', 3121, 3'-GGTTG-5', 3261, 3'-AGATC-5', 3276, 3'-GGACA-5', 3389, 3'-AGACA-5', 3433, 3'-AGATA-5', 3465, 3'-AGATC-5', 3488, 3'-GGTTG-5', 3532, 3'-GGTTG-5', 3605, 3'-AGATG-5', 3620, 3'-AGATG-5', 3627, 3'-GGATA-5', 3655, 3'-GGACA-5', 3756, 3'-AGACC-5', 3761, 3'-GGTTG-5', 3804, 3'-GGTCG-5', 3813, 3'-GGACC-5', 3868, 3'-AGATG-5', 3919, 3'-GGTTG-5', 3945, 3'-GGATC-5', 4006, 3'-AGTTC-5', 4024, 3'-AGACC-5', 4030, 3'-AGTTG-5', 4096, 3'-AGTCC-5', 4126, 3'-GGATC-5', 4157, 3'-AGTTC-5', 4175, 3'-AGACA-5', 4181, 3'-AGACC-5', 4204, 3'-AGACG-5', 4235, 3'-GGATC-5', 4288, 3'-GGTCA-5', 4307, 3'-AGACC-5', 4365, 3'-AGTTC-5', 4417, 3'-GGACA-5', 4468, 3'-AGATC-5', 4475, 3'-AGTCC-5', 4500, 3'-AGACA-5', 
4507, and their complements.
# positive strand in the positive direction is SuccessablesDPE++.bas, looking for 3'-A/G-G-A/T-C/T-A/C/G-5', 159, 3'-GGTCC-5' at 8, 3'-GGTCC-5' at 33, 3'-GGACC-5' at 40, 3'-AGTCC-5' at 90, 3'-AGACA-5' at 98, 3'-AGACC-5' at 102, 3'-GGACA-5' at 144, 3'-GGTTC-5' at 177, 3'-GGACG-5' at 191, 3'-GGTCC-5' at 215, 3'-AGACG-5' at 223, 3'-AGACC-5' at 270, 3'-GGACC-5' at 286, 3'-GGTCG-5' at 329, 3'-GGTCC-5' at 424, 3'-GGACG-5' at 435, 3'-AGTCG-5' at 511, 3'-GGTCC-5' at 515, 3'-GGACC-5' at 598, 3'-GGTTG-5' at 607, 3'-AGTCG-5' at 613, 3'-GGTCG-5' at 617, 3'-GGTCG-5' at 623, 3'-GGATG-5' at 649, 3'-GGTCC-5' at 707, 3'-GGACG-5' at 807, 3'-AGTCG-5' at 831, 3'-GGTTG-5' at 843, 3'-GGACC-5' at 847, 3'-GGACA-5' at 891, 3'-GGACG-5' at 907, 3'-AGTCG-5' at 931, 3'-GGTTG-5' at 943, 3'-GGACC-5' at 947, 3'-GGACA-5' at 991, 3'-GGACG-5' at 1075, 3'-GGACG-5' at 1118, 3'-GGTCG-5' at 1127, 3'-GGTCC-5' at 1175, 3'-GGATG-5' at 1195, 3'-GGACC-5' at 1199, 3'-GGTCA-5' at 1250, 3'-AGTCG-5' at 1267, 3'-GGTCG-5' at 1271, 3'-GGTTG-5' at 1279, 3'-GGATG-5' at 1283, 3'-GGACG-5' at 1311, 3'-GGTCG-5' at 1357, 3'-GGTCG-5' at 1363, 3'-GGACG-5' at 1369, 3'-AGACC-5' at 1376, 3'-AGACG-5' at 1395, 3'-GGACG-5' at 1411, 3'-GGTCG-5' at 1457, 3'-GGTCG-5' at 1463, 3'-GGACG-5' at 1469, 3'-AGACC-5' at 1476, 3'-AGACG-5' at 1495, 3'-GGATG-5' at 1573, 3'-AGTCG-5' at 1603, 3'-AGTTG-5' at 1621, 3'-AGACG-5' at 1733, 3'-GGACG-5' at 1776, 3'-GGACC-5' at 1815, 3'-GGTCC-5' at 1855, 3'-GGACA-5' at 1860, 3'-AGACC-5' at 1864, 3'-GGTCC-5' at 1893, 3'-AGACC-5' at 1992, 3'-GGTTG-5' at 2012, 3'-GGTCA-5' at 2024, 3'-GGTCG-5' at 2052, 3'-AGTCA-5' at 2060, 3'-AGTCA-5' at 2098, 3'-AGTCG-5' at 2102, 3'-AGTCC-5' at 2115, 3'-AGATC-5' at 2167, 3'-AGACA-5' at 2182, 3'-AGTCG-5' at 2198, 3'-AGTTA-5' at 2233, 3'-GGACA-5' at 2250, 3'-AGACA-5' at 2260, 3'-AGACA-5' at 2308, 3'-GGTCC-5' at 2316, 3'-AGTCC-5' at 2372, 3'-AGTCG-5' at 2390, 3'-GGTTC-5' at 2398, 3'-GGACC-5' at 2433, 3'-GGATC-5' at 2481, 3'-GGACC-5' at 2501, 3'-AGTTC-5' at 2508, 3'-GGACG-5' at 
2520, 3'-AGTCG-5' at 2526, 3'-GGACC-5' at 2569, 3'-GGTCC-5' at 2574, 3'-GGTTC-5' at 2593, 3'-GGTCA-5' at 2605, 3'-AGTTC-5' at 2615, 3'-AGTCC-5' at 2620, 3'-GGTCC-5' at 2780, 3'-AGACG-5' at 2856, 3'-GGTCC-5' at 2876, 3'-AGACC-5' at 2883, 3'-GGACC-5' at 2891, 3'-GGTTA-5' at 2908, 3'-AGACA-5' at 2925, 3'-AGTCA-5' at 2936, 3'-AGACA-5' at 2957, 3'-AGACG-5' at 2975, 3'-AGACC-5' at 2983, 3'-GGACC-5' at 2988, 3'-GGTCA-5' at 2996, 3'-GGTCC-5' at 3016, 3'-AGACC-5' at 3021, 3'-AGTCC-5' at 3034, 3'-AGTCG-5' at 3041, 3'-GGACC-5' at 3047, 3'-AGACG-5' at 3060, 3'-GGTCA-5' at 3082, 3'-GGTCC-5' at 3111, 3'-AGTCG-5' at 3155, 3'-GGTCG-5' at 3239, 3'-AGATA-5' at 3258, 3'-AGACG-5' at 3267, 3'-AGACG-5' at 3278, 3'-AGTTG-5' at 3290, 3'-GGACC-5' at 3296, 3'-AGACG-5' at 3306, 3'-AGACG-5' at 3358, 3'-GGACC-5' at 3362, 3'-GGTCA-5' at 3379, 3'-AGACC-5' at 3405, 3'-AGTTA-5' at 3424, 3'-GGACA-5' at 3434, 3'-GGACC-5' at 3496, 3'-GGTCC-5' at 3536, 3'-GGACA-5' at 3617, 3'-GGACA-5' at 3622, 3'-GGTTG-5' at 3633, 3'-GGACC-5' at 3679, 3'-GGTCC-5' at 3687, 3'-GGTCG-5' at 3720, 3'-AGTCC-5' at 3728, 3'-GGACC-5' at 3758, 3'-AGTCG-5' at 3775, 3'-GGACC-5' at 3787, 3'-GGTCA-5' at 3841, 3'-AGTCC-5' at 3868, 3'-AGTCG-5' at 3997, 3'-AGTCG-5' at 4023, 3'-GGTCC-5' at 4032, 3'-AGTCG-5' at 4052, 3'-AGATC-5' at 4064, 3'-AGATC-5' at 4076, 3'-GGACG-5' at 4231, 3'-AGTCA-5' at 4271, 3'-GGACC-5' at 4409, 3'-AGACC-5' at 4416, 3'-GGACC-5' at 4424, and their complements.
# inverse, negative strand, negative direction, is SuccessablesDPEi--.bas, looking for 3'-A/C/G-C/T-A/T-G-A/G-5', 58, 3'-CCTGG-5', 32, 3'-ACAGA-5', 479, 3'-GTAGG-5', 593, 3'-ATTGG-5', 614, 3'-ACTGG-5', 734, 3'-GCAGA-5', 754, 3'-CTTGG-5', 846, 3'-ACAGA-5', 921, 3'-CTAGG-5', 973, 3'-CTTGG-5', 1012, 3'-ACTGA-5', 1051, 3'-ACAGA-5', 1087, 3'-GCTGG-5', 1191, 3'-ACAGA-5', 1222, 3'-CTTGG-5', 1303, 3'-GTTGG-5', 1407, 3'-CTAGA-5', 1482, 3'-GTTGG-5', 1514, 3'-ATAGG-5', 1529, 3'-GTAGG-5', 1572, 3'-CTTGA-5', 1685, 3'-ATAGA-5', 1731, 3'-GCAGA-5', 1774, 3'-CTAGG-5', 1813, 3'-GTAGG-5', 1838, 3'-GTAGA-5', 1863, 3'-CTTGG-5', 1956, 3'-ACAGA-5', 2031, 3'-ACAGA-5', 2165, 3'-ACTGG-5', 2189, 3'-GTAGA-5', 2290, 3'-CTTGG-5', 2382, 3'-CTTGG-5', 2717, 3'-ACTGA-5', 2786, 3'-GTTGG-5', 2844, 3'-GTTGA-5', 2911, 3'-ACAGA-5', 2986, 3'-GTAGA-5', 3154, 3'-CTTGG-5', 3245, 3'-ACAGA-5', 3321, 3'-CTTGA-5', 3460, 3'-GTTGA-5', 3524, 3'-GTAGA-5', 3551, 3'-CCTGA-5', 3640, 3'-GCAGG-5', 3698, 3'-CCTGA-5', 3747, 3'-CTTGG-5', 3784, 3'-ACAGA-5', 3833, 3'-GTTGA-5', 3849, 3'-CCTGG-5', 3868, 3'-GTAGG-5', 3903, 3'-GTAGA-5', 4058, 3'-ACAGA-5', 4210, 3'-CCTGA-5', 4327, 3'-ACAGA-5', 4371, 3'-CTTGG-5', 4451, 3'-GTAGG-5', 4456, 3'-CTAGG-5', 4476,
# inverse, negative strand, positive direction, is SuccessablesDPEi-+.bas, looking for 3'-A/C/G-C/T-A/T-G-A/G-5', 152 , 3'-CCAGG-5' at 8 , 3'-CCAGA-5' at 15 , 3'-ATTGG-5' at 24 , 3'-CCAGG-5' at 33 , 3'-CCTGG-5' at 40 , 3'-ACAGG-5' at 157 , 3'-GCAGG-5' at 194 , 3'-CCAGA-5' at 204 , 3'-CCAGG-5' at 215 , 3'-GCTGG-5' at 277 , 3'-CCTGG-5' at 286 , 3'-GCAGG-5' at 318 , 3'-ACTGG-5' at 347 , 3'-ACAGG-5' at 365 , 3'-GCAGG-5' at 379 , 3'-GCTGG-5' at 386 , 3'-GCAGA-5' at 396 , 3'-GCTGG-5' at 417 , 3'-CCAGG-5' at 424 , 3'-GCAGA-5' at 438 , 3'-CCAGA-5' at 468 , 3'-CCAGG-5' at 515 , 3'-ACAGG-5' at 552 , 3'-CCTGG-5' at 598 , 3'-GCAGG-5' at 658 , 3'-CCAGG-5' at 707 , 3'-CCTGA-5' at 725 , 3'-GCTGG-5' at 779 , 3'-CCAGA-5' at 835 , 3'-CCTGG-5' at 847 , 3'-CCTGA-5' at 859 , 3'-CCAGA-5' at 935 , 3'-CCTGG-5' at 947 , 3'-CCTGA-5' at 959 , 3'-ACTGG-5' at 1140 , 3'-CCAGG-5' at 1175 , 3'-CCTGG-5' at 1199 , 3'-ACTGA-5' at 1286 , 3'-GCAGA-5' at 1316 , 3'-GCAGA-5' at 1416 , 3'-CCAGA-5' at 1631 , 3'-CCTGA-5' at 1660 , 3'-CCTGA-5' at 1676 , 3'-GCTGG-5' at 1736 , 3'-CCAGA-5' at 1742 , 3'-GCTGG-5' at 1779 , 3'-GCAGG-5' at 1788 , 3'-CTTGG-5' at 1799 , 3'-CCTGG-5' at 1815 , 3'-CCAGG-5' at 1855 , 3'-GTAGG-5' at 1875 , 3'-CCAGG-5' at 1893 , 3'-GCAGG-5' at 1905 , 3'-GCAGA-5' at 1937 , 3'-ACTGG-5' at 1953 , 3'-ACAGG-5' at 1966 , 3'-ACAGG-5' at 2125 , 3'-GTTGG-5' at 2185 , 3'-CCTGA-5' at 2211 , 3'-CCAGA-5' at 2228 , 3'-GTAGG-5' at 2255 , 3'-GCAGG-5' at 2296 , 3'-CCAGG-5' at 2316 , 3'-GCTGG-5' at 2320 , 3'-GCTGG-5' at 2405 , 3'-ACAGA-5' at 2414 , 3'-CCTGG-5' at 2433 , 3'-CTAGG-5' at 2482 , 3'-CCTGG-5' at 2501 , 3'-GTTGG-5' at 2541 , 3'-ATAGG-5' at 2550 , 3'-CCTGG-5' at 2569 , 3'-CCAGG-5' at 2574 , 3'-ATAGA-5' at 2627 , 3'-CTAGG-5' at 2639 , 3'-ACAGA-5' at 2652 , 3'-ACTGA-5' at 2674 , 3'-GCAGG-5' at 2683 , 3'-GCAGA-5' at 2721 , 3'-GCTGG-5' at 2734 , 3'-GCAGG-5' at 2745 , 3'-GCTGG-5' at 2770 , 3'-CCAGG-5' at 2780 , 3'-GCTGG-5' at 2810 , 3'-GTTGG-5' at 2816 , 3'-ACAGA-5' at 2837 , 3'-GCAGA-5' at 2859 , 
3'-CCAGG-5' at 2876 , 3'-CCTGG-5' at 2891 , 3'-GCTGA-5' at 2915 , 3'-CCTGA-5' at 2968 , 3'-CCTGG-5' at 2988 , 3'-CCAGG-5' at 3016 , 3'-CCTGG-5' at 3047 , 3'-CCAGA-5' at 3091 , 3'-CCAGG-5' at 3111 , 3'-ACTGG-5' at 3117 , 3'-GCAGG-5' at 3128 , 3'-ACAGA-5' at 3133 , 3'-GCAGG-5' at 3147 , 3'-ACAGA-5' at 3179 , 3'-GCAGA-5' at 3214 , 3'-CCAGA-5' at 3221 , 3'-GCTGG-5' at 3242 , 3'-CCTGG-5' at 3296 , 3'-ACTGG-5' at 3345 , 3'-CCTGG-5' at 3362 , 3'-GTAGA-5' at 3416 , 3'-GCAGG-5' at 3466 , 3'-GCAGA-5' at 3473 , 3'-CTAGG-5' at 3484 , 3'-CCTGG-5' at 3496 , 3'-CTAGG-5' at 3522 , 3'-GCTGG-5' at 3526 , 3'-CCAGG-5' at 3536 , 3'-CCAGA-5' at 3548 , 3'-ACAGG-5' at 3571 , 3'-GCTGA-5' at 3588 , 3'-ACAGG-5' at 3636 , 3'-GCAGG-5' at 3662 , 3'-CCTGG-5' at 3679 , 3'-CCAGG-5' at 3687 , 3'-GCAGG-5' at 3694 , 3'-ACTGA-5' at 3735 , 3'-GTAGG-5' at 3753 , 3'-CCTGG-5' at 3758 , 3'-GCAGG-5' at 3768 , 3'-GCTGA-5' at 3778 , 3'-CCTGG-5' at 3787 , 3'-CCAGA-5' at 3806 , 3'-GCAGA-5' at 3831 , 3'-CCAGA-5' at 3891 , 3'-GCAGA-5' at 3916 , 3'-ACAGG-5' at 3975 , 3'-GCTGG-5' at 3989 , 3'-CTTGA-5' at 4016 , 3'-CCAGG-5' at 4032 , 3'-GTAGA-5' at 4036 , 3'-CTAGA-5' at 4065 , 3'-ACAGG-5' at 4070 , 3'-CTAGG-5' at 4077 , 3'-ACTGA-5' at 4089 , 3'-CTTGA-5' at 4131 , 3'-ATTGA-5' at 4161 , 3'-GCTGG-5' at 4177 , 3'-CCTGA-5' at 4186 , 3'-CCTGA-5' at 4214 , 3'-CTTGG-5' at 4300 , 3'-GCAGA-5' at 4317 , 3'-CCAGA-5' at 4330 , 3'-CCTGG-5' at 4409 , 3'-CCTGG-5' at 4424.
# inverse, positive strand, negative direction, is SuccessablesDPEi+-.bas, looking for 3'-A/C/G-C/T-A/T-G-A/G-5', 174, 3'-ACAGA-5', 13, 3'-ACTGA-5', 17, 3'-GTTGA-5', 85, 3'-ATAGA-5', 100, 3'-GTAGG-5', 119, 3'-ACTGA-5', 130, 3'-GCTGA-5', 140, 3'-ACAGA-5', 168, 3'-CCAGG-5', 262, 3'-GTAGA-5', 284, 3'-ACAGA-5', 289, 3'-ACTGA-5', 307, 3'-CTTGG-5', 328, 3'-ATAGA-5', 355, 3'-ACAGG-5', 424, 3'-CCTGG-5', 459, 3'-CCTGG-5', 508, 3'-ACAGG-5', 561, 3'-GCAGG-5', 565, 3'-ATTGA-5', 585, 3'-CCTGG-5', 596, 3'-ATTGG-5', 643, 3'-CCAGG-5', 648, 3'-GCAGG-5', 697, 3'-CCTGA-5', 732, 3'-GCTGG-5', 781, 3'-GCTGA-5', 825, 3'-GCAGG-5', 831, 3'-CTTGA-5', 843, 3'-CCAGG-5', 850, 3'-CCTGG-5', 899, 3'-ACAGA-5', 907, 3'-CCAGG-5', 948, 3'-GTAGA-5', 970, 3'-GCTGA-5', 991, 3'-GCAGG-5', 997, 3'-CTTGA-5', 1009, 3'-CCTGG-5', 1015, 3'-GCAGA-5', 1023, 3'-ATTGG-5', 1045, 3'-ACAGA-5', 1073, 3'-GCTGG-5', 1111, 3'-CCTGA-5', 1173, 3'-CCTGG-5', 1198, 3'-GCTGA-5', 1282, 3'-GCAGG-5', 1288, 3'-CTTGA-5', 1300, 3'-CTAGG-5', 1307, 3'-GCAGA-5', 1314, 3'-CCAGA-5', 1411, 3'-CCAGG-5', 1460, 3'-GCTGG-5', 1464, 3'-CCAGA-5', 1518, 3'-ACAGA-5', 1567, 3'-GCAGA-5', 1614, 3'-CCTGA-5', 1623, 3'-CTTGG-5', 1649, 3'-GTAGA-5', 1653, 3'-CCAGA-5', 1670, 3'-ATAGA-5', 1710, 3'-GCTGG-5', 1746, 3'-GCTGG-5', 1756, 3'-GCTGA-5', 1800, 3'-GCAGG-5', 1823, 3'-CCTGG-5', 1841, 3'-GTTGA-5', 1853, 3'-GCTGG-5', 1891, 3'-CTTGG-5', 1927, 3'-ACTGA-5', 1935, 3'-GCAGG-5', 1941, 3'-CCTGG-5', 1959, 3'-GCAGA-5', 1967, 3'-CCTGG-5', 2009, 3'-ACAGA-5', 2017, 3'-GCTGG-5', 2069, 3'-CCAGG-5', 2077, 3'-GCTGA-5', 2109, 3'-ACAGA-5', 2119, 3'-CTTGA-5', 2127, 3'-GCTGA-5', 2226, 3'-GTTGG-5', 2235, 3'-CCTGG-5', 2268, 3'-GCTGG-5', 2326, 3'-GCTGA-5', 2361, 3'-GCAGG-5', 2367, 3'-CTTGA-5', 2379, 3'-CCTGG-5', 2385, 3'-GCAGG-5', 2389, 3'-CCTGG-5', 2435, 3'-ACAGA-5', 2443, 3'-ACAGG-5', 2514, 3'-CCAGG-5', 2519, 3'-GCTGA-5', 2562, 3'-GCAGG-5', 2568, 3'-CTTGA-5', 2580, 3'-GTTGA-5', 2593, 3'-ACAGG-5', 2689, 3'-GCTGA-5', 2696, 3'-GTTGA-5', 2705, 3'-CTTGA-5', 2714, 3'-CCTGG-5', 2720, 
3'-GCTGA-5', 2744, 3'-CCTGG-5', 2770, 3'-ACAGA-5', 2778, 3'-ACAGA-5', 2878, 3'-ATAGA-5', 2903, 3'-CTTGG-5', 2921, 3'-GCTGG-5', 3035, 3'-GCTGG-5', 3041, 3'-GCTGA-5', 3085, 3'-CTTGA-5', 3103, 3'-GTTGG-5', 3116, 3'-CCTGG-5', 3128, 3'-GCTGG-5', 3180, 3'-GCTGA-5', 3224, 3'-CTTGA-5', 3242, 3'-CCAGG-5', 3249, 3'-GTAGA-5', 3256, 3'-CCTGG-5', 3298, 3'-ATTGA-5', 3358, 3'-CTTGA-5', 3401, 3'-ATAGA-5', 3422, 3'-GCAGA-5', 3431, 3'-ATAGG-5', 3447, 3'-CTAGA-5', 3463, 3'-CCAGA-5', 3486, 3'-GTTGA-5', 3505, 3'-ATTGG-5', 3529, 3'-GTTGA-5', 3533, 3'-ACTGA-5', 3542, 3'-CCAGG-5', 3564, 3'-CTTGA-5', 3571, 3'-CCAGG-5', 3585, 3'-GCAGA-5', 3589, 3'-GTTGG-5', 3606, 3'-GCTGA-5', 3649, 3'-ACAGA-5', 3672, 3'-GCTGG-5', 3719, 3'-CCTGG-5', 3744, 3'-ACTGG-5', 3749, 3'-CCTGA-5', 3781, 3'-CTTGG-5', 3793, 3'-GTTGA-5', 3805, 3'-GTAGA-5', 3820, 3'-GCTGG-5', 3864, 3'-CCAGG-5', 3871, 3'-CCAGG-5', 3885, 3'-CCTGG-5', 3906, 3'-ACAGA-5', 3917, 3'-CCTGA-5', 3932, 3'-GTTGG-5', 3942, 3'-GTTGG-5', 3946, 3'-CCAGG-5', 3951, 3'-GCTGA-5', 3994, 3'-CTTGA-5', 4012, 3'-CCTGG-5', 4037, 3'-ATAGA-5', 4079, 3'-GTTGG-5', 4097, 3'-CCAGG-5', 4102, 3'-GCTGA-5', 4145, 3'-CCAGG-5', 4170, 3'-CTTGG-5', 4188, 3'-CCAGA-5', 4233, 3'-CCAGG-5', 4253, 3'-CTTGG-5', 4268, 3'-GCTGA-5', 4276, 3'-GCAGG-5', 4282, 3'-CTTGA-5', 4294, 3'-CCTGG-5', 4300, 3'-CCTGG-5', 4349, 3'-CCAGA-5', 4448, 3'-CCTGG-5', 4494, 3'-ACAGA-5', 4518, 3'-CCTGG-5', 4546,
# inverse, positive strand, positive direction, is SuccessablesDPEi++.bas, looking for 3'-A/C/G-C/T-A/T-G-A/G-5', 95, 3'-GTAGG-5' at 30, 3'-CCTGG-5' at 37, 3'-ACAGG-5' at 82, 3'-ACAGA-5' at 100, 3'-CCTGG-5' at 187, 3'-CCAGG-5' at 218, 3'-ACAGA-5' at 268, 3'-GTTGG-5' at 608, 3'-GTAGG-5' at 629, 3'-GTAGG-5' at 698, 3'-CCTGA-5' at 746, 3'-CCTGA-5' at 814, 3'-GTTGG-5' at 844, 3'-CTAGG-5' at 865, 3'-ACAGG-5' at 893, 3'-GCTGA-5' at 898, 3'-CCTGA-5' at 914, 3'-GTTGG-5' at 944, 3'-CTAGG-5' at 965, 3'-ACAGG-5' at 993, 3'-GCTGA-5' at 998, 3'-GTTGG-5' at 1280, 3'-GCAGA-5' at 1393, 3'-GCAGA-5' at 1493, 3'-GTTGG-5' at 1616, 3'-ACTGG-5' at 1662, 3'-CCAGA-5' at 1711, 3'-ACAGA-5' at 1731, 3'-CTTGG-5' at 1811, 3'-ACAGA-5' at 1862, 3'-GCAGG-5' at 1930, 3'-CTTGA-5' at 1951, 3'-CCAGA-5' at 1958, 3'-GTTGG-5' at 2013, 3'-ACAGA-5' at 2078, 3'-GTAGA-5' at 2111, 3'-GTTGG-5' at 2120, 3'-ACAGA-5' at 2172, 3'-ACTGG-5' at 2213, 3'-CTTGG-5' at 2225, 3'-CCAGA-5' at 2258, 3'-CCTGA-5' at 2271, 3'-GCTGA-5' at 2359, 3'-CTAGG-5' at 2378, 3'-ACAGA-5' at 2466, 3'-CCAGA-5' at 2489, 3'-CTAGG-5' at 2514, 3'-CTTGG-5' at 2579, 3'-CCTGA-5' at 2672, 3'-CTTGG-5' at 2776, 3'-CCTGA-5' at 2820, 3'-GTAGA-5' at 2852, 3'-ACTGG-5' at 2873, 3'-CCAGA-5' at 2941, 3'-ACTGA-5' at 2945, 3'-ACAGA-5' at 3004, 3'-CCAGA-5' at 3019, 3'-ACTGA-5' at 3029, 3'-ACAGA-5' at 3053, 3'-GTAGG-5' at 3108, 3'-CCTGG-5' at 3172, 3'-GCAGG-5' at 3203, 3'-CCAGA-5' at 3245, 3'-GCAGA-5' at 3256, 3'-GTTGA-5' at 3291, 3'-CCAGA-5' at 3299, 3'-GTAGA-5' at 3329, 3'-ATAGG-5' at 3384, 3'-ACAGA-5' at 3392, 3'-GTAGA-5' at 3403, 3'-CCTGG-5' at 3545, 3'-ACAGG-5' at 3577, 3'-CCAGA-5' at 3608, 3'-ACAGG-5' at 3619, 3'-GTAGG-5' at 3629, 3'-ACTGG-5' at 3714, 3'-ATTGA-5' at 3733, 3'-CCAGA-5' at 3771, 3'-ACTGG-5' at 3784, 3'-GCTGA-5' at 3801, 3'-CTTGG-5' at 3838, 3'-CTTGG-5' at 3856, 3'-GTTGG-5' at 3911, 3'-CTTGG-5' at 3937, 3'-ACTGG-5' at 4018, 3'-CTTGA-5' at 4048, 3'-GCAGA-5' at 4056, 3'-CTAGG-5' at 4081, 3'-GTAGG-5' at 4183, 3'-ACTGG-5' at 4216, 3'-GCTGG-5' at 
4358, 3'-ACAGG-5' at 4367, 3'-CCAGA-5' at 4380, 3'-CCAGA-5' at 4414, 3'-CCAGG-5' at 4420.
{{clear}}
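
A degenerate consensus such as 3'-A/G-G-A/T-C/T-A/C/G-5' can be searched for by turning each position into a character class. The sketch below is a minimal Python illustration, not the SuccessablesDPE programs themselves: the function names, the 1-based reporting of the first nucleotide of each match, and the toy sequence are assumptions.

```python
import re


def consensus_to_regex(consensus):
    """Turn 'A/G-G-A/T-C/T-A/C/G' into '[AG][G][AT][CT][ACG]'."""
    return "".join("[" + pos.replace("/", "") + "]" for pos in consensus.split("-"))


def inverse_consensus(consensus):
    """Reverse the order of the positions, giving the inverse consensus."""
    return "-".join(reversed(consensus.split("-")))


def scan(sequence, consensus):
    """Return (1-based position, matched text) pairs, overlaps included."""
    # The lookahead lets matches overlap; the inner group captures the text.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


DPE = "A/G-G-A/T-C/T-A/C/G"
print(inverse_consensus(DPE))       # A/C/G-C/T-A/T-G-A/G
print(scan("TTGGTCGAGACATT", DPE))  # [(3, 'GGTCG'), (8, 'AGACA')]
```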
 
==DREB boxes==
{{main|DREB box gene transcriptions}}
There are no dehydration-responsive element-binding (DREB) boxes in either promoter.
 
==E2 boxes==
{{main|E2 box gene transcriptions}}
On the negative strand in the negative direction there are 5: 3'-ACAGATGT-5' at 482, 3'-ACAGATGT-5' at 1225, 3'-GCAGTTGG-5' at 1514, 3'-ACAGATGT-5' at 2989, 3'-ACAGATGT-5' at 4213, in the distal promoter.

On the positive strand in the negative direction there are 2: 3'-GCAGGTGG-5' at 2571, 3'-ACAGATGA-5' at 3920.

For the inverse complement on the negative strand in the negative direction there is 1: 3'-CCACCTGT-5' at 2117.

For the inverse complement on the positive strand in the negative direction there are 4: 3'-CCACCTGT-5' at 394, 3'-ACACCTGT-5' at 1131, 3'-GCAACTGC-5' at 3851, 3'-ACACCTGT-5' at 3970.

On the negative strand in the positive direction there is 1: 3'-GCAGATGA-5' at 37.
 
==EIF4E basal elements==
{{main|EIF4E basal element gene transcriptions}}
There are no EIF4E (also written eIF4E) basal elements (4EBE) in either promoter.
 
==Enhancer boxes==
{{main|Enhancer box gene transcriptions}}
 
===Core promoters===
 
Negative strand in the positive direction there are 2: 3'-CACATG-5', 4364, 3'-CACATG-5', 4370.
 
===Proximal promoters===
 
Positive strand, negative direction there is 1: 3'-CACATG-5' at 4247.
 
Negative strand, positive direction there are 2: 3'-CACATG-5', 4153, 3'-CACATG-5', 4221.
 
===Distal promoters===
 
Negative strand in the negative direction there are 4: 3'-CACATG-5' at 324, 3'-CACATG-5' at 797, 3'-CACATG-5' at 2213, and 3'-CACATG-5' at 2342.
 
Positive strand in the negative direction there are 17: 3'-CACATG-5' at 123, 3'-CACATG-5' at 200, 3'-CACATG-5' at 952, 3'-CACATG-5' at 1206, 3'-CACATG-5' at 1849, 3'-CACATG-5' at 1952, 3'-CACATG-5' at 2151, 3'-CACATG-5' at 2276, 3'-CACATG-5' at 2322, 3'-CACATG-5' at 2533, 3'-CACATG-5' at 2613, 3'-CACATG-5' at 2667, 3'-CACATG-5' at 2751, 3'-CACATG-5' at 2783, 3'-CACATG-5' at 4106, 3'-CACATG-5' at 4116.
 
Negative strand in the positive direction there are 17: 3'-CACATG-5', 1186, 3'-CACATG-5', 1238, 3'-CACATG-5', 1871, 3'-CACATG-5', 1933, 3'-CACATG-5', 2031, 3'-CACATG-5', 2140, 3'-CACATG-5', 2153, 3'-CACATG-5', 2266, 3'-CACATG-5', 2473, 3'-CACATG-5', 3140, 3'-CACATG-5', 3335, 3'-CACATG-5', 3580, 3'-CACATG-5', 3707, 3'-CACATG-5', 3742, 3'-CACATG-5', 3827, 3'-CACATG-5', 3900, 3'-CACATG-5', 3956.
 
Positive strand in the positive direction there are 4: 3'-CACATG-5', 126, 3'-CACATG-5', 565, 3'-CACATG-5', 2596, 3'-CACATG-5', 3114.
 
==F boxes==
{{main|F box gene transcriptions}}
 
"Male sex determination in the ''Caenorhabditis elegans'' hermaphrodite germline requires translational repression of tra-2 mRNA by the [Germ Line Development] GLD-1 RNA binding protein."<ref name=Clifford>{{ cite journal
|author=Robert Clifford, Min-Ho Lee, Sudhir Nayak, Mitsue Ohmachi, Flav Giorgini and Tim Schedl
|title=FOG-2, a novel F-box containing protein, associates with the GLD-1 RNA binding protein and directs male sex determination in the ''C. elegans'' hermaphrodite germline
|journal=Development
|date=December 2000
|volume=127
|issue=24
|pages=5265-76
|url=https://dev.biologists.org/content/develop/127/24/5265.full.pdf
|arxiv=
|bibcode=
|doi=
|pmid=
|accessdate=10 August 2020 }}</ref>
 
The Skp, Cullin, F-box containing complex (or SCF complex) is a multi-protein E3 [[ubiquitin ligase]] complex that catalyzes the [[ubiquitin]]ation of proteins destined for 26S [[Proteasome|proteasomal]] degradation.<ref name=Ou>{{ Cite journal
|last1=Ou|first1=Young
|title=The Centrosome in Higher Organisms: Structure, Composition, and Duplication
|volume=238
|date=2004
|journal=International Review of Cytology
|pages=119–182
|isbn=978-0-12-364642-2
|last2=Rattner|first2=J.B.
|doi=10.1016/s0074-7696(04)38003-4
|pmid=15364198 }}</ref>
 
"Canonical F-box proteins act as bridging components of the SCF ubiquitin ligase complex; the N-terminal F-box binds a Skp1 homolog, recruiting ubiquination machinery, while a C-terminal protein-protein interaction domain binds a specific substrate for degradation."<ref name=Clifford/>
 
"We used the yeast Gal4p two-hybrid system (Fields and Sternglanz, 1994) to identify proteins that physically interact with GLD-1. We recovered two identical cDNAs in two-hybrid screens [...]. One (OG2.3) using GLD-1 residues 84-341 and the other (CD13.1) using residues 273-457, both fused to the Gal4p DNA binding domain [...]."<ref name=Clifford/>

"When fused to the DNA-binding domain of [β-galactosidase (Gal)] Gal4p, Ino2p but not Ino4p was able to activate a [upstream activation site] UAS<sub>GAL</sub>-containing reporter gene even in the absence of the heterologous Fbf1 subunit. By deletion studies, two separate transcriptional activation domains were identified in the N-terminal part of Ino2p. Thus, the [basic helix-loop-helix] bHLH domains of Ino2p and Ino4p constitute the dimerization/DNA-binding module of Fbf1 mediating its interaction with the [inositol/choline-responsive element] ICRE, while transcriptional activation is effected exclusively by Ino2p."<ref name=Schwank>{{ cite journal
|author=Sabine Schwank, Ronald Ebbert, Karin Rautenstrauß, Eckhart Schweizer and Hans-Joachim Schüller
|title=Yeast transcriptional activator ''INO2'' interacts as an Ino2p/Ino4p basic helix-loop-helix heteromeric complex with the inositol/choline responsive element necessary for expression of phospholipid biosynthetic genes in ''Saccharomyces cerevisiae''
|journal=Nucleic Acids Research
|date=25 January 1995
|volume=23
|issue=2
|pages=230-37
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC306659/pdf/nar00002-0046.pdf
|arxiv=
|bibcode=
|doi=10.1093/nar/23.2.230
|pmid=
|accessdate=10 August 2020 }}</ref>

"This ICRE (consensus sequence TYTTCACATGY) contains the core sequence CANNTG, which is also known as an E box and which serves as a recognition site for DNA-binding proteins of the basic helix-loop-helix (bHLH) family (3). Members of the bHLH family comprise determinants of cellular differentiation and proliferation in mammalian and invertebrate systems such as the myogenic transcription factors MyoD, MRF4, myogenin and Myf-5 (4) as well as factors not restricted to specialized tissues (E12, E47, daughterless, c-Myc and Mad; 5-7). Proteins of the bHLH group may form either homodimers or heterodimers or both, dependent on the individual structure of the respective interaction surface provided by the HLH domain (8)."<ref name=Schwank/>
 
==GAAC elements==
{{main|GAAC element gene transcriptions}}
# negative strand in the negative direction, looking for 3'-GAACT-5', 13, 3'-GAACT-5', 843, 3'-GAACT-5', 1009, 3'-GAACT-5', 1300, 3'-GAACT-5', 2127, 3'-GAACT-5', 2379, 3'-GAACT-5', 2580, 3'-GAACT-5', 2714, 3'-GAACT-5', 3103, 3'-GAACT-5', 3242, 3'-GAACT-5', 3401, 3'-GAACT-5', 3571, 3'-GAACT-5', 4012, 3'-GAACT-5', 4294,
# negative strand in the positive direction, looking for 3'-GAACT-5', 1, 3'-GAACT-5', 609,
# positive strand in the negative direction, looking for 3'-GAACT-5', 2, 3'-GAACT-5', 1685, 3'-GAACT-5', 3460,
# positive strand in the positive direction, looking for 3'-GAACT-5', 2, 3'-GAACT-5', 577, 3'-GAACT-5', 692,
# complement, negative strand, negative direction, looking for 3'-CTTGA-5', 2, 3'-CTTGA-5', 1685, 3'-CTTGA-5', 3460,
# complement, negative strand, positive direction, looking for 3'-CTTGA-5', 2, 3'-CTTGA-5', 577, 3'-CTTGA-5', 692,
# complement, positive strand, negative direction, looking for 3'-CTTGA-5', 13, 3'-CTTGA-5', 843, 3'-CTTGA-5', 1009, 3'-CTTGA-5', 1300, 3'-CTTGA-5', 2127, 3'-CTTGA-5', 2379, 3'-CTTGA-5', 2580, 3'-CTTGA-5', 2714, 3'-CTTGA-5', 3103, 3'-CTTGA-5', 3242, 3'-CTTGA-5', 3401, 3'-CTTGA-5', 3571, 3'-CTTGA-5', 4012, 3'-CTTGA-5', 4294,
# complement, positive strand, positive direction, looking for 3'-CTTGA-5', 1, 3'-CTTGA-5', 609,
# inverse complement, negative strand, negative direction, looking for 3'-AGTTC-5', 3, 3'-AGTTC-5', 3844, 3'-AGTTC-5', 4027, 3'-AGTTC-5', 4178,
# inverse complement, negative strand, positive direction, looking for 3'-AGTTC-5', 1, 3'-AGTTC-5', 761,
# inverse complement, positive strand, negative direction, looking for 3'-AGTTC-5', 6, 3'-AGTTC-5', 253, 3'-AGTTC-5', 719, 3'-AGTTC-5', 1177, 3'-AGTTC-5', 4024, 3'-AGTTC-5', 4175, 3'-AGTTC-5', 4417,
# inverse complement, positive strand, positive direction, looking for 3'-AGTTC-5', 0,
# inverse, negative strand, negative direction, looking for 3'-TCAAG-5', 6, 3'-TCAAG-5', 253, 3'-TCAAG-5', 719, 3'-TCAAG-5', 1177, 3'-TCAAG-5', 4024, 3'-TCAAG-5', 4175, 3'-TCAAG-5', 4417,
# inverse, negative strand, positive direction, looking for 3'-TCAAG-5', 0,
# inverse, positive strand, negative direction, looking for 3'-TCAAG-5', 3, 3'-TCAAG-5', 3844, 3'-TCAAG-5', 4027, 3'-TCAAG-5', 4178,
# inverse, positive strand, positive direction, looking for 3'-TCAAG-5', 1, 3'-TCAAG-5', 761.
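
The four orientations searched for in the list above (the motif itself, its complement, its inverse, and its inverse complement) can be derived from a single motif. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, positions are reported as 1-based start positions, and the numbering convention actually used for the positions listed above is not stated in this section.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def orientations(motif):
    """Return the four search strings derived from one motif."""
    comp = motif.translate(COMPLEMENT)
    return {
        "direct": motif,
        "complement": comp,
        "inverse": motif[::-1],             # same bases, read backwards
        "inverse complement": comp[::-1],   # complement, read backwards
    }


def find_all(sequence, motif):
    """Return 1-based start positions of every (possibly overlapping) hit."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(start + 1)
        start = sequence.find(motif, start + 1)
    return hits


if __name__ == "__main__":
    toy_sequence = "GAACTTGAACT"   # made-up sequence, not from the A1BG locus
    for name, query in orientations("GAACT").items():
        print(name, query, find_all(toy_sequence, query))
```

For GAACT this yields CTTGA, TCAAG and AGTTC, the same three derived strings used in the list.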
 
==GA responsive elements==
{{main|GARE gene transcriptions}}
Only one GARE (an inverse) occurs: 3'-AAACAAT-5' at 230 nts between ZSCAN22 and A1BG, together with its complement.
 
==GATA boxes==
{{main|GATA gene transcriptions}}
 
GTGA-box has the consensus sequence GATA.<ref name=Ye/>
 
===Proximal promoters===
 
Inverse complement, negative strand, positive direction there is 1: 3'-TTTATCAC-5', 4125.
 
===Distal promoters===
 
Positive strand in the negative direction there are 2: 3'-GGGATAGA-5', 100, 3'-ATGATAGA-5', 355.
 
Inverse complement, negative strand, negative direction there is 1: 3'-GTTATCAT-5', 2500.
 
Inverse complement, positive strand, negative direction there is 1: 3'-TTTATCTT-5', 1732.
 
Inverse complement, negative strand, positive direction there is 1: 3'-GTTATCCC-5', 3385.
 
Inverse complement, positive strand, positive direction there are 2: 3'-GCTATCAG-5', 1840, 3'-TTTATCTT-5', 2628.
 
==G boxes==
{{main|G box gene transcriptions}}
There are no G boxes in either promoter.
 
==GC boxes==
{{main|GC box gene transcriptions}}
Positive strand in the negative direction there are 2: 3'-TGGGCGTGGT-5', 1898, 3'-TGGGCGTGGT-5', 3048, in the distal promoter.
 
Inverse complement, negative strand, negative direction there is 1: 3'-ACTCCGCCCA-5', 3092.
 
Inverse complement, positive strand, negative direction there is 1: 3'-GCTCCGCCTC-5', 1505.
 
Negative strand in the positive direction there is 1: 3'-TGGGCGGGAC-5', 409.
 
Inverse complement, positive strand, positive direction there is 1: 3'-GCCACGCCCC-5', 491.
 
==GCC boxes==
 
The GCC box is the same as the AGC box.
 
==GLM boxes==
{{main|GLM box gene transcriptions}}
There are no GCN4-like motif (GLM) boxes in either promoter.
 
==Grainy head transcription factor binding sites==
{{main|Grainy head gene transcriptions}}
 
"The defined [[GRHL1]] DNA‐binding consensus sequence (AACCGGTT) was identical to that defined for [[GRHL3]], and also matched the consensus sequence for ''Drosophila GRH'' DNA binding, which we had previously identified by alignment of multiple GRH‐responsive gene regulatory regions (Wilanowski ''et al'', 2002; Ting ''et al'', 2005). Of note, the first of the two cytosines and the second of the guanines were invariant in both GRHL1 and GRHL3 CASTing assays."<ref name=Wilanowski>{{ cite journal
|author=Tomasz Wilanowski, Jacinta Caddy, Stephen B Ting, Nikki R Hislop, Loretta Cerruti, Alana Auden, Lin‐Lin Zhao, Stephen Asquith, Sarah Ellis, Rodney Sinclair, John M Cunningham and Stephen M Jane
|title=Perturbed desmosomal cadherin expression in grainy head‐like 1‐null mice
|journal=The EMBO Journal
|date=21 February 2008
|volume=27
|issue=
|pages=886-897
|url=https://www.embopress.org/doi/10.1038/emboj.2008.24
|arxiv=
|bibcode=
|doi=10.1038/emboj.2008.24
|pmid=
|accessdate=7 February 2020 }}</ref>
 
"The putative GRHL1‐binding motif (GACTGGTT) is perfectly conserved, together with 6 bp upstream and 12 bp downstream flanking sequences, in three of the ''Dsg1'' promoters: mouse ''Dsg1α'', mouse ''Dsg1γ'', and human ''DSG1'' [...]. In the mouse Dsg1β promoter, this motif is slightly different (AACTGGTT), although the flanking sequences are still conserved."<ref name=Wilanowski/>
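
The comparison made in the two quotations above can be sketched in a few lines: how far each ''Dsg1'' promoter motif lies from the AACCGGTT consensus, and whether the two positions reported as invariant are preserved. The index choice below (0-based positions 2 and 5) is an assumption, one reading of "the first of the two cytosines and the second of the guanines"; all names are illustrative.

```python
CONSENSUS = "AACCGGTT"
# Assumed 0-based indices of the invariant bases: first C and second G.
INVARIANT_POSITIONS = (2, 5)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def keeps_invariants(site):
    """True if the site matches the consensus at the assumed invariant positions."""
    return all(site[i] == CONSENSUS[i] for i in INVARIANT_POSITIONS)


def reverse_complement(seq):
    """Complement the sequence and read it backwards."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    for site in ("GACTGGTT", "AACTGGTT"):
        print(site, hamming(CONSENSUS, site), keeps_invariants(site))
    print("palindromic:", reverse_complement(CONSENSUS) == CONSENSUS)
```

Under these assumptions GACTGGTT differs from the consensus at two positions and AACTGGTT at one, and both keep the invariant C and G. The consensus is also its own reverse complement, so it reads identically on both strands.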
 
==H boxes==
{{main|H box gene transcriptions}}
 
===Core promoters===
 
Between ZSCAN22 and A1BG: There is one inverse and its complement 3'-AGGAGA-5' at 4428 nts.


Between ZNF497 and A1BG: There is an inverse and its complement 3'-AGGACA-5' at 4252. There are five after the TSS: 3'-AGAGAA-5' at 4387, 3'-AGTACA-5' at 4365, 3'-ACCAGA-5' at 4380, 3'-AAGAGA-5' at 4386, 3'-ACGACA-5' at 4392 and their complements.
"The leukocyte receptor cluster (LRC) is a family of structurally related genes for immunoregulatory receptors. Originally, the term LRC was introduced to emphasize the linkage of the genes encoding killer immunoglobulin-like receptors (KIRs), leukocyte Ig-like receptors (LILRs), and FcαR on human chromosome 19q13.4 (Wagtmann et al. 1997; Wende et al. 1999). Subsequently, it has been found that the region contains some other structurally related genes, such as ''NCR1, GPVI, LAIR1, LAIR2,'' and ''OSCAR'' (Meyaard et al. 1997; Sivori et al. 1997; Clemetson et al. 1999; Kim et al. 2002). Most recently, the LRC has been further extended by adding two more genes named ''VSTM1/SIRL1'' and ''TARM1'' (Steevels et al. 2010; Radjabova et al. 2015)."<ref name=Guselnikov/>


===Proximal promoters===
"Except for LAIR2, which is a secreted protein, all human LRC products are type I cell surface receptors with extracellular regions composed of 1–4 C2-type Ig-like domains."<ref name=Guselnikov/>


Between ZSCAN22 and A1BG: There is one H box (3'-ANANNA-5'): negative direction, negative strand, 3'-ACACGA-5' at 4402. On the positive strand in the negative direction there are 16: 3'-ACAAAA-5' at 4216, 3'-AAAAAA-5' at 4218, 3'-AAAATA-5' at 4220, 3'-AAATAA-5' at 4221, 3'-ATAATA-5' at 4223, 3'-AAAAAA-5' at 4378, 3'-AAAAGA-5' at 4380, 3'-AAAGAA-5' at 4381, 3'-AGAAAA-5' at 4383, 3'-AAAAAA-5' at 4385, 3'-AAAAGA-5' at 4387, 3'-AAAGAA-5' at 4388, 3'-AGAAAA-5' at 4390, 3'-AAAAGA-5' at 4392, 3'-AAAGAA-5' at 4393, and 3'-AGAAAA-5' at 4395, with their complements on the negative strand, negative direction.
The "eutherian LRC family, in addition to commonly recognized members, includes two new, IGSF1 and alpha-1-B glycoprotein (A1BG)."<ref name=Guselnikov/>


Between ZNF497 and A1BG: There is one H box (3'-ANANNA-5'): 3'-AGAGAA-5' at 4387 in the proximal promoter, negative strand, positive direction. There are four: 3'-TCATGT-5' at 4365, 3'-TGGTCT-5' at 4380, 3'-TTCTCT-5' at 4386, and 3'-TGCTGT-5' at 4392 and their complements in the positive direction.
"Nucleotide sequences were retrieved and analyzed using utilities at the NCBI (https://www.ncbi.nlm.nih.gov/, last accessed May 20, 2019) and Ensemble (http://www.ensembl.org, last accessed May 20, 2019) websites."<ref name=Guselnikov/>


===Distal promoters===
"In our previous studies, it was observed that the Ig-like domains of the frog and chicken LRC proteins reproducibly showed homology not only to known LRC members but also to the products of four mammalian genes that to our knowledge have never been considered in the phylogenetic analyses of LRC. These genes are ''VSTM1, TARM1, A1BG,'' and ''IGSF1''. ''VSTM1'' and ''TARM1'' are the most recently identified members of the human LRC (Steevels et al. 2010; Radjabova et al. 2015). ''A1BG'' encodes alpha-1 B glycoprotein, a soluble component of mammalian blood plasma that is known for half a century (Schultze et al. 1963). The protein is composed of five Ig-like domains and has been shown to bind to CRISP-3, a small polypeptide that is present in exocrine secretions of neutrophilic granulocytes and that is believed to play a role in innate immunity (Udby et al. 2004). In the human genome, ''A1BG'' maps to 19q13.4 some 3.3 Mb away from ''GPVI'' [...]."<ref name=Guselnikov/>


Between ZSCAN22 and A1BG, negative strand, negative direction: 3'-AGAGGA-5' at 3387, 3'-AGAGGA-5' at 3638, and 3'-AGAGGA-5' at 3675. One inverse and its complement 3'-AGGAGA-5' at 3790. There are 14 H boxes: 3'-ACACCA-5' at 788, 3'-ACATCA-5' at 2541, 3'-ACACCA-5' at 2659, 3'-ACATTA-5' at 2675, 3'-ATAAAA-5' at 2853, 3'-AAAGTA-5' at 2886, 3'-ACATTA-5' at 3064, 3'-AGATGA-5' at 3159, 3'-ACACCA-5' at 3187, 3'-AGAAGA-5' at 3554, 3'-AGACGA-5' at 3707, 3'-ACACCA-5' at 3811, 3'-ACATTA-5' at 3973, and 3'-ACATCA-5' at 4124.
"The attribution of IGSF1 and A1BG domains to the LRC was supported by their 3D structures predicted using homology modeling [...]."<ref name=Guselnikov/>


On the positive strand, negative direction, there are 127 H boxes: 3'-ACCACA-5' at 608, 3'-ACCACA-5' at 793, 3'-ACACCA-5' at 883, 3'-ACCACA-5', 1477, 3'-ACACCA-5' at 2419, 3'-AAAAAA-5' at 2461, 3'-AAAAAA-5' at 2462, 3'-AAAAAA-5' at 2463, 3'-AAAAAA-5' at 2464, 3'-AAAAAA-5' at 2465, 3'-AAAAAA-5' at 2466, 3'-AAAAAA-5' at 2467, 3'-AAAAAA-5' at 2468, 3'-AAAAAA-5' at 2469, 3'-AAAAAA-5' at 2470, 3'-AAAGCA-5' at 2473, 3'-AAAGCA-5' at 2479, 3'-AAACAA-5' at 2484, 3'-AAACAA-5' at 2488, 3'-ACAAAA-5' at 2490, 3'-ATAGTA-5' at 2500, 3'-AGAAAA-5' at 2506, 3'-AAAACA-5' at 2508, 3'-AAACAA-5' at 2509, 3'-AGACCA-5' at 2599, 3'-ATACAA-5' at 2642, 3'-ACAAAA-5' at 2644, 3'-AAATCA-5' at 2648, 3'-ACAGGA-5' at 2690, 3'-AAATCA-5' at 2749, 3'-AGAGCA-5' at 2781, 3'-AAAAGA-5' at 2798, 3'-AAAGAA-5' at 2799, 3'-AAAGAA-5' at 2803, 3'-AGAAAA-5' at 2805, 3'-AAAAGA-5' at 2807, 3'-AGAGAA-5' at 2810, 3'-AGAAGA-5' at 2812, 3'-AGAAAA-5' at 2815, 3'-AAAAAA-5' at 2817, 3'-AAAAGA-5' at 2819, 3'-AAAGAA-5' at 2820, 3'-AGAAAA-5' at 2822, 3'-AAAAGA-5' at 2824, 3'-AGAGAA-5' at 2827, 3'-AGAAGA-5' at 2829, 3'-AGAAAA-5' at 2832, 3'-AAAAAA-5' at 2834, 3'-AAAAGA-5' at 2836, 3'-AAAGAA-5' at 2837, 3'-AGAAAA-5' at 2839, 3'-AAAACA-5' at 2841, 3'-AAACAA-5' at 2842, 3'-AAAATA-5' at 2868, 3'-ATATAA-5' at 2873, 3'-AAAAAA-5' at 2929, 3'-ACATCA-5' at 2941, 3'-ACATTA-5' at 2951, 3'-AAACCA-5' at 2971, 3'-AAAATA-5' at 3012, 3'-AAATAA-5' at 3013, 3'-AAAAAA-5' at 3026, 3'-AAACTA-5' at 3029, 3'-AGACCA-5' at 3122, 3'-AAAACA-5' at 3166, 3'-ACATAA-5' at 3169, 3'-ATAAAA-5' at 3171, 3'-AAATTA-5' at 3175, 3'-AGATCA-5' at 3277, 3'-ACAAGA-5' at 3307, 3'-AGAGCA-5' at 3310, 3'-AAAACA-5' at 3329, 3'-AAACAA-5' at 3330, 3'-AAATAA-5' at 3334, 3'-AAACAA-5' at 3338, 3'-ACAAGA-5' at 3340, 3'-AGAAAA-5' at 3343, 3'-AAACCA-5' at 3365, 3'-AGAGGA-5' at 3387, 3'-ACATCA-5' at 3394, 3'-AGAGAA-5' at 3406, 3'-ACATCA-5' at 3415, 3'-ACATTA-5' at 3436, 3'-ATATTA-5' at 3454, 3'-ATATTA-5' at 3468, 3'-AAACCA-5' at 3484, 3'-AGATCA-5' at 3489, 3'-AAAACA-5' at 3511, 
3'-ACACAA-5' at 3514, 3'-ATAATA-5' at 3538, 3'-ACAAGA-5' at 3635, 3'-AGAGGA-5' at 3638, 3'-AAAGAA-5' at 3666, 3'-AGAACA-5' at 3668, 3'-AGAGGA-5' at 3675, 3'-ACAAGA-5' at 3759, 3'-AGACCA-5' at 3762, 3'-ACCACA-5' at 3764, 3'-ACAAAA-5' at 3767, 3'-AGAGCA-5' at 3913, 3'-AGATGA-5' at 3920, 3'-AGACCA-5' at 4031, 3'-ACAAAA-5' at 4066, 3'-AAAAAA-5' at 4068, 3'-AAAATA-5' at 4070, 3'-AAATAA-5' at 4071, 3'-AAATAA-5' at 4075, 3'-ATAATA-5' at 4077, 3'-ATAGAA-5' at 4080, 3'-AAAGAA-5' at 4084, 3'-AGAAAA-5' at 4086, 3'-AGACAA-5' at 4182, 3'-ACAAAA-5' at 4216, 3'-AAAAAA-5' at 4218, 3'-AAAATA-5' at 4220, 3'-AAATAA-5' at 4221, 3'-ATAATA-5' at 4223, 3'-AAAAAA-5' at 4378, 3'-AAAAGA-5' at 4380, 3'-AAAGAA-5' at 4381, 3'-AGAAAA-5' at 4383, 3'-AAAAAA-5' at 4385, 3'-AAAAGA-5' at 4387, 3'-AAAGAA-5' at 4388, 3'-AGAAAA-5' at 4390, 3'-AAAAGA-5' at 4392, 3'-AAAGAA-5' at 4393, and 3'-AGAAAA-5' at 4395.
"Noteworthy is that the D1 and D6 domains of IgSF1 fall into one clade with the N-terminal (d1) domains of A1BG and OSCAR (cluster B1). Closer relationship of A1BG and OSCAR was supported by clustering of the d2–d5 domains of A1BG with membrane-proximal (d2) domain of OSCAR (cluster B2)."<ref name=Guselnikov/>


Between ZNF497 and A1BG: There are two H boxes after nucleotide number 2300 in the negative strand and positive direction: 3'-ACCACA-5' at 420, 3'-ACACCA-5' at 386, 3'-TGGTGT-5' at 511, 3'-TGGTGT-5' at 530, 3'-ACACCA-5' at 2603 and 3'-ACACCA-5' at 3825.
"Altogether, these results support the attribution of IGSF1 and A1BG to the LRC and suggest their relatedness to OSCAR, TARM1, and VSTM1."<ref name=Guselnikov/>


There are two H boxes after nucleotide number 2300 in the positive strand and positive direction: 3'-ACACCA-5' at 204, 3'-ACACCA-5' at 528, 3'-ACACCA-5' at 3643 and 3'-ACACCA-5' at 3967.
"Clustering of the N-terminal domains of OSCAR, IGSF1, and A1BG with each other and with IGSF1 d6 was also reproduced. Finally, the d2 domains of OSCAR cluster with the d2–d5 domains of A1BG (fig. 5). These results further justify grouping IGSF1, A1BG, OSCAR, TARM1, and VSTM1 into a distinct group B."<ref name=Guselnikov/>
 
Regarding 3'-ANANNA-5', on the negative strand, positive direction, there are 25 H boxes: 3'-ATACCA-5' at 2591, 3'-ACACCA-5' at 2603, 3'-ATAGAA-5' at 2628, 3'-AAACCA-5' at 2632, 3'-ACACTA-5' at 2637, 3'-ATATAA-5' at 2662, 3'-AGAGCA-5' at 2704, 3'-AGAGGA-5' at 2793, 3'-AAAGGA-5' at 2829, 3'-ACAGAA-5' at 2838, 3'-AAAGAA-5' at 3066, 3'-AGAACA-5' at 3094, 3'-AGAGCA-5' at 3138, 3'-ACAGCA-5' at 3212, 3'-ACAGTA-5' at 3414, 3'-AGATGA-5' at 3476, 3'-ACAGGA-5' at 3572, 3'-AAAGCA-5' at 3599, 3'-ACATGA-5' at 3708, 3'-ACACCA-5' at 3825, 3'-AAAAGA-5' at 3929, 3'-AGAACA-5' at 4068, 3'-AAATGA-5' at 4094, 3'-ACATCA-5' at 4116, and 3'-ACATGA-5' at 4154.
 
On the positive strand, positive direction there are 20 H boxes: 3'-AAATAA-5' at 2347, 3'-AAAAAA-5' at 2451, 3'-AAAACA-5' at 2453, 3'-AGACGA-5' at 2976, 3'-AGACCA-5' at 3022, 3'-AGAGAA-5' at 3056, 3'-AGAAGA-5' at 3058, 3'-AGAGGA-5' at 3302, 3'-AGACGA-5' at 3307, 3'-ACAGAA-5' at 3393, 3'-AGAAGA-5' at 3395, 3'-ACAGGA-5' at 3620, 3'-ACACCA-5' at 3643, 3'-AAACCA-5' at 3948, 3'-ACACCA-5' at 3967, 3'-AGAGGA-5' at 4059, 3'-AAAATA-5' at 4122, 3'-AAATCA-5' at 4137, 3'-AAATAA-5' at 4142, and 3'-ATATTA-5' at 4168.
 
There are 31 inverse H boxes on the negative strand in the positive direction: 3'-ATGACA-5' at 2412, 3'-ACTACA-5' at 2428, 3'-AGGACA-5' at 2460, 3'-ATTATA-5' at 2548, 3'-ACCACA-5' at 2600, 3'-AGGAAA-5' at 2623, 3'-AATAGA-5' at 2627, 3'-ACCACA-5' at 2634, 3'-AACAGA-5' at 2652, 3'-AGCAAA-5' at 2706, 3'-AGGAAA-5' at 2831, 3'-AACACA-5' at 2835, 3'-ATGACA-5' at 2843, 3'-AGAACA-5' at 3094, 3'-AACACA-5' at 3096, 3'-AGGACA-5' at 3131, 3'-ACCAAA-5' at 3175, 3'-AACAGA-5' at 3179, 3'-AGCAGA-5' at 3214, 3'-AGTAGA-5' at 3416, 3'-AATAAA-5' at 3427, 3'-ACCAGA-5' at 3548, 3'-ATGACA-5' at 3569, 3'-AGGAGA-5' at 3650, 3'-AGCACA-5' at 3740, 3'-ACCACA-5' at 3859, 3'-AAAAGA-5' at 3929, 3'-AGAACA-5' at 4068, 3'-ATCATA-5' at 4149, and 3'-ATTATA-5' at 4166.
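
Runs of hits at neighbouring positions in the lists above (for example 3'-AAAAAA-5' at 2461 through 2470) arise because ANANNA matches overlapping windows. The sketch below shows one way such overlapping hits could be enumerated, assuming N stands for any base and reporting 1-based start positions. The function name and the toy input are illustrative, and this is not necessarily the procedure used to produce the lists.

```python
import re


def scan_overlapping(sequence, pattern="ANANNA"):
    """Return (1-based start, matched text) for every overlapping hit.

    A zero-width lookahead lets the regex engine report windows that
    overlap, which a plain pattern would skip.
    """
    regex = re.compile("(?=(" + pattern.replace("N", "[ACGT]") + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Toy input, not taken from the locus: two overlapping hits.
    print(scan_overlapping("ACAAAAAA"))
    # A run of eight adenines gives one hit per six-base window.
    print(len(scan_overlapping("AAAAAAAA")))
```

In the toy input, ACAAAA and AAAAAA start two bases apart, the same spacing as the pair listed at 4216 and 4218.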
 
==HMG boxes==
{{main|HMG box gene transcriptions}}
 
==HNF6s==
{{main|HNF gene transcriptions}}
 
===Core promoters===
 
Inverse complement, positive strand, negative direction there is 1: 3'-TTATTAATTC-5', 4542.
 
===Proximal promoters===
 
Negative strand in the negative direction there is 1: 3'-TTATTAATCG-5', 4229.
 
Negative strand in the positive direction there are 2: 3'-TTATTAATCA-5', 4147, 3'-TTATTGATTA-5', 4164.
 
Inverse complement, positive strand, positive direction there is 1: 3'-ATATTAACAA-5', 4172.
 
===Distal promoters===
 
Negative strand in the negative direction there are 2: 3'-GTGTTAATAA-5', 1725, 3'-TAGTTGATAA-5', 3527.
 
Positive strand in the negative direction there is 1: 3'-AAATTGATAA-5', 3361.
 
Inverse complement, negative strand, negative direction there are 2: 3'-ACATGGACAT-5', 802, 3'-TAATGAACTT-5', 1301.
 
Inverse complement, positive strand, negative direction there are 2: 3'-AAATTGATAA-5', 3361, 3'-TCATCAACTA-5', 3525.
 
Negative strand in the positive direction there is 1: 3'-ATGTCCATGG-5', 3581.
 
Positive strand in the positive direction there is 1: 3'-GAGTCCATTG-5', 3732.
 
Inverse complement, positive strand, positive direction there is 1: 3'-CCATTGACTC-5', 3736.
 
==Homeoboxes==
 
"Transcription factors Pax-4 and Pax-6 are known to be key regulators of pancreatic cell differentiation and development. [...] The gene-targeting experiments revealed that Pax-4 and Pax-6 cannot substitute for each other in tissue with overlapping expression of both genes. [The] DNA-binding specificities of Pax-4 and Pax-6 are similar. The Pax-4 homeodomain [HD] was shown to preferentially dimerize on DNA sequences consisting of an inverted TAAT motif, separated by 4-nucleotide spacing."<ref name=Kalousova>{{ cite journal
|author=Anna Kalousová, Vladimı́r Beneš, Jan Pačes, Václav Pačes and Zbyněk Kozmik
|title=DNA Binding and Transactivating Properties of the Paired and Homeobox Protein Pax4
|journal=Biochemical and Biophysical Research Communications
|date=June 1999
|volume=259
|issue=3
|pages=510-518
|url=https://www.sciencedirect.com/science/article/abs/pii/S0006291X99908094
|arxiv=
|bibcode=
|doi=
|pmid=
|accessdate=6 May 2020 }}</ref>
 
The "crucial difference between the binding sites of Antennapedia class and TTF-1 HDs is in the motifs 5'-TAAT-3', recognized by Antennapedia [a Hox gene, a subset of homeobox genes, first discovered in Drosophila which controls the formation of legs during development], and 5'-CAAG-3', preferentially bound by TTF-1. [The] binding of wild type and mutants TTF-1 HD to oligonucleotides containing either 5'-TAAT-3' or 5'-CAAG-3' indicate that only in the presence of the latter motif the Gln<sub>50</sub> in TTF-1 HD is utilized for DNA recognition."<ref name=Damante>{{ cite journal
|author=G. Damante, D. Fabbro, L. Pelizari, D. Civitareale, S. Guazzi, M. Polycarpou-Schwartz, S. Cauci, F. Quadrifoglio, S. Formisano and R. Di Lauro
|title=Sequence-specific DNA recognition by the thyroid transcription factor-1 homeodomain
|journal=Nucleic Acids Research
|date=20 June 1994
|volume=22
|issue=15
|pages=3075-83
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC310278/pdf/nar00039-0221.pdf
|arxiv=
|bibcode=
|doi=
|pmid=
|accessdate=6 May 2020 }}</ref>
 
==HY boxes==
{{main|HY box gene transcriptions}}
 
===Core promoters===
 
Positive strand in the negative direction there is 1: 3'-TGAGGG-5' at 4558.
 
Inverse complement, negative strand, negative direction there is 1: 3'-CCCTCA-5', 4498.
 
Negative strand in the positive direction there is 1: 3'-TGTGGG-5', 4395.
 
===Distal promoters===
 
Negative strand in the negative direction there is 1: 3'-TGTGGG-5' at 749.
 
Positive strand in the negative direction there are 4: 3'-TGAGGG-5' at 88, 3'-TGAGGG-5' at 2699, 3'-TGAGGG-5' at 3652, 3'-TGTGGG-5' at 3712.
 
Inverse complement, negative strand, negative direction there are 3: 3'-CCCTCA-5', 2702, 3'-CCCACA-5', 3184, 3'-CCCTCA-5', 3889.
 
Positive strand in the positive direction there are 2: 3'-TGTGGG-5', 2965, 3'-TGTGGG-5', 3533.
 
Negative strand in the positive direction there are 3: 3'-TGAGGG-5', 258, 3'-TGAGGG-5', 3479, 3'-TGAGGG-5', 3879.
 
Inverse complement, negative strand, positive direction there are 3: 3'-CCCTCA-5', 88, 3'-CCCTCA-5', 3207, 3'-CCCTCA-5', 3503.
 
Inverse complement, positive strand, positive direction there are 5: 3'-CCCTCA-5', 494, 3'-CCCTCA-5', 662, 3'-CCCTCA-5', 1783, 3'-CCCACA-5', 1803, 3'-CCCTCA-5', 3185.
 
==I boxes==
{{main|I box gene transcriptions}}
 
==Initiator elements (YYANWYY)==
{{main|Initiator element gene transcriptions}}
 
===Core promoters===
 
There is the following Inr in the core promoter, negative strand, negative direction: 3'-TTACTCC-5' at 4557.
 
There are four Inrs in the core promoter, positive strand, negative direction: 3'-CCACTCC-5' at 4425, 3'-CCACTTT-5' at 4461, 3'-TCACATT-5' at 4533, and 3'-TTAATTC-5' at 4542.
 
There is the following Inr in the core promoter, negative strand, positive direction: 3'-CTGCACC-5' at 4343.
 
There are two Inrs in the core promoter, positive strand, positive direction: 3'-CCACTCC-5' at 4401 and 3'-CCAGACC-5' at 4416.
 
===Proximal promoters===
 
There are eight Inrs on the negative strand in the negative direction: 3'-TCACTCT-5' at 4202, 3'-TCGGTCT-5' at 4233, 3'-CTGCACC-5' at 4238, 3'-TCGGACC-5' at 4300, 3'-CCAGTTT-5' at 4309, 3'-TCGGACC-5' at 4349, 3'-TCACACT-5' at 4361, and 3'-TTACTCC-5' at 4557.
 
There are seven Inrs on the positive strand in the negative direction: 3'-CCGGACT-5' at 4327, 3'-CTGCACT-5' at 4340, 3'-CCAGTTC-5' at 4417, 3'-CCACTCC-5' at 4425, 3'-CCACTTT-5' at 4461, 3'-TCACATT-5' at 4533, and 3'-TTAATTC-5' at 4542.
 
There is one Inr on the negative strand in the positive direction: 3'-CTGCACC-5' at 4343.
 
There are two Inrs on the positive strand in the positive direction: 3'-CCACTCC-5' at 4401 and 3'-CCAGACC-5' at 4416.
 
===Distal promoters===
 
Negative strand in the negative direction there are 87: 3'-TTGTTCC-5', 71, 3'-CTATACC-5', 77, 3'-CCGTTTC-5', 93, 3'-CCGTACT-5', 124, 3'-CCATATT-5', 181, 3'-CTACATT-5', 247, 3'-TTGGTCC-5', 262, 3'-TTATACT-5', 274, 3'-TCACTCT-5', 301, 3'-CTGCTTT-5', 312, 3'-CCGGTTC-5', 419, 3'-CCAGTCC-5', 441, 3'-TCGGACC-5', 459, 3'-TTGTATC-5', 468, 3'-TCACTTT-5', 473, 3'-TCGGACC-5', 508, 3'-CCGGTTC-5', 556, 3'-CCAGTCC-5', 578, 3'-TTATACC-5', 605, 3'-CCGGTCC-5', 648, 3'-CCGGTTC-5', 692, 3'-CCAGTCC-5', 714, 3'-TCGGACT-5', 732, 3'-TCGCACC-5', 741, 3'-CTACACC-5', 787, 3'-TCGGTTC-5', 874, 3'-TCGGACC-5', 899, 3'-TCGCTCT-5', 913, 3'-TCGGTCC-5', 948, 3'-CCGTACC-5', 953, 3'-TTAGTCC-5', 984, 3'-TTGGACC-5', 1015, 3'-TCACTCT-5', 1079, 3'-TCGGACC-5', 1198, 3'-TTGTACC-5', 1207, 3'-CCACTTT-5', 1212, 3'-CCGCACC-5', 1244, 3'-TTGGATC-5', 1306, 3'-TCAGACC-5', 1356, 3'-TTATTCT-5', 1365, 3'-TCGTTTT-5', 1371, 3'-TTGTTTT-5', 1394, 3'-CCACACT-5', 1479, 3'-TTGCTTC-5', 1555, 3'-CCGTTTT-5', 1561, 3'-TTACTTT-5', 1582, 3'-TTGGATT-5', 1591, 3'-TTAATTT-5', 1697, 3'-TTATACC-5', 1742, 3'-CCGCACC-5', 1897, 3'-CCGTACT-5', 1953, 3'-TTGGACC-5', 1959, 3'-TCGGACC-5', 2009, 3'-TCGTTCT-5', 2023, 3'-TTACACC-5', 2065, 3'-CCGGTCC-5', 2077, 3'-TCACATT-5', 2087, 3'-TCAAACT-5', 2141, 3'-TTGTACC-5', 2152, 3'-CCGCTTT-5', 2157, 3'-CCAGTCC-5', 2250, 3'-TCAAACT-5', 2257, 3'-TCGGACC-5', 2268, 3'-TCGTACC-5', 2277, 3'-CCACTTT-5', 2282, 3'-TTGGACC-5', 2385, 3'-TCGGACC-5', 2435, 3'-TCACTCT-5', 2449, 3'-TCGTTTT-5', 2476, 3'-TTGTTTT-5', 2490, 3'-TCATTCT-5', 2503, 3'-CCGGTCC-5', 2519, 3'-CCAGTCC-5', 2587, 3'-TCACACC-5', 2605, 3'-TTGTACC-5', 2614, 3'-CCACTTT-5', 2619, 3'-TCACACC-5', 2658, 3'-TTGGACC-5', 2720, 3'-TCGGACC-5', 2770, 3'-TCGTACT-5', 2784, 3'-TTGATTC-5', 2914, 3'-CCGATTT-5', 3009, 3'-TTGATTC-5', 3031, 3'-CCGCACC-5', 3047, 3'-TCGGACC-5', 3128, 3'-TTGTTCC-5', 3141, 3'-CCACTTT-5', 3146, 3'-TTGTATT-5', 3169, 3'-CCACACC-5', 3186, 3'-TCGGTTC-5', 3273, 3'-TCGGACC-5', 3298, 3'-TTGTTCT-5', 3307, 3'-TCGTTTT-5', 3313, 3'-TTGTTCT-5', 3340, 
3'-TCGTTCT-5', 3374, 3'-CCGAACT-5', 3401, 3'-CCGTATC-5', 3446, 3'-TTGATCT-5', 3463, 3'-TTGGTCT-5', 3486, 3'-CTGTTCT-5', 3759, 3'-CTACACC-5', 3810, 3'-CTGGTCC-5', 3871, 3'-TCATTCT-5', 3893, 3'-CTACTTT-5', 3922, 3'-CCGGTCC-5', 3951, 3'-TCGGACC-5', 4037, 3'-TTGTATC-5', 4046, 3'-TCACTCT-5', 4051, 3'-TTACACT-5', 4092, 3'-CCGGTCC-5', 4102, 3'-CCGTACC-5', 4107, 3'-CCGGTCC-5', 4170, 3'-TCGAACC-5', 4188.
 
Positive strand in the negative direction there are 40: 3'-CTGAATT-5', 20, 3'-TTGGACC-5', 32, 3'-CTGCATT-5', 152, 3'-TTGAACC-5', 846, 3'-TCACACC-5', 882, 3'-TTGAACC-5', 1012, 3'-TCACTCC-5', 1058, 3'-TCACACC-5', 1128, 3'-TTGAACC-5', 1303, 3'-TTGCACC-5', 1339, 3'-TTGCACT-5', 1347, 3'-CCAGTCT-5', 1354, 3'-CCATTTC-5', 1380, 3'-TCGCTCT-5', 1450, 3'-CTATATC-5', 1528, 3'-TTATTTT-5', 1727, 3'-CTGCACT-5', 2000, 3'-CTACTCC-5', 2352, 3'-TTGAACC-5', 2382, 3'-TCACACC-5', 2418, 3'-CTGCACT-5', 2426, 3'-TTGAATC-5', 2708, 3'-TTGAACC-5', 2717, 3'-CTGCACC-5', 2761, 3'-TTGAACC-5', 3245, 3'-TTGCACT-5', 3289, 3'-CCAGATC-5', 3488, 3'-CTGCTCC-5', 3582, 3'-CCATTTC-5', 3688, 3'-CTGGACT-5', 3747, 3'-CTGAACC-5', 3784, 3'-CCATACC-5', 3858, 3'-TCACACC-5', 3967.
 
Inverse complement, negative strand, negative direction there are 32: 3'-GATACAA-5', 213, 3'-GGACCGA-5', 598, 3'-AGTGCGG-5', 664, 3'-GGACTGG-5', 734, 3'-AGTGTGG-5', 882, 3'-GAAGTGA-5', 1056, 3'-AGTGTGG-5', 1128, 3'-GGACCGG-5', 1200, 3'-AGAGCGA-5', 1448, 3'-GGTCCGA-5', 1462, 3'-GATATAG-5', 1528, 3'-AGAACGG-5', 1608, 3'-AAAATAG-5', 1730, 3'-AGTGCAG-5', 1773, 3'-GGACCGA-5', 1843, 3'-AGTGCGG-5', 1992, 3'-AGTGCGG-5', 2208, 3'-AGTGTGG-5', 2418, 3'-AGTACGG-5', 2535, 3'-AGTACGG-5', 2753, 3'-AAAGTAG-5', 2887, 3'-GATTCGA-5', 3033, 3'-GGACCGG-5', 3130, 3'-AGTGCGG-5', 3281, 3'-AGTCCGA-5', 3398, 3'-GGTCTAG-5', 3488, 3'-GGTATGG-5', 3858, 3'-GGTCCGG-5', 3873, 3'-AGTGTGG-5', 3967.
 
Negative strand in the positive direction there are 45: 3'-TTGTATT-5', 115, 3'-CTGTTTT-5', 147, 3'-CCACACT-5', 345, 3'-CCGGACT-5', 746, 3'-CTGCACT-5', 1372, 3'-CTGCACT-5', 1472, 3'-CCAGACT-5', 1744, 3'-CCACTTC-5', 1914, 3'-CTATTTC-5', 1978, 3'-CCAGTCC-5', 2026, 3'-TCGCTTC-5', 2095, 3'-TCATATT-5', 2178, 3'-CTGCATT-5', 2206, 3'-CCAGATC-5', 2230, 3'-TCAATCT-5', 2235, 3'-CTGTTTC-5', 2263, 3'-TCACTCT-5', 2306, 3'-CTACACC-5', 2430, 3'-CTAATTT-5', 2440, 3'-CCGCACC-5', 2566, 3'-TTATACC-5', 2590, 3'-CCACACC-5', 2602, 3'-CCACACT-5', 2636, 3'-TCAGATT-5', 2868, 3'-CTGCTCC-5', 2978, 3'-CCAGTCC-5', 2998, 3'-CCAGTCC-5', 3084, 3'-CTGGTCT-5', 3245, 3'-TCGCTCT-5', 3276, 3'-CTGGTCT-5', 3299, 3'-CTGCTCC-5', 3309, 3'-CTGCACC-5', 3322, 3'-CCGCATC-5', 3328, 3'-TTGCACT-5', 3343, 3'-CTGTTCC-5', 3352, 3'-TTGCATC-5', 3402, 3'-TCACACT-5', 3507, 3'-CCAGACC-5', 3550, 3'-CTGTTCC-5', 3625, 3'-TCACACC-5', 3824, 3'-TCATTTT-5', 4120, 3'-TCACTCT-5', 4128, 3'-TTGATTT-5', 4134, 3'-TTAGTTT-5', 4139.
 
Positive strand in the positive direction there are 75: 3'-CTGGACC-5', 40, 3'-CCGGTCC-5', 215, 3'-TTACACT-5', 230, 3'-CCGGACC-5', 286, 3'-CCGTTCC-5', 503, 3'-TCGGTCC-5', 515, 3'-CCGCTCT-5', 557, 3'-CCGTTCC-5', 587, 3'-CCGCTCT-5', 641, 3'-CCGTTCC-5', 671, 3'-CCGGACT-5', 725, 3'-CCGTTCC-5', 823, 3'-TCGGTCT-5', 835, 3'-TTGGACC-5', 847, 3'-CCGTTCC-5', 923, 3'-TCGGTCT-5', 935, 3'-TTGGACC-5', 947, 3'-CCGTTCC-5', 1007, 3'-TCGCTCT-5', 1061, 3'-CCGGTCC-5', 1175, 3'-CCGCTCT-5', 1229, 3'-CCGTTCC-5', 1259, 3'-CCGTTCC-5', 1327, 3'-CCGCTCT-5', 1381, 3'-CCGTTCC-5', 1427, 3'-CCGCTCT-5', 1481, 3'-TCGTTCC-5', 1511, 3'-CCGCTCT-5', 1565, 3'-CCGCACT-5', 1720, 3'-CCACACC-5', 1805, 3'-CCGCTCT-5', 1921, 3'-CCGTTCT-5', 1948, 3'-CCACACC-5', 1971, 3'-TCAATTT-5', 2136, 3'-TTGTACT-5', 2141, 3'-CTACTTT-5', 2146, 3'-CCGTTCT-5', 2190, 3'-CCAGTCT-5', 2222, 3'-TTGGTCT-5', 2228, 3'-CCGCACT-5', 2555, 3'-CCGGTCC-5', 2574, 3'-TCAGTCT-5', 2609, 3'-TCAGTTC-5', 2615, 3'-TCAGTCC-5', 2620, 3'-CTATATT-5', 2662, 3'-TCAATCC-5', 2668, 3'-TCGTTTT-5', 2707, 3'-TCGATTC-5', 2789, 3'-TTGCTCC-5', 2806, 3'-CTAAACT-5', 2871, 3'-CTGGTCC-5', 2876, 3'-CCAGACT-5', 2943, 3'-CCGGACC-5', 2988, 3'-CCAGACC-5', 3021, 3'-TTATACC-5', 3162, 3'-CTGGTTT-5', 3175, 3'-TCGGTCT-5', 3221, 3'-CTACTCC-5', 3478, 3'-CCGATCC-5', 3484, 3'-TCGATCC-5', 3522, 3'-CTGGTCT-5', 3548, 3'-TCACACT-5', 3594, 3'-CCACTCC-5', 3647, 3'-CCGGACC-5', 3679, 3'-CCGGACC-5', 3758, 3'-CTGGACC-5', 3787, 3'-TCACTCC-5', 3878, 3'-TCAGACT-5', 3924, 3'-TCACACC-5', 3966, 3'-CCACACT-5', 3971, 3'-TTACTCC-5', 4096, 3'-CTACTCC-5', 4102, 3'-CTAAATC-5', 4136, 3'-CCACTCC-5'.
 
Inverse complement, negative strand, positive direction there are 61: 3'-AGAGTGG-5', 53, 3'-AATGTGA-5', 230, 3'-GGAGCGA-5', 429, 3'-AGACCGG-5', 442, 3'-GGTGCGG-5', 489, 3'-AGTGCGG-5', 498, 3'-AGTGCGG-5', 582, 3'-AGTGCGG-5', 666, 3'-GGTGCAG-5', 784, 3'-AGTGCGG-5', 1086, 3'-AGTGCGG-5', 1170, 3'-AGTGCGG-5', 1254, 3'-AATGCGG-5', 1322, 3'-AATGCGG-5', 1422, 3'-AGTGCGG-5', 1590, 3'-GAAGCGG-5', 1636, 3'-GGTGCGG-5', 1764, 3'-AGTGCAG-5', 1787, 3'-GGTGTGG-5', 1805, 3'-GAACTGG-5', 1953, 3'-GGTGTGG-5', 1971, 3'-AAAGCAG-5', 2007, 3'-AGTGCAG-5', 2064, 3'-GAACCAG-5', 2227, 3'-AGATCAA-5', 2232, 3'-AGTGCAG-5', 2327, 3'-GGTGCAA-5', 2335, 3'-GAAATAG-5', 2626, 3'-GATATAA-5', 2662, 3'-GGACTGA-5', 2674, 3'-AGAGCAA-5', 2705, 3'-AAAGTGG-5', 2711, 3'-GGTGCAA-5', 2801, 3'-AGAATGA-5', 2841, 3'-GATTTGA-5', 2871, 3'-GGTCTGA-5', 2943, 3'-GGTCTGG-5', 3021, 3'-AATATGG-5', 3162, 3'-GAAATGG-5', 3168, 3'-GGACCAA-5', 3174, 3'-GGAATGA-5', 3441, 3'-GATGCAG-5', 3460, 3'-AGTGCAG-5', 3465, 3'-GGACCAG-5', 3547, 3'-GGAATGA-5', 3567, 3'-AGTGTGA-5', 3594, 3'-GAAGCGG-5', 3670, 3'-AATCCGA-5', 3799, 3'-AGAATGA-5', 3835, 3'-GAACCAG-5', 3840, 3'-AGAGTGA-5', 3876, 3'-AGTCTGA-5', 3924, 3'-AGTGTGG-5', 3966, 3'-GGTGTGA-5', 3971, 3'-AGAGTGG-5', 4040, 3'-AGAACAG-5', 4069, 3'-GAAATGA-5', 4094, 3'-GATTTAG-5', 4136.
 
Inverse complement, positive strand, negative direction there are 100: 3'-AGACTGA-5', 17, 3'-GGACCAG-5', 34, 3'-AAAACAA-5', 69, 3'-GATATGG-5', 77, 3'-AAACTGA-5', 130, 3'-AAAACAG-5', 167, 3'-GGTATAA-5', 181, 3'-GAAACAA-5', 229, 3'-GATGTAA-5', 247, 3'-AGTTCAA-5', 255, 3'-AAACCAG-5', 261, 3'-AATATGA-5', 274, 3'-AGAACAG-5', 288, 3'-AAACTGA-5', 307, 3'-GGTGCGG-5', 380, 3'-AGTGCGA-5', 448, 3'-AATACGA-5', 492, 3'-AAATTAG-5', 499, 3'-AGATTGA-5', 585, 3'-AATATGG-5', 605, 3'-AATACAA-5', 635, 3'-AAATTGG-5', 643, 3'-AGTTCGA-5', 721, 3'-AGACCAG-5', 727, 3'-AATACAA-5', 769, 3'-AAATTAG-5', 777, 3'-GATGTGG-5', 787, 3'-AGAGCGA-5', 911, 3'-GATCCAG-5', 975, 3'-AGATTGG-5', 1045, 3'-AGAGTGA-5', 1077, 3'-AAATTAG-5', 1234, 3'-AGTCTGG-5', 1356, 3'-AGAGCAA-5', 1369, 3'-AAAACAA-5', 1388, 3'-AGTGCAG-5', 1471, 3'-GGTGTGA-5', 1479, 3'-AGTGCAA-5', 1536, 3'-AGAACGA-5', 1553, 3'-AATACAG-5', 1566, 3'-GAAACAA-5', 1585, 3'-GAAATGA-5', 1663, 3'-AAAGCGG-5', 1680, 3'-GAATTAA-5', 1696, 3'-AATATGG-5', 1742, 3'-AATACAA-5', 1878, 3'-AAATTAG-5', 1887, 3'-AGACTGA-5', 1935, 3'-AGAATGG-5', 1948, 3'-AGAGCAA-5', 2021, 3'-AATGTGG-5', 2065, 3'-GGTGCAG-5', 2082, 3'-AGTGTAA-5', 2087, 3'-AGTTTGA-5', 2141, 3'-AGACCAA-5', 2147, 3'-GATACAA-5', 2180, 3'-AAAATGA-5', 2187, 3'-GGTGCGG-5', 2197, 3'-AGTTTGA-5', 2257, 3'-AGACCAG-5', 2263, 3'-AATACAA-5', 2305, 3'-AAACTAG-5', 2313, 3'-AGAGTGA-5', 2447, 3'-GATTCGG-5', 2454, 3'-AAAGCAA-5', 2474, 3'-AAAGCAA-5', 2480, 3'-AAAACAA-5', 2509, 3'-AGACCAG-5', 2600, 3'-AGTGTGG-5', 2605, 3'-AAATCAG-5', 2649, 3'-AGTGTGG-5', 2658, 3'-AAAACAA-5', 2842, 3'-AGAATGG-5', 3004, 3'-AAAATAA-5', 3013, 3'-AAACTAA-5', 3030, 3'-AGACCAG-5', 3123, 3'-AAATTAG-5', 3176, 3'-GGTGTGG-5', 3186, 3'-AGAGCAA-5', 3311, 3'-AAAACAA-5', 3330, 3'-AAATTGA-5', 3358, 3'-GAAGTGA-5', 3410, 3'-GAACTAG-5', 3462, 3'-AAACCAG-5', 3485, 3'-AATCCAG-5', 3681, 3'-GGAACAG-5', 3725, 3'-GGACTGG-5', 3749, 3'-AATGCAG-5', 3772, 3'-GATGTGG-5', 3810, 3'-GGACCAG-5', 3870, 3'-GGAGTAA-5', 3891, 3'-AGTTCAA-5', 4026, 3'-AGACCAG-5', 4032, 
3'-AAAATAA-5', 4071, 3'-AATGTGA-5', 4092, 3'-AGTTCAA-5', 4177.
 
Inverse complement, positive strand, positive direction there are 75: 3'-GGTCCGA-5', 10, 3'-AGTCCGG-5', 92, 3'-AATCCAG-5', 152, 3'-GGTCCAG-5', 217, 3'-GGTGTGA-5', 345, 3'-GAAGCGG-5', 459, 3'-AGAATGA-5', 524, 3'-GAAGCGG-5', 595, 3'-GATGCGA-5', 652, 3'-GGTGCGA-5', 777, 3'-GGACCGG-5', 849, 3'-GGACCGG-5', 949, 3'-GGTCCGA-5', 1177, 3'-AAAGCAG-5', 1183, 3'-GAAGCGG-5', 1308, 3'-GAAGCGG-5', 1408, 3'-AATTCGG-5', 1541, 3'-GATGCGA-5', 1576, 3'-GGACTGG-5', 1662, 3'-GGTCTGA-5', 1744, 3'-GGACCGA-5', 1817, 3'-GGTCCGG-5', 1857, 3'-AGAATGG-5', 1888, 3'-GAAGTAG-5', 2110, 3'-AGTATAA-5', 2178, 3'-GGACTGG-5', 2213, 3'-GGTCTAG-5', 2230, 3'-AGAGTGG-5', 2247, 3'-AAAGTGA-5', 2304, 3'-GGTCCGA-5', 2318, 3'-AATCCGA-5', 2368, 3'-GATGTGG-5', 2430, 3'-GGACCGA-5', 2435, 3'-AGAGTGG-5', 2470, 3'-GGTACAA-5', 2475, 3'-GGACCGG-5', 2571, 3'-AATATGG-5', 2590, 3'-GGTGTGG-5', 2602, 3'-AGTTCAG-5', 2617, 3'-GGTGTGA-5', 2636, 3'-AGTCTAA-5', 2868, 3'-AAACTGG-5', 2873, 3'-GGTCCGG-5', 2878, 3'-AGACCGA-5', 2885, 3'-GGAGTAA-5', 2902, 3'-AGACTGA-5', 2945, 3'-AGACCGG-5', 2985, 3'-GGACCGG-5', 2990, 3'-GGAACAG-5', 3003, 3'-GGTCCAG-5', 3018, 3'-AGACCAA-5', 3023, 3'-AGTCCGG-5', 3036, 3'-GGACCAA-5', 3049, 3'-GAAGTAG-5', 3250, 3'-AGTGCAG-5', 3255, 3'-GGACCAG-5', 3298, 3'-AGAGTGA-5', 3317, 3'-GGTACAA-5', 3337, 3'-GGAACGG-5', 3375, 3'-AGTGTGA-5', 3507, 3'-GATCCGA-5', 3524, 3'-GGTCTGG-5', 3550, 3'-AGAGTGG-5', 3612, 3'-GGACCGG-5', 3681, 3'-AGTGTGG-5', 3824, 3'-GAACTGG-5', 4018, 3'-AAAATAG-5', 4123, 3'-GAACTAA-5', 4133, 3'-AAATCAA-5', 4138.
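
The initiator consensus sequences YYANWYY and BBCABW are written in IUPAC degenerate nucleotide codes (Y = C or T, W = A or T, B = C, G or T, N = any base). The following minimal sketch checks a candidate site against such a consensus. The lookup table uses the standard IUPAC codes, while the function name is illustrative only.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_consensus(site, consensus):
    """True if every base of the site is allowed by the consensus code."""
    if len(site) != len(consensus):
        return False
    return all(base in IUPAC[code] for base, code in zip(site, consensus))


if __name__ == "__main__":
    for site in ("TTACTCC", "CCACTCC"):
        print(site, matches_consensus(site, "YYANWYY"))
    for site in ("TCCACT", "CCCAGA", "TTCACA"):
        print(site, matches_consensus(site, "BBCABW"))
```

A site such as ACACTCC fails the YYANWYY check because its first base is a purine.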
 
==Initiator elements (BBCABW)==
{{main|Initiator element gene transcriptions}}
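
The counts below come from matching the degenerate consensus BBCABW against each strand. As a minimal sketch of that kind of search, assuming the standard IUPAC nucleotide codes and a hypothetical toy sequence (not the A1BG promoter itself), the consensus can be expanded into a regular expression. The function names are illustrative, and the positions reported here are 1-based starts within the toy sequence, which may not match the numbering convention used on this page.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus):
    """Expand an IUPAC consensus such as BBCABW into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def find_motif(consensus, sequence):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    # A lookahead lets overlapping occurrences be reported as separate hits.
    lookahead = re.compile("(?=(" + iupac_to_regex(consensus).pattern + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence.upper())]


# Hypothetical toy sequence containing two Inr-like hexamers.
toy = "GGTCCACTAACCCAGATT"
hits = find_motif("BBCABW", toy)
```

The same helper accepts any other IUPAC consensus, so the other element searches on this page could in principle be sketched the same way.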
 
===Core promoters===
 
There are five Inrs, positive strand, negative direction: 3'-TCCACT-5', 4423, 3'-CCCAGA-5', 4448, 3'-TCCACT-5', 4459, 3'-CCCACT-5', 4485, 3'-TTCACA-5', 4531.
 
There are five Inrs, negative strand, positive direction: 3'-GTCAGT-5', 4271, 3'-CTCATT-5', 4309, 3'-TGCAGA-5', 4317, 3'-CCCAGA-5', 4330, 3'-CTCACT-5', 4338.
 
There are four Inrs, positive strand, positive direction: 3'-TCCAGT-5', 4269, 3'-CTCACT-5', 4350, 3'-CCCACT-5', 4399, 3'-CCCAGA-5', 4414.
 
===Proximal promoters===
 
There are five Inrs on the negative strand in the negative direction: 3'-GTCACT-5', 4200, 3'-TCCAGT-5', 4307, 3'-GTCACT-5', 4319, 3'-CCCACT-5', 4353, 3'-GTCACA-5', 4359.
 
There are nine Inrs on the positive strand in the negative direction: 3'-GCCAGA-5', 4233, 3'-TGCAGT-5', 4317, 3'-TGCACT-5', 4340, 3'-GCCAGT-5', 4415, 3'-TCCACT-5', 4423, 3'-CCCAGA-5', 4448, 3'-TCCACT-5', 4459, 3'-CCCACT-5', 4485, 3'-TTCACA-5', 4531.
 
There are six Inrs on the negative strand in the positive direction: 3'-CTCAGA-5', 4195, 3'-GTCAGT-5', 4271, 3'-CTCATT-5', 4309, 3'-TGCAGA-5', 4317, 3'-CCCAGA-5', 4330, 3'-CTCACT-5', 4338.
 
There are four Inrs on the positive strand in the positive direction: 3'-TCCAGT-5', 4269, 3'-CTCACT-5', 4350, 3'-CCCACT-5', 4399, 3'-CCCAGA-5', 4414.
 
===Distal promoters===
 
Negative strand in the negative direction there are 44: 3'-TCCATA-5', 179, 3'-CCCAGT-5', 206, 3'-CTCAGA-5', 278, 3'-GTCACT-5', 299, 3'-TTCACA-5', 322, 3'-TCCAGT-5', 439, 3'-TGCATT-5', 533, 3'-TCCAGT-5', 568, 3'-TCCAGT-5', 576, 3'-TCCAGT-5', 712, 3'-GGCAGA-5', 754, 3'-GCCACT-5', 868, 3'-GTCACT-5', 1034, 3'-CCCACT-5', 1049, 3'-CTCACT-5', 1077, 3'-GGCACA-5', 1220, 3'-GTCACT-5', 1325, 3'-GTCAGA-5', 1354, 3'-CTCAGA-5', 1444, 3'-GGCAGT-5', 1511, 3'-TGCAGA-5', 1774, 3'-GTCACT-5', 1978, 3'-GTCACA-5', 2085, 3'-TCCAGT-5', 2248, 3'-GTCACT-5', 2404, 3'-CTCACT-5', 2447, 3'-TCCAGT-5', 2585, 3'-GTCACA-5', 2603, 3'-GTCACA-5', 2656, 3'-GTCACT-5', 2739, 3'-TTCACA-5', 2860, 3'-TCCACT-5', 3144, 3'-CCCACA-5', 3184, 3'-TTCACT-5', 3410, 3'-GTCATT-5', 3480, 3'-TCCACT-5', 3825, 3'-CTCATA-5', 3829, 3'-CTCATT-5', 3891, 3'-TTCACA-5', 3939.
 
Positive strand in the negative direction there are 59: 3'-GCCATA-5', 39, 3'-TGCATT-5', 152, 3'-GTCACT-5', 208, 3'-GGCACA-5', 266, 3'-GGCACA-5', 518, 3'-GGCACA-5', 960, 3'-GGCAGA-5', 1023, 3'-TGCAGT-5', 1032, 3'-TTCACT-5', 1056, 3'-GGCACA-5', 1116, 3'-CTCACA-5', 1126, 3'-GGCAGA-5', 1314, 3'-TGCAGT-5', 1323, 3'-TGCACT-5', 1347, 3'-TCCAGT-5', 1352, 3'-TCCATT-5', 1378, 3'-CCCAGA-5', 1411, 3'-TGCAGT-5', 1472, 3'-CTCACT-5', 1491, 3'-CCCAGA-5', 1518, 3'-TCCAGT-5', 1532, 3'-TGCACA-5', 1719, 3'-GGCAGA-5', 1967, 3'-TGCAGT-5', 1976, 3'-GCCACT-5', 1995, 3'-TGCACT-5', 2000, 3'-TGCAGT-5', 2083, 3'-GCCAGT-5', 2211, 3'-TGCAGT-5', 2402, 3'-TGCACT-5', 2426, 3'-TCCACT-5', 2632, 3'-GCCAGT-5', 2654, 3'-GGCACA-5', 2665, 3'-TGCAGT-5', 2737, 3'-GCCACT-5', 2756, 3'-GCCATT-5', 3284, 3'-TGCACT-5', 3289, 3'-TGCAGA-5', 3431, 3'-GGCATA-5', 3445, 3'-GGCATA-5', 3451, 3'-GGCAGT-5', 3478, 3'-GGCAGA-5', 3589, 3'-GGCAGT-5', 3600, 3'-GTCAGA-5', 3625, 3'-GGCACA-5', 3632, 3'-CTCAGA-5', 3644, 3'-GCCATT-5', 3686, 3'-TCCACA-5', 3692, 3'-CCCATA-5', 3856, 3'-CTCACA-5', 3965.
 
Inverse complement, negative strand, negative direction there are 46: 3'-TCTGAC-5', 16, 3'-TGTGGA-5', 62, 3'-TGTGCA-5', 342, 3'-TGTGCA-5', 531, 3'-AGTGCG-5', 663, 3'-TGTGGG-5', 749, 3'-TCTGAG-5', 916, 3'-TGTGCG-5', 963, 3'-ACTGAA-5', 1052, 3'-AGTGAG-5', 1057, 3'-TCTGAG-5', 1082, 3'-TGTGGA-5', 1129, 3'-AGTGGA-5', 1171, 3'-AATGAA-5', 1298, 3'-TCTGAG-5', 1403, 3'-AGTGAC-5', 1492, 3'-TGTGAA-5', 1544, 3'-TCTGAA-5', 1617, 3'-AGTGCA-5', 1772, 3'-TCTGAC-5', 1934, 3'-AGTGCG-5', 1991, 3'-TCTGAG-5', 2026, 3'-TATGAC-5', 2162, 3'-ACTGGC-5', 2190, 3'-AGTGCG-5', 2207, 3'-TGTGAA-5', 2551, 3'-AGTGAA-5', 2578, 3'-ACTGAG-5', 2787, 3'-TATGGA-5', 2994, 3'-AGTGGG-5', 3057, 3'-AGTGAA-5', 3101, 3'-AGTGAA-5', 3240, 3'-AGTGCG-5', 3280, 3'-TCTGAC-5', 3425, 3'-TATGAC-5', 3541, 3'-TATGCG-5', 3547, 3'-TATGGA-5', 3859, 3'-TGTGGA-5', 3968, 3'-TGTGAA-5', 3983.
 
Inverse complement, positive strand, negative direction there are 54: 3'-ACTGAA-5', 18, 3'-TATGGG-5', 78, 3'-ACTGAA-5', 131, 3'-TATGAG-5', 275, 3'-AGTGAG-5', 300, 3'-ACTGAC-5', 308, 3'-AGTGCG-5', 447, 3'-AGTGAA-5', 472, 3'-AGTGGA-5', 523, 3'-AGTGAG-5', 1035, 3'-AGTGAG-5', 1078, 3'-AGTGGC-5', 1121, 3'-AGTGAG-5', 1326, 3'-TCTGGG-5', 1357, 3'-AGTGCA-5', 1470, 3'-ACTGCA-5', 1494, 3'-AGTGCA-5', 1535, 3'-AATGAA-5', 1581, 3'-AATGCC-5', 1634, 3'-TATGGC-5', 1743, 3'-ACTGAG-5', 1936, 3'-AATGGC-5', 1949, 3'-AGTGAG-5', 1979, 3'-ACTGCA-5', 1998, 3'-TGTGGC-5', 2066, 3'-AATGAC-5', 2188, 3'-AGTGAG-5', 2405, 3'-ACTGCA-5', 2424, 3'-AGTGAG-5', 2448, 3'-TGTGGC-5', 2606, 3'-AGTGAG-5', 2740, 3'-ACTGCA-5', 2759, 3'-TGTGCA-5', 2863, 3'-AATGGC-5', 3005, 3'-TGTGAG-5', 3268, 3'-AGTGAC-5', 3411, 3'-TGTGCA-5', 3429, 3'-TGTGCC-5', 3561, 3'-AATGGG-5', 3660, 3'-TGTGGG-5', 3712, 3'-ACTGGG-5', 3750, 3'-AATGCA-5', 3771, 3'-TCTGGA-5', 3836, 3'-ACTGCC-5', 3852, 3'-TGTGGC-5', 3960, 3'-AGTGAG-5', 4050, 3'-TGTGAG-5', 4093.
 
Negative strand in the positive direction there are 87: 3'-TCCAGA-5', 15, 3'-GGCATT-5', 22, 3'-GTCACA-5', 155, 3'-CCCAGA-5', 204, 3'-GCCACA-5', 343, 3'-CGCAGA-5', 396, 3'-TGCAGA-5', 438, 3'-CCCAGA-5', 468, 3'-TGCACA-5', 548, 3'-TCCACA-5', 632, 3'-CGCACT-5', 686, 3'-CGCACA-5', 800, 3'-GCCAGA-5', 835, 3'-GCCACA-5', 884, 3'-GCCAGA-5', 935, 3'-GCCACA-5', 984, 3'-CGCACA-5', 1052, 3'-CGCACA-5', 1136, 3'-TGCACA-5', 1220, 3'-CCCAGT-5', 1250, 3'-CGCAGA-5', 1316, 3'-TGCACT-5', 1372, 3'-CGCAGA-5', 1416, 3'-TGCACT-5', 1472, 3'-CCCACT-5', 1502, 3'-CGCACA-5', 1556, 3'-GGCATT-5', 1702, 3'-CCCAGA-5', 1742, 3'-TGCACA-5', 1822, 3'-TCCACT-5', 1912, 3'-TGCAGA-5', 1937, 3'-GGCACT-5', 1996, 3'-CCCAGT-5', 2024, 3'-TCCACA-5', 2029, 3'-CTCAGT-5', 2060, 3'-TGCAGT-5', 2065, 3'-GCCACT-5', 2072, 3'-TTCAGT-5', 2098, 3'-CTCATA-5', 2176, 3'-TGCATT-5', 2206, 3'-GTCAGA-5', 2222, 3'-CTCAGA-5', 2239, 3'-TTCACT-5', 2304, 3'-TGCAGT-5', 2328, 3'-GTCACT-5', 2425, 3'-GTCAGA-5', 2609, 3'-CTCAGA-5', 2699, 3'-TGCAGA-5', 2721, 3'-CTCAGA-5', 2729, 3'-TGCAGA-5', 2859, 3'-CTCAGA-5', 2866, 3'-CTCATT-5', 2902, 3'-GTCACT-5', 2929, 3'-TTCAGT-5', 2936, 3'-TGCACA-5', 2962, 3'-TGCATT-5', 3072, 3'-CCCAGT-5', 3082, 3'-CCCAGA-5', 3091, 3'-TCCACA-5', 3192, 3'-CTCACA-5', 3209, 3'-GCCAGA-5', 3221, 3'-TGCAGT-5', 3232, 3'-TGCAGT-5', 3281, 3'-CTCACT-5', 3317, 3'-TGCACT-5', 3343, 3'-CCCAGT-5', 3379, 3'-CCCACT-5', 3388, 3'-GGCACA-5', 3409, 3'-TGCAGT-5', 3461, 3'-GGCAGA-5', 3473, 3'-CTCACA-5', 3505, 3'-GCCACA-5', 3705, 3'-TCCAGA-5', 3806, 3'-GTCACA-5', 3822, 3'-TGCAGA-5', 3831, 3'-TCCAGA-5', 3891, 3'-CGCAGA-5', 3916, 3'-GTCACA-5', 3954, 3'-TGCAGT-5', 3962, 3'-GGCACT-5', 4006, 3'-TCCACT-5', 4013.
 
Positive strand in the positive direction there are 40: 3'-TCCAGT-5', 153, 3'-CGCACA-5', 1020, 3'-CCCAGA-5', 1711, 3'-CGCACT-5', 1720, 3'-CCCACA-5', 1803, 3'-CCCAGA-5', 1958, 3'-TCCACA-5', 1969, 3'-GTCAGT-5', 2100, 3'-TCCACT-5', 2128, 3'-TCCAGT-5', 2220, 3'-TCCAGA-5', 2258, 3'-TCCACT-5', 2375, 3'-CGCAGT-5', 2423, 3'-GTCACA-5', 2464, 3'-CCCAGA-5', 2489, 3'-TTCACT-5', 2511, 3'-CGCACT-5', 2555, 3'-GTCAGT-5', 2607, 3'-CTCAGT-5', 2613, 3'-TTCAGT-5', 2618, 3'-TCCATA-5', 2642, 3'-TCCAGA-5', 3019, 3'-CTCAGA-5', 3187, 3'-TGCAGA-5', 3256, 3'-CTCACA-5', 3592, 3'-GCCAGA-5', 3608, 3'-CTCACT-5', 3712, 3'-TCCATT-5', 3731, 3'-TCCAGA-5', 3771, 3'-CCCAGT-5', 3820, 3'-GTCACT-5', 3843, 3'-CTCACT-5', 3876, 3'-TTCAGA-5', 3922, 3'-TCCACT-5', 3934, 3'-GTCACA-5', 3964, 3'-CGCAGA-5', 4056.
 
Inverse complement, negative strand, positive direction there are 94: 3'-AGTGGG-5', 54, 3'-TCTGCA-5', 224, 3'-TGTGAA-5', 231, 3'-ACTGCC-5', 238, 3'-TCTGAG-5', 256, 3'-TCTGGA-5', 271, 3'-ACTGGG-5', 348, 3'-AGTGCG-5', 497, 3'-AGTGCG-5', 581, 3'-AGTGCG-5', 665, 3'-ACTGCG-5', 749, 3'-TGTGGC-5', 819, 3'-ACTGCC-5', 901, 3'-TGTGGC-5', 919, 3'-ACTGCG-5', 1001, 3'-TGTGGC-5', 1023, 3'-AGTGCG-5', 1085, 3'-AGTGCG-5', 1160, 3'-AGTGCG-5', 1169, 3'-AGTGCG-5', 1253, 3'-ACTGAG-5', 1287, 3'-AATGCG-5', 1321, 3'-TCTGGC-5', 1377, 3'-TCTGCG-5', 1396, 3'-AATGCG-5', 1421, 3'-TCTGGC-5', 1477, 3'-TCTGCG-5', 1496, 3'-ACTGCA-5', 1505, 3'-AGTGCG-5', 1589, 3'-AGTGCG-5', 1725, 3'-AGTGCA-5', 1786, 3'-TGTGGA-5', 1806, 3'-TCTGGG-5', 1865, 3'-ACTGGG-5', 1954, 3'-TGTGGC-5', 1972, 3'-TCTGGC-5', 1993, 3'-AGTGCA-5', 2063, 3'-AGTGGC-5', 2068, 3'-TATGGC-5', 2160, 3'-ACTGCA-5', 2204, 3'-AGTGCA-5', 2326, 3'-TGTGCA-5', 2681, 3'-AGTGGA-5', 2712, 3'-ACTGCC-5', 2823, 3'-AATGAC-5', 2842, 3'-TCTGCA-5', 2857, 3'-TCTGGC-5', 2884, 3'-AATGGG-5', 2911, 3'-TCTGAC-5', 2944, 3'-TCTGAG-5', 2951, 3'-TGTGCA-5', 2960, 3'-TCTGGC-5', 2984, 3'-TCTGAG-5', 3007, 3'-AGTGCC-5', 3011, 3'-TATGAC-5', 3028, 3'-TCTGCA-5', 3061, 3'-AATGCA-5', 3070, 3'-ACTGGC-5', 3118, 3'-TCTGAG-5', 3124, 3'-TATGGA-5', 3163, 3'-AATGGG-5', 3169, 3'-AGTGCC-5', 3235, 3'-TATGAG-5', 3261, 3'-TCTGCA-5', 3268, 3'-TCTGCA-5', 3279, 3'-ACTGCA-5', 3320, 3'-ACTGGC-5', 3346, 3'-TCTGCC-5', 3359, 3'-TCTGGC-5', 3406, 3'-AATGCC-5', 3431, 3'-TGTGGA-5', 3437, 3'-AATGAA-5', 3442, 3'-AATGAG-5', 3446, 3'-AGTGGG-5', 3450, 3'-AGTGCA-5', 3464, 3'-AATGAC-5', 3568, 3'-TGTGAA-5', 3595, 3'-AGTGAC-5', 3713, 3'-ACTGAG-5', 3736, 3'-AATGAC-5', 3783, 3'-AATGAA-5', 3836, 3'-AGTGAG-5', 3877, 3'-TGTGAG-5', 3904, 3'-TCTGAA-5', 3925, 3'-TGTGCA-5', 3960, 3'-TGTGAC-5', 3972, 3'-AGTGGG-5', 4041, 3'-ACTGAA-5', 4090, 3'-AATGAG-5', 4095.
 
Inverse complement, positive strand, positive direction there are 47: 3'-TCTGAC-5', 236, 3'-TGTGAC-5', 346, 3'-TCTGCC-5', 399, 3'-TCTGGC-5', 441, 3'-AATGAA-5', 525, 3'-TGTGCA-5', 569, 3'-TGTGCG-5', 803, 3'-TGTGCG-5', 887, 3'-TGTGCG-5', 987, 3'-TGTGAC-5', 1139, 3'-TGTGCC-5', 1223, 3'-TGTGCC-5', 1559, 3'-ACTGGG-5', 1663, 3'-TGTGCC-5', 1698, 3'-TCTGAA-5', 1745, 3'-AATGGG-5', 1889, 3'-ACTGGC-5', 2214, 3'-AGTGGA-5', 2248, 3'-AGTGAG-5', 2305, 3'-AGTGGG-5', 2313, 3'-AGTGAC-5', 2341, 3'-TCTGAA-5', 2417, 3'-TGTGGA-5', 2431, 3'-TATGAA-5', 2740, 3'-TCTGGA-5', 2862, 3'-AGTGAC-5', 2930, 3'-ACTGAA-5', 2946, 3'-TGTGGG-5', 2965, 3'-ACTGAA-5', 3030, 3'-AGTGCA-5', 3254, 3'-AGTGAC-5', 3318, 3'-TGTGAG-5', 3508, 3'-TGTGGG-5', 3533, 3'-TCTGGA-5', 3551, 3'-AGTGGG-5', 3613, 3'-AGTGCC-5', 3748, 3'-ACTGGA-5', 3785, 3'-ACTGGA-5', 4019, 3'-AGTGAC-5', 4088, 3'-AGTGAG-5', 4127.
 
==L boxes==
{{main|L box gene transcriptions}}
 
The consensus sequence for the L1 box is TAAATGYA.<ref name=Ye/> Y is (C/T).
 
==M35 boxes==
{{main|M35 box gene transcriptions}}
On the negative strand in the negative direction (from ZSCAN22 to A1BG), the program SuccessablesM35--.bas, looking for 3'-TTGACA-5', found 2: 3'-TTGACA-5', 477, 3'-TTGACA-5', 4399.
 
==M boxes==
{{main|M box gene transcriptions}}
 
==Metal responsive elements==
{{main|Metal responsive element gene transcriptions}}
 
===Proximal promoters===
 
On the positive strand in the negative direction there is an MRE 3'-TGCACTC-5' at 4341.
 
===Distal promoters===
 
Positive strand in the negative direction there are 6: 3'-TGCGCTC-5', 891, 3'-TGCACTC-5', 1348, 3'-TGCACTC-5', 2001, 3'-TGCACTC-5', 2427, 3'-TGCACCC-5', 2762, 3'-TGCACTC-5', 3290.
 
Inverse complement, negative strand, negative direction there are 2: 3'-GTGTGCA-5', 531, 3'-GAGTGCA-5', 1772.
 
Inverse complement, positive strand, negative direction there are 2: 3'-GAGTGCA-5', 1470, 3'-GTGTGCA-5', 2863.
 
Negative strand in the positive direction there are 11: 3'-TGCGCCC-5', 453, 3'-TGCACAC-5', 549, 3'-TGCACAC-5', 1221, 3'-TGCGCCC-5', 1247, 3'-TGCACTC-5', 1373, 3'-TGCGCCC-5', 1399, 3'-TGCACTC-5', 1473, 3'-TGCGCCC-5', 1499, 3'-TGCGCCC-5', 1657, 3'-TGCACAC-5', 2963, 3'-TGCACCC-5', 3323.
 
Positive strand in the positive direction there are 2: 3'-TGCGCCC-5', 872, 3'-TGCGCCC-5', 972.
 
Inverse complement, negative strand, positive direction there are 10: 3'-GCGTGCA-5', 546, 3'-GCGCGCA-5', 684, 3'-GGGCGCA-5', 876, 3'-GGGCGCA-5', 976, 3'-GCGTGCA-5', 1218, 3'-GTGCGCA-5', 1523, 3'-GAGTGCA-5', 1786, 3'-GAGTGCA-5', 2326, 3'-GGGTGCA-5', 2800, 3'-GGGTGCA-5', 3883.
 
==Motif ten elements==
{{main|Motif ten element gene transcriptions}}
There are no MTEs in either promoter.
 
==MYB recognition elements==
{{main|MYB recognition element gene transcriptions}}
 
==P boxes==
{{main|P box gene transcriptions}}
 
==Pollen1 elements==
 
"Electrophoretic mobility shift assays identified a pollen-specific ''cis''-acting element POLLEN1 (AGAAA) mapped at ''AtACBP4'' (−157/−153) which interacted with nuclear proteins from flower and this was substantiated by DNase I footprinting."<ref name=Ye/>
 
==Pribnow boxes==
{{main|Pribnow box gene transcriptions}}
# negative strand in the negative direction, looking for 3'-TATAAT-5', 2, 3'-TATAAT-5', 3454, 3'-TATAAT-5', 3468,
# negative strand in the positive direction, looking for 3'-TATAAT-5', 1, 3'-TATAAT-5', 729,
# positive strand in the negative direction, looking for 3'-TATAAT-5', 0,
# positive strand in the positive direction, looking for 3'-TATAAT-5', 0,
# complement, negative strand, negative direction, looking for 3'-ATATTA-5', 0,
# complement, negative strand, positive direction, looking for 3'-ATATTA-5', 0,
# complement, positive strand, negative direction, looking for 3'-ATATTA-5', 2, 3'-ATATTA-5', 3454, 3'-ATATTA-5', 3468,
# complement, positive strand, positive direction, looking for 3'-ATATTA-5', 1, 3'-ATATTA-5', 729,
# inverse complement, negative strand, negative direction, looking for 3'-ATTATA-5', 2, 3'-ATTATA-5', 272, 3'-ATTATA-5', 603,
# inverse complement, negative strand, positive direction, looking for 3'-ATTATA-5', 1, 3'-ATTATA-5', 727,
# inverse complement, positive strand, negative direction, looking for 3'-ATTATA-5', 0,
# inverse complement, positive strand, positive direction, looking for 3'-ATTATA-5', 0,
# inverse, negative strand, negative direction, looking for 3'-TAATAT-5', 0,
# inverse, negative strand, positive direction, looking for 3'-TAATAT-5', 0,
# inverse, positive strand, negative direction, looking for 3'-TAATAT-5', 2, 3'-TAATAT-5', 272, 3'-TAATAT-5', 603,
# inverse, positive strand, positive direction, looking for 3'-TAATAT-5', 1, 3'-TAATAT-5', 727.
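
The sixteen searches above apply four forms of the same hexamer (direct, complement, inverse, and inverse complement) to each strand and direction. A minimal sketch of how those four forms relate, with an illustrative function name that is not taken from the programs used here:

```python
# Translation table for Watson-Crick complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def orientations(motif):
    """Return the four search forms of a motif as used in the list above."""
    motif = motif.upper()
    complement = motif.translate(COMPLEMENT)
    return {
        "direct": motif,
        "complement": complement,
        "inverse": motif[::-1],                  # same bases, read backwards
        "inverse complement": complement[::-1],  # complement, read backwards
    }


pribnow_forms = orientations("TATAAT")
```

Because a complement read on one strand is the direct sequence on the other, the matching counts and positions pair up across strands, as seen in the list.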
 
==Prolamin boxes==
{{main|Prolamin box gene transcriptions}}
# negative strand in the negative direction: 1, 3'-TGTAAAG-5', 2884,
# negative strand in the positive direction: 1, 3'-TGAAAAG-5', 489,
# positive strand in the negative direction: 1, 3'-TGAAAAG-5', 1627.
 
==Pyrimidine boxes==
{{main|Pyrimidine box gene transcriptions}}
Pyrimidine boxes and their complements occur in the negative direction: 3'-CCTTTT-5' at 2459, 3'-CCTTTT-5' at 2927, and 3'-CCTTTT-5' at 2968. Inverse pyrimidine boxes and their complements occur: 3'-AAAAGG-5' at 105, 3'-AAAAGG-5' at 1107, 3'-AAAAGG-5' at 3345, and 3'-AAAAGG-5' at 3441.

Pyrimidine boxes in the positive direction, 3'-CCTTTT-5' at 135 and 3'-CCTTTT-5' at 291, and their complements are close to ZNF497.
 
==Q elements==
 
"The basal regulatory elements identified include a putative TATA-box (−30/−24) for RNA polymerase binding and a CAAT box (−64/−61; [...]). Several putative floral expression-related cis-elements identified included a putative 6-nucleotide Q element (−770/−665), three GTGA boxes (−372/−369, −209/−206 and −164/−161) and four putative highly-conserved POLLEN1 boxes (−737/−733, −711/−707, −150/−146 and −36/−32; [...])."<ref name=Ye>{{ cite journal
|author=Zi-Wei Ye, Jie Xu, Jianxin Shi, Dabing Zhang and Mee-Len Chye
|title=Kelch-motif containing acyl-CoA binding proteins AtACBP4 and AtACBP5 are differentially expressed and function in floral lipid metabolism
|journal=Plant Molecular Biology
|date=January 2017
|volume=93
|issue=
|pages=209-225
|url=https://www.researchgate.net/profile/Jianxin_Shi6/publication/309799453_Kelch-motif_containing_acyl-CoA_binding_proteins_AtACBP4_and_AtACBP5_are_differentially_expressed_and_function_in_floral_lipid_metabolism/links/5d11201c458515c11cf5f6b1/Kelch-motif-containing-acyl-CoA-binding-proteins-AtACBP4-and-AtACBP5-are-differentially-expressed-and-function-in-floral-lipid-metabolism.pdf
|arxiv=
|bibcode=
|doi=10.1007/s11103-016-0557-5
|pmid=
|accessdate=7 May 2020 }}</ref>
 
The consensus sequence for a Q element is 3'-AGGTCA-5'.<ref name=Ye/>
 
==Retinoblastoma control elements==
{{main|TC element gene transcriptions}}
 
==R response elements==
{{main|MYB recognition element gene transcriptions}}
 
==STAT5s==
{{main|STAT gene transcriptions}}
 
===Proximal promoters===
 
Negative strand in the positive direction there is 1: 3'-TTCCGGGAA-5', 4247.
 
===Distal promoters===
 
Positive strand in the negative direction there are 2: 3'-TTCGTTGAA-5', 3506, 3'-TTCCCTGAA-5', 3782.
 
Positive strand in the positive direction there is 1: 3'-TTCCATGAA-5', 128.
 
==Synaptic Activity-Responsive Elements==
{{main|Synaptic Activity-Responsive Elements}}
 
==TACTAAC boxes==
{{main|TACTAAC box gene transcriptions}}
 
==Tapetum boxes==
 
The consensus sequence for the TAPETUM box is TCGTGT.<ref name=Ye/>
 
==TATA boxes==
{{main|TATA box gene transcriptions}}
Negative strand in the negative direction there are 2: 3'-TATATATA-5' at 1600 (or -2860 nts upstream from the TSS) and 3'-TATATAAA-5' at 1602 (or -2858 nts).
 
Positive strand in the negative direction there are 3: 3'-TATAAAAG-5' at 184 (or -4276 nts), 3'-TATAAAAG-5' at 223 (or -4237 nts), and 3'-TATATAAA-5' at 2874 (or -1586 nts).
 
Inverse complement, negative strand, negative direction there are 2: 3'-TATATATA-5', 1600, 3'-TTTATATA-5', 2871.
 
Inverse complement, positive strand, negative direction there is 1: 3'-TTTTTATA-5', 219.
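
The "nts upstream from the TSS" values quoted above are consistent with subtracting a fixed offset from each position. A minimal sketch, assuming the TSS sits at position 4460 in this numbering (a value inferred from the pairs quoted above, not stated explicitly in the text):

```python
# Assumed TSS position in the ZSCAN22-to-A1BG numbering; inferred from the
# quoted pairs (for example 1600 and -2860), not given directly in the text.
TSS_POSITION = 4460


def relative_to_tss(position, tss=TSS_POSITION):
    """Convert a position counted from ZSCAN22 into a signed offset from the TSS."""
    return position - tss


# Positions of the TATA boxes reported in the negative direction.
tata_positions = [1600, 1602, 184, 223, 2874]
tata_offsets = [relative_to_tss(p) for p in tata_positions]
```

Negative values indicate positions upstream of the assumed TSS.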
 
==TAT boxes==
{{main|TAT box gene transcriptions}}
Only an inverse and its complement occur between ZSCAN22 and A1BG: 3'-TACCTAT-5' at 2996 nts from ZSCAN22.
 
==TATCCAC boxes==
{{main|TATC box gene transcriptions}}
None occur.
 
==T boxes==
{{main|T box gene transcriptions}}
 
==TCCACCATA elements==
 
"Given that ''AtACBP4pro::GUS'' (−156/−67) could drive promoter activity for pollen expression, [electrophoretic mobility shift assays] EMSAs were carried out to investigate the role of the putative POLLEN1 ''cis''-element, AGAAA (−150/−146), and its adjacent co-dependent regulatory element TCCACCATA (–141/–133)."<ref name=Ye/>
 
"POLLEN1 and the TCCACCATA element are co-dependent regulatory elements responsible for pollen-specific activation of tomato ''LAT52'' (Bate and Twell 1998)."<ref name=Ye/>
 
==Telomeric repeat DNA-binding factors==
 
Copying the consensus sequence for the telomeric repeat DNA-binding factor (TRF), 3'-TTAGGG-5', and pasting it into a "⌘F" text search locates this sequence in the A1BG negative direction; the nucleotide positions can also be found by the computer programs.
 
In the nucleotides between ZSCAN22 and A1BG there is at least one 3'-TTAGGG-5' beginning about 680 nucleotides from ZSCAN22 or ending at about 686 nts.
 
''Homo sapiens'' genes containing these are found using Homo sapiens "TRF (TTAGGG repeat-binding factor)".
 
==Tetradecanoylphorbol-13-acetate response elements==
{{main|Tetradecanoylphorbol-13-acetate response element gene transcriptions}}
 
==TGFβ control elements==
{{main|TC element gene transcriptions}}
 
==TGF-β inhibitory elements==
{{main|TC element gene transcriptions}}
 
==Upstream response elements==
{{main|Upstream response element gene transcriptions}}
 
==V boxes==
{{main|V box gene transcriptions}}
 
==W boxes==
{{main|W box gene transcriptions}}
 
===Proximal promoters===
 
Inverse W boxes occur in the negative strand, negative direction of A1BG: 3'-GGTCAA-5' at 4416 and 3'-GGTCAA-5' at 4308.
 
A W box, 3'-CTGACC-5', and its complement occur at 4216 on the positive strand in the positive direction of A1BG, and an inverse W box, 3'-GGTCAG-5', and its complement occur at 4270.
 
===Distal promoters===
 
A W box, 3'-CTGACC-5', occurs at 3749, whereas 3'-CTGACT-5' at 17, 3'-TTGACT-5' at 130, 3'-TTGACT-5' at 307, and 3'-CTGACC-5' at 734 occur close to ZSCAN22. A further 3'-CTGACT-5' at 1935 could be associated with ZSCAN22 or with an unknown gene between it and A1BG. All of these occur, along with their complements, on the negative strand in the negative direction.
 
Inverse complement, positive strand, negative direction there are 5: 3'-GGTCAG-5', 440, 3'-GGTCAG-5', 577, 3'-GGTCAG-5', 713, 3'-GGTCAG-5', 2249, 3'-GGTCAG-5', 2586.
 
A W box inverse, 3'-GGTCAG-5', occurs at 1353 in the negative direction.

W boxes 3'-AGTCAG-5' at 2101, 3'-GGTCAG-5' at 2221, 3'-AGTCAG-5' at 2608, 3'-AGTCAA-5' at 2614, and 3'-AGTCAG-5' at 2619 occur, along with their complements, in the positive direction.
 
W boxes in the positive direction occur 3'-CTGACC-5' at 1662, 3'-CTGACC-5' at 2213, 3'-TTGACC-5' at 2873, 3'-CTGACT-5' at 2945, and 3'-TTGACC-5' at 4018 that could be associated with A1BG, along with 3'-TTGACC-5' at 1953, 3'-CTGACT-5' at 2674, and 3'-TTGACT-5' at 3735.
 
Inverse complement, positive strand, positive direction there are 6: 3'-GGTCAG-5', 2025, 3'-AGTCAG-5', 2099, 3'-GGTCAG-5', 2606, 3'-GGTCAG-5', 2997, 3'-GGTCAG-5', 3083, 3'-GGTCAA-5', 3380.
 
==X boxes==
{{main|X box gene transcriptions}}
There are no X boxes in either promoter.
 
==X core promoter elements==
{{main|X core promoter element gene transcriptions}}
# negative strand in the negative direction, looking for 3'-G/A/T-G/C-G-T/C-G-G-G/A-A-G/C-A/C-5', 1, 3'-TGGTGGGACC-5', 3744,
# negative strand in the positive direction, looking for 3'-G/A/T-G/C-G-T/C-G-G-G/A-A-G/C-A/C-5', 0,
# positive strand in the negative direction, looking for 3'-G/A/T-G/C-G-T/C-G-G-G/A-A-G/C-A/C-5', 0,
# positive strand in the positive direction, looking for 3'-G/A/T-G/C-G-T/C-G-G-G/A-A-G/C-A/C-5', 0,
# complement, negative strand, negative direction, looking for 3'-C/A/T-G/C-C-A/G-C-C-C/T-T-G/C-G/T-5', 0,
# complement, negative strand, positive direction, looking for 3'-C/A/T-G/C-C-A/G-C-C-C/T-T-G/C-G/T-5', 0,
# complement, positive strand, negative direction, looking for 3'-C/A/T-G/C-C-A/G-C-C-C/T-T-G/C-G/T-5', 1, 3'-ACCACCCTGG-5', 3744,
# complement, positive strand, positive direction, looking for 3'-C/A/T-G/C-C-A/G-C-C-C/T-T-G/C-G/T-5', 0,
# inverse complement, negative strand, negative direction, looking for 3'-G/T-G/C-T-C/T-C-C-A/G-C-G/C-C/A/T-5', 0,
# inverse complement, negative strand, positive direction, looking for 3'-G/T-G/C-T-C/T-C-C-A/G-C-G/C-C/A/T-5', 0,
# inverse complement, positive strand, negative direction, looking for 3'-G/T-G/C-T-C/T-C-C-A/G-C-G/C-C/A/T-5', 1, 3'-GCTCCCACCT-5', 392,
# inverse complement, positive strand, positive direction, looking for 3'-G/T-G/C-T-C/T-C-C-A/G-C-G/C-C/A/T-5', 0,
# inverse, negative strand, negative direction, looking for 3'-A/C-G/C-A-G/A-G-G-T/C-G-G/C-G/A/T-5', 1, 3'-CGAGGGTGGA-5', 392,
# inverse, negative strand, positive direction, looking for 3'-A/C-G/C-A-G/A-G-G-T/C-G-G/C-G/A/T-5', 1, 3'-CCAGGGTGGG-5', 102,
# inverse, positive strand, negative direction, looking for 3'-A/C-G/C-A-G/A-G-G-T/C-G-G/C-G/A/T-5', 0,
# inverse, positive strand, positive direction, looking for 3'-A/C-G/C-A-G/A-G-G-T/C-G-G/C-G/A/T-5', 0.
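
The slash-and-dash consensus notation used in this list can be read as one character class per position. A minimal sketch of that reading, assuming dashes separate positions and slashes separate the alternatives at one position; the function name is illustrative and the notation is given without its 3'- and -5' end labels:

```python
import re


def slash_consensus_to_regex(consensus):
    """Turn a consensus like 'G/A/T-G/C-G' into a compiled regex, one class per slot."""
    classes = []
    for slot in consensus.split("-"):
        bases = slot.split("/")
        # A single base stays literal; alternatives become a character class.
        classes.append(bases[0] if len(bases) == 1 else "[" + "".join(bases) + "]")
    return re.compile("".join(classes))


# Direct and inverse forms of the X core promoter element as written above.
XCPE_DIRECT = "G/A/T-G/C-G-T/C-G-G-G/A-A-G/C-A/C"
XCPE_INVERSE = "A/C-G/C-A-G/A-G-G-T/C-G-G/C-G/A/T"

direct_ok = bool(slash_consensus_to_regex(XCPE_DIRECT).fullmatch("TGGTGGGACC"))
inverse_ok = bool(slash_consensus_to_regex(XCPE_INVERSE).fullmatch("CCAGGGTGGG"))
```

Both reported decamers satisfy their respective consensus under this reading.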
 
==Y boxes==
{{main|Y box gene transcriptions}}
There are no Y boxes in either promoter.
 
==Z boxes==
{{main|Z box gene transcriptions}}


==Hypotheses==
==See also==
{{div col|colwidth=20em}}
* [[A1BG gene transcription core promoters]]
* [[A1BG gene transcriptions]]
* [[A1BG regulatory elements and regions]]
* [[A1BG response element gene transcriptions]]
* [[A1BG response element negative results]]
* [[A1BG response element positive results]]
* [[Alpha-1-B glycoprotein]]
* [[Immunoglobulin domain cl11960]]


<!-- footer templates -->
{{Gene project}}{{tlx|Phosphate biochemistry}}{{Sisterlinks|Complex locus A1BG and ZNF497}}
{{Gene project}}{{Transcription factors and intracellular receptors}}{{tlx|Phosphate biochemistry}}


<!-- footer categories -->
[[Category:Resources last modified in July 2020]]

Latest revision as of 21:09, 16 May 2023

Associate Editor(s)-in-Chief: Henry A. Hoff

Alpha-1-B glycoprotein is a 54.3 kDa protein in humans that is encoded by the A1BG gene.[1] The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins.

A1BG is located on chromosome 19.[2] Additionally, A1BG, in current nucleotide numbering (58,345,183-58,353,492), is located adjacent to the ZSCAN22 gene (58,326,994-58,342,332) on the positive DNA strand, as well as the ZNF837 (58,367,623 - 58,381,030, complement) and ZNF497 (58,354,357 - 58,362,751, complement) genes on the negative strand.[2]

In the current nucleotide numbering, the A1BG untranslated region (UTR) has been expanded so that with ZSCAN22 ending at 58,342,332, the nucleotides used in this study are 58,342,333 to 58,346,892 on both strands, with the current UTR for A1BG beginning at 58,345,183. On the other side of A1BG ending at 58,353,492, the nucleotides used are 58,353,493 to 58,357,937. With ZNF497 beginning at 58,354,357, this study goes into ZNF497 to 58,357,937 or 3580 nucleotides from its downstream TSS or 4445 nucleotides from the TSS of A1BG downstream from ZNF497.

For example, an abscisic acid responsive element (ABRE) with the consensus sequence of ACGTG(G/T)C (Watanabe et al. 2017) occurs in the positive strand in the negative direction from ZSCAN22 to A1BG as ACGTGGC ending at 4239 nucleotides from the end of ZSCAN22 or 58,346,571, where the A is at 58,346,565 inside the UTR of A1BG.
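
The coordinate arithmetic in the two paragraphs above can be checked with a short sketch. It assumes, as the ABRE example states, that a reported position counts nucleotides from the end of ZSCAN22 and marks the last nucleotide of the element; the constant and function names are illustrative only.

```python
# Genomic coordinates quoted in the text (current nucleotide numbering).
ZSCAN22_END = 58_342_332
A1BG_START = 58_345_183
A1BG_END = 58_353_492
ZNF497_START = 58_354_357
SAMPLE_END_ZNF497_SIDE = 58_357_937


def offset_to_coordinate(offset, anchor=ZSCAN22_END):
    """Convert a position counted from the end of ZSCAN22 into a genomic coordinate."""
    return anchor + offset


# ABRE example: ACGTGGC ending 4239 nucleotides from the end of ZSCAN22.
abre_end = offset_to_coordinate(4239)
abre_start = abre_end - len("ACGTGGC") + 1

# Distances quoted for the ZNF497 side of A1BG.
into_znf497 = SAMPLE_END_ZNF497_SIDE - ZNF497_START
from_a1bg_end = SAMPLE_END_ZNF497_SIDE - A1BG_END
```

Under these assumptions the ABRE spans 58,346,565 to 58,346,571, inside the A1BG UTR that begins at 58,345,183.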

Introduction

"Many important disease-related pathways utilize transcription factors that specifically bind DNA (e.g., c-Myc, HIF-1, TCF1, p53) as key nodes or endpoints in complex signaling networks. In such cases the transcription factor itself is often the most attractive target. However, drugging transcription factors is challenging owing to an absence of small ligand binding sites in their DNA-binding domain and the presence of a highly charged DNA-binding surface [1]."[3]

If a specific gene appears to be involved in a disease-related or deleterious pathway, altering its expression may be needed to improve the person's health. Altering its expression constructively may require knowing which regulatory elements exist in the gene's nearby promoters.

Response elements

Identifying a bona fide response element requires more than simple inspection. To attribute a response element to a candidate sequence, observations have to be made using molecular, biological, and biophysical methods and functional approaches. Such findings may indicate that the response element in the promoter is a functional element.[4]

A likely response element found by simple inspection may also be inactive due to methylation.

Response Elements: "Nucleotide sequences, usually upstream, which are recognized by specific regulatory transcription factors, thereby causing gene response to various regulatory agents. These elements may be found in both promoter and enhancer regions."[5]

"Under conditions of stress, a transcription activator protein binds to the response element and stimulates transcription. If the same response element sequence is located in the control regions of different genes, then these genes will be activated by the same stimuli, thus producing a coordinated response."[6]

WD-40 repeat family

"Receptor for activated C kinase (RACK1) is a highly conserved, eukaryotic protein of the WD-40 repeat family. [...] During Phaseolus vulgaris root development, RACK1 (PvRACK1) mRNA expression was induced by auxins, abscisic acid, cytokinin, and gibberellic acid."[7]

Abscisic acid (ABA) response elements

Auxin response factors

ARFUs
ARFBs
ARF2s
ARF5s

CAACTC regulatory elements

CAREs (Fan)
CAREs (Garaeva)

Cytokinins

ARR1s
ARR10s
ARR12s
ARRFs
ARRR1s
ARRR2s

Coupling elements

CE3Ws
CE3Ds

EREs

Gibberellic acid response elements

GAREs
GAREL1s

Hypoxia response elements

HIFs
HREs
CACAs

Pyrimidine boxes

TAT boxes

TATFs
TATYs

General Regulatory Factors

The following general regulatory factors occur in the promoters between ZSCAN22, A1BG and ZNF497 on human chromosome 19.

Abfms

Rap1s

Reb1s

Tbf1s

Basic leucine zipper (bZIP) class response elements

A-boxes

ACGTs

"A majority of the plant bZIP proteins isolated to date recognize elements with an ACGT core (Foster et al., 1994)."[8]

"Most recombinant bZIP proteins can interact with ACGT elements derived from different plant genes, albeit with different affinity. Systematic protein/DNA binding studies have shown that sequences flanking the ACGT core affect bZIP protein binding specificity. These studies have provided the basis for a concise ACGT nomenclature and defined high-affinity A-box, C-box, and G-box elements."[9]

"HY5 binds to the promoter of light-responsive genes featuring "ACGT-containing elements" such as the G-box (CACGTG), C-box (GACGTC), Z-box (ATACGGT), and A-box (TACGTA) (4, 6)."[10]

Activating transcription factors

ATFBs
ATFKs

Affinity Capture-Western; Two-hybrid transcription factors

AFTs

Box As

C-boxes

C-boxes come in several varieties:

C-boxes (Johnson)
C boxes (Samarsky)
C boxes (Voronina)
C boxes (Song)
C boxes (Song hybrids)

Hybrids: C/A-box (TGACGTAT), C/G-box (TGACGTGT), C/T-box (TGACGTTA).

CAMPs

ESRE

The endoplasmic reticulum stress response element (ESRE) has two parts: (1) CCAAT and (2) CCACG, which are tested separately and then compared to see whether any pair has exactly nine nucleotides between the two parts.

CCAAT
CCACG

According to So (2018), the endoplasmic reticulum stress response element should be CCAAT-N9-CCACG. Samplings demonstrate that the ideal CCAAT-N9-CCACG or its inverse complement does not occur on either side of A1BG or close to ZSCAN22 or ZNF497.
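
As a minimal sketch of the spacing test described above, the two parts and the nine-nucleotide gap can be combined into one pattern and searched in both the direct and inverse complement forms. The toy sequences are hypothetical, the function name is illustrative, and this is not the program used for the samplings.

```python
import re

# CCAAT, nine nucleotides of any kind, then CCACG (lookahead allows overlaps).
ESRE_DIRECT = re.compile(r"(?=(CCAAT[ACGT]{9}CCACG))")
# The same element read as its inverse complement: CGTGG-N9-ATTGG.
ESRE_INVERSE_COMPLEMENT = re.compile(r"(?=(CGTGG[ACGT]{9}ATTGG))")


def find_esre(sequence):
    """Return (1-based start, orientation) for each CCAAT-N9-CCACG hit."""
    sequence = sequence.upper()
    hits = [(m.start() + 1, "direct") for m in ESRE_DIRECT.finditer(sequence)]
    hits += [(m.start() + 1, "inverse complement")
             for m in ESRE_INVERSE_COMPLEMENT.finditer(sequence)]
    return sorted(hits)


# Hypothetical toy sequences: a nine-nucleotide spacer and an eight-nucleotide spacer.
with_n9 = "TT" + "CCAAT" + "ACGTACGTA" + "CCACG" + "GG"
with_n8 = "TT" + "CCAAT" + "ACGTACGT" + "CCACG" + "GG"
n9_hits = find_esre(with_n9)
n8_hits = find_esre(with_n8)
```

Only the sequence with exactly nine intervening nucleotides is reported, which mirrors the requirement that both parts be present at the ideal spacing.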

Hap motif

G-boxes

G-box (CACGTG)

GCN4 motif

GCREs (Gcn4)

Migs

Nuclear factors

NFATs
HNF6s

T boxes

TboxCs
TboxZs

Vboxes

Z-boxes

ZboxGs
ZboxSps

Helix-turn-helix (HTH) transcription factors

Gene ID: 4602 is MYB [myeloblastosis] MYB proto-oncogene, transcription factor on 6q23.3: "This gene encodes a protein with three HTH DNA-binding domains that functions as a transcription regulator. This protein plays an essential role in the regulation of hematopoiesis. This gene may be aberrently expressed or rearranged or undergo translocation in leukemias and lymphomas, and is considered to be an oncogene. Alternative splicing results in multiple transcript variants."[11]

CadC binding domains

Factor II B recognition elements

Forkhead boxes

Homeoboxes

Homeodomains

HSE3 (Eastmond)

HSE4 (Eastmond)

HSE8 GAP1 (Eastmond)

HSE9 GAP2 (Eastmond)

Hsf (Tang)

MREs

Tryptophan residues

Basic helix-loop-helix (bHLH) transcription factors

"The [palindromic E-box motif (CACGTG)] motif is bound by the transcription factor Pho4, [and has the] class of basic helix-loop-helix DNA binding domain and core recognition sequence (Zhou and O'Shea 2011)."[12]

"Pho4 bound to virtually all E-boxes in vitro (96%) [...]. That was not the case in vivo, where only 5% were bound by Pho4, under activating conditions as determined by ChIP-seq [Zhou and O'Shea 2011]."[12]

"Pho4 possesses the intrinsic ability to bind every E-box, but in vivo is prevented from binding by chromatin unless assisted by chromatin remodelers (Svaren et al. 1994) that are targeted at promoter regions."[12]

"On one end of that spectrum, typical transcription factors like Pho4 do not appear to compete with nucleosomes and instead predominantly sample motifs that already exist in the [nucleosome-free promoter regions] NFRs generated by other factors. In vitro (PB-exo), Pho4 bound nearly every instance of an E-box motif across the yeast genome. However, in vivo, Pho4 is a low-abundance protein that is recruited to the nucleus upon phosphate starvation by other factors, to act at a few dozen genes (Komeili and O'Shea 1999; Zhou and O'Shea 2011). Since Pho4 appears unable to compete with nucleosomes, competent sites that are occluded by nucleosomes are invisible to Pho4."[12]

The Pho4 homodimer binds to DNA sequences containing the bHLH binding site CACGTG.[13]

The upstream activating sequence (UAS) for Pho4p, CAC(A/G)T(T/G), occurs in the promoters of HIS4 and PHO5 and relates phosphate limitation to the regulation of the purine and histidine biosynthesis pathways.[14]

bHLH proteins typically bind to a consensus sequence called an E-box, CANNTG.[15]
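As a minimal sketch of how the consensus sequences above could be searched for, each consensus can be written with IUPAC nucleotide codes and expanded into a regular expression. The helper names `iupac_to_regex` and `find_sites` and the short test sequence are hypothetical; the test sequence is not taken from the A1BG locus.

```python
import re

# IUPAC nucleotide codes needed for the consensus sequences in this section.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

def iupac_to_regex(consensus):
    """Translate an IUPAC consensus such as CANNTG into a regex string."""
    return "".join(IUPAC[base] for base in consensus.upper())

def find_sites(consensus, sequence):
    """Return 0-based start positions of every match, overlapping ones included."""
    pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Hypothetical test sequence, not taken from the A1BG locus.
demo = "TTCACGTGAACAGCTGTT"
ebox_hits = find_sites("CANNTG", demo)   # general E-box
pho4_hits = find_sites("CACGTG", demo)   # palindromic E-box bound by Pho4
uas_hits = find_sites("CACRTK", demo)    # CAC(A/G)T(T/G) written in IUPAC codes
```

In this sketch every palindromic CACGTG site is also reported as a CANNTG site, which reflects the text: the Pho4 motif is one specific instance of the general E-box consensus.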

"A computer search for transcription promoter elements [...] showed the presence of a prominent TATA box 22 nucleotides upstream of the transcription start site and an Sp1 site at position -42 to -33. The 5'-flanking sequence also contains three E boxes with CANNTG consensus sequences at positions -464 to -459, -90 to -85, and -52 to -47 that have been marked as E box, E1 box, and E2 box, respectively [...]. In addition, the 5'-flanking region contains one or more GRE, XRE, GATA-1, GCN-4, PEA-3, AP1, and AP2 consensus motifs and also three imperfect CArG sites [...]."[16]

AhRYs

AHRE-IIs

AEREs

CAT boxes

CAT-box-like elements

"Class C"

"Class I"

TCFs

DIOXs

Enhancer boxes

ChoRE motifs
CarbE1s
CarbE2s
CarbE3s
Phors

Palindromic E-box motif (CACGTG).

E2 boxes

GATAs

Gln3s

Glucocorticoid response elements

ICRE (Lopes)

ICRE (Schwank)

Pho4

QRDREs

Carbon source-responsive elements

CATTCAs
TCCGs

XREs

Basic helix-loop-helix leucine zipper transcription factors

Basic helix-loop-helix leucine zipper transcription factors are, as their name indicates, transcription factors containing both basic helix-loop-helix and leucine zipper motifs.

Examples include Microphthalmia-associated transcription factor and Sterol regulatory element-binding protein (SREBP).

MITF recognizes E-box (CAYRTG) and M-box (TCAYRTG or CAYRTGA) sequences in the promoter regions of target genes.[17]

Serum response element gene transcriptions: The SRE wild type (SREwt) contains the nucleotide sequence ACAGGATGTCCATATTAGGACATCTGC, of which CCATATTAGG is the CArG box, TTAGGACAT is the C/EBP box, and CATCTG is the E box.[18]

"Serum response factor (SRF) is an important transcription factor that regulates cardiac and skeletal muscle genes during development, maturation and adult aging [17,18]. SRF regulates its target genes by binding to serum response elements (SREs), which contain a consensus CC(A/T)6GG (CArG) motif."[19]
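The way the three elements sit inside the SREwt sequence can be checked with a short sketch. The sequence and the three sub-elements are those quoted above; the function name `locate` and the 1-based coordinate convention are assumptions made for illustration.

```python
import re

SRE_WT = "ACAGGATGTCCATATTAGGACATCTGC"  # SREwt sequence quoted in the text

ELEMENTS = {
    "CArG box": "CCATATTAGG",
    "C/EBP box": "TTAGGACAT",
    "E box": "CATCTG",
}

def locate(elements, sequence):
    """Map each element name to its 1-based (start, end) span, or None if absent."""
    spans = {}
    for name, motif in elements.items():
        i = sequence.find(motif)
        spans[name] = (i + 1, i + len(motif)) if i >= 0 else None
    return spans

spans = locate(ELEMENTS, SRE_WT)

# The CArG box should fit the CC(A/T)6GG consensus, the E box the CANNTG consensus.
carg_ok = re.fullmatch(r"CC[AT]{6}GG", ELEMENTS["CArG box"]) is not None
ebox_ok = re.fullmatch(r"CA[ACGT]{2}TG", ELEMENTS["E box"]) is not None
```

Under these conventions the CArG box spans positions 10 to 19, the C/EBP box 15 to 23, and the E box 21 to 26, so the three elements overlap one another within the 27-nucleotide SREwt.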

CArG boxes

MITF E-boxes

RREs

Consensus sequence: CATCTG.

M-boxes

M box (Bertolotto)
M-box (Hoek)
M-box (Ripoll)

SER elements

Basic helix-span-helix

Activating proteins

AP2as
APCo1s
APCo2s
APM3Ns
APM4Ns
Yao1s
Yao2s
Yau3s

"Pemphigus foliaceus (PF) is an autoimmune disease, endemic in Brazilian rural areas, characterized by acantholysis and accompanied by complement activation, with generalized or localized distribution of painful epidermal blisters. CD59 is an essential complement regulator, inhibiting formation of the membrane attack complex, and mediating signal transduction and activation of T lymphocytes. CD59 has different transcripts by alternative splicing, of which only two are widely expressed, suggesting the presence of regulatory sites in their noncoding regions. To date, there is no association study with polymorphisms in CD59 noncoding regions and susceptibility to autoimmune diseases. In this study, we aimed to evaluate if CD59 polymorphisms have a possible regulatory effect on gene expression and susceptibility to PF. Six noncoding polymorphisms were haplotyped in 157 patients and 215 controls by sequence-specific PCR, and CD59 mRNA levels were measured in 82 subjects, by qPCR. The rs861256-allele-G (rs861256*G) was associated with increased mRNA expression (p = .0113) and PF susceptibility in women (OR = 4.11, p = .0001), which were also more prone to develop generalized lesions (OR = 4.3, p = .009) and to resist disease remission (OR = 3.69, p = .045). Associations were also observed for rs831625*G (OR = 3.1, p = .007) and rs704697*A (OR = 3.4, p = .006) in Euro-Brazilian women, and for rs704701*C (OR = 2.33, p = .037) in Afro-Brazilians. These alleles constitute the GGCCAA haplotype, which also increases PF susceptibility (OR = 4.9, p = .045) and marks higher mRNA expression (p = .0025). [...] higher CD59 transcriptional levels may be related with PF susceptibility (especially in women), probably due to the effect of genetic polymorphism and to the CD59 role in T cell signal transduction."[20]

Stem-loops

File:Stem-loop.svg
An example of an RNA stem-loop is shown. Credit: Sakurambo.

As an important secondary structure of RNA, a stem-loop can direct RNA folding, protect structural stability for messenger RNA (mRNA), provide recognition sites for RNA binding proteins, and serve as a substrate for enzymatic reactions.[21]

Hairpin loops are often elements found within the 5'UTR of prokaryotes. These structures are often bound by proteins or cause the attenuation of a transcript in order to regulate translation.[22]

The mRNA stem-loop structure forming at the ribosome binding site may control the initiation of translation.[23][24]

AUREs

Adenylate–uridylate rich elements (Chen and Shyu, Class I)

Adenylate–uridylate rich elements (Chen and Shyu, Class II)

Adenylate–uridylate rich elements (Chen and Shyu, Class III)

MERs

Constitutive decay elements

Cys2His2 SP / Kruppel-like factor (KLF) transcription factor family

The Cys2His2-like fold group (Cys2His2) is by far the best-characterized class of zinc fingers, and is common in mammalian transcription factors, where such domains adopt a simple ββα fold and have the amino acid sequence motif:[25]

X2-Cys-X2,4-Cys-X12-His-X3,4,5-His
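The motif above can be sketched as a regular expression over one-letter amino acid codes. This reading treats "X2,4" as 2 to 4 residues and "X3,4,5" as 3 to 5 residues, which is an assumption about the notation. The function name `c2h2_spans` and both peptides are hypothetical and were constructed only to illustrate a match and a non-match.

```python
import re

# X2-Cys-X2,4-Cys-X12-His-X3,4,5-His, reading "X2,4" as 2 to 4 residues
# and "X3,4,5" as 3 to 5 residues (an assumption about the notation).
C2H2 = re.compile(r"[A-Z]{2}C[A-Z]{2,4}C[A-Z]{12}H[A-Z]{3,5}H")

def c2h2_spans(protein):
    """Return 1-based (start, end) spans of non-overlapping motif matches."""
    return [(m.start() + 1, m.end()) for m in C2H2.finditer(protein.upper())]

# Hypothetical peptides built to illustrate the pattern, not real protein records.
finger = "YKCPECGKSFSRSDHLSRHQRTH"  # fits the motif
mutant = "YKAPECGKSFSRSDHLSRHQRTH"  # first Cys replaced by Ala

finger_spans = c2h2_spans(finger)
mutant_spans = c2h2_spans(mutant)
```

The mutant shows why both cysteines matter for the pattern: with the first Cys replaced, no other residue in the peptide can take its place at the required spacing.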

Alcohol dehydrogenase repressor 1

SP1M1s

SP1M2s

SP-1 (Sato)s

SP1 (Yao)s

YY1Ts

AP-2/EREBP-related factors

AGC boxes

AP-1 transcription factor network (Pathway)

Sixty-nine genes are included in the AP-1 transcription factor network (Pathway).[26]

AGCEs

Zinc finger DNA-binding domains

AnRE1s

AnDRE2s

AnREWs

B-boxes

Box Bs

β-Scaffold factors

"Higher animals have [transcription factor] TF genes for the basic domain, the β-scaffold factor, and other new structures; however, their total proportion is less than 15% and most are [zinc (Zn)-coordinating factor] ZF and [Helix-Turn-Helix] HTH genes."[27]

ATA boxes

Γ-interferon activated sequences

HMG boxes

Zn(II)2Cys6 proteins

"The transcription factors Uga3, Dal81 and Leu3 belong to the class III family (Zn(II)2Cys6 proteins), and they recognize highly related sequences rich in GGC triplets [15]."[28]

Dal81

GCC boxes

GGC triplets

GGCGGC triplets

Leu3

Uga3

Hairpin-hinge-hairpin-tail

"In addition to this ACA box, they have the consensus H box sequence (5'-ANANNA-3') but have no other primary sequence identity. Despite this lack of primary sequence conservation, the H and ACA boxes are embedded in an evolutionarily conserved hairpin-hinge-hairpin-tail core secondary structure with the H box in the single-stranded hinge region and the ACA box in the single-stranded tail (5, 16)."[29]

H and ACA boxes

H-boxes (Grandbastien)

H-boxes (Lindsay)

H boxes (Mitchell)

H boxes (Rozhdestvensky)

Unknown response element types

ACEs

BBCABW Inrs

Calcineurin-responsive transcription factors

Carbs

Carb1s

Cat8s

Cell-cycle box variants

CGCG boxes

Circadian control elements

Cold-responsive elements

Copper response elements

CuREQs
CuREPs

Cytoplasmic polyadenylation elements

DAF-16 binding elements

D box (Samarsky)

D box (Voronina)

D-box (Motojima)

dBRE

Downstream core elements

DCE SI

DCE SII

DCE SIII

DPE (Juven-Gershon)

DPE (Kadonaga)

DPE (Matsumoto)

EIN3 binding sites

Endosperm expressions

Estrogen response elements

ERE1s
ERE2s

GAAC elements

GC boxes (Briggs)

GC boxes (Ye)

GC boxes (Zhang)

GCR1s

GREs

GT boxes (Sato)

Hex sequences

HY boxes

IFNs

Inr-like, TCTs

IRF3s

IRSs

KAR2s

MBE1s

MBE2s

MBE3s

NF𝜿BSs

PREs

Pribs

RAREs

Rgts

ROREs

SERVs

STAT5s

STREs

Sucroses

TACTs

TAGteams

TAPs

TATAs

Examining the promoter regions upstream from ZSCAN22 to A1BG and downstream from ZNF497 to A1BG for TATA boxes has shown that TATA boxes in various forms are present and likely active or activable: (1) TATAAAA (Carninci 2006), (2) TATA(A/T)A(A/T) (Watson 2014), (3) TATA(A/T)AA(A/G) (Juven-Gershon 2010), and (4) TATA(A/T)A(A/T)(A/G) (Basehoar 2004).

The TATA boxes appear only in the negative-direction UTRs, both proximal and distal. The shorter TATA box, TATAAA, appears as above and also occurs in the positive direction as its inverse complement, TTTATA, at 2588 in the distal promoter.
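A two-strand search of this kind can be sketched as follows. The four patterns are the TATA box forms listed above, written as regular expressions; the function names and the short demonstration sequences are hypothetical and are not the A1BG promoter sequences. Hits on the opposite strand are reported at the position where their inverse complement begins on the given strand.

```python
import re

# The four TATA box forms listed in the text, as regular expressions.
TATA_FORMS = {
    "Carninci 2006": "TATAAAA",
    "Watson 2014": "TATA[AT]A[AT]",
    "Juven-Gershon 2010": "TATA[AT]AA[AG]",
    "Basehoar 2004": "TATA[AT]A[AT][AG]",
}

def reverse_complement(seq):
    """Return the inverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def scan_both_strands(pattern, seq):
    """Return (strand, 1-based start on the given strand) for each match."""
    regex = re.compile("(?=(" + pattern + "))")
    hits = [("+", m.start() + 1) for m in regex.finditer(seq)]
    rc = reverse_complement(seq)
    for m in regex.finditer(rc):
        width = len(m.group(1))
        # Convert the opposite-strand coordinate back to the given strand.
        hits.append(("-", len(seq) - m.start() - width + 1))
    return hits

# Hypothetical sequences for illustration only.
short_box = scan_both_strands("TATAAA", "GGTTTATACC")
carninci = scan_both_strands(TATA_FORMS["Carninci 2006"], "CCTATAAAAGG")
```

In the first demonstration sequence TATAAA is not present on the given strand, but TTTATA is, so the box is reported on the opposite strand, which mirrors the TTTATA observation described above.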

TATABs

TATACs

TATAJs

TATAWs

TEAs

TECs

THRs

TRFs

UPREs

UPRE-1s

URS (Sumrada, core)

VDREs

XCPE1s

Yaps

YYRNWYY Inrs

A1BG orthologs

Geotrypetes seraphini

File:Geotrypetes seraphini 81151944.jpg
Geotrypetes seraphini, the Gaboon caecilian, is a species of amphibian. Credit: Marius Burger.

Geotrypetes seraphini, the Gaboon caecilian, is a species of amphibian in the family Dermophiidae.[30]

Its A1BG ortholog has 368 aa vs 495 aa for Homo sapiens.

ZSCAN22

  1. Gene ID: 342945 is ZSCAN22 zinc finger and SCAN domain containing 22 on 19q13.43.[31] ZSCAN22 is transcribed in the negative direction from LOC100887072.[31]
  2. Gene ID: 102465484 is MIR6806 microRNA 6806 on 19q13.43: "microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop."[32] MIR6806 is transcribed in the negative direction from LOC105372480.[32]

Of the approximately 111 gaps between genes on chromosome locus 19q13.43 as of 4 August 2020, gap number 88 is between ZSCAN22 and A1BG. There is no gap, however, between ZNF497 and A1BG.

Promoters

The core promoter begins approximately -35 nts upstream from the transcription start site (TSS). For the numbered nucleotides between ZSCAN22 and A1BG the core promoter extends from 4425 nts up to 4460 nts (TSS). The proximal promoter extends from approximately -250 to the TSS or 4210 nts up to 4460 nts. The distal promoter begins at about 2460 nts and extends to about 4210 nts.

From the ZNF497 side the core promoter begins about 4265 nts up to 4300 nts, the proximal promoter from 4050 nts to 4265 nts, and the distal promoter from 2300 nts to 4050 nts.
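The promoter boundaries above follow from fixed offsets upstream of each TSS, which can be sketched as below. The function name `promoter_regions` is hypothetical. The -35 and -250 offsets are stated in the text; the distal offset of 2000 nts is inferred from the figures given (4460 - 2460 and 4300 - 2300). The sketch follows the ZSCAN22-side convention in which the proximal promoter runs up to the TSS, whereas the ZNF497-side figures in the text end the proximal promoter at 4265, the start of the core promoter.

```python
def promoter_regions(tss, core=35, proximal=250, distal=2000):
    """Return (start, end) nucleotide positions of each promoter region.

    Offsets are counted upstream of the TSS. The distal offset is inferred
    from the figures in the text rather than stated there.
    """
    return {
        "core": (tss - core, tss),
        "proximal": (tss - proximal, tss),
        "distal": (tss - distal, tss - proximal),
    }

zscan22_side = promoter_regions(4460)  # TSS at 4460 nts
znf497_side = promoter_regions(4300)   # TSS at 4300 nts
```

Changing the three keyword arguments allows other promoter definitions to be tried without altering the numbering of the underlying sequence.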

Alpha-1-B glycoprotein

Def. "a substance that induces an immune response, usually foreign"[33] is called an antigen.

Def. any "substance that elicits [an] immune response"[34] is called an immunogen.

An antigen "or immunogen is a molecule that sometimes stimulates an immune system response."[35] But, "the immune system does not consist of only antibodies",[35] instead it "encompasses all substances that can be recognized by the adaptive immune system."[35]

Def. "a protein produced by B-lymphocytes that binds to [a specific antigen or][36] an antigen"[37] is called an antibody.

Five different antibody isotypes are known in mammals, which perform different roles, and help direct the appropriate immune response for each different type of foreign object they encounter.[38]

Although the general structure of all antibodies is very similar, a small region, known as the hypervariable region, at the tip of the protein is extremely variable, allowing millions of antibodies with slightly different tip structures to exist, where each of these variants can bind to a different target, known as an antigen.[39]

Def. "any of the glycoproteins in blood serum that respond to invasion by foreign antigens and that protect the host by removing pathogens;"[40] "an antibody"[41] is called an immunoglobulin.

Gene ID: 1 is A1BG alpha-1-B glycoprotein on 19q13.43, a 54.3 kDa protein in humans that is encoded by the A1BG gene.[42] A1BG is transcribed in the positive direction from ZNF497.[42] "The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins."[42]

  1. NP_570602.2 alpha-1B-glycoprotein precursor, cd05751 Location: 401 → 493 Ig1_LILRB1_like; First immunoglobulin (Ig)-like domain found in Leukocyte Ig-like receptors (LILR)B1 (also known as LIR-1) and similar proteins, smart00410 Location: 218 → 280 IG_like; Immunoglobulin like, pfam13895 Location: 210 → 301 Ig_2; Immunoglobulin domain and cl11960 Location: 28 → 110 Ig; Immunoglobulin domain.[42]

Patients who have pancreatic ductal adenocarcinoma show an overexpression of A1BG in pancreatic juice.[43]

Immunoglobulin supergene family

"𝛂1B-glycoprotein(𝛂1B) [...] consists of a single polypeptide chain N-linked to four glucosamine oligosaccharides. The polypeptide has five intrachain disulfide bonds and contains 474 amino acid residues. [...] 𝛂1B exhibits internal duplication and consists of five repeating structural domains, each containing about 95 amino acids and one disulfide bond. [...] several domains of 𝛂1B, especially the third, show statistically significant homology to variable regions of certain immunoglobulin light and heavy chains. 𝛂1B [...] exhibits sequence similarity to other members of the immunoglobulin supergene family such as the receptor for transepithelial transport of IgA and IgM and the secretory component of human IgA."[44]

"Some of the domains of 𝛂1B show significant homology to variable (V) and constant (C) regions of certain immunoglobulins. Likewise, there is statistically significant homology between 𝛂1B and the secretory component (SC) of human IgA (15) and also with the extracellular portion of the rabbit receptor for transepithelial transport of polymeric immunoglobulins (IgA and IgM). Mostov et al. (16) have called the later protein the poly-Ig receptor or poly-IgR and have shown that it is the precursor of SC."[44]

The immunoglobulin supergene family is "the group of proteins that have immunoglobulin-like domains, including histocompatibility antigens, the T-cell antigen receptor, poly-IgR, and other proteins involved in the vertebrate immune response (17)."[44]

"The internal homology in primary structure [...] and the presence of an intrasegment disulfide bond suggest that 𝛂1B is composed of five structural domains that arose by duplication of a primordial gene coding for about 95 amino acid residues."[44]

"Unlike immunoglobulins (25), ceruloplasmin (6), and hemopexin (7), 𝛂1B is not subject to limited interdomain cleavage by proteolytic enzymes. At least, we were not able to produce such fragments by use of a variety of proteases. This stability of 𝛂1B is probably associated with the frequency of proline in the sequences linking the domains [...]."[44]

"A peptide identified in the late and early milk proteomes showed homology to eutherian alpha 1B glycoprotein (A1BG), a plasma protein with unknown function46, as well as venom inhibitors characterised in the Southern opossum Didelphis marsupialis (DM43 and DM4647,48,49), all members of the immunoglobulin superfamily. To characterise the relationship between the peptide sequence identified in koala, A1BG, DM43 and DM46, a phylogenetic tree was constructed [...] including all marsupial and monotreme homologs (identified by BLAST), three phylogenetically representative eutherian sequences, with human IGSF1 and TARM1, related members of the immunoglobulin super family, used as outgroups. This phylogeny indicates that A1BG-like proteins in marsupials and the Didelphis antitoxic proteins are homologs of eutherian A1BG, with excellent bootstrap support (98%). The marsupial A1BG-like sequences and the Didelphis antitoxic proteins formed a single clade with strong bootstrap support (97%)."[45]

"Human TARM1 and IGSF1, related members of the immunoglobulin superfamily are used as outgroups. The tree was constructed using the maximum likelihood approach and the JTT model with bootstrap support values from 500 bootstrap tests. Bootstrap values less than 50% are not displayed. Accession numbers: Tasmanian devil (Sarcophilus harrisii; XP_012402143), Wallaby (Macropus eugenii; FY619507), Possum (Trichosurus vulpecula; DY596639) Virginia opossum (Didelphis virginiana; AAA30970, AAN06914), Southern opossum (Didelphis marsupialis; AAL82794, P82957, AAN64698), Human (Homo sapiens; P04217, B6A8C7, Q8N6C5), Platypus (Ornithorhychus anatinus; ENSOANP00000000762), Cow (Bos taurus; Q2KJF1), Alpaca (Vicugna pacos; XP_015107031)."[45]

"The sequences of 𝛂1B-glycoprotein (38) and chicken N-CAM (neural cell-adhesion molecule) (39) have been shown to be related to the immunoglobulin supergene family."[46]

A1BG contains the immunoglobulin domain: cl11960 and three immunoglobulin-like domains: pfam13895, cd05751 and smart00410.

"Immunoglobulin (Ig) domain [cl11960] found in the Ig superfamily. The Ig superfamily is a heterogenous group of proteins, built on a common fold comprised of a sandwich of two beta sheets. Members of this group are components of immunoglobulin, neuroglia, cell surface glycoproteins, such as, T-cell receptors, CD2, CD4, CD8, and membrane glycoproteins, such as, butyrophilin and chondroitin sulfate proteoglycan core protein. A predominant feature of most Ig domains is a disulfide bridge connecting the two beta-sheets with a tryptophan residue packed against the disulfide bond."[47]

"This domain [pfam13895] contains immunoglobulin-like domains."[48]

"Ig1_LILR_KIR_like: [cd05751] domain similar to the first immunoglobulin (Ig)-like domain found in Leukocyte Ig-like receptors (LILRs) and Natural killer inhibitory receptors (KIRs). This group includes LILRB1 (or LIR-1), LILRA5 (or LIR9), an activating natural cytotoxicity receptor NKp46, the immune-type receptor glycoprotein VI (GPVI), and the IgA-specific receptor Fc-alphaRI (or CD89). LILRs are a family of immunoreceptors expressed on expressed on T and B cells, on monocytes, dendritic cells, and subgroups of natural killer (NK) cells. The human LILR family contains nine proteins (LILRA1-3,and 5, and LILRB1-5). From functional assays, and as the cytoplasmic domains of various LILRs, for example LILRB1 (LIR-1), LILRB2 (LIR-2), and LILRB3 (LIR-3) contain immunoreceptor tyrosine-based inhibitory motifs (ITIMs) it is thought that LIR proteins are inhibitory receptors. Of the eight LIR family proteins, only LIR-1 (LILRB1), and LIR-2 (LILRB2), show detectable binding to class I MHC molecules; ligands for the other members have yet to be determined. The extracellular portions of the different LIR proteins contain different numbers of Ig-like domains for example, four in the case of LILRB1 (LIR-1), and LILRB2 (LIR-2), and two in the case of LILRB4 (LIR-5). The activating natural cytotoxicity receptor NKp46 is expressed in natural killer cells, and is organized as an extracellular portion having two Ig-like extracellular domains, a transmembrane domain, and a small cytoplasmic portion. GPVI, which also contains two Ig-like domains, participates in the processes of collagen-mediated platelet activation and arterial thrombus formation. Fc-alphaRI is expressed on monocytes, eosinophils, neutrophils and macrophages; it mediates IgA-induced immune effector responses such as phagocytosis, antibody-dependent cell-mediated cytotoxicity and respiratory burst."[49]

"IG domains [smart00410] that cannot be classified into one of IGv1, IGc1, IGc2, IG."[50]

A1BG protein species

Def. a "group of plants or animals having similar appearance"[51] or "the largest group of organisms in which [any][52] two individuals [of the appropriate sexes or mating types][52] can produce fertile offspring, typically by sexual reproduction"[53] is called a species.

The gene contains 20 distinct introns.[54] Transcription produces 15 different mRNAs: 10 alternatively spliced variants and 5 unspliced forms.[54] There are 4 probable alternative promoters, 4 non-overlapping alternative last exons and 7 validated alternative polyadenylation sites.[54] The mRNAs appear to differ by truncation of the 5' end, truncation of the 3' end, presence or absence of 4 cassette exons, overlapping exons with different boundaries, and splicing versus retention of 3 introns.[54]

Variants or isoforms

Def. a "different sequence of a gene (locus)"[55] is called a variant.

Def. any "of several different forms of the same protein, arising from either single nucleotide polymorphisms,[56] differential splicing of mRNA, or post-translational modifications (e.g. sulfation, glycosylation, etc.)"[57] is called an isoform.

Regarding additional isoforms, mention has been made of "new genetic variants of A1BG."[58]

"Proteomic analysis revealed that [a circulating] set of plasma proteins was α 1 B-glycoprotein (A1BG) and its post-translationally modified isoforms."[59]

Pharmacogenomic variants have been reported.[60]

Genotypes

Def. the "part (DNA sequence) of the genetic makeup of an organism which determines a specific characteristic (phenotype) of that organism"[61] or a "group of organisms having the same genetic constitution"[62] is called a genotype.

There are A1BG genotypes.[60]

The A1BG variant rs893184 is one of the single nucleotide polymorphisms included in a genetic risk score.[60]

"A genetic risk score, including rs16982743, rs893184, and rs4525 in F5, was significantly associated with treatment-related adverse cardiovascular outcomes in whites and Hispanics from the INVEST study and in the Nordic Diltiazem study (meta-analysis interaction P=2.39×10−5)."[60]

Polymorphs

Def. the "regular existence of two or more different genotypes within a given species or population; also, variability of amino acid sequences within a gene's protein"[63] is called polymorphism.

Def. "one of a number of alternative forms of the same gene occupying a given position, [or locus],[64] on a chromosome"[65] is called an allele.

"rs893184 causes a histidine (His) to arginine (Arg) [nonsynonymous single nucleotide polymorphism (nsSNP), A (minor) for G (major)] substitution at amino acid position 52 in A1BG."[60]

"Genetic polymorphism of human plasma (serum) alpha 1B-glycoprotein (alpha 1B) was observed using one-dimensional horizontal polyacrylamide gel electrophoresis (PAGE) pH 9.0 of plasma samples followed by Western blotting with specific antiserum to alpha 1B."[66]

A1B*5 is a "new allele [...] of human plasma 𝜶1B-glycoprotein [...]."[67]

"Genetic polymorphism of human plasma 𝜶1B-glycoprotein (𝜶1B) was reported first, in brief, by Altland et al. [1983; also given in Altkand and Hacklar, 1984]. A detailed description of human 𝜶1B polymorphism was reported in subsequent studies [Gahne et al., 1987; Juneja et al., 1988, 1989]. Five different 𝜶1B alleles (A1B*1, A1B*2, A1B*3, A1B*4 and A1B*5) were reported. In Caucasian whites, the frequencies of A1B*1 and A1B*2 were about 0.95 and 0.05, respectively. A1B*4 was observed in 2 related Czech individuals. In American blacks, A1B*1 and A1B*2 occurred with a frequency of 0.73 and 0.21, respectively, while a new allele, viz, A1B*3 had a frequency of 0.06. A1B*5 was observed only in Swedish Lapps and in Finns with a frequency of 0.04 and 0.007, respectively."[68]

"The frequency of A1B*1 varied from 0.89 to 0.91 and that of A1B*2 from 0.08 to 0.10. The A1B*3 allele, reported previously only in American blacks, was observed with a frequency range of 0.003-0.01 in 3 of the Chinese populations, in Koreans and in Malays. A new 𝜶1B allele (A1B*6) was observed in 2 Chinese individuals."[68]

Phenotypes

Def. the "appearance of an organism based on a single trait [multifactorial combination of genetic traits and environmental factors][69], especially used in pedigrees"[70] or any "observable characteristic of an organism, such as its morphological, developmental, biochemical or physiological properties, or its behavior"[71] is called a phenotype.

"The three different phenotypes of α1B observed (designated 1-1, 1-2, and 2-2) were apparently identical to those reported by Altland et al. (1983), who used double one-dimensional electrophoresis. Family data supported the hypothesis that the three α1B phenotypes are determined by two codominant alleles at an autosomal locus, designated A1B. Allele frequencies in a Swedish population were: A1B *1, 0.937; A1B *2, 0.063; PIC, 0.111."[66]
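The relation between the quoted allele frequencies and the quoted PIC value can be sketched as below. The source does not state which formula was used; the sketch assumes the commonly used polymorphism information content formula of Botstein et al. (1980), which reproduces the quoted value of 0.111. The expected phenotype frequencies additionally assume Hardy-Weinberg proportions, which the quoted passage does not claim. All function names are hypothetical.

```python
def pic(freqs):
    """Polymorphism information content, assuming the Botstein et al. (1980) formula."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(
        2 * (freqs[i] ** 2) * (freqs[j] ** 2)
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1 - homozygosity - cross

def expected_phenotypes(p, q):
    """Expected frequencies of the 1-1, 1-2 and 2-2 phenotypes for two
    codominant alleles, assuming Hardy-Weinberg proportions."""
    return {"1-1": p * p, "1-2": 2 * p * q, "2-2": q * q}

# Allele frequencies quoted for the Swedish population: A1B*1 and A1B*2.
swedish = [0.937, 0.063]
swedish_pic = pic(swedish)
swedish_phenotypes = expected_phenotypes(*swedish)
```

Because the two alleles are codominant, each genotype corresponds to a distinct phenotype, so the three expected frequencies sum to 1.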

Protein species

"Both protein species of [alpha 1-beta glycoprotein] A1B (A1Ba, p = 0.008; f.c.= +1.62, A1Bb, p = 0.003; f.c. = +1.82) [...] were apparently overexpressed in patients with PTCa [...]."[72]

A1BG is mainly produced in the liver, and is secreted to plasma to levels of approximately 0.22 mg/mL.[44]

CRISPs

The human cysteine-rich secretory protein (CRISP3) "is present in exocrine secretions and in secretory granules of neutrophilic granulocytes and is believed to play a role in innate immunity."[73] CRISP3 has a relatively high content in human plasma.[73]

"The A1BG-CRISP-3 complex is noncovalent with a 1:1 stoichiometry and is held together by strong electrostatic forces."[73] "Similar [complex formation] between toxins from snake venom and A1BG-like plasma proteins ... inhibits the toxic effect of snake venom metalloproteinases or myotoxins and protects the animal from envenomation."[73]

Opossums have a remarkably robust immune system, and show partial or total immunity to the venom of rattlesnakes, cottonmouths (Agkistrodon piscivorus), and other pit vipers (Crotalinae).[74][75]

"Crisp3 [is] mainly [expressed] in the salivary glands, pancreas, and prostate."[76] "CRISP3 is highly expressed in the human cauda epididymidis and ampulla of vas deferens (Udby et al. 2005)."[76]

A1BG-AS1

Gene ID: 503538 is A1BG-AS1 A1BG antisense RNA 1.[77] A1BG-AS1 is transcribed in the negative direction from ZSCAN22.[77]

Gene ID 503538 extends from 58,351,390 to 58,355,183. It is a long, non-coding (lnc) RNA.[78] Extensive evidence indicates that long noncoding RNAs (lncRNAs) regulate the tumorigenesis and progression of hepatocellular carcinoma (HCC).[78]

The underexpression of A1BG-AS1 was found in HCC via analysis of The Cancer Genome Atlas database.[78] A1BG-AS1 expression in HCC was markedly lower than that in noncancerous tissues.[78]

ZNF497

Gene ID: 162968 is ZNF497 zinc finger protein 497.[79] ZNF497 is transcribed in the positive direction from RNA5SP473.[79]

  1. NP_001193938.1 zinc finger protein 497: "Transcript Variant: This variant (2) lacks an alternate exon in the 5' UTR, compared to variant 1. Variants 1 and 2 encode the same protein."[79]
  2. NP_940860.2 zinc finger protein 497: "Transcript Variant: This variant (1) is the longer transcript. Variants 1 and 2 encode the same protein."[79]

Gene ID: 100419840 is LOC100419840 zinc finger protein 446 pseudogene.[80] LOC100419840 may be transcribed in the positive direction from LOC105372483.[80]

Gene ID: 105372483 is LOC105372483 uncharacterized LOC105372483 ncRNA.[81] LOC105372483 is transcribed in the negative direction from LOC100419840.[81]

Gene ID: 106479017 is RNA5SP473 RNA, 5S ribosomal pseudogene 473.[82] RNA5SP473 may be transcribed in the negative direction from ZNF497.[82]

GC contents

Approximately "76% of human core promoters lack TATA-like elements, have a high GC content, and are enriched in Sp1 binding sites."[83]

CpG islands typically occur at or near the transcription start site of genes, particularly housekeeping genes, in vertebrates.[84]

The number of CG or GC pairs near the TSS for A1BG appears to be low: 8.2 % CG/GC between ZSCAN22 and A1BG and 15 % CG/GC between ZNF497 and A1BG.
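How such a percentage might be computed can be sketched as below. The text does not say how the CG/GC percentages were obtained, so the definition used here, the share of overlapping dinucleotide positions that read CG or GC, is an assumption, as are the function name `cg_gc_percent` and the demonstration sequence.

```python
def cg_gc_percent(seq):
    """Percent of overlapping dinucleotide positions that read CG or GC.

    This definition is an assumption; the source does not state its method.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return 0.0
    hits = sum(1 for pair in pairs if pair in ("CG", "GC"))
    return 100.0 * hits / len(pairs)

# Hypothetical sequence for illustration, not taken from the A1BG locus.
demo = "ATGCGCATTA"
demo_percent = cg_gc_percent(demo)
```

Other definitions, such as counting non-overlapping pairs or dividing by sequence length, would give somewhat different percentages for the same sequence.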

19q13.43

Regulatory elements and regions

Functions of A1BG

"Receptors of the leukocyte receptor cluster (LRC) play a range of important functions in the human immune system."[85]

"The leukocyte receptor cluster (LRC) is a family of structurally related genes for immunoregulatory receptors. Originally, the term LRC was introduced to emphasize the linkage of the genes encoding killer immunoglobulin-like receptors (KIRs), leukocyte Ig-like receptors (LILRs), and FcαR on human chromosome 19q13.4 (Wagtmann et al. 1997; Wende et al. 1999). Subsequently, it has been found that the region contains some other structurally related genes, such as NCR1, GPVI, LAIR1, LAIR2, and OSCAR (Meyaard et al. 1997; Sivori et al. 1997; Clemetson et al. 1999; Kim et al. 2002). Most recently, the LRC has been further extended by adding two more genes named VSTM1/SIRL1 and TARM1 (Steevels et al. 2010; Radjabova et al. 2015)."[85]

"Except for LAIR2, which is a secreted protein, all human LRC products are type I cell surface receptors with extracellular regions composed of 1–4 C2-type Ig-like domains."[85]

The "eutherian LRC family, in addition to commonly recognized members, includes two new, IGSF1 and alpha-1-B glycoprotein (A1BG)."[85]

"Nucleotide sequences were retrieved and analyzed using utilities at the NCBI (https://www.ncbi.nlm.nih.gov/, last accessed May 20, 2019) and Ensemble (http://www.ensembl.org, last accessed May 20, 2019) websites."[85]

"In our previous studies, it was observed that the Ig-like domains of the frog and chicken LRC proteins reproducibly showed homology not only to known LRC members but also to the products of four mammalian genes that to our knowledge have never been considered in the phylogenetic analyses of LRC. These genes are VSTM1, TARM1, A1BG, and IGSF1. VSTM1 and TARM1 are the most recently identified members of the human LRC (Steevels et al. 2010; Radjabova et al. 2015). A1BG encodes alpha-1 B glycoprotein, a soluble component of mammalian blood plasma that is known for half a century (Schultze et al. 1963). The protein is composed of five Ig-like domains and has been shown to bind to CRISP-3, a small polypeptide that is present in exocrine secretions of neutrophilic granulocytes and that is believed to play a role in innate immunity (Udby et al. 2004). In the human genome, A1BG maps to 19q13.4 some 3.3 Mb away from GPVI [...]."[85]

"The attribution of IGSF1 and A1BG domains to the LRC was supported by their 3D structures predicted using homology modeling [...]."[85]

"Noteworthy is that the D1 and D6 domains of IgSF1 fall into one clade with the N-terminal (d1) domains of A1BG and OSCAR (cluster B1). Closer relationship of A1BG and OSCAR was supported by clustering of the d2–d5 domains of A1BG with membrane-proximal (d2) domain of OSCAR (cluster B2)."[85]

"Altogether, these results support the attribution of IGSF1 and A1BG to the LRC and suggest their relatedness to OSCAR, TARM1, and VSTM1."[85]

"Clustering of the N-terminal domains of OSCAR, IGSF1, and A1BG with each other and with IGSF1 d6 was also reproduced. Finally, the d2 domains of OSCAR cluster with the d2–d5 domains of A1BG (fig. 5). These results further justify grouping IGSF1, A1BG, OSCAR, TARM1, and VSTM1 into a distinct group B."[85]

Hypotheses

  1. Downstream core promoters may work as transcription factors even as their complements or inverses.
  2. In addition to the DNA binding sequences listed above, the transcription factors that can open up and attach through the local epigenome need to be known and specified.
  3. Each DNA binding domain that serves as a transcription factor for the promoter of any immunoglobulin supergene family member also serves, or is present, in the promoters for A1BG.
  4. The function of A1BG is the same as that of other immunoglobulin genes possessing the immunoglobulin domain cl11960 and/or any of three immunoglobulin-like domains (pfam13895, cd05751 and smart00410), in the following order and nucleotide sequence locations: cd05751 (401 → 493), smart00410 (218 → 280), pfam13895 (210 → 301) and cl11960 (28 → 110).
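
Hypothesis 1 refers to a promoter element acting as its complement or its inverse. As a minimal sketch, not the method used in this study, the four orientations of a short motif can be generated and searched for in a sequence as follows. The function names, the TATAAA motif and the test sequence are illustrative assumptions and are not taken from the A1BG data.

```python
from typing import Dict, List

# Translation table for DNA base complements (illustrative helper).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def orientations(motif: str) -> Dict[str, str]:
    """Return the direct, complement, inverse and inverse-complement forms of a motif."""
    comp = motif.translate(COMPLEMENT)
    return {
        "direct": motif,
        "complement": comp,
        "inverse": motif[::-1],
        "inverse complement": comp[::-1],
    }


def find_orientations(sequence: str, motif: str) -> Dict[str, List[int]]:
    """Map each orientation to the 1-based start positions where it occurs in sequence."""
    hits: Dict[str, List[int]] = {}
    for name, variant in orientations(motif).items():
        positions: List[int] = []
        start = sequence.find(variant)
        while start != -1:
            positions.append(start + 1)  # report 1-based coordinates
            start = sequence.find(variant, start + 1)  # allow overlapping hits
        hits[name] = positions
    return hits


if __name__ == "__main__":
    # Hypothetical 18-nucleotide test sequence, read 5' to 3' on one strand.
    for name, positions in find_orientations("GGTATAAACCTTTATAGG", "TATAAA").items():
        print(name, positions)
```

Only one strand is scanned here. The complement and inverse-complement entries stand in for an element that lies on the opposite strand, which is why all four forms are searched for in the same string.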

References

  1. "Entrez Gene: Alpha-1-B glycoprotein". Retrieved 2012-11-09.
  2. "A1BG alpha-1-B glycoprotein". Retrieved May 10, 2013.
  3. Qingliang Li, Rezaul M. Karim, Mo Cheng, Mousumi Das, Lihong Chen, Chen Zhang, Harshani R. Lawrence, Gary W. Daughdrill, Ernst Schonbrunn, Haitao Ji and Jiandong Chen (July 2020). "Inhibition of p53 DNA binding by a small molecule protects mice from radiation toxicity". Oncogene. 39 (29): 5187–5200. doi:10.1038/s41388-020-1344-y. PMID 32555331. Retrieved 29 August 2020.
  4. Ruoyi Gu, Jun Xu, Yixiang Lin, Jing Zhang, Huijun Wang, Wei Sheng, Duan Ma, Xiaojing Ma & Guoying Huang (July 2016). "Liganded retinoic acid X receptor α represses connexin 43 through a potential retinoic acid response element in the promoter region". Pediatric Research. 80 (1): 159–168. doi:10.1038/pr.2016.47. PMID 26991262. Retrieved 7 September 2020.
  5. U.S. National Library of Medicine (8 July 2008). "Response Elements MeSH Descriptor Data 2021". 8600 Rockville Pike, Bethesda, MD 20894: National Institutes of Health. Retrieved 22 April 2021.
  6. Benjamin A. Pierce (24 December 2004). Control of Gene Expression, In: Genetics Solutions and Problem Solving MegaManual. Macmillan. p. 221. Retrieved 22 April 2021.
  7. Tania Islas-Flores, Gabriel Guillén, Xóchitl Alvarado-Affantranger, Miguel Lara-Flores, Federico Sánchez, and Marco A. Villanueva (2011). "PvRACK1 Loss-of-Function Impairs Cell Expansion and Morphogenesis in Phaseolus vulgaris L. Root Nodules". Molecular Plant-Microbe Interactions. 24 (7): 819–826. doi:10.1094/MPMI-11-10-0261. Retrieved 25 April 2021.
  8. Nijhawan A, Jain M, Tyagi AK, Khurana JP (February 2008). "Genomic survey and gene expression analysis of the basic leucine zipper transcription factor family in rice". Plant Physiology. 146 (2): 333–50. doi:10.1104/pp.107.112821. PMID 18065552.
  9. Randy Foster, Takeshi Izawa and Nam-Hai Chua (1 February 1994). "Plant bZIP proteins gather at ACGT elements". The FASEB Journal. 8 (2): 192–200. doi:10.1096/fasebj.8.2.8119490. PMID 8119490. Retrieved 25 June 2021.
  10. Ganesh M. Nawkar, Chang Ho Kang, Punyakishore Maibam, Joung Hun Park, Young Jun Jung, Ho Byoung Chae, Yong Hun Chi, In Jung Jung, Woe Yeon Kim, Dae-Jin Yun, and Sang Yeol Lee (21 February 2017). "HY5, a positive regulator of light signaling, negatively controls the unfolded protein response in Arabidopsis" (PDF). Proceedings of the National Academy of Sciences USA. 114 (8): 2084–89. doi:10.1073/pnas.1609844114. Retrieved 24 June 2021.
  11. RefSeq (January 2016). "MYB MYB proto-oncogene, transcription factor [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 7 February 2021.
  12. Matthew J. Rossi, William K.M. Lai and B. Franklin Pugh (21 March 2018). "Genome-wide determinants of sequence-specific DNA binding of general regulatory factors". Genome Research. 28: 497–508. doi:10.1101/gr.229518.117. PMID 29563167. Retrieved 31 August 2020.
  13. Dalei Shao, Caretha L. Creasy, Lawrence W. Bergman (1 February 1998). "A cysteine residue in helixII of the bHLH domain is essential for homodimerization of the yeast transcription factor Pho4p". Nucleic Acids Research. 26 (3): 710–4. doi:10.1093/nar/26.3.710. PMC 147311. PMID 9443961.
  14. Hongting Tang, Yanling Wu, Jiliang Deng, Nanzhu Chen, Zhaohui Zheng, Yongjun Wei, Xiaozhou Luo, and Jay D. Keasling (6 August 2020). "Promoter Architecture and Promoter Engineering in Saccharomyces cerevisiae". Metabolites. 10 (8): 320–39. doi:10.3390/metabo10080320. PMID 32781665. Retrieved 18 September 2020.
  15. Chaudhary J, Skinner MK (1999). "Basic helix-loop-helix proteins can act at the E-box within the serum response element of the c-fos promoter to influence hormone-induced promoter activation in Sertoli cells". Mol. Endocrinol. 13 (5): 774–86. doi:10.1210/mend.13.5.0271. PMID 10319327.
  16. Nibedita Lenka, Aruna Basu, Jayati Mullick, and Narayan G. Avadhani (22 November 1996). "The role of an E box binding basic helix loop helix protein in the cardiac muscle-specific expression of the rat cytochrome oxidase subunit VIII gene" (PDF). The Journal of Biological Chemistry. 271 (47): 30281–30289. doi:10.1074/jbc.271.47.30281. Retrieved 7 February 2019.
  17. Hoek KS, Schlegel NC, Eichhoff OM, Widmer DS, Praetorius C, Einarsson SO, Valgeirsdottir S, Bergsteinsdottir K, Schepsky A, Dummer R, Steingrimsson E (2008). "Novel MITF targets identified using a two-step DNA microarray strategy". Pigment Cell Melanoma Res. 21 (6): 665–76. doi:10.1111/j.1755-148X.2008.00505.x. PMID 19067971.
  18. Ravi P. Misra; Azad Bonni; Cindy K. Miranti; Victor M. Rivera; Morgan Sheng; Michael E.Greenberg (14 October 1994). "L-type Voltage-sensitive Calcium Channel Activation Stimulates Gene Expression by a Serum Response Factor-dependent Pathway" (PDF). The Journal of Biological Chemistry. 269 (41): 25483–25493. PMID 7929249. Retrieved 7 December 2019.
  19. Xiaomin Zhang, Gohar Azhar, Jeanne Y. Wei (21 December 2017). "SIRT2 gene has a classic SRE element, is a downstream target of serum response factor and is likely activated during serum stimulation". PLOS One. 12 (12): e0190011. doi:10.1371/journal.pone.0190011. Retrieved 23 February 2021.
  20. Amanda Salviano-Silva, Maria Luiza Petzl-Erler & Angelica Beate Winter Boldt (29 April 2017). "CD59 polymorphisms are associated with gene expression and different sexual susceptibility to pemphigus foliaceus". Autoimmunity. 50 (6): 377–385. doi:10.1080/08916934.2017.1329830. Retrieved 27 September 2021.
  21. Svoboda P, Cara A (2006). "Hairpin RNA: A secondary structure of primary importance". Cellular and Molecular Life Sciences. 63 (7): 901–908.
  22. Meyer, Michelle; Deiorio-Haggar K; Anthony J (July 2013). "RNA structures regulating ribosomal protein biosynthesis in bacilli". RNA Biology. 10 (7): 1160–1164. doi:10.4161/rna.24151. PMID 23611891.
  23. Malys N, Nivinskas R (2009). "Non-canonical RNA arrangement in T4-even phages: accommodated ribosome binding site at the gene 26-25 intercistronic junction". Mol Microbiol. 73 (6): 1115–1127. doi:10.1111/j.1365-2958.2009.06840.x. PMID 19708923.
  24. Malys N, McCarthy JEG (2010). "Translation initiation: variations in the mechanism can be anticipated". Cellular and Molecular Life Sciences. 68 (6): 991–1003. doi:10.1007/s00018-010-0588-z. PMID 21076851.
  25. Pabo CO, Peisach E, Grant RA (2001). "Design and selection of novel Cys2His2 zinc finger proteins". Annual Review of Biochemistry. 70: 313–40. doi:10.1146/annurev.biochem.70.1.313. PMID 11395410.
  26. NCBI (9 March 2021). "AP-1 transcription factor network". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 26 October 2021.
  27. Toshifumi Nagata, Aeni Hosaka-Sasaki and Shoshi Kikuchi (2016). Daniel H. Gonzalez, ed. The Evolutionary Diversification of Genes that Encode Transcription Factor Proteins in Plants, In: Plant Transcription Factors Evolutionary, Structural and Functional Aspects. Academic Press. pp. 73–97. doi:10.1016/B978-0-12-800854-6.00005-1. ISBN 978-0-12-800854-6. Retrieved 28 November 2021.
  28. Marcos Palavecino-Ruiz, Mariana Bermudez-Moretti, Susana Correa-Garcia (1 November 2017). "Unravelling the transcriptional regulation of Saccharomyces cerevisiae UGA genes: the dual role of transcription factor LEU3" (PDF). Microbiology. doi:10.1099/mic.0.000560. Retrieved 21 February 2021.
  29. James R. Mitchell, Jeffrey Cheng, and Kathleen Collins (January 1999). "A Box H/ACA Small Nucleolar RNA-Like Domain at the Human Telomerase RNA 3' End" (PDF). Molecular and Cellular Biology. 19 (1): 567–576. Retrieved 5 November 2018.
  30. IUCN SSC Amphibian Specialist Group (2019). "Geotrypetes seraphini". 2019: e.T59557A16957715. doi:10.2305/IUCN.UK.2019-1.RLTS.T59557A16957715.en. Retrieved 16 November 2021.
  31. HGNC (13 March 2020). "ZSCAN22 zinc finger and SCAN domain containing 22 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  32. RefSeq (10 September 2009). "MIR6806 microRNA 6806 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  33. Jag123 (7 March 2005). "antigen". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  34. SemperBlotto (21 April 2008). "immunogen". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 8 March 2020.
  35. C. Michael Gibson (27 April 2008). "Antigen". Boston, Massachusetts: WikiDoc Foundation. Retrieved 8 March 2020.
  36. Williamsayers79 (26 February 2007). "antibody". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  37. Jag123 (7 March 2005). "antibody". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  38. Eleonora Market, F. Nina Papavasiliou (2003). "V(D)J Recombination and the Evolution of the Adaptive Immune System". PLoS Biology. 1 (1): e16. doi:10.1371/journal.pbio.0000016.
  39. Charles A Janeway, Jr, Paul Travers, Mark Walport, and Mark J Shlomchik (2001). Immunobiology (5th ed.). Garland Publishing. ISBN 0-8153-3642-X.
  40. SemperBlotto (25 February 2006). "immunoglobulin". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  41. SemperBlotto (28 April 2008). "immunoglobulin". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  42. RefSeq (10 December 2019). "A1BG alpha-1-B glycoprotein [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  43. Mei Tian, Ya-Zhou Cui, Guan-Hua Song, Mei-Juan Zong, Xiao-Yan Zhou, Yu Chen, Jin-Xiang Han (2008). "Proteomic analysis identifies MMP-9, DJ-1 and A1BG as overexpressed proteins in pancreatic juice from pancreatic ductal adenocarcinoma patients". BMC Cancer. 8: 241. doi:10.1186/1471-2407-8-241. PMC 2528014. PMID 18706098.
  44. Noriaki Ishioka, Nobuhiro Takahashi, and Frank W. Putnam (April 1986). "Amino acid sequence of human plasma α1B-glycoprotein: Homology to the immunoglobulin supergene family" (PDF). Proceedings of the National Academy of Sciences USA. 83 (8): 2363–7. doi:10.1073/pnas.83.8.2363. PMID 3458201. Retrieved 9 March 2020.
  45. Katrina M. Morris, Denis O’Meally, Thiri Zaw, Xiaomin Song, Amber Gillett, Mark P. Molloy, Adam Polkinghorne, and Katherine Belov (7 October 2016). "Characterisation of the immune compounds in koala milk using a combined transcriptomic and proteomic approach". Scientific Reports. 6: 35011. doi:10.1038/srep35011. PMID 27713568. Retrieved 14 March 2020.
  46. R. J. Paxton, G. Mooser, H. Pande, T. D. Lee, and J. E. Shively (1 February 1987). "Sequence analysis of carcinoembryonic antigen: identification of glycosylation sites and homology with the immunoglobulin supergene family" (PDF). Proceedings of the National Academy of Sciences USA. 84 (4): 920–924. doi:10.1073/pnas.84.4.920. PMID 3469650. Retrieved 26 March 2020.
  47. NCBI (2 February 2016). "Conserved Protein Domain Family cl11960: Ig Superfamily". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 22 May 2020.
  48. NCBI (5 August 2015). "Conserved Protein Domain Family pfam13895: Ig_2". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 24 May 2020.
  49. NCBI (16 August 2016). "Conserved Protein Domain Family cd05751: Ig1_LILR_KIR_like". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 24 May 2020.
  50. NCBI (16 January 2013). "Conserved Protein Domain Family smart00410: IG_like". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 24 May 2020.
  51. 24.98.118.180 (28 February 2007). "species". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  52. Peter coxhead (22 August 2018). "Species". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  53. Chiswick Chap (1 December 2016). "Species". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  54. "AceView: A1BG". Retrieved May 11, 2013.
  55. Pdeitiker (26 July 2008). "variant". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  56. SemperBlotto (6 January 2007). "isoform". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2 December 2018.
  57. 72.178.245.181 (30 November 2008). "isoform". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2 December 2018.
  58. H Eiberg, ML Bisgaard, J Mohr (1 December 1989). "Linkage between alpha 1B-glycoprotein (A1BG) and Lutheran (LU) red blood group system: assignment to chromosome 19: new genetic variants of A1BG". Clinical Genetics. 36 (6): 415–8. PMID 2591067. Retrieved 2017-10-08.
  59. John R. Stehle Jr., Mark E. Weeks, Kai Lin, Mark C. Willingham, Amy M. Hicks, John F. Timms, Zheng Cui (January 2007). "Mass spectrometry identification of circulating alpha-1-B glycoprotein, increased in aged female C57BL/6 mice". Biochimica et Biophysica Acta (BBA) - General Subjects. 1770 (1): 79–86. doi:10.1016/j.bbagen.2006.06.020. PMID 16945486. Retrieved 2017-10-08.
  60. Caitrin W. McDonough, Yan Gong, Sandosh Padmanabhan, Ben Burkley, Taimour Y. Langaee, Olle Melander, Carl J. Pepine, Anna F. Dominiczak, Rhonda M. Cooper-DeHoff, and Julie A. Johnson (June 2013). "Pharmacogenomic Association of Nonsynonymous SNPs in SIGLEC12, A1BG, and the Selectin Region and Cardiovascular Outcomes" (PDF). Hypertension. 62 (1): 48–54. doi:10.1161/HYPERTENSIONAHA.111.00823. PMID 23690342. Retrieved 2017-10-08.
  61. DTLHS (10 January 2018). "genotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  62. SemperBlotto (22 October 2005). "genotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  63. Widsith (28 March 2012). "polymorphism". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  64. 217.105.66.98 (8 September 2016). "allele". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  65. 138.130.33.215 (7 April 2004). "allele". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  66. B. Gahne, R. K. Juneja, and A. Stratil (June 1987). "Genetic polymorphism of human plasma alpha 1B-glycoprotein: phenotyping by immunoblotting or by a simple method of 2-D electrophoresis". Human Genetics. 76 (2): 111–5. doi:10.1007/bf00284904. PMID 3610142. Retrieved 25 March 2020.
  67. R.K. Juneja, G. Beckman, M. Lukka, B. Gahne, and C. Ehnholm (1989). "Plasma α1B-Glycoprotein Allele Frequencies in Finns and Swedish Lapps: Evidence for a New α1B Allele". Human Heredity. 39 (1): 32–36. doi:10.1159/000153828. PMID 2759622. Retrieved 25 March 2020.
  68. R.K. Juneja, N. Saha, B. Gahne and J.S.H. Tay (1989). "Distribution of Plasma Alpha-1-B-Glycoprotein Phenotypes in Several Mongoloid Populations of East Asia". Human Heredity. 39: 218–222. doi:10.1159/000153863. PMID 2583734. Retrieved 25 March 2020.
  69. 24.235.196.118 (23 September 2007). "phenotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-10-04.
  70. SemperBlotto (14 February 2005). "phenotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-10-04.
  71. N2e (3 July 2008). "phenotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-10-04.
  72. Mardiaty Iryani Abdullah, Ching Chin Lee, Sarni Mat Junit, Khoon Leong Ng, and Onn Haji Hashim (13 September 2016). "Tissue and serum samples of patients with papillary thyroid cancer with and without benign background demonstrate different altered expression of proteins". PeerJ. 4: e2450. doi:10.7717/peerj.2450. PMID 27672505. Retrieved 15 March 2020.
  73. Udby L, Sørensen OE, Pass J, Johnsen AH, Behrendt N, Borregaard N, Kjeldsen L (12 October 2004). "Cysteine-rich secretory protein 3 is a ligand of alpha1B-glycoprotein in human plasma". Biochemistry. 43 (40): 12877–86. doi:10.1021/bi048823e. PMID 15461460. Retrieved 2011-11-28.
  74. "The Opossum: Our Marvelous Marsupial, The Social Loner". Wildlife Rescue League.
  75. "Anti-Lethal Factor from Opossum Serum Is a Potent Antidote for Animal, Plant and Bacterial Toxins". Journal of Venomous Animals and Toxins. Retrieved 2009-12-29.
  76. B Haendler, J Krätzschmar, F Theuring and W D Schleuning (July 1993). "Transcripts for cysteine-rich secretory protein-1 (CRISP-1; DE/AEG) and the novel related CRISP-3 are expressed under androgen control in the mouse salivary gland". Endocrinology. 133 (1): 192–8. doi:10.1210/en.133.1.192. PMID 8319566. Retrieved 2012-02-20.
  77. HGNC (10 December 2019). "A1BG-AS1 A1BG antisense RNA 1 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  78. Jigang Bai, Bowen Yao, Liang Wang, Liankang Sun, Tianxiang Chen, Runkun Liu, Guozhi Yin, Qiuran Xu, Wei Yang (June 2019). "lncRNA A1BG-AS1 suppresses proliferation and invasion of hepatocellular carcinoma cells by targeting miR-216a-5p". Journal of Cellular Biochemistry. 120 (6): 10310–10322. doi:10.1002/jcb.28315. PMID 30556161. Retrieved 16 May 2023.
  79. HGNC (10 December 2019). "ZNF497 zinc finger protein 497 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  80. HGNC (10 December 2019). "LOC100419840 zinc finger protein 446 pseudogene [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  81. HGNC (10 December 2019). "LOC105372483 uncharacterized LOC105372483 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  82. HGNC (10 December 2019). "RNA5SP473 RNA, 5S ribosomal pseudogene 473 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  83. Chuhu Yang, Eugene Bolotin, Tao Jiang, Frances M. Sladek, Ernest Martinez (2007). "Prevalence of the initiator over the TATA box in human and yeast genes and identification of DNA motifs enriched in human TATA-less core promoters". Gene. 389 (1): 52–65. doi:10.1016/j.gene.2006.09.029. PMID 17123746.
  84. Saxonov S, Berg P, Brutlag DL (2006). "A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters". Proc Natl Acad Sci USA. 103 (5): 1412–1417. doi:10.1073/pnas.0510310103. PMC 1345710. PMID 16432200.
  85. Sergey V Guselnikov and Alexander V Taranin (1 June 2019). "Unraveling the LRC Evolution in Mammals: IGSF1 and A1BG Provide the Keys". Genome Biology and Evolution. 11 (6): 1586–1601. doi:10.1093/gbe/evz102. PMID 31106814.
