Complex locus A1BG and ZNF497

Revision as of 02:51, 24 September 2020

Associate Editor(s)-in-Chief: Henry A. Hoff

Alpha-1-B glycoprotein is a 54.3 kDa protein in humans that is encoded by the A1BG gene.[1] The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins.

A1BG is located on the negative DNA strand of chromosome 19 from 58,858,172 – 58,864,865.[2] Additionally, A1BG is located directly adjacent to the ZSCAN22 gene (58,838,385 - 58,853,712) on the positive DNA strand, as well as the ZNF837 (58,878,990 - 58,892,389, complement) and ZNF497 (58,865,723 - 58,874,214, complement) genes on the negative strand.[2]

Introduction

"Many important disease-related pathways utilize transcription factors that specifically bind DNA (e.g., c-Myc, HIF-1, TCF1, p53) as key nodes or endpoints in complex signaling networks. In such cases the transcription factor itself is often the most attractive target. However, drugging transcription factors is challenging owing to an absence of small ligand binding sites in their DNA-binding domain and the presence of a highly charged DNA-binding surface [1]."[3]

If a specific gene appears to be involved in a disease-related or deleterious pathway, it may be necessary to alter its expression so as to improve the person's health. Altering its expression constructively may require knowing what regulatory elements exist in the gene's nearby promoters.

Identifying a bona fide response element requires more than simple inspection. To attribute response-element function to a candidate sequence, observations have to be made using molecular, biological, and biophysical methods and functional approaches. Such findings may indicate that a response element in the promoter is a functional element.[4]

A likely response element found by simple inspection may also be inactive due to methylation.

ZSCAN22

  1. Gene ID: 342945 is ZSCAN22 zinc finger and SCAN domain containing 22 on 19q13.43.[5] ZSCAN22 is transcribed in the negative direction from LOC100887072.[5]
  2. Gene ID: 102465484 is MIR6806 microRNA 6806 on 19q13.43: "microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop."[6] MIR6806 is transcribed in the negative direction from LOC105372480.[6]

Of the roughly 111 gaps between genes on chromosome locus 19q13.43 as of 4 August 2020, gap number 88 is between ZSCAN22 and A1BG. There is, however, no gap between ZNF497 and A1BG.

Alpha-1-B glycoprotein

Def. "a substance that induces an immune response, usually foreign"[7] is called an antigen.

Def. any "substance that elicits [an] immune response"[8] is called an immunogen.

An antigen "or immunogen is a molecule that sometimes stimulates an immune system response."[9] But, "the immune system does not consist of only antibodies",[9] instead it "encompasses all substances that can be recognized by the adaptive immune system."[9]

Def. "a protein produced by B-lymphocytes that binds to [a specific antigen or][10] an antigen"[11] is called an antibody.

Five different antibody isotypes are known in mammals, which perform different roles, and help direct the appropriate immune response for each different type of foreign object they encounter.[12]

Although the general structure of all antibodies is very similar, a small region, known as the hypervariable region, at the tip of the protein is extremely variable, allowing millions of antibodies with slightly different tip structures to exist, where each of these variants can bind to a different target, known as an antigen.[13]

Def. "any of the glycoproteins in blood serum that respond to invasion by foreign antigens and that protect the host by removing pathogens;"[14] "an antibody"[15] is called an immunoglobulin.

Gene ID: 1 is A1BG alpha-1-B glycoprotein on 19q13.43, a 54.3 kDa protein in humans that is encoded by the A1BG gene.[16] A1BG is transcribed in the positive direction from ZNF497.[16] "The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins."[16]

  1. NP_570602.2 alpha-1B-glycoprotein precursor, cd05751 Location: 401 → 493 Ig1_LILRB1_like; First immunoglobulin (Ig)-like domain found in Leukocyte Ig-like receptors (LILR)B1 (also known as LIR-1) and similar proteins, smart00410 Location: 218 → 280 IG_like; Immunoglobulin like, pfam13895 Location: 210 → 301 Ig_2; Immunoglobulin domain and cl11960 Location: 28 → 110 Ig; Immunoglobulin domain.[16]

Patients who have pancreatic ductal adenocarcinoma show an overexpression of A1BG in pancreatic juice.[17]

Immunoglobulin supergene family

"𝛂1B-glycoprotein(𝛂1B) [...] consists of a single polypeptide chain N-linked to four glucosamine oligosaccharides. The polypeptide has five intrachain disulfide bonds and contains 474 amino acid residues. [...] 𝛂1B exhibits internal duplication and consists of five repeating structural domains, each containing about 95 amino acids and one disulfide bond. [...] several domains of 𝛂1B, especially the third, show statistically significant homology to variable regions of certain immunoglobulin light and heavy chains. 𝛂1B [...] exhibits sequence similarity to other members of the immunoglobulin supergene family such as the receptor for transepithelial transport of IgA and IgM and the secretory component of human IgA."[18]

"Some of the domains of 𝛂1B show significant homology to variable (V) and constant (C) regions of certain immunoglobulins. Likewise, there is statistically significant homology between 𝛂1B and the secretory component (SC) of human IgA (15) and also with the extracellular portion of the rabbit receptor for transepithelial transport of polymeric immunoglobulins (IgA and IgM). Mostov et al. (16) have called the later protein the poly-Ig receptor or poly-IgR and have shown that it is the precursor of SC."[18]

The immunoglobulin supergene family is "the group of proteins that have immunoglobulin-like domains, including histocompatibility antigens, the T-cell antigen receptor, poly-IgR, and other proteins involved in the vertebrate immune response (17)."[18]

"The internal homology in primary structure [...] and the presence of an intrasegment disulfide bond suggest that 𝛂1B is composed of five structural domains that arose by duplication of a primordial gene coding for about 95 amino acid residues."[18]

"Unlike immunoglobulins (25), ceruloplasmin (6), and hemopexin (7), 𝛂1B is not subject to limited interdomain cleavage by proteolytic enzymes. At least, we were not able to produce such fragments by use of a variety of proteases. This stability of 𝛂1B is probably associated with the frequency of proline in the sequences linking the domains [...]."[18]

"A peptide identified in the late and early milk proteomes showed homology to eutherian alpha 1B glycoprotein (A1BG), a plasma protein with unknown function46, as well as venom inhibitors characterised in the Southern opossum Didelphis marsupialis (DM43 and DM4647,48,49), all members of the immunoglobulin superfamily. To characterise the relationship between the peptide sequence identified in koala, A1BG, DM43 and DM46, a phylogenetic tree was constructed [...] including all marsupial and monotreme homologs (identified by BLAST), three phylogenetically representative eutherian sequences, with human IGSF1 and TARM1, related members of the immunoglobulin super family, used as outgroups. This phylogeny indicates that A1BG-like proteins in marsupials and the Didelphis antitoxic proteins are homologs of eutherian A1BG, with excellent bootstrap support (98%). The marsupial A1BG-like sequences and the Didelphis antitoxic proteins formed a single clade with strong bootstrap support (97%)."[19]

"Human TARM1 and IGSF1, related members of the immunoglobulin superfamily are used as outgroups. The tree was constructed using the maximum likelihood approach and the JTT model with bootstrap support values from 500 bootstrap tests. Bootstrap values less than 50% are not displayed. Accession numbers: Tasmanian devil (Sarcophilus harrisii; XP_012402143), Wallaby (Macropus eugenii; FY619507), Possum (Trichosurus vulpecula; DY596639) Virginia opossum (Didelphis virginiana; AAA30970, AAN06914), Southern opossum (Didelphis marsupialis; AAL82794, P82957, AAN64698), Human (Homo sapiens; P04217, B6A8C7, Q8N6C5), Platypus (Ornithorhychus anatinus; ENSOANP00000000762), Cow (Bos taurus; Q2KJF1), Alpaca (Vicugna pacos; XP_015107031)."[19]

"The sequences of 𝛂1B-glycoprotein (38) and chicken N-CAM (neural cell-adhesion molecule) (39) have been shown to be related to the immunoglobulin supergene family."[20]

A1BG contains the immunoglobulin domain: cl11960 and three immunoglobulin-like domains: pfam13895, cd05751 and smart00410.

"Immunoglobulin (Ig) domain [cl11960] found in the Ig superfamily. The Ig superfamily is a heterogenous group of proteins, built on a common fold comprised of a sandwich of two beta sheets. Members of this group are components of immunoglobulin, neuroglia, cell surface glycoproteins, such as, T-cell receptors, CD2, CD4, CD8, and membrane glycoproteins, such as, butyrophilin and chondroitin sulfate proteoglycan core protein. A predominant feature of most Ig domains is a disulfide bridge connecting the two beta-sheets with a tryptophan residue packed against the disulfide bond."[21]

"This domain [pfam13895] contains immunoglobulin-like domains."[22]

"Ig1_LILR_KIR_like: [cd05751] domain similar to the first immunoglobulin (Ig)-like domain found in Leukocyte Ig-like receptors (LILRs) and Natural killer inhibitory receptors (KIRs). This group includes LILRB1 (or LIR-1), LILRA5 (or LIR9), an activating natural cytotoxicity receptor NKp46, the immune-type receptor glycoprotein VI (GPVI), and the IgA-specific receptor Fc-alphaRI (or CD89). LILRs are a family of immunoreceptors expressed on expressed on T and B cells, on monocytes, dendritic cells, and subgroups of natural killer (NK) cells. The human LILR family contains nine proteins (LILRA1-3,and 5, and LILRB1-5). From functional assays, and as the cytoplasmic domains of various LILRs, for example LILRB1 (LIR-1), LILRB2 (LIR-2), and LILRB3 (LIR-3) contain immunoreceptor tyrosine-based inhibitory motifs (ITIMs) it is thought that LIR proteins are inhibitory receptors. Of the eight LIR family proteins, only LIR-1 (LILRB1), and LIR-2 (LILRB2), show detectable binding to class I MHC molecules; ligands for the other members have yet to be determined. The extracellular portions of the different LIR proteins contain different numbers of Ig-like domains for example, four in the case of LILRB1 (LIR-1), and LILRB2 (LIR-2), and two in the case of LILRB4 (LIR-5). The activating natural cytotoxicity receptor NKp46 is expressed in natural killer cells, and is organized as an extracellular portion having two Ig-like extracellular domains, a transmembrane domain, and a small cytoplasmic portion. GPVI, which also contains two Ig-like domains, participates in the processes of collagen-mediated platelet activation and arterial thrombus formation. Fc-alphaRI is expressed on monocytes, eosinophils, neutrophils and macrophages; it mediates IgA-induced immune effector responses such as phagocytosis, antibody-dependent cell-mediated cytotoxicity and respiratory burst."[23]

"IG domains [smart00410] that cannot be classified into one of IGv1, IGc1, IGc2, IG."[24]

A1BG protein species

Def. a "group of plants or animals having similar appearance"[25] or "the largest group of organisms in which [any][26] two individuals [of the appropriate sexes or mating types][26] can produce fertile offspring, typically by sexual reproduction"[27] is called a species.

The gene contains 20 distinct introns.[28] Transcription produces 15 different mRNAs, 10 alternatively spliced variants and 5 unspliced forms.[28] There are 4 probable alternative promoters, 4 non overlapping alternative last exons and 7 validated alternative polyadenylation sites.[28] The mRNAs appear to differ by truncation of the 5' end, truncation of the 3' end, presence or absence of 4 cassette exons, overlapping exons with different boundaries, splicing versus retention of 3 introns.[28]

Variants or isoforms

Def. a "different sequence of a gene (locus)"[29] is called a variant.

Def. any "of several different forms of the same protein, arising from either single nucleotide polymorphisms,[30] differential splicing of mRNA, or post-translational modifications (e.g. sulfation, glycosylation, etc.)"[31] is called an isoform.

Regarding additional isoforms, mention has been made of "new genetic variants of A1BG."[32]

"Proteomic analysis revealed that [a circulating] set of plasma proteins was α 1 B-glycoprotein (A1BG) and its post-translationally modified isoforms."[33]

Pharmacogenomic variants have been reported.[34]

Genotypes

Def. the "part (DNA sequence) of the genetic makeup of an organism which determines a specific characteristic (phenotype) of that organism"[35] or a "group of organisms having the same genetic constitution"[36] is called a genotype.

There are A1BG genotypes.[34]

The A1BG variant rs893184 is part of a genetic risk score.[34]

"A genetic risk score, including rs16982743, rs893184, and rs4525 in F5, was significantly associated with treatment-related adverse cardiovascular outcomes in whites and Hispanics from the INVEST study and in the Nordic Diltiazem study (meta-analysis interaction P=2.39×10−5)."[34]

Polymorphs

Def. the "regular existence of two or more different genotypes within a given species or population; also, variability of amino acid sequences within a gene's protein"[37] is called polymorphism.

Def. "one of a number of alternative forms of the same gene occupying a given position, [or locus],[38] on a chromosome"[39] is called an allele.

"rs893184 causes a histidine (His) to arginine (Arg) [nonsynonymous single nucleotide polymorphism (nsSNP), A (minor) for G (major)] substitution at amino acid position 52 in A1BG."[34]

"Genetic polymorphism of human plasma (serum) alpha 1B-glycoprotein (alpha 1B) was observed using one-dimensional horizontal polyacrylamide gel electrophoresis (PAGE) pH 9.0 of plasma samples followed by Western blotting with specific antiserum to alpha 1B."[40]

A1B*5 is a "new allele [...] of human plasma 𝜶1B-glycoprotein [...]."[41]

"Genetic polymorphism of human plasma 𝜶1B-glycoprotein (𝜶1B) was reported first, in brief, by Altland et al. [1983; also given in Altkand and Hacklar, 1984]. A detailed description of human 𝜶1B polymorphism was reported in subsequent studies [Gahne et al., 1987; Juneja et al., 1988, 1989]. Five different 𝜶1B alleles (A1B*1, A1B*2, A1B*3, A1B*4 and A1B*5) were reported. In Caucasian whites, the frequencies of A1B*1 and A1B*2 were about 0.95 and 0.05, respectively. A1B*4 was observed in 2 related Czech individuals. In American blacks, A1B*1 and A1B*2 occurred with a frequency of 0.73 and 0.21, respectively, while a new allele, viz, A1B*3 had a frequency of 0.06. A1B*5 was observed only in Swedish Lapps and in Finns with a frequency of 0.04 and 0.007, respectively."[42]

"The frequency of A1B*1 varied from 0.89 to 0.91 and that of A1B*2 from 0.08 to 0.10. The A1B*3 allele, reported previously only in American blacks, was observed with a frequency range of 0.003-0.01 in 3 of the Chinese populations, in Koreans and in Malays. A new 𝜶1B allele (A1B*6) was observed in 2 Chinese individuals."[42]

Phenotypes

Def. the "appearance of an organism based on a single trait [multifactorial combination of genetic traits and environmental factors][43], especially used in pedigrees"[44] or any "observable characteristic of an organism, such as its morphological, developmental, biochemical or physiological properties, or its behavior"[45] is called a phenotype.

"The three different phenotypes of α1B observed (designated 1-1, 1-2, and 2-2) were apparently identical to those reported by Altland et al. (1983), who used double one-dimensional electrophoresis. Family data supported the hypothesis that the three α1B phenotypes are determined by two codominant alleles at an autosomal locus, designated A1B. Allele frequencies in a Swedish population were: A1B *1, 0.937; A1B *2, 0.063; PIC, 0.111."[40]
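
The PIC value quoted above can be reproduced from the two allele frequencies. The following is a minimal sketch, assuming the commonly used polymorphism information content formula, PIC = 1 − Σp_i² − Σ 2p_i²p_j² over allele pairs (the quoted source does not state which formula was used), and assuming Hardy-Weinberg proportions for the expected phenotype frequencies. The function names are illustrative only.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content for a list of allele frequencies."""
    homozygosity = sum(p * p for p in freqs)
    # Correction term: summed over every unordered pair of alleles.
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - homozygosity - correction


def hardy_weinberg(p, q):
    """Expected frequencies of the 1-1, 1-2 and 2-2 phenotypes.

    Assumes two codominant alleles in Hardy-Weinberg proportions.
    """
    return {"1-1": p * p, "1-2": 2 * p * q, "2-2": q * q}


# A1B*1 and A1B*2 frequencies quoted for the Swedish population.
swedish = [0.937, 0.063]
print(round(pic(swedish), 3))
print(hardy_weinberg(*swedish))
```

Under these assumptions the computed PIC rounds to 0.111, in agreement with the quoted value.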

Protein species

"Both protein species of [alpha 1-beta glycoprotein] A1B (A1Ba, p = 0.008; f.c.= +1.62, A1Bb, p = 0.003; f.c. = +1.82) [...] were apparently overexpressed in patients with PTCa [...]."[46]

A1BG is mainly produced in the liver, and is secreted to plasma to levels of approximately 0.22 mg/mL.[18]

CRISPs

The human cysteine-rich secretory protein (CRISP3) "is present in exocrine secretions and in secretory granules of neutrophilic granulocytes and is believed to play a role in innate immunity."[47] CRISP3 has a relatively high content in human plasma.[47]

"The A1BG-CRISP-3 complex is noncovalent with a 1:1 stoichiometry and is held together by strong electrostatic forces."[47] "Similar [complex formation] between toxins from snake venom and A1BG-like plasma proteins ... inhibits the toxic effect of snake venom metalloproteinases or myotoxins and protects the animal from envenomation."[47]

Opossums have a remarkably robust immune system and show partial or total immunity to the venom of rattlesnakes, cottonmouths (Agkistrodon piscivorus), and other pit vipers (Crotalinae).[48][49]

"Crisp3 [is] mainly [expressed] in the salivary glands, pancreas, and prostate."[50] "CRISP3 is highly expressed in the human cauda epididymidis and ampulla of vas deferens (Udby et al. 2005)."[50]

ZNF497

Gene ID: 503538 is A1BG-AS1 A1BG antisense RNA 1.[51] A1BG-AS1 is transcribed in the negative direction from ZSCAN22.[51]

Gene ID: 162968 is ZNF497 zinc finger protein 497.[52] ZNF497 is transcribed in the positive direction from RNA5SP473.[52]

  1. NP_001193938.1 zinc finger protein 497: "Transcript Variant: This variant (2) lacks an alternate exon in the 5' UTR, compared to variant 1. Variants 1 and 2 encode the same protein."[52]
  2. NP_940860.2 zinc finger protein 497: "Transcript Variant: This variant (1) is the longer transcript. Variants 1 and 2 encode the same protein."[52]

Gene ID: 100419840 is LOC100419840 zinc finger protein 446 pseudogene.[53] LOC100419840 may be transcribed in the positive direction from LOC105372483.[53]

Gene ID: 105372483 is LOC105372483 uncharacterized LOC105372483 ncRNA.[54] LOC105372483 is transcribed in the negative direction from LOC100419840.[54]

Gene ID: 106479017 is RNA5SP473 RNA, 5S ribosomal pseudogene 473.[55] RNA5SP473 may be transcribed in the negative direction from ZNF497.[55]

19q13.43

Regulatory elements and regions

It may still be fair to say that in the present era of functional genomics, the challenge is to elucidate gene function, such as that of A1BG, along with its likely regulatory networks and signaling pathways.[56] "Since regulation of gene expression in vivo mainly occurs at the transcriptional level, identifying the location of genetic regulatory elements is a key to understanding the machinery regulating gene transcription. A major goal of current genome research is to identify the locations of all gene regulatory elements, including promoters, enhancers, silencers, insulators and boundary elements, and to analyze their relationship to the current annotation of human genes."[57][58] Although "many genome-wide strategies have been developed for identifying functional elements", "no method yet has the resolution to precisely identify all regulatory elements or can be readily applied to the entire human genome."[59]

There is one CRISPRi-validated cis-regulatory element on 19q13.43: Gene ID: 116286197 LOC116286197. There are also four Sharpr-MPRA regulatory regions: (1) Gene ID: 112553117 LOC112553117 Sharpr-MPRA regulatory region 1998, (2) Gene ID: 112553119 LOC112553119 Sharpr-MPRA regulatory region 10473, (3) Gene ID: 112577453 LOC112577453 Sharpr-MPRA regulatory region 7872, and (4) Gene ID: 112577454 Sharpr-MPRA regulatory region 9894.

Def. nucleotide "sequences, usually upstream, which are recognized by specific regulatory transcription factors, thereby causing gene response to various regulatory agents", [that] "may be found in both promoter and enhancer regions"[60] are called response elements.

DNase I hypersensitive sites

"This genomic region represents a DNase I hypersensitive site (DHS) that was predicted to be an enhancer by the ENCODE (ENCyclopedia Of DNA Elements) project based on various combinations of H3K27 acetylation and binding of p300, GATA1 and RNA polymerase II in K562 erythroleukemia cells. It was validated as a high-confidence cis-regulatory element for the ZNF582 (zinc finger protein 582) gene on chromosome 19 based on multiplex CRISPR/Cas9-mediated perturbation in K562 cells."[61]

Gene ID: 116286197 CRISPRi-validated cis-regulatory element chr19.6329 is at NC_000019.10 (56186901..56187499).[61]

Gene ID: 147948 ZNF582 is at NC_000019.10 (56382751..56393585, complement).[62] The CRISPRi-validated cis-regulatory element chr19.6329 is (56382751 - 56186901) = 195850 nts from the beginning of ZNF582.

Transcriptional regulatory regions

"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in K562 erythroleukemia cells (group: K562 Activating DNase unmatched - State 1:Tss, active promoter, TSS/CpG island region), with weaker activation in HepG2 liver carcinoma cells (group: HepG2 Activating DNase matched - State 1:Tss)."[63]

"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in HepG2 liver carcinoma cells (group: HepG2 Activating DNase matched - State 5:Enh, candidate strong enhancer, open chromatin). It also displayed weak repressive activity by Sharpr-MPRA in K562 erythroleukemia cells (group: K562 Repressive non-DNase unmatched - State 24:Quies, heterochromatin/dead zone)."[64]

"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in both HepG2 liver carcinoma cells (group: HepG2 Activating DNase unmatched - State 1:Tss, active promoter, TSS/CpG island region) and K562 erythroleukemia cells (group: K562 Activating DNase unmatched - State 1:Tss)."[65]

"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in K562 erythroleukemia cells (group: K562 Activating DNase unmatched - State 1:Tss, active promoter, TSS/CpG island region), with weaker activation in HepG2 liver carcinoma cells (group: HepG2 Activating DNase matched - State 1:Tss)."[66]

"The growth hormone-regulated transcription factors STAT5 and BCL6 coordinately regulate sex differences in mouse liver, primarily through effects in male liver, where male-biased genes are upregulated and many female-biased genes are actively repressed."[67] "CUX2, a highly female-specific liver transcription factor, contributes to an analogous regulatory network in female liver. Adenoviral overexpression of CUX2 in male liver induced 36% of female-biased genes and repressed 35% of male-biased genes. In female liver, CUX2 small interfering RNA (siRNA) preferentially induced genes repressed by adenovirus expressing CUX2 (adeno-CUX2) in male liver, and it preferentially repressed genes induced by adeno-CUX2 in male liver. CUX2 binding in female liver chromatin was enriched at sites of male-biased DNase hypersensitivity and at genomic regions showing male-enriched STAT5 binding. CUX2 binding was also enriched near genes repressed by adeno-CUX2 in male liver or induced by CUX2 siRNA in female liver but not at genes induced by adeno-CUX2, indicating that CUX2 binding is preferentially associated with gene repression. Nevertheless, direct CUX2 binding was seen at several highly female-specific genes that were positively regulated by CUX2, including A1bg [A1BG in humans], Cyp2b9, Cyp3a44, Tox [TOX in humans], and Trim24 [TRIM24 in humans]."[67]

ABA-response elements

"The key cis-elements in [non-yellow coloring 1] NYC1 promoter, namely, ABA-response element (ABRE) (ACGTG), ACGT, GCCcore (GCCGCC), and ethylene-inducible 3 [EIN3-Like1] (EIN3)/EIL1-binding sequence (T[TAG][GA]CGT[GA][TCA][TAG]), can be targeted by ABA insensitive 3 (ABI3), ABI5, and ABF2, 3, 4 in the ABA-signaling pathway [60,61]. GCCGCC and EIN3/EIL1-binding sequence (T[TAG][GA]CGT[GA][TCA][TAG]) are induced by ethylene-inducible TF and EIN3/EIL1 in the ethylene signaling pathway [61]. Therefore, ABA signaling is crucial for [chlorophyll] Chl b reductase activities to catalyze the Chl degradation, the first part of leaf senescence."[68]

Abf1 regulatory factors

Specific "sequences considered as exact Abf1 motif occurrences": CGTNNNNNACGA(C/T), CGTNNNNNA(C/T)GAC, CGTNNNNNA(C/T)GA(C/T), CGTNNNNN(A/G)(C/T)GA(C/T).[69]

Searching for a consensus Abf1 regulatory factor sequence, 3'-CGTNNNNNACGAT-5', with the browser's find function ("⌘F") locates 2 to 5 occurrences in the ZSCAN22 to A1BG direction and none between ZNF497 and A1BG, as can also be found by the computer programs.
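
A plain text search cannot match the variable positions (N, or alternatives such as C/T) in these motifs. A minimal sketch of one way to do so is shown here, assuming the motifs are written in the notation used above, with N for any nucleotide and (C/T) for alternatives. The helper names and the demo sequence are hypothetical, and this is not the computer program referred to in the text.

```python
import re


def consensus_to_regex(consensus):
    """Translate e.g. 'CGTNNNNNACGA(C/T)' into a regular expression."""
    # '(C/T)' becomes the character class '[CT]'.
    pattern = re.sub(
        r"\(([ACGT](?:/[ACGT])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )
    # 'N' stands for any of the four nucleotides.
    return pattern.replace("N", "[ACGT]")


def find_motif(consensus, sequence):
    """Return 1-based start positions of all matches, including overlaps."""
    # A lookahead lets overlapping occurrences be reported as well.
    regex = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() + 1 for m in regex.finditer(sequence.upper())]


ABF1_MOTIFS = [
    "CGTNNNNNACGA(C/T)",
    "CGTNNNNNA(C/T)GAC",
    "CGTNNNNNA(C/T)GA(C/T)",
    "CGTNNNNN(A/G)(C/T)GA(C/T)",
]

# Hypothetical demo sequence, not taken from the A1BG locus.
demo = "TTCGTAAAAAACGACGG"
for motif in ABF1_MOTIFS:
    print(motif, find_motif(motif, demo))
```

The search reads the sequence in one direction on one strand only; the other strand and direction would need the complement or inverse of the sequence.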

A boxes

Most bZIP proteins show high binding affinity for the ACGT motifs, which include [...] TACGTA (A box) [...].[70][71][72]

"The human TGF-β1 promoter region contains two binding sequences for AP-1, designated AP-1 box A (TGACTCT) and box B (TGTCTCA), which mediate the up-regulation of promoter activity after [High glucose] HG stimulation."[73]

There is one A box on the positive strand in the negative direction (from ZSCAN22 to A1BG): 3'-TGACTCT-5' at 2788.

There is one A box complement on the negative strand in the negative direction: 3'-ACTGAGA-5' at 2788.

There is one A box inverse complement on the negative strand in the positive direction: 3'-AGAGTCA-5' at 2613.

There is one A box inverse on the positive strand in the positive direction: 3'-TCTCAGT-5' at 2613.
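
The four forms listed above (the sequence itself, its complement, its inverse, and its inverse complement) can be derived from AP-1 box A as follows. This is a minimal sketch with illustrative function names, assuming an unambiguous A, C, G, T alphabet.

```python
# Watson-Crick pairing for an unambiguous DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-pair partner of each nucleotide, same reading order."""
    return seq.translate(COMPLEMENT)


def inverse(seq):
    """The same nucleotides read in the opposite order."""
    return seq[::-1]


def inverse_complement(seq):
    """Complement read in the opposite order."""
    return complement(seq)[::-1]


a_box = "TGACTCT"  # AP-1 box A, as quoted above
print(a_box, complement(a_box), inverse(a_box), inverse_complement(a_box))
```

For TGACTCT this gives ACTGAGA, TCTCAGT, and AGAGTCA, the same strings reported at positions 2788 and 2613.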

Abscisic acid-responsive elements

Abscisic acid-responsive elements (CACGTG).[74]

ACA boxes

The "3' end of mature hTR (45) has an ACA trinucleotide 3 nt upstream of its 3' end. In addition, the 3' region of hTR contains a single H box consensus sequence (5'-AGAGGA-3')."[75]

ACGT-containing elements

The consensus sequence for the ACGT-containing elements (ACEs) is 5'-CACGT-3'.[76]

  1. ACGT elements, negative strand, negative direction: 24 between 150 and 4338 nts.
  2. ACGT elements, negative strand, positive direction: 2, 3'-ACGT-5' at 569, 3'-ACGT-5' at 3254.
  3. ACGT elements, positive strand, negative direction: 4, 3'-ACGT-5' at 342, 3'-ACGT-5' at 531, 3'-ACGT-5' at 1772, 3'-ACGT-5' at 4236.
  4. ACGT elements, positive strand, positive direction: 44 between 192 and 4341 nts.

ACGT-containing elements include these metal responsive elements:

  1. complement, negative strand, negative direction: 6 between 1348 and 4341 nts.
  2. complement, positive strand, negative direction: 6 between 549 and 3323 nts.
  3. inverse, negative strand, negative direction: 2, 3'-CTCACGT-5' at 1470, 3'-CACACGT-5' at 2863.
  4. inverse, positive strand, negative direction: 2, 3'-CACACGT-5' at 531, 3'-CTCACGT-5' at 1772.
  5. inverse, positive strand, positive direction: 6 between 546 and 3883 nts.

ACGT-containing elements include these cAMP response elements (CRE):

  1. negative strand in the negative direction (from ZSCAN22 to A1BG): 1, 3'-TGACGTCA-5' at 4317.
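
The CRE sequence TGACGTCA and the ACGT core are palindromic in the DNA sense: each equals its own inverse complement. A minimal sketch of that check follows, with illustrative function names. It concerns the sequences alone and makes no claim about the occurrence counts listed above.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement of the sequence, read in the opposite order."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True when a sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


examples = [
    ("ACGT core", "ACGT"),
    ("cAMP response element", "TGACGTCA"),
    ("metal responsive element (inverse)", "CTCACGT"),
]
for name, seq in examples:
    print(name, seq, reverse_complement(seq), is_palindromic(seq))
```

By contrast, a non-palindromic element such as CTCACGT has a distinct reverse complement, which is why its complement and inverse forms are listed separately.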

Activating protein 2

Consensus sequences for the Activating protein 2 (AP-2) are TCTTCCC, CTCCCA and GGCCAA.[77]

Consensus sequences for the Activating protein 2 (AP-2) are GCCTGGCC and TCCCCCGCCC.[78]

Activating transcription factors

"ATF4 regulates transcription of its target genes through the formation of homodimers or heterooligomers with the transcription factors Jun, AP-1 and C/EBP38,39 that bind to CARE (C/EBP-ATF) responsive elements having the consensus sequence XTTXCATCA (where X = G, A or T).39 In the region from -625 to -618 bp relative to the SESN2 translation start codon (from -228 to -221 bp relative to the transcription start site) we found a candidate sequence for the ATF4 binding site TTTTCATCA."[79]

"The ATF4 binding consensus sequence has been reported as (G/A/C)TT(G/A/T)C(G/A)TCA (38), which matches the ChIP-seq data."[80]

Adr1ps

The upstream activating sequence (UAS) for Adr1p is 5'-TTGGGG-3' or 5'-TTGG(A/G)G-3'.[81]

Copying 5'-TTGGGG-3' in "⌘F" yields six between ZSCAN22 and A1BG and one between ZNF497 and A1BG as can be found by the computer programs.

Aft1ps

The upstream activating sequence (UAS) for Aft1p is 5'-PyPuCACCCPu-3' or 5'-(C/T)(A/G)CACCC(A/G)-3'.[81]

Copying 5'-TGCACCC-3' in "⌘F" yields none between ZSCAN22 and A1BG and one 5'-TGCACCCG-3' between ZNF497 and A1BG as can be found by the computer programs.
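
A plain-text search only finds one spelling of a degenerate consensus at a time. As a sketch, the bracket notation can be translated into a regular expression so that all spellings are tested at once; `consensus_to_regex` is a hypothetical helper, and the expansion of Py and Pu follows the two forms of the Aft1p consensus given above.

```python
import re

def consensus_to_regex(consensus, wildcards=None):
    """Translate '(C/T)(A/G)CACCC(A/G)' style notation into a regex string."""
    # Expand named wildcards first, e.g. Py -> (C/T).
    for symbol, expansion in (wildcards or {}).items():
        consensus = consensus.replace(symbol, expansion)
    # Turn each parenthesised list of alternatives into a character class.
    return re.sub(
        r"\(([ACGT](?:/[ACGT])*)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )

aft1 = consensus_to_regex("PyPuCACCCPu", {"Py": "(C/T)", "Pu": "(A/G)"})
print(aft1)                                # [CT][AG]CACCC[AG]
print(bool(re.fullmatch(aft1, "TGCACCCG")))  # True
```

The second line confirms that the 5'-TGCACCCG-3' occurrence reported above fits the full eight-nucleotide consensus.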

AGC boxes

An inverse AGC box occurs negative strand, negative direction, 3'-CCGCCGA-5' at 1754 nts from ZSCAN22 toward A1BG in the distal promoter with its complement on the positive strand, negative direction.

The GCC box is the same as the AGC box.

Alpha-amylase conserved elements

Alpha-amylase conserved elements (TATCCATCCATCC).[74]

Amino acid response elements

There is only one nucleotide difference between the SESN2 gene CARE and the amino acid response element (AARE) in the pseudokinase gene TRIB3 with the consensus sequence (TTTGCATCA).[79]
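
The single-nucleotide difference can be checked by counting mismatched positions. This is a small illustrative sketch: the SESN2 sequence is the candidate ATF4 binding site TTTTCATCA quoted under Activating transcription factors, and `hamming` is a hypothetical helper name.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

sesn2_care = "TTTTCATCA"  # candidate ATF4 site in SESN2
trib3_aare = "TTTGCATCA"  # AARE consensus in TRIB3
print(hamming(sesn2_care, trib3_aare))  # 1
```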

Androgen response elements

Androgen response elements (AREs).

Angiotensinogen core promoter elements

The consensus sequence is 3'-A/C-T-C/T-5'.[82] The core nucleotides for AGCE1 include 3'-A/C-T-C/T-G-T-G-5', "located between the TATA box and transcription initiation site (positions −25 to −1) is an authentic regulator of human AG transcription."[83]

  1. AGCE, negative strand, negative direction, looking for 3'-A/C-T-C/T-G-T-G-5': 4 between 340 and 3914 nts and complements.
  2. AGCE, negative strand, positive direction, looking for 3'-A/C-T-C/T-G-T-G-5': 2, 3'-ATTGTG-5' at 2679, 3'-CTCGTG-5' at 4376 and complements.
  3. AGCE, positive strand, positive direction, looking for 3'-A/C-T-C/T-G-T-G-5': 6 between 855 and 3739 and complements.
  4. AGCEci, negative strand, negative direction, looking for 3'-C-A-C-A/G-A-G/T-5': 2, 3'-CACGAT-5' at 336, 3'-CACGAG-5' at 4403 and complements.
  5. AGCEci, negative strand, positive direction, looking for 3'-C-A-C-A/G-A-G/T-5': 1, 3'-CACGAG-5' at 243.
  6. AGCEci, positive strand, negative direction, looking for 3'-C-A-C-A/G-A-G/T-5': 10 between 435 and 4472 and complements.
  7. AGCEci, positive strand, positive direction, looking for 3'-C-A-C-A/G-A-G/T-5': 3 between 107 and 3152 and complements.
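
The AGCEci search pattern follows from the AGCE1 core by complementing each position and then reversing the order. A minimal sketch of that derivation, assuming the hyphen-and-slash notation used in the list above; `parse`, `fmt`, `complement`, and `inverse` are hypothetical helper names.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def parse(pattern):
    """Split 'A/C-T-C/T-G-T-G' into a list of per-position base sets."""
    return [set(position.split("/")) for position in pattern.split("-")]

def fmt(positions):
    """Write per-position base sets back in hyphen-and-slash notation."""
    return "-".join("/".join(sorted(p)) for p in positions)

def complement(positions):
    """Complement every allowed base at every position."""
    return [{COMPLEMENT[b] for b in p} for p in positions]

def inverse(positions):
    """Reverse the order of the positions."""
    return positions[::-1]

agce = parse("A/C-T-C/T-G-T-G")
print(fmt(inverse(complement(agce))))  # C-A-C-A/G-A-G/T
```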

Antioxidant/electrophile responsive elements

"The transcription factor Nrf2 (nuclear factor erythroid 2 p45‐related factor 2) regulates the expression of genes involved in cellular protection against damage by oxidants, electrophiles, and inflammatory agents, and in the maintenance of mitochondrial function, cellular redox, and protein homeostasis [1]. Nrf2 protein comprises seven functional domains termed Nrf2‐ECH homology (Neh) 1–7 domains [...]."[84]

"At homeostatic conditions, Nrf2 is a short‐lived protein. Under stress conditions, Nrf2 is stabilized and translocates to the nucleus, where it binds (as a heterodimer with a member of the small Maf family of transcription factors) to the ARE/EpRE sequences in the promoter of its target genes, and activates their transcription. Nrf2 targets include genes that encode detoxification, antioxidant, and anti‐inflammatory proteins as well as proteins involved in the regulation of autophagy and clearance of damaged proteins, such as proteasomal subunits [9-11]."[84]

"Neh1 is responsible for the formation of a heterodimer with small musculoaponeurotic fibrosarcoma (sMaf) proteins, and mediates binding to antioxidant/electrophile response element (ARE/EpRE) sequences in the promoter regions of Nrf2 target genes."[84]

"A meta‐analysis of [Parkinson's disease] PD and [Alzheimer's disease] AD microarray datasets identified 31 common downregulated genes containing the ARE/EpRE consensus sequence in their promoters, in addition to increased levels of Nrf2 [27]."[84]

"Nrf2 binds an upstream response element in the frataxin locus, and the anesthetic dyclonine has been shown to activate Nrf2, increase the mRNA and protein levels of frataxin and rescue frataxin‐dependent enzyme deficiencies in the iron‐sulfur enzymes aconitase and succinate dehydrogenase [54]."[84]

The "Nrf2-sMaf heterodimer recognizes DNA sequences referred to as the antioxidant/electrophile responsive element (ARE/EpRE)".[85] "We have compared these binding sequences and found that they show a common consensus sequence, 5′-(A/G)TGA(G/C)nnnGC-3′, but these recognition elements are partially distinct from the element bound by Maf homodimers."[85]

ATA boxes

Core promoters

There is the following inverse ATA box on the negative strand, negative direction: 1, 3'-AAATAA-5' at 4537 inside A1BG as the TSS is at 4460 nts from ZSCAN22.

Proximal promoters

There is the following inverse ATA box on the positive strand, negative direction: 3'-AAATAA-5' at 4221.

There is one inverse and inverse complement between 4050 and 4300 in the positive direction: 3'-AAATAA-5' at 4142, and 3'-TTTATT-5' at 4142.

Distal promoters

There is the following ATA box on the negative strand in the negative direction: 1, 3'-AATAAA-5' at 1726 nts from ZSCAN22.

There are the following ATA boxes on the positive strand in the negative direction: 3, 3'-AATAAA-5' at 3014, 3'-AATAAA-5' at 3335, and 3'-AATAAA-5' at 4072.

There are the following inverse ATA boxes on the positive strand, negative direction: 4, 3'-AAATAA-5' at 3013, 3'-AAATAA-5' at 3334, 3'-AAATAA-5' at 4071, 3'-AAATAA-5' at 4075.

There is the following ATA box on the negative strand in the positive direction: 1, 3'-AATAAA-5' at 3427. It has a complement on the positive strand in the positive direction: 1, 3'-TTATTT-5' at 3427.

There is another inverse complement ATA box on the negative strand in the positive direction in distal promoter: 3'-TTTATT-5' at 2347. It also has an inverse in the distal promoter: 3'-AAATAA-5' at 2347.

Auxin response factors

The "genome binding of two [auxin response factors] ARFs (ARF2 and ARF5/Monopteros [MP]) differ largely because these two factors have different preferred ARF binding site (ARFbs) arrangements (orientation and spacing)."[86] "ARFbs were originally defined as TGTCTC (Ulmasov et al., 1995, Guilfoyle et al., 1998), [...]. More recently, protein binding microarray (PBM) experiments suggested that TGTCGG are preferred ARFbs, [...] (Boer et al., 2014, Franco-Zorrilla et al., 2014, Liao et al., 2015)."[86]

A more general consensus sequence may be 1(C/G/T)-2N-3(G/T)-4G-5(C/T)-6(C/T)-7N-8N-9N-10N, where ARF2[b] is 1(C/G/T)-2(A/C/T)-3(G/T)-4G-5(C/T)-6(C/T)-7(G/T)-8(C/G)-9(A/C/T)-10(A/G/T) and ARF5/MP[b] is 1(C/G/T)-2N-3(G/T)-4G-5T-6C-7(G/T)-8N-9-10N.[86] ARF1[b] has 4G.[86]

B boxes

While there appear to be at least two B boxes, TGGGCA is one B-box,[87] where the "mP2 EB fragment used for binding was the 118 nucleotide fragment extending from the Dde I site at position -140 to the Dde I site at position -23 [...]. This fragment contains the GC, E, B, CAAT, and TATA boxes."[87]

  1. negative strand in the positive direction, looking for 3'-TGGGCA-5', 4 between 27 and 4180 and complements.
  2. positive strand in the negative direction, looking for 3'-TGGGCA-5', 9 between 462 and 4191 and complements.
  3. inverse complement, negative strand, positive direction, looking for 3'-TGCCCA-5', 2, 3'-TGCCCA-5' at 3237, 3'-TGCCCA-5' at 3377 and complements.
  4. inverse complement, positive strand, negative direction, looking for 3'-TGCCCA-5', 4 between 1458 and 4251 and complements.
  5. inverse complement, positive strand, positive direction, looking for 3'-TGCCCA-5', 1, 3'-TGCCCA-5' at 3750 and complement.

The other is associated with the human transforming growth factor b1 binding sequences[88] and has the consensus sequence 3'-TGTCTCA-5'. Let it be designated B1box.

  1. negative strand in the negative direction, looking for 3'-TGTCTCA-5', 2, 3'-TGTCTCA-5' at 1075, 3'-TGTCTCA-5' at 2445 and complements.
  2. negative strand in the positive direction, looking for 3'-TGTCTCA-5', 2, 3'-TGTCTCA-5' at 2174, 3'-TGTCTCA-5' at 2468 and complements.
  3. positive strand in the negative direction, looking for 3'-TGTCTCA-5', 5 between 923 and 4373 and complements.
  4. inverse complement, negative strand, negative direction, looking for 3'-TGAGACA-5', 3 between 919 and 2029 and complements.
  5. inverse complement, positive strand, positive direction, looking for 3'-TGAGACA-5', 1, 3'-TGAGACA-5' at 2308 and complement.

B recognition elements

The factor II B recognition element is BREu.

Negative strand in the negative direction there are 3: 3'-CCACGCC-5' at 380, 3'-CCGCGCC-5' at 1762, and 3'-CCACGCC-5' at 2197 in the distal promoter.

Complement, negative strand, negative direction there is 1: 3'-CCTGCGG-5' at 1153.

Inverse complement, positive strand, negative direction there are 4: 3'-GGCGTGG-5' at 1244, 3'-GGCGCGG-5' at 1762, 3'-GGCGTGG-5' at 1897, and 3'-GGCGTGG-5' at 3047.

Negative strand in the positive direction there are 3: 3'-GCACGCC-5', 1302, 3'-GGACGCC-5', 1672, 3'-GGGCGCC-5', 1769.

Positive strand in the positive direction there are 3: 3'-CCACGCC-5', 489, 3'-CGACGCC-5', 1033, 3'-CCACGCC-5', 1764.

Inverse complement, negative strand, positive direction there is 1: 3'-GGCGCCC-5', 1770.

Inverse complement, positive strand, positive direction there are 4: 3'-GGCGCGC-5', 682, 3'-GGCGCCG-5', 1338, 3'-GGCGCCG-5', 1438, 3'-GGCGTGG-5', 2566.

CadC binding domains

"Altogether, the specific contacts observed suggest a consensus binding motif of 5′-T-T-A-x-x-x-x-T-3′."[89] "Dimerization of CadC enables the binding of two DBDs to the two Cad1 consensus target sites."[89] "The DNA consensus sequence 5′-T-T-A-x-x-x-x-T-3′ is present once in the quasi-palindromic Cad1 17-mer DNA, consistent with the formation of a 1:1 complex. However, a second consensus facilitates the formation of the 2:1 complex of CadC with Cad1 41-mer DNA as evidenced by the CadC model with the minimal Cad1 26-mer DNA that spans the two AT-rich regions, i.e. consensus sites."[89]

Calcium-response elements

The calcium-responsive transcription factor (CaRF, also known as amyotrophic lateral sclerosis 2 chromosomal region candidate gene 8 protein) acts as a transcriptional activator that mediates the calcium- and neuron-selective induction of BDNF exon III transcription and binds to the consensus calcium-response element CaRE1 5'-CTATTTCGAG-3' sequence.[90]

CAREs

A CARE occurs in the negative direction: 3'-CAACTC-5' at 86 possibly associated with ZSCAN22. But inverse CAREs occur 3'-CTCAAC-5' at 1406, 3'-CTCAAC-5' at 2592, 3'-CTCAAC-5' at 2704, 3'-CTCAAC-5' at 3115, and 3'-CTCAAC-5' at 4096.

A CARE occurs in the positive direction: 3'-CAACTC-5' at 3292. But inverse CAREs occur: 3'-CTCAAC-5' at 1406, 3'-CTCAAC-5' at 1621, and 3'-CTCAAC-5' at 3290.

CArG boxes

"RIN [Ripening Inhibitor] binds to DNA sequences known as the CA/T-rich-G (CArG) box, which is the general target of MADS box proteins (Ito et al., 2008)."[91]

There is a more general CArG box, 3'-CATTAAAAGG-5', at 3441 from ZSCAN22, or -1019 nts from the TSS of A1BG in the negative direction on the positive strand in the distal promoter.

A second more general CArG box, 3'-CAAAAAAAAG-5', at 1399 from ZSCAN22, or -3061 nts from the A1BG TSS may be a CArG box for ZSCAN22 in the negative direction on the positive strand in the distal promoter.
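
The TSS-relative coordinates follow from subtracting the TSS position from the position counted from ZSCAN22. A short sketch of that arithmetic, using the A1BG TSS at 4460 nts from ZSCAN22 as stated elsewhere on this page; `relative_to_tss` is a hypothetical helper name.

```python
A1BG_TSS = 4460  # TSS position in nts from ZSCAN22, negative direction

def relative_to_tss(position, tss=A1BG_TSS):
    """Convert a position counted from ZSCAN22 into a TSS-relative offset."""
    return position - tss

print(relative_to_tss(3441))  # -1019
print(relative_to_tss(1399))  # -3061
```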

C boxes

Most bZIP proteins show high binding affinity for the ACGT motifs, which include [...] GACGTC (C box) [...].[70][71][72]

Proximal promoters

Inverse complement, negative strand, negative direction there is 1: 3'-ACATCA-5', 4124.

There is one C box 3'-ACATCA-5' at 4116 nts in the positive direction.

Distal promoters

There are four C boxes: 3'-AGTAGT-5' at 2888, 3'-AGTAGT-5' at 2944, 3'-AGTAGT-5' at 3418, and 3'-AGTAGT-5' at 3521 on the negative strand in the negative direction and their complements on the positive strand.

Inverse complement, negative strand, negative direction there are 2: 3'-ACATCA-5', 2340, 3'-ACATCA-5', 2541.

There is one complement C box: 3'-TCATCA-5' at 3251 on the negative strand in the positive direction and its complement on the positive strand.

Inverse, negative strand, positive direction, there is 1: 3'-TGATGA-5', 2144.

Positive strand in the positive direction there is 1: 3'-AGTAGT-5', 3251.

C-boxes

Analysis "of the recombinant (soybean [Glycine max] TGACG-motif binding factor 1) STF1 protein revealed the C-box (nGACGTCn) to be a high-affinity binding site (Cheong et al., 1998). [...] To test whether STF1 and HY5 have similar DNA-binding properties, the binding properties of each were compared with eight different DNA sequences that represent G-, C-, and C/G-box motifs [TGACGTGT]. C-box sequences carrying the mammalian cAMP responsive element (CRE; TGACGTCA) motif and the Hex sequence (TGACGTGGC), a hybrid C/G-box (Cheong et al., 1998), were high-affinity binding sites for both proteins [...]."[92]

C-boxes are TCTTACGTCATC, AATGACGTCGAA, TCTCACGTGTGG, TTTGACGTGTGA, GATGACGTCATC, and AGAGACGTCAAC for an apparent consensus sequence of (A/G/T)(A/C/G/T)(A/T)(C/G/T)ACGT(C/G)(A/G/T)(A/G/T)(A/C/G).[92]
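
The apparent consensus can be reproduced by collecting, position by position, the bases seen across the six aligned sequences. The sketch below assumes the sequences are already aligned and of equal length; `apparent_consensus` is a hypothetical helper name.

```python
C_BOXES = [
    "TCTTACGTCATC",
    "AATGACGTCGAA",
    "TCTCACGTGTGG",
    "TTTGACGTGTGA",
    "GATGACGTCATC",
    "AGAGACGTCAAC",
]

def apparent_consensus(sequences):
    """List the bases observed at each aligned position."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned and of equal length")
    parts = []
    for column in zip(*sequences):
        bases = sorted(set(column))
        # Invariant positions are written bare, variable ones in brackets.
        parts.append(bases[0] if len(bases) == 1 else "(" + "/".join(bases) + ")")
    return "".join(parts)

print(apparent_consensus(C_BOXES))
# (A/G/T)(A/C/G/T)(A/T)(C/G/T)ACGT(C/G)(A/G/T)(A/G/T)(A/C/G)
```

A consensus built this way records only which bases occur, not how often, so a base seen once counts the same as one seen in every sequence.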

GAGGCCATCT is a C-box, [...].[87]

The human ribosomal protein L11 gene (HRPL11) has [...] two potential snRNA-coding sequences in intron 4: the C box beginning at +4131 (GGTGATG), [...] a D box beginning at +4237 (TCCTG), [...].[93]

CCAAT-box-binding transcription factors

CAAT boxes: CCAAT-box-binding transcription factor, TGGCA-binding protein are used by some nuclear factors.[94]

The consensus sequence for the CCAAT-enhancer-binding site (C/EBP) is TAGCATT.[77]

"TTAGGACAT is the C/EBP box".[95]

"ATF4 regulates transcription of its target genes through the formation of homodimers or heterooligomers with the transcription factors Jun, AP-1 and C/EBP38,39 that bind to CARE (C/EBP-ATF) responsive elements having the consensus sequence XTTXCATCA (where X = G, A or T).39 In the region from -625 to -618 bp relative to the SESN2 translation start codon (from -228 to -221 bp relative to the transcription start site) we found a candidate sequence for the ATF4 binding site TTTTCATCA."[79]

"The ATF4 binding consensus sequence has been reported as (G/A/C)TT(G/A/T)C(G/A)TCA (38), which matches the ChIP-seq data."[80]

Cell cycle regulation

Cell cycle regulation (CCCAACGGT).[74]

CGCG boxes

Negative strand in the negative direction there are 2: 3'-GCGCGT-5', 161, 3'-CCGCGC-5', 1761, in the distal promoter.

Positive strand in the negative direction there is 1: 3'-GCGCGG-5', 1762, in the distal promoter.

Negative strand in the positive direction there are 8: between 543 and 1650, in the distal promoter.

Positive strand in the positive direction there are 22: between 161 and 1769, in the distal promoter.

Circadian control elements

Circadian control elements (CAANNNNATC).[74]
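
A consensus containing N cannot be found with a literal text search. One possible approach, sketched here in Python, is to expand the standard IUPAC nucleotide codes into regular-expression character classes; `iupac_to_regex` is a hypothetical helper and the test sequence is invented for illustration.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(motif):
    """Expand each IUPAC code in the motif into its character class."""
    return "".join(IUPAC[base] for base in motif.upper())

circadian = iupac_to_regex("CAANNNNATC")
print(bool(re.fullmatch(circadian, "CAAGTCAATC")))  # True
```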

Cold-responsive elements

A "putative cold-responsive element (CRE) [...] is specified by a conserved 5-bp core sequence (CCGAC) typical for C-repeat (CRT)/dehydration-responsive elements (DRE) that are recognized by cold-specific transcription factors (TFs) [16]."[96]

CRE boxes

"Within the cAMP-responsive element of the somatostatin gene, we observed an 8-base palindrome, 5'-TGACGTCA-3', which is highly conserved in many other genes whose expression is regulated by cAMP."[97]

Negative strand in the negative direction there is 1: 3'-TGACGTCA-5', 4317, and its complement in the proximal promoter.

The upstream activating sequence (UAS) for the Aca1p, the basic "leucine zipper (bZIP) transcription factor [55] involved in carbon source utilization" is 5'-TGACGTCA-3'[81] the same as a CRE.

The upstream activating sequence (UAS) for the Sko1p, involved "in osmotic and oxidative stress responses" is 5'-TGACGTCA-3'[81] the same as a CRE.

CTCFs

"Experiments using chromatin immunoprecipitation exonuclease (ChIP-exo) uncovered a broad CTCF-binding motif that contains a 12–15 bp consensus sequence, 5′-NCA-NNA-G(G/A)N-GGC-(G/A)(C/G)(T/C)-3′ (Nakahashi et al., 2013, Rhee and Pugh, 2011) [...]."[98]

DAF-16 binding elements

"DAF-16 binding element (DBE), GTAAACA or TGTTTAC, and DAF-16-associated element (DAE), TGATAAG or CTTATCA, enriched in DAF-16 regulated genes [2, 4, 13, 14]. The DBE was recognized by DAF-16, and the DAE by transcription factor PQM-1 [13]."[99] Note: GTAAACA is the inverse complement (ic) of TGTTTAC and TGATAAG is the ic of CTTATCA.

DNA replication-related elements

"The promoters of Drosophila genes encoding DNA replication-related proteins contain transcription regulatory elements consisting of an 8-bp palindromic DNA replication-related element (DRE) sequence (5′-TATCGATA)."[100]

Copying the consensus of the DRE: 5'-TATCGATA-3' and putting the sequence in "⌘F" finds no locations for this sequence in any A1BG direction as can be found by the computer programs.

D boxes

There is one D box in the distal promoter: 3'-AGTCTG-5' at 2947 on the negative strand in the negative direction and its complement on the positive strand.

Positive strand in the negative direction there is 1: 3'-AGTCTG-5', 1355.

Inverse complement, positive strand, negative direction there are 2: 3'-CAGACT-5', 15, 3'-CAGACT-5', 1616.

There is one D box in the distal promoter: 3'-AGTCTG-5' at 3923 on the negative strand in the positive direction and its complement on the positive strand.

Inverse complement, negative strand, positive direction there are 2: 3'-CAGACT-5', 1744, 3'-CAGACT-5', 2416.

Inverse complement, positive strand, positive direction there are 3: 3'-CAGACT-5', 2943, 3'-CAGACT-5', 3006, 3'-CAGACT-5', 3924.

D-box (TGAGTGG).[101]

Defense and stress-responsive elements

Defense and stress-responsive elements (ATTTTCTTCA).[74]

Copying the consensus of the defense and stress-responsive element: 5'-ATTTTCTTCA-3' and putting the sequence in "⌘F" finds no locations for this sequence in any A1BG direction as can be found by the computer programs.

Downstream B recognition elements

  1. negative strand in the negative direction, looking for 3'-A/G-T-A/G/T-G/T-G/T-G/T-G/T-5', 59: between 68 and 4458 and their complements.
  2. negative strand in the positive direction, looking for 3'-A/G-T-A/G/T-G/T-G/T-G/T-G/T-5', 11: between 56 and 4397 and their complements.
  3. positive strand in the negative direction, looking for 3'-A/G-T-A/G/T-G/T-G/T-G/T-G/T-5', 31: between 43 and 4110 and their complements.
  4. positive strand in the positive direction, looking for 3'-A/G-T-A/G/T-G/T-G/T-G/T-G/T-5', 19: between 72 and 4328 and their complements.
  5. inverse, negative strand, negative direction, is SuccessablesdBREi--.bas, looking for 3'-G/T-G/T-G/T-G/T-A/G/T-T-A/G-5': 44 between 230 and 4454 and their complements.
  6. inverse, negative strand, positive direction, is SuccessablesdBREi-+.bas, looking for 3'-G/T-G/T-G/T-G/T-A/G/T-T-A/G-5', 16: between 59 and 4398 and their complements.
  7. inverse, positive strand, negative direction, is SuccessablesdBREi+-.bas, looking for 3'-G/T-G/T-G/T-G/T-A/G/T-T-A/G-5', 16: between 217 and 3945 and their complements.
  8. inverse, positive strand, positive direction, is SuccessablesdBREi++.bas, looking for 3'-G/T-G/T-G/T-G/T-A/G/T-T-A/G-5', 14: between 72 and 4287 and their complements.

Downstream core elements

In the negative direction on the negative strand, the A1BG transcription start site is at 4460 nucleotides from the last nucleotide of the gene ZSCAN22. In the positive direction on the negative strand, the A1BG transcription start site is at 4300 from well within the gene ZNF497. Downstream core elements are expected downstream of these TSSs. Occurrences before the TSSs can be found on Downstream core element gene transcriptions.

  1. positive strand, negative direction, looking for DCE SI: 3'-CTTC-5' at 4528.
  1. negative strand, negative direction, looking for DCE SII: 3'-CTGT-5', 2, 3'-CTGT-5' at 4468, 3'-CTGT-5' at 4507.
  2. negative strand, positive direction, looking for DCE SII: 3'-CTGT-5', 1, 3'-CTGT-5' at 4392.
  3. positive strand, positive direction, looking for DCE SII: 3'-CTGT-5', 1, 3'-CTGT-5' at 4332.
  1. negative strand, positive direction, looking for DCE SIII: 3'-AGC-5', 1, 3'-AGC-5' at 4352.
  2. positive strand, negative direction, looking for DCE SIII: 3'-AGC-5', 3, 3'-AGC-5' at 4480, 3'-AGC-5' at 4489, 3'-AGC-5' at 4520.
  3. positive strand, positive direction, looking for DCE SIII: 3'-AGC-5', 1, 3'-AGC-5' at 4374.

Complements

  1. negative strand, negative direction, looking for DCE SIc: 3'-GAAG-5', 1, 3'-GAAG-5' at 4528.
  1. negative strand, positive direction, looking for DCE SIIc: 3'-GACA-5', 1, 3'-GACA-5' at 4332.
  2. positive strand, negative direction, looking for DCE SIIc: 3'-GACA-5', 2, 3'-GACA-5' at 4468, 3'-GACA-5' at 4507.
  3. positive strand, positive direction, looking for DCE SIIc: 3'-GACA-5', 1, 3'-GACA-5' at 4392.
  1. negative strand, negative direction, looking for DCE SIIIc: 3'-TCG-5', 3, 3'-TCG-5' at 4480, 3'-TCG-5' at 4489, 3'-TCG-5' at 4520.
  2. negative strand, positive direction, looking for DCE SIIIc: 3'-TCG-5', 1, 3'-TCG-5' at 4374.
  3. positive strand, positive direction, looking for DCE SIIIc: 3'-TCG-5', 1, 3'-TCG-5' at 4352.

Inverse complements

  1. looking for DCE SIci: 3'-GAAG-5', same as the complements.
  1. positive strand, negative direction, looking for DCE SIIci: 3'-ACAG-5', 1, 3'-ACAG-5' at 4517.
  2. positive strand, positive direction, looking for DCE SIIci: 3'-ACAG-5', 1, 3'-ACAG-5' at 4366.
  1. negative strand, negative direction, looking for DCE SIIIci: 3'-GCT-5', 1, 3'-GCT-5' at 4471.
  2. negative strand, positive direction, looking for DCE SIIIci: 3'-GCT-5', 4, 3'-GCT-5' at 4312, 3'-GCT-5' at 4321, 3'-GCT-5' at 4372, 3'-GCT-5' at 4390.
  3. positive strand, positive direction, looking for DCE SIIIci: 3'-GCT-5', 1, 3'-GCT-5' at 4356.

Inverses

  1. looking for DCE SIi: 3'-CTTC-5', same as the direct transcript.
  1. negative strand, negative direction, looking for DCE SIIi: 3'-TGTC-5', 1, 3'-TGTC-5' at 4517.
  2. negative strand, positive direction, looking for DCE SIIi: 3'-TGTC-5', 1, 3'-TGTC-5' at 4366.
  1. negative strand, positive direction, looking for DCE SIIIi: 3'-CGA-5', 1, 3'-CGA-5' at 4356.
  2. positive strand, negative direction, looking for DCE SIIIi: 3'-CGA-5', 1, 3'-CGA-5' at 4471.
  3. positive strand, positive direction, looking for DCE SIIIi: 3'-CGA-5', 4, 3'-CGA-5' at 4312, 3'-CGA-5' at 4321, 3'-CGA-5' at 4372, 3'-CGA-5' at 4390.

Downstream promoter elements

  1. negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesDPE--.bas, looking for 3'-A/G-G-A/T-C/T-A/C/G-5', 163: between 35 and 4546, and their complements.
  2. negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesDPE-+.bas, looking for 3'-A/G-G-A/T-C/T-A/C/G-5', 73: between 37 and 4420, and their complements.
  3. positive strand in the negative direction is SuccessablesDPE+-.bas, looking for 3'-A/G-G-A/T-C/T-A/C/G-5', 101: between 32 and 4507, and their complements.
  4. positive strand in the positive direction is SuccessablesDPE++.bas, looking for 3'-A/G-G-A/T-C/T-A/C/G-5', 159: between 8 and 4424, and their complements.
  5. inverse, negative strand, negative direction, is SuccessablesDPEi--.bas, looking for 3'-A/C/G-C/T-A/T-G-A/G-5', 58: between 32 and 4476.
  6. inverse, negative strand, positive direction, is SuccessablesDPEi-+.bas, looking for 3'-A/C/G-C/T-A/T-G-A/G-5', 152: between 8 and 4424.
  7. inverse, positive strand, negative direction, is SuccessablesDPEi+-.bas, looking for 3'-A/C/G-C/T-A/T-G-A/G-5', 174: between 13 and 4546.
  8. inverse, positive strand, positive direction, is SuccessablesDPEi++.bas, looking for 3'-A/C/G-C/T-A/T-G-A/G-5', 95: between 30 and 4420.

E2 boxes

Negative strand in the negative direction there are 5: 3'-ACAGATGT-5', 482, 3'-ACAGATGT-5', 1225, 3'-GCAGTTGG-5', 1514, 3'-ACAGATGT-5', 2989, 3'-ACAGATGT-5', 4213, in the distal promoter.

Positive strand in the negative direction there are 2: 3'-GCAGGTGG-5', 2571, 3'-ACAGATGA-5', 3920.

Inverse complement, negative strand, negative direction there is 1: 3'-CCACCTGT-5', 2117.

Inverse complement, positive strand, negative direction there are 4: 3'-CCACCTGT-5', 394, 3'-ACACCTGT-5', 1131, 3'-GCAACTGC-5', 3851, 3'-ACACCTGT-5', 3970.

Negative strand in the positive direction there is 1: 3'-GCAGATGA-5', 37.

EIF4E basal elements

There are no EIF4E basal elements (4EBE), also eIF4E, in either promoter.

Endoplasmic reticulum stress response elements

"The released aminoterminal of ATF6 (ATF6-N) then migrates to the nucleus and binds to the ER stress response element (ERSE) containing the consensus sequence CCAAT-N9-CCACG to activate genes encoding ER chaperones, ERAD components, and XBP1 (Chen et al., 2010; Yamamoto et al., 2004; Yoshida et al., 2001)."[102]

Endosperm expression

Endosperm expression (TGTGTCA).[74]

Enhancer boxes

Core promoters

Proximal promoters

Negative strand, negative direction there is 1: 3'-CAGATG-5' at 4212.

Positive strand, positive direction there is 1: 3'-CAAGTG-5' at 4202.

Distal promoters

Negative strand in the negative direction there are 9: between 324 and 3482.

Positive strand in the negative direction there are 21: between 41 and 4011.

Negative strand in the positive direction there are 26: between 196 and 4015.

Positive strand in the positive direction there are 10: between 186 and 3936.

Estrogen response elements

Estrogen response elements (EREs).

Ethylene responsive elements

Ethylene responsive elements (ATTTCAAA).[74]

F boxes

"Male sex determination in the Caenorhabditis elegans hermaphrodite germline requires translational repression of tra-2 mRNA by the [Germ Line Development] GLD-1 RNA binding protein."[103]

Skp, Cullin, F-box containing complex (or SCF complex) is a multi-protein E3 ubiquitin ligase complex that catalyzes the ubiquitination of proteins destined for 26S proteasomal degradation.[104]

"Canonical F-box proteins act as bridging components of the SCF ubiquitin ligase complex; the N-terminal F-box binds a Skp1 homolog, recruiting ubiquination machinery, while a C-terminal protein-protein interaction domain binds a specific substrate for degradation."[103]

GAAC elements

  1. negative strand in the negative direction, looking for 3'-GAACT-5', 13: between 843 and 4294 and complements,
  2. negative strand in the positive direction, looking for 3'-GAACT-5', 1, 3'-GAACT-5', 609 and complement,
  3. positive strand in the negative direction, looking for 3'-GAACT-5', 2, 3'-GAACT-5', 1685, 3'-GAACT-5', 3460 and complements,
  4. positive strand in the positive direction, looking for 3'-GAACT-5', 2, 3'-GAACT-5', 577, 3'-GAACT-5', 692 and complements,
  5. inverse complement, negative strand, negative direction, looking for 3'-AGTTC-5', 3: between 3844 and 4178 and complements,
  6. inverse complement, negative strand, positive direction, looking for 3'-AGTTC-5', 1, 3'-AGTTC-5', 761 and complement,
  7. inverse complement, positive strand, negative direction, looking for 3'-AGTTC-5', 6: between 253 and 4417.

GA responsive elements

Only one GARE (an inverse) occurs: between ZSCAN22 and A1BG 3'-AAACAAT-5' at 230 nts and its complement.

GATA boxes

GTGA-box has the consensus sequence GATA.[105]

Proximal promoters

Inverse complement, negative strand, positive direction there is 1: 3'-TTTATCAC-5', 4125.

Distal promoters

Positive strand in the negative direction there are 2: 3'-GGGATAGA-5', 100, 3'-ATGATAGA-5', 355.

Inverse complement, negative strand, negative direction there is 1: 3'-GTTATCAT-5', 2500.

Inverse complement, positive strand, negative direction there is 1: 3'-TTTATCTT-5', 1732.

Inverse complement, negative strand, positive direction there is 1: 3'-GTTATCCC-5', 3385.

Inverse complement, positive strand, positive direction there are 2: 3'-GCTATCAG-5', 1840, 3'-TTTATCTT-5', 2628.

G boxes

The "perfect palindrome 5'-GCCACGTGGC-3' which is also known as the G-box motif."[106]

"TAF-1 can bind to the G-box and related motifs and that it functions as a transcription activator."[106]

"A G-box-related motif, containing the core sequence CACGTG is also present in the 5' regions of two other classes of light-responsive genes".[106]

"Two distinct sequence elements, the H-box (consensus CCTACC(N)7CT) and the G-box (CACGTG), are required for stimulation of the chsl5 promoter by [p-coumaric acid] 4-CA."[107]

Most bZIP proteins show high binding affinity for the ACGT motifs, which include CACGTG (G box) [...].[70][71][72]

Binding "activity to the G-box of the light-responsive unit 1 (U1) region of the parsley (Petroselinum crispum) CHS promoter (CHS-U1: TCCACGTGGC; Schulze-Lefert et al., 1989) or the G-box of GmAux28 (TCCACGTGTC) was much weaker than to the PA G-box [...]."[92]

There are no "perfect palindrome" G boxes in either promoter.
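
A "perfect palindrome" in this sense is a sequence that equals its own inverse complement. The following sketch checks the G-box sequences quoted in this section; `is_palindromic` is a hypothetical helper name.

```python
_PAIRS = str.maketrans("ACGT", "TGCA")

def is_palindromic(seq):
    """True when the sequence equals its own inverse complement."""
    return seq == seq.translate(_PAIRS)[::-1]

for g_box in ("GCCACGTGGC", "CACGTG", "TCCACGTGGC", "TCCACGTGTC"):
    print(g_box, is_palindromic(g_box))
# GCCACGTGGC True
# CACGTG True
# TCCACGTGGC False
# TCCACGTGTC False
```

The core CACGTG and the ten-nucleotide G-box motif are palindromic, while the CHS-U1 and GmAux28 variants are not, because their flanking nucleotides break the symmetry.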

GC boxes

GC box (GGGCGG).[108]

Positive strand in the negative direction there are 2: 3'-TGGGCGTGGT-5', 1898, 3'-TGGGCGTGGT-5', 3048, in the distal promoter.

Inverse complement, negative strand, negative direction there is 1: 3'-ACTCCGCCCA-5', 3092.

Inverse complement, positive strand, negative direction there is 1: 3'-GCTCCGCCTC-5', 1505.

Negative strand in the positive direction there is 1: 3'-TGGGCGGGAC-5', 409.

Inverse complement, positive strand, positive direction there is 1: 3'-GCCACGCCCC-5', 491.

Gibberellin responsive elements

Gibberellin responsive elements (CCTTTTG, AAACAGA).[74]

Copying an apparent consensus sequence of CCTTTTG, AAACAGA and putting it in "⌘F" finds one located between ZSCAN22 and A1BG and two between ZNF497 and A1BG as can be found by the computer programs.

Glucocorticoid response elements

"DNA-binding by the GR-DBD has been well-characterized; it is highly sequence-specific, directly recognizing invariant guanine nucleotides of two AGAACA [TGTTCT] half sites called the glucocorticoid response element (GRE), and binds as a dimer in head-to-head orientation with mid-nanomolar affinity (4,12–18). [...] The consensus DNA glucocorticoid response element (GRE) is comprised of two half-sites (AGAACA) separated by a three base-pair spacer (13,15,60,61)."[109]

Copying an apparent consensus sequence of AGAACA and putting it in "⌘F" finds one located between ZSCAN22 and A1BG and two between ZNF497 and A1BG as can be found by the computer programs.
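
Searching for the half-site alone does not test for a complete GRE. As a sketch, a full-site pattern can be assembled from the half-site, a three-nucleotide spacer, and the inverse complement of the half-site. This assumes the second half-site appears on the same strand as TGTTCT, as the bracketed sequence in the quotation suggests; `palindromic_site_regex` and the test sequence are hypothetical.

```python
import re

_PAIRS = str.maketrans("ACGT", "TGCA")

def palindromic_site_regex(half_site, spacer_length):
    """Build a regex: half-site, any spacer, inverse complement of the half-site."""
    second_half = half_site.translate(_PAIRS)[::-1]
    return f"{half_site}[ACGT]{{{spacer_length}}}{second_half}"

gre = palindromic_site_regex("AGAACA", 3)
print(gre)  # AGAACA[ACGT]{3}TGTTCT

# Invented sequence containing one full site.
print(bool(re.search(gre, "CCAGAACATCATGTTCTGG")))  # True
```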

Gcr1ps

The upstream activating sequence (UAS) for Gcr1p is 5'-CTTCC-3' for the transcriptional activator involved in the regulation of glycolysis [77].[81]

Copying an apparent consensus sequence of 5'-CTTCC-3' and putting it in "⌘F" finds none located between ZSCAN22 and A1BG and six between ZNF497 and A1BG as can be found by the computer programs.

H boxes

Core promoters

Between ZSCAN22 and A1BG: There is one inverse and its complement 3'-AGGAGA-5' at 4428 nts.

Between ZNF497 and A1BG: There is an inverse and its complement 3'-AGGACA-5' at 4252. There are five after the TSS: between 4387 and 4392 and their complements.

Proximal promoters

Between ZSCAN22 and A1BG: There is one H box (3'-ANANNA-5'): negative direction, negative strand, 3'-ACACGA-5' at 4402. On the positive strand in the negative direction there are 16: between 4216 and 4395, with their complements on the negative strand, negative direction.

Between ZNF497 and A1BG: There is one H box (3'-ANANNA-5'): 3'-AGAGAA-5' at 4387 in the proximal promoter, negative strand, positive direction. There are four: between 4365 and 4392 and their complements in the positive direction.

Distal promoters

Between ZSCAN22 and A1BG, negative strand, negative direction: 3'-AGAGGA-5' at 3387, 3'-AGAGGA-5' at 3638, and 3'-AGAGGA-5' at 3675. One inverse and its complement 3'-AGGAGA-5' at 3790. There are 14 H boxes: between 788 and 4124.

On the positive strand, negative direction, there are 127 H boxes: between 608 and 4395.

Between ZNF497 and A1BG: There are two H boxes after nucleotide number 2300 in the negative strand and positive direction: between 420 and 530, and 3'-ACACCA-5' at 2603 and 3'-ACACCA-5' at 3825.

There are two H boxes after nucleotide number 2300 in the positive strand and positive direction: 3'-ACACCA-5' at 204, 3'-ACACCA-5' at 528, 3'-ACACCA-5' at 3643 and 3'-ACACCA-5' at 3967.

Regarding 3'-ANANNA-5', on the negative strand, positive direction, there are 25 H boxes: between 2591 and 4154.

On the positive strand, positive direction there are 20 H boxes: between 2347 and 4168.

There are 31 inverse H boxes on the negative strand in the positive direction: between 2412 and 4166.
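A degenerate motif such as ANANNA cannot be typed into "⌘F" directly, because N stands for any nucleotide. One way to search for it is sketched here in Python with regular expressions. This is an illustration under stated assumptions, not the .bas programs referred to on this page: the patterns are matched against the text exactly as it is written (the 3'-to-5' reading used in this section), and the helper name find_boxes is hypothetical.

```python
import re

# A lookahead makes each match zero-width, so overlapping boxes are all reported.
H_BOX = re.compile(r"(?=(A[ACGT]A[ACGT]{2}A))")          # ANANNA
H_BOX_INVERSE = re.compile(r"(?=(A[ACGT]{2}A[ACGT]A))")  # ANNANA (ANANNA read backwards)


def find_boxes(pattern, seq):
    """Return (1-based position, matched text) for every occurrence, overlaps included."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]


# Sequences quoted in this section, checked against the two patterns.
for box in ("ACACGA", "AGAGAA", "AGAGGA", "ACACCA"):
    print(box, find_boxes(H_BOX, box))
for box in ("AGGAGA", "AGGACA"):
    print(box, find_boxes(H_BOX_INVERSE, box))

# Overlapping hits: a run of seven A's contains two H boxes, at positions 1 and 2.
print(find_boxes(H_BOX, "AAAAAAA"))
```

Because the motif fixes only three of its six positions, large counts such as those reported for the distal promoters are expected by chance alone.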

HMG boxes

"Most HMG box proteins contain two or more HMG boxes and appear to bind DNA in a relatively sequence-aspecific manner (5, 13, 15, 16 and references therein). [...] they all appear to bind to the minor groove of the A/T A/T C A A A G-motif (10, 14, 18-20)."[110]

Copying an apparent consensus sequence of (A/T)(A/T)CAAAG and putting it in "⌘F" finds none located between ZSCAN22 and A1BG and none between ZNF497 and A1BG as can be found by the computer programs.

HNFs

Gene ID: 6927 is HNF1A HNF1 homeobox A aka TCF1 on 12q24.31: "The protein encoded by this gene is a transcription factor required for the expression of several liver-specific genes. The encoded protein functions as a homodimer and binds to the inverted palindrome 5'-GTTAATNATTAAC-3'. Defects in this gene are a cause of maturity onset diabetes of the young type 3 (MODY3) and also can result in the appearance of hepatic adenomas. Alternative splicing results in multiple transcript variants encoding different isoforms."[111]

"Canonical Wnt signaling results in the accumulation and binding of β-catenin to DNA-binding partner TCF1."[3] TCF-1 binding site is CCTTTGA.[3]

"HNF3 can bind to the site in the absence of HNF6 (Lahuna et al. 1997)."[112]

HNF6 core promoters

Inverse complement, positive strand, negative direction there is 1: 3'-TTATTAATTC-5', 4542.

HNF6 proximal promoters

Negative strand in the negative direction there is 1: 3'-TTATTAATCG-5', 4229.

Negative strand in the positive direction there are 2: 3'-TTATTAATCA-5', 4147, 3'-TTATTGATTA-5', 4164.

Inverse complement, positive strand, positive direction there is 1: 3'-ATATTAACAA-5', 4172.

HNF6 distal promoters

Negative strand in the negative direction there are 2: 3'-GTGTTAATAA-5', 1725, 3'-TAGTTGATAA-5', 3527.

Positive strand in the negative direction there is 1: 3'-AAATTGATAA-5', 3361.

Inverse complement, negative strand, negative direction there are 2: 3'-ACATGGACAT-5', 802, 3'-TAATGAACTT-5', 1301.

Inverse complement, positive strand, negative direction there are 2: 3'-AAATTGATAA-5', 3361, 3'-TCATCAACTA-5', 3525.

Negative strand in the positive direction there is 1: 3'-ATGTCCATGG-5', 3581.

Positive strand in the positive direction there is 1: 3'-GAGTCCATTG-5', 3732.

Inverse complement, positive strand, positive direction there is 1: 3'-CCATTGACTC-5', 3736.

Homeoboxes

"Transcription factors Pax-4 and Pax-6 are known to be key regulators of pancreatic cell differentiation and development. [...] The gene-targeting experiments revealed that Pax-4 and Pax-6 cannot substitute for each other in tissue with overlapping expression of both genes. [The] DNA-binding specificities of Pax-4 and Pax-6 are similar. The Pax-4 homeodomain [HD] was shown to preferentially dimerize on DNA sequences consisting of an inverted TAAT motif, separated by 4-nucleotide spacing."[113]

The "crucial difference between the binding sites of Antennapedia class and TTF-1 HDs is in the motifs 5'-TAAT-3', recognized by Antennapedia [a Hox gene, a subset of homeobox genes, first discovered in Drosophila which controls the formation of legs during development], and 5'-CAAG-3', preferentially bound by TTF-1. [The] binding of wild type and mutants TTF-1 HD to oligonucleotides containing either 5'-TAAT-3' or 5'-CAAG-3' indicate that only in the presence of the latter motif the Gln50 in TTF-1 HD is utilized for DNA recognition."[114]

Copying a portion of the homeobox motif of CAAG and putting it in "⌘F" finds eight located between ZSCAN22 and A1BG and 21 between ZNF497 and A1BG as can be found by the computer programs.

Hsf1ps

The upstream activating sequence (UAS) for the Hsf1p is 5'-NGAAN-3' or 5'-(A/C/G/T)GAA(A/C/G/T)-3'.[81]

Copying 5'-TGAAA-3' in "⌘F" yields twelve between ZSCAN22 and A1BG, and copying 5'-CGAAC-3' yields one between ZNF497 and A1BG, as can be found by the computer programs.

HY boxes

Core promoters

Positive strand in the negative direction there is 1: 3'-TGAGGG-5' at 4558.

Inverse complement, negative strand, negative direction there is 1: 3'-CCCTCA-5', 4498.

Negative strand in the positive direction there is 1: 3'-TGTGGG-5', 4395.

Distal promoters

Negative strand in the negative direction there is 1: 3'-TGTGGG-5' at 749.

Positive strand in the negative direction there are 4: between 88 and 3712.

Inverse complement, negative strand, negative direction there are 3: between 2702 and 3889.

Positive strand in the positive direction there are 2: 3'-TGTGGG-5', 2965, 3'-TGTGGG-5', 3533.

Negative strand in the positive direction there are 3: between 258 and 3879.

Inverse complement, negative strand, positive direction there are 3: between 88 and 3503.

Inverse complement, positive strand, positive direction there are 5: between 494 and 3185.

Initiator elements (YYANWYY)

Core promoters

There is the following Inr in the core promoter, negative strand, negative direction: 3'-TTACTCC-5' at 4557.

There are four Inrs in the core promoter, positive strand, negative direction: between 4425 and 4542.

There is the following Inr in the core promoter, negative strand, positive direction: 3'-CTGCACC-5' at 4343.

There are two Inrs in the core promoter, positive strand, positive direction: 3'-CCACTCC-5' at 4401 and 3'-CCAGACC-5' at 4416.

Proximal promoters

There are eight Inrs on the negative strand in the negative direction: between 4202 and 4557.

There are seven Inrs on the positive strand in the negative direction: between 4327 and 4542.

There is one Inr on the negative strand in the positive direction: 3'-CTGCACC-5' at 4343.

There are two Inrs on the positive strand in the positive direction: 3'-CCACTCC-5' at 4401 and 3'-CCAGACC-5' at 4416.

Distal promoters

Negative strand in the negative direction there are 87: between 71 and 4188.

Positive strand in the negative direction there are 40: between 20 and 3967.

Inverse complement, negative strand, negative direction there are 32: between 213 and 3967.

Negative strand in the positive direction there are 45: between 115 and 4139.

Positive strand in the positive direction there are 75: between 40 and 4136.

Inverse complement, negative strand, positive direction there are 61: between 53 and 4136.

Inverse complement, positive strand, negative direction there are 100: between 17 and 4177.

Inverse complement, positive strand, positive direction there are 75: between 524 and 4138.

Initiator elements (BBCABW)

Core promoters

There are five Inrs, positive strand, negative direction: between 4423 and 4531.

There are five Inrs, negative strand, positive direction: between 4271 and 4338.

There are four Inrs, positive strand, positive direction: between 4269 and 4414.

Proximal promoters

There are five Inrs on the negative strand in the negative direction: between 4200 and 4359.

There are nine Inrs on the positive strand in the negative direction: between 4233 and 4531.

There are six Inrs on the negative strand in the positive direction: between 4195 and 4338.

There are four Inrs on the positive strand in the positive direction: between 4269 and 4414.

Distal promoters

Negative strand in the negative direction there are 44: between 179 and 3939.

Positive strand in the negative direction there are 59: between 39 and 3965.

Inverse complement, negative strand, negative direction there are 46: 3'-TCTGAC-5', 16: between 62 and 3983.

Inverse complement, positive strand, negative direction there are 54, 3'-ACTGAA-5', 18: between 78 and 4093.

Negative strand in the positive direction there are 87: between 15 and 4013.

Positive strand in the positive direction there are 40: between 153 and 4056.

Inverse complement, negative strand, positive direction there are 94: between 54 and 4095.

Inverse complement, positive strand, positive direction there are 47: between 236 and 4127.

Jasmonic acid-responsive elements

Jasmonic acid-responsive elements (TGACG, CGTCA).[74]

Copying an apparent consensus sequence for the jasmonic acid-responsive element (JARE)[115] of TGACG and putting it in "⌘F" finds eight located between ZSCAN22 and A1BG and one between ZNF497 and A1BG as can be found by the computer programs.

Krüppel-like factors

"Krüppel-like factor 1 (KLF1/EKLF) is a transcription factor that globally activates genes involved in erythroid cell development. [...] KLF1 belongs to the KLF family of transcription factors that binds the G-rich strand of so-called CACCC-box motifs located in regulatory regions of numerous erythroid genes."[116]

"Using the in vitro CASTing method, we identified a new set of sequences bound by [congenital dyserythropoietic anemia] CDA-KLF1, and based on them we defined the consensus binding site as 5′-NGG-GG(T/G)-(T/G)(T/G)(T/G)-3′. It differs from the consensus binding sites for [wild-type] WT-KLF1, 5′-NGG-G(C/T)G-(T/G)GG-3′, and for [neonatal anemia] Nan-KLF1, 5′-NGG-G(C/A)N-(T/G)GG-3′, as well."[116]

An apparent consensus is GGG(A/C/G/T)(A/C/G/T)(G/T)(G/T)(G/T).

Copying an apparent consensus sequence for the KLF of GGGTCGTG and putting it in "⌘F" finds six located between ZSCAN22 and A1BG and none between ZNF497 and A1BG as can be found by the computer programs.
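The apparent consensus given above can be reproduced by taking, position by position, the union of the bases allowed by the three published KLF1 sites. The sketch below does this in Python. Writing the three sites in single-letter IUPAC codes (K = G/T, Y = C/T, M = A/C) is an assumption made for the illustration, as is the function name union_consensus.

```python
# IUPAC nucleotide ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: sorted set of bases -> IUPAC letter.
CODE_FOR = {"".join(sorted(bases)): code for code, bases in IUPAC.items()}


def union_consensus(*motifs):
    """Merge equal-length IUPAC motifs by the per-position union of allowed bases."""
    if len({len(m) for m in motifs}) != 1:
        raise ValueError("all motifs must have the same length")
    merged = []
    for column in zip(*motifs):
        bases = set().union(*(IUPAC[code] for code in column))
        merged.append(CODE_FOR["".join(sorted(bases))])
    return "".join(merged)


# The three sites quoted above, rewritten in IUPAC letters.
cda_klf1 = "NGGGGKKKK"  # 5'-NGG-GG(T/G)-(T/G)(T/G)(T/G)-3'
wt_klf1 = "NGGGYGKGG"   # 5'-NGG-G(C/T)G-(T/G)GG-3'
nan_klf1 = "NGGGMNKGG"  # 5'-NGG-G(C/A)N-(T/G)GG-3'

print(union_consensus(cda_klf1, wt_klf1, nan_klf1))  # NGGGNNKKK
```

Dropping the uninformative leading N gives GGGNNKKK, that is GGG(A/C/G/T)(A/C/G/T)(G/T)(G/T)(G/T). A union of this kind is more permissive than any single published site, so a match to it is only suggestive.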

M35 boxes

Negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesM35--.bas, looking for 3'-TTGACA-5', 2, 3'-TTGACA-5', 477, 3'-TTGACA-5', 4399.

Metal responsive elements

Proximal promoters

On the positive strand in the negative direction there is an MRE 3'-TGCACTC-5' at 4341.

Distal promoters

Positive strand in the negative direction there are 6: between 891 and 3290.

Inverse complement, negative strand, negative direction there are 2: 3'-GTGTGCA-5', 531, 3'-GAGTGCA-5', 1772.

Inverse complement, positive strand, negative direction there are 2: 3'-GAGTGCA-5', 1470, 3'-GTGTGCA-5', 2863.

Negative strand in the positive direction there are 11: between 453 and 3323.

Positive strand in the positive direction there are 2: 3'-TGCGCCC-5', 872, 3'-TGCGCCC-5', 972.

Inverse complement, negative strand, positive direction there are 10: between 546 and 3883.

Mig1ps

The upstream activating sequence (UAS) for the Mig1p transcription factor is 5'-SYGGGG-3' or 5'-(C/G)(C/T)GGGG-3'.[81]

Copying 5'-CTGGGG-3' in "⌘F" yields none between ZSCAN22 and A1BG and four between ZNF497 and A1BG as can be found by the computer programs.

Msn2,4p

The upstream activating sequence (UAS) for the Msn2,4p transcription factor is 5'-CCCCT-3'.[81]

Copying 5'-CCCCT-3' in "⌘F" yields one between ZSCAN22 and A1BG and three between ZNF497 and A1BG as can be found by the computer programs.

MYB recognition elements

"These elements fit the type II MYB consensus sequence A(A/C)C(A/T)A(A/C)C, suggesting that they are MYB recognition elements (MREs)."[117]

MYB binding site involved in drought induction (TAACTG).[74]

Copying an apparent core consensus sequence for the MYBRE of AACAAAC or TAACTG and putting it in "⌘F" finds none of either located between ZSCAN22 and A1BG and none (AACAAAC) or one (TAACTG) between ZNF497 and A1BG as can be found by the computer programs.

Myocyte enhancer factor 2 (MEF2)

Myocyte enhancer factor-2 (MEF2) proteins are a family of transcription factors which through control of gene expression are important regulators of cellular differentiation and consequently play a critical role in embryonic development.[118] In adult organisms, Mef2 proteins mediate the stress response in some tissues.[118]

"The current study delineates the conformational paradigm, clustered recognition, and comparative DNA binding preferences for MEF2A and MEF2B-specific MADS-box/MEF2 domains at the YTA(A/T)4TAR consensus motif."[119] Y = (C/T) and R = (A/G). The consensus sequence is (C/T)TA(A/T)(A/T)(A/T)(A/T)TA(A/G).[119]

Copying apparent consensus sequences for the MEF2 (TTATAT or CTAATT) and putting them in "⌘F" finds two (TTATAT) located between ZSCAN22 and A1BG and one (CTAATT) between ZNF497 and A1BG as can be found by the computer programs.
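The consensus (C/T)TA(A/T)(A/T)(A/T)(A/T)TA(A/G) is ten nucleotides long and stands for a finite set of concrete sequences. A short Python sketch that enumerates them is given below; the variable and function names are hypothetical, and the listing is an illustration, not the search actually performed for this page.

```python
from itertools import product

# One string of allowed bases per position of YTA(A/T)4TAR.
MEF2_POSITIONS = ["CT", "T", "A", "AT", "AT", "AT", "AT", "T", "A", "AG"]


def expand(positions):
    """List every concrete sequence allowed by a position-wise motif."""
    return ["".join(choice) for choice in product(*positions)]


sites = expand(MEF2_POSITIONS)
print(len(sites))  # 64 = 2 * 2**4 * 2
print(sites[0])    # CTAAAAATAA

# The six-letter queries used in the text are prefixes of some full sites.
for query in ("TTATAT", "CTAATT"):
    print(query, sum(site.startswith(query) for site in sites))
```

Each six-letter query covers only the first six positions of the motif, so a "⌘F" hit on it is necessary but not sufficient for a full ten-nucleotide MEF2 site.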

Nuclear factor kappa-light-chain-enhancer of activated B cells

The "natural 11 bp 𝜿B binding site MHC H-2 [is 3'-CCCCTAAGGGG-5'] which is well ordered in our structure."[120]

Binding site for NF𝛋B in humans (GGAATTCCCC) with a core of (GAATTC).[108]

Copying an apparent core consensus sequence for the NF𝛋B of GAATTC and putting it in "⌘F" finds three cores located between ZSCAN22 and none between ZNF497 and A1BG as can be found by the computer programs.

Nuclear factor of activated T cell transcriptions

Mutation "of the core NFATp binding sequence (GGAAAA) in the IL2 promoter NFAT site entirely eliminates the function of the site, as does mutation of an adjacent non-canonical AP-1 site that is not essential for NFATp binding but that is required for formation of the NFATp-Fos-Jun complex(6, 15)."[121]

Copying an apparent consensus sequence for the NFAT GGAAAA and putting it in "⌘F" finds none located between ZSCAN22 and one between ZNF497 and A1BG as can be found by the computer programs.

Nuclear factor 1

Nuclear factor 1 (NF-1) is a family of closely related transcription factors that constitutively bind as dimers to specific sequences of DNA with high affinity.[122] Family members contain an unusual DNA binding domain that binds to the recognition sequence 5'-TTGGCXXXXXGCCAA-3'.[123]

Consensus sequences for the nuclear factor 1 are TGGCA, TGGCG and TGGAA.[124]

An apparent consensus sequence for the NF1 is TGG(A/C)(A/G).

Copying an apparent consensus sequence for the NF1 TGGCA and putting it in "⌘F" finds none located between ZSCAN22 and five between ZNF497 and A1BG as can be found by the computer programs.

p63 DNA-binding sites

"p63 bound preferentially to DNA fragments conforming to the 20 bp sequence 5'-RRRC(A/G)(A/T)GYYYRRRC(A/T)(C/T)GYYY-3'."[125]

The apparent consensus sequence is (A/G)(A/G)(A/G)C(A/G)(A/T)G(C/T)(C/T)(C/T).

Copying an apparent consensus sequence for the P63 (GAGCGAGCCT) and putting it in "⌘F" finds none located between ZSCAN22 and one between ZNF497 and A1BG as can be found by the computer programs.

P boxes

"As VRI [target gene: vrille (VRI)] accumulates in the nucleus during the mid to late day, it binds VRI/PDP1ϵ binding sites (V/P-boxes) [consensus of V box: A(/G)TTA(/T)T(/C), of P-box: GTAAT(/C)], to repress Clk and cry transcription (Hardin, 2004)."[126]

Copying the apparent consensus sequence for the P box (GTAA(T/C)) and putting it in "⌘F" finds seven located between ZSCAN22 and one between ZNF497 and A1BG as can be found by the computer programs.

Peroxisome proliferator hormone response elements

"After activation by ligands, PPARs/RXRs heterodimers bind to PPRE consensus sequence (AGGTCANAGGTCA) in the promoter of their target genes."[127]

The DNA consensus sequence is AGGTCANAGGTCA, with N being any nucleotide.

Peroxisome proliferator hormone response elements (PPREs) consensus sequences are AGGGGA and TCCCCT.[77]

Copying the apparent consensus sequence for the PPRE (AGGGGA) and putting it in "⌘F" finds none located between ZSCAN22 and A1BG and three between ZNF497 and A1BG as can be found by the computer programs.

Phosphate starvation-response transcription factors

"The [palindromic E-box motif (CACGTG)] motif is bound by the transcription factor Pho4, [and has the] class of basic helix-loop-helix DNA binding domain and core recognition sequence (Zhou and O'Shea 2011)."[69]

The Pho4 homodimer binds to DNA sequences containing the bHLH binding site 5'-CACGTG-3'.[128]

Copying the apparent consensus sequence for Pho4 (CACGTG) and putting it in "⌘F" finds none located between ZSCAN22 and A1BG and one between ZNF497 and A1BG as can be found by the computer programs.

Pollen1 elements

"Electrophoretic mobility shift assays identified a pollen-specific cis-acting element POLLEN1 (AGAAA) mapped at AtACBP4 (−157/−153) which interacted with nuclear proteins from flower and this was substantiated by DNase I footprinting."[105]

"Given that AtACBP4pro::GUS (−156/−67) could drive promoter activity for pollen expression, [electrophoretic mobility shift assays] EMSAs were carried out to investigate the role of the putative POLLEN1 cis-element, AGAAA (−150/−146), and its adjacent co-dependent regulatory element TCCACCATA (–141/–133)."[105]

"POLLEN1 and the TCCACCATA element are co-dependent regulatory elements responsible for pollen-specific activation of tomato LAT52 (Bate and Twell 1998)."[105]

Copying the consensus for POLLEN1: 3'-AGAAA-5' and putting the sequence in "⌘F" finds many locations for this sequence in the A1BG directions as can be found by the computer programs.

Pribnow boxes

  1. negative strand in the negative direction, looking for 3'-TATAAT-5', 2, 3'-TATAAT-5', 3454, 3'-TATAAT-5', 3468,
  2. negative strand in the positive direction, looking for 3'-TATAAT-5', 1, 3'-TATAAT-5', 729,
  3. complement, positive strand, negative direction, looking for 3'-ATATTA-5', 2, 3'-ATATTA-5', 3454, 3'-ATATTA-5', 3468,
  4. complement, positive strand, positive direction, looking for 3'-ATATTA-5', 1, 3'-ATATTA-5', 729,
  5. inverse complement, negative strand, negative direction, looking for 3'-ATTATA-5', 2, 3'-ATTATA-5', 272, 3'-ATTATA-5', 603,
  6. inverse complement, negative strand, positive direction, looking for 3'-ATTATA-5', 1, 3'-ATTATA-5', 727,
  7. inverse, positive strand, negative direction, looking for 3'-TAATAT-5', 2, 3'-TAATAT-5', 272, 3'-TAATAT-5', 603,
  8. inverse, positive strand, positive direction, looking for 3'-TAATAT-5', 1, 3'-TAATAT-5', 727.
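The eight searches listed above use four related strings: the motif, its complement, its inverse (the same letters read backwards) and its inverse complement. A minimal Python sketch that derives all four from one motif follows; the function name orientations is a hypothetical choice for this illustration.

```python
# Translation table for Watson-Crick complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def orientations(motif):
    """Return the motif with its complement, inverse and inverse complement."""
    motif = motif.upper()
    complement = motif.translate(COMPLEMENT)
    return {
        "motif": motif,
        "complement": complement,
        "inverse": motif[::-1],
        "inverse complement": complement[::-1],
    }


for name, text in orientations("TATAAT").items():
    print(f"{name}: {text}")
# motif: TATAAT
# complement: ATATTA
# inverse: TAATAT
# inverse complement: ATTATA
```

Searching one strand for the motif and the other strand for its complement reports the same positions, which is why items 1 and 3 (and each of the other pairs) list identical coordinates.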

Prolamin boxes

  1. negative strand in the negative direction: 1, 3'-TGTAAAG-5', 2884,
  2. negative strand in the positive direction: 1, 3'-TGAAAAG-5', 489,
  3. positive strand in the negative direction: 1, 3'-TGAAAAG-5', 1627.

Pyrimidine boxes

Pyrimidine boxes and their complements in the negative direction: 3'-CCTTTT-5' at 2459, 3'-CCTTTT-5' at 2927, and 3'-CCTTTT-5' at 2968 occur. Inverse pyrimidine boxes and their complements occur 3'-AAAAGG-5' at 105, 3'-AAAAGG-5' at 1107, 3'-AAAAGG-5' at 3345, and 3'-AAAAGG-5' at 3441.

Pyrimidine boxes in the positive direction: 3'-CCTTTT-5' at 135 and 3'-CCTTTT-5' at 291 and their complements are close to ZNF497.

Q elements

"The basal regulatory elements identified include a putative TATA-box (−30/−24) for RNA polymerase binding and a CAAT box (−64/−61; [...]). Several putative floral expression-related cis-elements identified included a putative 6-nucleotide Q element (−770/−665), three GTGA boxes (−372/−369, −209/−206 and −164/−161) and four putative highly-conserved POLLEN1 boxes (−737/−733, −711/−707, −150/−146 and −36/−32; [...])."[105]

The consensus sequence for a Q element is 3'-AGGTCA-5'.[105]

Copying the apparent consensus sequence for the QE (AGGTCA) and putting it in "⌘F" finds two located between ZSCAN22 and A1BG and three between ZNF497 and A1BG as can be found by the computer programs.

Rap1 regulatory factors

Consensus sequences: C(A/C/G)(A/C/G)(A/G)(C/G/T)C(A/C/T)(A/G/T)(C/G/T)(A/G/T)(A/C/G)(A/C)(A/C/T)(A/C/T).[69]

"Rap1 is another GRF that organizes chromatin, binds promoters of genes that encode ribosomal and glycolytic proteins, and binds telomeres (Shore 1994; Ganapathi et al. 2011; Hughes and de Boer 2013). [...] DNA shape analysis revealed that Rap1 motifs possess an intrinsically wide minor groove spanning the central degenerate region of the motif that was wider at binding-competent sites [...]. A clear trend was observed between increased width of the minor groove in the central degenerate region of the motif and increased Rap1 binding in vitro."[69]

Copying an apparent consensus sequence for Rap1 (CCCACCAACAAAA) and putting it in "⌘F" finds none located between ZSCAN22 and A1BG and none between ZNF497 and A1BG as can be found by the computer programs.

Reb1 general regulatory factors

Purified "Reb1 bound [...] exact TTACCCK occurrences [...] with >60% of 780 occurrences at promoters. [And can have] the extended motif VTTACCCGNH (IUPAC nomenclature) (Rhee and Pugh 2011)."[69] K = G or T; V = not T; N = any base; and H = not G.

Copying the apparent consensus sequence for Reb1 (TTACCC(G/T)) and putting it in "⌘F" finds one located between ZSCAN22 and A1BG and none between ZNF497 and A1BG as can be found by the computer programs. However, an extended Reb1 (ATTACCCGAA) finds none located between ZSCAN22 and A1BG or between ZNF497 and A1BG.
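The IUPAC letters in TTACCCK and VTTACCCGNH can be turned into a search pattern mechanically, so that every allowed variant is tested instead of the single instance ATTACCCGAA. The Python sketch below shows one way to do this; the function name iupac_to_regex is hypothetical and the code is an illustration, not the programs cited on this page.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(motif):
    """Compile an IUPAC motif into a regular expression, one character class per letter."""
    return re.compile("".join(f"[{IUPAC[code]}]" for code in motif.upper()))


core = iupac_to_regex("TTACCCK")
extended = iupac_to_regex("VTTACCCGNH")

print(core.pattern)                            # [T][T][A][C][C][C][GT]
print(bool(extended.fullmatch("ATTACCCGAA")))  # True: the instance searched for in the text
print(bool(extended.fullmatch("TTTACCCGAA")))  # False: V excludes T
```

The extended motif allows 3 x 4 x 3 = 36 distinct ten-nucleotide sequences, so a search for ATTACCCGAA alone tests only one of them.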

Retinoblastoma control elements

"Robbins et al. (18) have reported that expression of pRB in mouse fibroblasts suppresses transcription of c-fos and have identified an element, termed the retinoblastoma control element (RCE), in the c-fos promoter necessary for this suppression. More recently, sequences homologous to the RCE have been identified in the TGF-β1, -β2, and -β3 promoters by Kim et al. (19)."[129]

"Comparison of the sequence of the newly cloned mouse MMP-9 promoter region with our previous human isolate revealed that [...] four units of GGGG(T/A)GGGG sequence (GT box) were conserved between the two species."[108]

"Expression of some matrix metalloproteinases (MMPs) are regulated by cytokines and tumor promoters, namely tumor necrosis factor-𝛂 (TNF-𝛂), epidermal growth factor, interleukin-1, and 12-O-tetradecanoylphorbol-13-acetate (TPA) (15-20)."[108]

Expression "of v-Src induces the synthesis of MMP-9, which is mediated by alterations in activity of binding factors for the AP-1 site and the sequence motif GGGGTGGGG (GT box). This GT box is homologous to the so-called retinoblastoma (Rb) control element (RCE) (29,30), and Rb can produce an anti-oncogene or tumor suppressor gene product (31-38) which is involved in regulating transcription of certain genes."[108]

Binding site for NF𝛋B in humans (GGAATTCCCC) with a core of (GAATTC), Sp-1 (CCGCCCC), 12-O-tetradecanoylphorbol-13-acetate (TPA) responsive element (TRE) (TGAGTCA), and GC box (GGGCGG).[108]

"Angiotensin II (Ang II) up-regulates plasminogen-activator inhibitor type-1 (PAI-1) expression in mesangial cells to enhance extracellular matrix formation. The proximal promoter region (bp -87 to -45) of the human PAI-1 gene contains several potent binding sites for transcription factors [two phorbol-ester-response-element (TRE)-like sequences; D-box (-82 to -76) and P-box (-61 to 54), and one Sp1 binding site-like sequence, Sp1-box 1 (-72 to -67)]."[101]

"The methylation-interference experiment demonstrated that human recombinant Sp1 bound to the so-called GT box (TGGGTGGGGCT, -78 to -69), which contains the Sp1-box 1."[101]

D-box (TGAGTGG), Sp1-box 1 (GGGGCT), P-box (TGAGTTCA), Sp1-box 2 (CTGCCC), and TATA box (TATAAA).[101]

Copying the apparent consensus sequence for the RCE, GT box, (GGGGTGGGG) and putting it in "⌘F" finds none located between ZSCAN22 and A1BG or between ZNF497 and A1BG as can be found by the computer programs. However, RCE (GGGGAGGGG) finds none located between ZSCAN22 and A1BG and one between ZNF497 and A1BG.

Retinoic acid response elements

Retinoic acid response elements (RAREs).

"Retinoic acid is considered as the earliest factor for regulating anteroposterior axis of neural tube and positioning of structures in developing brain through retinoic acid response elements (RARE) consensus sequence (5′–AGGTCA–3′) in promoter regions of retinoic acid-dependent genes."[130]

"Several studies have suggested that the target gene of the RA signal generally contains two direct-repeat half sites of the consensus sequence AGGTCA that are spaced by one to five base pairs (14,16,32,38)."[4]

"Xavier-Neto’s review demonstrated that the magic AGGTCA has high affinity but poor specificity (16). Some other [nuclear receptors] NRs also utilized the RARE with the same spacer models that are used by RXRs/RARs, for example, orphan receptors, vitamin D receptors (VDR) and peroxisome proliferator-activated receptors (PPAR) (32,39). Identifying a bona fide RARE is more difficult than a simple inspection. In order to attribute the RARE in Cx43 to a candidate sequence, some observations have been conducted in our study using molecular, biological and biophysical methods and functional approaches. In a ligand-dependent luciferase assay, RARE was located between the −1,426 to −341 base pair position. The constitutively active mutant Cx43 RARE represses the luciferase activity in the absence of the ligand and has no response to the 9cRA. Our findings indicate that RARE in the Cx43 promoter is a functional element."[4]

Additional response elements that include the 5'-AGGTCA-3' are Q elements, ROR-response elements and thyroid hormone response elements.

A likely general consensus sequence may be 5'-AG(A/G)TCA-3'.[4]

Copying the apparent consensus sequence for the RARE (AGGTCA) and putting it in "⌘F" finds two located between ZSCAN22 and A1BG and three between ZNF497 and A1BG as can be found by the computer programs.

Root specific elements

Root specific elements (TGACGTCA).[74]

Copying the apparent consensus sequence for the RSE (TGACGTCA) and putting it in "⌘F" finds one located between ZSCAN22 and A1BG and none between ZNF497 and A1BG as can be found by the computer programs.

ROR-response elements

RAR-related orphan receptor "ROR-γ binds DNA with specific sequence motifs AA/TNTAGGTCA (the classic RORE motif) or CT/AG/AGGNCA (the variant RORE motif)13, 31."[59]

Copying the apparent consensus sequence for the RORE (ATATAGGTCA) and putting it in "⌘F" finds one located between ZSCAN22 and A1BG and none between ZNF497 and A1BG as can be found by the computer programs.

Copying the apparent consensus sequence for the variant RORE (CTGGGACA) and putting it in "⌘F" finds two located between ZSCAN22 and A1BG and one between ZNF497 and A1BG as can be found by the computer programs.

R response elements

The consensus sequence for the RRE is 5'-CATCTG-3'.[76]

Copying the apparent consensus sequence for the RRE (CATCTG) and putting it in "⌘F" finds none located between ZSCAN22 and A1BG and one between ZNF497 and A1BG as can be found by the computer programs.

Seed-specific elements

Seed-specific element (CATGCATG).[74]

Copying the apparent consensus sequence for the seed-specific element (CATGCATG) and putting it in "⌘F" finds none located between ZSCAN22 and A1BG or between ZNF497 and A1BG as can be found by the computer programs.

Serum response elements

The SRE wild type (SREwt) contains the nucleotide sequence ACAGGATGTCCATATTAGGACATCTGC, of which CCATATTAGG is the CArG box, TTAGGACAT is the C/EBP box, and CATCTG is the E box.[95]

5'-CCATATTAGG-3' is a CArG box that does not occur in either promoter of A1BG.

5'-CATCTG-3' is an E box that does not occur in either promoter of A1BG.

5'-TTAGGACAT-3' is a C/EBP box that does not occur in either promoter of A1BG using "⌘F".

5'-ACAGGATGT-3', which is contained in the SREwt nucleotide sequence above, occurs once between ZNF497 and A1BG and not at all between ZSCAN22 and A1BG using "⌘F".
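The positions of the three boxes inside the SREwt sequence can be worked out directly, which also shows that the boxes overlap one another. A small Python sketch is given below; the function name locate is hypothetical and positions are counted from 1 at the first nucleotide of SREwt.

```python
SRE_WT = "ACAGGATGTCCATATTAGGACATCTGC"
BOXES = {
    "CArG box": "CCATATTAGG",
    "C/EBP box": "TTAGGACAT",
    "E box": "CATCTG",
    "5' flank": "ACAGGATGT",
}


def locate(seq, motif):
    """Return (start, end) as 1-based inclusive positions of the first hit, or None."""
    index = seq.find(motif)
    return None if index < 0 else (index + 1, index + len(motif))


for name, motif in BOXES.items():
    print(name, motif, locate(SRE_WT, motif))
# CArG box CCATATTAGG (10, 19)
# C/EBP box TTAGGACAT (15, 23)
# E box CATCTG (21, 26)
# 5' flank ACAGGATGT (1, 9)
```

The CArG and C/EBP boxes share positions 15 to 19 (TTAGG), and the C/EBP and E boxes share positions 21 to 23 (CAT), so the four searches above are not independent of one another.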

Servenius sequences

The "positive effect of W element may result from cooperative interactions between Z and other downstream elements such as the Servenius sequence, GGACCCT, located from -131 to -125 bp(28,38)."[131]

Copying the apparent consensus sequence for Servenius (GGACCCT) and putting it in "⌘F" finds three located between ZSCAN22 and A1BG and one between ZNF497 and A1BG as can be found by the computer programs.

Specificity proteins

Sp1-box 1 (GGGGCT) and Sp1-box 2 (CTGCCC).[101]

"Sp3 has been shown to repress transcriptional activity of Sp1 [9]."[101]

Sp-1 (CCGCCCC).[108]

Sp1 (GCGGC).[124]

An apparent consensus sequence for Sp1 (GGGGCT), (CTGCCC) or (CCGCCCC) is 3'-(C/G)(C/G/T)G(C/G)C(C/T)-5'. Or, each must be considered separately.

Copying the apparent consensus sequences for Sp1 (GGGGCT), (CTGCCC) or (CCGCCCC) and putting each sequence in "⌘F" finds none located between ZSCAN22 and A1BG and four, two or none, respectively, between ZNF497 and A1BG as can be found by the computer programs.

STATs

A "homologous IFN-𝛄 activation site (GAS) element, having the consensus sequence TTC/ANNNG/TAA, is found in the promoters of several [interferon-stimulated genes] ISG.(37–40)"[132] Consensus sequences: STAT1 - TTCC(C/G)GGAA, STAT3 - TTCC(C/G)GGAA, STAT4 - TTCCGGAA, STAT5 - TTCNNNGAA and STAT6 - TTCNNNNGAA.[132]

"The GAS element is palindromic and the sequence TTCN(2-4)GAA defines the optimal binding site for all STATs, with the exception of STAT2 which appears to be defective in GAS-DNA binding [...]."[133]

Proximal promoters

Negative strand in the positive direction there is 1: 3'-TTCCGGGAA-5', 4247.

Distal promoters

Positive strand in the negative direction there are 2: 3'-TTCGTTGAA-5', 3506, 3'-TTCCCTGAA-5', 3782.

Positive strand in the positive direction there is 1: 3'-TTCCATGAA-5', 128.
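The hits listed in this section can be checked against the quoted definition TTCN(2-4)GAA, in which the two palindromic arms are separated by two to four nucleotides of any kind. The Python sketch below reports the spacer length of each hit; the function name gas_sites is hypothetical, and the sequences are matched as written in the text.

```python
import re

# TTC, a spacer of 2 to 4 nucleotides, then GAA. The lookahead lets
# sites that overlap each other be reported at every start position.
GAS = re.compile(r"(?=(TTC([ACGT]{2,4})GAA))")


def gas_sites(seq):
    """Return (1-based start, site, spacer length) for each GAS-like site.

    Only the longest valid spacer is reported for a given start position.
    """
    return [(m.start() + 1, m.group(1), len(m.group(2)))
            for m in GAS.finditer(seq.upper())]


# The four sequences reported in this section.
LISTED = ("TTCCGGGAA", "TTCGTTGAA", "TTCCCTGAA", "TTCCATGAA")
for hit in LISTED:
    print(gas_sites(hit))
```

All four listed hits have a three-nucleotide spacer, the spacing given above for STAT5 (TTCNNNGAA), and TTCCGGGAA additionally fits the STAT1 and STAT3 consensus TTCC(C/G)GGAA.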

Ste12p

The upstream activating sequence (UAS) for Ste12p is 5'-TGAAAC-3'.[81]

Copying 5'-TGAAAC-3' in "⌘F" yields eleven between ZSCAN22 and A1BG and one between ZNF497 and A1BG as can be found by the computer programs.

Synaptic Activity-Responsive Elements

"A unique synaptic activity-responsive element (SARE) sequence, composed of the consensus binding sites for SRF, MEF2 and CREB, is necessary for control of transcriptional upregulation of the Arc gene in response to synaptic activity."[134]

"Within the cAMP-responsive element of the somatostatin gene, we observed an 8-base palindrome, 5'-TGACGTCA-3', which is highly conserved in many other genes whose expression is regulated by cAMP."[97]

The consensus sequence for the myocyte enhancer factor 2 (MEF2) is (C/T)TA(A/T)(A/T)(A/T)(A/T)TA(A/G).[119]

The SRE wild type (SREwt) contains the nucleotide sequence ACAGGATGTCCATATTAGGACATCTGC, of which CCATATTAGG is the CArG box, TTAGGACAT is the C/EBP box, and CATCTG is the E box.[95]

TACTAAC boxes

"A consensus sequence TACTAA(C/T) was derived for the branch site of Dictyostelium introns."[135]

  1. positive strand in the positive direction is SuccessablesTACT++.bas, looking for 3'-TACTAA(C/T)-5', 1, 3'-TACTAAT-5', 718,
  2. complement, negative strand, positive direction is SuccessablesTACTc-+.bas, looking for 3'-ATGATT(A/G)-5', 1, 3'-ATGATTA-5', 718,
  3. inverse complement, positive strand, positive direction is SuccessablesTACTci++.bas, looking for 3'-(A/G)TTAGTA-5', 1, 3'-ATTAGTA-5', 709,
  4. inverse, negative strand, positive direction, is SuccessablesTACTi-+.bas, looking for 3'-(C/T)AATCAT-5', 1, 3'-TAATCAT-5', 709.

TAGteams

The "heptamer consensus sequence CAGGTAG (i.e., the TAGteam) is overrepresented in regulatory regions of the earliest expressed zygotic genes [2]."[136]

Copying the consensus TAGteam: 5'-CAGGTAG-3' and putting the sequence in "⌘F" finds one location between ZNF497 and A1BG and no locations between ZSCAN22 and A1BG as can be found by the computer programs.

Tapetum boxes

The consensus sequence for the TAPETUM box is TCGTGT.[105]

Copying the consensus Tapetum box: 3'-TCGTGT-5' and putting the sequence in "⌘F" finds one location between ZNF497 and A1BG and one between ZSCAN22 and A1BG as can be found by the computer programs.

TATA boxes

Negative strand in the negative direction there are 2: 3'-TATATATA-5' at 1600 (or -2860 nts upstream from the TSS) and 3'-TATATAAA-5' at 1602 (or -2858 nts).

Positive strand in the negative direction there are 3: 3'-TATAAAAG-5' at 184 (or -4276 nts), 3'-TATAAAAG-5' at 223 (or -4237 nts), and 3'-TATATAAA-5' at 2874 (or -1586 nts).

Inverse complement, negative strand, negative direction there are 2: 3'-TATATATA-5', 1600, 3'-TTTATATA-5', 2871.

Inverse complement, positive strand, negative direction there is 1: 3'-TTTTTATA-5', 219.
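
The offsets given in parentheses for the TATA boxes follow from subtracting the TSS position from each hit position. A minimal sketch, assuming the apparent TSS at 4460 nts from ZSCAN22 that this article gives for the negative direction; the function name is illustrative only.

```python
# Assumption: apparent TSS at 4460 nts from ZSCAN22 (negative direction), per the article.
TSS = 4460

def offset_from_tss(position, tss=TSS):
    """Signed distance of a hit from the TSS; negative values are upstream."""
    return position - tss

for pos in (1600, 1602, 184, 223, 2874):
    print(pos, offset_from_tss(pos))
```

For the five positions quoted in the text this gives -2860, -2858, -4276, -4237 and -1586.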

TAT boxes

Only an inverse and its complement occur between ZSCAN22 and A1BG: 3'-TACCTAT-5' at 2996 nts from ZSCAN22.

T boxes

"The different inducing activities of Xbra, VegT and Eomesodermin suggest that the proteins might recognise different DNA target sequences. [...] All three proteins prove to recognise the same core sequence of TCACACCT with some differences in flanking nucleotides."[137]

Most bZIP proteins show high binding affinity for the ACGT motifs, which include [...] AACGTT (T box) [...].[70][71][72]

"Despite sequence variations within the Tbox DBD between family members, all members of the family appear to bind to the same DNA consensus sequence, TCACACCT. In several in vitro binding-site selection studies, members of the Tbox family were found to bind preferentially sequences containing two or more of these core motifs arranged in various orientations; however, the significance of such double sites in vivo is uncertain, as most Tbox target gene sites have been found to contain only a single consensus motif (18)."[138]

Copying the consensus T boxes, 3'-TCACACCT-5' or 3'-AACGTT-5', and putting each sequence in "⌘F" finds two locations for the first and none for the second between ZSCAN22 or ZNF497 and A1BG, as can be found by the computer programs.

Telomeric repeat DNA-binding factors

Copying the consensus telomeric repeat DNA-binding factor (TRF): 3'-TTAGGG-5' and putting the sequence in "⌘F" locates ten occurrences of this sequence between ZSCAN22 and A1BG in the negative direction and two between ZNF497 and A1BG, as can be found by the computer programs.

In the nucleotides between ZSCAN22 and A1BG there are ten 3'-TTAGGG-5', beginning about 300 nucleotides from ZSCAN22 and ending at about 3900 nts. There are two among the nucleotides between ZNF497 and A1BG as A1BG is approached from ZNF497.

Homo sapiens genes containing these are found using Homo sapiens "TRF (TTAGGG repeat binding factor)".[139]

Thyroid hormone response elements

"The arrangement of TREs within the promoter might regulate THR action by determining THR isoform binding, THR dimerization, and coregulators binding. In the classic view of how TH and its receptor stimulate gene expression, the gene promoter contains TREs consisting of a 6-bp consensus sequence (AGGTCA) organized as a direct repeat separated by 4 bp (DR4), a palindrome without spacing (PAL), or an inverted palindrome (LAP) separated by 4 to 6 bp (10–13)."[140]

Copying the consensus sequence for the TRE, 5'-AGGTCA-3', and putting the sequence in "⌘F" finds no locations between ZNF497 and A1BG and two locations between ZSCAN22 and A1BG, as can be found by the computer programs.
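
The three TRE arrangements named in the quotation (DR4, PAL, LAP) can be searched for as pairs of half-sites rather than as the single 6-bp consensus. The sketch below assumes the usual reading: PAL is AGGTCA followed immediately by its reverse complement TGACCT, and LAP is TGACCT, then 4 to 6 nt, then AGGTCA. The toy sequence and the name find_tres are hypothetical.

```python
import re

HALF = "AGGTCA"     # consensus half-site from the quoted text
HALF_RC = "TGACCT"  # its reverse complement

# Lookaheads make the matches zero-width, so overlapping sites are all reported.
PATTERNS = {
    "DR4": re.compile(f"(?={HALF}[ACGT]{{4}}{HALF})"),       # direct repeat, 4-bp spacer
    "PAL": re.compile(f"(?={HALF}{HALF_RC})"),               # palindrome, no spacer
    "LAP": re.compile(f"(?={HALF_RC}[ACGT]{{4,6}}{HALF})"),  # inverted palindrome, 4-6 bp spacer
}

def find_tres(sequence):
    """Return 1-based start positions of each arrangement found in the sequence."""
    return {name: [m.start() + 1 for m in pattern.finditer(sequence)]
            for name, pattern in PATTERNS.items()}

# Hypothetical toy sequence containing one site of each kind.
toy = ("CC" + "AGGTCA" + "ACGT" + "AGGTCA"
       + "GG" + "AGGTCA" + "TGACCT"
       + "GG" + "TGACCT" + "ACGTA" + "AGGTCA")
print(find_tres(toy))
```

A region with only isolated AGGTCA half-sites, as reported between ZSCAN22 and A1BG, would return empty lists for all three arrangements.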

Upstream stimulating factors

"The helix-loop-helix transcription factor USF (upstream stimulating factor) binds to a regulatory sequence of the human insulin gene enhancer."[141]

"The regulation of insulin gene expression is dependent on sequences located upstream of the transcription start site (Clark and Docherty, 1992). Two important cis-acting elements, the insulin enhancer binding site 1 (IEBI) or NIR box and the IEB2 or FAR box, have been identified in the rat insulin I gene (Karlsson et al., 1987, 1989). Located at positions -104 (IEBI/NIR) and -233 (IEB2/FAR), these elements share an identical 8 bp sequence, GCCATCTG, which contains a consensus sequence, CANNTG, characteristic of E-box elements (Kingston, 1989). E boxes are present in enhancers from a variety of genes, including immunoglobulin and muscle-specific genes, where they interact with transcription factors containing a helix-loop-helix (HLH) dimerization domain (Murre et al., 1989)."[141]

"The IEB1 box is highly conserved among insulin genes, and is thus likely to play an important role in controlling transcription. The IEB2 site is not well conserved; in the rat insulin 2 gene the equivalent sequence is GCCACCCAGGAG, and in the human insulin gene the homologous sequence, which has been previously designated the GC2 box (Boam et al., 1990a), is GCCACCGG."[141]

"Confirmation that USF bound at the IEB2 site was obtained using an oligonucleotide containing the USF binding site from the adenovirus MLP."[141]

A likely general USF box consensus sequence may be 3'-GCC(A/T)NN(C/G/T)(A/G)-5'.
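
As a rough check on this proposed consensus, the slash notation can be turned into a regular expression and tested against the sequences quoted above. This is a sketch only; consensus_to_regex is a hypothetical helper, and N is taken to mean any of A, C, G or T.

```python
import re

def consensus_to_regex(consensus):
    """Translate e.g. 'GCC(A/T)NN(C/G/T)(A/G)' into a regular expression string."""
    pattern = re.sub(r"\(([ACGT/]+)\)",
                     lambda m: "[" + m.group(1).replace("/", "") + "]",
                     consensus)
    return pattern.replace("N", "[ACGT]")

USF = re.compile(consensus_to_regex("GCC(A/T)NN(C/G/T)(A/G)"))
EBOX = re.compile(consensus_to_regex("CANNTG"))

# Sequences quoted in the text: IEB1/IEB2 (rat insulin I), the human GC2 box,
# and the first eight nucleotides of the rat insulin 2 equivalent.
for site in ("GCCATCTG", "GCCACCGG", "GCCACCCA"):
    print(site, bool(USF.fullmatch(site)), bool(EBOX.search(site)))
```

All three sites fit the proposed USF consensus, but only GCCATCTG also contains a CANNTG E-box, which is why the hits are grouped by the kind of E-box they contain.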

Those containing an E-box (CANNTG)

  1. Negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesUSFbox-+.bas, looking for 3'-GCC(A/T)NN(C/G/T)(A/G)-5': 1, 3'-GCCACATG-5' at 3707.
  2. inverse complement, positive strand, negative direction is SuccessablesUSFboxci+-.bas, looking for 3'-(C/T)(A/C/G)NN(A/T)GGC-5': 1, 3'-CAGATGGC-5' at 3629.
  3. inverse complement, positive strand, positive direction is SuccessablesUSFboxci++.bas, looking for 3'-(C/T)(A/C/G)NN(A/T)GGC-5': 1, 3'-CAGGTGGC-5' at 1845.

Those containing an E-box (GTNNAC)

  1. inverse negative strand, negative direction is SuccessablesUSFboxi--.bas, looking for 3'-(A/G)(C/G/T)NN(A/T)CCG-5': 1, 3'-GTCTACCG-5' at 3629.
  2. inverse negative strand, positive direction is SuccessablesUSFboxi-+.bas, looking for 3'-(A/G)(C/G/T)NN(A/T)CCG-5': 1, 3'-GTCCACCG-5' at 1845.
  3. inverse positive strand, positive direction is SuccessablesUSFboxi++.bas, looking for 3'-(A/G)(C/G/T)NN(A/T)CCG-5': 3, 3'-GTCCACCG-5' at 198, 3'-GTGGACCG-5' at 2570 and 3'-GTAGACCG-5' at 3406.
  4. complement, negative strand, negative direction is SuccessablesUSFboxc--.bas, looking for 3'-CGG(A/T)NN(A/C/G)(C/T)-5': 2, 3'-CGGTCCAC-5' at 2079 and 3'-CGGTCCAC-5' at 3953.
  5. complement, positive strand, positive direction is SuccessablesUSFboxc++.bas, looking for 3'-CGG(A/T)NN(A/C/G)(C/T)-5': 1, 3'-CGGTGTAC-5' at 3707.

V boxes

"As VRI accumulates in the nucleus during the mid to late day, it binds VRI/PDP1ϵ binding sites (V/P-boxes) [consensus V box:A(/G)TTA(/T)T(/C), P box:GTAAT(/C)], to repress Clk and cry transcription (Hardin, 2004)."[126]

In the negative direction (from ZSCAN22 to A1BG) there are up to 81 V boxes, 28 to 4538 nts from ZSCAN22 with the apparent TSS at 4460 nts.

In the positive direction (from ZNF497 to A1BG) there are up to 21 V boxes, 23 to 4310 nts from ZNF497 with the known TSS at 4300 nts.

W boxes

Proximal promoters

Inverse W boxes occur in the negative strand, negative direction of A1BG: 3'-GGTCAA-5' at 4416 and 3'-GGTCAA-5' at 4308.

W boxes occur in the positive direction, positive strand of A1BG: 3'-CTGACC-5' and its complement at 4216, and inverse W boxes occur: 3'-GGTCAG-5' and its complement at 4270.

Distal promoters

A W box, 3'-CTGACC-5', occurs at 3749, whereas 3'-CTGACT-5' at 17, 3'-TTGACT-5' at 130, 3'-TTGACT-5' at 307, and 3'-CTGACC-5' at 734 occur close to ZSCAN22, but 3'-CTGACT-5' at 1935 could be associated with ZSCAN22 or an unknown gene between it and A1BG, along with their complements, negative strand, negative direction.

Inverse complement, positive strand, negative direction there are 5: 3'-GGTCAG-5', 440, 3'-GGTCAG-5', 577, 3'-GGTCAG-5', 713, 3'-GGTCAG-5', 2249, 3'-GGTCAG-5', 2586.

An inverse W box, 3'-GGTCAG-5', occurs at 1353 in the negative direction.

W boxes occur as 3'-AGTCAG-5' at 2101, 3'-GGTCAG-5' at 2221, 3'-AGTCAG-5' at 2608, 3'-AGTCAA-5' at 2614, and 3'-AGTCAG-5' at 2619, along with their complements, in the positive direction.

W boxes in the positive direction occur 3'-CTGACC-5' at 1662, 3'-CTGACC-5' at 2213, 3'-TTGACC-5' at 2873, 3'-CTGACT-5' at 2945, and 3'-TTGACC-5' at 4018 that could be associated with A1BG, along with 3'-TTGACC-5' at 1953, 3'-CTGACT-5' at 2674, and 3'-TTGACT-5' at 3735.

Inverse complement, positive strand, positive direction there are 6: 3'-GGTCAG-5', 2025, 3'-AGTCAG-5', 2099, 3'-GGTCAG-5', 2606, 3'-GGTCAG-5', 2997, 3'-GGTCAG-5', 3083, 3'-GGTCAA-5', 3380.

X core promoter elements

  1. negative strand in the negative direction, looking for 3'-G/A/T-G/C-G-T/C-G-G-G/A-A-G/C-A/C-5': 1, 3'-TGGTGGGACC-5' at 3744 and complement,
  2. inverse complement, positive strand, negative direction, looking for 3'-G/T-G/C-T-C/T-C-C-A/G-C-G/C-C/A/T-5': 1, 3'-GCTCCCACCT-5' at 392 and complement, and
  3. inverse, negative strand, positive direction, looking for 3'-A/C-G/C-A-G/A-G-G-T/C-G-G/C-G/A/T-5': 1, 3'-CCAGGGTGGG-5' at 102.

Z boxes

"The HY5 protein interacts with both the G- (CACGTG) and Z- (ATACGTGT) boxes of the light-regulated promoter of RbcS1A (ribulose bisphosphate carboxylase small subunit) and the CHS (chalcone synthase) genes (Ang et al., 1998; Chattopadhyay et al., 1998; Yadav et al., 2002)."[142]

Z-boxes 1-3 contain 5'-AGGTG-3'.[143]

  1. negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesZbox-+.bas, looking for 3'-A(C/T)A(C/G)GT(A/G)T-5': 1, 3'-ACAGGTGT-5' at 1969 and complement.
  2. positive strand in the positive direction (from ZNF497 to A1BG) is SuccessablesZbox++.bas, looking for 3'-A(C/T)A(C/G)GT(A/G)T-5': 1, 3'-ACACGTGT-5' at 2962 and complement.
  3. inverse complement, positive strand, negative direction is SuccessablesZboxci+-.bas, looking for 3'-A(C/T)AC(C/G)T(A/G)T-5': 3, between 1131 and 3970, and complements.

Response element negative results

Response elements not occurring in promoters near A1BG
Name of elements | Consensus sequences | Testing | Notes
CAAT boxes | 5'-CAAT-3' | 16 | consensus sequence for the CCAAT-enhancer-binding site (C/EBP) is TAGCATT
Cat8ps | 5'-CGGTCCGC-3' | ⌘F | 5'-CGGNBNVMHGGA-3', 5'-CGG(A/C/G/T)(C/G/T)(A/C/G/T)(A/C/G)(A/C)(A/C/T)GGA-3'
Cbf1 regulatory factors | 5'-TCACGTGA-3' | ⌘F | strongly bound Cbf1 motifs enriched at both ends with a "T" on the 5′ and "A" on the 3′ end
CENP-B boxes | 5'-TTTCGTTGGAAGCGGGA-3' | 16 | specifically localized at the centromere
Crz1ps | 5'-TGCGCCCC-3' | ⌘F | 5'-TG(A/C)GCCNC-3'
DNA damage response elements (DREs) | 5'-TAGCCGCCG-3' or 5'-TTTCAAT-3' | ⌘F | in the upstream repression sequence (URS)
DREB boxes | 5'-TACCGACAT-3' | 16 | CRT/DREB box
Forkhead boxes | 5'-(A/G)(C/T)AAA(C/T)A-3' | ⌘F | 5'-GTAAACAA-3' FOXO1
Gal4ps | 5'-CGGACCGC-3' | ⌘F | 5'-CGG(A/G)NN(A/G)C(C/T)N(C/T)NCNCCG-3'
GCN4 motifs | 5'-TGACTCA-3', 5'-TGAGTCA-3' | ⌘F | ACGT motif
Gcn4ps | 5'-ATGACTCTT-3' | ⌘F | GCN4 motifs
GLM boxes | 5′-(G/A)TGA(G/C)TCA(T/C)-3′ | 16 | GCN4-like motif
γ-interferon activated sequences (GAS) | 5'-TTCCTAGAA-3' | ⌘F | ALS-GAS1 between nt −633 and nt −625
Grainy head transcription factor binding sites | 5'-AACCGGTT-3' | ⌘F | also 5'-GACTGGTT-3'
GT boxes | 5'-GGGGTGGGG-3' | ⌘F | (-78 to -69)
Hac1ps | 5'-CAGCGTG-3' | ⌘F | Regulates the unfolded protein response
Heat-responsive elements | 5'-AAAAAATTTC-3' | ⌘F | four nGAAn motifs
HMG boxes | 5'-(A/T)(A/T)CAAAG-3' | ⌘F | two or more HMG boxes
Hybrid C, A boxes | 5'-TGACGTAT-3' | ⌘F | A at the 12 position
Hybrid C, G boxes | 5'-TGACGTGT-3' | ⌘F | G at the 12 position
Hybrid C, T boxes | 5'-TGACGTTA-3' | ⌘F | T at the 12 position
Hypoxia-inducible factors | 5'-GCCCTACGT-3' | ⌘F | composed of HIF-1α and HIF-1β
I boxes | 5'-GATAAG-3' | ⌘F | 5'-GGATGAGATAAGA-3'
Inositol, choline-responsive element | 5'-TYTTCACATGY-3' | ⌘F | 5'-TCTTCAC, TCTTCACAT-3'
L boxes | 5'-TAAATG(A/C/G)A-3' | ⌘F | L1 box
MAREs | 5'-TGCTGA(G/C)TCAGCA-3' | ⌘F | and 5'-TGCTGA(GC/CG)TCAGCA-3'
M boxes | 5'-GTCATGTGCT-3' | ⌘F | upstream of the TATA box
Mcm1 regulatory factors | 5'-(A/C/T)(A/C/T)NC(C/T)(A/C/T)(A/C/T)(A/T)(A/C/T)(A/C/T)N(A/G)(C/G/T)(A/C/T)-3' | ⌘F | Genome-wide determinant search
Met31ps | 5'-AAACTGTGG-3' | ⌘F | Sulfur amino acid metabolism [72]
Middle sporulation elements | 5'-C(A/G)CAAA(A/T)-3' | ⌘F | 5'-ACACAAA-3' (2017)
Motif ten elements | 5'-C-C/G-A-A/G-C-C/G-C/G-A-A-C-G-C/G-3' | 16 | Gene ID: 6309
Ndt80ps | 5'-TCCGCA-3' | ⌘F | 5'-DNCRCAAAW-3'
Nuclear factor Y | 5'-TACCGACAT-3' | ⌘F | NF-Y is a trimeric complex
Nutrient-sensing response element 1 | 5'-GTTTCATCA-3' | ⌘F | only one nucleotide difference between the SESN2 CARE and the ASNS
Oaf1ps | 5'-(A/C/G/T)(A/C/G/T)(A/C/G/T)T(A/C/G/T)A(A/C/G/T)-3' | ⌘F | 5'-CGG(A/C/G/T)3T(A/C/G/T)A(A/C/G/T)9-12CCG-3'
p53 response element | 5'-(A/G)(A/G)(A/G)C(A/T)(A/T)G(C/T)(C/T)(C/T)-3' | ⌘F | (GGGCATGCCT) two closely spaced decameric half-sites
Pdr1p/Pdr3ps | 5'-TCCGCGGA-3' | ⌘F | Pdr1p/Pdr3p response element (PDRE)
Polycomb response elements | 5'-CGCCATTT-3' | ⌘F | closely resembles the extended Pho-Phol consensus sequence
Rap1 regulatory factors | 5'-C(A/C/G)(A/C/G)(A/G)(C/G/T)C(A/C/T)(A/G/T)(C/G/T)(A/G/T)(A/C/G)(A/C)(A/C/T)(A/C/T)-3' | ⌘F | Rap1 (CCCACCAACAAAA) none
Rgt1ps | 5'-CGGACCA-3' | ⌘F | Glucose-responsive transcription factor
Rlm1ps | 5'-CTATATATAG-3' | ⌘F | CTA(T/A)4TAG
Rox1ps | 5'-GGGTAA-3' | ⌘F | Heme-dependent repressor of hypoxic genes [78]
Rpn4ps | 5'-GGTGGCAAA-3' | ⌘F | proteasome genes
Seed-specific elements | 5'-CATGCATG-3' | ⌘F | SRE consensus: 5'-CAGCAGATTGCG-3' is none
Shoot specific elements | 5'-GATAATGATG-3' | ⌘F | SRE consensus: 5'-CAGCAGATTGCG-3' is none
Sip4ps | 5'-CCGTCCGT-3' | ⌘F | 5'-CC(C/G)T(C/T)C(C/G)TCCG-3'
Smp1ps | 5'-ACTACTA-3' | ⌘F | 5'-ACTACTA(T/A)4TAG-3'
Sterol response elements | 5'-TCGTATA-3' | ⌘F | perhaps plant specific
TATCCAC boxes | 5'-TATCCAC-3' | 16 | GA responsive complex component
TCCACCATA elements | 5'-TCCACCATA-3' | ⌘F | adjacent co-dependent regulatory element of POLLEN1
Tec1ps | 5'-GAATGT-3' | ⌘F | Ste12p cofactor
Tetradecanoylphorbol-13-acetate response elements (TREs) | 5'-TGA(G/C)TCA-3' | 16 | cis-regulatory element of the human metallothionein IIa (hMTIIa) promoter and SV40
TGF-β control elements (TCEs) | 5'-GAGTGGGGCG-3' | ⌘F | in mouse and rat, 5'-GCGTGGGGGA-3' in humans
TGF-β inhibitory elements (TIEs) | 5'-GAGTGGTGA-3' | 16 | in the rat transin/stromelysin promoter
Thyroid hormone response elements (TREs) | 5'-AGGTCA-3' | ⌘F | See VDREs, X boxes
Unfolded protein response elements (UPREs) | 5'-TGACGTG(G/A)-3' | ⌘F | XBP1 binds to UPRE
Vhr1ps | 5'-AATCA-N8-TGA(C/T)T-3' | ⌘F | Response to low biotin [71] concentrations
Vitamin D response elements (VDREs) | 5'-(A/G)G(G/T)(G/T)CA-3' | ⌘F | 5'-AGGTCA-3' not ⌘F
X boxes | 5'-GTTGGCATGGCAAC-3' | 16 | X2 box is 5'-AGGTCCA-3' not ⌘F
Xbp1ps | 5'-GcCTCGA(G/A)G(C/A)g(a/g)-3' | ⌘F | Transcriptional repressor
Xenobiotic response elements (XREs) | 5'-(T/G)NGCGTG(A/C)(G/C)A-3' | ⌘F | contains the core sequence 5'-GCGTG-3'
Yap1p,2ps | 5'-TTACTAA-3' | ⌘F | Yap1p binding sites
Y boxes | 5'-(A/G)CTAACC(A/G)(A/G)(C/T)-3' | 16 | inverted CAAT box
Zap1ps | 5'-ACCCTCA-3' | ⌘F | 5'-ACC(C/T)(C/T)(A/C/G/T)AAGGT-3'
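
Several entries in this table give a consensus in single-letter IUPAC ambiguity codes (for example 5'-CGGNBNVMHGGA-3', 5'-TYTTCACATGY-3', 5'-DNCRCAAAW-3') next to, or instead of, the slash notation. The following sketch expands the codes into slash notation using the standard IUPAC nucleotide table; the function name expand is illustrative only.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(iupac_motif):
    """Rewrite an IUPAC-coded motif in the (X/Y) slash notation used in this article."""
    parts = []
    for code in iupac_motif.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "(" + "/".join(bases) + ")")
    return "".join(parts)

print(expand("CGGNBNVMHGGA"))  # CGG(A/C/G/T)(C/G/T)(A/C/G/T)(A/C/G)(A/C)(A/C/T)GGA
print(expand("TYTTCACATGY"))   # T(C/T)TTCACATG(C/T)
print(expand("DNCRCAAAW"))     # (A/G/T)(A/C/G/T)C(A/G)CAAA(A/T)
```

The first line of output matches the expanded form given in the Cat8ps row.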

Hypotheses

  1. Downstream core promoters may work as transcription factors even as their complements or inverses.
  2. In addition to the DNA binding sequences listed above, the transcription factors that can open up and attach through the local epigenome need to be known and specified.
  3. Each DNA binding domain serving as a transcription factor for the promoter of any immunoglobulin supergene family member, also serves or is present in the promoters for A1BG.
  4. The function of A1BG is the same as other immunoglobulin genes possessing the immunoglobulin domain cl11960 and/or any of three immunoglobulin-like domains: pfam13895, cd05751 and smart00410 in the order and nucleotide sequence: cd05751 Location: 401 → 493, smart00410 Location: 218 → 280, pfam13895 Location: 210 → 301 and cl11960 Location: 28 → 110.

See also

References

  1. "Entrez Gene: Alpha-1-B glycoprotein". Retrieved 2012-11-09.
  2. 2.0 2.1 "A1BG alpha-1-B glycoprotein". Retrieved May 10, 2013.
  3. 3.0 3.1 3.2 Qingliang Li, Rezaul M. Karim, Mo Cheng, Mousumi Das, Lihong Chen, Chen Zhang, Harshani R. Lawrence, Gary W. Daughdrill, Ernst Schonbrunn, Haitao Ji and Jiandong Chen (July 2020). "Inhibition of p53 DNA binding by a small molecule protects mice from radiation toxicity". Oncogene. 39 (29): 5187–5200. doi:10.1038/s41388-020-1344-y. PMID 32555331. Retrieved 29 August 2020.
  4. 4.0 4.1 4.2 4.3 Ruoyi Gu, Jun Xu, Yixiang Lin, Jing Zhang, Huijun Wang, Wei Sheng, Duan Ma, Xiaojing Ma & Guoying Huang (July 2016). "Liganded retinoic acid X receptor α represses connexin 43 through a potential retinoic acid response element in the promoter region". Pediatric Research. 80 (1): 159–168. doi:10.1038/pr.2016.47. PMID 26991262. Retrieved 7 September 2020.
  5. 5.0 5.1 HGNC (13 March 2020). "ZSCAN22 zinc finger and SCAN domain containing 22 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  6. 6.0 6.1 RefSeq (10 September 2009). "MIR6806 microRNA 6806 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  7. Jag123 (7 March 2005). "antigen". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  8. SemperBlotto (21 April 2008). "immunogen". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 8 March 2020.
  9. 9.0 9.1 9.2 C. Michael Gibson (27 April 2008). "Antigen". Boston, Massachusetts: WikiDoc Foundation. Retrieved 8 March 2020.
  10. Williamsayers79 (26 February 2007). "antibody". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  11. Jag123 (7 March 2005). "antibody". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  12. Eleonora Market, F. Nina Papavasiliou (2003). "V(D)J Recombination and the Evolution of the Adaptive Immune System". PLoS Biology. 1 (1): e16. doi:10.1371/journal.pbio.0000016.
  13. Charles A Janeway, Jr, Paul Travers, Mark Walport, and Mark J Shlomchik (2001). Immunobiology (5th ed.). Garland Publishing. ISBN 0-8153-3642-X.
  14. SemperBlotto (25 February 2006). "immunoglobulin". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  15. SemperBlotto (28 April 2008). "immunoglobulin". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  16. 16.0 16.1 16.2 16.3 RefSeq (10 December 2019). "A1BG alpha-1-B glycoprotein [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  17. Mei Tian, Ya-Zhou Cui, Guan-Hua Song, Mei-Juan Zong, Xiao-Yan Zhou, Yu Chen, Jin-Xiang Han (2008). "Proteomic analysis identifies MMP-9, DJ-1 and A1BG as overexpressed proteins in pancreatic juice from pancreatic ductal adenocarcinoma patients". BMC Cancer. 8: 241. doi:10.1186/1471-2407-8-241. PMC 2528014. PMID 18706098.
  18. 18.0 18.1 18.2 18.3 18.4 18.5 18.6 Noriaki Ishioka, Nobuhiro Takahashi, and Frank W. Putnam (April 1986). "Amino acid sequence of human plasma 𝛂1B-glycoprotein: Homology to the immunoglobulin supergene family" (PDF). Proceedings of the National Academy of Sciences USA. 83 (8): 2363–7. doi:10.1073/pnas.83.8.2363. PMID 3458201. Retrieved 9 March 2020.
  19. 19.0 19.1 Katrina M. Morris, Denis O’Meally, Thiri Zaw, Xiaomin Song, Amber Gillett, Mark P. Molloy, Adam Polkinghorne, and Katherine Belova (7 October 2016). "Characterisation of the immune compounds in koala milk using a combined transcriptomic and proteomic approach". Scientific Reports. 6: 35011. doi:10.1038/srep35011. PMID 27713568. Retrieved 14 March 2020.
  20. R. J. Paxton, G. Mooser, H. Pande, T. D. Lee, and J. E. Shively (1 February 1987). "Sequence analysis of carcinoembryonic antigen: identification of glycosylation sites and homology with the immunoglobulin supergene family" (PDF). Proceedings of the National Academy of Sciences USA. 84 (4): 920–924. doi:10.1073/pnas.84.4.920. PMID 3469650. Retrieved 26 March 2020.
  21. NCBI (2 February 2016). "Conserved Protein Domain Family cl11960: Ig Superfamily". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 22 May 2020.
  22. NCBI (5 August 2015). "Conserved Protein Domain Family pfam13895: Ig_2". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 24 May 2020.
  23. NCBI (16 August 2016). "Conserved Protein Domain Family cd05751: Ig1_LILR_KIR_like". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 24 May 2020.
  24. NCBI (16 January 2013). "Conserved Protein Domain Family smart00410: IG_like". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 24 May 2020.
  25. 24.98.118.180 (28 February 2007). "species". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  26. 26.0 26.1 Peter coxhead (22 August 2018). "Species". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  27. Chiswick Chap (1 December 2016). "Species". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  28. 28.0 28.1 28.2 28.3 "AceView: A1BG". Retrieved May 11, 2013.
  29. Pdeitiker (26 July 2008). "variant". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  30. SemperBlotto (6 January 2007). "isoform". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2 December 2018.
  31. 72.178.245.181 (30 November 2008). "isoform". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2 December 2018.
  32. H Eiberg, ML Bisgaard, J Mohr (1 December 1989). "Linkage between alpha 1B-glycoprotein (A1BG) and Lutheran (LU) red blood group system: assignment to chromosome 19: new genetic variants of A1BG". Clinical genetics. 36 (6): 415–8. PMID 2591067. Retrieved 2017-10-08.
  33. John R. Stehle Jr., Mark E. Weeks, Kai Lin, Mark C. Willingham, Amy M. Hicks, John F. Timms, Zheng Cui (January 2007). "Mass spectrometry identification of circulating alpha-1-B glycoprotein, increased in aged female C57BL/6 mice". Biochimica et Biophysica Acta (BBA) - General Subjects. 1770 (1): 79–86. doi:10.1016/j.bbagen.2006.06.020. PMID 16945486. Retrieved 2017-10-08.
  34. 34.0 34.1 34.2 34.3 34.4 Caitrin W. McDonough, Yan Gong, Sandosh Padmanabhan, Ben Burkley, Taimour Y. Langaee, Olle Melander, Carl J. Pepine, Anna F. Dominiczak, Rhonda M. Cooper-DeHoff, and Julie A. Johnson (June 2013). "Pharmacogenomic Association of Nonsynonymous SNPs in SIGLEC12, A1BG, and the Selectin Region and Cardiovascular Outcomes" (PDF). Hypertension. 62 (1): 48–54. doi:10.1161/HYPERTENSIONAHA.111.00823. PMID 23690342. Retrieved 2017-10-08.
  35. DTLHS (10 January 2018). "genotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  36. SemperBlotto (22 October 2005). "genotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  37. Widsith (28 March 2012). "polymorphism". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  38. 217.105.66.98 (8 September 2016). "allele". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  39. 138.130.33.215 (7 April 2004). "allele". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  40. 40.0 40.1 B. Gahne, R. K. Juneja, and A. Stratil (June 1987). "Genetic polymorphism of human plasma alpha 1B-glycoprotein: phenotyping by immunoblotting or by a simple method of 2-D electrophoresis". Human Genetics. 76 (2): 111–5. doi:10.1007/bf00284904. PMID 3610142. Retrieved 25 March 2020.
  41. R.K. Juneja, G. Beckman, M. Lukka, B. Gahne, and C. Ehnholm (1989). "Plasma α1B-Glycoprotein Allele Frequencies in Finns and Swedish Lapps: Evidence for a New α1B Allele". Human Heredity. 39 (1): 32–36. doi:10.1159/000153828. PMID 2759622. Retrieved 25 March 2020.
  42. 42.0 42.1 R.K. Juneja, N. Saha, B. Gahne and J.S.H. Tay (1989). "Distribution of Plasma Alpha-1-B-Glycoprotein Phenotypes in Several Mongoloid Populations of East Asia". Human Heredity. 39: 218–222. doi:10.1159/000153863. PMID 2583734. Retrieved 25 March 2020.
  43. 24.235.196.118 (23 September 2007). "phenotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-10-04.
  44. SemperBlotto (14 February 2005). "phenotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-10-04.
  45. N2e (3 July 2008). "phenotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-10-04.
  46. Mardiaty Iryani Abdullah, Ching Chin Lee, Sarni Mat Junit, Khoon Leong Ng, and Onn Haji Hashim (13 September 2016). "Tissue and serum samples of patients with papillary thyroid cancer with and without benign background demonstrate different altered expression of proteins". Peer J. 4: e2450. doi:10.7717/peerj.2450. PMID 27672505. Retrieved 15 March 2020.
  47. 47.0 47.1 47.2 47.3 Udby L, Sørensen OE, Pass J, Johnsen AH, Behrendt N, Borregaard N, Kjeldsen L (12 October 2004). "Cysteine-rich secretory protein 3 is a ligand of alpha1B-glycoprotein in human plasma". Biochemistry. 43 (40): 12877–86. doi:10.1021/bi048823e. PMID 15461460. Retrieved 2011-11-28.
  48. "The Opossum: Our Marvelous Marsupial, The Social Loner". Wildlife Rescue League.
  49. Journal Of Venomous Animals And Toxins – Anti-Lethal Factor From Opossum Serum Is A Potent Antidote For Animal, Plant And Bacterial Toxins. Retrieved 2009-12-29.
  50. 50.0 50.1 B Haendler, J Krätzschmar, F Theuring and W D Schleuning (July 1993). "Transcripts for cysteine-rich secretory protein-1 (CRISP-1; DE/AEG) and the novel related CRISP-3 are expressed under androgen control in the mouse salivary gland". Endocrinology. 133 (1): 192–8. doi:10.1210/en.133.1.192. PMID 8319566. Retrieved 2012-02-20.
  51. 51.0 51.1 HGNC (10 December 2019). "A1BG-AS1 A1BG antisense RNA 1 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  52. 52.0 52.1 52.2 52.3 HGNC (10 December 2019). "ZNF497 zinc finger protein 497 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  53. 53.0 53.1 HGNC (10 December 2019). "LOC100419840 zinc finger protein 446 pseudogene [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  54. 54.0 54.1 HGNC (10 December 2019). "LOC105372483 uncharacterized LOC105372483 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  55. 55.0 55.1 HGNC (10 December 2019). "RNA5SP473 RNA, 5S ribosomal pseudogene 473 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  56. Francis S Collins, Eric D Green, Alan E Guttmacher, Mark S Guyer (24 April 2003). "A vision for the future of genomics research". Nature. 422 (6934): 835–47. doi:10.1038/nature01626. PMID 12695777. Retrieved 9 August 2020.
  57. The ENCODE Project Consortium (22 October 2004). "The ENCODE (ENCyclopedia of DNA Elements) Project". Science. 306 (5696): 636–640. doi:10.1126/science.1105136. PMID 15499007. Retrieved 9 August 2020.
  58. The ENCODE Project Consortium (14 June 2007). "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project". Nature. 447 (7146): 799–816. doi:10.1038/nature05874. PMID 17571346. Retrieved 9 August 2020.
  59. 59.0 59.1 Ya-Mei Wang, Ping Zhou, Li-Yong Wang, Zhen-Hua Li, Yao-Nan Zhang, and Yu-Xiang Zhang (10 August 2012). "Correlation Between DNase I Hypersensitive Site Distribution and Gene Expression in HeLa S3 Cells". PLoS One. 7 (8): e42414. doi:10.1371/journal.pone.0042414. PMID 22900019. Retrieved 9 August 2020.
  60. MeSH (8 July 2008). "Response Elements". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894: National Institutes of Health, Health & Human Services. Retrieved 2 September 2020.
  61. 61.0 61.1 RefSeq (November 2019). "LOC116286197 CRISPRi-validated cis-regulatory element chr19.6329 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 25 July 2020.
  62. RefSeq (February 2016). "ZNF582 zinc finger protein 582 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 28 May 2020.
  63. RefSeq (June 2018). "LOC112553117 Sharpr-MPRA regulatory region 1998 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 25 July 2020.
  64. RefSeq (June 2018). "Sharpr-MPRA regulatory region 10473 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 16 July 2020.
  65. RefSeq (June 2018). "Sharpr-MPRA regulatory region 7872 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 1 August 2020.
  66. RefSeq (June 2018). "Sharpr-MPRA regulatory region 9894 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 16 July 2020.
  67. 67.0 67.1 Tara L. Conforto, Yijing Zhang, Jennifer Sherman, and David J. Waxman (November 2012). "Impact of CUX2 on the Female Mouse Liver Transcriptome: Activation of Female-Biased Genes and Repression of Male-Biased Genes" (PDF). Molecular and Cellular Biology. 32 (22): 4611–4627. doi:10.1128/MCB.00886-12. PMID 22966202. Retrieved 8 August 2020.
  68. Muhammad Asad Ullah Asad, Shamsu Ado Zakari, Qian Zhao, Lujian Zhou, Yu Ye and Fangmin Cheng (10 January 2019). "Abiotic Stresses Intervene with ABA Signaling to Induce Destructive Metabolic Pathways Leading to Death: Premature Leaf Senescence in Plants". International Journal of Molecular Sciences. 20 (2): 256–278. doi:10.3390/ijms20020256. PMID 30634648. Retrieved 27 August 2020.
  69. 69.0 69.1 69.2 69.3 69.4 Matthew J. Rossi, William K.M. Lai and B. Franklin Pugh (21 March 2018). "Genome-wide determinants of sequence-specific DNA binding of general regulatory factors". Genome Research. 28: 497–508. doi:10.1101/gr.229518.117. PMID 29563167. Retrieved 31 August 2020.
  70. 70.0 70.1 70.2 70.3 Landschulz WH, Johnson PF, McKnight SL (June 1988). "The leucine zipper: a hypothetical structure common to a new class of DNA binding proteins". Science. 240 (4860): 1759–64. Bibcode:1988Sci...240.1759L. doi:10.1126/science.3289117. PMID 3289117.
  71. 71.0 71.1 71.2 71.3 Z G E, Zhang YP, Zhou JH, Wang L (April 2014). "Mini review roles of the bZIP gene family in rice". Genetics and Molecular Research. 13 (2): 3025–36. doi:10.4238/2014.April.16.11. PMID 24782137.
  72. 72.0 72.1 72.2 72.3 Nijhawan A, Jain M, Tyagi AK, Khurana JP (February 2008). "Genomic survey and gene expression analysis of the basic leucine zipper transcription factor family in rice". Plant Physiology. 146 (2): 333–50. doi:10.1104/pp.107.112821. PMC 2245831. PMID 18065552.
  73. Keiko Kokoroishi, Ayumu Nakashima, Shigehiro Doi, Toshinori Ueno, Toshiki Doi, Yukio Yokoyama, Kiyomasa Honda, Masami Kanawa, Yukio Kato, Nobuoki Kohno & Takao Masaki (28 May 2015). "High glucose promotes TGF-β1 production by inducing FOS expression in human peritoneal mesothelial cells". Clinical and Experimental Nephrology. 20 (1): 30–8. doi:10.1007/s10157-015-1128-9. PMID 26018137. Retrieved 14 August 2020.
  74. 74.00 74.01 74.02 74.03 74.04 74.05 74.06 74.07 74.08 74.09 74.10 74.11 Bhaskar Sharma & Joemar Taganna (12 June 2020). "Genome-wide analysis of the U-box E3 ubiquitin ligase enzyme gene family in tomato". Scientific Reports. 10 (9581). doi:10.1038/s41598-020-66553-1. PMID 32533036. Retrieved 27 August 2020.
  75. James R. Mitchell, Jeffrey Cheng, and Kathleen Collins (January 1999). "A Box H/ACA Small Nucleolar RNA-Like Domain at the Human Telomerase RNA 3' End" (PDF). Molecular and Cellular Biology. 19 (1): 567–576. doi:10.1128/mcb.19.1.567. PMID 9858580. Retrieved 5 November 2018.
  76. 76.0 76.1 Ulrike Hartmann, Martin Sagasser, Frank Mehrtens, Ralf Stracke and Bernd Weisshaar (January 2005). "Differential combinatorial interactions of cis-acting elements recognized by R2R3-MYB, BZIP, and BHLH factors control light-responsive and tissue-specific activation of phenylpropanoid biosynthesis genes" (PDF). Plant Molecular Biology. 57 (2): 155–171. doi:10.1007/s11103-004-6910-0. PMID 15821875. Retrieved 10 November 2018.
  77. 77.0 77.1 77.2 Yao EF, Denison MS (June 1992). "DNA sequence determinants for binding of transformed Ah receptor to a dioxin-responsive enhancer". Biochemistry. 31 (21): 5060–7. doi:10.1021/bi00136a019. PMID 1318077.
  78. Isabelle R. Cohen, Susanne Grässel, Alan D. Murdoch, and Renat V. Iozzo (1 November 1993). "Structural characterization of the complete human perlecan gene and its promoter" (PDF). Proceedings of the National Academy of Sciences USA. 90 (21): 10404–10408. doi:10.1073/pnas.90.21.10404. PMID 8234307. Retrieved 6 September 2020.
  79. 79.0 79.1 79.2 Alisa A. Garaeva, Irina E. Kovaleva, Peter M. Chumakov & Alexandra G. Evstafieva (15 January 2016). "Mitochondrial dysfunction induces SESN2 gene expression through Activating Transcription Factor 4". Cell Cycle. 15 (1): 64–71. doi:10.1080/15384101.2015.1120929. PMID 26771712. Retrieved 5 September 2020.
  80. 80.0 80.1 Thomas D. Burton, Anthony O. Fedele, Jianling Xie, Lauren Sandeman and Christopher G. Proud (22 May 2020). "The gene for the lysosomal protein LAMP3 is a direct target of the transcription factor ATF4" (PDF). Journal of Biological Chemistry. 295 (21): 7418. doi:10.1074/jbc.RA119.011864. PMID 32312748. Retrieved 5 September 2020.
  81. 81.0 81.1 81.2 81.3 81.4 81.5 81.6 81.7 81.8 Hongting Tang, Yanling Wu, Jiliang Deng, Nanzhu Chen, Zhaohui Zheng, Yongjun Wei, Xiaozhou Luo, and Jay D. Keasling (6 August 2020). "Promoter Architecture and Promoter Engineering in Saccharomyces cerevisiae". Metabolites. 10 (8): 320–39. doi:10.3390/metabo10080320. PMID 32781665. Retrieved 18 September 2020.
  82. Noriyuki Sato; Tomohiro Katsuya; Hiromi Rakugi; Seiju Takami; Yukiko Nakata; Tetsuro Miki; Jitsuo Higaki; Toshio Ogihara (September 1997). "Association of Variants in Critical Core Promoter Element of Angiotensinogen Gene With Increased Risk of Essential Hypertension in Japanese". Hypertension. 30 (3 Pt 1): 321–5. doi:10.1161/01.HYP.30.3.321. PMID 9314411. Retrieved 2012-02-20.
  83. Kazuyuki Yanai, Tomoko Saito, Keiko Hirota, Hideyuki Kobayashi, Kazuo Murakami and Akiyoshi Fukamizu (28 November 1997). "Molecular Variation of the Human Angiotensinogen Core Promoter Element Located between the TATA Box and Transcription Initiation Site Affects Its Transcriptional Activity". The Journal of Biological Chemistry. 272 (48): 30558–62. PMID 9374551. Retrieved 2012-02-20.
  84. 84.0 84.1 84.2 84.3 84.4 Albena T. Dinkova‐Kostova, Rumen V. Kostov and Aleksey G. Kazantsev (11 January 2018). "The role of Nrf2 signaling in counteracting neurodegenerative diseases". The FEBS Journal. 285 (19). doi:10.1111/febs.14379. PMID 29323772. Retrieved 21 August 2020.
  85. 85.0 85.1 Akihito Otsuki, Mikiko Suzuki, Fumiki Katsuoka, Kouhei Tsuchida, Hiromi Suda, Masanobu Morita, Ritsuko Shimizu, Masayuki Yamamoto (February 2016). "Unique cistrome defined as CsMBE is strictly required for Nrf2-sMaf heterodimer function in cytoprotection". Free Radical Biology and Medicine. 91: 45–57. doi:10.1016/j.freeradbiomed.2015.12.005. PMID 26677805. Retrieved 21 August 2020.
  86. 86.0 86.1 86.2 86.3 Arnaud Stigliani, Raquel Martin-Arevalillo, Jérémy Lucas, Adrien Bessy, Thomas Vinos-Poyo, Victoria Mironova, Teva Vernoux, Renaud Dumas and François Parcy (3 June 2019). "Capturing Auxin Response Factors Syntax Using DNA Binding Models". Molecular Plant. 12 (6): 822–832. doi:10.1016/j.molp.2018.09.010. PMID 30336329. Retrieved 29 August 2020.
  87. 87.0 87.1 87.2 PA Johnson, D Bunick, NB Hecht (1991). "Protein Binding Regions in the Mouse and Rat Protamine-2 Genes" (PDF). Biology of Reproduction. 44 (1): 127–134. doi:10.1095/biolreprod44.1.127. PMID 2015343. Retrieved 6 April 2019.
  88. Amber Paratore Sanchez and Kumar Sharma (July 2009). "Transcription factors in the pathogenesis of diabetic nephropathy". Expert Reviews in Molecular Medicine. 11: e13. doi:10.1017/S1462399409001057. PMID 19397838. Retrieved 1 October 2018.
  89. 89.0 89.1 89.2 Andreas Schlundt, Sophie Buchner, Robert Janowski, Thomas Heydenreich, Ralf Heermann, Jürgen Lassak, Arie Geerlof, Ralf Stehle, Dierk Niessing, Kirsten Jung & Michael Sattler (21 April 2017). "Structure-function analysis of the DNA-binding domain of a transmembrane transcriptional activator". Scientific Reports. 7: 1051. doi:10.1038/s41598-017-01031-9. PMID 28432336. Retrieved 28 August 2020.
  90. Xu Tao, Anne E. West, Wen G. Chen, Gabriel Corfas, Michael E. Greenberg (2002). "A calcium-responsive transcription factor, CaRF, that regulates neuronal activity-dependent expression of BDNF". Neuron. 33: 383–95. doi:10.1016/S0896-6273(01)00561-X. PMID 11832226. Retrieved 2 September 2020.
  91. Masaki Fujisawa, Toshitsugu Nakano, Yoko Shima and Yasuhiro Ito (5 February 2013). "A large-scale identification of direct targets of the tomato MADS box transcription factor RIPENING INHIBITOR reveals the regulation of fruit ripening". The Plant Cell. 25 (2): 371–86. doi:10.1105/tpc.112.108118. PMID 23386264. Retrieved 2017-02-19.
  92. 92.0 92.1 92.2 Young Hun Song, Cheol Min Yoo, An Pio Hong, Seong Hee Kim, Hee Jeong Jeong, Su Young Shin, Hye Jin Kim, Dae-Jin Yun, Chae Oh Lim, Jeong Dong Bahk, Sang Yeol Lee, Ron T. Nagao, Joe L. Key, and Jong Chan Hong (April 2008). "DNA-Binding Study Identifies C-Box and Hybrid C/G-Box or C/A-Box Motifs as High-Affinity Binding Sites for STF1 and LONG HYPOCOTYL5 Proteins" (PDF). Plant Physiology. 146 (4): 1862–1877. doi:10.1104/pp.107.113217. PMID 18287490. Retrieved 26 March 2019.
  93. E. N. Voronina, T. D. Kolokol’tsova, E. A. Nechaeva, and M. L. Filipenko (2003). "Structural–Functional Analysis of the Human Gene for Ribosomal Protein L11" (PDF). Molecular Biology. 37 (3): 362–371. Retrieved 11 April 2019.
  94. RefSeq (September 2011). "NFIA nuclear factor I A [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 4 May 2020.
  95. 95.0 95.1 95.2 Ravi P. Misra, Azad Bonni, Cindy K. Miranti, Victor M. Rivera, Morgan Sheng, and Michael E. Greenberg (14 October 1994). "L-type Voltage-sensitive Calcium Channel Activation Stimulates Gene Expression by a Serum Response Factor-dependent Pathway" (PDF). The Journal of Biological Chemistry. 269 (41): 25483–25493. PMID 7929249. Retrieved 7 December 2019.
  96. Björn Pietzenuk, Catarine Markus, Hervé Gaubert, Navratan Bagwan, Aldo Merotto, Etienne Bucher & Ales Pecinka (11 October 2016). "Recurrent evolution of heat-responsiveness in Brassicaceae COPIA elements". Genome Biology. 17: 209. doi:10.1186/s13059-016-1072-3. Retrieved 14 September 2020.
  97. 97.0 97.1 Marc R. Montminy, Kevin A. Sevarino, John A. Wagner, Gail Mandel, and Richard H. Goodman (September 1986). "Identification of a cyclic-AMP-responsive element within the rat somatostatin gene" (PDF). Proceedings of the National Academy of Sciences of the USA. 83 (18): 6382–6. PMID 2875459. Retrieved 17 September 2018.
  98. Hideharu Hashimoto, Dongxue Wang, John R. Horton, Xing Zhang, Victor G. Corces and Xiaodong Cheng (1 June 2017). "Structural Basis for the Versatile and Methylation-Dependent Binding of CTCF to DNA". Molecular Cell. 66 (5): 711–720.e3. doi:10.1016/j.molcel.2017.05.004. PMID 28529057. Retrieved 28 August 2020.
  99. Yan-Hui Li and Gai-Gai Zhang (12 April 2016). "Towards understanding the lifespan extension by reduced insulin signaling: bioinformatics analysis of DAF-16/FOXO direct targets in Caenorhabditis elegans". Oncotarget. 7 (15): 19185–19192. doi:10.18632/oncotarget.8313. PMID 2702736. Retrieved 27 August 2020.
  100. Fumiko Hirose, Masamitsu Yamaguchi, Akio Matsukage (September 1999). "Targeted Expression of the DNA Binding Domain of DRE-Binding Factor, a Drosophila Transcription Factor, Attenuates DNA Replication of the Salivary Gland and Eye Imaginal Disc". Molecular and Cellular Biology. 19 (9): 6020–6028. doi:10.1128/MCB.19.9.6020. PMID 10454549. Retrieved 4 September 2020.
  101. 101.0 101.1 101.2 101.3 101.4 101.5 Masaru Motojima, Takao Ando and Toshimasa Yoshioka (10 July 2000). "Sp1-like activity mediates angiotensin-II-induced plasminogen-activator inhibitor type-1 (PAI-1) gene expression in mesangial cells" (PDF). Biochemical Journal. 349 (2): 435–441. doi:10.1042/0264-6021:3490435. PMID 10880342. Retrieved 13 August 2020.
  102. Jae-Seon So (31 August 2018). "Roles of Endoplasmic Reticulum Stress in Immune Responses". Molecules and Cells. 41 (8): 705–16. doi:10.14348/molcells.2018.0241. PMID 30078231. Retrieved 5 September 2020.
  103. 103.0 103.1 Robert Clifford, Min-Ho Lee, Sudhir Nayak, Mitsue Ohmachi, Flav Giorgini and Tim Schedl (December 2000). "FOG-2, a novel F-box containing protein, associates with the GLD-1 RNA binding protein and directs male sex determination in the C. elegans hermaphrodite germline" (PDF). Development. 127 (24): 5265–76. PMID 11076749. Retrieved 10 August 2020.
  104. Ou, Young; Rattner, J.B. (2004). "The Centrosome in Higher Organisms: Structure, Composition, and Duplication". International Review of Cytology. 238: 119–182. doi:10.1016/s0074-7696(04)38003-4. ISBN 978-0-12-364642-2. PMID 15364198.
  105. 105.0 105.1 105.2 105.3 105.4 105.5 105.6 Zi-Wei Ye, Jie Xu, Jianxin Shi, Dabing Zhang and Mee-Len Chye (January 2017). "Kelch-motif containing acyl-CoA binding proteins AtACBP4 and AtACBP5 are differentially expressed and function in floral lipid metabolism" (PDF). Plant Molecular Biology. 93: 209–225. doi:10.1007/s11103-016-0557-5. PMID 27826761. Retrieved 7 May 2020.
  106. 106.0 106.1 106.2 K Oeda, J Salinas, and N H Chua (1991). "A tobacco bZip transcription activator (TAF-1) binds to a G-box-like motif conserved in plant genes". The EMBO Journal. 10 (7): 1793–1802. PMID 2050116. Retrieved 2017-02-13.
  107. Gary J. Loake, Ouriel Faktor, Christopher J. Lamb, and Richard A. Dixon (October 1992). "Combination of H-box [CCTACC(N)7CT] and G-box (CACGTG) cis elements is necessary for feed-forward stimulation of a chalcone synthase promoter by the phenylpropanoid-pathway intermediate p-coumaric acid" (PDF). Proceedings of the National Academy of Sciences USA. 89: 9230–4. PMID 1409628. Retrieved 5 May 2020.
  108. 108.0 108.1 108.2 108.3 108.4 108.5 108.6 Hiroshi Sato, Megumi Kita, and Motoharu Seiki (5 November 1993). "v-Src Activates the Expression of 92-kDa Type IV Collagenase Gene through the AP-1 Site and the GT Box Homologous to Retinoblastoma Control Elements" (PDF). The Journal of Biological Chemistry. 268 (31): 23460–8. PMID 8226872. Retrieved 13 August 2020.
  109. Nicholas V Parsonnet, Nickolaus C Lammer, Zachariah E Holmes, Robert T Batey, Deborah S Wuttke (5 September 2019). "The glucocorticoid receptor DNA-binding domain recognizes RNA hairpin structures with high affinity". Nucleic Acids Research. 47 (15): 8180–8192. doi:10.1093/nar/gkz486. PMID 31147715. Retrieved 28 August 2020.
  110. Vincent Laudet, Dominique Stehelin and Hans Clevers (1993). "Ancestry and diversity of the HMG box superfamily" (PDF). Nucleic Acids Research. 21 (10): 2493–501. doi:10.1093/nar/21.10.2493. PMID 8506143. Retrieved 2017-04-05.
  111. RefSeq (April 2015). HNF1A HNF1 homeobox A [ Homo sapiens (human) ]. 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 7 November 2018.
  112. Cissi Gardmo and Agneta Mode (1 December 2006). "In vivo transfection of rat liver discloses binding sites conveying GH-dependent and female-specific gene expression". Journal of Molecular Endocrinology. 37 (3): 433–441. doi:10.1677/jme.1.02116. PMID 17170084. Retrieved 2017-09-01.
  113. Anna Kalousová, Vladimír Beneš, Jan Pačes, Václav Pačes and Zbyněk Kozmik (June 1999). "DNA Binding and Transactivating Properties of the Paired and Homeobox Protein Pax4". Biochemical and Biophysical Research Communications. 259 (3): 510–518. PMID 10364449. Retrieved 6 May 2020.
  114. G. Damante, D. Fabbro, L. Pelizari, D. Civitareale, S. Guazzi, M. Polycarpou-Schwartz, S. Cauci, F. Quadrifoglio, S. Formisano and R. Di Lauro (20 June 1994). "Sequence-specific DNA recognition by the thyroid transcription factor-1 homeodomain" (PDF). Nucleic Acids Research. 22 (15): 3075–83. doi:10.1093/nar/22.15.3075. PMID 7915030. Retrieved 6 May 2020.
  115. Young Jin Kim, Dong Gwan Kim, Sun Hi Lee and Incheol Lee (February 2006). "Wound-induced expression of the ferulate 5-hydroxylase gene in Camptotheca acuminata". Biochimica et Biophysica Acta (BBA) - General Subjects. 1760 (2): 182–190. doi:10.1016/j.bbagen.2005.08.015. PMID 16332414. Retrieved 9 September 2020.
  116. 116.0 116.1 Klaudia Kulczynska, James J. Bieker, Miroslawa Siatecka (12 February 2020). "A Krüppel-like factor 1 (KLF1) Mutation Associated with Severe Congenital Dyserythropoietic Anemia Alters Its DNA-Binding Specificity". Molecular and Cellular Biology. 40 (5): e00444–19. doi:10.1128/MCB.00444-19. PMID 31818881.
  117. Paul J Rushton and Imre E Somssich (August 1998). "Transcriptional control of plant genes responsive to pathogens" (PDF). Current Opinion in Plant Biology. 1 (4): 311–5. doi:10.1016/1369-5266(88)80052-9. PMID 10066598. Retrieved 5 November 2018.
  118. 118.0 118.1 Potthoff MJ, Olson EN (December 2007). "MEF2: a central regulator of diverse developmental programs". Development. 134 (23): 4131–40. doi:10.1242/dev.008367. PMID 17959722.
  119. 119.0 119.1 119.2 Ayisha Zia, Muhammad Imran, and Sajid Rashid (7 February 2020). "In Silico Exploration of Conformational Dynamics and Novel Inhibitors for Targeting MEF2-Associated Transcriptional Activity". Journal of Chemical Information and Modeling. 60 (3): 1892–1909. doi:10.1021/acs.jcim.0c00008. Retrieved 10 September 2020.
  120. Patrick Cramer, Christopher J. Larson, Gregory L. Verdine and Christoph W. Müller (1 December 1997). "Structure of the human NF‐κB p52 homodimer‐DNA complex at 2.1 Å resolution". The EMBO Journal. 16 (23): 7078–90. doi:10.1093/emboj/16.23.7078. PMID 9384586. Retrieved 3 May 2020.
  121. Jugnu Jain, Emmanuel Burgeon, Tina M. Badalian, Patrick G. Hogan and Anjana Rao (24 February 1995). "A Similar DNA-binding Motif in NFAT Family Proteins and the Rel Homology Region" (PDF). Journal of Biological Chemistry. 270 (8): 4138–4145. doi:10.1074/jbc.270.8.4138. PMID 7876165. Retrieved 15 August 2020.
  122. Blomquist P, Belikov S, Wrange O (January 1999). "Increased nuclear factor 1 binding to its nucleosomal site mediated by sequence-dependent DNA structure". Nucleic Acids Research. 27 (2): 517–25. doi:10.1093/nar/27.2.517. PMC 148209. PMID 9862974.
  123. Walter F. Boron (2003). Medical Physiology: A Cellular And Molecular Approach. Elsevier/Saunders. pp. 125–126. ISBN 1-4160-2328-3.
  124. 124.0 124.1 D. W. Yao, J. Luo, Q. Y. He, J. Li, H. Wang, H. B. Shi, H. F. Xu, M. Wang and J. J. Loor (May 2016). "Characterization of the liver X receptor-dependent regulatory mechanism of goat stearoyl-coenzyme A desaturase 1 gene by linoleic acid". Journal of Dairy Science. 99 (5): 3945–3957. doi:10.3168/jds.2015-10601. PMID 26947306. Retrieved 5 September 2020.
  125. C A Perez, J Ott, D J Mays & J A Pietenpol (15 November 2007). "p63 consensus DNA-binding site: identification, analysis and application into a p63MH algorithm". Oncogene. 26 (52): 7363–70. doi:10.1038/sj.onc.1210561. PMID 17563751. Retrieved 28 August 2020.
  126. 126.0 126.1 Wangjie Yu and Paul E. Hardin (2006). "Circadian oscillators of Drosophila and mammals". Journal of Cell Science. 119: 4793–5. doi:10.1242/jcs.03174. PMID 17130292. Retrieved 2017-02-19.
  127. Mengli You, Shuping Yuan, Juanjuan Shi, Yongzhong Hou (1 June 2015). "PPARδ signaling regulates colorectal cancer". Current Pharmaceutical Design. 21 (21): 2956–2959. doi:10.2174/1381612821666150514104035. PMID 26004416. Retrieved 10 September 2020.
  128. Dalei Shao, Caretha L. Creasy, Lawrence W. Bergman (1 February 1998). "A cysteine residue in helixII of the bHLH domain is essential for homodimerization of the yeast transcription factor Pho4p". Nucleic Acids Research. 26 (3): 710–4. doi:10.1093/nar/26.3.710. PMC 147311. PMID 9443961.
  129. Jennifer A. Pietenpol, Karl Munger, Peter M. Howley, Roland W. Stein and Harold L. Moses (November 15, 1991). "Factor-binding element in the human c-myc promoter involved in transcriptional regulation by transforming growth factor β1 and by the retinoblastoma gene product" (PDF). Proceedings of the National Academy of Sciences USA. 88 (22): 10227–10231. doi:10.1073/pnas.88.22.10227. PMID 1946442. Retrieved 5 December 2018.
  130. Ashutosh Kumar, Himanshu N. Singh, Vikas Pareek, Khursheed Raza, Subrahamanyam Dantham, Pavan Kumar, Sankat Mochan and Muneeb A. Faiq (9 August 2016). "A Possible Mechanism of Zika Virus Associated Microcephaly: Imperative Role of Retinoic Acid Response Element (RARE) Consensus Sequence Repeats in the Viral Genome". Frontiers in Human Neuroscience. 10: 403. doi:10.3389/fnhum.2016.00403. PMID 27555815. Retrieved 7 September 2020.
  131. John P. Cogswell, Patricia V. Basta, and Jenny P.-Y. Ting (October 1990). "X-box-binding proteins positively and negatively regulate transcription of the HLA-DRA gene through interaction with discrete upstream W and V elements" (PDF). Proceedings of the National Academy of Sciences USA. 87 (19): 7703–7707. doi:10.1073/pnas.87.19.7703. PMID 2120707. Retrieved 20 August 2020.
  132. 132.0 132.1 Julien J. Ghislain, Thomas Wong, Melody Nguyen, and Eleanor N. Fish (June 2001). "The Interferon-Inducible Stat2:Stat1 Heterodimer Preferentially Binds In Vitro to a Consensus Element Found in the Promoters of a Subset of Interferon-Stimulated Genes" (PDF). Journal of Interferon and Cytokine Research. 21 (6): 379–388. doi:10.1089/107999001750277. PMID 11440635. Retrieved 15 August 2020.
  133. Joanna Wesoly, Zofia Szweykowska-Kulinska and Hans A R Bluyssen (31 March 2007). "STAT activation and differential complex formation dictate selectivity of interferon responses". Acta Biochimica Polonica. 54 (1): 27–38. doi:10.18388/abp.2007_3266. PMID 17351669. Retrieved 15 August 2020.
  134. Fernanda M. Rodríguez-Tornos, Iñigo San Aniceto, Beatriz Cubelos, Marta Nieto (31 January 2013). "Enrichment of Conserved Synaptic Activity-Responsive Element in Neuronal Genes Predicts a Coordinated Response of MEF2, CREB and SRF". PLoS ONE. 8 (1): e53848. doi:10.1371/journal.pone.0053848. PMID 23382855. Retrieved 12 November 2018.
  135. Francisco Rivero (2002). "mRNA processing in Dictyostelium: sequence requirements for termination and splicing" (PDF). Protist. 153 (2): 169–76. doi:10.1078/1434-4610-00095. PMID 12125758. Retrieved 2017-04-05.
  136. Rodrigo Nunes da Fonseca and Thiago M. Venancio (1 March 2018). "Maternal or zygotic: Unveiling the secrets of the Pancrustacea transcription factor zelda". PLoS Genetics. 14 (3): e1007201. doi:10.1371/journal.pgen.1007201. PMID 29494591. Retrieved 5 September 2020.
  137. Frank L. Conlon, Lynne Fairclough, Brenda M. J. Price, Elena S. Casey and J. C. Smith (2001). "Determinants of T box protein specificity" (PDF). Development. 128 (19): 3749–3758. PMID 11585801. Retrieved 17 November 2018.
  138. Ce Feng Liu, Gabriel S. Brandt, Quyen Q. Hoang, Natalia Naumova, Vanja Lazarevic, Eun Sook Hwang, Job Dekker, Laurie H. Glimcher, Dagmar Ringe, and Gregory A. Petsko (25 October 2016). "Crystal structure of the DNA binding domain of the transcription factor T-bet suggests simultaneous recognition of distant genome sites". Proceedings of the National Academy of Sciences of the USA. 113 (43): E6572–E6581. doi:10.1073/pnas.1613914113. PMID 27791029. Retrieved 28 August 2020.
  139. Yoshiro Maru (2016). Basic Research, In: "Inflammation and Metastasis". Tokyo: Springer. pp. 193–231. doi:10.1007/978-4-431-56024-1_10. ISBN 978-4-431-56022-7. Retrieved 28 August 2020.
  140. Vitor M S Pinto, Svetlana Minakhina, Shuiqing Qiu, Aniket Sidhaye, Michael P Brotherton, Amy Suhotliv, Fredric E Wondisford (1 September 2017). "Naturally Occurring Amino Acids in Helix 10 of the Thyroid Hormone Receptor Mediate Isoform-Specific TH Gene Regulation". Endocrinology. 158 (9): 3067–3078. doi:10.1210/en.2017-00314. PMID 28911178. Retrieved 5 September 2020.
  141. 141.0 141.1 141.2 141.3 Martin L. Read, Andrew R. Clark and Kevin Docherty (1993). "The helix-loop-helix transcription factor USF (upstream stimulating factor) binds to a regulatory sequence of the human insulin gene enhancer" (PDF). Biochemical Journal. 295: 233–237. doi:10.1042/bj2950233. PMID 8216223. Retrieved 14 August 2020.
  142. Young Hun Song, Cheol Min Yoo, An Pio Hong, Seong Hee Kim, Hee Jeong Jeong, Su Young Shin, Hye Jin Kim, Dae-Jin Yun, Chae Oh Lim, Jeong Dong Bahk, Sang Yeol Lee, Ron T. Nagao, Joe L. Key, and Jong Chan Hong (April 2008). "DNA-Binding Study Identifies C-Box and Hybrid C/G-Box or C/A-Box Motifs as High-Affinity Binding Sites for STF1 and LONG HYPOCOTYL5 Proteins" (PDF). Plant Physiology. 146 (4): 1862–1877. doi:10.1104/pp.107.113217. PMID 18287490. Retrieved 26 March 2019.
  143. Jakob Mejlvang, Marina Kriajevska, Cindy Vandewalle, Tatyana Chernova, A. Emre Sayan, Geert Berx, J. Kilian Mellon, and Eugene Tulchinsky (November 2007). "Direct Repression of Cyclin D1 by SIP1 Attenuates Cell Cycle Progression in Cells Undergoing an Epithelial Mesenchymal Transition". Molecular Biology of the Cell. 18 (11): 4615–4624. doi:10.1091/mbc.e07-05-0406. PMID 17855508. Retrieved 15 November 2018.

External links
