Complex locus A1BG and ZNF497: Difference between revisions

==F boxes==
{{main|F box gene transcriptions}}
"Male sex determination in the ''Caenorhabditis elegans'' hermaphrodite germline requires translational repression of tra-2 mRNA by the GLD-1 RNA binding protein."<ref name=Clifford>{{ cite journal
|author=Robert Clifford, Min-Ho Lee, Sudhir Nayak, Mitsue Ohmachi, Flav Giorgini and Tim Schedl
|title=FOG-2, a novel F-box containing protein, associates with the GLD-1 RNA binding protein and directs male sex determination in the ''C. elegans'' hermaphrodite germline
|journal=Development
|date=December 2000
|volume=127
|issue=24
|pages=5265-76
|url=https://dev.biologists.org/content/develop/127/24/5265.full.pdf
|arxiv=
|bibcode=
|doi=
|pmid=
|accessdate=10 August 2020 }}</ref>
"We used the yeast Gal4p two-hybrid system (Fields and Sternglanz, 1994) to identify proteins that physically interact with GLD-1. We recovered two identical cDNAs in two-hybrid screens [...]. One (OG2.3) using GLD-1 residues 84-341 and the other (CD13.1) using residues 273-457, both fused to the Gal4p DNA binding domain [...]."<ref name=Clifford/>
"When fused to the DNA-binding domain of Gal4p, Ino2p but not Ino4p was able to activate a UAS<sub>GAL</sub>-containing reporter gene even in the absence of the heterologous Fbf1 subunit. By deletion studies, two separate transcriptional activation domains were identified in the N-terminal part of Ino2p. Thus, the bHLH domains of Ino2p and Ino4p constitute the dimerization/DNA-binding module of Fbf1 mediating its interaction with the ICRE, while transcriptional activation is effected exclusively by Ino2p."<ref name=Schwank>{{ cite journal
|author=Sabine Schwank, Ronald Ebbert, Karin Rautenstrauß, Eckhart Schweizer and Hans-Joachim Schüller
|title=Yeast transcriptional activator ''INO2'' interacts as an Ino2p/Ino4p basic helix-loop-helix heteromeric complex with the inositol/choline responsive element necessary for expression of phospholipid biosynthetic genes in ''Saccharomyces cerevisiae''
|journal=Nucleic Acids Research
|date=25 January 1995
|volume=23
|issue=2
|pages=230-37
|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC306659/pdf/nar00002-0046.pdf
|arxiv=
|bibcode=
|doi=10.1093/nar/23.2.230
|pmid=
|accessdate=10 August 2020 }}</ref>
"This ICRE (consensus sequence TYTTCACATGY) contains the core sequence CANNTG, which is also known as an E box and which serves as a recognition site for DNA-binding proteins of the basic helix-loop-helix (bHLH) family (3). Members of the bHLH family comprise determinants of cellular differentiation and proliferation in mammalian and invertebrate systems such as the myogenic transcription factors MyoD, MRF4, myogenin and Myf-5 (4) as well as factors not restricted to specialized tissues (E12, E47, daughterless, c-Myc and Mad; 5-7). Proteins of the bHLH group may form either homodimers or heterodimers or both, dependent on the individual structure of the respective interaction surface provided by the HLH domain (8)."<ref name=Schwank/>


==GAAC elements==

Revision as of 12:55, 11 August 2020

Associate Editor(s)-in-Chief: Henry A. Hoff

Alpha-1-B glycoprotein is a 54.3 kDa protein in humans that is encoded by the A1BG gene.[1] The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins.

A1BG is located on the negative DNA strand of chromosome 19 from 58,858,172 to 58,864,865.[2] It is directly adjacent to the ZSCAN22 gene (58,838,385-58,853,712) on the positive DNA strand, as well as to the ZNF837 (58,878,990-58,892,389, complement) and ZNF497 (58,865,723-58,874,214, complement) genes on the negative strand.[2]

ZSCAN22

  1. Gene ID: 342945 is ZSCAN22 zinc finger and SCAN domain containing 22 on 19q13.43.[3] ZSCAN22 is transcribed in the negative direction from LOC100887072.[3]
  2. Gene ID: 102465484 is MIR6806 microRNA 6806 on 19q13.43: "microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop."[4] MIR6806 is transcribed in the negative direction from LOC105372480.[4]

Of the roughly 111 gaps between genes on chromosome locus 19q13.43 as of 4 August 2020, gap number 88 is between ZSCAN22 and A1BG. There is, however, no gap between ZNF497 and A1BG.

Alpha-1-B glycoprotein

Def. "a substance that induces an immune response, usually foreign"[5] is called an antigen.

Def. any "substance that elicits [an] immune response"[6] is called an immunogen.

An antigen "or immunogen is a molecule that sometimes stimulates an immune system response."[7] However, because "the immune system does not consist of only antibodies",[7] the definition of antigen "encompasses all substances that can be recognized by the adaptive immune system."[7]

Def. "a protein produced by B-lymphocytes that binds to [a specific antigen or][8] an antigen"[9] is called an antibody.

Five different antibody isotypes are known in mammals, which perform different roles, and help direct the appropriate immune response for each different type of foreign object they encounter.[10]

Although the general structure of all antibodies is very similar, a small region, known as the hypervariable region, at the tip of the protein is extremely variable, allowing millions of antibodies with slightly different tip structures to exist, where each of these variants can bind to a different target, known as an antigen.[11]

Def. "any of the glycoproteins in blood serum that respond to invasion by foreign antigens and that protect the host by removing pathogens;"[12] "an antibody"[13] is called an immunoglobulin.

Gene ID: 1 is A1BG alpha-1-B glycoprotein on 19q13.43, a 54.3 kDa protein in humans that is encoded by the A1BG gene.[14] A1BG is transcribed in the positive direction from ZNF497.[14] "The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins."[14]

  1. NP_570602.2 alpha-1B-glycoprotein precursor, cd05751 Location: 401 → 493 Ig1_LILRB1_like; First immunoglobulin (Ig)-like domain found in Leukocyte Ig-like receptors (LILR)B1 (also known as LIR-1) and similar proteins, smart00410 Location: 218 → 280 IG_like; Immunoglobulin like, pfam13895 Location: 210 → 301 Ig_2; Immunoglobulin domain and cl11960 Location: 28 → 110 Ig; Immunoglobulin domain.[14]

Patients who have pancreatic ductal adenocarcinoma show an overexpression of A1BG in pancreatic juice.[15]

Immunoglobulin supergene family

"α1B-glycoprotein (α1B) [...] consists of a single polypeptide chain N-linked to four glucosamine oligosaccharides. The polypeptide has five intrachain disulfide bonds and contains 474 amino acid residues. [...] α1B exhibits internal duplication and consists of five repeating structural domains, each containing about 95 amino acids and one disulfide bond. [...] several domains of α1B, especially the third, show statistically significant homology to variable regions of certain immunoglobulin light and heavy chains. α1B [...] exhibits sequence similarity to other members of the immunoglobulin supergene family such as the receptor for transepithelial transport of IgA and IgM and the secretory component of human IgA."[16]

"Some of the domains of α1B show significant homology to variable (V) and constant (C) regions of certain immunoglobulins. Likewise, there is statistically significant homology between α1B and the secretory component (SC) of human IgA (15) and also with the extracellular portion of the rabbit receptor for transepithelial transport of polymeric immunoglobulins (IgA and IgM). Mostov et al. (16) have called the latter protein the poly-Ig receptor or poly-IgR and have shown that it is the precursor of SC."[16]

The immunoglobulin supergene family is "the group of proteins that have immunoglobulin-like domains, including histocompatibility antigens, the T-cell antigen receptor, poly-IgR, and other proteins involved in the vertebrate immune response (17)."[16]

"The internal homology in primary structure [...] and the presence of an intrasegment disulfide bond suggest that α1B is composed of five structural domains that arose by duplication of a primordial gene coding for about 95 amino acid residues."[16]

"Unlike immunoglobulins (25), ceruloplasmin (6), and hemopexin (7), α1B is not subject to limited interdomain cleavage by proteolytic enzymes. At least, we were not able to produce such fragments by use of a variety of proteases. This stability of α1B is probably associated with the frequency of proline in the sequences linking the domains [...]."[16]

"A peptide identified in the late and early milk proteomes showed homology to eutherian alpha 1B glycoprotein (A1BG), a plasma protein with unknown function, as well as venom inhibitors characterised in the Southern opossum Didelphis marsupialis (DM43 and DM46), all members of the immunoglobulin superfamily. To characterise the relationship between the peptide sequence identified in koala, A1BG, DM43 and DM46, a phylogenetic tree was constructed [...] including all marsupial and monotreme homologs (identified by BLAST), three phylogenetically representative eutherian sequences, with human IGSF1 and TARM1, related members of the immunoglobulin super family, used as outgroups. This phylogeny indicates that A1BG-like proteins in marsupials and the Didelphis antitoxic proteins are homologs of eutherian A1BG, with excellent bootstrap support (98%). The marsupial A1BG-like sequences and the Didelphis antitoxic proteins formed a single clade with strong bootstrap support (97%)."[17]

"Human TARM1 and IGSF1, related members of the immunoglobulin superfamily are used as outgroups. The tree was constructed using the maximum likelihood approach and the JTT model with bootstrap support values from 500 bootstrap tests. Bootstrap values less than 50% are not displayed. Accession numbers: Tasmanian devil (Sarcophilus harrisii; XP_012402143), Wallaby (Macropus eugenii; FY619507), Possum (Trichosurus vulpecula; DY596639), Virginia opossum (Didelphis virginiana; AAA30970, AAN06914), Southern opossum (Didelphis marsupialis; AAL82794, P82957, AAN64698), Human (Homo sapiens; P04217, B6A8C7, Q8N6C5), Platypus (Ornithorhynchus anatinus; ENSOANP00000000762), Cow (Bos taurus; Q2KJF1), Alpaca (Vicugna pacos; XP_015107031)."[17]

"The sequences of α1B-glycoprotein (38) and chicken N-CAM (neural cell-adhesion molecule) (39) have been shown to be related to the immunoglobulin supergene family."[18]

A1BG contains the immunoglobulin domain cl11960 and three immunoglobulin-like domains: pfam13895, cd05751 and smart00410.

"Immunoglobulin (Ig) domain [cl11960] found in the Ig superfamily. The Ig superfamily is a heterogenous group of proteins, built on a common fold comprised of a sandwich of two beta sheets. Members of this group are components of immunoglobulin, neuroglia, cell surface glycoproteins, such as, T-cell receptors, CD2, CD4, CD8, and membrane glycoproteins, such as, butyrophilin and chondroitin sulfate proteoglycan core protein. A predominant feature of most Ig domains is a disulfide bridge connecting the two beta-sheets with a tryptophan residue packed against the disulfide bond."[19]

"This domain [pfam13895] contains immunoglobulin-like domains."[20]

"Ig1_LILR_KIR_like: [cd05751] domain similar to the first immunoglobulin (Ig)-like domain found in Leukocyte Ig-like receptors (LILRs) and Natural killer inhibitory receptors (KIRs). This group includes LILRB1 (or LIR-1), LILRA5 (or LIR9), an activating natural cytotoxicity receptor NKp46, the immune-type receptor glycoprotein VI (GPVI), and the IgA-specific receptor Fc-alphaRI (or CD89). LILRs are a family of immunoreceptors expressed on T and B cells, on monocytes, dendritic cells, and subgroups of natural killer (NK) cells. The human LILR family contains nine proteins (LILRA1-3, and 5, and LILRB1-5). From functional assays, and as the cytoplasmic domains of various LILRs, for example LILRB1 (LIR-1), LILRB2 (LIR-2), and LILRB3 (LIR-3) contain immunoreceptor tyrosine-based inhibitory motifs (ITIMs) it is thought that LIR proteins are inhibitory receptors. Of the eight LIR family proteins, only LIR-1 (LILRB1), and LIR-2 (LILRB2), show detectable binding to class I MHC molecules; ligands for the other members have yet to be determined. The extracellular portions of the different LIR proteins contain different numbers of Ig-like domains for example, four in the case of LILRB1 (LIR-1), and LILRB2 (LIR-2), and two in the case of LILRB4 (LIR-5). The activating natural cytotoxicity receptor NKp46 is expressed in natural killer cells, and is organized as an extracellular portion having two Ig-like extracellular domains, a transmembrane domain, and a small cytoplasmic portion. GPVI, which also contains two Ig-like domains, participates in the processes of collagen-mediated platelet activation and arterial thrombus formation. Fc-alphaRI is expressed on monocytes, eosinophils, neutrophils and macrophages; it mediates IgA-induced immune effector responses such as phagocytosis, antibody-dependent cell-mediated cytotoxicity and respiratory burst."[21]

"IG domains [smart00410] that cannot be classified into one of IGv1, IGc1, IGc2, IG."[22]

A1BG protein species

Def. a "group of plants or animals having similar appearance"[23] or "the largest group of organisms in which [any][24] two individuals [of the appropriate sexes or mating types][24] can produce fertile offspring, typically by sexual reproduction"[25] is called a species.

The gene contains 20 distinct introns.[26] Transcription produces 15 different mRNAs: 10 alternatively spliced variants and 5 unspliced forms.[26] There are 4 probable alternative promoters, 4 non-overlapping alternative last exons and 7 validated alternative polyadenylation sites.[26] The mRNAs appear to differ by truncation of the 5' end, truncation of the 3' end, presence or absence of 4 cassette exons, overlapping exons with different boundaries, and splicing versus retention of 3 introns.[26]

Variants or isoforms

Def. a "different sequence of a gene (locus)"[27] is called a variant.

Def. any "of several different forms of the same protein, arising from either single nucleotide polymorphisms,[28] differential splicing of mRNA, or post-translational modifications (e.g. sulfation, glycosylation, etc.)"[29] is called an isoform.

Regarding additional isoforms, mention has been made of "new genetic variants of A1BG."[30]

"Proteomic analysis revealed that [a circulating] set of plasma proteins was α 1 B-glycoprotein (A1BG) and its post-translationally modified isoforms."[31]

Pharmacogenomic variants have been reported.[32]

Genotypes

Def. the "part (DNA sequence) of the genetic makeup of an organism which determines a specific characteristic (phenotype) of that organism"[33] or a "group of organisms having the same genetic constitution"[34] is called a genotype.

A1BG genotypes have been reported.[32]

The A1BG variant rs893184 is one of the variants included in a genetic risk score.[32]

"A genetic risk score, including rs16982743, rs893184, and rs4525 in F5, was significantly associated with treatment-related adverse cardiovascular outcomes in whites and Hispanics from the INVEST study and in the Nordic Diltiazem study (meta-analysis interaction P=2.39×10−5)."[32]

Polymorphs

Def. the "regular existence of two or more different genotypes within a given species or population; also, variability of amino acid sequences within a gene's protein"[35] is called polymorphism.

Def. "one of a number of alternative forms of the same gene occupying a given position, [or locus],[36] on a chromosome"[37] is called an allele.

"rs893184 causes a histidine (His) to arginine (Arg) [nonsynonymous single nucleotide polymorphism (nsSNP), A (minor) for G (major)] substitution at amino acid position 52 in A1BG."[32]
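As a small illustration of why a single A/G exchange can swap histidine and arginine, the sketch below uses a subset of the standard genetic code. It assumes the A and G alleles are read on the coding strand and fall at the second codon position; the quoted source does not give the codon, so the codons shown are only the possible candidates. The helper `swap_base` is a hypothetical name.

```python
# Subset of the standard genetic code (DNA codons, coding strand).
CODONS = {
    "CAT": "His", "CAC": "His",
    "CGT": "Arg", "CGC": "Arg", "CGA": "Arg", "CGG": "Arg",
    "AGA": "Arg", "AGG": "Arg",
}

def swap_base(codon, index, new_base):
    """Return the codon with the base at 0-based `index` replaced."""
    return codon[:index] + new_base + codon[index + 1:]

# Both histidine codons become arginine codons when the middle A is a G.
for his_codon in ("CAT", "CAC"):
    arg_codon = swap_base(his_codon, 1, "G")
    print(his_codon, CODONS[his_codon], "->", arg_codon, CODONS[arg_codon])
# CAT His -> CGT Arg
# CAC His -> CGC Arg
```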

"Genetic polymorphism of human plasma (serum) alpha 1B-glycoprotein (alpha 1B) was observed using one-dimensional horizontal polyacrylamide gel electrophoresis (PAGE) pH 9.0 of plasma samples followed by Western blotting with specific antiserum to alpha 1B."[38]

A1B*5 is a "new allele [...] of human plasma α1B-glycoprotein [...]."[39]

"Genetic polymorphism of human plasma α1B-glycoprotein (α1B) was reported first, in brief, by Altland et al. [1983; also given in Altland and Hackler, 1984]. A detailed description of human α1B polymorphism was reported in subsequent studies [Gahne et al., 1987; Juneja et al., 1988, 1989]. Five different α1B alleles (A1B*1, A1B*2, A1B*3, A1B*4 and A1B*5) were reported. In Caucasian whites, the frequencies of A1B*1 and A1B*2 were about 0.95 and 0.05, respectively. A1B*4 was observed in 2 related Czech individuals. In American blacks, A1B*1 and A1B*2 occurred with a frequency of 0.73 and 0.21, respectively, while a new allele, viz, A1B*3 had a frequency of 0.06. A1B*5 was observed only in Swedish Lapps and in Finns with a frequency of 0.04 and 0.007, respectively."[40]

"The frequency of A1B*1 varied from 0.89 to 0.91 and that of A1B*2 from 0.08 to 0.10. The A1B*3 allele, reported previously only in American blacks, was observed with a frequency range of 0.003-0.01 in 3 of the Chinese populations, in Koreans and in Malays. A new α1B allele (A1B*6) was observed in 2 Chinese individuals."[40]

Phenotypes

Def. the "appearance of an organism based on a single trait [multifactorial combination of genetic traits and environmental factors][41], especially used in pedigrees"[42] or any "observable characteristic of an organism, such as its morphological, developmental, biochemical or physiological properties, or its behavior"[43] is called a phenotype.

"The three different phenotypes of α1B observed (designated 1-1, 1-2, and 2-2) were apparently identical to those reported by Altland et al. (1983), who used double one-dimensional electrophoresis. Family data supported the hypothesis that the three α1B phenotypes are determined by two codominant alleles at an autosomal locus, designated A1B. Allele frequencies in a Swedish population were: A1B *1, 0.937; A1B *2, 0.063; PIC, 0.111."[38]
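The PIC value quoted above can be reproduced from the two allele frequencies. The sketch below assumes PIC means the polymorphism information content in its usual form, 1 - Σp_i² - Σ_{i<j} 2p_i²p_j²; the quoted source does not state the formula, so this is an assumption, and the function names are illustrative.

```python
from itertools import combinations

def heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphism information content (assumed standard form)."""
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return heterozygosity(freqs) - cross

swedish = [0.937, 0.063]  # A1B*1 and A1B*2 frequencies quoted above
print(round(pic(swedish), 3))  # 0.111
```

Under that assumption the computed value agrees with the reported PIC of 0.111.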

Protein species

"Both protein species of [alpha-1-B glycoprotein] A1B (A1Ba, p = 0.008; f.c.= +1.62, A1Bb, p = 0.003; f.c. = +1.82) [...] were apparently overexpressed in patients with PTCa [...]."[44]

A1BG is mainly produced in the liver and is secreted into plasma at levels of approximately 0.22 mg/mL.[16]

CRISPs

The human cysteine-rich secretory protein 3 (CRISP3) "is present in exocrine secretions and in secretory granules of neutrophilic granulocytes and is believed to play a role in innate immunity."[45] CRISP3 is present at a relatively high concentration in human plasma.[45]

"The A1BG-CRISP-3 complex is noncovalent with a 1:1 stoichiometry and is held together by strong electrostatic forces."[45] "Similar [complex formation] between toxins from snake venom and A1BG-like plasma proteins ... inhibits the toxic effect of snake venom metalloproteinases or myotoxins and protects the animal from envenomation."[45]

Opossums have a remarkably robust immune system and show partial or total immunity to the venom of rattlesnakes, cottonmouths (Agkistrodon piscivorus), and other pit vipers (Crotalinae).[46][47]

"Crisp3 [is] mainly [expressed] in the salivary glands, pancreas, and prostate."[48] "CRISP3 is highly expressed in the human cauda epididymidis and ampulla of vas deferens (Udby et al. 2005)."[48]

ZNF497

Gene ID: 503538 is A1BG-AS1 A1BG antisense RNA 1.[49] A1BG-AS1 is transcribed in the negative direction from ZSCAN22.[49]

Gene ID: 162968 is ZNF497 zinc finger protein 497.[50] ZNF497 is transcribed in the positive direction from RNA5SP473.[50]

  1. NP_001193938.1 zinc finger protein 497: "Transcript Variant: This variant (2) lacks an alternate exon in the 5' UTR, compared to variant 1. Variants 1 and 2 encode the same protein."[50]
  2. NP_940860.2 zinc finger protein 497: "Transcript Variant: This variant (1) is the longer transcript. Variants 1 and 2 encode the same protein."[50]

Gene ID: 100419840 is LOC100419840 zinc finger protein 446 pseudogene.[51] LOC100419840 may be transcribed in the positive direction from LOC105372483.[51]

Gene ID: 105372483 is LOC105372483 uncharacterized LOC105372483 ncRNA.[52] LOC105372483 is transcribed in the negative direction from LOC100419840.[52]

Gene ID: 106479017 is RNA5SP473 RNA, 5S ribosomal pseudogene 473.[53] RNA5SP473 may be transcribed in the negative direction from ZNF497.[53]

19q13.43

Regulatory elements and regions

In the present era of functional genomics, the challenge remains to elucidate gene function, such as that of A1BG, along with its likely regulatory networks and signaling pathways.[54] "Since regulation of gene expression in vivo mainly occurs at the transcriptional level, identifying the location of genetic regulatory elements is a key to understanding the machinery regulating gene transcription. A major goal of current genome research is to identify the locations of all gene regulatory elements, including promoters, enhancers, silencers, insulators and boundary elements, and to analyze their relationship to the current annotation of human genes."[55][56] Although "many genome-wide strategies have been developed for identifying functional elements", "no method yet has the resolution to precisely identify all regulatory elements or can be readily applied to the entire human genome."[57]

There is one CRISPRi-validated cis-regulatory element on 19q13.43: Gene ID: 116286197 LOC116286197. There are also four Sharpr-MPRA regulatory regions: Gene ID: 112553117 LOC112553117 Sharpr-MPRA regulatory region 1998, Gene ID: 112553119 LOC112553119 Sharpr-MPRA regulatory region 10473, Gene ID: 112577453 LOC112577453 Sharpr-MPRA regulatory region 7872, and Gene ID: 112577454 Sharpr-MPRA regulatory region 9894.

DNase I hypersensitive sites

"This genomic region represents a DNase I hypersensitive site (DHS) that was predicted to be an enhancer by the ENCODE (ENCyclopedia Of DNA Elements) project based on various combinations of H3K27 acetylation and binding of p300, GATA1 and RNA polymerase II in K562 erythroleukemia cells. It was validated as a high-confidence cis-regulatory element for the ZNF582 (zinc finger protein 582) gene on chromosome 19 based on multiplex CRISPR/Cas9-mediated perturbation in K562 cells."[58]

Gene ID: 116286197 CRISPRi-validated cis-regulatory element chr19.6329 is at NC_000019.10 (56186901..56187499).[58] Gene ID: 147948 ZNF582 is at NC_000019.10 (56382751..56393585, complement).[59] The start of the CRISPRi-validated cis-regulatory element chr19.6329 is (56382751 - 56186901) = 195850 nts from the lower-coordinate end of ZNF582. Because ZNF582 lies on the complement strand, that end is the 3' end of the gene rather than its transcription start.
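The coordinate arithmetic above can be checked with a short sketch. The interval values are those quoted from NC_000019.10; the function names are illustrative, and treating the higher coordinate as the 5' end of a complement-strand gene is the usual convention, not something stated in the cited records.

```python
ELEMENT = (56186901, 56187499)  # chr19.6329 cis-regulatory element
ZNF582 = (56382751, 56393585)   # annotated on the complement strand

def start_to_start(a, b):
    """Offset between the lower coordinates of two intervals."""
    return b[0] - a[0]

def edge_gap(a, b):
    """Distance between the facing edges of two intervals (0 if they overlap)."""
    lo, hi = sorted([a, b])
    return max(0, hi[0] - lo[1])

def offset_to_five_prime_end(element, gene, gene_on_complement):
    """Offset from the element start to the gene's assumed 5' end."""
    five_prime = gene[1] if gene_on_complement else gene[0]
    return five_prime - element[0]

print(start_to_start(ELEMENT, ZNF582))                  # 195850
print(edge_gap(ELEMENT, ZNF582))                        # 195252
print(offset_to_five_prime_end(ELEMENT, ZNF582, True))  # 206684
```

The first value matches the figure in the text. The other two show how the answer shifts depending on which edges are compared.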

Transcriptional regulatory regions

"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in K562 erythroleukemia cells (group: K562 Activating DNase unmatched - State 1:Tss, active promoter, TSS/CpG island region), with weaker activation in HepG2 liver carcinoma cells (group: HepG2 Activating DNase matched - State 1:Tss)."[60]

"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in HepG2 liver carcinoma cells (group: HepG2 Activating DNase matched - State 5:Enh, candidate strong enhancer, open chromatin). It also displayed weak repressive activity by Sharpr-MPRA in K562 erythroleukemia cells (group: K562 Repressive non-DNase unmatched - State 24:Quies, heterochromatin/dead zone)."[61]

"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in both HepG2 liver carcinoma cells (group: HepG2 Activating DNase unmatched - State 1:Tss, active promoter, TSS/CpG island region) and K562 erythroleukemia cells (group: K562 Activating DNase unmatched - State 1:Tss)."[62]

"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in K562 erythroleukemia cells (group: K562 Activating DNase unmatched - State 1:Tss, active promoter, TSS/CpG island region), with weaker activation in HepG2 liver carcinoma cells (group: HepG2 Activating DNase matched - State 1:Tss)."[63]

"The growth hormone-regulated transcription factors STAT5 and BCL6 coordinately regulate sex differences in mouse liver, primarily through effects in male liver, where male-biased genes are upregulated and many female-biased genes are actively repressed."[64] "CUX2, a highly female-specific liver transcription factor, contributes to an analogous regulatory network in female liver. Adenoviral overexpression of CUX2 in male liver induced 36% of female-biased genes and repressed 35% of male-biased genes. In female liver, CUX2 small interfering RNA (siRNA) preferentially induced genes repressed by adenovirus expressing CUX2 (adeno-CUX2) in male liver, and it preferentially repressed genes induced by adeno-CUX2 in male liver. CUX2 binding in female liver chromatin was enriched at sites of male-biased DNase hypersensitivity and at genomic regions showing male-enriched STAT5 binding. CUX2 binding was also enriched near genes repressed by adeno-CUX2 in male liver or induced by CUX2 siRNA in female liver but not at genes induced by adeno-CUX2, indicating that CUX2 binding is preferentially associated with gene repression. Nevertheless, direct CUX2 binding was seen at several highly female-specific genes that were positively regulated by CUX2, including A1bg [A1BG in humans], Cyp2b9, Cyp3a44, Tox [TOX in humans], and Trim24 [TRIM24 in humans]."[64]

A boxes

There is one A box on the positive strand in the negative direction (from ZSCAN22 to A1BG): 3'-TGACTCT-5' at 2788.

There is one A box complement on the negative strand in the negative direction: 3'-ACTGAGA-5' at 2788.

There is one A box inverse complement on the negative strand in the positive direction: 3'-AGAGTCA-5' at 2613.

There is one A box inverse on the positive strand in the positive direction: 3'-TCTCAGT-5' at 2613.
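The four A box forms listed above (the motif, its complement, its inverse, and its inverse complement) are related by simple string operations. A minimal sketch, with illustrative function names, using the sequences exactly as written in the text:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq):
    """Base-by-base complement, same reading order."""
    return seq.translate(COMPLEMENT)

def inverse(seq):
    """Same bases in reversed order."""
    return seq[::-1]

def inverse_complement(seq):
    """Complement, then reverse."""
    return complement(seq)[::-1]

a_box = "TGACTCT"
print(complement(a_box))          # ACTGAGA
print(inverse(a_box))             # TCTCAGT
print(inverse_complement(a_box))  # AGAGTCA
```

The three derived strings match the complement, inverse, and inverse complement sequences reported above.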

ACGT-containing elements

  1. ACGT elements, negative strand, negative direction: 24, 3'-ACGT-5' at 150, 3'-ACGT-5' at 1030, 3'-ACGT-5' at 1321, 3'-ACGT-5' at 1337, 3'-ACGT-5' at 1345, 3'-ACGT-5' at 1470, 3'-ACGT-5' at 1494, 3'-ACGT-5' at 1535, 3'-ACGT-5' at 1717, 3'-ACGT-5' at 1974, 3'-ACGT-5' at 1998, 3'-ACGT-5' at 2081, 3'-ACGT-5' at 2400, 3'-ACGT-5' at 2424, 3'-ACGT-5' at 2735, 3'-ACGT-5' at 2759, 3'-ACGT-5' at 2863, 3'-ACGT-5' at 3287, 3'-ACGT-5' at 3429, 3'-ACGT-5' at 3771, 3'-ACGT-5' at 4245, 3'-ACGT-5' at 4315, 3'-ACGT-5' at 4330, 3'-ACGT-5' at 4338.
  2. ACGT elements, negative strand, positive direction: 2, 3'-ACGT-5' at 569, 3'-ACGT-5' at 3254.
  3. ACGT elements, positive strand, negative direction: 4, 3'-ACGT-5' at 342, 3'-ACGT-5' at 531, 3'-ACGT-5' at 1772, 3'-ACGT-5' at 4236.
  4. ACGT elements, positive strand, positive direction: 44, 3'-ACGT-5' at 192, 3'-ACGT-5' at 224, 3'-ACGT-5' at 436, 3'-ACGT-5' at 531, 3'-ACGT-5' at 546, 3'-ACGT-5' at 656, 3'-ACGT-5' at 783, 3'-ACGT-5' at 1119, 3'-ACGT-5' at 1218, 3'-ACGT-5' at 1370, 3'-ACGT-5' at 1470, 3'-ACGT-5' at 1505, 3'-ACGT-5' at 1613, 3'-ACGT-5' at 1786, 3'-ACGT-5' at 1820, 3'-ACGT-5' at 1935, 3'-ACGT-5' at 2063, 3'-ACGT-5' at 2204, 3'-ACGT-5' at 2326, 3'-ACGT-5' at 2334, 3'-ACGT-5' at 2350, 3'-ACGT-5' at 2681, 3'-ACGT-5' at 2690, 3'-ACGT-5' at 2719, 3'-ACGT-5' at 2743, 3'-ACGT-5' at 2800, 3'-ACGT-5' at 2857, 3'-ACGT-5' at 2960, 3'-ACGT-5' at 3061, 3'-ACGT-5' at 3070, 3'-ACGT-5' at 3142, 3'-ACGT-5' at 3230, 3'-ACGT-5' at 3268, 3'-ACGT-5' at 3279, 3'-ACGT-5' at 3320, 3'-ACGT-5' at 3341, 3'-ACGT-5' at 3400, 3'-ACGT-5' at 3459, 3'-ACGT-5' at 3464, 3'-ACGT-5' at 3829, 3'-ACGT-5' at 3883, 3'-ACGT-5' at 3960, 3'-ACGT-5' at 4315, 3'-ACGT-5' at 4341.
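Counts and positions like those in the lists above can be produced by scanning a sequence for every occurrence of a motif. The sketch below is one way to do this and is not necessarily the method used for this article: the toy sequence is made up, and because the text does not say whether a position marks the first or last nucleotide of a match, the hypothetical `report` parameter allows either.

```python
import re

def find_motif(seq, motif, report="end"):
    """Return 1-based positions of every match, overlapping ones included.

    report="start" gives the first nucleotide of each match,
    report="end" gives the last.
    """
    pattern = re.compile(f"(?={re.escape(motif)})")  # lookahead keeps overlaps
    offset = len(motif) if report == "end" else 1
    return [m.start() + offset for m in pattern.finditer(seq)]

toy = "TTACGTACGTGAGACGT"  # made-up sequence for illustration only
hits = find_motif(toy, "ACGT")
print(len(hits), hits)                       # 3 [6, 10, 17]
print(find_motif(toy, "ACGT", "start"))      # [3, 7, 14]
print(find_motif(toy, "ACGTGAG"))            # [13]
```

With end positions, a longer motif that begins with ACGT is reported a few nucleotides after the ACGT it contains (10 versus 13 in the toy sequence).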

ACGT-containing elements include these metal responsive elements:

  1. complement, negative strand, negative direction: 6, 3'-ACGTGAG-5' at 1348, 3'-ACGTGAG-5' at 2001, 3'-ACGTGAG-5' at 2427, 3'-ACGTGGG-5' at 2762, 3'-ACGTGAG-5' at 3290, and 3'-ACGTGAG-5' at 4341.
  2. complement, positive strand, positive direction: 6, 3'-ACGTGTG-5' at 549, 3'-ACGTGTG-5' at 1221, 3'-ACGTGAG-5' at 1373, 3'-ACGTGAG-5' at 1473, 3'-ACGTGTG-5' at 2963, 3'-ACGTGGG-5' at 3323.
  3. inverse, negative strand, negative direction: 2, 3'-CTCACGT-5' at 1470, 3'-CACACGT-5' at 2863.
  4. inverse, positive strand, negative direction: 2, 3'-CACACGT-5' at 531, 3'-CTCACGT-5' at 1772.
  5. inverse, positive strand, positive direction: 6, 3'-CGCACGT-5' at 546, 3'-CGCACGT-5' at 1218, 3'-CTCACGT-5' at 1786, 3'-CTCACGT-5' at 2326, 3'-CCCACGT-5' at 2800, 3'-CCCACGT-5' at 3883.

ACGT-containing elements include these cAMP response elements (CRE):

  1. negative strand in the negative direction (from ZSCAN22 to A1BG): 1, 3'-TGACGTCA-5' at 4317.

AGC boxes

An inverse AGC box occurs on the negative strand in the negative direction, 3'-CCGCCGA-5' at 1754 nts from ZSCAN22 toward A1BG in the distal promoter, with its complement on the positive strand in the negative direction.

Angiotensinogen core promoter elements

  1. AGCE, negative strand, negative direction, looking for 3'-A/C-T-C/T-G-T-G-5': 4, 3'-ATTGTG-5' at 340, 3'-ATCGTG-5' at 2096, 3'-CTTGTG-5' at 3669, 3'-CTCGTG-5' at 3914.
  2. AGCE, negative strand, positive direction, looking for 3'-A/C-T-C/T-G-T-G-5': 2, 3'-ATTGTG-5' at 2679, 3'-CTCGTG-5' at 4376.
  3. AGCE, positive strand, negative direction, looking for 3'-A/C-T-C/T-G-T-G-5': 0.
  4. AGCE, positive strand, positive direction, looking for 3'-A/C-T-C/T-G-T-G-5': 6, 3'-CTCGTG-5' at 855, 3'-CTCGTG-5' at 955, 3'-CTCGTG-5' at 1207, 3'-CTCGTG-5' at 1627, 3'-CTTGTG-5' at 3095, 3'-CTCGTG-5' at 3739.
  5. AGCEc, negative strand, negative direction, looking for 3'-G/T-A-A/G-C-A-C-5': 0.
  6. AGCEc, negative strand, positive direction, looking for 3'-G/T-A-A/G-C-A-C-5': 6, 3'-GAGCAC-5' at 855, 3'-GAGCAC-5' at 955, 3'-GAGCAC-5' at 1207, 3'-GAGCAC-5' at 1627, 3'-GAACAC-5' at 3095, 3'-GAGCAC-5' at 3739.
  7. AGCEc, positive strand, negative direction, looking for 3'-G/T-A-A/G-C-A-C-5': 4, 3'-TAACAC-5' at 340, 3'-TAGCAC-5' at 2096, 3'-GAACAC-5' at 3669, 3'-GAGCAC-5' at 3914.
  8. AGCEc, positive strand, positive direction, looking for 3'-G/T-A-A/G-C-A-C-5': 2, 3'-TAACAC-5' at 2679, 3'-GAGCAC-5' at 4376.
  9. AGCEci, negative strand, negative direction, looking for 3'-C-A-C-A/G-A-G/T-5': 2, 3'-CACGAT-5' at 336, 3'-CACGAG-5' at 4403.
  10. AGCEci, negative strand, positive direction, looking for 3'-C-A-C-A/G-A-G/T-5': 1, 3'-CACGAG-5' at 243.
  11. AGCEci, positive strand, negative direction, looking for 3'-C-A-C-A/G-A-G/T-5': 10, 3'-CACGAG-5' at 435, 3'-CACGAG-5' at 572, 3'-CACGAG-5' at 708, 3'-CACGAG-5' at 1182, 3'-CACAAT-5' at 1721, 3'-CACAAG-5' at 2244, 3'-CACGAG-5' at 3232, 3'-CACAAT-5' at 3515, 3'-CACAAG-5' at 3634, 3'-CACGAG-5' at 4472.
  12. AGCEci, positive strand, positive direction, looking for 3'-C-A-C-A/G-A-G/T-5': 3, 3'-CACAAG-5' at 107, 3'-CACGAG-5' at 2090, 3'-CACGAG-5' at 3152.
  13. AGCEi, negative strand, negative direction, looking for 3'-G-T-G-C/T-T-A/C-5': 10, 3'-GTGCTC-5' at 435, 3'-GTGCTC-5' at 572, 3'-GTGCTC-5' at 708, 3'-GTGCTC-5' at 1182, 3'-GTGTTA-5' at 1721, 3'-GTGTTC-5' at 2244, 3'-GTGCTC-5' at 3232, 3'-GTGTTA-5' at 3515, 3'-GTGTTC-5' at 3634, 3'-GTGCTC-5' at 4472.
  14. AGCEi, negative strand, positive direction, looking for 3'-G-T-G-C/T-T-A/C-5': 3, 3'-GTGTTC-5' at 107, 3'-GTGCTC-5' at 2090, 3'-GTGCTC-5' at 3152.
  15. AGCEi, positive strand, negative direction, looking for 3'-G-T-G-C/T-T-A/C-5': 2, 3'-GTGCTA-5' at 336, 3'-GTGCTC-5' at 4403.
  16. AGCEi, positive strand, positive direction, looking for 3'-G-T-G-C/T-T-A/C-5': 0.
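
The searches above each look for a degenerate consensus such as 3'-A/C-T-C/T-G-T-G-5'. As a minimal sketch of how such a consensus could be scanned for, the Python below converts the slash-and-hyphen notation into a regular expression and reports every match. The function names, the toy sequence, and the convention that a position is the 1-based start of the match are assumptions made for illustration; the page does not state how its positions were computed, and the sequence is treated simply as the string is written.

```python
import re

def consensus_to_regex(consensus):
    """Turn 'A/C-T-C/T-G-T-G' into the regex '[AC]T[CT]GTG'."""
    parts = consensus.split("-")
    return "".join(
        p if len(p) == 1 else "[" + p.replace("/", "") + "]" for p in parts
    )

def find_hits(sequence, consensus):
    """Return (1-based start, matched text) for every match, overlaps included."""
    # A lookahead is used so that overlapping occurrences are all reported.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]

# Hypothetical toy sequence, not taken from the A1BG locus.
toy = "GGATTGTGCCCTCGTGAA"
print(find_hits(toy, "A/C-T-C/T-G-T-G"))
# prints [(3, 'ATTGTG'), (11, 'CTCGTG')]
```

The same helper accepts the AGCEc, AGCEci, and AGCEi consensus strings unchanged, since they use the same notation.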

ATA boxes

Core promoters

There is the following inverse ATA box on the negative strand, negative direction: 1, 3'-AAATAA-5' at 4537, which is inside A1BG because the TSS is at 4460 nts from ZSCAN22.

Proximal promoters

There is the following inverse ATA box on the positive strand, negative direction: 3'-AAATAA-5' at 4221.

There is one inverse and inverse complement between 4050 and 4300 in the positive direction: 3'-AAATAA-5' at 4142, and 3'-TTTATT-5' at 4142.

Distal promoters

There is the following ATA box on the negative strand in the negative direction: 1, 3'-AATAAA-5' at 1726 nts from ZSCAN22.

There are the following ATA boxes on the positive strand in the negative direction: 3, 3'-AATAAA-5' at 3014, 3'-AATAAA-5' at 3335, and 3'-AATAAA-5' at 4072.

There are the following inverse ATA boxes on the positive strand, negative direction: 4, 3'-AAATAA-5' at 3013, 3'-AAATAA-5' at 3334, 3'-AAATAA-5' at 4071, 3'-AAATAA-5' at 4075.

There is the following ATA box on the negative strand in the positive direction: 1, 3'-AATAAA-5' at 3427. It has a complement on the positive strand in the positive direction: 1, 3'-TTATTT-5' at 3427.

There is another inverse complement ATA box on the negative strand in the positive direction in distal promoter: 3'-TTTATT-5' at 2347. It also has an inverse in the distal promoter: 3'-AAATAA-5' at 2347.

B boxes

There appear to be at least two B boxes. One is TGGGCA,[65] where the "mP2 EB fragment used for binding was the 118 nucleotide fragment extending from the Dde I site at position -140 to the Dde I site at position -23 [...]. This fragment contains the GC, E, B, CAAT, and TATA boxes."[65]

  1. negative strand in the negative direction, looking for 3'-TGGGCA-5', 0.
  2. negative strand in the positive direction, looking for 3'-TGGGCA-5', 4, 3'-TGGGCA-5' at 27, 3'-TGGGCA-5' at 1945, 3'-TGGGCA-5' at 2894, 3'-TGGGCA-5' at 4180.
  3. positive strand in the negative direction, looking for 3'-TGGGCA-5', 9, 3'-TGGGCA-5' at 462, 3'-TGGGCA-5' at 902, 3'-TGGGCA-5' at 1114, 3'-TGGGCA-5' at 1359, 3'-TGGGCA-5' at 2438, 3'-TGGGCA-5' at 2773, 3'-TGGGCA-5' at 3301, 3'-TGGGCA-5' at 4040, 3'-TGGGCA-5' at 4191.
  4. positive strand in the positive direction, looking for 3'-TGGGCA-5', 0.
  5. complement, negative strand, negative direction, looking for 3'-ACCCGT-5', 9, 3'-ACCCGT-5' at 462, 3'-ACCCGT-5' at 902, 3'-ACCCGT-5' at 1114, 3'-ACCCGT-5' at 1359, 3'-ACCCGT-5' at 2438, 3'-ACCCGT-5' at 2773, 3'-ACCCGT-5' at 3301, 3'-ACCCGT-5' at 4040, 3'-ACCCGT-5' at 4191.
  6. complement, negative strand, positive direction, looking for 3'-ACCCGT-5', 0.
  7. complement, positive strand, negative direction, looking for 3'-ACCCGT-5', 0.
  8. complement, positive strand, positive direction, looking for 3'-ACCCGT-5', 4, 3'-ACCCGT-5' at 27, 3'-ACCCGT-5' at 1945, 3'-ACCCGT-5' at 2894, 3'-ACCCGT-5' at 4180.
  9. inverse complement, negative strand, negative direction, looking for 3'-TGCCCA-5', 0.
  10. inverse complement, negative strand, positive direction, looking for 3'-TGCCCA-5', 2, 3'-TGCCCA-5' at 3237, 3'-TGCCCA-5' at 3377.
  11. inverse complement, positive strand, negative direction, looking for 3'-TGCCCA-5', 4, 3'-TGCCCA-5' at 1458, 3'-TGCCCA-5' at 3854, 3'-TGCCCA-5' at 3883, 3'-TGCCCA-5' at 4251.
  12. inverse complement, positive strand, positive direction, looking for 3'-TGCCCA-5', 1, 3'-TGCCCA-5' at 3750.
  13. inverse, negative strand, negative direction, looking for 3'-ACGGGT-5', 4, 3'-ACGGGT-5' at 1458, 3'-ACGGGT-5' at 3854, 3'-ACGGGT-5' at 3883, 3'-ACGGGT-5' at 4251.
  14. inverse, negative strand, positive direction, looking for 3'-ACGGGT-5', 1, 3'-ACGGGT-5' at 3750.
  15. inverse, positive strand, negative direction, looking for 3'-ACGGGT-5', 0.
  16. inverse, positive strand, positive direction, looking for 3'-ACGGGT-5', 2, 3'-ACGGGT-5' at 3237, 3'-ACGGGT-5' at 3377.

The other is associated with the human transforming growth factor b1 binding sequences.[66]

It has the consensus sequence 3'-TGTCTCA-5'. Let it be designated B1box.

  1. negative strand in the negative direction, looking for 3'-TGTCTCA-5', 2, 3'-TGTCTCA-5' at 1075, 3'-TGTCTCA-5' at 2445.
  2. negative strand in the positive direction, looking for 3'-TGTCTCA-5', 2, 3'-TGTCTCA-5' at 2174, 3'-TGTCTCA-5' at 2468.
  3. positive strand in the negative direction, looking for 3'-TGTCTCA-5', 5, 3'-TGTCTCA-5' at 923, 3'-TGTCTCA-5' at 1089, 3'-TGTCTCA-5' at 2033, 3'-TGTCTCA-5' at 3323, 3'-TGTCTCA-5' at 4373.
  4. positive strand in the positive direction, looking for 3'-TGTCTCA-5', 0.
  5. complement, negative strand, negative direction, looking for 3'-ACAGAGT-5', 5, 3'-ACAGAGT-5' at 923, 3'-ACAGAGT-5' at 1089, 3'-ACAGAGT-5' at 2033, 3'-ACAGAGT-5' at 3323, 3'-ACAGAGT-5' at 4373.
  6. complement, negative strand, positive direction, looking for 3'-ACAGAGT-5', 0.
  7. complement, positive strand, negative direction, looking for 3'-ACAGAGT-5', 2, 3'-ACAGAGT-5' at 1075, 3'-ACAGAGT-5' at 2445.
  8. complement, positive strand, positive direction, looking for 3'-ACAGAGT-5', 2, 3'-ACAGAGT-5' at 2174, 3'-ACAGAGT-5' at 2468.
  9. inverse complement, negative strand, negative direction, looking for 3'-TGAGACA-5', 3, 3'-TGAGACA-5' at 919, 3'-TGAGACA-5' at 1085, 3'-TGAGACA-5' at 2029.
  10. inverse complement, negative strand, positive direction, looking for 3'-TGAGACA-5', 0.
  11. inverse complement, positive strand, negative direction, looking for 3'-TGAGACA-5', 0.
  12. inverse complement, positive strand, positive direction, looking for 3'-TGAGACA-5', 1, 3'-TGAGACA-5' at 2308.
  13. inverse, negative strand, negative direction, looking for 3'-ACTCTGT-5', 0.
  14. inverse, negative strand, positive direction, looking for 3'-ACTCTGT-5', 1, 3'-ACTCTGT-5' at 2308.
  15. inverse, positive strand, negative direction, looking for 3'-ACTCTGT-5', 3, 3'-ACTCTGT-5' at 919, 3'-ACTCTGT-5' at 1085, 3'-ACTCTGT-5' at 2029.
  16. inverse, positive strand, positive direction, looking for 3'-ACTCTGT-5', 0.
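
Each B box list above searches for four forms of one motif: the direct sequence, its complement, its inverse, and its inverse complement. A short sketch of how those four strings relate is given below. The function name `variants` is hypothetical, and "inverse" is taken to mean the motif read in reverse order, which is what the sequences listed above imply.

```python
# Base-pairing table used to build the complement of a motif.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def variants(motif):
    """Return the four forms of a motif that the lists above search for."""
    complement = motif.translate(COMPLEMENT)
    return {
        "direct": motif,
        "complement": complement,
        "inverse": motif[::-1],                 # motif read backwards
        "inverse complement": complement[::-1], # complement read backwards
    }

for name in ("TGGGCA", "TGTCTCA"):
    print(name, variants(name))
```

For TGGGCA this gives ACCCGT, ACGGGT, and TGCCCA, and for TGTCTCA it gives ACAGAGT, ACTCTGT, and TGAGACA, matching the search strings in the two lists.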

B recognition elements

The factor II B recognition element is BREu.

Negative strand in the negative direction there are 3: 3'-CCACGCC-5' at 380, 3'-CCGCGCC-5' at 1762, and 3'-CCACGCC-5' at 2197 in the distal promoter.

Complement, negative strand, negative direction there is 1: 3'-CCTGCGG-5' at 1153.

Inverse complement, positive strand, negative direction there are 4: 3'-GGCGTGG-5' at 1244, 3'-GGCGCGG-5' at 1762, 3'-GGCGTGG-5' at 1897, and 3'-GGCGTGG-5' at 3047.

Negative strand in the positive direction there are 3: 3'-GCACGCC-5', 1302, 3'-GGACGCC-5', 1672, 3'-GGGCGCC-5', 1769.

Positive strand in the positive direction there are 3: 3'-CCACGCC-5', 489, 3'-CGACGCC-5', 1033, 3'-CCACGCC-5', 1764.

Inverse complement, negative strand, positive direction there is 1: 3'-GGCGCCC-5', 1770.

Inverse complement, positive strand, positive direction there are 4: 3'-GGCGCGC-5', 682, 3'-GGCGCCG-5', 1338, 3'-GGCGCCG-5', 1438, 3'-GGCGTGG-5', 2566.

CAAT boxes

There are no CAAT boxes in either promoter.

CAREs

A CARE occurs in the negative direction: 3'-CAACTC-5' at 86 possibly associated with ZSCAN22. But inverse CAREs occur 3'-CTCAAC-5' at 1406, 3'-CTCAAC-5' at 2592, 3'-CTCAAC-5' at 2704, 3'-CTCAAC-5' at 3115, and 3'-CTCAAC-5' at 4096.

A CARE occurs in the positive direction: 3'-CAACTC-5' at 3292. But inverse CAREs occur 3'-CTCAAC-5' at 1406, 3'-CTCAAC-5' at 1621, and 3'-CTCAAC-5' at 3290.

CArG boxes

There is a more general CArG box, 3'-CATTAAAAGG-5', at 3441 from ZSCAN22, or -1019 nts from the TSS of A1BG in the negative direction on the positive strand in the distal promoter.

A second more general CArG box, 3'-CAAAAAAAAG-5', at 1399 from ZSCAN22, or -3061 nts from the A1BG TSS may be a CArG box for ZSCAN22 in the negative direction on the positive strand in the distal promoter.
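
The TSS-relative positions quoted for the two CArG boxes follow from subtracting the A1BG TSS position (4460 nts from ZSCAN22, as stated elsewhere on this page) from the position of the element. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
# A1BG TSS position in nts from ZSCAN22, as given on this page.
A1BG_TSS_FROM_ZSCAN22 = 4460

def relative_to_tss(position, tss=A1BG_TSS_FROM_ZSCAN22):
    """Negative values are upstream of the TSS, positive values downstream."""
    return position - tss

for position in (3441, 1399):
    print(position, relative_to_tss(position))
# prints:
# 3441 -1019
# 1399 -3061
```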

C boxes

Proximal promoters

Inverse complement, negative strand, negative direction there is 1: 3'-ACATCA-5', 4124.

There is one C box 3'-ACATCA-5' at 4116 nts in the positive direction.

Distal promoters

There are four C boxes: 3'-AGTAGT-5' at 2888, 3'-AGTAGT-5' at 2944, 3'-AGTAGT-5' at 3418, and 3'-AGTAGT-5' at 3521 on the negative strand in the negative direction and their complements on the positive strand.

Inverse complement, negative strand, negative direction there are 2: 3'-ACATCA-5', 2340, 3'-ACATCA-5', 2541.

There is one complement C box: 3'-TCATCA-5' at 3251 on the negative strand in the positive direction and its complement on the positive strand.

Inverse, negative strand, positive direction, there is 1: 3'-TGATGA-5', 2144.

Positive strand in the positive direction there is 1: 3'-AGTAGT-5', 3251.

CENP-B boxes

There are no CENP-B boxes in either promoter.

CGCG boxes

Negative strand in the negative direction there are 2: 3'-GCGCGT-5', 161, 3'-CCGCGC-5', 1761, in the distal promoter.

Positive strand in the negative direction there is 1: 3'-GCGCGG-5', 1762, in the distal promoter.

Negative strand in the positive direction there are 8: 3'-GCGCGT-5', 543, 3'-CCGCGC-5', 681, 3'-GCGCGC-5', 683, 3'-ACGCGG-5', 871, 3'-ACGCGG-5', 971, 3'-CCGCGG-5', 1337, 3'-CCGCGG-5', 1437, 3'-CCGCGC-5', 1650, in the distal promoter.

Positive strand in the positive direction there are 22: 3'-CCGCGC-5', 161, 3'-ACGCGG-5', 452, 3'-CCGCGC-5', 542, 3'-GCGCGC-5', 682, 3'-GCGCGT-5', 684, 3'-CCGCGT-5', 876, 3'-CCGCGT-5', 976, 3'-CCGCGT-5', 1046, 3'-ACGCGG-5', 1078, 3'-ACGCGG-5', 1162, 3'-CCGCGC-5', 1214, 3'-ACGCGG-5', 1246, 3'-CCGCGT-5', 1298, 3'-ACGCGT-5', 1314, 3'-ACGCGG-5', 1354, 3'-ACGCGG-5', 1398, 3'-ACGCGT-5', 1414, 3'-ACGCGG-5', 1454, 3'-ACGCGG-5', 1498, 3'-ACGCGT-5', 1523, 3'-CCGCGT-5', 1550, 3'-CCGCGG-5', 1769, in the distal promoter.

CRE boxes

Negative strand in the negative direction there is 1: 3'-TGACGTCA-5', 4317, and its complement in the proximal promoter.

D boxes

There is one D box in the distal promoter: 3'-AGTCTG-5' at 2947 on the negative strand in the negative direction and its complement on the positive strand.

Positive strand in the negative direction there is 1: 3'-AGTCTG-5', 1355.

Inverse complement, positive strand, negative direction there are 2: 3'-CAGACT-5', 15, 3'-CAGACT-5', 1616.

There is one D box in the distal promoter: 3'-AGTCTG-5' at 3923 on the negative strand in the positive direction and its complement on the positive strand.

Inverse complement, negative strand, positive direction there are 2: 3'-CAGACT-5', 1744, 3'-CAGACT-5', 2416.

Inverse complement, positive strand, positive direction there are 3: 3'-CAGACT-5', 2943, 3'-CAGACT-5', 3006, 3'-CAGACT-5', 3924.

Downstream B recognition elements

  1. negative strand in the negative direction, looking for 3'-A/G-T-A/G/T-G/T-G/T-G/T-G/T-5', 59, 3'-ATTTTGT-5' at 68, 3'-ATATGTT-5' at 113, 3'-GTTTTGT-5' at 166, 3'-ATATTTT-5' at 183, 3'-ATATTTT-5' at 222, 3'-GTTTTGG-5' at 259, 3'-ATGTTTT-5' at 485, 3'-GTTTTTT-5' at 487, 3'-ATTGGGG-5' at 616, 3'-ATGTTTT-5' at 637, 3'-GTTTTTT-5' at 639, 3'-ATGTTTT-5' at 771, 3'-GTTTTTT-5' at 773, 3'-GTGTGGT-5' at 883, 3'-GTTTTTT-5' at 928, 3'-GTTTTTT-5' at 1094, 3'-ATGTTTT-5' at 1228, 3'-GTTTTTT-5' at 1230, 3'-GTTTTTG-5' at 1386, 3'-GTTTGTT-5' at 1392, 3'-GTTTTTT-5' at 1396, 3'-GTTGGGT-5' at 1409, 3'-GTTGGGT-5' at 1516, 3'-GTTTGTG-5' at 1540, 3'-ATGTTTT-5' at 1880, 3'-GTTTTTT-5' at 1882, 3'-GTTTTTT-5' at 2038, 3'-ATGTTTT-5' at 2182, 3'-GTTTTTT-5' at 2184, 3'-ATGTTTT-5' at 2307, 3'-GTTTTTT-5' at 2309, 3'-GTGTGGT-5' at 2419, 3'-GTTTGTT-5' at 2484, 3'-GTTTGTT-5' at 2488, 3'-ATATGTT-5' at 2642, 3'-ATGTTTT-5' at 2644, 3'-GTGGGGT-5' at 2764, 3'-GTTGGGT-5' at 2846, 3'-ATATTTG-5' at 2875, 3'-GTAGTTT-5' at 2890, 3'-ATTTTTT-5' at 3026, 3'-GTGGGTT-5' at 3136, 3'-ATTTTTG-5' at 3165, 3'-GTATTTT-5' at 3171, 3'-GTTTTTG-5' at 3328, 3'-ATTTGTT-5' at 3338, 3'-ATTTGGT-5' at 3365, 3'-ATTTGGT-5' at 3484, 3'-GTAGTTG-5' at 3523, 3'-ATGGTGG-5' at 3740, 3'-GTGTTTT-5' at 3767, 3'-ATGTTTT-5' at 4066, 3'-GTTTTTT-5' at 4068, 3'-GTTGTGT-5' at 4196, 3'-ATGTTTT-5' at 4216, 3'-GTTTTTT-5' at 4218, 3'-GTTTTTT-5' at 4378, 3'-GTGGGGT-5' at 4446, 3'-GTAGGTG-5' at 4458 and their complements.
  2. negative strand in the positive direction, looking for 3'-A/G-T-A/G/T-G/T-G/T-G/T-G/T-5', 11, 3'-GTGGGGG-5' at 56, 3'-ATTTTTT-5' at 2451, 3'-GTGTTGG-5' at 2816, 3'-ATGTTTG-5' at 3339, 3'-GTGGTGG-5' at 3816, 3'-GTGTGGT-5' at 3967, 3'-GTGGTGT-5' at 3969, 3'-GTGGTTT-5' at 4108, 3'-ATTGTTG-5' at 4173, 3'-ATGGGGG-5' at 4225, 3'-GTGGGGT-5' at 4397 and their complements.
  3. positive strand in the negative direction, looking for 3'-A/G-T-A/G/T-G/T-G/T-G/T-G/T-5', 31, 3'-ATATGTT-5' at 43, 3'-ATATGGG-5' at 78, 3'-ATGGGGT-5' at 204, 3'-ATGTTTT-5' at 215, 3'-ATATGGT-5' at 606, 3'-ATGGTGT-5' at 608, 3'-ATGTGGT-5' at 788, 3'-GTGGTGG-5' at 790, 3'-GTGGTGT-5' at 793, 3'-ATTGGGT-5' at 1047, 3'-GTGGGTG-5' at 1163, 3'-GTGGTGG-5' at 1247, 3'-GTGGTGT-5' at 1477, 3'-GTGGTGG-5' at 1900, 3'-GTGGTGG-5' at 1903, 3'-GTGGGTG-5' at 2332, 3'-GTGTGGT-5' at 2659, 3'-GTGGTGG-5' at 2661, 3'-ATATTTT-5' at 2853, 3'-GTGGTGG-5' at 3050, 3'-GTGTGGT-5' at 3187, 3'-GTGGTGG-5' at 3189, 3'-GTGGTGG-5' at 3192, 3'-GTGGGTG-5' at 3195, 3'-ATTGGTT-5' at 3531, 3'-GTGGTTG-5' at 3605, 3'-ATGGGGT-5' at 3802, 3'-ATGTGGT-5' at 3811, 3'-GTGTTGG-5' at 3942, 3'-GTTGGTT-5' at 3944, 3'-ATGGTGG-5' at 4110 and their complements.
  4. positive strand in the positive direction, looking for 3'-A/G-T-A/G/T-G/T-G/T-G/T-G/T-5', 19, 3'-GTGGGTG-5' at 72, 3'-GTAGGTG-5' at 631, 3'-GTAGGTG-5' at 700, 3'-GTGGTGG-5' at 704, 3'-ATGGGGT-5' at 1891, 3'-GTTGGGT-5' at 2015, 3'-GTGGGGG-5' at 2020, 3'-GTTGGTG-5' at 2122, 3'-ATATGGT-5' at 2591, 3'-ATGGTGT-5' at 2600, 3'-GTGTGGT-5' at 2603, 3'-ATGGTGG-5' at 2759, 3'-GTGTGGG-5' at 2965, 3'-ATAGGGT-5' at 3386, 3'-GTAGGGT-5' at 3631, 3'-GTGTGGT-5' at 3825, 3'-GTTTGTG-5' at 4257, 3'-GTGGGGT-5' at 4286, 3'-GTGGGGT-5' at 4328 and their complements.
  5. inverse, negative strand, negative direction, is SuccessablesdBREi--.bas, looking for 3'-G/T-G/T-G/T-G/T-A/G/T-T-A/G-5', 44, 3'-TTTGTTA-5' at 230, 3'-TTTTGTA-5' at 361, 3'-TTTTTTA-5' at 488, 3'-TTTTATG-5' at 633, 3'-TTTTATG-5' at 767, 3'-TGTGGTA-5' at 884, 3'-GGTTGTA-5' at 1205, 3'-TTTTTTA-5' at 1231, 3'-GTTTTTG-5' at 1386, 3'-GTTTGTG-5' at 1540, 3'-TTTTATG-5' at 1564, 3'-TTGTTTG-5' at 1587, 3'-TTTTATA-5' at 1740, 3'-TGGGGTA-5' at 1861, 3'-TTTTATG-5' at 1876, 3'-TTTTTTA-5' at 2061, 3'-GGTTGTA-5' at 2150, 3'-TTTTTTA-5' at 2185, 3'-TGGGGTA-5' at 2288, 3'-TTTTATG-5' at 2303, 3'-TGTGGTG-5' at 2420, 3'-TTGTTTG-5' at 2486, 3'-TTGTTTG-5' at 2511, 3'-GGTTGTG-5' at 2549, 3'-GGTTGTA-5' at 2612, 3'-GTTTTTA-5' at 2646, 3'-TTTGTTG-5' at 2843, 3'-TTTTATA-5' at 2869, 3'-TTTTTTA-5' at 2930, 3'-TTTGGTG-5' at 2972, 3'-TTTTTTG-5' at 3027, 3'-TGGGTTG-5' at 3137, 3'-TGGGGTA-5' at 3152, 3'-TTTTGTA-5' at 3167, 3'-GTTTTTG-5' at 3328, 3'-TTTGGTG-5' at 3366, 3'-TTTTGTG-5' at 3512, 3'-GTTGATA-5' at 3526, 3'-TGTTTTA-5' at 3768, 3'-GGGTATG-5' at 3857, 3'-GGTTGTG-5' at 3981, 3'-TTTTTTA-5' at 4069, 3'-TTTTTTA-5' at 4219, 3'-TTGGGTA-5' at 4454 and their complements.
  6. inverse, negative strand, positive direction, is SuccessablesdBREi-+.bas, looking for 3'-G/T-G/T-G/T-G/T-A/G/T-T-A/G-5', 16, 3'-GGGGATG-5' at 59, 3'-TGTTTTA-5' at 148, 3'-TTGGGTG-5' at 1802, 3'-TTTTTTG-5' at 2282, 3'-TGGGATG-5' at 2409, 3'-TTTTTTG-5' at 2452, 3'-GGGGATA-5' at 2659, 3'-GGTTTTG-5' at 2688, 3'-GTGGATG-5' at 2714, 3'-GGTGTTG-5' at 2815, 3'-GGTTATG-5' at 3026, 3'-TGTGGTG-5' at 3644, 3'-TTTGGTG-5' at 3949, 3'-TGTGGTG-5' at 3968, 3'-GGTTTTA-5' at 4110, 3'-TGGGGTG-5' at 4398 and their complements.
  7. inverse, positive strand, negative direction, is SuccessablesdBREi+-.bas, looking for 3'-G/T-G/T-G/T-G/T-A/G/T-T-A/G-5', 16, 3'-GTTTTTA-5' at 217, 3'-TGGTGTG-5' at 609, 3'-TGTGGTG-5' at 789, 3'-TTGGGTG-5' at 1048, 3'-GTGGGTG-5' at 1163, 3'-TTTTTTG-5' at 1433, 3'-TGGTGTG-5' at 1478, 3'-GGTGGTG-5' at 1902, 3'-GTGGGTG-5' at 2332, 3'-TGTGGTG-5' at 2660, 3'-GGGTGTG-5' at 3185, 3'-GGTTTTA-5' at 3350, 3'-TTGGTTG-5' at 3532, 3'-GTGGTTG-5' at 3605, 3'-GGTGATG-5' at 3798, 3'-TTGGTTG-5' at 3945 and their complements.
  8. inverse, positive strand, positive direction, is SuccessablesdBREi++.bas, looking for 3'-G/T-G/T-G/T-G/T-A/G/T-T-A/G-5', 14, 3'-GTGGGTG-5' at 72, 3'-GGTGGTG-5' at 703, 3'-TTGGATG-5' at 1283, 3'-TTGGGTG-5' at 2016, 3'-GTTGGTG-5' at 2122, 3'-TGGTGTG-5' at 2601, 3'-TTTGGTG-5' at 2633, 3'-TTGTGTG-5' at 3097, 3'-TGGTTTG-5' at 3176, 3'-TGTGGTA-5' at 3826, 3'-TGGGGTG-5' at 3941, 3'-TGGGGTA-5' at 4220, 3'-GTTTGTG-5' at 4257, 3'-TGGGGTG-5' at 4287 and their complements.
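
The inverse consensus used in items 5 to 8 is the direct consensus of items 1 to 4 read in reverse order, position by position. The sketch below, which is only an illustration and not the page's own .bas programs, derives the inverse consensus and checks a few of the listed hits against the pattern they were reported under. The helper names are hypothetical.

```python
import re

def to_regex(consensus):
    """'A/G-T-A/G/T' becomes '[AG][T][AGT]': one character class per position."""
    return "".join("[" + p.replace("/", "") + "]" for p in consensus.split("-"))

def inverse(consensus):
    """Reverse the order of the positions in a degenerate consensus."""
    return "-".join(reversed(consensus.split("-")))

def matches(hit, consensus):
    """True when the whole hit fits the consensus."""
    return re.fullmatch(to_regex(consensus), hit) is not None

DBRE = "A/G-T-A/G/T-G/T-G/T-G/T-G/T"
DBRE_I = inverse(DBRE)

print(DBRE_I)
# prints G/T-G/T-G/T-G/T-A/G/T-T-A/G
print(matches("ATTTTGT", DBRE), matches("TTTGTTA", DBRE_I), matches("TTTGTTA", DBRE))
# prints True True False
```

A check of this kind can be run over a whole list to catch a hit that was copied under the wrong consensus.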

Downstream core elements

In the negative direction on the negative strand, the A1BG transcription start site is at 4460 nucleotides from the last nucleotide of the gene ZSCAN22. In the positive direction on the negative strand, the A1BG transcription start site is at 4300 from well within the gene ZNF497. Downstream core elements are expected downstream of these TSSs. Occurrences before the TSSs can be found on Downstream core element gene transcriptions.

  1. negative strand, negative direction, looking for DCE SI: 3'-CTTC-5', 0.
  2. negative strand, positive direction, looking for DCE SI: 3'-CTTC-5', 0.
  3. positive strand, negative direction, looking for DCE SI: 3'-CTTC-5', 1, 3'-CTTC-5' at 4528.
  4. positive strand, positive direction, looking for DCE SI: 3'-CTTC-5', 0.
  1. negative strand, negative direction, looking for DCE SII: 3'-CTGT-5', 2, 3'-CTGT-5' at 4468, 3'-CTGT-5' at 4507.
  2. negative strand, positive direction, looking for DCE SII: 3'-CTGT-5', 1, 3'-CTGT-5' at 4392.
  3. positive strand, negative direction, looking for DCE SII: 3'-CTGT-5', 0.
  4. positive strand, positive direction, looking for DCE SII: 3'-CTGT-5', 1, 3'-CTGT-5' at 4332.
  1. negative strand, negative direction, looking for DCE SIII: 3'-AGC-5', 0.
  2. negative strand, positive direction, looking for DCE SIII: 3'-AGC-5', 1, 3'-AGC-5' at 4352.
  3. positive strand, negative direction, looking for DCE SIII: 3'-AGC-5', 3, 3'-AGC-5' at 4480, 3'-AGC-5' at 4489, 3'-AGC-5' at 4520.
  4. positive strand, positive direction, looking for DCE SIII: 3'-AGC-5', 1, 3'-AGC-5' at 4374.

Complements

  1. negative strand, negative direction, looking for DCE SIc: 3'-GAAG-5', 1, 3'-GAAG-5' at 4528.
  2. negative strand, positive direction, looking for DCE SIc: 3'-GAAG-5', 0.
  3. positive strand, negative direction, looking for DCE SIc: 3'-GAAG-5', 0.
  4. positive strand, positive direction, looking for DCE SIc: 3'-GAAG-5', 0.
  1. negative strand, negative direction, looking for DCE SIIc: 3'-GACA-5', 0.
  2. negative strand, positive direction, looking for DCE SIIc: 3'-GACA-5', 1, 3'-GACA-5' at 4332.
  3. positive strand, negative direction, looking for DCE SIIc: 3'-GACA-5', 2, 3'-GACA-5' at 4468, 3'-GACA-5' at 4507.
  4. positive strand, positive direction, looking for DCE SIIc: 3'-GACA-5', 1, 3'-GACA-5' at 4392.
  1. negative strand, negative direction, looking for DCE SIIIc: 3'-TCG-5', 3, 3'-TCG-5' at 4480, 3'-TCG-5' at 4489, 3'-TCG-5' at 4520.
  2. negative strand, positive direction, looking for DCE SIIIc: 3'-TCG-5', 1, 3'-TCG-5' at 4374.
  3. positive strand, negative direction, looking for DCE SIIIc: 3'-TCG-5', 0.
  4. positive strand, positive direction, looking for DCE SIIIc: 3'-TCG-5', 1, 3'-TCG-5' at 4352.

Inverse complements

  1. looking for DCE SIci: 3'-GAAG-5', same as the complements.
  1. negative strand, negative direction, looking for DCE SIIci: 3'-ACAG-5', 0.
  2. negative strand, positive direction, looking for DCE SIIci: 3'-ACAG-5', 0.
  3. positive strand, negative direction, looking for DCE SIIci: 3'-ACAG-5', 1, 3'-ACAG-5' at 4517.
  4. positive strand, positive direction, looking for DCE SIIci: 3'-ACAG-5', 1, 3'-ACAG-5' at 4366.
  1. negative strand, negative direction, looking for DCE SIIIci: 3'-GCT-5', 1, 3'-GCT-5' at 4471.
  2. negative strand, positive direction, looking for DCE SIIIci: 3'-GCT-5', 4, 3'-GCT-5' at 4312, 3'-GCT-5' at 4321, 3'-GCT-5' at 4372, 3'-GCT-5' at 4390.
  3. positive strand, negative direction, looking for DCE SIIIci: 3'-GCT-5', 0.
  4. positive strand, positive direction, looking for DCE SIIIci: 3'-GCT-5', 1, 3'-GCT-5' at 4356.

Inverses

  1. looking for DCE SIi: 3'-CTTC-5', same as the direct transcript.
  1. negative strand, negative direction, looking for DCE SIIi: 3'-TGTC-5', 1, 3'-TGTC-5' at 4517.
  2. negative strand, positive direction, looking for DCE SIIi: 3'-TGTC-5', 1, 3'-TGTC-5' at 4366.
  3. positive strand, negative direction, looking for DCE SIIi: 3'-TGTC-5', 0.
  4. positive strand, positive direction, looking for DCE SIIi: 3'-TGTC-5', 0.
  1. negative strand, negative direction, looking for DCE SIIIi: 3'-CGA-5', 0.
  2. negative strand, positive direction, looking for DCE SIIIi: 3'-CGA-5', 1, 3'-CGA-5' at 4356.
  3. positive strand, negative direction, looking for DCE SIIIi: 3'-CGA-5', 1, 3'-CGA-5' at 4471.
  4. positive strand, positive direction, looking for DCE SIIIi: 3'-CGA-5', 4, 3'-CGA-5' at 4312, 3'-CGA-5' at 4321, 3'-CGA-5' at 4372, 3'-CGA-5' at 4390.
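
Because downstream core elements are only counted when they lie past the TSS, each listed position can be checked against the TSS for its direction: 4460 in the negative direction and 4300 in the positive direction, as stated at the start of this section. A minimal sketch follows, using the DCE SII hits listed above; the names are hypothetical.

```python
# TSS positions for each search direction, as stated in this section.
TSS = {"negative": 4460, "positive": 4300}

def downstream_offset(position, direction):
    """Distance past the TSS; a value <= 0 means the hit is not downstream."""
    return position - TSS[direction]

# DCE SII (3'-CTGT-5') hits from the first list in this section.
sii_hits = [
    ("negative", 4468),
    ("negative", 4507),
    ("positive", 4392),
    ("positive", 4332),
]

for direction, position in sii_hits:
    print(direction, position, downstream_offset(position, direction))
# prints offsets 8, 47, 92 and 32 respectively
```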

Downstream promoter elements

  1. negative strand in the negative direction (from ZSCAN22 to A1BG) is SuccessablesDPE--.bas, looking for 3'-A/G-G-A/T-C/T-A/C/G-5', 163, 3'-GGTCG-5', 35, 3'-AGATA-5', 234, 3'-GGTCC-5', 262, 3'-GGACA-5', 394, 3'-GGTCG-5', 403, 3'-GGTTC-5', 419, 3'-AGTCC-5', 441, 3'-GGACC-5', 459, 3'-AGATG-5', 481, 3'-GGTCG-5', 504, 3'-GGACC-5', 508, 3'-GGTCG-5', 540, 3'-GGTTC-5', 556, 3'-AGTCC-5', 578, 3'-GGACC-5', 596, 3'-AGATG-5', 624, 3'-GGTCC-5', 648, 3'-GGACA-5', 667, 3'-GGTCG-5', 676, 3'-GGTTC-5', 692, 3'-AGTCC-5', 714, 3'-GGTCG-5', 728, 3'-GGTCG-5', 737, 3'-AGATG-5', 758, 3'-GGACA-5', 801, 3'-GGTCG-5', 810, 3'-GGTCC-5', 850, 3'-GGTTC-5', 874, 3'-GGTCG-5', 895, 3'-GGACC-5', 899, 3'-AGACA-5', 919, 3'-GGTCC-5', 948, 3'-GGACA-5', 967, 3'-GGTCG-5', 976, 3'-AGTCC-5', 984, 3'-GGACC-5', 1015, 3'-GGTCG-5', 1061, 3'-AGACA-5', 1085, 3'-GGACA-5', 1131, 3'-GGTCG-5', 1140, 3'-GGTCG-5', 1194, 3'-GGACC-5', 1198, 3'-GGTTG-5', 1203, 3'-AGATG-5', 1224, 3'-GGACA-5', 1258, 3'-GGTCG-5', 1267, 3'-AGTCC-5', 1275, 3'-GGATC-5', 1306, 3'-GGTCA-5', 1352, 3'-AGACC-5', 1356, 3'-AGTTG-5', 1406, 3'-AGACA-5', 1452, 3'-GGTCC-5', 1460, 3'-AGTCG-5', 1486, 3'-AGTTG-5', 1513, 3'-AGATA-5', 1525, 3'-GGTCA-5', 1532, 3'-GGTCG-5', 1611, 3'-AGACA-5', 1776, 3'-GGTCG-5', 1785, 3'-GGTTC-5', 1817, 3'-GGACC-5', 1841, 3'-AGATG-5', 1867, 3'-GGACA-5', 1911, 3'-GGTCG-5', 1920, 3'-GGACC-5', 1959, 3'-GGTCG-5', 2005, 3'-GGACC-5', 2009, 3'-AGACA-5', 2029, 3'-GGTCC-5', 2077, 3'-GGATC-5', 2093, 3'-AGTCC-5', 2134, 3'-GGTTG-5', 2148, 3'-AGATG-5', 2169, 3'-GGTCA-5', 2211, 3'-AGTCC-5', 2250, 3'-GGTCG-5', 2264, 3'-GGACC-5', 2268, 3'-AGATG-5', 2294, 3'-GGACA-5', 2337, 3'-GGTCG-5', 2346, 3'-GGACC-5', 2385, 3'-GGTCG-5', 2431, 3'-GGACC-5', 2435, 3'-AGTTA-5', 2496, 3'-GGTCC-5', 2519, 3'-GGACA-5', 2538, 3'-GGTTG-5', 2547, 3'-AGTCC-5', 2587, 3'-GGTCA-5', 2601, 3'-GGTTG-5', 2610, 3'-AGTCG-5', 2650, 3'-GGTCA-5', 2654, 3'-GGACA-5', 2672, 3'-GGTCG-5', 2681, 3'-GGACC-5', 2720, 3'-GGTCG-5', 2766, 3'-GGACC-5', 2770, 3'-GGTTA-5', 2848, 3'-AGATG-5', 
2988, 3'-GGATA-5', 2996, 3'-GGACA-5', 3061, 3'-GGTCG-5', 3070, 3'-AGTCC-5', 3110, 3'-GGTCG-5', 3124, 3'-GGACC-5', 3128, 3'-GGTTG-5', 3137, 3'-AGATG-5', 3158, 3'-GGACA-5', 3200, 3'-AGTCG-5', 3204, 3'-GGTCG-5', 3209, 3'-AGTCC-5', 3217, 3'-GGTCC-5', 3249, 3'-GGTTC-5', 3273, 3'-GGTCG-5', 3294, 3'-GGACC-5', 3298, 3'-AGACA-5', 3319, 3'-AGTCC-5', 3396, 3'-AGTTG-5', 3523, 3'-AGACA-5', 3556, 3'-GGTCC-5', 3564, 3'-GGACG-5', 3579, 3'-GGTCC-5', 3585, 3'-GGTCG-5', 3682, 3'-GGTCG-5', 3701, 3'-AGACG-5', 3706, 3'-GGTCG-5', 3731, 3'-GGACC-5', 3744, 3'-AGACC-5', 3835, 3'-AGTTC-5', 3844, 3'-GGACG-5', 3861, 3'-GGTCC-5', 3871, 3'-GGTCC-5', 3885, 3'-GGACC-5', 3906, 3'-GGTCC-5', 3951, 3'-GGACA-5', 3970, 3'-GGTTG-5', 3979, 3'-GGTTC-5', 4019, 3'-AGTTC-5', 4027, 3'-GGTCG-5', 4033, 3'-GGACC-5', 4037, 3'-AGATG-5', 4062, 3'-GGTCC-5', 4102, 3'-GGACA-5', 4121, 3'-GGTCG-5', 4130, 3'-AGTCC-5', 4138, 3'-GGTCC-5', 4170, 3'-AGTTC-5', 4178, 3'-GGACA-5', 4208, 3'-AGATG-5', 4212, 3'-GGTCC-5', 4253, 3'-GGTCG-5', 4261, 3'-GGACC-5', 4300, 3'-GGTCG-5', 4345, 3'-GGACC-5', 4349, 3'-GGACA-5', 4369, 3'-GGTCA-5', 4415, 3'-AGATG-5', 4430, 3'-AGTCC-5', 4436, 3'-GGTCG-5', 4480, 3'-AGTCG-5', 4489, 3'-GGACC-5', 4494, 3'-GGACC-5', 4546, and their complements.
  2. negative strand in the positive direction (from ZNF497 to A1BG) is SuccessablesDPE-+.bas, looking for 3'-A/G-G-A/T-C/T-A/C/G-5', 73, 3'-GGACC-5' at 37, 3'-GGATG-5' at 59, 3'-GGTCA-5' at 153, 3'-AGATG-5' at 166, 3'-AGTCC-5' at 172, 3'-GGACC-5' at 187, 3'-GGTCC-5' at 218, 3'-GGTTC-5' at 305, 3'-GGACG-5' at 323, 3'-GGACG-5' at 359, 3'-AGACG-5' at 398, 3'-GGACG-5' at 410, 3'-AGACC-5' at 440, 3'-AGACA-5' at 712, 3'-AGTCC-5' at 757, 3'-AGATC-5' at 864, 3'-AGATC-5' at 964, 3'-AGTCG-5' at 1528, 3'-GGACG-5' at 1670, 3'-GGTCG-5' at 1687, 3'-GGACA-5' at 1693, 3'-AGTCC-5' at 1826, 3'-AGTCC-5' at 1841, 3'-GGACA-5' at 1869, 3'-GGATG-5' at 1878, 3'-GGTTC-5' at 1926, 3'-AGTTC-5' at 1987, 3'-AGTCC-5' at 2026, 3'-GGTCA-5' at 2035, 3'-AGTCA-5' at 2100, 3'-AGTTA-5' at 2134, 3'-GGTCA-5' at 2220, 3'-AGATC-5' at 2230, 3'-GGATG-5' at 2409, 3'-GGACA-5' at 2460, 3'-AGTCA-5' at 2607, 3'-AGTCA-5' at 2613, 3'-AGTCA-5' at 2618, 3'-GGATA-5' at 2659, 3'-AGTTA-5' at 2666, 3'-GGATG-5' at 2714, 3'-GGATA-5' at 2737, 3'-AGACC-5' at 2861, 3'-GGTTC-5' at 2922, 3'-AGTTC-5' at 2954, 3'-AGTCC-5' at 2998, 3'-GGTTA-5' at 3024, 3'-GGTTG-5' at 3050, 3'-AGTCC-5' at 3084, 3'-GGACA-5' at 3131, 3'-GGACC-5' at 3172, 3'-AGTCG-5' at 3283, 3'-AGTTA-5' at 3381, 3'-AGATG-5' at 3418, 3'-GGATG-5' at 3457, 3'-AGATG-5' at 3475, 3'-GGTTG-5' at 3490, 3'-GGACA-5' at 3530, 3'-GGACC-5' at 3545, 3'-AGACC-5' at 3550, 3'-GGATG-5' at 3574, 3'-GGTCA-5' at 3820, 3'-AGTCC-5' at 3863, 3'-AGACA-5' at 3893, 3'-GGTTC-5' at 4073, 3'-GGATC-5' at 4080, 3'-GGATG-5' at 4099, 3'-AGTTC-5' at 4200, 3'-GGACA-5' at 4252, 3'-GGTCA-5' at 4269, 3'-AGACG-5' at 4319, 3'-AGACA-5' at 4332, 3'-GGTCC-5' at 4420, and their complements.
  3. positive strand in the negative direction is SuccessablesDPE+-.bas, looking for 3'-A/G-G-A/T-C/T-A/C/G-5', 101, 3'-GGACC-5', 32, 3'-AGATA-5', 57, 3'-GGATA-5', 74, 3'-AGTTG-5', 84, 3'-GGATA-5', 98, 3'-GGATA-5', 108, 3'-AGTCG-5', 157, 3'-AGACA-5', 170, 3'-GGTCA-5', 206, 3'-AGATG-5', 244, 3'-AGTTC-5', 253, 3'-AGACA-5', 422, 3'-GGATC-5', 430, 3'-GGTCA-5', 439, 3'-GGATC-5', 525, 3'-AGACA-5', 559, 3'-GGTCA-5', 568, 3'-GGTCA-5', 576, 3'-AGATC-5', 589, 3'-GGATC-5', 703, 3'-GGTCA-5', 712, 3'-AGTTC-5', 719, 3'-AGACC-5', 725, 3'-GGATG-5', 784, 3'-GGTTG-5', 862, 3'-AGATC-5', 877, 3'-AGATC-5', 972, 3'-GGTTG-5', 1028, 3'-GGACG-5', 1151, 3'-GGATC-5', 1167, 3'-AGTTC-5', 1177, 3'-GGTTG-5', 1319, 3'-AGATG-5', 1438, 3'-AGACA-5', 1569, 3'-AGATA-5', 1595, 3'-GGATC-5', 1812, 3'-AGATG-5', 1828, 3'-AGACC-5', 1834, 3'-AGATC-5', 1987, 3'-GGACA-5', 2117, 3'-AGACC-5', 2121, 3'-AGACC-5', 2145, 3'-AGATA-5', 2177, 3'-GGTTG-5', 2234, 3'-GGATC-5', 2239, 3'-GGTCA-5', 2248, 3'-AGACC-5', 2261, 3'-GGACA-5', 2271, 3'-GGTTG-5', 2398, 3'-AGATC-5', 2413, 3'-AGTCC-5', 2543, 3'-GGATC-5', 2574, 3'-GGTCA-5', 2585, 3'-AGTTG-5', 2592, 3'-AGACC-5', 2598, 3'-AGTTG-5', 2704, 3'-AGTTG-5', 2733, 3'-AGACA-5', 2880, 3'-AGATG-5', 2894, 3'-AGATG-5', 2905, 3'-AGACA-5', 2948, 3'-AGATA-5', 2981, 3'-GGATC-5', 3097, 3'-AGTTG-5', 3115, 3'-AGACC-5', 3121, 3'-GGTTG-5', 3261, 3'-AGATC-5', 3276, 3'-GGACA-5', 3389, 3'-AGACA-5', 3433, 3'-AGATA-5', 3465, 3'-AGATC-5', 3488, 3'-GGTTG-5', 3532, 3'-GGTTG-5', 3605, 3'-AGATG-5', 3620, 3'-AGATG-5', 3627, 3'-GGATA-5', 3655, 3'-GGACA-5', 3756, 3'-AGACC-5', 3761, 3'-GGTTG-5', 3804, 3'-GGTCG-5', 3813, 3'-GGACC-5', 3868, 3'-AGATG-5', 3919, 3'-GGTTG-5', 3945, 3'-GGATC-5', 4006, 3'-AGTTC-5', 4024, 3'-AGACC-5', 4030, 3'-AGTTG-5', 4096, 3'-AGTCC-5', 4126, 3'-GGATC-5', 4157, 3'-AGTTC-5', 4175, 3'-AGACA-5', 4181, 3'-AGACC-5', 4204, 3'-AGACG-5', 4235, 3'-GGATC-5', 4288, 3'-GGTCA-5', 4307, 3'-AGACC-5', 4365, 3'-AGTTC-5', 4417, 3'-GGACA-5', 4468, 3'-AGATC-5', 4475, 3'-AGTCC-5', 4500, 3'-AGACA-5', 
4507, and their complements.
  4. positive strand in the positive direction is SuccessablesDPE++.bas, looking for 3'-A/G-G-A/T-C/T-A/C/G-5', 159, 3'-GGTCC-5' at 8, 3'-GGTCC-5' at 33, 3'-GGACC-5' at 40, 3'-AGTCC-5' at 90, 3'-AGACA-5' at 98, 3'-AGACC-5' at 102, 3'-GGACA-5' at 144, 3'-GGTTC-5' at 177, 3'-GGACG-5' at 191, 3'-GGTCC-5' at 215, 3'-AGACG-5' at 223, 3'-AGACC-5' at 270, 3'-GGACC-5' at 286, 3'-GGTCG-5' at 329, 3'-GGTCC-5' at 424, 3'-GGACG-5' at 435, 3'-AGTCG-5' at 511, 3'-GGTCC-5' at 515, 3'-GGACC-5' at 598, 3'-GGTTG-5' at 607, 3'-AGTCG-5' at 613, 3'-GGTCG-5' at 617, 3'-GGTCG-5' at 623, 3'-GGATG-5' at 649, 3'-GGTCC-5' at 707, 3'-GGACG-5' at 807, 3'-AGTCG-5' at 831, 3'-GGTTG-5' at 843, 3'-GGACC-5' at 847, 3'-GGACA-5' at 891, 3'-GGACG-5' at 907, 3'-AGTCG-5' at 931, 3'-GGTTG-5' at 943, 3'-GGACC-5' at 947, 3'-GGACA-5' at 991, 3'-GGACG-5' at 1075, 3'-GGACG-5' at 1118, 3'-GGTCG-5' at 1127, 3'-GGTCC-5' at 1175, 3'-GGATG-5' at 1195, 3'-GGACC-5' at 1199, 3'-GGTCA-5' at 1250, 3'-AGTCG-5' at 1267, 3'-GGTCG-5' at 1271, 3'-GGTTG-5' at 1279, 3'-GGATG-5' at 1283, 3'-GGACG-5' at 1311, 3'-GGTCG-5' at 1357, 3'-GGTCG-5' at 1363, 3'-GGACG-5' at 1369, 3'-AGACC-5' at 1376, 3'-AGACG-5' at 1395, 3'-GGACG-5' at 1411, 3'-GGTCG-5' at 1457, 3'-GGTCG-5' at 1463, 3'-GGACG-5' at 1469, 3'-AGACC-5' at 1476, 3'-AGACG-5' at 1495, 3'-GGATG-5' at 1573, 3'-AGTCG-5' at 1603, 3'-AGTTG-5' at 1621, 3'-AGACG-5' at 1733, 3'-GGACG-5' at 1776, 3'-GGACC-5' at 1815, 3'-GGTCC-5' at 1855, 3'-GGACA-5' at 1860, 3'-AGACC-5' at 1864, 3'-GGTCC-5' at 1893, 3'-AGACC-5' at 1992, 3'-GGTTG-5' at 2012, 3'-GGTCA-5' at 2024, 3'-GGTCG-5' at 2052, 3'-AGTCA-5' at 2060, 3'-AGTCA-5' at 2098, 3'-AGTCG-5' at 2102, 3'-AGTCC-5' at 2115, 3'-AGATC-5' at 2167, 3'-AGACA-5' at 2182, 3'-AGTCG-5' at 2198, 3'-AGTTA-5' at 2233, 3'-GGACA-5' at 2250, 3'-AGACA-5' at 2260, 3'-AGACA-5' at 2308, 3'-GGTCC-5' at 2316, 3'-AGTCC-5' at 2372, 3'-AGTCG-5' at 2390, 3'-GGTTC-5' at 2398, 3'-GGACC-5' at 2433, 3'-GGATC-5' at 2481, 3'-GGACC-5' at 2501, 3'-AGTTC-5' at 2508, 3'-GGACG-5' 
at 2520, 3'-AGTCG-5' at 2526, 3'-GGACC-5' at 2569, 3'-GGTCC-5' at 2574, 3'-GGTTC-5' at 2593, 3'-GGTCA-5' at 2605, 3'-AGTTC-5' at 2615, 3'-AGTCC-5' at 2620, 3'-GGTCC-5' at 2780, 3'-AGACG-5' at 2856, 3'-GGTCC-5' at 2876, 3'-AGACC-5' at 2883, 3'-GGACC-5' at 2891, 3'-GGTTA-5' at 2908, 3'-AGACA-5' at 2925, 3'-AGTCA-5' at 2936, 3'-AGACA-5' at 2957, 3'-AGACG-5' at 2975, 3'-AGACC-5' at 2983, 3'-GGACC-5' at 2988, 3'-GGTCA-5' at 2996, 3'-GGTCC-5' at 3016, 3'-AGACC-5' at 3021, 3'-AGTCC-5' at 3034, 3'-AGTCG-5' at 3041, 3'-GGACC-5' at 3047, 3'-AGACG-5' at 3060, 3'-GGTCA-5' at 3082, 3'-GGTCC-5' at 3111, 3'-AGTCG-5' at 3155, 3'-GGTCG-5' at 3239, 3'-AGATA-5' at 3258, 3'-AGACG-5' at 3267, 3'-AGACG-5' at 3278, 3'-AGTTG-5' at 3290, 3'-GGACC-5' at 3296, 3'-AGACG-5' at 3306, 3'-AGACG-5' at 3358, 3'-GGACC-5' at 3362, 3'-GGTCA-5' at 3379, 3'-AGACC-5' at 3405, 3'-AGTTA-5' at 3424, 3'-GGACA-5' at 3434, 3'-GGACC-5' at 3496, 3'-GGTCC-5' at 3536, 3'-GGACA-5' at 3617, 3'-GGACA-5' at 3622, 3'-GGTTG-5' at 3633, 3'-GGACC-5' at 3679, 3'-GGTCC-5' at 3687, 3'-GGTCG-5' at 3720, 3'-AGTCC-5' at 3728, 3'-GGACC-5' at 3758, 3'-AGTCG-5' at 3775, 3'-GGACC-5' at 3787, 3'-GGTCA-5' at 3841, 3'-AGTCC-5' at 3868, 3'-AGTCG-5' at 3997, 3'-AGTCG-5' at 4023, 3'-GGTCC-5' at 4032, 3'-AGTCG-5' at 4052, 3'-AGATC-5' at 4064, 3'-AGATC-5' at 4076, 3'-GGACG-5' at 4231, 3'-AGTCA-5' at 4271, 3'-GGACC-5' at 4409, 3'-AGACC-5' at 4416, 3'-GGACC-5' at 4424, and their complements.
  5. inverse, negative strand, negative direction, is SuccessablesDPEi--.bas, looking for 3'-A/C/G-C/T-A/T-G-A/G-5', 58, 3'-CCTGG-5', 32, 3'-ACAGA-5', 479, 3'-GTAGG-5', 593, 3'-ATTGG-5', 614, 3'-ACTGG-5', 734, 3'-GCAGA-5', 754, 3'-CTTGG-5', 846, 3'-ACAGA-5', 921, 3'-CTAGG-5', 973, 3'-CTTGG-5', 1012, 3'-ACTGA-5', 1051, 3'-ACAGA-5', 1087, 3'-GCTGG-5', 1191, 3'-ACAGA-5', 1222, 3'-CTTGG-5', 1303, 3'-GTTGG-5', 1407, 3'-CTAGA-5', 1482, 3'-GTTGG-5', 1514, 3'-ATAGG-5', 1529, 3'-GTAGG-5', 1572, 3'-CTTGA-5', 1685, 3'-ATAGA-5', 1731, 3'-GCAGA-5', 1774, 3'-CTAGG-5', 1813, 3'-GTAGG-5', 1838, 3'-GTAGA-5', 1863, 3'-CTTGG-5', 1956, 3'-ACAGA-5', 2031, 3'-ACAGA-5', 2165, 3'-ACTGG-5', 2189, 3'-GTAGA-5', 2290, 3'-CTTGG-5', 2382, 3'-CTTGG-5', 2717, 3'-ACTGA-5', 2786, 3'-GTTGG-5', 2844, 3'-GTTGA-5', 2911, 3'-ACAGA-5', 2986, 3'-GTAGA-5', 3154, 3'-CTTGG-5', 3245, 3'-ACAGA-5', 3321, 3'-CTTGA-5', 3460, 3'-GTTGA-5', 3524, 3'-GTAGA-5', 3551, 3'-CCTGA-5', 3640, 3'-GCAGG-5', 3698, 3'-CCTGA-5', 3747, 3'-CTTGG-5', 3784, 3'-ACAGA-5', 3833, 3'-GTTGA-5', 3849, 3'-CCTGG-5', 3868, 3'-GTAGG-5', 3903, 3'-GTAGA-5', 4058, 3'-ACAGA-5', 4210, 3'-CCTGA-5', 4327, 3'-ACAGA-5', 4371, 3'-CTTGG-5', 4451, 3'-GTAGG-5', 4456, 3'-CTAGG-5', 4476,
  6. inverse, negative strand, positive direction, is SuccessablesDPEi-+.bas, looking for 3'-A/C/G-C/T-A/T-G-A/G-5', 152 , 3'-CCAGG-5' at 8 , 3'-CCAGA-5' at 15 , 3'-ATTGG-5' at 24 , 3'-CCAGG-5' at 33 , 3'-CCTGG-5' at 40 , 3'-ACAGG-5' at 157 , 3'-GCAGG-5' at 194 , 3'-CCAGA-5' at 204 , 3'-CCAGG-5' at 215 , 3'-GCTGG-5' at 277 , 3'-CCTGG-5' at 286 , 3'-GCAGG-5' at 318 , 3'-ACTGG-5' at 347 , 3'-ACAGG-5' at 365 , 3'-GCAGG-5' at 379 , 3'-GCTGG-5' at 386 , 3'-GCAGA-5' at 396 , 3'-GCTGG-5' at 417 , 3'-CCAGG-5' at 424 , 3'-GCAGA-5' at 438 , 3'-CCAGA-5' at 468 , 3'-CCAGG-5' at 515 , 3'-ACAGG-5' at 552 , 3'-CCTGG-5' at 598 , 3'-GCAGG-5' at 658 , 3'-CCAGG-5' at 707 , 3'-CCTGA-5' at 725 , 3'-GCTGG-5' at 779 , 3'-CCAGA-5' at 835 , 3'-CCTGG-5' at 847 , 3'-CCTGA-5' at 859 , 3'-CCAGA-5' at 935 , 3'-CCTGG-5' at 947 , 3'-CCTGA-5' at 959 , 3'-ACTGG-5' at 1140 , 3'-CCAGG-5' at 1175 , 3'-CCTGG-5' at 1199 , 3'-ACTGA-5' at 1286 , 3'-GCAGA-5' at 1316 , 3'-GCAGA-5' at 1416 , 3'-CCAGA-5' at 1631 , 3'-CCTGA-5' at 1660 , 3'-CCTGA-5' at 1676 , 3'-GCTGG-5' at 1736 , 3'-CCAGA-5' at 1742 , 3'-GCTGG-5' at 1779 , 3'-GCAGG-5' at 1788 , 3'-CTTGG-5' at 1799 , 3'-CCTGG-5' at 1815 , 3'-CCAGG-5' at 1855 , 3'-GTAGG-5' at 1875 , 3'-CCAGG-5' at 1893 , 3'-GCAGG-5' at 1905 , 3'-GCAGA-5' at 1937 , 3'-ACTGG-5' at 1953 , 3'-ACAGG-5' at 1966 , 3'-ACAGG-5' at 2125 , 3'-GTTGG-5' at 2185 , 3'-CCTGA-5' at 2211 , 3'-CCAGA-5' at 2228 , 3'-GTAGG-5' at 2255 , 3'-GCAGG-5' at 2296 , 3'-CCAGG-5' at 2316 , 3'-GCTGG-5' at 2320 , 3'-GCTGG-5' at 2405 , 3'-ACAGA-5' at 2414 , 3'-CCTGG-5' at 2433 , 3'-CTAGG-5' at 2482 , 3'-CCTGG-5' at 2501 , 3'-GTTGG-5' at 2541 , 3'-ATAGG-5' at 2550 , 3'-CCTGG-5' at 2569 , 3'-CCAGG-5' at 2574 , 3'-ATAGA-5' at 2627 , 3'-CTAGG-5' at 2639 , 3'-ACAGA-5' at 2652 , 3'-ACTGA-5' at 2674 , 3'-GCAGG-5' at 2683 , 3'-GCAGA-5' at 2721 , 3'-GCTGG-5' at 2734 , 3'-GCAGG-5' at 2745 , 3'-GCTGG-5' at 2770 , 3'-CCAGG-5' at 2780 , 3'-GCTGG-5' at 2810 , 3'-GTTGG-5' at 2816 , 3'-ACAGA-5' at 2837 , 3'-GCAGA-5' at 2859 , 
3'-CCAGG-5' at 2876 , 3'-CCTGG-5' at 2891 , 3'-GCTGA-5' at 2915 , 3'-CCTGA-5' at 2968 , 3'-CCTGG-5' at 2988 , 3'-CCAGG-5' at 3016 , 3'-CCTGG-5' at 3047 , 3'-CCAGA-5' at 3091 , 3'-CCAGG-5' at 3111 , 3'-ACTGG-5' at 3117 , 3'-GCAGG-5' at 3128 , 3'-ACAGA-5' at 3133 , 3'-GCAGG-5' at 3147 , 3'-ACAGA-5' at 3179 , 3'-GCAGA-5' at 3214 , 3'-CCAGA-5' at 3221 , 3'-GCTGG-5' at 3242 , 3'-CCTGG-5' at 3296 , 3'-ACTGG-5' at 3345 , 3'-CCTGG-5' at 3362 , 3'-GTAGA-5' at 3416 , 3'-GCAGG-5' at 3466 , 3'-GCAGA-5' at 3473 , 3'-CTAGG-5' at 3484 , 3'-CCTGG-5' at 3496 , 3'-CTAGG-5' at 3522 , 3'-GCTGG-5' at 3526 , 3'-CCAGG-5' at 3536 , 3'-CCAGA-5' at 3548 , 3'-ACAGG-5' at 3571 , 3'-GCTGA-5' at 3588 , 3'-ACAGG-5' at 3636 , 3'-GCAGG-5' at 3662 , 3'-CCTGG-5' at 3679 , 3'-CCAGG-5' at 3687 , 3'-GCAGG-5' at 3694 , 3'-ACTGA-5' at 3735 , 3'-GTAGG-5' at 3753 , 3'-CCTGG-5' at 3758 , 3'-GCAGG-5' at 3768 , 3'-GCTGA-5' at 3778 , 3'-CCTGG-5' at 3787 , 3'-CCAGA-5' at 3806 , 3'-GCAGA-5' at 3831 , 3'-CCAGA-5' at 3891 , 3'-GCAGA-5' at 3916 , 3'-ACAGG-5' at 3975 , 3'-GCTGG-5' at 3989 , 3'-CTTGA-5' at 4016 , 3'-CCAGG-5' at 4032 , 3'-GTAGA-5' at 4036 , 3'-CTAGA-5' at 4065 , 3'-ACAGG-5' at 4070 , 3'-CTAGG-5' at 4077 , 3'-ACTGA-5' at 4089 , 3'-CTTGA-5' at 4131 , 3'-ATTGA-5' at 4161 , 3'-GCTGG-5' at 4177 , 3'-CCTGA-5' at 4186 , 3'-CCTGA-5' at 4214 , 3'-CTTGG-5' at 4300 , 3'-GCAGA-5' at 4317 , 3'-CCAGA-5' at 4330 , 3'-CCTGG-5' at 4409 , 3'-CCTGG-5' at 4424.
  7. inverse, positive strand, negative direction, is SuccessablesDPEi+-.bas, looking for 3'-A/C/G-C/T-A/T-G-A/G-5', 174, 3'-ACAGA-5', 13, 3'-ACTGA-5', 17, 3'-GTTGA-5', 85, 3'-ATAGA-5', 100, 3'-GTAGG-5', 119, 3'-ACTGA-5', 130, 3'-GCTGA-5', 140, 3'-ACAGA-5', 168, 3'-CCAGG-5', 262, 3'-GTAGA-5', 284, 3'-ACAGA-5', 289, 3'-ACTGA-5', 307, 3'-CTTGG-5', 328, 3'-ATAGA-5', 355, 3'-ACAGG-5', 424, 3'-CCTGG-5', 459, 3'-CCTGG-5', 508, 3'-ACAGG-5', 561, 3'-GCAGG-5', 565, 3'-ATTGA-5', 585, 3'-CCTGG-5', 596, 3'-ATTGG-5', 643, 3'-CCAGG-5', 648, 3'-GCAGG-5', 697, 3'-CCTGA-5', 732, 3'-GCTGG-5', 781, 3'-GCTGA-5', 825, 3'-GCAGG-5', 831, 3'-CTTGA-5', 843, 3'-CCAGG-5', 850, 3'-CCTGG-5', 899, 3'-ACAGA-5', 907, 3'-CCAGG-5', 948, 3'-GTAGA-5', 970, 3'-GCTGA-5', 991, 3'-GCAGG-5', 997, 3'-CTTGA-5', 1009, 3'-CCTGG-5', 1015, 3'-GCAGA-5', 1023, 3'-ATTGG-5', 1045, 3'-ACAGA-5', 1073, 3'-GCTGG-5', 1111, 3'-CCTGA-5', 1173, 3'-CCTGG-5', 1198, 3'-GCTGA-5', 1282, 3'-GCAGG-5', 1288, 3'-CTTGA-5', 1300, 3'-CTAGG-5', 1307, 3'-GCAGA-5', 1314, 3'-CCAGA-5', 1411, 3'-CCAGG-5', 1460, 3'-GCTGG-5', 1464, 3'-CCAGA-5', 1518, 3'-ACAGA-5', 1567, 3'-GCAGA-5', 1614, 3'-CCTGA-5', 1623, 3'-CTTGG-5', 1649, 3'-GTAGA-5', 1653, 3'-CCAGA-5', 1670, 3'-ATAGA-5', 1710, 3'-GCTGG-5', 1746, 3'-GCTGG-5', 1756, 3'-GCTGA-5', 1800, 3'-GCAGG-5', 1823, 3'-CCTGG-5', 1841, 3'-GTTGA-5', 1853, 3'-GCTGG-5', 1891, 3'-CTTGG-5', 1927, 3'-ACTGA-5', 1935, 3'-GCAGG-5', 1941, 3'-CCTGG-5', 1959, 3'-GCAGA-5', 1967, 3'-CCTGG-5', 2009, 3'-ACAGA-5', 2017, 3'-GCTGG-5', 2069, 3'-CCAGG-5', 2077, 3'-GCTGA-5', 2109, 3'-ACAGA-5', 2119, 3'-CTTGA-5', 2127, 3'-GCTGA-5', 2226, 3'-GTTGG-5', 2235, 3'-CCTGG-5', 2268, 3'-GCTGG-5', 2326, 3'-GCTGA-5', 2361, 3'-GCAGG-5', 2367, 3'-CTTGA-5', 2379, 3'-CCTGG-5', 2385, 3'-GCAGG-5', 2389, 3'-CCTGG-5', 2435, 3'-ACAGA-5', 2443, 3'-ACAGG-5', 2514, 3'-CCAGG-5', 2519, 3'-GCTGA-5', 2562, 3'-GCAGG-5', 2568, 3'-CTTGA-5', 2580, 3'-GTTGA-5', 2593, 3'-ACAGG-5', 2689, 3'-GCTGA-5', 2696, 3'-GTTGA-5', 2705, 3'-CTTGA-5', 2714, 3'-CCTGG-5', 
2720, 3'-GCTGA-5', 2744, 3'-CCTGG-5', 2770, 3'-ACAGA-5', 2778, 3'-ACAGA-5', 2878, 3'-ATAGA-5', 2903, 3'-CTTGG-5', 2921, 3'-GCTGG-5', 3035, 3'-GCTGG-5', 3041, 3'-GCTGA-5', 3085, 3'-CTTGA-5', 3103, 3'-GTTGG-5', 3116, 3'-CCTGG-5', 3128, 3'-GCTGG-5', 3180, 3'-GCTGA-5', 3224, 3'-CTTGA-5', 3242, 3'-CCAGG-5', 3249, 3'-GTAGA-5', 3256, 3'-CCTGG-5', 3298, 3'-ATTGA-5', 3358, 3'-CTTGA-5', 3401, 3'-ATAGA-5', 3422, 3'-GCAGA-5', 3431, 3'-ATAGG-5', 3447, 3'-CTAGA-5', 3463, 3'-CCAGA-5', 3486, 3'-GTTGA-5', 3505, 3'-ATTGG-5', 3529, 3'-GTTGA-5', 3533, 3'-ACTGA-5', 3542, 3'-CCAGG-5', 3564, 3'-CTTGA-5', 3571, 3'-CCAGG-5', 3585, 3'-GCAGA-5', 3589, 3'-GTTGG-5', 3606, 3'-GCTGA-5', 3649, 3'-ACAGA-5', 3672, 3'-GCTGG-5', 3719, 3'-CCTGG-5', 3744, 3'-ACTGG-5', 3749, 3'-CCTGA-5', 3781, 3'-CTTGG-5', 3793, 3'-GTTGA-5', 3805, 3'-GTAGA-5', 3820, 3'-GCTGG-5', 3864, 3'-CCAGG-5', 3871, 3'-CCAGG-5', 3885, 3'-CCTGG-5', 3906, 3'-ACAGA-5', 3917, 3'-CCTGA-5', 3932, 3'-GTTGG-5', 3942, 3'-GTTGG-5', 3946, 3'-CCAGG-5', 3951, 3'-GCTGA-5', 3994, 3'-CTTGA-5', 4012, 3'-CCTGG-5', 4037, 3'-ATAGA-5', 4079, 3'-GTTGG-5', 4097, 3'-CCAGG-5', 4102, 3'-GCTGA-5', 4145, 3'-CCAGG-5', 4170, 3'-CTTGG-5', 4188, 3'-CCAGA-5', 4233, 3'-CCAGG-5', 4253, 3'-CTTGG-5', 4268, 3'-GCTGA-5', 4276, 3'-GCAGG-5', 4282, 3'-CTTGA-5', 4294, 3'-CCTGG-5', 4300, 3'-CCTGG-5', 4349, 3'-CCAGA-5', 4448, 3'-CCTGG-5', 4494, 3'-ACAGA-5', 4518, 3'-CCTGG-5', 4546,
  8. inverse, positive strand, positive direction, is SuccessablesDPEi++.bas, looking for 3'-A/C/G-C/T-A/T-G-A/G-5', 95, 3'-GTAGG-5' at 30, 3'-CCTGG-5' at 37, 3'-ACAGG-5' at 82, 3'-ACAGA-5' at 100, 3'-CCTGG-5' at 187, 3'-CCAGG-5' at 218, 3'-ACAGA-5' at 268, 3'-GTTGG-5' at 608, 3'-GTAGG-5' at 629, 3'-GTAGG-5' at 698, 3'-CCTGA-5' at 746, 3'-CCTGA-5' at 814, 3'-GTTGG-5' at 844, 3'-CTAGG-5' at 865, 3'-ACAGG-5' at 893, 3'-GCTGA-5' at 898, 3'-CCTGA-5' at 914, 3'-GTTGG-5' at 944, 3'-CTAGG-5' at 965, 3'-ACAGG-5' at 993, 3'-GCTGA-5' at 998, 3'-GTTGG-5' at 1280, 3'-GCAGA-5' at 1393, 3'-GCAGA-5' at 1493, 3'-GTTGG-5' at 1616, 3'-ACTGG-5' at 1662, 3'-CCAGA-5' at 1711, 3'-ACAGA-5' at 1731, 3'-CTTGG-5' at 1811, 3'-ACAGA-5' at 1862, 3'-GCAGG-5' at 1930, 3'-CTTGA-5' at 1951, 3'-CCAGA-5' at 1958, 3'-GTTGG-5' at 2013, 3'-ACAGA-5' at 2078, 3'-GTAGA-5' at 2111, 3'-GTTGG-5' at 2120, 3'-ACAGA-5' at 2172, 3'-ACTGG-5' at 2213, 3'-CTTGG-5' at 2225, 3'-CCAGA-5' at 2258, 3'-CCTGA-5' at 2271, 3'-GCTGA-5' at 2359, 3'-CTAGG-5' at 2378, 3'-ACAGA-5' at 2466, 3'-CCAGA-5' at 2489, 3'-CTAGG-5' at 2514, 3'-CTTGG-5' at 2579, 3'-CCTGA-5' at 2672, 3'-CTTGG-5' at 2776, 3'-CCTGA-5' at 2820, 3'-GTAGA-5' at 2852, 3'-ACTGG-5' at 2873, 3'-CCAGA-5' at 2941, 3'-ACTGA-5' at 2945, 3'-ACAGA-5' at 3004, 3'-CCAGA-5' at 3019, 3'-ACTGA-5' at 3029, 3'-ACAGA-5' at 3053, 3'-GTAGG-5' at 3108, 3'-CCTGG-5' at 3172, 3'-GCAGG-5' at 3203, 3'-CCAGA-5' at 3245, 3'-GCAGA-5' at 3256, 3'-GTTGA-5' at 3291, 3'-CCAGA-5' at 3299, 3'-GTAGA-5' at 3329, 3'-ATAGG-5' at 3384, 3'-ACAGA-5' at 3392, 3'-GTAGA-5' at 3403, 3'-CCTGG-5' at 3545, 3'-ACAGG-5' at 3577, 3'-CCAGA-5' at 3608, 3'-ACAGG-5' at 3619, 3'-GTAGG-5' at 3629, 3'-ACTGG-5' at 3714, 3'-ATTGA-5' at 3733, 3'-CCAGA-5' at 3771, 3'-ACTGG-5' at 3784, 3'-GCTGA-5' at 3801, 3'-CTTGG-5' at 3838, 3'-CTTGG-5' at 3856, 3'-GTTGG-5' at 3911, 3'-CTTGG-5' at 3937, 3'-ACTGG-5' at 4018, 3'-CTTGA-5' at 4048, 3'-GCAGA-5' at 4056, 3'-CTAGG-5' at 4081, 3'-GTAGG-5' at 4183, 3'-ACTGG-5' at 4216, 3'-GCTGG-5' 
at 4358, 3'-ACAGG-5' at 4367, 3'-CCAGA-5' at 4380, 3'-CCAGA-5' at 4414, 3'-CCAGG-5' at 4420.
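
The degenerate patterns searched for above, such as 3'-A/G-G-A/T-C/T-A/C/G-5', can be read as one character class per position. The following is a minimal sketch of such a scan, not the SuccessablesDPE programs themselves. The function names, the 1-based start coordinate, the left-to-right reading of the pattern as written, and the demonstration sequence are all assumptions made for illustration.

```python
import re

def pattern_to_regex(spec: str) -> str:
    """Turn 'A/G-G-A/T-C/T-A/C/G' into the regex '[AG]G[AT][CT][ACG]'."""
    parts = []
    for position in spec.split("-"):
        bases = position.split("/")
        # A single base is kept as-is; alternatives become a character class.
        parts.append(bases[0] if len(bases) == 1 else "[" + "".join(bases) + "]")
    return "".join(parts)

def find_hits(sequence: str, spec: str):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    # The lookahead lets overlapping occurrences be reported separately.
    regex = re.compile("(?=(" + pattern_to_regex(spec) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]

# Hypothetical test sequence, not taken from the A1BG promoters.
demo = "TTGGTCCAAAGACGTT"
print(find_hits(demo, "A/G-G-A/T-C/T-A/C/G"))
# [(3, 'GGTCC'), (10, 'AGACG')]
```

The coordinate convention used in the lists above is not stated here, so the positions this sketch reports may need an offset before they can be compared with them.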

DREB boxes

There are no dehydration-responsive element-binding (DREB) boxes in either promoter.

E2 boxes

Negative strand in the negative direction there are 5: 3'-ACAGATGT-5', 482, 3'-ACAGATGT-5', 1225, 3'-GCAGTTGG-5', 1514, 3'-ACAGATGT-5', 2989, 3'-ACAGATGT-5', 4213, in the distal promoter.

Positive strand in the negative direction there are 2: 3'-GCAGGTGG-5', 2571, 3'-ACAGATGA-5', 3920.

Inverse complement, negative strand, negative direction there is 1: 3'-CCACCTGT-5', 2117.

Inverse complement, positive strand, negative direction there are 4: 3'-CCACCTGT-5', 394, 3'-ACACCTGT-5', 1131, 3'-GCAACTGC-5', 3851, 3'-ACACCTGT-5', 3970.

Negative strand in the positive direction there is 1: 3'-GCAGATGA-5', 37.

EIF4E basal elements

There are no EIF4E basal elements (4EBE; EIF4E is also written eIF4E) in either promoter.

Enhancer boxes

Core promoters

Negative strand in the positive direction there are 2: 3'-CACATG-5', 4364, 3'-CACATG-5', 4370.

Proximal promoters

Positive strand, negative direction there is 1: 3'-CACATG-5' at 4247.

Negative strand, positive direction there are 2: 3'-CACATG-5', 4153, 3'-CACATG-5', 4221.

Distal promoters

Negative strand in the negative direction there are 4: 3'-CACATG-5' at 324, 3'-CACATG-5' at 797, 3'-CACATG-5' at 2213, and 3'-CACATG-5' at 2342.

Positive strand in the negative direction there are 17: 3'-CACATG-5' at 123, 3'-CACATG-5' at 200, 3'-CACATG-5' at 952, 3'-CACATG-5' at 1206, 3'-CACATG-5' at 1849, 3'-CACATG-5' at 1952, 3'-CACATG-5' at 2151, 3'-CACATG-5' at 2276, 3'-CACATG-5' at 2322, 3'-CACATG-5' at 2533, 3'-CACATG-5' at 2613, 3'-CACATG-5' at 2667, 3'-CACATG-5' at 2751, 3'-CACATG-5' at 2783, 3'-CACATG-5' at 4106, 3'-CACATG-5' at 4116.

Negative strand in the positive direction there are 17: 3'-CACATG-5', 1186, 3'-CACATG-5', 1238, 3'-CACATG-5', 1871, 3'-CACATG-5', 1933, 3'-CACATG-5', 2031, 3'-CACATG-5', 2140, 3'-CACATG-5', 2153, 3'-CACATG-5', 2266, 3'-CACATG-5', 2473, 3'-CACATG-5', 3140, 3'-CACATG-5', 3335, 3'-CACATG-5', 3580, 3'-CACATG-5', 3707, 3'-CACATG-5', 3742, 3'-CACATG-5', 3827, 3'-CACATG-5', 3900, 3'-CACATG-5', 3956.

Positive strand in the positive direction there are 4: 3'-CACATG-5', 126, 3'-CACATG-5', 565, 3'-CACATG-5', 2596, 3'-CACATG-5', 3114.

F boxes

"Male sex determination in the Caenorhabditis elegans hermaphrodite germline requires translational repression of tra-2 mRNA by the GLD-1 RNA binding protein."[67]

"We used the yeast Gal4p two-hybrid system (Fields and Sternglanz, 1994) to identify proteins that physically interact with GLD-1. We recovered two identical cDNAs in two-hybrid screens [...]. One (OG2.3) using GLD-1 residues 84-341 and the other (CD13.1) using residues 273-457, both fused to the Gal4p DNA binding domain [...]."[67]

"When fused to the DNA-binding domain of Gal4p, Ino2p but not Ino4p was able to activate a UASGAL-containing reporter gene even in the absence of the heterologous Fbfl subunit. By deletion studies, two separate transcriptional activation domains were identified in the N-terminal part of Ino2p. Thus, the bHLH domains of Ino2p and Ino4p constitute the dimerization/DNA-binding module of Fbfl mediating its interaction with the ICRE, while transcriptional activation is effected exclusively by Ino2p."[68]

"This ICRE (consensus sequence TYTTCACATGY) contains the core sequence CANNTG, which is also known as an E box and which serves as a recognition site for DNA-binding proteins of the basic helix-loop-helix (bHLH) family (3). Members of the bHLH family comprise determinants of cellular differentiation and proliferation in mammalian and invertebrate systems such as the myogenic transcription factors MyoD, MRF4, myogenin and Myf-5(4) as well as factors not restricted to specialized tissues (E12, E47, daughterless, c-Myc and Mad; 5-7). Proteins of the bHLH group may form either homodimers or heterodimers or both, dependent on the individual structure of the respective interaction surface provided by the HLH domain(8)."[68]

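The relationship stated in the quotation, that the ICRE consensus TYTTCACATGY contains the E-box core CANNTG, can be checked by expanding the consensus with the standard IUPAC nucleotide codes (Y = C or T, N = any base). This is only an illustrative sketch; the helper names are hypothetical and the code table is restricted to the symbols needed here.

```python
import re
from itertools import product

# Subset of the IUPAC nucleotide codes needed for these two motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "N": "ACGT"}

def expand(consensus: str):
    """List every concrete sequence a degenerate consensus stands for."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in consensus))]

def to_regex(consensus: str) -> str:
    """Translate a degenerate consensus into an equivalent regex."""
    return "".join(
        IUPAC[c] if len(IUPAC[c]) == 1 else "[" + IUPAC[c] + "]"
        for c in consensus
    )

icre_sites = expand("TYTTCACATGY")
e_box = re.compile(to_regex("CANNTG"))

print(icre_sites)
# ['TCTTCACATGC', 'TCTTCACATGT', 'TTTTCACATGC', 'TTTTCACATGT']
print(all(e_box.search(site) for site in icre_sites))
# True
```

Each of the four expansions carries the hexamer CACATG, the same sequence tabulated under Enhancer boxes above.
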
GAAC elements

  1. negative strand in the negative direction, looking for 3'-GAACT-5', 13, 3'-GAACT-5', 843, 3'-GAACT-5', 1009, 3'-GAACT-5', 1300, 3'-GAACT-5', 2127, 3'-GAACT-5', 2379, 3'-GAACT-5', 2580, 3'-GAACT-5', 2714, 3'-GAACT-5', 3103, 3'-GAACT-5', 3242, 3'-GAACT-5', 3401, 3'-GAACT-5', 3571, 3'-GAACT-5', 4012, 3'-GAACT-5', 4294,
  2. negative strand in the positive direction, looking for 3'-GAACT-5', 1, 3'-GAACT-5', 609,
  3. positive strand in the negative direction, looking for 3'-GAACT-5', 2, 3'-GAACT-5', 1685, 3'-GAACT-5', 3460,
  4. positive strand in the positive direction, looking for 3'-GAACT-5', 2, 3'-GAACT-5', 577, 3'-GAACT-5', 692,
  5. complement, negative strand, negative direction, looking for 3'-CTTGA-5', 2, 3'-CTTGA-5', 1685, 3'-CTTGA-5', 3460,
  6. complement, negative strand, positive direction, looking for 3'-CTTGA-5', 2, 3'-CTTGA-5', 577, 3'-CTTGA-5', 692,
  7. complement, positive strand, negative direction, looking for 3'-CTTGA-5', 13, 3'-CTTGA-5', 843, 3'-CTTGA-5', 1009, 3'-CTTGA-5', 1300, 3'-CTTGA-5', 2127, 3'-CTTGA-5', 2379, 3'-CTTGA-5', 2580, 3'-CTTGA-5', 2714, 3'-CTTGA-5', 3103, 3'-CTTGA-5', 3242, 3'-CTTGA-5', 3401, 3'-CTTGA-5', 3571, 3'-CTTGA-5', 4012, 3'-CTTGA-5', 4294,
  8. complement, positive strand, positive direction, looking for 3'-CTTGA-5', 1, 3'-CTTGA-5', 609,
  9. inverse complement, negative strand, negative direction, looking for 3'-AGTTC-5', 3, 3'-AGTTC-5', 3844, 3'-AGTTC-5', 4027, 3'-AGTTC-5', 4178,
  10. inverse complement, negative strand, positive direction, looking for 3'-AGTTC-5', 1, 3'-AGTTC-5', 761,
  11. inverse complement, positive strand, negative direction, looking for 3'-AGTTC-5', 6, 3'-AGTTC-5', 253, 3'-AGTTC-5', 719, 3'-AGTTC-5', 1177, 3'-AGTTC-5', 4024, 3'-AGTTC-5', 4175, 3'-AGTTC-5', 4417,
  12. inverse complement, positive strand, positive direction, looking for 3'-AGTTC-5', 0,
  13. inverse, negative strand, negative direction, looking for 3'-TCAAG-5', 6, 3'-TCAAG-5', 253, 3'-TCAAG-5', 719, 3'-TCAAG-5', 1177, 3'-TCAAG-5', 4024, 3'-TCAAG-5', 4175, 3'-TCAAG-5', 4417,
  14. inverse, negative strand, positive direction, looking for 3'-TCAAG-5', 0,
  15. inverse, positive strand, negative direction, looking for 3'-TCAAG-5', 3, 3'-TCAAG-5', 3844, 3'-TCAAG-5', 4027, 3'-TCAAG-5', 4178,
  16. inverse, positive strand, positive direction, looking for 3'-TCAAG-5', 1, 3'-TCAAG-5', 761.
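
The sixteen searches above use four forms of the same pentamer: GAACT itself, its complement, its inverse complement, and its inverse. A short sketch of how the three derived forms follow from the first is given below; the function names are illustrative and are not taken from the programs used for the searches.

```python
# Base-pairing table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Swap each base for its pairing partner, keeping the order."""
    return seq.translate(COMPLEMENT)

def inverse(seq: str) -> str:
    """Reverse the order of the bases without complementing them."""
    return seq[::-1]

def inverse_complement(seq: str) -> str:
    """Complement the sequence and then reverse it."""
    return complement(seq)[::-1]

motif = "GAACT"
print(complement(motif))          # CTTGA
print(inverse_complement(motif))  # AGTTC
print(inverse(motif))             # TCAAG
```

Because the two strands are complementary, a motif found on one strand appears as its complement at the same position on the other, which is consistent with items 1 and 7 above sharing one set of positions, as do items 3 and 5.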

GA responsive elements

Only one GARE (an inverse) occurs, between ZSCAN22 and A1BG: 3'-AAACAAT-5' at 230 nts, and its complement.

GATA boxes

GTGA-box has the consensus sequence GATA.[69]

Proximal promoters

Inverse complement, negative strand, positive direction there is 1: 3'-TTTATCAC-5', 4125.

Distal promoters

Positive strand in the negative direction there are 2: 3'-GGGATAGA-5', 100, 3'-ATGATAGA-5', 355.

Inverse complement, negative strand, negative direction there is 1: 3'-GTTATCAT-5', 2500.

Inverse complement, positive strand, negative direction there is 1: 3'-TTTATCTT-5', 1732.

Inverse complement, negative strand, positive direction there is 1: 3'-GTTATCCC-5', 3385.

Inverse complement, positive strand, positive direction there are 2: 3'-GCTATCAG-5', 1840, 3'-TTTATCTT-5', 2628.

G boxes

There are no G boxes in either promoter.

GC boxes

Positive strand in the negative direction there are 2: 3'-TGGGCGTGGT-5', 1898, 3'-TGGGCGTGGT-5', 3048, in the distal promoter.

Inverse complement, negative strand, negative direction there is 1: 3'-ACTCCGCCCA-5', 3092.

Inverse complement, positive strand, negative direction there is 1: 3'-GCTCCGCCTC-5', 1505.

Negative strand in the positive direction there is 1: 3'-TGGGCGGGAC-5', 409.

Inverse complement, positive strand, positive direction there is 1: 3'-GCCACGCCCC-5', 491.

GCC boxes

The GCC box is the same as the AGC box.

GLM boxes

There are no GCN4-like motif (GLM) boxes in either promoter.

Grainy head transcription factor binding sites

H boxes

Core promoters

Between ZSCAN22 and A1BG: There is one inverse and its complement 3'-AGGAGA-5' at 4428 nts.

Between ZNF497 and A1BG: There is an inverse and its complement 3'-AGGACA-5' at 4252. There are five after the TSS: 3'-AGAGAA-5' at 4387, 3'-AGTACA-5' at 4365, 3'-ACCAGA-5' at 4380, 3'-AAGAGA-5' at 4386, 3'-ACGACA-5' at 4392 and their complements.

Proximal promoters

Between ZSCAN22 and A1BG: There is one H box (3'-ANANNA-5'): negative direction, negative strand, 3'-ACACGA-5' at 4402. On the positive strand in the negative direction there are 16: 3'-ACAAAA-5' at 4216, 3'-AAAAAA-5' at 4218, 3'-AAAATA-5' at 4220, 3'-AAATAA-5' at 4221, 3'-ATAATA-5' at 4223, 3'-AAAAAA-5' at 4378, 3'-AAAAGA-5' at 4380, 3'-AAAGAA-5' at 4381, 3'-AGAAAA-5' at 4383, 3'-AAAAAA-5' at 4385, 3'-AAAAGA-5' at 4387, 3'-AAAGAA-5' at 4388, 3'-AGAAAA-5' at 4390, 3'-AAAAGA-5' at 4392, 3'-AAAGAA-5' at 4393, and 3'-AGAAAA-5' at 4395, with their complements on the negative strand, negative direction.

Between ZNF497 and A1BG: There is one H box (3'-ANANNA-5'): 3'-AGAGAA-5' at 4387 in the proximal promoter, negative strand, positive direction. There are four: 3'-TCATGT-5' at 4365, 3'-TGGTCT-5' at 4380, 3'-TTCTCT-5' at 4386, and 3'-TGCTGT-5' at 4392 and their complements in the positive direction.

Distal promoters

Between ZSCAN22 and A1BG, negative strand, negative direction: 3'-AGAGGA-5' at 3387, 3'-AGAGGA-5' at 3638, and 3'-AGAGGA-5' at 3675. One inverse and its complement 3'-AGGAGA-5' at 3790. There are 14 H boxes: 3'-ACACCA-5' at 788, 3'-ACATCA-5' at 2541, 3'-ACACCA-5' at 2659, 3'-ACATTA-5' at 2675, 3'-ATAAAA-5' at 2853, 3'-AAAGTA-5' at 2886, 3'-ACATTA-5' at 3064, 3'-AGATGA-5' at 3159, 3'-ACACCA-5' at 3187, 3'-AGAAGA-5' at 3554, 3'-AGACGA-5' at 3707, 3'-ACACCA-5' at 3811, 3'-ACATTA-5' at 3973, and 3'-ACATCA-5' at 4124.

On the positive strand, negative direction, there are 127 H boxes: 3'-ACCACA-5' at 608, 3'-ACCACA-5' at 793, 3'-ACACCA-5' at 883, 3'-ACCACA-5', 1477, 3'-ACACCA-5' at 2419, 3'-AAAAAA-5' at 2461, 3'-AAAAAA-5' at 2462, 3'-AAAAAA-5' at 2463, 3'-AAAAAA-5' at 2464, 3'-AAAAAA-5' at 2465, 3'-AAAAAA-5' at 2466, 3'-AAAAAA-5' at 2467, 3'-AAAAAA-5' at 2468, 3'-AAAAAA-5' at 2469, 3'-AAAAAA-5' at 2470, 3'-AAAGCA-5' at 2473, 3'-AAAGCA-5' at 2479, 3'-AAACAA-5' at 2484, 3'-AAACAA-5' at 2488, 3'-ACAAAA-5' at 2490, 3'-ATAGTA-5' at 2500, 3'-AGAAAA-5' at 2506, 3'-AAAACA-5' at 2508, 3'-AAACAA-5' at 2509, 3'-AGACCA-5' at 2599, 3'-ATACAA-5' at 2642, 3'-ACAAAA-5' at 2644, 3'-AAATCA-5' at 2648, 3'-ACAGGA-5' at 2690, 3'-AAATCA-5' at 2749, 3'-AGAGCA-5' at 2781, 3'-AAAAGA-5' at 2798, 3'-AAAGAA-5' at 2799, 3'-AAAGAA-5' at 2803, 3'-AGAAAA-5' at 2805, 3'-AAAAGA-5' at 2807, 3'-AGAGAA-5' at 2810, 3'-AGAAGA-5' at 2812, 3'-AGAAAA-5' at 2815, 3'-AAAAAA-5' at 2817, 3'-AAAAGA-5' at 2819, 3'-AAAGAA-5' at 2820, 3'-AGAAAA-5' at 2822, 3'-AAAAGA-5' at 2824, 3'-AGAGAA-5' at 2827, 3'-AGAAGA-5' at 2829, 3'-AGAAAA-5' at 2832, 3'-AAAAAA-5' at 2834, 3'-AAAAGA-5' at 2836, 3'-AAAGAA-5' at 2837, 3'-AGAAAA-5' at 2839, 3'-AAAACA-5' at 2841, 3'-AAACAA-5' at 2842, 3'-AAAATA-5' at 2868, 3'-ATATAA-5' at 2873, 3'-AAAAAA-5' at 2929, 3'-ACATCA-5' at 2941, 3'-ACATTA-5' at 2951, 3'-AAACCA-5' at 2971, 3'-AAAATA-5' at 3012, 3'-AAATAA-5' at 3013, 3'-AAAAAA-5' at 3026, 3'-AAACTA-5' at 3029, 3'-AGACCA-5' at 3122, 3'-AAAACA-5' at 3166, 3'-ACATAA-5' at 3169, 3'-ATAAAA-5' at 3171, 3'-AAATTA-5' at 3175, 3'-AGATCA-5' at 3277, 3'-ACAAGA-5' at 3307, 3'-AGAGCA-5' at 3310, 3'-AAAACA-5' at 3329, 3'-AAACAA-5' at 3330, 3'-AAATAA-5' at 3334, 3'-AAACAA-5' at 3338, 3'-ACAAGA-5' at 3340, 3'-AGAAAA-5' at 3343, 3'-AAACCA-5' at 3365, 3'-AGAGGA-5' at 3387, 3'-ACATCA-5' at 3394, 3'-AGAGAA-5' at 3406, 3'-ACATCA-5' at 3415, 3'-ACATTA-5' at 3436, 3'-ATATTA-5' at 3454, 3'-ATATTA-5' at 3468, 3'-AAACCA-5' at 3484, 3'-AGATCA-5' at 3489, 3'-AAAACA-5' at 3511, 
3'-ACACAA-5' at 3514, 3'-ATAATA-5' at 3538, 3'-ACAAGA-5' at 3635, 3'-AGAGGA-5' at 3638, 3'-AAAGAA-5' at 3666, 3'-AGAACA-5' at 3668, 3'-AGAGGA-5' at 3675, 3'-ACAAGA-5' at 3759, 3'-AGACCA-5' at 3762, 3'-ACCACA-5' at 3764, 3'-ACAAAA-5' at 3767, 3'-AGAGCA-5' at 3913, 3'-AGATGA-5' at 3920, 3'-AGACCA-5' at 4031, 3'-ACAAAA-5' at 4066, 3'-AAAAAA-5' at 4068, 3'-AAAATA-5' at 4070, 3'-AAATAA-5' at 4071, 3'-AAATAA-5' at 4075, 3'-ATAATA-5' at 4077, 3'-ATAGAA-5' at 4080, 3'-AAAGAA-5' at 4084, 3'-AGAAAA-5' at 4086, 3'-AGACAA-5' at 4182, 3'-ACAAAA-5' at 4216, 3'-AAAAAA-5' at 4218, 3'-AAAATA-5' at 4220, 3'-AAATAA-5' at 4221, 3'-ATAATA-5' at 4223, 3'-AAAAAA-5' at 4378, 3'-AAAAGA-5' at 4380, 3'-AAAGAA-5' at 4381, 3'-AGAAAA-5' at 4383, 3'-AAAAAA-5' at 4385, 3'-AAAAGA-5' at 4387, 3'-AAAGAA-5' at 4388, 3'-AGAAAA-5' at 4390, 3'-AAAAGA-5' at 4392, 3'-AAAGAA-5' at 4393, and 3'-AGAAAA-5' at 4395.
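
The list above contains runs of hits at consecutive positions, for example 3'-AAAAAA-5' at 2461 through 2470. One way such runs can arise is from an overlapping scan of a homopolymer stretch, where every window of six bases satisfies 3'-ANANNA-5'. The sketch below illustrates this under assumptions: the 15-base run of A is hypothetical rather than the actual promoter sequence, and the offset of 2461 is chosen only to mirror the numbering above.

```python
import re

# ANANNA as a regex; the lookahead reports overlapping windows separately.
H_BOX = re.compile(r"(?=(A[ACGT]A[ACGT][ACGT]A))")

def h_box_starts(sequence: str, offset: int = 1):
    """Return the start coordinate of every H-box window, overlaps included."""
    return [m.start() + offset for m in H_BOX.finditer(sequence)]

# Hypothetical stretch of 15 adenines.
run = "A" * 15
print(h_box_starts(run, offset=2461))
# [2461, 2462, 2463, 2464, 2465, 2466, 2467, 2468, 2469, 2470]
```

A run of n adenines gives n - 5 overlapping windows, so a count of H boxes on an A-rich strand can be far larger than the number of distinct sites.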

Between ZNF497 and A1BG: On the negative strand in the positive direction, two of the following H boxes lie after nucleotide number 2300: 3'-ACCACA-5' at 420, 3'-ACACCA-5' at 386, 3'-TGGTGT-5' at 511, 3'-TGGTGT-5' at 530, 3'-ACACCA-5' at 2603 and 3'-ACACCA-5' at 3825.

On the positive strand in the positive direction, two of the following H boxes lie after nucleotide number 2300: 3'-ACACCA-5' at 204, 3'-ACACCA-5' at 528, 3'-ACACCA-5' at 3643 and 3'-ACACCA-5' at 3967.

Regarding 3'-ANANNA-5', on the negative strand, positive direction, there are 25 H boxes: 3'-ATACCA-5' at 2591, 3'-ACACCA-5' at 2603, 3'-ATAGAA-5' at 2628, 3'-AAACCA-5' at 2632, 3'-ACACTA-5' at 2637, 3'-ATATAA-5' at 2662, 3'-AGAGCA-5' at 2704, 3'-AGAGGA-5' at 2793, 3'-AAAGGA-5' at 2829, 3'-ACAGAA-5' at 2838, 3'-AAAGAA-5' at 3066, 3'-AGAACA-5' at 3094, 3'-AGAGCA-5' at 3138, 3'-ACAGCA-5' at 3212, 3'-ACAGTA-5' at 3414, 3'-AGATGA-5' at 3476, 3'-ACAGGA-5' at 3572, 3'-AAAGCA-5' at 3599, 3'-ACATGA-5' at 3708, 3'-ACACCA-5' at 3825, 3'-AAAAGA-5' at 3929, 3'-AGAACA-5' at 4068, 3'-AAATGA-5' at 4094, 3'-ACATCA-5' at 4116, and 3'-ACATGA-5' at 4154.

On the positive strand, positive direction there are 20 H boxes: 3'-AAATAA-5' at 2347, 3'-AAAAAA-5' at 2451, 3'-AAAACA-5' at 2453, 3'-AGACGA-5' at 2976, 3'-AGACCA-5' at 3022, 3'-AGAGAA-5' at 3056, 3'-AGAAGA-5' at 3058, 3'-AGAGGA-5' at 3302, 3'-AGACGA-5' at 3307, 3'-ACAGAA-5' at 3393, 3'-AGAAGA-5' at 3395, 3'-ACAGGA-5' at 3620, 3'-ACACCA-5' at 3643, 3'-AAACCA-5' at 3948, 3'-ACACCA-5' at 3967, 3'-AGAGGA-5' at 4059, 3'-AAAATA-5' at 4122, 3'-AAATCA-5' at 4137, 3'-AAATAA-5' at 4142, and 3'-ATATTA-5' at 4168.

There are inverses of 31 H boxes on the negative strand in the positive direction: 3'-ATGACA-5' at 2412, 3'-ACTACA-5' at 2428, 3'-AGGACA-5' at 2460, 3'-ATTATA-5' at 2548, 3'-ACCACA-5' at 2600, 3'-AGGAAA-5' at 2623, 3'-AATAGA-5' at 2627, 3'-ACCACA-5' at 2634, 3'-AACAGA-5' at 2652, 3'-AGCAAA-5' at 2706, 3'-AGGAAA-5' at 2831, 3'-AACACA-5' at 2835, 3'-ATGACA-5' at 2843, 3'-AGAACA-5' at 3094, 3'-AACACA-5' at 3096, 3'-AGGACA-5' at 3131, 3'-ACCAAA-5' at 3175, 3'-AACAGA-5' at 3179, 3'-AGCAGA-5' at 3214, 3'-AGTAGA-5' at 3416, 3'-AATAAA-5' at 3427, 3'-ACCAGA-5' at 3548, 3'-ATGACA-5' at 3569, 3'-AGGAGA-5' at 3650, 3'-AGCACA-5' at 3740, 3'-ACCACA-5' at 3859, 3'-AAAAGA-5' at 3929, 3'-AGAACA-5' at 4068, 3'-ATCATA-5' at 4149, and 3'-ATTATA-5' at 4166.

HMG boxes

HNF6s

Core promoters

Inverse complement, positive strand, negative direction there is 1: 3'-TTATTAATTC-5', 4542.

Proximal promoters

Negative strand in the negative direction there is 1: 3'-TTATTAATCG-5', 4229.

Negative strand in the positive direction there are 2: 3'-TTATTAATCA-5', 4147, 3'-TTATTGATTA-5', 4164.

Inverse complement, positive strand, positive direction there is 1: 3'-ATATTAACAA-5', 4172.

Distal promoters

Negative strand in the negative direction there are 2: 3'-GTGTTAATAA-5', 1725, 3'-TAGTTGATAA-5', 3527.

Positive strand in the negative direction there is 1: 3'-AAATTGATAA-5', 3361.

Inverse complement, negative strand, negative direction there are 2: 3'-ACATGGACAT-5', 802, 3'-TAATGAACTT-5', 1301.

Inverse complement, positive strand, negative direction there are 2: 3'-AAATTGATAA-5', 3361, 3'-TCATCAACTA-5', 3525.

Negative strand in the positive direction there is 1: 3'-ATGTCCATGG-5', 3581.

Positive strand in the positive direction there is 1: 3'-GAGTCCATTG-5', 3732.

Inverse complement, positive strand, positive direction there is 1: 3'-CCATTGACTC-5', 3736.

Homeoboxes

"Transcription factors Pax-4 and Pax-6 are known to be key regulators of pancreatic cell differentiation and development. [...] The gene-targeting experiments revealed that Pax-4 and Pax-6 cannot substitute for each other in tissue with overlapping expression of both genes. [The] DNA-binding specificities of Pax-4 and Pax-6 are similar. The Pax-4 homeodomain [HD] was shown to preferentially dimerize on DNA sequences consisting of an inverted TAAT motif, separated by 4-nucleotide spacing."[70]

The "crucial difference between the binding sites of Antennapedia class and TTF-1 HDs is in the motifs 5'-TAAT-3', recognized by Antennapedia [a Hox gene, a subset of homeobox genes, first discovered in Drosophila which controls the formation of legs during development], and 5'-CAAG-3', preferentially bound by TTF-1. [The] binding of wild type and mutants TTF-1 HD to oligonucleotides containing either 5'-TAAT-3' or 5'-CAAG-3' indicate that only in the presence of the latter motif the Gln50 in TTF-1 HD is utilized for DNA recognition."[71]
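
The Pax-4 site described in the first quotation, an inverted TAAT motif separated by a 4-nucleotide spacing, can be written as a search pattern. The sketch below assumes one reading of that description, namely TAAT, then any four bases, then ATTA on the same strand. That reading, the function name, and the demonstration sequence are assumptions for illustration and are not taken from the cited work.

```python
import re

# Assumed reading: TAAT, a 4-nucleotide spacer, then the inverted motif ATTA.
PAX4_HD_SITE = re.compile(r"(?=(TAAT[ACGT]{4}ATTA))")

def find_inverted_taat(sequence: str):
    """Return (1-based start, matched text) for each candidate site."""
    return [(m.start() + 1, m.group(1)) for m in PAX4_HD_SITE.finditer(sequence.upper())]

# Hypothetical sequence, not taken from the A1BG promoters.
demo = "GGTAATCCGGATTAGG"
print(find_inverted_taat(demo))
# [(3, 'TAATCCGGATTA')]
```

If "inverted" is instead meant as the reverse complement, the second half-site would also be ATTA, since the reverse complement of TAAT is ATTA, so the pattern is unchanged under that reading.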

HY boxes

Core promoters

Positive strand in the negative direction there is 1: 3'-TGAGGG-5' at 4558.

Inverse complement, negative strand, negative direction there is 1: 3'-CCCTCA-5', 4498.

Negative strand in the positive direction there is 1: 3'-TGTGGG-5', 4395.

Distal promoters

Negative strand in the negative direction there is 1: 3'-TGTGGG-5' at 749.

Positive strand in the negative direction there are 4: 3'-TGAGGG-5' at 88, 3'-TGAGGG-5' at 2699, 3'-TGAGGG-5' at 3652, 3'-TGTGGG-5' at 3712.

Inverse complement, negative strand, negative direction there are 3: 3'-CCCTCA-5', 2702, 3'-CCCACA-5', 3184, 3'-CCCTCA-5', 3889.

Positive strand in the positive direction there are 2: 3'-TGTGGG-5', 2965, 3'-TGTGGG-5', 3533.

Negative strand in the positive direction there are 3: 3'-TGAGGG-5', 258, 3'-TGAGGG-5', 3479, 3'-TGAGGG-5', 3879.

Inverse complement, negative strand, positive direction there are 3: 3'-CCCTCA-5', 88, 3'-CCCTCA-5', 3207, 3'-CCCTCA-5', 3503.

Inverse complement, positive strand, positive direction there are 5: 3'-CCCTCA-5', 494, 3'-CCCTCA-5', 662, 3'-CCCTCA-5', 1783, 3'-CCCACA-5', 1803, 3'-CCCTCA-5', 3185.

I boxes

Initiator elements (YYANWYY)

Core promoters

There is the following Inr in the core promoter, negative strand, negative direction: 3'-TTACTCC-5' at 4557.

There are four Inrs in the core promoter, positive strand, negative direction: 3'-CCACTCC-5' at 4425, 3'-CCACTTT-5' at 4461, 3'-TCACATT-5' at 4533, and 3'-TTAATTC-5' at 4542.

There is the following Inr in the core promoter, negative strand, positive direction: 3'-CTGCACC-5' at 4343.

There are two Inrs in the core promoter, positive strand, positive direction: 3'-CCACTCC-5' at 4401 and 3'-CCAGACC-5' at 4416.

Proximal promoters

There are eight Inrs on the negative strand in the negative direction: 3'-TCACTCT-5' at 4202, 3'-TCGGTCT-5' at 4233, 3'-CTGCACC-5' at 4238, 3'-TCGGACC-5' at 4300, 3'-CCAGTTT-5' at 4309, 3'-TCGGACC-5' at 4349, 3'-TCACACT-5' at 4361, and 3'-TTACTCC-5' at 4557.

There are seven Inrs on the positive strand in the negative direction: 3'-CCGGACT-5' at 4327, 3'-CTGCACT-5' at 4340, 3'-CCAGTTC-5' at 4417, 3'-CCACTCC-5' at 4425, 3'-CCACTTT-5' at 4461, 3'-TCACATT-5' at 4533, and 3'-TTAATTC-5' at 4542.

There is one Inr on the negative strand in the positive direction: 3'-CTGCACC-5' at 4343.

There are two Inrs on the positive strand in the positive direction: 3'-CCACTCC-5' at 4401 and 3'-CCAGACC-5' at 4416.

Distal promoters

Negative strand in the negative direction there are 87: 3'-TTGTTCC-5', 71, 3'-CTATACC-5', 77, 3'-CCGTTTC-5', 93, 3'-CCGTACT-5', 124, 3'-CCATATT-5', 181, 3'-CTACATT-5', 247, 3'-TTGGTCC-5', 262, 3'-TTATACT-5', 274, 3'-TCACTCT-5', 301, 3'-CTGCTTT-5', 312, 3'-CCGGTTC-5', 419, 3'-CCAGTCC-5', 441, 3'-TCGGACC-5', 459, 3'-TTGTATC-5', 468, 3'-TCACTTT-5', 473, 3'-TCGGACC-5', 508, 3'-CCGGTTC-5', 556, 3'-CCAGTCC-5', 578, 3'-TTATACC-5', 605, 3'-CCGGTCC-5', 648, 3'-CCGGTTC-5', 692, 3'-CCAGTCC-5', 714, 3'-TCGGACT-5', 732, 3'-TCGCACC-5', 741, 3'-CTACACC-5', 787, 3'-TCGGTTC-5', 874, 3'-TCGGACC-5', 899, 3'-TCGCTCT-5', 913, 3'-TCGGTCC-5', 948, 3'-CCGTACC-5', 953, 3'-TTAGTCC-5', 984, 3'-TTGGACC-5', 1015, 3'-TCACTCT-5', 1079, 3'-TCGGACC-5', 1198, 3'-TTGTACC-5', 1207, 3'-CCACTTT-5', 1212, 3'-CCGCACC-5', 1244, 3'-TTGGATC-5', 1306, 3'-TCAGACC-5', 1356, 3'-TTATTCT-5', 1365, 3'-TCGTTTT-5', 1371, 3'-TTGTTTT-5', 1394, 3'-CCACACT-5', 1479, 3'-TTGCTTC-5', 1555, 3'-CCGTTTT-5', 1561, 3'-TTACTTT-5', 1582, 3'-TTGGATT-5', 1591, 3'-TTAATTT-5', 1697, 3'-TTATACC-5', 1742, 3'-CCGCACC-5', 1897, 3'-CCGTACT-5', 1953, 3'-TTGGACC-5', 1959, 3'-TCGGACC-5', 2009, 3'-TCGTTCT-5', 2023, 3'-TTACACC-5', 2065, 3'-CCGGTCC-5', 2077, 3'-TCACATT-5', 2087, 3'-TCAAACT-5', 2141, 3'-TTGTACC-5', 2152, 3'-CCGCTTT-5', 2157, 3'-CCAGTCC-5', 2250, 3'-TCAAACT-5', 2257, 3'-TCGGACC-5', 2268, 3'-TCGTACC-5', 2277, 3'-CCACTTT-5', 2282, 3'-TTGGACC-5', 2385, 3'-TCGGACC-5', 2435, 3'-TCACTCT-5', 2449, 3'-TCGTTTT-5', 2476, 3'-TTGTTTT-5', 2490, 3'-TCATTCT-5', 2503, 3'-CCGGTCC-5', 2519, 3'-CCAGTCC-5', 2587, 3'-TCACACC-5', 2605, 3'-TTGTACC-5', 2614, 3'-CCACTTT-5', 2619, 3'-TCACACC-5', 2658, 3'-TTGGACC-5', 2720, 3'-TCGGACC-5', 2770, 3'-TCGTACT-5', 2784, 3'-TTGATTC-5', 2914, 3'-CCGATTT-5', 3009, 3'-TTGATTC-5', 3031, 3'-CCGCACC-5', 3047, 3'-TCGGACC-5', 3128, 3'-TTGTTCC-5', 3141, 3'-CCACTTT-5', 3146, 3'-TTGTATT-5', 3169, 3'-CCACACC-5', 3186, 3'-TCGGTTC-5', 3273, 3'-TCGGACC-5', 3298, 3'-TTGTTCT-5', 3307, 3'-TCGTTTT-5', 3313, 3'-TTGTTCT-5', 3340, 
3'-TCGTTCT-5', 3374, 3'-CCGAACT-5', 3401, 3'-CCGTATC-5', 3446, 3'-TTGATCT-5', 3463, 3'-TTGGTCT-5', 3486, 3'-CTGTTCT-5', 3759, 3'-CTACACC-5', 3810, 3'-CTGGTCC-5', 3871, 3'-TCATTCT-5', 3893, 3'-CTACTTT-5', 3922, 3'-CCGGTCC-5', 3951, 3'-TCGGACC-5', 4037, 3'-TTGTATC-5', 4046, 3'-TCACTCT-5', 4051, 3'-TTACACT-5', 4092, 3'-CCGGTCC-5', 4102, 3'-CCGTACC-5', 4107, 3'-CCGGTCC-5', 4170, 3'-TCGAACC-5', 4188.

Positive strand in the negative direction there are 40: 3'-CTGAATT-5', 20, 3'-TTGGACC-5', 32, 3'-CTGCATT-5', 152, 3'-TTGAACC-5', 846, 3'-TCACACC-5', 882, 3'-TTGAACC-5', 1012, 3'-TCACTCC-5', 1058, 3'-TCACACC-5', 1128, 3'-TTGAACC-5', 1303, 3'-TTGCACC-5', 1339, 3'-TTGCACT-5', 1347, 3'-CCAGTCT-5', 1354, 3'-CCATTTC-5', 1380, 3'-TCGCTCT-5', 1450, 3'-CTATATC-5', 1528, 3'-TTATTTT-5', 1727, 3'-CTGCACT-5', 2000, 3'-CTACTCC-5', 2352, 3'-TTGAACC-5', 2382, 3'-TCACACC-5', 2418, 3'-CTGCACT-5', 2426, 3'-TTGAATC-5', 2708, 3'-TTGAACC-5', 2717, 3'-CTGCACC-5', 2761, 3'-TTGAACC-5', 3245, 3'-TTGCACT-5', 3289, 3'-CCAGATC-5', 3488, 3'-CTGCTCC-5', 3582, 3'-CCATTTC-5', 3688, 3'-CTGGACT-5', 3747, 3'-CTGAACC-5', 3784, 3'-CCATACC-5', 3858, 3'-TCACACC-5', 3967.

Inverse complement, negative strand, negative direction there are 32: 3'-GATACAA-5', 213, 3'-GGACCGA-5', 598, 3'-AGTGCGG-5', 664, 3'-GGACTGG-5', 734, 3'-AGTGTGG-5', 882, 3'-GAAGTGA-5', 1056, 3'-AGTGTGG-5', 1128, 3'-GGACCGG-5', 1200, 3'-AGAGCGA-5', 1448, 3'-GGTCCGA-5', 1462, 3'-GATATAG-5', 1528, 3'-AGAACGG-5', 1608, 3'-AAAATAG-5', 1730, 3'-AGTGCAG-5', 1773, 3'-GGACCGA-5', 1843, 3'-AGTGCGG-5', 1992, 3'-AGTGCGG-5', 2208, 3'-AGTGTGG-5', 2418, 3'-AGTACGG-5', 2535, 3'-AGTACGG-5', 2753, 3'-AAAGTAG-5', 2887, 3'-GATTCGA-5', 3033, 3'-GGACCGG-5', 3130, 3'-AGTGCGG-5', 3281, 3'-AGTCCGA-5', 3398, 3'-GGTCTAG-5', 3488, 3'-GGTATGG-5', 3858, 3'-GGTCCGG-5', 3873, 3'-AGTGTGG-5', 3967.

Negative strand in the positive direction there are 45: 3'-TTGTATT-5', 115, 3'-CTGTTTT-5', 147, 3'-CCACACT-5', 345, 3'-CCGGACT-5', 746, 3'-CTGCACT-5', 1372, 3'-CTGCACT-5', 1472, 3'-CCAGACT-5', 1744, 3'-CCACTTC-5', 1914, 3'-CTATTTC-5', 1978, 3'-CCAGTCC-5', 2026, 3'-TCGCTTC-5', 2095, 3'-TCATATT-5', 2178, 3'-CTGCATT-5', 2206, 3'-CCAGATC-5', 2230, 3'-TCAATCT-5', 2235, 3'-CTGTTTC-5', 2263, 3'-TCACTCT-5', 2306, 3'-CTACACC-5', 2430, 3'-CTAATTT-5', 2440, 3'-CCGCACC-5', 2566, 3'-TTATACC-5', 2590, 3'-CCACACC-5', 2602, 3'-CCACACT-5', 2636, 3'-TCAGATT-5', 2868, 3'-CTGCTCC-5', 2978, 3'-CCAGTCC-5', 2998, 3'-CCAGTCC-5', 3084, 3'-CTGGTCT-5', 3245, 3'-TCGCTCT-5', 3276, 3'-CTGGTCT-5', 3299, 3'-CTGCTCC-5', 3309, 3'-CTGCACC-5', 3322, 3'-CCGCATC-5', 3328, 3'-TTGCACT-5', 3343, 3'-CTGTTCC-5', 3352, 3'-TTGCATC-5', 3402, 3'-TCACACT-5', 3507, 3'-CCAGACC-5', 3550, 3'-CTGTTCC-5', 3625, 3'-TCACACC-5', 3824, 3'-TCATTTT-5', 4120, 3'-TCACTCT-5', 4128, 3'-TTGATTT-5', 4134, 3'-TTAGTTT-5', 4139.

Positive strand in the positive direction there are 75: 3'-CTGGACC-5', 40, 3'-CCGGTCC-5', 215, 3'-TTACACT-5', 230, 3'-CCGGACC-5', 286, 3'-CCGTTCC-5', 503, 3'-TCGGTCC-5', 515, 3'-CCGCTCT-5', 557, 3'-CCGTTCC-5', 587, 3'-CCGCTCT-5', 641, 3'-CCGTTCC-5', 671, 3'-CCGGACT-5', 725, 3'-CCGTTCC-5', 823, 3'-TCGGTCT-5', 835, 3'-TTGGACC-5', 847, 3'-CCGTTCC-5', 923, 3'-TCGGTCT-5', 935, 3'-TTGGACC-5', 947, 3'-CCGTTCC-5', 1007, 3'-TCGCTCT-5', 1061, 3'-CCGGTCC-5', 1175, 3'-CCGCTCT-5', 1229, 3'-CCGTTCC-5', 1259, 3'-CCGTTCC-5', 1327, 3'-CCGCTCT-5', 1381, 3'-CCGTTCC-5', 1427, 3'-CCGCTCT-5', 1481, 3'-TCGTTCC-5', 1511, 3'-CCGCTCT-5', 1565, 3'-CCGCACT-5', 1720, 3'-CCACACC-5', 1805, 3'-CCGCTCT-5', 1921, 3'-CCGTTCT-5', 1948, 3'-CCACACC-5', 1971, 3'-TCAATTT-5', 2136, 3'-TTGTACT-5', 2141, 3'-CTACTTT-5', 2146, 3'-CCGTTCT-5', 2190, 3'-CCAGTCT-5', 2222, 3'-TTGGTCT-5', 2228, 3'-CCGCACT-5', 2555, 3'-CCGGTCC-5', 2574, 3'-TCAGTCT-5', 2609, 3'-TCAGTTC-5', 2615, 3'-TCAGTCC-5', 2620, 3'-CTATATT-5', 2662, 3'-TCAATCC-5', 2668, 3'-TCGTTTT-5', 2707, 3'-TCGATTC-5', 2789, 3'-TTGCTCC-5', 2806, 3'-CTAAACT-5', 2871, 3'-CTGGTCC-5', 2876, 3'-CCAGACT-5', 2943, 3'-CCGGACC-5', 2988, 3'-CCAGACC-5', 3021, 3'-TTATACC-5', 3162, 3'-CTGGTTT-5', 3175, 3'-TCGGTCT-5', 3221, 3'-CTACTCC-5', 3478, 3'-CCGATCC-5', 3484, 3'-TCGATCC-5', 3522, 3'-CTGGTCT-5', 3548, 3'-TCACACT-5', 3594, 3'-CCACTCC-5', 3647, 3'-CCGGACC-5', 3679, 3'-CCGGACC-5', 3758, 3'-CTGGACC-5', 3787, 3'-TCACTCC-5', 3878, 3'-TCAGACT-5', 3924, 3'-TCACACC-5', 3966, 3'-CCACACT-5', 3971, 3'-TTACTCC-5', 4096, 3'-CTACTCC-5', 4102, 3'-CTAAATC-5', 4136, 3'-CCACTCC-5'.

Inverse complement, negative strand, positive direction there are 61: 3'-AGAGTGG-5', 53, 3'-AATGTGA-5', 230, 3'-GGAGCGA-5', 429, 3'-AGACCGG-5', 442, 3'-GGTGCGG-5', 489, 3'-AGTGCGG-5', 498, 3'-AGTGCGG-5', 582, 3'-AGTGCGG-5', 666, 3'-GGTGCAG-5', 784, 3'-AGTGCGG-5', 1086, 3'-AGTGCGG-5', 1170, 3'-AGTGCGG-5', 1254, 3'-AATGCGG-5', 1322, 3'-AATGCGG-5', 1422, 3'-AGTGCGG-5', 1590, 3'-GAAGCGG-5', 1636, 3'-GGTGCGG-5', 1764, 3'-AGTGCAG-5', 1787, 3'-GGTGTGG-5', 1805, 3'-GAACTGG-5', 1953, 3'-GGTGTGG-5', 1971, 3'-AAAGCAG-5', 2007, 3'-AGTGCAG-5', 2064, 3'-GAACCAG-5', 2227, 3'-AGATCAA-5', 2232, 3'-AGTGCAG-5', 2327, 3'-GGTGCAA-5', 2335, 3'-GAAATAG-5', 2626, 3'-GATATAA-5', 2662, 3'-GGACTGA-5', 2674, 3'-AGAGCAA-5', 2705, 3'-AAAGTGG-5', 2711, 3'-GGTGCAA-5', 2801, 3'-AGAATGA-5', 2841, 3'-GATTTGA-5', 2871, 3'-GGTCTGA-5', 2943, 3'-GGTCTGG-5', 3021, 3'-AATATGG-5', 3162, 3'-GAAATGG-5', 3168, 3'-GGACCAA-5', 3174, 3'-GGAATGA-5', 3441, 3'-GATGCAG-5', 3460, 3'-AGTGCAG-5', 3465, 3'-GGACCAG-5', 3547, 3'-GGAATGA-5', 3567, 3'-AGTGTGA-5', 3594, 3'-GAAGCGG-5', 3670, 3'-AATCCGA-5', 3799, 3'-AGAATGA-5', 3835, 3'-GAACCAG-5', 3840, 3'-AGAGTGA-5', 3876, 3'-AGTCTGA-5', 3924, 3'-AGTGTGG-5', 3966, 3'-GGTGTGA-5', 3971, 3'-AGAGTGG-5', 4040, 3'-AGAACAG-5', 4069, 3'-GAAATGA-5', 4094, 3'-GATTTAG-5', 4136.

Inverse complement, positive strand, negative direction there are 100: 3'-AGACTGA-5', 17, 3'-GGACCAG-5', 34, 3'-AAAACAA-5', 69, 3'-GATATGG-5', 77, 3'-AAACTGA-5', 130, 3'-AAAACAG-5', 167, 3'-GGTATAA-5', 181, 3'-GAAACAA-5', 229, 3'-GATGTAA-5', 247, 3'-AGTTCAA-5', 255, 3'-AAACCAG-5', 261, 3'-AATATGA-5', 274, 3'-AGAACAG-5', 288, 3'-AAACTGA-5', 307, 3'-GGTGCGG-5', 380, 3'-AGTGCGA-5', 448, 3'-AATACGA-5', 492, 3'-AAATTAG-5', 499, 3'-AGATTGA-5', 585, 3'-AATATGG-5', 605, 3'-AATACAA-5', 635, 3'-AAATTGG-5', 643, 3'-AGTTCGA-5', 721, 3'-AGACCAG-5', 727, 3'-AATACAA-5', 769, 3'-AAATTAG-5', 777, 3'-GATGTGG-5', 787, 3'-AGAGCGA-5', 911, 3'-GATCCAG-5', 975, 3'-AGATTGG-5', 1045, 3'-AGAGTGA-5', 1077, 3'-AAATTAG-5', 1234, 3'-AGTCTGG-5', 1356, 3'-AGAGCAA-5', 1369, 3'-AAAACAA-5', 1388, 3'-AGTGCAG-5', 1471, 3'-GGTGTGA-5', 1479, 3'-AGTGCAA-5', 1536, 3'-AGAACGA-5', 1553, 3'-AATACAG-5', 1566, 3'-GAAACAA-5', 1585, 3'-GAAATGA-5', 1663, 3'-AAAGCGG-5', 1680, 3'-GAATTAA-5', 1696, 3'-AATATGG-5', 1742, 3'-AATACAA-5', 1878, 3'-AAATTAG-5', 1887, 3'-AGACTGA-5', 1935, 3'-AGAATGG-5', 1948, 3'-AGAGCAA-5', 2021, 3'-AATGTGG-5', 2065, 3'-GGTGCAG-5', 2082, 3'-AGTGTAA-5', 2087, 3'-AGTTTGA-5', 2141, 3'-AGACCAA-5', 2147, 3'-GATACAA-5', 2180, 3'-AAAATGA-5', 2187, 3'-GGTGCGG-5', 2197, 3'-AGTTTGA-5', 2257, 3'-AGACCAG-5', 2263, 3'-AATACAA-5', 2305, 3'-AAACTAG-5', 2313, 3'-AGAGTGA-5', 2447, 3'-GATTCGG-5', 2454, 3'-AAAGCAA-5', 2474, 3'-AAAGCAA-5', 2480, 3'-AAAACAA-5', 2509, 3'-AGACCAG-5', 2600, 3'-AGTGTGG-5', 2605, 3'-AAATCAG-5', 2649, 3'-AGTGTGG-5', 2658, 3'-AAAACAA-5', 2842, 3'-AGAATGG-5', 3004, 3'-AAAATAA-5', 3013, 3'-AAACTAA-5', 3030, 3'-AGACCAG-5', 3123, 3'-AAATTAG-5', 3176, 3'-GGTGTGG-5', 3186, 3'-AGAGCAA-5', 3311, 3'-AAAACAA-5', 3330, 3'-AAATTGA-5', 3358, 3'-GAAGTGA-5', 3410, 3'-GAACTAG-5', 3462, 3'-AAACCAG-5', 3485, 3'-AATCCAG-5', 3681, 3'-GGAACAG-5', 3725, 3'-GGACTGG-5', 3749, 3'-AATGCAG-5', 3772, 3'-GATGTGG-5', 3810, 3'-GGACCAG-5', 3870, 3'-GGAGTAA-5', 3891, 3'-AGTTCAA-5', 4026, 3'-AGACCAG-5', 4032, 
3'-AAAATAA-5', 4071, 3'-AATGTGA-5', 4092, 3'-AGTTCAA-5', 4177.

Inverse complement, positive strand, positive direction there are 75: 3'-GGTCCGA-5', 10, 3'-AGTCCGG-5', 92, 3'-AATCCAG-5', 152, 3'-GGTCCAG-5', 217, 3'-GGTGTGA-5', 345, 3'-GAAGCGG-5', 459, 3'-AGAATGA-5', 524, 3'-GAAGCGG-5', 595, 3'-GATGCGA-5', 652, 3'-GGTGCGA-5', 777, 3'-GGACCGG-5', 849, 3'-GGACCGG-5', 949, 3'-GGTCCGA-5', 1177, 3'-AAAGCAG-5', 1183, 3'-GAAGCGG-5', 1308, 3'-GAAGCGG-5', 1408, 3'-AATTCGG-5', 1541, 3'-GATGCGA-5', 1576, 3'-GGACTGG-5', 1662, 3'-GGTCTGA-5', 1744, 3'-GGACCGA-5', 1817, 3'-GGTCCGG-5', 1857, 3'-AGAATGG-5', 1888, 3'-GAAGTAG-5', 2110, 3'-AGTATAA-5', 2178, 3'-GGACTGG-5', 2213, 3'-GGTCTAG-5', 2230, 3'-AGAGTGG-5', 2247, 3'-AAAGTGA-5', 2304, 3'-GGTCCGA-5', 2318, 3'-AATCCGA-5', 2368, 3'-GATGTGG-5', 2430, 3'-GGACCGA-5', 2435, 3'-AGAGTGG-5', 2470, 3'-GGTACAA-5', 2475, 3'-GGACCGG-5', 2571, 3'-AATATGG-5', 2590, 3'-GGTGTGG-5', 2602, 3'-AGTTCAG-5', 2617, 3'-GGTGTGA-5', 2636, 3'-AGTCTAA-5', 2868, 3'-AAACTGG-5', 2873, 3'-GGTCCGG-5', 2878, 3'-AGACCGA-5', 2885, 3'-GGAGTAA-5', 2902, 3'-AGACTGA-5', 2945, 3'-AGACCGG-5', 2985, 3'-GGACCGG-5', 2990, 3'-GGAACAG-5', 3003, 3'-GGTCCAG-5', 3018, 3'-AGACCAA-5', 3023, 3'-AGTCCGG-5', 3036, 3'-GGACCAA-5', 3049, 3'-GAAGTAG-5', 3250, 3'-AGTGCAG-5', 3255, 3'-GGACCAG-5', 3298, 3'-AGAGTGA-5', 3317, 3'-GGTACAA-5', 3337, 3'-GGAACGG-5', 3375, 3'-AGTGTGA-5', 3507, 3'-GATCCGA-5', 3524, 3'-GGTCTGG-5', 3550, 3'-AGAGTGG-5', 3612, 3'-GGACCGG-5', 3681, 3'-AGTGTGG-5', 3824, 3'-GAACTGG-5', 4018, 3'-AAAATAG-5', 4123, 3'-GAACTAA-5', 4133, 3'-AAATCAA-5', 4138.

Initiator elements (BBCABW)

Core promoters

There are five Inrs, positive strand, negative direction: 3'-TCCACT-5', 4423, 3'-CCCAGA-5', 4448, 3'-TCCACT-5', 4459, 3'-CCCACT-5', 4485, 3'-TTCACA-5', 4531.

There are five Inrs, negative strand, positive direction: 3'-GTCAGT-5', 4271, 3'-CTCATT-5', 4309, 3'-TGCAGA-5', 4317, 3'-CCCAGA-5', 4330, 3'-CTCACT-5', 4338.

There are four Inrs, positive strand, positive direction: 3'-TCCAGT-5', 4269, 3'-CTCACT-5', 4350, 3'-CCCACT-5', 4399, 3'-CCCAGA-5', 4414.

Proximal promoters

There are five Inrs on the negative strand in the negative direction: 3'-GTCACT-5', 4200, 3'-TCCAGT-5', 4307, 3'-GTCACT-5', 4319, 3'-CCCACT-5', 4353, 3'-GTCACA-5', 4359.

There are nine Inrs on the positive strand in the negative direction: 3'-GCCAGA-5', 4233, 3'-TGCAGT-5', 4317, 3'-TGCACT-5', 4340, 3'-GCCAGT-5', 4415, 3'-TCCACT-5', 4423, 3'-CCCAGA-5', 4448, 3'-TCCACT-5', 4459, 3'-CCCACT-5', 4485, 3'-TTCACA-5', 4531.

There are six Inrs on the negative strand in the positive direction: 3'-CTCAGA-5', 4195, 3'-GTCAGT-5', 4271, 3'-CTCATT-5', 4309, 3'-TGCAGA-5', 4317, 3'-CCCAGA-5', 4330, 3'-CTCACT-5', 4338.

There are four Inrs on the positive strand in the positive direction: 3'-TCCAGT-5', 4269, 3'-CTCACT-5', 4350, 3'-CCCACT-5', 4399, 3'-CCCAGA-5', 4414.

Distal promoters

Negative strand in the negative direction there are 44: 3'-TCCATA-5', 179, 3'-CCCAGT-5', 206, 3'-CTCAGA-5', 278, 3'-GTCACT-5', 299, 3'-TTCACA-5', 322, 3'-TCCAGT-5', 439, 3'-TGCATT-5', 533, 3'-TCCAGT-5', 568, 3'-TCCAGT-5', 576, 3'-TCCAGT-5', 712, 3'-GGCAGA-5', 754, 3'-GCCACT-5', 868, 3'-GTCACT-5', 1034, 3'-CCCACT-5', 1049, 3'-CTCACT-5', 1077, 3'-GGCACA-5', 1220, 3'-GTCACT-5', 1325, 3'-GTCAGA-5', 1354, 3'-CTCAGA-5', 1444, 3'-GGCAGT-5', 1511, 3'-TGCAGA-5', 1774, 3'-GTCACT-5', 1978, 3'-GTCACA-5', 2085, 3'-TCCAGT-5', 2248, 3'-GTCACT-5', 2404, 3'-CTCACT-5', 2447, 3'-TCCAGT-5', 2585, 3'-GTCACA-5', 2603, 3'-GTCACA-5', 2656, 3'-GTCACT-5', 2739, 3'-TTCACA-5', 2860, 3'-TCCACT-5', 3144, 3'-CCCACA-5', 3184, 3'-TTCACT-5', 3410, 3'-GTCATT-5', 3480, 3'-TCCACT-5', 3825, 3'-CTCATA-5', 3829, 3'-CTCATT-5', 3891, 3'-TTCACA-5', 3939.

Positive strand in the negative direction there are 59: 3'-GCCATA-5', 39, 3'-TGCATT-5', 152, 3'-GTCACT-5', 208, 3'-GGCACA-5', 266, 3'-GGCACA-5', 518, 3'-GGCACA-5', 960, 3'-GGCAGA-5', 1023, 3'-TGCAGT-5', 1032, 3'-TTCACT-5', 1056, 3'-GGCACA-5', 1116, 3'-CTCACA-5', 1126, 3'-GGCAGA-5', 1314, 3'-TGCAGT-5', 1323, 3'-TGCACT-5', 1347, 3'-TCCAGT-5', 1352, 3'-TCCATT-5', 1378, 3'-CCCAGA-5', 1411, 3'-TGCAGT-5', 1472, 3'-CTCACT-5', 1491, 3'-CCCAGA-5', 1518, 3'-TCCAGT-5', 1532, 3'-TGCACA-5', 1719, 3'-GGCAGA-5', 1967, 3'-TGCAGT-5', 1976, 3'-GCCACT-5', 1995, 3'-TGCACT-5', 2000, 3'-TGCAGT-5', 2083, 3'-GCCAGT-5', 2211, 3'-TGCAGT-5', 2402, 3'-TGCACT-5', 2426, 3'-TCCACT-5', 2632, 3'-GCCAGT-5', 2654, 3'-GGCACA-5', 2665, 3'-TGCAGT-5', 2737, 3'-GCCACT-5', 2756, 3'-GCCATT-5', 3284, 3'-TGCACT-5', 3289, 3'-TGCAGA-5', 3431, 3'-GGCATA-5', 3445, 3'-GGCATA-5', 3451, 3'-GGCAGT-5', 3478, 3'-GGCAGA-5', 3589, 3'-GGCAGT-5', 3600, 3'-GTCAGA-5', 3625, 3'-GGCACA-5', 3632, 3'-CTCAGA-5', 3644, 3'-GCCATT-5', 3686, 3'-TCCACA-5', 3692, 3'-CCCATA-5', 3856, 3'-CTCACA-5', 3965.

Inverse complement, negative strand, negative direction there are 46: 3'-TCTGAC-5', 16, 3'-TGTGGA-5', 62, 3'-TGTGCA-5', 342, 3'-TGTGCA-5', 531, 3'-AGTGCG-5', 663, 3'-TGTGGG-5', 749, 3'-TCTGAG-5', 916, 3'-TGTGCG-5', 963, 3'-ACTGAA-5', 1052, 3'-AGTGAG-5', 1057, 3'-TCTGAG-5', 1082, 3'-TGTGGA-5', 1129, 3'-AGTGGA-5', 1171, 3'-AATGAA-5', 1298, 3'-TCTGAG-5', 1403, 3'-AGTGAC-5', 1492, 3'-TGTGAA-5', 1544, 3'-TCTGAA-5', 1617, 3'-AGTGCA-5', 1772, 3'-TCTGAC-5', 1934, 3'-AGTGCG-5', 1991, 3'-TCTGAG-5', 2026, 3'-TATGAC-5', 2162, 3'-ACTGGC-5', 2190, 3'-AGTGCG-5', 2207, 3'-TGTGAA-5', 2551, 3'-AGTGAA-5', 2578, 3'-ACTGAG-5', 2787, 3'-TATGGA-5', 2994, 3'-AGTGGG-5', 3057, 3'-AGTGAA-5', 3101, 3'-AGTGAA-5', 3240, 3'-AGTGCG-5', 3280, 3'-TCTGAC-5', 3425, 3'-TATGAC-5', 3541, 3'-TATGCG-5', 3547, 3'-TATGGA-5', 3859, 3'-TGTGGA-5', 3968, 3'-TGTGAA-5', 3983.

Inverse complement, positive strand, negative direction there are 54: 3'-ACTGAA-5', 18, 3'-TATGGG-5', 78, 3'-ACTGAA-5', 131, 3'-TATGAG-5', 275, 3'-AGTGAG-5', 300, 3'-ACTGAC-5', 308, 3'-AGTGCG-5', 447, 3'-AGTGAA-5', 472, 3'-AGTGGA-5', 523, 3'-AGTGAG-5', 1035, 3'-AGTGAG-5', 1078, 3'-AGTGGC-5', 1121, 3'-AGTGAG-5', 1326, 3'-TCTGGG-5', 1357, 3'-AGTGCA-5', 1470, 3'-ACTGCA-5', 1494, 3'-AGTGCA-5', 1535, 3'-AATGAA-5', 1581, 3'-AATGCC-5', 1634, 3'-TATGGC-5', 1743, 3'-ACTGAG-5', 1936, 3'-AATGGC-5', 1949, 3'-AGTGAG-5', 1979, 3'-ACTGCA-5', 1998, 3'-TGTGGC-5', 2066, 3'-AATGAC-5', 2188, 3'-AGTGAG-5', 2405, 3'-ACTGCA-5', 2424, 3'-AGTGAG-5', 2448, 3'-TGTGGC-5', 2606, 3'-AGTGAG-5', 2740, 3'-ACTGCA-5', 2759, 3'-TGTGCA-5', 2863, 3'-AATGGC-5', 3005, 3'-TGTGAG-5', 3268, 3'-AGTGAC-5', 3411, 3'-TGTGCA-5', 3429, 3'-TGTGCC-5', 3561, 3'-AATGGG-5', 3660, 3'-TGTGGG-5', 3712, 3'-ACTGGG-5', 3750, 3'-AATGCA-5', 3771, 3'-TCTGGA-5', 3836, 3'-ACTGCC-5', 3852, 3'-TGTGGC-5', 3960, 3'-AGTGAG-5', 4050, 3'-TGTGAG-5', 4093.

Negative strand in the positive direction there are 87: 3'-TCCAGA-5', 15, 3'-GGCATT-5', 22, 3'-GTCACA-5', 155, 3'-CCCAGA-5', 204, 3'-GCCACA-5', 343, 3'-CGCAGA-5', 396, 3'-TGCAGA-5', 438, 3'-CCCAGA-5', 468, 3'-TGCACA-5', 548, 3'-TCCACA-5', 632, 3'-CGCACT-5', 686, 3'-CGCACA-5', 800, 3'-GCCAGA-5', 835, 3'-GCCACA-5', 884, 3'-GCCAGA-5', 935, 3'-GCCACA-5', 984, 3'-CGCACA-5', 1052, 3'-CGCACA-5', 1136, 3'-TGCACA-5', 1220, 3'-CCCAGT-5', 1250, 3'-CGCAGA-5', 1316, 3'-TGCACT-5', 1372, 3'-CGCAGA-5', 1416, 3'-TGCACT-5', 1472, 3'-CCCACT-5', 1502, 3'-CGCACA-5', 1556, 3'-GGCATT-5', 1702, 3'-CCCAGA-5', 1742, 3'-TGCACA-5', 1822, 3'-TCCACT-5', 1912, 3'-TGCAGA-5', 1937, 3'-GGCACT-5', 1996, 3'-CCCAGT-5', 2024, 3'-TCCACA-5', 2029, 3'-CTCAGT-5', 2060, 3'-TGCAGT-5', 2065, 3'-GCCACT-5', 2072, 3'-TTCAGT-5', 2098, 3'-CTCATA-5', 2176, 3'-TGCATT-5', 2206, 3'-GTCAGA-5', 2222, 3'-CTCAGA-5', 2239, 3'-TTCACT-5', 2304, 3'-TGCAGT-5', 2328, 3'-GTCACT-5', 2425, 3'-GTCAGA-5', 2609, 3'-CTCAGA-5', 2699, 3'-TGCAGA-5', 2721, 3'-CTCAGA-5', 2729, 3'-TGCAGA-5', 2859, 3'-CTCAGA-5', 2866, 3'-CTCATT-5', 2902, 3'-GTCACT-5', 2929, 3'-TTCAGT-5', 2936, 3'-TGCACA-5', 2962, 3'-TGCATT-5', 3072, 3'-CCCAGT-5', 3082, 3'-CCCAGA-5', 3091, 3'-TCCACA-5', 3192, 3'-CTCACA-5', 3209, 3'-GCCAGA-5', 3221, 3'-TGCAGT-5', 3232, 3'-TGCAGT-5', 3281, 3'-CTCACT-5', 3317, 3'-TGCACT-5', 3343, 3'-CCCAGT-5', 3379, 3'-CCCACT-5', 3388, 3'-GGCACA-5', 3409, 3'-TGCAGT-5', 3461, 3'-GGCAGA-5', 3473, 3'-CTCACA-5', 3505, 3'-GCCACA-5', 3705, 3'-TCCAGA-5', 3806, 3'-GTCACA-5', 3822, 3'-TGCAGA-5', 3831, 3'-TCCAGA-5', 3891, 3'-CGCAGA-5', 3916, 3'-GTCACA-5', 3954, 3'-TGCAGT-5', 3962, 3'-GGCACT-5', 4006, 3'-TCCACT-5', 4013.

Positive strand in the positive direction there are 40: 3'-TCCAGT-5', 153, 3'-CGCACA-5', 1020, 3'-CCCAGA-5', 1711, 3'-CGCACT-5', 1720, 3'-CCCACA-5', 1803, 3'-CCCAGA-5', 1958, 3'-TCCACA-5', 1969, 3'-GTCAGT-5', 2100, 3'-TCCACT-5', 2128, 3'-TCCAGT-5', 2220, 3'-TCCAGA-5', 2258, 3'-TCCACT-5', 2375, 3'-CGCAGT-5', 2423, 3'-GTCACA-5', 2464, 3'-CCCAGA-5', 2489, 3'-TTCACT-5', 2511, 3'-CGCACT-5', 2555, 3'-GTCAGT-5', 2607, 3'-CTCAGT-5', 2613, 3'-TTCAGT-5', 2618, 3'-TCCATA-5', 2642, 3'-TCCAGA-5', 3019, 3'-CTCAGA-5', 3187, 3'-TGCAGA-5', 3256, 3'-CTCACA-5', 3592, 3'-GCCAGA-5', 3608, 3'-CTCACT-5', 3712, 3'-TCCATT-5', 3731, 3'-TCCAGA-5', 3771, 3'-CCCAGT-5', 3820, 3'-GTCACT-5', 3843, 3'-CTCACT-5', 3876, 3'-TTCAGA-5', 3922, 3'-TCCACT-5', 3934, 3'-GTCACA-5', 3964, 3'-CGCAGA-5', 4056.

Inverse complement, negative strand, positive direction there are 94: 3'-AGTGGG-5', 54, 3'-TCTGCA-5', 224, 3'-TGTGAA-5', 231, 3'-ACTGCC-5', 238, 3'-TCTGAG-5', 256, 3'-TCTGGA-5', 271, 3'-ACTGGG-5', 348, 3'-AGTGCG-5', 497, 3'-AGTGCG-5', 581, 3'-AGTGCG-5', 665, 3'-ACTGCG-5', 749, 3'-TGTGGC-5', 819, 3'-ACTGCC-5', 901, 3'-TGTGGC-5', 919, 3'-ACTGCG-5', 1001, 3'-TGTGGC-5', 1023, 3'-AGTGCG-5', 1085, 3'-AGTGCG-5', 1160, 3'-AGTGCG-5', 1169, 3'-AGTGCG-5', 1253, 3'-ACTGAG-5', 1287, 3'-AATGCG-5', 1321, 3'-TCTGGC-5', 1377, 3'-TCTGCG-5', 1396, 3'-AATGCG-5', 1421, 3'-TCTGGC-5', 1477, 3'-TCTGCG-5', 1496, 3'-ACTGCA-5', 1505, 3'-AGTGCG-5', 1589, 3'-AGTGCG-5', 1725, 3'-AGTGCA-5', 1786, 3'-TGTGGA-5', 1806, 3'-TCTGGG-5', 1865, 3'-ACTGGG-5', 1954, 3'-TGTGGC-5', 1972, 3'-TCTGGC-5', 1993, 3'-AGTGCA-5', 2063, 3'-AGTGGC-5', 2068, 3'-TATGGC-5', 2160, 3'-ACTGCA-5', 2204, 3'-AGTGCA-5', 2326, 3'-TGTGCA-5', 2681, 3'-AGTGGA-5', 2712, 3'-ACTGCC-5', 2823, 3'-AATGAC-5', 2842, 3'-TCTGCA-5', 2857, 3'-TCTGGC-5', 2884, 3'-AATGGG-5', 2911, 3'-TCTGAC-5', 2944, 3'-TCTGAG-5', 2951, 3'-TGTGCA-5', 2960, 3'-TCTGGC-5', 2984, 3'-TCTGAG-5', 3007, 3'-AGTGCC-5', 3011, 3'-TATGAC-5', 3028, 3'-TCTGCA-5', 3061, 3'-AATGCA-5', 3070, 3'-ACTGGC-5', 3118, 3'-TCTGAG-5', 3124, 3'-TATGGA-5', 3163, 3'-AATGGG-5', 3169, 3'-AGTGCC-5', 3235, 3'-TATGAG-5', 3261, 3'-TCTGCA-5', 3268, 3'-TCTGCA-5', 3279, 3'-ACTGCA-5', 3320, 3'-ACTGGC-5', 3346, 3'-TCTGCC-5', 3359, 3'-TCTGGC-5', 3406, 3'-AATGCC-5', 3431, 3'-TGTGGA-5', 3437, 3'-AATGAA-5', 3442, 3'-AATGAG-5', 3446, 3'-AGTGGG-5', 3450, 3'-AGTGCA-5', 3464, 3'-AATGAC-5', 3568, 3'-TGTGAA-5', 3595, 3'-AGTGAC-5', 3713, 3'-ACTGAG-5', 3736, 3'-AATGAC-5', 3783, 3'-AATGAA-5', 3836, 3'-AGTGAG-5', 3877, 3'-TGTGAG-5', 3904, 3'-TCTGAA-5', 3925, 3'-TGTGCA-5', 3960, 3'-TGTGAC-5', 3972, 3'-AGTGGG-5', 4041, 3'-ACTGAA-5', 4090, 3'-AATGAG-5', 4095.

Inverse complement, positive strand, positive direction there are 47: 3'-TCTGAC-5', 236, 3'-TGTGAC-5', 346, 3'-TCTGCC-5', 399, 3'-TCTGGC-5', 441, 3'-AATGAA-5', 525, 3'-TGTGCA-5', 569, 3'-TGTGCG-5', 803, 3'-TGTGCG-5', 887, 3'-TGTGCG-5', 987, 3'-TGTGAC-5', 1139, 3'-TGTGCC-5', 1223, 3'-TGTGCC-5', 1559, 3'-ACTGGG-5', 1663, 3'-TGTGCC-5', 1698, 3'-TCTGAA-5', 1745, 3'-AATGGG-5', 1889, 3'-ACTGGC-5', 2214, 3'-AGTGGA-5', 2248, 3'-AGTGAG-5', 2305, 3'-AGTGGG-5', 2313, 3'-AGTGAC-5', 2341, 3'-TCTGAA-5', 2417, 3'-TGTGGA-5', 2431, 3'-TATGAA-5', 2740, 3'-TCTGGA-5', 2862, 3'-AGTGAC-5', 2930, 3'-ACTGAA-5', 2946, 3'-TGTGGG-5', 2965, 3'-ACTGAA-5', 3030, 3'-AGTGCA-5', 3254, 3'-AGTGAC-5', 3318, 3'-TGTGAG-5', 3508, 3'-TGTGGG-5', 3533, 3'-TCTGGA-5', 3551, 3'-AGTGGG-5', 3613, 3'-AGTGCC-5', 3748, 3'-ACTGGA-5', 3785, 3'-ACTGGA-5', 4019, 3'-AGTGAC-5', 4088, 3'-AGTGAG-5', 4127.
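
The Inr consensus BBCABW is written with IUPAC degenerate codes (B = C, G, or T; W = A or T). As a minimal sketch, assuming the hits can be checked by plain pattern matching (the original search programs are not reproduced here), the consensus can be expanded into a regular expression and tested against a few of the sequences listed above. The names `IUPAC` and `consensus_to_regex` are illustrative only.

```python
import re

# Subset of IUPAC nucleotide codes; B = C, G, or T and W = A or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "B": "[CGT]", "W": "[AT]", "Y": "[CT]", "R": "[AG]"}

def consensus_to_regex(consensus):
    """Expand an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus)

inr = re.compile(consensus_to_regex("BBCABW"))

# The first three candidates are copied from the lists above;
# AACAGT is a made-up sequence that should not match.
for candidate in ["TCCACT", "CCCAGA", "GTCAGT", "AACAGT"]:
    print(candidate, bool(inr.fullmatch(candidate)))
```

The same helper would accept any other consensus written with these codes, provided the code table is extended as needed.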

L boxes

The consensus sequence for the L1 box is TAAATGYA.[69] Y is a pyrimidine (C/T).

M35 boxes

Negative strand in the negative direction (from ZSCAN22 to A1BG), using SuccessablesM35--.bas and looking for 3'-TTGACA-5', there are 2: 3'-TTGACA-5', 477, 3'-TTGACA-5', 4399.

M boxes

Metal responsive elements

Proximal promoters

On the positive strand in the negative direction there is an MRE 3'-TGCACTC-5' at 4341.

Distal promoters

Positive strand in the negative direction there are 6: 3'-TGCGCTC-5', 891, 3'-TGCACTC-5', 1348, 3'-TGCACTC-5', 2001, 3'-TGCACTC-5', 2427, 3'-TGCACCC-5', 2762, 3'-TGCACTC-5', 3290.

Inverse complement, negative strand, negative direction there are 2: 3'-GTGTGCA-5', 531, 3'-GAGTGCA-5', 1772.

Inverse complement, positive strand, negative direction there are 2: 3'-GAGTGCA-5', 1470, 3'-GTGTGCA-5', 2863.

Negative strand in the positive direction there are 11: 3'-TGCGCCC-5', 453, 3'-TGCACAC-5', 549, 3'-TGCACAC-5', 1221, 3'-TGCGCCC-5', 1247, 3'-TGCACTC-5', 1373, 3'-TGCGCCC-5', 1399, 3'-TGCACTC-5', 1473, 3'-TGCGCCC-5', 1499, 3'-TGCGCCC-5', 1657, 3'-TGCACAC-5', 2963, 3'-TGCACCC-5', 3323.

Positive strand in the positive direction there are 2: 3'-TGCGCCC-5', 872, 3'-TGCGCCC-5', 972.

Inverse complement, negative strand, positive direction there are 10: 3'-GCGTGCA-5', 546, 3'-GCGCGCA-5', 684, 3'-GGGCGCA-5', 876, 3'-GGGCGCA-5', 976, 3'-GCGTGCA-5', 1218, 3'-GTGCGCA-5', 1523, 3'-GAGTGCA-5', 1786, 3'-GAGTGCA-5', 2326, 3'-GGGTGCA-5', 2800, 3'-GGGTGCA-5', 3883.
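
Each listing above pairs a motif with a position and states a total. As a hedged sketch (the function name `parse_hits` is hypothetical and not part of the original programs), a listing can be parsed into (motif, position) pairs and the stated total compared with the number of entries actually present. The example uses the distal-promoter listing for the positive strand in the negative direction.

```python
import re

# Listing copied from the distal-promoter MRE results above.
listing = ("3'-TGCGCTC-5', 891, 3'-TGCACTC-5', 1348, 3'-TGCACTC-5', 2001, "
           "3'-TGCACTC-5', 2427, 3'-TGCACCC-5', 2762, 3'-TGCACTC-5', 3290.")

def parse_hits(text):
    """Return (motif, position) pairs from "3'-MOTIF-5', position" entries."""
    pairs = re.findall(r"3'-([ACGT]+)-5',\s*(\d+)", text)
    return [(motif, int(pos)) for motif, pos in pairs]

hits = parse_hits(listing)
stated_count = 6  # the "there are 6" that introduces this listing
print(len(hits) == stated_count, hits[0], hits[-1])
```

Running the same check on the longer listings is a quick way to spot entries that were dropped or miscounted during transcription.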

Motif ten elements

There are no MTEs in either promoter.

MYB recognition elements

P boxes

Pollen1 elements

"Electrophoretic mobility shift assays identified a pollen-specific cis-acting element POLLEN1 (AGAAA) mapped at AtACBP4 (−157/−153) which interacted with nuclear proteins from flower and this was substantiated by DNase I footprinting."[69]

Pribnow boxes

  1. negative strand in the negative direction, looking for 3'-TATAAT-5', 2, 3'-TATAAT-5', 3454, 3'-TATAAT-5', 3468,
  2. negative strand in the positive direction, looking for 3'-TATAAT-5', 1, 3'-TATAAT-5', 729,
  3. positive strand in the negative direction, looking for 3'-TATAAT-5', 0,
  4. positive strand in the positive direction, looking for 3'-TATAAT-5', 0,
  5. complement, negative strand, negative direction, looking for 3'-ATATTA-5', 0,
  6. complement, negative strand, positive direction, looking for 3'-ATATTA-5', 0,
  7. complement, positive strand, negative direction, looking for 3'-ATATTA-5', 2, 3'-ATATTA-5', 3454, 3'-ATATTA-5', 3468,
  8. complement, positive strand, positive direction, looking for 3'-ATATTA-5', 1, 3'-ATATTA-5', 729,
  9. inverse complement, negative strand, negative direction, looking for 3'-ATTATA-5', 2, 3'-ATTATA-5', 272, 3'-ATTATA-5', 603,
  10. inverse complement, negative strand, positive direction, looking for 3'-ATTATA-5', 1, 3'-ATTATA-5', 727,
  11. inverse complement, positive strand, negative direction, looking for 3'-ATTATA-5', 0,
  12. inverse complement, positive strand, positive direction, looking for 3'-ATTATA-5', 0,
  13. inverse, negative strand, negative direction, looking for 3'-TAATAT-5', 0,
  14. inverse, negative strand, positive direction, looking for 3'-TAATAT-5', 0,
  15. inverse, positive strand, negative direction, looking for 3'-TAATAT-5', 2, 3'-TAATAT-5', 272, 3'-TAATAT-5', 603,
  16. inverse, positive strand, positive direction, looking for 3'-TAATAT-5', 1, 3'-TAATAT-5', 727.
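
The sixteen searches above combine four forms of the Pribnow box (direct, complement, inverse, inverse complement) with two strands and two directions. A minimal sketch of the three sequence transformations follows; the function names are illustrative, and "inverse" is taken to mean the sequence read in reverse order, which agrees with the search strings in the list.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq):
    """Replace every base by its complement, keeping the order."""
    return seq.translate(COMPLEMENT)

def inverse(seq):
    """Read the sequence in reverse order."""
    return seq[::-1]

def inverse_complement(seq):
    """Complement the sequence, then read it in reverse order."""
    return complement(seq)[::-1]

box = "TATAAT"
for name, transform in [("complement", complement),
                        ("inverse", inverse),
                        ("inverse complement", inverse_complement)]:
    print(name, transform(box))
```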

Prolamin boxes

  1. negative strand in the negative direction: 1, 3'-TGTAAAG-5', 2884,
  2. negative strand in the positive direction: 1, 3'-TGAAAAG-5', 489,
  3. positive strand in the negative direction: 1, 3'-TGAAAAG-5', 1627.

Pyrimidine boxes

Pyrimidine boxes and their complements occur in the negative direction: 3'-CCTTTT-5' at 2459, 3'-CCTTTT-5' at 2927, and 3'-CCTTTT-5' at 2968. Inverse pyrimidine boxes and their complements occur: 3'-AAAAGG-5' at 105, 3'-AAAAGG-5' at 1107, 3'-AAAAGG-5' at 3345, and 3'-AAAAGG-5' at 3441.

In the positive direction, pyrimidine boxes 3'-CCTTTT-5' at 135 and 3'-CCTTTT-5' at 291, and their complements, are close to ZNF497.

Q elements

"The basal regulatory elements identified include a putative TATA-box (−30/−24) for RNA polymerase binding and a CAAT box (−64/−61; [...]). Several putative floral expression-related cis-elements identified included a putative 6-nucleotide Q element (−770/−665), three GTGA boxes (−372/−369, −209/−206 and −164/−161) and four putative highly-conserved POLLEN1 boxes (−737/−733, −711/−707, −150/−146 and −36/−32; [...])."[69]

The consensus sequence for a Q element is 3'-AGGTCA-5'.[69]

Retinoblastoma control elements

R response elements

STAT5s

Proximal promoters

Negative strand in the positive direction there is 1: 3'-TTCCGGGAA-5', 4247.

Distal promoters

Positive strand in the negative direction there are 2: 3'-TTCGTTGAA-5', 3506, 3'-TTCCCTGAA-5', 3782.

Positive strand in the positive direction there is 1: 3'-TTCCATGAA-5', 128.

Synaptic Activity-Responsive Elements

TACTAAC boxes

Tapetum boxes

The consensus sequence for the TAPETUM box is TCGTGT.[69]

TATA boxes

Negative strand in the negative direction there are 2: 3'-TATATATA-5' at 1600 (or -2860 nts upstream from the TSS) and 3'-TATATAAA-5' at 1602 (or -2858 nts).

Positive strand in the negative direction there are 3: 3'-TATAAAAG-5' at 184 (or -4276 nts), 3'-TATAAAAG-5' at 223 (or -4237 nts), and 3'-TATATAAA-5' at 2874 (or -1586 nts).
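
The upstream offsets quoted for these TATA boxes are consistent with subtracting a single transcription start site (TSS) coordinate from each position. The sketch below assumes the TSS lies at position 4460; that value is inferred here from the quoted pairs (for example 1600 and -2860) and is not stated in the text, so it should be treated as an assumption.

```python
# Assumption: TSS coordinate inferred from the quoted pairs, e.g. 1600 - 4460 = -2860.
ASSUMED_TSS = 4460

def offset_from_tss(position, tss=ASSUMED_TSS):
    """Return the signed distance of a position from the TSS (negative = upstream)."""
    return position - tss

# Positions copied from the two TATA box listings above.
for position in (1600, 1602, 184, 223, 2874):
    print(position, offset_from_tss(position))
```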

Inverse complement, negative strand, negative direction there are 2: 3'-TATATATA-5', 1600, 3'-TTTATATA-5', 2871.

Inverse complement, positive strand, negative direction there is 1: 3'-TTTTTATA-5', 219.

TAT boxes

Only an inverse and its complement occur between ZSCAN22 and A1BG: 3'-TACCTAT-5' at 2996 nts from ZSCAN22.

TATCCAC boxes

None occur.

T boxes

TCCACCATA elements

"Given that AtACBP4pro::GUS (−156/−67) could drive promoter activity for pollen expression, [electrophoretic mobility shift assays] EMSAs were carried out to investigate the role of the putative POLLEN1 cis-element, AGAAA (−150/−146), and its adjacent co-dependent regulatory element TCCACCATA (–141/–133)."[69]

"POLLEN1 and the TCCACCATA element are co-dependent regulatory elements responsible for pollen-specific activation of tomato LAT52 (Bate and Twell 1998)."[69]

Telomeric repeat DNA-binding factors

Copying the consensus sequence for the telomeric repeat DNA-binding factor (TRF), 3'-TTAGGG-5', and pasting it into the find function ("⌘F") locates this sequence in the A1BG negative direction, with nucleotide positions as found by the computer programs.

In the nucleotides between ZSCAN22 and A1BG there is at least one 3'-TTAGGG-5' beginning about 680 nucleotides from ZSCAN22 or ending at about 686 nts.
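
The find-function search described above can also be done programmatically. The following is a minimal sketch, not the programs used for this page: `find_motif` is a hypothetical helper, and the sequence is a short toy string rather than the real ZSCAN22 to A1BG interval.

```python
def find_motif(sequence, motif):
    """Return 1-based inclusive (start, end) pairs for every occurrence,
    including overlapping ones."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append((start + 1, start + len(motif)))
        start = sequence.find(motif, start + 1)
    return hits

# Toy sequence for illustration only.
toy = "ACGTTAGGGTTAGGGAC"
for start, end in find_motif(toy, "TTAGGG"):
    print("TTAGGG begins at", start, "and ends at", end)
```

With 1-based inclusive coordinates a six-nucleotide hit beginning at 680 would end at 685; an end of 686 would correspond to counting the position just past the hit.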

Homo sapiens genes containing these are found using Homo sapiens "TRF (TTAGGG repeat-binding factor)".

Tetradecanoylphorbol-13-acetate response elements

TGFβ control elements

TGF-β inhibitory elements

Upstream response elements

V boxes

W boxes

Proximal promoters

Inverse W boxes occur in the negative strand, negative direction of A1BG: 3'-GGTCAA-5' at 4416 and 3'-GGTCAA-5' at 4308.

W boxes occur on the positive strand in the positive direction of A1BG: 3'-CTGACC-5' and its complement at 4216. Inverse W boxes also occur: 3'-GGTCAG-5' and its complement at 4270.

Distal promoters

A W box occurs 3'-CTGACC-5' at 3749, whereas 3'-CTGACT-5' at 17, 3'-TTGACT-5' at 130, 3'-TTGACT-5' at 307, and 3'-CTGACC-5' at 734 occur close to ZSCAN22, but 3'-CTGACT-5' at 1935 could be associated with ZSCAN22 or an unknown gene between it and A1BG. All occur, along with their complements, on the negative strand in the negative direction.

Inverse complement, positive strand, negative direction there are 5: 3'-GGTCAG-5', 440, 3'-GGTCAG-5', 577, 3'-GGTCAG-5', 713, 3'-GGTCAG-5', 2249, 3'-GGTCAG-5', 2586.

A W box inverse occurs, 3'-GGTCAG-5' at 1353, in the negative direction.

W boxes 3'-AGTCAG-5' at 2101, 3'-GGTCAG-5' at 2221, 3'-AGTCAG-5' at 2608, 3'-AGTCAA-5' at 2614, and 3'-AGTCAG-5' at 2619 occur, along with their complements, in the positive direction.

W boxes in the positive direction occur 3'-CTGACC-5' at 1662, 3'-CTGACC-5' at 2213, 3'-TTGACC-5' at 2873, 3'-CTGACT-5' at 2945, and 3'-TTGACC-5' at 4018 that could be associated with A1BG, along with 3'-TTGACC-5' at 1953, 3'-CTGACT-5' at 2674, and 3'-TTGACT-5' at 3735.

Inverse complement, positive strand, positive direction there are 6: 3'-GGTCAG-5', 2025, 3'-AGTCAG-5', 2099, 3'-GGTCAG-5', 2606, 3'-GGTCAG-5', 2997, 3'-GGTCAG-5', 3083, 3'-GGTCAA-5', 3380.
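
The forward W box hits listed above (3'-CTGACC-5', 3'-CTGACT-5', 3'-TTGACT-5', 3'-TTGACC-5') differ only at their first and last positions. A small sketch makes that explicit by collecting the bases seen at each position; it only summarizes the hits on this page and is not a statement of the published W box consensus. The helper names are illustrative.

```python
# Forward W box hits as they appear in the listings above.
hits = ["CTGACC", "CTGACT", "TTGACT", "TTGACC"]

def summarize(seqs):
    """Collect the bases observed at each position of equal-length hits."""
    return ["".join(sorted(set(column))) for column in zip(*seqs)]

def as_regex(classes):
    """Write single bases as-is and alternatives as a character class."""
    return "".join(c if len(c) == 1 else "[" + c + "]" for c in classes)

print(as_regex(summarize(hits)))
```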

X boxes

There are no X boxes in either promoter.

X core promoter elements

  1. negative strand in the negative direction, looking for 3'-G/A/T-G/C-G-T/C-G-G-G/A-A-G/C-A/C-5', 1, 3'-TGGTGGGACC-5', 3744,
  2. negative strand in the positive direction, looking for 3'-G/A/T-G/C-G-T/C-G-G-G/A-A-G/C-A/C-5', 0,
  3. positive strand in the negative direction, looking for 3'-G/A/T-G/C-G-T/C-G-G-G/A-A-G/C-A/C-5', 0,
  4. positive strand in the positive direction, looking for 3'-G/A/T-G/C-G-T/C-G-G-G/A-A-G/C-A/C-5', 0,
  5. complement, negative strand, negative direction, looking for 3'-C/A/T-G/C-C-A/G-C-C-C/T-T-G/C-G/T-5', 0,
  6. complement, negative strand, positive direction, looking for 3'-C/A/T-G/C-C-A/G-C-C-C/T-T-G/C-G/T-5', 0,
  7. complement, positive strand, negative direction, looking for 3'-C/A/T-G/C-C-A/G-C-C-C/T-T-G/C-G/T-5', 1, 3'-ACCACCCTGG-5', 3744,
  8. complement, positive strand, positive direction, looking for 3'-C/A/T-G/C-C-A/G-C-C-C/T-T-G/C-G/T-5', 0,
  9. inverse complement, negative strand, negative direction, looking for 3'-G/T-G/C-T-C/T-C-C-A/G-C-G/C-C/A/T-5', 0,
  10. inverse complement, negative strand, positive direction, looking for 3'-G/T-G/C-T-C/T-C-C-A/G-C-G/C-C/A/T-5', 0,
  11. inverse complement, positive strand, negative direction, looking for 3'-G/T-G/C-T-C/T-C-C-A/G-C-G/C-C/A/T-5', 1, 3'-GCTCCCACCT-5', 392,
  12. inverse complement, positive strand, positive direction, looking for 3'-G/T-G/C-T-C/T-C-C-A/G-C-G/C-C/A/T-5', 0,
  13. inverse, negative strand, negative direction, looking for 3'-A/C-G/C-A-G/A-G-G-T/C-G-G/C-G/A/T-5', 1, 3'-CGAGGGTGGA-5', 392,
  14. inverse, negative strand, positive direction, looking for 3'-A/C-G/C-A-G/A-G-G-T/C-G-G/C-G/A/T-5', 1, 3'-CCAGGGTGGG-5', 102,
  15. inverse, positive strand, negative direction, looking for 3'-A/C-G/C-A-G/A-G-G-T/C-G-G/C-G/A/T-5', 0,
  16. inverse, positive strand, positive direction, looking for 3'-A/C-G/C-A-G/A-G-G-T/C-G-G/C-G/A/T-5', 0.
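
The search patterns in this list use a slash-and-dash notation: dashes separate positions and slashes separate the bases allowed at a position. As a hedged sketch (the helper names are hypothetical, and this is not the program used for the searches), the notation can be converted to a regular expression, and the complement and inverse patterns can be derived from the direct one and checked against the hits reported above.

```python
import re

XCPE = "G/A/T-G/C-G-T/C-G-G-G/A-A-G/C-A/C"
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}

def to_regex(pattern):
    """Turn 'G/A/T-G/C-G' style notation into a regex of character classes."""
    return "".join("[" + pos.replace("/", "") + "]" for pos in pattern.split("-"))

def complement_pattern(pattern):
    """Complement every allowed base at every position."""
    return "-".join("/".join(COMP[b] for b in pos.split("/"))
                    for pos in pattern.split("-"))

def inverse_pattern(pattern):
    """Reverse the order of the positions."""
    return "-".join(reversed(pattern.split("-")))

print(inverse_pattern(XCPE))
# Hits copied from items 1, 7 and 14 of the list above.
print(bool(re.fullmatch(to_regex(XCPE), "TGGTGGGACC")))
print(bool(re.fullmatch(to_regex(complement_pattern(XCPE)), "ACCACCCTGG")))
print(bool(re.fullmatch(to_regex(inverse_pattern(XCPE)), "CCAGGGTGGG")))
```

The derived complement pattern lists the alternatives at a position in a different order from the list above, which does not affect matching.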

Y boxes

There are no Y boxes in either promoter.

Z boxes

Hypotheses

  1. Downstream core promoters may work as transcription factors even as their complements or inverses.
  2. In addition to the DNA binding sequences listed above, the transcription factors that can open up and attach through the local epigenome need to be known and specified.
  3. Each DNA binding domain serving as a transcription factor for the promoter of any immunoglobulin supergene family member, also serves or is present in the promoters for A1BG.
  4. The function of A1BG is the same as other immunoglobulin genes possessing the immunoglobulin domain cl11960 and/or any of three immunoglobulin-like domains: pfam13895, cd05751 and smart00410 in the order and nucleotide sequence: cd05751 Location: 401 → 493, smart00410 Location: 218 → 280, pfam13895 Location: 210 → 301 and cl11960 Location: 28 → 110.

See also

References

  1. "Entrez Gene: Alpha-1-B glycoprotein". Retrieved 2012-11-09.
  2. 2.0 2.1 "A1BG alpha-1-B glycoprotein". Retrieved May 10, 2013.
  3. 3.0 3.1 HGNC (13 March 2020). "ZSCAN22 zinc finger and SCAN domain containing 22 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  4. 4.0 4.1 RefSeq (10 September 2009). "MIR6806 microRNA 6806 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  5. Jag123 (7 March 2005). "antigen". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  6. SemperBlotto (21 April 2008). "immunogen". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 8 March 2020.
  7. 7.0 7.1 7.2 C. Michael Gibson (27 April 2008). "Antigen". Boston, Massachusetts: WikiDoc Foundation. Retrieved 8 March 2020.
  8. Williamsayers79 (26 February 2007). "antibody". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  9. Jag123 (7 March 2005). "antibody". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  10. Eleonora Market, F. Nina Papavasiliou (2003) V(D)J Recombination and the Evolution of the Adaptive Immune System PLoS Biology 1(1): e16.
  11. Charles A. Janeway, Jr; et al. (2001). Immunobiology (5th ed.). Garland Publishing. ISBN 0-8153-3642-X.
  12. SemperBlotto (25 February 2006). "immunoglobulin". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  13. SemperBlotto (28 April 2008). "immunoglobulin". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 7 March 2020.
  14. RefSeq (10 December 2019). "A1BG alpha-1-B glycoprotein [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  15. Tian M, Cui YZ, Song GH, Zong MJ, Zhou XY, Chen Y, Han JX (2008). "Proteomic analysis identifies MMP-9, DJ-1 and A1BG as overexpressed proteins in pancreatic juice from pancreatic ductal adenocarcinoma patients". BMC Cancer. 8: 241. doi:10.1186/1471-2407-8-241. PMC 2528014. PMID 18706098.
  16. Noriaki Ishioka, Nobuhiro Takahashi, and Frank W. Putnam (April 1986). "Amino acid sequence of human plasma α1B-glycoprotein: Homology to the immunoglobulin supergene family" (PDF). Proceedings of the National Academy of Sciences USA. 83 (8): 2363–7. doi:10.1073/pnas.83.8.2363. PMID 3458201. Retrieved 9 March 2020.
  17. Katrina M. Morris, Denis O’Meally, Thiri Zaw, Xiaomin Song, Amber Gillett, Mark P. Molloy, Adam Polkinghorne, and Katherine Belov (7 October 2016). "Characterisation of the immune compounds in koala milk using a combined transcriptomic and proteomic approach". Scientific Reports. 6: 35011. doi:10.1038/srep35011. PMID 27713568. Retrieved 14 March 2020.
  18. R J Paxton, G Mooser, H Pande, T D Lee, and J E Shively (1 February 1987). "Sequence analysis of carcinoembryonic antigen: identification of glycosylation sites and homology with the immunoglobulin supergene family" (PDF). Proceedings of the National Academy of Sciences USA. 84 (4): 920–924. doi:10.1073/pnas.84.4.920. PMID 3469650. Retrieved 26 March 2020.
  19. NCBI (2 February 2016). "Conserved Protein Domain Family cl11960: Ig Superfamily". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 22 May 2020.
  20. NCBI (5 August 2015). "Conserved Protein Domain Family pfam13895: Ig_2". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 24 May 2020.
  21. NCBI (16 August 2016). "Conserved Protein Domain Family cd05751: Ig1_LILR_KIR_like". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 24 May 2020.
  22. NCBI (16 January 2013). "Conserved Protein Domain Family smart00410: IG_like". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 24 May 2020.
  23. 24.98.118.180 (28 February 2007). "species". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  24. Peter coxhead (22 August 2018). "Species". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  25. Chiswick Chap (1 December 2016). "Species". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  26. "AceView: A1BG". Retrieved May 11, 2013.
  27. Pdeitiker (26 July 2008). "variant". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  28. SemperBlotto (6 January 2007). "isoform". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2 December 2018.
  29. 72.178.245.181 (30 November 2008). "isoform". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2 December 2018.
  30. H Eiberg, ML Bisgaard, J Mohr (1 December 1989). "Linkage between alpha 1B-glycoprotein (A1BG) and Lutheran (LU) red blood group system: assignment to chromosome 19: new genetic variants of A1BG". Clinical Genetics. 36 (6): 415–8. PMID 2591067. Retrieved 2017-10-08.
  31. John R. Stehle Jr., Mark E. Weeks, Kai Lin, Mark C. Willingham, Amy M. Hicks, John F. Timms, Zheng Cui (January 2007). "Mass spectrometry identification of circulating alpha-1-B glycoprotein, increased in aged female C57BL/6 mice". Biochimica et Biophysica Acta (BBA) - General Subjects. 1770 (1): 79–86. Retrieved 2017-10-08.
  32. Caitrin W. McDonough, Yan Gong, Sandosh Padmanabhan, Ben Burkley, Taimour Y. Langaee, Olle Melander, Carl J. Pepine, Anna F. Dominiczak, Rhonda M. Cooper-DeHoff, Julie A. Johnson (June 2013). "Pharmacogenomic Association of Nonsynonymous SNPs in SIGLEC12, A1BG, and the Selectin Region and Cardiovascular Outcomes" (PDF). Hypertension. 62 (1): 48–54. doi:10.1161/HYPERTENSIONAHA.111.00823. PMID 23690342. Retrieved 2017-10-08.
  33. DTLHS (10 January 2018). "genotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  34. SemperBlotto (22 October 2005). "genotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  35. Widsith (28 March 2012). "polymorphism". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  36. 217.105.66.98 (8 September 2016). "allele". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  37. 138.130.33.215 (7 April 2004). "allele". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 25 March 2020.
  38. B. Gahne, R. K. Juneja, and A. Stratil (June 1987). "Genetic polymorphism of human plasma alpha 1B-glycoprotein: phenotyping by immunoblotting or by a simple method of 2-D electrophoresis". Human Genetics. 76 (2): 111–5. doi:10.1007/bf00284904. PMID 3610142. Retrieved 25 March 2020.
  39. R.K. Juneja, G. Beckman, M. Lukka, B. Gahne, and C. Ehnholm (1989). "Plasma α1B-Glycoprotein Allele Frequencies in Finns and Swedish Lapps: Evidence for a New α1B Allele". Human Heredity. 39 (1): 32–36. doi:10.1159/000153828. Retrieved 25 March 2020.
  40. R.K. Juneja, N. Saha, B. Gahne and J.S.H. Tay (1989). "Distribution of Plasma Alpha-1-B-Glycoprotein Phenotypes in Several Mongoloid Populations of East Asia". Human Heredity. 39: 218–222. doi:10.1159/000153863. Retrieved 25 March 2020.
  41. 24.235.196.118 (23 September 2007). "phenotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-10-04.
  42. SemperBlotto (14 February 2005). "phenotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-10-04.
  43. N2e (3 July 2008). "phenotype". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-10-04.
  44. Mardiaty Iryani Abdullah, Ching Chin Lee, Sarni Mat Junit, Khoon Leong Ng, and Onn Haji Hashim (13 September 2016). "Tissue and serum samples of patients with papillary thyroid cancer with and without benign background demonstrate different altered expression of proteins". PeerJ. 4: e2450. doi:10.7717/peerj.2450. PMID 27672505. Retrieved 15 March 2020.
  45. Udby L, Sørensen OE, Pass J, Johnsen AH, Behrendt N, Borregaard N, Kjeldsen L (October 2004). "Cysteine-rich secretory protein 3 is a ligand of alpha1B-glycoprotein in human plasma". Biochemistry. 43 (40): 12877–86. doi:10.1021/bi048823e. PMID 15461460.
  46. "The Opossum: Our Marvelous Marsupial, The Social Loner". Wildlife Rescue League.
  47. "Anti-Lethal Factor from Opossum Serum Is a Potent Antidote for Animal, Plant and Bacterial Toxins". Journal of Venomous Animals and Toxins. Retrieved 2009-12-29.
  48. B Haendler, J Krätzschmar, F Theuring and W D Schleuning (July 1993). "Transcripts for cysteine-rich secretory protein-1 (CRISP-1; DE/AEG) and the novel related CRISP-3 are expressed under androgen control in the mouse salivary gland". Endocrinology. 133 (1): 192–8. doi:10.1210/en.133.1.192. PMID 8319566. Retrieved 2012-02-20.
  49. HGNC (10 December 2019). "A1BG-AS1 A1BG antisense RNA 1 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  50. HGNC (10 December 2019). "ZNF497 zinc finger protein 497 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  51. HGNC (10 December 2019). "LOC100419840 zinc finger protein 446 pseudogene [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  52. HGNC (10 December 2019). "LOC105372483 uncharacterized LOC105372483 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  53. HGNC (10 December 2019). "RNA5SP473 RNA, 5S ribosomal pseudogene 473 [ Homo sapiens (human) ]". U.S. National Library of Medicine, 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information. Retrieved 2019-12-18.
  54. Francis S Collins, Eric D Green, Alan E Guttmacher, Mark S Guyer (24 April 2003). "A vision for the future of genomics research". Nature. 422 (6934): 835–47. doi:10.1038/nature01626. PMID 12695777. Retrieved 9 August 2020.
  55. The ENCODE Project Consortium (22 October 2004). "The ENCODE (ENCyclopedia of DNA Elements) Project". Science. 306 (5696): 636–640. doi:10.1126/science.1105136. PMID 15499007. Retrieved 9 August 2020.
  56. The ENCODE Project Consortium (14 June 2007). "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project". Nature. 447 (7146): 799–816. doi:10.1038/nature05874. PMID 17571346. Retrieved 9 August 2020.
  57. Ya-Mei Wang, Ping Zhou, Li-Yong Wang, Zhen-Hua Li, Yao-Nan Zhang, and Yu-Xiang Zhang (10 August 2012). "Correlation Between DNase I Hypersensitive Site Distribution and Gene Expression in HeLa S3 Cells". PLoS One. 7 (8): e42414. doi:10.1371/journal.pone.0042414. PMID 22900019. Retrieved 9 August 2020.
  58. RefSeq (November 2019). "LOC116286197 CRISPRi-validated cis-regulatory element chr19.6329 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 25 July 2020.
  59. RefSeq (June 2018). "LOC112553117 Sharpr-MPRA regulatory region 1998 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 25 July 2020.
  60. RefSeq (June 2018). "Sharpr-MPRA regulatory region 10473 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 16 July 2020.
  61. RefSeq (June 2018). "Sharpr-MPRA regulatory region 7872 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 1 August 2020.
  62. RefSeq (June 2018). "Sharpr-MPRA regulatory region 9894 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 16 July 2020.
  63. Tara L. Conforto, Yijing Zhang, Jennifer Sherman, and David J. Waxman (November 2012). "Impact of CUX2 on the Female Mouse Liver Transcriptome: Activation of Female-Biased Genes and Repression of Male-Biased Genes" (PDF). Molecular and Cellular Biology. 32 (22): 4611–4627. doi:10.1128/MCB.00886-12. Retrieved 8 August 2020.
  64. PA Johnson, D Bunick, NB Hecht (1991). "Protein Binding Regions in the Mouse and Rat Protamine-2 Genes" (PDF). Biology of Reproduction. 44 (1): 127–134. Retrieved 6 April 2019.
  65. Amber Paratore Sanchez and Kumar Sharma (July 2009). "Transcription factors in the pathogenesis of diabetic nephropathy". Expert Reviews in Molecular Medicine. 11: e13. doi:10.1017/S1462399409001057. Retrieved 1 October 2018.
  66. Robert Clifford, Min-Ho Lee, Sudhir Nayak, Mitsue Ohmachi, Flav Giorgini and Tim Schedl (December 2000). "FOG-2, a novel F-box containing protein, associates with the GLD-1 RNA binding protein and directs male sex determination in the C. elegans hermaphrodite germline" (PDF). Development. 127 (24): 5265–76. Retrieved 10 August 2020.
  67. Sabine Schwank, Ronald Ebbert, Karin Rautenstrauß, Eckhart Schweizer and Hans-Joachim Schüller (25 January 1995). "Yeast transcriptional activator INO2 interacts as an Ino2p/Ino4p basic helix-loop-helix heteromeric complex with the inositol/choline responsive element necessary for expression of phospholipid biosynthetic genes in Saccharomyces cerevisiae" (PDF). Nucleic Acids Research. 23 (2): 230–37. doi:10.1093/nar/23.2.230. Retrieved 10 August 2020.
  68. Zi-Wei Ye, Jie Xu, Jianxin Shi, Dabing Zhang and Mee-Len Chye (January 2017). "Kelch-motif containing acyl-CoA binding proteins AtACBP4 and AtACBP5 are differentially expressed and function in floral lipid metabolism" (PDF). Plant Molecular Biology. 93: 209–225. doi:10.1007/s11103-016-0557-5. Retrieved 7 May 2020.
  69. Anna Kalousová, Vladimír Beneš, Jan Pačes, Václav Pačes and Zbyněk Kozmik (June 1999). "DNA Binding and Transactivating Properties of the Paired and Homeobox Protein Pax4". Biochemical and Biophysical Research Communications. 259 (3): 510–518. Retrieved 6 May 2020.
  70. G. Damante, D. Fabbro, L. Pelizari, D. Civitareale, S. Guazzi, M. Polycarpou-Schwartz, S. Cauci, F. Quadrifoglio, S. Formisano and R. Di Lauro (20 June 1994). "Sequence-specific DNA recognition by the thyroid transcription factor-1 homeodomain" (PDF). Nucleic Acids Research. 22 (15): 3075–83. Retrieved 6 May 2020.

External links
