Open reading frame
Synonyms and Keywords: ORF
An open reading frame or ORF is a portion of an organism's genome which contains a sequence of bases that could potentially encode a protein. The start and stop ends of the ORF are not equivalent to the ends of the mRNA, but they are usually contained within the mRNA. In a gene, the ORF is located between the start codon (initiation codon) and the stop codon (termination codon). ORFs are usually encountered when sifting through pieces of DNA while trying to locate a gene. Because start codons vary among organisms that use an altered genetic code, ORFs are identified differently in those organisms. A typical ORF finder employs algorithms based on the existing genetic codes (including the altered ones) and all possible reading frames.
In fact, the existence of an ORF, especially a long one, is usually a good indication of the presence of a gene in the surrounding sequence. In this case, the ORF is part of the sequence that will be translated by the ribosomes, it will be long, and if the DNA is eukaryotic, the ORF may continue over gaps called introns. However, short ORFs can also occur by chance outside of genes. Usually ORFs outside genes are not very long and terminate after a few codons.
Once a gene has been sequenced, it is important to determine the correct open reading frame (ORF). Theoretically, the DNA sequence can be read in six reading frames in organisms with double-stranded DNA: three in the forward and three in the reverse direction. In prokaryotes, the longest sequence without a stop codon usually determines the open reading frame. Eukaryotic mRNA is typically monocistronic and therefore contains only a single ORF. A problem arises when working with eukaryotic pre-mRNA: long parts of the DNA within an ORF (introns) are not translated. When the aim is to find eukaryotic open reading frames, it is necessary to look at the spliced messenger RNA (mRNA).
For example, the sequence 5'-UCUAAAGGUCCA-3' is one of the two possible mRNA sequences of the transcript, and it can be read in three possible ways:
1. UCU AAA GGU CCA
2. CUA AAG GUC etc.
3. UAA AGG UCC etc.
The third reading contains a stop codon (UAA), so only 2 of the 3 reading frames are open (that is, they contain no stop codons).
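The frame split in the example above can be reproduced with a few lines of Python. This is a minimal sketch, not a standard tool: the function names `split_frames` and `open_frames` are illustrative, and only the three stop codons of the standard genetic code are assumed.

```python
# RNA stop codons of the standard genetic code (assumption: no altered code).
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_frames(rna):
    """Return the complete codons of the three forward reading frames."""
    frames = []
    for offset in range(3):
        # Step through the sequence three bases at a time, dropping any
        # incomplete codon left over at the 3' end.
        codons = [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]
        frames.append(codons)
    return frames


def open_frames(rna):
    """Return the 1-based numbers of the frames that contain no stop codon."""
    return [
        number
        for number, codons in enumerate(split_frames(rna), start=1)
        if not STOP_CODONS.intersection(codons)
    ]


sequence = "UCUAAAGGUCCA"
for number, codons in enumerate(split_frames(sequence), start=1):
    print(number, " ".join(codons))
print("open frames:", open_frames(sequence))
```

For the sequence above, frames 1 and 2 are reported as open, and frame 3 is closed by the UAA codon at its start.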
One common use of open reading frames is as one piece of evidence to assist in gene prediction. Long ORFs are often used, along with other evidence, to initially identify candidate protein-coding regions in a DNA sequence. The presence of an ORF does not necessarily mean that the region is ever translated. For example, in a randomly generated DNA sequence with an equal percentage of each nucleotide, 3 of the 64 possible codons are stop codons, so a stop codon would be expected about once every 21 codons. A simple gene prediction algorithm for prokaryotes might look for a start codon followed by an open reading frame that is long enough to encode a typical protein, where the codon usage of that region matches the frequency characteristic of the given organism's coding regions. By itself, even a long open reading frame is not conclusive evidence for the presence of a gene.
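The figure of one stop codon per roughly 21 codons follows from counting codons. The sketch below assumes a uniform random sequence and the standard stop codons, and the helper `prob_open_run` is a hypothetical name introduced for illustration. It also shows why a long stop-free stretch is unlikely to arise by chance.

```python
from itertools import product

# DNA stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Enumerate all 4**3 = 64 codons and count the fraction that are stops.
all_codons = ["".join(bases) for bases in product("ACGT", repeat=3)]
p_stop = sum(codon in STOP_CODONS for codon in all_codons) / len(all_codons)

# With equal base frequencies, the mean spacing between stop codons in a
# given frame is 1 / p_stop = 64 / 3 codons.
expected_spacing = 1 / p_stop


def prob_open_run(n_codons):
    """Probability that n_codons consecutive random codons contain no stop."""
    return (1 - p_stop) ** n_codons


print(f"P(stop) = {p_stop:.4f}, expected spacing = {expected_spacing:.1f} codons")
print(f"P(100 codons with no stop) = {prob_open_run(100):.4f}")
```

Under these assumptions, a random stretch of 100 codons is stop-free less than 1% of the time, which is why ORF length is useful but not conclusive evidence. Real genomes have skewed base composition, so the actual rate differs.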
If a portion of a genome has been sequenced, ORFs can be located by examining each of the three possible reading frames on each strand. Possible stop codons in DNA are "TGA", "TAA" and "TAG".
Since DNA has two anti-parallel strands, an additional three reading frames arise, giving a possible six frame translations.
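A six-frame scan of this kind can be sketched as follows. This is a simplified illustration rather than the algorithm of any particular tool. It assumes ATG as the only start codon, uses the three stop codons listed above, reports only the first ATG in each open stretch, and ignores introns. The names `find_orfs` and `min_codons` are hypothetical.

```python
STOP_CODONS = {"TGA", "TAA", "TAG"}
START_CODON = "ATG"  # assumption: standard start codon only
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=2):
    """Yield (strand, frame, start, end, sequence) for each ATG...stop ORF.

    start and end are 0-based offsets into the strand being read (for
    strand "-", into the reverse complement); end is exclusive. The stop
    codon is included in the sequence and in the codon count.
    """
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # open a candidate ORF at the first ATG
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, frame + 1, start, i + 3, seq[start:i + 3]
                    start = None  # close the ORF and keep scanning


for orf in find_orfs("CCATGAAATAGGG"):
    print(orf)
```

In practice `min_codons` would be set much higher (for example, 100) to filter out the short ORFs that occur by chance.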
ORF finding tools
- ORF Finder: The ORF Finder (Open Reading Frame Finder) is a graphical analysis tool which finds all open reading frames of a selectable minimum size in a user's sequence or in a sequence already in the database. This tool identifies all open reading frames using the standard or alternative genetic codes. The deduced amino acid sequence can be saved in various formats and searched against the sequence database using the WWW BLAST server. The ORF Finder should be helpful in preparing complete and accurate sequence submissions. It is also packaged with the Sequin sequence submission software.
- ORF Investigator: ORF Investigator is a program which not only gives information about the coding and non-coding sequences but can also perform pairwise global alignment of different gene or DNA region sequences. The tool finds the ORFs, converts them into the corresponding amino acid sequences in single-letter code, and reports their locations in the sequence. The pairwise global alignment between the sequences makes it convenient to detect different mutations, including single-nucleotide polymorphisms. The Needleman-Wunsch algorithm is used for the alignment. ORF Investigator is written in the portable Perl programming language, and is therefore available to users of all common operating systems.
- OrfPredictor: OrfPredictor is a web server designed for identifying protein-coding regions in expressed sequence tag (EST)-derived sequences. For query sequences with a hit in BLASTX, the program predicts the coding regions based on the translation reading frames identified in the BLASTX alignments; otherwise, it predicts the most probable coding region based on the intrinsic signals of the query sequences. The output is the predicted peptide sequences in FASTA format, with a definition line that includes the query ID, the translation reading frame, and the nucleotide positions where the coding region begins and ends. OrfPredictor facilitates the annotation of EST-derived sequences, particularly for large-scale EST projects.