Brought to you by molecularsciences.org.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License.
This publication may not be redistributed without this notice.

Prerequisite Knowledge

A gene is the fundamental physical and functional unit of heredity. It is an ordered sequence of nucleotides, located at a particular position on a particular chromosome, that encodes a specific functional product (RNA or protein).

An Open Reading Frame (ORF) is a run of consecutive DNA codons, read in a single frame, that is not interrupted by a stop codon.

A Coding Sequence (CDS) is a region of DNA or RNA whose sequence determines the sequence of amino acids in a protein.

Reading frames are always read in the 5’ to 3’ direction.
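
Because a codon is three nucleotides long and each strand is read 5’ to 3’, a double-stranded DNA sequence has six possible reading frames: three on the given strand and three on its reverse complement. The following is a minimal Python sketch of that idea, not a reference implementation. The function names (reverse_complement, codons, six_frames) and the frame labels (+1 to +3, -1 to -3) are illustrative choices, and the code assumes the sequence contains only the unambiguous bases A, C, G, and T.

```python
# Complement table for unambiguous DNA bases (assumption: no N or IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def codons(seq, offset):
    """Split seq into complete codons, starting at offset 0, 1, or 2."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq):
    """Map hypothetical frame labels to codon lists, each read 5' to 3'."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons(forward, offset)
        frames[f"-{offset + 1}"] = codons(reverse, offset)
    return frames
```

For example, six_frames("ATGAAATAG")["+1"] gives the codons ATG, AAA, TAG, while the "-1" frame reads the reverse complement CTATTTCAT as CTA, TTT, CAT. Incomplete codons at the end of a frame are dropped.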

Prokaryotic gene model

Prokaryotes have small genomes with high gene density. Their genes are often organized into operons, in which a single transcript carries several genes. Because there are no introns, each gene has one ORF and produces one protein. An ORF begins with a start codon and ends with a stop codon. There are conserved regions around the start sites of transcription (promoters) and translation (ribosome binding sites). Genes often overlap in prokaryotes.
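
Since a prokaryotic gene has no introns, a first-pass search for candidate genes can simply scan each reading frame for a start codon followed, in the same frame, by a stop codon. The sketch below illustrates this under simplifying assumptions: it considers only ATG as a start codon (prokaryotes also use alternatives such as GTG and TTG), uses the standard stop codons TAA, TAG, and TGA, and scans only the given strand. The function name find_orfs and the min_codons parameter are hypothetical, and real gene finders add statistical models on top of this kind of scan.

```python
START = "ATG"                   # assumption: ATG is the only start codon considered
STOPS = {"TAA", "TAG", "TGA"}   # standard stop codons


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for ORFs on the given strand.

    start and end are 0-based positions, end is exclusive, and the span
    includes both the start codon and the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                # Remember the first start codon seen since the last stop.
                start = i
            elif start is not None and codon in STOPS:
                # Keep the ORF only if it spans enough codons (stop included).
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```

Calling find_orfs on both a sequence and its reverse complement covers all six reading frames. Because each frame is scanned independently, ORFs in different frames may overlap in the output, which is one reason overlapping genes complicate prediction.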

The principal difficulties in prokaryotic gene prediction are overlapping ORFs, short genes, and locating promoters. In spite of these difficulties, gene prediction in prokaryotes is about 99% accurate.

Eukaryotic gene structure