Open reading frames and why the frame matters

The same DNA can be read six different ways, and most of them are gibberish. Finding the gene means finding the one frame — on the right strand — that reads cleanly from a start to a stop.

What a reading frame is

Protein-coding DNA is read three bases at a time; each triplet is a codon. Where you start changes everything: begin one base over and every following codon shifts, producing a completely different sequence of amino acids. That gives three possible frames on the forward strand. The reverse-complement strand can also be read in three frames, so there are six reading frames in total.
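A minimal Python sketch (3.9+) of the six frames. It assumes an uppercase sequence of only A, C, G and T, with no ambiguity codes such as N. The names `reverse_complement`, `codons` and `six_frames` are illustrative and do not come from any particular library. Each frame is returned as a list of complete codons, and leftover bases at the end are dropped. The labels `+1`..`-3` follow one common convention, where reverse frames are numbered by their offset into the reverse complement. Other tools number them differently.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, so the result reads 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq: str, offset: int) -> list[str]:
    """Split seq into complete triplets starting at offset (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Return the codons of all six frames, keyed '+1'..'+3' and '-1'..'-3'."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons(seq, offset)  # forward strand
        frames[f"-{offset + 1}"] = codons(rc, offset)   # reverse complement
    return frames


if __name__ == "__main__":
    for name, frame in six_frames("ATGAAATAG").items():
        print(name, " ".join(frame))
```

For `ATGAAATAG`, frame `+1` comes out as `ATG AAA TAG` and frame `+2` as `TGA AAT`, which matches the worked example further down.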

Start codons, stop codons, and ORFs

Translation starts at a start codon, normally ATG (methionine), and continues until it hits an in-frame stop codon: TAA, TAG or TGA. An open reading frame (ORF) is a run from a start codon to the next in-frame stop, long enough to plausibly encode a real protein. A frame that is interrupted by stop codons every few triplets, or that contains no start codon at all, is not coding.
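The ORF definition can be sketched as a scan over a single frame. This minimal version makes several assumptions. Only ATG counts as a start, and the three standard stops end an ORF. `min_codons` is a hypothetical cut-off standing in for "long enough". It counts the start codon but not the stop, so it equals the protein length. An ATG with no in-frame stop before the end of the sequence is not reported. The function name `orfs_in_frame` is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def orfs_in_frame(seq: str, offset: int, min_codons: int = 1) -> list[tuple[int, int]]:
    """Find ATG...stop runs in one frame.

    Returns 0-based (start, end) pairs with end exclusive, so seq[start:end]
    spans the start codon through the stop codon.
    """
    orfs = []
    start = None  # position of the currently open ATG, if any
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START_CODON:
            # Open an ORF at the first ATG; later in-frame ATGs just encode Met.
            start = i
        elif start is not None and codon in STOP_CODONS:
            # Codons from the ATG up to (not including) the stop = protein length.
            if (i - start) // 3 >= min_codons:
                orfs.append((start, i + 3))
            start = None  # close it and look for the next ATG
    # An ATG still open here had no in-frame stop, so it is not reported.
    return orfs
```

For example, `orfs_in_frame("ATGAAATAG", 0)` returns `[(0, 9)]`, while the same sequence at offset 1 returns `[]`. Opening at the first ATG gives the longest ORF for each stop, which is one common choice. Some tools instead report every nested ATG.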

Why one frame isn’t enough to check

Because a one-base shift rewrites every codon, checking a single frame — or only the forward strand — can completely miss the real gene. A meaningful sequence in one frame becomes nonsense in the next, and plenty of genes sit on the reverse strand. So you scan all six.
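Scanning all six frames amounts to running a single-frame scan three times on each strand. This self-contained sketch assumes ATG-only starts, the three standard stops and a hypothetical `min_codons` length cut-off. The name `scan_six_frames` is illustrative. Coordinates are 0-based and end-exclusive. For hits on the `-` strand they index into the reverse complement, not into the input.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def scan_six_frames(seq: str, min_codons: int = 1) -> list[tuple[str, int, int, int]]:
    """Return (strand, frame_offset, start, end) for every ATG...stop ORF."""
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # open ATG position in this frame, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        hits.append((strand, frame, start, i + 3))
                    start = None
    return hits
```

`scan_six_frames("CTATTTCAT")` finds nothing on the forward strand but returns `[("-", 0, 0, 9)]`, because the reverse complement of that sequence is `ATGAAATAG`. A forward-only scan would miss it entirely.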

A worked example

Take the short sequence ATGAAATAG. Read in the first frame, it splits as:

ATG · AAA · TAG → Met · Lys · stop

That is a clean little ORF: a start codon (which itself encodes Met), one further codon, and a stop, giving the protein “MK”. Now shift the reading frame by a single base and read from the second position:

A · TGA · AAT · AG → (TGA is a stop; no ATG to start)

The very first full codon in the shifted frame, TGA, is a stop, and there is no ATG to open a frame at all — so the same nine bases encode nothing in this frame. One base of offset turned a tidy ORF into noise, which is exactly why the frame (and the strand) has to be right.
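The worked example can be checked mechanically by translating each forward frame with the standard genetic code. This sketch builds the 64-codon table from the compact one-letter string used for NCBI translation table 1, with codons ordered TTT, TTC, TTA, ... GGG. It writes stops as `*` and reads through them rather than halting, so the whole frame is visible. The name `translate_frame` is illustrative.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon in TTT, TTC, TTA, ... GGG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate_frame(seq: str, offset: int) -> str:
    """Translate every complete codon from offset onward, writing stops as '*'."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, len(seq) - 2, 3))


if __name__ == "__main__":
    seq = "ATGAAATAG"
    for offset in range(3):
        print(f"frame {offset + 1}: {translate_frame(seq, offset)}")
```

For `ATGAAATAG` this prints `MK*`, `*N` and `EI` for frames 1 to 3. Only the first frame opens with M and closes with a stop. The third frame, which the walkthrough does not show, has neither a start nor a stop within these nine bases.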