Coverage gap estimator

Use the Lander–Waterman model to estimate what fraction of a genome is left uncovered at a given mean depth, and how many gaps to expect.

How it works

Formula

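Treating reads as placed uniformly at random, the number of reads covering any given base is Poisson-distributed with mean C, so the chance that a base gets zero reads at mean depth C is e^−C. That is the expected uncovered fraction. The expected number of gaps is ≈ N × e^−C, where N = C × G / L is the number of reads for genome size G and read length L.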
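A minimal Python sketch of the two formulas. It assumes the standard Lander–Waterman relation N = C × G / L between read count N, genome size G, read length L and mean depth C; the function names are hypothetical and not part of any tool.

```python
import math


def uncovered_fraction(depth: float) -> float:
    """Expected fraction of bases with zero reads at mean depth C: e^-C."""
    return math.exp(-depth)


def expected_gaps(genome_size: float, read_length: float, depth: float) -> float:
    """Expected number of gaps: N * e^-C, with N = C * G / L reads (assumed relation)."""
    n_reads = depth * genome_size / read_length
    return n_reads * math.exp(-depth)


print(uncovered_fraction(5.0))
print(expected_gaps(genome_size=1_000_000, read_length=100, depth=5.0))
```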

Worked example

At 1× mean depth, e^−1 ≈ 0.37, so about 37% of bases are uncovered. At 5×, e^−5 ≈ 0.0067 — about 0.67%. By 10× it is under 0.005%.
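The worked example can be reproduced with a few lines of Python. This is a sketch only; `uncovered_percent` is a hypothetical helper name.

```python
import math


def uncovered_percent(depth: float) -> float:
    """Expected uncovered bases as a percentage: 100 * e^-C."""
    return 100 * math.exp(-depth)


# Print the uncovered percentage at the depths used in the worked example.
for depth in (1, 5, 10):
    print(f"{depth:>2}x: {uncovered_percent(depth):.4f}% uncovered")
```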

When to use it

To reason about breadth, not just depth: how much of the target will be missed at a planned depth, and roughly how fragmented the result will be. Useful when low-input or cost constraints push you below the usual depth.
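Because the uncovered fraction is f = e^−C, it can be inverted to ask what mean depth the model needs to reach a target uncovered fraction: C = ln(1/f). A hedged sketch with a hypothetical helper name; since real data is gappier than the model, the result is likely an underestimate of the depth real data would need.

```python
import math


def depth_for_uncovered(target_fraction: float) -> float:
    """Mean depth C at which the model's uncovered fraction e^-C equals target_fraction."""
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    return -math.log(target_fraction)  # C = ln(1 / f)


print(f"1% uncovered:   {depth_for_uncovered(0.01):.2f}x")
print(f"0.1% uncovered: {depth_for_uncovered(0.001):.2f}x")
```

Under the model, about 4.6× leaves 1% of bases uncovered and about 6.9× leaves 0.1%.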

Sensible defaults

Defaults use 5× over a human-sized genome with 150 bp reads so the uncovered fraction is visible. Real data is gappier than the model because coverage is not perfectly uniform.
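A sketch of what these defaults imply under the model. The genome size of 3.1 Gb is an assumption standing in for "human-sized"; the calculator's exact default may differ.

```python
import math

GENOME_SIZE = 3.1e9  # assumption: "human-sized" taken as ~3.1 Gb
READ_LENGTH = 150    # bp
DEPTH = 5.0          # mean depth C

n_reads = DEPTH * GENOME_SIZE / READ_LENGTH   # N = C * G / L
fraction = math.exp(-DEPTH)                   # expected uncovered fraction e^-C
uncovered_bases = GENOME_SIZE * fraction      # expected bases with zero reads
gaps = n_reads * fraction                     # expected number of gaps, N * e^-C
mean_gap = uncovered_bases / gaps             # works out to G / N = L / C

print(f"reads:           {n_reads:,.0f}")
print(f"uncovered:       {fraction:.2%} ({uncovered_bases / 1e6:.1f} Mb)")
print(f"expected gaps:   {gaps:,.0f}")
print(f"mean gap length: {mean_gap:.0f} bp")
```

Under these assumptions the model gives about 103 million reads, roughly 21 Mb uncovered, and about 696,000 gaps averaging L / C = 30 bp each.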

FAQ

Why is my real coverage gappier than this?

Lander–Waterman assumes uniform random placement. GC bias, mappability, repeats and capture unevenness all create more (and larger) gaps than the ideal model predicts.

Does read length change the uncovered fraction?

For a fixed mean depth the uncovered fraction is e^−C regardless of read length. Read length affects the expected number and size of gaps, not the fraction.
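To illustrate the read-length answer, this sketch holds depth fixed and varies read length. The 3.1 Gb genome size is an assumption, and the mean gap length L / C follows from dividing expected uncovered bases (G × e^−C) by expected gaps (N × e^−C).

```python
import math

GENOME_SIZE = 3.1e9  # assumption: roughly human-sized genome, in bp
DEPTH = 5.0          # fixed mean depth C


def gap_stats(read_length: float, depth: float = DEPTH, genome_size: float = GENOME_SIZE):
    """Return (uncovered fraction, expected gaps, mean gap length) under Lander–Waterman."""
    n_reads = depth * genome_size / read_length  # N = C * G / L
    fraction = math.exp(-depth)                  # depends only on C
    gaps = n_reads * fraction                    # N * e^-C
    mean_gap = read_length / depth               # (G * e^-C) / (N * e^-C) = L / C
    return fraction, gaps, mean_gap


for read_length in (100, 150, 250):
    fraction, gaps, mean_gap = gap_stats(read_length)
    print(f"L={read_length} bp: uncovered {fraction:.4%}, ~{gaps:,.0f} gaps, mean gap {mean_gap:.0f} bp")
```

At 5× the uncovered fraction stays at e^−5 ≈ 0.67% for every read length, while the expected gap count falls (about 1.04 million at 100 bp, 696,000 at 150 bp, 418,000 at 250 bp) and the mean gap length grows (20, 30, 50 bp).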