Coverage gap estimator
Use the Lander–Waterman model to estimate what fraction of a genome is left uncovered at a given mean depth, and how many gaps to expect.
How it works
Formula
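Treating read starts as uniformly random along the genome (a Poisson process), the probability that a given base is covered by zero reads at mean depth C is e^−C, where C = (number of reads × read length) ÷ genome size. That probability is also the expected uncovered fraction. A gap opens after a read only when no other read starts within it, which also happens with probability e^−C, so expected gaps ≈ (number of reads) × e^−C.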
Worked example
At 1× mean depth, e^−1 ≈ 0.37, so about 37% of bases are uncovered. At 5×, e^−5 ≈ 0.0067 — about 0.67%. By 10× it is under 0.005%.
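The formula and the figures above can be reproduced with a few lines of Python. This is a minimal sketch, not the estimator's actual code; the function names `uncovered_fraction` and `expected_gaps` are made up for illustration.

```python
import math


def uncovered_fraction(mean_depth: float) -> float:
    """Expected fraction of bases with zero reads under the Poisson model."""
    return math.exp(-mean_depth)


def expected_gaps(num_reads: float, mean_depth: float) -> float:
    """Approximate expected number of gaps: (number of reads) x e^-C."""
    return num_reads * math.exp(-mean_depth)


# Reproduce the worked example at 1x, 5x and 10x mean depth.
for depth in (1, 5, 10):
    print(f"{depth:>2}x: {uncovered_fraction(depth):.4%} of bases uncovered")
```

The loop prints the uncovered percentage at each depth, which should match the rounded values quoted in the worked example.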
When to use it
Use it to reason about breadth, not just depth: how much of the target will be missed at a planned depth, and roughly how fragmented the result will be. It is most useful when low input or cost constraints push you below your usual target depth.
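When planning, it can help to run the same formula backwards: solving e^−C = f for C gives C = −ln(f), the mean depth at which the ideal model leaves a fraction f uncovered. The sketch below assumes only that relationship; the function name `depth_for_uncovered_fraction` is hypothetical, and real data will generally need more depth than this ideal figure.

```python
import math


def depth_for_uncovered_fraction(target_fraction: float) -> float:
    """Mean depth C at which the model predicts `target_fraction` uncovered.

    Inverts e^-C = f, so C = ln(1 / f). Valid for 0 < f <= 1.
    """
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    return math.log(1.0 / target_fraction)


# Ideal-model depth for missing at most 1% and 0.1% of bases.
for fraction in (0.01, 0.001):
    depth = depth_for_uncovered_fraction(fraction)
    print(f"uncovered <= {fraction:.1%}: about {depth:.1f}x mean depth")
```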
Sensible defaults
Defaults use 5× over a human-sized genome with 150 bp reads so the uncovered fraction is visible. Real data is gappier than the model because coverage is not perfectly uniform.
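To see what the defaults imply in absolute terms, the sketch below plugs them into the model. The genome size of 3.1 Gb is an assumption standing in for "human-sized"; the tool's exact value is not stated here, and the variable names are illustrative only.

```python
import math

# Assumed inputs: 3.1e9 bp is a stand-in for "human-sized", not the tool's value.
GENOME_SIZE_BP = 3.1e9
READ_LENGTH_BP = 150
MEAN_DEPTH = 5.0

# Number of reads needed to reach the mean depth: N = C * G / L.
num_reads = MEAN_DEPTH * GENOME_SIZE_BP / READ_LENGTH_BP

# Ideal-model expectations at that depth.
uncovered = math.exp(-MEAN_DEPTH)
uncovered_bases = GENOME_SIZE_BP * uncovered
gaps = num_reads * uncovered

print(f"reads needed:        {num_reads:,.0f}")
print(f"uncovered fraction:  {uncovered:.4%}")
print(f"uncovered bases:     {uncovered_bases:,.0f}")
print(f"expected gaps:       {gaps:,.0f}")
```

Under these assumptions the model predicts roughly 21 million uncovered bases spread over several hundred thousand gaps, which is a lower bound on what real data tends to show.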
FAQ
- Why is my real coverage gappier than this?
- Lander–Waterman assumes uniform random placement. GC bias, mappability, repeats and capture unevenness all create more (and larger) gaps than the ideal model predicts.
- Does read length change the uncovered fraction?
- For a fixed mean depth the uncovered fraction is e^−C regardless of read length. Read length affects the expected number and size of gaps, not the fraction.
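The read-length answer can be checked numerically. The sketch below holds mean depth fixed and varies read length; the 1 Mb genome and the function name `gap_summary` are hypothetical, and the mean gap size is simply the expected uncovered bases divided by the expected number of gaps, so treat it as a rough figure.

```python
import math


def gap_summary(mean_depth: float, read_length: float, genome_size: float):
    """Return (uncovered fraction, expected gaps, rough mean gap size in bp)."""
    num_reads = mean_depth * genome_size / read_length
    fraction = math.exp(-mean_depth)
    gaps = num_reads * fraction
    # Expected uncovered bases spread over the expected number of gaps.
    mean_gap_bp = genome_size * fraction / gaps
    return fraction, gaps, mean_gap_bp


# Same 5x mean depth over a hypothetical 1 Mb genome, three read lengths.
for read_length in (100, 150, 250):
    fraction, gaps, mean_gap_bp = gap_summary(5.0, read_length, 1_000_000)
    print(
        f"{read_length:>3} bp reads: {fraction:.4%} uncovered, "
        f"{gaps:.1f} gaps, ~{mean_gap_bp:.0f} bp per gap"
    )
```

The uncovered fraction is identical on every row, while longer reads give fewer, larger gaps.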