Marker Coverage of the Genome

Source: Statistical Genomics: Linkage, mapping and QTL analysis. B.H. Liu. Pg. 349

“Marker coverage is the simple ratio between genomic map length and the total genome length. Map density is the average or the maximum distance between adjacent markers”

The total genome length is the sum of the lengths of all chromosomes in the genome. Suppose we have three non-homologous chromosomes, each 100 cM long. Then the total genome length is $L = L_1 + L_2 + L_3 = 100 + 100 + 100 = 300$ cM. The soybean genome is estimated to be 3000 cM, so for soybean $L = 3000$ cM.

Suppose we want the distance between adjacent markers to be at most 2d, and let d = 10 cM. If markers are evenly spaced on the genome, then the number of markers needed is n = L/2d. For soybean, 100% marker coverage would be achieved with
n = L/2d = 3000 cM / 20 cM = 150 markers, provided that we had prior knowledge of the marker locations so that we could choose markers evenly spaced at 20 cM intervals.
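The arithmetic above can be sketched in a few lines of Python. The function name `evenly_spaced_markers` and the variable names are illustrative choices, not part of the source; the sketch assumes perfectly even spacing of 2d between adjacent markers.

```python
def evenly_spaced_markers(genome_length_cm, d_cm):
    """Markers needed when adjacent markers sit exactly 2d apart: n = L / (2d)."""
    return genome_length_cm / (2 * d_cm)


# Total genome length is the sum of the chromosome lengths (three 100 cM chromosomes).
chromosomes_cm = [100, 100, 100]
total_length_cm = sum(chromosomes_cm)
print(total_length_cm)  # 300

# Soybean example from the text: L = 3000 cM, d = 10 cM.
soybean_markers = evenly_spaced_markers(3000, 10)
print(soybean_markers)  # 150.0
```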

The marker coverage (c) is given by the following formula:
$$c = \frac{\text{genomic map length}}{\text{total map length}}$$

However, markers are not evenly distributed across the genome. In some cases we do not have a map to allow us to identify the map position of the markers. If we had no prior knowledge of the marker positions and had to use random markers, then we could use other formulas to determine how many markers we need to achieve a certain marker coverage. The proportion of the genome that is not covered by a marker located within d cM is
$$1 - c = \left(1 - \frac{2d}{L}\right)^{n} \approx e^{-2dn/L}.$$

Marker coverage for randomly distributed markers can then be estimated using the formula:
$$c = 1 - e^{-2dn/L},$$
where 2d is the distance between markers, n is the number of randomly distributed markers, and L is the total map length.
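As a rough illustration, the exponential formula can be compared with the product form it approximates. This sketch assumes that each of the n random markers independently leaves a fraction (1 - 2d/L) of the genome uncovered, so the uncovered fraction is (1 - 2d/L)^n; the function names are hypothetical.

```python
import math


def coverage_exact(n, d_cm, genome_length_cm):
    """Coverage under the assumed product form: c = 1 - (1 - 2d/L)^n."""
    return 1 - (1 - 2 * d_cm / genome_length_cm) ** n


def coverage_approx(n, d_cm, genome_length_cm):
    """Coverage from the exponential formula: c = 1 - exp(-2dn/L)."""
    return 1 - math.exp(-2 * d_cm * n / genome_length_cm)


# Compare both forms for a 3000 cM genome with d = 10 cM.
for n in (50, 150, 450):
    exact = coverage_exact(n, 10, 3000)
    approx = coverage_approx(n, 10, 3000)
    print(n, round(exact, 3), round(approx, 3))
```

When 2d is small relative to L, the two forms should agree closely, which is why the simpler exponential version is convenient for hand calculation.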
Example: Let L = 3000 cM, d = 10 cM, and n = 150 markers. Then
$$-\frac{2dn}{L} = -\frac{2(10)(150)}{3000} = -1,$$
$$c = 1 - e^{-2dn/L} = 1 - e^{-1} = 1 - 0.37 = 0.63 = 63\%.$$
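The 63% figure can be sanity-checked with a small Monte Carlo simulation. This is only a sketch under simplifying assumptions that are not in the source: the genome is treated as a single linear stretch of 3000 cM (ignoring chromosome boundaries), markers fall uniformly at random, and a point counts as covered if it lies within d cM of a marker. The function name, seed, and number of trials are arbitrary choices.

```python
import random


def simulated_coverage(n, d_cm, genome_length_cm, rng):
    """Fraction of a linear genome lying within d of at least one random marker."""
    positions = sorted(rng.uniform(0, genome_length_cm) for _ in range(n))
    covered = 0.0
    reach = 0.0  # right edge of the region already counted
    for x in positions:
        # Clip each marker's window to the genome and skip any part already counted.
        start = max(x - d_cm, reach, 0.0)
        end = min(x + d_cm, genome_length_cm)
        if end > start:
            covered += end - start
            reach = end
    return covered / genome_length_cm


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 200
mean_cov = sum(simulated_coverage(150, 10, 3000, rng) for _ in range(trials)) / trials
print(round(mean_cov, 2))
```

The average over many trials should land close to the 0.63 obtained from the formula, with small deviations from sampling noise and from the two ends of the linear genome.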
This example shows that 150 randomly distributed markers would cover 63% of the genome. Previously, we showed that if we could place markers at evenly spaced intervals of 20 cM we would only need 150 markers to get 100% coverage of the soybean genome of 3000 cM. This example shows the advantage of being able to use prior knowledge of marker location such that 150 markers would completely cover the genome, while random markers would only cover 63% of the genome. We can use the formula:
$$n = \frac{-L \ln(1 - c)}{2d}$$
to determine how many markers we need to cover a proportion c of the genome. Suppose that we want to know how many random markers are required to get c = 95% coverage of the soybean genome, with 2d = 20 cM as the target distance between markers.

Using the formula,
$$n = \frac{-L \ln(1 - c)}{2d} = \frac{-3000 \ln(1 - 0.95)}{20} = \frac{-3000(-2.996)}{20} \approx 449 \text{ markers}.$$

This result shows that we only needed 150 evenly spaced markers to cover 100% of the soybean genome with 20 cM distance between markers, but we would need 449 markers to get 95% coverage of the soybean genome using random markers.
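The same calculation can be written as a short Python function, which makes it easy to try other coverage targets. The function name `random_markers_needed` is an illustrative choice; the formula is the one given above, n = -L ln(1 - c) / (2d).

```python
import math


def random_markers_needed(c, d_cm, genome_length_cm):
    """Random markers needed for coverage c: n = -L * ln(1 - c) / (2d)."""
    return -genome_length_cm * math.log(1 - c) / (2 * d_cm)


# Soybean example: 95% coverage, d = 10 cM, L = 3000 cM.
n_95 = random_markers_needed(0.95, 10, 3000)
print(round(n_95, 1))  # 449.4

# Plugging n back into c = 1 - exp(-2dn/L) recovers the target coverage.
recovered_c = 1 - math.exp(-2 * 10 * n_95 / 3000)
print(round(recovered_c, 2))  # 0.95
```

The unrounded value is slightly above 449, so in practice one might round up to 450 markers to be sure of reaching the 95% target.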

Copyright 2000©, Ted Helms
