 

The Simple Ratio

Marker Coverage of the Genome
Source: Statistical Genomics: Linkage, mapping and QTL analysis. B.H. Liu. Pg. 349

“Marker coverage is the simple ratio between genomic map length and the total genome length. Map density is the average or the maximum distance between adjacent markers.”

The total genome length, L, is the sum of the lengths of all chromosomes in the genome. Suppose we have three non-homologous chromosomes with lengths of 100 cM each. Then the total genome length is: L = L1 + L2 + L3 = 100 cM + 100 cM + 100 cM = 300 cM. The soybean genome is estimated to be 3000 cM, so for soybean L = 3000 cM.

Suppose we want the distance between adjacent markers to be at most 2d, and let d = 10 cM. If markers are evenly spaced on the genome, then the number of markers needed is n = L/2d. For soybean, 100% marker coverage would be achieved with n = L/2d = 3000 cM / 20 cM = 150 markers, provided that we had prior knowledge of the marker locations and could choose markers evenly spaced at 20 cM intervals.
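The even-spacing arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `evenly_spaced_marker_count` is hypothetical, and rounding up when L is not an exact multiple of 2d is an assumption added here, not something the source states.

```python
import math


def evenly_spaced_marker_count(genome_length_cm: float, d_cm: float) -> int:
    """Markers needed so adjacent markers are at most 2*d_cm apart: n = L / (2d)."""
    if genome_length_cm <= 0 or d_cm <= 0:
        raise ValueError("genome length and d must be positive")
    # Assumption: round up so the spacing never exceeds 2d
    # when L is not an exact multiple of 2d.
    return math.ceil(genome_length_cm / (2 * d_cm))


# Total genome length for three 100 cM chromosomes.
total_length_cm = sum([100, 100, 100])
print(total_length_cm)                       # 300

# Soybean: L = 3000 cM, d = 10 cM.
print(evenly_spaced_marker_count(3000, 10))  # 150
```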

The marker coverage (c) is given by the following formula:

$$c = \frac{\text{genomic map length}}{\text{total genome length}}$$
 
However, markers are not evenly distributed across the genome, and in some cases we have no map from which to identify marker positions. If we had no prior knowledge of the marker positions and had to use random markers, then we could use other formulas to determine how many markers we need to achieve a certain marker coverage. With n randomly placed markers, the proportion of the genome that is not within d cM of any marker is

$$1 - c = \left(1 - \frac{2d}{L}\right)^{n}$$

Then marker coverage can be estimated for the case of randomly distributed markers using the formula:

$$c = 1 - e^{-2dn/L}$$

where 2d is the maximum desired distance between adjacent markers, n is the number of randomly distributed markers, and L is the total genome length.
Example: Let L = 3000 cM, d = 10 cM, and n = 150 markers. Then

$$-\frac{2dn}{L} = -\frac{2(10)(150)}{3000} = -1$$

$$c = 1 - e^{-2dn/L} = 1 - e^{-1} = 1 - 0.37 = 0.63 = 63\%$$
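As a rough check on this example, the coverage formula can be evaluated in Python. The function names are hypothetical. The second function is an assumption on my part: it treats each random marker as covering a fraction 2d/L of the genome and uses the product form (1 - 2d/L)^n, which the exponential formula approximates when 2d is small relative to L.

```python
import math


def random_marker_coverage(n_markers: int, genome_length_cm: float, d_cm: float) -> float:
    """Coverage from the exponential formula c = 1 - exp(-2dn/L)."""
    return 1.0 - math.exp(-2.0 * d_cm * n_markers / genome_length_cm)


def random_marker_coverage_product(n_markers: int, genome_length_cm: float, d_cm: float) -> float:
    """Assumed product form: c = 1 - (1 - 2d/L)**n."""
    return 1.0 - (1.0 - 2.0 * d_cm / genome_length_cm) ** n_markers


c_exp = random_marker_coverage(150, 3000, 10)
c_prod = random_marker_coverage_product(150, 3000, 10)
print(f"exponential formula: {c_exp:.3f}")   # 0.632
print(f"product form:        {c_prod:.3f}")  # 0.633
```

Under these assumptions the two forms agree to about two decimal places for the soybean numbers, so the exponential formula is a convenient working approximation.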

This example shows that 150 randomly distributed markers would cover only 63% of the genome, whereas 150 markers placed at evenly spaced 20 cM intervals would cover 100% of the 3000 cM soybean genome. This is the advantage of knowing marker locations in advance. We can use the formula:

$$n = \frac{-L \ln(1 - c)}{2d}$$

to determine how many markers we need to cover a proportion c of the genome. Suppose that we want to determine how many markers will be required to get c = 95% coverage of the genome when we are using random markers and we want each marker to be within 20 cM of the other markers.

Using the formula:

$$n = \frac{-L \ln(1 - c)}{2d} = \frac{-3000 \ln(1 - 0.95)}{20} = \frac{-3000(-2.996)}{20} \approx 449 \text{ markers}$$

This result shows that we only needed 150 evenly spaced markers to cover 100% of the soybean genome with 20 cM distance between markers, but we would need 449 markers to get 95% coverage of the soybean genome using random markers.
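The inverse calculation can be sketched the same way. The function name `random_markers_needed` is hypothetical; it simply evaluates n = -L ln(1 - c) / (2d) and returns the unrounded value so that the rounding choice stays visible.

```python
import math


def random_markers_needed(coverage: float, genome_length_cm: float, d_cm: float) -> float:
    """Random markers needed for a target coverage: n = -L * ln(1 - c) / (2d)."""
    if not 0.0 <= coverage < 1.0:
        raise ValueError("coverage must be in [0, 1); 100% is never reached exactly")
    return -genome_length_cm * math.log(1.0 - coverage) / (2.0 * d_cm)


n = random_markers_needed(0.95, 3000, 10)
print(f"{n:.1f}")     # 449.4
print(math.ceil(n))   # 450
```

The unrounded value is about 449.4, so 449 markers fall marginally short of 95% under this formula and 450 is the first whole number that reaches it. Either way, this is roughly three times the 150 evenly spaced markers.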

Number of Progeny Required for Minimum Confidence Interval

Source: Statistical Genomics: Linkage, mapping and QTL analysis. B.H. Liu. Pg. 192

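Let c be the maximum width of the 95% confidence interval for our estimate of p (this c is not the marker coverage used in the previous section); let i = I(p) be the expected information content per individual, which varies with p; and let N be the number of individuals required to obtain the desired confidence interval. For a 95% confidence interval, we know that Z_{α/2} = 1.96, so the full width of the interval is 2(1.96) = 3.92 standard errors. The standard error of the estimate is approximately 1/√(N·I(p)), so requiring 3.92/√(N·I(p)) to be less than c gives the formula.

The formula is as follows:

$$N > \frac{(3.92)^2}{c^2 I(p)}$$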

Example: For the testcross family with repulsion linkage and p = 0.4, we have shown that

$$I(p) = i = \frac{1}{p(1-p)} = \frac{1}{0.4(0.6)} = 4.167$$
        
Now let us find the number of individuals required for a 95% confidence interval for p = 0.4 in the testcross family when we want the width of the confidence interval to be c = 0.02. Using the formula:

$$N > \frac{(3.92)^2}{c^2 I(p)} = \frac{(3.92)^2}{(0.02)^2(4.167)} \approx 9220 \text{ individuals}$$

Example: Let us find the number of individuals needed in our family when we estimate p for the testcross with p = 0.05 and we want the width of the 95% confidence interval to be c = 0.02. First we need the information content:

$$I(p) = i = \frac{1}{p(1-p)} = \frac{1}{0.05(1 - 0.05)} = 21.05$$

Then

$$N > \frac{(3.92)^2}{(0.02)^2(21.05)} \approx 1825 \text{ individuals}$$
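The testcross calculation for p = 0.05 can be reproduced with a short Python sketch. The function names `information_testcross` and `required_progeny` are hypothetical, and the default z = 1.96 corresponds to the 95% interval used in the text.

```python
def information_testcross(p: float) -> float:
    """Expected information per individual for the testcross: I(p) = 1 / (p(1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 1.0 / (p * (1.0 - p))


def required_progeny(ci_width: float, information: float, z: float = 1.96) -> float:
    """Lower bound on N from N > (2z)^2 / (c^2 * I(p)), where c is the interval width."""
    if ci_width <= 0 or information <= 0:
        raise ValueError("interval width and information must be positive")
    return (2.0 * z) ** 2 / (ci_width ** 2 * information)


info = information_testcross(0.05)
n_min = required_progeny(0.02, info)
print(round(info, 2))  # 21.05
print(round(n_min))    # 1825
```

Because N scales with 1/c², halving the interval width in this sketch quadruples the required family size.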
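Example: Let us find N when we have p = 0.4 for the F2 family in repulsion-phase linkage, again with c = 0.02.

$$I(p) = \frac{2(1 + 2p^2)}{(1 - p^2)(2 + p^2)} = \frac{2(1.32)}{(0.84)(2.16)} \approx 1.455$$

Then

$$N > \frac{(3.92)^2}{(0.02)^2(1.455)} \approx 26400 \text{ individuals}$$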
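A similar sketch covers the F2 repulsion-phase case. The function names are hypothetical, and the information function is transcribed from the formula in this example rather than derived independently.

```python
def information_f2_repulsion(p: float) -> float:
    """I(p) = 2(1 + 2p^2) / ((1 - p^2)(2 + p^2)), as given for the F2 repulsion case."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    p2 = p * p
    return 2.0 * (1.0 + 2.0 * p2) / ((1.0 - p2) * (2.0 + p2))


def required_progeny(ci_width: float, information: float, z: float = 1.96) -> float:
    """Lower bound on N from N > (2z)^2 / (c^2 * I(p)), where c is the interval width."""
    return (2.0 * z) ** 2 / (ci_width ** 2 * information)


info_f2 = information_f2_repulsion(0.4)
n_f2 = required_progeny(0.02, info_f2)
print(f"I(0.4) = {info_f2:.3f}")  # 1.455
print(f"N > {n_f2:.0f}")
```

With these inputs the bound comes out a little over 26,000 individuals. The F2 repulsion family carries much less information per individual at p = 0.4 than the testcross, which is why it needs a far larger family for the same interval width.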
We can use the above formulas to plan experiments to have a pre-determined level of precision.


Copyright 2000©, Ted Helms
