The Simple Ratio
Marker Coverage of the Genome
Source: Statistical Genomics: Linkage, mapping and QTL
analysis. B.H. Liu. Pg. 349
“Marker coverage is the simple ratio between genomic
map length and the total genome length. Map density
is the average or the maximum distance between adjacent
markers” The total genome length is the sum of the lengths
of each chromosome in the genome. Suppose we have three
non-homologous chromosomes with lengths of 100 cM each.
Then the total genome length is:
$$L = L_1 + L_2 + L_3 = 100 \text{ cM} + 100 \text{ cM} + 100 \text{ cM} = 300 \text{ cM}$$
The soybean genome is estimated to be 3000 cM, so for
soybean L = 3000 cM.
Suppose we want the distance between markers to be at
most 2d, and let d = 10 cM. If markers are evenly spaced
on the genome, then the number of markers needed is
n = L/2d. 100% marker coverage of the soybean genome would
be achieved with n = L/2d = 3000 cM / 20 cM = 150 markers,
provided that we had prior knowledge of the location
of the markers such that we could use markers that were
evenly spaced at 20 cM intervals.
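The arithmetic for the evenly spaced case can be checked with a short Python sketch. The function names `total_genome_length` and `evenly_spaced_marker_count` are illustrative and do not come from the source; rounding up to a whole marker is an added assumption for cases where L/2d is not an integer.

```python
import math


def total_genome_length(chromosome_lengths_cm):
    """Sum the chromosome lengths (in cM) to get the total genome length L."""
    return sum(chromosome_lengths_cm)


def evenly_spaced_marker_count(genome_length_cm, d_cm):
    """Number of evenly spaced markers n = L / (2d).

    Assumption (not in the source): round up to a whole marker
    when L / (2d) is not an integer.
    """
    return math.ceil(genome_length_cm / (2 * d_cm))


# Three chromosomes of 100 cM each, as in the text.
print(total_genome_length([100, 100, 100]))

# Soybean: L = 3000 cM, d = 10 cM (markers at most 2d = 20 cM apart).
print(evenly_spaced_marker_count(3000, 10))
```

With the values from the text this gives L = 300 cM for the three-chromosome example and n = 150 markers for soybean.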
The marker coverage (c) is given by the following formula:
$$c = \frac{\text{genomic map length}}{\text{total map length}}$$
However, markers are not evenly distributed across the
genome. In some cases we do not have a map to allow
us to identify the map position of the markers. If we
had no prior knowledge of the marker positions and had
to use random markers, then we could use other formulas
to determine how many markers we need to achieve a certain
marker coverage. The proportion of the genome that is
not covered by a marker located within d cM is
$$1 - c = \left(1 - \frac{2d}{L}\right)^{n} \approx e^{-2dn/L}.$$
Marker coverage can then be estimated for the case of
randomly distributed markers using the formula:
$$c = 1 - e^{-2dn/L}$$
where 2d is the distance between markers;
n is the number of randomly distributed markers; and
L is the total map length.
Example: Let L = 3000 cM; d = 10 cM; and n =
150 markers. Then
$$-\frac{2dn}{L} = -\frac{2(10)(150)}{3000} = -1$$
$$c = 1 - e^{-2dn/L} = 1 - e^{-1} = 1 - 0.37 = 0.63 = 63\%$$
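The coverage estimate for random markers can be reproduced with a minimal Python sketch. It assumes the uncovered proportion takes the form (1 - 2d/L)^n, which is the expression the exponential formula approximates; the function names are hypothetical and not taken from the source.

```python
import math


def random_marker_coverage(n, d_cm, genome_length_cm):
    """Exponential estimate of coverage: c = 1 - exp(-2dn/L)."""
    return 1.0 - math.exp(-2.0 * d_cm * n / genome_length_cm)


def random_marker_coverage_power(n, d_cm, genome_length_cm):
    """Assumed underlying form: c = 1 - (1 - 2d/L)^n."""
    return 1.0 - (1.0 - 2.0 * d_cm / genome_length_cm) ** n


# Soybean example from the text: L = 3000 cM, d = 10 cM, n = 150 markers.
approx = random_marker_coverage(150, 10, 3000)
power = random_marker_coverage_power(150, 10, 3000)
print(f"exponential estimate: {approx:.3f}")
print(f"power form:           {power:.3f}")
```

For the soybean example the two forms agree to within about 0.001 (roughly 0.632 and 0.633), which both round to the 63% quoted in the example.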
This example shows that 150 randomly distributed markers
would cover 63% of the genome. Previously, we showed
that if we could place markers at evenly spaced intervals
of 20 cM we would only need 150 markers to get 100%
coverage of the soybean genome of 3000 cM. This example
shows the advantage of being able to use prior knowledge
of marker location such that 150 markers would completely
cover the genome, while random markers would only cover
63% of the genome. We can use the formula
$n = -L \ln(1-c) / (2d)$
to determine how many markers we need to cover a proportion
c of the genome. Suppose that we want to determine how
many markers will be required to get c = 95% coverage
of the genome when we are using random markers and we
want each marker to be within 20 cM of the other markers.
Using the formula:
$$n = \frac{-L \ln(1-c)}{2d} = \frac{-3000 \ln(1-0.95)}{20} = \frac{-3000(-2.995)}{20} \approx 449 \text{ markers}$$
This result shows that we only needed 150 evenly spaced
markers to cover 100% of the soybean genome with 20
cM distance between markers, but we would need 449 markers
to get 95% coverage of the soybean genome using random
markers.
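The marker count for a target coverage can be sketched in Python by solving c = 1 - e^(-2dn/L) for n. The function name `markers_for_coverage` is illustrative only; the function returns the unrounded value so that the choice of rounding stays visible.

```python
import math


def markers_for_coverage(coverage, d_cm, genome_length_cm):
    """Random markers needed for a target coverage: n = -L ln(1 - c) / (2d).

    Returns the unrounded value; `coverage` must lie in [0, 1).
    """
    if not 0.0 <= coverage < 1.0:
        raise ValueError("coverage must be in [0, 1)")
    return -genome_length_cm * math.log(1.0 - coverage) / (2.0 * d_cm)


# Soybean example from the text: c = 0.95, d = 10 cM, L = 3000 cM.
n = markers_for_coverage(0.95, 10, 3000)
print(f"unrounded n: {n:.2f}")
print(f"rounded up:  {math.ceil(n)}")
```

The unrounded result is about 449.4, in line with the 449 markers quoted in the text; rounding up to 450 is the conservative choice if coverage must reach at least 95%. Because ln(1 - c) diverges as c approaches 1, no finite number of random markers guarantees 100% coverage under this formula.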
Number of Progeny Required for Minimum
Confidence Interval
Source: Pg. 192 Statistical Genomics: Linkage, mapping
and QTL analysis. B.H. Liu.
Let c = the maximum range of the 95% confidence interval
for our estimate of p; i = I(p) = the expected information
content per individual, which varies with p; and N = the
number of individuals required to obtain the desired
confidence interval. For the case of a 95% confidence
interval, we know that $Z_{\alpha} = 1.96$, so
2(1.96) = 3.92.
The formula is as follows:
$$N > \frac{(3.92)^2}{c^2 I(p)}$$
Example: For the testcross family with repulsion
linkage and p = 0.4, we have shown that
$$I(p) = i = \frac{1}{p(1-p)} = \frac{1}{0.4(0.6)} = 4.167$$
Now let us find the number of individuals required for
a 95% confidence interval for p = 0.4 in the case of
testcross family when we want the size of the confidence
interval to be c = 0.02. Using the formula:
$$N > \frac{(3.92)^2}{c^2 I(p)} = \frac{(3.92)^2}{(0.02)^2 (4.167)} \approx 9219$$
so at least 9220 individuals are needed.
Example: Let us find the number of individuals
needed in our family when we estimate p for the testcross
when p = 0.05 and we want the 95% confidence interval
to be c = 0.02.
$$N > \frac{(3.92)^2}{(0.02)^2 (21.05)} \approx 1825$$
or roughly 1830 individuals. We knew that I(p) = 21.05 because
$$I(p) = \frac{1}{p(1-p)} = \frac{1}{0.05(1-0.05)} = 21.05 = i$$
Example: Let us find N when we have p = 0.4 for
the F2 family in repulsion-phase linkage.
$$I(p) = \frac{2(1+2p^2)}{(1-p^2)(2+p^2)} = \frac{2.64}{1.8144} \approx 1.455$$
then
$$N > \frac{(3.92)^2}{(0.02)^2 (1.455)} \approx 26400 \text{ individuals}$$
We can use the above formulas to plan experiments to
have a pre-determined level of precision.
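The sample-size formula and the two information functions used in the examples above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the formula N > (2Z)^2 / (c^2 I(p)) with Z = 1.96 for a 95% confidence interval.

```python
Z_95 = 1.96  # normal deviate for a 95% confidence interval


def info_testcross(p):
    """Information per individual for the testcross: I(p) = 1 / (p(1 - p))."""
    return 1.0 / (p * (1.0 - p))


def info_f2_repulsion(p):
    """Information per individual for the F2 family in repulsion phase:
    I(p) = 2(1 + 2p^2) / ((1 - p^2)(2 + p^2))."""
    return 2.0 * (1.0 + 2.0 * p ** 2) / ((1.0 - p ** 2) * (2.0 + p ** 2))


def progeny_lower_bound(ci_width, info, z=Z_95):
    """Lower bound on N: (2z)^2 / (c^2 I(p)); N must exceed this value."""
    return (2.0 * z) ** 2 / (ci_width ** 2 * info)


# The three examples from the text, all with c = 0.02.
examples = [
    ("testcross, p = 0.4", info_testcross(0.4)),
    ("testcross, p = 0.05", info_testcross(0.05)),
    ("F2 repulsion, p = 0.4", info_f2_repulsion(0.4)),
]
for label, info in examples:
    bound = progeny_lower_bound(0.02, info)
    print(f"{label}: I(p) = {info:.3f}, N > {bound:.0f}")
```

Evaluated without intermediate rounding, the bounds come out at roughly 9220, 1825 and 26400 individuals. The comparison shows how much less information per individual the F2 repulsion-phase family carries at p = 0.4 than the testcross does.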