PLSC 781: Quantitative Genetics
Flanking Markers to Map QTLs
Reference: Haley, C.S. and S.A. Knott. 1992. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315-324.
The position of a QTL can be determined only when there are markers linked on both sides of it. We showed previously that when only one marker is linked to the QTL, the effect of the QTL and its position are confounded, so the position of the QTL cannot be determined. Markers linked to a QTL on both sides of it are called flanking markers. We will show that the closer our estimate of the QTL position is to the actual QTL position, the larger the F-value for the regression becomes. The position at which the regression F-value is maximized corresponds to the peak LOD score from a program such as Mapmaker, which uses the maximum likelihood approach. The least squares solution used here to estimate the position of the QTL is easier to understand than the maximum likelihood method, and the results are very similar.
Let r represent the observed recombination fraction between the two markers A and B; 2r is the probability of a crossover event between the two loci. Suppose A1 and B1 are linked in coupling phase. The probability of an A1B1 gamete is (1-r)/2, and the probability of an A2B2 gamete is also (1-r)/2. The probability of a gamete that results from recombination, A1B2 or A2B1, is r/2 in either case.
Suppose we have a QTL located between the A and B marker loci. There are two alleles at the QTL, labeled Q1 and Q2. The observed recombination fraction between the A and Q loci is rA, and the observed recombination fraction between the Q and B loci is rB. Assume that A1, Q1, and B1 are all linked in coupling phase.
Let (1-2r) = (1-2rA)(1-2rB). This relationship assumes no crossover interference, and it reduces to r = rA + rB - 2rArB = rA(1-rB) + (1-rA)rB. In words, the A and B markers show a recombinant combination only when exactly one of the two intervals (A to Q, or Q to B) has recombined. Recombination in both intervals, a double crossover, occurs with probability rArB and restores the parental combination of the two markers; the sum rA + rB counts it twice when it should not be counted at all, so 2rArB is subtracted. A double-crossover gamete such as A1Q2B1 occurs with probability rArB/2, and likewise an A2Q1B2 gamete occurs with probability rArB/2. A non-recombinant gamete such as A1Q1B1 occurs with probability (1-rA)(1-rB)/2, and the non-recombinant A2Q2B2 gamete occurs with the same probability. We develop the following table that provides the probability of each type of gamete.
| Gamete | Probability | Type |
|---|---|---|
| A1Q1B1 | $(1-r_A)(1-r_B)/2$ | parental |
| A1Q1B2 | $(1-r_A)r_B/2$ | single c.o. |
| A1Q2B1 | $r_Ar_B/2$ | double c.o. |
| A2Q1B1 | $(1-r_B)r_A/2$ | single c.o. |
| A1Q2B2 | $r_A(1-r_B)/2$ | single c.o. |
| A2Q2B1 | $(1-r_A)r_B/2$ | single c.o. |
| A2Q2B2 | $(1-r_A)(1-r_B)/2$ | parental |
| A2Q1B2 | $r_Ar_B/2$ | double c.o. |
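As a quick numerical check of the gamete table, the Python sketch below builds the eight gamete probabilities from rA and rB, confirms that they sum to 1, and confirms that summing over the QTL allele gives the two-marker frequency (1-r)/2 for A1B1. It assumes no crossover interference, as the relation (1-2r) = (1-2rA)(1-2rB) does; the function names `marker_r` and `gamete_probs` are hypothetical, introduced only for illustration.

```python
# Hypothetical helpers for illustration; assumes no crossover interference.

def marker_r(r_a: float, r_b: float) -> float:
    """Recombination fraction between markers A and B.

    Equivalent to (1 - 2r) = (1 - 2rA)(1 - 2rB), i.e. r = rA + rB - 2 rA rB.
    """
    return r_a + r_b - 2.0 * r_a * r_b


def gamete_probs(r_a: float, r_b: float) -> dict:
    """Probabilities of the eight A-Q-B gametes from an A1Q1B1/A2Q2B2 F1."""
    parental = (1 - r_a) * (1 - r_b) / 2  # no recombination in either interval
    single_a = r_a * (1 - r_b) / 2        # recombination between A and Q only
    single_b = (1 - r_a) * r_b / 2        # recombination between Q and B only
    double = r_a * r_b / 2                # recombination in both intervals
    return {
        "A1Q1B1": parental, "A2Q2B2": parental,
        "A1Q1B2": single_b, "A2Q2B1": single_b,
        "A2Q1B1": single_a, "A1Q2B2": single_a,
        "A1Q2B1": double, "A2Q1B2": double,
    }


if __name__ == "__main__":
    r_a = r_b = 0.0906
    g = gamete_probs(r_a, r_b)
    r = marker_r(r_a, r_b)
    print(f"r = {r:.4f}")                  # r = 0.1648
    print(f"sum = {sum(g.values()):.6f}")  # sum = 1.000000
    # Ignoring the QTL allele: A1B1 = A1Q1B1 + A1Q2B1 should equal (1 - r)/2.
    p_a1b1 = g["A1Q1B1"] + g["A1Q2B1"]
    print(f"P(A1B1) = {p_a1b1:.6f}, (1 - r)/2 = {(1 - r) / 2:.6f}")
```

The double-crossover gamete A1Q2B1 is what makes the marker-only frequency (1-r)/2 rather than (1-rA)(1-rB)/2.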
The probability of A1Q1 is (1-rA)/2 and the probability of A1Q2 is rA/2, so the probability of either A1Q1 or A1Q2 is (1-rA)/2 + rA/2 = ½.
The probability of the A1Q2B1 gamete is the probability of recombination between A and Q, which is rA, times the probability of recombination between Q and B, which is rB, multiplied by one-half. We multiply by one-half because a double crossover produces the A1Q2B1 and A2Q1B2 gametes in equal numbers: one-half of the double-crossover gametes are A1Q2B1 and the other half are A2Q1B2. The probability of an A1Q2B1 gamete is therefore rArB/2, and likewise the probability of an A2Q1B2 double-crossover gamete is rArB/2.
We can develop a table that shows the probability of each marker genotype without considering the genotype of the QTL that lies between the flanking markers. When we evaluate the progeny of a cross, we can determine only the marker genotypes of each individual or line that is scored. The trait is quantitative and is affected by the environment as well as by other loci; by definition, a quantitative trait is influenced by many loci. The QTL positioned between the two markers is only one of the many loci that influence the inheritance of the trait, so we cannot observe its genotype directly or separate its genetic effect from the genetic effects of all the other loci. However, after we estimate r and choose values for rA and rB, we can estimate the effect of the QTL by using all the data from the marker genotypes we have scored and the quantitative trait values we have measured.
The expected relative frequency of the A1Q2B1 gamete is the probability of a double crossover, rArB, multiplied by one-half: rArB(½) = rArB/2. Likewise, the expected relative frequency of the A2Q1B2 gamete is rArB(½) = rArB/2.
The marker genotypes A1A1B1B1 and A2A2B2B2 have expected frequencies of $(1-r)^2/4$, because the marker gametes A1B1 and A2B2 each have probability $(1-r)/2$. The marker genotypes A1A1B2B2 and A2A2B1B1 have expected frequencies of $r^2/4$, because the marker gametes A1B2 and A2B1 each have probability $r/2$. The probability of recombination is $r$ and the probability of a crossover event is $2r$, because each crossover involves only two of the four chromatids. The marker genotypes A1A1B1B2, A1A2B1B1, A1A2B2B2, and A2A2B1B2 each occur with probability $2 \cdot (r/2) \cdot (1-r)/2 = r(1-r)/2$, because each is formed by one crossover gamete (probability $r/2$) and one non-crossover gamete (probability $(1-r)/2$), and either gamete can come from either parent.
The marker genotype A1A2B1B2 can occur in two ways: by the union of two non-crossover gametes (A1B1 with A2B2), or by the union of two crossover gametes (A1B2 with A2B1). Because either gamete in a pair can come from either parent, the probability of two non-crossover gametes uniting is $2[(1-r)/2]^2 = (1-r)^2/2$, and the probability of two crossover gametes uniting is $2[r/2]^2 = r^2/2$. The probability of these two mutually exclusive events is the sum of the separate probabilities:

$$\frac{(1-r)^2}{2} + \frac{r^2}{2} = \frac{(1-r)^2 + r^2}{2}$$
| Marker genotype | Probability | Genotypic value, Q1Q1 | Genotypic value, Q1Q2 | Genotypic value, Q2Q2 |
|---|---|---|---|---|
| A1A1B1B1 | $(1-r)^2/4$ | a | d | -a |
| A1A1B1B2 | $(1-r)r/2$ | a | d | -a |
| A1A1B2B2 | $r^2/4$ | a | d | -a |
| A1A2B1B1 | $(1-r)r/2$ | a | d | -a |
| A1A2B1B2 | $(1-r)^2/2 + r^2/2$ | a | d | -a |
| A1A2B2B2 | $(1-r)r/2$ | a | d | -a |
| A2A2B1B1 | $r^2/4$ | a | d | -a |
| A2A2B1B2 | $r(1-r)/2$ | a | d | -a |
| A2A2B2B2 | $(1-r)^2/4$ | a | d | -a |
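The marker-genotype probabilities can be checked by enumerating every ordered pair of F1 gametes. This minimal sketch assumes F2 progeny from an A1B1/A2B2 F1 in coupling phase, which is what these frequencies describe; `marker_gametes` and `f2_marker_genotypes` are hypothetical names.

```python
from itertools import product


def marker_gametes(r: float) -> dict:
    """Two-marker gametes (A allele, B allele) from an A1B1/A2B2 F1."""
    return {(1, 1): (1 - r) / 2, (2, 2): (1 - r) / 2,  # non-crossover
            (1, 2): r / 2, (2, 1): r / 2}              # crossover


def f2_marker_genotypes(r: float) -> dict:
    """F2 marker-genotype probabilities from random union of F1 gametes."""
    gam = marker_gametes(r)
    probs = {}
    # Ordered (egg, pollen) pairs: a genotype formed from two different
    # gametes is counted twice, which is where the factor of 2 comes from.
    for (a1, b1), (a2, b2) in product(gam, repeat=2):
        a_lo, a_hi = sorted((a1, a2))
        b_lo, b_hi = sorted((b1, b2))
        key = f"A{a_lo}A{a_hi}B{b_lo}B{b_hi}"
        probs[key] = probs.get(key, 0.0) + gam[(a1, b1)] * gam[(a2, b2)]
    return probs


if __name__ == "__main__":
    r = 0.1648
    for genotype, p in sorted(f2_marker_genotypes(r).items()):
        print(f"{genotype}  {p:.5f}")
    print("r(1-r)/2 =", round(r * (1 - r) / 2, 5))
    print("[(1-r)^2 + r^2]/2 =", round(((1 - r) ** 2 + r ** 2) / 2, 5))
```

The enumeration gives $r(1-r)/2$ for each class heterozygous at one marker and $[(1-r)^2 + r^2]/2$ for the double heterozygote, and the nine classes sum to 1.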
We can also develop a table for the probability of each QTL-marker genotype.

| Marker genotype | QTL genotype | Probability |
|---|---|---|
| A1A1B1B1 | Q1Q1 | $(1-r_A)^2(1-r_B)^2/4$ |
| A1A1B1B1 | Q1Q2 | $2(1-r_A)(1-r_B)r_Ar_B/4$ |
| A1A1B1B1 | Q2Q2 | $r_A^2r_B^2/4$ |
| A1A1B1B2 | Q1Q1 | $(1-r_A)^2(1-r_B)r_B/2$ |
| A1A1B1B2 | Q1Q2 | $2[(1-r_A)(1-r_B)^2r_A/4 + r_Ar_B^2(1-r_A)/4]$ |
| A1A1B1B2 | Q2Q2 | $r_A^2r_B(1-r_B)/2$ |
| A1A1B2B2 | Q1Q1 | $(1-r_A)^2r_B^2/4$ |
| A1A1B2B2 | Q1Q2 | $2(1-r_A)r_B(1-r_B)r_A/4$ |
| A1A1B2B2 | Q2Q2 | $r_A^2(1-r_B)^2/4$ |
| A1A2B1B1 | Q1Q1 | $(1-r_A)(1-r_B)^2r_A/2$ |
| A1A2B1B1 | Q1Q2 | $2[r_A^2r_B(1-r_B)/4 + (1-r_A)^2r_B(1-r_B)/4]$ |
| A1A2B1B1 | Q2Q2 | $r_Ar_B^2(1-r_A)/2$ |
| A1A2B1B2 | Q1Q1 | $2[(1-r_A)(1-r_B)r_Ar_B/4 + (1-r_A)r_B(1-r_B)r_A/4]$ |
| A1A2B1B2 | Q1Q2 | $2[(1-r_A)^2(1-r_B)^2/4 + r_A^2r_B^2/4 + (1-r_A)^2r_B^2/4 + (1-r_B)^2r_A^2/4]$ |
| A1A2B1B2 | Q2Q2 | $2[r_Ar_B(1-r_A)(1-r_B)/4 + (1-r_A)r_Br_A(1-r_B)/4]$ |
| A1A2B2B2 | Q1Q1 | $(1-r_A)r_B^2r_A/2$ |
| A1A2B2B2 | Q1Q2 | $2[(1-r_A)^2r_B(1-r_B)/4 + r_A^2(1-r_B)r_B/4]$ |
| A1A2B2B2 | Q2Q2 | $r_A(1-r_B)^2(1-r_A)/2$ |
| A2A2B1B1 | Q1Q1 | $(1-r_B)^2r_A^2/4$ |
| A2A2B1B1 | Q1Q2 | $2r_Ar_B(1-r_A)(1-r_B)/4$ |
| A2A2B1B1 | Q2Q2 | $(1-r_A)^2r_B^2/4$ |
| A2A2B1B2 | Q1Q1 | $(1-r_B)r_A^2r_B/2$ |
| A2A2B1B2 | Q1Q2 | $2[(1-r_B)^2r_A(1-r_A)/4 + (1-r_A)r_B^2r_A/4]$ |
| A2A2B1B2 | Q2Q2 | $(1-r_A)^2r_B(1-r_B)/2$ |
| A2A2B2B2 | Q1Q1 | $r_A^2r_B^2/4$ |
| A2A2B2B2 | Q1Q2 | $2r_Ar_B(1-r_A)(1-r_B)/4$ |
| A2A2B2B2 | Q2Q2 | $(1-r_A)^2(1-r_B)^2/4$ |
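In the above table, the explicit coefficient of 2 in the heterozygous QTL rows arises because such a genotype is formed from two different gametes, either of which can come from either parent; in rows with a denominator of 2 rather than 4, the same factor of 2 has already been absorbed. The table gives joint probabilities of marker and QTL genotypes. The probability of a given QTL genotype conditional on a marker genotype is the joint probability divided by the probability of that marker genotype. Because of double crossovers, the probability of A1A1B1B1 is different from the probability of A1A1Q1Q1B1B1.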
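The distinction between joint and conditional probabilities can be made concrete with a short sketch. It enumerates all ordered pairs of three-locus F1 gametes (assuming no interference and F2 progeny from an A1Q1B1/A2Q2B2 F1), accumulates the joint marker-QTL probabilities, and divides by the marker-genotype probability. The names `gametes`, `joint_genotypes`, and `qtl_given_marker` are hypothetical.

```python
from itertools import product


def gametes(r_a: float, r_b: float) -> dict:
    """(A, Q, B) allele triples and their probabilities from an A1Q1B1/A2Q2B2 F1."""
    par = (1 - r_a) * (1 - r_b) / 2
    sa = r_a * (1 - r_b) / 2   # recombinant between A and Q only
    sb = (1 - r_a) * r_b / 2   # recombinant between Q and B only
    dbl = r_a * r_b / 2        # recombinant in both intervals
    return {(1, 1, 1): par, (2, 2, 2): par, (1, 1, 2): sb, (2, 2, 1): sb,
            (2, 1, 1): sa, (1, 2, 2): sa, (1, 2, 1): dbl, (2, 1, 2): dbl}


def joint_genotypes(r_a: float, r_b: float) -> dict:
    """Joint F2 probabilities keyed by (marker genotype, QTL genotype)."""
    gam = gametes(r_a, r_b)
    joint = {}
    for x, y in product(gam, repeat=2):  # ordered (egg, pollen) pairs
        a = sorted((x[0], y[0]))
        q = sorted((x[1], y[1]))
        b = sorted((x[2], y[2]))
        key = (f"A{a[0]}A{a[1]}B{b[0]}B{b[1]}", f"Q{q[0]}Q{q[1]}")
        joint[key] = joint.get(key, 0.0) + gam[x] * gam[y]
    return joint


def qtl_given_marker(r_a: float, r_b: float, marker: str) -> dict:
    """P(QTL genotype | marker genotype) = joint probability / marker probability."""
    joint = joint_genotypes(r_a, r_b)
    p_marker = sum(p for (m, _), p in joint.items() if m == marker)
    return {q: p / p_marker for (m, q), p in joint.items() if m == marker}


if __name__ == "__main__":
    r_a, r_b = 0.0906, 0.0906
    r = r_a + r_b - 2 * r_a * r_b
    joint = joint_genotypes(r_a, r_b)
    # The three QTL classes add up to P(A1A1B1B1) = (1 - r)^2 / 4 ...
    total = sum(p for (m, _), p in joint.items() if m == "A1A1B1B1")
    print(total, (1 - r) ** 2 / 4)
    # ... but P(A1A1Q1Q1B1B1) alone is smaller, because of double crossovers.
    print(joint[("A1A1B1B1", "Q1Q1")])
    print(qtl_given_marker(r_a, r_b, "A1A1B1B1"))
```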
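Example: We can determine the additive and dominance effects for the QTL, conditional on the specified marker genotype. For the A1A1B1B1 marker class, the probability-weighted value of the homozygous QTL genotypes is

$$a\,\frac{(1-r_A)^2(1-r_B)^2}{4} - a\,\frac{r_A^2r_B^2}{4} = \frac{a\left[(1-r_A)^2(1-r_B)^2 - r_A^2r_B^2\right]}{4}$$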
The probability-weighted value of the heterozygous QTL genotype within the A1A1B1B1 marker class is

$$d\,\frac{2(1-r_A)(1-r_B)r_Ar_B}{4}$$

The additive effect of the QTL, conditional on the marker genotype being A1A1B1B1, is

$$\frac{a(1-r_A)^2(1-r_B)^2/4 - a\,r_A^2r_B^2/4}{(1-r)^2/4}$$

because $(1-r)^2/4$ is the probability of the A1A1B1B1 marker genotype, and the conditional probability of the homozygous genotypes A1A1Q1Q1B1B1 or A1A1Q2Q2B1B1, given that the marker genotype is A1A1B1B1, is defined as

$$\frac{P(A_1A_1Q_1Q_1B_1B_1 \text{ or } A_1A_1Q_2Q_2B_1B_1)}{P(A_1A_1B_1B_1)}$$
The additive effect of the QTL, conditional on the A1A1B1B1 marker genotype, is

$$\frac{a\left[(1-r_A)^2(1-r_B)^2 - r_A^2r_B^2\right]}{(1-r)^2}$$

because the divisors of 4 cancel out. The dominance effect of the QTL, conditional on the A1A1B1B1 marker genotype, is

$$\frac{d\left[2r_Ar_B(1-r_A)(1-r_B)\right]}{(1-r)^2}$$
Now the recombination fractions between the QTL and the flanking markers can be estimated. We have nine equations, one for each marker class, because we observe the mean phenotypic value of each of the nine marker classes. We estimate the ‘a’ and ‘d’ genetic effects for a known ‘r’, with rA and rB chosen at each 1 cM step along the interval between the markers. For example, let r = 0.1648 and set rA = rB = 0.0906 for the first evaluation. Then for the A1A1B1B1 marker class the coefficient for ‘a’ is
$$\frac{(1-r_A)^2(1-r_B)^2 - r_A^2r_B^2}{(1-r)^2} = \frac{(1-0.0906)^2(1-0.0906)^2 - (0.0906)^2(0.0906)^2}{(1-0.1648)^2} \approx 0.980$$

The coefficient of ‘d’ is

$$\frac{2r_Ar_B(1-r_A)(1-r_B)}{(1-r)^2} = \frac{2(0.0906)(0.9094)(0.0906)(0.9094)}{(0.8352)^2} \approx 0.0194632$$
The coefficients for ‘a’ and ‘d’ are the relative frequencies used to determine the mean of each marker class.
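The two coefficients for the A1A1B1B1 class translate directly into code. This sketch (the function names are hypothetical) evaluates them at the example values. With r rounded to 0.1648 the ‘a’ coefficient is about 0.9804, whereas computing r exactly from rA and rB (r ≈ 0.16478) gives about 0.9803, so the last digit depends on that rounding.

```python
def coef_a_a1a1b1b1(r: float, r_a: float, r_b: float) -> float:
    """Coefficient of 'a' for the A1A1B1B1 marker class: P(Q1Q1|M) - P(Q2Q2|M)."""
    return ((1 - r_a) ** 2 * (1 - r_b) ** 2 - r_a ** 2 * r_b ** 2) / (1 - r) ** 2


def coef_d_a1a1b1b1(r: float, r_a: float, r_b: float) -> float:
    """Coefficient of 'd' for the A1A1B1B1 marker class: P(Q1Q2|M)."""
    return 2 * r_a * r_b * (1 - r_a) * (1 - r_b) / (1 - r) ** 2


if __name__ == "__main__":
    r_a = r_b = 0.0906
    for r in (0.1648, r_a + r_b - 2 * r_a * r_b):  # rounded r, then exact r
        print(f"r = {r:.5f}: a-coef = {coef_a_a1a1b1b1(r, r_a, r_b):.4f}, "
              f"d-coef = {coef_d_a1a1b1b1(r, r_a, r_b):.4f}")
```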
Then the following equations can be set up to solve for the ‘a’ and ‘d’ effects, and the regression F test statistic (Freg) is calculated for each set of rA and rB values. For rA = rB = 0.0906 the coefficients are:
| Marker genotype | Marker value | Coefficient of ‘a’ | Coefficient of ‘d’ |
|---|---|---|---|
| A1A1B1B1 | Y1111 | 0.9803 | 0.0195 |
| A1A1B1B2 | Y1112 | 0.4902 | 0.5 |
| A1A1B2B2 | Y1122 | 0.0 | 0.5 |
| A1A2B1B1 | Y1211 | 0.4902 | 0.5 |
| A1A2B1B2 | Y1212 | 0.0 | 0.9625 |
| A1A2B2B2 | Y1222 | -0.4902 | 0.5 |
| A2A2B1B1 | Y2211 | 0.0 | 0.5 |
| A2A2B1B2 | Y2212 | -0.4902 | 0.5 |
| A2A2B2B2 | Y2222 | -0.9803 | 0.0195 |
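Every row of this table can be generated the same way: for each marker class, the ‘a’ coefficient is P(Q1Q1 | marker) - P(Q2Q2 | marker) and the ‘d’ coefficient is P(Q1Q2 | marker). The sketch below computes all nine rows by enumerating F1 gamete pairs, assuming F2 progeny of an A1Q1B1/A2Q2B2 F1 and no crossover interference. Because r is implied exactly by rA and rB here, the first row comes out as 0.9803. The function name `coefficient_table` is hypothetical.

```python
from itertools import product


def coefficient_table(r_a: float, r_b: float) -> dict:
    """Map each marker genotype to its (a, d) coefficients.

    Assumes F2 progeny of an A1Q1B1/A2Q2B2 F1 and no crossover interference.
    """
    par = (1 - r_a) * (1 - r_b) / 2
    sa, sb, dbl = r_a * (1 - r_b) / 2, (1 - r_a) * r_b / 2, r_a * r_b / 2
    gam = {(1, 1, 1): par, (2, 2, 2): par, (1, 1, 2): sb, (2, 2, 1): sb,
           (2, 1, 1): sa, (1, 2, 2): sa, (1, 2, 1): dbl, (2, 1, 2): dbl}
    # Joint probabilities: marker -> [P(Q1Q1), P(Q1Q2), P(Q2Q2)]
    joint = {}
    for x, y in product(gam, repeat=2):  # ordered (egg, pollen) pairs
        a = sorted((x[0], y[0]))
        b = sorted((x[2], y[2]))
        marker = f"A{a[0]}A{a[1]}B{b[0]}B{b[1]}"
        n_q2 = (x[1] == 2) + (y[1] == 2)  # 0, 1 or 2 copies of Q2
        joint.setdefault(marker, [0.0, 0.0, 0.0])[n_q2] += gam[x] * gam[y]
    table = {}
    for marker, (q11, q12, q22) in sorted(joint.items()):
        total = q11 + q12 + q22  # marker-genotype probability
        table[marker] = ((q11 - q22) / total, q12 / total)
    return table


if __name__ == "__main__":
    for marker, (ca, cd) in coefficient_table(0.0906, 0.0906).items():
        print(f"{marker}  a: {ca:7.4f}  d: {cd:.4f}")
```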
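Each marker-class mean Yijkl can now be written in terms of ‘a’ and ‘d’. Apart from the overall mean m (the column of 1s in the X matrix below), the expected values are: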
Y1111 = 0.9803(a) + 0.0195(d)
Y1112 = 0.4902(a) + 0.5(d)
Y1122 = 0.5(d)
Y1211 = 0.4902(a) + 0.5(d)
Y1212 = 0.9625(d)
Y1222 = -0.4902(a) + 0.5(d)
Y2211 = 0.5(d)
Y2212 = -0.4902(a) + 0.5(d)
Y2222 = -0.9803(a) + 0.0195(d)
The test statistic for plotting the position of the QTL is

$$pF_{reg} = \frac{p \cdot MS(\text{regression})}{MS(\text{residual})}$$
To solve for the unknowns ‘a’ and ‘d’, together with the mean m, we use the least squares equations $B = (X'X)^{-1}X'Y$, where

$$B = \begin{bmatrix} m \\ a \\ d \end{bmatrix}, \qquad
Y = \begin{bmatrix} Y_{1111} \\ Y_{1112} \\ Y_{1122} \\ Y_{1211} \\ Y_{1212} \\ Y_{1222} \\ Y_{2211} \\ Y_{2212} \\ Y_{2222} \end{bmatrix}, \qquad
X = \begin{bmatrix}
1 & 0.9803 & 0.0195 \\
1 & 0.4902 & 0.5 \\
1 & 0 & 0.5 \\
1 & 0.4902 & 0.5 \\
1 & 0 & 0.9625 \\
1 & -0.4902 & 0.5 \\
1 & 0 & 0.5 \\
1 & -0.4902 & 0.5 \\
1 & -0.9803 & 0.0195
\end{bmatrix}$$
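A minimal pure-Python sketch of $B = (X'X)^{-1}X'Y$ follows. It solves the normal equations $(X'X)B = X'Y$ by Gaussian elimination rather than forming the inverse explicitly. The rows of X are the coefficients above; the Y values are generated from hypothetical effects m = 10, a = 2, d = 1 (made-up numbers, not data), so least squares should return those effects. The names `solve` and `least_squares` are illustrative.

```python
def solve(mat, vec):
    """Solve mat @ x = vec by Gaussian elimination with partial pivoting."""
    n = len(vec)
    aug = [list(row) + [v] for row, v in zip(mat, vec)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares(X, Y):
    """B = (X'X)^-1 X'Y, computed from the normal equations."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(p)]
    return solve(xtx, xty)


# Columns: 1 (for m), coefficient of a, coefficient of d; rows Y1111 ... Y2222.
X = [[1, 0.9803, 0.0195], [1, 0.4902, 0.5], [1, 0.0, 0.5],
     [1, 0.4902, 0.5], [1, 0.0, 0.9625], [1, -0.4902, 0.5],
     [1, 0.0, 0.5], [1, -0.4902, 0.5], [1, -0.9803, 0.0195]]

if __name__ == "__main__":
    m, a, d = 10.0, 2.0, 1.0  # hypothetical effects, for illustration only
    Y = [m + ca * a + cd * d for _, ca, cd in X]
    print(least_squares(X, Y))  # approximately [10.0, 2.0, 1.0]
```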
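Now B'X'Y is the sum of squares for regression, with p degrees of freedom (here p = 3, for m, a, and d); because the model includes m, this sum of squares is not corrected for the mean. Y'Y - B'X'Y is the residual sum of squares, with n - p degrees of freedom. We derive the following ANOVA table: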
| Source | Sums of Squares | df | Mean Squares | Description |
|---|---|---|---|---|
| model | B'X'Y | p | B'X'Y/p | regression |
| residual | Y'Y - B'X'Y | n - p | (Y'Y - B'X'Y)/(n - p) | residual |
| Total | Y'Y | n | | |
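The ANOVA quantities and the plotted statistic pFreg follow directly from B. This sketch treats the nine marker-class means as the n = 9 observations, matching the nine equations used here; with individual records, each individual's row of X would be the coefficient row of its marker class. As in the ANOVA table, B'X'Y is the uncorrected regression sum of squares. The Y values are hypothetical, made up for illustration, and the function names are illustrative.

```python
def solve(mat, vec):
    """Solve mat @ x = vec by Gaussian elimination with partial pivoting."""
    n = len(vec)
    aug = [list(row) + [v] for row, v in zip(mat, vec)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def regression_anova(X, Y):
    """ANOVA for Y = XB + e with uncorrected sums of squares (model includes m)."""
    n, p = len(Y), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(p)]
    B = solve(xtx, xty)
    ss_reg = sum(b * v for b, v in zip(B, xty))  # B'X'Y, p df
    ss_res = sum(y * y for y in Y) - ss_reg      # Y'Y - B'X'Y, n - p df
    ms_reg, ms_res = ss_reg / p, ss_res / (n - p)
    f_reg = ms_reg / ms_res
    return {"B": B, "SS_reg": ss_reg, "SS_res": ss_res,
            "df_res": n - p, "F": f_reg, "pF": p * f_reg}


# Columns: 1 (for m), coefficient of a, coefficient of d; rows Y1111 ... Y2222.
X = [[1, 0.9803, 0.0195], [1, 0.4902, 0.5], [1, 0.0, 0.5],
     [1, 0.4902, 0.5], [1, 0.0, 0.9625], [1, -0.4902, 0.5],
     [1, 0.0, 0.5], [1, -0.4902, 0.5], [1, -0.9803, 0.0195]]

if __name__ == "__main__":
    # Hypothetical marker-class means (made up for illustration, not data).
    Y = [12.1, 11.4, 10.3, 11.0, 11.2, 9.6, 10.6, 9.4, 8.1]
    for key, value in regression_anova(X, Y).items():
        print(key, value)
```

Because SS(mean) does not depend on rA and rB, the position that maximizes this uncorrected regression sum of squares is the same position that maximizes the mean-corrected one.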
The coefficients for ‘a’ and ‘d’ vary depending on the recombination fraction between marker A and the QTL (rA), between marker B and the QTL (rB), and between markers A and B (r).
In the last numerical example we set r = 0.1648 and rA = rB = 0.0906. The recombination fraction between the markers does not change: we must use the estimate of r obtained from the data. We then vary rA and rB to determine the position of the QTL. As rA varies, rB must also vary. With r fixed, we try a different value of rA and use the equation (1 - 2r) = (1 - 2rA)(1 - 2rB) to solve for rB. For example, with r = 0.1648 we set rA = 0.12 and solve (1 - 0.3296) = (1 - 0.24)(1 - 2rB), which gives rB = 0.05895. The new values rA = 0.12 and rB = 0.05895 result in a new F-value. We plot pFreg as we vary rA and rB. The values of rA and rB that maximize the F-value are the best estimates, and the point at which the F-value is maximized is the estimated position of the QTL between markers A and B.
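The search over positions can be sketched as follows. `rb_given_ra` solves (1 - 2r) = (1 - 2rA)(1 - 2rB) for rB and reproduces rB = 0.05895 for r = 0.1648 and rA = 0.12. To step through the interval in 1 cM increments, the sketch assumes Haldane's map function, r = [1 - exp(-2x/100)]/2 for a distance of x cM. This map function is consistent with the no-interference relation above, and under it 10 cM corresponds to r ≈ 0.0906 and 20 cM to r ≈ 0.1648, matching the example values. At each (rA, rB) pair the coefficients and pFreg would be recomputed, and the pair with the largest pFreg would be taken as the estimate. All function names are hypothetical.

```python
import math


def rb_given_ra(r: float, r_a: float) -> float:
    """Solve (1 - 2r) = (1 - 2rA)(1 - 2rB) for rB, holding r fixed."""
    return (1 - (1 - 2 * r) / (1 - 2 * r_a)) / 2


def haldane_r(cm: float) -> float:
    """Haldane map function (assumed): recombination fraction for x cM."""
    return (1 - math.exp(-2 * cm / 100)) / 2


def haldane_cm(r: float) -> float:
    """Inverse Haldane map function: map distance in cM for fraction r."""
    return -50 * math.log(1 - 2 * r)


def candidate_positions(r: float, step_cm: float = 1.0) -> list:
    """(cM from marker A, rA, rB) at fixed steps strictly inside the interval."""
    length = haldane_cm(r)
    positions = []
    k = 1
    while k * step_cm < length:
        x = k * step_cm
        r_a = haldane_r(x)
        positions.append((x, r_a, rb_given_ra(r, r_a)))
        k += 1
    return positions


if __name__ == "__main__":
    print(round(rb_given_ra(0.1648, 0.12), 5))  # 0.05895
    for x, r_a, r_b in candidate_positions(0.1648):
        print(f"{x:4.0f} cM  rA = {r_a:.4f}  rB = {r_b:.4f}")
```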
Copyright © 2000, Ted Helms