The Permutation Test is Used to Reduce
the Probability of Finding False Marker-QTL Associations
Citations:
Churchill, G.A., and R.W. Doerge. 1994. Empirical threshold
values for quantitative trait mapping. Genetics 138:963-971.
Knott, S.A., and C.S. Haley. 1992. Aspects of maximum
likelihood methods for the mapping of quantitative trait
loci in line crosses. Genet. Res. 60:139-151.
Lander, E.S. and D. Botstein. 1989. Mapping Mendelian
factors underlying quantitative traits using RFLP linkage
maps. Genetics 121:185-199.
Liu, B.H. 1998. Statistical Genomics: Linkage, mapping
and QTL analysis. Pg. 570. CRC Press, Boca Raton.
Synopsis:
1) The probability of Type I error is determined
for the entire experiment by using a random sample of
permutations.
2) The experiment-wise error rate is different for
each experiment. Factors that affect the experiment-wise
error rate include the sample size, the genome size of the
species, the number of markers evaluated, the number of
QTLs that influence the trait, and the magnitude of the
effects of the QTLs (Churchill and Doerge, 1994).
3) If the probability of Type I error is set at
5% for each marker-QTL association test, the probability
of identifying marker-QTL associations that do not exist
is unacceptably high.
4) If the point-wise Type I error rate is set at
5% and 100 marker intervals are evaluated, we would expect
to find 5 spurious marker-QTL associations even when no
QTL is present. That is, we would expect to report 5 markers
as associated with QTLs when none of those QTLs is real.
This assumes that each marker-QTL test is independent, as
in a sparse marker map.
5) If the experiment-wise Type I error rate is set
at 5% and 100 marker intervals are evaluated, we would
expect at least one spurious marker-QTL association in
5% of such experiments.
Experiment-wise Type I Error:
Let $\alpha$ = the probability of making a Type I error.
Then $1 - \alpha$ = the probability of making a correct
decision, that is, accepting Ho when Ho is true. If
$\alpha = 0.05$, then $1 - \alpha = 0.95$. If we evaluate
one marker-QTL association, the probability of making the
correct decision is 0.95. Suppose we test for a second
marker-QTL association, and this test is conducted on a
different chromosome than the first test. We have now made
two independent tests of Ho. The probability of making
two correct decisions is $(0.95)(0.95) = (0.95)^2 = 0.9025$.
The probability of making at least one wrong decision
(Type I error) is $1 - (0.95)^2 = 0.0975$ (Knott and Haley,
1992).
Suppose we have n independent tests of marker-QTL associations.
Then the probability of making at least one Type I error
is $P = 1 - (1 - \alpha)^n$, where P is the experiment-wise
Type I error rate. We can set $\alpha$ at some value other
than 0.05 for each individual test. The individual tests
for marker-QTL associations are sometimes referred to as
point-wise tests.
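
The formula above can be written as a short function. This is
a minimal sketch, not part of the original notes; the function
name experimentwise_error is chosen here for illustration, and
it assumes the n tests are independent.

```python
def experimentwise_error(alpha: float, n_tests: float) -> float:
    """Probability of at least one Type I error in n independent tests.

    Implements P = 1 - (1 - alpha)^n from the text.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# One test, then two independent tests, each at alpha = 0.05.
print(round(experimentwise_error(0.05, 1), 4))  # 0.05
print(round(experimentwise_error(0.05, 2), 4))  # 0.0975
```

The second value matches the two-chromosome case worked out
above.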
When we use a LOD score to test for the presence of a
marker-QTL association, we are testing Ho1: no QTL is
present, or Ho2: a QTL is present but is not linked to the
marker, versus HA: a QTL is present and linked to the marker
(Knott and Haley, 1992). If Ho is true, we would expect to
falsely reject Ho 5% of the time when $\alpha = 0.05$. This
means that for each marker interval-QTL test we would expect
to falsely identify a QTL in 5 out of 100 repetitions
of the experiment. The point-wise Type I error rate is
the probability of falsely identifying a marker interval-QTL
association within the single interval between two markers.
The experiment-wise Type I error rate
is the probability of falsely identifying a QTL anywhere in
the entire experiment. Each test for a marker interval-QTL
association is a separate test. When we have many markers
and many intervals, we are conducting a test of Ho at
each interval. The more marker intervals we evaluate for
the presence of a QTL, the more likely we are to falsely
identify a QTL somewhere in the genome. The experiment-wise
error rate is much higher than the point-wise error
rate, because the point-wise error rate applies only to a
single test of Ho.
Example 1: The first case is a sparse map, where the
markers are far enough apart that they are not linked.
Let m = the number of markers and $\alpha$ = the point-wise
error rate for each individual test of Ho. Then the
probability of identifying at least one false QTL is given by
$P = 1 - (1 - \alpha)^m$.
For our example, let m = 100 and $\alpha = 0.05$; then
$P = 1 - (0.95)^{100} = 0.994$. This means that, when no
QTL is present, there is a 99% chance of declaring at least
one marker-QTL association that is not real. If we set
$\alpha = 0.0001$, then $P = 1 - (0.9999)^{100} = 0.01$.
The experiment-wise error rate is the probability of making
at least one Type I error across all marker-QTL associations
that are tested in the whole experiment. The experiment-wise
error rate is now 0.01.
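
As a rough check on the sparse-map arithmetic, the following
sketch simulates many experiments in which Ho is true at every
marker and counts how often at least one test rejects. It is
an illustration only: the function name, the number of
simulated experiments, and the seed are assumptions made here,
and each test is modelled simply as an independent event with
probability alpha.

```python
import random


def simulate_experimentwise_error(alpha, n_tests, n_experiments=10000, seed=1):
    """Fraction of null experiments with at least one false positive."""
    rng = random.Random(seed)
    experiments_with_error = 0
    for _ in range(n_experiments):
        # Under Ho, each independent test rejects with probability alpha.
        if any(rng.random() < alpha for _ in range(n_tests)):
            experiments_with_error += 1
    return experiments_with_error / n_experiments


for alpha in (0.05, 0.0001):
    simulated = simulate_experimentwise_error(alpha, 100)
    formula = 1.0 - (1.0 - alpha) ** 100
    print(alpha, round(simulated, 3), round(formula, 3))
```

The simulated fractions should fall close to the values given
by the formula, with some sampling noise.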
Example 2:
For the case of dense markers, the markers are close enough
to be linked. In this case we use the formula provided
by Lander and Botstein (1989): $P = 1 - (1 - \alpha)^X$,
where X = (genome length in Morgans)/(distance between
markers in Morgans). In soybean the genome length is about
30 M; suppose we evaluate a marker interval every 20 cM.
Then X = 30/0.2 = 150. When we set the point-wise error
rate at $\alpha = 0.001$, then
$P = 1 - (1 - 0.001)^{150} = 0.14$,
which is still an unacceptably high probability of identifying
at least one QTL that does not exist. If we set the point-wise
error rate at $\alpha = 0.0001$, then
$P = 1 - (1 - 0.0001)^{150} = 0.015$. The experiment-wise
error rate is then about 0.015.
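
The dense-map calculation can be sketched in the same way. The
function names below are chosen here for illustration, and
distances are passed in centimorgans (an assumption made only
to keep the division exact); X is the number of marker
intervals spanning the genome.

```python
def effective_tests(genome_length_cm: float, spacing_cm: float) -> float:
    """X = genome length / distance between markers (same units)."""
    return genome_length_cm / spacing_cm


def experimentwise_error(alpha: float, n_tests: float) -> float:
    """P = 1 - (1 - alpha)^X."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Soybean example from the text: 30 M = 3000 cM, one interval every 20 cM.
x = effective_tests(3000.0, 20.0)
print(x)  # 150.0
for alpha in (0.001, 0.0001):
    print(alpha, round(experimentwise_error(alpha, x), 3))
```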
Determining the appropriate Type I error rate for individual
marker-QTL tests:
For the case of the sparse map, where markers are not
closely linked, the point-wise error rate should be
$\alpha = P/M$, where $\alpha$ is the point-wise
Type I error rate, P is the experiment-wise error rate,
and M is the number of markers evaluated.
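
The rule P/M is the Bonferroni approximation. For independent
tests, the formula $P = 1 - (1 - \alpha)^M$ can also be solved
exactly for the point-wise rate (sometimes called the Sidak
correction). The sketch below compares the two; the function
names are chosen here for illustration and do not come from
the cited papers.

```python
def pointwise_alpha_bonferroni(p_experiment: float, n_markers: int) -> float:
    """Point-wise rate from the rule alpha = P / M."""
    return p_experiment / n_markers


def pointwise_alpha_exact(p_experiment: float, n_markers: int) -> float:
    """Solve P = 1 - (1 - alpha)^M for alpha, assuming independent tests."""
    return 1.0 - (1.0 - p_experiment) ** (1.0 / n_markers)


print(pointwise_alpha_bonferroni(0.05, 100))
print(pointwise_alpha_exact(0.05, 100))
```

For P = 0.05 and M = 100 the two values differ only slightly,
and P/M is the slightly more conservative of the two.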
Example 1:
In this example we assume that we have a sparse map where
markers are not linked and each point-wise test is independent.
We set the experiment-wise error rate at P = 0.05 and
evaluate M = 100 markers. The point-wise error rate will be
$\alpha = 0.05/100 = 0.0005$. This corresponds to a LOD
score of $\frac{1}{2}(\log_{10} e)(Z_{P/M})^2$.
In this example, $\mathrm{LOD} = \frac{1}{2}(0.434)(3.28)^2 = 2.33$,
where $\log_{10} e = 0.434$, $P/M = 0.0005$, and
$Z_{P/M} = 3.28$. With a critical LOD of 2.33, each
individual test rejects Ho when it is true with probability
0.0005. Across the 100 tests we therefore expect about 0.05
false marker-QTL associations per experiment, that is, at
least one false QTL in roughly 5% of experiments. We can
check this with the equation:
$P = 1 - (1 - 0.0005)^{100} = 0.05$.
The experiment-wise error rate is the probability of making
at least one Type I error, which equals 0.05 when the
point-wise error rate is set at $\alpha = 0.0005$. Even
when the Type I error rate is set as low as 0.0005
for each individual marker interval-QTL association test,
the probability of at least one Type I error across the
100 markers is still P = 0.05. This is because the more
tests we make for marker-QTL associations, the greater the
chance that we will falsely reject Ho at least once when
it is actually true.
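
The conversion from a point-wise error rate to a LOD score can
be sketched as follows. This assumes that Z is the one-sided
upper-tail standard normal deviate for the point-wise rate,
which is an interpretation made here; the function name is
also chosen for illustration. The exact normal deviate for
0.0005 is about 3.29, so the computed LOD comes out near 2.35,
close to the rounded values used in the example.

```python
import math
from statistics import NormalDist


def lod_from_pointwise_alpha(alpha: float) -> float:
    """LOD = 1/2 * log10(e) * Z^2, with Z the upper-tail normal deviate."""
    z = NormalDist().inv_cdf(1.0 - alpha)  # one-sided critical value (assumption)
    return 0.5 * math.log10(math.e) * z ** 2


alpha = 0.05 / 100  # experiment-wise P = 0.05 spread over M = 100 markers
print(round(lod_from_pointwise_alpha(alpha), 2))
```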
Example 2: For the
case of the dense map, the critical LOD score is best
determined using the permutation test of Churchill and
Doerge (1994). Each experiment has a different sample
size, genome size, map density, and proportion of missing
data. In a mapping experiment, some markers are linked,
while other markers are not linked. This makes it difficult
to determine the critical value (LOD) for declaring a
marker-QTL association significant. Each mapping experiment
is unique in the number of markers and the linkage
relationships between markers. The permutation test is used
to develop an empirical distribution from which the critical
value of the LOD score is determined. The linkage
relationships between markers are maintained, but the linkage
relationships between markers and QTLs are eliminated
(Liu, 1998). The permutation test works by randomly
reassigning the phenotypic data among the individuals while
each individual's marker genotypes are left unchanged.
Shuffling only the phenotypes maintains the linkage
relationships between markers that are unique to that
experiment, and it produces data sets in which Ho is true.
If Ho is true, then a LOD score that results in Ho being
rejected in 5% of the permuted data sets provides a
P = 0.05 experiment-wise error rate. A sample of, say, 1000
different random permutations of the phenotypic data is
used, and a LOD score is calculated for each marker interval
for each permutation of the data. If Ho is true and
$\alpha = 0.05$, then we will falsely find a marker-QTL
association in about 50 of these 1000 random permutations
due to chance associations between markers and phenotypic
scores. For the point-wise critical value, the 1000 LOD
scores at a given marker interval are ranked from lowest
to highest. The 950th score in this ranking is the critical
value for rejecting Ho for that point-wise marker
interval-QTL association. For the experiment-wise critical
value, the highest LOD score across all marker intervals
is recorded for each permutation, and these 1000 maxima
are ranked from lowest to highest. The 950th value in this
ranking is the critical value that provides a P = 0.05
experiment-wise Type I error rate. The idea is that if
there is no relationship between markers and QTLs, Ho will
be rejected in 5% of experiments even though Ho is actually
true. We can find this critical LOD score by developing
data sets where Ho is true through repeated permutations
of the data.
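
The permutation procedure can be sketched in a few lines. This
is a simplified illustration under stated assumptions, not the
implementation of Churchill and Doerge (1994): the LOD here
comes from a single-marker regression with two genotype classes
coded 0 and 1, computed as (n/2) log10(RSS0/RSS1), rather than
from interval mapping. The function names, the simulated data,
and the seeds are all hypothetical.

```python
import math
import random


def lod_single_marker(genotypes, phenotypes):
    """Regression LOD for one marker: (n/2) * log10(RSS0 / RSS1)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)  # no-QTL model
    rss1 = 0.0  # model with a separate mean per genotype class
    for cls in (0, 1):
        group = [y for g, y in zip(genotypes, phenotypes) if g == cls]
        if group:
            mean = sum(group) / len(group)
            rss1 += sum((y - mean) ** 2 for y in group)
    return (n / 2.0) * math.log10(rss0 / rss1)


def permutation_thresholds(markers, phenotypes, n_perm=1000, level=0.95, seed=0):
    """Return (point-wise thresholds per marker, experiment-wise threshold).

    markers[j] holds the genotypes of all individuals at marker j.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    per_marker = [[] for _ in markers]
    maxima = []
    for _ in range(n_perm):
        # Shuffle phenotypes only: marker-marker linkage is preserved,
        # marker-phenotype association is destroyed (Ho is true).
        rng.shuffle(shuffled)
        lods = [lod_single_marker(g, shuffled) for g in markers]
        for j, lod in enumerate(lods):
            per_marker[j].append(lod)
        maxima.append(max(lods))  # highest LOD over the whole genome
    k = max(0, round(level * n_perm) - 1)  # e.g. the 950th of 1000, ascending
    pointwise = [sorted(scores)[k] for scores in per_marker]
    experimentwise = sorted(maxima)[k]
    return pointwise, experimentwise


# Hypothetical data: 100 individuals, 10 linked markers, no real QTL.
data_rng = random.Random(42)
markers = [[data_rng.randint(0, 1) for _ in range(100)]]
for _ in range(9):
    previous = markers[-1]
    # Each marker differs from its neighbour with probability 0.1.
    markers.append([g if data_rng.random() > 0.1 else 1 - g for g in previous])
phenotypes = [data_rng.gauss(0.0, 1.0) for _ in range(100)]

pointwise, experimentwise = permutation_thresholds(markers, phenotypes, n_perm=200)
print("largest point-wise threshold:", round(max(pointwise), 2))
print("experiment-wise threshold:", round(experimentwise, 2))
```

Because the genome-wide maximum in each permutation is at least
as large as the LOD at any single marker, the experiment-wise
threshold is never below a point-wise threshold from the same
set of permutations.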
The formula $P = 1 - (1 - \alpha)^n$
is based on the idea that each of the n tests for marker-QTL
associations is independent. When markers are linked,
the point-wise tests are not independent, because
making a Type I error at one marker increases the
probability of making a Type I error at a linked marker.
The structure of which markers are linked and which markers
are not linked is different for each specific experiment.
For this reason the correct $\alpha$
level for point-wise tests that will give a P = 0.05
experiment-wise error rate cannot be determined using a
formula. The best way to determine the $\alpha$
level for point-wise tests for a given experiment is to
use the permutation test (Churchill and Doerge, 1994).
Copyright © 2000, Ted Helms