Plsc 734
Spring 2005
Final Exam
1) (25 pts)
a) From your area of interest, provide a detailed example of the use of confounding with a 3x3x3 factorial. I am interested in estimating all effects. Indicate what the levels of each factor would be (3 varieties or whatever). Detail the layout, randomization, and AOV. I want to see which treatment combinations are associated with each experimental unit for at least one replication that does not just confound a main effect.
b) For each of the following four layouts, indicate which effects, if any, have been confounded with ranges.
1)
Range
0 000 210 020 230 301 111 321 131 202 012 222 032 103 313 123 333
1 100 310 120 330 001 211 021 231 302 112 322 132 203 013 223 033
2 200 010 220 030 101 311 121 331 002 212 022 232 303 113 323 133
3 300 110 320 130 201 011 221 031 102 312 122 332 003 213 023 233
2)
Range
0 000 210 020 230 001 211 021 231 002 212 022 232 003 213 023 233
1 100 310 120 330 101 311 121 331 102 312 122 332 103 313 123 333
2 200 010 220 030 201 011 221 031 202 012 222 032 203 013 223 033
3 300 110 320 130 301 111 321 131 302 112 322 132 303 113 323 133
3)
Range
0 000 100 200 300 021 121 221 321 002 102 202 302 023 123 223 323
1 010 110 210 310 031 131 231 331 012 112 212 312 033 133 233 333
2 020 120 220 320 001 101 201 301 022 122 222 322 003 103 203 303
3 030 130 230 330 011 111 211 311 032 132 232 332 013 113 213 313
4)
Range
0 000 100 200 300 021 121 221 321 032 132 232 332 013 113 213 313
1 010 110 210 310 031 131 231 331 002 102 202 302 023 123 223 323
2 020 120 220 320 001 101 201 301 012 112 212 312 033 133 233 333
3 030 130 230 330 011 111 211 311 022 122 222 322 003 103 203 303
2) (50 pts) Given the attached SAS statements and SAS output, assume that I am willing to test at the 10% probability level:
a) Fill in the missing df, MS and F statistic.
b) Can one regression of yield on myield be used for all data? Explain!
c) Do all CIs respond the same to myield? Explain! Which ones might be different from, say, 1.0, or from each other, or ...?
d) Show how to present the information in a report, and provide a VERY brief statement to include.
3) (25 pts) The following information was sent recently to a couple of faculty members in plant science:
Good morning, gentlemen. Have a quick question for you. Am assisting in the analysis of some data for Dr. Carr here in
thanks and look forward to any and all response. Sincerely, Cp
Proc GLM data=work;
  classes year location rep variety;
  model bua12 tw12 seedlb prot12
      = year location year*location rep(year*location) variety variety*location;
  test h=location e=year*location;
  test h=year e=rep(year*location);
  test h=year*location e=rep(year*location);
  lsmeans location/stderr pdiff e=year*location;
  lsmeans year*location/stderr pdiff e=rep(year*location);
  lsmeans variety variety*location/stderr pdiff;
run;
a) Key out the sources, df, and F tests as proposed by Cp.
b) Key out the sources, df, and F tests you would recommend.
c) Suggest what might be wrong in the assumptions made by Cp in setting up his analysis (i.e., why your assumptions?).
d) What would you suggest to improve future work in this area, or to handle the real set of data, which has different varieties in each location/year?
PROC IMPORT OUT= WORK.one
    DATAFILE= "E:\urn2004\SD.xls"
    DBMS=EXCEL2000 REPLACE;
  GETNAMES=YES;
PROC SORT; BY LOC DATE;
DATA THREE; SET ONE;
  IF CI="FP1095" OR CI="FP1096" OR CI="FP1097" OR
     CI="FP1098" OR CI="FP1099" THEN DELETE;
PROC MEANS NOPRINT; BY LOC DATE; VAR YIELD; OUTPUT OUT=REG MEAN=MYIELD;
DATA TWO; MERGE THREE REG; BY LOC DATE;
PROC GLM DATA=TWO; MODEL YIELD=MYIELD;
PROC GLM; CLASS CI; MODEL YIELD=CI MYIELD CI*MYIELD/SOLUTION;
PROC GLM; CLASS CI; MODEL YIELD=CI MYIELD(CI)/SOLUTION;
RUN;
SAS output:
The GLM Procedure
Dependent Variable: YIELD   YIELD
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                   82694475.19                                 <.0001
Error             376   14114729.42       37539.17
Corrected Total         96809204.61
R-Square    Coeff Var    Root MSE    YIELD Mean
0.854201    11.03201     193.7503    1756.256
Source DF Type I
MYIELD 82694475.19 <.0001
Source
MYIELD 82694475.19 <.0001
Parameter    Estimate        Standard Error    t Value    Pr > |t|
Intercept    -0.000000000    38.72325404       -0.00      1.0000
MYIELD        1.000000000     0.02130611       46.93      <.0001
The GLM Procedure
Class Level Information
Class    Levels    Values
CI       40        CI 389 CI2522 CI2921 CI3096 CI3259 CI3270 CI3296 CI3297 CI3318 CI3327 CI3332 CI3353 CI3358 CI3397 CI3399 CI3404 CI3411 CI3423 CI3424 CI3425 FP1094 FP2024 FP2044 FP2102 FP2107 FP2112 FP2114 FP2118 FP2119 N0010 N2007 N2010 N2010B N2010Y N2014 N305 N320 N323 N325 N9719
Number of observations 378
The GLM Procedure
Dependent Variable: YIELD   YIELD
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                   87129042.89                                 <.0001
Error                   9680161.72
Corrected Total   377   96809204.61
R-Square    Coeff Var    Root MSE    YIELD Mean
0.900008    10.26231     180.2325    1756.256
Source DF Type I
CI 4667529.59 <.0001
MYIELD 80721450.58 <.0001
MYIELD*CI 1740062.72 0.0762
Source
CI 1630283.41 0.1270
MYIELD 70985188.11 <.0001
MYIELD*CI 1740062.72 0.0762
Parameter        Estimate          Standard Error    t Value    Pr > |t|
Intercept        247.6285437 B     202.5515626        1.22      0.2225
CI CI 389        148.5076076 B     285.5318421        0.52      0.6034
CI CI2522       -429.5526375 B     285.5318421       -1.50      0.1335
CI CI2921       -685.2639745 B     285.5318421       -2.40      0.0170
CI CI3096       -283.5539956 B     337.4755132       -0.84      0.4015
CI CI3259       -292.2969653 B     296.4311958       -0.99      0.3249
CI CI3270        -62.3443535 B     429.1387825       -0.15      0.8846
CI CI3296       -292.8125046 B     429.1387825       -0.68      0.4956
CI CI3297       -337.3309699 B     295.7373148       -1.14      0.2549
CI CI3318       -304.6532979 B     429.1387825       -0.71      0.4783
CI CI3327       -149.5784968 B     286.4511670       -0.52      0.6019
CI CI3332         12.7546216 B     429.1387825        0.03      0.9763
CI CI3353       -237.8931342 B     296.4311958       -0.80      0.4229
CI CI3358        -45.7840100 B     296.4311958       -0.15      0.8774
CI CI3397       -435.2179895 B     286.4511670       -1.52      0.1297
CI CI3399       -881.4329535 B     429.1387825       -2.05      0.0409
CI CI3404       -444.0878790 B     296.4311958       -1.50      0.1352
CI CI3411       -429.6075402 B     286.4511670       -1.50      0.1347
CI CI3423       -202.8770546 B     296.4311958       -0.68      0.4943
CI CI3424        -81.0557309 B     286.4511670       -0.28      0.7774
CI CI3425       -504.8586893 B     286.4511670       -1.76      0.0790
CI FP1094       -231.8712486 B     347.5157316       -0.67      0.5051
CI FP2024        422.7024226 B     295.7373148        1.43      0.1540
CI FP2044       -170.6113760 B     286.4511670       -0.60      0.5519
CI FP2102       -230.1674082 B     359.1959950       -0.64      0.5222
CI FP2107       -381.4768045 B     359.1959950       -1.06      0.2891
CI FP2112       -109.2959995 B     285.5318421       -0.38      0.7022
CI FP2114       -586.3033210 B     285.5318421       -2.05      0.0409
CI FP2118       -534.7015999 B     285.5318421       -1.87      0.0621
CI FP2119       -236.5543681 B     285.5318421       -0.83      0.4081
CI N0010         -61.5452902 B     286.4511670       -0.21      0.8300
CI N2007        -289.5237181 B     317.8808187       -0.91      0.3631
CI N2010        -169.5133808 B     317.8808187       -0.53      0.5943
CI N2010B       -757.4554914 B     337.4755132       -2.24      0.0255
CI N2010Y       -839.6474827 B     337.4755132       -2.49      0.0134
CI N2014        -218.6600408 B     317.8808187       -0.69      0.4921
CI N305          163.3258906 B     317.8808187        0.51      0.6078
CI N320           72.8544734 B     317.8808187        0.23      0.8189
CI N323         -301.6254559 B     317.8808187       -0.95      0.3435
CI N325         -119.0291690 B     317.8808187       -0.37      0.7083
CI N9719           0.0000000 B     .                  .         .
MYIELD             0.8615456 B     0.1115983          7.72      <.0001
MYIELD*CI CI 389    -0.1825083 B    0.1562952    -1.17    0.2439
MYIELD*CI CI2522     0.2131881 B    0.1562952     1.36    0.1736
MYIELD*CI CI2921     0.4120246 B    0.1562952     2.64    0.0088
MYIELD*CI CI3096     0.1997460 B    0.1976688     1.01    0.3131
MYIELD*CI CI3259     0.0657280 B    0.1613117     0.41    0.6840
MYIELD*CI CI3270     0.1048935 B    0.2144361     0.49    0.6251
MYIELD*CI CI3296     0.1522544 B    0.2144361     0.71    0.4782
MYIELD*CI CI3297     0.1758160 B    0.1599726     1.10    0.2726
MYIELD*CI CI3318     0.1697239 B    0.2144361     0.79    0.4293
MYIELD*CI CI3327     0.0482491 B    0.1578238     0.31    0.7600
MYIELD*CI CI3332    -0.0545162 B    0.2144361    -0.25    0.7995
MYIELD*CI CI3353     0.1956562 B    0.1613117     1.21    0.2261
MYIELD*CI CI3358     0.0834474 B    0.1613117     0.52    0.6053
MYIELD*CI CI3397     0.2884224 B    0.1578238     1.83    0.0686
MYIELD*CI CI3399     0.4382993 B    0.2144361     2.04    0.0418
MYIELD*CI CI3404     0.2087015 B    0.1613117     1.29    0.1967
MYIELD*CI CI3411     0.2698343 B    0.1578238     1.71    0.0884
MYIELD*CI CI3423     0.1541363 B    0.1613117     0.96    0.3401
MYIELD*CI CI3424     0.1066430 B    0.1578238     0.68    0.4997
MYIELD*CI CI3425     0.2950127 B    0.1578238     1.87    0.0626
MYIELD*CI FP1094     0.1431050 B    0.1871151     0.76    0.4450
MYIELD*CI FP2024    -0.1615169 B    0.1599726    -1.01    0.3135
MYIELD*CI FP2044     0.0613072 B    0.1578238     0.39    0.6980
MYIELD*CI FP2102     0.0712855 B    0.2042852     0.35    0.7274
MYIELD*CI FP2107     0.2230787 B    0.2042852     1.09    0.2757
MYIELD*CI FP2112     0.0887166 B    0.1562952     0.57    0.5707
MYIELD*CI FP2114     0.2517335 B    0.1562952     1.61    0.1083
MYIELD*CI FP2118     0.2340180 B    0.1562952     1.50    0.1354
MYIELD*CI FP2119     0.1132422 B    0.1562952     0.72    0.4693
MYIELD*CI N0010      0.0601058 B    0.1578238     0.38    0.7036
MYIELD*CI N2007      0.1220413 B    0.1803941     0.68    0.4992
MYIELD*CI N2010      0.0603191 B    0.1803941     0.33    0.7383
MYIELD*CI N2010B     0.4893616 B    0.1976688     2.48    0.0139
MYIELD*CI N2010Y     0.5058902 B    0.1976688     2.56    0.0110
MYIELD*CI N2014      0.0864697 B    0.1803941     0.48    0.6320
MYIELD*CI N305      -0.1092757 B    0.1803941    -0.61    0.5451
MYIELD*CI N320      -0.0177345 B    0.1803941    -0.10    0.9218
MYIELD*CI N323       0.2584353 B    0.1803941     1.43    0.1530
MYIELD*CI N325       0.1099526 B    0.1803941     0.61    0.5426
MYIELD*CI N9719      0.0000000 B    .             .       .
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.
The GLM Procedure
Class Level Information
Class    Levels    Values
CI       40        CI 389 CI2522 CI2921 CI3096 CI3259 CI3270 CI3296 CI3297 CI3318 CI3327 CI3332 CI3353 CI3358 CI3397 CI3399 CI3404 CI3411 CI3423 CI3424 CI3425 FP1094 FP2024 FP2044 FP2102 FP2107 FP2112 FP2114 FP2118 FP2119 N0010 N2007 N2010 N2010B N2010Y N2014 N305 N320 N323 N325 N9719
The GLM Procedure
Dependent Variable: YIELD   YIELD
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                   87129042.89                                 <.0001
Error                   9680161.72
Corrected Total         96809204.61
R-Square    Coeff Var    Root MSE    YIELD Mean
0.900008    10.26231     180.2325    1756.256
Source DF Type I
CI 4667529.59 <.0001
MYIELD(CI) 82461513.30 <.0001
Source
CI 1630283.41 0.1270
MYIELD(CI) 82461513.30 <.0001
Parameter        Estimate          Standard Error    t Value    Pr > |t|
Intercept        247.6285437 B     202.5515626        1.22      0.2225
CI CI 389        148.5076076 B     285.5318421        0.52      0.6034
CI CI2522       -429.5526375 B     285.5318421       -1.50      0.1335
CI CI2921       -685.2639746 B     285.5318421       -2.40      0.0170
CI CI3096       -283.5539956 B     337.4755132       -0.84      0.4015
CI CI3259       -292.2969653 B     296.4311958       -0.99      0.3249
CI CI3270        -62.3443535 B     429.1387825       -0.15      0.8846
CI CI3296       -292.8125046 B     429.1387825       -0.68      0.4956
CI CI3297       -337.3309699 B     295.7373148       -1.14      0.2549
CI CI3318       -304.6532979 B     429.1387825       -0.71      0.4783
CI CI3327       -149.5784968 B     286.4511670       -0.52      0.6019
CI CI3332         12.7546216 B     429.1387825        0.03      0.9763
CI CI3353       -237.8931342 B     296.4311958       -0.80      0.4229
CI CI3358        -45.7840100 B     296.4311958       -0.15      0.8774
CI CI3397       -435.2179895 B     286.4511670       -1.52      0.1297
CI CI3399       -881.4329535 B     429.1387825       -2.05      0.0409
CI CI3404       -444.0878790 B     296.4311958       -1.50      0.1352
CI CI3411       -429.6075402 B     286.4511670       -1.50      0.1347
CI CI3423       -202.8770546 B     296.4311958       -0.68      0.4943
CI CI3424        -81.0557310 B     286.4511670       -0.28      0.7774
CI CI3425       -504.8586894 B     286.4511670       -1.76      0.0790
CI FP1094       -231.8712486 B     347.5157316       -0.67      0.5051
CI FP2024        422.7024226 B     295.7373148        1.43      0.1540
CI FP2044       -170.6113760 B     286.4511670       -0.60      0.5519
CI FP2102       -230.1674082 B     359.1959950       -0.64      0.5222
CI FP2107       -381.4768045 B     359.1959950       -1.06      0.2891
CI FP2112       -109.2959996 B     285.5318421       -0.38      0.7022
CI FP2114       -586.3033210 B     285.5318421       -2.05      0.0409
CI FP2118       -534.7015999 B     285.5318421       -1.87      0.0621
CI FP2119       -236.5543681 B     285.5318421       -0.83      0.4081
CI N0010         -61.5452902 B     286.4511670       -0.21      0.8300
CI N2007        -289.5237181 B     317.8808187       -0.91      0.3631
CI N2010        -169.5133808 B     317.8808187       -0.53      0.5943
CI N2010B       -757.4554914 B     337.4755132       -2.24      0.0255
CI N2010Y       -839.6474827 B     337.4755132       -2.49      0.0134
CI N2014        -218.6600408 B     317.8808187       -0.69      0.4921
CI N305          163.3258906 B     317.8808187        0.51      0.6078
CI N320           72.8544734 B     317.8808187        0.23      0.8189
CI N323         -301.6254559 B     317.8808187       -0.95      0.3435
CI N325         -119.0291690 B     317.8808187       -0.37      0.7083
CI N9719           0.0000000 B     .                  .         .
MYIELD(CI) CI 389    0.6790373    0.1094259     6.21    <.0001
MYIELD(CI) CI2522    1.0747337    0.1094259     9.82    <.0001
MYIELD(CI) CI2921    1.2735701    0.1094259    11.64    <.0001
MYIELD(CI) CI3096    1.0612916    0.1631526     6.50    <.0001
MYIELD(CI) CI3259    0.9272735    0.1164787     7.96    <.0001
MYIELD(CI) CI3270    0.9664390    0.1831084     5.28    <.0001
MYIELD(CI) CI3296    1.0137999    0.1831084     5.54    <.0001
MYIELD(CI) CI3297    1.0373615    0.1146171     9.05    <.0001
MYIELD(CI) CI3318    1.0312695    0.1831084     5.63    <.0001
MYIELD(CI) CI3327    0.9097946    0.1115983     8.15    <.0001
MYIELD(CI) CI3332    0.8070294    0.1831084     4.41    <.0001
MYIELD(CI) CI3353    1.0572017    0.1164787     9.08    <.0001
MYIELD(CI) CI3358    0.9449930    0.1164787     8.11    <.0001
MYIELD(CI) CI3397    1.1499679    0.1115983    10.30    <.0001
MYIELD(CI) CI3399    1.2998448    0.1831084     7.10    <.0001
MYIELD(CI) CI3404    1.0702471    0.1164787     9.19    <.0001
MYIELD(CI) CI3411    1.1313799    0.1115983    10.14    <.0001
MYIELD(CI) CI3423    1.0156819    0.1164787     8.72    <.0001
MYIELD(CI) CI3424    0.9681886    0.1115983     8.68    <.0001
MYIELD(CI) CI3425    1.1565583    0.1115983    10.36    <.0001
MYIELD(CI) FP1094    1.0046505    0.1501929     6.69    <.0001
MYIELD(CI) FP2024    0.7000287    0.1146171     6.11    <.0001
MYIELD(CI) FP2044    0.9228528    0.1115983     8.27    <.0001
MYIELD(CI) FP2102    0.9328311    0.1711090     5.45    <.0001
MYIELD(CI) FP2107    1.0846242    0.1711090     6.34    <.0001
MYIELD(CI) FP2112    0.9502622    0.1094259     8.68    <.0001
MYIELD(CI) FP2114    1.1132791    0.1094259    10.17    <.0001
MYIELD(CI) FP2118    1.0955636    0.1094259    10.01    <.0001
MYIELD(CI) FP2119    0.9747877    0.1094259     8.91    <.0001
MYIELD(CI) N0010     0.9216513    0.1115983     8.26    <.0001
MYIELD(CI) N2007     0.9835869    0.1417316     6.94    <.0001
MYIELD(CI) N2010     0.9218646    0.1417316     6.50    <.0001
MYIELD(CI) N2010B    1.3509072    0.1631526     8.28    <.0001
MYIELD(CI) N2010Y    1.3674357    0.1631526     8.38    <.0001
MYIELD(CI) N2014     0.9480153    0.1417316     6.69    <.0001
MYIELD(CI) N305      0.7522699    0.1417316     5.31    <.0001
MYIELD(CI) N320      0.8438111    0.1417316     5.95    <.0001
MYIELD(CI) N323      1.1199809    0.1417316     7.90    <.0001
MYIELD(CI) N325      0.9714982    0.1417316     6.85    <.0001
MYIELD(CI) N9719     0.8615456    0.1115983     7.72    <.0001
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.
observed = mean + regr + dev
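where x and y are deviations from their means:
Mean: $(\sum Y)^2 / n$
Regr: $(\sum xy)^2 / \sum x^2$
Dev: $\sum Y^2 - \text{mean} - \text{regr}$
Correlation: $r = \sum xy \,/\, \sqrt{\sum x^2 \sum y^2}$
$r^2$ is the proportion of the corrected variation in y that is shared with x.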
An example
| Age (X) | Blood Pressure (Y) | x = X - 55 | y = Y - 141 | x² | y² | xy |
| --- | --- | --- | --- | --- | --- | --- |
| 35 | 114 | -20 | -27 | 400 | 729 | 540 |
| 45 | 124 | -10 | -17 | 100 | 289 | 170 |
| 55 | 143 | 0 | 2 | 0 | 4 | 0 |
| 65 | 158 | 10 | 17 | 100 | 289 | 170 |
| 75 | 166 | 20 | 25 | 400 | 625 | 500 |
| Σ = 275 | Σ = 705 | 0 | 0 | 1000 | 1936 | 1380 |
| Age | Y | Predicted | Dev | Dev squared |
| --- | --- | --- | --- | --- |
| 35 | 114 | 113.4 | 0.6 | 0.36 |
| 45 | 124 | 127.2 | -3.2 | 10.24 |
| 55 | 143 | 141.0 | 2.0 | 4.00 |
| 65 | 158 | 154.8 | 3.2 | 10.24 |
| 75 | 166 | 168.6 | -2.6 | 6.76 |
| Sum | | | 0 | 31.60 |
$S^2_{y\cdot x} = 31.6/3 = 10.53$, $S^2_b = 10.53/1000 = 0.01053$, $S_b = 0.1026$
H: β = 0
t = 1.38 / 0.1026 = 13.45 *
| Source | df | SS | MS | F |
| --- | --- | --- | --- | --- |
| Total | 5 | 101341 | | |
| Mean | 1 | 99405 | | |
| Corr Tot | 4 | 1936 | | |
| Regress | 1 | 1904.4 | 1904.4 | 180.8 |
| Dev | 3 | 31.6 | 10.53 | |
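As a check on the hand calculations above, here is a minimal Python sketch. The document's own code is SAS; Python is used here only to reproduce the arithmetic. It recomputes b, the AOV partition, S_b, t, F, and r from the five age/blood-pressure pairs. The variable names are illustrative and do not come from the original.

```python
# Simple linear regression of blood pressure (Y) on age (X), using
# deviations from the means: x = X - mean(X), y = Y - mean(Y).
from math import sqrt

ages = [35, 45, 55, 65, 75]          # X
bp = [114, 124, 143, 158, 166]       # Y
n = len(ages)

mean_x = sum(ages) / n               # 55
mean_y = sum(bp) / n                 # 141
x = [a - mean_x for a in ages]
y = [v - mean_y for v in bp]

sxx = sum(v * v for v in x)                  # sum x^2 = 1000
syy = sum(v * v for v in y)                  # sum y^2 = 1936 (corrected total)
sxy = sum(u * v for u, v in zip(x, y))       # sum xy  = 1380

b = sxy / sxx                                # regression coefficient
ss_mean = sum(bp) ** 2 / n                   # SS due to the mean
ss_regr = sxy ** 2 / sxx                     # SS due to regression
ss_dev = syy - ss_regr                       # deviations from regression
s2_yx = ss_dev / (n - 2)                     # S^2_{y.x}
s_b = sqrt(s2_yx / sxx)                      # standard error of b
t_b = b / s_b                                # t for H: beta = 0
f_regr = ss_regr / s2_yx                     # F for regression (equals t^2)
r = sxy / sqrt(sxx * syy)                    # correlation coefficient

print(f"b = {b:.2f}  SS regr = {ss_regr:.1f}  SS dev = {ss_dev:.1f}")
print(f"S2y.x = {s2_yx:.2f}  Sb = {s_b:.4f}  t = {t_b:.2f}  F = {f_regr:.1f}")
print(f"r = {r:.4f}  r^2 = {r * r:.4f}")
```

Here r² equals 1904.4/1936, which is the regression SS as a fraction of the corrected total. F also equals t², so 13.45² is about 180.8.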
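Rerun of the example with the matrix approach, the way a computer package would solve a linear regression problem:
$Y = \mu + (X - \bar{x})\beta$
114 = 1μ + (35 - 55)β = μ - 20β
124 = 1μ + (45 - 55)β = μ - 10β
143 = 1μ + (55 - 55)β = μ
158 = 1μ + (65 - 55)β = μ + 10β
166 = 1μ + (75 - 55)β = μ + 20β
$$X'X = \begin{bmatrix} n & \sum x \\ \sum x & \sum x^2 \end{bmatrix} = \begin{bmatrix} 5 & 0 \\ 0 & 1000 \end{bmatrix}, \qquad (X'X)^{-1} = \begin{bmatrix} 1/5 & 0 \\ 0 & 1/1000 \end{bmatrix}, \qquad X'Y = \begin{bmatrix} 705 \\ 1380 \end{bmatrix}$$
$$(X'X)^{-1}X'Y = \begin{bmatrix} (1/5)(705) + 0(1380) \\ 0(705) + 1380/1000 \end{bmatrix} = \begin{bmatrix} 141 \\ 1.38 \end{bmatrix}$$
Mean of Y = 705/5 = 141
β = 1380/1000 = 1.38
SS due to the model = 141 × 705 (mean) + 1.380 × 1380 (regression) = 99405 + 1904.4 = 101309.4
Deviation from regression = total - SS due to model = 101341 - 101309.4 = 31.6
Variance of an estimate = its diagonal element of (X'X)⁻¹ × deviation variance ($S^2_{y\cdot x}$ = 10.53):
1/5 × 10.53 = 2.107 for the mean
1/1000 × 10.53 = 0.01053 for β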
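The same solution can be sketched with the normal equations written out explicitly. The following is a minimal pure-Python sketch with small hand-written matrix helpers (`transpose`, `matmul`, `inverse_2x2`, all hypothetical names rather than any library API). It builds X from a column of 1s and the centered ages, then reproduces X'X, (X'X)⁻¹X'Y, the model SS, and the variances of μ and β.

```python
# Normal-equations solution of the centered model Y = mu + (X - mean X) * beta.
ages = [35, 45, 55, 65, 75]
bp = [114, 124, 143, 158, 166]
mean_x = sum(ages) / len(ages)

# Design matrix: a column of 1s and a column of centered X values.
X = [[1.0, a - mean_x] for a in ages]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

Xt = transpose(X)
XtX = matmul(Xt, X)                      # [[5, 0], [0, 1000]]
XtY = matmul(Xt, [[v] for v in bp])      # [[705], [1380]]
XtX_inv = inverse_2x2(XtX)               # [[1/5, 0], [0, 1/1000]]
coef = matmul(XtX_inv, XtY)              # [[mu], [beta]]
mu, beta = coef[0][0], coef[1][0]

ss_total = sum(v * v for v in bp)                  # uncorrected total SS
ss_model = mu * XtY[0][0] + beta * XtY[1][0]       # mean + regression
ss_dev = ss_total - ss_model                       # deviations from regression
ms_dev = ss_dev / (len(bp) - 2)                    # S^2_{y.x}
var_mu = XtX_inv[0][0] * ms_dev                    # variance of the mean
var_beta = XtX_inv[1][1] * ms_dev                  # variance of beta
print(mu, beta, ss_model, ss_dev, var_mu, var_beta)
```

Centering X is what makes X'X diagonal here, so the inverse is just the reciprocals 1/5 and 1/1000.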
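Intercept: $a = \bar{y} - b\bar{x}$
Variance of the intercept (ȳ and b are uncorrelated, so their variances add):
$$\mathrm{Var}(a) = \mathrm{Var}(\bar{y}) + \bar{x}^2\,\mathrm{Var}(b) = \frac{S^2_{y\cdot x}}{n} + \bar{x}^2\,\frac{S^2_{y\cdot x}}{\sum x^2} = S^2_{y\cdot x}\left(\frac{1}{n} + \frac{\bar{x}^2}{\sum x^2}\right) = S^2_a$$
Test procedures in simple linear regression
| Hypothesis | Statistic | Equation |
| --- | --- | --- |
| $\alpha = \alpha_0$ | t | $(a - \alpha_0)/S_a$ |
| $\beta = \beta_0$ | t | $(b - \beta_0)/S_b$ |
| $\alpha = \alpha_0$ and $\beta = \beta_0$ | F | $\left[n(a - \alpha_0)^2 + 2n\bar{x}(a - \alpha_0)(b - \beta_0) + \sum X^2 (b - \beta_0)^2\right] / (2S^2_{y\cdot x})$ |
Here $\sum X^2$ is the uncorrected sum of squares of X, and F has 2 and n - 2 df.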
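For the age/blood-pressure data, the intercept, its standard error, and the joint F statistic above can be checked with a short Python sketch. `joint_f` is a hypothetical helper name. It takes the hypothesized α₀ and β₀ and uses the uncorrected ΣX and ΣX², as the formula requires.

```python
# Intercept, its standard error, and the joint F test for
# H: alpha = alpha0 and beta = beta0 (age / blood pressure data).
from math import sqrt

ages = [35, 45, 55, 65, 75]
bp = [114, 124, 143, 158, 166]
n = len(ages)
mean_x, mean_y = sum(ages) / n, sum(bp) / n
sxx = sum((a - mean_x) ** 2 for a in ages)                  # corrected sum x^2
sxy = sum((a - mean_x) * (v - mean_y) for a, v in zip(ages, bp))
b = sxy / sxx
a = mean_y - b * mean_x                                     # intercept
ss_dev = sum((v - mean_y) ** 2 for v in bp) - sxy ** 2 / sxx
s2_yx = ss_dev / (n - 2)

var_a = s2_yx * (1 / n + mean_x ** 2 / sxx)                 # S^2_a
se_a = sqrt(var_a)
t_a = a / se_a                                              # t for H: alpha = 0

sum_x = sum(ages)                                           # uncorrected sums
sum_x2 = sum(v * v for v in ages)

def joint_f(alpha0, beta0):
    """F with 2 and n - 2 df for H: alpha = alpha0 and beta = beta0."""
    da, db = a - alpha0, b - beta0
    q = n * da ** 2 + 2 * sum_x * da * db + sum_x2 * db ** 2
    return q / (2 * s2_yx)

print(f"a = {a:.2f}  Sa = {se_a:.3f}  t = {t_a:.2f}  F(0, 0) = {joint_f(0, 0):.1f}")
```

At α₀ = β₀ = 0 the numerator quadratic form equals the uncorrected model SS from the matrix solution (101309.4), and at the estimates themselves F is 0.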
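An example: 2 groups
| Group 1 X | Group 1 Y | Group 2 X | Group 2 Y |
| --- | --- | --- | --- |
| 30 | 165 | 24 | 180 |
| 27 | 170 | 31 | 169 |
| 20 | 130 | 20 | 171 |
| 21 | 156 | 26 | 161 |
| 33 | 167 | 20 | 180 |
| 29 | 151 | 25 | 170 |
Group 1: b₁ = 1.995, SS dev = 566.83 (Σx² = 133.33)
Group 2: b₂ = -0.852, SS dev = 200.95 (Σx² = 85.33)
H: β₁ = β₂
$$t = \frac{1.995 - (-0.852)}{\left\{\left[\dfrac{566.83 + 200.95}{6 + 6 - 4}\right]\left[\dfrac{1}{133.33} + \dfrac{1}{85.33}\right]\right\}^{1/2}} = 2.096$$
t8,.05 = 2.306
t8,.20 = 1.397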
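The slope comparison can be reproduced from the raw data with a minimal Python sketch. `regression_summary` is a hypothetical helper name. The sketch pools the deviation SS from the two separate regressions on 6 + 6 - 4 = 8 df and forms the t statistic exactly as in the formula above.

```python
# t test of H: beta1 = beta2, pooling the deviation SS of the two separate regressions.
from math import sqrt

group1 = {"x": [30, 27, 20, 21, 33, 29], "y": [165, 170, 130, 156, 167, 151]}
group2 = {"x": [24, 31, 20, 26, 20, 25], "y": [180, 169, 171, 161, 180, 170]}

def regression_summary(xs, ys):
    """Return (corrected sum x^2, slope, SS of deviations from regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy / sxx, syy - sxy ** 2 / sxx

sxx1, b1, dev1 = regression_summary(group1["x"], group1["y"])
sxx2, b2, dev2 = regression_summary(group2["x"], group2["y"])

df = len(group1["x"]) + len(group2["x"]) - 4      # error df for the pooled variance
pooled = (dev1 + dev2) / df                        # pooled S^2_{y.x}
t = (b1 - b2) / sqrt(pooled * (1 / sxx1 + 1 / sxx2))
print(f"b1 = {b1:.3f}  b2 = {b2:.3f}  df = {df}  t = {t:.3f}")
```

Squaring t gives the F for X*GROUP in the SAS run that follows (421.61557/95.97316).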
Some questions that we might ask:
1. Can one regression be used for all observations?
2. If one cannot be used, is the regression coefficient the same within each group (β₁ = β₂)?
SAS results: PROC GLM;MODEL Y=X;
DEPENDENT VARIABLE Y
AOV TABLE
DF SS MS
TOTAL 11 2065.66667
MODEL 1 31.47234 31.47234
ERROR 10 2034.19433 203.41943
R-SQUARED = 0.01524 MEAN = 164.16667
EFFECTS ESTIMATE STD ERROR
INTERCEPT 154.834762573242
X 1 .36595743894577 .930384118119818
DEPENDENT VARIABLE Y
PARTIAL SS DF SS MS
X 1 31.47234 31.47234
SAS output: PROC GLM;CLASSES GROUP;MODEL Y=GROUP X X*GROUP;
CLASS LEVELS
GROUP 2 1 2
DEPENDENT VARIABLE Y
AOV TABLE
DF SS MS
TOTAL 11 2065.66667
MODEL 3 1297.88135 432.62712
ERROR 8 767.78532 95.97316
R-SQUARED = 0.62831 MEAN = 164.16667
EFFECTS ESTIMATE STD ERROR
INTERCEPT 147.927352905273
GROUP 1 -44.6273307800293 17.3909942979608
X 1 .571718752384186 .679058955921065
X*GROUP 1 1.42328178882599 .679058955921065
DEPENDENT VARIABLE Y
PARTIAL SS DF SS MS
GROUP 1 631.97925 631.97925
X 1 68.02988 68.02988
X*GROUP 1 421.61557 421.61557
Question #1:
F = [(2034.19433 - 767.78532)/2] / 95.97316 = 6.597
F2,8,0.05 = 4.46
significant ===> one regression cannot be used for all observations.
Question #2:
F = 421.61557 / 95.97316 = 4.393 (= t² = 2.096² from the slope test)
F1,8,0.05 = 5.32
non-significant ===> no evidence that the regression coefficients differ between the groups.
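Both questions are extra-sum-of-squares comparisons of a reduced model with the full model. The following minimal Python sketch uses only the error SS, df, and partial SS printed in the two outputs above. `extra_ss_f` is a hypothetical helper name, and no p-values are computed.

```python
# Extra-sum-of-squares F test: reduced model versus full model.
def extra_ss_f(sse_reduced, df_reduced, sse_full, df_full):
    """F = [(SSE_reduced - SSE_full) / (df_reduced - df_full)] / (SSE_full / df_full)."""
    numerator_df = df_reduced - df_full
    mse_full = sse_full / df_full
    return (sse_reduced - sse_full) / numerator_df / mse_full, numerator_df

# Question 1: MODEL Y=X (error df 10) versus MODEL Y=GROUP X X*GROUP (error df 8).
f_q1, df_q1 = extra_ss_f(2034.19433, 10, 767.78532, 8)

# Question 2: partial SS for X*GROUP (1 df) over the full-model error MS.
f_q2 = 421.61557 / (767.78532 / 8)
print(f"Q1: F = {f_q1:.3f} on {df_q1} and 8 df;  Q2: F = {f_q2:.3f} on 1 and 8 df")
```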
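Treatments
| | Trt 1 X | Trt 1 Y | Trt 2 X | Trt 2 Y | Trt 3 X | Trt 3 Y | Trt 4 X | Trt 4 Y |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 30 | 165 | 24 | 180 | 34 | 156 | 41 | 201 |
| | 27 | 170 | 31 | 169 | 32 | 189 | 32 | 173 |
| | 20 | 130 | 20 | 171 | 35 | 138 | 30 | 200 |
| | 21 | 156 | 26 | 161 | 35 | 190 | 35 | 193 |
| | 33 | 167 | 20 | 180 | 30 | 160 | 28 | 142 |
| | 29 | 151 | 25 | 170 | 29 | 172 | 36 | 189 |
| Total | 160 | 939 | 146 | 1031 | 195 | 1005 | 202 | 1098 |
Grand totals: X = 703, Y = 4073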
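Comparison among groups
| Group | df | Σx² | Σxy | Σy² | Dev | df | MS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 5 | 133.33 | 266.0 | 1097.5 | 566.83 | 4 | |
| 2 | 5 | 85.33 | -72.67 | 262.8 | 200.95 | 4 | |
| 3 | 5 | 33.5 | -42.5 | 2047.5 | 1993.58 | 4 | |
| 4 | 5 | 109.33 | 346.0 | 2530.0 | 1435.04 | 4 | |
| Sum | | | | | 4196.4 | 16 | 262.275 |
| Within | 20 | 361.5 | 496.8 | 5937.8 | 5255 | 19 | 276.579 |
| Among | 3 | 365.459 | 451.2 | 2163.1 | 1606.04 | 2 | 803.023 |
| Total | 23 | 726.959 | 948 | 8100.9 | 6864.6 | 22 | 312.027 |
b₁ = 1.995, b₂ = -0.852, b₃ = -1.269, b₄ = 3.164
b_t = 1.304 (total), b_w = 1.374 (within), b_m = 1.235 (among)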
Questions:
1. Can one regression line be used for all observations?
   F = [(6864.6 - 4196.4)/6] / 262.275 = 1.696; if not significant, stop.
2. Are the regression coefficients the same for all groups?
   F = [(5255 - 4196.4)/3] / 262.275 = 1.345; if significant, stop.
3. Is the regression of group means linear?
   F = 803.023 / 276.579 = 2.903; if significant, stop.
4. Is the regression of group means the same as the within-group regression?
   F = (6864.6 - 5255 - 1606.04) / 276.579 = 0.013
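Covariance
| Source | df | Σx² | Σxy | Σy² | Dev df | Dev SS | Dev MS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trt | 3 | 365.459 | 451.2 | 2163.1 | | | |
| Error | 20 | 361.5 | 496.8 | 5937.8 | 19 | 5255.01 | 276.579 |
| T + E | 23 | 726.959 | 948 | 8100.9 | 22 | 6864.61 | |
| Trt adjusted | | | | | 3 | 1609.6 | 536.53 |
The Dev columns are deviations about regression. The Trt adjusted SS is 6864.61 - 5255.01 = 1609.6.
F test for adjusted treatments = 536.53 / 276.579 = 1.94, not significant at 5% (F3,19,0.05 = 3.13)
Error regression coefficient = 496.8 / 361.5 = 1.374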
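Adjusted mean of Y = unadjusted mean of Y - (error regression coefficient) × (deviation of the treatment mean of X from the overall mean of X, 703/24 = 29.29)
| Treatment (group) | Mean X | Dev X | Mean Y | Adjusted mean Y |
| --- | --- | --- | --- | --- |
| 1 | 26.67 | -2.62 | 156.5 | 160.1 |
| 2 | 24.33 | -4.96 | 171.83 | 178.65 |
| 3 | 32.50 | 3.21 | 167.5 | 163.09 |
| 4 | 33.67 | 4.38 | 183.0 | 176.98 |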
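The adjusted means can be recomputed from the raw treatment data with a minimal Python sketch. It takes the within-treatment (error) regression coefficient b_w and moves each treatment mean of Y to the overall mean of X. The dictionary layout and names are illustrative only.

```python
# Covariance-adjusted treatment means:
#   adjusted mean = mean Y - b_w * (mean X - grand mean X)
treatments = {
    1: ([30, 27, 20, 21, 33, 29], [165, 170, 130, 156, 167, 151]),
    2: ([24, 31, 20, 26, 20, 25], [180, 169, 171, 161, 180, 170]),
    3: ([34, 32, 35, 35, 30, 29], [156, 189, 138, 190, 160, 172]),
    4: ([41, 32, 30, 35, 28, 36], [201, 173, 200, 193, 142, 189]),
}

def mean(v):
    return sum(v) / len(v)

# Within-treatment (error) sums of squares and cross products.
exx = exy = 0.0
for xs, ys in treatments.values():
    mx, my = mean(xs), mean(ys)
    exx += sum((x - mx) ** 2 for x in xs)
    exy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
b_w = exy / exx                                   # about 496.8 / 361.5

all_x = [x for xs, _ in treatments.values() for x in xs]
grand_mean_x = mean(all_x)                        # 703 / 24

adjusted = {
    t: mean(ys) - b_w * (mean(xs) - grand_mean_x)
    for t, (xs, ys) in treatments.items()
}
for t, value in adjusted.items():
    print(t, round(value, 2))
```

The sign matters: a treatment whose X mean is below the overall mean is adjusted upward, as treatments 1 and 2 are in the table.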
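Regression with several groups
| Group | df | SSx | CPxy | SSy | Dev from regr | df | MS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | n1 - 1 | A1 | B1 | C1 | C1 - B1²/A1 | n1 - 2 | |
| 2 | n2 - 1 | A2 | B2 | C2 | C2 - B2²/A2 | n2 - 2 | |
| ... | | | | | | | |
| k | nk - 1 | Ak | Bk | Ck | Ck - Bk²/Ak | nk - 2 | |
| Sum | | | | | S1 | N - 2k | MS1 |
| Within | N - k | Aw | Bw | Cw | S1 + S2 = Cw - Bw²/Aw | N - k - 1 | MS1+2 |
| Among | k - 1 | Am | Bm | Cm | S3 = Cm - Bm²/Am | k - 2 | MS3 |
| Total | N - 1 | At | Bt | Ct | St = Ct - Bt²/At | N - 2 | MSt |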
Questions and proper tests:
1. Can one regression line be used for all observations?
   F = [(St - S1)/2(k - 1)] / MS1
2. Are the regression coefficients the same for all groups?
   F = MS2 / MS1, where MS2 = S2/(k - 1)
3. Are the deviations from the regression of group means the same as error?
   F = MS3 / MS1+2
4. Is the regression coefficient among groups the same as within groups?
   F = MS4 / MS1+2, where MS4 = S4/1
Note: St = S1 + S2 + S3 + S4
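The general scheme can be sketched as one Python function that takes raw (x, y) data for k groups. It returns S1 through S4, St, and the four F ratios defined above. `corrected` and `compare_regressions` are hypothetical names. Because S3 has k - 2 df, the sketch assumes k ≥ 3. Run on the four-treatment data, it reproduces the example's F values of 1.696, 1.345, 2.903, and 0.013.

```python
# Several-groups regression comparison: S1, S2, S3, S4, St and the four F ratios,
# computed from raw (x, y) data for k >= 3 groups.
def corrected(xs, ys):
    """Return corrected SSx, cross products CPxy, and SSy."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ssx = sum((x - mx) ** 2 for x in xs)
    cpxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssy = sum((y - my) ** 2 for y in ys)
    return ssx, cpxy, ssy


def compare_regressions(groups):
    k = len(groups)
    big_n = sum(len(xs) for xs, _ in groups)

    # S1: deviations pooled over the k separate within-group regressions.
    s1 = aw = bw = cw = 0.0
    for xs, ys in groups:
        a, b, c = corrected(xs, ys)
        s1 += c - b * b / a
        aw, bw, cw = aw + a, bw + b, cw + c
    s12 = cw - bw * bw / aw                      # within: S1 + S2

    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    at, bt, ct = corrected(all_x, all_y)
    st = ct - bt * bt / at                       # total: St

    am, bm, cm = at - aw, bt - bw, ct - cw       # among groups = total - within
    s3 = cm - bm * bm / am
    s2 = s12 - s1
    s4 = st - s12 - s3

    ms1 = s1 / (big_n - 2 * k)
    ms12 = s12 / (big_n - k - 1)
    return {
        "S1": s1, "S2": s2, "S3": s3, "S4": s4, "St": st,
        "adj_trt": st - s12,                     # covariance-adjusted treatment SS
        "F1": (st - s1) / (2 * (k - 1)) / ms1,   # one line for all observations?
        "F2": s2 / (k - 1) / ms1,                # same slope in every group?
        "F3": s3 / (k - 2) / ms12,               # group means on a straight line?
        "F4": s4 / 1 / ms12,                     # among-group slope = within slope?
    }


treatments = [
    ([30, 27, 20, 21, 33, 29], [165, 170, 130, 156, 167, 151]),
    ([24, 31, 20, 26, 20, 25], [180, 169, 171, 161, 180, 170]),
    ([34, 32, 35, 35, 30, 29], [156, 189, 138, 190, 160, 172]),
    ([41, 32, 30, 35, 28, 36], [201, 173, 200, 193, 142, 189]),
]
result = compare_regressions(treatments)
for name in ("F1", "F2", "F3", "F4"):
    print(name, round(result[name], 3))
```

The among-group sums are obtained as total minus within, which mirrors how the table above is built. The `adj_trt` entry is the 1609.6 used in the covariance analysis.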
SAS statements to obtain the necessary information:
PROC GLM;MODEL Y=X;
   error SS from this model is St
PROC GLM;CLASSES TRT;MODEL Y=TRT X X*TRT;
   error SS from this model is S1
   SS due to X*TRT is S2
PROC GLM;ABSORB TRT;MODEL Y=X;   or   PROC GLM;CLASSES TRT;MODEL Y=TRT X;
   error SS from this model is S1 + S2
PROC MEANS NOPRINT;BY TRT;VAR X Y;OUTPUT OUT=NEW MEAN=X Y N=NUM;
PROC GLM;WEIGHT NUM;MODEL Y=X;
   error SS from this model is S3
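| Source | df | SS | Error |
| --- | --- | --- | --- |
| One line for all (total) | 2(k - 1) | St - S1 | MS1 |
| β1 = β2 = ... = βk | k - 1 | S2 | MS1 |
| Dev from means linear | k - 2 | S3 | MS1+2 |
| βw = βm | 1 | St - S1 - S2 - S3 | MS1+2 |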
Using the oat data with the first observation missing:
Assign MISS = 0 to every observation except the one with the missing value, which gets MISS = 1. That observation's YLD is entered as 0.
PROC GLM;CLASSES REP CULT;MODEL YLD=REP CULT MISS;
AOV TABLE
DF SS MS
TOTAL 55 22122.125
MODEL 17 19582.055 1151.886
ERROR 38 2540.070 66.844
R-SQUARED = 0.88518 MEAN = 54.125
EFFECTS ESTIMATE STD ERROR
INTERCEPT 55.4633712768555
... ... ....
MISS 1 -74.9487152099609 9.79699262222576
DEPENDENT VARIABLE YLD
PARTIAL SS DF SS MS
REP 3 216.597 72.199
CULT 13 16433.420 1264.109
MISS 1 3912.055 3912.055
PROC GLM;CLASSES REP CULT;MODEL YLD=REP CULT;
Dropping the missing observation.
DEPENDENT VARIABLE YLD
AOV TABLE
DF SS MS
TOTAL 54 19139.345
MODEL 16 16599.275 1037.455
ERROR 38 2540.070 66.844
R-SQUARED = 0.86729 MEAN = 55.109
DEPENDENT VARIABLE YLD
PARTIAL SS DF SS MS
REP 3 216.597 72.199
CULT 13 16433.424 1264.110
Estimate the missing plot using the missing-plot formula:
Ymiss = 74.94872
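The missing-plot formula used for one missing value in an RCBD is Yates' estimate, y = (rB + tT - G)/((r - 1)(t - 1)). Here r is the number of blocks (reps), t is the number of treatments, B and T are the observed totals of the block and treatment containing the missing plot, and G is the observed grand total. The oat data are not reproduced in the document. The sketch below therefore uses a small hypothetical 3-block by 4-treatment table, for illustration only, and checks the defining property: once inserted, the estimate has a zero residual under the additive block + treatment model.

```python
# Yates' missing-plot estimate for one missing value in an RCBD:
#   y_hat = (r*B + t*T - G) / ((r - 1) * (t - 1))
def yates_missing_plot(table, block, trt):
    """table[i][j] is treatment j in block i; table[block][trt] (the missing plot) is ignored."""
    r, t = len(table), len(table[0])
    cells = [(i, j) for i in range(r) for j in range(t) if (i, j) != (block, trt)]
    big_b = sum(table[block][j] for j in range(t) if j != trt)   # observed block total
    big_t = sum(table[i][trt] for i in range(r) if i != block)   # observed treatment total
    big_g = sum(table[i][j] for i, j in cells)                   # observed grand total
    return (r * big_b + t * big_t - big_g) / ((r - 1) * (t - 1))

def residual(table, i, j):
    """Additive-model residual: y_ij - block mean - treatment mean + grand mean."""
    r, t = len(table), len(table[0])
    block_mean = sum(table[i]) / t
    trt_mean = sum(table[k][j] for k in range(r)) / r
    grand_mean = sum(map(sum, table)) / (r * t)
    return table[i][j] - block_mean - trt_mean + grand_mean

# Hypothetical data (not the oat data); the plot in block 0, treatment 2 is missing.
data = [[20.0, 25.0, None, 30.0],
        [22.0, 27.0, 31.0, 33.0],
        [19.0, 24.0, 28.0, 29.0]]
est = yates_missing_plot(data, 0, 2)
filled = [row[:] for row in data]
filled[0][2] = est
print(round(est, 4), round(residual(filled, 0, 2), 12))
```

When the estimate is inserted and the AOV rerun as ordinary data, error df should be reduced by one for each estimated plot. This is why the GLM runs above show 38 error df while the analysis of the filled-in data shows 39, with the same error SS.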
PROC ANOVA;CLASSES REP CULT;MODEL YLD=REP CULT;
CLASS   LEVELS   VALUES
REP     4        1 2 3 4
CULT    14       Clarion, Clintland, Clintland 60, Clinton, Early Clinton, Fayette, Goodfield, Minhafee, Minton, Mo. O-205, Nehawka, Nemaha, Newton, Putman
DEPENDENT VARIABLE YLD
SOURCE DF SS MS F
TOTAL 55 19525.93
REP 3 221.73 73.91 1.135
CULT 13 16764.13 1289.55 19.800***
ERROR 39 2540.07 65.13
Note the estimate of the missing plot is the negative of the regression coefficient in model 1.
Compare the various AOV tables.
Define indicator variables MISS and MISS2, one for each of the two missing values.
PROC GLM;CLASSES REP CULT;MODEL YLD=REP CULT MISS MISS2;
DEPENDENT VARIABLE YLD
AOV TABLE
DF SS MS
TOTAL 55 24469.982
MODEL 18 22066.143 1225.897
ERROR 37 2403.840 64.969
R-SQUARED = 0.90176 MEAN = 52.768
EFFECTS ESTIMATE STD ERROR
...
MISS 1 -76.0277786254883 9.68728995184745
MISS2 1 -90.0277786254883 9.68728995184745
DEPENDENT VARIABLE YLD
PARTIAL SS DF SS MS
REP 3 272.745 90.915
CULT 13 16139.652 1241.512
MISS 1 4001.693 4001.693
MISS2 1 5611.154 5611.154
DROP MISSING OBSERVATIONS.
PROC GLM;CLASSES REP CULT;MODEL YLD=REP CULT;
DEPENDENT VARIABLE YLD
AOV TABLE
DF SS MS
TOTAL 53 18694.833
MODEL 16 16290.995 1018.187
ERROR 37 2403.838 64.969
R-SQUARED = 0.87142 MEAN = 54.722
DEPENDENT VARIABLE YLD
PARTIAL SS DF SS MS
REP 3 272.745 90.915
CULT 13 16139.650 1241.512
Estimates of the two missing plots: 90.02772 and 76.02777
PROC ANOVA;CLASSES REP CULT;MODEL YLD=REP CULT;
CLASS   LEVELS   VALUES
REP     4        1 2 3 4
CULT    14       Clarion, Clintland, Clintland 60, Clinton, Early Clinton, Fayette, Goodfield, Minhafee, Minton, Mo. O-205, Nehawka, Nemaha, Newton, Putman
DEPENDENT VARIABLE YLD
SOURCE DF SS MS F
TOTAL 55 20338.01
REP 3 293.22 97.74 1.586
CULT 13 17640.95 1357.00 22.016***
ERROR 39 2403.84 61.64
NOTE: compare the estimates of the missing plots with the regression coefficients.
Compare the various AOV tables.
RCBD example