PlSc 734 - FIELD DESIGN II

James J. Hammond

Course outline


No textbook required.
Ref books:

Homework required - Work together or alone.

Expected average on exams - 50%
The final grade in the class will be based on the average of the midterm exam and the final exam.


The following is a sample of a recent final exam.


Plsc 734

Spring 2005

Final Exam

 

1)      (25 pts)

a)      From your area of interest, provide a detailed example of the use of confounding with a 3x3x3 factorial.  I am interested in estimating all effects.  Indicate what the levels of each factor would be (3 varieties or whatever).  Detail the layout, randomization, and AOV.  I want to see which treatment combinations are assigned to each experimental unit for at least 1 replication that does not just confound a main effect.

b)      For each of the following four layouts, indicate which effects, if any, have been confounded with ranges.

 

1)

Range

 0    000 210 020 230 301 111 321 131 202 012 222 032 103 313 123 333 

 1    100 310 120 330 001 211 021 231 302 112 322 132 203 013 223 033 

 2    200 010 220 030 101 311 121 331 002 212 022 232 303 113 323 133 

 3    300 110 320 130 201 011 221 031 102 312 122 332 003 213 023 233 

 

2)

Range

 0    000 210 020 230 001 211 021 231 002 212 022 232 003 213 023 233 

 1    100 310 120 330 101 311 121 331 102 312 122 332 103 313 123 333 

 2    200 010 220 030 201 011 221 031 202 012 222 032 203 013 223 033 

 3    300 110 320 130 301 111 321 131 302 112 322 132 303 113 323 133 

 

3)

Range

 0    000 100 200 300 021 121 221 321 002 102 202 302 023 123 223 323 

 1    010 110 210 310 031 131 231 331 012 112 212 312 033 133 233 333 

 2    020 120 220 320 001 101 201 301 022 122 222 322 003 103 203 303 

 3    030 130 230 330 011 111 211 311 032 132 232 332 013 113 213 313 

 

4)

Range

 0    000 100 200 300 021 121 221 321 032 132 232 332 013 113 213 313 

 1    010 110 210 310 031 131 231 331 002 102 202 302 023 123 223 323 

 2    020 120 220 320 001 101 201 301 012 112 212 312 033 133 233 333 

 3    030 130 230 330 011 111 211 311 022 122 222 322 003 103 203 303 

                                                                       

 

 

 

 

 

 

2)       (50 pts)  Given the attached SAS statements and SAS output, and assuming that I am willing to test at the 10% probability level:

a)      Fill in the missing df, MS and F statistic.

b)      Can one regression of yield on myield be used for all data?  Explain!

c)      Do all CIs respond the same to myield?  Explain! Which ones might be different from say 1.0 or each other or ????

d)      Show how to present the information in a report, and provide a VERY brief statement to include.

 

 

3)  (25 pts) The following information was sent recently to a couple faculty members in plant science:

 

Good morning, gentlemen. Have a quick question for you. Am assisting in the analysis of some data for Dr. Carr here in Dickinson. Experiment was varietal comparison described as a randomized complete block design with 4 locations and multiple reps (primarily 4, not always for all sites, years and/or data) within each location with experiment conducted over 2 years. There are some vagaries to the actual varieties represented in each location/year. However, for the sake of getting started, we are working with just those varieties that are present in all locations every year. Below is an example of my proposed SAS statements to analyze the experiment as a split plot design with year and rep as random effects, and location as a fixed effect, in the whole plot and varieties in the split plot. I would appreciate your initial reaction regarding whether this is set up satisfactorily. Since one or both of you may see this analysis in a proposed publication, I thought it prudent to ask this question now as opposed to waiting to find the answer upon review.

thanks and look forward to any and all response. Sincerely, Cp

 
Proc GLM data=work;
classes year location rep variety;
model bua12 tw12 seedlb prot12 = year location year*location rep(year*location) variety variety*location;
test h=location e=year*location;
test h=year e=rep(year*location);
test h=year*location e=rep(year*location);
lsmeans location/stderr pdiff e=year*location;
lsmeans year*location/stderr pdiff e=rep(year*location);
lsmeans variety variety*location/stderr pdiff;
run;

 

a)  Key out the sources, df and F tests as proposed by Cp.

b)  Key out the sources, df and F tests you would recommend.

c)  Suggest what might be wrong in the assumptions made by Cp in setting up his analysis (i.e., why your assumptions?).

d)  What would you suggest to improve future work in this area, or to handle the real set of data with different varieties in each location/year?

 


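/* Read the yield data from the Excel workbook. */
PROC IMPORT OUT= WORK.one
            DATAFILE= "E:\urn2004\SD.xls"
            DBMS=EXCEL2000 REPLACE;
     GETNAMES=YES;

/* Sort by environment (location and date). */
PROC SORT;BY LOC DATE;

/* Drop five entries (FP1095 through FP1099) from the analysis. */
DATA THREE;SET ONE;IF CI="FP1095" OR CI="FP1096" OR CI="FP1097" OR CI="FP1098" OR CI="FP1099" THEN DELETE;

/* Environment mean yield (MYIELD) for each location-date combination. */
PROC MEANS NOPRINT;BY LOC DATE;VAR YIELD;OUTPUT OUT=REG MEAN=MYIELD;

/* Attach the environment mean to every observation. */
DATA TWO;MERGE THREE REG;BY LOC DATE;

/* One common regression of yield on the environment mean. */
PROC GLM DATA=TWO;MODEL YIELD=MYIELD;

/* Separate intercepts; slopes reported as deviations from the last CI level. */
PROC GLM;CLASS CI;MODEL YIELD=CI MYIELD CI*MYIELD/SOLUTION;

/* Separate intercepts and a separate slope for each CI. */
PROC GLM;CLASS CI;MODEL YIELD=CI MYIELD(CI)/SOLUTION;

RUN;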

 

SAS output:

                                       The GLM Procedure

 

Dependent Variable: YIELD   YIELD

                                              Sum of

      Source                      DF         Squares     Mean Square    F Value    Pr > F

 

      Model                              82694475.19                               <.0001

      Error                      376     14114729.42        37539.17

      Corrected Total                    96809204.61

 

 

                       R-Square     Coeff Var      Root MSE    YIELD Mean

                       0.854201      11.03201      193.7503      1756.256

 

      Source                      DF       Type I SS     Mean Square    F Value    Pr > F

 

      MYIELD                             82694475.19                               <.0001

 

      Source                      DF     Type III SS     Mean Square    F Value    Pr > F

 

      MYIELD                             82694475.19                               <.0001

 

                                                 Standard

               Parameter         Estimate           Error    t Value    Pr > |t|

               Intercept     -0.000000000     38.72325404      -0.00      1.0000

               MYIELD         1.000000000      0.02130611      46.93      <.0001

 

                                       The GLM Procedure

 

                                    Class Level Information

 

Class       Levels  Values

 

CI              40  CI 389 CI2522 CI2921 CI3096 CI3259 CI3270 CI3296 CI3297 CI3318 CI3327 CI3332

                    CI3353 CI3358 CI3397 CI3399 CI3404 CI3411 CI3423 CI3424 CI3425 FP1094 FP2024

                    FP2044 FP2102 FP2107 FP2112 FP2114 FP2118 FP2119 N0010 N2007 N2010 N2010B

                    N2010Y N2014 N305 N320 N323 N325 N9719

 

 

                                 Number of observations    378

 

                                       The GLM Procedure

 

Dependent Variable: YIELD   YIELD

 

                                              Sum of

      Source                      DF         Squares     Mean Square    F Value    Pr > F

 

      Model                              87129042.89                               <.0001

      Error                               9680161.72                                      

      Corrected Total            377     96809204.61

 

                       R-Square     Coeff Var      Root MSE    YIELD Mean

                       0.900008      10.26231      180.2325      1756.256

 

      Source                      DF       Type I SS     Mean Square    F Value    Pr > F

 

      CI                                  4667529.59                               <.0001

      MYIELD                             80721450.58                               <.0001

      MYIELD*CI                           1740062.72                               0.0762

 

      Source                      DF     Type III SS     Mean Square    F Value    Pr > F

 

      CI                                  1630283.41                               0.1270

      MYIELD                             70985188.11                               <.0001

      MYIELD*CI                           1740062.72                               0.0762

 

                                                      Standard

           Parameter                Estimate             Error    t Value    Pr > |t|

           Intercept             247.6285437 B     202.5515626       1.22      0.2225

           CI        CI 389      148.5076076 B     285.5318421       0.52      0.6034

           CI        CI2522     -429.5526375 B     285.5318421      -1.50      0.1335

           CI        CI2921     -685.2639745 B     285.5318421      -2.40      0.0170

           CI        CI3096     -283.5539956 B     337.4755132      -0.84      0.4015

           CI        CI3259     -292.2969653 B     296.4311958      -0.99      0.3249

           CI        CI3270      -62.3443535 B     429.1387825      -0.15      0.8846

           CI        CI3296     -292.8125046 B     429.1387825      -0.68      0.4956

           CI        CI3297     -337.3309699 B     295.7373148      -1.14      0.2549

           CI        CI3318     -304.6532979 B     429.1387825      -0.71      0.4783

           CI        CI3327     -149.5784968 B     286.4511670      -0.52      0.6019

           CI        CI3332       12.7546216 B     429.1387825       0.03      0.9763

           CI        CI3353     -237.8931342 B     296.4311958      -0.80      0.4229

           CI        CI3358      -45.7840100 B     296.4311958      -0.15      0.8774

           CI        CI3397     -435.2179895 B     286.4511670      -1.52      0.1297

           CI        CI3399     -881.4329535 B     429.1387825      -2.05      0.0409

           CI        CI3404     -444.0878790 B     296.4311958      -1.50      0.1352

           CI        CI3411     -429.6075402 B     286.4511670      -1.50      0.1347

           CI        CI3423     -202.8770546 B     296.4311958      -0.68      0.4943

           CI        CI3424      -81.0557309 B     286.4511670      -0.28      0.7774

           CI        CI3425     -504.8586893 B     286.4511670      -1.76      0.0790

           CI        FP1094     -231.8712486 B     347.5157316      -0.67      0.5051

           CI        FP2024      422.7024226 B     295.7373148       1.43      0.1540

           CI        FP2044     -170.6113760 B     286.4511670      -0.60      0.5519

           CI        FP2102     -230.1674082 B     359.1959950      -0.64      0.5222

           CI        FP2107     -381.4768045 B     359.1959950      -1.06      0.2891

           CI        FP2112     -109.2959995 B     285.5318421      -0.38      0.7022

           CI        FP2114     -586.3033210 B     285.5318421      -2.05      0.0409

           CI        FP2118     -534.7015999 B     285.5318421      -1.87      0.0621

           CI        FP2119     -236.5543681 B     285.5318421      -0.83      0.4081

           CI        N0010       -61.5452902 B     286.4511670      -0.21      0.8300

           CI        N2007      -289.5237181 B     317.8808187      -0.91      0.3631

           CI        N2010      -169.5133808 B     317.8808187      -0.53      0.5943

           CI        N2010B     -757.4554914 B     337.4755132      -2.24      0.0255

           CI        N2010Y     -839.6474827 B     337.4755132      -2.49      0.0134

           CI        N2014      -218.6600408 B     317.8808187      -0.69      0.4921

           CI        N305        163.3258906 B     317.8808187       0.51      0.6078

           CI        N320         72.8544734 B     317.8808187       0.23      0.8189

           CI        N323       -301.6254559 B     317.8808187      -0.95      0.3435

           CI        N325       -119.0291690 B     317.8808187      -0.37      0.7083

           CI        N9719         0.0000000 B        .               .         .

           MYIELD                  0.8615456 B       0.1115983       7.72      <.0001

           MYIELD*CI CI 389       -0.1825083 B       0.1562952      -1.17      0.2439

           MYIELD*CI CI2522        0.2131881 B       0.1562952       1.36      0.1736

           MYIELD*CI CI2921        0.4120246 B       0.1562952       2.64      0.0088

           MYIELD*CI CI3096        0.1997460 B       0.1976688       1.01      0.3131

           MYIELD*CI CI3259        0.0657280 B       0.1613117       0.41      0.6840

           MYIELD*CI CI3270        0.1048935 B       0.2144361       0.49      0.6251

           MYIELD*CI CI3296        0.1522544 B       0.2144361       0.71      0.4782

           MYIELD*CI CI3297        0.1758160 B       0.1599726       1.10      0.2726

           MYIELD*CI CI3318        0.1697239 B       0.2144361       0.79      0.4293

           MYIELD*CI CI3327        0.0482491 B       0.1578238       0.31      0.7600

           MYIELD*CI CI3332       -0.0545162 B       0.2144361      -0.25      0.7995

           MYIELD*CI CI3353        0.1956562 B       0.1613117       1.21      0.2261

           MYIELD*CI CI3358        0.0834474 B       0.1613117       0.52      0.6053

           MYIELD*CI CI3397        0.2884224 B       0.1578238       1.83      0.0686

           MYIELD*CI CI3399        0.4382993 B       0.2144361       2.04      0.0418

           MYIELD*CI CI3404        0.2087015 B       0.1613117       1.29      0.1967

           MYIELD*CI CI3411        0.2698343 B       0.1578238       1.71      0.0884

           MYIELD*CI CI3423        0.1541363 B       0.1613117       0.96      0.3401

           MYIELD*CI CI3424        0.1066430 B       0.1578238       0.68      0.4997

           MYIELD*CI CI3425        0.2950127 B       0.1578238       1.87      0.0626

           MYIELD*CI FP1094        0.1431050 B       0.1871151       0.76      0.4450

           MYIELD*CI FP2024       -0.1615169 B       0.1599726      -1.01      0.3135

           MYIELD*CI FP2044        0.0613072 B       0.1578238       0.39      0.6980

           MYIELD*CI FP2102        0.0712855 B       0.2042852       0.35      0.7274

           MYIELD*CI FP2107        0.2230787 B       0.2042852       1.09      0.2757

           MYIELD*CI FP2112        0.0887166 B       0.1562952       0.57      0.5707

           MYIELD*CI FP2114        0.2517335 B       0.1562952       1.61      0.1083

           MYIELD*CI FP2118        0.2340180 B       0.1562952       1.50      0.1354

           MYIELD*CI FP2119        0.1132422 B       0.1562952       0.72      0.4693

           MYIELD*CI N0010         0.0601058 B       0.1578238       0.38      0.7036

           MYIELD*CI N2007         0.1220413 B       0.1803941       0.68      0.4992

           MYIELD*CI N2010         0.0603191 B       0.1803941       0.33      0.7383

           MYIELD*CI N2010B        0.4893616 B       0.1976688       2.48      0.0139

           MYIELD*CI N2010Y        0.5058902 B       0.1976688       2.56      0.0110

           MYIELD*CI N2014         0.0864697 B       0.1803941       0.48      0.6320

           MYIELD*CI N305         -0.1092757 B       0.1803941      -0.61      0.5451

           MYIELD*CI N320         -0.0177345 B       0.1803941      -0.10      0.9218

           MYIELD*CI N323          0.2584353 B       0.1803941       1.43      0.1530

           MYIELD*CI N325          0.1099526 B       0.1803941       0.61      0.5426

           MYIELD*CI N9719         0.0000000 B        .               .         .

 

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve

      the normal equations.  Terms whose estimates are followed by the letter 'B' are not

      uniquely estimable.

                                       The GLM Procedure

 

                                    Class Level Information

 

Class       Levels  Values

 

CI              40  CI 389 CI2522 CI2921 CI3096 CI3259 CI3270 CI3296 CI3297 CI3318 CI3327 CI3332

                    CI3353 CI3358 CI3397 CI3399 CI3404 CI3411 CI3423 CI3424 CI3425 FP1094 FP2024

                    FP2044 FP2102 FP2107 FP2112 FP2114 FP2118 FP2119 N0010 N2007 N2010 N2010B

                    N2010Y N2014 N305 N320 N323 N325 N9719

                                       The GLM Procedure

 

Dependent Variable: YIELD   YIELD

 

                                              Sum of

      Source                      DF         Squares     Mean Square    F Value    Pr > F

 

      Model                              87129042.89                               <.0001

 

      Error                               9680161.72                                     

 

      Corrected Total                    96809204.61

 

                       R-Square     Coeff Var      Root MSE    YIELD Mean

                       0.900008      10.26231      180.2325      1756.256

 

      Source                      DF       Type I SS     Mean Square    F Value    Pr > F

 

      CI                                  4667529.59                               <.0001

      MYIELD(CI)                         82461513.30                               <.0001

 

      Source                      DF     Type III SS     Mean Square    F Value    Pr > F

 

      CI                                  1630283.41                               0.1270

      MYIELD(CI)                         82461513.30                               <.0001

 

 

                                                      Standard

          Parameter                 Estimate             Error    t Value    Pr > |t|

 

          Intercept              247.6285437 B     202.5515626       1.22      0.2225

          CI         CI 389      148.5076076 B     285.5318421       0.52      0.6034

          CI         CI2522     -429.5526375 B     285.5318421      -1.50      0.1335

          CI         CI2921     -685.2639746 B     285.5318421      -2.40      0.0170

          CI         CI3096     -283.5539956 B     337.4755132      -0.84      0.4015

          CI         CI3259     -292.2969653 B     296.4311958      -0.99      0.3249

          CI         CI3270      -62.3443535 B     429.1387825      -0.15      0.8846

          CI         CI3296     -292.8125046 B     429.1387825      -0.68      0.4956

          CI         CI3297     -337.3309699 B     295.7373148      -1.14      0.2549

          CI         CI3318     -304.6532979 B     429.1387825      -0.71      0.4783

          CI         CI3327     -149.5784968 B     286.4511670      -0.52      0.6019

          CI         CI3332       12.7546216 B     429.1387825       0.03      0.9763

          CI         CI3353     -237.8931342 B     296.4311958      -0.80      0.4229

          CI         CI3358      -45.7840100 B     296.4311958      -0.15      0.8774

          CI         CI3397     -435.2179895 B     286.4511670      -1.52      0.1297

          CI         CI3399     -881.4329535 B     429.1387825      -2.05      0.0409

          CI         CI3404     -444.0878790 B     296.4311958      -1.50      0.1352

          CI         CI3411     -429.6075402 B     286.4511670      -1.50      0.1347

          CI         CI3423     -202.8770546 B     296.4311958      -0.68      0.4943

          CI         CI3424      -81.0557310 B     286.4511670      -0.28      0.7774

          CI         CI3425     -504.8586894 B     286.4511670      -1.76      0.0790

          CI         FP1094     -231.8712486 B     347.5157316      -0.67      0.5051

          CI         FP2024      422.7024226 B     295.7373148       1.43      0.1540

          CI         FP2044     -170.6113760 B     286.4511670      -0.60      0.5519

          CI         FP2102     -230.1674082 B     359.1959950      -0.64      0.5222

          CI         FP2107     -381.4768045 B     359.1959950      -1.06      0.2891

          CI         FP2112     -109.2959996 B     285.5318421      -0.38      0.7022

          CI         FP2114     -586.3033210 B     285.5318421      -2.05      0.0409

          CI         FP2118     -534.7015999 B     285.5318421      -1.87      0.0621

          CI         FP2119     -236.5543681 B     285.5318421      -0.83      0.4081

          CI         N0010       -61.5452902 B     286.4511670      -0.21      0.8300

          CI         N2007      -289.5237181 B     317.8808187      -0.91      0.3631

          CI         N2010      -169.5133808 B     317.8808187      -0.53      0.5943

          CI         N2010B     -757.4554914 B     337.4755132      -2.24      0.0255

          CI         N2010Y     -839.6474827 B     337.4755132      -2.49      0.0134

          CI         N2014      -218.6600408 B     317.8808187      -0.69      0.4921

          CI         N305        163.3258906 B     317.8808187       0.51      0.6078

          CI         N320         72.8544734 B     317.8808187       0.23      0.8189

          CI         N323       -301.6254559 B     317.8808187      -0.95      0.3435

          CI         N325       -119.0291690 B     317.8808187      -0.37      0.7083

          CI         N9719         0.0000000 B        .               .         .

          MYIELD(CI) CI 389        0.6790373         0.1094259       6.21      <.0001

          MYIELD(CI) CI2522        1.0747337         0.1094259       9.82      <.0001

          MYIELD(CI) CI2921        1.2735701         0.1094259      11.64      <.0001

          MYIELD(CI) CI3096        1.0612916         0.1631526       6.50      <.0001

          MYIELD(CI) CI3259        0.9272735         0.1164787       7.96      <.0001

          MYIELD(CI) CI3270        0.9664390         0.1831084       5.28      <.0001

          MYIELD(CI) CI3296        1.0137999         0.1831084       5.54      <.0001

          MYIELD(CI) CI3297        1.0373615         0.1146171       9.05      <.0001

          MYIELD(CI) CI3318        1.0312695         0.1831084       5.63      <.0001

          MYIELD(CI) CI3327        0.9097946         0.1115983       8.15      <.0001

          MYIELD(CI) CI3332        0.8070294         0.1831084       4.41      <.0001

          MYIELD(CI) CI3353        1.0572017         0.1164787       9.08      <.0001

          MYIELD(CI) CI3358        0.9449930         0.1164787       8.11      <.0001

          MYIELD(CI) CI3397        1.1499679         0.1115983      10.30      <.0001

          MYIELD(CI) CI3399        1.2998448         0.1831084       7.10      <.0001

          MYIELD(CI) CI3404        1.0702471         0.1164787       9.19      <.0001

          MYIELD(CI) CI3411        1.1313799         0.1115983      10.14      <.0001

          MYIELD(CI) CI3423        1.0156819         0.1164787       8.72      <.0001

          MYIELD(CI) CI3424        0.9681886         0.1115983       8.68      <.0001

          MYIELD(CI) CI3425        1.1565583         0.1115983      10.36      <.0001

          MYIELD(CI) FP1094        1.0046505         0.1501929       6.69      <.0001

          MYIELD(CI) FP2024        0.7000287         0.1146171       6.11      <.0001

          MYIELD(CI) FP2044        0.9228528         0.1115983       8.27      <.0001

          MYIELD(CI) FP2102        0.9328311         0.1711090       5.45      <.0001

          MYIELD(CI) FP2107        1.0846242         0.1711090       6.34      <.0001

          MYIELD(CI) FP2112        0.9502622         0.1094259       8.68      <.0001

          MYIELD(CI) FP2114        1.1132791         0.1094259      10.17      <.0001

          MYIELD(CI) FP2118        1.0955636         0.1094259      10.01      <.0001

          MYIELD(CI) FP2119        0.9747877         0.1094259       8.91      <.0001

          MYIELD(CI) N0010         0.9216513         0.1115983       8.26      <.0001

          MYIELD(CI) N2007         0.9835869         0.1417316       6.94      <.0001

          MYIELD(CI) N2010         0.9218646         0.1417316       6.50      <.0001

          MYIELD(CI) N2010B        1.3509072         0.1631526       8.28      <.0001

          MYIELD(CI) N2010Y        1.3674357         0.1631526       8.38      <.0001

          MYIELD(CI) N2014         0.9480153         0.1417316       6.69      <.0001

          MYIELD(CI) N305          0.7522699         0.1417316       5.31      <.0001

          MYIELD(CI) N320          0.8438111         0.1417316       5.95      <.0001

          MYIELD(CI) N323          1.1199809         0.1417316       7.90      <.0001

          MYIELD(CI) N325          0.9714982         0.1417316       6.85      <.0001

          MYIELD(CI) N9719         0.8615456         0.1115983       7.72      <.0001

 

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve

      the normal equations.  Terms whose estimates are followed by the letter 'B' are not

      uniquely estimable.


The following are some lecture notes.



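Regression - AOV - Covariance

Regression model
$Y_i = \mu_y + \beta (X_i - \mu_x) + e_i$

AOV - design model
$Y_i = \mu_y + t_i + e_i$

Covariance model
$Y_i = \mu_y + t_i + \beta (X_i - \mu_x) + e_i$

Basic idea
$Y_i = \mu_y + \beta (X_i - \mu_x) + e_i$
Minimize $\sum e_i^2 = \sum [Y_i - \mu_y - \beta (X_i - \mu_x)]^2$
The partial derivative with respect to $\mu_y$ is $-2 \sum [Y_i - \mu_y - \beta (X_i - \mu_x)]$.
Setting it equal to zero:
$\sum [Y_i - \mu_y - \beta (X_i - \mu_x)] = 0$
$\sum Y_i - n \mu_y - 0 = 0$   (because $\sum (X_i - \mu_x) = 0$)
$\mu_y = \sum Y_i / n$
The partial derivative with respect to $\beta$ is $-2 \sum (X_i - \mu_x)[Y_i - \mu_y - \beta (X_i - \mu_x)]$.
Setting it equal to zero, and writing $x = X_i - \mu_x$ and $y = Y_i - \mu_y$:
$\sum (X_i - \mu_x)[Y_i - \mu_y - \beta (X_i - \mu_x)] = 0$
$\sum x (y - \beta x) = 0$
$\sum xy - \beta \sum x^2 = 0$
$\beta = \sum xy / \sum x^2$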



observed = mean + regr + dev

Sum of squares

Mean   $(\sum Y)^2 / n$
Regr   $(\sum xy)^2 / \sum x^2$
Dev    $\sum Y^2$ - mean - regr

Mean squared deviation from regression
$S^2_{y.x}$ = dev sum of squares / (n - 2)
Variance of estimated $\beta$: $S^2_b = S^2_{y.x} / \sum x^2$
Test of H: $\beta = \beta_0$
$t = (b - \beta_0) / S_b$
Test of H: $\beta_1 = \beta_2$ between two regression coefficients
$t = (b_1 - b_2) / S_{b_1 - b_2}$
$S^2_{b_1 - b_2} = S^2_p (1/\sum x_1^2 + 1/\sum x_2^2)$
$S^2_p$ = [SS dev $Y_1$ + SS dev $Y_2$] / ($n_1 + n_2 - 4$)

Correlation

$r = \sum xy / \sqrt{\sum x^2 \sum y^2}$
$r^2$ tells us the relative amount of variation in common.
An example

 Age (X)   Blood pressure (Y)      x      y    x^2    y^2     xy
    35            114            -20    -27    400    729    540
    45            124            -10    -17    100    289    170
    55            143              0      2      0      4      0
    65            158             10     17    100    289    170
    75            166             20     25    400    625    500
   ---            ---            ---    ---   ----   ----   ----
   275            705              0      0   1000   1936   1380

$b = \sum xy / \sum x^2 = 1380/1000 = 1.38$
$\mu_x = 55$, $\mu_y = 141$
$Y = 65.1 + 1.38 X$

Predicted blood pressure

 Age      Y    Predicted     Dev    Dev squared
  35    114      113.4       0.6        0.36
  45    124      127.2      -3.2       10.24
  55    143      141.0       2.0        4.00
  65    158      154.8       3.2       10.24
  75    166      168.6      -2.6        6.76
 Sum                         0         31.60

$S^2_{y.x} = 31.6 / 3 = 10.53$,  $S^2_b = 10.53/1000 = 0.01053$,  $S_b = 0.103$

H: $\beta = 0$

t = 1.38 / 0.103 = 13.4 *

Source      df       SS        MS        F
Total        5    101341
Mean         1     99405
Corr Tot     4      1936
Regress      1    1904.4    1904.4    180.8
Dev          3      31.6     10.53
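
As a cross-check on the hand computation above, the following Python sketch recomputes the slope, intercept, and AOV sums of squares for the age and blood pressure example. The function name `simple_regression` and the dictionary keys are illustrative choices and are not part of the course material.

```python
# Blood-pressure example from the notes: age (X) and blood pressure (Y).
ages = [35, 45, 55, 65, 75]
pressures = [114, 124, 143, 158, 166]


def simple_regression(x_values, y_values):
    """Return slope, intercept and the sums of squares used in the AOV table."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    # Corrected sums of squares and cross products (lowercase x, y in the notes).
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    syy = sum((y - mean_y) ** 2 for y in y_values)
    slope = sxy / sxx
    ss_regression = sxy ** 2 / sxx
    ss_deviation = syy - ss_regression
    ms_deviation = ss_deviation / (n - 2)
    return {
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "ss_mean": sum(y_values) ** 2 / n,
        "ss_regression": ss_regression,
        "ss_deviation": ss_deviation,
        "ms_deviation": ms_deviation,
        "f_value": ss_regression / ms_deviation,
        "se_slope": (ms_deviation / sxx) ** 0.5,
    }


fit = simple_regression(ages, pressures)
for name, value in fit.items():
    print(f"{name:>14s} = {value:.4f}")
```

The printed values can be compared line by line with the Mean, Regress, and Dev rows of the table.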


Intercept: $a = \mu_y - \beta \mu_x$

Variance of intercept = Var($\mu_y$) + Var($\beta \mu_x$)

= $S^2_{y.x} (1/n) + \mu_x^2 S^2_{y.x} (1/\sum x^2)$

= $S^2_{y.x} (1/n + \mu_x^2 / \sum x^2) = S^2_a$
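
To make the intercept variance concrete, here is a small Python sketch that plugs the age and blood pressure example into the formula. The variable names are illustrative; the inputs (n = 5, mean of X = 55, corrected sum of squares of X = 1000, deviation SS = 31.6 on 3 df) are the values worked out earlier in these notes.

```python
# Quantities taken from the age / blood pressure example in the notes.
n_obs = 5
mean_x = 55.0
corrected_ss_x = 1000.0          # sum of squared deviations of X
ms_deviation = 31.6 / 3          # S^2 y.x, deviation mean square

# Variance of the intercept: S^2 y.x * (1/n + mean_x^2 / sum x^2)
var_intercept = ms_deviation * (1 / n_obs + mean_x ** 2 / corrected_ss_x)
se_intercept = var_intercept ** 0.5

print(f"variance of intercept = {var_intercept:.2f}")
print(f"standard error of intercept = {se_intercept:.2f}")
```

The second term dominates here because the ages are far from zero, so the intercept is estimated much less precisely than the mean.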



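Test procedures in simple linear regression

Hypothesis                                   Statistic   Equation
$\alpha = \alpha_0$                          t           $(a - \alpha_0) / S_a$
$\beta = \beta_0$                            t           $(b - \beta_0) / S_b$
$\alpha = \alpha_0$ and $\beta = \beta_0$    F           $[n (a - \alpha_0)^2 + 2 n \mu_x (a - \alpha_0)(b - \beta_0) + (b - \beta_0)^2 \sum X^2] / (2 S^2_{y.x})$

In the F statistic, $\sum X^2$ is the uncorrected sum of squares of X.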




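Rerun of example with matrix approach - like a computer package would solve a linear regression problem

$Y = \mu + (X - \mu_x) b$

114 = 1µ + (35-55)b     (x = -20)
124 = 1µ + (45-55)b     (x = -10)
143 = 1µ + (55-55)b     (x = 0)
158 = 1µ + (65-55)b     (x = 10)
166 = 1µ + (75-55)b     (x = 20)

$X'X = \begin{bmatrix} 5 & 0 \\ 0 & 1000 \end{bmatrix}$     (rows and columns ordered µ, x)

$(X'X)^{-1} = \begin{bmatrix} 1/5 & 0 \\ 0 & 1/1000 \end{bmatrix}$

$X'Y = \begin{bmatrix} 705 \\ 1380 \end{bmatrix}$

$(X'X)^{-1} X'Y = \begin{bmatrix} (1/5)(705) + 0(1380) \\ 0(705) + 1380/1000 \end{bmatrix} = \begin{bmatrix} 141 \\ 1.38 \end{bmatrix}$

Mean of Y = 705/5 = 141

β = 1380/1000 = 1.38

SS due to the model = 141 * 705 (mean) + 1.380 * 1380 (regression) = 99405 + 1904.4 = 101309.4

Deviation from regression = total - SS due to model = 101341 - 101309.4 = 31.6

Variance of each estimate = corresponding diagonal element of the inverse * deviation mean square

1/5 * 10.53 for the mean

1/1000 * 10.53 for β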
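
The matrix steps can be traced in a few lines of plain Python. This is only a sketch of the normal-equations approach for the centered model with two parameters; the helper names `transpose_times` and `invert_2x2` are made up for illustration, and a real package would use a general matrix routine instead.

```python
ages = [35, 45, 55, 65, 75]
pressures = [114, 124, 143, 158, 166]
mean_age = sum(ages) / len(ages)

# Design matrix: a column of ones for the mean and a centered age column.
design = [[1.0, age - mean_age] for age in ages]
y_column = [[float(y)] for y in pressures]


def transpose_times(a, b):
    """Return A'B for matrices stored as lists of rows."""
    return [[sum(a[r][i] * b[r][j] for r in range(len(a)))
             for j in range(len(b[0]))]
            for i in range(len(a[0]))]


def invert_2x2(m):
    """Invert a 2 x 2 matrix with the adjugate formula."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


xtx = transpose_times(design, design)      # X'X
xty = transpose_times(design, y_column)    # X'Y
xtx_inv = invert_2x2(xtx)

# Solution of the normal equations: (X'X)^-1 X'Y = [mean, slope].
estimates = [sum(xtx_inv[i][j] * xty[j][0] for j in range(2)) for i in range(2)]

ss_total = sum(y * y for y in pressures)
ss_model = sum(estimates[i] * xty[i][0] for i in range(2))
ss_deviation = ss_total - ss_model
ms_deviation = ss_deviation / (len(ages) - 2)

# Variance of each estimate: diagonal inverse element times deviation mean square.
variances = [xtx_inv[i][i] * ms_deviation for i in range(2)]

print("estimates:", estimates)
print("deviation SS:", round(ss_deviation, 2))
print("variances:", [round(v, 5) for v in variances])
```

Centering X makes X'X diagonal, which is why the mean and slope are estimated independently in this example.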








An example with 2 groups

   Group 1        Group 2
   X     Y        X     Y
  30   165       24   180
  27   170       31   169
  20   130       20   171
  21   156       26   161
  33   167       20   180
  29   151       25   170

Group 1: β1 = 1.995, SS dev = 566.83

Group 2: β2 = -0.852, SS dev = 200.95

H: β1 = β2

$t = [1.995 - (-0.852)] / \{[(566.83 + 200.95)/(6 + 6 - 4)][1/133.33 + 1/85.33]\}^{1/2} = 2.096$, with 6 + 6 - 4 = 8 df

$t_{8, .05} = 2.306$

$t_{8, .20} = 1.397$
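
The comparison of the two slopes can be reproduced from the raw data with a short Python sketch. The function names `slope_and_deviation` and `compare_slopes` are hypothetical; the pooled variance uses n1 + n2 - 4 degrees of freedom as in the formula above.

```python
# (X, Y) pairs for the two groups in the example.
group_1 = [(30, 165), (27, 170), (20, 130), (21, 156), (33, 167), (29, 151)]
group_2 = [(24, 180), (31, 169), (20, 171), (26, 161), (20, 180), (25, 170)]


def slope_and_deviation(pairs):
    """Return n, corrected SS of X, slope, and SS of deviations from regression."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return n, sxx, sxy / sxx, syy - sxy ** 2 / sxx


def compare_slopes(first, second):
    """t statistic and degrees of freedom for H: slope_1 = slope_2."""
    n1, sxx1, b1, dev1 = slope_and_deviation(first)
    n2, sxx2, b2, dev2 = slope_and_deviation(second)
    df = n1 + n2 - 4
    pooled_ms = (dev1 + dev2) / df
    se_difference = (pooled_ms * (1 / sxx1 + 1 / sxx2)) ** 0.5
    return (b1 - b2) / se_difference, df


t_value, t_df = compare_slopes(group_1, group_2)
print(f"t = {t_value:.3f} with {t_df} df")
```

Squaring this t gives the F statistic for the X*GROUP term when the same data are analyzed with a model containing GROUP, X, and X*GROUP.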

Some questions that we might ask:

1. Can one regression be used for all observations?

2. If one regression cannot be used, is the regression coefficient the same within each group (β1 = β2)?

SAP results - PROC GLM;MODEL Y=X;

DEPENDENT VARIABLE Y

AOV TABLE

            DF          SS             MS
TOTAL       11      2065.66667
MODEL        1        31.47234       31.47234
ERROR       10      2034.19433      203.41943

R-SQUARED = 0.01524     MEAN = 164.16667

EFFECTS           ESTIMATE              STD ERROR
INTERCEPT         154.834762573242
X        1        .36595743894577       .930384118119818

DEPENDENT VARIABLE Y

PARTIAL SS     DF        SS            MS
X               1     31.47234      31.47234

SAP output -- PROC GLM;CLASSES GROUP;MODEL Y=GROUP X X*GROUP;

CLASS     LEVELS
GROUP     2         1 2

DEPENDENT VARIABLE Y

AOV TABLE

            DF          SS             MS
TOTAL       11      2065.66667
MODEL        3      1297.88135      432.62712
ERROR        8       767.78532       95.97316

R-SQUARED = 0.62831     MEAN = 164.16667

EFFECTS           ESTIMATE              STD ERROR
INTERCEPT         147.927352905273
GROUP    1        -44.6273307800293     17.3909942979608
X        1        .571718752384186      .679058955921065
X*GROUP  1        1.42328178882599      .679058955921065

DEPENDENT VARIABLE Y

PARTIAL SS     DF        SS             MS
GROUP           1     631.97925      631.97925
X               1      68.02988       68.02988
X*GROUP         1     421.61557      421.61557

Question #1:

F = [(2034.19433 - 767.78532)/2] / 95.97316 = 6.598

$F_{2,8,0.05} = 4.46$

Significant ===> one regression cannot be used for all observations.

Question #2:

F = 421.61557 / 95.97316 = 4.393

$F_{1,8,0.05} = 5.32$

Nonsignificant ===> no evidence that the regression coefficients differ between groups.

Treatments

      1              2              3              4
   X     Y        X     Y        X     Y        X     Y
  30   165       24   180       34   156       41   201
  27   170       31   169       32   189       32   173
  20   130       20   171       35   138       30   200
  21   156       26   161       35   190       35   193
  33   167       20   180       30   160       28   142
  29   151       25   170       29   172       36   189
 ---   ---      ---  ----      ---  ----      ---  ----
 160   939      146  1031      195  1005      202  1098

Grand totals: X = 703, Y = 4073

Comparison among groups

                                                Deviations from regression
Group     df      xx         xy        yy          SS       df       MS
1          5    133.33     266.0     1097.5      566.83      4
2          5     85.33     -72.67     262.8      200.95      4
3          5     33.5      -42.5     2047.5     1993.58      4
4          5    109.33     346.0     2530.0     1435.04      4
Sum                                             4196.4      16     262.275
Within    20    361.5      496.8     5937.8     5255        19     276.579
Among      3    365.459    451.2     2163.1     1606.04      2     803.023
Total     23    726.959    948       8100.9     6864.6      22     312.027

β1 = 1.995   β2 = -0.852   β3 = -1.269   β4 = 3.164

βt = 1.304   βw = 1.374   βm = 1.235



Questions:

1. Can one regression line be used for all observations?

F = [(6864.6 - 4196.4)/6] / 262.275 = 1.696     if nonsignificant, then stop

2. Are the regression coefficients the same for all groups?

F = [(5255 - 4196.4)/3] / 262.275 = 1.345     if significant, then stop

3. Is the regression of group means linear?

F = 803.023 / 276.579 = 2.903     if significant, then stop

4. Is the regression of group means the same as the within-group regression?

F = (6864.6 - 5255 - 1606.04) / 276.579 = 0.013

Covariance

Deviations about regression

Source df x2 xy y2 df SS MS

Trt 3 365.459 451.2 2163.1

Error 20 361.5 496.8 5937.8 19 5255.01 276.579

T + E 23 726.959 948 8100.9 22 6864.61

trt adjusted 3 1609.6 536.53



F test adjusted treatments = 536.53 / 276.579 = 1.94 non sign 5%

Error regression coefficient = 496.8 / 361.5 = 1.374

Adjusted mean of Y = Unadjusted mean of Y + regression coefficient * deviation in X

Treatment(group)   meanx   devx    meany    adj mean
1                  26.67   -2.62   156.5    160.1
2                  24.33   -4.96   171.83   178.65
3                  32.50    3.21   167.5    163.09
4                  33.67    4.38   183.0    176.98
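The adjusted-treatment F test and the adjusted means can be checked with a short calculation. This is a sketch in Python (not SAS), using only the totals and sums of squares quoted in these notes; the variable names are illustrative. The deviation in X is taken as treatment mean minus overall mean, so the adjustment is subtracted, which reproduces the tabled adjusted means.

```python
# Treatment totals (6 observations each) taken from the data table.
n = 6
x_totals = {1: 160, 2: 146, 3: 195, 4: 202}
y_totals = {1: 939, 2: 1031, 3: 1005, 4: 1098}

# Error (within-treatment) regression coefficient from the covariance table.
b_error = 496.8 / 361.5

# F test for adjusted treatments: (T+E deviations - Error deviations) / 3 df,
# tested against the Error mean square about regression.
ss_trt_adjusted = 6864.61 - 5255.01
f_adjusted = (ss_trt_adjusted / 3) / 276.579

grand_mean_x = sum(x_totals.values()) / (n * len(x_totals))

adjusted = {}
for trt in x_totals:
    mean_x = x_totals[trt] / n
    mean_y = y_totals[trt] / n
    dev_x = mean_x - grand_mean_x
    # Move each treatment mean to the common X value (the overall mean of X).
    adjusted[trt] = mean_y - b_error * dev_x
    print(trt, round(mean_x, 2), round(dev_x, 2),
          round(mean_y, 2), round(adjusted[trt], 2))

print("F for adjusted treatments:", round(f_adjusted, 2))
```

Small differences in the last decimal place relative to the table can arise because the table rounds devx to two decimals before multiplying.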

Regression with several groups

                                     Deviations
Group     df     SSx   CPxy   SSy    from regr            df      MS
1         n1-1   A1    B1     C1     C1-B1*B1/A1          n1-2
2         n2-1   A2    B2     C2     C2-B2*B2/A2          n2-2
.
k         nk-1   Ak    Bk     Ck     Ck-Bk*Bk/Ak          nk-2
Sum                                  S1                   N-2k    MS1
Within    N-k    Aw    Bw     Cw     S1+S2=Cw-Bw*Bw/Aw    N-k-1   MS1+2
Among     k-1    Am    Bm     Cm     S3=Cm-Bm*Bm/Am       k-2     MS3
Total     N-1    At    Bt     Ct     St=Ct-Bt*Bt/At       N-2     MSt

Questions and proper tests:

1.  Can one regression line be used for all observations?

F= [(St - S1)/2(k-1)] / MS1

2.  Are the regression coefficients the same for all groups?

F= MS2 / MS1, where MS2 = S2/(k-1)

3.  Deviations from regression of group means the same as error?

F= MS3 / MS1+2

4.  Regression coefficient for among groups the same as within groups?

F = MS4 / MS1+2, where MS4 = S4/1 and S4 = St - S1 - S2 - S3

Note St = S1 + S2 + S3 + S4
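The four tests can be collected into one small routine. The sketch below is in Python (not SAS) and the function name `regression_group_tests` is hypothetical; it takes the four deviation sums of squares from the table above and returns each F ratio with its degrees of freedom. It is applied to the four-treatment example worked earlier (St = 6864.6, S1 = 4196.4, S1+S2 = 5255, S3 = 1606.04, N = 24, k = 4).

```python
def regression_group_tests(St, S1, S12, S3, N, k):
    """Return {question: (F, (numerator df, denominator df))}.

    St  : deviations from one line fitted to all N observations
    S1  : sum of deviations from a separate line in each of k groups
    S12 : deviations from the pooled within-group line (S1 + S2)
    S3  : deviations from the regression of group means
    """
    MS1 = S1 / (N - 2 * k)
    MS12 = S12 / (N - k - 1)
    S2 = S12 - S1        # heterogeneity of slopes
    S4 = St - S12 - S3   # among-group slope versus within-group slope
    return {
        "one line for all": ((St - S1) / (2 * (k - 1)) / MS1,
                             (2 * (k - 1), N - 2 * k)),
        "equal slopes": (S2 / (k - 1) / MS1, (k - 1, N - 2 * k)),
        "group means linear": (S3 / (k - 2) / MS12, (k - 2, N - k - 1)),
        "among slope = within slope": (S4 / 1 / MS12, (1, N - k - 1)),
    }


tests = regression_group_tests(St=6864.6, S1=4196.4, S12=5255.0,
                               S3=1606.04, N=24, k=4)
for question, (f_ratio, df) in tests.items():
    print(f"{question}: F = {f_ratio:.3f} with df {df}")
```

The F ratios agree with the worked example: 1.696, 1.345, 2.903 and 0.013.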

SAS statements to obtain necessary information:

PROC GLM;MODEL Y=X;

deviation from the model is St

PROC GLM;CLASSES TRT;MODEL Y=TRT X X*TRT;

deviation from the model is S1

SS due to X*TRT is S2

PROC GLM;ABSORB TRT;MODEL Y=X; or PROC GLM;CLASSES TRT;MODEL Y=TRT X;

deviation from the model is S1 + S2

PROC MEANS NOPRINT;BY TRT;VAR X Y;OUTPUT OUT=NEW MEAN=X Y N=NUM;

PROC GLM;WEIGHT NUM;MODEL Y=X;

deviation from the model is S3
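The last pair of SAS steps (treatment means, then a regression of the means weighted by the number of observations) can be mimicked by hand. The following is a sketch in Python (not SAS) for the four-treatment example, using the treatment totals from the data table; it is meant only to show why the weighted fit returns S3, and the variable names are illustrative.

```python
# Step 1 (like PROC MEANS ... BY TRT): mean of X, mean of Y and count per treatment.
totals = {1: (160, 939, 6), 2: (146, 1031, 6), 3: (195, 1005, 6), 4: (202, 1098, 6)}
means = [(sx / n, sy / n, n) for sx, sy, n in totals.values()]

# Step 2 (like the weighted regression of mean Y on mean X, weight = count).
w_sum = sum(w for _, _, w in means)
x_bar = sum(w * x for x, _, w in means) / w_sum
y_bar = sum(w * y for _, y, w in means) / w_sum

a_m = sum(w * (x - x_bar) ** 2 for x, _, w in means)           # among-groups xx
b_m = sum(w * (x - x_bar) * (y - y_bar) for x, y, w in means)  # among-groups xy
c_m = sum(w * (y - y_bar) ** 2 for _, y, w in means)           # among-groups yy

slope_means = b_m / a_m
S3 = c_m - b_m ** 2 / a_m  # weighted deviations from the line through the means

print(round(a_m, 3), round(b_m, 1), round(c_m, 1))
print("slope of means:", round(slope_means, 3), " S3:", round(S3, 2))
```

The weighted sums of squares match the Among line of the comparison table (365.459, 451.2, 2163.1), and S3 comes out near 1606.04 on k-2 = 2 df.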

Source                 df       SS                  Error
Total                  2(k-1)   St - S1             MS1
ß1 = ß2 = ... = ßk     (k-1)    S2                  MS1
dev from means lin     (k-2)    S3                  MS1+2
ßw = ßm                1        St - S1 - S2 - S3   MS1+2


RCBD example

Using the oat data with the first observation missing.

Enter the missing yield as 0 and define an indicator variable MISS: MISS=1 for the observation with the missing value and MISS=0 for every other observation.

PROC GLM;CLASSES REP CULT;MODEL YLD=REP CULT MISS;

AOV TABLE

DF SS MS

TOTAL 55 22122.125

MODEL 17 19582.055 1151.886

ERROR 38 2540.070 66.844

R-SQUARED = 0.88518 MEAN = 54.125

EFFECTS ESTIMATE STD ERROR

INTERCEPT 55.4633712768555

... ... ....

MISS 1 -74.9487152099609 9.79699262222576

DEPENDENT VARIABLE YLD

PARTIAL SS DF SS MS

REP 3 216.597 72.199

CULT 13 16433.420 1264.109

MISS 1 3912.055 3912.055

PROC GLM;CLASSES REP CULT;MODEL YLD=REP CULT;

Dropping the missing observation.

DEPENDENT VARIABLE YLD

AOV TABLE

DF SS MS

TOTAL 54 19139.345

MODEL 16 16599.275 1037.455

ERROR 38 2540.070 66.844

R-SQUARED = 0.86729 MEAN = 55.109

DEPENDENT VARIABLE YLD

PARTIAL SS DF SS MS

REP 3 216.597 72.199

CULT 13 16433.424 1264.110

Estimate the missing plot using missing plot formula.

Ymiss= 74.94872
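The notes do not write the missing plot formula out. The sketch below assumes the usual RCBD form, estimate = (r*B + t*T - G) / ((r-1)(t-1)), where B and T are the totals of the observed values in the block and treatment that contain the missing plot and G is the total of all observed values. It is in Python (not SAS), and the 3 x 3 table of yields is hypothetical, not the oat data. The last line uses numbers from these notes: the error SS of 2540.07 belongs with 38 df, not the 39 that an analysis of the filled-in data reports.

```python
def missing_plot_estimate(table, rep, trt):
    """Estimate one missing RCBD value; table[rep][trt] is None where missing."""
    r = len(table)       # number of replications (blocks)
    t = len(table[0])    # number of treatments
    B = sum(v for v in table[rep] if v is not None)             # block total
    T = sum(row[trt] for row in table if row[trt] is not None)  # treatment total
    G = sum(v for row in table for v in row if v is not None)   # grand total
    return (r * B + t * T - G) / ((r - 1) * (t - 1))


# Hypothetical yields built as 10 + rep effect + treatment effect, so the
# value removed from rep 1, treatment 1 is known to be 10.
table = [
    [None, 13, 16],
    [11, 14, 17],
    [12, 15, 18],
]
estimate = missing_plot_estimate(table, rep=0, trt=0)
print("estimated missing value:", estimate)

# Error mean square with the df reduced by one for the estimated plot.
error_ms_corrected = 2540.07 / (39 - 1)
print("corrected error MS:", round(error_ms_corrected, 3))
```

Because the hypothetical data are exactly additive, the formula recovers the removed value, and the corrected error mean square agrees with the 66.844 reported by PROC GLM.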

PROC ANOVA;CLASSES REP CULT;MODEL YLD=REP CULT;

REP 4 1 2 3 4

CULT 14 Clarion Clintland Clintland 60

Clinton Early Clinton Fayette

Goodfield Minhafee Minton

Mo. O-205 Nehawka Nemaha

Newton Putman

DEPENDENT VARIABLE YLD

SOURCE DF SS MS F

TOTAL 55 19525.93

REP 3 221.73 73.91 1.135

CULT 13 16764.13 1289.55 19.800***

ERROR 39 2540.07 65.13

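Note: the estimate of the missing plot (74.94872) is the negative of the regression coefficient for MISS (-74.94872) in the first model.

Compare the various AOV tables. The error SS (2540.07) is the same in all three, but the analysis of the filled-in data reports 39 error df instead of 38, and its REP and CULT sums of squares (221.73 and 16764.13) are larger than the partial SS from GLM (216.597 and 16433.42).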



Now with two missing values: define MISS and MISS2, one indicator variable for each missing observation.

PROC GLM;CLASSES REP CULT;MODEL YLD=REP CULT MISS MISS2;

DEPENDENT VARIABLE YLD

AOV TABLE

DF SS MS

TOTAL 55 24469.982

MODEL 18 22066.143 1225.897

ERROR 37 2403.840 64.969

R-SQUARED = 0.90176 MEAN = 52.768

EFFECTS ESTIMATE STD ERROR

...

MISS 1 -76.0277786254883 9.68728995184745

MISS2 1 -90.0277786254883 9.68728995184745

DEPENDENT VARIABLE YLD

PARTIAL SS DF SS MS

REP 3 272.745 90.915

CULT 13 16139.652 1241.512

MISS 1 4001.693 4001.693

MISS2 1 5611.154 5611.154

Dropping the missing observations.

PROC GLM;CLASSES REP CULT;MODEL YLD=REP CULT;

DEPENDENT VARIABLE YLD

AOV TABLE

DF SS MS

TOTAL 53 18694.833

MODEL 16 16290.995 1018.187

ERROR 37 2403.838 64.969

R-SQUARED = 0.87142 MEAN = 54.722

DEPENDENT VARIABLE YLD

PARTIAL SS DF SS MS

REP 3 272.745 90.915

CULT 13 16139.650 1241.512

Estimates of the missing plots using the missing plot formula: 90.02772 and 76.02777

PROC ANOVA;CLASSES REP CULT;MODEL YLD=REP CULT;

REP 4 1 2 3 4

CULT 14 Clarion Clintland Clintland 60

Clinton Early Clinton Fayette

Goodfield Minhafee Minton

Mo. O-205 Nehawka Nemaha

Newton Putman

DEPENDENT VARIABLE YLD

SOURCE DF SS MS F

TOTAL 55 20338.01

REP 3 293.22 97.74 1.586

CULT 13 17640.95 1357.00 22.016***

ERROR 39 2403.84 61.64

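Note: the estimates of the missing plots are again the negatives of the regression coefficients for MISS and MISS2, apart from rounding.

Compare the various AOV tables. The error SS (2403.84) is the same in all three, but the analysis of the filled-in data reports 39 error df instead of 37.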