|
|
||||||||
a Agro-Tech, Inc., Velva, ND, 58790
b Statistical Advisory & Training Service Pty Ltd, Sydney, NSW, Australia
* Corresponding author (agrotec{at}srt.com).
| ABSTRACT |
|---|
|
|
|---|
Abbreviations: LMM, Linear Mixed Model BLUP, Best Linear Unbiased Predictor REML, Residual Maximum Likelihood
| NOTES |
|---|
|
|
|---|
Received for publication March 30, 2007.
| INTRODUCTION |
|---|
|
|
|---|
Let Y1, ..., Yn represent n individual plant yields in a plot and assume for the moment that
2 represents the common variance on a per plant basis, so that var(Yi) =
2 for each i. Then statistical theory tells us that, for independently growing plants,
![]() |
In field trials, it is unlikely that plants are growing independently within a plot unless widely spaced. Indeed, it is quite common now for field trials to be checked for the presence of spatial correlations (Payne, 2006). In the event of strong plant competition within a plot, the variance of total plot yield may well be constant across different plant spacings. Plants with more space tend to grow larger; plants with less space smaller, but there are more of them, so the variance of the total plot yield may even out. This is the other extreme. The reality is probably somewhere between proportional variance and constant variance. How is this checked?
A harvest or sowing date treatment also raises the separate issue of whether the variance of the yield from plants grown for various lengths of time can be constant. For example, Fig. 1A shows dry weights of transplanted lettuce (Lactuca sativa L.) seedlings for the first 35 d of growth (J. Moodie, personal communication, 1994). It is clear that the variance of dry weight increases over time; once logged (Fig. 1B), the variance appears stable.
|
| DATA |
|---|
|
|
|---|
|
| ANALYSIS AND DISCUSSION |
|---|
|
|
|---|
|
|
![]() |
Generally in agriculture blocks are considered random, so for the current experiment the random part of the model is Block + Error. Accordingly, we need to specify the variance structure for the random model in such a way as to allow an investigation into changing variances across the levels of two of our factors, density and planting date.
To investigate whether the variance changes across the levels of a factor, we calculate the change in deviance between the constant-variance and changing-variance models. Deviance plays the role in LMM (REML) that the Residual Sum of Squares plays in ANOVA. The change in deviance is distributed approximately as
2 with degrees of freedom (df) equal to the change in individual df between the two models.
Some packages use a statistic known as the scaled Wald statistic and an associated P value from a
2 distribution. For a balanced design, the scaled Wald statistic is the same as the F statistic from the ANOVA. In that case one is free to replace the P value with one based on an F distribution. Other packages such as ASReml (Gilmour et al., 2006) use an F statistic with (sometimes approximate) denominator degrees of freedom, as explained later. This is the preferred approach, because the P value from the Wald statistic assumes known variances and hence is underestimated, especially for small-scale experiments.
We chose GenStat for this paper, because (i) it is easy to use and (ii) it is free to "not-for-profit research organizations, charities, and educational institutes based in the developing world" (www.vsni.co.uk/products/discovery, verified 5 Mar. 2008). (For users of GenStat, the Wald statistic is the only test statistic produced before Version 10.) The Appendix also provides the code to perform the analyses in SAS (SAS Institute, Inc., 2004) and R (R Development Core Team, 2006).
Variance Structures in the Error Term
In addition to a constant variance assumption, there are two possible assumptions for a changing variance across dates and densities for the turnip data. In both cases we assume a constant variance across blocks and varieties.
The first allows the variance to change across densities in a pattern that is the same across all dates. In this case the variance may also change across dates. This variance model is multiplicative across dates and densities, with five parameters to estimate (four variances and one multiplicative constant for the second date). Note that this form of the model could be explored to allow a changing variance across dates but not densities.
The second allows the variance to change across all combinations of dates and densities in no particular pattern. This is an unstructured variance model, with eight variances to estimate. For the turnip data, this says that any change in variance across densities for turnips planted on 21 August is different than any change in variance for turnips planted on 28 August. Since these sowing dates are close, we might expect this not to be the case.
The first assumption for density/date variances is a special case of the second assumption, and hence can be tested using change in deviance. The deviances for various models are presented in Table 3 .
|
Next we compare the unstructured vs. multiplicative model for densities and dates. The change in deviance is 2.06 with 42 – 39 = 3 df and is not significant (P = 0.560). This indicates that any change in the variance across densities is proportionately the same at each date.
We now explore whether the variance for the multiplicative model is constant across dates (but not densities). The change in deviance, 168.10 – 162.05 = 6.05 with 43 – 42 = 1 df, is significant (P = 0.014). The variance therefore changes across dates, which we had expected from biological considerations. To test if it also changes across densities, we use 175.71 – 162.05 = 13.66 with 45 – 42 = 3 df, which is also significant (P = 0.003). We had expected this from mathematical considerations.
The LMM (REML) analysis from GenStat (Version 10) for the final model with a random block term and a random error term with a constant variance across blocks and varieties and a variance that changes multiplicatively across densities and dates is given in Table 4 . Notice that GenStat produces sequential (i.e., Type I) tests, as well as conditional tests for the current model. We dropped the three-factor interaction to illustrate the conditional tests for the three two-factor interactions.
|
Finally, we note that the change in deviance is also used to assess whether a random factor can be dropped. Technically, a random factor has no role to play in a model if the variance of that random term is zero. For the multiplicative model of the turnip data, the variance component for the random block term is only 0.16. The change in deviance for the same model with the random block term removed is 162.41 – 162.05 = 0.36 (based on 43 – 42 = 1 df, P = 0.546) and hence it is clear that there is no block variation. This of itself is extremely interesting, since the ANOVA (which assumes constant variance) indicated a strongly significant block effect. The analysis has a strong block effect associated with low values in block 4. Block 4 has the lowest yield in 13 of the 16 treatments, the high values being with the lowest density. Thus, allowing a larger variance for the higher densities has moved block variation into the residual.
Estimated variety means are 6.552 (Barkant) and 4.231 (Marco). The estimated standard error of the mean (SE) is 0.547 from the ANOVA and 0.623 from the LMM (REML). The estimated standard error of the difference (SED) is 0.774 from the ANOVA and 0.834 from the LMM (REML). There are clear advantages to the LMM (REML) analysis when we compare density by date means. From the ANOVA (which assumes a common variance), the SE is 1.095 for all means and the SED is 1.548 for all comparisons. Using the LMM (REML) analysis, these depend on the comparison being made. Table 5 presents the individual means, SE and SED values for date and density. While the average SE is 1.071, the minimum is only 0.411 (for the lowest density planted on 21 August) and the maximum 2.039 (for a density of 4 kg ha–1 planted on 28 August). Similarly, while the average SED is 1.579, the minimum is 0.641 (for the lowest density planted on 21 August) and the maximum is 2.671 (for the highest density planted on 28 August). The variances across dates and densities vary, hence means are compared with appropriate SED values. At this point, a couple of questions remain.
|
For the turnip experiment with no random block effect and with the variance assumed constant, the df to use are the combined df from the individual variance estimates from all 16 treatments, thus, 16 x(4 – 1) = 48. For the unstructured variance model, the variance component for a particular date and density must be the average of two estimates, each from a different cultivar and each with (4 – 1) df. This gives 6 df for each variance component, which will drop slightly if a block effect is included: 45/8 = 5.625 would be a reasonable value to use.
For the variance components in the multiplicative variance model with a random block effect, 6 df is an underestimate: we have only five, not eight, different variance parameters to estimate, and these are obtained, in part, from all dates and densities. Again, 45/5 = 9 would be a reasonable value to use.
The SED values that GenStat calculates are based on estimates of variances appropriate to the comparison under consideration. The analysis suggests that there are two tables of means of interest.
First, there is a significant varietal mean difference. We suggest using 17.6 df for this comparison, which are the denominator df of the Variety F statistic.
Next, we look at the table of date and density means. In this case, each mean is based on a different variance, and a Satterthwaite approximation is one option for calculating approximate df. The Satterthwaite formula for a comparison involving means based on n1 and n2 observations and respective variance estimates s12 and s22 is
![]() |
For the current multiplicative model, s22 = 3.053 x s12, and hence the approximate df for a comparison across dates for a particular density simplifies to (1 + 3.053)2/(1 + 3.0532) = 1.6 times the df of any one component (which we suggested were 9); this gives 14.4 df for all densities. For comparisons across densities for a given date, the multiplier changes from a minimum of 1.2 for the most different variance components (densities 1 and 3) to 2.0 for the most similar (densities 3 and 4).
Standardized residuals
Many packages do not produce standardized residuals for LMM. With such a change in variance across the units of this experiment, these are important. How are they obtained?
With blocks considered random effects with mean zero, there are two types of fitted values and residuals available. One has only the fixed effects removed, in which case the fitted values will be the same for all plots in a block. The other has the fixed effects and what are known as the best linear unbiased predictors (BLUPs) of the block effects removed. We suggest that these are the ones to look at here: in GenStat, for example, they are obtained by selecting "Form residuals from final term only" (as opposed to "from all random terms") in the Save menu.
The standard error of the theoretical error term is
, however the standard error of the observed residuals is not simply
, the square root of the Residual MS from the ANOVA. For example, in an unblocked design the standard error of the raw residuals is
, where r is the number of replicates in a treatment; for a randomized block design with t treatments and r blocks, it is
. For a LMM model, the standard errors of the raw residuals are much more complex. However, using
would be a close approximation, especially for larger designs. For the current analysis this becomes
i, the square root of the variance component for the ith date/density combination. This is the approach used in R. The effect of using
i as a divisor is to bring into line large raw residuals from treatments with large variation, and reduce the effect of small raw residuals from treatments with small variation. The actual standardized residuals may be slightly under-dispersed, but should eliminate any fanning, as shown in the standardized residual plot in Fig. 2(D).
| CONCLUSIONS |
|---|
|
|
|---|
We recommend that a fundamental starting point of such an analysis is that the variance of yield changes. If a constant variance model is warranted in a density trial, this simply means that plant competition is sufficiently strong for the varying numbers of plants in plots to produce yields that vary similarly.
A LMM analysis is a very flexible tool that allows for changing variances and spatial or temporal correlations for random effects. Many modern statistical packages implement a REML algorithm for estimating the variance matrices, and produce sufficient information in their output to make informed decisions about model assumptions and mean comparisons.
| APPENDIX |
|---|
|
|
|---|
(a) For the multiplicative variance model over densities and dates
Fixed Model: Variety*Density*Date
Random Model: Block+Block.Variety.
Density.Date
Correlated Error Terms... Identity+Identity,Identity,Diagonal,Diagonal
(b) For the unstructured variance model over densities and dates, define a new factor called (say) DensityDate which indexes over the eight density/date combinations
Fixed Model: Variety*Density*Date
Random Model: Block+Block.Variety.
DensityDate
Correlated Error Terms... Identity+Identity,Identity,Diagonal
SAS
(a) For the multiplicative variance model over densities and dates
proc mixed data=Density covtest;
class Variety Date Density Block;
model Yield= Variety Density Date
Variety|Density|Date/
ddfm=kr;
random block;
repeated Date Density/
subject=Block*Variety type=un@un;
parms (1)
(1)
(0) (1)
(1)
(0) (1)
(0) (0) (1)
(0) (0) (0) (1)
/hold=3,5,6,8,9,11,12,13;
lsmeans date*density/diff;
ods output covparms=covest;
run;
(b) For the unstructured variance model over densities and dates
proc mixed data=Density;
class Variety Date Density Block;
model Yield= Variety Density Date
Variety|Density|Date/ddfm=kr;
random Block;
repeated/group=Density*Date;
run;
R
(a) For the multiplicative variance model over densities and dates
library(nlme)
McConway.lme
<-lme(Yield
Variety*Date*Density,random=.
1|Block,data=McConway)
summary(McConway.lme)
anova(McConway.lme)
McConway.lme.mult
<-update(McConway.lme,weights=varComb(varIdent(form=
1|Density),
+ varIdent(form=
1|Date)))
summary(McConway.lme.mult)
anova(McConway.lme.mult)
(b) For the unstructured variance model over densities and dates
McConway.lme.unst <
update(McConway.lme,weights=varIdent(form=
1|Density*Date))
summary(McConway.lme.unst)
anova(McConway.lme.unst)
| ACKNOWLEDGMENTS |
|---|
All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.
| REFERENCES |
|---|
|
|
|---|
This article has been cited by other articles:
![]() |
M. Haque, F. M. Epplin, and C. M. Taliaferro Nitrogen and Harvest Frequency Effect on Yield and Cost for Four Perennial Grasses Agron. J., November 1, 2009; 101(6): 1463 - 1469. [Abstract] [Full Text] [PDF] |
||||
![]() |
H.-P. Piepho Data Transformation in Statistical Analysis of Field Trials with Changing Treatment Variance Agron. J., July 7, 2009; 101(4): 865 - 869. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
| The SCI Journals | Crop Science | Vadose Zone Journal | |||
| Journal of Natural Resources and Life Sciences Education |
Soil Science Society of America Journal | ||||
| Journal of Plant Registrations | Journal of Environmental Quality |
The Plant Genome | |||