Agronomy Journal Grow Your Career With ASA
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


Published online 17 June 2005
Published in Agron J 97:1082-1096 (2005)
DOI: 10.2134/agronj2004.0130
© 2005 American Society of Agronomy
677 S. Segoe Rd., Madison, WI 53711 USA
This Article
Right arrow Abstract Freely available
Right arrow Figures Only
Right arrow Full Text (PDF) Free
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via ISI Web of Science (8)
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Hong, N.
Right arrow Articles by Weisz, R.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Hong, N.
Right arrow Articles by Weisz, R.
Agricola
Right arrow Articles by Hong, N.
Right arrow Articles by Weisz, R.
Related Collections
Right arrow Geostatistics
Right arrow Spatial Variability
Right arrow Experiment Design
Right arrow Site-Specific Analysis
Right arrow Statistics

Statistics

Spatial Analysis of Precision Agriculture Treatments in Randomized Complete Blocks

Guidelines for Covariance Model Selection

Nan Honga, Jeffrey G. Whiteb,*, Marcia L. Gumpertzc and Randy Weiszd

a Dep. of Crop and Soil Science, 116 ASI Building, The Pennsylvania State Univ., University Park, PA, 16802
b Dep. of Soil Science, North Carolina State Univ., Raleigh, NC 27695-7619
c Dep. of Statistics, North Carolina State Univ., Raleigh, NC 27695-8203
d Dep. of Crop Science, North Carolina State Univ., Raleigh, NC 27695-7620

* Corresponding author (jeff_white{at}ncsu.edu)

Received for publication May 14, 2004.

    ABSTRACT
 TOP
 NOTES
 ABSTRACT
 INTRODUCTION
 COVARIANCE MODEL SELECTION...
 APPLYING THE PROCEDURE: A...
 INTERPRETATION AND DISCUSSION
 CONCLUSIONS
 APPENDIX
 REFERENCES
 
Failure to account for spatially correlated errors when present in the classical randomized complete block (RCB) analysis may cause inefficient estimation of treatment significance. Covariance model selection is a necessary component for spatial adjustment to estimate treatment significance. We discuss methods for selecting a covariance model in RCB analyses in the presence of spatial correlation and demonstrate one procedure in detail. The procedure uses three models: the randomized complete block with independent and identically distributed errors (RCBiid), RCB with correlated errors, and models with correlated errors but no block effects. The semivariogram of the residuals from fitting a model with just fixed effects, the likelihood ratio test, and Akaike Information Criterion are used for model selection. To illustrate the procedure, we analyzed winter wheat (Triticum aestivum L.) forage and corn (Zea mays L.) grain yield in the presence of spatial heterogeneity within blocks from a site-specific N management study. We compared the selected covariance models to the RCBiid models and to other spatial models with respect to the estimation of treatment significance. The procedure can be extended to any experiment with fixed effects, or with both fixed and random effects, and which may potentially have spatially correlated errors. The procedure is systematic and readily implemented; however, it remains difficult to evaluate whether an adequate covariance model has been selected.

Abbreviations: AIC, Akaike Information Criterion • ANOVA, analysis of variance • CE, correlated errors • CIR, color infrared aerial photography • ddf, denominator df • EGLS, estimated generalized least squares • FA, field average • GIS, geographic information systems • GPS, global positioning system • iid, independent and identically distributed • LRT, likelihood ratio test • RCB, randomized complete block • RCBCE, randomized complete block with correlated errors • REML, restricted maximum likelihood • RYE, realistic yield expectation • SSNM, site-specific nitrogen management • VRT, variable rate technologies


    INTRODUCTION
 TOP
 NOTES
 ABSTRACT
 INTRODUCTION
 COVARIANCE MODEL SELECTION...
 APPLYING THE PROCEDURE: A...
 INTERPRETATION AND DISCUSSION
 CONCLUSIONS
 APPENDIX
 REFERENCES
 
PRECISION TECHNOLOGIES, for example, global positioning systems (GPS), geographic information systems (GIS), and variable rate technologies (VRT), have evolved rapidly and been incorporated into production agriculture, making precision farming an operational reality. For example, crops are commonly harvested using a combine equipped with a yield monitor linked with GPS. The resultant yield data consist of numerous georeferenced sample points across the field and dense subsampling within experimental plots. This is not at all common to traditional small-plot research, and precision agricultural research may benefit from analyses encompassing multiple spatially georeferenced observations within plots (e.g., Ferguson et al., 2002; Eghball et al., 2003; Weisz et al., 2003).

Spatial correlation in the observations and in the residuals of the statistical model used to analyze the data is also common in precision agriculture research. This is due to several factors. It has been well documented that soil physical properties (e.g., Davidoff and Selim, 1988), moisture relations (e.g., Achouri and Gifford, 1984), and soil nutrient status (e.g., Webster and Nortcliff, 1984) vary spatially across fields. Thus, it is always possible that observations such as crop yield will also be spatially correlated and have heterogeneous variances (Scharf and Alley, 1993). Precision agriculture research is often performed on a large spatial scale resulting in more spatial variability within and among blocks and plots than is typical of traditional small-plot research.

In traditional small-plot research, treatments are applied and harvest is conducted on a plot by plot basis. This is a central element in the usual statistical definition of a "plot." In precision agriculture research, plots are large compared with harvest and application equipment, and treatment application and crop harvest are not applied on a plot by plot basis. Instead, applicators and harvesters move across the field in a serpentine fashion changing application rates as they pass from plot to plot, or assigning yield data to individual points depending on their location at any given time. Consequently, the equipment may make passes through several plots before completing harvest or application for any single plot. Thus, the definition of an experimental unit as the smallest unit to which a treatment is applied does not fit exactly.

The classical randomized complete block (RCB) design, which focuses on analysis of variance (ANOVA), is commonly used in agricultural and precision agricultural research. The RCB analysis is based on the assumption that the model errors within blocks are independent and identically distributed (iid) with the same variance. When spatial variability is present at a scale that blocking cannot explain, this independence assumption is likely violated. In addition, in the classical RCB design and its traditional analysis, only one, or possibly a few, observations are taken per plot, whereas the distinctive feature of precision agriculture experiments is that the field is densely subsampled within each plot. While an RCB analysis with subsampling can still be justified based solely on the randomization of treatments within blocks (Brownie et al., 1993), the efficiency of the classical ANOVA in making treatment comparisons and in estimating treatment means is reduced (Stroup et al., 1994). In precision agriculture research where many factors can lead to spatial correlation and a large number of observations are taken within each plot, the RCBiid analysis to test treatment effects may be made more powerful by incorporating spatial correlation.

Zimmerman and Harville (1991) proposed a random field approach that can account for spatially correlated errors (CE) among spatially georeferenced observations, and thus may provide a more efficient estimation of treatment comparisons. Brownie and Gumpertz (1997) investigated the validity and robustness of CE analysis and concluded that the method could achieve substantial increases in precision compared with classical RCBiid analysis when errors are spatially correlated while still maintaining type I error rates close to the desired significance level (usually 0.05). In CE analysis, instead of blocking being used to account for spatial variability, spatial correlation is modeled using one of several possible functions. For example, as implemented in SAS PROC MIXED Version 8 (SAS Institute, Cary, NC), just considering isotropic spatial covariance models, the researcher may choose among linear, spherical, exponential, Gaussian, and power functions, all with or without the inclusion of a nugget effect. Consequently, there are 10 possible models from which to choose. Making matters more complex, some of these candidate models may result in different treatment p values and different estimates of treatment means.

Between the traditional RCBiid method and the CE analysis lies a rich land of hybrid models. These models are considered randomized complete block with correlated errors (RCBCE) because they consist of both random block and/or block x treatment effects in addition to using a spatial covariance structure to model correlated errors. The option to include up to three random effects in a model (i.e., block, or block x treatment, or both) and a choice of 10 spatial covariance structures results in 30 candidate RCBCE models, each of which could result in different conclusions concerning treatment effects and separation of treatment means.

As described above, models with correlated errors may include block effects and plot effects, or not, and quite often the block and plot effects are omitted. The idea of omitting plot effects from a model for an RCB design with subsampling contradicts traditional practice that dictates that the model for RCB designs with subsampling include terms for random block and plot effects. To explain how omitting the block and plot effects may be justified, we describe some of the foundations of the traditional analysis of variance.

The traditional analysis of variance can be derived from two very different starting points: (i) a randomization approach, and (ii) a model-based approach. The randomization approach does not require that the data be normally distributed or uncorrelated, but it does assume that the treatments have been randomly assigned to the plots. The basic unit of data, the experimental unit, in this analysis is the plot mean, because plots have been randomly assigned to treatments; within plots there is no randomization. The distributions of the test statistics in the analysis of variance are based solely on this randomization, and the statistical analysis must include block and plot effects to be valid.

The analysis of variance can be derived by a completely different approach, which does not depend on the random assignment of treatments to experimental units. It makes assumptions about the covariance structure, namely that the block effects, plot effects, and within-block errors are drawn from separate normal distributions that are independent of each other, and that the within-plot errors are independent. These assumptions imply the RCBiid correlation structure, in which all plots within a block are equally correlated, all observations within a plot are equally correlated, and observations in different blocks are uncorrelated. Under the RCBiid correlation structure, the traditional analysis of variance with random block and plot effects provides correct tests of the treatment effects. The validity of the analysis depends on the correctness of these assumptions, not on the randomization. Note, however, that random assignment of treatments is still crucial for guarding against bias caused by accidentally aligning treatments with other confounding factors.

The CE analysis replaces the RCBiid correlation structure with a spatial correlation structure estimated from the data and depends on finding an appropriate correlation structure. It often happens that the correlation pattern that best fits the data does not include random block effects and/or random plot effects. It is possible to model either plot means or individual subsamples. In this paper we model the individual subsamples.

There are some hazards to using a model-based approach because the assumptions of normality and known covariance structure are much stronger than in the randomization approach, which makes no distributional assumptions. If the model is not at least approximately adequate, the analysis can give wildly invalid results. We list three of the hazards: (i) the spatial correlation function is difficult to estimate and the significance test for treatment effects can give very different results for different correlation functions; (ii) treatments are randomly assigned to plots but not to subplots within a plot, so their effects could conceivably be confounded with some other processes, which cannot be disentangled within a plot, leading to biased estimates of the treatment effects; and (iii) there are concerns regarding the validity of using the subsample as the analytical unit, a practice that is termed pseudoreplication in the classical ANOVA, and which often leads to much higher error df (i.e., denominator df [ddf]) than the classical approach for testing treatment effects. We emphasize, however, that we are not advocating that the RCB be abandoned when designing and laying out experiments. It is the classic structure of the RCB with its inherent replication and randomization, augmented by georeferenced subsampling, that provides the framework for our analytical approach.

In the traditional RCBiid analysis of variance, the ddf for the tests of treatment effects are based on the number of plots. In the CE approach the ddf are computed based on the estimated covariance structure rather than on the randomization plan of the experiment, and are often much larger than in the RCBiid approach. A saving grace for precision agriculture trials is that if the RCBiid ddf are larger than 20 or so, increasing them may not have much impact on the critical values for F tests. For example, for testing the equality of three treatments, the critical F value is 3.32 with ddf = 30; whereas, for infinitely large ddf, the critical F value is 3.00, which is not too much different. We study the impact of the ddf on tests of treatment effects in our experiments in more detail in the subsection titled "Model Impact on P Values."

When considering a correlated errors approach to data analysis, two critical questions emerge. The first one is simply "Is this necessary, or would the traditional RCBiid method be sufficient?" The second and most daunting question is, "Given the abundance of potential CE and RCBCE models, which model or models adequately explain the covariance structure and should be used for data analysis?" It is not uncommon for agronomists to simply assume a given model is adequate. For example, Bruckler et al. (1997), analyzing soil NO3 spatial variability, and Weisz et al. (2003), analyzing variable-rate lime and P, simply assumed that the spherical covariance model they used to analyze precision agriculture data was adequate. Marx and Stroup (1993) suggested that no-nugget spatial covariance structures could generally be assumed to be adequate for correlated error analysis of agricultural data. Littell et al. (1996) suggested a general approach to covariance model selection using information criteria and variography. However, it remains a somewhat daunting task to choose among candidate models and determine which analysis may be best for a given data set.

The advent of precision agriculture research brings into high relief the question of how to make the best use of spatially correlated subsamples. In this light, our primary objective is to present a systematic, step-by-step approach that agronomists can use to select a covariance model for RCB analysis when spatial correlation is present. Our secondary objective is to demonstrate, using real examples, how the tests of treatment effects may be affected by the inclusion or exclusion of block effects, nugget effects, and spatial autocorrelation in the model. Toward these objectives, we present analyses of two separate datasets with different within-plot subsampling intensities from two trials in a site-specific N management experiment: wheat forage in 2001, and corn for grain in 2002.


    COVARIANCE MODEL SELECTION PROCEDURE
 TOP
 NOTES
 ABSTRACT
 INTRODUCTION
 COVARIANCE MODEL SELECTION...
 APPLYING THE PROCEDURE: A...
 INTERPRETATION AND DISCUSSION
 CONCLUSIONS
 APPENDIX
 REFERENCES
 
Three Methods for RCB Analyses
The foundation of our analysis is the linear mixed model, which takes the following form:

[1]
where y is an n x 1 vector of responses, X is an n x p design matrix for fixed treatment effects, ß is a p x 1 vector of fixed treatment effect parameters, Z is an n x q design matrix for the random block and block x treatment effects, u is a q x 1 vector of random effects, and {epsilon} is an n x 1 vector of errors, where n, p, and q are the number of responses (i.e., subsamples), fixed effect parameters, and random effects, respectively. Key assumptions for this model are that u and {epsilon} are uncorrelated and their expectations are zero. If we set the variance of random effects var (u) = {Sigma}u, and V1 = var (Zu) = Z{Sigma}uZ' and the variance of errors V2 = var ({epsilon}), the variance of y is var (y) = V1 + V2. Thus, any correlation of observations can be specified in V2 and/or V1. Building on this foundation, we chose to focus on three potential methods for analyzing RCB designs in the presence of spatial correlation. Each of these methods is described in detail below.

Randomized Complete Block with iid Errors (RCBiid method)
This is the classical RCB analysis with subsampling. It assumes iid within-block errors ({epsilon}) for which V1 = Z{Sigma}uZ' and V2 = {sigma}2 In, where In denotes the n x n identity matrix. Consequently, any spatial correlation of observations is reflected only in V1.

Randomized Complete Block with Correlated Errors (RCBCE method)
This is a nonclassical RCB analysis with spatially correlated errors in which V1 = Z{Sigma}uZ'. When a no-nugget model is used to describe the spatial covariance, V2 = {sigma}2W, where W is the n x n spatial covariance matrix whose ijth element is defined as a function of the distance (hij) between site i and j. If a nugget model is used to describe the spatial covariance, V2 = In{sigma}2g + {sigma}2W. The nugget, In{sigma}2g, is generally due to significant variation among observations within extremely short or zero separation distance and/or measurement errors (Isaaks and Srivastava, 1989). We consider three isotropic spatial covariance functions: the spherical, Gaussian, and exponential. We chose these because of their applicability in describing spatial covariance commonly encountered in agriculture, because of the form of our exploratory semivariograms, and because they are available in SAS PROC MIXED Version 8. In this approach, any spatial correlation of observations is reflected in both V1 and V2. Clearly the RCBiid model is a reduced form of the RCBCE model, because if no spatial correlation is present, W will reduce to In.

For an example of an RCBCE model, consider an RCB design with three plots in a block and two observations in a plot for which a no-nugget exponential covariance function is chosen to model the spatial covariance. (NB: This analysis can be extended to include any number of observations per plot.) Then, Cov ({epsilon}i, {epsilon}j) = {sigma}2exp(–3hij/{theta}) (Isaaks and Srivastava, 1989), and



where matrix V1 is a positive n x n block diagonal matrix with k (6 x 6) covariance matrices along the diagonal and the off-diagonal elements are 0s. Here k is the number of blocks and 6 is the total number of observations within a block. Each matrix has 2 x 2 diagonal blocks with elements , which denote the covariance of two observations within the same plot. The variance components {sigma}2p and {sigma}2b denote the variance components within blocks, that is, among plots, and among blocks, respectively. Matrix V2 is an n x n symmetric matrix in which the sill {sigma}2 is the variance of the random errors. The sill is also the limit of the semivariance as distance increases; it appears at inter-observation distances large enough that observations are no longer correlated. The parameter {theta} is the practical range defined as the value of h at which the semivariogram without nugget reaches 95% of the sill, since the correlation of two observations fit to exponential or Gaussian functions approaches zero only when h goes to infinity. If a spherical function is fit, {theta} is the actual range (Isaaks and Srivastava, 1989).

In our example where a no-nugget exponential function is used to model the spatial covariance, SAS PROC MIXED reports the value of the sill in the covariance parameter estimates table, and labels this parameter as the "residual." If a function with a nugget had been specified, SAS PROC MIXED would have labeled the nugget (i.e., microscale variation) value as the "residual," and added a new parameter to the parameter estimates table labeled "variance." In this case, the sill consists of the sum of the residual (i.e., nugget) and the component labeled variance.

Also in our example, we caution that SAS PROC MIXED defines Cov ({epsilon}i, {epsilon}j) = {sigma}2exp(–hij/a) and labels the parameter a as "SP (EXP)." Here the parameter a does not equal the practical range {theta} as stated previously but is one third of {theta}. Thus, the practical range {theta} is computed by multiplying the parameter a by 3 (SAS Institute, 1999a). If a Gaussian function is specified, Isaaks and Srivastava (1989) define Cov = {sigma}2exp with the parameter {theta} as the practical range. However, PROC MIXED defines Cov = {sigma}2exp and labels the parameter a as "SP (GAU)," thus the practical range {theta} is computed by multiplying the parameter a by {surd}3 (SAS Institute, 1999a). If a spherical function is specified, SAS PROC MIXED labels the parameter a as "SP (SPH)," where this parameter is the actual range. For a more complete discussion of these functions see SAS Institute (1999a). There are many spatial covariance functions that can be specified in the RCBCE method.

Correlated Errors Analysis (CE method)
The CE method assumes that V1 = 0, which means the random block and block x treatment effects are not considered and are removed from the model in Eq. [1]. Any spatial correlation present is defined only in V2. Clearly the CE model is a reduced form of the RCBCE model. If the random effects of the RCBCE models are determined ineffective, that is, the elements of the matrix {Sigma}u are all zeros, RCBCE models become CE models. Additionally, all the considerations discussed above for the RCBCE method about specifying a function to model the spatial covariance also apply to CE implementation.

Overview of Model Selection
We believe that a useful guideline for choosing between candidate covariance models is that the selected model should adequately describe the covariance structure but with the fewest covariance parameters. Toward this end, step-by-step procedures describing model selection for a given data set are outlined in Fig. 1 . The first step is to determine candidate RCBCE and CE models. This is done by finding the functions that best model the spatial correlation in the RCBCE and CE methods. Once this is complete, the field of all possible candidate covariance models has been narrowed down to only three: the RCBiid model, one candidate RCBCE, and one candidate CE model. In keeping with our guideline, we propose that analysis should begin with the simplest model (i.e., the RCBiid method) and a test of the hypothesis that the errors are not spatially correlated (Hypothesis 2 in Fig. 1). This is done by comparing the RCBiid and candidate RCBCE model. If this hypothesis is not rejected, the RCBiid method is selected and the process is complete. Conversely, if the hypothesis is rejected, then RCBCE or CE analysis is warranted and the third step is to compare the candidate CE and RCBCE models (Hypothesis 3 in Fig. 1). Also in keeping with our guideline, if blocking in the RCBCE method is not significant, then the CE approach with fewer covariance parameters is selected. If blocking is significant, then the candidate RCBCE model is chosen for data analysis.



View larger version (39K):
[in this window]
[in a new window]
 
Fig. 1. Schematic representation of the best-fit covariance model selection procedure for RCB analyses in the presence of spatial correlation. The LRT denotes likelihood ratio {chi}2 test. AIC denotes Akaike Information Criterion.

 
Statistical Tools for Model Selection
We suggest the use of three types of statistical tools in the procedure outlined in Fig. 1. The first is the semivariogram of the residuals from a model containing just fixed effects ("fixed-effect model-fit residuals"). These residuals contain variation from all sources except the fixed treatment effects. This tool is used to visualize the nature of spatial correlation and to provide guidance in choosing starting parameters (i.e., nugget, range, and sill) for the RCBCE and CE methods. The semivariograms of the observations (h) and of the RCBiid model-fit residuals e(h)RCB were also computed and compared to determine qualitatively whether blocking accounted for spatial correlation among observations. All semivariograms were computed in PROC VARIOGRAM of SAS Version 8 (SAS Institute, Cary, NC).

The second tool is the Akaike Information Criterion (AIC; Akaike, 1974) that is used to compare the relative goodness-of-fit among non-nested candidate models. The AIC values are computed in SAS PROC MIXED Version 8 as:

[2]
where l is the residual log likelihood if restricted maximum likelihood (REML) is implemented and q is the number of random effect parameters being estimated using REML. In SAS Version 8, smaller AIC values indicate better fit. We chose to use AIC in preference to the other information criteria available in SAS Proc MIXED, Schwarz' Bayesian Information Criterion (BIC) (Schwarz, 1978). Guerin and Stroup (2000) compared the performance of AIC and BIC on covariance model selection for repeated measures and stated that AIC tended to result in a more complex model but with better Type I error rate control than BIC. Thus, if treatment effects are of interest, AIC will be a better criterion. In addition, BIC cannot be used to compare across methods.

The third statistical tool is the likelihood ratio {chi}2 test (LRT). Ideally, we would like to test all hypotheses concerning the relative goodness of fit of potential models with the LRT using REML. This test, however, is only applicable when comparing two nested covariance models with the same X matrix (fixed effects). Consequently, when testing a hypothesis that involves comparison of non-nested covariance models (still with the same X matrix), AIC is used (as described above). When models are nested, hypothesis testing may be possible with the LRT statistic ({lambda}) which approaches a {chi}2 distribution, and which is defined as:

[4]
where l1 and l2 denote the residual log likelihoods of the reduced model and the full model, respectively (Littell et al., 1996). In SAS PROC MIXED Version 8, "–2l" instead of "2l" is reported and labeled "–2 Res Log Likelihood" in the fit statistics table.

In practice, there is another condition that must be met for the LRT to apply. The hypothesized value of the parameter being tested must not be on the boundary of the parameter space. For example, by definition a variance component must be nonnegative, therefore the value zero is on the boundary of the parameter space. In this case, the asymptotic distribution of {lambda} does not follow a {chi}2 distribution. When the models being compared differ by such parameters, e.g., variance components or spatial ranges, the distribution of the test statistic {lambda} consists of mixtures of {chi}2 distributions (Verbeke and Molenberghs, 2000). If the models being compared differ by only one such parameter, the appropriate p value for the LRT is one-half the reported p value from the {chi}2 distribution with one degree of freedom (consistent with dropping one parameter from the full model), hereafter referred to as LRT01 (Littell et al., 1996). If the models being compared differ by more than one such parameter, then the actual distribution of {lambda} is unknown and we use AIC values (as described above) to determine which model has the better goodness of fit.

The use of AIC to determine which model has a better fit is the same as an LRT but uses a critical value of twice the difference in the number of random effect parameters between the two models under consideration (see Eq. [2]). Hence, if the two models have the same number of parameters, the AIC just selects the model with higher likelihood, which may not be a very reliable criterion. If the two models differ by one random effect parameter, the AIC uses a "critical value" of 2, which is less conservative than an LRT, which has a "critical value" of 3.84 (the 95th percentile of {chi}12), or LRT01, which has a "critical value" of 2.71 (the 95th percentiles of LRT01) (Morgan and Gumpertz, 1996). Consequently, while the use of LRT or LRT01 can be seen as adding complexity to model selection compared with simply using AIC values, it also adds a degree of robustness and confidence in the selection process that are absent when using AIC values alone.

Spatial Covariance Structure Selection for the RCBCE and CE Methods
The first step in Fig. 1 is to use a semivariogram of the fixed-effect model-fit residuals to choose candidate nugget and/or no-nugget spatial covariance structures to use with the RCBCE and CE methods. In practice, the estimated sill, nugget, and range from the semivariogram will be good starting parameter values for the mixed model procedure, which will then iteratively search for its own best-fit values. Providing starting parameter values for each spatial covariance structure is important for two reasons. The first is that the mixed models analysis is very sensitive to starting parameters values and good starting values increase the likelihood that the algorithm will converge. The second reason is that SAS PROC MIXED can require a large amount of computer time with the large data sets typical of precision agriculture and yield monitoring (n > 2000). Good starting parameter values can reduce this run time considerably.

Once candidate spatial covariance structures and their associated starting parameter values are chosen (based on the semivariogram of the fixed-effect model-fit residuals), the resultant RCBCE models are run. If a tested model initially results in nonconvergence, infinite likelihood, or a nonpositive definite Hessian matrix (a multiple of the inverse of the estimated covariance matrix of the covariance parameter estimates), we recommend a "trial-and-error" approach for changing the starting parameter values. If subsequent attempts also result in one of these failures, the model is excluded (SAS Institute, 1999b).

Spatial covariance structures may or may not include a nugget. Consequently, the second step in our demonstration is to compare all candidate RCBCE models based on no-nugget functions. Because the various spatial covariance functions are not nested, we select the best no-nugget model using AIC values as described above (Fig. 1). Since the three spatial covariance functions under consideration have the same number of random effect parameters, this amounts to simply picking the covariance function with the highest likelihood value. This is then repeated for RCBCE models based on spatial covariance structures that include a nugget. Best CE models based on no-nugget and nugget spatial covariance structures are then determined in the same manner (Fig. 1).

Hypothesis Testing
Three hypotheses are tested to select a covariance model (Fig. 1). These hypotheses and the statistical tools used to test them are described below.

Hypothesis 1
The nugget effect {sigma}2g = 0, which means the no-nugget spatial covariance function is more appropriate than one with a nugget for the RCBCE or CE method. After selecting the best RCBCE nugget and no-nugget models (Fig. 1), the two are compared to determine which results in the better fit. If the two models are based on different spatial covariance structures, they are not nested and AIC values are used to determine the one with the better fit. If they are based on the same spatial covariance structure, then they are nested. However, because the hypothesis involves testing whether a variance component (the nugget) equals zero, the LRT01 is used and the p value of the test is half of the reported p value from the {chi}2 distribution with one degree of freedom. The same procedure is used to determine whether the selected no-nugget or nugget CE model has the better fit.

Hypothesis 2
The errors are not spatially correlated; that is, the RCBiid method is more appropriate than the RCBCE method. When the best RCBCE model does not contain a nugget, the difference between the two models is only one parameter, the range. The LRT01 statistic can be used to test Hypothesis 2 using half of the reported p value from the {chi}2 distribution with one degree of freedom.

When the best RCBCE model contains a nugget, the RCBiid model is still nested within it, but the models differ by two parameters: the range and the nugget. In this case the distribution of {lambda} is unknown, so determination of the model with the better fit is based on AIC values.

Hypothesis 3
The RCBCE block and plot effects are {sigma}2p = {sigma}2b = 0; that is, V1 = 0 and V2 = {sigma}2W, which means that blocking is not significant and the CE method is more appropriate than the RCBCE method. There is one case where Hypothesis 3 can be tested with the LRT01. If the best RCBCE and CE models are both based on the same spatial covariance structure they are nested and the first requirement for using the LRT01 is met. If the RCBCE model has dropped either the block or plot effect (i.e., the estimate for either of these parameters is zero), and this model only differs from the best CE model by the one remaining block or plot effect (a variance component), then the LRT01 can be implemented using half of the reported p value from the {chi}2 distribution with one degree of freedom. In all other cases, we use AIC values to determine which model has the better fit.

Estimation of Treatment Significance and Mean
Once the covariance model is selected, it is used to estimate treatment means and conduct treatment comparisons using estimated generalized least squares (EGLS). The covariance parameters of the model are estimated using REML (Zimmerman and Harville, 1991).


    APPLYING THE PROCEDURE: A TWO-YEAR SITE-SPECIFIC NITROGEN MANAGEMENT STUDY
 TOP
 NOTES
 ABSTRACT
 INTRODUCTION
 COVARIANCE MODEL SELECTION...
 APPLYING THE PROCEDURE: A...
 INTERPRETATION AND DISCUSSION
 CONCLUSIONS
 APPENDIX
 REFERENCES
 
Field Description and Layout
Two trials were conducted in a 12-ha field at the Lower Coastal Plain Tobacco Research Station, Kinston, NC (Fig. 2a) . There are three major soil map units in the field: Norfolk (No) loamy sand (fine-loamy, siliceous, thermic, Typic Paleudults), Goldsboro (Go) loamy sand (fine-loamy, siliceous, thermic, Aquic Paleudults), and Lynchburg (Ly) sandy loam (fine-loamy, siliceous, thermic, Aeric Paleaquults).



View larger version (66K):
[in this window]
[in a new window]
 
Fig. 2. (a) Field layout of the randomized complete block design at the Lower Coastal Plain Tobacco Research Station, Kinston, NC. Adjacent plots with the same shading pattern are in the same block (three plots per block). The treatment number is indicated in each plot; (b) wheat forage yield map; and (c) corn grain yield map. RYE = Realistic Yield Expectation N management; FA = field-average N management; SSNM = site-specific N management.

 
The experiment began in November 2000 with the establishment of a 2-yr winter wheat–double crop soybean [Glycine max (L.) Merr.] (Year 1)–corn (Year 2) rotation typical of the Coastal Plain. The experiment was established in an RCB design with three N management treatments replicated 10 times in plots 60.8 by 60.8 m (0.37 ha). Blocks were established to minimize the distance among treatments within a block, subject to limitations imposed by field dimensions; note that the blocks were not all the same shape (Fig. 2a). The three N treatments were as follows:
  1. "RYE": Whole-field Realistic Yield Expectation (RYE) N management based on the estimated yield expectation and the N-use factor for the predominant soil type (Go) in the field derived from the North Carolina RYE database (North Carolina Nutrient Management Workgroup, 2003). Fertilizer N was applied uniformly to treatment plots at planting and at wheat growth stage GS-30 (Zadoks et al., 1974) or corn growth stage V2 (Ritchie et al., 1993).
  2. "FA": Whole-field N management based on field-averaged in-season estimates of optimal N fertilizer rates determined by color infrared aerial photography (CIR) (Flowers et al., 2003; Carter et al., 2002; Sripada et al., 2002). Fertilizer N was applied uniformly to treatment plots at planting, and at GS-25 and GS-30 for wheat, and at V2 and VT for corn.
  3. "SSNM": Site-specific N management based on site-specific in-season estimates of optimal N fertilizer rates determined by CIR. For SSNM, each 0.37-ha plot was divided into 78, 4.6 by 9.1 m "miniplots" for applying variable rates of N fertilizer. The optimal N rate for each miniplot was determined via CIR and applied uniformly to each miniplot at the same growth stages as FA.

All fertilizer N was applied in a serpentine fashion by an applicator tacking back and forth across the field in adjacent swaths, switching the amount of fertilizer applied whenever a plot or miniplot boundary was crossed. Thus, the treatments were not applied to a plot all at one time, but partially and sequentially to adjacent plots.

Nitrogen rates and timing for the RYE, FA, and SSNM treatments for wheat forage in 2001, and for corn for grain in 2002 are summarized in Table 1. For wheat forage, the total RYE N fertilizer rate was 44 and 34% greater than for FA and SSNM, respectively. For corn, the total FA N fertilizer rate was 95 and 51% greater than for RYE and SSNM, respectively. Nitrogen fertilizer was not applied to the 2001 soybean crop.


View this table:
[in this window]
[in a new window]
 
Table 1. Rate and timing of N fertilizer applications for the three N management treatments for wheat and corn.

 
For this presentation of covariance model selection, we analyzed the 2001 wheat forage data, and the 2002 corn grain yield data. Wheat forage was harvested from 18, 1.2 by 18 m sample harvest areas within each 0.37-ha treatment plot using a forage harvester with a 1.2-m header and Mettler weigh system. Sample harvest area forage yield was assigned to the center point of each sample area georeferenced using a differential global positioning system (DGPS). Ten samples were lost during drying, resulting in a total of 530 sampling points (Fig. 2b). Corn was harvested using a grain combine equipped with a yield monitor and DGPS. After the corn yield data were cleaned and edited following the general guidelines suggested by Doerge (1999) and Blackmore and Moore (1999), there were a total of 2542 sampling points, with a range of 69 to 102 subsamples per plot (Fig. 2c).

Model Selection Steps Common to Both Crops
To investigate spatial correlation within the 2001 wheat forage and the 2002 corn grain yields, the observations were fit to two models: (i) with only fixed treatment effects (i.e., N management) to compute the fixed-effect model-fit residuals, and (ii) with both fixed treatment effects and random block and block x treatment effects to compute the RCBiid model-fit residuals. The normality of the observations and these residuals was assessed by diagnostic plots (e.g., histograms) coupled with the Shapiro-Wilk test in SAS Version 8 PROC CAPABILITY (SAS Institute, Cary, NC). There was no strong evidence of nonnormality for the original observations nor for the residuals for either crop, so the data were analyzed in the original scale. An isotropic and stationary spatial process was assumed, and all tests were conducted in SAS Version 8 and determined at the 0.05 significance level.

Scaled plots of the empirical semivariograms of the fixed-effect model-fit residuals e(h), the observations (h), and the RCBiid model-fit residuals e(h)RCB for the 2001 wheat forage crop, and for the 2002 corn yield are illustrated in Fig. 3 . For the wheat 2001 data, plots of e(h) were used to estimate starting parameters for candidate RCBCE and CE models. It was assumed (based on Fig. 3) that the spatial covariance of the errors would be best modeled with either a Gaussian, exponential, or spherical function either with or without a nugget effect. Consequently, in addition to the RCBiid model, six candidate RCBCE and six CE models were evaluated in SAS PROC MIXED. The ddf of F tests for treatment effects were computed using the Kenward-Roger method for all models (Kenward and Roger, 1997). All candidate models and their related statistics and covariance parameter estimates for wheat forage are summarized in Table 2. This process was then repeated for the corn 2002 data set.



View larger version (22K):
[in this window]
[in a new window]
 
Fig. 3. Spatial correlation of wheat forage and corn grain yield illustrated by the isotropic semivariograms of the original observations (h), the residuals from fitting a model with just fixed effects e(h), and the RCBiid model-fit residuals e(h)RCB. The semivariances were divided by 100000 before plotting.

 

View this table:
[in this window]
[in a new window]
 
Table 2. Wheat and corn candidate covariance models and their related statistics and covariance parameters including randomized complete block with iid errors (RCBiid), randomized complete block with correlated errors (RCBCE), and correlated errors analysis (CE) models. (Italic type indicates the selected model.)

 
Wheat 2001 Model Selection
The semivariograms of both the original observations and the fixed-effect model-fit residuals for 2001 wheat forage (Fig. 3) indicated positive spatial correlation in that observations that were closer together tended to be more similar than ones farther apart (semivariance increased as lag distance increased). The semivariograms also indicated that spatial covariance structures with nugget effects would likely be appropriate, as the semivariance remained quite high as the lag distance approached zero. In contrast, the plot of e(h)RCB was nearly a pure nugget effect, that is, a relatively constant semivariance over increasing lag distances. This suggested that blocking was effective in accounting for the spatial correlation among the observations, and that blocking was likely to be an important component of the selected covariance model.

Following the flow chart (Fig. 1), the covariance model for wheat forage was selected as follows:

  1. No-nugget RCBCE Gaussian and spherical models both resulted in a nonpositive definite final Hessian matrix, and consequently were excluded.
  2. Among RCBCE nugget candidate models, the nugget exponential model did not converge.
  3. In testing Hypothesis 1, the two RCBCE models were not nested so AIC was used and the nugget Gaussian model was selected as the best RCBCE model (Table 2).
  4. In testing Hypothesis 2, the RCBCE nugget Gaussian model was compared with the RCBiid model using AIC because the nugget and a second parameter, the range, were involved. The RCBCE nugget Gaussian model with smaller AIC value was chosen over the RCBiid method.
  5. Among the CE no-nugget candidate models, the no-nugget exponential model was selected because it had the smallest AIC value compared with the CE no-nugget Gaussian and spherical models.
  6. Among the CE nugget candidate models, the nugget Gaussian model was selected because it had the smallest AIC value compared with the nugget spherical and exponential models.
  7. Between the CE no-nugget exponential and nugget Gaussian models (Hypothesis 1), the CE nugget Gaussian model with smaller AIC value was chosen.
  8. To compare the best RCBCE- and CE-nugget-Gaussian models (Hypothesis 3), the LRT01 was used. Because (i) these models were based on the same spatial covariance structure they were nested, and (ii) the block effect had been dropped from the RCBCE model (Table 2), thus the models differed by only one parameter (the plot effect). The value of {lambda} for the comparison of these models was computed as 8479 – 8468 = 11. Here {lambda} has a mixture of half {chi}20 and half {chi}21 distributions with one degree of freedom. The p value of the {chi}21 with a value of 11 is 0.0009; therefore, the p value of LRT01 test is 0.0009/2 {approx} 0.0005. Thus, the LRT01 was significant and the RCBCE nugget Gaussian model was selected as the covariance model to be used for wheat forage yield.

Corn 2002 Model Selection
As found in the previous year's wheat forage data, spatial correlation with an apparent nugget effect was evident in the plots of (h) and e(h) (Fig. 3). Furthermore, these plots were nearly identical, which was consistent with the fact that the RCBiid model indicated that treatment effects were not significant. Unlike the scaled semivariograms for 2001 wheat forage, the plot of 2002 corn e(h)RBC continued to show spatial correlation. This suggested that blocking may not have been adequate to account for the spatial variability in these data. Additionally, all three plots indicated that functions with nugget effects might be necessary to model the spatial correlation.

Following Fig. 1, the covariance model for 2002 corn yield was performed as follows:

  1. Among RCBCE no-nugget candidate models, the no-nugget exponential model was selected because it had the smallest AIC value compared with the no-nugget Gaussian and exponential models (Table 2).
  2. Among RCBCE nugget candidate models, the nugget spherical model was selected because it had the smallest AIC value compared with the nugget Gaussian and exponential models.
  3. To compare the RCBCE no-nugget exponential model and nugget spherical model (Hypothesis 1), AIC was used instead of the LRT because these models were not nested. The RCBCE nugget spherical model with smaller AIC value was selected as the best RCBCE model.
  4. To compare the RCBiid method vs. the RCBCE nugget spherical model (Hypothesis 2), AIC was again used because the models differed by two parameters: the range and the nugget; thus, the distribution of {lambda} is unknown. The RCBCE nugget spherical model with a smaller AIC value was chosen over the RCBiid method.
  5. Among the CE no-nugget candidate models, the CE no-nugget exponential model was excluded due to a nonpositive definite final Hessian matrix. The CE no-nugget spherical model was also excluded because it did not converge to a finite range or sill. Consequently, only the CE no-nugget Gaussian model remained.
  6. Among the CE nugget candidate models, the CE nugget spherical model was selected because it had the smallest AIC value compared with the nugget Gaussian and exponential models.
  7. To compare the best CE nugget and no-nugget models (Hypothesis 1), AIC was used because they were based on different spatial covariance structures. The spherical nugget model with a smaller AIC value was selected as the best CE model.
  8. To compare the best RCBCE- and CE-nugget-spherical models (Hypothesis 3), the LRT01 was used because (i) these models were based on the same spatial covariance structure, thus they were nested; and (ii) the best RCBCE model had no plot effect (Table 2), thus the models differed by only one parameter (the block effect). The value of {lambda} for the comparison of these models was 0. Clearly, the LRT01 was not significant and the CE nugget spherical model was selected for 2002 corn yield.


    INTERPRETATION AND DISCUSSION
 TOP
 NOTES
 ABSTRACT
 INTRODUCTION
 COVARIANCE MODEL SELECTION...
 APPLYING THE PROCEDURE: A...
 INTERPRETATION AND DISCUSSION
 CONCLUSIONS
 APPENDIX
 REFERENCES
 
The RCBCE nugget Gaussian model and CE nugget spherical model were selected for the 2001 wheat forage and 2002 corn grain yield analyses, respectively. This indicated that in both trials, the RCBiid model-fit residuals were significantly spatially correlated, providing evidence of field heterogeneity within blocks that could make the RCBiid method less powerful than a method that incorporated the spatial correlation.

Semivariograms, Spatial Ranges, and Selected Covariance Models
The selection of covariance models for wheat and corn was supported by the semivariograms of the observations, the fixed-effect model-fit residuals, and the RCBiid model-fit residuals (Fig. 3). For wheat, the fixed-effect-model-fit residual semivariances were slightly lower than those of the observations, indicating that there was a small degree of variation accounted for by the fixed effects and that the data were spatially correlated. Additionally, the RCBiid model-fit residual semivariogram appeared to be essentially a pure nugget effect, indicating that blocking accounted for much of the spatial correlation (at distances greater than the minimum lag). Each of these facts is consistent with the selection of the RCBCE nugget Gaussian model (Table 2), which resulted in a significant treatment effect, included a nugget effect in the model of the spatial correlation, and incorporated blocking effects. The shape of the wheat semivariograms at the shortest lag distances (Fig. 3) was also consistent with the selection of a Gaussian spatial covariance structure.

If the wheat semivariograms had been the only criteria used to select the covariance model for the wheat analysis, one might have selected the RCBiid model based on the nearly pure nugget effect of the RCBiid model-fit residuals. The fact that the model selection procedure (Fig. 1) found the RCBCE nugget Gaussian model to be better is instructive. The covariance structure of the errors is not identical to that visualized by the semivariogram analysis of the residuals (Brownie and Gumpertz, 1997). This is the primary reason we emphasize using the semivariograms for estimating starting parameters, and not for selecting a final covariance model.

For wheat forage, the estimated range of spatial dependence of the errors from the selected model (RCBCE nugget Gaussian, Table 2) was 99 m, which was less than the maximum distance within a block (~172 m) but greater than the plot dimensions (~61 m). This indicated that spatially correlated errors were present within and among blocks, probably due to the omission of one or more unknown covariates from the model. In this case, the random block effect would not be expected to be fully effective in explaining the spatial structure. However, the random block x treatment effect was significant. Consequently, for wheat forage, both blocking and spatial covariance modeling of the errors were important.

For the corn data set, the fixed-effect-model-fit residual semivariogram was nearly identical to that of the observations, indicating that the treatment effects accounted for very little of the spatial variability. In contrast to wheat, the corn RCBiid model-fit-residual semivariogram indicated substantial spatial correlation despite the block effect being in the model. This semivariogram had about half the sill-to-nugget semivariance and half the range of the fixed-effect-model-fit residual semivariogram. This indicated that although blocking accounted for a substantial proportion of variability at longer lag distances, it was ineffective at accounting for spatial correlation within the scale of a plot. Additionally, all three corn semivariograms suggested the need for a spatial covariance structure that included a nugget effect. These facts are consistent with the model selection procedure's choice of the CE nugget spherical model for the corn analysis (Table 2), and indicated that the spatial correlation was better accounted for by ignoring the blocking and including an estimation of the spatial covariance structure of the errors in the model.

The estimated range of spatial dependence of the errors from the selected corn model was 52 m, slightly less than the plot size, indicating that spatially correlated errors were present within the scale of adjacent plots. In such a case, neither a block nor block x treatment effect could be expected to be effective in accounting for spatial heterogeneity. In this case, a spatial covariance function alone was better in accounting for the spatial correlation.

These two examples show that variography is an important tool in selecting covariance models. The semivariograms are useful in determining starting parameters. They give an indication of when spatial covariance structures may require inclusion of nugget effects, and they hint at the appropriate form the spatial covariance structure may take (i.e., spherical, exponential, etc.). However, consistent with the findings of Brownie and Gumpertz (1997), the exact model that will be selected cannot be fully predicted from the semivariograms alone.

Model Impact on P Values
In both the wheat forage and corn-for-grain trials, the errors were spatially correlated, the estimated range was less than the block size, and the RCBiid method tended to give higher estimates of the standard errors of treatment comparisons than the other models (Table 2). Consequently, for both data sets, the RCBiid and the selected covariance model resulted in different conclusions regarding treatment significance. The selected covariance model for wheat resulted in a smaller significance level for treatment comparison, p = 0.03 compared with p = 0.07 for the RCBiid model. This would have affected conclusions about treatment effects at the 0.05 level of significance. The selected covariance model for corn indicated significant treatment effects (p = 0.05), which were completely obscured by the RCBiid model (p = 0.43). This is because the RCBiid method assumes the spatial correlation persists for a longer distance (the width of a block, ~172 m) than the selected CE model with an estimated range of 52 m (Table 2). Clearly, when the model errors are spatially correlated, as they were in both of these data sets, the use of the selection procedure may result in a more powerful statistical test than the classical RCBiid method.

The data in Table 2 indicate that the treatment p value associated with any given candidate covariance model is very sensitive to the form of that model. For the wheat data, all of the models that included block effects gave p values for the treatment effect in the range of 0.03 to 0.07. If block effects were omitted from the wheat models, the p values dropped to <0.0001. Clearly, arbitrary selection of a covariance model can result in dramatically different treatment conclusions. In the wheat example presented here, both the selection procedure based on AIC and LRT comparisons (Fig. 1) and the initial variography (Fig. 3) support the selection of the RCBCE nugget Gaussian model for data analysis. The LRT decisively selects the RCBCE nugget Gaussian model over the CE nugget Gaussian model (p value = 0.0009 against dropping the block effects).

In the corn data, the inclusion of block effects in the models also affected the p values of the F test for treatment effects, but the most dramatic impact was associated with the inclusion or omission of a nugget effect in the spatial covariance structures used in the RCBCE and CE methods (Table 2). The p values for models with block effects and no nugget ranged from 0.27 to 0.43. If a nugget was included in the RCBCE candidate models, the p values ranged from 0.06 to 0.15. If no nugget and no block effects were include, the p value dropped dramatically to 0.0003 for the CE no-nugget Gaussian model, which was the only one of this type that converged. The CE models that included a nugget effect had p values that ranged from 0.04 to 0.13. Consistent with the wheat trial, the corn data demonstrate that it is possible for results of tests of treatment effects to differ greatly under different spatial covariance models and that covariance models should not be selected arbitrarily. In this case, the AIC values made it obvious that models without nugget effects were not adequate, and variography supported this conclusion. The nugget spherical model provided the best fit, either with or without block effects (AIC = 41341 and AIC = 41339, respectively). The LRT for block effects showed that the block effects were not significant. Including nonsignificant blocks in the model should not have much effect one way or the other; and it did not—the p values for testing treatment effects were similar whether blocks were included or not (p = 0.06 with block effects and 0.05 without, Table 2).

The ddf of F tests for treatment effects were computed using the Kenward-Roger method provided in SAS PROC MIXED (Kenward and Roger, 1997). In the classical analysis of variance, the error df measure the number of independent pieces of information used to estimate the error variance, the idea being that observations that are correlated with each other are somewhat redundant, the more highly correlated they are, the more redundant is the information. In the correlated errors analysis the error variance is computed a different way. The Kenward-Roger method computes ddf based on the estimated covariance structure that summarizes the amount of nonredundant information available to estimate the error variance. When block and plot effects are present and have an impact on the covariance structure, the Kenward-Roger ddf for the test of treatment effects will be similar to the traditional ddf based on the number of plots. If the block and plot effects are small or nonexistent, the ddf computed by the Kenward-Roger method will be much larger. In the two trials presented here, the ddf varied considerably among candidate models (Table 2). For example, the ddf ranged from 27 to 347 and from 18 to 1033 for the wheat and corn candidate models, respectively. On first consideration, it might seem that this alone should result in the decrease in treatment p values associated with some of the CE candidate models. However, in our experience, this is not the case even in these types of experiments that involve dense subsampling within plots.

To determine how much the ddf affected the p values, we compared the F statistic for treatment effects in each of our selected models to three different critical values. The three critical values correspond to ddf computed three different ways: (i) using the "RESIDUAL" option in SAS Proc Mixed, (ii) using the "KENWARDROGER" option, and (iii) using the RCBiid method. Note that we did not change the covariance models or the F statistics; just the ddf for finding the critical values were changed. Table 3 shows the results. For the wheat data, ddf = 19, 27, or 527 computed by the Kenward-Roger method, computed as if the model were RCBiid, and using the "Residual" method, respectively. The associated p values ranged only from 0.03 to 0.01. For the corn data, when ddf ranged from 18 to 2539, the change in p values was also small, ranging only from 0.07 to 0.05. Clearly, the treatment p values of the selected covariance models were relatively insensitive to changes in ddf over the range exhibited by the candidate models. Differences in p values among candidate models cannot be accounted for simply by the changes in ddf among the RCBiid, RCBCE, and CE methods.


View this table:
[in this window]
[in a new window]
 
Table 3. Impact of the denominator degrees of freedom (ddf) on the treatment p value of the selected covariance model.

 
Model Impact on Treatment Means and Standard Errors
For both the wheat and corn trials, the estimated deviations of treatment means from the overall mean were very similar between the selected covariance model and the RCBiid method (Table 4). For wheat, the standard errors of the treatment means were very similar between the two models (Table 4); but for corn, the selected CE model had standard errors of the means strikingly lower than the RCBiid method.


View this table:
[in this window]
[in a new window]
 
Table 4. Comparisons of generalized-least-squares treatment means and deviations of treatment means from overall means for wheat and corn between the randomized complete block with iid errors (RCBiid) model vs. the best-fit spatial covariance model. The standard error of the mean is in parenthesis.

 
The treatment means estimated for wheat forage and corn grain yield by the selected models were lower than those computed by the RCBiid method (Table 4). Our experience has shown that while this happened in these examples, it is not always the case. The differences in treatment estimates between the selected models and the RCBiid method differed across treatments. This was probably due, at least in part, to the spatial arrangement of the treatments (Fig. 2a). For example, there were eight pairs of adjoining plots that both received the RYE treatment, whereas there were only six and five pairs of adjoining plots that both received the FA and SSNM treatments, respectively. Each plot's contribution to the treatment mean depended on how close it was to other plots of the same treatment. Thus, for the RCBCE and CE methods, within a specific distance the covariance of any two sampling sites of each treatment was different.

Other Strategies for Model Selection
In our two examples, if only AIC had been used for model selection, the same models would have been selected as we found using the LRT01 together with AIC. However, our experience with other data has shown that this is not always the case. Overall, we prefer using the LRT in all hypothesis tests if the distribution of a LRT is known. For cases where the distribution of a LRT is unknown, AIC gives a method of comparing models. As with the LRT, there are uncertainties about the behavior of the AIC statistic when parameters are on the boundary of the parameter space.

Other model selection sequences for choosing the best-fit RCBCE or CE model are possible. Within the RCBCE or CE method, we can first compare a nugget vs. no-nugget model based on the same spatial covariance structure using LRT01, and then compare the selected models across different spatial covariance structures using AIC. Additionally, we can also first compare the best-fit RCBCE and CE models to determine the blocking significance, and then compare the selected model to the RCBiid model to determine if spatial correlation is significant. We examined these procedures for our two examples, and they resulted in the selection of the same best-fit models as our original model selection sequence.


    CONCLUSIONS
 TOP
 NOTES
 ABSTRACT
 INTRODUCTION
 COVARIANCE MODEL SELECTION...
 APPLYING THE PROCEDURE: A...
 INTERPRETATION AND DISCUSSION
 CONCLUSIONS
 APPENDIX
 REFERENCES
 
We began this research with a concern about RCB analysis in precision agriculture experiments. The spatial scale of these experiments, the dense subsampling within plots associated with yield monitoring, and the unusual way in which treatments may be applied to these plots (compared with traditional small plot experiments) lead to issues that the traditional RCB analysis was not designed to address. Our primary objective was to present a step-by-step, systematic approach that agronomists could use to select a covariance model for RCB analysis when spatial correlation is present. That approach was outlined in Fig. 1, and tested with data from two trials within a large-scale site-specific N management study. This procedure is somewhat complex, but systematic and conceptually straightforward. It is readily implemented in SAS, though it may require substantial computing time for large data sets. Additionally, the procedure is not restricted to RCB analyses but is appropriate to any mixed-model analysis with potentially spatially correlated errors. Inferences based on the fitted covariance structure are expected to be more precise than other candidate models rejected in the selection process. Our second objective was to evaluate the impact that different covariance models have on treatment effect p values. Both the 2001 wheat and the 2002 corn data sets demonstrate that the resultant treatment p values may be highly sensitive to the choice of covariance model, particularly to which of the component parts are included: spatial correlation, nugget effects, plot effects, and block effects. Clearly, arbitrary selection of a covariance model can result in misleading conclusions about treatment effects.

Further Considerations
Covariance model selection based on one dataset is difficult, and no single tool will necessarily select an appropriate model. For example, the candidate CE nugget exponential wheat model had AIC values that were nearly identical to the other wheat candidate CE nugget models (Table 2), but the estimated range and sill were nearly an order of magnitude larger. These estimates were not only inconsistent with the estimates from the other candidate models, but clearly did not fit the wheat data presented in Fig. 3. Our procedure rejected this candidate model, but it is still of concern that the AIC for this model gave little indication that its parameter estimates were apparently incorrect.

A further concern raised by the results presented here is the extreme sensitivity of the treatment p values to covariance model selection. Clearly, care must be taken in selecting a model for data analysis, and untested assumptions about the adequacy of a given spatial covariance structure, the presence or absence of nugget effects, or the need to include or omit block effects can lead to erroneous conclusions. We believe that a selection procedure like the one outlined in Fig. 1 combined with variography are important if adequate models are to be used for data analysis. However, given the concerns about using AIC and LRT comparisons, all conclusions about covariance model selection and the resultant treatment effect p values need to be carefully evaluated. For example, researchers need to ask if the estimated parameters are realistic, and if the conclusions regarding treatment effects are agronomically meaningful. Simulation studies would be useful in evaluating the probability of selecting an adequate covariance structure and the effect of the covariance structure on the estimation of treatment significance with or without the inclusion of block and block by treatment (plot) effects.


    APPENDIX
 TOP
 NOTES
 ABSTRACT
 INTRODUCTION
 COVARIANCE MODEL SELECTION...
 APPLYING THE PROCEDURE: A...
 INTERPRETATION AND DISCUSSION
 CONCLUSIONS
 APPENDIX
 REFERENCES
 
Following are examples of SAS PROC MIXED code used to perform the covariance model selection procedure described in the text. Starting parameter values s