Agronomy Journal Grow Your Career With ASA
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


Published in Agron J 99:481-488 (2007)
DOI: 10.2134/agronj2005.0150
© 2007 American Society of Agronomy
677 S. Segoe Rd., Madison, WI 53711 USA
This Article
Right arrow Abstract Freely available
Right arrow Figures Only
Right arrow Full Text (PDF) Free
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Plant, R. E.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Plant, R. E.
Agricola
Right arrow Articles by Plant, R. E.
Related Collections
Right arrow Spatial Distribution
Right arrow Statistics

Statistics

Comparison of Means of Spatial Data in Unreplicated Field Trials

Richard E. Plant*

Dep. of Biological and Agricultural Engineering and Dep. of Plant Sciences, Univ. of California, Davis, CA 95616

* Corresponding author (replant{at}ucdavis.edu)

Received for publication May 17, 2005.

    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 REFERENCES
 
The measurement technologies of site-specific agriculture provide precision instruments at the landscape scale, permitting investigators and farmers to obtain data from experiments in commercial fields at previously unheard of spatial resolution. However, the statistical properties of data arising from these technologies are different from those of traditional small-plot trials data. These differences provide the opportunity to analyze data in different ways. The objective of this paper is to develop a method for estimating the effective sample size and P value of a significance test of the difference between two means in an unreplicated large-scale trial in a commercial field. The method is based on a modification of a well-known correction for spatial autocorrelation of the data when testing the significance of the correlation coefficient between two random variables. An example, the comparison of readings taken from a yield monitor on two different days, is given to demonstrate the properties of the method.

Abbreviations: NDVI, normalized difference vegetation index


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 REFERENCES
 
PRECISION AGRICULTURE has been defined as the use of modern tools of information technology in crop management (Lowenberg-DeBoer and Erickson, 2000). While the technologies of precision agriculture, such as the yield monitor, remote sensing, and soil electrical conductivity measurement, together with the use of the global positioning system, have greatly facilitated the practice of site-specific crop management, these technologies have another use as well. They provide what might be called precision measuring instruments at the landscape scale, permitting investigators (and farmers) to obtain data from experiments in commercial fields at previously unheard of spatial resolution. These data, however, differ in their statistical properties from the data collected in traditional small-plot agronomic trials in that the data often consist of hundreds or thousands of correlated values instead of a small number of independent values. Moreover, the investigators or the commercial grower cooperator may, for logistical or economic reasons, be unable or unwilling to lay out an experiment in a commercial field using a traditional replicated plot design. It is therefore useful to explore new statistical methods to analyze precision agriculture data (Nielsen and Wendroth, 2003).

Arguably the simplest form of an experiment is a comparison of the mean responses to two levels of a single factor. For example, Lee et al. (2006) provide a report of an experiment to compare the response of C sequestration rate to tillage method (conventional vs. minimum till) in a commercial corn field. The experiment involves the measurement of C sequestration rate using the eddy covariance method. The fetch of the measurement instruments is sufficiently large that no more than two can be placed in a single field. The cost of the experiment precludes more than one trial at this initial stage of the program, but it is nevertheless desirable to obtain some statistical measure of the difference between various quantities, such as yield and normalized difference vegetation index (NDVI), in the conventional and minimum till plots of the field.

The characteristic of an experiment such as this is that a single large field is divided into two halves. Half the field receives Factor Level 1 and the other half Factor Level 2. The 2N samples are then extracted from a set of locations, N from one side and N from the other. The data are autocorrelated, but not perfectly so. Depending on the level of autocorrelation, it may not be appropriate to model the experiment either as a nested design or a model in which the samples independent. In calculating degrees of freedom in a comparison of factor level means, the N pairs of measurements provide an effective sample size of Ne pairs, where 1 ≤ Ne ≤ N. The objective of this paper is to develop a method for estimating this value Ne, and to use this to estimate the P value in a test of the null hypothesis of no difference between the population means. The method is based on an extension of a method of Clifford and Richardson (1985) and Clifford et al. (1989) for estimating the effective sample size in a significance test of the correlation coefficient of autocorrelated data. The Clifford et al. (1989) approximation has achieved the status of the standard approach for use with autocorrelated data (Sokal, 2004), and has been used in the analysis of data from agricultural field experiments (Plant et al., 1999). The extension of this method to comparison of treatment means is first developed using Monte Carlo simulation. We then give an example using data from a field experiment and compare the results with a mixed model analysis.


    THEORY
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 REFERENCES
 
A spatial model for the comparison of two means associated with autocorrelated data as described in the Introduction may be written as

Formula 1[1]
where i indexes the factor levels, j indexes the samples, {lambda} is a scalar measuring the strength of autocorrelation of the errors, the wjk are elements of a matrix describing the connectivity among the samples, and the {nu}ij are independent, identically distributed normal random variables with mean 0 and variance {sigma}2. This is called the spatial error model (Anselin, 1992) since the error term {varepsilon}ij depends on the other error terms. This model fits into the mixed model format as a special case (Pinhiero and Bates, 2000, Eq. 5.5, p. 204) in which there is no explicit random effect term. As such, a test of the null hypothesis µ1 = µ2 against the alternative µ1 != µ2 can in principle be carried out using the maximum likelihood method. Indeed, Pinhiero and Bates (2000, p. 260–266) provide a method of mixed model analysis based on the variogram of the residuals of an analogous model in which the residuals are assumed to be uncorrelated. Although this approach may be used for variance components estimation, they do not recommend using it for hypothesis tests of fixed effect means (Pinhiero and Bates, 2000, p. 87). For this reason and because the determination of the effective sample size is useful in its own right, we consider an alternative based on work of Clifford et al. (1989) that provides both a test of the hypothesis and an estimate of the effective sample size.

Let U and V be two normally distributed random variables (technically, random fields) defined on a spatial coordinate system (x,y) with a total of N points each, and let u(x,y) and v(x,y) be observations of U and V. The Pearson product moment correlation coefficient of u and v is given by (Steel et al., 1997):

Formula 2[2]
where suv = N–1{Sigma}[u(x, y) Formula 2][v(x, y) Formula 2], su = {N–1{Sigma}[u(x, y) – Formula 2]2}1/2, Formula 2 is the mean of the u(x,y), the sums are taken over all the points (x,y), and similar expressions define Formula 2 and sv. If, conditional on U, the values v(x,y) of V are independent and identically distributed, then the variance of the sampling distribution of the random variable r under the null hypothesis that r = 0 is {sigma}r2 = 1/(N – 1) (Kendall and Stuart, 1979). This variance may be used in a test of the null hypothesis of zero correlation, although in practice what is usually tested is a transformed version of r (Kendall and Stuart, 1979). If, however, the random variables U and V are independent, so that r = 0, but the values of u(x,y) and v(x,y) are each positively autocorrelated among themselves, then the variance of the sampling distribution of r is generally larger than 1/(N – 1) (Clifford et al., 1989; Haining, 1990). This results in an increased tendency to reject the null hypothesis of a significance test when it is actually true, that is, in an increased Type I error rate. The intuitive explanation is that, if they are positively autocorrelated, then each value of u(x,y) and v(x,y) provides some information about the values of its neighbors, and therefore the actual number of degrees of freedom provided by each set of observations is <N. However, if an independent estimate Formula 2r2 of the variance of the sampling distribution of r were available, then we could estimate an effective sample size Ne of each set by inverting the equation {sigma}r2 = 1/(N 1) to obtain

Formula 3[3]
When u(x,y) and v(x,y) are positively autocorrelated, Formula 3r2 is generally larger than predicted under zero autocorrelation, and therefore in this case Formula 3e is generally <N.

Using a series of ingenious arguments, Clifford et al. (1989) showed that, under fairly general conditions, the value of Formula 3r2 can be estimated as follows. First, divide the set of all N pairs of points into KT strata (lag groups) Sk separated by a lag k. Of these KT strata, the first K are used to estimate the variance. Let Formula 3u(k) be the experimental covariogram of u of lag group k:

Formula 10[10]
and define the experimental covariogram of v similarly. Then the experimental correlogram Formula 10u(k) of u at lag k is given by Formula 10u(k) = Formula 10u(k)/su2, and similarly for Formula 10v(k). Clifford et al. (1989) show that the estimate of Formula 10r2 may be approximated as

Formula 4[4]
Haining (1990, 1991) extended the work of Clifford et al. (1989), testing the approximation under various combinations of values KT and K of strata and also showing that the approximation may be used for Spearman's rank correlation coefficient as well as Pearson's r.

Now consider the problem that is the subject of this paper: testing the null hypothesis of equality of means of two samples u(x,y) and v(x,y) drawn from normally distributed random fields U and V, each defined on a spatial coordinate system (x,y). The locations at which U and V are measured are not in general the same, but we will assume that U and V have the same variance {sigma}2. We will assume for simplicity that both samples have the same size N, although this can easily be relaxed (Steel et al., 1997). The null hypothesis of equality of means of U and V may be tested using Student's t statistic (Steel et al., 1997, p. 98), t = (Formula 4Formula 4)/suv, where Formula 4 and Formula 4 are the sample means and suv is the square root of suv2 = (su2 + sv2)/2, with su2 = {Sigma}[u(x, y) – Formula 4]2/(N – 1) and similarly for sv2. If the values of u(x,y) and v(x,y) are spatially independent, then the sampling distribution of the t statistic under the null hypothesis of no difference between means has a variance of {sigma}t2 = D/(D – 2) where D = 2N – 2 is the number of degrees of freedom. As with the correlation coefficient, in the case that u(x,y) and v(x,y) are each spatially positively autocorrelated, if an independent estimate Formula 4t2 of the sampling variance is available then this could in principle be used to estimate the effective sample size Ne, once again by inverting the variance equation and substituting for N.

Unfortunately, the resulting estimate, Formula 4e = (2Formula 4t2 – 1)/(Formula 4t2 – 1), has very poor numerical properties because Formula 4t2 appears in both the numerator and the denominator. A more accurate estimate of Ne is obtainable by using the point biserial correlation coefficient (Tate, 1954). This statistic is computed by arranging the values of u and v in a single column vector w (i.e., w' = [u', v'], where the prime denotes transposition). The point biserial correlation coefficient rw is then the correlation coefficient between w and a vector z whose components corresponding to u are 1 and corresponding to v are 0 (i.e., z' = [1...1 0...0]). When the components of u(x,y) and v(x,y) are independent, the variance of the sampling distribution of rw is {sigma}r2 = 1/2N (Tate, 1954). The t statistic for the test of the null hypothesis rw = 0 is

Formula 5[5]
Therefore, given an independent estimate Formula 5r2 of the variance of rw, an estimate of the effective sample size can be obtained as

Formula 6[6]
The corrected t statistic for the test of the null hypothesis rw = 0 may then be obtained by replacing N with Formula 6e in Eq. [5] to obtain

Formula 7[7]
There does not appear to be a simple analytical estimate analogous to Eq. [4] of {sigma}r2 that could be used in Eq. [6]. However, it is possible in principle to obtain an estimate using a bootstrap method (Efron and Tibshirani, 1993). Davison and Hinkley (1997) describe two classes of bootstrap methods that can be used for spatially autocorrelated data. The first is the spatial block bootstrap. Zhu and Morgan (2004) use a spatial block bootstrap to compare the distribution of nematodes in treated and untreated portions of a field in Wisconsin. This method is preferred if it is applicable since it does not require any parametric assumptions about the data. The second method, called a parametric bootstrap (Efron and Tibshirani, 1993), is to estimate the distributional parameters of the residuals of a parametric model of the data and then generate random variates with these same parameters. In the case of time series data discussed by Davison and Hinkley (1997), the residuals of the model can be used for this purpose, but in the case of spatial data there is no natural direction analogous to that of time that could be used to generate model values successively. Therefore, a set of random numbers satisfying the parametric model must be generated and used in bootstrap resampling in the manner described by Efron and Tibshirani (1993, p. 53). In our application the parametric model is the spatial error model of Eq. [1]. The parameters {lambda} and {sigma}2 of this model may be estimated from the data and then the method given by Haining (1990, p. 116) may be used to generate pseudorandom numbers, which are used to generate the bootstrap resamples. These are used to generate the estimate Formula 7r2, from which the estimated sample size Formula 7e is obtained via Eq. [6]. This value is then used in place of N in the t statistic of Eq. [7] for the test of the null hypothesis of zero difference between means.


    MATERIALS AND METHODS
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 REFERENCES
 
Monte Carlo Simulation Experiments
Simulations of the Method of Clifford et al. (1989)
We tested the two bootstrap methods by comparing variance and effective sample size estimates that they generated with those generated by Eq. [4] above, which is the estimate of Clifford et al. (1989). Simulations were generated of the equations

Formula 8[8]
where the vik are independent unit normal random variables on a square grid of size n by n, so that the total sample size is N = n2. All statistical calculations were carried using the R package, v. 2.0.0 (R Development Core Team, 2004).

In each simulated Monte Carlo experiment, uncorrelated unit normal pseudorandom samples were generated on an (n + 4) by (n + 4) flat grid using the R function rnorm(). Autocorrelated random variables u(x,y) and v(x,y) on the grid were then generated from these samples using the method described by Haining (1990). The function invIrM() and other associated functions from the package spdep (Bivand et al., 2005) were used for this purpose. The outer two cells around the edge of the grid were removed from the simulation to avoid edge effects, leaving an n by n grid for the simulation. Calculations of the experimental correlograms Formula 8u(k) and Formula 8v(k) in Eq. [4] were carried out using the function correlogram() of the R package spatial (Venables and Ripley, 2002). The variance estimates Formula 8r obtained in Eq. [4] were then substituted into Eq. [3] to compute the estimated effective sample size Formula 8e. Variance estimates Formula 8r for Eq. [3] were also obtained using the spatial block bootstrap method and the parametric bootstrap method. For both bootstrap methods, 500 bootstrap resamples were generated in each simulated experiment.

The simulations were carried out for values of the spatial autocorrelation parameter {lambda} (Eq. [1]) ranging from 0 to 0.8 in steps of 0.2. For use with the parametric bootstrap method, the value of {lambda} was estimated using the spatial error model (Eq. [1]). This estimation was done using the function errorsarlm() and several other functions from the package spdep (Bivand et al., 2005). In each simulated Monte Carlo experiment, values of Formula 8e were computed by each of the three methods. A significance test at was then carried out by computing the t statistic using each value of Formula 8e. The R function pt() was used to compute the P values for that experiment. The Monte Carlo simulation was repeated 1000 times and the fraction Formula 8 of P values less than 0.05 computed by each method was determined. If the method functions correctly, then 5% of the simulated experiments should result in data that generate a P value < 0.05. The row standardized form of the spatial weights matrix W of Eq. [1] was used in the simulations.

Simulations of the Comparison of Sample Means
The second step was to test the effectiveness of Eq. [6] as a means of estimating the effective sample size in the comparison of two sample means. Monte Carlo simulations were generated of the equations

Formula 9[9]
where the {nu}ik are independent unit normal random variables. The quantity {delta} is the fixed effect. The null hypothesis is {delta} = 0. The spatial coordinates x and y are given subscripts to emphasize that in a real experiment, unlike the case of the previous simulation, the quantities u and v would generally be measured at different locations in the field. Simulations of Eq. [9] followed the same procedure as the simulations of Eq. [8]. Bootstrap estimates Formula 9r of the variance of the distribution of the point biserial correlation coefficient rw were obtained. These variance estimates were used in Eq. [6] to compute estimates Formula 9e for the test of the null hypothesis of no difference between the means of the two samples (Eq. [7]).

Field Experiment
The method is illustrated by application to an experiment carried out in a rice (Oryza sativa L.) field in 1998 and described by Roel and Plant (2004). The field has an area of 38 ha and is located near Marysville, CA (39°8' N, 121°35' W). The soils consist of approximately 45% Kimball loam (fine, mixed, active, thermic Mollic Palexerafls), 30% San Joaquin loam (fine, mixed, active, thermic, Abruptic Durixeralfs), and 25% Bruella loam (fine-loamy, mixed, Ultic Palexeralfs). A false color aerial image was taken on 18 Aug. 1998 using Kodak 2443 infrared film. This image was then scanned to a TIFF file. Rice was harvested using a stripper harvester combine equipped with a John Deere (Moline, IA) Green Star yield mapping system with real-time differential global positioning system receiver. The harvester followed a back-and-forth north to south harvest pattern.

Field data were imported into the ArcGIS (ESRI, Redlands, CA) v. 8.3 geographic information system for analysis. Some data transformations were carried out using the software package GeoDa (Anselin, 2003). All statistical calculations were again carried using the R package, v. 2.0.0 (R Development Core Team, 2004). R code for all calculations described in this paper is available from the author.


    RESULTS
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 REFERENCES
 
In Monte Carlo simulations of Eq. [8], the correction method of Clifford et al. (1989) generated P values < 0.05 roughly 5% of the time for values of {lambda} < 0.8 (Table 1). The spatial block bootstrap generated an overly large number of P values < 0.05 for all values of {lambda}. The parametric bootstrap method generated P values < 0.05 roughly 5% of the time for all values of {lambda}. This tendency of the parametric bootstrap method to outperform the spatial block bootstrap with data fitting the parametric model matches results of Davison and Hinkley (1997, Table 8.2). The Monte Carlo simulations give an unfair advantage to the parametric bootstrap method because the method is used on data that are generated by a process identical to the parametric assumption. Nevertheless, the results show that when the data satisfy these assumptions, the parametric bootstrap is capable of accurately estimating the effective sample size. The block bootstrap method, however, was dropped from further analysis.


View this table:
[in this window]
[in a new window]

 
Table 1. Monte Carlo simulation results of the computation of the effective sample size for the correlation coefficient. For each of the 1000 simulation runs, the method of Clifford et al. (1989), the spatial block bootstrap, and the parametric bootstrap were used to compute Formula 9r and Formula 9e for n = 12. P is the fraction of the 1000 simulation runs declared significantly different using the sample size N = 144. Formula 9 is the fraction declared significantly different using each estimated Formula 9e.

 
The first set of Monte Carlo simulations of Eq. [9] was run with n set at 12. The fraction of simulated experiments having an uncorrected P value < 0.05 increased as the value of {lambda} increased, consistent with theory, ranging from 0.064 for uncorrelated errors ({lambda} = 0) to 0.534 for {lambda} = 0.8 (Table 2). The parametric bootstrap variance estimate of the point biserial rw gave approximately correct results (Table 2). As with the tests of the correlation between two random variables, the parametric bootstrap is in this test applied to data generated by its own model. For this reason, the Monte Carlo simulation results indicate that the parametric point biserial correlation method works well for data that satisfy the assumptions of the model, but it does not provide any information about how robust the method is.


View this table:
[in this window]
[in a new window]

 
Table 2. Monte Carlo simulation results of the computation of the difference between two random variables generated from the spatial error model (Eq. [9]) using the variance estimate generated by the parametric bootstrap method with n = 12 (i.e., N = 144). The value of P is the fraction of the 1000 simulations declared significantly different in an uncorrected t test. Each value of Formula 9 is the fraction declared significantly different in the corresponding corrected test. The table also lists the mean values of Formula 9, the estimate of {lambda} obtained from the spatial error model.

 
To test the power of the point biserial method Monte Carlo simulations of Eq. [9] were carried out with positive values of {delta}. A series of simulations was run in which values of {lambda}, {delta}, and N were all varied. Specifically, values of {lambda} of 0.0.4 and 0.8; values of {delta} of 0, 0.25, 0.5, 0.75, and 1, and values of n of 6, 12, and 20 were tested. The results obtained for {delta} = 0.25 and 0.75 were intermediate between the others and are not shown here. The results indicate that at a sample size of N = n2 = 36 the method loses both power and precision dramatically as {lambda} increases, while at N = 400 the method retains its precision but loses power as {lambda} increases (Table 3). Note that the simulated values in Table 3 with {delta} = 0, {lambda} = 0, 0.4, and 0.8, and n = 12 come from different simulation runs using the same parameter values as those of Table 2. The similarity of values of Formula 9 and the mean Formula 9e between these simulations is an indication of the stability of the results.


View this table:
[in this window]
[in a new window]

 
Table 3. Values of Formula 9 and mean (Formula 9e) for a range of values of n, {lambda}, and {delta} (note that N = n2).

 
The average value of Formula 9e is approximately equal to N at zero spatial autocorrelation, even for the parametric bootstrap method (Tables 2 and 3), and the fraction of Type I errors is approximately correct. The values do show the expected pattern of decrease as the level of spatial autocorrelation increases. On the basis of the results of the Monte Carlo simulations, we may conclude on a preliminary basis that, at moderate to high levels of spatial autocorrelation, the estimate Formula 9e obtained from Eq. [6] via the parametric bootstrap of the point biserial correlation coefficient provides a reasonable value for a significance test of the null hypothesis of zero difference between the means of two spatially autocorrelated populations when the populations have the form of a first order spatial error model with normally distributed errors.

It is important to emphasize that in an experiment carried out in a single field, the population to which the inference can be applied is limited to only that field. This is, of course, true of a replicated experiment carried out in a single field as well. Nevertheless, the standard practice would be to carry the experiment out in multiple fields and to analyze the results using paired comparisons or, in the case of a factor with more than two levels, a randomized complete block. To simulate the concept of analyzing a single field contained in a landscape of fields, Monte Carlo experiments were run in which the effect term {delta} in Eq. [9] was a normally distributed random variable with variance {sigma}{delta}2 and mean {delta}0 = 0. In these simulated experiments with n = 12 (i.e., a sample size of N = 144), with {sigma}{delta}2 = 0.0625 the null hypothesis was rejected in a fraction Formula 9 = 0.25 of the simulations, while with {sigma}{delta}2 = 0.25 it was rejected a fraction Formula 9 = 0.48 of the simulations.

It is possible, of course, to consider the possibility of applying the method of Eq. [6] to experiments carried out in more than one field, if resources permit. In this circumstance with a two-level factor and n2 measurements in each plot, the standard analysis would be to average these measurements and test the null hypothesis of zero difference in means using a paired t test. Standard theory of simultaneous inference predicts that the family error rate of the method of Eq. [6] applied to K fields would be 1 – (1 – {alpha})K, where {alpha} is the significance value. Simulations were carried out with {delta} = 0, n = 6, and K = 3, in which a paired t test and K simultaneous applications of the method of Eq. [6] were computed. For {lambda} = 0, 0.4, and 0.8, the family error rates of the method of Eq. [6] were 0.10, 0.21, and 0.27, respectively. However, since each trial in the experiment is individually observable, it may be more appropriate to measure the precision of the method using the experimental error rate. For {lambda} = 0, 0.4, and 0.8, the experimental error rates of the method of Eq. [6] were 0.04, 0.07, and 0.10, respectively. The error rates of the paired t tests for the three values of {lambda} were 0.05, 0.05, and 0.03, respectively. It is not surprising that the error rates of the paired t tests matched the theoretical value since these test the means of the pairs of 36 samples and are thus independent of {lambda}. Although in one sense the paired t test gives a superior result, it does so at the cost of all information about spatial structure. It should be noted also that the two methods are not mutually exclusive; indeed, in a sense they are complementary.

Applications of the Method to Field Data
The example of the application of the method to field data is the comparison of yield monitor data harvested on two different days (Fig. 1 ). In the original study (Roel and Plant, 2004), the cooperating grower provided calibration data that was used to adjust the measured yield values between the days. For the present paper, however, we ignore this calibration data and ask whether a significant difference in measured yield values can be demonstrated between the east side of the field (harvested on Day 1) and the west side (harvested on Day 2). This test is of interest because it illustrates the problem associated with the application of the method of distinguishing whether an observed difference in the means of the effect factor is due to the null hypothesis being false or to the presence of a confounding factor. In this experiment, a false color aerial image of the field indicated a trend of higher NDVI values on the west side of the field than on the east side (Fig. 1). Thus, a higher mean measured yield on the west side could be due to a difference in the calibration of the yield monitor on the two different harvest days, or it could be due to the fact that the yield in the western portion of the field really was higher (or both). To attempt to assess the latter possibility, we carried out a significance test on yield, NDVI, and the ratio of yield to NDVI.


Figure 1
View larger version (51K):
[in this window]
[in a new window]

 
Fig. 1. Map of the field experiment. The NDVI neighborhood mean is shown in the background. Superimposed are the two 10-m yield lattices. The eastern lattice is surrounded by a heavy black line, the boundary is also in black, and the western lattice is surrounded by a heavy white line.

 
Yield, NDVI, and the ratio yield/NDVI were averaged across square lattices whose cells had a side length of 10 m. One square lattice was on the east side of the field, harvested on Day 1, and the other was on the west side, harvested on Day 2 (Fig. 1). These three quantities were used as the u(x,y) and v(x,y) values in Eq. [6]. The quantities were fit with a spatial error model of the form of Eq. [1]. Parametric bootstrap variance estimates of the point biserial correlation coefficient were computed and inserted into Eq. [6] to obtain the estimated effective sample size Formula 9e. This was then plugged into Eq. [7] to obtain the corrected t statistic.

Differences in yield, NDVI, and yield/NDVI ratio are all significant (P = 0.000 to the numerical capacity of the computer) in the uncorrected 10-m data (Table 4). When the statistics are recomputed using the corrected Formula 9e, however, the difference in yield/NDVI ratio is not significant (P = 0.47). There is also a scale effect (Wong, 1995) associated with these data. In particular, when the calculations are repeated using a lattice with a 20-m cell size, at the 20-m scale the uncorrected yield/NDVI ratio difference is not significant (P = 0.69) and the NDVI difference is also (barely) not significant (P = 0.06). The Shapiro–Wilks statistics are highly significant for the 10-m data, although the values of the statistics themselves are relatively high. The QQ plots of the yield and NDVI data show varying departure from linearity (Fig. 2 ).


View this table:
[in this window]
[in a new window]

 
Table 4. Comparison of east (first day) and west (second day) yield, NDVI, and yield/NDVI ratio for a 10- and 20-m grid. The values of t and P refer to the uncorrected data, with 400 and 100 points for the 10- and 20-m grids, respectively. The values of W and PW refer to the value and P value of the Shapiro-Wilk statistic, respectively.

 

Figure 2
View larger version (27K):
[in this window]
[in a new window]

 
Fig. 2. Normal QQ plots of yield and NDVI of the eastern and western lattices in the field experiment.

 
For purposes of comparison the P values of a test of the null hypotheses were computed using the mixed model approach (Pinhiero and Bates, 2000, p. 260–266). This approach relies on a visual estimate of the properties of the variogram and thus does not lend itself to Monte Carlo simulation, but insight into the performance of the method can be gained from single simulations. The analysis used the function gls() of the R package nlme (Pinhiero et al., 2005). For each case, linear model was first generated ignoring spatial autocorrelation, and a variogram was then computed of the residuals of this model. The null hypothesis was then tested using the maximum likelihood method via the anova() function of the R package (R Development Core Team, 2004). The results for yield and yield/NDVI are very similar to those obtained from the uncorrected test (i.e., assuming the data are independent, Table 4). For both of these tests, the null hypothesis is rejected (P < 0.0001). Interestingly, the result for NDVI was to fail to reject the null hypothesis (P = 0.21). This hypothesis is rejected (P < 0.0001) by the corrected point biserial method at the 10-m scale and not rejected (P = 0.06) at the 20-m scale.


    DISCUSSION
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 REFERENCES
 
The method developed in this paper may be applied to spatial populations whose members are far enough apart that they are not considered identical in their response but not so far apart that they may be considered independent. The method is to estimate an effective sample size Formula 9e such that the difference in means follows a t distribution with 2Formula 9e – 2 degrees of freedom. The effective sample size provides an indication of the density of information present in the data and may be used to aid in the development of preliminary sampling plans for operations such as soil sampling.

In the field example considered in this paper, adjustment using the effective sample size Formula 9e had a strong effect on the P value of significance tests associated with the data. Yield on the east and west sides of the field were declared significantly different both with the adjusted and the unadjusted sample size. The ratio of yield to NDVI was not declared significantly different in either case, however. In the actual event, there was a calibration difference between the yield monitors used on the two sides of the field. One may interpret this as indicating that NDVI in this field was not directly proportional to yield. This is not uncommon (e.g., Plant et al., 2000). It does highlight the problem associated with interpreting the results of the application of this method. This is that it may be difficult or impossible to be sure that the observed difference in means is actually associated with the response to the treatment and not to some other factor in the experimental substrate. In this context, it is interesting to note that statistics texts (e.g. Kempthorne, 1952) generally indicate that replication is used to estimate error variance and randomization is used to eliminate bias. However, the existence of replications also helps to ensure the absence of a pattern in the experimental plot layout when the treatments are assigned randomly.

The method described here clearly cannot replace replication of treatments, both within-field and across fields, where such replication is possible. Strictly speaking, the population of inference in an experiment in a single field is limited to that field. On the other hand, this is true whether the experiment in that field is replicated or not. In the field experiment discussed here, the day-to-day variation in yield monitor calibration is of interest beyond that single field, and a significant difference in measured yields could be an indication that further testing of the yield monitor is warranted.

The P values computed by this method can be considered only as approximations intended to provide a general indication of the probability of observing the given difference in sample means given the truth of the null hypothesis. That is, it would be incorrect to follow the usual procedure in which P = 0.045 is interpreted very differently from P = 0.055 (where the former is given a star and the latter is not). Nevertheless, the method can serve practical purposes. In the case of preliminary experiments involving costly or difficult field trials, it can provide an indication of whether observed treatment effects are sufficiently strong to warrant further investigation with replicated trials. Moreover, by providing the investigator with an estimate of the effective sample size, the method gives an indication of the extent of spatial autocorrelation of the data in terms of its effect on sampling requirements of the data.


    REFERENCES
 TOP
 ABSTRACT
 INTRODUCTION
 THEORY
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 REFERENCES
 





This Article
Right arrow Abstract Freely available
Right arrow Figures Only
Right arrow Full Text (PDF) Free
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Plant, R. E.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Plant, R. E.
Agricola
Right arrow Articles by Plant, R. E.
Related Collections
Right arrow Spatial Distribution
Right arrow Statistics


HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
The SCI Journals Crop Science Vadose Zone Journal
Journal of Natural Resources
and Life Sciences Education
Soil Science Society of America Journal
Journal of Plant Registrations Journal of
Environmental Quality
The Plant Genome