Published in Agron. J. 96:285-297 (2004).
© American Society of Agronomy
677 S. Segoe Rd., Madison, WI 53711 USA
PRECISION AGRICULTURE
Geostatistical Integration of Yield Monitor Data and Remote Sensing Improves Yield Maps
A. Dobermann* and
J. L. Ping
Dep. of Agron. and Hortic., Univ. of Nebraska, P.O. Box 830915, Lincoln, NE 68583-0915, USA
* Corresponding author (adobermann2{at}unl.edu).
Received for publication April 17, 2003.
 |
ABSTRACT
|
|---|
Grain yield maps must accurately display general yield patterns as well as details of local yield variation. Different geostatistical procedures for creating interpolated yield maps by integrating yield data with remotely sensed vegetation indices (VI) were evaluated. Yield monitor data and a multispectral satellite image at 4-m spatial resolution were collected in an irrigated maize (Zea mays L.) field and a rainfed soybean [Glycine max. (L.) Merr.] field. Ordinary kriging (OK), cokriging (CK), simple kriging with varying local means (SKLM), and kriging with external drift (KED) were compared using grain yield as primary variable and three different VI as secondary variables. At both sites, SKLM performed best in terms of the precision of grain yield maps and maps that depicted true yield patterns. Utilizing the most suitable vegetation index at each site and SKLM as interpolation method, the root mean squared error of yield predictions was increased by nearly 20% over OK. Methods such as KED or CK resulted in only small improvement of yield maps over those obtained with OK. Utilizing the satellite image decreased errors associated with yield monitor operations and allowed better prediction in areas where no reliable yield measurements were available. Due to the robust nature of modeling the relationship between primary and one or more secondary variables, the SKLM method has considerable potential for use in commercial precision-farming software. Future efforts on improving yield mapping should concentrate on obtaining improved yield monitor data and imagery and developing yield-sensitive VI for high-yielding crops.
Abbreviations: CK, cokriging CV, coefficient of variation GARI, green atmospherically resistant vegetation index GNDVI, green normalized difference vegetation index KED, kriging with external drift LAI, leaf area index MAE, mean absolute error, ME, mean error NDVI, normalized difference vegetation NIR, near infrared OK, ordinary kriging RI, relative improvement over ordinary kriging RMSE, root mean squared error SKLM, simple kriging with varying local means VI, vegetation indices
 |
INTRODUCTION
|
|---|
COMBINE HARVESTERS equipped with yield monitors are widely used to map within-field variation of crop yield (Stafford et al., 1996). Yield monitor data reflect systematic and random sources of yield variation, including (i) climate and soillandscape features, (ii) localized management-induced yield variation, and (iii) measurement errors associated with the yield-mapping process itself. Yield maps may not reveal the true spatial yield variation at a finer scale because grain flow through a combine is a diffusive process that represents an imprecisely defined area comprised of swath width and distance traveled within the data-recording interval. Grain mixing and smoothing effects are inevitable (Searcy et al., 1989; Arslan and Colvin, 2002b), and grain flow deconvolution for accurate georeferencing of yield records remains a challenge (Whelan and McBratney, 2002). Some measurement errors can be reduced by careful yield monitor calibration and constant travel speed (Arslan and Colvin, 2002a) and by applying postharvest yield data correction and cleaning algorithms (Blackmore and Moore, 1999). Yield data cleaning, however, causes blank areas for which no yield records exist. Depending on the harvesting operations, filtering algorithms and thresholds used, between 5 and 20% of erroneous yield monitor raw data are often deleted, particularly near field edges (headlands), other stop-and-go segments, or overlapping passes (Blackmore and Moore, 1999; Thylen et al., 2000).
In this paper, we explore whether yield maps can be improved by utilizing correlated, spatially dense secondary information. Recent developments in remote sensing technology have led to significant improvements of the spatial and spectral resolution of commercially available aircraft- or satellite-based remote sensing imagery. Satellites such as Quickbird [DigitalGlobe, Longmont, CO; http://www.digitalglobe.com (verified 31 Oct. 2003)] or IKONOS [Space Imaging, Thornton, CO; http://www.spaceimaging.com/products/ikonos/index.htm (verified 31 Oct. 2003)] with fine spatial resolutions of 2.4- or 4-m multispectral and 0.6- or 1-m panchromatic or airborne imagery of similar type allow mapping patterns of soil and vegetation with greater detail than that obtained by yield monitors. Assuming that increasingly more imagery will become available in the future, algorithms must be evaluated for utilizing such imagery in the yield-mapping process.
Numerous studies have demonstrated associations among crop canopy reflectance, crop biomass, leaf area index (LAI), and crop yield at the scale of agricultural production fields (Major et al., 1986; Bouman et al., 1992; Basso et al., 2001; Shanahan et al., 2001). Research has mainly focused on regression techniques that predict grain yields from different spectral bands and indices (Senay et al., 1998; Zhang et al., 1999; Yang and Anderson, 2000). Less emphasis has been given to geostatistical techniques that predict yield by utilizing yield monitor data in combination with remotely sensed imagery as secondary information to account for spatial auto- and cross-correlation. In one of the few studies conducted, King et al. (2000) found that CK was superior to OK for forest volume estimation when the normalized difference vegetation index (NDVI) was used as secondary information. In soil mapping, numerous studies have demonstrated increased precision of maps through geostatistical techniques that utilize exhaustive secondary information (Goovaerts, 2000; Bourennane et al., 2000; Triantafilis et al., 2001; Bekele et al., 2003).
The objectives of this study were to (i) examine whether spatially dense multispectral satellite imagery can improve the accuracy of yield maps and (ii) compare different geostatistical procedures for integrating yield monitor and remote sensing data to create fine-scale yield maps.
 |
MATERIALS AND METHODS
|
|---|
Data Collection
This study was conducted in two fields located at Mead, NE. Field A (41°9'55'' N, 98°26'50'' W) represents an irrigated continuous maize system. Maize (cv. Pioneer 33P67) was planted at 0.76-m row spacing on 9 May 2002 and had final stands of about 72000 plants ha1. Insects and weeds were controlled through several pesticide applications, and the field was center-pivot irrigated during the whole growing season. A total of 224 kg N ha1 was applied in three splits, including 134 kg N ha1 preplant and two fertigations of 45 kg N ha1, each applied through the irrigation system on 17 June and 2 July 2002. Yield data for the eastern edge of Site A were cropped from the full circular pivot area (Fig. 1) to match the available satellite image. The total field area used for yield mapping and remote sensing analysis was 43.6 ha.

View larger version (159K):
[in this window]
[in a new window]
|
Fig. 1. Maps of grain yield and selected vegetation indices of (Field A, left) maize and (Field B, right) soybean. Data shown are cleaned yield monitor data (top) and vegetation indices [green atmospherically resistant vegetation index (GARI) for Field A and green normalized difference vegetation index (GNDVI) for Field B] obtained from an IKONOS satellite image taken on 1 Sept. 2002. Symbols show the locations of the hand-harvested validation points in each field.
|
|
Field B (latitude 41°10'38'' N, longitude 96°26'35'' W) was located about 2 km northeast of Field A and represents a rainfed maizesoybean rotation. Soybean was planted at 0.76-m row spacing on 20 May 2002 (cv. Asgrow 2703) at a seeding density of 370000 seeds ha1. Insects and weeds were controlled through several postemergence applications, and no fertilizer was applied to soybean. The total field area used for yield mapping and remote sensing analysis was 65.4 ha.
A multispectral satellite image (IKONOS, Space Imaging, Thornton, CO) was obtained on 1 Sept. 2002 with four bands [blue: 445516 nm; green: 506595 nm; red: 632698 nm: and near infrared (NIR): 757853 nm] sensed at 4-m spatial resolution and 11-bit radiometric resolution. Image acquisition was timed to be near physiological maturity growth stages of both soybean and maize. The image was rectified based on a digital orthophotograph and resampled to 4- by 4-m grids for each field using ERDAS Imagine 8.5 (Leica Geosystems, Atlanta, GA). Three VI were calculated from the different reflectance bands and used as secondary information in the subsequent geostatistical analysis:
 | [1] |
 | [2] |
 | [3] |
where NDVI is calculated using red and NIR bands (Tucker, 1979), GNDVI is the green normalized difference vegetation index (Gitelson et al., 1996), and GARI is the green atmospherically resistant vegetation index (Gitelson et al., 2002).
Grain yield harvest for the whole field area was done using a John Deere JD9550 combine equipped with a calibrated Ag Leader PF3000 yield monitor with elevator mount moisture sensor (Ag Leader Technol., Ames, IA) and a differential global positioning system receiver. Soybean and maize were harvested on 11 Oct. and 5 Nov. 2002, respectively. The combine harvested a swath width of 6.1 m (eight rows), and yields were recorded at 2-s logging intervals. Yield-monitordetermined average yields were within 0.5 to 1.5% of weight-wagondetermined average yields. Raw yield monitor data were corrected assuming a grain flow delay of 12 s and screened to eliminate various types of erroneous values associated with the harvest process. Six types of erroneous values were eliminated from the raw data files: (i) header status up; (ii) short segments and start/end pass delays for both headlands and stop-and-go segments within the field (8 s); (iii) co-located yield records caused by GPS drift or overlapping passes; (iv) frequency distribution outliers of distance traveled, grain flow, and grain moisture (outside ±3 x standard deviation); (v) minimum and maximum biological yield limits for this site (maize: <0.1 Mg ha1, >22 Mg ha1; soybean: <0.1 Mg ha1, >7 Mg ha1); and (vi) small patches with extremely low or high yields that were not related to neighboring yield monitor records. In Step 6, a local neighborhood test was performed for each location following the movement of the combine through the field. Using inverse-distance interpolation, yield was estimated for a location from all values within a moving window that included the three preceding and three succeeding yield records in the same swath as well as yield records within two times the swath width in perpendicular direction of the combine movement. The 99% confidence interval of the estimate was obtained, and the data point was discarded if the measured yield for the same location was outside this interval. The cleaning algorithm removed 10 to 15% of the original yield monitor data, mainly in the headland areas. A total of 20369 (Field A) and 22513 (Field B) yield points remained and were used as primary variable for yield estimations using different spatial interpolation techniques.
At 23 (Field A) and 24 (Field B) sampling points (Fig. 1), grain yields were determined by hand harvest and used as independent validation set for evaluating the precision of different yield-mapping procedures. Hand harvest was conducted on 12 Oct. 2002 in Field B and 15 Oct. 2002 in Field A. At each location, all maize ears were hand-picked, or soybean yield was measured with a small-plot combine from a 9.3-m2 harvest area (two rows x 6.1 m). Grain yields were expressed in megagrams per hectare, with a moisture content of 0.155 g H2O g1 for maize or 0.13 g H2O g1 for soybean. To avoid bias in the yield monitor data due to grain removal by hand harvest at the validation locations, yield monitor records that fell within 4 m from the hand-harvested locations were removed from all further geostatistical analysis.
Modeling the Spatial Dependence
Experimental semivariograms of measured regionalized variables Z (grain yield, NDVI, GNDVI, or GARI) were computed as (Goovaerts, 1997):
 | [4] |
where
(h) is the semivariance for a lag distance interval h, u denotes the spatial coordinates of measured locations
, z(u
) and z(u
+ h) denote the
th pair of z observations separated at a distance h, and N(h) is the number of paired comparisons for a certain distance range h.
Nested semivariogram models were fitted to the experimental semivariances (Minasny et al., 2002) to describe crop yield or reflectance as a function of spatial variation occurring at three spatial scales: (i) extreme short-distance variation and measurement error (nugget effect), (ii) short-range variation, and (iii) long-range variation. The best fits were obtained by modeling all experimental semivariograms at Field A as the sums of a nugget effect and two exponential structures:
 | [5] |
where c0 is the nugget variance, c1 and c2 are the sills, and r1 and r2 are the range parameters of the short- and long-range spatial structures, respectively. Because in exponential models the sills are approached asymptotically, the effective ranges of spatial dependence (a) are approximated by ai
3ri where ri is the ith range parameter (Webster, 1985).
Experimental semivariograms at Field B were modeled as the sums of a nugget effect and two spherical structures:
 | [6] |
where c0 is the nugget variance, c1 and c2 are the sills, and a1 and a2 are the effective ranges of the short- and long-range spatial structures, respectively.
Geostatistical Yield Prediction Methods
Kriging is a generic name for stochastic interpolation techniques in which a continuous attribute z (e.g., grain yield) is estimated at any unsampled location u (denotes the spatial coordinates) using the z data measured at sampled locations
in the study area A or using both z data and correlated secondary attributes, based on the general linear regression estimator Z*(u) defined as (Goovaerts, 1997):
 | [7] |
where n(u) is the number of measured data values z(u
) used for estimating the attribute z at location u,
a is the weight assigned to each datum z(u
) interpreted as a realization of the regionalized variable Z(u
), and m(u) and m(u
) are the expected values (means) of Z(u) and Z(u
). Numerous kriging methods have been developed. They mainly differ in how the weights
a are determined for estimating values over the unsampled location and how secondary attributes are utilized in this process (Isaaks and Srivastava, 1989; Goovaerts, 1997).
Ordinary kriging assumes that the mean m(u) may fluctuate among local neighborhoods within a study area A. The assumption of stationarity of the mean is limited to the local neighborhood, but the mean is assumed to be unknown. The value of z for any location u is therefore estimated by:
 | [8] |
where Z*OK
is the OK estimator of Z(u). The kriging weights
OK
are forced to sum to 1 so that the unknown local mean is filtered from the linear estimator. Due to its simplicity, OK is widely used as a general interpolation method, but results depend on the measurement of z itself and the validity of the assumptions about stationarity of the mean and the variance.
Measurements of grain yield that are based on yield monitors contain numerous uncertainties. Many of these uncertainties are likely to be different from uncertainties associated with correlated secondary information such as remotely sensed crop reflectance because of the different measurement principles used. Therefore, if crop reflectance is correlated with grain yield, its utilization as secondary information is likely to improve the geostatistical prediction of Z*(u). Three methods for utilizing VI as secondary information for the geostatistical prediction of grain yield were evaluated. Simple kriging with varying local means assumes that the unknown local mean (trend) smoothly varies within each local neighborhood (and hence over the entire study area) and can be estimated from the secondary information (Goovaerts, 1997). The global grain yield mean in Eq. [7] is replaced by estimates of the locally varying mean grain yields m*SK
derived from one or more continuous secondary attributes y such as a crop reflectance index:
 | [9] |
where f is a global linear or nonlinear regression model describing the relationship between primary (grain yield) and secondary (reflectance) variables over the whole study area A. At each primary datum location u
, the residual grain yield value r(u
) was computed by subtracting the trend estimate m*SK(u
) from the measured grain yield z(u
), i.e., r(u
) = z(u
) m*SK
. The semivariogram of the residuals was modeled and used for estimating the kriging weights. The SKLM estimate of grain yield at each location u is then the sum of the regression estimate m*SK
and the simple kriging estimate of the residual value at location u:
 | [10] |
Simple kriging with local means requires that (i) there is a proven physical or biological rationale for a relationship between z and y that can be used to model the trend in local means (Eq. [9]) and (ii) the value of the secondary variable(s) must be known at all locations u being estimated.
Kriging with external drift is similar to SKLM, but the two methods differ in the definition of the trend component. Unlike SKLM, trend regression coefficients (Eq. [9]) in KED are estimated implicitly through the kriging system within each kriging neighborhood, not through a global calibration (regression) process before kriging of z. The local means m(u) are modeled as a linear function of the secondary variable y:
 | [11] |
and the KED estimate is obtained as:
 | [12] |
where the kriging weights are the solution of a system of n(u) + 2 linear equations that takes into account the trend components (Goovaerts, 1997; Deutsch and Journel, 1998). Kriging with external drift generally yields more local detail than SKLM because of the local re-evaluation of the linear regression of z on y compared with a global regression model used in SKLM. However, KED requires that (i) there is a proven physical or biological rationale for a relationship between z and y, (ii) the local mean of the primary variable z at location u is linearly related to the secondary datum y(u), (iii) the value of the secondary variable must be known at all primary locations u
as well as all locations u being estimated, and (iv) the secondary variable should vary smoothly in space to avoid instability of the kriging system (Goovaerts, 1997).
Cokriging incorporates the secondary information through the spatial cross-correlation between a primary and one or more secondary variables, which requires computation of auto- and cross-variograms and modeling of the coregionalization of the variables involved (Goovaerts and Webster, 1994; Dobermann et al., 1997; Goovaerts, 1997). The CK variant used in our study was collocated simple CK (Ma and Journel, 1999), in which the trend components of the primary variable z (grain yield) and a single secondary attribute y (crop reflectance) were assumed to be stationary means mz and my, resulting in the linear CK estimator of z at unsampled locations u defined as:
 | [13] |
Unlike SKLM or KED, CK does not require that the secondary information is available at all locations u to be estimated. The influence of the secondary variable y on estimating z depends on (i) the correlation coefficient between z and y, (ii) the spatial continuity of the attributes, and (iii) the sampling density and spatial configuration of z and y (Goovaerts, 1997).
Ordinary kriging, SKLM, KED, and CK were performed using cleaned yield monitor data as primary variable z and either NDVI, GNDVI, or GARI as single secondary variables y. Software used included GSLIB (Deutsch and Journel, 1998) for OK, SKLM, and KED and the CK program developed by Ma and Journel (1999). To account for nonconstant variance and nonlinearity of the global relationship between combine yield and VI, a generalized linear model (Myers et al., 2002) in PROC GENMOD in SAS (SAS Inst., 1999) was used in the SKLM method for estimating the regression coefficients in Eq. [9]. For example, the resulting model used to estimate maize yield in Field A as a function of GARI was yield = (1.0236 + 4.8565GARI)2, with p values < 0.0001 for both coefficients. The model used to estimate soybean yield in Field B as a function of GNDVI was yield = e(0.0397 + 3.5243GNDVI), with p values of 0.002 and <0.0001, respectively.
Prediction Comparison
Statistical comparison of the different interpolation methods was conducted by cross-validation with the independent, hand-harvested validation sample (Voltz and Webster, 1990). For each method, the mean error (ME), mean absolute error (MAE), and root mean squared error (RMSE) of the prediction were estimated. The ME is defined as:
 | [14] |
where n is the number of validation points and z
and z*
are estimated and observed values at location
. The ME is a measure of bias, and expected ME for an unbiased estimation is zero. However, ME tends to understate the error because positive and negative errors tend to cancel each other out. Therefore, the MAE may result in a better evaluation of the bias associated with different prediction methods and is defined as:
 | [15] |
The RMSE is a measure of the prediction precision and is defined as:
 | [16] |
The RMSE tends to place more emphasis on larger errors and, therefore, gives a more conservative measure than the MAE. The relative improvement of SKLM, KED, or CK over the standard estimation method, OK (RI, %), was calculated as:
 | [17] |
where RMSEp and RMSEOK are root mean square errors for a given estimation method and OK, respectively.
 |
RESULTS
|
|---|
Spatial Variation of Maize Yield and Vegetation Indices
Crop growth variation in the center-pivot irrigated maize field (Field A) was small and mainly occurred over short distances. Irrigated maize yields harvested by combine in Field A averaged 13.0 Mg ha1, with a range of 0.7 to 19.7 Mg ha1 and a coefficient of variation (CV) of 10.6% (Table 1). Remotely sensed VI had CVs ranging from 3.8 to 5.0% for the whole field and tended to be negatively skewed (Table 2). Compared with the raw yield data map, maps of VI showed more spatial detail (Fig. 1). Pivot tracks can be seen in Field A as well as the two paths that were cut to access six sampling locations for conducting other soil and plant measurements. None of these features appeared on the yield map (Fig. 1).
View this table:
[in this window]
[in a new window]
|
Table 1. Summary statistics of grain yield of maize (Field A) and soybean (Field B) measured for the whole field area by combine harvest (yield monitor data) or grain yield measured at hand-harvested sampling locations (validation data). Statistics for combine yields refer to yield monitor records remaining after screening for erroneous values.
|
|
View this table:
[in this window]
[in a new window]
|
Table 2. Summary statistics of three vegetation indices derived from an IKONOS satellite image obtained on 1 Sept. 2002.
|
|
Semivariograms of maize yield and VI included no or only a small nugget effect and two exponential structures with effective ranges of 13.6 to 24.5 m (short range) and 166 to 227 m (long range), respectively (Fig. 2). Grain yield variation occurring over distances less than 24.5 m (nugget effect + short-range structure) accounted for 45% of the overall yield variance. Vegetation indices had a shorter effective range of the short-range structure (13.615.2 m) than that fitted to the grain yield data (24.5 m), and 57 to 68% of the overall variance in crop reflectance was due to the sum of nugget effect and short-range variation. Remote sensing was likely to reveal short-distance crop variation better than the yield monitor because combine harvest involved a swath width wider than that of the image pixel size and smoothing of the yield data is caused due to grain flow delays. In contrast, the fine-resolution image showed abrupt changes in crop growth (Fig. 1), resulting in shorter ranges of the first spatial structure.

View larger version (48K):
[in this window]
[in a new window]
|
Fig. 2. Experimental semivariograms and fitted variogram models of combine-harvested grain yield and selected vegetation indices of (Field A) maize and (Field B) soybean. All semivariograms were modeled as a nested model comprised of a nugget effect and two exponential (Field A) or two spherical (Field B) structures for which sills and effective ranges are given as inserts in each graph. NDVI, normalized difference vegetation index; GNDVI, green normalized difference vegetation index; GARI, green atmospherically resistant vegetation index.
|
|
The relative contribution of the long-range semivariogram structure (range 166227 m, Fig. 2) to the overall variance was 32 to 43% for the three VI and 55% for grain yield. Crop variation at this spatial scale is likely to mainly reflect gradual changes in topography, soil type, and soil properties related to those. Due to the filtering behavior of the yield monitor (see the semivariogram in Fig. 2), cleaned yield monitor data reflected this scale of spatial variation as well as remotely sensed VI.
Spatial Variation of Soybean Yield and Vegetation Indices
Soybean yields in Field B averaged 3.3 Mg ha1, with a range of 0.4 to 5.6 Mg ha1 and a CV of 14.9% (Table 1). Vegetation indices had CVs ranging from 5.2 to 8.0% for the whole field (Table 2). Maps of VI showed patterns similar to those seen in the yield map but also considerably more spatial detail. For example, as in Field A, two paths were cut for sampling activities and can be seen in the southern and northern parts of Field B (Fig. 1). Former roads are also visible, which divided the field into smaller parts and were removed in late 2000. In the north-central section of Field B, a trench can be seen, which was dug in early 2001 to install an electric cable.
Semivariograms of VI closely resembled the grain yield semivariogram in Field B (Fig. 2) and included a small nugget effect and two spherical structures with ranges of 35 to 47 m and 156 to 177 m, respectively (Fig. 2). Grain yield variation occurring over very short distances (nugget effect + short-range structure) accounted for 56 to 63% of the variance in VI or grain yield. As in Field A, the range of the first spherical structure of the three VI (3537 m) was les than that of the combine grain yield (47 m). However, the range of the first spatial structure describing short-distance soybean growth variation was larger in Field B than that of irrigated maize in Field A (Fig. 2). Differences in the range are likely to represent differences among sites as well as the crops. Under rainfed conditions, growth is closely related to the inherent spatial variation in soil moisture, which itself is closely related to soil type and topography. Under irrigated corn, much of this inherent variation is masked by irrigation so that other factors causing growth variation over shorter distances may become relatively more important. Both the range and the relative contribution (3750% in Field B) of long-distance growth variation were comparable at both sites.
Correlations between Grain Yield and Vegetation Indices
Vegetation indices were correlated more to soybean yield than to maize yield (Table 3). In Field A, linear correlations between maize yield and crop reflectance increased in the order GNDVI < NDVI < GARI for the whole field (r = 0.410.46) as well as the hand-harvested samples (r = 0.180.37). In Field B, correlations between soybean yield and crop reflectance differed little among different indices used and ranged from r = 0.55 to 0.59 for the whole field or r = 0.66 to 0.70 for hand-harvested samples.
View this table:
[in this window]
[in a new window]
|
Table 3. Linear correlation coefficients between grain yield of maize (Field A) or soybean (Field B) and vegetation indices for locations with yield monitor data and the hand-harvested sampling locations.
|
|
Reasons for the smaller correlation between grain yield and VI in Field A include (i) insufficient sensitivity of VI to variation in crop biomass and yield in dense canopies and (ii) variation in grain yield that was not associated with sensed variation in reflectance. At the irrigated maize site, hand-harvested samples mainly represented a narrow range of high-yielding locations (12.214.9 Mg ha1, Table 1) at which the VI tested are known to be less sensitive to true variation in biomass (Gitelson et al., 2003). Compared with soybean, a significant increase in reflectance of the maize canopy was also caused by tassels, which was most pronounced in the red reflectance range and decreased the sensitivity of the VI to variation in LAI and biomass (Gitelson et al., 2003). Moreover, because the image was acquired more than 1 mo before hand harvest or 1 to 2 mo before final combine harvest, some yield losses occurring after image acquisition may have decreased the measured correlation between crop yield and the VI. In Field A, for example, late-season lodging of maize due to stalk rot was observed in small patches.
Correlations among the different VI were generally large (r = 0.880.96). Other recently proposed VI such as the Visible Atmospherically Resistant Index (Gitelson et al., 2002) or the index [(
NIR/
green) 1] (Gitelson et al., 2003) did not have larger correlations with grain yield than the indices shown in Table 3 (data not shown).
Prediction of Grain Yield as Affected by Estimation Methods and Vegetation Indices
Utilizing VI as secondary variables significantly increased the precision of grain yield maps compared with interpolation of yield monitor data alone. Yield maps produced by geostatistical integration of yield monitor data and remote sensing had less bias (smaller ME and MAE), greater precision (smaller RMSE, Fig. 3), and more detail at fine spatial resolution (Fig. 4) than maps produced by OK of grain yield. Estimation of maize yield (Field A) or soybean yield (Field B) was affected more by the choice of a spatial estimation method than by the different VI, as indicated by clusters seen in the plots of RMSE vs. MAE or vs. ME (Fig. 3).

View larger version (23K):
[in this window]
[in a new window]
|
Fig. 3. Mean error (ME), mean absolute error (MAE), and root mean squared error (RMSE) referring to the validation data sets with different estimation methods and imagery information as secondary variables at two fields. Symbols indicate the different yield prediction methods used (OK, ordinary kriging; D, kriging with external drift; C, cokriging; S, simple kriging with varying local means). Numbers indicate the different vegetation measures used as secondary information for the yield prediction [1 = normalized difference vegetation index (NDVI), 2 = green normalized difference vegetation index (GNDVI), 3 = green atmospherically resistant vegetation index (GARI)].
|
|

View larger version (101K):
[in this window]
[in a new window]
|
Fig. 4. Maps of (left) maize yield (Mg ha1) in Field A and (right) soybean yield (Mg ha1) in Field B obtained by ordinary kriging of yield monitor data (OK) or simple kriging with local means (SKLM). In SKLM, remotely sensed green atmospherically resistant vegetation index (GARI, Field A) or green normalized difference vegetation index (GNDVI, Field B) were used as a secondary variables.
|
|
In Field A, ME, MAE, and RMSE decreased in the order OK > CK
KED > SKLM (Fig. 3). Techniques such as CK or KED provided only small improvement over OK, whereas a relatively large increase in the precision was obtained by using SKLM. Compared with OK, the RI ranged from 11.6 to 17.2% for the three SKLM methods tested, with GARI performing best as the secondary variable used (Table 4). Ordinary kriging tended to have a greater smoothing effect than methods in which secondary information was used (Table 4). The best performing method, SKLM in combination with GARI, reproduced details such as circular pivot tracks (Fig. 4), which represented narrow strips with true low or zero yield, but which were smoothened out with OK. Both the overall yield range and the CV of grain yield interpolated with SKLM-GARI were more similar to the original data than those of OK (Tables 1 and 4).
View this table:
[in this window]
[in a new window]
|
Table 4. Summary statistics of grain yield of maize (Field A) and soybean (Field B) predicted for the entire field area by ordinary kriging (OK) of yield monitor data or by simple kriging with local means (SKLM) of yield monitor data in combination with either normalized difference vegetation index (NDVI), green normalized difference vegetation index (GNDVI), or green atmospherically resistant vegetation index (GARI) as secondary variables used for estimating the locally varying means.
|
|
In Field B, ME, MAE, and RMSE were similar for the OK, CK, and KED methods but declined significantly with SKLM (Fig. 3). Compared with OK, the RI ranged from 14.6 to 18.6% with the three SKLM methods tested, with GNDVI performing best as the secondary variable used (Table 4). Utilizing GNDVI in SKLM decreased the RMSE to 0.267 Mg ha1. Fine-resolution spatial details depicted by the satellite image (Fig. 1) were best reproduced by SKLM in combination with GNDVI (Fig. 4). Both the overall yield range and the CV of soybean yield interpolated with SKLM-GARI were more similar to the original data than those of OK (Tables 1 and 4).
Maize yields at many hand-harvested locations were underestimated by all methods, but deviations from the true yield were smallest with SKLM (Fig. 3 and Fig. 5). This cross-validation bias was probably caused by the selection of the validation sites. The hand-harvested sample mainly represented areas with high maize yield, and its ranges and CVs of yield and VI were smaller than those in the whole field (Table 2). Under conditions of such dense crop canopies, the VI used in our study are known to be insufficiently sensitive to true crop growth variation (Gitelson et al., 2002, 2003). Figure 6 further illustrates this by plotting predicted vs. observed yields for the combine-harvested locations. Maize yield predictions by SKLM in combination with GARI tended to be positively biased at low yield levels of about <6 Mg ha1, whereas underestimation became more common at yield levels of >12 Mg ha1.

View larger version (29K):
[in this window]
[in a new window]
|
Fig. 5. Observed and predicted grain yields of (Field A) maize and (Field B) soybean at hand-harvested validation points and the distribution of the differences between predicted and observed values (residuals). Yield predictions were based on ordinary kriging (OK) of yield monitor data or simple kriging with local means (SKLM) of yield monitor data in combination with green atmospherically resistant vegetation index (GARI, Field A) or in combination with green normalized difference vegetation index (GNDVI, Field B).
|
|

View larger version (34K):
[in this window]
[in a new window]
|
Fig. 6. Observed and predict grain yield of (Field A) maize and (Field B) soybean at combine-harvested locations and the distribution of differences between observed and predicted yields in relation to measured combine yields. Predicted yields were based on simple kriging with local means using green atmospherically resistant vegetation index (GARI, Field A) or green normalized difference vegetation index (GNDVI, Field B) as secondary variables.
|
|
The validation sample for soybean in Field B represented the yield range and corresponding VI better than the validation set for maize in Field A. Hand-harvested samples had a similar mean soybean yield (3.4 Mg ha1) and CV (13.4%) compared with yields measured by combine for the entire field (Table 1). Mean VI were also similar, but the measured ranges and CVs were somewhat smaller for the hand-harvest samples than in the whole field (Table 2). Estimation bias for rainfed soybean grown in Field B was generally less than that of irrigated maize in Field A (Fig. 5 and 6).
For comparison, grain yield was also directly estimated from VI using the regression models derived by PROC GENMOD, i.e., without geostatistical analysis that accounts for spatial correlation. In Field A, the best result was obtained for estimating grain yield through linear regression of GARI, leading to a RI of 10.3% but greater MAE (0.89) and RMSE (0.97) than those obtained with SKLM in combination with GARI (0.80 and 0.89, respectively). In Field B, yield estimation by regression varied little among the three VI used but was generally worse than any of the geostatistical methods tested. Precision of the yield estimates decreased by 6.9 to 8.7% compared with OK or by 31 to 33% compared with the best prediction method, SKLM in combination with GNDVI.
 |
DISCUSSION
|
|---|
Geostatistical Procedures for Yield Mapping
Use of SKLM with the most suitable vegetation index resulted in a nearly 20% relative increase in the precision of grain yield maps over OK (Fig. 3), sharpened yield maps (Fig. 4), and frequency statistics of the interpolated yield data (Table 4) that were more similar to the original data (Table 1) than those obtained with OK. Due to the different sensitivities of yield monitors and remotely sensed images to spatial scales of grain yield variation and measurement errors, integrating both is likely to improve the accuracy of yield maps compared with spatial predictions that are based on grain yield data alone or empirical methods in which grain yield is predicted from remotely sensed images through ordinary regression. Remotely sensed VI represent short-distance crop variation more precisely (Fig. 1) than yield monitor data because the latter are affected by smoothing due to delayed, diffusive grain flow and georeferencing and other errors associated with the combine harvest (Whelan and McBratney, 2002; Arslan and Colvin, 2002b). Techniques such as SKLM of yield in combination with a vegetation index measured at fine spatial resolution at least partly alleviate errors associated with combine harvesting, including those that are due to inaccurate geopositioning of yield records.
Of the geostatistical methods compared, SKLM performed consistently better than KED, CK, or OK (Fig. 3) and was most robust in modeling the relationship between primary and secondary variables. Although SKLM, KED, and CK all incorporate secondary information for primary variable estimation, they have different principles and are affected differently by the characteristics of secondary variables (Goovaerts, 1997). The CK variant used in this study, Markov-type CK, assumes symmetry of the cross-covariance model and proportionality of the primary variable semivariogram to the cross-semivariogram (Goovaerts, 1997; Ma and Journel, 1999). However, this assumption may not hold true for yield mapping because spatial dependences of grain yield and crop reflectance are affected differently by management and measurement error. Cokriging involves modeling of auto- and cross-variograms and solving large CK systems, which can lead to numerical instabilities. Considering this, CK may have limited potential for yield mapping, particularly if the secondary information is available for all locations to be estimated.
Both SKLM and KED perform simple kriging on the corresponding residuals, but they differ by the definition of trend components. In KED, the regression between grain yield and VI is implicitly estimated through the kriging system and assumed to be linear. Consequently, KED is prone to error if the local regression cannot be reliably established or is nonlinear. Examples for this in agricultural fields are narrow areas with extreme values of either measured grain yield or VI, including zones for which yield records cannot be reliably obtained (e.g., start/end pass delays near field edges). In SKLM, trend coefficients describing the relationship between grain yield and crop reflectance are derived globally for the whole field, which allows greater flexibility in modeling the relationship between yield and a vegetation index. Due to the use of a global regression model, SKLM tends to be less affected by artifacts or extremes in the data and appears to be the most suitable method for yield mapping based on integration of yield monitor data with VI. Such techniques should be implemented in commercial farming software.
Opportunities for Further Improvements
Our study focused on evaluating different geostatistical procedures to improve the accuracy and precision of grain yield maps. Further improvement of SKLM approaches for yield mapping appear feasible and should focus on increasing the correlation between primary and secondary variable(s) so that local means can be predicted more accurately (Eq. [9]). Major directions for achieving this are (i) minimizing the spatial mismatch between yield monitor records and remote sensing data, (ii) improving VI for yield estimation, and (iii) identifying crop-specific optimal dates for image acquisition.
A first issue requiring further study is the optimal scale for correlating yield with a vegetation index. In our study, each combine-harvested yield measurement represented an area of at least 20 m2. Although this was similar to the 16 m2 covered by a single pixel in the IKONOS image, an exact spatial alignment of primary (yield) and secondary (VI) variables was not possible. No attempt was made to adjust the time shift of the yield monitor data to increase the correlation with the satellite image (Willis et al., 1999) because such procedures tend to be empirical and because using the VI as secondary variables in kriging resulted in at least a partial rectification of the yield data. More sophisticated approaches for modeling grain flow deconvolution in combine harvesters (Whelan and McBratney, 2000) could improve the correlation with imagery. Spatial filtering of satellite images (van Meirvenne and Goovaerts, 2002) or aggregating yield monitor data and pixels of an image to a common larger size of support may also improve modeling of the relationship between primary and secondary variables because noise is increasingly filtered out (Willis et al., 1999; Zhang et al., 1999). However, the optimal scale for yield mapping may vary. Spatial aggregation to larger cell sizes may be sufficient for most management purposes, but it also reduces the ability to detect and display details in yield maps (Fig. 4) and may lead to significant bias in the classification of multiyear yield maps for creating management zones (Ping and Dobermann, 2003).
Second, improved VI must be developed that are more sensitive to grain yield variation. Although empirical equations for predicting grain yield from different spectral bands can be established through location-specific regression modeling (Yang and Anderson, 2000; Osborne et al., 2002), more robust VI are required for implementing approaches such as SKLM in commercial software targeting precision farming. At our sites, relative improvement of grain yield prediction with SKLM increased in the order GNDVI < NDVI < GARI for irrigated maize or GARI < NDVI < GNDVI for rain-fed soybean (Table 4). Indices such as GNDVI and GARI tend to be less sensitive to atmospheric effects than NDVI and resulted in less error in estimating vegetation fraction or LAI for wheat and maize, especially during the later growing season (Gitelson et al., 1996, 2002, 2003). However, relationships between crop biomass or LAI and most VI vary among different crop cover types and tend to become nonlinear or even insensitive for dense canopies. The NDVI, for example, becomes less sensitive to true variation in LAI when LAI exceeds about 2 (Myeni et al., 1997; Gitelson et al., 2003). For comparison, in our study, average measured LAI was 4.0 in Field A and 2.6 in Field B at the time of image acquisition (T. Arkebauer, unpublished data, 2002). It is likely that this explained the smaller relative improvement (Table 4) and larger bias (Fig. 5 and 6) of yield prediction in irrigated maize (Field A) compared with rainfed soybean (Field B).
Third, the choice of the growth stage (date) for image acquisition greatly affects the measured correlation between grain yield and VI and therefore the potential relative improvement of yield mapping by using imagery. For example, Basso et al. (2001) noticed that an image taken on 18 July reflected the association between soybean yield and NDVI better than an image taken on 8 August because the NDVI derived from the latter approached an asymptotic limit in the high-yielding areas within the field. Yang et al. (2000) found that images taken at or shortly after peak vegetative growth were a better yield indicator than those acquired during early growth or very late in the season. Shanahan et al. (2001) showed that GNDVI was best correlated with maize yield during the midgrain-filling stage, but the correlation was small during early growth and tended to decrease after about R3 or R4 stage of maize.
 |
CONCLUSIONS
|
|---|
The precision of interpolated grain yield maps increased by utilizing a spatially dense multispectral image in combination with combine-measured grain yield data. Of the geostatistical methods evaluated, SKLM performed best in terms of improving the precision of the grain yield maps and displaying detailed yield patterns in two fields. Utilizing a satellite image as secondary information decreased errors associated with yield monitor data and also allowed better prediction in areas where no reliable yield measurements were available. The SKLM method may have considerable potential for use in precision farming because it allows flexible modeling of the relationship between yield and one or more secondary variables and is relatively easy to implement.
Where imagery has been acquired during the growing season for purposes of crop monitoring, the same imagery can be used for improving yield maps at no or little extra cost. Future efforts on improving yield mapping should concentrate on obtaining improved yield monitor data and imagery and developing more yield-sensitive VI, particularly for high-yielding agricultural crops and with a minimum need for location-specific empirical modeling.
 |
ACKNOWLEDGMENTS
|
|---|
We thank Mark Schroeder and Walker Luedke (University of NebraskaLincoln) for providing the yield monitor data used in this study. This material is based on research supported by the USDA-CSREES/NASA program on Application of Geospatial and Precision Technologies (AGPT, Grant no. 2001-52103-11303) and the U.S. Department of Energy: (a) EPSCoR program, Grant no. DE-FG-02-00ER45827, and (b) Office of Science, Biological and Environmental Research Program (BER), Grant no. DE-FG03-00ER62996.
 |
NOTES
|
|---|
Contribution of the Nebraska Agric. Exp. Stn. Scientific J. Ser. Paper no. 14190.
 |
REFERENCES
|
|---|
- Arslan, S., and T.S. Colvin. 2002a. An evaluation of the response of yield monitors and combines to varying yields. Precis. Agric. 3:107122.
- Arslan, S., and T.S. Colvin. 2002b. Grain yield mapping: Yield sensing, yield reconstruction, and errors. Precis. Agric. 3:135154.
- Basso, B., J.T. Ritchie, F.J. Pierce, R.P. Braga, and J.W. Jones. 2001. Spatial validation of crop models for precision agriculture. Agric. Syst. 68:97112.
- Bekele, A., R.G. Downer, M.C. Wolcott, W.H. Hudnall, and S.H. Moore. 2003. Comparative evaluation of spatial prediction methods in a field experiment for mapping soil potassium. Soil Sci. 168:1528.
- Blackmore, S., and M. Moore. 1999. Remedial correction of yield map data. Precis. Agric 1:5366.
- Bouman, B.A.M., H.W.J. van Kasteren, and D. Uenk. 1992. Standard relations to estimate ground cover and LAI of agricultural crops from reflectance measurements. Eur. J. Agron. 1:249262.
- Bourennane, H., D. King, and A. Couturier. 2000. Comparison of kriging with external drift and simple linear regression for predicting soil horizon thickness with different sample densities. Geoderma 97:255271.
- Deutsch, C.V., and A.G. Journel. 1998. GSLIB. Geostatistical software library and user's guide. 2nd ed. Oxford Univ. Press, New York.
- Dobermann, A., P. Goovaerts, and H.U. Neue. 1997. Scale dependent correlations among soil properties in two tropical lowland rice fields. Soil Sci. Soc. Am. J. 61:14831496.[Abstract/Free Full Text]
- Gitelson, A.A., Y. Kaufman, and M.N. Merzlyak. 1996. Use of a green channel in remote sensing of global vegetation from EOS-MODIS. Remote Sens. Environ. 58:289298.
- Gitelson, A.A., Y.J. Kaufman, R. Stark, and D. Rundquist. 2002. Novel algorithms for remote estimation of vegetative fraction. Remote Sens. Environ. 80:7687.
- Gitelson, A.A., A. Vina, T.J. Arkebauer, D.C. Rundquist, G. Keydan, and B. Leavitt. 2003. Remote estimation of leaf area index and green leaf biomass in maize canopies. Geophys. Res. Lett. 30:1248, doi 10.1029/2002 GL016450.
- Goovaerts, P. 1997. Geostatistics for natural resources evaluation. Oxford Univ. Press, New York.
- Goovaerts, P. 2000. Geostatistical approaches for incorporating elevation into spatial interpolation of rainfall. J. Hydrol. 228:113129.
- Goovaerts, P., and R. Webster. 1994. Scale-dependent correlation between topsoil copper and cobalt concentrations in Scotland. Eur. J. Soil Sci. 45:7995.
- Isaaks, E.H., and R.M. Srivastava. 1989. An introduction to applied geostatistics. Oxford Univ. Press, New York.
- King, S.L., A.J. Lister, and M. Hoppus. 2000. A comparison of kriging and cokriging for mapping forest volume in Connecticut. In W.G. Hubbard and J.B. Jordin (ed.) Southern Forestry GIS Conf., 3rd, Athens, GA [CD-ROM]. 1012 Oct. 2000. Univ. of Georgia Press, Athens.
- Ma, X., and A.G. Journel. 1999. An expanded GSLIB cokriging program allowing for two Markov models. Comput. Geosci. 25:627639.
- Major, D.G., G.B. Schaalje, G. Asrar, and E.T. Kanemasu. 1986. Estimation of whole-plant biomass and grain yield from spectral reflectance of cereals. Can. J. Remote Sens. 12:4754.
- Minasny, B., A.B. McBratney, and B.M. Whelan. 2002. VESPER version 1.5 [Online]. Available at http://www.usyd.edu.au/su/agric/acpa (verified 31 Oct. 2003). Aust. Cent. for Precision Agric., The Univ. of Sydney, Sydney, NSW, Australia.
- Myneni, R.B., F.G. Hall, P.J. Sellers, and A.L. Marshak. 1997. Estimation of global leaf area index and absorbed PAR using radiative transfer models. IEEE Trans. Geosci. Remote Sens. 35:13801393.
- Myers, R.H., D.C. Montgomery, and G.G. Vining. 2002. Generalized linear models: With applications in engineering and the sciences. John Wiley & Sons, New York.
- Osborne, S.L., J.S. Schepers, D.D. Francis, and M.R. Schlemmer. 2002. Use of spectral radiance to estimate in-season biomass and grain yield in nitrogen- and water-stressed corn. Crop Sci. 42:165171.[Abstract/Free Full Text]
- Ping, J.L., and A. Dobermann. 2003. Creating spatially contiguous yield classes for site-specific management. Agron. J. 95:11211131.[Abstract/Free Full Text]
- SAS Institute. 1999. SAS system version 8.0. SAS Inst., Cary, NC.
- Searcy, S.W., S.K. Schueller, Y.H. Bae, S.C. Borgelt, and B.A. Stout. 1989. Mapping of spatially variable yield during grain combining. Trans. ASAE 32:826829.
- Senay, G.B., A.D. Ward, J.G. Lyon, N.R. Fausey, and S.E. Nokes. 1998. Manipulation of high spatial resolution aircraft remote sensing data for use in site-specific farming. Trans. ASAE 41:489495.
- Shanahan, J.F., J.S. Schepers, D.D. Francis, G.E. Varvel, W.W. Wilhelm, J.M. Tringe, M.R. Schlemmer, and D.J. Major. 2001. Use of remote-sensing imagery to estimate corn grain yield. Agron. J. 93:583589.[Abstract/Free Full Text]
- Stafford, J.V., B. Ambler, R.M. Lark, and J. Catt. 1996. Mapping and interpreting the yield variation in cereal crops. Comput. Electron. Agric. 14:101119.
- Thylen, L., P.A. Algerbo, and A. Giebel. 2000. An expert filter removing erroneous yield data. In P.C. Robert et al. (ed.) Precision agriculture [CD-ROM]. Proc. Int. Conf., 5th, Minneapolis, MN. 1619 July 2000. ASA, CSSA, and SSSA, Madison, WI.
- Triantafilis, J., A.I. Huckel, and I.O.A. Odeh. 2001. Comparison of statistical prediction methos for estimating field-scale clay content using different combinations of ancillary variables. Soil Sci. 166:415427.
- Tucker, C.J. 1979. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens. Environ. 8:127150.
- van Meirvenne, M., and P. Goovaerts. 2002. Accounting for spatial dependence in the processing of multi-temporal SAR images using factorial kriging. Int. J. Remote Sens. 23:371387.
- Voltz, M., and R. Webster. 1990. A comparison of kriging, cubic splines and classification for predicting soil properties from sample information. J. Soil Sci. 41:473490.
- Webster, R. 1985. Quantitative spatial analysis of soil in the field. Adv. Soil Sci. 3:170.
- Whelan, B.M., and A.B. McBratney. 2000. An approach to deconvoluting grain-flow within a conventional combine harvester using a parametric transfer function. Precis. Agric. 2:389398.
- Whelan, B.M., and A.B. McBratney. 2002. A parametric transfer function for grain-flow within a conventional combine harvester. Precis. Agric. 3:123134.
- Willis, P.R., P.G. Carter, and C.J. Johannsen. 1999. Assessing yield parameters by remote sensing techniques. p. 14651473. In P.C. Robert et al. (ed.) Precision agriculture. Proc. Int. Conf., 4th, St. Paul, MN. 1922 July 1998. ASA, CSSA, and SSSA, Madison, WI.
- Yang, C., and G.L. Anderson. 2000. Mapping grain sorghum yield variability using airborne digital videography. Precis. Agric. 2:723.
- Yang, C., J.H. Everitt, J.M. Bradford, and D.E. Escobar. 2000. Mapping grain sorghum growth and yield variations using airborne multispectral digital imagery. Trans. ASAE 43:19271938.
- Zhang, M.H., P. Hendley, and D. Drost, M. O' Neill, and S. Ustin. 1999. Corn and soybean yield indicators using remotely sensed vegetation index. p. 14751481. In P.C. Robert et al. (ed.) Precision agriculture. Proc. Int. Conf., 4th, St. Paul, MN. 1922 July 1998. ASA, CSSA, and SSSA, Madison, WI.