Agronomy Journal
Published in Agron. J. 96:1091-1102 (2004).
© American Society of Agronomy
677 S. Segoe Rd., Madison, WI 53711 USA

SITE-SPECIFIC MANAGEMENT

Screening Yield Monitor Data Improves Grain Yield Maps

G. C. Simbahan, A. Dobermann* and J. L. Ping

Dep. of Agron. and Hortic., Univ. of Nebraska, P.O. Box 830915, Lincoln, NE 68583-0915

* Corresponding author (adobermann2{at}unl.edu).

Received for publication September 29, 2003.

ABSTRACT

Yield monitor data contain systematic and random errors, which must be removed to create accurate yield maps. A general procedure for assessing yield data cleaning methods was applied to a new postprocessing algorithm in which six common types of erroneous yield monitor values were removed: (1) combine header status up; (2) start-/end-pass delays; (3) grain flow, distance traveled, and grain moisture outliers; (4) values exceeding minimum and maximum biological yield limits; (5) local neighborhood outliers; and (6) short segments and co-located points. The algorithm was applied to four yield maps of maize (Zea mays L.) and soybean [Glycine max (L.) Merr.] grown under irrigated and rainfed conditions. A total of 13 to 20% of the original yield monitor data was removed, with 72 to 85% of the removal occurring in the mandatory, primary screening process (Steps 1 and 2). Only 2.6 to 3.9% of the original yield monitor data were removed during secondary screening (Steps 3 through 6), but this additional screening led to yield semivariograms with smaller nugget values and sills and to a relative increase in map precision of 4.3 to 5.4% compared with conducting primary screening only. The local neighborhood outlier test (Step 5) removed a larger proportion of yield values in soybean (12.8 to 14.9% of all deleted values) than in maize (2.7 to 3.2%). The proposed algorithm is robust enough for implementation in commercial software but requires further testing in other crops and environments and with other brands of yield monitors.

Abbreviations: CV, coefficient of variation • GPS, global positioning system • MAE, mean absolute error • RMSE, root mean squared error • SD, standard deviation


INTRODUCTION

In 2001, more than 30,000 grain yield monitors were in use in North America (Doerge, 2002), mainly in maize and soybean areas of the Corn Belt. Cleaned, accurate yield maps are needed for farmers to fully capture the value of their investments in these new technologies and to make better agronomic decisions. With the mass of yield monitor data produced annually, there is an increasing need for robust, more automated data-screening algorithms, although creating high-quality yield maps requires a great deal of knowledge and often manual editing by those who actually collected the data.

Yield data obtained with combine-mounted, georeferenced grain yield monitors are affected by various systematic and random sources of measured yield variation (Stafford et al., 1996; Doerge, 1999; Arslan and Colvin, 2002b), including (i) naturally occurring yield variation due to climate and soil–landscape features, (ii) management-induced yield variation, and (iii) measurement errors caused by the yield-monitoring process itself. Naturally occurring yield variation is often related to more gradual changes in soil–landscape conditions. On the other hand, variation caused by the actual crop management often represents random events that typically occur in small areas, such as planter skips, poor crop establishment, nonuniform fertilizer application, herbicide damage, lodging, or pest damage. Measurement errors include grain flow and other sensor errors (moisture, speed, swath width), errors due to georeferencing and combine movement, operator errors, and data-processing errors (Shearer et al., 1997; Blackmore and Moore, 1999; Arslan and Colvin, 2002b). For most locations within a field, both (ii) and (iii) represent short-distance, random variation that differs from year to year. Such artifacts must be removed from yield monitor raw data to display and properly interpret the major patterns of yield variation as a basis for making site-specific crop management decisions (Ping and Dobermann, 2003).

Various filtering techniques have been proposed for postharvest processing of yield monitor data. Beck et al. (1999) proposed a screening program to identify incorrect yield monitor values based on acceptable yield and grain moisture ranges, appropriate travel distance, sudden surges in grain flow, combine turns, and overlaps of harvest passes with previously harvested ground. Their algorithm removed about 10% of the original yield monitor data. Shearer et al. (1997) filtered yield data for unrealistic cycle distances and for outliers in the frequency distributions of mass flow, moisture, cycle distance, and yield; rescaled values to match average weights and moisture contents measured by standard grading practices; and smoothed the raw data by computing running averages. Thylen et al. (2001) evaluated three filtering levels for detecting and removing errors from reduced cutting width and rapid speed changes by comparing each data point with its 10 closest neighbors. Kleinjan et al. (2002) described a cleaning program that removed erroneous yield points related to header-up status of the combine, rapid speed changes, grain flow outside selected low and high limits, and yield values exceeding ±3 standard deviations (SD) relative to all data within a search neighborhood of three swath widths. Noack et al. (2003) attempted to detect and eliminate erroneous yield measurements based on floating local neighborhood searches within an H-shaped configuration along harvest tracks.

Applications of these methods have generally shown that postharvest screening of yield monitor data improves frequency distributions of yield, the spatial structure of grain yield data, and correlations of yield maps with remotely sensed images (Thylen et al., 2001; Kleinjan et al., 2002; Noack et al., 2003). However, many of the available screening programs focus on only a few of the possible sources of error. Programs also differ in the criteria they consider and in the extent of the research base used to develop them.

The objectives of this paper are to (i) describe a general procedure for evaluating improvements in grain yield maps due to screening of yield monitor data and (ii) describe a new yield data screening algorithm. Our focus is on describing what data are removed in various screening steps and how this improves yield maps, whereas the proposed screening algorithm mainly serves as an example of a logical sequence of screening steps.


MATERIALS AND METHODS

Yield Monitor Data Collection
Four sets of yield monitor data were collected from three large production fields near Mead, NE, in 2001 and 2002. Field A (44 ha) represents an irrigated maize crop grown in 2001. Field B (65 ha) was located 2 km northeast of Field A and represents a rainfed maize crop grown in 2001. Field C (45 ha) was next to Field A and represents an irrigated soybean crop grown in 2002. Field D (65 ha) was the same as Field B but was planted with rainfed soybean in 2002. Soil types were similar at all three sites. All crops were planted at 0.76-m row spacing, and insects and weeds were controlled through several pesticide applications. Fields A and C were center-pivot irrigated during the whole growing season, whereas Fields B and D received no irrigation. A total of 224 kg N ha–1 was applied in three splits to irrigated maize (Field A) and 128 kg N ha–1 to rainfed maize (Field B). No fertilizer was applied to soybean. The same agricultural equipment was used to manage all fields.

All four fields were harvested using a John Deere JD9550 combine equipped with a calibrated Ag Leader PF3000 yield monitor (Ag Leader Technol., Inc., Ames, IA) and an Ag Leader 3050 differential global positioning system (GPS) receiver. The combine harvested crops with a swath width of 6.1 m (eight rows), and yields were recorded at 1- and 2-s logging intervals for maize and soybean, respectively. Yield-monitor–determined average yields were within 0.5 to 1.5% of scale-wagon–determined average yields for the whole field, indicating good yield monitor calibration.

In the first data-processing step (Fig. 1), grain flow delay correction and conversion of yield monitor raw data (.yld files) to the advanced text file format were done using SMS Basic 1.01 (Ag Leader Technol., Inc., Ames, IA). Commercial combine harvesters equipped with yield monitors log variables such as geographical coordinates, grain flow rate, time, logging interval, distance traveled, swath width, grain moisture, and header status (up or down) at intervals of one to several seconds. These raw data are used to reconstruct the actual grain yield at a particular location by accounting for the time shift caused by the diffusive nature of grain flow through a combine (Blackmore and Moore, 1999; Arslan and Colvin, 2002b). Grain flow correction is accomplished either by applying a fixed, combine- and/or crop-specific shift value or by applying more complex convolution models that reconstruct the true yield on the ground by reversing the smoothing inherent in a yield-monitoring system (Whelan and McBratney, 2002; Lark and Wheeler, 2003). In our study, optimal grain flow shift settings were determined using the procedure proposed by Beal and Tian (2001). This procedure assumes that an incorrect grain flow shift results in a large ratio of the surface area of a three-dimensional plot of the yield monitor data (yield plotted as the z variable vs. geographical coordinates) to the two-dimensional projected area of the upper surface (equivalent to the harvested whole-field area). The optimum grain flow shift is the one at which this ratio is at a minimum. For all four fields, we estimated this ratio for grain flow shifts ranging from 6 to 18 s and plotted it against the delay time (Fig. 2). Based on the results, values of 10 s for maize and 12 s for soybean were chosen as the optimum grain flow shifts. No other changes were made in the raw data.
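The surface-area criterion of Beal and Tian (2001) can be illustrated with a short Python/NumPy sketch. It assumes that records are stored in logging order with a hypothetical `pass_id` for each combine pass, that a shift is expressed in whole logging intervals (e.g., 10 s at a 1-s interval is 10 records), and that yield is averaged onto a square grid before the 3-D surface is triangulated. The grid cell size, the function names, and the triangulation are illustrative choices, not the authors' or SMS Basic's implementation.

```python
import numpy as np


def shift_within_passes(values, pass_id, k):
    """Assign to record i the value logged k records later in the same pass.

    Grain cut at position i reaches the flow sensor k logging intervals later,
    so position i receives values[i + k]; the last k records of a pass get NaN.
    Assumes records of each pass are in logging order (hypothetical layout).
    """
    values = np.asarray(values, dtype=float)
    pass_id = np.asarray(pass_id)
    out = np.full(values.shape, np.nan)
    for p in np.unique(pass_id):
        idx = np.flatnonzero(pass_id == p)  # records of one pass, logging order
        if k < idx.size:
            out[idx[:idx.size - k]] = values[idx[k:]]
    return out


def grid_mean(x, y, z, cell):
    """Average non-NaN values onto square cells; empty cells become NaN."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    ix = np.floor((x - x.min()) / cell).astype(int)
    iy = np.floor((y - y.min()) / cell).astype(int)
    total = np.zeros((iy.max() + 1, ix.max() + 1))
    count = np.zeros_like(total)
    ok = ~np.isnan(z)
    np.add.at(total, (iy[ok], ix[ok]), z[ok])
    np.add.at(count, (iy[ok], ix[ok]), 1.0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count


def surface_area_ratio(z, dx, dy):
    """3-D area of a gridded surface divided by its 2-D projected area."""
    ny, nx = z.shape
    X, Y = np.meshgrid(np.arange(nx) * dx, np.arange(ny) * dy)
    P = np.stack([X, Y, z], axis=-1)
    a, b, c, d = P[:-1, :-1], P[:-1, 1:], P[1:, :-1], P[1:, 1:]

    def tri_area(p, q, r):
        return 0.5 * np.linalg.norm(np.cross(q - p, r - p), axis=-1)

    # Split each cell formed by four grid nodes into two triangles.
    area = tri_area(a, b, d) + tri_area(a, d, c)
    valid = np.isfinite(area)  # skip cells touching empty grid nodes
    if not valid.any():
        return np.nan
    return area[valid].sum() / (valid.sum() * dx * dy)


def best_shift(x, y, values, pass_id, shifts, cell):
    """Return the shift (in records) with the smallest ratio, plus all ratios."""
    ratios = {}
    for k in shifts:
        grid = grid_mean(x, y, shift_within_passes(values, pass_id, k), cell)
        ratios[k] = surface_area_ratio(grid, cell, cell)
    return min(ratios, key=ratios.get), ratios
```

A flat grid gives a ratio of 1.0, and a rougher surface gives a larger one, so the shift with the smallest ratio is the candidate optimum, analogous to reading the minimum off a plot like Fig. 2. Cells without records are skipped, so the projected area counts only harvested cells.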



Fig. 1. Algorithm for yield data screening in which six types of erroneous or uncertain values are deleted.

Fig. 2. Effect of different grain flow shift values (delay time) on the ratio of surface area of a 3-d plot of the yield monitor data to projected area of the upper surface. Optimum selection of a grain flow shift value occurs when this ratio is at a minimum.

Yield Monitor Data Screening Algorithm
The exported files were subjected to a sequential screening process (Fig. 1), with the goal of detecting and removing six types of erroneous values: (1) combine header status is up; (2) start-/end-pass delays for both headlands and stop-and-go segments within the field; (3) frequency distribution outliers of distance traveled, grain flow, and grain moisture; (4) user-defined minimum and maximum biological yield limits; (5) small patches or narrow strips with extremely low or high yields that are not closely related to immediate neighbors; and (6) short segments and co-located yield records.

Steps 1 and 2 are referred to as primary screening because they represent known technical errors that are always associated with yield monitor operations. The primary screening eliminates erroneous data values that were recorded while the combine header was up, values recorded after the header has been lowered but before grain flow has started or has stabilized (start-pass delay), and values at the end of harvest segments when cutting has stopped but the header has not been raised yet (end-pass delay). It is recommended to remove such values, and all commercially available yield-monitoring software performs this task. The main decision for the program operator to make is to choose settings for the length of start- and end-pass delays, which may differ among crops and harvest combines due to differences in swath width, harvest speed, and grain flow through a combine. To obtain location-specific settings, grain flow measured during a short time period after start of a new harvest segment or before the end of a harvest pass can be plotted vs. time. Figure 3 shows example graphs for 15 combine tracks in each field. Based on these graphs, 8 and 4 s were selected as settings for start- and end-pass delays, respectively, with no differences between maize and soybean or between different yield levels represented by irrigated and nonirrigated crops.
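Steps 1 and 2 can be sketched as a mask over the logged records. This sketch assumes that each record carries a timestamp in seconds and a header-down flag, and that a harvest segment is a contiguous run of header-down records; the 8-s and 4-s defaults are the delays chosen above. Stop-and-go segments inside the field would also need a speed- or flow-based break, which this hypothetical `primary_keep` function does not detect.

```python
import numpy as np


def primary_keep(t, header_down, start_delay=8.0, end_delay=4.0):
    """Boolean mask of records that survive Steps 1 and 2 (illustrative sketch).

    t: record times in seconds; header_down: True while the header is lowered.
    Segments are assumed to be contiguous runs of header-down records.
    """
    t = np.asarray(t, dtype=float)
    down = np.asarray(header_down, dtype=bool)
    keep = np.zeros(t.size, dtype=bool)
    # Split the record sequence wherever the header status changes.
    edges = np.flatnonzero(np.diff(down.astype(int))) + 1
    for run in np.split(np.arange(t.size), edges):
        if run.size == 0 or not down[run[0]]:
            continue  # Step 1: header up, records are discarded
        t_lowered, t_raised = t[run[0]], t[run[-1]]
        # Step 2: drop the start-pass and end-pass delay windows.
        keep[run] = (t[run] - t_lowered >= start_delay) & (t_raised - t[run] >= end_delay)
    return keep
```

For example, with 1-s records and the header lowered from t = 5 s to t = 24 s, only the records from t = 13 s to t = 20 s are kept.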



Fig. 3. Grain flow measured by the yield monitor near the start or end of harvest passes. The different symbols show 15 harvest passes in each field, which were used to derive the values for start-/end-pass delays in Step 2 of the screening algorithm shown in Fig. 1.

Steps 3 through 6 are referred to as secondary screening (Fig. 1), in which the objective is to remove both truly erroneous yield records caused by combine operation and yield sensing and localized, extreme yield variation due to measurement error, crop management, or other events that are not related to the general spatial patterns of crop yield variation. The exact causes of such random yield variation are not always known, so the secondary screening process is a combination of statistical tests and empirical decisions.

In Step 3, an outlier test is performed for the variables grain flow, grain moisture, and distance traveled. The main rationale for frequency distribution screening is to eliminate the most extreme values of combine speed, grain flow, and grain moisture, which lie outside the ranges of optimal yield monitor performance. The global means and SDs are calculated for the whole field, and, as a default criterion, yield records for which any one of these variables lies outside the mean ±3 SD range are deleted. The same criterion was used by Kleinjan et al. (2002) as the final step in their cleaning algorithm. In a practical implementation of a yield-cleaning algorithm, however, users should have some flexibility in selecting the outlier test criterion best suited to a particular environment. For example, in dryland fields with very wide ranges of true yield variation, a range wider than the ±3 SD used in our studies could be chosen for discarding outliers. Another alternative is to use percentiles rather than SD ranges, because SD-based ranges can be strongly affected by skewed data distributions.
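A sketch of the Step 3 test, assuming each variable is a NumPy array aligned with the yield records. The mean ±3 SD rule is the default described above; the optional `percentiles` argument is an illustrative way to implement the percentile alternative and is not part of the authors' program. The function name is hypothetical.

```python
import numpy as np


def distribution_keep(columns, k_sd=3.0, percentiles=None):
    """Keep records whose values lie inside the global range for every variable.

    columns: dict mapping variable name -> 1-D array (all the same length).
    Default range is mean +/- k_sd * SD over the whole field; if
    percentiles=(lo, hi) is given, those percentiles are used instead.
    """
    arrays = {name: np.asarray(v, dtype=float) for name, v in columns.items()}
    n = len(next(iter(arrays.values())))
    keep = np.ones(n, dtype=bool)
    for v in arrays.values():
        if percentiles is None:
            mean, sd = v.mean(), v.std(ddof=1)
            lo, hi = mean - k_sd * sd, mean + k_sd * sd
        else:
            lo, hi = np.percentile(v, percentiles)
        # A record is dropped if any one variable falls outside its range.
        keep &= (v >= lo) & (v <= hi)
    return keep
```

With the percentile option, a fixed share of records is always removed from each tail, whereas the SD rule removes records only when extreme values are present; which behavior is preferable depends on the field.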

In Step 4, the user must provide an estimate of the expected, biologically possible yield range. Although many erroneous data outside this range are likely to be screened out in Step 3, this step was added as a machine-independent cross-check, with limits that can be set based on expert knowledge about crop yield potential. The minimum possible yield should be set to a small number slightly larger than zero, unless there is evidence that areas with true zero yields occur in a field and are not caused by the yield-monitoring operation itself. The maximum possible yield should represent crop yield potential, defined as the yield of a crop cultivar when grown in environments to which it is adapted, with nutrients and water nonlimiting and pests and diseases effectively controlled (Evans, 1993). Information from high-yield trials or crop simulation models can be used for setting site- and season-specific upper yield limits. The lower and upper limits used in our studies were 0.01 and 22 Mg ha–1 for maize (155 g kg–1 moisture content) and 0.01 and 7 Mg ha–1 for soybean (130 g kg–1 moisture content). For both crops, the upper values represented maximum yields achieved in yield contests or simulated by crop simulation models for the climatic conditions in eastern Nebraska and other areas of the Corn Belt (Duvick and Cassman, 1999; Specht et al., 1999; Dobermann et al., 2003a).
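Because the limits above are stated at 155 g kg–1 (maize) and 130 g kg–1 (soybean) moisture, a yield value must be on the same moisture basis before it is compared with them. The sketch below assumes wet yields and grain moisture in g kg–1 are available and rescales them by conserving dry matter; if the exported yield is already at standard moisture, only the comparison applies. Treating the limits as inclusive is an assumption, and the names are hypothetical.

```python
import numpy as np

# Lower limit, upper limit (Mg/ha) and standard moisture (g/kg) from the text.
YIELD_LIMITS = {"maize": (0.01, 22.0, 155.0), "soybean": (0.01, 7.0, 130.0)}


def to_standard_moisture(yield_wet, moisture, standard):
    """Rescale yield to a standard moisture content (g/kg), conserving dry matter."""
    yield_wet = np.asarray(yield_wet, dtype=float)
    moisture = np.asarray(moisture, dtype=float)
    return yield_wet * (1000.0 - moisture) / (1000.0 - standard)


def biological_keep(yield_wet, moisture, crop):
    """Mask of records within the biological yield limits (inclusive, assumed)."""
    lo, hi, standard = YIELD_LIMITS[crop]
    y = to_standard_moisture(yield_wet, moisture, standard)
    return (y >= lo) & (y <= hi)
```

For example, 10 Mg ha–1 of maize harvested at 250 g kg–1 moisture corresponds to about 8.9 Mg ha–1 at 155 g kg–1.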

In Step 5, local yield extremes that occur in small patches or narrow strips with little relationship to neighboring yield records are detected and deleted. These may include remaining yield monitor or combine operation errors but also true short-distance yield variation due to random events affecting crop growth. The assumption underlying this screening step is that such local outliers represent random events that are limited in scope and unlikely to occur at the same location in succeeding years. Such data should be filtered out when integrating yield maps over time to obtain maps of spatially varying yield performance (Dobermann et al., 2003b). As shown in Fig. 4, a floating local neighborhood test is performed for each yield monitor record, following the movement of the combine through the field. At each location xj, yield z* is predicted by inverse-distance–weighted interpolation with a power of 2 from all yield values z measured within a local neighborhood surrounding the sampled location (Shepard, 1968):

$$z^{*}(x_j) = \frac{\sum_{i} z(x_i)\, d_{ij}^{-2}}{\sum_{i} d_{ij}^{-2}} \tag{1}$$
where xi = locations of measured yield data points within the local neighborhood and dij = the distance from each point to xj. The local neighborhood (Fig. 4) includes the three preceding and three succeeding yield records in the same track as well as yield records within a band perpendicular to the tangent of the path of travel. The default width of this band was one swath width of the combine (6.1 m), and the band crossed three adjacent harvest passes on both the left and right sides of the path of travel. This definition of the local neighborhood assigns more weight to the neighboring harvest passes because those are likely to have yield monitor and operation errors that are independent of those affecting data points that precede or succeed xj in the same combine track.

A confidence interval is then obtained for the estimate z*. If the measured yield z at location xj is outside this interval, the yield value is considered a spatially uncorrelated outlier and discarded. The rationale for this definition is that yield at any location xj is likely to be spatially correlated with that of its immediate neighbors, irrespective of the direction of combine movement. If that is not the case, a random event must have caused an unusually high or low yield at location xj, due either to yield monitor error or to specific crop management events that occur in very small patches. The former may include sudden changes in speed or grain flow (Arslan and Colvin, 2002a, 2002b), whereas the latter may be caused by planter skips, poor crop establishment, nonuniform fertilizer application, herbicide damage, lodging, pest damage, etc.

A super-block search algorithm was implemented in Step 5 to facilitate a fast, floating local neighborhood search, in which the whole field is divided into nine super blocks to narrow the search radius for identifying neighboring points around a test location xj. A default confidence interval of 95% (±2 SD) was used, but depending on the intended use of a yield map, any other criterion could be set. For example, if there is evidence that certain management-induced yield patterns are small and permanent (e.g., compacted travel paths), the criteria for the local neighborhood test could be relaxed.
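The local neighborhood test can be sketched in Python as follows. This is a simplified, brute-force version under stated assumptions: records are in logging order with a hypothetical `pass_id`, the travel direction at xj is taken from the adjacent records of the same pass, the band is one swath wide along that direction and extends 3.5 swath widths across it (enough to reach three neighboring passes on each side), and the ±2 SD interval uses the SD of the neighboring yields. The paper does not state how the SD of z* is computed, and it uses a nine-super-block search for speed rather than the full scan done here.

```python
import numpy as np


def local_outlier_flags(x, y, z, pass_id, swath=6.1, n_track=3, n_pass=3,
                        k_sd=2.0, power=2.0, min_neighbors=3):
    """Flag records whose yield deviates from an IDW estimate of its neighbors.

    Illustrative sketch. Records must be in logging order, so consecutive
    indices that share a pass_id are consecutive records of the same pass.
    The band geometry and the SD used for the interval are assumptions.
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    pass_id = np.asarray(pass_id)
    n = z.size
    idx = np.arange(n)
    flags = np.zeros(n, dtype=bool)
    for j in range(n):
        same = pass_id == pass_id[j]
        # Travel direction from the adjacent records of the same pass.
        prev_j = j - 1 if j > 0 and same[j - 1] else j
        next_j = j + 1 if j + 1 < n and same[j + 1] else j
        hx, hy = x[next_j] - x[prev_j], y[next_j] - y[prev_j]
        length = np.hypot(hx, hy)
        if length == 0:
            continue  # isolated record: no direction, so no band can be built
        hx, hy = hx / length, hy / length
        dx, dy = x - x[j], y - y[j]
        along = dx * hx + dy * hy   # offset along the travel direction
        across = dy * hx - dx * hy  # offset perpendicular to it
        # n_track preceding and succeeding records in the same pass.
        in_track = same & (np.abs(idx - j) <= n_track) & (idx != j)
        # One-swath-wide band reaching n_pass neighboring passes on each side.
        in_band = (~same) & (np.abs(along) <= swath / 2) \
            & (np.abs(across) <= (n_pass + 0.5) * swath)
        nb = np.flatnonzero(in_track | in_band)
        if nb.size < min_neighbors:
            continue
        d = np.maximum(np.hypot(dx[nb], dy[nb]), 1e-6)  # guard co-located points
        w = d ** -power
        z_hat = np.sum(w * z[nb]) / np.sum(w)  # Eq. [1]
        sd = np.std(z[nb], ddof=1)
        flags[j] = abs(z[j] - z_hat) > k_sd * sd
    return flags
```

Because every record is compared with all others, the loop is quadratic in the number of records; a spatial index or the super-block partition described above would be needed for whole-field data sets.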



Fig. 4. Determination of local neighborhood outliers (Step 5 in Fig. 1). The dotted circle shows the yield data point xj being tested. An estimated yield is calculated by inverse distance interpolation from the yield points located within the local neighborhood and compared with the actual yield. Arrows indicate the direction of the different combine passes in this example.

Step 6 removes short segments due to combine stop-and-go events and points that were recorded with the same geographical coordinates. Short segments are considered unreliable because most data points in them are affected by start- or end-pass delays. By default, segments with fewer than 12 yield monitor points were identified as short segments and deleted. Co-located data points can be caused by GPS error or overlapping harvest passes. Their removal is necessary to avoid difficulties with kriging algorithms used for interpolating yield maps.
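A sketch of Step 6, assuming that a segment is a run of records with no time gap longer than `max_gap` seconds and that coordinates are compared after rounding (here to 0.01 m in a projected coordinate system). The text does not say whether one member of a set of co-located points is retained, so this version drops all of them; function names are hypothetical.

```python
from collections import Counter

import numpy as np


def segment_ids(t, max_gap):
    """Start a new segment wherever the time gap between records exceeds max_gap."""
    t = np.asarray(t, dtype=float)
    return np.concatenate([[0], np.cumsum(np.diff(t) > max_gap)])


def step6_keep(x, y, segment_id, min_points=12, decimals=2):
    """Drop records in short segments and records sharing coordinates (sketch)."""
    seg = np.asarray(segment_id)
    _, inverse, counts = np.unique(seg, return_inverse=True, return_counts=True)
    long_enough = counts[inverse] >= min_points
    # Assumption: all records at a repeated (rounded) location are removed.
    keys = list(zip(np.round(np.asarray(x, dtype=float), decimals),
                    np.round(np.asarray(y, dtype=float), decimals)))
    n_at = Counter(keys)
    not_colocated = np.array([n_at[k] == 1 for k in keys])
    return long_enough & not_colocated
```

Keeping one record per location instead would only require marking the first occurrence of each key as valid.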

The screening algorithm was written in Visual Basic 6 (Microsoft Corp., Redmond, WA) and requires Ag Leader advanced export text files as input. A simple executable version of the program and the Visual Basic source code are available upon request from the corresponding author. Although the program contains generic default values for all test criteria, many of these can be changed as needed.

Statistical Analysis
The performance of the screening algorithm was evaluated in all four fields. Values, frequency statistics, and locations of data removed in each step were calculated or identified, and the effects of primary and secondary screening on the spatial structure of grain yield and the precision of yield maps were quantified.

Experimental semivariances of measured grain yield were calculated using programs of the Geostatistical Software Library (Deutsch and Journel, 1998):

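$$\gamma(h) = \frac{1}{2N(h)} \sum_{\alpha=1}^{N(h)} \left[ z(u_{\alpha}) - z(u_{\alpha} + h) \right]^{2} \tag{2}$$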
where γ(h) is the semivariance for a lag distance interval h, u denotes the spatial coordinates of measured locations α, z(uα) and z(uα + h) denote the αth pair of z observations separated by a distance h, and N(h) is the number of paired comparisons for a given distance range h. Double-spherical semivariogram models were fitted to the experimental semivariances using PROC NLIN in SAS (SAS Inst., 1999):

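$$\gamma(h) = \begin{cases} 0, & h = 0 \\ c_0 + c_1\left[\dfrac{3h}{2a_1} - \dfrac{1}{2}\left(\dfrac{h}{a_1}\right)^{3}\right] + c_2\left[\dfrac{3h}{2a_2} - \dfrac{1}{2}\left(\dfrac{h}{a_2}\right)^{3}\right], & 0 < h \le a_1 \\ c_0 + c_1 + c_2\left[\dfrac{3h}{2a_2} - \dfrac{1}{2}\left(\dfrac{h}{a_2}\right)^{3}\right], & a_1 < h \le a_2 \\ c_0 + c_1 + c_2, & h > a_2 \end{cases} \tag{3}$$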
where c0 is the nugget variance, c1 and c2 are the sills, and a1 and a2 are the ranges of the short- and long-range spatial structures, respectively. Various semivariogram models were evaluated (data not shown). The double-spherical model fitted all data sets best and captured the major spatial scales of yield variation in each field.
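Eqs. [2] and [3] translate directly into NumPy. The sketch below bins pair distances into simple intervals [k·w, (k+1)·w) instead of the lag-tolerance scheme of the Geostatistical Software Library, and it enumerates all pairs, so it is only practical for a subsample of a yield map. The double-spherical model is written as a sum of two spherical structures with h/a capped at 1, which reproduces the piecewise form of Eq. [3] when a1 < a2. The authors fitted the model with PROC NLIN; a nonlinear least-squares routine such as `scipy.optimize.curve_fit` could be substituted, but that is not shown. Function names are hypothetical.

```python
import numpy as np


def experimental_semivariogram(x, y, z, lag_width, n_lags):
    """Eq. [2] with simple distance bins [k*lag_width, (k+1)*lag_width)."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    i, j = np.triu_indices(z.size, k=1)  # every pair counted once
    h = np.hypot(x[i] - x[j], y[i] - y[j])
    sq = (z[i] - z[j]) ** 2
    lag = np.floor(h / lag_width).astype(int)
    gamma = np.full(n_lags, np.nan)
    n_pairs = np.zeros(n_lags, dtype=int)
    for k in range(n_lags):
        sel = lag == k
        n_pairs[k] = sel.sum()
        if n_pairs[k]:
            gamma[k] = sq[sel].sum() / (2 * n_pairs[k])
    return gamma, n_pairs


def spherical(h, a):
    """Unit spherical structure; equals 1 for h >= a."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return 1.5 * r - 0.5 * r ** 3


def double_spherical(h, c0, c1, a1, c2, a2):
    """Eq. [3]: nugget plus short- and long-range spherical structures."""
    h = np.asarray(h, dtype=float)
    g = c0 + c1 * spherical(h, a1) + c2 * spherical(h, a2)
    return np.where(h > 0, g, 0.0)
```

Beyond the long range a2, the model levels off at c0 + c1 + c2, the total sill.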

Statistical evaluation of the improvement in yield prediction due to primary and secondary yield screening was conducted using jackknife kriging (Deutsch and Journel, 1998) with an independent validation sample of 1000 locations, which were randomly selected from the yield data that remained after primary or secondary screening, respectively. The values for those 1000 points were estimated by ordinary kriging from the remaining yield data sets and compared with the combine-harvested yield values for the same locations. For each field, 10 jackknife runs with different samples were performed, and their results were evaluated in terms of the mean absolute error (MAE) and the root mean squared error (RMSE). The MAE is defined as:

$$\mathrm{MAE} = \frac{1}{n} \sum_{\alpha=1}^{n} \left| z_{\alpha}^{*} - z_{\alpha} \right| \tag{4}$$
where n is the number of validation points (1000), zα* is the value estimated for location α after either the primary or secondary screening process, and zα is the measured yield value at the same location. The RMSE is a measure of prediction precision and is defined as:

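$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{\alpha=1}^{n} \left( z_{\alpha}^{*} - z_{\alpha} \right)^{2}} \tag{5}$$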

The RMSE tends to place more emphasis on larger errors and, therefore, gives a more conservative measure than the MAE. The relative improvement (RI) of yield map precision obtained after secondary screening compared with conducting primary screening only was calculated as:

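$$\mathrm{RI} = \frac{\mathrm{RMSE}_{\mathrm{Prim}} - \mathrm{RMSE}_{\mathrm{Sec}}}{\mathrm{RMSE}_{\mathrm{Prim}}} \times 100 \tag{6}$$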
where RMSE_Prim and RMSE_Sec are the RMSEs associated with the primary and secondary screening processes, respectively. RMSE values of the primary and secondary screening processes were paired by jackknife run and compared using a t test in PROC TTEST in SAS (SAS Inst., 1999) to determine whether the two screening processes differed significantly in prediction precision.
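The validation statistics in Eqs. [4] to [6] and the paired t statistic are short to compute once kriged estimates are available; the kriging itself is not reproduced here. The sketch below also draws a random validation sample of the kind described above. Function names are hypothetical, and the t statistic is returned without a p-value (a routine such as `scipy.stats.ttest_rel` would supply one).

```python
import numpy as np


def mae(estimated, measured):
    """Eq. [4]: mean absolute error."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(measured, dtype=float)
    return float(np.mean(np.abs(diff)))


def rmse(estimated, measured):
    """Eq. [5]: root mean squared error."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(measured, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def relative_improvement(rmse_prim, rmse_sec):
    """Eq. [6]: percent reduction in RMSE relative to primary screening only."""
    return 100.0 * (rmse_prim - rmse_sec) / rmse_prim


def paired_t(a, b):
    """Paired t statistic for RMSEs matched by jackknife run."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(d.size)))


def validation_split(n_points, n_valid=1000, seed=None):
    """Randomly hold out n_valid record indices; the rest are used for kriging."""
    rng = np.random.default_rng(seed)
    valid = rng.choice(n_points, size=n_valid, replace=False)
    train = np.setdiff1d(np.arange(n_points), valid)
    return valid, train
```

For example, an RMSE of 0.50 Mg ha–1 after primary screening and 0.48 Mg ha–1 after secondary screening corresponds to RI = 4%.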


RESULTS AND DISCUSSION

Stepwise Removal of Erroneous Yield Data: Primary Screening
The original, unscreened yield data were characterized by left-skewed frequency distributions with large coefficients of variation (CV, 30 to 38%), large differences between means and medians, numerous zero yield values, and extremely high yield values that by far exceeded known biological yield limits (Table 1). For example, maximum recorded yields were 33.2 to 33.9 Mg ha–1 for maize and 15.9 to 17.5 Mg ha–1 for soybean.
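Summary statistics of the kind reported in Table 1 can be computed as below. The paper does not state which skewness estimator was used; this sketch assumes the bias-adjusted Fisher-Pearson coefficient, for which negative values indicate the left skew described above. The function name is hypothetical.

```python
import numpy as np


def yield_summary(z):
    """Mean, median, SD, CV (%), skewness, and zero count of a yield vector."""
    z = np.asarray(z, dtype=float)
    n = z.size
    mean, sd = z.mean(), z.std(ddof=1)
    m2 = np.mean((z - mean) ** 2)
    m3 = np.mean((z - mean) ** 3)
    # Assumption: bias-adjusted Fisher-Pearson skewness coefficient.
    skew = np.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    return {"n": n, "mean": mean, "median": float(np.median(z)), "sd": sd,
            "cv_pct": 100.0 * sd / mean, "skewness": skew,
            "n_zero": int(np.sum(z == 0))}
```

A few zero or near-zero records pull the mean below the median and make the skewness negative, which is the pattern primary screening largely removes.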


Table 1. Summary statistics of grain yield of irrigated and rainfed maize (Fields A and B, respectively) and irrigated and rainfed soybean (Fields C and D, respectively) before and after yield monitor data screening.

The primary screening procedure removed the largest number of yield points in the whole screening process. Between 10 and 17% of the original raw data were deleted in Steps 1 and 2 (Table 1), which was equivalent to 72 to 80% of all data points deleted in the complete screening procedure outlined in Fig. 1. Primary screening significantly increased mean grain yields, brought means and medians closer together, and decreased the CVs (Table 1).

Erroneous yield points due to header-up status and start-/end-pass delays in yield monitor operation were mostly removed in the headland areas in all four fields but also included stop-and-go locations inside the fields (Fig. 5). As expected, data removal in Step 2 mainly occurred adjacent to locations at which the combine header was up (Step 1). Combine harvest in the two center-pivot irrigated fields (Fields A and C) included east–west passes for the core of each field as well as several circular harvest passes along the outer rim of each field. As a result, the headland areas with many data points removed in Steps 1 and 2 occurred in circular patterns and were offset from the field edges (Fig. 5). Fields B and D were harvested only in an east–west direction, resulting in headlands only along the western and eastern field borders (Fig. 5).



Fig. 5. Sequential removal of erroneous or uncertain yield data in the six screening steps: (1) header status up; (2) start-/end-pass delays; (3) grain flow, distance traveled, and grain moisture outliers; (4) minimum/maximum yield limits; (5) local neighborhood outliers; (6) short segments and co-located points. Values refer to irrigated and rainfed maize (Fields A and B, respectively) and irrigated and rainfed soybean (Fields C and D, respectively).

Because harvest had not yet begun or had already ceased, the majority of yield points removed in Step 1 had zero or very low yields, but a few extremely high values were also removed (Table 2). In Step 2, data removal included mainly low yields but also locations at which grain flow was increasing or decreasing (Fig. 3), as well as a few high-yield values. Therefore, mean yields of data removed in Step 2 were greater than those of data removed in Step 1 but smaller than the average yield for the whole field (Tables 1 and 2).


Table 2. Summary statistics of grain yield data removed in the six steps of screening the yield monitor raw data. Values refer to irrigated and rainfed maize (Fields A and B, respectively) and irrigated and rainfed soybean (Fields C and D, respectively).

All commercially available yield data processing software performs the primary screening process shown in Fig. 1. The key issue in conducting this screening is setting accurate start- and end-pass delay times, for which a simple graphical analysis (Fig. 3) can provide sufficient practical guidance.

Stepwise Removal of Erroneous Yield Data: Secondary Screening
Only 2.6 to 3.2% of the original raw data were deleted in the secondary screening (Table 1), which was equivalent to 20 to 28% of all data points deleted during primary and secondary screening (Table 2). Secondary screening had little effect on yield means and medians but decreased CVs and skewness by removing extremely low or high values that had remained after the primary screening process (Table 1). Final CVs were larger in rainfed crops (15 to 18%) than in irrigated crops (11 to 13%), whereas such differences were masked in the original raw data by very high CVs in all fields. Negative skewness of yield frequency distributions decreased in all fields except irrigated maize in Field A (Table 1). At that site, truly low-yielding areas resulted from waterlogging, which affected crop emergence in small depressions, or from stalk rot damage and lodging after anthesis.

Step 3 removed points that were defined as statistical outliers of grain flow, grain moisture, or distance traveled (combine speed). It accounted for 7.5 to 16.8% of the total number of data removed (Table 2). In terms of grain yield, the data removed were mostly at the lower end but also included a few values beyond the yield potential (Table 2). Values removed in Step 3 showed no consistent spatial patterns across the four fields and little association with the locations of data removed in Steps 1 and 2, indicating that they were mainly caused by random events such as errors in the harvest or yield-sensing operations. In some cases (Fields A and B), this included removal of longer, narrow strips of yield data. In Field D, a large cluster of data was removed along the western field border, where a former tree line had been taken out in the previous year. This affected combine operation and yield sensing, resulting in outliers for distance traveled and grain moisture.

Similar outlier-testing procedures have been proposed in other yield data screening algorithms, but the criteria and variables used vary (Beck et al., 1999; Kleinjan et al., 2002; Jürschik et al., 2002). The statistical outlier test used in Step 3 is a machine- and crop-independent solution in which the only empirical decision is the choice of the criterion for defining outliers (here, ±3 SD around the mean). Alternatively, where more specific information is available for a certain yield monitor model or crop, the frequency-distribution–based outlier screening could be replaced or supplemented by upper and lower quality control limits. For example, Beck et al. (1999) removed points that had zero distance traveled, speeds greater than 4.9 m s–1, or grain moisture outside user-defined lower and upper limits. Kleinjan et al. (2002) deleted yield points recorded when velocity changed by more than 15%, when speed was lower than 1.6 km h–1, or when grain flow was beyond a specified range. It is difficult, however, to make such empirical criteria robust enough for implementation in a general yield data screening algorithm.

Step 4 removed yield points that fell outside the empirically defined minimum and maximum yield limits. Because many raw values that would have produced outliers in the computed grain yield had already been removed in Step 3, only 4 to 13 additional points (0.1 to 0.3% of all data deleted) were removed in Step 4. Most of these were locations with zero yield, but two instances of very high soybean yield were also detected (Table 2).
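Step 4 can be sketched in the same way. The conversion from grain flow to yield below is a generic one (grain mass per logging interval divided by the harvested area, i.e., distance traveled × swath width, with no moisture correction) and is an assumption, as are the record keys and the example limits; this section does not report the minimum and maximum yield limits actually used. Records with zero distance produce an undefined yield (NaN), which fails the range comparison and is therefore removed.

```python
import math


def grain_yield_mg_ha(flow_kg_s, interval_s, distance_m, swath_m):
    """Generic yield conversion (assumption): grain mass per logging interval
    divided by harvested area. 1 kg m-2 equals 10 Mg ha-1."""
    area_m2 = distance_m * swath_m
    if area_m2 <= 0:
        return math.nan  # undefined yield, e.g. zero distance traveled
    return flow_kg_s * interval_s / area_m2 * 10.0


def step4_screen(records, interval_s, swath_m, y_min, y_max):
    """Keep records whose computed yield lies within [y_min, y_max] Mg ha-1.
    NaN yields fail the comparison and are removed."""
    kept, removed = [], []
    for r in records:
        y = grain_yield_mg_ha(r["flow"], interval_s, r["distance"], swath_m)
        if y_min <= y <= y_max:
            kept.append(r)
        else:
            removed.append(r)
    return kept, removed
```

For example, a record with a flow of 10.8 kg s-1 over 2 m at a 4.5 m swath and a 1-s interval corresponds to 12 Mg ha-1 and passes illustrative limits of 0.5 and 25 Mg ha-1, whereas a zero-flow record is removed.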

Step 5 removed yield points identified as local outliers within the moving window (Fig. 4). Such outliers were widely scattered across each field (Fig. 5) and included single points as well as small clusters or segments of narrow strips. Removed points had lower mean yields but much larger SD values than the whole field (Tables 1 and 2), indicating the highly variable and localized nature of these yield data. In the two maize fields, only 3% of all data deleted belonged to this category, whereas this proportion was 13 to 15% for soybean (Table 2). This could reflect true differences between maize and soybean in short-distance yield variation, or it could be caused by the different logging intervals used: maize was harvested with 1-s logging intervals, whereas 2-s cycles were used for soybean. Because the size of the search band crossing adjacent combine passes (Fig. 4) was independent of the logging interval, the number of data points used for estimating z*(xj) in Eq. [1] was larger for maize than for soybean and included many points close to xj. Because both the distance and the number of neighboring points affect the estimated yield z*(xj) and its confidence interval, the probability that the yield measured at xj is regarded as a spatial outlier decreases as yield records become more closely spaced. Conceptually, Step 5 is similar to the H method proposed by Noack et al. (2003) and to the moving-window mean and SD test used by Beck et al. (1999), but it differs from both in the definition of the local neighbors and in the statistical outlier test. A key difference is that Step 5 takes the distance of neighboring points into account by estimating z*(xj) through interpolation, whereas Noack et al. (2003) and Beck et al. (1999) estimated z*(xj) only as a floating arithmetic average, and Beck et al. (1999) also used a wider range of ±3 SD for outlier testing.
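A simplified version of the local neighborhood test in Step 5 is sketched below. Eq. [1] and the search band of Fig. 4 are not reproduced in this section, so the sketch substitutes inverse distance weighting within a circular search radius for the authors' interpolation, and it flags xj when the measured yield deviates from z*(xj) by more than k times the distance-weighted SD of the neighbors. The radius, power, k, and minimum neighbor count are hypothetical settings, not the values used in the study.

```python
import math


def idw_estimate(xj, yj, neighbors, power=2.0):
    """Inverse-distance-weighted mean and weighted SD of neighbor yields.
    neighbors: list of (x, y, z) tuples, excluding the point itself."""
    weights = [1.0 / max(math.hypot(x - xj, y - yj), 1e-9) ** power
               for x, y, _ in neighbors]
    values = [z for _, _, z in neighbors]
    w_sum = sum(weights)
    est = sum(w * z for w, z in zip(weights, values)) / w_sum
    var = sum(w * (z - est) ** 2 for w, z in zip(weights, values)) / w_sum
    return est, math.sqrt(var)


def step5_local_outliers(points, radius, k=2.0, min_neighbors=4):
    """Return indices of points whose yield deviates from the IDW estimate of
    their neighbors by more than k weighted SDs (hypothetical criterion)."""
    flagged = []
    for j, (xj, yj, zj) in enumerate(points):
        neighbors = [p for i, p in enumerate(points)
                     if i != j and math.hypot(p[0] - xj, p[1] - yj) <= radius]
        if len(neighbors) < min_neighbors:
            continue  # too few neighbors for a reliable local estimate
        est, sd = idw_estimate(xj, yj, neighbors)
        if abs(zj - est) > k * sd:
            flagged.append(j)
    return flagged
```

Because closer neighbors receive larger weights, the estimate depends on both the number and the distance of neighboring records, which is the property that distinguishes this kind of test from a plain moving-window average.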

Only a few yield points were removed in Step 6; they were co-located points as well as occasional short segments (Fig. 5) and accounted for 0.1 to 2% of all data removed.
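Step 6 can be illustrated as follows, assuming each record carries hypothetical keys for its coordinates (`x`, `y`), a pass identifier (`pass`), and its original logging sequence number (`idx`). Co-located points are detected by rounding coordinates to a chosen precision, and every point that shares a location is dropped (an assumption; keeping one of them would be an alternative). The remaining records are split into segments of consecutive logging numbers within a pass, and segments shorter than a hypothetical minimum length are removed.

```python
from collections import Counter


def location_key(p, decimals):
    """Rounded coordinates used to detect co-located points."""
    return (round(p["x"], decimals), round(p["y"], decimals))


def step6_screen(points, min_segment=5, decimals=2):
    """Drop co-located points, then drop segments shorter than min_segment.
    points: dicts with hypothetical keys 'x', 'y', 'pass', 'idx' (logging order)."""
    counts = Counter(location_key(p, decimals) for p in points)
    unique = sorted((p for p in points if counts[location_key(p, decimals)] == 1),
                    key=lambda p: p["idx"])

    # A segment is a run of consecutive logging numbers within one pass.
    segments, current = [], []
    for p in unique:
        if current and (p["pass"] != current[-1]["pass"]
                        or p["idx"] != current[-1]["idx"] + 1):
            segments.append(current)
            current = []
        current.append(p)
    if current:
        segments.append(current)

    return [p for seg in segments if len(seg) >= min_segment for p in seg]
```

In this sketch, gaps left by earlier screening steps also split segments, so short remnants of a pass that survive Steps 1 to 5 are removed here.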

Improvement of Semivariograms and Yield Maps
Figure 6 shows maps of the original yield monitor data and all data points deleted in each field. Overall, 13 to 20% of the original yield monitor points were removed in the four fields, with most of the removal occurring in the mandatory, primary screening process (Steps 1 and 2). Most of the primary data removal took place in the headland areas, whereas secondary screening resulted in a more spatially dispersed data removal, which is likely to affect modeling of the spatial structure of grain yield and, thereby, the precision of interpolated yield maps.



Fig. 6. Yield monitor data before screening (left) and all removed points (right) in irrigated and rainfed maize (Fields A and B, respectively) and irrigated and rainfed soybean (Fields C and D, respectively).

 
Because of the high sampling density of on-the-go yield monitoring, experimental semivariograms of grain yield can be modeled with great confidence. After primary screening, semivariograms of yield showed good spatial structure in all four fields (Fig. 7). However, nugget variances, a measure of spatially uncorrelated yield variation, remained relatively large. Expressed as a proportion of the overall yield variance, nugget effects were 14, 23, 43, and 49% in Fields B, D, A, and C, respectively. The lowest nugget proportions occurred in the rainfed crops grown in Fields B and D, whereas irrigated cropping in Fields A and C was associated with larger proportions of random or short-distance yield variation.
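Experimental semivariograms such as those in Fig. 7 are commonly computed with the classical (Matheron) estimator, γ(h) = [1/(2N(h))] Σ [z(xi) − z(xi + h)]², where N(h) is the number of point pairs separated by lag h; whether the authors used this estimator or a robust alternative is not stated in this section. The sketch below computes it with a brute-force loop over all pairs and fixed-width distance bins. The lag width and number of lags are hypothetical inputs, and dense yield monitor data would in practice call for a spatial index rather than this O(n²) loop.

```python
import math
from collections import defaultdict


def experimental_semivariogram(points, lag, n_lags):
    """Classical (Matheron) estimator over fixed-width distance bins.
    points: list of (x, y, z). Returns (mean pair distance, gamma, n_pairs)
    for each non-empty bin, in order of increasing distance."""
    sq_sum = defaultdict(float)
    dist_sum = defaultdict(float)
    n_pairs = defaultdict(int)
    for i, (xi, yi, zi) in enumerate(points):
        for xj, yj, zj in points[i + 1:]:
            d = math.hypot(xi - xj, yi - yj)
            b = int(d // lag)
            if b < n_lags:
                sq_sum[b] += (zi - zj) ** 2
                dist_sum[b] += d
                n_pairs[b] += 1
    return [(dist_sum[b] / n_pairs[b], sq_sum[b] / (2 * n_pairs[b]), n_pairs[b])
            for b in sorted(n_pairs)]
```

A variogram model (nugget plus one or more structures) is then fitted to these (distance, γ) pairs, and the nugget proportion is read from the fitted parameters.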



Fig. 7. Experimental semivariograms and fitted double-exponential variogram models of grain yield after primary and secondary yield monitor data screening. In each graph, fitted variogram parameters after primary screening are shown in the top left corner while those on the lower right corner show the variogram parameters after secondary screening.

 
Adding the secondary screening process resulted in significant downward shifts of the semivariograms of grain yield (Fig. 7). At all sites, the nugget values, the sills, and the ranges of the two fitted spherical structures decreased after secondary screening. Expressed as a proportion of the overall yield variance, nugget effects decreased to 9, 13, 28, and 32% in Fields B, D, A, and C, respectively. Lower nugget values are desirable when kriging is used for interpolation because they indicate that a larger share of the yield variation is spatially structured rather than caused by random error, which results in less smoothing of the data. A decrease in the nugget effect was also observed by Thylen et al. (2001) after applying their yield data filtering algorithms.
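The nugget proportions quoted above relate the nugget variance c0 to the total variance. For a nested model with two spherical structures, as described in this paragraph, γ(h) = c0 + c1 Sph(h; a1) + c2 Sph(h; a2) for h > 0, with Sph(h; a) = 1.5(h/a) − 0.5(h/a)³ for h < a and 1 otherwise, so the nugget proportion is c0/(c0 + c1 + c2). The sketch assumes that the overall yield variance is the total sill; the parameter values in the test and the example below are illustrative, not the fitted values shown in Fig. 7.

```python
def spherical(h, a):
    """Unit-sill spherical structure with range a."""
    if h >= a:
        return 1.0
    r = h / a
    return 1.5 * r - 0.5 * r ** 3


def nested_spherical(h, c0, c1, a1, c2, a2):
    """Nugget c0 plus two spherical structures (sills c1, c2; ranges a1, a2)."""
    if h == 0:
        return 0.0  # semivariance is zero at zero lag by definition
    return c0 + c1 * spherical(h, a1) + c2 * spherical(h, a2)


def nugget_proportion(c0, c1, c2):
    """Nugget as a fraction of the total sill c0 + c1 + c2 (assumed definition)."""
    return c0 / (c0 + c1 + c2)
```

For example, with c0 = 0.3, c1 = 0.4, and c2 = 0.3 (arbitrary units), the nugget proportion is 0.3/1.0, or 30%; lowering c0 while the structured components stay similar reduces this proportion, as observed after secondary screening.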

Prediction precision increased significantly after secondary screening compared with primary screening only, as indicated by lower MAE and RMSE values in all fields (Table 3). A t test of the mean paired differences in RMSE indicated that RMSE values obtained after secondary screening were significantly lower than those obtained by interpolating yield data that had passed through primary screening only. The average relative improvement in yield map precision ranged from 4.3 to 5.4% across sites and was consistent among the jackknife runs performed at each site (Table 3).
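The validation statistics in Table 3 can be sketched as follows. MAE and RMSE follow their standard definitions; the relative improvement is assumed here to be the percentage reduction in RMSE from primary to secondary screening, and the paired t statistic is computed from the per-run RMSE differences across jackknife runs. The exact RI formula and t-test settings used for Table 3 are not given in this section.

```python
import math
from statistics import mean, stdev


def mae(observed, predicted):
    """Mean absolute error of validation predictions."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def rmse(observed, predicted):
    """Root mean squared error of validation predictions."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def relative_improvement(rmse_primary, rmse_secondary):
    """Percentage reduction in RMSE (assumed definition of RI)."""
    return 100.0 * (rmse_primary - rmse_secondary) / rmse_primary


def paired_t(rmse_primary_runs, rmse_secondary_runs):
    """Paired t statistic for per-run RMSE differences (df = n - 1)."""
    d = [a - b for a, b in zip(rmse_primary_runs, rmse_secondary_runs)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))
```

With 10 jackknife runs per field, the resulting statistic would be compared against a t distribution with 9 degrees of freedom.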


Table 3. Mean absolute error (MAE) and root mean squared error (RMSE) of 10 jackknife validation runs for each field. Results refer to two levels of yield data screening in irrigated and rainfed maize (Fields A and B, respectively) and irrigated and rainfed soybean (Fields C and D, respectively). The relative improvement (RI) represents the increase in precision obtained by applying the secondary screening in addition to the primary screening process.

 

    CONCLUSIONS
Different yield data screening filters are likely to remove different numbers of yield monitor records at different locations. Partial removal of erroneous yield points may not effectively improve map precision, whereas excessive removal can cause a significant loss of yield information. The procedures discussed here should mainly be understood as a general methodology for evaluating the effects of different yield data screening algorithms and criteria. Frequency statistics, maps of removed data, semivariograms before and after filtering, and jackknife validation are the major tools for evaluating screening algorithms, and they could also help in selecting appropriate parameters for the filtering process.

The proposed screening algorithm eliminated erroneous yield values in a logical, sequential order of data screening, with a minimum of empirical limits imposed. It is likely to be robust enough to produce more accurate yield maps that better show the major spatial patterns of yield variation in a field. No replacement or smoothing of the original data is performed. Between 13 and 20% of the original yield monitor data points were removed, most of them during primary screening for technical errors in the yield-monitoring process. However, secondary screening, which aims at removing other sources of random variation in yield monitor records within a field, was critical for improving the frequency distributions, the spatial structure, and the precision of grain yield maps. The algorithm provided consistent results in terms of (i) which data were removed, (ii) the proportions removed in the different screening steps, and (iii) the improvement in yield map precision.

Whether the proposed algorithm removed all erroneous or uncertain yield data cannot be fully assessed, because selecting accurate evaluation criteria remains challenging. However, several of the screening criteria used here are presently not implemented, or only partially implemented, in most commercially available software. In particular, efforts should be made to fully implement the secondary screening process in current yield-processing programs, although its sensitivity to the empirical settings of limits and statistical tests needs further evaluation. More testing with other crops, in other environments, and with various yield monitor brands should be conducted. We also acknowledge that significant improvements may be possible through better methods for specifying grain flow shifts (Whelan and McBratney, 2000; Chung et al., 2002; Lark and Wheeler, 2003), changes to some of the test criteria used, or additional criteria for detecting errors due to varying swath width or overlapping harvest passes (Beck et al., 1999).


    ACKNOWLEDGMENTS
 
We thank Mark Schroeder (University of Nebraska–Lincoln) for providing the yield monitor data used in this study. This material is based on research supported by the Hatch Act, the USDA-CSREES/NASA program on Application of Geospatial and Precision Technologies (AGPT, Grant no. 2001-52103-11303), and the U.S. Department of Energy (i) EPSCoR program, Grant no. DE-FG-02-00ER45827, and (ii) Office of Science, Biological and Environmental Research Program (BER), Grant no. DE-FG03-00ER62996.


    NOTES
A contribution of the Univ. of Nebraska Agric. Res. Div., Lincoln, NE. Journal Ser. no. 14303.


    REFERENCES




This article has been cited by other articles:

J. L. Ping, R. B. Ferguson, and A. Dobermann. Site-Specific Nitrogen and Plant Density Management in Irrigated Maize. Agron. J., June 23, 2008; 100(4): 1193-1204.

K. A. Sudduth and S. T. Drummond. Yield Editor: Software for Removing Errors from Crop Yield Maps. Agron. J., October 15, 2007; 99(6): 1471-1482.

Y. Miao, D. J. Mulla, P. C. Robert, and J. A. Hernandez. Within-Field Variation in Corn Yield and Grain Quality Responses to Nitrogen Fertilization and Hybrid Selection. Agron. J., January 5, 2006; 98(1): 129-140.

