USDA-ARS, Cropping Systems and Water Quality Research Unit, 269 Agricultural Engineering Bldg., Univ. of Missouri, Columbia, MO 65211
* Corresponding author (ken.sudduth{at}ars.usda.gov)
ABSTRACT
Abbreviations: DELAY, grain flow delay filter; END, end pass delay filter; MAN, manual filter; MAXV, maximum velocity filter; MAXY, maximum yield filter; MINS, minimum swath width filter; MINV, minimum velocity filter; MINY, minimum yield filter; POS, position filter; SMV, smooth velocity filter; START, start pass delay filter; STDY, standard deviation filter.
Received for publication November 17, 2006.
Yield maps are a key component of precision agriculture, due to their usefulness in both development and evaluation of precision management strategies. The value of these yield maps can be compromised by the fact that raw yield maps contain a variety of inherent errors. Researchers have reported that 10 to 50% of the observations in a given field contain significant errors and should be removed. Methods for removing these outliers from raw yield data have not been standardized, although many different filtering techniques have been suggested to address specific error types. We developed a software tool called Yield Editor to simplify the process of applying filtering techniques for yield data outlier detection and removal. Yield Editor includes a map view of the yield data, allowing the user to interactively set, assess the effects of, and refine a number of previously reported automated filtering methods. Additionally, Yield Editor allows manual selection of erroneous points, transects, or regions for investigation and possible deletion. This paper describes the filters implemented in Yield Editor, discusses input, output, and filtering options, and documents availability of the program. Example applications of Yield Editor on five test fields are used to show how the user interacts with the software and to analyze the relative importance of the various filters.
INTRODUCTION
Most common error sources have been well defined and described in the literature (e.g., Blackmore and Marshall, 1996; Moore, 1998; Blackmore and Moore, 1999; Thylén et al., 2001). The error sources these authors noted included unknown header width, combine filling/emptying times, time lag of grain through the combine, positional errors, rapid velocity changes, and others. They also noted the importance of addressing the errors and suggested different methods for removing them or minimizing their effects.
Currently, no standard method exists for cleaning raw yield data, although many different filtering or screening techniques have been suggested to address specific error types (e.g., Blackmore and Moore, 1999; Drummond et al., 1999; Thylén et al., 2001; Beal and Tian, 2001; Beck et al., 2001; Arslan and Colvin, 2002; Chung et al., 2002; Yang et al., 2002; Simbahan et al., 2004). While many of these techniques could be implemented in a spreadsheet or in simple programming code, some would require a more sophisticated mapping or GIS application. Further, the selection of numerical parameters for the various filters might be made more efficient and accurate through interaction with map views of the changed dataset.
This paper reports on Yield Editor, a software tool we constructed to apply a number of yield data filtering techniques and to provide the user with feedback on the effect of the applied filters on the yield map. Included are descriptions of the filters implemented in Yield Editor, an outline of program characteristics, an example of Yield Editor use, and information on availability of the software.
METHODS OF DATA FILTERING IMPLEMENTED IN YIELD EDITOR
Grain Flow Delay (DELAY)
This parameter corrects for the transport time between the location where the crop is harvested and the location where flow rate is sensed. Although harvesting and separation processes in a harvester are generally complex and more properly represented by dynamic models (e.g., Whelan and McBratney, 2002), Birrell et al. (1996) found no practical advantage to using a dynamic model instead of a simple time delay. Several automated (Beal and Tian, 2001; Chung et al., 2002; Yang et al., 2002; Mueller-Warrant and Whittaker, 2006) and semiautomated (Robinson and Metternicht, 2005) procedures for determining the delay parameter have been proposed. Because delay time can be affected by design of harvesting equipment, speed, ground slope, load, and other factors, it is best to determine the DELAY parameter value for the harvest conditions within a particular field. Figure 1, showing a portion of a field where adjacent transects were harvested in opposite directions, illustrates the importance of setting the delay parameter properly. With a flow delay of 9 s, there is a discernible offset in the low-to-high yield transition, depending on the direction of harvest. When a 14-s delay is applied, the pattern is much more spatially contiguous (Fig. 1), indicating that 14 s would be an appropriate choice for the DELAY parameter. Currently Yield Editor requires the user to select the optimum DELAY value by iterative adjustment and graphical observation of the effects (e.g., Fig. 1). Although it is generally quite easy to select the optimum DELAY manually, future versions of Yield Editor may incorporate one of the automated methods described above.
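The simple time-delay correction can be sketched as follows. This is only an illustration under stated assumptions, not Yield Editor's code: it assumes a fixed logging interval, and the function name `apply_flow_delay` and its arguments are hypothetical.

```python
def apply_flow_delay(positions, flows, interval_s, delay_s):
    """Pair each position with the flow sensed delay_s seconds later.

    Assumes (hypothetically) that readings are logged at a fixed
    interval of interval_s seconds along a single transect.
    """
    shift = round(delay_s / interval_s)  # delay expressed in samples
    if shift <= 0:
        return list(zip(positions, flows))
    # Crop cut at index i reaches the flow sensor at index i + shift,
    # so the last `shift` positions have no matching flow reading.
    return list(zip(positions[:-shift], flows[shift:]))


pairs = apply_flow_delay([0, 1, 2, 3, 4], [10, 11, 12, 13, 14],
                         interval_s=1, delay_s=2)
```

Trying several values of `delay_s` and mapping the result mimics the iterative adjustment described above, where the preferred value is the one that aligns yield transitions on adjacent passes harvested in opposite directions.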
Minimum Velocity (MINV)
The MINV filter removes points collected at velocities less than the specified limit. It assists in removing unrealistically high instantaneous spikes in the calculated yield that can occur as velocity approaches zero. This filter was incorporated in procedures suggested by a number of researchers, including Thylén and Murphy (1996), Beck et al. (2001), and Kleinjan et al. (2002).
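A short numerical sketch shows why low velocities inflate calculated yield. It assumes yield is computed as mass flow divided by the area covered per unit time (velocity times swath width); the units, field names, and function names are illustrative assumptions rather than details from Yield Editor.

```python
def yield_kg_per_ha(flow_kg_s, velocity_m_s, swath_m):
    """Yield from mass flow and area rate (assumed units: kg/s, m/s, m)."""
    area_rate_m2_s = velocity_m_s * swath_m
    return flow_kg_s / area_rate_m2_s * 10_000  # 10,000 m2 per hectare


def minv_filter(points, min_velocity):
    """Keep only points logged at or above the minimum velocity."""
    return [p for p in points if p["velocity"] >= min_velocity]


normal = yield_kg_per_ha(5.0, 2.0, 5.0)   # typical harvest speed
spike = yield_kg_per_ha(5.0, 0.1, 5.0)    # same flow, combine nearly stopped
kept = minv_filter([{"velocity": 0.1}, {"velocity": 2.0}], min_velocity=0.5)
```

With the same sensed flow, a twentyfold drop in velocity gives a twentyfold increase in calculated yield, which is the kind of spike the MINV threshold is meant to remove.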
Smooth Velocity (SMV)
This filter eliminates data where rapid velocity changes have occurred. The SMV parameter represents an allowable ratio of velocities from one point to the next point along a transect. For example, a ratio of 0.2 indicates that if the velocity at the current point varies by more than 20% from the velocity at the previous point, the current observation will be deleted. The importance of dealing with rapid velocity changes was noted by Thylén and Murphy (1996) and by Kleinjan et al. (2002).
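The ratio test can be sketched as below. The sketch assumes each point is compared with the immediately preceding raw point on the same transect, whether or not that point was itself deleted; the source does not specify this detail, and `smv_filter` is a hypothetical name.

```python
def smv_filter(velocities, max_ratio):
    """Return a keep/delete mask for one transect.

    A point is deleted when its velocity differs from the previous
    point's velocity by more than max_ratio (0.2 means 20%).
    """
    keep = []
    for i, cur in enumerate(velocities):
        if i == 0:
            keep.append(True)  # first point has no predecessor
            continue
        prev = velocities[i - 1]
        change = abs(cur - prev) / prev if prev > 0 else float("inf")
        keep.append(change <= max_ratio)
    return keep


mask = smv_filter([2.0, 2.1, 3.0, 3.1], max_ratio=0.2)
```

Here only the third point is deleted, because the jump from 2.1 to 3.0 is a change of about 43%.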
Minimum Swath (MINS)
This filter removes data points with swath width readings smaller than the minimum. If the combine operator has accurately entered all swath width changes into the yield monitor during harvest, this filter may be used to help eliminate point rows and narrow swaths, where lower crop flow increases the opportunity for "noise" on the yield signal. For example, a procedure described by Robinson and Metternicht (2005) removed swaths that were 30% narrower than adjacent swaths. The effectiveness of the MINS filter depends on the operator manually adjusting the swath indicator on the yield monitor, an action the operator may be unwilling to take or may forget. Beck et al. (2001) noted this issue and suggested that the most practical solution was to avoid recording data with narrower widths. Research into swath width sensing (Reitz and Kutzbach, 1996) and calculation of swath width from GPS data (Han et al., 1997; Drummond et al., 1999) may provide alternative ways of quantifying partial swaths, but such systems are not yet commercially available.
Maximum Yield (MAXY) and Minimum Yield (MINY)
These filters set yield thresholds above (MAXY) and below (MINY) which points will be deleted. They are commonly used in filtering procedures (Beck et al., 2001; Simbahan et al., 2004) and are sometimes the only filters applied. The MINY parameter should be chosen to represent the lowest yield that can be realistically expected. This lowest realistic yield may be near zero if stresses (e.g., ponding, severe water stress) have caused crop failure at some within-field locations. Beck et al. (2001) suggested setting MINY near zero and believed that redundancy with other filters would generally mean that bad low-yielding points would be identified as erroneous based on other information. The MAXY level should represent crop yield potential to avoid eliminating potentially valid data. Simbahan et al. (2004) suggested using information from high-yield trials or crop simulation models to set year- and site-specific upper yield limits.
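As a minimal sketch, the two thresholds amount to a range check that also records which limit was violated. The function name and the use of `None` for retained points are assumptions made for illustration.

```python
def yield_range_flags(yields, min_yield, max_yield):
    """Label each yield value 'MINY', 'MAXY', or None (retained)."""
    flags = []
    for y in yields:
        if y < min_yield:
            flags.append("MINY")
        elif y > max_yield:
            flags.append("MAXY")
        else:
            flags.append(None)
    return flags


flags = yield_range_flags([0.2, 8.5, 25.0], min_yield=0.5, max_yield=16.0)
```

Keeping the label rather than silently dropping the point makes it possible to review later whether a low value was a true crop failure or an error.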
Standard Deviation of Yield (STDY)
The STDY filter removes yield data that are more than a certain number of standard deviations from the field mean. This filter has been suggested by many researchers, with commonly reported values of 2 (e.g., Thylén et al., 2001) or 3 (e.g., Ping and Dobermann, 2005) standard deviations. The need to adjust this value depending on the range of true yield variation was noted by Simbahan et al. (2004). Optimization of the STDY parameter for a particular field could be achieved by iteratively choosing values and observing the effect on the resulting yield data distribution. The goal would be to select a final value for STDY that would remove isolated outliers without affecting areas of true yield variation.
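The effect of choosing 2 versus 3 standard deviations can be explored with a sketch like the following. It assumes the sample standard deviation of the whole field is used; whether Yield Editor uses the sample or population form, or recomputes statistics after other filters, is not stated here.

```python
import statistics


def stdy_filter(yields, n_std):
    """Keep points within n_std standard deviations of the field mean."""
    if len(yields) < 2:
        return [True] * len(yields)  # too few points to estimate spread
    mean = statistics.mean(yields)
    sd = statistics.stdev(yields)  # sample standard deviation (assumption)
    return [abs(y - mean) <= n_std * sd for y in yields]


yields = [10.0] * 9 + [100.0]
mask_2sd = stdy_filter(yields, 2)
mask_3sd = stdy_filter(yields, 3)
```

In this small example the value of 100 lies about 2.8 standard deviations from the mean, so it is removed at a setting of 2 but retained at 3. Note that a large outlier inflates the standard deviation itself, which is one reason the parameter benefits from iterative adjustment.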
Position (POS)
This filter removes positional "flyers" consisting of single points or data segments that lie outside the boundaries of the field of interest. Although the reliability and accuracy of differential GPS positioning data have increased in recent years, positional problems can still occur on occasion, for example with a loss of the differential signal. If data points are within the field but exhibit positional error, they can be deleted using the MAN filter described below.
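One simple way to express such a test is a rectangular bounding box, sketched below. The rectangular boundary, the coordinate names, and the function `pos_filter` are assumptions for illustration; an irregular field boundary would need a point-in-polygon test instead.

```python
def pos_filter(points, x_min, x_max, y_min, y_max):
    """Return True for points inside the (assumed rectangular) field limits."""
    return [
        x_min <= x <= x_max and y_min <= y <= y_max
        for x, y in points
    ]


# Coordinates in arbitrary projected units (e.g., meters easting, northing).
inside = pos_filter([(10, 10), (500, 20), (15, -3)],
                    x_min=0, x_max=100, y_min=0, y_max=100)
```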
Manual (MAN)
Almost invariably, yield data will include some data points which are clearly in error, yet are not easily captured by any automated filter. For example, a small but significant number of errors can be introduced when the combine operator harvests a narrow "cleanup" swath in a field, but does not correctly record the swath width. In another case, the combine operator may forget to lift and lower the head when leaving/entering the crop, precluding the START and END filters from properly removing what may be a significant number of errors. The manual filter allows the user to select individual points, transects or regions for removal, addressing the case where automated filters fail to recognize errors that are visually obvious.
YIELD EDITOR PROGRAM DESCRIPTION
The initial target audience for Yield Editor was researchers who needed rapid, accurate, and repeatable procedures for cleaning yield data from multiple trials. However, others who use yield maps (e.g., consultants, extension staff, or producers) also recognize the magnitude of errors that exist in yield maps, and the importance of cleaning maps that are to be used for site-specific decision making. Yield Editor was beta tested on numerous crops, including corn (Zea mays L.), soybean [Glycine max (L.) Merr.], wheat (Triticum aestivum L.), grain sorghum [Sorghum bicolor (L.) Moench], oat (Avena sativa L.), and barley (Hordeum vulgare L.), by researchers, extension staff, consultants, and producers from around the world. The software incorporates changes suggested by these users.
Importing Data
Yield Editor provides several options for importing data. The most straightforward method is to import data in either AgLeader (AgLeader Technologies, Ames, IA) advanced format or Greenstar (Deere & Co., Moline, IL) text format. If the yield monitor data are not in one of these formats, a spreadsheet application can be used to rearrange the data. Yield Editor will work correctly if position, grain flow, logging interval, distance, swath width, and pass number are included in the column locations corresponding to the AgLeader advanced or Greenstar text formats.
For greatest control over the editing process within Yield Editor, and to avoid the possibility of losing useful information at the end of transects, combine delay export parameters from the yield monitor data management software (e.g., AgLeader SMS Basic) should be set to minimize the number of points deleted during the export. Yield Editor can easily filter any excess points, but obviously cannot recover points which have not been exported. Grain flow delay time should be set close (within ±5 s) to the correct value. This setting does not need to be exact, as final selection of the delay time will take place within Yield Editor. A delay time export setting of 12 s should be appropriate for most combines and operating conditions. This is within 5 s of the delay times reported by a number of authors (Birrell et al., 1996; Reitz and Kutzbach, 1996; Lark et al., 1997; Beal and Tian, 2001; Chung et al., 2002; Yang et al., 2002; Simbahan et al., 2004; Ping and Dobermann, 2005) who used various models of combines in tests with corn, soybean, wheat, grain sorghum, and barley. Start and stop delays should be set to minimal values (zero if possible). Minimum and maximum yield filters should be disabled, and the export of positional flyers and/or points where the header has been raised should be allowed as well, if the yield monitor data management software allows selection of these options. Each of these filters is simple to implement in Yield Editor, and the removal of points is documented in the session file created by the software.
A second import option allows accessing yield data in the native binary format present on the data card used by the yield monitor for storage. This is accomplished by opening the binary card image in a freeware package called FOViewer (MapShots, Inc., Cumming, GA). FOViewer uses field operation device drivers that are available for all major yield monitoring systems to interpret the binary data records. It then provides a map of the data on the card and allows the user to select any desired subset. Once a dataset has been selected, Yield Editor can be launched from within FOViewer with that dataset ready for analysis.
Finally, data that has previously been imported using either of the above methods may have been saved in a Yield Editor session file. These session files can be reloaded directly into Yield Editor, and they will maintain all metadata regarding the current status of the map cleaning process. For example, information about what filters are active and their associated parameter values will be automatically loaded. Points that have been removed with any manual editing procedures are marked as such, and can be reinstated if the user chooses to do so. Lastly, time-stamped metadata related to file I/O operations and user notes are maintained within the session file, so that a record of the entire yield cleaning process is preserved.
Applying Data Filters
The filtering procedure in Yield Editor consists of selecting which filters to use and determining and setting parameters for those filters. The software includes several tools to assist the user with this process. The effects of the automated filters can be highlighted or individually displayed on the map at any time to indicate how effective the current filter settings are at removing errors and where these removals have occurred. Statistics to indicate the number of points removed by each of the filters, as well as the effect on the yield mean, standard deviation, coefficient of variation, number of observations, and data range are included and updated each time new filters are applied. The filters and their associated parameters and visualization tools are all accessed graphically from the Yield Editor filtering screen (Fig. 3).
Exporting Data
Data can be exported from Yield Editor in three ways. The most straightforward of these is to export the data in a user-defined text format. The user can specify which of the fifteen data fields should be exported to the file, what delimiter to use, and whether to export the nonfiltered points, the filtered points, currently selected points, or all of the points. A status flag variable can be exported to indicate whether each point has been filtered or not, and, if appropriate, which filter(s) caused the removal of that point.
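A delimited export with a status flag might look like the sketch below. The field names, the `flags` list on each point, the `OK` marker, and the `which` options (`"kept"`, `"removed"`, `"all"`) are all hypothetical; Yield Editor's actual file layout is not reproduced here.

```python
import csv
import io


def export_points(points, fields, delimiter=",", which="kept"):
    """Write selected fields plus a status column to delimited text.

    points: list of dicts, each with a 'flags' list naming the filters
    that removed it (an empty list means the point was retained).
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(list(fields) + ["status"])
    for p in points:
        removed = bool(p["flags"])
        if which == "kept" and removed:
            continue
        if which == "removed" and not removed:
            continue
        status = "+".join(p["flags"]) or "OK"
        writer.writerow([p[f] for f in fields] + [status])
    return out.getvalue()


points = [
    {"x": 1, "y": 2, "yield": 5.0, "flags": []},
    {"x": 3, "y": 4, "yield": 99.0, "flags": ["MAXY", "STDY"]},
]
text = export_points(points, ["x", "yield"], delimiter="\t", which="all")
```

Exporting the flag alongside each point preserves the information needed to analyze which filters were responsible for each removal.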
FOViewer also provides output options. If Yield Editor was launched from within FOViewer, when Yield Editor is closed, the data, including status flag information, is automatically ported back into FOViewer. The map can then be exported from FOViewer in numerous formats, including many text options, database (.dbf), and shapefile formats.
As noted in the import section, Yield Editor session files can also be created. These files allow the user to store the current state of each data point as well as the current state of each filter and associated parameter settings. Session files also include metadata regarding file I/O and user notes that can be invaluable when the data are later used for analysis or decision making tasks. Session files were designed to package the settings and results of the entire data filtering process to facilitate easy sharing of the information among multiple individuals working with the same dataset.
EXAMPLE USE OF YIELD EDITOR: EFFECTS OF FILTERING TECHNIQUES
To set the DELAY filter, an area of each field with significant spatial yield variability but not near the end of any transect was selected, preferably an area with many adjacent transects that were traversed in opposite directions during harvesting. By iteratively adjusting the DELAY filter, a value was determined for each dataset which, by visual inspection, best represented the real spatial variability trends and minimized the sawtooth pattern that was evident when grain flow delay was set incorrectly (e.g., Fig. 1). At this point, the DELAY value for each dataset was fixed for the remainder of the processing and recorded.
Next, START and END filter values were set to remove the ramping effect seen when entering and exiting the crop. Optimum values for these filters were variable from dataset to dataset, and also could change within a field, as the timing with which the operator raised and lowered the head was often different from pass to pass. In fact, there were sections of one field where the operator never lifted the head, and several areas where the operator lifted the head in a low spot, but continued harvesting. In general, values were selected that removed obvious ramping (e.g., Fig. 2) for the majority of the transect ends. If there were transects with obviously longer start and end delays (as described above), additional points were removed with the MAN filter later in the procedure. In some cases, additional minor adjustments to the START and END values were made during the manual filtering procedure.
Next, a distribution of harvest speeds was investigated, and reasonable values for the MINV, MAXV, and SMV filters were selected. In general, these filters removed relatively few observations. The MINV filter was the most difficult of these to set, as the value needed to be high enough to remove the potentially large magnitude yield errors introduced by extremely low velocities, without removing clearly valid observations from the datasets.
The MINS filter was then applied, using a value that would remove any data recorded with a cutting width less than half of the full header width. In the datasets used in this study, combine operators did not often adjust this parameter, even though partial swaths were evident in the raw data, so few observations were removed by this filter (partial swaths were removed later with the MAN filter). Once set, no further adjustments were applied to the MINS filter.
Next, the MINY, MAXY, and STDY parameters were adjusted. The MAXY parameter was set such that no obviously valid, spatially contiguous areas were removed. The MINY parameter was more difficult to set, as most of the datasets had some areas where yield data appeared to be valid, yet yields were at very low levels. Therefore, the challenge was to set the MINY parameter low enough to retain the real data, but high enough to remove as many clearly unreasonable points as possible. The STDY parameter was also somewhat difficult to set. In several datasets, a level of 3 standard deviations was able to remove obvious outliers without removing reasonable data. However, filtering other datasets at 3 standard deviations removed what was perceived to be valid data, and the STDY parameter was set as high as 4 standard deviations for these datasets.
Once parameterization of the automated filters was complete, the manual filtering procedure was initiated. This procedure was performed by an individual experienced in the processing of yield data. Most of the points removed by this procedure were due to one or more of the following obvious problems (in order from most to least common): end of transect issues not removed by the START and END filters, narrow swaths not marked by the combine operator, erratic yield estimates near (but not precisely on) a velocity change or stop, and positioning errors within the field where GPS differential correction was lost. In the manual filtering procedure, individual points, transects and/or regions were manually selected and deleted as necessary. During this procedure, slight modifications were occasionally made to the START, END, MINV, MAXV, SMV, MINY, MAXY and STDY parameters, if obvious improvements could be made to the automated components of the filtering process.
Yield Editor allowed this complex filtering procedure to be completed quickly. For all five fields, the entire procedure was completed in about 3 h. Although it contained the most points, Dataset A required the least amount of manual filtering, and processing was completed in about 10 to 15 min. Dataset E, which required many manual edits and refinements to filter parameters, was completed in about 90 min. At the completion of the yield cleaning process for each dataset, the automated filter parameters, the remaining "clean" data points, and the deleted data points were all recorded to files. In addition, a flag value indicating which filter or filters were responsible for the removal of each point was recorded for further analysis.
Results and Discussion
Statistics for each raw and cleaned dataset were compared (Table 1). The percentage of points removed from each dataset ranged from 13 to 27% of the raw number of data points, well within the range of results described by other researchers (e.g., Blackmore and Moore, 1999; Thylén et al., 2001; Simbahan et al., 2004). Not surprisingly, a larger percentage of data points was removed from the smaller fields (B, D) and from fields where fragmentation or topography reduced average transect lengths (B, D, E). Figure 5 shows the location of the points removed in each field. Similar to the findings of Simbahan et al. (2004), the majority of the removed points were in the headland areas at the beginning and end of passes, with fewer data points dispersed throughout the remainder of the field.
After filtering, considerable improvement was seen in the spatial structure of the datasets. Semivariograms of the raw datasets (Fig. 6) exhibited a large nugget effect, with the variance at zero distance being 47 to 100% of the total variance in the dataset. This indicated a high proportion of short-distance measurement error or noise. Application of the automated filtering procedures (all filters except MAN) greatly reduced the total variance in each dataset and also reduced the portion of the total variance present in the nugget to the range of 13 to 68% (Table 1). Consistent with previous research (Thylén et al., 2001; Simbahan et al., 2004), the general spatial structure of the datasets remained, with the main change being an overall downward shift of the semivariance curve due to the smaller nugget effect (Fig. 6). A further reduction in the total variance was seen when the MAN filter was applied. The size of this reduction (Fig. 6) varied depending on the complexity of the original dataset and the ability of the automated filters to remove errors.
However, there is redundancy among the various filter types. For these datasets, many error observations were detected by several filters at once, as can be seen by comparing the information in Tables 1 and 2. Using Dataset A as an example, if all the detections in Table 2 were unique, then 20.3% of the observations in the raw dataset would have been removed. However, only 12.6% of the observations were actually removed (Table 1). To account for this redundancy, an analysis similar to that used to create Fig. 7 was completed, but restricted to those errors that were only detected by a single filter type (Fig. 8).
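The distinction between summed detections and points actually removed (20.3% versus 12.6% for Dataset A) can be illustrated with per-point filter flags. The sketch below uses made-up flag sets and a hypothetical function name; it shows the bookkeeping only, not the published analysis.

```python
from collections import Counter


def redundancy_summary(flag_sets):
    """Summarize overlap among filters from per-point sets of filter names."""
    total = len(flag_sets)
    per_filter = Counter(f for flags in flag_sets for f in flags)
    removed = sum(1 for flags in flag_sets if flags)
    # Points caught by exactly one filter, tallied by that filter.
    unique = Counter(next(iter(flags)) for flags in flag_sets
                     if len(flags) == 1)
    return {
        "pct_if_all_unique": 100 * sum(per_filter.values()) / total,
        "pct_removed": 100 * removed / total,
        "unique_by_filter": dict(unique),
    }


# Illustrative data: five points, one of them caught by two filters.
summary = redundancy_summary(
    [set(), {"START"}, {"START", "MINV"}, {"MAN"}, set()]
)
```

The gap between the two percentages measures redundancy, and the unique-detection tally corresponds to the idea of ranking filters by the errors that only they detect.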
Using the information in Fig. 8, an order of importance for the filters would be: MAN, START, END, and then the nine other, relatively noncritical filters. As with the relative order of importance derived from Fig. 7, this does not tell the complete story. For example, the DELAY filter removed only a few points from the raw datasets, and its value is not as a true filter, since the START and END filters are efficient at removing end-of-transect points. However, if the DELAY parameter is not set correctly, the quality of the resulting yield map will be greatly compromised (e.g., Fig. 1). Many of the other filters also fulfill important roles, and ignoring several of them is likely to introduce significant errors into the datasets. However, there is a level of redundancy among the other filters that is lacking for MAN, START, and END, indicating that these filters are of critical importance and must be applied in the yield filtering process.
Based on an overall assessment of these results, a suggested procedure for cleaning raw yield datasets is given in Fig. 9. While some of the filters (i.e., DELAY, START, END, and MAN) are needed when processing any yield dataset, others may or may not be applied, depending on the specific characteristics of the dataset. The flowchart given in Fig. 9, along with the procedural information provided in this article and the tutorial section of the Yield Editor manual (available for download as described above), provides most of the information an individual needs to use the software effectively. Other useful background includes a basic knowledge of combine and yield monitor operation. With these in place, the process of learning how to filter data with Yield Editor is a fairly rapid one. In our experience, someone new to the program needs less than a day to read the documentation and understand how the software works. Generally, new users have become adept at yield data filtering after applying Yield Editor to data from a dozen or so fields.
YIELD EDITOR SPECIFICATIONS AND AVAILABILITY
Although it is not required, a freeware application called FOViewer, developed by Mapshots, Inc. (Cumming, GA) allows Yield Editor to directly import card image data in the native formats supplied by a number of different yield monitor manufacturers. FOViewer, along with device drivers for each of the major manufacturers, can be downloaded free of charge from the Mapshots, Inc. site at www.mapshots.com/FODM/fodd.asp [verified 30 July 2007].