Australian Centre for Precision Agriculture, McMillan Building, A05, The Univ. of Sydney, NSW 2006, Australia
* Corresponding author (james.taylor{at}usyd.edu.au)
| ABSTRACT |
|---|
Abbreviations: ACPA, Australian Centre for Precision Agriculture; CI, confidence interval; ECa, apparent soil electrical conductivity; PA, precision agriculture; PAWC, plant available water capacity.
Received for publication February 22, 2007.
The delineation and management of homogeneous classes within a field is an important step in the evolution from uniform field management to site-specific management. The adoption of class or zone management on-farm has been slow due to an extension gap between researchers and producers. To overcome this, a protocol has been developed using freeware and shareware programs freely available over the Internet. This protocol promotes a cost-effective approach to class management at a grower and consultant level. Users begin with raw data and in a stepwise process clean, interpolate, and then cluster the data to develop management classes. The protocol has been developed for non-irrigated broadacre (>20 ha) Australian grain production systems but is readily transferable to other production systems with suitable local agronomic knowledge. A case study highlighting the protocol as well as possible problems and pitfalls is presented to assist potential users. Some limitations and further areas of protocol development to refine the technique are briefly discussed.
| INTRODUCTION |
|---|
The terms management zone and management class are frequently used in precision agriculture (PA) literature and often as interchangeable terms. However, these terms are not identical. A management class is the area to which a particular treatment may be applied. A management zone is a spatially contiguous area to which a particular treatment may be applied. Thus, a management class may consist of numerous zones, whereas a management zone can contain only one management class.
Many different data sources may be used to delineate management classes. Examples in the literature range from hand-drawn farmer "mud" maps (Fleming et al., 2000), yield data (Diker et al., 2004; Flowers et al., 2005), soil survey data (Dillon et al., 2005; Franzen et al., 2002), imagery (Stewart and McBratney, 2001; Taylor et al., 2002), proximal soil sensors (Lund et al., 2001), and combinations of these (Koch et al., 2004; Fleming et al., 2004; Schepers et al., 2005; Whelan et al., 2002). Determining the optimum data layers needed to delineate management classes for a particular field without a priori information is difficult. In general, emphasis should be given to sensors that record known or expected yield determining factors, particularly soil moisture in Australia, and sensors that measure the final crop response. The adoption of ECa sensors in Australia is increasing because the ECa signal provides an integration of the clay, moisture, and possible salinity effects at a site. The use of high-accuracy GPS receivers for geo-referencing ECa sensors also permits the simultaneous collection of another data layer, elevation, which influences water movement and soil development in the local environment.
Recording the actual yield response is critical. Without this data it is difficult to fully understand how the overall environment is affecting production. Until 2004, site-specific measurement of the final crop (grain) response was restricted to yield (mass) but may now include quality (protein). Data from on-the-go protein monitors increase the ability of growers to budget for plant nitrogen requirements (Taylor et al., 2005); however, for this paper, the final crop response is limited to yield data. Aerial or satellite imagery may also be useful for within-season crop management. Nevertheless, imagery needs to be used carefully for management class delineation because it provides only a within-season snapshot and may not relate to final crop yield. Reported issues with grain yield prediction from within-season imagery highlight this problem (Uno et al., 2005). For this paper, management class delineation is performed using yield, ECa, and elevation data layers. From the authors' experience, these data are easily accessible and interpretable on-farm. In defining this protocol, it is important to use data layers that are easily accessible and indicative of production potential so that the protocol can be extended widely into differing cropping systems. These data fit this description; however, this does not preclude the use of other, valid data layers in the protocol that may be relevant to a specific cropping system.
There have been many different approaches to define the shape and number of management classes and zones ranging from the simplistic to statistically complex. These include hand-drawn polygons on yield maps or imagery (Nehmdahl and Greve, 2001; Fleming et al., 2000), normalized yield classification (Swindell, 1997), supervised or unsupervised classification of imagery (Stewart and McBratney, 2001; Anderson and Yang, 1996), spectral filters using Fast Fourier Transformation (Zhang and Taylor, 2000), hard k-means cluster analysis (Taylor et al., 2002; Whelan et al., 2002), fuzzy k-means cluster analysis (Lark and Stafford, 1997; Burrough and Swindell, 1997), and multivariate hard k-zones (Shatar and McBratney, 2001). Most of these require specialized software and/or statistical knowledge.
Although the management class philosophy is not new in a research context, adoption by the commercial farming sector has been poor. This may be attributed to conservatism from growers and/or consultants toward new techniques, poor extension on the part of researchers, doubts from growers about the economic return from investment in PA technology, and a lack of tools and training to assist growers and consultants to adopt the management class philosophy. The aim of this paper is to fill part of the current knowledge gap by providing a protocol for the delineation of management classes based on common data sets. The protocol is derived with the aid of existing freeware and shareware programs that permit a cost-effective approach for commercial adoption. The programs used are a coordinate conversion program (GEOD), an interpolation program (VESPER), and a fuzzy k-means clustering program (FuzME). The protocol aims to improve the statistical rigor in the management class delineation process within a framework that is understandable and economically viable for industry personnel. The second part of this paper presents a case study that illustrates the use of the proposed protocol.
| PROTOCOL |
|---|
150 field years of yield data and discussions with growers on sensible yield values. Data trimming with files of less than 65,536 points can be performed in Microsoft Excel (Microsoft Office Professional Edition 2003) using the "Filter" and "Sort" commands. Larger files need to be split or analyzed in another software package. When sorting and filtering the data, it is important to maintain the integrity of the coordinates associated with each data point. Some PA programs provide trimming features for raw data. Users should understand how these features work before accepting the output.
Before data deletion in Step 2, the data to be deleted should be plotted, using the coordinates, to visualize the data. This allows the analyst to identify if any of the data selected to be deleted are a real effect (e.g., a local low area within a field). However, from experiences within the Australian Centre for Precision Agriculture (ACPA), the majority of these data result from harvesting artifacts (such as when the harvester is turning or entering/exiting the crop) or sensor error.
Inliers are much more difficult to identify and remove. Inliers may be identified using a statistic such as Moran's Index, applied in a local context, to identify individual data values that differ significantly from the local neighborhood values. To the authors' knowledge, there is no user-friendly software available to perform this analysis. Alternatively, the diagnostics of harvester operation can be used to identify when erroneous data (inliers or outliers) are likely to be collected (e.g., when the harvester is turning, when the speed is too slow, or when the front is not cutting a full width). Some PA programs, such as Yield Editor from the University of Missouri, are designed to trim yield data based on harvester diagnostics, and these programs can be very effective. However, the simple two-step process described previously has proven sufficient in ACPA applications for cleaning the data for subsequent use.
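The local use of Moran's Index mentioned above can be sketched in a few lines. This is a minimal illustration, not a tool used in the protocol: the function name `local_morans_i`, the fixed search radius, and the equal weighting of neighbours are all assumptions of the sketch. A strongly negative score marks a value that is unlike its neighbours, which is the signature of an inlier.

```python
import math

def local_morans_i(points, radius):
    """Return one local Moran's I score per (x, y, value) point.

    Neighbours are all other points within `radius` metres, weighted
    equally (row-standardised). Both choices are assumptions of this sketch.
    """
    n = len(points)
    mean = sum(v for _, _, v in points) / n
    z = [v - mean for _, _, v in points]      # deviations from the field mean
    m2 = sum(d * d for d in z) / n            # population variance
    scores = []
    for i, (xi, yi, _) in enumerate(points):
        neigh = [z[j] for j, (xj, yj, _) in enumerate(points)
                 if j != i and math.hypot(xi - xj, yi - yj) <= radius]
        if not neigh or m2 == 0:
            scores.append(0.0)                # no neighbours or no variation
            continue
        lag = sum(neigh) / len(neigh)         # mean deviation of the neighbours
        scores.append(z[i] * lag / m2)
    return scores

# Hypothetical 5 x 5 grid at 10-m spacing: a low-yielding block (columns 0-1),
# a high-yielding block (columns 2-4), and one low value placed inside the
# high block (row 2, column 3). That value is within the field's range,
# so simple range trimming would not catch it.
grid = []
for row in range(5):
    for col in range(5):
        value = 1.0 if col < 2 else 3.0
        if (row, col) == (2, 3):
            value = 1.0
        grid.append((col * 10.0, row * 10.0, value))

scores = local_morans_i(grid, radius=15.0)
suspect = min(range(len(grid)), key=lambda i: scores[i])
```

Points with the most negative scores would then be plotted and inspected before deletion, in the same way as the outliers discussed earlier.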
The final step in data clean-up is to convert Geographic coordinates into Cartesian coordinates. This permits data to be displayed in absolute distances (meters) rather than relative distances (degrees) and allows for easier visual interpretation of the data. It is recommended that Australian growers convert their data into a Universal Transverse Mercator (UTM) projection using the 1994 Geodetic Datum of Australia (GDA94). A freeware program, GEOD and associated manual, that performs coordinate transformation is available from the New South Wales Department of Lands (http://www.lands.nsw.gov.au/survey_mapping/surveying/gda/geod_software; verified 3 Aug. 2007). Most Geographical Information System software also provides the ability to transform coordinates. For readers outside of Australia, many government agencies also provide freeware coordinate conversion programs.
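To make the conversion concrete, the sketch below projects latitude and longitude onto a UTM grid using the standard transverse Mercator series expansion on the GRS80 ellipsoid, which underlies GDA94. It is not the GEOD program and has not been checked against it; the function name `latlon_to_utm` is hypothetical, and results should be verified against GEOD or a GIS before use.

```python
import math

# GRS80 ellipsoid parameters and the UTM central scale factor.
SEMI_MAJOR_M = 6378137.0
INVERSE_FLATTENING = 298.257222101
K0 = 0.9996

def latlon_to_utm(lat_deg, lon_deg):
    """Return (zone, easting_m, northing_m) for a latitude/longitude pair."""
    f = 1.0 / INVERSE_FLATTENING
    e2 = f * (2.0 - f)                 # first eccentricity squared
    ep2 = e2 / (1.0 - e2)              # second eccentricity squared
    zone = int((lon_deg + 180.0) // 6) + 1
    lon0 = math.radians(zone * 6 - 183)  # central meridian of the zone
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)

    n = SEMI_MAJOR_M / math.sqrt(1.0 - e2 * math.sin(phi) ** 2)
    t = math.tan(phi) ** 2
    c = ep2 * math.cos(phi) ** 2
    a = (lam - lon0) * math.cos(phi)
    # Meridian distance from the equator to latitude phi.
    m = SEMI_MAJOR_M * (
        (1 - e2 / 4 - 3 * e2 ** 2 / 64 - 5 * e2 ** 3 / 256) * phi
        - (3 * e2 / 8 + 3 * e2 ** 2 / 32 + 45 * e2 ** 3 / 1024) * math.sin(2 * phi)
        + (15 * e2 ** 2 / 256 + 45 * e2 ** 3 / 1024) * math.sin(4 * phi)
        - (35 * e2 ** 3 / 3072) * math.sin(6 * phi)
    )
    x = K0 * n * (a + (1 - t + c) * a ** 3 / 6
                  + (5 - 18 * t + t ** 2 + 72 * c - 58 * ep2) * a ** 5 / 120)
    y = K0 * (m + n * math.tan(phi) * (
        a ** 2 / 2 + (5 - t + 9 * c + 4 * c ** 2) * a ** 4 / 24
        + (61 - 58 * t + t ** 2 + 600 * c - 330 * ep2) * a ** 6 / 720))
    easting = 500000.0 + x                              # false easting
    northing = y + (10000000.0 if lat_deg < 0 else 0.0)  # false northing (south)
    return zone, easting, northing

# Hypothetical point on the central meridian of zone 56.
zone, easting, northing = latlon_to_utm(-30.0, 153.0)
```

Once every record carries easting and northing in meters, distances between points, harvester swath widths, and grid spacings can all be expressed in the same units.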
Spatial Prediction of the Data
After clean-up, the data require spatial prediction and mapping onto a standard grid. When raw data are plotted as individual points, even with cleaning, they are often difficult to interpret. To make data more presentable, most software used in PA transforms the raw point data into a continuous map. Continuous maps are produced to remove some of the noise in the raw data and present a more coherent map. Apart from making a map, spatial prediction is valuable from an analytical perspective. If done correctly, it can be used to filter some of the systematic and stochastic errors in the data (Whelan et al., 2001). It also permits data from different times and/or sources to be compared. When a field is harvested, it is highly unlikely that the same location will be recorded in different years. Likewise, soil measurements and crop imagery do not occur at the same locations or even at the same scale as the yield data. This makes it difficult to merge data from different sensors and different years to perform statistical analysis. If the data are predicted onto a standard grid that is kept temporally constant, then data from different years and sensors can be simultaneously analyzed. This allows the statistical, rather than visual, identification of stable and variable areas of crop production.
Before spatial prediction of the initial data set, a standard grid needs to be established for each field and kept constant for all subsequent data sets. Gridding can be done in VESPER (Minasny et al., 2005), a shareware interpolation program available from the ACPA (www.usyd.edu.au/su/agric/acpa/vesper/vesper.html [cited February 2007; verified 23 July 2007]). The grid spacing should reflect the level of detail required, computer processing power, and analytical software capability. Typically, a square 5-m grid is used by ACPA in Australia for broadacre production systems because it approximates half the basic operational width (assuming a 9- to 10-m front on the harvester). This produces 400 grid points ha$^{-1}$. For operators constrained to Microsoft Excel, the grid size should be increased in fields larger than 163 ha to avoid exceeding the row limit. Processing power may also be problematic in large fields due to the large number of interpolations required. In either of these situations, reverting to a square 10-m grid overcomes the problem while maintaining a suitable map resolution for data display, analysis, and practical field management practices.
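The grid-density figures quoted above follow from simple arithmetic, sketched here for checking other grid spacings. The function names are hypothetical; the 65,536-row limit is the Excel 2003 limit referred to in the text.

```python
EXCEL_2003_MAX_ROWS = 65_536
SQUARE_METRES_PER_HA = 10_000

def grid_points_per_ha(spacing_m):
    """Number of grid nodes per hectare on a square grid."""
    return SQUARE_METRES_PER_HA / spacing_m ** 2

def max_field_area_ha(spacing_m, max_rows=EXCEL_2003_MAX_ROWS):
    """Largest field whose grid fits in a spreadsheet of `max_rows` rows."""
    return max_rows / grid_points_per_ha(spacing_m)

points_5m = grid_points_per_ha(5)     # 400 points per hectare
limit_5m = max_field_area_ha(5)       # about 163.8 ha
limit_10m = max_field_area_ha(10)     # about 655.4 ha
```

Moving from a 5-m to a 10-m grid cuts the number of predictions by a factor of four, which is why it relieves both the spreadsheet limit and the processing load.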
Spatial prediction can be undertaken using a variety of statistical techniques depending on the number and density of data collected. For the dense datasets collected from yield and on-the-go ECa sensors, local block kriging is the preferred approach (Whelan et al., 2001). Local block kriging can be performed with VESPER. An operational manual and more detailed instructions on the parameters required for the operation of VESPER are provided from the ACPA website (www.usyd.edu.au/su/agric/acpa [cited February 2007; verified 23 July 2007]). In general, the following rules should be adopted for local block kriging:
VESPER creates a "spaces" delimited output text file with headers. The output file has five columns of ID, X, Y, Predicted value, and Kriging SD ($\sigma_{krig}$).
Generating Management Classes
The first step post interpolation is to map the kriged data. A basic mapping feature is included in VESPER. However, if available, it is recommended that the output be displayed in a Geographical Information System or dedicated mapping software. Mapping the "Predicted value" column helps to identify if there are unusual patterns in the data due to remaining artifacts, error in interpolation, or extraneous management/environmental effects. If error or interpolation artifacts are present, the data should be re-cleaned and/or re-interpolated. If the error persists, further data analysis and "expert" assistance may be required.
$\sigma_{krig}$ for each data layer can then be collated into a single spreadsheet containing an ID and X and Y coordinates column and named appropriately. It is important to save the $\sigma_{krig}$ for later analysis of significance between clusters. The collated file should contain a minimum of one interpolated ECa layer and an elevation layer. All data layers, particularly yield data, should be analyzed for relevance with someone who knows the history of the field (i.e., the grower or agronomist). Any data that show management effects (e.g., double cropping in part of the field or unintentional differential management such as spraying or fertilizing) or extraneous environmental effects (e.g., frost, insect, animal, or disease damage) should generally be discounted from any further analysis; however, they should not be deleted. The exception occurs if a climatic effect is constant and the producer wishes to manage this effect (e.g., a regular frost hollow). Management strategies such as different cropping or stubble retention regimes within a field may bias ECa measurements, making these data unsuitable for analysis. These points highlight the necessity of discussing the patterns observed in yield and environmental maps with growers before deciding which data are to be analyzed.
Once the desired data layers for management class determination have been selected, a k-means cluster analysis can be performed using the freeware FuzME (Minasny and McBratney, 2002), which is available with an instruction manual from the Australian Centre for Precision Agriculture, The University of Sydney (www.usyd.edu.au/su/agric/acpa/fkme/FkME.html [cited February 2007; verified 23 July 2007]). FuzME uses a fuzzy k-means clustering algorithm to allocate the desired data layers into clusters, which minimizes the within-cluster variability and maximizes the diagonal distance between the mean cluster value for each data layer. FuzME requires a comma-delimited text file with headers and an ID column, consisting of the row number, in the first column. Subsequent columns contain the data to be clustered. Only valid data layers, and only their "Predicted value" columns, should be subset for input into FuzME. The $\sigma_{krig}$ data should not be included in the cluster analysis.
Specifying the "Fuzzy Exponent" as 1.01 removes any fuzziness in the algorithm and makes it analogous to a hard k-means clustering. The range specified in "Number of Classes" determines the number of output files and number of clusters into which data are segregated. The range recommended here (2–6) provides solutions for two, three, four, five, and six clusters; thus, the output will be saved to five different text files (one file per solution with a suffix "_class"). These can be joined to the original data using the ID column of row numbers. A separate output file (default name of FuzMeout.txt) provides information on the analysis including the centroid values of each data layer used for each cluster.
FuzME uses random seeding to initially allocate cluster means, so some care must be taken when undertaking the analysis. Different initial random seeds can produce different cluster outcomes. To overcome this, FuzME has an option ("Number of Trial") to repeat the analysis N times and keep the best-performing output. The output from FuzME, or any unsupervised classification software package, should be checked using local "expert" knowledge to ensure that the unsupervised clustering is agronomically valid.
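The effect of random seeding and repeated trials can be illustrated with a small hard k-means sketch. This is not FuzME and does not reproduce its options or output files; it is a plain k-means, analogous to the near-hard setting described above, and the function names and the use of within-cluster sum of squares to pick the best trial are assumptions of the sketch.

```python
import random

def _sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_once(rows, k, rng, max_iter=100):
    """One k-means run from a random choice of k rows as starting means."""
    centroids = rng.sample(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: _sq_dist(r, centroids[c]))
                      for r in rows]
        if new_labels == labels:        # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:                 # keep the old mean if a cluster empties
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    wss = sum(_sq_dist(r, centroids[lab]) for r, lab in zip(rows, labels))
    return labels, centroids, wss

def kmeans_best_of(rows, k, n_trials=10, seed=0):
    """Repeat the analysis n_trials times; keep the lowest within-cluster SS."""
    rng = random.Random(seed)
    return min((kmeans_once(rows, k, rng) for _ in range(n_trials)),
               key=lambda result: result[2])

# Hypothetical two-layer input (e.g., standardized yield and ECa values).
rows = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
        (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
labels, centroids, wss = kmeans_best_of(rows, k=2)
```

In practice the layers would usually be put on a comparable scale before clustering so that no single variable dominates the distance calculation.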
There are many statistical packages that are capable of hard or fuzzy k-means clustering. FuzME is only recommended here as a relatively user-friendly freeware program.
Determining the Optimum Number of Management Classes
One of the biggest challenges of cluster analysis is determining the optimal number of classes for management. The number of clusters (classes) chosen is usually subjective using some external "expert" knowledge of the field by the grower or consultant. The determination of significant difference between classes is problematic due to autocorrelation between data points. However, some statistical approaches have been recently proposed to provide additional decision support to growers and consultants. Fridgen et al. (2000) suggested a statistic for the comparison of within-class variance relative to total field variance (i.e., one management class per field). They calculate the weighted variances for each class as
![]() | [1] |
Cupitt and Whelan (2001) proposed a confidence interval (CI) that uses the mean kriging variance to determine if the yield response in two classes is statistically different from each. They argue that the most important point for class delineation is having sufficient yield variability between classes to warrant class-specific management. They calculated the CI as
![]() | [2] |
where $\bar{\sigma}^2_{krig}$ is the mean kriging variance.
For statistically different classes
![]() | [3] |
Through trial and error over the past 5 yr on data at ACPA, the median kriging variance ($\tilde{\sigma}^2_{krig}$) has been shown to be a more robust value than the mean kriging variance ($\bar{\sigma}^2_{krig}$) due to the positive skewness often associated with the kriging variance data. The output from VESPER has been modified to give the kriging SD ($\sigma_{krig}$) rather than variance; thus, statistical difference between classes can be determined by
![]() | [4] |
This test relies on $\sigma_{krig}$ and cannot be used on raw data or with other methods of spatial prediction. When selecting the optimum number of management classes to be used, the ability to practically manage the classes needs to be considered. Fields with large coherent zones are easier to manage differentially. Fields with management classes split into numerous small irregularly shaped zones are more difficult to manage.
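A comparison of class means based on the median kriging SD can be sketched as follows. The form of the test is an assumption: the sketch treats two classes as different when 95% intervals of half-width 1.96 times the median $\sigma_{krig}$, placed around each class mean, do not overlap. This is an assumed reading and not a transcription of Eq. [4]; the function name and the example values are hypothetical.

```python
import statistics

Z_95 = 1.96  # two-sided 95% normal quantile

def classes_differ(mean_a, mean_b, kriging_sds, z=Z_95):
    """True if intervals of +/- z * median(kriging SD) around the two
    class means do not overlap (assumed form of the test)."""
    half_width = z * statistics.median(kriging_sds)
    return abs(mean_a - mean_b) > 2 * half_width

# Hypothetical kriging SDs (t/ha) with one large value to show the skew
# that motivates using the median rather than the mean.
sds = [0.20, 0.25, 0.30, 0.90]
separated = classes_differ(3.2, 2.1, sds)
not_separated = classes_differ(3.2, 2.4, sds)
```

With the median (0.275) the first pair of class means is separated; using the mean of the same values (0.4125) would widen the interval and hide the difference, which illustrates the robustness argument made in the text.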
The need for strong spatially structured zones and significant agronomic differences between classes has resulted in the majority of broadacre fields (n
50) that have been analyzed at the ACPA, being assigned two, three, or four management classes. It is rare, even in large fields of >100 ha, to have more than three management classes.
Validating the Management Classes
Once the classes have been created, some additional "ground-truthing" is required to validate the zoning and assist in agronomic decisions. Sensor data (yield, canopy, or soil data) do not usually give direct measurements of yield-determining factors, such as nutrient deficiencies/toxicities, subsoil constraints, soil pH, or other soil properties. Soil sampling is undertaken using a form of stratified random sampling with the potential management classes as the strata. Constraints on the random allocation of sample points are imposed to avoid strata boundaries and to target all sizeable zones produced in each class. This process aims to ensure that transitional areas are not sampled and that yield-determining factors in each zone can be examined separately. For the majority of fields analyzed by the ACPA, three or four sample sites per management class have been sufficient to identify the main yield determinants in each class. The procedure should segregate topsoil and subsoil samples for analysis. Specific depths should be adjusted to suit site conditions and local agronomic testing regimes. If it is economically feasible, more samples can be taken to provide more robust results. However, the process of stratifying the existing variation in the fields, using the management class delineation process, assists in minimizing the number of samples needed to characterize the soil trends and yield-determining factors within the field and between the classes. The use of three to four samples per class may not produce a significant difference between classes; however, the intention here is to obtain data that can be practically used, in conjunction with existing data and farmer knowledge, to understand the yield processes in the field, determine class-specific management regimes, and/or determine treatment levels for class-based experiments. 
If the initial three to four samples do not show major trends or yield-limiting factors, then further testing may be needed; however, this has not been necessary on the fields studied by ACPA.
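The constrained stratified random sampling described above can be sketched as a random draw within each class after excluding grid cells close to a class boundary. The function name, the brute-force distance check, and the example field are assumptions of the sketch, and it does not implement the additional step of targeting every sizeable zone within a class.

```python
import math
import random

def stratified_sample(cells, n_per_class, buffer_m, seed=0):
    """Pick up to n_per_class random cells per class, avoiding boundaries.

    cells: list of (x, y, class_label) on the standard grid.
    A cell is excluded if any cell of another class lies closer than buffer_m.
    """
    rng = random.Random(seed)

    def near_boundary(cell):
        x, y, label = cell
        return any(other != label and math.hypot(x - ox, y - oy) < buffer_m
                   for ox, oy, other in cells)

    picks = {}
    for label in sorted({c[2] for c in cells}):
        eligible = [c for c in cells
                    if c[2] == label and not near_boundary(c)]
        picks[label] = rng.sample(eligible, min(n_per_class, len(eligible)))
    return picks

# Hypothetical field: 20 x 10 cells at 10-m spacing, Class A in the west
# (x < 100 m) and Class B in the east.
cells = [(col * 10.0, row * 10.0, "A" if col < 10 else "B")
         for row in range(10) for col in range(20)]
sites = stratified_sample(cells, n_per_class=3, buffer_m=30.0)
```

The buffer keeps sample sites out of transitional areas, so each core can be attributed to a single class when the laboratory results are interpreted.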
The soil properties measured should reflect existing local knowledge on which soil variables are likely to be affecting yield and may differ between cropping regions. The final step is to incorporate the soil sampling results with the rest of the yield, environmental, and soil data to identify relevant agronomic responses for each class. This is class specific and relies on the correct interpretation of the collected data with a local agronomist. This protocol should help ensure that the data are presented and analyzed in a manner that facilitates correct decision-making.
| CASE STUDY |
|---|
The $\sigma_{krig}$ map provides an indication of the error associated with each prediction. Typically, areas with few data points, noisy data, or points near the edges of fields tend to have a higher $\sigma_{krig}$. The text output file contains estimations of $\sigma_{krig}$ that can be used in the adjusted CI of Cupitt and Whelan (2001) (Eq. [4]). The VESPER mapping package is basic, and where possible the output should be mapped in an alternative package. Figure 5 shows all available data layers for the field mapped in ArcMap 9.0 (ESRI Inc., Redlands, CA). The maps are represented in half SD units to express the variables on a common legend for this paper. Generally data can be mapped using absolute values and in color for greater differentiation. By mapping the data, a feeling for the general response in the field can be gained, and any extraneous effects become more obvious.
Figure 5 illustrates the importance of discussing maps with growers to understand how management and the environment are affecting the spatial patterns in maps. As a result of discussions with the grower and analysis of the maps, the wheat (1997), chickpea/safflower (1999), wheat (2000), and Veris (2000) data were considered to contain artifacts and were removed from any further analysis. The grower does not intend to plant different crops or partially sow the field in the future.
When delineating management classes, care must be taken to avoid weighting any analysis toward a particular variable. Of the remaining seven variables, three are measures of ECa. The EM38(H) and EM38(V) measure the same soil property over a similar region, and the raw data were correlated (r = 0.89). To avoid biasing the cluster analysis toward ECa measurements, the EM38(H) was excluded. This left six variables: elevation, two ECa measurements (EM38(V) and EM31), and three yield measurements (sorghum 1998, wheat 2003, and wheat 2004).
All the kriged data, including $\sigma_{krig}$, were collated into a single spreadsheet (Step 6). The data layers selected for further analysis, minus the $\sigma_{krig}$ data, were subset into a single text file with an ID column (Step 7). This was used as the input file for FuzME, and the program was run according to the points outlined in Step 8. Figure 6 shows the spatial pattern of the results from two, three, four, five, and six cluster analyses, and Table 2 presents the cluster means for each variable. The median $\sigma_{krig}$ for each variable and the adjusted CI statistic (Eq. [4]) are also shown. Clusters that are significantly different according to this statistic are indicated with different letters within Table 2.
Applying Occam's razor, site-directed soil sampling was conducted using the two-management class model. In 2001, there were six cores taken per management class, and in 2004 there were a further three per management class. In 2001, the management class map was slightly different because it was derived from the existing data (effectively the Sorghum 1998 data). However, the temporal stability of the yield response pattern in the field resulted in the 12 samples being split evenly within the 2004-derived management classes. Figure 7 shows the sampling locations in 2001 and 2004 overlain on the two-management class map and the effect of the constrained random stratified sampling process used in 2004. In 2004, a decision was made to place one of the three Class A samples in the western zone. The other two samples were constrained to the large central zone. The other Class A zones, particularly the area in the extreme southeast of the field, were considered to be too small or to have a low priority for sampling by the farmer. The location of the samples within the zones is random with the constraint that the sample cannot be within a certain distance (usually 30–50 m) of a class boundary.
15 mm) could be manually pushed into the soil at field capacity. The results of selected chemical properties and soil depth estimation for 2001 and 2004 are shown in Table 3.
With this information, the farmer and agronomist have decisions to make regarding fertilizer management in the two classes, given that Class A has a much lower yield potential in a low-average rainfall year but may respond well to late-season fertilizer when within-season rainfall is high. The two classes should not be managed uniformly.
| CONCLUSIONS |
|---|
Although this protocol is an effective tool, it is not the definitive way to delineate management classes. The protocol fails to address several key issues and therefore is reliant on subjective "expert" input. In particular, the clustering algorithm fails to take into account the value of contiguity within a management class. Some brief attempts have been made to address this (e.g., McBratney et al., 2000), but further model development is required. There is also no provision in the current approach to account for different management activities possibly requiring different management classes. Is it reasonable to expect nitrogen, potassium, phosphorus, and irrigation (if available) to have the same management classes? The protocol also does not directly consider that management classes, even for an individual input, may be temporally dynamic, especially with differential management being incorporated into cropping practices. As more data or "expert" knowledge are acquired, the process can be rerun to update or test the effectiveness of the management classes. The incorporation of yield and environmental data layers into the initial analysis of management classes will minimize the temporal variance in the management classes/zones. The underlying environmental data (ECa and elevation) provide some stability to the clustering algorithm inputs and are indicative of yield potential. When class delineation is done only on yield data, it is impossible to differentiate between actual and potential yield responses within classes, which is problematic when making management decisions.
The aim of this paper was to describe a practical and economic methodology for delineating classes. The stability of the resulting classes is outside the scope of this paper. However, temporal class stability is an important issue for class-specific management that growers/agronomists need to be aware of. If management classes change dramatically as new data layers are added, then it becomes difficult to apply management with any confidence at a class level. From the authors' experience, the incorporation of environmental data tends to produce stability in the management class pattern. Incorporating subsequent cropping years into the class delineation process usually adjusts the location of class boundaries rather than rearranging the class pattern. In eastern Australian cropping systems, the environmental data used tend to be elevation and ECa; however, preferred environmental data may vary regionally. The environmental data also provide a foil to help interrogate why crop response may differ between classes. Further research is needed to quantify the temporal stability of class (and zone) delineation. However, the proposed protocol provides PA practitioners with the ability to experiment on their fields with different combinations of temporal data layers to improve their understanding of how their fields respond.
The protocol outlined in this paper has been used successfully by the ACPA for several years across a broad range of cropping systems, including viticulture and horticulture as well as broadacre cropping environments. The proposed protocol allows management classes to be statistically derived with the input of some local knowledge and provides a tried and proven way to begin managing fields by classes. Following this process allows agronomists and growers to take the first major step from uniform toward site-specific management via class-specific management. As growers/agronomists become more familiar with spatial data analysis and decision-making and as technology improves, management will drop to a zone-specific and eventually site-specific resolution. In other words, getting the uniform rate right is the first major step before considering differential management. The next, in a management class context, is getting the class rates right. The next is targeting individual zones within classes, then making classes and zones smaller until we approximate plant-specific management. If this can be achieved, then PA will truly become site-specific. Further details and examples of this protocol in action are available from the ACPA website (www.usyd.edu.au/su/agric/acpa).
| ACKNOWLEDGMENTS |
|---|
| REFERENCES |
|---|