|
|
||||||||
USDA-ARS, National Soil Tilth Lab., 2150 Pammel Dr., Ames, IA 50011
* Corresponding author (jaynes{at}nstl.gov)
Received for publication December 13, 2001.
| ABSTRACT |
|---|
|
|
|---|
Abbreviations: AS, aspect DEM, digital elevation model DGPS, differential global positioning system GIS, geographic information systems GPS, global positioning system EC, soil electrical conductivity EL, elevation PL, plan curvature PR, profile curvature SC, surface soil color SL, slope UTM, Universal Transverse Mercator
| INTRODUCTION |
|---|
|
|
|---|
Lack of temporal stability in yield patterns makes site-specific prediction of yield from previous yield observations or from intrinsic soil properties suspect (Eghball and Varvel, 1997). For example, simple statistical relationships can be developed between spatial yield patterns for a given year and soil properties such as texture, apparent electrical conductivity, organic matter content, or terrain attributes (Yang et al., 1998). However, the relationships are not the same among years (Jaynes et al., 1995b; Halvorson and Doll, 1991), unless the yield data is divided into subsets based on growing season climatic conditions or field attributes (Kaspar et al., 2003; Timlin et al., 1998). For example, Kaspar et al. (2003) were able to develop a multiple-regression equation based on terrain attributes that accounted for 78% of the yield variability for 6 yr of corn grown on a 16-ha field by restricting the analysis to the 4 yr having below-normal seasonal precipitation. When they included the 2 yr with above-normal growing season precipitation, only 26% of the yield variability could be accounted for by multiple regression.
Rather than trying to predict specific yields within a field, we may be more successful in identifying areas within a field that behave similarly among years. Most studies have approached this problem by first identifying areas of similar soil properties or terrain attributes and then testing for reduction in yield variability or yield response to inputs within these areas (Fraisse et al., 2001; Fleming et al., 2000). These methods rely on using general knowledge of crop production to identify the important yield-limiting properties to include in the analysis, and the results are often valid for individual years only. An alternative approach is to analyze long-term yield data to search for similar patterns of yield across years within a field (Lark and Stafford, 1997; Boydell and McBratney, 2002). For farming by management zones to be practical, one would like to identify a limited number of temporal yield patterns within a field, and the areas of the field exhibiting similar patterns should form spatially simple, contiguous areas.
Cluster analysis of multiyear yield data may be an effective tool for identifying temporal yield patterns. Cluster analysis consists of a three-part process. In the first step, a clustering algorithm is used to partition all members of a population into one of several clusters or groups such that the sum of squares for members within a cluster is minimized while the sum of squares between members of different clusters is maximized. After partitioning the data into clusters, the second step attempts to understand and characterize the nature of each cluster by examining population statistics of the members of each cluster and, in the case for management zones, the spatial distribution of the cluster members. This interpretation step can help us gain valuable insights as to the specific characteristics of the cluster members that determine cluster membership. The third step, termed profiling, is used to relate the occurrence of specific clusters to auxiliary data such as terrain or soil properties not used in the original cluster partitioning. Profiling can help determine why a yield cluster exists at a particular location in a field and is a powerful step in cluster analysis as profiling information can be used to predict the occurrence of yield clusters in areas of the field where yield data is lacking.
The auxiliary data used in the profiling step should include variables relevant to crop growth and yield. Preferably, the variables would be easy to measure or readily available so that the spatial distribution of yield clusters could be extrapolated to other locations. Remotely sensed variables are ideally suited for this purpose because they may already be available or can be obtained quickly and inexpensively. For example, georeferenced, color infrared digital aerial photography of bare soil for much of the USA is available from the U.S. Geological Survey and third-party venders. Alternatively, photographs of fallow fields can be taken with standard color film or slides and surface soil color (SC) easily georeferenced (Chen et al., 2000). Digital coverages of bare soil have proven to be useful in determining the areal distribution of soil properties such as soil organic carbon and may be effective in profiling areas belonging to a specific yield cluster (Varvel et al., 1999; Chen et al., 2000).
Terrain attributes have also been shown to be valuable in delineating soil properties important to crop production (Afyuni et al., 1993; Brubaker et al., 1993; Moore et al., 1993; Timlin et al., 1998; Kravchenko et al., 2000) and for differentiating areas of contrasting yield (Halvorson and Doll, 1991; Yang et al., 1998; Kaspar et al., 2003). A digital elevation model (DEM) for a field can be developed rapidly with centimeter accuracy using kinematic differential global positioning system (DGPS) equipment and a tractor or all-terrain vehicle (Clark and Lee, 1998). With a DEM, primary terrain attributes such as slope (SL), aspect (AS), plan curvature (PL), and profile curvature (PR) can be calculated with readily available software. Thus, terrain attributes may serve as practical auxiliary data in profiling cluster membership within a field.
When collecting elevation (EL) data with a roving vehicle, apparent soil electrical conductivity (EC) can be measured easily at the same time (Jaynes et al., 1993; Sudduth et al., 1999). Soil electrical conductivity has been shown to be highly correlated with soil properties such as clay content (Williams and Hoey, 1987), water content (Kachanoski et al., 1988), soil salinity (Rhoades and Corwin, 1981), soil organic matter content (Jaynes et al., 1995b), and cation exchange capacity (McBride et al., 1990)all properties that can influence crop yield.
These field attributes appear well suited as auxiliary data for the profiling stage of cluster analysis as they are either readily available or easily measured and have been shown to correlate with yield or soil properties known to affect yield. Other field variables such as soil pH, topsoil depth, and texture also could be used but would require much more time and labor for sampling and laboratory analysis and thus would not be conducive to large-scale, low-cost applications. Ordinate and class variables such as soil type could be helpful in the profiling process, but we will limit this analysis to continuous variables so that multiple discriminant statistical methods can be used in the profiling process.
The objective for this study was to examine if the three steps of cluster analysispartitioning, interpretation, and profilingcould provide useful interpretations of multiyear corn yield data. In the profiling step, auxiliary data were limited to EC, SC, EL, and the primary terrain attributes derivable from EL because these field attributes can be easily and intensively measured over large areas such as a farm in a relatively short time and would therefore be well suited for routine application.
| MATERIALS AND METHODS |
|---|
|
|
|---|
We measured corn and soybean yields on the south 16 ha of the field for 11 consecutive years starting in 1989. Only the yield data in years when corn was grown (1989, 1991, 1993, 1995, 1997, and 1999) were considered in this study. Details of the harvest method used and a further description of the yield data for 19891995 can be found in Colvin et al. (1997) and Jaynes and Colvin (1998). Grain yield was measured along eight eastwest trending transects spaced 48.8 m apart with a combine modified with a weigh hopper mounted inside the combine grain storage tank (Colvin, 1990). Yields from the three-row-wide transects were harvested by driving the combine for a measured distance and then stopping to weigh the harvested grain and measure its moisture content. While the position and length of each line were consistent from year to year, the exact length of each harvest plot varied and averaged slightly over 12 m. Yield data were linearly interpolated to correspond to harvest plots of uniform 12.1-m length, resulting in 28 harvest plots per transect. Thus, 8 by 28, or 224, yield values were measured on a regular grid each year (Fig. 1a) . Corn yields were adjusted and reported at 155 g/kg water content.
|
Field Attributes
Elevation and position measurements were made for the 32-ha field in fall of 2000 after soybean harvest with a kinematic DGPS receiver (Ashtech Z Surveyor, Magellan Corp., Santa Clara, CA1) mounted on an all-terrain vehicle. Readings were logged every 1 s as the vehicle moved across the field at approximately 4 m s-1, giving measurements about every 4 m. North-to-south transects were driven approximately 8.7 m apart across the field. This data was supplemented by collecting additional EL data by driving along ridges and swales to minimize interpolation errors in the subsequent terrain model. A base-station global positioning system (GPS) receiver, located at a bench mark on the eastern edge of the field, was used to differentially correct the roving GPS receiver. The ground control locations were referenced to a Universal Transverse Mercator (UTM) projection (Zone 15, North American Datum 1983). Elevation values were estimated in height above the ellipsoid (m). Position measurements were reliably within ±0.03 m horizontally and ±0.06 m vertically for this equipment.
The EL data produced a digital terrain model comprising >10 000 points, which was used to generate a DEM for the field. The data were used to estimate EL for an 8- by 8-m regular grid across the field using Surfer 7.0 (Golden Software, Golden, CO) gridding software and a linear, isotropic variogram model. The grid was then smoothed and a 2- by 2-m DEM created by invoking the cubic spline smoothing operation of Surfer 7.0. The coarser grid followed by smoothing was used to reduce the roughness in the DEM caused by small errors in the DGPS EL measurements. The primary terrain attributesEL (m), SL (the rate of maximum change in EL to surrounding grid cells, in degrees), PL (curvature of the surface perpendicular to the direction of SL, m-1; PL < 0 for curvatures that are concave upwards), PR (curvature of the surface in the direction of the SL, m-1; PR < 0 for curvatures concave upwards), and AS (direction of SL, in degrees, measured from north minus 180°)were then calculated for each 2-m grid cell of the DEM using the Arc/Info geographic information system (GIS) software CURVATURE command (Arc/Info 1998, Environ. Syst. Res. Inst., Redlands, CA).
Soil electrical conductivity was measured on the transects at the same time EL measurements were made using an EM-38 electromagnetic induction EC meter (Geonics, Mississauga, ON, Canada). The EM-38 was attached to a fiberglass trailer in the vertical dipole position and pulled across the field with an all-terrain vehicle (Jaynes et al., 1993, 1995a). The meter was set at a height of 8 cm above the ground. Soil electrical conductivity was measured and recorded with the position data giving >8500 spatially registered EC measurements. A 2-m grid cell coverage of EC for the field was created from the survey data using the Arc/Info GIS TOPOGRID command and an isotropic, gaussian variogram model.
A color photograph of the field was taken from an airplane at an altitude of 1500 m on 16 May 2001 using ASA 100 film. The bare-soil photograph was taken after planting when most of the surface residue was no longer visible because of decomposition and incorporation. A 0 to 255 gray scale digital image of the photograph was created using a digitizer (ScanMaker III, Microtek, Redondo Beach, CA), with a resulting resolution of 0.029 m2 per pixel. The digital image was georeferenced using the Georeference tool in TNTmips (MicroImages, Lincoln, NE) and control targets of known location set around the perimeter of the field. The resulting georeferenced image was then rectified to the UTM projection using the Plane Projective Model in TNTmips. The digital image was resampled to create a 2-m grid cell coverage of SC for the field.
The 224 yield plots were digitized as polygons and overlaid on the EC, SC, and terrain coverages. The area within each of the 224 yield plot polygons was converted to a raster format with a 0.0625-m2 resolution. The values of the field attributes for each 0.0625-m2 yield polygon raster were taken from the underlying 2-m grid cells of the appropriate coverage. The mean value for each attribute within each of the 224 yield plots was calculated by arithmetic averaging of the values for all 0.0625-m2 rasters that fell within the specific yield plot polygon.
Statistical Analysis
Univariate statistics for corn yield for each of the 6 yr were calculated for standardized yields. Yields were standardized in anticipation of subsequent studies combining corn yield data with yield data of other crops grown on this field and to remove the larger influence that crops with greater yield variances would have on cluster formation (Hair et al., 1987). A global median and interquartile range for corn yield were computed using data from all crop years. The median and interquartile statistics were used as robust estimates of population statistics rather than the mean and standard deviation because of outliers caused by yields equal to zero in some yield plots in years when standing water drowned out the corn plants. All yields were standardized by subtracting the global median from each yield plot value and dividing by the global interquartile range. Distributions of yield and field properties were tested for normality using the KolmogorovSmirnov statistic. Because many distributions failed the test for normality and were not readily transformable into normal distributions, Spearman rank correlation coefficients were calculated for all variables.
Cluster Analysis
The 6 yr of standardized corn yield data were used in a disjoint nonhierarchical cluster analysis using PROC FASTCLUS (SAS Inst., 2000). Cluster analysis is a data reduction procedure where the 224 multiyear yield patterns for the yield plots are reduced to a few manageable types. The clustering algorithm attempts to group the yield plots into clusters with high internal homogeneity and high external or between-group heterogeneity. The clustering algorithm groups each yield plot such that it belongs to one and only one cluster. A k-means model was used where the cluster centers are based on a least sum-of-squares estimation. Cluster analysis was used repeatedly to divide the yield plots into two, three, four, five, six, seven, and eight clusters to determine the optimum number of clusters to use. We considered eight clusters to be the maximum number of yield areas of practical use as management zones. The cluster algorithm was run for a maximum of 10 iterations in each case or until the change in all cluster means between iterations was 0. A pseudo-F statistic and the cubic clustering criterion (Milligan and Cooper, 1985) provided by the FASTCLUS procedure were used as indicators of optimum cluster number.
The spatial autocorrelation of the resulting clusters was determined using Moran's I statistic (Moran, 1950; Upton and Fingleton, 1985). Moran's I is similar in concept to correlation and ranges from -1, where members of different clusters would be evenly interspersed like the colored squares of a checkerboard, to 0, for a completely random distribution, to 1, where members of a cluster would be grouped closely together in space. Moran's I was calculated using the Excel 1997/2000 Visual Basic routine written by Sawada (1999).
Multiple Discriminant Analysis
To help interpret the characteristics of the resulting clusters, a canonical multiple discriminant analysis (CANDISC, SAS Inst., 2000) was used to summarize the influence of the different crop years in determining the yield clusters. Multiple discriminant analysis is appropriate to use when the dependent variable (clusters) is categorical and the independent variables are continuous (Hair et al., 1987). Given a set of clusters, multiple canonical discrimination can derive composites of the variables (linear combinations of the 6 yr of corn yield data in this case) that summarize the differences between clusters. A maximum of n - 1 canonical composites can be computed for n number of clusters. The canonical composites were used to examine the effect of individual years in determining clusters by examining the correlations or loadings between the canonical composites and the yearly yield data (Hair et al., 1987).
For the profiling stage of the cluster analysis, multiple discriminant analysis was repeated using the field attributes as independent continuous data and yield clusters as the categorical data. First, the yield data from the 224 plots were randomly divided into calibration and validation subsets such that each subset had similar numbers of members from each cluster. Multiple discriminant functions were developed for data in the calibration subset. The accuracy of the developed functions was tested by using the functions to predict cluster membership of the plots in the validation data subset and comparing predictions with known membership. Calibrating and validating the discriminating functions on two different subsets of the data eliminates the upward bias of testing the functions on the same data set for which they were calibrated (Hair et al., 1987).
After splitting the data, a stepwise multiple discriminant analysis (STEPDISC, SAS Inst., 2000) was performed on the calibration subset to reveal which field attributes contributed significantly toward classifying yield plots into clusters. Values for EC, SC, EL, SL, PR, PL, and AS averaged for each yield plot were included in the analysis. The stepwise discriminant process is analogous to stepwise regression. In the first step, the field attribute that contributed most to discriminating between the clusters was brought into the discriminant model. In the second step, the attribute that contributed most to the discriminating power of the model, that was not already in the model, was entered into the model if it exceeded a preset significance level for entry. Before each new attribute was entered into the model, the method tested the attributes already in the model. If the attribute that contributed the least to the model failed to satisfy a preset significance level for remaining in the model, it was removed. The stepwise process continues until all attributes in the model meet the criterion to stay and none of the remaining attributes meet the criterion to enter the model. A moderate significance level (P = 0.15) was used for attributes both entering and leaving the model because this level was midrange of the levels found to perform best in the forward selection methods (Costanza and Afifi, 1979).
Field attributes found to be significant in the stepwise discriminant process were used to develop linear functions that discriminated between clusters. The calibration subset was used to develop a set of functional criteria based on the field attributes that best discriminated between the clusters (DISCRIM, SAS Inst., 2000). These discriminant functions can be used to categorize any location in the field into the proper yield cluster based on the values of field attributes at that location. The accuracy of the functions was tested by predicting cluster membership of the yield plots in the validation subset and comparing with known membership. The calibrationvalidation process was then repeated by reversing the roles of the two data subsets (double cross validation), giving two estimates for the accuracy of multiple discriminant functions in predicting proper cluster membership.
After evaluating the accuracy of the multiple discriminant functions, the yield data subsets were combined and a final set of multiple discriminant functions computed using all 224 yield plots. Canonical multiple discriminant analysis was used to evaluate the correlations or loadings between the field attributes and the canonical composites that best summarized the differences between the clusters. These canonical composites were first rotated to redistribute the variance using varimax rotation (SAS Inst., 2000). Rotation essentially preserves the original structure and reliability of the discriminant model while making the interpretation of the loadings easier (Hair et al., 1987). The loadings were then stretched by multiplying them by their univariate F value. Stretching the loadings gave a better indication of the importance of the field attribute in determining cluster membership. The stretched loadings for each attribute were plotted as vectors starting from the origin. Cluster means were also rotated, stretched, and plotted with the attribute loading vectors, resulting in a simple graphical representation of the importance of each attribute in distinguishing between clusters.
In a final step, the multiple discriminant functions developed using all 224 yield plots were used to partition the entire 32-ha field into yield clusters. Every 2-m grid cell within the field was assigned a cluster membership based on the field attribute values within the cell. Yield cluster assignment was thus possible even for areas where yield measurements had not been taken.
| RESULTS |
|---|
|
|
|---|
|
|
Average values for these field attributes and PR, PL, and AS were computed for the 224 yield plots in the south half of the field (Fig. 1a), and their univariate statistics are summarized in Table 3. The ranges for the field attributes within the yield plots were nearly as large as the ranges for the entire field, indicating that the yield plots in the southern half of the field captured most of the variation present within the 32-ha field. None of the field attributes passed the KolmogorovSmirnov test for a normal distribution.
|
Cluster Analysis and Interpretation
The disjoint clustering routine was used to divide the yield plot data into two, three, four, five, six, seven, and eight clusters. Local maximum values for the pseudo-F statistic occurred for three and five clusters. The cubic clustering criteria also had the highest value at five clusters (not shown). Dividing the yield data into more than five clusters tended to create a number of clusters with only a few members, which is not a desirable outcome for clustering. Thus, the 224 yield plots were each assigned to one of five clusters using the least-squares criterion. Figure 2
shows the spatial distribution of the yield plots assigned to each of the five clusters and illustrates that the clusters were not randomly distributed across the field, but rather the yield plots assigned to the same cluster appeared to group together in space. The Moran's I statistic for the cluster distribution was 0.61 (P < 0.001), confirming that the clusters tended to group together within the field. This spatial correspondence was not a direct result of the clustering algorithm but rather must reflect the spatial correlation of the underlying properties and processes controlling yield. Temporal yield patterns clustered into spatially homogenous areas is encouraging for the application of precision farming techniques.
|
|
Yield plots comprising Cluster 1 were similar to the Cluster 2 pothole plots, with very good yields in the dry years and lower yields in the wet years (Fig. 3). But the wet-year yields for plots in this cluster were not as low as for Cluster 2 plots. The plots in this cluster tended to occur on nearly-level areas at lower ELs with open surface drainage and approximated the toeslope positions of the landscape. This toeslope cluster contained 18 of the 224 yield plots.
Cluster 3 contained nearly half of the 224 yield plots. Cluster 3 plots tended to be on gentle SLs on lower-EL or footslope positions. These plots tended to be at slightly higher ELs than the Cluster 1 plots and were not stressed as much by excessive soil wetness in wet years but apparently had adequate water in dry years. Plots in this footslope cluster had good yields every year compared with plots in other clusters.
Finally, Cluster 5 comprised 70 yield plots that tended to be distributed on the higher-EL, linear backslope positions within the field. Average yields within this backslope cluster were the best in wet years compared with the other clusters but were lower than yields in Clusters 1, 2, and 3 in dry years.
The four canonical composites were computed from the yield data to summarize the differences between the five clusters. Most of the difference in clusters (93%) was explained by the first two canonical composites (Table 4). Yields from 1993 and 1999, the two wet years, had the highest loadings on the first canonical composite, which accounted for 55% of the cluster separation. Thus, corn yield response in wet years appeared to be the principal determinant of the temporal yield pattern within the field. The second canonical composite accounted for about 38% of the variation and was most heavily weighted by yields in the dry years of 1989, 1991, 1995, and 1997 (Table 4). Thus, there was a significant yield response difference among clusters in dry years as well. The third canonical composite accounted for about 6% of the variation and was positively loaded by yields in 1989 and 1995 and negatively loaded by 1991 yields, all dry years. The negative loadings from 1991 yields may be due in part to the very wet March and April of that year even though the total growing season precipitation was below average. Thus, the canonical composite reflects a refinement of the simplistic categorization of wet and dry years represented by the first two composites. The fourth canonical composite was not significant and contributed little to the cluster separation.
|
0.08) to summarizing the differences between the five clusters (Table 5). Soil electrical conductivity had the highest partial R2 and entered the model first. Soil electrical conductivity was followed in order by PL, SL, and EL, indicating the magnitude of the partial R2 of the attribute before entering the model. For the final model, PL had the largest partial R2 of all attributes in the model. The large reduction in R2 for EC between entering the model and the final model was due to the correlation between EC and other attributes in the model.
|
Multiple discriminant functions were calculated independently for Data Subsets 1 and 2 using the five field attributes identified as significant in either of the stepwise analyses of the two subsetsEC, PL, SL, EL, and PR. Testing the accuracy of the multiple discriminant functions calibrated with Subset 1 on Subset 2 showed that the functions could not accurately classify yield plots into either Cluster 1 or Cluster 2, the pothole and toeslope clusters (results not shown). Only 3 out of 18 yield plots were correctly assigned to Cluster 1 and one out of seven to Cluster 2. Classification of yield plots into the other clusters however, exceeded 80% accuracy. Clusters 1 and 2 were contiguous (Fig. 2) and differed primarily by the severity of yield depression in wet years (Fig. 3). None of the field attributes used accurately captured the enclosed drainage area common to the pothole clusters and were thus inadequate for distinguishing yield plots that would drown out and those that would not. Results for the multiple discriminant analysis when the roles of the subsets were reversed were similar, with poor classification results for Clusters 1 and 2.
Because the discriminating functions could not assign yield plots to Clusters 1 and 2 reliably and because these clusters had very similar temporal yield patterns, the two clusters were merged into a single toeslope cluster (Cluster 1&2) and the multiple discriminant analysis rerun. Repeating the stepwise discriminant analysis with the yield plots divided into four clusters instead of five resulted in the same field attributes (EC, PL, SL, EL, and PR) being significant, and they were used to develop new multiple discriminant functions.
Testing the discriminating functions developed for Data Subset 1 on Subset 2 showed an improvement in classifying yield plots into combined Cluster 1&2, but less than half of the yield plots in this cluster were identified correctly (Table 6). The multiple discriminant functions accurately classified yield plots that were in Cluster 3 in 82% of the cases. Eighty percent of the yield plots in Cluster 4 were identified correctly while yield plots in Cluster 5 were correctly identified in 80% of the cases. Results for the discriminating functions developed for Data Subset 2 were similar, with 31% of the yield plots assigned to Cluster 1&2 in the holdout sample correctly identified, 87% correctly identified for Cluster 3, 100% for Cluster 4, and 71% for Cluster 5. Yield plots that were in Cluster 1&2 and incorrectly identified were always identified as belonging to Cluster 3. Yield plots in Cluster 5 that were misclassified were primarily classified as belonging to Cluster 3 also. Conversely, the most common misclassification of yield plots in Cluster 3 was into Cluster 5. Classification of Cluster 4 yield plots was the most reliable, with only one yield plot out of nine misclassified. Thus, the multiple discriminant functions had the least success in identifying proper membership for plots in the lower, nearly-level areas of the field.
|
A final set of multiple discriminant functions was calculated by using all 224 yield plots in the calibration process. The canonical composites computed for the field attributes showed that only the first two composites were significantly correlated with the cluster means. The most important composite (Composite I) accounted for 82% of the discrimination between clusters while Composite II contributed 16% to the discrimination between clusters. Ideally, the yield plots would separate into four distinct clusters when plotted against the canonical composites of the retained field attributes. Figure 4 shows that there was separation in the clusters but that the separation was not perfect, i.e., the clusters were not compact and there was considerable overlap among cluster members. Cluster 4 separates the best of the four clusters, whereas there was considerable overlap between Cluster 1&2 and Cluster 3, illustrating that the multiple discriminant functions based on these field attributes were not consistently capable of assigning plots to these clusters. Figure 4 illustrates that the greatest separation of clusters was provided by the first canonical composite, with the second composite contributing little to the separation. Potentially, other field attributes, such as soil chemical properties, could have been included in the analysis to better separate the clusters along a second canonical composite.
|
|
Along Composite II, there was little separation between Clusters 1&2, 3, and 5, but Composite II was effective in separating Cluster 5 from Cluster 4. None of the attributes loaded heavily on Composite II although SL had the highest loading on this composite. Plan curvature was noteworthy also because it loaded much more heavily on Composite II than on Composite I. Surprisingly, PR loaded evenly on the two composites and was not particularly effective in distinguishing shoulder plots from backslope plots. Lack of heavy loading on Composite II illustrated that none of the field attributes were dominate in differentiating the shoulder (4) cluster from backslope (5) cluster.
Directions of the loading vectors indicated that EC had its highest values for the toeslope (1&2) and footslope (3) clusters while EL was lowest for these clusters. This is consistent with earlier work that showed the highest EC values in areas where water and clay accumulate (Jaynes et al., 1995b). The steepest SLs and greatest PLs occurred for the shoulder (4) cluster. Schumacher et al. (1999) modeled tillage-induced erosion and showed that shoulder positions typically experience the greatest tillage-induced soil loss, which can reduce yield. Thus, erosion may partly explain the lower yields for the shoulder cluster in this field as well.
Prediction of Yield Cluster Location
Using the 2-m cell coverages of EC, EL, SL, PL, and PR and the multiple discriminant functions developed from the transect plots, yield cluster affiliation was calculated for every 2-m grid cell within the 32-ha field. As with the yield plot locations, the clusters formed simple spatial patterns across the field (Moran's I = 0.94, P < 0.001), roughly aligned with landscape positions rather than in a random spatial pattern (Fig. 6)
. As noted above, the multiple discriminant functions were only successful in identifying Cluster 1&2 locations about half of the time. Comparing Fig. 6 with Fig. 2, it was obvious that the functions were not properly identifying the known extent of Cluster 1&2 in the pothole region in the south-central portion of the field but rather identified much of the region as belonging to Cluster 3. Other than the large pothole in the southern half of the field, the toeslope cluster (1&2) extended over only a limited area of the field, principally in drainage ways and areas where surface water collects. These areas produced good corn yields in drier years, but yields were severally reduced in wetter years. Cluster 4, the shoulder cluster, also extended over a limited portion of the field, primarily in isolated hillslope and sideslope locations, and represented areas of consistently low yields. Cluster 3, the footslope cluster, comprised the greatest area of the field and represented the area within the field where corn yields were consistently high for most years. Cluster 5, the backslope cluster, occupied the higher ground, primarily along two large ridges and represented areas of somewhat lower yield expectations for drier years but good yields in wetter years.
|
| SUMMARY |
|---|
|
|
|---|
In the profiling step of the cluster analysis, multiple discriminant analysis was used to test for correlations between the yield clusters and the field attributes EC, SC, and EL and the primary terrain attributes calculated from EL: SL, PL, PR, and AS. All field attributes except SC and AS were significant in discriminating between the clusters. High correlations between EC, EL, and SC eliminated SC from contributing significantly to the discriminant analysis. The first attempt at discriminant analysis showed that the field attributes were unable to discriminate between the cluster containing yield plots within a closed depression or pothole and the cluster comprised of plots from other low areas of the field. Because of similar temporal yield patterns for these two clusters, they were combined and the discriminant analysis repeated for four clusters.
Calibrating the multiple discriminant functions on half of the 224 yield plots and validating against the other half of the data, we achieved a 76% success rate in classifying the yield plots into the correct yield cluster. Reversing the roles of the two data subsets gave an 80% success rate in classifying the yield plots into the correct yield cluster. The functions performed worse for predicting membership in the combined Cluster 1&2 compared with the other three clusters. Soil electrical conductivity was the principle field attribute that discriminated between three of the four clusters. Slope and PL were important attributes for discriminating Cluster 4 from Cluster 5. That 20% of the yield plots were misclassified with these field attributes indicates that the yield clusters were not perfectly aligned with terrain. Other terrain or soil attributes will be required to completely identify yield cluster membership.
Using the multiple discriminant functions calibrated with all 224 yield plots in the southern half of the field and field attributes averaged for 2-m grid cells across the 32-ha field, we were able to partition the entire field into four yield clusterseven areas where yield data had not been collected. With the ability to partition the entire field into a few yield clusters, the interpretation of what the clusters may mean for managing the field remains. Yield potential was not the same among the clusters, with the shoulder cluster consistently lower than the footslope and backslope clusters and the toeslope cluster having extremely variable yields depending on growing season precipitation. A question that needs to be answered is whether the identified yield clusters can be equated with management zones, i.e., subregions within a field that express a homogenous combination of yield-limiting functions for which a single rate of a specific crop input is appropriate (Doerge, 1999). If response curves for various inputs such as N are unique for the different yield clusters, then varying input rates may be a cost-effective strategy for corn production on this field.
| ACKNOWLEDGMENTS |
|---|
| NOTES |
|---|
|
|
|---|
| REFERENCES |
|---|
|
|
|---|
This article has been cited by other articles:
![]() |
C. L. Williams, M. Liebman, J. W. Edwards, D. E. James, J. W. Singer, R. Arritt, and D. Herzmann Patterns of Regional Yield Stability in Association with Regional Environmental Characteristics Crop Sci., July 1, 2008; 48(4): 1545 - 1559. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. E. Massey, D. B. Myers, N. R. Kitchen, and K. A. Sudduth Profitability Maps as an Input for Site-Specific Management Decision Making Agron. J., January 11, 2008; 100(1): 52 - 59. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Sawchik and A. P. Mallarino Evaluation of Zone Soil Sampling Approaches for Phosphorus and Potassium Based on Corn and Soybean Response to Fertilization Agron. J., November 6, 2007; 99(6): 1564 - 1578. [Abs |