a Dep. of Global Ecol., Carnegie Inst. of Washington, Stanford, CA 94305, and Dep. of Geol. and Environ. Sci., Stanford Univ., Stanford, CA 94305
b Int. Maize and Wheat Improvement Cent. (CIMMYT), Wheat Progr., Apdo. Postal 6-641, 06600 Mexico D.F., Mexico
c Cent. for Environ. Sci. and Policy, Inst. for Int. Studies, Stanford Univ., Stanford, CA 94305
* Corresponding author (dlobell{at}stanford.edu)
Received for publication March 4, 2004.
| ABSTRACT |
|---|
Abbreviations: CC, compacted clay; DC, deep clay; OLS, ordinary least squares
| INTRODUCTION |
|---|
Despite the importance and prevalence of the yield gap, its precise causes in many regions are not well known, owing in part to a lack of data on spatial variations in crop yields and yield-controlling factors (White et al., 2002). Surveys of farmer practices, supplemented by measurements of soil properties and crop performance, have provided a valuable means of assessing yield constraints in farmers' fields (e.g., Calvino and Sadras, 2002; Sadras et al., 2002). However, the time required to conduct a comprehensive survey, and in particular to collect accurate soil and crop measurements, can limit the number and extent of surveys. This is particularly true in regions with limited resources devoted to agricultural research, such as throughout the developing world. In addition, surveys are often motivated by specific questions and, as a result, fail to measure the full suite of variables needed to analyze yield variation (Wiese, 1982).
Recent developments in remote sensing have shown great promise for quantifying yield variations both within and between fields (Maas, 1988; Moulin et al., 1998; Shanahan et al., 2001; Baez-Gonzalez et al., 2002; Lobell et al., 2003). However, while many studies have employed remote sensing in precision agriculture to analyze variations within individual fields (e.g., Wiegand et al., 1994; Plant, 2001), few have addressed between-field yield variations across the landscape. In the context of crop surveys, yield remote sensing potentially provides three unique advantages over ground-based approaches. First, the ability to bypass field measurements of yield allows more time for other survey activities, which can result in increased sample sizes. Second, remote sensing allows yield estimates at a range of spatial scales, whereas field measurements are typically obtained from a limited number of small plots within fields and are therefore prone to sampling errors associated with within-field variability. Third, crop yields can be assessed for previous growing seasons using archived imagery, enabling analysis of past surveys that may not have measured yield.
Remote sensing thus offers a chance to increase the quantity and quality of survey data needed to identify on-farm yield constraints. Another important factor for understanding yield constraints is the type of model used to analyze the data. Multiple linear regression modeling, for example, is a commonly used approach but can lead to inaccurate and unstable solutions when applied to data sets with certain characteristics, such as a large number of insignificant predictor variables or the presence of strong interactions between variables (Hastie et al., 2001). Yield survey data, which often exhibit both of these characteristics, may therefore be poorly modeled with linear regression. Various alternatives to linear models have been developed in recent years that take advantage of the greater computing power available today. One such technique is regression tree modeling (Breiman et al., 1984), which is a conceptually simple yet powerful analysis tool that has been increasingly applied in ecological and agricultural sciences (e.g., Plant et al., 1999; De'ath and Fabricius, 2000; Lapen et al., 2001). Important features of regression trees related to survey data are (i) automated variable selection, (ii) a structure that highlights interactions between variables, (iii) ease of interpretation, and (iv) an ability to handle missing data (Hastie et al., 2001).
This study investigates sources of between-field yield variability in the Yaqui Valley, an irrigated region comprising 225000 ha in Sonora, Mexico. Average yields of wheat, the main crop in the Valley, increased from roughly 2.0 t ha⁻¹ in 1960 to 5.0 t ha⁻¹ in 1980 and have since remained near this level. Yet experimental trials and several farmers in the region regularly attain yields of 7.5 to 8 t ha⁻¹, indicating a yield gap of roughly 2.5 t ha⁻¹ that represents a significant opportunity for increasing regional production. Periodic surveys have been conducted in the Valley since 1981, revealing considerable variability in farmer practices (Flores et al., 2001). However, only one survey directly measured crop yields, and in this case, the factors underlying variability were not clearly resolved, due in part to limited yield variability among the 52 samples (Meisner et al., 1992).
Here we used Landsat Enhanced Thematic Mapper Plus (ETM+) data, with 30-m spatial resolution, to estimate wheat yields in the Yaqui Valley for the 2001 and 2003 harvest seasons. These yield estimates were combined with data on management practices from coincident field surveys to identify factors contributing to yield variations in farmers' fields. Both linear regression and regression trees were used to analyze management–yield relationships, providing a means to assess the relative performance of each technique in the context of explaining the yield gap.
| MATERIALS AND METHODS |
|---|
Remotely Sensed Yield Estimates
Landsat ETM+ images of the Yaqui Valley were acquired on 11 Jan. and 16 Mar. of 2001 and 1 Jan., 6 Mar., and 22 Mar. of 2003. These images were used to estimate wheat yields following the approach described in detail by Lobell et al. (2003). Briefly, this approach uses instantaneous estimates of canopy light absorption from the satellite images to adjust a locally calibrated model of wheat growth, which then provides an estimate of wheat yield for each pixel determined, based on a multitemporal classification, to contain wheat. The total area and average yield estimates for the two growing seasons were within 3% of values reported for the agricultural district (Table 1). In addition, yields for individual fields provided by local farmers were compared with the average of remote-sensing estimates for pixels completely contained within their fields. This field-level evaluation resulted in a close agreement between ground and remote-sensing–based estimates (Fig. 1), demonstrating the ability of remote sensing to capture spatial variability of yields across the landscape.
A random sample of 20 fields was selected from each of the four classes, resulting in a total of 80 fields. The main goal of this stratified design was to ensure sufficient contrast in yields between fields for the statistical analysis. A secondary goal was to evaluate soil type–management interactions.
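The stratified draw described above can be sketched in Python as follows. This is only an illustration: the class labels, the field identifiers, the population size, and the fixed seed are hypothetical, since the text states only that 20 fields were drawn at random from each of four classes.

```python
import random
from collections import defaultdict


def stratified_sample(field_classes, n_per_class=20, seed=0):
    """Draw n_per_class field ids at random from every class.

    field_classes maps a field id to its class label (labels are hypothetical).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for field_id, label in field_classes.items():
        by_class[label].append(field_id)

    sample = {}
    for label in sorted(by_class):
        ids = sorted(by_class[label])
        if len(ids) < n_per_class:
            raise ValueError(f"class {label!r} has only {len(ids)} fields")
        # Sampling without replacement within the class.
        sample[label] = rng.sample(ids, n_per_class)
    return sample


# Hypothetical population: 400 fields spread evenly over four classes.
fields = {i: f"class_{i % 4}" for i in range(400)}
chosen = stratified_sample(fields)
total = sum(len(ids) for ids in chosen.values())
print(total)  # 4 classes x 20 fields = 80
```

Sampling the same number of fields per class, rather than in proportion to class size, is what guarantees the contrast between classes that the design aims for.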
Soil properties were not directly measured in the surveys, for several reasons. First, the 2001 survey was originally focused on understanding farmer practices and not specifically on sources of yield variability. Therefore, soil properties were not of direct interest in the original context of the 2001 survey. Second, the required time and expense for soil collection and analysis made soil testing for each field within the survey unfeasible. Third, an existing map of soil types within the Valley obtained from the National Institute of Forestry, Agricultural and Animal Research (INIFAP) enabled at least a general description of soils on each field. Fourth, and most importantly, a previous study of spatial patterns in remotely sensed yields indicated that the majority of yield variability occurred over short distances, suggesting that between-field variations in management practices were a more important contributor than soil properties to yield variability (Lobell et al., 2002). Thus, the limited scope and resources of the surveys, prior knowledge of general soil conditions, and indications that soil properties were not a major source of yield variability resulted in the absence of detailed soil measurements. In addition, meteorological conditions were not measured on each field but were assumed equal to conditions measured at the central meteorological station because of the close proximity of the fields and the minimal change in elevation. The implications of the missing soil and weather information are discussed below.
Data Analysis
Three approaches were used to assess causes of yield variation. In the first analysis, the data were split into two subsets: one containing fields with the highest 20 yields and the second with the lowest 20 yields. A t test was then performed for each survey variable to test the hypothesis that its average value was the same for the lowest- and highest-yielding fields (Meisner et al., 1992). The Mann–Whitney (or Wilcoxon) test, which is the nonparametric equivalent of the t test, was also used to ensure the results were not influenced by non-Gaussian distributions in the management variables (Conover, 1999). However, the results were very similar to the t test and are therefore not presented.
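A minimal sketch of this first analysis is given below, using only the Python standard library. The survey records are synthetic, the variable name `n_rate` is hypothetical, and the unequal-variance (Welch) form of the t statistic is an assumption, since the text does not say which form was used. The sketch computes test statistics only; p-values would additionally require the t and U reference distributions.

```python
import math
from statistics import mean, variance


def split_extremes(records, n=20):
    """Return (lowest n, highest n) records ranked by yield."""
    ranked = sorted(records, key=lambda r: r["yield"])
    return ranked[:n], ranked[-n:]


def welch_t(a, b):
    """Welch's t statistic for the difference in means of samples a and b."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def mann_whitney_u(a, b):
    """U statistic for sample a: count of (a, b) pairs with a > b; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Synthetic survey of 80 fields in which yield rises with N rate.
records = [
    {"yield": 4.0 + 0.03 * i, "n_rate": 150 + i + (5 if i % 2 else -5)}
    for i in range(80)
]
low, high = split_extremes(records)
n_low = [r["n_rate"] for r in low]
n_high = [r["n_rate"] for r in high]

t_stat = welch_t(n_high, n_low)
u_stat = mann_whitney_u(n_high, n_low)
print(round(t_stat, 2), u_stat)
```

In this synthetic case every high-yield field received more N than every low-yield field, so U reaches its maximum of 20 × 20 = 400 and both statistics point the same way, mirroring the agreement between the two tests reported above.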
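The second analysis employed multiple linear regression, with forward stepwise variable selection used to identify the relevant predictor variables (Hastie et al., 2001). The Akaike Information Criterion (AIC) was used to determine the stopping point (i.e., number of variables included). In its standard form,

$$\mathrm{AIC} = -2\ln\hat{L} + 2k$$

where $\hat{L}$ is the maximized likelihood of the model and $k$ is the number of estimated parameters.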
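Forward stepwise selection with an AIC stopping rule can be sketched as follows. This is an illustrative standard-library implementation, not the authors' R code. It assumes Gaussian errors, for which the AIC equals n ln(RSS/n) + 2k up to an additive constant, and it counts the intercept in k. The data and the predictor names are synthetic.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def aic(rss_value, n, k):
    """Gaussian AIC up to an additive constant: n ln(RSS / n) + 2k."""
    return n * math.log(rss_value / n) + 2 * k


def forward_stepwise(predictors, y):
    """Add the predictor that lowers AIC most; stop when no addition lowers it."""
    n = len(y)
    selected = []
    best = aic(rss([], y), n, 1)  # intercept-only model
    while True:
        trials = []
        for name in predictors:
            if name not in selected:
                cols = [predictors[s] for s in selected + [name]]
                trials.append((aic(rss(cols, y), n, len(cols) + 1), name))
        if not trials or min(trials)[0] >= best:
            return selected, best
        best, name = min(trials)
        selected.append(name)


# Synthetic data: yield depends on two predictors plus small deterministic noise.
n_fields = 40
predictors = {
    "n_rate": [float(i) for i in range(n_fields)],
    "irrigation_delay": [float((i * 7) % 11) for i in range(n_fields)],
    "unrelated": [float((i * 3) % 5) for i in range(n_fields)],
}
noise = [0.05 * ((i * 13) % 7 - 3) for i in range(n_fields)]
yields = [4.0 + 0.08 * a - 0.15 * b + e
          for a, b, e in zip(predictors["n_rate"], predictors["irrigation_delay"], noise)]

selected, best_aic = forward_stepwise(predictors, yields)
print(selected)
```

The 2k penalty is what stops the procedure: a candidate variable is added only if the drop in n ln(RSS/n) exceeds 2.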
Finally, the survey and yield data sets were analyzed with regression trees (Breiman et al., 1984). In this method, the response variable (i.e., yield) is modeled as a piece-wise constant function. The data are first split into two subsets based on the predictor variable and value of that variable that results in the greatest increase in explained variance of the response variable. Each subset, or daughter node, is then analyzed independently using the same binary partitioning procedure, with a split performed only if the resulting model exceeds a predefined threshold of improvement. The result of this recursive binary partitioning is a model whose structure can be displayed as a tree-like graph, with each split in the tree labeled according to the threshold used to define the split. All analyses described above were implemented in the software package R (Ihaka and Gentleman, 1996).
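The recursive binary partitioning described above can be sketched in a few lines of Python. This is a simplified illustration rather than the R implementation used in the study. The improvement threshold `min_gain` and the minimum node size `min_leaf` are hypothetical parameters, and the data are synthetic, built so that N rate matters only for fields irrigated early.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, response, predictors, min_leaf=5):
    """Return (variable, threshold, SSE reduction) for the best split, or None."""
    parent = sse([r[response] for r in rows])
    best = None
    for var in predictors:
        levels = sorted({r[var] for r in rows})
        # Candidate thresholds lie midway between adjacent observed values.
        for lo, hi in zip(levels, levels[1:]):
            cut = (lo + hi) / 2
            left = [r[response] for r in rows if r[var] <= cut]
            right = [r[response] for r in rows if r[var] > cut]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[2]:
                best = (var, cut, gain)
    return best


def grow(rows, response, predictors, min_gain=1.0, min_leaf=5):
    """Recursively partition rows; each leaf stores the mean response."""
    split = best_split(rows, response, predictors, min_leaf)
    if split is None or split[2] < min_gain:
        return {"mean": sum(r[response] for r in rows) / len(rows), "n": len(rows)}
    var, cut, _ = split
    left = [r for r in rows if r[var] <= cut]
    right = [r for r in rows if r[var] > cut]
    return {
        "var": var,
        "cut": cut,
        "left": grow(left, response, predictors, min_gain, min_leaf),
        "right": grow(right, response, predictors, min_gain, min_leaf),
    }


# Synthetic fields: late irrigation lowers yield; N rate helps only early-irrigated fields.
rows = []
for i in range(40):
    days = 30 + i
    n_rate = 60 if i % 2 == 0 else 120
    y = 6.0 if days <= 49 else 4.5
    if days <= 49 and n_rate == 120:
        y += 0.6
    rows.append({"yield": y, "days_to_irrigation": days, "n_rate": n_rate})

tree = grow(rows, "yield", ["days_to_irrigation", "n_rate"])
print(tree["var"], tree["cut"], tree["left"]["var"])
```

Because each daughter node is partitioned independently, the tree can select N rate on one branch and ignore it on the other, which is how the structure exposes interactions between variables.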
A potential problem when applying ordinary least-squares (OLS) regression models to spatial data is that errors may be spatially correlated (i.e., not independent), which violates a basic assumption of OLS methods and may introduce bias into model interpretation (Long, 1998; Haining, 2003). In particular, one is prone to underestimate the uncertainties associated with model parameters and, thus, the corresponding p values. To assess the influence of spatially correlated errors, model residuals were tested for spatial correlation using the Moran I and Geary c tests. Both tests indicated a significant level of spatial correlation in model errors for both the linear regression and regression tree models (p < 0.05). We therefore repeated the linear regression analysis using a maximum likelihood approach to simultaneously solve for model parameters and spatial error correlation, as implemented by the package "spdep" in R. The coefficients of each model variable were within 6% of the original estimates, indicating that explicit consideration of spatial correlation of errors did not substantially change model estimates. We therefore present only the OLS estimates while acknowledging that spatial autocorrelation may contribute to underestimation of parameter uncertainties.
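As an illustration of the residual check, the Moran I statistic can be computed as sketched below. The binary distance-band weights, the 5 × 5 grid of field locations, and the two residual patterns are all assumptions made for the example; the sketch returns the statistic only and does not reproduce the significance test applied in the study.

```python
import math


def morans_i(values, coords, max_dist):
    """Moran's I with binary weights: w_ij = 1 if i != j and distance <= max_dist."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                num += dev[i] * dev[j]
                w_sum += 1.0
    return (n / w_sum) * num / sum(d * d for d in dev)


# Hypothetical 5 x 5 grid of fields with unit spacing.
coords = [(x, y) for x in range(5) for y in range(5)]

# Residuals that trend smoothly across the grid (neighbors are similar).
gradient = [float(x) for x, y in coords]
# Residuals that alternate in sign (neighbors are dissimilar).
checker = [1.0 if (x + y) % 2 == 0 else -1.0 for x, y in coords]

i_gradient = morans_i(gradient, coords, max_dist=1.0)
i_checker = morans_i(checker, coords, max_dist=1.0)
print(round(i_gradient, 3), round(i_checker, 3))
```

Values well above the null expectation of −1/(n − 1) indicate that nearby fields have similar residuals, which is the situation in which OLS standard errors tend to be too small.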
| RESULTS AND DISCUSSION |
|---|
A comparison of the highest- and lowest-yielding fields (Table 3) revealed that no management variable was significantly different (i.e., p < 0.05) between the two yield classes in both years. For example, planting date (DTPL) and N rate (N) appeared as important factors in 2001 but not 2003, whereas irrigation timing appeared important in 2003 but not 2001.
The fact that sources of spatial yield variability changed significantly between years reflects the importance of climate–management interactions at the regional scale. As this study spanned only 2 yr, it is difficult to say whether most years in the Valley are similar to one of these years or whether each year presents a unique set of factors that dominate spatial yield variability. In either case, it is clear that management recommendations should account for climatic conditions when possible and that surveys conducted in individual years must be interpreted with caution when applied to new situations.
Stepwise Linear Regression
The t tests presented above provide useful comparisons of the relationship between individual factors and crop yields, but multivariate models are needed to assess the combined impact of different variables taking into account their covariations. For 2001, stepwise linear regression selected a model with nine variables (Table 4). In this model, insecticide application, N rates, planting date, P rates, and field ploughing were deemed positively related to yield, whereas negative relationships were inferred for bed reformation, number of irrigations, days between preplant irrigation and planting, and leveling of canals. The magnitude and sign of the regression coefficients should be interpreted with care, particularly for those variables with high standard errors since correlation between predictor variables can impact the regression estimates. In this case, predictor variables were not highly correlated, with only bed reformation and leveling of canals exhibiting a correlation with absolute magnitude greater than 0.35 (not shown). Nonetheless, of interest here is the explanatory power of the model, which equaled 51% with nine variables.
In 2003, time between planting and first irrigation was the most important variable, with fields irrigated more than 56 d after planting experiencing yield reductions (Fig. 4). In those fields that were irrigated in time, the amount of N received at first application was an important determinant of yield. In contrast, the most important variable for fields that were not irrigated in time was land leveling. These differences reflect the interaction between management factors; i.e., N levels were only important if the plant had sufficient water to make use of the N. Interestingly, not one of the 13 fields that received sufficient water and fertilizer fell below 5.5 t ha⁻¹ while fields that were irrigated more than 56 d after planting and experienced one or more leveling were all below this level. The overall model explained roughly 52% of yield variability using three variables, which is a substantial improvement over the linear model (29% with three variables).
To evaluate the interaction between management and soil type, the regression tree model was applied separately to the fields on each soil type. The results indicated that timing of irrigation was the most important variable on both soils but that the critical threshold was 3.5 d earlier on the CC soil, which has lower water-holding capacity (Fig. 5). In addition, days between preplant irrigation and planting were deemed important on the CC while N rates were selected on the DC. These results are consistent with the fact that CC soils hold less water, and thus yields are more sensitive to water management.
Unmeasured Sources of Variability
The fraction of yield variability not explained by the statistical models above (roughly 50% in both years) can generally be attributed to three factors. First is the presence of measurement error, both for management variables and yield estimates. While farmers' answers to survey questions represent the best available information, errors may result from imperfect farmer memory of practices for a specific land parcel. These errors were not quantified in this study, as doing so would require independent sources of management information. Errors in the Landsat yield estimates may also contribute to model error. Based on the observed correlation between actual and estimated yields (Fig. 1), yield errors can explain a maximum of roughly 25% of model error.
A second potential source of unexplained yield variance is an inability of the statistical models to capture the true relationship between management and yield. The improvement of decision trees over linear models, for instance, reflects the increase in explanatory power possible with more appropriate models. It is possible that the use of process-based crop models would improve the agreement between modeled and measured yields.
Finally, model error may reflect the absence of important explanatory variables, such as management variables that were not measured in the survey (e.g., planting depth), differences in pest populations, soil properties, or spatial variations in weather. The existence of spatial correlation in model errors suggests that at least part of the unexplained variance in yields was attributable to factors that exhibit spatial autocorrelation across the landscape. The management variables that were recorded in this study did not generally exhibit spatial autocorrelation (Moran's I test, p > 0.1); therefore, one would expect nonmanagement variables to explain part of the model residuals. Moreover, the spatial patterns of these residuals did not correspond to existing maps of soil type (not shown). Therefore, we suspect that at least part of the model error is due to factors such as soil characteristics other than type, weather conditions, and spatially dependent biological processes such as weed competition or rust infestation. Future work is needed to explore these factors in more detail. However, our results suggest that such processes are likely to explain a small fraction of yield variability relative to the major management factors.
As with all empirical models, the interpretation of the results above must be qualified with two caveats. First, it is always possible that an unmeasured, latent variable has introduced bias into model results, even though we were careful to measure all factors that we considered potential explanatory variables. Second, one must recognize the possibility that the inferred importance of a measured variable is not due solely to a direct effect of that variable on the response but in part to the effect of another variable that covaries with the first.
For these reasons, it is often suggested that empirical models should be used only for prediction of unobserved quantities and not for modeling response of systems to change (i.e., extrapolation). However, in situations where direct experimental manipulation is impractical, empirical models play an important if not complete role in uncovering cause–effect relationships. In particular, empirical models of spatial crop yield variability can provide valuable information on the relative importance (or unimportance) of known mechanisms at the field or regional scale (Landau et al., 2000; Corwin et al., 2003). In these cases, an important distinction should be made between correlation and causation, and model interpretation should be guided by whether model variables and their coefficients are physically reasonable, as they were in this study.
| CONCLUSIONS |
|---|
Overall, it appears that management variations, as opposed to soil or climatic constraints, drive the majority of yield variability in the Yaqui Valley. This conclusion is consistent with previous interpretations based on analysis of spatial yield patterns (Lobell et al., 2002) and has the important implication that the yield gap can be significantly reduced through management changes. For example, the average yield in the highest management class was 0.84 t ha⁻¹ higher than the Valley average in 2001 and 0.98 t ha⁻¹ higher in 2003.
However, these results also indicate that strategies to improve yields through management must consider the role of climate variability. For example, N availability may constrain yields in one year, but increasing application rates may not make sense in the context of interannual climate variability (Lobell et al., 2004). Conversely, the climatic dependence of management impacts implies that seasonal weather forecasts would be useful for a wide range of management decisions, including those related to fertilizer, irrigation, planting date, soil preparation, and pest control.
The results presented here imply that improved fertilizer and water use are the most pressing management needs for increasing yields. It is interesting to note that timing of irrigation was more important than number of irrigations, suggesting that the efficient use of water is more important than total amount of water applied for yields in this region. Similarly, previous studies in this region have documented the improved N use efficiency attainable through better timing of N application (Matson et al., 1998). For both water and N, it therefore appears that efficiency of resource use plays a role at least as important as that of total input amounts. Consideration of the economic and environmental costs associated with increased inputs places an even greater emphasis on the need for more efficient resource use (Matson et al., 1998; Cassman, 1999).
In general, an understanding of biophysical constraints to yield is only a first step toward improved management because food production is only one objective of agricultural activity. Farmers, for example, are most concerned with profit, and yield gains from higher fertilizer rates may not justify the associated increase in costs. An eventual goal of this research is thus to quantify all of the agronomic, economic, and environmental trade-offs associated with management changes. A quantitative understanding of yield controls is a critical step in this direction.
Beyond the Yaqui Valley, the results presented here have important implications for the design and interpretation of studies aimed at understanding regional yield variations. For example, it is important to conduct surveys in multiple years to ensure that results are not specific to the (potentially unusual) climatic conditions of the survey year. The need for multiple surveys places a premium on cost-effective approaches to conducting yield surveys, such as achieved by replacing intensive field measurements of yield with remote-sensing estimates. In addition, commonly used linear regression models have important limitations when used to assess yield variations. Regression trees, which provide a simple means of capturing nonlinear relationships and variable interactions, appear to be a valuable tool for identifying regionally significant yield constraints.
Given the growing demand for food, the limited prospects for increasing yield potential, and the environmental consequences associated with overapplication of inputs, improved understanding of spatial variations in crop yields is greatly needed. Remotely sensed estimates of crop production provide a unique perspective that, when combined with field surveys, should enhance our ability to identify management priorities for improving regional production and/or reducing environmental impacts.
| ACKNOWLEDGMENTS |
|---|
| REFERENCES |
|---|