a Dep. of Agric. and Biol. Eng., Univ. of Florida, Gainesville, FL 32611-0570 USA
b Dep. of Agron., Univ. of Florida, Gainesville, FL 32611 USA
c Dep. of Agron., Kansas State Univ., Manhattan, KS 66506 USA
d Dep. of Crop and Soil Sci., North Carolina State Univ., Raleigh, NC 27695 USA
airmak{at}agen.ufl.edu
| ABSTRACT |
|---|
or only half of the data were used to fit cultivar coefficients. These errors were similar to those computed using independent data. The RMSEP for yield using linear regression was better than that obtained using generic MG coefficients but not as good as that found by fitting model coefficients. We conclude that soybean yield can be simulated for specific cultivars using either crop model or regression approaches, but the latter was not adequate for predicting cultivar anthesis and maturity dates.
Abbreviations: MG, maturity group; RMSE, root mean squared error of fitting; RMSEP, root mean squared error of prediction.
| INTRODUCTION |
|---|
Variety trial data are taken from a broad range of environments and have been useful for calibrating models over large areas to improve model performance (Piper et al., 1998). Mavromatis et al. (2001) showed that variety trial data can be used to fit the coefficients of a widely used soybean model (CROPGRO-Soybean; Boote et al., 1998), so that simulation results describe observed genotype × environment interactions. However, the accuracy of the model in predicting independent data has not been tested with coefficients derived from variety trial data. The procedure described by Mavromatis et al. (2001) to estimate model coefficients is computationally intensive. Thus, this procedure may be difficult to integrate into private and public cultivar-development programs. Wilkerson and Dunphy (unpublished data, 1998) have developed a simple approach to simulate specific soybean cultivars. This approach makes use of generic MG coefficients in the crop model along with a linear-regression equation (Heiniger and Dunphy, 1998) to adjust predicted yield for specific cultivars. Although data-fitting results have been good, this approach has not been evaluated using independent data.
Welch et al. (1999, 2000) introduced a general approach that greatly improves the computational efficiency of estimating genetic coefficients for large sets of cultivars. The method relies on three simple ideas. First, relatively coarse grid searches are adequate because, except for negligible high-frequency noise, crop model goodness-of-fit response surfaces seem to vary slowly over their parameter spaces. This was observed for CERES-Maize by James et al. (1999) and verified for CROPGRO-Soybean (Mavromatis et al., unpublished data, 1999). Second, it is only necessary to simulate those site-year maturity management cases that result in distinct model predictions. This alone can reduce calculations by more than an order of magnitude when large numbers of cultivars are tested in common plantings. Third, all model runs can be completed first, and the results can be stored for later use in goodness-of-fit calculations.
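The second and third ideas can be illustrated with a minimal sketch. It is not the implementation of Welch et al.; the function `build_run_database`, the toy model, the case tuples (site, year, planting day), and the coefficient grid are all hypothetical stand-ins. The point is only that identical cases are simulated once and every result is stored for later goodness-of-fit searches.

```python
def build_run_database(cases, coefficient_grid, run_model):
    """Run the model once per distinct (case, coefficient set) and store results.

    cases: iterable of hashable case descriptions (hypothetical tuples here)
    coefficient_grid: iterable of hashable coefficient sets on a coarse grid
    run_model: callable(case, coeffs) -> simulated value (stand-in for the crop model)
    """
    database = {}
    for case in sorted(set(cases)):        # identical cases are simulated only once
        for coeffs in coefficient_grid:
            database[(case, coeffs)] = run_model(case, coeffs)
    return database


calls = []

def toy_model(case, coeffs):
    """Hypothetical stand-in for a crop-model run; it only records the call."""
    calls.append((case, coeffs))
    return 0.0

# Two cultivars sharing one planting yield a duplicated case (hypothetical values).
cases = [("Tifton", 1995, 130), ("Tifton", 1995, 130), ("Plains", 1996, 150)]
grid = [(12.07,), (12.18,)]
database = build_run_database(cases, grid, toy_model)
```

With three listed cases but only two distinct ones, the toy model is called four times (two cases by two grid points) rather than six.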
A number of studies have shown the importance of model calibration in the use of crop simulation for making farm-management decisions (Heiniger et al., 1997), estimating large-area yields (Hodges et al., 1987), testing model improvements (Boote et al., 1997), and predicting new cultivar performance (Liu et al., 1989). A technique is needed to both calibrate and test a model with limited data because collecting data is expensive and time consuming. Common practice has been to split available data into two groups: One for parameter estimation and the other for testing. However, with limited data, such splitting may result in less-accurate parameter estimates and prediction variances (Jones and Carberry, 1994).
Cross validation statistical procedures can be used to estimate cultivar characteristics when data are limited. Parameter estimates obtained from least squares procedures have a bias inversely proportional to sample size that may be unacceptably large with small sample sizes (Jones and Carberry, 1994). Cross validation may provide a more reliable estimate of the prediction variance than that derived from only a subset of the data. The basis of this procedure is the use of resampling from the complete dataset where data are repeatedly divided into pairs of unevenly sized subgroups. The larger group in each pair is used to estimate the parameters, and the smaller group is used to estimate the prediction variance. This sampling with partial replacement is repeated a number of times, resulting in improved estimates of the parameters and prediction variances at the expense of extra computational effort. However, it is not known if the use of cross validation can provide similar estimates of prediction errors for cultivar parameters when compared with the use of an independent data set.
The objectives of this research were to: (i) estimate soybean crop-model prediction errors for anthesis, maturity, and yield using variety trial data; (ii) determine the effectiveness of cross validation for estimating prediction errors of the soybean model; and (iii) compare these prediction errors with those based on regression equations relating specific variety yields to simulated MG yields.
| Materials and methods |
|---|
, half of the experiments randomly selected, and a small orthogonal set. In the third step, prediction errors were estimated for independent data using coefficients estimated from the randomly selected and orthogonal subsets of data. Next, cross validation was used for estimating parameters and prediction errors. Finally, prediction errors were estimated when a linear-regression equation was used to predict anthesis, maturity, and yield for independent data.
CROPGRO Model
The CROPGRO-Soybean model (Hoogenboom et al., 1994; Boote et al., 1998) has been shown to adequately simulate crop growth at a field or research plot scale (Boote et al., 1998). The model requires inputs that include management practices (cultivar, row spacing, plant population, fertilizer, and irrigation amounts and dates) and environmental conditions (soil type, daily maximum and minimum temperature, rainfall, and solar irradiance). From this information, daily growth of vegetative and reproductive components is computed as a function of daily photosynthesis, growth stage, and water and N stress (Boote et al., 1998; Hoogenboom et al., 1994).
CROPGRO-Soybean requires inputs for variety-specific traits (Boote et al., 1998) to describe: (i) cultivar sensitivity to day length and temperature, (ii) vegetative growth traits (e.g., maximum leaf-photosynthesis rate), and (iii) reproductive growth traits (e.g., potential seed size). A number of other coefficients relate to timing of vegetative and reproductive growth (e.g., time from first flower to first seed; Table 1). These are measured in photothermal days, which combine the standard concept of degree days with a measure of day length. Cultivar differences include traits that influence life-cycle duration and degree of determinacy. Soybean cultivars are categorized into MG from 000 to XII, based primarily on their sensitivity to day length, which influences their life-cycle duration. Cultivar coefficients within a MG are generally similar across varieties (Boote et al., 1997) although individual cultivars may depart in one way or another from group norms. These cultivar coefficients, along with site and year-specific environmental variations, result in cultivar performance variability.
Weather data (daily solar irradiance, precipitation, and maximum and minimum air temperature) for each site were obtained from the Georgia Automated Environmental Monitoring Network (Hoogenboom, 1996; Hoogenboom and Gresham, 1997). The most common soil for each location in the variety trials was identified from the soil surveys of Perkins et al. (1978, 1979, 1983, 1985, and 1986). Soil textures were loamy sand for Midville, sandy loam for Tifton and Plains, clay loam for Calhoun, and sandy clay loam for Griffin. The soil characteristics were then used by Mavromatis et al. (unpublished data, 1999) to calculate the physical and chemical parameters required to run CROPGRO (Ritchie, 1998; Tsuji et al., 1994). These soil profile data, summarized in Table 2 by soil texture, a dimensionless soil fertility factor (SLPF), rooting depth, and total soil water-holding capacity (between lower limit and drained upper limit), were used in this study. The initial soil water at planting was set to field capacity for all years and locations. The effects of tillage, pests, and diseases were not directly considered in the simulations.
[Equation not recovered] (1)
Fitting and Evaluating Cultivar-Specific Coefficients
Three sets of data were used to estimate cultivar-specific coefficients, both to determine the sensitivity of coefficient estimates to the number and choice of environments and to estimate errors in predicting independent data. First, data from all location-year combinations (n = 40) were used to estimate coefficients for the Hagood cultivar. Simulated anthesis date, maturity date, and yield were compared with observed values. Second, half of the environments for each of the six cultivars were randomly selected for estimating cultivar-specific coefficients. Nonlinear least squares procedures were used to estimate the set of cultivar coefficients in Table 1 that minimized squared errors between simulated and observed variables. The fitting procedures are described later. The RMSE for the cultivar coefficients was computed using the same equation used for the RMSEP. The other half of the environments were used as an independent data set for validation, and RMSEP values were computed. Finally, a subset of the 40 environments was selected to create an orthogonal data set. Only 14 environments and four of the cultivars could be included in this subset because of the orthogonality requirement; these environments were not randomly chosen. Data for the 14 environments were used to fit cultivar coefficients, and the remaining independent data were used to compute the RMSEP. Because some of the data were missing for anthesis, the subset was orthogonal for maturity and yield, but not for anthesis. By chance, the orthogonal data set included more drought years (lower yields) and only early planted crops.
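The error statistic and the random split into fitting and validation halves can be sketched as follows. This is an illustration only: it assumes the usual root-mean-square definition (the square root of the mean squared difference between simulated and observed values), and the function names `rmse` and `split_half`, the seed, and the environment ids are hypothetical rather than taken from the study.

```python
import math
import random


def rmse(simulated, observed):
    """Root mean squared error between paired simulated and observed values.

    Applied to the fitting data it gives the RMSE; applied to data that were
    not used for fitting it gives the RMSEP (assumed standard definition).
    """
    if len(simulated) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))


def split_half(environments, seed=0):
    """Randomly split environments into a fitting half and a validation half."""
    shuffled = list(environments)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical example: 40 environment ids split into two halves of 20.
fit_envs, val_envs = split_half(range(40))
```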
Methods for Estimating Cultivar Coefficients
Cultivar-specific coefficients (Table 1) were estimated for each of the three data groups. The steps and ranges of coefficients used in the estimation procedure were those described by Mavromatis et al. (unpublished data, 1999). However, the more efficient computational procedure developed by Welch et al. (1999, 2000) was necessary for cross validation, and thus the procedures followed by Mavromatis et al. (unpublished data, 1999) were implemented using a database of simulated results containing all possible combinations of coefficients. The CROPGRO-Soybean model was run to create the databases with combinations of coefficients from Table 1 that were needed to fit anthesis, maturity, and yield for each cultivar. For Hagood, this required 779240 runs.
Coefficients for anthesis were first estimated by searching through the database for values of CS-DL that minimized errors between simulated and observed variety anthesis dates for the particular set of environments that were used for fitting. CS-DL was varied over values between those of the generic MG V and MG IX (Boote et al., 1998). After fitting the coefficient CS-DL for anthesis, this new value was retained for estimating other coefficients.
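The database search for CS-DL can be sketched as a one-dimensional lookup. This is a minimal illustration, assuming that the stored runs are keyed by candidate CS-DL value and environment and that the error criterion is the sum of squared differences; the function `fit_csdl`, the data layout, and all numbers are hypothetical.

```python
def fit_csdl(database, observed):
    """Return the CS-DL value whose stored runs minimize squared anthesis error.

    database: {csdl_value: {environment: simulated_anthesis_day}}
    observed: {environment: observed_anthesis_day}
    """
    def sse(simulated):
        return sum((simulated[env] - obs) ** 2 for env, obs in observed.items())

    return min(database, key=lambda csdl: sse(database[csdl]))


# Hypothetical stored runs for three candidate CS-DL values.
database = {
    12.07: {"env1": 50, "env2": 55},
    12.18: {"env1": 52, "env2": 57},
    12.33: {"env1": 55, "env2": 60},
}
observed = {"env1": 52, "env2": 58}
best = fit_csdl(database, observed)
```

No model runs happen during the search: only stored values are compared, which is what makes repeated fitting on different subsets of environments cheap.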
The database was searched to estimate R1PRO and SD-PM by minimizing the sum of squares of error between simulated and observed maturity dates. The value of R1PRO for the MG VII cultivars was varied in the interval 0.2 to 1.0 h (Piper et al., 1996; Grimm et al., 1994). As SD-PM was varied, the coefficients FL-SH and FL-SD were also varied in proportion to the change in SD-PM to ensure that the relationships among these life-cycle duration variables remained constant. The values of FL-SD and SD-PM were varied by the same values in the interval (default for MG VII ± 4), which is supported by Piper et al. (1996). FL-SH was set to 0.625 times FL-SD.
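One reading of how the life-cycle duration coefficients move together is sketched below. It assumes that FL-SD and SD-PM are shifted by the same amount within ±4 of their defaults and that FL-SH is then derived as 0.625 times FL-SD; the function name and the default values in the example are hypothetical, not the MG VII defaults.

```python
def shifted_durations(default_fl_sd, default_sd_pm, delta):
    """Shift FL-SD and SD-PM by the same amount and derive FL-SH from FL-SD."""
    if not -4.0 <= delta <= 4.0:
        raise ValueError("delta must lie within the +/-4 search interval")
    fl_sd = default_fl_sd + delta
    sd_pm = default_sd_pm + delta
    fl_sh = 0.625 * fl_sd              # ratio stated in the text
    return {"FL-SH": fl_sh, "FL-SD": fl_sd, "SD-PM": sd_pm}


# Hypothetical defaults, shifted by +2 photothermal days.
example = shifted_durations(default_fl_sd=16.0, default_sd_pm=36.0, delta=2.0)
```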
Keeping the optimum coefficients obtained for flowering (CS-DL) and maturity (R1PRO, SD-PM, FL-SH, and FL-SD), we estimated coefficients to fit yield for individual cultivars across all sites and years. The optimum coefficients were identified by minimizing the sum of squares of errors between simulated and observed yields, using a two-way grid search through the database on two groups of coefficients (LFMAX and THRESH; and SFDUR, PODUR, FL-SH, FL-SD, and SD-PM). These groups of coefficients were referred to in the search process as X1 and X2, respectively. Each coefficient in X1 was changed for each point in the grid search to either increase or decrease yield. Also, as X2 was incremented in the grid search, each cultivar coefficient in the X2 set was changed in proportion to its maximum change. In our study, LFMAX was allowed to vary from 0.93 to 1.13. Boote and Tollenaar (1994) reported LFMAX in a range from 0.82 to 1.39, with an average value of 1.05 mg CO₂ m⁻² s⁻¹. A maximum change of ± 2.5% was used to search for the optimal value of THRESH (Vanderlip, Welch, and Schapaugh, unpublished data, 1998). This approach is described in more detail by Mavromatis et al. (unpublished data, 1999).
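The two-way grid search can be sketched as follows. This is a simplified illustration rather than the authors' procedure: it assumes each group is driven by a single grid variable between -1 and 1 that moves every coefficient in the group by that fraction of its maximum change, and that `sse` returns the yield sum of squared errors for a grid point (in practice a lookup in the stored database). The function names, the THRESH default, and the toy error surface are hypothetical.

```python
from itertools import product


def scale_group(defaults, max_changes, x):
    """Move every coefficient in a group by fraction x of its maximum change."""
    return {name: defaults[name] + x * max_changes[name] for name in defaults}


def two_way_grid_search(x1_levels, x2_levels, sse):
    """Return the (x1, x2) grid point with the smallest sum of squared errors."""
    return min(product(x1_levels, x2_levels), key=lambda point: sse(*point))


# X1 group: LFMAX spans 0.93 to 1.13, i.e. 1.03 +/- 0.10; THRESH default is hypothetical.
x1_defaults = {"LFMAX": 1.03, "THRESH": 77.0}
x1_max_changes = {"LFMAX": 0.10, "THRESH": 77.0 * 0.025}
upper = scale_group(x1_defaults, x1_max_changes, 1.0)

# Toy error surface standing in for the database lookup.
levels = [-1.0, -0.5, 0.0, 0.5, 1.0]
best = two_way_grid_search(levels, levels, lambda x1, x2: (x1 - 0.5) ** 2 + (x2 + 0.5) ** 2)
```

Driving each group with one grid variable keeps the search two-dimensional even though seven coefficients change.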
Cross Validation
We used cross validation to estimate cultivar-specific coefficients to evaluate the effectiveness of this approach for estimating coefficients and prediction errors. Because of the computational requirements of this approach, only one cultivar (Hagood) was used. Cross validation was first performed using all n=40 environments. The cultivar coefficients, RMSE, and RMSEP were estimated for anthesis date, maturity date, and yield. There was no independent estimate of the RMSEP because all data were used for both parameter estimation and validation.
In cross validation, all n environments can be used for both parameter estimation and model evaluation. The optimum cultivar coefficients were estimated n different times using n - 1 observations each time. During each iteration, a different observation (i) was left out, and yield or maturity and the error of prediction were computed for this ith observation. The observation that was left out of the fitting procedure each time was treated as independent data, and coefficients obtained by fitting n - 1 observations were used in the crop model to predict that environment. By repeating this step for each observation, a total of n sets of coefficients and prediction errors were estimated. The average coefficient values and RMSEP were then computed.
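The leave-one-out procedure described above can be sketched generically. This is a minimal sketch under stated assumptions: `fit` and `predict` are hypothetical callables standing in for the coefficient search and the crop-model prediction, each observation is a dict with its observed value under the key "y", and the toy example uses a simple mean as the "model".

```python
import math


def leave_one_out(observations, fit, predict):
    """Leave-one-out cross validation.

    fit(training) returns fitted coefficients; predict(coeffs, obs) returns the
    predicted value for the held-out observation, whose observed value is obs["y"].
    Returns the n fitted coefficient sets and the RMSEP.
    """
    coeff_sets, squared_errors = [], []
    for i, held_out in enumerate(observations):
        training = observations[:i] + observations[i + 1:]   # the n - 1 others
        coeffs = fit(training)
        coeff_sets.append(coeffs)
        squared_errors.append((predict(coeffs, held_out) - held_out["y"]) ** 2)
    rmsep = math.sqrt(sum(squared_errors) / len(squared_errors))
    return coeff_sets, rmsep


# Toy example: the "model" is just the mean of the training observations.
data = [{"y": 1.0}, {"y": 2.0}, {"y": 3.0}, {"y": 6.0}]
coeff_sets, rmsep = leave_one_out(
    data,
    fit=lambda training: sum(o["y"] for o in training) / len(training),
    predict=lambda coeffs, obs: coeffs,
)
mean_coefficient = sum(coeff_sets) / len(coeff_sets)
```

Each observation is predicted by coefficients that never saw it, so the resulting RMSEP behaves like an independent-data error even though every observation also contributes to fitting.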
To compare cross-validation results with those obtained from an independent set of observations, the randomly selected subset described above was used in a second cross-validation procedure. This provided estimates of the coefficients, RMSE, and RMSEP using the cross-validation procedure explained above.
Linear-Regression Approach for Simulating Specific Cultivars
Cultivar × environment interactions can be determined by regressing observed cultivar variables (e.g., yields) vs. MG mean variables over all locations and years of available data. However, this approach does not provide a method for predicting cultivar responses in different environments. Therefore, we proposed a modified linear-regression approach that uses linear regression to fit cultivar variables to simulated values using the generic MG coefficients, assuming that these provide an estimate of mean response at each site. If successful, this approach would greatly simplify the procedures for incorporating new cultivars into application software that uses crop models.
For this part of the study, one cultivar (Hagood) and all n=40 environments were used to determine if this regression approach might provide an alternate way to predict yields for specific cultivars. First, all environments were used to estimate linear-regression coefficients for fitting observed cultivar anthesis, maturity, and yield to site means for all MG VII cultivars. The RMSE values for fitting the equations were computed to provide a reference for prediction errors. Second, generic MG coefficients were used to simulate MG means for all 40 environments using weather and soil data for each site-year planting date combination. Cross validation was used to fit linear-regression equations n times, leaving one observation out each time, to fit a(-i) and b(-i), the regression coefficients estimated with observation i left out, in the equation:
[Equation not recovered] (2)
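The cross-validated regression step can be sketched as below. Because the equation itself is not available here, the sketch assumes a simple linear form in which the cultivar value is predicted as a + b times the value simulated with generic MG coefficients; that form, the function names, and the example numbers are assumptions, not the authors' stated equation.

```python
import math


def fit_line(x, y):
    """Ordinary least squares intercept a and slope b for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def loo_regression_rmsep(mg_simulated, cultivar_observed):
    """Refit the line n times, each time predicting the observation left out."""
    errors = []
    for i in range(len(mg_simulated)):
        x = mg_simulated[:i] + mg_simulated[i + 1:]
        y = cultivar_observed[:i] + cultivar_observed[i + 1:]
        a, b = fit_line(x, y)            # coefficients fitted without observation i
        errors.append(a + b * mg_simulated[i] - cultivar_observed[i])
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical values: simulated MG yields and observed cultivar yields.
mg_simulated = [1.0, 2.0, 3.0, 4.0]
cultivar_observed = [3.0, 5.0, 7.0, 9.0]
rmsep = loo_regression_rmsep(mg_simulated, cultivar_observed)
```

The inputs must be lists so that slicing and concatenation work as written; with the perfectly linear toy data the RMSEP is zero up to rounding.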
| Results and discussion |
|---|
For the other five cultivars, mean measured yields were overestimated by 9.8% on average, indicating an expected model bias when using generic MG VII coefficients. On average, the RMSEP for yield was 484 kg ha⁻¹ (21.0% of the yield averaged across six cultivars, Table 3).
Fitting Cultivar-Specific Coefficients
Fitting n=40 Environments
This procedure estimated coefficients for Hagood using n = 40 experiments as a reference against which to compare results obtained from various subsets of data. The estimates of coefficients and summary statistics for anthesis, maturity, and yield are given for Hagood in Tables 4 and 5, respectively. Measured anthesis was simulated well after solving for the critical day length (Fig. 1). The estimated CS-DL value (12.18) is typical for MG VIII cultivars (12.07). Fitting CS-DL resulted in an RMSE of 3.1 d (Table 5) compared with an RMSEP of 6.3 d when MG VII coefficients were used for prediction (Hagood results from Table 3). The simulated anthesis date averaged within 1 d of the observed average dates.
Fitting Orthogonal Dataset
Estimates of coefficients for Hagood are given in Table 4. The coefficients for maturity and yield were different from those found by fitting all data and half of the data although CS-DL values were similar. R1PRO, FL-SH, and FL-SD were lower while SD-PM was higher. LFMAX, THRESH, and PODUR were smaller while SFDUR was larger. These differences were not surprising because of the small sample size with only early planting-date environments.
Table 8 presents the results of fitting coefficients using the orthogonal dataset for four varieties. Mean measured anthesis dates for four cultivars were accurately fit, with errors averaging <1% (Table 8). The RMSE for anthesis was 4.5% of the mean dates for four cultivars, which is similar to results found when
environments were used for estimating coefficients.
Validating Cultivar-Specific Coefficients
Randomly Selected Half of Environments
This case uses cultivar coefficients estimated from half of the environments to predict flowering, maturity, and yield for the environments that were not used in the fitting procedure. A good match between observed and simulated anthesis was found for all cultivars (Table 7). On average, the RMSEP was 3.08 d (5.3% of the mean observed flowering date, Table 7), showing better predictions than those using MG VII coefficients (7.1%, Table 3). The RMSEP for maturity was 5.28 d (3.7% of mean observed maturity) for the independent dataset, lower than that found by using MG VII coefficients (4.9%, Table 3). CROPGRO was able to simulate 98.7% of the observed variability in maturity at the validation sites.
The model simulated average grain yield within 4.6% of observed yield for the independent dataset. The RMSEP of yield was 416.3 kg ha⁻¹ (17.8% of mean observed yield), lower than that found when MG VII coefficients were used (21.0%, Table 3). These results showed that the cultivar coefficients estimated using half of the environments predicted soybean-variety responses better than generic MG VII coefficients.
Orthogonal Subset
At the validation sites, the model simulated consistently early flowering dates, possibly because the orthogonal subset had early planting dates, whereas the independent validation dataset had late planting dates. The mean differences between measured and simulated flowering dates varied from 0.3 d for Stonewall to 2.7 d for Perrin (Table 8). Using coefficients obtained by fitting the orthogonal data set, flowering was predicted better than with MG VII coefficients, but not as well as with coefficients from the randomly selected n=20 dataset. The RMSEP was 3.3 d, which was 6.2% of actual flowering (Table 8). This was slightly smaller than the 7.6% found by using MG VII coefficients (Table 3) but higher than the value of 5.4% obtained from simulations using coefficients estimated from the n=20 environments (Table 7).
Prediction errors for maturity at the validation sites were higher than errors obtained by fitting the orthogonal data set. The mean differences between measured and simulated maturity dates varied from 5.5 d for Perrin to 8.0 d for Stonewall (Table 8). CROPGRO explained 94% (for Stonewall) to 96% (for Perrin) of the observed variability in maturity for the independent dataset. Our results also show that cultivar coefficients estimated using the small number of environments (10-14) were able to predict soybean responses only marginally better than using generic MG VII coefficients.
CROPGRO underestimated the actual yields for all cultivars (Table 8). Yield predictions were similar to those obtained using MG VII coefficients (Table 3) but not as good as those found by using coefficients from the randomly selected n=20 environments (Table 7). The model was able to explain 75.3 to 79.3% of the observed yield variability in the independent dataset using coefficients estimated from the orthogonal dataset.
Cross Validation
Cross validation was used with all n=40 environments and n=20 randomly selected environments for comparing prediction errors with those obtained by validation with independent data.
All Data (n=40 Environments)
Parameters estimated for Hagood using cross validation were only slightly different from those found by fitting n=40 environments without cross validation (Table 4). The RMSE values for fitting anthesis, maturity dates, and yield using cross validation were 3.1 d, 5.1 d, and 430 kg ha⁻¹, respectively (Table 9), similar to those found by fitting n=40 experiments without cross validation. Using cross validation, RMSEP values were 3.1 d, 5.07 d, and 443.2 kg ha⁻¹ for anthesis, maturity, and yield, or 5, 3.5, and 18.7% of the observed mean values of these observations, respectively (Table 9).
(Table 5). The CROPGRO model explained 80% of the actual yield variation for the independent dataset for Hagood. Cross validation RMSEP values were 3.1 d, 5.3 d, and 448.2 kg ha⁻¹, within 5.1%, 3.2%, and 19.9% of observed anthesis, maturity, and yield, respectively. These values were almost the same as those obtained by independent validation (3.49 d, 5.29 d, and 442.2 kg ha⁻¹, respectively) (Table 5). Results confirm that cross validation was effective at estimating cultivar coefficients and prediction errors for the CROPGRO-Soybean model.
Linear-Regression Approach for Simulating Specific Cultivars
Table 10 shows the results of the linear-regression analysis of Hagood variables vs. environment means. Although yield was described accurately using the linear equation, flowering and maturity dates had higher fitting errors than any results based on the crop model (Table 8). One possible reason for this poor performance is that relationships are highly nonlinear for photoperiod effects, and changing the planting date could greatly change the relationship of a specific variety to the average condition. Linear equations may not be able to fit this type of response. When cross validation was used to evaluate prediction errors for the linear regression-crop model approach, results were not acceptable for predicting flowering and maturity. The RMSEP values were 9.2 and 16.4 d for flowering and maturity, respectively. These were considerably higher than any of the predictions based on the crop model alone (Table 11). However, the yield prediction error for Hagood (469.9 kg ha⁻¹) was reasonably close to the value obtained by fitting cultivar-specific coefficients (443.2 kg ha⁻¹, Table 5). A comparison of the RMSE and RMSEP in Tables 10 and 11 clearly illustrates the importance of evaluating errors in prediction before applying models for such purposes. The error in predicting yield using the linear-regression approach was considerably higher than the fitting error.
| Conclusions |
|---|
| NOTES |
|---|
Received for publication August 20, 1999.