Agronomy Journal Journal of Natural Resources and Life Sciences Education
Agronomy Journal 92:1140-1149 (2000)
© 2000 American Society of Agronomy

SOYBEAN

Evaluating Methods for Simulating Soybean Cultivar Responses Using Cross Validation

Ayse Irmak (a), James W. Jones (a), Theodoros Mavromatis (a), Stephen M. Welch (c), Kenneth J. Boote (b), and Gail G. Wilkerson (d)

(a) Dep. of Agric. and Biol. Eng., Univ. of Florida, Gainesville, FL 32611-0570 USA
(b) Dep. of Agron., Univ. of Florida, Gainesville, FL 32611 USA
(c) Dep. of Agron., Kansas State Univ., Manhattan, KS 66506 USA
(d) Dep. of Crop and Soil Sci., North Carolina State Univ., Raleigh, NC 27695 USA

airmak@agen.ufl.edu


    ABSTRACT
 
Crop simulation models are used in research worldwide, and efforts are now being made to incorporate them into decision-support systems for farmers and their advisors. However, their on-farm acceptance will be limited unless methods can be found to determine model coefficients for new cultivars that are released by public and private breeders. The availability of data to determine coefficients is usually limited; however, soybean breeders routinely collect data for new cultivars from variety trials. Objectives of this research were to (i) estimate soybean crop-model prediction errors for anthesis, maturity, and yield using variety trial data; (ii) determine the effectiveness of cross validation for estimating prediction errors of the soybean model; and (iii) compare these errors with those based on regression equations relating specific cultivar yields to simulated maturity group (MG) yields. Root mean squared errors of prediction (RMSEP) were used for comparisons. Georgia variety trial data from 1987 through 1996 for six MG VII cultivars were divided into sets for fitting model coefficients and independent validation. The RMSEP using cross validation were similar to fitting errors when all or only half of the data were used to fit cultivar coefficients. These errors were similar to those computed using independent data. The RMSEP for yield using linear regression were better than using generic MG coefficients but not as good as that found by fitting model coefficients. We conclude that soybean yield can be simulated for specific cultivars using either crop model or regression approaches, but the latter was not adequate for predicting cultivar anthesis and maturity dates.

Abbreviations: MG, maturity group • RMSE, root mean squared errors of fitting • RMSEP, root mean squared errors of prediction


    INTRODUCTION
 
Crop models can predict yield and evaluate different options to maximize profit and/or minimize losses of nutrients or chemicals by integrating the effects of daily weather data with soil characteristics and management practices. They have been used to characterize spatial yield variability and test hypotheses related to the causes of such variability (Paz et al., 1999; Allen et al., 1996). They can also be used to understand the effects of environmental factors, such as temperature, day length, soil characteristics, and water supply, on genotype × environment interactions observed in variety trial data. However, the acceptance of crop models for on-farm use has been limited because they require that coefficients describing new cultivars be available as soon as the cultivars are marketed. Without these coefficients, the models cannot accurately simulate new cultivars that are being released each year by public and private breeders.

Variety trial data are taken from a broad range of environments and have been useful for calibrating models over large areas to improve model performance (Piper et al., 1998). Mavromatis et al. (2001) showed that variety trial data can be used to fit the coefficients of a widely used soybean model (CROPGRO–Soybean, Boote et al., 1998), so that simulation results describe observed genotype × environment interactions. However, the accuracy of the model in predicting independent data has not been tested with coefficients derived from variety trial data. The procedure described by Mavromatis et al. (2001) to estimate model coefficients is computationally intensive. Thus, this procedure may be difficult to integrate into private and public cultivar-development programs. Wilkerson and Dunphy (unpublished data, 1998) have developed a simple approach to simulate specific soybean cultivars. This approach makes use of generic MG coefficients in the crop model along with a linear-regression equation (Heiniger and Dunphy, 1998) to adjust predicted yield for specific cultivars. Although data-fitting results have been good, this approach has not been evaluated using independent data.

Welch et al. (1999, 2000) introduced a general approach that greatly improves the computational efficiency of estimating genetic coefficients for large sets of cultivars. The method relies on three simple ideas. First, relatively coarse grid searches are adequate because, except for negligible high-frequency noise, crop model goodness-of-fit response surfaces seem to vary slowly over their parameter spaces. This was observed for CERES–Maize by James et al. (1999) and verified for CROPGRO–Soybean (Mavromatis et al., unpublished data, 1999). Second, it is only necessary to simulate those site-year maturity management cases that result in distinct model predictions. This alone can reduce calculations over an order of magnitude when large numbers of cultivars are tested in common plantings. Third, all model runs can be completed first, and the results can be stored for later use in goodness-of-fit calculations.
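The second and third ideas can be illustrated with a minimal Python sketch. It is not the authors' implementation: `toy_model`, the grid values, the planting days, and the observations are all invented stand-ins, and a single coefficient is searched for simplicity. The point is only that model runs are done once per distinct case, stored, and then reused for every cultivar grown in the same plantings.

```python
def build_database(model, grid, environments):
    """Run the model once per (coefficient value, environment) pair and store the result."""
    return {(value, env): model(value, env) for value in grid for env in environments}


def best_coefficient(database, grid, observations):
    """Return the grid value with the smallest sum of squared errors, using lookups only."""
    def sse(value):
        return sum((database[(value, env)] - obs) ** 2 for env, obs in observations.items())
    return min(grid, key=sse)


# Hypothetical stand-in for a crop model: anthesis day from a critical day length
# and a planting day. The real CROPGRO-Soybean model is far more detailed.
def toy_model(critical_day_length, planting_day):
    return planting_day + 10.0 * (critical_day_length - 8.0)


grid = [12.0, 12.2, 12.4]            # coarse grid of candidate coefficient values
environments = [130, 150]            # planting days shared by all cultivars
database = build_database(toy_model, grid, environments)   # 6 model runs in total

# Invented observations for two cultivars grown in the same environments.
cultivar_a = {130: 172.0, 150: 192.0}
cultivar_b = {130: 174.5, 150: 193.5}
best_a = best_coefficient(database, grid, cultivar_a)
best_b = best_coefficient(database, grid, cultivar_b)
```

Fitting a further cultivar from the same plantings would need no additional model runs, only another pass over the stored results.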

A number of studies have shown the importance of model calibration in the use of crop simulation for making farm-management decisions (Heiniger et al., 1997), estimating large-area yields (Hodges et al., 1987), testing model improvements (Boote et al., 1997), and predicting new cultivar performance (Liu et al., 1989). A technique is needed to both calibrate and test a model with limited data because collecting data is expensive and time consuming. Common practice has been to split available data into two groups: One for parameter estimation and the other for testing. However, with limited data, such splitting may result in less-accurate parameter estimates and prediction variances (Jones and Carberry, 1994).

Cross validation statistical procedures can be used to estimate cultivar characteristics when data are limited. Parameter estimates obtained from least squares procedures have a bias inversely proportional to sample size that may be unacceptably large with small sample sizes (Jones and Carberry, 1994). Cross validation may provide a more reliable estimate of the prediction variance than that derived from only a subset of the data. The basis of this procedure is the use of resampling from the complete dataset where data are repeatedly divided into pairs of unevenly sized subgroups. The larger group in each pair is used to estimate the parameters, and the smaller group is used to estimate the prediction variance. This sampling with partial replacement is repeated a number of times, resulting in improved estimates of the parameters and prediction variances at the expense of extra computational effort. However, it is not known if the use of cross validation can provide similar estimates of prediction errors for cultivar parameters when compared with the use of an independent data set.

The objectives of this research were to: (i) estimate soybean crop-model prediction errors for anthesis, maturity, and yield using variety trial data; (ii) determine the effectiveness of cross validation for estimating prediction errors of the soybean model; and (iii) compare these prediction errors with those based on regression equations relating specific variety yields to simulated MG yields.


    MATERIALS AND METHODS
 
We first evaluated the generic MG coefficients distributed with the CROPGRO–Soybean model (Boote et al., 1998) by comparing simulated values with observations for flowering, harvest maturity, and final yield from variety trial data. Second, an optimization algorithm was used with the crop model to estimate coefficients for specific cultivars by minimizing root mean squared errors of fitting (RMSE) for anthesis, maturity date, and yield. Variety trial data were divided into three sets for separate coefficient estimation and evaluation steps: all experiments (n = 40), half of the experiments randomly selected (n = 20), and a small orthogonal set (n = 14). In the third step, prediction errors were estimated for independent data using coefficients estimated from the randomly selected and orthogonal subsets of data. Next, cross validation was used for estimating parameters and prediction errors. Finally, prediction errors were estimated when a linear-regression equation was used to predict anthesis, maturity, and yield for independent data.

CROPGRO Model
The CROPGRO–Soybean model (Hoogenboom et al., 1994; Boote et al., 1998) has been shown to adequately simulate crop growth at a field or research plot scale (Boote et al., 1998). The model requires inputs, which include management practices (cultivar, row spacing, plant population, fertilizer, and irrigation amounts and dates) and environmental conditions (soil type, daily maximum and minimum temperature, rainfall, and solar irradiance). From this information, daily growth of vegetative and reproductive components are computed as a function of daily photosynthesis, growth stage, and water and N stress (Boote et al., 1998; Hoogenboom et al., 1994).

CROPGRO–Soybean requires inputs for variety-specific traits (Boote et al., 1998) to describe: (i) cultivar sensitivity to day length and temperature, (ii) vegetative growth traits (e.g., maximum leaf-photosynthesis rate), and (iii) reproductive growth traits (e.g., potential seed size). A number of other coefficients relate to timing of vegetative and reproductive growth (e.g., time from first flower to first seed, Table 1). These are measured in photothermal days, which combine the standard concept of degree days with a measure of day length. Cultivar differences include traits that influence life-cycle duration and degree of determinacy. Soybean cultivars are categorized into MGs from 000 to XII, based primarily on their sensitivity to day length, which influences their life-cycle duration. Cultivar coefficients within an MG are generally similar across varieties (Boote et al., 1997), although individual cultivars may depart in one way or another from group norms. These cultivar coefficients, along with site- and year-specific environmental variations, result in cultivar performance variability.


Table 1 Cultivar coefficients for the CROPGRO–Soybean model estimated in this study along with their definitions and units

 
Variety Trial Data
Variety trial data, which include flowering date, maturity date, and seed yield, were obtained from the Georgia Field Crops Performance Tests reports for 1987–1996 from the Georgia Agricultural Experiment Station (Raymer et al., 1994, 1997). Observed seed yields were decreased by 13% to convert them to a dry-mass basis for comparison with simulated yield. These trials were conducted in sets of 20 to 50 cultivars over multiple years and sites. From 10 yr of the Field Crops Performance Tests publications, cultivars from MG VII ('Colquitt', 'Cook', 'Hagood', 'Perrin', 'Stonewall', and 'Thomas') were selected from five locations (Tifton, Plains, Midville, Athens, and Calhoun) ranging in latitude from 31.17° to 34.17°N. Only rainfed treatments were chosen because insufficient irrigation information was available for the irrigated trials. Forty location–year combinations were available, including both early and late planting dates. However, not all location–year combinations had all six cultivars, which resulted in different numbers of combinations for each cultivar. Plants at all locations were grown in rows 0.76 m apart at a density of 34 plants m⁻². Data from each location–year included yield, and some combinations also included anthesis and maturity dates.

Weather data (daily solar irradiance, precipitation, and maximum and minimum air temperature) for each site were obtained from the Georgia Automated Environmental Monitoring Network (Hoogenboom, 1996; Hoogenboom and Gresham, 1997). The most common soil for each location in the variety trials was identified from soil surveys conducted by Perkins et al. (1978, 1979, 1983, 1985, and 1986). Soil textures were loamy sand for Midville, sandy loams for Tifton and Plains, clay loam for Calhoun, and sandy clay loam at Griffin. The soil characteristics were then used by Mavromatis et al. (unpublished data, 1999) to calculate the physical and chemical parameters required to run CROPGRO (Ritchie, 1998; Tsuji et al., 1994). These soil profile data were used in this study and are summarized in Table 2 by soil texture, a dimensionless soil fertility factor (SLPF), rooting depth, and total soil water-holding capacity (between lower limit and drained upper limit). The initial soil water at planting was set to field capacity for all years and locations. The effects of tillage, pests, and diseases were not directly considered in the simulations.


Table 2 Summary of soil information for each variety trial location used in this study. Each soil was assumed to have a root zone of 200 cm. The lower (LL) and drained upper (DUL) limits of plant available soil water varied by soil type, resulting in differences in total plant available soil water among the locations

 
Evaluating Existing Generic Maturity Group Coefficients
Although the CROPGRO–Soybean model can simulate performance of individual cultivars in an environment, coefficients are only available for a few cultivars. The existing generic MG VII coefficients, provided with CROPGRO–Soybean (Boote et al., 1998), were first used to test the hypothesis that cultivar coefficients were different from MG norms, and model bias existed between observed and simulated anthesis, harvest maturity, and final yield. Soybean growth and yield were simulated for each of the MG VII cultivars for each location–year combination using generic MG coefficients available in the CROPGRO–Soybean model. Because these data were independent, RMSEP were computed using the following equation:

$$\mathrm{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{pi} - y_{oi}\right)^{2}} \qquad (1)$$

where $n$ is the number of observations, $y_{pi}$ is the predicted value, and $y_{oi}$ is the observed value for the specific cultivar in environment $i$. Predictions were averaged across all site-year combinations and RMSEP values were computed for anthesis, maturity, and yield.
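As a hedged illustration, the RMSEP can be computed as below. The sketch assumes the usual root-mean-square form with a 1/n divisor, which matches the symbol definitions given here but is an assumption about the exact published equation. The function name `rmsep` and the sample yields are invented for the example and are not data from the trials.

```python
import math


def rmsep(predicted, observed):
    """Root mean squared error of prediction with a 1/n divisor (assumed form)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical yields (kg/ha) for three environments, for illustration only.
simulated = [2500.0, 2100.0, 2900.0]
observed = [2300.0, 2200.0, 3100.0]
error = rmsep(simulated, observed)   # about 173.2 kg/ha
```

The same function serves for the fitting error (RMSE), since the text states that both statistics use the same equation and differ only in whether the data were used for fitting.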

Fitting and Evaluating Cultivar-Specific Coefficients
Three sets of data were used to estimate cultivar-specific coefficients to determine the sensitivity of coefficient estimates to number and choice of environments and estimate errors in predicting independent data. First, data from all location–year combinations were used to estimate coefficients for the Hagood cultivar. Simulated anthesis date, maturity date, and yield were compared with observed values. Second, half of the environments for each of the six cultivars were randomly selected for estimating cultivar-specific coefficients. Nonlinear least squares procedures were used to estimate the set of cultivar coefficients in Table 1 that minimized squared errors between simulated and observed variables. The fitting procedures are described later. The RMSE for the cultivar coefficients were computed using the same equation used for the RMSEP. The other half of the environments were used as an independent data set for validation, and RMSEP values were computed. Finally, a subset of the 40 environments was selected to create an orthogonal data set. Only 14 environments and four of the cultivars could be included in this subset because of the orthogonality requirement; these environments were not randomly chosen. Data for the 14 environments were used to fit cultivar coefficients, and the remaining independent data were used to compute the RMSEP. Since some of the data were missing for anthesis, the subset was orthogonal for maturity and yield, but not for anthesis. By chance, the orthogonal data set included more drought years (lower yields) and only early planted crops.
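The random split into fitting and validation halves can be sketched as follows. This is only an illustration: the environment labels, the fixed seed, and the function name `split_environments` are assumptions, and the source does not say how the random selection was actually carried out.

```python
import random


def split_environments(environments, seed=0):
    """Randomly split environments into a fitting half and a validation half."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(environments)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical environment labels standing in for location-year combinations.
all_environments = [f"env{i:02d}" for i in range(40)]
fitting, validation = split_environments(all_environments)
```

Because the two halves are disjoint, errors computed on `validation` are independent of the data used to estimate the coefficients.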

Methods for Estimating Cultivar Coefficients
Cultivar-specific coefficients (Table 1) were estimated for each of the three data groups. The steps and ranges of coefficients used in the estimation procedure were those described by Mavromatis et al. (unpublished data, 1999). However, the more efficient computational procedure developed by Welch et al. (1999, 2000) was necessary for cross validation, and thus the procedures followed by Mavromatis et al. (unpublished data, 1999) were implemented using a database of simulated results containing all possible combinations of coefficients. The CROPGRO–Soybean model was run to create the databases with combinations of coefficients from Table 1 that were needed to fit anthesis, maturity, and yield for each cultivar. For Hagood, this required 779240 runs.

Coefficients for anthesis were first estimated by searching through the database for values of CS-DL that minimized errors between simulated and observed variety anthesis dates for the particular set of environments that were used for fitting. CS-DL was varied between the generic values for MG V and MG IX (Boote et al., 1998). After fitting the coefficient CS-DL for anthesis, this new value was retained for estimating other coefficients.

The database was searched to estimate R1PRO and SD-PM by minimizing the sum of squares of error between simulated and observed maturity dates. The value of R1PRO for the MG VII cultivars was varied in the interval 0.2 to 1.0 h (Piper et al., 1996; Grimm et al., 1994). As SD-PM was varied, the coefficients FL-SH and FL-SD were also varied in proportion to the change in SD-PM to ensure that the relationships among these life-cycle duration variables remained constant. The values of FL-SD and SD-PM were varied by the same values in the interval (default for MG VII ± 4), which is supported by Piper et al. (1996). FL-SH was set to 0.625 times FL-SD.
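The linkage among the life-cycle duration coefficients can be sketched as below. The sketch takes the reading that FL-SD and SD-PM are shifted by the same amount within ± 4 of their defaults and that FL-SH is then recomputed as 0.625 times FL-SD. The default values passed in are placeholders chosen for illustration, not the published MG VII values, and the function name is hypothetical.

```python
FL_SH_RATIO = 0.625   # FL-SH as a fraction of FL-SD, as stated in the text
MAX_SHIFT = 4.0       # search interval is the MG VII default plus or minus 4


def linked_durations(shift, default_fl_sd, default_sd_pm):
    """Shift FL-SD and SD-PM together and derive FL-SH from FL-SD."""
    if abs(shift) > MAX_SHIFT:
        raise ValueError("shift must lie within +/- 4 of the default")
    fl_sd = default_fl_sd + shift
    sd_pm = default_sd_pm + shift
    return {"FL-SH": FL_SH_RATIO * fl_sd, "FL-SD": fl_sd, "SD-PM": sd_pm}


# Placeholder defaults (photothermal days) for illustration only.
candidate = linked_durations(-2.0, default_fl_sd=16.0, default_sd_pm=36.0)
```

Tying the three coefficients to one search variable keeps the grid one-dimensional for this step, which is what makes a stored database of runs manageable.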

Keeping the optimum coefficients obtained for flowering (CS-DL) and maturity (R1PRO, SD-PM, FL-SH, and FL-SD), we estimated coefficients to fit yield for individual cultivars across all sites and years. The optimum coefficients were identified by minimizing sum of squares of errors between simulated and observed yields, using a two-way grid search through the database on two groups of coefficients (LFMAX and THRESH; and SFDUR, PODUR, FL-SH, FL-SD, and SD-PM). These groups of coefficients were referred to in the search process as X1 and X2, respectively. Each coefficient in X1 was changed for each point in the grid search to either increase or decrease yield. Also, as X2 was incremented in the grid search, each cultivar coefficient in the X2 set was changed in proportion to its maximum change. In our study, LFMAX was allowed to vary from 0.93 to 1.13. Boote and Tollenaar (1994) reported LFMAX in a range from 0.82 to 1.39, with an average value of 1.05 mg CO₂ m⁻² s⁻¹. A maximum change of ± 2.5% was used to search for the optimal value of THRESH (Vanderlip, Welch, and Schapaugh, unpublished data, 1998). This approach is described in more detail by Mavromatis et al. (unpublished data, 1999).
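A two-way grid search of this kind can be sketched as follows, under stated assumptions. Each group is represented by one step variable between -1 and 1, and every coefficient in a group moves by that fraction of its maximum change. The error surface `toy_sse` is invented; in the study the errors would come from stored simulations compared with observed yields. The LFMAX midpoint of 1.03 and maximum change of 0.10 are derived from the stated 0.93 to 1.13 search range.

```python
from itertools import product


def scaled(default, max_change, step):
    """Shift a coefficient by a fraction `step` (between -1 and 1) of its maximum change."""
    return default + step * max_change


def two_way_grid_search(steps, sse):
    """Return the (x1, x2) step pair that minimizes the supplied error function."""
    return min(product(steps, steps), key=lambda pair: sse(*pair))


steps = [-1.0, -0.5, 0.0, 0.5, 1.0]


# Hypothetical error surface with its minimum at x1 = 0.5, x2 = -0.5.
def toy_sse(x1, x2):
    return (x1 - 0.5) ** 2 + (x2 + 0.5) ** 2


best_x1, best_x2 = two_way_grid_search(steps, toy_sse)
# LFMAX searched from 0.93 to 1.13 is the midpoint 1.03 plus or minus 0.10.
lfmax = scaled(1.03, 0.10, best_x1)
```

Grouping coefficients this way reduces a search over seven coefficients to a search over two step variables, at the cost of not exploring combinations in which coefficients within a group move independently.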

Cross Validation
We used cross validation to estimate cultivar-specific coefficients to evaluate the effectiveness of this approach for estimating coefficients and prediction errors. Because of the computational requirements of this approach, only one cultivar (Hagood) was used. Cross validation was first performed using all n=40 environments. The cultivar coefficients, RMSE, and RMSEP were estimated for anthesis date, maturity date, and yield. There was no independent estimate of the RMSEP because all data are used for parameter estimation and validation using cross validation.

In cross validation, all n environments can be used for both parameter estimation and model evaluation. The optimum cultivar coefficients were estimated n different times using n - 1 observations each time. During each iteration, a different observation (i) was left out, and yield or maturity and the error of prediction were computed for this ith observation. The observation that was left out of the fitting procedure each time was treated as independent data, and coefficients obtained by fitting n - 1 observations were used in the crop model to predict that environment. By repeating this step for each observation, a total of n sets of coefficients and prediction errors were estimated. The average coefficient values and RMSEP were then computed.
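The leave-one-out procedure described above can be sketched generically in Python. The `fit` and `predict` callables are placeholders for the coefficient search and the crop model; here they are replaced by a deliberately trivial stand-in (predicting the mean of the training observations) so the example runs on its own. All names and numbers are illustrative assumptions.

```python
import math


def leave_one_out(environments, fit, predict):
    """Return (mean coefficients, RMSEP) from n leave-one-out fits.

    environments: list of (inputs, observed) pairs
    fit: function taking a list of such pairs and returning a list of coefficients
    predict: function taking (coefficients, inputs) and returning a prediction
    """
    n = len(environments)
    coefficient_sets, squared_errors = [], []
    for i in range(n):
        training = environments[:i] + environments[i + 1:]   # leave observation i out
        coeffs = fit(training)
        inputs, observed = environments[i]
        squared_errors.append((predict(coeffs, inputs) - observed) ** 2)
        coefficient_sets.append(coeffs)
    mean_coeffs = [sum(values) / n for values in zip(*coefficient_sets)]
    return mean_coeffs, math.sqrt(sum(squared_errors) / n)


# Trivial stand-in model: one coefficient equal to the mean training observation.
def fit_mean(training):
    return [sum(obs for _, obs in training) / len(training)]


def predict_mean(coeffs, inputs):
    return coeffs[0]


data = [(None, 1.0), (None, 2.0), (None, 3.0), (None, 4.0)]
mean_coeffs, cv_rmsep = leave_one_out(data, fit_mean, predict_mean)
```

With a real crop model, each call to `fit` is expensive, which is why the stored-database approach matters: n refits become n searches over results that were simulated once.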

To compare cross-validation results with those obtained from an independent set of observations, the randomly selected subset described above was used in a second cross-validation procedure. This provided estimates of the coefficients, RMSE, and RMSEP using the cross-validation procedure explained above for n = 20.

Linear-Regression Approach for Simulating Specific Cultivars
Cultivar × environment interactions can be determined by regressing observed cultivar variables (e.g., yields) vs. MG mean variables over all locations and years of available data. However, this approach does not provide a method for predicting cultivar responses in different environments. Therefore, we proposed a modified linear-regression approach that uses linear regression to fit cultivar variables to simulated values using the generic MG coefficients, assuming that these provide an estimate of mean response at each site. If successful, this approach would greatly simplify the procedures for incorporating new cultivars into application software that uses crop models.

For this part of the study, one cultivar (Hagood) and all n = 40 environments were used to determine if this regression approach might provide an alternate way to predict yields for specific cultivars. First, all environments were used to estimate linear-regression coefficients for fitting observed cultivar anthesis, maturity, and yield to site means for all MG VII cultivars. The RMSE values for fitting the equations were computed to provide a reference for prediction errors. Second, generic MG coefficients were used to simulate MG means for all 40 environments using weather and soil data for each site-year planting date combination. Cross validation was used to fit linear-regression equations n times, leaving one observation out each time, to fit $a_{-i}$ and $b_{-i}$ in the equation:

$$y_{pi} = a_{-i} + b_{-i}\,X_{MGi} \qquad (2)$$

where $y_{pi}$ is the predicted variable for observation $i$, and $X_{MGi}$ is the simulated site mean using generic MG coefficients. These $a_{-i}$ and $b_{-i}$ values were used to predict the response of the cultivar for the environment left out ($i$). This was repeated for all $i$ data points, resulting in values of $a_{-i}$, $b_{-i}$, and $y_{pi}$. The RMSEP were then computed. The reader should note that we did not estimate any cultivar coefficients for the CROPGRO–Soybean model in this regression approach.
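A minimal sketch of this leave-one-out regression is given below. It assumes the equation is a simple linear form (intercept plus slope times the simulated MG value) fitted by ordinary least squares; the function names and the numbers are invented for illustration and are not trial data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def loo_regression_rmsep(simulated_mg, observed):
    """RMSEP from refitting the line n times, leaving one environment out each time."""
    n = len(observed)
    squared_errors = []
    for i in range(n):
        xs = simulated_mg[:i] + simulated_mg[i + 1:]
        ys = observed[:i] + observed[i + 1:]
        a, b = fit_line(xs, ys)
        squared_errors.append((a + b * simulated_mg[i] - observed[i]) ** 2)
    return math.sqrt(sum(squared_errors) / n)


# Hypothetical simulated MG yields and cultivar yields (kg/ha) lying exactly on
# the line y = 100 + 0.9 x, so the cross-validated error should be near zero.
simulated_mg = [2000.0, 2500.0, 3000.0, 3500.0]
cultivar_yield = [1900.0, 2350.0, 2800.0, 3250.0]
a_all, b_all = fit_line(simulated_mg, cultivar_yield)
cv_error = loo_regression_rmsep(simulated_mg, cultivar_yield)
```

Each refit here costs almost nothing, which is the practical appeal of the regression approach compared with re-estimating crop-model coefficients n times.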


    RESULTS AND DISCUSSION
 
Generic Maturity Group Coefficients
We first evaluated the existing generic MG VII coefficients distributed with the CROPGRO–Soybean model (Boote et al., 1998) by comparing simulated values with observations for flowering, harvest maturity, and final yield for each cultivar across all sites, years, and planting dates. Summary statistics for the six cultivars are given in Table 3. Using generic MG VII coefficients (Table 4), CROPGRO predicted the average observed anthesis dates reasonably well although they were consistently early (except for Stonewall and Colquitt). The model simulated anthesis within 4.7% of the average value, and the RMSEP averaged 4.2 d across the six cultivars.


Table 3 Summary statistics in predicting anthesis and maturity dates and yield for the MG VII cultivars used in this study when generic MG VII cultivar coefficients were used in the CROPGRO–Soybean model. Statistics include root mean squared errors of prediction (RMSEP) and r² of the cultivar regression line. N is the sample size

 

Table 4 Estimates of coefficients for generic MG VII cultivars and for Hagood fit by a nonlinear least-square procedure

 
For all cultivars except Perrin, days to maturity were simulated late compared with observations, averaging 2.8 d for Cook to 6.8 d for Stonewall (Table 3). The model simulated maturity within 2.5% of the average value, and the RMSEP averaged 7.1 d across the six cultivars. Furthermore, measured yield was simulated remarkably well for Cook. For the other five cultivars, mean measured yields were overestimated by 9.8% on average, indicating an expected model bias when using generic MG VII coefficients. On average, the RMSEP for yield was 484 kg ha⁻¹ (21.0% of the yield averaged across six cultivars, Table 3).

Fitting Cultivar-Specific Coefficients
Fitting n=40 Environments
This procedure estimated coefficients for Hagood using n experiments as a reference against which to compare results obtained from various subsets of data. The estimates of coefficients and summary statistics for anthesis, maturity, and yield are given for Hagood in Tables 4 and 5, respectively. Measured anthesis was simulated well after solving for the critical day length (Fig. 1). The estimated CS-DL value (12.18) is typical for MG VIII cultivars (12.07). Fitting CS-DL resulted in an RMSE of 3.1 d (Table 5) compared with an RMSEP of 6.3 d when MG VII coefficients were used for prediction (Hagood results from Table 3). The simulated anthesis date averaged within 1 d of the observed average dates.


Table 5 Evaluation of fitting and predictions with independent data validation for anthesis, maturity dates, and yield for Hagood variety. Statistics include root mean squared errors of fitting (RMSE) and r² of the cultivar regression line. N is the sample size

 


Fig. 1 Comparison of simulated vs. observed flowering date for Hagood using coefficients estimated from the Georgia variety trial data, 21 environments

 
Measured maturity was also simulated well by fitting the coefficients (Fig. 2). The estimated R1PRO was higher than the MG VII value (Table 4). The values of FL-SH, FL-SD, and SD-PM were lower than the MG VII values, resulting in earlier observed maturity for Hagood. On average, simulated maturity dates were fit to within 1 d of observed dates (Table 5). The RMSE for maturity was 5.1 d. The LFMAX and THRESH values for Hagood were slightly higher than those for MG VII (Table 4), while SD-PM was lower (30.08 vs. 36.00). The mean measured yield was simulated remarkably well, within 0.5% of average measured yields (Table 5), although simulated yields for individual environments had an RMSE of 429 kg ha⁻¹ (Fig. 3). The RMSE was 18.2% of average yields after estimating coefficients.



Fig. 2 Comparison of simulated vs. observed maturity date for Hagood using the coefficients estimated from the Georgia variety trial data, 34 environments

 


Fig. 3 Comparison of simulated vs. observed seed yield for Hagood using the coefficients estimated from the Georgia variety trial data, 40 environments

 
Fitting Randomly Selected Half of Environments
This procedure provides insight about the stability of coefficients estimated from a limited, randomly selected dataset vs. those estimated by fitting all data. The estimated cultivar coefficients for all six cultivars are given in Table 6. Measured anthesis was simulated remarkably well for all cultivars (Table 7). The estimated CS-DL values were typical of MG VI (12.45) for Colquitt, Stonewall, and Thomas and similar to the MG VII norm (12.33) for Cook, Perrin, and Thomas. The value of 12.18 for Hagood was lower than the other five cultivars, but was the same value obtained by fitting to all environments. Mean measured anthesis was underestimated by <1% (Table 7), indicating a good fit relative to predictions using generic MG VII coefficients (4.68%, Table 3). On average, the RMSE for anthesis date was 1.3 d less than the RMSEP based on MG VII coefficients.


Table 6 Coefficient estimates for the six cultivars based on the use of half of the data, randomly selected

 

Table 7 Errors for fitting and predicting anthesis and maturity dates and yield for six cultivars using half of the data randomly selected for fitting and half for predicting. Statistics include root mean squared errors of fitting (RMSE) and r² of the cultivar regression line. N is the sample size

 
R1PRO reached the maximum allowable value of 1.0 for three of the cultivars (Table 6). SD-PM values were higher except for Colquitt and Hagood, and FL-SH and FL-SD were lower for Thomas and Cook. Mean measured maturity was underestimated by <1% (Table 7), indicating an improvement over the use of generic MG VII coefficients (2.5%, Table 3). LFMAX, THRESH, and PODUR were higher for Hagood, while SFDUR was smaller (Table 6) compared with coefficients estimated by fitting all data (Table 4). PODUR and SFDUR were the same as MG VII values. The mean RMSE for all cultivars after fitting all coefficients was 17.2% of the actual yields (Table 7).

Fitting Orthogonal Dataset
Estimates of coefficients for Hagood are given in Table 4. The coefficients for maturity and yield were different from those found by fitting all data and half of the data although CS-DL values were similar. R1PRO, FL-SH, and FL-SD were lower while SD-PM was higher. LFMAX, THRESH, and PODUR were smaller while SFDUR was larger. These differences were not surprising because of the small sample size with only early planting-date environments.

Table 8 presents the results of fitting coefficients using the orthogonal dataset for four varieties. Mean measured anthesis dates for four cultivars were accurately fit, with errors averaging <1% (Table 8). The RMSE for anthesis were 4.5% of the mean dates for four cultivars, which is similar to results found when environments were used for estimating coefficients.


Table 8 Errors in fitting and predicting anthesis and maturity dates and yield for Cook, Hagood, Perrin, and Stonewall cultivars by using the coefficients from the orthogonal dataset. Statistics include root mean squared errors of fitting (RMSE) and r² of the cultivar regression line. N is the sample size.

 
Mean maturity was underestimated by <1%, and the RMSE were 2.6% of the actual maturity for four cultivars (Table 8). For this case, the model simulated grain yield within 1.4% of the average value. The RMSE for yield averaged 14.6% of the actual yield for the four cultivars, a closer fit for this small data set than the 17.2% obtained when half of the data were used for fitting (Table 7).

Validating Cultivar-Specific Coefficients
Randomly Selected Half of Environments
This case uses cultivar coefficients estimated from half of the environments to predict flowering, maturity, and yield for the environments that were not used in the fitting procedure. A good match between observed and simulated anthesis was found for all cultivars (Table 7). On average, the RMSEP were 3.08 d (5.3% of the mean observed flowering date, Table 7), showing better predictions than those using MG VII coefficients (7.1%, Table 3). The RMSEP for maturity was 5.28 d (3.7% of mean observed maturity) for the independent dataset, lower than that found by using MG VII coefficients (4.9%, Table 3). CROPGRO was able to simulate 98.7% of the observed variability in maturity at the validation sites.

The model simulated average grain yield within 4.6% of observed yield for the independent dataset. The RMSEP of yield was 416.3 kg ha⁻¹ (17.8% of mean observed yield), lower than that found when MG VII coefficients were used (21.0%, Table 3). These results showed that the cultivar coefficients estimated using half of the environments predicted soybean-variety responses better than generic MG VII coefficients.
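The error statistics quoted in this section (RMSEP in days or kg ha⁻¹, and the same error as a percentage of the mean observed value) can be illustrated with a minimal sketch. The function names and the yield values below are hypothetical and are not data from the variety trials; the sketch assumes the mean squared error is taken over all paired observations.

```python
import math

def rmsep(observed, simulated):
    """Root mean squared error of prediction for paired observed/simulated values."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    squared = [(s - o) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))

def percent_of_mean(error, observed):
    """Express an error as a percentage of the mean observed value."""
    return 100.0 * error / (sum(observed) / len(observed))

# Hypothetical yields (kg/ha) for illustration only; not values from the paper.
observed_yield = [2100.0, 2600.0, 2300.0, 2500.0]
simulated_yield = [2300.0, 2400.0, 2300.0, 2900.0]

error = rmsep(observed_yield, simulated_yield)
relative = percent_of_mean(error, observed_yield)
print(f"RMSEP = {error:.1f} kg/ha ({relative:.1f}% of mean observed yield)")
```

Reporting the error relative to the observed mean is what allows anthesis, maturity, and yield errors to be compared on a common scale.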

Orthogonal Subset
At the validation sites, the model simulated consistently early flowering dates, possibly because the orthogonal subset had early planting dates whereas the independent validation dataset had late planting dates. The mean differences between measured and simulated flowering dates varied from 0.3 d for Stonewall to 2.7 d for Perrin (Table 8). Using coefficients obtained by fitting the orthogonal data set, flowering was predicted better than with MG VII coefficients, but not as well as with coefficients from the randomly selected n=20 dataset. The RMSEP were 3.3 d, which was 6.2% of actual flowering (Table 8). This was slightly smaller than the 7.6% found by using MG VII coefficients (Table 3) but higher than the value of 5.4% obtained from simulations using coefficients estimated from the n=20 environments (Table 7).

Prediction errors for maturity at the validation sites were higher than errors obtained by fitting the orthogonal data set. The mean differences between measured and simulated maturity dates varied from 5.5 d for Perrin to 8.0 d for Stonewall (Table 8). CROPGRO explained 94% (for Stonewall) to 96% (for Perrin) of the observed variability in maturity for the independent dataset. Our results also show that cultivar coefficients estimated using the small number of environments (10–14) were able to predict soybean responses only marginally better than using generic MG VII coefficients.

CROPGRO underestimated the actual yields for all cultivars (Table 8). Yield predictions were similar to those obtained using MG VII coefficients (Table 3) but not as good as those found by using coefficients from the randomly selected n=20 environments (Table 7). The model was able to explain 75.3 to 79.3% of the observed yield variability in the independent dataset using coefficients estimated from the orthogonal dataset.

Cross Validation
Cross validation was used with all n=40 environments and n=20 randomly selected environments for comparing prediction errors with those obtained by validation with independent data.

All Data (n=40 Environments)
Parameters estimated for Hagood using cross validation were only slightly different from those found by fitting n=40 environments without cross validation (Table 4). The RMSE values for fitting anthesis, maturity dates, and yield using cross validation were 3.1 d, 5.1 d, and 430 kg ha⁻¹, respectively (Table 9), similar to those found by fitting n=40 experiments without cross validation. Using cross validation, RMSEP values were 3.1 d, 5.07 d, and 443.2 kg ha⁻¹ for anthesis, maturity, and yield, or 5, 3.5, and 18.7% of the observed mean values of these observations, respectively (Table 9).


Table 9 Evaluation of fitting and predictions with cross validation for anthesis, maturity dates, and yield for Hagood variety. Statistics include root mean squared errors of fitting (RMSE) and r² of the cultivar regression line. N is the sample size.

 
Randomly Selected n=20 Environments
Coefficient estimates were close to those found by fitting n=40 and n=20 randomly selected data without cross validation for Hagood (Table 4). The RMSE values for fitting were 3.08 d, 4.54 d, and 441.6 kg ha⁻¹, or 5.1, 3.2, and 19.6% of actual anthesis, maturity, and yield, respectively (Table 9). The RMSEP for anthesis and maturity estimated by cross validation were similar to the RMSEP found by validation using an independent dataset (Table 5). The CROPGRO model explained 80% of the actual yield variation for the independent dataset for Hagood. Cross validation RMSEP values were 3.1 d, 5.3 d, and 448.2 kg ha⁻¹, or 5.1%, 3.2%, and 19.9% of observed anthesis, maturity, and yield, respectively. These values were almost the same as those obtained by independent validation (3.49 d, 5.29 d, and 442.2 kg ha⁻¹, respectively) (Table 5). Results confirm that cross validation was effective at estimating cultivar coefficients and prediction errors for the CROPGRO-Soybean model.
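The cross-validation idea used in this section can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a leave-one-out scheme (this section does not spell out the partitioning), and a one-coefficient stand-in model replaces CROPGRO and its coefficient-fitting procedure. The names `leave_one_out_rmsep`, `fit_scale`, `predict_scale`, and the `driver`/`observed` fields are hypothetical.

```python
import math

def leave_one_out_rmsep(environments, fit, predict):
    """Hold out each environment once, fit on the rest, predict the held-out one."""
    squared_errors = []
    for i, held_out in enumerate(environments):
        training = environments[:i] + environments[i + 1:]
        coefficients = fit(training)
        error = predict(coefficients, held_out) - held_out["observed"]
        squared_errors.append(error ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Stand-in "model" (assumption): a single coefficient scales a driver variable.
def fit_scale(training):
    """Least-squares estimate of the coefficient for observed = coefficient * driver."""
    numerator = sum(e["driver"] * e["observed"] for e in training)
    denominator = sum(e["driver"] ** 2 for e in training)
    return numerator / denominator

def predict_scale(coefficient, environment):
    return coefficient * environment["driver"]

# Hypothetical environments for illustration only; not trial data.
environments = [
    {"driver": 1.0, "observed": 2.1},
    {"driver": 2.0, "observed": 3.9},
    {"driver": 3.0, "observed": 6.2},
    {"driver": 4.0, "observed": 7.8},
]

cv_rmsep = leave_one_out_rmsep(environments, fit_scale, predict_scale)

# Fitting error for comparison: fit once on all environments, evaluate on the same data.
fitted = fit_scale(environments)
fit_rmse = math.sqrt(
    sum((predict_scale(fitted, e) - e["observed"]) ** 2 for e in environments)
    / len(environments)
)
print(f"fitting RMSE = {fit_rmse:.3f}, cross-validation RMSEP = {cv_rmsep:.3f}")
```

Because every environment is predicted by coefficients that were estimated without it, the resulting RMSEP plays the role of an independent-validation error while still using all available environments.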

Linear-Regression Approach for Simulating Specific Cultivars
Table 10 shows the results of the linear-regression analysis of Hagood variables vs. environment means. Although yield was described accurately using the linear equation, flowering and maturity dates had higher fitting errors than any results based on the crop model (Table 8). One possible reason for this poor performance is that photoperiod effects are highly nonlinear, so changing the planting date could greatly change the relationship of a specific variety to the average condition. Linear equations may not be able to fit this type of response. When cross validation was used to evaluate prediction errors for the linear regression–crop model approach, results were not acceptable for predicting flowering and maturity. The RMSEP values were 9.2 and 16.4 d for flowering and maturity, respectively. This was considerably higher than any of the predictions based on the crop model alone (Table 11). However, the yield prediction error for Hagood (469.9 kg ha⁻¹) was reasonably close to the value obtained by fitting cultivar-specific coefficients (443.2 kg ha⁻¹, Table 5). A comparison of the RMSE and RMSEP in Tables 10 and 11 clearly illustrates the importance of evaluating errors in prediction before applying models for such purposes. The error in predicting yield using the linear-regression approach was considerably higher than the fitting error.
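The fitting statistics reported for the linear-regression approach (intercept, slope, r², and RMSE) can be illustrated with an ordinary least-squares sketch. The data are hypothetical, the function names are illustrative, and the RMSE here divides by the number of observations, which is an assumption about the convention rather than something stated in this section.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def fit_statistics(x, y, intercept, slope):
    """Fitting RMSE and r-squared of the regression line on the data used to fit it."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    rmse = math.sqrt(ss_res / n)
    r_squared = 1.0 - ss_res / ss_tot
    return rmse, r_squared

# Hypothetical values (kg/ha) for illustration only; not trial data.
environment_mean_yield = [1800.0, 2200.0, 2600.0, 3000.0]  # independent variable
cultivar_yield = [1700.0, 2350.0, 2500.0, 3150.0]          # one cultivar's yield

intercept, slope = fit_line(environment_mean_yield, cultivar_yield)
rmse, r_squared = fit_statistics(environment_mean_yield, cultivar_yield, intercept, slope)
print(f"intercept = {intercept:.1f}, slope = {slope:.3f}, "
      f"r2 = {r_squared:.3f}, RMSE = {rmse:.1f} kg/ha")
```

These are fitting statistics only. As the comparison of RMSE and RMSEP in the text shows, the error on environments not used for fitting can be considerably larger and has to be estimated separately.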


Table 10 Errors in fitting for the Hagood variety using observed cultivar means for each site and year as the independent variable in the linear regression equation. Statistics include intercept, slope, r² of the cultivar regression line and root mean squared errors of the fitting (RMSE). N is the sample size.

 

Table 11 Evaluation of cross validation results for the Hagood variety using simulated cultivar means for each site and year as the independent variable in the linear regression equation. Statistics include r² of the cultivar regression line and root mean squared errors of prediction (RMSEP). N is the sample size.

 

    Conclusions
The variety trial data used in this study provided an excellent set of data for estimating cultivar-specific coefficients for the CROPGRO–Soybean model and for evaluating the errors in prediction at independent sites. Our results show that the generic MG VII cultivar coefficients currently being distributed with the model can simulate flowering, maturity, and yield with reasonable accuracy for specific soybean cultivars when accurate soil and weather data are available. However, prediction errors were reduced for all six cultivars in this study by estimating cultivar-specific coefficients from the variety trial data, as indicated by the use of independent data from the trials for validation. When only 14 environments were used to fit coefficients, prediction errors were considerably higher for soybean yield than when 40 or 20 randomly selected environments were used to fit them. Coefficient estimates were sensitive to number and choice of observations. Coefficients and prediction errors estimated by cross validation were similar to those estimated by breaking the data set into calibration and validation sets. The linear-regression approach of predicting specific cultivar anthesis and maturity dates using simulated MG values at each site did not accurately predict anthesis or maturity. The RMSEP for seed yield using the linear-regression approach was better than using generic MG coefficients but not as good as that found by fitting the soybean model coefficients. We conclude that soybean yield can be simulated for specific cultivars using either crop model or regression approaches, but the regression approach was not adequate for describing cultivar anthesis and maturity dates. Cross validation was an effective approach for estimating cultivar coefficients and prediction errors.


    NOTES
Florida Agric. Exp. Stn. Journal Series no. R-07040. This work was supported in part by Project no. 9223 of the United Soybean Board.

Received for publication August 20, 1999.
This article has been cited by other articles:

K. J. Boote, J. W. Jones, W. D. Batchelor, E. D. Nafziger, and O. Myers
Genetic Coefficients in the CROPGRO-Soybean Model: Links to Field Performance and Genomics
Agron. J., January 1, 2003; 95(1): 32-51.

S. M. Welch, J. L. Roe, and Z. Dong
A Genetic Neural Network Model of Flowering Time Control in Arabidopsis thaliana
Agron. J., January 1, 2003; 95(1): 71-81.

G. Hoogenboom and J. W. White
Improving Physiological Assumptions of Simulation Models by Using Gene-Based Approaches
Agron. J., January 1, 2003; 95(1): 82-89.

T. Mavromatis, K. J. Boote, J. W. Jones, G. G. Wilkerson, and G. Hoogenboom
Repeatability of Model Genetic Coefficients Derived from Soybean Performance Trials across Different States
Crop Sci., January 1, 2002; 42(1): 76-89.
