Agronomy Journal Grow Your Career With ASA
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


This Article
Right arrow Abstract Freely available
Right arrow Figures Only
Right arrow Full Text (PDF) Free
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Web of Science (24)
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Wallach, D.
Right arrow Articles by Aubertot, J.-N.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Wallach, D.
Right arrow Articles by Aubertot, J.-N.
Agricola
Right arrow Articles by Wallach, D.
Right arrow Articles by Aubertot, J.-N.
Related Collections
Right arrow Maize
Right arrow Crop Models
Right arrow Irrigation
Agronomy Journal 93:757-766 (2001)
© 2001 American Society of Agronomy

AGROCLIMATOLOGY

Parameter Estimation for Crop Models

A New Approach and Application to a Corn Model

Daniel Wallach*,a, Bruno Goffinetb, Jacques-Eric Bergeza, Philippe Debaekea, Delphine Leenhardta and Jean-Noël Aubertotc

a Unité d'Agronomie, INRA, BP 27, 31326 Castanet Tolosan Cedex, France
b Unité de Biométrie et d'Intelligence Artificielle, INRA, BP 27, 31326 Castanet Tolosan Cedex, France
c Laboratoire d'Agronomie, INRA INA P-G, 78850 Thiverval-Grignon, France

* Corresponding author (wallach{at}toulouse.inra.fr)

Received for publication February 22, 2000.

    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 RESULTS AND DISCUSSION
 CONCLUSIONS
 REFERENCES
 
The adjustment of the parameters in mechanistic crop models to field data, using an automatic procedure, is essential to ensure efficient and objective use of measured data. However, it is in general numerically impossible, and in any case undoubtedly unwise, to adjust all the model parameters to the measured data. There is currently no widely accepted solution to this problem. This paper proposes a new approach to parameter adjustment, and applies it to a model of corn growth and development. One begins by defining a criterion of model goodness-of-fit, which should be adapted to the goal of the modeling exercise, and a corresponding criterion of model prediction error. For the latter we propose a cross validation version of the goodness-of-fit criterion. In Step 1 of the algorithm, one orders the parameters according to how much each improves the goodness-of-fit of the model. In the second step, the number of parameters actually adjusted is chosen to minimize the prediction error criterion. This approach has the advantage of explicitly using prediction quality as a criterion. As a by-product, it leads to adjusting relatively few parameters (in our example, 3 out of the 26 potentially adjustable parameters), which considerably reduces the numerical problems. The procedure is quite straightforward to apply, although it does require substantial computing time.

Abbreviations: himax, model parameter, maximum harvest index • LAI, leaf area index • MSEP, mean squared error of prediction • p2logi, model parameter that appears in the logistic equation for leaf area index • r2hi, model parameter that describes the effect of water stress on harvest index


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 RESULTS AND DISCUSSION
 CONCLUSIONS
 REFERENCES
 
MECHANISTIC crop models have become increasingly important in recent years as teaching and research tools, and as proposed tools for crop management. Such models contain, in general, a fairly large number of parameters (several 10s or even 100 or more), where a parameter is identified by the fact that it takes the same fixed value for all site-years or possibly by group of site-years.

There is often some initial information about the parameter values, based on specific measurements for that parameter reported in the literature. However, even for measurable parameters, one is often confronted with a range of possible values, due to measurement error or due to real differences for different circumstances.

The accumulation of errors in the different parameters, in addition to possible errors in the model equations, can lead to model results that are quite far from measured field data. A solution to this problem is to calibrate the model, that is to estimate some or all of the model parameters from field data to improve the fit between model and data. Parameter estimation for these models is not, however, a straightforward regression problem. First of all, the data structure is often quite complicated. The field data may consist of measurements of several different variables such as yield, biomass, leaf area index (LAI), etc. with possible correlation between data for the same site or the same year. Secondly, it is in general numerically impossible to estimate all the parameters, and the numerical problems arise partly because of more fundamental problems, such as high correlation between parameter estimates. Thirdly, one would like to make use of the information from the literature, but it is not clear how to do this with a regression approach.

There has been relatively little work on parameter estimation for crop models. A common approach is trial and error. Various parameter values are tested, until a set of values is found that gives an acceptable fit to the data. It clearly would be a great advantage to have instead an automatic procedure for parameter adjustment. This would be simpler to implement than trial and error and would probably give better results. It would ensure that the data are always used in the same way for parameter estimation, which is also important. For example, one common method of estimating prediction error is by cross validation, and this requires a reproducible procedure for parameter estimation. Also, model comparison is complicated if the models not only have different equations, but also have different procedures for parameter estimation. For example, Batchelor et al. (1994) were able to objectively compare different models for pod detachment in peanut (Arachis hypogaea L.) because they used the same data, in the same way, to adjust the parameters for each model. Finally, it would allow one to explore in detail the relation between the data and the parameter values, and to consider such questions as the relative efficiency of different experimental designs for parameter estimation, as for example in Yapo et al. (1996). The purpose of this paper is to propose an automatic procedure for adjusting the parameter values of crop models to field data.

The problems faced here are essentially the same as for other complex dynamic models. The following discussion on parameter estimation found in the literature therefore draws heavily on work in other fields, and in particular on work concerning hydrological models.

A first problem to consider is which parameters are to be adjusted to field data. In simple cases, such as a model to predict flowering date (Grimm et al., 1993), it may be possible to estimate all the model parameters (five in this case). In general, however, this is not possible. One approach is to decide a priori on a small number of parameters to adjust. For example, Sievänen and Burk (1993) decided not to adjust parameters whose values could be measured directly and so were considered well-known. A different approach is to do a sensitivity analysis of the model, and to adjust the most sensitive parameters (Yan and Han, 1991; Gribb, 1996; Xevi et al., 1997; Van der Perk, 1998). Sumner et al. (1997) started with a set of three parameters to estimate, then added further parameters one at a time if they reduced residual variance. Hanson et al. (1999) defined the problem differently. They required that the model fit the data to within a fixed margin, and they adjusted as many parameters as necessary to do so.

Often, the argument for reducing the number of parameters to estimate is based on numerical considerations. The estimation algorithms simply are incapable of finding a global optimum if there are too many parameters. A more fundamental reason for limiting the number of estimated parameters is that estimating too many parameters can decrease model predictive quality. This can occur because estimating a parameter from field data has two antagonistic effects on model predictive quality. On the one hand, it will tend to improve predictions. The amount of improvement will depend on how far the literature value is from the optimal value, and on the sensitivity of the model predictions to the parameter. On the other hand, estimating a parameter introduces errors, related to the fact that the parameters are estimated from a finite amount of data, which are themselves not perfect. In general, the overall result is that it is worthwhile to estimate some, but not all, of the model parameters (Wallach and Génard, 1998; Wallach et al., 1990). The optimal number of parameters to estimate will depend, among other things, on the amount of data. Thus, Refsgaard (1997) stresses the importance of keeping the ratio between the number of adjusted parameters and the amount of independent field data at a reasonably low level.

A second problem is the choice of the criterion to optimize in fitting the calculated model to measured data. The criterion can be based on statistical considerations. One approach is generalized least squares, which leads to using the variance–covariance matrix to weight the different residuals in the expression that is to be minimized. In cases where there are true replicates, these may be used to estimate the variance–covariance matrix (Vold et al., 1999). In other cases, various simplifying assumptions can be made. For example, Sievänen and Burk (1993) assumed their data (tree diameter, tree height, number of trees, and height of crown) were independent but with different variances for each data type, and they estimated those variances. Gribb (1996) weighted each different data type by the inverse of the mean values, which can be thought of as an approximation to weighting by the inverse of the variance. Sumner et al. (1997) used a first-order autoregressive model to model the covariance between measurements over time. If one assumes that the model errors have a normal distribution, then a possible statistical criterion is the likelihood, which is to be maximized. In practice, this leads to minimizing the determinant of the variance–covariance matrix (for example, Yan and Han, 1991).

It is not clear, however, that statistical arguments are the most pertinent for choosing the criterion to optimize. The statistical approach relies on hypotheses about the distributions of the residuals that may be difficult to justify for complex situations. Furthermore, the modeling effort may have some specific goal (for example, prediction of yield) that is not reflected in the optimal properties that result from a rigorous statistical treatment (for example, minimum variance of the parameter estimators). This implies that statistical considerations should be used to suggest, rather than to impose, a criterion. Thus, Gupta et al. (1998) argued that in general there is no compelling statistical reason to choose a particular criterion. Rather, one is in general interested in various different aspects of the fit of the model to the data, and each aspect may lead to different parameter values. They seek, then, not a single optimum set of parameter values, but rather sets of parameter values, where each set is optimal for some aspect of the data. Yan and Han (1991) also noted that there is not a single definition of best model, it depends on the purpose of the model and the user's judgement. They considered four different outputs of their precipitation–runoff model, and showed that estimating parameters using one output may give quite poor results for a different output. They finally fit their model to a criterion that combines the different outputs. Yapo et al. (1996) found that different adjustment procedures (in their case, daily root mean square error or heteroscedastic maximum likelihood) gave different results, the first giving a better match for flood peaks and the second a better match to the entire range of flow events.

A third problem is that of a numerical algorithm for searching the parameter space for values that minimize the chosen criterion. Algorithms that have been tested include a modified Levenberg-Marquardt method (Olsthoorn, 1995), the shuffle complex evolution method, the multiple start simplex and the local simplex (Gan and Biftu, 1996; Yapo et al., 1996), a simulated annealing algorithm (Sumner et al., 1997), and variants of a genetic algorithm (Franchini and Galeati, 1997).

Hanson et al. (1999) adopted what might be called a cascade approach. First they adjusted some parameters to soil water balance, then other parameters to soil nutrient data, and finally a last set of parameters to plant production data. In this way, the number of parameters adjusted in each step is small. Van der Perk (1998) and Zhang and Lindström (1997) also use this approach.

We thus see that a number of different approaches to parameter estimation are possible. A fourth problem, then, is what criterion to use for choosing between different adjustment approaches. One possibility is simply to use the procedure that gives the best fit of the model to the data, as measured for example by the coefficient of correlation (Grimm et al., 1993; Van der Perk, 1998). A problem here is that one is usually interested in reducing prediction error, which is not the same as adjustment error. In particular, adjusting additional parameters will always reduce adjustment error but may increase prediction error. Prediction error can be estimated by splitting the data in some way, using part for adjustment and part for evaluation. For example, Vinten and Redman (1990) estimated parameter values based on daily soil water and total drainage, while evaluation was based on daily leaching. Gan and Biftu (1996) adjusted parameters to part of the time series of data from a catchment, and then evaluate the model using a different time period. Refsgaard (1997) also used this type of data splitting, but in addition tested the model for other locations. Batchelor et al. (1994) used cross validation to estimate mean squared error of prediction (MSEP), smaller values of MSEP being better. A different criterion of quality is the acceptability of the estimated parameter values. It is usually demanded that they be reasonable, based on what is known elsewhere. Sievänen and Burk (1993) adopt this criterion.

The purpose of this paper is to propose and illustrate the use of a new algorithm for parameter adjustment for crop models. The approach here borrows a number of ideas as described above. The basic difference is that an estimate of prediction error is used to decide how many parameters to adjust. This is important for the quality of the resulting model, which is now specifically chosen for predictive quality. It also is convenient computationally, since it limits the number of parameters to be adjusted simultaneously. We note that procedures including cross-validation for estimating prediction error have been proposed in the statistical literature for the problem of choosing how many parameters to adjust (for example, Stone and Brooks, 1990).

To illustrate how the proposed algorithm is applied and to demonstrate that it is applicable to crop simulation models, we apply it to a new corn (Zea mays L.) model. This particular application is of practical interest, because this model is part of a project aimed at simulating irrigation strategies for corn.


    MATERIALS AND METHODS
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 RESULTS AND DISCUSSION
 CONCLUSIONS
 REFERENCES
 
Model
The model used is fairly similar to the one described in Muchow et al. (1990) and Muchow and Sinclair (1991). The model calculates crop growth and development and soil water on a daily basis. Biomass accumulation is based on the general interception/conversion concept (Monteith, 1972, 1977), and grain yield is calculated as final aboveground biomass times a harvest index (Sinclair et al., 1990; Muchow et al., 1990; Muchow and Sinclair, 1991). The effect of water on growth is taken into account. Nitrogen is assumed not to be a limiting factor, and is not included in the model. The state variables of the model are listed in Table 1, the input variables are given in Table 2, and the adjustable parameters in Table 3. The initial values for the parameters (Table 3) are based on information from the literature or expert opinion, but not on the data set used for adjustment. The model equations are given in the appendix.


View this table:
[in this window]
[in a new window]
 
Table 1. State variables in the model.

 

View this table:
[in this window]
[in a new window]
 
Table 2. Input variables in the model.

 

View this table:
[in this window]
[in a new window]
 
Table 3. Adjustable parameters in the model.

 
Data
A large data set related specifically to the effect of irrigation on corn in southwestern France was collected for the purpose of parameter adjustment. Only experiments where N was supposed to be nonlimiting were included. The data span the period from 1986 to 1997. Sixteen different sites are represented, with soil types representative of soils in southwestern France. Maximum available water in the profile at these sites ranged from 40 to 588 mm. Overall, the data come from 32 different site-years, and represented a total of 181 situations (that is, combinations of site, year, and irrigation treatment).

For all situations, the data set includes the input variables necessary for running the model (Table 2). Grain yield measurements are available for all, and final biomass measurements for almost all situations. In addition, there are multiple LAI measurements for 64 situations and multiple biomass measurements for 44 situations.

Based on the individual values for the replications in these experiments we can calculate average variances associated with each output variable. This gives {sigma}2y = 0.10 2, {sigma}2b = 0.30 2, and {sigma}2l = 0.038 for grain yield, biomass, and LAI, respectively.

Parameter Estimation Method
The proposed parameter adjustment algorithm has two steps. In the first step, an ordered list of parameters is created. The parameters are ordered according to how much they improve the fit of the model to the measured data, as explained below. In the second step, we determine how many parameters will be adjusted, based on model predictive quality.

To implement the algorithm, one must decide on a measure of goodness of fit for Step 1, and on what aspect of predictive quality will be estimated for Step 2. This choice of criteria is done in a preliminary step (Step 0). In this step we also define other criteria that are of interest for judging the quality of the models with different sets of parameter values.

Step 0. Criteria of Fit and of Predictive Quality
The choice of criteria is important, and should be adapted to the goal of the modeling exercise. The criteria presented below are adapted to our example. They are not meant to be universally applicable, but many of the underlying considerations that led us to choose these criteria will also apply in other cases.

For our example we use the following goodness-of-fit criterion:

(1)

In the first term, yi is measured grain yield for situation i, yci is the corresponding grain yield calculated with the model, N is the total number of situations, and wy is the weighting factor for grain yield errors. The sum is over all situations. In the second term bij is measured biomass at the jth biomass measurement date for situation i, bcij is the corresponding value calculated with the model, nbi is the total number of biomass measurement dates for situation i, and wb is the weighting factor for biomass error. The sum over i is only over those situations for which there is at least one biomass measurement. The number of such situations is Nb. The third term is analogous to the second, but for errors in LAI. The weightings we use are wy = {sigma}-2y, wb = {sigma}-2b, and wl = {sigma}-2l. Adjusting parameters means finding the values of those parameters that minimize Ccomb.

A first choice involved here is which model outputs are to be adjusted. We chose to adjust to all the data types available in the data set, that is grain yield, aboveground biomass, and LAI. The second possibility was to adjust to the particular data type that is of major interest. Since we intend to use the corn model to test different irrigation strategies, it is particularly important that the model give good predictions of how grain yield varies with a change in irrigation strategy, everything else being held constant. Specifically, this implies that we are interested in the prediction of yield loss, defined as the yield difference between an irrigation strategy and a reference strategy for the same site-year. We chose the first possibility because we felt that a model that does poorly for any data type has a basic problem, and probably will not give good predictions of grain yield loss. At the end of the parameter adjustment procedure we evaluated the prediction error for grain yield loss, and we also tested other criteria as explained below, to see if better grain yield loss predictions could be obtained.

A second choice concerns the relative weightings of the different available data. When a site-year has multiple biomass or LAI measurements, the site-year contribution to the criterion is the average (rather than the total) squared error for that site-year. This is done to avoid having a site-year with very numerous measurement dates dominate the criterion. In the above criterion the grain yield, aboveground biomass and LAI terms are each weighted by the inverse of the associated measurement variance, which is what one would use for independent data with different variances in generalized least-squares. As mentioned above, we tested these choices, at least partially, at the end of the procedure.

The criterion for model predictive quality has the same structure as the goodness-of-fit criterion, but the calculated values are based on cross-validation estimates of the parameters. The criterion then is:

(2)

The notation yci indicates the grain yield for situation i calculated using cross validation, that is where the model parameters are adjusted to the data independent of situation i. The meaning of bcij and lcik is analogous. The first sum is over all situations, the second and third sums are again only over those situations for which there is at least one biomass measurement, or one LAI measurement, respectively.

Cross validation is a common approach to estimating prediction error. The data are split into two parts; one part has only a single situation (the target situation), the other part has all the data independent of the target situation. In our case, the second part of the data only includes situations that are neither from the same site nor from the same year as the target situation. The model parameters are adjusted using the second part of the data, and the resulting model is used to predict the results for the target situation. In this way, the predictions are for a situation that was not used for parameter estimation. The procedure is repeated using every situation in turn as the target situation. Averaging the errors over all the target situations gives the overall estimate of prediction error. Note that the data used for parameter estimation are the same for all target situations in the same site-year.

To limit the amount of computation, the choice of which parameters to adjust is done just once, based on all the data, and is not repeated. It is only the calculation of the adjusted values for the chosen parameters that is repeated for each target situation.

In addition to the criterion MSEPcomb, it is also convenient to define other prediction criteria, which are not an integral part of the algorithm but that will help to judge the final model. Specifically, we will evaluate the following criteria with the final model:

(3)

(4)

(5)

(6)

The first criterion measures how well grain yield loss related to irrigation is predicted. Here yrefi is measured grain yield for the reference treatment for situation i, and yrefci is the corresponding calculated value. As reference treatment for situation i, we choose the treatment in the same site-year that received the maximum amount of irrigation water. The other criteria correspond respectively to mean squared errors of prediction for grain yield, biomass, and LAI.

Step 1. Which Parameters?
The parameters are ordered according to how much Ccomb is reduced when that parameter is adjusted. The procedure is analogous to forward regression. First each parameter is adjusted to the data individually to minimize the criterion Ccomb, while the other parameters all keep their initial values. The parameter that leads to the smallest value of Ccomb is the first parameter in the list. Next, all combinations of the best single parameter with another parameter are adjusted to Ccomb. The best second parameter is added to the list, and so on. The result of this step of the adjustment procedure can be summarised in a table like Table 4, which shows the best one, two, three, and four parameters to adjust, as well as their adjusted values. We will see below that it is not necessary in our example to go beyond four adjusted parameters.


View this table:
[in this window]
[in a new window]
 
Table 4. Best 1, 2, 3, or 4 parameters to adjust to minimize Ccomb, the criterion that combines yield, biomass, and leaf area index errors. For comparison, the case of no adjusted parameters is included. The best final model (smallest value of MSEPcomb) is that with three adjusted parameters. For each set of adjusted parameters, the root mean squared prediction errors for grain yield loss related to irrigation (MSEPyloss), for yield (MSEPy), for biomass (MSEPb), and for leaf area index (MSEPl) are shown.

 
The forward regression type approach is important for reducing the computational burden. In our example there are just 23 four-parameter models to examine with this approach, out of the total of 15600 possible four-parameter models.

Minimization of the criterion was done using NAG routine EO4JAF, a quasi-Newton algorithm (Numerical Analysis Group, 1995).

Step 2. How Many Parameters?
We now estimate the prediction error MSEPcomb for the best models with one, two, etc. adjusted parameters. Thus, in the example we carry out four estimations of prediction error, corresponding to the four columns in Table 4. The model that is finally chosen is that with the smallest prediction error.

Evaluation of Criteria
The criteria pair Ccomb and MSEPcomb that we propose in Step 0 combine the different data types for which we have measurements. It is of interest to compare the results with results based on other criteria pairs. In our example, we use for comparison the pairs (Cyloss, MSEPyloss), (Cy, MSEPy), (Cb, MSEPb), and finally (Cl, MSEPl), where:

(7)

(8)

(9)

(10)

The MSEP criteria are those of Eq. [3] to [6].

Applying the algorithm with the pair Cyloss and MSEPyloss means that the model is specifically adjusted to the grain yield loss data, and the number of parameters is chosen to minimize prediction error for yield loss. The other pairs of criteria correspond to fitting the model specifically to the grain yield, biomass, and LAI data, respectively, and minimizing prediction error for each of those types of data.

The value of MSEPyloss for the best model based on the criteria Cyloss and MSEPyloss shows how well one can do in predicting yield loss when the adjustment algorithm is specifically based on the grain yield loss data. This is to be compared to the value of MSEPyloss for the best model based on Ccomb and MSEPcomb. If the MSEP values are comparable, the conclusion is that the combined criteria pair is acceptable for yield loss. Similarly, MSEPy, MSEPb, and MSEPl for the best model based on Ccomb and MSEPcomb are compared to the corresponding values for the best models based on Cy and MSEPy, Cb and MSEPb, and Cl and MSEPl.


    RESULTS AND DISCUSSION
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 RESULTS AND DISCUSSION
 CONCLUSIONS
 REFERENCES
 
Table 4 summarizes the results for the criteria pair Ccomb and MSEPcomb. The prediction criterion MSEPcomb has a minimum for three adjusted parameters. It is thus the model with three adjusted parameters that is chosen as the best model, and that we will use in future work on testing irrigation strategies. To identify the minimum in this case, it is not necessary to adjust more than four parameters simultaneously.

Table 5 shows the best models based on the criteria related to grain yield loss, yield, LAI, and biomass. To obtain the results in the first column, for example, we first created an ordered list of parameters according to how much they improved the goodness of fit for grain yield loss, Closs. Then the number of parameters to adjust was chosen to minimize prediction error for grain yield loss, MSEPloss. The other columns correspond to fitting the model and minimizing prediction error for grain yield, biomass, and LAI. Depending on the criteria, models with different sets of parameter values are obtained. This emphasizes the fact that the procedure does not calculate true parameter values, but simply the parameter values that minimize the chosen criteria. The grain yield loss criteria lead to {surd}MSEPyloss = 1.38 t/ha, the grain yield criteria to {surd}MSEPy = 1.49 t/ha, the biomass criteria to {surd}MSEPb = 2.23 t/ha, and the LAI criteria to {surd}MSEPl = 0.90. The best model based on (Ccomb, MSEPcomb) has corresponding {surd}MSEP values that are very close to these values (Table 4, last four rows, column for three adjusted parameters). That is, the combined criteria lead to predictions nearly as good as those obtained using criteria that are specifically tailored to each type of data. We thus conclude that the weightings in the combined criterion are acceptable.


View this table:
[in this window]
[in a new window]
 
Table 5. Best models based on criteria related to yield loss, yield, biomass, or leaf area index. In each case the choice of parameters to adjust was based on the goodness-of-fit criterion C, and the number of parameters to adjust was based on the prediction error MSEP. For each set of adjusted parameters, the root mean squared prediction errors for yield loss (MSEPyloss), for yield (MSEPy), for biomass (MSEPb), and for leaf area index (MSEPl) are shown.

 
Table 5 shows that using criteria based on a single data type can give poor results for other data types. For example, the best model based on grain yield loss criteria has an estimated {surd}MSEP for yield of 2.1 t/ha, which is substantially larger than the value of 1.5 t/ha obtained with the combined criterion. Thus, the use of combined criteria is important if one seeks good predictions for several data types.

The pairs of initial and final values for the adjusted parameters of the best model (Table 4) are (0.01, 0.0093) for p2logi, (0.8, 0.00361) for r2hi, and (0.55, 0.48) for himax. The parameter p2logi is related to the exponential rate of increase in LAI. The adjusted value is quite close to the initial value. The parameter himax is the maximum harvest index. The adjusted value of 0.48 is smaller than the initial value, and is in fact smaller than the measured harvest index values for certain situations in the data set. According to the model, only water stress can alter harvest index. There are certainly other effects, which are not taken into account. The result is that for prediction, the best value for maximum harvest index is in fact the average harvest index of the treatments without a water stress effect. This is a choice in the model. A different choice would be to try to describe harvest index in more detail, but that would involve introducing extra equations, each with some accompanying uncertainty. In any case, this emphasizes the difficulty in relating the adjusted parameter values to crop characteristics that can be measured independently. While the model parameter is identified as "maximum harvest index," the best value for prediction is in fact an average harvest index for the treatments with no simulated water stress.

The parameter r2hi represents the value of the fraction of available water below which harvest index is affected, during the critical period of grain formation. The initial value was chosen to reflect the hypothesis that even moderate water stress during this period reduces harvest index. The adjusted value of r2hi is in contradiction to this hypothesis. The adjusted value of r2hi = 0.00361 means that as long as available soil water does not fall to very small levels, there is no effect on harvest index. The result is that according to the model with adjusted parameters, all situations in the data set attain maximum harvest index or nearly so. In fact, there are some situations in the data base that have harvest index values much smaller than the value of himax. This result clearly indicates a problem with the model. One possibility is that the precise formulation in the model is inappropriate. The model assumes that it is only in the period dmax to dcri that harvest index is affected by water stress, and that the reduction in harvest index depends on the average stress factor over this period. In future work, we will examine how model behavior varies if these assumptions are changed.

Before parameter adjustment, the model severely underestimated grain yield, and severely overestimated grain yield loss, for a fairly large group of situations (Fig. 1A, 2A). The best adjusted model largely corrects this tendency (Fig. 1B, 2B). The estimated prediction error is 40% less both for grain yield and for grain yield loss with the adjusted model compared with the unadjusted model.



View larger version (25K):
[in this window]
[in a new window]
 
Fig. 1. Calculated vs. observed yields, for (A) initial parameter values and for (B) the best model as given in Table 4. The 1:1 line is shown. For the model with no adjusted parameter values, Cy = MSEPy by definition.

 


View larger version (25K):
[in this window]
[in a new window]
 
Fig. 2. Calculated vs. observed yield losses, for (A) initial parameter values and for (B) the best model as given in Table 4. The 1:1 line is shown. For the model with no adjusted parameter values, Cyloss = MSEPyloss by definition.

 

    CONCLUSIONS
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 RESULTS AND DISCUSSION
 CONCLUSIONS
 REFERENCES
 
We propose a procedure for adjusting the parameters of a crop simulation model to field data. First the parameters are ordered according to how much each improves the fit of the model to the measured data, and then the number of parameters to adjust is based on an estimate of prediction error. The criteria of goodness-of-fit and of prediction quality are chosen in accordance with the goals of the modeling exercise, and the available data.

The adjustment procedure does not require any specific model structure. Also, it is computationally feasible, even if the model has a large number of parameters. In our example, with a total of 26 adjustable parameters, we had to adjust simultaneously at most four parameters. This number will not be the same for other models, but the use of prediction error as a criterion will probably lead to adjusting relatively few parameters in other models as well. It should be noted, however, that it is still important to have efficient code, as emphasized by Villeneuve et al. (1986). Our example required overall more then 1 million model runs. To make this possible, we wrote the code so that all input files were read only once, at the beginning of the procedure. For other crop models, some reworking of the code will probably be necessary to reduce computing time for the algorithm to manageable proportions, but increasing computer speed will reduce the problem.

The procedure is also very general in the types of data that can be accommodated. In our example grain yield, biomass, and LAI measurements were all used. If other data types had been available, the combined criterion could have been modified to include them.

Prior information about the parameters, not based on the data set used for adjustment, is used in our procedure to determine the initial values of the parameters. Most of the parameters retain these initial values. The adjusted parameters on the other hand get new adjusted values, which are independent of the initial values. This is thus an all-or-nothing use of the prior information. One can imagine making more use of the prior information, especially if one had upper and lower limits, or probability distributions, for the parameter values. However, it should be remembered that the relation between a model parameter and a measurable characteristic is not always clear. It is thus not clear how useful prior information can be.

The limitations of parameter adjustment for model improvement must be reemphasized. Obviously the equations used in the model are also essential, and parameter adjustment can only compensate very partially for errors in these equations. Also, the adjusted parameter values obviously depend on the data set used for adjustment. If these data are not representative of conditions where the model will be used, then the adjusted model may not be a good predictor for those conditions. Finally, it is not possible to know with certainty which parameter values really give the best predictions. The algorithm presented here only gives an approximation, albeit hopefully a rather good approximation, to the best parameter values for prediction.

There are nonetheless important advantages to using an automatic parameter adjustment algorithm. It can simplify and improve parameter estimation and it can help separate problems and questions related to parameter estimation from those related to the model equations. In particular, we plan to use this estimation algorithm to study three specific questions related to model development: (i) What is the effect of uncertainty in the initial parameter values on model predictive quality (Wallach and Génard, 1998)? (ii) How does model predictive quality depend on the structure of the data set? (iii) What is the effect of increasing the number of parameters, due to increasing the complexity of the model, on model predictive quality?

MODEL EQUATIONS
The model calculations begin on 1 January, at which date soil moisture is initialized (see Table 1). It is found that the simulation results are insensitive to the assumed soil water at this date, because in any case maximum available water is usually attained at some time during the winter in southwestern France, where the model is to be used. Once this occurs, subsequent results are independent of the initial soil water. Thermal time calculations begin on sowing date, and thermal time is accumulated until emergence occurs. At that point thermal time is reinitialized to zero, and the crop state variables are initialized as shown in Table 1.

Thermal Time and Plant Development
The increment of thermal time on calendar day d, {Delta}TTd, is

(11)
where the functions max and min evaluate, respectively, to the maximum and the minimum of their arguments.

Seven stages of plant development are identified: sowing, emergence, maximum LAI, flowering, end of critical step for grain abortion, beginning of rapid leaf senescence and maturity.

The thermal times between stages are noted TT with a subscript to indicate the stages. We fix ttsow-eme = 80°C day, ttmax-flo = 90°C day, ttflo-cri = 250°C day, and ttcri-sen = 245°C day. The values of tteme-max and ttsen-mat are variety dependent. They are calculated from the values for ttsow-max and ttsow-mat given in the corn variety catalogue (AGPM, 1999) as tteme-max = ttsow-max - ttsow-eme and ttsen-mat = ttsow-mat - ttsow-eme - tteme-max - ttmax-flo - ttflo-sen. The calendar day when each stage is reached is noted d with a stage subscript, for example dmat.

Leaf Area Index
We assume that in the absence of water stress, leaf area per plant on day d, lplantd, is described by a logistic equation as a function of thermal time:

(12)

The increase in leaf area per plant on day d in the absence of water stress, {Delta}lplantd, is then calculated as a difference:

(13)

(14)

The increase in leaf area index (LAI) on day d, {Delta}LAId, is then

(15)
where reduc(ATPT; r1sf, r2sf) is a reduction function due to water stress (Muchow and Sinclair, 1991). Here the reduction depends on ATPTd, the ratio of actual to potential transpiration on day d (see below). The reduction function is defined as

The fraction of LAI that is senescent on day d, FSENd, is given by

(16)

This equation is based on Muchow and Carberry (1989), but modified so that the fraction of senescent leaf area is the same for all varieties at maturity. Active LAI is then

(17)

Root Depth
The increase in root depth on day d, {Delta}Rd, is proportional to thermal time increase, but cannot exceed the larger of maximum root depth or soil depth. Thus

(18)

Aboveground Dry Matter
Biomass accretion depends on intercepted radiation and radiation use efficiency (rue). The radiation use efficiency depends on water stress (Muchow and Sinclair, 1991) and development stage. The increase in aboveground dry matter on day d, {Delta}Bd, is then

(19)
with rue = rue1 for TTd <= tteme-sen and rue = rue2 for TTd > tteme-sen.

Harvest Index
Potential harvest index is calculated as a function of water stress during the period around flowering, from dmax to dcri:

(20)

The increase in harvest index each day, {Delta}Hd, is then

(21)

(22)
Yield is calculated on day dmat as

(23)

Soil Processes
In the model, the soil is divided into four layers. Layer 1 is a thin superficial layer from 0- to 2-cm depth used to determine total soil water evaporation (Van Keulen, 1975), Layer 2 goes from 2 to 30 cm (approximate plowing depth), Layer 3 from 30 cm to root depth Rd, and Layer 4 from Rd to soil depth. As indicated, the positions of Layers 3 and 4 vary with time as root depth increases. If Rd < 30 cm, then Layer 3 does not exist (thickness 0), and if Rd >= soil depth, then Layer 4 does not exist.

Let {theta}FCi,d and {theta}WPi,d represent, respectively, volumetric water content of Layer i at field capacity and at wilting point. These values depend on the day d for Layers 3 and 4 because the depths included in those layers vary with time. The thickness of layer i on day d is noted THi,d. The water content of layer i on day d as an equivalent depth of water is noted Si,d. Then we have

(24)

(25)

(26)

Soil Water Evaporation
Potential evapotranspiration (ETPd) is divided among potential evaporation (PEd) and potential transpiration (PTd) (Ritchie, 1972) as:

(27)

(28)
Following Van Keulen (1975), actual soil water evaporation is a function of the fraction of maximum available water in Layer 1:

(29)

This soil water evaporation is divided among the two top layers. The relative losses for Layers 1 and 2 are

(30)

(31)
The amount of soil water evaporation from Layers 1 and 2, AE1,d and AE2,d respectively, is then

(32)

(33)

Transpiration
Actual total transpiration on day d, ATd, is given by

(34)
where reduc(FAW23; r1tran, r2tran), the reduction factor, depends on the fraction of maximum available water in the Layers 2 and 3, FAW23:

(35)
The ratio of actual to potential transpiration is ATPTd:

(36)
Transpiration demand is distributed between Layers 2 and 3 in proportion to the thickness of each. The relative demands from Layers 2 and 3, TD2,d and TD3,d, are respectively

(37)

(38)
The transpiration loss from Layers 2 and 3 is denoted AT2,d and AT3,d. If the available water in both layers is sufficient to meet demand, then

(39)

(40)
If not, then the shortfall from the layer with insufficient water is made up from the other layer.

Water Budget
At the end of day d, the water contents of the four soil layers (i) are updated in turn. We use an intermediate variable STEMPi,d, which is equal to the previous day's water content plus today's additions, INPUTi,d, minus today's soil water evaporation and transpiration:

(41)

(42)

(43)
with

(44)

(45)

The above operations concern the soil layers as defined at the start of day d. Because the thickness of the soil Layers 3 and 4 can evolve with time, it is necessary at the end of each day to calculate new values of THi,d, {theta}FCi,d, {theta}WPi,d, and finally Si,d for the new thicknesses.


    ACKNOWLEDGMENTS
 
We are thankful to the AGPM (Association Générale des Producteurs de Maïs), the ITCF (Institut Technique des Céréales et Fourrages), the CACG (Compagnie d'Aménagement des Coteaux de Gascogne), and Agrotransfert Poitou-Charentes for providing the data in our data set. We also gratefully acknowledge the substantial help of Nicole Bosc with the computer work.


    REFERENCES
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 RESULTS AND DISCUSSION
 CONCLUSIONS
 REFERENCES
 




This article has been cited by other articles:


Home page
J Exp BotHome page
J. L. Bot, C. Benard, C. Robin, F. Bourgaud, and S. Adamowicz
The 'trade-off' between synthesis of primary and secondary compounds in young tomato leaves is altered by nitrate nutrition: experimental evidence and model consistency
J. Exp. Bot., November 1, 2009; 60(15): 4301 - 4314.
[Abstract] [Full Text] [PDF]


Home page
jashsHome page
I. Grechi, N. Hilgert, M. Genard, and F. Lescourret
Assessing the Peach Fruit Refractometric Index at Harvest with a Simple Model Based on Fruit Growth
J. Amer. Soc. Hort. Sci., March 1, 2008; 133(2): 178 - 187.
[Abstract] [Full Text] [PDF]


Home page
J Exp BotHome page
B. Wu, M Genard, P Lobit, J. Longuenesse, F Lescourret, R Habib, and S. Li
Analysis of citrate accumulation during peach fruit development via a model approach
J. Exp. Bot., July 1, 2007; 58(10): 2583 - 2594.
[Abstract] [Full Text] [PDF]


Home page
J Exp BotHome page
M Genard, N Bertin, C Borel, P Bussieres, H Gautier, R Habib, M Lechaudel, A Lecomte, F Lescourret, P Lobit, et al.
Towards a virtual fruit focusing on quality: modelling features and potential uses
J. Exp. Bot., March 1, 2007; 58(5): 917 - 928.
[Abstract] [Full Text] [PDF]


Home page
Plant Physiol.Home page
M. Genard and B. Gouble
ETHY. A Theory of Fruit Climacteric Ethylene Emission
Plant Physiology, September 1, 2005; 139(1): 531 - 545.
[Abstract] [Full Text] [PDF]


Home page
Agron. J.Home page
M. Donatelli, M. Acutis, G. Bellocchi, and G. Fila
New Indices to Quantify Patterns of Residuals Produced by Model Estimates
Agron. J., May 1, 2004; 96(3): 631 - 645.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow Figures Only
Right arrow Full Text (PDF) Free
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Web of Science (24)
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Wallach, D.
Right arrow Articles by Aubertot, J.-N.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Wallach, D.
Right arrow Articles by Aubertot, J.-N.
Agricola
Right arrow Articles by Wallach, D.
Right arrow Articles by Aubertot, J.-N.
Related Collections
Right arrow Maize
Right arrow Crop Models
Right arrow Irrigation


HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
The SCI Journals Crop Science Vadose Zone Journal
Journal of Natural Resources
and Life Sciences Education
Soil Science Society of America Journal
Journal of Plant Registrations Journal of
Environmental Quality
The Plant Genome