a National Institute of Agro-Environmental Sciences, 3-1-1 Kannondai, Tsukuba, Ibaraki 305-8604, Japan
b Rice FACE Project, Japan Science and Technology Corp., National Institute of Agro-Environmental Sciences, 3-1-1 Kannondai, Tsukuba, Ibaraki 305-8604, Japan
clasman{at}niaes.affrc.go.jp
| ABSTRACT |
|---|
Abbreviations: CV, coefficient of variation; KS, experiment location in Kansas; LCS, lack of correlation weighted by the standard deviations; MD, mean of the deviations; MSD, mean squared deviation; MSV, mean squared variation; NE, experiment location in Nebraska; NY, experiment location in New York; RMSD, root mean squared deviation; RMSE, root mean squared error; SB, squared bias; SDm, standard deviation of the measurement; SDs, standard deviation of the simulation; SDSD, squared difference between standard deviations
| INTRODUCTION |
|---|
The correlation-regression approach is very common in fitting an empirical model to data obtained from experiments or surveys. The model parameters are adjusted to give the best fit of the empirical model to the measurements. Software packages are readily available to plot the data and fit the model. This approach is hence familiar and convenient for most scientists, which may be why it is also adopted for comparing calculated values with measurements when the model is mechanistic rather than empirical. As shown below, however, regression is not ideal for this type of comparison, where the concern is the comparison between the calculated values and the measurements rather than the fitting of the model to the measurements.
Henceforth, the simulated value for a growth trait is denoted as x, and the measured value is denoted as y. It is assumed that y is the sum of the true mean (µ) and the random error (ε) associated with the measurement, namely

$$ y = \mu + \varepsilon \tag{1} $$
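In regressing y on x, a linear relationship is assumed between x and µ, namely

$$ \mu = a x + b \tag{2} $$

and the null (H0) and alternative (H1) hypotheses are

$$ H_0: a = 1 \text{ and } b = 0 \tag{3} $$

$$ H_1: a \neq 1 \text{ or } b \neq 0 \tag{4} $$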
These hypotheses can be translated into the relationships between x (simulated value) and µ (true mean value)
$$ H_0: \mu = x \tag{5} $$

$$ H_1: \mu = a x + b \quad (a \neq 1 \text{ or } b \neq 0) \tag{6} $$
By contrast, in the direct comparison between x and y, the null (H0) and alternative (H1) hypotheses are
$$ H_0: \mu = x \tag{7} $$

$$ H_1: \mu \neq x \tag{8} $$
The difference between the regression and the direct comparison is in the alternative hypotheses (Eq. [6] vs. [8]). The regression analysis assumes the linear relationship between x and µ under the alternative as well as the null hypothesis, but this assumption is not guaranteed and should not be taken for granted. If each measurement is based on replicated measurements, the variance of the error term (ε of Eq. [1]) can be estimated independently of the assumption of the linear relationship (Draper and Smith, 1981, p. 33-38). The error variance is then used to test the assumption. If the linear assumption (Eq. [2]) is rejected, the linear regression is inadequate. A curvilinear relationship may be sought, but it is possible that no continuous function fits the relationship between x and y. Note, however, that the user's concern lies more in the comparison between x and y than in the functional relationship between the two. The direct comparison between x and y can always be made by testing the equality hypothesis (Eq. [7]) against the nonequality hypothesis (Eq. [8]).
A more relevant criterion for the direct comparison than regression is the deviation (d) of the model output (x) from the measurement (y), namely
$$ d = x - y \tag{9} $$
When the comparison is made for n measurements, d can be computed for each measurement, namely
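$$ d_i = x_i - y_i \quad (i = 1, 2, \ldots, n) \tag{10} $$

A commonly used criterion for the overall deviation is the RMSD, namely

$$ \mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2} \tag{11} $$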
Another commonly used criterion is the MD, namely
$$ \mathrm{MD} = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i) = \bar{x} - \bar{y} \tag{12} $$
In the literature, RMSD is often referred to as root mean squared error (RMSE) (Retta et al., 1996), and MD is often called bias (Retta et al., 1996; Jamieson et al., 1998). Of these two statistics, RMSD represents the mean distance between simulation and measurement; MD is the difference between the means of simulation and measurement. Root mean squared deviation and MD thus represent different aspects of the overall deviation, but the relationship between the two has not been well defined.
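As a small illustration of how the two statistics capture different aspects of the deviation, the following sketch computes RMSD (Eq. [11]) and MD (Eq. [12]) for a made-up data set. The function names and the yield values are hypothetical and chosen only so that the deviations cancel in the mean.

```python
import math


def rmsd(sim, obs):
    """Root mean squared deviation between simulated and measured values."""
    n = len(sim)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sim, obs)) / n)


def md(sim, obs):
    """Mean of the deviations, equal to mean(sim) - mean(obs)."""
    n = len(sim)
    return sum(x - y for x, y in zip(sim, obs)) / n


# Hypothetical values, invented for illustration only.
simulated = [5.0, 6.0, 7.0, 8.0]
measured = [4.0, 7.0, 6.0, 9.0]

print(rmsd(simulated, measured))  # 1.0
print(md(simulated, measured))    # 0.0
```

Here every simulated value is off by one unit, yet MD is zero because the over- and underestimates cancel, whereas RMSD still reports the mean distance.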
In the literature, these deviation-based statistics are often used in conjunction with correlation and regression coefficients (Addiscott and Whitmore, 1987; Retta et al., 1996; Kiniry et al., 1997; Jamieson et al., 1998). Although these different statistics may represent somewhat different aspects of the model-measurement discrepancy, it is not clear how they relate to each other, or whether together they cover all aspects of the discrepancy sufficiently. It is also noteworthy that, as shown before, the deviation-based statistics (e.g., RMSD) and the correlation-based statistics (e.g., the correlation coefficient) are not really consistent with each other in their assumptions.
Our objective is to present a framework for the simulation vs. measurement comparison. The framework is based on the deviation (Eq. [9] and [10]), yet includes the correlation coefficient as a constituent.
| Derivation of the method |
|---|
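The mean squared deviation (MSD) is defined as

$$ \mathrm{MSD} = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2 \tag{13} $$

Mean squared deviation is the square of RMSD (Eq. [11]); i.e., MSD = RMSD². The lower the value of MSD, the closer the simulation is to the measurement. The MSD can be partitioned into two components, namely

$$ \mathrm{MSD} = (\bar{x} - \bar{y})^2 + \frac{1}{n}\sum_{i=1}^{n}\left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2 \tag{14} $$

where $\bar{x}$ and $\bar{y}$ are the means of xi and yi (i = 1, 2, ..., n), respectively. The first term of the right side of Eq. [14] represents the bias of the simulation from the measurement and is denoted as SB, namely

$$ \mathrm{SB} = (\bar{x} - \bar{y})^2 \tag{15} $$

Squared bias is the square of the MD (Eq. [12]); i.e., SB = MD².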
The second term of the right side of Eq. [14] is the difference between the simulation and the measurement with respect to the deviation from the means (i.e., xi - $\bar{x}$ and yi - $\bar{y}$) and is denoted as mean squared variation (MSV), namely

$$ \mathrm{MSV} = \frac{1}{n}\sum_{i=1}^{n}\left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2 \tag{16} $$
A bigger MSV indicates that the model failed to simulate the variability of the measurement around the mean. Note that these two components, SB and MSV, are orthogonal and can be addressed separately.
Mean squared variation can be further partitioned into two components as shown below. For the partitioning, standard deviation of the simulation is denoted as SDs, that of the measurement is denoted as SDm, and correlation coefficient between the simulation and measurement is denoted as r, namely
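$$ \mathrm{SD}_s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} \tag{17} $$

$$ \mathrm{SD}_m = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2} \tag{18} $$

$$ r = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\mathrm{SD}_s\,\mathrm{SD}_m} \tag{19} $$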
After some rearrangement, MSV in Eq. [16] can be rewritten as
$$ \mathrm{MSV} = (\mathrm{SD}_s - \mathrm{SD}_m)^2 + 2\,\mathrm{SD}_s\,\mathrm{SD}_m\,(1 - r) \tag{20} $$
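The rearrangement behind Eq. [20] is short enough to spell out. Assuming that the standard deviations and the correlation coefficient are all computed with the divisor n, expanding the square in the definition of MSV and then completing the square gives

$$
\begin{aligned}
\mathrm{MSV} &= \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 + \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 - \frac{2}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) \\
&= \mathrm{SD}_s^2 + \mathrm{SD}_m^2 - 2\,r\,\mathrm{SD}_s\,\mathrm{SD}_m \\
&= (\mathrm{SD}_s - \mathrm{SD}_m)^2 + 2\,\mathrm{SD}_s\,\mathrm{SD}_m\,(1 - r).
\end{aligned}
$$

The last step adds and subtracts $2\,\mathrm{SD}_s\,\mathrm{SD}_m$, which is what separates the difference in the magnitude of fluctuation from the lack of correlation.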
The first term of the right side of Eq. [20], called SDSD here, is the difference in the magnitude of fluctuation between the simulation and measurement, namely
$$ \mathrm{SDSD} = (\mathrm{SD}_s - \mathrm{SD}_m)^2 \tag{21} $$
A larger SDSD indicates that the model failed to simulate the magnitude of fluctuation among the n measurements. The second term of the right side of Eq. [20] is essentially the lack of positive correlation weighted by the standard deviations, and is denoted here as LCS, namely
$$ \mathrm{LCS} = 2\,\mathrm{SD}_s\,\mathrm{SD}_m\,(1 - r) \tag{22} $$
A bigger LCS means that the model failed to simulate the pattern of the fluctuation across the n measurements.
With all the above terms combined, the MSV and MSD can be written as
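$$ \mathrm{MSV} = \mathrm{SDSD} + \mathrm{LCS} \tag{23} $$

$$ \mathrm{MSD} = \mathrm{SB} + \mathrm{SDSD} + \mathrm{LCS} \tag{24} $$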
In Eq. [24], it should be noted that SDSD (Eq. [21]) and LCS (Eq. [22]) are not entirely independent. They share the same constituents, SDs (Eq. [17]) and SDm (Eq. [18]). Hence a bigger SDs would increase both SDSD and LCS, if SDs > SDm. Through some rearrangement, the relative sizes of SDSD and LCS can be evaluated (see Appendix A for details). Across a major portion of the possible range of r and the ratio of SDs to SDm (SDs/SDm), LCS contributes more to MSD than SDSD does, although there are some combinations of r and SDs/SDm that make SDSD greater than LCS (see Appendix A).
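The partition of MSD into SB, SDSD, and LCS can be checked numerically. The sketch below is a minimal illustration, not the authors' code: the function name and the data are hypothetical, and the standard deviations are assumed to use the divisor n, so that the three components add up to MSD exactly.

```python
import math


def msd_components(sim, obs):
    """Partition the mean squared deviation into SB, SDSD, and LCS.

    Assumes equal-length sequences with non-zero variance in both.
    """
    n = len(sim)
    mean_x = sum(sim) / n
    mean_y = sum(obs) / n
    # Population (divide-by-n) standard deviations of simulation and measurement.
    sd_s = math.sqrt(sum((x - mean_x) ** 2 for x in sim) / n)
    sd_m = math.sqrt(sum((y - mean_y) ** 2 for y in obs) / n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(sim, obs)) / n
    r = cov / (sd_s * sd_m)
    return {
        "MSD": sum((x - y) ** 2 for x, y in zip(sim, obs)) / n,
        "SB": (mean_x - mean_y) ** 2,
        "SDSD": (sd_s - sd_m) ** 2,
        "LCS": 2.0 * sd_s * sd_m * (1.0 - r),
        "SDs": sd_s,
        "SDm": sd_m,
        "r": r,
    }


# Hypothetical simulated and measured values, for illustration only.
simulated = [4.1, 5.6, 6.3, 7.9, 6.0]
measured = [4.5, 5.0, 7.1, 7.0, 6.4]

parts = msd_components(simulated, measured)
for name in ("MSD", "SB", "SDSD", "LCS"):
    print(f"{name}: {parts[name]:.4f}")
print("sum of components:", round(parts["SB"] + parts["SDSD"] + parts["LCS"], 4))
```

Reporting the three components next to MSD makes it visible whether a large deviation comes from bias, from a mismatch in the magnitude of fluctuation, or from a mismatch in its pattern.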
The above components of MSD can be calculated from the coefficients of regression. As in Eq. [2], a is the slope of the regression line and b is the y-intercept. Then, the components of MSD are given as
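$$ \mathrm{SB} = \left[(1 - a)\,\bar{x} - b\right]^2 \tag{25} $$

$$ \mathrm{SDSD} = \left(1 - \frac{r}{a}\right)^2 \mathrm{SD}_m^2 \tag{26} $$

$$ \mathrm{LCS} = 2\,\frac{r}{a}\,(1 - r)\,\mathrm{SD}_m^2 \tag{27} $$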
In the special case when a = 1 and b = 0,
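$$ \mathrm{SB} = 0 \tag{28} $$

$$ \mathrm{SDSD} = (1 - r)^2\,\mathrm{SD}_m^2 \tag{29} $$

$$ \mathrm{LCS} = 2\,r\,(1 - r)\,\mathrm{SD}_m^2 \tag{30} $$

$$ \mathrm{MSD} = (1 - r^2)\,\mathrm{SD}_m^2 \tag{31} $$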
Thus, comparison based on correlation becomes equivalent to the comparison based on MSD in this special case, as long as the comparison is made within the same data set (i.e., fixed SDm).
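The link between the regression coefficients and the MSD components can be illustrated with a small sketch. It assumes an ordinary least-squares regression of y on x, for which the mean of y equals a times the mean of x plus b, and a = r SDm/SDs; under those assumptions SB = [(1 - a) x̄ - b]², SDSD = (1 - r/a)² SDm², and LCS = 2(r/a)(1 - r) SDm². The function name and the data are hypothetical; the data are constructed so that the fitted line is exactly y = x.

```python
import math


def regression_based_components(sim, obs):
    """Return slope, intercept, r, SDm^2, and the MSD components derived from them.

    Assumes a non-zero slope and non-zero variance in both series.
    """
    n = len(sim)
    mean_x = sum(sim) / n
    mean_y = sum(obs) / n
    var_x = sum((x - mean_x) ** 2 for x in sim) / n
    var_y = sum((y - mean_y) ** 2 for y in obs) / n  # SDm squared
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(sim, obs)) / n
    a = cov / var_x          # slope of y regressed on x
    b = mean_y - a * mean_x  # y-intercept
    r = cov / math.sqrt(var_x * var_y)
    sb = ((1.0 - a) * mean_x - b) ** 2
    sdsd = (1.0 - r / a) ** 2 * var_y
    lcs = 2.0 * (r / a) * (1.0 - r) * var_y
    return a, b, r, var_y, sb, sdsd, lcs


# Hypothetical data: measured = simulated + residuals that are uncorrelated
# with the simulated values and sum to zero, so the fitted line is y = x.
simulated = [1.0, 2.0, 3.0, 4.0]
measured = [2.0, 1.0, 2.0, 5.0]

a, b, r, var_y, sb, sdsd, lcs = regression_based_components(simulated, measured)
msd = sum((x - y) ** 2 for x, y in zip(simulated, measured)) / len(simulated)

print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.3f}")
print(f"SB + SDSD + LCS = {sb + sdsd + lcs:.4f}, MSD = {msd:.4f}")
print(f"(1 - r^2) * SDm^2 = {(1.0 - r ** 2) * var_y:.4f}")
```

With a = 1 and b = 0 the bias term vanishes and MSD reduces to (1 - r²) SDm², which is why ranking by correlation and ranking by MSD agree only when SDm is fixed.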
Examples
The MSD-based approach presented above is here applied to published results of comparison between model simulation and measurement. Note that our intention is not to reanalyze the results against the original work, but to show examples of this approach compared with the correlation-regression approach.
Example 1
The results of Kiniry et al. (1997), as mentioned before, are analyzed below for the comparison of simulated and measured maize yields across 10 yr within each of the nine USA locations. The MSD and its components were calculated from r, $\bar{x}$, $\bar{y}$, and the coefficient of variation (CV) published in their paper. Figure 1 illustrates the correlation-based comparison among the nine locations, and Fig. 2 is the MSD-based comparison. For ease of comparison between the two approaches, Fig. 1 shows the lack of fit of the regression (1 - r²) on the main vertical axis on the left, and the correlation coefficient (r) on the auxiliary vertical axis on the right.
By contrast, on the basis of MSD, NY is among those with smaller MSD (Fig. 2) than other locations. This is because SDm is smallest (0.58) for NY among the nine locations, and SDs is also small (0.59 in CERES-Maize and 0.93 in ALMANAC). With these small SDm and SDs values, LCS (Eq. [22]) is small and so is MSD (Fig. 2). The other MSD components, SB (Eq. [15]) and SDSD (Eq. [21]), are negligible (Fig. 2). In short, the model-measurement deviation for this location is small, because both measured and simulated yields show only small variability across the 10 yr, and the model simulated the measured yields with little bias. The low correlation may be a result only of the small year-to-year variability rather than a deviation between the model outputs and measurements.
The location in the state of Kansas (KS) is in contrast with NY. The lack of regression fit is smaller (i.e., the correlation coefficient is much larger) (0.825 in CERES-Maize and 0.714 in ALMANAC) for KS than for NY (Fig. 1). The slope of the regression line is not significantly different from 1, and the y-intercept is not significantly different from 0 for either model (Tables 6 and 7 of Kiniry et al., 1997). It therefore appears that the models fit the measurement better for this location than for NY. Nevertheless, MSD for KS is larger than that for NY, in particular with ALMANAC (Fig. 2). This is because both SDm (1.86) and SDs (1.68 in CERES-Maize and 2.34 in ALMANAC) are larger for KS than for NY; hence LCS and MSD are also larger, with MSD components SB and SDSD being almost negligible. Thus, despite the relatively high correlation between the simulation and measurement, MSD is larger for this location than for NY. The above contrast between KS and NY indicates that the overall deviation depends not only on the correlation, but also on the variability of the measurement and simulation.
In most locations, LCS is the major component of MSD (Fig. 2), but there are exceptions. The location in Nebraska (NE) shows clear distinction between the two models. For CERES-Maize, MSD is smallest for NE among the nine locations, but second largest for ALMANAC (Fig. 2). This is partly because of the larger SDs (2.27) in ALMANAC than SDm (1.14), and hence the large SDSD (Eq. [21]). This indicates that, for NE, ALMANAC is overly sensitive to the environmental fluctuations responsible for the year-to-year variability of the maize yield.
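The role of SDSD in these contrasts can be reproduced from the standard deviations quoted above. The sketch below applies Eq. [21] to the SDs and SDm values given in the text for NY, KS, and NE; the helper name and the dictionary layout are assumptions made for illustration, and only those location and model combinations for which the text quotes both values are included.

```python
def sdsd(sd_sim, sd_meas):
    """Squared difference between the standard deviations."""
    return (sd_sim - sd_meas) ** 2


# (SDs, SDm) pairs as quoted in the text for each location and model.
quoted = {
    ("NY", "CERES-Maize"): (0.59, 0.58),
    ("NY", "ALMANAC"): (0.93, 0.58),
    ("KS", "CERES-Maize"): (1.68, 1.86),
    ("KS", "ALMANAC"): (2.34, 1.86),
    ("NE", "ALMANAC"): (2.27, 1.14),
}

results = {key: sdsd(sd_s, sd_m) for key, (sd_s, sd_m) in quoted.items()}
for (location, model), value in results.items():
    print(f"{location} {model}: SDSD = {value:.4f}")
```

Among these cases SDSD is by far the largest for ALMANAC at NE (about 1.28), where the simulated standard deviation is roughly twice the measured one, while it stays below 0.25 for the NY and KS cases.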
Thus, the MSD-based comparison enables the user to locate the simulation vs. measurement contrasts that have larger deviations than others, and to further analyze causes of the large deviations. By contrast, the correlation-based approach tends to focus on the low correlation and the deviation of the regression line from the equality line rather than the deviation of the model outputs from the measurement. Kiniry et al. (1997) calculated RMSD (Eq. [11]) and MD (Eq. [12]) in addition to the correlation and regression coefficients. Although they used those deviation-based statistics only to say that the models simulated the measurements reasonably well, they could have used RMSD to evaluate the deviation of the simulated values from the measurements.
Example 2
Jamieson et al. (1998) compared outputs from five different models of wheat growth with measurements under different irrigation regimes in a field experiment in New Zealand. Their published values of aboveground biomass and grain weights at the end of the season are used in the comparison between the simulation and measurement below. Possibly because of the rounding error in the published values, the correlation coefficient r and other statistics calculated here might be somewhat different from those published (Jamieson et al., 1998). Note, however, that our purpose in using these data is not to reanalyze the published results, but to give an example of MSD-based analysis compared with regression analysis of the simulation vs. measurement comparison.
Aboveground biomass dry weight has a large SB component (Eq. [15]), which is the major component of MSD for all the models except Sirius (Fig. 3). This is especially true for the model AFRCWHEAT2, which shows the largest MSD among the models. The mean aboveground biomass estimated with this model is only 14.2 t ha⁻¹; the mean measured value is 20.1 t ha⁻¹.
The regression results are thus consistent with the results of the MSD-based analysis, but it is not clear from the regression how much of this underestimation contributed to the large RMSD (Eq. [11]). Jamieson et al. (1998) compared MD (Eq. [12]) and RMSD to show that the models' underestimation does account for the major part of RMSD. This is in effect the same as the comparison based on MSD and SB, since RMSD² = MSD and MD² = SB as shown earlier.
The results for grain yield (Fig. 4) are in sharp contrast to the aboveground biomass results. The comparison on grain weight indicated that SDSD (Eq. [21]) and LCS (Eq. [22]), rather than SB, are the major components of MSD. Among the models, SWHEAT shows the largest MSD, for which SDSD (Eq. [21]) is the dominant component. This is because, in this model, SDs is only 0.80 t ha⁻¹, which is much smaller than SDm (2.15 t ha⁻¹). This model is evidently not sensitive enough to the differences among the irrigation regimes in the field experiment. For SUCROS2, which shows the second largest MSD, by comparison, SB and LCS, rather than SDSD, are responsible for the large MSD.
The measured grain yield was regressed on simulated grain yield (Eq. [2]) using the REG procedure of the SAS/STAT system (SAS Inst., 1988). The results showed that all the models were highly correlated with the measurement (r > 0.92), but that the regression lines for some models deviated from the equality line (y = x). The null hypothesis (slope = 1 and intercept = 0) (Eq. [3]) was rejected for SWHEAT (P = 0.020) and SUCROS2 (P = 0.050). Sirius was close to rejection (P = 0.064). Investigating the slope (a) and the intercept (b) (Eq. [2]) separately, it is found that a ≠ 1 with SWHEAT (P = 0.008) and that b ≠ 0 with SWHEAT (P = 0.008) and SUCROS2 (P = 0.029). For SWHEAT, the slope (2.56) being >1 suggests that the model is less sensitive to the variability among the irrigation regimes than the measurement. This is consistent with the results of the MSD-based comparison. Although the intercept (-10.73) for SWHEAT was significantly <0, this should not be misinterpreted as indicating overestimation by the model. This negative intercept is only a result of the slope being much larger than unity. With the MSD-based comparison, no such misinterpretation is likely, since each component is distinct from the others in its meaning.
In this example, unlike in the previous example, the simulation vs. measurement comparisons were made within the same data set. The results are therefore similar whether the comparison is based on MSD or correlation and regression. However, the interpretation of the results is more straightforward in the MSD-based comparison than in the correlation-regression analysis.
| Discussion |
|---|
with that of the measured values. By comparison, in the correlation-regression approach, multiple criteria, namely the correlation coefficient, the slope and y-intercept of the regression line, and, often, RMSD (Eq. [11]) and MD (Eq. [12]), are presented simultaneously (e.g., Retta et al., 1996; Kiniry et al., 1997). Analysis based on these statistics of deviation along with regression analysis may give results similar to those obtained by the MSD-based analysis, if the interpretation is made carefully. It is not easy, however, to use these multiple criteria in combination, because they are not explicitly related to each other.
Thus, for direct comparison between model output and measurement, the MSD-based analysis is better suited than the commonly practiced correlation-regression analysis. In some cases, however, the user's concern may not lie in the direct comparison but in a different aspect of the simulation vs. measurement contrast. When variability of the simulation around the mean is of more concern than deviation (Eq. [9]), then MSV (Eq. [16]) should be the criterion of the comparison. Since MSV is the sum of SDSD and LCS, differences in MSV can be analyzed with respect to the two components. Squared difference between standard deviations and LCS can be further analyzed with their constituents, SDs, SDm, and r. Or, if the pattern of the fluctuation is the major concern, r is the primary measure of the comparison between the simulation and measurement. Thus, MSD or a part of it can be used for comparison of model output and measurement depending on the user's major concern.
We have not addressed the statistical test of the equality hypothesis (Eq. [7]) against the nonequality hypothesis (Eq. [8]). This can be done, if one has an estimate of the error variance from repeated measurements (see Appendix B). The error variance could also be estimated with the protocol based on resampling, as proposed by Wallach and Goffinet (1989) and applied to simulation vs. measurement comparison by Colson et al. (1995).
However, it must be noted that, on most occasions, we know the null hypothesis is wrong, no matter what the significance test indicates. The model does deviate from reality because of the simplifications inherent in any simulation model. Such simplifications or omissions of detail are inevitable or even necessary in the modeling of a complex real system. The deviation of the model from reality results in a difference between the simulated and the true values. If this difference is smaller than the measurement error, the null hypothesis (Eq. [7]) is maintained. However, theoretically, we could reduce the measurement error by increasing the number of replications, and could eventually detect the difference between the simulated and the true values, and reject the null hypothesis. Therefore, the relevant question is not whether the model is right or wrong, but how much the model output differs from the measurement and why. The model's performance can be discussed only relatively, not absolutely (Oreskes et al., 1994). The MSD-based analysis we present here would be useful to quantify the deviation of model output from measurement, and to locate possible cause(s) of the deviation.
| ACKNOWLEDGMENTS |
|---|
| NOTES |
|---|
Received for publication January 25, 1999.
| Appendix A |
|---|
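Let ω denote the ratio of SDs to SDm, namely

$$ \omega = \frac{\mathrm{SD}_s}{\mathrm{SD}_m} \tag{A-1} $$

MSV (Eq. [20] and [23]) can be rewritten as

$$
\begin{aligned}
\mathrm{MSV} &= \mathrm{SDSD} + \mathrm{LCS} && \text{(23)} \\
&= (\mathrm{SD}_s - \mathrm{SD}_m)^2 + 2\,\mathrm{SD}_s\,\mathrm{SD}_m\,(1 - r) && \text{(20)} \\
&= \mathrm{SD}_m^2\left[(\omega - 1)^2 + 2\,\omega\,(1 - r)\right] && \text{(A-2)}
\end{aligned}
$$

The ratio of SDSD to LCS is equal to the ratio of the two terms in the brackets of the right side of Eq. [A-2], namely

$$ \frac{\mathrm{SDSD}}{\mathrm{LCS}} = \frac{(\omega - 1)^2}{2\,\omega\,(1 - r)} \tag{A-3} $$

Figure A-1 depicts the ratio SDSD/LCS on the ω-r coordinate. Note that the change of the ratio is quite nonuniform, but that the ratio is <1 (i.e., SDSD < LCS) for a major portion of the domain. Exceptions are the region with high correlation (e.g., r > 0.9) and the region with ω < 0.3 or ω > 2.8 (Fig. A-1).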
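The behavior of the SDSD/LCS ratio can be explored without the figure. The sketch below is a hypothetical illustration in which `omega` stands for the ratio of SDs to SDm; it evaluates the ratio (ω - 1)²/[2ω(1 - r)] and, as an added step that follows algebraically from setting that ratio to 1, solves ω² - 2(2 - r)ω + 1 = 0 for the two values of ω between which SDSD < LCS.

```python
import math


def sdsd_to_lcs_ratio(omega, r):
    """Ratio SDSD/LCS for omega = SDs/SDm > 0 and correlation r < 1."""
    return (omega - 1.0) ** 2 / (2.0 * omega * (1.0 - r))


def omega_bounds(r):
    """Values of omega at which SDSD equals LCS, for a given r < 1.

    Between the two bounds SDSD < LCS; outside them SDSD > LCS.
    """
    c = 2.0 - r
    root = math.sqrt(c * c - 1.0)
    return c - root, c + root


for r in (-0.5, 0.0, 0.5, 0.9):
    low, high = omega_bounds(r)
    print(f"r = {r:4.1f}: SDSD < LCS for {low:.3f} < omega < {high:.3f}")
```

The interval narrows toward ω = 1 as r approaches 1, which agrees with the statement that SDSD can exceed LCS when the correlation is high or when the two standard deviations differ by a large factor.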
| Appendix B |
|---|
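Suppose that each measurement yi (i = 1, 2, ..., n) is the mean of m replicated observations yij (j = 1, 2, ..., m), each of which is normally distributed with mean µi and variance σ², namely

$$ y_{ij} \sim N(\mu_i, \sigma^2) \tag{B-1} $$

$$ y_i = \frac{1}{m}\sum_{j=1}^{m} y_{ij} \tag{B-2} $$

Then yi has a normal distribution with mean µi and variance σ²/m, and the sum of squared deviation for each measurement divided by the error variance has a chi-squared distribution with m - 1 degrees of freedom (Mood et al., 1974, p. 240-246), namely

$$ y_i \sim N\!\left(\mu_i, \frac{\sigma^2}{m}\right) \tag{B-3} $$

$$ \frac{1}{\sigma^2}\sum_{j=1}^{m}(y_{ij} - y_i)^2 \sim \chi^2(m - 1) \tag{B-4} $$

The sum of squared error (SSE) across the n measurements divided by the error variance also has a chi-squared distribution with n(m - 1) degrees of freedom, namely

$$ \frac{\mathrm{SSE}}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n}\sum_{j=1}^{m}(y_{ij} - y_i)^2 \sim \chi^2[n(m - 1)] \tag{B-5} $$

On the other hand, the sum of the squared deviation of yi from the true mean, µi, divided by its variance, σ²/m (Eq. [B-3]), also has a chi-squared distribution with n degrees of freedom, namely

$$ \frac{m}{\sigma^2}\sum_{i=1}^{n}(y_i - \mu_i)^2 \sim \chi^2(n) \tag{B-6} $$

Under the equality hypothesis (Eq. [7]) (i.e., H0: µi = xi), Eq. [B-6] can be written as

$$ \frac{m}{\sigma^2}\sum_{i=1}^{n}(x_i - y_i)^2 \sim \chi^2(n) \tag{B-7} $$

The left side terms of both Eq. [B-5] and [B-7] have chi-squared distributions and are independent of each other. The ratio of the two terms, each divided by its degrees of freedom, follows an F distribution with n and n(m - 1) degrees of freedom (Mood et al., 1974, p. 246-249), namely

$$ \frac{\dfrac{m}{n\,\sigma^2}\sum_{i=1}^{n}(x_i - y_i)^2}{\dfrac{\mathrm{SSE}}{n(m - 1)\,\sigma^2}} \sim F[n, n(m - 1)] \tag{B-8} $$

The nσ² in both numerator and denominator is omitted, and the square sum of xi - yi is replaced with n MSD (Eq. [13]) to yield

$$ F = \frac{n\,m\,\mathrm{MSD}}{\mathrm{SSE}/(m - 1)} \tag{B-9} $$

This value is compared to the critical values {e.g., F [0.05, n, n(m - 1)]} to test the null hypothesis.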
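The test statistic can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: every measurement has the same number m of replicates, the replicates are not all identical (so SSE > 0), and the statistic is taken as F = n m (m - 1) MSD / SSE, which is what results from cancelling nσ² in the F ratio and substituting n MSD for the sum of squared deviations. The function name and the data are hypothetical; the critical value is left to be looked up in F tables.

```python
def equality_f_statistic(sim, replicates):
    """F statistic and degrees of freedom for the equality hypothesis.

    sim        : n simulated values
    replicates : n lists, each holding the m replicated measurements
                 that correspond to one simulated value
    """
    n = len(sim)
    m = len(replicates[0])
    means = [sum(reps) / m for reps in replicates]
    # Sum of squared error of the replicates around their own means.
    sse = sum((y - mean) ** 2
              for reps, mean in zip(replicates, means)
              for y in reps)
    # Mean squared deviation between simulation and replicate means.
    msd = sum((x - y) ** 2 for x, y in zip(sim, means)) / n
    f_value = n * m * (m - 1) * msd / sse
    return f_value, n, n * (m - 1)


# Hypothetical data: three simulated values, two replicates each.
simulated = [5.0, 7.0, 9.0]
replicated = [[4.0, 6.0], [6.0, 8.0], [9.0, 11.0]]

f_value, df1, df2 = equality_f_statistic(simulated, replicated)
print(f"F = {f_value:.4f} with ({df1}, {df2}) degrees of freedom")
```

In this made-up case the replicate means are 5, 7, and 10, so MSD = 1/3 and SSE = 6, giving F = 1/3 on (3, 3) degrees of freedom.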
| REFERENCES |
|---|