Agronomy Journal 94:1222-1233 (2002)
© 2002 American Society of Agronomy

MODELING

An Indicator of Solar Radiation Model Performance based on a Fuzzy Expert System

Gianni Bellocchi a,*, Marco Acutis b, Gianni Fila a, and Marcello Donatelli a

a Res. Inst. for Ind. Crops, via di Corticella 133, 40128 Bologna, Italy
b Dep. of Crop Sci., via Celoria 2, 20133 Milan, Italy

* Corresponding author (g.bellocchi{at}isci.it)

Received for publication October 1, 2001.

    ABSTRACT
When evaluating models, various indices or test statistics are computed, quantifying the magnitude of model residuals, the correlation between estimates and measurements, patterns of residuals over external variables, etc. Such indices are variously related to each other, which makes model comparison difficult. Problems of this type emerge when testing solar radiation models. This paper proposes a fuzzy expert system to calculate a modular indicator, Irad, which reflects an expert perception of the quality of the performance of solar radiation models. Three modules were formulated, reflecting the magnitude of residuals (Accuracy), the correlation between estimates and measurements (Correlation), and the presence or absence of patterns in the residuals against independent variables (Pattern). The modules Accuracy and Pattern resulted from the aggregation of three indices (relative root mean square error, modeling efficiency, and Student's t probability) and two indices (pattern index vs. day of the year and pattern index vs. minimum air temperature), respectively, while the module Correlation was identified by a single index (Pearson's correlation coefficient). For each index, two functions describing membership to the fuzzy subsets Favorable (F) and Unfavorable (U) were defined. The expert system calculates the modules according to both the degree of membership of the indices to the subsets F and U and a set of decision rules. The modules are then aggregated into the indicator Irad. A sensitivity analysis is presented, along with module and Irad scores for some application cases.

Abbreviations: BC, Bristow–Campbell (model) • CD, Campbell–Donatelli (model) • DB, Donatelli–Bellocchi (model) • EF, efficiency (index) • F, Favorable (subset) • Irad, indicator of solar radiation model performance • PI, pattern index • PIdoy, pattern index day of year • PITmin, pattern index daily minimum air temperature • RMSE, root mean square error • RRMSE, relative root mean square error • U, Unfavorable (subset)


    INTRODUCTION
IN THE PROCESS of testing model performance, several indices and test statistics are commonly used (e.g., Smith et al., 1997; Martorana and Bellocchi, 1999; Metselaar, 1999; Yang et al., 2000). Some of them quantify the departure of the model response from experimental measurements (model residuals) while others focus on the correlation between model estimates and measurements. Other indices have been proposed recently to assess systematic behaviors of model residuals against model inputs or other independent variables (Donatelli et al., 2000; Donatelli et al., unpublished, 2002). The interpretation of these statistics is in itself essentially descriptive and primarily based on scientific background, rather than on statistical significance (Willmott, 1982). Some users may appraise the degree of reliability of model outputs simply from the model's graphical display while others may require more in-depth evaluation. Indeed, there is some evidence to suggest that the manner in which a study is performed (i.e., the extent to which the simulation study meets a user's expectations) is more important in forming a user's quality perception than the results of a statistical evaluation (Robinson, 1998).

Each statistic gives only partial insight into model performance. Therefore, a sound judgment about model results requires several statistics to be considered simultaneously. Balancing the different aspects involved in model testing, such as departure of estimates from measurements, modeling efficiency (i.e., better fit than the average of measurements), correlation measures, presence or absence of systematic behavior in the residuals, etc., is often complicated by the fact that different statistics may provide contrasting results. Hence, it is desirable to combine several statistics into one aggregated measure that gives a comprehensive assessment of the model response. Such a measure would be helpful when judging one model response, evaluating a model's performance in a variety of conditions, or choosing the best model from a list of candidates.

Before aggregating statistics, the user must describe the constraints needed to evaluate the model (Bardossy et al., 1985). Generally, this is an internal process that is neither well defined nor thoroughly documented. Indeed, it varies from individual to individual according to a number of factors that characterize each individual (Dubois and Prade, 1980). Personal preferences and intentions may strongly influence a user's judgment. To capture the knowledge and preferences of the user, weights may be used to establish the relative importance of complementary statistics. The weights are determined by the user and may change from user to user. These subjective aspects cannot be completely removed from the evaluation process, but they may be effectively captured and described in mathematical terms (Jones and Barnes, 2000), for instance by aggregating indices by summation, multiplication, or a combination of both. Such approaches pose mathematical and conceptual problems (Keeney and Raiffa, 1993) because evaluation statistics differ in their nature, dimensions, and range of possible values.

Given the inadequacy of such methods, a different approach to evaluating model performance is to set up a fuzzy expert system (Hall and Kandel, 1991) based on fuzzy decision rules. This technique is robust to uncertain and imprecise data such as subjective judgments and allows dissimilar measures to be aggregated in a consistent and reproducible way (Bouchon-Meunier, 1993).

Daily solar radiation is an important meteorological measure for various analyses in many applied sciences, including agricultural engineering, crop physiology, ecology, hydrology, meteorology, physics, and soil science (Brutsaert, 1982; Iqbal, 1983; Tracy et al., 1983). Nevertheless, solar radiation is measured infrequently, and access to reliable radiation data is limited (e.g., Thornton and Running, 1999). As a consequence, accurate methods for estimating solar radiation are needed.

A model for estimating global radiation based on daily temperature extremes was proposed by Bristow and Campbell (1984) using data from three stations in the northwestern USA. Improved models were subsequently developed by Donatelli and Marletto (1994), Donatelli and Campbell (1998), Bechini et al. (2000), and, more recently, by Donatelli and Bellocchi (2000). Evaluating and comparing the models Bristow–Campbell (BC), Campbell–Donatelli (CD), and Donatelli–Bellocchi (DB) against data sets from a wide range of latitudes was a complex process (Donatelli and Bellocchi, 2000) because of the number of statistical indices used. To overcome this problem, in this paper we describe an expert system to calculate an indicator of model performance, Irad, which reflects an expert perception of the performance of radiation models.


    METHODS
The Structure of the Indicator
It is assumed here that a comprehensive assessment about radiation model performance should consider: (i) the ability of the model to produce small residuals, (ii) the extent to which the model estimates are correlated with measurements, and (iii) the ability of the model to produce residuals uniformly distributed over the range of two relevant independent variables (day of the year and minimum air temperature).

According to such premises, we defined three indicator modules, named Accuracy, Correlation, and Pattern. The value of each module depends on one or more indices (Table 1) and a set of decision rules. For each module, a dimensionless value between 0 (best model response) and 1 (worst model response) is calculated. The procedure, based on the multivalued fuzzy set theory introduced by Zadeh (1965), follows the so-called Sugeno or Takagi–Sugeno–Kang method of fuzzy inference (Sugeno, 1985). This approach is computationally efficient and well suited for mathematical analysis. It has been applied to a wide variety of problems, such as the design of an indicator for assessing the environmental impact of pesticides (van der Werf and Zimmer, 1998) and the development of novel approaches to support decisions regarding sustainable development (Cornelissen et al., 2001). Based on expert judgment, three membership classes (or subsets) were defined for all indices given in Table 1: F, U, and partial (or fuzzy) membership. Using fuzzy-based logic rules, the indices were aggregated into modules, and the modules into the final indicator. The procedure is explained in detail in Appendix A.
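The Sugeno inference step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation (which Appendix A describes): it assumes a sinusoidal S-shaped transition between the F and U limits and takes the minimum of the input memberships as the truth value of a rule premise. The module value is then the average of the rule conclusions weighted by those truth values. The two-index rule table in the usage lines is hypothetical.

```python
import math

def membership_u(x, f_limit, u_limit):
    """Membership of x to the Unfavorable subset (assumed sinusoidal S-curve).

    x on the favorable side of f_limit gives 0; on the unfavorable side of
    u_limit gives 1. Works whether smaller is better (f_limit < u_limit)
    or larger is better (f_limit > u_limit).
    """
    t = (x - f_limit) / (u_limit - f_limit)   # position in the transition interval
    t = min(max(t, 0.0), 1.0)                 # clip values outside the interval
    return math.sin(math.pi * t / 2.0) ** 2

def sugeno(mu_u, rules):
    """Zero-order Sugeno inference over all F/U combinations of the inputs.

    mu_u:  memberships to U, one per input index
    rules: {("F", "U", ...): conclusion}, one entry per combination
    """
    num = den = 0.0
    for labels, conclusion in rules.items():
        # premise truth value: minimum of the memberships (assumed AND operator)
        strength = min(m if lab == "U" else 1.0 - m for lab, m in zip(labels, mu_u))
        num += strength * conclusion
        den += strength
    return num / den

# Hypothetical two-index module; the conclusions are illustrative only.
rules = {("F", "F"): 0.0, ("F", "U"): 0.3, ("U", "F"): 0.7, ("U", "U"): 1.0}
mu = [membership_u(15.0, 10.0, 20.0), membership_u(0.95, 0.90, 0.40)]
print(round(sugeno(mu, rules), 3))  # 0.35
```

Because the memberships to F and U of each input sum to 1, at least one rule always fires with strength of 0.5 or more, so the denominator never vanishes when all combinations are listed.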


Table 1. The indicator modules Accuracy, Correlation, and Pattern and their inputs. For details, see the text.

 
The Module Accuracy
The composition of the module Accuracy was based essentially on the suggestions of Yang et al. (2000). These authors found that a sound conclusion on model accuracy may be drawn using an index of the amount of residuals [e.g., root mean square error (RMSE)] (Fox, 1981), a measure of modeling efficiency (EF) (Loague and Green, 1991), and a two-tailed paired t test:

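$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(E_i - M_i)^2} \qquad [1]$$

$$\mathrm{EF} = 1 - \frac{\sum_{i=1}^{n}(E_i - M_i)^2}{\sum_{i=1}^{n}(M_i - \bar{M})^2} \qquad [2]$$

$$t = \frac{\bar{D}}{s_D/\sqrt{n}} \qquad [3]$$

where $D_i$ is the difference $E_i - M_i$, $E_i$ is the ith estimated value, $M_i$ is the ith measured value, n is the number of $E_i$/$M_i$ pairs, $\bar{M}$ is the average of all measured values, $\bar{D}$ is the average of all differences $D_i$, and $s_D$ is the standard deviation of the differences $D_i$. The computed t is compared against the critical t with n - 1 degrees of freedom.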
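Eq. [1] to [3] translate directly into code. The sketch below uses only the Python standard library; the data are made up, and the two-tailed probability P(t) is obtained by numerically integrating the Student t density with Simpson's rule, a choice made here to avoid external dependencies (a statistics library would normally be used instead). Degrees of freedom are n - 1, as in a standard paired test.

```python
import math

def rmse(est, meas):
    """Eq. [1]: root mean square error."""
    n = len(meas)
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(est, meas)) / n)

def efficiency(est, meas):
    """Eq. [2]: modeling efficiency (1 is perfect; < 0 is worse than the mean)."""
    m_bar = sum(meas) / len(meas)
    ss_res = sum((e - m) ** 2 for e, m in zip(est, meas))
    ss_tot = sum((m - m_bar) ** 2 for m in meas)
    return 1.0 - ss_res / ss_tot

def paired_t(est, meas):
    """Eq. [3]: paired t statistic and its degrees of freedom (n - 1)."""
    d = [e - m for e, m in zip(est, meas)]
    n = len(d)
    d_bar = sum(d) / n
    s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))  # sample SD of D_i
    if s_d == 0:
        raise ValueError("t is undefined when all differences are identical")
    return d_bar / (s_d / math.sqrt(n)), n - 1

def two_tailed_p(t, df, steps=2000):
    """P(t) = 1 - 2 * integral of the t density from 0 to |t| (steps must be even)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: c * (1.0 + x * x / df) ** (-(df + 1) / 2)
    h = abs(t) / steps
    s = pdf(0.0) + pdf(abs(t))
    s += sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    return max(0.0, 1.0 - 2.0 * s * h / 3.0)

meas = [10.0, 12.0, 14.0, 16.0, 18.0]   # made-up daily radiation, MJ m-2 d-1
est = [11.0, 12.0, 15.0, 16.0, 19.0]
t, df = paired_t(est, meas)
print(round(rmse(est, meas), 3), round(efficiency(est, meas), 3),
      round(t, 3), round(two_tailed_p(t, df), 3))
# 0.775 0.925 2.449 0.07
```

In this toy case P(t) falls between 0.05 and 0.10, so the index would sit in its transition interval rather than being fully F or U.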

In place of RMSE, we considered it more appropriate to use a relative measure, the relative root mean square error (RRMSE), where:

$$\mathrm{RRMSE} = \frac{\mathrm{RMSE}}{\bar{M}} \times 100 \qquad [4]$$

The index RRMSE may vary from 0 to positive infinity; the smaller the RRMSE, the better the model performance. Because RRMSE is expressed as a percentage of the mean measured value, it is dimensionless, allowing comparison among different model responses regardless of units and range of values. Problems could arise because RRMSE becomes unstable when $\bar{M}$ is close to zero and is undefined when $\bar{M}$ = 0. However, this problem does not arise with daily solar radiation data, whose measured average is always well above zero. The limit for the fuzzy subset F for this index was set to 20 (RRMSE <= 20 is F), while the limit for the subset U was set to 40 (RRMSE >= 40 is U). Both limits come from our expert judgment, based on multiyear weather data sets from about 200 locations (Donatelli and Bellocchi, 2000).
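A short sketch of Eq. [4] and of the crisp reading of its limits follows; the data are made up. It also checks the unit-independence claim: converting both series from MJ to kWh (1 kWh = 3.6 MJ) leaves RRMSE unchanged, because RMSE and $\bar{M}$ scale by the same factor. The function names are illustrative only.

```python
import math

def rrmse(est, meas):
    """Eq. [4]: relative RMSE, in percent of the mean measured value."""
    n = len(meas)
    m_bar = sum(meas) / n
    if m_bar == 0:
        raise ValueError("RRMSE is undefined when the mean measured value is 0")
    rmse = math.sqrt(sum((e - m) ** 2 for e, m in zip(est, meas)) / n)
    return 100.0 * rmse / m_bar

def rrmse_class(value, f_limit=20.0, u_limit=40.0):
    """Crisp reading of the limits: <= 20 is F, >= 40 is U, otherwise transition."""
    if value <= f_limit:
        return "F"
    if value >= u_limit:
        return "U"
    return "transition"

meas = [10.0, 12.0, 14.0, 16.0, 18.0]        # made-up data, MJ m-2 d-1
est = [11.0, 12.0, 15.0, 16.0, 19.0]
to_kwh = lambda xs: [x / 3.6 for x in xs]    # 1 kWh = 3.6 MJ
print(round(rrmse(est, meas), 2), round(rrmse(to_kwh(est), to_kwh(meas)), 2))  # 5.53 5.53
print(rrmse_class(rrmse(est, meas)))  # F
```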

The index EF is very informative because it allows the immediate identification of inefficient models. It is upper-bounded by 1 and can assume negative values (lower-bounded at negative infinity). Negative values of EF indicate that the average value of all measured values is a better predictor than the model used. When estimating daily solar radiation, the limit for the subset U, EF = 0.40, and the limit for the subset F, EF = 0.90, were chosen (EF <= 0.40 is U and EF >= 0.90 is F). Again, both limits are derived from our extensive experience, covering radiation estimates over a wide range of latitudes (Donatelli and Bellocchi, 2000).

In testing numerical estimates against measured data, the paired t statistic is used to test the null hypothesis that the average residual is zero at a given probability level (statistical significance). If t is significant, the differences between estimated and measured values cannot all be attributed to experimental error. If t is not significant, the test cannot prove that the estimated and measured values are identical, but it indicates that there is no statistically significant reason for rejecting the hypothesis that the two series represent the same response. Applying the paired t test to the differences between estimated and measured data assumes that the $D_i$ are normally distributed; the null hypothesis is that their mean is zero. In our research, all $D_i$ data sets used were approximately normally distributed. Generally, low t values indicate a satisfactory response; however, under certain conditions, they may not adequately reveal a significant departure of simulated estimates from measured data. These conditions occur when the spread of the $D_i$ is large: low t values may then be obtained despite large departures of estimates from measurements, so the t test may be unsuitable for evaluation purposes. Although cases of this type can occur in principle, they were not encountered with the data sets used in the present work. The paired t test is handled here through its significance level, P(t). Because P(t) is the probability of obtaining a value of |t| at least as large as the observed one when the differences between pairs are purely random, the best value of P(t) is 1 and the worst is 0. The selection of a particular significance level is problematic; popular values are 0.01, 0.05, and 0.10. The 0.05 level is almost universally accepted as the threshold for rejecting a null hypothesis; therefore, P(t) = 0.05 was set as the limit for U. Some scientists have used a less stringent significance level, e.g., 0.10 (Kedzie, 1997; Kleijnen et al., 1998; CTAP, 2000), because they were concerned about the lack of power of their test. We therefore set P(t) = 0.10 as the limit for F, assigning values of P(t) between 0.05 and 0.10 to the transition interval.

The value of the module Accuracy was calculated from the input indices according to eight decision rules, as summarized in Table 2. The expert reasoning runs as follows: if all indices are F, the value of the module is 0 (identity of estimates and measurements); if all indices are U, the value of the module is 1. In setting up the decision rules for the other combinations, we had to decide on the relative importance of each index. In our experience, RRMSE and EF are particularly important in the evaluation process; thus, a fairly large weight (0.80) was attributed to the rule in which both RRMSE and EF are U. The condition $\bar{D}$ = 0 is a sound requirement but is scarcely informative by itself, and it may hide model outputs drifting toward large biases. Thus, the weight is low (0.20) when only P(t) is U.
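The Accuracy rules can be sketched as follows. This is a minimal sketch, not the authors' implementation: the S-shaped membership curve and the use of the minimum as AND are assumptions, and Table 2 is not reproduced here. Five rule conclusions come from the text (0, 1, 0.80, 0.20, and 0.6 for RRMSE F with EF and P(t) U, quoted in the sensitivity analysis). The other three are ASSUMED by treating the conclusions as additive weights (RRMSE 0.4, EF 0.4, P(t) 0.2), which reproduces every stated value; Table 2 holds the actual ones.

```python
import math

def membership_u(x, f_limit, u_limit):
    """Membership to U; assumed sinusoidal S-curve between the two limits."""
    t = min(max((x - f_limit) / (u_limit - f_limit), 0.0), 1.0)
    return math.sin(math.pi * t / 2.0) ** 2

def sugeno(mu_u, rules):
    """Average of rule conclusions weighted by premise truth (min as AND, assumed)."""
    num = den = 0.0
    for labels, conclusion in rules.items():
        w = min(m if lab == "U" else 1.0 - m for lab, m in zip(labels, mu_u))
        num, den = num + w * conclusion, den + w
    return num / den

# Labels ordered (RRMSE, EF, P(t)).
ACCURACY_RULES = {
    ("F", "F", "F"): 0.0,   # stated in the text
    ("U", "U", "U"): 1.0,   # stated
    ("U", "U", "F"): 0.8,   # stated
    ("F", "F", "U"): 0.2,   # stated
    ("F", "U", "U"): 0.6,   # stated in the sensitivity analysis
    ("U", "F", "F"): 0.4,   # ASSUMED (additive weights); see Table 2
    ("F", "U", "F"): 0.4,   # ASSUMED
    ("U", "F", "U"): 0.6,   # ASSUMED
}

def accuracy(rrmse_pct, ef, p_t):
    mu = [membership_u(rrmse_pct, 20.0, 40.0),  # F <= 20, U >= 40
          membership_u(ef, 0.90, 0.40),         # F >= 0.90, U <= 0.40
          membership_u(p_t, 0.10, 0.05)]        # F >= 0.10, U <= 0.05
    return sugeno(mu, ACCURACY_RULES)

print(accuracy(10.0, 0.95, 0.50))  # 0.0: all indices F
print(accuracy(50.0, 0.20, 0.50))  # 0.8: RRMSE and EF U, P(t) F
```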


Table 2. Summary of decision rules describing the effect of the input indices relative root mean square error (RRMSE), efficiency (EF), and P(t) on the module Accuracy. For details, see the text.

 
The Module Correlation
The value of the module Correlation depends on a single basic index, the correlation coefficient r (Addiscott and Whitmore, 1987), derived from Pearson's simple linear correlation coefficient:

$$r = \frac{\sum_{i=1}^{n}(E_i - \bar{E})(M_i - \bar{M})}{\sqrt{\sum_{i=1}^{n}(E_i - \bar{E})^2 \sum_{i=1}^{n}(M_i - \bar{M})^2}} \qquad [5]$$

where $\bar{E}$ is the average of estimates. The coefficient r may vary from -1 (full negative correlation) to 1 (full positive correlation); the closer r is to 1, the better the model. Besides the indices based on differences, the correlation coefficient r between estimates and measurements is commonly computed. The use of this index has been questioned (e.g., Willmott, 1982) because its value is not related to the accuracy of the estimates. However, r is a universal measure with multiple interpretations. For instance, Cahan (1987) regards r as a measure of identity between standardized values. Moreover, the value of r may help recognize the fluctuation of the estimates among the n measurements (Kobayashi and Salam, 2000). For these reasons, r is still generally regarded as a useful measure of model performance.
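Willmott's objection can be made concrete with Eq. [5]. In the sketch below (made-up data), every estimate is 5 MJ m-2 d-1 too high: r is exactly 1 while RMSE equals the bias, which is why r is paired with difference-based indices rather than used alone.

```python
import math

def pearson_r(est, meas):
    """Eq. [5]: Pearson correlation between estimates and measurements."""
    n = len(meas)
    e_bar, m_bar = sum(est) / n, sum(meas) / n
    cov = sum((e - e_bar) * (m - m_bar) for e, m in zip(est, meas))
    ss_e = sum((e - e_bar) ** 2 for e in est)
    ss_m = sum((m - m_bar) ** 2 for m in meas)
    return cov / math.sqrt(ss_e * ss_m)

meas = [8.0, 12.0, 15.0, 20.0, 25.0]     # made-up data, MJ m-2 d-1
biased = [m + 5.0 for m in meas]         # every estimate 5 units too high
rmse = math.sqrt(sum((e - m) ** 2 for e, m in zip(biased, meas)) / len(meas))
print(round(pearson_r(biased, meas), 6), round(rmse, 6))  # 1.0 5.0
```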

The membership limits attributed here to the correlation coefficient come from the general categorization made by Hinkle et al. (1994), who took correlation coefficients >=0.90 as very high correlations (limit for F: r = 0.90) and coefficients <=0.70 as moderate or low correlations (limit for U: r = 0.70). These limits agree with our expert judgment.

It must be pointed out that statistical significance for correlation coefficients does not always imply practical significance. The limits attributed here are mere descriptors for the practical interpretation of correlation coefficients and do not take into account statistical significance. The latter depends on the number of data points and can be verified by a t test, provided that both estimated and measured series do conform to the assumptions required for the appropriate application of the test.

Given that there is only one index in the module, the computation of Correlation is simplified to two decision rules: If r is F then 0, and if r is U then 1.
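With only two rules, the Sugeno average collapses to the membership of r to U, because the two rule strengths always sum to 1. The sketch below shows this under the same assumptions as any sketch of this system here: a sinusoidal S-shaped membership curve between r = 0.90 (F) and r = 0.70 (U), which is not necessarily the authors' curve.

```python
import math

def membership_u(x, f_limit, u_limit):
    """Membership to U; assumed sinusoidal S-curve between the two limits."""
    t = min(max((x - f_limit) / (u_limit - f_limit), 0.0), 1.0)
    return math.sin(math.pi * t / 2.0) ** 2

def correlation_module(r, f_limit=0.90, u_limit=0.70):
    mu_u = membership_u(r, f_limit, u_limit)
    mu_f = 1.0 - mu_u
    # Two rules: "if r is F then 0" and "if r is U then 1"; weighted average.
    return (mu_f * 0.0 + mu_u * 1.0) / (mu_f + mu_u)

for r in (0.95, 0.80, 0.60):
    print(r, round(correlation_module(r), 3))
# prints 0.95 0.0, then 0.8 0.5, then 0.6 1.0
```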

The Module Pattern
The module Pattern accounts for two relevant independent variables in radiation models, day of year and the daily minimum air temperature. For the computation of pattern indices (PIs), the range of such independent variables is divided into four subranges (quartiles), thus producing four groups of residuals. Pattern indices are based on the pairwise differences between average residuals of each quartile (Donatelli et al., 2000):

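$$\mathrm{PI} = \max_{l,m \in \{1,\ldots,4\}} \left| \frac{1}{q_l}\sum_{i_l=1}^{q_l} R_{i_l} - \frac{1}{q_m}\sum_{i_m=1}^{q_m} R_{i_m} \right| \qquad [6]$$

where R is the model residual, l and m indicate the two groups being compared, $q_l$ and $q_m$ are the numbers of residuals in the groups, and $i_l$ and $i_m$ index the residuals within each group. The PI values have the same units as the variable under study (in this case, MJ m-2 d-1).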
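A pattern index can be sketched as follows. Two points are assumptions of this sketch rather than statements of the text: PI is taken as the largest absolute difference between the average residuals of any two groups, and the four subranges are equal-count groups of the sorted independent variable (equal-width subranges would be an equally plausible reading of "quartiles"). The data are made up to show a seasonal pattern.

```python
def pattern_index(residuals, covariate, groups=4):
    """Largest absolute difference between mean residuals of covariate subranges.

    Assumption: subranges are equal-count groups of the sorted covariate;
    at least `groups` observations are needed.
    """
    pairs = sorted(zip(covariate, residuals))
    n = len(pairs)
    if n < groups:
        raise ValueError("need at least one residual per group")
    means = []
    for k in range(groups):
        chunk = pairs[k * n // groups:(k + 1) * n // groups]
        means.append(sum(r for _, r in chunk) / len(chunk))
    return max(abs(a - b) for i, a in enumerate(means) for b in means[i + 1:])

doy = [15, 45, 105, 135, 195, 225, 285, 315]            # made-up days of year
resid = [1.0, 1.2, 0.1, -0.1, -0.9, -1.1, 0.2, 0.0]     # MJ m-2 d-1
print(round(pattern_index(resid, doy), 2))  # 2.1
```

Here the winter residuals average 1.1 and the summer ones -1.0, so the PI of 2.1 MJ m-2 d-1 flags a clear seasonal pattern even though the overall mean residual is close to zero.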

The PIs are intended to detect macropatterns in the residuals (Donatelli et al., unpublished, 2002). The presence of patterns usually means that the residuals contain structure that is not accounted for in the model. When applied to different types of residual plots, PIs may provide meaningful information on the adequacy of different aspects of the model, such as lack of inputs, poor parameterization, etc.; therefore, they should complement difference- and correlation-based indices when evaluating model performance.

We refer here to PIs computed against day of year and daily minimum air temperature because the residuals of daily radiation models are often nonrandomly distributed over the range of these variables, and some of the models commonly used for estimating radiation include parameters specifically intended to account for patterns of this type.

The limits attributed to both PIs reflect the authors' experience. Values of PI are considered F when <=1.0 MJ m-2 d-1 and U when >=2.5 MJ m-2 d-1. Examples of PIs computed with different models at a tropical site (Patos de Minas, Brazil) and a temperate site (Würzburg, Germany) in selected years are shown in Fig. 1 and Fig. 2, respectively.



Fig. 1. Examples of pattern indices (PIs) vs. two independent variables [day of the year (doy) and minimum air temperature (Tmin)] computed on residuals generated at Patos de Minas (Brazil) in 1997 by different radiation models [Bristow–Campbell (BC), Campbell–Donatelli (CD), and Donatelli–Bellocchi (DB)].

 


Fig. 2. Examples of pattern indices (PIs) vs. two independent variables [day of the year (doy) and minimum air temperature (Tmin)] computed on residuals generated at Würzburg (Germany) in 1998 by different radiation models [Bristow–Campbell (BC), Campbell–Donatelli (CD), and Donatelli–Bellocchi (DB)].

 
The value of the module Pattern depends on the input indices according to the decision rules summarized in Table 3. The same weight was attributed to pattern index day of year (PIdoy) and pattern index daily minimum air temperature (PITmin). If both indices are F, the value of the module is 0 (i.e., no pattern); if both are U, the expert weight is 1; if one index is F and the other U, the weight is 0.5.
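All four Pattern rules are stated in the text, so the module can be sketched without guessing conclusions. The only assumptions are the sinusoidal S-shaped membership curve between the PI limits (1.0 and 2.5 MJ m-2 d-1) and the use of the minimum as the AND operator.

```python
import math

def membership_u(x, f_limit=1.0, u_limit=2.5):
    """Membership of a PI (MJ m-2 d-1) to U; assumed sinusoidal S-curve."""
    t = min(max((x - f_limit) / (u_limit - f_limit), 0.0), 1.0)
    return math.sin(math.pi * t / 2.0) ** 2

# Rule conclusions stated in the text; labels ordered (PIdoy, PITmin).
PATTERN_RULES = {("F", "F"): 0.0, ("F", "U"): 0.5, ("U", "F"): 0.5, ("U", "U"): 1.0}

def pattern_module(pi_doy, pi_tmin):
    mu = [membership_u(pi_doy), membership_u(pi_tmin)]
    num = den = 0.0
    for labels, conclusion in PATTERN_RULES.items():
        # premise truth value: min of memberships (assumed AND operator)
        w = min(m if lab == "U" else 1.0 - m for lab, m in zip(labels, mu))
        num, den = num + w * conclusion, den + w
    return num / den

print(pattern_module(0.5, 0.8), pattern_module(0.5, 3.0), pattern_module(3.0, 3.0))
# 0.0 0.5 1.0
```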


Table 3. Summary of decision rules describing the effect of the pattern index day of year (PIdoy) and pattern index daily minimum air temperature (PITmin) on the module Pattern. For details, see the text.

 
Aggregation of the Modules
The three modules described above can be used to compare different radiation models. Models might be ranked, for instance, by means of a multicriteria analysis technique, using the modules as evaluation criteria. An alternative approach is to aggregate the three modules (second-level aggregation) into an overall indicator (Irad), reflecting a global judgment about model performance, again on a 0 to 1 scale. Such an aggregation could be performed by summation, multiplication, or a combination of both; instead, we propose an aggregation of the three modules that uses decision rules, as described above for the aggregation of indices into the modules.

The value of the indicator Irad depends on the modules Accuracy, Correlation, and Pattern, according to a set of eight decision rules (Table 4). The limits of the transition interval are defined in the same way for the three modules: we assigned complete membership to F if the value of the module was 0 (i.e., identity of estimates and measurements, unit correlation, no pattern of residuals vs. independent variables) and complete membership to U if the value of the module was 1. In setting up the other decision rules, we had to establish the relative importance of each module. As a general rule in model evaluation, more emphasis is given to the amount of residuals, whereas the correlation of estimates vs. measurements carries less weight. Because of their recent development, PIs are rarely used in model evaluation. Based on our experience, decreasing importance was given to the modules Accuracy, Pattern, and Correlation, respectively. Thus, for instance, if only Accuracy is U, the conclusion is 0.55; if both Accuracy and Pattern are U, it is 0.85; and if both Accuracy and Correlation are U, it is 0.70.
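The second-level aggregation can be sketched in the same way as a module. Five conclusions are stated in the text (0, 1, 0.55, 0.85, and 0.70); the other three are ASSUMED from additive module weights (Accuracy 0.55, Pattern 0.30, Correlation 0.15), which reproduce all of the stated values. Module values in [0, 1] are fuzzified with F at 0 and U at 1, as the text specifies; the sinusoidal curve between them and the min AND operator are assumptions, and Table 4 remains the reference for the actual rules.

```python
import math

def membership_u(x):
    """Membership of a module value to U (F at 0, U at 1); S-curve assumed."""
    t = min(max(x, 0.0), 1.0)
    return math.sin(math.pi * t / 2.0) ** 2

# Labels ordered (Accuracy, Correlation, Pattern).
IRAD_RULES = {
    ("F", "F", "F"): 0.0,    # stated
    ("U", "U", "U"): 1.0,    # stated
    ("U", "F", "F"): 0.55,   # stated
    ("U", "F", "U"): 0.85,   # stated
    ("U", "U", "F"): 0.70,   # stated
    ("F", "F", "U"): 0.30,   # ASSUMED (additive weights); see Table 4
    ("F", "U", "F"): 0.15,   # ASSUMED
    ("F", "U", "U"): 0.45,   # ASSUMED
}

def irad(accuracy, correlation, pattern):
    mu = [membership_u(accuracy), membership_u(correlation), membership_u(pattern)]
    num = den = 0.0
    for labels, conclusion in IRAD_RULES.items():
        # premise truth value: min of memberships (assumed AND operator)
        w = min(m if lab == "U" else 1.0 - m for lab, m in zip(labels, mu))
        num, den = num + w * conclusion, den + w
    return num / den

print(irad(1.0, 0.0, 0.0), irad(1.0, 0.0, 1.0), irad(1.0, 1.0, 0.0))  # 0.55 0.85 0.7
```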


Table 4. Summary of decision rules describing the effect of the three modules on the value of the indicator (Irad).

 
The relative incidence of each index on the indicator can be deduced by combining the weights of the indices within their own module with those of the modules within the indicator (Table 5).
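The combination amounts to multiplying each index weight by the weight of its module. The sketch rests on one ASSUMPTION: that rule conclusions are additive, so each index or module contributes a fixed amount whenever it is U. Under that assumption, the conclusions quoted in the text imply within-module weights of 0.4 (RRMSE), 0.4 (EF), 0.2 (P(t)), and 0.5 for each PI, and module weights of 0.55 (Accuracy), 0.30 (Pattern), and 0.15 (Correlation). The resulting figures are illustrative; Table 5 holds the authors' values.

```python
# Weights implied by the quoted rule conclusions under the additivity ASSUMPTION.
module_weight = {"Accuracy": 0.55, "Correlation": 0.15, "Pattern": 0.30}
index_weight = {
    "RRMSE": ("Accuracy", 0.4), "EF": ("Accuracy", 0.4), "P(t)": ("Accuracy", 0.2),
    "r": ("Correlation", 1.0),
    "PIdoy": ("Pattern", 0.5), "PITmin": ("Pattern", 0.5),
}
# Incidence on Irad = weight within the module x weight of the module.
incidence = {name: round(w * module_weight[mod], 3)
             for name, (mod, w) in index_weight.items()}
print(incidence)
```

Under this reading, r and each PI end up with the same incidence (0.15), which agrees with the statement in the sensitivity analysis that the correlation coefficient and the PIs exert the same incidence on the indicator.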


Table 5. Relative incidence of each basic evaluation index on the value of the indicator (Irad).

 

    RESULTS
Sensitivity Analysis to Input Indices
To illustrate the functioning of the system, we first present the sensitivity of the indicator Irad to variation in input values. Each input variable was varied over its transition interval while the others were kept fixed either at the extremes of the transition interval, i.e., at U (Fig. 3, top) and F (Fig. 3, bottom), or at the median value (Fig. 3, middle). The sensitivity analysis reflects the functioning of the system and gives some indication of the importance of each input index for the value of Irad. However, the effect of varying a particular index over its transition interval on the value of Irad depends on the values of the other indices. Therefore, the results presented here should be regarded only as illustrations of the functioning of the system.



Fig. 3. Sensitivity analysis of the indicator Irad to variation of all input indices. Each input index is varied over its transition interval from 0.0 [completely Favorable (F)] to 1.0 [completely Unfavorable (U)] while the other input indices are kept at U (top graph), at the median value of their transition interval (middle graph), or at F (bottom graph). The traces of the pattern index day of year (PIdoy) and pattern index daily minimum air temperature (PITmin) are superimposed. RRMSE, relative root mean square error; EF, efficiency.

 
The extent to which indices affect the value of Irad can be deduced from the decision rules involved in aggregating the indices into the modules and the modules into the indicator. For instance, the input P(t) is very influential when all other inputs are U (Fig. 3, top), while its effect is much smaller when the other inputs are F (Fig. 3, bottom). This results from the mode of aggregation we adopted, which assigns a large weight (0.8) to the rule in the module Accuracy in which P(t) is F and both RRMSE and EF are U.

The RRMSE tends to be more influential than the PIs when all other indices are U (Fig. 3, top) as a consequence of the different weight attributed to these indices when they are F and the rest are U in the respective module (0.6 and 0.5, respectively). The opposite occurs when all other indices are kept F (Fig. 3, bottom).

The influence of r when other inputs are U is somewhat large (Fig. 3, top) due to the noticeable weight (0.85) attributed to the rule when the module Correlation is F and both other modules are U. Correlation coefficient and PIs exert the same incidence on the indicator (Table 5); thus, their curves are somewhat complementary (Fig. 3, top, middle, and bottom).

Example Applications
The functioning of the system is further illustrated by computing both the basic indices and the three modules and the Irad indicator over yearly sets of radiation data. Locations and years with dissimilar responses in the various evaluation indices were selected for the computations (Table 6).


Table 6. Description of the 10 locations used in this study: latitude, longitude, elevation, clear-sky transmissivity, years, yearly rainfall, yearly average maximum air temperature, yearly average minimum air temperature, and yearly average global solar radiation.

 
Estimates of daily radiation were made using three models based on daily temperature extremes: BC (Bristow and Campbell, 1984), CD (Donatelli and Campbell, 1998), and DB (Donatelli and Bellocchi, 2000). The models are described in Appendix B. Parameter values were determined at each location (except the parameter c of BC, which was kept equal to 2; see Appendix B) using a multiyear calibration data set through iterative steps aimed at minimizing both RRMSE and PIs. This procedure is consistent with the hypotheses underlying the model formulations, in which model parameters are specifically intended to reduce the magnitude of residuals (b in all models) or the presence of patterns (Tnc in CD and c1 and c2 in DB). The calibration procedure is described in detail in the documentation accompanying the software RadEst3.00 (Donatelli and Bellocchi, 2001; Donatelli et al., unpublished, 2002). Both software and documentation are freely downloadable from http://www.isci.it/tools (verified 3 July 2002).

Calibrated model parameters for the locations and years selected are given in Table 7. The results are reported in Table 8 (evaluation indices) and Table 9 (modules and indicator). The outputs of Table 9 can be used to rank different radiation models with respect to the value of one or more modules or of the Irad indicator.


Table 7. Model parameter values determined at each location. See Appendix B for explanation of the solar radiation models Bristow–Campbell (BC), Campbell–Donatelli (CD), and Donatelli–Bellocchi (DB).

 

Table 8. Response of three radiation models [Bristow–Campbell (BC), Campbell–Donatelli (CD), and Donatelli–Bellocchi (DB)] on 10 yearly data sets. The quality of performance was evaluated in terms of the basic indices relative root mean square error (RRMSE), efficiency (EF), pattern index day of year (PIdoy), and pattern index daily minimum air temperature (PITmin).

 

Table 9. Response of three radiation models [Bristow–Campbell (BC), Campbell–Donatelli (CD), and Donatelli–Bellocchi (DB)] on 10 yearly data sets. The quality of performance was evaluated in terms of the indicator modules Accuracy, Correlation, and Pattern and the indicator Irad.

 
The model DB gave the lowest (best) value of the indicator (Irad = 0.0044 at Würzburg), while BC gave the worst one (Irad = 0.5518 at Patos de Minas). The model DB gave the best response at six locations (Longreach, Los Baños, Matsumoto, Perugia, Sadore, and Würzburg); CD was the best at three locations (Port Elizabeth, Patos de Minas, and Smolensk); and BC was superior in one case (Poza Rica). At the sites where DB gave the best performance, the result was essentially due to the greater ability of this model to deal with patterns (Table 8). The response of the model CD was more complex with respect to the contribution of each module to the indicator. At Patos de Minas, CD provided the best performance (Irad = 0.4425) owing to its relative ability to keep residuals small (Accuracy = 0.2451), although this was achieved at the cost of a partial presence of patterns (Pattern = 0.9253). Conversely, CD was the best model at Port Elizabeth (Irad = 0.0315) because it gave the best pattern control (Pattern = 0.0533). At Smolensk, CD was successful (Irad = 0.0887) because it gave the best correlation between estimates and measurements (Correlation = 0.0050). At Poza Rica, the model BC was the best according to all modules.

The results obtained here are, again, a consequence of the choices we made regarding the selection of input variables, the definition of their transition intervals, and the values given to the conclusions of the decision rules. These results are illustrative only and are not meant as conclusive findings on the ability of each model at specific sites. The investigation should be extended to multiyear data sets before final results are drawn. With large, multiyear data sets, a statistically based investigation could be performed. In particular, a comprehensive assessment of model performance would include quantifying the variability associated with Irad, the significance of the computed Irad scores, and the correlation between Irad scores and the respective modules. This would allow Irad scores to be separated statistically. Another option would be to use cluster analysis to discriminate model performance across a large number of locations (or subsets of them), using both the Irad values and those of the modules Accuracy, Correlation, and Pattern.


    DISCUSSION
In the design of a system to assess radiation model performance, two major questions have to be answered: (i) which input indices should be taken into account, and (ii) how should the basic indices be aggregated? The method presented in this paper proposes an answer to both questions, but its relevance lies primarily in the answer it provides to the second one. The approach contains two key elements: fuzzy sets and decision rules. Fuzzy sets provide a well-designed solution to the problem of setting cutoff values for the input indices, i.e., the limits separating F, U, and transition responses. Decision rules provide a rational aggregation of the input indices within the related module, e.g., RRMSE, EF, and P(t) in the module Accuracy. Combining these two concepts (response limits and mode of aggregation) in sets of fuzzy rules is attractive because, although the possible combinations of input-index values are infinite, a single set of fuzzy rules connects them all.

In this application, the system is based on a compromise between operational suitability (the evaluation of radiation model performance) and flexibility (hierarchization of objectives and aggregation of preferences). It requires extensive corroboration, because the objective of an expert system is to simulate a human expert: the expert system is corroborated if, under a variety of conditions, it gives the same responses that a human expert would give. Experts are therefore invited to comment on the setup and results of this system. If expert perception of radiation model performance disagrees with the output of the system, the cause of the divergence will be examined in view of: (i) the choice of input indices, (ii) the choice of the limits of the transition intervals, (iii) the formulation of the decision rules, and (iv) the formulation of the mode of aggregation of the modules. All of these points may be modified according to expert consensus, after extensive testing of the methodology. This process may be time-consuming on large data sets. For this purpose, the software IRENE (Fila et al., 2001; Fila et al., unpublished, 2002), which supplies provisions for fuzzy-based aggregation, may be of help (free download from the site: http://www.isci.it/tools). IRENE allows the creation of reusable modules and indicators (ASCII files), thus serving as a convenient means to support collaborative work among a large, distributed network of scientists involved in creating and aggregating fuzzy-based components.


    CONCLUSIONS
We propose a fuzzy expert system reflecting our expert judgment of the quality of radiation model performance. Starting from the values of basic evaluation indices, the method builds an aggregated indicator with a modular structure. The system takes into account three types of input variables: the magnitude of model residuals, the correlation between estimated and measured values, and patterns in the residuals vs. independent variables. The resulting output can support ranking or choosing among alternative models under a variety of conditions.

Although a wide expert examination is still required for a general consensus about weights and limits applied to indices and aggregated modules, the fuzzy sets suggested here may represent a pragmatic approach toward a satisfactory solution.

This approach to model evaluation is useful for a number of reasons:

  1. It allows users to express mathematically individual or collective values and preferences (uncertainty factors).
  2. It highlights the degree of model failure or goodness associated with each information source (i.e., model accuracy, correlation, and patterns in the residuals).
  3. It elucidates the degree of reliability of response associated with each alternative model.
  4. It facilitates structuring of various components of the evaluation process.
  5. It reduces several sources and levels of information into a single value.
  6. It allows examination of operational equivalence between several indices and modules.

The modular structure also presents advantages. First, users have access both to a synthetic indicator reflecting the overall judgment and to each of the modules. This makes each step transparent and gives anybody concerned with the process an opportunity to check it. Second, the mode of module aggregation can be changed, and new modules can be added. The multivalued nature of the issue is explicitly stated, the rules are easy to read, and the numerical scores used for their conclusions are easy to tune to match expert opinions.

The method illustrated here is flexible and can be extended to aggregate more indices of different types. The same method could conveniently be applied to state variables other than solar radiation, e.g., evapotranspiration. A further development could be the implementation of this system in exhaustive optimization algorithms and its use as a cost function for estimating model parameters.

APPENDIX A:

FUZZY EXPERT SYSTEMS
Zadeh (1965) proposed the use of fuzzy set theory to describe relationships that are best characterized by compliance with a collection of attributes. In setting up the decision rules for model evaluation, the attributes are the basic evaluation indices, and the user must decide on the relative importance of each one. At the same time, the user must set the limit values beyond which the index is certainly F or U. This procedure creates three membership classes for the index values: F, U, and partial (or fuzzy) membership.

Fuzzy-set theory addresses this type of problem by allowing one to define the degree of membership of an element in a set by means of membership functions (transition functions) that can take any value from the interval [0, 1]. The value 0 represents complete nonmembership; the value 1 represents complete membership; and values in between are used to represent partial membership. For classical or crisp sets, the membership function only takes two values: 0 (nonmembership) and 1 (membership).

The hierarchical structure of this technique is used to aggregate indices into first-level fuzzy indicators (modules) and, next, into a second-level fuzzy indicator. Each objective in the attribute hierarchy is given a weight. This process of aggregation may continue, hypothetically, until a final-level fuzzy indicator is achieved. The indicator developed here is a second-level indicator. For simplicity, in the example below, only first-level aggregation is developed.

The aggregation process is accomplished by combining weighted fuzzy values. In this approach, the shape of the membership function of each input index is characterized by the two limits of its transition interval. Fuzzy membership functions may have different shapes, depending on the user's experience or even preference. We used membership functions that are S shaped in the transition interval because, as the input value varies, they produce smoother changes in membership than functions that are linear in the transition interval. If x is the value of the index, and α and γ are the lower and upper bounds of the transition interval, respectively, the S function is flat at 0 for x ≤ α and at 1 for x ≥ γ. Between α and γ, the S function is a quadratic function of x (Liao, 2002):

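$$
S(x;\alpha,\beta,\gamma)=\begin{cases}
0, & x \le \alpha\\
2\left(\dfrac{x-\alpha}{\gamma-\alpha}\right)^{2}, & \alpha < x \le \beta\\
1-2\left(\dfrac{x-\gamma}{\gamma-\alpha}\right)^{2}, & \beta < x < \gamma\\
1, & x \ge \gamma
\end{cases}
\tag{A1}
$$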
where β = (α + γ)/2. Two adjacent fuzzy terms with S-shaped membership functions cross at a membership of 0.5 at the midpoint β between the two extremes. Equation [A1] describes only the increasing (left-hand) side; for the decreasing (right-hand) side, its complement, 1 − S(x; α, β, γ), is needed.
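As a minimal sketch of Eq. [A1], the S function and its complement can be written in Python as below. The function names are hypothetical (they are not taken from IRENE), and the sketch assumes an index for which larger values are worse, as for the pattern indices, so that membership to U follows S and membership to F is its complement.

```python
def s_membership(x: float, alpha: float, gamma: float) -> float:
    """S-shaped membership of Eq. [A1]: 0 below alpha, 1 above gamma."""
    if gamma <= alpha:
        raise ValueError("gamma must be greater than alpha")
    beta = (alpha + gamma) / 2.0  # midpoint, where membership equals 0.5
    if x <= alpha:
        return 0.0
    if x <= beta:
        return 2.0 * ((x - alpha) / (gamma - alpha)) ** 2
    if x < gamma:
        return 1.0 - 2.0 * ((x - gamma) / (gamma - alpha)) ** 2
    return 1.0


def memberships(x: float, alpha: float, gamma: float) -> tuple:
    """Return (F, U) for an index where larger values are worse.

    U follows the S function; F is its complement, so F + U = 1.
    """
    u = s_membership(x, alpha, gamma)
    return 1.0 - u, u


# Transition interval 1.0-2.5 (MJ m-2 d-1), as used for the pattern indices.
f, u = memberships(2.00, 1.0, 2.5)
print(f"F = {f:.4f}, U = {u:.4f}")
```

Because the two quadratic branches meet at β with value 0.5 and join the flat parts at α and γ, the function is continuous over the whole range.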

For each module, we formulated a set of decision rules attributing values between 0 and 1 to an output variable according to the membership of its input indices to the fuzzy subsets F and U.

These components are described linguistically by fuzzy rules with a relatively simple syntax: fuzzy-rule inference evaluates rules that are mostly if ... then ... statements. When two indices are aggregated, the method uses the conjunctive operator AND, as formalized by four rules:

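[Rule 1] If $x_1$ is $A_{11}$ AND $x_2$ is $A_{12}$, then $y_1$ is $B_1$

[Rule 2] If $x_1$ is $A_{21}$ AND $x_2$ is $A_{22}$, then $y_2$ is $B_2$

[Rule 3] If $x_1$ is $A_{31}$ AND $x_2$ is $A_{32}$, then $y_3$ is $B_3$

[Rule 4] If $x_1$ is $A_{41}$ AND $x_2$ is $A_{42}$, then $y_4$ is $B_4$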
where xj (j = 1, 2) is an input index, Aij is a fuzzy subset, yi is an output variable, and Bi (i = 1, 2, 3, 4) is a number. The statement "xj is Aij" is called a premise of the ith rule; "yi is Bi" is called the conclusion (or expert weight) of the ith rule. Here, each decision rule consists of two premises (if ... AND if ...) linked by AND and followed by a conclusion (then ...).

The process continues by identifying the degree of truth in the premise portion of each rule and then aggregating the truth of the linked conditions. Let x'1 and x'2 be the values taken by x1 and x2, and let Aij(x'j) be the membership value of x'j to the fuzzy set Aij (given by the membership function that defines Aij). According to Sugeno's inference method, when the premises are linked by the conjunction AND, the truth value of a decision rule is defined as the smallest of the truth values of its premises. Therefore, the min operator (the minimum of the premise truth values) assigns a truth value to the output of each rule. The truth values of the rules, w1, w2, w3, and w4, are defined as follows:

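$$ w_1 = \min\left[A_{11}(x'_1),\, A_{12}(x'_2)\right] $$

$$ w_2 = \min\left[A_{21}(x'_1),\, A_{22}(x'_2)\right] $$

$$ w_3 = \min\left[A_{31}(x'_1),\, A_{32}(x'_2)\right] $$

$$ w_4 = \min\left[A_{41}(x'_1),\, A_{42}(x'_2)\right] $$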

The first rule infers w1 × B1, the second w2 × B2, and so on. The outputs of the individual rules are combined by summation into a single fuzzy solution set, y'0:

$$ y'_0 = \sum_{i=1}^{4} w_i B_i $$

A solution of this type is sometimes known as a singleton output membership function, and it can be thought of as a predefuzzified fuzzy set. The global output y' is inferred by:

$$ y' = \frac{\sum_{i=1}^{4} w_i B_i}{\sum_{i=1}^{4} w_i} $$

This last operation (the weighted average of a few data points) is a common method (centroid calculation) for reducing the final fuzzy set to a crisp value (defuzzification) in Sugeno-type systems.
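The inference steps above can be sketched in Python for two inputs: AND is evaluated with min, and the crisp output is the weighted average of the rule conclusions. This is a minimal illustration, not the IRENE implementation; the function name, the dictionary layout, and the conclusion values in the usage lines are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# A rule is (subset of x1, subset of x2, conclusion B_i), e.g. ("F", "U", 0.5).
Rule = Tuple[str, str, float]


def sugeno_aggregate(mu1: Dict[str, float], mu2: Dict[str, float],
                     rules: Sequence[Rule]) -> float:
    """Zero-order Sugeno inference with two premises per rule linked by AND.

    mu1, mu2: membership degrees of x'1 and x'2 to each fuzzy subset.
    Returns y' = sum(w_i * B_i) / sum(w_i).
    """
    weighted_sum = 0.0  # sum of w_i * B_i over the rules
    truth_sum = 0.0     # sum of the rule truth values w_i
    for subset1, subset2, conclusion in rules:
        w = min(mu1[subset1], mu2[subset2])  # AND: smallest premise truth value
        weighted_sum += w * conclusion
        truth_sum += w
    if truth_sum == 0.0:
        raise ValueError("no rule has a nonzero truth value")
    return weighted_sum / truth_sum  # weighted average = defuzzified output


# Hypothetical rule base: conclusions chosen only for illustration.
RULES = [("F", "F", 0.0), ("F", "U", 0.3), ("U", "F", 0.7), ("U", "U", 1.0)]
print(sugeno_aggregate({"F": 1.0, "U": 0.0}, {"F": 0.0, "U": 1.0}, RULES))  # 0.3
print(sugeno_aggregate({"F": 0.25, "U": 0.75}, {"F": 0.6, "U": 0.4}, RULES))
```

With crisp memberships (0 or 1) exactly one rule fires and y' equals its conclusion; with partial memberships several rules fire, and y' moves smoothly between their conclusions.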

The computation of the aggregated indicator is primarily influenced by the truth value of each rule (wi), for which the expert weight (Bi) is a multiplication factor. It follows that weights on rules are not simply measures of the relative importance of each rule; they are essentially measures of the importance of moving from the worst to the best level of performance on one rule. The Bi terms are subjective elements introduced in the aggregation process, and the global output may be sensitive to them; that is, the quality of the outputs is influenced by the weights of the individual rules. Because the truth values depend on the context of application, this crucial issue should be investigated within that context.
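A short derivation, added here for illustration, makes the sensitivity to the expert weights explicit. It follows directly from writing the global output as the weighted average of the conclusions Bi with weights wi:

$$
y' = \frac{\sum_{i=1}^{4} w_i B_i}{\sum_{i=1}^{4} w_i}
\quad\Longrightarrow\quad
\frac{\partial y'}{\partial B_k} = \frac{w_k}{\sum_{i=1}^{4} w_i}, \qquad k = 1,\dots,4
$$

A change ΔBk in one expert weight therefore moves the output by ΔBk · wk / Σ wi. Rules that do not fire for a given input (wk = 0) have no effect on it, while the rule with the largest truth value has the most influence. Since the normalized weights wk / Σ wi sum to 1, shifting all Bi by the same amount shifts y' by exactly that amount.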

To illustrate Sugeno's inference method, a numerical example is given for the composition of the module Pattern, obtained by aggregating two PIs, PIdoy and PITmin (see details in "The Structure of the Indicator" section). We defined three classes of model response with respect to the presence of patterns in the residuals against one independent variable (Fig. 4A). A response classified as pattern (PIdoy, PITmin ≥ 2.5 MJ m-2 d-1) was given a membership value of 1 for the fuzzy subset U and 0 for the fuzzy subset F. A response classified as no pattern (PIdoy, PITmin ≤ 1.0 MJ m-2 d-1) was given a membership value of 0 for U and 1 for F (Fig. 4B). The class of borderline values (1.0 MJ m-2 d-1 < PIdoy, PITmin < 2.5 MJ m-2 d-1) falls within a transition interval in which the membership value for U increases from 0 (at PIdoy, PITmin = 1.0 MJ m-2 d-1) to 1 (at PIdoy, PITmin = 2.5 MJ m-2 d-1) and the membership value for F decreases from 1 to 0; thus, the functions characterizing F and U are complementary. In our example, the reasoning for the four rules is:

[Rule 1]

[Rule 2]

[Rule 3]

[Rule 4]



Fig. 4. Graphical presentation of (A) crisp and (B) fuzzy sets for the pattern index.

 
Note that the same weight is attributed here to each basic index.

Assume that PIdoy was computed as 2.00 MJ m-2 d-1 and PITmin as 1.70 MJ m-2 d-1. For both indices, membership to the fuzzy subsets F and U has to be determined. The membership functions of Fig. 5 give the truth values of the premises, i.e., the degree of membership to the fuzzy subset concerned, for PIdoy (Fig. 5, top) and PITmin (Fig. 5, bottom). Results are shown in Table 10. The value of Pattern is calculated as the sum of the conclusions of the decision rules weighted by their truth values, divided by the sum of the truth values:

$$ \mathrm{Pattern} = \frac{\sum_{i=1}^{4} w_i B_i}{\sum_{i=1}^{4} w_i} $$



Fig. 5. Membership to the fuzzy sets Favorable (F) and Unfavorable (U) for a hypothetical model response in terms of pattern index day of year (PIdoy) and pattern index daily minimum air temperature (PITmin). PIdoy = 2.00 and PITmin = 1.70.

 

Table 10. Summary of decision rules describing the effect of the pattern index day of year (PIdoy) and pattern index daily minimum air temperature (PITmin) on the module Pattern. Truth values of premises (wi) and conclusions (Bi) for PIdoy = 2.00 and PITmin = 1.70 are shown. For details, see the text.
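The Pattern calculation for PIdoy = 2.00 and PITmin = 1.70 MJ m-2 d-1 can be sketched in Python as follows, assuming S-shaped membership functions (Eq. [A1]) with the transition interval 1.0 to 2.5 MJ m-2 d-1 for both indices. The numerical conclusions Bi of Table 10 are not reproduced in this text, so the values used below (0 for F-F, 0.5 for mixed premises, 1 for U-U) are hypothetical. They are symmetric, so both indices carry the same weight, as stated for this example.

```python
def s_membership(x, alpha, gamma):
    """S-shaped membership to U (Eq. [A1]); membership to F is 1 - U."""
    beta = (alpha + gamma) / 2.0
    if x <= alpha:
        return 0.0
    if x >= gamma:
        return 1.0
    if x <= beta:
        return 2.0 * ((x - alpha) / (gamma - alpha)) ** 2
    return 1.0 - 2.0 * ((x - gamma) / (gamma - alpha)) ** 2


ALPHA, GAMMA = 1.0, 2.5  # transition interval for both pattern indices, MJ m-2 d-1

# HYPOTHETICAL conclusions B_i (0 = no pattern, 1 = pattern), keyed by
# (subset of PIdoy, subset of PITmin); not the values of Table 10.
B = {("F", "F"): 0.0, ("F", "U"): 0.5, ("U", "F"): 0.5, ("U", "U"): 1.0}


def pattern_module(pi_doy, pi_tmin):
    """Return the list of (subset_doy, subset_tmin, w_i, B_i) and Pattern."""
    u_doy = s_membership(pi_doy, ALPHA, GAMMA)
    u_tmin = s_membership(pi_tmin, ALPHA, GAMMA)
    mu_doy = {"F": 1.0 - u_doy, "U": u_doy}
    mu_tmin = {"F": 1.0 - u_tmin, "U": u_tmin}
    rules = []
    for (a_doy, a_tmin), b in B.items():
        w = min(mu_doy[a_doy], mu_tmin[a_tmin])  # truth value of the rule
        rules.append((a_doy, a_tmin, w, b))
    total_w = sum(w for _, _, w, _ in rules)
    pattern = sum(w * b for _, _, w, b in rules) / total_w
    return rules, pattern


rules, pattern = pattern_module(2.00, 1.70)
for a_doy, a_tmin, w, b in rules:
    print(f"PIdoy {a_doy}, PITmin {a_tmin}: w = {w:.4f}, B = {b}")
print(f"Pattern = {pattern:.4f}")
```

With these assumed conclusions the sketch prints Pattern = 0.5738; with the conclusions actually used in Table 10, the result may differ.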

 
APPENDIX B:

DAILY SOLAR RADIATION MODELS
The general approach followed by the models used in this study for estimating solar radiation at ground level (Rad, MJ m-2 d-1) is of the form:

$$ Rad = tt_i \, Ra \tag{B1} $$

where tt_i is the atmospheric transmissivity and Ra is the extraterrestrial radiation (that is, the calculated solar radiation outside the earth's atmosphere).

A standard routine based purely on solar geometry and the solar constant was used to estimate daily Ra at given latitudes (e.g., Swift, 1976; Bristow and Campbell, 1984; Campbell and Norman, 1998).
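Equation [B1] only requires Ra and a transmissivity. The specific Ra routine is not given here, so the sketch below substitutes the widely used FAO-56 (FAO Irrigation and Drainage Paper No. 56) formulation of daily extraterrestrial radiation; it is a stand-in and may differ slightly from the routines cited above. The function names are hypothetical.

```python
import math

GSC = 0.0820  # solar constant, MJ m-2 min-1 (FAO-56 value)


def extraterrestrial_radiation(lat_deg: float, doy: int) -> float:
    """Daily extraterrestrial radiation Ra (MJ m-2 d-1), FAO-56 formulation."""
    phi = math.radians(lat_deg)
    angle = 2.0 * math.pi * doy / 365.0
    dr = 1.0 + 0.033 * math.cos(angle)        # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)    # solar declination, rad
    # Sunset hour angle; clamp the argument so polar day/night do not raise.
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(x)
    return (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )


def ground_radiation(tt: float, ra: float) -> float:
    """Eq. [B1]: global solar radiation at ground level, Rad = tt * Ra."""
    return tt * ra


ra = extraterrestrial_radiation(-20.0, 246)  # 20 deg S, 3 September
print(round(ra, 1))  # about 32.2 MJ m-2 d-1 (FAO-56 worked example)
print(round(ground_radiation(0.6, ra), 1))  # illustrative transmissivity of 0.6
```

Each transmissivity model (BC, CD, DB) then only has to supply tt_i; Ra is shared by all of them.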

Three models were used to estimate the transmissivity:

Model BC:

[B2]

Model CD:

[B3]

Model DB:

[B4]
where all temperatures are in °C and

The parameters b and c reflect the physics involved in the relationship and determine the rate of increase of the exponential function as ΔT increases. The parameter c is usually kept equal to 2 (Ndlovu, 1994; Donatelli, 1995; Weiss et al., 2001). The parameters Tnc, c1, and c2 are empirical parameters accounting for seasonal effects.
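To show how b and c shape the response, the sketch below uses the transmissivity form of the original Bristow and Campbell (1984) model, tt = a[1 − exp(−b ΔT^c)]. It assumes, without the equation itself being available here, that Eq. [B2] follows this structure; a (the maximum clear-sky transmissivity) is a parameter not named in the surviving text, and all parameter values below are illustrative only.

```python
import math


def bc_transmissivity(delta_t: float, a: float = 0.75, b: float = 0.02,
                      c: float = 2.0) -> float:
    """Bristow-Campbell-style transmissivity: tt = a * (1 - exp(-b * delta_t**c)).

    delta_t: daily air temperature range (deg C).
    a, b, c: ILLUSTRATIVE parameter values, not calibrated ones.
    """
    if delta_t <= 0.0:
        # No thermal amplitude: treat as zero transmissivity (also avoids
        # raising a negative number to a non-integer power).
        return 0.0
    return a * (1.0 - math.exp(-b * delta_t ** c))


for dt in (2.0, 5.0, 10.0, 15.0):
    print(dt, round(bc_transmissivity(dt), 3))
```

Transmissivity rises with ΔT and saturates at a; larger b or c makes the curve approach a at smaller temperature ranges.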


    ACKNOWLEDGMENTS
 
The structure of the indicator was inspired by the fuzzy-based indicator of pesticide environmental impact by van der Werf and Zimmer (1998), adapted with novel ideas and concepts to suit the purpose of the paper. The meteorological data sets used in this study are part of a large database provided by several partners and used in previous studies. The list of people and institutions who contributed data is too long to be reported here; it can be found in the Acknowledgments section of the manual of the software RadEst3.00. Research conducted under the auspices of the Italian Ministry of Agricultural and Forestry Policies, project SIPEAA (http://www.sipeaa.it), Paper no. 3.


    REFERENCES
This article has been cited by other articles:

G. Fragoulis, M. Trevisan, A. Di Guardo, A. Sorce, M. van der Meer, F. Weibel, and E. Capri. Development of a Management Tool to Indicate the Environmental Impact of Organic Viticulture. J. Environ. Qual., March 1, 2009; 38(2): 826-835.

M. Donatelli, M. Acutis, G. Bellocchi, and G. Fila. New Indices to Quantify Patterns of Residuals Produced by Model Estimates. Agron. J., May 1, 2004; 96(3): 631-645.

G. Fila, G. Bellocchi, M. Donatelli, and M. Acutis. IRENE_DLL: A Class Library for Evaluating Numerical Estimates. Agron. J., September 1, 2003; 95(5): 1330-1333.

