Dep. of Agronomy, Univ. of Wisconsin-Madison, 1575 Linden Dr., Madison, WI 53706-1597 USA
mdcasler{at}facstaff.wisc.edu
| ABSTRACT |
|---|
Abbreviations: AES, agricultural experiment stations; CV, coefficient of variation; LSR, least significant range; MCV, modified coefficient of variation; RE, relative efficiency.
| INTRODUCTION |
|---|
A large proportion of these trials are operated on a fee basis by state agricultural experiment stations (AES). Several criticisms of these trials have arisen in recent years and have been the focus of discussions in meetings of the Central Alfalfa Improvement Conference, the National Alfalfa Improvement Conference, and the American Society of Agronomy (Caddel, 1993; Caddel et al., 1996; T.H. Busbice, personal communication, 1996). Large experimental errors in some AES trials compared with privately operated trials suggest a lack of precision in some AES trials. Large differences in cultivar ranking among locations are sometimes used to support the contention that trials are imprecise. These observations have led to recommendations to discard data from some trials.
The number of new alfalfa cultivars reaching the market has increased considerably in recent years. This has led to two substantive changes in the alfalfa seed industry. First, it has become increasingly difficult to separate alfalfa cultivars on the basis of forage yield. A rapid increase in the number of cultivars has led to compaction of cultivar means so that smaller LSD values are required to detect differences. While this raises the question of whether or not true differences in forage yield exist among many cultivars, only a more precise test of cultivar mean differences will provide an answer to the question. Second, the increasing number of cultivars has led to stronger competition among alfalfa marketers. Increased competition has promoted increased interest in alfalfa cultivar trials and their ability to detect perceived true differences in forage yield potential. These factors suggest that AES researchers conducting alfalfa cultivar trials may need to improve trial precision to keep up with the changing marketplace.
The objective of this study was to estimate experimental errors, blocking statistics, spatial variation, various measures of trial precision, and genotype × environment interactions for 49 alfalfa cultivar trials conducted at 12 Wisconsin locations between 1984 and 1996. Relationships among these statistics were examined with the goal of recommending changes in experimental design for alfalfa cultivar trials and a set of decision rules for measuring the value of a trial, discarding data from individual trials, and discarding entire trials.
| MATERIALS AND METHODS |
|---|
Trials were harvested for up to 5 yr, but usually for 3 yr, and total annual forage yield was computed on a dry matter basis for each plot. Three or four harvests were made each year using a sickle-bar or flail-type harvester at a 5-cm cutting height. Plots were sprayed with grass herbicides, as needed, to control grassy weeds. Plots were generally fertilized with 230 kg K ha⁻¹ and 1 kg B ha⁻¹ following the first harvest of each year except for those in Ashland and Spooner, which received 190 kg K ha⁻¹ in early September of each year.
Experimental Design and Plot Size
Forage yield data were analyzed using mixed-models analysis, with years as a repeated measures factor (Littell et al., 1996). Cultivars were a fixed effect while replicates, years, and all interactions were random. The relative efficiency (RE) of the randomized complete block design was computed according to Steel et al. (1996). To predict the effect of a change in plot size for future trials at a particular site, the intrablock coefficient of heterogeneity was computed for each trial. This value was used to predict the change in the expected future LSD that would result from doubling the plot size at each particular trial site (Lin and Binns, 1984). These predicted values for each trial were compared with the expected LSD values that would be obtained by increasing the number of replicates from four to five, six, seven, or eight.
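As a minimal sketch of the relative-efficiency calculation, the usual textbook estimator for a randomized complete block design relative to a completely randomized design can be written as below. The function name and the mean squares used in the example call are hypothetical, and the optional degrees-of-freedom correction is omitted, so this is an illustration rather than the exact computation used for these trials.

```python
def rcbd_relative_efficiency(ms_blocks, ms_error, n_blocks, n_treatments):
    """Relative efficiency (%) of an RCBD versus a completely randomized design.

    ms_blocks and ms_error are the block and error mean squares from the
    RCBD analysis of variance (illustrative inputs, not data from the paper).
    """
    r, t = n_blocks, n_treatments
    # Estimated error mean square had the same plots been laid out unblocked.
    ms_error_crd = ((r - 1) * ms_blocks + r * (t - 1) * ms_error) / (r * t - 1)
    return 100.0 * ms_error_crd / ms_error


# Hypothetical example: four blocks, ten cultivars.
print(round(rcbd_relative_efficiency(3.0, 1.0, 4, 10), 1))
```

A value of 100% means blocking removed no more variation than expected by chance; larger values indicate that blocking was worthwhile.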
Three mixed-models spatial analyses were applied to the forage yield data of each trial (Brownie et al., 1993). The first of these was a correlated errors model,

$$Y_{ijk} = \mu + \alpha_k + \beta_{jk} + T_i + \varepsilon_{ij} + (\alpha T)_{ik} + \delta_{ijk}$$

where $Y_{ijk}$ = the forage yield of the $i$th cultivar in the $j$th block in the $k$th year, $\mu$ = the overall mean, $\alpha_k$ = the $k$th year effect, $\beta_{jk}$ = the interaction effect of the $j$th block with the $k$th year, $T_i$ = the $i$th cultivar effect, $\varepsilon_{ij}$ = the whole-plot error effect for cultivars, $(\alpha T)_{ik}$ = the interaction effect of the $i$th cultivar with the $k$th year, and $\delta_{ijk}$ = the residual from the $ij$th plot in the $k$th year. The variance-covariance structure of the $\varepsilon_{ij}$ values was modeled by an exponential function as

$$\mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{i'j'}) = \sigma^2_{\varepsilon} \exp(-\theta d)$$

where $d$ is the distance between plots $ij$ and $i'j'$, $\sigma^2_{\varepsilon}$ is the whole-plot error variance, and $\theta$ is a parameter to be estimated. The variance-covariance model was computed using SAS code as shown by Brownie et al. (1993). This function models the correlation between $\varepsilon_{ij}$ values as a decreasing exponential function of the distance between plots.

The second spatial model was a trend-analysis model

$$\varepsilon_{ij} = \sum_{m=1}^{8} \gamma_m x_{m,ij} + \varepsilon'_{ij}$$

where $x_{m,ij}$ are the row and column terms of the full quadratic response surface evaluated at plot $ij$, $\gamma_1$ through $\gamma_8$ are the regression parameters of the fitted response surface for the $\varepsilon_{ij}$ values, and $\varepsilon'_{ij}$ = the whole-plot residuals remaining after the response surface has been fit to the $\varepsilon_{ij}$ values. Trend analysis attempts to fit a response surface to the whole-plot residuals after the variance associated with traditional model terms has been described. Trend analysis was limited to the full quadratic model for convenience and to avoid overfitting residuals (Brownie et al., 1993). For trials with relatively few cultivars, the interaction terms of rows and columns with years were removed from the model due to insufficient degrees of freedom. The third spatial-analysis model combined both trend analysis and correlated errors.
Judging Trial Value
Data from each trial were also analyzed separately for each year without spatial analysis. If the F-test for cultivars was not significant at P < 0.05 for an individual year of a trial, data from that year were discarded, and the data were reanalyzed using only the remaining years (those that had individual P < 0.05). Values of the LSD were compared between complete and subset analyses.
The following statistics were computed based on the original (without spatial analysis) over-years mixed-model analysis of each trial: mean, mean square error, LSD(0.05), range among cultivar means, CV, modified coefficient of variation (MCV) (Caddel et al., 1996), P-value of the F-test for cultivars, and the least significant range (LSR). The LSR was computed as 100(LSD)/range. These statistics were compared using simple correlation coefficients and scatter plots. In addition, the method of Bowman and Rawlings (1995), which plots the natural log of mean square error against the natural log of trial mean, was used to identify candidate trials for discarding.
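The three relative precision statistics named above can be sketched in a few lines. This is an illustration only: the function name and the example numbers are hypothetical, the LSD is taken as an input rather than recomputed from the mixed model, and the MCV is written as 100(LSD/mean) following Caddel et al. (1996).

```python
import math


def trial_precision_stats(trial_mean, mse, lsd, cultivar_means):
    """Return CV, MCV, and LSR (all in %) for one trial."""
    value_range = max(cultivar_means) - min(cultivar_means)
    return {
        "CV": 100.0 * math.sqrt(mse) / trial_mean,   # error SD relative to mean
        "MCV": 100.0 * lsd / trial_mean,             # LSD relative to mean
        "LSR": 100.0 * lsd / value_range,            # LSD relative to range
    }


# Hypothetical trial: yields in Mg/ha.
stats = trial_precision_stats(10.0, 1.0, 1.5, [9.0, 10.0, 12.0])
print(stats)
```

A small LSR means the LSD is small relative to the spread of cultivar means, so the trial can separate cultivars regardless of its mean yield.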
Genotype × Environment Interaction
A series of cultivar trials for a perennial crop such as alfalfa includes three types of environmental replication: locations, repeated trials within each location, and repeated measures (years) within each trial. Rank correlation coefficients were used to quantify the relative genotype × environment interaction of these three sources, with higher correlations indicating relatively less genotype × environment interaction.
Rank correlation coefficients between arrays of cultivar means were computed for all pairs of trials that had at least four cultivars in common. These correlation coefficients were grouped according to those that measured correlation between two trials at different locations and those that measured correlation between two trials at one location. They were transformed to Z-statistics, from which the mean and standard error were computed and detransformed (Steel et al., 1996). Rank correlation coefficients were also computed between years within trials and pooled over trials using the Z transformation. Rank correlations between different trials were pooled using the Z transformation so that a single pooled rank correlation could be used to describe the average correlation of each individual trial with all others. These pooled correlations were plotted against mean square errors of individual trials to determine the existence of a relationship between trial precision and cultivar ranking.
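The pooling step described above can be sketched with Fisher's Z transformation. The function name and example correlations are hypothetical, and the standard error here is computed empirically from the transformed values, which is an assumption about how the pooling was carried out.

```python
import math
import statistics


def pool_correlations(r_values):
    """Pool correlation coefficients on the Fisher Z scale.

    Returns the detransformed pooled correlation and the standard error
    of the mean on the Z scale. Requires at least two correlations.
    """
    z_values = [math.atanh(r) for r in r_values]      # r -> Z
    z_mean = statistics.mean(z_values)
    z_se = statistics.stdev(z_values) / math.sqrt(len(z_values))
    return math.tanh(z_mean), z_se                    # Z -> r


# Hypothetical rank correlations between one trial and three others.
pooled_r, z_se = pool_correlations([0.2, 0.6, 0.4])
print(round(pooled_r, 3), round(z_se, 3))
```

Averaging on the Z scale rather than averaging the raw coefficients avoids the bias caused by the skewed sampling distribution of correlation coefficients.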
| RESULTS AND DISCUSSION |
|---|
Thus, while the randomized complete block was often very inefficient, this was due to reasons other than an excessive number of cultivars in some trials. Using blocking statistics from these trials, the predicted reduction in the LSD by doubling plot size ranged from 2.2 to 43.4%, with a median of 11.9% (Fig. 1). In comparison, increasing the number of replicates by 50% (from four to six) would decrease the trial LSD by 18.4% and increasing the number of replicates by 100% (from four to eight) would decrease the trial LSD by 29.2%. Only nine of the 49 trials had expected LSD reductions from doubling plot size that exceeded 18.4%, and only two of the 49 trials had values that exceeded 29.2%. Furthermore, there were no consistent differences in the RE or expected LSD reduction among the 13 locations where these trials were conducted. Thus, while increasing the number of replicates is more labor intensive than increasing the plot size, it will also be more effective for improving the precision of alfalfa cultivar trials for the great majority of these field sites.
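The replicate-based reductions quoted above follow from the LSD being proportional to the inverse square root of the number of replicates. The sketch below assumes that the mean square error and the critical t-value stay fixed when replicates are added, which is a simplification; the function name is hypothetical.

```python
import math


def lsd_reduction_from_replicates(r_old, r_new):
    """Expected percentage reduction in the LSD when the number of
    replicates changes from r_old to r_new, with MSE and t held fixed."""
    return 100.0 * (1.0 - math.sqrt(r_old / r_new))


for r_new in (5, 6, 7, 8):
    print(r_new, round(lsd_reduction_from_replicates(4, r_new), 1))
```

Under these assumptions, going from four to six replicates gives a reduction of about 18%, and going from four to eight gives about 29%, in line with the figures in the text.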
Correlated errors analysis resulted in improved precision for 46 of the 49 trials, with mean = 128% and median = 121% (Fig. 2). Trend analysis resulted in improved precision for 34 of the 49 trials, with mean = 122% and median = 106%. Combined trend analysis plus correlated errors gave greater improvements in trial precision, with mean = 151% and median = 134%. The general trend toward improved precision for all spatial-analysis methods points out the serious deficiency of the randomized complete block design. The general superiority of the combined trend plus correlated errors analysis suggests that spatial variation rarely followed a predictable pattern for these locations. Thus, planning efficient blocks for future experiments would be extremely difficult (Casler, 1999). Of the six trials with overall P > 0.05 for cultivars, trend plus correlated errors analysis reduced three of these P-values to P < 0.05, while separate correlated errors analysis and trend analysis each reduced only one. The P-values for the other three trials were not lowered below 0.05 by any spatial-analysis method. These three trials were located at Chippewa Falls and Spooner.
Within the 49 trials there were 113 individual trial-years of data. For these 113 trial-years, 22 had P-values of cultivar F-tests that exceeded 0.05. Because cultivar means are separated with a treatment mean separation procedure such as the LSD, some degree of protection is needed against inflated experimentwise type I error rates that automatically occur when making all possible pairwise comparisons among the cultivars in a trial (Steel et al., 1996). The protected (or Fisher's) LSD provides this protection by the decision rule to proceed with computing the LSD and comparing cultivar means only when the P-value is less than the desired comparisonwise type I error rate (e.g., P < 0.05). Thus, there is no reason to proceed with making comparisons among cultivars for 22 of 113 trial-years. What should be done with these data?
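The protected LSD decision rule described above can be sketched as follows. The function name, the dictionary layout, and the example values are hypothetical; the sketch only illustrates the gatekeeping role of the cultivar F-test.

```python
from itertools import combinations


def protected_lsd_comparisons(cultivar_means, lsd, p_value, alpha=0.05):
    """Return pairs of cultivars whose means differ by more than the LSD,
    but only if the overall cultivar F-test is significant."""
    if p_value >= alpha:
        # F-test not significant: no pairwise comparisons are made.
        return []
    return [
        (a, b)
        for a, b in combinations(sorted(cultivar_means), 2)
        if abs(cultivar_means[a] - cultivar_means[b]) > lsd
    ]


# Hypothetical means (Mg/ha) for three cultivars.
means = {"A": 10.0, "B": 11.5, "C": 10.4}
print(protected_lsd_comparisons(means, lsd=1.0, p_value=0.01))
print(protected_lsd_comparisons(means, lsd=1.0, p_value=0.20))
```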
We suggest that there are five reasons not to discard these data, based on individual trial-year P-values.
Individual Trials
There are three legitimate historical reasons for discarding the data of an entire alfalfa cultivar trial. The first is when some biological or physical disturbance arises to severely compromise the integrity of the trial such that it is obvious that the data will be biased, unreliable, or both. The second is when both the over-years and each of the individual-years analyses give no evidence that cultivar means can be separated at a desirable comparisonwise type I error rate (e.g., P = 0.05). The third is when some criterion suggests that a particular trial has unusually imprecise cultivar-mean estimates. The first two decision rules are straightforward and can usually be made easily by most researchers with little subjectivity, with the exception of what P-value to use as an elimination point. The third decision rule is much more difficult to apply because there are a number of statistics that can be used to invoke this rule, and it is difficult for researchers to agree on cutoff levels.
Ideally, a criterion used to judge trial value should be easy to compute and understand, and it should be based on biological reality. For many species, there exists a historical relationship between the arithmetic mean and mean square error of a trial. This relationship is generally linear if both are expressed as natural logarithms (Bowman and Rawlings, 1995). The relationship or lack thereof provides a basis for identifying trials with unusually high mean square errors for a given level of mean performance. If there is no historical relationship between the mean and mean square error, trial data are discarded if their mean square error exceeds the historical pooled mean square error by some specified amount. This historical pooled mean square error is computed from all historical trials within a defined data set. If a linear regression exists for log mean square error as a function of log mean, then a parallel rejection line is drawn some specified, albeit arbitrary, distance above the regression line. All new trials with a mean square error above this line are rejected because they have an unusually high mean square error for their level of mean performance.
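A minimal sketch of the regression-based rejection rule described above is given below. The function names and the historical values are hypothetical, and the default rejection distance (a mean square error more than double its expectation) is one possible choice of the arbitrary cutoff, matching the doubling rule applied later in the text.

```python
import math


def fit_loglog(trial_means, trial_mses):
    """Ordinary least squares fit of ln(MSE) on ln(mean).
    Returns (intercept, slope)."""
    x = [math.log(m) for m in trial_means]
    y = [math.log(v) for v in trial_mses]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def is_rejected(trial_mean, trial_mse, intercept, slope, factor=2.0):
    """True if the trial MSE lies above the rejection line drawn parallel
    to the regression line, ln(factor) above it."""
    expected_log_mse = intercept + slope * math.log(trial_mean)
    return math.log(trial_mse) > expected_log_mse + math.log(factor)


# Hypothetical historical trials.
intercept, slope = fit_loglog([2.0, 4.0, 8.0], [1.0, 2.0, 4.0])
print(is_rejected(4.0, 4.5, intercept, slope))
print(is_rejected(4.0, 3.5, intercept, slope))
```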
Historically, agronomists have relied heavily on the CV as a measure of a trial's worth. A rejection value of 10% is often quoted, but this value appears to be arbitrary. Because the CV is computed entirely from the mean and mean square error of a trial, there exists a fixed algebraic relationship between the CV and the regression of the mean square error on the trial mean (Fig. 3) (Bowman and Rawlings, 1995). Bowman and Rawlings showed algebraically that this fixed relationship resulted in the implicit assumption that the slope of the regression of the log mean square error on the log mean was equal to a value of two. This assumed slope value was more than double the slopes observed from their data sets and 2.5 times greater than the observed slope of the 49 Wisconsin alfalfa cultivar trials (Fig. 3).
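The implied slope of two can be seen directly from the definition of the CV. Writing $\bar{y}$ for the trial mean and MSE for the mean square error (symbols chosen here for illustration), a fixed CV cutoff $c$ corresponds to a straight line on the log-log plot:

```latex
\begin{align*}
\mathrm{CV} &= \frac{100\sqrt{\mathrm{MSE}}}{\bar{y}} = c \\
\mathrm{MSE} &= \left(\frac{c}{100}\right)^{2} \bar{y}^{2} \\
\ln \mathrm{MSE} &= 2\ln \bar{y} + 2\ln\!\left(\frac{c}{100}\right)
\end{align*}
```

Any rejection rule based on a constant CV therefore rejects trials above a line of slope two, regardless of the slope actually observed in the historical data.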
If there is no historical relationship between the log mean square error and log mean, the Bowman and Rawlings decision rule simply rejects trials with mean square errors exceeding a specific cutoff value, regardless of the mean. Fig. 3 clearly shows that neither the CV nor the Bowman and Rawlings decision rule is capable of detecting trials for which the cultivar F-test is nonsignificant. Thus, neither rule should be invoked as the sole criterion to judge the value of an alfalfa cultivar trial. Clearly these decision rules are incapable of discriminating trials on the basis of detectable differences among cultivar means. Rather, they rely on mean square error, ignoring the effect of the local environment on the expression of genetic differences among cultivars.
An alternative to the CV has been proposed: the MCV = 100(LSD/mean) (Caddel et al., 1996). This statistic is inherently more appealing than the CV because it uses the measure of significance among cultivar means, the LSD. However, as Caddel points out, the MCV is almost identical to the CV, differing only in the addition of some constants. If all trials had the same number of replicates and cultivars, these constants would be identical for all trials, and the correlation between the CV and MCV would be r = 1.00. In the Wisconsin alfalfa data set, this correlation was r = 0.84, P < 0.01 (Fig. 4). This correlation coefficient was less than 1.00 simply because of the variation in the number of cultivars and replicates among trials (Caddel, 1993). Using the MCV to replace the CV in a trial-rejection decision rule had almost no impact on which alfalfa trials were discarded. Thus, the MCV suffers from the same fault as the CV: it biases rejection of trials toward those with low means.
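The constants that link the two statistics can be made explicit under a simplifying assumption: if the LSD takes the single-analysis form t·sqrt(2·MSE/r), with r replicates and critical value t, then the MCV is the CV multiplied by t·sqrt(2/r). The sketch below uses hypothetical names and values and does not reproduce the over-years mixed-model LSD used in this study.

```python
import math


def mcv_from_cv(cv, t_crit, n_reps):
    """MCV implied by a given CV when LSD = t_crit * sqrt(2 * MSE / n_reps)."""
    return cv * t_crit * math.sqrt(2.0 / n_reps)


# Hypothetical trial: CV of 10%, t = 2.0, four replicates.
print(round(mcv_from_cv(10.0, 2.0, 4), 2))
```

Because t depends on the error degrees of freedom, the multiplier changes with the number of cultivars and replicates, which is why the CV and MCV are not perfectly correlated across trials.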
One potential disadvantage of the LSR is that its denominator only accounts for two cultivars: the minimum and maximum. This could be remedied by using the standard deviation among cultivar means in the denominator. We did not use the standard deviation in the denominator for two reasons: (i) this alternative statistic was highly correlated with the LSR (r = 0.91, P < 0.01), and (ii) it is more difficult to interpret and describe in terms of absolute value.
In the Wisconsin alfalfa data set, the LSR ranged from 8.9 to 88.1%, with a median of 38.6%. Invoking Caddel's rule of discarding trials with MCVs >10% would have eliminated 18 of the 49 trials in this data set. Seven of the 18 trials had LSR values less than 40%, indicating that they were capable of detecting some relatively small differences in total forage yield among cultivars. For example, the LSD(0.05) values of these seven trials ranged from 0.83 to 1.34 Mg ha⁻¹. Invoking the Bowman and Rawlings rule of discarding trials with mean square errors more than double their expectation based on the trial mean would have eliminated four trials with LSRs ranging from 43 to 47% and LSD(0.05) values ranging from 1.03 to 1.30 Mg ha⁻¹.
Even though the trials that were rejected by the MCV or Bowman and Rawlings decision rules had lower precision than the average of the 49 trials, they nevertheless were capable of discriminating cultivars based on the mean total forage yield. We suggest that the LSR would make a more appropriate decision rule for discarding alfalfa cultivar trial data because it is a more direct measure of a trial's ability to discriminate among cultivars, it is not correlated with the trial mean, and a cutoff value can be determined more directly in terms that relate to discrimination among cultivars. For example, in this data set there was a more or less continuous distribution of LSR values up to 54% (Fig. 5). There were six trials with LSRs above this value, ranging from 61 to 88%. Four of these six trials had overall P-values >0.05 and would have also been rejected by the P-value rule. All six trials had LSD values that ranged from 0.45 to 1.12 Mg ha⁻¹, but they were ineffective in discriminating among cultivar means due to their extremely low ranges among cultivar means (0.74 to 1.79 Mg ha⁻¹). Thus, while some of the six trials did not have an unusually high mean square error, their low ranges made them ineffective cultivar trials. Interestingly, four of the six trials represented the only trials conducted at two locations: Spooner and Chippewa Falls. These results suggest that these test sites are not suitable for discriminating among alfalfa cultivars and should not be used in the future. Furthermore, neither the Bowman and Rawlings decision rule nor the MCV decision rule would have identified the inadequacy of these locations because they were not typified by an abnormally high mean square error or LSD per se. Using the MCV >10% decision rule would have eliminated 17 trials: 10 with a trial mean < the median and seven with a trial mean ≥ the median (Fig. 5). Conversely, any decision rule for the LSR would not have discriminated among trials based on their trial mean.
Pooled rank correlation coefficients for cultivar means of each trial with all other trials were uncorrelated with mean square error for each trial (Fig. 6). The rank correlation coefficient between the pooled correlations among trials and the corresponding mean square error was r = 0.01 (P > 0.05). Low pooled correlation coefficients (r < 0.40) occurred over the entire range of the mean square error. Thus, the precision of an individual cultivar trial per se was not a determinant of the cultivar ranking for that trial. Cultivar rankings appeared to be primarily influenced by environmental differences among locations; secondly, by changes in weather, establishment characteristics among trials within locations, or both; and thirdly, by age of the stand. Furthermore, previous studies have shown that low precision per se in randomized complete block experiments does not affect the ranking of cultivars (Brownie et al., 1993; Casler, 1999).
| SUMMARY AND CONCLUSIONS |
|---|
Cultivar rankings were not influenced by the degree of precision for individual alfalfa cultivar trials. Instead, cultivar rankings varied among locations, trials within locations, and years within trials, in proportion to the expected edaphic and climatic differences among these three factors. Cultivar rankings are not expected to be influenced by discarding trials on the basis of low internal precision (high mean square error).
Alfalfa cultivar trial data from individual trial-years should not be discarded, unless there is a compelling biological or physical disturbance only present in a subset of years. There is no statistical advantage to discarding data from individual years in favor of an analysis of only those years that empirically discriminated among cultivars. Even though some years may not show cultivar discrimination, they serve as a valuable form of replication and are essential for determining yield responses to increasing plant age.
Data from entire alfalfa cultivar trials should not be discarded unless there are compelling biological or physical disturbances that are expected to bias or invalidate the data. We recognize that there have been historical arguments to discard some trials on the basis of low statistical precision. We feel that trials should not be discarded solely on the basis of low statistical precision, but that end users of the data should be allowed to judge for themselves the value of the results. Nevertheless, we recognize that some researchers may disagree with this opinion and consider rejecting some trials. In this case, we recommend two decision rules in order of importance: (i) nonsignificant cultivar F-tests for all individual years and the combined over-years analysis or (ii) an unusually high mean square error relative to the trial mean or the range among cultivar means. The second decision rule should be based on either a historical relationship between the mean square error and trial mean, if it exists, or the LSR, which provides a direct assessment of a trial's ability to separate cultivar means. The CV or MCV should not be used to judge the value of an alfalfa cultivar trial because they are unduly biased against trials with a low mean yield.
Received for publication December 4, 1998.