Agronomy Journal 92:1064-1071 (2000)
© 2000 American Society of Agronomy

ALFALFA

Forage Yield Precision, Experimental Design, and Cultivar Mean Separation for Alfalfa Cultivar Trials

Michael D. Casler and Daniel J. Undersander

Dep. of Agronomy, Univ. of Wisconsin-Madison, 1575 Linden Dr., Madison, WI 53706-1597 USA

mdcasler{at}facstaff.wisc.edu


    ABSTRACT
 
As alfalfa (Medicago sativa L.) cultivars have become more numerous in recent years, the issue of precision of alfalfa forage yield determinations has become more important. The objective of this study was to develop a set of recommendations for improving the precision of alfalfa cultivar forage-yield estimates. Inferences were derived from 49 alfalfa cultivar trials conducted at 13 Wisconsin locations between 1984 and 1996. Although randomized complete block designs were sometimes effective, spatial analysis offers considerable potential for improved precision. Increasing the number of replicates was expected to be more effective than increasing plot size. Trial data should be discarded only when severe and irreversible biological or physical disturbances are present. If researchers feel the need to discard data or entire trials on the basis of low statistical precision as an additional criterion, the decision should be based on (i) nonsignificant F-tests for cultivars of all individual years and for the combined over-years analysis or (ii) an unusually high mean square error relative to the trial mean or range among cultivar means. Trial data should not be rejected based on the coefficient of variation (CV). Genotype x environment interactions followed patterns based largely on expectations of edaphic and climatic environmental differences. Inherent precision of alfalfa cultivar trials did not influence cultivar rankings per se.

Abbreviations: AES, agricultural experiment stations • CV, coefficient of variation • LSR, least significant range • MCV, modified coefficient of variation • RE, relative efficiency


    INTRODUCTION
 
Breeders, seed dealers, and growers of alfalfa all rely heavily on data generated from field trials of alfalfa cultivars. Because of the large number of alfalfa cultivars on the commercial market, and the large number of new cultivars appearing each year, this information carries considerable weight in making decisions about which experimental lines breeders will advance to cultivar status and which cultivars growers will use for forage production. Forage yield is the most important variable measured in most of these trials. Measurement of yield over several years and trials allows for a determination of a particular cultivar's adaptation to, or persistence in, a particular region.

A large proportion of these trials are operated on a fee basis by state agricultural experiment stations (AES). Several criticisms of these trials have arisen in recent years and have been the focus of discussions in meetings of the Central Alfalfa Improvement Conference, the National Alfalfa Improvement Conference, and the American Society of Agronomy (Caddel, 1993; Caddel et al., 1996; T.H. Busbice, personal communication, 1996). Large experimental errors in some AES trials compared with privately operated trials suggest a lack of precision in some AES trials. Large differences in cultivar ranking among locations are sometimes used to support the contention of imprecise trials. These observations have led to recommendations to discard data from some trials.

The number of new alfalfa cultivars reaching the market has increased considerably in recent years. This has led to two substantive changes in the alfalfa seed industry. First, it has become increasingly difficult to separate alfalfa cultivars on the basis of forage yield. A rapid increase in the number of cultivars has led to compaction of cultivar means so that smaller LSD values are required to detect differences. While this raises the question of whether or not true differences in forage yield exist among many cultivars, only a more precise test of cultivar mean differences will provide an answer to the question. Second, the increasing number of cultivars has led to stronger competition among alfalfa marketers. Increased competition has promoted increased interest in alfalfa cultivar trials and their ability to detect perceived true differences in forage yield potential. These factors suggest that AES researchers conducting alfalfa cultivar trials may need to improve trial precision to keep up with the changing marketplace.

The objective of this study was to estimate experimental errors, blocking statistics, spatial variation, various measures of trial precision, and genotype x environment interactions for 49 alfalfa cultivar trials conducted at 13 Wisconsin locations between 1984 and 1996. Relationships among these statistics were examined with the goal of recommending changes in experimental design for alfalfa cultivar trials and a set of decision rules for measuring the value of a trial, discarding data from individual trials, and discarding entire trials.


    Materials and methods
 
Alfalfa cultivar trials were planted at 13 locations throughout Wisconsin between 1984 and 1995 (Table 1). The locations were Arlington (ARL), Ashland (ASH), Beaver Dam (BVD), Chippewa Falls (CHP), Elkhorn (ELK), Fond du Lac (FON), Hancock (HAN), Lancaster (LAN), Marshfield (MSH), Oshkosh (OSH), River Falls (RIF), Sheboygan (SHE), and Spooner (SPN). Trials contained 11 to 125 cultivars, experimental synthetics, or both. Beginning in 1994, trials were divided according to experimental synthetics vs. released cultivars. Plots were direct-seeded in drilled rows at a rate of 22.4 kg ha⁻¹. Plot size was either 0.9 by 6.0 m or 1.2 by 4.5 m. The experimental design of each trial was a randomized complete block with three, four, or eight replicates. Seed lots were all inoculated with N-fixing bacteria before planting.


 
Table 1 Number of entries, blocks, and harvest years for 49 alfalfa cultivar trials conducted in Wisconsin between 1984 and 1996

 
Seeding-year harvests were generally made twice for the southern locations and once for the northern locations (Ashland and Spooner). Plots were occasionally sprayed with preemergence or postemergence herbicides to control grassy weeds in the seeding year. Plots were limed and fertilized according to soil-test recommendations. Data collected from seeding-year plots were not used in this study.

Trials were harvested for up to 5 yr, but usually for 3 yr, and total annual forage yield was computed on a dry matter basis for each plot. Three or four harvests were made each year using a sickle-bar or flail-type harvester at a 5-cm cutting height. Plots were sprayed with grass herbicides, as needed, to control grassy weeds. Plots were generally fertilized with 230 kg K ha⁻¹ and 1 kg B ha⁻¹ following the first harvest of each year except for those in Ashland and Spooner, which received 190 kg K ha⁻¹ in early September of each year.

Experimental Design and Plot Size
Forage yield data were analyzed using mixed-models analysis, with years as a repeated measures factor (Littell et al., 1996). Cultivars were a fixed effect while replicates, years, and all interactions were random. The relative efficiency (RE) of the randomized complete block design was computed according to Steel et al. (1996). To predict the effect of a change in plot size for future trials at a particular site, the intrablock coefficient of heterogeneity was computed for each trial. This value was used to predict the change in the expected future LSD by doubling the plot size for each particular trial site (Lin and Binns, 1984). These values for each trial were compared with expected LSD values, which would be obtained by increasing the number of replicates from four to five, six, seven, or eight.
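The RE calculation can be illustrated with a minimal sketch. It assumes the common textbook estimator of the efficiency of a randomized complete block relative to a completely randomized design, and it omits the small correction for differing error degrees of freedom. The function name and the mean squares in the example are hypothetical and are not taken from these trials.

```python
def rcb_relative_efficiency(ms_blocks, ms_error, n_blocks, n_treatments):
    """Estimated efficiency (%) of an RCB relative to a completely randomized design.

    Assumed estimator: the error variance a CRD would have had on the same plots
    is [(r - 1) * MS_blocks + r * (t - 1) * MS_error] / (r * t - 1).
    """
    r, t = n_blocks, n_treatments
    crd_error = ((r - 1) * ms_blocks + r * (t - 1) * ms_error) / (r * t - 1)
    return 100.0 * crd_error / ms_error


# Hypothetical mean squares for a 4-block, 30-cultivar trial (illustration only).
re_pct = rcb_relative_efficiency(ms_blocks=2.4, ms_error=0.6, n_blocks=4, n_treatments=30)
print(f"RE = {re_pct:.1f}%")
```

When the block mean square equals the error mean square, this estimator returns 100%, meaning that blocking neither helped nor hurt.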

Three mixed-models spatial analyses were applied to the forage yield data of each trial (Brownie et al., 1993). The first of these was a correlated errors model,

$$Y_{ijk} = \mu + \beta_j + \gamma_k + (\beta\gamma)_{jk} + T_i + \delta_{ij} + (\gamma T)_{ik} + \varepsilon_{ijk}$$

where $Y_{ijk}$ = the observation made on the $i$th cultivar of the $j$th block in the $k$th year, $\mu$ = the overall mean, $\beta_j$ = the $j$th block effect, $\gamma_k$ = the $k$th year effect, $(\beta\gamma)_{jk}$ = the interaction effect of the $j$th block with the $k$th year, $T_i$ = the $i$th cultivar effect, $\delta_{ij}$ = the whole-plot error effect for cultivars, $(\gamma T)_{ik}$ = the interaction effect of the $i$th cultivar with the $k$th year, and $\varepsilon_{ijk}$ = the residual from the $ij$th plot in the $k$th year. The variance–covariance structure of the $\delta_{ij}$ values was modeled as an exponential function of $d(ij,i'j')$, the center-to-center distance between plots $ij$ and $i'j'$, with a parameter $\theta$ to be estimated. The variance-covariance model was computed using SAS code as shown by Brownie et al. (1993). This function models the correlation between $\delta_{ij}$ values as a decreasing exponential function of the distance between plots.
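As a rough illustration of a distance-based exponential covariance, the sketch below builds the covariance matrix of whole-plot errors for a small grid of plots. The parametrization cov = sigma2 * exp(-d / theta) is an assumption chosen for illustration; the exact form used in the analysis is not recoverable from this text. The grid layout, function names, and parameter values are hypothetical. Only the 0.9 by 6.0 m plot size comes from the trial description.

```python
import math


def plot_centers(n_rows, n_cols, plot_width, plot_length):
    """Center coordinates (m) of plots laid out on a regular grid."""
    return [((c + 0.5) * plot_width, (r + 0.5) * plot_length)
            for r in range(n_rows) for c in range(n_cols)]


def exponential_covariance(centers, sigma2, theta):
    """Covariance matrix with cov = sigma2 * exp(-d / theta) (assumed form)."""
    return [[sigma2 * math.exp(-math.dist(a, b) / theta) for b in centers]
            for a in centers]


# Two rows of three plots, each 0.9 m wide and 6.0 m long.
centers = plot_centers(n_rows=2, n_cols=3, plot_width=0.9, plot_length=6.0)
cov = exponential_covariance(centers, sigma2=1.0, theta=3.0)
for row in cov:
    print(" ".join(f"{v:.3f}" for v in row))
```

Adjacent plots across the 0.9-m dimension are far more strongly correlated than plots in different rows, which is the behavior the correlated errors model is meant to capture.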

The second spatial model was a trend-analysis model in which the $\delta_{ij}$ values were described by a response surface in $R_m$, the $m$th row number, and $C_n$, the $n$th column number. In this model, $\alpha_1$ through $\alpha_8$ are the regression parameters of the fitted response surface for the $\delta_{ij}$ values, and $\delta'_{ij}$ = the whole-plot residuals remaining after the response surface has been fit to the $\delta_{ij}$ values. Trend analysis attempts to fit a response surface to the whole-plot residuals after the variance associated with traditional model terms has been described. Trend analysis was limited to the full quadratic model for convenience and to avoid overfitting residuals (Brownie et al., 1993). For trials with relatively few cultivars, the interaction terms of rows and columns with years were removed from the model due to insufficient degrees of freedom. The third spatial-analysis model combined both trend analysis and correlated errors.

Judging Trial Value
Data from each trial were also analyzed separately for each year without spatial analysis. If the F-test for cultivars was not significant at P < 0.05 for an individual year of a trial, data from that year were discarded, and the data were reanalyzed using only the remaining years (those that had individual P < 0.05). Values of the LSD were compared between complete and subset analyses.

The following statistics were computed based on the original (without spatial analysis) over-years mixed-model analysis of each trial: Mean, mean square error, LSD(0.05), range among cultivar means, CV, modified coefficient of variation (MCV) (Caddel et al., 1996), P-value of the F-test for cultivars, and the least significant range (LSR). The LSR was computed as: 100(LSD)/range. These statistics were compared using simple correlation coefficients and scatter plots. In addition, the method of Bowman and Rawlings (1995), which plots the natural log of mean square error vs. the natural log of trial mean, was used to identify candidate trials for discarding.
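A minimal sketch of three of these statistics follows. It uses LSR = 100(LSD)/range as defined above and MCV = 100(LSD/mean) as defined for Caddel et al. (1996) later in this article, and it assumes the usual definition CV = 100 × sqrt(mean square error)/mean, which this article does not restate. The function name and the input values are hypothetical.

```python
import math


def trial_statistics(cultivar_means, mse, lsd):
    """Trial mean and range, plus CV, MCV, and LSR expressed as percentages."""
    mean = sum(cultivar_means) / len(cultivar_means)
    rng = max(cultivar_means) - min(cultivar_means)
    return {
        "mean": mean,
        "range": rng,
        "cv": 100.0 * math.sqrt(mse) / mean,   # assumed standard CV definition
        "mcv": 100.0 * lsd / mean,             # MCV = 100(LSD/mean)
        "lsr": 100.0 * lsd / rng,              # LSR = 100(LSD)/range
    }


# Hypothetical cultivar means (Mg/ha), mean square error, and LSD(0.05).
stats = trial_statistics([9.0, 10.0, 11.0, 12.0], mse=0.49, lsd=1.05)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Here the LSD is taken as an input because it depends on the error term and degrees of freedom of the particular over-years analysis.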

Genotype x Environment Interaction
A series of cultivar trials for a perennial crop such as alfalfa includes three types of environmental replication: Locations, repeated trials within each location, and repeated measures (years) within each trial. Rank correlation coefficients were used to quantify the relative genotype x environment interaction of these three sources, with higher correlations indicating relatively less genotype x environment interaction.

Rank correlation coefficients between arrays of cultivar means were computed for all pairs of trials that had at least four cultivars in common. These correlation coefficients were grouped according to those that measured correlation between two trials at different locations and those that measured correlation between two trials at one location. They were transformed to Z-statistics, from which the mean and standard error were computed and detransformed (Steel et al., 1996). Rank correlation coefficients were also computed between years within trials and pooled over trials using the Z transformation. Rank correlations between different trials were pooled using the Z transformation so that a single pooled rank correlation could be used to describe the average correlation of each individual trial with all others. These pooled correlations were plotted against mean square errors of individual trials to determine the existence of a relationship between trial precision and cultivar ranking.
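The pooling step can be sketched as follows, assuming Fisher's z transformation (the inverse hyperbolic tangent) and an empirical standard error computed among the transformed values. The weighting used in the original analysis is not stated here, so the equal weighting, the function name, and the example correlations are all assumptions for illustration.

```python
import math


def pool_correlations(correlations):
    """Pool correlations on the Fisher z scale.

    Returns (pooled_r, lower, upper), where lower and upper are the
    back-transformed values of mean(z) minus and plus one standard error.
    """
    z = [math.atanh(r) for r in correlations]
    n = len(z)
    z_mean = sum(z) / n
    z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / (n - 1))
    z_se = z_sd / math.sqrt(n)
    return math.tanh(z_mean), math.tanh(z_mean - z_se), math.tanh(z_mean + z_se)


# Hypothetical rank correlations between pairs of trials.
pooled, lower, upper = pool_correlations([0.42, 0.61, 0.55, 0.70, 0.48])
print(f"pooled r = {pooled:.2f} ({lower:.2f} to {upper:.2f})")
```

Averaging on the z scale rather than on the raw correlation scale avoids the bias that arises because the sampling distribution of r is skewed when the true correlation is far from zero.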


    Results and discussion
 
Experimental Design and Plot Size
The 49 alfalfa cultivar trials had REs of the randomized complete block design that ranged from 95 to 374%, with a median of 128%. Because of the large range in the number of cultivars among these trials (11–125), we would expect trials with a larger number of cultivars to have the lowest REs due to their larger block sizes, which would potentially increase within-block heterogeneity. This did not occur; the correlation coefficient between the RE and number of cultivars was . Thus, while the randomized complete block was often very inefficient, this was due to reasons other than an excessive number of cultivars in some trials.

Using blocking statistics from these trials, the predicted reduction in the LSD by doubling plot size ranged from 2.2 to 43.4%, with a median of 11.9% (Fig. 1). In comparison, increasing the number of replicates by 50% (from four to six) would decrease the trial LSD by 18.4% and increasing the number of replicates by 100% (from four to eight) would decrease the trial LSD by 29.2%. Only nine of the 49 trials had expected LSD reductions from doubling plot size that exceeded 18.4%, and only two of the 49 trials had values that exceeded 29.2%. Furthermore, there were no consistent differences in the RE or expected LSD reduction among the 13 locations where these trials were conducted. Thus, while increasing the number of replicates is more labor intensive than increasing the plot size, it will also be more effective for improving the precision of alfalfa cultivar trials for the great majority of these field sites.
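The comparison between more replicates and larger plots can be approximated with a short sketch. It assumes that the LSD is proportional to the square root of the variance of a cultivar mean, holds the t-value fixed, and uses Smith's empirical variance law (variance per unit area proportional to plot size raised to the power -b) as a stand-in for the Lin and Binns (1984) calculation. The function names are hypothetical. Under these assumptions, the replicate calculation lands close to the 18.4 and 29.2% values quoted above.

```python
import math


def lsd_reduction_from_replicates(r_old, r_new):
    """Percent LSD reduction when replicates go from r_old to r_new (t-value held fixed)."""
    return 100.0 * (1.0 - math.sqrt(r_old / r_new))


def lsd_reduction_from_plot_size(b, size_ratio=2.0):
    """Percent LSD reduction under Smith's law with heterogeneity coefficient b."""
    return 100.0 * (1.0 - size_ratio ** (-b / 2.0))


def heterogeneity_from_reduction(reduction_pct, size_ratio=2.0):
    """Invert the plot-size formula to recover the implied b."""
    return -2.0 * math.log(1.0 - reduction_pct / 100.0) / math.log(size_ratio)


for r_new in (5, 6, 7, 8):
    print(f"4 -> {r_new} replicates: {lsd_reduction_from_replicates(4, r_new):.1f}% smaller LSD")

# Implied heterogeneity coefficient for the median 11.9% reduction reported above.
print(f"implied b = {heterogeneity_from_reduction(11.9):.2f}")
```

With b = 1 (no correlation between neighboring areas), doubling plot size is exactly as effective as doubling the number of replicates. The median reduction of 11.9% corresponds to b of roughly 0.37 under this simplified model, which is why larger plots gain comparatively little.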



 
Fig. 1 Histogram of the predicted reduction in the LSD(0.05) value resulting from doubling the plot size in 49 alfalfa cultivar trials. Arrows represent predicted LSD reductions resulting from increasing the number of replicates from four to the specified value (r)

 
The low median RE of the randomized complete block designs suggests that considerable improvements can be made in the precision of our trial system by using one of two approaches: Incomplete block designs or spatial adjustment of forage yields. Incomplete block designs of alfalfa cultivar trials in New York had a mean RE of 191% (Hansen et al., 1996). Lattice designs of forage grass cultivar trials in Wisconsin had a mean RE of 145% (Casler, 1999). The severe restrictions on the number of cultivars for lattice designs can be avoided by using more general incomplete block designs (Cochran and Cox, 1957), row-column designs (Patterson and Robinson, 1989), or alpha designs (Patterson et al., 1978).

Correlated errors analysis resulted in improved precision for 46 of the 49 trials, with mean = 128% and median = 121% (Fig. 2). Trend analysis resulted in improved precision for 34 of the 49 trials, with mean = 122% and median = 106%. Combined trend analysis plus correlated errors gave greater improvements in trial precision, with mean = 151% and median = 134%. The general trend toward improved precision for all spatial-analysis methods points out the serious deficiency of the randomized complete block design. The general superiority of the combined trend plus correlated errors analysis suggests that spatial variation rarely followed a predictable pattern for these locations. Thus, planning efficient blocks for future experiments would be extremely difficult (Casler, 1999). Of the six trials with overall P > 0.05 for cultivars, trend plus correlated errors analysis reduced three of these P-values to P < 0.05, whereas separate correlated errors analysis or trend analysis reduced only one each. The P-values for the other three trials were not lowered below 0.05 by any spatial-analysis method. These three trials were located at Chippewa Falls and Spooner.



 
Fig. 2 Histograms of relative efficiencies (REs) for 49 alfalfa cultivar trials. The REs of correlated errors analysis (CE), trend analysis, and trend plus correlated errors analysis were computed relative to the randomized complete block design without spatial analysis

 
Judging Trial Value
Individual Years within Trials
The rank values of cultivars within trials did not vary considerably from year to year. Using the rank of over-years means as a standard of comparison, rank correlations of individual years were as follows: r = 0.39 to 0.98 with a mean of 0.75 ± 0.03 for Yr 1, r = 0.60 to 0.98 with a mean of 0.85 ± 0.02 for Yr 2, and r = 0.61 to 0.98 with a mean of 0.78 ± 0.03 for Yr 3. These results are similar to results from tall fescue cultivar trials (Nepal and van Santen, 1992). Thus, the main advantages of multiple years in alfalfa cultivar trials are the added precision that derives from using years as a form of replication and the added information on persistence and long-term survival.

Within the 49 trials there were 113 individual trial-years of data. For these 113 trial-years, 22 had P-values of cultivar F-tests that exceeded 0.05. Because cultivar means are separated with a treatment mean separation procedure such as the LSD, some degree of protection is needed against inflated experimentwise type I error rates that automatically occur when making all possible pairwise comparisons among the cultivars in a trial (Steel et al., 1996). The protected (or Fisher's) LSD provides this protection by the decision rule to proceed with computing the LSD and comparing cultivar means only when the P-value is less than the desired comparisonwise type I error rate (e.g., P < 0.05). Thus, there is no reason to proceed with making comparisons among cultivars for 22 of 113 trial-years. What should be done with these data?

We suggest that there are five reasons not to discard these data, based on individual trial-year P-values.

  1. Half of the 22 trial-year combinations were the first full year of trial production. Thus, it appears considerably less likely that cultivars can be discriminated based on forage yield in the early stages of a trial before significant biotic stresses, abiotic stresses, or both have had their effects.
  2. In three of the 49 trials, all trial-years had P-values that exceeded 0.05. This would argue that each of these trials should be discarded. However, for one of these trials, the combined analyses over years resulted in a P-value less than 0.01. The statistical significance of the combined analysis was due to the added replication of the multiple years and the strong positive correlation (minimal changes in cultivar ranking) between years within the trial. Despite the individual-years analyses, this trial provided valuable information.
  3. The 22 nonsignificant trial-years derived from 17 trials. For 14 of these trials, it was possible to compute a combined over-years analysis for all years and compare its results with a combined over-years analysis that includes only those years with a significant (P < 0.05) cultivar F-test. Values of the LSD were computed for each of these analyses. After discarding years with P > 0.05, half of the 14 trials had an LSD that decreased an average of 12.5%, presumably due to improved precision in the remaining data. The LSD of the other half increased an average of 32.7% after discarding years with P > 0.05 due to fewer repeated measurements resulting from discarding those years. Thus, automatically discarding trial-years with P > 0.05 does not necessarily improve trial precision and may substantially decrease trial precision.
  4. As discussed above, years within trials were always positively correlated with each other in this data set, suggesting that their rankings change little from one year to another. Thus, a lack of significance for an individual trial-year might arise from an unusually large mean square error, which in turn, may arise from sampling variation in the estimation of the mean square error or from some biological disturbance in the trial. Nevertheless, the lack of significant changes in cultivar ranks from year to year argues that we should be tolerant of variation in mean square errors within the limits of what can be tolerated by the assumption of homogeneous errors. A certain amount of variation in mean square errors is a natural and unavoidable consequence of research.
  5. Finally, discarding individual years from a trial eliminates the possibility of examining year-to-year stability or yield trends for individual cultivars. Sometimes this examination can reveal significant stabilities or instabilities that would not be revealed by an individual-year or combined-year analysis.

Individual Trials
There are three legitimate historical reasons for discarding the data of an entire alfalfa cultivar trial. The first is when some biological or physical disturbance arises to severely compromise the integrity of the trial such that it is obvious that the data will be biased, unreliable, or both. The second is when both the over-years and each of the individual-years analyses give no evidence that cultivar means can be separated at a desirable comparisonwise type I error rate (e.g., P = 0.05). The third is when some criterion suggests that a particular trial has unusually imprecise cultivar-mean estimates. The first two decision rules are straightforward and can usually be made easily by most researchers with little subjectivity, with the exception of what P-value to use as an elimination point. The third decision rule is much more difficult to apply because there are a number of statistics that can be used to invoke this rule, and it is difficult for researchers to agree on cutoff levels.

Ideally, a criterion used to judge trial value should be easy to compute and understand, and it should be based on biological reality. For many species, there exists a historical relationship between the arithmetic mean and mean square error of a trial. This relationship is generally linear if both are expressed as natural logarithms (Bowman and Rawlings, 1995). The relationship or lack thereof provides a basis for identifying trials with unusually high mean square errors for a given level of mean performance. If there is no historical relationship between the mean and mean square error, trial data are discarded if their mean square error exceeds the historical pooled mean square error by some specified amount. This historical pooled mean square error is computed from all historical trials within a defined data set. If a linear regression exists for log mean square error as a function of log mean, then a parallel rejection line is drawn some specified, albeit arbitrary, distance above the regression line. All new trials with a mean square error above this line are rejected because they have an unusually high mean square error for their level of mean performance.

Historically, agronomists have relied heavily on the CV as a measure of a trial's worth. A rejection value of 10% is often quoted, but this value appears to be arbitrary. Because the CV is computed entirely from the mean and mean square error of a trial, there exists a fixed algebraic relationship between the CV and the regression of the mean square error on the trial mean (Fig. 3) (Bowman and Rawlings, 1995). Bowman and Rawlings showed algebraically that this fixed relationship resulted in the implicit assumption that the slope of the regression of the log mean square error on the log mean was equal to a value of two. This assumed slope value was more than double the slopes observed from their data sets and 2.5 times greater than the observed slope of the 49 Wisconsin alfalfa cultivar trials (Fig. 3).
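The implied slope of two can be seen in a short derivation, assuming the usual definition of the CV in terms of the mean square error ($\mathrm{MS}_e$) and the trial mean ($\bar{Y}$); that definition is not restated in this article. Holding the CV at any fixed rejection value gives a straight line on the log-log scale:

```latex
\mathrm{CV} = 100\,\frac{\sqrt{\mathrm{MS}_e}}{\bar{Y}}
\quad\Longrightarrow\quad
\ln \mathrm{MS}_e = 2\,\ln \bar{Y} + 2\,\ln\!\left(\frac{\mathrm{CV}}{100}\right)
```

Any constant-CV cutoff therefore has a slope of exactly two in a plot of $\ln \mathrm{MS}_e$ against $\ln \bar{Y}$, regardless of the cutoff chosen. Only the intercept changes.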



 
Fig. 3 Plot of the natural logarithms of mean square error (MSe) vs. trial mean for 49 alfalfa cultivar trials. The solid line is the linear regression of the plotted data points: ln(MSe) = 0.4719ln(Mean) - 1.3931 (P < 0.01). The dashed and dotted lines are the decision rules for rejecting trials based on the predicted MSe or coef. of variation (CV), respectively. Open vs. closed symbols are based on the P-value from the F-test for cultivar means

 
Bowman and Rawlings suggested discarding trials when their observed mean square error was more than double the predicted mean square error for their level of mean performance. On average, this is equivalent to discarding trials when the CV is >1.41 times the mean CV. However, because the CV decision rule assumes an unrealistically high slope, some trials are penalized simply because they have low means. In the Wisconsin alfalfa data set, five trials would be discarded by the CV decision rule (above the dotted line in Fig. 3), four by the historical regression slope rule (above the dashed line in Fig. 3), with only three of these in common. Furthermore, neither of these rules was capable of detecting five of the six trials that had P > 0.05 (open circles in Fig. 3), suggesting that multiple criteria must be used in determining which trials should be discarded. While the specific cutoff value of the Bowman and Rawlings regression decision rule can be debated, it is clear that the CV biases the rejection of cultivar trials toward those with low mean values.
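The two rejection rules can be contrasted in a minimal sketch. The regression coefficients are those given in the Fig. 3 caption. The doubling factor follows the Bowman and Rawlings suggestion described above, and the factor of sqrt(2) is the equivalent 1.41 multiplier on the CV. The function names, the example trial, and the mean CV of 8% are hypothetical values chosen only to show how a low-mean trial can fail the CV rule while passing the regression rule.

```python
import math

# Regression from the Fig. 3 caption: ln(MSe) = 0.4719 ln(Mean) - 1.3931
SLOPE, INTERCEPT = 0.4719, -1.3931


def predicted_mse(trial_mean):
    """Mean square error predicted from the trial mean by the fitted regression."""
    return math.exp(INTERCEPT + SLOPE * math.log(trial_mean))


def reject_by_regression(trial_mean, mse, factor=2.0):
    """Reject when the observed MSe exceeds `factor` times its predicted value."""
    return mse > factor * predicted_mse(trial_mean)


def reject_by_cv(trial_mean, mse, mean_cv):
    """Reject when the trial CV exceeds sqrt(2) (about 1.41) times the mean CV."""
    cv = 100.0 * math.sqrt(mse) / trial_mean
    return cv > math.sqrt(2.0) * mean_cv


# Hypothetical low-yielding trial: mean 5.0, MSe 0.8, with an assumed mean CV of 8%.
print("regression rule rejects:", reject_by_regression(5.0, 0.8))
print("CV rule rejects:", reject_by_cv(5.0, 0.8, mean_cv=8.0))
```

In this hypothetical case the mean square error is well within twice its predicted value, yet the CV rule still rejects the trial because the denominator of the CV is small.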

If there is no historical relationship between the log mean square error and log mean, the Bowman and Rawlings decision rule simply rejects trials with mean square errors exceeding a specific cutoff value, regardless of the mean. Fig. 3 clearly shows that neither the CV nor the Bowman and Rawlings decision rule is capable of detecting trials for which the cultivar F-test is nonsignificant. Thus, neither rule should be invoked as the sole criterion to judge the value of an alfalfa cultivar trial. Clearly these decision rules are incapable of discriminating trials on the basis of detectable differences among cultivar means. Rather, they rely on mean square error, ignoring the effect of the local environment on the expression of genetic differences among cultivars.

An alternative to the CV has been proposed: The MCV = 100(LSD/mean) (Caddel et al., 1996). This statistic is inherently more appealing than the CV because it uses the measure of significance among cultivar means, the LSD. However, as Caddel points out, the MCV is almost identical to the CV, differing only in the addition of some constants. If all trials had the same number of replicates and cultivars, these constants would be identical for all trials, and the correlation between the CV and MCV would be r = 1.00. In the Wisconsin alfalfa data set, this correlation was r = 0.84, P < 0.01 (Fig. 4). This correlation coefficient was less than 1.00 simply because of the variation in the number of cultivars and replicates among trials (Caddel, 1993). Using the MCV to replace the CV in a trial-rejection decision rule had almost no impact on which alfalfa trials were discarded. Thus, the MCV suffers from the same fault as the CV: it biases rejection of trials toward those with low means.
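The constants linking the two statistics can be made explicit under a simplifying assumption: a single error term, with each cultivar mean based on $r$ observations, so that the LSD equals $t\sqrt{2\,\mathrm{MS}_e/r}$, where $t$ is the tabular t-value for the error degrees of freedom. The over-years mixed-model analyses used here are more involved, so this is an illustrative sketch rather than the exact relationship for these trials:

```latex
\mathrm{MCV}
= 100\,\frac{\mathrm{LSD}}{\bar{Y}}
= 100\,\frac{t\sqrt{2\,\mathrm{MS}_e/r}}{\bar{Y}}
= t\sqrt{\frac{2}{r}}\;\mathrm{CV}
```

The number of replicates enters through $r$, and the number of cultivars enters through the error degrees of freedom that determine $t$. When both are the same for every trial, the MCV is a fixed multiple of the CV.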



 
Fig. 4 Scatterplot of the empirical relationship between modified coef. of variation (MCV) and coef. of variation (CV) for 49 alfalfa cultivar trials

 
The MCV is appealing in that it uses the LSD as a component of its decision rule, more directly relating trial rejection to the differences among cultivar means. Extending this concept, we suggest an alternative statistic, which we term the LSR

$$\mathrm{LSR} = 100\,\frac{\mathrm{LSD}}{\mathrm{range}}$$

where range = the difference between the maximum and minimum cultivar means. This statistic more closely reflects the specific goal of cultivar testing: The effective statistical separation of cultivar means. It accounts for both the observed mean square error and the phenotypic expression of varietal differences. Furthermore, in the Wisconsin alfalfa data set, the LSR was independent of the trial mean (r = 0.02, P > 0.05).

One potential disadvantage of the LSR is that its denominator only accounts for two cultivars: The minimum and maximum. This could be remedied by using the standard deviation among cultivar means in the denominator. We did not use the standard deviation in the denominator for two reasons: (i) this alternative statistic was highly correlated with the LSR (r = 0.91, P < 0.01), and (ii) it is more difficult to interpret and describe in terms of absolute value.

In the Wisconsin alfalfa data set, the LSR ranged from 8.9 to 88.1%, with a median of 38.6%. Invoking Caddel's rule of discarding trials with MCVs >10% would have eliminated 18 of the 49 trials in this data set. Seven of the 18 trials had LSR values less than 40%, indicating that they were capable of detecting some relatively small differences in total forage yield among cultivars. For example, the LSD(0.05) values of these seven trials ranged from 0.83 to 1.34 Mg ha⁻¹. Invoking the Bowman and Rawlings rule of discarding trials with mean square errors more than double their expectation based on the trial mean would have eliminated four trials with LSRs ranging from 43 to 47% and LSD(0.05) values ranging from 1.03 to 1.30 Mg ha⁻¹.

Even though the trials that were rejected by the MCV or Bowman and Rawlings decision rules had lower precision than the average of the 49 trials, they nevertheless were capable of discriminating cultivars based on the mean total forage yield. We suggest that the LSR would make a more appropriate decision rule for discarding alfalfa cultivar trial data because it is a more direct measure of a trial's ability to discriminate among cultivars, it is not correlated with the trial mean, and a cutoff value can be determined more directly in terms that relate to discrimination among cultivars. For example, in this data set there was a more or less continuous distribution of LSR values up to 54% (Fig. 5). There were six trials with LSRs above this value, ranging from 61 to 88%. Four of these six trials had overall P-values >0.05 and would have also been rejected by the P-value rule. All six trials had LSD values that ranged from 0.45 to 1.12 Mg ha⁻¹, but they were ineffective in discriminating among cultivar means due to their extremely low ranges among cultivar means (0.74–1.79 Mg ha⁻¹). Thus, while some of the six trials did not have an unusually high mean square error, their low ranges made them ineffective cultivar trials. Interestingly, four of the six trials represented the only trials conducted at two locations: Spooner and Chippewa Falls. These results suggest that these test sites are not suitable for discriminating among alfalfa cultivars and should not be used in the future. Furthermore, neither the Bowman and Rawlings decision rule nor the MCV decision rule would have identified the inadequacy of these locations because they were not typified by an abnormally high mean square error or LSD per se. Using the MCV >10% decision rule would have eliminated 17 trials: 10 with a trial mean less than the median and seven with a trial mean equal to or greater than the median (Fig. 5). Conversely, any decision rule for the LSR would not have discriminated among trials based on their trial mean.

Fig. 5. Scatterplot of the empirical relationship between the least significant range (LSR) and modified coefficient of variation (MCV) for 49 alfalfa cultivar trials. Open circles represent trials for which the trial mean was less than the median of the trial means. Closed circles represent trials for which the trial mean was equal to or greater than the median of the trial means.

 
Genotype x Environment Interaction
Genotype x environment interaction patterns behaved largely as expected. The pooled rank correlation coefficient between trials within locations was 0.64 ± 0.02 (P < 0.01) while the pooled rank correlation coefficient between trials at different locations was 0.53 ± 0.01 (P < 0.01). These two correlation coefficients were also significantly different from each other (P < 0.05). Thus, as expected, there was a greater tendency for trials at a single location to rank cultivars in a similar manner than trials at different locations. Furthermore, the pooled rank correlation coefficient among years within trials was 0.74 ± 0.01 (P < 0.01), which was significantly higher than either of the two pooled correlations among trials (P < 0.05). Thus, new seedings and changes in location have a greater effect on cultivar ranking than changes in weather or plant age within the lifetime of a trial (i.e., repeated measures).
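For readers who wish to reproduce this type of comparison, the sketch below computes a single Spearman rank correlation between the cultivar means of two trials that share a common set of cultivars. It is a minimal illustration only: the yields are hypothetical, and the way individual coefficients were pooled and their standard errors obtained is not shown, because that procedure is not described in this section.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("ranks are constant; correlation is undefined")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical mean yields (Mg/ha) of the same five cultivars in two trials.
trial_1 = [10.1, 11.4, 9.8, 12.0, 10.9]
trial_2 = [9.5, 10.8, 9.9, 11.2, 10.0]

rho = spearman(trial_1, trial_2)
```

Only cultivars common to both trials can enter such a comparison, so in practice each pair of trials would first be reduced to its shared entries.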

Pooled rank correlation coefficients for cultivar means of each trial with all other trials were uncorrelated with mean square error for each trial (Fig. 6). The rank correlation coefficient between the pooled correlations among trials and the corresponding mean square error was r = 0.01 (P > 0.05). Low pooled correlation coefficients (r < 0.40) occurred over the entire range of the mean square error. Thus, the precision of an individual cultivar trial per se was not a determinant of the cultivar ranking for that trial. Cultivar rankings appeared to be primarily influenced by environmental differences among locations; secondly, by changes in weather, establishment characteristics among trials within locations, or both; and thirdly, by age of the stand. Furthermore, previous studies have shown that low precision per se in randomized complete block experiments does not affect the ranking of cultivars (Brownie et al., 1993; Casler, 1999).

Fig. 6. Scatterplot of the mean correlation coefficient between cultivar rankings of 49 individual alfalfa cultivar trials with cultivar rankings of 48 other trials vs. the mean square error of the 49 individual trials.

 
Varietal differences in mean forage yield are often small and highly sensitive to genotype x environment interactions. Multiple-location testing is essential to develop clear conclusions regarding mean yield and adaptation of a cultivar relative to other cultivars. Crossover interactions between cultivars and locations are common in this data set and others of similar structure. In this data set, the relative precision of individual trials did not influence these interactions. They will exist regardless of the number of trials discarded due to unusually low internal precision.


    Summary and conclusions
Randomized complete block designs for alfalfa cultivar trials in Wisconsin were only moderately efficient on average. Significant improvements in trial precision can be made by using mixed-models adjustments for spatial variation. Analysis of within-block heterogeneity indicated that, for the majority of trial sites, increasing the number of replicates by one or two would more efficiently reduce LSD values than doubling plot size.
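The trade-off between replicates and plot size can be sketched numerically. The example below assumes, purely for illustration, that the variance of a plot mean follows Smith's empirical law, in which the variance of a plot made up of x basic units is proportional to x^(-b), with b between 0 (units perfectly correlated) and 1 (units independent). This section does not state the model used for the within-block heterogeneity analysis, so the model, the function names, and the values of b are all assumptions. The sketch compares standard errors of a difference between two cultivar means and ignores the additional gain from the extra error degrees of freedom that more replicates provide.

```python
import math


def sed_ratio_more_reps(reps, extra):
    """Ratio of new to old standard error of a difference when the number
    of replicates rises from `reps` to `reps + extra`.

    The standard error of a difference is proportional to sqrt(2 * variance / reps).
    """
    if reps <= 0 or extra < 0:
        raise ValueError("reps must be positive and extra non-negative")
    return math.sqrt(reps / (reps + extra))


def sed_ratio_larger_plots(size_factor, b):
    """Ratio of new to old standard error of a difference when plot size is
    multiplied by `size_factor`, under the assumed variance law x**(-b).
    """
    if size_factor <= 0:
        raise ValueError("size_factor must be positive")
    return size_factor ** (-b / 2.0)


# Hypothetical trial with four replicates and a heterogeneity index of 0.3.
one_more_rep = sed_ratio_more_reps(4, 1)
double_plots = sed_ratio_larger_plots(2, 0.3)
```

With these hypothetical inputs, adding one replicate to a four-replicate trial lowers the standard error of a difference slightly more than doubling plot size does, while using 25% more land rather than 100% more. As b approaches 1, doubling plot size becomes the more effective of the two options per trial, which is why the comparison depends on the heterogeneity of each site.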

Cultivar rankings were not influenced by the degree of precision for individual alfalfa cultivar trials. Instead, cultivar rankings varied among locations, trials within locations, and years within trials, in proportion to the expected edaphic and climatic differences among these three factors. Cultivar rankings are not expected to be influenced by discarding trials on the basis of low internal precision (high mean square error).

Alfalfa cultivar trial data from individual trial-years should not be discarded unless there is a compelling biological or physical disturbance that is present only in a subset of years. There is no statistical advantage to discarding data from individual years in favor of an analysis of only those years that empirically discriminated among cultivars. Even though some years may not show cultivar discrimination, they serve as a valuable form of replication and are essential for determining yield responses to increasing plant age.

Data from entire alfalfa cultivar trials should not be discarded unless there are compelling biological or physical disturbances that are expected to bias or invalidate the data. We recognize that there have been historical arguments to discard some trials on the basis of low statistical precision. We feel that trials should not be discarded solely on the basis of low statistical precision, but that end users of the data should be allowed to judge for themselves the value of the results. Nevertheless, we recognize that some researchers may disagree with this opinion and consider rejecting some trials. In this case, we recommend two decision rules in order of importance: (i) nonsignificant cultivar F-tests for all individual years and the combined over-years analysis or (ii) an unusually high mean square error relative to the trial mean or the range among cultivar means. The second decision rule should be based on either a historical relationship between the mean square error and trial mean, if it exists, or the LSR, which provides a direct assessment of a trial's ability to separate cultivar means. The CV or MCV should not be used to judge the value of an alfalfa cultivar trial because they are unduly biased against trials with a low mean yield.

Received for publication December 4, 1998.