Agronomy Journal Journal of Natural Resources and Life Sciences Education
Published in Agron. J. 96:143-147 (2004).
© American Society of Agronomy
677 S. Segoe Rd., Madison, WI 53711 USA

STATISTICS

Mixed Model Formulations for Multi-Environment Trials

K. E. Basford (a, *), W. T. Federer (b), and I. H. DeLacy (a)

a The Univ. of Queensland, Brisbane, QLD 4072, Australia
b Cornell Univ., Ithaca, NY 14850, USA

* Corresponding author (k.e.basford@uq.edu.au).

Received for publication October 2, 2002.

    ABSTRACT
When studying genotype × environment interaction in multi-environment trials, plant breeders and geneticists often consider one of the two factors, environments or genotypes, to be fixed and the other to be random. For this mixed model situation, however, there are two main formulations for variance component estimation, referred to as the unconstrained-parameters (UP) and constrained-parameters (CP) formulations. These formulations give different estimates of genetic correlation and heritability, as well as different tests of significance for the random-effects factor. The definitions of main effects and interactions, and the consequences of those definitions, should be clearly understood, and the selected formulation should be consistent for both fixed and random effects. A discussion of the practical outcomes of using the two formulations in the analysis of balanced data from multi-environment trials is presented. We recommend the CP formulation because of the meaning of its parameters and the corresponding variance components. When managed (fixed) environments are considered, users will have more confidence in prediction for them but will not be overconfident in prediction in the target (random) environments. Genetic gain (predicted response to selection in the target environments from the managed environments) is independent of formulation.

Abbreviations: BLUP, best linear unbiased predictor • CP, constrained parameters • UP, unconstrained parameters


    INTRODUCTION
WHEN STUDYING genotype × environment interaction, breeders and geneticists often consider one of the two factors to be fixed and the other to be random. This results in a linear mixed model. For the fixed factor, all of the levels in the population of levels are present, whereas for the random factor, only a random sample from the population of levels is obtained. The experimenter often wishes to obtain estimates of variance components to compute genetic correlations, heritabilities, repeatabilities, expected genetic advance, and other related statistics. Several discussions of variance component estimation in the mixed model situation have appeared in the literature (Federer, 1955; Cornfield and Tukey, 1956; Scheffé, 1956, 1959; Hocking, 1973; Ayres and Thomas, 1990; Samuels et al., 1991; Fry, 1992; Searle et al., 1992; Schwarz, 1993; Nelder, 1998; Voss, 1999). Different formulations have been proposed, and two of these are used most frequently. This poses a dilemma for breeders and geneticists as to which formulation to use, because the two give different estimates of genetic correlation and heritability as well as different tests of significance for the random-effects factor.

The objectives of this paper are to discuss the statistical and genetic issues concerned with using the linear mixed model in a plant breeding context, to illustrate the application of the two formulations using a wheat (Triticum aestivum L.) breeding example (with balanced data), and to draw conclusions and make recommendations.


    AGRICULTURAL BACKGROUND
Cooper et al. (1995) hypothesized that regional testing strategies in a plant breeding program could be improved by accommodating the effects of genotype × environment interactions to maximize the response to selection. They argued that one way of doing this was to identify the set of selection environments most relevant to the future production environments. If these test environments can be repeated from year to year, confidence in predicting response in future environments would be increased. They therefore assessed the scope for managing environmental conditions at a restricted number of sites to provide discrimination among wheat lines for grain yield that matches that in target production environments.

In analyzing data from such a multi-environment testing regime, the genotypes can be considered to be a random sample of the lines from the relevant stage of the breeding program. The managed environments can be considered to be fixed as they can be repeated over years and locations. Hence, a mixed model for the genotype–environment system will be appropriate. However, the interpretation of experimental results and any inference from selection will apply to the target or production environments that could be considered to be random. Cooper et al. (1995) argued that a successful breeding strategy is one that gives a high indirect response to selection for average yield over the production environments and quantified this using the genetic correlation, which measured the similarity of line discrimination between the managed-environment selection regime and that for average performance in the production environments.


    STATISTICAL ISSUES
Voss (1999) described the two main formulations for the two-factor mixed model, and the material in this section follows his terminology. The different two-factor models will be denoted as the UP formulation and the CP formulation.

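The UP formulation for the r-replicate a × b factorial experiment, with factor A fixed and factor B random, is based on the following model for $y_{ijk}$, the response for the kth replicate of the jth level of factor B and the ith level of factor A:

$$y_{ijk} = \mu + \tau_i + B_j + (\tau B)_{ij} + \varepsilon_{ijk}, \qquad i = 1, \ldots, a;\; j = 1, \ldots, b;\; k = 1, \ldots, r \qquad [1]$$

where

$$\sum_{i=1}^{a} \tau_i = 0, \qquad B_j \sim N(0, \sigma^2_B), \qquad (\tau B)_{ij} \sim N(0, \sigma^2_{\tau B}), \qquad \varepsilon_{ijk} \sim N(0, \sigma^2),$$

and all random terms are mutually independent.

The assumptions on the random terms are the same as for the random effects model.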

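The CP formulation for this same experiment is based on the following model for $y_{ijk}$, the response for the kth replicate of the jth level of factor B and the ith level of factor A:

$$y_{ijk} = \mu + \tau_i + D_j + (\tau D)_{ij} + \varepsilon_{ijk} \qquad [2]$$

where

$$\sum_{i=1}^{a} \tau_i = 0, \qquad \sum_{i=1}^{a} (\tau D)_{ij} = 0 \text{ for each } j, \qquad D_j \sim N(0, \sigma^2_D), \qquad (\tau D)_{ij} \sim N\!\left(0, \tfrac{a-1}{a}\,\sigma^2_{\tau D}\right),$$

$$\operatorname{cov}\!\left[(\tau D)_{ij}, (\tau D)_{i'j}\right] = -\tfrac{1}{a}\,\sigma^2_{\tau D} \quad (i \neq i'), \qquad \varepsilon_{ijk} \sim N(0, \sigma^2),$$

and all other terms are mutually independent.

The nonzero covariance terms are a consequence of the sum-to-zero constraint on the $(\tau D)_{ij}$ over i.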

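The heart of the problem is in the expected mean squares for the analysis of variance of Models [1] and [2], as given in Table 1. The relationship between the variance components

$$\sigma^2_D = \sigma^2_B + \frac{\sigma^2_{\tau B}}{a} \qquad [3]$$

and

$$\sigma^2_{\tau D} = \sigma^2_{\tau B} \qquad [4]$$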
arises because one model is a reparameterization of the other, as many authors have noted. However, this does not clarify things as the plant breeder still needs to interpret the particular parameters in Models [1] or [2].
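As a numerical check on the reparameterization, the Python sketch below (our illustration, not the authors' code) assumes the usual conventions $\sigma^2_D = \sigma^2_B + \sigma^2_{\tau B}/a$ and $\sigma^2_{\tau D} = \sigma^2_{\tau B}$, with the CP interaction variance written as $((a-1)/a)\,\sigma^2_{\tau D}$. It draws UP random effects, re-expresses them as CP effects by centering the interactions over the fixed levels i, and compares empirical variances with the theoretical ones. The variance values and sample sizes are arbitrary choices for illustration.

```python
import numpy as np

def up_to_cp(sigma2_B: float, sigma2_tauB: float, a: int) -> tuple[float, float]:
    """Map UP variance components to CP ones (assumed conventions, see lead-in)."""
    sigma2_D = sigma2_B + sigma2_tauB / a   # random main-effect variance absorbs part of the interaction
    sigma2_tauD = sigma2_tauB               # interaction component is unchanged
    return sigma2_D, sigma2_tauD

def reparameterize(B: np.ndarray, tauB: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Re-express UP random effects as CP random effects.

    B has shape (b,); tauB has shape (a, b), rows indexing the fixed factor A.
    """
    mean_over_i = tauB.mean(axis=0)    # interaction averaged over fixed levels, per level j
    D = B + mean_over_i                # CP main effect of level j
    tauD = tauB - mean_over_i          # CP interaction: sums to zero over i
    return D, tauD

rng = np.random.default_rng(1)
a, b = 18, 50_000                      # a fixed levels; many random levels for stable variances
sigma2_B, sigma2_tauB = 0.05, 0.10     # arbitrary illustrative UP components
B = rng.normal(0.0, np.sqrt(sigma2_B), size=b)
tauB = rng.normal(0.0, np.sqrt(sigma2_tauB), size=(a, b))
D, tauD = reparameterize(B, tauB)

print("theory (sigma2_D, sigma2_tauD):", up_to_cp(sigma2_B, sigma2_tauB, a))
# Var((tauD)_ij) = ((a - 1)/a) * sigma2_tauD, so rescale to estimate sigma2_tauD
print("empirical:", D.var(), tauD[0].var() * a / (a - 1))
print("max |sum_i (tauD)_ij|:", np.abs(tauD.sum(axis=0)).max())
print("cell effects unchanged:", np.allclose(B + tauB, D + tauD))
```

Because $B_j + (\tau B)_{ij} = D_j + (\tau D)_{ij}$ for every cell, the two models describe the same data; only the meaning attached to each parameter differs.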


Table 1. Expected mean squares (EMS) for the r-replicate a × b factorial experiment with factor A fixed and factor B random under the unconstrained-parameters (UP) and constrained-parameters (CP) formulations.

 
To better understand the parameters, Voss (1999) constructed superpopulation models from which the UP and CP models could be induced. In particular, he showed that each parameter in the CP model is a main effect or interaction effect in the usual sense of deviations among means. These are the universally adopted definitions of a main effect (the population mean for a particular level of a factor minus the grand mean) and an interaction (the population mean for the ijth cell minus the population means for the ith and jth levels of the two factors plus the overall population mean). These definitions apply to the parameters in the model and are not merely constraints on the sample statistics, as implied by Nelder (1998). In the reply to the comments on his paper (Hinkelmann, 2000; Wolfinger and Stroup, 2000), Voss (2000) restated that the constraints in the CP model give each parameter a clear interpretation as a main effect or interaction effect. This provides consistency of meaning across fixed and random effects. This is not so for the parameters in the UP formulation.
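These definitions can be made concrete with a small Python sketch (ours, using a hypothetical table of cell means rather than data from the paper). Main effects are computed as a row or column mean minus the grand mean, and interactions as the cell mean minus the row and column means plus the grand mean. Applied to population means, such deviations are the CP parameters; applied to a finite table, as here, the interactions also happen to sum to zero over the levels of the random factor that are present.

```python
import numpy as np

def effects_as_deviations(cell_means: np.ndarray):
    """Main effects and interactions as deviations among means.

    cell_means[i, j] is the mean of cell (i, j): i indexes the fixed factor, j the random one.
    """
    grand = cell_means.mean()
    tau = cell_means.mean(axis=1) - grand          # level-i mean minus grand mean
    D = cell_means.mean(axis=0) - grand            # level-j mean minus grand mean
    # cell mean - row mean - column mean + grand mean, written with the deviations above
    tauD = cell_means - tau[:, None] - D[None, :] - grand
    return grand, tau, D, tauD

# Hypothetical 3 x 4 table of cell means (not data from the paper)
mu = np.array([[4.1, 3.8, 4.5, 3.9],
               [3.2, 3.0, 3.9, 2.8],
               [4.8, 4.0, 5.1, 4.4]])
grand, tau, D, tauD = effects_as_deviations(mu)
print("sum over i of interactions:", np.round(tauD.sum(axis=0), 12))
print("reconstructs cell means:", np.allclose(grand + tau[:, None] + D[None, :] + tauD, mu))
```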

Balance or imbalance in the cells is a function of the sampling and should not affect model definition. However, it will affect the analysis. If there were no observations in the ijth cell, one solution is to use the expected value, i.e., 0, as the value for that interaction effect. If there are no observations for an effect, it cannot be estimated. The cell means model can be used for estimating main effects and interaction effects in multiway classifications, irrespective of balance (Federer and Zelen, 1966). Computer code is available to implement the CP formulation for unbalanced data, and Federer (2000) demonstrated its use for a data set with no empty cells.


    GENETIC ISSUES
Much quantitative genetic theory has been developed from the two-way model, particularly when both factors (genotypes and environments) are assumed to be random. The two concepts on which this theory is based are heritability (in the broad sense) and predicted genetic gain (or predicted response to selection) (Falconer, 1981). To understand their meaning, other parameters need to be defined with respect to those in the associated statistical model. Here, they will be defined with respect to the mixed model (for managed environments) and for the fully random model (for target environments).

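Selection amongst genotypes is based on phenotypic variance, and the phenotypic variance on a line mean basis is determined directly from the expected mean square for genotypes from the analysis of the data (Table 1). Thus, for the managed environments, with genotypes as the random factor B and the a managed environments as the fixed factor A,

$$\sigma^2_{p(M)} = \sigma^2_B + \frac{\sigma^2_{\tau B}}{a} + \frac{\sigma^2}{ra} \quad \text{(UP)}$$

and

$$\sigma^2_{p(M)} = \sigma^2_D + \frac{\sigma^2}{ra} \quad \text{(CP)}.$$

On the other hand, response to selection is based on genotypic variance. Again, for the managed environments

$$\sigma^2_{g(M)} = \sigma^2_B \quad \text{(UP)}$$

and

$$\sigma^2_{g(M)} = \sigma^2_D \quad \text{(CP)}.$$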
The heritability of genotype means in the managed environments ($H^2_M$) is defined as the ratio of the genetic variance to the phenotypic variance:

$$H^2_M = \frac{\sigma^2_{g(M)}}{\sigma^2_{p(M)}} \qquad [5]$$

using either the UP or CP formulation for both of these variances. The heritability in the target environments, $H^2_T$, is defined similarly, but the fully random model is assumed. Broad-sense heritability, $H^2$, is sometimes referred to as genotypic repeatability.

The phenotypic correlation, $r_p(M,T)$, is calculated between the means of genotype performance in the managed and production environments. The genetic correlation, $r_g(M,T)$, measures the similarity of line discrimination between the managed-environment selection regime and that for average performance in the production environments. When the error correlation between managed and production environments can be assumed to be zero (Burdon, 1977), as in this case (because the experimental residuals are not correlated), the relationship between the phenotypic and genetic correlations is

$$r_p(M,T) = H_M\, H_T\, r_g(M,T) \qquad [6]$$

where $H_M$ and $H_T$ are the square roots of $H^2_M$ and $H^2_T$.
The predicted response to selection (or genetic gain) in environment l where selection is made, $\Delta G_l$, is given by

$$\Delta G_l = i\, H^2_l\, \sigma_{p(l)} \qquad [7]$$

where i is the standardized selection differential, $H^2_l$ is the heritability on a line mean basis, and $\sigma_{p(l)}$ is the phenotypic standard deviation in environment l. This equation can be applied to selection for specific traits, such as resistance or tolerance to disease, pest, or soil toxicity factors, when genotypes are exposed to the appropriate screen. Error variation reduces genetic gain, as can be seen from the definition of heritability on a genotype mean basis as the ratio of the genotypic to the phenotypic variance: error variance enters only the denominator.
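Equation [7] requires a value for i. For truncation selection of the top fraction p on a normally distributed criterion, a standard result is $i = \phi(z_p)/p$, where $z_p$ is the standard normal quantile exceeded with probability p and $\phi$ is the standard normal density. The Python sketch below (an illustration, not from the paper) computes i with the standard library and then applies $\Delta G_l = i\,H^2_l\,\sigma_{p(l)}$; the heritability and phenotypic standard deviation used are hypothetical.

```python
from statistics import NormalDist

def selection_intensity(p: float) -> float:
    """Standardized selection differential for truncation selection of the top fraction p,
    assuming a normally distributed selection criterion: i = phi(z_p) / p."""
    std_normal = NormalDist()               # mean 0, sd 1
    z_p = std_normal.inv_cdf(1.0 - p)       # truncation point: P(Z > z_p) = p
    return std_normal.pdf(z_p) / p

def genetic_gain(i: float, h2: float, sigma_p: float) -> float:
    """Predicted response to selection, Equation [7]: i * H^2 * sigma_p."""
    return i * h2 * sigma_p

i_top10 = selection_intensity(0.10)         # selecting the best 10% of lines
print(round(i_top10, 3))                    # 1.755
# Hypothetical line-mean heritability 0.9 and phenotypic SD 0.3 t/ha
print(round(genetic_gain(i_top10, h2=0.9, sigma_p=0.3), 3))
```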

Extending this concept to the common case where the environments in which selection is made are a sample of the target environments, the predicted response to selection in those target environments, $\Delta G_T$, is given by

$$\Delta G_T = i\, H^2_T\, \sigma_{p(T)} \qquad [8]$$

where i is the standardized selection differential, $H^2_T$ is the heritability on a line mean basis, and $\sigma_{p(T)}$ is the phenotypic standard deviation in the target environments.

When prediction is desired from a test set of managed environments to a target set of environments, the predicted response to selection (correlated genetic gain), $\Delta G(T|M)$, is given by

$$\Delta G(T|M) = i\, H_T\, H_M\, r_g(M,T)\, \sigma_{p(T)} = i\, r_p(M,T)\, \sigma_{p(T)} \qquad [9]$$

where there is no error correlation between the managed and target environments; $H_T$ and $H_M$ are the square roots of the heritabilities of line means in the target and test environments, respectively; $r_g(M,T)$ and $r_p(M,T)$ are the genetic and phenotypic correlations between mean performance in the test and target environments, respectively; and $\sigma_{p(T)}$ is the phenotypic standard deviation in the target environments. A more detailed description of the derivation and interpretation of these statistics is given in Cooper et al. (1996).
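The formulation invariance of the correlated gain can be seen by combining the phenotypic-genetic correlation relationship with the correlated-response formula: dividing $r_p(M,T)$ by $H_M H_T$ to obtain $r_g(M,T)$, and then multiplying by $H_T H_M$ in the gain, cancels $H_M$. The Python sketch below is an illustration that assumes zero error correlation and the form $\Delta G(T|M) = i\,H_T\,H_M\,r_g(M,T)\,\sigma_{p(T)}$. It uses the two managed-environment heritabilities quoted in the Example (0.90 for UP, 0.97 for CP) and the reported $r_p(M,T) = 0.56$; the values of $H^2_T$, i, and $\sigma_{p(T)}$ are hypothetical, so the printed genetic correlations are not those of Table 2.

```python
import math

def genetic_correlation(r_p: float, h2_m: float, h2_t: float) -> float:
    """r_g = r_p / (H_M * H_T), assuming zero error correlation."""
    return r_p / math.sqrt(h2_m * h2_t)

def correlated_gain(i: float, h2_m: float, h2_t: float, r_g: float, sigma_p_t: float) -> float:
    """Correlated response in the target environments: i * H_T * H_M * r_g * sigma_p(T)."""
    return i * math.sqrt(h2_t) * math.sqrt(h2_m) * r_g * sigma_p_t

r_p = 0.56                              # phenotypic correlation reported in the Example
h2_t, i, sigma_p_t = 0.70, 1.755, 0.40  # hypothetical, for illustration only

for label, h2_m in (("UP", 0.90), ("CP", 0.97)):
    r_g = genetic_correlation(r_p, h2_m, h2_t)
    gain = correlated_gain(i, h2_m, h2_t, r_g, sigma_p_t)
    print(label, "r_g =", round(r_g, 3), "gain =", round(gain, 4))

# Both gains equal i * r_p * sigma_p(T): the managed-environment heritability cancels.
print(round(i * r_p * sigma_p_t, 4))
```

The larger CP heritability yields a smaller genetic correlation, and the two changes offset exactly in the predicted gain.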


    EXAMPLE
The data considered here arose from trials conducted in a set of managed environments by the Queensland wheat breeding program in Australia (Cooper et al., 1995). Grain yield (t ha⁻¹) was measured on 15 sampled lines, which included three local check cultivars, one line from the 11th International Bread Wheat Screening Nursery, and 11 lines from the 17th International Bread Wheat Screening Nursery. The 15 lines were evaluated in 18 managed environments. These were made up of six managed environments at each of three locations (Emerald, Kingsthorpe [in 1988], and Gatton [in 1987 and 1988]) and involved manipulating N availability, water, and sowing date. They were evaluated in a randomized complete block design with two replicates in each managed environment. A mixed model was adopted in which the lines were random effects (as they were considered to be a random sample of the lines from the preliminary testing stage of the Queensland program) and the managed environments were fixed (as it was assumed that they represented known challenges that could be repeated over years). The estimation of variance components and genetic parameters was conducted using both the UP and CP formulations. The 15 lines were also evaluated in 10 target or production environments over 4 years (1985–1988) in randomized complete block designs with three replicates in each environment. These were considered to be a random subset of the regional trials used by the Queensland wheat breeding program (Brennan et al., 1981). Thus, a completely random model was adopted for the production environment trials. More details may be found in Cooper et al. (1995), where two series of managed environments were considered.

The resultant mean squares for genotypes, environments, genotype × environment interaction, and error for the data from the managed environments are 3.052, 67.226, 0.318, and 0.099, respectively. As the focus here is on the interpretation from the mixed model, the mean squares for the data from the target or production environments are not listed. The genetic parameter estimates using both the UP and CP formulations are presented in Table 2. Given the difference in the expected mean squares (Table 1), the estimate of the variance component for genotypes (i.e., the genetic variance) is greater for the CP formulation than for the UP formulation, because under the CP formulation the error mean square, rather than the interaction mean square, is subtracted from the genotype mean square. Consequently, the line mean heritability is larger and the genetic correlation from the managed environments to the production environments is smaller for the CP formulation than for the UP formulation (Table 2). Irrespective of the formulation used, the predicted genetic gain from managed to production environments [$\Delta G(T|M) = 0.003$] remains the same, because the phenotypic correlation [$r_p(M,T) = 0.56$] remains the same. This holds in spite of the change in the estimated heritability in the managed environments.
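The managed-environment heritabilities can be traced back to these mean squares. The Python sketch below (our reconstruction, not the authors' code) assumes ANOVA (method-of-moments) estimation with genotypes as the random factor, a = 18 fixed managed environments, and r = 2 replicates. Writing $\sigma^2_G$ for the genotype component and $\sigma^2_{GE}$ for the interaction component, it uses the standard expected mean square for genotypes, $\sigma^2 + r\sigma^2_{GE} + ra\sigma^2_G$ under UP and $\sigma^2 + ra\sigma^2_G$ under CP. With the reported mean squares, the resulting $H^2_M$ values round to the 0.90 (UP) and 0.97 (CP) quoted in the text.

```python
def managed_env_parameters(ms_g: float, ms_ge: float, ms_e: float, a: int, r: int) -> dict:
    """ANOVA estimates of genetic parameters in the managed (fixed) environments.

    Assumed expected mean squares for genotypes (random, averaged over a environments x r reps):
      UP: sigma^2 + r*sigma^2_GE + r*a*sigma^2_G  ->  sigma^2_G = (MS_G - MS_GE) / (r*a)
      CP: sigma^2 + r*a*sigma^2_G                 ->  sigma^2_G = (MS_G - MS_E)  / (r*a)
    """
    sigma2_p = ms_g / (r * a)   # phenotypic variance of line means: E(MS_G) / (r*a), same for both
    estimates = {}
    for label, ms_subtracted in (("UP", ms_ge), ("CP", ms_e)):
        sigma2_g = (ms_g - ms_subtracted) / (r * a)
        estimates[label] = {
            "sigma2_G": sigma2_g,
            "sigma2_P": sigma2_p,
            "H2_M": sigma2_g / sigma2_p,   # genetic over phenotypic variance
        }
    return estimates

# Mean squares reported in the Example (managed environments)
est = managed_env_parameters(ms_g=3.052, ms_ge=0.318, ms_e=0.099, a=18, r=2)
for label, values in est.items():
    print(label, {name: round(v, 4) for name, v in values.items()})
```

The phenotypic variance is identical under both formulations; only the genetic variance, and hence $H^2_M$, changes.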


Table 2. Genetic parameter estimates from the analysis of the grain yield (t ha⁻¹) of 15 genotypes grown in randomized complete block designs within each of 18 managed environments under the unconstrained-parameters (UP) and constrained-parameters (CP) formulations.

 
The CP formulation puts more confidence in the ability to distinguish lines that are genetically better in the managed environments ($H^2_M = 0.97$) than does the UP formulation ($H^2_M = 0.90$), at the price of less confidence in prediction to production environments [$r_g(M,T)$ of 0.72 for CP and 0.78 for UP] (Table 2). This is compatible with the fixed model assumption for environments.

Another consequence is that the calculation of the best linear unbiased predictors (BLUPs) for genotypes will be affected by the different models, with those using the CP formulation likely to overestimate the range of performance in the production environments. This arises because, for the completely random model, the BLUP for the genotype effect across environments ($\eta_j$) is, in its heritability form (DeLacy et al., 1996), given by

$$\hat{\eta}_j = H^2 \left(\bar{y}_{.j.} - \bar{y}_{...}\right)$$

where $\bar{y}_{.j.}$ is the mean genotype response across replicates and environments and $\bar{y}_{...}$ is the overall mean response.

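The heritability from the CP formulation ($H^2_M = 0.97$) is larger than that from the UP formulation ($H^2_M = 0.90$) and therefore shrinks the BLUPs less toward the overall mean. For balanced data, the correlation between these BLUPs and the raw genotype means over environments is 1, so the ranking of genotypes is unchanged; only the spread of the predicted effects differs.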
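The shrinkage effect can be illustrated with a short Python sketch of the heritability form of the BLUP, $\hat{\eta}_j = H^2(\bar{y}_{.j.} - \bar{y}_{...})$, for balanced data. The genotype means below are hypothetical, not the paper's data; the two heritabilities are the $H^2_M$ values quoted in the text, used here only to show how the choice of formulation changes the spread of the predictions while leaving their ranking intact.

```python
import numpy as np

def blup_genotype_effects(genotype_means: np.ndarray, h2: float) -> np.ndarray:
    """Heritability form of the BLUP for balanced data: H^2 * (genotype mean - overall mean)."""
    overall_mean = genotype_means.mean()   # equals the grand mean when data are balanced
    return h2 * (genotype_means - overall_mean)

means = np.array([2.9, 3.4, 3.1, 2.6, 3.8])   # hypothetical genotype means (t/ha)

for label, h2 in (("UP", 0.90), ("CP", 0.97)):
    eta = blup_genotype_effects(means, h2)
    spread = eta.max() - eta.min()
    corr = np.corrcoef(eta, means)[0, 1]
    print(label, np.round(eta, 3), "range:", round(float(spread), 3), "corr:", round(float(corr), 6))
```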


    DISCUSSION AND CONCLUSION
Breeders are setting up fixed managed environments with known conditions that contribute to genotype × environment interaction. This contrasts with selecting a random sample of environments from the population of environments in which a genotype will be grown. It is doubtful whether a truly random sample of environments could be obtained anyway. The finite set of managed environments leads directly to a mixed model situation when the genotypes represent a random sample from the population of genotypes.

If the definition of main effects and interactions universally used in factorial experiments is acceptable, then the CP formulation is the correct one for the breeder to use. We believe that this concept of main effects and interactions (as deviations among underlying population means) does have meaning across fixed, mixed, and random models.

A number of authors (e.g., Harville, 1978; Ayres and Thomas, 1990; Fry, 1992; Schwarz, 1993) have attempted to justify each of the formulations based on their covariance structures. The nonzero covariance in the CP formulation allows $\operatorname{cov}(y_{ijk}, y_{i'jk'})$ to be negative (Harville, 1978; Schwarz, 1993). This negative correlation, over genotypes, between the responses in different managed environments can be an important biological phenomenon, e.g., in the case of environments representing different disease incidences, water stress, or toxicity.
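Why only the CP formulation can express a negative covariance becomes clear when both covariances are written out. For the same genotype j in two different fixed environments $i \neq i'$, the UP model gives $\operatorname{cov}(y_{ijk}, y_{i'jk'}) = \sigma^2_B \ge 0$, while the CP model, under the convention $\operatorname{cov}[(\tau D)_{ij}, (\tau D)_{i'j}] = -\sigma^2_{\tau D}/a$, gives $\sigma^2_D - \sigma^2_{\tau D}/a$, which is negative whenever $\sigma^2_{\tau D} > a\,\sigma^2_D$. The Python sketch below evaluates both with hypothetical variance components.

```python
def cov_across_envs_up(sigma2_B: float) -> float:
    """UP model: only B_j is shared by y_ijk and y_i'jk', so the covariance is sigma^2_B (>= 0)."""
    return sigma2_B

def cov_across_envs_cp(sigma2_D: float, sigma2_tauD: float, a: int) -> float:
    """CP model: Var(D_j) + cov((tauD)_ij, (tauD)_i'j) = sigma^2_D - sigma^2_tauD / a."""
    return sigma2_D - sigma2_tauD / a

a = 18                                   # number of fixed (managed) environments
sigma2_D, sigma2_tauD = 0.02, 0.50       # hypothetical CP components with strong interaction
cov_cp = cov_across_envs_cp(sigma2_D, sigma2_tauD, a)
print("CP covariance:", round(cov_cp, 5))
# An equivalent UP fit would need sigma^2_B = cov_cp, which is not a valid variance if negative.
print("attainable under UP:", cov_cp >= 0)
print("UP covariance for a valid sigma^2_B = 0.01:", cov_across_envs_up(0.01))
```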

As far as hypothesis testing and model definition are concerned, it is irrelevant whether experimental data are balanced or not. The sampling procedure does not change the hypothesis, and the population parameters that are being estimated are not different in concept, even if the actual estimates are different. While the relationship between the variance components using the UP and CP formulations holds for population parameters, estimates of these parameters will be affected by whether or not the data are balanced and by the software used to obtain them. A researcher may choose the UP formulation when analyzing unbalanced data (because of available software), but software is also available for fitting the CP formulation to unbalanced data. Regardless, it is imperative that plant breeders understand the behavior in the balanced case. While this is not meant to encourage researchers to only collect balanced data, such experiments do have a place in research.

For data collected over a period of years, it is recommended that breeders obtain estimates of the genotype and genotype × environment interaction components of variance under both formulations and by both ANOVA and REML methods. Then, genetic correlations and heritabilities can be computed for all estimates and compared with the actual values achieved in the breeding program. Such summaries and applications will verify the validity of any particular procedure for the breeding program in question.

Overall, we recommend the CP formulation because of the meaning of its parameters and the corresponding variance components. For balanced data, we have shown that users will be more confident in prediction in the managed environments but not overconfident in prediction in the target environments. Importantly, the genetic gain (predicted response to selection in the target environments from the managed environments) is the same under both formulations.


    REFERENCES



