Åsmund Bjornstad (b)
a Awassa College of Agriculture, P.O. Box 5, Awassa, Ethiopia
b Dep. of Hortic. and Crop Sci., P.O. Box 5022, Agricultural Univ. of Norway, N-1432 Ås, Norway
c Dep. of Mathematics, P.O. Box 5035, Agricultural Univ. of Norway, N-1432 Ås, Norway
aca{at}addisababa-server1.telecom.net.et
| ABSTRACT |
|---|
[...] ($\delta_{ij}$) and provides the graphs of the regression lines for both genotypes and locations. Separate regression on the positive and negative sectors of the environmental indices is also conducted. The program calculates Tai's $\alpha$ and $\lambda$ statistics with graphical presentation of the scatter of the genotypes in the $\alpha$, $\lambda$ space. Other outputs of the program include the univariate stability statistics Wricke's ecovalence ($W_i^2$), Shukla's stability variance ($\sigma_i^2$), Hanson's genotypic stability ($D_i^2$), Plaisted and Peterson's $\bar{\theta}_i$, Plaisted's $\bar{\theta}_{(i)}$, Francis and Kannenberg's environmental variance ($S_i^2$), and coefficient of variation (CV); and the rank-based nonparametric stability statistics $S_i^{(2)}$, $S_i^{(3)}$, $S_i^{(6)}$, Kang's rank sum, and the stratified rank analysis of the genotypes. The program also computes Type 4 stability, the superiority measure ($P_i$), the desirability index of genotype performance, and the pairwise genotype x environment (G x E) interaction of genotypes with checks. It partitions the G x E interaction into that due to heterogeneity of variances and that due to imperfect correlation between the genotype performances, and it performs the singular value decomposition of the G x E matrix, plotting the first two interaction principal components.
Abbreviations: ANOVA, analysis of variance; CV, coefficient of variation; E, environment; G, genotype; IML, Interactive Matrix Language; L, location; R, replication; SS, sum of squares; Y, year.
| INTRODUCTION |
|---|
The study of the other two components, the environment and the G x E interaction, has lagged behind. The G x E interaction seems to have gained more attention in the last few decades. Though not comparable to the sophisticated biometrical models, various methodologies have been proposed to extract more information from this component than analysis of variance (ANOVA) alone could give. Various regression models have been extensively used. Univariate parametric and nonparametric stability statistics have been proposed to determine the response of genotypes to changing environment. Multivariate analytical tools originally designed in other fields have been applied to extract more pattern from the G x E interaction. Most of these stability statistics have not been extensively used and their interrelations have not been investigated thoroughly, mainly due to their computational difficulty. Because of their recent introduction, the computational algorithms of most of these methodologies are not included in the commercial software packages in use today. They therefore remain inaccessible to most breeders and agronomists. Few attempts have been made to produce such computational programs (Kang, 1989; Piepho, 1997, 1998, and 1999).
Our objective was to produce a unified SAS program that could bridge this gap. We present here an elaborate SAS program for a detailed analysis of G x E interaction in balanced data sets using the univariate and multivariate stability statistics proposed by various authors at different times. We believe that this unifying program will enable researchers to compute most of the univariate and multivariate stability statistics and study their relations under different circumstances.
| The stability parameters |
|---|
Although the ANOVA can partition the total variance into that due to the main effects and that due to the interaction, as an additive model it fails to exhaustively analyze the nonadditive G x E interaction. The first attempt to extract more information from this component was the use of linear regression. Using symmetrical joint regression we have partitioned the G x E interaction into the concurrent regression (Tukey's 1 df for nonadditivity), the regression of genotypes on environmental means (the left solution), and the regression of environments on genotype means (the right solution). The joint linear regression uses two parameters: the regression coefficient, expressed as b-values (Eberhart and Russell, 1966; Finlay and Wilkinson, 1963; Yates and Cochran, 1938) or $\beta$-values (Bucio Alanis, 1966; Perkins and Jinks, 1968a and 1968b; Shukla, 1972), and the deviation from regression ($\delta_{ij}$) (Eberhart and Russell, 1966). The b-values have a mean of unity, while the $\beta$-values have a mean of zero. In addition to the computation of these parameters, we have also included their test statistics and the graphs of the slopes (both b- and $\beta$-values) for genotypes and environments separately. An IML (SAS Inst., 1989) program for testing the homogeneity of the mean squares of deviations from regression can be cut out of this program and used for testing homogeneity of variances in any data set.
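The regression of genotypes on environmental means described above can be sketched in a few lines. The following Python fragment is only an illustration, not the SAS program itself: the function name `joint_regression`, the small `yields` table, and the use of the environment mean minus the grand mean as the environmental index are assumptions made for the example. It returns the slope on the b scale (mean of unity), the same slope on the $\beta$ scale (mean of zero), and the sum of squared deviations from regression for each genotype. It assumes the environment means are not all equal.

```python
def joint_regression(yields):
    """Regress each genotype's yields on the environmental index.

    yields[i][j] is the mean yield of genotype i in environment j.
    """
    p, q = len(yields), len(yields[0])
    grand_mean = sum(sum(row) for row in yields) / (p * q)
    env_means = [sum(yields[i][j] for i in range(p)) / p for j in range(q)]
    # Environmental index: environment mean minus grand mean (sums to zero).
    index = [m - grand_mean for m in env_means]
    ss_index = sum(v * v for v in index)

    results = []
    for row in yields:
        geno_mean = sum(row) / q
        b = sum(row[j] * index[j] for j in range(q)) / ss_index
        # Deviation of each cell from this genotype's fitted line.
        deviations = [row[j] - geno_mean - b * index[j] for j in range(q)]
        results.append({
            "b": b,                 # slope with a mean of unity over genotypes
            "beta": b - 1.0,        # slope with a mean of zero over genotypes
            "deviation_ss": sum(d * d for d in deviations),
        })
    return results


# Hypothetical table: 3 genotypes (rows) by 3 environments (columns).
yields = [
    [4.0, 5.0, 7.0],
    [3.0, 5.0, 8.0],
    [5.0, 5.0, 6.0],
]
fit = joint_regression(yields)
for number, row in enumerate(fit, start=1):
    print(number, round(row["b"], 3), round(row["beta"], 3),
          round(row["deviation_ss"], 4))
```

Dividing `deviation_ss` by its degrees of freedom and adjusting for the pooled error would be the next step toward a deviation mean square; that step is left out here because it depends on the ANOVA quantities supplied by the user.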
Pederson and Seif (1975) proposed the simultaneous regression on location x genotype, year x genotype, and location x year x genotype effects. If all three regression coefficients are the same, the authors conclude that a single environmental factor dominates both between and within seasons. We have included the necessary tests on the plausibility of using data averaged over years in regression against the use of unaveraged data. The separate regression on the positive and negative sectors of the environmental indices (Cruz et al., 1989; Verma and Chahal, 1978) is also included.
Tai's (1971) $\alpha$ and $\lambda$ are the other stability parameters that are related to the b-values and deviations from regression ($\delta_{ij}$), respectively. They are obtained in a manner that is a continuation of ANOVA by using the principle of structural relationships. Tests of these two stability statistics are also included in the program. The distribution of the genotypes in $\alpha$, $\lambda$ space is graphed as proposed by Tai.
Other univariate stability parameters include Wricke's ecovalence ($W_i^2$), Shukla's stability variance ($\sigma_i^2$) (Lin et al., 1986), Hanson's (1970) genotypic stability parameter ($D_i^2$), Plaisted and Peterson's (1959) and Plaisted's (1960) contribution of individual genotypes to overall genotype interaction parameters ($\bar{\theta}_i$ and $\bar{\theta}_{(i)}$), respectively, as explained in Lin et al. (1986), Francis and Kannenberg's (1978) environmental variance ($S_i^2$), and CV. The rank-based stability parameters proposed by Nassar and Huhn (1987), $S_i^{(2)}$, $S_i^{(3)}$, and $S_i^{(6)}$, are nonparametric stability statistics that do not need the assumption of a normal distribution.
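Several of the statistics listed above follow directly from the table of genotype x environment means. The Python sketch below is an illustration under stated assumptions, not the authors' code: the function name and the sample table are hypothetical, the ecovalence is computed as the sum of squared interaction residuals of a genotype, and Shukla's stability variance is obtained from the ecovalence using the commonly quoted relationship, which needs at least three genotypes. The environmental variance and CV are computed per genotype across environments.

```python
from math import sqrt


def univariate_stability(yields):
    """Return ecovalence, Shukla variance, environmental variance and CV.

    yields[i][j] is the mean yield of genotype i in environment j.
    """
    p, q = len(yields), len(yields[0])   # the Shukla formula assumes p >= 3
    grand_mean = sum(sum(row) for row in yields) / (p * q)
    geno_means = [sum(row) / q for row in yields]
    env_means = [sum(yields[i][j] for i in range(p)) / p for j in range(q)]

    # Wricke's ecovalence: a genotype's share of the interaction SS.
    ecovalence = [
        sum((yields[i][j] - geno_means[i] - env_means[j] + grand_mean) ** 2
            for j in range(q))
        for i in range(p)
    ]
    total_w = sum(ecovalence)

    # Shukla's stability variance expressed through the ecovalence.
    shukla = [
        (p * (p - 1) * w - total_w) / ((q - 1) * (p - 1) * (p - 2))
        for w in ecovalence
    ]

    # Variance of each genotype across environments, and its CV in percent.
    env_var = [
        sum((yields[i][j] - geno_means[i]) ** 2 for j in range(q)) / (q - 1)
        for i in range(p)
    ]
    cv = [100.0 * sqrt(env_var[i]) / geno_means[i] for i in range(p)]
    return {"W": ecovalence, "shukla": shukla, "S2": env_var, "CV": cv}


# Hypothetical table: 3 genotypes (rows) by 3 environments (columns).
yields = [
    [4.0, 5.0, 7.0],
    [3.0, 5.0, 8.0],
    [5.0, 5.0, 6.0],
]
stats = univariate_stability(yields)
for name, values in stats.items():
    print(name, [round(v, 3) for v in values])
```

One property worth checking on any data set: the stability variances average to the interaction mean square, so an individual estimate can be negative when a genotype contributes very little to the interaction.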
In addition to the above, the following univariate stability statistics are also computed and all necessary tests conducted and significance levels given:
The alternative method of partitioning the G x E interaction into that due to heterogeneous variances and that due to imperfect correlations measured for each genotype proposed by Muir et al. (1992) is also given. We have found that the original formula given by the authors does not always correctly partition the individual genotype's sum of squares (SS) of interaction (Wricke's ecovalence), although the partitioning of the total G x E interaction sum of squares is carried out correctly. The lack of correlation and the sum of the two components for some genotypes might be higher than Wricke's ecovalence, while for other genotypes it might be less than this component. Therefore, we have not provided the partitioning of the individual genotype's G x E interaction (Wricke's ecovalence). Only the total G x E interaction is partitioned into that due to heterogeneity and that due to lack of correlation.
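The partition of the total interaction sum of squares can be illustrated with an algebraic identity that follows from the definitions. For a table of means with p genotypes and q environments, writing $s_i$ for the standard deviation of genotype i across environments and $c_{ii'}$ for the covariance between genotypes i and i', the interaction SS equals $\frac{q-1}{p}\sum_{i<i'}\left[(s_i - s_{i'})^2 + 2(s_i s_{i'} - c_{ii'})\right]$. The first term reflects heterogeneity of variances and the second reflects imperfect correlation. The Python sketch below checks this numerically. It is derived here for illustration and may differ in form from the formula of Muir et al. (1992); the function names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def interaction_ss(yields):
    """Interaction sum of squares of a genotype x environment table of means."""
    p, q = len(yields), len(yields[0])
    grand_mean = sum(sum(row) for row in yields) / (p * q)
    geno_means = [sum(row) / q for row in yields]
    env_means = [sum(yields[i][j] for i in range(p)) / p for j in range(q)]
    return sum(
        (yields[i][j] - geno_means[i] - env_means[j] + grand_mean) ** 2
        for i in range(p) for j in range(q)
    )


def partition_total_ge(yields):
    """Split the total interaction SS into two pairwise components."""
    p, q = len(yields), len(yields[0])
    centered = [[v - sum(row) / q for v in row] for row in yields]

    def cov(a, b):
        return sum(u * v for u, v in zip(a, b)) / (q - 1)

    sd = [sqrt(cov(row, row)) for row in centered]
    heterogeneity = 0.0
    imperfect_corr = 0.0
    for i, k in combinations(range(p), 2):
        heterogeneity += (sd[i] - sd[k]) ** 2
        # Equal to 2 * sd_i * sd_k * (1 - r_ik), written without dividing
        # so that a genotype with zero variance causes no error.
        imperfect_corr += 2.0 * (sd[i] * sd[k] - cov(centered[i], centered[k]))
    scale = (q - 1) / p
    return scale * heterogeneity, scale * imperfect_corr


# Hypothetical table: 3 genotypes (rows) by 3 environments (columns).
yields = [
    [4.0, 5.0, 7.0],
    [3.0, 5.0, 8.0],
    [5.0, 5.0, 6.0],
]
hetero, imperfect = partition_total_ge(yields)
print(round(hetero, 4), round(imperfect, 4), round(interaction_ss(yields), 4))
```

The two components always add up to the total interaction SS, which agrees with the statement above that the total partition is carried out correctly even when a genotype-by-genotype split is not.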
Clustering of genotypes is conducted based on distance measures proposed by Abou-El-Fittouh et al. (1969) and by Fox and Rosielle (1982). We used the SAS (SAS Inst., 1987) average linkage algorithm for clustering, but the user is free to substitute any of the other clustering algorithms available in SAS.
We have included an Interactive Matrix Language (IML) (SAS Inst., 1989) program for partitioning of the G x E matrix by the method of singular value decomposition. The program extracts the genotype and environment coordinate vectors (VIPC and EIPC). It then plots the first interaction principal component axis versus both the genotype and the environmental means on a single graph (biplot); i.e., VIPC1 x genotype means, and EIPC1 x environment means. It also automatically outputs a graph of the second principal component versus the first principal component (VIPC2 x VIPC1 and EIPC2 x EIPC1). This is also a biplot of both the genotypes and the environments on a single graph. All coordinate axes are available in a data set. They are printed out in one of the tables of the output and can be used in further graphing.
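To make the decomposition step concrete, the sketch below extracts only the first interaction principal component from the matrix of interaction residuals. It is a minimal Python illustration, not the IML code: it uses plain power iteration instead of a full singular value decomposition, and the function names, starting vector, and iteration count are assumptions. It returns unit-length genotype and environment vectors with the singular value; how the program scales these into VIPC and EIPC coordinates is not stated in the text, so no scaling is applied.

```python
from math import sqrt


def double_center(yields):
    """Remove genotype and environment main effects from a table of means."""
    p, q = len(yields), len(yields[0])
    grand_mean = sum(sum(row) for row in yields) / (p * q)
    geno_means = [sum(row) / q for row in yields]
    env_means = [sum(yields[i][j] for i in range(p)) / p for j in range(q)]
    return [
        [yields[i][j] - geno_means[i] - env_means[j] + grand_mean
         for j in range(q)]
        for i in range(p)
    ]


def first_ipc(z, iterations=200):
    """Leading singular value and vectors of z by power iteration."""
    p, q = len(z), len(z[0])
    # Non-constant start; it could fail only if it happened to be exactly
    # orthogonal to the leading environment vector.
    v = [1.0 / (j + 1) for j in range(q)]
    for _ in range(iterations):
        u = [sum(z[i][j] * v[j] for j in range(q)) for i in range(p)]
        w = [sum(z[i][j] * u[i] for i in range(p)) for j in range(q)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:              # no interaction left to decompose
            return 0.0, [0.0] * p, [0.0] * q
        v = [x / norm for x in w]
    u = [sum(z[i][j] * v[j] for j in range(q)) for i in range(p)]
    sigma = sqrt(sum(x * x for x in u))
    if sigma == 0.0:
        return 0.0, [0.0] * p, [0.0] * q
    return sigma, [x / sigma for x in u], v


# Hypothetical table: 3 genotypes (rows) by 3 environments (columns).
yields = [
    [4.0, 5.0, 7.0],
    [3.0, 5.0, 8.0],
    [5.0, 5.0, 6.0],
]
residuals = double_center(yields)
sigma, geno_vector, env_vector = first_ipc(residuals)
total_ss = sum(x * x for row in residuals for x in row)
print("share of interaction SS on the first axis:",
      round(sigma ** 2 / total_ss, 4))
```

The squared singular value divided by the interaction SS gives the share of the interaction captured by the first axis, which is a useful figure to report next to a biplot.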
The user can export the graphs produced by this program into word-processing software. We have implemented only minimal formatting for the graphs. The user can modify the title, axes, footnotes, line types, and all other parts of the graph. Ten separate lines are provided. If the number of genotypes or environments is more than 10, the user should add more SYMBOL statements to the program.
Formulas for computing the various stability parameters are presented in Tables 1 to 4.
| General features of the SAS program |
|---|
Two types of data can be analyzed by the program: a G x L x Y data set (i.e., a data set containing results of p genotypes tested at q locations over y years, with r replications per location) or a G x L data set (i.e., a data set containing only one year's G x L data over r replications). The variable to be analyzed should be named YIELD. The environments should be entered as ENV, the genotypes as VAR, the years as YEAR, and the replications as REP. The data set to be analyzed can be entered directly just before the program, with its name inserted in place of YOURDATASET at the beginning of the program. A permanent SAS data set saved on disk and referenced by a two-level name (Libname.Filename) can also be used in place of YOURDATASET. In that case, the physical location and the libref of the data should be specified.
From the design of the experiment and its ANOVA, the following statistics should be available:
At the beginning of the program we have put the following macro statements:
The variables to be replaced (i.e., p, q, r, msb, msl, means, and mems) are then inserted in the program as &p, &q, &r, &msb, etc. The user should enter the real values of these variables at the top of the program. For example, if the number of replications is three, then the user should enter %LET r=3;. When not available, these variables should be replaced by their respective letter codes; for example, %LET msb=msb;. The macro automatically replaces all occurrences of these variables with their respective values. For technical reasons, seven occurrences of the number of genotypes and three occurrences of the number of environments could not be automated as suggested above. We replaced these values with geno and env, respectively. We wrote the entire program in upper-case letters. The macro variables and geno and env are written in lower-case. The case-matching option of the "Replace" submenu of the "Edit" menu should be used to automatically replace the seven occurrences of geno and the three occurrences of env. For example, if the number of environments is five, one should enter "env" in the "Find what" space and enter "5" in the "Replace with" space and then press "Replace all." All occurrences of env will be replaced by 5. SAS will display a dialog box stating, "3 occurrences of `env' have been replaced by `5'." If the user forgot to use the "Match case" option, this figure will be much higher than 3. The "Undo" option of the "Edit" menu can be used to revert to the original and the process can be repeated using the "Match case" option. The seven occurrences of geno should be replaced in the same manner.
Most of the univariate parametric and nonparametric stability statistics and the multivariate analyses of clustering and singular value decomposition for ordination can be computed by inserting only the number of genotypes (p) and the number of environments (q). Other variables mentioned above are needed mostly for the tests on the stability parameters.
The majority of the stability parameters are designed for genotype x location data. This might be because, as physical entities with their own relief, edaphic, and climatic characteristics, the locations are the reference points for future variety release. The only stability statistics on G x L x Y data included in this program are the Type 4 stability (the year within locations mean square) and the simultaneous regression on location x genotype, year x genotype, and location x year x genotype effects. These are computed by the first part of the program. The second and main part of the program computes stability statistics of the G x L data. In this second part, the program automatically collapses the G x L x Y data to G x L data by averaging over years. We have included this second part as a separate program on the Internet site mentioned earlier for users who have G x L data. The variables to be inserted into the program by the macro statements (msb, msl, mems, and means) are for use with G x L data in this second part of the program. The ANOVA that produces msb, msl, mems, and means should therefore be that of the G x L data, not that of the G x L x Y data. Taking these statistics from the ANOVA of the G x L x Y data will not give correct results. For computation of stability parameters from data of p genotypes tested at q locations over y years, with r replications per location-year combination, the user should therefore proceed as follows:
1. Find the mean yield over the y years, reducing the data to a G x L data set. This will give data of p genotypes in q locations for r replications. Run ANOVA for this data set and obtain the mean squares of locations (msl), replications (msb), and error (means). The pooled error mean square (mems) is obtained by dividing the error mean square by r. These are the variables to be replaced in the program.
2. Average the G x L x Y x R data over the r replications to obtain a G x L x Y data set. This is the data set to be inserted into the program when data of y years are to be analyzed, replacing YOURDATASET at the beginning of the program.
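The two averaging steps above can be sketched as follows. This Python fragment only illustrates the data preparation and is not part of the SAS program: the helper names `average_over` and `pooled_error_ms` and the sample records are hypothetical, while the field names VAR, ENV, YEAR, REP, and YIELD follow the variable names required by the program. The ANOVA itself is not shown.

```python
from collections import defaultdict

FACTORS = ("VAR", "ENV", "YEAR", "REP")


def average_over(records, factor):
    """Average YIELD over one factor, keeping the remaining factors."""
    keep = [name for name in FACTORS if name != factor]
    totals = defaultdict(lambda: [0.0, 0])      # key -> [sum, count]
    for rec in records:
        cell = totals[tuple(rec[name] for name in keep)]
        cell[0] += rec["YIELD"]
        cell[1] += 1
    averaged = []
    for key, (total, count) in totals.items():
        row = dict(zip(keep, key))
        row["YIELD"] = total / count
        averaged.append(row)
    return averaged


def pooled_error_ms(error_ms, r):
    """Pooled error mean square (mems): error mean square divided by r."""
    return error_ms / r


# Hypothetical records: 1 genotype, 1 location, 2 years, 2 replications.
records = [
    {"VAR": 1, "ENV": 1, "YEAR": 1, "REP": 1, "YIELD": 4.0},
    {"VAR": 1, "ENV": 1, "YEAR": 1, "REP": 2, "YIELD": 6.0},
    {"VAR": 1, "ENV": 1, "YEAR": 2, "REP": 1, "YIELD": 8.0},
    {"VAR": 1, "ENV": 1, "YEAR": 2, "REP": 2, "YIELD": 10.0},
]

# Step 1: average over years; this G x L x R set goes into the ANOVA.
gxl_by_rep = average_over(records, "YEAR")
# Step 2: average over replications; this G x L x Y set replaces YOURDATASET.
gxlxy = average_over(records, "REP")
print(gxl_by_rep)
print(gxlxy)
```

Keeping the two averaged data sets separate makes it harder to take the mean squares from the wrong ANOVA, which the text warns against.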
For experiments of p genotypes conducted at q locations for only one year, msl, msb, and means, which are the mean squares of locations, blocks, and error, are obtained from the results of the direct ANOVA of the data set. The data set, averaged over the r replications, must be inserted into the program in place of YOURDATASET at the beginning of the analysis.
To do an elaborate analysis of environments instead of the analyses of the genotypes, the user should do the following:
While the computation of all other parameters given in this program could be automated, the pairwise genotype x environment interactions of test genotypes with checks proposed by Lin and Binns (1985) could not be automated, since the number of checks cannot be predetermined. Using Lin and Binns' (1985) data, we present a sample program as the third part of this SAS program. Four checks are used in this data set; the user can easily study the last part of the program and see that four distinct variables are computed. We have arranged the checks as the first four entries in the data set so that Check 1's data goes from line 1 to line n1, that of Check 2 goes from line n1 + 1 to line n2, that of Check 3 from line n2 + 1 to line n3, and that of Check 4 from line n3 + 1 to line n4. The rest of the data (that of the test genotypes) goes from line n4 + 1 to the end. Following the same procedure, we hope that users can modify this last part for their own data. If only one check is used, only a single variable such as GXECH is computed. If two checks are used, two variables (GXECH1 and GXECH2) are computed; if three checks are used, three variables are computed, and so on.
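The idea of one interaction value per check can be sketched without fixing the number of checks in advance. The Python fragment below is an illustration, not the third part of the SAS program: it uses the interaction sum of squares of the two-row table formed by a test genotype and a check, which equals half the sum of squared deviations of their yield differences from the mean difference. Whether the program reports this sum of squares or a scaled mean square is not stated in the text, and the function names and data are hypothetical.

```python
def pairwise_ge_ss(test_row, check_row):
    """Interaction SS of the 2 x q table formed by a test genotype and a check."""
    q = len(test_row)
    diffs = [t - c for t, c in zip(test_row, check_row)]
    mean_diff = sum(diffs) / q
    return sum((d - mean_diff) ** 2 for d in diffs) / 2.0


def ge_with_checks(checks, tests):
    """One value per check for every test genotype (GXECH1, GXECH2, ...)."""
    return {
        name: [pairwise_ge_ss(row, check) for check in checks]
        for name, row in tests.items()
    }


# Hypothetical means over three environments: two checks, two test genotypes.
checks = [
    [4.0, 5.0, 7.0],
    [3.0, 5.0, 8.0],
]
tests = {
    "T1": [5.0, 6.0, 8.0],   # parallel to the first check
    "T2": [5.0, 5.0, 6.0],
}
for name, values in ge_with_checks(checks, tests).items():
    print(name, [round(v, 3) for v in values])
```

Because the checks are passed as a list, the same function handles one, two, or more checks, which mirrors the single GXECH variable and the GXECH1, GXECH2 variables described above.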
Although most of the output is self-explanatory and corresponds to what has been elaborated upon here and in the description of the stability parameters on the aforementioned Web site, we have also placed an explanation of every table of the output on the Web site. We also advise users to read the comments in the body of the program. Not all computed variables are included in the output. For example, the regression models of Finlay and Wilkinson (1963) and the model of Eberhart and Russell (1966) give similar slopes but different intercepts. We have computed both slopes, but printed only that from Eberhart and Russell's (1966) model to avoid redundancy. Similar relations exist between the regression models of Perkins and Jinks (1968a and 1968b) and that of Shukla (1972). We have printed out only the slope from Perkins and Jinks (1968a and 1968b). When computing the stability statistics, some intermediate results can also give important information. During the computation of the heterogeneity of variances and the lack of correlation component for individual genotypes, for example, very informative intermediate matrices exist. These contain the heterogeneity and lack of correlation components between every pair of genotypes. The lack of correlation between the genotypes can be used as a distance measure to cluster them. We did not print out such subsidiary information, in order to limit the volume of output and keep the program focused on its objective. In the main body of the program, however, we have put notes on such intermediate data sets or matrices informing the user that this information can be printed.
| ACKNOWLEDGMENTS |
|---|
[...] Håkon Sparre of the Department of Horticulture and Crop Sciences of the Agricultural University of Norway for managing the technical aspects of putting the program and all other files on the Internet.
Received for publication April 30, 1999.