Published online 27 June 2006
Published in Agron J 98:1081-1089 (2006)
DOI: 10.2134/agronj2005.0326
© 2006 American Society of Agronomy
677 S. Segoe Rd., Madison, WI 53711 USA
Forages
Estimation of Preharvest Fiber Content of Mixed Alfalfa-Grass Stands in New York
D. Parsons,
J. H. Cherney* and
H. G. Gauch
Department of Crop and Soil Sci., Cornell Univ., Ithaca, NY 14853
* Corresponding author (jhc5{at}cornell.edu)
Received for publication December 5, 2005.
ABSTRACT
Regression equations can be used to estimate the neutral detergent fiber (NDF) of alfalfa (Medicago sativa L.), assisting producers in decision making at harvest time. In New York State, where most alfalfa is grown in mixed stands with grass, there are no available models to estimate NDF. The objectives of this experiment were to develop equations for estimating total mixed stand NDF, with an emphasis on producer-usable equations based on easily obtainable data. Stands of first-cut alfalfa and grass (0.1–0.9 fraction grass) were sampled at two experimental sites and producers' fields in 19 New York counties during May and June 2004 and 2005. A range of plant measurements and environmental characteristics were recorded and used to develop prediction equations. For selection of two- to five-variable models using 899 data points, R² ranged from 0.89 to 0.94 and root mean square error (RMSE) ranged from 21.2 to 30.1 g kg⁻¹ dry matter (DM). The most important explanatory variables were the fraction of grass and alfalfa height. Growing degree days and day of the year improved goodness of fit but were biased between years. Categorization of the grass fraction into 0.2, 0.4, 0.6, or 0.8 allows estimation without requiring species separations. Categorization decreased R² and increased RMSE, but the categorized fraction is a variable that could be more easily used by producers. Model validation found significant biases with some model estimates; however, biases and prediction errors were small enough to suggest that the results are practically applicable to New York farms.
Abbreviations: DM, dry matter; LC, lack of correlation; MSD, mean squared deviation; NDF, neutral detergent fiber; NU, nonunity slope; PEAQ, predictive equations for alfalfa quality; RMSE, root mean square error; SB, squared bias; SBC, Schwarz Bayesian criterion.
INTRODUCTION
Timing of spring forage harvest is critical to obtain optimal quality for animal production. For forage that serves as the primary fiber source in the diet, NDF is the principal forage quality variable of concern. The target NDF at harvest is approximately 400 g kg⁻¹ DM for pure alfalfa silage and 500 g kg⁻¹ DM for pure grass silage (Cherney et al., 1994). In addition, there is a relatively small range in optimal alfalfa NDF (Cherney and Sulc, 1997), emphasizing the need for quick and accurate methods for estimating NDF. A number of methods have been developed to estimate alfalfa NDF, including models based on weather, chronological age, and plant morphology (Fick et al., 1994). The most widely used of these are the predictive equations for alfalfa quality (PEAQ) (Hintz and Albrecht, 1991). Hintz and Albrecht found that equations using the tallest stem and maturity of the most mature stem in the sample gave acceptable RMSE compared to more complex methods involving mean stage by weight (Fick and Janson, 1990) or mean stage by count (Kalu and Fick, 1981; Allen and Fick, 1990). Although the initial model was validated for Wisconsin, equations have been developed for other regions of the USA, including Ohio (Sulc, 1996) and New York (Cherney, 1995). In addition, the original PEAQ equations have been evaluated in New York, Pennsylvania, Ohio, California, and Wisconsin (Sulc et al., 1997). Results indicated some biases in using the equations outside the state of development; however, the prediction errors were sufficiently low to suggest the PEAQ equations are robust over a wide range of environments.
The estimation of forage quality is more complex in New York, where more than 80% of alfalfa is grown in mixed stands with grass (Cherney et al., 2006). Grasses in New York can very rapidly increase in NDF during the harvest period (Cherney et al., 1993), and producers often harvest stands containing grass before fields of pure alfalfa are harvested. Consequently there is a need for simple field-based methods for estimating the NDF of alfalfa stands with a grass component. Not only is the estimate of the PEAQ model of unknown accuracy for estimating the NDF of the alfalfa portion of the sward but there are no available equations for estimating the NDF of the grass portion. In addition it is not known how alfalfa and grass interact at different proportions in the sward to affect the NDF of each component. For practical purposes, a producer is ultimately interested in the NDF of the total mixture rather than the NDF of the individual components. Thus the objectives of this experiment were (i) to develop equations for estimating total mixed stand NDF using a combination of environmental measurements and sward characteristics and (ii) to develop producer useable equations based on easily obtainable data with a focus on measurable sward characteristics.
MATERIALS AND METHODS
Field Study
Spring growth of alfalfa and grass mixed stands was sampled at two experimental sites and 150 producers' fields in 19 New York counties during May and June 2004 and 2005. The experimental sites were the Cornell University Caldwell Field Research Farm (42.45° N, 76.46° W, 276 m, 0–2% slope) near Ithaca, NY, and Mount Pleasant Research Farm (42.46° N, 76.37° W, 520 m, 0–6% slope) near Dryden, NY. The soil at Caldwell Field is a Niagara silt loam (fine-silty, mixed, active, mesic Aeric Epiaqualfs) and the soil at Mount Pleasant is a Mardin silt loam (coarse-loamy, mixed, mesic Typic Fragiudepts). The experimental design at each site was a randomized complete block design with four blocks. Each block included three different alfalfa-grass species mixtures at two grass seeding rates, and one plot of pure alfalfa, giving a total of 28 plots. Each plot measured 2.7 by 6 m (16.2 m²) with 0.15 m between plots and 0.3 m alleys between blocks. Plots were seeded on 19 May 2003 at Caldwell Field and 23 May 2003 at Mount Pleasant. All plots were seeded at Caldwell Field with Hytest 340PLH alfalfa at 13.44 kg ha⁻¹ and at Mount Pleasant with Hytest 104PLH alfalfa at 13.44 kg ha⁻¹ using a Brillion seeder (Brillion Farm Equipment, Brillion, WI). Grass plots were seeded using a Carter seeder (Carter Mfg., Brookston, IN). Richmond timothy (Phleum pratense L.) was seeded at 3.36 and 6.72 kg ha⁻¹, Okay orchardgrass (Dactylis glomerata L.) was seeded at 4.48 and 8.96 kg ha⁻¹, and Rival reed canarygrass (Phalaris arundinacea L.) was seeded at 6.72 and 13.44 kg ha⁻¹. Seeding rates were calculated using pure live seed. Lime, P, and K were applied according to soil test recommendations. Plots were lightly hand-weeded in April 2004 and April 2005.
Producer fields were identified with alfalfa height of at least 30 cm, and fields and plots were sampled using the same methods. To define a representative portion of the field or plot as the sample area, an area of approximately 1 m² was visually identified in 2004, and in 2005 a hoop of comparable area was used. In total, 234 plot samples and 480 producers' field samples were collected in 2004, and 105 plot samples and 80 producers' field samples were collected in 2005. The data collected, variable abbreviations, and their ranges are summarized in Table 1. Height of the tallest alfalfa stem in the sample area was measured to the terminal bud (MAXHT). The alfalfa maturity categories of Kalu and Fick (1981) (Table 2) were used to assign a numerical value to the most mature stem in the sample area (MAXSTAGE). The major grass species was recorded (GSPECIES). The height of the tallest grass tiller in the sample area was measured by fully extending the leaf (GMAXHT). The average grass canopy height of the sample area was measured with no extension of leaves (GCANOPY). The developmental stage of the most mature grass tiller in the sample area was determined using the staging system of Moore and Moser (1995) (GMAXNDX). Determination of the index number requires knowledge of the total number of leaves or nodes that will appear before reaching the next development stage. Because this system requires prior knowledge of development norms for each member species, a simplified grass staging system (GMAXSTG) was created that is potentially more usable by nonscientists (see Table 3). Time of sampling was recorded and converted to a decimal number (TIME); for example, 2:30 p.m. was converted to 14.5. The fraction of grass in the sample area was visually estimated (GEST). A representative sample of 500 to 750 g of alfalfa and grass was hand clipped from the sample area at a height of 10 cm, an approximation of typical harvest height.
Date of sampling was transformed to day of the year (DOY), the number of days from the beginning of the year. The altitude of the field was recorded (ALTF), as were the geographic coordinates. Coordinates of the fields were overlaid with the coordinates of all New York State weather stations using Manifold (Enterprise Edition 6.50, CDA International Ltd., San Mateo, CA). Voronoi cells were created to determine the nearest weather station for each field, thus enabling the calculation of individualized growing degree days. Accumulated growing degree days were calculated using both base 0°C (GDD0) and base 5°C (GDD5). Accumulation of growing degree days was initiated when the mean temperature exceeded the base for five consecutive days. The altitude of the nearest weather station (ALTWS) was used as a potential explanatory variable.
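The day-of-year and growing degree day calculations described above can be sketched in Python as follows. This is an illustrative sketch only, not the authors' procedure: the function names are hypothetical, daily mean temperatures are assumed to be supplied directly, the five qualifying days are assumed to count toward the total, and days below the base after accumulation has started are assumed to contribute zero. The text does not specify any of these details.

```python
from datetime import date


def day_of_year(d: date) -> int:
    """Day of the year, assuming 1 January is day 1."""
    return d.timetuple().tm_yday


def accumulated_gdd(daily_mean_temps, base, run_length=5):
    """Running growing degree day totals for a series of daily mean temperatures.

    Accumulation starts once the mean temperature has exceeded `base` on
    `run_length` consecutive days. Assumption: the qualifying days are
    credited when the run completes, and later days below the base add 0.
    """
    totals = []
    total = 0.0
    run = 0
    started = False
    for i, temp in enumerate(daily_mean_temps):
        if started:
            total += max(temp - base, 0.0)
        else:
            run = run + 1 if temp > base else 0
            if run == run_length:
                started = True
                first = i - run_length + 1
                total = float(sum(t - base for t in daily_mean_temps[first:i + 1]))
        totals.append(total)
    return totals


# Hypothetical daily means (degrees C), for illustration only
temps = [4.0, 6.0, 7.0, 8.0, 9.0, 10.0, 3.0, 12.0]
gdd5 = accumulated_gdd(temps, base=5.0)
```

Calling the same function with `base=0.0` would give the GDD0 analogue under the same assumptions.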
Table 1. Descriptions and ranges of variables evaluated as potential predictors of neutral detergent fiber (NDF) content in swards of alfalfa and grass.
Table 3. Grass maturity categories used to assign a numerical value to the most mature tiller in the sampling area.
Table 2. Alfalfa maturity categories used to assign a numerical value to the most mature stem in the sampling area.
Samples were separated and oven-dried at 60°C until a constant dry weight was reached. Subsamples of dried alfalfa and grass were weighed and the actual fraction of grass in the sample (GFRAC) was calculated. Samples with GFRAC <0.1 or >0.9 were not used for further analysis. It is difficult for producers to accurately estimate the fraction of grass in a mixed stand in the field. Thus, samples were allocated grouping values (GGRP) of 0.2, 0.4, 0.6, or 0.8, in accordance with the nearest GFRAC value. GGRP values were calculated from the actual fraction of grass rather than from field estimates. Samples were ground to pass through a 1-mm screen. Grass and alfalfa samples (0.25 g) were analyzed separately for NDF content using the procedure described by Van Soest et al. (1991), using the ANKOM (Macedon, NY) fiber analyzer with filter bags. Values for the sward NDF were calculated by averaging the grass and alfalfa NDF, weighted by the fraction of grass in the sward.
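The grouping of the grass fraction and the weighted sward NDF can be illustrated with a short Python sketch. The function names are hypothetical, and the handling of fractions that fall exactly midway between two groups (for example 0.3) is an assumption, since the text does not say how such ties were resolved.

```python
GRASS_GROUPS = (0.2, 0.4, 0.6, 0.8)


def grass_group(gfrac):
    """Return the GGRP category nearest to the actual grass fraction.

    Samples outside 0.1-0.9 were excluded in the study, so they are
    rejected here. Exact midpoints are resolved arbitrarily (assumption).
    """
    if not 0.1 <= gfrac <= 0.9:
        raise ValueError("grass fraction outside the 0.1-0.9 range used in the study")
    return min(GRASS_GROUPS, key=lambda group: abs(group - gfrac))


def sward_ndf(gfrac, ndf_grass, ndf_alfalfa):
    """Sward NDF (g/kg DM) as the average of component NDF weighted by grass fraction."""
    return gfrac * ndf_grass + (1.0 - gfrac) * ndf_alfalfa


# Hypothetical sample: 27% grass, grass NDF 520 and alfalfa NDF 390 g/kg DM
group = grass_group(0.27)
mixture_ndf = sward_ndf(0.27, 520.0, 390.0)
```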
Data Analysis
The dataset was randomly partitioned into two replicates, hereafter referred to as split 1 and split 2. The purpose of this partitioning was to determine if data from numerous sites and two sampling years can be pooled. The dataset was also split by year, hereafter referred to as Y2004 and Y2005. The purpose of this split was to analyze bias between years and determine whether data from different years can be combined.
The PROC RSQUARE variable selection procedure (SAS for Windows Release 9.1, SAS Institute, Cary, NC) was used on the combined dataset to identify models that maximized the coefficient of determination (r² or R²) and minimized RMSE and the Schwarz Bayesian criterion (SBC). The RMSE has the same units as the variable predicted and in model construction is the calibration error of the model. In model validation RMSE is the prediction error of the model. The SBC is a statistic that has components relating to both fit and the number of parameters in the model. The SBC is a better indicator of predictive accuracy than R² and RMSE. Four potential prediction equations were chosen for the combined dataset and PROC GLM was used to fit equations containing the same explanatory variables for split 1, split 2, Y2004, and Y2005.
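As a rough illustration of how SBC balances fit against model size, the sketch below assumes the form n ln(SSE/n) + p ln(n), which is the form commonly documented for SAS regression procedures. The text does not state the formula used, so this is an assumption, and the function name and input values are hypothetical.

```python
import math


def sbc(sse, n, p):
    """Schwarz Bayesian criterion, assuming SBC = n*ln(SSE/n) + p*ln(n).

    sse: error sum of squares of the fitted model
    n:   number of observations used in the fit
    p:   number of estimated parameters, including the intercept
    """
    return n * math.log(sse / n) + p * math.log(n)


# Hypothetical comparison: an extra variable lowers SBC only if it reduces
# SSE enough to offset the ln(n) penalty for the added parameter.
smaller_model = sbc(sse=100.0, n=10, p=2)
larger_model_same_fit = sbc(sse=100.0, n=10, p=3)
```

Lower values indicate the preferred model under this criterion.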
Model evaluation was performed by fitting the Y2005 dataset to equations derived from the Y2004 data and vice versa. Similarly, split 1 was fit to the equations derived from split 2 and vice versa. A number of parameters were used in evaluating the equations, as no single statistical test can adequately describe the goodness of fit of the model. All regression equations were evaluated with r² and RMSE, and tested for intercepts at the origin (a = 0) and unitary coefficients (b = 1). Kobayashi and Salam (2000) presented reasons why these parameters are not entirely satisfactory for model evaluation and promoted the use of mean squared deviation (MSD) and its components as more informative parameters. The MSD reflects discrepancy between a model and the data and is a direct measure of predictive success. Gauch et al. (2003) proposed the partitioning of MSD into the components of squared bias (SB), nonunity slope (NU), and lack of correlation (LC) to provide further insight into model performance. The three components have distinct meanings and simple geometric interpretation, with SB relating to translation, NU relating to rotation, and LC relating to scatter.
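The partitioning of MSD into SB, NU, and LC can be sketched in Python. The formulas in the comments follow the commonly cited definitions attributed to Gauch et al. (2003), with the slope and correlation taken from the regression of observed on model-based values. The text does not reproduce these formulas, so they are stated here as an assumption, and the function name and data are hypothetical.

```python
def msd_components(predicted, observed):
    """Partition mean squared deviation into SB, NU, and LC.

    Assumed definitions (X = model-based values, Y = observed values):
      MSD = mean((X - Y)^2)
      SB  = (mean(X) - mean(Y))^2            translation
      NU  = (1 - b)^2 * var(X)               rotation, b = slope of Y on X
      LC  = (1 - r^2) * var(Y)               scatter
    Variances use divisor n, so that SB + NU + LC equals MSD.
    """
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    var_x = sum((x - mean_x) ** 2 for x in predicted) / n
    var_y = sum((y - mean_y) ** 2 for y in observed) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed)) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("both series must vary")
    slope = cov_xy / var_x
    r_squared = cov_xy ** 2 / (var_x * var_y)
    return {
        "MSD": sum((x - y) ** 2 for x, y in zip(predicted, observed)) / n,
        "SB": (mean_x - mean_y) ** 2,
        "NU": (1.0 - slope) ** 2 * var_x,
        "LC": (1.0 - r_squared) * var_y,
    }


# Hypothetical NDF values (g/kg DM), for illustration only
parts = msd_components([410.0, 450.0, 500.0, 520.0], [400.0, 460.0, 490.0, 540.0])
```

Because the three components sum to MSD, a model with a large SB can be distinguished from one whose error is mostly scatter even when their MSD totals are similar.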
RESULTS AND DISCUSSION
To determine the similarity of the experimental plot data to the producer field data, variable selection methods were used to develop promising equations based only on the producer field data. Two datasets were fitted to these equations (i) the producer field dataset and (ii) the total dataset including plots and producer fields. The results of the MSD values from these regressions indicate that the MSD only slightly increased when the entire dataset was used. For example, the MSD of a three-variable model based on GFRAC, MAXHT, and GDD5 increased from 478 to 491. Because the regression of plot data on equations for the producer field data does not dramatically increase the MSD, we can conclude that the plot data can be used in conjunction with the producer field data without significant error. Therefore, all subsequent analyses are based on the aggregated plot and producer field datasets.
Table 4 shows promising equations for mixed alfalfa and grass estimation selected from the variable selection procedure. Using the entire dataset, the best two-variable model (Eq. [1]), consisting of MAXHT and GFRAC, has an R² of 0.89 and an RMSE of 30.1 g kg⁻¹ DM. The three-variable model with GDD5 (Eq. [2]) has an R² of 0.94 and an RMSE of 22.4 g kg⁻¹ DM, suggesting a better goodness of fit than the two-variable model. The three-variable model with DOY (Eq. [3]) has an R² of 0.92 and an RMSE of 25.0 g kg⁻¹ DM, suggesting a better goodness of fit than the two-variable model, but a poorer fit than for the model containing GDD5. The model with DOY was included in the equations because DOY is a simple calculation compared to GDD5, which requires access to meteorological data. To assess the possibility of using a large combination of explanatory variables, the best five-variable model was also selected (Eq. [4]). This resulted in an R² of 0.94, equivalent to the best three-variable model, and an RMSE of 21.2 g kg⁻¹ DM, lower than the best three-variable model. In comparison with these models, the two-variable PEAQ model for NDF (Hintz and Albrecht, 1991) had an R² of 0.89 and an RMSE of 26.2 g kg⁻¹ DM. Thus, these results for mixed alfalfa-grass are promising in terms of goodness of fit, with acceptable values for both R² and RMSE.
Table 4. Selected regression models to estimate mixed alfalfa and grass neutral detergent fiber (NDF), based on the entire dataset and datasets split by year (2004, 2005) and split randomly (split 1, split 2).
The SBC value for the two-variable model (Table 4) is 5682. There is a drop in SBC to 5374 with the addition of DOY to the model, indicating improved fit. There is a further drop to 5187 for the GDD5 three-variable model. The SBC for the five-variable model is the lowest (5115), suggesting that predictive accuracy can be improved for mixed alfalfa-grass with the use of models with a high number of explanatory variables.
The variables used to construct the models for the entire dataset were used to develop equations for the datasets split 1, split 2, Y2004, and Y2005 (see Table 4). For these datasets the R² (ranging from 0.88–0.96) and RMSE (ranging from 17.2–29.9 g kg⁻¹ DM) values are of similar magnitude to the entire dataset. The SBC values cannot be compared between datasets due to the different number of data points; however, the trend is similar for each dataset. For the Y2004, split 1, and split 2 datasets the SBC from highest to lowest is in the order: two variables, three variables with DOY, three variables with GDD5, and five variables. The exception is the Y2005 dataset which has a lower SBC for the two-variable model with GDD5 than with DOY.
Table 5 shows the results of fitting the observed NDF values from the split 1, split 2, Y2004, and Y2005 datasets to the equations developed from the corresponding dataset. For the Y2004 and Y2005 datasets the range of values for R² (0.88–0.95) and RMSE (19.5–29.7 g kg⁻¹ DM) are acceptable. For Eq. [6] to [8] (Y2005 dataset) all b values exceed unity and all intercepts are negative. Thus NDF content is overestimated, particularly at low values. For Eq. [5] the b value does not differ from unity and the a value is not significantly different from 0. For Eq. [9] to [12] (Y2004 dataset) the b values are all significantly <1 and the a values are all significantly >0. For this dataset and set of equations, NDF is underestimated, particularly at lower values of NDF.
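The tests of b = 1 and a = 0 reported here can be illustrated with a simple least-squares sketch. It regresses equation estimates on observed values, as the Table 5 caption describes, and returns t statistics for the two hypotheses. The function name and data are hypothetical. P values would additionally require a t distribution with n − 2 degrees of freedom and are not computed here.

```python
import math


def slope_intercept_tests(observed, estimated):
    """Regress estimated on observed values and test b = 1 and a = 0.

    Returns the slope b, intercept a, their standard errors, and the
    t statistics (b - 1) / SE(b) and a / SE(a), each with n - 2 df.
    """
    n = len(observed)
    if n != len(estimated) or n < 3:
        raise ValueError("need two equal-length series with at least three points")
    mean_x = sum(observed) / n
    mean_y = sum(estimated) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, estimated))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(observed, estimated))
    resid_var = sse / (n - 2)
    se_b = math.sqrt(resid_var / sxx)
    se_a = math.sqrt(resid_var * (1.0 / n + mean_x ** 2 / sxx))
    return {"b": b, "a": a, "se_b": se_b, "se_a": se_a,
            "t_b": (b - 1.0) / se_b, "t_a": a / se_a}


# Hypothetical values, for illustration only
fit = slope_intercept_tests([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.8, 5.0])
```

With several hundred points and a low RMSE, the standard errors become small, so even slopes very close to 1 can yield large t statistics.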
Table 5. Coefficient of determination (r²), root mean square error (RMSE), slope (b), and y-intercept (a) derived from the regressions of equation estimates on observed forage quality values from the corresponding dataset pair. Dataset pairs were (i) Y2004 and Y2005 and (ii) split 1 and split 2.
These results for Eq. [5] to [12] (Table 5) make it difficult to assess the comparative validity of the models. For example, the model with the lowest R² (Eq. [5]) is also the only one that does not have a significantly biased slope or intercept. Alternatively, Eq. [7], which has the highest R², also has an a value of −55.18, which is significantly <0. Thus it is difficult to compare the strengths and limitations of the models solely using these methods. Partitioning MSD provides a way to better understand predictive success and provides a basis for addressing the dilemma of which model is best. Figure 1 shows SB, NU, and LC components of MSD for the models. The results for the Y2005 dataset show that the order of lowest to highest MSD is the five-variable model (488, Eq. [8]), the three-variable model with GDD5 (587, Eq. [6]), the two-variable model (878, Eq. [5]), and the three-variable model with DOY (1184, Eq. [7]). Equation [5] has very low values for SB (0) and NU (15) but a high value for LC (863), signifying that the main contributor to MSD is scatter. Equation [7] has a very high value for SB (785), a value of 22 for NU, and a value of 377 for LC, signifying that the largest contributor to MSD is translation error. A prominent feature of Eq. [5] to [8] in Fig. 1 is that the models differ greatly regarding which component contributes the most to MSD. For the Y2005 dataset it is evident that Eq. [5] has an MSD almost entirely composed of LC. Equations [6] to [8] all have LC values similar to each other; however, the SB component of Eq. [7] dramatically increases MSD. The MSD partitioning pattern is slightly different for the Y2004 dataset. Once again the two-variable model (Eq. [9]) has an MSD almost entirely composed of LC. Equations [10] to [12] all have similar values for LC; however, the SB component of the three-variable model with DOY (Eq. [11]) dramatically increases MSD. For Eq. [10] and [12], SB and NU also contribute a greater proportion of the MSD than the corresponding equations in the Y2005 dataset. As a result Eq. [12] has only a marginally lower MSD than Eq. [9], and Eq. [10] has a higher MSD than Eq. [9], even though the LC component is much smaller. We can conclude from Fig. 1 that the 714 points used to construct the Y2004 model were more successful in estimating the NDF of the Y2005 dataset than the 185 points used to construct the Y2005 model in estimating the NDF of the Y2004 dataset.
These results suggest the potential danger of building a model from data of 1 yr and testing it on data from a different year, particularly when the explanatory variables are seasonally dependent, such as growing degree days. Inclusion of DOY (Eq. [7] and [11]) also increased model bias. This is logical, because although DOY broadly captures the trend of increasing NDF with time, a model built on data from an average year will fail to predict the variation in response to environmental factors such as higher than average temperatures. A further reason for understanding the components of MSD is that the implications of these errors can be very different. For example, if there is a legitimate explanation for an observed translation error (SB) it may be possible to adjust the model to compensate for this. On the other hand, errors due to scatter (LC) may be more difficult to ameliorate.

Fig. 1. Components of mean squared deviation (MSD) for mixed alfalfa-grass multiple regression models. The three components are squared bias (SB), non-unity slope (NU), and lack of correlation (LC). Equation numbers correspond to models listed in Table 4.
The results for the split 1 and split 2 datasets (Table 5) are similar to each other. Coefficients of determination (R²) range from 0.88 to 0.94 for split 1 and 0.90 to 0.95 for split 2. Values of RMSE range from 21.7 to 29.9 g kg⁻¹ DM for split 1 and 20.1 to 29.5 g kg⁻¹ DM for split 2. Goodness of fit in terms of both R² and RMSE is best for the five-variable models (Eq. [16] and [20]) and worst for the two-variable models (Eq. [13] and [17]). The three-variable models with GDD5 (Eq. [14] and [18]) have better R² and RMSE than those with DOY (Eq. [15] and [19]). All of the models have slopes close to 1.0, ranging from 0.965 to 1.031; however, b values are significantly different from 1 for Eq. [14] and [18] (P < 0.05) and Eq. [16] and [20] (P < 0.01). In addition, a values are significantly different from 0 for Eq. [18] (P < 0.05) and Eq. [14], [16], and [20] (P < 0.01). Equations [13], [15], [17], and [19] have b values not significantly different from 1 and a values not significantly different from 0. It should be taken into consideration that with a large dataset with a low RMSE there is a greater chance that the b value will be significantly different from 1 and the a value will be significantly different from 0. In this example, where n = 435 for split 1 and n = 464 for split 2, it is unsurprising that some of the equations have statistically significant a and b values. Therefore caution must be used in deciding which models are appropriate for use, and models should not automatically be discarded because of a significant bias. The magnitude of the differences of the b value from 1 and the a value from 0 should also be considered. Once again MSD is a useful statistic for such a dilemma. The MSD partitioning for Eq. [13] to [20] (see Fig. 1) shows that for each model the SB and NU components are very small compared to LC, which is by far the largest component of MSD. This suggests that although Eq. [14], [16], [18], and [20] are statistically biased, the magnitudes of these biases are not practically consequential.
Practical Equations for Producers
A concern with the models (Eq. [1]–[20]) proposed thus far is their dependence on the actual fraction of grass in the sward (GFRAC). In practice this is a difficult parameter to measure, particularly for producers. Although clipping and separating a sample, followed by drying and weighing grass and alfalfa components, is a possibility, a model relying on such methods would not likely be widely used. The GGRP parameter was devised on the assumption that producers can reasonably estimate whether the grass fraction is closest to 0.2, 0.4, 0.6, or 0.8. Table 6 shows three models chosen for further analysis by variable selection using the entire dataset when GGRP is included in the group of explanatory variables and GFRAC is excluded. Models include the best two-variable model (Eq. [21]), the best three-variable model (Eq. [22]), and the best three-variable model not including growing degree days or DOY (Eq. [23]). Coefficients of determination (R²) range from 0.85 to 0.91, RMSE values range from 26.7 to 33.7 g kg⁻¹ DM, and SBC ranges from 5507 to 5901.
Table 6. Selected regression models to estimate mixed alfalfa and grass neutral detergent fiber (NDF) using categories for the fraction of grass in the sward, based on the entire dataset and datasets split by year (2004, 2005) and randomly (split 1, split 2).
The variables used to construct the models for the entire dataset were used to develop equations for the datasets split 1, split 2, Y2004, and Y2005, numbered as Eq. [24] to [35] in Table 6. The magnitudes of R² and RMSE are similar for all models, ranging from 0.85 to 0.92 for R² and 24.0 to 34.1 g kg⁻¹ DM for RMSE. In addition, for all datasets the ranking of the models from best to worst based on R², RMSE, and SBC is the three-variable model with GDD5, the three-variable model with GMAXSTG, and the two-variable model.
Table 7 shows the results of fitting the observed NDF values from the split 1, split 2, Y2004, and Y2005 datasets to the equations developed from the corresponding dataset. For the Y2004 and Y2005 datasets the range of values for R² (0.85–0.90) and RMSE (27.1–34.1 g kg⁻¹ DM) are acceptable. However, for the Y2005 dataset, Eq. [25] and [26] have b values significantly (P < 0.001) >1 and a values significantly (P < 0.001) <0. Thus NDF is overestimated, particularly at lower values of NDF. For Eq. [24] the b value is not significantly different from 1 and the a value is not significantly different from 0. For the Y2004 dataset, Eq. [27] to [29] have b values significantly <1 and a values significantly >0. For these equations the NDF is underestimated, particularly at lower values of NDF.
Table 7. Coefficient of determination (r²), root mean square error (RMSE), slope (b), and y-intercept (a) derived from the regressions of equation estimates on observed forage quality values from the corresponding dataset pair. Dataset pairs were (i) Y2004 and Y2005 and (ii) split 1 and split 2.
The components of MSD (Fig. 2) again provide valuable insight into the appropriateness of the models. The two-variable models have MSD values of 1089 (Eq. [24]) and 1194 (Eq. [27]) and are almost entirely composed of LC. All three-variable models (Eq. [25], [26], [28], and [29]) have lower LC components, ranging from 727 to 1029; however, these models also have higher SB (ranging from 61–329) and NU (ranging from 57–201) components. It is evident from comparing the magnitude of MSD totals in Fig. 2 that the models based on the Y2004 dataset were more successful in estimating the NDF of the Y2005 dataset than the reverse.

Fig. 2. Components of mean squared deviation (MSD) for mixed alfalfa-grass multiple regression models using categories for the fraction of grass in the sward. The three components are squared bias (SB), non-unity slope (NU), and lack of correlation (LC). Equation numbers correspond to models listed in Table 6.
The results for the split 1 and split 2 datasets (Table 7) are similar to each other. Coefficients of determination (R²) range from 0.85 to 0.91, and RMSE values range from 26.7 to 34.2 g kg⁻¹ DM. All of the models have slopes close to 1.0, ranging from 0.958 to 1.040; however, Eq. [31] and [34] have b values significantly different (P < 0.01) from 1 and a values significantly different (P < 0.01) from 0. Equations [30], [32], [33], and [35] have b values not significantly different from 1 and a values not significantly different from 0. The MSD partitioning for Eq. [30] to [35] in Fig. 2 shows that for each model the SB and NU components are very small compared to LC, which is by far the largest component of MSD. This suggests that although Eq. [31] and [34] are statistically biased, the magnitudes of these biases are not practically significant.
Comparing the results from Tables 5 and 7, and Fig. 1 and 2, we can assess the impact of replacing GFRAC with the more practical variable GGRP. Using the split 1 two-variable equations as an example (Eq. [17] and [33]), R² decreases from 0.88 to 0.85, RMSE increases from 29.9 to 33.4 g kg⁻¹ DM, and MSD increases from 904 to 1121, primarily due to increased scatter (LC). The trend of decreasing R², increasing RMSE, and increasing MSD due to scatter is similar for other pairs of equations. These results highlight the trade-off in model selection between goodness of fit and practical application. Although GGRP is less precise, it is a more realistic variable for inclusion in a rapid field-based model.
To explore how many variables could potentially be added to a model with GGRP, a nested variable selection procedure was performed whereby for each level the variables were added to the model in the same order with one additional variable. Variables were selected in the order GGRP, MAXHT, GDD5, GMAXNDX, ALTWS, DOY, MAXSTAGE, GMAXHT, GDD0, GMAXSTG, ALTD, and GCANOPY. Results in Table 8 show that the SBC initially drops abruptly from 6788 with one variable to 5507 with three variables. Changes in SBC from three to six variables, where the SBC reaches its lowest value of 5425, are smaller. With seven variables or more the SBC again begins to rise slightly, reaching 5454 with 12 variables. The coefficient of determination (R²) reaches a maximum of 0.921 and the RMSE reaches a minimum of 25.29 g kg⁻¹ DM with eight variables. With addition of further variables R² does not change and RMSE increases. This outcome exemplifies a widely observed response called Ockham's hill, wherein models with too few parameters underfit real signal whereas models with too many parameters overfit spurious noise, so a relatively parsimonious model is most predictively accurate (Gauch, 2002, p. 269–326). The use of R² and RMSE often recommends relatively complex models, in this case eight variables. Measures of predictive accuracy, such as SBC, recommend simpler models, in this case six variables. In this example the far side of Ockham's hill is relatively flat and hence it is unlikely that using too many explanatory variables would dramatically affect model predictive accuracy.
Table 8. Nested variable selection for estimation of mixed alfalfa and grass neutral detergent fiber (NDF) using categories for the fraction of grass in the sward (n = 493).
Conclusions and Practical Implications
A combination of variables can be used to predict the NDF content of mixed stands of alfalfa and grass in experimental plots and producer fields. We found significant bias with some of the model estimates; however, this is unsurprising given the large number of samples used. The prediction and calibration errors (RMSE) are small enough to suggest that the results are practically applicable to New York farms. The equations we propose should be further validated in New York and other states to build confidence in their predictive ability.
The fraction of grass in the stand was found to be a critical variable for construction of models, yet measuring it directly requires separating species, which is an obstacle for producers. This obstacle can potentially be overcome by substituting a grouped estimate of the grass fraction for the actual grass fraction. Although RMSE values are higher when an estimate of the grass fraction is used, the magnitudes of the errors were still at an acceptable and useful level.
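As a rough illustration of a grouped estimate, the sketch below maps a grass fraction onto the four categories named in the abstract (0.2, 0.4, 0.6, 0.8). The function name `grass_group` and the nearest-category rule are assumptions made for illustration; the boundaries the authors used may differ.

```python
# Categories taken from the abstract; the assignment rule is assumed.
GRASS_CATEGORIES = (0.2, 0.4, 0.6, 0.8)


def grass_group(fraction):
    """Return the category closest to a grass fraction between 0 and 1."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("grass fraction must lie between 0 and 1")
    # min() keeps the first (lower) category when two are equally close.
    return min(GRASS_CATEGORIES, key=lambda c: abs(c - fraction))


print(grass_group(0.45))
```

In practice a producer would choose the category by eye rather than compute it from a measured fraction; the function only shows how a measured value would translate into the grouped variable.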
The results draw attention to the importance of having an effective method for estimating the grass fraction of the sward. This is even more important when we consider that the target NDF is higher for grass than for alfalfa due to differences in fiber digestibility. Thus, setting the target NDF for mixed stands also depends on an acceptable estimation of the grass fraction, a process that could be enhanced by technologies that improve the ability of producers to estimate its magnitude. One such method may be the development of benchmark photographs of mixtures of alfalfa with different fractions of typically grown grass species. These photographs could be made available on a website and also assembled into a booklet for field use. An example of a comparable extension technology is the Pasture Pic booklet (Singh, 1996) used in Australia to estimate botanical composition and pasture dry matter.
Another practical technology that could facilitate the use of these equations is a website that enables the input of data and calculates an estimate of the NDF of the stand. Such a tool could combine both field measurements and other data such as growing degree days. We demonstrated that a wide range of variables can be used to estimate mixed stand NDF and that the use of a relatively large number of variables can improve model accuracy. Software could also circumvent the need for lengthy calculations of multiple regression equations that could otherwise discourage use of the equations.
The equations we propose address the need to estimate preharvest quality of mixed stands of alfalfa and grass to aid harvest management and storage decisions. Like the PEAQ equations, these equations are not a replacement for accurate analysis of harvested alfalfa and grass. In addition, for the field-based variables we would suggest that producers take at least five samples to adequately represent the field. With these provisos, these equations offer a rapid method to estimate forage quality and could greatly help producers of mixed stands in timing harvest operations and optimizing the quality of harvested forage.
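A calculator of the kind suggested above, combined with the five-sample recommendation, might look like the following minimal sketch. The coefficients in `PLACEHOLDER_MODEL` are invented round numbers for demonstration only and are not the published equations; the names `estimate_ndf` and `field_estimate` are hypothetical, and reading GGRP as the grass-fraction category and MAXHT as maximum alfalfa height is an assumption.

```python
from statistics import mean

# PLACEHOLDER coefficients, not values from the paper.
PLACEHOLDER_MODEL = {"intercept": 100.0, "GGRP": 200.0, "MAXHT": 3.0}


def estimate_ndf(model, sample):
    """Evaluate a linear equation for one sample (result in g/kg DM)."""
    return model["intercept"] + sum(
        coef * sample[name]
        for name, coef in model.items()
        if name != "intercept"
    )


def field_estimate(model, samples, min_samples=5):
    """Average the per-sample estimates, requiring at least five samples."""
    if len(samples) < min_samples:
        raise ValueError(f"need at least {min_samples} samples per field")
    return mean(estimate_ndf(model, s) for s in samples)


samples = [{"GGRP": 0.4, "MAXHT": h} for h in (60, 62, 64, 66, 68)]
print(round(field_estimate(PLACEHOLDER_MODEL, samples), 1))
```

Keeping the coefficients in a dictionary means a published equation could be dropped in without changing the functions, and the sample-count check enforces the field-sampling proviso before any estimate is returned.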
 |
ACKNOWLEDGMENTS
|
|---|
The authors thank Sam Beer, Kai Ming Zhao, Molly Lebowitz, Jen Beckman, Peter Barney, Aaron Gabriel, Jeff Miller, Bruce Tillapaugh, Rick Faucett, Michael Hunter, Michael Davis, Aysin Bilgili, and Leon Hatch for assistance with harvesting and analysis. This research was supported by a Kieckhefer Adirondack Fellowship.
 |
REFERENCES
|
|---|
- Allen, S.J., and G.W. Fick. 1990. On-farm testing of mean stage by count as a predictor of alfalfa forage quality. p. 185. In Agronomy abstracts. ASA, Madison, WI.
- Cherney, D.J.R., J.H. Cherney, and R.F. Lucey. 1993. In vitro digestion kinetics and quality of perennial grasses as influenced by forage maturity. J. Dairy Sci. 76:790–797.
- Cherney, J.H. 1995. Spring alfalfa harvest in relation to growing degree days. p. 29–36. In Proc. Natl. Alfalfa Symp., 25th, Syracuse, NY. 27–28 Feb. 1995. Certified Alfalfa Seed Council, Woodland, CA.
- Cherney, J.H., D.J.R. Cherney, D.G. Fox, L.E. Chase, and P.J. Van Soest. 1994. Evaluating forages for dairy cattle. Proc. Am. Forage Grassl. Council 3:207.
- Cherney, J.H., D.J.R. Cherney, and D. Parsons. 2006. Grass silage management issues. p. 37–49. In Proc. from Silage for Dairy Farms: Growing, Harvesting, Storing, and Feeding, NRAES-181, Harrisburg, PA. 23–25 Jan. 2006. Natural Resource, Agric., and Engineering Serv., Ithaca, NY.
- Cherney, J.H., and R.M. Sulc. 1997. Predicting first cutting alfalfa quality. p. 53–65. In Silage: Field to feedbunk. Proc. from the Silage: Field to Feedbunk North American Conf., NRAES-99, Hershey, PA. 11–13 Feb. 1997. Northeast Regional Agric. Eng. Serv., Ithaca, NY.
- Fick, G.W., and C.G. Janson. 1990. Testing mean stage as a predictor of alfalfa forage quality with growth chamber trials. Crop Sci. 30:678–682.
- Fick, G.W., P.W. Wilkens, and J.H. Cherney. 1994. Modeling forage quality changes in the growing crop. p. 757–795. In G.C. Fahey, Jr. et al. (ed.) Forage quality, evaluation, and utilization. ASA, CSSA, and SSSA, Madison, WI.
- Gauch, H.G. 2002. Scientific method in practice. Cambridge Univ. Press, Cambridge.
- Gauch, H.G., J.T.G. Hwang, and G.W. Fick. 2003. Model evaluation by comparison of model-based predictions and measured values. Agron. J. 95:1442–1446.
- Hintz, R.W., and K.A. Albrecht. 1991. Prediction of alfalfa chemical composition from maturity and plant morphology. Crop Sci. 31:1561–1565.
- Kalu, B.A., and G.W. Fick. 1981. Quantifying morphological development of alfalfa for studies of herbage quality. Crop Sci. 21:267–271.
- Kobayashi, K., and M.U. Salam. 2000. Comparing simulated and measured values using mean squared deviation and its components. Agron. J. 92:345–352.
- Moore, K.J., and L.E. Moser. 1995. Quantifying developmental morphology of perennial grasses. Crop Sci. 35:37–43.
- Singh, A. 1996. Pasture pic: Easy estimation of dry matter levels. Kondinin Group, Perth, Australia.
- Sulc, R.M. 1996. Equations for predicting quality of alfalfa. p. 115–124. In 1996 Proc. Tri-State Dairy Nutrition Conf., Fort Wayne, IN. 14–15 May 1996. Ohio State Univ., Columbus.
- Sulc, R.M., K.A. Albrecht, J.H. Cherney, M.H. Hall, S.C. Mueller, and S.B. Orloff. 1997. Field testing a rapid method for estimating alfalfa quality. Agron. J. 89:952–957.
- Van Soest, P.J., J.B. Robertson, and B.A. Lewis. 1991. Methods for dietary fiber, neutral detergent fiber, and nonstarch polysaccharides in relation to animal nutrition. J. Dairy Sci. 74:3583–3597.