Published in Agron. J. 96:1787-1790 (2004).
© American Society of Agronomy
677 S. Segoe Rd., Madison, WI 53711 USA
Notes and Unique Phenomena
ESTIMATION OF DURATION INDICES FOR REPEATED TENSIOMETER READINGS
David W. Meek* and
Jeremy W. Singer
USDA-ARS, National Soil Tilth Lab., 2150 Pammel Dr., Ames, IA 50011
* Corresponding author (meek{at}nstl.gov)
Received for publication April 21, 2004.
 |
ABSTRACT
|
|---|
Regression analysis of repeated-measures data from plots in large experiments is both labor intensive and time consuming. When duration estimates characterizing each individual repeated-measures series are of interest, numerical integrations are possible alternatives. This note evaluates a trapezoidal rule for estimating a wetness duration index that is derived from repeated tensiometer readings of matric potential associated with plots in a recent soybean [Glycine max (L.) Merr.] study. Tensiometer repeated-measures series were randomly selected from four of the treatments with their replications for a total of 15 plots (of 420 total) associated with a 2-yr experiment. Regression-based duration estimates were developed and compared with those derived using an unequally spaced trapezoidal rule. Equivalence between the two estimators was assessed based on multiple statistical analyses. The estimators were virtually identical by all criteria. The numerical method is faster and easier to employ. Furthermore, the concept can be applied to other repeated measurements, and the results may be useful as covariates in other analyses.
 |
INTRODUCTION
|
|---|
THE METHODOLOGY for and analysis of data from repeated measures are important considerations in many agronomic and other scientific studies (e.g., Meek et al., 1991; Wolfinger et al., 1993; Logsdon et al., 2002). A repeated option in mixed model procedures is now available in most well-known statistical software systems (e.g., Littell et al., 1996). Moreover, new or adaptive statistical methods for handling data from repeated measures are now available (Davidian and Giltinan, 1995, 2003). Modeling the time trend (response) of measurements associated with each individual experimental unit (like a plant, a plot, a specific sensor, etc.) allows the researcher to compare and contrast direct or derived characteristics of the time response, often with traditional analysis of variance. The basic methodology is as follows:- Fit the same curve form to each individual plot, for example, a logistic curve given by
where y is the dependent variable, t is time, and a, b, and t50 are the regression parameters. The parameter a is an asymptote or the upper limit for the measured data, the parameter b is a specific type of rate estimator, and the parameter t50 is the time value for which y/a = 1/2.
(2) Perform an ANOVA with the appropriate hypothesis tests for the given experimental design on the parameter or parameters of interest (e.g., Johnson and Milliken, 1983). An estimator derived from the curve fit can also be calculated. For example, were the curve a crop growth model, some possible estimators of interest are relative growth rates at specific times, dry matter duration for the whole season, or dry matter duration for a specific portion of the season.
Singer and Meek (2004) analyzed repeated measurements of light interception and matric potential from a 2 yr experiment on soybean. Some methods from Davidian and Giltinan (1995) were initially used for this purpose, but difficulties were encountered. The design of the experiment resulted in instrumenting 120 plots per year except for the tensiometer measurements in the first year, which only had 90. The tensiometers, however, were placed at two depths for each plot. So 420 time response curves needed to be developed and selected. An automated procedure was developed to model each time response for each plot's repeated measurements for a given year and candidate model. Composite behavior of all the observed responses guided the choice of suitable models. The output file included multiple performance indices and diagnostics (e.g., Meek et al., 1991; Logsdon et al., 2002). After examining the first year's modeling results, some critical problems were evident; no one model worked adequately on all of the plots. Exploratory data analysis and graphs of the time series for each variable revealed that individual time responses could be very different from the composite response. For example, in comparing the tensiometer series, all treatment combinations started out wet, and most completely dried out by the end the sampling period; some, however, retained water throughout the sampling period. In general, the same model fit these latter ones very poorly. Hence, we wanted to find an easier and better-fitted method.
Questions of seasonal differences among the treatments were of interest for two different kinds of repeated measures: light interception over the growing season and the previously mentioned tensiometer record. One way to determine a duration estimate for every individual set of repeated measurements is what was needed. This purpose of this paper is to propose and evaluate a possible simple method as an alternative to the procedure outlined in (1) above. For brevity, only data for tensiometer measurements from selected treatments in Singer and Meek (2004) were used.
 |
Materials and Methods
|
|---|
Background on the Experiment and Data
A 2-yr study evaluating biomass removal in no-tillage soybean was conducted in 2000 and 2001 on a Quakertown silt loam soil (fine-loamy, mixed, mesic Typic Hapludult) at the Rutgers University Snyder Research and Extension Farm near Pittstown, NJ (40°30' N, 75°00' W; elevation 170 m above sea level). A three-factor treatment structure was arranged in a split-split-plot randomized block design. There were four replications. The main factor was indeterminate soybean variety, either Pioneer Brand 93B53 (designated var = 1) or Agway PK394NRR (var = 2). The first split was three row spacingsnarrow (20 cm), intermediate (41 cm), and wide (76 cm)designated as 1 to 3, and the second split was no biomass removal (control) and biomass removal at V1 + V3 + V6, V6 + R1, R1 + R4 + R6, and V1 + V3 + V6 + R1 + R4 + R6 (Ritchie et al., 1994), designated as 1 to 5. In this note, "treatment" refers to the unique combination of the three factors. The treatments are applied to the smallest experimental unit, formally referred to as a "sub-subplot" of the design; for brevity and ease of reading, the term "plot" is used throughout this note instead. Soybean was planted using no-tillage techniques on 16 and 21 May in 2000 and 2001, respectively, at 518700 seeds ha1 using a no-tillage drill in the narrow and intermediate row spacings and a no-tillage planter in the wide row spacing. Further details are provided in Singer and Meek (2004).
Details on the repeated measures are, of course, pertinent to this paper. Tensiometers were inserted in each row at V1 at depths of 30 and 60 cm (designated as Depths 1 and 2) in all plots of three replications in 2000 and four in 2001. Soil water potential measurements were made frequently with a pressure transducer until the soil water pressure exceeded the air entry value of the porous cup (80 kPa). In 2000, measurements of matric potential of soil water started on 21 June, 36 d after planting, and ended on 31 July. In 2001, measurements started on 18 June, 28 d after planting, and ended on 9 August. While an effort was made to record measurements at regular intervals, sometimes the schedule could not be met. For almost all plots, the recording interval is not always equally spaced throughout the sampling period. Also, in plots where the soil had clearly dried out, no further measurements were recorded; hence, within a season, not all the individual series are of equal length. Thus, in 2000, 29 repeated measurements were included in the longest individual period of record; in 2001, the corresponding number was 40.
Four treatments were randomly selected from all data sets including years and depths. Hereafter, the treatments are referenced by an identification code defined by the concatenation of the biomass removal, variety, and row-spacing integers. The selected codes were 121 from Year 1 and Depth 1, 112 from Year 2 and Depth 2, 113 from Year 2 and Depth 1, and 423 from Year 2 and Depth 2. The first selection had three replications; the remainder all had four (designated 1 to 4). Thus, there were 15 total plot responses selected with repeated measures ranging from 13 to 38 observations per sampling period. For further reference, the associated replication number is concatenated to the front of the treatment combination code; this code uniquely identifies each selected plot (e.g., Treatment 121 in Year 1 is coded 1121). Note that determining a reasonable duration estimate from the observed time response for each plot's sensor, i.e., repeated measures or time series, is the purpose of this comparative analysis. Hereafter, these individual records are denoted as plot time response.
Wetness Duration Index
Let t be time measured in the number days after the start of the sampling period. Let x(t) be the tensiometer reading (kPa) at time t. Without loss of generality, x(t) can be taken as positive with x(t) = 0 being wet and x(t) = 80 being dry. Scale the reading from 0 to 1 to define a dimensionless wetness index, y(t), as follows: y(t) = 1 x(t)/80. Note the tensiometer instrument calibration allows x(t) to exceed 80 in some cases, but the values outside of 0 to 80 are not considered valid (see, e.g., Young and Sisson, 2002). Observations for each sensor were discarded when the soil effectively dried out [i.e., x(t) > 80]. Formally, this criterion is achieved by setting the first occurrence to 0 and the remaining to missing values. Now define the proposed wetness duration index,
W, with Eq. [1]:
 | [1] |
W is the definite integral of y(t) over the sampling period P (here in days, with P = final day starting day). Hence,
W is in days. A numerical trapezoidal rule for unequally spaced data is used to do the actual integrations, Eq. [2]:
 | [2] |
where y and t are defined the same as in Eq. [1], n is the number of observations, and i is the observation index. The trapezoidal rule was selected over alternative numerical integration methods for several reasons. Given that data are often unequally spaced but with incremental changes that are gradual relative to the entire period of observation, alternative procedures (like adaptive quadrature methods, see, e.g., Section 4.5 in Burden et al., 1981) are comparatively much more complicated but likely unnecessary. Note some important conceptual features of such an index. Mathematically, if y(t) is always 1, i.e., completely wet, then
W/P = 1. So
W, if scaled by P, can be considered the equivalent fraction of time a plot is completely wet; otherwise,
W can be considered the corresponding number of days. Note that
W/P =
y
, where
y
is the expectation value for y(t) (i.e.,
y
is the definition for the mean of a continuous variable). If y(t) starts out wet and then completely dries and stays that way for the rest of the period, then
W is similar to a temporal scale of fluctuation (see, e.g., Vanmarcke, 1983). Ideas for the overall approach are similar to those in Meek (2001). Physically,
W is a cumulative or seasonal measure of exposure to soil water.
Evaluation Procedure
To evaluate the
W estimates via Eq. [2], we assumed that the tensiometers were well calibrated (one meaning of unbiased) and that the calibrations did not drift during the experiment. Henceforth, we refer to bias as a systematic error in the
W estimates via Eq. [2] to those estimated from a reference methodhere, the regression-based
W estimates. For the Eq. [2]
W estimates to be sound, two conditions need to be established. First, the Eq. [2]
W estimates are reasonably equivalent to regression-based
W estimates. Second, for using the Eq. [2]
W estimates in further analyses like ANOVA, the Eq. [2]
W estimators' variability over treatment replications are reasonably similar to that obtained with the corresponding regression-based
W estimates. Note that both of the
W estimates are subject to random (measurement) errors in the tensiometer data, and so both are considered random variables.
For each plot time response, the analyses proceeded as follows: Both methods for estimating
W were used. Using Eq. [2],
W was estimated first (hereafter, designated
trap). Then, a time response model was selected from at least two candidate models fit to same plot time response data. Models considered were full- or partial-term polynomials (generally only up to cubic terms) or splines with these same curve forms (see, e.g., Kimball, 1976, or Meek et al., 2001). The spline models were fit with a nonlinear regression procedure. Model selection considered multiple performance criteria, including residual diagnostics. Error structure was reasonably homoscedastic. Next, analytic integrations for the selected models were calculated to determine the
W estimates (hereafter denoted as
reg). The first equivalence condition was examined based on three statistical criteria. The individual distributions of
reg,
trap, and 
=
reg
trap along with (
reg,
trap) scatterplot were all simultaneously examined graphically with Berg's plot (1992). Next, the 
mean and distribution were formally examined with univariate statistics, i.e., a paired observation t test (see, e.g., p. 252 in Walpole and Meyers, 1978). Finally, if the mean for 
was not different from zero (i.e., the mean bias error is zero), then a slope-only measurement error regression method was used to estimate the (
reg,
trap) relationship and its uncertainty; here, the method of Kerrich (1966) was used. If the slope is not statistically different from 1, then the two estimators are equivalent.
To compare the variability of the two estimators over different replicates, the variance ratio for each treatment was estimated. The numerator was for the treatment variance for the
trap estimates. The denominator was from the corresponding
reg estimates. Probabilities for each variance ratio were estimated. If the variance ratio was less than 1, the reciprocal was tested. If all the ratio probabilities were reasonably insignificant (P > 0.35), then the second condition is met. All analyses were performed with SAS V8.2 (SAS Inst., 1999).1
 |
Results and Discussion
|
|---|
The
reg estimates require modeling y(t), the wetness index, first. A simple parabola model was selected for three first-year plot time responses [y(t) for Plots 1121, 2121, and 3121]. Plot 1111 required a partial-term sixth degree polynomial. All the remaining plot time responses were best modeled with splines, most with just two phases, and three with three phases (Plots 4423, 4112, and 2111). The three-phase responses had a secondary rise and fall in the y(t) level after a late-season rainfall. Figure 1 shows the response for Plot 2423; this y(t) pattern was the only plot time response that could be modeled simply by a spline with two linear segments. The remaining y(t) plot time responses required models with single or multiple higher-order terms and, as mentioned, in some cases, additional spline segments to obtain reasonably good interpolation. Across all selected models, R2 ranged from 0.881 to 0.995 and had a mean of 0.948 (median 0.952). The corresponding number of model parameters ranged from 3 (for the simple parabolas) to 8 (for one of the three-phase splines). Corresponding error degrees of freedom ranged from 10 to 30. It is obvious, hence, that no one model fit all the plots adequately. In addition, for
reg estimation, a different analytical integration is required for each different regression model form. Aside from possibly being tedious, the best regression approach to estimating
reg for this kind of data is clearly very labor intensive.

View larger version (27K):
[in this window]
[in a new window]
|
Fig. 1. The time response curve and scattergram is shown for Plot 2423's wetness index, y(t). For both estimation methods, the wetness duration index is 36.5. The model has R2 = 0.974 for 31 observations (P < 0.0001). The selected regression model is two splined line segments with a common join point; formally, it can be written as follows:
The gray band is the 95% confidence interval, the dashed line represents the model estimates, and the black circles are the wetness index values estimated by scaling the soil matric potential observations.
|
|
In contrast, estimating
trap with Eq. [2] requires considerably less work; it can be accomplished for all plots in one data pass in many statistical programs or spreadsheets. As shown in Fig. 2, the corresponding
trap estimates were virtually identical to the
reg estimates because all three equivalence conditions were extremely well met. Notice, as shown in the Fig. 2 axial box plots, the central tendency and distribution of both estimators are nearly identical. The scatter is very low, and there is no apparent bias. In fact, the range of the difference box plot is very small next to that of the estimators, and the box representing interquartile range of the difference is barely discernable. The descriptive statistics, bias, and regression slope all meet the defined equivalence criteria extremely well. Given that incremental summations tend to average out random errors, this overall result is no surprise.

View larger version (25K):
[in this window]
[in a new window]
|
Fig. 2. Berg (1992) equivalence plot for comparing reg to trap. Here n = 15. For reg, the mean is 27.946 ± 2.416 (P < 0.0001), the maximum value is 36.499, the minimum value is 10.602, and the median value is 31.959 (see vertical box plot). For the trap, the mean is 27.927 ± 2.423 (P < 0.0001), the maximum value is 36.504, the minimum value is 10.612, and the median value is 32.189 (see horizontal box plot). Based on the trap reg, the mean bias is 0.0192 ± 0.0484 (P < 0.6513), the maximum value is 0.415, the minimum value is 0.300, and the median value is 0.008 (see difference box plot). The Kerrich (1966) regression slope is 1.001 with a 95% confidence interval ranging from 0.997 to 1.005 with r = 0.9998 (P < 0.0001).
|
|
The second criterion was also very clearly met. The four treatment variance ratio values were 1.17 (P
0.451), 1.04 (P
0.487), 0.85 (inverse = 1.18, P
0.540), and 1.07 (P
0.479) and, hence, the variances are nearly the same by any reasonable standard. Furthermore, no overall test was needed because every individual ratio was not different from unity.
The basic concepts used to define y(t) and
W can be applied to other measurements with repeated observations for a selected sensor or site. In this study, light interception for the crop was also measured throughout the season on each plot, similar to the work of Singer (2001). In Singer and Meek (2004), a light transmittance index was defined. Here again, a corresponding duration index is defined. Let L(t) be the light interception as a fraction of incoming photosynthetically active radiation (PAR) at time t. The corresponding transmittance is then T(t) = 1 L(t), and the transmittance duration index is the definite integral of T(t) over the period. Again, for actual estimation, a trapezoidal rule was used. Similar to
W, the transmittance duration index,
L, result can be interpreted as the equivalent number of days that T(t) = 1.
While such indices can be analyzed directly, they can also be extremely useful in other analyses. Often spatial variability can be reduced or eliminated by including an appropriate covariate in an analysis. So measurement or estimation of known possible covariates is a warranted research activity. Duration indices or variations of them could serve as covariates in the analysis of other variables. Variations include the integral over time of quantity less than or in excess of reference values or within a range like growing degree days. While suitable definitions for any given variable may take creativity, the possible results in terms of insight or prediction could be extremely beneficial.
 |
Conclusions
|
|---|
The computation of the
trap is comparatively easier and much less work than fitting and integrating the modeled y(t) to determine
reg. Both approaches produce very similar results, indicating that the numerical method is preferable to the regression method for analyzing tensiometer data collected over time. With suitable repeated-measures data, agronomists can readily employ the proposed method's results directly as a variable to be analyzed or as a possible covariate in other analyses.
 |
ACKNOWLEDGMENTS
|
|---|
The authors thank Dr. B. Mackey, Biometrician, USDA-ARS, Albany, CA; Prof. K. Moore, Department of Agronomy, Iowa State University, Ames, IA; and an anonymous reviewer for their helpful comments.
 |
NOTES
|
|---|
1 The mention of a trade name is for informational purposes only and does not imply an endorsement by the USDA-ARS. 
 |
REFERENCES
|
|---|
- Berg, R.L. 1992. First place. Best presentation of data-monochrome. p. 15211527. In Proc. Annu. SAS Users Group Int. Conf., 17th, Honolulu, HI. 1215 Apr. 1992. SAS Inst., Cary, NC.
- Burden, R.L., J.D. Faires, and A.C. Reynolds. 1981. Numerical analysis. 2nd ed. Prindle, Weber, and Schmidt, Boston.
- Davidian, M., and D.M. Giltinan. 1995. Nonlinear models for repeated measurement data. 1st ed. Chapman Hall/CRC Press Inc., Boca Raton, FL.
- Davidian, M., and D.M. Giltinan. 2003. Nonlinear models for repeated measurement data: An overview and update. J. Agric. Biol. Environ. Stat. 8(4):387419.
- Johnson, P., and G. Milliken. 1983. A simple procedure for testing linear hypothesis about the parameters of a nonlinear model using weighted least squares. Commun. Stat.Simul. Comput. 12(2):135145.
- Kerrich, J.E. 1966. Fitting the line Y =
X when errors in observation are present in both variables. Am. Stat. 20:24.
- Kimball, B.A. 1976. Smoothing data with cubic splines. Agron. J. 68(1):126129.[Abstract/Free Full Text]
- Littell, R., G. Milliken, W. Stroup, and R. Wolfinger. 1996. SAS system for mixed models. Publ. no. 55235. SAS Inst., Cary, NC.
- Logsdon, S.D., T.C. Kaspar, J.H. Prueger, and D.W. Meek. 2002. Nitrate leaching as influenced by cover crops in large soil monoliths. Agron. J. 94(4):807814.[Abstract/Free Full Text]
- Meek, D.W. 2001. A semiparametric method for estimating the scale of fluctuation. Comput. Geosci. 27(10):12431249.
- Meek, D., D. Dinnes, D. Jaynes, C. Cambardela, T. Colvin, J. Hatfield, and D. Karlen. 2001. An autoregression model for a paired watershed comparison. p. 223231. In G. Milliken (ed.) Proc. Appl. Stat. Agric. Conf., 12th, Manhattan, KS. 12 May 2000. Stat. Dep., Kansas State Univ., Manhattan.
- Meek, D.W., R.B. Hutmacher, B.E. Mackey, and K.R. Davis. 1991. Heteroscedasticity in whole plant growth curves developed from nonreplicated data. Agron. J. 83:417424.[Abstract/Free Full Text]
- Ritchie, S.W., J.J. Hanway, H.E. Thompson, and G.O. Benson. 1994. How a soybean plant develops. Special Rep. 53. Iowa State Univ., Ames.
- SAS Institute. 1999. SAS/STAT user's guide. Version 8. Vol. 13. Publ. 57388. SAS Inst., Cary, NC.
- Singer, J.W. 2001. Soybean light interception and yield response to row spacing and biomass removal. Crop Sci. 41:424429.[Abstract/Free Full Text]
- Singer, J.W., and D.W. Meek. 2004. Repeated biomass removal affects soybean resource utilization and yield. Agron. J. 96:13821389.[Abstract/Free Full Text]
- Vanmarcke, E. 1983. Random fields: Analysis and synthesis. MIT Press, Cambridge, MA.
- Walpole, R.E., and R.H. Meyers. 1978. Probability and statistics for engineers and scientists. 2nd ed. Macmillian Publ. Co., New York.
- Wolfinger, R., N. Miles-Mcdermott, and J. Kendall. 1993. Analyzing split-plot and repeated-measures designs using mixed models. p. 190200. In G. Milliken (ed.) Proc. 1992 Kansas State Conf. on Appl. Stat. Agric., Manhattan, KS. 2628 Apr. 1992. Stat. Dep., Kansas State Univ., Manhattan.
- Young, M.H., and J.B. Sisson. 2002. Tensiometry. p. 575608. In J.H. Dane and G.C. Topp (ed.) Methods of soil analysis: Part 4. Physical methods. SSSA Book Ser. 5. SSSA, Madison, WI.