Published in Agron J 99:1654-1664 (2007)
DOI: 10.2134/agronj2007.0170
© 2007 American Society of Agronomy
677 S. Segoe Rd., Madison, WI 53711 USA
Remote Sensing
Corn and Soybean Mapping in the United States Using MODIS Time-Series Data Sets
Jiyul Chang (a,*),
Matthew C. Hansen (a),
Kyle Pittman (a),
Mark Carroll (b), and
Charlene DiMiceli (b)
(a) Geographic Information Science Center of Excellence, South Dakota State Univ., Brookings, SD 57007
(b) Dep. of Geography, Univ. of Maryland, College Park, MD 20742
* Corresponding author (jiyul.chang{at}sdstate.edu)
ABSTRACT
Monitoring and mapping of U.S. croplands has long been a primary goal of many users of earth observation satellite data. The advantages of using low spatial and high temporal resolution data are (i) increased ability to monitor the phenological change of crop plants, and (ii) the possibility of generating consistent large area crop cover maps. This study investigates the potential of 500-m MODIS (MODerate Resolution Imaging Spectroradiometer) data in estimating corn (Zea mays L.) and soybean [Glycine max (L.) Merr.] area for the dominant production areas of the USA. To avoid cloud cover, MODIS 32-day composites for all land bands, normalized difference vegetation index (NDVI), and land surface temperature (LST) were used, covering March 2002 to February 2003. These time-sequential images were further composited to produce 279 annual time-integrated metrics. Using USDA-NASS Cropland Data Layers (CDL) as subpixel training data, percentage soybean and corn cover per 500-m pixel was calculated and accuracy was assessed at national, state, and county scales using data from the 2002 NASS Census of Agriculture. When these estimates were compared with the NASS Census, r² values for corn, soybean, and combined corn and soybean areas were 0.957, 0.949, and 0.984 at the state level, respectively. At the national scale, MODIS estimates of corn and soybean cover differed by 6 and 4%, respectively. Results indicate a robust potential for using MODIS in crop type monitoring applications.
Abbreviations: CDL, cropland data layer; LST, land surface temperature; MODIS, Moderate Resolution Imaging Spectroradiometer; NASS, National Agricultural Statistics Service; NDVI, normalized difference vegetation index; RMSE, root mean square error
Received for publication May 22, 2007.
INTRODUCTION
SINCE THE LAUNCH of the first earth observation satellites, monitoring and mapping of U.S. croplands has been a primary goal of many data users. Agencies such as the National Aeronautics and Space Administration (NASA), National Oceanic and Atmospheric Administration (NOAA), and U.S. Department of Agriculture (USDA) have sought to employ these data for large area monitoring. The early LACIE (Large Area Crop Inventory Experiment) and AgRISTARS (Agriculture and Resources Inventory Surveys Through Aerospace Remote Sensing) programs, which began in 1974 and 1980, represent initial, concerted efforts to improve crop monitoring via the use of remotely sensed data sets (Boatwright and Whitehead, 1986). An operational outgrowth of these programs has been the Crop Condition Data Retrieval and Evaluation (CADRE) system used by the USDA Foreign Agricultural Service (FAS) (Reynolds, 2001), which focuses on the delivery of agrometeorological data and incorporates earth observations in assessing crop development. The USDA-NASS is another significant user of remotely sensed imagery in supporting their mission of surveying and reporting on crop type acreage through the production of earth observation-based, state-level CDL products (National Agricultural Statistics Service, 2007).
However, synoptic U.S. mapping of crop types delivered at the end of the growing season has not been achieved. Despite much research in the area of cropland mapping using earth observation data sets, a number of factors have inhibited operational and timely map production. First, differentiating crop types requires fine temporal-scale imagery that allows for the identification of the subtle differences between various crop phenologies. Second, cropped fields are discrete entities in landscapes and need an appropriate spatial resolution to be unambiguously resolved. For earth observation satellites, the typical engineering tradeoff is temporal versus spatial resolution. Higher spatial resolution data sets do not have high return rates in terms of repeated imaging over a given locale. Other sensors with coarser spatial resolution have wider swaths that allow for higher rates of repeated imaging, but are limited in terms of generating accurate area estimates.
Many studies have utilized high spatial resolution satellite imagery, particularly Landsat data, to classify and map crops. Mergerson (1981) used a sampling approach to relate Landsat imagery to ground survey data. Hall and Badwar (1987) emphasized the use of well-calibrated signatures and multitemporal data sets for large-area crop monitoring. More recently, numerous local-scale studies have been conducted using Landsat Enhanced Thematic Mapper Plus (ETM+) and other data sources, such as the Indian Remote Sensing–Linear Imaging Self-Scanning Sensor 3, to map crop types (El-Magd and Tanton, 2003; De Wit and Clevers, 2004; Turker and Arikan, 2005).
Limitations of high spatial resolution data include data costs and infrequent acquisitions, which combine to preclude the derivation of regional-scale crop cover maps on an operational basis. To solve these problems, researchers have exploited high temporal, low spatial resolution imagery to characterize crops. Quarmby et al. (1992) employed linear mixture model techniques to derive subpixel crop area estimates. Others, such as Doraiswamy et al. (2003) and Rembold and Maselli (2006), combined Landsat Thematic Mapper (U.S. Department of the Interior and U.S. Geological Survey, Washington, DC) and NOAA AVHRR (Advanced Very High Resolution Radiometer) 1-km imagery to improve crop monitoring capabilities. Research in this domain has reflected the limitations in crop monitoring capabilities, as no single sensor had the required spatial and temporal detail to adequately capture crop dynamics at high spatial resolutions. New earth observation assets to monitor crop cover have filled the operational gap between higher resolution (30 m) and coarse resolution (1000 m) instruments, and include sensors such as MODIS and the Indian Remote Sensing Advanced Wide Field Sensor (AWiFS).
The MODIS data in particular offer a unique capability in balancing the requirements of spatial detail and temporal density. The MODIS sensor has seven bands designed specifically for terrestrial monitoring with spatial resolutions of 250 and 500 m. The MODIS data are free, feature daily acquisitions above 30 degrees latitude (Wolfe et al., 1999), and undergo standard processing to surface reflectance that enables regional, continental, and global scale mapping (Vermote et al., 1997). As the temporal frequency of global datasets becomes finer, the ability to monitor phenological change of crop plants during growing seasons increases. Doraiswamy et al. (2004) used Landsat ETM+ and MODIS 250-m imagery to monitor alfalfa (Medicago sativa subsp. sativa), corn, and soybean conditions and to generate models for crop yield simulations in southern Iowa. Wardlow et al. (2006) used MODIS 250-m, 16-d composite NDVI time series data to estimate greenup onset dates of corn, sorghum [Sorghum bicolor (L.) Moench], and soybean in Kansas, and then compared the results with USDA weekly crop progress reports to evaluate MODIS data as inputs to crop type monitoring. Ozdogan et al. (2003) monitored the long-term changes (1993–2002) of agricultural lands in southeastern Turkey by exploiting the temporal information of MODIS and the spatial detail of Landsat in identifying irrigated land expansion.
Thematic outputs also differ between finer and coarser resolution map products. For high-resolution crop studies, common discrete classification methods are often employed. The high-resolution studies mentioned previously classified data using pixel-based (El-Magd and Tanton, 2003) and field-based (De Wit and Clevers, 2004; Turker and Arikan, 2005) maximum likelihood methods. However, in the case of low spatial resolution data, crop area estimations must rely on subpixel cover characterizations because of the prevalence of mixed pixels (Gallego, 2004), such as in the study by Quarmby et al. (1992). To address the mixed pixel problem for global-scale mapping, Hansen et al. (2002, 2003) employed the continuous field algorithm for mapping vegetative traits such as tree cover using MODIS and AVHRR data. In the continuous field approach, each coarse resolution pixel is characterized as 0 to 100% cover of a vegetation class or physiognomic trait, ameliorating the primary limitation of coarse spatial resolution data.
Organizations such as the USDA Foreign Agricultural Service (FAS) and NASS work to monitor both national and global-scale food and crop production. Improving the efficiency and timeliness of monitoring techniques would be beneficial both in anticipating market fluctuations and in ensuring an efficient and adequate food supply. The emphasis on timely and accurate crop cover estimations is currently topical at a national scale given the recent energy policies that emphasize a shift of significant agricultural land from food production to energy production. The recent Billion-Ton Annual Supply study (Perlack et al., 2005) reported on the ability of the United States to produce enough dry biomass feedstock to replace 30% or more of the country's present petroleum consumption. The report states that agricultural lands alone could produce this much dry biomass, mostly through a dramatic increase in yields by 50%. Changes in crop rotation practices and extensification into lands not currently in production are also a means to increasing feedstock production. Such land use change dynamics may be monitored using remotely sensed data sets.
The objective of this study was to investigate the potential of MODIS data in estimating corn and soybean areas for the dominant production areas of the United States. Subsequent iterations will employ the 250-m data stream from the Collection 5 version of the MODIS MOD44B vegetation continuous field algorithm (unpublished data, 2007). The broader study objective was to advance the operational monitoring of croplands. To create such a capability, reliable data streams produced in a consistent and standard way need to be combined with appropriate algorithms and cover characterization themes. Standard MODIS time-series data sets are consistently calibrated over time and have sufficient spatiotemporal detail to map the dominant commodity crops of the United States. When combined with appropriate supervised learning algorithms, a repeatable method for mapping is produced. Results from this initial study will be used to test the procedure for multiple years and in other global centers of corn and soybean production.
MATERIALS AND METHODS
Training Data
The USDA NASS 2002 CDLs (National Agricultural Statistics Service, 2002) were used to provide subpixel training for the 500-m MODIS data. The states for training were North Dakota, Nebraska, Iowa, Illinois, Indiana, Virginia, and North Carolina. The CDL is a rasterized, georeferenced, and categorized land cover map produced using Landsat TM and Landsat ETM+ imagery. The Landsat data used by NASS cover dates from March to September 2002. The ground resolution is 30 by 30 m and the coordinate system is the Universal Transverse Mercator (UTM) projection.
NASS collects the remote sensing Acreage Estimation Program's field level training data during the June Agricultural Survey, which is a national survey based on a stratified random sample of land areas selected from each state's area frame. An area frame is a land use stratification based on percentage cultivation. Classification accuracy for major crops varies from 80% to the high 90s for kappa coefficients, and the correlation coefficients (r) with ground data range from 0.6 to nearly 1 (Allen et al., 2002).
For this study, corn and soybean cover pixels were selected from the NASS CDL classification and reclassified to a value of 100, while all other land cover types became 0. Each CDL layer was reprojected using a nearest neighbor resampling to the MODIS Sinusoidal projection. These 30-m images were aggregated to the 500-m cell size, yielding a 0 to 100% value based on the proportion of 30-m pixels per crop type that fell within each MODIS 500-m grid cell.
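The aggregation step described above can be sketched as follows. This is a minimal illustration, not the processing chain used in the study: the function name `percent_cover` is hypothetical, and the sketch assumes an integer number of fine pixels per coarse cell, whereas 500 m is not an exact multiple of 30 m and the real grids would need area-aware handling after reprojection.

```python
def percent_cover(binary_grid, block):
    """Aggregate a fine-resolution 0/100 crop mask to coarse cells.

    binary_grid: list of equal-length rows holding 100 (crop) or 0 (other).
    block: number of fine pixels per coarse cell side (assumed to divide
           the grid evenly; an illustrative simplification).
    Returns a coarse grid of mean values, i.e. percentage crop cover.
    """
    rows, cols = len(binary_grid), len(binary_grid[0])
    coarse = []
    for r0 in range(0, rows, block):
        coarse_row = []
        for c0 in range(0, cols, block):
            cells = [binary_grid[r][c]
                     for r in range(r0, r0 + block)
                     for c in range(c0, c0 + block)]
            coarse_row.append(sum(cells) / len(cells))
        coarse.append(coarse_row)
    return coarse


# Toy 4 x 4 mask aggregated with 2 x 2 blocks (hypothetical values).
mask = [
    [100, 100, 0, 0],
    [100, 0, 0, 0],
    [0, 0, 100, 100],
    [0, 0, 100, 100],
]
cover = percent_cover(mask, 2)
```

Because the mask holds only 0 and 100, the block mean is directly the percentage of fine pixels labeled as the crop.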
MODIS Annual Metrics
Daily global, gridded, georeferenced MODIS data covering the seven MODIS land bands, NDVI and LST (Wan et al., 2002), comprised the input dataset. The MODIS land data are derived from the MOD09 standard product (Vermote et al., 2002) and the 8-d L3 1-km MODIS LST product (MOD11) (Wan et al., 2002). The MODIS inputs consisted of 250-m Band 1 (620–670 nm, red), Band 2 (841–876 nm, near infrared), 500-m Band 3 (459–479 nm, blue), Band 4 (545–565 nm, green), Band 5 (1230–1250 nm, midinfrared), Band 6 (1628–1652 nm, midinfrared), and Band 7 (2105–2155 nm, midinfrared) data. The NDVI was calculated using Bands 1 and 2.
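For reference, the standard NDVI definition, (NIR - red) / (NIR + red), can be written as a short function. The function name and the reflectance values below are illustrative assumptions, not values from the study.

```python
def ndvi(red, nir):
    """NDVI from red (Band 1) and near-infrared (Band 2) reflectance."""
    denom = nir + red
    if denom == 0:
        return float("nan")  # undefined when both reflectances are zero
    return (nir - red) / denom


# Hypothetical reflectances for a dense green canopy.
example = ndvi(red=0.05, nir=0.45)
```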
To avoid the presence of cloud cover, the MODIS daily acquisitions were converted to 32-d composites covering March 2002 to February 2003, resulting in 11 roughly monthly composites for each of the nine inputs (Carroll et al., 2007, unpublished data). Compositing is a standard approach to compiling time-series data sets by preferentially selecting cloud-free observations over a given interval. From these 99 composite images, an additional series of 279 annual MODIS metrics was generated. These image inputs are not tied to a specific time of year, and have been shown to be more appropriate for mapping land cover at continental and global scales than are time-sequential monthly composite image data sets (Hansen and DeFries, 2004). For regional scales featuring a synchronized phenology, metrics have been shown to complement time-sequential composites by providing a generalized annual feature space that enables the extension of regional spectral signatures for given vegetation traits of interest, such as corn and soybean cover.
Regression Tree Analysis
For regression tree analysis, the S-Plus statistical package (Venables and Ripley, 1994) was used. Regression trees have been used with remotely sensed data sets by others (Michaelson et al., 1994; DeFries et al., 1997; Prince and Steininger, 1999; Hansen et al., 2002, 2003) and have been used to monitor change on an annual time step (Hansen and DeFries, 2004). They are a nonlinear tool which recursively splits a continuous training variable into subsets, called nodes, which minimize the overall residual sum of squares. The regression tree has the following form:
$$D = \sum_{j} (y_j - u)^2 \qquad [1]$$
where D is the deviance as measured by the corrected sum of squares for a split. This is calculated from all j cases of y and the mean value of those cases, u. Input data are analyzed across all spectral values using a threshold approach. All of the training data higher and lower than the value of each tested threshold are treated as two distinct data assemblages. The threshold, or split, which produces the greatest reduction in deviance is used to divide the data, and the process begins again for the two newly created subsets.
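The threshold search described above can be sketched for a single predictor. This is an illustrative sketch under simple assumptions, not the S-Plus implementation: the function names are hypothetical, candidate thresholds are taken as midpoints between adjacent distinct values, and the data are invented.

```python
def deviance(values):
    """Corrected sum of squares of values about their mean (D in Eq. [1])."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Find the threshold on one predictor that most reduces deviance.

    x: predictor values (e.g. one MODIS metric); y: percentage crop cover.
    Returns (threshold, reduction); threshold is None if no split helps.
    """
    parent = deviance(y)
    best_threshold, best_reduction = None, 0.0
    levels = sorted(set(x))
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        reduction = parent - (deviance(left) + deviance(right))
        if reduction > best_reduction:
            best_threshold, best_reduction = threshold, reduction
    return best_threshold, best_reduction


# Hypothetical metric values and cover percentages.
metric = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
cover = [0, 0, 0, 100, 100, 100]
threshold, reduction = best_split(metric, cover)
```

A full tree would repeat this search over every input and then recurse on the two resulting subsets.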
Multiple independent runs of decision trees via sampling with replacement allow for more reliable results. This procedure is called bagging (Breiman, 1996), and consists of calculating a mean or median result from the multiple runs. By repeatedly sampling the training data and growing multiple tree models, isolated overfitting within any individual tree is reduced by calculating a median multitree output. The sample training sets were created by random selection, with replacement, of 5% of the training points in each of the seven states. The sampling was done 30 times and 30 different tree models were generated.
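A minimal sketch of the bagging procedure follows. It assumes a one-split regression stump as a stand-in for the full regression trees, and all function names are hypothetical. The defaults mirror the 30 models and 5% samples described above.

```python
import random
import statistics


def fit_stump(x, y):
    """Fit a one-split regression stump (a stand-in for a full tree).

    Returns (threshold, left_value, right_value); leaf values are medians.
    """
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    levels = sorted(set(x))
    if len(levels) < 2:
        # Degenerate sample: predict one constant everywhere.
        m = statistics.median(y)
        return (levels[0], m, m)
    best = None
    for lo, hi in zip(levels, levels[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, t, statistics.median(left),
                    statistics.median(right))
    return best[1:]


def bagged_predict(x, y, x_new, n_models=30, fraction=0.05, seed=0):
    """Median prediction over stumps fit to bootstrap samples."""
    rng = random.Random(seed)
    size = max(2, int(len(x) * fraction))
    predictions = []
    for _ in range(n_models):
        # Sample training points at random, with replacement.
        idx = [rng.randrange(len(x)) for _ in range(size)]
        t, left_val, right_val = fit_stump([x[i] for i in idx],
                                           [y[i] for i in idx])
        predictions.append(left_val if x_new <= t else right_val)
    return statistics.median(predictions)


# Synthetic training set: cover is 0 below a metric value of 0.5, else 100.
train_x = [i / 999 for i in range(1000)]
train_y = [0 if xi < 0.5 else 100 for xi in train_x]
high = bagged_predict(train_x, train_y, 0.9)
low = bagged_predict(train_x, train_y, 0.1)
```

Taking the median rather than the mean across models keeps a single poorly fit tree from pulling the combined estimate.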
For this study, the independent variables were the 378 MODIS inputs (279 annual metrics and 99 32-d composites), which were used to predict the dependent variable, percentage corn and soybean cover from the resampled NASS CDL data. Processing limitations precluded the use of perfectly fit trees, so a minimum deviance-explained of 0.001% of the root deviance of the sample dataset was used as a cutoff point to stop the creation of further splits. The median value of each node was used to preserve high and low cover values in the regression tree output.
Parsimonious Model
After running the regression tree analysis using all 378 MODIS inputs, the most useful metrics for identifying specific crop types were selected and the regression tree analysis was run a second time with this subset of metrics to generate a parsimonious model. There are several advantages to using the reduced set of metrics, including data reduction to ease processing and analysis, and the increased portability of a simplified model to data from other years and different geographic regions.
To select imagery that was most useful for identifying corn and soybean cropped acreage, the 378 inputs were sorted by the reduction of deviance attributable to them across the 30 original regression trees. It was decided that (i) a subset of ranked MODIS inputs explaining at least 80% of the variance explained by the nonparsimonious models would be used, and (ii) once that threshold was reached, a natural break in the amount of deviance explained would determine the metrics to be included in the parsimonious model. This resulted in a set of 14 metrics for corn, representing 86.1% reduction of the deviance from the nonparsimonious models, and a set of nine metrics for soybean, representing 80.9% reduction of the deviance from the nonparsimonious models. Figure 1 shows a flowchart of the crop type mapping methodology.
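The first selection rule, ranking inputs by deviance reduction and keeping the top-ranked ones until they account for at least 80% of the total, can be sketched as below. The function name, metric names, and values are hypothetical. The second rule, the natural break, was a judgment and is not modeled here.

```python
def select_metrics(deviance_by_metric, threshold=0.80):
    """Return top-ranked metrics whose cumulative share reaches threshold.

    deviance_by_metric: dict of metric name -> deviance reduction summed
    across the bagged trees (names and values here are hypothetical).
    """
    total = sum(deviance_by_metric.values())
    ranked = sorted(deviance_by_metric.items(),
                    key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, reduction in ranked:
        selected.append(name)
        cumulative += reduction
        if cumulative / total >= threshold:
            break
    return selected, cumulative / total


# Hypothetical deviance reductions for five inputs.
reductions = {
    "ndvi_amplitude": 50.0,
    "lst_cool_months": 20.0,
    "nir_august": 15.0,
    "red_minimum": 10.0,
    "blue_mean": 5.0,
}
selected, share = select_metrics(reductions)
```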
Validation
The resulting output MODIS cover estimates per pixel, ranging from 0 to 100%, were averaged by county, extracted to text files, and converted to acreage. The results were compared with county level NASS census data taken in 2002 (National Agricultural Statistics Service, 2002) for validation. The validation was done by simple linear regressions between known values (NASS 2002 census data) and estimates (500-m MODIS data) at national, state, and county levels. The per-state root mean square error (RMSE) between NASS 2002 census data and 500-m MODIS estimates was calculated using the equation
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (\mathrm{NASS}_i - \mathrm{MODIS}_i)^2}{n}} \qquad [2]$$
where NASS was the NASS 2002 census data, MODIS was the 500-m MODIS estimates, and n was the number of counties.
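The per-state RMSE can be computed as in the following sketch, which assumes the conventional form with n in the denominator. The function name and the county values are illustrative and are not data from the study.

```python
import math


def rmse(nass, modis):
    """Root mean square error between paired county-level areas."""
    if len(nass) != len(modis) or not nass:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(nass)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(nass, modis)) / n)


# Hypothetical areas (ha) for three counties in one state.
census_ha = [100.0, 200.0, 300.0]
modis_ha = [110.0, 190.0, 300.0]
state_rmse = rmse(census_ha, modis_ha)
```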
RESULTS
Corn
Regression Tree Analysis
The 30 bagged regression trees using all 378 MODIS inputs resulted in models that explained 69.0% of the root node deviance (Table 1). A single metric relating to the growing season amplitude of NDVI (the difference between the mean of the three highest annual NDVI values and the mean of the fourth-through-sixth highest annual NDVI values) explained 33.9% of the total deviance. A second metric, derived from LST (the mean of the thermal band for the seventh-through-ninth coolest months of the year), explained a further 15.8% of the deviance.
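The NDVI amplitude metric described above can be sketched as a ranking operation on the composite values for one pixel. The function name and the NDVI series are hypothetical.

```python
def ndvi_amplitude(composite_ndvi):
    """Mean of the 3 highest values minus mean of the 4th-6th highest."""
    if len(composite_ndvi) < 6:
        raise ValueError("need at least six composite values")
    ranked = sorted(composite_ndvi, reverse=True)
    return sum(ranked[:3]) / 3 - sum(ranked[3:6]) / 3


# Hypothetical 11-composite NDVI series for a cropped pixel.
series = [0.2, 0.2, 0.3, 0.5, 0.8, 0.9, 0.7, 0.4, 0.3, 0.2, 0.2]
amplitude = ndvi_amplitude(series)
```

Because the values are ranked rather than indexed by date, the metric is not tied to a specific time of year, which is the property of annual metrics noted in the methods.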
Table 1. The list of MODIS annual metrics which reduced node deviance most when identifying corn fields. These selected metrics were used to generate parsimonious regression tree models.
From these results, 14 metrics were used to create a parsimonious model that explained 67.0% of total deviance, accounting for 97.2% of the deviance explained by the original nonparsimonious model. This reflects that the greatly reduced feature space was still able to capture the information employed by the full model in discriminating corn cover.
NDVI-related imagery accounted for three of the 14 selected inputs (growing season amplitude, August 32-d composite, and July 32-d composite), and Fig. 2a shows the phenological patterns of NDVI. In areas of high corn density in North Dakota, Illinois, and Indiana, meaning pixels with >50% of their area covered by corn fields, there was a sharp peak in July and August and a large drop in September. In North Carolina, the sharp peak was in June and July, with a large drop in August and September. This phenological curve is typical of cropped fields in the Midwest, where July and August are the greenest months with senescence in September. In the Southeast (North Carolina), corn is planted earlier than in the Midwest, with the peak canopy cover occurring earlier and featuring a more gradual senescence. The less well-defined peak in NDVI for corn in North Carolina reflects a more fragmented landscape and greater mixing of forest cover with agricultural fields. Based on these patterns, the amplitude of change in mean NDVI from the three greenest months to the fourth-through-sixth greenest months becomes the most important metric for differentiating fields of corn cultivation from other land cover types. The timing offset in terms of peak greenness is accounted for in metric spectral space. Figure 2b shows the averaged phenology of noncorn dominated pixels for North Dakota, Illinois, Indiana, and North Carolina, and highlights the separability of corn-dominated phenologies from other cover types.

Fig. 2. Seasonal changes of NDVI for corn fields, where (a) is high (>50%) corn field density and (b) is low (<25%) corn field density pixels using 32-d composites from March 2002 to February 2003.
Statistical Analysis
The 19 major corn production states that account for 95% of all U.S. corn production were selected for final analysis. Figure 3 shows the bagged tree results using the subset of metrics (parsimonious model) along with the NASS census data by state. In 13 out of 19 states, the NASS census data were within ±1 quartile of the bagged tree results from MODIS data. In Texas, the MODIS results were highly variable and reflect a lack of training data in the southern tier of the United States. Figure 4a shows the median result of the 30 bagged trees per pixel, and Fig. 4b shows the NASS census values for corn area per county. The spatial patterns are correlated and reflect the regional mapping capability of the MODIS data.

Fig. 3. Range of estimated corn area (ha) from 30 parsimonious regression tree models. Dashed line and dots are NASS census values. Boxes and bars represent ±25% and ±45% of median, respectively. The x axis is sorted by NASS 2002 census.
Figure 5 illustrates per county and per state corn area comparisons between the MODIS and NASS census. The regression of 19 states at the county level was statistically significant (MODIS = 0.88 × NASS census + 856; r² = 0.883), partly due to the fact that high corn production counties performed very well (Fig. 5a). When calculating the regression line at a state rather than county level, the regression line was closer to the 1:1 line (MODIS = 0.95 × NASS census – 10,127; r² = 0.957) (Fig. 5b).
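The slope, intercept, and r² of a simple linear regression of MODIS estimates on census values can be obtained with ordinary least squares, as in this sketch. The function name and the data are hypothetical; the regressions reported here were fit to the actual county and state totals.

```python
def linear_fit(x, y):
    """Ordinary least squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical census (x) and MODIS (y) areas in ha.
census = [1000.0, 2000.0, 3000.0, 4000.0]
modis = [900.0, 1900.0, 2700.0, 3700.0]
slope, intercept, r_squared = linear_fit(census, modis)
```

A slope below 1 with a small intercept, as in the county-level corn regression, indicates that the estimates fall increasingly short of the census values as area grows.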

Fig. 5. Validation of 500-m MODIS regression tree model for corn field areas (ha) with NASS 2002 census data at (a) county level and (b) state level. Dashed line is 1:1. ** Significant at the 0.01 probability level.
The median MODIS product underestimated corn area by 6% (26,190,567 NASS ha to 24,617,259 MODIS ha) (Table 2). Iowa, Illinois, Nebraska, and Indiana, the four states with the highest corn acreage, had relatively low absolute and percentage-based RMSE values. Minnesota had the largest difference (807,277 ha, underestimated) and the biggest RMSE (170,294). Central Minnesota was underestimated in the MODIS depiction because the LST metric divided northern cool areas and southern warm areas. Because Minnesota is the fourth-largest corn producing state, this difference actually accounts for roughly half of the national underestimation. The training data from the North Dakota CDLs drove this underestimation, as there was insufficient corn area in North Dakota to adequately depict higher latitude corn acreage. Two solutions for further iterations are possible: (i) add training in the northern tier, including Minnesota, or (ii) reduce the MODIS feature set by removing the LST metrics.
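The national figures quoted above can be checked with a short calculation. The function name is hypothetical; the areas are the totals reported in the text.

```python
def percent_difference(reference, estimate):
    """Signed difference of estimate from reference, as a percentage."""
    return 100.0 * (estimate - reference) / reference


nass_corn_ha = 26_190_567   # NASS census total quoted in the text
modis_corn_ha = 24_617_259  # median MODIS total quoted in the text
minnesota_gap_ha = 807_277  # Minnesota underestimate quoted in the text

corn_diff = percent_difference(nass_corn_ha, modis_corn_ha)
minnesota_share = minnesota_gap_ha / (nass_corn_ha - modis_corn_ha)
```

The negative sign of `corn_diff` indicates underestimation, and `minnesota_share` is the fraction of the national shortfall attributable to Minnesota.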
Table 2. Corn areas (ha) from county based NASS 2002 census and 500-m MODIS estimates, root mean square errors (RMSEs), and the simple linear regression equations relating NASS census data and 500-m MODIS estimates.
All simple linear regressions between the NASS census data and the MODIS estimates at the county level per state were significant at the 99% level, with 10 states having r² values > 0.8. Kansas, Pennsylvania, and Texas had the lowest r² and slopes. The most probable cause of low correlation was a lack of available training outside of the Corn Belt states of the Midwest and northern plains, resulting in models that did not adequately capture the local variability in peripheral states. Limitations at the state scale are evident in Table 2. Adding training data both in terms of additional states and for additional years will aid in determining the need for subnational stratification of the method.
Soybean
Regression Tree Analysis
The 30 bagged regression trees using all 378 MODIS inputs explained 66.0% of the root node deviance in the models (Table 3). Just as in the corn models, the same metric capturing the difference between the mean of the three highest NDVI values and the mean of the fourth-through-sixth highest NDVI values over the year explained the largest portion (27.7%) of the total deviance. The second most important metric in the soybean analysis was the MODIS near infrared (band 2) reflectance in the August 32-d composite image. This metric explained an additional 12.2% of the total deviance.
Table 3. The list of MODIS annual metrics which reduced node deviance most when identifying soybean fields. These selected metrics were used to generate parsimonious regression tree models.
The parsimonious soybean models were derived using nine input metrics, including four that were NDVI-related metrics. These simplified models explained 62.4% of the total deviance, and accounted for 94.5% of the deviance explained by the nonparsimonious models.
The phenological pattern of NDVI in soybean fields differed from that of corn fields by maintaining higher peak greenness throughout July and August (Fig. 6a). This is evident for Illinois, Indiana, and North Carolina, but not for North Dakota, and indicates either different corn and soybean progressions for this environment or a limitation of the NASS North Dakota CDL to accurately depict corn and soybean acreage. Elsewhere, the extended peak of the soybean phenological pattern in the sixth compositing period is important to discriminating soybean from corn cover. Near-infrared reflectance of this composite period is the second most important metric in the soybean model. In pixels with soybean coverage < 25% (Fig. 6b), the phenological patterns are the same as those seen in Fig. 2b.

Fig. 6. Seasonal changes of NDVI for soybean fields, where (a) is high (>50%) soybean field density and (b) is low (<25%) soybean field density pixels using 32-d composites from March 2002 to February 2003.
Statistical Analysis
For the final statistical analysis, the 17 states representing 95% of all U.S. soybean production were processed. Figure 7 shows the bagged tree results using the subset of metrics (parsimonious model) along with the NASS census data by state. The NASS census data for 13 out of 17 states were within ±1 quartile of the bagged tree results from MODIS data. Figure 8a shows the median result of the 30 bagged trees per pixel and Fig. 8b shows the NASS census values for soybean growing area per county. As in the corn results, the spatial patterns are correlated and reflect the regional mapping capability of the MODIS data.

Fig. 7. Range of estimated soybean area (ha) from 30 parsimonious regression tree models. Dashed line and dots are NASS census values. Boxes and bars represent ±25% and ±45% of median, respectively. The x axis is sorted by NASS 2002 census.
Figure 9 illustrates per county and per state soybean area comparisons between the MODIS and NASS census. The regression at the county level for the 17 states was statistically significant (MODIS = 0.84 × NASS census + 3743; r² = 0.868) (Fig. 9a). When calculating the regression line at the state level, the performance was improved (MODIS = 0.91 × NASS census + 212,661; r² = 0.949) (Fig. 9b).

Fig. 9. Validation of 500-m MODIS regression tree model for soybean field areas (ha) with NASS 2002 census data at (a) county level and (b) state level. Dashed line is 1:1. ** Significant at the 0.01 probability level.
Across all 17 states, the median MODIS product overestimated soybean acreage by 4% (27,864,415 NASS ha to 28,938,338 MODIS ha) (Table 4). Three of the four highest soybean producing states, Iowa, Illinois, and Indiana, had low RMSEs as a percentage of NASS estimates. Minnesota, third in overall soybean acreage, performed less well with a 15% overestimate. Among the worst-performing states, Wisconsin (73% overestimate) and Michigan (57% overestimate) were the most inaccurate as compared with NASS census data. Out of the 17 states analyzed, however, these two states have comparatively little soybean production, and thus did not significantly impact the national results.
Table 4. Soybean areas (ha) from county-based NASS 2002 census and 500-m MODIS estimates, root mean square errors (RMSE), and the simple linear regression equations relating NASS census data and 500-m MODIS estimates.
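The national overestimate follows directly from the two soybean totals quoted in the text, and the RMSE is computed over paired county values. The sketch below shows both calculations. Expressing RMSE relative to the mean county census area is an assumption made here for illustration, since the normalization used in Table 4 is not stated in this section. The county values are hypothetical.

```python
import math

def percent_difference(estimate, reference):
    """Signed difference of estimate from reference, as a percentage."""
    return 100.0 * (estimate - reference) / reference

def rmse(estimates, references):
    """Root mean square error between paired county values."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))

# Soybean totals (ha) quoted in the text for the 17 states.
nass_total = 27_864_415
modis_total = 28_938_338
print(f"{percent_difference(modis_total, nass_total):.1f}%")

# Hypothetical county values (ha), used only to show the RMSE step.
nass_counties = [10_000, 20_000, 30_000]
modis_counties = [12_000, 19_000, 33_000]
error = rmse(modis_counties, nass_counties)
# Assumption: normalize by the mean county census area.
relative_error = 100.0 * error / (sum(nass_counties) / len(nass_counties))
print(f"RMSE = {error:.0f} ha ({relative_error:.1f}% of mean census area)")
```

With the quoted totals the difference is about 3.9%, which rounds to the 4% overestimate reported for soybean.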
All linear regressions between the NASS census data and the MODIS estimates at the county level per state were significant at the 99% level, and 11 states had r² values higher than 0.8. Kansas had the lowest r² and slope. As in the corn models, the most probable cause of low correlation was a lack of available training data outside of the Midwest corn and soybean belt.
Corn Plus Soybean
Statistical Analysis
To test the robustness of the individually derived corn and soybean models, the two products were summed and compared with the NASS census data. This combined analysis used the 16 states that fall within both the set covering 95% of U.S. corn production and the set covering 95% of U.S. soybean production. Figure 10 shows the 30 bagged tree results using the subset of metrics (parsimonious model) along with the NASS census data by state. In 15 out of 16 states, the NASS census data were within ±1 quartile of the bagged tree results from MODIS data.

Fig. 10. Ranges of corn and soybean field areas (ha) summed from 30 parsimonious regression tree models per crop type. Dashed line and dots are NASS census values. Boxes and bars represent ±25% and ±45% of median, respectively. The x axis is sorted by NASS 2002 census.
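The summation of the corn and soybean products can be sketched as follows. The sketch assumes each product stores percent cover per pixel, that every pixel has a nominal area of 25 ha (500 m × 500 m), and that each pixel carries a county or state identifier. These are illustrative assumptions, and the function and variable names are hypothetical. Because the sum is linear, adding the two crops per pixel or adding their zone totals gives the same result.

```python
from collections import defaultdict

PIXEL_AREA_HA = 25.0  # assumed nominal 500 m x 500 m pixel

def zonal_crop_area(corn_pct, soy_pct, zone_ids, pixel_area_ha=PIXEL_AREA_HA):
    """Sum corn and soybean percent cover per pixel and total it by zone.

    corn_pct, soy_pct: percent cover (0-100) per pixel, in the same order.
    zone_ids: county or state identifier for each pixel.
    Returns {zone_id: combined corn plus soybean area in ha}.
    """
    totals = defaultdict(float)
    for corn, soy, zone in zip(corn_pct, soy_pct, zone_ids):
        # Convert the combined percent cover to hectares for this pixel.
        totals[zone] += (corn + soy) / 100.0 * pixel_area_ha
    return dict(totals)

# Hypothetical four-pixel example with two zones.
corn = [40.0, 0.0, 10.0, 60.0]
soy = [30.0, 50.0, 0.0, 40.0]
zones = ["A", "A", "B", "B"]
areas = zonal_crop_area(corn, soy, zones)
print(areas)
```

The resulting zone totals are the quantities that would be compared with the census values for the same counties or states.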
Figure 11 illustrates per-county and per-state corn plus soybean area comparisons between the MODIS estimates and the NASS census. The simple linear regression between NASS census data and MODIS estimates per county for both crops combined was statistically significant (MODIS = 0.90 × NASS census + 3580; r² = 0.937) (Fig. 11a), and the plot showed less variability than the single-crop plots. At the state level, the regression line (MODIS = 0.93 × NASS census + 190,138; r² = 0.984) was close to the 1:1 line (Fig. 11b).

Fig. 11. Validation of 500-m MODIS regression tree model for corn plus soybean field areas (ha) with NASS 2002 census data at (a) county level and (b) state level. Dashed line is 1:1. ** Significant at the 0.01 probability level.
For all 16 states, the total corn and soybean acreage from the MODIS product was only 1% lower than the NASS census data (52,041,780 NASS ha vs. 51,701,475 MODIS ha) (Table 5). Iowa, Illinois, Nebraska, and Indiana, the four highest combined production states, had low RMSEs as a percentage of the NASS census. North Carolina (22% overestimate) was the least accurate relative to the NASS census data. However, it was a small contributor and thus did not significantly affect the national results.
Table 5. Corn plus soybean areas (ha) from county-based NASS 2002 census and 500-m MODIS estimates, root mean square errors (RMSE), and the simple linear regression equations relating NASS census data and 500-m MODIS estimates.
When compared with the single-crop results, the regressions between the NASS census data and the MODIS estimates at the county level per state had better r² values, and all regressions were significant at the 99% level (Table 5). Additionally, 13 of the 16 states had r² values greater than 0.8. The results indicate that the errors in the individual corn and soybean maps largely represent confusion between the two crop categories.
 |
CONCLUSIONS
|
|---|
Corn and soybean area mapping using a 500-m MODIS time-series data set was conducted for the dominant production areas of the continental United States. Regression tree analysis was used to identify corn or soybean fields from annual 500-m MODIS metrics using a set of NASS CDLs as training data. The 19 and 17 states that covered 95% of the total corn and soybean fields in the United States, respectively, were selected for the final statistical analysis and validation, with 16 states included in both analyses. The validation was done at the national, state, and county levels using county-level statistics from the NASS 2002 census data.
Several important conclusions can be drawn from this study. First, the 500-m MODIS annual metrics were useful in differentiating corn and soybean areas from both other crops and other land cover types. The NDVI-related metrics were most prominent, reducing 36 and 32% of the deviance in the regression tree analysis for corn and soybean, respectively. Second, the parsimonious tree models derived from the most important metrics in the exhaustive analysis also estimated corn and soybean areas successfully, explaining 97% of the deviance captured by the nonparsimonious corn model and 94% of that captured by the nonparsimonious soybean model. The strength of the regression tree is its use as a data mining tool: numerous phenological measures and data transformations may be input to such a model to identify which ones are most useful for discrimination. Reduced sets of inputs and the resulting models can then easily be adapted to operational settings. Third, the regression tree analysis generated models that discriminated corn and soybean cover, pointing a way forward to improved annual crop inventorying. At national, state, and county scales, the estimates of corn and soybean areas using 500-m MODIS data agreed closely with NASS census data.
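The bagging step behind the per-pixel median (Breiman, 1996) can be illustrated with a deliberately small sketch. It replaces the full regression trees used in the study with one-split trees (stumps) so that the example stays short; this is a simplification and not the authors' implementation. All function names, the single NDVI-like metric, and the training values are hypothetical.

```python
import random
import statistics

def fit_stump(rows, targets):
    """Fit a one-split regression tree that minimizes squared error."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = statistics.fmean(left), statistics.fmean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split: predict the mean everywhere
        mean = statistics.fmean(targets)
        return (0, float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, row):
    """Return the leaf mean for the side of the split that row falls on."""
    f, threshold, left_mean, right_mean = stump
    return left_mean if row[f] <= threshold else right_mean

def bagged_median_predict(rows, targets, new_row, n_models=30, seed=0):
    """Median prediction over stumps fit to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(rows)
    predictions = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stump = fit_stump([rows[i] for i in idx], [targets[i] for i in idx])
        predictions.append(predict_stump(stump, new_row))
    return statistics.median(predictions)

# Hypothetical training pixels: one metric, percent crop cover as target.
metric_rows = [[0.2], [0.3], [0.7], [0.8]]
crop_pct = [5.0, 10.0, 60.0, 70.0]
estimate = bagged_median_predict(metric_rows, crop_pct, [0.75])
print(estimate)
```

Taking the median rather than the mean of the 30 predictions makes the per-pixel estimate less sensitive to an occasional poorly fit bootstrap model.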
Future steps on this topic will include (i) addressing and ameliorating the regional variation in crop type mapping accuracy; (ii) building multitemporal models that allow results to be generated across a range of years, and thus enable direct comparison of results through time; and (iii) translating the models to other important crop production areas of the world. Preliminary tests of the method have begun for Brazil and Argentina. Also, the 250-m MOD44B inputs from MODIS Collection 5 processing are expected to improve on these initial results by providing observations more spatially correlated with the scale of agricultural lands in the United States. The goal is to test a standard, robust procedure for improving the monitoring of key cash crops globally, particularly in light of changing energy production priorities. The primary limitation to doing so is not the MODIS inputs or the tree algorithm, but the availability of high-quality training information for model calibration.
 |
ACKNOWLEDGMENTS
|
|---|
This work was made possible through funding by the NASA Applied Science Program and USDA Foreign Agriculture Service via the Global Agriculture Monitoring (GLAM) project, grant code NNS06AA03A. We would like to acknowledge the assistance of Chris Justice of the University of Maryland, and Brad Doorn and Curt Reynolds of the USDA FAS in implementing this study.
 |
REFERENCES
|
|---|
- Allen, R., G. Hanuschak, and M. Craig. 2002. History of remote sensing for crop acreage. Available at www.usda.gov/nass/nassinfo/remotehistory.htm [verified 29 Aug. 2007]. USDA-NASS, Washington, DC.
- Boatwright, G.O., and V.S. Whitehead. 1986. Early warning and crop condition assessment research. IEEE Trans. Geosci. Remote Sens. 24:54–64.
- Breiman, L. 1996. Bagging predictors. Mach. Learn. 24:123–140.
- DeFries, R.S., M. Hansen, M. Steininger, R. Dubayah, R. Sohlberg, and J. Townshend. 1997. Subpixel forest cover in Central Africa from multisensor, multitemporal data. Remote Sens. Environ. 60:228–246.
- De Wit, A.J.W., and J.G.P.W. Clevers. 2004. Efficiency and accuracy of per-field classification for operational crop mapping. Int. J. Remote Sens. 25:4091–4112.
- Doraiswamy, P.C., S. Moulin, P.W. Cook, and A. Stern. 2003. Crop yield assessment from remote sensing. Photogramm. Eng. Remote Sens. 69:665–674.
- Doraiswamy, P.C., J.L. Hatfield, T.J. Jackson, B. Akhmedov, J. Prueger, and A. Stern. 2004. Crop condition and yield simulations using Landsat and MODIS. Remote Sens. Environ. 92:548–559.
- El-Magd, I.A., and T.W. Tanton. 2003. Improvements in land mapping for irrigated agriculture from satellite sensor data using a multi-stage maximum likelihood classification. Int. J. Remote Sens. 24:4197–4206.
- Gallego, F.J. 2004. Remote sensing and land cover area estimation. Int. J. Remote Sens. 25:3019–3047.
- Hall, F.G., and G.D. Badhwar. 1987. Signature extendable technology: Global space-based crop recognition. IEEE Trans. Geosci. Remote Sens. 25:93–103.
- Hansen, M.C., and R.S. DeFries. 2004. Detecting long-term global forest change using continuous fields of tree-cover maps from 8-km Advanced Very High Resolution Radiometer (AVHRR) data for the years 1982–99. Ecosystems 7:695–716.
- Hansen, M.C., R.S. DeFries, J.R.G. Townshend, M. Carroll, C. DiMiceli, and R.A. Sohlberg. 2003. Global percent tree cover at a spatial resolution of 500 meters: First results of the MODIS vegetation continuous fields algorithm. Earth Interact. 7:1–15.
- Hansen, M.C., R.S. DeFries, J.R.G. Townshend, R. Sohlberg, C. DiMiceli, and M. Carroll. 2002. Towards an operational MODIS continuous field of percent tree cover algorithm: Examples using AVHRR and MODIS data. Remote Sens. Environ. 83:303–319.
- Mergerson, J.W. 1981. Crop area estimates using ground gathered and Landsat data: A multitemporal approach. p. 1211–1218. In Proc. 15th Int. Symp. Remote Sens. Environ., Ann Arbor, MI. Environmental Res. Inst. of Michigan, Ann Arbor.
- Michaelsen, J., D.S. Schimel, M.A. Friedl, F.W. Davis, and R.O. Dubayah. 1994. Regression tree analysis of satellite and terrain data to guide vegetation sampling and surveys. J. Veg. Sci. 5:673–696.
- National Agricultural Statistics Service. 2002. Census of agriculture—Volume 1 Geographic area series census, State—County Data. Available at www.nass.usda.gov/Census/Create_Census_US_CNTY.jsp [verified 30 Aug. 2007]. USDA-NASS, Washington, DC.
- National Agricultural Statistics Service. 2007. Cropland data layer. Available at www.nass.usda.gov/research/Cropland/SARS1a.htm [verified 30 Aug. 2007]. USDA-NASS, Washington, DC.
- Ozdogan, M., C.E. Woodcock, and G.D. Salvucci. 2003. Monitoring changes in irrigated lands in southeastern Turkey with remote sensing. Int. Geosci. Remote Sens. Symp. 3:1570–1572.
- Perlack, R.D., L.L. Wright, A.F. Turhollow, R.L. Graham, B.J. Stokes, and D.C. Erbach. 2005. Biomass as feedstock for a bioenergy and bioproducts industry: The technical feasibility of a billion-ton annual supply. Available at http://feedstockreview.ornl.gov/pdf/billion_ton_vision.pdf [verified 30 Aug. 2007]. USDE-USDA, Washington, DC.
- Prince, S.D., and M.K. Steininger. 1999. Biophysical stratification of the Amazon basin. Glob. Change Biol. 5:1–22.
- Quarmby, N.A., J.R.G. Townshend, J.J. Settle, K.H. White, M. Milnes, T. Hindle, and N. Silleos. 1992. Linear mixture modeling applied to AVHRR data for crop area estimation. Int. J. Remote Sens. 13:981–989.
- Rembold, F., and F. Maselli. 2006. Estimation of inter-annual crop area variation by the application of spectral angle mapping to low resolution multitemporal NDVI images. Photogramm. Eng. Remote Sens. 72:55–62.
- Reynolds, C.A. 2001. Input data sources, climate normals, crop models, and data extraction routines utilized by PECAD. Available at www.pecad.fas.usda.gov/cropexplorer/datasources.cfm [verified 7 Sept. 2007]. USDA, Washington, DC.
- Turker, M., and M. Arikan. 2005. Sequential masking classification of multi-temporal Landsat7 ETM+ images for field-based crop mapping in Karacabey, Turkey. Int. J. Remote Sens. 26:3813–3830.
- Venables, W.N., and B.D. Ripley. 1994. Modern applied statistics with S-Plus. Springer-Verlag, New York.
- Vermote, E.F., N. El Saleous, and C.O. Justice. 2002. Atmospheric correction of MODIS data in the visible to middle infrared: First results. Remote Sens. Environ. 83:97–111.
- Vermote, E.F., N. El Saleous, C.O. Justice, Y.J. Kaufman, J.L. Privette, L. Remer, J.C. Roger, and D. Tanre. 1997. Atmospheric correction of visible to middle-infrared EOS-MODIS data over land surfaces: Background, operational algorithm and validation. J. Geophys. Res. 102:17,131–17,141.
- Wan, Z., Y. Zhang, Q. Zhang, and Z.-L. Li. 2002. Validation of the land-surface temperature products retrieved from Terra Moderate Resolution Imaging Spectroradiometer data. Remote Sens. Environ. 83:163–180.
- Wardlow, B.D., J.H. Kastens, and S.L. Egbert. 2006. Using USDA crop progress data for the evaluation of greenup onset date calculated from MODIS 250-meter data. Photogramm. Eng. Remote Sens. 72:1225–1234.
- Wolfe, R.E., M. Nishihama, A.J. Fleig, and D.P. Roy. 1999. MODIS operational geolocation error analysis and reduction methodology. Int. Geosci. Remote Sens. Symp. 1:449–451. doi:10.1109/IGARSS.1999.773529.