A Review of EOS Quality Assessment (QA) Methodology and EOSDIS Support for QA

--Bob Lutz (rlutz@ltpmail.gsfc.nasa.gov), Raytheon ITSS, Code 423, Goddard Space Flight Center, Greenbelt, Maryland 20771

INTRODUCTION

One of the main objectives of any scientific data processing system is that suspect and bad data be identified and flagged before release to the user community. This is a challenging task within the Earth Observing System (EOS) [1] due to the large volume of data produced (one terabyte per day), the near-real-time mode between data production and distribution, and the numerous error sources that may affect data quality. Quality control measures applied to the data before release to the general public are referred to as Quality Assurance or Quality Assessment (QA) within EOS. QA is one of three components of quality control, with Calibration and Validation being the other two processes [2].

The AM-1 spacecraft will be the first comprehensive satellite of the EOS program. The spacecraft supports five instruments: ASTER, CERES, MISR, MODIS, and MOPITT. There is an associated instrument team (IT) for each instrument developing the science algorithms and processing software. MODIS, producing more data than the four other instruments combined, has its instrument team split into three disciplines: atmosphere, ocean, and land. The instrument teams and their programming staff use one or more science computing facilities (SCFs) to develop and test the science algorithm software and to support IT quality control analyses.

EOSDIS [3] has been under development to support the AM-1 spacecraft and future EOS missions. EOSDIS provides the computing and network facilities to support the generation, archiving, and distribution of geophysical data products from the data sensed by the EOS instruments. EOS AM-1 data are archived at four operational Distributed Active Archive Centers (DAACs): Goddard Space Flight Center (GSFC)--MODIS; Langley Research Center (LaRC)--CERES, MISR, MOPITT; EROS Data Center (EDC)--MODIS, ASTER; and the National Snow and Ice Data Center (NSIDC)--MODIS. EOSDIS's infrastructure, the EOSDIS Core System (ECS), provides the instrument team scientists the computing architecture needed to quality assess their data (i.e., the Client--the EOSDIS search and order tool).

QA is defined within EOS as the process that identifies and flags data products that obviously and significantly do not conform to the expected accuracies for the particular product type [3]. The QA process is performed primarily at the granule or smaller level, where a granule is defined as the smallest entity of a data set that is tracked and managed by the system. The instrument teams realized that it would be necessary to incorporate automated QA within the Product Generation Executives (PGEs--science software executives) to ensure at least the minimum needed quality control of all the data. Some limited QA would also be performed manually on subsets of the data products by staff at the DAACs and the SCFs. A further complexity in the planning of EOS QA was that EOSDIS and the science algorithms have been developed simultaneously. This necessitated that the resulting QA methodology be sufficiently flexible to respond to changing requirements from both a science and information system perspective.

The various types of archived QA parameters will be useful to several types of users. Instrument teams can use QA parameters for monitoring the "health" of their data products. They may be more concerned with sub-granule (e.g., pixel) level QA data, rather than granule-level QA data. The general science community may utilize QA parameters quite differently from the instrument teams, in that these parameters may be used to screen data for potential usefulness. Here granule level QA parameters are the most important elements, since these attributes are used to search and order data. The organization and storage of QA parameters within EOSDIS must be able to satisfy the requirements of both of these types of users.

In consideration of the above points, a successful EOS QA methodology must be able to integrate: a) the automated flagging of suspect data by the algorithm software, b) the capability of EOSDIS to alert the ITs and the DAACs to suspect data, c) the extraction of suspect data out of the archives for QA purposes and the subsequent storage of QA results within EOSDIS, and d) the organization, archiving, and display of all of these QA results (automated and human) in a user-friendly format for the scientific community.

DEVELOPMENT OF A STRATEGIC EOS QA PLAN

Introduction

Initial scoping of the effort was led by the EOSDIS Project Scientist and then transitioned to the Earth Science Data and Information System (ESDIS) Project's Science Office, under the coordination of a QA Scientist. The development of an EOS QA plan has entailed an interactive and iterative process, involving the ITs, the DAACs, and the developers of ECS. The EOS QA plan has also evolved over the course of several years, as ITs' ideas matured as to how they planned to perform their QA analyses and as the design of EOSDIS passed from a conceptual phase to implementation. An initial strategy was developed, documented, and published for review by the EOS science community [4].

Instrument team and DAAC participation

The ESDIS Science Office requested that the instrument teams develop and submit Draft IT QA Plans to ESDIS several years prior to launch, so that EOSDIS (which was under concurrent development) could support the needs of the ITs. These draft plans contained operational scenarios of the IT QA methodology, including QA data flows and the QA-related functionality of EOSDIS, and a description of the content and format of QA parameters stored in the products [5]. Also included in the IT QA plans were descriptions of the expected roles and responsibilities of the DAACs in the QA process. Although science QA is the responsibility of the instrument teams, some monitoring of the data (for example, visual inspection) could take place at the DAACs under agreements between the respective ITs and DAACs. Early specification of the responsibilities of the DAACs and ITs should help in the allocation of appropriate resources required to carry out the QA functions.

EOSDIS design evolution

A fundamental aspect of the EOSDIS design is the incorporation of sufficient flexibility to accommodate new and evolving IT and DAAC QA requirements. EOSDIS provided for generalized support for QA functionality when the specifics were unknown, and provided "hooks" within the system to allow additional functionality to be added at a later date.

EOSDIS was designed to present QA information to the user at several levels, with summary QA information provided at the granule level and more-specific QA information provided at the sub-granule level. This hierarchical format was intended to meet both the needs of instrument team scientists and the user population.

It was also recognized by the ECS developers that they would need to develop enhanced system functionality within EOSDIS to allow the instrument teams to access the data in a highly efficient manner. Special tools would be needed for the instrument teams to incorporate their QA analyses back into the system.

Coordination of the instrument teams, the DAACs, and the developers of EOSDIS

To provide a forum for an exchange of ideas, a series of workshops was held under the auspices of the ESDIS Science Office. A QA Working Group (QAWG) was formed with representatives from each AM-1 instrument team, their associated DAACs, and ECS contractor staff. In addition, SAGE III and the Data Assimilation System (DAS) teams were represented, and interested members from the EOS Interdisciplinary Science Teams were invited to participate. The QAWG generates and works action items and acts as a liaison to the ITs and DAACs on all QA issues. Through these meetings the ITs and DAACs learned of each other's needs through a discussion of their individual QA plans; the strengths and weaknesses of EOSDIS were identified by discussing QA scenarios; and guidance was gathered from the user community in regard to the resulting EOS QA methodology.

DISCUSSION OF EOS QA METHODOLOGY

Introduction--dissemination of information

The ITs were provided a "generic" QA Plan [6] by the ESDIS Science Office to provide guidance and some commonality as to what elements should be covered in their respective plans. Draft QA Plans [7-15] were submitted to the ESDIS Science Office and, in turn, were distributed among the members of the QAWG. This included QA plans from each AM-1 instrument and SAGE III. Many of these draft plans are now outdated and reflect only the team's QA methodology at the time of submission. Final QA Plans are due several months before the AM-1 launch. In addition, Kahn [16, 17] presented the early QA thoughts of the MISR team to the general science community via two articles in The Earth Observer. Three QA Workshops were also held in conjunction with this effort: November 1996, July 1997, and January 1998.

The resulting EOS QA methodology that evolved from the QAWG peer review of the Draft QA Plans and from sessions at the workshops is presented through discussions of: a) Archived QA (how QA results are stored), b) Operational QA (how QA is done), and c) User QA (how QA results will be used). The final part of this section discusses the special requirements that were specified for EOSDIS by the QAWG, to support IT and DAAC QA needs.

Archived QA

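QA parameters are archived within EOSDIS at several levels:

a) Granule level (within the product metadata)
b) Sub-granule level (within the data product itself)
c) External QA products (stored separately from the data product)

These three types of QA parameters are described below.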

Granule level

At the granule level, QA parameters are stored in the metadata (Table 1). QA metadata consist of core (all products) and non-core (product specific) metadata [18].

Table 1: Granule level QA metadata

Granule Level Metadata

Metadata type                     Group               Attribute
--------------------------------  ------------------  -------------------------
Core (common to all products)     QACollectionStats   AutomaticQualityFlag
                                                      OperationalQualityFlag
                                                      ScienceQualityFlag
                                  QAStats             QAPercentMissingData
                                                      QAPercentOutofBoundsData
                                                      QAPercentInterpolatedData
                                                      QAPercentCloudCover
Non-core (product specific        QA PSAs             defined by ITs
attributes [PSAs])
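As an illustration only, the granule-level QA metadata listed in Table 1 could be modeled as a simple record. The following Python sketch is not the ECS data model: the class names, field types, the granule identifier, and the example PSA are hypothetical, and "Passed" is used because it is one of the two flag values named in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class QAFlags:
    """Core quality flags; None means "not set yet" (an assumption)."""
    automatic: Optional[str] = None      # AutomaticQualityFlag, set within the PGE
    operational: Optional[str] = None    # OperationalQualityFlag, set by DAAC staff
    science: Optional[str] = None        # ScienceQualityFlag, set by the IT


@dataclass
class QAStats:
    """Core QA statistics, expressed as percentages of the granule."""
    percent_missing: float = 0.0         # QAPercentMissingData
    percent_out_of_bounds: float = 0.0   # QAPercentOutofBoundsData
    percent_interpolated: float = 0.0    # QAPercentInterpolatedData
    percent_cloud_cover: float = 0.0     # QAPercentCloudCover


@dataclass
class GranuleQAMetadata:
    granule_id: str                      # hypothetical identifier
    flags: QAFlags = field(default_factory=QAFlags)
    stats: QAStats = field(default_factory=QAStats)
    # Non-core product-specific attributes (PSAs), defined by each IT.
    psas: Dict[str, object] = field(default_factory=dict)


example = GranuleQAMetadata(
    granule_id="granule-0001",
    flags=QAFlags(automatic="Passed"),
    stats=QAStats(percent_missing=2.5, percent_cloud_cover=40.0),
    psas={"HypotheticalTeamAttribute": 7},
)
```

Keeping the core attributes in fixed fields and the PSAs in an open dictionary mirrors the split in Table 1 between attributes common to all products and those defined by each team.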

Sub-granule metadata

Data products may also contain QA parameters within the product itself rather than in the metadata. Generally, but not necessarily, these are at the same resolution as the data product. For example, several teams (e.g., ASTER) are including QA data planes within all their products that contain QA information for each pixel. This approach allows the user to visualize per pixel QA information in a common format for the entire team's data products. It provides a consistent format for interpretation by casual and sophisticated users and is useful for data screening at the pixel level.
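A minimal sketch of pixel-level screening with a QA data plane follows. It assumes a QA plane with the same dimensions as the data and a code of 0 meaning "no problem"; both are assumptions made for illustration, since the actual QA codes are defined by each instrument team.

```python
from typing import List, Optional

GOOD = 0  # assumed code for "no QA problem"; real codes are team-defined


def screen_pixels(data: List[List[float]],
                  qa_plane: List[List[int]]) -> List[List[Optional[float]]]:
    """Return a copy of data with None wherever the QA plane is not GOOD."""
    if len(data) != len(qa_plane) or any(
            len(d_row) != len(q_row) for d_row, q_row in zip(data, qa_plane)):
        raise ValueError("data and QA plane must have the same shape")
    return [
        [value if qa == GOOD else None for value, qa in zip(d_row, q_row)]
        for d_row, q_row in zip(data, qa_plane)
    ]


data = [[290.1, 291.4], [289.7, 305.2]]
qa_plane = [[0, 0], [1, 0]]   # the pixel at row 1, column 0 is flagged
screened = screen_pixels(data, qa_plane)
```

Here the flagged pixel is replaced by None and the other three values pass through unchanged, which is the kind of per-pixel data screening the QA planes are meant to support.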

External QA products

External QA products may be generated as the data products are produced and contain QA information at the granule level primarily used by the ITs at the SCFs. These include QA log files (MODIS Land), exception logs (MOPITT), or QA reports (CERES). These QA products are, in essence, QA granules, which are searchable and orderable within the EOSDIS system. The QA granules may correspond one-to-one with the data granule (MODIS Land, MISR) or may be a QA summary file of many granules (MOPITT). They may be permanently stored or temporarily created for the instrument team's needs. For example, MOPITT is creating QA summary logs for their QA analyses. These files have a short archival lifetime, on the order of days. To support analysis of these QA granules at the SCFs, several teams (e.g., MISR, MODIS Land) are developing databases external to EOSDIS. Within these databases, they copy, store, and analyze these external QA products.

Operational QA

This section presents a description of the three general components of Operational QA.

Components of operational QA

Though the specifics may vary for each IT's QA scenario, there are three general operational components of EOS QA:

PGE QA Analysis--Within this component, the data products are produced (generally at a DAAC) from science algorithms supplied by the instrument science teams. Numerous QA parameters (operational and product-related) are generated by these algorithms. These generated QA parameters may be at the granule or sub-granule level, and possibly summarized or subsetted. These QA parameters are then sorted and subdivided among the product metadata, the data product, and any external QA products. From criteria specified by the instrument teams, the core metadata field--the AutomaticQualityFlag (flag and text)--is set within the PGEs. In addition, some teams (e.g., ASTER) are making extensive use of alerts or alarms in their processing software to warn them of anomalous conditions that occur during production. These alerts may be: a) automatically sent to DAAC operations staff, who forward these messages to the instrument team, or b) sent to processing logs, which later can be downloaded from the system and analyzed.

DAAC QA Analysis--The DAACs are responsible for monitoring non-science QA aspects of data production. They are to check the integrity of the data at the file level, to ensure that the data are not corrupted in the transfer, archiving, or retrieval processes. This analysis may include checking that the file can be opened and that the file size is correct. In addition, some DAACs may perform limited science QA functions, in agreement with their ITs. This may involve monitoring summary QA statistics and alerts generated from the PGE QA analysis or visually displaying data to detect gross problems. The results from non-science QA analyses performed at the DAAC are summarized in the core metadata OperationalQualityFlag and text field by DAAC staff.

SCF QA Analysis--The ITs ultimately are responsible for the science QA of their data products. Each instrument team has developed a different strategy and set of procedures to accomplish this objective. There are two general types of QA analyses performed by instrument team scientists: 1) those of an investigative nature, principally analyzing suspect data, and 2) those of a routine nature, involving regular screening of the data product. Many teams are estimating that they can routinely examine 10% of the daily averaged data production. It is expected that, during the first year, a greater emphasis will be placed on analyzing suspect data. Maturity in the understanding of the behavior of the instruments and revised science algorithms should see a gradual change from investigative QA to routine QA screening in later years. A subset, or the entire data product stream for instruments with low data rates, may be examined by scientists at the SCF. For most AM-1 instrument teams it is impractical to transfer the full set of data products from the DAAC to the SCF because of prohibitively large network requirements. Therefore, over a given time period, most teams intend to order only statistical samples and samples of those data with quality problems indicated by their QA metadata. Some teams will receive all the external QA products associated with their products and infer from these which products should be ordered for QA purposes.

The results of science QA analyses performed at the SCF are summarized in the core metadata ScienceQualityFlag and text field by the instrument team scientists.
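The PGE and SCF components described above can be sketched in a few lines of Python. This is an illustrative simplification rather than any team's procedure: the two thresholds, the function names, and the granule records are hypothetical, and ordering every flagged granule (instead of a sample of them) is a simplifying assumption. The 10% sampling fraction follows the estimate quoted above.

```python
import random
from typing import Dict, List

# Hypothetical thresholds; the actual criteria are specified by each IT.
MAX_PERCENT_MISSING = 10.0
MAX_PERCENT_OUT_OF_BOUNDS = 5.0


def automatic_quality_flag(percent_missing: float,
                           percent_out_of_bounds: float) -> str:
    """PGE step: set the AutomaticQualityFlag from team-specified criteria."""
    if (percent_missing > MAX_PERCENT_MISSING
            or percent_out_of_bounds > MAX_PERCENT_OUT_OF_BOUNDS):
        return "Failed"
    return "Passed"


def select_for_scf(granules: List[Dict[str, str]],
                   sample_fraction: float = 0.10,
                   seed: int = 0) -> List[str]:
    """SCF step: order flagged granules plus a random sample of the rest."""
    suspect = [g["id"] for g in granules if g["automatic_flag"] == "Failed"]
    others = [g["id"] for g in granules if g["automatic_flag"] != "Failed"]
    sample_size = round(sample_fraction * len(others))
    rng = random.Random(seed)  # seeded so the selection is repeatable
    return suspect + rng.sample(others, sample_size)


granules = []
for i in range(20):
    missing = 50.0 if i in (3, 11) else 1.0   # two granules with heavy data loss
    granules.append({
        "id": f"granule-{i:02d}",
        "automatic_flag": automatic_quality_flag(missing, 0.0),
    })

to_order = select_for_scf(granules)
```

With 20 granules of which two are flagged, the selection contains the two flagged granules followed by two randomly sampled ones, so only a fifth of the data would need to cross the network from the DAAC to the SCF.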

User support for QA

The science community is provided with some tools within EOSDIS to enable them to access the generated QA parameters efficiently. Although two of the topics (the EOSDIS Client and Subscription) presented in this section were not developed specifically for QA purposes, these may be used to exploit QA information associated with the data products. The User Comment Document and DAAC User Services Groups are also discussed here. These latter features may aid the user in interpreting and describing (for other users and the ITs) information related to the quality of the data.

EOSDIS Client

The EOSDIS Client is the tool that the science community will use to search, browse, and order AM-1 data. Users are able to search on granule-level core and product specific attributes to define the granules that they wish to order. For example, a search initiated with "ScienceQualityFlag = Passed" in the search criteria field will return only granules that have passed IT QA analyses.

Within the Client, the user will be able to display all QA-related metadata (core and product specific) for the product. This allows a user to see the product QA parameters as a group, helping the user decide which data to order.

EOS policy states that all AM-1 data, regardless of the quality, be visible, searchable, and orderable from EOSDIS. Based on a QAWG recommendation, the Client requires users to manually acknowledge that they are ordering poor quality data (e.g., when any of the Automatic, Operational, or Science Quality flags is set to "Failed"). This occurs whether or not the user has searched on these attributes.
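The search and acknowledgment behavior described in this section can be illustrated with a short Python sketch. It does not reproduce the Client's actual interface; the function names and the three-granule catalog are hypothetical, and only the attribute names and the "Passed"/"Failed" values come from the text.

```python
from typing import Dict, List

QUALITY_FLAGS = ("AutomaticQualityFlag", "OperationalQualityFlag",
                 "ScienceQualityFlag")


def search(granules: List[Dict[str, str]],
           criteria: Dict[str, str]) -> List[Dict[str, str]]:
    """Return granules whose metadata match every attribute in criteria."""
    return [g for g in granules
            if all(g.get(name) == value for name, value in criteria.items())]


def needs_acknowledgment(granule: Dict[str, str]) -> bool:
    """True when any of the three core quality flags is set to "Failed"."""
    return any(granule.get(flag) == "Failed" for flag in QUALITY_FLAGS)


catalog = [
    {"id": "A", "AutomaticQualityFlag": "Passed", "ScienceQualityFlag": "Passed"},
    {"id": "B", "AutomaticQualityFlag": "Failed", "ScienceQualityFlag": "Passed"},
    {"id": "C", "AutomaticQualityFlag": "Passed", "ScienceQualityFlag": "Failed"},
]

passed_science = search(catalog, {"ScienceQualityFlag": "Passed"})
warn_ids = [g["id"] for g in passed_science if needs_acknowledgment(g)]
```

Granule B is returned by the search on ScienceQualityFlag yet still requires acknowledgment because its AutomaticQualityFlag is "Failed", which shows why the check is applied independently of the attributes the user searched on.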

Subscription

The Subscription functionality within EOSDIS allows the user community to place standing orders on future EOS data. Users specify to have the data either automatically sent to their facility ("pushed"), or to be sent a notification that the data are ready to be extracted or "pulled" from the system. Again, users can specify QA core and product specific attributes as qualifiers within their subscriptions, enabling them optionally to filter poor-quality data. It should be noted that in the case of poor-quality data being automatically pushed to the user, EOSDIS does not warn the data receiver, as is done within the Client. In this push scenario, as well as an order using the Client, the user will be provided all related QA core and product-specific metadata with the data product.
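A subscription with QA qualifiers might be sketched as follows. The class, its fields, and the returned action strings are hypothetical and do not describe the EOSDIS subscription interface; the sketch only illustrates the push/pull distinction and the optional QA filtering described above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Subscription:
    user: str
    delivery: str                          # "push" or "pull"
    qualifiers: Dict[str, str] = field(default_factory=dict)

    def matches(self, metadata: Dict[str, str]) -> bool:
        """A granule matches when every qualifier equals its metadata value."""
        return all(metadata.get(k) == v for k, v in self.qualifiers.items())

    def action_for(self, metadata: Dict[str, str]) -> str:
        """Describe what happens when a new granule arrives."""
        if not self.matches(metadata):
            return "ignore"
        if self.delivery == "push":
            # Data are sent with their QA metadata; no quality warning is issued.
            return "send data with QA metadata"
        # Pull: the user is notified and later extracts the data.
        return "send notification"


new_granule = {"AutomaticQualityFlag": "Failed"}

unfiltered_push = Subscription(user="alice", delivery="push")
filtered_pull = Subscription(user="bob", delivery="pull",
                             qualifiers={"AutomaticQualityFlag": "Passed"})
```

In this sketch the unfiltered push subscription receives the failed granule without any warning, while the subscription that specifies AutomaticQualityFlag as a qualifier ignores it.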

User Comment Document

To enable users to communicate their quality concerns back into EOSDIS, a User Comment Document is associated with each data product. The User Comment Document is for users to provide scientific comments about specific granules or the entire data set. Comments will be reviewed by DAAC staff for appropriateness and may be forwarded to the instrument teams for investigative action.

User Services

In addition, users may communicate directly with the DAAC User Services Groups, established at every DAAC, with quality-related questions. Staff employed in these positions will investigate the nature of these user concerns. If necessary, they will consult with instrument team personnel for guidance and resolution of the question.

Specific requirements of operational QA

Within discussions of the QA scenarios at the workshops, several new requirements were specified for EOSDIS to support the needs of operational QA. EOSDIS had been designed to be flexible in adapting to evolving requirements.

CONCLUSIONS AND FUTURE QA ENHANCEMENTS WITHIN EOSDIS

Currently, EOS policy states that all data products are to be made available to the general science community. As a consequence, the importance of mechanisms to ensure the quality of the products prior to their distribution has been recognized, and an end-to-end QA approach has been developed. This paper has described the different elements of the EOS QA approach from data production through archiving that have been adopted by the AM-1 science teams and the data producers at the DAACs.

The design of EOSDIS has proven to be adaptive to the new QA requirements described in this paper. Future enhancements to EOSDIS may include granule-level data visibility and access controls that will allow developers (science teams) and producers (DAACs) to temporarily hold specific data sets that require more detailed QA analysis. Another requirement recently advocated is the need for an automated method to update the QA metadata outside of the production software. This requirement was born out of the realization that QA procedures may be automated through post-launch experience and characterization of the spaceborne instruments and of the science software used to produce the products.

QA is an evolving element within EOS. Communication is continuing among all entities (the instrument teams, the DAACs, and the developers of EOSDIS) in order to ensure that the quality of the large volume of EOS products is defined and documented. The user community will be involved in this process after launch by providing feedback from their experiences in trying to use the data. It is expected that user feedback will be important for the instrument teams to identify problems with their products and to fine tune their QA methodologies.

Many of the standard EOS products are new and without heritage, and may contain questionable data in the early post-launch period. Users of EOS data must now be made aware of the QA information associated with the products to encourage their proper use. This information will be made available within EOSDIS in a timely manner prior to launch.

ACKNOWLEDGMENTS

The author would like to acknowledge H. K. Ramapriyan, ESDIS, who has provided resources, guidance, and direction in the development of this effort. In addition, the author would like to thank the members of the QA Working Group for their participation in this endeavor.

REFERENCES

  1. Asrar, G. and J. Dozier: EOS Science Strategy for the Earth Observing System, American Institute of Physics Press, Woodbury, N. Y., 1994.

  2. Asrar, G. and R. Greenstone: 1995 MTPE/EOS Reference Handbook, National Aeronautics and Space Administration, Washington, D. C., NP-215, 278 pp., 1995.

  3. Asrar, G., and H. K. Ramapriyan: "Data and Information System for Mission to Planet Earth." Remote Sensing Reviews 13, pp. 1-25, 1995.

  4. Lutz, B.: "Quality Assurance Methodology for EOS Products." The Earth Observer Vol. 8, March/April 1996, pp. 59-62, 1996.

  5. Lutz, B.: Quality Assurance Procedures for EOS products : Concepts, Implementation and Archival. Draft 4, ESDIS Science Office, 1995.

  6. Lutz, B.: The QA Process: A Decomposition into Functional Components. Vol. 2, ESDIS Science Office, 1996.

  7. Leff, C.: ASTER Higher-Level Data Product Quality Assessment Plan. JPL D-13841, Jet Propulsion Laboratory, California, 1996.

  8. Anselmo, T., L. Chang, L. Coleman, D. Cooper, J. Escuadara, A. Fan, N. McKoy, K. McIntire, T. Murray, S. Nolan, J. Robbins, J. Stassi, S. Sullivan, C. Tolson, J. Kibler, E. Geir, and M. Mitchum: CERES Data Management System Quality Assurance Plan. Release 1, Hampton, Virginia, 1996.

  9. Diner, D., C. Bruegge, K. Crean, P. Glover, R. Kahn, S. Lewicki, J. Martonchik, S. McMuldroch, C. Moroney, and S. Paradise: MISR Processing Quality Assessment Plan (JPL D-13496) and MISR Science Quality Indicators (JPL D-13965). Jet Propulsion Laboratory, California, 1997.

  10. Chu, A., K. Strabala, R. Song, S. Platnick, M. Wanf, and S. Matoo: MODIS Atmosphere QA Plan. Vol. 1.1, Goddard Space Flight Center, Maryland, 1997.

  11. Roy, D.: The MODIS Land Quality Assurance Plan, Vol. 1.1, University of Maryland, Maryland, 1996.

  12. Jones, M., H. Montgomery, R. Veiga, D. Knowles, N. Che, and L. Goldberg: MODIS Level 1B QA Plan. Vol. 1.2, MCST Document #M0028, Goddard Space Flight Center, Maryland, 1996.

  13. Kilpatrick, K.: MODIS Oceans Quality Assurance Plan. Vol. 1.1, University of Miami, Florida, 1998.

  14. Bailey, P.: MOPITT Science Data Product Quality Assurance Plan. Vol. 1, National Center for Atmospheric Research, Colorado, 1996.

  15. Veiga, R. and A. Edwards: SAGE III Quality Assurance Plan, Hampton, Virginia, 1997.

  16. Kahn, R.: "How will we choose which quality flags and constraints to report for MISR Level 2 Data?", The Earth Observer, Vol. 7, May/June 1995, pp. 32-33, 1995.

  17. Kahn, R., D. Diner, E. Hansen, J. Martonchik, S. McMuldroch, S. Paradise, and R. West: "Quality Assessment for MISR Level 2 Data." The Earth Observer, Vol. 8, January/February 1996, pp. 19-21, 1996.

  18. Gross, C.: B.0 Implementation Earth Science Data Model, ECS Document #420-TP-015-002, Hughes Information Systems Company, 1997.

  19. Singhal, S.: QA Metadata Update Tool for the ECS Project. ECS Document #160-WP-002-001, Raytheon Information Systems Company, 1998.