This implementation plan recognizes that data and information management activities designed for a particular agency program are most responsive to the community of users within that program. In responding to mission objectives, agencies have used diverse data and information management approaches. The thrust of this interagency global change data and information management effort is to help coordinate the multiple individual agency programs related to global change and to provide the broad interdisciplinary capability needed by the USGCRP. This coordination will not only make more comprehensive data and information available, but will also help both to reduce redundancies and to increase the ability to use data and information from disparate sources.
A major focus of the GCDIS implementation plan is to develop procedures and criteria for establishing priorities for some of the key data and information sets needed to meet the objectives of the USGCRP. The establishment of priorities will be implemented by actively involving the other organizational elements of the USGCRP, the NAS, and the research and user communities. In the following sections, three levels of service, corresponding to three levels of priority, are provided as guidelines for the agencies in implementing specific functions. These levels of service are summarized in Table 3.
A particular activity of the GCDIS implementation will be coordinating the work needed to produce special data and information products for the users of global change data and information. These will include value-added data sets that require data from multiple agencies. For example, assessment studies will require the synthesis of models from individual parts of the global change system. Process-study data sets that result from the synthesis of data from different platforms will also be needed by researchers in other agencies or their programs. Information summarizing the global environment and its change will be produced by the USGCRP; these results will have high-priority availability. Many of the summary products will be in text, graph, and table formats rather than in numeric form.
The Content Subgroup, in coordination with other elements of the USGCRP, will identify and address those issues that affect the content of the GCDIS. This will include data from Federal agencies, State and local governments, and international organizations.
The ad hoc panels will comprise experts identified by the agencies and will be called on to respond to many of the content issues. The panels will assist in coordinating the identification and provision of data holdings and special products needed to respond to user needs in, for example, interagency efforts such as the Global Energy and Water Cycle Experiment (GEWEX).
|Level of Service|Description|
|---|---|
|Level 1|Users involved in all facets listed here.|
|Level 2|Users involved in defining data priorities, data set assembly, definition of special product needs, and definition of other content-related issues.|
|Level 3|User feedback and advisory group participation.|
Mechanisms and links to State, regional, and local governments must be developed to transfer priority data, metadata, documentation, and related software to appropriate local or national archives. Procedures will be developed in the mid-1990s to begin making these data and information available and to integrate these data and information into the existing archive systems. These procedures will include potential funding mechanisms to assist States with the transfer of nonfederally funded data. The procedures also will include recommendations for transferring data collected for federally funded projects to an archive accessible by the GCDIS. Examples of existing federally funded programs are the EPA State Data Management Program and the National Oceanic and Atmospheric Administration (NOAA) Coastal Zone Management Program. The USGCRP can use these existing funding mechanisms as models for future funding to State and local governments as part of the USGCRP budget.
Some U.S. global change scientists and institutions already know of such modern data in developing countries. In other cases, it may be necessary to explore the existence of such data through the International Geosphere-Biosphere Programme (IGBP) committees of these countries or through national institutions. The WDC system also may be of help in searching out such modern data sets.
The Content Subgroup will identify disciplinary regional and global data and information that are known to exist but that are not available within the United States. The subgroup also will identify potential priority disciplinary regional and global data and information for which a source is not known.
Another activity that will be actively pursued is the rescue of data that are needed for global change research but are in danger of deteriorating or of no longer being archived; such rescue must take place while the data and information in question are still accessible. The process of rescuing such data will include selective conversion to digital format. Another form of data rescue involves digitizing older tabular data, especially when these data represent important time series, to provide these data in maximally useful form for computer-assisted analyses.
Successful data rescue and digitization projects have been conducted with colleagues in Russia and China by units of WDC-A, at low cost. These collaborative projects can also be of benefit to scientists in the respective countries by providing access to technical assistance and data products. For example, receiving copies of their own data back in digital form (on compact disk-read only memory [CD-ROM] devices, for example) may make their data more useful for national studies.
The Content Subgroup, as a standing group, will make links from the data and information needed for key USGCRP research questions to the identification of, provision of, and priorities for the necessary data within the GCDIS. This will be done in the context of the USGCRP framework, which provides overall direction for the collection of data and the use of information for supporting global change research.
To provide such direction and to coordinate the interagency activities necessary to meet the goals of the USGCRP, four parallel, but interconnected, streams of activity and working groups have been established. These are
All four streams not only provide data and information, but also have specific requirements that the data and information management program must address. These requirements are driven by questions that the USGCRP must answer. These questions, in turn, may be used to set priorities for data content in the GCDIS.
The four working groups will be asked to provide priority questions that require information from the GCDIS. Expert advice will be sought from the agencies and from outside groups, such as the NAS, to develop detailed lists of data and information needed to develop the information to answer the USGCRP questions. Priorities for data and information will be derived from priorities associated with the questions that generated the information request. These data and information priorities will then be used to develop priorities for data and information from archives external to the USGCRP and will be used by the Access Subgroup to develop the level of implementation required for data and information in the GCDIS. In addition to this primary method of setting priorities for data and information, external advice will be sought from a number of sources.
To that end, the Content Subgroup will survey existing data and information, such as the documentation of scientific reports, assessment reports by the NAS, and written reports by the international scientific community. In addition, the subgroup will survey written reports from time to time and will attend meetings on an informal, member-by-member basis in order to identify new data sets that the process just described may have missed. Review of new national and international program requirements as they become available will be coordinated by the Content Subgroup.
To determine the availability of needed data and information, the subgroup will survey such current holdings as are defined by existing catalogs in the GCDIS. In this manner, gaps in the needed data can be determined. The survey will also identify and describe library and data collections important for global change use, beginning with GCDIS agencies' collections, then other Federal collections, then non-Federal collections (e.g., universities', State and local governments', nongovernment organizations', international). This information will then be made widely available through the GCDIS directory.
The Content Subgroup will use the existing international framework for identifying international data and information that may be needed. Representative organizations include the Intergovernmental Panel on Climate Change (IPCC), which was established by the World Meteorological Organization (WMO), and the United Nations Environment Program (UNEP). The IPCC assesses climate change for policymakers every several years, focusing on immediate issues and consequences. The IPCC prepares in-depth summary information for policymakers so that realistic response strategies can be created to manage climate change issues.
Many of the requirements underlined in the IPCC documents are addressed by continuing international programs. For example, international projects are coordinated by the World Climate Research Programme (WCRP), sponsored by the WMO and the ICSU, and by the IGBP, sponsored by the ICSU. Both the WCRP and the IGBP have addressed questions of data set identification.
For example, an IGBP report presents a summary of the scientific requirements for a 1-km data set. The identification of such a high-priority need by IGBP and other scientific groups led to a high priority for the collection of the data set. An international and interagency Global Land 1-km AVHRR Data Set Project is now underway. This is an unprecedented effort to collect a daily global data set for a 1.5-year baseline. The project includes the U.S. Geological Survey (USGS), NASA, and NOAA (in the United States) and the European Space Agency and Australia, along with several dozen high-resolution picture transmission stations as participating collection sites. The Committee on Earth Observations Satellites (CEOS) has endorsed this project, and asked its Working Group on Data to provide support needed for international coordination of project data management issues.
The focused programs of the USGCRP are major sources of global change data and information. Resource requirements for the focused programs' associated data and information management functions have been included in the USGCRP planning. Other major sources are the many Federal and other programs with objectives primarily for purposes other than global change whose data and information are vital to the USGCRP. Examples include the daily satellite and in situ weather observations, information output from climate models, tree ring and ice core measurements, forest inventories, biological and ecological observations, cartographic data, stream flow records, bathythermograph data, fossil fuel statistics, demographic data, and soil maps.
Such data and information from these other programs are critical not only for global change research, modeling, and assessments, but also to the focused data and information program to fill gaps in coverage, tie together diverse data sets, and improve the quality and usefulness of the data and information by providing ground truth at selected points for calibration. The latter is especially important for the remotely sensed data required by the USGCRP. Some of these critically important data and information are not now available to the USGCRP because they are intermingled with material subject to security, proprietary, or regulatory constraints, or are the property of entities outside the Federal Government, such as State and local governments, international sources, or individual researchers.
These other programs do not have global change as their primary theme. Therefore, the resources needed to make these data available to global change researchers (appropriate formats, products, and so forth) may not be available. A special effort is required to make the high-priority data and information from other programs available to GCDIS users in useful forms. The framework that will be used to accomplish this required aspect of GCDIS implementation is described in Appendix A on data capture. The IWGDMGC will coordinate with these other programs through appropriate interagency mechanisms to avoid duplication and to maximize efficiency.
|Level of Search Service|Scope|
|---|---|
|Level 1|All sources|
|Level 2|All U.S. sources|
|Level 3|All Federal sources|
|Not rated|As made available|
Complete documentation must do more than just describe the values represented in each field and the format information needed to read the media. It must fully document the data set from all possible points of view. The extent of data documentation, as with quality assurance, will depend on the importance of any particular global change data set. At a minimum, data set documentation for those data sets deemed of highest priority should contain the following:
|Level of Service|Description|
|---|---|
|Level 1|Full set of data management functions in the GCDIS, with packaged data set complete with necessary ancillary data; documentation adequate for full use with confidence; priority given to reassembly, if necessary.|
|Level 2|Full set of data management functions in the GCDIS; ancillary data identified; documentation is a detailed description with quality estimates; reassembly has lower priority.|
|Level 3|Minimal set of functions in the GCDIS; documentation is a summary of data sets.|
The IWGDMGC activities relative to the adoption of standards and guidelines for the documentation of global change data sets will be coordinated with other groups addressing similar issues, such as the FGDC.
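As an illustration of what documentation that goes beyond field values and media format might look like in machine-readable form, the sketch below collects the kinds of elements discussed above into a single record. This is a hypothetical example only; the field names are assumptions for illustration, not official GCDIS or FGDC standard element names.

```python
# Illustrative minimal documentation (metadata) record for an archived data set.
# All field names here are assumptions for the example, not official GCDIS or
# FGDC standard element names.

dataset_documentation = {
    "title": "Example sea surface temperature time series",
    "originator": "Hypothetical Principal Investigator",
    "time_coverage": ("1979-01-01", "1993-12-31"),
    "spatial_coverage": {"lat": (-90.0, 90.0), "lon": (-180.0, 180.0)},
    "variables": [
        {"name": "sst", "units": "degrees Celsius", "missing_value": -999.0},
    ],
    "format": "flat ASCII, one record per observation",  # how to read the media
    "quality_assurance": "range and consistency checks applied",
    "lineage": "Assembled from ship and buoy reports; see project report.",
    "ancillary_data": ["instrument calibration tables"],
}

def is_minimally_documented(doc):
    """Check that a record covers more than just field values and format."""
    required = ["title", "originator", "time_coverage", "variables",
                "format", "quality_assurance", "lineage"]
    return all(k in doc and doc[k] for k in required)

print(is_minimally_documented(dataset_documentation))
```

A check like `is_minimally_documented` is one simple way an archive could screen incoming data sets for the minimum documentation required at the highest priority level.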
A specific example of reassembly services is represented by the need for the detection of significant long-term global change. As stated in a recent report by the National Research Council, "We believe that the detection of significant long-term global change is so central to the goals of the USGCRP, and so clear an obligation to future scientists, that it should be considered explicitly for added emphasis in the early stages of the program."
This central requirement for the detection of global change, and the determination of its natural or human-induced origin, requires that long-term data bases of key parameters be established and updated regularly as the data and information from new observations become available. These special data bases each need to be of the highest quality and must combine the data and information from all the applicable observations into a standard, regularly produced, special data and information product with continuity from year to year. Examples include global ozone profiles, solar irradiances, cloud cover, snow cover, land cover, land and ice surface temperatures, and sea surface temperatures. All these special-standard products will have the functional service levels of essential priority data and information sets.
The importance of these special-standard products makes it particularly important that international agreements on both the individual parameters and on their formats and other aspects be reached as soon as possible. In addition to its need by the USGCRP, this special-standard product data and information activity will lay the groundwork for the United States to help lead the development of such international working relationships and agreements, wherever appropriate.
The advent of global change research has changed the way researchers investigate scientific issues. The issues of interest in global change research are some of the most complex ever contemplated. They go far beyond inquiries bounded by clear disciplinary lines. For example, assessing the effects of sea level rise on coastal regions requires data from more than a half-dozen disciplines to be integrated into one geographically registered data base. In response, agency data centers are adding new data and information products that are built with other GCDIS holdings.
In particular, existing issue-oriented Information Analysis Centers (IACs), such as the Department of Energy's (DOE's) Carbon Dioxide Information Analysis Center (CDIAC), provide these interdisciplinary services. The IACs proactively participate in the identification and creation of needed data sets from multiple extant sources to support specific issues. IACs support a user community beyond researchers, specifically including educators and the assessment staff that support policymakers.
The IACs focus on the integrated, value-added data and information products needed by both the research and policy-making communities to assess and mitigate specific global change issues. Whether global change IACs are formed as new entities within the GCDIS or incorporated into existing data centers, the establishment of IAC functionality is critical.
Researchers must be the primary source of quality assurance for checking their data, since they are closest to the collection process and are most knowledgeable about problems with the measurements. However, researchers are often reluctant to quality assure and document their data to archive standards. This is understandable because they often do not need such high standards for their own use of the data, and especially because they are not rewarded for doing so. (In some cases they may not be particularly good at it.) A project under which many principal investigators collect data has the obligation to enforce standards of formatting, definitions, documentation, and quality assurance among the various data streams being generated, in order to produce a consistent, integrated data base useful in systems analysis. The archive data center also has a major obligation in ensuring the quality of the data being archived. It is the last chance to review data quality before distribution to the research community at large. It is at this step that shortfalls in the quality assurance process at the principal investigator and project level can be identified and corrected.
Experience has proved that few data sets produced by researchers are entirely without problems. Often, quality assurance includes simple checks for missing or unreasonable values or inconsistent correlations. But a good data center will question further. Some large observational data sets are received unfiltered (for completeness); by applying reasonable constraints on the quality of observations (always working with the principal investigator), such data sets can be made trustworthy.
The resulting data set may be smaller, but confidence in the reliability of the data is greatly increased. In the case of computer models, appropriate quality assurance measures include error and sensitivity analysis requiring special codes and research techniques. Of course, any changes are made in conjunction with, and with the full approval of, the original researchers to ensure their appropriateness. The accompanying documentation clearly explains what was done during the quality assurance process.
Depending on the data involved, the effort can be extensive and often can be the single most expensive step in processing a data set for distribution. The cost of full quality assurance for a large data set makes it imperative that priorities be set within each data center so that only the most valuable data sets receive the maximum level of quality assurance. However, for these pivotal data sets, this level of quality assurance is absolutely necessary since future research and policy decisions may rest on careful analysis of these important data sets.
Data set documentation should specifically identify what quality assurance procedures were applied to the data by the principal investigator(s), the project team, or the archive data center. Data variables themselves may have quality tags associated with them to indicate the level of confidence placed in their value or to indicate known problems with measurements. The value and use of data quality tags will vary among data sets and the analyses to which they will be subject.
|Level of Service|Description|
|---|---|
|Level 1|The data will be put through a system of real-world analyses similar to those they will be subjected to by the user community. Important data sets should be released as beta test versions to interested researchers, who can exercise them thoroughly by applying actual analytical algorithms. Any problems can thus be identified and resolved at the data center (working in conjunction with the original suppliers of the data) before final release to the general user community.|
|Level 2|Data centers will scan all data for proper ranges of values and provide simple logical testing: summing values that should have a known total (e.g., 100 percent), correlating related values (such as cloud cover and sunshine hours), or locating sampling points relative to known regions (oceanographic samples should be located in the ocean).|
|Level 3|A data center will read all data sets for adherence to formatting conventions and proper identification of missing values.|
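The Level 2 and Level 3 checks above are simple enough to sketch in code. The fragment below is a minimal illustration only, not any agency's actual procedure; the record field names, the missing-value sentinel, and the tolerances are all assumptions made for the example.

```python
# Minimal sketch of Level 2/3 quality-assurance checks on one data record.
# Field names, ranges, and tolerances are illustrative assumptions, not any
# agency's actual conventions.

MISSING = -999.0  # assumed missing-value sentinel

def level3_check(record, expected_fields):
    """Level 3: adherence to formatting conventions (all expected fields present)."""
    return all(f in record for f in expected_fields)

def level2_checks(record):
    """Level 2: range checks and simple logical tests; returns a list of problems."""
    problems = []
    # Proper range of values: sampling points must have valid coordinates.
    if not (-90.0 <= record["lat"] <= 90.0 and -180.0 <= record["lon"] <= 180.0):
        problems.append("sampling point outside valid coordinate range")
    # Values that should sum to a known total (here, percentages summing to 100).
    parts = [record["pct_land"], record["pct_ocean"], record["pct_ice"]]
    if all(p != MISSING for p in parts) and abs(sum(parts) - 100.0) > 0.5:
        problems.append("surface-type percentages do not sum to 100")
    # Correlated values: total cloud cover is inconsistent with long sunshine hours.
    if record["cloud_cover_pct"] == 100.0 and record["sunshine_hours"] > 1.0:
        problems.append("cloud cover inconsistent with sunshine hours")
    return problems

record = {"lat": 34.2, "lon": -120.5, "pct_land": 40.0, "pct_ocean": 55.0,
          "pct_ice": 5.0, "cloud_cover_pct": 100.0, "sunshine_hours": 9.3}
print(level3_check(record, ["lat", "lon"]))  # required fields present
print(level2_checks(record))                 # flags the cloud/sunshine inconsistency
```

As the text notes, such automated screening is only a starting point; flagged records are resolved with the principal investigator rather than silently corrected.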
As a specific example of GCDIS content system implementation, the Content Subgroup will assume responsibility for the coordination necessary to identify a limited number (a dozen, more or less) of such policy-related issues. This will require coordinating with multiple groups, including other elements of the USGCRP and the NAS. For each such policy issue established, the Content Subgroup will form an ad hoc panel that includes representatives of the appropriate NAS groups, agency program managers, and agency data centers. It will be the responsibility of this panel of experts to identify
This approach, in which working groups bring together advisory-, management-, and working-level experts who span particular issues, will help ensure that the best data and information will be available so that progress can be made on problems of central national importance. Further assurance will be provided by giving the priority policy-related data and information identified by this approach the highest level of service, Level 1.
A pilot study has been initiated to test and demonstrate the applicability of this approach for GCDIS content coordination on an actual policy issue before implementing this approach on the full range of USGCRP policy issues. Because of the imperative to fill the planning gap on requirements for CO2 emissions monitoring that is an integral part of the Climate Framework Convention, the issue selected for the pilot study is the ability to determine regional sources and sinks for atmospheric CO2 for application as part of a monitoring system for a GHG emissions reduction agreement (see Figure 3). Preliminary estimates of the data and information needed to investigate this problem, available through processes such as those outlined in Figure 3, suggest that one immediate result of the pilot study will be the identification of the additional data and information management efforts that must be mounted to determine the regional sources and sinks of atmospheric CO2.
These three trends lead to the need for early consideration of several issues. The ways in which model output is captured and preserved will be critical. Learning from the Program for Climate Modeling Diagnosis and Intercomparison, the GCDIS will address guidelines for what data and information need to be preserved from particular models, as well as which models will have their output regularly preserved. Data assimilation is becoming an analytic method of choice. Not only will it be necessary to address the preservation of assimilated data products, but the extensive computational assets needed for a global data assimilation will increasingly become a necessary aspect of the data access question. Specifically, without assimilation, many data will be far less valuable.
Finally, the data assets of the various national and international agencies have value beyond their support of researchers seeking access in the sense of a library or archive. In the coming years, significant programs such as the World Ocean Circulation Experiment (WOCE) and GEWEX will mount major field campaigns that will depend heavily on these data assets. These campaigns will depend on other data not only for postcampaign analysis, but also to support field operations during the experiments themselves.