Natural Resources Information Management Strategy
Guiding Principles for Data Management
Few people trawl through databases or technical reports for fun. Invariably, they are searching for information that supports their goals. Moreover, people with diametrically opposed goals will probably approach most sources of environmental and natural resources information with a view to finding material useful for their own specific purposes. It is therefore the responsibility of those who evaluate, interpret, and present the information to do so as clearly and unambiguously as possible - within the constraints of time and resources. However, the final responsibility for using and presenting data rests with the client, not the data provider.
The long process of collecting, evaluating, and interpreting data and information ends with its presentation in a cohesive manner, so that others can not only understand the conclusions as stated but also make interpretative evaluations of their own.
Information holdings must contain sufficient detail so that clients - even years later - can understand how the results flow from the raw data, without having to make interpretations or, worse still, investigations of their own. Unless this objective is achieved, the custodian has not done their job properly.
Data management involves developing systematic processes and protocols designed to provide a framework for delivering quality information with a high degree of credibility.
This document provides a broad guide to the main elements that a data custodian needs to take into account in the data management process. Whilst not strictly sequential, these elements can be regarded as the minimum set of steps a custodian should undertake.
Standards and Methods
Data Custody (Data Entry, Storage, And Transfer Procedures)
Pricing and Access
Assessment and Reporting
Since data and information are increasingly being collected and analysed for multiple clients with different needs, data custodians (collectors, analysts, and providers) must be mindful that they are not the final step in the data use process - the data user takes the final step of working with and presenting the data. For this reason custodians should attempt to preserve data 'depth' and variability, or alternatively specify how the information provided was generated, and the impacts and implications of this.
Because the custodian must understand their clients' information needs, identifying and consulting with data users is the most fundamental aspect of the data management process.
Custodians must identify data users and consult with them before setting the data collection program's objectives and goals. The needs of the data users will constrain and guide:
- the type of data collected
- the methods of data collection
- the data collection regime
- the standards to which data collection is carried out
- appropriate data processing/culling methodologies
- appropriate means of dissemination or publication.
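Captured in code, a user needs analysis can be as simple as a structured record per client covering the points above. This is an illustrative sketch only: the class and field names are assumptions, not a prescribed schema.

```python
from dataclasses import dataclass

# Illustrative sketch only: the class and field names are assumptions,
# not a prescribed schema for a user needs analysis.
@dataclass
class UserNeed:
    """One client's requirements, captured during consultation."""
    client: str
    data_types: list          # e.g. ["stream flow", "water quality"]
    collection_methods: list  # preferred or required collection methods
    collection_regime: str    # e.g. "monthly grab samples"
    standards: list           # standards the client expects compliance with
    dissemination: str        # e.g. "online download", "annual report"

needs = [
    UserNeed("Catchment Authority", ["stream flow"], ["gauging station"],
             "continuous", ["state hydrometric standard"], "online download"),
    UserNeed("Research Group", ["stream flow", "water quality"],
             ["grab sample"], "monthly", [], "online download"),
]
# The union of clients' requirements then constrains the collection program.
required_types = sorted({t for n in needs for t in n.data_types})
```

Recording needs in a structured form makes it straightforward to derive the minimum set of data types, methods, and standards the collection program must cover.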
Of course, the range of potential end-users can be vast, and so data custodians need to take a pragmatic approach when undertaking a user needs analysis.
Standards and Methods
Unless it is necessary - due to the specialised nature of the data or user needs - data collection, processing and reporting should comply with relevant state and national standards and policies. Where no relevant standards or policies exist, custodians will need to develop new ones in consultation with their clients and, where appropriate, with other standards-setting bodies. Whether defined, derived, or unique, all standards used need to be documented or referenced.
On the basis of the user needs analysis the custodian then develops a data collection program. Correctly set up with clearly defined objectives and goals, the data collection program will provide the framework within which the custodian can:
- inform users on data collection and maintenance plans and their progress
- maintain the quality of the information assigned to them in terms of accuracy, integrity, currency, standardisation and completeness
- correct faulty data brought to their attention and endeavour to notify users of amendments to the data
- provide a mechanism to facilitate easy access to the dataset
- act as the authoritative source of information for the dataset and act as a single point of contact for inquiries.
Data Custody (Data Entry, Storage, And Transfer Procedures)
Data entry, storage, maintenance, transfer, and archival procedures should be based on a secure chain-of-custody system that includes:
- where appropriate, simple but explicit primary sample tracking procedures
- use of data recording forms and good data entry procedures to ensure that correct and complete data are recorded
- data holdings that are suitably identified and maintained in a manner consistent with good record-keeping practices
- a formal mechanism for authorising data changes
- a process to ensure that quality control checks are always complete
- a process to ensure that documentation (such as access restrictions, licence conditions, copyright and disclaimer, and metadata) are always complete and attached to data holdings
- protocols for the transfer of data holdings including traceability mechanisms.
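One simple traceability mechanism for data transfer is a manifest carrying a cryptographic digest of the holding, recomputed on receipt. The manifest layout below is an assumption; any stable digest (SHA-256 is used here) serves the purpose.

```python
import hashlib

# Minimal sketch of a transfer manifest with a checksum for traceability.
# The manifest fields are illustrative assumptions, not a mandated format.
def make_manifest(name: str, payload: bytes, sender: str) -> dict:
    return {
        "holding": name,
        "sender": sender,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "size_bytes": len(payload),
    }

def verify_transfer(manifest: dict, received: bytes) -> bool:
    """Recompute the digest on receipt; a mismatch means loss or alteration."""
    return hashlib.sha256(received).hexdigest() == manifest["sha256"]

data = b"site,date,value\nA1,2001-07-03,4.2\n"
m = make_manifest("stream-flow-2001", data, "Hydrology Unit")
assert verify_transfer(m, data)             # intact transfer
assert not verify_transfer(m, data + b"x")  # altered in transit
```

Attaching such a manifest to each transfer gives both sender and receiver an auditable record that the holding arrived unaltered.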
Data holdings should be suitably identified and maintained in a manner consistent with good record-keeping practices. Documentation should be clearly linked to data holdings, and should state:
- the standards to which the data have been collected (or references to published methodologies given), or new or derived methodologies described in detail
- the processes used in calculations and computations
- corrections, normalisations, standardisations, calibrations, and other adjustments employed
- statistical procedures, numerical methods, and computer programs used
- methods for evaluating limits of uncertainty
- corrections for systematic errors
- the sources of all constants used.
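As a sketch of how such documentation requirements might be checked mechanically before a holding is released, the following compares a holding's documentation against the items listed above. The field names are illustrative assumptions, not part of any standard.

```python
# Hedged sketch: a completeness check over the documentation items the text
# lists. The field names here are illustrative, not a mandated schema.
REQUIRED_DOC_FIELDS = [
    "collection_standards",    # standards or published methodology references
    "calculations",            # processes used in calculations and computations
    "adjustments",             # corrections, normalisations, calibrations, etc.
    "statistical_methods",     # statistical procedures and programs used
    "uncertainty_methods",     # methods for evaluating limits of uncertainty
    "systematic_corrections",  # corrections for systematic errors
    "constants_sources",       # sources of all constants used
]

def missing_documentation(doc: dict) -> list:
    """Return the required items absent or empty in a holding's documentation."""
    return [f for f in REQUIRED_DOC_FIELDS if not doc.get(f)]

doc = {
    "collection_standards": "documented in methods report",
    "calculations": "described in processing appendix",
}
# missing_documentation(doc) lists the five items still to be supplied.
```

A check of this kind can be run whenever a holding is updated, so incomplete documentation is caught before data reach users.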
Auxiliary data should be collected throughout the measurement program and reviewed periodically. Supporting information may become necessary during the data interpretation process, and it is important in determining data validity (e.g. for deciding whether an outlier is a valid value or an artefact). Unusual conditions should always be recorded. Where relevant, records of equipment maintenance must also be documented (so as to track potential systematic errors).
Data documentation includes metadata records. As with all dataset documentation, maintenance of metadata records in a manner consistent with good record-keeping practices is crucial to sustaining the long-term value of the dataset. For this reason, metadata management must be an integral component of data management, not merely an afterthought.
The Australian and New Zealand Land Information Council has developed a set of Metadata Guidelines. In NSW, these have been implemented by way of the NSW Natural Resource Metadata Policy. Metadata should be maintained in a manner consistent with this policy.
For all stages of data collection, analysis, evaluation, and interpretation, there must be clear and precise documentation encompassing quality assurance / quality control guidelines and principles. A statement certifying the data quality should be attached to the data holding, and be recorded in the metadata.
Data validation is an essential element of data quality assurance. It involves reviewing a body of data against a set of criteria to establish that the data of interest are adequate for their intended use. It includes the identification of questionable data and the investigation of apparent anomalies.
Validation checks are necessary to identify measurement, recording, transmission/transcription, and processing errors (loss or alteration of data).
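Two of the simplest validation checks are a plausible-range check on each value and a completeness check on record counts. The sketch below assumes illustrative thresholds; real criteria would come from the collection standards.

```python
# Sketch of basic validation checks: a plausible-range check and a
# completeness check. The thresholds used here are assumptions.
def validate(values, low, high, expected_count):
    errors = []
    if len(values) < expected_count:
        errors.append(f"incomplete: {len(values)}/{expected_count} records")
    for i, v in enumerate(values):
        if not (low <= v <= high):
            errors.append(f"record {i}: value {v} outside plausible range")
    return errors

# pH-like readings with one transcription error (74.0 instead of 7.4)
# and one missing record.
readings = [6.8, 7.1, 74.0, 7.0]
problems = validate(readings, low=0.0, high=14.0, expected_count=5)
# problems flags the out-of-range value and the shortfall in record count;
# questionable data are then investigated rather than silently discarded.
```

Note that the check flags questionable values for investigation; deciding whether a flagged value is an artefact or a genuine extreme is an interpretation step, not an automatic deletion.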
Completeness and Representativeness: Completeness measures the amount of valid data obtained from a measurement process. Data representativeness measures how closely the measured results reflect "reality" (and should be accompanied by a statement of systematic methodological biases).
Data Compatibility and Comparability
Data compatibility among data holdings is controlled by the degree of similarity of sampling procedures and measurement systems, analytical techniques, quality assurance / quality control protocols, etc. Data comparability is a measure of the confidence with which one data set can be compared with another.
Data Review and Evaluation: Review and evaluation of data holdings involves an assessment of the overall utility of the holding with respect to data accuracy, precision, representativeness, and completeness, as well as how well these fulfil the objectives under which the data was collected.
Data verification checks must be performed before the data are assembled into a data holding, processed, sent out to data users and, ultimately, archived. Systematic inspection and periodic review of primary data records will ensure the general, ongoing quality of their contents. Any discrepancies or errors found must be corrected before storage, and the raw or processed data must be checked again when merged into an existing data holding. In all cases, changes or revisions must be justified and documented. Any changes should be made as additions to the original data; no erasures of records or data should be permitted.
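The append-only correction rule can be sketched as follows: originals are never erased, and each revision is recorded alongside them with its justification and authorisation. The class and field names are illustrative assumptions.

```python
import datetime

# Minimal sketch of the append-only correction rule: originals are never
# erased; each revision is appended with justification and authorisation.
# The class and field names are illustrative assumptions.
class Holding:
    def __init__(self):
        self.records = []    # (record_id, value) as originally entered
        self.revisions = []  # corrections appended, never overwriting

    def add(self, record_id, value):
        self.records.append((record_id, value))

    def correct(self, record_id, new_value, reason, authorised_by):
        # The original record stays in place; the change is fully documented.
        self.revisions.append({
            "record_id": record_id,
            "new_value": new_value,
            "reason": reason,
            "authorised_by": authorised_by,
            "when": datetime.datetime.now().isoformat(),
        })

h = Holding()
h.add("S1-2001-07", 4.2)
h.correct("S1-2001-07", 2.4, "transcription error found on review", "J. Smith")
# h.records still holds the original value; h.revisions documents the change.
```

Keeping corrections as additions preserves the full history of the holding, so later users can trace every published value back to the raw record it came from.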
Pricing and Access
Natural resources datasets are valuable for a wide range of management tasks across a range of disciplines. Effort expended to provide ready access to these datasets will generally result in increased utility, efficiency, and a reduction of duplication. On the other hand, custodians have a duty to safeguard the intellectual property and integrity of the data collected (and its sources where relevant), as well as protecting the investment of resources that went into collecting and maintaining the dataset.
Where appropriate, the custodian should:
- determine access restrictions on all or parts of the dataset
- set a pricing structure for the dataset
- develop data licences for the dataset
- provide on-line access to data via appropriate frameworks
- streamline offline data distribution mechanisms so as to provide a reliable and high quality service at minimal cost.
Licensing conditions should cover publication, copying, transferring, and on-selling of data (including Internet and other forms of electronic dissemination) and any copyright, acknowledgment, disclaimer or other notice required to accompany publication.
Unless it is necessary - due to the specialised nature of the data or user needs - data access restrictions, pricing structures, and licences should comply with relevant state and national standards and policies.
Electronic data handling, data reduction, and data storage systems are important parts of many analytical systems. They greatly facilitate data management and control of errors due to misreading, faulty transcription, or miscalculations. However, the security, and correct and proper functioning of such systems needs to be ensured so as to prevent the accidental or malicious loss or alteration of data.
Similarly, the secure and correct operation of distribution and transmission protocols and systems must be maintained.
Assessment and Reporting
To complete the data management cycle, an assessment of the overall utility of the data holding is required. This involves reviewing and evaluating:
- the types of data collected and methods used
- the data collection regime
- collection standards and processing methods
- communication with users
- the quality (accuracy, integrity, currency, standardisation and completeness) of the data holding
- access mechanisms (to the dataset, to associated information, and to the custodian for inquiries).
All of these should be reviewed with respect to the clients' information needs and the objectives and goals of the data collection program.
The results of such a review can then form the basis of further consultation with users and the evolution/refinement of the dataset.