Data discovery and identification: The repository enables users to discover the data and refer to them in a persistent way through proper citation


One objective of data discovery is to allow the ESPRI repository to be harvested by other data centers with interoperable data catalogs. To this end, ESPRI uses well-defined metadata exchange standards such as INSPIRE and ISO 19115, as well as international standard vocabularies such as GCMD and CF, to describe its observational datasets. Climate simulations published by ESPRI can be harvested through a RESTful API, the “ESGF SearchAPI”, which can also be used by clients (browsers and desktop clients). Because the ESGF is distributed, a query sent to any node can return information from the whole federation (i.e., distributed search).
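As an illustration of such a distributed query, the following minimal sketch builds an ESGF SearchAPI request URL without sending it. The host name `esgf-node.example.org` is a placeholder, the helper `build_esgf_search_url` is hypothetical, and the facet names follow the usual CMIP6 conventions; the parameters actually supported should be checked against the node being queried.

```python
from urllib.parse import urlencode


def build_esgf_search_url(node, facets, distrib=True, limit=10):
    """Build an ESGF SearchAPI query URL (hypothetical helper, no network access)."""
    params = {
        "type": "Dataset",                       # search dataset-level records
        "format": "application/solr+json",       # ask for a JSON response
        "distrib": "true" if distrib else "false",  # query the whole federation or only this node
        "limit": str(limit),                     # maximum number of results returned
    }
    params.update(facets)                        # user-supplied search facets
    return f"https://{node}/esg-search/search?{urlencode(params)}"


# Placeholder node name: replace with the ESGF node to be queried.
url = build_esgf_search_url(
    "esgf-node.example.org",
    {"project": "CMIP6", "institution_id": "IPSL", "variable_id": "tas"},
)
print(url)
```

With `distrib=True` the queried node forwards the search to the rest of the federation; setting it to `False` restricts the search to the local index.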

As described in previous requirements and in the DMP, the ESPRI repository hosts multiple types of data used by various scientific communities. As a result, data discovery is not yet fully centralized: climate model simulations and Earth observations each come with their own discovery and access tools, adapted to their users.

For observation data, ESPRI has deployed a GeoNetwork server hosting the IPSL metacatalog. This catalog provides an easy-to-use web interface for searching geospatial data. It supports full-text search as well as faceted search on keywords, resource types, organizations, or any other relevant parameter.

For the distribution of climate simulations, ESPRI relies on the Earth System Grid Federation (ESGF). ESGF is an international collaboration of federated data centres that provides access to the largest archive of climate data world-wide. ESPRI acts as a Tier 1 site that implements the ESGF stack: an open source effort providing a robust, distributed data and computation software stack, enabling world-wide access to Peta/Exa-scale scientific data. The ESGF portal allows users to find, select and download data files held in the globally distributed ESGF archives.

Both approaches allow users to search by scientific theme, facets, location or data type. The portal can also be used to refine the search and perform an advanced search according to the researcher’s criteria. This tool is adapted to the research methods of users who tend towards advanced searches.

Finally, ESPRI also offers a common catalog based on THREDDS software. It provides metadata and data access for ESPRI datasets, using OPeNDAP, OGC WMS and WCS, HTTP, and other remote data access protocols.
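To illustrate how one dataset can be reached through several protocols, the sketch below derives access URLs from a THREDDS base URL and a dataset path. It assumes the default THREDDS service path segments (`dodsC`, `fileServer`, `wms`, `wcs`), which a given server may configure differently; the base URL, the dataset path and the helper `thredds_access_urls` are hypothetical.

```python
# Default THREDDS service path segments (assumption: the server keeps the defaults).
THREDDS_SERVICES = {
    "opendap": "dodsC",
    "http": "fileServer",
    "wms": "wms",
    "wcs": "wcs",
}


def thredds_access_urls(base_url, dataset_path):
    """Return one access URL per protocol for a dataset (hypothetical helper)."""
    base = base_url.rstrip("/")
    path = dataset_path.lstrip("/")
    return {
        name: f"{base}/{segment}/{path}"
        for name, segment in THREDDS_SERVICES.items()
    }


# Placeholder server and dataset path, for illustration only.
urls = thredds_access_urls("https://thredds.example.org/thredds/", "demo/sample.nc")
for protocol, access_url in urls.items():
    print(protocol, access_url)
```

The OPeNDAP URL can typically be opened directly by OPeNDAP-aware clients, while the HTTP URL downloads the whole file.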

ESPRI data distribution services also embed persistent identifier systems. CMIP6 data dissemination relies on unique and immutable Persistent IDentifiers (PIDs), which are automatically generated during the ESGF publication. ESPRI is also authorized and registered with INIST/CNRS to generate Digital Object Identifiers (DOIs) for datasets. To this end, we use the GeoNetwork metacatalog features for automatic DOI generation in conjunction with DataCite. We recommend that data providers include the DOI/PID in the metadata of their datasets so that users can easily access the license and the terms of use of the data.

Given the free licenses, no license signature is required to access and use the data. However, ESPRI ensures that the license information is made available and correctly positioned in the metadata. ESPRI also ensures that the citation is provided either in the metadata or in the meta-catalog alongside the data. This ensures that users know the correct citation information.

The search portal is being unified at both European and national levels. This is a long-term project, requiring tasks such as cross-referencing vocabularies. Until a centralized, unified search portal is available, a number of existing interfaces are well suited to the needs of our community.

For each DOI, ESPRI provides a landing page stating the conditions of use and the way to cite the data: “Permission is granted to use these data in research and publications mentioning the DOI and accompanied by the following statement: The authors acknowledge the XXX group for supplying the data and the data center ESPRI/IPSL for their help in accessing the data.”

In the ESGF context, the DKRZ has been mandated to host the CMIP6 Data Citation service. For each CMIP6 simulation (including IPSL ones), a DOI is automatically registered when the data are published on the ESGF. Thus, CMIP6 data can be cited in scholarly publications using DataCite DOIs.
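As a sketch of how a citation can be obtained from such a DOI, the example below prepares a DOI content negotiation request asking the resolver for a formatted bibliography entry. The DOI `10.5072/example-doi` is a placeholder, the helper `citation_request` is hypothetical, and the request is only built, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def citation_request(doi, style="apa"):
    """Prepare a content negotiation request for a formatted citation (hypothetical helper)."""
    return Request(
        f"https://doi.org/{quote(doi, safe='/')}",
        # Ask the resolver for a text citation in the given style.
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


# Placeholder DOI, for illustration only.
req = citation_request("10.5072/example-doi")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would send the request; for a registered DOI the response body is expected to contain the citation text.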