The repository accepts data and metadata based on defined criteria to ensure relevance and understandability for data users
ESPRI is a thematic repository dedicated to climate observations and modelling data. It is intended to host all data of interest for the climate science community and data produced at the IPSL. The criteria of acceptability of the data in the ESPRI repository are:
- Data produced at IPSL or within an IPSL-related project,
- Community-based data from reference sources that constitute a spatially and/or temporally coherent set (e.g. time series of observations, data related to a field campaign, global or regional climate simulations, mesoscale simulations of a meteorological event, etc.).
ESPRI centralizes data from reference sources whose authenticity is recognized by the scientific community. For example, the WMO has commissioned the ESGF to disseminate the CMIP and CORDEX climate simulations, which are replicated on ESPRI (see the Data Management Plan in the relevant links). In addition, all data produced by the IPSL are by nature within the scope of ESPRI. If the submitted data do not fall within the missions, themes or expertise of the ESPRI service, the request can be redirected, via the national RI repositories, to other known partners capable of handling these data.
Data are validated by the scientific teams responsible for the dataset before collection or dissemination by ESPRI. ESPRI therefore only checks that the data are understandable through their metadata, and in particular that they conform to the data reference syntaxes that define file formats, filenames and directory structures, as well as the attributes and keys that the files should include. Consequently, the data formats used must allow the metadata associated with the data to be stored with them (e.g., NetCDF, HDF5, NASA Ames, etc.).
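As an illustration of a filename conformance check against a data reference syntax, the following is a minimal sketch only. The pattern is a simplified, hypothetical one modelled on CMIP6-style file names, and the names FILENAME_PATTERN and parse_filename are assumptions made for this example; they do not describe the tools actually used by ESPRI.

```python
import re

# Hypothetical, simplified pattern modelled on CMIP6-style file names:
# <variable>_<table>_<source>_<experiment>_<member>_<grid>[_<start>-<end>].nc
FILENAME_PATTERN = re.compile(
    r"^(?P<variable>[A-Za-z0-9]+)_"
    r"(?P<table>[A-Za-z0-9]+)_"
    r"(?P<source>[A-Za-z0-9-]+)_"
    r"(?P<experiment>[A-Za-z0-9-]+)_"
    r"(?P<member>r\d+i\d+p\d+f\d+)_"
    r"(?P<grid>[a-z0-9]+)"
    r"(?:_(?P<start>\d{4,12})-(?P<end>\d{4,12}))?"  # optional time range
    r"\.nc$"
)


def parse_filename(name):
    """Return the facets encoded in a file name, or None if it does not conform."""
    match = FILENAME_PATTERN.match(name)
    return match.groupdict() if match else None


facets = parse_filename(
    "tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc"
)
rejected = parse_filename("temperature.dat")
```

Here facets holds the decoded elements (variable, table, source, and so on), while rejected is None because the name does not follow the assumed syntax. The same approach extends to directory structures by matching each path component against its expected facet.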
To do this, ESPRI shares guidelines from reference sources or research projects with data providers to help them format the data and complete the necessary metadata (see, for instance, the Strateole 2 project guidelines in the relevant links). Depositing data in non-preferred or unsupported formats is strongly discouraged and is only allowed for long-tail data, under the responsibility of the data provider. If the submitted data do not meet the predefined data and metadata standards, the ESPRI service can, when necessary, set up data transformation workflows to convert the data into a standard format and add the missing metadata in interaction with the data provider.
To allow the discovery and exploitation of the data hosted by the ESPRI service, particular attention is paid to compliance of the submitted data with the community's data format and metadata standards (e.g., the Climate and Forecast (CF) Conventions, the Attribute Convention for Data Discovery (ACDD)). Because guidelines alone cannot guarantee the quality of the data that fall within its scope, ESPRI applies a variety of quality checks using community and dedicated tools.
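A metadata completeness check of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: the attribute list is an illustrative subset chosen for the example, the global attributes are assumed to have already been read from the file into a dictionary, and the names REQUIRED_ATTRIBUTES and missing_attributes are hypothetical. It is not the set of checks that ESPRI actually runs.

```python
# Illustrative subset of discovery attributes (an assumption for this sketch);
# the real list depends on the conventions and guidelines that apply.
REQUIRED_ATTRIBUTES = ("title", "summary", "keywords", "Conventions")


def missing_attributes(attributes, required=REQUIRED_ATTRIBUTES):
    """Return the required attribute names that are absent or blank."""
    missing = []
    for name in required:
        value = attributes.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Global attributes as they might be read from a file header (made-up values).
example_attributes = {
    "title": "Example surface temperature time series",
    "summary": "   ",
    "Conventions": "CF-1.8",
}
problems = missing_attributes(example_attributes)
```

In this example problems lists "summary" (present but blank) and "keywords" (absent), which is the kind of gap that would be resolved in interaction with the data provider.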
These checks are applied not by the depositors but by clearly identified ESPRI engineers, who are members of the ESPRI staff. Since carrying out these checks according to the procedure in force is part of the engineers' missions, there is no separate procedure for verifying that they have been performed. The assignments of ESPRI staff are monitored and evaluated internally each year.
Finally, because of the multiplicity of research themes addressed at IPSL, the ESPRI service must be able to support new data standards from particular themes of the IPSL community. For example, ESPRI has recently integrated paleoclimate-related data by supporting the LiPD format within its data discovery and distribution information system.