Common Data Index - CDI

The primary objective of the Common Data Index (CDI) Data Discovery & Access Service is to give users detailed insight into the availability and geographical distribution of marine data held by data centres and institutes across Europe, and to provide a central, homogeneous interface for online access to these distributed data resources. The CDI provides an index (metadatabase) to individual data sets, is based on the ISO 19139 metadata standard with INSPIRE compliance, and paves the way to online data access.

The CDI was initiated in the EU Sea-Search project and has been further developed in the EU SeaDataNet and SeaDataNet II projects, in which the NODCs of 35 countries around the European seas participate. The SeaDataNet CDI service has also been adopted by several other EU data management projects and EU EMODnet project lots. At present more than 90 data centres across wider Europe are connected to the CDI infrastructure and are actively populating the service with their data sets for physical oceanography, marine geology, marine geophysics, marine chemistry, bathymetry and marine biology.

For the Netherlands a dedicated NODC CDI V5 service is operational, which gives access to more than 55,000 data sets from Rijkswaterstaat, KNMI, TNO, NIOZ and NIOO-CEME.

The Netherlands NODC data collections are also included in the pan-European SeaDataNet CDI V5 service, which brings together about 1.55 million CDI metadata records from more than 90 data centres. The CDI system has been adopted by a number of associated marine data infrastructure initiatives, such as Geo-Seas for marine geological and geophysical data, Upgrade Black Sea SCENE for the Black Sea region and the EMODnet pilot portals for Marine Chemistry, Hydrography and Biology.

All these initiatives further populate and fine-tune the CDI V5 metadatabase and increase the number of data centres that have connected their data systems to the CDI V5 system to provide harmonised access to their data sets.

How does it work?
The CDI V5 query interfaces enable users to search freely by a set of criteria. The matching data sets are listed and their geographical locations are shown on a map. Clicking on the display icon retrieves the full metadata of a data set, which describes the what, where, when, how and who of the data set. It also gives standardised information on the data access restrictions that apply. In addition, the interface features a shopping mechanism, by which selected data sets can be placed in a shopping basket.

All users can freely query and browse the CDI V5 directory; however, submitting requests for data access via the shopping basket requires that users are registered in the SeaDataNet central user register, thereby agreeing to the overall SeaDataNet User Licence.

The data requests are forwarded automatically from the portal to the relevant data centres. This process is controlled via the Request Status Manager (RSM) Web service at the portal and a Download Manager (DM) Java software module, implemented at each of the data centres. The RSM also enables registered users to check the status of their requests regularly and to download data sets once access has been granted. Data centres can follow all transactions for their data sets online and can handle requests that require their consent.
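
The forwarding step can be pictured as splitting one shopping basket into one sub-request per data centre. The following is only a minimal sketch of that idea, not the RSM implementation: the function name, the dictionary keys and the identifiers are hypothetical.

```python
from collections import defaultdict


def split_basket_by_data_centre(basket):
    """Group the requested records per data centre (hypothetical field names)."""
    per_centre = defaultdict(list)
    for item in basket:
        per_centre[item["data_centre"]].append(item["cdi_id"])
    return dict(per_centre)


# Illustrative basket; the identifiers are made up for this example.
basket = [
    {"cdi_id": "CDI-0001", "data_centre": "NIOZ"},
    {"cdi_id": "CDI-0002", "data_centre": "KNMI"},
    {"cdi_id": "CDI-0003", "data_centre": "NIOZ"},
]
sub_requests = split_basket_by_data_centre(basket)
```

Each entry in `sub_requests` would then correspond to the part of the request that a single data centre has to serve.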

Each CDI V5 metadata record includes a data access restriction tag, which indicates under which conditions the data set is accessible to users. Its values range from ‘unrestricted’ to ‘no access’, with a number of values in between. During registration every user is assigned one or more SeaDataNet roles by their national NODC / Marine Data Centre. For each data set request, the RSM service combines the data access restriction of the data set with the role(s) of the user as registered in the SeaDataNet central user register. This determines, per data set request, whether the user gets direct access automatically, whether the request first has to be considered by the data centre (which might then contact the user), or whether no access is given.
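
The three possible outcomes can be illustrated with a small decision function. This is a sketch under stated assumptions, not the actual RSM rule set: only ‘unrestricted’ and ‘no access’ are taken from the text above, while the ‘academic’ tag, the role names and the policy table are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    DIRECT = "direct access"
    NEGOTIATE = "considered by the data centre"
    DENIED = "no access"


# Hypothetical policy table: restriction tag -> roles that get direct access.
# None means that any registered user qualifies.
DIRECT_ACCESS_ROLES = {
    "unrestricted": None,
    "academic": {"academic"},
}
NO_ACCESS_TAGS = {"no access"}


def decide(restriction, user_roles):
    """Combine a restriction tag with the user's roles into one of three outcomes."""
    if restriction in NO_ACCESS_TAGS:
        return Decision.DENIED
    if restriction in DIRECT_ACCESS_ROLES:
        allowed = DIRECT_ACCESS_ROLES[restriction]
        if allowed is None or allowed & set(user_roles):
            return Decision.DIRECT
    # Any other tag, or a role mismatch, is left to the data centre to decide.
    return Decision.NEGOTIATE
```

In this sketch a request such as `decide("academic", ["commercial"])` falls through to the data centre rather than being refused outright, which mirrors the middle outcome described above.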

Configuration, maintenance and formats
For purposes of standardisation and international exchange the ISO 19115 / ISO 19139 metadata standards have been adopted. The CDI V5 format is defined as a dedicated subset of these standards and is ISO and INSPIRE compliant. A CDI V5 XML format supports the exchange between data centres and the central CDI manager, and ensures interoperability with other systems and networks. CDI V5 XML entries are generated by the participating data centres directly from their databases. Data centres can use a dedicated Java tool (MIKADO) to generate CDI V5 XML files automatically, driven by a properties file that defines the mapping between the CDI format and the fields of the partner’s database, together with the required local queries. CDI updates are produced and transferred at regular intervals.
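
The idea of a mapping between format fields and local database fields can be sketched as follows. This is an illustration only: it is not MIKADO, it does not use the MIKADO properties syntax, and the element and column names are hypothetical and do not follow the real CDI V5 / ISO 19139 schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: target XML element name -> local database column.
FIELD_MAPPING = {
    "datasetName": "title",
    "startDate": "start_dt",
    "accessRestriction": "access_tag",
}


def row_to_xml(row, mapping=FIELD_MAPPING):
    """Build a simplified XML record from one database row using the mapping."""
    record = ET.Element("record")
    for element_name, column in mapping.items():
        child = ET.SubElement(record, element_name)
        child.text = str(row[column])
    return ET.tostring(record, encoding="unicode")


# Illustrative row, as it might be returned by a local query.
row = {"title": "CTD cast 42", "start_dt": "2010-05-01", "access_tag": "unrestricted"}
xml_text = row_to_xml(row)
```

Keeping the mapping in data rather than in code means that a data centre only has to describe its own column names once, which is the role the properties file plays in the text above.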

The connection between the data systems of the data centres and the RSM Web service is realised by each data centre installing and configuring a Download Manager Java component, which handles the communication with the portal, retrieves the requested data sets and provides download services to the users.

More information and the software tools themselves can be found on the SeaDataNet website in the section 'Standards & Software'.

Common Vocabularies and Ontologies
Use of common vocabularies in all metadatabases and data formats is an important prerequisite for consistency and interoperability. It is therefore of utmost importance that these vocabularies are supported by a large group of stakeholders, accessible to all users and kept up to date in a controlled way.

Therefore SeaDataNet operates and maintains a service to provide ‘controlled vocabularies’, which are used in the metadata and to label data. This SeaDataNet Vocabulary service provides access to lists of standardised terms that cover a broad spectrum of disciplines of relevance to the oceanographic and wider community.

The SeaDataNet Vocabulary service is based upon the NERC Common Vocabularies (NVS 2.0) web service, developed and operated by BODC. For end users a vocabulary Client Interface has been developed and is operated by MARIS, which lets users search and browse the various vocabularies. An automatic synchronisation harvests the latest versions of the lists from the NVS 2.0 web service and loads the updates into a local buffer that feeds the Search and Browse interface.
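
The synchronisation into a local buffer can be sketched as a version comparison per list. This is a hedged illustration, not the actual synchronisation code: the list identifiers, the integer version numbers and the `fetch_list` callable are assumptions, and no real NVS 2.0 call is made.

```python
def synchronise(local_buffer, remote_versions, fetch_list):
    """Refresh buffered lists whose version lags behind the remote service.

    local_buffer:    dict list_id -> {"version": int, "terms": list}
    remote_versions: dict list_id -> latest version number (assumed available)
    fetch_list:      callable returning the terms of a list (stand-in for a web call)
    """
    updated = []
    for list_id, remote_version in remote_versions.items():
        cached = local_buffer.get(list_id)
        if cached is None or cached["version"] < remote_version:
            local_buffer[list_id] = {
                "version": remote_version,
                "terms": fetch_list(list_id),
            }
            updated.append(list_id)
    return updated


# Stand-in for the remote service; list names and terms are made up.
remote_lists = {"LIST_A": ["term1", "term2"], "LIST_B": ["x"]}
vocab_buffer = {
    "LIST_A": {"version": 1, "terms": ["term1"]},
    "LIST_B": {"version": 3, "terms": ["x"]},
}
updated = synchronise(vocab_buffer, {"LIST_A": 2, "LIST_B": 3}, remote_lists.__getitem__)
```

Only lists that changed are fetched again, so the search and browse interface can keep serving from the buffer while updates arrive.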

The vocabulary Web service works closely together with the MIKADO Java tool, which is available for CDI V5 XML generation. In addition there is an XML Validation Web service, which helps data centres to validate samples of their CDI V5 XML production.