An ecosystem for sharing metadata


Foster good data management, with data sharing in mind

Sharing descriptive metadata is the first essential step towards Open Scientific Data. With this in mind, Maggot was specifically designed to annotate datasets by creating a metadata file to attach to the storage space. It allows users to easily add descriptive metadata to datasets produced within a collective of people (research unit, platform, multi-partner project, etc.). This approach fits naturally into a data management plan, as it addresses data organization and documentation, data storage, and frictionless metadata sharing within this collective and beyond.


Main features of Maggot

The main features of Maggot were established according to a well-defined need (see Background).

  1. Document the datasets produced within a collective of people with metadata
  2. Search datasets by their metadata
    • The descriptive metadata thus produced can be associated with the corresponding data directly in the storage space, making it possible to search the metadata in order to find one or more datasets. Only descriptive metadata is accessible by default.
  3. Publish the metadata of datasets along with their data files to a Europe-approved repository

See a short Presentation and Poster for a quick overview.


Overview of the different stages of metadata management


Note: the step numbers shown in the overview figure correspond to the points developed below.

1 - First, you must define all the metadata that will be used to describe your datasets. All metadata can be defined using a single file (in TSV format, and therefore editable with a spreadsheet). This is an unavoidable step because both the input and search interfaces are generated entirely from these definition files, which specify each field along with its input type and the associated controlled vocabulary (ontology, thesaurus, dictionary, or list of fixed terms). The metadata proposed by default was mainly established according to the DDI (Data Documentation Initiative) metadata schema, which also largely corresponds to the schema adopted by the Dataverse software. See the Terminology Definition section.
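For illustration, the short sketch below shows how such a tab-separated definition file might be read programmatically. The file name and column headings (field, input_type, vocabulary) are assumptions made for this example, not the actual Maggot layout.

```python
import csv

# Hypothetical reader for a metadata definition file in TSV format.
# The column names (field, input_type, vocabulary) are illustrative
# assumptions, not the actual Maggot definition layout.
def load_definitions(path):
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return list(reader)

for field in load_definitions("metadata_definitions.tsv"):
    print(field["field"], field["input_type"], field.get("vocabulary", ""))
```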

2 - Entering metadata will be greatly facilitated by the use of dictionaries. The dictionaries offered by default are: people, funders and data producers, as well as a vocabulary dictionary allowing you to mix ontologies and thesauri from several sources. Each of these dictionaries allows users, by entering a name with autocompletion, to associate information that will then be added when exporting the metadata, either to a remote repository or for metadata harvesting. Thus, once entered into a dictionary, this information never needs to be re-entered.
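The sketch below illustrates the idea of autocompletion over such a dictionary; the entry structure used here (name, ORCID, affiliation) is an assumption for this example.

```python
# Illustrative autocompletion over a "people" dictionary; the entry
# structure (name, orcid, affiliation) is an assumption for this sketch.
PEOPLE = [
    {"name": "Ada Lovelace", "orcid": "0000-0000-0000-0001", "affiliation": "Example Lab"},
    {"name": "Alan Turing",  "orcid": "0000-0000-0000-0002", "affiliation": "Example Unit"},
]

def autocomplete(prefix, entries):
    prefix = prefix.lower()
    return [e for e in entries if e["name"].lower().startswith(prefix)]

# Typing "Ad" returns Ada Lovelace's full entry, so the ORCID and
# affiliation never have to be typed again at export time.
print(autocomplete("Ad", PEOPLE))
```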

3 - The web interface for entering metadata is built entirely from the definition files. The metadata fields are distributed across the chosen sections, each constituting a tab (see screenshot). Mandatory fields are marked with a red star and must be filled in before the metadata file can be generated. Fields governed by a controlled vocabulary are entered by autocompletion from term lists (dictionary, thesaurus or ontology). You can also define external resources (URL links) pointing to documents, publications or other related data; Maggot thus becomes a hub for your datasets, connecting local and external resources. Once the mandatory fields (at least) and the other recommended fields (at best) have been entered, the metadata file can be generated in JSON format.
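As an illustration, the generated file might be serialized as in the sketch below. The section and field names, as well as the output file name, are hypothetical and do not reflect the exact Maggot schema.

```python
import json

# Hypothetical example of serializing entered metadata to JSON; the
# sections, field names and file name are illustrative only.
metadata = {
    "description": {
        "title": "Soil microbiome survey 2023",
        "subject": ["soil", "metagenomics"],
    },
    "status": {"contacts": ["Ada Lovelace"]},
    "resources": [
        {"label": "Related publication", "url": "https://doi.org/10.xxxx/example"},
    ],
}

with open("metadata.json", "w", encoding="utf-8") as fh:
    json.dump(metadata, fh, indent=2, ensure_ascii=False)
```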

4 - The file generated in JSON format must be placed in the storage space reserved for this purpose. This metadata file plays the role of a README adapted for machines, yet still readable by humans. Its internal structure provides a coherence and consistency of information that a simple README file, with its completely free and therefore unstructured text, cannot offer. Furthermore, the central idea is to use the storage space as a local data repository, so the metadata should go to the data and not the other way around.
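A minimal sketch, assuming a conventional directory layout, of how the generated file could be copied next to its dataset so that the metadata travels with the data:

```python
import shutil
from pathlib import Path

# Sketch: drop the generated metadata file into the dataset's storage
# directory so the metadata sits alongside the data. Paths are illustrative.
dataset_dir = Path("/storage/project-x/dataset-42")
shutil.copy("metadata.json", dataset_dir / "metadata.json")
```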

5 - A search of the datasets can then be carried out on the basis of the metadata. All the JSON metadata files are scanned and parsed at a fixed time interval (30 min) and then loaded into a database. This allows searches based on the predefined metadata. The search form, in a compact shape, is almost the same as the entry form (see a screenshot). Depending on the search criteria, a list of datasets is returned, each with a link pointing to its detailed sheet.
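The sketch below illustrates this kind of periodic crawl: it walks the storage tree, parses each metadata file, and loads a few searchable fields into a small database. SQLite and the file name pattern are used here purely for illustration; they are not necessarily what Maggot uses internally.

```python
import json
import sqlite3
from pathlib import Path

# Illustrative crawler: walk the storage tree, parse every metadata JSON
# file, and index a few searchable fields in a local database.
def index_metadata(storage_root, db_path="metadata_index.db"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS datasets (path TEXT PRIMARY KEY, title TEXT, raw TEXT)")
    for meta_file in Path(storage_root).rglob("metadata.json"):
        doc = json.loads(meta_file.read_text(encoding="utf-8"))
        title = doc.get("description", {}).get("title", "")
        con.execute(
            "INSERT OR REPLACE INTO datasets VALUES (?, ?, ?)",
            (str(meta_file.parent), title, json.dumps(doc)),
        )
    con.commit()
    con.close()

# A scheduler (e.g. a cron job every 30 minutes) would call:
# index_metadata("/storage")
```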

6 - The detailed metadata sheet provides all the metadata divided by section. Unfilled metadata does not appear by default. When a URL can be associated with a piece of information (ORCID, ontology, website, etc.), you can click on it to go to the corresponding link. Likewise, the link associated with each resource can be followed. From this sheet, you can also export the metadata according to different schemata (Dataverse, Zenodo, JSON-LD). See screenshot 1 & screenshot 2.
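The export step amounts to a crosswalk between schemata. The sketch below shows what a mapping to a Zenodo-style payload could look like, reusing the hypothetical structure from the earlier example; the actual mapping used by Maggot may differ.

```python
# Illustrative crosswalk from the hypothetical internal JSON structure
# sketched above to a Zenodo-style deposition payload.
def to_zenodo(metadata):
    description = metadata.get("description", {})
    return {
        "metadata": {
            "title": description.get("title", ""),
            "upload_type": "dataset",
            "keywords": description.get("subject", []),
        }
    }

example = {"description": {"title": "Soil microbiome survey 2023", "subject": ["soil"]}}
print(to_zenodo(example))
```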

7 - Finally, once you have decided to publish your metadata together with your data, you can choose the repository that suits you (currently, repositories based on Dataverse and Zenodo are supported).
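For illustration, the sketch below creates a dataset deposition on Zenodo through its public REST API; the access token is a placeholder, error handling is omitted, and the payload reuses the kind of mapping shown in the previous sketch.

```python
import requests

# Minimal sketch of creating a dataset deposition on Zenodo via its REST API.
ZENODO_API = "https://zenodo.org/api/deposit/depositions"
TOKEN = "YOUR_ZENODO_TOKEN"  # placeholder, not a real token

payload = {
    "metadata": {
        "title": "Soil microbiome survey 2023",
        "upload_type": "dataset",
        "keywords": ["soil", "metagenomics"],
    }
}

response = requests.post(ZENODO_API, params={"access_token": TOKEN}, json=payload)
response.raise_for_status()
print("Created deposition", response.json()["id"])
```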


Additional key points

  • Descriptive metadata can be generated from the start of a project or study, without waiting for all the data to be acquired or processed, or for the moment when one wishes to publish the data, thus respecting the research data lifecycle as closely as possible.

  • Implementing the tool requires involving all data stakeholders upstream (definition of the metadata schema, vocabularies, targeted data repositories, etc.); everyone has their role: the data manager/data steward on one side, and the scientists and data producers on the other.

  • A progressive move towards an increasingly controlled and standardized vocabulary is not only possible but encouraged. You can start with a simple vocabulary dictionary used locally, grouping together domain vocabularies. You can then consider creating a thesaurus, with or without mapping to ontologies. The adoption of ontologies should likewise be gradual, selecting those that are truly relevant for the collective. A tool like Maggot makes it easy to implement them.



