About¶

Maggot, a research data cataloging tool¶

Motives¶

Meet the challenges of organizing, documenting, storing and sharing data within a collective (unit, platform, project, etc.).
Gain visibility into what is produced within the collective: datasets, software, images, sounds, videos, analyses, code, etc.
Adopt an open science approach to sharing and reproducibility by promoting the FAIR principles within the collective.
Raise awareness among newcomers and students of the need to better describe what they produce.

State of need¶

Implementing a data management plan imposes prerequisites, such as moving the data to be preserved out of users' personal disk space. This concerns not only published data but all data produced over the course of a project. Above all, this externalization gathers the data in one place and already constitutes a first-level backup. It becomes even more necessary when temporary staff (doctoral students, post-docs, interns, fixed-term contract staff) are involved in data production.
This raises the question of how these storage spaces should be organized. Should they be harmonized, i.e. by imposing good practices such as i) naming conventions for folders and files, ii) a folder structure (docs, data, scripts, etc.), iii) the use of README files, etc.?
At a minimum, using a README file seems the simplest and least restrictive option. But then the question arises of what to put in it. Templates can be offered to simplify the writing. But how can these files then be used effectively when we want to find information? With what vocabulary?

Proposed approach¶

The two main ideas behind the tool are:
- Make the data storage space a data repository without having to move the data, then bring the metadata to where the data is stored.
- Be able to “capture” the user’s metadata as easily as possible by using their vocabulary.
Concerning the first idea: “just” place a metadata file (in JSON format) describing the project data in each subdirectory, then search for the projects and/or data matching specific criteria. JSON was chosen because it is well suited to describing metadata and is readable by both humans and machines.
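To illustrate the first idea, here is a minimal sketch of how a storage space could be scanned for metadata files and filtered on criteria. It is not Maggot's implementation: the file name `metadata.json`, the function names and the field names (`title`, `keywords`) are all assumptions made for the example.

```python
import json
from pathlib import Path


def _matches(actual, expected):
    """A list-valued field matches if it contains the expected value."""
    if isinstance(actual, list):
        return expected in actual
    return actual == expected


def find_projects(root, criteria, metadata_name="metadata.json"):
    """Return (directory, metadata) pairs whose metadata meets every criterion.

    `metadata_name` is a hypothetical file name chosen for this sketch.
    """
    matches = []
    # Walk every subdirectory of the storage space; the data itself is never moved.
    for meta_path in sorted(Path(root).rglob(metadata_name)):
        try:
            metadata = json.loads(meta_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed files instead of aborting the scan
        if not isinstance(metadata, dict):
            continue  # this sketch only handles metadata stored as a JSON object
        if all(_matches(metadata.get(key), value) for key, value in criteria.items()):
            matches.append((meta_path.parent, metadata))
    return matches
```

Under these assumptions, a call such as `find_projects("/path/to/storage", {"keywords": "metabolomics"})` would return the directories whose metadata file lists that keyword, along with the parsed metadata.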
Concerning the second idea: given the diversity of fields, the chosen approach aims to be as flexible and pragmatic as possible by allowing users to choose their own vocabulary (controlled or not), matching the reality of their field and their activities. However, it is good practice to use controlled vocabulary only, as far as possible, that is, relevant and sufficient vocabulary used as a reference in the field concerned, so that users can describe a project and its context without having to add extra terms. To this end, the tool must allow users to move progressively towards the adoption of standardized controlled vocabularies (thesauri or even ontologies).
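One way to picture this progressive approach is to accept free terms while reporting how many of a project's terms already come from a reference vocabulary. The sketch below is purely illustrative: the function names and the idea of a coverage ratio are assumptions for the example, not features described by Maggot.

```python
def split_terms(terms, controlled_vocabulary):
    """Separate terms found in the controlled vocabulary from free terms.

    Comparison ignores case and surrounding whitespace.
    """
    reference = {entry.strip().casefold() for entry in controlled_vocabulary}
    controlled, free = [], []
    for term in terms:
        if term.strip().casefold() in reference:
            controlled.append(term)
        else:
            free.append(term)
    return controlled, free


def coverage(terms, controlled_vocabulary):
    """Fraction of terms taken from the controlled vocabulary (1.0 if no terms)."""
    if not terms:
        return 1.0
    controlled, _ = split_terms(terms, controlled_vocabulary)
    return len(controlled) / len(terms)
```

Free terms are kept rather than rejected, so a collective could review them over time and decide which ones deserve an equivalent in a thesaurus or ontology.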
With the approach proposed by Maggot, the initial aim is not to open the data but to manage the metadata associated with the data on a storage space whose perimeter is precisely defined by the collective (unit, team, project, platform, …). The main characteristic of the tool is, above all, to “capture” the metadata as easily as possible according to a well-chosen metadata schema. However, opening data via their metadata must be a clearly stated objective for projects financed by public institutions (e.g. Europe). Therefore, if you have taken care to define your metadata schema so that a metadata crosswalk (using a mapping file) is possible with a data repository recognized by the international community, you can easily “push” your metadata along with the data without having to re-enter anything.
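The crosswalk idea can be sketched as a simple renaming of local fields to the fields expected by a target repository. This is a simplified illustration under stated assumptions: the mapping is taken to be a JSON object of the form `{local_field: target_field}`, which is not necessarily the format of Maggot's mapping files, and all field names are hypothetical.

```python
import json


def load_mapping(path):
    """Load a mapping file, assumed here to be a JSON object {local_field: target_field}."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def crosswalk(metadata, mapping):
    """Translate local metadata into the target schema.

    Returns the translated record and, separately, the local fields that have
    no counterpart in the mapping, so that nothing is dropped silently.
    """
    target, unmapped = {}, {}
    for field, value in metadata.items():
        if field in mapping:
            target[mapping[field]] = value
        else:
            unmapped[field] = value
    return target, unmapped
```

With a mapping such as `{"title": "dc_title", "keywords": "dc_subject"}`, a record already captured locally would be converted without any re-entry, and any field left in the second return value signals a gap in the schema or in the mapping.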

Links¶

Source code on GitHub: inrae/pgd-mmdt
Issue tracker: inrae/pgd-mmdt/issues
Online instance: INRAE UMR 1322 BFP

Publication¶

Daniel Jacob, François Ehrenmann, Romain David, Joseph Tran, Cathleen Mirande-Ney, Philippe Chaumeil, An ecosystem for producing and sharing metadata within the web of FAIR Data, GigaScience, Volume 14, 2025, giae111, DOI:10.1093/gigascience/giae111

Related articles¶

Créez, gérez et partagez vos métadonnées avec Maggot (2025) - VO-INRAE
Usages d'AgroPortal dans des systèmes d'information à INRAE (2025) - hal-05077024

Contacts¶

Daniel Jacob (INRAE UR BIA-BIBS - 0000-0002-6687-7169)

Discussion space¶

Issue tracker on GitHub
Tchap: only for the French Higher Education and Research (ESR) community

Designers / Developers¶

Daniel Jacob (INRAE UR BIA-BIBS)
François Ehrenmann (INRAE UMR BioGECO)
Philippe Chaumeil (INRAE UMR BioGECO)

Contributors¶

INRAE
- Edouard Guitton (Dept. SA, PFIE), David Benaben & Cathleen Mirande-Ney (UMR BFP), Joseph Tran (UMR EGFV), Stéphane Bernillon (UR MycSA)
CIRAD
- Julien Paul (AGAP institute)
ERINHA
- Romain David
Mines de Paris - PSL
- Amélie Masson, Sandrine Mouret, Sébastien Perrin

License¶

GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007 - GPLv3
