Hello everyone. My name is Reza Salek.
I'm from EMBL-EBI, which stands
for the European Molecular Biology Laboratory - European Bioinformatics Institute.
The EBI is essentially in the business of biological sciences data.
It captures, shares, and makes publicly available
the data and resources that people can use to access,
analyze, and reuse data across a wide variety of biological sciences.
In this talk, I'm only focusing on metabolomics,
which makes up a small portion of the efforts which are ongoing at EBI.
I would like to go through the current state
of data sharing and data standards in metabolomics.
I have to say here at the beginning that this is by
no means an effort confined to EBI or to one particular group.
This is a community effort in which many people, many labs,
and many groups are involved in different aspects of standardization
and contribute to data sharing and its different elements.
So, I will try to acknowledge as many of the people involved as I can recall,
and I apologize in advance if I have missed
someone who should be recognized.
I'm going to start with the notion that has been promoted
by the European Open Science Cloud community.
For all experiments, as we know, and for all the sciences,
it is important for results to be reproduced.
I guess this is a core tenet of science.
A result is not valid unless it can be reproduced.
Therefore, one way to improve the chance of reproducibility is by making
the data, the datasets, and everything else that
was an element of the experiment available.
So at this point,
I will only focus on the data part of this reproducibility, and on
the principles that could help with good data sharing and good standards.
So, the open research data and open science community has proposed these "FAIR" principles.
What does FAIR stand for?
Data that are Findable,
Accessible, Interoperable, and Reusable.
So finding data is the key to being able to access it,
and as you may remember from many journals, papers, and manuscripts,
people often used to share their data by providing
a local file server link or an FTP link, saying,
"Here's the data from our local server if you want to access and see it."
Over time, the server has probably moved or is no longer
there, which stops people from being able to access the data.
So, just sharing data by saying,
"Here are my datasets" is not a good way of sharing data.
It needs several elements: you can find it and
discover it, ideally within a stable database,
and it is accessible, so you can read the files in an open format.
So, if I only share my data in a format that could only be used
by one particular tool that I have, that makes it not accessible.
So the Accessible principle is to make data available in an open format.
There are other elements as well. One is making sure that it's Interoperable,
meaning that other services and
their databases can also see, discover, find,
and communicate with the resources that
house the data. And finally we come to Reusability,
meaning that we can actually use the data for
some other experiments, or across multiple experiments, to produce a result.