Data sharing and data standards for metabolomics: where are we now?

Published on May 31, 2018 (38 min)

Other Talks in the Series: Bioinformatics for Metabolomics

0:00
Hello everyone. My name is Reza Salek. I'm from EMBL-EBI, which stands for European Molecular Biology Laboratory - European Bioinformatics Institute. The EBI is essentially in the business of biological sciences data. It captures, shares, and makes publicly available data, along with the resources people can use to access, analyze, and reuse those data, across a wide variety of the biological sciences.
0:32
In this talk, I'm focusing only on metabolomics, which is a small part of the work ongoing at EBI. I would like to go through the current state of data sharing and data standards in metabolomics. I should say at the beginning that this is by no means an effort confined to EBI or to any one group. It is a community effort in which many people, labs, and groups are involved in different aspects of standardization and contribute to the different elements of data sharing. I will try to acknowledge as many of the people involved as I can recall, and I apologize in advance if I have missed anyone. I'm going to start with the notion that has been promoted
1:20
by the European Open Science Cloud community. In every science, and for every experiment, it is important that results can be reproduced. I think this is a core tenet of science: a result is not valid unless it can be reproduced. One way to improve the chances of reproducibility, therefore, is to make the data, the datasets, and every other element of the experiment available. Here I will focus only on the data side of reproducibility, and on the principles that support good data sharing and good standards. The open research data and open science community has proposed the "FAIR" principles. What does FAIR stand for? Data that are Findable, Accessible, Interoperable, and Reusable. Finding data is the key to being able to access it. As you may remember, in many journal papers and manuscripts people used to share their data by providing a link to a local file server or an FTP site, saying, "Here's the data on our local server if you want to access it." Over time, the server gets moved or is no longer there, and people can no longer access the data. So simply saying "Here are my datasets" is not a good way of sharing data. Data need to be findable and discoverable, ideally within a stable database, and accessible, so that the files can be read in an open format. If I share my data in a format that can only be used by one particular tool that I have, the data are not accessible. The Accessible principle is therefore about making data available in an open format. Another element is making sure the data are Interoperable, meaning that other services and their databases can also see, discover, find, and communicate with the resources that house the data.
Finally, we come to Reusability, meaning that the data can actually be used for other experiments, or combined across multiple experiments to produce a result.