WP2 - Standards Development

WP2, “Standards Development”, delivers the exchange formats and terminological artefacts needed to describe, exchange and query both metabolomics data and experimental metadata, e.g. provenance of study materials, technology and measurement types.

Aims: To deliver the exchange formats and terminological artefacts needed to describe, exchange and query both the metabolomics data and the contextual information, or ‘experimental metadata’ (e.g. provenance of study materials, technology and measurement types, sample-to-data relationships). We will ensure that these standards are widely accepted and used by involving all major global players in the development process.

Role in the metabolomics community: The consortium represented by COSMOS already includes the majority of metabolomics players in Europe, and other global players in the field have provided letters of support. These and others will be invited to both the work meetings and the regular stakeholder meetings.

Open source: Because the open standards developed here are supported by open source tools, they can easily be put to work, which will aid adoption. We will develop and maintain exchange formats for raw data and processed information (identification, quantification), building on experience from standards development within the Proteomics Standards Initiative (PSI).

nmrML: We will develop the missing open standard, the NMR Markup Language (nmrML), for capturing and disseminating Nuclear Magnetic Resonance (NMR) spectroscopy data in metabolomics. It is urgently needed as a long-term archival format if metabolomics databases are to capture all forms of metabolomics data, and it will also support developments in cheminformatics and structural biology.

MS: For mass spectrometry, we will work with the PSI to extend existing exchange standards to the technologies used in metabolomics (e.g. gas chromatography and imaging mass spectrometry) and to the identification tools and databases.
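
PSI exchange formats such as mzML record most acquisition details as controlled-vocabulary cvParam elements rather than as dedicated schema elements, which is why extending them to new technologies is often largely a matter of adding vocabulary terms. As a minimal sketch, assuming a hand-written and heavily abbreviated mzML fragment (not a real data file), the following lists the CV terms attached to each spectrum using only the Python standard library:

```python
import xml.etree.ElementTree as ET

# Namespace used by mzML 1.1 documents.
MZML_NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Hypothetical, abbreviated fragment: a real mzML file also carries a cvList,
# fileDescription, instrument configuration and binary data arrays.
FRAGMENT = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="example_run">
    <spectrumList count="1">
      <spectrum index="0" id="scan=1" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
        <cvParam cvRef="MS" accession="MS:1000579" name="MS1 spectrum" value=""/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""


def spectrum_cv_terms(xml_text):
    """Return (accession, name, value) tuples for every spectrum-level cvParam."""
    root = ET.fromstring(xml_text)
    terms = []
    for spectrum in root.iterfind(".//mz:spectrum", MZML_NS):
        # Only direct cvParam children of the spectrum are collected here.
        for param in spectrum.findall("mz:cvParam", MZML_NS):
            terms.append((param.get("accession"), param.get("name"), param.get("value")))
    return terms


for accession, name, value in spectrum_cv_terms(FRAGMENT):
    print(accession, name, value)
```

This only reads the vocabulary annotations; it does not validate the fragment against the mzML schema or check that the accessions exist in the PSI-MS ontology.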

Metadata: In addition to the raw data formats, we need to continue developing standards for experimental metadata and results that are independent of the analytical technology. We will review, maintain and, where needed, extend the reporting requirements and terminological artefacts developed by the Metabolomics Standards Initiative (MSI). We need to represent quantification options in MS and NMR, as well as the semantics of the data matrices used to summarize experimental results, key information that is often only available in PDF tables associated with manuscripts. As research in the biomedical and life sciences increasingly moves towards multi-omics studies, metabolomics must not be an island.

ISA-Tab: The ‘Investigation/Study/Assay’ (ISA-Tab) format was developed to represent experimental metadata independently of the assay technology used. We will use ISA-Tab to standardize metabolomics reporting requirements and terminologies through customized configurations.
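
ISA-Tab study files are tab-delimited tables in which columns such as "Source Name", "Characteristics[...]", "Protocol REF" and "Sample Name" link study materials to the samples derived from them. As a minimal sketch, assuming a tiny study table whose headers follow these ISA-Tab conventions but whose identifiers and values are invented for illustration, the sample-to-source relationships can be recovered with the Python standard library:

```python
import csv
import io

# Hypothetical, minimal ISA-Tab-style study table (s_*.txt). The headers follow
# ISA-Tab conventions; the source and sample identifiers are invented.
STUDY_TABLE = (
    "Source Name\tCharacteristics[Organism]\tProtocol REF\tSample Name\n"
    "plant_1\tArabidopsis thaliana\tsample collection\tleaf_extract_1\n"
    "plant_2\tArabidopsis thaliana\tsample collection\tleaf_extract_2\n"
)


def source_to_samples(table_text):
    """Map each Source Name to the list of Sample Names derived from it."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    mapping = {}
    for row in reader:
        mapping.setdefault(row["Source Name"], []).append(row["Sample Name"])
    return mapping


print(source_to_samples(STUDY_TABLE))
```

Real ISA-Tab tables often repeat headers such as "Protocol REF"; csv.DictReader keeps only the last of any duplicated column, so a production parser (or the existing ISA tools) would be needed for complete files.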

Finally, we will explore semantic web standards that facilitate linked open data (LOD) throughout the biomedical and life science realms, and demonstrate their use for metabolomics data. While the technical standards already exist, we will need to develop the “inventory” of terms and concepts required to express facts about metabolomics, capturing the data to characterize studies and digital objects in metabolomics to facilitate the data flow in biomedical e-infrastructures.
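
In linked open data, each fact is expressed as a subject-predicate-object triple identified by URIs, which is why an agreed inventory of terms matters. As a minimal sketch, assuming placeholder example.org URIs in place of metabolomics terms that have not yet been defined (they belong to no published ontology), the following serialises a few study-level facts as N-Triples using only standard RDF and RDFS predicates plus those placeholders:

```python
# Hypothetical vocabulary: the example.org URIs stand in for metabolomics terms
# that still have to be defined; they are not part of any published ontology.
EX = "http://example.org/metabolomics/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value):
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping characters N-Triples forbids raw."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) term tuples as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


facts = [
    (iri(EX + "study/1"), iri(RDF_TYPE), iri(EX + "MetabolomicsStudy")),
    (iri(EX + "study/1"), iri(RDFS_LABEL), literal("Leaf extract NMR study")),
    (iri(EX + "study/1"), iri(EX + "hasAssay"), iri(EX + "assay/nmr_1")),
    (iri(EX + "assay/nmr_1"), iri(EX + "measurementTechnique"), iri(EX + "NMRSpectroscopy")),
]

print(to_ntriples(facts))
```

Output of this kind can be loaded into any RDF store and queried with SPARQL; the point of the sketch is that the serialisation is trivial once the terms exist, whereas agreeing on the terms themselves is the open work.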

This work package is led by IPB-Halle.

Recent Progress:

A first prototype version of the nmrML XSD, the accompanying CV and example XML instances are available on the GitHub development pages:

The nmrML XML Schema (XSD):

https://github.com/nmrML/nmrML/blob/master/xml-schemata/nmrML.xsd

The nmrML Controlled Vocabulary (CV):

https://github.com/nmrML/nmrML/blob/master/ontologies/nmrCV.owl

Two nmrML example files:

https://github.com/nmrML/nmrML/tree/master/examples/working.tmp/nmrML

https://github.com/nmrML/nmrML/tree/master/examples

Further URLs pointing to development resources:

Browsable HTML serializations of the XSD and the CV can be found in the GitHub folders nmrML/docs/SchemaDocumentation/HTML_Serialisations and nmrML/docs/CVDocumentation/OwlDoc, respectively.

GitHub site:                    https://github.com/nmrML/nmrML

nmrML website:             http://nmrml.org

nmrML wiki:                   http://cosmos-fp7.eu/nmrML/

nmrML Google forum:   https://groups.google.com/forum/#!forum/nmrml

Deliverables:

D2.1 - Completion of GC-MS for mzML (m6, done)

D2.2 - Data exchange format for metabolite identification (m12, done)

D2.3 - Data exchange format for metabolite quantitation (m12, done)

D2.4 - Definition of NMR-ML Schema, initial MSI-NMR ontology, example files (m12, done)

D2.5 - Real data, Converters, Validators and Parsers for NMR-ML (m24)

D2.6 - Collection of ISA configurations for metabolomics studies (m27)

D2.7 - Test infrastructure for the validation of ISA datasets (m36)

D2.8 - Guideline document on RDF and SPARQL for metabolomics resources (m24)

D2.9 - Public availability of query endpoints for linked data from EBI, MPG, IPB (m36)