Work packages use cases
Here you can find use cases for each work package. They are meant to clarify our work and to gather support from the community.
Jane User found a dataset in MetaboLights that is highly related to her own research. She does not have a Thermo or Sciex instrument, but since the data is deposited in the vendor-neutral mzML format, she can upload it to XCMS Online and co-analyse it with her own measurements.
Bob Bioinformatician has a very advanced dataset from a brand new instrument, but no single tool can handle all aspects of the data. He uses the OpenMS FileFilter tool to split the data and write out mzML files, then MZmine 2 to perform peak picking.
Jean-Luc Picard found a copy of the internet inside a Borg cube. He converts all XML files in MetaboLights into native Tricorder format, and uses metabolite concentrations from the 21st century as reference data.
Michael Stravs downloaded MTBLS38 raw data in mzML, extracted spectra with RMassBank and submitted them to MassBank.
Piotr S. Gromski et al. take the mzTab files from several MetaboLights submissions and study the influence of scaling metabolomics data on model classification accuracy (doi:10.1007/s11306-014-0738-7).
The team at DBpedia pulls in all RDF triples from the COSMOS partners, so that some articles on diseases can link to studies in MetaboLights.
Mike Metabolite works at Metabovestigator, and pulls the highly annotated MetaboLights studies into a data warehouse to provide high-level views of which metabolites respond to different stimuli.
The data providers provide SPARQL endpoints. Peter Poweruser can now query for metabolites that have been measured from the same sample with different technologies (GC/MS, LC/MS and/or NMR).
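The logic behind Peter Poweruser's query can be sketched without a live endpoint. The following is a minimal illustration in plain Python, assuming hypothetical measurement records of the form (sample, metabolite, technology). The record layout, sample IDs and values are invented for this example. A real endpoint would express the same grouping in SPARQL against the data providers' own vocabularies, which are not specified here.

```python
from collections import defaultdict

# Hypothetical records: (sample ID, metabolite name, technology).
# These values are invented for illustration, not taken from any repository.
MEASUREMENTS = [
    ("sample-1", "glucose", "GC/MS"),
    ("sample-1", "glucose", "NMR"),
    ("sample-1", "citrate", "LC/MS"),
    ("sample-2", "glucose", "LC/MS"),
    ("sample-2", "alanine", "GC/MS"),
    ("sample-2", "alanine", "LC/MS"),
    ("sample-2", "alanine", "LC/MS"),  # repeated measurement, same technology
]


def measured_with_multiple_technologies(records):
    """Map (sample, metabolite) to its sorted technologies, keeping only
    pairs that were measured with at least two different technologies."""
    technologies = defaultdict(set)
    for sample, metabolite, technology in records:
        technologies[(sample, metabolite)].add(technology)
    return {
        key: sorted(techs)
        for key, techs in technologies.items()
        if len(techs) >= 2
    }


if __name__ == "__main__":
    hits = measured_with_multiple_technologies(MEASUREMENTS)
    for (sample, metabolite), techs in hits.items():
        print(sample, metabolite, ", ".join(techs))
```

Collecting technologies in a set per (sample, metabolite) pair means that repeated measurements with the same technology do not count as a second technology.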
Janet User found a dataset in MetaboLights that is highly related to her own research. She has neither a Varian/Agilent NMR instrument nor the vendor software, but since the data is deposited in the vendor-neutral nmrML format, she can download it, process it with open access tools like Batman or Matlab and co-analyse the results with her own measurements.
Biologist Irmela wants to apply R-based statistics to NMR data from a variety of vendors. Thanks to the available nmrIO parser, such vendor-neutral data can be loaded into R and analysed further with CRAN or Bioconductor packages.
Bioinformatician Herman wants to develop vendor-neutral NMR data processing software. It would be hard to study every vendor format, but since nmrML implicitly provides mappings, Herman only needs to look at the nmrML schema to find the common parameters to read in.
An NMR repository curator decides to visualize the raw NMR FID data and the processed, annotated NMR spectrum. Thanks to nmrML and its Ident and Quant additions, he or she only needs to write one parser to visualize all vendors' proprietary spectra.
Stephen, an NMR repository curator, wants to implement quality assurance for his repository data. Thanks to semantic validation, he can now assess the compliance of the available data by checking whether the Minimal Information is present.
Michael is interested in developing a manual knowledge acquisition/capture tool to support the detailed manual submission of extensive NMR experiment setups into his repository. Thanks to the datatype constraints and semantic rules exploitable by the data acquisition tool, Michael can implement decision support and intelligent real time data verification into his data capture and submission tool.
Journal editor Marta wants to enforce a certain level of granularity and completeness for NMR data in the long-term data store for supplementary material. ISA-Tab and the nmrML validation rules now allow this across all vendor formats.
NMR software producer Thomas wants to compile statistics on which types of NMR instruments were used in recent MetaboLights submissions. Thanks to the updated and comprehensive controlled vocabulary referenced in nmrML, Thomas can now analyse trends and shifts in NMR instrument and software usage.
Biologist Judy wants to run an abstract, generalized query, e.g. to find all experiments in a repository that involve mammals and in which indole derivatives were identified. Thanks to the robust is-a taxonomy established in the ontologies used (e.g. NCBI Taxonomy and ChEBI), a query engine can generalize over descriptor classes and return a larger result set through subsumption.
Paula, an institute’s data manager, aims for long-term data persistence and readability in her local repository dumps. Thanks to the stable standard format, she can rely on parsers remaining available over the coming decades.
Hans maintains an NMR reference database, e.g. HMDB. To ensure his data is presented in the best normalized way possible, he can now compare his data polishing pipeline with those applied to the analytical experimental data from different vendors.
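Judy's generalized query above relies on subsumption: an experiment annotated with a specific term also matches a query for any of its ancestor classes. The following is a minimal sketch of that idea in plain Python. The is-a links, experiment IDs and annotations are a toy example invented for illustration; the real taxonomy and ChEBI hierarchies contain many more intermediate classes, and an actual query engine would work on the ontologies themselves.

```python
# Toy is-a hierarchy (child -> parent). Simplified and hypothetical; real
# ontologies have many intermediate classes between these terms.
IS_A = {
    "Homo sapiens": "Mammalia",
    "Mus musculus": "Mammalia",
    "Arabidopsis thaliana": "Viridiplantae",
    "tryptophan": "indoles",
    "serotonin": "indoles",
    "glucose": "carbohydrates",
}

# Hypothetical experiments with an annotated organism and identified metabolites.
EXPERIMENTS = [
    {"id": "EXP-1", "organism": "Mus musculus", "metabolites": ["serotonin", "glucose"]},
    {"id": "EXP-2", "organism": "Arabidopsis thaliana", "metabolites": ["tryptophan"]},
    {"id": "EXP-3", "organism": "Homo sapiens", "metabolites": ["glucose"]},
    {"id": "EXP-4", "organism": "Homo sapiens", "metabolites": ["tryptophan"]},
]


def is_subsumed_by(term, ancestor):
    """True if term equals ancestor or reaches it by following is-a links."""
    seen = set()  # guards against accidental cycles in the hierarchy
    while term is not None and term not in seen:
        if term == ancestor:
            return True
        seen.add(term)
        term = IS_A.get(term)
    return False


def find_experiments(organism_class, metabolite_class):
    """Return IDs of experiments whose organism falls under organism_class and
    that identified at least one metabolite under metabolite_class."""
    return [
        exp["id"]
        for exp in EXPERIMENTS
        if is_subsumed_by(exp["organism"], organism_class)
        and any(is_subsumed_by(m, metabolite_class) for m in exp["metabolites"])
    ]


if __name__ == "__main__":
    print(find_experiments("Mammalia", "indoles"))
```

None of the experiments is annotated with "Mammalia" or "indoles" directly, so a plain string match on the query terms would find nothing. Following the is-a links is what enlarges the result set.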
XEML - efficient capture of metadata
The XEML designer system was integrated into the experiment capture workflow at the Max Planck Institute for Molecular Plant Physiology (MPIMP). With XEML, metadata are recorded efficiently, which greatly facilitates the subsequent upload to the Golm Metabolome Database (GMD) and the display of metabolite measurement data. In particular, the contrast between measurement samples intended by the researcher is immediately mirrored by the data views created.
XEML - export; Export study descriptions and associated metadata to ISA-Tab format
The XEML designer system allows exporting metadata as ISA-Tab formatted metadata files. This greatly facilitates the upload of experimental datasets to the MetaboLights database and was successfully demonstrated for ?? datasets.
XEML - PLATO database for biochemical phenotyping (INRA) / case study
Team leader Yves Gibon and biologist Cécile Cabasson, working at INRA Bordeaux, want to combine experimental data from their own biochemical phenotyping database PLATO with environmental data collected with the XEML-Designer. Hundreds of experiments are already described in PLATO, waiting to be linked with their respective environmental data.
Software tools - repository of tools and services for common metabolomics data processing tasks
The COSMOS partners have developed a total of 20 software tools and services (http://cosmos-fp7.eu/tools, see Deliverable 3.2) that originated from identified needs and are in active use at the sites at which they were developed. Where suitable, broad adoption by the community can be expected.
Springer (http://link.springer.com/), the publisher of the Metabolomics Journal, pilots an initial data capture system
Royston Goodacre (WP4) has worked with the Metabolomics Journal, the official journal of the Metabolomics Society, to implement an initial minimal information capture system that provides useful MSI (Metabolomics Standards Initiative) compliant information about newly submitted articles. When submitting an article, authors are asked to provide additional information about the data under investigation by answering online questions grouped into 3 main topics: data availability, data analysis and metabolite identification. Because established procedures for metabolite data upload are not yet fully in place within COSMOS, the Metabolomics Journal currently only encourages researchers to deposit their data in one of the COSMOS recognised data repositories, such as the MetaboLights database (http://www.ebi.ac.uk/metabolights/). In the future, however, data deposition prior to article submission is likely to become standard and compulsory.
As a future “use case” example, suppose that the system is fully operational and completely integrated with COSMOS. When submitting an article to the journal, authors will first be required to upload the experimental data used in the article (e.g. mass spectrometry data, spectroscopy data, etc.) into an appropriate (according to data type, study, etc.) COSMOS recognised database. Once the data have been correctly uploaded, the database will generate a unique COSMOS identification or accession number that identifies those data and all related metadata. This accession number will then have to be provided to the Metabolomics Journal for successful submission of the article. COSMOS guidelines will ensure that the data and metadata collected by the recognised database comply with MSI standards. Once the article has been published, the related data are made publicly available, searchable and free to download via COSMOS.
A more detailed description of this collaborative effort can be found in the report on MS7, Agreement on minimum information.
List of journals developing data capture systems in collaboration with COSMOS:
Metabolomics Journal (http://link.springer.com/journal/11306)
NATURE Scientific Data (http://www.nature.com/sdata/data-policies/repositories#omics)
MDPI Metabolites (http://www.mdpi.com/journal/metabolites/instructions, see end of the document)
We have also recommended that the “Data should be submitted in accordance with MSI (Metabolomics Standards Initiative) guidelines” (http://metabolomicssociety.org/index.php/resources/metabolomics-standards
Allow metabolomics dataset/database providers to register their datasets in a central register. Provide a system to facilitate the registration of metabolomics datasets that are publicly available in a freely accessible database online (D.5.1).
Overview of all publicly available metabolomics datasets. Create an online interface for accessing all publicly available metabolomics datasets that have been registered by the individual data providers (D.5.1).
Search for datasets by keyword. Allow the metabolomics community to search for datasets of interest by entering keyword(s) (D.5.1).
Receive notification of newly available datasets. Allow the metabolomics community to subscribe to notifications of newly available datasets (D.5.2).
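The four registry functions listed above (registration, overview, keyword search and notification) can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class name DatasetRegistry, the Dataset fields and the example accessions are hypothetical and do not describe the system delivered in D.5.1 or D.5.2.

```python
from collections import namedtuple

# Hypothetical dataset record; the field names are assumptions for this sketch.
Dataset = namedtuple("Dataset", ["accession", "title", "provider", "keywords"])


class DatasetRegistry:
    """In-memory sketch of a central register of public metabolomics datasets."""

    def __init__(self):
        self._datasets = {}     # accession -> Dataset
        self._subscribers = []  # callables invoked for each new dataset

    def subscribe(self, callback):
        """Register a callable that is notified about newly registered datasets."""
        self._subscribers.append(callback)

    def register(self, dataset):
        """Add or update a dataset; notify subscribers only if it is new."""
        is_new = dataset.accession not in self._datasets
        self._datasets[dataset.accession] = dataset
        if is_new:
            for notify in self._subscribers:
                notify(dataset)

    def all_datasets(self):
        """Overview of every registered dataset, in registration order."""
        return list(self._datasets.values())

    def search(self, keyword):
        """Case-insensitive substring match on the title and keywords."""
        needle = keyword.lower()
        return [
            d for d in self._datasets.values()
            if needle in d.title.lower()
            or any(needle in k.lower() for k in d.keywords)
        ]


if __name__ == "__main__":
    registry = DatasetRegistry()
    registry.subscribe(lambda d: print("New dataset:", d.accession))
    registry.register(
        Dataset("DS-1", "Urinary metabolic phenotype of obese patients",
                "Provider A", ["urine", "obesity"])
    )
    print([d.accession for d in registry.search("urine")])
```

Keeping notification inside register means subscribers hear about a dataset exactly once, even if a provider later updates the same accession.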
WP7 of the BioMedBridges project aims to identify and develop a set of annotations, terminologies, and mappings between terminologies for human and mouse models of diabetes and obesity. It tackles a major challenge related to the available mouse phenotype and human clinical data: differing ontological phenotype descriptions prevent researchers on both sides from crossing the species bridge between mouse models and humans. Achieving integration at the level of phenotypes in these species requires interaction with the wider community.
A dataset describing the urinary metabolic phenotype of severely obese human patients has been submitted to MetaboLights and shared with the BioMedBridges community. We achieved this by establishing a deposition pipeline between the Biocrates AG MeDIQ data management application and the EMBL-EBI MetaboLights repository. UOXF worked with Biocrates AG to produce an ISA-Tab and MAF file conversion component. This represents a significant milestone, as it gave concrete form to the COSMOS standardization effort with a major commercial provider of targeted metabolite profiling solutions. The work offers unique opportunities for future data deposition, as evidenced by the number of publications relying on Biocrates technologies (27 manuscripts published between 2013 and 2014 alone, more than 50 since 2011).
The solution developed by UOXF, Biocrates AG and EMBL-EBI allows one-click submission of targeted metabolite profiling datasets, with extensive metadata, raw data files and quantified concentrations, lending themselves to reuse, meta-analysis and dataset integration. Not only does it validate the use of the ISA-Tab format to capture CIMR compatible experimental metadata and FIA-MS and LC-MS based targeted metabolomics data, it also sends an extremely encouraging signal to customers of commercial solutions: data deposition does not need to be hard.
By engaging early and constructively with vendors, standardization efforts come to fruition and make data management and custody tasks simpler in the long run. Better managed data mean better provenance, better reuse and better assessment. We hope this example will also encourage other vendors to engage in a virtuous circle of acceptance and implementation of COSMOS supported data management standards. Ultimately, this will guarantee a stable flow of good quality data to the EMBL-EBI MetaboLights repository.