nmrML format release Feb 2014
Submitted by Reza on Wed, 01/29/2014 - 12:24
Development of nmrML format: Currently, the most widely used data exchange format for NMR data is JCAMP-DX version 6.0 by the Joint Committee on Atomic and Molecular Physical Data (Davies and Lampen 1993), but the specification is not very rigorous and many different flavors exist in the wild, which can lead to incompatibilities between different software packages. It is also not easily extendable to capture supplementary information.
The Metabolomics Standards Initiative (MSI) workgroups have provided detailed suggestions about the minimum metadata to be captured for an NMR experiment. In particular, the MSI put forth recommendations to report instrument descriptions and configurations, instrument-specific sample preparation and data acquisition parameters (Rubtsov, Jenkins et al. 2007). These recommendations resulted in a first round of NMR XML data standard development, focusing on raw and processed one- and two-dimensional NMR experiments and associated metadata (Ludwig, Easton et al. 2012).
Inspired by the huge success of mzML in mass spectrometry, the COSMOS (COordination Of Standards In MetabOlomicS, http://cosmos-fp7.eu) consortium has joined forces with other groups and has now merged and adopted existing schemata into a new nmrML format (http://nmrml.org). The format has two components. The first is an XML schema that defines the structure of an nmrML file. This structure is deliberately kept simple to ease implementation and to avoid frequent schema changes when the terminology needs to accommodate new technologies and parameters. Instead, such additions are annotated in the nmrML file using the second component, controlled vocabulary terms from the nmrCV ontology. The nmrCV is based on earlier work at the EMBL-EBI (Sansone, Schober et al. 2007) and efforts at the Metabolomics Innovation Centre (David Wishart Group). The nmrCV contains nearly 600 terms and partly relies on external sources such as ChEBI for chemical information, which makes it an integrative resource. Term requests can be submitted through the issue tracker or the mailing list.
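To illustrate the idea of a fixed, simple XML structure annotated with controlled vocabulary terms, here is a minimal Python sketch. It is not taken from the nmrML schema or parser libraries: the element name `parameter`, the attributes `cvRef`, `accession`, `name` and `value`, and all accession numbers are hypothetical placeholders in the style of mzML, chosen only for illustration. Please consult the schema at http://nmrml.org for the real element names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment. Element names, attribute names and accessions are
# illustrative assumptions, NOT copied from the nmrML schema or the nmrCV.
SAMPLE = """
<experiment>
  <parameter cvRef="nmrCV" accession="NMR:0000001" name="example term A" value="500"/>
  <parameter cvRef="nmrCV" accession="NMR:0000002" name="example term B" value="298"/>
  <parameter cvRef="CHEBI" accession="CHEBI:0000000" name="example solvent"/>
</experiment>
"""


def collect_cv_params(xml_text):
    """Return one dict per CV-annotated <parameter> element, in document order."""
    root = ET.fromstring(xml_text.strip())
    params = []
    for elem in root.iter("parameter"):
        params.append({
            "cvRef": elem.get("cvRef"),
            "accession": elem.get("accession"),
            "name": elem.get("name"),
            "value": elem.get("value"),  # None when the term carries no value
        })
    return params


def group_by_vocabulary(params):
    """Group the annotations by the vocabulary (cvRef) they come from."""
    grouped = defaultdict(list)
    for param in params:
        grouped[param["cvRef"]].append(param["accession"])
    return dict(grouped)


if __name__ == "__main__":
    for vocabulary, accessions in group_by_vocabulary(collect_cv_params(SAMPLE)).items():
        print(vocabulary, accessions)
```

The point of the sketch is the design choice described above: a new parameter only needs a new vocabulary term, so the reader code and the XML structure stay unchanged.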
We also provide early prototypes of file converters from vendor formats to nmrML, as well as parser libraries for Java, R and Python, which can be used by open NMR processing and analysis software.
The development of nmrML is taking place on GitHub (https://github.com/nmrML/nmrml), where the specification documents, more detailed descriptions of our use cases, example files and the parser libraries can be found.
We are now providing a first nmrML release candidate at http://nmrml.org for public consultation and feedback.
This work is part of a WP2 deliverable led by Steffen Neumann and Daniel Schober (IPB-Halle) in collaboration with Michael Wilson and David Wishart (U Alberta Canada), Luis de Figueiredo and Reza Salek (EMBL-EBI), Daniel Jacob and Catherine Deborde (Centre INRA de Bordeaux) and Philippe Rocca-Serra (University of Oxford e-Research Centre).
Contributions and feedback from: Jie Hao and Tim Ebbels (Imperial College), Christian Ludwig and John Easton (University of Birmingham), Annick Moing (Centre INRA de Bordeaux), Leonardo Tenori and Antonio Rosato (University of Florence), Ian Lewis (Princeton) and many more.