[NeXML]
Rich phyloinformatic data
The future data exchange standard is here!
NeXML is an exchange standard for representing phyloinformatic data — inspired by the commonly used NEXUS format, but more robust and easier to process.
Overview
The NEXUS flat file format is a commonly used syntax for phylogenetic data. Unfortunately, NEXUS has become overloaded over time, which has caused various problems. Meanwhile, new technologies built around the XML standard have emerged. These technologies have the potential to greatly simplify the processing of rich phylogenetic data and to make it more robust. This website is the home of the community-driven NeXML project, which seeks to leverage XML technologies in the development of a new data standard that translates NEXUS concepts into a syntax that is more easily validated and processed. This approach promises several advantages:
- Syntax validation: some of the issues hampering interoperability arise because no formal specification exists for NEXUS and other flat file formats, and there is no unambiguous way to validate them. Using XML Schema, we have defined a versioned grammar against which data files can be validated syntactically.
- Semantic annotation: another issue with current file formats is that their semantics are not well defined. For example, what does it mean to use an ambiguity code in a matrix? Is it uncertainty or polymorphism? With the wider EvoInfo working group we are developing an ontology onto which we are mapping NeXML schema types, so that the semantics of data files become well defined. In addition, NeXML has a facility for annotating fundamental phylogenetic data objects (such as trees, character state matrices and taxa) with ontology predicates and objects using RDFa.
- Web services: a number of different technologies (such as XML-RPC, REST and SOAP) have emerged that allow disparate, XML-based services to be glued together over the internet. For example, the PhyloWS initiative seeks to develop conventions for RESTful phylogenetic web services, for which NeXML is one of the preferred response formats.
Therefore, a group of developers of phylogenetic software have come together as part of the NESCent working group for evolutionary informatics to develop a new data exchange standard based on these technologies.
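As a rough illustration of why an XML syntax is easier to process than a flat file, the sketch below reads taxa, tree edges and a literal annotation from a small NeXML-style document using only Python's standard library. The embedded document is hand-written for this example. Its element and attribute names (`otus`, `otu`, `trees`, `tree`, `node`, `edge`, `meta`), the `version` value and the Dublin Core annotation are assumptions based on a reading of the schema, and the document has not been validated against it. The helper function names are hypothetical. Note that `xml.etree.ElementTree` only checks well-formedness; validating against the XML Schema would need a schema-aware validator.

```python
import xml.etree.ElementTree as ET

# Namespace named in the text (www.nexml.org/2009), written as a full URI.
NEXML_NS = "http://www.nexml.org/2009"
NS = {"nex": NEXML_NS}

# Hypothetical example document; names are assumptions, not checked
# against the official schema.
DOC = """
<nex:nexml version="0.9"
    xmlns:nex="http://www.nexml.org/2009"
    xmlns="http://www.nexml.org/2009"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <otus id="otus1">
    <otu id="t1" label="Homo sapiens">
      <meta id="m1" xsi:type="nex:LiteralMeta"
            property="dc:description" content="human"/>
    </otu>
    <otu id="t2" label="Pan troglodytes"/>
  </otus>
  <trees id="trees1" otus="otus1">
    <tree id="tree1" xsi:type="nex:FloatTree">
      <node id="n1" root="true"/>
      <node id="n2" otu="t1"/>
      <node id="n3" otu="t2"/>
      <edge id="e1" source="n1" target="n2" length="0.5"/>
      <edge id="e2" source="n1" target="n3" length="0.7"/>
    </tree>
  </trees>
</nex:nexml>
"""


def read_otus(root):
    """Return {otu id: label} for every <otu> element."""
    return {o.get("id"): o.get("label") for o in root.iterfind(".//nex:otu", NS)}


def read_edges(root):
    """Return (source, target, length) for every <edge> element."""
    return [
        (e.get("source"), e.get("target"), float(e.get("length")))
        for e in root.iterfind(".//nex:edge", NS)
    ]


def read_annotations(root):
    """Return (annotated element id, property, content) for literal <meta> children."""
    found = []
    for parent in root.iter():
        for meta in parent.findall("nex:meta", NS):
            if meta.get("property") is not None:
                found.append((parent.get("id"), meta.get("property"), meta.get("content")))
    return found


if __name__ == "__main__":
    root = ET.fromstring(DOC)  # raises ParseError if the text is not well-formed
    print(read_otus(root))
    print(read_edges(root))
    print(read_annotations(root))
```

Because every object carries an `id` and references others by id (an edge points at nodes, a node points at an OTU), the relationships can be followed with ordinary XML tooling rather than a format-specific tokenizer.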
What are we doing about it?
NeXML development is being undertaken in a number of subprojects:
- In the first place, we're designing an XML schema. This schema (designated as namespace www.nexml.org/2009) is explained on our wiki and formally documented. The bleeding-edge version is available from svn, and the source code can be browsed on our site (it's a checkout from our repository that is updated every five minutes). For bug reports and feature requests, please visit our issue tracker page.
- Secondly, we're implementing NeXML read and/or write abilities in a number of software applications:
  - TreeBASE now supports serialization to NeXML.
  - The Mesquite project now supports reading and writing of NeXML. Wayne Maddison and Peter Midford helped start an implementation that is currently being maintained by Rutger Vos.
  - Xuhua Xia's DAMBE version 5.2.31 for Windows Vista/7 reads and writes NeXML data.
  - The PhenoScape project uses NeXML to annotate complex morphological character states with ontology terms in its Phenex editor.
  - The phylobase package for R reads and writes tree descriptions, with character matrices under way. This implementation is being developed by Aaron Mackey.
  - Jeet Sukumaran has implemented NeXML I/O for Python in the DendroPy package. There are many DendroPy code samples for dealing with NeXML data in the wiki manual.
  - Chase Miller has implemented Bio::NexmlIO for BioPerl, which under the hood reuses Rutger Vos's Bio::Phylo parser libraries.
  - Anurag Priyam and Rutger Vos have developed a NeXML I/O plugin for the BioRuby open source bioinformatics library for Ruby.
  - Jaime Huerta-Cepas is working on NeXML I/O for the ETE Python environment for tree exploration.
  - Matt Yoder has implemented NeXML serialization for the mx collaborative web-based content management system for evolutionary systematists.
  - Andrew Hill has added NeXML support to PhyloBox.
  - Sam Smits has made it so that the jsPhyloSVG tree visualization widget can now show NeXML trees.
  - Mike Keesey has added NeXML support to Names On Nodes, a web application that automatically applies biological nomenclature to datasets.
  - For the 2011 Google Summer of Code, Apurv Verma has added NeXML reading capability to phyloGeoRef.
  - Mark Jensen has implemented NeXML compatibility for the HIVQuery web application.
- Third, we're cross-referencing the NeXML schema with the Character Data Analysis Ontology, which is being developed by other members of the EvoInfo working group.
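To give a feel for what "write" support involves, here is a minimal sketch that serializes a star tree to NeXML-style XML with Python's standard library. It is independent of every implementation listed above and does not reflect how any of them works. The function `build_star_tree`, the generated ids, the `version` value and the `nex:FloatTree` type are assumptions made for illustration, and the output has not been validated against the schema.

```python
import xml.etree.ElementTree as ET

NEXML_NS = "http://www.nexml.org/2009"  # namespace named in the text
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Ask ElementTree to use readable prefixes instead of ns0, ns1, ...
ET.register_namespace("nex", NEXML_NS)
ET.register_namespace("xsi", XSI_NS)


def q(tag):
    """Qualify a tag name with the NeXML namespace."""
    return f"{{{NEXML_NS}}}{tag}"


def build_star_tree(labels, length=1.0):
    """Build a document with one OTU per label and a star tree joining them.

    Hypothetical helper: ids (otus1, t1, n0, e1, ...) are made up here.
    """
    root = ET.Element(q("nexml"), {"version": "0.9"})
    otus = ET.SubElement(root, q("otus"), {"id": "otus1"})
    trees = ET.SubElement(root, q("trees"), {"id": "trees1", "otus": "otus1"})
    tree = ET.SubElement(
        trees, q("tree"), {"id": "tree1", f"{{{XSI_NS}}}type": "nex:FloatTree"}
    )
    # Nodes are written before edges so that edges refer to declared nodes.
    ET.SubElement(tree, q("node"), {"id": "n0", "root": "true"})
    for i, label in enumerate(labels, start=1):
        ET.SubElement(otus, q("otu"), {"id": f"t{i}", "label": label})
        ET.SubElement(tree, q("node"), {"id": f"n{i}", "otu": f"t{i}"})
    for i in range(1, len(labels) + 1):
        ET.SubElement(
            tree,
            q("edge"),
            {"id": f"e{i}", "source": "n0", "target": f"n{i}", "length": str(length)},
        )
    return root


if __name__ == "__main__":
    doc = build_star_tree(["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"])
    print(ET.tostring(doc, encoding="unicode"))
```

A real implementation would also have to map its own tree and matrix objects onto these elements and check the result against the schema; the sketch only shows the serialization step.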
Get involved!
If you would like to get involved in the NeXML project in any way, please do! Here are some ways to start:
- Get informed: information about the NeXML project is distributed over the manual (for an overview of vision, plans and implementation), the documentation (for a formal description of the schema) and the mailing list (for immediate plans and discussion).
- Try it out: the download section of the website has nightly builds of bindings for various languages. Take these for a spin!
- Contribute: if you are a programmer interested in extending NeXML support, please contact us through the mailing list to get commit access to the subversion repository.
Acknowledgements
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 237046.