Data integration on Hadoop and web services using Spark
The XData project was a French collaboration between industry partners (startups as well as large companies) and academics. Its main objective was to develop innovative commercial products built by integrating private data with open data.
I mostly worked on the XData “movement analytics” application. In particular:
- I worked on the integration of movement data: any type of data that represents the movement of people, such as households or companies relocating, tourist travel, or cell phone tracking. The integration has two main parts: first, a generalized data structure, defined by a generic data descriptor, that allows importing any data set containing movement data; second, an automated query algorithm that selects suitable movement entries with respect to a geographical and temporal area and granularity.
- I was in charge of transferring a standalone prototype of the main web application, which uses MySQL and Spring, to the Hadoop cluster of the XData project, in particular using Spark and Hive.
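The two integration parts described above can be illustrated with a minimal Python sketch. It is not the XData implementation: the `Movement` and `BoundingBox` classes, the descriptor keys (`origin_lat`, `start`, ...) and the `from_row` and `select` functions are all hypothetical names chosen for illustration, and granularity handling is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class Movement:
    """Generic movement entry (hypothetical structure)."""
    origin: Tuple[float, float]        # (latitude, longitude)
    destination: Tuple[float, float]   # (latitude, longitude)
    start: date
    end: date
    count: int                         # number of people or entities moving


@dataclass(frozen=True)
class BoundingBox:
    """Rectangular geographical area in degrees."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, point):
        lat, lon = point
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def from_row(row, descriptor):
    """Build a Movement from a raw row.

    `descriptor` maps each generic field name to the column name
    used by the data set being imported (assumed format).
    """
    def get(field):
        return row[descriptor[field]]

    return Movement(
        origin=(float(get("origin_lat")), float(get("origin_lon"))),
        destination=(float(get("dest_lat")), float(get("dest_lon"))),
        start=date.fromisoformat(get("start")),
        end=date.fromisoformat(get("end")),
        count=int(get("count")),
    )


def select(movements, area, period_start, period_end):
    """Keep entries with both endpoints inside `area` that overlap the period."""
    return [
        m for m in movements
        if area.contains(m.origin)
        and area.contains(m.destination)
        and m.start <= period_end
        and m.end >= period_start
    ]
```

With this layout, importing a new data set only requires writing a new descriptor, while the selection code stays unchanged.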
Ontologies and the Semantic Web
The SIFR project investigates the scientific and technical challenges in building ontology-based services to leverage biomedical ontologies.
My work is on the annotators web service, whose purpose is to:
- Provide a single access point to several servers running the ontology annotators developed by the NCBO, such as their bioportal.bioontology.org/annotator.
- Wrap new functionalities around these annotators. In particular, I work on adding an RDF output format and the annotation scoring methods published in “Scoring semantic annotations returned by the NCBO Annotator”.
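As an illustration of what an RDF output format involves, here is a minimal Python sketch that serializes annotations as N-Triples. It does not reproduce the actual service: the input dictionaries (`class_uri`, `text`, `score`), the `http://example.org/annotation#` vocabulary and the function names are hypothetical, and the score is assumed to be already computed.

```python
# Hypothetical vocabulary used only for this sketch.
EX = "http://example.org/annotation#"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def annotations_to_ntriples(annotations):
    """Serialize annotation dicts as N-Triples, three triples per annotation."""
    lines = []
    for index, annotation in enumerate(annotations):
        subject = "<%sa%d>" % (EX, index)
        class_uri = annotation["class_uri"]
        text = escape_literal(annotation["text"])
        score = float(annotation["score"])
        lines.append("%s <%sannotatedClass> <%s> ." % (subject, EX, class_uri))
        lines.append('%s <%smatchedText> "%s" .' % (subject, EX, text))
        lines.append('%s <%sscore> "%s"^^<%s> .' % (subject, EX, score, XSD_DOUBLE))
    return "\n".join(lines)
```

N-Triples is used here because it can be produced with plain string formatting; a real service would more likely rely on an RDF library and an established annotation vocabulary.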
The language of root architecture
With several of the main actors in root system measurement and analysis, we have developed the RSML file format. It can store 2D or 3D image metadata, plant and root properties and geometries, continuous functions along individual root paths, and a suite of annotations at the image, plant or root scale, at one or several time points. The plant ontologies are used to describe botanical entities that are relevant at the scale of root system architecture.
Go to the RSML web site
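The kind of content listed above (root geometries, functions sampled along root paths, nested roots) can be pictured with a small in-memory model. This Python sketch is not the RSML schema: the `Root` class, its field names and the `total_length` helper are hypothetical and only illustrate the structure.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Root:
    """One root axis with its geometry, sampled functions and lateral roots."""
    label: str
    # 2D or 3D points along the root path, in order from base to tip.
    polyline: List[Tuple[float, ...]]
    # Named functions with one value per polyline point, e.g. "diameter".
    functions: Dict[str, List[float]] = field(default_factory=dict)
    # Roots branching from this one.
    laterals: List["Root"] = field(default_factory=list)

    def length(self):
        """Length of this root's own polyline."""
        pairs = zip(self.polyline, self.polyline[1:])
        return sum(math.dist(p, q) for p, q in pairs)


def total_length(root):
    """Length of a root plus all of its laterals, recursively."""
    return root.length() + sum(total_length(lateral) for lateral in root.laterals)
```

For example, a primary root from (0, 0) to (3, 4) carrying one lateral from (3, 4) to (3, 6) has a total length of 7.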
High-throughput Root System Architecture extraction from images
I was in charge of developing the RhizoScan Python package.