Author Archives: juh

WordPress on AWS

In September 2017, the Red Hat cloud I used to host my WordPress site changed its structure (and pricing). I didn't like that, so I moved to AWS.

This is how this website was deployed.

There are plenty of tutorials that explain the whole procedure (except for one critical problem, explained below).

It all worked fine, but later I ran into trouble. My site stopped responding, and I could no longer connect over ssh (timeout). I now think I must have made an error with the credentials, but at the time I didn't know, and, following some advice found on the internet, I restarted my instance. I still couldn't connect, but this time the error was connection refused…

After looking at the console log of my instance, I found out that ssh did not start, due to incorrect ownership on some system directories. But how do you correct file-system ownership when you cannot connect to the instance?!

The solution, in short:

  • Shut down the instance
  • Detach its EBS volume
  • Create a new instance (same config) and attach the first instance's EBS volume to it
  • Start the new instance and connect to it
  • Fix the file ownership on the mounted volume
  • Reverse the first three steps
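In AWS CLI terms, the procedure looks roughly like this. This is only a sketch: the instance IDs, volume ID, device names, mount point, and the exact directories to fix are all hypothetical, and the same operations can be done from the web console:

```shell
# stop the broken instance and detach its root EBS volume
aws ec2 stop-instances --instance-ids i-broken
aws ec2 detach-volume --volume-id vol-root

# attach the volume to a rescue instance (same config, same availability zone)
aws ec2 attach-volume --volume-id vol-root --instance-id i-rescue --device /dev/sdf

# on the rescue instance: mount the volume and fix the ownership
sudo mount /dev/xvdf1 /mnt/rescue
sudo chown -R root:root /mnt/rescue/etc /mnt/rescue/var   # example ownership fix
sudo umount /mnt/rescue

# detach, re-attach as the original instance's root device, and restart
aws ec2 detach-volume --volume-id vol-root
aws ec2 attach-volume --volume-id vol-root --instance-id i-broken --device /dev/xvda
aws ec2 start-instances --instance-ids i-broken
```

The important detail is that the rescue instance must be in the same availability zone as the volume, otherwise the attach step fails.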

System integrator at The-ICA, June 2015

I am one of the original IT staff of the ICA startup. We develop big-compute technologies and BI for financial risk analysis, using any new cloud technology that can help us tackle very demanding computing and distributed-data challenges.

My role spans from implementing the distributed computing engines to the pricers of derivative products and financial metrics. One of my main specialties is bridging the company's areas of expertise, from cloud IT to financial mathematics, which gives me a central position in the design of the main computational workflow.

Web app using Spark Processing HDFS data from a Spring web app

In the XData project I had to convert a standalone Spring web app into a "big data" web app running on a Hadoop cluster. To do that, I chose Apache Spark and spark-hive because they provided the most practical interface. However, I could not find any documentation or tutorial on using Spark this way in a Java Spring web application.

To test how to setup such application, I made two getting-started prototypes:

  1. A spring+spark web app: it implements a very simple web service that reads and converts files, either on the local file system or on HDFS, using Spark.
  2. A spring+spark-hive web app: it implements simple web services that generate a Hive table and query its content.

The main difficulty was run-time dependencies: the dependencies used for compilation (such as those provided through Maven) did not work together at run time (at the time of writing this post).

To run a standalone app, one should add `$SPARK_HOME/lib/spark-assembly-X.X.X-hadoopY.Y.Y.jar` (provided by the Spark installation) to the classpath. For the spark-hive case, the DataNucleus dependencies found in the Spark lib directory should also be added. Because web apps are run by a servlet container, such as Tomcat or Jetty, this jar should be added:

  1. Either to the war file, as recommended for web apps. It is, however, a 140 MB dependency. This is the approach used in the spring+spark web app.
  2. Or to the classpath of the servlet container. This is the approach used in the spring+spark-hive web app, where the jar is added to the Maven Jetty plugin.

For my (professional) work, I chose the second solution: add the Spark jar to the Maven Jetty plugin used during development, and include jetty-runner in the project, which I run with the Spark jar added to the classpath (using the `--jar` option).
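The launch command then looks something like the following. This is a sketch: the jetty-runner jar path, war name, and port are placeholders, and the Spark assembly version letters match the placeholder name above:

```shell
# run the war with jetty-runner, adding the Spark assembly to the classpath
java -jar jetty-runner.jar \
     --jar $SPARK_HOME/lib/spark-assembly-X.X.X-hadoopY.Y.Y.jar \
     --port 8080 \
     myapp.war
```

The advantage is that the huge Spark assembly never goes into the war: the same artifact deploys during development (Jetty plugin) and in production (jetty-runner).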

Making a jar by hand, with dependencies Back to basics

I happened to have problems running a jar with dependencies, and I found myself wondering: do I really know how it works?

I don't remember ever having done it manually, so I tried. It is a good thing to be sure of what we know. Here is the code: two classes, A and B, where A has the main method and requires B:

// A.java
package one.pack;
 
import two.pack.B;
 
public class A{
    public static void main(String args[]){
        B b = new B();
        b.println("Hello world");
    }
}
// B.java
package two.pack;
 
public class B{
    public void println(String message){
        System.out.println("B says: "+message);
    }
}

Now, let's compile everything:

# compile B.java into B.jar
javac B.java -d .    # "-d .": automatically create folders two/pack/
jar -cf B.jar two/     
 
# compile A.java into A.jar
javac -cp B.jar A.java -d .   # include B.jar in the classpath
jar -cf A.jar one/

Finally, move A.jar and run it:

mv A.jar /somewhere/else
java -cp /somewhere/else/A.jar:B.jar one.pack.A

Output: `B says: Hello world`

Conclusion: it works just as expected, but now at least I'm sure of it 🙂

WordPress server in the Red Hat cloud Free and simple

Obsolete: Red Hat changed its cloud in September 2017 and I moved to AWS.

Here is a simple method to deploy a WordPress server on the Red Hat cloud and attach it to a domain name, for free. This is what I did to host this site.

Create a wordpress server:

  1. Create an account on OpenShift, or log in
  2. Add an application, and select “wordpress”
  3. If the account was just created, choose its (unique) namespace
  4. Choose the application name and fill in the form
  5. Go to the application-namespace.rhcloud.com site to create the WordPress user account
  6. Add content and customize the site directly from WordPress

Access the server:

  1. Install the rhc command, then run rhc setup
  2. Log in with ssh and/or git clone the WordPress repo from OpenShift
    • The URLs for ssh and git can be found on the application's page once logged in to OpenShift.
  3. Note for git: content should be added in the .openshift/[themes|plugins|…] folder of the repo. It will be copied into the suitable folder of the WordPress server (i.e. $OPENSHIFT_DATA/[themes|…]) by git push.

Finally, to attach this application to a domain name:

  1. Follow this two-step procedure
  2. Then, in the WordPress dashboard > Settings > General, change the “Site Address (URL)” to the URL of your domain

This entry was posted in cloud, cms, dns, other, php, web-site on 8 December 2014 by diener.

SIFR project Ontology and the semantic web

The SIFR project investigates the scientific and technical challenges of building ontology-based services to leverage biomedical ontologies.

My work is on the annotator web service, whose purpose is to:

  1. Provide a unique access point to several servers running the ontology annotators developed by the NCBO, such as their bioportal.bioontology.org/annotator.
  2. Wrap new functionality around these annotators. In particular, I work on adding an RDF output format and the annotation scoring methods which have been published in

XData project Data integration on a Hadoop cluster

The XData project is a French collaborative project between industrial partners, startups as well as big companies, and academics. Its main objective is to develop innovative commercial products built from the integration of private data with open data.

I mostly work on the XData “movement analytics” application, more specifically on:

  1. The data integration of the movement data type: any type of data that represents people's movements, such as house or company moves, as well as tourist displacement. The integration is done in two main parts: first, a generalized data structure was defined, with a generic data descriptor, to allow importing any data set containing movement data; second, an automated data query algorithm was defined to select suitable movement entries with respect to geographical and temporal area and granularity.
  2. The porting of the standalone prototype of the web application, which used MySQL and Spring technologies, to the Hadoop cluster of the XData project, in particular using Spark and Hive.

RSML The language of root architecture

With several of the main actors in root system measurement and analysis, we have developed the RSML file format. It allows storing 2D or 3D image metadata, plant and root properties and geometries, continuous functions along individual root paths, and a suite of annotations at the image, plant or root scales, at one or several time points. Plant ontologies are used to describe botanical entities that are relevant at the scale of root system architecture.

Go to the RSML web site

Rhizoscan High-throughput Root System Architecture extraction from images

RhizoScan is a project, funded by Rhizopolis and Numev, to develop image and graph analysis technologies to automatically process images of root systems and extract their architecture.

See the RhizoScan python package.

Poster at the 7th International Conference on Functional-Structural Plant Models
Poster at the CSHL meeting: Automated Imaging and High Throughput Phenotyping