Web app using spark: Processing hdfs data from a spring web app

In the xdata project, I had to convert a standalone Spring web app into a “big-data” web app running on a Hadoop cluster. To do that, I chose Apache Spark and spark-hive because they provided the most practical interface. However, I could not find any documentation or tutorial on using Spark this way in a Java Spring web application.

To test how to set up such an application, I made two getting-started prototypes:

  1. A spring+spark web app: it implements a very simple web service that reads and converts files, either on the local file system or on HDFS, using Spark.
  2. A spring+spark-hive web app: it implements simple web services that generate a Hive table and request content from it.

The main difficulty is run-time dependencies: the dependencies used for compilation (such as those provided through Maven) do not work together at run time (at the time of writing this post).

To run a standalone app, one should add `$SPARK_HOME/lib/spark-assembly-X.X.X-hadoopY.Y.Y.jar` (provided by the Spark installation) to the classpath. For the spark-hive case, the datanucleus dependencies found in the Spark lib folder should also be added. Because web apps are run by a servlet container, such as Tomcat or Jetty, this jar should be added:

  1. Either to the war file, as is recommended for web apps. It is however a 140 MB dependency. This is what the spring+spark web app (1) uses.
  2. Or to the classpath of the servlet container. This is what the spring+spark-hive web app (2) uses: the jar is added to the Maven Jetty plugin.

For my (professional) work, I chose the second solution: I added the Spark jar to the Maven Jetty plugin, which is used during development, and I included jetty-runner in the project, which I run with the Spark jar added to the classpath (using the `--jar` option).
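Since the problem shows up only at run time, a quick way to see what the running app can actually load is to ask the class loader directly. The following is a minimal sketch, not part of the prototypes: the class name `ClasspathCheck` is made up for illustration, and the two Spark class names are what I would expect to find in a Spark 1.x assembly jar with Hive support, so adjust them to your version.

```java
/** Hypothetical helper: reports whether given classes are visible at run time. */
class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but one of its own dependencies is missing
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class names (Spark 1.x); replace with the ones your app needs
        String[] names = {
            "org.apache.spark.api.java.JavaSparkContext", // from the spark assembly jar
            "org.apache.spark.sql.hive.HiveContext"       // from spark-hive
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run stand-alone, this only reflects the `-cp` given to `java`. To check what the servlet container sees, the same `isOnClasspath` call would have to be made from inside the web app (for example from a test endpoint), because the container uses its own class loaders.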

Making jar by hand, with dep: Back to the basics

I happened to have a problem running a jar with dependencies, and I found myself wondering: do I really know how it works?

I don’t remember ever having done it manually, so I tried. It is good to be sure of what we know. Here is the code: two classes, A and B. A has the main method and requires B:

// A.java
package one.pack;

import two.pack.B;

public class A{
    public static void main(String args[]){
        B b = new B();
        b.println("Hello world");
    }
}

// B.java
package two.pack;

public class B{
    public void println(String message){
        System.out.println("B says: "+message);
    }
}

Now, let’s compile everything

# compile into B.jar
javac -d . B.java    # "-d .": automatically create folders two/pack/
jar -cf B.jar two/

# compile into A.jar
javac -cp B.jar -d . A.java   # include B.jar in the classpath
jar -cf A.jar one/

Finally, move A.jar and run it

mv A.jar /somewhere/else
java -cp /somewhere/else/A.jar:B.jar one.pack.A

Output: `B says: Hello world`

Conclusion: it works just as expected, but now at least I’m sure of it :-)
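To make the link between package names and jar content explicit, here is a small sketch using only the JDK. It relies on the fact that a jar is a zip archive whose entry paths mirror the package structure, which is why `jar -cf B.jar two/` is enough. The class `JarLookup` and its methods are made up for this illustration, and the demo jar contains empty placeholder entries, not real bytecode.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

/** Hypothetical helper showing how a class name maps to an entry inside a jar. */
class JarLookup {

    /** Maps a fully qualified class name to the entry path expected in a jar. */
    static String entryPathFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar has an entry at the path expected for the class name. */
    static boolean jarContains(Path jar, String className) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getEntry(entryPathFor(className)) != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a temporary jar with one empty placeholder entry per class name. */
    static Path writeDemoJar(String... classNames) {
        try {
            Path jar = Files.createTempFile("demo", ".jar");
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jarOut = new JarOutputStream(out)) {
                for (String name : classNames) {
                    jarOut.putNextEntry(new JarEntry(entryPathFor(name)));
                    jarOut.closeEntry(); // empty entry: placeholder, not real bytecode
                }
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path bJar = writeDemoJar("two.pack.B");
        System.out.println(entryPathFor("two.pack.B"));        // two/pack/B.class
        System.out.println(jarContains(bJar, "two.pack.B"));   // true
        System.out.println(jarContains(bJar, "one.pack.A"));   // false
        Files.delete(bJar);
    }
}
```

This is roughly the lookup that happens for each jar listed in `-cp`: `one.pack.A` is searched as `one/pack/A.class` and `two.pack.B` as `two/pack/B.class`, so both jars must be on the classpath when A runs.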

WordPress server in the cloud: Free and simple

Here is a simple method to deploy a WordPress server on the Red Hat cloud and attach it to a domain name, for free. This is what I did to host this site.

Create a wordpress server:

  1. Create an account on OpenShift, or log in
  2. Add an application, and select “wordpress”
  3. If the account was just created, choose its (unique) namespace
  4. Choose the application name and fill in the form
  5. Go to the site to create the wordpress user account
  6. Add content and customize the site directly from wordpress

Access the server:

  1. Install the rhc command, then run `rhc setup`
  2. Log in with ssh and/or git clone the wordpress repo from OpenShift
    • The URL for ssh and git can be found on the application's page once logged in to OpenShift.
  3. Note for git: content should be added in the .openshift/[themes|plugins|...] folder of the repo. It will be copied to the matching folder of the wordpress server (i.e. $OPENSHIFT_DATA/[themes|...]) by git push.

Finally, to attach this application to a domain name:

  1. Follow this 2-step procedure
  2. Then in the wordpress dashboard > Settings > General, change the “Site Address (URL)” to the URL of your domain
  3. Note: to get a free (sub)domain name, check