Build Hadoop Native Libraries


The Hadoop native libraries shipped with the distribution are compiled for 32-bit platforms. If you are using Hadoop on x64, you have probably run into the following issue:

 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 

For performance reasons, it is better to recompile those libraries for your platform.

It’s a good idea to compile on the same architecture as your Hadoop production platform. Of course, avoid compiling on your production server. Not sure whether your Hadoop native libraries are compiled for a 32-bit platform? You can check with the following command:

file $HADOOP_HOME/lib/native/*

Here is the result:

ELF 32-bit
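Relatedly, to find out which architecture your own build machine (and production servers) run, so you can match the two, a quick check is:

```shell
# Print the machine hardware architecture.
# x86_64 means a 64-bit platform; i386/i686 means 32-bit.
uname -m
```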

Download Source

Visit the Apache Hadoop releases page and find the source tarball of your Hadoop version. Download it :


Install dependencies :

sudo apt-get install cmake autoconf automake libtool gcc g++ make maven pkg-config zlib1g-dev libssl-dev openssl libcurl4-openssl-dev

Install protobuf :

wget <URL of the protobuf-2.5.0.tar.gz release>
gunzip protobuf-2.5.0.tar.gz
tar -xvf protobuf-2.5.0.tar
cd protobuf-2.5.0
sudo ./configure --prefix=/usr
sudo make
sudo make install

Compile Hadoop

Extract your tarball :

tar -xzf hadoop-2.4.1-src.tar.gz

Enter the source folder :

cd hadoop-2.4.1-src/

Set your environment :

export Platform=x64

Compile :

mvn package -Pdist,native -DskipTests -Dtar

If you face issues while compiling, Google is your friend ;). If everything went well, you will get this kind of output :

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5:27.684s
[INFO] Finished at: Wed Jul 02 19:33:51 CEST 2014
[INFO] Final Memory: 165M/834M
[INFO] ------------------------------------------------------------------------

You can find the libraries in this folder :

cd ./hadoop-dist/target/hadoop-2.4.1/lib/native

We can see all the built libraries (“ls -lh”) :

-rw-r--r-- 1 hadoop hadoop 1.1M Jul  2 19:07 libhadoop.a
lrwxrwxrwx 1 hadoop hadoop   18 Jul  2 19:07 libhadoop.so -> libhadoop.so.1.0.0
-rwxr-xr-x 1 hadoop hadoop 650K Jul  2 19:07 libhadoop.so.1.0.0
-rw-r--r-- 1 hadoop hadoop 1.4M Jul  2 19:07 libhadooppipes.a
-rw-r--r-- 1 hadoop hadoop 421K Jul  2 19:07 libhadooputils.a
-rw-r--r-- 1 hadoop hadoop 373K Jul  2 19:07 libhdfs.a
lrwxrwxrwx 1 hadoop hadoop   16 Jul  2 19:07 libhdfs.so -> libhdfs.so.1.0.0
-rwxr-xr-x 1 hadoop hadoop 245K Jul  2 19:07 libhdfs.so.1.0.0

At this step, you can check the platform of the libraries :

file libhadoop.so.1.0.0

The result seems to be OK : ELF 64-bit LSB shared object, x86-64

Save them and archive the package :

tar -cvzf hadoop-native-libraries-2.4.1.tgz *

Copy the libraries onto your cluster

This step needs to be repeated on each namenode and datanode. You just need to copy all those files into $HADOOP_HOME/lib/native (eg : /usr/local/hadoop/lib/native) :

$ rsync hadoop-native-libraries-2.4.1.tgz <your_hadoop_production_server>:/usr/local/hadoop/lib/native/

Enter your Hadoop native folder (eg : /usr/local/hadoop/lib/native) :

$ cd $HADOOP_HOME/lib/native

Extract the archive :

$ tar -xzf hadoop-native-libraries-2.4.1.tgz

Configure your environment

You probably have a dedicated unix user for your Hadoop cluster. Add those lines to its ~/.bashrc (of course, edit paths according to your configuration) :

export HADOOP_INSTALL=/usr/local/hadoop
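Depending on your setup, Hadoop may still look for native libraries on the default java.library.path. The following commonly used variables point Hadoop explicitly at the native folder (an assumption — adjust the paths to your own install):

```shell
# Point Hadoop at the freshly built native libraries
# (paths assume the /usr/local/hadoop layout used above)
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_INSTALL/lib/native"
```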

Stop and restart Hadoop

Stop the cluster :

stop-dfs.sh
stop-yarn.sh

Source again your bashrc :

source ~/.bashrc

Start Hadoop :

start-dfs.sh
start-yarn.sh

Check that the warning message has disappeared :

$ hadoop fs -ls /user/hadoop
Found 1 item
-rwxr-xr-x 1 hadoop supergroup 8 2014-07-01 14:06 /user/hadoop/toto.txt

Split Large File in Bash


When you are dealing with large files, it’s complicated to share or manipulate them.

On Linux, the split command can be useful here.

Basic Usage :

$ split [-l line_count] [-b byte_count] filename prefix

For example, if you want to split a large file named “clients.csv” into files of 100k records each, apply the following command:

$ split -l 100000 clients.csv splitted_clients-
“splitted_clients-” is the prefix applied to each generated file.
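As a quick end-to-end sanity check, here is the same command run on a small generated file (the file names are illustrative):

```shell
# Generate a sample "clients.csv" with 250 lines
seq 250 > clients.csv

# Split it into pieces of 100 lines each:
# splitted_clients-aa and -ab get 100 lines, -ac gets the remaining 50
split -l 100 clients.csv splitted_clients-
wc -l splitted_clients-*

# Concatenating the pieces in order rebuilds the original file
cat splitted_clients-* > clients_restored.csv
cmp clients.csv clients_restored.csv && echo "identical"
```

Because split names the pieces in lexicographic order (-aa, -ab, -ac, …), a plain `cat` of the glob restores the original file.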