Taoo (Beijing)

Configuring Pig to use LZO

 
Reposted from: http://stackoverflow.com/questions/7277621/how-to-get-pig-to-work-with-lzo-files
I haven't tried this myself yet.

----------------------
I recently got this to work and wrote up a wiki on it for my coworkers. Here's an excerpt detailing how to get Pig to work with LZOs. Hope this helps someone!

NOTE: This is written with a Mac in mind. The steps will be almost identical for other OSes, and this should definitely give you what you need to know to configure on Windows or Linux, but you will need to extrapolate a bit (obviously, change Mac-centric folders to whatever your OS uses, etc...).
Hooking PIG up to be able to work with LZOs

This was by far the most annoying and time-consuming part for me, not because it's difficult, but because there are 50 different tutorials online, none of which are all that helpful. Anyway, what I did to get this working is:

    1. Clone hadoop-lzo from GitHub at https://github.com/kevinweil/hadoop-lzo.

    2. Compile it to get a hadoop-lzo*.jar and the native *.o libraries. You'll need to compile this on a 64-bit machine.

    3. Copy the native libs to $HADOOP_HOME/lib/native/Mac_OS_X-x86_64-64/.

    4. Copy the Java jar to $HADOOP_HOME/lib and $PIG_HOME/lib.

    5. Then configure Hadoop and Pig so that the property java.library.path points to the LZO native libraries. You can do this in $HADOOP_HOME/conf/mapred-site.xml with:

    <property>
        <name>mapred.child.env</name>
        <value>JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Mac_OS_X-x86_64-64/</value>
    </property>

    6. Now run pig again and try the grunt shell to make sure everything still works. If it doesn't, you probably made a mistake in mapred-site.xml and should double-check it.

    7. Great! We're almost there. All you need to do now is install elephant-bird; clone it from https://github.com/kevinweil/elephant-bird.

    8. To build elephant-bird you'll need quite a few prerequisites. These are listed on the page mentioned above and might change, so I won't repeat them here. What I will stress is that the versions matter: if you grab an incorrect version and run ant, you will get errors. So don't install the prerequisites from brew or MacPorts, as you'll likely get a newer version; instead, download the tarballs and build each one.

    9. Run ant in the elephant-bird folder to create the jar.

    10. For simplicity's sake, move the jars you'll need to register frequently (hadoop-lzo-x.x.x.jar and elephant-bird-x.x.x.jar) somewhere you can easily find them. /usr/local/lib/hadoop/... works nicely.

    11. Try things out! Play around with loading normal files and LZOs in the grunt shell. Register the relevant jars mentioned above, try loading a file, limiting the output to a manageable number, and dumping it. This should all work the same whether you're using a plain text file or an LZO.
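Steps 1-4 can be sketched as shell commands. This is a rough outline, not a tested recipe: the ant targets are from the hadoop-lzo README of that era, the build output paths may differ on your machine, and the Mac_OS_X-x86_64-64 folder name should be swapped for your own platform:

```shell
# Step 1: clone hadoop-lzo (run the build on a 64-bit machine)
git clone https://github.com/kevinweil/hadoop-lzo.git
cd hadoop-lzo

# Step 2: compile the jar and the native libraries
ant compile-native tar

# Steps 3-4: copy the native libs and the jar into Hadoop and Pig
# (adjust the platform folder name for your OS)
cp build/native/Mac_OS_X-x86_64-64/lib/* "$HADOOP_HOME/lib/native/Mac_OS_X-x86_64-64/"
cp build/hadoop-lzo-*.jar "$HADOOP_HOME/lib/"
cp build/hadoop-lzo-*.jar "$PIG_HOME/lib/"
```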
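In the grunt shell, the try-out in step 11 looks roughly like this. The jar versions and the input path are placeholders you'd replace with your own; `LzoTextLoader` is elephant-bird's loader for plain-text LZO files:

```pig
-- Register the jars you parked in /usr/local/lib/hadoop in step 10
REGISTER /usr/local/lib/hadoop/hadoop-lzo-x.x.x.jar;
REGISTER /usr/local/lib/hadoop/elephant-bird-x.x.x.jar;

-- Load an LZO-compressed text file, keep the output manageable, and dump it
logs = LOAD '/data/sample.lzo' USING com.twitter.elephantbird.pig.load.LzoTextLoader();
few  = LIMIT logs 10;
DUMP few;
```

Loading a plain text file works the same way with the default `PigStorage` loader, so you can compare the two side by side.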
