The download packages currently offered on the official site are built for 32-bit systems. After installing one on 64-bit Linux you keep getting the warning "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable". Since the official site does not provide a 64-bit package, you have to compile and package a 64-bit build yourself.
How do you tell whether your Hadoop is 32-bit or 64-bit? My Hadoop is installed under /opt/hadoop-2.7.1/, so I inspect the file libhadoop.so.1.0.0 under /opt/hadoop-2.7.1/lib/native, which reveals the architecture. In my case I had already compiled Hadoop myself, so it is 64-bit; a quick way to check is shown below.
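A minimal check, assuming the install path above, is to run the file command against the native library:
# inspect the ELF header of the native Hadoop library
file /opt/hadoop-2.7.1/lib/native/libhadoop.so.1.0.0
# a 64-bit build is reported as an ELF 64-bit shared object (x86-64); a 32-bit build as ELF 32-bit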
If you have fixed the architecture issue and still see "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable", then set export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native:/usr/local/lib in /etc/profile (takes effect for all users) or in /home/hadoop/.bashrc (takes effect only for the hadoop user). That resolves the warning; a way to verify is shown below.
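To confirm that the native library is now being loaded, Hadoop ships a checknative command; a quick verification, assuming the hadoop binary is on your PATH:
source /etc/profile
hadoop checknative -a
# hadoop and zlib (and snappy, if compiled in) should now be reported as true together with the library path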
2. The correct way to build Hadoop 2.7.1 (per the official documentation)
There are plenty of articles online about building Hadoop, each with a different set of preparation steps, but very few explain why those steps are needed, so beginners can only follow the process passively.
When your operating system is 64-bit Linux but the hadoop-2.7.1 downloaded from the official site is 32-bit, you have to build and package it yourself to get a 64-bit installation package. The problem is that the build guides scattered around the web rarely tell you why each step is done.
When you run into this, the most reliable reference is the official build documentation, which lives in the BUILDING.txt file at the root of the hadoop-2.7.1 source tree. I downloaded the source to /opt/hadoop-2.7.1-src/.
The key points of BUILDING.txt are analyzed below.
1) Build prerequisites
From BUILDING.txt:
Requirements:
* Unix System
* JDK 1.7+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
* Zlib devel (if compiling native code)
* openssl devel ( if compiling native hadoop-pipes and to get the best HDFS encryption performance )
* Jansson C XML parsing library ( if compiling libwebhdfs )
* Linux FUSE (Filesystem in Userspace) version 2.6 or above ( if compiling fuse_dfs )
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
2) The Hadoop Maven modules
- hadoop-project (Parent POM for all Hadoop Maven modules.All plugins & dependencies versions are defined here.)
- hadoop-project-dist (Parent POM for modules that generate distributions.)
- hadoop-annotations (Generates the Hadoop doclet used to generated the Javadocs)
- hadoop-assemblies (Maven assemblies used by the different modules)
- hadoop-common-project (Hadoop Common)
- hadoop-hdfs-project (Hadoop HDFS)
- hadoop-mapreduce-project (Hadoop MapReduce)
- hadoop-tools (Hadoop tools like Streaming, Distcp, etc.)
- hadoop-dist (Hadoop distribution assembler)
3) Where to run the Maven build from
From BUILDING.txt:
Where to run Maven from?
It can be run from any module. The only catch is that if not run from utrunk
all modules that are not part of the build run must be installed in the local
Maven cache or available in a Maven repository.
You can build a single module, or build all modules from the root module. The difference is that building a single module only installs that module's compiled jar into the local Maven repository, while building from the root module installs every module into the local Maven repository and additionally packages a tar.gz Hadoop distribution for this machine, as sketched below.
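A sketch of the two approaches (paths assume the source root used in this article):
# build only one module, e.g. hadoop-common, and install its jar into the local Maven repository
cd /opt/hadoop-2.7.1-src/hadoop-common-project/hadoop-common
mvn install -DskipTests
# build everything from the root module and additionally produce the tar.gz distribution
cd /opt/hadoop-2.7.1-src
mvn package -Pdist -DskipTests -Dtar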
4) About Snappy
From BUILDING.txt:
Snappy build options:
Snappy is a compression library that can be utilized by the native code.
It is currently an optional component, meaning that Hadoop can be built with
or without this dependency.
* Use -Drequire.snappy to fail the build if libsnappy.so is not found.
If this option is not specified and the snappy library is missing,
we silently build a version of libhadoop.so that cannot make use of snappy.
This option is recommended if you plan on making use of snappy and want
to get more repeatable builds.
* Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy
header files and library files. You do not need this option if you have
installed snappy using a package manager.
* Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library
files. Similarly to snappy.prefix, you do not need this option if you have
installed snappy using a package manager.
* Use -Dbundle.snappy to copy the contents of the snappy.lib directory into
the final tar file. This option requires that -Dsnappy.lib is also given,
and it ignores the -Dsnappy.prefix option.
Hadoop can compress stored files with a chosen compression codec and transparently decompress them back to the original format when a client reads them. The formats Hadoop currently supports include LZO, Snappy and others. Snappy support is not enabled by default; to get it, you first install the Snappy library on Linux and then compile Hadoop into an installation package with Snappy support.
Snappy is the most widely used codec in practice, and it is also the preferred codec for HBase, the distributed column-oriented database often deployed later on. Since HBase depends on a Hadoop environment, if you plan to use HBase with Snappy compression, it is worth building Snappy support into Hadoop now; a sketch follows.
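A hedged sketch of a Snappy-enabled build, assuming Snappy is installed via yum on CentOS x86_64 so that libsnappy.so lives under /usr/lib64 (the -D flags themselves come from the BUILDING.txt excerpt above):
# install the system Snappy library and headers first
yum install -y snappy snappy-devel
# fail the build if libsnappy.so is missing, and bundle it into the final tar
mvn package -Pdist,native -DskipTests -Dtar -Drequire.snappy -Dbundle.snappy -Dsnappy.lib=/usr/lib64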
5) Choosing the build command
From BUILDING.txt:
----------------------------------------------------------------------------------
Building distributions:
Create binary distribution without native code and without documentation:
$ mvn package -Pdist -DskipTests -Dtar
Create binary distribution with native code and with documentation:
$ mvn package -Pdist,native,docs -DskipTests -Dtar
Create source distribution:
$ mvn package -Psrc -DskipTests
Create source and binary distributions with native code and documentation:
$ mvn package -Pdist,native,docs,src -DskipTests -Dtar
Create a local staging version of the website (in /tmp/hadoop-site)
$ mvn clean site; mvn site:stage -DstagingDirectory=/tmp/hadoop-site
----------------------------------------------------------------------------------
Roughly: -Pdist produces a binary distribution, adding native also compiles the native code, docs generates the documentation, and -Psrc produces a source distribution; -DskipTests skips the unit tests and -Dtar packages the result as a tar.gz archive.
6) Single-node and cluster installation documentation
From BUILDING.txt:
----------------------------------------------------------------------------------
Installing Hadoop
Look for these HTML files after you build the document by the above commands.
* Single Node Setup:
hadoop-project-dist/hadoop-common/SingleCluster.html
* Cluster Setup:
hadoop-project-dist/hadoop-common/ClusterSetup.html
----------------------------------------------------------------------------------
7) Maven memory settings when building Hadoop
From BUILDING.txt:
----------------------------------------------------------------------------------
If the build process fails with an out of memory error, you should be able to fix
it by increasing the memory used by maven -which can be done via the environment
variable MAVEN_OPTS.
Here is an example setting to allocate between 256 and 512 MB of heap space to
Maven
export MAVEN_OPTS="-Xms256m -Xmx512m"
----------------------------------------------------------------------------------
Roughly: if the Maven build fails with an out-of-memory error, first raise Maven's memory limits via the MAVEN_OPTS environment variable, for example export MAVEN_OPTS="-Xms256m -Xmx512m" on Linux. The same trick also helps later when compiling the Spark source code.
3. Preparing the build prerequisites
1) Unix System (a Linux operating system; install it yourself, or use a virtual machine if you have no spare hardware)
2)JDK 1.7+
# 1. OpenJDK is not recommended; use the Oracle JDK from the official site instead.
# 2. First uninstall the system's bundled old JDK / OpenJDK:
java -version                                              # check the Java version currently installed
rpm -qa | grep gcj                                         # list the exact package names
rpm -e --nodeps java-1.5.0-gcj-1.5.0.0-29.1.el6.x86_64     # remove them
# 3. Install jdk-7u65-linux-x64.gz
# Download jdk-7u65-linux-x64.gz, place it at /opt/java/jdk-7u65-linux-x64.gz, and extract it:
cd /opt/java/
tar -zxvf jdk-7u65-linux-x64.gz
# Configure the Linux environment variables
vi /etc/profile
# Append the following at the end of the file:
export JAVA_HOME=/opt/java/jdk1.7.0_65
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
# Apply the configuration
source /etc/profile
# 4. Check that the JDK is set up correctly:
java -version
3)Maven 3.0 or later
# 1. Download apache-maven-3.3.3.tar.gz, place it in /opt/, and extract it:
cd /opt
tar zxvf apache-maven-3.3.3.tar.gz
# 2. Configure the environment variables
vi /etc/profile
# Add the following:
MAVEN_HOME=/opt/apache-maven-3.3.3
export MAVEN_HOME
export PATH=${PATH}:${MAVEN_HOME}/bin
# 3. Apply the configuration
source /etc/profile
# 4. Check that Maven is installed correctly:
mvn -version
# 5. Configure a Maven mirror. By default Maven downloads dependency jars and plugins from the central repository, which is hosted overseas; from within China it is much faster to use a domestic mirror.
# If you are not familiar with Maven, read up on it first. It is best not to modify /opt/apache-maven-3.3.3/conf/settings.xml, because that file affects all users. Instead, edit the settings file under the current user's home directory; for the hadoop user that is /home/hadoop/.m2/settings.xml.
# Add the following content to that file. The oschina mirror is no longer available, so use the Aliyun mirror instead:
<?xml version="1.0" encoding="UTF-8"?>
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
<!-- By default dependencies are downloaded to /home/hadoop/.m2/repository; here I point the local repository at /opt/maven-localRepository instead -->
<localRepository>/opt/maven-localRepository</localRepository>
<pluginGroups></pluginGroups>
<proxies></proxies>
<servers></servers>
<mirrors>
<!--add by aperise start-->
<mirror>
<id>alimaven</id>
<name>aliyun maven</name>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
<mirrorOf>central</mirrorOf>
</mirror>
<!--add by aperise end-->
</mirrors>
<profiles>
</profiles>
</settings>
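To check that the mirror and the relocated local repository are actually in effect, the standard Maven Help plugin can print the merged settings:
mvn help:effective-settings
# the output should list the alimaven mirror and localRepository=/opt/maven-localRepository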
4)Findbugs 3.0.1 (if running findbugs)
# 1. Install: extract findbugs-3.0.1.tar.gz (here under /opt/):
tar zxvf findbugs-3.0.1.tar.gz
# 2. Configure the environment variables
vi /etc/profile
# Add the following:
export FINDBUGS_HOME=/opt/findbugs-3.0.1
export PATH=$PATH:$FINDBUGS_HOME/bin
# 3. Apply the configuration
source /etc/profile
# 4. Type findbugs to check the installation:
findbugs
5)ProtocolBuffer 2.5.0
# 1. Install (cmake is needed first, so complete prerequisite 6 before this step):
tar zxvf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure --prefix=/usr/local/protobuf
make
make check
make install
# 2. Configure the environment variables
vi /etc/profile
# Add the following:
export PATH=$PATH:/usr/local/protobuf/bin
export PKG_CONFIG_PATH=/usr/local/protobuf/lib/pkgconfig/
# 3. Apply the configuration
source /etc/profile
# 4. Type protoc --version to check the installation:
protoc --version
6)CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
# 1. Prerequisites:
yum install gcc-c++
yum install ncurses-devel
# 2. Install
# Option 1: install directly with yum install cmake
# Option 2: download the tar.gz and build from source
# Download cmake-3.3.2.tar.gz, then build and install it:
tar -zxv -f cmake-3.3.2.tar.gz
cd cmake-3.3.2
./bootstrap
make
make install
# 3. Type cmake to check the installation:
cmake
7)Zlib devel (if compiling native code)
# zlib and general build tooling (CentOS/yum package names)
yum -y install gcc gcc-c++ make autoconf automake libtool cmake pkgconfig openssl-devel zlib-devel
8)openssl devel ( if compiling native hadoop-pipes and to get the best HDFS encryption performance )
yum install openssl-devel
9)Jansson C XML parsing library ( if compiling libwebhdfs )
10)Linux FUSE (Filesystem in Userspace) version 2.6 or above ( if compiling fuse_dfs )
11) Internet connection for first build (to fetch all Maven and Hadoop dependencies)
Items 9) and 10) above are optional: you only need them if you plan to compile libwebhdfs or fuse_dfs (see the sketch below). The internet connection, however, is required for the first build.
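If you do want libwebhdfs and fuse_dfs, the development packages can be installed beforehand; a sketch with CentOS package names (jansson-devel typically comes from the EPEL repository, and names may differ on other distributions):
yum install -y fuse fuse-devel         # FUSE headers, needed for fuse_dfs
yum install -y jansson jansson-devel   # Jansson C library, needed for libwebhdfs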
While setting up my environment I installed quite a few packages; at the time I also ran the following command to pull in some missing libraries (CentOS/yum package names):
yum -y install gcc gcc-c++ make autoconf automake libtool pkgconfig openssl-devel zlib-devel
4. Building the Hadoop source
Go to the source root directory /opt/hadoop-2.7.1-src and run:
cd /opt/hadoop-2.7.1-src
export MAVEN_OPTS="-Xms256m -Xmx512m"
mvn package -Pdist,native,docs -DskipTests -Dtar
The build takes half an hour or more and downloads the jars the source depends on from the public internet, so be patient. When the build completes, the output ends like this:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Common ............................... SUCCESS [02:27 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [ 4.841 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 15.176 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [ 0.055 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [03:36 min]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [ 21.601 s]
[INFO] Apache Hadoop HDFS BookKeeper Journal .............. SUCCESS [ 4.182 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [ 3.577 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [ 0.036 s]
[INFO] hadoop-yarn ........................................ SUCCESS [ 0.033 s]
[INFO] hadoop-yarn-api .................................... SUCCESS [01:53 min]
[INFO] hadoop-yarn-common ................................. SUCCESS [ 23.525 s]
[INFO] hadoop-yarn-server ................................. SUCCESS [ 0.042 s]
[INFO] hadoop-yarn-server-common .......................... SUCCESS [ 8.896 s]
[INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [ 11.562 s]
[INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [ 3.324 s]
[INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [ 6.115 s]
[INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 14.149 s]
[INFO] hadoop-yarn-server-tests ........................... SUCCESS [ 3.887 s]
[INFO] hadoop-yarn-client ................................. SUCCESS [ 5.333 s]
[INFO] hadoop-yarn-server-sharedcachemanager .............. SUCCESS [ 2.249 s]
[INFO] hadoop-yarn-applications ........................... SUCCESS [ 0.032 s]
[INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [ 1.915 s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [ 1.450 s]
[INFO] hadoop-yarn-site ................................... SUCCESS [ 0.049 s]
[INFO] hadoop-yarn-registry ............................... SUCCESS [ 4.165 s]
[INFO] hadoop-yarn-project ................................ SUCCESS [ 4.168 s]
[INFO] hadoop-mapreduce-client ............................ SUCCESS [ 0.077 s]
[INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 15.869 s]
[INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 15.401 s]
[INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [ 2.696 s]
[INFO] hadoop-mapreduce-client-app ........................ SUCCESS [ 5.780 s]
[INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [ 4.528 s]
[INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [ 3.592 s]
[INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [ 1.262 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [ 3.969 s]
[INFO] hadoop-mapreduce ................................... SUCCESS [ 3.829 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [ 2.999 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [ 7.995 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [ 1.425 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [ 4.508 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [ 3.023 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [ 1.896 s]
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [ 1.633 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [ 2.256 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [ 1.738 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [ 3.198 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [ 8.421 s]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [ 2.808 s]
[INFO] Apache Hadoop Client ............................... SUCCESS [ 10.124 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [ 0.097 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [ 3.395 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 10.150 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [ 0.035 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [01:48 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:12 min
[INFO] Finished at: 2015-11-04T16:08:14+08:00
[INFO] Final Memory: 139M/1077M
[INFO] ------------------------------------------------------------------------
The built Hadoop installation package ends up in the following location:
cd /opt/hadoop-2.7.1-src/hadoop-dist/target
ls
antrun hadoop-2.7.1.tar.gz maven-archiver
dist-layout-stitching.sh hadoop-dist-2.7.1.jar test-dir
dist-tar-stitching.sh hadoop-dist-2.7.1-javadoc.jar
hadoop-2.7.1 javadoc-bundle-options
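The hadoop-2.7.1.tar.gz in this directory is the newly built 64-bit installation package. As a final check, tying back to the start of this article, you can confirm that the native library it contains really is 64-bit:
cd /opt/hadoop-2.7.1-src/hadoop-dist/target
file hadoop-2.7.1/lib/native/libhadoop.so.1.0.0
# should report an ELF 64-bit shared object on a successful 64-bit build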
5. Sharing the dependency packages for the Hadoop build environment