Spark/Hadoop/Zeppelin Upgrade(1)

 

1 Install JDK1.8 Manually
> wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u77-b03/jdk-8u77-linux-x64.tar.gz"

Unzip it, place it in the right place, and add its bin directory to the PATH.
> java -version
java version "1.8.0_77"
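
A minimal sketch of the unpack and PATH setup, assuming the same layout as the other tools here (unpack under /home/carl/tool, symlink into /opt/jdk):
> tar zxvf jdk-8u77-linux-x64.tar.gz
> sudo ln -s /home/carl/tool/jdk1.8.0_77 /opt/jdk

Then in ~/.bashrc (or ~/.profile):
export JAVA_HOME="/opt/jdk"
export PATH="$JAVA_HOME/bin:$PATH"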

2 MAVEN Installation
http://sillycat.iteye.com/blog/2193762

> wget http://apache.arvixe.com/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz

Unzip it, place it in the right place, and add its bin directory to the PATH.
> mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T10:41:47-06:00)
Maven home: /opt/maven
Java version: 1.8.0_77, vendor: Oracle Corporation
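
A minimal sketch of the unpack and PATH setup, assuming /opt/maven as the Maven home shown above:
> tar zxvf apache-maven-3.3.9-bin.tar.gz
> sudo ln -s /home/carl/tool/apache-maven-3.3.9 /opt/maven

export PATH="/opt/maven/bin:$PATH"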

3 Protoc Installation
> git clone https://github.com/google/protobuf.git

> sudo apt-get install unzip

> sudo apt-get install autoconf

> sudo apt-get install build-essential libtool

Run configure, make, and make install, then add the bin directory to the PATH.
> protoc --version
libprotoc 3.0.0

Error Message:
'libprotoc 3.0.0', expected version is '2.5.0'

Solution:
Switch to 2.5.0
> git checkout tags/v2.5.0

> ./autogen.sh

> ./configure --prefix=/home/carl/tool/protobuf-2.5.0

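The build and install steps between configure and the version check are roughly as follows (the prefix is the one given above, so no sudo is needed):
> make
> make install

export PATH="/home/carl/tool/protobuf-2.5.0/bin:$PATH"
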
> protoc --version
libprotoc 2.5.0

4 HADOOP Installation
> wget http://mirrors.ibiblio.org/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2-src.tar.gz

> mvn package -Pdist,native -DskipTests -Dtar

Error Message:
Cannot run program "cmake"

Solution:
> sudo apt-get install cmake

Error Message:
An Ant BuildException has occured: exec returned: 1

Solution:
Try to get more details:
> mvn package -Pdist,native -DskipTests -Dtar -e

> mvn package -Pdist,native -DskipTests -Dtar -X

> sudo apt-get install zlib1g-dev

> sudo apt-get install libssl-dev

But that still did not work.

So, I switched to using the pre-built binary instead.
> wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
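
A minimal sketch of unpacking it, assuming the same layout as the other tools (unpack under /home/carl/tool, symlink into /opt/hadoop as used below):
> tar zxvf hadoop-2.7.2.tar.gz
> sudo ln -s /home/carl/tool/hadoop-2.7.2 /opt/hadoop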

http://sillycat.iteye.com/blog/2193762

http://sillycat.iteye.com/blog/2090186

Configure the JAVA_HOME
export JAVA_HOME="/opt/jdk"

PATH="/opt/hadoop/bin:$PATH"

Format the namenode with the command
> hdfs namenode -format

Set up SSH on ubuntu-master, ubuntu-dev1, ubuntu-dev2
> ssh-keygen -t rsa

> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
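
The master's public key also has to end up in authorized_keys on each slave so that start-dfs.sh can log in without a password. A sketch, assuming the user is carl as in the paths above:
> ssh-copy-id carl@ubuntu-dev1
> ssh-copy-id carl@ubuntu-dev2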

Find the configuration file /opt/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME="/opt/jdk"

Follow the documents and make the configurations:
http://sillycat.iteye.com/blog/2090186

Command to start DFS
> sbin/start-dfs.sh

Error Message:
java.io.IOException: Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
        at org.apache.hadoop.hdfs.DFSUtil.getNNServiceRpcAddressesForCluster(DFSUtil.java:875)
        at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:155)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1125)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:428)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2370)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2257)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2304)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2481)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2505)

Solution:
Configure the same XML files on the master as well.
Change the slaves file to point to ubuntu-dev1 and ubuntu-dev2.
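
For example, a minimal core-site.xml shared by all nodes (port 9000 matches the datanode logs below) and the slaves file could look roughly like this:

/opt/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ubuntu-master:9000</value>
  </property>
</configuration>

/opt/hadoop/etc/hadoop/slaves
ubuntu-dev1
ubuntu-dev2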

Error Message:
2016-03-28 13:31:14,371 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/carl/tool/hadoop-2.7.2/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:327)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)

Solution:
Make sure we have the DFS directories
> mkdir -p /opt/hadoop/dfs/data

> mkdir -p /opt/hadoop/dfs/name

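If those directories are configured explicitly, hdfs-site.xml would look roughly like this (the values are assumptions matching the paths above):
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///opt/hadoop/dfs/data</value>
  </property>
</configuration>
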
Check if the DFS is running
> jps
2038 SecondaryNameNode
1816 NameNode
2169 Jps

Visit the console page:
http://ubuntu-master:50070/dfshealth.html#tab-overview

Error Message:
2016-03-28 14:20:16,180 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: ubuntu-master/192.168.56.104:9000
2016-03-28 14:20:22,183 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ubuntu-master/192.168.56.104:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

> telnet ubuntu-master 9000
Trying 192.168.56.104...
telnet: Unable to connect to remote host: Connection refused

Solution:
I can telnet to that port on ubuntu-master itself, but not from ubuntu-dev1 and ubuntu-dev2. I guess it is a firewall problem.
> sudo ufw disable
Firewall stopped and disabled on system startup

Then I also deleted the IPv6-related entries in /etc/hosts.
> cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 ubuntu-dev2.ec2.internal

192.168.56.104   ubuntu-master
192.168.56.105   ubuntu-dev1
192.168.56.106   ubuntu-dev2
192.168.56.107   ubuntu-build

Start YARN cluster
> sbin/start-yarn.sh

http://ubuntu-master:8088/cluster

5 Spark Installation
http://sillycat.iteye.com/blog/2103457

Download the latest Spark version
> wget http://mirror.nexcess.net/apache/spark/spark-1.6.1/spark-1.6.1-bin-without-hadoop.tgz

Unzip it and place it in the right place.
http://spark.apache.org/docs/latest/running-on-yarn.html

> cat conf/spark-env.sh

HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

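Because this package is the "without hadoop" build, conf/spark-env.sh usually also needs the Hadoop classpath; a sketch, assuming the /opt/hadoop layout above:
HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)
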
Alternatively, we can build Spark ourselves.
http://spark.apache.org/docs/latest/building-spark.html

> build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.2 -Phive -DskipTests clean package

[WARNING] The requested profile "hadoop-2.7" could not be activated because it does not exist.

That means this Spark build has no hadoop-2.7 profile, so we need to build against Hadoop 2.6.x instead, e.g. 2.6.4.
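
With Hadoop 2.6.4 the build command would then roughly become:
> build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.4 -Phive -DskipTests clean package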

6 Install NodeJS
http://sillycat.iteye.com/blog/2284695

> wget https://nodejs.org/dist/v4.4.0/node-v4.4.0.tar.gz

> sudo ln -s /home/carl/tool/node-v4.4.0 /opt/node-v4.4.0
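
If this tarball is the pre-built Linux distribution, the unpacked directory already has a bin/ folder and only needs to go on the PATH; if it is the source tarball, it first needs ./configure, make, and make install. The PATH line, assuming the /opt symlink above:
export PATH="/opt/node-v4.4.0/bin:$PATH"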

7 Zeppelin Installation
http://sillycat.iteye.com/blog/2216604

http://sillycat.iteye.com/blog/2223622

http://sillycat.iteye.com/blog/2242559

Check git version
> git --version
git version 1.9.1

Java Version
> java -version
java version "1.8.0_77"

Check nodeJS version
> node --version && npm --version
v4.4.0
2.14.20

Install dependencies
> sudo apt-get install libfontconfig

Check MAVEN
> mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T10:41:47-06:00)

Add MAVEN parameters
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m"

> mvn clean package -DskipTests -Pspark-1.6 -Dspark.version=1.6.1 -Phadoop-2.6 -Dhadoop.version=2.6.
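
Once the build succeeds, Zeppelin can be started with its daemon script and, by default, reached on port 8080 (assuming it runs on ubuntu-master):
> bin/zeppelin-daemon.sh start

http://ubuntu-master:8080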

References:
http://sillycat.iteye.com/blog/2244147

http://sillycat.iteye.com/blog/2193762

zeppelin
https://github.com/apache/incubator-zeppelin/blob/master/README.md