Data Solution 2019(3)Run Zeppelin in Single Docker

 

Exception when Starting HDFS in Docker
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Solution:
Adding these exports to the environment solves the problem.
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
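These users can also be defined in the Hadoop configuration itself so that every shell picks them up; a minimal sketch appending them to hadoop-env.sh, assuming the /tool/hadoop layout used later in this post:
#Assumption: Hadoop is unpacked at /tool/hadoop as in the Dockerfile below
cat >> /tool/hadoop/etc/hadoop/hadoop-env.sh <<'EOF'
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF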
Exception when Starting HDFS in Docker
Starting namenodes on [0.0.0.0]
0.0.0.0: /tool/hadoop-3.2.0/bin/../libexec/hadoop-functions.sh: line 982: ssh: command not found
Starting datanodes
localhost: /tool/hadoop-3.2.0/bin/../libexec/hadoop-functions.sh: line 982: ssh: command not found
Starting secondary namenodes [140815a59b06]
140815a59b06: /tool/hadoop-3.2.0/bin/../libexec/hadoop-functions.sh: line 982: ssh: command not found
Solution:
https://stackoverflow.com/questions/40801417/installing-ssh-in-the-docker-containers
Install and start the SSH server
RUN apt-get install -y openssh-server
RUN mkdir /var/run/sshd
RUN ssh-keygen -q -t rsa -N '' -f /root/.ssh/id_rsa
RUN cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
#start ssh service
nohup /usr/sbin/sshd -D >/dev/stdout &
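One more SSH gotcha worth noting: the first connection to localhost prompts for host key confirmation, which can block the non-interactive Hadoop start scripts. A minimal sketch to suppress the prompt, assuming the client config lives at /etc/ssh/ssh_config:
#disable the interactive host key prompt for the Hadoop scripts
RUN echo "    StrictHostKeyChecking no" >> /etc/ssh/ssh_config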
Exception when Starting HDFS
ERROR: JAVA_HOME is not set and could not be found
Solution:
Add JAVA_HOME to hadoop-env.sh
export JAVA_HOME="/usr/lib/jvm/java-8-oracle"
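If the exact JDK path is uncertain, it can also be derived from whatever java is on the PATH; a small sketch, assuming GNU readlink is available in the container:
#resolve JAVA_HOME from the java binary on the PATH
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))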
It seems HDFS is running fine in Docker.
But from the UI at http://localhost:9870/dfshealth.html#tab-overview, I get this error:
Exception:
Permission denied: user=dr.who, access=WRITE, inode="/":root:supergroup:drwxr-xr-x
Solution:
https://stackoverflow.com/questions/11593374/permission-denied-at-hdfs
Since this is my local Docker, I will just disable permissions in hdfs-site.xml (on Hadoop 3 the current property name is dfs.permissions.enabled, but the old key still maps to it):
    <property>
          <name>dfs.permissions</name>
          <value>false</value>
    </property>
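After restarting HDFS, a quick sanity check from inside the container should show that writes succeed; a sketch assuming the /tool/hadoop layout used in this post:
#Assumption: run inside the container with HDFS already started
/tool/hadoop/bin/hdfs dfs -mkdir /test-write
/tool/hadoop/bin/hdfs dfs -ls /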
Check Docker Stats
> docker stats
My memory limit is only 2 GB, which is too small; maybe the CPU is not powerful enough either.
CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
382b064708ec        ubuntu-spark-1.0    0.64%               1.442GiB / 1.952GiB   73.89%              216kB / 437kB       255MB / 10.1MB      256
> nproc
4
Maybe the CPU is OK.
I am using a Mac, so the way to increase the memory is to open the tool:
Docker Desktop —> Preferences —> Advanced —> CPUs 4, Memory 2GB, Swap 1.0GB
https://stackoverflow.com/questions/44533319/how-to-assign-more-memory-to-docker-container
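Resources can also be capped per container at run time instead of changing the global Docker Desktop limit; a sketch using standard docker run flags (the 4g value is just an example, and the Docker Desktop VM limit still applies as an upper bound):
#give this one container up to 4 GB of RAM and 2 CPUs
docker run -d --memory=4g --cpus=2 --name ubuntu-spark-1.0 sillycat/public:ubuntu-spark-1.0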
Clean up the Docker images I am not using anymore:
> docker images | grep none | awk '{ print $3; }' | xargs docker rmi
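On Docker 1.13 and later the same cleanup is built in; an equivalent one-liner:
#remove all dangling (<none>) images
docker image prune -f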
Official Website
https://hub.docker.com/r/apache/zeppelin/dockerfile
Finally, I made it work.
conf/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://0.0.0.0:9000</value>
    </property>
</configuration>
conf/hadoop-env.sh
export JAVA_HOME="/usr/lib/jvm/java-8-oracle"
export HADOOP_OS_TYPE=${HADOOP_OS_TYPE:-$(uname -s)}
case ${HADOOP_OS_TYPE} in
  Darwin*)
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= "
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.kdc= "
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf= "
  ;;
esac
conf/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
          <name>dfs.permissions</name>
          <value>false</value>
    </property>
</configuration>
conf/spark-env.sh
HADOOP_CONF_DIR=/tool/hadoop/etc/hadoop
We need to put zeppelin/conf and zeppelin/notebook outside the container and map them into the Docker application to persist the data.
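To seed those host-side directories before the first run, the default conf can be pulled straight out of the tarball; a minimal sketch, assuming the install/ layout the Makefile below also uses:
#extract the default Zeppelin conf/ from the tarball into ./zeppelin/conf
mkdir -p zeppelin
tar xzf install/zeppelin-0.8.1-bin-all.tgz -C /tmp zeppelin-0.8.1-bin-all/conf
mv /tmp/zeppelin-0.8.1-bin-all/conf zeppelin/conf
mkdir -p zeppelin/notebook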
This is the important Dockerfile
#Run a Hadoop/Spark/Zeppelin server side
#Prepare the OS
FROM            ubuntu:16.04
MAINTAINER      Carl Luo <luohuazju@gmail.com>
ENV DEBIAN_FRONTEND noninteractive
ENV JAVA_HOME       /usr/lib/jvm/java-8-oracle
ENV LANG            en_US.UTF-8
ENV LC_ALL          en_US.UTF-8
RUN apt-get -qq update
RUN apt-get -qqy dist-upgrade
#Prepare the dependencies
RUN apt-get install -qy wget unzip vim
RUN apt-get install -qy iputils-ping
#Install SUN JAVA
RUN apt-get update && \
  apt-get install -y --no-install-recommends locales && \
  locale-gen en_US.UTF-8 && \
  apt-get dist-upgrade -y && \
  apt-get --purge remove openjdk* && \
  echo "oracle-java8-installer shared/accepted-oracle-license-v1-1 select true" | debconf-set-selections && \
  echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu xenial main" > /etc/apt/sources.list.d/webupd8team-java-trusty.list && \
  apt-key adv --keyserver keyserver.ubuntu.com --recv-keys EEA14886 && \
  apt-get update && \
  apt-get install -y --no-install-recommends oracle-java8-installer oracle-java8-set-default && \
  apt-get clean all
#Prepare for hadoop and spark
RUN apt-get install -y openssh-server
RUN mkdir /var/run/sshd
RUN ssh-keygen -q -t rsa -N '' -f /root/.ssh/id_rsa
RUN cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
RUN            mkdir /tool/
WORKDIR        /tool/
#add the software hadoop
ADD            install/hadoop-3.2.0.tar.gz /tool/
RUN            ln -s /tool/hadoop-3.2.0 /tool/hadoop
ADD            conf/core-site.xml /tool/hadoop/etc/hadoop/
ADD            conf/hdfs-site.xml /tool/hadoop/etc/hadoop/
ADD            conf/hadoop-env.sh /tool/hadoop/etc/hadoop/
#add the software spark
ADD            install/spark-2.4.0-bin-hadoop2.7.tgz /tool/
RUN            ln -s /tool/spark-2.4.0-bin-hadoop2.7 /tool/spark
ADD            conf/spark-env.sh /tool/spark/conf/
#add the software zeppelin
ADD            install/zeppelin-0.8.1-bin-all.tgz /tool/
RUN            ln -s /tool/zeppelin-0.8.1-bin-all /tool/zeppelin
#set up the app
EXPOSE  9000 9870 8080 4040
RUN     mkdir -p /app/
ADD     start.sh /app/
WORKDIR /app/
CMD    [ "./start.sh" ]
This is the Makefile that makes it work:
IMAGE=sillycat/public
TAG=ubuntu-spark-1.0
NAME=ubuntu-spark-1.0
prepare:
    wget http://mirror.olnevhost.net/pub/apache/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz -P install/
    wget http://ftp.wayne.edu/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz -P install/
    wget http://apache.claz.org/zeppelin/zeppelin-0.8.1/zeppelin-0.8.1-bin-all.tgz -P install/
docker-context:
build: docker-context
    docker build -t $(IMAGE):$(TAG) .
run:
    docker run -d -p 9870:9870 -p 9000:9000 -p 8080:8080 -p 4040:4040 -v $(shell pwd)/zeppelin/notebook:/tool/zeppelin/notebook -v $(shell pwd)/zeppelin/conf:/tool/zeppelin/conf --name $(NAME) $(IMAGE):$(TAG)
debug:
    docker run -ti -p 9870:9870 -p 9000:9000 -p 8080:8080 -p 4040:4040 -v $(shell pwd)/zeppelin/notebook:/tool/zeppelin/notebook -v $(shell pwd)/zeppelin/conf:/tool/zeppelin/conf --name $(NAME) $(IMAGE):$(TAG) /bin/bash
clean:
    docker stop ${NAME}
    docker rm ${NAME}
logs:
    docker logs ${NAME}
publish:
    docker push ${IMAGE}
This is the start.sh that starts the application:
#!/bin/sh -ex
#prepare ENV
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"
export SPARK_HOME="/tool/spark"
#start ssh service
nohup /usr/sbin/sshd -D >/dev/stdout &
#start the service
cd /tool/hadoop
bin/hdfs namenode -format
sbin/start-dfs.sh
cd /tool/zeppelin
bin/zeppelin.sh
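One caveat with this script: it re-formats the NameNode on every container start, so HDFS content does not survive a restart. A guarded variant of the format step, assuming the default dfs.namenode.name.dir under /tmp/hadoop-root:
#only format on the first boot (assumption: default name dir for the root user)
if [ ! -d /tmp/hadoop-root/dfs/name/current ]; then
  bin/hdfs namenode -format
fi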
After that, we can visit these 3 UIs to work on our data:
### Hadoop 3.2.0 Spark 2.4.0 Zeppelin 0.8.1
### HDFS
http://localhost:9870/explorer.html#/
### Zeppelin UI
http://localhost:8080/
### After you Run the First Demo JOB, Spark Jobs UI
http://localhost:4040/stages/
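A quick smoke test from the host confirms the mapped ports are answering; a sketch assuming the port mappings from the run target above:
#expect JMX JSON from the NameNode and a 200 from Zeppelin
curl -s http://localhost:9870/jmx | head -n 5
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/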

References:
https://stackoverflow.com/questions/48129029/hdfs-namenode-user-hdfs-datanode-user-hdfs-secondarynamenode-user-not-defined
https://www.cnblogs.com/sylar5/p/9169090.html
https://www.jianshu.com/p/b49712bbe044
https://stackoverflow.com/questions/40801417/installing-ssh-in-the-docker-containers
https://stackoverflow.com/questions/27504187/ssh-key-generation-using-dockerfile
https://github.com/twang2218/docker-zeppelin