Hadoop Docker 2019 Version 3.2.1

 

I am trying to set up HDFS in Docker so that a single server can provide a distributed file system. That is it. The files there can then be easily shared across multiple machines.

Exception:
> systemctl start sshd
Failed to get D-Bus connection: Operation not permitted

Solution:
I could not fix this on CentOS: systemd is not running as PID 1 inside a plain container, so systemctl cannot manage services. I switched to Ubuntu instead.
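
If you do want to stay on a systemd-less container, a common workaround (a sketch, not something I verified on CentOS here) is to skip systemctl and start the daemon binary directly; this is also what the start.sh at the end of this post does:
> mkdir -p /var/run/sshd
> /usr/sbin/sshd -D &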

Set Up Client and Try
> wget http://apache-mirror.8birdsvideo.com/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
> tar zxvf hadoop-3.2.1.tar.gz
> mv hadoop-3.2.1 ~/tool/
Place it in the working directory and add the bin directory to the PATH
PATH=$PATH:/opt/hadoop/bin
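
Note that this path does not match where I actually unpacked the tarball; assuming the ~/tool layout from the steps above (and from the version output below), the line would be:
export PATH=$PATH:~/tool/hadoop-3.2.1/bin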

Check version
> hdfs version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /home/carl/tool/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar

List the files
> hdfs dfs -ls hdfs://localhost:9000/
Found 1 items
drwxr-xr-x   - dr.who supergroup          0 2019-12-07 16:25 hdfs://localhost:9000/hello

The put command works
> hdfs dfs -put ./README.txt hdfs://localhost:9000/hello/README.txt
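
The standard counterpart commands work the same way; for example, reading the file back as a quick sanity check:
> hdfs dfs -cat hdfs://localhost:9000/hello/README.txt
> hdfs dfs -get hdfs://localhost:9000/hello/README.txt /tmp/README.txt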

But I could not upload or download from the web console. Checking the browser developer tools, I found that the web UI redirects file transfers to the Docker container's hostname on DataNode port 9864.
https://my.oschina.net/u/3163032/blog/1622221
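
Since port 9864 is published by the container and the container hostname is set to rancher-worker1 (see the Makefile below), one fix on the client side is to map that hostname to the Docker host's IP. A sketch, where <docker-host-ip> is a placeholder for your actual host address:
> echo "<docker-host-ip> rancher-worker1" | sudo tee -a /etc/hosts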

Official documentation and a related guide
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/HttpAuthentication.html
https://note.louyj.com/blog/post/louyj/Authentication-for-Hadoop-HTTP-web-consoles
Add these properties to core-site.xml:
    <property>
        <name>hadoop.http.filter.initializers</name>
        <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
    </property>
    <property>
        <name>hadoop.http.authentication.type</name>
        <value>simple</value>
    </property>
    <property>
        <name>hadoop.http.authentication.token.validity</name>
        <value>12000</value>
    </property>
    <property>
        <name>hadoop.http.authentication.simple.anonymous.allowed</name>
        <value>false</value>
    </property>
    <property>
        <name>hadoop.http.authentication.signature.secret.file</name>
        <value>/tool/hadoop/hadoop-http-auth-signature-secret</value>
    </property>
    <property>
        <name>hadoop.http.authentication.cookie.domain</name>
        <value></value>
    </property>

The hadoop-http-auth-signature-secret file is a plain text file containing the signing secret (hello!123). With simple authentication and anonymous access disallowed, requests must carry a user.name query parameter, so this URL works:
http://rancher-worker1:9870/explorer.html?user.name=hello!123#/
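
The same user.name parameter works for the WebHDFS REST API; for example, listing the directory with curl (a sketch using the standard LISTSTATUS operation; the single quotes keep the shell from interpreting the '!'):
> curl 'http://rancher-worker1:9870/webhdfs/v1/hello?op=LISTSTATUS&user.name=hello!123'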

Warning:
2019-12-08 01:56:21,717 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false

It is only an INFO-level message; I have not found a way to disable it yet.
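
Since it comes from the SaslDataTransferClient logger, one way to hide it (an untested sketch, assuming the stock etc/hadoop/log4j.properties) is to raise that logger's level:
log4j.logger.org.apache.hadoop.hdfs.protocol.datatransfer.sasl=WARN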

The most important configuration file, conf/core-site.xml (hadoop.http.authentication.token.validity is measured in seconds):
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://0.0.0.0:9000</value>
    </property>
    <property>
        <name>hadoop.http.filter.initializers</name>
        <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
    </property>
    <property>
        <name>hadoop.http.authentication.type</name>
        <value>simple</value>
    </property>
    <property>
        <name>hadoop.http.authentication.token.validity</name>
        <value>12000</value>
    </property>
    <property>
        <name>hadoop.http.authentication.simple.anonymous.allowed</name>
        <value>false</value>
    </property>
    <property>
        <name>hadoop.http.authentication.signature.secret.file</name>
        <value>/tool/hadoop/hadoop-http-auth-signature-secret</value>
    </property>
    <property>
        <name>hadoop.http.authentication.cookie.domain</name>
        <value></value>
    </property>
</configuration>

Nothing special in conf/hadoop-env.sh
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
export HADOOP_OS_TYPE=${HADOOP_OS_TYPE:-$(uname -s)}
case ${HADOOP_OS_TYPE} in
  Darwin*)
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= "
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.kdc= "
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf= "
  ;;
esac

Secret password file conf/hadoop-http-auth-signature-secret
hello123

Configuration file conf/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
          <name>dfs.permissions</name>
          <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>0.0.0.0:9870</value>
      </property>
      <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:9864</value>
      </property>
</configuration>
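
One note: in Hadoop 3.x the documented key for disabling permission checks is dfs.permissions.enabled (dfs.permissions is the legacy name). To verify the container actually picked these values up, hdfs getconf should read back the configured values:
> hdfs getconf -confKey dfs.replication
1
> hdfs getconf -confKey dfs.namenode.http-address
0.0.0.0:9870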

All steps are in the Dockerfile
#Run a Hadoop HDFS server

#Prepare the OS
FROM            ubuntu:16.04
MAINTAINER      Carl Luo <luohuazju@gmail.com>

ENV DEBIAN_FRONTEND noninteractive
ENV JAVA_HOME       /usr/lib/jvm/java-8-openjdk-amd64
ENV LANG            en_US.UTF-8
ENV LC_ALL          en_US.UTF-8

RUN apt-get -qq update
RUN apt-get -qqy dist-upgrade

#Prepare the dependencies
RUN apt-get install -qy wget unzip vim
RUN apt-get install -qy iputils-ping

#Install JAVA
RUN apt-get update && \
    apt-get install -y --no-install-recommends locales && \
    locale-gen en_US.UTF-8 && \
    apt-get dist-upgrade -y && \
    apt-get install -qy openjdk-8-jdk

#Prepare for hadoop and spark
RUN apt-get install -y openssh-server
RUN mkdir /var/run/sshd
RUN ssh-keygen -q -t rsa -N '' -f /root/.ssh/id_rsa
RUN cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

RUN            mkdir /tool/
WORKDIR        /tool/
RUN            wget http://apache-mirror.8birdsvideo.com/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
RUN            tar zxvf hadoop-3.2.1.tar.gz
RUN            ln -s /tool/hadoop-3.2.1 /tool/hadoop
ADD            conf/core-site.xml /tool/hadoop/etc/hadoop/
ADD            conf/hdfs-site.xml /tool/hadoop/etc/hadoop/
ADD            conf/hadoop-env.sh /tool/hadoop/etc/hadoop/
ADD            conf/hadoop-http-auth-signature-secret /tool/hadoop/hadoop-http-auth-signature-secret

#set up the app
EXPOSE  9870 9000 9864
RUN     mkdir -p /app/
ADD     start.sh /app/
WORKDIR /app/
CMD    [ "./start.sh" ]

Makefile to help me build and run the image
IMAGE=sillycat/public
TAG=ubuntu-hadoop-1.0
NAME=ubuntu-hadoop-1.0
HOSTNAME=rancher-worker1
   
docker-context:

build: docker-context
    docker build -t $(IMAGE):$(TAG) .

run:
    docker run -d -p 9870:9870 -p 9000:9000 -p 9864:9864 --hostname ${HOSTNAME} --name $(NAME) $(IMAGE):$(TAG)

debug:
    docker run -ti -p 9870:9870 -p 9000:9000 -p 9864:9864 --hostname ${HOSTNAME} --name $(NAME) $(IMAGE):$(TAG) /bin/bash

clean:
    docker stop ${NAME}
    docker rm ${NAME}

logs:
    docker logs ${NAME}

publish:
    docker push ${IMAGE}
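
Typical usage, assuming the Dockerfile, the conf/ directory, and start.sh sit next to this Makefile:
> make build
> make run
> make logs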

Shell script to start the services, start.sh
#!/bin/sh -ex

#prepare ENV
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"

#start ssh service
nohup /usr/sbin/sshd -D >/dev/stdout &

#start the service
cd /tool/hadoop
bin/hdfs namenode -format
sbin/start-dfs.sh
tail -f /dev/null
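
One caveat: bin/hdfs namenode -format runs on every container start, so the HDFS metadata is wiped whenever the container restarts. A sketch of a guard, assuming the default dfs.namenode.name.dir under /tmp/hadoop-root:
#format only on first boot, when the NameNode metadata directory does not exist yet
if [ ! -d /tmp/hadoop-root/dfs/name/current ]; then
  bin/hdfs namenode -format -nonInteractive
fi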


References:
https://phoenixnap.com/kb/how-to-enable-ssh-centos-7
https://serverfault.com/questions/824975/failed-to-get-d-bus-connection-operation-not-permitted
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
https://serverfault.com/questions/562756/how-to-remove-the-path-with-an-nginx-proxy-pass

Security
https://www.jianshu.com/p/51c39dfecff2
https://www.twblogs.net/a/5cfed4aebd9eee14029f459f/zh-cn

