Spark 2017 BigData Update(1)ENV on Spark 2.2.1 with Zeppelin on Local
Java Version
>java -version
java version "1.8.0_121"
Maven Version
>mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T10:41:47-06:00)
Protoc Version
>protoc --version
libprotoc 2.5.0
Currently Spark is built against Hadoop 2.7, so I plan to stay on that line and install Hadoop 2.7.5
http://mirrors.ocf.berkeley.edu/apache/hadoop/common/hadoop-2.7.5/hadoop-2.7.5-src.tar.gz
Prepare the CMake ENV on macOS
https://cmake.org/install/
>wget https://cmake.org/files/v3.10/cmake-3.10.1.tar.gz
Unzip it and go to the working directory
>./bootstrap
>make
>sudo make install
>cmake --version
cmake version 3.10.1
Unzip the Hadoop source and try to build it
>mvn package -Pdist,native -DskipTests -Dtar
I still cannot build it on my Mac, but that is fine. I will just use the binary instead.
Download the binary
>wget http://apache.osuosl.org/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz
Unzip the file and move it to the working directory
>sudo ln -s /Users/carl/tool/hadoop-2.7.5 /opt/hadoop-2.7.5
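The later configuration refers to /opt/hadoop, so I assume a version-less symlink plus the usual profile exports; a minimal sketch (the exact profile file and paths are my own convention):
>sudo ln -s /opt/hadoop-2.7.5 /opt/hadoop
>cat ~/.profile
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
With bin and sbin on the PATH, the hdfs command and the start scripts used below resolve without full paths.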
Prepare the configuration files
>cat etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
>cat etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
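Hadoop also needs to know where the JDK lives. If the scripts later complain about JAVA_HOME, pointing etc/hadoop/hadoop-env.sh at the macOS JDK usually helps; a sketch assuming Java 1.8 is the one reported by the system java_home helper:
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)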
Format the file system
>hdfs namenode -format
Generate the key to access localhost
>ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
>cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
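Note that recent OpenSSH releases disable DSA keys by default, so on a newer macOS an RSA key may be needed instead; a sketch of the equivalent steps:
>ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
>cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
>chmod 600 ~/.ssh/authorized_keys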
I still have an issue saying connection refused
>ssh localhost
ssh: connect to host localhost port 22: Connection refused
Solution:
https://bluishcoder.co.nz/articles/mac-ssh.html
Open System Preferences -> Sharing -> Remote Login
Passwordless login still does not work for me on macOS, so I can start HDFS, but I need to type the password during the process
>sbin/start-dfs.sh
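To confirm the daemons came up, I can check the Java processes with jps, which ships with the JDK; I would expect to see NameNode, DataNode and SecondaryNameNode listed:
>jps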
Visit the webpage
http://localhost:50070/dfshealth.html#tab-overview
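A quick smoke test against HDFS also helps; the /user/carl path is just an assumption based on my local user:
>hdfs dfs -mkdir -p /user/carl
>hdfs dfs -ls /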
Start YARN
>sbin/start-yarn.sh
Visit the page
http://localhost:8088/cluster
Install Spark
>wget http://apache.spinellicreations.com/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
Unzip it and place it in the working directory
>sudo ln -s /Users/carl/tool/spark-2.2.1 /opt/spark-2.2.1
Prepare Configuration File
>cat conf/spark-env.sh
HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
>echo $SPARK_HOME
/opt/spark
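SPARK_HOME points at /opt/spark while the symlink above is versioned, so I assume one more version-less symlink plus a profile export; a minimal sketch:
>sudo ln -s /opt/spark-2.2.1 /opt/spark
>cat ~/.profile
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin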
Start the Spark Shell
>MASTER=yarn-client bin/spark-shell
The yarn-client master value is deprecated after Spark 2.0, so I will use this instead
>MASTER=yarn bin/spark-shell
It got stuck there for a while, maybe because of some stuck applications, so let me kill them
>bin/yarn application -kill application_1514320285035_0001
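Once the shell comes up cleanly, a quick way to verify the YARN mode end to end is to submit the bundled SparkPi example from the Spark directory; the examples jar name below is what I expect in the 2.2.1 distribution, adjust it if yours differs:
>bin/spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.2.1.jar 10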
Install Zeppelin
https://zeppelin.apache.org/docs/0.7.3/install/install.html#installation
Download binary
>wget http://apache.mirrors.tds.net/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-all.tgz
Unzip the file and place it in the working directory
>sudo ln -s /Users/carl/tool/zeppelin-0.7.3 /opt/zeppelin-0.7.3
Prepare the configuration
>cat conf/zeppelin-env.sh
export SPARK_HOME="/opt/spark"
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop/"
Start the notebook
>bin/zeppelin-daemon.sh start
Stop the notebook
>bin/zeppelin-daemon.sh stop
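The same script can also report whether the daemon is running; if the web UI does not come up, checking the status and the logs directory is the first step I would take (a sketch, assuming the default logs/ location under the Zeppelin home):
>bin/zeppelin-daemon.sh status
>ls logs/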
Visit the webpage
http://localhost:8080/#/
You can see the tasks here as well
http://localhost:4040/stages/
spark.master is 'local', which is why it runs on the local machine and not on remote YARN; we can easily change that on the interpreter settings page.
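Alternatively, the master can be fixed in conf/zeppelin-env.sh so every restart of the daemon picks up YARN; a sketch, under the assumption that the yarn-client value is still accepted by this Spark/Zeppelin combination:
export MASTER=yarn-client
After changing it, restart the Zeppelin daemon or restart the Spark interpreter from the settings page.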
References:
http://sillycat.iteye.com/blog/2286997
http://sillycat.iteye.com/blog/2288141
http://sillycat.iteye.com/blog/2405873
https://spark.apache.org/docs/latest/
https://zeppelin.apache.org/docs/0.7.3/install/install.html#installation