Data Solution(1)Prepare ENV to Parse CSV Data on Single Ubuntu
Java Version
> java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
Maven Version
> mvn --version
Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3; 2018-10-24T13:41:47-05:00)
Prepare Protobuf
> git clone https://github.com/google/protobuf.git
> cd protobuf
> ./autogen.sh
Exception:
Can't exec "aclocal": No such file or directory at /usr/local/Cellar/autoconf/2.69/share/autoconf/Autom4te/FileUtils.pm line 326.
autoreconf: failed to run aclocal: No such file or directory
Possible Solution:
https://github.com/meritlabs/merit/issues/344
> brew install autoconf automake libtool berkeley-db4 pkg-config openssl boost boost-build libevent
Success this time
> ./autogen.sh
> ./configure --prefix=/Users/hluo/tool/protobuf-3.6.1
Run make and make install to place it in the working directory, and add that directory's bin to the PATH
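A minimal sketch of those two steps, assuming the --prefix used above and that its bin directory goes onto the PATH:
> make
> make install
# Assumed shell profile addition, matching the --prefix above
> export PATH=/Users/hluo/tool/protobuf-3.6.1/bin:$PATH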
Check Version
> protoc --version
libprotoc 3.6.1
Prepare CMake ENV
> wget https://github.com/Kitware/CMake/releases/download/v3.14.0-rc2/cmake-3.14.0-rc2.tar.gz
Unzip and go to the directory
> ./bootstrap
Then make and make install, check version
> cmake --version
cmake version 3.14.0-rc2
Get Hadoop Source Codes
> wget http://apache.osuosl.org/hadoop/common/hadoop-3.2.0/hadoop-3.2.0-src.tar.gz
Unzip and build
> mvn package -Pdist.native -DskipTests -Dtar
Haha, Exception
org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 3.6.1', expected version is '2.5.0'
Solution: go back to the protobuf source directory and check out the 2.5.0 tag
> git checkout tags/v2.5.0
> ./autogen.sh
> ./configure --prefix=/home/carl/tool/protobuf-2.5.0
> protoc --version
libprotoc 2.5.0
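After make and make install, the 2.5.0 bin directory has to come before the 3.6.1 one on the PATH so the old protoc is picked up; a sketch using the prefix from above:
> export PATH=/home/carl/tool/protobuf-2.5.0/bin:$PATH
> which protoc
/home/carl/tool/protobuf-2.5.0/bin/protoc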
Build again
> mvn package -Pdist.native -DskipTests -Dtar
Read this document to figure out how to build
https://github.com/apache/hadoop/blob/trunk/BUILDING.txt
> mvn package -Pdist,native,docs -DskipTests -Dtar
Do not build the native package on macOS
> mvn package -Pdist,docs -DskipTests -Dtar
It still fails to build, the same as last time. I will just use the binary distribution instead.
> wget http://mirror.olnevhost.net/pub/apache/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz
Unzip the file and place it in the working directory
> cat etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
> cat etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Format the NameNode
> hdfs namenode -format
Set up SSH access on MAC
> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
> ssh localhost
Enable remote login first in System Preferences —> Sharing —> Remote Login
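If there is no key pair on the machine yet, generate one before the cat step above; a minimal sketch with the default file locations:
> ssh-keygen -t rsa
> chmod 600 ~/.ssh/authorized_keys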
Start HDFS
> sbin/start-dfs.sh
The web UI port numbers changed in Hadoop 3 (the NameNode UI is now on 9870); they are listed in the documentation:
https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-common/ClusterSetup.html#Installation
http://localhost:9870/dfshealth.html#tab-overview
Start YARN
> sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
Something went wrong here
> less hadoop-hluo-nodemanager-machluo.local.log
2019-02-20 22:23:40,483 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: NMWebapps failed to start.
Caused by: com.google.inject.ProvisionException: Unable to provision, see the following errors:
1) Error injecting constructor, java.lang.NoClassDefFoundError: javax/activation/DataSource
at org.apache.hadoop.yarn.server.nodemanager.webapp.JAXBContextResolver.<init>(JAXBContextResolver.java:52)
Solution:
https://salmanzg.wordpress.com/2018/02/20/webhdfs-on-hadoop-3-with-java-9/
> vi etc/hadoop/hadoop-env.sh
export HADOOP_OPTS="--add-modules java.activation"
That fails with: Module java.activation not found
Maybe it is because my locally installed Java is JDK 10 or JDK 11; the java.activation module was removed in JDK 11.
Let me try on my Ubuntu virtual machine instead.
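For anyone who wants to stay on a newer JDK, one workaround I have seen suggested (untested here; the jar version and the /opt/hadoop path are assumptions) is to put the standalone javax.activation jar on Hadoop's classpath instead of relying on --add-modules:
# Untested sketch: java.activation was removed in JDK 11, so supply it as a plain jar
> cd /opt/hadoop
> wget https://repo1.maven.org/maven2/com/sun/activation/javax.activation/1.2.0/javax.activation-1.2.0.jar -P share/hadoop/common/lib/
# Restart YARN so the NodeManager picks up the jar
> sbin/stop-yarn.sh
> sbin/start-yarn.sh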
Generate key pair if needed
> ssh-keygen
> cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Add JAVA_HOME into hadoop-env.sh
> vi etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/jdk
Start DFS
> hdfs namenode -format
> sbin/start-dfs.sh
http://ubuntu-master:9870/dfshealth.html#tab-overview
Start YARN
> sbin/start-yarn.sh
http://ubuntu-master:8088/cluster
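As a quick sanity check, jps (bundled with the JDK) can confirm the daemons are up; on this single node I would expect NameNode, DataNode and SecondaryNameNode from start-dfs.sh, plus ResourceManager and NodeManager from start-yarn.sh:
> jps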
Install Spark
> wget http://ftp.wayne.edu/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
Unzip the file and place it in the working directory
> cp conf/spark-env.sh.template conf/spark-env.sh
> vi conf/spark-env.sh
HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
> echo $SPARK_HOME
/opt/spark
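For reference, a minimal sketch of the environment I assume in ~/.bashrc, following the /opt layout used above:
# Assumed ~/.bashrc additions for this single-node setup
export JAVA_HOME=/opt/jdk
export HADOOP_HOME=/opt/hadoop
export SPARK_HOME=/opt/spark
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin:$PATH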
Try Shell
> MASTER=yarn bin/spark-shell
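To double-check that jobs really land on YARN, a sketch using the examples jar bundled with Spark 2.4.0 (the Scala 2.11 jar name is an assumption based on the standard binary layout):
> bin/spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.4.0.jar 10
The finished application should then show up on http://ubuntu-master:8088/cluster.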
Install Zeppelin
Download Binary
> wget http://apache.claz.org/zeppelin/zeppelin-0.8.1/zeppelin-0.8.1-bin-all.tgz
Unzip it, place it in the working directory, and prepare the configuration file
> cp conf/zeppelin-env.sh.template conf/zeppelin-env.sh
export SPARK_HOME="/opt/spark"
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop/"
> bin/zeppelin-daemon.sh start
Then we can visit the web app
http://ubuntu-master:8080/#/
I am running in local mode right now; the Spark UI is at
http://ubuntu-master:4040/jobs/
spark.master is 'local', which is why it runs on the local machine and not on the remote YARN cluster. We can easily change that on the interpreter settings page.
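Alternatively, the master can be set in the configuration file and the daemon restarted; a sketch, assuming Zeppelin 0.8.1 reads MASTER from zeppelin-env.sh:
> vi conf/zeppelin-env.sh
export MASTER=yarn-client
> bin/zeppelin-daemon.sh restart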
Put my file on the ubuntu-master HDFS
Check the root directory
> hdfs dfs -ls /
Create the directories
> hdfs dfs -mkdir /user
> hdfs dfs -mkdir /user/yiyi
Upload file
> hdfs dfs -put ./new-printing-austin.csv /user/yiyi/austin1.csv
After that we can see the file here
http://ubuntu-master:9870/explorer.html#/user/yiyi
> hdfs dfs -ls /user/yiyi/
Found 1 items
-rw-r--r-- 1 carl supergroup 105779 2019-02-21 12:44 /user/yiyi/austin1.csv
Other HDFS shell commands are documented here
https://hadoop.apache.org/docs/r1.0.4/cn/hdfs_shell.html
Change core-site.xml so the NameNode listens on 0.0.0.0 instead of localhost; then I can access HDFS from other machines
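A minimal sketch of what I assume that change looks like (restart DFS afterwards); an alternative would be to keep fs.defaultFS pointing at ubuntu-master and set dfs.namenode.rpc-bind-host to 0.0.0.0 in hdfs-site.xml instead:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://0.0.0.0:9000</value>
</property>
</configuration>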
> hdfs dfs -ls hdfs://ubuntu-master:9000/user/yiyi/
Found 1 items
-rw-r--r-- 1 carl supergroup 105779 2019-02-21 12:44 hdfs://ubuntu-master:9000/user/yiyi/austin1.csv
This code works pretty well in the Zeppelin notebook
// Load the raw CSV from HDFS, using the first row as the header and inferring column types
val companyRawDF = sqlContext.read.format("csv")
.option("header", "true")
.option("inferSchema", "true")
.load("hdfs://ubuntu-master:9000/user/yiyi/austin1.csv")
// Replace whitespace in column names with underscores so they are easier to reference in SQL
val companyDF = companyRawDF.columns.foldLeft(companyRawDF)((curr, n) => curr.withColumnRenamed(n, n.replaceAll("\\s", "_")))
companyDF.printSchema()
// Register a temporary view so the data can be queried with plain SQL
companyDF.createOrReplaceTempView("company")
sqlContext.sql("select businessId, title, company_name, phone, email, bbbRating, bbbRatingScore from company where bbbRating = 'A+' limit 10 ").show()
%sql
select bbbRatingScore, count(1) value
from company
where phone is not null
group by bbbRatingScore
order by bbbRatingScore
Security
https://makeling.github.io/bigdata/39395030.html
References:
https://spark.apache.org/
https://hadoop.apache.org/releases.html
https://spark.apache.org/docs/latest/index.html
https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-common/ClusterSetup.html#Installation
Some other documents
Spark 2017 BigData Update(2)CentOS Cluster
Spark 2017 BigData Update(3)Notebook Example
Spark 2017 BigData Update(4)Spark Core in JAVA
Spark 2017 BigData Update(5)Spark Streaming in Java