sillycat
Data Solution 2019(12)Flink Processing Data

 

Master and Slaves Mode
> java -version
java version "1.8.0_221"

Start from here
https://ci.apache.org/projects/flink/flink-docs-release-1.9/getting-started/tutorials/local_setup.html
https://juejin.im/post/5d6610c65188257573636a86

> wget https://www-eu.apache.org/dist/flink/flink-1.9.1/flink-1.9.1-bin-scala_2.12.tgz
> tar zxvf flink-1.9.1-bin-scala_2.12.tgz
> mv flink-1.9.1 ~/tool/
> sudo ln -s /home/carl/tool/flink-1.9.1 /opt/flink-1.9.1
> sudo ln -s /opt/flink-1.9.1 /opt/flink

Start in standalone mode
> bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host rancher-home.
Starting taskexecutor daemon on host rancher-home.

Visit the UI Page
http://rancher-home:8081/#/overview

Add this to PATH
PATH=$PATH:/opt/flink/bin
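
The line above only changes PATH for the current shell, and repeating it adds the same entry again. A minimal sketch of an idempotent version, assuming a POSIX shell; the helper name add_to_path is hypothetical and not part of Flink:

```shell
# add_to_path is a hypothetical helper, not part of Flink.
# It appends a directory to PATH only if it is not already listed.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present, nothing to do
    *) PATH="$PATH:$1" ;;        # append once
  esac
}

add_to_path /opt/flink/bin
export PATH
```

Putting these lines in ~/.bashrc (an assumption about the login shell in use) would keep the setting across sessions without growing PATH on every login.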

Submit the job to the single node
> flink run -m rancher-home:8081 ./examples/batch/WordCount.jar --input ./README.txt

Do the same download and PATH setup on the worker machine as well
> wget https://www-eu.apache.org/dist/flink/flink-1.9.1/flink-1.9.1-bin-scala_2.12.tgz
> sudo ln -s /opt/flink-1.9.1 /opt/flink
> cd /opt/flink

Try to join the cluster as a task manager
> bin/jobmanager.sh start rancher-home
Starting standalonesession daemon on host rancher-worker1.

>  bin/taskmanager.sh start
Starting taskexecutor daemon on host rancher-worker1.

> jps
13617 StandaloneSessionClusterEntrypoint
14388 Jps
14312 TaskManagerRunner

No, this does not work.
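
One thing worth checking here (an assumption, not something these notes confirm as the cause) is which JobManager address the worker is configured to use. The stock flink-conf.yaml ships with jobmanager.rpc.address set to localhost, and a TaskManager started with that value looks for a JobManager on its own host. A minimal sketch that prints the configured value; flink_conf_get is a hypothetical helper, and /opt/flink is the install path used above:

```shell
# flink_conf_get is a hypothetical helper, not a Flink command.
# It prints the value of the first top-level "key: value" entry in a flat
# YAML file. Commented-out lines are ignored because they do not start
# with the key.
flink_conf_get() {
  key=$1
  file=$2
  # Escape the dots so that they match literally in the sed pattern.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  sed -n "s/^${pattern}:[[:space:]]*//p" "$file" | head -n 1
}

conf=/opt/flink/conf/flink-conf.yaml
if [ -f "$conf" ]; then
  echo "jobmanager.rpc.address = $(flink_conf_get jobmanager.rpc.address "$conf")"
fi
```

If the worker prints localhost while the master is rancher-home, that mismatch would be the first thing to fix before starting the TaskManager again.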


Zeppelin should be able to connect to my cluster
https://zeppelin.apache.org/docs/0.5.5-incubating/interpreter/flink.html

Error
INFO [2019-10-30 23:20:15,968] ({flink-akka.actor.default-dispatcher-3} JobClientActor.java[tryToSubmitJob]:406) - Could not submit job Flink Java Job at Wed Oct 30 23:20:13 CDT 2019 (2c6bcfffb9d3bc0f5c12a72e16797080), because there is no connection to a JobManager.

Solution:
https://stackoverflow.com/questions/52274020/apache-zeppelin-flink-interpretor-can-not-connect-flink-1-5-2
It seems to be a supported-version issue.

Check the libraries here
/opt/zeppelin/interpreter/flink
Maybe the version is just too low
flink-java-1.1.3.jar
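
To see at a glance which Flink version the interpreter was built against, the jar names can be reduced to their version suffix. This is a minimal sketch, assuming the jars follow the name-version.jar pattern seen above; flink_jar_versions is a hypothetical helper, not part of Zeppelin or Flink:

```shell
# flink_jar_versions is a hypothetical helper, not part of Zeppelin or Flink.
# It prints the distinct version suffixes of the flink-*.jar files in a
# directory, for example "1.1.3" for flink-java-1.1.3.jar.
flink_jar_versions() {
  for jar in "$1"/flink-*.jar; do
    [ -e "$jar" ] || continue    # the glob matched nothing
    basename "$jar"
  done | sed -nE 's/.*-([0-9]+(\.[0-9]+)+)\.jar$/\1/p' | sort -u
}

dir=/opt/zeppelin/interpreter/flink
if [ -d "$dir" ]; then
  flink_jar_versions "$dir"
fi
```

If the version printed here differs from the version of the running cluster, a mismatch like the one described in the Stack Overflow link is a likely suspect.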

Some explanation here
https://zeppelin.apache.org/docs/0.9.0-SNAPSHOT/setup/deployment/flink_and_spark_cluster.html

Build Zeppelin
> git clone https://github.com/apache/zeppelin.git
Build command
>  mvn clean package -DskipTests -Pspark-2.4 -Dflink.version=1.9.1 -Pscala-2.12

How to build
https://zeppelin.apache.org/docs/0.9.0-SNAPSHOT/setup/basics/how_to_build.html
> mvn clean package -Pbuild-distr -Pspark-2.3 -Dflink.version=1.9.1 -Phadoop3 -Pscala-2.11

Build on CentOS7
> mvn clean package -Pbuild-distr -Pspark-2.3 -Dflink.version=1.9.1 -Phadoop3 -Pscala-2.11 -rf :zengine-plugins-parent

Here are the commands to use if the mvn build commands fail with packages not found
>mvn install:install-file -DgroupId=org.jetbrains.pty4j -DartifactId=pty4j -Dversion=0.9.3 -Dpackaging=jar -Dfile=/home/carl/install/pty4j-0.9.3.jar

com.google.code.findbugs:jsr305:1.3.9

>mvn install:install-file -DgroupId=com.google.code.findbugs -DartifactId=jsr305 -Dversion=1.3.9 -Dpackaging=jar -Dfile=/home/carl/install/jsr305-1.3.9.jar 

com.google.code.findbugs:jsr305:3.0.0

>mvn install:install-file -DgroupId=com.google.code.findbugs -DartifactId=jsr305 -Dversion=3.0.0 -Dpackaging=jar -Dfile=/home/carl/install/jsr305-3.0.0.jar
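
The three install commands above differ only in the Maven coordinates, so they can be generated from a group:artifact:version string. A minimal sketch, assuming each jar is named artifact-version.jar inside one directory as in the commands above; install_file_cmd is a hypothetical helper that only prints the command:

```shell
# install_file_cmd is a hypothetical helper. It only prints the
# "mvn install:install-file" command for a "group:artifact:version"
# coordinate, so the command can be reviewed before it is run.
install_file_cmd() {
  coords=$1
  jar_dir=$2
  group=${coords%%:*}       # text before the first colon
  rest=${coords#*:}         # artifact:version
  artifact=${rest%%:*}
  version=${rest#*:}
  echo "mvn install:install-file -DgroupId=$group -DartifactId=$artifact" \
       "-Dversion=$version -Dpackaging=jar -Dfile=$jar_dir/$artifact-$version.jar"
}

install_file_cmd com.google.code.findbugs:jsr305:3.0.0 /home/carl/install
```

Piping the printed line to sh would run it; printing first makes it easy to check the jar path before anything is installed into the local repository.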

After the build, the binary should be here
/Users/hluo/install/zeppelin/zeppelin-distribution/target/zeppelin-0.9.0-SNAPSHOT.tar.gz

That build is quite unstable, so I list the tags to pick a release
> git tag
v0.8.1-docker
v0.8.1-rc1
v0.8.2
v0.8.2-docker
v0.8.2-rc1

> git checkout v0.8.2

It does not work well. I will downgrade the Flink version and try again.

Related versions and other software
https://flink.apache.org/ecosystem.html

This old version may work
> wget https://archive.apache.org/dist/flink/flink-1.1.3/flink-1.1.3-bin-hadoop2-scala_2.11.tgz
> sudo ln -s /home/carl/tool/flink-1.1.3 /opt/flink-1.1.3
> sudo ln -s /opt/flink-1.1.3 /opt/flink

Take the 1.9.1 configuration as a reference
On Master
> cat conf/flink-conf.yaml
jobmanager.rpc.address: rancher-home
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.heap.size: 1024m
taskmanager.numberOfTaskSlots: 2
parallelism.default: 2

> cat conf/masters
rancher-home:8081

Make sure the slaves file is empty
> cat conf/slaves

Start master
>  bin/start-cluster.sh
Starting cluster.
Starting jobmanager daemon on host rancher-home.

> jps
7536 Jps
7427 JobManager

Visit the console UI
http://rancher-home:8081/#/overview

On the Slave Machine
> wget https://archive.apache.org/dist/flink/flink-1.1.3/flink-1.1.3-bin-hadoop2-scala_2.11.tgz
> sudo ln -s /home/carl/tool/flink-1.1.3 /opt/flink-1.1.3
> sudo ln -s /opt/flink-1.1.3 /opt/flink

Check the config
> cat conf/masters
rancher-home:8081

> cat conf/slaves

> cat conf/flink-conf.yaml
jobmanager.rpc.address: rancher-home
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 2
parallelism.default: 2

Start the Service
> bin/taskmanager.sh start
Starting taskmanager daemon on host rancher-worker1.

> jps
6632 TaskManager
6703 Jps
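
The jps check can be scripted so that it gives a yes or no answer. A minimal sketch, assuming jps is on the PATH; has_daemon is a hypothetical helper. The process name is TaskManager on Flink 1.1.3, while the 1.9.1 output earlier in these notes shows TaskManagerRunner instead:

```shell
# has_daemon is a hypothetical helper. It reads "jps" output on stdin and
# succeeds if a JVM with exactly the given main-class name is listed.
has_daemon() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

if command -v jps >/dev/null 2>&1; then
  if jps | has_daemon TaskManager; then
    echo "TaskManager is running"
  else
    echo "TaskManager is not running"
  fi
fi
```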

Refresh the console UI; we can see that the 2 TaskManagers have joined the cluster
http://rancher-home:8081/#/overview

Run a local test; it works well.
> flink run -m rancher-home:6123 ./examples/batch/WordCount.jar --input ./README.txt

Zeppelin configuration is as follows:
Host: rancher-home
Port: 6123

Zeppelin 0.8.2 works well with Flink Cluster 1.1.3


References:
https://flink.apache.org/zh/usecases.html
https://flink.apache.org/
https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/deployment/cluster_setup.html
http://wuchong.me/blog/2018/11/07/5-minutes-build-first-flink-application/
https://www.infoq.cn/article/zbBAGroBgtytDiBs*Xq9

Installation
https://www.cnblogs.com/frankdeng/p/9400627.html
https://juejin.im/post/5d6610c65188257573636a86







