AirFlow Apache(1)Introduction and Install on Ubuntu
Try the Quick Start in Ubuntu System
Install on Ubuntu
Init the Home Directory
> export AIRFLOW_HOME=~/airflow
Prepare the Python Version
https://sillycat.iteye.com/blog/2436508
Install pyenv from GitHub
> git clone https://github.com/pyenv/pyenv.git ~/.pyenv
Add the cloned directory to PATH; in my case, edit the .profile file:
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
if command -v pyenv 1>/dev/null 2>&1; then
eval "$(pyenv init -)"
fi
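Reload the profile so pyenv takes effect in the current shell (assuming a bash login shell that reads .profile):
> source ~/.profile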
Check Versions
> pyenv versions
* system (set by /home/carl/.pyenv/version)
Install the build dependencies pyenv needs to compile Python:
> sudo apt-get install bzip2 libbz2-dev
> sudo apt-get install libreadline6 libreadline6-dev
> sudo apt-get install sqlite3 libsqlite3-dev
Install the Python versions I need
> pyenv install 3.6.0
> pyenv install 2.7.12
> pyenv install 3.7.2
Check versions
> pyenv versions
system
2.7.12
* 3.6.0 (set by /home/carl/.pyenv/version)
3.7.2
Set the Global Version to 3.6.0
> pyenv global 3.6.0
Check version
> python --version
Python 3.6.0
Upgrade pip
> pip install --upgrade pip
> pip --version
pip 18.1 from /home/carl/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pip (python 3.6)
Install Airflow with pip
> pip install apache-airflow
It fails with this exception:
raise RuntimeError("By default one of Airflow's dependencies installs a GPL "
Solution:
https://stackoverflow.com/questions/52203441/error-while-install-airflow-by-default-one-of-airflows-dependencies-installs-a
> SLUGIFY_USES_TEXT_UNIDECODE=yes pip install apache-airflow
Init the Database
> airflow initdb
Start the web server; the default port is 8080
> airflow webserver -p 8080
Start the Scheduler
> airflow scheduler
Visit the web UI
http://ubuntu-master:8080/admin/
Check Version
> airflow version
[2019-01-12 23:28:28,723] {__init__.py:51} INFO - Using executor SequentialExecutor
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
v1.10.1
List all DAGs
> airflow list_dags
example_bash_operator
example_branch_dop_operator_v3
example_branch_operator
example_http_operator
example_passing_params_via_test_command
example_python_operator
example_short_circuit_operator
example_skip_dag
example_subdag_operator
example_subdag_operator.section-1
example_subdag_operator.section-2
example_trigger_controller_dag
example_trigger_target_dag
example_xcom
latest_only
latest_only_with_trigger
test_utils
tutorial
Disable the examples by setting these options in the configuration file, airflow.cfg
load_examples = False
expose_config = True
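A sketch of where these options live in airflow.cfg for Airflow 1.10 (section names taken from the default config; adjust to your own file):
[core]
load_examples = False

[webserver]
expose_config = True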
Delete the example DAGs
> airflow delete_dag test_utils
Delete without asking for confirmation
> airflow delete_dag -y example_branch_operator
https://juejin.im/post/5a0a39c25188254d2b6da2a3
DAG - Directed Acyclic Graph - a group of tasks, defined as a Python file in the DAGs directory
Some useful examples are here
https://github.com/apache/airflow/tree/master/airflow/example_dags
Create the First DAG
> cd ~/airflow/
> mkdir dags
> cd dags
> vi first_bash.py
# -*- coding: utf-8 -*-
"""
### Tutorial Documentation
Documentation that goes along with the Airflow tutorial located
[here](https://airflow.apache.org/tutorial.html)
"""
from datetime import timedelta
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
# These args will get passed on to each operator
# You can override them on a per-task basis during operator initialization
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'start_date': airflow.utils.dates.days_ago(2),
'email': ['luohuazju@gmail.com'],
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
# 'queue': 'bash_queue',
# 'pool': 'backfill',
# 'priority_weight': 10,
# 'end_date': datetime(2016, 1, 1),
# 'wait_for_downstream': False,
# 'dag': dag,
# 'adhoc':False,
# 'sla': timedelta(hours=2),
# 'execution_timeout': timedelta(seconds=300),
# 'on_failure_callback': some_function,
# 'on_success_callback': some_other_function,
# 'on_retry_callback': another_function,
# 'trigger_rule': u'all_success'
}
dag = DAG(
'first_bash',
default_args=default_args,
description='A simple tutorial DAG',
schedule_interval=timedelta(days=1),
)
# t1, t2 and t3 are examples of tasks created by instantiating operators
t1 = BashOperator(
task_id='print_date',
bash_command='date',
dag=dag,
)
t2 = BashOperator(
task_id='sleep',
depends_on_past=False,
bash_command='sleep 5',
dag=dag,
)
templated_command = """
{% for i in range(5) %}
echo "{{ ds }}"
echo "{{ macros.ds_add(ds, 7)}}"
echo "{{ params.my_param }}"
{% endfor %}
"""
t3 = BashOperator(
task_id='templated',
depends_on_past=False,
bash_command=templated_command,
params={'my_param': 'Hello, Sillycat'},
dag=dag,
)
t1 >> [t2, t3]
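For reference, the >> bitshift syntax above is shorthand for set_downstream; an equivalent way to declare the same dependency (same task objects as in the file):
t1.set_downstream([t2, t3])  # run t2 and t3 after t1, same effect as t1 >> [t2, t3]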
I copied the content of the tutorial there.
Run this command to check that the Python file compiles:
> python first_bash.py
> airflow list_dags
-------------------------------------------------------------------
DAGS
-------------------------------------------------------------------
first_bash
> airflow list_tasks first_bash
print_date
sleep
templated
Test my tasks
The command format is: airflow test <dag_id> <task_id> <execution_date>
> airflow test first_bash print_date 2019-01-13
> airflow test first_bash sleep 2019-01-13
I can also click Run directly on the UI.
In this default mode (SQLite backend), we can only use the SequentialExecutor.
Set up MySQL
https://blog.csdn.net/qazplm12_3/article/details/53065654
For parallel execution we should use the CeleryExecutor
https://blog.csdn.net/qazplm12_3/article/details/53065654
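A minimal airflow.cfg sketch for that setup, assuming MySQL as the metadata database and Redis as the Celery broker (hostnames, credentials, and database names are placeholders, not the values used here):
[core]
executor = CeleryExecutor
sql_alchemy_conn = mysql://airflow:airflow@localhost:3306/airflow

[celery]
broker_url = redis://localhost:6379/0
result_backend = db+mysql://airflow:airflow@localhost:3306/airflow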
Check the task logs on the UI
https://blog.csdn.net/SunnyYoona/article/details/76615699
Celery
https://www.jianshu.com/p/1840035cb510
Execute on Remote Machines
http://yangcongchufang.com/airflow/airflow-ssh-operator.html
https://github.com/diggzhang/python_snip/blob/master/airflow/airflow_ssh.py
https://stackoverflow.com/questions/10635733/how-do-i-make-multiple-celery-workers-run-the-same-tasks
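A minimal SSHOperator sketch for running a command on a remote machine with Airflow 1.10 (the connection id ssh_worker1 is an assumed connection that would be defined under Admin -> Connections, and dag is the DAG object from the file above):
from airflow.contrib.operators.ssh_operator import SSHOperator
remote_date = SSHOperator(
    task_id='run_on_remote',
    ssh_conn_id='ssh_worker1',  # assumed SSH connection configured in the Airflow UI
    command='date',
    dag=dag,
)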
Run Celery on Multiple Machines
https://www.213.name/archives/1105
http://docs.celeryproject.org/en/master/userguide/routing.html#id2
http://docs.celeryproject.org/en/master/userguide/routing.html#broadcast
http://docs.celeryproject.org/en/latest/userguide/workers.html
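A sketch of running workers on multiple machines with named queues (queue names here are assumptions; each worker machine needs the same airflow.cfg and DAG files):
> airflow worker -q default        # on worker machine 1
> airflow worker -q remote_jobs    # on worker machine 2
A task is then routed to a specific worker by passing queue='remote_jobs' to its operator.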
References:
https://airflow.apache.org/
https://airflow.apache.org/start.html
https://juejin.im/post/5a0a39c25188254d2b6da2a3
https://blog.csdn.net/qazplm12_3/article/details/53065654
https://blog.csdn.net/SunnyYoona/article/details/76615699
https://sanyuesha.com/2017/11/13/airflow/
https://www.jianshu.com/p/59d69981658a
https://liqiang.io/post/airflow-the-workflow-in-python
https://www.cnblogs.com/skyrim/p/7456170.html
https://zhuanlan.zhihu.com/p/37889267
https://www.jianshu.com/p/1840035cb510