AirFlow Apache(1)Introduction and Install on Ubuntu

 
Try the Quick Start on an Ubuntu System
Install on Ubuntu
Initialize the Home Directory
> export AIRFLOW_HOME=~/airflow
Prepare the Python Version
https://sillycat.iteye.com/blog/2436508
Install pyenv from GitHub
> git clone https://github.com/pyenv/pyenv.git ~/.pyenv
Add the cloned directory to PATH; in my case, edit the .profile file:
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
if command -v pyenv 1>/dev/null 2>&1; then
  eval "$(pyenv init -)"
fi
Check Versions
> pyenv versions
* system (set by /home/carl/.pyenv/version)
>  sudo apt-get install bzip2 libbz2-dev
> sudo apt-get install libreadline6 libreadline6-dev
> sudo apt-get install sqlite3 libsqlite3-dev
Install the Python versions we need
> pyenv install 3.6.0
> pyenv install 2.7.12
> pyenv install 3.7.2
Check versions
> pyenv versions
  system
  2.7.12
* 3.6.0 (set by /home/carl/.pyenv/version)
  3.7.2
Set the Global Version to 3.6.0
> pyenv global 3.6.0
Check version
> python --version
Python 3.6.0
Upgrade pip
> pip install --upgrade pip
> pip --version
pip 18.1 from /home/carl/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pip (python 3.6)
Install Airflow with pip
> pip install apache-airflow
It fails with this exception:
raise RuntimeError("By default one of Airflow's dependencies installs a GPL "
Solution:
https://stackoverflow.com/questions/52203441/error-while-install-airflow-by-default-one-of-airflows-dependencies-installs-a
> SLUGIFY_USES_TEXT_UNIDECODE=yes pip install apache-airflow
Init the Database
> airflow initdb
Start the Web Server; the default port is 8080
> airflow webserver -p 8080
Start the Scheduler
> airflow scheduler
Visit the Web UI
http://ubuntu-master:8080/admin/
Check Version
> airflow version
[2019-01-12 23:28:28,723] {__init__.py:51} INFO - Using executor SequentialExecutor
  ____________       _____________
____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
_/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
   v1.10.1
List all dags
> airflow list_dags
example_bash_operator
example_branch_dop_operator_v3
example_branch_operator
example_http_operator
example_passing_params_via_test_command
example_python_operator
example_short_circuit_operator
example_skip_dag
example_subdag_operator
example_subdag_operator.section-1
example_subdag_operator.section-2
example_trigger_controller_dag
example_trigger_target_dag
example_xcom
latest_only
latest_only_with_trigger
test_utils
tutorial
Disable the Examples by setting these options in the configuration file, airflow.cfg:
load_examples = False
expose_config = True
Delete an example DAG
> airflow delete_dag test_utils
Delete without prompting for confirmation
> airflow delete_dag -y example_branch_operator
https://juejin.im/post/5a0a39c25188254d2b6da2a3
A DAG (Directed Acyclic Graph) is a group of tasks, defined in a Python file placed in the DAG directory.
Some useful examples are here
https://github.com/apache/airflow/tree/master/airflow/example_dags
Create the First DAG
> cd ~/airflow/
> mkdir dags
> cd dags
> vi first_bash.py
# -*- coding: utf-8 -*-
"""
### Tutorial Documentation
Documentation that goes along with the Airflow tutorial located
[here](https://airflow.apache.org/tutorial.html)
"""
from datetime import timedelta
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
# These args will get passed on to each operator
# You can override them on a per-task basis during operator initialization
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'email': ['luohuazju@gmail.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
    # 'wait_for_downstream': False,
    # 'dag': dag,
    # 'adhoc':False,
    # 'sla': timedelta(hours=2),
    # 'execution_timeout': timedelta(seconds=300),
    # 'on_failure_callback': some_function,
    # 'on_success_callback': some_other_function,
    # 'on_retry_callback': another_function,
    # 'trigger_rule': u'all_success'
}
dag = DAG(
    'first_bash',
    default_args=default_args,
    description='A simple tutorial DAG',
    schedule_interval=timedelta(days=1),
)
# t1, t2 and t3 are examples of tasks created by instantiating operators
t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag,
)
t2 = BashOperator(
    task_id='sleep',
    depends_on_past=False,
    bash_command='sleep 5',
    dag=dag,
)
templated_command = """
{% for i in range(5) %}
    echo "{{ ds }}"
    echo "{{ macros.ds_add(ds, 7)}}"
    echo "{{ params.my_param }}"
{% endfor %}
"""
t3 = BashOperator(
    task_id='templated',
    depends_on_past=False,
    bash_command=templated_command,
    params={'my_param': 'Hello, Sillycat'},
    dag=dag,
)
t1 >> [t2, t3]
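To see what the templated task actually executes, here is a minimal stdlib-only sketch (not Airflow itself) of what the Jinja template above expands to. The ds_add helper below is an assumed stand-in mirroring Airflow's macros.ds_add:

```python
from datetime import datetime, timedelta

def ds_add(ds, days):
    # Assumed stand-in for Airflow's macros.ds_add: shift a YYYY-MM-DD string by N days
    return (datetime.strptime(ds, "%Y-%m-%d") + timedelta(days=days)).strftime("%Y-%m-%d")

ds = "2019-01-13"            # the execution date Airflow injects as {{ ds }}
my_param = "Hello, Sillycat" # what we passed via params

# Each of the 5 loop iterations in the template expands to these three echo lines
lines = []
for _ in range(5):
    lines.append(f'echo "{ds}"')
    lines.append(f'echo "{ds_add(ds, 7)}"')
    lines.append(f'echo "{my_param}"')
print("\n".join(lines))
```

So the rendered bash script echoes the execution date, the date one week later, and our parameter, five times over.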

I copied the content of the official tutorial there.
Run this command to check that the Python file compiles:
> python first_bash.py
> airflow list_dags
-------------------------------------------------------------------
DAGS
-------------------------------------------------------------------
first_bash
> airflow list_tasks first_bash
print_date
sleep
templated
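The last line of the DAG file, t1 >> [t2, t3], wires these three tasks together: print_date runs first, then sleep and templated. A minimal sketch (an assumption for illustration, not Airflow internals) of how the bitshift operator can express that dependency:

```python
# Toy Task class showing how `t1 >> [t2, t3]` can set downstream dependencies
class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # Accept a single task or a list of tasks on the right-hand side
        tasks = other if isinstance(other, list) else [other]
        for t in tasks:
            self.downstream.append(t)
        return other  # returning the right side allows chaining t1 >> t2 >> t3

t1, t2, t3 = Task("print_date"), Task("sleep"), Task("templated")
t1 >> [t2, t3]
print([t.task_id for t in t1.downstream])
```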

Test my Tasks
The command format is: airflow test <dag_id> <task_id> <execution_date>
> airflow test first_bash print_date 2019-01-13
> airflow test first_bash sleep 2019-01-13
I can also run a task directly by clicking Run in the UI.
In this mode, we can only use the SequentialExecutor.
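The executor is configured in $AIRFLOW_HOME/airflow.cfg under the [core] section. A minimal stdlib sketch of checking it with configparser; the sample content below is an assumption, and in practice you would point read() at your real airflow.cfg:

```python
import configparser

# Sample airflow.cfg content (assumption); in practice use
# cfg.read(os.path.expanduser("~/airflow/airflow.cfg"))
sample_cfg = """
[core]
executor = SequentialExecutor
load_examples = False
"""
cfg = configparser.ConfigParser()
cfg.read_string(sample_cfg)
print(cfg.get("core", "executor"))
```

Switching to LocalExecutor or CeleryExecutor also requires pointing sql_alchemy_conn at a real database such as MySQL, which the links below cover.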
Set up MySQL
https://blog.csdn.net/qazplm12_3/article/details/53065654
For parallel execution we should use the CeleryExecutor
https://blog.csdn.net/qazplm12_3/article/details/53065654
Check logging on the UI
https://blog.csdn.net/SunnyYoona/article/details/76615699
Celery
https://www.jianshu.com/p/1840035cb510
Execute on Remote Machines
http://yangcongchufang.com/airflow/airflow-ssh-operator.html
https://github.com/diggzhang/python_snip/blob/master/airflow/airflow_ssh.py
https://stackoverflow.com/questions/10635733/how-do-i-make-multiple-celery-workers-run-the-same-tasks
Run Celery on Multiple Machines
https://www.213.name/archives/1105
http://docs.celeryproject.org/en/master/userguide/routing.html#id2
http://docs.celeryproject.org/en/master/userguide/routing.html#broadcast
http://docs.celeryproject.org/en/latest/userguide/workers.html

References:
https://airflow.apache.org/
https://airflow.apache.org/start.html
https://juejin.im/post/5a0a39c25188254d2b6da2a3
https://blog.csdn.net/qazplm12_3/article/details/53065654
https://blog.csdn.net/SunnyYoona/article/details/76615699
https://sanyuesha.com/2017/11/13/airflow/
https://www.jianshu.com/p/59d69981658a
https://liqiang.io/post/airflow-the-workflow-in-python
https://www.cnblogs.com/skyrim/p/7456170.html
https://zhuanlan.zhihu.com/p/37889267
https://www.jianshu.com/p/1840035cb510
