
MySQL Applier For Hadoop: Real time data export from MySQL to HDFS

 
Read more:

http://innovating-technology.blogspot.com/2013/04/mysql-hadoop-applier-part-1.html

MySQL replication enables data to be replicated from one MySQL database server (the master) to one or more MySQL database servers (the slaves). However, imagine how many more use cases could be served if the slave (to which data is replicated) were not restricted to being a MySQL server, but could be any other database server or platform, with replication events applied in real time!
 
This is what the new Hadoop Applier empowers you to do.
 
An example of such a slave could be a data warehouse system such as Apache Hive, which uses HDFS as a data store. If you have a Hive metastore associated with HDFS (Hadoop Distributed File System), the Hadoop Applier can populate Hive tables in real time. Data is exported from MySQL to text files in HDFS, and therefore into Hive tables. It is as simple as running a 'CREATE TABLE' statement in HiveQL to define a table structure similar to that in MySQL (and yes, you can use any row and column delimiters you want), and then running the Hadoop Applier to start real-time data replication.
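For example, a Hive table that matches a MySQL table and reads the applier's delimited text files might be declared like this (the table name, columns, delimiters, and HDFS path are all illustrative assumptions, not fixed by the applier):

```sql
-- Hypothetical MySQL source table: employees(id INT, name VARCHAR(64), salary INT).
-- Hive table laid over the directory where the applier writes its text files.
CREATE TABLE employees (
  id     INT,
  name   STRING,
  salary INT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','   -- must match the field delimiter given to the applier
  LINES TERMINATED BY '\n'   -- must match the row delimiter
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/employees';
```

Because the table simply points at the text files, rows written by the applier become visible to Hive queries without any further load step.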

 

The motivation for developing the Hadoop Applier is that no tool currently exists to perform this transfer in real time. Existing solutions for importing data into HDFS include Apache Sqoop, which is well proven and enables batch transfers, but which as a result requires periodic re-imports to keep the data up to date. It reads the source MySQL database via a JDBC connector or a fastpath connector and performs a bulk data transfer, which can add overhead to your operational systems and slow down other queries. In a case where only a few rows have changed relative to the total size of the data, Sqoop may take far too long to load it.
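By way of contrast, a typical Sqoop batch import looks something like this (host, database, table, and directory names are placeholders); each run re-reads the table in bulk over JDBC:

```shell
# One-off bulk import of a MySQL table into HDFS via Sqoop.
# Re-running it (or scheduling incremental imports) is needed
# to pick up changes made after the import.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username repl_user -P \
  --table orders \
  --target-dir /user/hadoop/orders
```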
 
On the other hand, the Hadoop Applier reads from the binary log and inserts data in real time, applying events as they happen on the MySQL server; other queries therefore continue to execute unaffected. No bulk transfers are required! The Hadoop Applier picks up only the changes and inserts them, which is much faster.
 

Hadoop Applier can thus be a solution when you need to rapidly acquire new data from MySQL for real-time processing within Hadoop.

 

Introducing The Applier: 
 
 
The Hadoop Applier replicates events from the MySQL binary log to provide real-time integration of MySQL with Hadoop and the frameworks that run on top of HDFS. There are many use cases for integrating unstructured data stored in Apache Hadoop with structured data from relational databases such as MySQL.
 
 
 
The Hadoop Applier provides real-time connectivity between MySQL and Hadoop/HDFS (Hadoop Distributed File System), which can be used for big data analytics: sentiment analysis, marketing campaign analysis, customer churn modeling, fraud detection, risk modelling and many more. You can read more about the role of the Hadoop Applier in big data in the blog by Mat Keep. Many widely used systems, such as Apache Hive, use HDFS as a data store.
The diagram below represents the integration:


 

Replication via the Hadoop Applier happens by reading binary log events and writing them into a file in HDFS (Hadoop Distributed File System) as soon as they happen on the MySQL master. "Events" describe database changes such as table creation operations or changes to table data.
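Since the applier works from the binary log, binary logging must be enabled on the master. A minimal my.cnf sketch (the server-id and log basename are illustrative; row-based logging is what makes the changed data itself available in the events):

```ini
[mysqld]
server-id     = 1
log-bin       = mysql-bin   ; enable the binary log the applier reads
binlog_format = ROW         ; row events carry the actual changed data
```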

 

As soon as an INSERT statement is executed on the MySQL master, it is passed to the Hadoop Applier, and the data is written into a text file in HDFS. Once the data is in HDFS files, other Hadoop ecosystem platforms and databases can consume it for their own applications.
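Once in HDFS, the rows are just delimited text, so any consumer can parse them. A minimal Python sketch of such a consumer (the comma/newline delimiters and the sample content are illustrative assumptions, matching whatever delimiters you configured):

```python
def parse_rows(text, field_sep=",", row_sep="\n"):
    """Split delimited text, as written to an HDFS file, into tuples of fields."""
    rows = []
    for line in text.split(row_sep):
        if line:  # skip the empty entry after the trailing row separator
            rows.append(tuple(line.split(field_sep)))
    return rows

# Hypothetical content of an HDFS text file for an 'employees' table:
sample = "1,Alice,70000\n2,Bob,65000\n"
print(parse_rows(sample))
# [('1', 'Alice', '70000'), ('2', 'Bob', '65000')]
```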
 

Hadoop Applier can be downloaded from http://labs.mysql.com/

 

Prerequisites:
These are the packages you require in order to run Hadoop Applier on your machine:
 
- Hadoop Applier package from http://labs.mysql.com
- Hadoop 1.0.4 (that is what I used for the demo in the next post)
- Java version 6 or later (since Hadoop is written in Java)
- libhdfs (it comes precompiled with Hadoop distros, at ${HADOOP_HOME}/libhdfs/libhdfs.so)
- cmake 2.6 or greater
- libmysqlclient 5.6
- gcc 4.6.3
- MySQL Server 5.6
- FindHDFS.cmake (cmake module to locate the libhdfs library while compiling; you can get a copy online)
- FindJNI.cmake (optional; check whether you already have one: locate FindJNI.cmake)
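With the prerequisites in place, building the applier follows the usual cmake workflow. A rough sketch (the paths and module directory are illustrative; the exact options are described in the README shipped with the labs.mysql.com package):

```shell
# Point the build at Hadoop so FindHDFS.cmake can locate libhdfs.
export HADOOP_HOME=/usr/local/hadoop

cd mysql-hadoop-applier            # the unpacked labs.mysql.com package
cmake . -DCMAKE_MODULE_PATH=$PWD   # directory holding FindHDFS.cmake / FindJNI.cmake
make
make install
```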

 

To use the Hadoop Applier with Hive, you will also need to install Hive, which you can download from the Apache Hive website.

Please use the comments section of this blog to share your opinion on Hadoop Applier, and let us know more about your requirements.

 
