Spark JDBC(1)MySQL Database RDD

sillycat

Trying to understand how JdbcRDD works on Spark.
First of all, the Spark master never connects to the database; only the driver and the workers do.

First step:
The client driver connects to MySQL (from 192.168.56.1 in the log below) and queries the minId and maxId.
150612 17:21:55 58 Connect cluster@192.168.56.1 on lmm
select coalesce(min(d.id), 0) from device d where d.last_updated >= '2014-06-12 00:00:00.0000' and d.last_updated < '2014-06-13 00:00:00.0000'
select coalesce(max(d.id), 0) from device d
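
A minimal sketch of this first step, for illustration only: it replays the two bound queries from the log against an in-memory SQLite table. SQLite stands in for MySQL, the rows are made-up sample data, and `fetch_bounds` is a hypothetical helper name; only the `device` table, the `id`/`last_updated` columns, and the SQL text come from the log above.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL "lmm" database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("create table device (id integer primary key, last_updated text)")

# Hypothetical sample rows; only the column names come from the logged queries.
conn.executemany(
    "insert into device (id, last_updated) values (?, ?)",
    [
        (3, "2014-06-11 08:00:00"),
        (7, "2014-06-12 09:30:00"),
        (12, "2014-06-12 23:59:59"),
        (20, "2014-06-14 10:00:00"),
    ],
)

# The same two statements the driver ran, with the date window passed as parameters.
MIN_SQL = (
    "select coalesce(min(d.id), 0) from device d "
    "where d.last_updated >= ? and d.last_updated < ?"
)
MAX_SQL = "select coalesce(max(d.id), 0) from device d"


def fetch_bounds(conn, day_start, day_end):
    """Hypothetical helper: return (min_id, max_id) as the driver does before any task runs."""
    (min_id,) = conn.execute(MIN_SQL, (day_start, day_end)).fetchone()
    (max_id,) = conn.execute(MAX_SQL).fetchone()
    return min_id, max_id


lower, upper = fetch_bounds(conn, "2014-06-12 00:00:00.0000", "2014-06-13 00:00:00.0000")
print(lower, upper)  # 7 20
```

As in the logged SQL, only the min query is limited to the date window, so the range starts at the smallest id whose last_updated falls on that day and ends at the largest id in the whole table. `coalesce(..., 0)` turns an empty result into 0 instead of NULL.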

Second step: the workers fetch the data partition by partition. Each worker opens its own connection (from ubuntu-dev2 and ubuntu-dev1 below) and runs one query per id range.
150612 17:22:13 59 Connect cluster@ubuntu-dev2 on lmm
select id, tenant_id, date_created, last_updated, device_id, os_type, os_version,
          search_radius, sdk_major_version, last_time_zone, sendable
         from
          device d
         where
          375001 <= d.id and
          d.id <= 750001

select id, tenant_id, date_created, last_updated, device_id, os_type, os_version,
          search_radius, sdk_major_version, last_time_zone, sendable
         from
          device d
         where
          750002 <= d.id and
          d.id <= 1125002

62 Connect cluster@ubuntu-dev1 on lmm
62 Query select id, tenant_id, date_created, last_updated, device_id, os_type, os_version,
          search_radius, sdk_major_version, last_time_zone, sendable
         from
          device d
         where
          0 <= d.id and
          d.id <= 375000
63 Query select id, tenant_id, date_created, last_updated, device_id, os_type, os_version,
          search_radius, sdk_major_version, last_time_zone, sendable
         from
          device d
         where
          1500004 <= d.id and
          d.id <= 1875004
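
The id ranges in the log are contiguous, never overlap, and are each 375001 ids wide. The sketch below reproduces that split in plain Python, following the arithmetic I believe Spark's `JdbcRDD.getPartitions` uses (both bounds inclusive, hence the `+ 1` on the length and the `- 1` on each end). It then simulates the per-partition fetch: each task binds its `(start, end)` pair to the two `?` placeholders that a JdbcRDD query must contain, so the numbers in the logged WHERE clauses are those bound values. The helper names `jdbc_partitions` and `fetch_partition`, the SQLite table, and the bounds `0`/`1875004` with 5 partitions are assumptions; the last three were chosen only because they reproduce the logged ranges, and the post does not state them.

```python
import sqlite3


def jdbc_partitions(lower_bound, upper_bound, num_partitions):
    """Hypothetical helper: split the inclusive range [lower_bound, upper_bound]
    into num_partitions contiguous inclusive (start, end) pairs, following
    the arithmetic of Spark's JdbcRDD as I understand it."""
    length = 1 + upper_bound - lower_bound  # +1 because both bounds are inclusive
    parts = []
    for i in range(num_partitions):
        start = lower_bound + (i * length) // num_partitions
        end = lower_bound + ((i + 1) * length) // num_partitions - 1
        parts.append((start, end))
    return parts


# Hypothetical bounds and partition count that reproduce the ranges in the log.
partitions = jdbc_partitions(0, 1875004, 5)
for start, end in partitions:
    print(start, end)
# 0 375000
# 375001 750001
# 750002 1125002
# 1125003 1500003
# 1500004 1875004

# The query shape JdbcRDD expects: two '?' placeholders for start and end.
RANGE_SQL = "select id from device d where ? <= d.id and d.id <= ?"


def fetch_partition(conn, start, end):
    """Hypothetical helper: what one task does, i.e. run the range query for its own pair."""
    return [row[0] for row in conn.execute(RANGE_SQL, (start, end))]


# Small in-memory table (SQLite standing in for MySQL) to check coverage.
conn = sqlite3.connect(":memory:")
conn.execute("create table device (id integer primary key)")
conn.executemany("insert into device (id) values (?)", [(i,) for i in range(0, 101)])

fetched = []
for start, end in jdbc_partitions(0, 100, 3):
    fetched.extend(fetch_partition(conn, start, end))
print(len(fetched), len(set(fetched)))  # 101 101
```

Because both ends are inclusive and each start is the previous end plus one, every id in the range is fetched exactly once; when the length does not divide evenly, partition widths differ by at most one id. The split is over id values, not rows, so sparse ids can leave some partitions with far more rows than others.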

The sample JdbcRDD code is here:
https://github.com/luohuazju/sillycat-spark/tree/streaming

2015-06-13 14:10
Category: Enterprise Architecture
