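Data integration involves combining data residing in different sources and providing users with a unified view of these data (from wikipedia.org).
Put plainly, it means gathering the data from each source into one place and giving users a unified view of it.
Data integration consists of several components: Repository, Data Source, and ETL tools.
Repository: the central store.
Data source: the SQL databases that the various systems currently use, Big Data databases, assorted spreadsheet files, XML data services, and so on.
ETL tools: these come in every shape and size. At present the main open-source options are Pentaho's Kettle and Talend's Open Studio.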
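To make the three components concrete, here is a minimal sketch of what an ETL tool does between the data sources and the repository. It is not how Kettle or Talend work internally. Everything in it is an assumption for illustration: in-memory SQLite databases stand in for both the source system and the repository, a CSV string stands in for a spreadsheet, and the table and function names (customers, customer_unified, run_etl) are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source 2: a spreadsheet exported as CSV text (note the untidy values).
CSV_SOURCE = "id,name\n3, carol \n4,Dave\n"


def make_sql_source():
    """Hypothetical source 1: an in-memory SQLite database standing in for a live SQL system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
    return conn


def extract(sql_conn, csv_text):
    """Extract: read raw rows from both sources and tag each row with its origin."""
    rows = [("sql", r[0], r[1])
            for r in sql_conn.execute("SELECT id, name FROM customers ORDER BY id")]
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rows.append(("csv", rec["id"], rec["name"]))
    return rows


def transform(rows):
    """Transform: normalize types and tidy up names so both sources share one format."""
    return [(src, int(ident), str(name).strip().title()) for src, ident, name in rows]


def load(repo, rows):
    """Load: write the unified rows into the repository table."""
    repo.execute(
        "CREATE TABLE IF NOT EXISTS customer_unified (source TEXT, id INTEGER, name TEXT)"
    )
    repo.executemany("INSERT INTO customer_unified VALUES (?, ?, ?)", rows)
    repo.commit()


def run_etl():
    """Run the whole pipeline and return the unified view, ordered by id."""
    repo = sqlite3.connect(":memory:")  # stands in for the repository
    load(repo, transform(extract(make_sql_source(), CSV_SOURCE)))
    return repo.execute(
        "SELECT source, id, name FROM customer_unified ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    for row in run_etl():
        print(row)
```

In Kettle or Talend each of these functions would roughly correspond to one or more graphical components wired together; the tools add scheduling, metadata management, and connectors on top.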
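Below I summarize my experience with the two tools from the following angles:
1. Origin and documentation
When picking a product, people look at where it was made first. "Made in Japan" used to be the mark of quality; nowadays "made in China" is fine too.
For software, my personal view is that "made in Germany" is the best: SAP, SUSE Linux, and Avira are all good.
"Made in US" is a guarantee of documentation and broad adoption. Kettle comes from the US, and a search for publications on Amazon turns up far more for Kettle than for Talend. Talend has only one book, and even that one is only scheduled for publication in December 2012. This has something to do with Talend being a French product. Unfortunately I do not read French, so I had to rely on the user guide that ships with Talend. The user guide and the component reference read like lists of clauses and are only moderately readable. There is nothing resembling a cookbook, so they are of limited help in solving real problems.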
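2. Ease of use
From a job-design point of view Kettle is more literal: every small step needs its own component. That looks explicit, but as far as I could tell it does not support drag and drop, so it is somewhat cumbersome to operate. What Talend does in a single operation requires working with several components in Kettle.
On this point Talend wins outright.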
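3. Stability
Take one scenario: moving data from SQL Server 2005 into Oracle 10g according to a set of rules.
Both Kettle and Talend use jTDS for SQL Server by default.
Talend 5.1.1: jTDS throws an error.
Kettle: the same.
The fix for Talend was to downgrade to version 5.0.3. I also tried a generic JDBC connection, following other people's advice to put Microsoft's JDBC driver into Talend's lib directory, but the driver class could not be found.
Kettle could not find the driver class at first either; after I restarted the environment it worked.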
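When a tool reports that a driver class cannot be found, a first sanity check is whether any jar in its lib directory actually contains that class. The sketch below does this from the outside by treating jars as zip archives. The two driver class names are the standard ones for jTDS and for Microsoft's JDBC driver; the helper name jars_containing and the lib directory you pass in are hypothetical, and the actual lib location depends on your Talend or Kettle installation. Finding the class only shows the jar is present. The tool may still need a restart before it picks the jar up, as happened with Kettle above.

```python
import zipfile
from pathlib import Path

# Standard JDBC driver class names for the two SQL Server drivers discussed here.
DRIVER_CLASSES = {
    "jtds": "net.sourceforge.jtds.jdbc.Driver",
    "microsoft": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}


def jars_containing(lib_dir, driver_class):
    """Return the names of .jar files directly under lib_dir that contain driver_class.

    lib_dir is whatever directory you believe the tool loads drivers from
    (an assumption; check your own installation).
    """
    # A class a.b.C is stored inside the jar as the entry a/b/C.class
    entry = driver_class.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid archives
    return hits


if __name__ == "__main__":
    import sys

    lib = sys.argv[1] if len(sys.argv) > 1 else "."
    for label, cls in DRIVER_CLASSES.items():
        found = jars_containing(lib, cls)
        print(f"{label}: {cls} -> {found if found else 'not found'}")
```

An empty result means the jar was copied to the wrong place or is the wrong file. A non-empty result with the error still occurring points at the tool's classpath or at the need to restart.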
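In addition, Kettle can work with SQL Server over ODBC, and performance is stable. With Talend over ODBC, the schema could not be retrieved.
As for the Oracle connection, Kettle could not use JDBC out of the box at first, because the JDBC driver is not installed on the classpath by default.
Talend had no problems at all here.
Based on the experience above, Talend is slightly weaker than Kettle when it comes to adding JDBC drivers.
If you are not using MS SQL Server, though, there should be little trouble (I tried PostgreSQL and MySQL and both worked). (My takeaway: in open-source systems, Microsoft is the one that gets shut out. Installing tinytds on CentOS, for example, is also painful.)
Conclusion: if you have Microsoft databases, use Kettle; if you do not, use Talend.
PS: Kettle's focus is BI, so if all you need is DI, Talend has the gentler learning curve and is simpler to operate.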