
A magic book on Kettle

 
With the help of my friend, I found a book on Kettle named "Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration". It is really a very nice book, and it has helped me learn more about Kettle.

2012-12-18
   I began to read this book. From page 1 to page 44, I got to know the history of Kettle and the relation between the OLTP system and the data warehouse. Because the English is so difficult, I have to read very carefully.

2013-01-06

  TOPIC 1: Agile BI
1) ETL Design

2) Data Acquisition

3) Beware of Spreadsheets

4) Design for Failure

  Kettle contains many features to do this; a minimal example of checking a job's result follows the list below. You can:
• Test a repository connection.
• Ping a host to check whether it’s available.
• Wait for a SQL command to return success/failure based on a row count condition.
• Check for empty folders.
• Check for the existence of a file, table, or column.
• Compare files or folders.
• Set a timeout on FTP and SSH connections.
• Create failure/success outputs on every available job step.
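
To make the last point concrete, here is a minimal sketch of running a job and reacting to its result. It assumes the Kettle (PDI) Java libraries are on the classpath; the job file name load_dwh.kjb is hypothetical.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.core.Result;
    import org.pentaho.di.job.Job;
    import org.pentaho.di.job.JobMeta;

    public class RunJobWithFailureCheck {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();                            // load plugins, initialize logging
            JobMeta jobMeta = new JobMeta("load_dwh.kjb", null); // hypothetical job file, no repository
            Job job = new Job(null, jobMeta);
            job.start();
            job.waitUntilFinished();
            Result result = job.getResult();
            if (result.getNrErrors() > 0) {                      // design for failure: react to errors
                System.err.println("Job failed; trigger the recovery path here");
                System.exit(1);
            }
        }
    }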

5) Change Data Capture

6) Data Quality

2013-01-16
   Today I tried to study the Kettle components; Kettle is very powerful, with the following building blocks. Although it is a little difficult to develop ETL jobs at the beginning, it is much easier to maintain the ETL jobs in the end, so it is a nice tool.
    The Building Blocks of Kettle Design
This section introduces and explains some of the Kettle-specific terminology.

Transformations
A transformation is the workhorse of your ETL solution. It handles the manipulation of rows of data in the broadest possible meaning of the extraction, transformation, and loading acronym.
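
As an illustration, a minimal sketch of loading and running a transformation through the Kettle Java API (the file name example.ktr is hypothetical):

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;

    public class RunTransformation {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();
            TransMeta transMeta = new TransMeta("example.ktr"); // hypothetical transformation file
            Trans trans = new Trans(transMeta);
            trans.execute(null);       // no command-line arguments
            trans.waitUntilFinished();
            System.out.println("Finished with " + trans.getErrors() + " error(s)");
        }
    }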

Steps
A step is a core building block in a transformation. It is graphically represented in the form of an icon.
Transformation Hops
A hop, represented by an arrow between two steps, defines the data path between the steps. The hop also represents a row buffer called a row set between two steps.

Parallelism
The simple rules enforced by the hops allow steps to be executed in parallel, in separate threads.

Rows of Data
The data that passes from step to step over a hop comes in the form of a row of data. A row is a collection of zero or more fields that can contain the data in any of the following data types (a small example follows this list):
• String: Any type of character data without any particular limit.
• Number: A double precision floating point number.
• Integer: A signed long integer (64-bit).
• BigNumber: A number with arbitrary (unlimited) precision.
• Date: A date-time value with millisecond precision.
• Boolean: A Boolean value can contain true or false.
• Binary: Binary fields can contain images, sounds, videos, and other types of binary data.
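
A minimal sketch of how a row looks in the Kettle Java API: the metadata (RowMeta) describes the fields, and a plain Object[] carries the values. The field names here are made up for illustration.

    import java.util.Date;
    import org.pentaho.di.core.row.RowMeta;
    import org.pentaho.di.core.row.RowMetaInterface;
    import org.pentaho.di.core.row.ValueMeta;
    import org.pentaho.di.core.row.ValueMetaInterface;

    public class RowExample {
        public static void main(String[] args) throws Exception {
            // The row metadata describes field names and types.
            RowMetaInterface rowMeta = new RowMeta();
            rowMeta.addValueMeta(new ValueMeta("name", ValueMetaInterface.TYPE_STRING));
            rowMeta.addValueMeta(new ValueMeta("amount", ValueMetaInterface.TYPE_NUMBER));
            rowMeta.addValueMeta(new ValueMeta("updated", ValueMetaInterface.TYPE_DATE));

            // The values themselves travel as a plain Object array.
            Object[] row = new Object[] { "widget", 12.5, new Date() };
            System.out.println(rowMeta.getString(row, 0)); // prints: widget
        }
    }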

Data Conversion
Data conversion in Kettle happens either explicitly, for example with a Select Values step that changes a field's type, or implicitly, for example when a string field is written to a numeric database column.

Jobs
A job consists of one or more job entries that are executed in a certain order. The order of execution is determined by the job hops between job entries as well as the result of the execution itself.

Job Entries
A job entry is a core building block of a job. Like a step, it is also graphically represented in the form of an icon. However, if you look a bit closer, you see that job entries differ from steps in a number of ways: they pass a result object between one another rather than streaming rows of data, and by default they are executed one after another rather than in parallel.

Job Hops
A job hop, drawn as an arrow between two job entries, specifies the execution path. It can be unconditional, or followed only when the previous job entry succeeded, or only when it failed.

Multiple Paths and Backtracking
When several hops leave a job entry, the paths are executed one at a time using a backtracking algorithm: after one path finishes, execution backtracks to follow the next one.

Job Entry Results
The result of a job entry (success or failure, along with any result rows and files) determines which of the outgoing hops are followed.

   Tools and Utilities
Kettle contains a number of tools and utilities that help you in various ways and in various stages of your ETL project. The core tools of the Kettle software stack include:
• Spoon: A graphical user interface that allows you to quickly design and manage complex ETL workloads.
• Kitchen: A command-line tool that allows you to run jobs.
• Pan: A command-line tool that allows you to run transformations.
• Carte: A lightweight (around 1MB) web server that enables remote execution of transformations and jobs. A Carte instance also represents a slave server, a key part of Kettle clustering (MPP).
Chapter 3 provides more detailed information on these tools.
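
For example, running a job with Kitchen or a transformation with Pan from a shell looks roughly like this (the file paths are hypothetical):

    sh kitchen.sh -file=/path/to/load_dwh.kjb -level=Basic
    sh pan.sh -file=/path/to/example.ktr -level=Basic

Both tools return a non-zero exit code when the job or transformation fails, which makes them easy to embed in a scheduler such as cron.
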
Repositories
    When you are faced with larger ETL projects with many ETL developers working together, it’s important to have facilities in place that enable cooperation. Kettle provides a way of defining repository types in a pluggable and flexible way.
• Database repository: stores the ETL metadata in a relational database.
• Pentaho repository: a plugin, available in the Enterprise Edition, that stores the metadata in a content management system.
• File repository: uses a folder on a (virtual) file system, storing jobs and transformations as XML files.
Because repositories are meant to be used in team settings, they need to facilitate things such as:
• Central storage: keep transformations and jobs in one central location, accessible to the whole team.
• File locking: prevent a developer from changing an object that someone else is working on.
• Revision management: keep earlier versions of jobs and transformations and see who changed what, and when.
• Referential integrity checking: verify that references between jobs, transformations, and shared objects remain valid.
• Security: authenticate users and control who is allowed to read or change which objects.
• Referencing: reorganize and rename objects while references to them are kept intact.

Virtual File Systems
   Flexible and uniform file handling is very important to any ETL tool. That is why Kettle supports the specification of files in the broadest sense as URLs. The Apache Commons VFS back end that was put in place will then take care of the complexity for you. For example, with Apache VFS, it is possible to process a selection of files inside a .zip archive in exactly the same way as you would process a list of files in a local folder. For more information on how to specify VFS files, visit the Apache VFS website at http://commons.apache.org/vfs/.
Table 2-5 shows a few typical examples.
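
A few typical Apache VFS URLs look like the following (the paths are hypothetical):

    file:///home/user/input/sales.csv                    a file on the local file system
    zip:file:///home/user/input/archive.zip!/sales.csv   a file inside a .zip archive
    http://www.example.com/data/sales.csv                a file fetched over HTTP
    ftp://user:password@ftp.example.com/sales.csv        a file on an FTP server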





