Last Modified on October 28, 2010
What’s New in Pentaho Data Integration
Enterprise Edition 4.1
Copyright © 2010 Pentaho Corporation. Redistribution permitted. All trademarks are the property of
their respective owners.
For the latest information, please visit our web site at www.pentaho.com
Pentaho™ What’s New in Pentaho Data Integration Enterprise Edition 4.1
Contents
Purpose of This Document
Pentaho Data Integration Enterprise Edition 4.1
Pentaho Data Integration for Hadoop
Enhancements to Hops
Metadata Injection
General Steps and Job Entries
New Transformation Steps
New Job Entries
Purpose of This Document
This document introduces new capabilities delivered in Pentaho Data Integration (PDI) 4.1. It is intended
for readers who have a working familiarity with the capabilities of PDI; it is
not a complete review of PDI’s functional capabilities.
Pentaho Data Integration Enterprise Edition 4.1
This PDI release includes: integration with Apache Hadoop, making it easy to leverage Hadoop for storing
and processing very large data sets; usability improvements for working with hops; the first-ever support for
the concept of Metadata Injection; and a number of new general-purpose transformation steps and job
entries.
Pentaho Data Integration for Hadoop
More and more enterprises are turning to Hadoop to reduce costs and improve their ability to extract
actionable business insight from the vast amount of data being collected throughout the enterprise.
Hadoop’s massively parallel processing capabilities, along with the ability to store extremely large amounts of
data in a low-cost and reliable manner, make it an attractive option for building Business Intelligence
solutions for Big Data. However, Hadoop presents many challenges to traditional BI Data Integration users,
including a steep technical learning curve, a lack of qualified technical staff, and the lack of appropriate
tools for performing data integration and business intelligence tasks with Hadoop.
Pentaho Data Integration Enterprise Edition 4.1 delivers comprehensive integration with Hadoop, which
lowers the technical barriers to adopting Hadoop for Big Data projects. By using Pentaho Data Integration’s
easy-to-use, graphical design environment, ETL Designers can now harness the power of Hadoop with zero
Java development to address common Data Integration use cases including:
• Moving data files into and out of the Hadoop Distributed File System (HDFS)
• Reading data from and writing data to Hadoop using standard SQL statements
• Coordination and execution of Hadoop tasks as part of larger Data Integration and Business
Intelligence workflows
• Graphical design of new MapReduce jobs, taking advantage of Pentaho Data Integration’s vast
library of pre-built mapping and data transformation steps
Pentaho for Hadoop simplifies the use of Hadoop for analytics, including file input and output steps as well
as managing Hadoop jobs.
Pentaho Data Integration Enterprise Edition 4.1 supports the latest releases of Apache Hadoop as well as
popular commercial distributions such as Cloudera Distribution for Hadoop and Amazon Elastic MapReduce.
For information and best practices on how to incorporate Hadoop into your Big Data architecture, visit
http://www.pentaho.com/hadoop/resources.php.
Enhancements to Hops
Pentaho Data Integration 4.1 enhances the handling of hops between steps and job entries by allowing all
hops downstream from a certain point, or among all selected steps or job entries, to be enabled or disabled.
This makes it easier to debug a faulty step at the end of a transformation. You can now also disable
and enable hops simply by clicking on them once. In addition, when hops are split, target and error
handling information is retained.
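
As a rough illustration of what "all hops downstream from a certain point" means, the sketch below models a transformation as a list of (from_step, to_step) pairs and walks the graph breadth-first. The data model, the function names, and the step names are all hypothetical and chosen for illustration; this is not PDI's internal representation or API.

```python
from collections import deque


def downstream_hops(hops, start):
    """Return every hop reachable downstream of the step `start`.

    `hops` is a list of (from_step, to_step) pairs (a hypothetical model).
    """
    outgoing = {}
    for hop in hops:
        outgoing.setdefault(hop[0], []).append(hop)

    found, seen, queue = [], {start}, deque([start])
    while queue:
        step = queue.popleft()
        for hop in outgoing.get(step, []):
            found.append(hop)
            if hop[1] not in seen:  # visit each step only once
                seen.add(hop[1])
                queue.append(hop[1])
    return found


def set_downstream_enabled(hops, enabled, start, value):
    """Enable or disable, in place, all hops downstream of `start`."""
    for hop in downstream_hops(hops, start):
        enabled[hop] = value


# Hypothetical transformation: Read -> Filter -> Sort -> Write, plus Read -> Log
hops = [("Read", "Filter"), ("Filter", "Sort"), ("Sort", "Write"), ("Read", "Log")]
enabled = {hop: True for hop in hops}

# Disable everything after "Filter" to isolate the end of the flow.
set_downstream_enabled(hops, enabled, "Filter", False)
```

After this call, only the hops leaving "Filter" and "Sort" are disabled; the hops leaving "Read" stay enabled, which is the kind of isolation that helps when debugging a step late in the flow.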
Metadata Injection
For the first time in data integration history, Pentaho Data Integration 4.1 supports the concept of Metadata
Injection. Metadata Injection offers increased flexibility for developers who want to treat their ETL metadata
as data. Last-minute injection of file layout and field selection into a transformation template makes this
possible. It can drastically reduce the number of data transformations in situations where patterns can be
discovered in the data integration workload. Implemented as a metadata injection step, this feature allows
developers to dynamically set step properties in transformations. The step exposes all the available
properties of the target step and enables injection of file names, the removal or renaming of fields, and other
metadata properties.
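
The idea of treating ETL metadata as data can be sketched outside of PDI. In the minimal example below, one template describes the shape of a transformation, and file names and field selections are filled in at the last minute. The template layout, the `inject` function, and the step and property names are assumptions made for illustration; they do not correspond to the actual properties exposed by the metadata injection step.

```python
import copy

# Hypothetical template: step names mapped to their (still empty) properties.
TEMPLATE = {
    "read_file": {"filename": None, "fields": []},
    "select_fields": {"keep": [], "rename": {}},
}


def inject(template, metadata):
    """Return a copy of `template` with the values from `metadata` filled in.

    `metadata` maps step name -> {property: value}. Unknown steps or
    properties raise KeyError so that typos are caught early.
    """
    result = copy.deepcopy(template)
    for step, props in metadata.items():
        for name, value in props.items():
            if name not in result[step]:
                raise KeyError(f"{step} has no property {name!r}")
            result[step][name] = value
    return result


# One template, two concrete transformations that differ only in metadata.
customers = inject(TEMPLATE, {
    "read_file": {"filename": "customers.csv", "fields": ["id", "name", "zip"]},
    "select_fields": {"keep": ["id", "name"], "rename": {"name": "customer_name"}},
})
orders = inject(TEMPLATE, {
    "read_file": {"filename": "orders.csv", "fields": ["id", "total"]},
    "select_fields": {"keep": ["id", "total"]},
})
```

Because the template is copied rather than modified, the same definition can serve any number of input files that follow the same pattern, which is where the reduction in the number of transformations comes from.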
General Steps and Job Entries
In addition to the Pentaho for Hadoop functionality, Pentaho Data Integration 4.1 includes a number of new
steps and job entries designed to increase developer productivity. These include a conditional blocking step,
JSON and YAML input steps, a string operations step, and a Write to File job entry. Below is a complete
list of the new transformation steps and job entries.
New Transformation Steps
Pentaho Data Integration 4.1 adds the following new transformation steps:
Step Name                           Description
Hadoop File Input                   Processes files from an HDFS or Amazon S3 location.
Hadoop File Output                  Creates files in an HDFS location.
Conditional Blocking Step           Blocks this step until other steps finish; allows building step
                                    logic that depends on the execution of other steps.
JSON Input Step                     Enables the JSON step to execute even if the defined path does
                                    not exist.
JSON Output Step                    Creates a JSON block and outputs it in a field or a file.
LDAP Output Step                    Performs Insert, Upsert, Update, Add, and Delete operations on
                                    records based on their DN.
YAML Input Step                     Enables reading information from a YAML file.
Email Messages Input                Reads a POP3/IMAP server and retrieves messages.
Generate Random Credit Card Number  Generates random valid credit card numbers.
String Operations Step              Enables string operations including trimming, padding,
                                    lowercase/uppercase, InitCap, Escape (XML, SQL, CDATA, HTML),
                                    extract only digits, and remove special characters (CR, LF,
                                    Space, Tab).
S3 File Output                      Creates files in an S3 file location.
Run SSH Commands                    Runs SSH commands and returns results.
Output Steps Metrics                Returns metrics for one or more steps within a transformation.
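
To make the kinds of operations offered by the String Operations step concrete, here is a minimal Python sketch of a few of them (InitCap, digit extraction, special-character removal, padding, and XML/HTML escaping). The helper names are hypothetical and the behavior is only an approximation for illustration; the step's exact semantics may differ.

```python
import re
from html import escape as html_escape
from xml.sax.saxutils import escape as xml_escape


def init_cap(text):
    """Capitalize the first letter of each space-separated word."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in text.split(" "))


def digits_only(text):
    """Keep only the digit characters."""
    return re.sub(r"\D", "", text)


def remove_special(text, chars="\r\n\t"):
    """Remove special characters (CR, LF, and Tab by default)."""
    return text.translate({ord(c): None for c in chars})


name = init_cap("  jOHN smith ".strip())      # trimming, then InitCap
phone = digits_only("Tel: 555-0100")          # extract only digits
note = remove_special("line1\r\nline2\tend")  # drop CR, LF, Tab
code = "42".rjust(5, "0")                     # left padding with zeros
xml_text = xml_escape("a < b & c")            # XML escaping
html_text = html_escape("<b>bold</b>")        # HTML escaping
```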
New Job Entries
Pentaho Data Integration 4.1 adds the following new job entries:
Job Entry Name                      Description
Amazon EMR Job Executor             Executes Map/Reduce jobs in Amazon EMR.
Hadoop Copy Files                   Copies files to and from HDFS or Amazon S3.
Hadoop Job Executor                 Executes Map/Reduce jobs in Hadoop.
Hadoop Transformation Job Executor  Executes PDI transformation-based Map/Reduce jobs in Hadoop.
Write to File Job Entry             Writes data (static or in variables) directly at the job level.