1. Pentaho Big Data BI Knowledge @ http://wiki.pentaho.com/display/BAD/How+To%27s
Hadoop
Loading Data into a Hadoop Cluster — How to load data into HDFS (Hadoop's Distributed File System), Hive and HBase.
- Loading Data into HDFS — How to use a PDI job to move a file into HDFS.
- Loading Data into Hive — How to use a PDI job to load a data file into a Hive table.
- Loading Data into HBase — How to use a PDI transformation that sources data from a flat file and writes to an HBase table.
Transforming Data within a Hadoop Cluster — How to transform data within the Hadoop cluster using Pentaho MapReduce, Hive, and Pig.
- Using Pentaho MapReduce to Parse Weblog Data — How to use Pentaho MapReduce to convert raw weblog data into parsed, delimited records.
- Using Pentaho MapReduce to Generate an Aggregate Dataset — How to use Pentaho MapReduce to transform and summarize detailed data into an aggregate dataset.
- Transforming Data within Hive — How to read data from a Hive table, transform it, and write it to a Hive table within the workflow of a PDI job.
- Transforming Data with Pig — How to invoke a Pig script from a PDI job.
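
As a rough illustration of the kind of work the weblog and aggregate how-tos above describe, here is a minimal plain-Python sketch. It is not Pentaho MapReduce code: the log layout (an Apache-style access log), the tab delimiter, and the names `parse_line` and `count_by_status` are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed input format: Apache-style access log lines.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line, delimiter="\t"):
    """Map-style step: turn one raw log line into a delimited record.

    Returns None when the line does not match the assumed format.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.group("host", "time", "method", "path", "status", "size")
    return delimiter.join(fields)

def count_by_status(records, delimiter="\t"):
    """Reduce-style step: count parsed records per HTTP status code."""
    return Counter(record.split(delimiter)[4] for record in records)

raw_lines = [
    '10.0.0.1 - - [10/Oct/2012:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2012:13:56:01 -0700] "GET /missing HTTP/1.1" 404 512',
    'not a log line',
]

# Parse every line, dropping the ones that do not match.
records = [r for r in (parse_line(line) for line in raw_lines) if r is not None]
totals = count_by_status(records)
```

In a PDI job the same two stages would be built as mapper and reducer transformations rather than Python functions; the sketch only shows the shape of the data before and after each stage.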
Extracting Data from the Hadoop Cluster — How to extract data from Hadoop using HDFS, Hive, and HBase.
- Extracting Data from HDFS to Load an RDBMS — How to use a PDI transformation to extract data from HDFS and load it into an RDBMS table.
- Extracting Data from Hive to Load an RDBMS — How to use a PDI transformation to extract data from Hive and load it into an RDBMS table.
- Extracting Data from HBase to Load an RDBMS — How to use a PDI transformation to extract data from HBase and load it into an RDBMS table.
- Extracting Data from Snappy Compressed Files — How to configure client-side PDI so that files compressed using the Snappy codec can be decompressed using the Hadoop file input or Text file input step.
Reporting on Data in Hadoop — How to report on data that is resident within the Hadoop cluster.
- Reporting on HDFS File Data — How to create a report that sources data from an HDFS file.
- Reporting on HBase Data — How to create a report that sources data from HBase.
- Reporting on Hive Data — How to create a report that sources data from Hive.
- Unit Test Pentaho MapReduce Transformation — How to unit test the mapper and reducer transformations that make up a Pentaho MapReduce job.
Advanced Pentaho MapReduce — Advanced how-tos for developing Pentaho MapReduce.
- Using Compression with Pentaho MapReduce — How to use compression with Pentaho MapReduce.
- Using a Custom Partitioner in Pentaho MapReduce — How to use a custom partitioner in Pentaho MapReduce.
- Using a Custom Input or Output Format in Pentaho MapReduce — How to use a custom Input or Output Format in Pentaho MapReduce.
- Processing HBase data in Pentaho MapReduce using TableInputFormat — How to use HBase TableInputFormat in Pentaho MapReduce.
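
To make the custom partitioner item above more concrete, the sketch below shows in plain Python what a partitioner decides: which reducer receives a given key. In Hadoop the partitioner is a Java class, so this is only an illustration of the idea. The key layout (`group:date`), the routing rule, and the function names are hypothetical.

```python
import zlib

def hash_partition(key, num_reducers):
    """Default-style partitioning: a stable hash of the whole key, modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def prefix_partition(key, num_reducers):
    """Custom partitioning (hypothetical rule): hash only the text before the first ':'.

    All keys that share a prefix are therefore sent to the same reducer.
    """
    group = key.split(":", 1)[0]
    return zlib.crc32(group.encode("utf-8")) % num_reducers

keys = ["us:2012-10-01", "us:2012-10-02", "de:2012-10-01"]

# Map each key to the reducer index chosen by the custom rule.
assignments = {key: prefix_partition(key, 4) for key in keys}
```

With the prefix rule, both `us:` keys always land on the same reducer, which is the usual reason to replace hash-of-whole-key partitioning: related records can then be processed together.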
MapR
Loading Data into a MapR Cluster — How to load data into CLDB (MapR’s distributed file system), Hive and HBase.
- Loading Data into CLDB — How to use a PDI job to move a file into CLDB.
- Loading Data into MapR Hive — How to use a PDI job to load a data file into a Hive table.
- Loading Data into MapR HBase — How to use a PDI transformation that sources data from a flat file and writes to an HBase table.
Transforming Data within a MapR Cluster — How to leverage the massively parallel, fault tolerant MapR processing engine to transform resident cluster data.
- Using Pentaho MapReduce to Parse Weblog Data in MapR — How to use Pentaho MapReduce to convert raw weblog data into parsed, delimited records.
- Using Pentaho MapReduce to Generate an Aggregate Dataset in MapR — How to use Pentaho MapReduce to transform and summarize detailed data into an aggregate dataset.
- Transforming Data within Hive in MapR — How to read data from a Hive table, transform it, and write it to a Hive table within the workflow of a PDI job.
- Transforming Data with Pig in MapR — How to invoke a Pig script from a PDI job.
Extracting Data from the MapR Cluster — How to extract data from the MapR cluster and load it into an RDBMS table.
- Extracting Data from CLDB to Load an RDBMS — How to use a PDI transformation to extract data from MapR CLDB and load it into an RDBMS table.
- Extracting Data from Hive to Load an RDBMS in MapR — How to use a PDI transformation to extract data from Hive and load it into an RDBMS table.
- Extracting Data from HBase to Load an RDBMS in MapR — How to use a PDI transformation to extract data from HBase and load it into an RDBMS table.
Reporting on Data in the MapR Cluster — How to report on data that is resident within the MapR cluster.
- Reporting on CLDB File Data — How to create a report that sources data from a MapR CLDB file.
- Reporting on HBase Data in MapR — How to create a report that sources data from HBase.
- Reporting on Hive Data in MapR — How to create a report that sources data from Hive.
Cassandra
- Write Data To Cassandra — How to read data from a data source (flat file) and write it to a column family in Cassandra using a graphical tool.
- How To Read Data From Cassandra — How to read data from a column family in Cassandra using a graphical tool.
- How To Create a Report with Cassandra — How to create a report that uses data from a column family in Cassandra using graphical tools.
MongoDB
- Write Data To MongoDB — How to read data from a data source (flat file) and write it to a collection in MongoDB.
- Read Data From MongoDB — How to read data from a collection in MongoDB.
- Create a Report with MongoDB — How to create a report that uses data from a collection in MongoDB.
- Create a Parameterized Report with MongoDB — How to create a parameterized report that uses data from a collection in MongoDB.
Instaview
- Google Analytics Instaview Sample template — Instaview template for use with Google Analytics
- MongoDB Instaview Sample template — Sample Instaview template for use with MongoDB
2. Pentaho ABCs @ http://docs.huihoo.com/pentaho/pentaho-business-analytics/4.8/
- Pentaho User Console Guide: Reference material and task-based documentation on Pentaho Dashboard Designer, Pentaho Analyzer, Pentaho Interactive Reporting, and the content scheduling and authorization features in the Pentaho User Console.
- Analysis Guide: Guidance and theory on creating ROLAP schemas with Schema Workbench and Pentaho Data Integration; Pentaho Analyzer and JPivot user documentation; Mondrian engine and Pentaho Analyzer configuration instructions. Also includes an MDX element reference.
- Report Designer User Guide: Reference material and task-based documentation on creating, editing, and publishing reports with Pentaho Report Designer. Includes a complete chart property reference.
- Metadata Editor User Guide: Reference material and task-based documentation for creating, editing, and publishing metadata models with Pentaho Metadata Editor.
- Pentaho Data Integration User Guide: Reference material and task-based documentation that covers the majority of the functionality in Pentaho Data Integration.
- Big Data Guide: How to install and configure PDI for various Hadoop distributions, along with procedural documentation on how to use the Big Data steps and entries in Pentaho Data Integration.
- Aggregation Designer User Guide: Information on creating aggregate tables for Pentaho Analysis.
- Getting Started with Pentaho Business Analytics: Detailed walkthroughs that show how to create content with Pentaho User Console design tools.
- Getting Started with Pentaho Data Integration: Evaluation document showcasing the high-value features of PDI.
- Getting Started with Pentaho Data Integration Instaview: Detailed walkthroughs that show how to use PDI's Instaview to quickly generate, transform, and analyze data from a variety of sources.
- Pentaho Business Analytics Graphical Installation Guide: Complete instructions for performing a production installation using the Business Analytics graphical installation utility for Windows, Linux, or OS X. This method is recommended and encouraged for evaluation deployments, but is not typical for production installations.
- Pentaho Business Analytics Archive-Based Installation Guide: Deployment instructions for the premade Business Analytics archive packages in .tar or .zip format. Packages are available for all individual parts of Business Analytics. This is a common production deployment scenario for both servers and workstations.
- Pentaho BA Server Manual Deployment Guide: Instructions for building a custom BA Server J2EE WAR for Tomcat or JBoss. This is a typical production deployment scenario for servers.
- Pentaho Data Integration Installation Guide: Complete instructions for performing a production installation of Pentaho Data Integration for servers and workstations. Covers both archive package deployment and graphical installation utility execution. Installation to Hadoop nodes is also covered in detail.
- Business Analytics Upgrade Guide: Upgrade instructions for migrating from the previous major Business Analytics release to the newest. This guide also includes all of the content from the PDI Upgrade Guide.
- Pentaho Data Integration Upgrade Guide: Instructions for upgrading from the previous version of Pentaho Data Integration to the newest. This pertains to both the client tool (Spoon) and the Data Integration Server (DI Server).
- Business Analytics Administrator's Guide: Explains system configuration and administration tasks for the Pentaho BA Server and DI Server. (See the Security Guide for detailed instructions on implementing different user authentication methods.)
- Business Analytics Troubleshooting Guide: A collection of troubleshooting topics from all other Pentaho guides. You may find this useful if you have encountered an error but don't know where in Business Analytics to look for the root cause.
- Business Analytics Security Guide: Instructions and guidance for implementing a different user authentication method, or for implementing SSL on the BA Server. Covers Active Directory, LDAP, single sign-on, and custom JDBC authentication.
- Business Analytics Performance-Tuning Guide: Guidance and instructions for improving performance in most areas of Business Analytics. Covers modification of Business Analytics, guidelines for content streamlining, application server clustering, and advice on performance monitoring and testing.
- Pentaho Data Integration Administrator's Guide: Explains system configuration and administration tasks for the DI Server.
- Customizing Pentaho Business Analytics: Instructions for localization and basic customization of the Pentaho User Console, including Pentaho Analyzer, Interactive Reporting, and Dashboard Designer.
- Creating Pentaho Dashboards: Design theory for creating dashboards that use Pentaho content. Covers Pentaho Dashboard Designer, Community Dashboard Framework (CDF), and basic guidance for custom JSPs.
- Integrating With the BA Server: Code samples and URL parameter reference material that shows how to interact with or embed (in an existing Web application) Pentaho Analyzer, Dashboard Designer, and Interactive Reporting.
- Creating Action Sequences: Reference material, guidance, and code samples for creating action sequences to run on the Pentaho BI Platform. Includes user documentation for Pentaho Design Studio.
- Extending and Embedding Pentaho Data Integration: Instructions, Java classes and methods, as well as Eclipse-based sample plugin projects that show you how to programmatically extend PDI functionality or embed the PDI engine into your own applications.