With the help of a friend, I found a book on Kettle called "Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration". It is a really nice book, and it has helped me learn more about Kettle.
On 2012-12-18 I began reading this book, pages 1 to 44, and learned about the history of Kettle and the relationship between OLTP systems and the data warehouse. Because the English is difficult for me, I have to read very carefully.
2013-01-06
TOPIC 1 Agile BI
1) ETL Design
2) Data Acquisition
3) Beware of Spreadsheets
4) Design for failure
Kettle contains many features to do this. You can:
• Test a repository connection.
• Ping a host to check whether it’s available.
• Wait for a SQL command to return success/failure based on a row count condition.
• Check for empty folders.
• Check for the existence of a file, table, or column.
• Compare files or folders.
• Set a timeout on FTP and SSH connections.
• Create failure/success outputs on every available job step.
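To make the idea of designing for failure more concrete, here is a minimal Python sketch of a few pre-flight checks similar in spirit to the ones listed above. It is not Kettle code and does not use any Kettle API; the function names (file_exists, folder_is_empty, host_reachable, run_checks) are hypothetical, and the TCP connection attempt is only a stand-in for a real ping.

```python
import os
import socket
import tempfile


def file_exists(path):
    """Return True if a regular file exists at path."""
    return os.path.isfile(path)


def folder_is_empty(path):
    """Return True if the folder exists and contains no entries."""
    return os.path.isdir(path) and not os.listdir(path)


def host_reachable(host, port, timeout_seconds=3.0):
    """Try a TCP connection with a timeout (a stand-in for a ping)."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False


def run_checks(checks):
    """Run (name, check) pairs and return the names of the checks that failed."""
    return [name for name, check in checks if not check()]


# Small demonstration that needs no network access.
with tempfile.TemporaryDirectory() as folder:
    input_file = os.path.join(folder, "input.csv")
    open(input_file, "w").close()
    failed = run_checks([
        ("input file present", lambda: file_exists(input_file)),
        ("landing folder not empty", lambda: not folder_is_empty(folder)),
    ])
    print("failed checks:", failed)
```

Running all checks first and collecting the failures, instead of stopping at the first one, gives a complete picture of what is wrong before any data is moved.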
5) Change data capture
6) Data Quality
2013-01-16
Today I studied the Kettle components. Kettle is very powerful, and it is built from the building blocks described below. Although it is a little difficult to develop ETL jobs at the beginning, it becomes much easier to maintain them later, so it is a nice tool.
The Building Blocks of Kettle Design
This section introduces and explains some of the Kettle-specific terminology.
Transformations
A transformation is the workhorse of your ETL solution. It handles the manipulation of rows or data in the broadest possible meaning of the extraction, transformation, and loading acronym.
Steps
A step is a core building block in a transformation. It is graphically represented in the form of an icon.
Transformation Hops
A hop, represented by an arrow between two steps, defines the data path between the steps. The hop also represents a row buffer called a row set between two steps.
Parallelism
The simple rules enforced by the hops allow steps to be executed in a parallel nature in separate threads.
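The relationship between steps, hops, and threads can be sketched in Python as follows. This is only an analogy and not how Kettle is implemented: each "step" is a hypothetical function running in its own thread, each "hop" is a bounded queue acting as the row buffer, and END_OF_ROWS is an assumed sentinel that tells the next step the stream is finished.

```python
import queue
import threading

END_OF_ROWS = object()  # sentinel: the upstream step has no more rows


def generate_rows(out_hop, count):
    """Source step: write `count` rows to the outgoing hop."""
    for i in range(count):
        out_hop.put({"id": i})
    out_hop.put(END_OF_ROWS)


def add_double(in_hop, out_hop):
    """Middle step: read each row, add a field, and pass the row on."""
    while True:
        row = in_hop.get()
        if row is END_OF_ROWS:
            out_hop.put(END_OF_ROWS)
            return
        out_hop.put({**row, "double": row["id"] * 2})


def collect_rows(in_hop, result):
    """Target step: gather the incoming rows into a list."""
    while True:
        row = in_hop.get()
        if row is END_OF_ROWS:
            return
        result.append(row)


def run_pipeline(count, buffer_size=2):
    """Connect three steps with two bounded hops and run them in parallel."""
    hop1 = queue.Queue(maxsize=buffer_size)
    hop2 = queue.Queue(maxsize=buffer_size)
    result = []
    threads = [
        threading.Thread(target=generate_rows, args=(hop1, count)),
        threading.Thread(target=add_double, args=(hop1, hop2)),
        threading.Thread(target=collect_rows, args=(hop2, result)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result


print(run_pipeline(3))
```

Because each queue is bounded, a fast step blocks when the buffer is full and a slow step blocks when it is empty, so the steps pace each other without any extra coordination.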
Rows of Data
The data that passes from step to step over a hop comes in the form of a row of data. A row is a collection of zero or more fields that can contain the data in any of the following data types:
• String: Any type of character data without any particular limit.
• Number: A double precision floating point number.
• Integer: A signed long integer (64-bit).
• BigNumber: A number with arbitrary (unlimited) precision.
• Date: A date-time value with millisecond precision.
• Boolean: A Boolean value can contain true or false.
• Binary: Binary fields can contain images, sounds, videos, and other types of binary data.
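As a rough illustration of these data types, the sketch below builds one row using the closest Python equivalents. The mapping is my own assumption for illustration (Kettle itself is a Java tool), and the field names are made up. It mainly shows why the difference between Number (double precision) and BigNumber (arbitrary precision) matters, and what the 64-bit limit on Integer means.

```python
from datetime import datetime
from decimal import Decimal

# Range of a signed 64-bit integer, the size given for the Integer type.
INTEGER_MIN = -(2 ** 63)
INTEGER_MAX = 2 ** 63 - 1


def fits_kettle_integer(value):
    """Return True if value fits in a signed 64-bit integer."""
    return INTEGER_MIN <= value <= INTEGER_MAX


# One row: a collection of named fields, one per data type.
row = {
    "name": "widget",                                      # String
    "weight": 0.1 + 0.2,                                   # Number (double precision)
    "quantity": 42,                                        # Integer
    "price": Decimal("0.1") + Decimal("0.2"),              # BigNumber
    "loaded_at": datetime(2013, 1, 16, 9, 30, 0, 123000),  # Date (123 ms)
    "active": True,                                        # Boolean
    "thumbnail": b"\x89PNG",                               # Binary
}

# Double precision cannot represent 0.3 exactly; the decimal type can.
print(row["weight"] == 0.3)
print(row["price"] == Decimal("0.3"))
```

For amounts of money, a double precision Number can accumulate small rounding errors, which is the usual reason to choose an arbitrary-precision type instead.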
Data Conversion
Jobs
A job consists of one or more job entries that are executed in a certain order. The order of execution is determined by the job hops between job entries as well as the result of the execution itself.
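The way hops and execution results together decide the order of job entries can be sketched like this. It is a simplified model under my own assumptions, not Kettle's job engine: every entry is a hypothetical Python callable returning True for success or False for failure, and a hop is a dictionary entry keyed by (entry name, result).

```python
def run_job(entries, hops, start):
    """Run job entries from `start`, choosing each next entry by the result.

    entries: entry name -> callable returning True (success) or False (failure)
    hops:    (entry name, result) -> name of the next entry
    Returns the entry names in the order they were executed.
    Note: this sketch does not guard against hops that form a cycle.
    """
    executed = []
    current = start
    while current is not None:
        succeeded = bool(entries[current]())
        executed.append(current)
        # No hop for this result means the job ends here.
        current = hops.get((current, succeeded))
    return executed


# Hypothetical job: check for a file, then either load it or send an alert.
entries = {
    "check_file": lambda: False,  # pretend the file is missing
    "load_data": lambda: True,
    "send_alert": lambda: True,
}
hops = {
    ("check_file", True): "load_data",
    ("check_file", False): "send_alert",
}
print(run_job(entries, hops, "check_file"))
```

Unlike a transformation, where all steps run at the same time, the entries here run strictly one after another, and each result picks the path to follow.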
Job Entries
A job entry is a core building block of a job. Like a step, it is also graphically represented in the form of an icon. However, if you look a bit closer, you see that job entries differ in a number of ways:
Job Hops
Multiple Paths and Backtracking
Job Entry Results
Tools and Utilities
Kettle contains a number of tools and utilities that help you in various ways and in various stages of your ETL project. The core tools of the Kettle software stack include:
• Spoon: A graphical user interface that will allow you to quickly design and manage complex ETL workloads.
• Kitchen: A command-line tool that allows you to run jobs.
• Pan: A command-line tool that allows you to run transformations.
• Carte: A lightweight (around 1MB) web server that enables remote execution of transformations and jobs. A Carte instance also represents a slave server, a key part of Kettle clustering (MPP).
Chapter 3 provides more detailed information on these tools.
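As a small sketch of how the two command-line tools might be called from a script, the Python helper below builds an argument list for Kitchen or Pan. The script names (kitchen.sh, pan.sh), the file paths, and the helper names are assumptions for illustration. The -file and -level options are written as I understand them, so check them against the documentation of your Kettle version.

```python
import subprocess


def build_command(tool, file_path, level="Basic"):
    """Build the argument list for Kitchen (jobs) or Pan (transformations)."""
    return [tool, f"-file={file_path}", f"-level={level}"]


def run_tool(tool, file_path, level="Basic"):
    """Run the tool and return its exit code (0 is expected to mean success)."""
    return subprocess.run(build_command(tool, file_path, level)).returncode


# Hypothetical paths; nothing is executed here, the commands are only printed.
print(build_command("kitchen.sh", "/etl/load_warehouse.kjb"))
print(build_command("pan.sh", "/etl/clean_customers.ktr", level="Detailed"))
```

Passing the arguments as a list instead of one shell string avoids quoting problems when a path contains spaces.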
Repositories
When you are faced with larger ETL projects with many ETL developers working together, it’s important to have facilities in place that enable cooperation. Kettle provides a way of defining repository types in a pluggable and flexible way.
• Database repository:
• Pentaho repository:
• File repository:
• Central storage:
• File locking:
• Revision management:
• Referential integrity checking:
• Security:
• Referencing:
Virtual File Systems
Flexible and uniform file handling is very important to any ETL tool. That is why Kettle supports the specification of files in the broadest sense as URLs. The Apache Commons VFS back end that was put in place will then take care of the complexity for you. For example, with Apache VFS, it is possible to process a selection of files inside a .zip archive in exactly the same way as you would process a list of files in a local folder. For more information on how to specify VFS files, visit the Apache VFS website at http://commons.apache.org/vfs/.
Table 2-5 shows a few typical examples.
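The idea of handling files in a .zip archive in exactly the same way as files in a local folder can be illustrated with a rough Python analogue. This is not Kettle or Apache VFS code: it relies on the fact that Python's pathlib.Path and zipfile.Path offer the same small set of methods, and the function name read_csv_files and the file names are hypothetical.

```python
import pathlib
import tempfile
import zipfile


def read_csv_files(root):
    """Return {file name: text} for every .csv file directly under root.

    root may be a pathlib.Path (a local folder) or a zipfile.Path (the
    top level of a .zip archive). Both provide iterdir(), is_file(),
    name, and read_text(), so the same code handles either one.
    """
    return {
        entry.name: entry.read_text(encoding="utf-8")
        for entry in root.iterdir()
        if entry.is_file() and entry.name.endswith(".csv")
    }


# Demonstration: the same content once as a folder and once as an archive.
with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp) / "incoming"
    folder.mkdir()
    (folder / "orders.csv").write_text("id\n1\n", encoding="utf-8")

    archive = pathlib.Path(tmp) / "incoming.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("orders.csv", "id\n1\n")

    with zipfile.ZipFile(archive) as zf:
        from_folder = read_csv_files(folder)
        from_archive = read_csv_files(zipfile.Path(zf))
        print(from_folder == from_archive)
```

The processing code never needs to know where the files live, which is the same benefit the book describes for specifying files as URLs.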