
Pentaho Data Integration (Kettle)

Welcome to the community home for Pentaho Data Integration Community Edition (PDI CE), also known as Kettle. Pentaho Data Integration delivers powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach. With an intuitive, graphical, drag-and-drop design environment and a proven, scalable, standards-based architecture, Pentaho Data Integration is increasingly the choice for organizations over traditional, proprietary ETL or data integration tools.

Community Edition is self-supported open source software. An Enterprise Edition (EE) of Pentaho Data Integration, which includes technical support, managed upgrades and enterprise features, is also available. For more information about EE, or for screenshots and datasheets, visit Pentaho Data Integration EE on Pentaho's corporate site.

Recent News and Releases

- 2012-04-20 - Stable build of Kettle 4.3 released: More info.
- 2011-09-12 - Stable build of Kettle 4.2 released: download now.
- 2011-07-01 - Release Candidate 1 of Kettle 4.2 released: download now.
- 2010-11-30 - Stable build of Kettle 4.1 released: download now.
- 2010-11-30 - Kettle Agile BI Plugin 1.0.2-stable: download now.

Stable
Pentaho Data Integration 4.2.0 stable
This is a stable build of Pentaho Data Integration (Kettle) 4.2.0. New features:
  • Excel Writer step has advanced output functionality to control the look and feel.
  • Graphical performance and progress feedback for transformations
  • Google Analytics step allows download of statistics from your Google Analytics account
  • Pentaho Reporting Output step makes it possible for you to run your (parameterized) Pentaho reports in a transformation. It allows for easy report bursting of personalized reports.
  • Automatic Documentation step generates (simple) documentation of your transformations and jobs.
  • Get repository names step retrieves job and transformation information from your repositories.
  • LDAP Writer step
  • Ingres VectorWise (streaming) bulk loader step
  • Greenplum (streaming) bulk loader step (for gpload)
  • Talend Job Execution job entry
  • Health Level 7 (HL7): HL7 Input step, HL7 MLLP Input and HL7 MLLP Acknowledge job entries
  • PGP File Encryption, Decryption & validation job entries.
  • Single Threader step for parallel performance tuning of large transformations
  • Allow a job to be started at a job entry of your choice (continue after fixing an error)
  • MongoDB Input step (including authentication)
  • ElasticSearch bulk loader
  • XML Input Stream (StAX) step to read huge XML files at optimal performance and flat memory usage by flattening the structure of the data.
  • Get ID from Slave Server step allows multi-host or clustered transformations to get globally unique integer IDs from a slave server: See wiki doc for more info
  • Memory tuning of the logging back-end with KETTLE_MAX_LOGGING_REGISTRY_SIZE, KETTLE_MAX_JOB_ENTRIES_LOGGED and KETTLE_MAX_JOB_TRACKER_SIZE, allowing for flat memory usage for never-ending ETL in general and jobs specifically.
  • Multiway Merge Join step (experimental) allows for any number of data sources to be joined using one or more keys using an inner or a full outer join algorithm.
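
The streaming, flattening approach described for the XML Input Stream (StAX) step can be illustrated outside Kettle. The sketch below is not the step's implementation, which is written in Java. It is a rough Python analogue built on the standard library's `xml.etree.ElementTree.iterparse`. The function name `flatten_xml` and the (path, value) row layout are assumptions made for illustration only.

```python
import io
import xml.etree.ElementTree as ET


def flatten_xml(stream):
    """Yield (path, value) rows while pulling events from an XML stream.

    Hypothetical helper: attributes are emitted when an element opens,
    element text is emitted when the element closes.
    """
    path = []  # tags of the elements currently open
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            # Attributes are already available on the "start" event.
            for name, value in elem.attrib.items():
                yield ("/".join(path) + "/@" + name, value)
        else:
            text = (elem.text or "").strip()
            if text:
                yield ("/".join(path), text)
            path.pop()
            # Drop the element's text and children once it has been emitted.
            # The parent still keeps a reference to the emptied element.
            elem.clear()


if __name__ == "__main__":
    sample = b"<orders><order id='1'><item>tea</item></order></orders>"
    for row in flatten_xml(io.BytesIO(sample)):
        print(row)
```

Because rows are produced as the parser advances, a caller can process a large file row by row instead of loading the whole tree first.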
Carte improvements:
  • reserve next value range from a slave sequence service
  • allow parallel (simultaneous) runs of clustered transformations
  • list (reserved and free) socket reservations service
  • new options in XML for configuring slave sequences
  • allow time-out of stale objects using environment variable KETTLE_CARTE_OBJECT_TIMEOUT_MINUTES
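
The idea of reserving a value range from a central sequence, so that several hosts can hand out globally unique integer IDs without contacting the server for every row, can be sketched as follows. This is a self-contained Python illustration, not Carte's API: `SequenceService`, `RangeIdGenerator`, `reserve_range` and `range_size` are hypothetical names, and the service here is an in-memory stand-in for a remote one.

```python
import threading


class SequenceService:
    """In-memory stand-in for a central sequence service (hypothetical)."""

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def reserve_range(self, size):
        """Reserve `size` consecutive values and return (first, last)."""
        with self._lock:
            first = self._next
            self._next += size
        return first, first + size - 1


class RangeIdGenerator:
    """Hands out IDs locally and asks the service only when a range runs out."""

    def __init__(self, service, range_size=1000):
        self._service = service
        self._range_size = range_size
        self._current = 0
        self._last = -1  # empty range, forces a reservation on first use

    def next_id(self):
        if self._current > self._last:
            self._current, self._last = self._service.reserve_range(self._range_size)
        value = self._current
        self._current += 1
        return value


if __name__ == "__main__":
    service = SequenceService(start=1)
    host_a = RangeIdGenerator(service, range_size=3)
    host_b = RangeIdGenerator(service, range_size=3)
    print([host_a.next_id() for _ in range(4)])
    print([host_b.next_id() for _ in range(2)])
```

A larger range size means fewer round trips to the service, at the cost of gaps in the sequence when a host stops before using up its reserved range.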
Repository Import/Export:
  • Export at the repository folder level
  • Export and Import with optional rule-based validations
  • Import command line utility allows for rule-based (optional) import of lists of transformations, jobs and repository export files: See wiki doc for more info
ETL Metadata Injection:
  • Retrieval of rows of data from a step to the “metadata injection” step
  • Support for injection into the “Excel Input” step
  • Support for injection into the “Row normaliser” step
  • Support for injection into the “Row Denormaliser” step
Many bug fixes. See Release Notes for 4.2.0 for more info
 
- Downloads
- Source
- Documentation
- Forum
In Development
Developer Resources
- Roadmap
- Sprint Homepage
- Open Issues
- Continuous Integration Builds
- Source
- Documentation
- Developer Forum
PDI 4.4.0 (platform Release 5.0 - Sugar) - In Progress
The primary goal of PDI version 4.3 is ease of management, with features for lifecycle management along with significant improvements to administration and monitoring capabilities.
- Task Board for 4.4.0 GA
- Prod Management for 4.4.0
- JIRA Cases for 4.4.0
- Source (trunk)

 

Upcoming Training
Mastering Pentaho Data Integration
Pentaho BI Suite Bootcamp
See all Courses
Quick Links

- Frequently Asked Questions
- Online Documentation
- Matt's blog
- Case Studies
- Java API Examples
- Screenshots
- Recorded Demos
- Partners
- Get Support

Contribute to the Project

You can participate by contributing new code, reporting bugs, testing new releases, answering questions and more. Email us your proposed contribution and any other relevant details. Welcome to the team.

- Write a tech tip
- Report a bug in JIRA
- Answer posts on the forums
- Write some code
- How to Contribute

