Transforming Data within a Hadoop Cluster


How to transform data within the Hadoop cluster using Pentaho MapReduce, Hive, and Pig.

  • Using Pentaho MapReduce to Parse Weblog Data — How to use Pentaho MapReduce to convert raw weblog data into parsed, delimited records.
  • Using Pentaho MapReduce to Generate an Aggregate Dataset — How to use Pentaho MapReduce to transform and summarize detailed data into an aggregate dataset.
  • Transforming Data within Hive — How to read data from a Hive table, transform it, and write it to a Hive table within the workflow of a PDI job.
  • Transforming Data with Pig — How to invoke a Pig script from a PDI job.
  • Using Pentaho MapReduce to Parse Mainframe Data — How to use Pentaho to ingest a mainframe file into HDFS, then use MapReduce to process it into delimited records.
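To give a sense of the parse-then-aggregate pattern the first two guides cover, here is a minimal, illustrative Python sketch. It is not Pentaho's implementation; the log pattern, field names, and sample lines are assumptions chosen to mirror a typical weblog (Apache Common Log Format) workflow: the "map" step parses raw lines into delimited records, and the "reduce" step summarizes them into an aggregate dataset.

```python
import re
from collections import defaultdict

# Illustrative pattern for Apache Common Log Format lines (an assumption,
# not the format Pentaho's guides necessarily use).
LOG_PATTERN = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)'
)

def parse_weblog_line(line):
    """Map step: turn one raw log line into a tab-delimited record
    (ip, timestamp, method, path, status, bytes), or None if unparseable."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return "\t".join(m.groups())

def aggregate_hits_by_ip(records):
    """Reduce step: summarize parsed records into an aggregate dataset
    mapping each client IP to its request count."""
    counts = defaultdict(int)
    for rec in records:
        ip = rec.split("\t")[0]
        counts[ip] += 1
    return dict(counts)

# Hypothetical sample input standing in for raw weblog data in HDFS.
raw = [
    '127.0.0.1 - - [10/Oct/2011:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.5 - - [10/Oct/2011:13:55:40 -0700] "POST /form HTTP/1.1" 404 512',
    '127.0.0.1 - - [10/Oct/2011:13:56:01 -0700] "GET /about.html HTTP/1.1" 200 1024',
]
parsed = [r for r in (parse_weblog_line(l) for l in raw) if r is not None]
summary = aggregate_hits_by_ip(parsed)
```

In Pentaho MapReduce, the equivalent mapper and reducer logic would be built as PDI transformations and submitted to the cluster, but the shape of the computation is the same.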

This documentation is maintained by the Pentaho community, and members are encouraged to create new pages in the appropriate spaces, or edit existing pages that need to be corrected or updated.

Please do not leave comments on Wiki pages asking for help. They will be deleted. Use the forums instead.
