Extracting Data from the Hadoop Cluster

The guides below show how to extract data from a Hadoop cluster using HDFS, Hive, and HBase.

  • Extracting Data from HDFS to Load an RDBMS: How to use a PDI transformation to extract data from HDFS and load it into an RDBMS table.
  • Extracting Data from Hive to Load an RDBMS: How to use a PDI transformation to extract data from Hive and load it into an RDBMS table.
  • Extracting Data from HBase to Load an RDBMS: How to use a PDI transformation to extract data from HBase and load it into an RDBMS table.
  • Extracting Data from Snappy Compressed Files: How to configure client-side PDI so that files compressed with the Snappy codec can be decompressed by the Hadoop file input or Text file input step.
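
For readers who want to see the extract-and-load pattern in miniature, here is a rough sketch in Python rather than PDI. It is not how the linked guides do it: it assumes the data has already been copied out of HDFS as tab-delimited text, it uses SQLite as a stand-in for the target RDBMS, and the `page_views` table and `load_rows` function are hypothetical names chosen only for illustration.

```python
import csv
import io
import sqlite3


def load_rows(conn, delimited_text, delimiter="\t"):
    """Parse delimited text and insert it into a (hypothetical) page_views table.

    Returns the number of rows inserted.
    """
    reader = csv.reader(io.StringIO(delimited_text), delimiter=delimiter)
    # Each input line is assumed to hold two fields: a page name and a view count.
    rows = [(fields[0], int(fields[1])) for fields in reader if fields]

    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views (page TEXT, views INTEGER)"
    )
    conn.executemany(
        "INSERT INTO page_views (page, views) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Stand-in for file contents previously extracted from the cluster.
    sample = "home\t120\nabout\t15\n"
    connection = sqlite3.connect(":memory:")
    inserted = load_rows(connection, sample)
    print("rows inserted:", inserted)
```

In a PDI transformation the same two stages would be an input step that reads the data and an output step that writes the table; the sketch only illustrates the shape of the flow.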

This documentation is maintained by the Pentaho community, and members are encouraged to create new pages in the appropriate spaces, or edit existing pages that need to be corrected or updated.

Please do not leave comments on Wiki pages asking for help. They will be deleted. Use the forums instead.
