
Big Data Preparation


Ingest, Manipulate, Integrate, Access, Model and Orchestrate

From ingesting and manipulating data to modeling, Pentaho decreases the time and complexity involved in preparing data for analytics. Pentaho weaves big data technologies like Hadoop and NoSQL with relational data warehouses, data marts, and enterprise applications to deliver integrated, analysis-ready data.


Simple visual tools to improve developer productivity


Pentaho includes a visual extract-transform-load (ETL) tool to load and process big data sources in the same familiar way as traditional relational and file-based data sources. Instead of writing Java programs or Pig scripts, Pentaho empowers less technical developers to design and develop big data jobs using visual tools - resulting in greater team productivity and efficiency.

Pentaho works with any semi-structured or unstructured data type. For example, it can parse web and application log files to extract data that yields powerful insights into customer behavior.
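To make the log-parsing idea concrete, here is a minimal sketch, in plain Python, of the kind of field extraction a visual ETL step would perform on a web access log. The log format and field names are illustrative assumptions, not part of Pentaho's product:

```python
import re

# Hypothetical example: extract fields from an Apache-style access log line,
# the kind of semi-structured parsing an ETL transformation step performs.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    row = m.groupdict()
    row["status"] = int(row["status"])                      # numeric status code
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row

line = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
row = parse_log_line(line)
```

Each parsed row could then be loaded into a downstream store for analysis, which is the step a visual ETL job would chain on next.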

In addition, Pentaho's visual interface can call custom code, for example to analyze image and video files and extract meaningful metadata for identifying people and places.

Pentaho also provides visual data modeling capabilities, making it quick and easy to deliver an end-user friendly view of the data source.


Visual job orchestration

Pentaho provides a rich graphical design tool for orchestrating the execution of jobs in Hadoop, NoSQL and high performance analytic databases, as well as traditional data stores.

Orchestration capabilities include conditional checking steps, event waiting steps, execution steps, and notification steps. Combined, these steps enable easy visual assembly of powerful job-flow logic across multiple jobs and data sources.
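The four step types above can be sketched as plain functions chained into a flow. This is a hypothetical illustration of the pattern, not Pentaho's actual API; the file path, job name, and helper functions are invented for the example:

```python
import time

def file_exists(path, staged):
    """Conditional checking step (stub): is the input file staged yet?"""
    return path in staged

def wait_for(condition, timeout=1.0, poll=0.05):
    """Event waiting step: poll a condition until it holds or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll)
    return False

def run_job(name, log):
    """Execution step (stub): run a named job and report success."""
    log.append(f"ran {name}")
    return True

def notify(message, log):
    """Notification step (stub): record or send a status message."""
    log.append(f"notify: {message}")

# Assemble the flow: wait for the input file, run the job, then notify.
staged = {"/data/incoming/sales.csv"}        # pretend the file has arrived
log = []
if wait_for(lambda: file_exists("/data/incoming/sales.csv", staged)):
    ok = run_job("load_sales", log)
    notify("load_sales succeeded" if ok else "load_sales failed", log)
```

In a visual tool the same flow would be drawn as connected steps rather than written as code; the control-flow structure is the point of the sketch.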


Pentaho also integrates with Hadoop-native utilities such as Oozie, an open source workflow and coordination service that manages data processing jobs for Apache Hadoop. This integration is key for companies that have already defined Oozie jobs but would like to migrate to a visual, no-programming environment like Pentaho.

Processing data volumes and varieties with speed

Pentaho provides powerful and innovative capabilities for processing massive data volumes within constrained time windows, including:

  • High performance data flow engine – With a multi-threaded parallel processing architecture and in-memory data caching, Pentaho Data Integration (PDI) provides a world-class, enterprise-scalable data integration platform ideal for handling the largest big data challenges.
  • Cluster support – PDI may be deployed in a cluster, enabling distributed processing of jobs across multiple nodes.
  • Run as Hadoop MapReduce – Pentaho's small-footprint, Java-based data integration engine is unique in its ability to execute as a Hadoop MapReduce job, running on every node in a Hadoop cluster and scaling to thousands of nodes. Pentaho's support for Hadoop's distributed cache makes deployment across the cluster automatic and seamless.
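The multi-threaded, partitioned data-flow idea behind the first bullet can be sketched in a few lines of Python: split the rows into chunks and transform each chunk in a parallel worker. This is a loose analogy under assumed names, not PDI's actual engine:

```python
from concurrent.futures import ThreadPoolExecutor

def transform(chunk):
    # A trivial per-row transformation standing in for a real ETL step.
    return [row.upper() for row in chunk]

def parallel_process(rows, workers=4):
    """Partition rows into chunks and transform them in parallel threads."""
    size = max(1, len(rows) // workers)
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(transform, chunks)   # map preserves chunk order
    return [row for chunk in results for row in chunk]

out = parallel_process(["ad", "click", "view", "buy"], workers=2)
```

A real engine would stream rows between steps and cache intermediate data in memory rather than materializing lists, but the partition-and-process-in-parallel shape is the same.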


Instant and interactive analytics

Pentaho provides immediate access to data inside Hadoop, NoSQL, and other big data stores, along with interactive analysis, rich visualization, and data discovery.

Learn more about big data analytics


Next Steps

Hadoop
NoSQL
Analytic Databases
Free 30-Day Big Data Trial


Big Data Demos and Studies

Demo: See Instaview in Action

Analyst Research: The Forrester Wave™: Enterprise Hadoop Solutions, Q1 2012

Webinar: Delivering Big Data Analytics with MongoDB and Pentaho


Pentaho tightly couples data integration with complete business analytics for big data, supporting Hadoop, NoSQL, and Analytic databases. Pentaho is the only vendor that provides a full big data analytics solution that supports the entire big data analytics process from ETL and data integration to real-time analysis and big data visualization.
