LucidWorks Blog

LucidWorks Big Data & Oozie Workflow With VizOozie

Posted by Ivan Provalov on Thu, Feb 07, 2013 @ 05:29 PM

In this post we discuss how to create a visual workflow graph for Oozie, a workflow management system for Hadoop jobs. Oozie workflow jobs are directed acyclic graphs (DAGs) of actions: oozie.apache.org

At LucidWorks we use Oozie in our LucidWorks Big Data product: the workflows we provide with the platform are configured and run with Oozie. Developers create workflow.xml files, the workflow definition files for Oozie, and deploy them to Hadoop. A good explanation of how this works is provided here: www.infoq.com/articles/oozieexample

Some workflows get complicated quickly and may include subworkflows, forks, joins, and other actions that are hard to follow in XML. A visualization tool helps streamline workflow design and makes it easier to grasp what a workflow does.

VizOozie is an open source tool that converts your static XML workflow definitions into dot files, which the Graphviz dot program can render as PDF or other formats: www.graphviz.org/

You will need a Unix-like environment, Python, and Graphviz dot installed to run this.

Check it out from GitHub and run:

python vizoozie/vizoozie.py example/workflow.xml example/workflow.dot

or use your own Oozie workflow XML file.

This generates a dot file, which can be converted to PDF with dot:

dot -Tpdf example/workflow.dot -o example/workflow.pdf
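
The two commands above can also be chained from a small Python script. The following is only a minimal sketch, assuming the VizOozie checkout is the current directory and that python and dot are on the PATH. The helper names build_commands and render_workflow are hypothetical and are not part of VizOozie.

```python
import os
import subprocess


def build_commands(xml_path, script="vizoozie/vizoozie.py"):
    """Return the two argv lists: XML -> dot, then dot -> PDF.

    The script path and the "python" interpreter name mirror the commands
    shown in the article; adjust them for your own checkout.
    """
    base, _ext = os.path.splitext(xml_path)
    dot_path = base + ".dot"
    pdf_path = base + ".pdf"
    convert = ["python", script, xml_path, dot_path]
    render = ["dot", "-Tpdf", dot_path, "-o", pdf_path]
    return convert, render


def render_workflow(xml_path):
    """Run both steps in order; check=True raises if either command fails."""
    for command in build_commands(xml_path):
        subprocess.run(command, check=True)
```

Calling render_workflow("example/workflow.xml") should then leave example/workflow.dot and example/workflow.pdf next to the input file, provided both tools are installed.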

Standard workflow shapes are used for the start, end, process, join, fork and decision nodes. The fill colors of action nodes are configurable in the vizoozie.properties file (e.g. the java action is blue).
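
To illustrate how a shape and a fill color end up in a dot node declaration, here is a rough sketch. Everything in the two tables is an assumption made for illustration: only the octagon shape for fork nodes appears in the VizOozie code quoted in this post, and the format of vizoozie.properties is not reproduced here. The function name dot_node is hypothetical as well.

```python
# Hypothetical shape and color tables; VizOozie's real defaults live in its
# source and in vizoozie.properties and may differ.
NODE_SHAPES = {
    "start": "circle",
    "end": "doublecircle",
    "action": "box",
    "fork": "octagon",  # the only shape confirmed by the article's fork example
    "join": "octagon",
    "decision": "diamond",
}
ACTION_COLORS = {"java": "lightblue"}  # assumed stand-in for "blue"


def dot_node(name, kind, action_type=None):
    """Build one dot node declaration with a shape and optional fill color."""
    attrs = ["shape=" + NODE_SHAPES[kind]]
    color = ACTION_COLORS.get(action_type)
    if color:
        # Graphviz only paints fillcolor when style=filled is also set.
        attrs += ["style=filled", "fillcolor=" + color]
    return name.replace("-", "_") + " [" + ", ".join(attrs) + "];"


print(dot_node("complex-math", "action", "java"))
print(dot_node("post-process", "fork"))
```

The shape, style and fillcolor attributes are standard Graphviz node attributes, so the printed lines can be pasted into any digraph block.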

The code is pretty simple: for each node type, it converts the XML to a dot string using xml.dom.minidom and writes it out. For example, given an XML snippet:

  <fork name="post-process">
    <path start="complex-math" />
    <path start="more-complex" />
    <path start="geek-candy-process" />
  </fork>

the code for a fork node looks like this:

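    def processFork(self, doc):
        # Build the dot statements for every <fork> element in the document.
        output = ''
        for node in doc.getElementsByTagName("fork"):
            name = self.getName(node)
            # Declare the fork node; dot IDs cannot contain bare hyphens.
            output += '\n' + name.replace('-', '_') + " [shape=octagon];\n"
            # Add one edge from the fork to each of its start paths.
            for path in node.getElementsByTagName("path"):
                start = path.getAttribute("start")
                output += '\n' + name.replace('-', '_') + " -> " + start.replace('-', '_') + ";\n"
        return output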

This method normalizes node names with name.replace('-', '_') and inserts the node shape (shape=octagon). It then looks for the fork's start paths, such as <path start="complex-math" />. For the example above, it produces output like this (blank lines omitted):

post_process [shape=octagon];
post_process -> complex_math;
post_process -> more_complex;
post_process -> geek_candy_process;
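
For readers who want to try the fork conversion without checking out the project, the following standalone sketch reproduces the same statements. It is not the VizOozie code: the function names are hypothetical, the fork name is read with getAttribute("name") as an assumed stand-in for self.getName, and the workflow-app wrapper and the digraph block are added here so that the XML parses and the result is a complete dot file.

```python
from xml.dom import minidom

# The article's fork snippet, wrapped in a root element so it parses alone.
WORKFLOW_XML = """
<workflow-app name="demo">
  <fork name="post-process">
    <path start="complex-math" />
    <path start="more-complex" />
    <path start="geek-candy-process" />
  </fork>
</workflow-app>
"""


def normalize(name):
    """dot identifiers cannot contain unquoted hyphens, so use underscores."""
    return name.replace("-", "_")


def fork_lines(doc):
    """Return dot statements for every fork node and its outgoing paths."""
    lines = []
    for fork in doc.getElementsByTagName("fork"):
        name = normalize(fork.getAttribute("name"))
        lines.append(name + " [shape=octagon];")
        for path in fork.getElementsByTagName("path"):
            start = normalize(path.getAttribute("start"))
            lines.append(name + " -> " + start + ";")
    return lines


def to_digraph(xml_text):
    """Wrap the fork statements in a digraph block that dot can render."""
    doc = minidom.parseString(xml_text.strip())
    body = "\n".join("  " + line for line in fork_lines(doc))
    return "digraph workflow {\n" + body + "\n}\n"


print(to_digraph(WORKFLOW_XML))
```

Saving the printed text to a file and running dot -Tpdf on it should draw the octagon with its three outgoing edges.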

When passed to the dot program, the post_process statements above render as a fork node with three child nodes. I hope you find this explanation useful.

LucidWorks transforms the way people access information to enable data-driven decisions. By combining Search with Big Data, the LucidWorks product suite provides real-time access to multi-structured data in motion.

Tags: LucidWorks Big Data, LucidWorks Search, LucidWorks Enterprise
