


LucidWorks Blog


LucidWorks Big Data & Oozie Workflow With VizOozie

Posted by Ivan Provalov on Thu, Feb 07, 2013 @ 05:29 PM
  
  

In this post we will discuss how to create a visualized workflow graph for Oozie. Oozie is a workflow management system for Hadoop jobs. Oozie Workflow jobs are directed acyclic graphs (DAGs) of actions: oozie.apache.org

At LucidWorks we use Oozie in our LucidWorks Big Data product. The workflows which we provide with the platform are configured and run with Oozie. Developers create workflow.xml files, which are the workflow definition files for Oozie, and deploy them to Hadoop. A good explanation of how this works is provided here: www.infoq.com/articles/oozieexample

Some workflows get complicated pretty quickly and may include subworkflows, forks and joins, and other actions which are hard to follow in XML. A visualization tool helps streamline workflow designs and makes it easier to quickly grasp the gist of what a workflow does.

VizOozie is an open source tool which converts your static XML workflow definitions into dot files, which the graphviz dot program can then render as PDF or other formats: www.graphviz.org/

You will need a Unix-like environment, Python, and graphviz dot installed to run this.

Check it out from GitHub and run:

python vizoozie/vizoozie.py example/workflow.xml example/workflow.dot

or use your own Oozie workflow XML file.

This will generate a dot file which can be easily converted to PDF with dot:

dot -Tpdf example/workflow.dot -o example/workflow.pdf
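If you render workflows often, the two commands above can be chained from a small script. The following is only a sketch: the helper names build_commands and render are made up for this illustration, and it assumes that you run it from the directory containing the vizoozie checkout, that python resolves to an interpreter VizOozie supports, and that dot is on your PATH.

```python
import shutil
import subprocess


def build_commands(workflow_xml, dot_file, pdf_file):
    """Return the two command lines from the post as argument lists."""
    return [
        # Step 1: VizOozie converts the workflow XML into a dot file.
        ["python", "vizoozie/vizoozie.py", workflow_xml, dot_file],
        # Step 2: graphviz dot renders the dot file as a PDF.
        ["dot", "-Tpdf", dot_file, "-o", pdf_file],
    ]


def render(workflow_xml, dot_file, pdf_file):
    """Run both steps and stop at the first failing command."""
    if shutil.which("dot") is None:
        raise RuntimeError("graphviz 'dot' was not found on PATH")
    for command in build_commands(workflow_xml, dot_file, pdf_file):
        subprocess.run(command, check=True)


# Example call (not executed here):
# render("example/workflow.xml", "example/workflow.dot", "example/workflow.pdf")
```

Passing check=True makes subprocess.run raise an exception if either command exits with an error, so a failed conversion does not silently leave a stale PDF behind.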


Standard workflow shapes are used for the start, end, process, join, fork and decision nodes. The action node backfill colors are configurable in the vizoozie.properties file (e.g. the java action is in blue).

The code is pretty simple: it takes each node type, converts the XML to a dot string using xml.dom.minidom, and writes it out. For example, given an XML snippet:

  <fork name="post-process">
    <path start="complex-math" />
    <path start="more-complex" />
    <path start="geek-candy-process" />
  </fork>

the code for a fork node looks like this:

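    def processFork(self, doc):
        # Build the dot statements for every <fork> element in the document.
        output = ''
        for node in doc.getElementsByTagName("fork"):
            name = self.getName(node)
            # Declare the fork node itself; hyphens become underscores.
            output += '\n' + name.replace('-', '_') + " [shape=octagon];\n"
            for path in node.getElementsByTagName("path"):
                # Add one edge from the fork to each path's start node.
                start = path.getAttribute("start")
                output += '\n' + name.replace('-', '_') + " -> " + start.replace('-', '_') + ";\n"
        return output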

In this method, there is just some node name normalization with name.replace('-', '_'), as well as insertion of the fork-specific node shape (shape=octagon). Then, it just looks for the fork's start paths like this one: <path start="complex-math" />. From our example above, this method will produce output like this (blank lines omitted):

post_process [shape=octagon];
post_process -> complex_math;
post_process -> more_complex;
post_process -> geek_candy_process;
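To try the conversion without checking out VizOozie, here is a minimal standalone sketch of the same idea. It is not the VizOozie code: the function names dot_id, fork_to_dot and to_digraph are invented for this illustration, the workflow-app wrapper element and the graph name "workflow" are assumptions, and it reads the fork name straight from the name attribute, which is presumably what self.getName does.

```python
from xml.dom import minidom

# The fork from the example above, wrapped in a root element so it parses.
WORKFLOW_XML = """<workflow-app name="demo">
  <fork name="post-process">
    <path start="complex-math" />
    <path start="more-complex" />
    <path start="geek-candy-process" />
  </fork>
</workflow-app>"""


def dot_id(name):
    """Replace hyphens, which are not allowed in unquoted dot identifiers."""
    return name.replace("-", "_")


def fork_to_dot(doc):
    """Return one dot statement per fork node and one per outgoing path."""
    statements = []
    for fork in doc.getElementsByTagName("fork"):
        name = dot_id(fork.getAttribute("name"))
        statements.append(name + " [shape=octagon];")
        for path in fork.getElementsByTagName("path"):
            start = dot_id(path.getAttribute("start"))
            statements.append(name + " -> " + start + ";")
    return statements


def to_digraph(statements, graph_name="workflow"):
    """Wrap the statements in a digraph block so that dot can render them."""
    body = "\n".join("  " + s for s in statements)
    return "digraph " + graph_name + " {\n" + body + "\n}\n"


statements = fork_to_dot(minidom.parseString(WORKFLOW_XML))
print(to_digraph(statements))
```

The statements list holds the same four lines shown above. The to_digraph helper is there because dot expects statements to sit inside a digraph { ... } block before it can draw them.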

When this output is passed to the dot program, it generates a fork node with three child nodes. I hope you find this explanation useful.

IP

LucidWorks transforms the way people access information to enable data-driven decisions.  By combining Search with Big Data, the LucidWorks product suite provides real-time access to multi-structured data in motion.

Tags: LucidWorks Big Data, LucidWorks Search, LucidWorks Enterprise
