« Hard core multicore with TestNG
New blog »

More on multithreaded topological sorting

spacer

I received a few interesting comments on my previous entry regarding the new multithreaded topological sort I implemented in TestNG and there is one in particular from Rafael Naufal that I wanted to address:

The @Priority annotation couldn’t be adapted to say which free methods get scheduled first? BTW, the responsibility of knowing which nodes are free couldn’t be moved to the graph of test methods?

This is pushing my current algorithm even further in the sense that we not only want to schedule free nodes as they become available, we also want to schedule them in order of importance.

What makes a node more important than another? Its level of dependencies. The more a node is depended upon, the more beneficial it is to schedule it as soon as possible since this will end up freeing more nodes. Admittedly, you are still bounded by the size of your thread pool, but this is exactly what we want: increasing the pool size should lead to more parallelism and therefore better performance, but the current scheduling algorithm being fair (or random) means that we are not guaranteed to see this performance increase.

My first reaction was to modify my Executor but as it turns out, you can actually do this with the existing implementation. The constructor of all the Executors takes a
BlockingQueue in parameter, which is the queue that the Executor will use to process the workers. Unsurprisingly, there already is a priority queue called PriorityBlockingQueue.

All you need to do is to use that queue instead of the default one when you create your Executor and then make sure that the workers you pass it have a natural ordering. In this case, the weight of a worker is how many other workers depend on it, which is very easy to calculate.

On a related topic, I wanted to get a closer look at how the algorithm I described in my previous blog post actually works. I described the theory and I have tests that show that it seems to work as expected, but it occurred to me that I could actually “view it from the inside” with little effort.

First, I added a method called toDot which generates a Graphviz file representing the current graph. This turned out to be trivial:

/**
* @return a .dot file (GraphViz) version of this graph.
*/
public String toDot() {
  String FREE = "[style=filled color=yellow]";
  String RUNNING = "[style=filled color=green]";
  String FINISHED = "[style=filled color=grey]";
  StringBuilder result = new StringBuilder("digraph g {\n");
  Set<T> freeNodes = getFreeNodes();
  String color;
  for (T n : m_nodesReady) {
    color = freeNodes.contains(n) ? FREE : "";
    result.append("  " + getName(n) + color + "\n");
  }
  for (T n : m_nodesRunning) {
    color = freeNodes.contains(n) ? FREE : RUNNING;
    result.append("  " + getName(n) + color + "\n");
  }
  for (T n : m_nodesFinished) {
    result.append("  " + getName(n) + FINISHED+ "\n");
  }
  result.append("\n");
  for (T k : m_dependingOn.getKeys()) {
    List<T> nodes = m_dependingOn.get(k);
    for (T n : nodes) {
      String dotted = m_nodesFinished.contains(k) ? "style=dotted" : "";
      result.append("  " + getName(k) + " -> " + getName(n) + " [dir=back " + dotted + "]\n");
    }
  }
  result.append("}\n");
  return result.toString();
}

Then I modified the executor to dump the graph every time a worker terminates, and finally, I wrote a shell script to convert these dot files into images and to create an HTML file. I ran a simple test case, processed the files with the shell script and here is the final result.

A yellow node is “free”, green means that the node is “ready” (to be run in the thread pool), grey is “finished” and white nodes haven’t been processed yet. Dotted arrows represent dependencies that have been satisfied.

As you can see, the execution matches very closely what you would expect based on my description of the algorithm and I confirmed that changing the size of the thread pool creates different executions.

spacer

Tags: testng

This entry was posted on December 13, 2009, 12:45 pm and is filed under Java. You can follow any responses to this entry through RSS 2.0. Both comments and pings are currently closed.

7 Comments

  • #1 by Nish on December 13, 2009 - 1:21 pm

    spacer

    Your example always schedules every free test, which would happen with the random scheduler as well. Do you have an example where scheduling random free nodes differs from scheduling free nodes based on their dependencies?

  • #2 by Cedric on December 13, 2009 - 1:34 pm

    spacer

    Nish,
    Take a look at this new graph:
    beust.com/topo/Topo8.png
    When a1 completes, it frees both b1 and c1, but with this scheme, c1 would have a weight of 0 (no dependees) while b1 has a weight of 2 (2 nodes depend on it), so b1 would get scheduled first, and when it completes, it will free 2 more nodes (b2 and b3).

  • #3 by Brian Slesinsky on January 17, 2010 - 2:08 pm

    spacer

    I can’t tell from these examples, but hopefully you’re counting the number of descendants of each node rather than the number of children?

  • #4 by Olivier on January 18, 2010 - 8:24 am

    spacer

    Hi Cedric,
    I’ve used a similar technique to structure a large batch process recently. In my case, a task also has the ability to dynamically spawn various child tasks, that get reinjected in the dependency graph (probably not relevant for TestNG though).
    Another interesting issue is to evaluate the maximum parallelism that can be achieved on the graph (under ideal circumstances), which gives an upper bound on the thread pool’s size. This amounts to calculating the independence number of the graph, which can be solved in linear time for a directed acyclic graph.

  • #5 by Daniel on January 19, 2010 - 9:08 pm

    spacer

    I ran into a similar problem trying to parallelize the eager initialization of Spring singleton beans in order to reduce application startup time [1]. I really like your implementation, especially the dot file generation as a visualization aid… so I stole the idea! spacer Thanks for a great series of posts.

    [1] github.com/gredler/spriths

  • #6 by Jack3d on June 5, 2011 - 6:00 pm

    spacer

    Thanks for the tip. I’ve been having the same issues and I thought I would never find any solution online! I’ve used your technique and it worked out great! Kudos!

  • #7 by Annie Ignacio on September 15, 2011 - 12:30 am

    spacer

    Great site! I have an eager solutions to those problems I’ve encountered and ruin my works. I modified the executor to dump the graph . Finally, I can wrote a good script to convert these dot files into images and to create an HTML file. I can a simple processed the files with the script, an interesting issues will be evaluated. Thanks for the idea…

Comments are closed.

gipoco.com is neither affiliated with the authors of this page nor responsible for its contents. This is a safe-cache copy of the original web site.