Saturday, August 18, 2012

Replacing standard JDK serialization using Jackson (JSON/Smile), java.io.Externalizable

1. Background

The default Java serialization provided by the JDK is a double-edged sword: on one hand, it is a simple, convenient way to "freeze and thaw" the Objects you have, handling just about any kind of Java object graph. It is possibly the most powerful serialization mechanism on the Java platform, bar none.

But on the other hand, its shortcomings are well-documented (and, I hope, well-known) at this point. Problems include:

  • Poor space-efficiency (especially for small data), due to the inclusion of all class metadata: that is, the size of the output can be huge, larger than just about any alternative, including XML
  • Poor performance (especially for small data), partly due to size inefficiency
  • Brittleness: the smallest changes to class definitions may break compatibility, preventing deserialization. This makes it a poor choice both for data exchange between (Java) systems and for long-term storage

Still, the convenience factor has led many systems to use JDK serialization as their default serialization method.

Is there anything we could do to address the downsides listed above? Plenty, actually. Although there is no way to do much more with the default implementation (the JDK serialization implementation is in fact ridiculously well optimized for what it tries to achieve -- it's just that the goal is very ambitious), one can customize what gets used by making objects implement the java.io.Externalizable interface. If so, the JDK will happily use the alternate implementation under the hood.

Now: although writing custom serializers may be fun sometimes -- and for a specific case you can actually write a very efficient solution, given enough time -- it would be nice if you could use an existing component to address the listed shortcomings.

And that's what we'll do! Here's one possible way to improve on all the problems listed above:

  1. Use an efficient Jackson serializer (to produce either JSON, or perhaps more interestingly, Smile binary data)
  2. Wrap it in a nice java.io.Externalizable implementation, to make it transparent to code using JDK serialization (albeit not transparent to maintainers of the class -- but we will try to minimize the amount of intrusive code)

2. Challenges with java.io.Externalizable

First things first: while conceptually simple, there are a couple of rather odd design decisions that make use of java.io.Externalizable a bit tricky:

  1. Instead of passing instances of java.io.InputStream and java.io.OutputStream, java.io.ObjectInput and java.io.ObjectOutput are used; and they do NOT extend the stream versions (even though they define mostly the same methods!). This means additional wrapping is needed
  2. Externalizable.readExternal() requires updating the object itself, rather than constructing a new instance: most serialization frameworks do not support such an operation
  3. How to access the external serialization library, as no context is passed to either of the methods?

These are not fundamental problems for Jackson: the first requires the use of adapter classes (see below); the second means we need to use the "updating reader" approach that Jackson has supported for a while (yay!). And to solve the third, we have at least two choices: use a ThreadLocal for passing an ObjectMapper, or use a static helper class (the approach shown below).
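To make the second point concrete, here is a minimal sketch of Jackson's "updating reader" in isolation (the Bean class here is just a hypothetical example, not part of the solution below):

import com.fasterxml.jackson.databind.ObjectMapper;

public class UpdatingReaderExample
{
  static class Bean {
    public int id;
    public String name;
  }

  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    Bean bean = new Bean();
    bean.name = "original";
    // instead of constructing a new instance, update the existing one in place:
    mapper.readerForUpdating(bean).readValue("{\"id\":13}");
    // bean.id is now 13; bean.name is still "original"
  }
}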

So here are the helper classes we need:

final static class ExternalizableInput extends InputStream
{
  private final ObjectInput in;

  public ExternalizableInput(ObjectInput in) {
   this.in = in;
  }

  @Override
  public int available() throws IOException {
    return in.available();
  }

  @Override
  public void close() throws IOException {
    in.close();
  }

  @Override
  public boolean  markSupported() {
    return false;
  }

  @Override
  public int read() throws IOException {
   return in.read();
  }

  @Override
  public int read(byte[] buffer) throws IOException {
    return in.read(buffer);
  }

  @Override
  public int read(byte[] buffer, int offset, int len) throws IOException {
    return in.read(buffer, offset, len);
  }

  @Override
  public long skip(long n) throws IOException {
   return in.skip(n);
  }
}

final static class ExternalizableOutput extends OutputStream
{
  private final ObjectOutput out;

  public ExternalizableOutput(ObjectOutput out) {
    this.out = out;
  }

  @Override
  public void flush() throws IOException {
    out.flush();
  }

  @Override
  public void close() throws IOException {
    out.close();
  }

  @Override
  public void write(int ch) throws IOException {
    out.write(ch);
  }

  @Override
  public void write(byte[] data) throws IOException {
    out.write(data);
  }

  @Override
  public void write(byte[] data, int offset, int len) throws IOException {
    out.write(data, offset, len);
  }
}

/* Use of helper class here is unfortunate, but necessary; alternative would
* be to use ThreadLocal, and set instance before calling serialization.
* Benefit of that approach would be dynamic configuration; however, this
* approach is easier to demonstrate.
*/
class MapperHolder {
  private final ObjectMapper mapper = new ObjectMapper();

  private final static MapperHolder instance = new MapperHolder();

  public static ObjectMapper mapper() { return instance.mapper; }
}

and given these classes, we can implement the Jackson-for-default-serialization solution.

3. Let's Do a Serialization!

So with that, here's a class that is serializable using Jackson JSON serializer:


  static class MyPojo implements Externalizable
  {
        public int id;
        public String name;
        public int[] values;

        public MyPojo() { } // for deserialization
        public MyPojo(int id, String name, int[] values)
        {
            this.id = id;
            this.name = name;
            this.values = values;
        }

        public void readExternal(ObjectInput in) throws IOException {
            MapperHolder.mapper().readerForUpdating(this).readValue(new ExternalizableInput(in));
        }

        public void writeExternal(ObjectOutput oo) throws IOException {
            MapperHolder.mapper().writeValue(new ExternalizableOutput(oo), this);
        }
  }

To use that class, use JDK serialization normally:


  // serialize as bytes (to demonstrate):
MyPojo input = new MyPojo(13, "Foobar", new int[] { 1, 2, 3 } );
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
ObjectOutputStream obs = new ObjectOutputStream(bytes);
obs.writeObject(input);
obs.close();
byte[] ser = bytes.toByteArray();

// and to get it back:
ObjectInputStream ins = new ObjectInputStream(new ByteArrayInputStream(ser));
MyPojo output = (MyPojo) ins.readObject();
ins.close();

And that's it.

4. So what's the benefit?

At this point, you may be wondering if and how this would actually help you. Since JDK serialization uses a binary format, and since (allegedly!) textual formats are generally more verbose than binary formats, how could this possibly help with size or performance?

Turns out that if you test the code above and compare it with the case where the class does NOT implement Externalizable, the sizes are:

  • Default JDK serialization: 186 bytes
  • Serialization as embedded JSON: 130 bytes

Whoa! Quite an unexpected result? The JSON-based alternative is 30% SMALLER than JDK serialization!

Actually, not really. The problem with JDK serialization is not the way data is stored, but rather the fact that in addition to (compact) data, much of Class definition metadata is included. This metadata is needed to guard against Class incompatibilities (which it can do pretty well), but it comes with a cost. And that cost is particularly high for small data.

Similarly, performance typically follows data size: while I don't have publishable results (I may do that for a future post), I expect embedded-JSON to also perform significantly better for single-object serialization use cases.

5. Further ideas: Smile!

But perhaps you think we should be able to do better, size-wise (and perhaps performance-wise), than by using JSON?

Absolutely. Since the results are not exactly human-readable anyway (to use Externalizable, a bit of binary data is used to indicate the class name, along with a little bit of stream metadata), we probably do not greatly care what the actual underlying format is.
With this in mind, an obvious choice would be to use the Smile data format, the binary counterpart to JSON, a format that Jackson supports 100% with the Smile Module.

The only change needed is to replace the first line of "MapperHolder" to read:

private final ObjectMapper mapper = new ObjectMapper(new SmileFactory());

and we will see further reduced size, as well as faster reading and writing -- Smile is typically 30-40% smaller in size, and 30-50% faster to process than JSON.
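For reference, here is a sketch of the complete Smile-based helper (assuming the Jackson Smile module, which provides SmileFactory, is on the classpath):

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;

// same helper as before, just configured to produce Smile instead of JSON
class MapperHolder {
  private final ObjectMapper mapper = new ObjectMapper(new SmileFactory());

  private final static MapperHolder instance = new MapperHolder();

  public static ObjectMapper mapper() { return instance.mapper; }
}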

6. Even More compact? Consider Jackson 2.1, "POJO as array!"

But wait! In the very near future, we may be able to do EVEN BETTER! Jackson 2.1 (see the Sneak Peek) will introduce an interesting feature that further reduces the size of JSON/Smile Object serialization. By using the following annotation:

@JsonFormat(shape=JsonFormat.Shape.ARRAY)

you can further reduce the size: property names are then excluded from serialization (think of output similar to CSV, just expressed using JSON Arrays).
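Applied to the earlier MyPojo, this would look something like the following sketch (the @JsonPropertyOrder annotation is an assumption on my part, added to keep the array layout stable):

import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonPropertyOrder;

@JsonFormat(shape=JsonFormat.Shape.ARRAY)
@JsonPropertyOrder({ "id", "name", "values" })
static class MyPojo implements Externalizable
{
    public int id;
    public String name;
    public int[] values;
    // ... constructors, readExternal() and writeExternal() exactly as before
}

With this, the payload embedded for our sample object looks more like [13,"Foobar",[1,2,3]] than a full JSON Object with property names.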

For our toy use case, the size is reduced further, from 130 bytes to 109: a reduction of almost 20%. But wait! It gets better -- the same will be true for Smile as well: while Smile reduces space in general, it normally still has to retain some amount of property-name information; with POJO-as-array that information is excluded as well!

7. But how about actual real-life results?

At this point I am actually planning on doing something based on the code I showed above. But planning is in early stages, so I do not yet have results from "real data", meaning objects of more realistic sizes. But I hope to get that soon: the use case is that of storing entities (whose data is read from a DB) in memcache. The existing system is getting CPU-bound, both from basic serialization/deserialization activity and especially from a higher number of GCs. I fully expect the new approach to help with this; and most importantly, to be quite easy to deploy: this is because I do not have to change any of the code that actually serializes/deserializes Beans -- I just have to modify the Beans themselves a bit.

Posted by Tatu Saloranta at Saturday, August 18, 2012 4:26 PM
Categories: Java, JSON, Performance

Thursday, May 24, 2012

Doing actual non-blocking, incremental HTTP access with async-http-client

The Async-http-client library, originally developed at Ning (by Jean-Francois, Tom, Brian and maybe others, and maintained since then by quite a few contributors), has been around for a while now.
Its main selling point is the claim of better scalability compared to alternatives like Jakarta HTTP Client (and that is not its only selling point: its API also seems more intuitive).

But although the library itself is capable of working well in non-blocking mode, most examples (and probably most users) use it in plain old blocking mode; or at most use a Future to simply defer handling of responses, but without handling content incrementally when it becomes available.

While this lack of documentation is a bit unfortunate in itself, the bigger problem is that most usage, as shown by sample code, requires reading the whole response into memory.
This may not be a big deal for small responses, but in cases where the response size is in megabytes, it often becomes problematic.

1. Blocking, fully in-memory usage

The usual (and potentially problematic) usage pattern is something like:

  AsyncHttpClient asyncHttpClient = new AsyncHttpClient();
  Future<Response> f = asyncHttpClient.prepareGet("http://www.ning.com/").execute();
  Response r = f.get();
byte[] contents = r.getResponseBodyAsBytes();

which gets the whole response as a byte array; no surprises there.

2. Use InputStream to avoid buffering the whole entity?

The first obvious workaround attempt is to have a look at the Response object and notice that there is a method "getResponseBodyAsStream()". This would seemingly allow one to read the response piece by piece and process it incrementally, by (for example) writing it to a file.

Unfortunately, this method is just a facade, implemented like so:

 public InputStream getResponseBodyAsStream() {
   return new ByteArrayInputStream(getResponseBodyAsBytes());
 }

which actually is no more efficient than accessing the whole content as a byte array. :-/

(why is it implemented that way? Mostly because the underlying non-blocking I/O library, like Netty or Grizzly, provides content using a "push" style interface, which makes it very hard to support "pull" style abstractions like java.io.InputStream -- so it is not really AHC's fault, but rather a consequence of the NIO/async style of I/O processing)

3. Go fully async

So what can we do to actually process large response payloads (or large PUT/POST request payloads, for that matter)?

To do that, it is necessary to use the following callback abstractions:

  1. To handle response payloads (for HTTP GETs), we need to implement the AsyncHandler interface (the example below implements it directly; AsyncCompletionHandler is an existing convenience base class).
  2. To handle PUT/POST request payloads, we need to implement BodyGenerator (which is used for creating a Body instance, an abstraction for feeding content)

Let's have a look at what is needed for the first case.

(note: there are existing default implementations for some of the pieces -- but here I will show how to do it from the ground up)

4. A simple download-a-file example

Let's start with a simple case of downloading large content into a file, without keeping more than a small chunk in memory at any given time. This can be done as follows:


public class SimpleFileHandler implements AsyncHandler<File>
{
 private File file;
 private final FileOutputStream out;
 private boolean failed = false;

 public SimpleFileHandler(File f) throws IOException {
  file = f;
  out = new FileOutputStream(f);
 }

 public com.ning.http.client.AsyncHandler.STATE onBodyPartReceived(HttpResponseBodyPart part)
   throws IOException
 {
  if (!failed) {
   part.writeTo(out);
  }
  return STATE.CONTINUE;
 }

 public File onCompleted() throws IOException {
  out.close();
  if (failed) {
   file.delete();
   return null;
  }
  return file;
 }

 public com.ning.http.client.AsyncHandler.STATE onHeadersReceived(HttpResponseHeaders h) {
  // nothing to check here as of yet
  return STATE.CONTINUE;
 }

 public com.ning.http.client.AsyncHandler.STATE onStatusReceived(HttpResponseStatus status) {
  failed = (status.getStatusCode() != 200);
  return failed ?  STATE.ABORT : STATE.CONTINUE;
 }

 public void onThrowable(Throwable t) {
  failed = true;
 }
}

Voila. The code is not very brief (event-based code seldom is), and it could use some more handling for error cases.
But it should at least show the general processing flow -- nothing very complicated there, beyond basic state-machine style operation.
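Just to show how the handler plugs in, here is a minimal usage sketch (the URL and file name are arbitrary placeholders):

AsyncHttpClient client = new AsyncHttpClient();
// execute() with an AsyncHandler returns a ListenableFuture of whatever onCompleted() produces
ListenableFuture<File> future = client.prepareGet("http://example.com/big-file.bin")
  .execute(new SimpleFileHandler(new File("big-file.bin")));
File result = future.get(); // only the waiting thread blocks; I/O happens on AHC's threads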

5. Booooring. Anything more complicated?

Downloading a large file is useful, but while not a contrived example, it is rather plain. So let's consider the case where we not only want to download a piece of content, but also want to uncompress it, in one fell swoop. This serves as an example of additional processing we may want to do in incremental/streaming fashion -- as an alternative to having to store an intermediate copy in a file, then uncompress that to another file.

But before showing the code, it is necessary to explain why this is a bit tricky.

First, remember that we can't really use InputStream-based processing here: all content we get is "pushed" to us (without our code ever blocking on input); whereas an InputStream would require us to pull content from it, possibly blocking the thread.

Second: most decompressors expose either an InputStream-based abstraction or an uncompress-the-whole-thing interface: neither works for us, since we are getting incremental chunks; to use either, we would first have to buffer the whole content. Which is what we are trying to avoid.

As luck would have it, however, the Ning Compress package (version 0.9.4, specifically) just happens to have a push-style uncompressor interface (aptly named "com.ning.compress.Uncompressor"); and two implementations:

  1. com.ning.compress.lzf.LZFUncompressor
  2. com.ning.compress.gzip.GZIPUncompressor (uses JDK native zlib under the hood)

So why is that fortunate? Because the interface they expose is push-style:

 public abstract class Uncompressor
 {
  public abstract void feedCompressedData(byte[] comp, int offset, int len) throws IOException;
  public abstract void complete() throws IOException;
}

and is thereby usable for our needs here. Especially when we use an additional class called "UncompressorOutputStream", which makes an OutputStream out of an Uncompressor (and which is needed for efficient access to the content AHC exposes via HttpResponseBodyPart).

6. Show me the code

Here goes:


public class UncompressingFileHandler implements AsyncHandler<File>,
   DataHandler // for Uncompressor
{
 private File file;
 private final OutputStream out;
 private boolean failed = false;
 private UncompressorOutputStream uncompressingStream; // assigned once response headers are seen

 public UncompressingFileHandler(File f) throws IOException {
  file = f;
  out = new FileOutputStream(f);
 }

 public com.ning.http.client.AsyncHandler.STATE onBodyPartReceived(HttpResponseBodyPart part)
   throws IOException
 {
  if (!failed) {
   // if compressed, pass through uncompressing stream
   if (uncompressingStream != null) {
    part.writeTo(uncompressingStream);
   } else { // otherwise write directly
    part.writeTo(out);
   }
  }
  return STATE.CONTINUE;
 }

 public File onCompleted() throws IOException {
  // close the uncompressing stream first, so any pending data is flushed to 'out'
  if (uncompressingStream != null) {
   uncompressingStream.close();
  }
  out.close();
  if (failed) {
   file.delete();
   return null;
  }
  return file;
 }

 public com.ning.http.client.AsyncHandler.STATE onHeadersReceived(HttpResponseHeaders h) {
  // must verify that we are getting compressed stuff here:
  String compression = h.getHeaders().getFirstValue("Content-Encoding");
  if (compression != null) {
   if ("lzf".equals(compression)) {
    uncompressingStream = new UncompressorOutputStream(new LZFUncompressor(this));
   } else if ("gzip".equals(compression)) {
    uncompressingStream = new UncompressorOutputStream(new GZIPUncompressor(this));
   }
  }
  // nothing to check here as of yet
  return STATE.CONTINUE;
 }

 public com.ning.http.client.AsyncHandler.STATE onStatusReceived(HttpResponseStatus status) {
  failed = (status.getStatusCode() != 200);
  return failed ?  STATE.ABORT : STATE.CONTINUE;
 }

 public void onThrowable(Throwable t) {
  failed = true;
 }

 // DataHandler implementation for Uncompressor; called with uncompressed content:
 public void handleData(byte[] buffer, int offset, int len) throws IOException {
  out.write(buffer, offset, len);
 }
}

Handling gets a bit more complicated here, since we have to handle both the case where content is compressed and the case where it is not (since the server is ultimately responsible for applying compression or not).

And to make the call, you also need to indicate the capability to accept compressed data. For example, we could define a helper method like:


public File download(String url) throws Exception
{
 AsyncHttpClient ahc = new AsyncHttpClient();
 Request req = ahc.prepareGet(url)
  .addHeader("Accept-Encoding", "lzf,gzip")
  .build();
 ListenableFuture<File> futurama = ahc.executeRequest(req,
   new UncompressingFileHandler(new File("download.txt")));
 try {
  // wait for up to 30 seconds for completion
  return futurama.get(30, TimeUnit.SECONDS);
 } catch (TimeoutException e) {
  throw new IOException("Failed to download due to timeout");
 }
}

which would use handler defined above.

7. Easy enough?

I hope the above shows that while doing incremental, "streaming" processing is a bit more work, it is not super difficult to do.

Not even when you have a bit of pipelining to do, like uncompressing (or compressing) data on the fly.

Posted by Tatu Saloranta at Thursday, May 24, 2012 5:26 PM
Categories: Java, Open Source, Performance

Friday, April 06, 2012

Take your JSON processing to Mach 3 with Jackson 2.0, Afterburner

(this is part of the on-going "Jackson 2.0" series, starting with "Jackson 2.0 released")

1. Performance overhead of databinding

When using the automatic data-binding Jackson offers, there is some amount of overhead compared to manually writing equivalent code that uses Jackson's streaming/incremental parser and generator. But how much overhead is there? The answer depends on multiple factors, including exactly how good your hand-written code is (there are a few non-obvious ways to optimize things, compared to data-binding where there is little configurability with respect to performance).

But looking at benchmarks such as jvm-serializers, one could estimate that it may take anywhere between 35% and 50% more time to serialize and deserialize POJOs, compared to a highly tuned hand-written alternative. This is usually not enough to matter a lot, considering that JSON processing overhead is typically only a small portion of all the processing done.

2. Where does the overhead come from?

There are multiple things that automatic data-binding has to do that hand-written alternatives do not. But at a high level, there are really two main areas:

  1. Configurability to produce/consume alternative representations; code that has to support multiple ways of doing things cannot be as aggressively optimized by the JVM and may need to keep more state around.
  2. Data access to POJOs is done dynamically using Reflection, instead of directly accessing field values or calling setters/getters

While there isn't much that can be done about the former in the general sense (especially since configurability and convenience are major reasons for the popularity of data-binding), the latter overhead is something that could, in theory, be eliminated.

How? By generating bytecode that accesses fields and calls getters/setters directly (as well as constructing new instances directly).
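To illustrate the difference (a conceptual sketch only -- not the actual bytecode Afterburner generates, and the Bean class here is hypothetical):

public class AccessComparison
{
  public static class Bean {
    private int id = 42;
    public int getId() { return id; }
  }

  public static void main(String[] args) throws Exception {
    Bean bean = new Bean();

    // Reflection-based access, roughly what vanilla data-binding does:
    java.lang.reflect.Method getter = Bean.class.getMethod("getId");
    Object boxed = getter.invoke(bean); // dynamic lookup, access checks, auto-boxing

    // What generated bytecode effectively replaces it with:
    int direct = bean.getId(); // a plain call that the JIT can inline

    System.out.println(boxed + " / " + direct);
  }
}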

3. Project Afterburner

And this is where Project Afterburner comes in. What it does is really as simple as dynamically generating bytecode to mostly eliminate Reflection overhead. The implementation uses the well-known lightweight bytecode library ASM.

Byte code is generated to:

  1. Replace "Class.newInstance()" calls with equivalent call to zero-argument constructor (currently same is not done for multi-argument Creator methods)
  2. Replace Reflection-based field access (Field.set() / Field.get()) with equivalent field dereferencing
  3. Replace Reflection-based method calls (Method.invoke(...)) with equivalent direct calls
  4. For small subset of simple types (int, long, String, boolean), further streamline handling of serializers/deserializers to avoid auto-boxing

It is worth noting that there are certain limitations to access: for example, unlike with Reflection, it is not possible to avoid visibility checks; which means that access to private fields and methods must still be done using Reflection.

4. Engage the Afterburner!

Using Afterburner is about as easy as it can be: you just create and register a module, and then use databinding as usual:


ObjectMapper mapper = new ObjectMapper();
mapper.registerModule(new AfterburnerModule());
String json = mapper.writeValueAsString(value); // serialize some POJO as usual
Value result = mapper.readValue(json, Value.class); // ... and read one back

absolutely nothing special there (note: for the Maven dependency and downloads, see the project page).

5. How much faster?

Earlier I mentioned that Reflection is just one of overhead areas. In addition to general complexity from configurability, there are cases where general data-binding has to be done using simple loops, whereas manual code could use linear constructs. Given this, how much overhead remains after enabling Afterburner?

As per jvm-serializers, more than 50% of the speed difference between data-binding and the manual variant is eliminated. That is, data-binding with Afterburner is closer to the manual variant than to "vanilla" data-binding. There is still something like 20-25% additional time spent, compared to the most optimized cases; but results are definitely closer to optimal.

Given that all you really have to do is to just add the module, register it, and see what happens, it just might make sense to take Afterburner for a test ride.

6. Disclaimer

While Afterburner has been used by a few Jackson users, it is still not very widely used -- after all, while it has been available since 1.8, in some form, it has not been advertised to users. This article can be considered an announcement of sorts.

Because of this, there may be rough edges; and if you are unlucky you might hit one of two possible problems:

  • Get no performance improvement (which is likely due to Afterburner not covering some specific code path(s)), or
  • Get a bytecode verification problem when a serializer/deserializer is being loaded

the latter case obviously being nastier. But on the plus side, such a problem should be obvious right away (and NOT after running for an hour); nor should there be a way for it to cause data loss or corruption: JVMs are rather good at verifying bytecode when trying to load it.

Posted by Tatu Saloranta at Friday, April 06, 2012 7:24 PM
Categories: Java, JSON, Performance

Monday, April 04, 2011

Introducing "jvm-compressor-benchmark" project

I recently started a new open source project, this time inspired by the success of another OS project I have been involved in: the "jvm-serializers" benchmark, originally started by Eishay and built by a community of Java databinder/serializer experts. What has been great with that project is the amount of energy it seemed to feed back into the development of serializers: highly visible competition for best performance seems to have improved the efficiency of libraries a lot. I only wish we had historical benchmark data to compare against, to see exactly how far the fastest Java serializers have come.

Anyway, I figured that there are other groups of libraries where high performance matters, but where there is a lack of solid benchmarking information. While there are a few compression performance benchmarks, they are often not applicable for Java developers: partly because they just compare native compressor codecs, and partly because the focus is most often only on space-efficiency (how much compression is achieved) with little consideration for the speed of compression. The last part is particularly frustrating, as in many use cases there is a significant trade-off between space and time efficiency (compression ratio vs. time used for compression).

So, this is where the new project -- "jvm-compressor-benchmark" -- comes from. I hope it will allow fair comparison of compression codecs available on the JVM, usable from Java and other JVM languages; and also bring some friendly competition between developers of compression codecs.

The first version compares half a dozen compression formats and codecs, from the venerable deflate/gzip (which offers a pretty good compression ratio with decent speed) to higher-compression-but-slower alternatives (bzip2) and lower-compression-but-very-fast alternatives like LZF, QuickLZ and the new kid on the block, Snappy (via JNI).

And although the best way to evaluate results is to run the tests on your own machine, using data sets you care about (which I strongly encourage!), the project wiki does have some pretty diagrams for tests run on "standard" data sets gathered from the web.

Anyway: please check the project out -- at the very least it should give you an idea of how many options there are above and beyond basic JDK-provided gzip.

ps. Contributions are obviously also welcome -- anyone willing to tackle a Java version of 7-zip's LZMA, for example, would be most welcome!

Posted by Tatu Saloranta at Monday, April 04, 2011 10:48 PM
Categories: Java, Open Source, Performance

Tuesday, January 04, 2011

Annual Update on State of Java JSON data binding performance

Yes, 'tis the season for performance measurements again: last time I covered this subject about a year ago with "JSON data binding performance (again!)".
During the past 12 months many of the tested libraries have released new versions; some with high hopes for performance improvements (Gson 1.6, for example, was rumored to have a faster streaming parser). So it seems prudent to have another look at how performant the currently available Java JSON data binding libraries are.

1. Libraries tested

Libraries tested are the same as last time; with versions:

  1. Jackson 1.6 (was 1.2)
  2. Gson 1.6 (was 1.4)
  3. Json-tools (core, 1.7) (no change)
  4. Flex-json 2.1 (was 1.9.1)

(for what it's worth, I was also hoping to include tests for "json-marshaller", but lack of documentation coupled with its seeming inability to parse directly from a stream or even a String suggested that it's not yet mature enough to be included)

2. Test system

I switched over to the light side at home (replaced my old Athlon/Linux box with a small, spunky Mac Mini, yay!), so the test box has a 2.53 GHz Intel Core 2 Duo CPU. This is a somewhat faster box; it seems to be about 4x as fast for this particular task. The test is single-threaded, so it would be possible to roughly double the throughput with a different Japex test setup; however, test threads are CPU-bound and independent, so there seems to be little point in adding more concurrency.

Test code is available from the Woodstox SVN repository (under the "staxbind" folder), and runs on Japex. Converters for the libraries are rather simple; the data consists of medium-sized (20k) documents, for anyone interested in replicating results.

3. Pretty Graphs

Ok, here is the main graph:

[Graph: Japex throughput results for the four libraries]

and for more data, check out the full results.

4. "But what does it MEAN?"

As with previous tests, the upper part of the double-graph just indicates the amount of data read and/or written (which is identical for the first three, but flex-json seems to insist on including some additional type metadata) and can be ignored; the lower graph indicates throughput (a higher bar means faster processing) and is the interesting part.

There are three tests: read JSON (into data object(s)), write JSON (from data object(s)), and read-then-write, which combines the two operations. I use the last result, since it gives a reasonable approximation of common use with web services, where a request is read, some processing is done, and a response is written.

From the graph it looks like the results have not changed a lot; here is the revised speed ratio, using the slowest implementation (Gson) as the baseline:

Impl                 | Read (TPS) | Write (TPS) | Read+Write (TPS) | R+W, times baseline
Jackson (automatic)  |   7240.202 |    9161.873 |         4023.464 |               14.75
FlexJson             |    721.743 |    1402.848 |          462.594 |                1.69
Json-tools           |    524.119 |    1007.068 |          341.123 |                1.25
GSON                 |    714.106 |     462.935 |          272.637 |                1

5. Thoughts?

Not much has changed; Jackson is still an order of magnitude faster than the rest, and relative ranking of the other libraries hasn't changed either.

Posted by Tatu Saloranta at Tuesday, January 04, 2011 9:28 PM
Categories: Java, JSON, Performance

Friday, October 29, 2010

Faster compression on Java platform with LZF

One open source project that I have worked on a bit lately is compress-lzf, a simple Java library that implements block- and stream-accessible versions of the LZF compression algorithm. The LZF algorithm itself is very simple, and mostly (only?) documented here, in the form of a working reference implementation that doubles as documentation.

Our implementation is a simple rewrite of the C version, and fully compatible with the reference implementation, including the file header and block signatures (this is different from the other existing Java version that projects like H2 include). While a simple implementation, it is not a naive translation; some effort has been spent on optimizing its performance.

And as should be the case for such low-level libraries, there are no external dependencies beyond JDK 1.5, meaning it should work on a variety of platforms, including Android (in fact it could probably be compiled on any 1.x JDK).

1. What is LZF?

The LZF compression algorithm is basic Lempel-Ziv with little additional processing, so one way to think of it is as "gzip without Huffman encoding" (gzip uses Deflate, which is Lempel-Ziv followed by Huffman encoding with variable bit-lengths).
And Lempel-Ziv is just a method whereby one finds repeating sub-sequences in the input, replacing sequences of three or more "already seen" bytes with a back-reference; candidate matches are typically stored in a hash table indexed by a hash of the first 3 bytes, the value being the offset of the last instance of a potentially matching sequence.
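For the curious, here is a rough illustrative sketch of that matching idea (not the actual compress-lzf code):

public class LzMatchSketch
{
  static final int HASH_SIZE = 1 << 16;

  // hash of the 3 bytes starting at 'pos'
  static int hash(byte[] in, int pos) {
    int h = ((in[pos] & 0xFF) << 16) | ((in[pos + 1] & 0xFF) << 8) | (in[pos + 2] & 0xFF);
    return (h ^ (h >>> 11)) & (HASH_SIZE - 1);
  }

  static void scan(byte[] in) {
    int[] lastSeen = new int[HASH_SIZE];
    java.util.Arrays.fill(lastSeen, -1);
    for (int pos = 0; pos + 2 < in.length; pos++) {
      int h = hash(in, pos);
      int candidate = lastSeen[h];
      lastSeen[h] = pos;
      if (candidate >= 0
          && in[candidate] == in[pos]
          && in[candidate + 1] == in[pos + 1]
          && in[candidate + 2] == in[pos + 2]) {
        // a real encoder would emit a back-reference (offset pos - candidate),
        // extend the match as far as it goes and skip ahead;
        // non-matching bytes are copied through as literals
      }
    }
  }
}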

It is worth noting that many other similar (and sometimes similarly named) algorithms likewise use basic LZ; "FastLZ", for example, is very similar.

2. Gzip exists, why bother with LZF?

But why does this matter? If gzip is this and more, why would you want to use a "lesser" algorithm?

The key benefit is speed: LZF can be up to twice as fast to uncompress; but where it REALLY shines is compression speed -- here it can be 2x or 3x as fast (writing a stand-alone benchmark is on my todo list, so hopefully I will get to talk a bit more about this in the future). The performance difference is mostly due to the relatively high overhead deflate incurs for non-byte-aligned output (which is needed for optimal Huffman encoding of codewords) -- while it definitely helps compression, it is expensive in terms of CPU usage.

The LZF compression rate is lower than gzip's, but the compression profile is similar: things that compress well with gzip also compress well with LZF, and those that compress less do so with LZF as well. This makes sense given that most of the compression comes from the LZ algorithm, which is implemented similarly.

So it all comes down to this: gzip compression is very CPU-intensive (read: slow); so much so that you typically spend 5x as long compressing something like JSON data as you do writing the content itself. With fast networks and I/O this often becomes the bottleneck.
LZF can be a good alternative, since its CPU overhead is much more modest; and it can still compress "fluffy" content pretty well -- where gzip can compress something by, say, 90%, LZF usually gets to a 75-80% ratio (this obviously varies by type of content; it basically depends on which part of the deflate algorithm is more effective).

3. Usage

Assuming the above sounds interesting, you can get the compress-lzf library either from GitHub or via Maven. In the latter case, the information is:

  • <groupId>com.ning</groupId>
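Block-level usage itself is about as simple as it gets; here is a minimal sketch, assuming the library's LZFEncoder and LZFDecoder helper classes:

import com.ning.compress.lzf.LZFDecoder;
import com.ning.compress.lzf.LZFEncoder;

public class LzfRoundTrip
{
  public static void main(String[] args) throws Exception {
    byte[] original = "{\"message\":\"some reasonably fluffy JSON content\"}".getBytes("UTF-8");
    byte[] compressed = LZFEncoder.encode(original); // block-level compression
    byte[] restored = LZFDecoder.decode(compressed); // and decompression
    System.out.println(original.length + " -> " + compressed.length + " bytes");
  }
}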