Saturday, August 18, 2012

Replacing standard JDK serialization using Jackson (JSON/Smile), java.io.Externalizable

1. Background

The default Java serialization provided by JDK is a two-edged sword: on one hand, it is a simple, convenient way to "freeze and thaw" Objects you have, handling about any kind of Java object graphs. It is possibly the most powerful serialization mechanism on Java platform, bar none.

But on the other hand, its shortcomings are well-document (and I hope, well-known) at this point. Problems include:

  • Poor space-efficiency (especially for small data), due to inclusion of all class metadata: that is, size of output can be huge, larger than about any alternative, including XML
  • Poor performance (especially for small data), partly due to size inefficiency
  • Brittleness: smallest changes to class definitions may break compatibility, preventing deserialization. This makes it a poor choice for both data exchange between (Java) systems as well as long-term storage

Still, the convenience factor has led to many systems using JDK serialization to be the default serialization method to use.

Is there anything we could do to address downsides listed above? Plenty, actually. Although there is no way to do much more for the default implementation (JDK serialization implementation is in fact ridiculously well optimized for what it tries to achieve -- it's just that the goal is very ambitious), one can customize what gets used by making objects implement java.io.Externalizable interface. If so, JDK will happily use alternate implementation under the hood.

Now: although writing custom serializers may be fun sometimes -- and for specific case, you can actually write very efficient solution as well, given enough time -- it would be nice if you could use an existing component to address listed short-comings.

And that's what we'll do! Here's one possible way to improve on all problems listed above:

  1. Use an efficient Jackson serializer (to produce either JSON, or perhaps more interestingly, Smile binary data)
  2. Wrap it in nice java.io.Externalizable, to make it transparent to code using JDK serialization (albeit not transparent for maintainers of the class -- but we will try minimizing amount of intrusive code)

2. Challenges with java.io.Externalizable

First things first: while conceptually simple, there are couple of rather odd design decisions that make use of java.io.Externalizable bit tricky:

  1. Instead of passing instances of java.io.InputStream, java.io.OutputStream, instead java.io.ObjectOutput and java.io.ObjectInput are used; and they do NOT extend stream versions (even though they define mostly same methods!). This means additional wrapping is needed
  2. Externalizable.readExternal() requires updating of the object itself, not that of constructing new instances: most serialization frameworks do not support such operation
  3. How to access external serialization library, as no context is passed to either of methods?

These are not fundamental problems for Jackson: first one requires use of adapter classes (see below), second that we need to use "updating reader" approach that Jackson was supported for a while (yay!). And to solve the third part, we have at least two choices: use of ThreadLocal for passing an ObjectMapper; or, use of a static helper class (approach shown below)

So here are the helper classes we need:

final static class ExternalizableInput extends InputStream
{
  private final ObjectInput in;

  public ExternalizableInput(ObjectInput in) {
   this.in = in;
  }

  @Override
  public int available() throws IOException {
    return in.available();
  }

  @Override
  public void close() throws IOException {
    in.close();
  }

  @Override
  public boolean  markSupported() {
    return false;
  }

  @Override
  public int read() throws IOException {
   return in.read();
  }

  @Override
  public int read(byte[] buffer) throws IOException {
    return in.read(buffer);
  }

  @Override
  public int read(byte[] buffer, int offset, int len) throws IOException {
    return in.read(buffer, offset, len);
  }

  @Override
  public long skip(long n) throws IOException {
   return in.skip(n);
  }
}

final static class ExternalizableOutput extends OutputStream { private final ObjectOutput out; public ExternalizableOutput(ObjectOutput out) { this.out = out; } @Override public void flush() throws IOException { out.flush(); } @Override public void close() throws IOException { out.close(); } @Override public void write(int ch) throws IOException { out.write(ch); } @Override public void write(byte[] data) throws IOException { out.write(data); } @Override public void write(byte[] data, int offset, int len) throws IOException { out.write(data, offset, len); } }

/* Use of helper class here is unfortunate, but necessary; alternative would
* be to use ThreadLocal, and set instance before calling serialization.
* Benefit of that approach would be dynamic configuration; however, this
* approach is easier to demonstrate.
*/
class MapperHolder { private final ObjectMapper mapper = new ObjectMapper(); private final static MapperHolder instance = new MapperHolder(); public static ObjectMapper mapper() { return instance.mapper; } }

and given these classes, we can implement Jackson-for-default-serialization solution.

3. Let's Do a Serialization!

So with that, here's a class that is serializable using Jackson JSON serializer:


  static class MyPojo implements Externalizable
  {
        public int id;
        public String name;
        public int[] values;

        public MyPojo() { } // for deserialization
        public MyPojo(int id, String name, int[] values)
        {
            this.id = id;
            this.name = name;
            this.values = values;
        }

        public void readExternal(ObjectInput in) throws IOException {
            MapperHolder.mapper().readerForUpdating(this).readValue(new ExternalizableInput(in));
} public void writeExternal(ObjectOutput oo) throws IOException { MapperHolder.mapper().writeValue(new ExternalizableOutput(oo), this); }
}

to use that class, use JDK serialization normally:


  // serialize as bytes (to demonstrate):
MyPojo input = new MyPojo(13, "Foobar", new int[] { 1, 2, 3 } ); ByteArrayOutputStream bytes = new ByteArrayOutputStream(); ObjectOutputStream obs = new ObjectOutputStream(bytes); obs.writeObject(input); obs.close(); byte[] ser = bytes.toByteArray();

// and to get it back:
ObjectInputStream ins = new ObjectInputStream(new ByteArrayInputStream(ser)); MyPojo output = (MyPojo) ins.readObject();
ins.close();

And that's it.

4. So what's the benefit?

At this point, you may be wondering if and how this would actually help you. Since JDK serialization is using binary format; and since (allegedly!) textual formats are generally more verbose than binary formats, how could this possibly help with size of performance?

Turns out that if you test out code above and compare it with the case where class does NOT implement Externalizable, sizes are:

  • Default JDK serialization: 186 bytes
  • Serialization as embedded JSON: 130 bytes

Whoa! Quite unexpected result? JSON-based alternative 30% SMALLER than JDK serialization!

Actually, not really. The problem with JDK serialization is not the way data is stored, but rather the fact that in addition to (compact) data, much of Class definition metadata is included. This metadata is needed to guard against Class incompatibilities (which it can do pretty well), but it comes with a cost. And that cost is particularly high for small data.

Similarly, performance typically follows data size: while I don't have publishable results (I may do that for a future post), I expect embedded-JSON to also perform significantly better for single-object serialization use cases.

5. Further ideas: Smile!

But perhaps you think we should be able to do better, size-wise (and perhaps performance) than using JSON?

Absolutely. Since the results are not exactly readable (to use Externalizable, bit of binary data will be used to indicate class name, and little bit of stream metadata), we probably do not greatly care what the actual underlying format is.
With this, an obvious choice would be to use Smile data format, binary counterpart to JSON, a format that Jackson supports 100% with Smile Module.

The only change that is needed is to replace the first line from "MapperHolder" to read:

private final ObjectMapper mapper = new ObjectMapper(new SmileFactory());

and we will see even reduced size, as well as faster reading and writing -- Smile is typically 30-40% smaller in size, and 30-50% faster to process than JSON.

6. Even More compact? Consider Jackson 2.1, "POJO as array!"

But wait! In very near future, we may be able to do EVEN BETTER! Jackson 2.1 (see the Sneak Peek) will introduce one interesting feature that will further reduce size of JSON/Smile Object serialization. By using following annotation:

@JsonFormat(shape=JsonFormat.Shape.OBJECT)

you can further reduce the size: this occurs as the property names are excluded from serialization (think of output similar to CSV, just using JSON Arrays).

For our toy use case, size is reduced further from 130 bytes to 109; further reduction of almost 20%. But wait! It gets better -- same will be true for Smile as well, since while it can reduce space in general, it still has to retain some amount of name information normally; but with POJO-as-Arrays it will use same exclusion!

7. But how about actual real-life results?

At this point I am actually planning on doing something based on code I showed above. But planning is in early stages so I do not yet have results from "real data"; meaning objects of more realistic sizes. But I hope to get that soon: the use case is that of storing entities (data for which is read from DB) in memcache. Existing system is getting CPU-bound both from basic serialization/deserialization activity, but especially from higher number of GCs. I fully expect the new approach to help with this; and most importantly, be quite easy to deploy: this because I do not have to change any of code that actually serializes/deserializes Beans -- I just have to modify Beans themselves a bit.

Posted by Tatu Saloranta at Saturday, August 18, 2012 4:26 PM
Categories: Java, JSON, Performance
| Permalink |Comments | links to this post

blog comments powered by Disqus

Search


Custom Search

Last posts

  • Replacing standard JDK serialization using Jackson (JSON/Smile), java.io.Externalizable

Categories

    Database
    Environment
    Food+Drink
    General
    Java
    JSON
    Music
    Open Source
    Performance
    Philosophic
    Rant
    Silly
    StaxMate
    XML/Stax
Subscribe to this blog's feed
[What is this?]

Sponsored By

Archives

  • October 2013
  • September 2013
  • August 2013
  • August 2012
  • May 2012
  • April 2012
  • March 2012
  • December 2011
  • October 2011
  • September 2011
  • August 2011
  • July 2011
  • May 2011
  • April 2011
  • March 2011
  • February 2011
  • January 2011
  • December 2010
  • November 2010
  • October 2010
  • September 2010
  • August 2010
  • July 2010
  • June 2010
  • May 2010
  • April 2010
  • March 2010
  • February 2010
  • January 2010
  • December 2009
  • November 2009
  • October 2009
  • September 2009
  • August 2009
  • July 2009
  • June 2009
  • May 2009
  • April 2009
  • March 2009
  • February 2009
  • January 2009
  • December 2008
  • November 2008
  • October 2008
  • September 2008
  • August 2008
  • July 2008
  • June 2008
  • May 2008
  • April 2008
  • March 2008
  • February 2008
  • December 2007
  • November 2007
  • October 2007
  • September 2007
  • August 2007
  • July 2007
  • June 2007
  • May 2007
  • April 2007
  • February 2007
  • January 2007
  • November 2006
  • October 2006
  • September 2006
  • August 2006
  • July 2006
  • June 2006

Related Blogs

(by Author (topics))
  • Dan D
    (XFire, Mule)
  • Jean-Francois A
    (Ajax, Comet, Async HTTP)
  • Josh C
    (Judge Mental)
  • Kohsuke K
    (Relax NG, Sun MSV)
  • Michael K
    (xslt, xquery)
  • Paul B
    (Haskell, RSS)
  • Santiago P-G
    (Glassfish, java.net, JAXP, Xalan)

Powered By

,
Blogger Templates and
spacer

About me

  • I am known as Cowtowncoder
  • Contact me at@yahoo.com
Check my profile to learn more.
gipoco.com is neither affiliated with the authors of this page nor responsible for its contents. This is a safe-cache copy of the original web site.