Saturday, August 18, 2012

Forcing escaping of HTML characters (less-than, ampersand) in JSON using Jackson

1. The problem

Jackson escapes JSON String values minimally, only where absolutely necessary: by default it escapes two characters -- double quote and backslash -- as well as non-visible control characters. It does not escape other characters, since this is not required for producing valid JSON documents.

There are systems, however, that may run into problems with some characters that are valid in JSON documents. There are also use cases where you might prefer to add more escaping. For example, if you are to enclose a JSON fragment in an XML attribute (or JavaScript code), you might want to use the apostrophe (') as the quote character in XML, and force escaping of all apostrophes in JSON content; this allows you to simply embed the encoded JSON value without other transformations.

Another specific use case is that of escaping "HTML funny characters", like less-than, greater-than, ampersand and apostrophe characters (double quotes are escaped by default).
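To make the gap concrete, here is a minimal sketch of the default rule described above: escape only the double quote, the backslash and control characters, and pass everything else through. It is a stand-alone illustration written for this post, not Jackson's code; the class name MinimalJsonEscapeSketch and the method quote are hypothetical.

```java
final class MinimalJsonEscapeSketch {

    /** Quotes a String as a JSON value, escaping only what valid JSON requires. */
    static String quote(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 2);
        sb.append('"');
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // remaining control characters: six-character unicode escape
                        sb.append(String.format("\\u%04X", (int) c));
                    } else {
                        // everything else, including < > & ', is written as-is
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

With this rule, a value such as `</script>` comes out unchanged between the quotes, which is perfectly valid JSON but can cause trouble once the document is embedded in an HTML page.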

Let's see how you can do that with Jackson.

2. Not as easy to change as you might think

Your first thought may be "I'll just do it myself". The problem is two-fold:

  1. When using the API via data-binding, or the regular Streaming generator, you must pass an unescaped String, and it will get escaped using Jackson's escaping mechanism -- you cannot pre-process it (*)
  2. If you decide to post-process content after JSON gets written, you need to be careful with replacements, and this will have a negative impact on performance (i.e. it is likely to double the time serialization takes)

(*) actually, there is a method, 'JsonGenerator.writeRaw(...)', which you can use to force exact details, but its use is cumbersome and you can easily break things if you are not careful. Plus it is only applicable via the Streaming API.
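For comparison, the post-processing route from point 2 could look roughly like the sketch below. It assumes the input is an already-serialized, valid JSON document held in a String; the class name PostProcessSketch and the method escapeHtmlChars are made up for this illustration.

```java
final class PostProcessSketch {

    /**
     * Rewrites the four HTML-sensitive characters as JSON unicode escapes
     * in a document that has already been serialized.
     */
    static String escapeHtmlChars(String json) {
        StringBuilder sb = new StringBuilder(json.length() + 16);
        // second full pass over output that the generator already walked once
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            switch (c) {
                case '<':  sb.append("\\u003C"); break;
                case '>':  sb.append("\\u003E"); break;
                case '&':  sb.append("\\u0026"); break;
                case '\'': sb.append("\\u0027"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

This blind replacement only works because none of these four characters can occur outside a String value in valid JSON. Escaping a character that also has a structural role would require tracking whether the scanner is inside a String, and in every case the whole output is traversed and copied a second time.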

3. Jackson (1.8) has you covered

Luckily, there is no need for you to write custom post-processing code to change details of content escaping.

Version 1.8 of Jackson added a feature that lets users customize the details of how characters in JSON String values are escaped.
This is done by defining a CharacterEscapes object to be used by JsonGenerator; it is registered on JsonFactory. If you use data-binding, you can set this by first calling ObjectMapper.getJsonFactory(), then defining the CharacterEscapes to use.

The functionality is handled at a low level, during writing of JSON String values, and the CharacterEscapes abstract class is designed in a way that minimizes performance overhead.
While there is some overhead (a little bit of additional processing is required), it should not have a significant impact unless a significant portion of the content requires escaping.
As usual, if you care a lot about performance, you may want to measure the impact of the change with test data.
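One way to picture why a per-character table keeps the overhead small is the sketch below. It is a stand-alone illustration of table-driven escaping, not Jackson's implementation: the class EscapeTableSketch, its constants and both methods are hypothetical, and the table convention (0 for "write as-is", a positive value for a short backslash escape, a negative value for a unicode escape) is an assumption made for this example.

```java
final class EscapeTableSketch {
    // Marker values used only by this sketch
    static final int NO_ESCAPE = 0;
    static final int UNICODE_ESCAPE = -1;

    /** Builds a 128-entry table: JSON's required escapes plus the HTML characters. */
    static int[] buildTable() {
        int[] table = new int[128];
        for (int c = 0; c < 0x20; c++) {
            table[c] = UNICODE_ESCAPE;      // control characters
        }
        table['\n'] = 'n';                  // positive entry: backslash + this char
        table['\r'] = 'r';
        table['\t'] = 't';
        table['"'] = '"';
        table['\\'] = '\\';
        table['<'] = UNICODE_ESCAPE;        // the additional, HTML-sensitive ones
        table['>'] = UNICODE_ESCAPE;
        table['&'] = UNICODE_ESCAPE;
        table['\''] = UNICODE_ESCAPE;
        return table;
    }

    /** Escapes the contents of a String value using one array lookup per character. */
    static String escape(String value, int[] table) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            int code = (c < table.length) ? table[c] : NO_ESCAPE;
            if (code == NO_ESCAPE) {
                sb.append(c);               // common case: nothing to do
            } else if (code > 0) {
                sb.append('\\').append((char) code);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }
}
```

In this sketch, characters that need no escaping cost a single array read, so extra work is only done for the characters that are actually escaped. That matches the expectation above that the impact stays small unless much of the content needs escaping.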

4. The Code

Here is a way to force escaping of HTML "funny characters", using the functionality that Jackson 1.8 (and above) has.


import java.io.IOException;

import org.codehaus.jackson.SerializableString;
import org.codehaus.jackson.io.CharacterEscapes;
import org.codehaus.jackson.map.ObjectMapper;

// First, definition of what to escape
public class HTMLCharacterEscapes extends CharacterEscapes
{
    private final int[] asciiEscapes;

    public HTMLCharacterEscapes()
    {
        // start with set of characters known to require escaping (double-quote, backslash etc)
        int[] esc = CharacterEscapes.standardAsciiEscapesForJSON();
        // and force escaping of a few others:
        esc['<'] = CharacterEscapes.ESCAPE_STANDARD;
        esc['>'] = CharacterEscapes.ESCAPE_STANDARD;
        esc['&'] = CharacterEscapes.ESCAPE_STANDARD;
        esc['\''] = CharacterEscapes.ESCAPE_STANDARD;
        asciiEscapes = esc;
    }

    // this method gets called for character codes 0 - 127
    @Override public int[] getEscapeCodesForAscii() {
        return asciiEscapes;
    }

    // and this for others; we don't need anything special here
    @Override public SerializableString getEscapeSequence(int ch) {
        // no further escaping (beyond ASCII chars) needed:
        return null;
    }
}

// and then an example of how to apply it
public ObjectMapper getEscapingMapper() {
    ObjectMapper mapper = new ObjectMapper();
    mapper.getJsonFactory().setCharacterEscapes(new HTMLCharacterEscapes());
    return mapper;
}

// so we could do:
public byte[] serializeWithEscapes(Object ob) throws IOException
{
    return getEscapingMapper().writeValueAsBytes(ob);
}


And that's it.

Posted by Tatu Saloranta at Saturday, August 18, 2012 3:14 PM
Categories: JSON