1. The problem
Jackson escapes JSON String values minimally, only where absolutely necessary: by default it escapes two characters, the double quote and the backslash, as well as non-visible control characters. It does not escape other characters, since this is not required for producing valid JSON documents.
There are systems, however, that may run into problems with some characters that are valid in JSON documents. There are also use cases where you might prefer to add more escaping. For example, if you are to enclose a JSON fragment in an XML attribute (or JavaScript code), you might want to use the apostrophe (') as the quote character in XML and force escaping of all apostrophes in the JSON content; this allows you to simply embed the encoded JSON value without other transformations.
Another specific use case is that of escaping "HTML funny characters", like the less-than, greater-than, ampersand and apostrophe characters (double quotes are escaped by default).
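To see why the extra escaping helps with the XML attribute case, here is a small stand-alone sketch. It uses only the JDK's built-in XML parser, not Jackson; the class name XmlAttributeEmbedSketch, the element name "item" and the attribute name "data" are made up for illustration, and the JSON text is written by hand in the form it would take once apostrophes and ampersands are escaped as Unicode escapes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustration only: hypothetical helper, not part of Jackson.
final class XmlAttributeEmbedSketch {

    // Embeds JSON text as-is inside an apostrophe-quoted XML attribute.
    static String wrap(String json) {
        return "<item data='" + json + "'/>";
    }

    // Parses the XML and returns the attribute value as the parser sees it.
    static String readBack(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("data");
    }

    public static void main(String[] args) throws Exception {
        // JSON in which the apostrophe and ampersand are already written
        // as six-character Unicode escapes (hand-written sample input).
        String json = "{\"name\":\"O\\u0027Brien \\u0026 Sons\"}";
        // The attribute value survives the XML round trip unchanged.
        System.out.println(readBack(wrap(json)).equals(json));
    }
}
```

With a raw apostrophe or ampersand in the JSON, the same wrap(...) call would produce XML that is not well-formed, which is exactly the extra transformation step the escaping lets you skip.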
Let's see how you can do that with Jackson.
2. Not as easy to change as you might think
Your first thought may be "I'll just do it myself". The problem is two-fold:
(*) Actually, there is a method, 'JsonGenerator.writeRaw(...)', which you can use to control the exact output, but it is cumbersome to use and you can easily break things if you are not careful. It is also only available via the Streaming API.
3. Jackson (1.8) has you covered
Luckily, there is no need for you to write custom post-processing code to change details of content escaping.
Version 1.8 of Jackson added a feature that lets users customize the details of character escaping in JSON String values. This is done by defining a CharacterEscapes object to be used by JsonGenerator; it is registered on JsonFactory. If you use data-binding, you can get the factory with ObjectMapper.getJsonFactory() and then set the CharacterEscapes to use with setCharacterEscapes(...).
The functionality is handled at a low level, during the writing of JSON String values, and the CharacterEscapes abstract class is designed to minimize performance overhead. While there is some overhead (a little bit of additional processing is required), it should not have a significant impact unless a significant portion of the content requires escaping. As usual, if you care a lot about performance, you may want to measure the impact of the change with test data.
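To give a feel for why a per-character escape table is cheap, here is a minimal stand-alone sketch of the idea. It is not Jackson's actual implementation: the class EscapeTableSketch and its methods are hypothetical, and the two constants are assumed to mirror the meaning of CharacterEscapes.ESCAPE_NONE and CharacterEscapes.ESCAPE_STANDARD (no escaping, and the generic six-character Unicode escape, respectively). A positive table entry stands for the character to write after a backslash.

```java
import java.util.Arrays;

// Illustration only: a simplified, hypothetical table-driven escaper.
final class EscapeTableSketch {
    static final int ESCAPE_NONE = 0;      // assumed: write the character as-is
    static final int ESCAPE_STANDARD = -1; // assumed: write a generic Unicode escape

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Builds a table covering the escapes JSON itself requires.
    static int[] standardTable() {
        int[] table = new int[128];
        Arrays.fill(table, 0, 32, ESCAPE_STANDARD); // control characters
        table['"'] = '"';
        table['\\'] = '\\';
        table['\b'] = 'b';
        table['\t'] = 't';
        table['\f'] = 'f';
        table['\n'] = 'n';
        table['\r'] = 'r';
        return table;
    }

    // Escapes one String value: a single array lookup per ASCII character.
    static String escape(String value, int[] table) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            int code = (c < table.length) ? table[c] : ESCAPE_NONE;
            if (code == ESCAPE_NONE) {
                sb.append(c);
            } else if (code > 0) {
                // short two-character form: backslash plus the table entry
                sb.append('\\').append((char) code);
            } else {
                // generic form: backslash, 'u', then four hex digits
                sb.append("\\u00").append(HEX[c >> 4]).append(HEX[c & 0xF]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] html = standardTable();
        html['<'] = ESCAPE_STANDARD;
        html['>'] = ESCAPE_STANDARD;
        html['&'] = ESCAPE_STANDARD;
        html['\''] = ESCAPE_STANDARD;
        System.out.println(escape("<b>Tom & Jerry's</b>", html));
    }
}
```

Running main prints \u003Cb\u003ETom \u0026 Jerry\u0027s\u003C/b\u003E. Characters whose table entry is zero are copied straight through, which is why the extra cost only shows up when much of the content actually needs escaping.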
4. The Code
Here is a way to force escaping of HTML "funny characters", using functionality available in Jackson 1.8 and above.
import org.codehaus.jackson.SerializableString;
import org.codehaus.jackson.io.CharacterEscapes;

// First, definition of what to escape
public class HTMLCharacterEscapes extends CharacterEscapes {
    private final int[] asciiEscapes;

    public HTMLCharacterEscapes() {
        // start with set of characters known to require escaping (double-quote, backslash etc)
        int[] esc = CharacterEscapes.standardAsciiEscapesForJSON();
        // and force escaping of a few others:
        esc['<'] = CharacterEscapes.ESCAPE_STANDARD;
        esc['>'] = CharacterEscapes.ESCAPE_STANDARD;
        esc['&'] = CharacterEscapes.ESCAPE_STANDARD;
        esc['\''] = CharacterEscapes.ESCAPE_STANDARD;
        asciiEscapes = esc;
    }

    // this method gets called for character codes 0 - 127
    @Override
    public int[] getEscapeCodesForAscii() {
        return asciiEscapes;
    }

    // and this for others; we don't need anything special here
    @Override
    public SerializableString getEscapeSequence(int ch) {
        // no further escaping (beyond ASCII chars) needed:
        return null;
    }
}
// and then an example of how to apply it; these methods need imports of
// java.io.IOException and org.codehaus.jackson.map.ObjectMapper
public ObjectMapper getEscapingMapper() {
    ObjectMapper mapper = new ObjectMapper();
    mapper.getJsonFactory().setCharacterEscapes(new HTMLCharacterEscapes());
    return mapper;
}

// so we could do:
public byte[] serializeWithEscapes(Object ob) throws IOException {
    return getEscapingMapper().writeValueAsBytes(ob);
}
And that's it.
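One thing worth checking is that the extra escaping does not change what consumers see: any JSON parser decodes a Unicode escape back to the original character. The following stand-alone sketch shows that for a hand-written sample; it is a deliberately tiny decoder (the class UnicodeEscapeDecodeSketch is hypothetical, handles only the six-character Unicode escapes, and assumes well-formed input), not a replacement for a real parser.

```java
// Illustration only: decodes six-character Unicode escapes, nothing else.
final class UnicodeEscapeDecodeSketch {

    static String decodeUnicodeEscapes(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == 'u' && i + 6 <= s.length()) {
                    // four hex digits follow; assumes they are valid hex
                    sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 6;
                    continue;
                }
                // some other two-character escape: copy both characters untouched
                sb.append(c).append(next);
                i += 2;
                continue;
            }
            sb.append(c);
            i++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String escaped = "\\u003Cb\\u003ETom \\u0026 Jerry\\u0027s\\u003C/b\\u003E";
        System.out.println(decodeUnicodeEscapes(escaped));
    }
}
```

Running main prints <b>Tom & Jerry's</b>, the same text a client would get after parsing the escaped JSON String value.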
Posted by Tatu Saloranta at Saturday, August 18, 2012 3:14 PM
Categories: JSON