Adding Hyphenation to NSString
Khoi Vinh recently showed that the typesetting in Apple’s iBooks is quite horrendous. One obvious problem is that the text is laid out with justification (which is probably an appropriate decision when typesetting books) but lacks hyphenation. John Gruber does not approve.
The fact is, there are pretty good algorithms for hyphenation. The Hunspell project has a hyphenation library, Hyphen, that powers OpenOffice.org among other projects and extends the algorithm that was implemented for TeX long ago. Time to bring some of that goodness to Cocoa!
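The TeX algorithm (Frank Liang's pattern method) fits in a few dozen lines of plain C, which also compiles as Objective-C. The idea works as follows. Each pattern is a letter string with digits in the gaps between letters, and '.' marks a word boundary. Every gap in the word keeps the highest digit of any pattern that matches across it, and odd values mark allowed breaks. The two patterns used below are invented for the example, not taken from a real .dic file. The minimum of two letters on each side of a break is also an assumed setting. Real implementations such as Hyphen use much faster data structures and add further rules, so treat this as a sketch of the principle only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 64
#define LEFT_MIN 2   /* assumed: minimum letters before a break */
#define RIGHT_MIN 2  /* assumed: minimum letters after a break */

/* Apply Liang-style patterns to a lowercase ASCII word and write the word,
   with '-' at every allowed break, into out (room for 2 * MAX_WORD + 1). */
static void hyphenate(const char *word, const char *const patterns[],
                      size_t npatterns, char *out) {
    char padded[MAX_WORD + 3];
    int values[MAX_WORD + 3] = {0};  /* values[k]: score of the gap before padded[k] */
    size_t n = strlen(word);
    if (n > MAX_WORD) n = MAX_WORD;

    /* Surround the word with '.' so patterns can anchor at its edges. */
    padded[0] = '.';
    memcpy(padded + 1, word, n);
    padded[n + 1] = '.';
    padded[n + 2] = '\0';
    size_t plen = n + 2;

    for (size_t p = 0; p < npatterns; p++) {
        char letters[MAX_WORD];
        int digits[MAX_WORD + 1] = {0};  /* digits[j]: digit before letters[j] */
        size_t m = 0;
        for (const char *c = patterns[p]; *c && m < MAX_WORD; c++) {
            if (isdigit((unsigned char)*c))
                digits[m] = *c - '0';
            else
                letters[m++] = *c;
        }
        /* Slide the pattern over the padded word, keeping the max per gap. */
        for (size_t s = 0; s + m <= plen; s++) {
            if (memcmp(padded + s, letters, m) != 0) continue;
            for (size_t j = 0; j <= m; j++)
                if (digits[j] > values[s + j]) values[s + j] = digits[j];
        }
    }

    /* word[i] sits at padded[i + 1], so the gap before it is values[i + 1]. */
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (i >= LEFT_MIN && n - i >= RIGHT_MIN && values[i + 1] % 2 == 1)
            out[o++] = '-';
        out[o++] = word[i];
    }
    out[o] = '\0';
}
```

With only the pattern "a1n", "banana" comes out as ba-na-na. Adding "2na." raises the gap before the final "na" to an even 2, which suppresses that break and gives ba-nana.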
I implemented a simple category for NSString that inserts Unicode soft hyphens (U+00AD) into a string. In this post I show how to use it in your project, with some examples.
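At the byte level, inserting a soft hyphen into UTF-8 text means splicing in 0xC2 0xAD, the UTF-8 encoding of U+00AD. The plain C sketch below assumes that break points arrive as a digit string where an odd digit at index i allows a break after byte i. That format is loosely modeled on the buffer Hyphen fills in, so check hyphen.h for its exact format before relying on it. The helper name `insert_soft_hyphens` is hypothetical and not part of NSString+Hyphenate. For non-ASCII words, the break indices must also fall on character boundaries.

```c
#include <stddef.h>
#include <string.h>

/* UTF-8 encoding of U+00AD SOFT HYPHEN. */
static const char SOFT_HYPHEN[] = "\xC2\xAD";

/* Hypothetical helper: copy word into out, adding a soft hyphen after byte i
   whenever breaks[i] is an odd digit. breaks must be as long as word; out
   needs room for 3 * strlen(word) + 1 bytes. Returns the bytes written. */
static size_t insert_soft_hyphens(const char *word, const char *breaks, char *out) {
    size_t n = strlen(word), o = 0;
    for (size_t i = 0; i < n; i++) {
        out[o++] = word[i];
        /* Never add a hyphen after the last byte of the word. */
        if (i + 1 < n && (breaks[i] - '0') % 2 == 1) {
            memcpy(out + o, SOFT_HYPHEN, 2);
            o += 2;
        }
    }
    out[o] = '\0';
    return o;
}
```

For example, `insert_soft_hyphens("banana", "010100", out)` produces "ba", soft hyphen, "na", soft hyphen, "na": 10 bytes that render as "banana" until a line has to break.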
Setup
First, gather all required files:
- Get NSString+Hyphenate from GitHub
- Download the Hyphen library
- The Hyphen library contains the hyphenation dictionary for US English, but if you want to support more languages, you can download more dictionaries from OpenOffice.org
Second, after unzipping, put the code in place:
- Add the NSString+Hyphenate.h and NSString+Hyphenate.m files to your project.
- To statically add the Hyphen library to your project, add the hyphen.h, hyphen.c, hnjalloc.h and hnjalloc.c files.
Third, and finally, add the .dic files to Hyphenate.bundle and add the bundle to your project. Your project source tree should now contain all the necessary files and look something like this.
Usage
The Hyphenate category gives you one method: -stringByHyphenatingWithLocale:. Its usage is straightforward:
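```objc
#import "NSString+Hyphenate.h"

NSString* text = @"It was in the fourth year of my apprenticeship to Joe, and it was a Saturday night.";
// The locale selects which hyphenation dictionary is used.
NSLocale* en = [[[NSLocale alloc] initWithLocaleIdentifier:@"en_US"] autorelease];
// Returns a copy of text with soft hyphens at the allowed break points.
NSString* hyphenated = [text stringByHyphenatingWithLocale:en];
```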
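The locale passed to -stringByHyphenatingWithLocale: presumably selects one of the .dic files in Hyphenate.bundle. The OpenOffice.org dictionaries are conventionally named after the locale, as in hyph_en_US.dic. The plain C sketch below shows one plausible mapping, trying "language_REGION" first and then the bare language. The `hyph_` naming, the fallback order and the helper name `dictionary_candidates` are all assumptions for illustration. The real category may resolve dictionaries differently.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write candidate dictionary file names for a locale
   identifier such as "en_US" into out, most specific first.
   Returns the number of candidates (1 or 2). */
static int dictionary_candidates(const char *locale, char out[2][64]) {
    int count = 0;
    snprintf(out[count++], 64, "hyph_%s.dic", locale);
    const char *underscore = strchr(locale, '_');
    if (underscore != NULL) {
        /* Fall back to the language part only, e.g. "en". */
        int lang_len = (int)(underscore - locale);
        snprintf(out[count++], 64, "hyph_%.*s.dic", lang_len, locale);
    }
    return count;
}
```

For "en_US" this yields hyph_en_US.dic and then hyph_en.dic. For "de" it yields only hyph_de.dic.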
UIKit has limited support for the soft hyphen. These are the results of displaying the string above in a UILabel, a UITextView and a UIWebView, respectively.
As you can see, UILabel simply displays all soft hyphens. UITextView and UIWebView behave more usefully: a soft hyphen is shown only where a line actually breaks, and it gives the text an extra place to wrap inside a word.
Using (X)HTML
Since UITextView is pretty limited in how much you can style and typeset text, a UIWebView will usually be the way to go for displaying nicely typeset, hyphenated text.
Obviously, running -stringByHyphenatingWithLocale: on a whole HTML document will not give the desired result, because the markup itself could end up with soft hyphens. Unfortunately, unless you are willing to use libxml2 directly, your options for working with XML documents on the iPhone are limited.
The best option (as far as I know) is to use TouchXML, a friendly wrapper for libxml2 with an API that mimics Cocoa’s NSXML* classes. However, TouchXML only supports reading XML documents, not modifying them. To apply hyphenation, we would need at least a way to modify text nodes. Luckily, that turned out to require only a small change to TouchXML, which you can find as a patch in the hyphenate repository.
Next, after patching and setting up TouchXML, we use a simple XPath expression to fetch all the text nodes and modify each.
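```objc
// Requires TouchXML with the setStringValue: patch from the hyphenate repository.
// en is the NSLocale from the previous example.
CXMLDocument* document = [[[CXMLDocument alloc] init...] autorelease];

// Select every text node inside <body> and hyphenate it in place.
NSArray* textNodes = [document nodesForXPath:@"//body//text()" error:NULL];
for (CXMLNode* node in textNodes) {
    [node setStringValue:[[node stringValue] stringByHyphenatingWithLocale:en]];
}

// Serialize the modified tree back into a string.
NSString* hyphenatedDocument = [[[NSString alloc] initWithData:[document XMLData]
                                                       encoding:NSUTF8StringEncoding] autorelease];
```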
Note that this code is a bit of an oversimplification. Whether this simple XPath expression is appropriate depends entirely on your actual documents.
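The point of going through text nodes is that the markup must stay untouched. The same idea can be shown without an XML library in a deliberately naive plain C sketch. It copies everything from '<' to '>' verbatim and hands only the text in between to a transform callback. It does not handle comments, CDATA sections, the contents of script or style elements, or a '>' inside an attribute value, which is exactly why a real parser such as TouchXML is the better choice. The function names are made up for illustration, and `uppercase_run` merely stands in for a real hyphenation step.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Callback that rewrites one run of text into out; returns bytes written. */
typedef size_t (*text_transform)(const char *run, size_t len, char *out);

/* Stand-in transform for illustration: uppercases ASCII letters. */
static size_t uppercase_run(const char *run, size_t len, char *out) {
    for (size_t i = 0; i < len; i++)
        out[i] = (char)toupper((unsigned char)run[i]);
    return len;
}

/* Copy html into out, passing only the text between tags to transform.
   Naive: assumes '<' and '>' appear only as tag delimiters. out must be
   large enough for the transformed result. */
static void transform_text_runs(const char *html, text_transform transform, char *out) {
    size_t o = 0;
    const char *p = html;
    while (*p) {
        if (*p == '<') {
            /* Copy the whole tag, including '>', unchanged. */
            const char *close = strchr(p, '>');
            size_t len = close ? (size_t)(close - p) + 1 : strlen(p);
            memcpy(out + o, p, len);
            o += len;
            p += len;
        } else {
            /* Text run up to the next tag or the end of the string. */
            size_t len = strcspn(p, "<");
            o += transform(p, len, out + o);
            p += len;
        }
    }
    out[o] = '\0';
}
```

Applied to `<p class="intro">a Saturday night</p>`, only the text changes: the result is `<p class="intro">A SATURDAY NIGHT</p>`, with the tag and its attribute left intact.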
Compare the results with and without hyphenation:
For more information and documentation, check out the hyphenate repository on GitHub.
(Texts from Charles Dickens’s Great Expectations.)
Update 1: I have now found KissXML to be a better option than TouchXML for my purposes. Also, it supports setting the text nodes out of the box, no patching necessary!
Update 2: Frank Zheng has figured out a simple solution to use hyphenation in Core Text. See his blog post for more information. Thanks, Frank!
About this entry
You’re currently reading “Adding Hyphenation to NSString,” an entry on Tupil Code Blog
- Published: Monday, June 21st, 2010 at 21:21
- Author: Eelco Lempsink
- Category: Code
- Tags: Cocoa, Hunspell, Hyphenation, iPhone, Justification, Objective-C, TouchXML, Typesetting
23:05 UTC
Great post. It worked right away. This is especially useful for German which tends to have longer words than English.
21:32 UTC
Good thing that you’re posting things again :) I’m not doing any iPhone development or anything like it, but I like to read about what issues people developing for mobile phones are facing.
18:13 UTC
Awesome. You made my day.
11:09 UTC
It does work for UITextView and UIWebView.
However, I ran into problems when trying to make it work with Core Text, because I’d like to implement hyphenation and full justification on the iPad.
The “-” doesn’t show up, although the lines do seem to break where the hyphenation suggests.
0:23 UTC
Hi Frank,
Unfortunately, I don’t have any experience with Core Text, so I can’t give you a straight answer. I nosed around in the documentation a bit, and I think you’ll have to render a hyphen yourself when a soft hyphen is the last character of a CTLine (which you can find by looking at the last CTRun).
Also, the code for WebKit is Open Source, and it also seems to use CoreText for the rendering of text. The part that handles (soft) hyphens is in this file: www.opensource.apple.com/source/WebCore/WebCore-528.15/platform/graphics/mac/CoreTextController.cpp.
If you figure out how to make it work, will you share it?
8:21 UTC
Thanks for the suggestions. I have checked the WebCore source code, but it does a lot of its own work there, so I can’t find a clear way to do this.
Now I check all the CTLines, create a new line with “-” at the end and draw the new line instead. However, the new line isn’t justified like the old lines, although the hyphen mark “-” does show up.
I’ll spend more time on this and will share some sample code once I have a solution.
6:45 UTC
“Hyphenation with Core Text on the iPad”
frankzblog.appspot.com/?p=7001