Adding Hyphenation to NSString

Khoi Vinh recently showed that the typesetting in Apple’s iBooks is quite horrendous. One obvious problem is that the text is laid out with justification (which is probably an appropriate decision when typesetting books), but without hyphenation. John Gruber does not approve.

The fact is, there are pretty good algorithms for hyphenation. The Hunspell project has a library that powers OpenOffice.org, among other projects, and extends the algorithm that was implemented for TeX long ago. Time to bring some of that goodness to Cocoa!

I implemented a simple category on NSString that adds UTF-8 soft hyphens to a string. In this post I show how to use it in your project, with some examples.

Setup

First, gather all required files:

  • Get NSString+Hyphenate from GitHub
  • Download the Hyphen library
  • The Hyphen library contains the hyphenation dictionary for US English. If you want to support more languages, you can download more dictionaries from OpenOffice.org

Second, after unzipping, put the code in place.

  • Add the NSString+Hyphenate .h and .m files to your project.
  • To compile the Hyphen library statically into your project, add the hyphen.h, hyphen.c, hnjalloc.h and hnjalloc.c files.

Third, and finally, add the .dic files to the Hyphenate.bundle and add the bundle to your project. Your project source tree should now contain all the necessary files and look something like this.

Usage

The Hyphenate category gives you one method: -stringByHyphenatingWithLocale:. Its usage is straightforward:

    NSString* text = @"It was in the fourth year of my apprenticeship to Joe, and it was a Saturday night.";
    // The locale selects the hyphenation dictionary to use.
    NSLocale* en = [[[NSLocale alloc] initWithLocaleIdentifier:@"en_US"] autorelease];
    // A new string with soft hyphens at the possible break points.
    NSString* hyphenated = [text stringByHyphenatingWithLocale:en];
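
To make "adding soft hyphens" concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file) of what the insertion amounts to at the byte level. It is not the category's actual implementation: the function name insert_soft_hyphens and the breaks flag array are hypothetical, standing in for whatever break points a hyphenation dictionary reports. The one fixed fact is that U+00AD SOFT HYPHEN is encoded in UTF-8 as the two bytes 0xC2 0xAD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* U+00AD SOFT HYPHEN encoded as UTF-8: the two bytes 0xC2 0xAD. */
static const char kSoftHyphen[] = "\xC2\xAD";

/*
 * Hypothetical helper: returns a newly malloc'ed copy of `word` with a soft
 * hyphen inserted after every byte offset i for which breaks[i] is non-zero.
 * `breaks` must have at least strlen(word) entries, and a flagged offset is
 * assumed never to fall inside a multi-byte UTF-8 sequence.
 * The caller owns the result and must free() it. Returns NULL if out of memory.
 */
char *insert_soft_hyphens(const char *word, const unsigned char *breaks)
{
    assert(word != NULL && breaks != NULL);

    size_t len = strlen(word);

    /* Count the break points so the output can be allocated in one go. */
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (breaks[i]) {
            count++;
        }
    }

    /* Each soft hyphen adds two bytes; plus one byte for the terminator. */
    char *out = malloc(len + 2 * count + 1);
    if (out == NULL) {
        return NULL;
    }

    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        out[o++] = word[i];
        if (breaks[i]) {
            out[o++] = kSoftHyphen[0];
            out[o++] = kSoftHyphen[1];
        }
    }
    out[o] = '\0';
    return out;
}
```

For the word "hyphenation" with breaks flagged after the "y" and after the first "n", the result is "hy", a soft hyphen, "phen", a soft hyphen, and "ation": 15 bytes instead of 11. The text looks unchanged until a renderer decides to break a line at one of those points.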

UIKit has limited support for the soft hyphen. This is the result of setting the string above on a UILabel, a UITextView and a UIWebView, respectively.

As you can see, UILabel will simply display all soft hyphens. The behavior of UITextView and UIWebView is more useful: the soft hyphen is shown only when needed and it allows word wrapping.
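
If the same hyphenated string also has to end up in a UILabel, one workaround is to strip the soft hyphens again first. Below is a minimal sketch in plain C, assuming the text is available as a valid, NUL-terminated UTF-8 buffer. The helper strip_soft_hyphens is hypothetical and not part of the category.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: removes every UTF-8 encoded soft hyphen (0xC2 0xAD)
 * from `s` in place and returns the new length in bytes.
 * In valid UTF-8 the byte 0xC2 only ever starts a two-byte sequence, so the
 * pair 0xC2 0xAD cannot be part of any other character.
 */
size_t strip_soft_hyphens(char *s)
{
    assert(s != NULL);

    size_t r = 0; /* read position */
    size_t w = 0; /* write position, never ahead of r */

    while (s[r] != '\0') {
        /* s[r] is not the terminator here, so reading s[r + 1] is safe. */
        if ((unsigned char)s[r] == 0xC2 && (unsigned char)s[r + 1] == 0xAD) {
            r += 2; /* skip the soft hyphen */
        } else {
            s[w++] = s[r++];
        }
    }
    s[w] = '\0';
    return w;
}
```

On the Cocoa side, NSString's -stringByReplacingOccurrencesOfString:withString: should achieve the same thing without dropping down to C.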

Using (X)HTML

Since UITextView is pretty limited in how much you can style and typeset text, a UIWebView will usually be the way to go for displaying nice-looking, hyphenated text.

Obviously, running -stringByHyphenatingWithLocale: on a whole HTML document will not give the required result, since the markup would be hyphenated along with the text. Unfortunately, unless you are willing to use libxml2 directly, your options for working with XML documents on the iPhone are limited.

The best option (as far as I know) is to use TouchXML, a friendly wrapper for libxml2 with an API that mimics Cocoa’s NSXML* classes. However, TouchXML only supports reading XML documents, not creating them. To apply hyphenation, we would need at least a way to modify text nodes. Luckily that turned out to only require a small change to TouchXML, which you can find as a patch in the hyphenate repository.

Next, after patching and setting up TouchXML, we use a simple XPath expression to fetch all the text nodes and modify each.

    CXMLDocument* document = [[[CXMLDocument alloc] init...] autorelease];
 
    NSArray* textNodes = [document nodesForXPath:@"//body//text()" error:NULL];
    for (CXMLNode* node in textNodes) {
       [node setStringValue:[[node stringValue] stringByHyphenatingWithLocale:en]];
    }
 
    NSString* hyphenatedDocument = [[[NSString alloc] 
                                     initWithData:[document XMLData] 
                                     encoding:NSUTF8StringEncoding] autorelease];

Note that this code is a bit of an oversimplification. Whether this simple XPath expression is appropriate for you depends wholly on your actual documents. For example, you may not want to hyphenate text inside elements such as pre or script.

Compare the results with and without hyphenation:

For more information and documentation, check out the hyphenate repository on GitHub.

(Texts from Charles Dickens’s Great Expectations.)

Update 1: I have now found KissXML to be a better option than TouchXML for my purposes. Also, it supports setting the text nodes out of the box, no patching necessary!

Update 2: Frank Zheng has figured out a simple solution for using hyphenation in Core Text. See his blog post for more information. Thanks, Frank!


About this entry

You’re currently reading “Adding Hyphenation to NSString,” an entry on Tupil Code Blog

Published:
Monday, June 21st, 2010 at 21:21
Author:
Eelco Lempsink
Category:
Code
Tags:
Cocoa, Hunspell, Hyphenation, iPhone, Justification, Objective-C, TouchXML, Typesetting

Comments are closed

  1. Andreas  July 19th, 2010
    23:05 UTC

    Great post. It worked right away. This is especially useful for German which tends to have longer words than English.

  2. Reinier Lamers  July 26th, 2010
    21:32 UTC

    Good thing that you’re posting things again :) I’m not doing any iPhone development or anything like it, but I like to read about what issues people developing for mobile phones are facing.

  3. mike  August 6th, 2010
    18:13 UTC

    Awesome. You made my day.

  4. Frank Zheng  August 13th, 2010
    11:09 UTC

    It does work for UITextView and UIWebView.
    However, I ran into problems trying to make it work with Core Text, because I hope to implement hyphenation and full justification on the iPad.
    The “-” doesn’t show up, although the lines do seem to break where the hyphenation suggests.

  5. Eelco Lempsink  August 14th, 2010
    0:23 UTC

    Hi Frank,

    Unfortunately, I don’t have any experience with CoreText, so I can’t give you a straight answer. I nosed around in the documentation a bit, and I think you’ll have to do your own rendering of a hyphen when a soft hyphen is the last character of a CTLine (which you can find by looking at the last CTRun).

    Also, the code for WebKit is Open Source, and it also seems to use CoreText for the rendering of text. The part that handles (soft) hyphens is in this file: www.opensource.apple.com/source/WebCore/WebCore-528.15/platform/graphics/mac/CoreTextController.cpp.

    If you figure out how to make it work, will you share it?

  6. Frank Zheng  August 17th, 2010
    8:21 UTC

    Thanks for the suggestions. I have checked the WebCore source code; it does a lot of its own work here, so I can’t find a clear way to do this.
    For now I check all the CTLines, create a new line with “-” at the end, and draw the new line instead. The new line isn’t justified like the other lines, but the hyphen mark “-” does show up.
    I’ll spend more time on this and will show some sample code once I have a solution.

  7. Frank Zheng  August 28th, 2010
    6:45 UTC

    “Hyphenation with Core Text on the iPad”
    frankzblog.appspot.com/?p=7001

