pdf.js: Rendering PDF with HTML5 and JavaScript

Posted on June 15, 2011 by Andreas

Update: I updated the links again. pdf.js has moved to a new location on GitHub.

Why?

While traveling all the way from California to the Firefox 4 launch parties in Seoul and Taipei, we killed a lot of time brainstorming cool things to do with the web platform. Like many before us, we wondered why nobody had implemented a PDF reader in HTML5/JavaScript. The operations a PDF reader has to perform quickly (rendering text, drawing lines, blitting images) are ones browsers also have to perform quickly, so browsers are already highly optimized for them.

Building an HTML5-based PDF renderer would also answer the question of whether the web platform and in particular canvas and SVG APIs are complete enough to efficiently and faithfully render PDFs.

Displaying PDFs directly in the browser would definitely improve the user's experience. There are literally millions (billions?) of PDFs floating around the web, and on many devices opening a PDF switches to a different application (e.g. Preview on OS X and PDF View on Android). Also, external PDF readers and many plugins don't support important PDF features well, including content links and fetch-as-you-go (HTTP range requests).

External readers and plugins are also forced to reinvent their own user interaction paradigms, meaning for example that users might scroll HTML pages in one way with one set of heuristics in the browser, but a totally different way in an external PDF reader.

It’s important to note that we’re not trying to promote PDF to a first-class web citizen like HTML5 is. Instead we hope that a browser-native PDF renderer written on the web platform allows web technologies to subsume PDF.

Benefits

The traditional approach to rendering PDFs in a browser is to use a native-code plugin: Adobe's own PDF reader, another commercial renderer, or an open source alternative (e.g. poppler). From a security perspective, this enlarges the trusted code base, which is why Google's Chrome browser goes to considerable lengths to sandbox its PDF renderer against code injection attacks. An HTML5-based implementation is completely immune to this class of problems.

Project Status

We have been developing pdf.js in the open (on github.com), albeit quietly, for about a month now. We were waiting on the completion of some major features (Type1 fonts, gradients, etc.) before communicating pdf.js more broadly. We’ve been taken by surprise by the early and intense interest in our work, so we decided to blog and talk about our project earlier than we initially planned.

As part of our project plan, we are initially focused on achieving pixel-perfect rendering of a single PDF: a 2009 paper on Trace Compilation that we submitted to the ACM SIGPLAN PLDI conference. Just as the TraceMonkey work described in that paper led the way for JavaScript JITs, we hope pdf.js opens the door to implementing legacy formats on top of the web platform.

If you want to see a demo of pdf.js, click on this link. There are still glitches and rendering artifacts, but you will get the picture. We are still missing Type1 PostScript fonts, which Vivien Nicolas is working on.

Along the way, we had to add some new interfaces to the HTML5 canvas element, and figure out how to implement some difficult features of the PDF spec in JavaScript. See Chris’s post for a general technological overview, and Shaon’s post for details on rendering “shading patterns”.

What's next?

We intend to use pdf.js to render PDFs “natively”, within Firefox itself. Our most immediate goal is to implement the most commonly used PDF features so we can render a large majority of the PDFs found on the web. We believe we can reach that point in less than 3 months (the entire code so far is less than one month old, and it already renders a large set of PDF features).

Initially we will make a Firefox extension available to interested users that enables inline PDF rendering using pdf.js, but our ultimate goal is of course to ship pdf.js with Firefox. This will substantially improve both usability and security for our users: pdf.js uses only safe web languages and doesn't contain any native code that attackers could exploit.

Open Source

We want pdf.js to be a community-driven and community-governed open-source project. We'll use it for Firefox, but we think there are many cool applications for it. We would love to see it embedded in other browsers or web applications; because it's written only in standards-compliant web technologies, the code will run in any compliant browser. We are licensing pdf.js under a very liberal 3-clause BSD license and we welcome external contributors. We are looking forward to your ideas or code to make pdf.js better! Take a look at our GitHub repository and our wiki, or talk to us on IRC in #pdfjs.

Chris Jones and Andreas Gal (and the pdf.js team)

This entry was posted in Mozilla.

196 Responses to pdf.js: Rendering PDF with HTML5 and JavaScript

  1. kdkd says:
    June 15, 2011 at 8:22 pm

    OK This is very cool.

    I'm evaluating the potential of client side JavaScript as a natural language processing platform, and what would be especially useful would be to have client side PDF-to-text functionality. I think that (and the general lack of JavaScript libraries for NLP) are the two main blocks on the road to getting this stuff started. My JavaScript chops aren't terribly good, but as a Perl hacker of minor accomplishments, my ability to tape existing libraries together is second to none :). On the other hand, I am trying to port some of the good Perl NLP tools over to JS right now.

  2. Jeff Wheeler says:
    June 15, 2011 at 9:34 pm

    It’s been done before: blogs.gnome.org/alexl/2011/03/15/gtk-html-backend-update/

    Watch for the evince part of the demo.

    • Aleena says:
      June 15, 2011 at 11:56 pm

      That is not the same thing. That is using an X11 windowing system running a PDF application to render to an HTML 5 canvas, rather than to the screen. It requires significant server-side resources and is inappropriate given the goals of this project. Broadway is an intriguing project in its own right, but cannot be included by default into any browser.

    • Steve says:
      July 1, 2011 at 1:44 pm

      A lot of people have misinterpreted that same post. Various other erroneous assertions have been made about how much of a "game changer" it is.

      It's a very interesting project, but its possible uses are limited by performance. For things like remote access or testing apps before downloading it's awesome. However, it has very little relevance to client-side rendering of PDF files.

  3. WebArtisan says:
    June 15, 2011 at 11:56 pm

    Interesting idea and project, I’ll follow its evolutions.
    I just find it annoying that the PDF document renders as an image in the canvas element. It doesn't allow plain-text search or selection/copy/paste of text.

    • Andreas says:
      June 15, 2011 at 11:59 pm

      Canvas is our first backend. We are planning an SVG backend which will allow search, text selection and accessibility (it's also a bit slower, so we will probably render canvas and then build an SVG DOM on demand). Chris' blog post explains this in some detail.

      • Steve Lee says:
        September 30, 2011 at 9:35 am

        And also provide better accessibility – though I just heard mo are working on remaining Canvas a11y bugs – ftw

  4. Dean Clatworthy says:
    June 16, 2011 at 12:58 am

    I understand the security benefits here, but surely shipping a native PDF reader within firefox would allow quicker rendering (which is what I want as a user).

    • Andreas says:
      June 16, 2011 at 8:50 am

      Actually I am not so sure about that. Canvas is highly optimized (using OpenGL and GPU acceleration), whereas most plugins use CPU-based pixel pushing. The first page of the document currently renders in 60ms on my machine. Acrobat Reader takes almost 2 seconds to open. So I think we can come up with something pretty competitive here when it comes to performance.

      • joe@drew.ca says:
        June 16, 2011 at 11:03 am

        Canvas doesn’t use OpenGL for anything but composition into the page (yet). It only uses GPU acceleration when Firefox uses Direct2D, i.e., on Windows Vista and 7.

  5. Johan Geerts says:
    June 16, 2011 at 12:58 am

    Nothing is being shown on the demo page with Safari, just the grey background. When I test with chrome it’s ok.
    Very nice idea!

    • Andreas says:
      June 16, 2011 at 8:49 am

      Yes, current Safari seems to not support typed arrays (trunk does I believe).

      • Mark says:
        June 22, 2011 at 7:40 am

        Same problem in the new Safari on 10.7 Lion DP4.

  6. Ecir Hana says:
    June 16, 2011 at 1:00 am

    Hello,
    I cannot say it strong enough how important this is. Thank you.
    And as practically all print/press revolves around PDF/X-1a:2001, please, are you considering supporting it?

    • Andreas says:
      June 16, 2011 at 8:48 am

      We are aiming to support all frequently used PDF features. I haven't looked at the PDF/X spec yet, but it's definitely something that is within the scope of our project.

      • Ecir Hana says:
        June 16, 2011 at 12:18 pm

        Cool!

        PDF/X-1a is PDF 1.3 (i.e. no transparency), no RGB (just CMYK, spot colors and gray-scale) and mandatory ICC profiles and fully embedded fonts (i.e. no substitutions).

        In other words, the aim of "1a" is for a print to come out as close to the author's intent as possible (e.g.: no color shifts and no wrongly substituted or dropped characters).
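Since canvas works in RGB, a canvas backend has to convert CMYK colors before painting them. A minimal sketch, assuming the simple profile-free formula red = 1 - min(1, cyan + black) (and likewise for green and blue). It ignores the ICC profiles that PDF/X-1a mandates, so it only approximates print colors, and `cmykToRgb` is a hypothetical helper, not pdf.js code.

```javascript
// Naive CMYK -> RGB conversion with no color management (assumption:
// no ICC profile is applied). All inputs are in [0, 1]. Returns a CSS
// color string that can be assigned to a canvas fillStyle/strokeStyle.
function cmykToRgb(c, m, y, k) {
  // Each RGB channel is reduced by its own ink plus the black ink.
  const channel = (ink) => Math.round(255 * (1 - Math.min(1, ink + k)));
  return `rgb(${channel(c)}, ${channel(m)}, ${channel(y)})`;
}
```

For example, `ctx.fillStyle = cmykToRgb(0.2, 0.8, 0, 0.1);` before a fill. A color-managed renderer would instead map CMYK through the document's embedded ICC profile.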

  7. Timothy Chien says:
    June 16, 2011 at 1:44 am

    Hey, This is Tim from Mozilla Taiwan.
    Rendering PDF in Javascript is extremely interesting, and I am impressed that you guys can make things work within a month. Should invite you guys to Asia more :P

    • Andreas says:
      June 16, 2011 at 8:46 am

      Hey Tim. Yes, our Asia trip was clearly very fruitful :) See you at the next Firefox community party, maybe :)

  8. Pingback: pdf.js - renders PDF documents using HTML 5 and Javascript | WorldIT

  9. D. says:
    June 16, 2011 at 2:59 am

    Sorry for the stupid question, but how (in one high-level sentence) are you reading the binary PDF file in HTML5+JS?

    • Andreas says:
      June 16, 2011 at 8:44 am

      We are using XMLHttpRequest and have it return an array of bytes which we then parse in JS.
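A minimal sketch of that approach, assuming a browser that supports typed arrays and `responseType = "arraybuffer"` on `XMLHttpRequest`: the raw bytes arrive as an `ArrayBuffer`, and the parser reads them through a `Uint8Array`. The names `fetchPdfBytes` and `readPdfVersion` are hypothetical, and checking the `%PDF-` header is only the very first step of real PDF parsing.

```javascript
// Hypothetical helper: download a file as raw bytes (browser only).
function fetchPdfBytes(url, onBytes) {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url);
  xhr.responseType = "arraybuffer"; // binary data instead of a text string
  xhr.onload = () => {
    if (xhr.status === 200) {
      onBytes(new Uint8Array(xhr.response)); // byte-level view for the parser
    }
  };
  xhr.send();
}

// Hypothetical first parsing step: a PDF starts with a header like "%PDF-1.4".
// Returns the version string, or null if the header is missing.
function readPdfVersion(bytes) {
  const prefix = "%PDF-";
  for (let i = 0; i < prefix.length; i++) {
    if (bytes[i] !== prefix.charCodeAt(i)) return null;
  }
  // Collect the digits and dots that follow the prefix.
  let version = "";
  for (let i = prefix.length; i < bytes.length; i++) {
    const ch = String.fromCharCode(bytes[i]);
    if (!/[0-9.]/.test(ch)) break;
    version += ch;
  }
  return version || null;
}
```

Usage, with a hypothetical file name: `fetchPdfBytes("paper.pdf", (bytes) => console.log(readPdfVersion(bytes)));`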

  10. Mikkel says:
    June 16, 2011 at 3:39 am

    “Building an HTML5-based PDF renderer would also answer the question of whether the web platform and in particular canvas and SVG APIs are complete enough to efficiently and faithfully render PDFs.”

    Hasn't Scribd already answered this question? True, we don't know how they do it, since their conversion runs server-side, but they do output something that is HTML5, according to en.wikipedia.org/wiki/Scribd#Technology.

    • Andreas says:
      June 16, 2011 at 8:43 am

      crocodoc.com/ seems to do server-side PDF conversion and HTML rendering as well. We are looking for maximum visual quality, anti-aliasing, sub-pixel font rendering etc. We already ran into a few missing APIs in canvas (specific gradients, dashed lines), so we do expect to push the envelope on canvas and we will propose some small additions to the standard.
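Without native dash support in canvas (current browsers do provide `setLineDash`), dashed strokes can be emulated by splitting a line into its "on" pieces and stroking only those. A minimal sketch for one straight line, assuming a PDF-style dash array of non-negative, alternating on/off lengths and ignoring the dash phase and curved paths. `dashSegments` and `strokeDashedLine` are hypothetical names, not pdf.js code.

```javascript
// Split the line (x0,y0)-(x1,y1) into the "on" pieces of a dash pattern.
// dashArray alternates on/off lengths, e.g. [6, 3] = draw 6, skip 3.
// An odd-length array such as [3] behaves as [3, 3], because on/off is
// decided by the step count, not by the array index.
function dashSegments(x0, y0, x1, y1, dashArray) {
  const length = Math.hypot(x1 - x0, y1 - y0);
  const total = dashArray.reduce((a, b) => a + b, 0);
  if (length === 0 || total <= 0) return [];
  const ux = (x1 - x0) / length; // unit direction vector
  const uy = (y1 - y0) / length;
  const segments = [];
  let pos = 0;
  let i = 0;
  while (pos < length) {
    const step = Math.min(dashArray[i % dashArray.length], length - pos);
    if (i % 2 === 0 && step > 0) { // even steps are "on"
      segments.push([x0 + ux * pos, y0 + uy * pos,
                     x0 + ux * (pos + step), y0 + uy * (pos + step)]);
    }
    pos += step;
    i++;
  }
  return segments;
}

// Stroke the "on" pieces with ordinary canvas path calls.
function strokeDashedLine(ctx, x0, y0, x1, y1, dashArray) {
  ctx.beginPath();
  for (const [ax, ay, bx, by] of dashSegments(x0, y0, x1, y1, dashArray)) {
    ctx.moveTo(ax, ay);
    ctx.lineTo(bx, by);
  }
  ctx.stroke();
}
```

For example, `strokeDashedLine(ctx, 0, 0, 10, 0, [4, 2])` strokes the pieces from x = 0 to 4 and from x = 6 to 10.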

  11. Pingback: Mozilla planning to render PDF with HTML5 and JavaScript in Firefox | ZDNet

  12. Pingback: Mozilla planning to use HTML5 and JavaScript to render PDFs in Firefox

  13. Pingback: UK’s top mobile operators announce mobile wallet partnership « V E X E D

  14. Stuart Axon says:
    June 16, 2011 at 6:11 am

    Cool … I guess stuff like the Type1 font support will be available as a separate library?

    • Andreas says:
      June 16, 2011 at 8:37 am

      Yes. The Type1 support seems worthy of a stand-alone library.
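One piece any such Type1 library needs is decryption: a Type 1 font program keeps most of its data in an encrypted "eexec" section, and its charstrings are encrypted again. A minimal sketch of the decryption scheme described in Adobe's Type 1 font format documentation, assuming key 55665 for the eexec section, key 4330 for charstrings, and 4 leading random bytes to discard (charstrings can specify a different count via lenIV). `decryptType1` is a hypothetical name, not pdf.js code.

```javascript
// Type 1 font decryption. key: 55665 for eexec data, 4330 for charstrings.
// discard: number of leading random bytes to drop (4 by default).
function decryptType1(cipherBytes, key, discard = 4) {
  const c1 = 52845;
  const c2 = 22719;
  let r = key;
  const plain = new Uint8Array(cipherBytes.length);
  for (let i = 0; i < cipherBytes.length; i++) {
    const c = cipherBytes[i];
    plain[i] = c ^ (r >> 8);           // decrypt one byte with the key's high byte
    r = ((c + r) * c1 + c2) & 0xFFFF;  // advance the key using the cipher byte
  }
  // The first `discard` decrypted bytes are random padding.
  return plain.subarray(Math.min(discard, plain.length));
}
```

Usage, with hypothetical inputs: `decryptType1(eexecBytes, 55665)` for the font program, then `decryptType1(charstringBytes, 4330)` for each charstring it contains.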

  15. Drew Vogel says: