introducing esprima: blazing-fast javascript parser

December 13th, 2011 Tags: coding, esprima, javascript, parser

In a nutshell, Esprima (esprima.org) is a JavaScript parser written in pure JavaScript. In the near future, it will expand into something even cooler, but as of now it’s just a parser. It uses the common recursive descent approach. The main parsing routine is not machine generated; everything is written by hand. The output of the parser is a syntax tree in JSON, in a format compatible with the Mozilla Parser API.
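
To give a feel for the recursive descent approach and the shape of the resulting tree, here is a minimal sketch. It is not Esprima's code: the function `parseExpression` and its helpers are made up for this illustration, and it only handles integers, brackets, and the four arithmetic operators. The node shapes (`Literal`, `BinaryExpression`) follow the Mozilla Parser API.

```javascript
// Illustration only, NOT Esprima's implementation.
// Parses expressions such as "1 + 2 * 3" into Mozilla Parser API style nodes.
function parseExpression(source) {
    var index = 0;

    function skipSpaces() {
        while (index < source.length && source.charAt(index) === ' ') {
            index += 1;
        }
    }

    // Primary: a decimal integer or a bracketed expression.
    function parsePrimary() {
        var start, expr;
        skipSpaces();
        if (source.charAt(index) === '(') {
            index += 1;
            expr = parseAdditive();
            skipSpaces();
            if (source.charAt(index) !== ')') {
                throw new Error('Expected ) at position ' + index);
            }
            index += 1;
            return expr;
        }
        start = index;
        while (index < source.length && /[0-9]/.test(source.charAt(index))) {
            index += 1;
        }
        if (start === index) {
            throw new Error('Unexpected token at position ' + index);
        }
        return { type: 'Literal', value: Number(source.slice(start, index)) };
    }

    // One left-associative precedence level: operand (operator operand)*
    function parseBinary(operators, parseOperand) {
        var left = parseOperand(), operator;
        skipSpaces();
        while (index < source.length &&
                operators.indexOf(source.charAt(index)) >= 0) {
            operator = source.charAt(index);
            index += 1;
            left = {
                type: 'BinaryExpression',
                operator: operator,
                left: left,
                right: parseOperand()
            };
            skipSpaces();
        }
        return left;
    }

    // Each precedence level is one function that calls the next tighter level.
    function parseMultiplicative() {
        return parseBinary('*/', parsePrimary);
    }

    function parseAdditive() {
        return parseBinary('+-', parseMultiplicative);
    }

    var tree = parseAdditive();
    skipSpaces();
    if (index < source.length) {
        throw new Error('Unexpected token at position ' + index);
    }
    return tree;
}

// The tree is a plain object, so it serializes to JSON directly.
console.log(JSON.stringify(parseExpression('1 + 2 * 3'), null, 2));
```

Because multiplication is parsed one level below addition, `2 * 3` ends up as the right child of the `+` node without any explicit precedence table.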

The code is designed to be educational (no funky obfuscated tricks only a JavaScript ninja can decipher), self-explanatory (the terminology matches the official 258-page specification), and fast (it can tear apart the jQuery source code, and not the minified version, in less than 0.1 sec). It’s always challenging to find the sweet spot that satisfies all three objectives, though I hope Esprima hits an optimal compromise.

As with any complex parser, unit testing is an integral part of the development. To ensure faithful compatibility with the Mozilla Parser API, hundreds of its tests have been imported as well. All in all, there are over a thousand tests. In addition, there is a benchmarks suite consisting of the most common JavaScript libraries out there. The performance of various web browsers running the benchmarks suite is depicted in the following chart (shorter is better). The test machine is a late-2010 iMac with a 3 GHz Intel Core i3.

[Chart: running time of the benchmarks suite in various web browsers]

If you think it’s not fast enough, wait for the improvements being made to the major JavaScript engines out there. Preliminary tests showed that the V8 engine in Chrome 17 (dev channel) executes the benchmarks suite 1.7x faster than Chrome 15. Related to that, JavaScriptCore in the WebKit nightly improves the benchmark running time by 25% (and it keeps getting faster). In addition, Firefox 9 will feature type inference, which shows a 65% performance win when running the said benchmarks suite.

What about mobile devices? As expected, they are rather slower at this kind of job, limited pretty much by CPU power. Some running times for the benchmarks suite: 5.8 sec for Amazon Kindle Fire, 7.9 sec for Apple iPad 2, 12.8 sec for Nexus S, and 17.9 sec for Nokia N9.

Since Esprima is written in JavaScript, it runs wherever there is a decent implementation of JavaScript. Supported browsers are (among others) IE 8+, Firefox 3.5+, Safari 4+, Chrome 7+, and Opera 10.5+. Esprima can also be used in Node.js applications by installing the esprima package using npm.

The best way to try Esprima is right in the browser via the online syntax parser demo. Type in your code, and voila! Esprima will show you the corresponding syntax tree almost right away. There is also the operator precedence demo, inspired by a similar earlier demo. Besides checking whether one expression is equivalent to another, the example also rewrites your expression as if you had written it using brackets to enforce the intended precedence, as illustrated in the following screenshot:

[Screenshot: the operator precedence demo]
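
The bracket rewriting can be sketched as a small tree walk. This is only a guess at one way to do it, not the demo's actual source: `bracketize` and `sameStructure` are names invented here, and the input tree is written by hand in the Mozilla Parser API shape rather than produced by a parser.

```javascript
// Illustration only: turns an expression tree into a string in which every
// binary operation is wrapped in brackets, making the precedence explicit.
function bracketize(node) {
    switch (node.type) {
    case 'Literal':
        return String(node.value);
    case 'Identifier':
        return node.name;
    case 'BinaryExpression':
    case 'LogicalExpression':
        return '(' + bracketize(node.left) + ' ' + node.operator + ' ' +
            bracketize(node.right) + ')';
    default:
        throw new Error('Unsupported node type: ' + node.type);
    }
}

// Two expressions group the same way if their bracketed forms are identical.
function sameStructure(a, b) {
    return bracketize(a) === bracketize(b);
}

// Hand-written tree for: a + b * c
var tree = {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Identifier', name: 'a' },
    right: {
        type: 'BinaryExpression',
        operator: '*',
        left: { type: 'Identifier', name: 'b' },
        right: { type: 'Identifier', name: 'c' }
    }
};

console.log(bracketize(tree)); // prints "(a + (b * c))"
```

Since brackets in the source only affect how the tree is built, `a + b * c` and `a + (b * c)` yield the same tree and therefore the same bracketed string, while `(a + b) * c` does not.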

Compared to other parsers, Esprima is one of the fastest. There is a whole speed comparison page which puts Esprima head-to-head against parse-js (famously known as part of UglifyJS), ZeParser, and Narcissus. Since Esprima does not yet output location information (see issue #6), as ZeParser and Narcissus do, a pure speed benchmark is only fair between Esprima and parse-js. Here is the result, tested with different browsers (stable versions). Still not impressed? With the upcoming Chrome 17, Esprima will actually be 2x faster than parse-js.

[Chart: speed comparison of Esprima and parse-js in different browsers]

So which parser should you pick? Narcissus has been around for a while, so its stability and correctness are well tested. It also supports various JavaScript extensions, as well as features from ES.next. Neither ZeParser nor parse-js is exactly new anymore, so they are more battle-hardened than Esprima. Since the excellent minifier UglifyJS is based on parse-js, I would not be surprised if there are tons of peculiar JavaScript syntax cases which parse-js handles really well. At the end of the day, I still hope that, as the new kid on the block, Esprima is attractive enough since it’s readable, easy to follow, heavily unit tested, and yet carries out the parsing task at blazing speed. Thus, if you feel adventurous, give Esprima a try!

Besides parsing code, Esprima can also optionally collect comments (see issue #71). Since this involves some extra steps, expect a minor performance penalty if you enable it. Once the comments are extracted, a bit of additional cross-referencing will allow you to associate certain comment blocks with parts of the code. This is extremely valuable for an automatic documentation tool.
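
As a rough sketch of what such cross-referencing could look like, assume each extracted comment and each code element of interest carries a `range` of `[start, end]` character offsets into the source (end exclusive). That data shape, and the function `attachLeadingComments`, are assumptions made for this illustration only and do not describe Esprima's actual output.

```javascript
// Illustration only: pairs each code element with the comment that directly
// precedes it, i.e. with nothing but whitespace in between.
// Assumed shapes: comment = { value, range }, node = { name, range }.
function attachLeadingComments(source, comments, nodes) {
    return nodes.map(function (node) {
        var doc = null;
        comments.forEach(function (comment) {
            var gap;
            if (comment.range[1] <= node.range[0]) {
                gap = source.slice(comment.range[1], node.range[0]);
                if (/^\s*$/.test(gap)) {
                    doc = comment.value;
                }
            }
        });
        return { name: node.name, comment: doc };
    });
}

// Build a small source and compute the ranges from it, so the offsets in
// this example cannot drift out of sync with the text.
var commentText = '/* Adds two numbers. */';
var addText = 'function add(a, b) { return a + b; }';
var noopText = 'function noop() {}';
var source = commentText + '\n' + addText + '\n' + noopText;

function rangeOf(text) {
    var start = source.indexOf(text);
    return [start, start + text.length];
}

var comments = [
    { value: ' Adds two numbers. ', range: rangeOf(commentText) }
];
var nodes = [
    { name: 'add', range: rangeOf(addText) },
    { name: 'noop', range: rangeOf(noopText) }
];

var result = attachLeadingComments(source, comments, nodes);
console.log(JSON.stringify(result, null, 2));
```

Here `add` picks up the comment right above it, while `noop` gets `null` because other code sits between it and the comment. A documentation tool could build on the same idea.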

To keep an eye on Esprima development, go to its project page, watch the issue tracker for future plans, and join the discussion on the mailing list.

Get the code and express yourself!

P.S.: Special thanks to Thomas Aylott, Yusuke Suzuki, and Axel Rauschmayer for the useful initial discussion, suggestions, and feedback.

    • MySchizoBuddy

      Can this be used as a learning tool for creating a parser for your own programming language?

      • ariya

        An easier start would be math expression parser (see the Related Posts).

    • NiKo

      Nice! I’ve just added test file syntax checking to CasperJS thanks to esprima, it works pretty well.

    • www.mysparebrain.com/ Tim M

      Nice one – I’ve worked on UglifyJS (github id schmerg) so it’s nice to see both a new fresh implementation (makes testing easier when there are more implementations) and the combination of pride-in-your-new-thing and due-respect-for-others.

      The uglifyjs module has some nice walker routines that make it easy to write single routines that will walk the parse tree performing various actions – this makes sense as uglifyjs is fundamentally about modifying the parse tree, but it’s something worth doing nicely in a parser.

      I see your parser discards comments – I presume this is in the tokenisation phase. Uglifyjs does this but for some features people want to put special hints for optimisations in comments. I know having all comments present in full in the parse tree would be a pain, but if you’re thinking of adding location information to the parse info, it might be an idea to make it easy to do things like “look for the comment before a statement” by keying the location back to a raw token stream index or similar.

      • ariya

        Thanks for the feedback! As for comments and location info, check the issue tracker. They are being worked on (to a certain extent). The use case that you mentioned will be easily supported. Suggestions are welcome.
