Zetafleet

Blog

Unified codebases with Dojo, Node, and RequireJS: the holy grail of DRY code

posted 28 Feb 2011 23:40 CST

Tagged: dojo, dry, howto, javascript, node and requirejs

Update 2011-05-10: The code referenced in this post is now out of date. Please check Dojo Boilerplate’s AMD branch for up-to-date code.

For a large part of my Web development career, I’ve felt unease at the fact that every single Web app I’ve worked on has required two completely separate codebases: one for the server, usually written in a language like PHP, and one on the client, written in JavaScript. Over the years, thousands of frameworks have been written to try to streamline and automate this process, from crappy little AJAX page loaders to incredibly complex Java-to-JavaScript translators—but no matter how good any of these frameworks was, you were always writing to the language running on the server first, even though that language was (and likely always will be) incompatible with what runs in browsers on the client.

For the first time in over a decade, I’m happy to say that I believe this discontinuity has finally been solved in a way that will fundamentally change how all Web applications are written in the future. I say this because yesterday, for the first time ever, I succeeded in running a single JavaScript codebase seamlessly on both the client and server using a simple combination of Dojo Toolkit 1.6, Node.js 0.4.1, and RequireJS 0.24.

I’m sure this isn’t the first time someone has accomplished this, but the robustness of Dojo’s tools, the performance of Node, and the simplicity of RequireJS all combined in a way that made me feel that the time has finally come for Web development to shift to pure, seamless, fully-integrated JavaScript. Unlike past experiments that often involved gross hacks or the use of barely functional pre-alpha software, I would feel confident in deploying this solution to a production environment today. I hope this article gets others to start exploring and feeling excited about these technologies and how they’re changing Web development as we know it.

Dojo on Node.js

In Dojo 1.6, the Dojo team spent a huge amount of effort adding full support for the CommonJS AMD specification across all Dojo and Dijit modules; this change is the key that really puts Dojo worlds above any other toolkit when it comes to creating single codebase applications. Prior to this, Dojo was already the best RIA toolkit on the market, but running it in Node meant some ugly hacking to circumvent Node’s module sandboxing. With AMD, these hacks are no longer necessary.

Unfortunately, a tiny amount of modification is necessary in order to get Dojo 1.6.0 running in Node 0.4. Luckily, the changes amount to two one-liners and an alternative package main module, and these fixes are already scheduled to be added to the next version of Dojo.

RequireJS on Node.js

Node’s default module system is designed around the idea that dependencies can be loaded synchronously, which makes it a terrible choice for writing code modules that will run on the client. Swap it out with RequireJS—which still allows you to seamlessly use Node packages on the server-side—and writing highly modular code for both client and server suddenly becomes trivial. Roughly speaking, RequireJS provides a 1:1 mapping between filenames and defined modules, so if you had a component named my/app/FirstClass, you would just put it in my/app/FirstClass.js and RequireJS would load the definition from that file. Easy-peasy!
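To make the filename convention concrete, here is a minimal sketch of the mapping. The moduleIdToPath helper is a made-up name for illustration only; it is not RequireJS’s real resolver, and it ignores the paths and packages configuration entirely.

```javascript
// Hypothetical helper (not part of RequireJS): shows the rough 1:1 mapping
// between a module ID and the file it would be loaded from.
function moduleIdToPath(baseUrl, moduleId) {
  // Make sure the base URL ends with exactly one slash before joining
  var base = baseUrl.charAt(baseUrl.length - 1) === '/' ? baseUrl : baseUrl + '/';
  return base + moduleId + '.js';
}

console.log(moduleIdToPath('js/', 'my/app/FirstClass'));
```

With a baseUrl of js/, the module my/app/FirstClass resolves to js/my/app/FirstClass.js, which is all the convention really amounts to.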

RequireJS also has support for basic loading of text files as dependencies and simple Object dictionaries for i18n. It also provides a highly robust build system, which means that when you’re ready to deploy to production, it’ll concatenate and minify all your code, intern external text, minify your CSS, and even let you split your code into arbitrary collections of modules that can be lazy-loaded on demand. Amazing!

Putting it all together

In order to run a single codebase application using Dojo, RequireJS, and Node, you really only need four special things (and they really aren’t that special):

A stub HTML file that loads RequireJS (index.html)
A shell script that invokes Node with RequireJS (server.sh)
A stub script that configures RequireJS for each environment and loads your main application code (js/my/_base.js)
Some sort of logic within your application that branches depending upon the environment for any modules that need to load only in one place or the other (js/my/app.js)

In the examples given below, the application structure looks roughly like this:

app/
- index.html
- server.sh
- css/
- js/
  - lib/
    - requirejs/
    - dojo/
    - dijit/
  - my/
    - _base.js
    - app.js
    - nls/
      - app.js

There’s no particular requirement that the application be structured in this way; it’s just how I personally like to structure mine today.

index.html

A very basic HTML file; this loads RequireJS, then loads the application stub. Everything else is handled by Dojo/Dijit and RequireJS from this point on.

<!DOCTYPE html>
<html>
  <head>
    <title>my app</title>
    <meta charset="utf-8">
    <link rel="stylesheet" href="css/app.css">
  </head>
  <body>
    <!-- this will change to a single, minified
         JS file once you are in production -->
    <script src="js/lib/requirejs/require.js"></script>
    <script src="js/my/_base.js"></script>
  </body>
</html>

server.sh

A very basic shell script. r.js is one of the precompiled versions of RequireJS available on the download page and doesn’t come with the source download; if you do download the source version, use x.js from the bin directory, which serves the same purpose.

#!/bin/bash

JSENGINE=/path/to/node
REQUIREJS=js/lib/requirejs/r.js
REQUIREDIR=$(dirname "$REQUIREJS")

"$JSENGINE" "$REQUIREJS" "$REQUIREDIR" js/my/_base.js

js/my/_base.js

This configures Dojo by creating a global dojoConfig object, configures RequireJS, and kicks off loading of the main application code. Thanks to neonstalwart for his dojo-requirejs-template, which was the basis for this configuration code.

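// intentionally an implicit global (no var) so that
// Dojo can find it in both the browser and Node
dojoConfig = {
  isDebug: true
};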

require({
  baseUrl: 'js/',
  // set the paths to our library packages
  packages: [
    {
      name: 'dojo',
      location: 'lib/dojo',
      // these are loaded from js/lib/dojo/lib.
      // lib/main-commonjs is the alternative package
      // main module from ticket #12357;
      // you must place it there yourself (it does not
      // come with dojo yet)
      main: typeof window !== "undefined" ?
        'lib/main-browser' :
        'lib/main-commonjs',
      lib: '.'
    },
    {
      name: 'dijit',
      location: 'lib/dijit',
      main: 'lib/main',
      lib: '.'
    }
  ],
  // set the path for the require plugins—text, i18n, etc.
  paths: {
    require: 'lib/requirejs/require'
  }
});

// load the app!
require(['my/app']);

js/my/app.js

This is our application’s main object. Really, you could load whatever you wanted from the stub, but this gives a very basic idea of what kinds of simple switching can be done to run different code depending upon the environment.

// requires dojo and our i18n dictionary
define(['dojo', 'i18n!my/nls/app'], function (dojo, i18n) {
  var isBrowser = dojo.isBrowser,
    mode = isBrowser ? 'Client' : 'Server',

    // our main application object; anything else that
    // requires 'my/app' in the future will receive this
    // object (because it’s returned at the end of this
    // function); all other defined modules work the
    // same way: the callback is invoked once and
    // the returned value is cached by RequireJS
    app = {
      onReady: function () {
        console.log(i18n.helloWorld);
      }
    };

  // loads either Client or Server class for Db and
  // Conduit depending upon if we are on the
  // client or server
  require(['my/db/' + mode, 'my/conduit/' + mode,
    'my/Baz'], function (Db, Conduit, Baz) {

    app.db = new Db();
    app.conduit = new Conduit();

    // this module works exactly the same on
    // both client and server, no extra code
    // necessary! NICE!
    app.baz = new Baz();

    // app has loaded, fire anything that has
    // connected to app.onReady!
    app.onReady();
  });

  return app;
});

I hope this article has raised your interest in Dojo, Node, and RequireJS, and that you’ll try building a unified codebase application of your own sometime soon. The ability to finally DRY out code between the client and server is here, and I’m incredibly excited about what it means for my applications and yours.

As a final note, if you’re a Web application developer and you’ve never used Dojo before (I see you there), you’re doing yourself a severe disservice if you are doing anything beyond very simple progressive enhancement of static pages. My colleague, Rebecca Murphey, will be doing a series of screencasts shortly on some of its killer features that make it the best choice for RIA development; I highly encourage you to watch her blog and check those out once they are available.

Google Chrome's Heap Profiler and Memory Timeline, explained

posted 27 Jan 2011 12:36 CST

Tagged: chrome, garbage collection, heap profiler, heap snapshot, javascript, memory leak, memory timeline and v8

(with thanks to Mikhail Naganov for his feedback on the Developer Tools mailing list)

Chrome’s Developer Tools contain some useful features for inspecting memory usage of a given page (and its change over time), but the documentation for these features is a bit sparse—and, if you are unfamiliar with these sorts of tools and what they do, their output can seem undecipherable. Hopefully this brief post helps explain these features and what they can do for you.

The Memory Timeline

The memory timeline gives you an overview of memory usage over time. This makes it very easy to see how much memory various parts of your application use and can provide a strong visual cue if your application is leaking memory over time. The blue area represents the amount of memory in use by your app at a given time; the remaining white area represents the total amount of allocated memory. The number in the top-left corner (as of Chrome 9) indicates the total amount of allocated (available) memory, not the total amount of used memory. In nearly all cases, your concern lies only with the amount of used memory.

You can see exactly how much memory was in use at the end of a particular event by hovering over the name of the record in the list of records until a bubble appears. A record is also added any time V8’s garbage collector runs, telling us how much memory was reclaimed. It is important to note that V8’s garbage collector is incredibly complex and may take up to 5 runs before it garbage collects an unused object, so don’t assume that everything that can be collected has been collected when you see that GC has occurred.

The Heap Profiler

The heap profiler is a somewhat more complicated tool than the memory timeline: while the memory timeline shows you how much memory is in use over time, the heap profiler gives you an overview of all of the objects in memory at the moment the snapshot was taken. This allows you to drill down and see exactly what kinds of objects are responsible for using memory at a given point in time.

Before going over the main table of the heap snapshot, I’ll briefly explain the difference between “code” and “objects” in the two pills at the bottom of the window.

Code objects are bits of JIT-compiled code that get stored in memory by the JavaScript engine. There are three principal types of code objects: Scripts, which are objects that contain functionless code executed directly within a <script> tag (e.g. <script>var foo = 42;</script>); SFIs, which are objects that contain the actual code for a function; and Functions, which are essentially wrappers that contain a pointer to an SFI (for the code) and information about the function’s lexical scope (which, when combined, form the basis of a complete function call).
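A small sketch may help keep the three types apart. Which internal code object V8 creates for each piece is based on the description above and can’t be observed from script; the names answer and makeGreeter are made up for illustration.

```javascript
// Top-level code like this sits directly in the script, outside any function.
var answer = 42;

// Hypothetical factory: every call creates a new closure from the same source.
function makeGreeter(name) {
  return function greet() {
    return 'Hello, ' + name + '!';
  };
}

var greetAlice = makeGreeter('Alice');
var greetBob = makeGreeter('Bob');

// Two distinct function objects, each with its own captured `name`...
console.log(greetAlice !== greetBob);
// ...both created from exactly the same function source text.
console.log(greetAlice.toString() === greetBob.toString());
```

Going by the explanation above, greetAlice and greetBob would be two Functions with different lexical scopes pointing at one shared SFI, while the top-level assignment would belong to a Script.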

Compiled code objects are separated from all other Objects, which are non-executable data stored in memory: Objects, Arrays, Strings, and so on. (Uncompiled script source code gets stored here as well, as String data.) This separation is mostly irrelevant to JavaScript developers as there is no real concept of executable vs non-executable memory in JavaScript itself, but it is important to know that “code” here refers to JIT-compiled code, not source code.

In Chrome 9, the minimum number of code and object references in any page is around 5500 (about 580kB of used heap space). This is space that is taken up by native objects that you’d expect to see in any ECMAScript environment—RegExp, Date, Math, and so on. Access the window object in your code and the reference count jumps up to around 9000 (815kB). These numbers can be used as a rough baseline for the minimum amount of data that can exist on a page.

The table of data that the heap profiler provides can seem bewildering at first. Unlike profiling CPU time where you get a list of results by method name adding up to 100% time spent, the heap snapshot is an infinitely recursive list of objects in memory grouped by object constructor. Each group has a count that shows you how many references to objects of that type existed at the time the snapshot was taken. By creating multiple snapshots, you can profile the behaviour of your application at specific points in time to see which types of objects are being created, retained, and destroyed in response to certain events (like switching between views).
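Comparing two snapshots boils down to subtracting one set of counts from another. Here is a rough sketch of that arithmetic; Chrome 9 doesn’t export snapshots in this shape as far as I know, so the two objects below are invented numbers standing in for counts you would copy from the table by hand, and diffCounts is a hypothetical helper.

```javascript
// Hypothetical helper: reports how the count for each constructor group
// changed between two snapshots. Groups whose count did not change are omitted.
function diffCounts(before, after) {
  var delta = {};
  Object.keys(before).concat(Object.keys(after)).forEach(function (type) {
    var change = (after[type] || 0) - (before[type] || 0);
    if (change !== 0) {
      delta[type] = change;
    }
  });
  return delta;
}

// Invented example counts, keyed by constructor name
var beforeViewSwitch = { Array: 1200, Widget: 10, Closure: 300 };
var afterViewSwitch = { Array: 1250, Widget: 14, Closure: 300, Tooltip: 2 };

console.log(diffCounts(beforeViewSwitch, afterViewSwitch));
```

Anything that keeps growing every time you repeat the same action (and never shrinks again after a few garbage collections) is a good candidate for a closer look.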

Another feature of the heap profiler is the ability to drill down into groups to see which other groups of objects are holding references to a given group of objects. There are two important things to know about this view:

Parent groups expand to show references from the children to the parent group, not references to the children from the parent group. (Consider it a drill-up, rather than a drill-down, though really when it comes to memory, it’s all pretty cyclical.)
The counts illustrate the number of references between objects, not the number of objects themselves. One child object can have many references to the same parent object, either directly or through other references, and there is no way to tell how many objects are responsible for all of the references within a given group.
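The difference between counting references and counting objects is easy to see in a toy example. This is only a sketch of the idea, not how the profiler computes its numbers; countReferences and the object names are made up.

```javascript
// Hypothetical helper: counts how many own properties across `holders`
// point directly at `target`.
function countReferences(holders, target) {
  var count = 0;
  holders.forEach(function (holder) {
    Object.keys(holder).forEach(function (key) {
      if (holder[key] === target) {
        count += 1;
      }
    });
  });
  return count;
}

var parent = { name: 'parent' };
// One child holds three separate references to the same parent...
var child = { owner: parent, previousOwner: parent, fallback: parent };
// ...and another child holds just one.
var otherChild = { owner: parent };

console.log(countReferences([child, otherChild], parent));
```

Two objects are responsible for four references here, and a bare count of four gives no way to work backwards to the number two.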

You will also see a few special object types that are important to understand:

“(global property)” is a special intermediate object that stands between the global object (in the browser, this is window) and any objects that are referenced by the global object. This is done to improve performance, since the mechanism that is used to speed up property look-ups on regular objects does not work as well for property lookups on the global object.

The “(closure)” group simply indicates the number of references to the expanded group of objects through any closures. Closures are fantastically useful in JavaScript, but they can cause huge problems with unintentional memory retention (since V8’s current GC won’t clean up any of the memory from a closure until all the members of the closure have gone out of scope—Wikipedia tells me this type of garbage is called “semantic garbage”). Use them sparingly.
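Here is a sketch of the kind of unintentional retention I mean, along with one way to avoid it. Whether bigData is actually kept alive depends on the engine; the comments describe the behaviour explained above, which can’t be verified from script. All of the names are made up for illustration.

```javascript
// Builds a string of exactly `length` characters
function makeBigString(length) {
  return new Array(length + 1).join('x');
}

// Both returned functions share one scope, so holding on to `ping` alone
// may keep `bigData` in memory even though `ping` never touches it.
function createHandlers() {
  var bigData = makeBigString(100000);
  return {
    summarize: function () { return bigData.length; },
    ping: function () { return 'pong'; }
  };
}

// Leaner variant: keep only the value that is needed and drop the
// reference to the large string before any closure is handed out.
function createLeanHandlers() {
  var bigData = makeBigString(100000);
  var size = bigData.length;
  bigData = null;
  return {
    summarize: function () { return size; },
    ping: function () { return 'pong'; }
  };
}

var leaky = createHandlers();
var lean = createLeanHandlers();
console.log(leaky.summarize() === lean.summarize());
```

Both versions behave identically from the outside; the only difference is what the closures are able to keep alive.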

Finally, and most importantly, “(roots)” are the special group of objects that are used by the garbage collector as a starting point to determine which objects are eligible for garbage collection. A “root” is simply an object that the garbage collector assumes is reachable by default, which then has its references traced in order to find all other current objects that are reachable. Any object that is not reachable through any reference chain of any of the root objects is considered unreachable and will eventually be destroyed by the garbage collector. In V8, roots consist of objects in the current call stack (i.e. local variables and parameters of the currently executing function), active V8 handle scopes, global handles, and objects in the compilation cache. (Learn more about this topic by reading Mark Lam’s excellent article, Garbage Collection.)
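The tracing step can be sketched in a few lines. This is a toy model of reachability, not V8’s collector: the graph is a plain object mapping made-up object names to the names they reference, and findReachable is a hypothetical helper.

```javascript
// Returns the sorted names of every object reachable from the given roots.
// `graph` maps an object name to the list of names it references.
function findReachable(graph, roots) {
  var reachable = {};
  var pending = roots.slice();
  while (pending.length > 0) {
    var name = pending.pop();
    if (reachable[name]) {
      continue; // already visited; this is what makes cycles safe
    }
    reachable[name] = true;
    var refs = graph[name] || [];
    for (var i = 0; i < refs.length; i++) {
      pending.push(refs[i]);
    }
  }
  return Object.keys(reachable).sort();
}

var graph = {
  window: ['document', 'app'],
  document: ['window'],   // points back at the root
  app: ['cache'],
  cache: [],
  orphanA: ['orphanB'],   // these two reference each other,
  orphanB: ['orphanA']    // but nothing reachable references them
};

var live = findReachable(graph, ['window']);
var garbage = Object.keys(graph).filter(function (name) {
  return live.indexOf(name) === -1;
});

console.log(live);
console.log(garbage);
```

Note that orphanA and orphanB still reference each other, but since no chain from a root leads to either of them, both are eligible for collection.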

Now that that’s out of the way, let’s look at a quick example. A relatively simple one to understand is the DOMWindow constructor, since there is typically only one DOMWindow on a given page—the window object—and that simplifies things a bit.

When you expand the DOMWindow group, you’ll be able to see which groups of objects have references back to any of the objects in the original group of DOMWindows. In this case, you can see that there are several instances where the same group of DOMWindows refers back to itself. Recursive structures like these are incredibly common: the window object, for instance, has several properties that point back to itself (window, parent, self, top, and frames). There can often be circular references that occur between several different objects as well, such as window.document.defaultView (which points back to window). These infinitely recursive structures are accurately represented within the heap snapshot.

Caveats

Unfortunately, while still far beyond anything that other browser manufacturers currently provide to inspect memory, the heap profiler still has several huge issues that keep it from being an effective tool for debugging memory leaks—and, in my opinion, this is by far the most needed missing feature in today’s developer tools.

The first problem is that the Chrome/V8 garbage collector (as of Chrome 9) can take up to five rounds to find and clear unreferenced data. This means that even though data has fallen out of scope and is eligible for collection, it may still show up in the heap snapshot as if it were still in scope. This makes it very difficult to determine whether data is actually leaking because it is still being referenced somewhere, or whether the garbage collector simply hasn’t gotten around to cleaning it up yet.

The second problem is that there is no way to actually inspect individual objects in memory and learn exactly what they are, what they are referencing, where they were assigned, or their age—the heap profiler only shows you aggregate information about the number of references to a particular type of object. This means that if you use the same constructor in many different areas of an application, but they only leak in one place in your code, it becomes nearly impossible to actually determine which part of the application is responsible for the leak without checking every instance where a certain type of object is used. Being able to drill down to view individual objects and their retainers, and to see the age of each object, would be invaluable in determining where objects are leaking and why.

Caveats aside, Chrome is currently the only browser that provides any really useful level of memory inspection. As long-running, single-page apps become more and more prevalent, the need for features that allow web developers to inspect the memory usage of their applications continues to grow. Hopefully this explanation of how these tools work will encourage you to make them a part of your arsenal of web development tools and, hopefully, some other browser manufacturers will start providing similar tools, since not all garbage collectors are created equal.

Introducing TracTicketGraph

posted 25 Oct 2010 2:18 CDT

Tagged: code, plugin, python and trac

The jQuery project uses Trac for bug tracking, and a request from several team members (including myself) was to have a page that showed an overview of ticket activity over time. There are a couple of different Trac plugins that exist to perform this task, but they don’t work very well. TracTicketStatsPlugin requires YUI (even though Trac uses jQuery), it loads a bunch of debug crap, it has SQL injection issues, it’s ugly, and it requires Flash. TracMetrixPlugin requires matplotlib, which in turn requires a bunch of X11 libraries, screws up Trac permissions in 0.12, has worthless documentation, and is very slow. So, I dove into the world of Python and Trac and came up with something that should be much better.

TracTicketGraph has no external dependencies and uses the awesome flot library to generate graphs. It is released under an MIT license. There are a few caveats, mostly due to time constraints, but anyone with some Python and JS skill should be able to take care of these issues quickly and easily:

You can use a “days” query parameter to change the number of previous days viewed, but there is not currently any UI for making this change.
The end date is always fixed to the current date.
The size of the graph is fixed in JS, instead of being configurable by CSS.
It is not internationalized.
There are no tooltips.

Download TracTicketGraph · View a screenshot

Standalone Universal Character Set Detection

posted 16 Sep 2010 20:37 CDT

Tagged: c, cli, code, standalone and universalchardet

Mozilla have a pretty nice universal character set detector built into their products. It’s modular, it’s quick, and it has a great deal of real-world research and testing behind it. I wanted to be able to use it as part of a project I am working on, but couldn’t find a nice standalone command-line version. There is a Java port, but the overhead of loading up a JVM just to detect the character set of a document was unappealing, and porting the entire codebase to another language would take too long (plus it would run a lot slower). So, I spent an evening learning some C/C++ and came up with just what I needed. I thought it might be useful to someone else, too, so I am releasing it here.

The README.txt contains compilation and usage instructions. I have no more words now. Get it below!

Download universalchardet

What the hell is this? I don’t even…

posted 13 Sep 2010 22:51 CDT

Tagged: imma let u finish and project

Introducing Imma Let U Finish

Imma Let U Finish these offers here is the code to get and give support for any other information on file! —An actual sentence generated by Imma Let U Finish

Imma Let U Finish abuses Google Scribe to generate sentences that almost make sense sometimes but nearly always end up hilariously baffling. Enjoy!

(Design by Nimbupani Designs. Thanks to Ben Truyman for the domain name.)