Yehuda Katz is a member of the Ruby on Rails core team, and lead developer of the Merb project. He is a member of the jQuery Core Team, and a core contributor to DataMapper. He contributes to many open source projects, like Rubinius and Johnson, and works on some he created himself, like Thor.
January 10th, 2012
While reading Hacker News posts about JavaScript, I often come across the misconception that Ruby’s blocks are essentially equivalent to JavaScript’s “first class functions”. Because the ability to pass functions around, especially when you can create them anonymously, is extremely powerful, the fact that both JavaScript and Ruby have a mechanism to do so makes it natural to assume equivalence.
In fact, when people talk about why Ruby’s blocks are different from Python’s functions, they usually talk about anonymity, something that Ruby and JavaScript share, but Python does not have. At first glance, a Ruby block is an “anonymous function” (or colloquially, a “closure”) just as a JavaScript function is one.
This impression, which I admittedly shared in my early days as a Ruby/JavaScript developer, misses an important subtlety that turns out to have large implications. This subtlety is often referred to as “Tennent’s Correspondence Principle”. In short, Tennent’s Correspondence Principle says:
“For a given expression expr, lambda { expr }.call should be equivalent.”
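In Ruby, the principle can be checked directly: evaluating an expression inline and evaluating it wrapped in an immediately-invoked lambda produce the same result in the same scope. A minimal sketch:

```ruby
x = 10

# the bare expression
direct = x * 2

# the same expression wrapped in a lambda and invoked immediately;
# the correspondence principle says the two should be equivalent
wrapped = lambda { x * 2 }.call

direct == wrapped  # both read +x+ from the surrounding scope
```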
This is also known as the principle of abstraction, because it means that it is easy to refactor common code into methods that take a block. For instance, consider the common case of file resource management. Imagine that the block form of File.open didn’t exist in Ruby, and you saw a lot of the following in your code:
begin
  f = File.open(filename, "r")
  # do something with f
ensure
  f.close
end
In general, when you see some code that has the same beginning and end, but a different middle, it is natural to refactor it into a method that takes a block. You would write a method like this:
def read_file(filename)
  f = File.open(filename, "r")
  yield f
ensure
  f.close
end
And you’d refactor instances of the pattern in your code with:
read_file(filename) do |f|
  # do something with f
end
In order for this strategy to work, it’s important that the code inside the block look the same after refactoring as before. We can restate the correspondence principle in this case as:
# do something with f
should be equivalent to:
do
  # do something with f
end
At first glance, it looks like this is true in Ruby and JavaScript. For instance, let’s say that what you’re doing with the file is printing its mtime. You can easily refactor the equivalent in JavaScript:
try {
  // imaginary JS file API
  var f = File.open(filename, "r");
  sys.print(f.mtime);
} finally {
  f.close();
}
Into this:
read_file(function(f) {
  sys.print(f.mtime);
});
Cases like this, which are genuinely quite elegant, give people the mistaken impression that Ruby and JavaScript have a roughly equivalent ability to refactor common functionality into anonymous functions.
However, consider a slightly more complicated example, first in Ruby. We’ll write a simple class that calculates a File’s mtime and retrieves its body:
class FileInfo
  def initialize(filename)
    @name = filename
  end

  # calculate the File's +mtime+
  def mtime
    f = File.open(@name, "r")
    mtime = mtime_for(f)
    return "too old" if mtime < (Time.now - 1000)
    puts "recent!"
    mtime
  ensure
    f.close
  end

  # retrieve that file's +body+
  def body
    f = File.open(@name, "r")
    f.read
  ensure
    f.close
  end

  # a helper method to retrieve the mtime of a file
  def mtime_for(f)
    File.mtime(f)
  end
end
We can easily refactor this code using blocks:
class FileInfo
  def initialize(filename)
    @name = filename
  end

  # refactor the common file management code into a method
  # that takes a block
  def mtime
    with_file do |f|
      mtime = mtime_for(f)
      return "too old" if mtime < (Time.now - 1000)
      puts "recent!"
      mtime
    end
  end

  def body
    with_file { |f| f.read }
  end

  def mtime_for(f)
    File.mtime(f)
  end

private
  # this method opens a file, calls a block with it, and
  # ensures that the file is closed once the block has
  # finished executing.
  def with_file
    f = File.open(@name, "r")
    yield f
  ensure
    f.close
  end
end
Again, the important thing to note here is that we could move the code into a block without changing it. Unfortunately, this same case does not work in JavaScript. Let’s first write the equivalent FileInfo class in JavaScript.
// constructor for the FileInfo class
FileInfo = function(filename) {
  this.name = filename;
};

FileInfo.prototype = {
  // retrieve the file's mtime
  mtime: function() {
    try {
      var f = File.open(this.name, "r");
      var mtime = this.mtimeFor(f);
      if (mtime < new Date() - 1000) {
        return "too old";
      }
      sys.print(mtime);
    } finally {
      f.close();
    }
  },

  // retrieve the file's body
  body: function() {
    try {
      var f = File.open(this.name, "r");
      return f.read();
    } finally {
      f.close();
    }
  },

  // a helper method to retrieve the mtime of a file
  mtimeFor: function(f) {
    return File.mtime(f);
  }
};
If we try to convert the repeated code into a method that takes a function, the mtime method will look something like:
function() {
  // refactor the common file management code into a method
  // that takes a block
  this.withFile(function(f) {
    var mtime = this.mtimeFor(f);
    if (mtime < new Date() - 1000) {
      return "too old";
    }
    sys.print(mtime);
  });
}
There are two very common problems here. First, this has changed contexts. We can fix this by allowing a binding as a second parameter, but it means that every time we refactor to a lambda we need to remember to accept a binding parameter and pass it in. The var self = this pattern emerged in JavaScript primarily because of this lack of correspondence.

This is annoying, but not deadly. More problematic is the fact that return has changed meaning. Instead of returning from the outer function, it returns from the inner one.
This is the right time for JavaScript lovers (and I write this as a sometime JavaScript lover myself) to argue that return behaves exactly as intended, and that this behavior is simpler and more elegant than the Ruby behavior. That may be true, but it doesn’t alter the fact that this behavior breaks the correspondence principle, with very real consequences.
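The contrast is easy to see in Ruby itself, where a return inside a block exits the enclosing method. A small sketch (the method name is invented for illustration):

```ruby
def first_even(numbers)
  numbers.each do |n|
    # +return+ here exits first_even itself, not just the block,
    # exactly as it would inside a built-in loop
    return n if n.even?
  end
  nil
end

first_even([1, 3, 4, 5])  # => 4
```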
Instead of effortlessly refactoring code with the same start and end into a function taking a function, JavaScript library authors need to consider the fact that consumers of their APIs will often need to perform some gymnastics when dealing with nested functions. In my experience as an author and consumer of JavaScript libraries, this leads to many cases where it’s just too much bother to provide a nice block-based API.
In order to have a language with return (and possibly super and other similar keywords) that satisfies the correspondence principle, the language must, like Ruby and Smalltalk before it, have both a function lambda and a block lambda. Keywords like return always return from the function lambda, even inside of block lambdas nested within it. At first glance this appears a bit inelegant, and language partisans often accuse Ruby of unnecessarily having two types of “callables”. In my experience as an author of large libraries in both Ruby and JavaScript, however, the distinction results in more elegant abstractions in the end.
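Ruby exposes both kinds of callable directly: a lambda’s return exits only the lambda, while a non-lambda proc’s return exits the method that created it. A minimal sketch:

```ruby
def lambda_return
  l = lambda { return :from_lambda }
  l.call
  :after_call   # reached: the lambda's +return+ only exits the lambda
end

def proc_return
  p = Proc.new { return :from_proc }
  p.call
  :after_call   # never reached: the proc's +return+ exits proc_return
end

lambda_return  # => :after_call
proc_return    # => :from_proc
```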
It’s worth noting that block lambdas only make sense for functions that take functions and invoke them immediately. In this context, keywords like return, super and Ruby’s yield make sense. These cases include iterators, mutex synchronization and resource management (like the block form of File.open).
In contrast, when functions are used as callbacks, those keywords no longer make sense. What does it mean to return from a function that has already returned? In these cases, typically involving callbacks, function lambdas make a lot of sense. In my view, this explains why JavaScript feels so elegant for evented code that involves a lot of callbacks, but somewhat clunky for the iterator case, and Ruby feels so elegant for the iterator case and somewhat more clunky for the evented case. In Ruby’s case, (again in my opinion), this clunkiness is more from the massively pervasive use of blocks for synchronous code than a real deficiency in its structures.
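Ruby makes the “return from a function that has already returned” case concrete: invoking a stored proc’s return after its defining method has exited raises LocalJumpError. A sketch with invented names:

```ruby
def make_callback
  # this proc's +return+ targets make_callback's stack frame
  Proc.new { return :done }
end

callback = make_callback  # make_callback has now returned

begin
  callback.call           # the frame is gone; there is nothing to return from
rescue LocalJumpError
  # returning from an already-returned method is meaningless
end
```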
Because of these concerns, the ECMA working group responsible for ECMAScript, TC39, is considering adding block lambdas to the language. This would mean that the above example could be refactored to:
FileInfo = function(name) {
  this.name = name;
};

FileInfo.prototype = {
  mtime: function() {
    // use the proposed block syntax, `{ |args| }`.
    this.withFile { |f|
      // in block lambdas, +this+ is unchanged
      var mtime = this.mtimeFor(f);
      if (mtime < new Date() - 1000) {
        // block lambdas return from their nearest function
        return "too old";
      }
      sys.print(mtime);
    }
  },

  body: function() {
    this.withFile { |f| f.read(); }
  },

  mtimeFor: function(f) {
    return File.mtime(f);
  },

  withFile: function(block) {
    try {
      var f = File.open(this.name, "r");
      block(f);
    } finally {
      f.close();
    }
  }
};
Note that a parallel proposal, which replaces function-scoped var with block-scoped let, will almost certainly be accepted by TC39; it would slightly, but not substantively, change this example. Also note that block lambdas automatically return their last statement.
Our experience with Smalltalk and Ruby shows that people do not need to understand the scary-sounding correspondence principle for a language that satisfies it to yield the desired results. I love the fact that the concept of “iterator” is not built into the language, but is instead a consequence of natural block semantics. This gives Ruby a rich, broadly useful set of built-in iterators, and language users commonly build custom ones. As a JavaScript practitioner, I often run into situations where using a for loop is significantly more straightforward than using forEach, always because of the lack of correspondence between the code inside a built-in for loop and the code inside the function passed to forEach.
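The Ruby side of this contrast is worth seeing: because blocks satisfy the correspondence, a custom iterator is just a method that yields, and code moved into it keeps its meaning. A small sketch with invented names:

```ruby
# a custom iterator: just a method that yields consecutive pairs
def each_pair(array)
  array.each_slice(2) { |pair| yield pair }
end

def first_pair_summing_to(array, target)
  each_pair(array) do |a, b|
    # +return+ behaves here exactly as it would in a built-in loop
    return [a, b] if a + b == target
  end
  nil
end

first_pair_summing_to([1, 2, 3, 4], 7)  # => [3, 4]
```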
For the reasons described above, I strongly approve of the block lambda proposal and hope it is adopted.
December 12th, 2011
After we announced Amber.js last week, a number of people brought Amber Smalltalk, a Smalltalk implementation written in JavaScript, to our attention. After some communication with the folks behind Amber Smalltalk, we started a discussion on Hacker News about what we should do.
Most people told us to stick with Amber.js, but a sizable minority told us to come up with a different name. After thinking about it, we didn’t feel good about the conflict and decided to choose a new name.
Henceforth, the project formerly known as SproutCore 2.0 will be known as Ember.js. Our new website is up at www.emberjs.com
(and yes, we know this is pretty ridiculous)
December 8th, 2011
A little over a year ago, I got my first serious glimpse at SproutCore, the JavaScript framework Apple used to build MobileMe (now iCloud). At the time, I had worked extensively with jQuery and Rails on client-side projects, and I had never found the arguments for the “solutions for big apps” very compelling. At the time, most of the arguments (at least within the jQuery community) focused on bringing more object orientation to JavaScript, but I never felt that they offered the layers of abstraction you really want to manage complexity.
When I first started to play with SproutCore, I realized that the bindings and computed properties were what gave it its real power. Bindings and computed properties provide a clean mechanism for building the layers of abstractions that improve the structure of large applications.
But even before I got involved in SproutCore, I had an epiphany one day when playing with Mustache.js. Because Mustache.js was a declarative way of describing a translation from a piece of JSON to HTML, it seemed to me that there was enough information in the template to also update the template when the underlying data changed. Unfortunately, Mustache.js itself lacked the power to implement this idea, and I was still lacking a robust enough observer library.
Not wanting to build an observer library in isolation (and believing that jQuery’s data support would work in a pinch), I started working on the first problem: building a template engine powerful enough to build automatically updating templates. The kernel of the idea for Handlebars (helpers and block helpers as the core primitives) came out of a discussion with Carl Lerche back when we were still at Engine Yard, and I got to work.
When I met SproutCore, I realized that it provided a more powerful observer library than anything I was considering at the time for the data-binding aspect of Handlebars, and that SproutCore’s biggest weakness was the lack of a good templating solution in its view layer. I also rapidly became convinced that bindings and computed properties were a significantly better abstraction, and allowed for hiding much more complexity, than manually binding observers.
After some months of retooling SproutCore with Tom Dale to take advantage of an auto-updating templating solution that fit cleanly into SproutCore’s binding model, we reached a crossroads. SproutCore itself was built from the ground up to provide a desktop-like experience on desktop browsers, and our ultimate plan had started to diverge from the widget-centric focus of many existing users of SproutCore. After a lot of soul-searching, we decided to start from scratch with SproutCore 2.0, taking with us the best, core ideas of SproutCore, but leaving the large, somewhat sprawling codebase behind.
Since early this year, we have worked with several large companies, including ZenDesk, BazaarVoice and LivingSocial, to iterate on the core ideas that we started from to build a powerful framework for building ambitious applications.
Throughout this time, though, we became increasingly convinced that calling what we were building “SproutCore 2.0” was causing a lot of confusion, because SproutCore 1.x was primarily a native-style widget library, while SproutCore 2.0 was a framework for building web-based applications using HTML and CSS for the presentation layer. This lack of overlap caused serious confusion in the IRC room, on the mailing list, on the blog, and when searching on Google.
To clear things up, we have decided to name the SproutCore-inspired framework we have been building (so far called “SproutCore 2.0”) “Amber.js”. Amber brings a proven MVC architecture to web applications, as well as features that eliminate common boilerplate. If you played with SproutCore and liked the concepts but felt like it was too heavy, give Amber a try. And if you’re a Backbone fan, I think you’ll love how little code you need to write with Amber.
In the next few days, we’ll be launching a new website with examples, documentation, and download links. Stay tuned for further updates soon.
UPDATE: The code for Amber.js is still, as of December 8, hosted at the SproutCore organization. It will be moved and re-namespaced within a few days.
November 19th, 2011
The primary reason I enjoy working with Rubinius is that it exposes, to Ruby, much of the internal machinery that controls the runtime semantics of the language. Further, it exposes that machinery primarily in order to enable user-facing semantics that are typically implemented in the host language (C for MRI, C and C++ for MacRuby, Java for JRuby) to be implemented in Ruby itself.
There is, of course, quite a bit of low-level functionality in Rubinius implemented in C++, but a surprising number of things are implemented in pure Ruby.
One example is the Binding object. To create a new binding in Rubinius, you call Binding.setup:
def self.setup(variables, code, static_scope, recv=nil)
  bind = allocate()

  bind.self = recv || variables.self
  bind.variables = variables
  bind.code = code
  bind.static_scope = static_scope

  return bind
end
This method takes a number of more primitive constructs, which I will explain as this article progresses, but we can describe the constructs that make up the high-level Ruby Binding in pure Ruby. In fact, Rubinius implements Kernel#binding itself in terms of Binding.setup.
def binding
  return Binding.setup(
    Rubinius::VariableScope.of_sender,
    Rubinius::CompiledMethod.of_sender,
    Rubinius::StaticScope.of_sender,
    self)
end
Yes, you’re reading that right. Rubinius exposes the ability to extract the constructs that make up a binding, one at a time, from a caller’s scope. And this is not just a hack (like Binding.of_caller for a short time in MRI). It’s core to how Rubinius manages eval, which of course makes heavy use of bindings.
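The standard Kernel#binding built from these pieces behaves as you’d expect: it captures the caller’s local variables and self so that eval can run code in that scope later. For example:

```ruby
def capture
  secret = 42
  binding   # captures this method's variable scope and self
end

b = capture

# eval runs code against the captured scope
eval("secret", b)       # => 42
eval("secret + 1", b)   # => 43
```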
For a while, I have wanted the ability to Marshal.dump a proc in Ruby. MRI has historically disallowed it, but there’s nothing conceptually impossible about it. A proc itself is a blob of executable code, a local variable scope (which is just a bunch of pointers to other objects), and a constant lookup scope. Rubinius exposes each of these constructs to Ruby, so Marshaling a proc simply means figuring out how to Marshal each of these constructs.
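MRI’s refusal is easy to observe; the pieces Rubinius exposes are exactly the state MRI keeps opaque:

```ruby
add = lambda { |a, b| a + b }

result = begin
  Marshal.dump(add)
rescue TypeError => e
  # MRI rejects the dump because Proc defines no marshaling hooks
  e.class
end

result  # => TypeError
```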
Let’s take a quick detour to learn about the constructs in question.
Rubinius represents Ruby’s constant lookup scope as a Rubinius::StaticScope object. Perhaps the easiest way to understand it would be to look at Ruby’s built-in Module.nesting function.
module Foo
  p Module.nesting

  module Bar
    p Module.nesting
  end
end

module Foo::Bar
  p Module.nesting
end

# Output:
# [Foo]
# [Foo::Bar, Foo]
# [Foo::Bar]
Every execution context in Rubinius has a Rubinius::StaticScope, which may optionally have a parent scope. In general, the top static scope (the static scope with no parent) in any execution context is Object.
Because Rubinius allows us to get the static scope of a calling method, we can implement Module.nesting in Rubinius:
def nesting
  scope = Rubinius::StaticScope.of_sender
  nesting = []

  while scope and scope.module != Object
    nesting << scope.module
    scope = scope.parent
  end

  nesting
end
A static scope also has an additional property called current_module, which is used during class_eval to define which module the runtime should add new methods to.
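What current_module controls is visible from plain Ruby: inside class_eval, def adds methods to the receiver rather than to the lexically enclosing module. A minimal sketch with an invented class:

```ruby
class Widget; end

# during class_eval, the runtime's notion of the current module is Widget,
# so +def+ defines the method on Widget
Widget.class_eval do
  def label
    "widget"
  end
end

Widget.new.label  # => "widget"
```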
Adding Marshal.dump support to a static scope is therefore quite easy:
class Rubinius::StaticScope
  def marshal_dump
    [@module, @current_module, @parent]
  end

  def marshal_load(array)
    @module, @current_module, @parent = array
  end
end
These three instance variables are defined as Rubinius slots, which means that they are fully accessible to Ruby as instance variables, but don’t show up in the instance_variables list. As a result, we need to explicitly dump the instance variables that we care about and reload them later.
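The marshal_dump/marshal_load pair used above is Ruby’s standard hook for choosing exactly which state gets serialized; the same pattern on an ordinary class, with invented names:

```ruby
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  # explicitly enumerate the state Marshal should capture
  def marshal_dump
    [@x, @y]
  end

  # Marshal allocates a blank instance and hands the dumped array back
  def marshal_load(array)
    @x, @y = array
  end
end

copy = Marshal.load(Marshal.dump(Point.new(1, 2)))
copy.x  # => 1
```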
A compiled method holds the information necessary to execute a blob of Ruby code. Some important parts of a compiled method are its instruction sequence (a list of the compiled instructions for the code), a list of any literals it has access to, names of local variables, its method signature, and a number of other important characteristics.
It’s actually quite a complex structure, but Rubinius already knows how to convert an in-memory CompiledMethod into a String, as it dumps compiled Ruby files into compiled files as part of its normal operation. There is one small caveat: this String form that Rubinius uses for its compiled method does not include its static scope, so we will need to include the static scope separately in the marshaled form. Since we already told Rubinius how to marshal a static scope, this is easy.
class Rubinius::CompiledMethod
  def _dump(depth)
    Marshal.dump([@scope, Rubinius::CompiledFile::Marshal.new.marshal(self)])
  end

  def self._load(string)
    scope, dump = Marshal.load(string)

    cm = Rubinius::CompiledFile::Marshal.new.unmarshal(dump)
    cm.scope = scope
    cm
  end
end
A variable scope represents the state of the current execution context. It contains all of the local variables in the current scope, the execution context currently in scope, the current self, and several other characteristics.
I wrote about the variable scope before. It’s one of my favorite Rubinius constructs, because it provides a ton of useful runtime information to Ruby that is usually locked away inside the native implementation.
Dumping and loading the VariableScope is also easy:
class VariableScope
  def _dump(depth)
    Marshal.dump([@method, @module, @parent, @self, nil, locals])
  end

  def self._load(string)
    VariableScope.synthesize *Marshal.