Lessons from JS engine bugs

September 1st, 2011

Last week, I asked Luke Wagner to explain some security bugs that he fixed in the past. I hoped to learn from each bug at multiple levels, in ways that could help prevent future security bugs from arising and persisting.

Luke is one of the developers working on Firefox's JavaScript engine, which is currently our largest source of critical security bugs.

Method

I imagined we would recurse in exhaustive breadth and exhausting depth. Instead, we recursed only on the most interesting items, and refined a checklist of starting points:

  • What was the bug?
  • What went wrong in the developer's thinking that caused the bug to be introduced?
  • What made the bug exploitable?
  • What caused us to use especially dangerous features of C++?
  • Could a new abstraction make it possible to do this both fast and safely?
  • What caused the bug to persist? Could we have caught this earlier with improved regression tests, fuzz testing, dynamic analysis, or static analysis?

Luke and I made trees for all ten bugs, at first on paper and later using EtherPad. Then I extracted and categorized what I thought were the most useful lessons and recommendations.

Recommendations for introducing fewer bugs

Casts

  • Create centralized, type-restricted cast functions. This protects you when you change the representation of one of the types. It also protects against mistakes that cause the input type to be incorrect.
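
    A minimal sketch of what this might look like, with made-up names (Cell, Shape, AsShape) rather than SpiderMonkey's actual types: every conversion funnels through one assertion-guarded function, so a representation change or a wrongly typed caller fails loudly in exactly one place.

        #include <cassert>

        enum class CellKind { Object, Shape, String };

        struct Cell {
            CellKind kind;
        };

        struct Shape : Cell {
            unsigned slotCount;
        };

        // The only sanctioned way to turn a Cell* into a Shape*. If Shape's
        // representation changes, or a caller passes the wrong kind of Cell,
        // this one assertion catches it.
        inline Shape *AsShape(Cell *cell)
        {
            assert(cell->kind == CellKind::Shape);
            return static_cast<Shape *>(cell);
        }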

Sentinel values

  • Use tagged unions instead.
  • Use a typed wrapper (a struct containing a single value). When assigning from the underlying numeric type, convert using one of two functions: one that checks for special values, and one that explicitly does not. (See the sketch after this list.)
  • Audit existing code paths to ensure they cannot generate the special value.
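
    Here's a rough sketch of the typed-wrapper idea from the second bullet; the names (SlotIndex, INVALID_SLOT) are invented for illustration, not engine code:

        #include <cassert>
        #include <cstdint>

        constexpr uint32_t INVALID_SLOT = UINT32_MAX;   // the sentinel

        struct SlotIndex {
            uint32_t value;
        };

        // Checked conversion: asserts the caller isn't smuggling in the sentinel.
        inline SlotIndex SlotIndexFromValid(uint32_t v)
        {
            assert(v != INVALID_SLOT);
            return SlotIndex{v};
        }

        // Explicitly-unchecked conversion: the loud name marks every call site
        // that may legitimately produce the sentinel.
        inline SlotIndex SlotIndexAllowInvalid(uint32_t v)
        {
            return SlotIndex{v};
        }

    Because the unchecked conversion has a distinctive name, the audit in the third bullet becomes a grep for its call sites rather than a close read of every assignment.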

Clarity of invariants

  • Increase use of methods named AssertInvariants (sketched below).
  • Create an alias for JS_ASSERT called JS_INVARIANT.
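
    A sketch of both suggestions together; JS_INVARIANT is the proposed alias (spelled here in terms of plain assert), and the struct and its fields are invented for illustration:

        #include <cassert>

        // The alias signals "this documents a data-structure invariant" rather
        // than "this checks a transient local condition".
        #define JS_INVARIANT(cond) assert(cond)

        struct ArgumentsData {
            unsigned length;
            unsigned initializedLength;

            void AssertInvariants() const {
                JS_INVARIANT(initializedLength <= length);
            }
        };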

Interacting with other developers

  • If you're about to do something gross because someone else doesn't expose the right API/helper, maybe you should get it exposed.

JS Engine specific

  • Any patch that touches rooting should be reviewed by Igor.
  • Interpreter could have better abstraction and encapsulation for its stack.

Recommendations for catching bugs earlier

Static analysis

  • Find all casts (C-style casts, the reinterpret_cast keyword, and casts through unions) for a given type. Could be used to enforce centralization or to find things that should be centralized.
  • Be suspicious of a function with multiple return statements, all of which return the same primitive value.
  • Be suspicious of a function returning true/success in an OOM path.
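
    A toy illustration of what the last two heuristics would flag (invented code, not an actual Firefox bug). Every return statement yields true, including the one on the out-of-memory path, so the caller believes the buffer grew:

        #include <cstddef>
        #include <cstdlib>

        struct Buffer {
            char *data = nullptr;
            std::size_t capacity = 0;

            bool grow(std::size_t newCapacity) {
                char *p = static_cast<char *>(std::realloc(data, newCapacity));
                if (!p)
                    return true;    // BUG: OOM reported as success
                data = p;
                capacity = newCapacity;
                return true;
            }
        };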

Dynamic analysis

  • Ask Valgrind developers what they think of providing (in valgrind.h) a way to tie the addressability of "stacklike memory" to a variable that represents the end of the stack.
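
    Something close to this is already expressible with manual Memcheck client requests; the request is essentially to automate the bookkeeping. In the sketch below, the two VALGRIND_ macros are real ones from memcheck.h, while the interpreter-stack type is invented:

        #include <valgrind/memcheck.h>

        struct InterpStack {
            char *base;
            char *top;    // current end of the live, stacklike region

            void popTo(char *newTop) {
                // Everything above the new top is dead; under Valgrind, stale
                // reads of it now report an error instead of returning garbage.
                VALGRIND_MAKE_MEM_NOACCESS(newTop, top - newTop);
                top = newTop;
            }

            void pushTo(char *newTop) {
                VALGRIND_MAKE_MEM_UNDEFINED(top, newTop - top);
                top = newTop;
            }
        };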

Fuzzing

  • We should fuzz worker threads somehow.
    • In browser (slow and messy, but it's what users are running).
    • In thread-safe shell (--enable-threadsafe?), which has "toy workers".
  • We should fuzz compartments better.
    • I should ask Blake and Andreas for help with testing compartments and wrappers.
    • I should ask Gary to run jsfunfuzz in xpcshell, where I can test both same-origin and different-origin compartments, and thus get more interesting wrappers.
  • We should give JS OOM fuzzing another shot.

Next steps

I'm curious if others have additional ideas for what could have prevented the ten bugs we looked at. For example, someone like Jeff Walden, who loves to write exhaustive regression tests, might have ideas that Luke and I did not consider.

I'd also like to do this kind of analysis with other developers on bugs they have fixed.

Posted in JavaScript, Mozilla, Security | 1 Comment »

On the Isle of Rapidity

August 27th, 2011

Not all of our neighbors followed us. Some asked — demanded? — that we send back supplies.

We acknowledged their request, but our immediate task was to explore this Isle of Rapidity. What surprises would we discover? What surprises would discover us?

To survive in this strange land, we would have to befriend new neighbors. Living for so long atop Mount Annum, we had almost forgotten how to introduce ourselves.

But we had brought much to share. We had barely opened our packs when the wind seemed to whisper:

Here, gifts arrive almost before you send them.

Maybe it wouldn’t be so hard to make friends here.

And there was something inexplicably familiar about this island. Was it the scent of the flowers? The rhythmic waves in the distance? The chattering of wildlife, almost a chorus?

Here, a gift to your neighbor is equally a gift to yourself.

We felt a sudden shift in perception: the Isle of Rapidity was home.

Posted in Mozilla, Rapid release | 5 Comments »

Venturing from Mount Annum

August 27th, 2011

Our friends thought us mad.

We had thrived atop Mount Annum. It was the highest peak as far as the eye could see.

Once, we had sprained an ankle on the Foothills of Many Betas in the west. We remembered miserable visits to the Bog of Eternal Driver Approval in the east. Every time, we had quickly returned home.

But there we were, on the summit of Mount Annum, climbing into a cannon aimed at 4°N, 6°W.

So far away, and yet oddly specific. We lit the fuse and plugged our ears.

We flew over the stifling Bog of Eternal Driver Approval. We flew over the perilous Sea of Recklessness.

We landed, as we hoped, on the uncharted Isle of Rapidity. The landing was unexpectedly soft.

We’ve shaken off the dust and gunpowder. We’ve begun to tend our wounds. We’re excited about the upcoming climb.

Posted in Mozilla, Rapid release | 3 Comments »

Rapid releases and crashes

August 27th, 2011

In the months before Firefox's first rapid release, one concern echoed throughout engineering: crashes.

We had always relied on long stabilization periods to get crash rates down. Firefox 4 would be our last high-stability release. We hoped improvements in other aspects of quality would outweigh the decreased stability.

But then something surprising happened. We released Firefox 5, and Firefox didn’t get crashier.

Version    Crashes per 100 active daily users
3.6.20     1.8
4.0.1      1.6
5.0.1      1.4
6.0        1.6

KaiRo’s explanation parallels what I’ve seen helping with MemShrink:

  • The channel cascade gives each release 12 weeks of pure stabilization.
  • The channel audiences help by comparing alphas to alphas.
  • The short cycles enable backouts and reduce the desire to land half-baked features.

“Rapid release” doesn't mean building Firefox the way we always have, x times faster. It’s a new process that fits together in beautiful yet fragile ways.

Posted in Mozilla, Rapid release | 2 Comments »

Improving intranet compatibility

August 25th, 2011

Some organizations are reluctant to keep their browsers up-to-date because they worry that internal websites might not be compatible.

Organization-internal sites can have unusual compatibility constraints. Many have small numbers of users, yet are highly sensitive to downtime. Some were developed with the assumption that the web would always be as static as it was in 2003.

Rapid releases help in some ways: fewer things change at a time; we can deprecate APIs before removing them; and the permanent Aurora and Beta audiences help test each new release consistently.

But frequent releases make manual testing impractical. (Let's pretend for now that the roughly-monthly security "dot releases" never broke anything.)

As with the problem of extension compatibility, overlapping releases could be part of a solution. But we should start by thinking about ways to attack compatibility problems directly.

Automated testing

A tool could scan for the use of deprecated web features, such as the “-moz-” prefixed version of border-radius. This tool, similar in spirit to the AMO validator, could be run on the website's source code or on the streams received by Firefox.
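
As a rough sketch of the source-code flavor of such a tool (a toy, not an existing Mozilla project), a scanner might simply flag vendor-prefixed properties:

    #include <fstream>
    #include <iostream>
    #include <regex>
    #include <string>

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            std::cerr << "usage: cssscan file.css\n";
            return 1;
        }
        std::ifstream in(argv[1]);
        std::regex prefixed(R"((-moz-|-webkit-|-o-|-ms-)[a-z-]+)");
        std::string line;
        int lineNo = 0;
        while (std::getline(in, line)) {
            ++lineNo;
            const std::sregex_iterator end;
            for (std::sregex_iterator it(line.begin(), line.end(), prefixed);
                 it != end; ++it)
                std::cout << argv[1] << ":" << lineNo << ": prefixed property "
                          << it->str() << " may need an unprefixed equivalent\n";
        }
        return 0;
    }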

There is already a compatibility detector for HTML issues, but my intuition is that CSS and DOM compatibility problems are more common.

User testing

Not all visitors have the technical skills and motivation to report issues they encounter. In some organizations, bureaucracy can stifle communication between visitors and developers. Automating error reports could help.

It would be cool if a Firefox extension could report warnings and errors from internal sites to a central server.

Depending on the privacy findings from this extension, it could become an onerror++ API available to all web sites, similar to the Web Performance APIs. This seems more sensible than adding API-specific error reporting.

Sample contracts

We could suggest things to put in contracts for outsourced intranet development.

Often, the best solution is to align incentives. That could take the form of specifying that the developers are responsible for maintenance costs for a specified length of time.

When that isn't practical, I'd suggest specifying requirements such as:

  • Follow forward-compatible practices: avoid browser-specific or prefixed features; sniff features rather than browsers; validate markup and CSS.
  • Include automated, browser-based tests for key workflows. (Perhaps using Selenium, a tool developed in part by Mozilla employees.)
  • Include non-obfuscated source code.

Channel management

There is a roughly exponential distribution of home users between the Nightly, Aurora, Beta, and Release channels. This helps Mozilla and public web sites fix incompatibilities before they affect large numbers of users.

Large organizations should strive for a similar channel distribution so that internal websites benefit in the same way. It might make sense for Mozilla to provide tools to help, perhaps based on our own update service or Build Your Own Browser.

Counterintuitively, the best strategy for security-conscious organizations may be to standardize on the Beta channel, with the option to downgrade to Release in an emergency. This isn't as crazy as it sounds. Today's betas are as stable as yesteryear's release candidates, thanks to the Aurora audience and the discipline made possible by the 6-week cadence. And since Beta gets security fixes sooner, they are safer in some ways.

The loss of release overlap takes away some options from IT admins and intranet developers, but rapid releases also make possible new strategies that could be better in the long term.

Posted in Mozilla, Rapid release, Security | 4 Comments »

Secure and compatible

August 25th, 2011

Previously, I discussed some of the ways Firefox's new rapid release process improves its security. But improving Firefox's security only helps users who actually update, and some people have expressed concern that rapid releases could make it more difficult for users to keep up.

There is some consensus on how to make the update process smoother and how to reduce the number of regressions that reach users. But incompatibilities pose a tougher problem.

We don't want users to have to choose between insecure and incompatible.

There are three ways to avoid confronting Firefox users with this dilemma:

  • Prevent incompatibilities from arising in the first place.
  • Hasten the discovery of incompatibilities, and fix them quickly.
  • Overlap releases by delaying the end-of-life for the old version.

Overlapping releases

For many years, Mozilla tried to prevent this dilemma by providing overlap between releases. For example, Firefox 2 was supported for six months after Firefox 3.

We observed two security-related patterns that made us reluctant to continue providing overlapping releases:

First, some fixes were never backported, so users on the "old but supported" version were not as safe as they believed. Sometimes backports didn't happen because security patches required additional backports of architectural changes. Sometimes we were scared to backport large patches because we did not have large testing audiences for old branches. Some backports didn't happen because they affected web or add-on compatibility. And sometimes backports didn't happen simply because developers aren't focused on 2-year-old code. Providing old versions gave users a false sense of security.

Second, a feedback cycle developed between users lagging behind and add-ons lagging behind. Many stayed on the old version until the end-of-life date, and then encountered surprises when they were finally forced to update. Providing old versions did not actually shield users from urgent update dilemmas.

I feel we can only seriously consider returning to overlapping releases if we can first overcome these two problems.

Improving add-on compatibility

Everything we do to improve add-on compatibility helps to break the lag cycle.

The most ambitious compatibility project is Jetpack, which aims to create a more stable API for simple add-ons. Jetpack also has a permission model that promises to reduce the load on add-on reviewers, especially around release time.

Add-ons hosted on AMO now have their maxVersion bumped automatically based on source code scans. Authors of non-AMO-hosted add-ons can use the validator by itself or run it locally.

This month, Fligtar announced a plan of assuming compatibility. Under this proposal, only add-ons with obvious, fatal incompatibilities (such as binary XPCOM components compiled against the wrong version) will be required to bump maxVersion.

With assumed compatibility, we will be able to make more interesting use of crowdsourced compatibility data from early adopters. Meanwhile, the successful Firefox Feedback program may expand to cover add-ons.

Breaking the add-on lag cycle

Even with improvements to add-on compatibility, some add-ons will be updated late or abandoned. In these cases, we should seek to minimize the impact on users.

In Aurora 8, opt-in for third-party add-ons allows users to shed unwanted add-ons that had been holding them back.

When users have known-incompatible add-ons, Firefox should probabilistically delay updates for a few days. Many add-ons that are incompatible on the day of the release quickly become compatible. Users of those add-ons don't need to suffer through confusing UI.

But after a week, when it is likely that the add-on has been abandoned, Firefox should disable the add-on in order to update itself. It would be dishonest to ask users to choose between security and functionality once we know the latter is temporary.

Once we've solved the largest compatibility problems, we can have a more reasonable discussion about the benefits and opportunity costs of maintaining overlapping releases.

Posted in Mozilla, Rapid release, Security | 13 Comments »

Rapid releases and security

August 25th, 2011

Several people have asked me whether Mozilla's move to rapid releases has helped or hurt Firefox's security.

I think the new cadence has helped security overall, but it is interesting to look at both the ways it has helped and the ways it has hurt.

Security of new features

With release trains every 6 weeks, developers feel less pressure to rush half-baked features in just before each freeze.

Combined with Curtis's leadership, the rapid release cycle has made it possible for security reviews to happen earlier and more consistently. When we need in-depth reviews or meetings, we aren't overwhelmed by needing twenty in one month.

The rapid release process also necessitated new types of coordination, such as the use of team roadmaps and feature pages. The security team is able to take advantage of the new planning process to track which features might need review, even if the developers don't come to the security team and ask.

Security reviews are also more effective now. When we feel a new feature should be held back until a security concern is fixed, we create less controversy when the delay is only 6 weeks.

Security improvements and backports

Many security improvements require architectural changes that are difficult to backport to old versions. For example:

  • Firefox 3.6 added frame poisoning, making it impossible to exploit many crashes in layout code. Before frame poisoning, the layout module was one of the largest sources of security holes.
  • Firefox 4 fixed the long-standing :visited privacy hole. Mozilla was the first to create a patch, but due to Firefox's long development cycle, several other browsers shipped fixes first.
  • Firefox 6 introduced WeakMap, making it possible for extensions to store document-specific data without leaking memory and without allowing sites to see or corrupt the information. Extension authors might not be comfortable using WeakMap until most users are on Firefox 6 or higher.

Contrary to the hopes of some Linux distros and IT departments, it is no longer possible to backport "security fixes only" and have a browser that is safe, stable, and compatible.

We're constantly adding security features and mitigations for memory safety bugs. We're constantly reducing Firefox's attack surface by rewriting components from C++ into JavaScript (and soon Rust).

Disclosure windows

One area where rapid releases may hurt is the secrecy of security patches.

We keep security bug reports private until we've shipped a fix, but we check security patches into a public repository. Checking in patches allows us to test the fixes well, but opens the possibility of an attacker reverse-engineering the bug.

We check security patches into a public repository for two reasons:

  • We cannot rely entirely on in-house testing. Because of the variety of websites and extensions, our experience is that some regressions introduced by security patches are only found by volunteer testers.
  • We cannot ship public binaries, even just for testing, based on private security patches. This would violate both the spirit and letter of some open-source licenses. It also wouldn't be effective for secrecy: attackers have been using binary patch analysis to attack unpatched Microsoft software. (And that's without access to the old source!)

But we can shorten and combine the windows between patch disclosure and release. Instead of landing security patches on mozilla-central and letting them reach users in the normal 12-18 weeks, we can:

  • Land security fixes to all active branches at once. This may be safer now that the oldest active branch is at most 18 weeks behind mozilla-central. (In contrast, the Firefox 3.6.x series is based on a branch cut 2009-08-13, making it 24 months old.) But even 18 weeks is enough to create some regression risk, and it is not clear which audiences would test fixes on each branch.
  • Accelerate the channel flow for security bugs. For example, hold security fixes in a private repository until 3 weeks before each release. Then, give them a week on nightly, a week on aurora, and a week on beta. The same channel audiences that make rapid release possible would test security fixes, just for a shorter period of time. Something like this already happens on an ad-hoc basis in many bugs; formalizing the process could make it smoother.

There are significant security wins with rapid releases, and a few new challenges. Overall, it's a much better place to be than where we were a year ago.

(I posted followups about add-on compatibility and intranet compatibility.)

Posted in Mozilla, Rapid release, Security | 5 Comments »

Improving democracy

August 15th, 2011
An overview of how democracy can go wrong, even when everyone has good intentions

PNG · PDF · SVG · Fork on LucidChart

Election algorithms

Top left: We should reevaluate our choice of election algorithms.

Using simple plurality causes high levels of tactical voting and strategic nomination, and frequently produces results not desired by the majority.

A familiar example of tactical voting is declining to vote for your favorite third-party candidate, and instead voting for the "lesser of two evils", because you "don't want your vote to be thrown away".

A familiar example of strategic nomination is funding a weak opponent in the hope of splitting your main opponent's vote.

Simple plurality also entrenches two-party systems. This makes attack ads an effective strategy for politicians. The resulting polarization makes reasonable debate and compromise difficult.

I am a fan of instant round robin (Condorcet) methods. I especially like the beatpath (Schulze) method, since its independence of clones property suggests resistance to strategic nomination.

All election methods violate some intuitive criteria (Arrow). And all election methods sometimes admit tactical voting (Gibbard–Satterthwaite). But simple plurality is especially bad, and we should stop using it.
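
For the curious, the beatpath computation itself is short enough to sketch in full. This is illustrative code with a made-up three-candidate pairwise-preference matrix, not any official implementation:

    #include <algorithm>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main()
    {
        // Made-up pairwise matrix: d[i][j] = voters preferring i over j.
        std::vector<std::vector<int>> d = {
            {0, 20, 26},
            {25, 0, 16},
            {19, 29, 0},
        };
        const std::size_t n = d.size();

        // p[i][j]: strength of the strongest "beatpath" from i to j,
        // seeded with direct pairwise victories.
        std::vector<std::vector<int>> p(n, std::vector<int>(n, 0));
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                if (i != j && d[i][j] > d[j][i])
                    p[i][j] = d[i][j];

        // Floyd-Warshall-style widening: a path is only as strong as its
        // weakest link.
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t i = 0; i < n; ++i)
                for (std::size_t j = 0; j < n; ++j)
                    if (i != j && i != k && j != k)
                        p[i][j] = std::max(p[i][j], std::min(p[i][k], p[k][j]));

        // A winner's path to every rival is at least as strong as the
        // rival's path back.
        for (std::size_t i = 0; i < n; ++i) {
            bool winner = true;
            for (std::size_t j = 0; j < n; ++j)
                if (i != j && p[j][i] > p[i][j])
                    winner = false;
            if (winner)
                std::cout << "candidate " << i << " is a Schulze winner\n";
        }
        return 0;
    }

In this example the direct victories form a cycle (1 beats 0, 0 beats 2, 2 beats 1), yet the strongest-path comparison still produces a single winner, which is exactly the property that makes the method attractive.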

Moral heuristics

Top right: We should reevaluate our moral heuristics, upon realizing that our heuristics are motivated by a desire for (and often fail to create) prosperity, welfare, and happiness.

Moral heuristics serve us well in everyday life, where we are confronted with the need to make decisions quickly and with incomplete information. But the same heuristics can become counterproductive biases when applied to questions of policy. Examples include omission bias, as seen in the trolley-switch problem, and direct-effect bias, seen in the trolley-footbridge problem.

We should take care that we do not come to treat our moral heuristics as ends in and of themselves. (For more on the role of heuristics in consequentialist ethics, I suggest reading Siskind · Baron · Bennis+ · Bazerman+.)

Removing bad information

Top center: We should reevaluate our protections for freedom of speech, upon realizing that our protections are motivated by a desire for (and fail to create) effective democracy.

Protection for freedom of speech is motivated by a desire to ensure governments are not immune from criticism, to keep the powerless from feeling silenced, and to increase access to truth. In some cases, it is not clear that the protected speech furthers any of these goals.

Perhaps freedom of speech should be limited in cases where the speaker has wide reach and says things that are demonstrably false, as an expansion of libel law. Or perhaps there should be limits on spending large amounts of money to amplify political speech. Other criteria that might be worth considering are intent to mislead, the speaker's power or incumbency, and whether the medium and timing make it possible to reply.

On the other hand, additional restrictions might not be worth the effort. The undesirable speech would probably become less effective, but not disappear completely. Any ambiguities in the law would create problems for both courts and speakers.

Adding good information

Center right: Perhaps the best way to combat incorrect information is with correct information.

The CFTC should stop discouraging the creation of economic prediction markets. Betting on unemployment, for example, serves legitimate hedging and transparency interests. Political discourse would improve if we had transparent predictions on topics other than the fate of large companies.

Policy experiments should be more common. Just as we require clinical trials for new medications, we should run field experiments for new policies when possible.

More government data should be open. Organizations like MySociety, Code For America, and Wolfram Alpha have done amazing things to help us visualize, interpret, and use the information available so far.

History and technology

Some advances in technology have weakened democracy by amplifying the problems highlighted in the chart.

First, increased interconnectedness is requiring democracies to operate at unprecedented scale. Advances in transportation and communication and warfare impel us to make some important decisions at the US federal level.

Increased distance and heterogeneity test our individual capacity for empathy and altruism. Large scales magnify the incentives for concentrated interests to attempt to influence policy.

Second, the tools of subversion are improving. Advances in psychology and statistics allow for extremely manipulative advertising. Instant polling shifts focus from outcomes to opinions, and from policy to strategy.

But new tools that could strengthen democracy are also available, if we choose to use them.

Increasingly deep understanding of cognitive biases improves our capacity for reflection. Our experience running financial markets gives us ideas about how to create effective prediction markets.

And crucially, we have computers: computers to implement the voting algorithms invented as part of modern social choice theory, computers to run the advanced statistics that make field experiments reliable, and computers to allow citizens to make creative use of open government data.

Posted in Democracy, Economics, Ethics, Politics | 1 Comment »
