An Easy Derivation of the Volume of Spheres Formula

Working nearly 2,000 years before the development of calculus, the Greek mathematician Archimedes derived a simple formula for the volume of a sphere:

V = \frac{4}{3} \pi r^3

Of his many mathematical contributions, Archimedes was most proud of this result, even going so far as to ask that the method he used to work out the formula -- a diagram of a sphere inscribed in a cylinder -- be engraved on his gravestone.

Archimedes' formula may have been a stroke of scientific genius in 250 B.C., but with the help of modern calculus the derivation is extremely simple. In this post I'll walk through one way to derive the famous formula, and show how the same idea extends to dimensions other than the usual three.

The Derivation
Consider the diagram below. It's a sphere with radius r. The goal is to find the volume, and here's how we do that.

[Figure: a sphere of radius r, with a shaded horizontal disk drawn at height z; the disk's radius x, the height z, and the sphere's radius r form a right triangle.]

Notice that one thing we can easily find is the area of a single horizontal slice of the ball. This is the shaded disk at the top of the diagram, which is drawn at height z. The disk has a radius of x, which we'll need in order to find its area. To find x, we can form a right triangle with legs z and x and hypotenuse r, as drawn in the figure. Then we can easily solve for x.

By the Pythagorean theorem, we know that x^2 + z^2 = r^2, so solving for x we have x = \sqrt{r^2 - z^2}. Then the area of the shaded disk is simply pi times the radius squared, or A(z) = \pi x^2 = \pi (r^2 - z^2).

Now that we have the area of one horizontal disk, we want to add up the areas of all the horizontal disks inside the ball. That will give us the volume of the sphere. To do this, we simply take the definite integral of the disk-area formula above over all possible heights z, which run from -r (at the bottom of the ball) to r (at the top of the ball). That is, our volume is given by

V = \int_{-r}^{r} \pi (r^2 - z^2) \, dz

= \pi \left[ r^2 z - \frac{z^3}{3} \right]_{-r}^{r}

= \pi \left[ \left( r^3 - \frac{r^3}{3} \right) - \left( -r^3 + \frac{r^3}{3} \right) \right]

= \frac{4}{3} \pi r^3,

which is exactly the volume formula we were looking for.
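
Here's a quick numerical sanity check of that integral -- a minimal Python sketch using SciPy's quad routine, with an arbitrary radius of 2:

    # Integrate the disk areas pi*(r^2 - z^2) from -r to r and compare the
    # result against the closed-form answer (4/3)*pi*r^3.
    import numpy as np
    from scipy.integrate import quad

    r = 2.0  # any radius works; 2.0 is just an example
    volume, _ = quad(lambda z: np.pi * (r**2 - z**2), -r, r)
    print(volume)                  # ~33.51
    print(4 / 3 * np.pi * r**3)    # ~33.51, matching the derivation above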

This same logic can be used to derive formulas for the volume of a "ball" in 4, 5, and higher dimensions as well. Doing so, you can show that the volume of a unit ball in one dimension (a line) is just 2; the volume in two dimensions (a disk) is \pi; and -- as we've just shown -- the volume in three dimensions (a sphere) is \frac{4}{3}\pi. Continuing on to four, five, and ultimately n dimensions, a surprising result appears.

It turns out the volume of a unit ball peaks at five dimensions, and then proceeds to shrink thereafter, ultimately approaching zero as the dimension n goes to infinity. You can read more about this beautiful mathematical result here.
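
Here's a small sketch that computes the unit n-ball volume from the standard formula V_n = \pi^{n/2} / \Gamma(n/2 + 1) and shows the peak at n = 5:

    # Volume of the unit n-ball for n = 1..10; the values rise to a maximum
    # at n = 5 and then shrink toward zero.
    from math import pi, gamma

    for n in range(1, 11):
        v = pi ** (n / 2) / gamma(n / 2 + 1)
        print(n, round(v, 4))
    # n=1 -> 2.0, n=2 -> 3.1416, n=3 -> 4.1888, n=4 -> 4.9348,
    # n=5 -> 5.2638 (the maximum), n=6 -> 5.1677, ...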

Addendum: You can find a clear explanation of how the volume-of-spheres formula generalizes to n dimensions on pages 888-889 here.

Posted by Andrew on Monday August 16, 2010 | Feedback?



* * *


A Simpler Proof of the Bolzano-Weierstrass Theorem

A while back I posted a long proof of the Bolzano-Weierstrass theorem -- also known as the "sequential compactness theorem" -- which says that every bounded sequence has a convergent subsequence. Here's a much shorter and simpler version of the proof.

First we'll prove a lemma that shows for any sequence we can always find a monotone subsequence -- that is, a subsequence that's always increasing or decreasing.

Lemma. Every sequence has a monotone subsequence.

Proof. Let (a_n) be a sequence. Define a "peak" of (a_n) as an element a_N such that a_n \le a_N for all n > N. That is, a_N is a peak if, from that point forward, no element of the sequence is greater than a_N. Intuitively, think of shining a flashlight from the right onto the "mountain range" of the sequence's plotted elements. If the light hits an element, that element is a peak.

If (a_n) has infinitely many peaks, then collect those peaks, in order, into a subsequence (a_{n_k}). This is a monotone decreasing (more precisely, non-increasing) subsequence, as required.

If (a_n) has only finitely many peaks, let n_1 be any index beyond the position of the last peak. Then a_{n_1} is not a peak, so there exists some n_2 > n_1 such that a_{n_2} > a_{n_1}. But a_{n_2} is not a peak either, so there exists some n_3 > n_2 such that a_{n_3} > a_{n_2}.

Continuing in this way, we build a subsequence a_{n_1}, a_{n_2}, a_{n_3}, \ldots that is monotone increasing. In either case -- whether our sequence has infinitely many or finitely many peaks -- we can always find a monotone subsequence, as required. ∎
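
The lemma is about infinite sequences, but the "flashlight" picture is easy to play with on a finite list. Here's a small Python sketch (purely illustrative) that finds the peak positions of an example list by scanning from the right:

    # An index i is a "peak" of the finite list a if no later element exceeds
    # a[i]. Scanning right-to-left and tracking the running maximum finds them.
    def peaks(a):
        result, best = [], float("-inf")
        for i in range(len(a) - 1, -1, -1):   # scan from the right
            if a[i] >= best:                  # nothing to the right is larger
                result.append(i)
                best = a[i]
        return list(reversed(result))

    a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    idx = peaks(a)
    print(idx, [a[i] for i in idx])   # indices [5, 7, 8, 9]; values 9, 6, 5, 3
                                      # -- a non-increasing (monotone) run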

Now that we've proved the above lemma, the proof of the Bolzano-Weierstrass theorem follows easily.

Theorem (Bolzano-Weierstrass). Every bounded sequence has a convergent subsequence.

Proof. Let (a_n) be a bounded sequence. By the previous lemma, it has a monotone subsequence; call it (a_{n_k}). Since (a_n) is bounded by assumption, the subsequence (a_{n_k}) is also bounded. So by the monotone convergence theorem, (a_{n_k}) -- being monotone and bounded -- converges. So every bounded sequence has a convergent subsequence, completing the proof. ∎

Posted by Andrew on Tuesday July 20, 2010 | Feedback?



* * *


New Study of the Kerry-Lieberman "American Power Act"

We've released our latest study today. The paper explores the economic and distributional impact of the "American Power Act" or Kerry-Lieberman cap-and-trade bill. The study is pretty comprehensive, covering several theoretical issues raised by the bill as well as providing new distributional estimates of the bill's cost by income, age group, region and family type. Check out the full study here.

Posted by Andrew on Tuesday June 29, 2010 | Feedback?



* * *


The Linear Algebra View of Calculus: Taking a Derivative with a Matrix

Most people think of linear algebra as a tool for solving systems of linear equations. While it definitely helps with that, the theory of linear algebra goes much deeper, providing powerful insights into many other areas of math.

In this post I'll explain a powerful and surprising application of linear algebra to another field of mathematics -- calculus. I'll show how the fundamental calculus operations of differentiation and integration can each be understood as a linear transformation. This is the "linear algebra" view of basic calculus.

Taking Derivatives as a Linear Transformation
In linear algebra, the concept of a vector space is very general. Loosely speaking, any collection of objects can form a vector space as long as it follows two closure rules (along with the usual axioms governing how addition and scalar multiplication behave).

The first rule is that if u and v are in the space, then u + v must also be in the space. Mathematicians call this "closed under addition." Second, if u is in the space and c is a constant, then cu must also be in the space. This is known as "closed under scalar multiplication." Any collection of objects that follows those two rules -- they can be vectors, functions, matrices and more -- qualifies as a vector space.

One of the more interesting vector spaces is the set of polynomials of degree less than or equal to n. This is the set of all functions that have the following form:

p(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n

where a_0, \ldots, a_n are constants.

Is this really a vector space? To check, we can verify that it follows our two rules from above. First, if p(t) and q(t) are both polynomials, then p(t) + q(t) is also a polynomial. That shows it's closed under addition. Second, if p(t) is a polynomial, so is c times p(t), where c is a constant. That shows it's closed under scalar multiplication. So the set of polynomials of degree at most n is indeed a vector space.

Now let's think about calculus. One of the first skills we learn is taking derivatives of polynomials, and it's easy: if our polynomial is ax^2 + 3x, then its first derivative is 2ax + 3. The same term-by-term rule works for every polynomial, so the general first derivative of an nth-degree polynomial is given by:

p'(t) = a_1 + 2 a_2 t + 3 a_3 t^2 + \cdots + n a_n t^{n-1}

The question is: is this also a vector space? To answer that, we check to see that it follows our two rules above. First, if we add any two derivatives together, the result will still be the derivative of some polynomial. Second, if we multiply any derivative by a constant c, this will still be the derivative of some polynomial. So the set of first derivatives of polynomials is also a vector space.

Now that we know polynomials and their first derivatives are both vector spaces, we can think of the operation "take the derivative" as a rule that maps "things in the first vector space" to "things in the second vector space." That is, taking the derivative of a polynomial is a "linear transformation" that maps one vector space (the set of all polynomials of degree at most n) into another vector space (the set of all first derivatives of polynomials of degree at most n).

If we call the set of polynomials of degree at most n P_n, then the set of their first derivatives is P_{n-1}, since taking the first derivative reduces the degree of each polynomial term by 1. Thus, the operation "take the derivative" is just a function that maps P_n \to P_{n-1}. A similar argument shows that "taking the integral" is also a linear transformation running in the opposite direction, from P_{n-1} \to P_n.
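
To see this degree bookkeeping concretely, here's a small sketch using NumPy's polynomial helpers (coefficients ordered from the constant term up); the particular coefficients are just an example:

    # Differentiation sends a coefficient vector in P_3 to one in P_2,
    # and integration goes back the other way.
    from numpy.polynomial import polynomial as P

    p = [7, 5, 4, 2]          # 7 + 5t + 4t^2 + 2t^3, an element of P_3
    dp = P.polyder(p)         # [5, 8, 6] -> 5 + 8t + 6t^2, an element of P_2
    ip = P.polyint(dp)        # [0, 5, 4, 2] -> back in P_3 (constant term lost)
    print(dp, ip)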

Once we realize that differentiation and integration from calculus are really just linear transformations, we can describe them using the tools of linear algebra.

Here's how we do that. To fully describe any linear transformation as a matrix multiplication in linear algebra, we follow three steps.

First, we find a basis for the domain of the transformation. That is, if our transformation maps P_n to P_{n-1}, we first write down a basis for P_n.

Next, we feed each element of this basis through the linear transformation, and see what comes out the other side. That is, we apply the transformation to each element of the basis, which gives the "image" of each element under the transformation. Since every element of the domain is some combination of those basis elements, by running them through the transformation we can see the impact the transformation will have on any element in the domain.

Finally, we collect each of those resulting images into the columns of a matrix. That is, each time we run an element of the basis through the linear transformation, the output will be a vector (the "image" of the basis element). We then place these vectors into a matrix D, one in each column from left to right. That matrix D will fully represent our linear transformation.

An Example for Third-Degree Polynomials
Here's an example of how to do this for P_3, the set of all polynomials of at most degree 3. This is the set of all functions of the following form:

p(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3

where a_0, \ldots, a_3 are constants. When we apply our transformation, "take the derivative of this polynomial," it reduces the degree of each term in our polynomial by one. Thus, the transformation D will be a linear mapping from P_3 to P_2, which we write as D: P_3 \to P_2.

To find the matrix representation for our transformation, we follow our three steps above: find a basis for the domain, apply the transformation to each basis element, and compile the resulting images into columns of a matrix.

First we find a basis for P_3. The simplest basis is the following: 1, t, t^2, and t^3. All third-degree polynomials will be some linear combination of these four elements. In vector notation, we say that a basis for P_3 is given by:

(1, 0, 0, 0), \quad (0, 1, 0, 0), \quad (0, 0, 1, 0), \quad (0, 0, 0, 1)

Now that we have a basis for our domain spacer , the next step is to feed the elements of it into the linear transformation to see what it does to them. Our linear transformation is, "take the first derivative of the element." So to find the "image" of each element, we just take the first derivative.

The first element of the basis is 1, whose coordinate vector is (1, 0, 0, 0). Its derivative is just zero, so the transformation D maps (1, 0, 0, 0) to (0, 0, 0). Our second element is t, with coordinate vector (0, 1, 0, 0); its derivative is just one, so D maps (0, 1, 0, 0) to (1, 0, 0). Similarly for our third and fourth basis elements, t^2 and t^3: their derivatives are 2t and 3t^2, so D maps (0, 0, 1, 0) to (0, 2, 0) and (0, 0, 0, 1) to (0, 0, 3). (Here the image vectors are coordinates with respect to the basis 1, t, t^2 of P_2.)

Applying our transformation to the four basis vectors, we get the following four images under D:

D(1) = (0, 0, 0)

D(t) = (1, 0, 0)

D(t^2) = (0, 2, 0)

D(t^3) = (0, 0, 3)

Now that we've applied our linear transformation to each of our four basis vectors, we next collect the resulting images into the columns of a matrix. This is the matrix we're looking for -- it fully describes the action of differentiation for any third-degree polynomial in one simple matrix.

Collecting our four image vectors into a matrix, we have:

D = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}

This matrix gives the linear algebra view of differentiation from calculus. Using it, we can find the derivative of any polynomial of degree three by expressing it as a vector and multiplying by this matrix.

For example, consider the polynomial p(t) = 2t^3 + 4t^2 + 5t + 7. Note that the first derivative of this polynomial is p'(t) = 6t^2 + 8t + 5; we'll use this in a minute. In vector form, with respect to the basis 1, t, t^2, t^3, this polynomial can be written as:

p = (7, 5, 4, 2)

To find its derivative, we simply multiply this vector by our D matrix from above:

D p = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 7 \\ 5 \\ 4 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ 8 \\ 6 \end{pmatrix}

which represents 5 + 8t + 6t^2 -- exactly the first derivative of our polynomial function!
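
Here's the same computation as a quick NumPy sketch -- building the 3 x 4 matrix D and applying it to the coefficient vector of the example polynomial:

    # Apply the differentiation matrix D : P_3 -> P_2 to the coefficients of
    # 2t^3 + 4t^2 + 5t + 7 (ordered from the constant term up).
    import numpy as np

    D = np.array([[0, 1, 0, 0],
                  [0, 0, 2, 0],
                  [0, 0, 0, 3]])

    p = np.array([7, 5, 4, 2])   # coefficients of 7 + 5t + 4t^2 + 2t^3
    print(D @ p)                 # [5 8 6], i.e. 5 + 8t + 6t^2 -- the derivative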

This is a powerful tool. By recognizing that differentiation is just a linear transformation -- as is integration, which follows a similar argument that I'll leave as an exercise -- we can see it's really just a rule that linearly maps functions in P_n to functions in P_{n-1}.

In fact, all m x n matrices can be understood in this way. That is, an m x n matrix is just a linear mapping that sends vectors in R^n into R^m. In the case of the example above, we have a 3 x 4 matrix that sends polynomials in P_3 (such as at^3 + bt^2 + ct + d, which has four coefficients) into the space of first derivatives in P_2 (in this case, 3at^2 + 2bt + c, which has three coefficients).

For more on linear transformations, here's a useful lecture from MIT's Gilbert Strang.

Posted by Andrew on Wednesday April 21, 2010 | Feedback?



* * *


A Simple Way of Solving Exact Ordinary Differential Equations

Exact differential equations are interesting and easy to solve. But you wouldn't know it from the way they're taught in most textbooks. Many authors stumble through pages of algebra trying to explain the method, leaving students baffled.

Thankfully, there's an easier way to understand exact differential equations. Years ago, I tried to come up with the simplest possible way of explaining the method. Here's what I came up with.

The entire method of solving exact differential equations can be boiled down to the diagram below: "Exact ODEs in a Nutshell."

Recall that exact ODEs are ones we can write as M(x,y) + N(x,y)*y' = 0, where M and N are continuous functions and y' is dy/dx. Here is how to read the diagram.

Starting with an exact ODE, we're on the second line labeled "starting point." We have functions M and N, and our goal is to move upward toward the top line labeled "goal." That is, given an exact ODE, we want to find a solution F(x,y) = c whose first partial derivatives are Fx (which is just the function M) and Fy (which is the function N).

[Diagram: "Exact ODEs in a Nutshell" -- top line ("goal"): the solution F(x,y) = c, with partials Fx = M and Fy = N; middle line ("starting point"): the equation M(x,y) + N(x,y)*y' = 0; bottom line ("test for exactness"): My = Nx.]

Before we do anything, we check that our equation is really exact. To do this, we move to the bottom line labeled "test for exactness." That is, we take the derivative of Fx = M with respect to y (giving us Fxy = My), take the derivative of Fy = N with respect to x (giving us Fyx = Nx), and set these equal to each other. A basic theorem from calculus (Clairaut's theorem) says that the mixed partial derivatives Fxy and Fyx are equal for any function F(x,y) with continuous second partial derivatives. So if My = Nx, the function F(x,y) on the top line is guaranteed to exist.

Now we can solve for the function F(x,y). The diagram makes it easy to see how. We know M(x,y) is just the first partial derivative of F with respect to x, so we can move upward toward F(x,y) by integrating M with respect to x. Similarly, we know N(x,y) is just the first partial derivative of F(x,y) with respect to y, so we can find another candidate for F by integrating N with respect to y.

In the end, we'll have two candidates for F(x,y). Sometimes they're the same, in which case we're done. Sometimes they're different, as one will have a term the other won't have -- a term that got dropped to zero as we differentiated from F(x,y) to either Fx or Fy, since it's a function of only one of x or y. This is easy to solve: just combine all the terms from both candidates for F(x,y), omitting any duplicate terms. This will be our solution F(x,y) = c.
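
Here's a minimal SymPy sketch of the whole recipe, applied to the exact equation (2xy + 3x^2) + (x^2 + 2y)y' = 0, an example chosen just for illustration:

    # Solve an exact ODE: test for exactness, integrate M and N, then merge
    # the two candidates for F while dropping the duplicated term.
    import sympy as sp

    x, y = sp.symbols("x y")
    M = 2*x*y + 3*x**2
    N = x**2 + 2*y

    # Test for exactness: My must equal Nx.
    assert sp.simplify(sp.diff(M, y) - sp.diff(N, x)) == 0

    F1 = sp.integrate(M, x)                            # x**3 + x**2*y
    F2 = sp.integrate(N, y)                            # x**2*y + y**2
    missing = F2 - sp.integrate(sp.diff(F1, y), y)     # the term F1 dropped: y**2
    F = sp.expand(F1 + missing)
    print(F)                                           # x**3 + x**2*y + y**2 = c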

Try using this method on a few examples here. I think you'll find it's much simpler -- and easier to remember years later -- than the round-about method used in most textbooks.

Posted by Andrew on Tuesday April 13, 2010 | Feedback?



* * *


A More Elegant View of the Correlation Coefficient

One of the first things students learn in statistics is the "correlation coefficient" r, which measures the strength of the relationship between two variables. The formula given in most textbooks is something like the following:

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

where x and y are the data sets we're trying to measure the correlation of.

This formula can be useful, but also has some major disadvantages. It's complex, hard to remember, and gives students almost no insight into what the correlation coefficient is really measuring. In this post I'll explain an alternative way of thinking about "r" as the cosine of the angle between two vectors. This is the "linear algebra view" of the correlation coefficient.

A Different View of Correlation
The idea behind the correlation coefficient is that we want a standard measure of how "related" two data sets x and y are. Rather than thinking about data sets, imagine instead that we place our x and y data into two vectors u and v. These will be two n-dimensional arrows pointing through space. The question is: how "similar" are these arrows to each other? As we'll see below, the answer is given by the correlation coefficient between them.

The figure below illustrates the idea of measuring the "similarity" of two vectors v1 and v2. In the figure, the vectors are separated by an angle theta. A pretty good measure of how "similar" they are is the cosine of theta. Think about what cosine is doing. If both v1 and v2 point in roughly the same direction, the cosine of theta will be about 1. If they point in opposite directions, it will be -1. And if they are perpendicular or orthogonal, it will be 0. In this way, the cosine of theta fits our intuition pretty well about what it means for two vectors to be "correlated" with each other.

[Figure: two vectors v1 and v2 separated by an angle theta; projecting v1 onto v2 gives a vector p, forming a right triangle.]

What is the cosine of theta in the figure? From the geometry of right triangles, recall that the cosine of an angle is the ratio of the length of the adjacent side to the length of the hypotenuse. In the figure, we form a right triangle by projecting the vector v1 down onto v2. This gives us a new vector p. The cosine of theta is then given by:

\cos \theta = \frac{\|p\|}{\|v_1\|} = \frac{v_1 \cdot v_2}{\|v_1\| \, \|v_2\|}

Now suppose we're interested in the correlation between two data sets x and y. Imagine we normalize x and y by subtracting from each data point the mean of the data set. Let's call these new normalized data sets u and v. So we have:

u = (x_1 - \bar{x}, \, x_2 - \bar{x}, \, \ldots, \, x_n - \bar{x})

v = (y_1 - \bar{y}, \, y_2 - \bar{y}, \, \ldots, \, y_n - \bar{y})

The question is, how "correlated" or "similar" are these vectors u and v to each other in space? That is, what is the cosine of the angle between u and v? This is simple: from the formula derived above, the cosine is given by:

\cos \theta = \frac{u \cdot v}{\|u\| \, \|v\|} = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2} \, \sqrt{\sum_{i=1}^{n} v_i^2}}

But since u_i = x_i - \bar{x} and v_i = y_i - \bar{y}, this means the cosine of theta is just the correlation coefficient between the two vectors u and v, or:

\cos \theta = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = r

From this perspective, the correlation coefficient has an elegant geometric interpretation. If two data sets are positively correlated, they should roughly "point in the same direction" when placed into n-dimensional vectors. If they're uncorrelated, they should point in directions that are orthogonal to each other. And if they're negatively correlated, they should point in roughly opposite directions.

The cosine of the angle between two vectors nicely fits that intuition about correlation. So it's no surprise the two ideas are ultimately the same thing -- a much simpler interpretation of "r" than the usual textbook formulas.
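
Here's a short NumPy sketch of that equivalence, using made-up data: center the two data sets, compute the cosine of the angle between them, and compare with the usual correlation coefficient:

    # The cosine of the angle between the mean-centered vectors equals r.
    import numpy as np

    x = np.array([2.0, 4.0, 6.0, 9.0, 11.0])   # arbitrary example data
    y = np.array([1.0, 3.0, 2.0, 7.0, 9.0])

    u = x - x.mean()                            # centered x
    v = y - y.mean()                            # centered y
    cos_theta = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    print(cos_theta)                            # cosine of the angle between u and v
    print(np.corrcoef(x, y)[0, 1])              # the same number, NumPy's r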

Posted by Andrew on Saturday March 13, 2010 | Feedback?



* * *


Deriving the Logistic Population Growth Model

Note: This article has been updated to correct an algebra typo as of September 6, 2012.

Developing a good model for population growth is a pretty common problem faced by economists doing applied work. In this post, I'll walk through the derivation of a simple but flexible model I've used in the past known as the "logistic" model of population growth.

Some Background
I first ran into the problem of modeling population growth when building a gas tax forecasting model for the City of Seattle. In Washington State, gas taxes are first collected at the state level, and revenue is then distributed to cities based on population. From the standpoint of a particular city, population shifts between areas can have a big impact on gas tax revenue. This may not matter much for short-term forecasts, but it can have a huge effect on 20-year forecasts that serve as the basis for the city's long-term transportation plans.

To address this issue, we developed a model of city-level population growth as part of the larger tax revenue forecasting model, dramatically improving the quality of our long-term forecasts. In the rest of this post, I'll walk through the derivation of the population model we ended up using.

The Naïve Model: Exponential Growth
The simplest way of modeling population is to assume "exponential" growth. That is, just assume population grows by some annual rate, forever. If we let "y" be a city's population and "k" be the annual growth rate, the exponential growth model is given by

\frac{dy}{dt} = k y

This is a simple first-order differential equation. We can solve this for "y" by using a technique called "separation of variables". First, we separate variables like this:

\frac{dy}{y} = k \, dt

Then we integrate both sides and solve for y, as follows:

\int \frac{dy}{y} = \int k \, dt

\ln |y| = k t + C

y = e^{k t + C} = e^{C} e^{k t}

Since C is an arbitrary constant, we can simply relabel e^C as C, which gives us

y = C e^{k t}

where k is the annual growth rate, t is the number of years from today, and C is the population at time t=0. This is the famous "exponential growth" model.
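
As a quick illustration, here's a small Python sketch of the model with made-up values for the starting population and growth rate (not the actual Seattle inputs):

    # Project y(t) = C * e^(k*t) over a 20-year horizon.
    import numpy as np

    C = 600_000          # population at t = 0 (hypothetical)
    k = 0.02             # 2% annual growth rate (hypothetical)
    t = np.arange(0, 21)
    y = C * np.exp(k * t)
    print(y[[0, 10, 20]].round())   # population today, in 10 years, in 20 years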

While the exponential model is useful for short-term forecasts, it gives unrealistic estimates over long time periods. After just a few decades of compounding, the projected population becomes implausibly large.
