Posted to Tutorials  /  Tags: R, treemap

An Easy Way to Make a Treemap

By Nathan Yau

If your data is a hierarchy, a treemap is a good way to show all the values at once and keep the structure in the visual. This is a quick way to make a treemap in R.

Back in 1990, Ben Shneiderman, of the University of Maryland, wanted to visualize what was going on in his always-full hard drive. He wanted to know what was taking up so much space. Given the hierarchical structure of directories and files, he first tried a tree diagram. It got too big too fast to be useful though. Too many nodes. Too many branches.

The treemap was his solution. It's an area-based visualization where the size of each rectangle represents a metric. It has since been made popular by Martin Wattenberg's Map of the Market and Marcos Weskamp's newsmap.

Here's a really easy way to make your own treemap in just a couple lines of code. We're looking to make something like the above.

Step 0. Download R

Like before, we're going to use R, so you'll want to get it before going any further. Download it for Windows, Mac, or Linux. Don't let the outdated site fool you. You can get a lot done with the free software.

Step 1. Load the Data

We'll use data covering a hundred popular posts on FlowingData. Here it is in CSV format. You don't have to download it though. We'll just load it directly into R. The main thing to take note of is what is there. There's post id, number of views, number of comments, and category.

Okay, let's load it into R using read.csv():

data <- read.csv("http://datasets.flowingdata.com/post-data.txt")

Loading data in CSV format into R.

Easy enough. We just used the read.csv() function to load data from a URL. If your data is on your computer, you could also do something like data <- read.csv("post-data.txt"). Just make sure the data file is in your current working directory, which you can change via the "Miscellaneous" menu.
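If you'd rather set the working directory from the console than from a menu, a minimal sketch looks like this. It writes a tiny stand-in file to a temporary folder so it runs anywhere. The three rows are made up for illustration and only mimic the column names described above (id, views, comments, category); they are not the real post data.

```r
# Made-up rows that mimic the columns of the post data (not the real values).
sample_rows <- data.frame(
  id       = c(1, 2, 3),
  views    = c(1200, 800, 450),
  comments = c(10, 25, 3),
  category = c("Mistaken Data", "Ugly Visualization", "Mistaken Data")
)

# Make a temporary folder the working directory; setwd() returns the old one.
old_dir <- setwd(tempdir())
write.csv(sample_rows, "post-data.txt", row.names = FALSE)

# With the file in the working directory, a bare filename is enough.
local_data <- read.csv("post-data.txt")
str(local_data)   # one row per post, four columns
getwd()           # shows where R is currently looking for files

# Restore the previous working directory.
setwd(old_dir)
```

With your own file, you'd skip the stand-in rows and just call setwd() with the folder that holds your data before read.csv().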

Step 2. Load the Portfolio package

Only a few more lines of code, and you've got a treemap. It's so easy, because we're going to use the portfolio library in R. First, you have to install it. You can either install the library via the "Package Installer" or you can do it through the command line. Let's do the latter. Type this in the console to install portfolio:

install.packages("portfolio")

Once installed, load it into R:

library(portfolio)

Step 3. Make the Treemap

It's time to make the treemap with map.market(). Type this in the console:

map.market(id=data$id, area=data$views, group=data$category, color=data$comments, main="FlowingData Map")

Tada. You should get something like this:

The default treemap uses a red-green color scale.
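Because a treemap nests each rectangle inside its group, a group's block is as big as its members' areas added together. Totaling the area column by group is a quick sanity check on what you see. Here's a small sketch with made-up rows; the column names follow the post data, but the values and the category names are invented.

```r
# Invented example rows with the same column names as the post data.
posts <- data.frame(
  id       = 1:5,
  views    = c(1200, 800, 450, 300, 250),
  comments = c(10, 25, 3, 40, 8),
  category = c("Mapping", "Mapping", "Statistics",
               "Ugly Visualization", "Statistics")
)

# Total views per category: the relative size of each group in the treemap.
views_by_category <- aggregate(views ~ category, data = posts, FUN = sum)

# Share of the whole map that each category would occupy.
views_by_category$share <- views_by_category$views / sum(views_by_category$views)

# Largest group first.
views_by_category <- views_by_category[order(-views_by_category$views), ]
print(views_by_category)
```

If a category looks bigger or smaller in the treemap than you expected, a table like this tells you whether that's the data or the drawing.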

To sum up, we did this with four lines of code:

data <- read.csv("http://datasets.flowingdata.com/post-data.txt")
install.packages("portfolio")
library(portfolio)
map.market(id=data$id, area=data$views, group=data$category, color=data$comments, main="FlowingData Map")

Step 4. Customize

Now maybe you want to modify something like color. The cool thing about R is that you can see the code for all the functions, edit it, and then use your customized version. If the green and red scheme isn't for you or you don't care about the positive/negative cutoff, then you can change the code to do that. I won't go into detail, but if you type map.market in the console, you'll see the function. You can change color or cutoff around lines 36-46.

For example, you can do a black and white color scheme:

You don't have to stick to the default color scale though.
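If you do go in and edit the colors, base R can generate a scale between any two endpoints. This is a sketch of one way to build such a scale, not the code inside map.market. The helper name make_scale and the comment counts are hypothetical.

```r
# Hypothetical helper: n evenly spaced colors running from `low` to `high`.
make_scale <- function(low, high, n = 5) {
  colorRampPalette(c(low, high))(n)
}

black_white <- make_scale("black", "white")
black_green <- make_scale("black", "green")

print(black_white)
print(black_green)

# Map values onto the scale: bin each count, then look up that bin's color.
comments <- c(3, 10, 25, 40)   # made-up values
bins <- cut(comments, breaks = 5, labels = FALSE)
print(black_green[bins])
```

The smallest value lands on the low end of the scale and the largest on the high end, so swapping the endpoint colors is all it takes to try a different scheme.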

I was alright with the green for this, so I saved it as a PDF and then loaded it into Illustrator as usual. I numbed the green some, cleaned up the labels with a new font and layout, and updated the legend.

Touched up version of treemap with black-green color scale.

And there you go - a treemap with just a few lines of code in our all-trusty R. Rinse and repeat with your own data.

For more examples, guidance, and all-around data goodness like this, order Visualize This, the FlowingData book on visualization, design, and statistics.

About the Author

Nathan Yau is a statistician who works primarily with visualization. He earned his PhD in statistics from UCLA, is the author of two best-selling books — Data Points and Visualize This — and runs FlowingData. Introvert. Likes food. Likes beer. Follow him @flowingdata.

49 Comments

  • jerome cukier — February 11, 2010 at 4:41 am

    then again, if R is too scaRy, there’s always www.many-eyes.com.

    • Nathan Yau — February 11, 2010 at 10:20 am

      if customization or data privacy isn’t an issue, always go with many eyes :)

      • Vlad G. — February 17, 2010 at 5:57 pm

        What about Data Applied? Their free plan keeps visualizations private (or you can embed in HTML to share). I find their stuff slightly better than Many Eyes, except for text mining.

  • Hrishi Mittal — February 11, 2010 at 5:55 am

    Thanks for another simple tutorial, Nathan. In your example, I find it interesting that the number of views doesn’t necessarily correlate with the number of comments. What’s even more striking is how topics like Ugly Visualization and Mistaken Data get more comments than some of the arguably more interesting topics. I guess people love to criticise?

    • Buck Field — February 11, 2010 at 7:34 am

      Hrishi,

      Your comment that people love to criticize is totally, completely, absolutely wrong in so many ways. ;)

    • Nathan Yau — February 11, 2010 at 10:04 am

      people looove to band together over something that is clearly wrong. it should be noted also though that mistaken data and ugly viz only have one post each in the top 100.

      another thing is like when stuff gets hot on digg or stumbleupon. they might bring a lot of views, but not a whole lot of discussion.

  • Adam Nieman — February 11, 2010 at 8:48 am

    Thanks very much for this helpful tutorial. You’ve finally convinced me to check out R.

    Though I love www.many-eyes.com it’s frustrating not to have total control of your treemap.

  • KaX505 — February 11, 2010 at 11:34 am

    I could recommend WinDirStat.
    It's free.
    It's easy to use.
    It really helps with finding what is filling up your hard drive.

    I have freed up a lot of GB with this app.
    Try it, you won't regret it.

  • Mark Diggory — February 11, 2010 at 12:39 pm

    If he needs to know what is using up most of the space on his hard drive…

    www.jgoodies.com/freeware/jdiskreport/

  • Kyle — February 11, 2010 at 1:01 pm

    I was just working on using mosaicplot() this morning to hack together a treemap, glad I checked FlowingData before getting too deep into it!

    Thanks for another great tutorial, though it is a little eerie that you posted this the exact day I needed it…. :)

    • Nathan Yau — February 11, 2010 at 1:25 pm

      i’m waaaatttccching you.

  • Marc Smith — February 11, 2010 at 4:53 pm

    An Excel add-in to create treemaps also exists:

    research.microsoft.com/en-us/downloads/3f3ed95e-26d8-4616-a06c-b609df29756f/default.aspx

  • Pierce Presley — February 11, 2010 at 5:05 pm

    This is great! I love simple tutorials that lead to tangible results, and this is one of the best.
    I might have liked a short explanation of how to change the colors, but that would have definitely been icing–the cake was already there.
    Again, good job.

  • Eofhan — February 11, 2010 at 6:41 pm

    I downloaded & started learning R after the NBA heatmap tutorial. I had no idea there were straight-forward, powerful, free tools available. So, thanks for the introduction!

    With regard to the treemap — I’m guessing putting things like mouse-over titles onto the squares can’t be done in R (or is easier to do elsewhere). Is there a good, inexpensive, OS X -compatible tool for doing that?

  • Vianney — February 11, 2010 at 7:01 pm

    For those that find `R’ a bit daunting, a wonderful, aesthetically pleasing Treemap interface can be found here:

    macrofocus.com/public/products/treemap/download/

    I think it’s one of the better Treemap softwares on the market.

    • Luc Girardin — May 5, 2010 at 4:02 am

      Many thanks for the kind words! We will do our best to make Macrofocus TreeMap even better ;-)

  • alex — February 12, 2010 at 5:20 am

    Nice post! More of this and pointers to more of this :)

  • Kevin — February 12, 2010 at 9:35 am

    Great tutorial, Nathan.

    Have you seen any implementation in which it can show more than 2 levels?

    Note: neither the Excel Add-In nor the Macrofocus product allow you to modify the graphic in another editor; it only exports it as a “picture”. So if you don’t like exactly what you see … tough. Graphics in R can be copied as a metafile, making it a snap to make changes in Illustrator or even PowerPoint.

    • Luc Girardin — May 5, 2010 at 4:05 am

      Dear Kevin,

      There is a function in Macrofocus TreeMap to export into PowerPoint native format (and therefore allow modifications). It is in beta right now, but we would be happy to make it available to you…

  • Daniel — February 12, 2010 at 1:50 pm

    Great post!

    Any ideas for how to get CSV info on your file structure? We use an AFS implementation, and so if there’s a nice way to query all of the substructure of a given directory for size information and dump that into a CSV, I’d be set, but I’m not sure how to go about actually collecting the data. I can get at the directory structure as both a mounted drive in Windows or Mac, or I can get at it via a Unix box over SSH. Thanks!

  • Stephane — February 12, 2010 at 2:24 pm

    Thanks, great tuto! Simple and efficient. I guess the font used for your last result is Adobe Avenir, I managed to change colours, I will manage to move the gradient bar below, but… How do you make the text go on a new line? for long sentences, it goes off the square too easily… Cheers

    • Nathan Yau — February 12, 2010 at 3:10 pm

      @Stephane – Avenir, correct. I did the text stuff in Illustrator, but if you wanted to do that in R, you’d have to find the word length, find the width of the rectangle it is a label for, and then split the word accordingly, by space or hyphen.
