Hacker Codex

Speed Up Compression via Parallel BZIP2 (PBZIP2)

By pure chance, this morning I came across a post that mentioned PBZIP2. Having never heard of it, of course I had to look it up. Crikey. File this one under “Why Didn’t Someone Tell Me About This Earlier?!”

“Wait a minute,” I said aloud to nobody in particular. “BZIP2 doesn’t support symmetric multi-processing? And there’s an alternate implementation that does take advantage of multiple CPUs?”

“Whiskey. Tango. Foxtrot.”

And after a few tests, I’ll be tarred and feathered if it ain’t true: the speed improvement was, as promised, roughly linear in the number of cores.

Installation

To install it via Homebrew on Mac OS X:

brew install pbzip2

To install it on Ubuntu or Debian:

sudo aptitude install pbzip2

The pbzip2 binary should now be available. Refer to the manpage for the gory details.

Testing

Using a 91 MB tar archive as my test file, I ran the following commands on a quad-core 2.93 GHz i7 running Mac OS X 10.7 (Lion) to see whether there was indeed any improvement in compression speed:

time bzip2 -k testfile.tar
time pbzip2 -k testfile.tar

The results: 18.7 seconds for bzip2, and… wait for it… 3.5 seconds for pbzip2. That represents an 81% reduction in compression time, or a roughly five-fold increase in speed, in this particular test.
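
For anyone who wants to check the arithmetic, both figures fall straight out of the two timings. Here is a quick sketch using awk; the speedup and reduction function names are my own invention for illustration, not part of bzip2 or pbzip2.

```shell
# Hypothetical helpers (illustrative names, not part of any tool).
# speedup OLD NEW   -> how many times faster NEW is than OLD
# reduction OLD NEW -> percentage of time saved by NEW
speedup() {
  awk -v t_old="$1" -v t_new="$2" 'BEGIN { printf "%.1f\n", t_old / t_new }'
}
reduction() {
  awk -v t_old="$1" -v t_new="$2" 'BEGIN { printf "%.0f\n", (1 - t_new / t_old) * 100 }'
}

speedup 18.7 3.5     # prints 5.3
reduction 18.7 3.5   # prints 81
```

Plug in your own timings to see how your machine compares.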

While decompression speed increases weren’t nearly as dramatic, pbzip2 decompression appears to be faster than stock bzip2.
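
If you want to measure decompression on its own, one approach is to throw the output away so that disk writes don’t muddy the numbers. The sketch below assumes a helper name of my own choosing (decompress_only). As far as I understand it, pbzip2 can only decompress in parallel those archives that pbzip2 itself created, so which tool produced the archive may matter a great deal here.

```shell
# Hypothetical helper: decompress ARCHIVE with COMPRESSOR and discard the
# output, so only the decompression work itself is measured.
# Usage: decompress_only COMPRESSOR ARCHIVE
decompress_only() {
  "$1" -dc "$2" > /dev/null
}

# In bash, wrap it with the time keyword (archive name assumed from the
# compression test above):
# time decompress_only bzip2 testfile.tar.bz2
# time decompress_only pbzip2 testfile.tar.bz2
```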

New Aliases

I don’t want to have to remember to specifically use the pbzip2 command, so I decided to add some aliases to my .bashrc. First, let’s detect whether pbzip2 is installed and available:

# Check to see if pbzip2 is already on path; if so, set BZIP_BIN appropriately
type -P pbzip2 &>/dev/null && export BZIP_BIN="pbzip2"
# Otherwise, default to standard bzip2 binary
if [ -z "$BZIP_BIN" ]; then
  export BZIP_BIN="bzip2"
fi
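
A side note: type -P is specific to bash. If you’d like the same detection to work in other POSIX-style shells, something along these lines should do the trick. Treat it as a sketch; first_available is a name I made up, and command -v will also match aliases and functions, not just binaries on the path.

```shell
# Hypothetical helper: print the first argument that resolves to a command.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Prefer pbzip2, fall back to bzip2; BZIP_BIN stays empty if neither exists.
BZIP_BIN=$(first_available pbzip2 bzip2) || echo "no bzip2 implementation found" >&2
export BZIP_BIN
```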

Using the above logic, I set bz as an alias to pbzip2 if available, and if not, to bzip2:

alias bz="$BZIP_BIN"

I usually compress directories more often than individual files, so I added some commands to quickly compress directories and expand bzipped tarballs:

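# Compress a directory into <name>.tbz using whichever bzip2 was detected
tarb() {
  tar -cf "$1.tbz" --use-compress-program="$BZIP_BIN" "$1"
}
# Expand a bzipped tarball, skipping "._*" AppleDouble metadata files
untarbzip() {
  "$BZIP_BIN" -dc "$1" | tar -xf - --exclude="._*"
}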
alias buntar=untarbzip

Usage:

bz myfile
tarb mydirectory
buntar mytarball.tbz
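
Before trusting any new compressor with files I care about, I like to confirm that what comes out is what went in. Here’s a minimal sketch of such a check; roundtrip_ok is a hypothetical name, and it relies only on the -c and -d options that bzip2 and pbzip2 share.

```shell
# Hypothetical helper: compress FILE to a scratch copy, decompress that copy,
# and compare the result byte-for-byte with the original.
# Usage: roundtrip_ok COMPRESSOR FILE
roundtrip_ok() {
  rt_dir=$(mktemp -d "${TMPDIR:-/tmp}/roundtrip.XXXXXX") || return 1
  "$1" -c "$2" > "$rt_dir/copy.bz2" && "$1" -dc "$rt_dir/copy.bz2" | cmp -s - "$2"
  rt_status=$?
  rm -rf "$rt_dir"
  return "$rt_status"
}

# Example (assumes BZIP_BIN is set and myfile exists):
# roundtrip_ok "$BZIP_BIN" myfile && echo "round trip OK"
```

The original file is never touched, since everything happens in a throwaway directory.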

Got a better method?

Have you had any experience with parallelized bzip2 compression? Sound off in the comments!


