Speed Up Compression via Parallel BZIP2 (PBZIP2)
By pure chance, this morning I came across a post that mentioned PBZIP2. Having never heard of it, of course I had to look it up. Crikey. File this one under “Why Didn’t Someone Tell Me About This Earlier?!”
“Wait a minute,” I said aloud to nobody in particular. “BZIP2 doesn’t support symmetric multi-processing? And there’s an alternate implementation that does take advantage of multiple CPUs?”
“Whiskey. Tango. Foxtrot.”
And after a few tests, I’ll be tarred and feathered if it ain’t true: the speed improvement was, as promised, roughly linear in the number of cores.
Installation
To install it via Homebrew on Mac OS X:
brew install pbzip2
To install it on Ubuntu or Debian:
sudo aptitude install pbzip2
The pbzip2 binary should now be available. Refer to the manpage (man pbzip2) for the gory details.
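A quick sanity check can confirm the install and show how many cores pbzip2 has to work with, since that is what the speedup scales with. This is a sketch rather than part of the original post; it assumes `getconf _NPROCESSORS_ONLN` is available, which it is on both Linux and Mac OS X.

```shell
#!/usr/bin/env bash
# Post-install sanity check (a sketch, not from the original post).
if command -v pbzip2 >/dev/null 2>&1; then
  echo "pbzip2 found at $(command -v pbzip2)"
else
  echo "pbzip2 not found on PATH" >&2
fi

# Number of CPUs currently online; getconf works on Linux and Mac OS X
echo "online CPUs: $(getconf _NPROCESSORS_ONLN)"
```

pbzip2 autodetects the processor count by default; its -p option (e.g. `pbzip2 -p2 file`) caps it if you want to leave some cores free.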
Testing
Using a 91 MB tar archive as my test file, I ran the following commands on a quad-core 2.93 GHz i7 running Mac OS X 10.7 (Lion) to see whether there was indeed any improvement in compression speed:
time bzip2 -k testfile.tar
rm testfile.tar.bz2   # otherwise pbzip2 refuses to overwrite bzip2's output
time pbzip2 -k testfile.tar
The results: 18.7 seconds for bzip2, and… wait for it… 3.5 seconds for pbzip2. That represents an 81% reduction in compression time and a roughly 5.3-fold increase in speed in this particular test.
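Both figures follow directly from the two timings: the reduction is 1 − 3.5/18.7 ≈ 0.813, and the speedup is 18.7/3.5 ≈ 5.34. A tiny helper like the one below (hypothetical, not part of the original post) does the same arithmetic for your own measurements.

```shell
#!/usr/bin/env bash
# report_speedup BASELINE_SECONDS NEW_SECONDS (hypothetical helper)
# Prints the percentage reduction in wall-clock time and the speedup factor.
report_speedup() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    printf "%.1f%% less time, %.2fx speedup\n", (1 - b / a) * 100, a / b
  }'
}

report_speedup 18.7 3.5   # prints: 81.3% less time, 5.34x speedup
```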
While the decompression speedup wasn’t nearly as dramatic, pbzip2 does appear to decompress faster than stock bzip2.
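To compare decompression yourself, something along these lines times both tools on the same archive, sending the output to /dev/null so disk writes don’t skew the numbers. It is a sketch under a few assumptions: `testfile.tar.bz2` is a placeholder name, and `wall_seconds` is a hypothetical helper built on bash’s `time` keyword with `TIMEFORMAT=%R`. The pbzip2 documentation indicates that the parallel speedup on decompression mostly applies to archives pbzip2 itself created (it compresses independent blocks), so a file made by stock bzip2 may show little gain; check this against the version you install.

```shell
#!/usr/bin/env bash
# wall_seconds CMD...: print the wall-clock seconds CMD took (hypothetical helper).
# The command's own output is discarded; only bash's `time` report is printed.
wall_seconds() {
  local TIMEFORMAT=%R
  { time "$@" >/dev/null 2>&1; } 2>&1
}

archive="testfile.tar.bz2"   # placeholder: ideally an archive made by pbzip2
if [ -f "$archive" ]; then
  for bin in bzip2 pbzip2; do
    if command -v "$bin" >/dev/null 2>&1; then
      echo "$bin: $(wall_seconds "$bin" -dc "$archive")s"
    fi
  done
fi
```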
New Aliases
I don’t want to have to remember to specifically use the pbzip2 command, so I decided to add some aliases to my ~/.bashrc. First, let’s detect whether pbzip2 is installed and available:
# Check to see if pbzip2 is already on path; if so, set BZIP_BIN appropriately
type -P pbzip2 &>/dev/null && export BZIP_BIN="pbzip2"

# Otherwise, default to standard bzip2 binary
if [ -z "$BZIP_BIN" ]; then
  export BZIP_BIN="bzip2"
fi
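`type -P` is a bash extension. If the same snippet needs to work in a POSIX sh, dash, or zsh startup file, a variant built on `command -v` (which POSIX specifies) should behave the same way. This is an alternative sketch, not the author’s code; one difference is that `command -v` also reports aliases and functions, whereas `type -P` only searches the PATH.

```shell
# POSIX-friendly variant (a sketch): prefer pbzip2, fall back to bzip2
if command -v pbzip2 >/dev/null 2>&1; then
  BZIP_BIN="pbzip2"
else
  BZIP_BIN="bzip2"
fi
export BZIP_BIN
```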
Using the above logic, I set bz as an alias to pbzip2 if available, and to bzip2 if not:
alias bz=$BZIP_BIN
I usually compress directories more often than individual files, so I added some commands to quickly compress directories and expand bzipped tarballs:
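tarb() {
  local dir="${1%/}"  # strip a trailing slash: "dir/" would otherwise yield "dir/.tbz"
  tar -cf "$dir".tbz --use-compress-program="$BZIP_BIN" "$dir"
}
untarbzip() {
  # -f - reads the archive from stdin; ._* are macOS AppleDouble files
  "$BZIP_BIN" -dc "$1" | tar -x --exclude='._*' -f -
}
alias buntar=untarbzip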
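Because bz and tarb may end up using pbzip2 on one machine and plain bzip2 on another, it is worth confirming the archives are interchangeable. The bzip2 manpage states that it correctly decompresses a concatenation of compressed files, which is the kind of output pbzip2 produces. The round-trip check below is a hypothetical helper (`roundtrip_check` is not part of the original aliases): it compresses a scratch tree with whatever `BZIP_BIN` names (falling back to bzip2 if unset), extracts it with stock bzip2, and diffs the result.

```shell
#!/usr/bin/env bash
# roundtrip_check: hypothetical helper, not part of the original aliases.
roundtrip_check() {
  local bin="${BZIP_BIN:-bzip2}" work status=0
  work="$(mktemp -d)"

  # A tiny scratch tree to archive
  mkdir -p "$work/src/sub"
  echo "hello" > "$work/src/a.txt"
  echo "world" > "$work/src/sub/b.txt"

  # Compress the tar stream with $bin (pbzip2 or bzip2), reading from stdin
  (cd "$work" && tar -cf - src | "$bin" -c > src.tbz)

  # Extract with stock bzip2 into a separate directory
  mkdir "$work/out"
  bzip2 -dc "$work/src.tbz" | (cd "$work/out" && tar -xf -)

  if diff -r "$work/src" "$work/out/src" >/dev/null; then
    echo "round trip OK"
  else
    echo "round trip FAILED"
    status=1
  fi
  rm -rf "$work"
  return "$status"
}

roundtrip_check
```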
Usage:
bz myfile
tarb mydirectory
buntar mytarball.tbz
Got a better method?
Have you had any experience with parallelized bzip2 compression? Sound off in the comments!