WARNING: Release 3.0 has minor binary incompatibilities with previous releases. These are mainly due to the move from the interface it.unimi.dsi.util.LongBigList to the now standard it.unimi.dsi.fastutil.longs.LongBigList. Release 3.0 is part of a parallel release of fastutil, the DSI Utilities, Sux4J, MG4J, WebGraph, etc., all of which were modified to fit the new interface. It comes in two versions: the standard version and the big version, which supports more than 2³¹ nodes. Please read our (short) "Moving Java to Big Data" document for details.

Introduction

WebGraph is a framework for graph compression aimed at studying web graphs. It provides simple ways to manage very large graphs by exploiting modern compression techniques. More precisely, it currently consists of:

  1. A set of flat codes, called ζ codes, which are particularly suitable for storing web graphs (or, in general, integers with power-law distribution in a certain exponent range). The fact that these codes work well can be easily tested empirically, but we also try to provide a detailed mathematical analysis.
  2. Algorithms for compressing web graphs that exploit gap compression and referentiation (à la LINK), intervalisation and ζ codes to provide a high compression ratio (see our datasets). The algorithms are controlled by several parameters, which provide different tradeoffs between access speed and compression ratio.
  3. Algorithms for accessing a compressed graph without actually decompressing it, using lazy techniques that delay the decompression until it is actually necessary.
  4. A complete, documented implementation of the algorithms above in Java, distributed under the GNU General Public License. Besides a clearly defined API, we also provide several classes that modify (e.g., transpose) or recompress a graph, so as to experiment with various settings.
  5. Datasets for very large graphs (e.g., a billion links). These are either gathered from public sources (such as WebBase) or produced by UbiCrawler.
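The following is a minimal Java sketch of ζ_k coding, based on the usual definition of these codes. A positive integer x in the interval [2^(hk), 2^((h+1)k)) is written as h + 1 in unary (h zeros, then a one), followed by x − 2^(hk) in minimal binary over that interval. The class `ZetaCode` and its methods are hypothetical names invented for this illustration and are not WebGraph's API. To keep the example readable, bits are held in a `String` rather than written to a real bit stream. With k = 1, the code reduces to Elias γ.

```java
// Toy zeta_k coder for positive integers (hypothetical class, not WebGraph's API).
// Codewords are Strings of '0'/'1' purely for readability.
final class ZetaCode {

    // Appends the low `width` bits of `value`, most significant bit first.
    static void appendBits(StringBuilder sb, long value, int width) {
        for (int i = width - 1; i >= 0; i--) {
            sb.append(((value >>> i) & 1L) == 1L ? '1' : '0');
        }
    }

    // Encodes x >= 1 with the zeta_k code (k >= 1).
    static String encode(long x, int k) {
        if (x < 1 || k < 1) throw new IllegalArgumentException("need x >= 1 and k >= 1");
        int h = (63 - Long.numberOfLeadingZeros(x)) / k; // x lies in [2^(hk), 2^((h+1)k))
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < h; i++) sb.append('0');      // h + 1 in unary: h zeros, then a one
        sb.append('1');
        long left = 1L << (h * k);                        // 2^(hk), start of the interval
        long m = x - left;                                // offset inside the interval
        // Minimal binary code: small offsets take hk + k - 1 bits, the others hk + k bits.
        if (m < left) appendBits(sb, m, h * k + k - 1);
        else appendBits(sb, m + left, h * k + k);
        return sb.toString();
    }

    // Reads `width` bits starting at pos[0] and advances the cursor.
    static long readBits(String bits, int[] pos, int width) {
        long v = 0;
        for (int i = 0; i < width; i++) v = (v << 1) | (bits.charAt(pos[0]++) - '0');
        return v;
    }

    // Decodes one zeta_k codeword starting at pos[0] and advances the cursor past it.
    static long decode(String bits, int[] pos, int k) {
        int h = 0;
        while (bits.charAt(pos[0]++) == '0') h++;         // unary part gives h
        long left = 1L << (h * k);
        long v = readBits(bits, pos, h * k + k - 1);
        if (v >= left) v = ((v << 1) | readBits(bits, pos, 1)) - left; // long codeword
        return left + v;
    }

    public static void main(String[] args) {
        for (long x = 1; x <= 8; x++) {
            System.out.println(x + " -> " + encode(x, 2));
        }
    }
}
```

For k = 2, `main` prints the codewords 10, 110, 111, 01000, 01001, 01010, 01011, 011000 for x = 1, ..., 8. The codeword length grows in steps of k + 1 bits every time x crosses a power of 2^k. This is why a suitable k fits integers whose distribution follows a power law.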
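Gap compression and lazy access can be illustrated with a simplified Java sketch for a single sorted successor list. The sketch assumes the following scheme. The first successor is stored relative to the source node x and, since it may be smaller than x, is folded onto the natural numbers. Every later successor is stored as its difference from the previous one, minus one. The sketch covers only this step. It omits referentiation and intervalisation, and the gaps sit in a plain `long[]` instead of a ζ-coded bit stream. `GapSuccessors`, `toGaps` and `LazySuccessors` are hypothetical names, not WebGraph classes.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Toy gap encoding of one sorted successor list, plus a lazy decoder (hypothetical names).
final class GapSuccessors {

    // Folds an integer onto the naturals: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
    static long fold(long v) { return v >= 0 ? 2 * v : -2 * v - 1; }

    // Inverse of fold.
    static long unfold(long n) { return (n & 1) == 0 ? n >>> 1 : -((n + 1) >>> 1); }

    // First successor relative to the source node x (folded, as it may be negative);
    // each later successor as the difference from its predecessor minus one.
    static long[] toGaps(int x, int[] sortedSuccessors) {
        long[] gaps = new long[sortedSuccessors.length];
        for (int i = 0; i < gaps.length; i++) {
            gaps[i] = i == 0
                ? fold((long) sortedSuccessors[0] - x)
                : (long) sortedSuccessors[i] - sortedSuccessors[i - 1] - 1;
        }
        return gaps;
    }

    // Rebuilds successors on demand: each nextInt() call undoes exactly one gap,
    // so a caller that stops early never pays for the rest of the list.
    static final class LazySuccessors {
        private final int x;
        private final long[] gaps;
        private int next = 0;
        private long last;

        LazySuccessors(int x, long[] gaps) { this.x = x; this.gaps = gaps; }

        boolean hasNext() { return next < gaps.length; }

        int nextInt() {
            if (!hasNext()) throw new NoSuchElementException();
            last = next == 0 ? x + unfold(gaps[0]) : last + gaps[next] + 1;
            next++;
            return (int) last;
        }
    }

    public static void main(String[] args) {
        int x = 15;
        int[] successors = {13, 15, 16, 17, 18, 19, 23, 24, 203};
        long[] gaps = toGaps(x, successors);
        System.out.println(Arrays.toString(gaps));
        LazySuccessors it = new LazySuccessors(x, gaps);
        while (it.hasNext()) System.out.print(it.nextInt() + " ");
        System.out.println();
    }
}
```

For node 15, `main` prints the gaps [3, 1, 0, 0, 0, 0, 3, 0, 178] and then the original successors. Because web graphs show strong locality, most gaps are small. This is the kind of skewed distribution that compact integer codes handle well.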

In the end, with WebGraph you can access and analyse very large web graphs. Using WebGraph is as easy as installing a few jar files and downloading a dataset. This makes it very easy to study phenomena such as PageRank, the distribution of graph properties of the web graph, and so on.

You are welcome to use and improve WebGraph! If you find our software useful for your research, please cite this paper.

Installation

You can grab WebGraph from Maven Central.

Otherwise, you have to install the .jar file that comes with the distribution, together with its dependencies, which are gathered for your convenience in a tarball.

WebGraph++

Jacob Ratkiewicz has developed a C++ version of WebGraph that you might want to try. The library exposes a BVGraph as an object of the Boost Graph Library, so it is easy to integrate with other code.

pyWebgraph

Massimo Santini has developed a front-end that interfaces Jython with WebGraph. It makes exploring small portions of very large graphs very easy and interactive.

WebGraph for MATLAB®

David Gleich has developed a MATLAB® package to access WebGraph-encoded data easily.