|
We provide large graphs that can be accessed using WebGraph.
Please read the tutorial
to learn how to use this format.
If you publish results based on these graphs, please acknowledge the use of WebGraph by citing the following paper:
@inproceedings{BoVWFI,
author ="Paolo Boldi and Sebastiano Vigna",
title = "The {W}eb{G}raph Framework {I}: {C}ompression Techniques",
year = 2004,
booktitle="Proc. of the Thirteenth International World Wide Web Conference (WWW 2004)",
address="Manhattan, USA",
pages="595--601",
publisher="ACM Press"
}
If the graphs you are using were gathered by UbiCrawler, please acknowledge the use of UbiCrawler by citing the following paper:
@article{BCSU3,
author="Paolo Boldi and Bruno Codenotti and Massimo Santini and Sebastiano Vigna",
title="UbiCrawler: A Scalable Fully Distributed Web Crawler",
journal="Software: Practice \& Experience",
year=2004,
volume=34,
number=8,
pages="711--726"
}
This table provides basic information about the available graphs, such as the crawl date,
the number of nodes and arcs, and the number of bits per link of the highly compressed version
(the version we provide for general usage has faster random access but a worse compression ratio).
We report, when available, the maximum depth per host and the maximum number of URLs per host.
The last column links to the institution which provided support for the crawl.
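As a rough illustration of the bits-per-link figure, the sketch below divides the size of a compressed graph, in bits, by its number of arcs. The function name `bits_per_link` and the sample figures are hypothetical and do not come from any graph in the table. Exactly which files count toward the compressed size is an assumption here.

```python
def bits_per_link(compressed_bytes: int, num_arcs: int) -> float:
    """Average number of bits used to store one arc (link).

    compressed_bytes: size of the compressed graph data in bytes
                      (assumed here to be the only thing counted).
    num_arcs:         number of arcs in the graph.
    """
    if num_arcs <= 0:
        raise ValueError("num_arcs must be positive")
    return compressed_bytes * 8 / num_arcs


# Hypothetical figures, not taken from any real dataset.
example = bits_per_link(compressed_bytes=125_000_000, num_arcs=400_000_000)
print(f"{example:.2f} bits per link")
```

A lower value means better compression. The general-usage version of a graph would yield a higher figure than the highly compressed one.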
|