Laboratory for Web Algorithmics

We provide large graphs that can be accessed using WebGraph. Please read the tutorial to learn how to use this format. If you publish results based on these graphs, please acknowledge the use of WebGraph by citing the following paper:

@inproceedings{BoVWFI,
  author ="Paolo Boldi and Sebastiano Vigna",
  title = "The {W}eb{G}raph Framework {I}: {C}ompression Techniques",
  year = 2004,
  booktitle="Proc. of the Thirteenth International World Wide Web Conference (WWW 2004)",
  address="Manhattan, USA",
  pages="595--601",
  publisher="ACM Press"
}

If the graphs you are using were gathered by UbiCrawler, please acknowledge the use of UbiCrawler by citing the following paper:

@article{BCSU3,
  author="Paolo Boldi and Bruno Codenotti and Massimo Santini and Sebastiano Vigna",
  title="UbiCrawler: A Scalable Fully Distributed Web Crawler",
  journal="Software: Practice \& Experience",
  year=2004,
  volume=34,
  number=8,
  pages="711--726"
}

The table below provides basic information about the available graphs: the crawl date, the number of nodes and arcs, and the number of bits per link of the highly compressed version (the version we provide for general use has faster random access but a worse compression ratio). When available, we also report the maximum depth per host and the maximum number of URLs per host. The last column credits the institution that provided support for the crawl.

Graph                     Crawl date       Nodes        Arcs           Bits/link  Max depth  Max URLs  Thanks
arabic-2005               2005             22 744 080   639 999 458    1.990      16         10 000    NagaokaUT
cnr-2000                  2000             325 557      3 216 152      2.838                           IIT
eu-2005                   2005             862 664      19 235 140     4.376                           DSI
in-2004                   2004             1 382 908    16 917 053     2.171                           NagaokaUT
indochina-2004            2004             7 414 866    194 109 311    1.472                           NagaokaUT
it-2004                   2004             41 291 594   1 150 725 436  1.999                           IIT
sk-2005                   2005             50 636 154   1 949 412 601  2.860      16         100 000   IIT
uk-2002                   2002             18 520 486   298 113 762    2.224                           IIT
uk-2005                   2005             39 459 925   936 364 282    1.701                           IIT
uk-2006-05                2006-05          77 741 046   2 965 197 340  2.125      16         50 000    DSI-DELIS
uk-2006-06                2006-06          80 644 902   2 481 281 617  2.242      16         50 000    DSI-DELIS
uk-2006-07                2006-07          96 395 298   3 030 665 444  2.495      16         50 000    DSI-DELIS
uk-2006-08                2006-08          100 751 978  3 250 153 746  2.470      16         50 000    DSI-DELIS
uk-2006-09                2006-09          106 288 541  3 871 625 613  2.224      16         50 000    DSI-DELIS
uk-2006-10                2006-10          93 463 772   3 130 910 405  2.061      16         50 000    DSI-DELIS
uk-2006-11                2006-11          106 783 458  3 479 400 938  2.101      16         50 000    DSI-DELIS
uk-2006-12                2006-12          103 098 631  3 768 836 665  2.079      16         50 000    DSI-DELIS
uk-2007-01                2007-01          108 563 230  3 929 837 236  1.982      16         50 000    DSI-DELIS
uk-2007-02                2007-02          110 123 614  3 944 932 566  2.023      16         50 000    DSI-DELIS
uk-2007-03                2007-03          107 565 084  3 642 701 825  2.059      16         50 000    DSI-DELIS
uk-2007-04                2007-04          106 867 191  3 790 305 474  2.057      16         50 000    DSI-DELIS
uk-2007-05                2007-05          105 896 555  3 738 733 648  1.950      16         50 000    DSI-DELIS
uk-union-2006-06-2007-05  2006-06-2007-05  133 633 040  5 507 679 822  2.644                           DSI-DELIS
webbase-2001              2001             118 142 155  1 019 903 190  3.078                           WebBase
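
As a rough guide to reading the table, the following Python sketch turns a row into two derived figures: the approximate size of the highly compressed graph (arcs times bits per link) and the average number of arcs per node. The function names are hypothetical and not part of WebGraph. The size estimate assumes that the bits/link figure covers only the compressed successor lists, so any auxiliary files, and the faster but larger version distributed for general use, will take more space.

```python
def parse_count(text: str) -> int:
    """Parse a count written with spaces as thousands separators, as in the table."""
    return int(text.replace(" ", ""))


def compressed_size_bytes(arcs: int, bits_per_link: float) -> float:
    """Estimate the size of the compressed arc data: arcs * bits/link, in bytes.

    Assumption: auxiliary files (offsets, properties) are not included.
    """
    return arcs * bits_per_link / 8


def average_outdegree(nodes: int, arcs: int) -> float:
    """Average number of arcs leaving a node."""
    return arcs / nodes


if __name__ == "__main__":
    # Values copied from the cnr-2000 row of the table.
    nodes = parse_count("325 557")
    arcs = parse_count("3 216 152")
    bits_per_link = 2.838

    size_mb = compressed_size_bytes(arcs, bits_per_link) / 1e6
    print(f"cnr-2000: roughly {size_mb:.2f} MB of compressed arc data")
    print(f"cnr-2000: average outdegree {average_outdegree(nodes, arcs):.2f}")
```

The same arithmetic applies to any other row; for the larger crawls the estimate lands in the gigabyte range, which is useful to know before downloading.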