Laboratory for Web Algorithmics
 
Index visualizer for MG4J
An indexing engine such as MG4J produces an inverted index, which contains a huge amount of data about the indexed collection, such as term frequencies, the distribution of terms across documents, document lengths, etc. The proposal is to develop a Java tool that, starting from an MG4J index, generates an HTML document containing charts, histograms, tables, etc., that make it easy to visualize aggregate data about the index itself.
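
As a rough illustration of the kind of output such a tool might produce, here is a minimal sketch that summarises a toy inverted index as an HTML table. It deliberately does not use the MG4J API: the in-memory map of terms to document identifiers, the class name IndexSummarySketch and both method names are assumptions made only for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: summarises a toy in-memory inverted index; this is not the MG4J API. */
final class IndexSummarySketch {

    /** Maps each document frequency (how many documents contain a term) to the number of terms having it. */
    static TreeMap<Integer, Integer> documentFrequencyHistogram(Map<String, List<Integer>> postings) {
        TreeMap<Integer, Integer> histogram = new TreeMap<>();
        for (List<Integer> documents : postings.values()) {
            histogram.merge(documents.size(), 1, Integer::sum);
        }
        return histogram;
    }

    /** Renders the histogram as an HTML table, one row per document frequency, with a text bar. */
    static String toHtml(TreeMap<Integer, Integer> histogram) {
        StringBuilder html = new StringBuilder(
            "<table>\n<tr><th>Document frequency</th><th>Terms</th><th></th></tr>\n");
        for (Map.Entry<Integer, Integer> e : histogram.entrySet()) {
            html.append("<tr><td>").append(e.getKey())
                .append("</td><td>").append(e.getValue())
                .append("</td><td>").append("#".repeat(e.getValue()))
                .append("</td></tr>\n");
        }
        return html.append("</table>\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical toy index: term -> identifiers of the documents containing it.
        Map<String, List<Integer>> postings = Map.of(
            "web", List.of(0, 1, 2),
            "graph", List.of(0, 2),
            "crawler", List.of(1),
            "index", List.of(2));
        System.out.print(toHtml(documentFrequencyHistogram(postings)));
    }
}
```

A real tool would read these statistics from the index files rather than from a map, and would likely add similar summaries for document lengths and term frequencies.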
Introduction to LAW

The Laboratory for Web Algorithmics (LAW) was established in 2002 at the Dipartimento di Scienze dell'Informazione (DSI) of the Università degli studi di Milano.

Research at LAW concerns all algorithmic aspects of the web. In more detail:

High-performance parallel web crawling

UbiCrawler is a scalable, fault-tolerant and fully distributed web crawler developed in collaboration with the Istituto di Informatica e Telematica. The first report on the design of UbiCrawler won the Best Poster Award at the Tenth World Wide Web Conference.

Web-graph compression

Once a part of the web has been crawled, the resulting graph is very large, so a compact representation is needed. WebGraph is a framework built for this purpose. Among other things, WebGraph uses new instantaneous codes for the integers and new aggressive algorithmic compression techniques.
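
To make the notion of an instantaneous code concrete, here is a small sketch of the classical Elias gamma code. It is chosen only for illustration and is not one of the new codes mentioned above. An instantaneous (prefix-free) code lets a decoder split a stream of codewords without separators. The class name GammaCodeSketch and the use of strings of '0' and '1' instead of a real bit stream are assumptions made for readability.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: Elias gamma coding over strings of '0' and '1'; not WebGraph's implementation. */
final class GammaCodeSketch {

    /** Encodes x >= 1 as (number of binary digits - 1) zeros followed by x written in binary. */
    static String encode(int x) {
        if (x < 1) {
            throw new IllegalArgumentException("gamma codes are defined for x >= 1");
        }
        String binary = Integer.toBinaryString(x);
        return "0".repeat(binary.length() - 1) + binary;
    }

    /** Decodes a well-formed concatenation of gamma codewords; no separators are needed. */
    static List<Integer> decodeAll(String bits) {
        List<Integer> values = new ArrayList<>();
        int pos = 0;
        while (pos < bits.length()) {
            int zeros = 0;
            while (bits.charAt(pos) == '0') { // the unary prefix gives the length of the binary part
                zeros++;
                pos++;
            }
            values.add(Integer.parseInt(bits.substring(pos, pos + zeros + 1), 2));
            pos += zeros + 1;
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical list of small positive integers, which receive short codewords.
        int[] values = {1, 2, 5, 9};
        StringBuilder stream = new StringBuilder();
        for (int v : values) {
            stream.append(encode(v));
        }
        System.out.println(stream + " decodes to " + decodeAll(stream.toString()));
    }
}
```

Small integers get short codewords, which is why codes of this family suit data dominated by small values.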

Web-graph analysis

Web graphs have special properties whose study requires a sizeable amount of mathematics, but also a careful study of actual web graphs. We have studied, for instance, the paradoxical way PageRank evolves during a crawl, and the way PageRank changes depending on the damping factor.
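
The dependence of PageRank on the damping factor can be explored on a toy graph with a few lines of code. The sketch below uses plain power iteration and spreads the rank of pages without outgoing links uniformly. The class name PageRankSketch, the parameter name alpha for the damping factor and the four-node graph are assumptions for illustration, not the method used in the studies mentioned above.

```java
import java.util.Arrays;

/** Illustrative only: PageRank by power iteration on a small graph given as successor lists. */
final class PageRankSketch {

    /**
     * Runs a fixed number of power-iteration steps.
     * successors[u] lists the nodes that u links to; alpha is the damping factor in [0, 1).
     */
    static double[] pageRank(int[][] successors, double alpha, int iterations) {
        int n = successors.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0; // total rank held by nodes without outgoing links
            for (int u = 0; u < n; u++) {
                if (successors[u].length == 0) {
                    dangling += rank[u];
                } else {
                    for (int v : successors[u]) {
                        next[v] += rank[u] / successors[u].length;
                    }
                }
            }
            for (int v = 0; v < n; v++) {
                // Follow links with probability alpha, jump to a uniformly random node otherwise.
                next[v] = alpha * (next[v] + dangling / n) + (1 - alpha) / n;
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0 -> 1, 2; 1 -> 2; 2 -> 0; 3 -> 2.
        int[][] graph = {{1, 2}, {2}, {0}, {2}};
        for (double alpha : new double[] {0.5, 0.85, 0.99}) {
            System.out.println("alpha=" + alpha + " " + Arrays.toString(pageRank(graph, alpha, 100)));
        }
    }
}
```

Running it with several values of alpha shows how the scores shift as the damping factor changes. A node with no incoming links, such as node 3 here, always ends up with exactly (1 - alpha) / n.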

Search-engine construction

Often, the purpose of a crawl is the construction of a full-text index of the text contained in the crawled pages. Such an index is the basis of all existing commercial search engines, such as Google.

Research on search-engine construction at LAW is based on MG4J, a system for full-text indexing of large-scale document collections.

Newsflash
LAW has just released a time-aware web graph, named uk-union-2006-06-2007-05, which contains, in a highly compressed and quickly accessible form, the graphs of twelve 100-million-page snapshots of the .uk domain.