DHTs over Peer Clusters for Distributed Information Retrieval

Odysseas Papapetrou, Wolf Siberski, Wolf-Tilo Balke and Wolfgang Nejdl
L3S Research Center, University of Hannover
{papapetrou, siberski, balke, nejdl}-AT-l3s.de

Abstract: Distributed Hash Tables (DHTs) are very efficient for key-based lookups, provided that each peer has to register only a small number of keys. For the huge collections of document terms required by IR-style keyword search, however, their usefulness is limited. One reason is the high cost of index maintenance: because document vocabularies are large, every joining peer triggers a huge number of key inserts and therefore a large number of index maintenance messages. We propose to combine DHTs with peer clustering to cope with this issue. In our approach, peers are first clustered into communities, each with a representative super peer. Each term occurring in a community is then registered in a global DHT only once, by the community's representative peer. Index maintenance is thus reduced drastically, especially for terms that are frequent within a community, while the global DHT remains complete and still allows correct query processing. Our extensive simulations show that the scheme is applicable in practice.
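The registration idea can be illustrated with a small sketch. This is not the authors' implementation; it is a toy model under stated assumptions: the global DHT is reduced to an in-memory map from term to registrants (routing hops are ignored), communities are given in advance (the clustering step is not modelled), and each super peer is assumed to keep a local term-to-member index for forwarding queries inside its community. All names (`GlobalDHT`, `flat_registration`, `clustered_registration`, `clustered_query`) are hypothetical. The sketch counts insert messages for a flat DHT versus per-community registration and checks that lookups still find every peer holding a term.

```python
from collections import defaultdict


class GlobalDHT:
    """Hypothetical stand-in for a global DHT: maps each term to the set of
    registrants and counts insert messages. Routing cost is ignored."""

    def __init__(self):
        self.index = defaultdict(set)
        self.insert_messages = 0

    def insert(self, term, owner):
        self.insert_messages += 1
        self.index[term].add(owner)

    def lookup(self, term):
        return set(self.index.get(term, set()))


def flat_registration(peers):
    """Baseline: every peer registers every term it holds."""
    dht = GlobalDHT()
    for peer, vocabulary in peers.items():
        for term in vocabulary:
            dht.insert(term, peer)
    return dht


def clustered_registration(peers, communities):
    """Each super peer registers the union of its members' vocabularies
    once, and (assumption) keeps a local term -> members index."""
    dht = GlobalDHT()
    local_index = {}
    for super_peer, members in communities.items():
        local = defaultdict(set)
        for peer in members:
            for term in peers[peer]:
                local[term].add(peer)
        for term in local:  # one insert per distinct term in the community
            dht.insert(term, super_peer)
        local_index[super_peer] = local
    return dht, local_index


def clustered_query(term, dht, local_index):
    """The DHT yields super peers; each forwards the query to the members
    of its community that hold the term."""
    hits = set()
    for super_peer in dht.lookup(term):
        hits |= local_index[super_peer].get(term, set())
    return hits


# Usage with a toy network of four peers in two communities.
peers = {
    "p1": {"dht", "peer", "cluster"},
    "p2": {"dht", "peer", "retrieval"},
    "p3": {"dht", "index"},
    "p4": {"peer", "index", "query"},
}
communities = {"s1": ["p1", "p2"], "s2": ["p3", "p4"]}

flat = flat_registration(peers)
clustered, local = clustered_registration(peers, communities)
print(flat.insert_messages)       # 11
print(clustered.insert_messages)  # 8
print(sorted(clustered_query("dht", clustered, local)))  # ['p1', 'p2', 'p3']
```

Terms shared by several members of a community ("dht" and "peer" in `s1`) cost one insert instead of one per peer, which is where the savings come from; the larger the vocabulary overlap within a community, the larger the reduction. Completeness is preserved because every term held by any peer is still registered by at least one super peer.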
Keywords: Peer-to-Peer, Distributed Clustering, Hierarchical DHT

author = {Odysseas Papapetrou and Wolf Siberski and Wolf-Tilo Balke and Wolfgang Nejdl},
title = {DHTs over Peer Clusters for Distributed Information Retrieval},
booktitle = {21st International Conference on Advanced Information Networking and Applications (AINA-07)},
year = {2007},
abstract-url={http://www.l3s.de/~papapetrou/abstracts/aina07-peerclusters.html} }