BloomCast: Efficient and Effective Full-Text Retrieval in Unstructured P2P Networks (2012)

ABSTRACT:

Efficient and effective full-text retrieval in unstructured peer-to-peer networks remains a challenge in the research community. First, it is difficult, if not impossible, for unstructured P2P systems to effectively locate items with guaranteed recall. Second, existing schemes to improve search success rate often rely on replicating a large number of item replicas across the wide area network, incurring a large amount of communication and storage costs. In this paper, we propose BloomCast, an efficient and effective full-text retrieval scheme for unstructured P2P networks. By leveraging a hybrid P2P protocol, BloomCast replicates the items uniformly at random across the P2P network, achieving a guaranteed recall at a communication cost of O(√N), where N is the size of the network. Furthermore, by casting Bloom Filters instead of the raw documents across the network, BloomCast significantly reduces the communication and storage costs for replication. We demonstrate the power of the BloomCast design through both mathematical proof and comprehensive simulations based on the query logs from a major commercial search engine and the NIST TREC WT10G data collection. Results show that BloomCast achieves an average query recall of 91 percent, which outperforms the existing WP algorithm by 18 percent, while reducing the search latency for query processing by 57 percent.

EXISTING SYSTEM:

The existing system has two major issues. First, it is difficult, if not impossible, for unstructured P2P systems to effectively locate items with guaranteed recall. Second, existing schemes to improve search success rate often rely on replicating a large number of item replicas across the wide area network, incurring a large amount of communication and storage costs.

Existing P2P search schemes fall into two classes: DHT-based global indexes and federated search engines over unstructured protocols.

DHT-based search engines are based on distributed indexes that partition a logically global inverted index in a physically distributed manner.

In a federated search engine over an unstructured P2P network, queries are processed by flooding. Unstructured P2P networks are commonly believed to be the best candidate for supporting full-text retrieval because query evaluation can be handled at the nodes that store the relevant documents.

Replication strategies are extensively utilized to improve search performance in unstructured P2P networks. The first type comprises the query-popularity-aware strategies.

The second type of replication strategy is independent of the popularity of the query, such as the WP scheme.

DISADVANTAGES OF EXISTING SYSTEM:

–     Because DHTs are built around exact-match lookups, DHT-based schemes provide poor full-text search capability.

–     With a flooding-based scheme, search recall is not guaranteed at an acceptable communication cost.

–     Query-popularity-aware replication is inefficient for insoluble queries, that is, queries for rare items.

–     The query frequency is difficult or even impossible to obtain in a distributed P2P system.

–     The existing replication strategies need to replicate the full document across the network, raising possibly unacceptable communication and storage costs.

PROPOSED SYSTEM:

The proposed system introduces a novel strategy called BloomCast, an efficient and effective full-text retrieval scheme for unstructured P2P networks.

BloomCast is a query-popularity-independent replication strategy that supports efficient and effective full-text retrieval.

It is proven mathematically that BloomCast guarantees recall at a communication cost of O(√N), where N is the size of the network.
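
The proof itself is not reproduced here. As a hedged illustration, the standard birthday-paradox style argument below shows why about √N replicas can suffice. It assumes a document is replicated on r distinct nodes and a query is sent to q distinct nodes, both sets drawn independently and uniformly at random from the N nodes, with r + q ≤ N. The symbols r, q, and c are illustrative and may not match the paper's notation.

```latex
\Pr[\text{miss}]
  \;=\; \prod_{i=0}^{q-1}\left(1-\frac{r}{N-i}\right)
  \;\le\; \left(1-\frac{r}{N}\right)^{q}
  \;\le\; e^{-rq/N},
\qquad
r = q = c\sqrt{N} \;\Longrightarrow\; \Pr[\text{miss}] \le e^{-c^{2}}.
```

Under these assumptions the miss probability depends only on the constant c and not on N, so both the replication and the query cost stay at O(√N) messages. For example, c = 2 bounds the miss probability by e^{-4}, which is about 0.018.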

ADVANTAGES OF PROPOSED SYSTEM:

–     By replicating the encoded term sets using Bloom Filters instead of raw documents among peers, the communication and storage costs are greatly reduced, while full-text multi-keyword searching is still supported.
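
To give a feel for the size of this saving, the standard Bloom filter sizing formulas can be sketched in Java. This is only an illustration: BloomSizing is a hypothetical helper, and the term count and false-positive rate are assumed values, none of which come from BloomCast itself.

```java
// Hypothetical helper: standard Bloom filter sizing formulas,
// not taken from the BloomCast source code.
final class BloomSizing {

    /** Bits needed for n items at false-positive rate p: m = ceil(-n ln p / (ln 2)^2). */
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** Hash count that minimises the false-positive rate: k = round((m / n) ln 2). */
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long terms = 1_000;     // assumed number of distinct terms in one document
        double fpRate = 0.01;   // assumed target false-positive rate
        long bits = optimalBits(terms, fpRate);
        System.out.println("bits = " + bits
                + ", bytes = " + (bits + 7) / 8
                + ", hashes = " + optimalHashes(terms, bits));
    }
}
```

Under these assumed values a 1,000-term document needs roughly 1.2 KB of filter, however large the raw document is.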

MODULES:

•      Node creation

•      BloomCast replication model generation

•      BloomCast

•      Bloom filter

•      Query recall

MODULE DESCRIPTION:

Node creation

•      Nodes are created in the P2P network so that full text can be retrieved efficiently.

•      Each node sends its documents to other nodes chosen uniformly at random in the unstructured P2P network.

•      Organizing the nodes as an unstructured P2P network is intended to keep communication and storage costs low.

BloomCast replication model generation

•      The replication model is generated from the document replicas and the query replicas.

•      The replication level is estimated from the number of nodes that hold a replica of the document and the number that hold a replica of the query.

•      These replica counts are used to evaluate the search success rate of a query issued by the user.
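
One way to turn replica counts into a success rate is sketched below. It assumes, purely for illustration, that the r document replicas and the q query replicas each occupy distinct nodes drawn uniformly at random from n nodes, and that a search succeeds when the two sets share at least one node. ReplicationModel and its parameter names are hypothetical.

```java
// Illustrative sketch; not the BloomCast implementation.
final class ReplicationModel {

    /**
     * Probability that at least one of q query replicas lands on a node
     * holding one of r document replicas, when both sets are distinct nodes
     * drawn uniformly at random from n nodes.
     */
    static double successRate(int n, int r, int q) {
        if (r + q > n) {
            return 1.0;  // pigeonhole: the two sets must overlap
        }
        double miss = 1.0;
        for (int i = 0; i < q; i++) {
            // chance that the (i+1)-th query replica also avoids every document replica
            miss *= (double) (n - r - i) / (n - i);
        }
        return 1.0 - miss;
    }

    public static void main(String[] args) {
        int n = 10_000;                        // assumed network size
        int replicas = 2 * (int) Math.sqrt(n); // assumed c = 2, so 2 * sqrt(n) replicas
        System.out.println(successRate(n, replicas, replicas));
    }
}
```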

BloomCast

•      BloomCast consists of four parts: network size estimation, node subset sampling, the replication protocol, and query evaluation.

•      The network size is estimated by the DHT subsystem, which maintains the local repository of replicas.

•      A subset of nodes is then sampled to hold the replicas, which reduces communication and storage costs.

•      A query is evaluated by distributing an optimal number of query replicas to random nodes in the network.
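
The node subset sampling step can be sketched as follows. This is a simplified, centralized illustration: it assumes the estimated network size is already available as a number and that node ids run from 0 to networkSize - 1, which a real P2P node would not know directly. NodeSampler, replicaCount, and the constant c are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch; not the BloomCast implementation.
final class NodeSampler {

    /** Replica count c * sqrt(N), rounded up and capped at N. */
    static int replicaCount(int networkSize, double c) {
        return (int) Math.min(networkSize, Math.ceil(c * Math.sqrt(networkSize)));
    }

    /** Picks `count` distinct node ids from 0..networkSize-1 uniformly at random. */
    static List<Integer> sample(int networkSize, int count, Random rng) {
        List<Integer> ids = new ArrayList<>(networkSize);
        for (int i = 0; i < networkSize; i++) {
            ids.add(i);
        }
        Collections.shuffle(ids, rng);  // uniform random permutation
        return new ArrayList<>(ids.subList(0, count));
    }

    public static void main(String[] args) {
        int n = 10_000;                 // assumed estimated network size
        int count = replicaCount(n, 2.0);
        List<Integer> targets = sample(n, count, new Random());
        System.out.println("replicating to " + targets.size() + " of " + n + " nodes");
    }
}
```

Shuffling the full id list keeps the sketch short. A real deployment would more likely sample through random walks or DHT lookups, since no single node holds the full membership list.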

Bloom filter

•      The Bloom filter is a hash-based bit array that encodes the term sets of the document replicas and query replicas.

•      The Bloom filter reduces memory and storage use and lets the search engine support full-text retrieval efficiently and effectively.
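
A minimal Bloom filter over document terms might look like the sketch below. It is an assumption-laden illustration rather than the project's code: TermBloomFilter, the double-hashing scheme, and the filter parameters are all chosen here for brevity.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Illustrative sketch; class and method names are hypothetical.
final class TermBloomFilter {
    private final BitSet bits;
    private final int size;    // number of bits (m)
    private final int hashes;  // number of hash functions (k)

    TermBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // FNV-1a-style hash over the term's chars, used as the second base hash.
    private static int fnv1a(String s) {
        int h = 0x811c9dc5;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 16777619;
        }
        return h;
    }

    // Double hashing: the i-th bit index is (h1 + i * h2) mod size.
    private int index(String term, int i) {
        int h1 = term.hashCode();
        int h2 = fnv1a(term) | 1;  // force odd so the step is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String term) {
        for (int i = 0; i < hashes; i++) {
            bits.set(index(term, i));
        }
    }

    /** False means definitely absent; true means present or a false positive. */
    boolean mightContain(String term) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(term, i))) {
                return false;
            }
        }
        return true;
    }

    /** Multi-keyword check: every query term must pass the filter. */
    boolean mightContainAll(List<String> terms) {
        for (String t : terms) {
            if (!mightContain(t)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TermBloomFilter doc = new TermBloomFilter(1024, 4);
        for (String t : Arrays.asList("bloom", "filter", "peer", "network")) {
            doc.add(t);
        }
        System.out.println(doc.mightContainAll(Arrays.asList("bloom", "peer")));
    }
}
```

A filter can report a term that was never added, so a node that passes the check is only a candidate and still has to be verified against the actual document.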

Query recall

•      Queries are matched against the replicated Bloom filters, which never miss a term that was added, so no matching replica is lost.

•      Query recall measures how much of the relevant full text is retrieved in the unstructured P2P network, while communication and storage costs stay low.

•      The data is retrieved quickly enough to satisfy the user's requirements.
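
Recall here can be read in the usual information-retrieval sense: the fraction of relevant documents that a query actually retrieves. The sketch below computes it for one query. RecallMetric and the document ids are hypothetical, and returning 1.0 for an empty relevant set is a convention assumed for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch; not the BloomCast evaluation code.
final class RecallMetric {

    /** recall = |retrieved intersect relevant| / |relevant|; 1.0 when nothing is relevant. */
    static double recall(Set<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 1.0;
        }
        Set<String> hits = new HashSet<>(relevant);
        hits.retainAll(retrieved);  // keep only the relevant documents that were retrieved
        return (double) hits.size() / relevant.size();
    }

    public static void main(String[] args) {
        Set<String> retrieved = new HashSet<>(Arrays.asList("d1", "d2", "d5"));
        Set<String> relevant = new HashSet<>(Arrays.asList("d1", "d2", "d3", "d4"));
        System.out.println(recall(retrieved, relevant));
    }
}
```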

SYSTEM CONFIGURATION:-

HARDWARE REQUIREMENTS:-

•      Processor      -    Pentium III

•      Speed          -    1.1 GHz

•      RAM            -    256 MB (min)

•      Hard Disk      -    20 GB

•      Floppy Drive   -    1.44 MB

•      Keyboard       -    Standard Windows Keyboard

•      Mouse          -    Two or Three Button Mouse

•      Monitor        -    SVGA

SOFTWARE REQUIREMENTS:-

•      Operating System   -    Windows 95/98/2000/XP

•      Front End          -    Java
