Gray Tunneling Based On Block Relevance for Focused Crawling
- DOI
- 10.2991/iske.2007.104
- Keywords
- Focused Crawler, Gray Tunneling, Content Block, Local Relevance
- Abstract
Focused crawlers are programs that selectively retrieve Web pages relevant to a specific domain for use by domain-specific search engines. Tunneling is a heuristic-based method for addressing the global optimization problem. In this paper we define Gray Tunneling and use a content block algorithm to enhance a focused crawler’s ability to traverse gray tunnels. Gray Tunneling addresses the problem that the topic multiplicity of a Web page weakens the measured relevance of a highly relevant page. During crawling, to avoid the effect of a page that is irrelevant to the specific topic as a whole but partially relevant, we divide a multi-topic page into several blocks and process the blocks individually. The crawler can then traverse a page that is irrelevant as a whole, which expands the scope it reaches and retrieves more relevant pages. A comprehensive experiment shows that this approach outperforms the Best-First and Breadth-First algorithms in both harvest rate and efficiency.
- Copyright
- © 2007, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
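The block-level idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how pages are segmented or how relevance is measured, so everything here is an assumption made for illustration: the blocks are supplied already segmented as `(text, links)` pairs, `relevance` is a simple term-fraction scorer, and the names `score_links`, `enqueue`, the threshold value, and the example URLs are hypothetical.

```python
import heapq
import re


def relevance(text, topic_terms):
    """Fraction of tokens in `text` that are topic terms (an assumed stand-in scorer)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in topic_terms for t in tokens) / len(tokens)


def score_links(blocks, topic_terms):
    """Yield (link, score) pairs, scoring each link by its own block rather than the whole page."""
    for text, links in blocks:
        score = relevance(text, topic_terms)
        for link in links:
            yield link, score


def enqueue(frontier, blocks, topic_terms, threshold):
    """Push links from sufficiently relevant blocks onto a max-priority frontier."""
    for link, score in score_links(blocks, topic_terms):
        if score >= threshold:
            # heapq is a min-heap, so the score is negated to pop the best link first.
            heapq.heappush(frontier, (-score, link))


# Hypothetical multi-topic page: only the middle block is on topic.
topic = {"crawler", "crawling", "focused"}
blocks = [
    ("sports news football scores weather today", ["http://example.com/sports"]),
    ("focused crawler crawling tutorial", ["http://example.com/crawler"]),
    ("cheap flights hotel deals travel offers", ["http://example.com/travel"]),
]

# Scored as a whole, the page falls below the threshold and would be discarded.
page_text = " ".join(text for text, _ in blocks)
page_score = relevance(page_text, topic)

# Scored per block, the on-topic block still contributes its link to the frontier.
frontier = []
enqueue(frontier, blocks, topic, threshold=0.5)
```

In this toy case the whole page scores below the threshold while the middle block scores above it, so only the link from that block is queued. A crawler that judged the page as a single unit would have dropped all three links.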
Cite this article
TY - CONF
AU - Na Luo
AU - Zuo WanLi
PY - 2007/10
DA - 2007/10
TI - Gray Tunneling Based On Block Relevance for Focused Crawling
BT - Proceedings of the 2007 International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2007)
PB - Atlantis Press
SP - 609
EP - 613
SN - 1951-6851
UR - https://doi.org/10.2991/iske.2007.104
DO - 10.2991/iske.2007.104
ID - Luo2007/10
ER -