Proceedings of the 2007 International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2007)

Gray Tunneling Based On Block Relevance for Focused Crawling

Authors
Na Luo1, Zuo WanLi
1College of Computer Science and Technology, JiLin University Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education
Corresponding Author
Na Luo
Available Online October 2007.
DOI
10.2991/iske.2007.104
Keywords
Focused Crawler, Gray Tunneling, Content Block, Local Relevance
Abstract

Focused crawlers are programs designed to selectively retrieve Web pages relevant to a specific domain for use by domain-specific search engines. Tunneling is a heuristic-based method for solving the global optimization problem. In this paper we define Gray Tunneling and use a content block algorithm to enhance a focused crawler's ability to traverse gray tunnels. Gray Tunneling addresses the problem that the topic multiplicity of a web page weakens the relevance of a highly relevant page. During crawling, to avoid the effect of a web page that is irrelevant to the specific topic as a whole but partially relevant, we divide a multi-topic page into several blocks and process the blocks individually. The crawler can then traverse a page that is irrelevant as a whole, which expands the scope it reaches and yields more relevant pages. A comprehensive experiment has been conducted, and the results clearly show that this approach outperforms the Best-First and Breadth-First algorithms in both harvest rate and efficiency.
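The block-level idea in the abstract can be illustrated with a minimal sketch. The abstract does not state how pages are segmented or how relevance is computed, so everything here is an assumption for illustration: blocks are supplied as pre-split (text, links) pairs, relevance is a plain cosine similarity over term counts, and the names `relevance`, `score_links`, `page_relevance`, and `enqueue` are hypothetical rather than taken from the paper.

```python
import heapq
import math
import re
from collections import Counter


def tokens(text):
    # Lowercase alphabetic tokens; a stand-in for real text preprocessing.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def relevance(text, topic):
    # Assumed relevance measure: cosine similarity to the topic description.
    return cosine(Counter(tokens(text)), Counter(tokens(topic)))


def page_relevance(blocks, topic):
    # Whole-page score: all blocks merged, so off-topic blocks dilute it.
    return relevance(" ".join(text for text, _ in blocks), topic)


def score_links(blocks, topic):
    # Block-level score: each link inherits the relevance of its own block.
    scores = {}
    for text, urls in blocks:
        r = relevance(text, topic)
        for url in urls:
            scores[url] = max(scores.get(url, 0.0), r)
    return scores


def enqueue(frontier, scores):
    # Max-priority frontier built on heapq by negating the score.
    for url, score in scores.items():
        heapq.heappush(frontier, (-score, url))


# Hypothetical multi-topic page: only the last block is on topic.
topic = "focused crawler web crawling"
blocks = [
    ("football scores league results match report goals season",
     ["http://example.com/sport"]),
    ("weather forecast rain wind temperature tomorrow",
     ["http://example.com/weather"]),
    ("focused crawler web crawling survey",
     ["http://example.com/crawl"]),
]

link_scores = score_links(blocks, topic)
frontier = []
enqueue(frontier, link_scores)
```

In this toy page the on-topic block scores higher on its own than the page does as a whole, so the link inside it is queued ahead of links from the off-topic blocks, even though a page-level threshold might have discarded the page entirely.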

Copyright
© 2007, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2007 International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2007)
Series
Advances in Intelligent Systems Research
Publication Date
October 2007
ISBN
978-90-78677-04-8
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Na Luo
AU  - Zuo WanLi
PY  - 2007/10
DA  - 2007/10
TI  - Gray Tunneling Based On Block Relevance for Focused Crawling
BT  - Proceedings of the 2007 International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2007)
PB  - Atlantis Press
SP  - 609
EP  - 613
SN  - 1951-6851
UR  - https://doi.org/10.2991/iske.2007.104
DO  - 10.2991/iske.2007.104
ID  - Luo2007/10
ER  -