Application of Web Page Classification in a Domain-specific Search Engine
- DOI
- 10.2991/iccasm.2012.144
- Keywords
- Web page classification, Search engine, Domain-specific knowledge, Dictionary
- Abstract
Automatic web page classification can be used in domain-specific search engines to help users find specific information on the Internet more conveniently and precisely. The semantic similarity among domain-specific web pages and the noise in their data cause traditional classifiers to perform poorly on them. This paper proposes a dictionary-based multilingual web page classification method to improve classification performance. The method constructs a domain-specific dictionary to strengthen the domain-specific knowledge in the pages. An automatic encoding detection and integration method is also introduced into the classifier to extract Chinese and English information precisely from multilingual pages. After being verified experimentally, the method was integrated into a real domain-specific search engine, where it proved effective.
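The abstract names two ingredients, a domain-specific dictionary and automatic encoding detection for mixed Chinese/English pages, but gives no algorithmic detail. The following is a minimal sketch under stated assumptions, not the author's implementation. The dictionary entries and the categories `agriculture` and `medicine` are hypothetical placeholders. Encoding detection is approximated by trying the page's declared `charset`, then UTF-8, then GB18030. Chinese text is segmented by forward maximum matching against the dictionary, a swapped-in technique. A page is assigned to the category with the highest summed term weight. The helper names (`decode_page`, `chinese_terms`, `extract_terms`, `classify`) are likewise illustrative.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical domain dictionary: term -> (category, weight).
# The paper builds its own dictionary; these entries are placeholders.
DOMAIN_DICT = {
    "fertilizer": ("agriculture", 2.0),
    "irrigation": ("agriculture", 2.0),
    "化肥": ("agriculture", 2.0),
    "灌溉": ("agriculture", 2.0),
    "vaccine": ("medicine", 2.0),
    "疫苗": ("medicine", 2.0),
}
MAX_TERM_LEN = max(len(t) for t in DOMAIN_DICT)

META_CHARSET = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.I)
TAG = re.compile(r"<[^>]+>")              # crude tag stripper (assumption)
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")  # runs of CJK unified ideographs
EN_WORD = re.compile(r"[A-Za-z]+")


def decode_page(raw: bytes) -> str:
    """Try the declared charset, then UTF-8, then GB18030 (assumed order)."""
    candidates = []
    m = META_CHARSET.search(raw[:2048])
    if m:
        candidates.append(m.group(1).decode("ascii"))
    candidates += ["utf-8", "gb18030"]
    for enc in candidates:
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding; try the next candidate
    return raw.decode("latin-1")  # latin-1 never fails; last resort


def chinese_terms(run: str) -> list[str]:
    """Forward maximum matching of a CJK run against the dictionary."""
    terms, i = [], 0
    while i < len(run):
        for size in range(min(MAX_TERM_LEN, len(run) - i), 0, -1):
            piece = run[i:i + size]
            if piece in DOMAIN_DICT:
                terms.append(piece)
                i += size
                break
        else:
            i += 1  # no dictionary term starts here; skip one character
    return terms


def extract_terms(text: str) -> list[str]:
    """Collect lowercased English words plus dictionary-matched Chinese terms."""
    terms = [w.lower() for w in EN_WORD.findall(text)]
    for run in CJK_RUN.findall(text):
        terms.extend(chinese_terms(run))
    return terms


def classify(raw: bytes) -> str | None:
    """Return the category with the highest summed weight, or None."""
    text = TAG.sub(" ", decode_page(raw))
    scores: Counter = Counter()
    for term in extract_terms(text):
        if term in DOMAIN_DICT:
            category, weight = DOMAIN_DICT[term]
            scores[category] += weight
    if not scores:
        return None
    return scores.most_common(1)[0][0]


page = '<meta charset="gb2312"><p>化肥与灌溉 fertilizer tips</p>'.encode("gb2312")
print(classify(page))  # agriculture
```

Forward maximum matching was chosen only because it needs nothing beyond the dictionary itself; a real system would more likely use a proper Chinese word segmenter and a trained classifier, with the dictionary used to reweight domain terms.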
- Copyright
- © 2012, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Chunyan Liang
PY  - 2012/08
DA  - 2012/08
TI  - Application of Web Page Classification in a Domain-specific Search Engine
BT  - Proceedings of the 2012 International Conference on Computer Application and System Modeling (ICCASM 2012)
PB  - Atlantis Press
SP  - 568
EP  - 570
SN  - 1951-6851
UR  - https://doi.org/10.2991/iccasm.2012.144
DO  - 10.2991/iccasm.2012.144
ID  - Liang2012/08
ER  -