A Hybrid Method for Extracting Deep Web Information
- DOI
- 10.2991/amcce-15.2015.138
- Keywords
- information extraction; clinic expert information; domain model; block importance model; SVM
- Abstract
Previous work indicates that more than 60% of the information available on the Web resides in Deep Web databases, which search engines cannot index directly. This paper proposes a hybrid method for extracting Deep Web information that combines a domain model with a block importance model. The domain model classifies forms and identifies whether a given form is a Web query interface (WQI). The block importance model filters noisy information from response pages. Each model is compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Yuanpeng Zhang
AU - Li Wang
AU - Kui Jiang
AU - Danmin Qian
AU - Jiancheng Dong
PY - 2015/04
DA - 2015/04
TI - A Hybrid Method for Extracting Deep Web Information
BT - Proceedings of the 2015 International Conference on Automation, Mechanical Control and Computational Engineering
PB - Atlantis Press
SP - 1194
EP - 1199
SN - 1951-6851
UR - https://doi.org/10.2991/amcce-15.2015.138
DO - 10.2991/amcce-15.2015.138
ID - Zhang2015/04
ER -