A Hybrid Method for Extracting Deep Web Information

Yuanpeng Zhang; Li Wang; Kui Jiang; Danmin Qian; Jiancheng Dong

doi:10.2991/amcce-15.2015.138

Corresponding Author

Yuanpeng Zhang

Available Online April 2015.

DOI: 10.2991/amcce-15.2015.138
Keywords: information extraction; clinic expert information; domain model; block importance model; SVM
Abstract: Previous studies show that more than 60% of the information available on the Web is stored in Deep Web databases, which search engines cannot index directly. This paper proposes a hybrid method, composed of a domain model and a block importance model, for extracting information from the Deep Web. The domain model classifies forms and identifies whether a form is a WQI. The block importance model filters noisy information from response pages. Each of the two models is compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
Copyright: © 2015, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title: Proceedings of the 2015 International Conference on Automation, Mechanical Control and Computational Engineering
Series: Advances in Intelligent Systems Research
Publication Date: April 2015
ISBN: 978-94-62520-64-6
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Yuanpeng Zhang
AU  - Li Wang
AU  - Kui Jiang
AU  - Danmin Qian
AU  - Jiancheng Dong
PY  - 2015/04
DA  - 2015/04
TI  - A Hybrid Method for Extracting Deep Web Information
BT  - Proceedings of the 2015 International Conference on Automation, Mechanical Control and Computational Engineering
PB  - Atlantis Press
SP  - 1194
EP  - 1199
SN  - 1951-6851
UR  - https://doi.org/10.2991/amcce-15.2015.138
DO  - 10.2991/amcce-15.2015.138
ID  - Zhang2015/04
ER  -
