Proceedings of the 2015 International Conference on Automation, Mechanical Control and Computational Engineering

A Hybrid Method for Extracting Deep Web Information

Authors
Yuanpeng Zhang, Li Wang, Kui Jiang, Danmin Qian, Jiancheng Dong
Corresponding Author
Yuanpeng Zhang
Available Online April 2015.
DOI
10.2991/amcce-15.2015.138
Keywords
information extraction; clinic expert information; domain model; block importance model; SVM
Abstract

Previous work shows that more than 60% of the information available on the Web resides in Deep Web databases, where search engines cannot index it directly. This paper proposes a hybrid method for extracting Deep Web information that combines a domain model with a block importance model. The domain model classifies forms and identifies whether a given form is a Web query interface (WQI). The block importance model filters noisy information out of response pages. Both models are compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
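
The results above are stated in terms of precision and F1 measure. For readers unfamiliar with these metrics, the following is a minimal sketch of their standard definitions, not the authors' evaluation code. The function names and the counts in the example are hypothetical and are not taken from the paper.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of items labelled positive that really are positive."""
    total = tp + fp
    return tp / total if total else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truly positive items that were labelled positive."""
    total = tp + fn
    return tp / total if total else 0.0


def f1_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the paper):
    # tp = forms correctly identified as WQIs,
    # fp = non-WQI forms wrongly accepted,
    # fn = WQIs that were missed.
    tp, fp, fn = 80, 20, 20
    print(f"precision = {precision(tp, fp):.3f}")
    print(f"recall    = {recall(tp, fn):.3f}")
    print(f"F1        = {f1_measure(tp, fp, fn):.3f}")
```

The same three counts apply to either model: for the block importance model, a "positive" would be a page block kept as informative instead of a form accepted as a WQI.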

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2015 International Conference on Automation, Mechanical Control and Computational Engineering
Series
Advances in Intelligent Systems Research
Publication Date
April 2015
ISBN
978-94-62520-64-6
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Yuanpeng Zhang
AU  - Li Wang
AU  - Kui Jiang
AU  - Danmin Qian
AU  - Jiancheng Dong
PY  - 2015/04
DA  - 2015/04
TI  - A Hybrid Method for Extracting Deep Web Information
BT  - Proceedings of the 2015 International Conference on Automation, Mechanical Control and Computational Engineering
PB  - Atlantis Press
SP  - 1194
EP  - 1199
SN  - 1951-6851
UR  - https://doi.org/10.2991/amcce-15.2015.138
DO  - 10.2991/amcce-15.2015.138
ID  - Zhang2015/04
ER  -