Combining Classifiers to Extract Web Data
- DOI
- 10.2991/iccse-15.2015.76
- Keywords
- Web Data Extraction; Ensemble Learning; Data Integration
- Abstract
Much of the data on the web is embedded in semi-structured pages. Extracting this data automatically and making it available to computer applications remains a complex and urgent task. Most current approaches use a single classifier to extract web data, but relying on a single classifier is not sufficient, because different classifiers perform differently on the same problem. In this paper, we combine multiple classifiers to extract web data. First, we identify the main data regions of web pages and construct feature sets for the text nodes in those regions. Second, we choose three kinds of base classifiers and use a voting method to integrate their results. Finally, we combine the integrated results with heuristic rules to obtain the final extraction results. The experimental results show that our approach outperforms the baseline approaches.
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
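The voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the three base classifiers, the voting variant, or the label set, so everything below is an assumption: plain plurality voting, ties broken in classifier order, and hypothetical labels (`title`, `price`, `author`, `other`) for text nodes. It is not the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-classifier label lists by plurality vote.

    predictions[i][j] is the label classifier i assigned to text node j.
    Ties go to the label that appears first in classifier order
    (an assumption; the abstract does not specify tie handling).
    """
    if not predictions:
        return []
    n_nodes = len(predictions[0])
    if any(len(p) != n_nodes for p in predictions):
        raise ValueError("every classifier must label the same number of nodes")

    combined = []
    for node_labels in zip(*predictions):
        counts = Counter(node_labels)
        top = max(counts.values())
        # The first label, in classifier order, that reaches the top count wins.
        combined.append(next(label for label in node_labels if counts[label] == top))
    return combined


# Hypothetical outputs of three base classifiers for four text nodes.
clf_a = ["title", "price", "other", "author"]
clf_b = ["title", "other", "other", "price"]
clf_c = ["title", "price", "author", "other"]

voted = majority_vote([clf_a, clf_b, clf_c])
print(voted)
```

In the approach the abstract outlines, the voted labels would then be adjusted by heuristic rules; those rules are not given in the abstract, so they are left out of this sketch.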
Cite this article
TY - CONF
AU - Qiang Chu
AU - Yongquan Dong
AU - Ping Ling
PY - 2015/07
DA - 2015/07
TI - Combining Classifiers to Extract Web Data
BT - Proceedings of the 2015 International Conference on Computational Science and Engineering
PB - Atlantis Press
SP - 412
EP - 416
SN - 2352-538X
UR - https://doi.org/10.2991/iccse-15.2015.76
DO - 10.2991/iccse-15.2015.76
ID - Chu2015/07
ER -