A Novel Framework for Web Pages Classification
- DOI
- 10.2991/icmt-13.2013.130
- Keywords
- Score fusion • Multi-instance learning • Bag-of-features
- Abstract
In this paper, we propose a novel framework for classifying web pages that contain both images and text. Valid images are first selected by the FOrward CompArison of Relative Sizes Sorting (FOCARSS) algorithm, and each valid image is represented by a mid-level feature vector generated by the Bag-of-Features model. The feature vectors of the valid images in a web page are treated as the instances of a bag, and Multi-Instance Learning is used to perform image-based web page classification. For the text information, the Bag-of-Words model is used to perform text-based web page classification. Score-level fusion schemes are then used to combine these two kinds of heterogeneous information. Experimental results on a representative dataset show that the framework exploits both image and text information and improves the final classification performance.
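The score-level fusion step can be illustrated with a minimal Python sketch. The abstract does not say which fusion schemes or normalization the authors use, so the sum, product, and max rules, the min-max normalization, and the names `min_max_normalize`, `fuse_scores`, and `weight` are all assumptions made for illustration, not the paper's method. The inputs stand for per-page scores from an image-based classifier and a text-based classifier.

```python
import numpy as np


def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant input maps to all zeros."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span


def fuse_scores(image_scores, text_scores, weight=0.5, rule="sum"):
    """Fuse two per-page score arrays (hypothetical helper, not from the paper).

    weight is the share given to the image scores under the "sum" rule.
    """
    img = min_max_normalize(image_scores)
    txt = min_max_normalize(text_scores)
    if rule == "sum":
        # Weighted sum of the two normalized scores.
        return weight * img + (1.0 - weight) * txt
    if rule == "product":
        return img * txt
    if rule == "max":
        return np.maximum(img, txt)
    raise ValueError(f"unknown fusion rule: {rule}")


# Made-up scores for three pages, for illustration only.
fused = fuse_scores([0.0, 1.0, 2.0], [2.0, 1.0, 0.0], weight=0.5, rule="sum")
```

Normalizing first puts the two classifiers' outputs on a common scale, which matters when the image-based and text-based scores come from different models with different ranges.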
- Copyright
- © 2013, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Hu Ruiguang
AU  - Hu Weiming
PY  - 2013/11
DA  - 2013/11
TI  - A Novel Framework for Web Pages Classification
BT  - Proceedings of 3rd International Conference on Multimedia Technology(ICMT-13)
PB  - Atlantis Press
SP  - 1054
EP  - 1061
SN  - 1951-6851
UR  - https://doi.org/10.2991/icmt-13.2013.130
DO  - 10.2991/icmt-13.2013.130
ID  - Ruiguang2013/11
ER  -