Tolerance Rough Set-Based Bag-of-Words Model for Document Representation
- DOI
- 10.2991/ijcis.d.200808.001
- Keywords
- Document representation; Tolerance rough set; Bag-of-Words
- Abstract
Document representation is one of the foundations of natural language processing. The bag-of-words (BoW) model, a representative document representation model, is simple and effective. However, the traditional BoW model produces sparse vectors and does not capture latent semantic relations. To address these problems, we propose two tolerance rough set-based BoW models, TRBoW1 and TRBoW2, which differ in their weight calculation methods. Unlike the popular supervised representation methods, both are unsupervised and require no prior knowledge. Extending each document to its upper approximation with TRBoW1 or TRBoW2 mines the semantic relations among documents and makes the document vectors denser. Comparative text classification experiments against various document representation methods on different datasets show that our methods achieve the best performance among the methods compared.
- Copyright
- © 2020 The Authors. Published by Atlantis Press B.V.
- Open Access
- This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).
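The abstract's central step, extending a document to its upper approximation, can be illustrated with a minimal Python sketch. It assumes the standard tolerance rough set construction over terms (two terms are tolerant if they co-occur in at least `theta` documents); the function names, the threshold `theta`, and the toy corpus are hypothetical, and the weight calculations that distinguish TRBoW1 from TRBoW2 are not given in the abstract, so they are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to the terms that co-occur with it in >= theta documents.

    Assumption: tolerance is defined by document-level co-occurrence counts;
    the paper's exact definition may differ.
    """
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1

    classes = {t: {t} for t in vocab}  # reflexive: every term tolerates itself
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)  # symmetric relation
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return every term whose tolerance class intersects the document's terms."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


# Hypothetical toy corpus: "rough" and "set" co-occur in two documents.
docs = [
    ["rough", "set", "model"],
    ["rough", "set", "tolerance"],
    ["bag", "words", "model"],
    ["rough", "approximation"],
]
classes = tolerance_classes(docs, theta=2)
expanded = upper_approximation(docs[3], classes)
```

In this toy corpus the last document contains only "rough" and "approximation", but its upper approximation also includes "set", so its vector gains a non-zero entry. This is the sense in which the representation becomes denser.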
Cite this article
TY  - JOUR
AU  - Dong Qiu
AU  - Haihuan Jiang
AU  - Ruiteng Yan
PY  - 2020
DA  - 2020/08/19
TI  - Tolerance Rough Set-Based Bag-of-Words Model for Document Representation
JO  - International Journal of Computational Intelligence Systems
SP  - 1218
EP  - 1226
VL  - 13
IS  - 1
SN  - 1875-6883
UR  - https://doi.org/10.2991/ijcis.d.200808.001
DO  - 10.2991/ijcis.d.200808.001
ID  - Qiu2020
ER  -