International Journal of Computational Intelligence Systems

Volume 13, Issue 1, 2020, Pages 1218 - 1226

Tolerance Rough Set-Based Bag-of-Words Model for Document Representation

Authors
Dong Qiu1, *, Haihuan Jiang1, Ruiteng Yan2
1College of Science, Chongqing University of Posts and Telecommunications, Nanan, Chongqing, 400065, P.R. China
2School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Nanan, Chongqing, 400065, P.R. China
*Corresponding author. Email: dongqiumath@163.com
Corresponding Author
Dong Qiu
Received 12 May 2020, Accepted 4 August 2020, Available Online 19 August 2020.
DOI
10.2991/ijcis.d.200808.001
Keywords
Document representation; Tolerance rough set; Bag-of-Words
Abstract

Document representation is one of the foundations of natural language processing. The bag-of-words (BoW) model, a representative document representation model, is both simple and effective. However, the traditional BoW model suffers from sparsity and from a lack of latent semantic relations. In this paper, to address these problems, we propose two tolerance rough set-based BoW models, called TRBoW1 and TRBoW2 according to their different weight calculation methods. Unlike the popular supervised representation methods, they are unsupervised and require no prior knowledge. By extending each document to its upper approximation with TRBoW1 or TRBoW2, the semantic relations among documents are mined and the document vectors become denser. Comparative experiments with various document representation methods for text classification on different datasets show that our methods perform best among those compared.
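The idea of extending a document to its upper approximation can be illustrated with a minimal sketch. It assumes the commonly used tolerance rough set formulation, in which two terms are tolerant of each other when they co-occur in at least `theta` documents, and a document's upper approximation contains every term whose tolerance class overlaps the document. The function names, the threshold `theta`, and the toy corpus are assumptions made for illustration; the TRBoW1 and TRBoW2 weight calculations are not described in this abstract and are not reproduced here, so the sketch returns plain term sets rather than weighted vectors.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class (hypothetical helper).

    Assumption: two terms are tolerant when they co-occur in at least
    `theta` documents; every term is tolerant of itself.
    """
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return every term whose tolerance class intersects the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


# Toy corpus (illustrative only, not data from the paper).
docs = [
    ["rough", "set", "tolerance"],
    ["rough", "set", "document"],
    ["document", "vector"],
    ["rough", "approximation"],
]
classes = tolerance_classes(docs, theta=2)
expanded = [upper_approximation(d, classes) for d in docs]

# The last document never mentions "set", but "set" co-occurs with
# "rough" in two documents, so it enters the upper approximation.
print(sorted(expanded[3]))
```

In this sketch the expanded term set is never smaller than the original document, which is the sense in which the resulting vectors become denser; how the added terms are weighted is left open.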

Copyright
© 2020 The Authors. Published by Atlantis Press B.V.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).


Publication Date
2020/08/19
ISSN (Online)
1875-6883
ISSN (Print)
1875-6891

Cite this article

TY  - JOUR
AU  - Dong Qiu
AU  - Haihuan Jiang
AU  - Ruiteng Yan
PY  - 2020
DA  - 2020/08/19
TI  - Tolerance Rough Set-Based Bag-of-Words Model for Document Representation
JO  - International Journal of Computational Intelligence Systems
SP  - 1218
EP  - 1226
VL  - 13
IS  - 1
SN  - 1875-6883
UR  - https://doi.org/10.2991/ijcis.d.200808.001
DO  - 10.2991/ijcis.d.200808.001
ID  - Qiu2020
ER  -