Proceedings of the 2015 International Conference on Artificial Intelligence and Industrial Engineering

A Topic-based Dynamic Clustering Algorithm for Text Stream

Authors
Y. Rao, X.J. Li
Corresponding Author
Y. Rao
Available Online July 2015.
DOI
10.2991/aiie-15.2015.130
Keywords
sliding window; data stream; text mining; multi-phase clustering
Abstract

To provide real-time early warning from public sentiment information in social networks and so support decision making, a topic-based dynamic clustering for text stream (TBDC4TS) algorithm is proposed. It clusters a text stream formed by a web crawler that continuously grabs web pages. A sliding time window (SWt) splits the text stream into consecutive segments, each containing a set of web news pages whose number depends on the velocity of the stream and the size of the sliding window. Furthermore, a multi-phase clustering method in TBDC4TS merges the micro-clusters found in each sliding window with the macro-clusters held in the single-pass engine. Experiments on a simulated text stream of 2650 web news pages collected by a web crawler show that the TBDC4TS algorithm achieves 22.8 times the execution efficiency of Single-pass, along with higher clustering quality as measured by precision and recall.
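
The two stages named in the abstract, splitting the stream with a sliding time window and then merging per-window micro-clusters into macro-clusters, can be illustrated with a minimal Python sketch. The abstract does not specify the text representation, the similarity measure, the threshold, or the window size, so everything here is an assumption made for illustration only: bag-of-words term counts, cosine similarity, a fixed threshold of 0.3, non-overlapping windows of 60 time units, and the hypothetical function names `split_windows`, `single_pass`, and `cluster_stream`. It is not the authors' TBDC4TS implementation.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_windows(stream, width):
    """Group (timestamp, text) pairs into consecutive windows of `width` time units."""
    windows = {}
    for ts, text in stream:
        windows.setdefault(int(ts // width), []).append(text)
    return [windows[k] for k in sorted(windows)]


def single_pass(vectors, clusters, threshold):
    """Add each vector to its most similar cluster, or open a new cluster."""
    for vec in vectors:
        best = max(clusters, key=lambda c: cosine(vec, c), default=None)
        if best is not None and cosine(vec, best) >= threshold:
            best.update(vec)  # fold the vector's term counts into the cluster
        else:
            clusters.append(Counter(vec))
    return clusters


def cluster_stream(stream, width=60, threshold=0.3):
    """Assumed two-phase flow: micro-clusters per window, then merge into macro-clusters."""
    macro = []
    for docs in split_windows(stream, width):
        vectors = [Counter(d.lower().split()) for d in docs]
        micro = single_pass(vectors, [], threshold)   # phase 1: cluster within the window
        macro = single_pass(micro, macro, threshold)  # phase 2: merge into macro-clusters
    return macro


if __name__ == "__main__":
    # Toy stream of (timestamp, text) pairs; the data is invented for illustration.
    toy = [
        (0, "flood river rain"),
        (10, "flood rain warning"),
        (70, "stock market price"),
        (80, "river flood rain"),
    ]
    for cluster in cluster_stream(toy):
        print(cluster.most_common(3))
```

In this sketch each window is clustered on its own, so only the resulting micro-clusters are compared against the macro-clusters rather than every incoming page. Whether that is the source of the speed-up reported in the abstract is not stated there.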

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2015 International Conference on Artificial Intelligence and Industrial Engineering
Series
Advances in Intelligent Systems Research
Publication Date
July 2015
ISBN
978-94-62520-70-7
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Y. Rao
AU  - X.J. Li
PY  - 2015/07
DA  - 2015/07
TI  - A Topic-based Dynamic Clustering Algorithm for Text Stream
BT  - Proceedings of the 2015 International Conference on Artificial Intelligence and Industrial Engineering
PB  - Atlantis Press
SP  - 480
EP  - 483
SN  - 1951-6851
UR  - https://doi.org/10.2991/aiie-15.2015.130
DO  - 10.2991/aiie-15.2015.130
ID  - Rao2015/07
ER  -