A Topic-based Dynamic Clustering Algorithm for Text Stream
- DOI
- 10.2991/aiie-15.2015.130
- Keywords
- sliding window; data stream; text mining; multi-phases clustering
- Abstract
To provide real-time early warning from public sentiment information in social networks for decision making, a topic-based dynamic clustering algorithm for text streams (TBDC4TS) is proposed. The text stream is formed by a web crawler that continuously fetches web pages. A sliding time window (SWt) splits the text stream into consecutive segments, each containing a set of web news pages whose number depends on the velocity of the stream and the size of the sliding window. A multi-phase clustering method in TBDC4TS then merges the micro-clusters from each sliding window with the macro-clusters in the single-pass engine. Experiments on a simulated text stream of 2,650 web news pages collected by a web crawler show that TBDC4TS achieves 22.8 times the execution efficiency of Single-pass, along with higher clustering quality in terms of precision and recall.
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
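The pipeline outlined in the abstract (sliding time windows, per-window micro-clusters, then a merge into macro-clusters) can be illustrated with a minimal Python sketch. This is not the authors' TBDC4TS implementation: the bag-of-words representation, the cosine similarity measure, the threshold values, and every function name below are assumptions chosen only to make the multi-phase idea concrete.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (an assumed, deliberately simple representation)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def split_into_windows(stream, window_size):
    """Group (timestamp, text) pairs into consecutive time windows."""
    windows = {}
    for ts, text in stream:
        windows.setdefault(int(ts // window_size), []).append(text)
    return [windows[k] for k in sorted(windows)]

def single_pass(vectors, threshold, clusters=None):
    """Add each vector to its most similar cluster, or open a new cluster.

    A cluster is stored as the sum of its members' term counts; cosine
    similarity is scale-invariant, so the sum acts as a centroid direction.
    """
    clusters = [] if clusters is None else clusters
    for vec in vectors:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append(Counter(vec))  # copy, so inputs are not aliased
        else:
            best.update(vec)
    return clusters

def cluster_stream(stream, window_size, micro_threshold=0.5, macro_threshold=0.3):
    """Phase 1: micro-clusters per window. Phase 2: merge them into macro-clusters."""
    macro = []
    for docs in split_into_windows(stream, window_size):
        micro = single_pass([vectorize(d) for d in docs], micro_threshold)
        macro = single_pass(micro, macro_threshold, macro)
    return macro
```

Calling `cluster_stream(stream, window_size)` with a list of `(timestamp, text)` pairs returns the macro-clusters as term-count vectors. In this sketch the macro phase compares only a handful of micro-cluster summaries per window against the macro-clusters, rather than every document, which is one plausible source of the kind of speed-up over plain Single-pass that the abstract reports.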
Cite this article
TY - CONF
AU - Y. Rao
AU - X.J. Li
PY - 2015/07
DA - 2015/07
TI - A Topic-based Dynamic Clustering Algorithm for Text Stream
BT - Proceedings of the 2015 International Conference on Artificial Intelligence and Industrial Engineering
PB - Atlantis Press
SP - 480
EP - 483
SN - 1951-6851
UR - https://doi.org/10.2991/aiie-15.2015.130
DO - 10.2991/aiie-15.2015.130
ID - Rao2015/07
ER -