Improvement of TF-IDF Algorithm Based on Hadoop Framework
Authors
Li Bin, Guoyong Yuan
Corresponding Author
Li Bin
Available Online August 2012.
- DOI
- 10.2991/iccasm.2012.98
- Keywords
- Hadoop, TF-IDF, distributed computing
- Abstract
The TF-IDF algorithm is widely used in search engines, text similarity computation, web data mining, and similar applications. These applications often have to process massive amounts of data, so computing TF-IDF quickly and efficiently is very important. In this paper, we present a TF-IDF algorithm based on the Hadoop framework. Experiments show that, when computing over massive data, the new Hadoop-based method is more efficient than traditional methods.
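The abstract does not describe how the authors split the TF-IDF computation across Hadoop jobs. A minimal sketch, assuming one common three-job MapReduce decomposition (per-document term counts, then term frequency, then document frequency and IDF), is shown below. It simulates map, shuffle and reduce in-process in Python rather than running on a real Hadoop cluster, and it assumes the weighting tf-idf(t, d) = (n_{t,d} / |d|) · log(N / df_t); other TF-IDF variants exist. The names `run_mapreduce` and `tfidf` are hypothetical.

```python
import math
from itertools import groupby
from operator import itemgetter
from typing import Callable, Dict, Iterable, List, Tuple


def run_mapreduce(records: Iterable, mapper: Callable, reducer: Callable) -> List:
    """Hypothetical in-process stand-in for one MapReduce job:
    map every record, shuffle (sort and group by key), then reduce each group."""
    intermediate = [kv for rec in records for kv in mapper(rec)]
    intermediate.sort(key=itemgetter(0))  # shuffle phase: bring equal keys together
    output = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        output.extend(reducer(key, [value for _, value in group]))
    return output


def tfidf(docs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Assumed three-job pipeline; returns {(word, doc_id): tf-idf score}."""
    n_docs = len(docs)  # N; on Hadoop this would be passed to job 3 via the job configuration

    # Job 1: count occurrences of each (word, doc) pair.
    counts = run_mapreduce(
        docs.items(),
        lambda rec: [((word, rec[0]), 1) for word in rec[1].lower().split()],
        lambda key, values: [(key, sum(values))],
    )

    # Job 2: group by document to get its length, then tf = n / |d|.
    def tf_reducer(doc, values):
        total = sum(n for _, n in values)
        return [((word, doc), n / total) for word, n in values]

    tfs = run_mapreduce(
        counts,
        lambda rec: [(rec[0][1], (rec[0][0], rec[1]))],
        tf_reducer,
    )

    # Job 3: group by word; df = number of documents containing it (each
    # (word, doc) pair occurs once after job 1), then score = tf * log(N / df).
    def idf_reducer(word, values):
        idf = math.log(n_docs / len(values))
        return [((word, doc), tf * idf) for doc, tf in values]

    scores = run_mapreduce(
        tfs,
        lambda rec: [(rec[0][0], (rec[0][1], rec[1]))],
        idf_reducer,
    )
    return dict(scores)


if __name__ == "__main__":
    corpus = {"d1": "hadoop map reduce", "d2": "hadoop hdfs"}
    for (word, doc), score in sorted(tfidf(corpus).items()):
        print(f"{doc} {word:8s} {score:.4f}")
```

Each `run_mapreduce` call plays the role of one Hadoop job; on a real cluster the lambdas and inner functions would become Mapper and Reducer implementations (for example Java classes or Hadoop Streaming scripts), and the sort-and-group step would be performed by the framework's shuffle. In the usage example, "hadoop" occurs in both documents, so its IDF and therefore its score are zero.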
- Copyright
- © 2012, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Li Bin
AU - Guoyong Yuan
PY - 2012/08
DA - 2012/08
TI - Improvement of TF-IDF Algorithm Based on Hadoop Framework
BT - Proceedings of the 2012 International Conference on Computer Application and System Modeling (ICCASM 2012)
PB - Atlantis Press
SP - 391
EP - 393
SN - 1951-6851
UR - https://doi.org/10.2991/iccasm.2012.98
DO - 10.2991/iccasm.2012.98
ID - Bin2012/08
ER -