Proceedings of the 2012 International Conference on Computer Application and System Modeling (ICCASM 2012)

Improvement of TF-IDF Algorithm Based on Hadoop Framework

Authors
Li Bin, Guoyong Yuan
Corresponding Author
Li Bin
Available Online August 2012.
DOI
10.2991/iccasm.2012.98
Keywords
Hadoop, TF-IDF, distributed computing
Abstract

The TF-IDF algorithm is widely used in search engines, text similarity computation, web data mining, and related applications. These applications often have to process massive amounts of data, so computing TF-IDF quickly and efficiently is very important. In this paper, we present a TF-IDF algorithm based on the Hadoop framework. Experiments show that, for massive data sets, the new Hadoop-based method is more efficient than traditional methods.
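
The abstract does not describe the algorithm itself, so the following is only an illustrative sketch of how a TF-IDF computation is commonly split into map and reduce steps. It is not the authors' method. It assumes the textbook weighting tf-idf(t, d) = tf(t, d) * log(N / df(t)), whitespace tokenization, and plain in-memory Python in place of a Hadoop cluster; every function name here is hypothetical.

```python
import math
from collections import defaultdict


def map_term_counts(doc_id, text):
    """Map step: emit ((term, doc_id), 1) for every token in one document."""
    for term in text.lower().split():
        yield (term, doc_id), 1


def reduce_sum(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def tf_idf(docs):
    """Compute tf-idf for every (term, doc_id) pair in docs (doc_id -> text)."""
    # Job 1: raw count of each term in each document.
    counts = reduce_sum(
        pair for doc_id, text in docs.items()
        for pair in map_term_counts(doc_id, text)
    )
    # Job 2: total number of tokens in each document.
    doc_len = reduce_sum((doc_id, n) for (_, doc_id), n in counts.items())
    # Job 3: number of documents that contain each term.
    doc_freq = reduce_sum((term, 1) for (term, _), _ in counts.items())

    n_docs = len(docs)
    return {
        (term, doc_id): (n / doc_len[doc_id]) * math.log(n_docs / doc_freq[term])
        for (term, doc_id), n in counts.items()
    }
```

On a real cluster each of the three jobs would be a separate MapReduce pass, with the framework performing the grouping by key that reduce_sum does here with a dictionary.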

Copyright
© 2012, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2012 International Conference on Computer Application and System Modeling (ICCASM 2012)
Series
Advances in Intelligent Systems Research
Publication Date
August 2012
ISBN
978-94-91216-00-8
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Li Bin
AU  - Guoyong Yuan
PY  - 2012/08
DA  - 2012/08
TI  - Improvement of TF-IDF Algorithm Based on Hadoop Framework
BT  - Proceedings of the 2012 International Conference on Computer Application and System Modeling (ICCASM 2012)
PB  - Atlantis Press
SP  - 391
EP  - 393
SN  - 1951-6851
UR  - https://doi.org/10.2991/iccasm.2012.98
DO  - 10.2991/iccasm.2012.98
ID  - Bin2012/08
ER  -