Data Mining Engine based on Big Data
- DOI
- 10.2991/emcs-16.2016.64
- Keywords
- Big data; Data mining; Spark
- Abstract
With data volumes in the billions of records, aspect mining, real-time processing, ad hoc analysis, and offline computation place higher demands on computing and storage performance. Data mining platforms built on traditional relational databases or on the distributed Hadoop platform cannot satisfy all of these tasks. This paper parallelizes two traditional data mining algorithms, Apriori and PageRank, using Spark's in-memory computing module together with several of its action and transformation operators. Experiments validate the execution efficiency and parallel performance of the two algorithms. The platform takes full advantage of in-memory computing to speed up iteration, and it supports a variety of distributed computing and storage scenarios with strong scalability, allowing it to handle the various scenarios that arise in big data environments.
- Copyright
- © 2016, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
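As a rough illustration of the iterative PageRank computation the abstract refers to, the following is a minimal single-machine sketch in plain Python, not the paper's implementation. The function name `pagerank`, the damping factor of 0.85, and the default of 20 iterations are assumptions chosen for illustration. The comments note which Spark RDD operators (`flatMap`, `reduceByKey`, `mapValues`) each step loosely corresponds to.

```python
from collections import defaultdict


def pagerank(links, iterations=20, damping=0.85):
    """Iterative PageRank over an adjacency mapping.

    links: dict mapping each page to a list of pages it links to.
    Returns a dict mapping every page to its rank.
    (Hypothetical helper; names and defaults are illustrative assumptions.)
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {dst for outs in links.values() for dst in outs}
    ranks = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Roughly a flatMap: each page sends rank / out_degree to each neighbour.
        contributions = [
            (dst, ranks[src] / len(outs))
            for src, outs in links.items()
            if outs
            for dst in outs
        ]

        # Roughly a reduceByKey: sum the contributions received by each page.
        totals = defaultdict(float)
        for dst, value in contributions:
            totals[dst] += value

        # Roughly a mapValues: apply the damping factor to get the new ranks.
        ranks = {
            page: (1 - damping) + damping * totals[page]
            for page in pages
        }

    return ranks


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, rank in sorted(pagerank(graph).items()):
        print(page, round(rank, 4))
```

Every iteration reads the same link structure again, which is why an engine that keeps that structure in memory between iterations is a natural fit for this kind of algorithm.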
Cite this article
TY - CONF
AU - Song Guo
PY - 2016/01
DA - 2016/01
TI - Data Mining Engine based on Big Data
BT - Proceedings of the 2016 International Conference on Education, Management, Computer and Society
PB - Atlantis Press
SP - 264
EP - 267
SN - 2352-538X
UR - https://doi.org/10.2991/emcs-16.2016.64
DO - 10.2991/emcs-16.2016.64
ID - Guo2016/01
ER -