Association Analysis of Large Sample Data Based on Hadoop
- DOI
- 10.2991/iiicec-15.2015.277
- Keywords
- Hadoop; Mahout; Association Rule Mining; FP-growth Algorithm; Pattern Assessment
- Abstract
This paper implemented effective association rule mining based on Hadoop parallel computing. First, the parallel FP-growth algorithm was run on the Hadoop platform to find the frequent item sets of the transaction data. Second, strong association rules were generated from the frequent item sets by a purpose-designed algorithm. Then, redundant rules were deleted according to filtering conditions as a form of model evaluation. After these steps, all the interesting, non-redundant strong association rules had been mined. In addition, this paper analyzed the efficiency of Hadoop parallel computing and explained its advantages when handling big data.
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
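The rule-generation and filtering steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the "designed algorithm" or the "filtering conditions", so the sketch assumes a confidence threshold for strong rules, a lift threshold for interestingness, and a common redundancy test (a rule is dropped when a rule with a smaller antecedent and the same consequent has at least the same confidence). The function names, thresholds, and the toy `freq` table are hypothetical; in the paper's setting the frequent item sets would come from parallel FP-growth on Hadoop rather than a hand-written dictionary.

```python
from itertools import combinations


def generate_rules(freq, n_transactions, min_conf=0.6):
    """Derive rules (antecedent, consequent, confidence, lift) from frequent item sets.

    freq maps each frequent item set (a frozenset) to its support count.
    This input shape is an assumption made for the sketch.
    """
    rules = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset of the item set as an antecedent.
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                ante = frozenset(combo)
                cons = itemset - ante
                if ante not in freq or cons not in freq:
                    continue
                conf = count / freq[ante]
                if conf >= min_conf:
                    lift = conf / (freq[cons] / n_transactions)
                    rules.append((ante, cons, conf, lift))
    return rules


def filter_rules(rules, min_lift=1.0):
    """Keep rules with lift above min_lift that are not redundant.

    Assumed redundancy test: a rule is redundant if another rule has a strictly
    smaller antecedent, the same consequent, and confidence at least as high.
    """
    kept = []
    for ante, cons, conf, lift in rules:
        if lift <= min_lift:
            continue
        redundant = any(
            a < ante and c == cons and cf >= conf
            for a, c, cf, _ in rules
        )
        if not redundant:
            kept.append((ante, cons, conf, lift))
    return kept


# Toy support counts over 5 transactions (hypothetical data, minimum count 2).
freq = {
    frozenset({"bread"}): 4,
    frozenset({"milk"}): 4,
    frozenset({"butter"}): 3,
    frozenset({"bread", "milk"}): 3,
    frozenset({"bread", "butter"}): 3,
    frozenset({"milk", "butter"}): 2,
    frozenset({"bread", "milk", "butter"}): 2,
}

rules = generate_rules(freq, n_transactions=5, min_conf=0.6)
kept = filter_rules(rules, min_lift=1.0)
for ante, cons, conf, lift in kept:
    print(sorted(ante), "->", sorted(cons), f"conf={conf:.2f}", f"lift={lift:.2f}")
```

Checking redundancy against every candidate rule, not only the ones that pass the lift filter, is a design choice in this sketch; a stricter or looser condition would change which rules survive.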
Cite this article
TY  - CONF
AU  - Ran An
AU  - Jingchang Pan
PY  - 2015/03
DA  - 2015/03
TI  - Association Analysis of Large Sample Data Based on Hadoop
BT  - Proceedings of the 2015 International Industrial Informatics and Computer Engineering Conference
PB  - Atlantis Press
SP  - 1255
EP  - 1258
SN  - 2352-538X
UR  - https://doi.org/10.2991/iiicec-15.2015.277
DO  - 10.2991/iiicec-15.2015.277
ID  - An2015/03
ER  -