Research on Serialization Storage Strategy Based on Spark Cluster
Authors
Fangfang Yang, Yuchong Xia
Corresponding Author
Fangfang Yang
Available Online April 2017.
- DOI
- 10.2991/icmmct-17.2017.96
- Keywords
- Spark; Memory; Operator; RDD; Caching
- Abstract
Spark is a big data processing platform based on in-memory computing. Spark's default serialization strategy makes poor use of the cache, which greatly reduces the efficiency of Spark task execution. To address the low computational efficiency caused by insufficient memory, this paper proposes an optimized serialized storage strategy that combines the running cost of an RDD, the RDD execution time, and the count of Actions. Experimental results show that the proposed strategy can improve computational efficiency under limited task resources.
- Copyright
- © 2017, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Fangfang Yang
AU - Yuchong Xia
PY - 2017/04
DA - 2017/04
TI - Research on Serialization Storage Strategy Based on Spark Cluster
BT - Proceedings of the 2017 5th International Conference on Machinery, Materials and Computing Technology (ICMMCT 2017)
PB - Atlantis Press
SP - 454
EP - 459
SN - 2352-5401
UR - https://doi.org/10.2991/icmmct-17.2017.96
DO - 10.2991/icmmct-17.2017.96
ID - Yang2017/04
ER -