Proceedings of the 2017 5th International Conference on Frontiers of Manufacturing Science and Measuring Technology (FMSMT 2017)

Locality-based Partitioning for Spark

Authors
Yuchong Xia, Fangfang Yang
Corresponding Author
Yuchong Xia
Available Online April 2017.
DOI
10.2991/fmsmt-17.2017.233
Keywords
Spark, shuffle, locality, data skew.
Abstract

Spark is a memory-based distributed data processing framework. During the shuffle process, large amounts of data are transmitted over the network, which makes shuffle the main bottleneck in Spark. Because partitions are unevenly distributed across nodes, the input to Reduce tasks is unbalanced. To solve this problem, a partitioning policy based on the task locality level is designed to balance task input. Finally, experiments verify that the optimization mechanism can alleviate data skew and improve job execution efficiency.
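The abstract does not spell out how the policy works. As a rough, hypothetical illustration only, and not the authors' algorithm, the Python sketch that follows shows one way a reduce-side assignment could trade locality against balance. It assumes a matrix `local_bytes[p][n]` giving how many bytes of reduce partition `p` already sit on node `n` after the map stage, and a made-up `slack` parameter that caps each node's load at `(1 + slack)` times the average. Each partition goes to the node holding most of its data unless that would exceed the cap. All names here are illustrative assumptions, not Spark APIs.

```python
from typing import List, Tuple


def assign_partitions(
    local_bytes: List[List[int]], slack: float = 0.1
) -> Tuple[List[int], List[int]]:
    """Hypothetical greedy, locality-aware and load-capped reduce assignment.

    local_bytes[p][n]: bytes of reduce partition p stored on node n (assumed input).
    Returns (assignment, load): assignment[p] is the node chosen for partition p,
    load[n] is the total reduce input assigned to node n.
    """
    num_nodes = len(local_bytes[0])
    totals = [sum(row) for row in local_bytes]
    # Balance cap: no node should receive more than (1 + slack) x the average load.
    cap = (1 + slack) * sum(totals) / num_nodes
    load = [0] * num_nodes
    assignment = [-1] * len(local_bytes)
    # Place the largest partitions first; they are the hardest to fit under the cap.
    for p in sorted(range(len(local_bytes)), key=lambda i: -totals[i]):
        # Prefer nodes that already hold more of this partition (better locality).
        candidates = sorted(range(num_nodes), key=lambda n: -local_bytes[p][n])
        chosen = next((n for n in candidates if load[n] + totals[p] <= cap), None)
        if chosen is None:
            # No node fits under the cap: fall back to the least-loaded node.
            chosen = min(range(num_nodes), key=lambda n: load[n])
        assignment[p] = chosen
        load[chosen] += totals[p]
    return assignment, load


def remote_bytes(local_bytes: List[List[int]], assignment: List[int]) -> int:
    """Bytes that must be fetched over the network under a given assignment."""
    return sum(sum(row) - row[assignment[p]] for p, row in enumerate(local_bytes))


if __name__ == "__main__":
    # Toy input: 4 reduce partitions spread over 2 nodes.
    data = [[80, 20], [70, 10], [60, 40], [10, 50]]
    for s in (0.1, 10.0):
        a, l = assign_partitions(data, slack=s)
        print(s, a, l, remote_bytes(data, a))
```

On this toy input (invented numbers, not results from the paper), the default 10% slack yields node loads of 180 and 160 with 100 bytes fetched remotely, while a very loose cap behaves like pure locality: 80 bytes fetched remotely but loads of 280 and 60. That contrast is the kind of skew-versus-traffic trade-off the abstract describes.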

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Series
Advances in Engineering Research
Publication Date
April 2017
ISBN
978-94-6252-331-9
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Yuchong Xia
AU  - Fangfang Yang
PY  - 2017/04
DA  - 2017/04
TI  - Locality-based Partitioning for Spark
BT  - Proceedings of the 2017 5th International Conference on Frontiers of Manufacturing Science and Measuring Technology (FMSMT 2017)
PB  - Atlantis Press
SP  - 1188
EP  - 1192
SN  - 2352-5401
UR  - https://doi.org/10.2991/fmsmt-17.2017.233
DO  - 10.2991/fmsmt-17.2017.233
ID  - Xia2017/04
ER  -