A Novel Processing Model for SCDs in ETL
- DOI
- 10.2991/jimec-17.2017.29
- Keywords
- ETL, MapReduce, map-only
- Abstract
ETL (Extract-Transform-Load), which moves data from various source systems into data warehouses (DWs), is an important part of building a data warehouse. As data volumes grow rapidly, processing them quickly is a major challenge for ETL. MapReduce is a programming model for large-scale, data-intensive processing. It is composed of two functions, map and reduce, which makes it straightforward to run many tasks in parallel. However, the model has disadvantages. For example, it is inefficient when the mappers produce a large amount of data, because moving the intermediate data to the reducers incurs a high network cost. In this paper, we present a new method called map-only. With this method, the reduce step is performed locally, so the data does not need to be transferred to the reducers over the network. The results show that the method performs well and speeds up processing for both Type-1 and Type-2 slowly changing dimensions (SCDs). For example, when the incremental data is 5 GB in size, the map-only method processes the Type-2 SCDs in only 20 minutes, whereas processing the same data without it takes 28 minutes.
- Copyright
- © 2017, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
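The map-only idea described in the abstract (doing the reduce work locally so that nothing is sent to reducers over the network) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Change`, `Version`, `map_only_type1`, and `map_only_type2` are hypothetical, and the sketch assumes each input split already contains every change for the business keys it holds (for example, because the input was pre-partitioned by key), which is what would make a local reduce sufficient.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class Change(NamedTuple):
    """One incoming change record (hypothetical layout)."""
    key: str         # business key of the dimension member
    value: str       # tracked attribute
    changed_on: str  # ISO date, so string order equals date order


class Version(NamedTuple):
    """One Type-2 row: a value and the period in which it was valid."""
    key: str
    value: str
    valid_from: str
    valid_to: Optional[str]  # None marks the current version


def map_only_type1(split: Iterable[Change]) -> Dict[str, str]:
    """Type-1 SCD: overwrite, keeping only the latest value per key.

    Everything happens inside one mapper; no data is shuffled.
    """
    latest: Dict[str, Change] = {}
    for change in split:
        current = latest.get(change.key)
        if current is None or change.changed_on >= current.changed_on:
            latest[change.key] = change
    return {key: change.value for key, change in latest.items()}


def map_only_type2(split: Iterable[Change]) -> List[Version]:
    """Type-2 SCD: keep full history, closing each version when the next starts.

    Assumption: the split holds all changes for the keys it contains,
    so grouping and ordering can be done in local memory.
    """
    by_key: Dict[str, List[Change]] = defaultdict(list)
    for change in split:                 # map step: bucket by key locally
        by_key[change.key].append(change)

    versions: List[Version] = []
    for key in sorted(by_key):           # local reduce step: no network transfer
        history = sorted(by_key[key], key=lambda c: c.changed_on)
        for i, change in enumerate(history):
            is_last = i + 1 == len(history)
            valid_to = None if is_last else history[i + 1].changed_on
            versions.append(Version(key, change.value, change.changed_on, valid_to))
    return versions


if __name__ == "__main__":
    demo_split = [
        Change("c1", "Berlin", "2017-01-01"),
        Change("c2", "Paris", "2017-02-01"),
        Change("c1", "Munich", "2017-03-01"),
    ]
    print(map_only_type1(demo_split))
    for row in map_only_type2(demo_split):
        print(row)
```

In a real MapReduce deployment each call would correspond to one map task with the number of reducers set to zero. The sketch only shows why the shuffle can be skipped when all records for a key are co-located; it makes no claim about how the paper achieves that co-location.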
Cite this article
TY - CONF
AU - Li Sun
AU - Jiaoyan Zhang
AU - Jiyun Li
PY - 2017/10
DA - 2017/10
TI - A Novel Processing Model For Scds In ETL
BT - Proceedings of the 2017 2nd Joint International Information Technology, Mechanical and Electronic Engineering Conference (JIMEC 2017)
PB - Atlantis Press
SP - 133
EP - 136
SN - 2352-538X
UR - https://doi.org/10.2991/jimec-17.2017.29
DO - 10.2991/jimec-17.2017.29
ID - Sun2017/10
ER -