A Method of Deduplication based on Inconsistency
Authors
Yunpeng Wu, Yiting Lv, Yuping Sun, Qiang Hu, Mingfei Wu, Yu Guo, Decai Wang
Corresponding Author
Yunpeng Wu
Available Online November 2015.
- DOI
- 10.2991/emeeit-15.2015.68
- Keywords
- Data cleaning, Deduplication, Inconsistency, Functional Dependency, Index.
- Abstract
Reducing the number of comparisons is the most common way to improve the efficiency of data cleaning. We investigate this problem by using inconsistency. We split redundant data into three categories. For each category, we give an algorithm and analyze its complexity, and then combine the algorithms into a single method. In particular, we address the chasing problem for the method under functional dependencies. Finally, we experimentally verify that these algorithms are effective and scale well, and that the method helps detect duplicates more efficiently.
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Yunpeng Wu
AU - Yiting Lv
AU - Yuping Sun
AU - Qiang Hu
AU - Mingfei Wu
AU - Yu Guo
AU - Decai Wang
PY - 2015/11
DA - 2015/11
TI - A Method of Deduplication based on Inconsistency
BT - Proceedings of the 2015 International conference on Engineering Management, Engineering Education and Information Technology
PB - Atlantis Press
SP - 348
EP - 354
SN - 2352-538X
UR - https://doi.org/10.2991/emeeit-15.2015.68
DO - 10.2991/emeeit-15.2015.68
ID - Wu2015/11
ER -