Mining data quality rules based on T-dependence
- DOI
- 10.2991/eusflat-19.2019.28
- Keywords
- Data Quality, Pattern Mining, Consistency, Triangular Norms
- Abstract
Since their introduction in 1976, edit rules have been a standard tool in statistical analysis. An edit rule is a compact representation of a non-permitted combination of values in a dataset. In this paper, we propose a technique to find edit rules automatically, using the concept of T-dependence. We first generalize the traditional notion of lift to that of T-lift, where stochastic independence is generalized to T-dependence. A combination of values is declared an edit rule under a t-norm T if it shows a strong negative correlation under T-dependence. We show several interesting properties of this approach. In particular, we show that under the minimum t-norm, edit rules can be computed efficiently using frequent pattern trees. Experimental results show a weak to medium correlation in the rank order of edit rules obtained under T_M (the minimum t-norm) and T_P (the product t-norm), indicating that the semantics of these two kinds of dependence differ.
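To make the idea of T-lift concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the T-lift of a combination of values is its joint relative frequency divided by the t-norm of the marginal relative frequencies, so that the product t-norm T_P recovers the traditional lift. The function names (`t_lift`, `t_min`, `t_prod`) and the toy dataset are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

Row = Tuple[Hashable, ...]
TNorm = Callable[[float, float], float]


def t_min(a: float, b: float) -> float:
    """Minimum t-norm T_M."""
    return min(a, b)


def t_prod(a: float, b: float) -> float:
    """Product t-norm T_P."""
    return a * b


def t_lift(rows: Sequence[Row], pattern: Dict[int, Hashable], t_norm: TNorm) -> float:
    """Assumed definition: joint frequency / T(marginal frequencies).

    `pattern` maps a column index to the value required in that column.
    """
    n = len(rows)
    # Relative frequency of rows matching every value in the pattern.
    joint = sum(all(r[i] == v for i, v in pattern.items()) for r in rows) / n
    # Combine the marginal frequencies with the t-norm; 1.0 is its neutral element.
    combined = 1.0
    for i, v in pattern.items():
        marginal = sum(r[i] == v for r in rows) / n
        combined = t_norm(combined, marginal)
    return joint / combined if combined > 0 else float("nan")


# Hypothetical toy data: (age group, marital status).
rows = (
    [("child", "single")] * 4
    + [("adult", "single")] * 2
    + [("adult", "married")] * 4
)

child_married = {0: "child", 1: "married"}
adult_married = {0: "adult", 1: "married"}

for name, t in (("T_M", t_min), ("T_P", t_prod)):
    print(name, t_lift(rows, child_married, t), t_lift(rows, adult_married, t))
```

Under these assumptions, the combination (child, married) never occurs in the toy data, so its T-lift is 0 under both t-norms, which makes it a natural candidate for an edit rule. The combination (adult, married) scores differently under T_M and T_P, which illustrates why the two t-norms can rank candidate rules differently.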
- Copyright
- © 2019, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Toon Boeckling
AU  - Antoon Bronselaer
AU  - Guy De Tré
PY  - 2019/08
DA  - 2019/08
TI  - Mining data quality rules based on T-dependence
BT  - Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019)
PB  - Atlantis Press
SP  - 184
EP  - 191
SN  - 2589-6644
UR  - https://doi.org/10.2991/eusflat-19.2019.28
DO  - 10.2991/eusflat-19.2019.28
ID  - Boeckling2019/08
ER  -