Alternatives for Eliminating Duplicate in Data Storage
Authors
Tianming Yang, Jing Zhang, Wei Sun
Corresponding Author
Tianming Yang
Available Online July 2013.
- DOI
- 10.2991/iccnce.2013.140
- Keywords
- Data Storage, Duplicate Elimination, Compare-by-Hash.
- Abstract
Duplicate Elimination (DE) is a specialized data compression technique that eliminates duplicate copies of repeating data to optimize the use of storage space or bandwidth. The most common form of DE works by dividing files into chunks and comparing those chunks to detect duplicates. This paper implements a content-based chunking algorithm to improve duplicate elimination over fixed-size blocking, and evaluates two methods of chunk comparison: compare-by-hash and compare-by-value. The evaluation indicates that compare-by-hash is efficient and feasible even when employed in ultra-large-scale storage systems.
- Copyright
- © 2013, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
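The two ideas named in the abstract, content-based chunking and compare-by-hash, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the rolling byte sum, the window and chunk-size constants, the choice of SHA-256, and every function name below are assumptions made purely for illustration.

```python
import hashlib

# All constants are illustrative assumptions, not values from the paper.
WINDOW = 16            # bytes covered by the rolling sum
MASK = (1 << 6) - 1    # boundary when the low 6 bits of the sum are all ones
MIN_CHUNK = 16         # never cut before this many bytes (must be >= WINDOW)
MAX_CHUNK = 256        # always cut once a chunk reaches this many bytes


def content_defined_chunks(data: bytes) -> list[bytes]:
    """Split data where a rolling sum of the last WINDOW bytes matches MASK."""
    chunks = []
    start = 0      # index of the first byte of the current chunk
    rolling = 0    # sum of the most recent bytes, at most WINDOW of them
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= WINDOW:
            rolling -= data[i - WINDOW]   # drop the byte leaving the window
        length = i - start + 1
        at_boundary = length >= MIN_CHUNK and (rolling & MASK) == MASK
        if at_boundary or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])       # trailing partial chunk
    return chunks


def deduplicate(chunks: list[bytes], store: dict[bytes, bytes]) -> list[bytes]:
    """Compare-by-hash: a chunk is a duplicate if its digest is already stored."""
    recipe = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).digest()
        if digest not in store:           # digests are compared, not chunk bytes
            store[digest] = chunk
        recipe.append(digest)
    return recipe


def restore(recipe: list[bytes], store: dict[bytes, bytes]) -> bytes:
    """Rebuild the original byte stream from its list of chunk digests."""
    return b"".join(store[digest] for digest in recipe)
```

A caller would keep one `store` dictionary for all files, pass each file's bytes through `content_defined_chunks` and `deduplicate`, and retain only the returned recipe per file. Because chunk boundaries depend on the bytes inside the window rather than on fixed offsets, an insertion near the start of a file tends to shift only the nearby boundaries, which is the motivation for content-based chunking over fixed-size blocking. A compare-by-value variant would compare the chunk bytes themselves instead of their digests.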
Cite this article
TY  - CONF
AU  - Tianming Yang
AU  - Jing Zhang
AU  - Wei Sun
PY  - 2013/07
DA  - 2013/07
TI  - Alternatives for Eliminating Duplicate in Data Storage
BT  - Proceedings of the International Conference on Computer, Networks and Communication Engineering (ICCNCE 2013)
PB  - Atlantis Press
SP  - 565
EP  - 568
SN  - 1951-6851
UR  - https://doi.org/10.2991/iccnce.2013.140
DO  - 10.2991/iccnce.2013.140
ID  - Yang2013/07
ER  -