Fault Tolerant Distributed Stream Processing based on Backtracking
- DOI
- 10.2991/ijndc.2013.1.4.4How to use a DOI?
- Keywords
- stream processing, fault tolerance, checkpointing, failure recovery
- Abstract
Since distributed stream analytics is treated as a kind of cloud service, there exists a pressing need for its reliability and fault-tolerance, to guarantee the streaming data tuples to be processed in the order of their generation in every dataflow path, with each tuple processed once and only once. Currently there exist two kind approaches: one treats the whole process as a single transaction, and therefore suffers from the loss of intermediate results during failures; the other relies on the receipt of acknowledgement (ACK) to decide whether moving forward to emit the next resulting tuple or resending the current one after timeout, on the per-tuple basis, thus incurs extremely high latency penalty. In contradistinction to the above, we propose the backtrack mechanism for failure recovery, which allows a task to process tuples continuously without waiting for ACKs and without resending tuples in the failure-free case, but to request (ASK) the source tasks to resend the missing tuples only when it is restored from a failure which is a rare case thus has limited impact on the overall performance. The specific hard problem for building a transaction layer on-top of an existing stream processing platform consists in how to keep track the physical input/output messaging channels in order to realize re-messaging during failure recovery. Our solution is characterized by tracking physical messaging channels logically, for that we introduce the notions of virtual channel, task alias and messageId-set in reasoning, recording and communicating the channel information. We also provide a designated messaging channel, separated from the regular dataflow channel, for signaling ACK/ASK messages and for resending tuples, in order to avoid interrupting the regular order of data transfer. We have implemented the proposed mechanisms on Fontainebleau, the distributed stream analytics infrastructure we developed on top of Storm. As a principle, we ensure all the transactional properties to be system supported and transparent to users. Our experience shows the novelty and efficiency of the proposed mechanisms.
- Copyright
- © 2017, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - JOUR AU - Qiming Chen AU - Meichun Hsu AU - Castellanos Malu PY - 2013 DA - 2013/11/01 TI - Fault Tolerant Distributed Stream Processing based on Backtracking JO - International Journal of Networked and Distributed Computing SP - 226 EP - 238 VL - 1 IS - 4 SN - 2211-7946 UR - https://doi.org/10.2991/ijndc.2013.1.4.4 DO - 10.2991/ijndc.2013.1.4.4 ID - Chen2013 ER -