Distributed Systems verification using fault injection approach

Master Thesis

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12380/237946
Download file(s):
File: 237946.pdf
Description: Fulltext
Size: 855.34 kB
Format: Adobe PDF
Type: Master Thesis
Title: Distributed Systems verification using fault injection approach
Authors: Hao, Zhenxiao
Alnawasreh, Khaled
Abstract: Software is becoming more complex, and the number of components involved in an application is extremely large. If a fault occurs, it can easily propagate, grow, and take more time to detect and reproduce. A robust system that performs normally even in the presence of faults is therefore very important, but also very challenging to build. Various studies have addressed robustness using fault injection techniques, such as those presented in [23], [31]. Fault injection is mainly used to detect unexpected faults as well as dependency bottlenecks. Fault injection approaches work by sending faulty messages to the components of a distributed system and observing how the system handles them. This study presents a fault injection approach for testing the robustness of the embedded distributed system in the RBS (Radio Base Station) at Ericsson. The RBS is a distributed system whose components communicate with each other via messages. One characteristic of the distributed system at Ericsson is that it can keep working and providing services even when some components fail. Since the components are stateful and use complex protocols, verifying that the system is robust is not a trivial task. The new approach is inspired by Netflix’s Chaos Monkey. When Netflix moved its data center to Amazon Web Services, it needed fault injection techniques to test the reliability of its distributed system. After in-depth analysis of the Performance Management (PM) framework documentation at Ericsson, some potential bottlenecks were identified and some strategies for triggering faults were implemented. A fault injection tool was developed in this study for testing the robustness of the distributed system.
Moreover, unexpected faults were detected after generating two fault types: sending random messages and delaying messages. This study illustrates the potential of fault injection as a complement to traditional software testing. The report is written in English.
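Illustration: the two fault types named in the abstract (random messages and delayed messages) can be sketched as a thin wrapper around a message-delivery function. This is a minimal sketch under stated assumptions, not the tool developed in the thesis. The class name FaultInjector, its parameters, and the byte-string message format are hypothetical, and nothing here reflects the actual RBS or PM framework interfaces.

```python
import random
import time


class FaultInjector:
    """Hypothetical wrapper that perturbs messages before delivering them.

    All names and parameters are assumptions made for illustration only.
    """

    def __init__(self, deliver, random_prob=0.0, delay_prob=0.0,
                 max_delay_s=0.0, rng=None, sleep=time.sleep):
        self.deliver = deliver          # callable that hands a message to a component
        self.random_prob = random_prob  # chance of replacing the payload with random bytes
        self.delay_prob = delay_prob    # chance of delaying delivery
        self.max_delay_s = max_delay_s  # upper bound for an injected delay, in seconds
        self.rng = rng or random.Random()
        self.sleep = sleep              # injectable so tests need not really wait
        self.injected = []              # log of the faults that were applied

    def send(self, message: bytes) -> bytes:
        # Fault type 1: replace the payload with random bytes of the same length.
        if self.rng.random() < self.random_prob:
            message = bytes(self.rng.randrange(256) for _ in range(len(message)))
            self.injected.append("random")
        # Fault type 2: hold the message back for a random amount of time.
        if self.rng.random() < self.delay_prob:
            self.sleep(self.rng.uniform(0, self.max_delay_s))
            self.injected.append("delay")
        self.deliver(message)
        return message


if __name__ == "__main__":
    inbox = []  # stands in for a component's message queue
    injector = FaultInjector(inbox.append, random_prob=0.5,
                             delay_prob=0.5, max_delay_s=0.1)
    for _ in range(5):
        injector.send(b"PM_COUNTER_REPORT")
    print(inbox)
    print(injector.injected)
```

A test harness built along these lines would then watch the receiving component, checking for example whether it rejects the corrupted payload and whether it keeps serving requests while messages arrive late.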
Keywords: Computer and Information Science
Issue Date: 2016
Publisher: Chalmers tekniska högskola / Institutionen för data- och informationsteknik (Chalmers)
Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers)
URI: https://hdl.handle.net/20.500.12380/237946
Collection: Examensarbeten för masterexamen // Master Theses
