Distributed Systems verification using fault injection approach

Type

Master's thesis

Abstract

Software is becoming increasingly complex, and the number of components involved in an application can be extremely large. When a fault occurs, it can easily propagate, grow, and take longer to detect and reproduce. A robust system that continues to operate normally in the presence of faults is therefore very important, but also very challenging to build. Several studies have addressed robustness using fault injection techniques [23], [31]. Fault injection is mainly used to detect unexpected faults and dependency bottlenecks. Fault injection approaches work by sending faulty messages to the components of a distributed system and observing how the system handles them. This study presents a fault injection approach for testing the robustness of the embedded distributed system in the RBS (Radio Base Station) at Ericsson. The RBS is a distributed system whose components communicate with each other via messages. One characteristic of the distributed system at Ericsson is that it can keep working and providing services even when some components fail. Since the components are stateful and use complex protocols, verifying that the system is robust is not a trivial task. The new approach is inspired by Netflix's Chaos Monkey. When Netflix moved its data center to Amazon Web Services, it needed a fault injection technique for testing the reliability of its distributed system. After in-depth analysis of the Performance Management (PM) framework documentation at Ericsson, some potential bottlenecks were identified and strategies for triggering faults were implemented. A fault injection tool was developed in this study for testing the robustness of the distributed system. Moreover, unexpected faults were detected after generating two fault types: sending random messages and delaying messages. This study illustrates the potential of fault injection as a complement to traditional software testing. The report is written in English.
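The two fault types named in the abstract, random messages and delayed messages, can be illustrated with a minimal sketch. This is not the tool developed in the thesis: the class name FaultInjector, its parameters, and the logical clock driven by tick() are all assumptions made for illustration, and a real injector would sit on the actual message transport between components.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class FaultInjector:
    """Hypothetical proxy placed between a sender and a receiver.

    Every name here is illustrative; nothing is taken from the thesis tool.
    """
    deliver: Callable[[bytes], None]   # callback standing in for the receiver
    rng: random.Random                 # seeded RNG so test runs are repeatable
    corrupt_prob: float = 0.0          # chance of replacing a message with random bytes
    delay_prob: float = 0.0            # chance of holding a message back
    max_delay_ticks: int = 3           # upper bound on the delay, in logical ticks
    _pending: List[Tuple[int, bytes]] = field(default_factory=list, init=False)
    _now: int = field(default=0, init=False)

    def send(self, message: bytes) -> None:
        if self.rng.random() < self.corrupt_prob:
            # Fault type 1: random message of the same length as the original.
            message = bytes(self.rng.randrange(256) for _ in message)
        due = self._now
        if self.rng.random() < self.delay_prob:
            # Fault type 2: delay delivery by 1..max_delay_ticks logical ticks.
            due += self.rng.randint(1, self.max_delay_ticks)
        self._pending.append((due, message))
        self._flush()

    def tick(self) -> None:
        """Advance the logical clock and deliver any message that is now due."""
        self._now += 1
        self._flush()

    def _flush(self) -> None:
        ready = [msg for due, msg in self._pending if due <= self._now]
        self._pending = [(due, msg) for due, msg in self._pending if due > self._now]
        for msg in ready:
            self.deliver(msg)


if __name__ == "__main__":
    inbox: List[bytes] = []
    injector = FaultInjector(deliver=inbox.append, rng=random.Random(42),
                             corrupt_prob=0.2, delay_prob=0.5)
    for i in range(5):
        injector.send(f"report-{i}".encode())
        injector.tick()
    print(inbox)
```

A logical clock is used instead of real sleeping so that a test run is deterministic for a given seed, which makes a detected fault easier to reproduce.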

Subject/keywords

Computer and Information Science
