Manually Mapping Model Elements onto the Modeled Code by Analyzing GitHub Data

Master's Thesis
Software engineering and technology (MPSOF), MSc
Zhang, Wenli
Context: Class diagrams are among the most popular UML models and are frequently used in the early stages of software development. They reflect design decisions and the system’s implementation structure, so maintainers can use them to understand that structure. However, if class diagrams are not updated as the code evolves, the code implementation deviates from the class diagram design, and a class diagram that has diverged in this way is of less help to maintainers during the maintenance stage. Reverse engineering methods and tools address this by recovering class diagrams from code. Yet this raises another concern: in most cases, the reverse-engineered class diagrams are not abstract and contain extensive information that burdens the understanding of the system’s implementation structure. This is because existing reverse engineering methods and tools do not manage to imitate the human ability to abstract relevant information from the source code. Surprisingly, existing studies on the characteristics of manual abstraction are based on the opinions and experiences of participants rather than on actual cases of models and source code. In addition, the methods and technologies used for checking the similarities and differences between models and source code are purely structural and do not take the semantics of the model elements into account when mapping classes between models and code, even though semantics is closely related to how abstractions are created. A systematic manual study of the characteristics of manual abstraction is therefore required.

Aim: To fill this gap, this thesis aimed to study the characteristics of the differences between class diagram design and code implementation by manually creating mappings between the class diagram elements/constituents and the code constructs. Our manual studies can precisely capture the differences between the class diagram design and the source code implementation and investigate the causes of these differences.

Method: We conducted five case studies. The subjects are five open-source Java projects collected from GitHub, semi-randomly selected from the Lindholmen dataset [1].

Results: We identify three causes of the differences between class diagram design and code implementation: various levels of manual abstraction created in class diagrams, deviations of the code implementation from the class diagram design, and common changes between the class diagram elements/constituents and code constructs. We contribute a sorted list of cases corresponding to these three causes.
UML, Models in Open Source Systems, Reverse Engineering, Deviations between Code and Design, Manual Abstraction in Modeling