Department of Computer Science and Engineering Chalmers University of Technology Gothenburg, Sweden, 2025 QuickInv: Program Invariant Discovery from Execution Traces Dynamic invariant discovery system built on top of a theory exploration system Master’s thesis in Computer Science and Engineering EDVIN NILSSON ENAYATULLAH NOROZI Master’s thesis 2025 QuickInv: Program Invariant Discovery from Execution Traces Dynamic invariant discovery system built on top of a theory exploration system EDVIN NILSSON ENAYATULLAH NOROZI Department of Computer Science and Engineering Chalmers University of Technology Gothenburg, Sweden, 2025 QuickInv: Program Invariant Discovery from Execution Traces Dynamic invariant discovery system built on top of a theory exploration system EDVIN NILSSON ENAYATULLAH NOROZI © EDVIN NILSSON, ENAYATULLAH NOROZI, 2025 Supervisor: Nicholas Smallbone, Department of Computer Science and Engineering Examiner: Yehia Abd Alrahman, Department of Computer Science and Engineering Master’s Thesis 2025 Department of Computer Science and Engineering Chalmers University of Technology SE-412 96 Gothenburg Telephone +46 31-772 10 00 Typeset in Typst Gothenburg, Sweden, 2025 iv QuickInv: Program Invariant Discovery from Execution Traces Dynamic invariant discovery system built on top of a theory exploration system EDVIN NILSSON ENAYATULLAH NOROZI Department of Computer Science and Engineering Chalmers University of Technology Abstract Automatic loop invariant inference is a fundamental problem in program verification, yet it is challenging due to the complex behavior of program loops, and the problem is undecidable. Traditional approaches are often limited by predefined templates or are specialized to specific types of loop invariants. This thesis presents QuickInv, a dynamic invariant discovery system that is built upon the principles of theory exploration to overcome these limitations. QuickInv discovers high-level invariants by systematically exploring potential invariants by testing against recorded execution traces. We demonstrate QuickInv on several common array programs, such as binary search, selection sort, and merge sort, successfully recovering many key loop invariants. Keywords: Invariant inference, Dynamic program analysis, Theory exploration, Random testing, Program correctness v Acknowledgements We would like to thank our supervisor Nicholas Smallbone for his support, guidance and encouragement throughout this project. We would also like to thank our examiner Yehia Abd Alrahman and our opponents Loke Gustafsson and Erik Magnusson for their valuable feedback. Edvin Nilsson and Enayatullah Norozi, Gothenburg, June 2025 vii Contents List of Figures xi List of Tables xiii 1 Introduction 1 1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2 Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2 Background 7 2.1 Floyd-Hoare Logic and Invariant Inference . . . . . . . . . . . . . . . . . . . 7 2.2 Theory Exploration and QuickSpec . . . . . . . . . . . . . . . . . . . . . . 10 3 A New Lens on Invariant Discovery – Theory Exploration 13 3.1 Program Invariants as Closed Logical Formulas . . . . . . . . . . . . . . . . 14 4 How QuickInv Works 17 4.1 Capturing Program States from Execution Traces . . . . . . . . . . . . . . 18 4.2 Program States and Variables . . . . . . . . . . . . . . . . . . . . . . . . . 20 4.3 Sequencing Theory Exploration . . . . . . . . . . . . . . . . . . . . . . . . 22 4.4 Configuring QuickInv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 5 Designing a Grammar 25 5.1 Program Invariant Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . 25 5.2 Dealing with Partial Functions (Slice and Indexing) . . . . . . . . . . . . . 27 5.2.1 Combined Predicates . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 5.2.2 Generating Valid Input . . . . . . . . . . . . . . . . . . . . . . . . . . 29 5.2.3 Making Partial Functions Total . . . . . . . . . . . . . . . . . . . . . 29 5.2.4 Too General Invariants . . . . . . . . . . . . . . . . . . . . . . . . . . 31 5.3 Too Many Laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 5.3.1 Monotonicity and Antitonicity . . . . . . . . . . . . . . . . . . . . . . 32 5.3.2 Semilattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 5.4 Undesirable Terms in the Grammar . . . . . . . . . . . . . . . . . . . . . . 33 5.5 False Laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 6 Evaluation on Example Algorithms 35 ix Contents 6.1 Binary Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 6.2 Linear Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 6.3 Selection Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 6.4 Merge Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 7 Related Work 43 8 Discussion and Conclusion 45 9 References 49 x List of Figures Figure 1 Example program state log of binary search. . . . . . . . . . . . . . . . . 3 Figure 2 Illustration of QuickInvs pipeline: targeting a program invariant, program state data collection and testing candidate invariants. . . . . 18 Figure 3 Derivation of the invariant key ∈ arr[lo..hi) = key ∈ arr of binary search. . 31 xi List of Figures xii List of Tables Table 1 QuickInv results for binary search. . . . . . . . . . . . . . . . . . . . . . . 36 Table 2 QuickInv results for linear search. . . . . . . . . . . . . . . . . . . . . . . 37 Table 3 QuickInv results for selection sort. . . . . . . . . . . . . . . . . . . . . . . 38 Table 4 QuickInv results for the helper function find_min of selection sort. . . . . 38 Table 5 QuickInv results for merge sort. . . . . . . . . . . . . . . . . . . . . . . . 40 xiii List of Tables xiv 1 Introduction Writing bug-free and correct computer programs is hard. The most common way of finding software bugs is through testing, but testing still misses many bugs. A more rigorous way of preventing bugs is to use program verification tools, which allow programmers to formally prove that a program meets a specification. A fundamental problem in program verification, when verifying the behavior of program loops is to find suitable loop invariants. A loop invariant is a property, or an assertion of a program loop that holds before loop entry, between each loop iteration, and after loop termination. In Floyd-Hoare logic [1], [2], a widely used formal system for reasoning about the correctness of computer programs, loop invariants must be specified in order to reason about program loops. In general, it can be tricky for programmers to come up with suitable loop invariants to verify loop behavior. The loop invariant needs to be weak enough to satisfy every loop iteration and strong enough to satisfy the post-condition when the loop terminates. An alternative to human-specified loop invariants is to find them using automated tools. Automating loop invariant inference is also challenging. The problem of automatically inferring loop invariants is undecidable [3], meaning it is impossible to construct an algorithm that finds the correct loop invariants for any arbitrary loop. Instead, loop invariant inference will have to rely on heuristics, approximations, or restricted domains to work. For example, consider verifying that the implementation of binary search in Listing 1 in C is correct. The function’s return value, 𝑖𝑑𝑥, is the index where the value of 𝑘𝑒𝑦 exists in the array, or the special value −1 if the value does not appear in the array. This can be specified using two post-conditions: 𝑖𝑑𝑥 ≠ −1 ⟹ 𝑎𝑟𝑟[𝑖𝑑𝑥] = 𝑘𝑒𝑦 and 𝑖𝑑𝑥 = −1 ⟹ 𝑘𝑒𝑦 ∉ 𝑎𝑟𝑟. To verify that this implementation fulfills its post-conditions using a program verifier such as Dafny [4], two loop invariants are required: 0 ≤ 𝑙𝑜 ≤ ℎ𝑖 ≤ | 𝑎𝑟𝑟| and 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 ⟺ 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜 .. ℎ𝑖). 1 1. Introduction int binary_search(int* arr, int arr_len, int key) // pre: sorted arr // post: -1 <= idx < arr_len // post: idx != -1 ==> arr[idx] == key // post: idx == -1 ==> not (key in arr) { int lo = 0; int hi = arr_len; int idx = -1; while (lo < hi) // inv: 0 <= lo <= hi <= arr_len // inv: key in arr[lo..hi) <=> key in arr { int mid = (lo + hi) / 2; if (arr[mid] < key) { lo = mid + 1; } else if (arr[mid] > key) { hi = mid; } else if (arr[mid] == key) { idx = mid; break; } } return idx; } Listing 1 : Implementation of binary search. The most common way of automatically inferring loop invariants is by using static methods. Static methods analyze the program without executing it, relying on tech- niques like abstract interpretation [5] and symbolic execution [6] to find sound loop invariants. These methods are often specialized to specific types of invariants that they can infer, and are not easily extended to other types. The second loop invariant of binary search, 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 ⟺ 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜 .. ℎ𝑖) is about array membership, and these static methods most often express invariants using low-level logic. When using this low-level logic, array membership requires more complicated formulas with quantifiers, which are harder to understand and less intuitive for programmers. For example, this loop invariant would be expressed as ∃𝑖. 0 ≤ 𝑖 < | 𝑎𝑟𝑟| ∧ 𝑎𝑟𝑟[𝑖] = 𝑘𝑒𝑦 ⟺ ∃𝑖. 𝑙𝑜 ≤ 𝑖 < ℎ𝑖 ∧ 𝑎𝑟𝑟[𝑖] = 𝑘𝑒𝑦. The other general approach to inferring loop invariants is by using dynamic methods, which analyze the program by executing it with different inputs. The invariants found by dynamic methods are not guaranteed to be sound, but are often very effective in practice. One such dynamic tool is Daikon [7], which can find invariants using predefined templates such as 𝑥 = 𝑦, 𝑥 ≤ 𝑦, and 𝑦 = 𝑎𝑥 + 𝑏. These predefined templates restrict which invariants can be found, and the loop invariant 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 ⟺ 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜 .. ℎ𝑖) has a very specific shape. There is no template for it, so Daikon cannot find it. Dynamate [8], a system built on top of Daikon, can derive shapes of invariants from the post- conditions. If we write one of the post-conditions as 𝑖𝑑𝑥 = −1 ⟹ 𝑘𝑒𝑦 ∉ 𝑎𝑟𝑟[0..| 𝑎𝑟𝑟|), the system can find the loop invariants 𝑘𝑒𝑦 ∉ 𝑎𝑟𝑟[0.. 𝑙𝑜) and 𝑘𝑒𝑦 ∉ 𝑎𝑟𝑟[ℎ𝑖 ..| 𝑎𝑟𝑟|). These 2 1. Introduction two invariants can verify this implementation of binary search, but they are more restrictive than 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 ⟺ 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜 .. ℎ𝑖). For example, the latter can verify the variant of binary search that returns the first occurrence, which the first two invariants cannot. Loop invariants inferred from existing systems are often expressed in low-level logic, which can be less intuitive than high-level loop invariants. We also see a need for a flexible system that can find loop invariants without being restricted by predefined templates or specific shapes, and that is easily extensible by the user with new domain- specific operators. Our approach to dynamic loop invariant inference with QuickInv is to search for equational laws expressed in a high-level term language that hold for recorded program states captured while running the program. QuickInv uses a theory exploration system to search for loop invariants and is not limited by predefined templates. To discover these loop invariants of binary search with this approach, we first run the program multiple times with randomized input to record program states. Binary search expects its input array to be sorted, and this is a pre-condition that the input generation must respect. The program states are captured at the beginning of every loop iteration and after loop termination, which is where the loop invariants should hold, and the program states contain the values of all program variables in scope. An example program state log of binary search is shown in Figure 1. arr ↦[1,2,3,4,5] key ↦4 lo ↦0 hi ↦5 idx ↦-1 arr ↦[1,2,3,4,5] key ↦4 lo ↦3 hi ↦5 idx ↦-1 arr ↦[1,2,3,4,5] key ↦4 lo ↦3 hi ↦4 idx ↦-1 arr ↦[1,2,3,4,5] key ↦4 lo ↦3 hi ↦4 idx ↦3 arr ↦[1,2,8] key ↦6 lo ↦0 hi ↦3 idx ↦-1 arr ↦[1,2,8] key ↦6 lo ↦2 hi ↦3 idx ↦-1 arr ↦[1,2,8] key ↦6 lo ↦2 hi ↦2 idx ↦-1 Figure 1 : Example program state log of binary search. The next step is to explore possible invariant candidates that could hold for the captured program states. In QuickInv, invariants are expressed in an expressive high-level term language using provided domain-specific functions and constants, rather than low-level first-order logic like in many other systems. For example, array membership can simply be expressed as 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 where ∈ is a provided function, instead of ∃𝑖. 0 ≤ 𝑖 < |𝑎𝑟𝑟| ∧ 𝑎𝑟𝑟[𝑖] = 𝑘𝑒𝑦. QuickInv is a Haskell program that generates the randomized input and collects the recorded program states for a C program. The grammar used in QuickInv to express invariants is defined in the functional programming language Haskell, and the user can add any Haskell function to the grammar. Given a grammar with functions and constants like slice, ∈, ≤, length, 0, 1 and True, where slice is an array slicing function defined as 𝑠𝑙𝑖𝑐𝑒 𝑎𝑟𝑟 𝑖 𝑗 ≔ 𝑎𝑟𝑟[𝑖..𝑗), we can generate invariant candidates by enumerating type-correct terms from the given grammar and, through testing, determine which terms are equivalent to each other. For example, we can find that key ∈ arr is equivalent to 3 1. Introduction key ∈ slice arr lo hi when testing against the captured program states, and therefore discover one of the loop invariants with the law key ∈ arr = key ∈ slice arr lo hi. We can also find the other loop invariant 0 ≤ 𝑙𝑜 ≤ ℎ𝑖 ≤ |𝑎𝑟𝑟| by discovering that the terms 0 ≤ lo, lo ≤ hi and hi ≤ length arr are equivalent to True. A fundamental challenge with this approach is that the search space for possible terms is very large and grows exponentially when the term size increases. Compared to other systems like Daikon, QuickInv is very expressive and is not limited by predefined templates. To reduce the search space, Daikon uses many hand-crafted optimizations, while QuickInv can reduce the search space by using discovered proper- ties of the given grammar. Letting the user add any domain-specific function to the grammar makes QuickInv easily extendable to algorithms from various domains, and the system will automatically adapt to the new grammar. 4 1. Introduction 1.1 Contributions This thesis presents QuickInv, a dynamic invariant discovery system built on top of a theory exploration system. Additionally, we build upon previous work by Reynolds [9] and present a grammar for expressing high-level program invariants for finite array programs with a theory exploration system. We evaluate the system by rediscovering the required loop invariants for a set of example array programs: binary search, linear search, selection sort, and merge sort. Finally, we present problems and limitations of the system observed in these examples and suggest possible solutions to the problems. The research questions we address are: • What is a suitable grammar for expressing high-level program invariants for finite array programs? • How effective is a dynamic, theory-exploration-based approach to discovering pro- gram invariants? 1.2 Roadmap This thesis is organized as follows: Section 2 provides the necessary background, introducing the principles of Floyd-Hoare logic, the invariant inference problem and theory exploration. Section 3 presents our core conceptual contribution, reframing the problem of invariant discovery as a theory exploration problem. Section 4 details the implementation of our system QuickInv; explaining how it captures program states and discovers program invariants. Section 5 focuses on the challenge of designing an effective and expressive grammar for these invariants. Section 6 evaluates the system’s performance on a set of common array algorithms. Section 7 discusses related work in the field of program analysis and invariant inference. Finally, Section 8 concludes the thesis by discussing the research questions, our contributions, and future work. 5 1. Introduction 6 2 Background Program invariants are a fundamental concept in program verification, but also useful for program documentation, regression testing, and reasoning about program behavior. While useful in other contexts, this thesis motivates automatic program invariants discovery as a way to assist program verification using Floyd-Hoare logic. 2.1 Floyd-Hoare Logic and Invariant Inference Floyd-Hoare logic [1], [2], is a foundational formal system for reasoning about the functional correctness of computer programs. At its core is the Hoare triple {𝑃}𝐶{𝑄}. This notation expresses that if the pre-condition 𝑃 holds before executing the command 𝐶, then the post-condition 𝑄 will hold afterwards assuming that 𝐶 terminates. This is known as partial correctness. A stronger notion of correctness is total correctness which, in addition, requires that 𝐶 terminates. Floyd-Hoare logic includes a set of axioms and inference rules that support the inductive derivation of valid Hoare triples over the structure of program commands. This compositional approach allows the correctness of complex programs to be established from the correctness of their parts. Termination is often proved using a variant, a well-founded measure that decreases in loops and recursive functions, ensuring that the program will eventually terminate. A particularly important rule is the while rule (Equation 1) which states that a loop is partially correct if it satisfies three conditions: (1) a property we call the loop invariant 𝐼 holds before the loop starts – it follows from the pre-condition 𝑃 , (2) the loop invariant 𝐼 is maintained over each iteration of the loop, (3) when the loop terminates, the invariant and the negation of the loop guard 𝐺 imply the post-condition 𝑄. In particular, the loop invariant 𝐼 is a property that holds before the loop starts, is maintained over each iteration of the loop, and holds after the loop terminates. The loop invariant 𝐼 is a property that needs to be specified for the proof and needs to satisfy the three conditions of the while rule. 𝑃 ⇒ 𝐼 {𝐼 ∧ 𝐺}𝐶{𝐼} (𝐼 ∧ ¬𝐺) ⇒ 𝑄 {𝑃} 𝚠𝚑𝚒𝚕𝚎 𝐺 𝚍𝚘 𝐶{𝑄} (1) The while rule provides a systematic way to reason about loops, which are inherently complex due to their potentially unbounded execution. It allows verification of partial correctness by using a loop invariant 𝐼 , without having to unfold the loop – it is 7 2. Background enough to proof three simpler conditions. The while rule bridges the gap between static reasoning and dynamic behavior, making program verification practical. Program verifiers such as Dafny [4], based on Floyd-Hoare logic, can automatically and efficiently verify functional correctness of programs by checking the validity of Hoare triples: given suitable pre-conditions, post-conditions, and loop invariants. Post- conditions – describing the expected behavior of the program after execution – and pre-conditions – describing the valid inputs to the program – are often easy to specify. For example the post-condition of the binary search program in Listing 1 is that the return value is the index where the searched value exists in the array, or the special value −1 if the value does not appear in the array, and the pre-condition is that the input array is sorted. Proving termination is also easy; for binary search, it is sufficient to choose the variant 𝑙𝑜 − ℎ𝑖 as the measure that decreases with each loop iteration. However, inventing loop invariants remains a major bottleneck; it needs to be strong enough to ensure the post-condition holds after the loop, yet weak enough to hold before and after every loop iteration. This is known as the invariant inference problem. It is a non-trivial task, especially for complex loops or entire programs, and often requires deep understanding of the program’s behavior and structure. Manual loop invariant inference is often a trial-and-error process. Therefore, automating invariant discovery is highly desirable. Existing methods for automatically inferring loop invariants work with low-level first- order logic, which is often very verbose, hard to read, and laborious to reason with. For example, Equation 2 is the loop invariant for the binary search program (Listing 1) expressed in low-level first-order logic¹. (∃𝑖. 0 ≤ 𝑖 < |𝑎𝑟𝑟| ∧ 𝑎𝑟𝑟[𝑖] = 𝑘𝑒𝑦) ⟺ (∃𝑖. 𝑙𝑜 ≤ 𝑖 < ℎ𝑖 ∧ 𝑎𝑟𝑟[𝑖] = 𝑘𝑒𝑦) (2) Previous work [10] suggests a need for expressing invariants as high-level properties based on what they call domain theory. By abstracting common patterns and concepts as high-level constructs, we allow for clarity and expressiveness in the invariants. By introducing domain-specific operators ∈ and array slicing we can express the binary search loop invariant as in Equation 3. We call the set of constants and operators used to express invariants the invariant grammar. 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 ⟺ 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜..ℎ𝑖) (3) A common class of programs that are less supported by existing methods are array programs, which are programs that manipulate or examine arrays, e.g. the binary search program in Listing 1. These programs are often more complex and requires invariants that are large and awkward to express in a low-level invariant grammar. Previous work ¹It is possible to verify binary search in Listing 1 with the invariant expressed in other ways that can be shorter, e.g. the invariant ∀𝑖. 0 ≤ 𝑖 < |𝑎𝑟𝑟| ∧ ¬(𝑙𝑜 ≤ 𝑖 < ℎ𝑖) ⟹ 𝑎𝑟𝑟[𝑖] ≠ 𝑘𝑒𝑦 is a bit shorter. This invariant is also enough to verify this implementation of binary search, although not equivalent to Equation 2. This invariant would for example not hold for a variation of binary search that return the index of the first occurrence of 𝑘𝑒𝑦, while Equation 2 would still be valid. This invariant, although shorter, is still less intuitive than Equation 3 and complex as it has a quantifier. 8 2. Background [9] presents high-level concepts for reasoning about array programs, which is shown to be effective for constructing manual formal proofs for some array programs. This thesis takes inspiration from these concepts to construct a high-level grammar for expressing program invariants for finite array programs. We use a theory exploration system to discover invariants with a flexible grammar that is mutable by the user, allowing the user to add domain-specific functions and constants to the grammar. 9 2. Background 2.2 Theory Exploration and QuickSpec Theory exploration is a method for systematically discovering interesting lemmas, axioms, or properties within a given formal theory. It involves generating conjectures and attempting to validate them using automated reasoning and testing techniques. QuickSpec [11] is a theory exploration system that discovers equational properties of a set of definitions in Haskell – useful for documentation, program understanding, and regression testing of functional programs. It systematically explores the behavior of a given set of functions and constants and generates conjectures (in the form of an equation) about their relationships. The input to QuickSpec is a set of functions and constants, and their type signatures, e.g. some common list functions in Haskell: (++) : [𝐼𝑛𝑡] × [𝐼𝑛𝑡] → [𝐼𝑛𝑡] -- 𝚕𝚒𝚜𝚝 𝚊𝚙𝚙𝚎𝚗𝚍 (:) : 𝐼𝑛𝑡 × [𝐼𝑛𝑡] → [𝐼𝑛𝑡] -- 𝚕𝚒𝚜𝚝 𝚌𝚘𝚗𝚜 [] : [𝐼𝑛𝑡] reverse : [𝐼𝑛𝑡] → [𝐼𝑛𝑡]. QuickSpec discovers and reports the following properties about these functions and constants: 1. reverse [] = [] 2. xs ++ [] = xs 3. [] ++ xs = xs 4. reverse (reverse xs) = xs 5. reverse (x : []) = x : [] 6. (x : xs) ++ ys = x : (xs ++ ys) 7. (xs ++ ys) ++ zs = xs ++ (ys ++ zs) 8. reverse xs ++ reverse ys = reverse (ys ++ xs) 9. xs ++ (x : (y : [])) = reverse (y : (x : reverse xs)) where xs, ys, and zs are variables of type [Int], and x and y are variables of type Int. QuickSpec automatically discovers the associativity and unit laws for append and all properties of the reverse function. The variables x, y, xs, ys, zs in the equations, found by QuickSpec, are bound by the universal quantifier. The closed form of property 4 in the output of QuickSpec above, for example, is ∀ 𝑥𝑠 : [𝐼𝑛𝑡]. 𝑟𝑒𝑣𝑒𝑟𝑠𝑒 (𝑟𝑒𝑣𝑒𝑟𝑠𝑒 𝑥𝑠) = 𝑥𝑠 (4) QuickSpec explores the equational theory of a set of functions by systematically gener- ating terms through the application of constants and functions to variables and smaller terms, bounded by a configurable term size limit. These terms are then evaluated on randomly sampled values for the universally quantified variables, and terms that produce identical results are grouped into equivalence classes. By leveraging the mathe- matical properties of these classes, QuickSpec efficiently identifies candidate equations 10 2. Background that capture functional relationships. Importantly, it doesn’t rely solely on brute-force testing: QuickSpec integrates lightweight reasoning – often faster than testing – to prune redundant terms and eliminate trivial or derivable equations using rewrite rules. These and other optimizations make its search space surprisingly tractable, enabling QuickSpec to quickly generate meaningful and minimal sets of equations, even for complex programs. QuickSpec uses the input functions and constants to build type-correct terms. Starting with the simplest terms (e.g. [], 𝑥𝑠), it applies the functions to these terms, generating larger terms (e.g. 𝑥 : [], 𝑥𝑠 ⧺ 𝑦𝑠, 𝑟𝑒𝑣𝑒𝑟𝑠𝑒(𝑥𝑠)), and searches for equations that hold between these terms incrementally. The term size limit controls how large terms can be, and thus how vast the search space is. The term 𝑥𝑠 ⧺ 𝑦𝑠, for example, has size 3. A key strength of QuickSpec is to perform automated reasoning to prune properties that are derivable from previously discovered properties. The pruner uses rewrite rules – built from previously discovered equations – to simplify terms on both sides of the equation, to eliminate equations between two terms that rewrite to the same term. Suppose QuickSpec has discovered the 9 properties above. Now consider the candidate equation 𝑟𝑒𝑣𝑒𝑟𝑠𝑒(𝑟𝑒𝑣𝑒𝑟𝑠𝑒([] ⧺ 𝑥𝑠)) = 𝑥𝑠 ⧺ 𝑟𝑒𝑣𝑒𝑟𝑠𝑒([]) This equation is redundant, and the pruner can eliminate it by applying previously discovered rules: 𝑟𝑒𝑣𝑒𝑟𝑠𝑒(𝑟𝑒𝑣𝑒𝑟𝑠𝑒([] ⧺ 𝑥𝑠)) = 𝑥𝑠 ⧺ 𝑟𝑒𝑣𝑒𝑟𝑠𝑒([]) {law 3 to left-hand side} 𝑟𝑒𝑣𝑒𝑟𝑠𝑒(𝑟𝑒𝑣𝑒𝑟𝑠𝑒(𝑥𝑠)) = 𝑥𝑠 ⧺ 𝑟𝑒𝑣𝑒𝑟𝑠𝑒([]) {law 4 to right-hand side} 𝑥𝑠 = 𝑥𝑠 ⧺ 𝑟𝑒𝑣𝑒𝑟𝑠𝑒([]) {law 1 to right-hand side} 𝑥𝑠 = 𝑥𝑠 ⧺ [] {law 2 to right-hand side} 𝑥𝑠 = 𝑥𝑠 . It is possible to sequence multiple runs of QuickSpec by feeding the laws discovered in one run into the next. In this way, the equations from the first run serve as a background theory for subsequent runs. QuickSpec uses this background theory during pruning: it simplifies terms and eliminates equations that are derivable from the previously discovered laws. Additionally, QuickSpec can be configured to use a user-defined set of equations as background theory, enabling more controlled and modular exploration. A notable limitation of QuickSpec is the limited support for discovering conditional equations – that is, equations that hold only under certain conditions. Many general properties such as 𝑥 ≤ 𝑦 ∧ 𝑦 ≤ 𝑧 ⇒ 𝑥 ≤ 𝑧 (transitivity) expressed as an equation 𝑥 ≤ 𝑦 = 𝑇𝑟𝑢𝑒 = 𝑦 ≤ 𝑧 = 𝑇𝑟𝑢𝑒 ⇒ 𝑥 ≤ 𝑧 = 𝑇𝑟𝑢𝑒 are not discovered by QuickSpec. QuickSpec can discover some conditional equations, but it requires the user to provide a predicate and a generator for generating random input for which the predicates holds. Given the predicate 𝑃(𝑥, 𝑦, 𝑧) = 𝑥 ≤ 𝑦 ∧ 𝑦 ≤ 𝑧, QuickSpec can discover the conditional equation 𝑃(𝑥, 𝑦, 𝑧) = 𝑇𝑟𝑢𝑒 ⇒ 𝑥 ≤ 𝑧 = 𝑇𝑟𝑢𝑒. To do this, QuickSpec expresses the conditional equation as a simple equation by introducing variables for the arguments of 𝑃(𝑥, 𝑦, 𝑧): 𝑃0, 𝑃1, and 𝑃2 denoting 𝑥, 𝑦, and 𝑧 respectively. The equation that QuickSpec discovers is 𝑃0 ≤ 𝑃2 = 𝑇𝑟𝑢𝑒; because these variables are prefixed with 𝑃 , QuickSpec knows they are special variables and samples random values for them from the generator provided 11 2. Background by the user. This will ensure that the values for which the equation is tested satisfy the predicate 𝑃(𝑥, 𝑦, 𝑧), and thus the equation holds only when the predicate holds. This encoding of conditional equations imposes a limitation that the predicates in the condition cannot share arguments. Otherwise QuickSpec would not know how to generate a value for the shared variable that fulfills both predicates at the same time. If we use the predicate 𝑃(𝑥, 𝑦) = 𝑥 ≤ 𝑦, QuickSpec will not discover the conditional equation 𝑃(𝑥, 𝑦) = 𝑇𝑟𝑢𝑒 ∧ 𝑃(𝑦, 𝑧) = 𝑇𝑟𝑢𝑒 ⇒ 𝑃(𝑥, 𝑧) = 𝑇𝑟𝑢𝑒 because the two predicates share the variable 𝑦. 12 3 A New Lens on Invariant Discovery – Theory Exploration The fundamental challenge in invariant discovery is the vast search space of potential invariants. Consider the elements present in a program invariant: program variables, constants, operators, and quantifiers. The number of possible properties (equations, inequalities, logical statements) that can be formed from these elements is combinato- rially explosive. A naive “generate-and-test” approach, where one blindly constructs and checks all possible expressions, would be computationally infeasible for anything beyond trivial programs or grammars. Even if we could test many properties, a significant portion would be: • Trivial: e.g., 𝑥 = 𝑥, 0 ≤ 0. • Redundant: Logically implied by simpler, already known properties (e.g., if 𝑡 = 𝑢 is an invariant, then 𝑓(𝑡) = 𝑓(𝑢) is also an invariant for any function 𝑓 , but less fundamental). • False positive: True only by coincidence for a few test cases but not a general property. We are interested in fundamental, concise, and non-obvious properties that characterize the program’s behavior. This thesis suggests theory exploration as a solution strategy for high-level invariant discovery. The key strengths of theory exploration, relevant for invariant discovery, are: (1) Flexible grammar with user-provided functions and constants, (2) systematic and efficient exploration of the search space, (3) automated reasoning to prune redundant and derivable properties from previously discovered properties. These strengths of theory exploration make it a suitable approach for discovering program invariants, as it allows effective navigation of the vast search space of potential invariants, while focusing computational effort on potentially new and interesting properties. To understand how theory exploration can be applied to invariant discovery, we imagine the program’s variables, and the operations applicable to them (as defined in the invariant grammar) as forming a small “mathematical theory.” The program’s execution, at specific points where we want to find invariants, provides concrete instances or models of this theory. An invariant is essentially a “theorem” that holds true across 13 3. A New Lens on Invariant Discovery – Theory Exploration all these observed instances (and ideally, all possible valid instances). The challenge of finding invariants then becomes analogous to exploring this implicit theory to uncover its inherent properties that are consistently satisfied by the program’s runtime behavior. By framing invariant discovery as a theory exploration problem, we can leverage the powerful machinery of existing theory exploration systems. This forms the basis of QuickInv: we reduce the task of inferring program invariants to a theory exploration problem, which we then solve using the theory exploration system QuickSpec. To achieve this, we must express program invariants as closed logical formulas in a form suitable for theory exploration, and determine the appropriate input for theory exploration. 3.1 Program Invariants as Closed Logical Formulas We aim to find program invariants using theory exploration, but the closed form of a program invariant is not always clear from the notation. In this section, we derive a closed form suitable for theory exploration to illustrate how this technique can be used to discover program invariants. Consider the loop invariant in Equation 5 for the binary search program in Listing 1. Invariants are logical formulas that involves program variables², constants, quantifiers, and operators which are pure functions. In this example the program variables are 𝑎𝑟𝑟, 𝑙𝑜, ℎ𝑖, and 𝑘𝑒𝑦, and the operators are ∈, ⟺, and the array slicing operator. 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 ⟺ 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜..ℎ𝑖) (5) In this notation, the program variables are free variables, but it is clear that the invariant does not hold for any arbitrary values assigned to the program variables. For example the assignment 𝑘𝑒𝑦 ↦ 0, 𝑎𝑟𝑟 ↦ [0, 1, 2], 𝑙𝑜 ↦ 1, ℎ𝑖 ↦ 3 does not satisfy the formula, since 0 is in the whole array but not in the slice [1..3) of the array [0, 1, 2]. This state, however, is not obtainable at any of the points in the program where the loop invariant should hold, because for the state to be reached, the loop would have to set the 𝑙𝑜 variable to 𝑚𝑖𝑑 when 𝑎𝑟𝑟[𝑚𝑖𝑑] < 𝑘𝑒𝑦. This is the opposite of how the program is implemented in Listing 1. The invariant only need to hold for obtainable program states; this should be reflected in the closed form of the invariant. Let 𝕊 denote the set of all program states 𝑠 and 𝒮𝒫 ⊆ 𝕊 denote the set of all program states 𝑠 obtainable by the program execution at a specific set of point 𝒫 in the program. In this example a program state 𝑠 over the loop can be represented by the tuple (𝑘𝑒𝑦, 𝑎𝑟𝑟, 𝑙𝑜, ℎ𝑖) consisting of input to the program and loop variables. A closed form of the invariant without free variables is: ∀(𝑘𝑒𝑦, 𝑎𝑟𝑟, 𝑙𝑜, ℎ𝑖) ∈ 𝒮𝒫. 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 ⟺ 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜..ℎ𝑖) (6) ²Invariants (specifically post-conditions and loop invariants) can also refer to original values of program input, e.g. the post-condition for a sorting algorithm needs to express that the output array is a permutation of the input array with the invariant 𝑚𝑢𝑙𝑡𝑖𝑠𝑒𝑡(𝑎𝑟𝑟) = 𝑚𝑢𝑙𝑡𝑖𝑠𝑒𝑡(𝑜𝑙𝑑(𝑎𝑟𝑟)). We can always treat invariants as a one-state function by recording a copy of the original value as part of the program state. Only mutable program inputs would require this. We prefix the original value with 𝑜𝑙𝑑 in the program state entry [10]. 14 3. A New Lens on Invariant Discovery – Theory Exploration We can express the invariant in a more theory-exploration-friendly way by modeling program variables as functions from program states to values, where the type of program states is denoted by 𝑆. In this representation we have: 𝑘𝑒𝑦 : 𝑆 → 𝐵𝑜𝑜𝑙, 𝑎𝑟𝑟 : 𝑆 → [𝐼𝑛𝑡], 𝑙𝑜 : 𝑆 → 𝐼𝑛𝑡, and ℎ𝑖 : 𝑆 → 𝐼𝑛𝑡 as used in Equation 7. ∀𝑠 ∈ 𝒮𝒫. 𝑘𝑒𝑦(𝑠) ∈ 𝑎𝑟𝑟(𝑠) ⟺ 𝑘𝑒𝑦(𝑠) ∈ 𝑎𝑟𝑟(𝑠)[𝑙𝑜(𝑠)..ℎ𝑖(𝑠)) (7) We can also optimize this formula for theory exploration with QuickSpec by replacing the logical implication operator with an equality. Furthermore, we can use the prefix operator 𝑠𝑙𝑖𝑐𝑒 : [𝑎] × 𝐼𝑛𝑡 × 𝐼𝑛𝑡 → [𝑎] to denote the array slicing operation. We thereby obtain a universally quantified formula in equational form (Equation 8). ∀𝑠 ∈ 𝒮𝒫. 𝑘𝑒𝑦(𝑠) ∈ 𝑎𝑟𝑟(𝑠) = 𝑘𝑒𝑦(𝑠) ∈ 𝑠𝑙𝑖𝑐𝑒(𝑎𝑟𝑟(𝑠), 𝑙𝑜(𝑠), ℎ𝑖(𝑠)) (8) We now have a closed logical formula that is suitable for theory exploration using QuickSpec – the formula is closed and only contains operators, constants, and variables bound by a universal quantifier . These operators and constants can serve as input to QuickSpec, which can then discover the invariant. Although this form can be discovered by QuickSpec, we can make it nicer to present by getting rid of the function applications of the program variables to 𝑠. To do so we lift the operators: (∈) : 𝑎 × [𝑎] → 𝐵𝑜𝑜𝑙 𝑠𝑙𝑖𝑐𝑒 : [𝑎] × 𝐼𝑛𝑡 × 𝐼𝑛𝑡 → [𝑎] to take a function from program state 𝑆 to values as arguments and result: (∈) : (𝑆 → 𝑎) × (𝑆 → [𝑎]) → 𝑆 → 𝐵𝑜𝑜𝑙 𝑠𝑙𝑖𝑐𝑒 : (𝑆 → [𝑎]) × (𝑆 → 𝐼𝑛𝑡) × (𝑆 → 𝐼𝑛𝑡) → 𝑆 → [𝑎] Now the expression 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 = 𝑘𝑒𝑦 ∈ 𝑠𝑙𝑖𝑐𝑒(𝑎𝑟𝑟, 𝑙𝑜, ℎ𝑖) has a type 𝑆 → 𝐵𝑜𝑜𝑙 in Equa- tion 9. Given a program state 𝑠, the expression can be evaluated to a Boolean. ∀𝑠 ∈ 𝒮𝒫.[𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 = 𝑘𝑒𝑦 ∈ 𝑠𝑙𝑖𝑐𝑒(𝑎𝑟𝑟, 𝑙𝑜, ℎ𝑖)](𝑠) (9) Omitting the universal quantifier (as is done by QuickSpec when presenting properties) give us the final form of the invariant in Equation 10. 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 = 𝑘𝑒𝑦 ∈ 𝑠𝑙𝑖𝑐𝑒(𝑎𝑟𝑟, 𝑙𝑜, ℎ𝑖) (10) This is an equivalent form of the original invariant in Equation 5 (albeit with different types), which now can be discovered by QuickSpec. In other words, by running Quick- Spec with the input functions and constants from Equation 8 and allowing it to evaluate terms internally on the obtainable program states 𝒮𝒫, we can discover the invariant in Equation 10. 15 3. A New Lens on Invariant Discovery – Theory Exploration 16 4 How QuickInv Works QuickInv discovers program invariants by reframing the problem as a theory exploration task. The core methodology involves a three-step pipeline: first, it captures a diverse set of program states by executing the target program; second, it represents program variables and constants in a format suitable for the theory exploration engine QuickSpec; and third, it uses a sequenced, two-phase QuickSpec exploration process to distinguish general mathematical truths and discover a smaller set of specific program invariants. The entire workflow, from code to invariant, is illustrated in Figure 2. It begins by targeting a specific part of a program for observation. For instance, the while loop in a binary search implementation, which is highlighted in the Figure 2. The user instruments the program to record a log of its state at each point of interest. In this case, before the loop and after each iteration. The program is executed many times with random inputs. The result is a collection of captured execution traces, represented in the figure as a table of variable values (𝑎𝑟𝑟, 𝑘𝑒𝑦, 𝑙𝑜, ℎ𝑖, 𝑖𝑑𝑥). This table of observed states serves as the empirical evidence for the discovery process. 17 4. How QuickInv Works Execute and record states Test candidates on states int binary_search(int* arr, int arr_len, int key) { int lo = 0; int hi = arr_len; int idx = -1; while (lo < hi) { int mid = (lo + hi) / 2; if (arr[mid] < key) { lo = mid + 1; } else if (arr[mid] > key) { hi = mid; } else if (arr[mid] == key) { idx = mid; break; } } return idx; } arr ↦[1,2,3,4,5] key ↦4 lo ↦0 hi ↦5 idx ↦-1 arr ↦[1,2,3,4,5] key ↦4 lo ↦3 hi ↦5 idx ↦-1 arr ↦[1,2,3,4,5] key ↦4 lo ↦3 hi ↦4 idx ↦-1 arr ↦[1,2,3,4,5] key ↦4 lo ↦3 hi ↦4 idx ↦3 arr ↦[1,2,8] key ↦6 lo ↦0 hi ↦3 idx ↦-1 arr ↦[1,2,8] key ↦6 lo ↦2 hi ↦3 idx ↦-1 arr ↦[1,2,8] key ↦6 lo ↦2 hi ↦2 idx ↦-1 ✗ lo = hi ✓ 0 ≤ lo ✗ lo = 0 ✗ idx = -1 ✗ key ∈ arr ✓ lo ≤ hi ✓ key ∈ arr = slice arr lo hi ✗ lo ∈ arr = slice arr lo hi Figure 2 : Illustration of QuickInvs pipeline: targeting a program invariant, program state data collection and testing candidate invariants. With this data in hand, QuickInv systematically generates and tests candidate invari- ants using QuickSpec. The figure depicts this as a list of potential properties, where each one is checked against the recorded states. Properties that fail for even one state (e.g., 𝑙𝑜 = ℎ𝑖) are falsified and crossed out. Those that hold true for the tested observed states (e.g., 0 ≤ 𝑙𝑜 and 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 = 𝑘𝑒𝑦 ∈ 𝑠𝑙𝑖𝑐𝑒 𝑎𝑟𝑟 𝑙𝑜 ℎ𝑖) and are not derivable from previously discovered properties by simple reasoning are accepted (marked with a tick), and presented as likely invariants. The following sections detail the mechanics behind each part of this pipeline: how states are captured and represented, and how the two-phase exploration process effectively distinguishes program-specific invariants from general truths, while discussing key configuration options for QuickInv. 4.1 Capturing Program States from Execution Traces The foundation of any dynamic analysis is the data it observes. In this thesis, the target program is manually instrumented with logging statements to capture program states at a specific set of points 𝒫 of interest. For a loop invariant, this is before the loop, after each iteration, and after the loop. For a pre-condition and post-condition, it is 18 4. How QuickInv Works before and after the program execution respectively. The target program is executed many times with random inputs to collect the states data. QuickInv needs a user-defined representation of a program state of a target program. In the examples we represent a program state with a Haskell data type with named fields for each program variable and input of interest. For example, the program state for the binary search program in Listing 1 is represented by: data State = State { key :: Int , arr :: [Int] , lo :: Int , hi :: Int , idx :: Int } In Haskell, defining a data type like this automatically generates field access functions (also known as getters) for each field in the record. That means after this definition, the following functions are available: key :: State -> Int arr :: State -> [Int] lo :: State -> Int hi :: State -> Int idx :: State -> Int To gather data, QuickInv executes the user-instrumented target program multiple times with randomized inputs. This data is sampled by QuickInv to randomly test candidates on by evaluation. A naive approach of using a uniform distribution for inputs can be problematic, as it may overrepresented states from large inputs (e.g., large arrays) and miss crucial edge-case behaviors (e.g., empty arrays, single-element arrays). Running the program with the input 𝑘𝑒𝑦 ↦ 1, 𝑎𝑟𝑟 ↦ [1, 2, 3, 4] captures the program states: [State {key = 1, arr = [1,2,3,4], lo = 0, hi = 4, idx = -1}, State {key = 1, arr = [1,2,3,4], lo = 0, hi = 2, idx = -1}, State {key = 1, arr = [1,2,3,4], lo = 2, hi = 2, idx = 2}] while running the program with the input 𝑘𝑒𝑦 ↦ 5, 𝑎𝑟𝑟 ↦ [] captures the program states: [State {key = 5, arr = [], lo = 0, hi = 0, idx = -1}] The naive approach of collecting program states in a list is not sufficient to ensure that the program states are representative of the program’s behavior. In this case there are far more program states where the array is non-empty than there are program states where the array is empty; the likelihood of randomly sampling the state with empty array is 1 4 in this example. We want all the inputs that we run the program with to be equally likely to be sampled, but some inputs generates many more program states. Because we are targeting the loop, we capture far more program states where 𝑎𝑟𝑟 is non-empty (from the first run) than where it is empty (from the second run). This can 19 4. How QuickInv Works lead to missing edge cases, and as a result, discover false invariants that do not hold for all program states. For example, the invariant 0 < length arr would be more likely to be discovered, which is not true for all program states in 𝒮𝒫. To mitigate this, QuickInv stores program states in a nested array. The outer array is indexed by different inputs to the program being recorded, while the inner array contains the program states for each input. We sample a program state from this nested array by first sampling a random input and then sampling a random program state from the inner array. This approach ensures that we can sample states generated by small inputs more evenly, and states generated by large inputs are not overrepresented. In the binary search example, this nested array would look like: [ [State {key = 1, arr = [1,2,3,4], lo = 0, hi = 4, idx = -1}, State {key = 1, arr = [1,2,3,4], lo = 0, hi = 2, idx = -1}, State {key = 1, arr = [1,2,3,4], lo = 2, hi = 2, idx = 2}] , [State {key = 5, arr = [], lo = 0, hi = 0, idx = -1}] ] Sampling a program state from this nested array would first sample an index from the outer array, and then sample a random index from the inner array, making the likelihood of sampling the state where 𝑎𝑟𝑟 is empty 1 2 . We have used QuickCheck [12], a random testing library for Haskell, to generate random inputs for the target program. QuickCheck provides a powerful way to generate random inputs based on the types of the program variables, ensuring that the inputs are valid and cover a wide range of possible values. In our evaluation, we found that this approach significantly improves the quality of the discovered invariants, minimizing the discovery of false positives. For the example programs in this thesis, the target programs were executed with 5 000 random inputs to collect program states. Although there are no strict requirements on how program states are collected – for example, they could simply be printed to a file – we use Haskell’s foreign function interface (FFI) to call C programs (the implementation language of our example programs) from Haskell. A callback function is provided to collect the program states during execution. 4.2 Program States and Variables QuickInv represents program values as functions from program states to values. newtype Prog s a = Prog (s -> a) QuickInv exports the con and var functions to construct program variables and constants: con :: a -> Prog s a con = Prog . const var :: (s -> a) -> Prog s a var = Prog 20 4. How QuickInv Works We can now construct program variables and constants for binary search example as follows: var key :: Prog State Int var arr :: Prog State [Int] var lo :: Prog State Int var hi :: Prog State Int var idx :: Prog State Int con 0 :: Prog State Int con False :: Prog State Bool con [] :: Prog State [Int] From this point onward in the thesis, we overload program variables and constants such as key, arr, 0, [], etc., to refer to program values of type Prog s a. For example, in the expression key ∈ arr, we have key :: Prog s Int and arr :: Prog s [Int]. Operations on program variables   Functions on program variables are lifted versions of functions on normal values. For example, the length function is defined as a lifted function that takes a program value for an array and returns an integer program variable that is the length of the original program array. The lifted length function is defined as follows: plength :: Prog s [a] -> Prog s Int plength (Prog arr) = Prog (\s -> length (arr s)) With an Applicative Functor instance for Prog, we can lift functions easily. plength :: Prog s [a] -> Prog s Int plength = fmap length pslice :: Prog s [a] -> Prog s Int -> Prog s Int -> Prog s [a] pslice = liftA3 slice Comparing program variables and constants   Expressions involving lifted opera- tors and constants are of type Prog (a wrapped function s -> a) and cannot be compared directly. To compare program values we use observational equality; Two program values are equal if they are equal in all states. This is done in QuickSpec by providing a function observe that takes a program state and a program value and returns the value of the program value in that state. Program variables are now compared by comparing their values in a given state (t == u iff ∀s. observe s t == observe s u). instance (Ord a, Eq a, Arbitrary s) => Observe s a (Prog s a) where observe :: s -> Prog s a -> a observe state (Prog f) = f state To test whether the invariant key ∈ arr = key ∈ (slice arr lo hi) holds, QuickSpec will sample program states s randomly from the collected program states and evaluates whether observe s (key ∈ arr) = observe s (key ∈ (slice arr lo hi)) holds. This is an equality over Boolean values, which can be compared directly. If it samples a state where the left-hand side differs from the right-hand side, then the invariant is falsified. 21 4. How QuickInv Works Otherwise, if it is equal for all sampled states then the invariant is considered true and presented to the user. 4.3 Sequencing Theory Exploration Running QuickSpec with the program variables and constants defined above will result in discovering many laws about the program variables and constants. However, many of these laws are not useful as program invariants. For example, laws such as 0 = length [], x ≤ x, and 0 ≤ 1 are true for any program state – general mathematical truths. They are not entirely useless though; they are useful for reasoning about the program and pruning laws in QuickSpec. Knowing that x ≤ x, we do not need to test and report laws such as lo ≤ lo and hi ≤ hi, since they follow from x ≤ x. QuickSpec prunes laws that follow from previously discovered laws and avoid testing and reporting them to the user. QuickInv sequences two phases of theory exploration with QuickSpec. First, QuickSpec is run with functions and constants in the invariant grammar – discovering general laws about the grammar (e.g. sorted [] and slice a x x = []), which are general mathe- matical truths. We call this phase the background phase. The laws discovered in this first phase are used as background theory in the second phase, which is run with functions and constants in the invariant grammar, and in addition, the program variables. We call the second phase the invariant discovery phase. The pruner in QuickSpec uses the background theory to prune derivable laws, avoiding testing and reporting them to the user. For example, since the laws sorted [] and slice a x x = [] were discovered in the first run, the pruner will not test and report the law sorted (slice arr lo lo) in the second run, since it can be rewritten as sorted [] using slice a x x = []. We already know that sorted [] is true. The second phase discovers laws that involves program variables, e.g. key in arr = key in slice arr lo hi, which do not follow from any laws discovered previously. In QuickSpec, Testing effort is spent on final laws that are presented to the user, reducing the number of spurious laws leads to better performance as well, as testing a true law is the most expensive part of the process. The background phase for the binary search example discovers 98 laws, some of which being: 1. sorted [] 2. 0 = length [] 3. x ≤ x 4. x ∈ [] = False The invariant discovery phase of theory exploration discovers 22 laws, some of which being: 5. 0 ≤ lo 7. lo ≤ hi 9. hi ≤ length arr 13. key ∈ slice arr lo hi = key ∈ arr 22 4. How QuickInv Works These are the required loop invariants for the verification of the binary search program in Listing 1. 4.4 Configuring QuickInv QuickInv is highly configurable, allowing users to tailor the invariant discovery process to their program domain, balance performance and completeness, and shape the final output. These configurations are inherited from QuickSpec. This section outlines the key configuration options, useful for tuning the invariant discovery process. Invariant Grammar   The core configuration is the invariant grammar – the set of Haskell functions and constants that defines the space of possible invariants. QuickInv provides a standard prelude of common functions (see Section 5.1), but users can modify this by adding or removing definitions, such as domain-specific operators or arithmetic functions. The design of this grammar is crucial and is discussed in detail in Section 5. Background Theory   QuickInv supports the use of a background theory – a set of known equational laws used to eliminate redundant properties. Users can provide their own axioms, especially for properties that QuickSpec cannot discover, such as the transitivity of ≤ (𝑥 ≤ 𝑦 ∧ 𝑦 ≤ 𝑧 ⇒ 𝑥 ≤ 𝑧). These help reduce spurious and redundant invariants and improve output quality. Filtering Terms and Properties   To reduce search complexity and irrelevant results, QuickInv allows custom filters. Enumerator filters discard syntactically unhelpful terms – like nested slice calls (slice (slice ...)) which are equivalent to slicing with an offset – which can speed up the search. Print filters remove discovered laws that are technically valid but not meaningful in context from being presented to the user. Tuning the Exploration Engine   QuickInv inherits key tuning parameters from QuickSpec. The most important is the maximum term size, which controls the com- plexity of discoverable invariants. Larger limits uncover more complex laws but expand the search space rapidly. Users can also configure the number of test cases used to validate properties – higher values increase confidence but also slow down execution. 23 4. How QuickInv Works 24 5 Designing a Grammar This section describes the grammar used in QuickInv, followed by some of the challenges in designing an expressive grammar that works well for finding invariants using theory exploration; for example, how to deal with partial functions and how to avoid redundant and false laws. 5.1 Program Invariant Grammar Designing a good set of constants and functions to be able to discover program invariants is essential. If an invariant cannot be expressed with the functions and constants and program variables, we will not be able to discover it. Additionally, some functions and constants are useful during reasoning and pruning. To obtain a compact set of laws presented to the user, we also need to discover such general mathematical truths. An obvious grammar could include low-level functions and constants such as ∧, ∨, ⇒, ≤, <, ¬, ∀𝑘, ≠, −, 𝑎𝑟𝑟[? ], 0, 1, 𝑡𝑟𝑢𝑒. We can express that arr is sorted between index 0 and 𝑖 with (∀𝑘.0 < 𝑘 < 𝑖 ⟹ 𝑎𝑟𝑟[𝑘 − 1] ≤ 𝑎𝑟𝑟[𝑘]) = 𝑡𝑟𝑢𝑒 in equation form. In our method, we instead include high-level functions for arrays such as slice, ∈, arr[?], length, sorted, 0, True, ≤. We can express that arr is sorted between 0 and 𝑖 with sorted (slice arr 0 i) = True as an equation. The latter grammar has two advantages. Firstly, it is more intuitive and easier to understand compared to the former, which needs some deciphering to understand. Secondly, the term size needed to express the former is much larger than the latter. The equation in the former grammar has a left-hand side term size of 14. The equation in the latter grammar has an expression size of 5. Since, QuickSpec iterates all terms up to a term size, with the former grammar many useful terms would quickly become infeasible to generate because of the explosive blowup in the number of combinations needed to consider. We specialize the grammar to the domain of array programs and, in that way minimize the expression size while allowing expression of intuitive relationships. We also avoid quantifiers like ∀ and ∃ by limiting our domain to finite array programs – we can test finite arrays instead of relying on quantifiers. Although QuickSpec can discover invariants with universal quantification, we avoid them in the invariants we present to 25 5. Designing a Grammar the user. With this method, we limit the expressivity, but with a well-designed grammar for the target domain, all useful relationships can hopefully be expressed. QuickInv can be specialized for other domains by choosing a different set of functions and constants. Reasoning about arrays requires common operations on arrays and ways to compare arrays, their elements, and their properties. Previous work by Reynolds [9] has found useful operations and their properties for reasoning about arrays. Although Reynolds’ work is not directly applicable to our problem, it provides a good starting point for designing our grammar. We can use the operations and properties found by Reynolds as a basis for our grammar. Reynolds treats arrays as functions from indices to values. We instead treat arrays as a list of values, which is more convenient for our purpose. We have defined a grammar prelude for QuickInv with useful functions and constants for expressing invariants common in many array programs. The prelude can be seen as a good starting point, but functions and constants can easily be added or removed as needed. QuickInv Prelude • True :: Bool - true • False :: Bool - false • 0 :: Int - zero • 1 :: Int - one • [] :: [Int] - empty array The prelude contains the elementary constants like True and False, 0, 1 and []. The constant True lets us discover predicates as equational laws, for example, sorted arr = True, but these laws will be presented as just sorted arr to the user. • slice :: [Int] -> Int -> Int -> [Int] - slicing an array • !! :: [Int] -> Int -> Int - array indexing • length :: [Int] -> Int - length of an array • ∈ :: Int -> [Int] -> Bool - membership of an element in an array • sorted :: [Int] -> Bool - whether an array is sorted Our grammar includes array operations such as slicing, indexing, and array length. The constant 0 and length arr are useful for slicing from the beginning or to the end of the array, respectively. The grammar also includes the predicates ∈ and sorted, which are useful for algorithms that reason about array containment or sortedness, like searching or sorting algorithms. • ≤ :: Int -> Int -> Bool - less than or equal to Moreover, the grammar includes operators to express inequalities, so we can find invariants like 0 ≤ 𝑖 ≤ | 𝑎𝑟𝑟| with the laws 0 ≤ i = True and i ≤ length arr = True. The relation < is excluded from the grammar because x < y can be expressed with the equation y ≤ x = False and having multiple ways to express the same thing is generally undesired. A final post-processing step is used to present y ≤ x = False as x < y to the user. 26 5. Designing a Grammar • ≤* :: [Int] -> [Int] -> Bool - point-wise extension of ≤ to arrays • ≤* :: Int -> [Int] -> Bool - point-wise extension of ≤ to arrays with single argument on the left-hand side • ≤* :: [Int] -> Int -> Bool - point-wise extension of ≤ to arrays with single argument on the right-hand side We also include the ≤* operator that is the point-wise extension of ≤ for arrays as described by Reynolds [9]. This lets us express that elements in one array are less than or equal to all the elements in another array; for example, slice arr 0 i ≤* slice arr i (length arr). Additionally, we have the ≤* operator with a single element on either side of the operator, which lets us compare a single element with an array. For example, that x is the smallest element of an array can be expressed as x ≤* arr = True. • multiset :: [Int] -> IntMultiSet - array to multiset conversion • ∪ :: IntMultiSet -> IntMultiSet - union of multisets Lastly, the prelude also includes functions for multisets, which is useful for expressing invariants that an algorithm preserve elements in an array. For example, that a sorting algorithm returns an array with the same elements as the input can be expressed as multiset arr = multiset old_arr. An alternative way of expressing this would be to have a permutation function in the grammar, but the downside of this would be larger term sizes; for example, the law multiset (slice arr p r) = multiset (slice old_arr p r) with a term size of 5 would instead be permutation (slice arr p r) (slice old_arr p r) = True with a term size of 9. Arithmetic Extension • + :: Int -> Int -> Int - addition • - :: Int -> Int -> Int - subtraction Arithmetic operations will significantly increase the number of possible terms and number of invariants found, and is therefore offered as an extension. It is recommended to first try to search for invariants without arithmetic operations and then add them if needed. The grammar can easily be customized to work in different domains by adding any custom Haskell function to the grammar or exclude any of the predefined functions. 5.2 Dealing with Partial Functions (Slice and Indexing) One of the challenges of designing a grammar that works well with theory exploration is how to deal with partial functions. Partial functions are not defined for all possible inputs. For example, the slice function is not defined for negative indices, indices that are out of array bounds and indices where the first index is larger than the second index. The definition of slice in Haskell would look something like: slice :: [a] -> Int -> Int -> [a] slice a x y | not (0 <= x && x <= y && y <= length a) = error "slice: invalid input" 27 5. Designing a Grammar | otherwise = [a !! i | i <- [x..y - 1]] Using this definition of slice will result in unwanted trivial loop invariants like slice arr lo lo = [] and slice arr hi hi = [] to be discovered. This is because there are many general laws about slice that need valid inputs to hold, for example, slice a x x = []. When this law is absent from the background laws will result in these unwanted invariants to be discovered. This general law cannot be found by QuickSpec unless we provide a predicate 0 ≤ x ≤ length a. Otherwise, during testing, QuickSpec will try to QuickCheck the law slice a x x = [] which will error out when it tries to test with an invalid input for x (e.g. x = -1) and the test will fail with a runtime error. A runtime error is treated as a falsification of the law in QuickSpec. Generally, the condition required for a law involving a function is that the input is valid and that the input is in the domain of the function. For total functions this condition is always true, but for partial functions this is not the case. Therefore, we need to be able to express the condition that the input is valid. A general way to do this is to use a predicate for valid inputs for the partial function. For example, we can define a predicate: slice_dom a x y = 0 <= x && x <= y && y <= length a Now QuickSpec can find more laws such as slice_dom a x y => slice a x x = []. There is, however, a problem with this approach. We cannot find laws with other conditions that reuse the variables a, x, and y due to the limitation of how conditionals are handled in QuickSpec. Any law that needs additional conditions on any of the input to slice cannot be expressed in QuickSpec. For example, the law slice_dom a x y && sorted a => sorted (slice a x y) cannot be found this way because a is used in the predicate slice_dom and sorted. We can find more conditional laws about total functions than we can about partial functions since one condition almost always required for partial functions is that the input is valid. The question is if we can find laws about partial functions without needing the valid input condition. If we can, then we will be able to find conditional laws just like total functions. 5.2.1 Combined Predicates One potential solution to this problem is to combine the predicates in the grammar and generate more predicates conjuncted with slice_dom. If the grammar has the predicate sorted we can also generate the predicate slice_dom_and_sorted a x y = slice_dom a x y && sorted a. This way we can find laws such as slice_dom_and_sorted a x y => sorted (slice a x y), but the number of predicates now is at least twice the number of predicates without these combined predicates. This slows down QuickSpec substantially because conditional laws are slow in QuickSpec, due to the way they are checked during pruning. 28 5. Designing a Grammar 5.2.2 Generating Valid Input Another, possibly better, method is to restrict the input to the valid range during testing. This method is used for predicates, where a separate generator is used for generating random values fulfilling the predicate. The same method could help with partial functions, if it is possible. For example, the law length (slice a x y) = y - x will hold for all x, y and a if the inputs tested are in the domain of slice. Similarly, the law sorted a => sorted (slice a x y) can be discovered without the domain condition. The difficulty in this method lies in the implementation. How can we handle a property with multiple instances of partial functions? Can we lift the restriction that the inputs are all variables? For example, how do we generate input for the property slice (slice a x y) y z = slice a x z? The input validity conditions for the property are: 0 <= x <= y <= length a && 0 <= y <= z <= length (slice a x y) && 0 <= x <= z <= length a How can we generate valid input for this property? We need to combine multiple generators and generate values that all the generators would be able to generate. Is it possible to generate valid indices for the outer slice in the left-hand side, since the array input is a term (slice a x y), not a variable? These challenges highlight that implementing this method is difficult. 5.2.3 Making Partial Functions Total Another approach to dealing with partial functions is to extend their definition to handle invalid inputs, making them total functions. Clamping One way to make slice total is to define it to clamp the input indices to the valid range: slice xs i j | not (0 <= i && i <= j && j <= length xs) = slice xs (max 0 i) (min (length xs) j) | otherwise = [xs !! k | k <- [i..j - 1]] This definition of slice allows discovering of the laws slice a x x = [] and sorted a => sorted (slice a x y), but this hinders laws about the length of the slice because of the clamping behavior; for example the law x ≤ y => length (slice a1 x y) = y - x since length (slice [1,2,3] 0 5) = length [1,2,3] = 3 ≠ 5 - 0. Randomness 29 5. Designing a Grammar A slice definition that preserves the length property is one that uses random values for invalid input: index_dom :: [a] -> Int -> Bool index_dom a x = 0 <= x && x < length xs slice :: (Int -> a) -> [a] -> Int -> Int -> [a] slice def a x y | y <= x = [] | otherwise = [if index_dom a i then a !! i else def i | i <- [x..y - 1]] The argument def is a random function that provides a value for indices outside the original array’s bounds. The idea is to make the function total, without creating any spurious laws about the structure of the array for invalid input. Using random values is a way to model undefindness during testing, and for a law to hold, the law needs to hold for all def functions. With this definition of slice, we can find the laws about length. However, we can no longer find sorted a => sorted (slice a x y) without the domain constraint. For example slice def [1,2,3] (-1) 5 could evaluate to [8,1,2,3,0,7], which is no longer sorted. The absence of this law will result in unwanted redundant loop invariants like sorted (slice arr 0 lo), sorted (slice 0 hi) and sorted (slice lo hi) for binary search. Replicate A total definition of slice that preserves both the length property and the sorted sub- slice property extends the array by replicating the values at its ends; for example, slice def [1,2,3] (-1) 5 = [1,1,2,3,3,3]. slice :: a -> [a] -> Int -> Int -> [a] slice def [] x y = slice def [def] x y slice def a x y | y <= x = [] | x < 0 = replicate (min (-x) y - x) (head a) ++ slice def a 0 y | y > length a = slice def a x (length a) ++ replicate (min (y - length a) y - x) (last a) | otherwise = [a !! i | i <- [x..y - 1]] The benefit of this definition of slice is that it can find both x ≤ y => length (slice a x y) = y - x and sorted a => sorted (slice a x y). When slicing an empty array, there are no values to replicate. Returning an empty array in this case would invalidate the law about the length of the slice. In this case, we instead use a single random value that is replicated for the array slice. This is the definition of slice that is used by QuickInv, and the index function is also based on this definition. However, this definition is still not perfect. It finds some laws about how it behaves for invalid inputs, like sorted (a x 0) and sorted (slice a (length a) x). If array concatenation was included in the grammar, this definition 30 5. Designing a Grammar would not work for laws like slice a 0 x ++ slice a x (length a) = a; for example, slice [1,2,3] 0 5 ++ slice [1,2,3] 5 3 = [1,2,3,3,3]. 5.2.4 Too General Invariants Unfortunately, when using the extended total function of slice, we often discovered invariants with array slicing in a more general form with free variables instead of program invariants. For example, when evaluated on selection sort implemented in Listing 3, we would like to discover the loop invariant 𝑠𝑜𝑟𝑡𝑒𝑑 𝑎𝑟𝑟[0..𝑖), but this loop invariant was discarded because it was already known from the more general law ∀𝑥. 𝑠𝑜𝑟𝑡𝑒𝑑 𝑎𝑟𝑟[𝑥..𝑖). This more general form follows from the fact that all sub-array slices of a sorted array slice are themselves sorted. Another example is the loop invariant 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜..ℎ𝑖) = 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 for binary search. This law was also discarded because it was already known since it could be derived from two other laws: ∀𝑥. 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜..𝑥) = 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[0..𝑥) and 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[0..ℎ𝑖) = 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟. The derivation steps are shown in Figure 3. The more general forms of this invariant follow from the fact that if an array slice contains an element, then a larger slice containing the original slice will also contain that element. ∀𝑥. 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜..𝑥) = 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[0..𝑥) 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜..ℎ𝑖) = 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[0..ℎ𝑖) Instantiate with 𝑥 ↦ ℎ𝑖 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜..ℎ𝑖) = 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 Apply 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[0..ℎ𝑖) = 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 (11) Figure 3 : Derivation of the invariant 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜..ℎ𝑖) = 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 of binary search. To circumvent this undesired behavior, we use the extended total function definition of slice in the background phase, but in the invariant discovery phase, we instead use the partial definition of slice. The partial definition in the invariant discovery phase will cause laws like ∀𝑥. 𝑠𝑜𝑟𝑡𝑒𝑑 𝑎𝑟𝑟[𝑥..𝑖) to result in a runtime error for invalid values of 𝑥 during testing and therefore be treated as a falsification of the law by QuickSpec. Laws like 𝑠𝑜𝑟𝑡𝑒𝑑 𝑎𝑟𝑟[0..𝑖) will now be discovered because ∀𝑥. 𝑠𝑜𝑟𝑡𝑒𝑑 𝑎𝑟𝑟[𝑥..𝑖) is discarded and the array slice from 0 to 𝑖 is always valid during testing. We will also avoid laws like 𝑠𝑜𝑟𝑡𝑒𝑑 𝑎𝑟𝑟[1..𝑖) since the array slice will be invalid when testing against a program state where 𝑖 is 0. 5.3 Too Many Laws Many of the laws for loop invariants are not necessary for verifying the program, and we would like to avoid as many redundant laws as possible. A strong background theory allows us to prune away laws that are trivially true. Our slice definition lets us remove trivial laws for binary search, such as slice arr lo lo = [], length (slice arr 0 hi) = hi, and sorted (slice arr 0 lo), but we find other examples of redundant laws for the example programs we tested. For example, in selection sort (implemented in Listing 3), we have the loop invariant multiset old_arr = multiset arr, but we also get the law length old_arr = length arr, 31 5. Designing a Grammar which should follow from the first law. To prune away this law, a background law like multiset a1 = multiset a2 => length a1 = length a2 is needed, but we do not discover it because QuickSpec cannot find this implication with an equation as the condition. Another redundant law in selection sort that is harder to prune away is sorted (slice arr i (length arr)) = sorted arr, and this law follows from the loop invariants sorted (slice arr 0 i) and slice arr 0 i ≤* slice arr i (length arr). To get rid of it, we would need the background laws sorted a1 & a1 ≤* a2 => sorted a2 = sorted (a1 ++ a2) and slice a 0 x ++ slice a x (length a) = a, but the first law has reused variables in the condition and is therefore undiscoverable, and the second law does not hold for our slice definition when x is outside its valid range. Sometimes, there will be laws that are not needed to verify the algorithm, but they are not redundant in the sense that they follow from other laws. For example, if we add arithmetic operations to binary search, we discover the law lo ≤ (hi - lo) = lo ≤ 0, which given 0 ≤ 𝑙𝑜 ≤ ℎ𝑖 can be reformulated as 𝑙𝑜 ≠ 0 ⟹ ℎ𝑖 < 2 ⋅ 𝑙𝑜. This law follows from what lo and hi states are reachable in binary search, but it is not a loop invariant required to verify the algorithm. This means that there can be laws about the behavior of the implementation that are of no interest when verifying its correctness. In binary search, we also discover redundant laws like key ∈ slice arr 0 hi and key ∈ slice arr lo (length arr), which follow from key ∈ slice arr lo hi and 0 ≤ lo ≤ hi ≤ length arr. These laws follow from that if an array slice contains an element, then a larger slice containing the original slice will also contain that element, which is a type of monotonicity. Pruning away these redundant laws would need a complex background law like x0 ∈ slice a x1 x2 & x3 ≤ x1 & x2 ≤ x4 => x0 ∈ slice a x3 x4. This law is undiscoverable because it reuses variables in the conditions and uses more than the four variables that QuickSpec can handle by default; increasing this value would have a performance cost. An alternative, more general approach for redundant laws like this is to reason about monotonicity and antitonicity. 5.3.1 Monotonicity and Antitonicity We have noticed that the conditional laws needed to reason about some of the behavior of array (especially for reasoning about slice) is that we need some laws following from monotonicity and antitonicity. For example to reason about ∈ we need the law a ⊑ b => (x ∈ a ==> x ∈ b) which is monotonicity of x ∈ with the partial order ⊑ on arrays (infixOf operator) and ==> as the partial order on Booleans (implication). For this to work, we need to have implication as a function in the grammar, which is inefficient. Another one is about sortedness a ⊑ b => (sorted b ==> sorted a) which says that sorted is an antitone function with respect to the partial order ⊑ on arrays. We also have that a ⊑ b => (length a ≤ length b) which is monotonicity of length. It is potentially interesting to consider that monotonicity and antitonicity properties can be used to embed order for one type in another type. Thus, this fact can be used to avoid order information about one type if one has the order in another type. The 32 5. Designing a Grammar information for one type can be used to reason about the order in another type (order embedding). 5.3.2 Semilattices We have also looked at how we can use functions that have the structure of semilattices in the grammar, in order to reason about redundant laws. A semilattice consists of a set with an associated binary operation that is associative, commutative, and idempotent. One example is the longest common subsequence (LCS) operation on a set of all integer arrays. LCS returns the longest subsequence common of two sequences, for example, lcs [1,2,3,4,5] [1,3,5,7] = [1,3,5]. Sometimes, there will be multiple subsequences that are the longest, for example, [1] and [2] are both valid results for lcs [1,2] [2,1]. In order for the commutativity property (lcs x y = lcs y x) to hold, the returned subsequence cannot depend on the argument order. This can be done by, for example, implementing LCS, so it returns the longest subsequence that is the lexicographically smallest. LCS can be used to reason about sortedness about sub-slices. We can find the back- ground law sorted a1 => sorted (lcs a1 a2). For example, in binary search, we have the redundant law sorted (slice arr lo hi), which follows from the pre-condition sorted arr. If we discover that lcs arr (slice arr lo hi) = slice arr lo hi, we can apply the law sorted a1 => sorted (lcs a1 a2) where a1 is arr and a2 is (slice arr lo hi) to get sorted arr => sorted (slice arr lo hi) and therefore know that the original law was redundant. A limitation with this approach is that we discover sorted (slice arr lo hi) before we know it is redundant because lcs arr (slice arr lo hi) = slice arr lo hi is discovered later due to its larger term size. Compared to the background law sorted a => sorted (slice a x y) that we find with the clamping and replicate version of slice, a benefit with this approach is that it will also work for sub- slices of slices. For example, in merge sort, we have two sorted slices before merging them, and any laws about sub-slices of them being sorted are redundant. 5.4 Undesirable Terms in the Grammar In the grammar, we have the operator ≤* that is the point-wise extension of ≤ on arrays, which lets us express that elements in one array are less than or equal to all the elements in another array. It may also be useful to have invariants that a single element is less than or equal to all elements in an array slice. There are two ways of expressing this; we could add a version of the operator that takes a single integer on the left-hand side, or we could add the singleton function that takes an integer and returns a singleton array. This second approach has an undesirable effect on the grammar that equality x = y, can be expressed with the term x ∈ singleton y, which we do not want. QuickSpec is designed to find equalities efficiently without having to test every combination, and having equality expressible in the grammar would lead to inefficiencies and undesired laws like idx ∈ singleton (length arr) = False. However, this effect can easily be 33 5. Designing a Grammar avoided by adding an enumeration filter that filters out terms containing ∈ singleton, as described in Section 4.4. 5.5 False Laws Another problem is the discovery of false laws. The laws are discovered through random testing, which allows incorrect laws to pass through. Discovered false laws can affect which laws are explored next, potentially leading to the discovery of more false laws and preventing the discovery of true laws as they are pruned away by false laws. An example of a false law is length arr ≤ 1 = arr ≤* arr that sometimes appears. The expression arr ≤* arr is only true when all the elements in arr are equal to each other, and this is always true when arr is empty or contains a single element. This coincides also with length arr ≤ 1. When arr is a larger array, it is unlikely that all elements are generated to the same value. Therefore, to falsify this law, the generated arr needs to have a length of at least two, and all elements need to have the same value, for example arr = [7,7]. This law can pass thousands of test cases before being falsified. The number of false laws can be reduced by increasing the number of test cases and improving the quality of program states. Increasing the number of test cases will reduce performance, and the number of times the program is executed to capture states will also need adjustments to get enough states. To achieve higher quality program states, they should generally have a good distribution between small and large arrays. The distribution of values within arrays should not be too similar to the ranges of other program variables. The generated program input should result in a good distribution of different execution paths of the algorithm; for example, the input for binary search should cover both cases when the element is found and when it is not. 34 6 Evaluation on Example Algorithms The following section summarizes how QuickInv performs on some example algorithms with the loop invariants needed to verify the correctness of the algorithms using a verification tool such as Dafny [4]. QuickInv is evaluated on a system with an AMD Ryzen 9700X processor and 32 GB of RAM. The algorithms are executed 5 000 times with randomized inputs to collect the program states, and a law is required to pass 10 000 test-cases to be accepted. 6.1 Binary Search For binary search from the introduction implemented in Listing 1, the loop invariants are 0 ≤ 𝑙𝑜 ≤ ℎ𝑖 ≤ |𝑎𝑟𝑟| and 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜..ℎ𝑖) = 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟, and QuickInv can find both of them. The grammar used is the full QuickInv Prelude defined in Section 5.1, and the constant −1 is added because it appear in the program. For this grammar, QuickInv will find 98 background laws in the background phase, and 22 loop invariant candidates was found in the invariant discovery phase. Four out of these 22 loop invariant candidates were required to verify the algorithm’s correctness, namely 0 ≤ lo, lo ≤ hi, hi ≤ length arr, and key ∈ slice arr lo hi = key ∈ arr. The invariant candidates key ∉ slice arr 0 lo and key ∉ slice arr hi (length arr) are also discovered, which can be used instead of key ∈ slice arr lo hi = key ∈ arr to verify the algorithm. For this example, QuickInv is configured to search for laws up to a term size of 7, which is the default for QuickInv. The term size is determined by the maximum number of functions, constants and variables on one side of the equal sign; for example, the law key ∈ slice arr lo hi = key ∈ arr has a term size of 6. A summary of the QuickInv results is presented in Table 1. 35 6. Evaluation on Example Algorithms Table 1 : QuickInv results for binary search. Grammar QuickInv Prelude -1 :: Int Loop Invariants Is Found Law(s) Found by QuickInv 0 ≤ 𝑙𝑜 ≤ ℎ𝑖 ≤ |𝑎𝑟𝑟| Yes 0 ≤ lo, lo ≤ hi, hi ≤ length arr 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[𝑙𝑜..ℎ𝑖) = 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟 Yes key ∈ slice arr lo hi = key ∈ arr Number of Background Laws 98 Number of Invariant Laws 22 Invariant Laws of Interest 4/22 Max Term Size 7 (default) Execution Time 38 seconds 6.2 Linear Search The second example algorithm is linear search, implemented in Listing 2. If the searched value 𝑘𝑒𝑦 does not appear in the array, then the special value −1 is returned. This special value is also added to the grammar. The example has the loop invariants 𝑖𝑑𝑥 ≠ −1 ⟹ 𝑘𝑒𝑦 = 𝑎𝑟𝑟[𝑖𝑑𝑥] and 𝑖𝑑𝑥 = −1 ⟹ 𝑘𝑒𝑦 ∉ 𝑎𝑟𝑟[0..𝑖). These invariants have a condition on whether idx is equal to −1 or not, and such conditional invariants can currently not be found by QuickInv. However, QuickInv could find an alternative invariant without an implication for 𝑖𝑑𝑥 = −1 ⟹ 𝑘𝑒𝑦 ∉ 𝑎𝑟𝑟[0..𝑖), which is 𝑘𝑒𝑦 ∈ 𝑎𝑟𝑟[0..𝑖) = 0 ≤ 𝑖𝑑𝑥. For linear search, five of six invariant candidates were required to verify the algorithm. The one that was not required was 𝑖𝑑𝑥 < 𝑖. A summary of the test results is presented in Table 2. 36 6. Evaluation on Example Algorithms Table 2 : QuickInv results for linear search. Grammar QuickInv Prelude -1 :: Int Loop Invariants Is Found Law(s) Found by QuickInv 0 ≤ 𝑖 ≤ |𝑎𝑟𝑟| Yes 0 ≤ i, i ≤ length arr −1 ≤ 𝑖𝑑𝑥 < |𝑎𝑟𝑟| Yes (-1) ≤ idx, idx < length arr 𝑖𝑑𝑥 ≠ −1 ⟹ 𝑘𝑒𝑦 = 𝑎𝑟𝑟[𝑖𝑑𝑥] No (inexpressible) 𝑖𝑑𝑥 = −1 ⟹ 𝑘𝑒𝑦 ∉ 𝑎𝑟𝑟[0..𝑖) Yes (alternative invariant) key ∈ slice arr 0 i = 0 ≤ idx Number of Background Laws 98 Number of Invariant Laws 6 Invariant Laws of Interest 5/6 Max Term Size 7 (default) Execution Time 34 seconds int linear_search(int* arr, int arr_len, int key) // post: -1 <= idx < arr_len // post: idx != -1 ==> key == arr[idx] // post: idx == -1 ==> not (key in arr) { int idx = -1; int i = 0; while (i < arr_len) // inv : 0 <= i <= arr_len // inv: -1 <= idx < arr_len // inv: idx != -1 ==> key == arr[idx] // inv: idx == -1 ==> not (key in arr[0..i)) { if (arr[i] == key) { idx = i; } i = i + 1; } return idx; } Listing 2 : Implementation of linear search. 6.3 Selection Sort For selection sort implemented in Listing 3, QuickInv finds all three loop invariants. In order to discover the invariant 𝑎𝑟𝑟[0..𝑖) ≤* 𝑎𝑟𝑟[𝑖..|𝑎𝑟𝑟|), we needed to increase the term size to 10, which resulted in QuickInv taking around over an hour to execute due to the exponentially larger search space. 37 6. Evaluation on Example Algorithms Table 3 : QuickInv results for selection sort. Grammar QuickInv Prelude Loop Invariants Is Found Law(s) Found by QuickInv 0 ≤ 𝑖 ≤ |𝑎𝑟𝑟| Yes 0 ≤ i, i ≤ length arr 𝑠𝑜𝑟𝑡𝑒𝑑 𝑎𝑟𝑟[0..𝑖) Yes sorted (slice arr 0 i) 𝑎𝑟𝑟[0..𝑖) ≤* 𝑎𝑟𝑟[𝑖..|𝑎𝑟𝑟|) Yes slice arr 0 i ≤* slice arr i (length arr) 𝑚𝑢𝑙𝑡𝑖𝑠𝑒𝑡(𝑎𝑟𝑟) = 𝑚𝑢𝑙𝑡𝑖𝑠𝑒𝑡(𝑜𝑙𝑑(𝑎𝑟𝑟)) Yes multiset old_arr = multiset arr Number of Background Laws 503 Number of Invariant Laws 16 Invariant Laws of Interest 5/16 Max Term Size 10 Execution Time 1 hour, 8 minutes and 28 seconds Our implementation of selection sort has a helper function find_min that finds where the minimum element is in the unsorted part of the array. To find the invariants for this helper function, the program states are captured by running the selection sort function and capturing the states in the helper function. It would also be possible to run the helper function directly. The QuickInv results for the helper function are shown in Table 4. Table 4 : QuickInv results for the helper function find_min of selection sort. Grammar QuickInv Prelude Loop Invariants Is Found Law(s) Found by QuickInv 𝑖 ≤ 𝑗 ≤ |𝑎𝑟𝑟| Yes i ≤ j, j ≤ length arr 𝑖 ≤ 𝑚 < |𝑎𝑟𝑟| Yes i ≤ m, m < length arr 𝑎𝑟𝑟[𝑚] ≤* 𝑎𝑟𝑟[𝑖..𝑗) Yes (arr !! m) ≤* slice arr i j Number of Background Laws 130 Number of Invariant Laws 44 Invariant Laws of Interest 4/44 Max Term Size 8 Execution Time 1 minute and 40 seconds 38 6. Evaluation on Example Algorithms void selection_sort(int* arr, int arr_len) // post: sorted arr // post: multiset(arr) == multiset(old(arr)) { int i = 0; while (i < arr_len) // inv: 0 <= i <= arr_len // inv: sorted arr[0..i) // inv: arr[0..i) <=* arr[i..arr_len) // inv: multiset(arr) == multiset(old(arr)) { int m = find_min(arr, arr_len, i); int t = arr[m]; arr[m] = arr[i]; arr[i] = t; i = i + 1; } } int find_min(int* arr, int arr_len, int i) // pre: 0 <= i < arr_len // post: i <= m < arr_len // post: arr[m] <= arr[i..arr_len) { int j = i; int m = i; while (j < arr_len) // inv: i <= j <= arr_len // inv: i <= m < arr_len // inv: arr[m] <=* arr[i..j) { if (arr[j] < arr[m]) { m = j; } j = j + 1; } return m; } Listing 3 : Implementation of selection sort. 6.4 Merge Sort For merge sort implemented in Listing 4, QuickInv finds 11 of 14 loop invariants. Of the three invariants that we cannot find, two of them have inexpressible implications and one is too large. The loop invariant 𝑚𝑢𝑙𝑡𝑖𝑠𝑒𝑡(𝑎𝑟𝑟[𝑝..(𝑝 + 𝑘))) = 𝑚𝑢𝑙𝑡𝑖𝑠𝑒𝑡(𝑢[0..𝑖)) ∪ 𝑚𝑢𝑙𝑡𝑖𝑠𝑒𝑡(𝑣[0..𝑗)) has a term size of 11, and it caused performance and memory issues when trying to discover it. When the maximum term size was increased to 11, QuickInv searched for background laws for over two hours and then terminated because it ran out of system memory. 39 6. Evaluation on Example Algorithms For this algorithm, we discover way too many invariant candidates, 756 laws and only a few are of interest. The amount of program variables and that addition and subtraction are in the grammar is one contributing factor to why there are so many laws found for this algorithm. The results are summarized in Table 5. Table 5 : QuickInv results for merge sort. Grammar QuickInv Prelude Arithmetic Extension Loop Invariants Is Found Law(s) Found by QuickInv 0 ≤ 𝑘 ≤ 𝑟 − 𝑝 Yes 0 ≤ k, k ≤ (r - p) 0 ≤ 𝑖 ≤ 𝑞 − 𝑝 Yes 0 ≤ i, i ≤ length u, q - p = length u 0 ≤ 𝑗 ≤ 𝑟 − 𝑞 Yes 0 ≤ j, j ≤ length v, r - q = length v 𝑘 = 𝑖 + 𝑗 Yes k = i + j 𝑠𝑜𝑟𝑡𝑒𝑑 𝑢 Yes sorted u 𝑠𝑜𝑟𝑡𝑒𝑑 𝑣 Yes sorted v 𝑠𝑜𝑟𝑡𝑒𝑑 𝑎𝑟𝑟[𝑝..(𝑝 + 𝑘)) Yes sorted (slice arr p (k + p)) 𝑖 < 𝑞 − 𝑝 ⟹ 𝑎𝑟𝑟[𝑝..(𝑝 + 𝑘)) ≤* 𝑢[𝑖] No (inexpressible) 𝑗 < 𝑟 − 𝑞 ⟹ 𝑎𝑟𝑟[𝑝..(𝑝 + 𝑘)) ≤* 𝑣[𝑗] No (inexpressible) 𝑎𝑟𝑟[0..𝑝) = 𝑜𝑙𝑑(𝑎𝑟𝑟[0..𝑝)) Yes slice old_arr 0 p = slice arr 0 p 𝑎𝑟𝑟[𝑟..|𝑎𝑟𝑟|) = 𝑜𝑙𝑑(𝑎𝑟𝑟[𝑟..|𝑎𝑟𝑟|)) Yes slice old_arr r (length arr) = slice arr r (length arr) 𝑚𝑢𝑙𝑡𝑖𝑠𝑒𝑡(𝑎𝑟𝑟[𝑝..(𝑝 + 𝑘))) = 𝑚𝑢𝑙𝑡𝑖𝑠𝑒𝑡(𝑢[0..𝑖)) ∪ 𝑚𝑢𝑙𝑡𝑖𝑠𝑒𝑡(𝑣[0..𝑗)) No (too large term) 𝑢 = 𝑜𝑙𝑑(𝑎𝑟𝑟[𝑝..𝑞)) Yes u = slice old_arr p q 𝑣 = 𝑜𝑙𝑑(𝑎𝑟𝑟[𝑞..𝑟)) Yes v = slice old_arr q r Number of Background Laws 118 Number of Invariant Laws 756 Invariant Laws of Interest 16/756 Max Term Size 7 (default) Execution Time 12 minutes and 2 seconds void merge(int* arr, int arr_len, int p, int q, int r) // pre: 0 <= p < q < r <= arr_len // pre: sorted arr[p..q) && sorted arr[q..r) // post: sorted arr[p..r) // post: multiset(arr[p..r)) == multiset(old(arr[p..r))) 40 6. Evaluation on Example Algorithms // post: arr[0..p) == old(arr[0..p)) // post: arr[r..arr_len) == old(arr[r..arr_len]) { int u[q - p]; copy(a, u, p, q); int v[r - q]; copy(a, v, q, r); int i = 0; int j = 0; int k = 0; while (k < r - p) // inv: 0 <= k <= r - p // inv: 0 <= i <= q - p // inv: 0 <= j <= r - q // inv: k == i + j // inv: sorted u // inv: sorted v // inv: sorted arr[p..p + k) // inv: i < q - p ==> arr[p..(p + k)) <=* u[i] // inv: j < r - q ==> arr[p..(p + k)) <=* v[j] // inv: arr[0..p) = old(arr[0..p)) // inv: arr[r..arr_len) = old(arr[r..arr_len)) // inv: multiset(arr[p..p + k)) == multiset(u[0..i)) // ++ multiset(v[0..j)) // inv: u == old(arr[p..q)) // inv: v == old(arr[q..r)) { if ((i < q - p && j < r - q && u[i] <= v[j]) || j == r - q) { arr[p + k] = u[i]; i = i + 1; } else { arr[p + k] = v[j]; j = j + 1; } k = k + 1; } } void merge_sort(int* arr, int arr_len, int p, int r) // pre: 0 <= p < r <= arr_len // post: sorted arr[p..r) // post: multiset(arr[p..r)) == multiset(old(arr[p..r))) // post: arr[0..p) == old(arr[0..p)) // post: arr[r..arr_len) == old(arr[r..arr_len]) { if (r - p > 1) { int q = (p + r) / 2; merge_sort(arr, arr_len, p, q); merge_sort(arr, arr_len, q, r); merge(arr, arr_len, p, q, r); } } 41 6. Evaluation on Example Algorithms int* copy(int* a, int* b, int p, int q) // pre: 0 <= p < q <= a_len // post: b == a[p..q) { for (int k = p; k < q; ++k) // inv: p <= k <= q // inv: a[p..k) == b[0..k - p) { b[k - p] = a[k]; } return b; } Listing 4 : Implementation of merge sort. 42 7 Related Work Loop invariant inference methods can be divided into two different categories: static and dynamic. Static methods infer invariants directly from the code by reasoning. Dynamic methods observe runtime behavior through execution traces to identify invariants. This thesis is concerned with dynamic loop invariant inference. Static methods include abstract interpretation [5], constraint solving [13], [14], predicate abstraction [15], and more recently constraint Horn clause solving [16], [17]. M. Karr [18] showed that it is possible to discover linear relationships between program variables of the form 𝑎1 × 𝑥1 + 𝑎2 × 𝑥2 + … + 𝑎𝑛 × 𝑥𝑛 + 𝑐 = 0 where 𝑥1, 𝑥2, …, 𝑥𝑛 are program variables and 𝑎1, 𝑎2, …, 𝑎𝑛, 𝑐 are constants. This was done by approximating program behavior by an abstract domain. Constraint-solving-based systems translate invariant inference to constraint solving and have been used to discover polynomial relationships between program variables [13]. Finally, predicate abstraction has been used to find classes of invariants by constructing Boolean formulas given a set of predicates [19]. It has been shown that program verification can be reduced to constraint Horn clauses solving [16] – the problem of finding an unknown formula (e.g. a loop invariant) that satisfies a set of constraints (e.g. the verification conditions in the while rule of Hoare logic). A few such Horn clause solvers have been developed recently [16], [17]. Static methods are powerful and can discover sound invariants, but their designs are often complex and often scales poorly with program complexity. Dynamic methods scale better due to their rather simplistic approach. A popular dynamic invariant inference system is Daikon [7]. Daikon collects execution traces, which are sequences of variable values recorded during the program’s execution. It then analyzes these traces to discover patterns or relationships that hold true. Daikon has been integrated with automatic test case generators to produce better execution traces to catch edge cases, eliminating many false positives [20]. Daikon considers a set of defined templates for kinds of invariants to search for. Some examples are constant (𝑥 = 𝑎), non-zero (𝑥 ≠ 0), being in a range (𝑎 ≤ 𝑥 ≤ 𝑏) and linear relationships (𝑦 = 𝑎𝑥 + 𝑏) among others. The user can also extend Daikon with more templates. It can, however, be hard to know what template might be needed without knowing the invariant we are searching for. 43 7. Related Work An algorithmic difference between Daikons approach and QuickInv is that Daikon is more efficient with how it uses the collected states data. Daikon starts off with the assumption that all invariant candidates are true. It iterates over recorded states, incrementally falsifying candidates. QuickInv, on the other hand, starts off with the assumption that all candidates (except t = t for all t) are false. It then samples thousands of states to test each candidate, unless falsified quickly. It is possible for QuickInv to miss a counter-example due to sampling – even if the data includes the counter-example, which can lead to false positives. An alternative to QuickSpec is Speculate [21], which is a theory exploration system inspired by QuickSpec. Speculate discovers universally quantified equations, and in addition, it can also discover conditional equations and inequalities as a post-processing step after equation discovery. Speculate lacks built-in support for observational equality, and is less optimized than QuickSpec. QuickSpec has been used in HipSpec [22] to discover lemmas for inductive theorem proving of recursive functions. It was also used for conjecture discovery in the interactive proof assistant Isabelle/HOL [23]. QuickSpec was extended to support guided theory exploration using user-defined templates in RoughSpec [24], to allow searching for larger terms with templates. QuickSpec added support for reasoning with conditional equa- tions towards the end of this thesis via an encoding [25]. It enabled adding conditional laws manually to the pruner, however, QuickSpec still has the same limited support for automatically discovering conditional laws. 44 8 Discussion and Conclusion This thesis presented QuickInv, a system for dynamic program invariant discovery built upon the theory exploration system QuickSpec. Our central aim was to leverage theory exploration to discover high-level, easy-to-understand invariants from program execution traces, particularly for array programs. We reframed invariant discovery as a theory exploration problem, developed a high-level grammar in Haskell for expression such invariants, and implemented the QuickInv system. We captured program states by executing C programs and employed a two-phase QuickSpec process to distinguish general mathematical truths from program-specific properties. The evaluation was on the binary search, linear search, selection sort, and merge sort algorithms demonstrated QuickInv’s ability to rediscover many essential loop invariants through systematic exploration of possible invariants up to a certain term size. The thesis sought to address several research questions. Regarding a suitable grammar for high-level invariants in array programs, we listed a set of functions and constants in a prelude and additionally some arithmetic functions required for some of the algorithms. The performance of the system depends highly on what is included in the grammar, and the number of functions and constants. The addition and subtraction operators lead to significantly worse performance and more discovered laws – many of them spurious. QuickSpecs effeciency has enabled systematic exploration of large search spaces, however, exploring larger terms becomes infeasible quite quickly beyond term size 7 due to the exponential growth of the search space. The grammar is able to express the invariants needed for the example algorithms; however, there is an additional requirement for the grammar to fulfill: the grammar needs to have nice properties that are simple and discoverable automatically by QuickSpec. Due to the limitations of conditional support in QuickSpec, laws about partial functions such as array slicing and indexing cannot be discovered automatically. Total functions have simpler, often non-conditional properties that can be discovered automatically. One solution is to manually add background properties as axioms to the QuickSpec pruner, which allows reasoning about the properties that otherwise would not be discovered by QuickSpec. This, however, is laborious and requires domain knowledge and is problematic for a user-configurable grammar. 45 8. Discussion and Conclusion Concerning the effectiveness of a dynamic, theory-exploration-based approach, QuickInv demonstrated encouraging potential. It successfully rediscovered many loop invariants for the example algorithms, while struggling with invariants with large term sizes and conditional invariants. The system’s systematic exploration is a significant advantage over template-based approaches like Daikon, which can miss important invariants due to their reliance on predefined templates. The system’s ability to leverage reasoning and pruning of redundant laws, informed by background theory discovered in its first phase, reduces the number of reported laws following from reasoning. There are, however, still many laws that the simple reasoning based on rewriting up to a certain term size cannot prune away, and the possibility of discovering false laws that can only be falsified by certain corner cases – an issue with all dynamic invariant discovery systems – remains a challenge. The main limitations of QuickInv are the inability to discover conditional invariants and background laws, and to handle the large search space of invariants more efficiently to allow for discovery of larger term sizes. Addressing these limitations naturally points towards several key directions for future work. A primary focus is enhancing the discovery of conditional invariants. One approach within QuickInv could involve a pre-filtering step on the collected program states, allowing the system to test potential consequent properties only on states satis- fying a user-defined antecedent, thereby enabling the inference of specific 𝑃 ⇒ 𝑄 style invariants. Concurrently, advancing QuickSpec’s support for more intricate conditional laws would be beneficial, particu