A GPU Polyhedral Discrete Element Method
Formulation and implementation of large scale simulations for irregular non-convex particles using novel GPU techniques

Master's thesis in Complex Adaptive Systems

ADAM BILOCK

© ADAM BILOCK, 2020.

Supervisor: Klas Jareteg, Fraunhofer-Chalmers Research Centre
Examiner: Anders Logg, Department of Mathematical Sciences

Master's Thesis 2020
Department of Mathematical Sciences
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: One million Schönhardt polyhedral particles simulated with the presented method.

Typeset in LaTeX, template by David Frisk
Printed by Chalmers Reproservice
Gothenburg, Sweden 2020

Abstract

This thesis presents a Discrete Element Method (DEM) for simulating irregular shaped particles by a non-convex polyhedron representation. By using novel GPU techniques and an efficient HPC implementation, the presented method shows a level of throughput not previously attained with polyhedron particle representations in the open literature. Further, via such a representation the exact volumetric overlaps of the particles are resolved and, as a result, the method is robust and numerically stable with respect to geometric changes. The efficient and well-behaved method allows for significant progress in the study of granular materials, where previously mainly the inadequate particle representations of spheres or clumped spheres have been used.

The exact volumetric overlaps are resolved by a simplex representation which allows for the use of non-convex particles without any decomposition, aiding both the performance and the ease of use of the method. Further, care is given to attain efficient scaling of the method with respect to particle resolution. Such a property enables studies of higher resolution particles than previously shown in related work, and is a result of efficient filtering of polyhedron triangles in the narrow contact phase. In addition, other novel techniques, such as a GPU BVH implementation for the broad phase contact detection, also aid the performance and the flexibility of the proposed and implemented method.

The method is shown to be convergent with respect to particle resolution, both for individual particle collisions and for laboratory scale particle systems. The HPC implementation is proven to be highly efficient, where, for instance, a one second simulation of one million non-convex particles is completed within an hour on a single GPU. By the effective filtering of triangles in the narrow contact phase, near linear scaling can be achieved with regard to particle resolution.
Keywords: Discrete Element Method, Polyhedral intersection, GPU, HPC, Particulate systems, Non-convex particles.

Acknowledgements

Foremost I want to acknowledge my supervisor Klas Jareteg for always taking time for discussions. Your input on everything from physics and algorithms to report writing has been highly valuable and appreciated during this thesis work. I also want to acknowledge Johannes Quist, the project lead of the DEM research group at FCC, whose insights into DEM modelling have been of great value. The DEM research team at FCC has in general been of great support; I am excited to continue working with all of you. Further, I want to thank my examiner Anders Logg for giving valuable feedback and spending your time on this thesis. I want to acknowledge FCC in general, and in particular department head Fredrik Edelvik, for letting me do my thesis on an interesting and relevant topic. I also want to thank FCC for the great technical support which has enabled me to perform my thesis work uninterrupted even through the current exceptional circumstances. Finally, marking the end of my 5 years of studies, I want to thank my family and friends for supporting me throughout these years.

This work was carried out within the framework of the Centre for Additive Manufacturing - Metal (CAM2), supported in part by the InfraSweden2030 project DigiRoad, and funded in part by the Swedish Government Agency for Innovation Systems, VINNOVA.

Adam Bilock, Gothenburg, June 2020

Contents

Abbreviations
List of Figures
List of Tables
1 Introduction
  1.1 Project scope limitations and aim
  1.2 Research questions
  1.3 Outline of thesis
2 General-Purpose Computing on Graphics Processing Units (GPGPU)
  2.1 Hardware architecture
  2.2 Programming model
  2.3 Implementation details
3 Overview of the Discrete Element Method
  3.1 Limitations
  3.2 Discrete element representations
  3.3 World geometries
  3.4 Integration
4 Contact Forces for Irregular Shaped Particles
  4.1 HM+D model for spheres
  4.2 HM+D model for irregular shaped particles
5 Polyhedral DEM Contact Detection
  5.1 Polyhedron representations
  5.2 Intersection nodes
  5.3 Intersection mass properties
  5.4 Multiple contact error
  5.5 Broad phase contact detection
  5.6 World geometry contact detection
6 Physical Verification and Convergence of Method
  6.1 Verification and convergence for spherical polyhedrons
  6.2 Verification and convergence for irregular shaped particles
  6.3 Verification and convergence on a laboratory scale
7 Performance
  7.1 Overall performance - gravity packing simulation
  7.2 Scaling with particle resolution
  7.3 Memory consumption
8 Conclusion
  8.1 Future work
Bibliography
A Simulation parameters
  A.1 Calibration rig
  A.2 Gravity packing

Abbreviations

BFS Breadth First Search
BVH Bounding Volume Hierarchy
CUDA Compute Unified Device Architecture
DEM Discrete Element Method
DFS Depth First Search
FMA Fused Multiply Add
FPGA Field Programmable Gate Array
GPGPU General Purpose computing on Graphics Processing Units
GPU Graphics Processing Unit
HM+D Hertz-Mindlin-Deresiewicz
ILP Instruction Level Parallelism
IPS Industrial Path Solutions
NDEM Nonsmooth Discrete Element Method
SM Streaming Multiprocessor
SP Streaming Processor

List of Figures

2.1 The Nvidia Volta streaming multiprocessor
3.1 Control flow of the discrete element method
3.2 The three most common particle representations in DEM
3.3 Overview of common particle representations in DEM
3.4 Illustration of the fallacy of multisphere DEM
5.1 Collision of two polyhedral particles
5.2 Polyhedral intersection nodes
5.3 Filtering of polyhedral particle triangles
5.4 Polyhedral intersection volume computation with cusp model
5.5 A bounding volume hierarchy
6.1 Spherical polyhedrons used in this study
6.2 Average kinetic energy response for spherical particles
6.3 Standard deviation in kinetic energy response for spherical particles
6.4 Convergence of kinetic energy response for spheres
6.5 Irregular shaped polyhedrons used in this study
6.6 The average normalized kinetic energy response for an irregular shaped particle
6.7 The variance in the normalized kinetic energy for an irregular shaped particle
6.8 Experimental calibration rig
6.9 Height map from rig simulation
6.10 Convergence of calibration rig simulation
7.1 The gravity packing simulation
7.2 The Schönhardt polyhedron used in this study
7.3 Simulation time of solvers for gravity packing
7.4 Number of collision pairs during gravity packing simulation
7.5 Scaling of solvers against number of pairs
7.6 Scaling of solver for irregular shaped particles
7.7 Scaling for calibration rig simulation

List of Tables

7.1 Memory consumption for gravity packing
A.1 Parameter values used for the calibration rig simulations without tangential spring
A.2 Parameter values used for the calibration rig simulations with tangential spring
A.3 Parameter values used for the gravity packing simulations

1 Introduction

There is a large demand for understanding the nature of granular materials from both science and industry. An increasingly popular method to study these systems is the Discrete Element Method (DEM). By simulating the particles as distinct elements, DEM can achieve a level of detail and generality not present in alternative methods for granular materials. This is opposed to continuum methods, where the system is instead treated as a continuous material. DEM first originated in 1979 from implementations by Cundall and Strack [1], but due to the computational burden of resolving each distinct particle it is only in recent years that the method has become commonly adopted.

To lower the computational cost, a common approach in DEM is to approximate all particles as spheres [2]. However, it is evident that spheres cannot represent the often varying particle shapes present in granular materials. The most popular remedy for this problem is to use compounds of spheres (from here on referred to as multispheres) [3]. The simplicity of contact detection for spheres is preserved with multispheres and, in addition, the ability to capture the shape of granular material is improved in comparison to spheres [4]. However, a deficiency of the multisphere approach is the lack of representation of sharp edges, often present in real particles. More importantly, all the force models in use for multispheres in the open literature are fundamentally flawed. This is because the exact intersection of two multispheres cannot be efficiently resolved and, consequently, the acting force on the particles is naively accumulated from all compounded spheres [5]. This deficiency of the force model for multispheres results in non-converging behaviour of the method, where on one hand more compound spheres are needed for an accurate shape representation and on the other hand more compound spheres make the force model ever more inaccurate [6].

Hence, to attain an accurate representation of granular material, a better model of the particles is needed. One candidate is polyhedrons, which by their flexible representation can be made to accurately model most realistic shapes. Furthermore, such a formulation can also, in contrast to purely algebraic models, capture the sharp edges of e.g. rock material often studied with DEM. However, there is a significant computational burden to polyhedral DEM.
To lower the computational burden, some early efforts [7] simplified the problem by using the indentation depth in a feature-by-feature manner, meaning that the indentation depth is computed for each face-face, edge-face and vertex-face contact of the polyhedrons. With this approach the same issue as with the multispheres appears, i.e. the force model is not decoupled from the geometric modelling [6], and the only advantage over the multispheres is the more flexible geometry of polyhedrons. However, the problem of a non-converging method can be resolved with polyhedrons, namely by ensuring that the exact volumetric overlap of the particles is computed. In "Polyhedral particles for the discrete element method" by Nassauer et al. [8] this approach was explored by presenting a general framework for the method. This was followed by another article [9], where a more advanced force model was derived from the Hertz contact law and shown to agree well with FEM simulations. However, only systems of at most 500 particles were studied. In the work of Govender et al. [10, 11, 12, 13] it was shown how a GPU polyhedral DEM utilizing volumetric overlaps can be effectively applied to large scale industrial usage. In their work, for instance, a one second simulation of one million non-convex particles was simulated overnight on a single desktop computer.

Besides the open literature, efforts in the realm of polyhedral DEM have also been made in commercially available DEM software, most notably the software Rocky DEM, which supports multi-GPU simulations and a large set of features. However, their work is mostly not relevant for this thesis, since neither their computational method nor other demonstrations of their accuracy are published, and it is thus impossible to make any comparisons. Nevertheless, according to Govender et al. [14], Rocky DEM neither resolves the exact volumetric overlaps nor attains the same level of throughput as Govender et al. have displayed, but no further details were given about the simulation case, and thus the results are difficult to relate to in a scientific context. Recently Becker 3D introduced a polyhedral DEM GPU solver, but again the absence of reports on their method, accuracy or other capabilities likewise makes a direct comparison impossible.

Since 2016 the Fraunhofer-Chalmers Centre (FCC) has developed a DEM framework. The current framework consists of a state-of-the-art GPU implementation of DEM for spherical particles and a CPU implementation of multisphere DEM. The spherical GPU solver currently allows for more than 50 million polydisperse spherical particles. Furthermore, the performance of the CPU implementation of the multisphere solver has proven to be a limiting factor for industrial scale studies of e.g. infrastructure and rock material handling simulations. In addition, the aforementioned issues [6] regarding the force models of multisphere DEM have also been exposed in research work conducted by the FCC DEM group. As a consequence of the performance and model limitations, the FCC DEM group has a need for a performant GPU DEM solver that can represent non-spherical particles, and also a need to evaluate the polyhedral representation of particles. With the developed competence in HPC based on GPUs, a polyhedron DEM framework is a feasible next step in state-of-the-art software for particulate systems.

1.1 Project scope limitations and aim

The project work was conducted at FCC in the DEM research group.
The purpose of the thesis work was to implement and verify a state-of-the-art polyhedron DEM GPU code with performance equivalent to or better than what has been reported in the literature. To achieve this goal, the project work was delimited in some aspects:

• The project was implemented directly in the existing DEM framework, minimizing the efforts required for setting up the software infrastructure.
• The polyhedron particles were provided at the start of the work and no further studies were conducted on the triangulation process of the model particles themselves.
• The force model was not the primary concern of the thesis and tentatively models from the literature were to be used.

1.2 Research questions

In this work several research questions revolving around the algorithmic aspects are to be answered:

• What is the current state of the art in the open literature with regard to DEM simulations of polyhedron particles?
• Compared to the spherical GPU solver at FCC: what are the performance characteristics of the polyhedron code?
• Can a linear scaling with the number of particles be achieved on a state-of-the-art GPU?
• How does the computational time of the polyhedral code scale with the resolution of the triangulation of the particles?

Furthermore, research questions regarding the use of the polyhedral particle representation in DEM were formulated:

• Can convergence in the physical behaviour of the particles be shown with regard to the particle resolution?
• How finely resolved do the particles need to be to achieve convergence in terms of their physical behaviour?
• Can the method capture the range of repose angles found in experiments in a calibration rig?
• Finally, and more generally: what are the benefits and disadvantages of a polyhedral particle representation?

1.3 Outline of thesis

Section 2 introduces GPU programming, the implications of which will determine many later decisions in the used methods. Section 3 gives an overview of DEM. The following sections then go into the details of the required components of polyhedral DEM. First the force models are introduced in section 4, followed by the contact detection detailed in section 5. The remaining sections analyze the method: first the method is verified in section 6, whereafter the performance of the solver is analyzed in section 7. Finally the thesis is summarized in section 8.

2 General-Purpose Computing on Graphics Processing Units (GPGPU)

With the advent of the power wall [15], meaning that the clock rate of single core architectures could not be further increased due to thermal constraints, progress in computational power has shifted towards parallel computation on a single chip. At the forefront of this shift lies general-purpose computing on graphics processing units (GPGPU). With the initial purpose of rendering 3D graphics, the hardware in GPUs has, due to the parallel nature of rendering, been developed and optimized for highly parallel computation. The underlying computational units of the GPU follow the stream processing paradigm, which by its constrained computational model can attain a much higher throughput per watt than the traditional CPU architecture. Some aspects of both the hardware architecture and the corresponding SIMT (Single Instruction Multiple Threads) programming model are important for the algorithmic choices of this thesis and, thus, such aspects are carefully described in this section.
2.1 Hardware architecture

The GPU is a so called co-processor and hence its execution is entirely separated from the execution of the host (meaning the CPU). In practice this means that the GPU, in a broad perspective, requires largely the same elements as the CPU: it has its own global memory, it has a similar cache hierarchy, and the execution of the underlying computational units is completely organized by the GPU itself. While the hierarchy of the GPU is similar to that of the CPU, the components of this hierarchy have largely been altered to accommodate the massive parallelism on offer [15].

Starting at the lowest level of the architecture, the underlying computational units of the GPU follow the aforementioned stream processing paradigm (these computational units are often referred to as Streaming Processors, SPs). In the Nvidia chips an SP is represented by 32 floating point units and 32 integer units. By strictly following the stream processing paradigm, all 32 units must execute the same instruction. Through this restriction, a much higher number of computational units can reside on a single chip as opposed to a CPU. The impact of this limitation on the programming model will be further considered later.

Each SP merely consists of the actual arithmetic logic units (ALUs); to actually execute instructions and interact with memory, each SP resides within a Streaming Multiprocessor (SM). These units organize the computation of the SPs, where each SM consists of many SPs. Figure 2.1 shows the SM of the Nvidia Volta architecture [16]. In this architecture the different ALUs of the SP have been separated such that each SP consists of one FP32 unit, one integer unit and one FP64 unit. Further, each SM has its own L1 cache, for both instructions and data, and importantly it also has access to the so called shared memory. The shared memory has access times comparable to the L1 cache, but can be manually controlled by the programmer. Moreover, in the Volta architecture [16], and also the succeeding Turing architecture [17], each SM has access to 8 tensor cores. These units can execute 4×4 matrix FMA (Fused Multiply Add) operations, albeit only at half precision (meaning a 16-bit representation). Finally, each GPU consists of many SMs (e.g. the Volta architecture has 80 SMs per GPU), and at this level there is also an L2 cache which interacts with the global memory.

Figure 2.1: The Nvidia Volta streaming multiprocessor (SM). Most hardware required for computation resides within the SM. Note that each marked computational unit in the figure, e.g. the marked FP64 units, is here a streaming processor and consists of 32 ALUs in total. Also noteworthy are the tensor cores, which take up nearly half of the space for computational units and, by their restricted execution of half-precision 4×4 matrix FMA (Fused Multiply Add) operations, cannot be readily utilized for many applications. [16]

An important limitation of the GPU is that the synchronization of the SMs is not especially effective; often the execution of the different SMs is instead synchronized at the host, and the usage of, for example, global atomics should be avoided.
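As a minimal, hypothetical illustration of this lockstep execution (not taken from the thesis solver): in the first kernel below, the threads of a warp whose inputs fall on different sides of the branch are serialized over both paths; the second, branchless formulation performs the same computation with uniform control flow.

__global__ void clampedSqrtDivergent(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (x[i] > 0.0f)          // data-dependent branch: the warp may diverge here
        y[i] = sqrtf(x[i]);
    else
        y[i] = 0.0f;
}

__global__ void clampedSqrtUniform(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    y[i] = sqrtf(fmaxf(x[i], 0.0f));  // same result, no divergent branch
}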
A note regarding the GPU hardware is that, in comparison with the CPU, the amount of memory (throughout the hierarchy) available per thread is significantly lower, which in practice means that significant care is required to ensure that consecutive threads access memory linearly. Finally, since the GPU is a co-processor, care must be taken to avoid transferring too much data between the GPU and the host, which for the implementation of this thesis is avoided by having all key computational steps performed by the GPU.

2.2 Programming model

In this work the CUDA programming model [18] is used as an extension of the programming language C++. Some aspects of this programming model are of importance for the algorithmic implementations. The basic model of CUDA consists of a host program that launches GPU kernels. A kernel consists of many threads executing the same program. The host execution primarily serves as a global synchronization point for the GPU and manages the control flow of the application. This role of the host is partly induced by the lack of effective global synchronization between the different SMs, as previously discussed, but also by the fact that most control flow tasks are single threaded and thus more suitable for the CPU.

So far the programming model follows the stream processing paradigm. However, CUDA also exposes a lower granularity of the thread organization which maps to the aforementioned hardware hierarchy. Foremost, the threads are divided into blocks. Each thread block can only be executed on a single SM, where the threads in a block can be synchronized within the kernel and all have access to the same shared memory. Utilizing the shared memory is essential to attain effective kernels; however, it is severely limited in size (between 48-96 kB [16, 17]). Furthermore, each thread block consists of several thread warps, where a thread warp resides in a single SP. The threads in a warp have faster synchronization and can to a certain extent communicate by and share registers. A significant limitation of the programming model is that all threads in a warp need to execute the same instruction; this is imposed by the underlying computational units following the stream processing paradigm. The performance can thus degrade significantly if the execution of the threads in a single warp diverges. This poses a significant challenge for the programmer to ensure that the control flow within the kernels minimizes the number of branches. In practice this means that simpler, more straightforward algorithms should often be preferred over what would typically be the most efficient solution on the CPU [15]. It should be mentioned that this limitation of thread divergence has been slightly relaxed with the newer architectures, Volta [16] and Turing [17], but diverging control flow should still be minimized.

2.3 Implementation details

In the implementation of the methods presented in this thesis, the impact of low level technical details on the GPU has been shown to give significant changes in performance. Due to the large scope of the thesis, not all such details can be included; rather, this section will highlight some key programming patterns which have enabled increases in performance.

A recurring pattern is to always minimize the data usage. This is two-fold: it is important both to minimize the actual data footprint and to reduce the data flow of the application. This affects both the choice of methods and implementation details such as effectively packing the data, using custom memory allocators and ensuring that the data gets properly cached in the memory hierarchy during computations. This could at times be taken to the extreme, where it was beneficial to reorder data before key computational kernels in order to improve the cache hit rate during the actual computation. Similarly, large performance benefits could also be achieved by reordering data to minimize the aforementioned thread divergence.
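A hypothetical sketch of two of the recurring themes above (the names are illustrative, not the thesis data layout): a structure-of-arrays layout, so that consecutive threads load consecutive elements in one coalesced transaction, and a small index into a few shared particle models instead of per-particle geometry, keeping the data footprint low.

#include <cstdint>

struct ParticleStateSoA {
    float* px; float* py; float* pz;   // positions, one array per component
    float* vx; float* vy; float* vz;   // velocities
    std::uint16_t* model;              // index into the few shared particle models
};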
The generated code from the compiler is often non-optimal for CUDA [19, 20]; this is a result both of choices in the design of the C++ programming language and, in some cases, of limited compiler support for CUDA. Thus, for many of the key computational kernels in the implementation, the code was analyzed at the assembly instruction level. Such an analysis often disclosed how small, almost trivial, changes to the C++ code could lead to significant changes in the generated machine code and ultimately in the performance of the solver. One such example is aliasing [21], where the C++ language does not guarantee that two pointers of the same type do not point to the same memory location. If aliasing is not remedied it can in certain instances lead to significantly higher memory throughput usage, since there is no guarantee that the underlying data has not changed, and thus data has to be reloaded from global memory at each usage. Another example is that small changes in the C++ code could hinder the compiler from vectorizing loops, which results in worse instruction level parallelism (ILP), in turn also significantly degrading the performance.
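An illustrative kernel for the aliasing issue just described: with plain pointers the compiler must assume that writing y[i] may change x[i], forcing reloads from memory, whereas the standard CUDA __restrict__ qualifier promises non-overlapping arrays and lets values stay in registers. This is a generic example, not a kernel from the thesis code.

__global__ void axpy(const float* __restrict__ x,
                     float* __restrict__ y, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}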
To quickly iterate the fine tuning of the low level details, most key computational kernels were micro-benchmarked, meaning that the performance of single kernels was measured and the process automated in the build system. Moreover, profiling tools such as NVIDIA Nsight Compute/Systems [22, 23] were also frequently used, which allowed for profiling both at a system level (Nsight Systems) and at an assembly instruction level (Nsight Compute).

3 Overview of the Discrete Element Method

In DEM, the particles are simulated as rigid bodies where the state of each particle is individually tracked and evolves over time due to various interactions with e.g. other particles or materials. The dynamics of the particles are determined by Newton's second law, namely

m_i d²x_i/dt² = F_i,    I_i d²Θ_i/dt² = T_i    (3.1)

where m_i is the mass, x_i is the position of the centre of mass, F_i is the acting force, I_i is the inertia, Θ_i is the orientation and T_i is the acting torque, respectively, of particle i. Commonly, eq. 3.1 is integrated by explicit methods. However, implicit versions of DEM also exist, often referred to as nonsmooth DEM (NDEM) [24], which correct for the fact that the forces and torques in eq. 3.1 cannot be expressed as a direct mapping from the particle state in the previous time step. NDEM can for certain applications compensate for the increased computational burden induced by an implicit method by allowing for the use of larger time steps. However, for most applications explicit formulations are preferred, since then eq. 3.1 can be solved for each particle independently, whereas for the implicit counterpart the resulting matrix systems from eq. 3.1 are significantly more involved to solve. Hence, in the remainder of this work only explicit DEM is considered.

Figure 3.1: Overview of the control flow of the discrete element method (Input → Contact detection → Contact forces → State update → Post-Process). Given input data the method iterates each time step by first resolving the contacts of the particles, then evaluating the contact forces, and finally updating the state of the particles according to eq. 3.1.

The control flow of a time step in an explicit formulation of DEM can be seen in figure 3.1. Here, contact detection refers to finding the interacting particles and, if present, also the interactions of the particles with other bodies (e.g. static world geometries). Note that each particle contact pair is resolved independently. The contact detection is typically performed in several phases: a broad phase, a narrow phase and finally resolving the geometric overlaps. The broad phase contact detection often relies on bounding volumes of the particles (e.g. axis-aligned boxes) and aims to efficiently, but approximately, filter out all potential pairs. For the case of complex particle shapes, a narrow phase contact detection is then often performed, which further filters the potential pairs and in some cases also the individual components of the complex shapes (e.g. triangles or spheres). For simple spheres the narrow phase is often not required. Finally, the actual overlap of the particles is resolved; the exact details of this most detailed phase depend on the particle representation and the force model used. The contact detection methods of this work are further discussed in section 5.

With the contacts resolved, the next step consists of computing the interacting forces to give the right hand side of eq. 3.1. In this work only interactions resulting from direct geometric contact are considered, i.e. the particles have no long-range forces. The formulation of the contact forces used in this work can be seen in section 4. Finally, the state of the particles is updated by integrating eq. 3.1, which is further detailed in section 3.4. The simulation is then advanced by iterating over the time steps.

3.1 Limitations

With the explicit DEM formulation there are certain key assumptions and limitations to consider. One of the main assumptions is that the particle geometry is assumed to be rigid, meaning that there is no deformation or plasticity in the particles. Still, the rigid geometries of the particles are allowed to overlap, but this is only valid if the overlaps of the particles are small enough that effects from deformation or other plasticity are small. This limitation has the negative effect of restricting the method to small time steps, such that these small overlaps are attained at each step [2].

3.2 Discrete element representations

As discussed in the introduction (see section 1) there are several different representations of the discrete elements in DEM. The most common shape is the simple sphere, largely due to its computational advantages. However, as many studies have shown, the effect of the shape of the particles on the systems can be profound [12, 4]. Hence, several methods to represent irregularly shaped particles have been developed. The most common method is the multisphere representation, where each particle is represented by several clumped spheres. Moreover, there are several attempts at different analytical representations, e.g. ellipsoids. Further, and most relevant for this thesis, there are the polyhedral representations.
Figure 3.2: The three most common particle representations in DEM: the simple sphere, multispheres and polyhedrons (left to right).

Figure 3.2 shows the three most common particle representations in DEM: a simple sphere, a multisphere and a polyhedron. Note that for all approaches there are typically only a few particle models per population, where often only the scale of each model can be changed for each particle; this is due to the high memory requirements of having unique models for each particle. Figure 3.3 gives an overview of the associated computational cost and physical fidelity of the different particle representations.

Figure 3.3: Overview of the common particle representations in DEM, placed along axes of physical fidelity and computational cost. From left (bottom) to right (top): simple sphere, multisphere, analytical and polyhedral particle representations.

Spheres have commonly been used in DEM, largely due to the computational advantage of such a simple shape. However, it is evident that spheres cannot represent the irregular shaped particles often found in granular materials. While this can to some extent be compensated for by using non-uniform rolling friction of the spheres [4], this is accompanied by significant tuning of the parameters, which significantly reduces the usability, and ultimately the validity, of the method. The scale of the current state-of-the-art spherical DEM solvers is in the order of tens of millions of polydisperse spheres [25]. It has also been displayed in the literature how the usage of GPUs can significantly speed up the simulation time, where speed-up ratios of 60-100 times over single threaded CPU implementations have been shown [25].

Multispheres consist of several clumped sub-spheres, as displayed in figure 3.2. The sub-spheres of one multisphere typically remain fixed in the particle body frame. The ability to model irregularly shaped particles is improved in comparison to spheres, and it has been shown how the clumped representation can to some extent give predictive results with respect to particle shape [4]. The ease of contact detection also remains with multispheres, where only slight modifications are required compared to spherical DEM. However, a significant issue arises from its geometric modelling in the force model: the contact detection of each sub-sphere is completely independent of the other sub-spheres in the particle, meaning that each sub-sphere pair of one particle contact is modelled as a separate contact force. This issue is illustrated in figure 3.4, where one particle contact is represented by 8 different sphere-sphere contacts; all these 8 sphere-sphere contacts will naively be accumulated and give a significantly higher contact force compared to if the actual overlap volume could be resolved. It has been shown in the literature how this severely distorts the contact forces for multisphere particles [5]. Now, since the only approach to improve the geometric modelling of the multispheres is to increase the number of sub-spheres, this leads to a non-converging method with respect to particle resolution [6].
This issue can be illustrated by, for instance, removing the two smallest spheres in figure 3.4: the resulting contact force of the collision in figure 3.4 would then be significantly lower.

Polyhedrons have largely been avoided in DEM due to the high computational demand of resolving the contacts. To reduce the computational cost, the first efforts with regard to polyhedral DEM computed contact forces in a feature-by-feature manner, meaning that each face-face, face-edge and edge-edge contact was independently resolved. Notable work utilizing this approach is found in Govender et al. [7, 26], where primarily the preeminent advantage of utilizing the GPU was also shown for polyhedral DEM, with millions of particles simulated on a single GPU. However, this contact detection scheme, similarly to the multisphere approach, can severely distort the contact forces, and again the force model is not decoupled from the geometric modelling [6]. This problem can be avoided for polyhedrons by instead resolving the exact volumetric overlaps of the particles. Notable work using this contact detection scheme is seen in Nassauer et al. [8], where a general framework for the method was developed, and in [9], where a more advanced force model was derived which attempts to take the contact curvature into account, resulting in good agreement with FEM simulations. However, in [8, 9] the system sizes are restricted to 500 particles and the solver was not adapted for non-convex particles. Instead, Govender et al. have in recent years [11, 12, 13] adapted the initial solver from [7, 26] to resolve the exact volumetric overlaps of the particles. In their work, non-convex particles could also be simulated by decomposing the particles into convex parts. Govender et al. can achieve this physical fidelity while still being capable of simulating large system sizes; for instance, a one second simulation of one million non-convex particles could be simulated overnight. It is noteworthy that in the work of Govender et al. a slightly simpler elastic force model was used than the one proposed by Nassauer et al., which does not correct the force model for the contact curvature to the same extent.

Figure 3.4: Collision of two objects. The actual objects are represented by the dotted lines; the spheres with the same colour as the lines are part of the multisphere representation of that object. For this case, the multisphere representation would give a very strong contact force, since all individual sphere-sphere contacts between the blue and red spheres will be counted as individual forces.

3.3 World geometries

For DEM to be usable, the particles also need to interact with a surrounding environment. Rigid world geometries are the most common such environment. Several different methods exist to model the world geometries: purely as analytical surfaces (e.g. planes and cylinders), as several rigid clumped particles, or as triangle meshes. The former alternative, with purely analytical surfaces, can be the most straightforward when only a few different world geometries are considered, but this approach also severely restricts the usage of the method. In contrast, more general approaches such as interactions with CAD surfaces are significantly more involved.
The second approach, rigid clumped particles, is a popular method for polyhedral DEM [8, 9], since the contact detection for the world geometries and the particle interactions are then completely equivalent. For certain instances the conversion from triangle mesh to clumped polyhedron particles can be straightforward; however, for complicated surfaces the conversion can be much more complicated, and hence this option is not especially user friendly. Only recently has the case of world geometries represented by triangle meshes been considered in the literature on polyhedral DEM [13]. Using a triangle mesh to represent the world geometry is the most flexible approach, since such meshes can both be created from point clouds of scanned objects and be generated as computational meshes from analytical surfaces. For the case of polyhedral DEM, the contact forces between particle and triangle mesh can remain the same as for particle-particle interactions, see section 4. Further, only slight changes are required for the contact detection in comparison with the particle interactions, which is further discussed in section 5.6.

3.4 Integration

In this work, explicit integration is used for solving the ODE system of eq. 3.1, which as previously mentioned is the standard approach in DEM. A common method for the translational integration, first proposed by Cundall and Strack in 1979 [1], is the Velocity Verlet algorithm. The translational integration then proceeds as follows:

1. Update velocity: v_i(t + ½Δt) = v_i(t) + ½ a_i(t) Δt
2. Update position: x_i(t + Δt) = x_i(t) + v_i(t + ½Δt) Δt
3. Compute force f_i based on x_i(t + Δt) and v_i(t + ½Δt); update acceleration: a_i(t + Δt) = f_i / m_i
4. Update velocity: v_i(t + Δt) = v_i(t + ½Δt) + ½ a_i(t + Δt) Δt

where i refers to particle i = 1, ..., n, as sketched in code below. Note that the force computation is performed at positions and velocities that are a half-step apart, which for the viscous forces of DEM is deemed acceptable through empirical evidence [27]. For the orientation, explicit forward Euler integration is used.
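A minimal, single-particle sketch of the translational Velocity Verlet step above (one Cartesian component only); the force callback is a hypothetical stand-in for the contact models of section 4, and in the actual solver this update is of course performed per particle in parallel on the GPU.

void velocity_verlet_step(double& x, double& v, double& a, double m,
                          double dt, double (*force)(double x, double v)) {
    v += 0.5 * a * dt;      // step 1: v(t + dt/2)
    x += v * dt;            // step 2: x(t + dt)
    a = force(x, v) / m;    // step 3: force at x(t + dt) and v(t + dt/2)
    v += 0.5 * a * dt;      // step 4: v(t + dt)
}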
4 Contact Forces for Irregular Shaped Particles

This section presents the contact forces used in this study, i.e. the expressions needed to evaluate the right hand side of eq. 3.1. The model is derived from the well studied Hertz-Mindlin-Deresiewicz (HM+D) model for spheres. Hence, the spherical model is first described, and it is then shown how it can be adapted to arbitrary shapes, as first shown in [9] for the regular Hertz contact law for spheres. Note that in an earlier phase of this work the model proposed by [9] was intended to be used. However, as further detailed in section 6.3, the desired behaviour of particle systems could not be achieved with this model; hence in section 4.2 an additional tangential spring stiffness is introduced to address this issue.

4.1 HM+D model for spheres

The HM+D model has a normal elastic force which corresponds to the Hertz contact law for spheres [28],

F_{n,e} = (4/3) E* R^{1/2} δ^{3/2}    (4.1)

where E* is the effective modulus, R is the effective radius of the spheres and δ is the indentation depth. The effective modulus is defined as

1/E* = (1 − ν₁²)/E₁ + (1 − ν₂²)/E₂    (4.2)

where ν_i is the Poisson's ratio and E_i the Young's modulus of each sphere i = 1, 2. The effective radius is defined as

1/R = 1/R₁ + 1/R₂    (4.3)

where R_i is the radius of sphere i = 1, 2. Note how the intersection of a sphere and a plane can be treated by letting R₂ → ∞, whereby R = R₁. Finally, the indentation depth for sphere-sphere contacts can be calculated as

δ = R₁ + R₂ − |c₂ − c₁|  if |c₂ − c₁| < R₁ + R₂,  δ = 0 otherwise    (4.4)

where c_i is the centre of each sphere i = 1, 2. Further, the scalar dissipative normal force is given by

F_{n,d} = 2γ √(m* k_n) v_n    (4.5)

where γ is a damping coefficient, m* is the effective mass of the particles, k_n is the normal spring stiffness and v_n is the relative velocity in the normal direction. The effective mass is given by

m* = m₁ m₂ / (m₁ + m₂)    (4.6)

and the normal spring stiffness by

k_n = 2E* √(Rδ).    (4.7)

In the HM+D model a tangential spring force is derived from the no-slip theory of Mindlin [29]. The scalar component of this force is given by

F^n_{t,e} = F^{n−1}_{t,e} + k^n_t Δδ_t    if ΔF_n ≥ 0    (4.8)
F^n_{t,e} = F^{n−1}_{t,e} (k^n_t / k^{n−1}_t) + k^n_t Δδ_t    otherwise    (4.9)

where k_t is the tangential spring stiffness, n refers to the current time step and Δδ_t is the tangential displacement at the current time step. The tangential spring stiffness is given by

k_t = 8G* √(Rδ),    (4.10)

where G* is the effective shear modulus, given by

1/G* = (1 − ν₁²)/G₁ + (1 − ν₂²)/G₂.    (4.11)

The tangential force also has a dissipative component, giving the final expression for the scalar tangential force as

F_t = F^n_{t,e} + 2γ √(m* k_t) v_t    if F_t < μF_n    (4.12)
F_t = μF_n    otherwise    (4.13)

where μ is a friction coefficient.

4.2 HM+D model for irregular shaped particles

Now, following the work of Nassauer et al. [9], a contact model for arbitrarily shaped particles can be derived. This is done by noting that the volume of a sphere-sphere intersection can be approximated as

V ≈ πδ²R    (4.14)

if one assumes δ ≪ R. Similarly, for a sphere-plane intersection the volume is given by

V = (1/3) πδ² (3R − δ)    (4.15)

which again can be approximated according to equation 4.14 when δ ≪ R. The assumption δ ≪ R, i.e. small overlaps relative to the particle size, is not a restrictive assumption, since as mentioned in section 3 this is already assumed in DEM modelling. Inserting this expression into the HM+D model for spheres, the following scalar normal force for arbitrarily shaped particles is obtained, based on volume instead of radius:

F_n = (4/(3√π)) E* √(Vδ) + 2γ √(m* k_n) v_n,    (4.16)

where the normal spring stiffness is now defined as

k_n = 2E* √(V/(πδ)).    (4.17)

Further, the expressions in eq. 4.8 and eq. 4.12 for the tangential force remain the same, but for an irregular shaped particle the tangential spring stiffness can instead be defined as

k_t = 8G* √(V/(πδ)).    (4.18)

Again following the work of Nassauer et al. [9], the direction of the normal force is calculated by integrating over the surface normals of one of the particles,

n = ∫_{A₁} n_s ds / |∫_{A₁} n_s ds| = Σ_i A₁ⁱ n_sⁱ / Σ_i A₁ⁱ    (4.19)

where A₁ = Σ_i A₁ⁱ is the area belonging to one of the particles in the intersection polyhedron and n_s is the surface normal. Further, the indentation depth δ is defined as the maximum cross sectional length of the intersection in the normal direction. Finally, the contact point of the interaction is defined as the centre of mass of the volumetric overlap of the particles. Note that the same contact force model can be used for particle interactions with the triangle mesh world geometry. However, it requires the restriction that the world geometry has a closed surface and a defined interior; note that, as opposed to the polyhedral particles, the interior can still be an open sub-space of R³.
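A hedged sketch of the scalar normal force of eqs. 4.16-4.17, assuming the overlap volume V, indentation depth delta and normal relative velocity vn have already been resolved by the contact detection of section 5; the names Estar, mstar and gamma stand in for E*, m* and the damping coefficient, and are illustrative only.

#include <cmath>

double normal_force(double V, double delta, double vn,
                    double Estar, double mstar, double gamma) {
    const double pi = 3.14159265358979323846;
    const double Fn_e = 4.0 / (3.0 * std::sqrt(pi))
                      * Estar * std::sqrt(V * delta);          // eq. 4.16, elastic part
    const double kn   = 2.0 * Estar * std::sqrt(V / (pi * delta)); // eq. 4.17
    const double Fn_d = 2.0 * gamma * std::sqrt(mstar * kn) * vn;  // dissipative part
    return Fn_e + Fn_d;
}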
With the contact forces well defined, the right hand side of eq. 3.1 can be determined. Furthermore, as the required properties of each particle overlap are now known, it remains to compute these properties for polyhedrons, which is considered in the next section (section 5).

5 Polyhedral DEM Contact Detection

Figure 5.1 shows an intersection of two particles. The contact forces derived in section 4 need several properties of the overlapping volume of each such contact pair. Namely, the volume V of the overlap needs to be computed (the marked grey area); further, to attain the direction of the force, n, eq. 4.19 needs to be evaluated, which is equivalent to integrating the normals along the red triangles marked in figure 5.1. Moreover, to get the indentation depth δ, the extent of the overlapping volume needs to be evaluated along the direction n. Also the centre of mass, c, of the overlapping volume needs to be evaluated to give a contact point of the interaction. In the following, the term mass properties will be used to refer to all the above properties of the volumetric overlap. This section presents the required theory and the method used for finding such properties for each particle contact. This section also considers the broad phase contact detection, i.e. how to effectively, but approximately, filter out all contact pairs, both particle-particle and particle-triangle mesh contacts.

Figure 5.1: Collision of two polyhedral particles. The intersection volume is marked in grey. The surface integrated to attain the normal force direction is marked in red.

5.1 Polyhedron representations

A polyhedron is defined as a 3-dimensional solid composed of several polygons [30]. Typically, the polygons are connected at the edges. When using volumetric overlaps for DEM particles, an essential restriction is that the interior of the polyhedron represents a closed subspace of R³. The convex polyhedron is an important special case: it can be defined as the intersection of several half-spaces, i.e. the bounded solutions x to the linear inequality system

Ax ≤ b    (5.1)

where A ∈ R^{m×3} and b ∈ R^m. For two convex polyhedrons represented by A₁, b₁ and A₂, b₂, the intersection is defined as the bounded solutions x to the linear inequality system

A′x ≤ b′    (5.2)

where A′ = [A₁; A₂] ∈ R^{(n+m)×3} and b′ = [b₁; b₂] ∈ R^{n+m}. Note how this implies that the solution set x is also convex. These properties of convex polyhedrons have large implications when intersection algorithms are considered in the next section (5.2).

5.2 Intersection nodes

The intersection nodes of the volumetric overlaps need to be computed in order to acquire the desired mass properties of each polyhedral intersection (see section 5.3 for how the mass properties are computed from the intersection nodes). The intersection nodes of two general polyhedrons are defined as the extreme nodes of each convex part of the intersection. Namely, there are two types of intersection nodes (see also figure 5.2):

• Already existing nodes of one of the polyhedrons which reside in the interior of the other polyhedron.
• New nodes formed from an edge of one of the polyhedrons intersecting a triangle of the other polyhedron (an illustrative sketch of such an edge-triangle test follows below).

Figure 5.2: Intersection points of the polyhedral intersection marked as dots. The blue dot shows an already existing node; the red dots are new nodes formed from triangle-edge intersections.
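A sketch of an edge-triangle test for the second node type, based on the standard Möller-Trumbore algorithm; the Vec3 type and helpers below are small illustrative stand-ins, not the thesis' actual math library.

#include <cmath>

struct Vec3 { double x, y, z; };
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Returns true and the intersection point if the edge pq crosses triangle abc.
bool edgeTriangleIntersection(Vec3 p, Vec3 q, Vec3 a, Vec3 b, Vec3 c, Vec3& hit) {
    const Vec3 dir = q - p;                     // edge direction (unnormalized)
    const Vec3 e1 = b - a, e2 = c - a;
    const Vec3 pv = cross(dir, e2);
    const double det = dot(e1, pv);
    if (std::fabs(det) < 1e-12) return false;   // edge parallel to triangle plane
    const double inv = 1.0 / det;
    const Vec3 tv = p - a;
    const double u = dot(tv, pv) * inv;
    if (u < 0.0 || u > 1.0) return false;
    const Vec3 qv = cross(tv, e1);
    const double v = dot(dir, qv) * inv;
    if (v < 0.0 || u + v > 1.0) return false;
    const double t = dot(e2, qv) * inv;         // parameter along the edge
    if (t < 0.0 || t > 1.0) return false;       // crossing outside the segment
    hit = p + t * dir;
    return true;
}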
In previous work on polyhedral DEM [8, 11] the intersection nodes have essentially been found by iterating over the respective features (e.g. triangles/edges or half-planes) of each polyhedron. In [8] only convex particles were considered and the half-plane representation (see eq. 5.1) was used, whereas in [11] triangulations were used. A naive pseudo-algorithm which computes the intersection nodes by iterating over all features of a triangulation can be seen in Algorithm 1; note that no assumptions on convexity are needed.

Algorithm 1: Given two polyhedrons, P1 and P2, find the intersection nodes, c.
  Let c = {} be the intersection nodes.
  for each node, p, in P1 do
    if p inside P2 then
      Let c = c ∪ {p}
    end if
  end for
  for each node, p, in P2 do
    if p inside P1 then
      Let c = c ∪ {p}
    end if
  end for
  for each edge, e, in P1 do
    for each triangle, t, in P2 do
      if e intersects t at p then
        Let c = c ∪ {p}
      end if
    end for
  end for
  for each edge, e, in P2 do
    for each triangle, t, in P1 do
      if e intersects t at p then
        Let c = c ∪ {p}
      end if
    end for
  end for

Given two polyhedrons, each with n features, it directly follows that this algorithm scales as O(n²), while for the convex case there exist theoretically optimal algorithms with a worst case time complexity of O(n) [31, 32]. However, it is unclear whether these theoretically optimal algorithms pose an actual advantage in terms of performance, since their constant factors relative to the naive algorithm are unknown. This is especially true on the GPU where, as outlined in section 2.2, simpler algorithms are typically preferred, and the complicated control flow of these theoretically optimal algorithms (e.g. one is recursive) casts doubt on how suitable they would be for a GPU implementation. Further, e.g. rock material can have many small concavities, and thus it would be preferable to have algorithms capable of handling non-convex particles. No proven optimal algorithms for non-convex polyhedrons have been shown in the literature, and most implementations rely on convex decomposition [33]. Triangulations are even commonly decomposed into the smallest possible convex polyhedron, the tetrahedron, and the intersection is then resolved for each tetrahedron [34]. However, most literature on the subject of non-convex polyhedral intersection works under the assumption of a few large triangulations, where the computation can be accelerated by placing each triangle or tetrahedron in an accelerating data structure such as a bounding volume hierarchy or a grid file.

For polyhedral DEM the assumptions are vastly different: the number of triangles per particle is expected to be low. Moreover, as previously mentioned, the particles can have many small concavities, which means that a decomposition could substantially increase the triangle count. Further, in this work with a GPU implementation, millions of intersections are assumed to be performed concurrently. With many intersections performed concurrently, the memory requirements of each intersection must be very low, and consequently using advanced data structures such as grid files or bounding volume hierarchies over the individual triangles is not possible. Consequently, most algorithms presented in the literature that aim to improve the scaling of polyhedral intersection are not likely to be beneficial for a polyhedral DEM GPGPU implementation.
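Besides the edge-triangle test sketched in section 5.2, Algorithm 1 also needs a point-inside-polyhedron predicate. A common choice, sketched here with the Vec3 helpers from the previous listing, is ray-crossing parity: cast a ray from the point and count triangle crossings, where an odd count means the point is inside. Robust handling of rays grazing edges or vertices is omitted in this illustration.

#include <array>
#include <vector>

bool insidePolyhedron(Vec3 p, const std::vector<std::array<Vec3, 3>>& tris) {
    const Vec3 q = p + Vec3{1.0e9, 0.0, 0.0};   // far endpoint of the test ray
    int crossings = 0;
    Vec3 hit;
    for (const auto& t : tris)
        if (edgeTriangleIntersection(p, q, t[0], t[1], t[2], hit))
            ++crossings;
    return (crossings % 2) == 1;                // odd parity: p is inside
}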
Instead, we propose a simple heuristic algorithm which takes advantage of a fundamental assumption of soft DEM, namely that the particles are assumed to have small overlaps (see sections 3 and 4). The basic idea of the algorithm is to quickly filter out triangles which are far from the overlapping volume, and then apply the naive Algorithm 1 to the filtered triangles. The triangles are filtered by first defining a direction

d̂ = (c₂ − c₁) / ∥c₂ − c₁∥    (5.3)

where c_i, i = 1, 2, are the respective particle centres. Further, the extent of each particle in that direction is expressed as

e₁ = max_{j=1,...,n_nodes} v_{j,1} · d̂;    e₂ = min_{j=1,...,n_nodes} v_{j,2} · d̂    (5.4)

where v_{j,i}, j = 1, ..., n_nodes, are the nodes of polyhedron i = 1, 2. Now one can define half-planes orthogonal to d̂ at each distance e₁, e₂. The half-planes are defined as

h₁ = {x ∈ R³ | x · d̂ ≤ e₁},    h₂ = {x ∈ R³ | x · d̂ ≥ e₂}.    (5.5)

Then the triangles of each particle which do not reside in the half-plane of the other particle can be removed from consideration. Also, if the two half-planes do not intersect, then the particles cannot intersect. If there exists an intersection of the half-planes, this intersection is convex, and triangles from both of the particles that reside in the intersection are guaranteed to be included. Since Algorithm 1 will be applied to the filtered triangles, the last note is important, because it enables the use of ray-tracing to check if nodes are inside the polyhedrons even with the filtered triangles. An illustration of the filtering is seen in figure 5.3, where the intersection of the half-planes is marked as the grey area.

This entire filtering process can clearly be performed in O(n) time. It also has no additional memory requirements, since the half-planes are readily recomputed at each time step. For the typical use case in DEM this filtering is effective, as later seen in section 7.2.1. There is, however, no guarantee of its effectiveness for completely general polyhedrons, as it is easy to construct examples where no triangles at all will be excluded.
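A hedged sketch of the half-plane filter of eqs. 5.3-5.5 for one of the two particles (the other side is symmetric), reusing the Vec3 helpers from section 5.2. The direction is left unnormalized, which is an implementation shortcut: only the ordering of the projections matters, and scaling by the positive factor ∥c₂ − c₁∥ preserves it.

#include <algorithm>
#include <array>
#include <vector>

// Returns false if the half-planes are disjoint, i.e. the particles cannot
// intersect; otherwise appends the surviving triangles of particle 1 to kept1.
bool filterTriangles(const std::vector<Vec3>& nodes1,
                     const std::vector<Vec3>& nodes2,
                     Vec3 c1, Vec3 c2,
                     const std::vector<std::array<Vec3, 3>>& tris1,
                     std::vector<std::array<Vec3, 3>>& kept1) {
    const Vec3 d = c2 - c1;                     // eq. 5.3, unnormalized
    double e1 = -1e300, e2 = 1e300;
    for (const Vec3& v : nodes1) e1 = std::max(e1, dot(v, d));  // eq. 5.4
    for (const Vec3& v : nodes2) e2 = std::min(e2, dot(v, d));
    if (e2 > e1) return false;                  // half-planes disjoint: no contact
    for (const auto& t : tris1)                 // keep triangles reaching into h2
        if (dot(t[0], d) >= e2 || dot(t[1], d) >= e2 || dot(t[2], d) >= e2)
            kept1.push_back(t);
    return true;
}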
5.3 Intersection mass properties

The standard approach [8, 10] to evaluate the mass properties of overlapping polyhedron particles in DEM consists of triangulating the intersection volume and then calculating the desired mass properties from the resulting triangulation. Further, triangulation of the volume is typically restricted to convex volumes, where e.g. convex hull algorithms or similar have been used. This method is flawed in two regards: it does more work than needed and, due to the triangulation, it is restricted to convex volumes (see section 5.1). This section aims to remedy these issues by presenting an alternative method to perform these calculations.

When triangulations are used to compute the mass properties, each constituent triangle, k, of the polyhedral intersection is stored. To compute the mass properties the nodes, v_k^i, i = 1, 2, 3, of each triangle k are needed. The volume of the triangulated intersection, for example, can then be computed as

\[ V = \sum_{k=1}^{n} \frac{1}{6} \left| \bigl( v_k^1 \times v_k^2 \bigr) \cdot v_k^3 \right|, \tag{5.6} \]

where the other mass properties can be computed in a similar manner.

However, we propose that an alternative data structure, based entirely on local properties of each node of the intersection, can significantly simplify the computation of the mass properties. The foundation of the data structure, introduced by W. Randolph Franklin [34], is a cusp, defined as the quadruple

\[ C = (p, \hat{t}, \hat{n}, \hat{b}) \tag{5.7} \]

where each element is defined as follows:

p: the Cartesian coordinates of the node.
t̂: unit vector in the direction of an edge adjacent to the node.
n̂: unit vector in the direction of a face adjacent to t̂ and perpendicular to t̂.
b̂: unit vector in the direction of the interior of the polyhedron, perpendicular to both t̂ and n̂.

Note also how b̂ only adds one bit of information, since b̂ = ±(t̂ × n̂)/‖t̂ × n̂‖, i.e. it only adds the sign of the vector. A polyhedron can now be represented as an unordered list of cusps, L = {C_i}, i = 1, …, m, where it is noteworthy that the only constraints on the polyhedron are that it is finite and that its interior is a closed subspace of ℝ³. Each cusp can be seen as a signed simplex defined by the points

\[ \Bigl( p_i,\;\; p_i - (\hat{t}_i \cdot p_i)\,\hat{t}_i,\;\; p_i - (\hat{t}_i \cdot p_i)\,\hat{t}_i - (\hat{n}_i \cdot p_i)\,\hat{n}_i,\;\; O = p_i - (\hat{t}_i \cdot p_i)\,\hat{t}_i - (\hat{n}_i \cdot p_i)\,\hat{n}_i - (\hat{b}_i \cdot p_i)\,\hat{b}_i \Bigr) \tag{5.8} \]

where the last equality (the fourth point being the origin, O) follows trivially by noting that (t̂_i, n̂_i, b̂_i) defines an orthogonal coordinate system spanning all three dimensions. To compute the area, A, volume, V, and center of mass, c, of the polyhedron, the properties are accumulated from each signed simplex independently:

\[ A = \sum_{i=1}^{m} A_i = \sum_{i=1}^{m} \frac{1}{2} (\hat{t}_i \cdot p_i)(\hat{n}_i \cdot p_i) \tag{5.9} \]

\[ V = \sum_{i=1}^{m} V_i = \sum_{i=1}^{m} -\frac{1}{6} (\hat{t}_i \cdot p_i)(\hat{n}_i \cdot p_i)(\hat{b}_i \cdot p_i) \tag{5.10} \]

\[ c = \frac{1}{V} \sum_{i=1}^{m} V_i c_i = \frac{1}{V} \sum_{i=1}^{m} \frac{V_i}{4} \Bigl( 3 p_i - 2 (\hat{t}_i \cdot p_i)\,\hat{t}_i - (\hat{n}_i \cdot p_i)\,\hat{n}_i \Bigr) \tag{5.11} \]

where c_i in eq. 5.11 is simply the vertex average of the four points in eq. 5.8.

For given input data, the cusp model is inferior to the regular triangle representation in terms of the number of operations required for the mass property computation, since for a polyhedron of n triangles the number of cusps required to represent it is m = 6n. On an NVIDIA GPU the dot product requires 3 floating point operations (2 fused multiply-adds and 1 multiplication) and the cross product 6 floating point operations (3 fused multiply-adds and 3 multiplications). To compute the volume of the polyhedron, the total number of floating point operations (FP) for the respective methods is thus

\[ \#\text{FP} \sim 9n \quad \text{for the triangle representation,} \]
\[ \#\text{FP} \sim 11m = 66n \quad \text{for the cusp representation.} \]

While this ratio can be made more favourable to the cusp model when all three properties are computed at once, it nevertheless requires substantially more floating point operations. However, for the case of polyhedral intersection the cusp representation has a substantial advantage: no triangulation of the intersection volume is required, since the local properties of each cusp can be determined directly at each intersection point.
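The accumulation in eqs. 5.9-5.11 maps directly onto a loop over an unordered cusp array. The following is a minimal serial sketch under the same illustrative Vec3 conventions as before; the struct layout and the sign convention of the centroid term follow the derivation above, not the thesis code:

    #include <vector>
    // Vec3, dot, operator- as in the earlier sketches; additionally:
    Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
    Vec3 operator+(Vec3 a, Vec3 b)   { return {a.x + b.x, a.y + b.y, a.z + b.z}; }

    struct Cusp { Vec3 p, t, n, b; };   // eq. (5.7): node plus its local unit frame

    struct MassProps { double area, volume; Vec3 centroid; };

    MassProps accumulate(const std::vector<Cusp>& cusps) {
        double A = 0.0, V = 0.0;
        Vec3 M = {0.0, 0.0, 0.0};                    // volume-weighted centroid sum
        for (const Cusp& c : cusps) {
            double tp = dot(c.t, c.p);
            double np = dot(c.n, c.p);
            double bp = dot(c.b, c.p);
            double Vi = -(1.0 / 6.0) * tp * np * bp; // eq. (5.10), signed
            A += 0.5 * tp * np;                      // eq. (5.9), signed
            V += Vi;
            // eq. (5.11): centroid of the signed simplex, weighted by Vi
            M = M + (Vi / 4.0) * (3.0 * c.p + (-2.0 * tp) * c.t + (-np) * c.n);
        }
        return {A, V, (1.0 / V) * M};
    }

Each iteration is branch-free and touches only its own cusp, which is what makes the representation attractive for one-thread-per-intersection GPU execution.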
Moreover, since the cusp model also works on completely general polyhedrons, no convex decomposition of the particle models is needed. This representation also fits the streaming processors of the GPU especially well, since both the triangulation and the decomposition require more complicated control flow and hence run a greater risk of thread divergence.

Figure 5.4: The volume of the same volumetric overlap as a function of its distance from the origin, computed with the cusp model in the reference frame and in the body frame. When the cusp model is naively used in the reference frame, significant errors are introduced by the floating point arithmetic. Note that the ratio between volume and distance is the important factor here, but with floating point representations the absolute values also determine the exact error; hence the absolute values are shown in the figure.

In implementations using floating point arithmetic a significant problem remains with the cusp model. Since each simplex has one node at the origin, |V_i| ≫ V in eq. 5.10 when the geometries are far from the origin, and significant errors are introduced by the floating point arithmetic. For polyhedral DEM there is however a simple solution, namely to perform the intersection computation in the body frame of one of the particles. Figure 5.4 shows the computed volume as a function of the distance from the origin. For the reference frame computation the error increases with the distance, whereas it remains constant for the body frame computation, since in that case the computation is effectively always performed close to the origin (the left side of the figure).
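A sketch of the body-frame trick, assuming particle poses are stored as a rotation matrix plus a translation (an illustrative layout; quaternions would work equally well). Both vertex buffers are mapped by the inverse pose of particle 1, so the overlap region ends up near the origin regardless of where the pair sits in the simulation domain:

    // Vec3, dot, operator-, operator+, scalar multiplication and Mesh as in
    // the earlier sketches.

    struct Mat3 { Vec3 r0, r1, r2; };             // rotation matrix, stored by rows

    // R^T d; for a rotation matrix this is the inverse rotation.
    Vec3 mulT(const Mat3& R, Vec3 d) {
        return d.x * R.r0 + d.y * R.r1 + d.z * R.r2;
    }

    struct Pose { Mat3 R; Vec3 x; };              // world point = R * (body point) + x

    // Map world-space vertices of a colliding pair into the body frame of
    // particle 1, i.e. v -> R1^T (v - x1). The overlap then lies near the
    // origin, so the |Vi| >> V cancellation in eq. 5.10 stays benign.
    void intoFrameOfFirst(const Pose& P1, Mesh& m1, Mesh& m2) {
        for (Vec3& v : m1.verts) v = mulT(P1.R, v - P1.x);
        for (Vec3& v : m2.verts) v = mulT(P1.R, v - P1.x);
    }

Since the mapping is rigid, the accumulated volume and area are unchanged, and only the centroid needs to be transformed back to the world frame afterwards.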
5.4 Multiple contact error

It is important to highlight that with the methods presented above, multiple contacts are not properly resolved: each particle-particle contact is always modelled as a single contact, which, by the non-linearity of the force models (and of the torque), is not correct. One should however note that all other DEM methods make the opposite error, namely modelling single particle contacts as multiple contacts. Since the force models are of such a simple form, large modelling errors already exist, which is the basic motivation for not resolving this properly. Moreover, for the typical use cases of DEM, merging the contacts should not be a large concern, but if more complex shapes such as toroidal particles are studied, this modelling error should be investigated further.

5.5 Broad phase contact detection

Besides resolving the exact intersection between the particles, a DEM solver also needs a so-called broad phase contact detection algorithm, as outlined in section 3. Broad phase contact detection refers to effectively, but approximately, finding all potential particle pairs. This avoids the quadratic scaling in the number of particles of the naive approach (which is to check all possible particle pairs). Typically, accelerated data structures are used to improve the scaling. The two preeminent methods for this are the Cartesian grid and the bounding volume hierarchy (BVH). The Cartesian grid divides the space into cells and each particle is classified according to which cells it covers. All potential pairs can then be resolved by only checking particles in the same or neighbouring cells. The BVH, on the other hand, is a tree data structure where each node is represented by a geometric object that contains all of its children [35]; in DEM the leaves are the bounding volumes of the particles. Potential contacts for a particle are then efficiently resolved by traversing the tree structure.

For the case of polyhedral DEM on the GPU it has been reported in the literature [7] how a grid implementation leads to severe performance degradation when the particle aspect ratio goes beyond 4:1. This degradation appears since the grid has to compromise between high memory usage and a more fine-grained grid. When the aspect ratios are large the grid can be adjusted to roughly fit the smallest particles, but then the memory consumption can be significant. To limit the memory consumption one has to increase the cell size, but then, as reported in the literature, the performance suffers since more contact pairs are emitted from the data structure. The same problem also appears in cases with a few separated dense areas of particles, since the space between the dense areas then also needs to be covered by the grid. For the BVH this problem does not arise; here the memory usage is completely separated from the tightness of the bounding volumes. Hence, with the desire of a general framework capable of handling varying environments and particle shapes, only BVH broad phase contact detection is considered in this thesis.

Figure 5.5: A Bounding Volume Hierarchy (BVH). Each node in the tree structure consists of a geometric object (here a rectangle) which covers all its children. This permits fast overlap tests by traversing the tree structure from the root. Image by Schreiberx - Own work, CC BY-SA 3.0 [36]

An example of a BVH can be seen in figure 5.5. The basic idea behind the BVH is that the nodes of the tree can be tested with a geometric predicate, e.g. an overlap test: if the predicate is true for a parent node one continues down to its children, whereas if a parent node does not fulfill the predicate all nodes beneath it can be disregarded. In this way, large portions of the tree structure can quickly be excluded from the search. For most realistic use cases one typically obtains logarithmic scaling for each search query [35], albeit the worst-case time complexity is not improved over the naive approach.
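As an illustration, the following sketch shows the predicate-driven descent for axis-aligned bounding boxes. The node layout is hypothetical and chosen for clarity; section 5.5.2 below discusses why the actual implementation traverses breadth-first rather than with this simple recursion:

    #include <vector>

    struct AABB { double lo[3], hi[3]; };

    bool overlaps(const AABB& a, const AABB& b) {
        for (int k = 0; k < 3; ++k)
            if (a.hi[k] < b.lo[k] || b.hi[k] < a.lo[k]) return false;
        return true;
    }

    struct BVHNode {
        AABB box;
        int left = -1, right = -1;   // child indices; -1 marks a leaf
        int object = -1;             // particle index stored at a leaf
    };

    // Recursive descent: prune a whole subtree as soon as its box fails the
    // overlap predicate, and report the particles whose leaves survive.
    void query(const std::vector<BVHNode>& nodes, int node,
               const AABB& q, std::vector<int>& hits) {
        const BVHNode& n = nodes[node];
        if (!overlaps(n.box, q)) return;      // predicate false: skip subtree
        if (n.left < 0) { hits.push_back(n.object); return; }
        query(nodes, n.left,  q, hits);
        query(nodes, n.right, q, hits);
    }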
There are two main algorithmic aspects of a BVH structure, the build and the search algorithms, both of which are considered in the following sections.

5.5.1 Build algorithm

Given a set of geometric objects, the build algorithm should generate the tree structure of these objects and populate each node with its bounding volume. The build algorithm needs to balance build speed, memory consumption (both temporary and permanent) and the quality of the final tree structure. While there are several factors to consider for the quality of the tree structure, the short answer is that higher tree quality yields faster search performance. To make the compromise between tree quality and build speed, an important guideline is the build-to-search ratio. For instance, in scenarios with rigid geometries the build can be precomputed once, whereas the tree can be traversed an indefinite number of times. Thus, for rigid geometries the speed of the build algorithm is of rather low significance and mainly the quality of the tree structure should be considered. However, for contact detection of moving objects, e.g. the particles in a DEM simulation as relevant for the current thesis, one typically only searches once for all objects of the tree against themselves. For the particles in DEM, the time spent in the build algorithm can thus be weighted directly against the time saved in the traversal [35].

Mainly due to the advent of GPU-accelerated real-time ray tracing, e.g. for computer games, there have been significant advances in constructing BVHs on the GPU in the past few years [37, 38, 39, 40]. Building the tree structure on the GPU often requires a significantly different approach from the CPU counterpart, since the CPU algorithms rarely parallelize well. Most of the work has been performed for binary BVHs, i.e. trees with a degree of 2 per node. A notable early work is [37], which showed the feasibility of constructing the BVH on the GPU. In this early work a simple build algorithm based on Morton codes was developed which was significantly faster than CPU algorithms, while still maintaining decent search performance. The work on binary BVHs in [37] has been further extended in later articles optimizing both the quality of the tree and the build time [38, 39, 40]. It should be noted that some of these improvements concern optimizing the Surface Area Heuristic (SAH), which in essence minimizes the surface area of subtrees and is thus mostly relevant for ray-tracing applications.

An in-house FCC build algorithm developed by the author has been applied in this thesis work; the algorithm is based on some of the aforementioned binary BVH research [37, 38, 39, 40]. While the algorithmic details are not within the scope of this work, some aspects of the memory layout induced by the build algorithm will be of significance for later algorithmic considerations. Namely, the tree structure has been adapted to nodes of higher degree, and in particular the memory layout has been optimized for breadth-first search, the latter meaning that each level of the tree is placed linearly in memory (as in [37]) with spatially close nodes also close in memory; both efforts improve the cache hit rate.
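For reference, the Morton-code ingredient of such builders can be sketched in a few lines. The bit interleaving below follows the widely used 30-bit construction popularized in connection with [37]; normalizing particle centroids into the unit cube and sorting them by this key is what places spatially close leaves close in memory:

    #include <algorithm>
    #include <cstdint>

    // Spread the lower 10 bits of v so that two zero bits separate
    // consecutive payload bits.
    std::uint32_t expandBits(std::uint32_t v) {
        v = (v * 0x00010001u) & 0xFF0000FFu;
        v = (v * 0x00000101u) & 0x0F00F00Fu;
        v = (v * 0x00000011u) & 0xC30C30C3u;
        v = (v * 0x00000005u) & 0x49249249u;
        return v;
    }

    // 30-bit Morton code for a point with coordinates in [0, 1].
    std::uint32_t morton3D(float x, float y, float z) {
        x = std::min(std::max(x * 1024.0f, 0.0f), 1023.0f);
        y = std::min(std::max(y * 1024.0f, 0.0f), 1023.0f);
        z = std::min(std::max(z * 1024.0f, 0.0f), 1023.0f);
        return (expandBits((std::uint32_t)x) << 2) |
               (expandBits((std::uint32_t)y) << 1) |
                expandBits((std::uint32_t)z);
    }

Sorting the codes (on the GPU typically with a parallel radix sort) and splitting runs at the highest differing bit then yields the hierarchy; the quality and build-time optimizations of [38, 39, 40] refine this basic structure.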
5.5.2 Search algorithm

The two most prominent algorithms for traversing the hierarchy are breadth-first search (BFS) and depth-first search (DFS). BFS traverses the tree level by level, meaning that all active nodes of one level are processed at once and only then is the next level considered. DFS, on the contrary, traverses a branch of the tree until termination, and at termination proceeds to the latest branch split. In the context of contact detection for real-time applications with serial implementations, DFS has been reported to be the more popular of the two. This preference for DFS is due to the larger amount of temporary memory needed for BFS; limiting this memory by some form of iterative implementation is not well suited for real-time applications, since consistent performance is often as important as speed [35]. For DEM, on the contrary, only the total simulation time is of importance, so this limitation is of less significance.

The only relevant case for GPU-accelerated search queries of a BVH is when many queries are performed at once, since otherwise only a single thread will be active at the root and the massive parallelism of the GPU is not properly utilized. Further, for the case of contact detection between objects (i.e. not ray tracing) it has been shown in the literature that a BFS implementation is superior in terms of efficiency [41].

In this work a BFS implementation is used. To constrain the memory, the searches are made in batches which are iteratively adjusted so as not to breach the memory limitation. Due to the relatively slow changes in geometry in DEM this works well, and as previously mentioned the uneven performance of this method is not a concern. A pseudo algorithm for the traversal of one level is given in Algorithm 2. On the pseudo level this algorithm is very simple; an efficient massively parallel implementation is, however, a significant challenge. In particular, packing the nodes, c, into the list, n, is essential for attaining an efficient GPU BFS algorithm. This process is often referred to as stream compaction, which is a non-trivial task for massively parallel implementations [42] and involves computing a prefix scan of the number of items to compact [43].

Algorithm 2 Traverse a level, l, in a BVH and store the nodes to be further processed at the next level, l + 1.
    Let n = {} be the nodes at the next level to further process.
    for each node-search object pair, v, o, on level l in parallel do
        for each child, c, of the node v do
            if c intersects o then
                Let n = n ∪ {c}
            end if
        end for
    end for
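A serial emulation of Algorithm 2 is sketched below to fix the data flow; on the GPU the push into the next frontier would instead be realized by counting surviving children per pair, running a parallel prefix scan over the counts, and scattering into the compacted array. The node layout matches the earlier BVH sketch and is, again, only illustrative:

    #include <utility>
    #include <vector>
    // AABB, BVHNode, overlaps as in the earlier BVH sketch.

    using Pair = std::pair<int, int>;   // (tree node index, search object index)

    // One BFS level: expand every surviving (node, query) pair into the child
    // pairs whose bounding boxes pass the overlap predicate.
    std::vector<Pair> traverseLevel(const std::vector<BVHNode>& nodes,
                                    const std::vector<AABB>& queries,
                                    const std::vector<Pair>& level,
                                    std::vector<Pair>& leafHits) {
        std::vector<Pair> next;                  // the compacted next frontier
        for (const Pair& vo : level) {           // 'in parallel' on the GPU
            const BVHNode& v = nodes[vo.first];
            for (int c : {v.left, v.right}) {    // the children of v
                if (c < 0) continue;
                if (!overlaps(nodes[c].box, queries[vo.second])) continue;
                if (nodes[c].left < 0)           // reached a leaf
                    leafHits.push_back({nodes[c].object, vo.second});
                else
                    next.push_back({c, vo.second});
            }
        }
        return next;                             // becomes level l + 1
    }

Batching, as described above, simply bounds the number of pairs fed into each such level traversal.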
A DFS algorithm was also implemented, but the batched BFS outperformed it by a factor of 5-10. Analyzing the algorithms with NVIDIA Nsight Compute [22], this large difference can largely be explained by a significantly lower L1/L2 cache hit rate for the DFS search.

5.6 World geometry contact detection

As outlined in section 3, it is desirable that the particles can interact with a surrounding geometry. In this work only triangle mesh geometries are considered. As mentioned in section 4, the same contact forces as for the particle interactions are used for the particle interaction with the world geometry. This enforces some restrictions on the triangle meshes used for the world geometry, namely that their surface must be closed and have a defined interior.

Since the same contact forces are used as for the particle interactions, the contact detection also largely remains the same. However, the triangle meshes are expected to be few but to have many triangles, which is the opposite of the particles. For the world geometries, each individual triangle is therefore entered into the same BVH implementation presented in section 5.5; this is in contrast to the particles, where a bounding volume of the entire particle is used in the BVH. The particle-triangle pairs can thus be found by searching the BVH containing the triangles. After computing the particle-triangle pairs, Algorithm 1 can be applied, and the same scheme for finding the mass properties as presented in section 5.3 is then followed. Note that for typical DEM applications the cost of finding and resolving the particle-triangle pairs is small compared to the particle-particle pairs, since the number of particle-triangle pairs scales as the bulk surface whereas the particle-particle pairs scale as the bulk volume.

6 Physical Verification and Convergence of Method

This section presents the verification of both the method and the implementation. Since the force model is derived from contacts of spherical particles, the model is first verified for two colliding spherical particles. Next, the convergence of this case with respect to particle resolution is studied. The same experiments are then repeated for an irregular rock-shaped particle.

To further verify the method on a larger scale, it is investigated whether relevant physical properties of particle systems can be captured. In particular we investigate whether the solver achieves convergence of the repose angle with respect to particle resolution, and in addition the solver is verified against data from an experimental setup for rock material. Since DEM requires substantial calibration experiments to properly validate it against experimental data, a complete validation study against experiments is out of scope for this mainly method-driven thesis. Thus only verification of the methods is considered.

6.1 Verification and convergence for spherical polyhedrons

Since the force models in use (see section 4) are derived from the spherical case, the method is first verified for two colliding spheres. To average out the effect of discretization, the experiments are repeated many times with the initial orientation of the spheres sampled from a uniform distribution. Further, only the normal elastic force is considered. The radii of the spheres are R = 0.1 m and the initial speed of both particles is 5 m s⁻¹.

The spherical polyhedrons used for this experiment were generated by the built-in spherical triangulation tool in the software IPS - Industrial Path Solutions [44]. Figure 6.1 shows the generated polyhedrons for a few different resolutions.

Figure 6.1: Spherical polyhedrons used for this study, with resolutions of 40, 84 and 180 triangles (left to right).

It should be noted that the models were normalized to give a consistent volume. This is not the case for the spheres directly after generation in IPS, since the spheres are triangulated by placing the nodes on the surface of a sphere with a given radius, meaning that the volume of the models decreases with the resolution of the triangulation. For the verification of the solver, however, it is more important to keep the mass and volume of the particles consistent. This keeps the initial kinetic energy consistent; if not, it would skew the kinetic energy response studied in the experiments below. In practical terms, a higher initial kinetic energy results in a longer response.
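Such a normalization only needs the volume of the closed mesh, which is given by the signed tetrahedron sum (the signed counterpart of eq. 5.6); a sketch under the earlier illustrative Mesh conventions, assuming outward-oriented faces:

    #include <cmath>
    // Vec3, dot, cross, Mesh as in the earlier sketches.

    // Signed volume of a closed triangle mesh with outward-oriented faces.
    double meshVolume(const Mesh& m) {
        double V = 0.0;
        for (const auto& t : m.tris) {
            Vec3 a = m.verts[t[0]], b = m.verts[t[1]], c = m.verts[t[2]];
            V += dot(cross(a, b), c) / 6.0;    // signed origin tetrahedron
        }
        return V;
    }

    // Uniformly scale the mesh about the origin so its volume matches Vtarget;
    // volume scales with the cube of the length scale.
    void normalizeVolume(Mesh& m, double Vtarget) {
        double s = std::cbrt(Vtarget / meshVolume(m));
        for (Vec3& v : m.verts) { v.x *= s; v.y *= s; v.z *= s; }
    }

With all resolutions rescaled to the same volume (and hence, for a fixed density, the same mass), the initial kinetic energy is identical across the test series.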
Figure 6.2: The normalized kinetic energy averaged over 10000 instances as a function of time during the collision of two spheres. The dotted line shows the response for a perfect sphere, and the solid lines show the response for polyhedral spheres of different resolutions (40 to 180 triangles). The left figure shows the response for spheres with a Young's modulus of E = 1 MPa and the right figure for stiffer particles with E = 21.5 MPa. The initial orientation of the polyhedrons is uniformly distributed.

Figure 6.2 shows the average kinetic energy as a function of time during a collision of two spheres. The kinetic energy, E_k, is normalized to the initial kinetic energy, E_k^0 (i.e. the kinetic energy of both particles before the collision), as E_k/E_k^0. In the figure to the left the Young's modulus is E = 1 MPa, and in the figure to the right it is E = 21.5 MPa. In both cases the particle material has a Poisson's ratio of ν = 0.25. The figures show the response for different resolutions of the polyhedral spheres, and also for a perfect sphere simulated with the internal FCC spherical DEM solver. For the softer material properties it is clear that the response of all the polyhedrons closely matches the perfect sphere, with only slight deviations. For the stiffer particles, however, the lower resolution polyhedrons lead to a significant error in the response. This is expected, since stiffer particles lead to less overlap, effectively making the resolution of the overlapping volume lower.

Figure 6.3: The standard deviation of the normalized kinetic energy over 10000 instances as a function of time for two colliding spheres. The left figure shows the response for spheres with a Young's modulus of E = 1 MPa and the right figure for stiffer particles with E = 21.5 MPa. The initial orientation of the polyhedrons is uniformly distributed. Note that for a perfect sphere the standard deviation is always zero.

In figure 6.2 only the average response of the kinetic energy was considered, in order to verify the force model and the method of using volumetric overlaps. However, as mentioned, averaging the response neglects the fluctuations induced by the discretization of the particles. Figure 6.3 therefore also shows the standard deviation of the normalized kinetic energy, which becomes a measure of the fluctuations induced by the discretization. The two notable peaks in the figure show that the discretization causes fluctuations in the energy response, and that the fluctuations are most pronounced at the beginning and end of the collision. At the beginning and end of the collision the overlapping volume is smaller, and hence the effective resolution of the overlapping volume is lower, which causes larger fluctuations. One can also note that the standard deviation is higher for stiffer particles and that there is a clear reduction when the number of triangles is increased. However, there is a notable peak in the standard deviation even for the highest resolution polyhedrons, whereas for perfect spheres it should always be zero.

Finally, to get an overview of the convergence for a larger range of particle stiffness,
the minimum value of the averaged normalized kinetic energy, i.e. the minimum of the data plotted in figure 6.2, and the maximum value of the standard deviation of the normalized energy response, i.e. the maximum of the data shown in figure 6.3, are considered in figure 6.4. Note that the maximum/minimum values are used instead of a time average or RMSD, since such measures would depend on the length of the simulation, which has to differ between the different stiffnesses. Hence, to be able to compare across the different resolutions and stiffnesses, the extrema were considered. It should be noted that these are not necessarily a precise measure of how well the response matches the perfect sphere, but are rather intended for comparisons between the polyhedrons. Nevertheless, figure 6.4 clearly shows that there is a clear convergence in the response for the range of particle stiffness considered. It is also clear from figure 6.4 that stiffer particles lead to larger discretization errors, affecting not only the fluctuations in the response but also the average response.

Figure 6.4: Convergence of the kinetic energy response for spheres with respect to the resolution of the polyhedral spheres, for stiffnesses from E = 1 MPa to E = 10 GPa. To the left the minimum average normalized kinetic energy is shown, and to the right the maximum standard deviation of the same quantity. Clearly the convergence becomes significantly slower as the stiffness of the particles increases.

This result is fundamental to polyhedral DEM. Foremost, that the method converges with respect to particle resolution, instead of giving an erratic response as is the case for multispheres, is in itself profound. A limitation of the method is also clearly shown here, as the capability to model smooth surfaces with non-zero curvature is limited and, in addition, depends largely on the material properties. The results show that the polyhedrons are capable of modelling smooth surfaces to a larger extent for soft particles than for hard particles.

6.2 Verification and convergence for irregular shaped particles

While the results for the spherical particles showed that the method, in contrast to the multisphere method, can converge in resolution, spherical particles are not the most relevant case for actual usage of the method. In this section the method is therefore studied for irregular shaped particles, to ensure that it can converge also for more complex shapes.

Figure 6.5: The irregular shaped polyhedral particle models used in this study. The figure presents particles with 10, 40, 70 and 1000 (original) triangles (left to right). While not clear from the figure, several small concavities exist in the model.

Figure 6.5 shows the irregular shaped polyhedrons used in this study. The rightmost polyhedron is the original triangulation, consisting of 1000 triangles. The three leftmost polyhedrons, consisting of 10, 40 and 70 triangles respectively, were generated using the triangulation simplification tool in IPS - Industrial Path Solutions [44] with standard settings. The models were then normalized to give a consistent volume, which the IPS simplification tool does not attempt to attain. Note that, while not clear from the figure, the polyhedrons have several small concave sections.

The same numerical experiments as for the spheres were performed with the irregular shaped particles, namely studying the kinetic energy response during the collision of two particles.
Besides the changed particle model, the experiments were conducted in the same way and with the same parameter values as for the spherical particles.

The average normalized kinetic energy for the irregular shaped particles is shown in figure 6.6. The left figure shows the response for a Young's modulus of E = 21.5 MPa and the right for E = 464.0 MPa. The response for the irregular shaped particle is noticeably different from that of the spheres. For this particle model the minimum average kinetic energy does not approach zero, which is expected due to the irregular shape combined with the random initial orientation; note that for an individual collision the kinetic energy does go to zero. The standard deviation of the normalized kinetic energy at different times is shown in figure 6.7. The standard deviation is significantly higher compared to the spheres, which is expected since the original shape now also induces fluctuations in the response. While there is a difference in the response for both the average and the standard deviation when the resolution changes, only small shifts can be seen, and at least in the average response there is a tendency towards convergence when the resolution is increased.

Figure 6.6: The normalized kinetic energy averaged over 10000 instances as a function of time during the collision of two irregular shaped particles. The different coloured solid lines show the response for different resolutions (20 to 100 triangles). The left figure shows the response for particles with a Young's modulus of E = 21.5 MPa and the right figure for stiffer particles with E = 464.0 MPa. The initial orientation of the polyhedrons is uniformly distributed.

Note that an analysis similar to that for the spheres in figure 6.4 is not included, since such a study could easily be misleading due to the more irregular shape of the response curves. Moreover, as will be further discussed in section 8, the actual convergence is not the important conclusion. The importance of these studies is rather to highlight how utilizing volumetric overlaps of the polyhedrons leads to a force model that does not change significantly when small geometric changes are introduced, i.e. that it does not behave like multispheres or feature-by-feature polyhedral force models. This important property of the volumetric overlaps should already be evident from figures 6.6 and 6.7.

Figure 6.7: The standard deviation of the normalized kinetic energy from 10000 instances as a function of time during the collision of two irregular shaped particles. The responses for different resolution particles are displayed as different coloured lines. The left figure shows the response for particles with a Young's modulus of E = 21.5 MPa and the right figure for stiffer particles with E = 464.0 MPa. The initial orientation of the polyhedrons is uniformly distributed.

6.3 Verification and convergence on a laboratory scale

The numerical experiments in the previous sections to a large extent isolated the geometry of the particles and how changes to the geometry affect the normal elastic response. Those experiments verified that polyhedrons with volumetric overlaps are well behaved with respect to small geometric changes. However, to verify the tangential components of the force model and to study the aggregated effects of using polyhedrons, this section studies the method on a larger scale. The experiments are scaled up to a laboratory scale based on a calibration rig experiment. The experimental setup can be seen in figure 6.8, and both the numerical and laboratory experiments were conducted as follows [45, 46]:
1. Particles were placed/generated in the hopper (the wedge shape at the top of figure 6.8) with a given size distribution and a specified total mass.
2. The trap doors below the hopper were opened.
3. Flow phase: particles flow down from the hopper onto the tilted plane and finally onto the bottom plate.
4. Steady-state phase: the particles form a steady pile at the bottom plate. The repose angle is measured when the kinetic energy has converged to zero.

Figure 6.8: The calibration rig used to measure the repose angle of rock material. To the left the real experimental setup is seen; to the right the rig modelled in the polyhedral DEM solver is shown. In the illustration of the DEM simulation the particles are colour coded according to their speed.

For the experiments, the repose angle was defined as the maximum angle of the pile. For each numerical experiment a 2-dimensional height map was created by dividing the particles into bins in the horizontal direction and measuring the maximum height in each bin. The repose angle was then defined as the maximum derivative of a sigmoid function fitted to the height map of the system,

\[ h(x) = \frac{c}{1 + \exp\{a(x - b)\}} \tag{6.1} \]

where a, b and c are constants fitted to each height map, h_i = h(x_i), i = 1, …, 20, using the function curve_fit from the python3 package scipy.optimize. Note that the maximum magnitude of the derivative of eq. 6.1 has the closed form |a|c/4, attained at x = b, so the measure follows directly from the fitted constants. Figure 6.9 shows an example of a height map together with its fitted sigmoid function.

Figure 6.9: An example height map from the rig simulation, together with the fitted sigmoid function. The repose angle is measured from the maximum derivative of the fitted sigmoid.
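A sketch of this post-processing step follows. The fit itself is done with scipy.optimize.curve_fit as stated above; the listing only illustrates the binning and how the repose angle follows from the fitted constants, where the arctangent conversion of the maximum slope into an angle in degrees is our reading of "maximum angle of the pile":

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Bin particle positions (x = horizontal, z = height above the bottom
    // plate) into a height map: the maximum particle height within each of
    // nBins slices of [x0, x1].
    std::vector<double> heightMap(const std::vector<double>& xs,
                                  const std::vector<double>& zs,
                                  double x0, double x1, int nBins = 20) {
        std::vector<double> h(nBins, 0.0);
        for (std::size_t i = 0; i < xs.size(); ++i) {
            int b = int(std::floor((xs[i] - x0) / (x1 - x0) * nBins));
            if (b < 0 || b >= nBins) continue;     // outside the pile region
            h[b] = std::max(h[b], zs[i]);
        }
        return h;
    }

    // The sigmoid h(x) = c / (1 + exp(a (x - b))) has extremal slope -a c / 4
    // at x = b, so the repose angle follows directly from the fitted constants.
    double reposeAngleDeg(double a, double c) {
        constexpr double kPi = 3.14159265358979;
        return std::atan(std::fabs(a) * c / 4.0) * 180.0 / kPi;
    }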
In an earlier phase of this work, the numerical experiments on the calibration rig were conducted without any tangential spring stiffness, instead purely based on the force model proposed by Nassauer et al. [9]. The simulation setup can be seen in table A.1. From these experiments it was clear that the desired behaviour of the system could not be achieved. Specifically, the particles did not reach a steady state with a non-zero repose angle. Instead, small vibrations in the particles caused the repose angle to slowly drift towards zero. While this could possibly be an issue with the parameter setup, a range of different friction parameters was investigated; even when the simulation otherwise showed highly unphysical behaviour due to high friction (e.g. the particles moving very slowly down the middle plate), there were still small vibrations in the pile of particles which led to drifts in the repose angle. It should be mentioned that no systematic studies of this were performed and that no similar behaviour was reported in [9].

Since the desired behaviour of the system could not be achieved without a tangential spring stiffness, a force model containing such a component was derived from the force models for spheres (see section 4). Apart from the parameters changed by the use of the different force model, the same setup as without a tangential spring was used for the simulations (see table A.2).

The response of the calibration rig simulation with respect to particle resolution is seen in figure 6.10. Both the averaged kinetic energy during the flow phase of the experiment and the repose angle of the steady-state phase are shown. There are significant fluctuations in the repose angle even for a fixed number of triangles. This can partly be due to measurement errors from the fitted sigmoid function. However, all fitted sigmoid functions and their corresponding height maps were manually inspected, and there was a noticeable difference also in the height maps. Nevertheless, to verify that the variance measured in the repose angle was not entirely due to measurement errors, the kinetic energy is also shown in figure 6.10. The measurement errors in the kinetic energy should be minimal, and yet significant fluctuations can be seen in the data. This indicates that fluctuations exist in the physics itself, which is expected since the particles are generated randomly from a given size distribution. However, noticeably larger fluctuations can be seen in the repose angle than in the kinetic energy. It is unclear whether this difference is due to larger measurement errors in the repose angle or is induced by the physics; this uncertainty suggests that for future verification and validation of DEM, better measurements than the repose angle may be sought.

To compensate for these fluctuations, a total of 11 simulations were performed per resolution. There is notable convergence for both the mean of the kinetic energy and the repose angle: at roughly 60 triangles per particle the mean of both measures stabilizes within the range of the standard deviations. For the numerical experiments a stiffness of E = 20 MPa was used. One can note that the average response for the single collision at a similar stiffness shows a comparable convergence, at least in a broad perspective, having roughly converged after 60 triangles just like the calibration rig.

Further, figure 6.10 also makes it clear that the coarsest particles, at 10-20 triangles, show significantly different values. This can intuitively be explained by the coarser particles having sharper edges. The sharper edges cause a higher degree of interlocking between the particles. The interlocking lowers the kinetic energy of the system during the flow phase due to a lower flow rate from the hopper, whereas for the repose angle the increased interlocking makes the pile of particles more stable and causes the increase in repose angle. The repose angle can also be compared to experimental values: the experimental range of repose angles is between 33 and 42 degrees, covering a range of different materials and particle shapes. From figure 6.10 it is clear that merely varying the particle resolution of one particle model almost covers the entire experimental range.
Figure 6.10: The kinetic energy during the flow phase of the calibration rig and the repose angle during the steady-state phase, as functions of the number of triangles per particle. Each dot shows a measurement from a separate experiment. The dotted line shows the mean of the measurements, and the grey area shows the range of the standard deviation. Noticeable fluctuations in both measures can be seen. This can be compared to the experimental range of 33 to 42 degrees in repose angle for a range of different materials and particle shapes.

Note that the experimental data were existing data which were not sampled for this purpose, hence no stronger verification could be made. Rather, the purpose of this verification was to consider whether the general behaviour of the experimental setup could be captured by the method, which figure 6.10 clearly shows to be the case. However, it should be highlighted that stronger verification and validation of the method should be one of the top priorities for further work.

Indirectly, figure 6.10 also demonstrates the need for DEM solvers to be highly efficient and capable of taking full advantage of modern hardware. Due to the high computational demands of DEM, the open literature suggests that typical studies are conducted on only a few data points, which figure 6.10 clearly shows can lead to significant misguidance. Hence, to further enhance the usefulness and predictability of the method, one of the main research goals of DEM should be