A GPU Polyhedral Discrete Element Method
Formulation and implementation of large scale simulations for irregular non-convex particles using novel GPU techniques

Master's thesis in Complex Adaptive Systems

ADAM BILOCK

© ADAM BILOCK, 2020.

Supervisor: Klas Jareteg, Fraunhofer-Chalmers Research Centre
Examiner: Anders Logg, Department of Mathematical Sciences

Master's Thesis 2020
Department of Mathematical Sciences
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Cover: One million Schönhardt polyhedral particles simulated with the presented method.

Typeset in LaTeX, template by David Frisk
Printed by Chalmers Reproservice
Gothenburg, Sweden 2020

Abstract

This thesis presents a Discrete Element Method (DEM) for simulating irregular shaped particles by a non-convex polyhedron representation. By using novel GPU techniques and an efficient HPC implementation, the presented method shows a level of throughput not previously attained with polyhedron particle representations in the open literature. Further, via such a representation the exact volumetric overlaps of the particles are resolved and, as a result, the method is robust and numerically stable with respect to geometric changes. The efficient and well-behaved method allows for significant progress in the study of granular materials, where previously mainly the inadequate particle representations of spheres or clumped spheres have been used.

The exact volumetric overlaps are resolved by a simplex representation which allows for the use of non-convex particles without any decomposition, aiding both the performance and the ease of use of the method. Further, care is given to attain efficient scaling of the method with respect to particle resolution. Such a property enables studies of higher resolution particles than previously shown in related work, and is a result of efficient filtering of polyhedron triangles in the narrow contact phase. In addition, other novel techniques, such as a GPU BVH implementation for the broad phase contact detection, also aid the performance and the flexibility of the proposed and implemented method.

The method is shown to be convergent with respect to particle resolution, both for individual particle collisions and for laboratory scale particle systems. The HPC implementation is proven to be highly efficient, where, for instance, a one second simulation of one million non-convex particles is completed within an hour on a single GPU. By the effective filtering of triangles in the narrow contact phase, near linear scaling can be achieved with regard to particle resolution.
Keywords: Discrete Element Method, Polyhedral intersection, GPU, HPC, Particulate systems, Non-convex particles.

Acknowledgements

Foremost I want to acknowledge my supervisor Klas Jareteg for always taking time for discussions. Your input on everything from physics and algorithms to report writing has been highly valuable and appreciated during this thesis work. I also want to acknowledge Johannes Quist, the project lead of the DEM research group at FCC, whose insights into DEM modelling have been of great value. The DEM research team at FCC has in general been of great support; I am excited to continue working with all of you. Further, I want to thank my examiner Anders Logg for giving valuable feedback and spending your time on this thesis. I want to acknowledge FCC in general, and in particular department head Fredrik Edelvik, for letting me do my thesis on an interesting and relevant topic. I also want to thank FCC for the great technical support which has enabled me to perform my thesis work uninterrupted even through the current exceptional circumstances. Finally, marking the end of my 5 years of studies, I want to thank my family and friends for supporting me throughout these years.

This work was carried out within the framework of the Centre for Additive Manufacturing - Metal (CAM2), supported in part by the InfraSweden2030 project DigiRoad, and funded in part by the Swedish Government Agency for Innovation Systems, VINNOVA.

Adam Bilock, Gothenburg, June 2020

Contents

Abbreviations
List of Figures
List of Tables
1 Introduction
  1.1 Project scope limitations and aim
  1.2 Research questions
  1.3 Outline of thesis
2 General-Purpose Computing on Graphics Processing Units (GPGPU)
  2.1 Hardware architecture
  2.2 Programming model
  2.3 Implementation details
3 Overview of the Discrete Element Method
  3.1 Limitations
  3.2 Discrete element representations
  3.3 World geometries
  3.4 Integration
4 Contact Forces for Irregular Shaped Particles
  4.1 HM+D model for spheres
  4.2 HM+D model for irregular shaped particles
5 Polyhedral DEM Contact Detection
  5.1 Polyhedron representations
  5.2 Intersection nodes
  5.3 Intersection mass properties
  5.4 Multiple contact error
  5.5 Broad phase contact detection
  5.6 World geometry contact detection
6 Physical Verification and Convergence of Method
  6.1 Verification and convergence for spherical polyhedrons
  6.2 Verification and convergence for irregular shaped particles
  6.3 Verification and convergence on a laboratory scale
7 Performance
  7.1 Overall performance - gravity packing simulation
  7.2 Scaling with particle resolution
  7.3 Memory consumption
8 Conclusion
  8.1 Future work
Bibliography
A Simulation parameters
  A.1 Calibration rig
  A.2 Gravity packing

Abbreviations

BFS Breadth First Search
BVH Bounding Volume Hierarchy
CUDA Compute Unified Device Architecture
DEM Discrete Element Method
DFS Depth First Search
FMA Fused Multiply Add
FPGA Field Programmable Gate Array
GPGPU General Purpose computing on Graphics Processing Units
GPU Graphics Processing Unit
HM+D Hertz-Mindlin-Deresiewicz
ILP Instruction Level Parallelism
IPS Industrial Path Solutions
NDEM Nonsmooth Discrete Element Method
SM Streaming Multiprocessor
SP Streaming Processor

List of Figures

2.1 The Nvidia Volta streaming multiprocessor
3.1 Control flow of the discrete element method
3.2 The three most common particle representations in DEM
3.3 Overview of common particle representations in DEM
3.4 Illustration of the fallacy of multisphere DEM
5.1 Collision of two polyhedral particles
5.2 Polyhedral intersection nodes
5.3 Filtering of polyhedral particle triangles
5.4 Polyhedral intersection volume computation with cusp model
5.5 A bounding volume hierarchy
6.1 Spherical polyhedrons used in this study
6.2 Average kinetic energy response for spherical particles
6.3 Standard deviation in kinetic energy response for spherical particles
6.4 Convergence of kinetic energy response for spheres
6.5 Irregular shaped polyhedrons used in this study
6.6 The average normalized kinetic energy response for an irregular shaped particle
6.7 The variance in the normalized kinetic energy for an irregular shaped particle
6.8 Experimental calibration rig
6.9 Height map from rig simulation
6.10 Convergence of calibration rig simulation
7.1 The gravity packing simulation
7.2 The Schönhardt polyhedron used in this study
7.3 Simulation time of solvers for gravity packing
7.4 Number of collision pairs during gravity packing simulation
7.5 Scaling of solvers against number of pairs
7.6 Scaling of solver for irregular shaped particles
7.7 Scaling for calibration rig simulation

List of Tables

7.1 Memory consumption for gravity packing
A.1 Parameter values used for the calibration rig simulations without tangential spring
A.2 Parameter values used for the calibration rig simulations with tangential spring
A.3 Parameter values used for the gravity packing simulations

1 Introduction

There is a large demand for understanding the nature of granular materials from both science and industry. An increasingly popular method to study these systems is the Discrete Element Method (DEM). By simulating the particles as distinct elements, DEM can achieve a level of detail and generality not present in alternative methods for granular materials. This is opposed to continuum methods, where the system is instead treated as a continuous material. DEM first originated in 1979 from implementations by Cundall and Strack [1], but due to the computational burden of resolving each distinct particle it is only in recent years that the method has become commonly adopted.

To lower the computational cost, a common approach in DEM is to approximate all particles as spheres [2]. However, it is evident that spheres cannot represent the often varying particle shapes present in granular materials. The most popular remedy for this problem is to use compounds of spheres (from here on referred to as multispheres) [3]. The simplicity of contact detection for spheres is preserved with multispheres and, in addition, the ability to capture the shape of granular material is improved in comparison to spheres [4]. However, a deficiency of the multisphere approach is the lack of representation of sharp edges, often present in real particles. More importantly, all the force models in use for multispheres in the open literature are fundamentally flawed. This is because the exact intersection of two multispheres cannot be efficiently resolved and, consequently, the acting force on the particles is naively accumulated from all compounded spheres [5]. This deficiency of the force model for multispheres results in non-converging behaviour of the method, where on one hand more compound spheres are needed for an accurate shape representation and on the other hand more compound spheres make the force model ever more inaccurate [6].

Hence, to attain an accurate representation of granular material, a better model of the particles is needed. One candidate is polyhedrons, which by their flexible representation can be made to accurately model most realistic shapes. Furthermore, such a formulation can also, in contrast to purely algebraic models, capture the sharp edges of e.g. rock material often studied with DEM. However, there is a significant computational burden to polyhedral DEM.
To lower the computational burden, some early efforts [7] simplified the problem by using the indentation depth in a feature-by-feature manner, meaning that the indentation depth is computed for each face-face, edge-face and vertex-face contact of the polyhedrons. With this approach the same issue as with the multispheres appears, i.e. the force model is not decoupled from the geometric modelling [6], and the only advantage over the multispheres is the more flexible geometry of polyhedrons. However, the problem of a non-converging method can be resolved with polyhedrons, namely by ensuring that the exact volumetric overlap of the particles is computed. In "Polyhedral particles for the discrete element method" by Nassauer et al. [8] this approach was explored by presenting a general framework for the method. This was followed by another article [9], where a more advanced force model was derived from the Hertz contact law and shown to agree well with FEM simulations. However, only systems of at most 500 particles were studied. In the work of Govender et al. [10, 11, 12, 13] it was shown how a GPU polyhedral DEM utilizing volumetric overlaps can be effectively applied to large scale industrial usage. In their work, for instance, a one second simulation of one million non-convex particles was simulated overnight on a single desktop computer.

Besides the open literature, efforts in the realm of polyhedral DEM have also been made in commercially available DEM software, most notably the software Rocky DEM, which supports multi-GPU simulations and a large set of features. However, their work is mostly not relevant for this thesis, since neither their computational method nor other demonstrations of their accuracy are published, and it is thus impossible to make any comparisons. Nevertheless, according to Govender et al. [14], Rocky DEM neither resolves the exact volumetric overlaps nor attains the same level of throughput as Govender et al. have displayed, but no further details were given about the simulation case, and thus the results are difficult to relate to in a scientific context. Recently Becker 3D introduced a polyhedral DEM GPU solver, but again the absence of reports on their method, accuracy or other capabilities likewise makes a direct comparison impossible.

Since 2016 the Fraunhofer-Chalmers Centre (FCC) has developed a DEM framework. The current framework consists of a state-of-the-art GPU implementation of DEM for spherical particles and a CPU implementation of multisphere DEM. The spherical GPU solver currently allows for more than 50 million polydisperse spherical particles. Furthermore, the performance of the CPU implementation of the multisphere solver has proven to be a limiting factor for industrial scale studies of e.g. infrastructure and rock material handling simulations. In addition, the aforementioned issues [6] regarding the force models of multisphere DEM have also been exposed in research work conducted by the FCC DEM group. As a consequence of the performance and model limitations, the FCC DEM group has a need for a performant GPU DEM solver that can represent non-spherical particles, and also a need to evaluate the polyhedral representation of particles. With the developed competence in HPC based on GPUs, a polyhedron DEM framework is a feasible next step in state-of-the-art software for particulate systems.

1.1 Project scope limitations and aim

The project work was conducted at FCC in the DEM research group.
The purpose of the thesis work was to implement and verify a state-of-the-art polyhedron DEM GPU code with performance equivalent to or better than what has been reported in the literature. To achieve this goal, the project work was delimited in some aspects:

• The project was implemented directly in the existing DEM framework, minimizing the efforts required for setting up the software infrastructure.
• The polyhedron particles were provided at the start of the work and no further studies were conducted on the triangulation process of the model particles themselves.
• The force model was not the primary concern of the thesis and tentatively models from the literature were to be used.

1.2 Research questions

In this work several research questions revolving around the algorithmic aspects are to be answered:

• What is the current state of the art in the open literature with regard to DEM simulations of polyhedron particles?
• Compared to the spherical GPU solver at FCC: what are the performance characteristics of the polyhedron code?
• Can a linear scaling with the number of particles be achieved on a state-of-the-art GPU?
• How does the computational time of the polyhedral code scale with the resolution of the triangulation of the particles?

Furthermore, research questions regarding the use of the polyhedral particle representation in DEM were formulated:

• Can convergence in the physical behaviour of the particles be shown with regard to the particle resolution?
• How finely resolved do the particles need to be to achieve convergence in terms of their physical behaviour?
• Can the method capture the range of repose angles found in experiments in a calibration rig?
• Finally, and more generally: what are the benefits and disadvantages of a polyhedral particle representation?

1.3 Outline of thesis

Section 2 introduces GPU programming, the implications of which will determine many later decisions in the used methods. Section 3 gives an overview of DEM. The following sections then go into the details of the required components of polyhedral DEM. First the force models are introduced in section 4, followed by the contact detection detailed in section 5. The remaining sections analyze the method: first the method is verified in section 6, whereafter the performance of the solver is analyzed in section 7. Finally the thesis is summarized in section 8.

2 General-Purpose Computing on Graphics Processing Units (GPGPU)

With the advent of the power wall [15], meaning that the clock rate of single core architectures could not be further increased due to thermal constraints, progress in computational power has shifted towards parallel computation on a single chip. At the forefront of this shift lies general-purpose computing on graphics processing units (GPGPU). With the initial purpose of rendering 3D graphics, the hardware in GPUs has, due to the parallel nature of rendering, been developed and optimized for highly parallel computation. The underlying computational units of the GPU follow the stream processing paradigm, which by its constrained computational model can attain a much higher throughput per watt than the traditional CPU architecture. Some aspects of both the hardware architecture and the corresponding SIMT (Single Instruction Multiple Threads) programming model are important for the algorithmic choices of this thesis and, thus, such aspects are carefully described in this section.
2.1 Hardware architecture

The GPU is a so called co-processor and hence its execution is entirely separated from the execution of the host (meaning the CPU). In practice this means that the GPU, in a broad perspective, requires largely the same elements as the CPU: it has its own global memory, it has a similar cache hierarchy, and the execution of the underlying computational units is completely organized by the GPU itself. While the hierarchy of the GPU is similar to that of the CPU, the components of this hierarchy have largely been altered to accommodate the massive parallelism on offer [15].

Starting at the lowest level of the architecture, the underlying computational units of the GPU follow the aforementioned stream processing paradigm (these computational units are often referred to as Streaming Processors, SPs). In the Nvidia chips an SP is represented by 32 floating point units and 32 integer units. By strictly following the stream processing paradigm, all 32 units must execute the same instruction. Through this restriction, a much higher number of computational units can reside on a single chip as opposed to a CPU. The impact of this limitation on the programming model will be further considered later.

Each SP merely consists of the actual arithmetic logic units (ALUs); to actually execute instructions and interact with memory, each SP resides within a Streaming Multiprocessor (SM). These units organize the computation of the SPs, where each SM consists of many SPs. Figure 2.1 shows the SM of the Nvidia Volta architecture [16]. In this architecture the different ALUs of the SP have been separated such that each SP consists of one FP32 unit, one integer unit and one FP64 unit. Further, each SM has its own L1 cache, for both instructions and data, and importantly it also has access to the so called shared memory. The shared memory has access times comparable to the L1 cache, but can be manually controlled by the programmer. Moreover, in the Volta architecture [16], and also the succeeding Turing architecture [17], each SM has access to 8 tensor cores. These units can execute 4×4 matrix FMA (Fused Multiply Add) operations, albeit only at half precision (meaning a 16-bit representation). Finally, each GPU consists of many SMs (e.g. the Volta architecture has 80 SMs per GPU), and at this level there is also an L2 cache which interacts with the global memory.

Figure 2.1: The Nvidia Volta streaming multiprocessor (SM). Most hardware required for computation resides within the SM. Note that each marked computational unit in the figure, e.g. the marked FP64 units, is here a streaming processor and consists of 32 ALUs in total. Also noteworthy are the tensor cores, which take up nearly half of the space for computational units and, by their restricted execution of half-precision 4×4 matrix FMA (Fused Multiply Add) operations, cannot be readily utilized for many applications. [16]

An important limitation of the GPU is that the synchronization of the SMs is not especially effective; often the execution of the different SMs is instead synchronized at the host, and the usage of, for example, global atomics should be avoided.
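As a minimal, hypothetical illustration of this lockstep execution (not taken from the thesis solver): in the first kernel below, the threads of a warp whose inputs fall on different sides of the branch are serialized over both paths; the second, branchless formulation performs the same computation with uniform control flow.

__global__ void clampedSqrtDivergent(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (x[i] > 0.0f)          // data-dependent branch: the warp may diverge here
        y[i] = sqrtf(x[i]);
    else
        y[i] = 0.0f;
}

__global__ void clampedSqrtUniform(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    y[i] = sqrtf(fmaxf(x[i], 0.0f));  // same result, no divergent branch
}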
A note regarding the GPU hardware is that, in comparison with the CPU, the amount of memory (throughout the hierarchy) available per thread is significantly lower, which in practice means that significant care is required to ensure that consecutive threads access memory linearly. Finally, since the GPU is a co-processor, care must be taken to avoid transferring too much data between the GPU and the host, which for the implementation of this thesis is avoided by having all key computational steps performed by the GPU.

2.2 Programming model

In this work the CUDA programming model [18] is used as an extension of the programming language C++. Some aspects of this programming model are of importance for the algorithmic implementations. The basic model of CUDA consists of a host program that launches GPU kernels. A kernel consists of many threads executing the same program. The host execution primarily serves as a global synchronization point for the GPU and manages the control flow of the application. This role of the host is partly induced by the lack of effective global synchronization between the different SMs, as previously discussed, but also by the fact that most control flow tasks are single threaded and thus more suitable for the CPU.

So far the programming model follows the stream processing paradigm. However, CUDA also exposes a lower granularity of the thread organization which maps to the aforementioned hardware hierarchy. Foremost, the threads are divided into blocks. Each thread block can only be executed on a single SM, where the threads in a block can be synchronized within the kernel and all have access to the same shared memory. Utilizing the shared memory is essential to attain effective kernels; however, it is severely limited in size (between 48-96 kB [16, 17]). Furthermore, each thread block consists of several thread warps, where a thread warp resides in a single SP. The threads in a warp have faster synchronization and can to a certain extent communicate by and share registers. A significant limitation of the programming model is that all threads in a warp need to execute the same instruction; this is imposed by the underlying computational units following the stream processing paradigm. The performance can thus degrade significantly if the execution of the threads in a single warp diverges. This poses a significant challenge for the programmer to ensure that the control flow within the kernels minimizes the number of branches. In practice this means that simpler, more straightforward algorithms should often be preferred over what would typically be the most efficient solution on the CPU [15]. It should be mentioned that this limitation of thread divergence has been slightly relaxed with the newer architectures, Volta [16] and Turing [17], but diverging control flow should still be minimized.

2.3 Implementation details

In the implementation of the methods presented in this thesis, the impact of low level technical details on the GPU has been shown to give significant changes in performance. Due to the large scope of the thesis, not all such details can be included; rather, this section will highlight some key programming patterns which have enabled increases in performance.

A recurring pattern is to always minimize the data usage. This is two-fold: it is important both to minimize the actual data footprint and to reduce the data flow of the application. This affects both the choice of methods and implementation details such as effectively packing the data, using custom memory allocators and ensuring that the data gets properly cached in the memory hierarchy during computations. This could at times be taken to the extreme, where it was beneficial to reorder data before key computational kernels in order to improve the cache hit rate during the actual computation. Similarly, large performance benefits could also be achieved by reordering data to minimize the aforementioned thread divergence.
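A hypothetical sketch of two of the recurring themes above (the names are illustrative, not the thesis data layout): a structure-of-arrays layout, so that consecutive threads load consecutive elements in one coalesced transaction, and a small index into a few shared particle models instead of per-particle geometry, keeping the data footprint low.

#include <cstdint>

struct ParticleStateSoA {
    float* px; float* py; float* pz;   // positions, one array per component
    float* vx; float* vy; float* vz;   // velocities
    std::uint16_t* model;              // index into the few shared particle models
};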
The generated code from the compiler is often non-optimal for CUDA [19, 20]; this is a result both of choices in the design of the C++ programming language and, in some cases, of limited compiler support for CUDA. Thus, for many of the key computational kernels in the implementation, the code was analyzed at the assembly instruction level. Such an analysis often disclosed how small, almost trivial, changes to the C++ code could lead to significant changes in the generated machine code and ultimately in the performance of the solver. One such example is aliasing [21], where the C++ language does not guarantee that two pointers of the same type do not point to the same memory location. If aliasing is not remedied it can in certain instances lead to significantly higher memory throughput usage, since there is no guarantee that the underlying data has not changed, and thus data has to be reloaded from global memory at each usage. Another example is that small changes in the C++ code could hinder the compiler from vectorizing loops, which results in worse instruction level parallelism (ILP), in turn also significantly degrading the performance.
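An illustrative kernel for the aliasing issue just described: with plain pointers the compiler must assume that writing y[i] may change x[i], forcing reloads from memory, whereas the standard CUDA __restrict__ qualifier promises non-overlapping arrays and lets values stay in registers. This is a generic example, not a kernel from the thesis code.

__global__ void axpy(const float* __restrict__ x,
                     float* __restrict__ y, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}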
To quickly iterate the fine tuning of the low level details, most key computational kernels were micro-benchmarked, meaning that the performance of single kernels was measured and the process automated in the build system. Moreover, profiling tools such as NVIDIA Nsight Compute/Systems [22, 23] were also frequently used, which allowed for profiling both at a system level (Nsight Systems) and at an assembly instruction level (Nsight Compute).

3 Overview of the Discrete Element Method

In DEM, the particles are simulated as rigid bodies where the state of each particle is individually tracked and evolves over time due to various interactions with e.g. other particles or materials. The dynamics of the particles are determined by Newton's second law, namely

m_i d²x_i/dt² = F_i,    I_i d²Θ_i/dt² = T_i    (3.1)

where m_i is the mass, x_i is the position of the centre of mass, F_i is the acting force, I_i is the inertia, Θ_i is the orientation and T_i is the acting torque, respectively, of particle i. Commonly, eq. 3.1 is integrated by explicit methods. However, implicit versions of DEM also exist, often referred to as nonsmooth DEM (NDEM) [24], which correct for the fact that the forces and torques in eq. 3.1 cannot be expressed as a direct mapping from the particle state in the previous time step. NDEM can for certain applications compensate for the increased computational burden induced by an implicit method by allowing for the use of larger time steps. However, for most applications explicit formulations are preferred, since then eq. 3.1 can be solved for each particle independently, whereas for the implicit counterpart the resulting matrix systems from eq. 3.1 are significantly more involved to solve. Hence, in the remainder of this work only explicit DEM is considered.

Figure 3.1: Overview of the control flow of the discrete element method (Input → Contact detection → Contact forces → State update → Post-Process). Given input data the method iterates each time step by first resolving the contacts of the particles, then evaluating the contact forces, and finally updating the state of the particles according to eq. 3.1.

The control flow of a time step in an explicit formulation of DEM can be seen in figure 3.1. Here, contact detection refers to finding the interacting particles and, if present, also the interactions of the particles with other bodies (e.g. static world geometries). Note that each particle contact pair is resolved independently. The contact detection is typically performed in several phases: a broad phase, a narrow phase and finally resolving the geometric overlaps. The broad phase contact detection often relies on bounding volumes of the particles (e.g. axis-aligned boxes) and aims to efficiently, but approximately, filter out all potential pairs. For the case of complex particle shapes, a narrow phase contact detection is then often performed, which further filters the potential pairs and in some cases also the individual components of the complex shapes (e.g. triangles or spheres). For simple spheres the narrow phase is often not required. Finally, the actual overlap of the particles is resolved; the exact details of this most detailed phase depend on the particle representation and the force model used. The contact detection methods of this work are further discussed in section 5.

With the contacts resolved, the next step consists of computing the interacting forces to give the right hand side of eq. 3.1. In this work only interactions resulting from direct geometric contact are considered, i.e. the particles have no long-range forces. The formulation of the contact forces used in this work can be seen in section 4. Finally, the state of the particles is updated by integrating eq. 3.1, which is further detailed in section 3.4. The simulation is then advanced by iterating over the time steps.

3.1 Limitations

With the explicit DEM formulation there are certain key assumptions and limitations to consider. One of the main assumptions is that the particle geometry is assumed to be rigid, meaning that there is no deformation or plasticity in the particles. Still, the rigid geometries of the particles are allowed to overlap, but this is only valid if the overlaps of the particles are small enough that effects from deformation or other plasticity are small. This limitation has the negative effect of restricting the method to small time steps, such that these small overlaps are attained at each step [2].

3.2 Discrete element representations

As discussed in the introduction (see section 1) there are several different representations of the discrete elements in DEM. The most common shape is the simple sphere, largely due to its computational advantages. However, as many studies have shown, the effect of the shape of the particles on the systems can be profound [12, 4]. Hence, several methods to represent irregularly shaped particles have been developed. The most common method is the multisphere representation, where each particle is represented by several clumped spheres. Moreover, there are several attempts at different analytical representations, e.g. ellipsoids. Further, and most relevant for this thesis, there are the polyhedral representations.
Figure 3.2: The three most common particle representations in DEM: the simple sphere, multispheres and polyhedrons (left to right).

Figure 3.2 shows the three most common particle representations in DEM: a simple sphere, a multisphere and a polyhedron. Note that for all approaches there are typically only a few particle models per population, where often only the scale of each model can be changed for each particle; this is due to the high memory requirements of having unique models for each particle. Figure 3.3 gives an overview of the associated computational cost and physical fidelity of the different particle representations.

Figure 3.3: Overview of the common particle representations in DEM, placed along axes of physical fidelity and computational cost. From left (bottom) to right (top): simple sphere, multisphere, analytical and polyhedral particle representations.

Spheres have commonly been used in DEM, largely due to the computational advantage of such a simple shape. However, it is evident that spheres cannot represent the irregular shaped particles often found in granular materials. While this can to some extent be compensated for by using non-uniform rolling friction of the spheres [4], this is accompanied by significant tuning of the parameters, which significantly reduces the usability, and ultimately the validity, of the method. The scale of the current state-of-the-art spherical DEM solvers is in the order of tens of millions of polydisperse spheres [25]. It has also been displayed in the literature how the usage of GPUs can significantly speed up the simulation time, where speed-up ratios of 60-100 times over single threaded CPU implementations have been shown [25].

Multispheres consist of several clumped sub-spheres, as displayed in figure 3.2. The sub-spheres of one multisphere typically remain fixed in the particle body frame. The ability to model irregularly shaped particles is improved in comparison to spheres, and it has been shown how the clumped representation can to some extent give predictive results with respect to particle shape [4]. The ease of contact detection also remains with multispheres, where only slight modifications are required compared to spherical DEM. However, a significant issue arises from its geometric modelling in the force model: the contact detection of each sub-sphere is completely independent of the other sub-spheres in the particle, meaning that each sub-sphere pair of one particle contact is modelled as a separate contact force. This issue is illustrated in figure 3.4, where one particle contact is represented by 8 different sphere-sphere contacts; all these 8 sphere-sphere contacts will naively be accumulated and give a significantly higher contact force compared to if the actual overlap volume could be resolved. It has been shown in the literature how this severely distorts the contact forces for multisphere particles [5]. Now, since the only approach to improve the geometric modelling of the multispheres is to increase the number of sub-spheres, this leads to a non-converging method with respect to particle resolution [6].
This issue can be illustrated by, for instance, removing the two smallest spheres in figure 3.4: the resulting contact force of the collision in figure 3.4 would then be significantly lower.

Polyhedrons have largely been avoided in DEM due to the high computational demand of resolving the contacts. To reduce the computational cost, the first efforts with regard to polyhedral DEM computed contact forces in a feature-by-feature manner, meaning that each face-face, face-edge and edge-edge contact was independently resolved. Notable work utilizing this approach is found in Govender et al. [7, 26], where primarily the preeminent advantage of utilizing the GPU was also shown for polyhedral DEM, with millions of particles simulated on a single GPU. However, this contact detection scheme, similarly to the multisphere approach, can severely distort the contact forces, and again the force model is not decoupled from the geometric modelling [6]. This problem can be avoided for polyhedrons by instead resolving the exact volumetric overlaps of the particles. Notable work using this contact detection scheme is seen in Nassauer et al. [8], where a general framework for the method was developed, and in [9], where a more advanced force model was derived which attempts to take the contact curvature into account, resulting in good agreement with FEM simulations. However, in [8, 9] the system sizes are restricted to 500 particles and the solver was not adapted for non-convex particles. Instead, Govender et al. have in recent years [11, 12, 13] adapted the initial solver from [7, 26] to resolve the exact volumetric overlaps of the particles. In their work, non-convex particles could also be simulated by decomposing the particles into convex parts. Govender et al. can achieve this physical fidelity while still being capable of simulating large system sizes; for instance, a one second simulation of one million non-convex particles could be simulated overnight. It is noteworthy that in the work of Govender et al. a slightly simpler elastic force model was used than the one proposed by Nassauer et al., which does not correct the force model for the contact curvature to the same extent.

Figure 3.4: Collision of two objects. The actual objects are represented by the dotted lines; the spheres with the same colour as the lines are part of the multisphere representation of that object. For this case, the multisphere representation would give a very strong contact force, since all individual sphere-sphere contacts between the blue and red spheres will be counted as individual forces.

3.3 World geometries

For DEM to be usable, the particles also need to interact with a surrounding environment. Rigid world geometries are the most common such environment. Several different methods exist to model the world geometries: purely as analytical surfaces (e.g. planes and cylinders), as several rigid clumped particles, or as triangle meshes. The former alternative, with purely analytical surfaces, can be the most straightforward when only a few different world geometries are considered, but this approach also severely restricts the usage of the method. In contrast, more general approaches such as interactions with CAD surfaces are significantly more involved.
The second approach, rigid clumped particles, is a popular method for polyhedral DEM [8, 9], since the contact detection for the world geometries and the particle interactions are then completely equivalent. For certain instances the conversion from triangle mesh to clumped polyhedron particles can be straightforward; however, for complicated surfaces the conversion can be much more complicated, and hence this option is not especially user friendly. Only recently has the case of world geometries represented by triangle meshes been considered in the literature on polyhedral DEM [13]. Using a triangle mesh to represent the world geometry is the most flexible approach, since such meshes can both be created from point clouds of scanned objects and be generated as computational meshes from analytical surfaces. For the case of polyhedral DEM, the contact forces between particle and triangle mesh can remain the same as for particle-particle interactions, see section 4. Further, only slight changes are required for the contact detection in comparison with the particle interactions, which is further discussed in section 5.6.

3.4 Integration

In this work, explicit integration is used for solving the ODE system of eq. 3.1, which as previously mentioned is the standard approach in DEM. A common method for the translational integration, first proposed by Cundall and Strack in 1979 [1], is the Velocity Verlet algorithm. The translational integration then proceeds as follows:

1. Update velocity: v_i(t + ½Δt) = v_i(t) + ½ a_i(t) Δt
2. Update position: x_i(t + Δt) = x_i(t) + v_i(t + ½Δt) Δt
3. Compute force f_i based on x_i(t + Δt) and v_i(t + ½Δt); update acceleration: a_i(t + Δt) = f_i / m_i
4. Update velocity: v_i(t + Δt) = v_i(t + ½Δt) + ½ a_i(t + Δt) Δt

where i refers to particle i = 1, ..., n, as sketched in code below. Note that the force computation is performed at positions and velocities that are a half-step apart, which for the viscous forces of DEM is deemed acceptable through empirical evidence [27]. For the orientation, explicit forward Euler integration is used.
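A minimal, single-particle sketch of the translational Velocity Verlet step above (one Cartesian component only); the force callback is a hypothetical stand-in for the contact models of section 4, and in the actual solver this update is of course performed per particle in parallel on the GPU.

void velocity_verlet_step(double& x, double& v, double& a, double m,
                          double dt, double (*force)(double x, double v)) {
    v += 0.5 * a * dt;      // step 1: v(t + dt/2)
    x += v * dt;            // step 2: x(t + dt)
    a = force(x, v) / m;    // step 3: force at x(t + dt) and v(t + dt/2)
    v += 0.5 * a * dt;      // step 4: v(t + dt)
}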
4 Contact Forces for Irregular Shaped Particles

This section presents the contact forces used in this study, i.e. the expressions needed to evaluate the right hand side of eq. 3.1. The model is derived from the well studied Hertz-Mindlin-Deresiewicz (HM+D) model for spheres. Hence, the spherical model is first described, and it is then shown how it can be adapted to arbitrary shapes, as first shown in [9] for the regular Hertz contact law for spheres. Note that in an earlier phase of this work the model proposed by [9] was intended to be used. However, as further detailed in section 6.3, the desired behaviour of particle systems could not be achieved with this model; hence in section 4.2 an additional tangential spring stiffness is introduced to address this issue.

4.1 HM+D model for spheres

The HM+D model has a normal elastic force which corresponds to the Hertz contact law for spheres [28],

F_{n,e} = (4/3) E* R^{1/2} δ^{3/2}    (4.1)

where E* is the effective modulus, R is the effective radius of the spheres and δ is the indentation depth. The effective modulus is defined as

1/E* = (1 − ν₁²)/E₁ + (1 − ν₂²)/E₂    (4.2)

where ν_i is the Poisson's ratio and E_i the Young's modulus of each sphere i = 1, 2. The effective radius is defined as

1/R = 1/R₁ + 1/R₂    (4.3)

where R_i is the radius of sphere i = 1, 2. Note how the intersection of a sphere and a plane can be treated by letting R₂ → ∞, whereby R = R₁. Finally, the indentation depth for sphere-sphere contacts can be calculated as

δ = R₁ + R₂ − |c₂ − c₁|  if |c₂ − c₁| < R₁ + R₂,  δ = 0 otherwise    (4.4)

where c_i is the centre of each sphere i = 1, 2. Further, the scalar dissipative normal force is given by

F_{n,d} = 2γ √(m* k_n) v_n    (4.5)

where γ is a damping coefficient, m* is the effective mass of the particles, k_n is the normal spring stiffness and v_n is the relative velocity in the normal direction. The effective mass is given by

m* = m₁ m₂ / (m₁ + m₂)    (4.6)

and the normal spring stiffness by

k_n = 2E* √(Rδ).    (4.7)

In the HM+D model a tangential spring force is derived from the no-slip theory of Mindlin [29]. The scalar component of this force is given by

F^n_{t,e} = F^{n−1}_{t,e} + k^n_t Δδ_t    if ΔF_n ≥ 0    (4.8)
F^n_{t,e} = F^{n−1}_{t,e} (k^n_t / k^{n−1}_t) + k^n_t Δδ_t    otherwise    (4.9)

where k_t is the tangential spring stiffness, n refers to the current time step and Δδ_t is the tangential displacement at the current time step. The tangential spring stiffness is given by

k_t = 8G* √(Rδ),    (4.10)

where G* is the effective shear modulus, given by

1/G* = (1 − ν₁²)/G₁ + (1 − ν₂²)/G₂.    (4.11)

The tangential force also has a dissipative component, giving the final expression for the scalar tangential force as

F_t = F^n_{t,e} + 2γ √(m* k_t) v_t    if F_t < μF_n    (4.12)
F_t = μF_n    otherwise    (4.13)

where μ is a friction coefficient.

4.2 HM+D model for irregular shaped particles

Now, following the work of Nassauer et al. [9], a contact model for arbitrarily shaped particles can be derived. This is done by noting that the volume of a sphere-sphere intersection can be approximated as

V ≈ πδ²R    (4.14)

if one assumes δ ≪ R. Similarly, for a sphere-plane intersection the volume is given by

V = (1/3) πδ² (3R − δ)    (4.15)

which again can be approximated according to equation 4.14 when δ ≪ R. The assumption δ ≪ R, i.e. small overlaps relative to the particle size, is not a restrictive assumption, since as mentioned in section 3 this is already assumed in DEM modelling. Inserting this expression into the HM+D model for spheres, the following scalar normal force for arbitrarily shaped particles is obtained, based on volume instead of radius:

F_n = (4/(3√π)) E* √(Vδ) + 2γ √(m* k_n) v_n,    (4.16)

where the normal spring stiffness is now defined as

k_n = 2E* √(V/(πδ)).    (4.17)

Further, the expressions in eq. 4.8 and eq. 4.12 for the tangential force remain the same, but for an irregular shaped particle the tangential spring stiffness can instead be defined as

k_t = 8G* √(V/(πδ)).    (4.18)

Again following the work of Nassauer et al. [9], the direction of the normal force is calculated by integrating over the surface normals of one of the particles,

n = ∫_{A₁} n_s ds / |∫_{A₁} n_s ds| = Σ_i A₁ⁱ n_sⁱ / Σ_i A₁ⁱ    (4.19)

where A₁ = Σ_i A₁ⁱ is the area belonging to one of the particles in the intersection polyhedron and n_s is the surface normal. Further, the indentation depth δ is defined as the maximum cross sectional length of the intersection in the normal direction. Finally, the contact point of the interaction is defined as the centre of mass of the volumetric overlap of the particles. Note that the same contact force model can be used for particle interactions with the triangle mesh world geometry. However, it requires the restriction that the world geometry has a closed surface and a defined interior; note that, as opposed to the polyhedral particles, the interior can still be an open sub-space of R³.
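A hedged sketch of the scalar normal force of eqs. 4.16-4.17, assuming the overlap volume V, indentation depth delta and normal relative velocity vn have already been resolved by the contact detection of section 5; the names Estar, mstar and gamma stand in for E*, m* and the damping coefficient, and are illustrative only.

#include <cmath>

double normal_force(double V, double delta, double vn,
                    double Estar, double mstar, double gamma) {
    const double pi = 3.14159265358979323846;
    const double Fn_e = 4.0 / (3.0 * std::sqrt(pi))
                      * Estar * std::sqrt(V * delta);          // eq. 4.16, elastic part
    const double kn   = 2.0 * Estar * std::sqrt(V / (pi * delta)); // eq. 4.17
    const double Fn_d = 2.0 * gamma * std::sqrt(mstar * kn) * vn;  // dissipative part
    return Fn_e + Fn_d;
}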
With the contact forces well defined, the right hand side of eq. 3.1 can be determined. Furthermore, as the required properties of each particle overlap are now known, it remains to compute these properties for polyhedrons, which is considered in the next section (section 5).

5 Polyhedral DEM Contact Detection

Figure 5.1 shows an intersection of two particles. The contact forces derived in section 4 need several properties of the overlapping volume of each such contact pair. Namely, the volume V of the overlap needs to be computed (the marked grey area); further, to attain the direction of the force, n, eq. 4.19 needs to be evaluated, which is equivalent to integrating the normals along the red triangles marked in figure 5.1. Moreover, to get the indentation depth δ, the extent of the overlapping volume needs to be evaluated along the direction n. Also the centre of mass, c, of the overlapping volume needs to be evaluated to give a contact point of the interaction. In the following, the term mass properties will be used to refer to all the above properties of the volumetric overlap. This section presents the required theory and the method used for finding such properties for each particle contact. This section also considers the broad phase contact detection, i.e. how to effectively, but approximately, filter out all contact pairs, both particle-particle and particle-triangle mesh contacts.

Figure 5.1: Collision of two polyhedral particles. The intersection volume is marked in grey. The surface integrated to attain the normal force direction is marked in red.

5.1 Polyhedron representations

A polyhedron is defined as a 3-dimensional solid composed of several polygons [30]. Typically, the polygons are connected at the edges. When using volumetric overlaps for DEM particles, an essential restriction is that the interior of the polyhedron represents a closed subspace of R³. The convex polyhedron is an important special case: it can be defined as the intersection of several half-spaces, i.e. the bounded solutions x to the linear inequality system

Ax ≤ b    (5.1)

where A ∈ R^{m×3} and b ∈ R^m. For two convex polyhedrons represented by A₁, b₁ and A₂, b₂, the intersection is defined as the bounded solutions x to the linear inequality system

A′x ≤ b′    (5.2)

where A′ = [A₁; A₂] ∈ R^{(n+m)×3} and b′ = [b₁; b₂] ∈ R^{n+m}. Note how this implies that the solution set x is also convex. These properties of convex polyhedrons have large implications when intersection algorithms are considered in the next section (5.2).

5.2 Intersection nodes

The intersection nodes of the volumetric overlaps need to be computed in order to acquire the desired mass properties of each polyhedral intersection (see section 5.3 for how the mass properties are computed from the intersection nodes). The intersection nodes of two general polyhedrons are defined as the extreme nodes of each convex part of the intersection. Namely, there are two types of intersection nodes (see also figure 5.2):

• Already existing nodes of one of the polyhedrons which reside in the interior of the other polyhedron.
• New nodes formed from an edge of one of the polyhedrons intersecting a triangle of the other polyhedron (an illustrative sketch of such an edge-triangle test follows below).

Figure 5.2: Intersection points of the polyhedral intersection marked as dots. The blue dot shows an already existing node; the red dots are new nodes formed from triangle-edge intersections.
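A sketch of an edge-triangle test for the second node type, based on the standard Möller-Trumbore algorithm; the Vec3 type and helpers below are small illustrative stand-ins, not the thesis' actual math library.

#include <cmath>

struct Vec3 { double x, y, z; };
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Returns true and the intersection point if the edge pq crosses triangle abc.
bool edgeTriangleIntersection(Vec3 p, Vec3 q, Vec3 a, Vec3 b, Vec3 c, Vec3& hit) {
    const Vec3 dir = q - p;                     // edge direction (unnormalized)
    const Vec3 e1 = b - a, e2 = c - a;
    const Vec3 pv = cross(dir, e2);
    const double det = dot(e1, pv);
    if (std::fabs(det) < 1e-12) return false;   // edge parallel to triangle plane
    const double inv = 1.0 / det;
    const Vec3 tv = p - a;
    const double u = dot(tv, pv) * inv;
    if (u < 0.0 || u > 1.0) return false;
    const Vec3 qv = cross(tv, e1);
    const double v = dot(dir, qv) * inv;
    if (v < 0.0 || u + v > 1.0) return false;
    const double t = dot(e2, qv) * inv;         // parameter along the edge
    if (t < 0.0 || t > 1.0) return false;       // crossing outside the segment
    hit = p + t * dir;
    return true;
}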
In previous work on polyhedral DEM [8, 11] the intersection nodes have essentially been found by iterating over the respective features (e.g. triangles/edges or half-planes) of each polyhedron. In [8] only convex particles were considered and the half-plane representation (see eq. 5.1) was used, whereas in [11] triangulations were used. A naive pseudo-algorithm which computes the intersection nodes by iterating over all features of a triangulation can be seen in Algorithm 1; note that no assumptions on convexity are needed.

Algorithm 1: Given two polyhedrons, P1 and P2, find the intersection nodes, c.
  Let c = {} be the intersection nodes.
  for each node, p, in P1 do
    if p inside P2 then
      Let c = c ∪ {p}
    end if
  end for
  for each node, p, in P2 do
    if p inside P1 then
      Let c = c ∪ {p}
    end if
  end for
  for each edge, e, in P1 do
    for each triangle, t, in P2 do
      if e intersects t at p then
        Let c = c ∪ {p}
      end if
    end for
  end for
  for each edge, e, in P2 do
    for each triangle, t, in P1 do
      if e intersects t at p then
        Let c = c ∪ {p}
      end if
    end for
  end for

Given two polyhedrons, each with n features, it directly follows that this algorithm scales as O(n²), while for the convex case there exist theoretically optimal algorithms with a worst case time complexity of O(n) [31, 32]. However, it is unclear whether these theoretically optimal algorithms pose an actual advantage in terms of performance, since their constant factors relative to the naive algorithm are unknown. This is especially true on the GPU where, as outlined in section 2.2, simpler algorithms are typically preferred, and the complicated control flow of these theoretically optimal algorithms (e.g. one is recursive) casts doubt on how suitable they would be for a GPU implementation. Further, e.g. rock material can have many small concavities, and thus it would be preferable to have algorithms capable of handling non-convex particles. No proven optimal algorithms for non-convex polyhedrons have been shown in the literature, and most implementations rely on convex decomposition [33]. Triangulations are even commonly decomposed into the smallest possible convex polyhedron, the tetrahedron, and the intersection is then resolved for each tetrahedron [34]. However, most literature on the subject of non-convex polyhedral intersection works under the assumption of a few large triangulations, where the computation can be accelerated by placing each triangle or tetrahedron in an accelerating data structure such as a bounding volume hierarchy or a grid file.

For polyhedral DEM the assumptions are vastly different: the number of triangles per particle is expected to be low. Moreover, as previously mentioned, the particles can have many small concavities, which means that a decomposition could substantially increase the triangle count. Further, in this work with a GPU implementation, millions of intersections are assumed to be performed concurrently. With many intersections performed concurrently, the memory requirements of each intersection must be very low, and consequently using advanced data structures such as grid files or bounding volume hierarchies over the individual triangles is not possible. Consequently, most algorithms presented in the literature that aim to improve the scaling of polyhedral intersection are not likely to be beneficial for a polyhedral DEM GPGPU implementation.
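Besides the edge-triangle test sketched in section 5.2, Algorithm 1 also needs a point-inside-polyhedron predicate. A common choice, sketched here with the Vec3 helpers from the previous listing, is ray-crossing parity: cast a ray from the point and count triangle crossings, where an odd count means the point is inside. Robust handling of rays grazing edges or vertices is omitted in this illustration.

#include <array>
#include <vector>

bool insidePolyhedron(Vec3 p, const std::vector<std::array<Vec3, 3>>& tris) {
    const Vec3 q = p + Vec3{1.0e9, 0.0, 0.0};   // far endpoint of the test ray
    int crossings = 0;
    Vec3 hit;
    for (const auto& t : tris)
        if (edgeTriangleIntersection(p, q, t[0], t[1], t[2], hit))
            ++crossings;
    return (crossings % 2) == 1;                // odd parity: p is inside
}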
Instead, we propose a simple heuristic algorithm which takes advantage of a fundamental assumption of soft DEM, namely that the particles are assumed to have small overlaps (see sections 3 and 4). The basic idea of the algorithm is to quickly filter out triangles which are far from the overlapping volume, and then apply the naive Algorithm 1 to the filtered triangles. The triangles are filtered by first defining a direction

d̂ = (c₂ − c₁) / ∥c₂ − c₁∥    (5.3)

where c_i, i = 1, 2, are the respective particle centres. Further, the extent of each particle in that direction is expressed as

e₁ = max_{j=1,...,n_nodes} v_{j,1} · d̂;    e₂ = min_{j=1,...,n_nodes} v_{j,2} · d̂    (5.4)

where v_{j,i}, j = 1, ..., n_nodes, are the nodes of polyhedron i = 1, 2. Now one can define half-planes orthogonal to d̂ at each distance e₁, e₂. The half-planes are defined as

h₁ = {x ∈ R³ | x · d̂ ≤ e₁},    h₂ = {x ∈ R³ | x · d̂ ≥ e₂}.    (5.5)

Then the triangles of each particle which do not reside in the half-plane of the other particle can be removed from consideration. Also, if the two half-planes do not intersect, then the particles cannot intersect. If there exists an intersection of the half-planes, this intersection is convex, and triangles from both of the particles that reside in the intersection are guaranteed to be included. Since Algorithm 1 will be applied to the filtered triangles, the last note is important, because it enables the use of ray-tracing to check if nodes are inside the polyhedrons even with the filtered triangles. An illustration of the filtering is seen in figure 5.3, where the intersection of the half-planes is marked as the grey area.

This entire filtering process can clearly be performed in O(n) time. It also has no additional memory requirements, since the half-planes are readily recomputed at each time step. For the typical use case in DEM this filtering is effective, as later seen in section 7.2.1. There is, however, no guarantee of its effectiveness for completely general polyhedrons, as it is easy to construct examples where no triangles at all will be excluded.
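A hedged sketch of the half-plane filter of eqs. 5.3-5.5 for one of the two particles (the other side is symmetric), reusing the Vec3 helpers from section 5.2. The direction is left unnormalized, which is an implementation shortcut: only the ordering of the projections matters, and scaling by the positive factor ∥c₂ − c₁∥ preserves it.

#include <algorithm>
#include <array>
#include <vector>

// Returns false if the half-planes are disjoint, i.e. the particles cannot
// intersect; otherwise appends the surviving triangles of particle 1 to kept1.
bool filterTriangles(const std::vector<Vec3>& nodes1,
                     const std::vector<Vec3>& nodes2,
                     Vec3 c1, Vec3 c2,
                     const std::vector<std::array<Vec3, 3>>& tris1,
                     std::vector<std::array<Vec3, 3>>& kept1) {
    const Vec3 d = c2 - c1;                     // eq. 5.3, unnormalized
    double e1 = -1e300, e2 = 1e300;
    for (const Vec3& v : nodes1) e1 = std::max(e1, dot(v, d));  // eq. 5.4
    for (const Vec3& v : nodes2) e2 = std::min(e2, dot(v, d));
    if (e2 > e1) return false;                  // half-planes disjoint: no contact
    for (const auto& t : tris1)                 // keep triangles reaching into h2
        if (dot(t[0], d) >= e2 || dot(t[1], d) >= e2 || dot(t[2], d) >= e2)
            kept1.push_back(t);
    return true;
}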
5.3 Intersection mass properties

The standard approach [8, 10] to evaluate the mass properties of overlapping polyhedron particles in DEM consists of triangulating the intersection volume and then calculating the desired mass properties from the resulting triangulation. Further, triangulation of the volume is typically restricted to convex volumes, where e.g. convex hull algorithms or similar have been used. This method is flawed in two regards: it does more work than needed and, due to the triangulation, it is restricted to convex volumes (see section 5.1). This section aims to remedy these issues by presenting an alternative method to perform these calculations.

When triangulations are used to compute the mass properties, each constituent triangle, k, of the polyhedral intersection is stored. To compute the mass properties the nodes, v_k^i, i = 1, 2, 3, of each triangle k are needed. The volume of the triangulated intersection, for example, can then be computed as

\[ V = \sum_{k=1}^{n} \frac{1}{6} \left| \bigl( v_k^1 \times v_k^2 \bigr) \cdot v_k^3 \right|, \tag{5.6} \]

where the other mass properties can be computed in a similar manner.

However, we propose that an alternative data structure, based entirely on local properties of each node of the intersection, can significantly simplify the computation of the mass properties. The foundation of the data structure, introduced by W. Randolph Franklin [34], is a cusp, defined as the quadruple

\[ C = (p, \hat{t}, \hat{n}, \hat{b}) \tag{5.7} \]

where each element is defined as follows:

p: the Cartesian coordinates of the node.
t̂: unit vector in the direction of an edge adjacent to the node.
n̂: unit vector in the direction of a face adjacent to t̂ and perpendicular to t̂.
b̂: unit vector in the direction of the interior of the polyhedron, perpendicular to both t̂ and n̂.

Note also how b̂ only adds one bit of information, since b̂ = ±(t̂ × n̂)/‖t̂ × n̂‖, i.e. it only adds the sign of the vector. A polyhedron can now be represented as an unordered list of cusps, L = {C_i}, i = 1, …, m, where it is noteworthy that the only constraints on the polyhedron are that it is finite and that its interior is a closed subspace of ℝ³. Each cusp can be seen as a signed simplex defined by the points

\[ \Bigl( p_i,\;\; p_i - (\hat{t}_i \cdot p_i)\,\hat{t}_i,\;\; p_i - (\hat{t}_i \cdot p_i)\,\hat{t}_i - (\hat{n}_i \cdot p_i)\,\hat{n}_i,\;\; O = p_i - (\hat{t}_i \cdot p_i)\,\hat{t}_i - (\hat{n}_i \cdot p_i)\,\hat{n}_i - (\hat{b}_i \cdot p_i)\,\hat{b}_i \Bigr) \tag{5.8} \]

where the last equality (the fourth point being the origin, O) follows trivially by noting that (t̂_i, n̂_i, b̂_i) defines an orthogonal coordinate system spanning all three dimensions. To compute the area, A, volume, V, and center of mass, c, of the polyhedron, the properties are accumulated from each signed simplex independently:

\[ A = \sum_{i=1}^{m} A_i = \sum_{i=1}^{m} \frac{1}{2} (\hat{t}_i \cdot p_i)(\hat{n}_i \cdot p_i) \tag{5.9} \]

\[ V = \sum_{i=1}^{m} V_i = \sum_{i=1}^{m} -\frac{1}{6} (\hat{t}_i \cdot p_i)(\hat{n}_i \cdot p_i)(\hat{b}_i \cdot p_i) \tag{5.10} \]

\[ c = \frac{1}{V} \sum_{i=1}^{m} V_i c_i = \frac{1}{V} \sum_{i=1}^{m} \frac{V_i}{4} \Bigl( 3 p_i - 2 (\hat{t}_i \cdot p_i)\,\hat{t}_i - (\hat{n}_i \cdot p_i)\,\hat{n}_i \Bigr) \tag{5.11} \]

where c_i in eq. 5.11 is simply the vertex average of the four points in eq. 5.8.

For given input data, the cusp model is inferior to the regular triangle representation in terms of the number of operations required for the mass property computation, since for a polyhedron of n triangles the number of cusps required to represent it is m = 6n. On an NVIDIA GPU the dot product requires 3 floating point operations (2 fused multiply-adds and 1 multiplication) and the cross product 6 floating point operations (3 fused multiply-adds and 3 multiplications). To compute the volume of the polyhedron, the total number of floating point operations (FP) for the respective methods is thus

\[ \#\text{FP} \sim 9n \quad \text{for the triangle representation,} \]
\[ \#\text{FP} \sim 11m = 66n \quad \text{for the cusp representation.} \]

While this ratio can be made more favourable to the cusp model when all three properties are computed at once, it nevertheless requires substantially more floating point operations. However, for the case of polyhedral intersection the cusp representation has a substantial advantage: no triangulation of the intersection volume is required, since the local properties of each cusp can be determined directly at each intersection point.
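The accumulation in eqs. 5.9-5.11 maps directly onto a loop over an unordered cusp array. The following is a minimal serial sketch under the same illustrative Vec3 conventions as before; the struct layout and the sign convention of the centroid term follow the derivation above, not the thesis code:

    #include <vector>
    // Vec3, dot, operator- as in the earlier sketches; additionally:
    Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
    Vec3 operator+(Vec3 a, Vec3 b)   { return {a.x + b.x, a.y + b.y, a.z + b.z}; }

    struct Cusp { Vec3 p, t, n, b; };   // eq. (5.7): node plus its local unit frame

    struct MassProps { double area, volume; Vec3 centroid; };

    MassProps accumulate(const std::vector<Cusp>& cusps) {
        double A = 0.0, V = 0.0;
        Vec3 M = {0.0, 0.0, 0.0};                    // volume-weighted centroid sum
        for (const Cusp& c : cusps) {
            double tp = dot(c.t, c.p);
            double np = dot(c.n, c.p);
            double bp = dot(c.b, c.p);
            double Vi = -(1.0 / 6.0) * tp * np * bp; // eq. (5.10), signed
            A += 0.5 * tp * np;                      // eq. (5.9), signed
            V += Vi;
            // eq. (5.11): centroid of the signed simplex, weighted by Vi
            M = M + (Vi / 4.0) * (3.0 * c.p + (-2.0 * tp) * c.t + (-np) * c.n);
        }
        return {A, V, (1.0 / V) * M};
    }

Each iteration is branch-free and touches only its own cusp, which is what makes the representation attractive for one-thread-per-intersection GPU execution.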
Moreover, since the cusp model also works on completely general polyhedrons, no convex decomposition of the particle models is needed. This representation also fits the streaming processors of the GPU especially well, since both the triangulation and the decomposition require more complicated control flow and hence run a greater risk of thread divergence.

Figure 5.4: The volume of the same volumetric overlap as a function of its distance from the origin, computed with the cusp model in the reference frame and in the body frame. When the cusp model is naively used in the reference frame, significant errors are introduced by the floating point arithmetic. Note that the ratio between volume and distance is the important factor here, but with floating point representations the absolute values also determine the exact error; hence the absolute values are shown in the figure.

In implementations using floating point arithmetic a significant problem remains with the cusp model. Since each simplex has one node at the origin, |V_i| ≫ V in eq. 5.10 when the geometries are far from the origin, and significant errors are introduced by the floating point arithmetic. For polyhedral DEM there is however a simple solution, namely to perform the intersection computation in the body frame of one of the particles. Figure 5.4 shows the computed volume as a function of the distance from the origin. For the reference frame computation the error increases with the distance, whereas it remains constant for the body frame computation, since in that case the computation is effectively always performed close to the origin (the left side of the figure).
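A sketch of the body-frame trick, assuming particle poses are stored as a rotation matrix plus a translation (an illustrative layout; quaternions would work equally well). Both vertex buffers are mapped by the inverse pose of particle 1, so the overlap region ends up near the origin regardless of where the pair sits in the simulation domain:

    // Vec3, dot, operator-, operator+, scalar multiplication and Mesh as in
    // the earlier sketches.

    struct Mat3 { Vec3 r0, r1, r2; };             // rotation matrix, stored by rows

    // R^T d; for a rotation matrix this is the inverse rotation.
    Vec3 mulT(const Mat3& R, Vec3 d) {
        return d.x * R.r0 + d.y * R.r1 + d.z * R.r2;
    }

    struct Pose { Mat3 R; Vec3 x; };              // world point = R * (body point) + x

    // Map world-space vertices of a colliding pair into the body frame of
    // particle 1, i.e. v -> R1^T (v - x1). The overlap then lies near the
    // origin, so the |Vi| >> V cancellation in eq. 5.10 stays benign.
    void intoFrameOfFirst(const Pose& P1, Mesh& m1, Mesh& m2) {
        for (Vec3& v : m1.verts) v = mulT(P1.R, v - P1.x);
        for (Vec3& v : m2.verts) v = mulT(P1.R, v - P1.x);
    }

Since the mapping is rigid, the accumulated volume and area are unchanged, and only the centroid needs to be transformed back to the world frame afterwards.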
5.4 Multiple contact error

It is important to highlight that with the methods presented above, multiple contacts are not properly resolved: each particle-particle contact is always modelled as a single contact, which, by the non-linearity of the force models (and of the torque), is not correct. One should however note that all other DEM methods make the opposite error, namely modelling single particle contacts as multiple contacts. Since the force models are of such a simple form, large modelling errors already exist, which is the basic motivation for not resolving this properly. Moreover, for the typical use cases of DEM, merging the contacts should not be a large concern, but if more complex shapes such as toroidal particles are studied, this modelling error should be investigated further.

5.5 Broad phase contact detection

Besides resolving the exact intersection between the particles, a DEM solver also needs a so-called broad phase contact detection algorithm, as outlined in section 3. Broad phase contact detection refers to effectively, but approximately, finding all potential particle pairs. This avoids the quadratic scaling in the number of particles of the naive approach (which is to check all possible particle pairs). Typically, accelerated data structures are used to improve the scaling. The two preeminent methods for this are the Cartesian grid and the bounding volume hierarchy (BVH). The Cartesian grid divides the space into cells and each particle is classified according to which cells it covers. All potential pairs can then be resolved by only checking particles in the same or neighbouring cells. The BVH, on the other hand, is a tree data structure where each node is represented by a geometric object that contains all of its children [35]; in DEM the leaves are the bounding volumes of the particles. Potential contacts for a particle are then efficiently resolved by traversing the tree structure.

For the case of polyhedral DEM on the GPU it has been reported in the literature [7] how a grid implementation leads to severe performance degradation when the particle aspect ratio goes beyond 4:1. This degradation appears since the grid has to compromise between high memory usage and a more fine-grained grid. When the aspect ratios are large the grid can be adjusted to roughly fit the smallest particles, but then the memory consumption can be significant. To limit the memory consumption one has to increase the cell size, but then, as reported in the literature, the performance suffers since more contact pairs are emitted from the data structure. The same problem also appears in cases with a few separated dense areas of particles, since the space between the dense areas then also needs to be covered by the grid. For the BVH this problem does not arise; here the memory usage is completely separated from the tightness of the bounding volumes. Hence, with the desire of a general framework capable of handling varying environments and particle shapes, only BVH broad phase contact detection is considered in this thesis.

Figure 5.5: A Bounding Volume Hierarchy (BVH). Each node in the tree structure consists of a geometric object (here a rectangle) which covers all its children. This permits fast overlap tests by traversing the tree structure from the root. Image by Schreiberx - Own work, CC BY-SA 3.0 [36]

An example of a BVH can be seen in figure 5.5. The basic idea behind the BVH is that the nodes of the tree can be tested with a geometric predicate, e.g. an overlap test: if the predicate is true for a parent node one continues down to its children, whereas if a parent node does not fulfill the predicate all nodes beneath it can be disregarded. In this way, large portions of the tree structure can quickly be excluded from the search. For most realistic use cases one typically obtains logarithmic scaling for each search query [35], albeit the worst-case time complexity is not improved over the naive approach.
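As an illustration, the following sketch shows the predicate-driven descent for axis-aligned bounding boxes. The node layout is hypothetical and chosen for clarity; section 5.5.2 below discusses why the actual implementation traverses breadth-first rather than with this simple recursion:

    #include <vector>

    struct AABB { double lo[3], hi[3]; };

    bool overlaps(const AABB& a, const AABB& b) {
        for (int k = 0; k < 3; ++k)
            if (a.hi[k] < b.lo[k] || b.hi[k] < a.lo[k]) return false;
        return true;
    }

    struct BVHNode {
        AABB box;
        int left = -1, right = -1;   // child indices; -1 marks a leaf
        int object = -1;             // particle index stored at a leaf
    };

    // Recursive descent: prune a whole subtree as soon as its box fails the
    // overlap predicate, and report the particles whose leaves survive.
    void query(const std::vector<BVHNode>& nodes, int node,
               const AABB& q, std::vector<int>& hits) {
        const BVHNode& n = nodes[node];
        if (!overlaps(n.box, q)) return;      // predicate false: skip subtree
        if (n.left < 0) { hits.push_back(n.object); return; }
        query(nodes, n.left,  q, hits);
        query(nodes, n.right, q, hits);
    }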
There are two main algorithmic aspects of a BVH structure, the build and the search algorithms, both of which are considered in the following sections.

5.5.1 Build algorithm

Given a set of geometric objects, the build algorithm should generate the tree structure of these objects and populate each node with its bounding volume. The build algorithm needs to balance build speed, memory consumption (both temporary and permanent) and the quality of the final tree structure. While there are several factors to consider for the quality of the tree structure, the short answer is that higher tree quality yields faster search performance. To make the compromise between tree quality and build speed, an important guideline is the build-to-search ratio. For instance, in scenarios with rigid geometries the build can be precomputed once, whereas the tree can be traversed an indefinite number of times. Thus, for rigid geometries the speed of the build algorithm is of rather low significance and mainly the quality of the tree structure should be considered. However, for contact detection of moving objects, e.g. the particles in a DEM simulation as relevant for the current thesis, one typically only searches once for all objects of the tree against themselves. For the particles in DEM, the time spent in the build algorithm can thus be weighted directly against the time saved in the traversal [35].

Mainly due to the advent of GPU-accelerated real-time ray tracing, e.g. for computer games, there have been significant advances in constructing BVHs on the GPU in the past few years [37, 38, 39, 40]. Building the tree structure on the GPU often requires a significantly different approach from the CPU counterpart, since the CPU algorithms rarely parallelize well. Most of the work has been performed for binary BVHs, i.e. trees with a degree of 2 per node. A notable early work is [37], which showed the feasibility of constructing the BVH on the GPU. In this early work a simple build algorithm based on Morton codes was developed which was significantly faster than CPU algorithms, while still maintaining decent search performance. The work on binary BVHs in [37] has been further extended in later articles optimizing both the quality of the tree and the build time [38, 39, 40]. It should be noted that some of these improvements concern optimizing the Surface Area Heuristic (SAH), which in essence minimizes the surface area of subtrees and is thus mostly relevant for ray-tracing applications.

An in-house FCC build algorithm developed by the author has been applied in this thesis work; the algorithm is based on some of the aforementioned binary BVH research [37, 38, 39, 40]. While the algorithmic details are not within the scope of this work, some aspects of the memory layout induced by the build algorithm will be of significance for later algorithmic considerations. Namely, the tree structure has been adapted to nodes of higher degree, and in particular the memory layout has been optimized for breadth-first search, the latter meaning that each level of the tree is placed linearly in memory (as in [37]) with spatially close nodes also close in memory; both efforts improve the cache hit rate.
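For reference, the Morton-code ingredient of such builders can be sketched in a few lines. The bit interleaving below follows the widely used 30-bit construction popularized in connection with [37]; normalizing particle centroids into the unit cube and sorting them by this key is what places spatially close leaves close in memory:

    #include <algorithm>
    #include <cstdint>

    // Spread the lower 10 bits of v so that two zero bits separate
    // consecutive payload bits.
    std::uint32_t expandBits(std::uint32_t v) {
        v = (v * 0x00010001u) & 0xFF0000FFu;
        v = (v * 0x00000101u) & 0x0F00F00Fu;
        v = (v * 0x00000011u) & 0xC30C30C3u;
        v = (v * 0x00000005u) & 0x49249249u;
        return v;
    }

    // 30-bit Morton code for a point with coordinates in [0, 1].
    std::uint32_t morton3D(float x, float y, float z) {
        x = std::min(std::max(x * 1024.0f, 0.0f), 1023.0f);
        y = std::min(std::max(y * 1024.0f, 0.0f), 1023.0f);
        z = std::min(std::max(z * 1024.0f, 0.0f), 1023.0f);
        return (expandBits((std::uint32_t)x) << 2) |
               (expandBits((std::uint32_t)y) << 1) |
                expandBits((std::uint32_t)z);
    }

Sorting the codes (on the GPU typically with a parallel radix sort) and splitting runs at the highest differing bit then yields the hierarchy; the quality and build-time optimizations of [38, 39, 40] refine this basic structure.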
5.5.2 Search algorithm

The two most prominent algorithms for traversing the hierarchy are breadth-first search (BFS) and depth-first search (DFS). BFS traverses the tree level by level, meaning that all active nodes of one level are processed at once and only then is the next level considered. DFS, on the contrary, traverses a branch of the tree until termination, and at termination proceeds to the latest branch split. In the context of contact detection for real-time applications with serial implementations, DFS has been reported to be the more popular of the two. This preference for DFS is due to the larger amount of temporary memory needed for BFS; limiting this memory by some form of iterative implementation is not well suited for real-time applications, since consistent performance is often as important as speed [35]. For DEM, on the contrary, only the total simulation time is of importance, so this limitation is of less significance.

The only relevant case for GPU-accelerated search queries of a BVH is when many queries are performed at once, since otherwise only a single thread will be active at the root and the massive parallelism of the GPU is not properly utilized. Further, for the case of contact detection between objects (i.e. not ray tracing) it has been shown in the literature that a BFS implementation is superior in terms of efficiency [41].

In this work a BFS implementation is used. To constrain the memory, the searches are made in batches which are iteratively adjusted so as not to breach the memory limitation. Due to the relatively slow changes in geometry in DEM this works well, and as previously mentioned the uneven performance of this method is not a concern. A pseudo algorithm for the traversal of one level is given in Algorithm 2. On the pseudo level this algorithm is very simple; an efficient massively parallel implementation is, however, a significant challenge. In particular, packing the nodes, c, into the list, n, is essential for attaining an efficient GPU BFS algorithm. This process is often referred to as stream compaction, which is a non-trivial task for massively parallel implementations [42] and involves computing a prefix scan of the number of items to compact [43].

Algorithm 2 Traverse a level, l, in a BVH and store the nodes to be further processed at the next level, l + 1.
    Let n = {} be the nodes at the next level to further process.
    for each node-search object pair, v, o, on level l in parallel do
        for each child, c, of the node v do
            if c intersects o then
                Let n = n ∪ {c}
            end if
        end for
    end for
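A serial emulation of Algorithm 2 is sketched below to fix the data flow; on the GPU the push into the next frontier would instead be realized by counting surviving children per pair, running a parallel prefix scan over the counts, and scattering into the compacted array. The node layout matches the earlier BVH sketch and is, again, only illustrative:

    #include <utility>
    #include <vector>
    // AABB, BVHNode, overlaps as in the earlier BVH sketch.

    using Pair = std::pair<int, int>;   // (tree node index, search object index)

    // One BFS level: expand every surviving (node, query) pair into the child
    // pairs whose bounding boxes pass the overlap predicate.
    std::vector<Pair> traverseLevel(const std::vector<BVHNode>& nodes,
                                    const std::vector<AABB>& queries,
                                    const std::vector<Pair>& level,
                                    std::vector<Pair>& leafHits) {
        std::vector<Pair> next;                  // the compacted next frontier
        for (const Pair& vo : level) {           // 'in parallel' on the GPU
            const BVHNode& v = nodes[vo.first];
            for (int c : {v.left, v.right}) {    // the children of v
                if (c < 0) continue;
                if (!overlaps(nodes[c].box, queries[vo.second])) continue;
                if (nodes[c].left < 0)           // reached a leaf
                    leafHits.push_back({nodes[c].object, vo.second});
                else
                    next.push_back({c, vo.second});
            }
        }
        return next;                             // becomes level l + 1
    }

Batching, as described above, simply bounds the number of pairs fed into each such level traversal.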
A DFS algorithm was also implemented, but the batched BFS outperformed it by a factor of 5-10. Analyzing the algorithms with NVIDIA Nsight Compute [22], this large difference can largely be explained by a significantly lower L1/L2 cache hit rate for the DFS search.

5.6 World geometry contact detection

As outlined in section 3, it is desirable that the particles can interact with a surrounding geometry. In this work only triangle mesh geometries are considered. As mentioned in section 4, the same contact forces as for the particle interactions are used for the particle interaction with the world geometry. This enforces some restrictions on the triangle meshes used for the world geometry, namely that their surface must be closed and have a defined interior.

Since the same contact forces are used as for the particle interactions, the contact detection also largely remains the same. However, the triangle meshes are expected to be few but to have many triangles, which is the opposite of the particles. For the world geometries, each individual triangle is therefore entered into the same BVH implementation presented in section 5.5; this is in contrast to the particles, where a bounding volume of the entire particle is used in the BVH. The particle-triangle pairs can thus be found by searching the BVH containing the triangles. After computing the particle-triangle pairs, Algorithm 1 can be applied, and the same scheme for finding the mass properties as presented in section 5.3 is then followed. Note that for typical DEM applications the cost of finding and resolving the particle-triangle pairs is small compared to the particle-particle pairs, since the number of particle-triangle pairs scales as the bulk surface whereas the particle-particle pairs scale as the bulk volume.

6 Physical Verification and Convergence of Method

This section presents the verification of both the method and the implementation. Since the force model is derived from contacts of spherical particles, the model is first verified for two colliding spherical particles. Next, the convergence of this case with respect to particle resolution is studied. The same experiments are then repeated for an irregular rock-shaped particle.

To further verify the method on a larger scale, it is investigated whether relevant physical properties of particle systems can be captured. In particular we investigate whether the solver achieves convergence of the repose angle with respect to particle resolution, and in addition the solver is verified against data from an experimental setup for rock material. Since DEM requires substantial calibration experiments to properly validate it against experimental data, a complete validation study against experiments is out of scope for this mainly method-driven thesis. Thus only verification of the methods is considered.

6.1 Verification and convergence for spherical polyhedrons

Since the force models in use (see section 4) are derived from the spherical case, the method is first verified for two colliding spheres. To average out the effect of discretization, the experiments are repeated many times with the initial orientation of the spheres sampled from a uniform distribution. Further, only the normal elastic force is considered. The radii of the spheres are R = 0.1 m and the initial speed of both particles is 5 m s⁻¹.

The spherical polyhedrons used for this experiment were generated by the built-in spherical triangulation tool in the software IPS - Industrial Path Solutions [44]. Figure 6.1 shows the generated polyhedrons for a few different resolutions.

Figure 6.1: Spherical polyhedrons used for this study, with resolutions of 40, 84 and 180 triangles (left to right).

It should be noted that the models were normalized to give a consistent volume. This is not the case for the spheres directly after generation in IPS, since the spheres are triangulated by placing the nodes on the surface of a sphere with a given radius, meaning that the volume of the models decreases with the resolution of the triangulation. For the verification of the solver, however, it is more important to keep the mass and volume of the particles consistent. This keeps the initial kinetic energy consistent; if not, it would skew the kinetic energy response studied in the experiments below. In practical terms, a higher initial kinetic energy results in a longer response.
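Such a normalization only needs the volume of the closed mesh, which is given by the signed tetrahedron sum (the signed counterpart of eq. 5.6); a sketch under the earlier illustrative Mesh conventions, assuming outward-oriented faces:

    #include <cmath>
    // Vec3, dot, cross, Mesh as in the earlier sketches.

    // Signed volume of a closed triangle mesh with outward-oriented faces.
    double meshVolume(const Mesh& m) {
        double V = 0.0;
        for (const auto& t : m.tris) {
            Vec3 a = m.verts[t[0]], b = m.verts[t[1]], c = m.verts[t[2]];
            V += dot(cross(a, b), c) / 6.0;    // signed origin tetrahedron
        }
        return V;
    }

    // Uniformly scale the mesh about the origin so its volume matches Vtarget;
    // volume scales with the cube of the length scale.
    void normalizeVolume(Mesh& m, double Vtarget) {
        double s = std::cbrt(Vtarget / meshVolume(m));
        for (Vec3& v : m.verts) { v.x *= s; v.y *= s; v.z *= s; }
    }

With all resolutions rescaled to the same volume (and hence, for a fixed density, the same mass), the initial kinetic energy is identical across the test series.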
Figure 6.2: The normalized kinetic energy averaged over 10000 instances as a function of time during the collision of two spheres. The dotted line shows the response for a perfect sphere, and the solid lines show the response for polyhedral spheres of different resolutions (40 to 180 triangles). The left figure shows the response for spheres with a Young's modulus of E = 1 MPa and the right figure for stiffer particles with E = 21.5 MPa. The initial orientation of the polyhedrons is uniformly distributed.

Figure 6.2 shows the average kinetic energy as a function of time during a collision of two spheres. The kinetic energy, E_k, is normalized to the initial kinetic energy, E_k^0 (i.e. the kinetic energy of both particles before the collision), as E_k/E_k^0. In the figure to the left the Young's modulus is E = 1 MPa, and in the figure to the right it is E = 21.5 MPa. In both cases the particle material has a Poisson's ratio of ν = 0.25. The figures show the response for different resolutions of the polyhedral spheres, and also for a perfect sphere simulated with the internal FCC spherical DEM solver. For the softer material properties it is clear that the response of all the polyhedrons closely matches the perfect sphere, with only slight deviations. For the stiffer particles, however, the lower resolution polyhedrons lead to a significant error in the response. This is expected, since stiffer particles lead to less overlap, effectively making the resolution of the overlapping volume lower.

Figure 6.3: The standard deviation of the normalized kinetic energy over 10000 instances as a function of time for two colliding spheres. The left figure shows the response for spheres with a Young's modulus of E = 1 MPa and the right figure for stiffer particles with E = 21.5 MPa. The initial orientation of the polyhedrons is uniformly distributed. Note that for a perfect sphere the standard deviation is always zero.

In figure 6.2 only the average response of the kinetic energy was considered, in order to verify the force model and the method of using volumetric overlaps. However, as mentioned, averaging the response neglects the fluctuations induced by the discretization of the particles. Figure 6.3 therefore also shows the standard deviation of the normalized kinetic energy, which becomes a measure of the fluctuations induced by the discretization. The two notable peaks in the figure show that the discretization causes fluctuations in the energy response, and that the fluctuations are most pronounced at the beginning and end of the collision. At the beginning and end of the collision the overlapping volume is smaller, and hence the effective resolution of the overlapping volume is lower, which causes larger fluctuations. One can also note that the standard deviation is higher for stiffer particles and that there is a clear reduction when the number of triangles is increased. However, there is a notable peak in the standard deviation even for the highest resolution polyhedrons, whereas for perfect spheres it should always be zero.

Finally, to get an overview of the convergence for a larger range of particle stiffness,
the minimum value of the averaged normalized kinetic energy, i.e. the minimum of the data plotted in figure 6.2, and the maximum value of the standard deviation of the normalized energy response, i.e. the maximum of the data shown in figure 6.3, are considered in figure 6.4. Note that the maximum/minimum values are used instead of a time average or RMSD, since such measures would depend on the length of the simulation, which has to differ between the different stiffnesses. Hence, to be able to compare across the different resolutions and stiffnesses, the extrema were considered. It should be noted that these are not necessarily a precise measure of how well the response matches the perfect sphere, but are rather intended for comparisons between the polyhedrons. Nevertheless, figure 6.4 clearly shows that there is a clear convergence in the response for the range of particle stiffness considered. It is also clear from figure 6.4 that stiffer particles lead to larger discretization errors, affecting not only the fluctuations in the response but also the average response.

Figure 6.4: Convergence of the kinetic energy response for spheres with respect to the resolution of the polyhedral spheres, for stiffnesses from E = 1 MPa to E = 10 GPa. To the left the minimum average normalized kinetic energy is shown, and to the right the maximum standard deviation of the same quantity. Clearly the convergence becomes significantly slower as the stiffness of the particles increases.

This result is fundamental to polyhedral DEM. Foremost, that the method converges with respect to particle resolution, instead of giving an erratic response as is the case for multispheres, is in itself profound. A limitation of the method is also clearly shown here, as the capability to model smooth surfaces with non-zero curvature is limited and, in addition, depends largely on the material properties. The results show that the polyhedrons are capable of modelling smooth surfaces to a larger extent for soft particles than for hard particles.

6.2 Verification and convergence for irregular shaped particles

While the results for the spherical particles showed that the method, in contrast to the multisphere method, can converge in resolution, spherical particles are not the most relevant case for actual usage of the method. In this section the method is therefore studied for irregular shaped particles, to ensure that it can converge also for more complex shapes.

Figure 6.5: The irregular shaped polyhedral particle models used in this study. The figure presents particles with 10, 40, 70 and 1000 (original) triangles (left to right). While not clear from the figure, several small concavities exist in the model.

Figure 6.5 shows the irregular shaped polyhedrons used in this study. The rightmost polyhedron is the original triangulation, consisting of 1000 triangles. The three leftmost polyhedrons, consisting of 10, 40 and 70 triangles respectively, were generated using the triangulation simplification tool in IPS - Industrial Path Solutions [44] with standard settings. The models were then normalized to give a consistent volume, which the IPS simplification tool does not attempt to attain. Note that, while not clear from the figure, the polyhedrons have several small concave sections.

The same numerical experiments as for the spheres were performed with the irregular shaped particles, namely studying the kinetic energy response during the collision of two particles.
Besides the changed particle model, the experiments were conducted in the same way and with the same parameter values as for the spherical particles.

The average normalized kinetic energy for the irregular shaped particles is shown in figure 6.6. The left figure shows the response for a Young's modulus of E = 21.5 MPa and the right for E = 464.0 MPa. The response for the irregular shaped particle is noticeably different from that of the spheres. For this particle model the minimum average kinetic energy does not approach zero, which is expected due to the irregular shape combined with the random initial orientation; note that for an individual collision the kinetic energy does go to zero. The standard deviation of the normalized kinetic energy at different times is shown in figure 6.7. The standard deviation is significantly higher compared to the spheres, which is expected since the original shape now also induces fluctuations in the response. While there is a difference in the response for both the average and the standard deviation when the resolution changes, only small shifts can be seen, and at least in the average response there is a tendency towards convergence when the resolution is increased.

Figure 6.6: The normalized kinetic energy averaged over 10000 instances as a function of time during the collision of two irregular shaped particles. The different coloured solid lines show the response for different resolutions (20 to 100 triangles). The left figure shows the response for particles with a Young's modulus of E = 21.5 MPa and the right figure for stiffer particles with E = 464.0 MPa. The initial orientation of the polyhedrons is uniformly distributed.

Note that an analysis similar to that for the spheres in figure 6.4 is not included, since such a study could easily be misleading due to the more irregular shape of the response curves. Moreover, as will be further discussed in section 8, the actual convergence is not the important conclusion. The importance of these studies is rather to highlight how utilizing volumetric overlaps of the polyhedrons leads to a force model that does not change significantly when small geometric changes are introduced, i.e. that it does not behave like multispheres or feature-by-feature polyhedral force models. This important property of the volumetric overlaps should already be evident from figures 6.6 and 6.7.

Figure 6.7: The standard deviation of the normalized kinetic energy from 10000 instances as a function of time during the collision of two irregular shaped particles. The responses for different resolution particles are displayed as different coloured lines. The left figure shows the response for particles with a Young's modulus of E = 21.5 MPa and the right figure for stiffer particles with E = 464.0 MPa. The initial orientation of the polyhedrons is uniformly distributed.

6.3 Verification and convergence on a laboratory scale

The numerical experiments in the previous sections to a large extent isolated the geometry of the particles and how changes to the geometry affect the normal elastic response. Those experiments verified that polyhedrons with volumetric overlaps are well behaved with respect to small geometric changes. However, to verify the tangential components of the force model and to study the aggregated effects of using polyhedrons, this section studies the method on a larger scale. The experiments are scaled up to a laboratory scale based on a calibration rig experiment. The experimental setup can be seen in figure 6.8, and both the numerical and laboratory experiments were conducted as follows [45, 46]:
1. Particles were placed/generated in the hopper (the wedge shape at the top of figure 6.8) with a given size distribution and a specified total mass.
2. The trap doors below the hopper were opened.
3. Flow phase: particles flow down from the hopper onto the tilted plane and finally onto the bottom plate.
4. Steady-state phase: the particles form a steady pile at the bottom plate. The repose angle is measured when the kinetic energy has converged to zero.

Figure 6.8: The calibration rig used to measure the repose angle of rock material. To the left the real experimental setup is seen; to the right the rig modelled in the polyhedral DEM solver is shown. In the illustration of the DEM simulation the particles are colour coded according to their speed.

For the experiments, the repose angle was defined as the maximum angle of the pile. For each numerical experiment a 2-dimensional height map was created by dividing the particles into bins in the horizontal direction and measuring the maximum height in each bin. The repose angle was then defined as the maximum derivative of a sigmoid function fitted to the height map of the system,

\[ h(x) = \frac{c}{1 + \exp\{a(x - b)\}} \tag{6.1} \]

where a, b and c are constants fitted to each height map, h_i = h(x_i), i = 1, …, 20, using the function curve_fit from the python3 package scipy.optimize. Note that the maximum magnitude of the derivative of eq. 6.1 has the closed form |a|c/4, attained at x = b, so the measure follows directly from the fitted constants. Figure 6.9 shows an example of a height map together with its fitted sigmoid function.

Figure 6.9: An example height map from the rig simulation, together with the fitted sigmoid function. The repose angle is measured from the maximum derivative of the fitted sigmoid.
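A sketch of this post-processing step follows. The fit itself is done with scipy.optimize.curve_fit as stated above; the listing only illustrates the binning and how the repose angle follows from the fitted constants, where the arctangent conversion of the maximum slope into an angle in degrees is our reading of "maximum angle of the pile":

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Bin particle positions (x = horizontal, z = height above the bottom
    // plate) into a height map: the maximum particle height within each of
    // nBins slices of [x0, x1].
    std::vector<double> heightMap(const std::vector<double>& xs,
                                  const std::vector<double>& zs,
                                  double x0, double x1, int nBins = 20) {
        std::vector<double> h(nBins, 0.0);
        for (std::size_t i = 0; i < xs.size(); ++i) {
            int b = int(std::floor((xs[i] - x0) / (x1 - x0) * nBins));
            if (b < 0 || b >= nBins) continue;     // outside the pile region
            h[b] = std::max(h[b], zs[i]);
        }
        return h;
    }

    // The sigmoid h(x) = c / (1 + exp(a (x - b))) has extremal slope -a c / 4
    // at x = b, so the repose angle follows directly from the fitted constants.
    double reposeAngleDeg(double a, double c) {
        constexpr double kPi = 3.14159265358979;
        return std::atan(std::fabs(a) * c / 4.0) * 180.0 / kPi;
    }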
In an earlier phase of this work, the numerical experiments on the calibration rig were conducted without any tangential spring stiffness, instead purely based on the force model proposed by Nassauer et al. [9]. The simulation setup can be seen in table A.1. From these experiments it was clear that the desired behaviour of the system could not be achieved. Specifically, the particles did not reach a steady state with a non-zero repose angle. Instead, small vibrations in the particles caused the repose angle to slowly drift towards zero. While this could possibly be an issue with the parameter setup, a range of different friction parameters was investigated; even when the simulation otherwise showed highly unphysical behaviour due to high friction (e.g. the particles moving very slowly down the middle plate), there were still small vibrations in the pile of particles which led to drifts in the repose angle. It should be mentioned that no systematic studies of this were performed and that no similar behaviour was reported in [9].

Since the desired behaviour of the system could not be achieved without a tangential spring stiffness, a force model containing such a component was derived from the force models for spheres (see section 4). Apart from the parameters changed by the use of the different force model, the same setup as without a tangential spring was used for the simulations (see table A.2).

The response of the calibration rig simulation with respect to particle resolution is seen in figure 6.10. Both the averaged kinetic energy during the flow phase of the experiment and the repose angle of the steady-state phase are shown. There are significant fluctuations in the repose angle even for a fixed number of triangles. This can partly be due to measurement errors from the fitted sigmoid function. However, all fitted sigmoid functions and their corresponding height maps were manually inspected, and there was a noticeable difference also in the height maps. Nevertheless, to verify that the variance measured in the repose angle was not entirely due to measurement errors, the kinetic energy is also shown in figure 6.10. The measurement errors in the kinetic energy should be minimal, and yet significant fluctuations can be seen in the data. This indicates that fluctuations exist in the physics itself, which is expected since the particles are generated randomly from a given size distribution. However, noticeably larger fluctuations can be seen in the repose angle than in the kinetic energy. It is unclear whether this difference is due to larger measurement errors in the repose angle or is induced by the physics; this uncertainty suggests that for future verification and validation of DEM, better measurements than the repose angle may be sought.

To compensate for these fluctuations, a total of 11 simulations were performed per resolution. There is notable convergence for both the mean of the kinetic energy and the repose angle: at roughly 60 triangles per particle the mean of both measures stabilizes within the range of the standard deviations. For the numerical experiments a stiffness of E = 20 MPa was used. One can note that the average response for the single collision at a similar stiffness shows a comparable convergence, at least in a broad perspective, having roughly converged after 60 triangles just like the calibration rig.

Further, figure 6.10 also makes it clear that the coarsest particles, at 10-20 triangles, show significantly different values. This can intuitively be explained by the coarser particles having sharper edges. The sharper edges cause a higher degree of interlocking between the particles. The interlocking lowers the kinetic energy of the system during the flow phase due to a lower flow rate from the hopper, whereas for the repose angle the increased interlocking makes the pile of particles more stable and causes the increase in repose angle. The repose angle can also be compared to experimental values: the experimental range of repose angles is between 33 and 42 degrees, covering a range of different materials and particle shapes. From figure 6.10 it is clear that merely varying the particle resolution of one particle model almost covers the entire experimental range.
Figure 6.10: The kinetic energy during the flow phase of the calibration rig and the repose angle during the steady-state phase, as functions of the number of triangles per particle. Each dot shows a measurement from a separate experiment. The dotted line shows the mean of the measurements, and the grey area shows the range of the standard deviation. Noticeable fluctuations in both measures can be seen. This can be compared to the experimental range of 33 to 42 degrees in repose angle for a range of different materials and particle shapes.

Note that the experimental data were existing data which were not sampled for this purpose, hence no stronger verification could be made. Rather, the purpose of this verification was to consider whether the general behaviour of the experimental setup could be captured by the method, which figure 6.10 clearly shows to be the case. However, it should be highlighted that stronger verification and validation of the method should be one of the top priorities for further work.

Indirectly, figure 6.10 also demonstrates the need for DEM solvers to be highly efficient and capable of taking full advantage of modern hardware. Due to the high computational demands of DEM, the open literature suggests that typical studies are conducted on only a few data points, which figure 6.10 clearly shows can lead to significant misguidance. Hence, to further enhance the usefulness and predictability of the method, one of the main research goals of DEM should be