Traffic Congestion in Urban Modelling

Master's thesis in Complex Adaptive Systems, FRT 2018:01

ERIK ROOS

© ERIK ROOS, 2018.

Supervisor: Alexander Hellervik, Trafikverket
Examiner: Claes Andersson, Department of Space, Earth and Environment

Department of Space, Earth and Environment
Chalmers University of Technology
SE-412 96 Gothenburg
Telephone +46 31 772 1000
Typeset in LaTeX

Gothenburg, Sweden 2018

Abstract

In this thesis, a model for traffic congestion in urban areas is developed using betweenness centrality. The betweenness centrality is augmented with the activity [1] in the network, which corresponds to the assessed value of a property. In this way, shortest paths between two nodes with high activity contribute more to the overall betweenness centrality, and in turn the traffic, than paths between two nodes with low activity. The traffic network is imported from OpenStreetMap. In each iteration, the betweenness centrality in the network is calculated and the speeds on the edges are updated according to the VQ-relations [2][3]. A new activity is then calculated with the new speeds. Travel times in the network are finally compared to travel times given by Google Maps. The model is tried out for a number of settings in the city of Gothenburg. The travel times obtained with the speed limits given by OSM show high linearity, but the average travel time is too short. The model manages to improve the travel times to some extent but loses linearity when too much traffic is introduced. Updating the activity in each iteration shows little difference compared to calculating the activity once and keeping it static.
When comparing flows, the model does indeed capture some real-life behaviours but fails to capture others. For example, in the inner city and along the major highway going through Gothenburg, the traffic is not slowed down at all in the model. Future work includes studying different ways of defining the shortest path between nodes (such as topological or angular distance), introducing some through traffic or increasing the geographical calculation area, and increasing the resolution of the traffic network (making it directed and giving it more precise road types).

Keywords: traffic congestion, betweenness centrality, urban modelling

Contents

1 Introduction
  1.1 Background
2 Theory
  2.1 Betweenness centrality and activity
  2.2 Activity
  2.3 Augmenting the BC
  2.4 Traffic flow
  2.5 VQ-relations
  2.6 Algorithm and mixing betweenness centrality
  2.7 Checking the results
  2.8 Cost
3 Method and data
  3.1 Settings
  3.2 Data
4 Results
  4.1 Correlations
  4.2 Traffic flows
  4.3 No activity
  4.4 Convergence and mixing
  4.5 VQ-relations
  4.6 Other cities
5 Discussion
  5.1 Baseline model
  5.2 Time of day and asymmetrical flow
  5.3 VQ-relations
  5.4 Traffic flows
  5.5 Using Google Maps as the correct traffic
  5.6 Activity
6 Conclusions and future work

1 Introduction

In the world of today, it is close to impossible not to be part of the giant network that is our transport system. As soon as you step out of the door you have to interact with pedestrians, bicycles, cars, buses and whatever else you come across. For the most part, the interaction is automatic: there are certain basic rules you have to follow, and for a single agent this is quite easy to describe and understand. But when many agents have to interact, complex features arise that are not described by these rules, for example congestion. While the people living in a city often know which areas are crowded and when to avoid them, city planners try to create traffic networks that have as little congestion as possible. An architect might attack this problem from one side, but for a physicist it is natural to study traffic networks as complex networks. When looking at the problem from a mathematical point of view, there are many tools one can use to obtain inherent properties of the network.

Alexander Hellervik is currently doing his Ph.D. on the interaction between urban traffic networks and urban activity and land use. In his model, Hellervik takes in just a traffic network and tries to predict where there will be a lot of so-called 'activity', which roughly can be translated into the assessed value of a property [1].
The idea is to keep the number of parameters to a minimum and see what kind of information can be obtained from the traffic network, just the actual roads, of an urban area. This could then be used to, for example, see how the center of a city would change when removing a road or building a new one. It could also be used as a component of a larger model. The 'activity' of the network is calculated as an iterated eigenvalue centrality depending on a node's position in the network (how well connected the node is) and travel time to other nodes. How long it takes to get from one node to another is calculated simply as the lengths of the roads divided by the maximum speed of each road. This is highly simplified; anyone who has ever been in a car knows that when driving in a city you often go a lot slower than the speeds on the traffic signs, which motivates the introduction of a congestion model to possibly get more accurate travel times.

This is the starting point of this thesis. While Hellervik wants an answer to the question 'Is it possible to improve the accuracy of the activity model by introducing a congestion model?', the opposite question can also be asked: 'Is it possible to use the activity model to get an accurate congestion model?'. This is the problem formulation of this thesis, and we will spend the next pages trying to find out.

1.1 Background

There are many models simulating traffic, both on a microscopic level (e.g. an intersection) and a macroscopic level (e.g. the traffic of a large city). Some models are more advanced than others, but they often have a lot of parameters. One example is Trafikverket's macroscopic model called Sampers [4]. Sampers simulates traffic on a national level in Sweden. To get a traffic flow, the model uses empirical data to describe what trips are being made and how often. Going out and asking people how and where they want to travel works for a national agency like Trafikverket but is a bit out of the scope of this thesis.
The model gives good results but takes a long time to run. Instead, the idea in this thesis is to make a faster model using only the traffic network as a parameter. This limits the number of ways to model traffic. One could use some kind of agent-based model, spreading agents out over the network and making them go to different places and interact with each other. This, however, quickly becomes computationally heavy. We are looking for something faster and more abstract, preferably something inherent in the network.

Using graph theory, there are a number of centrality measures that can be calculated for the network [5][6]. The different centrality measures quantify how central nodes or edges in the network are from different aspects. One centrality measure that seems very much to describe some kind of flow in the network is betweenness centrality.

Betweenness centrality for an edge is a measure of how much 'in between' other nodes a certain edge is. It is calculated by finding the shortest paths between each pair of nodes and then giving each edge a score depending on how many shortest paths pass through that edge. If an edge has a high value of betweenness centrality, it means that when taking the shortest path from one node to another, chances are high that you would pass through this edge. It is natural to believe that this would in some way reflect the actual traffic flow in a network, and several papers have shown that there is indeed a correlation [6][7][8][9]. One thing these papers have in common is that they all augment the betweenness centrality in some way.

Altshuler et al. [8] augmented betweenness centrality in a number of ways, using the distance between nodes and travel times among other things, to achieve better results. They also introduced a mixing step where they mixed the betweenness centrality obtained for a free-flow network with that of a congested network, which also improved the results.
Furthermore, they also introduced a dependence on the road type for the betweenness centrality, as one would, for example, rather take a highway when travelling a long distance. Jayasinghe et al. [7] showed that using a BC focusing on topological or angular distance gave a better result than the actual shortest distance. Topological distance is defined as the number of turns one would have to take from one node to another, and angular distance as the total angle turned. Both are introduced to describe a behaviour where you prefer to take not the actual shortest path but rather the one perceived as 'straightest'. Scheurer et al. [6] also introduce a number of other centrality measures to combine with the betweenness centrality to describe a public transport network. Kazerani et al. [9] likewise conclude that the betweenness centrality has to be augmented in some way to give meaningful results. Porta et al. [10][11] also argue that betweenness centrality shows important properties of the flow of a network, but additionally consider information centrality and straightness centrality as good candidates to describe the flow.

In this thesis, there will also be an attempt to describe traffic using BC, however with a different augmentation than in previous attempts. In [1] Hellervik calculates activity levels in a city based only on the actual road network. A high activity level of a property would correspond to a high assessed value, which in turn corresponds to economic activity (such as an office or a shopping mall). An area with high activity would in this way generate a lot of traffic (this is not necessarily true; for example, location can have a big effect on the assessed value without actually affecting the generated traffic). If this hypothesis holds, the activity level could be used to augment the betweenness centrality to reflect the amount of traffic.
The path between two nodes with high activity then contributes more to the overall betweenness centrality than the path between two nodes with low activity. To summarize, the idea of this thesis is to use the betweenness centrality of a traffic network and augment it with the activity calculated by Hellervik to create a model that describes traffic congestion.

2 Theory

2.1 Betweenness centrality and activity

Betweenness centrality is one of many ways to measure how 'central' a node or edge in a network is. The betweenness centrality of an edge v (or a node) is calculated as

C_{BC}(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}},    (2.1)

where σ_{st}(v) is the number of shortest paths from node s to node t passing through edge v and σ_{st} is the total number of shortest paths (of the same length) from node s to node t. In words, one calculates the shortest path(s) from all nodes to all other nodes. When a shortest path passes an edge, the edge is given the value +1. Since there can be several shortest paths between two nodes, not all traversing the same set of edges, the denominator makes sure that each pair of nodes contributes only 1 to the total betweenness centrality.

The edges with a high BC are then very central edges, with a high probability of being traversed when going between two nodes in the network. The idea in this thesis is that the betweenness centrality reflects, and should be in proportion to, the traffic flow. However, it is easy to tell that it will not work straight out of the box. In the betweenness centrality in eq. (2.1), all nodes are equal, which should not be the case. A path between two nodes that generate a lot of traffic, i.e. two busy intersections in the center of any city, should contribute more to the flow of traffic than a path between two houses in the suburbs. This suggests that the BC needs to be augmented in some way; this is where the activity comes in.

2.2 Activity

The activity is calculated by first dividing the map into cells.
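Before any augmentation, the plain edge betweenness centrality of eq. (2.1) can be computed directly with NetworkX. A minimal sketch on a toy graph (the node names and travel times are illustrative, not taken from the Gothenburg network):

```python
import networkx as nx

# Toy undirected network; edge weights are travel times in seconds
# (illustrative values only).
G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 30.0), ("b", "c", 30.0),   # fast corridor
    ("a", "d", 60.0), ("d", "c", 60.0),   # slower detour
])

# Unnormalized edge betweenness over weighted shortest paths,
# i.e. eq. (2.1) evaluated for every edge.
bc = nx.edge_betweenness_centrality(G, weight="weight", normalized=False)

# The a-b-c corridor carries the unique shortest a-c route, so its
# edges score higher than the detour via d.
print(bc)
```

Note that the tie between the two equally long b–d routes is split, each path contributing 1/2 to the edges it traverses, exactly as the denominator in eq. (2.1) prescribes.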
This is done by simply putting a mesh grid over the traffic network. The cells that are close enough to any edge in the network are then connected to the network. The cells are connected by adding a virtual road from the center of each cell to the closest edge in the network. The activity for each cell is then basically calculated as an eigenvalue centrality. Note that cells that are deemed to be 'too far' from the road network are not used.

Using Dijkstra's algorithm, the travel times between cells, d_{ij}, are obtained. Dijkstra's algorithm calculates the shortest path between two cells, where shortest here means shortest in travel time. The travel time for each edge is taken to be the length of the edge divided by the speed of the edge. The interaction, D_{ij}, between the cells i and j is then calculated for all pairs in the network; it is defined in [1] as

D_{ij} = (1 + a d_{ij})^{-b},    (2.2)

where a and b are constants. As suggested by Hellervik, the constants are set to a = 0.001 and b = 1.5. D_{ij} is basically just an inverted distance, and to avoid unreasonably large values for short distances, 1 is added. The normalized activity a_i is then defined as

a_i = \sum_j \frac{D_{ij} a_j}{\sum_k D_{jk}}.    (2.3)

Since d_{ij} is symmetric, so is D_{ij}. With M_{ij} = D_{ij} / \sum_k D_{ik}, eq. (2.3) can therefore be written as the eigenvalue equation

a_i = \sum_j M_{ji} a_j,    (2.4)

which can be solved for a_i. What is obtained when solving this equation is an activity in each cell that is equal to the sum of the activities in all other cells times their respective normalized interactions. A cell with high interactions with other cells of high activity would in turn also have a high activity.

2.3 Augmenting the BC

When performing the calculations for the BC, only the cells that are used when calculating the activity are seen as traffic generators. So the first augmentation to the BC is that we only take into account the shortest paths between the cells, and not between all nodes in the network. The second tweak to the BC is that it is weighted by the activity in the two nodes.
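The activity computation of eqs. (2.2)–(2.4) can be sketched by power iteration on a handful of cells. The pairwise travel times are made up, and setting the self-interaction to zero is an assumption not specified in the text:

```python
import numpy as np

# Made-up pairwise travel times d_ij (seconds) between four cells
# along a line; symmetric with zero diagonal.
d = np.array([
    [0, 300, 600, 900],
    [300, 0, 300, 600],
    [600, 300, 0, 300],
    [900, 600, 300, 0],
], dtype=float)

a_const, b_const = 0.001, 1.5            # the constants a and b
D = (1.0 + a_const * d) ** (-b_const)    # interaction matrix, eq. (2.2)
np.fill_diagonal(D, 0.0)                 # assumption: no self-interaction

row = D.sum(axis=1)                      # sum_k D_jk for each cell j

# Power iteration for the fixed point of eq. (2.3),
# a_i = sum_j D_ij a_j / sum_k D_jk, renormalized to sum to one.
act = np.full(len(D), 1.0 / len(D))
for _ in range(200):
    act = D @ (act / row)
    act /= act.sum()

# The inner cells interact strongly with everything nearby and end up
# with the highest activity.
print(act.round(3))
```

The renormalization after each step keeps the iteration stable; at the fixed point it has no effect, since eq. (2.3) preserves the total activity.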
In [1] Hellervik gives a formula for the activity flow, p_{ij}, in the network as

p_{ij} = \frac{D_{ij} a_i}{\sum_k D_{ik}} = M_{ij} a_i.    (2.5)

Comparing with eq. (2.3), p_{ij} is just the contribution of activity from node i to node j. This value is then used in the betweenness centrality equation (2.1):

C_{BC}(v) = \sum_{s \neq v \neq t} \frac{p_{st} \, \sigma_{st}(v)}{\sigma_{st}}.    (2.6)

This takes into account both the activity in the origin and the destination (since the flow from j to i is also added), but also the length of the shortest path. This is supposed to reflect that there should be a higher flow of traffic between two nearby nodes with high activity than between two nodes with low activity that are far away from each other.

2.4 Traffic flow

With the augmented BC calculated, a traffic flow is then spread out over the network. As there is no inherent amount of traffic in a network, a way to define a total number of trips in the network is needed. The total number of trips per hour is set to a factor of the population of the city in question:

F_{city} = \beta_{trips} \cdot P_{city},    (2.7)

where F_{city} is the total number of trips per hour in the city, β_{trips} is the 'trip factor' and P_{city} is the population of the city. The trips are then spread out over the edges according to the betweenness centrality of the edges.

While this is a very easy way to get an amount of traffic, there are some drawbacks to defining it this way. The percentage of drivers in Gothenburg would (probably) be lower than in an arbitrary city in the U.S. and, in the same way, higher than in a poorer town somewhere in a less developed part of the world. Another drawback is that the flow is static; in real life the traffic flow varies a lot during the day. To get good results one would have to use different amounts of flow during different parts of the day. Furthermore, the flow of traffic can in itself change the total amount of traffic.
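Eq. (2.7) and the spreading of trips over the edges can be sketched in a few lines. The population, trip factor and BC scores are illustrative, and spreading the trips in proportion to each edge's share of the total BC is one plausible reading of the description above:

```python
# Assumed illustrative population and trip factor; eq. (2.7) gives the
# total number of trips per hour, F_city = beta_trips * P_city.
population = 580_000
beta_trips = 0.06
total_trips = beta_trips * population

# Hypothetical augmented BC scores for three edges; trips are spread
# in proportion to each edge's share of the total BC.
bc = {"e1": 4.0, "e2": 1.0, "e3": 0.5}
bc_sum = sum(bc.values())
flow = {e: total_trips * score / bc_sum for e, score in bc.items()}

print(flow)  # the most central edge carries most of the flow
```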
If taking the car means getting stuck in congestion on the way to work, people might choose different transport alternatives. But this is, for now, of less importance, since the goal of this project is to try to find congestion behaviours using BC, not to find a good way to estimate the number of cars on the road.

Figure 2.1: The VQ-relations show empirical data on how fast people actually drive depending on the flow of cars. As the flow passes certain breakpoints the speed goes down. Note that even when there is no traffic the speed is lower than the allowed speed limit.

2.5 VQ-relations

Another crucial part of the model is the so-called VQ-relations. Trafikverket has measured 'actual' speeds on different kinds of roads [2][3]. The 'actual' speed of a road obviously depends on the allowed speed, but also on what kind of road it is and where the road is situated. People tend to drive faster than the speed limit on roads in the countryside, while they drive slower than the speed limit in urban areas. Furthermore, the actual speed of a road depends on the amount of traffic. Each road type has a certain capacity with certain breakpoints. If the flow is below the first breakpoint, every car can go at maximum speed. As mentioned before, the maximum speed is not necessarily the speed given by the road signs (it is often lower but in some cases also higher). After passing the first breakpoint, the traffic is so heavy that it starts to affect the driving speed. At first it is just a little bit slower, but with each breakpoint passed the speed gets lower and lower, until after the last breakpoint the speed is set to 10 km/h, see figure 2.1. Trafikverket has VQ-relations for a number of road types depending on the number of lanes, sight conditions, speed and whether the road is in a rural or urban area. The classification in OSM is, unfortunately, a bit cruder. The roads in OSM are classified only by name as, for example, 'motorway', 'residential' or 'primary'.
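The breakpoint behaviour can be sketched as a simple step function. The breakpoints and speeds below are made-up stand-ins, not values from Trafikverket's actual VQ tables:

```python
def vq_speed(flow, breakpoints, speeds):
    """Driving speed (km/h) for a given flow (vehicles/h).

    `breakpoints` are the flow levels at which the speed drops and
    `speeds[i]` is the speed up to `breakpoints[i]`; past the last
    breakpoint the speed is set to 10 km/h, as in the VQ-relations.
    """
    for bp, v in zip(breakpoints, speeds):
        if flow <= bp:
            return v
    return 10.0

# Hypothetical curve for a two-lane city street with a 50 km/h limit;
# note the free-flow speed of 45 km/h is below the posted limit.
bps = [800, 1200, 1600]
vs = [45.0, 38.0, 25.0]

print(vq_speed(500, bps, vs))   # free flow
print(vq_speed(1400, bps, vs))  # between breakpoints
print(vq_speed(2000, bps, vs))  # past the last breakpoint
```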
In the model, each piece of road from OSM is mapped to the VQ-relation that seems to fit it best. This mapping is done by hand and, because of the discrepancy in how the road types are defined, it is a bit rough. The mapping is based on the speed of the edge and the position of the road. For example, a road classified as 'residential' by OSM would get the VQ-relation 'V-Q centrumområde 2-fält Citygata' for the speeds 40 km/h and 50 km/h and 'V-Q mellanområde 2-fält Tangent' for 60 km/h and 70 km/h. The other road types in OSM are approximated in the same way but with different VQ-relations.

2.6 Algorithm and mixing betweenness centrality

With these pieces we can form the algorithm for the model. It goes as follows:

1. Calculate the shortest paths between nodes using Dijkstra's algorithm.
2. Calculate the activity in the different cells.
3. Calculate the betweenness centrality given by the cells and weight it by the activity.
4. Mix the new betweenness centrality with the previous one to create a mixed betweenness centrality (optional).
5. Calculate an estimated number of trips on each edge with the population and the augmented betweenness centrality.
6. Update the speeds on the edges according to the VQ-relations.
7. Start over from 1 with the new speeds.
8. Compare the travel times to Google Maps.

Step 4 is introduced to keep the traffic flows from changing too fast. With a constant 0 ≤ α < 1, the newly calculated betweenness centrality B_new in each iteration is mixed with the previous betweenness centrality B_prev to form a mixed betweenness centrality B_mixed:

B_{mixed} = \alpha B_{prev} + (1 - \alpha) B_{new},    (2.8)

where α = 0.24 as suggested by [8]. This value is then used as the new betweenness centrality for the following calculations.

2.7 Checking the results

In the last part of the algorithm, when the new speeds of the edges have been calculated, it is time to check the results.
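The mixing of step 4 (eq. 2.8) is a one-line update per edge. A minimal sketch with hypothetical BC values:

```python
ALPHA = 0.24  # mixing constant alpha, the value suggested by [8]

def mix_bc(bc_prev, bc_new, alpha=ALPHA):
    """Eq. (2.8): mix the previous and newly calculated BC per edge."""
    return {e: alpha * bc_prev[e] + (1.0 - alpha) * bc_new[e]
            for e in bc_new}

# Hypothetical BC values for two edges in successive iterations.
prev = {"e1": 100.0, "e2": 20.0}
new = {"e1": 60.0, "e2": 35.0}

mixed = mix_bc(prev, new)
print(mixed)  # e1 moves toward 60 but keeps 24% of its old value
```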
This is done by first selecting a number of pairs of start and end points in the network and calculating both the travel times and lengths between these nodes. This is then compared to what Google Maps reports for the same pairs of start and end points. By comparing both the baseline model (with unaltered speeds) and each iteration with Google Maps, one can see how the model develops. With the Google Maps Distance Matrix API [12], both the travel time and length of a given route are easily available. Google Maps was chosen for its availability and flexibility (you can compare any routes in the network).

There are some problems with comparing with Google Maps. We cannot control how the data is gathered nor which parameters are taken into account. In the simple model of this report, no heed is taken to traffic lights, intersections or acceleration, which are taken into account to some extent by Google Maps. The simulated data will probably not match Google Maps' exactly, even with a perfect model, but comparing with Google Maps should give an indication of whether it is correct or not. Keep in mind that Google Maps' data is not perfect either.

The comparison with Google Maps is done in two ways. The first one is straightforward; it is just the average ratio of the travel time of each route, calculated as

C = \frac{1}{N} \sum_{ij} \frac{\text{Calculated}(i,j)}{\text{Google}(i,j)},    (2.9)

where N is the total number of pairs being compared. The average ratio shows how much, on average, the travel time of each route has improved. This has the drawback that we do not know in what way it is improving or getting worse. Increasing the travel times of some routes a lot while decreasing those of others could give the same result as increasing the travel times of all routes a little. To be sure this is not the case, another correlation is introduced: the Pearson correlation.
The Pearson correlation, r, measures the linearity between two sets of data and is defined as

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},    (2.10)

where x, y are the two datasets, x̄, ȳ are the averages of each dataset and n is the number of data pairs. Here r = 1 corresponds to total positive linearity, r = 0 to no correlation at all and r = −1 to total negative linearity. A perfect model compared with the true travel times of the routes would give r = 1.

Note that the Pearson correlation does not say anything about the actual difference between the two datasets, just that there is a linear relation. So r = 1 does not mean that there is a 1:1 relation between the datasets; this is why we need the first measurement. Of the two measurements, the Pearson correlation is the most important, since it says more about the overall behaviour of the network.

2.8 Cost

In the previous attempts at using BC to describe traffic flow, different augmentations of the BC [7][8] have been suggested, or rather of how it is calculated. When calculating the shortest paths between nodes, one can, instead of using distance, define a cost to go between nodes. There are different ways to define this, for example the actual price in money you have to pay for fuel etc., or the number of intersections and stoplights. There are also more abstract ways of defining the price, reflecting that it does not actually matter how short a road is, the only thing that matters is how it is perceived. One example is using an angular distance, where a high price corresponds to making a lot of sharp turns. Another is just the number of turns themselves: people would rather continue straight even when it is more efficient to make a couple of turns. It is simply easier to just continue straight.
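The two comparison measures of section 2.7, eqs. (2.9) and (2.10), can be sketched as plain functions. The travel times below are made up, chosen to mimic routes that are nearly linear in the Google times but on average too short:

```python
import math

def average_ratio(calc, google):
    """Eq. (2.9): the mean ratio of calculated to Google travel times."""
    return sum(c / g for c, g in zip(calc, google)) / len(calc)

def pearson(x, y):
    """Eq. (2.10): Pearson correlation between two datasets."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (math.sqrt(sum((a - mx) ** 2 for a in x))
           * math.sqrt(sum((b - my) ** 2 for b in y)))
    return num / den

# Made-up travel times (seconds) for three routes.
calc = [300.0, 620.0, 880.0]
goog = [400.0, 800.0, 1200.0]

print(round(average_ratio(calc, goog), 3))  # well below 1: too fast
print(round(pearson(calc, goog), 3))        # close to 1: nearly linear
```

This pair of numbers illustrates why both measures are needed: a high r says nothing about the average ratio being near 1.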
In this thesis, the travel time is used as a cost, both because it is a logical way to measure distance and because of the limited information from OSM, as there is little information about intersections and angles.

3 Method and data

The model is implemented in Python. For convenience and relevance, the simulation is, for the most part, performed on Gothenburg. The network is imported into Python with the package OSMnx [13] as a NetworkX graph [14]. The activity is then calculated, followed by the betweenness centrality. The betweenness centrality is then weighted by the activity and, if used, mixed with its predecessor. A traffic flow is then spread out according to the betweenness centrality, and new speeds for the roads are obtained according to the VQ-relations. This is then repeated for a number of iterations or until convergence is reached. Finally, the result is compared to Google Maps.

3.1 Settings

The model is tried out for five different settings:

1. Static: the activity is calculated once and is not updated in each iteration.
2. Mixed: the resulting BC in each iteration is mixed with the BC in the previous iteration.
3. Static and mixed.
4. Non-static and non-mixed.
5. No activity and no mixing (only betweenness centrality).

Each setting is executed for five iterations with a range of different numbers of total trips to find the best fit. Each setting was evaluated with the same paths given by Google Maps. By comparing how the relations with Google Maps develop in each iteration compared to the baseline model, one can see if the model improves or not. The measurements are done three times: once for rush hour traffic in the morning, once for rush hour in the afternoon and once in the middle of the night, when there should be no congestion. For reference, the model is also tried out for the cities of Lund, Sweden, and Verona, Italy.

3.2 Data

The traffic network is taken from OpenStreetMap (OSM) [15].
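The iteration loop described above can be sketched end to end on a toy NetworkX graph standing in for the imported OSM network. The lengths, speeds, per-edge capacities and trip total are invented, a single capacity breakpoint replaces the full VQ tables, and the activity weighting of the BC is left out:

```python
import networkx as nx

# Toy network standing in for the OSM import: edges carry a length (m),
# a base speed (km/h) and one VQ-style capacity breakpoint (vehicles/h).
G = nx.Graph()
for u, v, length, speed, cap in [
    ("a", "b", 500, 50, 4000), ("b", "c", 500, 50, 4000),   # city street
    ("a", "d", 800, 70, 8000), ("d", "c", 800, 70, 8000),   # bigger road
]:
    G.add_edge(u, v, length=length, base_speed=speed, speed=speed,
               capacity=cap)

TOTAL_TRIPS = 15_000  # assumed F_city for this toy network

for iteration in range(3):
    # travel time in seconds is the edge length over the current speed
    for u, v, d in G.edges(data=True):
        d["time"] = d["length"] / (d["speed"] / 3.6)
    # shortest-path betweenness over current travel times; the real
    # model additionally weights each path by the activity flow p_st
    bc = nx.edge_betweenness_centrality(G, weight="time",
                                        normalized=False)
    total_bc = sum(bc.values())
    for (u, v), score in bc.items():
        flow = TOTAL_TRIPS * score / total_bc
        # one-breakpoint stand-in for the VQ relations
        G[u][v]["speed"] = (10 if flow > G[u][v]["capacity"]
                            else G[u][v]["base_speed"])

print({e: G.edges[e]["speed"] for e in G.edges})
```

With these numbers the corridor over b congests, traffic shifts to the detour over d, and the corridor then recovers, so the toy network flips between two states instead of converging; damping exactly this kind of swing is what the mixing step of eq. (2.8) is for.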
OSM is a crowd-sourced map where users themselves can add data. OSM was chosen for its convenience, being both free and easy to access. To import the traffic network, the Python module OSMnx was used [13]. OSMnx imports a network from OSM into Python so that one can use the Python package NetworkX [14]. NetworkX is a package for graph theory with a lot of built-in functions.

While the VQ-relations themselves are taken from empirical data, how to classify each road to align with the classification of Trafikverket is not trivial. OSM is in some aspects very accurate and there is data on all kinds of roads, from small walking paths to highways. While data such as distance and speed limits are accurate and easy to interpret, there are other aspects that are harder to use straight out of the box. For example, the lane counts can be off. At an intersection it is common that a road that previously had one or two lanes splits into three or even four lanes. As a result, the whole edge gets three or four lanes. The number of lanes is approximated to fit the VQ-relations, which can cause some errors. Also, the classification of different roads differs; since the map is editable, what some people would classify as a certain road type, other people would call something else. This, together with the fact that OSM and Trafikverket classify roads differently, also affects the chosen VQ-relations. Another thing to keep in mind is that, as the model is implemented now, it is possible to drive in both directions on all roads. While this is not a problem overall, some roads will be wrongly classified. Furthermore, this can cause problems on one-directional roads, such as on- and off-ramps of a highway or in a roundabout.

4 Results

When comparing the use of a static activity and/or mixing the BC, the best results were obtained when mixing, but it mattered less whether the activity was static or not. Only for high amounts of traffic was it possible to see a difference.
In the following results, the model with static activity and mixed BC was used. The runtime for each iteration was of the same order as that of calculating the activity.

4.1 Correlations

In Gothenburg, the model gave the best results when comparing with Google Maps on a Tuesday at 16:30. The resulting relations can be seen in tables 4.1a and 4.1b. Looking at the relations, increasing the traffic flow causes the average ratio to go up, giving a better relation with the travel times given by Google Maps. However, it is not the same for the Pearson correlation. For small amounts of traffic, the Pearson correlation is more or less unaffected while the average ratio improves. But as the traffic increases, the Pearson correlation goes down. It seems there is a trade-off in the model between a good average ratio and a good Pearson correlation. The best results were obtained for the third iteration with a trip factor of β = 0.06. In figure 4.1, the calculated travel times are plotted against the travel times given by Google Maps. The orange line is the 1:1 relation while the green line is fitted to the data. For shorter routes the relation is highly linear, but as the length of the routes increases the travel times become more spread out. After a couple of iterations, most of the travel times have moved towards the orange line, giving a better average ratio. But, since some travel times have not changed, the Pearson correlation suffers.

4.2 Traffic flows

A graphical comparison of the congestion in the network on a Tuesday at 16:30 in Gothenburg can be seen in figures 4.2 and 4.3. In the figures, red corresponds to
In the figures, red corresponds to 13 Gothenburg, Iteration Tuesday, 16:30 0 1 2 3 4 5 T ri p fa ct or 0.02 0.557 0.568 0.568 0.568 0.568 0.568 0.04 0.557 0.576 0.572 0.574 0.573 0.575 0.06 0.557 0.596 0.583 0.584 0.584 0.583 0.08 0.557 0.63 0.591 0.598 0.597 0.599 0.12 0.557 0.693 0.647 0.652 0.621 0.678 0.14 0.557 0.724 0.679 0.665 0.665 0.692 0.16 0.557 0.753 0.717 0.693 0.688 0.719 0.18 0.557 0.774 0.753 0.701 0.752 0.755 0.20 0.557 0.798 0.785 0.713 0.742 0.74 (a) Average ratio Gothenburg, Iteration Tuesday, 16:30 0 1 2 3 4 5 T ri p fa ct or 0.02 0.855 0.854 0.854 0.854 0.854 0.854 0.04 0.855 0.854 0.853 0.853 0.854 0.854 0.06 0.855 0.844 0.845 0.853 0.844 0.852 0.08 0.855 0.802 0.842 0.834 0.829 0.833 0.12 0.855 0.748 0.795 0.765 0.802 0.768 0.14 0.855 0.733 0.777 0.767 0.79 0.769 0.16 0.855 0.73 0.768 0.762 0.785 0.75 0.18 0.855 0.728 0.737 0.756 0.778 0.692 0.20 0.855 0.72 0.729 0.75 0.765 0.735 (b) Pearson correlation Table 4.1: The tables show the different correlations obtained for the model on a tuesday at 16:30 in Gothenburg. The average ratio increases with the traffic but the Pearson correlation goes down after the flow factor reaches β = 0.06. 14 0 500 1000 1500 2000 2500 3000 3500 4000 Google Maps [s] 0 500 1000 1500 2000 2500 3000 3500 4000 Ca lcu la te d tra ve l t im es [s ] Travel times (trip factor: 0.06) Iteration: 0 Iteration: 4 Figure 4.1: The figure shows the calculated travel times plotted against the ones given by Google Maps. The orange line is the 1:1 relation while the green line is fitted to the data. Most of the data points move towards the orange line after a couple of iterations. that the flow has passed a high breakpoint in the VQ-relations. While the model seems to predict some congestion quite well, especially along some of the bigger roads, it seems to do a bad job at predicting the congestion in the inner city. For example, Korsvägen is marked red by Google Maps while in the model it is barely affected at all. 
One thing to note in figure 4.2 is the congestion along highway E6 and from Gullbergsvass to Majorna, both roads passing 'through' Gothenburg. Google predicts heavy traffic while the model predicts very little. Figure 4.4 shows the betweenness centrality calculated for Gothenburg. It is worth noting how different it looks compared to the predicted congestion in figures 4.2 and 4.3. A high flow of traffic does not necessarily lead to congestion.

4.3 No activity

To test the hypothesis that using the activity to augment the betweenness centrality would give better results, a run was made without using the activity. The result can be seen in table 4.2. As when increasing the amount of traffic, the average travel time improves when the activity is not used, but the linearity goes down. The Pearson correlation is higher when using the activity.

Figure 4.2: The figures show the congestion given by Google Maps (left) for central Gothenburg on a Tuesday at 16:30 and the best corresponding results given by the model (right). The colors in the figure to the right correspond to the breakpoints in the VQ-relations, red being the worst. The model captures some behaviours: looking at the 'bend' in the north it seems to work quite well, while in the more central parts in the south it does not work as well. Furthermore, the model fails to capture the behaviour along highway E6.

Figure 4.3: As in figure 4.2, the figures show the congestion given by Google Maps (left) for central Gothenburg on a Tuesday at 16:30 and the best corresponding result given by the model (right). The model catches some congestion well while it misses others.

Figure 4.4: The figure shows the resulting betweenness centrality in Gothenburg, where red means a high value. The BC predicts that there will be heavy traffic in Tingstadstunneln. A high flow of traffic does not necessarily lead to congestion.

Table 4.2: The table shows the correlations obtained with and without using the activity to augment the betweenness centrality (Gothenburg, Tuesday, 16:30). While the average ratio improves when only using the BC, the linearity is better when using the activity. A trip factor of β = 0.06 was used.

                     Iter 0  Iter 1  Iter 2  Iter 3  Iter 4  Iter 5
Average ratio
  Act.               0.557   0.596   0.583   0.584   0.584   0.584
  No act.            0.557   0.639   0.591   0.613   0.597   0.605
Pearson correlation
  Act.               0.855   0.844   0.845   0.853   0.844   0.852
  No act.            0.856   0.802   0.835   0.831   0.822   0.821

4.4 Convergence and mixing

Looking at tables 4.1a and 4.1b, the speeds of the roads do not seem to converge. Even for longer runs, as in table 4.3, the model does not reach convergence; instead, it oscillates between two levels. The mixing step was introduced to encourage convergence and clearly does not work as intended. It did, however, give better results compared to not mixing.

Table 4.3: To see if the model reaches convergence, a longer run was made with a trip factor of 0.06. Not even for longer runs does it reach convergence; instead it seems to oscillate between two levels. For this run the activity was static and the BC was mixed.

Iteration   13     14     15     16     17     18     19     20
            0.853  0.843  0.853  0.845  0.853  0.843  0.853  0.845

4.5 VQ-relations

To see how well the VQ-relations work, runs were made where only the speeds were updated according to the VQ-relations, with no traffic; see table 4.4.

Table 4.4: The table shows the measured relations for the baseline model and after the speeds have been updated with the VQ-relations. There is a slight improvement both in rush hour and in the middle of the night. Note that no traffic is introduced in either case.

             Average ratio        Pearson correlation
             Baseline   VQ        Baseline   VQ
Afternoon    0.557      0.567     0.856      0.853
Night        0.743      0.756     0.956      0.960

Both in the afternoon and in the middle of the night the average travel time improves.
For the Pearson correlation, we only see an improvement at night, while it gets a bit worse in the afternoon.

4.6 Other cities

The model was also tried on the cities of Lund, Sweden, and Verona, Italy. The best runs can be seen in table 4.5. The model gives good results for Lund, while for Verona it gives results similar to Gothenburg. Note that the average ratio before the first iteration is the same for Lund and Gothenburg while it is a bit lower for Verona.

Table 4.5: The model seems to work well for Lund: there is improvement in both the average ratio and the Pearson correlation. For Verona the results are similar to those of Gothenburg. The trip factor was β = 0.18 for Lund and β = 0.06 for Verona (Monday, 16:30).

                     Iter 0  Iter 1  Iter 2  Iter 3  Iter 4  Iter 5
Average ratio
  Lund               0.557   0.606   0.583   0.602   0.589   0.601
  Verona             0.458   0.557   0.513   0.543   0.51    0.545
Pearson correlation
  Lund               0.822   0.81    0.838   0.81    0.839   0.811
  Verona             0.892   0.874   0.891   0.874   0.89    0.873

5 Discussion

The model gave some mixed results. In most cases, the model did not show any improvement compared to the baseline model, and when it did, the improvement was only slight. That being said, there are still things to point out and conclusions that can be drawn to improve future work.

5.1 Baseline model

First of all, one thing to note is the good correlation obtained when just using the speeds given by the network. While the travel times themselves are a bit off, the linearity is high. This suggests that perhaps an advanced traffic model is not needed to get an accurate activity model: one can instead just decrease all the speeds by a constant factor. In this way the linearity is conserved while the travel times better match those given by Google Maps. This is a lightweight solution and is easy to implement, but it has the drawback that the constant varies over the day and would be different from city to city (and even from area to area within a city).
We saw in table 4.5 that Verona would need a different constant than Gothenburg and Lund (which suggests that traffic congestion is a bigger problem in Verona than in Gothenburg and Lund). If one continuously works with the same city, this could be a good way to make the traffic more realistic. And as long as one is working with urban areas, even a moderate decrease in speeds could still improve the activity model.

5.2 Time of day and asymmetrical flow

As mentioned before, there is no real time dependence in the model. The idea is that there should be a relation between the time of day and the amount of traffic (the trip factor): the total number of trips should peak in the morning when people go to work, drop during the day, and then increase again when people go home. In the model, this is controlled by changing the amount of traffic, but the model still gives better results in the afternoon than in the morning.

The different results probably come from the fact that the flows during the day are asymmetrical. As the model is built, the flow is not biased. In real life, one goes to work in the morning and from work in the late afternoon. This causes different traffic flows, and subsequent congestion, on different sides of the bottlenecks. The model takes no heed of this: the flows are constant and the same in both directions. In combination with the use of an undirected network, this poses a problem. Making the network directed could probably improve this; in many cases one would effectively get two separate networks. The traffic junctions at motorways would also be affected for the better, as the flow would be more realistic, and traffic circles would benefit in the same way. As mentioned before, the model does not capture the congestion at Korsvägen, which is a real problem.
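The directed-network change suggested above is cheap to prototype with NetworkX; a sketch on a toy graph (the nodes and edge attributes are invented, not the OSM network used in the thesis):

```python
import networkx as nx

# Toy undirected road network with free-flow speeds on the edges
G = nx.Graph()
G.add_edge("A", "B", length=500, speed=50)
G.add_edge("B", "C", length=300, speed=30)

# to_directed() replaces every undirected edge with two opposing
# directed edges that carry copies of the edge attributes, so each
# direction can later hold its own flow and its own congested speed
D = G.to_directed()
```

After this, a morning run could load the "into town" direction more heavily than the "out of town" direction, which the undirected model cannot express.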
Another reason for the discrepancy between morning and afternoon can be attributed to the fact that office buildings can have a very high assessed value without anyone living there. An office works as a traffic consumer in the morning but as a generator in the afternoon. The model superimposes these behaviours and seems to fit better in the afternoon.

One solution is to classify the activity into different categories in some way. If one could distinguish between what is an office and what is an apartment building, it would be possible to implement a directed flow of traffic. In this way, one could also get a time dependence in the model.

5.3 VQ-relations

Updating the base speed on the edges according to the VQ-relations does seem to improve the correlations, see table 4.4. This is expected, as the relations are empirical data from Trafikverket. The gain is not great, but updating the speed on each edge according to the VQ-relations is computationally light and could easily be implemented in Hellervik's model.

While it is hard to tell how much of a difference the VQ-relations make in the rest of the model, these results suggest that the VQ-relations hold and that the fault lies in the modelled traffic rather than in the VQ-relations. As stated above, there is a certain discrepancy between how Trafikverket and OSM classify their edges. As a result, the edges in the model are approximated to some extent. Making a more detailed translation of the edge classification would probably improve the correlations.

It is good that the model improves a little when just using the VQ-relations, but the main reason for using the relations is how the speed drops with an increased traffic flow. This behaviour is key to updating the speeds and getting a dynamic traffic flow.

5.4 Traffic flows

Looking at figures 4.2 and 4.3, the model captures some aspects of the traffic flow. The model predicts that there should be very heavy traffic in the 'bend' north of the river, as does Google Maps.
Furthermore, to the west, just above where it says 'Oceanen', both Google Maps and the model say there should be traffic congestion. Along highway E6 there is a lot of traffic according to Google Maps, but this is not captured at all by the model. This can probably be attributed to the fact that there is a lot of through traffic in real life: much of the traffic along the west coast passes through Gothenburg, while all the traffic in the model is generated and 'consumed' inside Gothenburg. Adding some kind of flow through the city would handle this problem and give a more realistic traffic flow. In the same way, increasing the computational area would probably improve the flows in central Gothenburg. Unfortunately, the computations were done on a computer that can only handle networks roughly the size of Gothenburg.

While the model captures some aspects of the traffic flow, large areas of the inner city are unaffected in the model. The model does not add any penalty for stop lights or intersections, which does not pose a big problem when driving on a highway but is highly relevant when traversing the inner city. Implementing some kind of speed penalty for traffic lights and intersections could take care of this. Changing the cost to take into account the number of turns, as suggested by [7], would also change the flow in the central parts of the city, where there are a lot of intersections.

5.5 Using Google Maps as the correct traffic

Google Maps was used for its convenience and availability, not because it represents the truth. While it is important to bear this in mind, it still gives a good indication of how the modelled traffic flows compare to the real flows. It is possible that using a different source for the travel times would change the results in section 4, but one would have to give up the flexibility of the Google Maps data, where virtually any two points can be compared.
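For reference, fetching such travel times from the Distance Matrix API [12] boils down to one HTTP request per batch of origin/destination pairs. The sketch below only builds the request URL and parses an already-fetched response; the helper names are illustrative, and no network call is made here.

```python
import urllib.parse

BASE = "https://maps.googleapis.com/maps/api/distancematrix/json"

def build_request(origins, destinations, api_key, departure_time="now"):
    """Build a Distance Matrix API request URL. Setting departure_time
    makes the response include traffic-aware durations."""
    params = {
        "origins": "|".join(origins),
        "destinations": "|".join(destinations),
        "departure_time": departure_time,
        "key": api_key,
    }
    return BASE + "?" + urllib.parse.urlencode(params)

def travel_times(response):
    """Extract travel times in seconds from a parsed JSON response,
    preferring the traffic-aware duration when present."""
    times = []
    for row in response["rows"]:
        for el in row["elements"]:
            d = el.get("duration_in_traffic", el["duration"])
            times.append(d["value"])
    return times
```

The flexibility mentioned above comes precisely from `origins` and `destinations` accepting arbitrary coordinate pairs, so any two points in the network can be compared against the model.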
5.6 Activity

The motivation for using the activity to augment the betweenness centrality is twofold. The first reason is simply that the activity is the main thing being calculated in Hellervik's model, and it would be neat to have something that depends on the activity and can be updated in each iteration. It is therefore natural to try to use the activity. Furthermore, the activity should correspond to the assessed value of a property. The assessed value depends, among other things, on the size of the property. The model is based on the assumption that this corresponds to population size and that traffic is generated accordingly. Using the activity, one can get around the problem that each node does not generate or consume the same amount of traffic.

There are some problems with this. For example, houses close to the sea are usually quite expensive and have a high assessed value while not housing many people. Another example is a central office building or school, where the assessed value can be high even though no one lives there. As mentioned before, it is possible that this contributes to the discrepancy we see when comparing the traffic flows in the morning and the afternoon. In the model, offices would generate traffic all through the day, while in reality they generate traffic almost only when people are going home from work.

The activity used in the model is a simplified one; Hellervik uses a more advanced model that gives a more accurate activity. The simplified model is used to avoid singularities and ensure convergence of the activity, which is not always guaranteed in the more advanced model. Using the more advanced model could give a better traffic flow.

6 Conclusions and future work

There is very little input data in the model, only the road network and a population, and the model still manages to capture some real traffic behaviours.
While there is still room for improvement, the results obtained in the report show that betweenness centrality can indeed be used to model traffic. The model does a good job of capturing some aspects of the traffic flow while it has a hard time capturing others, and there are certain areas that can be improved to get better results. These areas can be divided into two parts, namely how to model the traffic network and how to perform the calculations.

When it comes to the network, one should take in a larger network than just the city itself, since the traffic in a city is not a closed system. In addition, some through traffic should be added. This is especially important for a city like Gothenburg, whose geographical setting means that all traffic on the west coast passes through the city. Furthermore, the network should be directed, and having more exact road types could give better results. Introducing some kind of penalty for intersections and turns is also a future prospect.

For the calculations, trying out different ways to define the shortest path, using topological or angular distance instead of metric distance, could improve the model. Introducing one or more other centralities, such as information centrality or straightness centrality as suggested by Porta et al. [10][11], is also something that could be explored. An inherent time dependence is also needed to get a robust model. One way of achieving this is to have different activity types: if one could distinguish what is an office and what is a home, the flows could be made directed and also time dependent.

Bibliography

[1] A. Hellervik. Statistics and dynamics in urban modelling. Unpublished manuscript, Chalmers, Gothenburg, Sweden, 2017.
[2] Trafikverket, Borlänge. Bygg om bygg nytt: Bilaga 4.1 VQ-samband 2014 Landsbygd, 2014.
[3] Trafikverket, Borlänge. Bygg om bygg nytt: Bilaga 4.2 VQ-samband 2014 Tätort, 2014.
[4] Trafikverket, Borlänge.
Sampers trafikprognoser - en kort introduktion, first edition, 2015.
[5] I. M. L. N. Jayaweera, K. K. K. R. Perera, and Jayantha Munasinghe. Centrality measures to identify traffic congestion on road networks: A case study of Sri Lanka. 13:13–19, April 2017.
[6] Jan Scheurer, Carey Curtis, and Sergio Porta. Spatial network analysis of public transport systems: Developing a strategic planning tool to assess the congruence of movement and urban structure in Australian cities, January 2007.
[7] A. Jayasinghe, K. Sano, and H. Nishiuchi. Explaining traffic flow patterns using centrality measures. International Journal for Traffic and Transport Engineering, 5(2):134–149, 2015.
[8] Y. Altshuler, R. Puzis, Y. Elovici, S. Bekhor, and A. Pentland. Augmented betweenness centrality for mobility prediction in transportation networks. Finding Patterns of Human Behaviors in Network and Mobility Data, 2011.
[9] Aisan Kazerani and Stephan Winter. Can betweenness centrality explain traffic flow? January 2009.
[10] Sergio Porta, Paolo Crucitti, and Vito Latora. The network analysis of urban streets: A primal approach. 33:705–725, February 2006.
[11] Sergio Porta, Paolo Crucitti, and Vito Latora. The network analysis of urban streets: A dual approach. 369:853–866, September 2006.
[12] Google Inc. Google Maps Distance Matrix API. https://developers.google.com/maps/documentation/distance-matrix/, 2017.
[13] G. Boeing. OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks. Computers, Environment and Urban Systems, 65:126–139, 2017.
[14] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference (SciPy2008), pages 11–15, Pasadena, CA, USA, August 2008.
[15] OpenStreetMap contributors. Planet dump retrieved from https://planet.osm.org. https://www.openstreetmap.org, 2017.