Evaluating in-memory caching strategies for distributed web services

Master's thesis in Computer Science and Engineering

Love Lyckaro

Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
Gothenburg, Sweden 2023

© Love Lyckaro, 2023.

Supervisor: Luca Di Stefano, Department of Computer Science and Engineering
Advisor: Korp Thidrandir, Antura AB
Examiner: Alejandro Russo, Department of Computer Science and Engineering

Master's Thesis 2023
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000

Typeset in LaTeX
Gothenburg, Sweden 2023

Abstract

Caching is an incredibly common component in modern computer engineering. Present everywhere, caches benefit greatly from domain-specific knowledge. This thesis, in collaboration with Antura AB, targets caching between distributed web services and their database. The state of the art in this caching domain is to have a shared, networked, in-memory key-value store, such as Redis. This thesis had two goals: firstly, to implement an alternative to this shared cache system using caches distributed on each server of the web application; secondly, to implement a test suite for comparing these cache systems. A common interface was created for these cache systems and both were implemented using F#. A test suite was then created using a sample distributed web application and simulated user requests. The user simulation covered different read-write ratios, domain sizes, and behavior with respect to a commonly requested "shared" domain. The tests found cases where either cache system performed better on average, although the shared cache system performed significantly better as the degree of distribution increased.

Keywords: caching, distributed caching, redis, distributed systems, web caching, database caching, user simulation.

Contents

1 Introduction
2 Background
2.1 General overview of caching
2.2 Database caching using an in-memory key-value store
2.2.1 Consistency models
2.3 Message brokers and the publish-subscribe pattern
2.4 Case: Antura
3 Main idea
3.1 Designing a per-server cache system
3.2 Comparing per-server and shared caching
3.3 Delimitations
4 Methods
4.1 Defining a cache model
4.2 The distributed cache system
4.3 The shared cache system
4.4 Testing and evaluation
4.4.1 System parameters
4.4.2 User simulation
5 Results and Discussion
5.1 Latency vs hit rate
5.2 Impact of prefetching
5.3 Impact of number of nodes
5.4 Impact of shared domain
6 Conclusion
6.1 Future work
Bibliography

1 Introduction

Large memories are slow and small memories are fast. Ideally, one would want a large and fast memory. For computer engineers, combining small and fast memories with large but slow memories in an effective way is a well-known challenge. The usual solution to this problem is to introduce caches that allow one to simulate large and fast memories. In computer systems, caches can be found everywhere: multiple levels of cache separate the CPU from RAM, caches accelerate access to comparatively slow hard drives, and Web caches decrease the load on databases and servers.

Although well researched, caching is far from a solved problem. Every domain brings possibilities for specific cache strategies. This project is one of many that take a look at a specific domain and ask: "is our caching system the best for this use case?". Our domain consists of caching between a distributed Web service and its database.

Adding a cache layer between a database and the servers that access it can have many advantages. It can improve latency for the end user, decrease the load on the database, and reduce the overall network traffic. Approaches in caching between servers can also be used to mitigate similar bottlenecks in Web services, for example, between Web clients and servers.

A common strategy in this niche is to have a networked, in-memory key-value store that is shared between the different instances of the distributed Web application. The major benefit of sharing the cache between servers is that cache retrievals from one server can benefit the others. The drawback is an added latency on every cache lookup and write due to the network.

This thesis explores an alternative to this shared caching system, focusing on the implementation and evaluation of a system where each node of the Web service manages its own in-memory cache. This system decreases read latency, as it does not need to communicate over the network.

To evaluate these caching systems, a benchmarking framework is developed that covers many different domains, with differences in both user and application behavior. This permits us to draw conclusions about under which circumstances we should prefer a distributed cache system over a shared one.

The structure of this thesis is as follows: chapter 2 covers the necessary background, including the basics of caching, caching using in-memory key-value stores, messaging, and our use case. This is followed by chapter 3, which contains a description of the main problems the thesis tries to solve. Chapter 4 gives a more detailed description of the technical solutions made in the thesis, as well as the factors taken into account in performance testing. The results of testing are presented in chapter 5. Finally, in chapter 6 conclusions are drawn about the main goals, as well as some recommendations for future work.
2 Background

This section covers the state of the art in database cache systems, as well as criteria for evaluating cache systems, and models to formally describe what guarantees a distributed cache system can offer. Finally, it covers Antura's Web service as a specific case study for evaluating cache systems.

2.1 General overview of caching

Caches are incredibly common in computer systems, and distributed Web services are no exception. In general, a cache consists of a small and fast memory, where one saves the most commonly used entries from a larger, slower memory; later, one may attempt to retrieve a needed entry from the fast memory before accessing the slow one, thus being able to simulate a large and fast memory.

Caches in many domains share a number of characteristics, and this subsection aims to cover the ones relevant to this project. We will follow the taxonomy of caching systems on the Web proposed by Wang [1]. In this survey the author identifies many relevant features, among which these are the most important for this project:

1. Where should a cache be placed for optimal performance?
2. How do caches cooperate with each other and with the slow memory?
3. Should the cache prefetch data or not?
4. Can the cache maintain data consistency?

The first point is of particular importance for this project and highlights the fact that one can build many different cache architectures, with different pros and cons. For instance, caches closer to the requester of data can improve latency, whilst caches further from the requester can improve opportunities for data sharing.

Regarding point 2, cooperation between caches and the slow memory is not a trivial task. One must identify where writes are first made: to the cache, or to the slow memory? In the first case, the write would need to propagate to other duplicate entries. In the second case, any cached entries of the overwritten value would need to be invalidated. Moreover, when the cache is distributed, these invalidations or updates need to be communicated to each affected cache instance, as well as to the slow memory. Wang also considers a system where writes go directly to the large but slow memory (in Wang's use case, a Web server), which then messages the cache to invalidate the entries. One could also think of a use case where invalidation is combined with updates to the stale entries.

Prefetching of data (point 3) can improve performance, as it decreases mandatory reads from the slow memory. Prefetching is the act of trying to retrieve data that is likely to be needed before it is requested. This can happen either on startup or by fetching additional data when a resource is requested. Data consistency (point 4) is discussed in subsection 2.2.1.

What one should especially consider about these features, and indeed about caching in general, is the need to tailor one's cache system to the use case. Since Wang's paper, the internet has grown significantly. With this growth, new arenas and approaches to caching have come into play, although these four considerations (placement, cooperation, prefetching, and consistency) have all kept their relevance. An example is the recent research into ICN (Information-Centric Networking), which tries to tackle the enormous data throughput needed for modern information distribution, especially for use cases like video streaming. In their survey of caching in ICN, Din, Hassan, Khan, et al. [2] write about tailoring caches for this use case.
Another example of tailoring a cache system to its use case is [3], where YouTube have tailored a memcached-based [4] cache system to their needs using a cache-replacement algorithm based on their users' behavior.

2.2 Database caching using an in-memory key-value store

There are a number of approaches to database caching. Arguably, the most common is to use an in-memory key-value store such as Redis [5] or memcached [4]. Among others, Facebook has used this approach to great effect [6]. The approach relies on having one or more instances of a key-value store between the application and the database; the former is able to serve requests far faster than the latter would. This key-value store is, most often, shared between the different servers, which brings both pros and cons. On one hand, sharing the cache over the network allows cache misses resolved by one server to benefit the others. On the other hand, sharing the cache over the network also brings overhead: requests have to go through the network stack, which can be an order of magnitude slower than local memory access. A simplified structure of this type can be seen in Figure 2.1.

Figure 2.1: A database caching system with a shared in-memory cache.

In their paper Design Considerations for Distributed Caching on the Internet [7], Tewari, Dahlin, Vin, et al. explore the considerations one should make when designing a cache system for the Web. The authors focus on large distributed cache systems, where they identify a number of important best practices:

1. Cache systems should share data among many clients to reduce compulsory misses and scale to a large number of caches.
2. Cache hit time constitutes a significant fraction of the total information access latency. Hence, cache architectures should minimize the cost to access a cache.
3. Even the ideal cache will have a significant number of compulsory and communication misses. Thus, cache systems should not slow down misses.

Interesting for this project is the first point: cache systems should share data among many clients to reduce compulsory misses. In the paper, the authors note that testing on large cache systems reveals that following this rule improves performance. They then use this finding to justify developing a multi-tiered cache system for the Web. However, this system differs somewhat from the ones of interest in this project, both in terms of scale and location of the cache: their caches lie in proxies between end users and Web applications, rather than between the application and the database. An interesting question that then arises is whether the authors' assumption holds in cases like Antura's, or whether it would be more performant to, for example, lower cache request latency at the cost of reduced data sharing.

2.2.1 Consistency models

A question relevant to distributed systems in general, and caches in particular, is what guarantees these systems can uphold. This question has been researched before, and a number of different consistency models have been developed to describe these guarantees. In the paper Eventually consistent [8], W. Vogels reports on the different aspects that have been considered when choosing a consistency model for Amazon's many distributed storage services. Vogels explains the different models through the lens of three different, independent actors A, B and C. These are all processes that try to read from, and write to, some distributed storage system.
The consistency model is defined by when the actors are able to observe changes made to the system by the other actors. Some of the models relevant to this project can be informally described as:

1. Strong consistency: after A updates the storage system, any subsequent access by A, B or C returns the updated value. We may also say that the update is instantly observable.
2. Weak consistency: this model does not guarantee an updated return value after A has updated the storage system. It will usually only guarantee that the updated value becomes observable after some other requirement is met.
3. Eventual consistency: a specific form of weak consistency where an update to the storage system is guaranteed to eventually be returned on subsequent requests from the actors, given that no other updates are made in the meantime.
4. Session consistency: a special case of eventual consistency where requests are viewed through the lens of a session. A series of updates and requests made in the same session is guaranteed to be consistent; however, this guarantee is not upheld when switching sessions.

Of particular interest to this project are models 3 and 4, that is, eventual consistency and session consistency. Strong consistency might seem the most enticing: after all, it gives the strictest guarantees. Vogels explains that these strict guarantees must be weighed against the consequences of distributing such a system. These are most succinctly summarized by the CAP theorem [9], also known as Brewer's theorem (previously Brewer's conjecture), which states that a distributed data store can at most provide two of the following three guarantees:

• Strong consistency: every subsequent read after a write is correct;
• Availability: every request receives a non-error result;
• Partition tolerance: the system keeps running correctly despite an arbitrary amount of delayed or missing messages between nodes.

Thus, an available, partition-tolerant system cannot also provide strong consistency, but may guarantee weaker forms of consistency.

2.3 Message brokers and the publish-subscribe pattern

Message brokers are a component in computer systems facilitating communication between processes or computers. Originally developed for Prolog inter-process communication in [10], the messaging paradigm has since seen widespread use. In this project we use a message broker to facilitate a publish-subscribe pattern, where messages are published to channels that interested parties (in our case, the other caches) can subscribe to. This relieves all parties from having to know exactly which processes to communicate with. Originally called a "News service", the publish-subscribe pattern was first described in [11].

2.4 Case: Antura

This thesis is done in collaboration with Antura AB. They are a software company focused on a project management Web service called Antura. This Web service is distributed over many different servers, each of which runs a number of different processes. Antura are currently using a distributed cache system akin to the one under review in this project. However, they are unsure about how it performs compared to a shared cache system like Redis [5]. They are also uncertain about what factors they should take into account when choosing whether to migrate to a new system. In particular, Antura are curious about how the prevalence of "common" requests, and the shared state caused by these, affects the performance of both systems.
They believe that their user behavior is quite disparate, with most users requesting data only relevant to the project they are working on. Through the use of sticky sessions, where users are consistently routed to the same server instance, they believe that this shared state is further reduced. These circumstances seem to imply that a distributed cache system might be preferable; however, Antura currently lack the experimental data to support this conclusion. Another factor Antura are interested in is how cache performance is affected by further distribution of the application.

3 Main idea

In this section we describe the main goals of the project, namely to:

1. Design a cache system with a per-server in-memory cache; and
2. Evaluate the performance of this type of system against one that uses a shared cache, in order to determine under which circumstances one should prefer the former over the latter.

3.1 Designing a per-server cache system

The first goal of the project was to design a proof-of-concept cache system that had one cache per server. The hypothesis was that, in use cases where the data shared between servers is small, this setup could improve performance by avoiding the network overhead on cache requests. Figure 3.1 shows a scheme of this kind of system.

Figure 3.1: The idea for an alternative cache setup. Each server keeps its own cache, allowing an order of magnitude shorter latency for cache lookups, with the trade-off being the extra communication necessary for cache invalidation due to data duplication.

This idea raised a number of challenges:

• How should duplicate cache entries be handled?
• How should cache invalidation be performed?
• What consistency model should the cache fulfill?

3.2 Comparing per-server and shared caching

The second goal of the project concerns the viability of a cache system such as this. The research question can be summarized as: under which circumstances does a per-server cache system outperform a shared equivalent? The project aims to create a test suite for testing these cache systems, allowing us to measure performance metrics like cache hit rate and latency. As with the implementation of the system, this question raises a number of challenges:

• How does one best test the performance of a cache? How can different request patterns be simulated in a realistic way?
• Which request patterns would favor one system over the other?
• How does system performance scale as more servers are added?

3.3 Delimitations

To keep the project viable within its allocated time frame and to focus attention on the matter at hand, a number of topics were not explored in this thesis. Additionally, as the project was necessarily focused on Antura's use case, some results might not be as general as they could be. The project did not explore:

• How different shared and networked caches compare against each other.
• Cache replacement algorithms: though cache replacement algorithms are important in many cache systems where space is scarce, Antura's use case rarely runs out of memory for the cache.
• Distributed messaging. One could imagine a system where individual caches communicate directly with each other, avoiding a potential bottleneck in the message broker. This was not explored because we did not foresee the message broker being a bottleneck in this use case, and we feared that it would add a rather large amount of complexity to the project.
• Prefetching upon request.
Prefetching can be done both on startup of the system and upon request of a resource. This thesis only covers prefetching on startup, as that was most feasible in Antura's use case.

4 Methods

This chapter outlines the methods used for solving the challenges set out in the previous chapter, both regarding the implementation of the cache system and its testing. Throughout, the language F# has been used for almost all programming tasks. F# is a functional programming language built on the .NET platform. It has support for object orientation and, importantly for this project, for classes and interfaces.

4.1 Defining a cache model

An important factor in creating the caches in this project was specifying what we mean by a cache and what guarantees we want this cache to provide. As outlined in subsection 2.2.1, there are a number of consistency models a distributed system can enforce. We have elected to have our caches fulfill eventual consistency: that is, any updates to our caches will eventually be propagated throughout the system, given enough time without conflicting updates.

We have also elected to have the caches save elements of one specific type. That is, for each type of item we want to cache, we will create a new cache object in our program. For example, we could have one instance of our cache class for user data, and another for profile pictures.

Our caches have to provide a few important operations: one has to be able to get cached items, set the value of a cached item, and delete cached items. Lastly, we demand that the caches support a function we call getOrFetch which, given a key and a (possibly expensive) function for looking up the value, either returns the cached value if it exists, or invokes the function and returns its result. These operations are specified in an interface for caches, which can be seen in Listing 1.

type ICache<'a> =
    abstract member get : key: string -> option<'a>
    abstract member set : key: string -> value: 'a -> unit
    abstract member getOrFetch : key: string -> expensiveLookup: (unit -> 'a) -> 'a
    abstract member delete : key: string -> bool

Listing 1: The interface used for caches in this project.

4.2 The distributed cache system

The distributed cache system was implemented using .NET's thread-safe dictionaries. As the caches possibly contain duplicate data, each cache has to communicate upon deletion or modification of a cached value. To facilitate this, a message broker was added to the system. Originally we planned to use RabbitMQ; however, we later opted for Redis' built-in message broker, which allowed us to have one less component in the system. The message broker is used to facilitate a publish-subscribe pattern. In Listing 2 an example of this communication is given, where upon removal of a key from a cache, an invalidation message is also sent to any other caches in the same domain.

// Invalidate (domain: string) (key: string) : InvalidationMessage
type Cache<'a> (domain: string) =
    let dict = ConcurrentDictionary<string, 'a>()

    member this.delete (key: string) : unit =
        dict.TryRemove key |> ignore
        messagebroker.publish (Invalidate domain key)

Listing 2: Deletion function for the distributed cache. When a cached item is deleted from one cache, a message is sent to all other caches in the same domain so that they also remove this item.

By default, these messages are only used to flush stale data from caches; however, another benefit of our messaging model is that it allows us to model dependencies between cached items.
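To make the receiving side of this pattern concrete, the following sketch shows how a cache could react to invalidation messages published by its peers. This is a minimal illustration rather than the thesis implementation: the subscribe parameter is a hypothetical stand-in for the message broker's subscription call (the counterpart of the messagebroker.publish call in Listing 2), and the handler simply evicts the named key from the local dictionary.

open System.Collections.Concurrent

// Sketch of the subscribing side of the invalidation scheme. The 'subscribe'
// parameter is a hypothetical stand-in for the message broker's subscription
// call: it registers a handler for every invalidation published on the
// channel belonging to this cache's domain.
type SubscribingCache<'a> (domain: string, subscribe: string -> (string -> unit) -> unit) =
    let dict = ConcurrentDictionary<string, 'a>()

    do
        // When another cache publishes an invalidation for a key in this
        // domain, drop the stale local copy so that the next read falls
        // through to the slow memory.
        subscribe domain (fun key -> dict.TryRemove key |> ignore)

    member _.get (key: string) : option<'a> =
        match dict.TryGetValue key with
        | true, value -> Some value
        | _ -> None

Paired with the delete function in Listing 2, this gives the two halves of the invalidation scheme: one cache publishes on its domain's channel, and every other cache in that domain evicts the key.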
In the distributed cache system, each cache is thus backed by its own thread-safe dictionary and has to specify its domain in order to find the correct channel to subscribe to.

4.3 The shared cache system

The shared cache system was implemented on top of Redis [5]. As the Redis instance is shared among all nodes of the web service, we can be certain that the cache does not introduce data duplication. Each cache object on a Web node shares the same connection to the Redis instance, and differs only in a prefix added to the saved keys. A problem introduced by the shared cache system that we did not have to consider in the distributed case is serialization. Since the shared cache uses the separate entity Redis, we could not simply store the cached items as regular F# values, but rather had to serialize and deserialize them upon write and read operations, respectively. For this purpose we used the JSON serialization library JSON.Net by Newtonsoft [12].

4.4 Testing and evaluation

In order to test the performance of the cache systems, we have developed a simple benchmark system. This system consists of an example distributed Web service exposing a simple CRUD (Create, Read, Update, Delete) API. This is accessed through a reverse proxy which enables load balancing with sticky sessions. Finally, performance data from each cache request is logged to a log aggregator. A schematic overview of the testing framework can be seen in Figure 4.1. The Web service is based on an example domain with data types for people, pets, and families (which contain references to both people and pets). The idea is that through simulated user behavior in this example domain we can explore different scenarios in a systematic way.

Figure 4.1: A schematic view of the testing framework.

4.4.1 System parameters

There are four different cache systems we are comparing: a system may be distributed or shared, and it may or may not feature data prefetching. Prefetching concerns whether we load some, or even all, relevant items into a cache on startup. Due to lack of time, prefetching upon request was not explored.

There is some interesting interaction between whether we prefetch or not, and whether the cache is shared or distributed. As an example, it raises the question of where to put our source of truth, that is, where we expect the correct version of each item to be. If we prefetch a domain, we can assume that each item in that domain is in the cache. It could then be beneficial for write latency to write to the cache first and only then update the database, always keeping the most up-to-date version of each item in the cache and saving to the database only for persistence. However, this would not be without drawbacks. As mentioned earlier, both the distributed and shared caches provide only eventual consistency, and as such we would have to take care to ensure that possible data races do not cause too large problems.

Another very important factor in how the system will perform is how distributed the application is, that is, how many nodes are running the application. This impacts how many instances of the cache are created in the distributed case, and thus how much data duplication is necessary.

4.4.2 User simulation

The project is concerned with what factors to consider when choosing a cache system. One of the most important factors is request patterns and, by extension, user behavior. As such, a number of features of user behavior are considered.
These can be seen below; they were chosen because they were foreseen to have a significant impact on cache performance.

• Read-write ratio: the percentage of requests which just read a value, as opposed to editing or removing it.
• Size of the shared domain: the amount of data which every user is likely to request. A real-life example could be data related to the front page of a website.
• Size of the non-shared domain: the non-shared domain consists of all data which is not commonly requested. In a typical use case this would be far larger than the shared domain. A real-life example would be all data related only to specific users or projects.
• Probability of a request going to the shared vs non-shared domain: the percentage of requests which should go to the shared rather than the non-shared domain. This is used to model whether most users are similar to one another, i.e. request mostly from the shared domain, or very disparate, i.e. request mostly from the much larger non-shared domain.

The user simulation is done with the help of a tool called Locust [13], which allows for the definition of example users. In Locust each user is defined through tasks, e.g. web requests, which they can perform. When running a benchmark with Locust, a number of users are created, each of which will pick random tasks to perform for an amount of time. These tasks can be weighted, such that some tasks are more likely than others. An example Locust user specification can be seen in Listing 3.

from locust import HttpUser, task
from random import randint

nonSharedDomainSize = 10000
sharedDomainSize = 1500
percentageShared = 50
weightDelete = 20
weightRead = 80

def getKey():
    isShared = randint(1, 100) <= percentageShared
    if isShared:
        return randint(1, sharedDomainSize)
    else:
        return randint(sharedDomainSize + 1, sharedDomainSize + nonSharedDomainSize)

class User(HttpUser):
    @task(weightRead)
    def get(self):
        n = getKey()
        self.client.get(f"/cache/{n}")

    @task(weightDelete)
    def remove(self):
        n = getKey()
        self.client.delete(f"/cache/{n}")

Listing 3: An example Locust file with non-shared domain size 10000, shared domain size 1500, 50% of requests to the shared domain, and an 80% read ratio.

5 Results and Discussion

The following chapter covers the results of testing and discusses their implications. The results will be presented in the form of several graphs, all following the same pattern. For an example graph see Figure 5.3.

• cache hit denotes whether a requested element was present in the cache or not. Consequently, average on hit means the average over all hits, and average on miss means the average over all non-hits.
• cache response denotes the time in which a cache determines whether it has a requested element or not. Thus average cache response on miss would be the average time before a cache determines that it does not have the requested element.
• lookup denotes the time used to fetch a missing element from the slow memory. In testing, the slow memory was set to always return in 10 ms; times higher than that are caused by the slowdown induced by the cache.
• total response denotes the full time a cache request takes to handle. This is typically just the cache response plus the lookup.

5.1 Latency vs hit rate

Throughout all testing, the clearest result is the respective benefit of each of the two models: hit rate versus latency. Consider Figure 5.4: the average cache response, on hit, on miss and in general, is an order of magnitude faster for the distributed cache system.
This is to be expected: after all, memory access is far faster than network access. Considering the average total response on hit, we see the same thing. The average total response on miss is more interesting. Recall that our simulated slow memory is set to a latency of 10 ms; any additional time is incurred by the cache. Again the distributed cache beats the shared system by a wide margin, which brings us to the most interesting bar: the average total response. Notice that though the distributed cache system had a lower latency on average in every case, the average total response is almost equal. This is due to the difference in hit rate: in that test the shared cache had a hit rate of 59%, compared to the distributed cache's 43%. This general pattern stays true throughout all our testing, and most of our factors had a larger impact on the hit rate than on the latency.

5.2 Impact of prefetching

In our tests, prefetching had a consistently positive impact on the hit rate, although the size of this impact differed slightly based on the read ratio of the test case. Consider the difference between Figure 5.4 and Figure 5.1: just turning on prefetching causes a significant increase in hit rate for both the shared and distributed caches (from 59% to 89% for the shared, and from 43% to 86% for the distributed). Importantly, this almost completely removes the difference in hit rate between the two systems. However, as we can see if we consider the difference between Figure 5.1 and Figure 5.2, a lower read ratio causes this hit rate improvement to decrease.

5.3 Impact of number of nodes

Perhaps the most significant factor in this comparison is the degree of distribution. The number of nodes running the Web service has a direct impact on the amount of data duplication needed for the distributed cache system. Another feature highlighted by the degree of distribution is the possibility for different nodes to benefit from each other's misses in the shared cache system. Consider the difference between Figure 5.4 and Figure 5.3: increasing the number of nodes from three to twelve more than halves the hit rate for the distributed cache, going from 43% to 20%. This also causes the difference in total response to go from about equal to the shared cache performing more than twice as fast as the distributed system.

5.4 Impact of shared domain

The most subtle impact came from changes in domain sizes and ratio. Consider Figure 5.5 and Figure 5.6: changing from 30% to 60% of requests going to the shared domain predictably results in an increased hit rate, though this increase is not equal in both models, with the shared cache system getting a greater increase (38% to 53%) compared to the distributed system (21% to 29%).

Figure 5.1: 3 nodes, with prefetching, 10000 elements non-shared domain, 1500 shared domain, 30% shared domain, 80% read ratio. (a) Response times in ms, lower is better. (b) Hit rate, higher is better.
Figure 5.2: 3 nodes, with prefetching, 10000 elements non-shared domain, 1500 shared domain, 30% shared domain, 50% read ratio. (a) Response times in ms, lower is better. (b) Hit rate, higher is better.

Figure 5.3: 12 nodes, no prefetching, 10000 elements non-shared domain, 1500 shared domain, 30% shared domain, 80% read ratio. (a) Response times in ms, lower is better. (b) Hit rate, higher is better.

Figure 5.4: 3 nodes, no prefetching, 10000 elements non-shared domain, 1500 shared domain, 30% shared domain, 80% read ratio. (a) Response times in ms, lower is better. (b) Hit rate, higher is better.

Figure 5.5: 6 nodes, no prefetching, 50000 elements non-shared domain, 1500 shared domain, 30% shared domain, 80% read ratio. (a) Response times in ms, lower is better. (b) Hit rate, higher is better.

Figure 5.6: 6 nodes, no prefetching, 50000 elements non-shared domain, 1500 shared domain, 60% shared domain, 80% read ratio. (a) Response times in ms, lower is better. (b) Hit rate, higher is better.

6 Conclusion

The first goal of the project was to create a per-server cache system. The caching domain was modeled and session consistency was chosen as the consistency model. A common interface was created for caches in this domain. Finally, two implementations of this interface were programmed: one using a per-server cache as described in the first goal, and one, as a reference, using the state-of-the-art in-memory key-value store Redis.

The second goal concerned performance testing. A test suite was created testing four variants of the caching system under different user behaviors. A few conclusions can be drawn from these results:

• There is a clear trade-off between hit rate and latency to take into account when choosing between the shared and the distributed cache system.
• The shared cache system scales far better as nodes are added.
The distributed cache system's hit rate quickly deteriorates as the number of nodes increases, because nodes cannot benefit from each other's cache misses.
• Prefetching at startup helps both cache systems, though not equally. In particular, it helps close the gap in hit rate between the two systems.
• A larger cache domain results in a lower hit rate for both cache systems, though again, in the shared cache system different nodes can benefit from each other's misses, and thus the penalty is slightly lower for it.
• As users become more similar and request more frequently from some shared "common" domain, the hit rate for both cache systems goes up. However, again, the shared cache system benefits more than the distributed one.

In conclusion: there are cases where both cache systems are viable. As long as the distributed web app runs on three nodes or fewer, the systems often perform very similarly. Once the app is deployed on more nodes, however, the distributed cache system quickly starts to fall behind in hit rate. As this happens, the benefit of a lower latency diminishes quickly.

6.1 Future work

There are always more things to explore, and this project is no different. The project has necessarily been focused on Antura's use case. This has meant disregarding factors that could play a large part in other, similar cases. The most important of these are arguably cache-replacement and cache-eviction algorithms, which can benefit greatly from domain-specific knowledge.

Another factor that the project would have explored given more time is the impact of other prefetching methods. In testing, only prefetching of the complete domain on startup was explored. It would have been interesting to look at domain-specific prefetching and how a cache interface supporting it could have looked. In Antura's case, this could have included strategies like loading an entire project into the cache upon a cache request of a single object within it.

Another thing that could be explored in more depth is different variants of the shared cache system. Redis has a depth of features which were not explored, including distribution options, which could be especially interesting in this comparison. One could also explore a hybrid system, with both shared and distributed features, which could theoretically be tailored for both common and disparate user request patterns.

Bibliography

[1] J. Wang, "A survey of web caching schemes for the internet," Comput. Commun. Rev., vol. 29, no. 5, pp. 36–46, 1999. doi: 10.1145/505696.505701.
[2] I. U. Din, S. Hassan, M. K. Khan, M. Guizani, O. Ghazali, and A. Habbal, "Caching in information-centric networking: Strategies, challenges, and future research directions," IEEE Communications Surveys & Tutorials, vol. 20, no. 2, pp. 1443–1474, 2018. doi: 10.1109/COMST.2017.2787609.
[3] M.-C. Lee, F.-Y. Leu, and Y.-P. Chen, "Pareto-based cache replacement for YouTube," World Wide Web, vol. 18, no. 6, pp. 1523–1540, 2015. doi: 10.1007/s11280-014-0318-9.
[4] Memcached - a distributed memory object caching system, https://memcached.org/.
[5] Redis, https://redis.io/.
[6] R. Nishtala, H. Fugal, S. Grimm, et al., "Scaling memcache at Facebook," in 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), 2013, pp. 385–398.
[7] R. Tewari, M. Dahlin, H. Vin, and J. Kay, "Design considerations for distributed caching on the internet," in Proceedings of the 19th IEEE International Conference on Distributed Computing Systems, 1999, pp. 273–284. doi: 10.1109/ICDCS.1999.776529.
[8] W. Vogels, "Eventually consistent," Communications of the ACM, vol. 52, no. 1, pp. 40–44, 2009. doi: 10.1145/1435417.1435432.
[9] S. Gilbert and N. Lynch, "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services," SIGACT News, vol. 33, no. 2, p. 51, Jun. 2002. doi: 10.1145/564585.564601.
[10] M. J. Wise, "Message-brokers and communicating Prolog processes," in PARLE '92: Parallel Architectures and Languages Europe, 4th International PARLE Conference, Paris, France, June 15-18, 1992, Proceedings, D. Etiemble and J. Syre, Eds., ser. Lecture Notes in Computer Science, vol. 605, Springer, 1992, pp. 535–549. doi: 10.1007/3-540-55599-4_109.
[11] K. P. Birman and T. A. Joseph, "Exploiting virtual synchrony in distributed systems," in Proceedings of the Eleventh ACM Symposium on Operating System Principles, SOSP 1987, Austin, Texas, USA, November 8-11, 1987, L. Belady, Ed., ACM, 1987, pp. 123–138. doi: 10.1145/41457.37515.
[12] Json.NET - Newtonsoft, https://www.newtonsoft.com/json.
[13] Locust - a modern load testing framework, https://locust.io/.