Predictive Caching
Type
Master's thesis
Abstract
Although digitization has revolutionized the entertainment industry, it is streaming services
such as Netflix and Spotify that have made content available to
users on hand-held devices. These services require an active internet connection
to deliver the requested content to the user’s device, consuming the user’s expensive
mobile data subscription. The aim of this thesis project is to optimize
mobile data usage by predicting the content a user is most likely to download so
that it can be pre-fetched when the user’s device is connected to a high-bandwidth,
less expensive network. Different use cases were considered to identify the potential
candidates that a user is most likely to download over a mobile data subscription.
First, users are highly likely to download the personalized content recommended
by these services. Hence, user behavior on personalized content was modeled
using a Logistic Regression algorithm as a generic baseline approach. Second, the
users tend to use multiple devices to stream content and it is very likely that they
play the same content on different devices. This offers strong pre-caching potential,
since content viewed or listened to on one device can be used to predict
the likely streaming behavior on the user’s other devices. Third, users
prefer to play content from the different playlists provided by streaming services. The
third use case exploited user behavior on playlists to predict the content a user
is likely to download in the future. We employed a Gradient Boosting algorithm to
model the device sync and playlist use cases. The results were evaluated using a
generic evaluation metric defined specifically for this purpose, and the different use cases were
compared. The device sync model predicted 15% of the potential savings that were
identified through data analysis, whereas the playlist model predicted 30%.
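The pre-fetching idea described in the abstract can be illustrated with a minimal sketch. It assumes a model has already produced a download probability for each candidate item, selects the candidates worth pre-fetching, and then measures what share of the user's mobile-data traffic would have been avoided. The names `Candidate`, `select_for_prefetch`, and `captured_savings`, the threshold, and the byte-share metric are all hypothetical and chosen for illustration only. The abstract does not specify the thesis's own evaluation metric, so this should not be read as that metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A piece of content that could be pre-fetched (hypothetical structure)."""
    item_id: str
    size_mb: float
    download_probability: float  # assumed model score in [0, 1]


def select_for_prefetch(candidates, threshold=0.5, budget_mb=float("inf")):
    """Pick the highest-scoring candidates at or above the threshold that fit the budget."""
    chosen = []
    used_mb = 0.0
    ranked = sorted(candidates, key=lambda c: c.download_probability, reverse=True)
    for candidate in ranked:
        if candidate.download_probability < threshold:
            break  # every remaining candidate scores lower still
        if used_mb + candidate.size_mb <= budget_mb:
            chosen.append(candidate.item_id)
            used_mb += candidate.size_mb
    return chosen


def captured_savings(prefetched_ids, mobile_downloads):
    """Share of mobile-data megabytes that pre-fetching would have avoided.

    mobile_downloads maps item_id -> size_mb for the items the user actually
    downloaded over mobile data. This is an illustrative metric only.
    """
    total_mb = sum(mobile_downloads.values())
    if total_mb == 0:
        return 0.0
    prefetched = set(prefetched_ids)
    saved_mb = sum(size for item, size in mobile_downloads.items() if item in prefetched)
    return saved_mb / total_mb


candidates = [
    Candidate("a", 100.0, 0.9),
    Candidate("b", 50.0, 0.4),
    Candidate("c", 200.0, 0.7),
]
prefetched = select_for_prefetch(candidates, threshold=0.5)
mobile_downloads = {"a": 100.0, "b": 50.0, "d": 50.0}
print(prefetched, captured_savings(prefetched, mobile_downloads))
```

In this toy run, item "c" is pre-fetched but never downloaded over mobile data, so it costs storage and Wi-Fi bandwidth without contributing to savings. A real evaluation would likely need to weigh such wasted pre-fetches against the mobile data saved.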
Subject/keywords
Computer science, engineering, thesis, machine learning, predictive caching, user behavior, xgboost, logistic regression