Predicting retention among application users with online ensemble learning models
Master's thesis
Engineering mathematics and computational science (MPENM), MSc
Most service-providing companies regard customer retention as their most important asset for improving profitability. Even for services and applications without paying customers, user retention is essential, since retained users generate more advertisement impressions and strengthen the reputation of the brand. The ability to foresee which users will be retained and which are likely to churn is therefore highly valuable for any expanding company. Forza Football is one of the world’s most popular football live score applications, with millions of weekly active users. New data on user activity in the application arrives sequentially as a stream. To predict future user activity, a model must be able to adapt to seasonal drifts in activity. The model must furthermore remain scalable and time-efficient when analyzing newly arriving instances, given that each instance comprises several million observations. Motivated by these requirements, this thesis uses a data stream of previous user activities to predict the activities in upcoming instances. State-of-the-art ensemble classification methods are adapted to an online learning environment so that both historical and current information is incorporated at low computational cost. Several predictive models are proposed that obtain accurate predictions while remaining efficient in terms of storage and computational time. The models are stable in detecting and adjusting to concept drifts.
Online learning, data stream analysis, concept drift, random forest, decision tree ensembles, retention prediction, churn prevention