1. Between-game features
Here we extract features based on a player's game history. The aim is to find features that can potentially separate the classes finished and unfinished (i.e. timeout or restart).
1.1 Number of unfinished games.
For the first feature we consider the player's total number of unfinished games, excluding the last game. To exemplify:
[restart, finished, finished], number of unfinished games: 1
[restart, finished, timeout], number of unfinished games: 1
[timeout], number of unfinished games: 0
As we want to predict the class of a player's last game, we only look at that player's past games.
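The counting rule above can be sketched as follows. This is a minimal illustration, assuming a player's history is a chronologically ordered list of outcome strings; the function name `count_past_unfinished` is hypothetical and not taken from the original analysis.

```python
# Outcome labels come from the text; everything else is an illustrative assumption.
UNFINISHED = {"timeout", "restart"}


def count_past_unfinished(history):
    """Count unfinished games in a player's history, excluding the last game."""
    past_games = history[:-1]  # the last game is the prediction target
    return sum(outcome in UNFINISHED for outcome in past_games)


print(count_past_unfinished(["restart", "finished", "finished"]))  # 1
print(count_past_unfinished(["restart", "finished", "timeout"]))   # 1
print(count_past_unfinished(["timeout"]))                          # 0
```

Dropping the last game before counting keeps the label of the game being predicted out of the feature.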
A bootstrap algorithm is performed to (visually) inspect whether this feature can separate both classes. As observed in Figure 4, this is indeed the case. Hence we will use this feature in our model.
Figure 4
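The text does not spell out the bootstrap procedure, so the following is only one plausible sketch: resample the feature values of each class with replacement, record the mean of every resample, and compare the resulting distributions. The function names, the number of resamples, the percentile interval and the toy data are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_means(values, n_resamples=1000, seed=0):
    """Return the means of n_resamples resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    return [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)]


def percentile_interval(samples, lower=2.5, upper=97.5):
    """Rough percentile interval read off the sorted bootstrap means."""
    ordered = sorted(samples)
    lo_idx = int(len(ordered) * lower / 100)
    hi_idx = min(int(len(ordered) * upper / 100), len(ordered) - 1)
    return ordered[lo_idx], ordered[hi_idx]


# Made-up feature values per class, NOT data from the study.
finished_values = [0, 1, 0, 2, 1, 0, 1, 0]
unfinished_values = [3, 2, 4, 1, 3, 5, 2, 3]

finished_interval = percentile_interval(bootstrap_means(finished_values))
unfinished_interval = percentile_interval(bootstrap_means(unfinished_values))
print("finished:  ", finished_interval)
print("unfinished:", unfinished_interval)
```

If the two intervals do not overlap, the feature is a candidate for separating the classes; plotting the two lists of bootstrap means as histograms would give the visual check mentioned above.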
1.2 Number of successive unfinished games before the last game.
For the second feature we want to consider the number of chained unfinished games up to the last game. We believe (intuitively) that this feature, in addition to the total number of unfinished games (see 1.1), affects how likely a player is to give up. To exemplify:
[timeout, restart, timeout], recent unfinished streak: 2
[timeout, restart, finished, timeout], recent unfinished streak: 0
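A minimal sketch of this streak count, assuming the same list-of-outcomes representation as in the examples; the function name `recent_unfinished_streak` is hypothetical.

```python
# Outcome labels come from the text; the function name is an illustrative assumption.
UNFINISHED = {"timeout", "restart"}


def recent_unfinished_streak(history):
    """Count consecutive unfinished games directly preceding the last game."""
    streak = 0
    # Walk backwards through the past games (last game excluded).
    for outcome in reversed(history[:-1]):
        if outcome not in UNFINISHED:
            break  # the chain is broken by a finished game
        streak += 1
    return streak


print(recent_unfinished_streak(["timeout", "restart", "timeout"]))              # 2
print(recent_unfinished_streak(["timeout", "restart", "finished", "timeout"]))  # 0
```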
Again, following a bootstrap analysis (Figure 5), we see that this feature also significantly separates both classes.
Figure 5
1.3 Time difference between first and last game.
For the third feature we focus on how much time has passed between an individual player's first game and their last. The motivation is that a long time difference may indicate that a player is more likely to leave a game unfinished because they feel the game has been "worn out".
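As a sketch, this feature could be computed as below. It assumes each game carries a start timestamp and expresses the difference in days; neither the timestamp field nor the unit is specified in the text, and the function name is hypothetical.

```python
from datetime import datetime


def first_to_last_gap_days(start_times):
    """Time between a player's earliest and latest game, in days."""
    if not start_times:
        raise ValueError("player has no games")
    # min/max avoids relying on the games being sorted chronologically.
    gap = max(start_times) - min(start_times)
    return gap.total_seconds() / 86400  # 86400 seconds per day


# Made-up timestamps for illustration only.
games = [
    datetime(2020, 1, 1, 12, 0),
    datetime(2020, 1, 2, 9, 30),
    datetime(2020, 1, 3, 12, 0),
]
print(first_to_last_gap_days(games))  # 2.0
```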
The bootstrap analysis (Figure 6) suggests that this reasoning could indeed hold.
Figure 6
1.4 Longest win streak.
Intuitively, being on a long win streak, or having had one, can be motivating for playing a game repeatedly despite having some unsuccessful games in between. For this reason we also want to consider the longest streak of finished games a player has had in the past.
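A minimal sketch of this feature follows. It assumes, in line with the earlier features, that the last game is excluded because it is the prediction target; the function name `longest_finished_streak` is hypothetical.

```python
def longest_finished_streak(history):
    """Longest run of consecutive finished games among the past games."""
    longest = 0
    current = 0
    for outcome in history[:-1]:  # assumption: the last game is excluded
        if outcome == "finished":
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a timeout or restart ends the current run
    return longest


# Made-up history for illustration only.
print(longest_finished_streak(
    ["finished", "finished", "timeout", "finished", "restart"]
))  # 2
```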
As expected, players whose last game was successful also have a significantly longer win streak in the past (Figure 7).
Figure 7
1.5 Cross correlations.
So far we have discussed 4 distinct features as potential predictors. However, we are not done yet. We would like to assess whether the different features are (independently) meaningful. That is, we preferably want them to not correlate too much with each other. To assess this, we computed a covariance matrix, shown in Figure 8 below.
Figure 8
We observe that the cross-correlations are generally low, indicating that each feature contributes meaningful information to our model independently. For that reason we take all features into account and add them to our model.
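One way to run such a check is sketched below with pairwise Pearson correlations, which are covariances scaled by the standard deviations so that features with different units become comparable. This is an assumption about the computation, not the code behind Figure 8; the feature names and values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Raises ZeroDivisionError for a constant feature (correlation undefined).
    return cov / (spread_x * spread_y)


def correlation_matrix(features):
    """Nested dict of pairwise correlations between named feature columns."""
    names = list(features)
    return {a: {b: pearson(features[a], features[b]) for b in names} for a in names}


# Hypothetical feature columns, one value per player.
features = {
    "n_unfinished": [0, 1, 3, 2, 0, 4],
    "recent_streak": [0, 0, 2, 1, 0, 1],
    "longest_win_streak": [5, 2, 1, 3, 6, 0],
}

matrix = correlation_matrix(features)
for name, row in matrix.items():
    print(name, [round(value, 2) for value in row.values()])
```

Off-diagonal entries close to 0 suggest that two features carry largely separate information, while entries close to 1 or -1 suggest redundancy.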
We have now finished the first part of building our model, as depicted in Figure 9. The next step is to look at the in-game features.
Figure 9