Interlude
Wikispeedia is an online human-computation game based on Wikipedia [1]. A player has to navigate from a designated source article to a given target article solely by clicking on Wikipedia links. Playing the game thus generates a dataset of human navigation paths on Wikipedia, which allows for interesting studies: for example, the semantic distance between concepts [2] or the search behavior of humans [3] can be investigated. Such studies need a lot of data, which requires an active community that frequently plays the Wikispeedia game. But how can you get people to play and finish the game, maybe even multiple times? What motivates people to finish the task instead of giving up? This work addresses these questions by proposing a set-up for the game that is likely to keep people engaged, so that more data can be generated for analysis.
Table 1
Data
In this study a condensed version of Wikipedia is used [4]. Summary statistics can be found in Table 1 above. We are interested in the individual players, and more particularly in their unfinished paths. We would like to motivate players to successfully complete multiple games, and for that we first have to understand why some players are not able to finish even a single game.
For each unique player we can compute the proportion of finished games out of the total number of games played by that player. The distribution of this proportion is displayed in Figure 1 below.
Figure 1
Based on this result we can partition the players into 3 groups:
- 1. Players with 0 finished games.
- 2. Players with both finished and unfinished games.
- 3. Players with only finished games.
We observe that a substantial 25% of all players do not have any finished game at all 😕. This is a big loss ❗ Imagine if these players had also had successful attempts (i.e. finished games): there would have been much more data on finished games for further analysis. On the other side of the spectrum, 51% of the players have finished games only. We want to investigate what causes this split in behavior. The remaining 24% fall in between these two categories: these players experience mixed results when playing the Wikispeedia game.
Let's have a look at the number of games played per group, displayed in Figure 2 below.
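The per-player completion rate and the three-way split described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the `games` list is made-up toy data, and the `(player_id, finished)` record layout and the function names are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical toy log: one (player_id, finished) record per game played.
games = [
    ("p1", False), ("p1", False),
    ("p2", True), ("p2", False), ("p2", True),
    ("p3", True),
]

def completion_rates(games):
    """Return {player: finished games / total games played}."""
    played = defaultdict(int)
    finished = defaultdict(int)
    for player, done in games:
        played[player] += 1
        finished[player] += int(done)
    return {p: finished[p] / played[p] for p in played}

def assign_group(rate):
    """Map a completion rate to group 1, 2 or 3 as defined in the list above."""
    if rate == 0:
        return 1  # no finished games
    if rate == 1:
        return 3  # only finished games
    return 2      # mixed results

rates = completion_rates(games)
groups = {p: assign_group(r) for p, r in rates.items()}

# Share of players that falls in each group.
counts = Counter(groups.values())
shares = {g: counts[g] / len(groups) for g in sorted(counts)}
```

On the real data, `shares` would correspond to the percentages per group reported in the text.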
Figure 2
First we see that the distribution of the number of games played appears to follow a power law for all groups (i.e. 1, 2 and 3). Another striking observation is that group 1 seems to play fewer games than the other two groups. Perhaps the fact that these players are not able to finish a single game demotivates them from playing further. Groups 2 and 3 are fairly similar, apart from a few outliers in group 3. These outliers represent a tiny portion of highly motivated players who play many games.
We can also check per player what type (i.e. finished or unfinished) their last game was. By definition, players in group 1 always end on an unfinished game and players in group 3 always end on a finished game. It is more interesting to investigate group 2. We observe that players ending on an unfinished path have significantly fewer finished games overall than players ending on a finished path. This could suggest that players who end on an unsuccessful game get demotivated and do not start another game. By abandoning the game, these players could miss out on potential successful games in the future.
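The comparison within group 2 could be set up roughly as below. This is a sketch on made-up data: it assumes each player's games are listed in chronological order, and it only compares means. Whether a difference is significant would need a proper statistical test, which is not shown here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy log, in chronological order per player.
games = [
    ("a", True), ("a", False),
    ("b", False), ("b", True), ("b", True),
    ("c", True), ("c", True), ("c", False), ("c", True),
    ("d", False), ("d", False),   # group 1, ignored below
    ("e", True),                  # group 3, ignored below
    ("f", True), ("f", False), ("f", False),
]

def finished_counts_by_last_game(games):
    """For group 2 players, collect the number of finished games,
    keyed by whether the player's last game was finished."""
    history = defaultdict(list)
    for player, done in games:
        history[player].append(done)

    by_last = {True: [], False: []}
    for outcomes in history.values():
        n_finished = sum(outcomes)
        if 0 < n_finished < len(outcomes):  # mixed results only
            by_last[outcomes[-1]].append(n_finished)
    return by_last

by_last = finished_counts_by_last_game(games)
mean_ended_unfinished = mean(by_last[False])
mean_ended_finished = mean(by_last[True])
```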
Now what?
So we see many players who are unable to finish any game at all and who abandon the game after relatively few attempts. On top of that, there are players with mixed success for whom, at some point, an unfinished path is the last straw that breaks the camel's back. Both groups are lost potential ❌. Ideally we want these players to stay engaged in order to generate potential finished paths in the future 📈. One strategy to achieve this is to detect, during an individual's progress, when that person is likely to quit the game. When the odds of quitting are high, the game can offer "easier" games or provide "hints" to guide the player.
Hence, the first step is to create an accurate detector. This is what we will address in this report. The aim is to build a simple (logistic regression) model using features that we can extract from two categories:
- Between game: Features derived from the history of games played by a player.
- In game: Features derived from the path traversed by a player within a single game.
A schematic of the overall model is shown below in Figure 3. Accordingly, we will train a classifier that predicts whether the individual is likely to successfully finish the game or not ✅. Under the assumption that players ending on an unfinished game get demotivated and are more likely to abandon the game, this investigation contributes to understanding why players quit. At the same time, it provides a first stepping stone towards finding adequate means to keep players engaged in the game.
Figure 3
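To make the idea of such a detector concrete, here is a minimal logistic regression sketch trained with plain gradient descent. It is an illustration under assumptions, not the model from this report: the two features (one between-game, one in-game), their scaling to [0, 1], the toy data and the hyperparameters are all hypothetical, and in practice a library implementation would be the usual choice.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, b, x):
    """Estimated probability that the game gets finished."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical features per game, both scaled to [0, 1]:
#   x[0] between game: share of the player's earlier games that were finished
#   x[1] in game: share of clicks so far that were "back" clicks
X = [[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
     [0.2, 0.8], [0.1, 0.9], [0.0, 1.0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = game finished, 0 = game unfinished

w, b = fit_logistic(X, y)
p_likely_finisher = predict_proba(w, b, [0.9, 0.0])
p_likely_quitter = predict_proba(w, b, [0.1, 0.9])
```

A low predicted probability during a game would be the signal to offer an easier game or a hint.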