The Tale of the Unfinished Paths by MelsJagt

2. In game

After analyzing the players' history, we now focus on each individual game and try to understand what makes a player finish or not. We first look at the Wikipedia page categories to see whether players get stuck in certain ones, and whether the starting or the target page has an impact on the success of the game. Then we focus on the paths and try to find the relationship between the path the player took, the distance to the target and the result of the game, as well as the player's motivation to keep playing.

2.1 Analysis of categories

We want to analyze where players can get stuck during their games, as this could potentially lead to an unfinished result. For each Wikipedia article in the players graph (i.e. the graph of paths taken during games), we have its in-degree and out-degree, and these two values are not always equal. This is because, when constructing the graph, we removed the edges that went back to previously visited pages. For example, suppose a player starts on page A and then goes to B. He/she might then realize that going to B was not a good idea and use the back button to return to A, after which the player moves on to C. In the end, A has in/out degree 0/2, whereas B has in/out degree 1/0. Analyzing the difference between the two degrees can therefore give us insight into where players get stuck.

When we compute D = in_degree - out_degree, two cases stand out:

D is large and positive: the player came to this node (Wikipedia page) but then returned to the previous one, because he/she thought he/she was on the wrong track (node B in our example)
D is negative: the player goes back and forth from this node without knowing how to escape; he/she thinks it is a key node but does not know how to go further (node A in our example)
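The degree bookkeeping described above can be sketched in Python. This is a minimal sketch, assuming each path is a list of article titles in which the string "<" marks a back-button click (the convention of the public Wikispeedia path files); the function name `degree_difference` is hypothetical and not taken from the original analysis.

```python
from collections import Counter

BACK = "<"  # assumed marker for a back-button click


def degree_difference(paths):
    """Return D = in_degree - out_degree for every article in the players graph.

    Back-button moves do not create edges: the player simply returns to the
    previous page, and the next click starts from there.
    """
    in_deg, out_deg = Counter(), Counter()
    nodes = set()
    for path in paths:
        stack = []  # pages on the current trail, most recent last
        for page in path:
            if page == BACK:
                if stack:
                    stack.pop()  # go back to the previous page, no edge added
                continue
            if stack:
                out_deg[stack[-1]] += 1  # edge: current page -> clicked page
                in_deg[page] += 1
            stack.append(page)
            nodes.add(page)
    return {node: in_deg[node] - out_deg[node] for node in nodes}
```

On the example path A, B, back, C, `degree_difference([["A", "B", "<", "C"]])` gives D(A) = -2, D(B) = 1 and D(C) = 1, matching the in/out degrees of 0/2 for A and 1/0 for B.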

Figure 10

We observe in Figure 10 above that the values of D for nodes in finished paths are concentrated around 0, which means the back button was rarely used during these games, whereas for unfinished paths the histogram is noticeably wider.

Looking more specifically at unfinished paths, we find both cases (D large and positive, and D large and negative). What is interesting is the following:

Nodes with large positive D (like node B) belong to topics such as Politics, History and Chemistry, which are used more often in finished paths. This suggests that players who did not finish may not have adopted the right strategy and should have kept following paths through these topics.
Nodes with large negative D (like node A), the nodes players cannot escape, are related to topics such as Literature, Theatre and Architecture, which are used less in finished paths than in unfinished ones. We might therefore conclude that players should leave these nodes as soon as possible and move towards more useful concepts like Politics.
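One possible way to quantify "used more often in finished paths" is to compare each topic's share of page visits in finished versus unfinished games. The sketch below assumes a dictionary `topic_of` mapping article titles to topics; it and the function names are hypothetical, and the original analysis may have used a different measure.

```python
from collections import Counter


def topic_share(paths, topic_of):
    """Fraction of all page visits (with a known topic) that fall in each topic."""
    counts = Counter(topic_of[page] for path in paths for page in path if page in topic_of)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: count / total for topic, count in counts.items()}


def usage_gap(finished_paths, unfinished_paths, topic_of):
    """Positive: topic is used relatively more in finished paths; negative: less."""
    f = topic_share(finished_paths, topic_of)
    u = topic_share(unfinished_paths, topic_of)
    return {t: f.get(t, 0.0) - u.get(t, 0.0) for t in set(f) | set(u)}
```

Under this measure, topics like Politics would be expected to get a positive gap and topics like Theatre a negative one, in line with the observation above.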

However, we should be careful with this analysis because of a possible confounder: the topics of Literature, Theatre and Architecture may be intrinsically harder, which would by itself explain why people get stuck. Nonetheless, it is still an interesting observation.

2.2 Comparing topics of starting and target pages

We will now look at the topic distribution for both the starting and target pages (Figure 11).

Starting page

First, the starting-page histogram has a smaller variance than the target-page histogram. This is not a surprise, since we know that the starting phase of reaching a hub is the easiest part of the game. Moreover, no starting topic stands out as making the game easier, but some topics do seem harder to start from:

['Architecture', 'Theatre', 'Literature_types', 'General_Biology', 'Language_and_literature', 'Design_and_Technology', 'Air_and_Sea_transport', 'Conflict_and_Peace']

In particular, starting from the topic Architecture leads to a success probability of 0.26, meaning that only about one game in four is finished.

Target page

The target-page histogram shows much more variance, and some topics seem easier or harder than others. Topics related to Geography or Countries are quite easy: European_Countries, USA_Presidents and European_Geography are finished more than 4 times out of 5. On the other side, topics related to Movies or Literature are harder. For example, the topic General_Literature has a score of 0.08, which means that fewer than 1 game in 10 is finished with this topic as the target.

Figure 11

In conclusion, some starting and target topics have a big impact on the result of a game. We will use these two scores as features for our logistic regression.
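The start and target scores can be read as per-topic success rates and attached to every game as two features. This is a minimal sketch, assuming each game is a dict with hypothetical keys `start_topic`, `target_topic` and `finished`; the function names are also hypothetical.

```python
from collections import defaultdict


def success_rate_by_topic(games, key):
    """Fraction of finished games for each value of game[key]."""
    finished = defaultdict(int)
    total = defaultdict(int)
    for game in games:
        topic = game[key]
        total[topic] += 1
        finished[topic] += int(game["finished"])
    return {topic: finished[topic] / total[topic] for topic in total}


def topic_score_features(games):
    """Return one (start_score, target_score) pair per game."""
    start_rate = success_rate_by_topic(games, "start_topic")
    target_rate = success_rate_by_topic(games, "target_topic")
    return [
        (start_rate[g["start_topic"]], target_rate[g["target_topic"]])
        for g in games
    ]
```

A rate of 0.25 corresponds to one finished game in four, the same reading as the 0.26 reported for Architecture. Because these scores are computed from game outcomes, computing them on the training games only (with a fallback, such as the overall success rate, for topics unseen in training) keeps the outcome from leaking into the regression features.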

2.3 In game progress

So far we have looked at the node degrees and at the topic distributions of the starting and target pages. Now we want to dig deeper into the progress of an individual player as he/she traverses the path. We therefore propose a progress score that tracks the player's goal-directed navigation, illustrated in Figure 12:

Figure 12

In the schematic above, the progress is evaluated with the following procedure:

1 : For every node, the shortest (theoretical) distance to the target is computed using the graph of links between Wikipedia articles.
2 : The differential is calculated as diff[i] = dist[i] - dist[i+1], where dist[i] is the distance from the i-th visited node to the target, so a positive value means the player moved closer.
3 : The differentials are convolved with a half-Gaussian kernel. We assume that the player has a general feeling about their distance to the target: the player gets motivated when the distance to the target decreases and, reciprocally, demotivated when it increases. We also assume that the player's most recent moves are the most prominent in memory. A one-sided Gaussian kernel, which gives the most weight to the latest moves and progressively less to earlier ones, is therefore a reasonable model of this behavior.
4 : We repeat this procedure for all other games played.
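The first three steps can be sketched as follows. This assumes the Wikipedia link graph is a dict mapping each article to the articles it links to, and that every visited page can reach the target; the function names and the kernel parameters `sigma` and `width` are illustrative assumptions, not values from the original analysis.

```python
from collections import deque

import numpy as np


def shortest_distances(graph, target):
    """Step 1: BFS on the reversed link graph gives hops from each article to target."""
    reverse = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for prev in reverse.get(node, []):
            if prev not in dist:
                dist[prev] = dist[node] + 1
                queue.append(prev)
    return dist


def half_gaussian_kernel(sigma, width):
    """Weights for lags 0..width-1; lag 0 (the latest move) weighs the most."""
    lags = np.arange(width)
    weights = np.exp(-(lags ** 2) / (2 * sigma ** 2))
    return weights / weights.sum()


def progress_score(path, graph, target, sigma=1.0, width=4):
    """Steps 2-3: smoothed differentials along one path (assumes all nodes reach target)."""
    dist = shortest_distances(graph, target)
    d = np.array([dist[node] for node in path], dtype=float)
    diff = d[:-1] - d[1:]  # diff[i] = dist[i] - dist[i+1]; > 0 means closer
    if diff.size == 0:
        return diff
    kernel = half_gaussian_kernel(sigma, width)
    # Causal convolution: score[t] = sum_k kernel[k] * diff[t - k], past moves only.
    return np.convolve(diff, kernel)[: diff.size]
```

With a causal kernel, each score depends only on the moves made so far, which matches the idea that the player reacts to their recent history rather than to future clicks.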

Following this procedure for all games, we can compare how the progress differs between finished and unfinished games. Since the last page of a finished path is by definition the target, we removed it so that the two classes can be compared fairly. The result is displayed in Figure 13 below.
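A sketch of how the per-step comparison could be computed, assuming each game's progress scores are stored as one array per game (one score per move). Dropping the last score of a finished game removes the move onto the target page; with the causal kernel sketched for step 3, the remaining scores are unchanged. The function name is hypothetical.

```python
import numpy as np


def mean_std_by_step(score_arrays, finished):
    """Mean and standard deviation of the progress score at each step, across games.

    For finished games the last score (the move onto the target) is dropped.
    Shorter games are padded with NaN and ignored at steps they never reached.
    """
    trimmed = [
        np.asarray(s, dtype=float)[:-1] if finished else np.asarray(s, dtype=float)
        for s in score_arrays
    ]
    max_len = max(len(s) for s in trimmed)
    padded = np.full((len(trimmed), max_len), np.nan)
    for i, s in enumerate(trimmed):
        padded[i, : len(s)] = s
    return np.nanmean(padded, axis=0), np.nanstd(padded, axis=0)
```

Plotting the mean with a band of one standard deviation for each class gives a figure of the kind shown in Figure 13, where overlapping bands indicate that the classes are hard to separate with this feature.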

Figure 13

Interestingly, for both finished and unfinished games the best progress score is obtained around path length 4-5. We also observe a slight difference between finished and unfinished games. However, this difference is not convincing, as the standard deviations overlap. For that reason, we leave the progress score out of the logistic regression.

Together with the features related to the concepts, our feature matrix is now complete, and we are ready to move on to the classification stage. For more, visit the next page!

Figure 14