Blog post cover

2019-11-14 01:02 - Technical

I used Machine Learning to find what game to play next

This article is both for people who have some knowledge of Machine Learning and those who have never heard about it. I will present my approach to making a recommendation system for Steam games.

Why?

Steam has its own recommendation system, which probably uses Machine Learning, but it has several flaws that make its suggestions generally irrelevant to me:

  • Steam does not have a precise quantification of how much I liked the games: recommending a game after playing it is optional and it’s a binary choice: either I liked it or not (whereas the reality is a lot more nuanced).
  • Steam must use a generic model to work with a wide variety of users, and such a system has trouble learning what criteria are relevant for each user.
  • Steam is a business and its purpose is profit: their recommendation system is probably biased towards maximizing their revenue.

One of Steam’s approaches to making recommendations is clustering, which means grouping similar games, and then recommending games similar to those I played. For example, in the following screenshot of my list of recommendations, Steam felt that Secret Neighbor was similar to Portal 2, so I might be interested. Except that multiplayer games and horror are not my thing, so… well… nope, sorry Steam.

Secret Neighbor

My idea is to make my own recommendation system, which should work better than Steam’s because it is customized for me in particular.

So… what’s Machine Learning?

Basically it’s a bunch of computer techniques that share a common idea: to allow machines to find results without having been explicitly programmed for these tasks, thanks to a learning phase.

My approach uses supervised learning: I train my model on a number of games I already know, giving it annotated data, i.e. data containing not only the features of each game but also a value between 0 and 1 that expresses how much I liked it. During training, the algorithm tries to understand the link between the features of the games and their score. In the prediction (or inference) phase, I give the algorithm games that I don’t know and ask it which score it would give them. I can ask it to do this for all Steam games and to list those it has assigned the highest scores to.

Build the dataset

The first step of a recommendation system is to collect data on a large set of games, in my case all those from the Steam store.

For that I built a scraping algorithm, that is to say a program that collects data. Rather than using the Steam Online Store like a human would, my algorithm uses APIs (Application Programming Interfaces), basically communication protocols that allow programs to query a site or service.

I gathered the data from several APIs: the official Steam ones and an unofficial API from the site SteamSpy. The reason is that, in all the mess of Steam’s official interfaces, I did not find all the data that I wanted.

Among the information that I collected, we find for example for each game:

  • Its Metacritic score.
  • Its Steam score (ratio of positive/negative reviews).
  • Its release date.
  • Tags I found relevant (e.g. “Story Rich”, “Great Soundtrack”).
  • Controller support (I don’t like to play with the keyboard).
  • etc…
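To give an idea of what one row of such a dataset could look like, here is a minimal sketch that turns a game record into a list of numbers. The field names (`metacritic`, `positive`, `negative`, `release_year`, `controller`, `tags`), the scaling choices and the example values are all hypothetical and only serve the illustration; they are not taken from the actual code.

```python
# Hypothetical tag selection, for illustration only.
TAGS = ["Story Rich", "Great Soundtrack", "Atmospheric"]


def encode_game(game):
    """Turn a game record (a dict with hypothetical keys) into numeric features."""
    total_reviews = game["positive"] + game["negative"]
    steam_score = game["positive"] / total_reviews if total_reviews else 0.0
    features = [
        game["metacritic"] / 100.0,            # Metacritic score scaled to 0..1
        steam_score,                           # share of positive reviews
        (game["release_year"] - 2000) / 20.0,  # rough scaling of the release date
        1.0 if game["controller"] else 0.0,    # controller support as 0 or 1
    ]
    # One 0/1 feature per tag of interest.
    features += [1.0 if tag in game["tags"] else 0.0 for tag in TAGS]
    return features


# Made-up record, not real data about any game.
example = {
    "metacritic": 90, "positive": 900, "negative": 100,
    "release_year": 2010, "controller": True,
    "tags": ["Story Rich", "Atmospheric"],
}
print(encode_game(example))  # [0.9, 0.9, 0.5, 1.0, 1.0, 0.0, 1.0]
```

Scaling everything to roughly the same range is a common habit with neural networks, since features with very different magnitudes tend to make training harder.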

Then I gave a score to the games. First a score to those I played, which is pretty obvious. But that’s not enough. Indeed:

  • The games I played aren’t representative of all Steam games.
  • The vast majority of the scores I gave were very positive. My model must know not only what I like, but also what I don’t like, to understand the difference.

So I doubled the number of annotated games by picking games at random, and if I came across games I was sure didn’t interest me, I added them with a score of zero.

Out of the more than 37,000 games on Steam, my dataset keeps only the 3,070 games that appear on Metacritic. Among those, 115 have a score: half of them have a good score (games I have played) and the other half have a score of zero.

Analysis of the relevant features

One could wrongly assume that the Metacritic score or the Steam score of a game would be enough to tell whether I would like it, but that ignores the fact that each individual has different tastes. The image below illustrates this by plotting the score I gave to the games I played as a function of their Metacritic rating or Steam rating:

Features vs score

Note that the games I liked all have a relatively good Metacritic/Steam score, but the opposite is not true: many games that have a good Metacritic/Steam score did not really please me.

These features are not enough to explain why I liked a game or not. As mentioned above, my intuition was to use the Steam tag system to add relevant information about each game that will allow the model to better learn my behavior. These tags can be:

  • Positive tags, i.e. tags that will generally increase the score of a game.
  • Negative tags, i.e. tags that will penalize a game.

I cannot use all available tags, for reasons that I will discuss in the next section, so I have to pick a limited number of them. Finding the right tags was a trial-and-error process: I evaluated how different tags influenced my model.

Train the machine

This section is less trivial but I will try to make it more or less understandable anyway. The model I used is called a neural network. The idea is to roughly simulate the functioning of the human brain.

When building my dataset, I turned all the features into numeric values. The neural network is a sequence of layers where the neurons of each layer react to the values of the previous layer to generate a value in turn: the values of the layer on the left are the features of each game and the value of the neuron on the right is the score I’ve assigned to the game. The network learns by changing the way each neuron generates a value, thanks to a magical process called backpropagation. Well, it’s not magic at all, but it’s a little too long to explain in this article, so if you’re interested, uh… well, follow this Stanford course: it’s free, online, and that’s how I learnt. If you want to quickly experiment with neural networks interactively, you can also use the TensorFlow Playground.

A very simple neural network for our problem would be:

simple neural network

The real neural network I used has 11 input features, and two intermediate layers of 16 neurons each.
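For the curious, here is a rough sketch of how a network of that shape (11 inputs, two layers of 16 neurons, one output) computes a score from the features. It only covers the prediction pass, not the training. The activation functions (ReLU in the intermediate layers, a sigmoid at the output to stay between 0 and 1) and the random initialization are assumptions made for this example, not necessarily what the real code does.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_inputs, n_neurons, rng):
    """Create one layer: a list of weights per neuron, plus one bias per neuron."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases


def forward_layer(inputs, layer, activation):
    """Each neuron computes a weighted sum of the previous layer, then an activation."""
    weights, biases = layer
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def predict(features, layers):
    """Run the features through the intermediate layers, then the output neuron."""
    values = features
    for layer in layers[:-1]:
        values = forward_layer(values, layer, relu)
    return forward_layer(values, layers[-1], sigmoid)[0]


rng = random.Random(0)
sizes = [11, 16, 16, 1]  # 11 input features -> 16 neurons -> 16 neurons -> 1 score
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

score = predict([0.5] * 11, layers)
print(score)  # a value between 0 and 1, meaningless until the network is trained
```

Training consists in adjusting the weights and biases so that the output gets closer to the scores I assigned, which is what backpropagation takes care of.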

A problem we quickly face is called overfitting: because of the small number of annotated examples used to train the neural network, it fits the annotated examples too closely and performs worse on games that haven’t been used to train it. This is a major problem because it means that the quality of the predictions will be lower. One way to measure this is to divide the annotated set into two: one part (e.g. 70%) is used to train the model. The other part (here 30%) is used only to evaluate the performance of the model (these games are unknown to the model since they are not used to train it). If I evaluate the prediction errors throughout the training, on each of the two sets, I observe (mean squared errors):


Here, the error on the training set decreases, i.e. the algorithm makes progress on the samples that it knows. But after a while, its predictions on unknown samples get worse because the algorithm has become too specific to the known samples.
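The 70/30 split and the mean squared error mentioned above can be sketched as follows. This is a simplified illustration with placeholder data; the function names and the fixed random seed are my own choices for the example.

```python
import random


def split_dataset(samples, train_ratio=0.7, seed=0):
    """Shuffle the annotated samples and split them into training and validation sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def mean_squared_error(predictions, targets):
    """Average of the squared differences between predicted and real scores."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# 115 annotated games, represented here by placeholder (features, score) pairs.
samples = [([float(i)], float(i % 2)) for i in range(115)]
train_set, validation_set = split_dataset(samples)
print(len(train_set), len(validation_set))  # 80 35

print(mean_squared_error([0.5, 1.0], [1.0, 0.0]))  # 0.625
```

Computing this error on both sets after each training pass gives the two curves: when the validation error starts going up while the training error keeps going down, the model is overfitting.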

So I applied a process called regularization, which basically consists in training the algorithm while penalizing overly unequal importance among the different features. The training curve now looks like this:


The error was almost halved on the validation set, while it slightly increased on the training set (the model is less specific to the known samples). The model is ready to make predictions.
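To make the idea of regularization a bit more concrete, here is a sketch of one common form, L2 regularization, where the sum of the squared weights is added to the error that the training tries to minimize. Treat it as an assumption for the sake of illustration: this article does not say which exact penalty was used, and `l2_strength` is a hypothetical parameter.

```python
def regularized_loss(prediction_error, weights, l2_strength=0.01):
    """Add an L2 penalty (sum of squared weights) to the prediction error."""
    penalty = sum(w * w for w in weights)
    return prediction_error + l2_strength * penalty


# Same prediction error and same total weight, but the second set of weights
# relies entirely on a single feature.
balanced = regularized_loss(0.10, [0.5, 0.5, 0.5, 0.5])
unequal = regularized_loss(0.10, [2.0, 0.0, 0.0, 0.0])
print(balanced < unequal)  # True
```

With such a penalty, a model that puts all its weight on one feature is considered worse than a model that spreads it out, even if both predict the training samples equally well.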


For each game of the Steam store that I had not scored, my algorithm computed the score it thinks I would give it. The 5 games with the highest predicted scores are:

rank name of the game
1 Return of the Obra Dinn
2 Mutazione
3 The Wolf Among Us
4 Dark Souls III

I must say that I am impressed by the quality of the results. The first two games are already in my wish list (which the algorithm did not know!), and The Wolf Among Us and SOMA are also very likely to interest me. Dark Souls III not so much, because it’s too dark and difficult.
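The ranking step itself is simple. Here is a minimal sketch, with placeholder names and scores rather than my real predictions, and a hypothetical `top_games` helper:

```python
def top_games(predicted_scores, annotated_names, count=5):
    """Return the `count` unannotated games with the highest predicted score."""
    candidates = [
        (name, score)
        for name, score in predicted_scores.items()
        if name not in annotated_names
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:count]


# Placeholder names and scores, only to show the mechanics.
predicted = {"Game A": 0.91, "Game B": 0.35, "Game C": 0.78, "Game D": 0.99}
already_scored = {"Game D"}
print(top_games(predicted, already_scored, count=2))
# [('Game A', 0.91), ('Game C', 0.78)]
```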


I’m a Machine Learning beginner so do not trust me too much (that’s what I do professionally but I’m more on the GPU optimization side).

I am very satisfied with the results, and it has been an interesting exercise.

You can find the source code on GitHub.

Note: if you aren’t convinced that Machine Learning is great yet, well, most of this article was translated from French with Google Translate because I’m a lazy bastard. I just changed a few words. Translation services use more advanced Machine Learning algorithms, but they are based on the same ideas.

Cover picture from NieR Automata.

nyri0 (2019-11-14 20:13)

Additional details: the following table contains the Pearson correlation coefficients of various features with respect to the score.

feature correlation with the score
Metacritic 0.395986129743881
Reviews 0.280593754607957
story rich 0.24654041479133
great soundtrack -0.130778359562621
atmospheric 0.441390265757566
date 0.150532820912653

The tag “atmospheric” has the highest correlation, so it is the feature most strongly linked to whether I liked a game or not. Surprisingly, the tag “great soundtrack” has a negative correlation. It means that it is related to the score I gave to the games, but it has a tendency to go against it: games with this tag tend to have a lower score. It’s weird, but we have to keep in mind that this is calculated on a set of games that I mostly liked, and this tag doesn’t appear much, so one or two games with this tag that I didn’t like can influence the correlation a lot.
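For reference, here is a small sketch of how such a Pearson coefficient can be computed by hand. The data is a placeholder (1.0 if a game has the tag, 0.0 otherwise, next to the score I would have given), not my real dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Placeholder data: tag presence and the score given to four games.
has_tag = [1.0, 1.0, 0.0, 0.0]
scores = [0.9, 0.7, 0.2, 0.4]
print(pearson(has_tag, scores) > 0)                   # True: tag goes with higher scores
print(pearson(has_tag, [1 - s for s in scores]) < 0)  # True: the relation is reversed
```

The coefficient is always between -1 and 1. With only a handful of games carrying a tag, changing the score of a single game can move it a lot, which is the effect described above for “great soundtrack”.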

nyri0 (2019-11-15 21:45)

Update: had a look further down the list and made interesting discoveries such as Pinstripe and The Gardens Between! :)

nyri0 (2019-11-15 23:22)

Another update: I used random forests and achieved the same quality of results as neural networks with no effort whatsoever!! Most of the top games are pretty similar with both approaches. The code is also on GitHub.
