In previous articles, we’ve looked at odds and probabilities, then used this to understand the concept of value betting. Whilst discussing value, we touched on creating our own prediction model to allow you to generate your own probabilities and odds for certain sporting events. This can then be used to compare your odds with those of the bookmaker to identify value in the market and (touch wood) ensure sustained profit in the long term. In this article, we go through the steps required to create our own football (soccer) prediction model using Poisson Distribution, as well as look at some of the limitations of this approach.
So what is Poisson Distribution? If you Google it, you get back a lot of scary definitions that are very difficult to understand, such as “Poisson distribution is the probability of the number of events that occur in a given interval when the expected number of events is known and the events occur independently of one another”. What this basically means is that when we know the average number of times that an event will happen, we can use Poisson to calculate how likely it is that the event happens any other number of times.
Luckily though, we don’t need to fully understand the concept, the formula or how to calculate it because Microsoft Excel has a function which can work out Poisson automatically. All we really need to know is that it can be used to calculate the probability of outcomes for a football match, which can then be turned into odds which we can use to identify value in the market. This covers a number of goal-based markets such as Match Outcome (1×2), Correct Score, Over / Under Match Goals, Both Teams To Score and Asian Handicap. There is plenty of more in-depth reading on Poisson online, but we won’t be delving into that level in this article.
Although it has its limitations and faults, Poisson is a useful starting point to understand the fundamentals of creating your own odds. It can work as a standalone model which you use to advise your betting, or it can be used to understand the basics before going on to explore further, more complicated methods. It also has applications to other sports, but in this article we will just look at football.
So how do we actually create a predictive model for football games based on Poisson distribution?
As a quick summary, what we are going to do is take historical results to calculate the number of goals teams score and concede. These averages are compared to the league average and used to create values for attacking strength and defensive strength for every team, which are then turned into goal expectation figures. This metric is put into a Poisson Distribution formula which works out the probability of every result when two teams face each other. We then take these probabilities to create our own odds, compare these against the bookies’ odds, then identify where there is value in the market because the bookies are offering more generous odds than we’d expect. Simple!
The beauty of a method like this is that there are a number of different points during the process where you might decide to try a different value as an input or may want to include something else in the calculation. You may even choose to calculate goal expectation in a completely different way, for instance, by using Elo ratings, which rank all teams against each other – as teams play each other, their respective ratings increase or decrease based on the result – and which will be covered in a later article. That is perfectly fine and will help you develop and refine your predictive model during its lifetime.
The below is a slightly modified version of the method I used throughout the 2013/14 season – after all, I don’t want to give all of my secrets away – however, it will allow you to create your own predictive model if you follow these steps.
The first step is to decide which league(s) you want to build a predictive model for. Until you get your model to a stage where you are happy with it, it makes sense to focus only on one league, preferably one you know well. Once everything is working as you wish, then the model can be replicated for different leagues. You will go through a period of testing and improving, so it makes sense to do this for one league to start with rather than making the exact same changes for multiple leagues. Trust me, there is nothing worse than taking on too much at the start by attempting to predict every football game being played. For this example, we will use the English Premier League.
Open Microsoft Excel. It will become your best friend! Using a website such as WhoScored.com, Soccerway.com or Football-Data.co.uk, copy and paste all results from last season into a format that you can manipulate in Excel – for example:
These results are the base data that help you get to the point where you can create your own odds. As more games are played, you will add these to this list of results, but we don’t need to think about that just yet. This is one of the first points where you need to decide how many results you want to use as an input into your calculation. Some people may use five games, others may use 10, whilst some may use data for the entire season. What you choose is up to you and this may be something you wish to tinker with as you refine your model. For this example, we will use all 38 games from the 2013/14 season.
If you’re good with Excel, you can use all of these results to calculate the next step. If you’re not comfortable with formulas such as SUMIFS and COUNTIFS, then a shortcut is to create another table based on the final league table. The key things we are looking to capture are goals scored and goals conceded by teams in games at home and on the road. These will then be used to work out the total goals in the league, average goals in the league, in addition to average goals for and against per team.
Goals For and Goals Against are simple sums in Excel, whilst the two averages are worked out by dividing the total goals by the games played. For instance, Arsenal’s Average Goals For at home is simply 36 / 19 = 1.89. The below shows two tables – one for teams at home and one for teams away – showing all of these calculations.
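For readers who would rather script this step than build it with SUMIFS and COUNTIFS, here is a minimal Python sketch of the same tallying. The results list is made up purely for illustration, and the function and variable names are my own rather than anything from the spreadsheet.

```python
from collections import defaultdict

# Made-up results for illustration only:
# (home team, away team, home goals, away goals)
results = [
    ("Team A", "Team B", 2, 0),
    ("Team B", "Team C", 1, 1),
    ("Team C", "Team A", 0, 3),
    ("Team A", "Team C", 1, 2),
]


def new_row():
    """Empty tally for one team."""
    return {"played": 0, "goals_for": 0, "goals_against": 0}


def home_away_totals(results):
    """Tally games played, goals for and goals against, split by home and away."""
    home = defaultdict(new_row)
    away = defaultdict(new_row)
    for home_team, away_team, home_goals, away_goals in results:
        home[home_team]["played"] += 1
        home[home_team]["goals_for"] += home_goals
        home[home_team]["goals_against"] += away_goals
        away[away_team]["played"] += 1
        away[away_team]["goals_for"] += away_goals
        away[away_team]["goals_against"] += home_goals
    return home, away


home, away = home_away_totals(results)

# League averages: goals per game scored by home sides and by away sides.
league_avg_home_goals = sum(r[2] for r in results) / len(results)
league_avg_away_goals = sum(r[3] for r in results) / len(results)

# A team's average is its total divided by games played, as in the spreadsheet.
team_a_home_avg_for = home["Team A"]["goals_for"] / home["Team A"]["played"]
print(team_a_home_avg_for, league_avg_home_goals, league_avg_away_goals)
```

Swapping the made-up list for a real season of results gives the same home and away tables described above.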
Now that we have these key stats, we can use them to calculate the attacking strength and defensive strength for each team. Again, this is a relatively simple thing to do and can be achieved by dividing Average Goals For or Average Goals Against by the league average.
For example, to work out Arsenal’s home attacking strength, it would be 1.89 divided by 1.57 which equals 1.20 – this means that Arsenal score 20% more goals at home than the average team.
As another example, to work out Aston Villa’s away defensive strength, you would divide 1.68 by 1.57 to give 1.07 – this shows that Villa have a worse defence than an average team as they concede 7% more goals.
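The two worked examples can be checked with a couple of lines of Python. This is only a sketch: the function name is hypothetical, and the inputs are the rounded figures quoted in the text.

```python
def strength(team_average, league_average):
    """Team average goals divided by the league average for the same venue."""
    return team_average / league_average


# Figures quoted in the article.
arsenal_home_attack = strength(1.89, 1.57)
villa_away_defence = strength(1.68, 1.57)

print(round(arsenal_home_attack, 2))  # 1.2
print(round(villa_away_defence, 2))   # 1.07
```

A value above 1 means more goals than average: good news for an attack, bad news for a defence.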
If we repeat this calculation for all teams, we can work out the attacking and defensive strengths when playing at home and when playing away:
We now use this reference table of attacking and defensive strengths to calculate how many goals we expect a team to score in a particular match – we call this the Goal Expectancy. It makes sense that a team like Aston Villa are likely to have a higher goal expectancy against a team like Sunderland compared to a team like Arsenal. This is because of two main reasons – (1) Arsenal’s defence will be stronger than Sunderland’s, thus Villa will struggle to score, and (2) Sunderland’s attack will be weaker than Arsenal’s, so Villa are likely to concede fewer goals. These two factors create the Goal Expectancy metric, which can be worked out for any match. If we take Arsenal vs Aston Villa at the Emirates Stadium as an example, we can see that Arsenal would be expected to score an average of 2.02 goals to Villa’s 0.53 goals:
Home Team Goal Expectancy: home attacking strength (1.20) x away defensive strength (1.07) x average goals home (1.57) = 2.02
Away Team Goal Expectancy: away attacking strength (1.16) x home defensive strength (0.48) x average goals away (0.96) = 0.53
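As a quick check of the two calculations above, here is the same arithmetic in Python. The function and parameter names are my own; the numbers are the ones from the Arsenal vs Aston Villa example.

```python
def goal_expectancy(attack_strength, opponent_defence_strength, league_average_goals):
    """Attacking strength x opposing defensive strength x league average goals."""
    return attack_strength * opponent_defence_strength * league_average_goals


# Arsenal at home: home attack 1.20, Villa away defence 1.07, average home goals 1.57.
home_expectancy = goal_expectancy(1.20, 1.07, 1.57)
# Aston Villa away: away attack 1.16, Arsenal home defence 0.48, average away goals 0.96.
away_expectancy = goal_expectancy(1.16, 0.48, 0.96)

print(round(home_expectancy, 2))  # 2.02
print(round(away_expectancy, 2))  # 0.53
```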
Hopefully you are still with me… if not, go back and read again. If you are, then great, let’s continue!
What we now need to do is use the Poisson Distribution in Excel to calculate the probability of all possible scorelines for the hypothetical Arsenal vs Aston Villa game. The best way I’ve found of doing this is to set up a matrix with all possible scorelines from 0-0 to 10-10. Again, you could decide to change this and continue up to 15-15, or even stop at 8-8 if you think it is unlikely a team will score more than 8 goals.
In Microsoft Excel, the Poisson Distribution formula is:
=POISSON(x, mean, cumulative)
Where:
x = the number of goals
mean = the number of goals that team is expected to score, i.e. the goal expectancy
cumulative = set to FALSE, so that the formula returns the probability of exactly x goals rather than of x goals or fewer
Obviously we don’t have cell references in this example as you’d find in Excel, but the formula should still make sense. If we use 0-0 as an example, the Poisson Distribution formula would look like this:
=POISSON(0, home goal expectancy, FALSE) * POISSON(0, away goal expectancy, FALSE) * 100
If we add values, this equates to =POISSON(0, 2.02, FALSE) * POISSON(0, 0.53, FALSE) * 100
which produces a 7.808% probability that the score will be 0-0.
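If you want to see what Excel is doing under the bonnet, the same 0-0 figure can be reproduced in a few lines of Python. This is a sketch using the standard Poisson formula; the function name is mine, and the calculation assumes, as the article does, that the two teams’ goal counts are independent.

```python
import math


def poisson_pmf(goals, expected_goals):
    """Probability of exactly `goals` goals when `expected_goals` are expected."""
    return math.exp(-expected_goals) * expected_goals ** goals / math.factorial(goals)


# 0-0 in Arsenal (2.02) vs Aston Villa (0.53): multiply the two probabilities.
nil_nil = poisson_pmf(0, 2.02) * poisson_pmf(0, 0.53) * 100
print(f"{nil_nil:.3f}%")  # 7.808%
```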
If we use the formula for all of these scorelines up to 10-10 and use a matrix, then something like this will be created. As you can see, the most likely scoreline is 2-0 to Arsenal (15.93% probability), closely followed by 1-0 to Arsenal (15.77% probability).
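The full scoreline matrix can be sketched the same way. The code below is an illustration rather than the spreadsheet itself: the names are hypothetical, and it stops at 10-10 just like the matrix described above.

```python
import math


def poisson_pmf(goals, expected_goals):
    """Probability of exactly `goals` goals when `expected_goals` are expected."""
    return math.exp(-expected_goals) * expected_goals ** goals / math.factorial(goals)


def score_matrix(home_expectancy, away_expectancy, max_goals=10):
    """matrix[h][a] is the probability of the home side scoring h and the away side a."""
    return [
        [
            poisson_pmf(h, home_expectancy) * poisson_pmf(a, away_expectancy)
            for a in range(max_goals + 1)
        ]
        for h in range(max_goals + 1)
    ]


matrix = score_matrix(2.02, 0.53)

# Find the single most likely scoreline.
probability, home_goals, away_goals = max(
    (p, h, a) for h, row in enumerate(matrix) for a, p in enumerate(row)
)
print(f"{home_goals}-{away_goals}: {probability * 100:.2f}%")  # 2-0: 15.93%
print(f"1-0: {matrix[1][0] * 100:.2f}%")                       # 1-0: 15.77%
```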
Should you enjoy betting on the Correct Score market, then the above table will give you a decent indication of expected scorelines. However, what we can do on top of this is create our own odds for common betting markets using these probabilities. For example:
Home Win: If you add up the probability of all results where the home team wins (e.g. 1-0, 2-0, 2-1, 3-2 etc) then you will have the overall likelihood of a Home Win.
Under 2.5 Goals: If you add up the probability of all scorelines which have fewer than 3 goals in the game (i.e. 0-0, 1-1, 1-0, 0-1, 2-0, 0-2), then you have the overall probability of Under 2.5 match goals.
Both Teams To Score Yes: If you add up the probability of all scorelines in which both teams find the back of the net (e.g. 1-1, 2-1, 3-1, 2-2 etc), you are left with the probability of Both Teams To Score.
In the search for value, you may also consider looking at other markets which are goal based. For example, Over / Under 1.5 Goals, Team to Win to Nil, Double Chance (win or draw) or Asian Handicap, although the latter does require a bit more work. However, the below table gives the probability of a few of the most common markets by using the principle of the above bullet points:
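The adding-up described in the bullet points can be sketched in Python as follows. The function names and dictionary keys are my own, and the matrix is rebuilt here from the example goal expectancies (2.02 and 0.53) so that the snippet runs on its own.

```python
import math


def poisson_pmf(goals, expected_goals):
    """Probability of exactly `goals` goals when `expected_goals` are expected."""
    return math.exp(-expected_goals) * expected_goals ** goals / math.factorial(goals)


def score_matrix(home_expectancy, away_expectancy, max_goals=10):
    """matrix[h][a] is the probability of the scoreline h-a."""
    return [
        [
            poisson_pmf(h, home_expectancy) * poisson_pmf(a, away_expectancy)
            for a in range(max_goals + 1)
        ]
        for h in range(max_goals + 1)
    ]


def market_probabilities(matrix):
    """Sum the scoreline probabilities that belong to each market."""
    cells = [(h, a, p) for h, row in enumerate(matrix) for a, p in enumerate(row)]
    return {
        "home_win": sum(p for h, a, p in cells if h > a),
        "draw": sum(p for h, a, p in cells if h == a),
        "away_win": sum(p for h, a, p in cells if h < a),
        "under_2_5": sum(p for h, a, p in cells if h + a < 3),
        "btts_yes": sum(p for h, a, p in cells if h > 0 and a > 0),
    }


markets = market_probabilities(score_matrix(2.02, 0.53))
for name, probability in markets.items():
    print(f"{name}: {probability * 100:.2f}%")
```

Other goal-based markets follow the same pattern: write the condition on the home and away goals, then sum the matching cells.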
The next step is to turn the probability into decimal odds. If you remember a previous article where we discussed probability and odds, you will – or should – remember that the formula to turn decimal odds into probabilities is (1 / Decimal Odds) x 100. To convert from probabilities into decimal odds, just do the reverse, i.e. 100 / probability. The table below shows the associated odds for these probabilities:
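Both conversions are one-liners, sketched here in Python with function names of my own choosing. The 7.808% input is the 0-0 probability from earlier in the article.

```python
def probability_to_decimal_odds(probability_percent):
    """100 / probability, with the probability given as a percentage."""
    return 100 / probability_percent


def decimal_odds_to_probability(decimal_odds):
    """(1 / decimal odds) x 100, returning a percentage."""
    return 1 / decimal_odds * 100


print(probability_to_decimal_odds(50))               # 2.0
print(round(probability_to_decimal_odds(7.808), 2))  # 12.81
print(decimal_odds_to_probability(4))                # 25.0
```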
Remember that bookies include an edge – called an overround – when they work out their odds so that they build in a profit for themselves. It is therefore important to add a margin to your own odds before comparing them. The margin you choose is up to you – it could be from 5% up to 20% – but for this example we will use 7.5%. Simply multiply the true odds by the margin, for example, Odds x 1.075. This lengthens your odds, so a bookmaker’s price has to beat your true odds by more than 7.5% before it counts as value. The table below shows the new odds with the margin included:
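Applying the margin is a single multiplication. The sketch below uses the 7.5% figure from the text; the function name and the sample true odds of 2.00 are just for illustration.

```python
MARGIN = 0.075  # the 7.5% used in this example


def odds_with_margin(true_odds, margin=MARGIN):
    """Multiply the true decimal odds by (1 + margin), e.g. Odds x 1.075."""
    return true_odds * (1 + margin)


print(round(odds_with_margin(2.00), 2))  # 2.15
```

Changing MARGIN is one of the easy places to tinker when refining the model.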
Now comes the fun part: deciding where to place your bets. You have your own odds and now need to compare these against the odds from bookmakers. This is the core of value betting, which was discussed in a previous article. As a recap, value betting is all about looking for opportunities where you feel that the bookies are offering higher odds than you’d expect. As you have your own odds and can easily find those from your favourite bookie, it is a simple process of comparing the two and seeing where the bookies’ odds are higher. If you get value, then you bet – although you may wish to rule out some bets by doing additional research into things like injuries, motivational factors or whether a team played in Europe mid-week.
In your spreadsheet, add an additional two columns – one for the bookies’ odds and one for whether to bet or not. Manually add the bookies’ odds to the spreadsheet, then go through seeing where there is value. The below is used for illustrative purposes, but gives you an indication of the type of thing you may find when comparing the odds. In this scenario, the value is with backing the draw, backing the Aston Villa win and backing Under 2.5 Goals.
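The “bet or not” column boils down to one comparison per market. Here is a minimal Python sketch; every number in it is hypothetical and chosen only to show the mechanics, so it does not reproduce the article’s table.

```python
def has_value(bookmaker_odds, model_odds):
    """True when the bookmaker's price is higher than our own odds."""
    return bookmaker_odds > model_odds


# Hypothetical figures for illustration only.
markets = {
    "Home Win": {"model": 1.50, "bookmaker": 1.40},
    "Draw": {"model": 5.00, "bookmaker": 5.50},
    "Over 2.5 Goals": {"model": 2.30, "bookmaker": 2.10},
}

bets = [
    name
    for name, odds in markets.items()
    if has_value(odds["bookmaker"], odds["model"])
]
print(bets)  # ['Draw']
```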
That’s it, your predictive model is complete. Now go make some profit!
Actually, it’s not that easy. As mentioned earlier, you may go through a process of tinkering if things don’t look quite right, or after a while of monitoring the results to try and get more accurate.
If everything looks OK with the model, you will then need to expand it to include the same calculations for every other game in that league. At the moment, we are only calculating the odds and bets for one match. Rather than use the same table, it makes sense to set up another 9 of these tables so that you can do one for each league game.
It will be time consuming to start with, but try to get to a point where the spreadsheet can be as automated as possible – a version 2.0 if you will. For instance, with my model, I input the fixtures, then the attacking and defensive metrics are calculated automatically. This is then pulled through to another sheet where the Poisson Distribution formula calculates all of the odds. I then only need to manually add the odds for all of the games directly from the bookies, then the spreadsheet tells me which bets to place.
I have also expanded my model to include Premier League, La Liga, Serie A and Bundesliga fixtures, as well as used it previously for MLS and Brazil Serie A. The model stays the same, the only difference is the inputs.
As games progress and results are known, you will need to include these in your calculations. If your model is working on data from last season and not including data from this season, then it is likely to be out of date. In Steps 2 and 3, we used either a list of results or the league table to work out the numbers of goals and averages. You need to find a way of incorporating these most recent results in your calculations. I simply add these to my list of results and ensure the formulas cover the new results. You may also choose to remove old results that you deem to be too long ago and now redundant. For example, if your model is based on 38 games (19 home, 19 away), then you would need to add the most recent home game whilst deleting the oldest home game to keep it at 19.
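If you script this rather than editing the spreadsheet by hand, the “add the newest, drop the oldest” idea maps neatly onto a fixed-length queue. The sketch below is one possible approach rather than how my spreadsheet works, and the results fed into it are invented.

```python
from collections import deque

WINDOW = 19  # number of home games to keep, as in the example above

# A deque with maxlen drops the oldest entry automatically when a new one arrives.
recent_home_games = deque(maxlen=WINDOW)

# Invented (goals for, goals against) results for 20 home games.
for game_number in range(20):
    recent_home_games.append((game_number % 3, 1))

goals_for = sum(game[0] for game in recent_home_games)
average_goals_for = goals_for / len(recent_home_games)

print(len(recent_home_games))  # 19
print(average_goals_for)       # 1.0
```

The same structure works for away games, or for a shorter window such as the last five or ten matches.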
And there it is, your own predictive football model. Obviously I will give a couple of caveats at this point as no predictive model can be spot on or take into account every factor in the world. Some like Poisson Distribution, others don’t. I’ve personally found that it has been profitable for me over the last season, but that’s not to say that it will continue to be or that there isn’t a better method out there. A few points to consider are:
The model uses past data to predict future results. The accuracy of this method is open to debate. Does something that happened 6 months ago with different players in different weather conditions really help us understand what will happen?
In this scenario, the model is also based on last season’s information – players and managers come and go, so the Manchester United under David Moyes could be very different to the Manchester United under Louis van Gaal. Similarly, will Liverpool be as free-scoring without the talents of Luis Suarez? You may wish to wait for a few games to have been played in the new season before betting to ensure that things are in line with your expectations.
The only real factor that this approach takes into account is the result. We’ve all seen plenty of games where a team dominated a match but only won 1-0. Or even the odd situation where the dominant team lost the match via a goal on the counter attack. Match results tell us the final score, but do not tell us what actually happened during the game.
The model is objective, which means it does not take into account other factors. However, as we know, a lot of things can affect a game, both before and during. A model such as this does not take into account things like injuries, suspensions, fatigue or weather which could affect the predictions prior to the game. Similarly, it is believed that goal expectation is affected by factors that happen in the game, such as an away goal or a red card.
It is also believed that the probability of draws and the probability of zero goals are underestimated when using Poisson Distribution to predict football games. This can however be rectified by using a method known as zero-inflation to increase the probability of no goals.
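To give a flavour of zero-inflation, here is a sketch of the standard zero-inflated Poisson formula applied to one team’s goal count. This is not part of the spreadsheet model above, and the 5% extra share of zeros is a purely hypothetical value that would need to be estimated from real results.

```python
import math


def poisson_pmf(goals, expected_goals):
    """Probability of exactly `goals` goals when `expected_goals` are expected."""
    return math.exp(-expected_goals) * expected_goals ** goals / math.factorial(goals)


def zero_inflated_pmf(goals, expected_goals, extra_zero_share):
    """Mix a guaranteed zero (weight extra_zero_share) with an ordinary Poisson."""
    base = (1 - extra_zero_share) * poisson_pmf(goals, expected_goals)
    return extra_zero_share + base if goals == 0 else base


EXTRA_ZERO_SHARE = 0.05  # hypothetical value for illustration

plain_zero = poisson_pmf(0, 2.02)
inflated_zero = zero_inflated_pmf(0, 2.02, EXTRA_ZERO_SHARE)
print(f"Plain Poisson P(0 goals): {plain_zero * 100:.2f}%")
print(f"Zero-inflated P(0 goals): {inflated_zero * 100:.2f}%")
```

The probabilities for one or more goals are scaled down slightly so that everything still adds up to 100%.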
I hope that this has been useful and you have plenty of hours of fun with your new spreadsheet. Remember, always check and double check the figures, do your research, don’t bet what you can’t afford to lose and ask questions should your model be too dissimilar to the market as that could indicate an issue. Happy betting!