Football Match Prediction
My football match prediction webapp running live on a Sunday evening in November 2019. This is a picture of an early version, but unfortunately it is the only picture I still have. I later improved the performance and UI and reached around 70% accuracy on win/lose/draw predictions. Eventually, though, I came up against the hard truth that football matches have a substantial random component that cannot be predicted before kick-off, no matter how much you improve the algorithm and data!
Project Overview
The aim of this project was to predict the outcome of football matches. The final version provided up-to-date predictions for all Premier League and Championship games. It is no longer running because of the cost of the API data and of hosting the webapp.
Data
The data comes from api-football, whose data feeds are well documented in the api-football documentation.
In the final version, we used the match fixtures API for two things: to get historical match results for the Premier League and Championship over the last 10 years, and to get the upcoming fixtures for the next week.
Flow
The first Python recipe (compute_Leagues) gets the available leagues and their corresponding IDs from the API. This data is cleaned and filtered down to the leagues of interest, producing the Leagues_prepared_filtered dataset.
These league IDs are then used to fetch two sets of fixtures from the API: all the corresponding historical fixtures (from the beginning of the 2010 season up to and including yesterday) and the upcoming fixtures (from today through the next week, inclusive).
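The cleaning and filtering step can be pictured with a small sketch. It assumes each league record is a dict with `league_id`, `country` and `name` keys. Those field names and the `filter_leagues` helper are hypothetical, and the real api-football response fields may differ.

```python
from typing import Dict, List

# Hypothetical record shape: real api-football field names may differ.
LeagueRecord = Dict[str, object]

# Leagues of interest, keyed by (country, league name).
LEAGUES_OF_INTEREST = {("England", "Premier League"), ("England", "Championship")}


def filter_leagues(raw_leagues: List[LeagueRecord]) -> List[LeagueRecord]:
    """Clean the raw league list and keep only the leagues of interest."""
    filtered: List[LeagueRecord] = []
    seen_ids = set()
    for record in raw_leagues:
        # Normalise whitespace so " Championship " still matches.
        country = str(record.get("country", "")).strip()
        name = str(record.get("name", "")).strip()
        league_id = record.get("league_id")
        if league_id is None or league_id in seen_ids:
            continue  # drop records without an ID, and duplicate IDs
        if (country, name) in LEAGUES_OF_INTEREST:
            seen_ids.add(league_id)
            filtered.append({"league_id": league_id, "country": country, "name": name})
    return filtered
```

Matching on (country, name) rather than on name alone avoids picking up identically named leagues in other countries.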
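The two date windows can be made concrete with a short sketch. Two assumptions are built in: the 2010 season is treated as starting on 1 August 2010, and "the next week" is read as today plus the following 7 days. Each fixture is assumed to carry a `datetime.date` under a hypothetical `date` key.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Assumption: the 2010 season is treated as starting on 1 August 2010.
HISTORY_START = date(2010, 8, 1)
# Assumption: "the next week" means today plus the following 7 days.
UPCOMING_DAYS = 7


def fetch_windows(today: date) -> Tuple[Tuple[date, date], Tuple[date, date]]:
    """Return inclusive (start, end) ranges for historical and upcoming fixtures."""
    historical = (HISTORY_START, today - timedelta(days=1))  # up to yesterday
    upcoming = (today, today + timedelta(days=UPCOMING_DAYS))  # from today
    return historical, upcoming


def split_fixtures(fixtures: List[Dict], today: date) -> Tuple[List[Dict], List[Dict]]:
    """Split fixtures (each with a datetime.date under 'date') into the two windows."""
    (h_start, h_end), (u_start, u_end) = fetch_windows(today)
    historical = [f for f in fixtures if h_start <= f["date"] <= h_end]
    upcoming = [f for f in fixtures if u_start <= f["date"] <= u_end]
    return historical, upcoming
```

Because the historical window ends yesterday and the upcoming one starts today, no fixture can fall into both.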
We then use a custom-developed plugin to compute the Elo rating of each team over the fixture history (see the Wikipedia article "Elo rating system"). Elo ratings originate from chess, but they also provide an effective way of ranking football teams over time. We then use SQL to extract each team's most recent Elo rating from the history, so that it can be joined to the upcoming fixtures.
We trained a simple logistic regression model on these ratings from the fixture history to predict the outcome of each game. This model is then used to score the upcoming fixtures, giving predictions for the next week of fixtures. We also evaluate the model on the historical fixtures so that we can trace the accuracy of historical predictions as well.
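The plugin itself is not shown here, so the following is only a minimal sketch of the standard Elo update applied to football results. It assumes a starting rating of 1500, K = 20 and no home-advantage term. All three are illustrative choices, not the plugin's actual settings. A win scores 1, a draw 0.5 and a loss 0. The final `ratings` dict plays the role of the "most recent rating per team" that the project extracts with SQL.

```python
from typing import Dict, Iterable, List, Tuple

# Assumptions for illustration: every team starts at 1500 and K = 20.
INITIAL_RATING = 1500.0
K_FACTOR = 20.0


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def compute_elo(
    results: Iterable[Tuple[str, str, int, int]],
) -> Tuple[List[Tuple[float, float]], Dict[str, float]]:
    """Process (home, away, home_goals, away_goals) results in date order.

    Returns the pre-match ratings for every fixture (usable as model features)
    and each team's latest rating after the final result.
    """
    ratings: Dict[str, float] = {}
    pre_match: List[Tuple[float, float]] = []
    for home, away, home_goals, away_goals in results:
        r_home = ratings.get(home, INITIAL_RATING)
        r_away = ratings.get(away, INITIAL_RATING)
        pre_match.append((r_home, r_away))
        # Actual score for the home side: 1 win, 0.5 draw, 0 loss.
        if home_goals > away_goals:
            s_home = 1.0
        elif home_goals == away_goals:
            s_home = 0.5
        else:
            s_home = 0.0
        delta = K_FACTOR * (s_home - expected_score(r_home, r_away))
        ratings[home] = r_home + delta
        ratings[away] = r_away - delta  # zero-sum: points move between the two teams
    return pre_match, ratings
```

Recording the ratings *before* each match matters. Those are the values a model could legitimately have used to predict that match.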
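The original code is not included, so here is a dependency-free sketch of a multinomial (softmax) logistic regression. It uses a single feature, the home rating minus the away rating, and predicts one of three outcomes. It is trained with plain batch gradient descent. The feature choice, the scaling by 400, the learning rate and the epoch count are all assumptions. A library such as scikit-learn would normally be used instead. The per-class bias terms let the model absorb home advantage and the base rate of draws.

```python
import math
from typing import List, Sequence, Tuple

CLASSES = ("home_win", "draw", "away_win")


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_logreg(
    diffs: Sequence[float], labels: Sequence[int], lr: float = 0.5, epochs: int = 2000
) -> List[Tuple[float, float]]:
    """Fit softmax regression with batch gradient descent.

    diffs: home rating minus away rating for each fixture.
    labels: index into CLASSES for each fixture.
    Returns one (weight, bias) pair per class.
    """
    params = [(0.0, 0.0) for _ in CLASSES]
    n = len(diffs)
    for _ in range(epochs):
        grads = [[0.0, 0.0] for _ in CLASSES]
        for x_raw, y in zip(diffs, labels):
            x = x_raw / 400.0  # rescale to the usual Elo scale
            probs = softmax([w * x + b for w, b in params])
            for k, p in enumerate(probs):
                err = p - (1.0 if k == y else 0.0)  # gradient of cross-entropy
                grads[k][0] += err * x
                grads[k][1] += err
        params = [(w - lr * gw / n, b - lr * gb / n) for (w, b), (gw, gb) in zip(params, grads)]
    return params


def predict_proba(params: List[Tuple[float, float]], diff: float) -> List[float]:
    x = diff / 400.0
    return softmax([w * x + b for w, b in params])


def predict(params: List[Tuple[float, float]], diff: float) -> str:
    probs = predict_proba(params, diff)
    return CLASSES[probs.index(max(probs))]


def accuracy(params, diffs: Sequence[float], labels: Sequence[int]) -> float:
    """Share of fixtures whose predicted outcome matches the actual result."""
    hits = sum(predict(params, d) == CLASSES[y] for d, y in zip(diffs, labels))
    return hits / len(labels)
```

Scoring the upcoming fixtures then amounts to calling `predict` on each fixture's latest rating difference. Calling `accuracy` over the historical fixtures traces how well past predictions held up.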
Automation
The project is automated with 5 scenarios, plus one extra scenario that rebuilds the entire flow from scratch. Four of them run daily in sequence starting at 0200 UTC, and one runs weekly on Sundays at 0400 UTC:
Compute Latest Ranks (Daily, 0200 UTC): This updates the historical fixtures table with the latest results, recalculates the Elo ranks for the entire history and then extracts the most recent Elo rank for each team.
Get Upcoming Fixtures (Daily, after Compute Latest Ranks completes): This fetches the upcoming fixtures for the next week, including the current day.
Predict Historical Fixtures (Daily, after Get Upcoming Fixtures completes): This uses the model to evaluate all historical fixtures, generating predictions and comparing them against the actual results to see whether they were correct.
Predict Upcoming Fixtures (Daily, after Predict Historical Fixtures completes): This uses the model to predict all upcoming fixtures.
Retrain Model (Weekly, Sunday 0400 UTC): This retrains the model on the complete fixture history, including the latest week of results.
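The schedule above can be sketched in a few lines. This is not the project's scenario configuration. It only illustrates which scenarios are time-triggered and how the daily chain runs in order. Stopping the chain at the first failed step is an assumption based on each step starting "after" the previous one "completes". The `triggers_at` and `run_chain` names are hypothetical, and `triggers_at` expects timezone-aware datetimes.

```python
from datetime import datetime, timezone
from typing import Callable, List, Sequence, Tuple

# Daily chain in the order listed above; only the first step is time-triggered.
DAILY_CHAIN = (
    "Compute Latest Ranks",
    "Get Upcoming Fixtures",
    "Predict Historical Fixtures",
    "Predict Upcoming Fixtures",
)
WEEKLY_SCENARIO = "Retrain Model"


def triggers_at(now: datetime) -> List[str]:
    """Return the scenarios whose time trigger fires at this (timezone-aware) minute."""
    now = now.astimezone(timezone.utc)
    fired = []
    if (now.hour, now.minute) == (2, 0):
        fired.append(DAILY_CHAIN[0])  # later steps are chained, not time-triggered
    if now.weekday() == 6 and (now.hour, now.minute) == (4, 0):  # 6 = Sunday
        fired.append(WEEKLY_SCENARIO)
    return fired


def run_chain(steps: Sequence[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run steps in order, stopping at the first failure; return completed names."""
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception:
            break  # downstream steps depend on this one, so do not continue
        completed.append(name)
    return completed
```

The weekly retrain fires two hours after the daily chain starts. Assuming the chain finishes within that time, the retrain therefore sees the freshly updated fixture history.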
Webapp
The predictions for both the upcoming and the historical fixtures are served via a basic Flask webapp. The webapp provides an interface with buttons to fetch either the upcoming or the historical fixtures. These buttons use JavaScript to call a Python backend that returns the requested data.
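A minimal sketch of such a backend follows, assuming two JSON endpoints that the buttons would call from JavaScript, for example with `fetch('/api/upcoming')`. The routes `/api/upcoming` and `/api/historical` and the in-memory `UPCOMING`/`HISTORICAL` lists are hypothetical stand-ins for the prediction tables produced by the flow.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory stand-ins for the prediction tables produced by the flow.
UPCOMING = [{"home": "Team A", "away": "Team B", "prediction": "home_win"}]
HISTORICAL = [{"home": "Team C", "away": "Team D", "prediction": "draw", "correct": True}]


@app.route("/api/upcoming")
def upcoming_fixtures():
    """Return upcoming fixture predictions as JSON for the 'upcoming' button."""
    return jsonify(UPCOMING)


@app.route("/api/historical")
def historical_fixtures():
    """Return historical predictions (with correctness) for the 'historical' button."""
    return jsonify(HISTORICAL)


if __name__ == "__main__":
    app.run(debug=True)
```

Keeping the backend to plain JSON endpoints means the same data could also feed other front ends, such as the dashboard.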
Dashboard
The predictions are also served via a dashboard. The dashboard has three slides:
The first slide shows the current team rankings, the upcoming fixture predictions and the historical fixture predictions.
The second slide shows an overview of the model, including training information, model performance metrics, the confusion matrix and the prediction density distributions.
The third slide contains the web application described above.
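The confusion matrix on the second slide can be computed from the historical predictions with a short helper. This sketch assumes outcomes are labelled `home_win`, `draw` and `away_win`, which are illustrative names. Rows are actual outcomes and columns are predicted outcomes. Overall accuracy is the diagonal divided by the total.

```python
from typing import List, Sequence

CLASSES = ("home_win", "draw", "away_win")


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str]) -> List[List[int]]:
    """Rows are actual outcomes, columns are predicted outcomes, in CLASSES order."""
    index = {c: i for i, c in enumerate(CLASSES)}
    matrix = [[0] * len(CLASSES) for _ in CLASSES]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy_from_matrix(matrix: List[List[int]]) -> float:
    """Share of fixtures on the diagonal, i.e. predicted correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0
```

Reading down the draw column shows how often the model calls a draw, which helps explain an overall accuracy figure.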