Football Match Prediction
Overview
The aim of this project is to predict the outcome of football matches. It currently predicts all Premier League and Championship games.
Data
All data comes from api-football, whose data feeds are well documented.
In the current version, we used the match fixtures API to get historical match results for the Premier League and Championship for the last 10 years, as well as to get the upcoming fixtures for the next week.
Flow
The first Python recipe (compute_Leagues) gets the available leagues and their corresponding IDs from the API. This data is cleaned and filtered down to the leagues of interest to give the Leagues_prepared_filtered dataset.
These league IDs are then used to get all the corresponding historical fixtures (from the beginning of the 2010 season up to and including yesterday), as well as the upcoming fixtures (from today through the next week, inclusive), from the API.
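The two date ranges described above can be sketched as follows. This is only an illustration: the function name, the exact season start date and the reading of "next week" as seven days ahead are assumptions, not values taken from the project.

```python
from datetime import date, timedelta

# Assumed placeholder for "beginning of the 2010 season"; the real date may differ.
ASSUMED_SEASON_START = date(2010, 8, 1)


def fixture_windows(today, season_start=ASSUMED_SEASON_START, days_ahead=7):
    """Return (historical, upcoming) date ranges as inclusive (start, end) pairs."""
    # Historical fixtures run from the season start up to and including yesterday.
    historical = (season_start, today - timedelta(days=1))
    # Upcoming fixtures run from today through the next `days_ahead` days.
    upcoming = (today, today + timedelta(days=days_ahead))
    return historical, upcoming


historical, upcoming = fixture_windows(date(2024, 3, 10))
```

Because the historical range ends yesterday and the upcoming range starts today, the two ranges never overlap, so a fixture is never fetched as both historical and upcoming.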
We then use a custom-developed plugin to compute the Elo ratings (Wikipedia: Elo ratings) for each team over the fixture history. Elo ratings originate from chess but also provide an accurate way of ranking football teams over time. Using SQL, we extract the most recent Elo rating for each team from the history so that it can be joined to the upcoming fixtures.
We trained a simple logistic regression model on these ratings from the fixture history to predict the outcome of each game. This model is then used to score the upcoming fixtures, giving predictions for the next week of fixtures. We also evaluate the model on the historical fixtures so we can track the accuracy of past predictions as well.
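For readers unfamiliar with Elo, the standard update rule can be sketched as below. This is not the plugin's code: the starting rating of 1500, the K-factor of 20 and the tuple layout of a fixture are assumptions chosen for illustration, and the plugin may use different parameters or extra adjustments such as home advantage.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(home_rating, away_rating, home_score, k=20.0):
    """Return new (home, away) ratings; home_score is 1 win, 0.5 draw, 0 loss."""
    delta = k * (home_score - expected_score(home_rating, away_rating))
    # Points gained by one side are lost by the other.
    return home_rating + delta, away_rating - delta


def run_elo(fixtures, start=1500.0, k=20.0):
    """Replay fixtures in date order; return latest ratings and full history.

    Each fixture is an assumed tuple: (date, home, away, home_goals, away_goals).
    """
    ratings, history = {}, []
    for day, home, away, home_goals, away_goals in sorted(fixtures):
        if home_goals > away_goals:
            score = 1.0
        elif home_goals == away_goals:
            score = 0.5
        else:
            score = 0.0
        new_home, new_away = update_elo(
            ratings.get(home, start), ratings.get(away, start), score, k
        )
        ratings[home], ratings[away] = new_home, new_away
        history.append((day, home, new_home))
        history.append((day, away, new_away))
    return ratings, history


# Hypothetical fixtures, for illustration only.
sample = [
    ("2024-03-02", "Team A", "Team B", 2, 1),
    ("2024-03-09", "Team B", "Team C", 0, 0),
]
latest, history = run_elo(sample)
```

Here `latest` plays the role of the "most recent Elo rating per team" table, which the project derives from the rating history with SQL.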
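To make the modelling step concrete, here is a minimal pure-Python sketch of a logistic regression fitted on Elo differences. It rests on several assumptions not stated in the project: a single feature (home rating minus away rating, scaled by 400), a binary target (home win or not) and plain gradient descent. The actual model may use different features, a multi-class outcome and a library implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical training data: Elo difference (home - away) and whether home won.
elo_diffs = [120, -80, 200, -150, 30, -10, 90, -220]
home_won = [1, 0, 1, 0, 1, 0, 1, 0]
w, b = fit_logistic([d / 400 for d in elo_diffs], home_won)


def predict_home_win(home_elo, away_elo):
    """Predicted probability that the home side wins, under the assumptions above."""
    return sigmoid(w * (home_elo - away_elo) / 400 + b)
```

Scoring an upcoming fixture then amounts to calling `predict_home_win` with the two teams' most recent ratings.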
Automation
Five scenarios currently automate the project: four run daily in sequence starting at 0200 UTC, and one runs weekly on Sundays at 0400 UTC. A separate scenario exists just to re-build the entire flow from scratch. The five automation scenarios are:
Compute Latest Ranks (Daily, 0200 UTC): This updates the historical fixtures table with the latest results, recalculates the Elo ranks for the entire history and then extracts the most recent Elo rank for each team.
Get Upcoming Fixtures (Daily, after Compute Latest Ranks completes): This gets the upcoming fixtures for the next week including the current day.
Predict Historical Fixtures (Daily, after Get Upcoming Fixtures completes): This uses the model to evaluate all historical fixtures (get predictions and compare them against the result to see if they were true or false).
Predict Upcoming Fixtures (Daily, after Predict Historical Fixtures completes): This uses the model to predict all upcoming fixtures.
Retrain Model (Weekly, Sunday 0400 UTC): This retrains the model with the complete history of fixtures (including the latest week of fixture results).
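The comparison performed by the Predict Historical Fixtures scenario, marking each prediction true or false against the actual result, can be sketched as below. The field names and the "H"/"D"/"A" outcome labels are hypothetical and chosen only for illustration.

```python
def evaluate_predictions(rows):
    """Flag each row as correct or not and return (scored_rows, accuracy).

    Each row is an assumed dict with 'predicted' and 'actual' outcome labels.
    """
    scored = [dict(row, correct=(row["predicted"] == row["actual"])) for row in rows]
    accuracy = sum(row["correct"] for row in scored) / len(scored) if scored else 0.0
    return scored, accuracy


# Hypothetical historical fixtures: H = home win, D = draw, A = away win.
past = [
    {"fixture": "Team A v Team B", "predicted": "H", "actual": "H"},
    {"fixture": "Team C v Team D", "predicted": "A", "actual": "D"},
    {"fixture": "Team E v Team F", "predicted": "H", "actual": "H"},
    {"fixture": "Team G v Team H", "predicted": "D", "actual": "D"},
]
scored, accuracy = evaluate_predictions(past)
```

Keeping the per-fixture `correct` flag alongside the overall accuracy makes it possible to see which historical predictions were right as well as how often.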
Webapp
The predictions for both the upcoming and the historical fixtures are served via a basic Flask webapp. The webapp provides an interface with buttons to get either the upcoming or the historical fixtures. These buttons use JavaScript to call a Python backend, which returns the requested data.
Dashboard
The predictions are also served via a dashboard. The dashboard has three slides:
The first slide shows the current team rankings, the upcoming fixture predictions and the historical fixture predictions.
The second slide shows an overview of the model, including training information, model performance metrics, the confusion matrix and the prediction density distributions.
The third slide contains the web application described above.