Every year, European football hosts the UEFA Champions League, the most important competition for European clubs. The best teams of each national championship face each other to win the “big ears cup”. The first three months are devoted to the group stage. 32 teams take part, having qualified either through the top places of their national championships or by winning the preliminary rounds. They are split by draw into eight groups of four, according to a few rules:
• One team from each “hat” (pot), where the hat reflects the team's quality through its UEFA coefficient over the five previous years.
• Only one team per country in the same group.
• Because of TV rights, teams from the same country are also split between the two halves of the draw. For example, for Spain, FC Barcelona may be in groups A to D and Real Madrid in groups E to H.
Then, six games are played in each group, with every team facing each of the others at home and away. Teams are ranked by points (3 for a victory, 1 for a draw), and ties are broken on the head-to-head record (results, goal difference and away goals). The top two continue in the Champions League, the group winner being seeded; the third goes to the Europa League and the fourth is eliminated. After that, the competition moves to two-legged knockout ties.
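The points rule above can be sketched in a few lines of Python. This is a minimal illustration, not the author's tooling: the `Record` class, the `build_table` function and the team names are hypothetical. Ties are broken on overall goal difference and then goals scored, which is a simplification of the actual UEFA head-to-head rules.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Group-table line for one team (hypothetical structure)."""
    team: str
    wins: int = 0
    draws: int = 0
    losses: int = 0
    goals_for: int = 0
    goals_against: int = 0

    @property
    def points(self) -> int:
        # 3 points for a victory, 1 for a draw, 0 for a defeat
        return 3 * self.wins + self.draws

    @property
    def goal_difference(self) -> int:
        return self.goals_for - self.goals_against


def build_table(results):
    """results: iterable of (home, away, home_goals, away_goals) tuples."""
    table = {}
    for home, away, home_goals, away_goals in results:
        h = table.setdefault(home, Record(home))
        a = table.setdefault(away, Record(away))
        h.goals_for += home_goals
        h.goals_against += away_goals
        a.goals_for += away_goals
        a.goals_against += home_goals
        if home_goals > away_goals:
            h.wins += 1
            a.losses += 1
        elif home_goals < away_goals:
            a.wins += 1
            h.losses += 1
        else:
            h.draws += 1
            a.draws += 1
    # Simplification: ties are broken on overall goal difference, then goals
    # scored, instead of the head-to-head record used in the real competition.
    return sorted(
        table.values(),
        key=lambda r: (r.points, r.goal_difference, r.goals_for),
        reverse=True,
    )


# Hypothetical results for a group of four teams (not real data).
standings = build_table([
    ("A", "B", 2, 0),
    ("B", "A", 1, 1),
    ("C", "D", 0, 0),
])
for pos, rec in enumerate(standings, start=1):
    print(pos, rec.team, rec.points, rec.goal_difference)
```

C and D are tied on every criterion in this toy example. Python's sort is stable, so they stay in the order they first appeared.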
I have studied the group stage of the 2014-15 season. To do so, I collected 91 variables to explain each team's final position in its group, and I sorted them by theme to study different aspects.
In this first part, we look at the group data, that is, the main figures shown in the group standings: points; numbers of wins, draws and losses; and goals for, goals against and goal difference.
In the following graphs, each point is a team's value for the variable shown. Along the horizontal axis, teams are ordered by group and then by rank. Four lines show the average for each position, and colours indicate the position (black for the firsts, red for the seconds, green for the thirds and blue for the fourths).
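The four average lines amount to grouping the teams by final position and taking a mean. Below is a minimal sketch. It assumes each team is stored as a dictionary with a `position` key, and the field names and values are hypothetical, not the study's data:

```python
from collections import defaultdict
from statistics import mean


def average_by_position(teams, variable):
    """Mean of `variable` for each final position (1 = group winner)."""
    values = defaultdict(list)
    for team in teams:
        values[team["position"]].append(team[variable])
    return {pos: mean(vals) for pos, vals in sorted(values.items())}


# Hypothetical rows, for illustration only.
teams = [
    {"name": "A", "position": 1, "points": 16},
    {"name": "B", "position": 2, "points": 10},
    {"name": "C", "position": 1, "points": 12},
    {"name": "D", "position": 2, "points": 8},
]
print(average_by_position(teams, "points"))
```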
The number of points follows the order in the group. That is entirely expected, because the ranking is based on points. This graph explains nothing beyond the fact that the ranking is highly correlated with the number of points.
We can see some differences between groups: Real Madrid dominated the second group, whereas the third one was more open.
The picture is similar for the numbers of wins and losses:
The more wins (or the fewer losses) a team has, the more points it has, and so the better its position in the group. Now let's look at the number of draws:
That is surprising! The number of draws does not predict the rank: drawing several games has no visible effect on the chance of qualification. The correlation coefficient of only 0.095 confirms this. A continuous view, using a density plot, gives the same result: the positions cannot be told apart.
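The text does not name the coefficient it uses. Assuming it is Pearson's r between each variable and the final position, here is a standard-library sketch. Because position 1 is the best, a variable that helps a team gets a negative coefficient. The sample values are made up:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for four teams (not the 2014-15 data).
positions = [1, 2, 3, 4]
points = [15, 10, 7, 2]
print(pearson(points, positions))  # strongly negative: more points, better rank
```

On Python 3.10 and later, `statistics.correlation(x, y)` computes the same coefficient.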
If we now analyse the goals, we can also separate the positions:
To finish higher, a team needs to score more goals and concede fewer. That way it wins games and earns more points, which seems logical. The goal difference (goals for minus goals against) confirms this point.
To conclude this part, the logic is confirmed. We are better placed if:
• We concede fewer goals
• We score more goals
• We win more
• We lose less
• We have more points
But we have also seen that the number of draws has almost no influence on the final ranking, a conclusion that is not obvious at first sight.
We now study the results game by game, comparing the six matchdays one by one. We analyse the number of points, the goals for and against and the location (home or away). We also analyse normalised versions of these variables: the points won on a given day divided by the team's total points, and likewise for the goals.
The first observation is that the hierarchy is mostly respected. We can nevertheless see that the thirds scored more in the first game and less in the second, whereas the opposite holds for the seconds. The second matchday was the most productive for the fourths but less so for the firsts.
The ratio shows these differences more clearly. If goals were spread evenly, each day would have a ratio of 1/6 ≈ 0.167. The best day was the first for the thirds, the second for the fourths, the third for the firsts and the fourth for the seconds. Each position has a favourite game for scoring.
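The normalised variables are each matchday's share of a team's total. Below is a minimal sketch with made-up numbers. The function name is hypothetical. A team with a zero total (no goals, for instance) gets `None` shares, because the ratio is undefined in that case:

```python
def per_day_ratios(per_day):
    """Share of a team's total (points or goals) earned on each matchday."""
    total = sum(per_day)
    if total == 0:
        # No points or no goals at all: the distribution is undefined.
        return [None] * len(per_day)
    return [value / total for value in per_day]


EVEN_SHARE = 1 / 6  # about 0.167: each day's share if output were spread evenly

# Hypothetical goals scored on matchdays 1 to 6 by one team.
goals = [1, 0, 2, 0, 1, 2]
ratios = per_day_ratios(goals)
best_day = max(range(6), key=lambda d: ratios[d]) + 1  # first day with the top share
print([round(r, 3) for r in ratios], "best day:", best_day)
```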
For goals against, it is harder to draw a conclusion. Day 3 may be the most telling for separating qualified from unqualified teams (firsts and seconds together), but the difference does not seem significant. Among the firsts, five have a favourite game for conceding goals (1, 2, 4 or 5), but only because they conceded very few goals overall (1 for Monaco, 2 for Real Madrid).
The number of points follows from the goals. The hierarchy is mostly respected, except in the first game, which shows the opposite result in the battle for second place (thirds have more points than seconds).
The ratio view shows that the fourths took most of their points in the second game, whereas the thirds took theirs in the first and fifth games. Points are more evenly spread for the firsts and seconds, who also take more points overall.
Finally, the location of each game shows clear differences. This variable is 1 if the team plays at home and 0 if it plays away.
For example, the firsts played at home in games 1, 3 and 6. But this has no direct link with the final position: the venue of each game is decided by the hat, and we will see later that the hat itself is linked to the position.
In conclusion for this part, the ratio of goals for over the first four games alone is enough to tell the different positions apart. Beyond that, the results only show how points and goals are distributed across the games and do not really explain the final position.
We now look at the data that make a team better than the others, that is, the variables that reflect the quality of a team.
First, we look at the goals in each half. In each half, the hierarchy is respected (as it is for the total goals for and against), but some differences appear in the ratio (goals in the half divided by total goals).
Goals are mainly scored in the second half, and this holds for all teams. For goals conceded, however, qualified and unqualified teams differ: the firsts and seconds concede more of their goals in the second half, whereas the fourths have a ratio near 50%. This can be explained by easing off: a team leading 4-0 is more likely to concede a goal late in the game.
The average minute of the goals scored may be meaningful. The seconds score earliest, and the hierarchy is respected for the others. This suggests that the seconds are more likely to ease off before the end of the game.
The qualified teams score more penalties, but once again this is due to their higher total of goals. The ratio (penalty goals divided by total goals) does not really separate the positions. The fourths' rate is distorted by APOEL Nicosia, who scored once, from a penalty.
The number of shots gives a logical result: the more a team shoots, the more likely it is to score and win. We could analyse shots that hit the post, were blocked or were on target, but the UEFA website lacks the data.
The number of passes attempted reflects the position, but it is linked to possession (see below). A good team attempts more passes and also completes more of them. The hierarchy is still respected, but the fourths miss more passes than the others.
Ball possession is a good indicator of the position. In France, there are often games where the winner pulls off a hold-up: little possession and a single goal. In the Champions League, teams cannot afford that; they need good possession to finish higher. The saying “dominating is not winning” does not hold in European games.
The number of crosses also influences the position. Crossing well seems to decide second place: the thirds missed more crosses and have a worse success ratio than the fourths.
The result is much the same for corners: the fourths won more corners than the thirds. Still, the number of corners is a rather good indicator of the position.
On the other hand, the number of fouls committed does not reflect the position: teams that commit many fouls finish neither first nor last. The data on fouls suffered are not extensive enough to draw a conclusion.
Finally, consider bookings. Yellow cards help explain the position, whereas red cards only tell us that the firsts received none and the fourths received the most.
In conclusion of this part, the positions are deserved: the firsts dominated in most areas, whereas the lasts were weaker.
When we think of Barcelona, we think of Messi. Ronaldo for Madrid, Zlatan for Paris, Drogba for Chelsea, Gerrard for Liverpool, Sneijder for Galatasaray, Perrin for Saint-Étienne, Pirlo for Juventus, etc. Having a good player in the team can bring success, but is it the sign of a good team?
Running is a sign neither of a good team nor of a bad one. In every team, the player who runs the most covers around 10.5 km per game. The small difference between positions can be explained by substitutions when the team is leading.
The assessment is the same for each team's best shooter and best passer. Of course, the best shooter scores more in a team that scores more, but the ratio shows that this is not a sign of a good team. The ratio varies because the fourths score few goals, often from free kicks or individual actions (like Nicosia's single penalty).
To conclude: a good team is not simply the product of a good player. Individuals don't make the result. Football remains a team sport, with eleven players, each as important as the others.
We have seen on-field data that can explain the quality of a team; we now turn to variables that seem independent of the results.
The groups are drawn from hats based on the UEFA ranking. This reflects a team's quality over the past five years, but does it reflect its current quality?
In fact, it does. A team with a good recent past is a better team, which is why we always see the same teams in the knockout phase of the Champions League. We can also count how many teams matched their initial rank:
22 teams out of 32 (69%) finished in the position matching their hat. The graph on the right shows that five groups fully respected this ranking (groups A, B, F, G and H), whereas group C (that of Monaco and Benfica) finished in reverse order.
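The 69% figure is the share of teams whose final position equals their hat number: 22 / 32 = 0.6875. The sketch below computes that share and the two group-level checks (fully respected, fully reversed). It assumes hats and positions are both numbered 1 to 4, and the function names and sample groups are illustrative:

```python
def share_respecting_hat(teams):
    """teams: list of (hat, final_position) pairs. Returns (count, share)."""
    kept = sum(1 for hat, position in teams if hat == position)
    return kept, kept / len(teams)


def group_respects_hats(group):
    """True if every team in the group finished exactly at its hat number."""
    return all(hat == position for hat, position in group)


def group_reversed(group):
    """True if the group finished in exactly the opposite order of the hats."""
    return all(position == 5 - hat for hat, position in group)


# Made-up groups: one in hat order, one fully reversed.
in_order = [(1, 1), (2, 2), (3, 3), (4, 4)]
upside_down = [(1, 4), (2, 3), (3, 2), (4, 1)]
print(group_respects_hats(in_order), group_reversed(upside_down))  # True True
print(share_respecting_hat(in_order + upside_down))  # (4, 0.5)
```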
That covers recent history, but what about older history?
The number of participations can explain the order among the two qualified teams and among the two unqualified ones, but it doesn't explain why a team qualifies. The year of creation can: a young club is less likely to be a good team and often finishes last. A football club is like wine; it improves with age. Being French, I also notice that the youngest qualified club is Paris SG and the youngest group winner is Monaco. In French football, the top teams change more often, so it is hard to have a rich history and be good at present.
John Stein, a Scottish player and manager, said that “Football without fans is nothing”. Can we prove him right? Is the twelfth man as important as the eleven others?
The size of the stadium explains the final position: teams with a big stadium tend to finish first, whereas those with a small one tend to finish last. For medium-sized stadiums, the crowd makes the difference. Of course, a big stadium holds more people, but the attendance percentage also shows a difference of about 10% between seconds and thirds.
Finally, we study the influence of the team's name, more precisely the presence of “FC”. With Basel, Bayern Munich and Barcelona, there will be three “FCB” clubs in the round of sixteen, so this factor may matter.
Unbelievable but true! A club with “FC” in its name is more likely to qualify than a club without it. With this name, the average position is 1.9, instead of the 2.5 expected by chance. Of the 15 FC teams, 12 qualified (out of 16 qualified teams in total). An innocent name is a factor of qualification. Who would have imagined that?
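The sketch below computes the average position and the correlation for a yes/no variable such as “FC in the name”. With a 0/1 variable, Pearson's r is known as the point-biserial correlation. Assuming Pearson's r is what the study used, this is the kind of value listed for “FC” in the table below. The clubs and positions here are invented for illustration:

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r.

    Assumes neither variable is constant (otherwise r is undefined).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Invented clubs: (has "FC" in its name, final position in its group).
clubs = [
    (True, 1), (True, 2), (False, 3), (False, 4),
    (True, 1), (False, 2), (True, 3), (False, 4),
]
has_fc = [1 if fc else 0 for fc, _ in clubs]
positions = [pos for _, pos in clubs]

fc_positions = [pos for fc, pos in clubs if fc]
print("average position with FC:", mean(fc_positions))  # 1.75 here
print("qualified FC clubs:", sum(1 for p in fc_positions if p <= 2), "of", len(fc_positions))
print("r:", round(pearson(has_fc, positions), 3))
```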
To conclude this part, history (and more specifically recent history) explains the position. We have confirmed John Stein's point of view, shared by hundreds of thousands of fans around the world, about the importance of the fans. Finally, we have highlighted the importance of “FC” in the club's name for qualification; I don't think many people know that.
This report aims to help us understand the Champions League data. We have seen the main factors behind the ranking. We have confirmed the role of goals and wins, but also of consistency across the matchdays, and we have seen the role of each day in the final position (each position has its own day).
As for the match statistics, the position confirms the importance of being strong in every area, whether possession or the number of corners. The importance of a single good player was not confirmed by these data: the best player's impact does not show up in the best striker, the best passer or the best runner.
We have also seen surprising facts about the importance of a good (recent) history and numerous fans, and even more surprising ones: the effect of “FC” in the name, and the absence of a link between the number of draws and the position.
With these conclusions, a team can learn how to do well in future years. Last year every group winner beat its runner-up opponent to reach the quarter-finals, so first place is really the one teams should aim for.
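To finish, here are the correlations between the final position and each of the other variables. Since position 1 is the best, a negative coefficient means that higher values go with a better position. The variables are ordered by the strength of the link (absolute value of the coefficient), and a graph comparing them follows the table: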
Variable | Correlation with final position |
---|---|
Points | -0.9185937397 |
Losses | 0.8967631239 |
Difference | -0.8942697709 |
Wins | -0.8793488715 |
Goals for | -0.7894636493 |
Goals against | 0.7779934020 |
Corners against | 0.7003642842 |
Goals against in first half | 0.6768985550 |
Hat | 0.6750000000 |
Goals for in first half | -0.6518171826 |
Percentage of possession | -0.6469441968 |
Goals for in second half | -0.6465182980 |
UEFA rank before the group stage | -0.6408776505 |
Goals against in second half | 0.6363372345 |
Points on day 3 | -0.6330740706 |
Time of possession | -0.6283644820 |
Goals for on day 3 | -0.6164004621 |
Goals against on day 6 | 0.6061252864 |
Best passer | -0.5965587590 |
Passes attempted | -0.5952520265 |
Shots | -0.5893593531 |
Crowd | -0.5867043545 |
Points on day 6 | -0.5854656110 |
Corners for | -0.5528285026 |
Goals against on day 3 | 0.5486641476 |
Best striker | -0.5240601711 |
Stadium capacity | -0.5052886578 |
Goals against on day 4 | 0.5025189076 |
Crosses succeeded | -0.4879025031 |
"FC" in the team name | -0.4760952286 |
Goals for on day 5 | -0.4729653291 |
Ratio of goals for on day 3 | -0.4686198678 |
Crosses attempted | -0.4595133553 |
Points on day 5 | -0.4550219882 |
Points on day 1 | -0.4529073595 |
Points on day 4 | -0.4448039039 |
Ratio of goals for on day 2 | 0.4342363516 |
Ratio of shots by the best striker | 0.4334293319 |
Points on day 2 | -0.4308143175 |
Year of creation | 0.4151015097 |
Goals for on day 6 | -0.4006590876 |
Percentage of successful passes | -0.3947214397 |
Goals against on day 5 | 0.3924605922 |
Goals against on day 1 | 0.3840844210 |
Goals for on day 4 | -0.3819143698 |
Ratio of shots blocked | 0.3722119977 |
Ratio of goals against on day 3 | 0.3608701287 |
Yellow cards | 0.3524372772 |
Penalties | -0.3286878676 |
Goals for on day 1 | -0.3278769448 |
Shots on goal post | -0.3233161507 |
Red cards | 0.3222516933 |
Goals against on day 2 | 0.3143026903 |
Goals for on day 2 | -0.2869720216 |
Ratio of points on day 6 | -0.2739387659 |
Ratio of points on day 2 | 0.2691652366 |
Ratio of successful crosses | -0.2566649266 |
Attendance percentage | -0.2520724300 |
Played at home on day 6 | -0.2520504151 |
Ratio of goals against during second half | -0.2325479491 |
Ratio of goals against during first half | 0.2325479491 |
Played at home on day 1 | -0.2236067977 |
Played at home on day 2 | 0.2236067977 |
Played at home on day 5 | 0.2236067977 |
Ratio of passes by best passer | -0.2191536889 |
Ratio of points on day 3 | -0.2088698143 |
Number of participations | -0.1944987186 |
Ratio of goals on penalties | 0.1839176720 |
Ratio of goals against on day 6 | 0.1665360216 |
Shots blocked | -0.1624419897 |
Ratio of goals for on day 5 | -0.1492584676 |
Fouls committed | 0.1473647101 |
Ratio of goals against on day 2 | -0.1464973049 |
Ratio of goals on post | -0.1458779669 |
Fouls received | -0.1391157741 |
Average minute of goals scored | 0.1327158077 |
Ratio of points on day 5 | -0.1319646309 |
Ratio of goals against on day 1 | -0.1291799654 |
Ratio of points on day 1 | 0.1227495572 |
Ratio of goals for on day 6 | -0.1133369394 |
Ratio of goals against on day 4 | -0.0960851477 |
Draws | 0.0951063937 |
Maximal distance run by one player | 0.0875209087 |
Ratio of goals scored in first half | -0.0808207353 |
Played at home on day 3 | 0.0559016994 |
Played at home on day 4 | -0.0559016994 |
Ratio of goals scored in second half | 0.0287101335 |
Ratio of goals against on day 5 | -0.0163546280 |
Ratio of goals for on day 1 | -0.0101980901 |
Ratio of goals for on day 4 | 0.0093564754 |
Ratio of points on day 4 | 0.0009535543 |
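The table is ordered by the absolute value of the coefficient, so strong negative and strong positive links sit side by side. The sketch below shows how such a ranking could be produced. It assumes Pearson's r and one list of values per variable, aligned with the teams' final positions. The variable names and numbers are hypothetical. Constant variables are skipped, since their correlation is undefined:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation, or None when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(positions, variables):
    """Return (name, r) pairs sorted by |r|, strongest link first."""
    coefficients = []
    for name, values in variables.items():
        r = pearson(values, positions)
        if r is not None:  # skip constant variables
            coefficients.append((name, r))
    return sorted(coefficients, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data for one group of four teams.
positions = [1, 2, 3, 4]
variables = {
    "Draws": [1, 2, 2, 1],
    "Points": [15, 10, 7, 2],
    "Played at home on day 1": [1, 1, 1, 1],
}
for name, r in rank_by_correlation(positions, variables):
    print(f"{name} | {r:.4f} |")
```

Each printed line follows the table's `Variable | Correlation |` layout.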