
LFC Analytics/Data Thread

petergriffin2020

It's all for me grog
Joined
Jan 28, 2020
Messages
458
As mentioned in other threads, here's the first of (hopefully) many posts regarding analytics. I thought about doing a Premier League analytics thread but figured I would start with LFC...perhaps if there is enough interest we can start another one!

The First Chart: I wanted to see the number of matches Liverpool won (home and away) this year in the Premier League in which our xG was less than that of our opponent. There were four in total; three away (Southampton, Chelsea, and Wolves) and one home (Man City).



What is xG? Basically, it's the number of expected goals that a team would score based on the quality (and number) of chances they created (here's a more thorough explanation).

Each chance is assigned a value based on the % likelihood of it being scored; for example, a 35-yard pot shot might be assigned an xG of 0.03 (meaning that 3/100 shots from that position would be expected to score), while a tap-in from two yards might be 0.90 (meaning 90/100 shots from that position would be expected to score). xG does take into account different types of shots (header/volley/first-time ground finish/etc.).

A team's xG for a given match is then tabulated by adding up the xG of all its chances. For example, if a team has 4 big chances worth 0.25, 0.30, 0.35, and 0.40 xG, while the other team has 20 shots that are each worth 0.04, the first team will have a higher xG (0.25 + 0.30 + 0.35 + 0.40 = 1.3) than the second (0.04 x 20 = 0.8) even though the second created more chances.
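For anyone who wants to check that arithmetic, here is a minimal Python sketch. The shot values are just the illustrative numbers from the example above (not real match data), and `match_xg` is a hypothetical helper name.

```python
def match_xg(shot_values):
    """Sum the per-shot xG values to get a team's xG for the match."""
    return sum(shot_values)

# Illustrative values from the example above (not real match data).
team_a_shots = [0.25, 0.30, 0.35, 0.40]  # four big chances
team_b_shots = [0.04] * 20               # twenty low-quality shots

# Round to two decimals to hide floating-point noise.
team_a_xg = round(match_xg(team_a_shots), 2)
team_b_xg = round(match_xg(team_b_shots), 2)

print(team_a_xg, team_b_xg)
print("Fewer shots, higher xG:", team_a_xg > team_b_xg)
```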

Bottom Line: So, in our games against Southampton, Chelsea, Wolves, and City, they created chances that were either higher-quality, more, or both relative to ours. This makes sense, as:
- At Southampton we had Adrian to thank for keeping us level in the first half (even if he gave Ingsy the ball in the second)
- Chelsea battered us second half at the Bridge
- Wolves looked more likely to win the game before Bobby popped up with the late winner
- City (Bernardo/Trent handballs notwithstanding) created a hatful of big chances but only scored one

These data came from FiveThirtyEight's immense repository of Premier League analytics. If you're a nerd like I am, I highly encourage you to check it out.

Below: @Arminius had expressed an interest in seeing the "under the hood" details (meaning the Python environment), so I've incorporated both the graph and most of the code used to generate it. You can view the image in the original (full) size by clicking on it.


















Nerds Only Section: So what does the code mean?

Programming can be daunting — I still suffer from impostor syndrome from time to time — but it really boils down to taking objects and manipulating objects based on conditions. So, basically, you take an object (in this case, a spreadsheet) and break it down, analyze it, and visualize it based on the objects that came before. It's kind of like working with building blocks: everything builds on what came before.

When you want to import a file (in this case, a spreadsheet) into Python, you essentially tell Python:

1. What "libraries" (modules) you want to use (I used Pandas, the analytics library; Numpy, the mathematical library; and Matplotlib, a data visualization library).
2. What file you want to examine (this becomes your first object).

When you import a spreadsheet, you can then view it in Python by calling the object name — I named it matches_BPL.

Once you see what you're working with, you take that file and subset it based on criteria. So, for LFC home matches, I created an object called matches_BPL_LFC1 in which I wanted to see all matches where team1 (the column name for Home Team) was Liverpool. Likewise, for LFC away matches, I created matches_BPL_LFC2, where team2 was Liverpool. Then, I took the two and combined them into an object called matches_BPL_LFC.
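The subsetting steps above could be sketched roughly like this in pandas. The object names and the team1/team2 columns come from the description; the rows are a made-up toy table, since the real spreadsheet would be loaded from a file (typically with `pd.read_csv`) whose name is not given here.

```python
import pandas as pd

# Toy stand-in for the imported spreadsheet (made-up rows, for illustration only).
matches_BPL = pd.DataFrame({
    "team1": ["Liverpool", "Southampton", "Arsenal", "Liverpool"],
    "team2": ["Norwich City", "Liverpool", "Burnley", "Arsenal"],
})

# Home matches: Liverpool is team1.
matches_BPL_LFC1 = matches_BPL[matches_BPL["team1"] == "Liverpool"]
# Away matches: Liverpool is team2.
matches_BPL_LFC2 = matches_BPL[matches_BPL["team2"] == "Liverpool"]

# Combine the two subsets and restore the original row order.
matches_BPL_LFC = pd.concat([matches_BPL_LFC1, matches_BPL_LFC2]).sort_index()

print(matches_BPL_LFC)
```

Each filter is a boolean mask over a column, so the same pattern works for any other team or column.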

From there, matches_BPL_LFC was broken down into LFC_home_wins and LFC_away_wins: home wins are the rows where score1 is greater than score2, and away wins are the rows where score2 is greater than score1. Next, I created two subsets of wins where our xG was lower than our opponent's. Then, I counted the number of games that matched those criteria (using "len", a built-in Python function that returns the number of items in an object, here the number of rows). Penultimately, I created objects defining colors by their RGB values (red is 1,0,0; blue is 0,0,1) and used those in the chart. Lastly, I created two bar graphs (in one chart) showing the number of games where this proved to be the case.
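A rough sketch of those steps, under some assumptions: score1/score2 and the LFC_home_wins/LFC_away_wins names come from the description above, while xg1/xg2 are assumed names for the home and away xG columns, the opponents and numbers are made up, and `plot_counts` is a hypothetical helper rather than the code in the screenshot.

```python
import pandas as pd

# Toy Liverpool-only rows (made-up numbers, for illustration only).
# xg1/xg2 are assumed column names for home and away xG.
matches_BPL_LFC = pd.DataFrame({
    "team1":  ["Liverpool", "Liverpool", "Team A", "Team B"],
    "team2":  ["Team C", "Team D", "Liverpool", "Liverpool"],
    "score1": [3, 2, 1, 0],
    "score2": [1, 0, 2, 0],
    "xg1":    [1.0, 2.5, 1.8, 0.9],
    "xg2":    [1.4, 0.6, 1.1, 1.2],
})

home = matches_BPL_LFC[matches_BPL_LFC["team1"] == "Liverpool"]
away = matches_BPL_LFC[matches_BPL_LFC["team2"] == "Liverpool"]

# Wins: Liverpool's score is higher than the opponent's.
LFC_home_wins = home[home["score1"] > home["score2"]]
LFC_away_wins = away[away["score2"] > away["score1"]]

# Wins in which Liverpool's xG was lower than the opponent's.
home_wins_lower_xg = LFC_home_wins[LFC_home_wins["xg1"] < LFC_home_wins["xg2"]]
away_wins_lower_xg = LFC_away_wins[LFC_away_wins["xg2"] < LFC_away_wins["xg1"]]

# len() returns the number of rows in each subset.
counts = {"Home": len(home_wins_lower_xg), "Away": len(away_wins_lower_xg)}


def plot_counts(counts):
    """Draw one bar per venue, coloured with RGB tuples (red and blue)."""
    import matplotlib.pyplot as plt  # imported here so the counting works without it
    red, blue = (1, 0, 0), (0, 0, 1)
    fig, ax = plt.subplots()
    ax.bar(list(counts.keys()), list(counts.values()), color=[red, blue])
    ax.set_ylabel("Wins with lower xG than opponent")
    return fig


print(counts)
```

Calling `plot_counts(counts)` returns a Matplotlib figure that can be shown or saved.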

Anyway, that's a start! Should you find something interesting to contribute, please feel free — the more insights the better.
 

Red_Jedi

Anfield kick about
Ad-free Member
Joined
May 30, 2017
Messages
2,031
Nice idea @petergriffin2020

I regularly look at five thirty eight and also find xG much better than shots on target.

The only issue I have with xG is that the team that is behind would usually push more and create more chances. Sometimes scoring early can skew the xG aspect. I think that’s what happened with the city game at home. Being 2-0 up at half time meant that the 2nd half they were going to come at us.

Do you have the data for the 2 games we’ve drawn and the 2 we’ve lost this season?
 

petergriffin2020

Nice idea @petergriffin2020

I regularly look at five thirty eight and also find xG much better than shots on target.

The only issue I have with xG is that the team that is behind would usually push more and create more chances. Sometimes scoring early can skew the xG aspect. I think that’s what happened with the city game at home. Being 2-0 up at half time meant that the 2nd half they were going to come at us.
I agree that, while better than SOG, it's not perfect by any stretch. Still provides an interesting metric to objectively evaluate (as best we can) scoring opportunities. We can take contrasting views from the flow of play/sentiment of the game/etc. but this at least provides some context that, yes, we did dominate as much as it appeared (or no we didn't!).

The only caveat with the City game is that they created a number of big chances, including two while 1-0 down, and a number of others in the first half. Of course, scoring so early (twice!) changed the complexion of the game, but I guess what I'm saying is that it's not as if all their big chances came at 2-0 down (or 3-0 down) in the second half.

City home game in depth: Went to find the original xG from understat...it varies slightly from the totals that 538 had, but I'm not too bothered. Anyway, this timing chart shows an interesting trend...they were ahead on xG until our second goal, then we were well ahead, then they didn't climb back to a higher xG until they went to 3-1. Final understat xG was 1.33 LFC to 1.48 MCFC.



Do you have the data for the 2 games we’ve drawn and the 2 we’ve lost this season?
Games lost/drawn: Of the four games in which we dropped points, our opponent finished the match with a higher xG than us in three (the one exception was the Man Utd game).

Here are a couple of charts. As always, click on them to see the full size:


Left: The number of dropped point games where we had a lower or higher xG than our opponent.
Right: The xGs of opponents and LFC.

In hindsight, on the whole, I feel this does make sense:
- United is a funny one. On the day, I felt United had the better of the play in our game against them, and was (and still am) perfectly content with that point (particularly considering we won the title :)). However, aside from the goal, they didn't have many big chances (their two "big" chances, in my mind, were Rashford and Fred's attempts for 2-0 with xGs of 0.03 and 0.05, respectively). FWIW, Rashford's opener was 0.57, while Lallana's equalizer was 0.60, so the quality of the two chances that were scored was almost identical and didn't tilt the xG result much.
- Watford away, we got battered, end of.
- Everton away, they had the two big chances at the end, but we had more chances; ergo, it somewhat evened out, although they just about shaded it 0.73-0.68
- City away, again, battered. Even our front three's "big" early chances had a combined xG of 0.25, while City's first three goals (4th technically as an OG didn't count, but the shot leading to it was 0.34) were 0.76 (pen), 0.16, and 0.34...so their four big goalscoring opportunities (which they took) were significantly higher probability shots than ours that we didn't take. Again, while we did have some chances, they were on the whole the better side, although it is up for debate how the game would have evolved had one of our early chances gone in.
 

petergriffin2020

Was curious to see xG for the United home game. Despite our 2-0 win and general dominance in play, the xG was a close 2.01-1.40. Below, the timing chart:



Liverpool: Salah's big miss at the start of the second half was 0.40; his goal in the 93rd minute for 2-0 was 0.26. Mane's miss 1-v-1 was 0.30. Virgil's header for 1-0 was 0.06.

United: Everyone talks about Martial's big chance at the start of the second half, and it was a big one (0.30), but it paled in comparison to Pereira's miss on half time (0.79!). That chance alone counted for more than half of their xG for the game. Good job it wasn't Rashford there!
 

petergriffin2020

Last post before bed, both because I need to sleep and because I don't want to burn myself out! This should be an ever-green, ever-evolving thread and I don't want to exhaust all the data in the first 12 hours (totally kidding, there's no way I can even fathom the amount of data out there). That said, I'll keep posting things as I think of them...I really need more hobbies.

Wanted to see what our max and min xGs this season were.

Minimum: Villa home (1.28) and Watford away (0.27).

Maximum: Both Leicester games! (4.39 home and 4.24 away.) In addition to the fact that both xG maximums were against the same team, what I found even more surprising is that we had a higher xG in the home game, which was a much narrower margin of victory. Then again, the home game saw us go ahead, create loads of chances, get pegged back, before winning in added time, while the away game was 1-0 for a while but got blown open in the last 20 minutes. Crazy enough, Leicester's xGs were 0.12 and 0.14, respectively, meaning we could have, in theory, won both games by four clear goals!



In the screenshot, I left the coding environment up to display what a spreadsheet looks like rendered through Pandas (in Python and other coding environments it's called a dataframe). The preview mode omits a number of central columns, but a big advantage to Python (in addition to being generally faster and much better-equipped to handle large data sets) vs. Excel is that you can examine, visualize, and subset data without having to consistently change screens/filters/etc. For example, upon returning to work, I was asked to examine 12 months of financial data for a client (12 spreadsheets x 3,000 rows = 36,000 rows!). Instead of copy-pasting into a single Excel document and then filtering a bunch of subsets, I imported all 12 months, combined them into a single dataframe, subset the dataframe, and then visualized it...all in one place. I felt fortunate to have learned some Python while out, as the project was way easier than if I had done it solely in Excel...efficiency is king!
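As a rough illustration of that "import, combine, subset, summarise" workflow, here is a small pandas sketch. Everything in it is hypothetical: the column names and amounts are invented, and twelve tiny in-memory tables stand in for the real monthly spreadsheets.

```python
import pandas as pd

# Twelve small made-up monthly tables standing in for the real spreadsheets.
# With real files, each one would typically come from pd.read_excel or pd.read_csv.
monthly_frames = [
    pd.DataFrame({
        "month": [m, m],
        "category": ["fees", "travel"],
        "amount": [100.0 + m, 20.0],
    })
    for m in range(1, 13)
]

# Stack all months into a single dataframe.
all_months = pd.concat(monthly_frames, ignore_index=True)

# Subset it, then summarise it, all in one place.
travel = all_months[all_months["category"] == "travel"]
total_by_category = all_months.groupby("category")["amount"].sum()

print(len(all_months), len(travel))
print(total_by_category)
```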
 

Red_Jedi

Wow. That's a bit heavy for my small dense head.

I remember reading an article at the end of the 2018 season - Champions League final vs Real Madrid and also 4th place again in league.
This article was all about the xG - and it suggested that we should have been 2nd in the league. Our 4th place finish was below our performance (according to xG).

This article really made me believe even more that Jurgen would get us competing with the likes of City. I still didn't believe we would topple City the way we did this season - but whatever set up and performance levels Jurgen was getting from the squad was in the right direction.

Really interesting to see what we do next season. Man Utd and Chelsea have improved as the season has progressed. No doubt City will plug the holes in their team. I can't see us being very active with any significant incomings this summer - unless we can take Nike's website down, and the shirts are out of stock within weeks - meaning that they need to deliver Mbappe.

Must be so nice to analyse an almost perfect season.
 

petergriffin2020

Must be so nice to analyse an almost perfect season.
Makes it more fun, for sure!

I remember hearing that we perhaps could have finished higher in 17-18, but wasn't sure where the metric came from. I had heard about xG but didn't read much into it. That said, I do remember United fans (surprisingly) being aware that we were a proper side, and I think in hindsight it is surprising they finished second that year. Once 18-19 started, it was clear that — Chelsea's brief flurry of wins aside — we were the main challengers to City.
 

petergriffin2020

While we still have two games left to play at home, it is pretty staggering to think that we've won 17 to date this year. With that in mind, as well as the fact that almost half (14/30) of our victories have been by 1 goal, I was curious to see if there were any trends to identify in terms of margin of home victories relative to the quality of teams (measured by SPI).

I created three SPI buckets: one for teams with an SPI of <70, one for those between 70 and 80, and one for those 80+. I also created buckets for margin of victory: 2 goals plus, and 1 goal.

Below: Surprisingly, our largest number of 1 goal home victories have come against teams with an SPI <70.



Below: While teams with SPI <70 represent our largest single group of home opponents to date, they also account for the greatest % of one-goal wins relative to total wins: 3/7 (43%) of wins were by one goal. This is greater than the 33% of total wins against both SPI 70-80 (2/6) and 80+ (1/3).
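The bucketing and the percentages can be sketched in plain Python. The win counts are the ones quoted above; the helper names are hypothetical, and the handling of the exact boundaries (70 goes into "70-80", 80 into "80+") is an assumption.

```python
def spi_bucket(spi):
    """Place an opponent's SPI rating into one of the three buckets."""
    if spi < 70:
        return "<70"
    if spi < 80:
        return "70-80"
    return "80+"


def margin_bucket(goals_for, goals_against):
    """Label a win as a one-goal win or a win by two or more."""
    return "1 goal" if goals_for - goals_against == 1 else "2+ goals"


# (one-goal wins, total wins) per SPI bucket, as quoted in the post.
wins_by_bucket = {"<70": (3, 7), "70-80": (2, 6), "80+": (1, 3)}

# Share of wins decided by a single goal, rounded to the nearest percent.
one_goal_share = {
    bucket: round(100 * one_goal / total)
    for bucket, (one_goal, total) in wins_by_bucket.items()
}

print(one_goal_share)
```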



Below: Lastly, for shits and giggles, I was curious to see what the average margin of victory per SPI bucket was. Surprisingly, it's the SPI 70-80 bucket with 2.16 goals.



It's tough to draw a lot of conclusions from these data, but still surprising to see that our closest home games have generally been against the "lower" sides. Of course, the United (SPI 80+) home game was 1-0 until added time, but so was Watford (SPI <70), so one could argue it all evens out. Either way, I'm sure I'll think of more patterns to analyze over the weekend. Let's see a good performance tomorrow...er, today!
 

Incognito

The Normal One
Joined
Jan 6, 2011
Messages
3,610
A good thing to probably check across is how much have we improved from our corners over the years, both attacking and defending wise. Maybe something on the lines of GA/ corners conceded and GF/ corners won from a simplistic view. Might be some other factors to consider as well?
 

petergriffin2020

A good thing to probably check across is how much have we improved from our corners over the years, both attacking and defending wise. Maybe something on the lines of GA/ corners conceded and GF/ corners won from a simplistic view. Might be some other factors to consider as well?
Good shout! I actually don’t have that in the data set I’m working with, but I’m sure I could find it quite easily. I have a number of things that I would like to analyze...you would be amazed at how many data sets are out there.
 

petergriffin2020

Not a fun result yesterday, but I will leave you with a simple chart to remind you all just how incredible this season has been.

Below: Liverpool F.C. — Premier League Champions 2019-2020 — results to date.
 

Prolix

Long Time Nemesis™
Ad-free Member
Joined
Sep 17, 2012
Messages
3,297
I work with a vast array of player data as part of the fantasy league that I run. Each season I've been working to tweak the scoring system to include more variables and provide a more complex / more accurate output of player contributions. One thing I would love to know from anyone versed in statistics is where I should set the cutoff point (in terms of minutes played) for including/excluding a player's data in the analysis.
 

petergriffin2020

Just got a new desktop and mechanical keyboard, so all the more motivation to do some more Python today. Between this and FIFA20 I have moved up in the world these last couple of weeks!

Below: Visualization of our goals in the Premier League this year for and against by quarter-hour. At present, we have totals of 76 scored (2.17/game) and 27 conceded (0.77/game). We had a memorable season in 2014 with 101 and 50, and while we are a bit short of that scoring total this year, the defensive performance — and the end result — makes for very different (and much better) reading!

As you can see, we score the most (19) just before halftime and concede the most (6) in the last 15 minutes.

That said, looking more at the last 15, we have:
- Not conceded a goal that's directly resulted in dropped points.
- Scored 16 goals, the second-most of any block.
- Directly gained 14 points. Mentality monsters!
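For the quarter-hour breakdown, one possible way to bucket goals is sketched below. The goal minutes are made up (not our real goals), `quarter_hour_block` is a hypothetical helper, and folding stoppage time into the final block is an assumption about how the chart was built.

```python
from collections import Counter


def quarter_hour_block(minute):
    """Map a goal minute (1-90, stoppage time clamped to 90) to a 15-minute block."""
    minute = max(1, min(minute, 90))
    start = (minute - 1) // 15 * 15
    return f"{start + 1}-{start + 15}"


# Made-up goal minutes for illustration (not Liverpool's real goals).
goal_minutes = [4, 38, 41, 45, 52, 76, 93]

# Count goals per block and pick out the busiest one.
goals_per_block = Counter(quarter_hour_block(m) for m in goal_minutes)
busiest_block, busiest_count = goals_per_block.most_common(1)[0]

print(dict(goals_per_block))
print("Most goals:", busiest_block, busiest_count)
```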



Below: Visualizations of our multi-goal home (left) and away (right) games.


Below: The average gaps between goals in minutes home (left) and away (right).


Below: The minimum gaps between goals in minutes home (left) and away (right). In this latest iteration, I realized it would make more sense to offset the 4-5 (a single game, Everton) by making it lighter in comparison.
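A sketch of how the average and minimum gaps could be computed, assuming a "gap" means the minutes between consecutive goals by the same team in one match. The minutes below are invented and `goal_gaps` is a hypothetical helper.

```python
from statistics import mean


def goal_gaps(goal_minutes):
    """Minutes between consecutive goals in one match (empty if fewer than two goals)."""
    ordered = sorted(goal_minutes)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Made-up goal minutes for two multi-goal matches (illustration only).
matches = {"Match A": [12, 30, 75], "Match B": [44, 50]}

for name, minutes in matches.items():
    gaps = goal_gaps(minutes)
    print(name, "average gap:", mean(gaps), "minimum gap:", min(gaps))
```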
 

petergriffin2020

Waiting until the end of the season to do a more full recap with loads of graphs and stats.

Spent a few months learning the basics of analytics with Python, now I am starting to get into machine learning (AI)...definitely not anywhere near pro status but maybe we'll have some (statistically-informed) predictions when next season rolls around (don't hold it against me if they're shite! ;)).
 

petergriffin2020

Here's the first of many end-of-season graphics — our overall record in the 2019-20 Premier League season. Makes for pretty good reading! :)

 

petergriffin2020



Here are the goal totals — again, not bad reading. I want to take a look at more individual stats but most of the datasets that examine them haven't been updated to include the final matchday, so will wait until they are out.
 

Zoran

Well-Known Member
Joined
Mar 8, 2007
Messages
19,355
Overall in all competitions, we played something like a 2-1 season. When you calculate goals scored/conceded with official first team games. Of course, we did concede quite a few goals with kids in domestic cups a few times. I think I counted around 11 games we won exactly 2-1 in all competitions. 9 in the league.
 

petergriffin2020

I take no shame in being a nerd! I'm pretty sure I used that exact word to describe why I did what I did this weekend — tinkering with our helpdesk API to design some (very basic) automated reports.

Told you all there would be more stats after the season was completed...and so here we are! I'll probably keep putting out more as I think of them, but this is the initial round.

Stat of the Season: I had opined some form of the statement "City don't seem to win a lot of close games this year" in the recent past. I looked it up: 14 of our 32 wins (44%) were by one goal...remarkable; for comparison, City won 6 games by one goal. That's 16 points (almost our winning margin) right there! If I could underline one factor that's won us the title, it's our resiliency, and this is it illustrated.



Below: Of our 14 one-goal wins, we pulled out more on the road (8) than at home (6). Mentality monsters.



Below: Much is made of how much we rely on our front 3 for offensive production. As the goalscoring chart indicates, there is certainly something to be said with that — but it really doesn't tell the full story in my opinion. Still, makes for pretty good reading.

S%OG = SOG (shots on goal) as a % of Shots
G%SOG = Goals as a % of SOG
G%SH = Goals as a % of Shots



Despite his perceived wastefulness, Salah scored with 1 of every 5 shots, which is pretty impressive. Mane, however, scored with 3 of every 11 shots! That is, to me, absurd!
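The goal-based ratios in the key are simple divisions. In the sketch below the counts are hypothetical: goals and shots are chosen to match the "3 of every 11" ratio, the shots-on-goal figure is invented, and `shooting_ratios` is a made-up helper name.

```python
def shooting_ratios(goals, shots, shots_on_goal):
    """Return goals as a % of shots on goal and as a % of all shots."""
    return {
        "G%SOG": round(100 * goals / shots_on_goal, 1),
        "G%SH": round(100 * goals / shots, 1),
    }


# Hypothetical counts: 3 goals from 11 shots, 6 of them on goal.
ratios = shooting_ratios(goals=3, shots=11, shots_on_goal=6)
print(ratios)

# "1 of every 5 shots" expressed the same way.
one_in_five = round(100 * 1 / 5, 1)
print(one_in_five)
```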

Below:
Here are the top six teams in terms of players among the League's top 100 goalscorers. We are joint-top with City on 9; Chelsea (alongside West Ham and Villa?!) on 7; Saints on 6.



Below: We are also joint-top of the table in terms of unique goalscorers, with Arsenal on 17; Chelsea and Newcastle on 16; Leicester and Aston Villa on 14.



Fun Nuggets:
1. My Chelsea fan friend — who trolled me mercilessly after the slip — said in the last year "your team has ballers at every position." How times change! :) With that in mind — and with our joint-leading unique goalscorers and goalscorers in the top 100 — I thought it interesting that, of our 22 outfield players who made at least one Premier League appearance, only 3 of them had shots without scoring (Taki, Degsy, and Williams) while only 2 didn't have any shots (Gomez, Elliott).
2. In addition to being our 4th-highest goalscorer, Virgil was (correct me if I'm wrong) the highest-scoring defender in the division — level on goals with Foden, Ward-Prowse, Pepe, and Iheanacho, among others.
 

petergriffin2020

Despite their defensive issues, City only conceded two goals more than us in the League. That said, I had posited that they seemed to lose a lot of games after going 1-0 down. When 1-0 down, we won 5 games, drew 1, and suffered 2 losses; they won 3, drew 1, and lost 7!

Most tellingly, though, I think is the stat regarding "Net Points From In-Balance Result" which is the number of points won from losing positions minus the number of points lost from winning positions. We had a balance of +14; City's was +1.
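One way to read that definition, sketched in Python. This is an interpretation rather than the exact method behind the chart: a match counts as a "losing position" if the team trailed at any stage, and as a "winning position" if it led at any stage. The four matches are made up, and `net_points` is a hypothetical helper.

```python
def net_points(matches):
    """Points won from losing positions minus points lost from winning positions.

    Each match is (trailed, led, points): whether the team was behind or ahead
    at any stage, and the points it finished with (3, 1 or 0).
    """
    won_from_behind = sum(points for trailed, led, points in matches if trailed)
    lost_from_ahead = sum(3 - points for trailed, led, points in matches if led)
    return won_from_behind - lost_from_ahead


# Made-up run of four matches (illustration only).
example = [
    (True, True, 3),    # went behind, came back to win
    (False, True, 1),   # led, then drew: two points dropped
    (True, False, 0),   # went behind and lost
    (False, True, 3),   # led and won
]

print(net_points(example))
```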

 

petergriffin2020

Thought you nerds statisticians might enjoy this:

Thanks for the heads up! I totally agree with the guy who said that there's a lot that we as a team do that can't be captured through analytics...I just think our cohesiveness as a unit is one thing in particular that can't be quantified, at least not through normal lenses.
 

[email protected]

Well-Known Member
Joined
Oct 19, 2014
Messages
4,028
Thought you nerds statisticians might enjoy this:

Stats are brilliant and all that but football itself is probably the least structured of sports and has so many intangibles so as to be virtually unpredictable. Teams can have 75% possession, 90% pass success, 35 shots in total with 20 on target and could still lose. xG can help but it simply cannot bridge that gap in unpredictability.

In other news water is wet.
 

petergriffin2020


Same guy as the article from The Guardian but a bit more Liverpool-tinged.

Also, I do intend to do some more stats...just been busy with work lately...much as doing LFC stats would be a dream job!
 

Noo Noo

Well-Known Member
Joined
Nov 10, 2014
Messages
5,719

Same guy as the article from The Guardian but a bit more Liverpool-tinged.

Also, I do intend to do some more stats...just been busy with work lately...much as doing LFC stats would be a dream job!
A quote from the article
"So when you look at that one, pound-for-pound quality-basis, Liverpool and Man City are pretty awesome and they don't seem to have made too many mistakes [in the market]."

WTF was he smoking? I think he needs to read up on City's transfer dealings.
 

petergriffin2020

I just found a treasure trove of new spreadsheets covering the PL season 2017-20 that should keep me busy for a few weeks. Big thing I wanted to examine was our goals for, goals against, and margin of goals vs. opponent — over the course of the three years as a whole and on the basis of individual seasons. And, I wanted to demonstrate how these changed over time by using trend-lines.

Below: Total goals per season the last three years for (left) and against (right). The scoring output went up by 5 between 17-18 and 18-19, but it regressed by 4 this year, so 19-20 was on +1 from 17-18. Still, given our finishing position, who really cares? As for the goals against, we averaged 1 per game in 17-18, 0.58 last year, and 0.87 this year. Pretty remarkable that both our goals for and our goals against were worse than last year's totals, yet we won the League this year and not last year.





Below: Goals per game and goals against per game by game, 2017-20. Remarkable that the goals for trend line seems pretty flat! Meanwhile, there is a noticeable decline in goals against...some would call that the Virgil effect.
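For anyone curious how a trend line like this can be produced, here is a minimal sketch using a straight-line least-squares fit in NumPy. The goal values are made up, and this is one common way of drawing a trend line, not necessarily the one used for the chart.

```python
import numpy as np

# Made-up goals-per-game values for a run of matches (illustration only).
goals_for = np.array([4, 1, 2, 0, 3, 2, 1, 2])
match_number = np.arange(1, len(goals_for) + 1)

# Fit a straight line (degree-1 polynomial): goals = slope * match_number + intercept.
slope, intercept = np.polyfit(match_number, goals_for, 1)
trend = slope * match_number + intercept

# A slope close to zero means scoring stayed roughly level over the period.
print(round(float(slope), 3), round(float(intercept), 3))
```

The `trend` array can be plotted over the per-game values to get the line shown in charts like the one above.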



Below: Goals for per game, segmented by season.



Below: Goals against per game, segmented by season.



Below: Goal margin vs. opponent by game, 2017-20. Anything above 0 is a win; anything below 0 is a loss; 0 is a draw. We often hear about how we've only lost 9 games in 3 years, but this graphic really spells it out for me.



Below: Margin vs. opponent, segmented by season. 17-18 and 18-19 saw the margin of victory increase over the course of the year, while this year it declined.

Thinking about this year, there are some interesting conclusions to be drawn. Even accounting for the possibility of complacency post-title, it's interesting to think we were winning by more earlier in the year on the whole. On the one hand, I wasn't feeling great about us giving up pretty much a goal a game through November; on the other hand, we only gave up more than one once through mid-February, and even when we were conceding I figured we had a run of clean sheets in us if Gomez got in the team (as it proved!).

Lastly, I think our margin of victory tells an interesting story. The average goal margin in 17-18 was 1.21; it increased to 1.76 last year; this year it was 1.36. While our average margin this year was much closer to 17-18 than 18-19, we are certainly more defensively solid now than we were in 17-18; as such, even though we ground out a lot of games this year, we did so with a greater margin of victory, fewer goals conceded, and generally exerting more control. The maturity monsters.




Conclusion: In terms of performance, we were actually better last year in a number of areas. That said, our sheer ability to get results this year was obviously the differentiator — as I heard Melissa Reddy say in Germany's Greatest Export, the boys played every game like the title was on the line. The fact that we still beat last year's point haul even after quasi-coasting in the last 7 games is astounding!