#### petergriffin2020

As mentioned in other threads, here's the first of (hopefully) many posts regarding analytics. I thought about doing a Premier League analytics thread but figured I would start with LFC. Perhaps if there is enough interest we can start another one!

**The First Chart:** I wanted to see the number of matches Liverpool won (home and away) this year in the Premier League in which our xG was less than that of our opponent. There were four in total; three away (Southampton, Chelsea, and Wolves) and one home (Man City).

**What is xG?** Basically, it's the number of **expected** goals that a team would score based on the quality (and number) of chances they created (here's a more thorough explanation).

Each chance is assigned a value based on the % likelihood of it being scored; for example, a 35-yard pot shot might be assigned an xG of 0.03 (meaning that 3/100 shots from that position would be expected to score), while a tap-in from two yards might be 0.90 (meaning 90/100 shots from that position would be expected to score). xG does take into account different types of shots (header/volley/first-time ground finish/etc.).

In addition to assigning each chance a value between 0 and 1, a team's xG for a given match is tallied by summing the xG of all its chances. For example, if one team has 4 big chances worth 0.25, 0.30, 0.35, and 0.40 xG, while the other team has 20 shots worth 0.04 each, the first team will have a higher xG (0.25 + 0.30 + 0.35 + 0.40 = 1.30) than the second (0.04 x 20 = 0.80), even though the second created more chances.
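For anyone who wants to check that arithmetic themselves, here is a minimal sketch in plain Python. The variable names are just ones I made up for illustration; the shot values are the ones from the example above.

```python
# Per-shot xG values taken from the worked example above
team_a_shots = [0.25, 0.30, 0.35, 0.40]   # four big chances
team_b_shots = [0.04] * 20                # twenty low-quality shots

# A team's match xG is simply the sum of its per-shot xG values
team_a_xg = sum(team_a_shots)
team_b_xg = sum(team_b_shots)

print(f"Team A: {len(team_a_shots)} shots, xG = {team_a_xg:.2f}")
print(f"Team B: {len(team_b_shots)} shots, xG = {team_b_xg:.2f}")
```

Team A ends up with the higher total despite taking a fifth as many shots, which is the whole point of weighting chances by quality.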

**Bottom Line:** So, in our games against Southampton, Chelsea, Wolves, and City, our opponents created chances that were higher-quality, more numerous, or both relative to ours. This makes sense, as:

- At Southampton we had Adrian to thank for keeping us level in the first half (even if he gave Ingsy the ball in the second)

- Chelsea battered us second half at the Bridge

- Wolves looked more likely to win the game before Bobby popped up with the late winner

- City (Bernardo/Trent handballs notwithstanding) created a hatful of big chances but only scored one

These data came from FiveThirtyEight's immense repository of Premier League analytics. If you're a nerd like I am, I highly encourage you to check it out.

**Below:** @Arminius had expressed an interest in seeing the "under the hood" details (meaning the Python environment), so I've incorporated both the graph and most of the code used to generate it. You can view the image in the original (full) size by clicking on it.

**Nerds Only Section: So what does the code mean?**

Programming can be daunting (I still suffer from impostor syndrome from time to time), but it really boils down to taking objects and manipulating them based on conditions. So, basically, you take an object (in this case, a spreadsheet) and break it down, analyze it, and visualize it, with each new object derived from the ones that came before. It's kind of like working with building blocks: everything builds on what came before.

When you want to import a file (in this case, a spreadsheet) into Python, you essentially tell Python:

1. What "libraries" (modules) you want to use (I used Pandas, the analytics library; NumPy, the mathematical library; and Matplotlib, a data visualization library).

2. What file you want to examine (this becomes your first object).

When you import a spreadsheet, you can then view it in Python by calling the object name — I named it matches_BPL.
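Roughly, those two steps plus the "view it" step look like the sketch below. This is not the exact code from the screenshot: to keep it runnable on its own, it reads a tiny made-up CSV from memory instead of the real file, and the filename mentioned in the comment is hypothetical.

```python
import io

import numpy as np               # the mathematical library
import pandas as pd              # the analytics library
# matplotlib.pyplot would be imported the same way when it is time to plot

# Toy CSV text standing in for the real file (made-up rows, not actual results).
# With a real file you would pass its path instead, e.g. pd.read_csv("matches.csv"),
# where "matches.csv" is a hypothetical filename.
toy_csv = io.StringIO(
    "team1,team2,score1,score2\n"
    "Liverpool,Team A,2,1\n"
    "Team B,Liverpool,0,3\n"
)

# Step 2: the file becomes your first object, a pandas DataFrame
matches_BPL = pd.read_csv(toy_csv)

# Viewing the object: the first few rows and its overall dimensions
print(matches_BPL.head())
print(matches_BPL.shape)         # (number of rows, number of columns)
```

In a notebook you can also just type `matches_BPL` on the last line of a cell to see the table.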

Once you see what you're working with, you take that file and subset it based on criteria. So, for LFC home matches, I created an object called matches_BPL_LFC1 in which I wanted to see all matches where team1 (the column name for Home Team) was Liverpool. Likewise, for LFC away matches, I created matches_BPL_LFC2, where team2 was Liverpool. Then, I took the two and combined them into an object called matches_BPL_LFC.
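Here is a minimal sketch of that subsetting step. The object and column names (team1, team2) are the ones described above, but the rows are toy data I made up so the snippet runs without the real file, and `pd.concat` is just one reasonable way to combine the two subsets.

```python
import pandas as pd

# Toy stand-in for the real spreadsheet: made-up rows, not actual results
matches_BPL = pd.DataFrame({
    "team1": ["Liverpool", "Team A", "Team B", "Liverpool"],
    "team2": ["Team A", "Liverpool", "Team C", "Team B"],
})

# Home matches: Liverpool appears in the team1 (home) column
matches_BPL_LFC1 = matches_BPL[matches_BPL["team1"] == "Liverpool"]

# Away matches: Liverpool appears in the team2 (away) column
matches_BPL_LFC2 = matches_BPL[matches_BPL["team2"] == "Liverpool"]

# Stack the two subsets into one table of all Liverpool matches
matches_BPL_LFC = pd.concat([matches_BPL_LFC1, matches_BPL_LFC2])

print(len(matches_BPL_LFC1), len(matches_BPL_LFC2), len(matches_BPL_LFC))
```

The expression inside the square brackets produces a True/False value for every row, and pandas keeps only the rows where it is True.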

From there, matches_BPL_LFC was broken down into LFC_home_wins and LFC_away_wins: LFC_home_wins keeps the home matches where score1 was greater than score2, and LFC_away_wins keeps the away matches where score2 was greater than score1. Next, I created two subsets of those wins in which our xG was lower than our opponents'. Then, I counted the number of games that matched those criteria (using "len", a built-in Python function that returns the number of items in an object, here the number of rows). Penultimately, I created objects by defining colors by their RGB values (red is 1,0,0; blue is 0,0,1) and used those in the chart. Lastly, I created two bar graphs (in one chart) that showed the number of games where this proved to be the case.
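The whole chain, from wins to chart, can be sketched as below. Treat it as an approximation rather than the code in the screenshot: the rows are made-up toy data, the xG column names xg1 and xg2 are my assumption about how the spreadsheet labels them, and names like LFC_home_wins_low_xg are invented for this example.

```python
import matplotlib
matplotlib.use("Agg")            # render off-screen so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Toy data: made-up rows, not actual results. xg1/xg2 are assumed column names.
matches_BPL_LFC = pd.DataFrame({
    "team1":  ["Liverpool", "Liverpool", "Team A", "Team B", "Team C"],
    "team2":  ["Team A", "Team B", "Liverpool", "Liverpool", "Liverpool"],
    "score1": [2, 3, 1, 0, 2],
    "score2": [1, 1, 2, 1, 2],
    "xg1":    [0.9, 2.1, 1.8, 0.7, 1.0],
    "xg2":    [1.4, 0.6, 1.1, 1.2, 1.5],
})
df = matches_BPL_LFC

# Wins: home side's score is score1, away side's score is score2
LFC_home_wins = df[(df["team1"] == "Liverpool") & (df["score1"] > df["score2"])]
LFC_away_wins = df[(df["team2"] == "Liverpool") & (df["score2"] > df["score1"])]

# Wins where our xG was lower than the opponent's
LFC_home_wins_low_xg = LFC_home_wins[LFC_home_wins["xg1"] < LFC_home_wins["xg2"]]
LFC_away_wins_low_xg = LFC_away_wins[LFC_away_wins["xg2"] < LFC_away_wins["xg1"]]

# len() on a DataFrame returns its number of rows
n_home = len(LFC_home_wins_low_xg)
n_away = len(LFC_away_wins_low_xg)

# Colors defined by their RGB values
red = (1, 0, 0)
blue = (0, 0, 1)

# Two bars in one chart
fig, ax = plt.subplots()
ax.bar(["Home", "Away"], [n_home, n_away], color=[red, blue])
ax.set_ylabel("Wins with lower xG than opponent")
ax.set_title("LFC wins despite lower xG (toy data)")

print(n_home, n_away)
```

In an interactive session you would finish with `plt.show()`, or `fig.savefig(...)` to write the chart to an image file.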

Anyway, that's a start! Should you find something interesting to contribute, please feel free — the more insights the better.
