+ - 0:00:00
Notes for current slide
Notes for next slide

Leveling Up with the Tidyverse
(and hockey data)

Meghan Hall

Carnegie Mellon Sports Analytics Conference
October 25, 2020 | #CMSAC20

1

Me, Exceedingly Briefly

2

Me, Exceedingly Briefly

1️⃣  I work in higher ed as a data manager.



2

Me, Exceedingly Briefly

1️⃣  I work in higher ed as a data manager.



2️⃣  I dabble in hockey analysis.



2

Me, Exceedingly Briefly

1️⃣  I work in higher ed as a data manager.



2️⃣  I dabble in hockey analysis.



3️⃣  I use R a lot, thanks to 1️⃣ and 2️⃣, and love helping other people learn.



2

The Plan for Today

3

The Plan for Today

Do you need to know anything about hockey?

3

The Plan for Today

Do you need to know anything about hockey?

No.

3
4

The Plan for Today

Do you need to know anything about hockey?

No.



Do you need to know anything about coding?

5

The Plan for Today

Do you need to know anything about hockey?

No.



Do you need to know anything about coding?

Eh.

5

The Plan for Today

GOAL: "Level up your R programming and make your analysis more efficient."

6

The Plan for Today

GOAL: "Level up your R programming and make your analysis more efficient."


1️⃣  Data manipulation with tidyr and dplyr (and stringr and lubridate and purrr!)

6

The Plan for Today

GOAL: "Level up your R programming and make your analysis more efficient."


1️⃣  Data manipulation with tidyr and dplyr (and stringr and lubridate and purrr!)

2️⃣  User-defined functions

6

The Plan for Today

GOAL: "Level up your R programming and make your analysis more efficient."


1️⃣  Data manipulation with tidyr and dplyr (and stringr and lubridate and purrr!)

2️⃣  User-defined functions

3️⃣  Custom ggplot2 themes


6

The Plan for Today

GOAL: "Level up your R programming and make your analysis more efficient."


1️⃣  Data manipulation with tidyr and dplyr (and stringr and lubridate and purrr!)

2️⃣  User-defined functions

3️⃣  Custom ggplot2 themes


🕗  5-minute breaks at 5:00 and 5:30

6

The Plan for Today

GOAL: "Level up your R programming and make your analysis more efficient."


1️⃣  Data manipulation with tidyr and dplyr (and stringr and lubridate and purrr!)

2️⃣  User-defined functions

3️⃣  Custom ggplot2 themes


🕗  5-minute breaks at 5:00 and 5:30

❓  Questions!


6

The Plan for Today

GOAL: "Level up your R programming and make your analysis more efficient."


1️⃣  Data manipulation with tidyr and dplyr (and stringr and lubridate and purrr!)

2️⃣  User-defined functions

3️⃣  Custom ggplot2 themes


🕗  5-minute breaks at 5:00 and 5:30

❓  Questions!


HOW: Let's examine the composition of a power play.

6

Five-on-five

7

Five-on-four with three forwards

8

Five-on-four with four forwards

9

Two questions

10

Two questions

1️⃣  How does the positional composition change over the time of the power play?


(Is the four-forward configuration maybe more common at the beginning or the end of the power play?)


10

Two questions

1️⃣  How does the positional composition change over the time of the power play?


(Is the four-forward configuration maybe more common at the beginning or the end of the power play?)


2️⃣  Does a four-forward power play unit have a higher rate of zone exits?

10

Let's Get Set Up

11

If you want to follow along

12

If you want to follow along

12

If you want to follow along

# Load the package
devtools::install_github("meghall06/betweenthepipes")
library(betweenthepipes)
# Load the data
pbp <- pbp_example
bio <- bio_example
tracking <- track_example
12

Data sources

13

Data sources

Play-by-play data

  • pbp: NHL play-by-play data
  • Four Philadelphia Flyers games in November 2019
  • Scraped via the Evolving-Hockey R scraper
13

Data sources

game_id game_date event_index game_period game_seconds event_type event_description event_team event_length
2019020336 2019-11-21 1 1 0 PGSTR 0
2019020336 2019-11-21 2 1 0 PGEND 0
2019020336 2019-11-21 3 1 0 ANTHEM 0
2019020336 2019-11-21 4 1 0 PSTR Period Start- Local time: 7:08 EST 0
2019020336 2019-11-21 5 1 0 CHANGE PHI 0
2019020336 2019-11-21 6 1 0 CHANGE CAR 0
2019020336 2019-11-21 7 1 0 FAC PHI won Neu. Zone - PHI #28 GIROUX vs CAR #20 AHO PHI 21
2019020336 2019-11-21 8 1 21 GIVE PHI GIVEAWAY - #28 GIROUX, Neu. Zone PHI 3
2019020336 2019-11-21 9 1 24 CHANGE PHI 7
2019020336 2019-11-21 10 1 31 CHANGE PHI 2
14

Data sources

game_id game_date event_index game_period game_seconds event_type event_description event_team event_length
2019020336 2019-11-21 1 1 0 PGSTR 0
2019020336 2019-11-21 2 1 0 PGEND 0
2019020336 2019-11-21 3 1 0 ANTHEM 0
2019020336 2019-11-21 4 1 0 PSTR Period Start- Local time: 7:08 EST 0
2019020336 2019-11-21 5 1 0 CHANGE PHI 0
2019020336 2019-11-21 6 1 0 CHANGE CAR 0
2019020336 2019-11-21 7 1 0 FAC PHI won Neu. Zone - PHI #28 GIROUX vs CAR #20 AHO PHI 21
2019020336 2019-11-21 8 1 21 GIVE PHI GIVEAWAY - #28 GIROUX, Neu. Zone PHI 3
2019020336 2019-11-21 9 1 24 CHANGE PHI 7
2019020336 2019-11-21 10 1 31 CHANGE PHI 2
15

Data sources

game_id game_date event_index game_period game_seconds event_type event_description event_team event_length
2019020336 2019-11-21 1 1 0 PGSTR 0
2019020336 2019-11-21 2 1 0 PGEND 0
2019020336 2019-11-21 3 1 0 ANTHEM 0
2019020336 2019-11-21 4 1 0 PSTR Period Start- Local time: 7:08 EST 0
2019020336 2019-11-21 5 1 0 CHANGE PHI 0
2019020336 2019-11-21 6 1 0 CHANGE CAR 0
2019020336 2019-11-21 7 1 0 FAC PHI won Neu. Zone - PHI #28 GIROUX vs CAR #20 AHO PHI 21
2019020336 2019-11-21 8 1 21 GIVE PHI GIVEAWAY - #28 GIROUX, Neu. Zone PHI 3
2019020336 2019-11-21 9 1 24 CHANGE PHI 7
2019020336 2019-11-21 10 1 31 CHANGE PHI 2
16

Data sources

game_strength_state home_team away_team home_on_1 home_on_2 home_on_3 home_on_4 home_on_5 home_on_6
5v4 CAR PHI DOUGIE.HAMILTON JACCOB.SLAVIN JORDAN.STAAL PETR.MRAZEK RYAN.DZINGEL WARREN.FOEGELE
5v4 CAR PHI ANDREI.SVECHNIKOV DOUGIE.HAMILTON PETR.MRAZEK RYAN.DZINGEL SEBASTIAN.AHO TEUVO.TERAVAINEN
5v4 CAR PHI ANDREI.SVECHNIKOV DOUGIE.HAMILTON PETR.MRAZEK RYAN.DZINGEL SEBASTIAN.AHO TEUVO.TERAVAINEN
5v4 CAR PHI ANDREI.SVECHNIKOV DOUGIE.HAMILTON PETR.MRAZEK RYAN.DZINGEL SEBASTIAN.AHO TEUVO.TERAVAINEN
5v4 CAR PHI ANDREI.SVECHNIKOV DOUGIE.HAMILTON PETR.MRAZEK RYAN.DZINGEL SEBASTIAN.AHO TEUVO.TERAVAINEN
5v4 CAR PHI ANDREI.SVECHNIKOV DOUGIE.HAMILTON PETR.MRAZEK RYAN.DZINGEL SEBASTIAN.AHO TEUVO.TERAVAINEN
5v4 CAR PHI ANDREI.SVECHNIKOV DOUGIE.HAMILTON PETR.MRAZEK RYAN.DZINGEL SEBASTIAN.AHO TEUVO.TERAVAINEN
5v4 CAR PHI ANDREI.SVECHNIKOV DOUGIE.HAMILTON PETR.MRAZEK RYAN.DZINGEL SEBASTIAN.AHO TEUVO.TERAVAINEN
5v4 CAR PHI ANDREI.SVECHNIKOV DOUGIE.HAMILTON PETR.MRAZEK RYAN.DZINGEL SEBASTIAN.AHO TEUVO.TERAVAINEN
4v5 CAR PHI BRETT.PESCE BROCK.MCGINN JOEL.EDMUNDSON JORDAN.STAAL PETR.MRAZEK
17

Data sources

pbp %>%
count(game_id, game_date, home_team, away_team)
game_id game_date home_team away_team n
2019020336 2019-11-21 CAR PHI 586
2019020349 2019-11-23 PHI CGY 682
2019020367 2019-11-25 PHI VAN 606
2019020384 2019-11-27 CBJ PHI 627
18

Data sources

Play-by-play data

  • pbp: NHL play-by-play data
  • Four Philadelphia Flyers games in November 2019
  • Scraped via the Evolving-Hockey R scraper

Skater Data

  • bio: Demographic and position data for these players from 2019
  • Downloaded from Natural Stat Trick
19

Data sources

Player Position Age Date of Birth Birth City Birth State/Province Birth Country Nationality Height (in) Weight (lbs)
Adam Gaudette C 23 10/3/96 Braintree MA USA USA 73 170
Alan Quine C 27 2/25/93 Belleville ON CAN CAN 72 203
Alexander Edler D 34 4/21/86 Ostersund SWE SWE 75 212
Alexander Wennberg C 25 9/22/94 Stockholm SWE SWE 74 197
Alexander Yelesin D 24 2/7/96 Yaroslavl RUS RUS 72 195
Alexandre Texier C 20 9/13/99 St. Martin D'heres FRA FRA 72 192
Andrei Svechnikov R 20 3/26/00 Barnual RUS RUS 74 195
Andrew Mangiapane L 24 4/4/96 Toronto ON CAN CAN 70 184
Andrew Peeke D 22 3/17/98 Parkland FL USA USA 75 194
Andy Andreoff C 29 5/17/91 Pickering ON CAN CAN 73 203
20

Data sources

Player Position Age Date of Birth Birth City Birth State/Province Birth Country Nationality Height (in) Weight (lbs)
Adam Gaudette C 23 10/3/96 Braintree MA USA USA 73 170
Alan Quine C 27 2/25/93 Belleville ON CAN CAN 72 203
Alexander Edler D 34 4/21/86 Ostersund SWE SWE 75 212
Alexander Wennberg C 25 9/22/94 Stockholm SWE SWE 74 197
Alexander Yelesin D 24 2/7/96 Yaroslavl RUS RUS 72 195
Alexandre Texier C 20 9/13/99 St. Martin D'heres FRA FRA 72 192
Andrei Svechnikov R 20 3/26/00 Barnual RUS RUS 74 195
Andrew Mangiapane L 24 4/4/96 Toronto ON CAN CAN 70 184
Andrew Peeke D 22 3/17/98 Parkland FL USA USA 75 194
Andy Andreoff C 29 5/17/91 Pickering ON CAN CAN 73 203
21

Data sources

Play-by-play data

  • pbp: NHL play-by-play data
  • Four Philadelphia Flyers games in November 2019
  • Scraped via the Evolving-Hockey R scraper

Skater Data

  • bio: Demographic and position data for these players from 2019
  • Downloaded from Natural Stat Trick

Tracking Data

  • tracking: Zone exit data on the power play in these four games
  • Personally tracked by me!
22

Data sources

game_id game_period time event_type
2019020336 1 16:02:00 EXIT
2019020336 1 15:27:00 EXIT
2019020336 1 14:42:00 EXIT
2019020336 1 13:57:00 EXIT
2019020336 1 12:36:00 EXIT
2019020336 1 12:12:00 EXIT
2019020336 2 18:36:00 EXIT
2019020336 2 18:10:00 EXIT
2019020336 2 17:20:00 EXIT
2019020336 2 14:40:00 EXIT
23

Data sources

game_id game_period time event_type
2019020336 1 16:02:00 EXIT
2019020336 1 15:27:00 EXIT
2019020336 1 14:42:00 EXIT
2019020336 1 13:57:00 EXIT
2019020336 1 12:36:00 EXIT
2019020336 1 12:12:00 EXIT
2019020336 2 18:36:00 EXIT
2019020336 2 18:10:00 EXIT
2019020336 2 17:20:00 EXIT
2019020336 2 14:40:00 EXIT
24

Data sources

Play-by-play data

  • pbp: NHL play-by-play data
  • Four Philadelphia Flyers games in November 2019
  • Scraped via the Evolving-Hockey R scraper

Skater Data

  • bio: Demographic and position data for these players from 2019
  • Downloaded from Natural Stat Trick

Tracking Data

  • tracking: Zone exit data on the power play in these four games
  • Personally tracked by me!
25

Plan of attack

26

Plan of attack

🧐  Study your different data sources



27

Plan of attack

🧐  Study your different data sources



🔎  Find the relationships



27

Plan of attack

🧐  Study your different data sources



🔎  Find the relationships



📝  Sketch out how you want your data to look at the end

27

Problems we get to solve

❌  Player position data is in a different place



28

Problems we get to solve

❌  Player position data is in a different place



❌  Zone exit data is in a different place



28

Problems we get to solve

❌  Player position data is in a different place



❌  Zone exit data is in a different place



❌  One row per event instead of one row per second 🤔



28

Packages we'll be using

29

Packages we'll be using

⭐  My package (where the data is)

library(betweenthepipes)
29

Packages we'll be using

⭐  My package (where the data is)

library(betweenthepipes)

❤️  The tidyverse (aka dplyr, tidyr, ggplot2, stringr)

library(tidyverse)
29

Packages we'll be using

⭐  My package (where the data is)

library(betweenthepipes)

❤️  The tidyverse (aka dplyr, tidyr, ggplot2, stringr)

library(tidyverse)

🕦  For dealing with time (also part of the tidyverse)

library(lubridate)
29

Packages we'll be using

⭐  My package (where the data is)

library(betweenthepipes)

❤️  The tidyverse (aka dplyr, tidyr, ggplot2, stringr)

library(tidyverse)

🕦  For dealing with time (also part of the tidyverse)

library(lubridate)

🧹  For some data cleaning help

library(janitor)
29

Packages we'll be using

⭐  My package (where the data is)

library(betweenthepipes)

❤️  The tidyverse (aka dplyr, tidyr, ggplot2, stringr)

library(tidyverse)

🕦  For dealing with time (also part of the tidyverse)

library(lubridate)

🧹  For some data cleaning help

library(janitor)

🔂  For padding out our data

library(padr)
29

Play by play data

30

Play by play data

power <- pbp %>%
# filter to only the power play strength states
filter(game_strength_state %in% c("5v4", "4v5")) %>%
# filter to only events that have time associated with them
filter(event_length > 0) %>%
# create a new variable that designates the PP team
mutate(PP_team = ifelse(game_strength_state == "5v4",
home_team, away_team)) %>%
# create a new set of variables to determine who the PP players are
mutate(PP_1 = ifelse(home_team == PP_team, home_on_1, away_on_1),
PP_2 = ifelse(home_team == PP_team, home_on_2, away_on_2),
PP_3 = ifelse(home_team == PP_team, home_on_3, away_on_3),
PP_4 = ifelse(home_team == PP_team, home_on_4, away_on_4),
PP_5 = ifelse(home_team == PP_team, home_on_5, away_on_5),
PP_6 = ifelse(home_team == PP_team, home_on_6, away_on_6))
31

Play by play data

power <- pbp %>%
# filter to only the power play strength states
filter(game_strength_state %in% c("5v4", "4v5")) %>%
# filter to only events that have time associated with them
filter(event_length > 0) %>%
# create a new variable that designates the PP team
mutate(PP_team = ifelse(game_strength_state == "5v4",
home_team, away_team)) %>%
# create a new set of variables to determine who the PP players are
mutate(PP_1 = ifelse(home_team == PP_team, home_on_1, away_on_1),
PP_2 = ifelse(home_team == PP_team, home_on_2, away_on_2),
PP_3 = ifelse(home_team == PP_team, home_on_3, away_on_3),
PP_4 = ifelse(home_team == PP_team, home_on_4, away_on_4),
PP_5 = ifelse(home_team == PP_team, home_on_5, away_on_5),
PP_6 = ifelse(home_team == PP_team, home_on_6, away_on_6))
32

Play by play data

power <- pbp %>%
# filter to only the power play strength states
filter(game_strength_state %in% c("5v4", "4v5")) %>%
# filter to only events that have time associated with them
filter(event_length > 0) %>%
# create a new variable that designates the PP team
mutate(PP_team = ifelse(game_strength_state == "5v4",
home_team, away_team)) %>%
# create a new set of variables to determine who the PP players are
mutate(PP_1 = ifelse(home_team == PP_team, home_on_1, away_on_1),
PP_2 = ifelse(home_team == PP_team, home_on_2, away_on_2),
PP_3 = ifelse(home_team == PP_team, home_on_3, away_on_3),
PP_4 = ifelse(home_team == PP_team, home_on_4, away_on_4),
PP_5 = ifelse(home_team == PP_team, home_on_5, away_on_5),
PP_6 = ifelse(home_team == PP_team, home_on_6, away_on_6))
33

Play by play data

power <- pbp %>%
# filter to only the power play strength states
filter(game_strength_state %in% c("5v4", "4v5")) %>%
# filter to only events that have time associated with them
filter(event_length > 0) %>%
# create a new variable that designates the PP team
mutate(PP_team = ifelse(game_strength_state == "5v4",
home_team, away_team)) %>%
# create a new set of variables to determine who the PP players are
mutate(PP_1 = ifelse(home_team == PP_team, home_on_1, away_on_1),
PP_2 = ifelse(home_team == PP_team, home_on_2, away_on_2),
PP_3 = ifelse(home_team == PP_team, home_on_3, away_on_3),
PP_4 = ifelse(home_team == PP_team, home_on_4, away_on_4),
PP_5 = ifelse(home_team == PP_team, home_on_5, away_on_5),
PP_6 = ifelse(home_team == PP_team, home_on_6, away_on_6))
34

Play by play data

power <- pbp %>%
# filter to only the power play strength states
filter(game_strength_state %in% c("5v4", "4v5")) %>%
# filter to only events that have time associated with them
filter(event_length > 0) %>%
# create a new variable that designates the PP team
mutate(PP_team = ifelse(game_strength_state == "5v4",
home_team, away_team)) %>%
# create a new set of variables to determine who the PP players are
mutate(PP_1 = ifelse(home_team == PP_team, home_on_1, away_on_1),
PP_2 = ifelse(home_team == PP_team, home_on_2, away_on_2),
PP_3 = ifelse(home_team == PP_team, home_on_3, away_on_3),
PP_4 = ifelse(home_team == PP_team, home_on_4, away_on_4),
PP_5 = ifelse(home_team == PP_team, home_on_5, away_on_5),
PP_6 = ifelse(home_team == PP_team, home_on_6, away_on_6))
35

Play by play data

game_id event_index PP_team PP_1 PP_2 PP_3 PP_4 PP_5 PP_6
2019020336 41 CAR ANDREI.SVECHNIKOV DOUGIE.HAMILTON PETR.MRAZEK RYAN.DZINGEL SEBASTIAN.AHO TEUVO.TERAVAINEN
2019020336 42 CAR ANDREI.SVECHNIKOV DOUGIE.HAMILTON PETR.MRAZEK RYAN.DZINGEL SEBASTIAN.AHO TEUVO.TERAVAINEN
2019020336 43 CAR ANDREI.SVECHNIKOV DOUGIE.HAMILTON PETR.MRAZEK RYAN.DZINGEL SEBASTIAN.AHO TEUVO.TERAVAINEN
2019020336 44 CAR ANDREI.SVECHNIKOV DOUGIE.HAMILTON PETR.MRAZEK RYAN.DZINGEL SEBASTIAN.AHO TEUVO.TERAVAINEN
2019020336 45 CAR ANDREI.SVECHNIKOV DOUGIE.HAMILTON PETR.MRAZEK RYAN.DZINGEL SEBASTIAN.AHO TEUVO.TERAVAINEN
2019020336 46 CAR ANDREI.SVECHNIKOV DOUGIE.HAMILTON PETR.MRAZEK RYAN.DZINGEL SEBASTIAN.AHO TEUVO.TERAVAINEN
2019020336 60 PHI BRIAN.ELLIOTT CLAUDE.GIROUX IVAN.PROVOROV JAMES.VAN RIEMSDYK MORGAN.FROST TRAVIS.KONECNY
2019020336 61 PHI BRIAN.ELLIOTT CLAUDE.GIROUX IVAN.PROVOROV JAMES.VAN RIEMSDYK MORGAN.FROST TRAVIS.KONECNY
2019020336 62 PHI BRIAN.ELLIOTT CLAUDE.GIROUX IVAN.PROVOROV JAMES.VAN RIEMSDYK MORGAN.FROST TRAVIS.KONECNY
2019020336 66 PHI BRIAN.ELLIOTT JAKUB.VORACEK MATT.NISKANEN OSKAR.LINDBLOM SEAN.COUTURIER SHAYNE.GOSTISBEHERE
36

37

Play by play data

pivot <- power %>%
# pick the variables that we want
select(game_id, event_index, PP_team, game_seconds, event_length,
home_goalie, away_goalie, PP_1:PP_6) %>%
# pivot the six player variables
pivot_longer(PP_1:PP_6,
names_to = "on_ice",
values_to = "player") %>%
# filter out the goalies
filter(player != home_goalie & player != away_goalie) %>%
# remove the now-unnecessary goalie variables
select(-c(home_goalie, away_goalie, on_ice))
38

Play by play data

pivot <- power %>%
# pick the variables that we want
select(game_id, event_index, PP_team, game_seconds, event_length,
home_goalie, away_goalie, PP_1:PP_6) %>%
# pivot the six player variables
pivot_longer(PP_1:PP_6,
names_to = "on_ice",
values_to = "player") %>%
# filter out the goalies
filter(player != home_goalie & player != away_goalie) %>%
# remove the now-unnecessary goalie variables
select(-c(home_goalie, away_goalie, on_ice))
39

Play by play data

game_id event_index PP_team game_seconds event_length home_goalie away_goalie on_ice player
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_1 ANDREI.SVECHNIKOV
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_2 DOUGIE.HAMILTON
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_3 PETR.MRAZEK
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_4 RYAN.DZINGEL
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_5 SEBASTIAN.AHO
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_6 TEUVO.TERAVAINEN
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_1 ANDREI.SVECHNIKOV
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_2 DOUGIE.HAMILTON
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_3 PETR.MRAZEK
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_4 RYAN.DZINGEL
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_5 SEBASTIAN.AHO
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_6 TEUVO.TERAVAINEN
40

Play by play data

game_id event_index PP_team game_seconds event_length home_goalie away_goalie on_ice player
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_1 ANDREI.SVECHNIKOV
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_2 DOUGIE.HAMILTON
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_3 PETR.MRAZEK
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_4 RYAN.DZINGEL
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_5 SEBASTIAN.AHO
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_6 TEUVO.TERAVAINEN
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_1 ANDREI.SVECHNIKOV
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_2 DOUGIE.HAMILTON
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_3 PETR.MRAZEK
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_4 RYAN.DZINGEL
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_5 SEBASTIAN.AHO
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_6 TEUVO.TERAVAINEN
41

Play by play data

game_id event_index PP_team game_seconds event_length home_goalie away_goalie on_ice player
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_1 ANDREI.SVECHNIKOV
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_2 DOUGIE.HAMILTON
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_3 PETR.MRAZEK
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_4 RYAN.DZINGEL
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_5 SEBASTIAN.AHO
2019020336 41 CAR 234 7 PETR.MRAZEK BRIAN.ELLIOTT PP_6 TEUVO.TERAVAINEN
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_1 ANDREI.SVECHNIKOV
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_2 DOUGIE.HAMILTON
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_3 PETR.MRAZEK
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_4 RYAN.DZINGEL
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_5 SEBASTIAN.AHO
2019020336 42 CAR 241 24 PETR.MRAZEK BRIAN.ELLIOTT PP_6 TEUVO.TERAVAINEN
42

Play by play data

pivot <- power %>%
# pick the variables that we want
select(game_id, event_index, PP_team, game_seconds, event_length,
home_goalie, away_goalie, PP_1:PP_6) %>%
# pivot the six player variables
pivot_longer(PP_1:PP_6,
names_to = "on_ice",
values_to = "player") %>%
# filter out the goalies
filter(player != home_goalie & player != away_goalie) %>%
# remove the now-unnecessary goalie variables
select(-c(home_goalie, away_goalie, on_ice))
43

Play by play data

pivot <- power %>%
# pick the variables that we want
select(game_id, event_index, PP_team, game_seconds, event_length,
home_goalie, away_goalie, PP_1:PP_6) %>%
# pivot the six player variables
pivot_longer(PP_1:PP_6,
names_to = "on_ice",
values_to = "player") %>%
# filter out the goalies
filter(player != home_goalie & player != away_goalie) %>%
# remove the now-unnecessary goalie variables
select(-c(home_goalie, away_goalie, on_ice))
44

Play by play data

game_id event_index PP_team game_seconds event_length player
2019020336 41 CAR 234 7 ANDREI.SVECHNIKOV
2019020336 41 CAR 234 7 DOUGIE.HAMILTON
2019020336 41 CAR 234 7 RYAN.DZINGEL
2019020336 41 CAR 234 7 SEBASTIAN.AHO
2019020336 41 CAR 234 7 TEUVO.TERAVAINEN
2019020336 42 CAR 241 24 ANDREI.SVECHNIKOV
2019020336 42 CAR 241 24 DOUGIE.HAMILTON
2019020336 42 CAR 241 24 RYAN.DZINGEL
2019020336 42 CAR 241 24 SEBASTIAN.AHO
2019020336 42 CAR 241 24 TEUVO.TERAVAINEN
45

Skater data

46

Skater data

bio <- bio %>%
# reformat the variable names
clean_names() %>%
# use two stringr functions to convert our player names to all
# uppercase and replace the first space with a period
mutate(player = str_to_upper(player),
player = str_replace(player, " ", ".")) %>%
# record our position variable into a 0/1 variable for forward
mutate(forward = ifelse(position == "D", 0, 1)) %>%
# select only these two variables
select(player, forward)
47

Skater data

bio <- bio %>%
# reformat the variable names
clean_names() %>%
# use two stringr functions to convert our player names to all
# uppercase and replace the first space with a period
mutate(player = str_to_upper(player),
player = str_replace(player, " ", ".")) %>%
# record our position variable into a 0/1 variable for forward
mutate(forward = ifelse(position == "D", 0, 1)) %>%
# select only these two variables
select(player, forward)
48

Skater data

Player Position Age Date of Birth Birth City Birth State/Province Birth Country Nationality Height (in) Weight (lbs)
Adam Gaudette C 23 10/3/96 Braintree MA USA USA 73 170
Alan Quine C 27 2/25/93 Belleville ON CAN CAN 72 203
Alexander Edler D 34 4/21/86 Ostersund SWE SWE 75 212
Alexander Wennberg C 25 9/22/94 Stockholm SWE SWE 74 197
Alexander Yelesin D 24 2/7/96 Yaroslavl RUS RUS 72 195
Alexandre Texier C 20 9/13/99 St. Martin D'heres FRA FRA 72 192
Andrei Svechnikov R 20 3/26/00 Barnual RUS RUS 74 195
Andrew Mangiapane L 24 4/4/96 Toronto ON CAN CAN 70 184
Andrew Peeke D 22 3/17/98 Parkland FL USA USA 75 194
Andy Andreoff C 29 5/17/91 Pickering ON CAN CAN 73 203
49

Skater data

player position age date_of_birth birth_city birth_state_province birth_country nationality height_in
Adam Gaudette C 23 10/3/96 Braintree MA USA USA 73
Alan Quine C 27 2/25/93 Belleville ON CAN CAN 72
Alexander Edler D 34 4/21/86 Ostersund SWE SWE 75
Alexander Wennberg C 25 9/22/94 Stockholm SWE SWE 74
Alexander Yelesin D 24 2/7/96 Yaroslavl RUS RUS 72
Alexandre Texier C 20 9/13/99 St. Martin D'heres FRA FRA 72
Andrei Svechnikov R 20 3/26/00 Barnual RUS RUS 74
Andrew Mangiapane L 24 4/4/96 Toronto ON CAN CAN 70
Andrew Peeke D 22 3/17/98 Parkland FL USA USA 75
Andy Andreoff C 29 5/17/91 Pickering ON CAN CAN 73
50

Skater data

bio <- bio %>%
# reformat the variable names
clean_names() %>%
# use two stringr functions to convert our player names to all
# uppercase and replace the first space with a period
mutate(player = str_to_upper(player),
player = str_replace(player, " ", ".")) %>%
# record our position variable into a 0/1 variable for forward
mutate(forward = ifelse(position == "D", 0, 1)) %>%
# select only these two variables
select(player, forward)
51

Skater data

player position age date_of_birth birth_city birth_state_province birth_country nationality
ADAM.GAUDETTE C 23 10/3/96 Braintree MA USA USA
ALAN.QUINE C 27 2/25/93 Belleville ON CAN CAN
ALEXANDER.EDLER D 34 4/21/86 Ostersund SWE SWE
ALEXANDER.WENNBERG C 25 9/22/94 Stockholm SWE SWE
ALEXANDER.YELESIN D 24 2/7/96 Yaroslavl RUS RUS
ALEXANDRE.TEXIER C 20 9/13/99 St. Martin D'heres FRA FRA
ANDREI.SVECHNIKOV R 20 3/26/00 Barnual RUS RUS
ANDREW.MANGIAPANE L 24 4/4/96 Toronto ON CAN CAN
ANDREW.PEEKE D 22 3/17/98 Parkland FL USA USA
ANDY.ANDREOFF C 29 5/17/91 Pickering ON CAN CAN
52

Skater data

player position age date_of_birth birth_city birth_state_province birth_country nationality
ADAM.GAUDETTE C 23 10/3/96 Braintree MA USA USA
ALAN.QUINE C 27 2/25/93 Belleville ON CAN CAN
ALEXANDER.EDLER D 34 4/21/86 Ostersund SWE SWE
ALEXANDER.WENNBERG C 25 9/22/94 Stockholm SWE SWE
ALEXANDER.YELESIN D 24 2/7/96 Yaroslavl RUS RUS
ALEXANDRE.TEXIER C 20 9/13/99 St. Martin D'heres FRA FRA
ANDREI.SVECHNIKOV R 20 3/26/00 Barnual RUS RUS
ANDREW.MANGIAPANE L 24 4/4/96 Toronto ON CAN CAN
ANDREW.PEEKE D 22 3/17/98 Parkland FL USA USA
ANDY.ANDREOFF C 29 5/17/91 Pickering ON CAN CAN
53

Skater data

bio <- bio %>%
# reformat the variable names
clean_names() %>%
# use two stringr functions to convert our player names to all
# uppercase and replace the first space with a period
mutate(player = str_to_upper(player),
player = str_replace(player, " ", ".")) %>%
# record our position variable into a 0/1 variable for forward
mutate(forward = ifelse(position == "D", 0, 1)) %>%
# select only these two variables
select(player, forward)
54

Skater data

bio <- bio %>%
# reformat the variable names
clean_names() %>%
# use two stringr functions to convert our player names to all
# uppercase and replace the first space with a period
mutate(player = str_to_upper(player),
player = str_replace(player, " ", ".")) %>%
# record our position variable into a 0/1 variable for forward
mutate(forward = ifelse(position == "D", 0, 1)) %>%
# select only these two variables
select(player, forward)
55

Skater data

player forward
ADAM.GAUDETTE 1
ALAN.QUINE 1
ALEXANDER.EDLER 0
ALEXANDER.WENNBERG 1
ALEXANDER.YELESIN 0
ALEXANDRE.TEXIER 1
ANDREI.SVECHNIKOV 1
ANDREW.MANGIAPANE 1
ANDREW.PEEKE 0
ANDY.ANDREOFF 1
56

Skater data

pivot_position <- pivot %>%
left_join(bio, by = "player")
57

Skater data

pivot_position <- pivot %>%
left_join(bio, by = "player")
game_id event_index PP_team game_seconds event_length player forward
2019020336 41 CAR 234 7 ANDREI.SVECHNIKOV 1
2019020336 41 CAR 234 7 DOUGIE.HAMILTON 0
2019020336 41 CAR 234 7 RYAN.DZINGEL 1
2019020336 41 CAR 234 7 SEBASTIAN.AHO 1
2019020336 41 CAR 234 7 TEUVO.TERAVAINEN 1
2019020336 42 CAR 241 24 ANDREI.SVECHNIKOV 1
2019020336 42 CAR 241 24 DOUGIE.HAMILTON 0
2019020336 42 CAR 241 24 RYAN.DZINGEL 1
2019020336 42 CAR 241 24 SEBASTIAN.AHO 1
2019020336 42 CAR 241 24 TEUVO.TERAVAINEN 1
57

Skater data

pivot_position %>%
get_dupes(game_id, event_index, player)
## # A tibble: 0 x 8
## # … with 8 variables: game_id <dbl>, event_index <dbl>, player <chr>,
## # dupe_count <int>, PP_team <chr>, game_seconds <dbl>, event_length <dbl>,
## # forward <dbl>
58

Skater data

fixes <- pivot_position %>%
filter(is.na(forward)) %>%
count(player) %>%
arrange(player)
59

Skater data

fixes <- pivot_position %>%
filter(is.na(forward)) %>%
count(player) %>%
arrange(player)
player n
ALEX.EDLER 9
ALEX.WENNBERG 4
CHRIS.TANEV 2
59

Skater data

bio %>%
filter(str_detect(player, 'EDLER|WENNBERG|TANEV'))
60

Skater data

bio %>%
filter(str_detect(player, 'EDLER|WENNBERG|TANEV'))
player forward
ALEXANDER.EDLER 0
ALEXANDER.WENNBERG 1
CHRISTOPHER.TANEV 0
60

Skater data

bio <- bio %>%
mutate(player = case_when(
player == "ALEXANDER.EDLER" ~ "ALEX.EDLER",
player == "ALEXANDER.WENNBERG" ~ "ALEX.WENNBERG",
player == "CHRISTOPHER.TANEV" ~ "CHRIS.TANEV",
TRUE ~ player))
61

Skater data

bio <- bio %>%
mutate(player = case_when(
player == "ALEXANDER.EDLER" ~ "ALEX.EDLER",
player == "ALEXANDER.WENNBERG" ~ "ALEX.WENNBERG",
player == "CHRISTOPHER.TANEV" ~ "CHRIS.TANEV",
TRUE ~ player))
pivot_position <- pivot %>%
left_join(bio, by = "player")
fixes <- pivot_position %>%
filter(is.na(forward)) %>%
count(player) %>%
arrange(player)
61

Skater data

bio <- bio %>%
mutate(player = case_when(
player == "ALEXANDER.EDLER" ~ "ALEX.EDLER",
player == "ALEXANDER.WENNBERG" ~ "ALEX.WENNBERG",
player == "CHRISTOPHER.TANEV" ~ "CHRIS.TANEV",
TRUE ~ player))
pivot_position <- pivot %>%
left_join(bio, by = "player")
fixes <- pivot_position %>%
filter(is.na(forward)) %>%
count(player) %>%
arrange(player)
player n
61

Play by play

game_id event_index PP_team game_seconds event_length player forward
2019020336 41 CAR 234 7 ANDREI.SVECHNIKOV 1
2019020336 41 CAR 234 7 DOUGIE.HAMILTON 0
2019020336 41 CAR 234 7 RYAN.DZINGEL 1
2019020336 41 CAR 234 7 SEBASTIAN.AHO 1
2019020336 41 CAR 234 7 TEUVO.TERAVAINEN 1
2019020336 42 CAR 241 24 ANDREI.SVECHNIKOV 1
2019020336 42 CAR 241 24 DOUGIE.HAMILTON 0
2019020336 42 CAR 241 24 RYAN.DZINGEL 1
2019020336 42 CAR 241 24 SEBASTIAN.AHO 1
2019020336 42 CAR 241 24 TEUVO.TERAVAINEN 1
62

63

Play by play

data <- pivot_position %>%
# first create a basic count variable for each player within a PP
group_by(game_id, event_index) %>%
mutate(on_ice = row_number()) %>%
ungroup() %>%
# pivot the players and positions
pivot_wider(names_from = "on_ice",
values_from = c("player", "forward")) %>%
# create a variable to count the number of forwards
mutate(PP_fwds = select(., forward_1:forward_5) %>%
rowSums(na.rm = TRUE)) %>%
# remove the now-unnecessary variables
select(-c(player_1:forward_5))
64

Play by play

data <- pivot_position %>%
# first create a basic count variable for each player within a PP
group_by(game_id, event_index) %>%
mutate(on_ice = row_number()) %>%
ungroup() %>%
# pivot the players and positions
pivot_wider(names_from = "on_ice",
values_from = c("player", "forward")) %>%
# create a variable to count the number of forwards
mutate(PP_fwds = select(., forward_1:forward_5) %>%
rowSums(na.rm = TRUE)) %>%
# remove the now-unnecessary variables
select(-c(player_1:forward_5))
65

Play by play

game_id event_index PP_team game_seconds event_length player forward on_ice
2019020336 41 CAR 234 7 ANDREI.SVECHNIKOV 1 1
2019020336 41 CAR 234 7 DOUGIE.HAMILTON 0 2
2019020336 41 CAR 234 7 RYAN.DZINGEL 1 3
2019020336 41 CAR 234 7 SEBASTIAN.AHO 1 4
2019020336 41 CAR 234 7 TEUVO.TERAVAINEN 1 5
2019020336 42 CAR 241 24 ANDREI.SVECHNIKOV 1 1
2019020336 42 CAR 241 24 DOUGIE.HAMILTON 0 2
2019020336 42 CAR 241 24 RYAN.DZINGEL 1 3
2019020336 42 CAR 241 24 SEBASTIAN.AHO 1 4
2019020336 42 CAR 241 24 TEUVO.TERAVAINEN 1 5
66

Play by play

data <- pivot_position %>%
# first create a basic count variable for each player within a PP
group_by(game_id, event_index) %>%
mutate(on_ice = row_number()) %>%
ungroup() %>%
# pivot the players and positions
pivot_wider(names_from = "on_ice",
values_from = c("player", "forward")) %>%
# create a variable to count the number of forwards
mutate(PP_fwds = select(., forward_1:forward_5) %>%
rowSums(na.rm = TRUE)) %>%
# remove the now-unnecessary variables
select(-c(player_1:forward_5))
67

Play by play

game_id event_index PP_team game_seconds event_length player_1 forward_1 player_2 forward_2
2019020336 41 CAR 234 7 ANDREI.SVECHNIKOV 1 DOUGIE.HAMILTON 0
2019020336 42 CAR 241 24 ANDREI.SVECHNIKOV 1 DOUGIE.HAMILTON 0
2019020336 43 CAR 265 2 ANDREI.SVECHNIKOV 1 DOUGIE.HAMILTON 0
2019020336 44 CAR 267 5 ANDREI.SVECHNIKOV 1 DOUGIE.HAMILTON 0
2019020336 45 CAR 272 4 ANDREI.SVECHNIKOV 1 DOUGIE.HAMILTON 0
2019020336 46 CAR 276 7 ANDREI.SVECHNIKOV 1 DOUGIE.HAMILTON 0
2019020336 60 PHI 313 33 CLAUDE.GIROUX 1 IVAN.PROVOROV 0
2019020336 61 PHI 346 10 CLAUDE.GIROUX 1 IVAN.PROVOROV 0
2019020336 62 PHI 356 1 CLAUDE.GIROUX 1 IVAN.PROVOROV 0
2019020336 66 PHI 357 22 JAKUB.VORACEK 1 MATT.NISKANEN 0
68

Play by play

data <- pivot_position %>%
# first create a basic count variable for each player within a PP
group_by(game_id, event_index) %>%
mutate(on_ice = row_number()) %>%
ungroup() %>%
# pivot the players and positions
pivot_wider(names_from = "on_ice",
values_from = c("player", "forward")) %>%
# create a variable to count the number of forwards
mutate(PP_fwds = select(., forward_1:forward_5) %>%
rowSums(na.rm = TRUE)) %>%
# remove the now-unnecessary variables
select(-c(player_1:forward_5))
69

Play by play

data <- pivot_position %>%
# first create a basic count variable for each player within a PP
group_by(game_id, event_index) %>%
mutate(on_ice = row_number()) %>%
ungroup() %>%
# pivot the players and positions
pivot_wider(names_from = "on_ice",
values_from = c("player", "forward")) %>%
# create a variable to count the number of forwards
mutate(PP_fwds = select(., forward_1:forward_5) %>%
rowSums(na.rm = TRUE)) %>%
# remove the now-unnecessary variables
select(-c(player_1:forward_5))
70

Play by play

game_id event_index PP_team game_seconds event_length PP_fwds
2019020336 41 CAR 234 7 4
2019020336 42 CAR 241 24 4
2019020336 43 CAR 265 2 4
2019020336 44 CAR 267 5 4
2019020336 45 CAR 272 4 4
2019020336 46 CAR 276 7 4
2019020336 60 PHI 313 33 4
2019020336 61 PHI 346 10 4
2019020336 62 PHI 356 1 4
2019020336 66 PHI 357 22 3
71

Problems we get to solve

✅  Player position data is in a different place



❌  Zone exit data is in a different place



❌  One row per event instead of one row per second 🤔

72

Tracking data

73

Tracking data

game_id game_period time event_type
2019020336 1 16:02:00 EXIT
2019020336 1 15:27:00 EXIT
2019020336 1 14:42:00 EXIT
2019020336 1 13:57:00 EXIT
2019020336 1 12:36:00 EXIT
2019020336 1 12:12:00 EXIT
2019020336 2 18:36:00 EXIT
2019020336 2 18:10:00 EXIT
2019020336 2 17:20:00 EXIT
2019020336 2 14:40:00 EXIT
74

Tracking data

tracking <- tracking %>%
# create a new variable that converts to m:s
mutate(time_new = str_sub(time, 1, 5),
time_new = ms(time_new),
# create a new variable for total seconds
seconds = minute(time_new)*60 + second(time_new),
# create a game_seconds variable to match what's in pbp data
game_seconds = case_when(
game_period == 1 ~ 1200 - seconds,
game_period == 2 ~ 2400 - seconds,
game_period == 3 ~ 3600 - seconds)) %>%
# remove now-unnecessary variables
select(-c(time, time_new, seconds, game_period))
75

Tracking data

tracking <- tracking %>%
# create a new variable that converts to m:s
mutate(time_new = str_sub(time, 1, 5),
time_new = ms(time_new),
# create a new variable for total seconds
seconds = minute(time_new)*60 + second(time_new),
# create a game_seconds variable to match what's in pbp data
game_seconds = case_when(
game_period == 1 ~ 1200 - seconds,
game_period == 2 ~ 2400 - seconds,
game_period == 3 ~ 3600 - seconds)) %>%
# remove now-unnecessary variables
select(-c(time, time_new, seconds, game_period))
76

Tracking data

game_id game_period time event_type time_new
2019020336 1 16:02:00 EXIT 16M 2S
2019020336 1 15:27:00 EXIT 15M 27S
2019020336 1 14:42:00 EXIT 14M 42S
2019020336 1 13:57:00 EXIT 13M 57S
2019020336 1 12:36:00 EXIT 12M 36S
2019020336 1 12:12:00 EXIT 12M 12S
2019020336 2 18:36:00 EXIT 18M 36S
2019020336 2 18:10:00 EXIT 18M 10S
2019020336 2 17:20:00 EXIT 17M 20S
2019020336 2 14:40:00 EXIT 14M 40S
77

Tracking data

tracking <- tracking %>%
# create a new variable that converts to m:s
mutate(time_new = str_sub(time, 1, 5),
time_new = ms(time_new),
# create a new variable for total seconds
seconds = minute(time_new)*60 + second(time_new),
# create a game_seconds variable to match what's in pbp data
game_seconds = case_when(
game_period == 1 ~ 1200 - seconds,
game_period == 2 ~ 2400 - seconds,
game_period == 3 ~ 3600 - seconds)) %>%
# remove now-unnecessary variables
select(-c(time, time_new, seconds, game_period))
78

Tracking data

game_id game_period time event_type time_new seconds
2019020336 1 16:02:00 EXIT 16M 2S 962
2019020336 1 15:27:00 EXIT 15M 27S 927
2019020336 1 14:42:00 EXIT 14M 42S 882
2019020336 1 13:57:00 EXIT 13M 57S 837
2019020336 1 12:36:00 EXIT 12M 36S 756
2019020336 1 12:12:00 EXIT 12M 12S 732
2019020336 2 18:36:00 EXIT 18M 36S 1116
2019020336 2 18:10:00 EXIT 18M 10S 1090
2019020336 2 17:20:00 EXIT 17M 20S 1040
2019020336 2 14:40:00 EXIT 14M 40S 880
79

Tracking data

tracking <- tracking %>%
# create a new variable that converts to m:s
mutate(time_new = str_sub(time, 1, 5),
time_new = ms(time_new),
# create a new variable for total seconds
seconds = minute(time_new)*60 + second(time_new),
# create a game_seconds variable to match what's in pbp data
game_seconds = case_when(
game_period == 1 ~ 1200 - seconds,
game_period == 2 ~ 2400 - seconds,
game_period == 3 ~ 3600 - seconds)) %>%
# remove now-unnecessary variables
select(-c(time, time_new, seconds, game_period))
80

Tracking data

game_id event_type game_seconds
2019020336 EXIT 238
2019020336 EXIT 273
2019020336 EXIT 318
2019020336 EXIT 363
2019020336 EXIT 444
2019020336 EXIT 468
2019020336 EXIT 1284
2019020336 EXIT 1310
2019020336 EXIT 1360
2019020336 EXIT 1520
81

Padding our data

82

What's the goal?

We want to expand our data: one row per event ➡️  one row per second



83

What's the goal?

We want to expand our data: one row per event ➡️  one row per second



So we can more easily:

1️⃣  Examine the average composition of a power play per second

2️⃣  Join in our manually-tracked data



83

What's the goal?

We want to expand our data: one row per event ➡️  one row per second



So we can more easily:

1️⃣  Examine the average composition of a power play per second

2️⃣  Join in our manually-tracked data



Thankfully, we have the padr package.

83

Padding our data

data <- data %>%
# create an 0/1 variable to identify a new power play
# and a power_play_count variable
mutate(new_power_play = ifelse(lag(game_seconds + event_length) ==
game_seconds & lag(PP_team) ==
PP_team, 0, 1),
new_power_play = ifelse(is.na(new_power_play),
1, new_power_play),
power_play_count = cumsum(new_power_play)) %>%
select(power_play_count, everything())
84

Padding our data

data <- data %>%
# create an 0/1 variable to identify a new power play
# and a power_play_count variable
mutate(new_power_play = ifelse(lag(game_seconds + event_length) ==
game_seconds & lag(PP_team) ==
PP_team, 0, 1),
new_power_play = ifelse(is.na(new_power_play),
1, new_power_play),
power_play_count = cumsum(new_power_play)) %>%
select(power_play_count, everything())
85

Padding our data

game_id event_index PP_team game_seconds event_length PP_fwds new_power_play
2019020336 41 CAR 234 7 4 1
2019020336 42 CAR 241 24 4 0
2019020336 43 CAR 265 2 4 0
2019020336 44 CAR 267 5 4 0
2019020336 45 CAR 272 4 4 0
2019020336 46 CAR 276 7 4 0
2019020336 60 PHI 313 33 4 1
2019020336 61 PHI 346 10 4 0
2019020336 62 PHI 356 1 4 0
2019020336 66 PHI 357 22 3 0
86

Padding our data

data <- data %>%
# create an 0/1 variable to identify a new power play
# and a power_play_count variable
mutate(new_power_play = ifelse(lag(game_seconds + event_length) ==
game_seconds & lag(PP_team) ==
PP_team, 0, 1),
new_power_play = ifelse(is.na(new_power_play),
1, new_power_play),
power_play_count = cumsum(new_power_play)) %>%
select(power_play_count, everything())
87

Padding our data

game_id event_index PP_team game_seconds event_length PP_fwds new_power_play power_play_count
2019020336 41 CAR 234 7 4 1 1
2019020336 42 CAR 241 24 4 0 1
2019020336 43 CAR 265 2 4 0 1
2019020336 44 CAR 267 5 4 0 1
2019020336 45 CAR 272 4 4 0 1
2019020336 46 CAR 276 7 4 0 1
2019020336 60 PHI 313 33 4 1 2
2019020336 61 PHI 346 10 4 0 2
2019020336 62 PHI 356 1 4 0 2
2019020336 66 PHI 357 22 3 0 2
88

Padding our data

data <- data %>%
# create an 0/1 variable to identify a new power play
# and a power_play_count variable
mutate(new_power_play = ifelse(lag(game_seconds + event_length) ==
game_seconds & lag(PP_team) ==
PP_team, 0, 1),
new_power_play = ifelse(is.na(new_power_play),
1, new_power_play),
power_play_count = cumsum(new_power_play)) %>%
select(power_play_count, everything())
89

Padding our data

power_play_count game_id event_index PP_team game_seconds event_length PP_fwds new_power_play
1 2019020336 41 CAR 234 7 4 1
1 2019020336 42 CAR 241 24 4 0
1 2019020336 43 CAR 265 2 4 0
1 2019020336 44 CAR 267 5 4 0
1 2019020336 45 CAR 272 4 4 0
1 2019020336 46 CAR 276 7 4 0
2 2019020336 60 PHI 313 33 4 1
2 2019020336 61 PHI 346 10 4 0
2 2019020336 62 PHI 356 1 4 0
2 2019020336 66 PHI 357 22 3 0
90

Let's do a test

sample <- data %>%
filter(power_play_count == 1) %>%
pad_int('game_seconds',
start_val = 234,
end_val = 283) %>%
fill(power_play_count, game_id, PP_team, PP_fwds,
.direction = "down")
91

Let's do a test

sample <- data %>%
filter(power_play_count == 1) %>%
pad_int('game_seconds',
start_val = 234,
end_val = 283) %>%
fill(power_play_count, game_id, PP_team, PP_fwds,
.direction = "down")
92

Let's do a test

power_play_count game_id event_index PP_team game_seconds event_length PP_fwds new_power_play
1 2019020336 41 CAR 234 7 4 1
1 2019020336 42 CAR 241 24 4 0
1 2019020336 43 CAR 265 2 4 0
1 2019020336 44 CAR 267 5 4 0
1 2019020336 45 CAR 272 4 4 0
1 2019020336 46 CAR 276 7 4 0
93

Let's do a test

power_play_count game_id event_index PP_team game_seconds event_length PP_fwds new_power_play
1 2019020336 41 CAR 234 7 4 1
1 2019020336 42 CAR 241 24 4 0
1 2019020336 43 CAR 265 2 4 0
1 2019020336 44 CAR 267 5 4 0
1 2019020336 45 CAR 272 4 4 0
1 2019020336 46 CAR 276 7 4 0
94

Let's do a test

power_play_count game_id event_index PP_team game_seconds event_length PP_fwds new_power_play
1 2019020336 41 CAR 234 7 4 1
1 2019020336 42 CAR 241 24 4 0
1 2019020336 43 CAR 265 2 4 0
1 2019020336 44 CAR 267 5 4 0
1 2019020336 45 CAR 272 4 4 0
1 2019020336 46 CAR 276 7 4 0
95

Let's do a test

sample <- data %>%
filter(power_play_count == 1) %>%
pad_int('game_seconds',
start_val = 234,
end_val = 283) %>%
fill(power_play_count, game_id, PP_team, PP_fwds,
.direction = "down")
96

Let's do a test

game_seconds power_play_count game_id event_index PP_team event_length PP_fwds new_power_play
234 1 2019020336 41 CAR 7 4 1
235
236
237
238
239
240
241 1 2019020336 42 CAR 24 4 0
242
243
97

Let's do a test

game_seconds power_play_count game_id event_index PP_team event_length PP_fwds new_power_play
234 1 2019020336 41 CAR 7 4 1
235
236
237
238
239
240
241 1 2019020336 42 CAR 24 4 0
242
243
98

Let's do a test

sample <- data %>%
filter(power_play_count == 1) %>%
pad_int('game_seconds',
start_val = 234,
end_val = 283) %>%
fill(power_play_count, game_id, PP_team, PP_fwds,
.direction = "down")
99

Let's do a test

game_seconds power_play_count game_id event_index PP_team event_length PP_fwds new_power_play
234 1 2019020336 41 CAR 7 4 1
235 1 2019020336 CAR 4
236 1 2019020336 CAR 4
237 1 2019020336 CAR 4
238 1 2019020336 CAR 4
239 1 2019020336 CAR 4
240 1 2019020336 CAR 4
241 1 2019020336 42 CAR 24 4 0
242 1 2019020336 CAR 4
243 1 2019020336 CAR 4
100

Let's do a test

game_seconds power_play_count game_id event_index PP_team event_length PP_fwds new_power_play
234 1 2019020336 41 CAR 7 4 1
235 1 2019020336 CAR 4
236 1 2019020336 CAR 4
237 1 2019020336 CAR 4
238 1 2019020336 CAR 4
239 1 2019020336 CAR 4
240 1 2019020336 CAR 4
241 1 2019020336 42 CAR 24 4 0
242 1 2019020336 CAR 4
243 1 2019020336 CAR 4
101

When is it time for a function?

102

When is it time for a function?

😬  (Don't be scared like me.)


102

When is it time for a function?

😬  (Don't be scared like me.)


📝  You've copied and pasted code more than twice.


102

When is it time for a function?

😬  (Don't be scared like me.)


📝  You've copied and pasted code more than twice.


🔁  You need to run multiple lines of code on different segments of data.

102

When is it time for a function?

😬  (Don't be scared like me.)


📝  You've copied and pasted code more than twice.


🔁  You need to run multiple lines of code on different segments of data.


# basic syntax of a function
name_of_function <- function(needed_value) {
# all the code goes here
}
102

Padding our data

padding_function <- function(count) {
each_power_play <- data %>%
filter(power_play_count == count)
minimum <- min(each_power_play$game_seconds)
maximum <- max(each_power_play$game_seconds +
each_power_play$event_length - 1)
padded <- each_power_play %>%
pad_int('game_seconds',
start_val = minimum,
end_val = maximum) %>%
fill(power_play_count, game_id, PP_team, PP_fwds,
.direction = "down")
}
103

Padding our data

padding_function <- function(count) {
each_power_play <- data %>%
filter(power_play_count == count)
minimum <- min(each_power_play$game_seconds)
maximum <- max(each_power_play$game_seconds +
each_power_play$event_length - 1)
padded <- each_power_play %>%
pad_int('game_seconds',
start_val = minimum,
end_val = maximum) %>%
fill(power_play_count, game_id, PP_team, PP_fwds,
.direction = "down")
}
104

Padding our data

padding_function <- function(count) {
each_power_play <- data %>%
filter(power_play_count == count)
minimum <- min(each_power_play$game_seconds)
maximum <- max(each_power_play$game_seconds +
each_power_play$event_length - 1)
padded <- each_power_play %>%
pad_int('game_seconds',
start_val = minimum,
end_val = maximum) %>%
fill(power_play_count, game_id, PP_team, PP_fwds,
.direction = "down")
}
105

Padding our data

padding_function <- function(count) {
each_power_play <- data %>%
filter(power_play_count == count)
minimum <- min(each_power_play$game_seconds)
maximum <- max(each_power_play$game_seconds +
each_power_play$event_length - 1)
padded <- each_power_play %>%
pad_int('game_seconds',
start_val = minimum,
end_val = maximum) %>%
fill(power_play_count, game_id, PP_team, PP_fwds,
.direction = "down")
}
106

Padding our data

padding_function <- function(count) {
each_power_play <- data %>%
filter(power_play_count == count)
minimum <- min(each_power_play$game_seconds)
maximum <- max(each_power_play$game_seconds +
each_power_play$event_length - 1)
padded <- each_power_play %>%
pad_int('game_seconds',
start_val = minimum,
end_val = maximum) %>%
fill(power_play_count, game_id, PP_team, PP_fwds,
.direction = "down")
}
107

Padding our data

A common scenario:


108

Padding our data

A common scenario:


✂️  Split the data into pieces


108

Padding our data

A common scenario:


✂️  Split the data into pieces


🔁  Apply a function to each piece


108

Padding our data

A common scenario:


✂️  Split the data into pieces


🔁  Apply a function to each piece


➕  Put it back together


108

Padding our data

A common scenario:


✂️  Split the data into pieces


🔁  Apply a function to each piece


➕  Put it back together


We can do this with the map() family of functions in the purrr package.

108

Padding our data

data_padded <-
map_df(unique(data$power_play_count), padding_function)
109

Padding our data

data_padded <-
map_df(unique(data$power_play_count), padding_function)

map_df has two arguments: the data and the function

109

Padding our data

data_padded <-
map_df(unique(data$power_play_count), padding_function)

map_df has two arguments: the data and the function

unique(data$power_play_count)

We're telling it to perform the function for each unique value of power_play_count in the data data frame

109

Padding our data

data_padded <-
map_df(unique(data$power_play_count), padding_function)

map_df has two arguments: the data and the function

unique(data$power_play_count)

We're telling it to perform the function for each unique value of power_play_count in the data data frame

It will apply our padding_function and return a data frame with the data padded

109

Padding our data

game_seconds power_play_count game_id event_index PP_team event_length PP_fwds new_power_play
234 1 2019020336 41 CAR 7 4 1
235 1 2019020336 CAR 4
236 1 2019020336 CAR 4
237 1 2019020336 CAR 4
238 1 2019020336 CAR 4
239 1 2019020336 CAR 4
240 1 2019020336 CAR 4
241 1 2019020336 42 CAR 24 4 0
242 1 2019020336 CAR 4
243 1 2019020336 CAR 4
110

Padding our data

game_seconds power_play_count game_id event_index PP_team event_length PP_fwds new_power_play
2266 27 2019020384 CBJ 3
2267 27 2019020384 CBJ 3
2268 27 2019020384 CBJ 3
2269 27 2019020384 CBJ 3
2270 27 2019020384 CBJ 3
2271 27 2019020384 381 CBJ 5 3 0
2272 27 2019020384 CBJ 3
2273 27 2019020384 CBJ 3
2274 27 2019020384 CBJ 3
2275 27 2019020384 CBJ 3
111

Padding our data

data_final <- data_padded %>%
group_by(power_play_count) %>%
# add in a new variable for the second of
# each power play, counting up from 1
mutate(power_play_second = row_number()) %>%
ungroup()
112

Padding our data

data_final <- data_padded %>%
group_by(power_play_count) %>%
# add in a new variable for the second of
# each power play, counting up from 1
mutate(power_play_second = row_number()) %>%
ungroup()
113

Padding our data

game_seconds power_play_count game_id event_index PP_team event_length PP_fwds new_power_play power_play_second
234 1 2019020336 41 CAR 7 4 1 1
235 1 2019020336 CAR 4 2
236 1 2019020336 CAR 4 3
237 1 2019020336 CAR 4 4
238 1 2019020336 CAR 4 5
239 1 2019020336 CAR 4 6
240 1 2019020336 CAR 4 7
241 1 2019020336 42 CAR 24 4 0 8
242 1 2019020336 CAR 4 9
243 1 2019020336 CAR 4 10
114

Padding our data

game_seconds power_play_count game_id event_index PP_team event_length PP_fwds new_power_play power_play_second
234 1 2019020336 41 CAR 7 4 1 1
235 1 2019020336 CAR 4 2
236 1 2019020336 CAR 4 3
237 1 2019020336 CAR 4 4
238 1 2019020336 CAR 4 5
239 1 2019020336 CAR 4 6
240 1 2019020336 CAR 4 7
241 1 2019020336 42 CAR 24 4 0 8
242 1 2019020336 CAR 4 9
243 1 2019020336 CAR 4 10
max(data_final$power_play_second)
## [1] 120
114

Problems we get to solve

✅  Player position data is in a different place



❌  Zone exit data is in a different place



✅  One row per event instead of one row per second 🤔

115

Tracking data

data_final <- data_final %>%
left_join(tracking, by = c("game_id", "game_seconds"))
116

Tracking data

game_seconds power_play_count game_id event_index PP_team event_length PP_fwds new_power_play power_play_second event_type
234 1 2019020336 41 CAR 7 4 1 1
235 1 2019020336 CAR 4 2
236 1 2019020336 CAR 4 3
237 1 2019020336 CAR 4 4
238 1 2019020336 CAR 4 5 EXIT
239 1 2019020336 CAR 4 6
240 1 2019020336 CAR 4 7
241 1 2019020336 42 CAR 24 4 0 8
242 1 2019020336 CAR 4 9
243 1 2019020336 CAR 4 10
117

Tracking data

game_seconds power_play_count game_id event_index PP_team event_length PP_fwds new_power_play power_play_second event_type
234 1 2019020336 41 CAR 7 4 1 1
235 1 2019020336 CAR 4 2
236 1 2019020336 CAR 4 3
237 1 2019020336 CAR 4 4
238 1 2019020336 CAR 4 5 EXIT
239 1 2019020336 CAR 4 6
240 1 2019020336 CAR 4 7
241 1 2019020336 42 CAR 24 4 0 8
242 1 2019020336 CAR 4 9
243 1 2019020336 CAR 4 10
data_final %>%
get_dupes(game_id, game_seconds)
## # A tibble: 0 x 11
## # … with 11 variables: game_id <dbl>, game_seconds <dbl>, dupe_count <int>,
## # power_play_count <dbl>, event_index <dbl>, PP_team <chr>,
## # event_length <dbl>, PP_fwds <dbl>, new_power_play <dbl>,
## # power_play_second <int>, event_type <chr>
117

Problems we get to solve

✅  Player position data is in a different place



✅  Zone exit data is in a different place



✅  One row per event instead of one row per second 🤔

118
If you can't handle the data cleaning, you don't deserve the data analysis. -- Me
119

Does a four-forward power play unit have a higher rate of zone exits?

120

Summarizing our data

summary <- data_final %>%
group_by(PP_fwds) %>%
summarize(time_on_ice = sum(event_length, na.rm = TRUE) / 60,
exits = sum(!is.na(event_type))) %>%
mutate(exit_rate = exits * 60 / time_on_ice)
121

Summarizing our data

summary <- data_final %>%
group_by(PP_fwds) %>%
summarize(time_on_ice = sum(event_length, na.rm = TRUE) / 60,
exits = sum(!is.na(event_type))) %>%
mutate(exit_rate = exits * 60 / time_on_ice)
122

Summarizing our data

summary <- data_final %>%
group_by(PP_fwds) %>%
summarize(time_on_ice = sum(event_length, na.rm = TRUE) / 60,
exits = sum(!is.na(event_type))) %>%
mutate(exit_rate = exits * 60 / time_on_ice)
123

Summarizing our data

summary <- data_final %>%
group_by(PP_fwds) %>%
summarize(time_on_ice = sum(event_length, na.rm = TRUE) / 60,
exits = sum(!is.na(event_type))) %>%
mutate(exit_rate = exits * 60 / time_on_ice)
124

Summarizing our data

summary <- data_final %>%
group_by(PP_fwds) %>%
summarize(time_on_ice = sum(event_length, na.rm = TRUE) / 60,
exits = sum(!is.na(event_type))) %>%
mutate(exit_rate = exits * 60 / time_on_ice)
125

Summarizing our data

summary <- data_final %>%
group_by(PP_fwds) %>%
summarize(time_on_ice = sum(event_length, na.rm = TRUE) / 60,
exits = sum(!is.na(event_type))) %>%
mutate(exit_rate = exits * 60 / time_on_ice)
PP_fwds time_on_ice exits exit_rate
3 11.25 17 90.66667
4 30.50 66 129.83607
126

Let's make a graph!

127

The simplest graph

summary %>%
ggplot(aes(x = as.character(PP_fwds), y = exit_rate)) +
geom_bar(stat="identity")

128

A little better

summary %>%
ggplot(aes(x = as.character(PP_fwds), y = exit_rate)) +
geom_bar(stat="identity") +
labs(title = "Rate of zone exits on the power play",
subtitle = "5v4 only, 4 NHL games November 2019",
x = "Number of forwards on the power play",
y = "Zone exits per 60 minutes") +
geom_text(aes(label = round(exit_rate, 2)), vjust = 1.6,
color = "white", size = 4)
129

A little better

130

With color!

summary %>%
ggplot(aes(x = as.character(PP_fwds), y = exit_rate,
fill = as.character(PP_fwds))) +
geom_bar(stat="identity") +
labs(title = "Rate of zone exits on the power play",
subtitle = "5v4 only, 4 NHL games November 2019",
x = "Number of forwards on the power play",
y = "Zone exits per 60 minutes") +
geom_text(aes(label = round(exit_rate, 2)), vjust = 1.6,
color = "white", size = 4) +
scale_fill_manual(values = c("#808080", "#5C164E")) +
theme(legend.position = "none")
131

With color!

132

Much better

summary %>%
ggplot(aes(x = as.character(PP_fwds), y = exit_rate,
fill = as.character(PP_fwds))) +
geom_bar(stat="identity") +
labs(title = "Rate of zone exits on the power play",
subtitle = "5v4 only, 4 NHL games November 2019",
x = "Number of forwards on the power play",
y = "Zone exits per 60 minutes") +
geom_text(aes(label = round(exit_rate, 2)), vjust = 1.6,
color = "white", size = 4) +
scale_fill_manual(values = c("#808080", "#912919")) +
CMSAC_theme() +
theme(legend.position = "none")
133

Much better

134

What are the benefits of creating a custom theme?

🕥  Saves time!



135

What are the benefits of creating a custom theme?

🕥  Saves time!



⭐  Helps establish your #brand.



135

What are the benefits of creating a custom theme?

🕥  Saves time!



⭐  Helps establish your #brand.



❤️  Looks way prettier than a standard ggplot2 graph.

135

Custom theme how-to

📝  Use Google fonts

library(showtext)
font_add_google("Open Sans", "opensans")
136

Custom theme how-to

📝  Use Google fonts

library(showtext)
font_add_google("Open Sans", "opensans")

📓  Design your theme

CMSAC_theme <- function () {
theme_linedraw(base_size=11, base_family="opensans") %+replace%
theme(
axis.ticks = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
plot.title = element_text(size = 15, hjust = 0, vjust = 0.5,
margin = margin(b = 0.2, unit = "cm")),
plot.subtitle = element_text(size = 9, face = "italic", hjust = 0,
vjust = 0.5, margin = margin(b = 0.2, unit = "cm")),
axis.title = element_text(face = "bold"))
}
136

Custom theme how-to

📝  Use Google fonts

library(showtext)
font_add_google("Open Sans", "opensans")

📓  Design your theme

CMSAC_theme <- function () {
theme_linedraw(base_size=11, base_family="opensans") %+replace%
theme(
axis.ticks = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
plot.title = element_text(size = 15, hjust = 0, vjust = 0.5,
margin = margin(b = 0.2, unit = "cm")),
plot.subtitle = element_text(size = 9, face = "italic", hjust = 0,
vjust = 0.5, margin = margin(b = 0.2, unit = "cm")),
axis.title = element_text(face = "bold"))
}
137

Custom theme how-to

📝  Use Google fonts

library(showtext)
font_add_google("Open Sans", "opensans")

📓  Design your theme

CMSAC_theme <- function () {
theme_linedraw(base_size=11, base_family="opensans") %+replace%
theme(
axis.ticks = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
plot.title = element_text(size = 15, hjust = 0, vjust = 0.5,
margin = margin(b = 0.2, unit = "cm")),
plot.subtitle = element_text(size = 9, face = "italic", hjust = 0,
vjust = 0.5, margin = margin(b = 0.2, unit = "cm")),
axis.title = element_text(face = "bold"))
}
138

Custom theme how-to

📝  Use Google fonts

library(showtext)
font_add_google("Open Sans", "opensans")

📓  Design your theme

CMSAC_theme <- function () {
theme_linedraw(base_size=11, base_family="opensans") %+replace%
theme(
axis.ticks = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
plot.title = element_text(size = 15, hjust = 0, vjust = 0.5,
margin = margin(b = 0.2, unit = "cm")),
plot.subtitle = element_text(size = 9, face = "italic", hjust = 0,
vjust = 0.5, margin = margin(b = 0.2, unit = "cm")),
axis.title = element_text(face = "bold"))
}
139

Custom theme how-to

1️⃣  Start with an existing theme: theme_linedraw(), theme_classic(), theme_light(), theme_minimal(), etc.

140

Custom theme how-to

1️⃣  Start with an existing theme: theme_linedraw(), theme_classic(), theme_light(), theme_minimal(), etc.

2️⃣  Use the ggplot2 theme reference page to find your arguments.

140

Custom theme how-to

1️⃣  Start with an existing theme: theme_linedraw(), theme_classic(), theme_light(), theme_minimal(), etc.

2️⃣  Use the ggplot2 theme reference page to find your arguments.

3️⃣  Investigate the options within those arguments.

theme_minimal <- theme_minimal()
theme_minimal$plot.subtitle
## List of 11
## $ family : NULL
## $ face : NULL
## $ colour : NULL
## $ size : NULL
## $ hjust : num 0
## $ vjust : num 1
## $ angle : NULL
## $ lineheight : NULL
## $ margin : 'margin' num [1:4] 0pt 0pt 5.5pt 0pt
## ..- attr(*, "valid.unit")= int 8
## ..- attr(*, "unit")= chr "pt"
## $ debug : NULL
## $ inherit.blank: logi TRUE
## - attr(*, "class")= chr [1:2] "element_text" "element"
140

How does the positional composition change over the time of the power play?

141

Summarizing our data

time <- data_final %>%
group_by(power_play_second, PP_fwds) %>%
summarize(n = n()) %>%
add_tally(n, name = "total") %>%
mutate(percent = n / total)
142

Summarizing our data

time <- data_final %>%
group_by(power_play_second, PP_fwds) %>%
summarize(n = n()) %>%
add_tally(n, name = "total") %>%
mutate(percent = n / total)
143

Summarizing our data

power_play_second PP_fwds n
1 3 9
1 4 18
2 3 9
2 4 18
3 3 9
3 4 18
4 3 9
4 4 18
5 3 8
5 4 18
144

Summarizing our data

time <- data_final %>%
group_by(power_play_second, PP_fwds) %>%
summarize(n = n()) %>%
add_tally(n, name = "total") %>%
mutate(percent = n / total)
145

Summarizing our data

power_play_second PP_fwds n total
1 3 9 27
1 4 18 27
2 3 9 27
2 4 18 27
3 3 9 27
3 4 18 27
4 3 9 27
4 4 18 27
5 3 8 26
5 4 18 26
146

Summarizing our data

time <- data_final %>%
group_by(power_play_second, PP_fwds) %>%
summarize(n = n()) %>%
add_tally(n, name = "total") %>%
mutate(percent = n / total)
147

Summarizing our data

power_play_second PP_fwds n total percent
1 3 9 27 0.3333333
1 4 18 27 0.6666667
2 3 9 27 0.3333333
2 4 18 27 0.6666667
3 3 9 27 0.3333333
3 4 18 27 0.6666667
4 3 9 27 0.3333333
4 4 18 27 0.6666667
5 3 8 26 0.3076923
5 4 18 26 0.6923077
148

Making an area graph

time %>%
mutate(PP_fwds = ifelse(PP_fwds == 3,
"3 forwards", "4 forwards")) %>%
ggplot(aes(x = power_play_second, y = percent, fill = PP_fwds)) +
geom_area() +
labs(title = "Composition of the power play",
subtitle = "5v4 only, 4 NHL games November 2019",
x = "Second of the power play") +
scale_fill_manual(values = c("#808080", "#912919")) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1),
expand = c(0, 0)) +
scale_x_continuous(limits = c(1, 120),
breaks = seq(0, 120, 20),
expand = c(0, 0)) +
CMSAC_theme() +
theme(legend.position = c(0.87, 1.106),
legend.title = element_blank(),
axis.title.y = element_blank())
149

Making an area graph

150

Making an area graph

time %>%
mutate(PP_fwds = ifelse(PP_fwds == 3,
"3 forwards", "4 forwards")) %>%
ggplot(aes(x = power_play_second, y = percent, fill = PP_fwds)) +
geom_area() +
labs(title = "Composition of the power play",
subtitle = "5v4 only, 4 NHL games November 2019",
x = "Second of the power play") +
scale_fill_manual(values = c("#808080", "#912919")) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1),
expand = c(0, 0)) +
scale_x_continuous(limits = c(1, 120),
breaks = seq(0, 120, 20),
expand = c(0, 0)) +
CMSAC_theme() +
theme(legend.position = c(0.87, 1.106),
legend.title = element_blank(),
axis.title.y = element_blank())
151

Making an area graph

time %>%
mutate(PP_fwds = ifelse(PP_fwds == 3,
"3 forwards", "4 forwards")) %>%
ggplot(aes(x = power_play_second, y = percent, fill = PP_fwds)) +
geom_area() +
labs(title = "Composition of the power play",
subtitle = "5v4 only, 4 NHL games November 2019",
x = "Second of the power play") +
scale_fill_manual(values = c("#808080", "#912919")) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1),
expand = c(0, 0)) +
scale_x_continuous(limits = c(1, 120),
breaks = seq(0, 120, 20),
expand = c(0, 0)) +
CMSAC_theme() +
theme(legend.position = c(0.87, 1.106),
legend.title = element_blank(),
axis.title.y = element_blank())
152

Making an area graph

time %>%
mutate(PP_fwds = ifelse(PP_fwds == 3,
"3 forwards", "4 forwards")) %>%
ggplot(aes(x = power_play_second, y = percent, fill = PP_fwds)) +
geom_area() +
labs(title = "Composition of the power play",
subtitle = "5v4 only, 4 NHL games November 2019",
x = "Second of the power play") +
scale_fill_manual(values = c("#808080", "#912919")) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1),
expand = c(0, 0)) +
scale_x_continuous(limits = c(1, 120),
breaks = seq(0, 120, 20),
expand = c(0, 0)) +
CMSAC_theme() +
theme(legend.position = c(0.87, 1.106),
legend.title = element_blank(),
axis.title.y = element_blank())
153

Making an area graph

154

Making an area graph

time %>%
mutate(PP_fwds = ifelse(PP_fwds == 3,
"3 forwards", "4 forwards")) %>%
ggplot(aes(x = power_play_second, y = percent, fill = PP_fwds)) +
geom_area() +
labs(title = "Composition of the power play",
subtitle = "5v4 only, 4 NHL games November 2019",
x = "Second of the power play") +
scale_fill_manual(values = c("#808080", "#912919")) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1),
expand = c(0, 0)) +
scale_x_continuous(limits = c(1, 120),
breaks = seq(0, 120, 20),
expand = c(0, 0)) +
CMSAC_theme() +
theme(legend.position = c(0.87, 1.106),
legend.title = element_blank(),
axis.title.y = element_blank())
155

We've done a lot!

⭐  Pivoted and joined data with tidyr and dplyr.


⭐  Wrote our own function and used purrr to apply it iteratively.


⭐  Created a custom theme and made an area graph.


156

We've done a lot!

⭐  Pivoted and joined data with tidyr and dplyr.


⭐  Wrote our own function and used purrr to apply it iteratively.


⭐  Created a custom theme and made an area graph.


❗  Slides/data/code are available on my website and Github so you can review.

156

Thanks!

<i class="fab fa-twitter "></i>{=html} @MeghanMHall
<i class="fab fa-github "></i>{=html} meghall06
<i class="fas fa-link "></i>{=html} meghan.rbind.io
<i class="fas fa-envelope "></i>{=html} meghanhall6@gmail.com

Slides created via the R package xaringan.

157

Me, Exceedingly Briefly

2
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow