1️⃣ I work in higher ed as a data manager.
1️⃣ I work in higher ed as a data manager.
2️⃣ I dabble in hockey analysis.
1️⃣ I work in higher ed as a data manager.
2️⃣ I dabble in hockey analysis.
3️⃣ I use R a lot, thanks to 1️⃣ and 2️⃣, and love helping other people learn.
Do you need to know anything about hockey?
Do you need to know anything about hockey?
No.
Do you need to know anything about hockey?
No.
Do you need to know anything about coding?
Do you need to know anything about hockey?
No.
Do you need to know anything about coding?
Eh.
GOAL: "Level up your R programming and make your analysis more efficient."
GOAL: "Level up your R programming and make your analysis more efficient."
1️⃣ Data manipulation with tidyr
and dplyr
(and stringr
and lubridate
and purrr
!)
GOAL: "Level up your R programming and make your analysis more efficient."
1️⃣ Data manipulation with tidyr
and dplyr
(and stringr
and lubridate
and purrr
!)
2️⃣ User-defined functions
GOAL: "Level up your R programming and make your analysis more efficient."
1️⃣ Data manipulation with tidyr
and dplyr
(and stringr
and lubridate
and purrr
!)
2️⃣ User-defined functions
3️⃣ Custom ggplot2
themes
GOAL: "Level up your R programming and make your analysis more efficient."
1️⃣ Data manipulation with tidyr
and dplyr
(and stringr
and lubridate
and purrr
!)
2️⃣ User-defined functions
3️⃣ Custom ggplot2
themes
🕗 5-minute breaks at 5:00 and 5:30
GOAL: "Level up your R programming and make your analysis more efficient."
1️⃣ Data manipulation with tidyr
and dplyr
(and stringr
and lubridate
and purrr
!)
2️⃣ User-defined functions
3️⃣ Custom ggplot2
themes
🕗 5-minute breaks at 5:00 and 5:30
❓ Questions!
GOAL: "Level up your R programming and make your analysis more efficient."
1️⃣ Data manipulation with tidyr
and dplyr
(and stringr
and lubridate
and purrr
!)
2️⃣ User-defined functions
3️⃣ Custom ggplot2
themes
🕗 5-minute breaks at 5:00 and 5:30
❓ Questions!
HOW: Let's examine the composition of a power play.
1️⃣ How does the positional composition change over the time of the power play?
(Is the four-forward configuration maybe more common at the beginning or the end of the power play?)
1️⃣ How does the positional composition change over the time of the power play?
(Is the four-forward configuration maybe more common at the beginning or the end of the power play?)
2️⃣ Does a four-forward power play unit have a higher rate of zone exits?
# Load the packagedevtools::install_github("meghall06/betweenthepipes")library(betweenthepipes)# Load the datapbp <- pbp_examplebio <- bio_exampletracking <- track_example
pbp
: NHL play-by-play datagame_id | game_date | event_index | game_period | game_seconds | event_type | event_description | event_team | event_length |
---|---|---|---|---|---|---|---|---|
2019020336 | 2019-11-21 | 1 | 1 | 0 | PGSTR | 0 | ||
2019020336 | 2019-11-21 | 2 | 1 | 0 | PGEND | 0 | ||
2019020336 | 2019-11-21 | 3 | 1 | 0 | ANTHEM | 0 | ||
2019020336 | 2019-11-21 | 4 | 1 | 0 | PSTR | Period Start- Local time: 7:08 EST | 0 | |
2019020336 | 2019-11-21 | 5 | 1 | 0 | CHANGE | PHI | 0 | |
2019020336 | 2019-11-21 | 6 | 1 | 0 | CHANGE | CAR | 0 | |
2019020336 | 2019-11-21 | 7 | 1 | 0 | FAC | PHI won Neu. Zone - PHI #28 GIROUX vs CAR #20 AHO | PHI | 21 |
2019020336 | 2019-11-21 | 8 | 1 | 21 | GIVE | PHI GIVEAWAY - #28 GIROUX, Neu. Zone | PHI | 3 |
2019020336 | 2019-11-21 | 9 | 1 | 24 | CHANGE | PHI | 7 | |
2019020336 | 2019-11-21 | 10 | 1 | 31 | CHANGE | PHI | 2 |
game_id | game_date | event_index | game_period | game_seconds | event_type | event_description | event_team | event_length |
---|---|---|---|---|---|---|---|---|
2019020336 | 2019-11-21 | 1 | 1 | 0 | PGSTR | 0 | ||
2019020336 | 2019-11-21 | 2 | 1 | 0 | PGEND | 0 | ||
2019020336 | 2019-11-21 | 3 | 1 | 0 | ANTHEM | 0 | ||
2019020336 | 2019-11-21 | 4 | 1 | 0 | PSTR | Period Start- Local time: 7:08 EST | 0 | |
2019020336 | 2019-11-21 | 5 | 1 | 0 | CHANGE | PHI | 0 | |
2019020336 | 2019-11-21 | 6 | 1 | 0 | CHANGE | CAR | 0 | |
2019020336 | 2019-11-21 | 7 | 1 | 0 | FAC | PHI won Neu. Zone - PHI #28 GIROUX vs CAR #20 AHO | PHI | 21 |
2019020336 | 2019-11-21 | 8 | 1 | 21 | GIVE | PHI GIVEAWAY - #28 GIROUX, Neu. Zone | PHI | 3 |
2019020336 | 2019-11-21 | 9 | 1 | 24 | CHANGE | PHI | 7 | |
2019020336 | 2019-11-21 | 10 | 1 | 31 | CHANGE | PHI | 2 |
game_id | game_date | event_index | game_period | game_seconds | event_type | event_description | event_team | event_length |
---|---|---|---|---|---|---|---|---|
2019020336 | 2019-11-21 | 1 | 1 | 0 | PGSTR | 0 | ||
2019020336 | 2019-11-21 | 2 | 1 | 0 | PGEND | 0 | ||
2019020336 | 2019-11-21 | 3 | 1 | 0 | ANTHEM | 0 | ||
2019020336 | 2019-11-21 | 4 | 1 | 0 | PSTR | Period Start- Local time: 7:08 EST | 0 | |
2019020336 | 2019-11-21 | 5 | 1 | 0 | CHANGE | PHI | 0 | |
2019020336 | 2019-11-21 | 6 | 1 | 0 | CHANGE | CAR | 0 | |
2019020336 | 2019-11-21 | 7 | 1 | 0 | FAC | PHI won Neu. Zone - PHI #28 GIROUX vs CAR #20 AHO | PHI | 21 |
2019020336 | 2019-11-21 | 8 | 1 | 21 | GIVE | PHI GIVEAWAY - #28 GIROUX, Neu. Zone | PHI | 3 |
2019020336 | 2019-11-21 | 9 | 1 | 24 | CHANGE | PHI | 7 | |
2019020336 | 2019-11-21 | 10 | 1 | 31 | CHANGE | PHI | 2 |
game_strength_state | home_team | away_team | home_on_1 | home_on_2 | home_on_3 | home_on_4 | home_on_5 | home_on_6 |
---|---|---|---|---|---|---|---|---|
5v4 | CAR | PHI | DOUGIE.HAMILTON | JACCOB.SLAVIN | JORDAN.STAAL | PETR.MRAZEK | RYAN.DZINGEL | WARREN.FOEGELE |
5v4 | CAR | PHI | ANDREI.SVECHNIKOV | DOUGIE.HAMILTON | PETR.MRAZEK | RYAN.DZINGEL | SEBASTIAN.AHO | TEUVO.TERAVAINEN |
5v4 | CAR | PHI | ANDREI.SVECHNIKOV | DOUGIE.HAMILTON | PETR.MRAZEK | RYAN.DZINGEL | SEBASTIAN.AHO | TEUVO.TERAVAINEN |
5v4 | CAR | PHI | ANDREI.SVECHNIKOV | DOUGIE.HAMILTON | PETR.MRAZEK | RYAN.DZINGEL | SEBASTIAN.AHO | TEUVO.TERAVAINEN |
5v4 | CAR | PHI | ANDREI.SVECHNIKOV | DOUGIE.HAMILTON | PETR.MRAZEK | RYAN.DZINGEL | SEBASTIAN.AHO | TEUVO.TERAVAINEN |
5v4 | CAR | PHI | ANDREI.SVECHNIKOV | DOUGIE.HAMILTON | PETR.MRAZEK | RYAN.DZINGEL | SEBASTIAN.AHO | TEUVO.TERAVAINEN |
5v4 | CAR | PHI | ANDREI.SVECHNIKOV | DOUGIE.HAMILTON | PETR.MRAZEK | RYAN.DZINGEL | SEBASTIAN.AHO | TEUVO.TERAVAINEN |
5v4 | CAR | PHI | ANDREI.SVECHNIKOV | DOUGIE.HAMILTON | PETR.MRAZEK | RYAN.DZINGEL | SEBASTIAN.AHO | TEUVO.TERAVAINEN |
5v4 | CAR | PHI | ANDREI.SVECHNIKOV | DOUGIE.HAMILTON | PETR.MRAZEK | RYAN.DZINGEL | SEBASTIAN.AHO | TEUVO.TERAVAINEN |
4v5 | CAR | PHI | BRETT.PESCE | BROCK.MCGINN | JOEL.EDMUNDSON | JORDAN.STAAL | PETR.MRAZEK |
pbp %>% count(game_id, game_date, home_team, away_team)
game_id | game_date | home_team | away_team | n |
---|---|---|---|---|
2019020336 | 2019-11-21 | CAR | PHI | 586 |
2019020349 | 2019-11-23 | PHI | CGY | 682 |
2019020367 | 2019-11-25 | PHI | VAN | 606 |
2019020384 | 2019-11-27 | CBJ | PHI | 627 |
pbp
: NHL play-by-play databio
: Demographic and position data for these players from 2019Player | Position | Age | Date of Birth | Birth City | Birth State/Province | Birth Country | Nationality | Height (in) | Weight (lbs) |
---|---|---|---|---|---|---|---|---|---|
Adam Gaudette | C | 23 | 10/3/96 | Braintree | MA | USA | USA | 73 | 170 |
Alan Quine | C | 27 | 2/25/93 | Belleville | ON | CAN | CAN | 72 | 203 |
Alexander Edler | D | 34 | 4/21/86 | Ostersund | SWE | SWE | 75 | 212 | |
Alexander Wennberg | C | 25 | 9/22/94 | Stockholm | SWE | SWE | 74 | 197 | |
Alexander Yelesin | D | 24 | 2/7/96 | Yaroslavl | RUS | RUS | 72 | 195 | |
Alexandre Texier | C | 20 | 9/13/99 | St. Martin D'heres | FRA | FRA | 72 | 192 | |
Andrei Svechnikov | R | 20 | 3/26/00 | Barnual | RUS | RUS | 74 | 195 | |
Andrew Mangiapane | L | 24 | 4/4/96 | Toronto | ON | CAN | CAN | 70 | 184 |
Andrew Peeke | D | 22 | 3/17/98 | Parkland | FL | USA | USA | 75 | 194 |
Andy Andreoff | C | 29 | 5/17/91 | Pickering | ON | CAN | CAN | 73 | 203 |
Player | Position | Age | Date of Birth | Birth City | Birth State/Province | Birth Country | Nationality | Height (in) | Weight (lbs) |
---|---|---|---|---|---|---|---|---|---|
Adam Gaudette | C | 23 | 10/3/96 | Braintree | MA | USA | USA | 73 | 170 |
Alan Quine | C | 27 | 2/25/93 | Belleville | ON | CAN | CAN | 72 | 203 |
Alexander Edler | D | 34 | 4/21/86 | Ostersund | SWE | SWE | 75 | 212 | |
Alexander Wennberg | C | 25 | 9/22/94 | Stockholm | SWE | SWE | 74 | 197 | |
Alexander Yelesin | D | 24 | 2/7/96 | Yaroslavl | RUS | RUS | 72 | 195 | |
Alexandre Texier | C | 20 | 9/13/99 | St. Martin D'heres | FRA | FRA | 72 | 192 | |
Andrei Svechnikov | R | 20 | 3/26/00 | Barnual | RUS | RUS | 74 | 195 | |
Andrew Mangiapane | L | 24 | 4/4/96 | Toronto | ON | CAN | CAN | 70 | 184 |
Andrew Peeke | D | 22 | 3/17/98 | Parkland | FL | USA | USA | 75 | 194 |
Andy Andreoff | C | 29 | 5/17/91 | Pickering | ON | CAN | CAN | 73 | 203 |
pbp
: NHL play-by-play databio
: Demographic and position data for these players from 2019tracking
: Zone exit data on the power play in these four gamesgame_id | game_period | time | event_type |
---|---|---|---|
2019020336 | 1 | 16:02:00 | EXIT |
2019020336 | 1 | 15:27:00 | EXIT |
2019020336 | 1 | 14:42:00 | EXIT |
2019020336 | 1 | 13:57:00 | EXIT |
2019020336 | 1 | 12:36:00 | EXIT |
2019020336 | 1 | 12:12:00 | EXIT |
2019020336 | 2 | 18:36:00 | EXIT |
2019020336 | 2 | 18:10:00 | EXIT |
2019020336 | 2 | 17:20:00 | EXIT |
2019020336 | 2 | 14:40:00 | EXIT |
game_id | game_period | time | event_type |
---|---|---|---|
2019020336 | 1 | 16:02:00 | EXIT |
2019020336 | 1 | 15:27:00 | EXIT |
2019020336 | 1 | 14:42:00 | EXIT |
2019020336 | 1 | 13:57:00 | EXIT |
2019020336 | 1 | 12:36:00 | EXIT |
2019020336 | 1 | 12:12:00 | EXIT |
2019020336 | 2 | 18:36:00 | EXIT |
2019020336 | 2 | 18:10:00 | EXIT |
2019020336 | 2 | 17:20:00 | EXIT |
2019020336 | 2 | 14:40:00 | EXIT |
pbp
: NHL play-by-play databio
: Demographic and position data for these players from 2019tracking
: Zone exit data on the power play in these four games🧐 Study your different data sources
🧐 Study your different data sources
🔎 Find the relationships
🧐 Study your different data sources
🔎 Find the relationships
📝 Sketch out how you want your data to look at the end
❌ Player position data is in a different place
❌ Player position data is in a different place
❌ Zone exit data is in a different place
❌ Player position data is in a different place
❌ Zone exit data is in a different place
❌ One row per event instead of one row per second 🤔
⭐ My package (where the data is)
library(betweenthepipes)
⭐ My package (where the data is)
library(betweenthepipes)
❤️ The tidyverse (aka dplyr
, tidyr
, ggplot2
, stringr
)
library(tidyverse)
⭐ My package (where the data is)
library(betweenthepipes)
❤️ The tidyverse (aka dplyr
, tidyr
, ggplot2
, stringr
)
library(tidyverse)
🕦 For dealing with time (also part of the tidyverse)
library(lubridate)
⭐ My package (where the data is)
library(betweenthepipes)
❤️ The tidyverse (aka dplyr
, tidyr
, ggplot2
, stringr
)
library(tidyverse)
🕦 For dealing with time (also part of the tidyverse)
library(lubridate)
🧹 For some data cleaning help
library(janitor)
⭐ My package (where the data is)
library(betweenthepipes)
❤️ The tidyverse (aka dplyr
, tidyr
, ggplot2
, stringr
)
library(tidyverse)
🕦 For dealing with time (also part of the tidyverse)
library(lubridate)
🧹 For some data cleaning help
library(janitor)
🔂 For padding out our data
library(padr)
power <- pbp %>% # filter to only the power play strength states filter(game_strength_state %in% c("5v4", "4v5")) %>% # filter to only events that have time associated with them filter(event_length > 0) %>% # create a new variable that designates the PP team mutate(PP_team = ifelse(game_strength_state == "5v4", home_team, away_team)) %>% # create a new set of variables to determine who the PP players are mutate(PP_1 = ifelse(home_team == PP_team, home_on_1, away_on_1), PP_2 = ifelse(home_team == PP_team, home_on_2, away_on_2), PP_3 = ifelse(home_team == PP_team, home_on_3, away_on_3), PP_4 = ifelse(home_team == PP_team, home_on_4, away_on_4), PP_5 = ifelse(home_team == PP_team, home_on_5, away_on_5), PP_6 = ifelse(home_team == PP_team, home_on_6, away_on_6))
power <- pbp %>% # filter to only the power play strength states filter(game_strength_state %in% c("5v4", "4v5")) %>% # filter to only events that have time associated with them filter(event_length > 0) %>% # create a new variable that designates the PP team mutate(PP_team = ifelse(game_strength_state == "5v4", home_team, away_team)) %>% # create a new set of variables to determine who the PP players are mutate(PP_1 = ifelse(home_team == PP_team, home_on_1, away_on_1), PP_2 = ifelse(home_team == PP_team, home_on_2, away_on_2), PP_3 = ifelse(home_team == PP_team, home_on_3, away_on_3), PP_4 = ifelse(home_team == PP_team, home_on_4, away_on_4), PP_5 = ifelse(home_team == PP_team, home_on_5, away_on_5), PP_6 = ifelse(home_team == PP_team, home_on_6, away_on_6))
power <- pbp %>% # filter to only the power play strength states filter(game_strength_state %in% c("5v4", "4v5")) %>% # filter to only events that have time associated with them filter(event_length > 0) %>% # create a new variable that designates the PP team mutate(PP_team = ifelse(game_strength_state == "5v4", home_team, away_team)) %>% # create a new set of variables to determine who the PP players are mutate(PP_1 = ifelse(home_team == PP_team, home_on_1, away_on_1), PP_2 = ifelse(home_team == PP_team, home_on_2, away_on_2), PP_3 = ifelse(home_team == PP_team, home_on_3, away_on_3), PP_4 = ifelse(home_team == PP_team, home_on_4, away_on_4), PP_5 = ifelse(home_team == PP_team, home_on_5, away_on_5), PP_6 = ifelse(home_team == PP_team, home_on_6, away_on_6))
power <- pbp %>% # filter to only the power play strength states filter(game_strength_state %in% c("5v4", "4v5")) %>% # filter to only events that have time associated with them filter(event_length > 0) %>% # create a new variable that designates the PP team mutate(PP_team = ifelse(game_strength_state == "5v4", home_team, away_team)) %>% # create a new set of variables to determine who the PP players are mutate(PP_1 = ifelse(home_team == PP_team, home_on_1, away_on_1), PP_2 = ifelse(home_team == PP_team, home_on_2, away_on_2), PP_3 = ifelse(home_team == PP_team, home_on_3, away_on_3), PP_4 = ifelse(home_team == PP_team, home_on_4, away_on_4), PP_5 = ifelse(home_team == PP_team, home_on_5, away_on_5), PP_6 = ifelse(home_team == PP_team, home_on_6, away_on_6))
power <- pbp %>% # filter to only the power play strength states filter(game_strength_state %in% c("5v4", "4v5")) %>% # filter to only events that have time associated with them filter(event_length > 0) %>% # create a new variable that designates the PP team mutate(PP_team = ifelse(game_strength_state == "5v4", home_team, away_team)) %>% # create a new set of variables to determine who the PP players are mutate(PP_1 = ifelse(home_team == PP_team, home_on_1, away_on_1), PP_2 = ifelse(home_team == PP_team, home_on_2, away_on_2), PP_3 = ifelse(home_team == PP_team, home_on_3, away_on_3), PP_4 = ifelse(home_team == PP_team, home_on_4, away_on_4), PP_5 = ifelse(home_team == PP_team, home_on_5, away_on_5), PP_6 = ifelse(home_team == PP_team, home_on_6, away_on_6))
game_id | event_index | PP_team | PP_1 | PP_2 | PP_3 | PP_4 | PP_5 | PP_6 |
---|---|---|---|---|---|---|---|---|
2019020336 | 41 | CAR | ANDREI.SVECHNIKOV | DOUGIE.HAMILTON | PETR.MRAZEK | RYAN.DZINGEL | SEBASTIAN.AHO | TEUVO.TERAVAINEN |
2019020336 | 42 | CAR | ANDREI.SVECHNIKOV | DOUGIE.HAMILTON | PETR.MRAZEK | RYAN.DZINGEL | SEBASTIAN.AHO | TEUVO.TERAVAINEN |
2019020336 | 43 | CAR | ANDREI.SVECHNIKOV | DOUGIE.HAMILTON | PETR.MRAZEK | RYAN.DZINGEL | SEBASTIAN.AHO | TEUVO.TERAVAINEN |
2019020336 | 44 | CAR | ANDREI.SVECHNIKOV | DOUGIE.HAMILTON | PETR.MRAZEK | RYAN.DZINGEL | SEBASTIAN.AHO | TEUVO.TERAVAINEN |
2019020336 | 45 | CAR | ANDREI.SVECHNIKOV | DOUGIE.HAMILTON | PETR.MRAZEK | RYAN.DZINGEL | SEBASTIAN.AHO | TEUVO.TERAVAINEN |
2019020336 | 46 | CAR | ANDREI.SVECHNIKOV | DOUGIE.HAMILTON | PETR.MRAZEK | RYAN.DZINGEL | SEBASTIAN.AHO | TEUVO.TERAVAINEN |
2019020336 | 60 | PHI | BRIAN.ELLIOTT | CLAUDE.GIROUX | IVAN.PROVOROV | JAMES.VAN RIEMSDYK | MORGAN.FROST | TRAVIS.KONECNY |
2019020336 | 61 | PHI | BRIAN.ELLIOTT | CLAUDE.GIROUX | IVAN.PROVOROV | JAMES.VAN RIEMSDYK | MORGAN.FROST | TRAVIS.KONECNY |
2019020336 | 62 | PHI | BRIAN.ELLIOTT | CLAUDE.GIROUX | IVAN.PROVOROV | JAMES.VAN RIEMSDYK | MORGAN.FROST | TRAVIS.KONECNY |
2019020336 | 66 | PHI | BRIAN.ELLIOTT | JAKUB.VORACEK | MATT.NISKANEN | OSKAR.LINDBLOM | SEAN.COUTURIER | SHAYNE.GOSTISBEHERE |
pivot <- power %>% # pick the variables that we want select(game_id, event_index, PP_team, game_seconds, event_length, home_goalie, away_goalie, PP_1:PP_6) %>% # pivot the six player variables pivot_longer(PP_1:PP_6, names_to = "on_ice", values_to = "player") %>% # filter out the goalies filter(player != home_goalie & player != away_goalie) %>% # remove the now-unnecessary goalie variables select(-c(home_goalie, away_goalie, on_ice))
pivot <- power %>% # pick the variables that we want select(game_id, event_index, PP_team, game_seconds, event_length, home_goalie, away_goalie, PP_1:PP_6) %>% # pivot the six player variables pivot_longer(PP_1:PP_6, names_to = "on_ice", values_to = "player") %>% # filter out the goalies filter(player != home_goalie & player != away_goalie) %>% # remove the now-unnecessary goalie variables select(-c(home_goalie, away_goalie, on_ice))
game_id | event_index | PP_team | game_seconds | event_length | home_goalie | away_goalie | on_ice | player |
---|---|---|---|---|---|---|---|---|
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_1 | ANDREI.SVECHNIKOV |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_2 | DOUGIE.HAMILTON |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_3 | PETR.MRAZEK |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_4 | RYAN.DZINGEL |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_5 | SEBASTIAN.AHO |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_6 | TEUVO.TERAVAINEN |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_1 | ANDREI.SVECHNIKOV |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_2 | DOUGIE.HAMILTON |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_3 | PETR.MRAZEK |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_4 | RYAN.DZINGEL |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_5 | SEBASTIAN.AHO |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_6 | TEUVO.TERAVAINEN |
game_id | event_index | PP_team | game_seconds | event_length | home_goalie | away_goalie | on_ice | player |
---|---|---|---|---|---|---|---|---|
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_1 | ANDREI.SVECHNIKOV |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_2 | DOUGIE.HAMILTON |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_3 | PETR.MRAZEK |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_4 | RYAN.DZINGEL |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_5 | SEBASTIAN.AHO |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_6 | TEUVO.TERAVAINEN |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_1 | ANDREI.SVECHNIKOV |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_2 | DOUGIE.HAMILTON |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_3 | PETR.MRAZEK |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_4 | RYAN.DZINGEL |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_5 | SEBASTIAN.AHO |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_6 | TEUVO.TERAVAINEN |
game_id | event_index | PP_team | game_seconds | event_length | home_goalie | away_goalie | on_ice | player |
---|---|---|---|---|---|---|---|---|
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_1 | ANDREI.SVECHNIKOV |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_2 | DOUGIE.HAMILTON |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_3 | PETR.MRAZEK |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_4 | RYAN.DZINGEL |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_5 | SEBASTIAN.AHO |
2019020336 | 41 | CAR | 234 | 7 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_6 | TEUVO.TERAVAINEN |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_1 | ANDREI.SVECHNIKOV |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_2 | DOUGIE.HAMILTON |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_3 | PETR.MRAZEK |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_4 | RYAN.DZINGEL |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_5 | SEBASTIAN.AHO |
2019020336 | 42 | CAR | 241 | 24 | PETR.MRAZEK | BRIAN.ELLIOTT | PP_6 | TEUVO.TERAVAINEN |
pivot <- power %>% # pick the variables that we want select(game_id, event_index, PP_team, game_seconds, event_length, home_goalie, away_goalie, PP_1:PP_6) %>% # pivot the six player variables pivot_longer(PP_1:PP_6, names_to = "on_ice", values_to = "player") %>% # filter out the goalies filter(player != home_goalie & player != away_goalie) %>% # remove the now-unnecessary goalie variables select(-c(home_goalie, away_goalie, on_ice))
pivot <- power %>% # pick the variables that we want select(game_id, event_index, PP_team, game_seconds, event_length, home_goalie, away_goalie, PP_1:PP_6) %>% # pivot the six player variables pivot_longer(PP_1:PP_6, names_to = "on_ice", values_to = "player") %>% # filter out the goalies filter(player != home_goalie & player != away_goalie) %>% # remove the now-unnecessary goalie variables select(-c(home_goalie, away_goalie, on_ice))
game_id | event_index | PP_team | game_seconds | event_length | player |
---|---|---|---|---|---|
2019020336 | 41 | CAR | 234 | 7 | ANDREI.SVECHNIKOV |
2019020336 | 41 | CAR | 234 | 7 | DOUGIE.HAMILTON |
2019020336 | 41 | CAR | 234 | 7 | RYAN.DZINGEL |
2019020336 | 41 | CAR | 234 | 7 | SEBASTIAN.AHO |
2019020336 | 41 | CAR | 234 | 7 | TEUVO.TERAVAINEN |
2019020336 | 42 | CAR | 241 | 24 | ANDREI.SVECHNIKOV |
2019020336 | 42 | CAR | 241 | 24 | DOUGIE.HAMILTON |
2019020336 | 42 | CAR | 241 | 24 | RYAN.DZINGEL |
2019020336 | 42 | CAR | 241 | 24 | SEBASTIAN.AHO |
2019020336 | 42 | CAR | 241 | 24 | TEUVO.TERAVAINEN |
bio <- bio %>% # reformat the variable names clean_names() %>% # use two stringr functions to convert our player names to all # uppercase and replace the first space with a period mutate(player = str_to_upper(player), player = str_replace(player, " ", ".")) %>% # record our position variable into a 0/1 variable for forward mutate(forward = ifelse(position == "D", 0, 1)) %>% # select only these two variables select(player, forward)
bio <- bio %>% # reformat the variable names clean_names() %>% # use two stringr functions to convert our player names to all # uppercase and replace the first space with a period mutate(player = str_to_upper(player), player = str_replace(player, " ", ".")) %>% # record our position variable into a 0/1 variable for forward mutate(forward = ifelse(position == "D", 0, 1)) %>% # select only these two variables select(player, forward)
Player | Position | Age | Date of Birth | Birth City | Birth State/Province | Birth Country | Nationality | Height (in) | Weight (lbs) |
---|---|---|---|---|---|---|---|---|---|
Adam Gaudette | C | 23 | 10/3/96 | Braintree | MA | USA | USA | 73 | 170 |
Alan Quine | C | 27 | 2/25/93 | Belleville | ON | CAN | CAN | 72 | 203 |
Alexander Edler | D | 34 | 4/21/86 | Ostersund | SWE | SWE | 75 | 212 | |
Alexander Wennberg | C | 25 | 9/22/94 | Stockholm | SWE | SWE | 74 | 197 | |
Alexander Yelesin | D | 24 | 2/7/96 | Yaroslavl | RUS | RUS | 72 | 195 | |
Alexandre Texier | C | 20 | 9/13/99 | St. Martin D'heres | FRA | FRA | 72 | 192 | |
Andrei Svechnikov | R | 20 | 3/26/00 | Barnual | RUS | RUS | 74 | 195 | |
Andrew Mangiapane | L | 24 | 4/4/96 | Toronto | ON | CAN | CAN | 70 | 184 |
Andrew Peeke | D | 22 | 3/17/98 | Parkland | FL | USA | USA | 75 | 194 |
Andy Andreoff | C | 29 | 5/17/91 | Pickering | ON | CAN | CAN | 73 | 203 |
player | position | age | date_of_birth | birth_city | birth_state_province | birth_country | nationality | height_in |
---|---|---|---|---|---|---|---|---|
Adam Gaudette | C | 23 | 10/3/96 | Braintree | MA | USA | USA | 73 |
Alan Quine | C | 27 | 2/25/93 | Belleville | ON | CAN | CAN | 72 |
Alexander Edler | D | 34 | 4/21/86 | Ostersund | SWE | SWE | 75 | |
Alexander Wennberg | C | 25 | 9/22/94 | Stockholm | SWE | SWE | 74 | |
Alexander Yelesin | D | 24 | 2/7/96 | Yaroslavl | RUS | RUS | 72 | |
Alexandre Texier | C | 20 | 9/13/99 | St. Martin D'heres | FRA | FRA | 72 | |
Andrei Svechnikov | R | 20 | 3/26/00 | Barnual | RUS | RUS | 74 | |
Andrew Mangiapane | L | 24 | 4/4/96 | Toronto | ON | CAN | CAN | 70 |
Andrew Peeke | D | 22 | 3/17/98 | Parkland | FL | USA | USA | 75 |
Andy Andreoff | C | 29 | 5/17/91 | Pickering | ON | CAN | CAN | 73 |
bio <- bio %>% # reformat the variable names clean_names() %>% # use two stringr functions to convert our player names to all # uppercase and replace the first space with a period mutate(player = str_to_upper(player), player = str_replace(player, " ", ".")) %>% # record our position variable into a 0/1 variable for forward mutate(forward = ifelse(position == "D", 0, 1)) %>% # select only these two variables select(player, forward)
player | position | age | date_of_birth | birth_city | birth_state_province | birth_country | nationality |
---|---|---|---|---|---|---|---|
ADAM.GAUDETTE | C | 23 | 10/3/96 | Braintree | MA | USA | USA |
ALAN.QUINE | C | 27 | 2/25/93 | Belleville | ON | CAN | CAN |
ALEXANDER.EDLER | D | 34 | 4/21/86 | Ostersund | SWE | SWE | |
ALEXANDER.WENNBERG | C | 25 | 9/22/94 | Stockholm | SWE | SWE | |
ALEXANDER.YELESIN | D | 24 | 2/7/96 | Yaroslavl | RUS | RUS | |
ALEXANDRE.TEXIER | C | 20 | 9/13/99 | St. Martin D'heres | FRA | FRA | |
ANDREI.SVECHNIKOV | R | 20 | 3/26/00 | Barnual | RUS | RUS | |
ANDREW.MANGIAPANE | L | 24 | 4/4/96 | Toronto | ON | CAN | CAN |
ANDREW.PEEKE | D | 22 | 3/17/98 | Parkland | FL | USA | USA |
ANDY.ANDREOFF | C | 29 | 5/17/91 | Pickering | ON | CAN | CAN |
player | position | age | date_of_birth | birth_city | birth_state_province | birth_country | nationality |
---|---|---|---|---|---|---|---|
ADAM.GAUDETTE | C | 23 | 10/3/96 | Braintree | MA | USA | USA |
ALAN.QUINE | C | 27 | 2/25/93 | Belleville | ON | CAN | CAN |
ALEXANDER.EDLER | D | 34 | 4/21/86 | Ostersund | SWE | SWE | |
ALEXANDER.WENNBERG | C | 25 | 9/22/94 | Stockholm | SWE | SWE | |
ALEXANDER.YELESIN | D | 24 | 2/7/96 | Yaroslavl | RUS | RUS | |
ALEXANDRE.TEXIER | C | 20 | 9/13/99 | St. Martin D'heres | FRA | FRA | |
ANDREI.SVECHNIKOV | R | 20 | 3/26/00 | Barnual | RUS | RUS | |
ANDREW.MANGIAPANE | L | 24 | 4/4/96 | Toronto | ON | CAN | CAN |
ANDREW.PEEKE | D | 22 | 3/17/98 | Parkland | FL | USA | USA |
ANDY.ANDREOFF | C | 29 | 5/17/91 | Pickering | ON | CAN | CAN |
bio <- bio %>% # reformat the variable names clean_names() %>% # use two stringr functions to convert our player names to all # uppercase and replace the first space with a period mutate(player = str_to_upper(player), player = str_replace(player, " ", ".")) %>% # record our position variable into a 0/1 variable for forward mutate(forward = ifelse(position == "D", 0, 1)) %>% # select only these two variables select(player, forward)
bio <- bio %>% # reformat the variable names clean_names() %>% # use two stringr functions to convert our player names to all # uppercase and replace the first space with a period mutate(player = str_to_upper(player), player = str_replace(player, " ", ".")) %>% # record our position variable into a 0/1 variable for forward mutate(forward = ifelse(position == "D", 0, 1)) %>% # select only these two variables select(player, forward)
player | forward |
---|---|
ADAM.GAUDETTE | 1 |
ALAN.QUINE | 1 |
ALEXANDER.EDLER | 0 |
ALEXANDER.WENNBERG | 1 |
ALEXANDER.YELESIN | 0 |
ALEXANDRE.TEXIER | 1 |
ANDREI.SVECHNIKOV | 1 |
ANDREW.MANGIAPANE | 1 |
ANDREW.PEEKE | 0 |
ANDY.ANDREOFF | 1 |
pivot_position <- pivot %>% left_join(bio, by = "player")
pivot_position <- pivot %>% left_join(bio, by = "player")
game_id | event_index | PP_team | game_seconds | event_length | player | forward |
---|---|---|---|---|---|---|
2019020336 | 41 | CAR | 234 | 7 | ANDREI.SVECHNIKOV | 1 |
2019020336 | 41 | CAR | 234 | 7 | DOUGIE.HAMILTON | 0 |
2019020336 | 41 | CAR | 234 | 7 | RYAN.DZINGEL | 1 |
2019020336 | 41 | CAR | 234 | 7 | SEBASTIAN.AHO | 1 |
2019020336 | 41 | CAR | 234 | 7 | TEUVO.TERAVAINEN | 1 |
2019020336 | 42 | CAR | 241 | 24 | ANDREI.SVECHNIKOV | 1 |
2019020336 | 42 | CAR | 241 | 24 | DOUGIE.HAMILTON | 0 |
2019020336 | 42 | CAR | 241 | 24 | RYAN.DZINGEL | 1 |
2019020336 | 42 | CAR | 241 | 24 | SEBASTIAN.AHO | 1 |
2019020336 | 42 | CAR | 241 | 24 | TEUVO.TERAVAINEN | 1 |
pivot_position %>% get_dupes(game_id, event_index, player)
## # A tibble: 0 x 8## # … with 8 variables: game_id <dbl>, event_index <dbl>, player <chr>,## # dupe_count <int>, PP_team <chr>, game_seconds <dbl>, event_length <dbl>,## # forward <dbl>
fixes <- pivot_position %>% filter(is.na(forward)) %>% count(player) %>% arrange(player)
fixes <- pivot_position %>% filter(is.na(forward)) %>% count(player) %>% arrange(player)
player | n |
---|---|
ALEX.EDLER | 9 |
ALEX.WENNBERG | 4 |
CHRIS.TANEV | 2 |
bio %>% filter(str_detect(player, 'EDLER|WENNBERG|TANEV'))
bio %>% filter(str_detect(player, 'EDLER|WENNBERG|TANEV'))
player | forward |
---|---|
ALEXANDER.EDLER | 0 |
ALEXANDER.WENNBERG | 1 |
CHRISTOPHER.TANEV | 0 |
bio <- bio %>% mutate(player = case_when( player == "ALEXANDER.EDLER" ~ "ALEX.EDLER", player == "ALEXANDER.WENNBERG" ~ "ALEX.WENNBERG", player == "CHRISTOPHER.TANEV" ~ "CHRIS.TANEV", TRUE ~ player))
bio <- bio %>% mutate(player = case_when( player == "ALEXANDER.EDLER" ~ "ALEX.EDLER", player == "ALEXANDER.WENNBERG" ~ "ALEX.WENNBERG", player == "CHRISTOPHER.TANEV" ~ "CHRIS.TANEV", TRUE ~ player))
pivot_position <- pivot %>% left_join(bio, by = "player")fixes <- pivot_position %>% filter(is.na(forward)) %>% count(player) %>% arrange(player)
bio <- bio %>% mutate(player = case_when( player == "ALEXANDER.EDLER" ~ "ALEX.EDLER", player == "ALEXANDER.WENNBERG" ~ "ALEX.WENNBERG", player == "CHRISTOPHER.TANEV" ~ "CHRIS.TANEV", TRUE ~ player))
pivot_position <- pivot %>% left_join(bio, by = "player")fixes <- pivot_position %>% filter(is.na(forward)) %>% count(player) %>% arrange(player)
player | n |
---|---|
game_id | event_index | PP_team | game_seconds | event_length | player | forward |
---|---|---|---|---|---|---|
2019020336 | 41 | CAR | 234 | 7 | ANDREI.SVECHNIKOV | 1 |
2019020336 | 41 | CAR | 234 | 7 | DOUGIE.HAMILTON | 0 |
2019020336 | 41 | CAR | 234 | 7 | RYAN.DZINGEL | 1 |
2019020336 | 41 | CAR | 234 | 7 | SEBASTIAN.AHO | 1 |
2019020336 | 41 | CAR | 234 | 7 | TEUVO.TERAVAINEN | 1 |
2019020336 | 42 | CAR | 241 | 24 | ANDREI.SVECHNIKOV | 1 |
2019020336 | 42 | CAR | 241 | 24 | DOUGIE.HAMILTON | 0 |
2019020336 | 42 | CAR | 241 | 24 | RYAN.DZINGEL | 1 |
2019020336 | 42 | CAR | 241 | 24 | SEBASTIAN.AHO | 1 |
2019020336 | 42 | CAR | 241 | 24 | TEUVO.TERAVAINEN | 1 |
data <- pivot_position %>% # first create a basic count variable for each player within a PP group_by(game_id, event_index) %>% mutate(on_ice = row_number()) %>% ungroup() %>% # pivot the players and positions pivot_wider(names_from = "on_ice", values_from = c("player", "forward")) %>% # create a variable to count the number of forwards mutate(PP_fwds = select(., forward_1:forward_5) %>% rowSums(na.rm = TRUE)) %>% # remove the now-unnecessary variables select(-c(player_1:forward_5))
data <- pivot_position %>% # first create a basic count variable for each player within a PP group_by(game_id, event_index) %>% mutate(on_ice = row_number()) %>% ungroup() %>% # pivot the players and positions pivot_wider(names_from = "on_ice", values_from = c("player", "forward")) %>% # create a variable to count the number of forwards mutate(PP_fwds = select(., forward_1:forward_5) %>% rowSums(na.rm = TRUE)) %>% # remove the now-unnecessary variables select(-c(player_1:forward_5))
game_id | event_index | PP_team | game_seconds | event_length | player | forward | on_ice |
---|---|---|---|---|---|---|---|
2019020336 | 41 | CAR | 234 | 7 | ANDREI.SVECHNIKOV | 1 | 1 |
2019020336 | 41 | CAR | 234 | 7 | DOUGIE.HAMILTON | 0 | 2 |
2019020336 | 41 | CAR | 234 | 7 | RYAN.DZINGEL | 1 | 3 |
2019020336 | 41 | CAR | 234 | 7 | SEBASTIAN.AHO | 1 | 4 |
2019020336 | 41 | CAR | 234 | 7 | TEUVO.TERAVAINEN | 1 | 5 |
2019020336 | 42 | CAR | 241 | 24 | ANDREI.SVECHNIKOV | 1 | 1 |
2019020336 | 42 | CAR | 241 | 24 | DOUGIE.HAMILTON | 0 | 2 |
2019020336 | 42 | CAR | 241 | 24 | RYAN.DZINGEL | 1 | 3 |
2019020336 | 42 | CAR | 241 | 24 | SEBASTIAN.AHO | 1 | 4 |
2019020336 | 42 | CAR | 241 | 24 | TEUVO.TERAVAINEN | 1 | 5 |
data <- pivot_position %>% # first create a basic count variable for each player within a PP group_by(game_id, event_index) %>% mutate(on_ice = row_number()) %>% ungroup() %>% # pivot the players and positions pivot_wider(names_from = "on_ice", values_from = c("player", "forward")) %>% # create a variable to count the number of forwards mutate(PP_fwds = select(., forward_1:forward_5) %>% rowSums(na.rm = TRUE)) %>% # remove the now-unnecessary variables select(-c(player_1:forward_5))
game_id | event_index | PP_team | game_seconds | event_length | player_1 | forward_1 | player_2 | forward_2 |
---|---|---|---|---|---|---|---|---|
2019020336 | 41 | CAR | 234 | 7 | ANDREI.SVECHNIKOV | 1 | DOUGIE.HAMILTON | 0 |
2019020336 | 42 | CAR | 241 | 24 | ANDREI.SVECHNIKOV | 1 | DOUGIE.HAMILTON | 0 |
2019020336 | 43 | CAR | 265 | 2 | ANDREI.SVECHNIKOV | 1 | DOUGIE.HAMILTON | 0 |
2019020336 | 44 | CAR | 267 | 5 | ANDREI.SVECHNIKOV | 1 | DOUGIE.HAMILTON | 0 |
2019020336 | 45 | CAR | 272 | 4 | ANDREI.SVECHNIKOV | 1 | DOUGIE.HAMILTON | 0 |
2019020336 | 46 | CAR | 276 | 7 | ANDREI.SVECHNIKOV | 1 | DOUGIE.HAMILTON | 0 |
2019020336 | 60 | PHI | 313 | 33 | CLAUDE.GIROUX | 1 | IVAN.PROVOROV | 0 |
2019020336 | 61 | PHI | 346 | 10 | CLAUDE.GIROUX | 1 | IVAN.PROVOROV | 0 |
2019020336 | 62 | PHI | 356 | 1 | CLAUDE.GIROUX | 1 | IVAN.PROVOROV | 0 |
2019020336 | 66 | PHI | 357 | 22 | JAKUB.VORACEK | 1 | MATT.NISKANEN | 0 |
data <- pivot_position %>% # first create a basic count variable for each player within a PP group_by(game_id, event_index) %>% mutate(on_ice = row_number()) %>% ungroup() %>% # pivot the players and positions pivot_wider(names_from = "on_ice", values_from = c("player", "forward")) %>% # create a variable to count the number of forwards mutate(PP_fwds = select(., forward_1:forward_5) %>% rowSums(na.rm = TRUE)) %>% # remove the now-unnecessary variables select(-c(player_1:forward_5))
data <- pivot_position %>% # first create a basic count variable for each player within a PP group_by(game_id, event_index) %>% mutate(on_ice = row_number()) %>% ungroup() %>% # pivot the players and positions pivot_wider(names_from = "on_ice", values_from = c("player", "forward")) %>% # create a variable to count the number of forwards mutate(PP_fwds = select(., forward_1:forward_5) %>% rowSums(na.rm = TRUE)) %>% # remove the now-unnecessary variables select(-c(player_1:forward_5))
game_id | event_index | PP_team | game_seconds | event_length | PP_fwds |
---|---|---|---|---|---|
2019020336 | 41 | CAR | 234 | 7 | 4 |
2019020336 | 42 | CAR | 241 | 24 | 4 |
2019020336 | 43 | CAR | 265 | 2 | 4 |
2019020336 | 44 | CAR | 267 | 5 | 4 |
2019020336 | 45 | CAR | 272 | 4 | 4 |
2019020336 | 46 | CAR | 276 | 7 | 4 |
2019020336 | 60 | PHI | 313 | 33 | 4 |
2019020336 | 61 | PHI | 346 | 10 | 4 |
2019020336 | 62 | PHI | 356 | 1 | 4 |
2019020336 | 66 | PHI | 357 | 22 | 3 |
✅ Player position data is in a different place
❌ Zone exit data is in a different place
❌ One row per event instead of one row per second 🤔
game_id | game_period | time | event_type |
---|---|---|---|
2019020336 | 1 | 16:02:00 | EXIT |
2019020336 | 1 | 15:27:00 | EXIT |
2019020336 | 1 | 14:42:00 | EXIT |
2019020336 | 1 | 13:57:00 | EXIT |
2019020336 | 1 | 12:36:00 | EXIT |
2019020336 | 1 | 12:12:00 | EXIT |
2019020336 | 2 | 18:36:00 | EXIT |
2019020336 | 2 | 18:10:00 | EXIT |
2019020336 | 2 | 17:20:00 | EXIT |
2019020336 | 2 | 14:40:00 | EXIT |
tracking <- tracking %>% # create a new variable that converts to m:s mutate(time_new = str_sub(time, 1, 5), time_new = ms(time_new), # create a new variable for total seconds seconds = minute(time_new)*60 + second(time_new), # create a game_seconds variable to match what's in pbp data game_seconds = case_when( game_period == 1 ~ 1200 - seconds, game_period == 2 ~ 2400 - seconds, game_period == 3 ~ 3600 - seconds)) %>% # remove now-unnecessary variables select(-c(time, time_new, seconds, game_period))
tracking <- tracking %>% # create a new variable that converts to m:s mutate(time_new = str_sub(time, 1, 5), time_new = ms(time_new), # create a new variable for total seconds seconds = minute(time_new)*60 + second(time_new), # create a game_seconds variable to match what's in pbp data game_seconds = case_when( game_period == 1 ~ 1200 - seconds, game_period == 2 ~ 2400 - seconds, game_period == 3 ~ 3600 - seconds)) %>% # remove now-unnecessary variables select(-c(time, time_new, seconds, game_period))
game_id | game_period | time | event_type | time_new |
---|---|---|---|---|
2019020336 | 1 | 16:02:00 | EXIT | 16M 2S |
2019020336 | 1 | 15:27:00 | EXIT | 15M 27S |
2019020336 | 1 | 14:42:00 | EXIT | 14M 42S |
2019020336 | 1 | 13:57:00 | EXIT | 13M 57S |
2019020336 | 1 | 12:36:00 | EXIT | 12M 36S |
2019020336 | 1 | 12:12:00 | EXIT | 12M 12S |
2019020336 | 2 | 18:36:00 | EXIT | 18M 36S |
2019020336 | 2 | 18:10:00 | EXIT | 18M 10S |
2019020336 | 2 | 17:20:00 | EXIT | 17M 20S |
2019020336 | 2 | 14:40:00 | EXIT | 14M 40S |
tracking <- tracking %>% # create a new variable that converts to m:s mutate(time_new = str_sub(time, 1, 5), time_new = ms(time_new), # create a new variable for total seconds seconds = minute(time_new)*60 + second(time_new), # create a game_seconds variable to match what's in pbp data game_seconds = case_when( game_period == 1 ~ 1200 - seconds, game_period == 2 ~ 2400 - seconds, game_period == 3 ~ 3600 - seconds)) %>% # remove now-unnecessary variables select(-c(time, time_new, seconds, game_period))
game_id | game_period | time | event_type | time_new | seconds |
---|---|---|---|---|---|
2019020336 | 1 | 16:02:00 | EXIT | 16M 2S | 962 |
2019020336 | 1 | 15:27:00 | EXIT | 15M 27S | 927 |
2019020336 | 1 | 14:42:00 | EXIT | 14M 42S | 882 |
2019020336 | 1 | 13:57:00 | EXIT | 13M 57S | 837 |
2019020336 | 1 | 12:36:00 | EXIT | 12M 36S | 756 |
2019020336 | 1 | 12:12:00 | EXIT | 12M 12S | 732 |
2019020336 | 2 | 18:36:00 | EXIT | 18M 36S | 1116 |
2019020336 | 2 | 18:10:00 | EXIT | 18M 10S | 1090 |
2019020336 | 2 | 17:20:00 | EXIT | 17M 20S | 1040 |
2019020336 | 2 | 14:40:00 | EXIT | 14M 40S | 880 |
tracking <- tracking %>% # create a new variable that converts to m:s mutate(time_new = str_sub(time, 1, 5), time_new = ms(time_new), # create a new variable for total seconds seconds = minute(time_new)*60 + second(time_new), # create a game_seconds variable to match what's in pbp data game_seconds = case_when( game_period == 1 ~ 1200 - seconds, game_period == 2 ~ 2400 - seconds, game_period == 3 ~ 3600 - seconds)) %>% # remove now-unnecessary variables select(-c(time, time_new, seconds, game_period))
game_id | event_type | game_seconds |
---|---|---|
2019020336 | EXIT | 238 |
2019020336 | EXIT | 273 |
2019020336 | EXIT | 318 |
2019020336 | EXIT | 363 |
2019020336 | EXIT | 444 |
2019020336 | EXIT | 468 |
2019020336 | EXIT | 1284 |
2019020336 | EXIT | 1310 |
2019020336 | EXIT | 1360 |
2019020336 | EXIT | 1520 |
We want to expand our data: one row per event ➡️ one row per second
We want to expand our data: one row per event ➡️ one row per second
So we can more easily:
1️⃣ Examine the average composition of a power play per second
2️⃣ Join in our manually-tracked data
We want to expand our data: one row per event ➡️ one row per second
So we can more easily:
1️⃣ Examine the average composition of a power play per second
2️⃣ Join in our manually-tracked data
Thankfully, we have the padr
package.
data <- data %>% # create an 0/1 variable to identify a new power play # and a power_play_count variable mutate(new_power_play = ifelse(lag(game_seconds + event_length) == game_seconds & lag(PP_team) == PP_team, 0, 1), new_power_play = ifelse(is.na(new_power_play), 1, new_power_play), power_play_count = cumsum(new_power_play)) %>% select(power_play_count, everything())
data <- data %>% # create an 0/1 variable to identify a new power play # and a power_play_count variable mutate(new_power_play = ifelse(lag(game_seconds + event_length) == game_seconds & lag(PP_team) == PP_team, 0, 1), new_power_play = ifelse(is.na(new_power_play), 1, new_power_play), power_play_count = cumsum(new_power_play)) %>% select(power_play_count, everything())
game_id | event_index | PP_team | game_seconds | event_length | PP_fwds | new_power_play |
---|---|---|---|---|---|---|
2019020336 | 41 | CAR | 234 | 7 | 4 | 1 |
2019020336 | 42 | CAR | 241 | 24 | 4 | 0 |
2019020336 | 43 | CAR | 265 | 2 | 4 | 0 |
2019020336 | 44 | CAR | 267 | 5 | 4 | 0 |
2019020336 | 45 | CAR | 272 | 4 | 4 | 0 |
2019020336 | 46 | CAR | 276 | 7 | 4 | 0 |
2019020336 | 60 | PHI | 313 | 33 | 4 | 1 |
2019020336 | 61 | PHI | 346 | 10 | 4 | 0 |
2019020336 | 62 | PHI | 356 | 1 | 4 | 0 |
2019020336 | 66 | PHI | 357 | 22 | 3 | 0 |
data <- data %>% # create an 0/1 variable to identify a new power play # and a power_play_count variable mutate(new_power_play = ifelse(lag(game_seconds + event_length) == game_seconds & lag(PP_team) == PP_team, 0, 1), new_power_play = ifelse(is.na(new_power_play), 1, new_power_play), power_play_count = cumsum(new_power_play)) %>% select(power_play_count, everything())
game_id | event_index | PP_team | game_seconds | event_length | PP_fwds | new_power_play | power_play_count |
---|---|---|---|---|---|---|---|
2019020336 | 41 | CAR | 234 | 7 | 4 | 1 | 1 |
2019020336 | 42 | CAR | 241 | 24 | 4 | 0 | 1 |
2019020336 | 43 | CAR | 265 | 2 | 4 | 0 | 1 |
2019020336 | 44 | CAR | 267 | 5 | 4 | 0 | 1 |
2019020336 | 45 | CAR | 272 | 4 | 4 | 0 | 1 |
2019020336 | 46 | CAR | 276 | 7 | 4 | 0 | 1 |
2019020336 | 60 | PHI | 313 | 33 | 4 | 1 | 2 |
2019020336 | 61 | PHI | 346 | 10 | 4 | 0 | 2 |
2019020336 | 62 | PHI | 356 | 1 | 4 | 0 | 2 |
2019020336 | 66 | PHI | 357 | 22 | 3 | 0 | 2 |
data <- data %>% # create an 0/1 variable to identify a new power play # and a power_play_count variable mutate(new_power_play = ifelse(lag(game_seconds + event_length) == game_seconds & lag(PP_team) == PP_team, 0, 1), new_power_play = ifelse(is.na(new_power_play), 1, new_power_play), power_play_count = cumsum(new_power_play)) %>% select(power_play_count, everything())
power_play_count | game_id | event_index | PP_team | game_seconds | event_length | PP_fwds | new_power_play |
---|---|---|---|---|---|---|---|
1 | 2019020336 | 41 | CAR | 234 | 7 | 4 | 1 |
1 | 2019020336 | 42 | CAR | 241 | 24 | 4 | 0 |
1 | 2019020336 | 43 | CAR | 265 | 2 | 4 | 0 |
1 | 2019020336 | 44 | CAR | 267 | 5 | 4 | 0 |
1 | 2019020336 | 45 | CAR | 272 | 4 | 4 | 0 |
1 | 2019020336 | 46 | CAR | 276 | 7 | 4 | 0 |
2 | 2019020336 | 60 | PHI | 313 | 33 | 4 | 1 |
2 | 2019020336 | 61 | PHI | 346 | 10 | 4 | 0 |
2 | 2019020336 | 62 | PHI | 356 | 1 | 4 | 0 |
2 | 2019020336 | 66 | PHI | 357 | 22 | 3 | 0 |
sample <- data %>% filter(power_play_count == 1) %>% pad_int('game_seconds', start_val = 234, end_val = 283) %>% fill(power_play_count, game_id, PP_team, PP_fwds, .direction = "down")
sample <- data %>% filter(power_play_count == 1) %>% pad_int('game_seconds', start_val = 234, end_val = 283) %>% fill(power_play_count, game_id, PP_team, PP_fwds, .direction = "down")
power_play_count | game_id | event_index | PP_team | game_seconds | event_length | PP_fwds | new_power_play |
---|---|---|---|---|---|---|---|
1 | 2019020336 | 41 | CAR | 234 | 7 | 4 | 1 |
1 | 2019020336 | 42 | CAR | 241 | 24 | 4 | 0 |
1 | 2019020336 | 43 | CAR | 265 | 2 | 4 | 0 |
1 | 2019020336 | 44 | CAR | 267 | 5 | 4 | 0 |
1 | 2019020336 | 45 | CAR | 272 | 4 | 4 | 0 |
1 | 2019020336 | 46 | CAR | 276 | 7 | 4 | 0 |
power_play_count | game_id | event_index | PP_team | game_seconds | event_length | PP_fwds | new_power_play |
---|---|---|---|---|---|---|---|
1 | 2019020336 | 41 | CAR | 234 | 7 | 4 | 1 |
1 | 2019020336 | 42 | CAR | 241 | 24 | 4 | 0 |
1 | 2019020336 | 43 | CAR | 265 | 2 | 4 | 0 |
1 | 2019020336 | 44 | CAR | 267 | 5 | 4 | 0 |
1 | 2019020336 | 45 | CAR | 272 | 4 | 4 | 0 |
1 | 2019020336 | 46 | CAR | 276 | 7 | 4 | 0 |
power_play_count | game_id | event_index | PP_team | game_seconds | event_length | PP_fwds | new_power_play |
---|---|---|---|---|---|---|---|
1 | 2019020336 | 41 | CAR | 234 | 7 | 4 | 1 |
1 | 2019020336 | 42 | CAR | 241 | 24 | 4 | 0 |
1 | 2019020336 | 43 | CAR | 265 | 2 | 4 | 0 |
1 | 2019020336 | 44 | CAR | 267 | 5 | 4 | 0 |
1 | 2019020336 | 45 | CAR | 272 | 4 | 4 | 0 |
1 | 2019020336 | 46 | CAR | 276 | 7 | 4 | 0 |
sample <- data %>% filter(power_play_count == 1) %>% pad_int('game_seconds', start_val = 234, end_val = 283) %>% fill(power_play_count, game_id, PP_team, PP_fwds, .direction = "down")
game_seconds | power_play_count | game_id | event_index | PP_team | event_length | PP_fwds | new_power_play |
---|---|---|---|---|---|---|---|
234 | 1 | 2019020336 | 41 | CAR | 7 | 4 | 1 |
235 | |||||||
236 | |||||||
237 | |||||||
238 | |||||||
239 | |||||||
240 | |||||||
241 | 1 | 2019020336 | 42 | CAR | 24 | 4 | 0 |
242 | |||||||
243 |
game_seconds | power_play_count | game_id | event_index | PP_team | event_length | PP_fwds | new_power_play |
---|---|---|---|---|---|---|---|
234 | 1 | 2019020336 | 41 | CAR | 7 | 4 | 1 |
235 | |||||||
236 | |||||||
237 | |||||||
238 | |||||||
239 | |||||||
240 | |||||||
241 | 1 | 2019020336 | 42 | CAR | 24 | 4 | 0 |
242 | |||||||
243 |
sample <- data %>% filter(power_play_count == 1) %>% pad_int('game_seconds', start_val = 234, end_val = 283) %>% fill(power_play_count, game_id, PP_team, PP_fwds, .direction = "down")
game_seconds | power_play_count | game_id | event_index | PP_team | event_length | PP_fwds | new_power_play |
---|---|---|---|---|---|---|---|
234 | 1 | 2019020336 | 41 | CAR | 7 | 4 | 1 |
235 | 1 | 2019020336 | CAR | 4 | |||
236 | 1 | 2019020336 | CAR | 4 | |||
237 | 1 | 2019020336 | CAR | 4 | |||
238 | 1 | 2019020336 | CAR | 4 | |||
239 | 1 | 2019020336 | CAR | 4 | |||
240 | 1 | 2019020336 | CAR | 4 | |||
241 | 1 | 2019020336 | 42 | CAR | 24 | 4 | 0 |
242 | 1 | 2019020336 | CAR | 4 | |||
243 | 1 | 2019020336 | CAR | 4 |
game_seconds | power_play_count | game_id | event_index | PP_team | event_length | PP_fwds | new_power_play |
---|---|---|---|---|---|---|---|
234 | 1 | 2019020336 | 41 | CAR | 7 | 4 | 1 |
235 | 1 | 2019020336 | CAR | 4 | |||
236 | 1 | 2019020336 | CAR | 4 | |||
237 | 1 | 2019020336 | CAR | 4 | |||
238 | 1 | 2019020336 | CAR | 4 | |||
239 | 1 | 2019020336 | CAR | 4 | |||
240 | 1 | 2019020336 | CAR | 4 | |||
241 | 1 | 2019020336 | 42 | CAR | 24 | 4 | 0 |
242 | 1 | 2019020336 | CAR | 4 | |||
243 | 1 | 2019020336 | CAR | 4 |
😬 (Don't be scared like me.)
😬 (Don't be scared like me.)
📝 You've copied and pasted code more than twice.
😬 (Don't be scared like me.)
📝 You've copied and pasted code more than twice.
🔁 You need to run multiple lines of code on different segments of data.
😬 (Don't be scared like me.)
📝 You've copied and pasted code more than twice.
🔁 You need to run multiple lines of code on different segments of data.
# basic syntax of a functionname_of_function <- function(needed_value) { # all the code goes here}
padding_function <- function(count) { each_power_play <- data %>% filter(power_play_count == count) minimum <- min(each_power_play$game_seconds) maximum <- max(each_power_play$game_seconds + each_power_play$event_length - 1) padded <- each_power_play %>% pad_int('game_seconds', start_val = minimum, end_val = maximum) %>% fill(power_play_count, game_id, PP_team, PP_fwds, .direction = "down")}
padding_function <- function(count) { each_power_play <- data %>% filter(power_play_count == count) minimum <- min(each_power_play$game_seconds) maximum <- max(each_power_play$game_seconds + each_power_play$event_length - 1) padded <- each_power_play %>% pad_int('game_seconds', start_val = minimum, end_val = maximum) %>% fill(power_play_count, game_id, PP_team, PP_fwds, .direction = "down")}
padding_function <- function(count) { each_power_play <- data %>% filter(power_play_count == count) minimum <- min(each_power_play$game_seconds) maximum <- max(each_power_play$game_seconds + each_power_play$event_length - 1) padded <- each_power_play %>% pad_int('game_seconds', start_val = minimum, end_val = maximum) %>% fill(power_play_count, game_id, PP_team, PP_fwds, .direction = "down")}
padding_function <- function(count) { each_power_play <- data %>% filter(power_play_count == count) minimum <- min(each_power_play$game_seconds) maximum <- max(each_power_play$game_seconds + each_power_play$event_length - 1) padded <- each_power_play %>% pad_int('game_seconds', start_val = minimum, end_val = maximum) %>% fill(power_play_count, game_id, PP_team, PP_fwds, .direction = "down")}
padding_function <- function(count) { each_power_play <- data %>% filter(power_play_count == count) minimum <- min(each_power_play$game_seconds) maximum <- max(each_power_play$game_seconds + each_power_play$event_length - 1) padded <- each_power_play %>% pad_int('game_seconds', start_val = minimum, end_val = maximum) %>% fill(power_play_count, game_id, PP_team, PP_fwds, .direction = "down")}
A common scenario:
A common scenario:
✂️ Split the data into pieces
A common scenario:
✂️ Split the data into pieces
🔁 Apply a function to each piece
A common scenario:
✂️ Split the data into pieces
🔁 Apply a function to each piece
➕ Put it back together
A common scenario:
✂️ Split the data into pieces
🔁 Apply a function to each piece
➕ Put it back together
We can do this with the map()
family of functions in the purrr
package.
data_padded <- map_df(unique(data$power_play_count), padding_function)
data_padded <- map_df(unique(data$power_play_count), padding_function)
map_df
has two arguments: the data and the function
data_padded <- map_df(unique(data$power_play_count), padding_function)
map_df
has two arguments: the data and the function
unique(data$power_play_count)
We're telling it to perform the function for each unique value of power_play_count
in the data
data frame
data_padded <- map_df(unique(data$power_play_count), padding_function)
map_df
has two arguments: the data and the function
unique(data$power_play_count)
We're telling it to perform the function for each unique value of power_play_count
in the data
data frame
It will apply our padding_function
and return a data frame with the data padded
game_seconds | power_play_count | game_id | event_index | PP_team | event_length | PP_fwds | new_power_play |
---|---|---|---|---|---|---|---|
234 | 1 | 2019020336 | 41 | CAR | 7 | 4 | 1 |
235 | 1 | 2019020336 | CAR | 4 | |||
236 | 1 | 2019020336 | CAR | 4 | |||
237 | 1 | 2019020336 | CAR | 4 | |||
238 | 1 | 2019020336 | CAR | 4 | |||
239 | 1 | 2019020336 | CAR | 4 | |||
240 | 1 | 2019020336 | CAR | 4 | |||
241 | 1 | 2019020336 | 42 | CAR | 24 | 4 | 0 |
242 | 1 | 2019020336 | CAR | 4 | |||
243 | 1 | 2019020336 | CAR | 4 |
game_seconds | power_play_count | game_id | event_index | PP_team | event_length | PP_fwds | new_power_play |
---|---|---|---|---|---|---|---|
2266 | 27 | 2019020384 | CBJ | 3 | |||
2267 | 27 | 2019020384 | CBJ | 3 | |||
2268 | 27 | 2019020384 | CBJ | 3 | |||
2269 | 27 | 2019020384 | CBJ | 3 | |||
2270 | 27 | 2019020384 | CBJ | 3 | |||
2271 | 27 | 2019020384 | 381 | CBJ | 5 | 3 | 0 |
2272 | 27 | 2019020384 | CBJ | 3 | |||
2273 | 27 | 2019020384 | CBJ | 3 | |||
2274 | 27 | 2019020384 | CBJ | 3 | |||
2275 | 27 | 2019020384 | CBJ | 3 |
data_final <- data_padded %>% group_by(power_play_count) %>% # add in a new variable for the second of # each power play, counting up from 1 mutate(power_play_second = row_number()) %>% ungroup()
data_final <- data_padded %>% group_by(power_play_count) %>% # add in a new variable for the second of # each power play, counting up from 1 mutate(power_play_second = row_number()) %>% ungroup()
game_seconds | power_play_count | game_id | event_index | PP_team | event_length | PP_fwds | new_power_play | power_play_second |
---|---|---|---|---|---|---|---|---|
234 | 1 | 2019020336 | 41 | CAR | 7 | 4 | 1 | 1 |
235 | 1 | 2019020336 | CAR | 4 | 2 | |||
236 | 1 | 2019020336 | CAR | 4 | 3 | |||
237 | 1 | 2019020336 | CAR | 4 | 4 | |||
238 | 1 | 2019020336 | CAR | 4 | 5 | |||
239 | 1 | 2019020336 | CAR | 4 | 6 | |||
240 | 1 | 2019020336 | CAR | 4 | 7 | |||
241 | 1 | 2019020336 | 42 | CAR | 24 | 4 | 0 | 8 |
242 | 1 | 2019020336 | CAR | 4 | 9 | |||
243 | 1 | 2019020336 | CAR | 4 | 10 |
game_seconds | power_play_count | game_id | event_index | PP_team | event_length | PP_fwds | new_power_play | power_play_second |
---|---|---|---|---|---|---|---|---|
234 | 1 | 2019020336 | 41 | CAR | 7 | 4 | 1 | 1 |
235 | 1 | 2019020336 | CAR | 4 | 2 | |||
236 | 1 | 2019020336 | CAR | 4 | 3 | |||
237 | 1 | 2019020336 | CAR | 4 | 4 | |||
238 | 1 | 2019020336 | CAR | 4 | 5 | |||
239 | 1 | 2019020336 | CAR | 4 | 6 | |||
240 | 1 | 2019020336 | CAR | 4 | 7 | |||
241 | 1 | 2019020336 | 42 | CAR | 24 | 4 | 0 | 8 |
242 | 1 | 2019020336 | CAR | 4 | 9 | |||
243 | 1 | 2019020336 | CAR | 4 | 10 |
max(data_final$power_play_second)
## [1] 120
✅ Player position data is in a different place
❌ Zone exit data is in a different place
✅ One row per event instead of one row per second 🤔
data_final <- data_final %>% left_join(tracking, by = c("game_id", "game_seconds"))
game_seconds | power_play_count | game_id | event_index | PP_team | event_length | PP_fwds | new_power_play | power_play_second | event_type |
---|---|---|---|---|---|---|---|---|---|
234 | 1 | 2019020336 | 41 | CAR | 7 | 4 | 1 | 1 | |
235 | 1 | 2019020336 | CAR | 4 | 2 | ||||
236 | 1 | 2019020336 | CAR | 4 | 3 | ||||
237 | 1 | 2019020336 | CAR | 4 | 4 | ||||
238 | 1 | 2019020336 | CAR | 4 | 5 | EXIT | |||
239 | 1 | 2019020336 | CAR | 4 | 6 | ||||
240 | 1 | 2019020336 | CAR | 4 | 7 | ||||
241 | 1 | 2019020336 | 42 | CAR | 24 | 4 | 0 | 8 | |
242 | 1 | 2019020336 | CAR | 4 | 9 | ||||
243 | 1 | 2019020336 | CAR | 4 | 10 |
game_seconds | power_play_count | game_id | event_index | PP_team | event_length | PP_fwds | new_power_play | power_play_second | event_type |
---|---|---|---|---|---|---|---|---|---|
234 | 1 | 2019020336 | 41 | CAR | 7 | 4 | 1 | 1 | |
235 | 1 | 2019020336 | CAR | 4 | 2 | ||||
236 | 1 | 2019020336 | CAR | 4 | 3 | ||||
237 | 1 | 2019020336 | CAR | 4 | 4 | ||||
238 | 1 | 2019020336 | CAR | 4 | 5 | EXIT | |||
239 | 1 | 2019020336 | CAR | 4 | 6 | ||||
240 | 1 | 2019020336 | CAR | 4 | 7 | ||||
241 | 1 | 2019020336 | 42 | CAR | 24 | 4 | 0 | 8 | |
242 | 1 | 2019020336 | CAR | 4 | 9 | ||||
243 | 1 | 2019020336 | CAR | 4 | 10 |
data_final %>% get_dupes(game_id, game_seconds)
## # A tibble: 0 x 11## # … with 11 variables: game_id <dbl>, game_seconds <dbl>, dupe_count <int>,## # power_play_count <dbl>, event_index <dbl>, PP_team <chr>,## # event_length <dbl>, PP_fwds <dbl>, new_power_play <dbl>,## # power_play_second <int>, event_type <chr>
✅ Player position data is in a different place
✅ Zone exit data is in a different place
✅ One row per event instead of one row per second 🤔
If you can't handle the data cleaning, you don't deserve the data analysis. -- Me
summary <- data_final %>% group_by(PP_fwds) %>% summarize(time_on_ice = sum(event_length, na.rm = TRUE) / 60, exits = sum(!is.na(event_type))) %>% mutate(exit_rate = exits * 60 / time_on_ice)
summary <- data_final %>% group_by(PP_fwds) %>% summarize(time_on_ice = sum(event_length, na.rm = TRUE) / 60, exits = sum(!is.na(event_type))) %>% mutate(exit_rate = exits * 60 / time_on_ice)
summary <- data_final %>% group_by(PP_fwds) %>% summarize(time_on_ice = sum(event_length, na.rm = TRUE) / 60, exits = sum(!is.na(event_type))) %>% mutate(exit_rate = exits * 60 / time_on_ice)
summary <- data_final %>% group_by(PP_fwds) %>% summarize(time_on_ice = sum(event_length, na.rm = TRUE) / 60, exits = sum(!is.na(event_type))) %>% mutate(exit_rate = exits * 60 / time_on_ice)
summary <- data_final %>% group_by(PP_fwds) %>% summarize(time_on_ice = sum(event_length, na.rm = TRUE) / 60, exits = sum(!is.na(event_type))) %>% mutate(exit_rate = exits * 60 / time_on_ice)
summary <- data_final %>% group_by(PP_fwds) %>% summarize(time_on_ice = sum(event_length, na.rm = TRUE) / 60, exits = sum(!is.na(event_type))) %>% mutate(exit_rate = exits * 60 / time_on_ice)
PP_fwds | time_on_ice | exits | exit_rate |
---|---|---|---|
3 | 11.25 | 17 | 90.66667 |
4 | 30.50 | 66 | 129.83607 |
summary %>% ggplot(aes(x = as.character(PP_fwds), y = exit_rate)) + geom_bar(stat="identity")
summary %>% ggplot(aes(x = as.character(PP_fwds), y = exit_rate)) + geom_bar(stat="identity") + labs(title = "Rate of zone exits on the power play", subtitle = "5v4 only, 4 NHL games November 2019", x = "Number of forwards on the power play", y = "Zone exits per 60 minutes") + geom_text(aes(label = round(exit_rate, 2)), vjust = 1.6, color = "white", size = 4)
summary %>% ggplot(aes(x = as.character(PP_fwds), y = exit_rate, fill = as.character(PP_fwds))) + geom_bar(stat="identity") + labs(title = "Rate of zone exits on the power play", subtitle = "5v4 only, 4 NHL games November 2019", x = "Number of forwards on the power play", y = "Zone exits per 60 minutes") + geom_text(aes(label = round(exit_rate, 2)), vjust = 1.6, color = "white", size = 4) + scale_fill_manual(values = c("#808080", "#5C164E")) + theme(legend.position = "none")
summary %>% ggplot(aes(x = as.character(PP_fwds), y = exit_rate, fill = as.character(PP_fwds))) + geom_bar(stat="identity") + labs(title = "Rate of zone exits on the power play", subtitle = "5v4 only, 4 NHL games November 2019", x = "Number of forwards on the power play", y = "Zone exits per 60 minutes") + geom_text(aes(label = round(exit_rate, 2)), vjust = 1.6, color = "white", size = 4) + scale_fill_manual(values = c("#808080", "#912919")) + CMSAC_theme() + theme(legend.position = "none")
🕥 Saves time!
🕥 Saves time!
⭐ Helps establish your #brand.
🕥 Saves time!
⭐ Helps establish your #brand.
❤️ Looks way prettier than a standard ggplot2
graph.
📝 Use Google fonts
library(showtext)font_add_google("Open Sans", "opensans")
📝 Use Google fonts
library(showtext)font_add_google("Open Sans", "opensans")
📓 Design your theme
CMSAC_theme <- function () { theme_linedraw(base_size=11, base_family="opensans") %+replace% theme( axis.ticks = element_blank(), panel.grid.minor = element_blank(), panel.grid.major = element_blank(), plot.title = element_text(size = 15, hjust = 0, vjust = 0.5, margin = margin(b = 0.2, unit = "cm")), plot.subtitle = element_text(size = 9, face = "italic", hjust = 0, vjust = 0.5, margin = margin(b = 0.2, unit = "cm")), axis.title = element_text(face = "bold")) }
📝 Use Google fonts
library(showtext)font_add_google("Open Sans", "opensans")
📓 Design your theme
CMSAC_theme <- function () { theme_linedraw(base_size=11, base_family="opensans") %+replace% theme( axis.ticks = element_blank(), panel.grid.minor = element_blank(), panel.grid.major = element_blank(), plot.title = element_text(size = 15, hjust = 0, vjust = 0.5, margin = margin(b = 0.2, unit = "cm")), plot.subtitle = element_text(size = 9, face = "italic", hjust = 0, vjust = 0.5, margin = margin(b = 0.2, unit = "cm")), axis.title = element_text(face = "bold")) }
📝 Use Google fonts
library(showtext)font_add_google("Open Sans", "opensans")
📓 Design your theme
CMSAC_theme <- function () { theme_linedraw(base_size=11, base_family="opensans") %+replace% theme( axis.ticks = element_blank(), panel.grid.minor = element_blank(), panel.grid.major = element_blank(), plot.title = element_text(size = 15, hjust = 0, vjust = 0.5, margin = margin(b = 0.2, unit = "cm")), plot.subtitle = element_text(size = 9, face = "italic", hjust = 0, vjust = 0.5, margin = margin(b = 0.2, unit = "cm")), axis.title = element_text(face = "bold")) }
📝 Use Google fonts
library(showtext)font_add_google("Open Sans", "opensans")
📓 Design your theme
CMSAC_theme <- function () { theme_linedraw(base_size=11, base_family="opensans") %+replace% theme( axis.ticks = element_blank(), panel.grid.minor = element_blank(), panel.grid.major = element_blank(), plot.title = element_text(size = 15, hjust = 0, vjust = 0.5, margin = margin(b = 0.2, unit = "cm")), plot.subtitle = element_text(size = 9, face = "italic", hjust = 0, vjust = 0.5, margin = margin(b = 0.2, unit = "cm")), axis.title = element_text(face = "bold")) }
1️⃣ Start with an existing theme: theme_linedraw()
, theme_classic()
, theme_light()
, theme_minimal()
, etc.
1️⃣ Start with an existing theme: theme_linedraw()
, theme_classic()
, theme_light()
, theme_minimal()
, etc.
2️⃣ Use the ggplot2 theme reference page to find your arguments.
1️⃣ Start with an existing theme: theme_linedraw()
, theme_classic()
, theme_light()
, theme_minimal()
, etc.
2️⃣ Use the ggplot2 theme reference page to find your arguments.
3️⃣ Investigate the options within those arguments.
theme_minimal <- theme_minimal()theme_minimal$plot.subtitle
## List of 11## $ family : NULL## $ face : NULL## $ colour : NULL## $ size : NULL## $ hjust : num 0## $ vjust : num 1## $ angle : NULL## $ lineheight : NULL## $ margin : 'margin' num [1:4] 0pt 0pt 5.5pt 0pt## ..- attr(*, "valid.unit")= int 8## ..- attr(*, "unit")= chr "pt"## $ debug : NULL## $ inherit.blank: logi TRUE## - attr(*, "class")= chr [1:2] "element_text" "element"
time <- data_final %>% group_by(power_play_second, PP_fwds) %>% summarize(n = n()) %>% add_tally(n, name = "total") %>% mutate(percent = n / total)
time <- data_final %>% group_by(power_play_second, PP_fwds) %>% summarize(n = n()) %>% add_tally(n, name = "total") %>% mutate(percent = n / total)
power_play_second | PP_fwds | n |
---|---|---|
1 | 3 | 9 |
1 | 4 | 18 |
2 | 3 | 9 |
2 | 4 | 18 |
3 | 3 | 9 |
3 | 4 | 18 |
4 | 3 | 9 |
4 | 4 | 18 |
5 | 3 | 8 |
5 | 4 | 18 |
time <- data_final %>% group_by(power_play_second, PP_fwds) %>% summarize(n = n()) %>% add_tally(n, name = "total") %>% mutate(percent = n / total)
power_play_second | PP_fwds | n | total |
---|---|---|---|
1 | 3 | 9 | 27 |
1 | 4 | 18 | 27 |
2 | 3 | 9 | 27 |
2 | 4 | 18 | 27 |
3 | 3 | 9 | 27 |
3 | 4 | 18 | 27 |
4 | 3 | 9 | 27 |
4 | 4 | 18 | 27 |
5 | 3 | 8 | 26 |
5 | 4 | 18 | 26 |
time <- data_final %>% group_by(power_play_second, PP_fwds) %>% summarize(n = n()) %>% add_tally(n, name = "total") %>% mutate(percent = n / total)
power_play_second | PP_fwds | n | total | percent |
---|---|---|---|---|
1 | 3 | 9 | 27 | 0.3333333 |
1 | 4 | 18 | 27 | 0.6666667 |
2 | 3 | 9 | 27 | 0.3333333 |
2 | 4 | 18 | 27 | 0.6666667 |
3 | 3 | 9 | 27 | 0.3333333 |
3 | 4 | 18 | 27 | 0.6666667 |
4 | 3 | 9 | 27 | 0.3333333 |
4 | 4 | 18 | 27 | 0.6666667 |
5 | 3 | 8 | 26 | 0.3076923 |
5 | 4 | 18 | 26 | 0.6923077 |
time %>% mutate(PP_fwds = ifelse(PP_fwds == 3, "3 forwards", "4 forwards")) %>% ggplot(aes(x = power_play_second, y = percent, fill = PP_fwds)) + geom_area() + labs(title = "Composition of the power play", subtitle = "5v4 only, 4 NHL games November 2019", x = "Second of the power play") + scale_fill_manual(values = c("#808080", "#912919")) + scale_y_continuous(labels = scales::percent_format(accuracy = 1), expand = c(0, 0)) + scale_x_continuous(limits = c(1, 120), breaks = seq(0, 120, 20), expand = c(0, 0)) + CMSAC_theme() + theme(legend.position = c(0.87, 1.106), legend.title = element_blank(), axis.title.y = element_blank())
time %>% mutate(PP_fwds = ifelse(PP_fwds == 3, "3 forwards", "4 forwards")) %>% ggplot(aes(x = power_play_second, y = percent, fill = PP_fwds)) + geom_area() + labs(title = "Composition of the power play", subtitle = "5v4 only, 4 NHL games November 2019", x = "Second of the power play") + scale_fill_manual(values = c("#808080", "#912919")) + scale_y_continuous(labels = scales::percent_format(accuracy = 1), expand = c(0, 0)) + scale_x_continuous(limits = c(1, 120), breaks = seq(0, 120, 20), expand = c(0, 0)) + CMSAC_theme() + theme(legend.position = c(0.87, 1.106), legend.title = element_blank(), axis.title.y = element_blank())
time %>% mutate(PP_fwds = ifelse(PP_fwds == 3, "3 forwards", "4 forwards")) %>% ggplot(aes(x = power_play_second, y = percent, fill = PP_fwds)) + geom_area() + labs(title = "Composition of the power play", subtitle = "5v4 only, 4 NHL games November 2019", x = "Second of the power play") + scale_fill_manual(values = c("#808080", "#912919")) + scale_y_continuous(labels = scales::percent_format(accuracy = 1), expand = c(0, 0)) + scale_x_continuous(limits = c(1, 120), breaks = seq(0, 120, 20), expand = c(0, 0)) + CMSAC_theme() + theme(legend.position = c(0.87, 1.106), legend.title = element_blank(), axis.title.y = element_blank())
time %>% mutate(PP_fwds = ifelse(PP_fwds == 3, "3 forwards", "4 forwards")) %>% ggplot(aes(x = power_play_second, y = percent, fill = PP_fwds)) + geom_area() + labs(title = "Composition of the power play", subtitle = "5v4 only, 4 NHL games November 2019", x = "Second of the power play") + scale_fill_manual(values = c("#808080", "#912919")) + scale_y_continuous(labels = scales::percent_format(accuracy = 1), expand = c(0, 0)) + scale_x_continuous(limits = c(1, 120), breaks = seq(0, 120, 20), expand = c(0, 0)) + CMSAC_theme() + theme(legend.position = c(0.87, 1.106), legend.title = element_blank(), axis.title.y = element_blank())
time %>% mutate(PP_fwds = ifelse(PP_fwds == 3, "3 forwards", "4 forwards")) %>% ggplot(aes(x = power_play_second, y = percent, fill = PP_fwds)) + geom_area() + labs(title = "Composition of the power play", subtitle = "5v4 only, 4 NHL games November 2019", x = "Second of the power play") + scale_fill_manual(values = c("#808080", "#912919")) + scale_y_continuous(labels = scales::percent_format(accuracy = 1), expand = c(0, 0)) + scale_x_continuous(limits = c(1, 120), breaks = seq(0, 120, 20), expand = c(0, 0)) + CMSAC_theme() + theme(legend.position = c(0.87, 1.106), legend.title = element_blank(), axis.title.y = element_blank())
⭐ Pivoted and joined data with tidyr
and dplyr
.
⭐ Wrote our own function and used purrr
to apply it iteratively.
⭐ Created a custom theme and made an area graph.
⭐ Pivoted and joined data with tidyr
and dplyr
.
⭐ Wrote our own function and used purrr
to apply it iteratively.
⭐ Created a custom theme and made an area graph.
❗ Slides/data/code are available on my website and Github so you can review.
<i class="fab fa-twitter "></i>
{=html} @MeghanMHall
<i class="fab fa-github "></i>
{=html} meghall06
<i class="fas fa-link "></i>
{=html} meghan.rbind.io
<i class="fas fa-envelope "></i>
{=html} meghanhall6@gmail.com
Slides created via the R package xaringan.
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |