Meghan Hall
@MeghanMHall
meghan.rbind.io
Hockey (Analytics) Night in Canada
Feb. 18, 2021
...no 😞
But we can...
Disclaimer
The speaker herein pledges as follows: no declarative statements will be uttered about how R is objectively "the best" and no moral judgments will be made on which software, IDE, programming language, packages, dark mode, etc. the viewer chooses to use and/or not use.
tl;dr
I don't care what you use for your project! Use Excel! Use Python! Use SAS! Use an abacus!
But these are my 10 minutes, so we're going to talk about R.
1️⃣ the language itself: https://cloud.r-project.org/
2️⃣ the IDE (integrated development environment): https://rstudio.com/products/rstudio/download/
library(tidyverse)
library(janitor)

scouting <- read_csv("https://tinyurl.com/BDCscouting") %>%
  clean_names()

nwhl <- read_csv("https://tinyurl.com/BDCnwhl") %>%
  clean_names()

women <- read_csv("https://tinyurl.com/BDCwomens") %>%
  clean_names()
View(scouting)
scouting %>% glimpse()
scouting %>% count(event)
event | n |
---|---|
Dump In/Out | 4888 |
Faceoff Win | 2441 |
Goal | 293 |
Incomplete Play | 8890 |
Penalty Taken | 419 |
Play | 23778 |
Puck Recovery | 20667 |
Shot | 4887 |
scouting %>% count(event, detail_1)
event | detail_1 | n |
---|---|---|
Dump In/Out | Lost | 4143 |
Dump In/Out | Retained | 745 |
Faceoff Win | Backhand | 2179 |
Faceoff Win | Feet | 17 |
Faceoff Win | Forehand | 245 |
Goal | Deflection | 17 |
Goal | Slapshot | 8 |
Goal | Snapshot | 148 |
From the scouting data, among players who've taken at least 50 faceoffs, who has the best faceoff percentage?
(Is this the most rigorous Big Data Cup question? No, but I only have 10 minutes!)
game_date | team | player | event | player_2 |
---|---|---|---|---|
2019-09-20 | Sudbury Wolves | Blake Murray | Faceoff Win | Connor Lockhart |
2019-09-20 | Sudbury Wolves | Macauley Carson | Faceoff Win | Austen Swankler |
2019-09-20 | Sudbury Wolves | Quinton Byfield | Faceoff Win | Chad Yetman |
2019-09-20 | Sudbury Wolves | Macauley Carson | Faceoff Win | Austen Swankler |
2019-09-20 | Sudbury Wolves | Quinton Byfield | Faceoff Win | Chad Yetman |
2019-09-20 | Erie Otters | Alex Gritz | Faceoff Win | Blake Murray |
2019-09-20 | Erie Otters | Connor Lockhart | Faceoff Win | Ethan Larmand |
2019-09-20 | Erie Otters | Austen Swankler | Faceoff Win | Macauley Carson |
2019-09-20 | Erie Otters | Chad Yetman | Faceoff Win | Quinton Byfield |
2019-09-20 | Erie Otters | Brendan Hoffmann | Faceoff Win | Blake Murray |
player | faceoffs | faceoff_wins | faceoff_perc |
---|---|---|---|
Ty Dellandrea | 55 | 35 | 0.6363636 |
Cole Schwindt | 57 | 36 | 0.6315789 |
Cam Hillis | 104 | 64 | 0.6153846 |
Lucas Theriault | 52 | 29 | 0.5576923 |
Keean Washkurak | 55 | 29 | 0.5272727 |
Danny Zhilkin | 67 | 34 | 0.5074627 |
Austen Swankler | 334 | 169 | 0.5059880 |
Ryan McGregor | 72 | 36 | 0.5000000 |
Hayden Fowler | 310 | 149 | 0.4806452 |
Chad Yetman | 724 | 343 | 0.4737569 |
Connor Lockhart | 207 | 92 | 0.4444444 |
Noah Sedore | 68 | 30 | 0.4411765 |
Brendan Hoffmann | 488 | 197 | 0.4036885 |
Elias Cohen | 149 | 57 | 0.3825503 |
faceoffs <- scouting %>%
  filter(event == "Faceoff Win") %>%
  select(player, player_2) %>%
  # player won the faceoff, player_2 lost it
  rename(winner = player, loser = player_2) %>%
  # reshape to one row per player per faceoff
  pivot_longer(winner:loser, names_to = "status", values_to = "player") %>%
  mutate(win = ifelse(status == "winner", 1, 0)) %>%
  group_by(player) %>%
  summarize(faceoffs = n(), faceoff_wins = sum(win)) %>%
  mutate(faceoff_perc = faceoff_wins / faceoffs) %>%
  filter(faceoffs >= 50) %>%
  arrange(desc(faceoff_perc))

# after pivot_longer(), each faceoff is two rows (first rows shown):
status | player |
---|---|
winner | Blake Murray |
loser | Connor Lockhart |
winner | Macauley Carson |
loser | Austen Swankler |
winner | Quinton Byfield |
loser | Chad Yetman |
winner | Macauley Carson |
loser | Austen Swankler |
winner | Quinton Byfield |
loser | Chad Yetman |
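To check the reshape by hand, here is a minimal sketch on a made-up five-faceoff data set. The `toy` tibble and its players "A", "B", and "C" are hypothetical stand-ins for the faceoff rows of `scouting`, and the wins are counted with `sum(status == "winner")` instead of the separate `ifelse()` step, which should give the same totals.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the faceoff rows of `scouting`:
# `player` won the faceoff, `player_2` lost it
toy <- tibble(
  player   = c("A", "A", "B", "C", "A"),
  player_2 = c("B", "C", "A", "A", "B")
)

toy_faceoffs <- toy %>%
  rename(winner = player, loser = player_2) %>%
  # one row per player per faceoff: 5 faceoffs become 10 rows
  pivot_longer(winner:loser, names_to = "status", values_to = "player") %>%
  group_by(player) %>%
  summarize(faceoffs = n(), faceoff_wins = sum(status == "winner")) %>%
  mutate(faceoff_perc = faceoff_wins / faceoffs) %>%
  arrange(desc(faceoff_perc))

print(toy_faceoffs)
```

Player "A" takes all five faceoffs and wins three of them, so that row is easy to verify against the raw data.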
# faceoffs_team (presumably the faceoffs summary plus a team column) and
# hanic_theme() (presumably a custom theme) are not defined in these slides
faceoffs_team %>%
  ggplot(aes(x = reorder(player, faceoff_perc), y = faceoff_perc, fill = team)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = c("#F2A900", "#FF6720", "#862633", "#00205B", "#010101", "#C8C9C7")) +
  labs(title = "Faceoff Percentages",
       subtitle = "Among players with 50+ faceoffs, from Big Data Cup scouting data set") +
  ylab("Faceoff Win Percentage") +
  geom_text(aes(label = scales::percent(faceoff_perc, accuracy = 0.1)),
            family = "Seravek", hjust = -0.15) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1), limits = c(0, 0.72)) +
  hanic_theme() +
  theme(legend.title = element_blank(),
        legend.position = "bottom",
        panel.grid.major.y = element_blank(),
        axis.title.y = element_blank())
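The plot reads from `faceoffs_team`, which the slides never show being built. One plausible way to get there, sketched on made-up data: look up each player's team from the event rows and join it onto the faceoff summary. Everything named `toy_*` and `player_teams` is hypothetical, and the sketch assumes the `team` column describes `player` (not `player_2`) and that each player appears with only one team.

```r
library(dplyr)

# Hypothetical stand-in for `scouting`: `team` is assumed to be the team of `player`
toy_events <- tibble(
  team   = c("Team X", "Team X", "Team Y", "Team Y"),
  player = c("A", "A", "B", "C")
)

# Hypothetical stand-in for the `faceoffs` summary
toy_faceoffs <- tibble(
  player       = c("A", "B"),
  faceoffs     = c(60, 55),
  faceoff_wins = c(36, 22)
) %>%
  mutate(faceoff_perc = faceoff_wins / faceoffs)

# One row per player-team pair
player_teams <- toy_events %>%
  distinct(player, team)

# Attach the team to each player's faceoff numbers
toy_faceoffs_team <- toy_faceoffs %>%
  left_join(player_teams, by = "player")

print(toy_faceoffs_team)
```

If a player shows up with more than one team, the join would return one row per team for that player, so it is worth checking `player_teams` for repeats before plotting.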
If you want to stick with the hockey theme:
install.packages("devtools")
devtools::install_github("meghall06/betweenthepipes")
# look in the Tutorial pane in the upper-right of RStudio
# or...run:
betweenthepipes::intro()
betweenthepipes::data_manip()
Elsewhere: