Using R for Your Big Data Cup Project

A gentle intro and some tips & tricks

Meghan Hall
@MeghanMHall
meghan.rbind.io
Hockey (Analytics) Night in Canada
Feb. 18, 2021

Can I teach you R in 10 minutes?

...no   😞

But we can...

  • talk about what's easier to do in R
  • go over some common functions you might use for this project
  • discuss a roadmap for learning more


Disclaimer
The speaker herein pledges as follows: no declarative statements will be uttered about how R is objectively "the best" and no moral judgments will be made on which software, IDE, programming language, packages, dark mode, etc. the viewer chooses to use and/or not use.

tl;dr
I don't care what you use for your project! Use Excel! Use Python! Use SAS! Use an abacus!


But these are my 10 minutes, so we're going to talk about R.

Step 0: How does one use R, exactly?

1️⃣   the language itself: https://cloud.r-project.org/


2️⃣   the IDE (integrated development environment): https://rstudio.com/products/rstudio/download/

Step 1: Data!

library(tidyverse)
library(janitor)
scouting <- read_csv("https://tinyurl.com/BDCscouting") %>%
  clean_names()
nwhl <- read_csv("https://tinyurl.com/BDCnwhl") %>%
  clean_names()
women <- read_csv("https://tinyurl.com/BDCwomens") %>%
  clean_names()
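
To see what clean_names() contributes to the pipeline above, here is a small sketch on a made-up table. The column names and values are hypothetical stand-ins, not the real Big Data Cup file; the point is only that janitor converts spreadsheet-style headers to lowercase snake_case, which is why a column can later be referred to as detail_1.

```r
# run once if the packages are not installed yet:
# install.packages(c("tidyverse", "janitor"))
library(tidyverse)
library(janitor)

# toy data with spreadsheet-style headers (hypothetical, not the real file)
raw <- tibble(
  `Home Team`    = c("Erie Otters", "Sudbury Wolves"),
  `Detail 1`     = c("Backhand", "Forehand"),
  `X Coordinate` = c(100, 75)
)

# same step as on the slide: standardize the column names
cleaned <- raw %>%
  clean_names()

names(cleaned)
# "home_team" "detail_1" "x_coordinate"
```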

Step 2: Explore

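# open the full data set in RStudio's spreadsheet-style viewer
View(scouting)
# list every column with its type and first few values
scouting %>%
  glimpse()
# count rows for each event type
scouting %>%
  count(event)
# count rows for each combination of event and detail_1
scouting %>%
  count(event, detail_1)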

Result of count(event):

event              n
Dump In/Out        4888
Faceoff Win        2441
Goal               293
Incomplete Play    8890
Penalty Taken      419
Play               23778
Puck Recovery      20667
Shot               4887

Result of count(event, detail_1), first rows:

event          detail_1      n
Dump In/Out    Lost          4143
Dump In/Out    Retained      745
Faceoff Win    Backhand      2179
Faceoff Win    Feet          17
Faceoff Win    Forehand      245
Goal           Deflection    17
Goal           Slapshot      8
Goal           Snapshot      148
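
count() is shorthand for grouping rows and tallying them. As a minimal sketch, the made-up events below (invented purely for illustration) show the short form next to the equivalent group_by() and summarize() version, plus the sort argument, which is handy when exploring.

```r
library(tidyverse)

# invented events, only to show the mechanics
toy_events <- tibble(
  event    = c("Shot", "Shot", "Goal", "Play", "Play", "Play"),
  detail_1 = c("Snapshot", "Slapshot", "Snapshot", "Direct", "Direct", "Indirect")
)

# short form, as on the slide
by_event <- toy_events %>%
  count(event)

# long form that produces the same counts
by_event_long <- toy_events %>%
  group_by(event) %>%
  summarize(n = n())

# sort = TRUE puts the most common combinations first
by_detail <- toy_events %>%
  count(event, detail_1, sort = TRUE)

by_event
by_detail
```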

Step 3: A Question

From the scouting data, among players who've taken at least 50 faceoffs, who has the best faceoff percentage?

(Is this the most rigorous Big Data Cup question? No, but I only have 10 minutes!)

Each Faceoff Win row names the winner (player) and the loser (player_2):

game_date     team              player              event          player_2
2019-09-20    Sudbury Wolves    Blake Murray        Faceoff Win    Connor Lockhart
2019-09-20    Sudbury Wolves    Macauley Carson     Faceoff Win    Austen Swankler
2019-09-20    Sudbury Wolves    Quinton Byfield     Faceoff Win    Chad Yetman
2019-09-20    Sudbury Wolves    Macauley Carson     Faceoff Win    Austen Swankler
2019-09-20    Sudbury Wolves    Quinton Byfield     Faceoff Win    Chad Yetman
2019-09-20    Erie Otters       Alex Gritz          Faceoff Win    Blake Murray
2019-09-20    Erie Otters       Connor Lockhart     Faceoff Win    Ethan Larmand
2019-09-20    Erie Otters       Austen Swankler     Faceoff Win    Macauley Carson
2019-09-20    Erie Otters       Chad Yetman         Faceoff Win    Quinton Byfield
2019-09-20    Erie Otters       Brendan Hoffmann    Faceoff Win    Blake Murray

Step 3: A Question

player              faceoffs    faceoff_wins    faceoff_perc
Ty Dellandrea       55          35              0.6363636
Cole Schwindt       57          36              0.6315789
Cam Hillis          104         64              0.6153846
Lucas Theriault     52          29              0.5576923
Keean Washkurak     55          29              0.5272727
Danny Zhilkin       67          34              0.5074627
Austen Swankler     334         169             0.5059880
Ryan McGregor       72          36              0.5000000
Hayden Fowler       310         149             0.4806452
Chad Yetman         724         343             0.4737569
Connor Lockhart     207         92              0.4444444
Noah Sedore         68          30              0.4411765
Brendan Hoffmann    488         197             0.4036885
Elias Cohen         149         57              0.3825503

Step 3: A Question

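faceoffs <- scouting %>%
  # keep only faceoff events; each row records one faceoff
  filter(event == "Faceoff Win") %>%
  # player won the faceoff, player_2 lost it
  select(player, player_2) %>%
  rename(winner = player,
         loser = player_2) %>%
  # reshape to one row per player per faceoff
  pivot_longer(winner:loser,
               names_to = "status",
               values_to = "player") %>%
  mutate(win = ifelse(status == "winner", 1, 0)) %>%
  # total faceoffs taken and won by each player
  group_by(player) %>%
  summarize(faceoffs = n(),
            faceoff_wins = sum(win)) %>%
  mutate(faceoff_perc = faceoff_wins / faceoffs) %>%
  # apply the 50-faceoff minimum, then sort best to worst
  filter(faceoffs >= 50) %>%
  arrange(desc(faceoff_perc))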

Step 3: A Question

After pivot_longer(), each faceoff becomes two rows:

status    player
winner    Blake Murray
loser     Connor Lockhart
winner    Macauley Carson
loser     Austen Swankler
winner    Quinton Byfield
loser     Chad Yetman
winner    Macauley Carson
loser     Austen Swankler
winner    Quinton Byfield
loser     Chad Yetman
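
The reshaping step is easier to check by hand on a tiny table. The sketch below runs the same sequence of verbs on four invented faceoffs between hypothetical players A, B, and C; the only deliberate changes are a minimum of 2 faceoffs instead of 50, so the toy data survives the filter, and counting wins directly with sum(status == "winner") rather than through a separate win column.

```r
library(tidyverse)

# invented events: four faceoffs plus one unrelated row
toy <- tibble(
  event    = c("Faceoff Win", "Faceoff Win", "Shot", "Faceoff Win", "Faceoff Win"),
  player   = c("A", "A", "A", "B", "C"),
  player_2 = c("B", "B", NA, "A", "A")
)

toy_faceoffs <- toy %>%
  filter(event == "Faceoff Win") %>%
  # select() can rename while it selects
  select(winner = player, loser = player_2) %>%
  # 4 faceoffs become 8 rows: one for each participant
  pivot_longer(winner:loser,
               names_to = "status",
               values_to = "player") %>%
  group_by(player) %>%
  summarize(faceoffs = n(),
            faceoff_wins = sum(status == "winner")) %>%
  mutate(faceoff_perc = faceoff_wins / faceoffs) %>%
  # toy threshold; the slide uses 50
  filter(faceoffs >= 2) %>%
  arrange(desc(faceoff_perc))

toy_faceoffs
# A: 4 faceoffs, 2 wins (0.5); B: 3 faceoffs, 1 win (about 0.333)
# C took only 1 faceoff, so the filter drops it
```

Because every faceoff contributes one winner row and one loser row, n() per player counts all faceoffs taken, not just the ones won.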

Step 4: A graph!

# faceoffs_team is not defined on these slides; it appears to be the
# faceoffs table from Step 3 with a team column added
faceoffs_team %>%
  ggplot(aes(x = reorder(player, faceoff_perc),
             y = faceoff_perc, fill = team)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = c("#F2A900", "#FF6720", "#862633",
                               "#00205B", "#010101", "#C8C9C7")) +
  labs(title = "Faceoff Percentages",
       subtitle = "Among players with 50+ faceoffs, from Big Data\nCup scouting data set") +
  ylab("Faceoff Win Percentage") +
  geom_text(aes(label = scales::percent(faceoff_perc,
                                        accuracy = 0.1)),
            family = "Seravek", hjust = -0.15) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1),
                     limits = c(0, 0.72)) +
  # hanic_theme() is a custom theme that is not defined on these slides
  hanic_theme() +
  theme(legend.title = element_blank(),
        legend.position = "bottom",
        panel.grid.major.y = element_blank(),
        axis.title.y = element_blank())
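
The slides do not show how faceoffs_team was built, and hanic_theme() and the Seravek font will not exist on most machines. As a rough sketch only, one plausible approach is to look up each player's team and join it onto the faceoff table, then draw the same kind of chart with a built-in theme. Everything below is an assumption for illustration: the toy tables, the names toy_scouting, toy_faceoffs, player_teams, and toy_faceoffs_team, and the premise that each player appears with exactly one team in the team column.

```r
library(tidyverse)

# toy stand-ins for the real tables (values invented)
toy_scouting <- tibble(
  team   = c("Erie Otters", "Erie Otters", "Sudbury Wolves", "Sudbury Wolves"),
  player = c("A", "B", "C", "C")
)
toy_faceoffs <- tibble(
  player       = c("A", "B", "C"),
  faceoffs     = c(60, 80, 100),
  faceoff_wins = c(36, 36, 55)
) %>%
  mutate(faceoff_perc = faceoff_wins / faceoffs)

# assumption: one team per player, taken from the team column
player_teams <- toy_scouting %>%
  distinct(player, team)

toy_faceoffs_team <- toy_faceoffs %>%
  left_join(player_teams, by = "player")

# same chart structure with default fonts and a built-in theme
p <- toy_faceoffs_team %>%
  ggplot(aes(x = reorder(player, faceoff_perc),
             y = faceoff_perc, fill = team)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(title = "Faceoff Percentages",
       y = "Faceoff Win Percentage",
       x = NULL) +
  theme_minimal() +
  theme(legend.position = "bottom")

p
```

geom_col() is equivalent to geom_bar(stat = "identity"). If a player changed teams during the season, distinct() would return more than one row for that player and the join would duplicate them, so the real data may need an extra rule for picking a single team.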

Learn More

If you want to stick with the hockey theme:

install.packages("devtools")
devtools::install_github("meghall06/betweenthepipes")
# look in the Tutorial pane in the upper-right of RStudio
# or...run:
betweenthepipes::intro()
betweenthepipes::data_manip()

Elsewhere:

Thanks!

@MeghanMHall
meghan.rbind.io
Slides created via the R package xaringan.
