class: left, middle, inverse, title-slide # Moving Beyond Excel
For Your Hockey Analysis ## Meghan Hall ### #HANIC
April 18, 2020 ###
MeghanMHall
meghall06
meghan.rbind.io
--- # About me -- .center[
Data manager in higher education ] -- .center[
Amateur hockey analyst ([Hockey-Graphs!](https://hockey-graphs.com)) ] -- .center[
R enthusiast ] --- # About you -- .center[
Working with data manipulation, visualization, and analysis ] -- .center[
Maybe you use Excel? ] -- .center[
Curious about learning to code (or already on your way!) ] --- class: center, middle, inverse # Rule #1 ### No Excel shaming --- # Goals for tonight -- ### Limitations of Excel - When your goal is **efficient** and **reproducible** and **shareable** analysis -- ### Advantages of R - (Or whatever language you like) -- ### Confidence to move on - Find resources and develop a learning roadmap --- class: inverse, center, middle # Data Example ## From Moneypuck --- name: moneypuck class: inverse, middle, center background-image: url(img/Moneypuck.png) background-size: contain --- name: moneypuck2 class: inverse, middle, center background-image: url(img/Moneypuck2.png) background-size: contain --- # Two questions we have from this data -- ### Longest average shifts? - Forwards only - Strength state: 5v4 - 100+ shifts -- ### Most shifts per game? - Forwards vs. defensemen - Strength state: 5v5 - 10+ games --- # Easy enough to do in Excel! ### Longest average shifts? - create a **shift_length** variable: `icetime / shifts` - filter `position` to L, C, R - filter `situation` to 5on4 - filter `shifts` to at least 100 - sort descending by `shift_length` --- name: moneypuck4 class: inverse, middle, center background-image: url(img/Moneypuck4.png) background-size: contain --- # Easy enough to do in Excel! ### Most shifts per game? - create a **shifts_game** variable: `shifts / games_played` - create a **position_grouped** variable to create D and F - filter `situation` to 5on5 - filter `games_played` to at least 10 - find the average values for `shifts_game` --- name: moneypuck3 class: inverse, middle, center background-image: url(img/Moneypuck3.png) background-size: contain --- class: inverse, center, middle # But... --- # ...what if... --
someone asks you exactly what you did? 😬 <br> --
you get refreshed data? 😬 <br> --
you make a mistake, or there's something you want to make note of? 😬 <br> --
there's more data you want to join in? 😬 <br> --
you have to make a graph? 😬 --- class: inverse, center, middle # Let's try this in R --- # Longest average shifts? ```r shift_length <- moneypuck %>% mutate(shift_length = icetime / shifts) %>% filter(position != "D" & situation == "5on4" & shifts >= 100) %>% arrange(desc(shift_length)) %>% select(name, shift_length) ``` -- ```r head(shift_length) ## # A tibble: 6 x 2 ## name shift_length ## <chr> <dbl> ## 1 Alex Ovechkin 157. ## 2 Connor McDavid 138. ## 3 Leon Draisaitl 116. ## 4 Nathan MacKinnon 112. ## 5 Jack Eichel 108. ## 6 Auston Matthews 102. ``` --- # Most shifts per game? ```r shifts_game <- moneypuck %>% mutate(shifts_game = shifts / games_played, position_grouped = ifelse(position == "D", "D", "F")) %>% filter(situation == "5on5" & games_played >= 10) %>% group_by(position_grouped) %>% summarize(avg_shifts_game = mean(shifts_game)) ``` -- ```r head(shifts_game) ## # A tibble: 2 x 2 ## position_grouped avg_shifts_game ## <chr> <dbl> ## 1 D 19.7 ## 2 F 15.8 ``` --- # ...what if... --
someone asks you exactly what you did? 😄 <br> --- class: middle ```r # Always comment your code as if others will read it # "Others" includes yourself in 3 months when you've # forgotten everything # This is the "Skaters" file from MoneyPuck as of 2020-04-11 moneypuck <- read_csv("MoneyPuck.csv") # Created a new variable to calculate average shift length # Filters: forwards only, 5on4, at least 100 shifts # Can easily include justification for why you did all this # Because you WILL forget shift_length <- moneypuck %>% mutate(shift_length = icetime / shifts) %>% filter(position != "D" & situation == "5on4" & shifts >= 100) %>% arrange(desc(shift_length)) %>% select(name, shift_length) ``` --- # ...what if...
someone asks you exactly what you did? 😄 <br>
you get refreshed data? 😄 <br> --
you make a mistake, or there's something you want to make note of? 😄 <br> --- class: middle ```r # You can create functions for common tasks # Let's say you hate the Leafs and always want to # remove them for your data # That would be a pain to do every time by hand, so: no_leafs_no <- function(original_data, team_variable) { original_data %>% filter(team_variable != "TOR") } moneypuck_no_leafs <- no_leafs_no(moneypuck, moneypuck$team) # Or more realistically, you always want to remove a certain game # Or you have tasks that you do frequently # If you're copying and pasting code, it's time for a function ``` --- class: middle ```r # You can also run tests to check for mistakes # Should have 31 teams and 883 players moneypuck %>% summarize(teams = n_distinct(team)) ## # A tibble: 1 x 1 ## teams ## <int> ## 1 31 moneypuck %>% summarize(players = n_distinct(name)) ## # A tibble: 1 x 1 ## players ## <int> ## 1 883 ``` --- # ...what if...
someone asks you exactly what you did? 😄 <br>
you get refreshed data? 😄 <br>
you make a mistake, or there's something you want to make note of? 😄 <br>
there's more data you want to join in? 😄 <br> --- class: middle ```r # Read in your tracking data that has your proprietary statistic: # Gutsy Recoveries In Transition GRIT <- read_csv("your_tracking_data.csv") # Join that into the moneypuck data by both player and season moneypuck_w_tracking <- moneypuck %>% left_join(GRIT, by = c("name", "season", "team")) ``` --- # ...what if...
someone asks you exactly what you did? 😄 <br>
you get refreshed data? 😄 <br>
you make a mistake, or there's something you want to make note of? 😄 <br>
there's more data you want to join in? 😄 <br>
you have to make a graph? 😄 --- # Longest average shifts? ```r shift_length %>% top_n(10) %>% ggplot(aes(y = shift_length, x = reorder(name, shift_length))) + geom_col() + coord_flip() + labs( y = "Average Shift Length", x = NULL, title = "Average Shift Length Among Forwards", subtitle = "2019-20 NHL Season, 5v4 Only", caption = "Data from Moneypuck" ) + geom_text(aes(label = round(shift_length, 1)), colour = "white", size = 3, position = position_stack(vjust = 0.07)) + meg_theme() ``` --- # Longest average shifts? .center[ <img src="figs/Use/unnamed-chunk-10-1.png" width="504" /> ] --- # Most shifts per game? ```r shifts_game %>% ggplot(aes(y = avg_shifts_game, x = position_grouped)) + geom_col() + labs( y = "Average Shifts per Game", x = NULL, title = "Average Shifts per Game By Position", subtitle = "2019-20 NHL Season, 5v5 Only", caption = "Data from Moneypuck" ) + geom_text(aes(label = round(avg_shifts_game, 1)), size = 4, position = position_stack(vjust = 1.05)) + meg_theme() ``` --- # Most shifts per game? .center[ <img src="figs/Use/unnamed-chunk-11-1.png" width="504" /> ] --- # Other things you can do with R -- ## Github
- Integrates easily so you can collaborate, save, share. Version control! -- ## RMarkdown - Useful for creating and sharing all kinds of content -- - Like slides 😇 --- ```r shift_length %>% top_n(10) %>% mutate(shift_length = round(shift_length, 2)) %>% knitr::kable(format = "html") ``` <table> <thead> <tr> <th style="text-align:left;"> name </th> <th style="text-align:right;"> shift_length </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Alex Ovechkin </td> <td style="text-align:right;"> 156.64 </td> </tr> <tr> <td style="text-align:left;"> Connor McDavid </td> <td style="text-align:right;"> 138.04 </td> </tr> <tr> <td style="text-align:left;"> Leon Draisaitl </td> <td style="text-align:right;"> 116.45 </td> </tr> <tr> <td style="text-align:left;"> Nathan MacKinnon </td> <td style="text-align:right;"> 111.85 </td> </tr> <tr> <td style="text-align:left;"> Jack Eichel </td> <td style="text-align:right;"> 108.48 </td> </tr> <tr> <td style="text-align:left;"> Auston Matthews </td> <td style="text-align:right;"> 102.48 </td> </tr> <tr> <td style="text-align:left;"> Nicklas Backstrom </td> <td style="text-align:right;"> 102.39 </td> </tr> <tr> <td style="text-align:left;"> Patrick Kane </td> <td style="text-align:right;"> 101.51 </td> </tr> <tr> <td style="text-align:left;"> Ryan Nugent-Hopkins </td> <td style="text-align:right;"> 99.27 </td> </tr> <tr> <td style="text-align:left;"> Brad Marchand </td> <td style="text-align:right;"> 98.24 </td> </tr> </tbody> </table> --- # Other things you can do with R ## Github
- Integrates easily within RStudio so you can collaborate, save, share ## RMarkdown - Useful for creating and sharing all kinds of content - Like slides 😇 ## Make a website -- - Using `Shiny` (like Evolving-Hockey!) -- - Using `blogdown` (like my website!) --- class: center, middle, inverse # Okay but how? --- # Moving past Excel conceptually -- ### It is comforting to *see* all of your data -- - but that's not really necessary! - much more efficient to look at the pieces you need -- ### Excel quickly becomes unwieldy with too much data - if you're working with NHL play-by-play data? 1M+ rows -- ### Easier to keep track of data frames in an R file - rather than a lot of workbooks and sheets -- ### Separate your data from your analysis! --- # Treat it like learning a new language -- .pull-left[ ### Excel Create a new variable that creates **position_grouped** ] .pull-right[ ### R ```r mutate(position_grouped = ifelse(position == "D", "D", "F")) ``` ] --- # Treat it like learning a new language .pull-left[ ### Excel Sort descending by `shifts_game` ] .pull-right[ ### R ```r arrange(desc(shifts_game)) ``` ] --- # Treat it like learning a new language .pull-left[ ### Excel Make a pivot table by position and find average shifts ] .pull-right[ ### R ```r data %>% group_by(position_grouped) %>% summarize(avg_shifts_game = mean(shifts_game)) ``` ] --- # Treat it like learning a new language .pull-left[ ### Excel Filter to forwards only ] .pull-right[ ### R ```r filter(position != "D") ``` ] --- # Tips for learning
Practice a little bit frequently! <br> --
Pick **one** resource and go all the way through <br> --
Get really good at Googling <br> --
Use a project or a question as motivation --- # My favorite resources -- ### R for Data Science [
](https://r4ds.had.co.nz/) - Focuses on the tidyverse - Can skip from chapter to chapter to focus on what you need - Really helpful examples you can follow along with -- ### R for Excel Users workshop [
](https://rstudio-conf-2020.github.io/r-for-excel/) -- ### My tutorials — gotta plug my own stuff 🤷 - [Intro to R on Hockey-Graphs](https://hockey-graphs.com/2019/12/11/an-introduction-to-r-with-hockey-data/) (with lots of exercises) - [Intro to R with Swirl](https://meghan.rbind.io/post/hockey-data-with-swirl/) (interactive learning in the console) - Various ad hoc tutorials (exploratory data analysis, tidymodels, etc.) --- class: center, middle # Where to find me
[MeghanMHall](https://twitter.com/MeghanMHall)
[meghall06](https://github.com/meghall06)
[meghan.rbind.io](https://meghan.rbind.io) --- class: inverse, center, middle # Go forth and code 🙏 <br> <br> <br> Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan).