Tidy Tuesday time, y’all
It’s been months since I did a Tidy Tuesday, so I set aside an hour and said I had to post SOMETHING at the end of it. This week’s dataset was on Simpsons guest stars over the 30 (!!!) years the show has been on the air. I was initially tempted to look at the most “famous” guest stars that have been on the show, or at how the cast and guest stars are connected through other projects outside of the show. Both of those require additional data and more time than I had, so I went with the next thing that popped into my head: plotting the most common guest stars and their number of appearances in each season.
Step one, load the packages we’ll be using and grab the data.
library(tidyverse)
library(cowplot)

simpsons <- readr::read_delim(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-08-27/simpsons-guests.csv",
  delim = "|",
  quote = ""
)
Looking at the most commonly appearing guest stars, we see that Marcia Wallace (the voice of Edna Krabappel) is by far the most frequent guest star on the show, with 175 appearances. The list drops off pretty quickly, with Glenn Close rounding out the top 10 with nine appearances over the 30 seasons.
simpsons %>% count(guest_star, name = "total_n", sort = TRUE)
## # A tibble: 795 x 2
##    guest_star               total_n
##    <chr>                      <int>
##  1 Marcia Wallace               175
##  2 Phil Hartman                  52
##  3 Joe Mantegna                  29
##  4 Maurice LaMarche              25
##  5 Frank Welker                  21
##  6 Kelsey Grammer                21
##  7 Jon Lovitz                    19
##  8 Kevin Michael Richardson      18
##  9 Jackie Mason                  11
## 10 Glenn Close                    9
## # … with 785 more rows
To keep the visualization clean, I’ll focus on the 8 most frequently appearing guest stars, so let’s make a dataset with just these stars and their total number of appearances.
top_stars <- simpsons %>%
  count(guest_star, name = "total_n", sort = TRUE) %>%
  head(8)
We’ll use this as our filtering dataset for plotting: joining it onto the per-season counts keeps only these eight stars and, at the same time, adds each star’s total number of appearances, which we’ll use to order the data in the plot (and incorporate as text labels into our final plot).
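To see why a join can double as a filter, here is a minimal sketch on made-up data. The star names and counts are purely illustrative (they are not from the dataset), and `per_season` and `top_toy` are hypothetical stand-ins for the per-season counts and for `top_stars`.

```r
library(dplyr)

# Toy per-season counts (hypothetical values, for illustration only)
per_season <- tibble(
  season     = c("1", "1", "2", "2"),
  guest_star = c("Star A", "Star B", "Star A", "Star C"),
  n          = c(3L, 1L, 2L, 1L)
)

# Toy lookup table playing the role of top_stars
top_toy <- tibble(
  guest_star = c("Star A", "Star C"),
  total_n    = c(5L, 1L)
)

# inner_join() keeps only rows whose guest_star appears in top_toy,
# so "Star B" is dropped, and total_n is added to the remaining rows
joined <- inner_join(per_season, top_toy, by = "guest_star")
joined
```

Because the lookup table has one row per star, the join never duplicates rows; it only drops the stars that are not in the table.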
Ok, let’s get to plotting!
simpsons %>%
  count(season, guest_star) %>%
  inner_join(top_stars, by = "guest_star") %>%
  filter(season != "Movie") %>% # editorial choice; it's not a season
  mutate(guest_star = fct_reorder(guest_star, total_n),
         season = fct_inseq(season)) %>%
  ggplot(aes(season, guest_star)) +
  geom_point(aes(size = n)) +
  scale_size("# appearances")
Not a bad start using the default aesthetics, but it leaves some room for improvement in the prettiness department, and we can make it cleaner so it’s easier to understand.
Looking at the raw data, some guest stars always reprise the same role: for example, Marcia Wallace always plays Mrs. Krabappel and Joe Mantegna always voices Fat Tony. Other guest stars rarely voice the same character twice, as we can see for Maurice LaMarche below:
## # A tibble: 25 x 2
##    episode_title               role
##    <chr>                       <chr>
##  1 A Star Is Burns             George C. Scott; Hannibal Lecter; Captain James …
##  2 The Seemingly Never-Ending… Commander McBragg
##  3 Treehouse of Horror XVII    Orson Welles
##  4 G.I. (Annoyed Grunt)        Recruiter #2; Cap'n Crunch
##  5 The Wife Aquatic            First Mate Billy; Oceanographer
##  6 Stop or My Dog Will Shoot   Farmer; Horn Stuffer
##  7 You Kent Always Say What Y… Fox announcer
##  8 Treehouse of Horror XVIII   Government Official
##  9 Husbands and Knives         Jock
## 10 Dangerous Curves            Toucan Sam; Cap'n Crunch; Trix Rabbit
## 11 No Loan Again, Naturally    Dwight D. Eisenhower
## 12 Waverly Hills 9-0-2-1-D'oh  City Inspector
## 13 Once Upon a Time in Spring… Nuclear Power Plant Guard
## 14 Chief of Hearts             David Starsky
## 15 Angry Dad: The Movie        Anthony Hopkins
## # … with 10 more rows
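The code behind this listing isn’t shown. One way to get a table like it is to filter to a single guest star and keep the `episode_title` and `role` columns seen in the output. The sketch below uses a small stand-in tibble, `simpsons_toy` (a hypothetical name), so that it runs on its own: the two LaMarche rows are copied from the output, and the third row uses a placeholder episode title.

```r
library(dplyr)

# Stand-in for the full simpsons data (assumption: same column names)
simpsons_toy <- tibble(
  guest_star    = c("Maurice LaMarche", "Maurice LaMarche", "Marcia Wallace"),
  episode_title = c("Treehouse of Horror XVII", "Chief of Hearts",
                    "Placeholder episode"),
  role          = c("Orson Welles", "David Starsky", "Edna Krabappel")
)

# Keep one guest star's rows and only the two columns of interest
lamarche_roles <- simpsons_toy %>%
  filter(guest_star == "Maurice LaMarche") %>%
  select(episode_title, role)

lamarche_roles
```

On the real data, swapping `simpsons_toy` for `simpsons` should give the same shape of result.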
Let’s incorporate the names of the main characters each actor voiced into our plot. It would also be nice if our plot listed the total number of guest appearances by each actor or actress. Once we make these changes and add a more neutral theme, we get this:
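The code for the final figure isn’t included in the text, so here is only a rough sketch of how those changes could be made, not the original implementation. It rests on several assumptions: the per-season counts are made up, `main_roles` is a hypothetical hand-written lookup table (only the two actor-character pairs mentioned earlier are filled in), the character name and total are folded into the axis label, and `theme_minimal()` stands in for whatever "neutral" theme was actually used.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Toy per-season counts (made-up n values) with totals already joined on
plot_data <- tibble(
  season     = c("1", "2", "10", "2", "10"),
  guest_star = c("Marcia Wallace", "Marcia Wallace", "Marcia Wallace",
                 "Joe Mantegna", "Joe Mantegna"),
  n          = c(3L, 5L, 4L, 1L, 2L),
  total_n    = c(175L, 175L, 175L, 29L, 29L)
)

# Hypothetical lookup of each actor's main character
main_roles <- tibble(
  guest_star = c("Marcia Wallace", "Joe Mantegna"),
  main_role  = c("Mrs. Krabappel", "Fat Tony")
)

labelled <- plot_data %>%
  left_join(main_roles, by = "guest_star") %>%
  mutate(
    # Axis label combining actor, main character and total appearances
    label  = paste0(guest_star, " (", main_role, "), ", total_n, " total"),
    # Order labels by total appearances, seasons in numeric order
    label  = fct_reorder(label, total_n),
    season = fct_inseq(season)
  )

p <- ggplot(labelled, aes(season, label)) +
  geom_point(aes(size = n)) +
  scale_size("# appearances") +
  labs(x = "Season", y = NULL) +
  theme_minimal()

p
```

Putting the totals in the axis labels is just one option; they could equally be drawn as text with `geom_text()` at the right-hand edge of the plot.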
(Ok, the whole thing took me more than an hour by the time I went back and added the blog text to the markdown, but this was still a useful exercise for someone like me, who typically spends too much time messing around with tiny details.)
View the rest of my Tidy Tuesday contributions here.