Distinct causes of death in each state of the United States

Cause of Death- {gganimate} with Geographical Data

Teresa Chen https://teresashchen.github.io/blog/
03-14-2019


This is part 3 of the final project for Communicating and Transforming Data. In addition to answering questions about causes of death using real data, I hope to make data analysis in R easier to follow by documenting the steps that lead to the final results.

The R project aims to answer the following three questions with three main data visualizations:

Q1. How do leading causes of death change over the past 18 years?
Q2. What are the changing patterns of each leading cause of death over the years?
Q3. What are the distinct causes of death in each state of the United States?

This post presents the data visualization for Q3. For the steps leading up to the plot, including data preparation and tidying, please refer to part 1 of this series.

Here are the required packages:


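install.packages("rio")
install.packages("here")
install.packages("tidyverse")
install.packages("janitor")
install.packages("gganimate")
install.packages("transformr")
install.packages("maps")
# colorblindr is not on CRAN; install it from GitHub instead
install.packages("remotes")
remotes::install_github("clauswilke/colorblindr")

# load the packages used in the code below
library(tidyverse)
library(gganimate)
library(colorblindr)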

Here are the global settings:

Data visualization

The last plot of this R project series is aimed at a general and academic audience and displays the distinct leading cause of death in each state.

First plot- map data


# get map data with the longitude and latitude of each state's outline

usa <- as_tibble(map_data("state"))
usa$region <- str_to_title(usa$region)
usa <- usa %>%
  rename(state = region)

# create data with distinct cause of each state

distinct_by_state <- tidy_df %>% 
  # create rank per year and state
  group_by(year, state) %>% 
  arrange(desc(deaths)) %>% 
  mutate(rank = row_number()) %>% 
  arrange(year) %>% 
  ungroup() %>% 
  # only select rank no. 1
  filter(rank == 1,
         state != "United States")

#  combine the above 2 datasets together

distinct_map <- full_join(usa, distinct_by_state, by = "state") %>% 
  filter(!is.na(lat), !is.na(long), !is.na(year))
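Because the join matches on the exact state name, it can be worth checking which states fall out of it. The sketch below is only an illustration on toy tables: `map_states` and `data_states` are hypothetical names, and the spelling "District of Columbia" on the data side is an assumption about `tidy_df`. It shows that `str_to_title()` capitalizes every word, so a name such as "district of columbia" would no longer match that spelling. Separately, `map_data("state")` only covers the contiguous states, so rows for Alaska and Hawaii are removed by the `filter()` above.

```r
library(dplyr)
library(stringr)

# state names as they look after str_to_title() on the map data
map_states <- tibble(state = str_to_title(c("oregon", "district of columbia")))

# hypothetical spelling on the data side (an assumption about tidy_df)
data_states <- tibble(state = c("Oregon", "District of Columbia"))

# rows in the data that have no matching polygon in the map
unmatched <- anti_join(data_states, map_states, by = "state")
unmatched
```

Running the same `anti_join()` on the real `distinct_by_state` and `usa` tables would list any states that do not appear on the map.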


# map data with USA states

plot3 <- distinct_map  %>% 
  mutate(cause = factor(cause, levels = c("Heart disease", "Cancer"))) %>% 
  ggplot(aes(long, lat, group = group, fill = cause)) +
  geom_polygon(color = "white", alpha = 0.7) +
  scale_fill_OkabeIto() +
  coord_map()+
  labs(title = "Leading Causes of Death",
       subtitle = "Cancer emerges as a leading cause in the early 2000s.",
       caption = "Source: Centers for Disease Control and Prevention",
       fill = "")+
  theme_void(base_size = 35) +
  facet_wrap(~year, ncol = 3) +
  theme(plot.title = element_text(face = "bold",
                                  hjust = 0.5,
                                  vjust = 10),
        plot.subtitle = element_text(face = "italic",
                                     hjust = 0.5,
                                     vjust = 10),
        legend.position = "top")

plot3

The plot clearly illustrates the pattern noted in the subtitle: cancer has appeared as a distinct cause of death since the early 2000s. Interestingly, southern states did not have cancer as a distinct cause until 2008, and some states kept the same distinct cause (heart disease) throughout the 18 years.
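A pattern read off a map can also be checked with a simple tally of how many states have each leading cause per year. The following is a minimal sketch on a toy table, not the real data: `toy_df` and all of its numbers are invented for illustration, and `slice_max()` is used here as a compact alternative to the `row_number()` ranking shown earlier.

```r
library(dplyr)

# toy stand-in for tidy_df; values are invented for illustration only
toy_df <- tibble(
  year   = rep(c(1999, 2008), each = 4),
  state  = rep(c("Oregon", "Oregon", "Texas", "Texas"), times = 2),
  cause  = rep(c("Heart disease", "Cancer"), times = 4),
  deaths = c(100, 90, 120, 80,   # 1999
             85, 95, 110, 90)    # 2008
)

toy_counts <- toy_df %>%
  group_by(year, state) %>%
  # keep the cause with the most deaths in each year-state pair
  slice_max(deaths, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  # number of states per leading cause and year
  count(year, cause)

toy_counts
```

Applied to `distinct_by_state`, `count(year, cause)` alone would give the same kind of table, since that data already holds one row per state and year.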

Although the plot communicates the information effectively, it needs a lot of space because there are so many facets. Therefore, I use {gganimate} again to bring the maps to life!

Second plot- animated map data


ani_plot3 <- distinct_map  %>% 
  mutate(cause = factor(cause, levels = c("Heart disease", "Cancer"))) %>% 
  ggplot(aes(long, lat, group = group, fill = cause)) +
  geom_polygon(color = "white", alpha = 0.7) +
  scale_fill_OkabeIto() +
  coord_map() +
  theme_void() +
  labs(title = "A Distinct Cause of Death\nCancer emerges as a leading cause in the early 2000s.",
       subtitle = "Year: {round(frame_time)}",
       caption = "Source: Centers for Disease Control and Prevention ",
       fill = "") +
  theme(plot.title = element_text(face = "bold",
                                  hjust = 0.5,
                                  vjust = 10),
        plot.subtitle = element_text(face = "italic",
                                     hjust = 0.5,
                                     vjust = 10),
        legend.position = "top") +
  transition_time(year) +
  ease_aes("cubic-in-out")



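# 200 frames at 10 fps would run for 20 s, while duration asks for 30 s;
# passing only two of nframes, fps, and duration avoids the ambiguity
animate(ani_plot3, 200, fps = 10, duration = 30, end_pause = 10,
        width = 800, height = 600, 
        renderer = gifski_renderer("plot3_ani.gif"))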
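To see how `transition_time()` and the `{frame_time}` label fit together without the full map, here is a minimal sketch on toy data. Everything in it is an assumption made for illustration: the two "states" are unit squares built by a hypothetical helper `square()`, and the causes assigned to them are made up. The rendering call is left commented out so the sketch runs without a renderer installed.

```r
library(ggplot2)
library(gganimate)

# hypothetical helper: one unit square per "state", tagged with year and cause
square <- function(x0, id, year, cause) {
  data.frame(long = x0 + c(0, 1, 1, 0), lat = c(0, 0, 1, 1),
             group = id, year = year, cause = cause)
}

# two toy states over two years; all values are made up
toy_map <- rbind(
  square(0,   1, 2000, "Heart disease"),
  square(1.5, 2, 2000, "Heart disease"),
  square(0,   1, 2001, "Cancer"),
  square(1.5, 2, 2001, "Heart disease")
)

p <- ggplot(toy_map, aes(long, lat, group = group, fill = cause)) +
  geom_polygon(color = "white") +
  coord_equal() +
  theme_void() +
  # frame_time holds the current value of the transition variable
  labs(subtitle = "Year: {round(frame_time)}") +
  transition_time(year)

# rendering needs a renderer such as {gifski}, and tweening polygons
# is expected to need {transformr}; uncomment to build the animation
# animate(p, nframes = 20, fps = 10)
```

Adding `transition_time(year)` is what turns the static plot into an animation object; the facets from the first plot are replaced by frames over time.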

This is not the first time I have created an animated map. I would still be struggling with the process of designing an animated map if it were not for Jordan Frey’s code. I hope that at some point the code provided here will be used or referenced by someone else :)

Conclusion

With these three parts of data visualization, the original three questions can now be answered.

Q1. How do leading causes of death change over the past 18 years?
A1: Heart disease is the leading cause of death over the years, and cancer is the second highest. Each year, the combined deaths from these top two causes are about three times the combined deaths from all other causes. Alzheimer’s disease appears to be climbing in rank within the top 10 causes of death over the years.

Q2. What are the changing patterns of each leading cause of death over the years?
A2: Heart disease, cancer, and Alzheimer’s disease appear to be the three causes most in need of research resources for disease intervention and prevention. Heart disease and cancer are the top two distinct causes of death over the years, with Alzheimer’s disease increasing rapidly.

Q3. What are the distinct causes of death in each state in the United States?
A3: Cancer has appeared as a distinct cause of death since the early 2000s. In the southern region, cancer became the distinct cause later and in fewer states overall.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Chen (2019, March 14). Szu-Hua Teresa Chen, PT, PhD: Distinct causes of death in each state of the United States. Retrieved from https://teresashchen.github.io/blog/posts/2019-03-14-cause-of-death-gganimate-with-geographical-data/

BibTeX citation

@misc{chen2019distinct,
  author = {Chen, Teresa},
  title = {Szu-Hua Teresa Chen, PT, PhD: Distinct causes of death in each state of the United States},
  url = {https://teresashchen.github.io/blog/posts/2019-03-14-cause-of-death-gganimate-with-geographical-data/},
  year = {2019}
}