library(tidyverse)
library(readxl)
library(ggmosaic) # For mosaic plotting
library(usmap) # For US Map plotting
library(usmapdata) # For centroid locations of US map to map geom_text
library(tidycensus) # For population and FIPS data
library(thematic) # Adapt plots to RMD theme
thematic_rmd()
library(showtext) # To customize fonts in ggplot
showtext_auto(enable = TRUE)
font_add_google("Raleway", family = "raleway")
font_add_google("Bebas Neue", family = "bebas neue")
The Behavioral Risk Factor Surveillance System (BRFSS) is a data set from an observational study conducted in 2013 by the CDC in all 50 US states and the District of Columbia, Puerto Rico, and Guam. The purpose of this study was to retrieve state-specific data on preventative health practices and risk behaviors from approximately 490,000 randomly selected individuals that reside in private residences or college housing. The survey was conducted with stratified random sampling from all participating districts.
This survey data allows the ability to ask research questions that we can generalize to the US population. However we can not make any causal statements or inferences based upon this data because it was strictly an observational study and no random assignment was conducted.
Potential non-response bias may be considered for this data since the surveys were conducted using land lines and cell phones. We also know that there is a population of those who do not have neither land lines or cell phones and that those individuals may be at the lower end of the income range.
Since the dataset contains 491,775 rows and 2,365 columns - the data frame for this analysis will only be loaded with columns pertaining to housing, location and vehicular behavior data.
The full codebook PDF for the 2013 BRFSS dataset can be found here.
Research Question 1: What are the probabilities of owning or renting a home given an individual’s income level and education level?
Research Question 2: How do property values influence home ownership or renter status among each US State?
Research Question 3: What is the correlation between individuals who claim to “always” wear seat belts and their states’ vehicle fatality rates?
What are the probabilities of owning or renting a home given an individual’s income level and education level?
brfss2013 <- read_csv("Data/brfss13.csv")
renthom1
: Home owners, renters, and “other
arrangement”
n
= Number of respondents
pc
= Percentage of total survey
brfss2013 %>% count(renthom1) %>% summarize(renthom1, n, pc = n/sum(n)) %>% arrange(desc(pc))
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
## always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## # A tibble: 4 × 3
## renthom1 n pc
## <chr> <int> <dbl>
## 1 Own 351636 0.715
## 2 Rent 110980 0.226
## 3 Other arrangement 20711 0.0421
## 4 <NA> 8448 0.0172
X_educag
: Education Levels (4
levels)
n
= Number of respondents
pc
= Percentage of total survey
brfss2013 %>% count(X_educag) %>% summarize(X_educag, n, pc = n/sum(n))
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
## always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## # A tibble: 5 × 3
## X_educag n pc
## <chr> <int> <dbl>
## 1 Attended college or technical school 134196 0.273
## 2 Did not graduate high school 42213 0.0858
## 3 Graduated from college or technical school 170118 0.346
## 4 Graduated high school 142968 0.291
## 5 <NA> 2280 0.00464
income2
: Income Levels (8 classes)
n
= Number of respondents
pc
= Percentage of total survey
brfss2013 %>% count(income2) %>% summarize(income2, n, pc = n/sum(n))
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
## always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## # A tibble: 9 × 3
## income2 n pc
## <chr> <int> <dbl>
## 1 $75,000 or more 115902 0.236
## 2 Less than $10,000 25441 0.0517
## 3 Less than $15,000 26794 0.0545
## 4 Less than $20,000 34873 0.0709
## 5 Less than $25,000 41732 0.0849
## 6 Less than $35,000 48867 0.0994
## 7 Less than $50,000 61509 0.125
## 8 Less than $75,000 65231 0.133
## 9 <NA> 71426 0.145
Create a new data frame and rename with the 3 variables selected for this analysis and remove any NA’s.
ds <- brfss2013 %>% select(income2, X_educag, renthom1) %>%
rename(income_lev = income2, edu_lev = X_educag) %>%
drop_na()
ds %>% head()
## # A tibble: 6 × 3
## income_lev edu_lev renthom1
## <chr> <chr> <chr>
## 1 Less than $75,000 Graduated from college or technical school Own
## 2 $75,000 or more Attended college or technical school Own
## 3 $75,000 or more Graduated from college or technical school Own
## 4 Less than $75,000 Graduated high school Own
## 5 Less than $50,000 Graduated from college or technical school Own
## 6 $75,000 or more Graduated from college or technical school Own
Relabel income values with numeric labels.
ds <- ds %>% mutate(income_lev = as.factor(case_when(
income_lev == "Less than $10,000" ~ "$0 - $10,000",
income_lev == "Less than $15,000" ~ "$10,000 - $15,000",
income_lev == "Less than $20,000" ~ "$15,000 - $20,000",
income_lev == "Less than $25,000" ~ "$20,000 - $25,000",
income_lev == "Less than $35,000" ~ "$25,000 - $35,000",
income_lev == "Less than $50,000" ~ "$35,000 - $50,000",
income_lev == "Less than $75,000" ~ "$50,000 - $75,000",
income_lev == "$75,000 or more" ~ "$75,000 +" ) ) )
Verify factor levels
ds$edu_lev <- factor(ds$edu_lev, levels = c("Did not graduate high school", "Graduated high school",
"Attended college or technical school", "Graduated from college or technical school"))
levels(ds$edu_lev)
## [1] "Did not graduate high school"
## [2] "Graduated high school"
## [3] "Attended college or technical school"
## [4] "Graduated from college or technical school"
levels(ds$income_lev)
## [1] "$0 - $10,000" "$10,000 - $15,000" "$15,000 - $20,000"
## [4] "$20,000 - $25,000" "$25,000 - $35,000" "$35,000 - $50,000"
## [7] "$50,000 - $75,000" "$75,000 +"
ds %>% ggplot(aes(x = renthom1)) +
geom_bar()
renthom1
- Home Residence Status shows that home owners comprised more than double the amount of “rent” and “other arrangement” responses.
ds %>% ggplot(aes(x = income_lev)) +
geom_bar() +
theme(axis.text.x = element_text(angle = 90))
income_lev
- Income levels of respondents is highly left skewed. Respondents over the $75,000 threshold almost doubles those in the $50,000-$75,000 range and decreases continuously down the income ranges.
ds %>% ggplot(aes(x = edu_lev)) +
geom_bar() + # Modify labels of ggplot2 barplot
scale_x_discrete(labels = function(x) str_wrap(levels(ds$edu_lev), width = 10))
edu_lev
- Education levels above “Did not graduate high school” make up 92% of respondents with “College graduates” taking about 20% more than the two median education levels.
Mosaic plots will show how education and income levels vary for each of the 3 residence types (own, rent, & other).
Each plot will display a weighted percentile label inside each box of
the mosaic plot. The first step is to build a data frame using
ggplot_build()
to locate all of the box coordinates to tell
ggplot where to map the percentiles on the plot. Since we have 4
education levels and 8 income levels, we will obtain 32 boxes for each
of the 3 mosaic plots.
# Calculate percentiles and X/Y locations by (own/rent/other)
gg_own <- ggplot_build(ds %>% filter(renthom1 == "Own") %>% ggplot() +
geom_mosaic(aes(x=product(edu_lev, income_lev), fill = edu_lev)))$data[[1]] %>%
summarize(pc = .wt/sum(.wt), label, xmin, xmax, ymin, ymax, .wt) %>% ungroup()
gg_rent <- ggplot_build(ds %>% filter(renthom1 == "Rent") %>% ggplot() +
geom_mosaic(aes(x=product(edu_lev, income_lev), fill = edu_lev)))$data[[1]] %>%
summarize(pc = .wt/sum(.wt), label, xmin, xmax, ymin, ymax, .wt) %>% ungroup()
gg_other <- ggplot_build(ds %>% filter(renthom1 == "Other arrangement") %>% ggplot() +
geom_mosaic(aes(x=product(edu_lev, income_lev), fill = edu_lev)))$data[[1]] %>%
summarize(pc = .wt/sum(.wt), label, xmin, xmax, ymin, ymax, .wt) %>% ungroup()
We now have the percentiles of each combination of income and
education (8x4=32) for each residence type in the pc
column, and the mapping in the xmin
, xmax
,
ymin
, ymax
columns.
Next, aggregate these individual percentiles into their marginal probabilities for income and education levels.
This function creates a data frame that calculates the marginal
probabilities, and maps them to the centroid of x
and
y
for geom_label
aes
.
geom_text_margin <- function(Own_Rent_Other_arrangement){ # Enter "Own", "Rent", or "Other arrangement"
r <- ggplot_build(ds %>% filter(renthom1 == Own_Rent_Other_arrangement) %>%
ggplot() + # Create gg_build object to give coordinates
geom_mosaic(aes(x=product(edu_lev, income_lev), fill = edu_lev)) +
ggtitle(Own_Rent_Other_arrangement) +
ylab("Education Level") +
xlab("Income Level") +
theme(axis.text.x = element_text(angle = 75, vjust = 0.5, hjust=0.5, size = 10, face = "bold"),
axis.text.y = element_blank(),
legend.position = "left",
legend.key.height = unit(100, "points"),
legend.title = element_blank(),
plot.title.position = "plot",
plot.title = element_text(hjust = 0.5, size = 28, face = "bold"),
axis.title = element_text(face = "bold", size = 15)) +
guides(fill = guide_legend(reverse=TRUE))) %>% .[[1]] %>% .[[1]] %>% # Pull out the data frame
mutate(x = case_when( # Add x and y columns by averaging min and max coordiantes to get center
x__income_lev == "$0 - $10,000" & x__fill__edu_lev == "Did not graduate high school" ~ (xmin+xmax)/2,
x__income_lev == "$10,000 - $15,000" & x__fill__edu_lev == "Did not graduate high school" ~ (xmin+xmax)/2,
x__income_lev == "$15,000 - $20,000" & x__fill__edu_lev == "Did not graduate high school" ~ (xmin+xmax)/2,
x__income_lev == "$20,000 - $25,000" & x__fill__edu_lev == "Did not graduate high school" ~ (xmin+xmax)/2,
x__income_lev == "$25,000 - $35,000" & x__fill__edu_lev == "Did not graduate high school" ~ (xmin+xmax)/2,
x__income_lev == "$35,000 - $50,000" & x__fill__edu_lev == "Did not graduate high school" ~ (xmin+xmax)/2,
x__income_lev == "$50,000 - $75,000" & x__fill__edu_lev == "Did not graduate high school" ~ (xmin+xmax)/2,
x__income_lev == "$75,000 +" & x__fill__edu_lev == "Did not graduate high school" ~ (xmin+xmax)/2),
y = case_when(
x__fill__edu_lev == "Did not graduate high school" & x__income_lev == "$75,000 +" ~ (ymin+ymax)/2,
x__fill__edu_lev == "Attended college or technical school" & x__income_lev == "$75,000 +" ~ (ymin+ymax)/2,
x__fill__edu_lev == "Did not graduate high school" & x__income_lev == "$75,000 +" ~ (ymin+ymax)/2,
x__fill__edu_lev == "Graduated from college or technical school" & x__income_lev == "$75,000 +" ~ (ymin+ymax)/2,
x__fill__edu_lev == "Graduated high school" & x__income_lev == "$75,000 +" ~ (ymin+ymax)/2)
) %>%
mutate(x = if_else(is.na(x),1,x), # Force income_lev marginals against the x axis
y = if_else(is.na(y),1,y) # Force edu_level marginals against the y axis
) %>%
mutate(pc = .wt/sum(.wt)) # Add percentile of each 32 possibilities (pc)
# Create marginal table of column sums
cat_inc <- r %>% group_by(x__income_lev) %>% summarize(wt_pc = sum(pc))
cat_edu <- r %>% group_by(x__fill__edu_lev) %>% summarize(wt_pc = sum(pc))
colnames(cat_inc) <- c("cat", "wt_pc")
colnames(cat_edu) <- c("cat", "wt_pc")
cat <- full_join(cat_edu, cat_inc)
# Merge r and cat manually
cat <- cat %>%
mutate(
x = case_when( # Map centroid x values to income_lev, edu_lev will be 1
cat == "Did not graduate high school" ~ 1,
cat == "Graduated high school" ~ 1,
cat == "Attended college or technical school" ~ 1,
cat == "Graduated from college or technical school" ~ 1,
cat == "$0 - $10,000" ~ r[r$label == "Did not graduate high school\n$0 - $10,000", "x"],
cat == "$10,000 - $15,000" ~ r[r$label == "Did not graduate high school\n$10,000 - $15,000", "x"],
cat == "$15,000 - $20,000" ~ r[r$label == "Did not graduate high school\n$15,000 - $20,000", "x"],
cat == "$20,000 - $25,000" ~ r[r$label == "Did not graduate high school\n$20,000 - $25,000", "x"],
cat == "$25,000 - $35,000" ~ r[r$label == "Did not graduate high school\n$25,000 - $35,000", "x"],
cat == "$35,000 - $50,000" ~ r[r$label == "Did not graduate high school\n$35,000 - $50,000", "x"],
cat == "$50,000 - $75,000" ~ r[r$label == "Did not graduate high school\n$50,000 - $75,000", "x"],
cat == "$75,000 +" ~ r[r$label == "Did not graduate high school\n$75,000 +", "x"]
),
y = case_when( # Map centroid y values to edu_lev, income_lev will be 1
cat == "Did not graduate high school" ~ r[r$label == "Did not graduate high school\n$75,000 +", "y"],
cat == "Graduated high school" ~ r[r$label == "Graduated high school\n$75,000 +", "y"],
cat == "Attended college or technical school" ~ r[r$label == "Attended college or technical school\n$75,000 +", "y"],
cat == "Graduated from college or technical school" ~ r[r$label == "Graduated from college or technical school\n$75,000 +", "y"],
cat == "$0 - $10,000" ~ 1,
cat == "$10,000 - $15,000" ~ 1,
cat == "$15,000 - $20,000" ~ 1,
cat == "$20,000 - $25,000" ~ 1,
cat == "$25,000 - $35,000" ~ 1,
cat == "$35,000 - $50,000" ~ 1,
cat == "$50,000 - $75,000" ~ 1,
cat == "$75,000 +" ~ 1
)
)
cat
}
ds %>% filter(renthom1 == "Own") %>% ggplot() +
geom_mosaic(aes(x=product(edu_lev, income_lev), fill = edu_lev)) +
ggtitle("Own") +
xlab("Income Level") +
scale_fill_discrete(labels = function(x) str_wrap(x, width = 10)) +
theme(axis.text.x = element_text(color = "white", size = 15, angle = 65, vjust = 0.65, hjust=0.5,
face = "bold", family = "bebas neue"),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_text(size = 22, hjust = 0.43, family = "bebas neue"),
legend.position = "left",
legend.justification = c(-1, 0.25),
legend.key.height = unit(95, "points"),
legend.text = element_text(size = 12, face = "bold", family = "raleway"),
legend.title = element_text(face = "bold", size = 20, hjust = 0.5, family = "bebas neue"),
plot.title.position = "plot",
plot.title = element_text(hjust = 0.5, size = 28, face = "bold", family = "bebas neue"),
axis.title = element_text(face = "bold", size = 18),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank()) +
guides(fill = guide_legend(title="Education\nLevel", reverse=TRUE),
colour = guide_legend(nrow = 2)) +
geom_text(data = gg_own, # Individual percentiles
aes(x = (xmin+xmax)/2, y = (ymin+ymax)/2, # min+max/2 gives the center location to the mapping
label = paste0(round(pc, 4)*100,"%")), # Multiply % by 100, round, and add % symbol
fontface = "bold",
family = "raleway") +
geom_label(data = geom_text_margin("Own"), # Marginal probabilities
aes(x = x, y = y,
label = paste0(round(wt_pc*100, 1),"%")),
fontface="bold.italic",
family = "raleway")
Home owner’s income levels are highly left skewed with the top half of income levels comprising over 77% of responses for home owners. Notice how wide the vertical bars on the right side of the chart are.
Home owner’s education levels are also highly left skewed with over 67% of respondents reporting education in the top half of the four education levels. Note the higher presence of purple and blue in the chart compared to green (27%) and red (5.7%).
ds %>% filter(renthom1 == "Rent") %>% ggplot() +
geom_mosaic(aes(x=product(edu_lev, income_lev), fill = edu_lev)) +
ggtitle("Rent") +
ylab("Education Level") +
xlab("Income Level") +
scale_fill_discrete(labels = function(x) str_wrap(x, width = 10)) +
theme(axis.text.x = element_text(color = "white", size = 15, angle = 65, vjust = 0.65, hjust=0.5,
face = "bold", family = "bebas neue"),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_text(size = 22, hjust = 0.43, family = "bebas neue"),
legend.position = "left",
legend.justification = c(-1, 0.25),
legend.key.height = unit(95, "points"),
legend.text = element_text(size = 12, face = "bold", family = "raleway"),
legend.title = element_text(face = "bold", size = 20, hjust = 0.5, family = "bebas neue"),
plot.title.position = "plot",
plot.title = element_text(hjust = 0.5, size = 28, face = "bold", family = "bebas neue"),
axis.title = element_text(face = "bold", size = 18),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank()) +
guides(fill = guide_legend(title="Education\nLevel", reverse=TRUE),
colour = guide_legend(nrow = 2)) +
geom_text(data = gg_rent, # Individual percentiles
aes(x = (xmin+xmax)/2, y = (ymin+ymax)/2, # min+max/2 gives the center location to the mapping
label = paste0(round(pc, 4)*100,"%")), # Multiply % by 100, round, and add % symbol
fontface = "bold",
family = "raleway") +
geom_label(data = geom_text_margin("Rent"), # Marginal probabilities
aes(x = x, y = y,
label = paste0(round(wt_pc*100, 1),"%")),
fontface="bold.italic",
family = "raleway")
Rental respondents report having a much more unimodal distribution in income levels with a tight range of 9.9%-14.4%. However the lower bound of the distribution range does fall on the highest income levels, and the upper bound falls on the lower end of income levels. This indicates that while this distribution appears unimodal, it is skewed in the opposite direction from the home owner’s income levels.
Renters education levels follow a similar distribution as the home owners however the entire distribution shifts downward with the top education level shrinking by nearly half (40% to 26%), and the lowest education level more than doubling (5.7% to 13.7%).
ds %>% filter(renthom1 == "Other arrangement") %>% ggplot() +
geom_mosaic(aes(x=product(edu_lev, income_lev), fill = edu_lev)) +
ggtitle("Other Arrangement") +
ylab("Education Level") +
xlab("Income Level") +
scale_fill_discrete(labels = function(x) str_wrap(x, width = 10)) +
theme(axis.text.x = element_text(color = "white", size = 15, angle = 65, vjust = 0.65, hjust=0.5,
face = "bold", family = "bebas neue"),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_text(size = 22, hjust = 0.43, family = "bebas neue"),
legend.position = "left",
legend.justification = c(-1, 0.25),
legend.key.height = unit(95, "points"),
legend.text = element_text(size = 12, face = "bold", family = "raleway"),
legend.title = element_text(face = "bold", size = 20, hjust = 0.5, family = "bebas neue"),
plot.title.position = "plot",
plot.title = element_text(hjust = 0.5, size = 28, face = "bold", family = "bebas neue"),
axis.title = element_text(face = "bold", size = 18),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank()) +
guides(fill = guide_legend(title="Education\nLevel", reverse=TRUE),
colour = guide_legend(nrow = 2)) +
geom_text(data = gg_other, # Individual percentiles
aes(x = (xmin+xmax)/2, y = (ymin+ymax)/2, # min+max/2 gives the center location to the mapping
label = paste0(round(pc, 4)*100,"%")), # Multiply % by 100, round, and add % symbol
fontface = "bold",
family = "raleway") +
geom_label(data = geom_text_margin("Other arrangement"), # Marginal probabilities
aes(x = x, y = y,
label = paste0(round(wt_pc*100, 1),"%")),
fontface="bold.italic",
family = "raleway")
“Other arrangement” respondents income level has a similar distribution to the renters but with a quite larger uptick (Δ4.6%) in the lowest income level.
“Other arrangement” education levels follow a similar distribution as the renters with the entire distribution shifting downward from the top where the home owner’s was and with the top education level shrinking by nearly half (40% to 22%), and the lowest education level more than doubling (5.7% to 13.8%).
options(digits = 4)
# P(A) - Marginal Probability of Home Residence Status (Home, Rent, and Other) ---
p_A <- ds %>% group_by(renthom1) %>% summarise(n = n(), p = n/nrow(.))
# P Own
p_A1 <- filter(p_A, renthom1 == "Own") %>% select(p)
# P Rent
p_A2 <- filter(p_A, renthom1 == "Rent") %>% select(p)
# P Other arrangement
p_A3 <- filter(p_A, renthom1 == "Other arrangement") %>% select(p)
# -------------------------------------
# P(B|A) - Conditional Probability of B (income level), given A (home residence status) ---
# Income | Own
p_B1 <- ds %>% filter(renthom1 == "Own") %>% group_by(income_lev) %>% summarise(n = n(), p = n/nrow(.))
# Income | Rent
p_B2 <- ds %>% filter(renthom1 == "Rent") %>% group_by(income_lev) %>% summarise(n = n(), p = n/nrow(.))
# Income | Other
p_B3 <- ds %>% filter(renthom1 == "Other arrangement") %>% group_by(income_lev) %>% summarise(n = n(), p = n/nrow(.))
# -------------------------------------
# P(C|A&B) - Conditional Probability of C (education level), given A (home residence status) & B (income level) ---
p_C1 <- { own_edu_inc <- list() # Preallocate list to record each iteration
for(i in seq_along(unique(ds$income_lev))){ # Filters to residence status and iterates through income levels
own_edu_inc[[i]] <- ds %>% filter(renthom1 == "Own" & income_lev == levels(ds$income_lev)[i]) %>%
group_by(edu_lev) %>% summarise(n = n(), p = n/nrow(.)) } # Group by edu_lev to get weights among each income level
own_edu_inc <- setNames(own_edu_inc, levels(ds$income_lev)) # Rename list elements to match income levels
own_edu_inc <- bind_rows(own_edu_inc, .id = "income") # Bind list elements into a data frame with income level names (.id)
}
p_C2 <- { rent_edu_inc <- list()
for(i in seq_along(unique(ds$income_lev))){
rent_edu_inc[[i]] <- ds %>% filter(renthom1 == "Rent" & income_lev == levels(ds$income_lev)[i]) %>%
group_by(edu_lev) %>% summarise(n = n(), p = n/nrow(.)) }
rent_edu_inc <- setNames(rent_edu_inc, levels(ds$income_lev))
rent_edu_inc <- bind_rows(rent_edu_inc, .id = "income")
}
p_C3 <- { other_edu_inc <- list()
for(i in seq_along(unique(ds$income_lev))){
other_edu_inc[[i]] <- ds %>% filter(renthom1 == "Other arrangement" & income_lev == levels(ds$income_lev)[i]) %>%
group_by(edu_lev) %>% summarise(n = n(), p = n/nrow(.)) }
other_edu_inc <- setNames(other_edu_inc, levels(ds$income_lev))
other_edu_inc <- bind_rows(other_edu_inc, .id = "income")
}
# -------------------------------------
# Joint Probabilities of A∩B∩C ---
own_proababilities <- p_C1 %>% # Add p_B values
left_join(p_B1 %>% select(1,3), by = c("income" = "income_lev"), suffix = c("C", "B")) %>%
mutate(pA = p_A[1,3]) %>% unnest(pA) %>% rename(pA = p) %>% # Add p_A values
mutate(joint = pC*pB*pA) # Create joint probabilities
rent_proababilities <- p_C2 %>% # Add p_B values
left_join(p_B2 %>% select(1,3), by = c("income" = "income_lev"), suffix = c("C", "B")) %>%
mutate(pA = p_A[2,3]) %>% unnest(pA) %>% rename(pA = p) %>% # Add p_A values
mutate(joint = pC*pB*pA) # Create joint probabilities
other_proababilities <- p_C3 %>% # Add p_B values
left_join(p_B3 %>% select(1,3), by = c("income" = "income_lev"), suffix = c("C", "B")) %>%
mutate(pA = p_A[3,3]) %>% unnest(pA) %>% rename(pA = p) %>% # Add p_A values
mutate(joint = pC*pB*pA) # Create joint probabilities
# -----------------------------------------
# P(A|B∩C) using loops with Bayes Theorem ---
own_proababilities$A_given_BandC <- 1:nrow(own_proababilities)
for(i in 1:nrow(own_proababilities)){
own_proababilities$A_given_BandC[i] <- own_proababilities[i, "joint"] /
(own_proababilities[i,"joint"] + rent_proababilities[i,"joint"] + other_proababilities[i, "joint"])
}
own_proababilities <- own_proababilities %>% unnest(A_given_BandC)
rent_proababilities$A_given_BandC <- 1:nrow(rent_proababilities)
for(i in 1:nrow(rent_proababilities)){
rent_proababilities$A_given_BandC[i] <- rent_proababilities[i, "joint"] /
(own_proababilities[i,"joint"] + rent_proababilities[i,"joint"] + other_proababilities[i, "joint"])
}
rent_proababilities <- rent_proababilities %>% unnest(A_given_BandC)
other_proababilities$A_given_BandC <- 1:nrow(other_proababilities)
for(i in 1:nrow(other_proababilities)){
other_proababilities$A_given_BandC[i] <- other_proababilities[i, "joint"] /
(own_proababilities[i,"joint"] + rent_proababilities[i,"joint"] + other_proababilities[i, "joint"])
}
other_proababilities <- other_proababilities %>% unnest(A_given_BandC)
# -------------------------------------------
Math Help Sources:
- https://stats.stackexchange.com/questions/258379/why-is-pa-bc-pbc-pab-c
- https://stats.stackexchange.com/questions/318674/what-is-pa-b-c-when-b-c-are-both-independent
- https://stats.stackexchange.com/questions/288260/what-does-pabpac-simplify-to
- https://ctools.ece.utah.edu/Probability/ConditionalProb/DiscreteRandVars/ProbCondDiscreteEx4.pdf
own_proababilities %>% ggplot(aes(x = income, y = A_given_BandC, color = edu_lev, group = edu_lev)) +
geom_line() +
geom_point() +
guides(color = guide_legend(reverse=TRUE)) +
scale_color_discrete("Education Level") +
ylab("P(A|B&C)") +
xlab("Income Level") +
ggtitle(expression(atop(
paste("Probability of ", underline("Ownership"), " Status Given Income & Education Level")))) +
theme(axis.text.x = element_text(angle = 90),
axis.text = element_text(family = "raleway"),
axis.title = element_text(family = "bebas neue", size = 14),
plot.title = element_text(family = "bebas neue", hjust = 0.5),
legend.title = element_text(family = "bebas neue"),
legend.text = element_text(family = "raleway"),
legend.key.height = unit(30, "points"),
panel.grid.minor = element_blank()) +
scale_y_continuous(n.breaks = 15) +
scale_color_discrete(name = "Education Level",
labels = function(x) str_wrap(levels(ds$edu_lev), width = 20))
According to these probabilities, owning a home is roughly 5% - 10% more likely for every level that is increased along the income scale with the lowest income class averaging a 35% chance of owning a home and the highest income class averaging an 85% chance of owning a home.
Education levels are fairly grouped together among all 4. The high school graduate rate is surprisingly higher than college attendees and college graduates throughout most of the plot which would indicate that graduating high school is the most meaningful way to increase the probabilities of owning a home. The spread between college attendees and college graduates is also very minimal which would indicate that college graduation does not meaningfully impact the chances of owning a home more than a college attendee’s.
rent_proababilities %>% ggplot(aes(x = income, y = A_given_BandC, color = edu_lev, group = edu_lev)) +
geom_line() +
geom_point() +
guides(color = guide_legend(reverse=TRUE)) +
scale_color_discrete("Education Level") +
ylab("P(A|B&C)") +
xlab("Income Level") +
ggtitle(expression(atop(
paste("Probability of ", underline("Renter"), " Status Given Income & Education Level")))) +
theme(axis.text.x = element_text(angle = 90),
axis.text = element_text(family = "raleway"),
axis.title = element_text(family = "bebas neue", size = 14),
plot.title = element_text(family = "bebas neue", hjust = 0.5),
legend.title = element_text(family = "bebas neue"),
legend.text = element_text(family = "raleway"),
legend.key.height = unit(30, "points"),
panel.grid.minor = element_blank()) +
scale_y_continuous(n.breaks = 15) +
scale_color_discrete(name = "Education Level",
labels = function(x) str_wrap(levels(ds$edu_lev), width = 20))
The probability of being a renter rapidly decreases as respondents move further up the income scale. Respondents in the bottom of the income range are nearly 5 times more likely to be a renter than those in the highest income class.
Education levels are again fairly grouped together with the highest spread being among high school graduated and high school drop outs. This indicates again that high school graduation rapidly decreases the decreases the chances of being a renter despite income class. College attendees and college graduates are nearly identical again suggesting little to no decrease in probability between the two among income levels.
other_proababilities %>% ggplot(aes(x = income, y = A_given_BandC, color = edu_lev, group = edu_lev)) +
geom_line() +
geom_point() +
guides(color = guide_legend(reverse=TRUE)) +
scale_color_discrete("Education Level") +
ylab("P(A|B&C)") +
xlab("Income Level") +
ggtitle(expression(atop(
paste("Probability of ", underline("Other Arrangement"), " Status Given Income & Education Level")))) +
theme(axis.text.x = element_text(angle = 90),
axis.text = element_text(family = "raleway"),
axis.title = element_text(family = "bebas neue", size = 14),
plot.title = element_text(family = "bebas neue", hjust = 0.5),
legend.title = element_text(family = "bebas neue"),
legend.text = element_text(family = "raleway"),
legend.key.height = unit(30, "points"),
panel.grid.minor = element_blank()) +
scale_y_continuous(n.breaks = 15) +
scale_color_discrete(name = "Education Level",
labels = function(x) str_wrap(levels(ds$edu_lev), width = 20))
Remembering that the “other arrangement” response is an ambiguous class of home residence status and considering this response was less than 5% this analysis will not hold as much weight at the prior two but we must still interpret the results.
The highest of probabilities in this group come from the bottom income class and then decreases exponentially by roughly 40% to the second tier income group. From the second tier to the top tier of income the decline in more linear suggesting the biggest contributor to lower the probability of “other arrangement” is getting out of the lowest income level.
Education levels are again fairly grouped together with the spread mostly appearing to widen only near the top two ends of the income range. This indicates that education is far less meaningful in decreasing the probability of “other arrangement” than in the prior two analyses. Again we also see that not graduating high school is the most meaningful increase in probability among income levels.
How do US State property values affect home ownership and renter rates?
In addition to the renthom1
responses and
X_state
in the BRFSS2013
data, we will need to
bring in an additional data set.
zhvi_df
- The Zillow Home
Value Index will be used to retrieve the average property value for
each month of 2013 from all 50 states.
renthom1
: Home owners, renters, and other
arrangement
X_state
: All 50 US States
pc
: Percentage of renthom1
for each
state
value
: Average 2013 property value for each state
Import data.
# 2013 ZHVI pulled from fredr package, csv uploaded to avoid API key requirement
zhvi_df <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQdLUASpAAazGWjC0qN9lP0M5v9mNguGK9JW-TPR6KQ7v40w8nXhxnCrwpBNM7bSXfjEkVXcWOOl98g/pub?output=csv")
zhvi_df %>% head()
## # A tibble: 6 × 2
## state value
## <chr> <dbl>
## 1 AL 120507
## 2 AK 247127
## 3 AZ 176595
## 4 AR 107318
## 5 CA 365339
## 6 CO 248422
# Create list of state abbreviates from state names
ab <- fips_codes %>% select(state, state_name) %>% unique()
Calculate the percentage of home owners, renters, and other for each
state from brfss2013
and join
with
zhvi_df
and ab
(state abbreviations).
# Percentage of home owners per state
state_own <- brfss2013 %>% select(X_state, renthom1) %>% group_by(X_state) %>%
count(renthom1) %>% drop_na() %>%
summarize(renthom1, pc = n/sum(n)) %>%
filter(renthom1 == "Own" & !X_state %in% c("District of Columbia", "Puerto Rico", "Guam")) %>%
left_join(ab, by = c("X_state" = "state_name")) %>%
mutate(X_state = state) %>% select(-state) %>%
left_join(zhvi_df, by = c("X_state" = "state"))
# Percentage of renters per state
state_rent <- brfss2013 %>% select(X_state, renthom1) %>% group_by(X_state) %>%
count(renthom1) %>% drop_na() %>%
summarize(renthom1, pc = n/sum(n)) %>%
filter(renthom1 == "Rent" & !X_state %in% c("District of Columbia", "Puerto Rico", "Guam")) %>%
left_join(ab, by = c("X_state" = "state_name")) %>%
mutate(X_state = state) %>% select(-state) %>%
left_join(zhvi_df, by = c("X_state" = "state"))
# Percentage of other arrangement per state
state_other <- brfss2013 %>% select(X_state, renthom1) %>% group_by(X_state) %>%
count(renthom1) %>% drop_na() %>%
summarize(renthom1, pc = n/sum(n)) %>%
filter(renthom1 == "Other arrangement" & !X_state %in% c("District of Columbia", "Puerto Rico", "Guam")) %>%
left_join(ab, by = c("X_state" = "state_name")) %>%
mutate(X_state = state) %>% select(-state) %>%
left_join(zhvi_df, by = c("X_state" = "state"))
state_own %>% ggplot(aes(x = value, y = pc)) +
geom_point() +
geom_smooth(color = "red", method = "lm") +
scale_x_continuous(labels = scales::dollar_format(),
breaks = c(100000, 150000, 200000, 250000, 300000, 350000, 400000, 450000, 500000)) +
scale_y_continuous(labels = scales::percent_format(),
breaks = c(.55, .6, .65, .7, .75, .8)) +
labs(title = "Home Owners per State by Property Value",
subtitle = paste0("Correlation = ", round(cor(state_own$value, state_own$pc), 3))) +
xlab("Mean ZHVI State Home Values (2013)") +
ylab("Percentage of Home Owners Per State") +
theme(panel.grid.minor = element_blank(),
plot.title = element_text(face = "bold", size = 18, hjust = 0.5, family = "bebas neue"),
plot.subtitle = element_text(face = "italic", hjust = 0.5),
axis.title = element_text(face = "bold", size = 12, family = "bebas neue"),
axis.text = element_text(face = "bold", size = 10, family = "raleway"),
axis.text.x = element_text(margin = margin(t = 8), angle = 25))
According to the amount of home owners in the
brfss2013
survey, there is a far higher concentration of home owners in states that are more affordable. Remembering that the percentage of home owners was calculated from number of respondents for each state implies that the total number of surveys in each state most likely does not effect these results.Since mostly home owners were interviewed, the y-axis fully above 50% is not meaningful so we are more concerned with the change in y values along the property value levels (the slope).
state_rent %>% ggplot(aes(x = value, y = pc)) +
geom_point() +
geom_smooth(color = "green", method = "lm")+
scale_x_continuous(labels = scales::dollar_format(),
breaks = seq(100000, 500000, by = 50000)) +
scale_y_continuous(labels = scales::percent_format()) +
labs(title = "Renters per State by Property Value",
subtitle = paste0("Correlation = ", round(cor(state_rent$value, state_rent$pc), 3))) +
xlab("Mean ZHVI State Home Values (2013)") +
ylab("Percentage of Renters Per State") +
theme(panel.grid.minor = element_blank(),
plot.title = element_text(face = "bold", size = 18, hjust = 0.5, family = "bebas neue"),
plot.subtitle = element_text(face = "italic", hjust = 0.5),
axis.title = element_text(face = "bold", size = 12, family = "bebas neue"),
axis.text = element_text(face = "bold", size = 10, family = "raleway"),
axis.text.x = element_text(margin = margin(t = 8), angle = 25))
- The x (property) values are still constant so while the dots might appear similar formation, we are still more concerned the degree of change in the y values along the x-axis. This time that change (slope) is upward (+0.641) reflecting that property values do influence the amount of renters per state.
state_other %>% ggplot(aes(x = value, y = pc)) +
geom_point() +
geom_smooth(color = "blue", method = "lm")+
scale_x_continuous(labels = scales::dollar_format(),
breaks = seq(100000, 500000, by = 50000)) +
scale_y_continuous(labels = scales::percent_format()) +
labs(title = "\"Other Arrangement\" per State by Property Value",
subtitle = paste0("Correlation = ", round(cor(state_other$value, state_other$pc), 3))) +
xlab("Mean ZHVI State Home Values (2013)") +
ylab("Percentage of \"Other Arrangement\" Per State") +
theme(panel.grid.minor = element_blank(),
plot.title = element_text(face = "bold", size = 18, hjust = 0.5, family = "bebas neue"),
plot.subtitle = element_text(face = "italic", hjust = 0.5),
axis.title = element_text(face = "bold", size = 12, family = "bebas neue"),
axis.text = element_text(face = "bold", size = 10, family = "raleway"),
axis.text.x = element_text(margin = margin(t = 8), angle = 25))
Again remembering that the “other arrangement” response is an ambiguous class of home residence status and considering this response was less than 5% this analysis will not hold as much weight at the prior two but we must still interpret the results.
The misleading nature of the linear regression line here must be noted. If Hawaii (the only point above $400,000) were excluded, this line would show a negative correlation in “other arrangement” responses.
Which states have poor seat belt habits and do they correlate to states with higher vehicle fatalities?
Limitations: Total number of adults over 18 per state (what was calculated), is not the same as number of eligible licensed drivers per state (what could not be obtained).
In addition to the seatbelt
responses in the
BRFSS2013
data, we will need to bring in 3 more data
sets.
FARS2013
- The Fatality
Analysis Reporting System is a database provided by the National
Highway Traffic Safety Administration for data regarding motor vehicle
injuries and fatalities.
US_POPULATION_ESTIMATES
- Yearly
State Population Estimates provided by the United States Census
Bureau.
STATE_AGES
- Population
Distribution by Age is provided by the Kaiser Family Foundation, a
nonprofit organization focusing on national health issues, as well as
the U.S. role in global health policy.
X_state
: 51 States (including District of Columbia)
seatbelt
(transposed): Responses per state; “Always”,
“Nearly always”, “Sometimes”, “Seldom”, or “Never”.
always_pc
(calculated): Percentage of “Always” response
per state.
POPESTIMATE2013
: Total population estimate of each state
for 2013.
Children 0-18
: Percentage of children under 18-years-old
per state in 2013.
adults
(calculated): Total estimated number of adults
per state in 2013.
fatals
: Number of motor vehicle fatalities per state in
2013.
fatals_permil
(calculated): Motor vehicle fatalities per
1 million adults in each state during 2013.
Pull the seatbelt
variable from brfss2013
and transpose it to get a count of each response per state.
seatbelt <- brfss2013 %>% filter(!X_state %in% c("0", "80", "Puerto Rico", "Guam")) %>%
group_by(X_state) %>% count(seatbelt) %>%
filter(!seatbelt %in% c("Never drive or ride in a car", NA)) %>% # Remove non vehicle riders and NA's
pivot_wider(names_from = seatbelt, values_from = n)
seatbelt %>% head()
## # A tibble: 6 × 6
## # Groups: X_state [6]
## X_state Always `Nearly always` Never Seldom Sometimes
## <chr> <int> <int> <int> <int> <int>
## 1 Alabama 5546 275 67 58 158
## 2 Alaska 3355 436 130 78 196
## 3 Arizona 3383 250 68 31 116
## 4 Arkansas 4035 427 75 61 195
## 5 California 9017 300 77 32 101
## 6 Colorado 10585 992 188 152 360
Import and prepare the FARS2013
data set.
FARS2013 <- read_csv("Data/ACCIDENT.csv")
# Pull STATE & FATALS, group by state and total fatalities per state
FARS2013 <- FARS2013 %>% select(STATE, FATALS) %>%
group_by(STATE) %>% summarize(fatals = sum(FATALS))
# Pull FIPS code for each state_name
states <- fips_codes %>% select(state_code, state_name) %>%
mutate(state_code = as.numeric(state_code)) %>% # Because FARS FIPS were numeric
unique()
# Join state_name on FIPS & fatalities, then remove FIPS
FARS2013 <- FARS2013 %>% left_join(states, by = c("STATE" = "state_code")) %>% select(-STATE)
Retrieve POPESTIMATE2013
from
US_POPULATION_ESTIMATES
for State Population Estimate from
2013.
# Only pull name of state and 2013 estimate
pop <- read_csv("https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/national/totals/nst-est2019-alldata.csv") %>%
select(NAME, POPESTIMATE2013)
Retrieve Children 0-18
variable from
STATE_AGES
for each state to calculate number of
adults.
download.file("https://docs.google.com/spreadsheets/d/e/2PACX-1vSndFprgfW4wYC9WxGgDuFMR1IeKv1cu3dPdaPtjjAYWFqkrE8sdRHf-iuDWVIEOrIn8jg15qwnfTDm/pub?output=csv",
destfile = "Data/children.csv")
children <- read_csv("Data/children.csv", skip = 3) %>%
select(Location, `Children 0-18`)
Joining them all together.
# Join `FARS2013` with `pop`
FARS2013 <- FARS2013 %>% left_join(pop, by = c("state_name" = "NAME"))
# Join `FARS2013` with `children`, then calculate number of fatalities per 1 million adults
FARS2013 <- FARS2013 %>% left_join(children, by = c("state_name" = "Location")) %>%
mutate(adults = round((1 - `Children 0-18`) * POPESTIMATE2013, 0), # Percentage of adults * total estimate
fatals_permil = round(fatals/adults*1000000, 0)) %>% # Number of fatalities per 1 million adults
select(-POPESTIMATE2013, -`Children 0-18`) # No longer need estimate or child percentage
# Join `FARS2013` with `seatbelt`, then calculate percentage of "Always" response per state
FARS2013 <- FARS2013 %>%
left_join(seatbelt, by = c("state_name" = "X_state")) %>%
group_by(state_name) %>%
mutate(always_pc = Always/sum(Always, `Nearly always`, Sometimes, Seldom, Never),
.before = Always)
# Reorder state_name factor levels in ascending fatalities per million
levels <- FARS2013 %>% arrange(fatals_permil) %>% ungroup() %>% pull(state_name) %>% as.factor(.)
FARS2013$state_name <- factor(FARS2013$state_name, levels = levels)
FARS2013 <- FARS2013 %>% rename(state = state_name) # plot_usmap() requires variable to be named `state` or `fips`
FARS2013 %>% head()
## # A tibble: 6 × 10
## # Groups: state [6]
## fatals state adults fatals_permil always_pc Always `Nearly always` Never
## <dbl> <fct> <dbl> <dbl> <dbl> <int> <int> <int>
## 1 853 Alabama 3.63e6 235 0.909 5546 275 67
## 2 51 Alaska 5.37e5 95 0.800 3355 436 130
## 3 849 Arizona 4.92e6 173 0.879 3383 250 68
## 4 498 Arkansas 2.20e6 226 0.842 4035 427 75
## 5 3107 California 2.86e7 109 0.946 9017 300 77
## 6 482 Colorado 3.95e6 122 0.862 10585 992 188
## # ℹ 2 more variables: Seldom <int>, Sometimes <int>
To obtain the centroid locations for all 51 states in this analysis,
we will use usmapdata::centroid_labels()
to plot values on
top of a map.
#gg_FARS2013 <- FARS2013 %>% left_join(centroid_labels("states"), by = c("state" = "full")) %>%
# select(-fips, -abbr) # Remove unnecessary columns provided from centroid_labels()
gg_FARS2013 <- FARS2013 %>%
left_join(centroid_labels("states"), by = c("state" = "full")) %>%
mutate(geom = str_replace_all(geom, pattern = "[c\\(\\)]", replacement = "")) %>% # Remove 'c()' from the strings
mutate(geom = str_replace_all(geom, pattern = ",", replacement = " ")) %>% # Replace commas with spaces
separate(geom, into = c("x", "y"), sep = "\\s+", convert = TRUE) # Separate into x and y, converting to numeric
## Warning: There were 51 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `geom = str_replace_all(geom, pattern = "[c\\(\\)]", replacement
## = "")`.
## ℹ In group 1: `state = "Alabama"`.
## Caused by warning in `stri_replace_all_regex()`:
## ! argument is not an atomic vector; coercing
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 50 remaining warnings.
usmap
Packageplot_usmap(data = FARS2013, values = "fatals_permil", color = "black") +
scale_fill_continuous(
limits = c(50, 300),
breaks = c(50, 100, 150, 200, 250, 300),
low = "white", high = "red",
name = "Motor Vehicle Deaths\nper 1 Million Adults") +
ggtitle("Fatalities per Million by U.S. State (2013)") +
theme(plot.title = element_text(family = "bebas neue", color = "white", face = "bold",
size = 18, hjust = 0.5),
legend.text = element_text(family = "raleway", color = "white", face = "bold", size = 10),
legend.title = element_text(family = "bebas neue", color = "white", face = "bold",
size = 10, hjust = 0.5, vjust = 1),
legend.key.height = unit(30, "points"),
legend.key.width = unit(25, "points"),
legend.justification = "center",
legend.position = "right",
legend.background = element_blank(),
plot.background = element_blank()) +
geom_text(aes(x = x, y = y),
color = "black",
data = gg_FARS2013,
label = round(FARS2013$fatals_permil, 0),
size = 3,
family = "raleway")
- The highest concentrated regions for fatalities per million adults by state appear in the north mid west and the south.
plot_usmap(data = FARS2013, values = "always_pc", color = "black") +
scale_fill_continuous(
low = "red", high = "white",
name = "Percentage That \n\"Always\" Wear Seat Belts",
label = scales::label_percent()) +
ggtitle("Seat Belt Habits Among U.S. States") +
theme(plot.title = element_text(family = "bebas neue", color = "white", face = "bold",
size = 18, hjust = 0.5),
legend.text = element_text(family = "raleway", color = "white", face = "bold", size = 10),
legend.title = element_text(family = "bebas neue", color = "white", face = "bold",
size = 10, hjust = 0.5, vjust = 1),
legend.key.height = unit(30, "points"),
legend.key.width = unit(25, "points"),
legend.justification = "center",
legend.position = "right",
legend.background = element_blank(),
plot.background = element_blank()) +
geom_text(aes(x = x, y = y),
color = "black",
data = gg_FARS2013,
label = paste0(round(FARS2013$always_pc*100,0),"%"),
size = 3,
family = "raleway")
- The highest concentrated regions for poor seat belt habits by state appear again in the north mid west and New Hampshire.
Since the north mid west was the only obvious visual comparison among both maps, plot these points on a scatter plot and use the same linear regression line from prior to get a better visual of how these two variables compare.
FARS2013 %>% ggplot(aes(x = always_pc, y = fatals_permil)) +
geom_point() +
stat_smooth(color = "red", method = lm) +
# theme_bw() +
labs(title = "Better Seat Belt Habits Moderately Correlate to Fewer Vehicle Fatalities",
subtitle = paste0("Correlation = ", round(cor(FARS2013$fatals_permil, FARS2013$always_pc), 3))) +
xlab("\"Always\" Wear Seatbelts") +
ylab("Fatalities per Million by State") +
scale_x_continuous(label = scales::label_percent()) +
scale_y_continuous(breaks = c(50, 100, 150, 200, 250, 300)) +
theme(plot.title = element_text(family = "bebas neue", size = 15, face = "bold", hjust = 0.5),
plot.subtitle = element_text(face = "italic", hjust = 0.5),
axis.title = element_text(family = "bebas neue", size = 14, face = "bold"),
axis.text = element_text(family = "raleway", color = "white", size = 10, face = "bold"))
There is a moderate correlation (-0.37) to fewer vehicle fatalities and better seat belt habits.
So wear your seat belts!