A while back, I wrote an introductory post to what I envisioned as a data-driven series of posts about cycling in DC. I did follow up to that post with a brief overview of working with Open Data DC data in Rust, but I failed to revisit the series. 😅
Well, I’m finally revisiting it!
Why I’m revisiting the “Cycling in DC” series
My last blog post was all about how my transit habits have changed since I officially moved to the city in June of last year. One of the primary takeaways was that I now tend to walk or bike everywhere.
Well, my major update since then is that I sold my car, and am now officially car-free! 🎉 This is something that I’ve been working towards/thinking about for quite a while, and I finally made the jump.
Now, my “car” is my e-bike–which I bought a few weeks before I sold my car with the intention of using it as my primary mode of transport.
Since getting my e-bike, I’ve been out cycling more frequently than before. I keep a mental map of the various routes I can use to get places, and tend to stick to bike lanes that I feel are safe (or at least, safer than others), roads that have less traffic, or trails like the Capital Crescent or Rock Creek Park. I usually feel pretty safe biking around the city, but my partner worries quite a bit whenever I’m out and about.
There are definitely routes that I hate taking, for example: M Street towards Georgetown. While there is a bike lane, it feels really dicey…and once you reach Georgetown, you’re basically just dumped into traffic.
Note
I’d love to see M Street in Georgetown pedestrianized entirely, both as a pedestrian and cyclist…but I’m well aware that will probably never happen.
Generally though, now that my main mode of transportation is cycling, I am far more conscious of the gaps in the bike network–and far more concerned that something might happen to me. I’m mostly concerned about where I’m most at risk of being hit by a car, and how I can stay safely away from that situation.
So, like any data-minded person, I decided to dive into the data and see what sticks out.
Exploring the Crashes Dataset
We can explore data provided by Open Data DC. The main dataset we’ll be using is the Crashes dataset, which is the source the DC Government uses for its own reporting.
The crashes dataset contains a lot more than just incidents related to cyclists. Specifically, it contains:
publicly-available, mapped locations of the Metropolitan Police Department’s (MPD) reported crash records. DDOT processes new crash reports each night and creates a mapped point for each crash, provided the MPD has sufficient location info (good quality latitude/longitude coordinates and/or address information). Note therefore that any crashes that occur in the District on Federal Lands are investigated by the US Park Police or other agencies, and are not recorded in this MPD crash data.
Below is my setup for exploring this data, along with some helper functions that make it a bit easier to pull this data.
I’m not going to walk through these helper functions step-by-step, as they’re very similar to the functions I used in the first post. The main difference is that they now use httr2!
Code
library(dplyr)
library(httr2)
library(ggplot2)
library(janitor, include.only = "clean_names")
library(lubridate)
library(plotly)
library(purrr)
library(tidyr)

## these are some helper functions to work with Open Data DC.
## generally, I use the geoJSON API because it's the one I'm most
## familiar with for this data source.
get_dc_crashes_data <- function(rate_limit = 1000, offset = 0, record_limit = NULL) {
  if (rate_limit > 1000) {
    warning(
      sprintf(
        "A limit value of %s exceeds the allowed rate limit. Defaulting to maximum of 1000.",
        rate_limit
      )
    )
    rate_limit <- 1000
  }

  ## when a record limit is given, make a single request of that size
  if (!is.null(record_limit)) {
    rate_limit <- record_limit
  }

  url <- "https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Public_Safety_WebMercator/MapServer/24/query?outFields=*&where=1%3D1&f=geojson"

  results_list <- list()
  n_responses <- rate_limit

  ## keep requesting pages until a page comes back empty
  while (n_responses > 0) {
    r <- request(url) |>
      req_url_query(
        resultOffset = offset,
        resultRecordCount = rate_limit
      )

    resp <- req_perform(r)

    if (resp_status(resp) != 200) {
      stop("Bad request.")
    }

    json_resp <- resp_body_json(resp)
    results_list <- append(results_list, json_resp[["features"]])

    if (!is.null(record_limit)) {
      n_responses <- 0
    } else {
      n_responses <- length(json_resp[["features"]])
    }

    offset <- offset + n_responses
  }

  if (offset > 0 && length(results_list) == 0) {
    stop("Offset of this length returned no results, do you already have all the results?")
  }

  results_list
}

## pull the "properties" of each feature into one row of a data frame
parse_crashes_to_df <- function(crashes) {
  total_crashes <- seq_along(crashes)

  map(total_crashes, ~ pluck(crashes, .x, "properties")) |>
    bind_rows() |>
    clean_names()
}

## one row per boundary coordinate, labeled with the ward name
get_ward_coords <- function() {
  url <- "https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Administrative_Other_Boundaries_WebMercator/MapServer/53/query?outFields=*&where=1%3D1&f=geojson"

  r <- request(url)
  resp <- req_perform(r)
  json <- resp_body_json(resp) |>
    pluck("features")

  total_wards <- length(json)

  map(
    1:total_wards,
    ~ (
      pluck(json, .x) |>
        pluck("geometry", "coordinates") |>
        unlist() |>
        matrix(ncol = 2, byrow = TRUE) |>
        as_tibble() |>
        rename(long = 1, lat = 2) |>
        mutate(ward_name = pluck(json, .x, "properties", "NAME"))
    )
  ) |>
    bind_rows()
}

## one row per bike lane coordinate, labeled with route and ward
get_bike_lane_coords <- function() {
  url <- "https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Transportation_Bikes_Trails_WebMercator/MapServer/2/query?outFields=*&where=1%3D1&f=geojson"

  r <- request(url)
  resp <- req_perform(r)
  json <- resp_body_json(resp) |>
    pluck("features")

  total_lanes <- length(json)

  map(
    1:total_lanes,
    ~ (
      pluck(json, .x) |>
        pluck("geometry", "coordinates") |>
        unlist() |>
        matrix(ncol = 2, byrow = TRUE) |>
        as_tibble() |>
        rename(long = 1, lat = 2) |>
        mutate(
          route_name = pluck(json, .x, "properties", "ROUTENAME"),
          ward_name = paste("Ward", pluck(json, .x, "properties", "WARD_ID"), sep = " ")
        )
    )
  ) |>
    bind_rows()
}
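Since I'm skipping the step-by-step walkthrough, here is a rough sketch of the one idea in the crashes helper that is worth seeing in isolation: paging through results by offset until a page comes back empty. Everything in it is a stand-in. `fake_server`, `fetch_page()`, and `fetch_all()` are hypothetical names, and no real API is called.

```r
# Stand-in for the remote dataset: a hypothetical "server" holding 2,500 records.
fake_server <- seq_len(2500)

# Stand-in for one API call: returns up to `page_size` records after `offset`.
fetch_page <- function(offset, page_size) {
  if (offset >= length(fake_server)) {
    return(integer(0))
  }
  end <- min(offset + page_size, length(fake_server))
  fake_server[(offset + 1):end]
}

# Request pages, advancing the offset each time, until a page is empty.
fetch_all <- function(page_size = 1000) {
  results <- list()
  offset <- 0
  repeat {
    page <- fetch_page(offset, page_size)
    if (length(page) == 0) break
    results <- append(results, list(page))
    offset <- offset + length(page)
  }
  unlist(results)
}

all_records <- fetch_all()
length(all_records)
```

The real helper follows the same shape, with the offset and page size sent as the `resultOffset` and `resultRecordCount` query parameters.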
Now we can use those helper functions to get the data, and do a little bit of cleanup on the crashes dataframe to make it easier to work with for the purpose of evaluating crashes that involve cyclists.
Code
crashes <- get_dc_crashes_data() |>
  parse_crashes_to_df()
wards <- get_ward_coords()
bike_lanes <- get_bike_lane_coords()

## keep the crashes that have some kind of cyclist injury
## (we can do this using the API as well...)
reports <- crashes |>
  filter(
    majorinjuries_bicyclist > 0 |
      minorinjuries_bicyclist > 0 |
      unknowninjuries_bicyclist > 0 |
      fatal_bicyclist > 0
  ) |>
  mutate(
    report_date = as_datetime(reportdate / 1000, tz = "UTC"),
    latitude,
    longitude,
    ward,
    address = mar_address,
    majorinjuries_bicyclist,
    minorinjuries_bicyclist,
    unknowninjuries_bicyclist,
    fatal_bicyclist,
    .keep = "none"
  )
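The comment above mentions that this filter could also be pushed to the API. Here is a minimal sketch of what that might look like. It assumes the layer's field names are the upper-case versions of the columns used above (`clean_names()` lower-cased them, so this is worth checking against the layer metadata), and that the endpoint accepts a SQL-style `where` parameter, as the `where=1%3D1` in the helper URLs suggests. `injury_fields` and `build_where_clause()` are hypothetical names for illustration.

```r
# Assumed field names: upper-case originals of the cleaned column names.
injury_fields <- c(
  "MAJORINJURIES_BICYCLIST",
  "MINORINJURIES_BICYCLIST",
  "UNKNOWNINJURIES_BICYCLIST",
  "FATAL_BICYCLIST"
)

# Build a SQL-style clause of the form "A > 0 OR B > 0 OR ...".
build_where_clause <- function(fields) {
  paste(paste(fields, "> 0"), collapse = " OR ")
}

where_clause <- build_where_clause(injury_fields)
where_clause

# If httr2 is installed, attach the clause as a query parameter.
# The request is only constructed here, not performed.
if (requireNamespace("httr2", quietly = TRUE)) {
  base_url <- "https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Public_Safety_WebMercator/MapServer/24/query"
  req <- httr2::request(base_url) |>
    httr2::req_url_query(where = where_clause, outFields = "*", f = "geojson")
  print(req$url)
}
```

Filtering on the server would mean far fewer pages to download, at the cost of losing the non-cyclist crashes for any later analysis.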
Where are crashes happening?
We can visualize all of the cyclist-related crashes on a heatmap, along with the city’s 8 wards, to see where the incidents are occurring.
Code
heatmap <- reports |>
  ggplot() +
  aes(
    x = longitude,
    y = latitude
  ) +
  geom_polygon(
    data = wards,
    aes(x = long, y = lat, group = ward_name),
    fill = "white",
    col = "black"
  ) +
  stat_density2d(
    geom = "polygon",
    aes(fill = after_stat(level), alpha = after_stat(level))
  ) +
  scale_fill_gradientn(
    colours = rev(RColorBrewer::brewer.pal(10, "Spectral")),
    guide = "colourbar"
  ) +
  theme_void() +
  labs(
    fill = NULL,
    title = "Incidents Resulting in Cyclist Injuries in DC",
    subtitle = "Most incidents occur in Ward 1 and Ward 2.",
    caption = "Source: Open Data DC,\n Crashes Dataset "
  ) +
  guides(alpha = "none") +
  coord_cartesian(xlim = c(-77.15, -76.9), ylim = c(38.8, 39.0)) +
  theme(
    panel.grid.major = element_blank(),
    text = element_text(family = "IBM Plex Sans"),
    plot.title = element_text(size = 14, face = "bold"),
    legend.position = "none",
    plot.caption = element_text(face = "italic")
  )

print(heatmap)
Our heatmap reveals that crashes are concentrated in Ward 1 and Ward 2, with the densest cluster in Ward 2. For those unfamiliar with the city, Ward 2 is described by the Office of Planning as:
the home of National Mall, the White House, monuments and museums. It is the place where many tourists and other visitors spend the bulk of their time, and includes the images most associated with Washington, DC in the national and international psyches. Ward 2 also includes the Central Business District and the Federal Triangle where the highest concentration of office and jobs are in the city.
What’s up with Ward 2?
Let’s take a closer look at the crashes which occur in Ward 2. The plot below isolates Ward 2, and includes bike lanes as an overlay.
Code
reports_re_labeled <- reports |>
  rename(
    major = majorinjuries_bicyclist,
    minor = minorinjuries_bicyclist,
    unknown = unknowninjuries_bicyclist,
    fatal = fatal_bicyclist
  ) |>
  pivot_longer(
    cols = c(major, minor, unknown, fatal),
    names_to = "injury_level"
  ) |>
  filter(value >= 1)

ggplot() +
  geom_polygon(
    data = wards |> filter(ward_name == "Ward 2"),
    aes(x = long, y = lat, group = ward_name),
    fill = "white",
    col = "black"
  ) +
  geom_line(
    data = bike_lanes |> filter(ward_name == "Ward 2"),
    aes(x = long, y = lat, group = route_name),
    color = "grey50",
    linewidth = 1
  ) +
  geom_point(
    data = reports_re_labeled |> filter(ward == "Ward 2"),
    aes(
      x = longitude,
      y = latitude,
      group = injury_level,
      color = injury_level
    ),
    alpha = .25
  ) +
  labs(
    title = "Incidents Resulting in Cyclist Injuries in Ward 2",
    caption = "Source: Open Data DC,\nCrashes Dataset ",
    color = "Injury Level"
  ) +
  scale_color_manual(
    values = c("red", "#4169e1", "#87ceeb", "#6a5acd")
  ) +
  theme_void() +
  theme(
    panel.grid.major = element_blank(),
    text = element_text(family = "IBM Plex Sans"),
    plot.title = element_text(size = 14, face = "bold"),
    plot.caption = element_text(face = "italic"),
    legend.position = "top"
  )
The vast majority of recorded incidents in Ward 2 involve minor injuries (1205), while there are a handful of fatal incidents (4) and quite a few incidents involving major injuries (166).
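I haven't shown how those totals were tallied, so here is one plausible way to do it, sketched on a hypothetical toy table rather than the real data. It mirrors the pivot used for the plot, then counts both the number of incidents (rows) and the number of people injured (the sum of `value`), since those two can differ when one crash injures more than one cyclist. `toy_reports` and `toy_counts` are made-up names, and the numbers are invented.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per crash report.
toy_reports <- tibble(
  ward  = c("Ward 2", "Ward 2", "Ward 2", "Ward 1"),
  major = c(1, 0, 0, 0),
  minor = c(0, 1, 2, 1),
  fatal = c(0, 0, 0, 0)
)

# Pivot to one row per report and injury level, keep Ward 2 rows with at
# least one injury, then tally incidents and injured cyclists per level.
toy_counts <- toy_reports |>
  pivot_longer(cols = c(major, minor, fatal), names_to = "injury_level") |>
  filter(value >= 1, ward == "Ward 2") |>
  group_by(injury_level) |>
  summarise(incidents = n(), injured = sum(value), .groups = "drop")

toy_counts
```

Levels with no qualifying rows (here, `fatal`) simply drop out of the result.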
The unfortunate thing about this plot is that it’s clear many injuries are occurring in corridors with bike lanes! That being said, this dataset has incidents dating back to 2011, when the bike infrastructure in DC was not as good as it is now.
In fact, let’s take a look at this same plot but beginning in January 2023.
Code
ggplot() +
  geom_polygon(
    data = wards |> filter(ward_name == "Ward 2"),
    aes(x = long, y = lat, group = ward_name),
    fill = "white",
    col = "black"
  ) +
  geom_line(
    data = bike_lanes |> filter(ward_name == "Ward 2"),
    aes(x = long, y = lat, group = route_name),
    color = "grey50",
    linewidth = 1
  ) +
  geom_point(
    data = reports_re_labeled |>
      filter(ward == "Ward 2", report_date >= "2023-01-01"),
    aes(
      x = longitude,
      y = latitude,
      group = injury_level,
      color = injury_level
    ),
    alpha = .5
  ) +
  labs(
    title = "Incidents Resulting in Cyclist Injuries in Ward 2",
    subtitle = "Jan. 2023 - Present",
    caption = "Source: Open Data DC,\nCrashes Dataset ",
    color = "Injury Level"
  ) +
  scale_color_manual(
    values = c("red", "#4169e1", "#87ceeb", "#6a5acd")
  ) +
  theme_void() +
  theme(
    panel.grid.major = element_blank(),
    text = element_text(family = "IBM Plex Sans"),
    plot.title = element_text(size = 14, face = "bold"),
    plot.caption = element_text(face = "italic"),
    legend.position = "top"
  )
Unfortunately, there are still many incidents occurring around or in bike lanes.
How have injuries been trending city-wide?
Let’s take a look at the trend of incidents that have resulted in injuries of any kind to cyclists in DC.
Code
incidents_by_year <- reports |>
  mutate(year = year(report_date)) |>
  count(year) |>
  filter(year >= 2016)

## this is just an aesthetic choice to add a dot to the end of the line :)
end_point <- incidents_by_year |>
  filter(year <= 2023) |>
  slice_max(order_by = year)

incidents_by_year |>
  filter(year <= 2023) |>
  ggplot() +
  aes(
    x = year,
    y = n
  ) +
  geom_area(fill = "deepskyblue4", alpha = .25) +
  geom_line(
    linewidth = 1,
    color = "deepskyblue4"
  ) +
  geom_point(
    data = end_point,
    color = "deepskyblue4",
    size = 2.5
  ) +
  labs(
    x = NULL,
    y = NULL,
    title = "Incidents Resulting in Cyclist Injuries in Washington D.C.",
    subtitle = "the number of incidents resulting in injuries to cyclists has\ndeclined 44% since 2016, but ticked up in 2023.",
    caption = "Source: Open Data DC,\nCrashes Dataset "
  ) +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    text = element_text(family = "IBM Plex Sans"),
    plot.title = element_text(size = 14, face = "bold"),
    plot.caption = element_text(face = "italic"),
    legend.position = "top"
  )
The good news is that on a macro level, the number of incidents resulting in cyclist injuries has decreased. But it looks like there was an unfortunate tick up in 2023, so how are things looking for 2024 so far?
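For anyone who wants to check figures like the ones in the chart subtitle, the arithmetic is a plain percent change against a baseline year. A small sketch, using invented yearly counts (these are not values from the Crashes dataset) and a hypothetical helper named `pct_change()`:

```r
# Percent change from a baseline value to a later value.
pct_change <- function(baseline, later) {
  (later - baseline) / baseline * 100
}

# Hypothetical yearly counts, NOT taken from the Crashes dataset.
toy_by_year <- data.frame(
  year = c(2016, 2022, 2023),
  n    = c(400, 300, 330)
)

baseline <- toy_by_year$n[toy_by_year$year == 2016]
previous <- toy_by_year$n[toy_by_year$year == 2022]
latest   <- toy_by_year$n[toy_by_year$year == 2023]

pct_change(baseline, latest)  # change since the baseline year
pct_change(previous, latest)  # year-over-year change
```

A negative result means a decline against the baseline, which is how a long-run drop and a one-year tick up can both be true at once.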
Code
cumulative_incidents_up_to_now <- reports |>
  mutate(
    year = year(report_date),
    week = week(floor_date(report_date, unit = "week"))
  ) |>
  count(year, week) |>
  group_by(year) |>
  mutate(cumulative = cumsum(n)) |>
  ungroup() |>
  filter(year >= 2022, week <= week(today()) - 1)

cumulative_endpoints <- cumulative_incidents_up_to_now |>
  group_by(year) |>
  slice_max(order_by = week)

cumulative_incidents_up_to_now |>
  ggplot() +
  aes(
    x = week,
    y = cumulative,
    color = as.factor(year)
  ) +
  geom_line(linewidth = 1) +
  geom_point(data = cumulative_endpoints, size = 2) +
  scale_color_manual(values = c("#007BFF", "#28A745", "#FFBF00")) +
  labs(
    title = "Total Incidents Resulting in Cyclist Injuries in Washington D.C. by Year",
    subtitle = "the number of incidents in 2024 is outpacing 2023 and 2022.",
    caption = "Source: Open Data DC,\nCrashes Dataset ",
    x = "Week of Year",
    y = NULL,
    color = NULL
  ) +
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    text = element_text(family = "IBM Plex Sans"),
    plot.title = element_text(size = 14, face = "bold"),
    plot.caption = element_text(face = "italic"),
    legend.position = "right"
  )
Looks like the cumulative number of incidents resulting in injuries to cyclists has increased 23.4% compared to this same time in 2023, and 51% compared to this same time in 2022. Not great, DC!
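The "same time last year" comparison comes down to lining up the running totals at a matching week. Here is a rough base R sketch of that idea on invented weekly counts (not the real data). `toy_weekly` and `cumulative_at()` are hypothetical names.

```r
# Hypothetical weekly incident counts for two years (weeks 1 to 4).
toy_weekly <- data.frame(
  year = rep(c(2023, 2024), each = 4),
  week = rep(1:4, times = 2),
  n    = c(2, 3, 1, 4, 3, 3, 2, 4)
)

# Running total within each year, in week order.
toy_weekly <- toy_weekly[order(toy_weekly$year, toy_weekly$week), ]
toy_weekly$cumulative <- ave(toy_weekly$n, toy_weekly$year, FUN = cumsum)

# Look up the running total for a given year at a given week.
cumulative_at <- function(df, yr, wk) {
  df$cumulative[df$year == yr & df$week == wk]
}

compare_week <- 4
prior   <- cumulative_at(toy_weekly, 2023, compare_week)
current <- cumulative_at(toy_weekly, 2024, compare_week)

# Percent change in the running total at the same week of the year.
(current - prior) / prior * 100
```

One caveat with this kind of comparison: a week with no incidents has no row at all after `count()`, so a lookup by exact week number can come back empty for sparse data.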
Wrapping Up
The increasing number of injuries to cyclists is concerning–especially given the progress that DC has made in building additional bike infrastructure. I’ll be anxiously checking in on these numbers as the year goes on, and hopefully we’ll see a reversal in this trend.
To assist in that reversal, I’m hoping that DC continues to take steps towards creating a safe, low-stress bike network that reduces the chance of being injured by a car. That means low-speed, low-traffic streets that are complemented by protected bike lanes or off-street paths and trails.
If you live in DC, are a cyclist, and are interested in advocating for a low-stress network, check out the Washington Area Bicyclist Association’s website!