Analyzing Data From U.S. Road Accidents With Data Visualization

Written by datomai | Published 2021/03/22


Every 24 seconds, a life is lost on the road, and it costs countries around 3% of their gross domestic product - World Health Organization.
With a fatality rate of 12.3 deaths per 100,000 inhabitants, traffic accidents are a leading cause of death in the United States. In 2019, 36,096 lives were lost on U.S. roads, and according to the National Highway Traffic Safety Administration (NHTSA), road crashes cost the U.S. economy about $871 billion annually.
In this article, we analyze data on U.S. road accidents, which can be used to study accident-prone locations and to understand the factors that influence road fatalities in the United States.
“Having access to accurate and updated information about the current road situation enables drivers, pedestrians, and passengers to make informed road safety decisions.”
- Association For Safe International Road Travel.

Data Sources

The datasets used in this article have been imported from:
  1. Kaggle
  2. NHTSA’s Fatal Crashes dataset
The countrywide traffic accident dataset from Kaggle covers 49 states and includes data collected from February 2016 to June 2020. It contains about 3.5 million records gathered from several data providers, including two APIs (application programming interfaces), MapQuest Traffic and Microsoft Bing Map Traffic, which provide streaming traffic event data.
Both datasets are open and available to the data science community.

Data Analysis 

The data has been analyzed using Python and visualized with Power BI.

Factors Influencing Accidents 

Time of Day
  • Hypothesis: Most accidents occur during the early morning or late at night
  • Data Analyzed: Time of accident
  • Observation: A peak in accidents was observed around 7:53 AM
Our first hypothesis was that most accidents occur either during the early morning rush hours or late at night. Because the dataset records the time of each accident, we can test this hypothesis.
Exploring the February 2016 to June 2020 traffic accident dataset with Python in a Jupyter notebook shows that accidents peak around 7:53 AM.
Each accident's time is recorded in the local time of its location, so no time zone conversion is needed before comparing times across locations.
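As an illustration of how a time-of-day peak can be found, here is a minimal sketch that counts accidents per hour of the day. It is not the notebook code behind this article. The timestamps are made-up sample values, the timestamp format is an assumption, and the function names are hypothetical. It uses only the Python standard library.

```python
from collections import Counter
from datetime import datetime

# Made-up sample of local accident start times (assumed format).
sample_start_times = [
    "2019-03-04 07:48:00",
    "2019-03-04 07:55:00",
    "2019-03-04 08:10:00",
    "2019-03-04 17:20:00",
    "2019-03-05 07:53:00",
    "2019-03-05 23:40:00",
]

def accidents_per_hour(timestamps):
    """Count accidents in each hour of the day (0-23) from local timestamps."""
    hours = (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts in timestamps)
    return Counter(hours)

def peak_hour(timestamps):
    """Return the hour of the day with the most accidents."""
    counts = accidents_per_hour(timestamps)
    return max(counts, key=counts.get)

hourly_counts = accidents_per_hour(sample_start_times)
busiest_hour = peak_hour(sample_start_times)
print(f"Peak hour: {busiest_hour}:00 with {hourly_counts[busiest_hour]} accidents")
```

Because the times are already local, the hour can be read straight from each timestamp. Grouping by hour gives a coarser answer than a minute-level peak, so a finer bucket would be needed to reproduce a figure such as 7:53 AM.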
Weather Conditions
  • Hypothesis: Weather conditions impact traffic accidents
  • Data Analyzed: Weather conditions during the time of accidents
  • Observation: A peak in accidents was observed during clear weather
Our second hypothesis was that weather conditions contribute to accidents. The dataset records the weather conditions at the time of each accident. The data suggests that weather is not a major influencing factor in road accidents: about 48% of the accidents happened when the weather was clear or fair.
The top 6 weather conditions show that a significant number of accidents occur even in ‘clear’ weather.
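The share of accidents per weather condition can be computed with a simple frequency count. The sketch below is an illustration under assumptions, not the original analysis: the condition labels are made-up sample values, missing values are assumed to appear as None, and the function name is hypothetical.

```python
from collections import Counter

# Made-up weather labels, one per accident record; None marks a missing value.
sample_conditions = [
    "Clear", "Clear", "Overcast", "Clear", "Light Rain",
    "Mostly Cloudy", "Clear", "Overcast", None, "Fair",
]

def top_weather_shares(conditions, n=6):
    """Return the n most common conditions as (condition, percent of known records)."""
    known = [c for c in conditions if c]  # drop missing values
    counts = Counter(known)
    total = len(known)
    return [(cond, round(100 * k / total, 1)) for cond, k in counts.most_common(n)]

for condition, percent in top_weather_shares(sample_conditions):
    print(f"{condition}: {percent}%")
```

A share like this only describes how accidents are distributed across conditions. It does not account for how often each kind of weather occurs in the first place.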

Fatality Data Analysis

The NHTSA fatality data reports the number of fatalities in the United States for each year, along with related details. For this project, data from 2010 to 2019 was analyzed.

Major Observations:

Fatalities by Year
Analyzing NHTSA’s fatality data shows a steady decline in the number of fatalities in the United States since the year 2016.
Fatalities by State
Although fatalities nationwide fell by 2% from 2017 to 2018, 5 states saw an increase of more than 5% in 2018. New Hampshire recorded the largest rise, with a 44% increase in fatalities. The country-wide fatality rate for 2018 was 11.2 per 100,000 population. However, 27 of the 52 jurisdictions in the data had fatality rates higher than the country-wide rate.
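The two measures used above, year-over-year percentage change and fatalities per 100,000 population, are simple ratios. The sketch below shows both under stated assumptions: 102 is the 2017 New Hampshire figure quoted later in the summary, 147 is an assumed 2018 count chosen only because it gives roughly the quoted 44% increase, and the three "states" with their counts and populations are made up.

```python
def percent_change(previous, current):
    """Percentage change from a previous count to a current count."""
    return (current - previous) / previous * 100

def rate_per_100k(fatalities, population):
    """Fatalities per 100,000 population."""
    return fatalities / population * 100_000

# 102 is quoted in the article; 147 is an assumed 2018 count (illustration only).
nh_change = percent_change(102, 147)
print(f"New Hampshire change: {nh_change:.1f}%")

# Made-up states: name -> (fatalities, population). Not real figures.
states = {
    "State A": (500, 5_000_000),
    "State B": (900, 6_000_000),
    "State C": (300, 4_000_000),
}
NATIONAL_RATE = 11.2  # 2018 country-wide rate quoted in the article

# Keep the states whose rate exceeds the country-wide rate.
above_national = sorted(
    name
    for name, (deaths, population) in states.items()
    if rate_per_100k(deaths, population) > NATIONAL_RATE
)
print("Above the country-wide rate:", above_national)
```

Normalizing by population is what makes states of very different sizes comparable. A small state can show a large percentage change from a small absolute change in counts.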
Fatalities by Month
Across five years of fatal crash data, February had the fewest fatal crashes in every year except 2016, when February recorded 72 more fatal crashes than January.
October had the most fatal crashes in every year except 2017, when July had the most, with 130 more fatal crashes than October of the same year.
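Finding the lowest and highest month for each year is a per-year minimum and maximum over monthly counts. The following is a small sketch with made-up counts, shown for only a few months per year. The 2017 values are chosen so that July exceeds October by 130, echoing the text, but they are not NHTSA figures.

```python
# Made-up fatal crash counts: year -> {month number: count}. Not real data.
monthly_crashes = {
    2017: {1: 2600, 2: 2400, 7: 3200, 10: 3070},
    2018: {1: 2550, 2: 2350, 7: 3050, 10: 3150},
}

def extreme_months(by_month):
    """Return (month with fewest crashes, month with most crashes)."""
    lowest = min(by_month, key=by_month.get)
    highest = max(by_month, key=by_month.get)
    return lowest, highest

for year, by_month in sorted(monthly_crashes.items()):
    lowest, highest = extreme_months(by_month)
    print(f"{year}: fewest in month {lowest}, most in month {highest}")
```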
Fatalities by Gender
71% of 2018 fatalities were male.
Fatalities by Age
The 25 to 34 age group had more deaths than any other age group.
Alcohol-Related Fatalities
28% of the fatalities in 2018, more than a quarter, were alcohol-related. Most alcohol-related fatalities occurred between 9 PM and 3 AM.
However, alcohol-related crashes declined by 2% from 2012 to 2018.
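A time window such as 9 PM to 3 AM wraps past midnight, so a plain "start <= hour < end" test does not work. The sketch below shows one way to handle that, assuming the hour of each crash is available as an integer from 0 to 23. The sample hours are made up and the function names are hypothetical.

```python
def in_window(hour, start=21, end=3):
    """True if hour is in [start, end), where the window may wrap past midnight."""
    if start <= end:
        return start <= hour < end
    # Wrapping window, e.g. 21:00 to 03:00: late evening OR early morning.
    return hour >= start or hour < end

def share_in_window(hours, start=21, end=3):
    """Fraction of the given crash hours that fall inside the window."""
    hits = sum(1 for h in hours if in_window(h, start, end))
    return hits / len(hours)

# Made-up crash hours for alcohol-related fatalities (illustration only).
sample_hours = [22, 23, 1, 2, 14, 18, 0, 21]
night_share = share_in_window(sample_hours)
print(f"Share between 9 PM and 3 AM: {night_share:.0%}")
```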

Summary

From the analysis of both datasets, we can conclude that:
  1. Road accident fatalities have continued to decline since their most recent peak in 2016.
  2. About 28% of traffic fatalities in 2018 were alcohol-related.
  3. Most alcohol-related fatalities occurred between 9 PM and 3 AM.
  4. Weather appears to have little influence on road crashes, as about 48% of accidents occur when the weather is clear or fair.
  5. Despite a 2% decrease in fatalities in 2018, 5 states saw an increase of more than 5%, with New Hampshire's fatalities rising 44% from its 2017 figure of 102.
With the exact locations of alcohol-related accidents, we could analyze these fatalities against points of interest (POIs). This could help determine whether alcohol-related fatalities are more likely to occur within a specific radius of bars and restaurants.
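As a rough sketch of the radius idea, the haversine formula gives the great-circle distance between two latitude/longitude points, which is enough to test whether an accident lies within a given distance of any POI. Everything here is an assumption for illustration: the coordinates are made up, the function names are hypothetical, and the Earth is treated as a sphere with a mean radius of 6,371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(accident, pois, radius_km):
    """True if the accident (lat, lon) is within radius_km of any POI (lat, lon)."""
    return any(haversine_km(*accident, *poi) <= radius_km for poi in pois)

# Made-up coordinates: one accident and two bars/restaurants.
accident = (40.0000, -75.0000)
bars = [(40.0010, -75.0000), (40.0500, -75.0500)]
print(within_radius(accident, bars, radius_km=0.5))
```

Checking every accident against every POI scales poorly for millions of records, so a spatial index would likely be needed for the full dataset.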
With access to data on the exact geographical locations of accidents, we could map the most fatal traffic accident locations for each city or county by time of day and day of the week.
With access to data about vehicle movements, we could learn which maneuvers drivers were attempting when an accident or fatality occurred, e.g., a left turn or a right turn.

Code

All the code used for this analysis is available on GitHub.

Written by datomai | Digital Transformation Company
Published by HackerNoon on 2021/03/22