This data comes from the FBI's National Instant Criminal Background Check System. From the official site:
Mandated by the Brady Handgun Violence Prevention Act of 1993 and launched by the FBI on November 30, 1998, NICS is used by Federal Firearms Licensees (FFLs) to instantly determine whether a prospective buyer is eligible to buy firearms. Before ringing up the sale, cashiers call in a check to the FBI or to other designated agencies to ensure that each customer does not have a criminal record or isn’t otherwise ineligible to make a purchase. More than 230 million such checks have been made, leading to more than 1.3 million denials.
Before moving on to the meat of this investigation, I want to bring up a few notes.
At the beginning of this project, I wanted to use the data to calculate gun sales. It turns out that calculating this statistic is more complex than you would think.
In the PDF files the FBI provides, there is an important note:
These statistics represent the number of firearm background checks initiated through the NICS. They do not represent the number of firearms sold. Based on varying state laws and purchase scenarios, a one-to-one correlation cannot be made between a firearm background check and a firearm sale.
Even though a one-to-one correlation cannot be made between a check and a sale, some organizations have estimated gun sales with this data. For example, the New York Times used a method suggested in the Small Arms Survey by Jurgen Brauer, a professor at Georgia Regents University. Long gun and handgun checks were counted as 1.1 sales, and multiple-gun checks were counted as two sales. Permit and other types of checks were omitted. The multipliers used in the survey were based on interviews Mr. Brauer had with gun shop owners.
So background checks do not equal sales, but visualizing this data can still give you a sense of gun sales activity.
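The weighting attributed to the Small Arms Survey above can be sketched in a few lines. This is only an illustration, not the method used in this analysis: the sample rows are made up, and the column names `handgun`, `long_gun`, `multiple`, and `permit` are assumptions about how the check categories are labeled.

```python
import pandas as pd

# Made-up sample rows; the column names are assumptions that mirror
# the check categories described in the text.
sample = pd.DataFrame({
    "state": ["A", "B"],
    "handgun": [1000, 200],
    "long_gun": [500, 300],
    "multiple": [20, 10],
    "permit": [400, 0],  # permit checks are left out of the estimate
})


def estimate_sales(checks):
    """Count handgun and long gun checks as 1.1 sales each
    and multiple-gun checks as 2 sales each."""
    return 1.1 * (checks["handgun"] + checks["long_gun"]) + 2 * checks["multiple"]


sample["estimated_sales"] = estimate_sales(sample)
print(sample[["state", "estimated_sales"]])
```

Permit checks are simply ignored here, which matches the description above that permit and other check types were omitted.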
In the visualization section, some of the graphs show statistics by state. When looking at these, keep in mind that although gun sales can be estimated from the NICS background check data, firearm laws differ from state to state. Even so, the differences between states were very interesting to explore.
The data used to explore the firearm background checks can be downloaded from a BuzzFeed News GitHub repository.
The code in that repository downloads the FBI's PDF, parses it, and produces a spreadsheet/CSV of the data, which currently covers November 1998 – February 2018.
The population numbers come from the United States Census Bureau website. This data has annual estimates of the resident population for the U.S. from April 1, 2010 to July 1, 2017.
While exploring this data, I realized that I needed the number of children under 18 in each state. As you'll see in the visualization, this was used to calculate how many people were of age to receive a background check while buying a gun. That number was then used to calculate the number of checks as a percentage of each state's adult population.
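As a rough sketch of that calculation, the percentage is the number of checks divided by the population aged 18 and over. The function name and the numbers below are hypothetical and only illustrate the arithmetic.

```python
def checks_per_adult_pct(total_checks, population, population_under_18):
    """Return background checks as a percentage of the adult population."""
    adults = population - population_under_18
    if adults <= 0:
        raise ValueError("adult population must be positive")
    return round(total_checks / adults * 100, 2)


# Hypothetical state: 1,000,000 residents, 200,000 of them under 18,
# and 50,000 background checks in the year.
example = checks_per_adult_pct(
    total_checks=50_000, population=1_000_000, population_under_18=200_000
)
print(example)
```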
Let's explore.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df_c17 = pd.read_csv('nst-est2017-01.csv')
df_gun = pd.read_csv('nics-firearm-background-checks.csv')
df_gun.head()
df_gun.shape
df_gun.info()
df_gun.describe()
df_gun.hist(figsize=(20,20));
pd.plotting.scatter_matrix(df_gun, figsize=(30,30));
# let's look at the census data now
df_c17.head()
df_c17.info()
To start the cleaning process, we'll narrow down the rows and columns we need from the census data.
df_c17.head()
# only take rows 3 through 59, these include the regions and states we need
df_c17 = df_c17.iloc[3:59]
#rename the columns
df_c17.columns = ['region/state', 'census_april_2010', 'est_base', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017']
df_c17.head()
#copy the data sets
df1 = df_c17.copy()
df2 = df_c17.copy()
#drop the region/state column from the first copy
df1.drop(columns = ['region/state'], inplace = True)
#get rid of the commas in all of the numbers
for col in df1.columns:
    df1[col] = df1[col].replace({',':''}, regex=True)
#change all of the numbers to integers
df1 = df1.apply(pd.to_numeric)
df1.info()
# strangely, some of the state names had periods at the beginning of the string
df2['region/state'] = df2['region/state'].map(lambda x: x.lstrip('.'))
# change the region/state column into a list to add into other dataframe
df2 = df2['region/state'].tolist()
df2
# insert the list as a column onto the first copy that we cleaned
df1.insert(loc=0, column='region/state', value=df2)
df1.head()
# rename the dataframe
df_census_clean = df1
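The cleaning steps above could also be written more compactly. The following is only a sketch on a made-up table: the names `raw` and `clean` and all of the values are hypothetical, not taken from the census file. It strips the leading periods and converts the comma-formatted numbers in one pass, without splitting the data into two copies.

```python
import pandas as pd

# Made-up rows that imitate the formatting problems described above:
# leading periods in the names and commas inside the numbers.
raw = pd.DataFrame({
    "region/state": [".StateA", ".StateB"],
    "2016": ["1,234,567", "89,012"],
    "2017": ["1,240,000", "90,500"],
})

clean = raw.copy()

# remove the leading periods from the names
clean["region/state"] = clean["region/state"].str.lstrip(".")

# strip the commas and convert every remaining column to numbers
year_cols = [c for c in clean.columns if c != "region/state"]
clean[year_cols] = clean[year_cols].apply(
    lambda col: pd.to_numeric(col.str.replace(",", "", regex=False))
)

print(clean.dtypes)
```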
df_underage = pd.read_csv('pop_under_18.csv')
df_underage.head()
# use the first row for the name of the columns
df_underage = df_underage.rename(columns = df_underage.iloc[0])
#drop the row with the original column names
df_underage = df_underage.reindex(df_underage.index.drop(0))
#drop all columns after estimated total column
df_underage.drop(df_underage.columns[4:], axis = 1, inplace = True)
#drop the Id and Id2 columns
cols = [0,1]
df_underage.drop(df_underage.columns[cols], axis=1, inplace=True)
# rename columns
df_underage.columns = ['geography', 'est_total']
df_underage.head(1)
df_underage.info()
# change the est_total column to integers
df_underage['est_total'] = df_underage['est_total'].astype(int)
# create empty dictionary
df_underage_dict = {}
# loop over the rows, assigning the states as keys and the totals as values
x = 0
while x < len(df_underage):
    state_name = df_underage['geography'].iloc[x]
    total_num = df_underage['est_total'].iloc[x]
    df_underage_dict[state_name] = total_num
    x = x+1
df_underage_dict
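For comparison, the same kind of lookup dictionary can be built without an explicit loop. This is a minimal sketch on a made-up two-row table; `under18` and `lookup` are hypothetical names used only for illustration.

```python
import pandas as pd

# Made-up stand-in for the under-18 population table
under18 = pd.DataFrame({
    "geography": ["StateA", "StateB"],
    "est_total": [100, 250],
})

# index by name, then convert the totals column to a dictionary
lookup = under18.set_index("geography")["est_total"].to_dict()
print(lookup)
```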
# map dictionary to df_census_clean
df_census_clean['pop_underage_2016'] = df_census_clean['region/state'].map(df_underage_dict)
df_census_clean.head(8)
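# fill NaN with zeros (assign the result back instead of filling in place on a column selection)
df_census_clean['pop_underage_2016'] = df_census_clean['pop_underage_2016'].fillna(0)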
# data frame for all checks done in 2016
# (uses df_gun directly, since checks_by_state is not defined until later)
df_gun_2016 = df_gun[df_gun['month'].str.contains('2016')]
# get totals in 2016 in each state
df_gun_2016 = df_gun_2016.groupby('state')['totals'].sum()
# change to dict
df_gun_2016 = df_gun_2016.to_dict()
# map values of checks in 2016 to state names in the census data
df_census_clean['gun_permits_2016'] = df_census_clean['region/state'].map(df_gun_2016)
df_census_clean.head()
# change NaN to 0
df_census_clean['gun_permits_2016'] = df_census_clean['gun_permits_2016'].fillna(0)
df_census_clean.head(8)
# sum the totals by month
totals = df_gun.groupby("month")["totals"].sum()
# plot graph
tick_placement = np.arange(2, len(totals), 12)
plt.style.use('seaborn')
ax = totals.plot(figsize=(20,8))
ax.set_title("Monthly NICS Background Check Totals Since Nov. 1998", fontsize=24)
ax.set_yticklabels([ "{0:,.0f}".format(y) for y in ax.get_yticks() ], fontsize=12);
plt.setp(ax.get_xticklabels(), rotation=0, fontsize=12)
ax.set_xticks(tick_placement)
ax.set_xticklabels([ totals.index[i].split("-")[0] for i in tick_placement ])
ax.set_xlim(0, len(totals) - 1)
ax.set_xlabel("")
The FBI’s background check numbers come with caveats: As seen in the late February-early March 2014 bubble, many checks are for concealed carry permits, not actual gun sales. Kentucky runs a new check on each concealed carry license holder each month. And of course, the FBI’s numbers don’t include private gun sales, many of which do not require a background check. A forthcoming study conducted by Harvard researchers found that roughly 40 percent of respondents had acquired their most recent firearm without going through a background check. Despite those vagaries, the FBI’s NICS numbers are widely accepted as the best proxy for total gun sales in a given time period.
For this visualization, I grouped the data by state and summed the total checks per state since 1998.
# get the total checks by each state and each month
checks_by_state = df_gun.groupby(['state', 'month'])['totals'].sum().reset_index()
# group the states and sum the totals
state_totals = checks_by_state.groupby('state')['totals'].sum()
# plot graph
state_total_tick_placement = np.arange(len(state_totals))
plt.style.use('seaborn')
state_ax = state_totals.plot(kind='bar',figsize=(20,8))
state_ax.set_title("NICS Background Check Totals By State Since Nov. 1998", fontsize=24)
state_ax.set_yticklabels([ "{0:,.0f}".format(y) for y in state_ax.get_yticks() ], fontsize=12);
plt.setp(state_ax.get_xticklabels(), fontsize=12)
state_ax.set_xticks(state_total_tick_placement)
state_ax.set_xticklabels(state_totals.index)
state_ax.set_xlim(0, len(state_totals) - 1)
state_ax.set_xlabel("")
There are currently 35 states where more than half of all state legislators have a grade of A- or better, according to an analysis of data provided by Vote Smart, a non-partisan, non-profit research organization. In 14 states, including most of those in the gun belt, that majority exceeds two thirds, reaching or approaching veto-proof. In Kentucky and Oklahoma, the number extends beyond 80 percent.
I chose to start from December 1998 because percentage changes calculated from November 1998 are extremely large. Very few background checks were recorded in November 1998, since NICS launched on the last day of that month. Below I show percentage changes from both starting points; the later one gives a clearer picture of how much checks have grown.
# a loop that compares each state's total checks in November 1998
# with its total checks in February 2018
# (assumes every state has exactly 232 monthly rows, sorted by month)
x = 0
change_dict = {}
while x < len(checks_by_state):
    og_num = checks_by_state.iloc[x]['totals']
    new_num = checks_by_state.iloc[x + 231]['totals']
    change = new_num - og_num
    perc_change = round(change / og_num * 100, 0)
    change_dict[checks_by_state.iloc[x]['state']] = perc_change
    x = x + 232
# change dictionary to a pandas Series
df_percent_change = pd.Series(change_dict, name = 'percent_change')
df_percent_change.index.name = 'state'
df_percent_change
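The loop above depends on every state having the same number of monthly rows. As an alternative sketch that avoids the hard-coded offsets, the first and last month of each state can be taken with a groupby. The data below is made up, and `toy`, `first`, `last`, and `pct_change` are hypothetical names.

```python
import pandas as pd

# Made-up monthly totals for two hypothetical states
toy = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B", "B"],
    "month": ["1998-11", "1998-12", "2018-02"] * 2,
    "totals": [10, 100, 400, 0, 50, 75],
})

# sort so that the first and last rows of each group are the
# earliest and latest months
ordered = toy.sort_values(["state", "month"])
grouped = ordered.groupby("state")["totals"]
first, last = grouped.first(), grouped.last()

# percentage change from the earliest month to the latest month
pct_change = ((last - first) / first * 100).round(0)
print(pct_change)
```

A state whose first month has zero checks (state `B` here) produces an infinite value, which is why infinite values need to be dropped before plotting.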
# drop rows that have inf values
df_percent_change = df_percent_change.replace([np.inf, -np.inf], np.nan).dropna()
# sort from biggest to smallest
df_percent_change = df_percent_change.sort_values(ascending = False)
# plot graph
percent_total_tick_placement = np.arange(len(df_percent_change))
plt.style.use('seaborn')
percent_ax = df_percent_change.plot(kind='bar',figsize=(20,8))
percent_ax.set_title("Firearm Background Check Growth Rate by State Since Nov. 1998", fontsize=24)
percent_ax.set_yticklabels([ "{0:,.0f}".format(y) for y in percent_ax.get_yticks() ], fontsize=12);
plt.setp(percent_ax.get_xticklabels(), fontsize=12)
percent_ax.set_xticks(percent_total_tick_placement)
percent_ax.set_xticklabels(df_percent_change.index)
percent_ax.set_xlim(0, len(df_percent_change) - 1)
percent_ax.set_ylabel('Percent')
percent_ax.set_xlabel("")
# november 1998 totals in Illinois
checks_by_state[checks_by_state['state'] == 'Illinois'].iloc[0]
# Pennsylvania
checks_by_state[checks_by_state['state'] == 'Pennsylvania'].iloc[0]
# South Carolina
checks_by_state[checks_by_state['state'] == 'South Carolina'].iloc[0]
# Virginia
checks_by_state[checks_by_state['state'] == 'Virginia'].iloc[0]
# totals in november 1998 for every state
df_nov98_totals = checks_by_state[checks_by_state.month.str.contains('1998-11') == True]
df_nov98_totals['totals'].mean()
# dataset that excludes November 1998
checks_by_state_clean = checks_by_state[checks_by_state.month.str.contains('1998-11') == False]
# using same method as above to make a dictionary
# and turn it into a pandas Series
x = 0
change_dict_clean = {}
while x < len(checks_by_state_clean):
    og_num = checks_by_state_clean.iloc[x]['totals']
    new_num = checks_by_state_clean.iloc[x + 230]['totals']
    decrease = new_num - og_num
    perc_change = round(decrease / og_num * 100, 0)
    change_dict_clean[checks_by_state_clean.iloc[x]['state']] = perc_change
    x = x + 231
df_percent_change_clean = pd.Series(change_dict_clean, name = 'percent_change')
df_percent_change_clean.index.name = 'state'
df_percent_change_clean
# drop inf
df_percent_change_clean = df_percent_change_clean.replace([np.inf, -np.inf], np.nan).dropna()
# sort biggest to smallest
df_percent_change_clean = df_percent_change_clean.sort_values(ascending = False)
# plot graph
percent_total_tick_placement_clean = np.arange(len(df_percent_change_clean))
plt.style.use('seaborn')
percent_ax_clean = df_percent_change_clean.plot(kind='bar',figsize=(20,8))
percent_ax_clean.set_title("Firearm Background Check Growth Rate by State Since Dec. 1998", fontsize=24)
percent_ax_clean.set_yticklabels([ "{0:,.0f}".format(y) for y in percent_ax_clean.get_yticks() ], fontsize=12);
plt.setp(percent_ax_clean.get_xticklabels(), fontsize=12)
percent_ax_clean.set_xticks(percent_total_tick_placement_clean)
percent_ax_clean.set_xticklabels(df_percent_change_clean.index)
percent_ax_clean.set_xlim(0, len(df_percent_change_clean) - 1)
percent_ax_clean.set_ylabel('Percent')
percent_ax_clean.set_xlabel("")
# create a data frame for each state that has total checks by month
# Georgia
checks_by_state_georgia = checks_by_state_clean[checks_by_state_clean['state']=='Georgia']
checks_by_state_georgia = checks_by_state_georgia.groupby('month')['totals'].sum()
# Kentucky
checks_by_state_kentucky = checks_by_state_clean[checks_by_state_clean['state']=='Kentucky']
checks_by_state_kentucky = checks_by_state_kentucky.groupby('month')['totals'].sum()
# Massachusetts
checks_by_state_mass = checks_by_state_clean[checks_by_state_clean['state']=='Massachusetts']
checks_by_state_mass = checks_by_state_mass.groupby('month')['totals'].sum()
# plot graph
growth_rates_tick_placement = np.arange(2, len(checks_by_state_georgia), 12)
plt.style.use('seaborn')
ax_state_growth = checks_by_state_georgia.plot(figsize=(20,8), label='Georgia')
plt.plot(checks_by_state_kentucky, label='Kentucky')
plt.plot(checks_by_state_mass, label='Massachusetts')
plt.legend()
ax_state_growth.set_title("Monthly NICS Gun Permit Check Totals For States With Most Growth", fontsize=24)
ax_state_growth.set_yticklabels([ "{0:,.0f}".format(y) for y in ax_state_growth.get_yticks() ], fontsize=12);
plt.setp(ax_state_growth.get_xticklabels(), rotation=0, fontsize=12)
ax_state_growth.set_xticks(growth_rates_tick_placement)
ax_state_growth.set_xticklabels([checks_by_state_georgia.index[i].split("-")[0] for i in growth_rates_tick_placement])
ax_state_growth.set_xlim(0, len(checks_by_state_georgia) - 1)
ax_state_growth.set_xlabel("")
# we'll use the same method of creating a dictionary
# and turning it into a series
perc_guns_2016_dict = {}
# start at row 5 to skip the national and regional rows
x = 5
while x < len(df_census_clean):
    # subtract the under-18 population from the 2016 population estimate
    not_underage = df_census_clean['2016'].iloc[x] - df_census_clean['pop_underage_2016'].iloc[x]
    # number of background checks in 2016
    num_guns = df_census_clean['gun_permits_2016'].iloc[x]
    # percentage
    percent = round(num_guns/not_underage * 100, 2)
    state = df_census_clean['region/state'].iloc[x]
    perc_guns_2016_dict[state] = percent
    x = x + 1
perc_guns_2016_dict
# change dict into series and sort from biggest to smallest
df_perc_guns_2016 = pd.Series(perc_guns_2016_dict, name = 'percent')
df_perc_guns_2016.index.name = 'state'
df_perc_guns_2016 = df_perc_guns_2016.sort_values(ascending=False)
# plot graph
percent_total_tick_placement_2016 = np.arange(len(df_perc_guns_2016))
plt.style.use('seaborn')
percent_ax_2016 = df_perc_guns_2016.plot(kind='bar',figsize=(20,8))
percent_ax_2016.set_title("Number of Firearm Background Checks Compared to State Population in 2016", fontsize=24)
plt.setp(percent_ax_2016.get_xticklabels(), fontsize=12)
percent_ax_2016.set_xticks(percent_total_tick_placement_2016)
percent_ax_2016.set_xticklabels(df_perc_guns_2016.index)
percent_ax_2016.set_xlim(0, len(df_perc_guns_2016) - 1)
percent_ax_2016.set_ylabel('Percent')
percent_ax_2016.set_xlabel("")
This analysis allowed me to see the bigger picture when it comes to guns in America. Even though I didn't calculate actual gun sales, the data still revealed trends across individual states and across the U.S. as a whole. These are the conclusions I arrived at:
As noted in the introduction of this analysis, background checks can give you an idea of gun sale activity, but a one-to-one comparison cannot be made. I chose not to estimate gun sales numbers for this reason. Although the exact percentage is debated, researchers are finding that a significant share of gun sales happen without background checks. In a report by The Guardian:
The 2015 survey found that just 22% of gun owners who had acquired a gun in the previous two years reported doing so without a background check. Gun owners who had acquired a gun earlier than that – between two and five years before 2015, or more than five years before – were more likely to remember doing so without a background check. A full 57% of gun owners who reported acquiring their most recent gun more than five years before 2015 reported getting the gun without a background check. Because the survey relied on the memories of the participants, the researchers wrote, the more recent gun acquisition data might be more accurate.
In a similar report by The Trace:
- Roughly 70 percent: Gun owners who purchased their most recent gun.
- Roughly 30 percent: Gun owners who did not purchase their most recent gun, instead obtaining it through a transfer (i.e., a gift, an inheritance, a swap between friends).
- Zeroing in on the population of gun buyers, about 34 percent did not go through a background check.
- Among the gun owners who got their firearms through a transfer, roughly two-thirds did not go through a background check.
Add it up, and it works out to:
- Roughly 60 percent: the share of gun owners surveyed who did go through a background check when they obtained (through sale or transfer) their latest gun.
- Roughly 40 percent: the share of gun owners who did not.
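The "add it up" step can be checked with a short calculation. This is a sketch that takes the rounded shares quoted above at face value; the variable names are my own.

```python
# Shares quoted in the list above
purchased = 0.70             # owners who bought their most recent gun
transferred = 0.30           # owners who obtained it through a transfer
purchased_no_check = 0.34    # buyers who did not go through a check
transferred_no_check = 2 / 3 # transfer recipients who did not go through a check

# weighted share of all owners with and without a background check
no_check = purchased * purchased_no_check + transferred * transferred_no_check
with_check = 1 - no_check

print(f"without a check: {no_check:.1%}, with a check: {with_check:.1%}")
```

With these rounded inputs the split comes out to about 44 percent without a check and 56 percent with one, consistent with the "roughly 40" and "roughly 60" percent figures in the report.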
In the first visualization, the graph showed a steady increase in background checks for guns since 1998. The spikes in December are likely due to Black Friday sales. Spikes that do not happen in December could be due to calls for new gun restrictions.
Kentucky stands out: it has the highest number of checks of any state, some of the highest background check activity, and the highest growth in background checks. Kentucky has some of the least restrictive gun laws compared to other states, and over 80% of its legislators receive a high grade from the NRA on gun legislation. Stricter gun laws have been introduced due to recent gun violence. The last graph in this report shows that more firearm background checks took place in Kentucky than there are people over 18 in the state. This shows high activity, but it comes with a caveat: Kentucky runs a new check on each concealed carry license holder each month, which adds to the total number of checks for the state.