Investigating NICS Firearm Background Checks with FBI Gun Data

Introduction

This data comes from the FBI's National Instant Criminal Background Check System. From the official site:

Mandated by the Brady Handgun Violence Prevention Act of 1993 and launched by the FBI on November 30, 1998, NICS is used by Federal Firearms Licensees (FFLs) to instantly determine whether a prospective buyer is eligible to buy firearms. Before ringing up the sale, cashiers call in a check to the FBI or to other designated agencies to ensure that each customer does not have a criminal record or isn’t otherwise ineligible to make a purchase. More than 230 million such checks have been made, leading to more than 1.3 million denials.

Before moving on to the meat of this investigation, I want to bring up a few notes.

Firearm Background Checks Do Not Equal Sales

At the beginning of this project, I wanted to use the data to calculate gun sales. It turns out that calculating this statistic is more complex than you would think.

In the PDF files the FBI provides, there is an important note:

These statistics represent the number of firearm background checks initiated through the NICS. They do not represent the number of firearms sold. Based on varying state laws and purchase scenarios, a one-to-one correlation cannot be made between a firearm background check and a firearm sale.

Even though a one-to-one correlation cannot be made between a check and a sale, some organizations have estimated gun sales with this data. For example, the New York Times used a method suggested in the Small Arms Survey by Jurgen Brauer, a professor at Georgia Regents University. Long gun and handgun checks were counted as 1.1 sales, and multiple-gun checks were counted as two sales. Permit and other types of checks were omitted. The multipliers used in the survey were based on interviews Mr. Brauer had with gun shop owners.

So background checks do not equal sales, but visualizing this data can still give you a good grasp of how gun sales trend.
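As a rough sketch of that estimation method (a hypothetical helper, not the Times' actual code), the multipliers work like this:

```python
# A hypothetical helper applying the Small Arms Survey multipliers:
# each handgun or long-gun check counts as 1.1 sales, each
# multiple-gun check as 2 sales; permit checks are omitted.
def estimate_sales(handgun, long_gun, multiple):
    return 1.1 * (handgun + long_gun) + 2 * multiple

# e.g. a month with 9,220 handgun, 7,172 long-gun, and 490 multiple checks
print(round(estimate_sales(9220, 7172, 490)))  # -> 19011
```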

States cannot be directly compared

In the visualization section, you will see statistics by state in some of the graphs. When looking at these, keep in mind that even though you can estimate gun sales using the NICS background check data, firearm laws differ from state to state across the nation. Even so, the statistics for each state were still very interesting to learn about.

Data Wrangling

The Gun Data

The data used to explore the firearm background checks can be downloaded from a BuzzFeed News GitHub repository here.

The code in this GitHub repository downloads that PDF, parses it, and produces a spreadsheet/CSV of the data. Click here to download the data, which currently covers November 1998 – February 2018.

The Census Data

The population numbers come from the United States Census Bureau website here. This data has annual estimates of the resident population for the U.S. from April 1, 2010 to July 1, 2017.

While exploring this data, I realized that I needed numbers for children under 18 within each state. As you'll see in the visualization, this was used to calculate how many people were of age to receive a background check while buying a gun. That number was then used to calculate the percentage of checks compared to the population of that state.
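That calculation amounts to the following quick sketch, using Alabama's 2016 figures (which appear in the cleaned census table later on) as an example:

```python
# Sketch of the percentage calculation: subtract the under-18 population
# to get residents old enough for a background check, then compare the
# state's yearly check total against that of-age population.
def pct_checks_of_adults(total_pop, under_18, checks):
    adults = total_pop - under_18
    return checks / adults * 100

# Alabama 2016: population, under-18 estimate, total checks
print(round(pct_checks_of_adults(4860545, 1099517, 616947), 1))  # -> 16.4
```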

Let's explore.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [2]:
df_c17 = pd.read_csv('nst-est2017-01.csv')
df_gun = pd.read_csv('nics-firearm-background-checks.csv')
In [3]:
df_gun.head()
Out[3]:
month state permit permit_recheck handgun long_gun other multiple admin prepawn_handgun ... returned_other rentals_handgun rentals_long_gun private_sale_handgun private_sale_long_gun private_sale_other return_to_seller_handgun return_to_seller_long_gun return_to_seller_other totals
0 2018-02 Alabama 24547.0 0.0 9220.0 7172.0 353.0 490 0.0 15.0 ... 0.0 0.0 0.0 29.0 39.0 4.0 0.0 1.0 2.0 45171
1 2018-02 Alaska 214.0 0.0 2279.0 1948.0 268.0 166 0.0 2.0 ... 0.0 0.0 0.0 17.0 5.0 6.0 0.0 0.0 0.0 5392
2 2018-02 Arizona 6513.0 435.0 14966.0 9058.0 1341.0 934 0.0 15.0 ... 0.0 0.0 0.0 20.0 18.0 5.0 0.0 2.0 0.0 36281
3 2018-02 Arkansas 4341.0 970.0 7671.0 6405.0 338.0 614 48.0 18.0 ... 0.0 0.0 0.0 24.0 33.0 3.0 0.0 0.0 0.0 24266
4 2018-02 California 39165.0 0.0 38531.0 24768.0 3940.0 0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 107649

5 rows × 27 columns

In [4]:
df_gun.shape
Out[4]:
(12760, 27)
In [5]:
df_gun.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12760 entries, 0 to 12759
Data columns (total 27 columns):
month                        12760 non-null object
state                        12760 non-null object
permit                       12736 non-null float64
permit_recheck               1375 non-null float64
handgun                      12740 non-null float64
long_gun                     12741 non-null float64
other                        5775 non-null float64
multiple                     12760 non-null int64
admin                        12737 non-null float64
prepawn_handgun              10817 non-null float64
prepawn_long_gun             10815 non-null float64
prepawn_other                5390 non-null float64
redemption_handgun           10820 non-null float64
redemption_long_gun          10819 non-null float64
redemption_other             5390 non-null float64
returned_handgun             2475 non-null float64
returned_long_gun            2420 non-null float64
returned_other               2090 non-null float64
rentals_handgun              1265 non-null float64
rentals_long_gun             1100 non-null float64
private_sale_handgun         3025 non-null float64
private_sale_long_gun        3025 non-null float64
private_sale_other           3025 non-null float64
return_to_seller_handgun     2750 non-null float64
return_to_seller_long_gun    3025 non-null float64
return_to_seller_other       2530 non-null float64
totals                       12760 non-null int64
dtypes: float64(23), int64(2), object(2)
memory usage: 2.6+ MB

Luckily, this dataset is pretty clean. All of the count columns are numeric (mostly floats), so we can quickly get some summary statistics.

In [6]:
df_gun.describe()
Out[6]:
permit permit_recheck handgun long_gun other multiple admin prepawn_handgun prepawn_long_gun prepawn_other ... returned_other rentals_handgun rentals_long_gun private_sale_handgun private_sale_long_gun private_sale_other return_to_seller_handgun return_to_seller_long_gun return_to_seller_other totals
count 12736.000000 1375.000000 12740.000000 12741.000000 5775.000000 12760.000000 12737.000000 10817.000000 10815.000000 5390.000000 ... 2090.000000 1265.000000 1100.000000 3025.000000 3025.000000 3025.000000 2750.000000 3025.000000 2530.000000 12760.000000
mean 6610.938128 1326.202182 6059.709262 7845.176281 378.960866 271.361834 58.141949 4.871499 7.787887 0.188868 ... 1.085167 0.107510 0.107273 17.233058 13.771570 1.231736 0.432727 0.481983 0.108696 22012.392085
std 24700.643313 10282.508342 8796.577531 9311.128953 1337.569194 779.820198 599.162551 10.927737 16.320863 1.082060 ... 4.454055 0.835988 0.747055 78.628717 60.243891 4.908641 1.481185 1.582317 0.425143 33629.425268
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 0.000000 0.000000 888.000000 2092.000000 18.000000 15.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 4686.750000
50% 553.000000 0.000000 3114.000000 5147.000000 129.000000 126.000000 0.000000 0.000000 1.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 12519.500000
75% 4421.250000 1.000000 7421.500000 10435.000000 370.500000 303.250000 0.000000 5.000000 8.000000 0.000000 ... 0.000000 0.000000 0.000000 4.000000 6.000000 0.000000 0.000000 0.000000 0.000000 25846.500000
max 522188.000000 124056.000000 107224.000000 108058.000000 77929.000000 38907.000000 28083.000000 164.000000 269.000000 49.000000 ... 64.000000 12.000000 12.000000 1017.000000 777.000000 71.000000 28.000000 17.000000 4.000000 541978.000000

8 rows × 25 columns

Already we can see that the average number of monthly checks per state between 1998 and 2018 is 22,012. The highest number of checks processed for a single state in one month was over half a million.

In [7]:
df_gun.hist(figsize=(20,20));
In [8]:
pd.plotting.scatter_matrix(df_gun, figsize=(30,30));

This scatter matrix shows mostly positive relationships between permits and totals, as well as among the private sale categories. From this information we can expect the number of checks to rise throughout the years covered by this dataset.

In [9]:
# let's look at the census data now
df_c17.head()
Out[9]:
table with row headers in column A and column headers in rows 3 through 4. (leading dots indicate sub-parts) Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Unnamed: 9 Unnamed: 10
0 Table 1. Annual Estimates of the Resident Popu... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Geographic Area April 1, 2010 NaN Population Estimate (as of July 1) NaN NaN NaN NaN NaN NaN NaN
2 NaN Census Estimates Base 2010 2011 2012 2013 2014 2015 2016 2017
3 United States 308,745,538 308,758,105 309,338,421 311,644,280 313,993,272 316,234,505 318,622,525 321,039,839 323,405,935 325,719,178
4 Northeast 55,317,240 55,318,350 55,388,349 55,642,659 55,860,261 56,047,732 56,203,078 56,296,628 56,359,360 56,470,581
In [10]:
df_c17.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 11 columns):
table with row headers in column A and column headers in rows 3 through 4. (leading dots indicate sub-parts)    64 non-null object
Unnamed: 1                                                                                                      59 non-null object
Unnamed: 2                                                                                                      58 non-null object
Unnamed: 3                                                                                                      59 non-null object
Unnamed: 4                                                                                                      58 non-null object
Unnamed: 5                                                                                                      58 non-null object
Unnamed: 6                                                                                                      58 non-null object
Unnamed: 7                                                                                                      58 non-null object
Unnamed: 8                                                                                                      58 non-null object
Unnamed: 9                                                                                                      58 non-null object
Unnamed: 10                                                                                                     58 non-null object
dtypes: object(11)
memory usage: 5.8+ KB

The census data set is quite messy. We need to decide what we actually want to use for this investigation, clean up the column names, and convert all of the strings into integers we can work with. For this project, we will only need the population numbers from 2010-2017 for each region and state.

Data Cleaning

To start the cleaning process, we'll narrow down the rows and columns we need from the census data.

In [11]:
df_c17.head()
Out[11]:
table with row headers in column A and column headers in rows 3 through 4. (leading dots indicate sub-parts) Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Unnamed: 9 Unnamed: 10
0 Table 1. Annual Estimates of the Resident Popu... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Geographic Area April 1, 2010 NaN Population Estimate (as of July 1) NaN NaN NaN NaN NaN NaN NaN
2 NaN Census Estimates Base 2010 2011 2012 2013 2014 2015 2016 2017
3 United States 308,745,538 308,758,105 309,338,421 311,644,280 313,993,272 316,234,505 318,622,525 321,039,839 323,405,935 325,719,178
4 Northeast 55,317,240 55,318,350 55,388,349 55,642,659 55,860,261 56,047,732 56,203,078 56,296,628 56,359,360 56,470,581

Since this file was converted from a spreadsheet to CSV format, the columns and rows are a mess. We'll only need the rows starting at "United States".

In [12]:
# keep rows 3 through 58; these include the regions and states we need
df_c17 = df_c17.iloc[3:59]

#rename the columns
df_c17.columns = ['region/state', 'census_april_2010', 'est_base', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017']

df_c17.head()
Out[12]:
region/state census_april_2010 est_base 2010 2011 2012 2013 2014 2015 2016 2017
3 United States 308,745,538 308,758,105 309,338,421 311,644,280 313,993,272 316,234,505 318,622,525 321,039,839 323,405,935 325,719,178
4 Northeast 55,317,240 55,318,350 55,388,349 55,642,659 55,860,261 56,047,732 56,203,078 56,296,628 56,359,360 56,470,581
5 Midwest 66,927,001 66,929,794 66,973,360 67,141,501 67,318,295 67,534,451 67,720,120 67,839,187 67,978,168 68,179,351
6 South 114,555,744 114,563,024 114,869,241 116,060,993 117,291,728 118,422,269 119,699,966 121,081,238 122,423,457 123,658,624
7 West 71,945,553 71,946,937 72,107,471 72,799,127 73,522,988 74,230,053 74,999,361 75,822,786 76,644,950 77,410,622

As we saw in the wrangling process, every number in this data set is stored as a string. To convert them all at once, we can get rid of the region/state column, convert all of the strings to numbers, and then bring the column back. We'll do this by first making two copies of the data set.

In [13]:
#copy the data sets
df1 = df_c17.copy()
df2 = df_c17.copy()

#drop the region/state column from the first copy
df1.drop(columns = ['region/state'], inplace = True)

#get rid of the commas in all of the numbers
for col in df1.columns:
    df1[col] = df1[col].replace({',':''}, regex=True)
    
#change all of the numbers to integers
df1 = df1.apply(pd.to_numeric)

df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56 entries, 3 to 58
Data columns (total 10 columns):
census_april_2010    56 non-null int64
est_base             56 non-null int64
2010                 56 non-null int64
2011                 56 non-null int64
2012                 56 non-null int64
2013                 56 non-null int64
2014                 56 non-null int64
2015                 56 non-null int64
2016                 56 non-null int64
2017                 56 non-null int64
dtypes: int64(10)
memory usage: 4.5 KB
In [14]:
# strangely, some of the state names had leading periods
df2['region/state'] = df2['region/state'].map(lambda x: x.lstrip('.'))

# change the region/state column into a list to add into other dataframe
df2 = df2['region/state'].tolist()

df2
Out[14]:
['United States',
 'Northeast',
 'Midwest',
 'South',
 'West',
 'Alabama',
 'Alaska',
 'Arizona',
 'Arkansas',
 'California',
 'Colorado',
 'Connecticut',
 'Delaware',
 'District of Columbia',
 'Florida',
 'Georgia',
 'Hawaii',
 'Idaho',
 'Illinois',
 'Indiana',
 'Iowa',
 'Kansas',
 'Kentucky',
 'Louisiana',
 'Maine',
 'Maryland',
 'Massachusetts',
 'Michigan',
 'Minnesota',
 'Mississippi',
 'Missouri',
 'Montana',
 'Nebraska',
 'Nevada',
 'New Hampshire',
 'New Jersey',
 'New Mexico',
 'New York',
 'North Carolina',
 'North Dakota',
 'Ohio',
 'Oklahoma',
 'Oregon',
 'Pennsylvania',
 'Rhode Island',
 'South Carolina',
 'South Dakota',
 'Tennessee',
 'Texas',
 'Utah',
 'Vermont',
 'Virginia',
 'Washington',
 'West Virginia',
 'Wisconsin',
 'Wyoming']
In [15]:
# insert the list as a column onto the first copy that we cleaned
df1.insert(loc=0, column='region/state', value=df2)

df1.head()
Out[15]:
region/state census_april_2010 est_base 2010 2011 2012 2013 2014 2015 2016 2017
3 United States 308745538 308758105 309338421 311644280 313993272 316234505 318622525 321039839 323405935 325719178
4 Northeast 55317240 55318350 55388349 55642659 55860261 56047732 56203078 56296628 56359360 56470581
5 Midwest 66927001 66929794 66973360 67141501 67318295 67534451 67720120 67839187 67978168 68179351
6 South 114555744 114563024 114869241 116060993 117291728 118422269 119699966 121081238 122423457 123658624
7 West 71945553 71946937 72107471 72799127 73522988 74230053 74999361 75822786 76644950 77410622
In [16]:
# rename the dataframe
df_census_clean = df1

Now to bring in the census data for children under 18. This will allow us to calculate how much of the population is over 18 within each state.

In [17]:
df_underage = pd.read_csv('pop_under_18.csv')

df_underage.head()
Out[17]:
GEO.id GEO.id2 GEO.display-label HD01_VD01 HD02_VD01 HD01_VD02 HD02_VD02 HD01_VD03 HD02_VD03 HD01_VD04 ... HD01_VD06 HD02_VD06 HD01_VD07 HD02_VD07 HD01_VD08 HD02_VD08 HD01_VD09 HD02_VD09 HD01_VD10 HD02_VD10
0 Id Id2 Geography Estimate; Total: Margin of Error; Total: Estimate; In households: Margin of Error; In households: Estimate; In households: - Under 3 years Margin of Error; In households: - Under 3 years Estimate; In households: - 3 and 4 years ... Estimate; In households: - 6 to 8 years Margin of Error; In households: - 6 to 8 years Estimate; In households: - 9 to 11 years Margin of Error; In households: - 9 to 11 years Estimate; In households: - 12 to 14 years Margin of Error; In households: - 12 to 14 years Estimate; In households: - 15 to 17 years Margin of Error; In households: - 15 to 17 years Estimate; In group quarters Margin of Error; In group quarters
1 0400000US01 01 Alabama 1099517 3370 1097221 3446 172212 4738 114431 ... 184044 6174 188171 6271 179341 5624 196570 3223 2296 505
2 0400000US02 02 Alaska 187296 588 186806 595 32058 1696 20410 ... 31380 2199 33103 2308 29832 2132 29823 1192 490 47
3 0400000US04 04 Arizona 1631422 1266 1628590 1277 250883 6327 185293 ... 275660 7460 281451 7053 274104 7457 277440 1217 2832 323
4 0400000US05 05 Arkansas 707360 2248 704647 2280 114733 3388 73914 ... 123079 5561 119527 5327 118475 6041 118139 2743 2713 436

5 rows × 23 columns

Like the other census data, this set was a bit messy. We will only need the state and estimated total columns. After we extract that data, we will join it with the other census data.

In [18]:
# use the first row for the name of the columns
df_underage = df_underage.rename(columns = df_underage.iloc[0])

#drop the row with the original column names
df_underage = df_underage.reindex(df_underage.index.drop(0))

#drop all columns after estimated total column
df_underage.drop(df_underage.columns[4:], axis = 1, inplace = True)

#drop the Id and Id2 columns
cols = [0,1]
df_underage.drop(df_underage.columns[cols], axis=1, inplace=True)

# rename columns
df_underage.columns = ['geography', 'est_total']

df_underage.head(1)
Out[18]:
geography est_total
1 Alabama 1099517

Let's take a look at the data types to see if we need to convert anything.

In [19]:
df_underage.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 52 entries, 1 to 52
Data columns (total 2 columns):
geography    52 non-null object
est_total    52 non-null object
dtypes: object(2)
memory usage: 1.2+ KB
In [20]:
# change the est_total column to integers
df_underage['est_total'] = df_underage['est_total'].astype(int)

For the next step, we need to join df_underage with df_census_clean. This will be done by creating a Python dictionary from the underage dataframe, and then mapping the matching states of the census data to the values of df_underage.

In [21]:
# create empty dictionary
df_underage_dict = {}

# write a loop that assigns the states as keys and the totals as values
x = 0
while x < len(df_underage):
    state_name = df_underage['geography'].iloc[x]
    total_num = df_underage['est_total'].iloc[x]
    df_underage_dict[state_name] = total_num
    x = x+1
    
df_underage_dict
Out[21]:
{'Alabama': 1099517,
 'Alaska': 187296,
 'Arizona': 1631422,
 'Arkansas': 707360,
 'California': 9093249,
 'Colorado': 1260512,
 'Connecticut': 752162,
 'Delaware': 204237,
 'District of Columbia': 120782,
 'Florida': 4145140,
 'Georgia': 2510610,
 'Hawaii': 307765,
 'Idaho': 436457,
 'Illinois': 2922719,
 'Indiana': 1577533,
 'Iowa': 726967,
 'Kansas': 714919,
 'Kentucky': 1012457,
 'Louisiana': 1117281,
 'Maine': 254951,
 'Maryland': 1346152,
 'Massachusetts': 1378947,
 'Michigan': 2191183,
 'Minnesota': 1288623,
 'Mississippi': 723153,
 'Missouri': 1390064,
 'Montana': 228674,
 'Nebraska': 474085,
 'Nevada': 677502,
 'New Hampshire': 259841,
 'New Jersey': 1984439,
 'New Mexico': 489514,
 'New York': 4179386,
 'North Carolina': 2296894,
 'North Dakota': 174595,
 'Ohio': 2608971,
 'Oklahoma': 961956,
 'Oregon': 868246,
 'Pennsylvania': 2674845,
 'Puerto Rico': 695876,
 'Rhode Island': 208640,
 'South Carolina': 1099964,
 'South Dakota': 213901,
 'Tennessee': 1503252,
 'Texas': 7291694,
 'Utah': 921847,
 'Vermont': 118684,
 'Virginia': 1867560,
 'Washington': 1629061,
 'West Virginia': 376883,
 'Wisconsin': 1284991,
 'Wyoming': 140860}
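As an aside, pandas can build the same state-to-total dictionary without an explicit loop; here's a minimal sketch on a two-row stand-in for df_underage:

```python
import pandas as pd

# two-row stand-in for df_underage
df_underage = pd.DataFrame({
    'geography': ['Alabama', 'Alaska'],
    'est_total': [1099517, 187296],
})

# same result as the while-loop: state names as keys, totals as values
df_underage_dict = df_underage.set_index('geography')['est_total'].to_dict()
print(df_underage_dict)  # {'Alabama': 1099517, 'Alaska': 187296}
```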
In [22]:
# map dictionary to df_census_clean
df_census_clean['pop_underage_2016'] = df_census_clean['region/state'].map(df_underage_dict)

df_census_clean.head(8)
Out[22]:
region/state census_april_2010 est_base 2010 2011 2012 2013 2014 2015 2016 2017 pop_underage_2016
3 United States 308745538 308758105 309338421 311644280 313993272 316234505 318622525 321039839 323405935 325719178 NaN
4 Northeast 55317240 55318350 55388349 55642659 55860261 56047732 56203078 56296628 56359360 56470581 NaN
5 Midwest 66927001 66929794 66973360 67141501 67318295 67534451 67720120 67839187 67978168 68179351 NaN
6 South 114555744 114563024 114869241 116060993 117291728 118422269 119699966 121081238 122423457 123658624 NaN
7 West 71945553 71946937 72107471 72799127 73522988 74230053 74999361 75822786 76644950 77410622 NaN
8 Alabama 4779736 4780135 4785579 4798649 4813946 4827660 4840037 4850858 4860545 4874747 1099517.0
9 Alaska 710231 710249 714015 722259 730825 736760 736759 737979 741522 739795 187296.0
10 Arizona 6392017 6392309 6407002 6465488 6544211 6616124 6706435 6802262 6908642 7016270 1631422.0

Since we don't have (or need) data for children under 18 at the regional and national level, NaN is put in place for those rows. We should change those to zeros.

In [23]:
# fill NaN with zeros
df_census_clean['pop_underage_2016'].fillna(0, inplace=True)

Later on in the analysis, we will need the number of background checks that were done in 2016. Once we get that data, we'll add it as a column in the census data.

In [40]:
# data frame for all checks done in 2016
# (note: checks_by_state is built in cell In [25] below; this cell was run afterward)
df_gun_2016 = checks_by_state[checks_by_state.month.str.contains('2016')]

# get totals in 2016 in each state
df_gun_2016 = df_gun_2016.groupby('state')['totals'].sum()

# change to dict
df_gun_2016 = df_gun_2016.to_dict()

# map values of checks in 2016 to state names in the census data
df_census_clean['gun_permits_2016'] = df_census_clean['region/state'].map(df_gun_2016)

df_census_clean.head()
Out[40]:
region/state census_april_2010 est_base 2010 2011 2012 2013 2014 2015 2016 2017 pop_underage_2016 gun_permits_2016
3 United States 308745538 308758105 309338421 311644280 313993272 316234505 318622525 321039839 323405935 325719178 0.0 NaN
4 Northeast 55317240 55318350 55388349 55642659 55860261 56047732 56203078 56296628 56359360 56470581 0.0 NaN
5 Midwest 66927001 66929794 66973360 67141501 67318295 67534451 67720120 67839187 67978168 68179351 0.0 NaN
6 South 114555744 114563024 114869241 116060993 117291728 118422269 119699966 121081238 122423457 123658624 0.0 NaN
7 West 71945553 71946937 72107471 72799127 73522988 74230053 74999361 75822786 76644950 77410622 0.0 NaN
In [41]:
# change NaN to 0
df_census_clean['gun_permits_2016'].fillna(0, inplace=True)

df_census_clean.head(8)
Out[41]:
region/state census_april_2010 est_base 2010 2011 2012 2013 2014 2015 2016 2017 pop_underage_2016 gun_permits_2016
3 United States 308745538 308758105 309338421 311644280 313993272 316234505 318622525 321039839 323405935 325719178 0.0 0.0
4 Northeast 55317240 55318350 55388349 55642659 55860261 56047732 56203078 56296628 56359360 56470581 0.0 0.0
5 Midwest 66927001 66929794 66973360 67141501 67318295 67534451 67720120 67839187 67978168 68179351 0.0 0.0
6 South 114555744 114563024 114869241 116060993 117291728 118422269 119699966 121081238 122423457 123658624 0.0 0.0
7 West 71945553 71946937 72107471 72799127 73522988 74230053 74999361 75822786 76644950 77410622 0.0 0.0
8 Alabama 4779736 4780135 4785579 4798649 4813946 4827660 4840037 4850858 4860545 4874747 1099517.0 616947.0
9 Alaska 710231 710249 714015 722259 730825 736760 736759 737979 741522 739795 187296.0 87647.0
10 Arizona 6392017 6392309 6407002 6465488 6544211 6616124 6706435 6802262 6908642 7016270 1631422.0 416279.0

Exploratory Data Analysis

What is the overall trend of gun purchases?

Using the totals from the NICS data, we can see what the overall trend of gun sales is from 1998 to 2018.

In [24]:
# sum the totals by month
totals = df_gun.groupby("month")["totals"].sum()

# plot graph
tick_placement = np.arange(2, len(totals), 12)  # use np directly; pd.np is deprecated
plt.style.use('seaborn')
ax = totals.plot(figsize=(20,8))


ax.set_title("Monthly NICS Background Check Totals Since Nov. 1998", fontsize=24)
ax.set_yticklabels([ "{0:,.0f}".format(y) for y in ax.get_yticks() ], fontsize=12);
plt.setp(ax.get_xticklabels(), rotation=0, fontsize=12)
ax.set_xticks(tick_placement)
ax.set_xticklabels([ totals.index[i].split("-")[0] for i in tick_placement ])
ax.set_xlim(0, len(totals) - 1)
ax.set_xlabel("")
Out[24]:
Text(0.5,0,'')

This visualization shows a dramatic increase in background checks since 1998. Each spike shows that gun sales greatly increase in December of each year, with the greatest spike coming in 2015 due to Black Friday sales. In a report from The Trace, it was found that Black Friday 2015 broke the record for a single 24-hour period, with a total of 185,345 background checks processed, or slightly more than two background checks every second. The report includes an important note:

The FBI’s background check numbers come with caveats: As seen in the late February-early March 2014 bubble, many checks are for concealed carry permits, not actual gun sales. Kentucky runs a new check on each concealed carry license holder each month. And of course, the FBI’s numbers don’t include private gun sales, many of which do not require a background check. A forthcoming study conducted by Harvard researchers found that roughly 40 percent of respondents had acquired their most recent firearm without going through a background check. Despite those vagaries, the FBI’s NICS numbers are widely accepted as the best proxy for total gun sales in a given time period.
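The two-checks-per-second figure from that record day can be verified with quick arithmetic:

```python
# 185,345 checks processed over a 24-hour period
checks = 185345
seconds_per_day = 24 * 60 * 60          # 86,400 seconds
rate = checks / seconds_per_day
print(round(rate, 2))                   # ~2.15 checks per second
```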

How many total checks have there been in each state since 1998?

For this visualization, I grouped all of the states together and summed up the total checks per state since 1998

In [25]:
# get the total checks by each state and each month
checks_by_state = df_gun.groupby(['state', 'month'])['totals'].sum().reset_index()

# group the states and sum the totals
state_totals = checks_by_state.groupby('state')['totals'].sum()

# plot graph
state_total_tick_placement = np.arange(len(state_totals))
plt.style.use('seaborn')
state_ax = state_totals.plot(kind='bar',figsize=(20,8))

state_ax.set_title("NICS Background Check Totals By State Since Nov. 1998", fontsize=24)
state_ax.set_yticklabels([ "{0:,.0f}".format(y) for y in state_ax.get_yticks() ], fontsize=12);
plt.setp(state_ax.get_xticklabels(), fontsize=12)
state_ax.set_xticks(state_total_tick_placement)
state_ax.set_xticklabels(state_totals.index)
state_ax.set_xlim(0, len(state_totals) - 1)
state_ax.set_xlabel("")
Out[25]:
Text(0.5,0,'')

As you can see in this graph, Kentucky has had the most background check activity for guns since 1998. The state is known for having some of the least restrictive gun control laws in the country. Per a summary of gun laws in Kentucky, no state permit is needed to purchase, firearms do not need to be registered, there is no assault weapon law or magazine capacity restriction, and an owner license is not required. Background checks for private sales are not required either. The concealed carry laws have also loosened. On March 16, 2011, a change to KRS § 527.020 was signed into law by Governor Steve Beshear allowing guns to be carried without a permit in any factory-installed compartment within the passenger area of a vehicle. Previously, such carry was only allowed in a glove compartment. Keep in mind, as noted in the excerpt above, that Kentucky also re-runs a check on each concealed carry license holder every month, which inflates its totals.

In a report by The Trace, Kentucky has the nation's highest percentage of legislators who got at least an A-minus grade from the NRA. This rating system gives members of Congress a grade ranging from A to F that reflects their voting record on gun rights.

There are currently 35 states where more than half of all state legislators have a grade of A- or better, according to an analysis of data provided by Vote Smart, a non-partisan, non-profit research organization. In 14 states, including most of those in the gun belt, that majority exceeds two thirds, reaching or approaching veto-proof. In Kentucky and Oklahoma, the number extends beyond 80 percent.

Kentucky has also seen gun violence recently. On January 23, 2018, two students were killed and 18 were injured when a shooter opened fire at Marshall County High School. The event has led to the introduction of stricter gun laws for the state.

In Texas, the state with the second-highest firearm background check activity on this graph, a 15-year-old girl was injured in a school shooting just one day before the event in Kentucky.

How much have firearm background checks grown since 1999?

I chose to start at 1999 because the percentage changes calculated from 1998 are quite large. This is due to the very low background check numbers in November of 1998, when the NICS program was created. Below I will show both percentage changes since 1998 and 1999. The latter year gives a clearer picture of how much the checks have grown.

In [26]:
# a loop that uses the total checks in each state in November 1998
# and compares it to the total checks in February 2018
x=0
change_dict = {}
while x < len(checks_by_state):
    og_num = checks_by_state.iloc[x]['totals']
    new_num = checks_by_state.iloc[x + 231]['totals']
    decrease = new_num - og_num
    perc_change = round(decrease / og_num * 100, 0)
    change_dict[checks_by_state.iloc[x]['state']] = perc_change
    x = x + 232
C:\Users\kster.DESKTOP-8ROQVFD\Anaconda3\lib\site-packages\ipykernel_launcher.py:9: RuntimeWarning: divide by zero encountered in longlong_scalars
  if __name__ == '__main__':
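As an aside, the index arithmetic above assumes exactly 232 months per state; the same percentage change can be computed more robustly with a groupby. A minimal sketch on a toy stand-in for checks_by_state:

```python
import pandas as pd

# toy stand-in for checks_by_state: two states, first and last months only
checks_by_state = pd.DataFrame({
    'state': ['Alabama', 'Alabama', 'Alaska', 'Alaska'],
    'month': ['1998-11', '2018-02', '1998-11', '2018-02'],
    'totals': [100, 4253, 100, 3719],
})

# earliest and latest monthly total per state ('YYYY-MM' sorts chronologically)
firstlast = (checks_by_state.sort_values(['state', 'month'])
             .groupby('state')['totals'].agg(['first', 'last']))

# percent change from the first month to the last
pct_change = ((firstlast['last'] - firstlast['first'])
              / firstlast['first'] * 100).round(0)
print(pct_change)
```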
In [27]:
# change dictionary to a pandas Series
df_percent_change = pd.Series(change_dict, name = 'percent_change')
df_percent_change.index.name = 'state'
df_percent_change
Out[27]:
state
Alabama                 4.153000e+03
Alaska                  3.619000e+03
Arizona                 9.473000e+03
Arkansas                4.020000e+03
California              5.024000e+03
Colorado                7.492000e+03
Connecticut             1.942100e+04
Delaware                8.356000e+03
District of Columbia             inf
Florida                 1.462400e+04
Georgia                 9.318700e+04
Guam                             inf
Hawaii                  3.736000e+03
Idaho                   1.146800e+04
Illinois                8.421180e+05
Indiana                 2.350300e+04
Iowa                    3.507000e+03
Kansas                  4.427000e+03
Kentucky                4.959300e+04
Louisiana               3.210000e+03
Maine                   7.716000e+03
Mariana Islands                  inf
Maryland                4.065000e+03
Massachusetts           3.818600e+04
Michigan                4.262000e+03
Minnesota               1.546000e+04
Mississippi             3.112000e+03
Missouri                8.443000e+03
Montana                 4.867000e+03
Nebraska                3.978000e+03
Nevada                  7.703000e+03
New Hampshire           2.078400e+04
New Jersey              1.088600e+04
New Mexico              7.431000e+03
New York                9.658000e+03
North Carolina          3.952000e+03
North Dakota            8.721000e+03
Ohio                    7.593000e+03
Oklahoma                4.918000e+03
Oregon                  9.318000e+03
Pennsylvania            5.768120e+05
Puerto Rico             1.037000e+05
Rhode Island            6.863000e+03
South Carolina          4.827000e+05
South Dakota            1.082700e+04
Tennessee               6.220600e+04
Texas                   5.481000e+03
Utah                    8.376000e+03
Vermont                 6.459000e+03
Virgin Islands                   inf
Virginia                1.973380e+05
Washington              1.483500e+04
West Virginia           5.622000e+03
Wisconsin               1.806500e+04
Wyoming                 4.626000e+03
Name: percent_change, dtype: float64
In [28]:
# drop rows that have inf values
df_percent_change = df_percent_change.replace([np.inf, -np.inf], np.nan).dropna()

# sort from biggest to smallest
df_percent_change = df_percent_change.sort_values(ascending = False)

# plot graph
percent_total_tick_placement = np.arange(len(df_percent_change))
plt.style.use('seaborn')
percent_ax = df_percent_change.plot(kind='bar',figsize=(20,8))

percent_ax.set_title("Firearm Background Check Growth Rate by State Since Nov. 1998", fontsize=24)
percent_ax.set_yticklabels([ "{0:,.0f}".format(y) for y in percent_ax.get_yticks() ], fontsize=12);
plt.setp(percent_ax.get_xticklabels(), fontsize=12)
percent_ax.set_xticks(percent_total_tick_placement)
percent_ax.set_xticklabels(df_percent_change.index)
percent_ax.set_xlim(0, len(df_percent_change) - 1)
percent_ax.set_ylabel('Percent')
percent_ax.set_xlabel("")
Out[28]:
Text(0.5,0,'')

As the graph shows, growth of over 800,000% is suspiciously large, so I took a closer look at the November 1998 numbers for the most extreme states.

In [29]:
# november 1998 totals in Illinois
checks_by_state[checks_by_state['state'] == 'Illinois'].iloc[0]
Out[29]:
state     Illinois
month      1998-11
totals          22
Name: 3248, dtype: object
In [30]:
# Pennsylvania
checks_by_state[checks_by_state['state'] == 'Pennsylvania'].iloc[0]
Out[30]:
state     Pennsylvania
month          1998-11
totals              17
Name: 9280, dtype: object
In [31]:
# South Carolina
checks_by_state[checks_by_state['state'] == 'South Carolina'].iloc[0]
Out[31]:
state     South Carolina
month            1998-11
totals                 6
Name: 9976, dtype: object
In [32]:
# Virginia
checks_by_state[checks_by_state['state'] == 'Virginia'].iloc[0]
Out[32]:
state     Virginia
month      1998-11
totals          24
Name: 11600, dtype: object
In [33]:
# totals in november 1998 for every state
df_nov98_totals = checks_by_state[checks_by_state.month.str.contains('1998-11') == True]

df_nov98_totals['totals'].mean()
Out[33]:
385.0181818181818

After seeing how low the November 1998 totals were (a mean of just 385 checks per state), I recalculated each state's growth using December 1998 as the baseline.
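The inflation is pure arithmetic: percentage change scales with the inverse of the baseline, so a baseline of only a few dozen checks turns an ordinary modern monthly total into an astronomical growth rate. Illinois's baseline of 22 checks reproduces the figure from the graph above (the 2018 endpoint of roughly 185,288 checks is the value implied by that figure, not taken from the raw data):

```python
def percent_change(old, new):
    """Percentage change from old to new, rounded as in the loops above."""
    return round((new - old) / old * 100, 0)

# Illinois's November 1998 baseline was just 22 checks
print(percent_change(22, 185288))  # → 842118.0, matching the graph
```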

In [34]:
# dataset that excludes November 1998
checks_by_state_clean = checks_by_state[checks_by_state.month.str.contains('1998-11') == False]

# using same method as above to make a dictionary 
# and turn it into a pandas Series
# each state now occupies 231 rows (December 1998 onward)
x = 0
change_dict_clean = {}
while x < len(checks_by_state_clean):
    og_num = checks_by_state_clean.iloc[x]['totals']
    new_num = checks_by_state_clean.iloc[x + 230]['totals']
    change = new_num - og_num
    perc_change = round(change / og_num * 100, 0)
    change_dict_clean[checks_by_state_clean.iloc[x]['state']] = perc_change
    x = x + 231
C:\Users\kster.DESKTOP-8ROQVFD\Anaconda3\lib\site-packages\ipykernel_launcher.py:12: RuntimeWarning: divide by zero encountered in longlong_scalars
  if sys.path[0] == '':
In [35]:
df_percent_change_clean = pd.Series(change_dict_clean, name = 'percent_change')
df_percent_change_clean.index.name = 'state'
df_percent_change_clean
Out[35]:
state
Alabama                   27.000000
Alaska                    40.000000
Arizona                  112.000000
Arkansas                  15.000000
California                65.000000
Colorado                 104.000000
Connecticut              130.000000
Delaware                 124.000000
District of Columbia            inf
Florida                  279.000000
Georgia                 1439.000000
Guam                    1522.000000
Hawaii                   167.000000
Idaho                    184.000000
Illinois                 347.000000
Indiana                  269.000000
Iowa                      23.000000
Kansas                    50.000000
Kentucky                1245.000000
Louisiana                 -1.000000
Maine                     95.000000
Mariana Islands                 inf
Maryland                  52.000000
Massachusetts           1076.000000
Michigan                  51.000000
Minnesota                321.000000
Mississippi               -1.000000
Missouri                 110.000000
Montana                   61.000000
Nebraska                  17.000000
Nevada                    98.000000
New Hampshire            381.000000
New Jersey               128.000000
New Mexico                80.000000
New York                 116.000000
North Carolina            41.000000
North Dakota             115.000000
Ohio                     132.000000
Oklahoma                  63.000000
Oregon                   137.000000
Pennsylvania             103.000000
Puerto Rico              306.000000
Rhode Island             159.000000
South Carolina           203.000000
South Dakota             202.000000
Tennessee                170.000000
Texas                     96.000000
Utah                     117.000000
Vermont                   88.000000
Virgin Islands                  inf
Virginia                  88.000000
Washington               363.000000
West Virginia             69.000000
Wisconsin                188.000000
Wyoming                   50.000000
Name: percent_change, dtype: float64
In [36]:
# drop inf
df_percent_change_clean = df_percent_change_clean.replace([np.inf, -np.inf], np.nan).dropna()

# sort biggest to smallest
df_percent_change_clean = df_percent_change_clean.sort_values(ascending = False)

# plot graph
percent_total_tick_placement_clean = np.arange(len(df_percent_change_clean))
plt.style.use('seaborn')
percent_ax_clean = df_percent_change_clean.plot(kind='bar',figsize=(20,8))

percent_ax_clean.set_title("Firearm Background Check Growth Rate by State Since Dec. 1998", fontsize=24)
percent_ax_clean.set_yticklabels([ "{0:,.0f}".format(y) for y in percent_ax_clean.get_yticks() ], fontsize=12);
plt.setp(percent_ax_clean.get_xticklabels(), fontsize=12)
percent_ax_clean.set_xticks(percent_total_tick_placement_clean)
percent_ax_clean.set_xticklabels(df_percent_change_clean.index)
percent_ax_clean.set_xlim(0, len(df_percent_change_clean) - 1)
percent_ax_clean.set_ylabel('Percent')
percent_ax_clean.set_xlabel("")
Out[36]:
Text(0.5,0,'')

This graph gives a better picture of how firearm background checks have grown since the beginning of the NICS system. Georgia, Kentucky, and Massachusetts have the highest growth rates. Guam's rate is also high, but its totals are only in the hundreds and have little impact on the data overall.

Using information from this graph, I then plotted each of those three states' monthly background check totals over the years.

In [37]:
# create a data frame for each state that has total checks by month
# Georgia
checks_by_state_georgia = checks_by_state_clean[checks_by_state_clean['state']=='Georgia']
checks_by_state_georgia = checks_by_state_georgia.groupby('month')['totals'].sum()

# Kentucky
checks_by_state_kentucky = checks_by_state_clean[checks_by_state_clean['state']=='Kentucky']
checks_by_state_kentucky = checks_by_state_kentucky.groupby('month')['totals'].sum()

# Massachusetts
checks_by_state_mass = checks_by_state_clean[checks_by_state_clean['state']=='Massachusetts']
checks_by_state_mass = checks_by_state_mass.groupby('month')['totals'].sum()
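The three filter-and-groupby blocks above can be collapsed into a single pivot, with months as rows and one column per state. A sketch on a hypothetical miniature of `checks_by_state_clean` (totals are made up):

```python
import pandas as pd

# Hypothetical miniature of checks_by_state_clean
checks = pd.DataFrame({
    'state':  ['Georgia', 'Georgia', 'Kentucky', 'Kentucky'],
    'month':  ['1998-12', '1999-01', '1998-12', '1999-01'],
    'totals': [10, 20, 30, 40],
})

# One pivot yields a monthly-totals column per state, so a series such as
# checks_by_state_georgia becomes monthly_by_state['Georgia'].
monthly_by_state = checks.pivot_table(index='month', columns='state',
                                      values='totals', aggfunc='sum')
print(monthly_by_state)
```

This also makes plotting all three lines a single call: `monthly_by_state[['Georgia', 'Kentucky', 'Massachusetts']].plot()`.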
In [38]:
# plot graph
growth_rates_tick_placement = np.arange(2, len(checks_by_state_georgia), 12)
plt.style.use('seaborn')
ax_state_growth = checks_by_state_georgia.plot(figsize=(20,8), label='Georgia')
plt.plot(checks_by_state_kentucky, label='Kentucky')
plt.plot(checks_by_state_mass, label='Massachusetts')
plt.legend()

ax_state_growth.set_title("Monthly NICS Firearm Background Check Totals for States With Most Growth", fontsize=24)
ax_state_growth.set_yticklabels([ "{0:,.0f}".format(y) for y in ax_state_growth.get_yticks() ], fontsize=12);
plt.setp(ax_state_growth.get_xticklabels(), rotation=0, fontsize=12)
ax_state_growth.set_xticks(growth_rates_tick_placement)
ax_state_growth.set_xticklabels([checks_by_state_georgia.index[i].split("-")[0] for i in growth_rates_tick_placement])
ax_state_growth.set_xlim(0, len(checks_by_state_georgia) - 1)
ax_state_growth.set_xlabel("")
Out[38]:
Text(0.5,0,'')

Activity in Kentucky is significantly higher than in the other two high-growth states. All three show a large spike each December.

In the next visualization, I'll show each state's firearm background checks as a percentage of its population aged 18 and over. Census data on the under-18 population was only available through 2016.

In [42]:
# we'll use the same method of creating a dictionary
# and turning it into a series
perc_guns_2016_dict = {}
# start at index 5, skipping the non-state summary rows
# at the top of the census table
x = 5
while x < len(df_census_clean):
    # subtract the under-18 population from the total 2016 population
    not_underage = df_census_clean['2016'].iloc[x] - df_census_clean['pop_underage_2016'].iloc[x]
    # number of background checks in 2016
    num_guns = df_census_clean['gun_permits_2016'].iloc[x]
    # checks as a percentage of the adult population
    percent = round(num_guns/not_underage * 100, 2)
    state = df_census_clean['region/state'].iloc[x]
    perc_guns_2016_dict[state] = percent
    x = x + 1
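The same per-capita calculation can be done without a loop by operating on whole columns. This sketch assumes a hypothetical two-row slice of `df_census_clean` with the same column names as above; the population and check figures are made up:

```python
import pandas as pd

# Hypothetical slice of df_census_clean; column names match the loop above,
# but the population and check figures are invented for illustration.
df_census_clean = pd.DataFrame({
    'region/state':      ['Alabama', 'Alaska'],
    '2016':              [4863300, 741894],
    'pop_underage_2016': [1100000, 184000],
    'gun_permits_2016':  [617000, 88000],
})

# adult population = total population minus the under-18 population
adults = df_census_clean['2016'] - df_census_clean['pop_underage_2016']
perc_guns_2016 = pd.Series(
    round(df_census_clean['gun_permits_2016'] / adults * 100, 2).values,
    index=df_census_clean['region/state'].values,
    name='percent',
)
print(perc_guns_2016)
```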
In [43]:
perc_guns_2016_dict
Out[43]:
{'Alabama': 16.4,
 'Alaska': 15.81,
 'Arizona': 7.89,
 'Arkansas': 11.66,
 'California': 7.87,
 'Colorado': 12.66,
 'Connecticut': 11.2,
 'Delaware': 7.94,
 'District of Columbia': 0.16,
 'Florida': 8.69,
 'Georgia': 7.86,
 'Hawaii': 1.49,
 'Idaho': 12.57,
 'Illinois': 19.41,
 'Indiana': 28.41,
 'Iowa': 8.41,
 'Kansas': 8.96,
 'Kentucky': 107.4,
 'Louisiana': 10.98,
 'Maine': 10.38,
 'Maryland': 3.53,
 'Massachusetts': 4.15,
 'Michigan': 7.49,
 'Minnesota': 15.38,
 'Mississippi': 12.21,
 'Missouri': 13.45,
 'Montana': 16.83,
 'Nebraska': 6.18,
 'Nevada': 6.54,
 'New Hampshire': 15.36,
 'New Jersey': 1.74,
 'New Mexico': 10.1,
 'New York': 2.59,
 'North Carolina': 7.65,
 'North Dakota': 12.4,
 'Ohio': 9.72,
 'Oklahoma': 12.32,
 'Oregon': 10.09,
 'Pennsylvania': 11.31,
 'Rhode Island': 3.27,
 'South Carolina': 10.44,
 'South Dakota': 17.28,
 'Tennessee': 13.89,
 'Texas': 8.35,
 'Utah': 13.89,
 'Vermont': 7.81,
 'Virginia': 8.07,
 'Washington': 12.63,
 'West Virginia': 16.69,
 'Wisconsin': 12.52,
 'Wyoming': 14.32}
In [45]:
# change dict into series and sort from biggest to smallest
df_perc_guns_2016 = pd.Series(perc_guns_2016_dict, name = 'percent')
df_perc_guns_2016.index.name = 'state'
df_perc_guns_2016 = df_perc_guns_2016.sort_values(ascending=False)

# plot graph
percent_total_tick_placement_2016 = np.arange(len(df_perc_guns_2016))
plt.style.use('seaborn')
percent_ax_2016 = df_perc_guns_2016.plot(kind='bar',figsize=(20,8))

percent_ax_2016.set_title("Number of Firearm Background Checks Compared to State Population in 2016", fontsize=24)
plt.setp(percent_ax_2016.get_xticklabels(), fontsize=12)
percent_ax_2016.set_xticks(percent_total_tick_placement_2016)
percent_ax_2016.set_xticklabels(df_perc_guns_2016.index)
percent_ax_2016.set_xlim(0, len(df_perc_guns_2016) - 1)
percent_ax_2016.set_ylabel('Percent')
percent_ax_2016.set_xlabel("")
Out[45]:
Text(0.5,0,'')

With this visualization, we can see what share of each state's adult population participates in NICS background checks. Kentucky is at 107%, which further illustrates the high level of firearm background check activity in the state. This percentage comes with a caveat: Kentucky runs a new check on each concealed carry license holder every month, adding to the state's total number of checks.

Conclusions

This analysis allowed me to see the bigger picture when it comes to guns in America. Even though I didn't calculate actual gun sales, the data still allowed me to see trends between each state and all over the U.S. These are the conclusions I have arrived at:

A one-to-one comparison cannot be made between background checks and gun sales

As noted in the introduction of this analysis, background checks can give you an idea of gun sale activity, but a one-to-one comparison cannot be made. I chose not to estimate gun sales numbers for this reason. Although the exact percentage is debated, researchers have found that a significant share of gun sales happen without background checks. From a report by The Guardian:

The 2015 survey found that just 22% of gun owners who had acquired a gun in the previous two years reported doing so without a background check. Gun owners who had acquired a gun earlier than that – between two and five years before 2015, or more than five years before – were more likely to remember doing so without a background check. A full 57% of gun owners who reported acquiring their most recent gun more than five years before 2015 reported getting the gun without a background check. Because the survey relied on the memories of the participants, the researchers wrote, the more recent gun acquisition data might be more accurate.

In a similar report by The Trace:

  • Roughly 70 percent: Gun owners who purchased their most recent gun.
  • Roughly 30 percent: Gun owners who did not purchase their most recent gun, instead obtaining it through a transfer (i.e., a gift, an inheritance, a swap between friends).
  • Zeroing in on the population of gun buyers, about 34 percent did not go through a background check.
  • Among the gun owners who got their firearms through a transfer, roughly two-thirds did not go through a background check.

Add it up, and it works out to:

  • Roughly 60 percent: the share of gun owners surveyed who did go through a background check when they obtained (through sale or transfer) their latest gun.
  • Roughly 40 percent: the share of gun owners who did not.
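The "add it up" step is a weighted average of the shares quoted above, and it can be checked directly:

```python
# Shares quoted from The Trace's breakdown above
buyers = 0.70             # gun owners whose most recent gun was purchased
transfers = 0.30          # gun owners who got theirs through a transfer
no_check_buyers = 0.34    # share of buyers with no background check
no_check_transfers = 2/3  # share of transfer recipients with no check

no_check = buyers * no_check_buyers + transfers * no_check_transfers
print(round(no_check * 100))  # → 44, consistent with "roughly 40 percent"
```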

NICS Background check activity has steadily risen since 1998

In the first visualization, the graph showed a steady increase in firearm background checks since 1998. The spikes in December are likely due to Black Friday and holiday sales; spikes outside of December could be due to calls for new gun restrictions.

Kentucky has the highest number of background checks since 1998

Along with having the highest number of checks, Kentucky has some of the highest background check activity relative to population, as well as one of the highest growth rates. Kentucky has some of the least restrictive gun laws in the country, and over 80% of its legislators receive a high grade from the NRA on gun legislation. Stricter gun laws have been introduced in response to recent gun violence. The last graph in this report shows that Kentucky logs more firearm background checks than it has residents over 18. This reflects high activity, but comes with a caveat: Kentucky runs a new check on each concealed carry license holder every month, adding to the state's total number of checks.