Demographics and Business Loans

A demonstration of how to combine FFIEC records of business loans with census data and display the data in a variety of types of maps..

View Full Notebook on Github

%load_ext metapack
%matplotlib inline

import pandas as pd
import geopandas as gpd
import numpy as np
import metapack as mp
import matplotlib.pyplot as plt
import seaborn as sns
from sdipylib.geo import *
sns.set(color_codes=True)

from pkg_resources import get_distribution
assert get_distribution('publicdata').version >= '0.1.3'

This notebook is a demonstration of how to combine FFIEC records of business loans with census data and display the data in a variety of type of maps. The demonstration will cover using Metatab and Metapack to access data, creating maps with GeoPandas, and creating zoomable maps with a basemap using folium.

This demonstration will focus on mapping, so if you’d like a more complete tutorial on using Metapack and Census data, please refer to the American Community Survey tutorial notebook.

First, we’ll add a Metatab configuration that defines the references to the data we’ll use, which include:

An ACS Table for counts of people by race in tracts in San Diego county
Tract grographic boundaries for San DIego County
A Metapack package for CRA Business loan originations.

The American Community Survey tutorial notebook. has more information about the structure of the Metatab configuration, and the CRA Loan dataset is available in a friendly form at the Data Library’s data repository.

Note that to run this notebook, you will need to install these python packages:

jupyter
pandas
geopandas
matplotlib
seaborn
publicdata

%%metatab

    
Dataset: cra_mapping
Origin: sandiegodata.org
Version: 1

Section: Contacts
Wrangler: Eric Busboom
Wrangler.Email: eric@sandiegodata.org
Wrangler.Organization: San Diego Regional Data Library
  
Section: References

Reference: censusreporter:B02001/140/05000US06073
Reference.Name: B02001
Reference.Description: Table B02001: Race, by tract, in San Diego County

Reference: censusreportergeo://B02001/140/05000US06073
Reference.Name: B02001g
Reference.Description: Table B02001: Race, by tract, in San Diego County

Reference: metapack+http://library.metatab.org/ffiec.gov-cra_aggregate_smb-orig-4.csv#sb_agg_loan_orig
Reference.Name: agg_loan_orig
Reference.Description: CRA Loan originations, aggregated to tracts.

Getting Started: Processing Data

Generally, the first parts of a Notebook should contain all of the data manipulations to get data sets combined, or or create new columns, and the the remainer of the notebook is the analysis. So, we will start by extracting data frames from the Metatab configuration. Recall from the American Community Survey tutorial notebook. that the %metatab cell creates the mt_pkg package variable, and from that package we can extract dataframes, for census data, and geoframes for geographic data.

# The mt_pkg var is created by the %%metatab magic. 
# The B02001 and B02001g frames have the same data, but B02001 is a CensusDataFrame and B02001g is a GeoDataFrame
B02001 = mt_pkg.reference('B02001').dataframe().set_index('geoid')
B02001g = mt_pkg.reference('B02001g').geoframe().set_index('geoid')

Because the URL of the data frame B02001 is special ( the censusreporter: scheme is defined in the publicdata package ) the resulting dataframe is of type publicdata.censusreporter.dataframe.CensusDataFrame, it has a lot of special features, including a .title property that expands the column names to be more sensible. This view makes it easier to figure out what columns to incorporate in later analysis.

B02001.titles.head(2).T

geoid	14000US06073000100	14000US06073000201
name	14000US06073000100	14000US06073000201
B02001001 Total	2716	2223
Margins for B02001001 Total	206	260
B02001002 Total White alone	2549	1983
Margins for B02001002 Total White alone	188	276
B02001003 Total Black or African American alone	0	27
Margins for B02001003 Total Black or African American alone	12	42
B02001004 Total American Indian and Alaska Native alone	0	7
Margins for B02001004 Total American Indian and Alaska Native alone	12	12
B02001005 Total Asian alone	60	95
Margins for B02001005 Total Asian alone	38	67
B02001006 Total Native Hawaiian and Other Pacific Islander alone	10	0
Margins for B02001006 Total Native Hawaiian and Other Pacific Islander alone	15	12
B02001007 Total Some other race alone	35	45
Margins for B02001007 Total Some other race alone	33	49
B02001008 Total Two or more races	62	66
Margins for B02001008 Total Two or more races	42	54
B02001009 Total Two or more races Two races including Some other race	30	9
Margins for B02001009 Total Two or more races Two races including Some other race	34	15
B02001010 Total Two or more races Two races excluding Some other race, and three or more races	32	57
Margins for B02001010 Total Two or more races Two races excluding Some other race, and three or more races	27	52

In this case, we’re going to create new variables for “Minority” and “Non Minority” where “Non Minority” is defined as White and Asian.

B02001['white_asian'], B02001['white_asian_m90'] = B02001.sum_m('B02001002', 'B02001005')
B02001['minority'] = B02001.B02001001 - B02001.white_asian
B02001['non_min_r'], B02001['non_min_r_m90'] = B02001.ratio('white_asian', 'B02001001')

For the loan data, let’s drop all of the records that are nulls, select only tract records, and select only the year 2015.

loans_r = mt_pkg.reference('agg_loan_orig')
loans = loans_r.read_csv()

loans = loans.dropna(subset=['geoid']).dropna(subset=['tract']).set_index('geoid')

# Note! Only one Year!
loans = loans[loans.year == 2015]

loans_r

agg_loan_orig

metapack+http://library.metatab.org/ffiec.gov-cra_aggregate_smb-orig-4.csv#sb_agg_loan_orig

Header	Type	Description
tableid	string	Table ID
year	number	Activity Year
type	number	Loan Type
action	number	Action Taken Type
state	string	State
county	string	County
msa	string	MSA/MD
tract	string	Census Tract
split_county	string	Split County Indicator
pop_class	string	Population Classification
income_group	string	Income Group Total
report_level	string	Report Level
number_lt100	number	Number of Small Business Loans Originated with Loan Amount at Origination < $100,000
total_lt100	number	Total Loan Amount of Small BusinessLoans Originated with Loan Amountat Origination < $100,000
number_100_250	number	Number of Small Business Loans Originated with Loan Amount at Origination > 100,000 and< $250,000
total_100_250	number	Total Loan Amount of Small BusinessLoans Originated with Loan Amount at Origination > $100,000 and < $250,000
number_250_1m	number	Number of Small Business Loans Originated with Loan Amount at Origination > $250,000 and< $1M
total_250_1m	number	Total Loan Amount of Small BusinessLoans Originated with Loan Amount at Origination > $250,000 and <$1,000,000
number_bus_rev_lt1m	number	Number of Loans Originated to SmallBusinesses with Gross AnnualRevenues < $1 million
total_bus_rev_lt1m	number	Total Loan Amount of Loans Originated to Small Businesses with Gross Annual Revenues < $1 million
filler	string	Filler
geoid	string	Geoid

Next, we can join the ACS, geographic, and loans dataframes. Previously, they were all given geoid for an index, so joining them is easy. Once they are joined, we can create some summary and derived columns. Note that the geographic dataset is joined first. This is important, because it means that the final df will also be a geographic dataframe.

df = B02001g[['geometry']].join(B02001[['non_min_r']]).join(loans)

df['number_loans'] = df.number_lt100 + df.number_100_250 + df.number_250_1m
df['total_value'] = df.total_lt100 + df.total_100_250 + df.total_250_1m
df['mean_value'] = df.total_value + df.number_loans
df['loans_per_pop'] = df.number_loans / B02001.B02001001
df['loans_per_min'] = df.number_loans / B02001.minority
df['loan_val_per_pop'] = df.total_value / B02001.B02001001
df['loan_val_per_min'] = df.total_value / B02001.minority

df.head(3).T

geoid	14000US06073000100	14000US06073000201	14000US06073000202
geometry	POLYGON ((-117.194904 32.75278, -117.19471 32....	POLYGON ((-117.178867 32.75765, -117.177966 32...	POLYGON ((-117.184043 32.74571, -117.183827 32...
non_min_r	0.960604	0.934773	0.932522
non_min_r_m90	0.101467	0.168156	0.129733
tableid	A1-1	A1-1	A1-1
year	2015	2015	2015
type	4	4	4
action	1	1	1
state	6	6	6
county	73	73	73
msa	41740	41740	41740
tract	1	2.01	2.02
split_county	N	N	N
pop_class	L	L	L
income_group	13	13	12
report_level	NaN	NaN	NaN
number_lt100	140	99	150
total_lt100	1925	1608	2162
number_100_250	2	1	4
total_100_250	300	250	490
number_250_1m	2	2	0
total_250_1m	1500	1298	0
number_bus_rev_lt1m	74	56	87
total_bus_rev_lt1m	1102	1835	1536
filler	NaN	NaN	NaN
total_value	3725	3156	2652
number_loans	144	102	154
mean_value	3869	3258	2806
loans_per_pop	0.0530191	0.0458839	0.0328849
loans_per_min	1.34579	0.703448	0.487342
loan_val_per_pop	1.3715	1.4197	0.566304
loan_val_per_min	34.8131	21.7655	8.39241
minr_dist_deg	0.109813	0.100358	0.0965819
minr_dist_m	11276.8	10254.4	10028.5

Creating Maps

Just as Metapack and the publicdata module worked together to turn the dataset references by the censusreporter: reference into a special Census data frame, the censusreportergeo: reference returns a Geopandas dataframe. Geopandas has a built in support for geographic operations, such as creating maps. For instance:

# Create the matplotlib figure and a single axis, and set the figure size
# to 12 inches wide, with the same aspect ratio as the geographic dataframe. This
# keeps the map from looking distored. 
fig, ax = aspect_fig_size(df, 12) 

_ = df.dropna(subset=['non_min_r'])

_.plot(ax=ax, column='non_min_r', cmap='RdYlGn',scheme='fisher_jenks', legend=True);

<matplotlib.axes._subplots.AxesSubplot at 0x113967278>

png

from sdipylib.geo import aspect 


# In this case, with multiple axes per figure, setting the aspect ratio is more manual
w = 6
a = max(aspect(_mv),aspect(_lpp))

fig = plt.figure(figsize = (2*w, 2*(w/a)))

# Doing ax2 first because it has the longer y axis, and want both plots to have the same
# y axis
ax2= fig.add_subplot(2,2,2)
_ = df.replace([np.inf, -np.inf], np.nan).dropna(subset=['loans_per_pop'])\
    .sort_values('loans_per_pop', ascending=False).iloc[:50]
_.plot(ax=ax2, column='loans_per_pop', cmap='RdYlGn',scheme='fisher_jenks', legend=True)
ax2.set_title("Top 50 Tracts By # Loans Per Population")

# subplot(nrows, ncols, plot_number)
ax1= fig.add_subplot(2,2,1, sharey=ax2)
_ = df.dropna(subset=['mean_value']).sort_values('mean_value', ascending=False).iloc[:50]

_.plot(ax=ax1, column='mean_value', cmap='RdYlGn',scheme='fisher_jenks', legend=True)
ax1.set_title("Top 50 Tracts By Mean Loan Value")

ax3= fig.add_subplot(2,2,3, sharey=ax2)
_ = df.replace([np.inf, -np.inf], np.nan).dropna(subset=['loan_val_per_pop'])
_ = _[_.loan_val_per_pop>3]
_.plot(ax=ax3, column='loan_val_per_pop', cmap='RdYlGn',scheme='fisher_jenks', legend=True)
ax3.set_title("Tracts with > $3 Loan Value Per Pop")

ax4= fig.add_subplot(2,2,4, sharey=ax2)
_ = df.replace([np.inf, -np.inf], np.nan).dropna(subset=['loan_val_per_min'])
_ = _[_.loan_val_per_min>24]
_.plot(ax=ax4, column='loan_val_per_min', cmap='RdYlGn',scheme='fisher_jenks', legend=True)
ax4.set_title("Tracts with > $24 Loan Value Per Minority");

png

Geopandas mapping is really easy, but it is hard to tell where the tracts are, because there is no base map. The folium module links Geopandas and Leaflet, and lets us put choropleth maps on a scrollable, zoomable map.

_ = df.replace([np.inf, -np.inf], np.nan).dropna(subset=['loan_val_per_pop'])
_ = _[_.loan_val_per_pop>3]
folium_map(_,'loan_val_per_pop', zoom_start=10, fill_color='YlGn', fill_opacity=.7)

_ = df.replace([np.inf, -np.inf], np.nan).dropna(subset=['loans_per_pop'])
_ = _.sort_values('loans_per_pop', ascending=False).iloc[:100] # don't exceed notebooks IOPub data rate
#_ = _[_.loan_val_per_pop>3]
folium_map(_,'loans_per_pop', zoom_start=10, fill_color='YlGn', fill_opacity=.7)

Focus on Minority Areas with Radius Searches

Next, we’ll locate an area with a high portion of minorities and focus on the nearby area. By sorting on the non_min_r ( Non-minority ratio ) and selecting the bottom 20 values, we can union the tracts, find the center, and then use that as the center for a minority cluster.

Using Geopandas, we can use plot the bottom 20 tracts to get a quick view of the area.

# Find the top 30 highest minority areas and plot as a seperate set. They are fairly 
# geographically clustered
_ = df.sort_values('non_min_r').iloc[:20]
_.plot()

[-117.150228   32.682374 -117.010391   32.753434]

png

Next, compute the distance from the minority center to the centroid of all other tracts. We’ll use this value to constrain the radius search. The resulting distances are in meters.

# Get the center point of the high-minority areas. The unary_union is required
# because the subsequence operations work on individual shapes, so we have to join the shapes first. 
minority_center = df.sort_values('non_min_r').iloc[:20].unary_union.convex_hull.centroid

#
# Then compute the distance of each other the remaining tracts to 
# the high minority centroid. 
df['minr_dist_deg'] = df.geometry.distance(minority_center)

import pyproj
from functools import partial

geod = pyproj.Geod(ellps='WGS84')
def geodetic_dist(point1, region2):
    """Returns the distance between a point and the centroid of a region in meters"""
    point2 = region2.centroid
    angle1,angle2,distance = geod.inv(point1.x, point1.y, point2.x, point2.y)
    return distance

# Note that this distance will tend to exclude large tracts, because the centroids are relatively farther away. 
df['minr_dist_m'] =df.geometry.apply(partial(geodetic_dist, minority_center))

Then, we just need to constrain the minr_dist_m to get our cluster.

min_areas = df[ df.minr_dist_m < 7000].dropna(subset=['non_min_r'])
min_areas = min_areas.dropna(subset=['non_min_r'])

Ploting is easy, as before, but this time we’ll also add a large blue dot to marl the location of the centroid.

fig, ax = aspect_fig_size(min_areas, 15)

min_areas.plot(ax=ax, column='non_min_r', cmap='RdYlGn',scheme='fisher_jenks', legend=True)


# Plot the minority centroid point, with a bit of a buffer to make it larger
import descartes 
circle = minority_center.buffer(.002)
ax.add_patch(descartes.PolygonPatch(circle, fc='blue', alpha=0.5));

png

And, of course, it is easier to visualize where this cluster is when it is laid over a street map.

folium_map(min_areas,'non_min_r')