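GeoEnrichment uses the concept of a study area to define the location of the point or area that you want to enrich with additional information or create reports about. The accepted forms of study areas are:
* Street address locations
* Point, line and polygon geometries
* Buffered study areas
* Named statistical areas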
Before we look at examples of study areas, let's understand the concept of data collections and analysis variables.
GeoEnrichment uses the concept of a data collection to define the data attributes (analysis variables) returned by the enrichment service. A data collection is a preassembled list of attributes that will be used to enrich the input features. Collection attributes can describe various types of information, such as demographic characteristics and geographic context of the locations or areas submitted as input features. Data collections are introduced here and covered in detail in the next guide.
The `Country` class can be used to discover the data collections, sub-geographies and available reports for a country. When working with a particular country, you will find it convenient to get a reference to it using the `Country.get()` method.
The `data_collections` property of a `Country` object lists the available data collections, together with the analysis variables in each collection, as a Pandas DataFrame.
Once we know the data collection we would like to use, we can look at all the unique `analysisVariable` values available in that data collection.
# Import Libraries
from arcgis.gis import GIS
from arcgis.geoenrichment import Country, enrich, BufferStudyArea
# Create a GIS Connection
gis = GIS(profile='your_online_profile')
# Get US as a country
usa = Country.get('US')
type(usa)
arcgis.geoenrichment.enrichment.Country
df = usa.data_collections
# print a few rows of the DataFrame
df.head()
dataCollectionID | analysisVariable | alias | fieldCategory | vintage |
---|---|---|---|---|
1yearincrements | 1yearincrements.AGE0_CY | 2020 Population Age <1 | 2020 Age: 1 Year Increments (Esri) | 2020 |
1yearincrements | 1yearincrements.AGE1_CY | 2020 Population Age 1 | 2020 Age: 1 Year Increments (Esri) | 2020 |
1yearincrements | 1yearincrements.AGE2_CY | 2020 Population Age 2 | 2020 Age: 1 Year Increments (Esri) | 2020 |
1yearincrements | 1yearincrements.AGE3_CY | 2020 Population Age 3 | 2020 Age: 1 Year Increments (Esri) | 2020 |
1yearincrements | 1yearincrements.AGE4_CY | 2020 Population Age 4 | 2020 Age: 1 Year Increments (Esri) | 2020 |
# call the shape property to get the total number of rows and columns
df.shape
(17608, 4)
Each data collection can have multiple analysis variables, as seen in the table above. Every analysis variable has a unique ID, found in the `analysisVariable` column. When calling the `enrich()` function, pass data collection IDs in the `data_collections` parameter and individual analysis variable IDs in the `analysis_variables` parameter.
You can filter the `data_collections` DataFrame and query a collection's analysis variables using Pandas expressions.
# get all the unique data collections available for the current country
df.index.unique()
Index(['1yearincrements', '5yearincrements', 'ACS_Housing_Summary_rep', 'ACS_Population_Summary_rep', 'Age', 'AgeDependency', 'Age_50_Profile_rep', 'Age_by_Sex_Profile_rep', 'Age_by_Sex_by_Race_Profile_rep', 'AtRisk', ... 'transportation', 'travelMPI', 'unitsinstructure', 'urbanizationgroupsNEW', 'vacant', 'vehiclesavailable', 'veterans', 'women', 'yearbuilt', 'yearmovedin'], dtype='object', name='dataCollectionID', length=150)
The snippet below shows how you can query the `Age` data collection and get all the unique `analysisVariable` values under that collection.
df.loc['Age']['analysisVariable'].unique()
array(['Age.MALE0', 'Age.MALE5', 'Age.MALE10', 'Age.MALE15', 'Age.MALE20', 'Age.MALE25', 'Age.MALE30', 'Age.MALE35', 'Age.MALE40', 'Age.MALE45', 'Age.MALE50', 'Age.MALE55', 'Age.MALE60', 'Age.MALE65', 'Age.MALE70', 'Age.MALE75', 'Age.MALE80', 'Age.MALE85', 'Age.FEM0', 'Age.FEM5', 'Age.FEM10', 'Age.FEM15', 'Age.FEM20', 'Age.FEM25', 'Age.FEM30', 'Age.FEM35', 'Age.FEM40', 'Age.FEM45', 'Age.FEM50', 'Age.FEM55', 'Age.FEM60', 'Age.FEM65', 'Age.FEM70', 'Age.FEM75', 'Age.FEM80', 'Age.FEM85'], dtype=object)
# View a sample of the `Age` data collection
df.loc['Age'].head()
dataCollectionID | analysisVariable | alias | fieldCategory | vintage |
---|---|---|---|---|
Age | Age.MALE0 | 2020 Males Age 0-4 | 2020 Age: 5 Year Increments (Esri) | 2020 |
Age | Age.MALE5 | 2020 Males Age 5-9 | 2020 Age: 5 Year Increments (Esri) | 2020 |
Age | Age.MALE10 | 2020 Males Age 10-14 | 2020 Age: 5 Year Increments (Esri) | 2020 |
Age | Age.MALE15 | 2020 Males Age 15-19 | 2020 Age: 5 Year Increments (Esri) | 2020 |
Age | Age.MALE20 | 2020 Males Age 20-24 | 2020 Age: 5 Year Increments (Esri) | 2020 |
Now, let's look at some examples of enriching each of the study areas.
Street address locations can be passed as strings of input street addresses, points of interest or place names. A street address can be passed as a single line or as a multiple field input. If a point (e.g. a street address) is used as a study area, the service will create a 1 mile ring buffer around the point to collect and append enrichment data.
The example below uses a street address as a study area for enrichment using the `Age` data collection.
# Enriching a single address as single line input
single_address = enrich(study_areas=["380 New York St Redlands CA 92373"],
data_collections=['Age'])
single_address
ID | OBJECTID | sourceCountry | X | Y | areaType | bufferUnits | bufferUnitsAlias | bufferRadii | aggregationMethod | ... | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US | -117.194872 | 34.057237 | RingBuffer | esriMiles | Miles | 1 | BlockApportionment:US.BlockGroups | ... | 376 | 398 | 374 | 340 | 310 | 262 | 153 | 98 | 129 | {"rings": [[[-117.19487199429183, 34.071745616... |
1 rows × 50 columns
The returned spatial dataframe can be visualized on a map as shown below:
# Plot on a map
address_map = gis.map('Redlands, CA',13)
address_map
A buffer of 1 mile is created by default, as seen on this map, for any address.
single_address.spatial.plot(address_map)
True
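As a rough sanity check on the default buffer size, the distance between the geocoded point and the first ring vertex printed in the `SHAPE` column of the result can be estimated with the haversine formula. This is only a sketch: the vertex coordinates are the truncated values shown in the table, the Earth is treated as a sphere, and `haversine_miles` is a hypothetical helper that is not part of `arcgis`.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)
KM_PER_MILE = 1.609344

def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    km = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    return km / KM_PER_MILE

# X/Y of the geocoded address and the first ring vertex (truncated in the output)
center = (-117.194872, 34.057237)
ring_vertex = (-117.19487199429183, 34.071745616)

distance = haversine_miles(*center, *ring_vertex)
print(round(distance, 2))
```

Under these assumptions the result should come out very close to one mile, which is consistent with the `bufferRadii` and `bufferUnits` columns of the result.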
Multiple addresses as single line input
# Enriching multiple addresses as single line input
enrich(study_areas=[{"address":{"text":"12 Concorde Place Toronto ON M3C 3R8","sourceCountry":"Canada"}},
{"address":{"text":"380 New York St Redlands CA 92373","sourceCountry":"US"}}],
data_collections=['Age'])
ID | OBJECTID | sourceCountry | X | Y | areaType | bufferUnits | bufferUnitsAlias | bufferRadii | aggregationMethod | ... | ECYPFA4549 | ECYPFA5054 | ECYPFA5559 | ECYPFA6064 | ECYPFA6569 | ECYPFA7074 | ECYPFA7579 | ECYPFA8084 | ECYPFA85P | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | CA | -79.328740 | 43.729720 | RingBuffer | esriMiles | Miles | 1 | BlockApportionment:CAN.DA | ... | 1351.0 | 1264.0 | 1323.0 | 1138.0 | 1156.0 | 973.0 | 784.0 | 576.0 | 970.0 | {"rings": [[[-79.3287400246266, 43.74420464321... |
1 | 1 | 2 | US | -117.194872 | 34.057237 | RingBuffer | esriMiles | Miles | 1 | BlockApportionment:US.BlockGroups | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | {"rings": [[[-117.19487199429183, 34.071745616... |
2 rows × 50 columns
enrich(study_areas=[{"address":{"Address":"380 New York Street",
"City":"Redlands", "Region":"CA", "Postal":92373}}],
data_collections=['Age'])
ID | OBJECTID | sourceCountry | X | Y | areaType | bufferUnits | bufferUnitsAlias | bufferRadii | aggregationMethod | ... | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US | -117.194872 | 34.057237 | RingBuffer | esriMiles | Miles | 1 | BlockApportionment:US.BlockGroups | ... | 376 | 398 | 374 | 340 | 310 | 262 | 153 | 98 | 129 | {"rings": [[[-117.19487199429183, 34.071745616... |
1 rows × 50 columns
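The address forms used so far (a plain string, a single-line dictionary with `sourceCountry`, and a multiple-field dictionary) are ordinary Python values, so they can be assembled programmatically before calling `enrich()`. The sketch below only builds the dictionaries and does not call the service; the helper names `single_line_address` and `multi_field_address` are hypothetical, while the keys are the ones that appear in the examples above.

```python
def single_line_address(text, source_country=None):
    """Build a single-line address study area, optionally tagged with a country."""
    address = {"text": text}
    if source_country is not None:
        address["sourceCountry"] = source_country
    return {"address": address}

def multi_field_address(address, city, region, postal):
    """Build a multiple-field address study area."""
    return {"address": {"Address": address, "City": city,
                        "Region": region, "Postal": postal}}

study_areas = [
    single_line_address("12 Concorde Place Toronto ON M3C 3R8", "Canada"),
    single_line_address("380 New York St Redlands CA 92373", "US"),
    multi_field_address("380 New York Street", "Redlands", "CA", 92373),
]

for area in study_areas:
    print(area)
```

Each dictionary has the same shape as the ones passed to `study_areas` in the examples above.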
Enriching with specific analysis variables for age, such as `Age.FEM45`, `Age.FEM55` and `Age.FEM65`.
enrich(study_areas=["380 New York St Redlands CA 92373"],
analysis_variables=["Age.FEM45","Age.FEM55","Age.FEM65"])
ID | OBJECTID | sourceCountry | X | Y | areaType | bufferUnits | bufferUnitsAlias | bufferRadii | aggregationMethod | populationToPolygonSizeRating | apportionmentConfidence | HasData | FEM45 | FEM55 | FEM65 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US | -117.194872 | 34.057237 | RingBuffer | esriMiles | Miles | 1 | BlockApportionment:US.BlockGroups | 2.191 | 2.576 | 1 | 376 | 374 | 310 | {"rings": [[[-117.19487199429183, 34.071745616... |
Point geometries can be passed as x and y coordinates to the `study_areas` parameter. When points are specified as study areas, the service will analyze map areas surrounding or associated with the input point locations. Unless otherwise specified, the service will analyze a one-mile ring around a point. The same default applies to a line, which is buffered by one mile. Locations can also be given as polygon geometries.
from arcgis.geometry import Point
pt = Point({"x" : -117.1956, "y" : 34.0572, "spatialReference" : {"wkid" : 4326}})
enrich(study_areas=[pt], data_collections=['Age'])
ID | OBJECTID | sourceCountry | areaType | bufferUnits | bufferUnitsAlias | bufferRadii | aggregationMethod | populationToPolygonSizeRating | apportionmentConfidence | ... | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US | RingBuffer | esriMiles | Miles | 1 | BlockApportionment:US.BlockGroups | 2.191 | 2.576 | ... | 364 | 388 | 361 | 329 | 300 | 253 | 147 | 92 | 122 | {"rings": [[[-117.19559999999998, 34.071708616... |
1 rows × 48 columns
pt1 = Point({"x" : -122.435, "y" : 37.785, "spatialReference" : {"wkid" : 4326}})
pt2 = Point({"x" : -122.433, "y" : 37.734, "spatialReference" : {"wkid" : 4326}})
enrich(study_areas=[pt1, pt2], data_collections=['Age'])
ID | OBJECTID | sourceCountry | areaType | bufferUnits | bufferUnitsAlias | bufferRadii | aggregationMethod | populationToPolygonSizeRating | apportionmentConfidence | ... | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US | RingBuffer | esriMiles | Miles | 1 | BlockApportionment:US.BlockGroups | 2.191 | 2.576 | ... | 2994 | 2581 | 2615 | 2773 | 2602 | 2394 | 1926 | 1564 | 2351 | {"rings": [[[-122.43499999999999, 37.799499596... |
1 | 1 | 2 | US | RingBuffer | esriMiles | Miles | 1 | BlockApportionment:US.BlockGroups | 2.191 | 2.576 | ... | 2444 | 2373 | 2378 | 2164 | 2004 | 1660 | 1097 | 798 | 1040 | {"rings": [[[-122.43299999999999, 37.748499722... |
2 rows × 48 columns
from arcgis.geometry import Polyline
line = Polyline({"paths":[[[-13048580,4036370],[-13046151,4036366]]],
"spatialReference":{"wkid":102100}})
enriched_line_df = enrich(study_areas=[line], data_collections=['Age'])
enriched_line_df
ID | OBJECTID | sourceCountry | areaType | bufferUnits | bufferUnitsAlias | bufferRadii | aggregationMethod | populationToPolygonSizeRating | apportionmentConfidence | ... | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US | RingBuffer | esriMiles | Miles | 1 | BlockApportionment:US.BlockGroups | 2.191 | 2.576 | ... | 585 | 585 | 528 | 498 | 443 | 389 | 227 | 151 | 228 | {"rings": [[[-117.21736177272676, 34.070851408... |
1 rows × 48 columns
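The polyline above is defined in Web Mercator (wkid 102100) rather than in longitude and latitude, which makes it hard to tell where it lies. A minimal sketch of the standard spherical Mercator inverse is shown below; `web_mercator_to_lonlat` is a hypothetical helper written for illustration and does not use any `arcgis` function.

```python
import math

R = 6378137.0  # sphere radius used by Web Mercator, in meters

def web_mercator_to_lonlat(x, y):
    """Convert Web Mercator x/y (meters) to longitude/latitude (degrees)."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat

# The two vertices of the polyline path used above
path = [[-13048580, 4036370], [-13046151, 4036366]]
lonlat_path = [web_mercator_to_lonlat(x, y) for x, y in path]

for lon, lat in lonlat_path:
    print(f"{lon:.4f}, {lat:.4f}")
```

Both endpoints should land around longitude -117.2 and latitude 34.06, in the same Redlands area as the point examples.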
The returned spatial dataframe can be visualized on a map as shown below:
# Plot on a map
line_map = gis.map('Redlands, CA',13)
line_map
We can clearly see the line and a 1-mile buffer around the line in this map.
# Draw line
line_map.draw(line)
# Plot enriched area around line
enriched_line_df.spatial.plot(line_map)
True
from arcgis.geometry import Polygon
poly = Polygon({"rings":[[[-117.185412,34.063170],[-122.81,37.81],
[-117.200570,34.057196],[-117.185412,34.063170]]],
"spatialReference":{"wkid":4326}})
enrich(study_areas=[poly], data_collections=['Age'])
ID | OBJECTID | sourceCountry | aggregationMethod | populationToPolygonSizeRating | apportionmentConfidence | HasData | MALE0 | MALE5 | MALE10 | ... | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US | BlockApportionment:US.BlockGroups | 2.191 | 2.576 | 1 | 5532 | 5473 | 5286 | ... | 3865 | 3905 | 3896 | 3528 | 2710 | 1997 | 1303 | 844 | 943 | {"rings": [[[-117.20057, 34.057196], [-122.809... |
1 rows × 44 columns
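Polygon rings in Esri JSON are expected to be closed, meaning that the first and last vertices coincide, as they do in the ring above. When rings are assembled by hand, a small check can catch an open ring before it is sent to the service. The helpers `is_closed_ring` and `close_ring` below are hypothetical and shown only as a sketch.

```python
def is_closed_ring(ring):
    """Return True if the ring has at least four vertices and ends where it starts."""
    return len(ring) >= 4 and list(ring[0]) == list(ring[-1])

def close_ring(ring):
    """Return a copy of the ring, appending the first vertex if the ring is open."""
    closed = [list(pt) for pt in ring]
    if closed and closed[0] != closed[-1]:
        closed.append(list(closed[0]))
    return closed

# The ring used in the polygon example above
ring = [[-117.185412, 34.063170], [-122.81, 37.81],
        [-117.200570, 34.057196], [-117.185412, 34.063170]]
print(is_closed_ring(ring))                    # True

open_ring = ring[:-1]                          # drop the closing vertex
print(is_closed_ring(open_ring))               # False
print(is_closed_ring(close_ring(open_ring)))   # True
```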
`BufferStudyArea` instances are used to change the ring buffer size or create drive-time service areas around points specified using one of the above methods. `BufferStudyArea` allows you to buffer point and street address study areas. They can be created using the following parameters:
* area: the point geometry or street address (string) study area to be buffered
* radii: list of distances by which to buffer the study area, e.g. [1, 2, 3]
* units: distance unit, e.g. Miles, Kilometers, Minutes (when using drive times/travel_mode)
* overlap: boolean, uses overlapping rings/network service areas when True, or non-overlapping bands between consecutive radii when False (reported as `RingBufferBands` in the output)
* travel_mode: None or string, one of the supported travel modes when using network service areas
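To make the effect of `overlap` more concrete, the sketch below compares the area covered by each buffer for radii `[1, 3, 5]` under both settings, using plain circle geometry. It assumes that `overlap=False` produces bands between consecutive radii; `buffer_areas` is a hypothetical helper and nothing here calls the enrichment service.

```python
import math

def buffer_areas(radii, overlap):
    """Area of each buffer: full disks if overlap is True, else bands between radii."""
    areas = []
    previous = 0.0
    for r in sorted(radii):
        outer = math.pi * r ** 2
        if overlap:
            areas.append(outer)
        else:
            # subtract the disk already covered by the previous radius
            areas.append(outer - math.pi * previous ** 2)
        previous = r
    return areas

rings = buffer_areas([1, 3, 5], overlap=True)
bands = buffer_areas([1, 3, 5], overlap=False)

print([round(a, 1) for a in rings])  # [3.1, 28.3, 78.5]
print([round(a, 1) for a in bands])  # [3.1, 25.1, 50.3]
```

The bands add up to the area of the largest ring, so with `overlap=False` each location is counted in exactly one study area, whereas with `overlap=True` the inner areas are counted again in every larger ring.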
`BufferStudyArea` also allows you to define drive time service areas around points as well as other advanced service areas such as walking and trucking.
The example below creates non-overlapping ring bands at radii of 1, 3 and 5 miles around a street address and enriches these using the 'Age' data collection.
buffered = BufferStudyArea(area='380 New York St Redlands CA 92373',
radii=[1,3,5], units='Miles', overlap=False)
drive_dist_df = enrich(study_areas=[buffered], data_collections=['Age'])
drive_dist_df
ID | OBJECTID | sourceCountry | X | Y | areaType | bufferUnits | bufferUnitsAlias | bufferRadii | aggregationMethod | ... | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US | -117.194872 | 34.057237 | RingBufferBands | Miles | Miles | 1 | BlockApportionment:US.BlockGroups | ... | 376 | 398 | 374 | 340 | 310 | 262 | 153 | 98 | 129 | {"rings": [[[-117.19487199429183, 34.071745616... |
1 | 0 | 2 | US | -117.194872 | 34.057237 | RingBufferBands | Miles | Miles | 3 | BlockApportionment:US.BlockGroups | ... | 1935 | 1936 | 1999 | 2073 | 1789 | 1430 | 986 | 719 | 947 | {"rings": [[[-117.19487199429183, 34.100762745... |
2 | 0 | 3 | US | -117.194872 | 34.057237 | RingBufferBands | Miles | Miles | 5 | BlockApportionment:US.BlockGroups | ... | 2375 | 2493 | 2601 | 2478 | 2008 | 1500 | 1062 | 780 | 1018 | {"rings": [[[-117.19487199429183, 34.129779737... |
3 rows × 50 columns
The returned spatial dataframe can be visualized on a map as shown below:
# Plot on a map
buffer_map1 = gis.map('Redlands, CA')
buffer_map1.basemap = 'dark-gray-vector'
buffer_map1
drive_dist_df.spatial.plot(map_widget=buffer_map1,
renderer_type='c', # for class breaks renderer
method='esriClassifyNaturalBreaks', # classification algorithm
class_count=4, # choose the number of classes
col='bufferRadii', # numeric column to classify
cmap='viridis', # color map to pick colors from for each class
alpha=0.7 # specify opacity
)
True
The example below creates 5 and 10 minute drive times from a street address and enriches these using the 'Age' data collection.
buffered = BufferStudyArea(area='380 New York St Redlands CA 92373',
radii=[5, 10], units='Minutes',
travel_mode='Driving')
drive_time_df = enrich(study_areas=[buffered], data_collections=['Age'])
drive_time_df
ID | OBJECTID | sourceCountry | X | Y | areaType | bufferUnits | bufferUnitsAlias | bufferRadii | aggregationMethod | ... | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US | -117.194872 | 34.057237 | NetworkServiceArea | Minutes | Drive Time Minutes | 5 | BlockApportionment:US.BlockGroups | ... | 658 | 661 | 657 | 617 | 567 | 463 | 300 | 212 | 280 | {"rings": [[[-117.19120531126502, 34.080999174... |
1 | 0 | 2 | US | -117.194872 | 34.057237 | NetworkServiceArea | Minutes | Drive Time Minutes | 10 | BlockApportionment:US.BlockGroups | ... | 3208 | 3232 | 3352 | 3355 | 2874 | 2259 | 1545 | 1145 | 1651 | {"rings": [[[-117.19165446621216, 34.143207222... |
2 rows × 50 columns
The returned spatial dataframe can be visualized on a map as shown below:
# Plot on a map
buffer_map2 = gis.map('Redlands, CA')
buffer_map2.basemap = 'dark-gray-vector'
buffer_map2
drive_time_df.spatial.plot(map_widget=buffer_map2,
renderer_type='c', # for class breaks renderer
method='esriClassifyNaturalBreaks', # classification algorithm
class_count=3, # choose the number of classes
col='bufferRadii', # numeric column to classify
cmap='viridis', # color map to pick colors from for each class
alpha=0.7 # specify opacity
)
True
In all previous examples of different study area types, locations were defined as either points or polygons. Study area locations can also be passed as one or many named statistical areas. This form of study area lets you define an area as a standard geographic statistical feature, such as a census or postal area, for example, to obtain enrichment information for a U.S. state, county, or ZIP Code or a Canadian province or postal code. We will explore Named statistical areas in detail in the next section.
Enriching zip code 92373 in California using the 'Age' data collection:
usa = Country.get('US')
redlands = usa.subgeographies.states['California'].zip5['92373']
type(redlands)
arcgis.geoenrichment.enrichment.NamedArea
redlands
<NamedArea name:"Redlands" area_id="92373", level="US.ZIP5", country="United States">
redlands_df = enrich(study_areas=[redlands], data_collections=['Age'] )
redlands_df
ID | OBJECTID | StdGeographyLevel | StdGeographyName | StdGeographyID | sourceCountry | aggregationMethod | populationToPolygonSizeRating | apportionmentConfidence | HasData | ... | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US.ZIP5 | Redlands | 92373 | US | Query:US.ZIP5 | 2.191 | 2.576 | 1 | ... | 1024 | 1089 | 1113 | 1184 | 1101 | 970 | 662 | 475 | 701 | {"rings": [[[-117.13603999963594, 34.032169999... |
1 rows × 47 columns
The returned spatial dataframe can be visualized on a map as shown below:
zip_map = gis.map('Redlands, CA')
zip_map
redlands_df.spatial.plot(zip_map)
True
ca_counties = usa.subgeographies.states['California'].counties
counties_df = enrich(study_areas=ca_counties, data_collections=['Age'])
counties_df.head()
ID | OBJECTID | StdGeographyLevel | StdGeographyName | StdGeographyID | sourceCountry | aggregationMethod | populationToPolygonSizeRating | apportionmentConfidence | HasData | ... | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US.Counties | Alameda County | 06001 | US | Query:US.Counties | 2.191 | 2.576 | 1 | ... | 55457 | 54029 | 55299 | 50442 | 42912 | 34127 | 22548 | 15100 | 18391 | {"rings": [[[-122.2716789998329, 37.9047240001... |
1 | 1 | 2 | US.Counties | Alpine County | 06003 | US | Query:US.Counties | 2.191 | 2.576 | 1 | ... | 38 | 42 | 62 | 39 | 55 | 30 | 12 | 10 | 5 | {"rings": [[[-119.90059599883206, 38.930759999... |
2 | 2 | 3 | US.Counties | Amador County | 06005 | US | Query:US.Counties | 2.191 | 2.576 | 1 | ... | 1011 | 1162 | 1591 | 1813 | 1739 | 1527 | 968 | 667 | 733 | {"rings": [[[-120.07763899965389, 38.708886999... |
3 | 3 | 4 | US.Counties | Butte County | 06007 | US | Query:US.Counties | 2.191 | 2.576 | 1 | ... | 5383 | 5817 | 6860 | 7023 | 6568 | 5293 | 3614 | 2437 | 3396 | {"rings": [[[-121.4046210002662, 40.1466409995... |
4 | 4 | 5 | US.Counties | Calaveras County | 06009 | US | Query:US.Counties | 2.191 | 2.576 | 1 | ... | 1352 | 1649 | 2219 | 2371 | 2307 | 1942 | 1206 | 727 | 688 | {"rings": [[[-120.07246000003855, 38.509156000... |
5 rows × 47 columns
county_map = gis.map('California')
county_map
counties_df.spatial.plot(map_widget=county_map,
renderer_type='c', # for class breaks renderer
method='esriClassifyNaturalBreaks', # classification algorithm
class_count=5, # choose the number of classes
col='FEM75', # numeric column to classify
cmap='viridis', # color map to pick colors from for each class
alpha=0.7 # specify opacity
)
True
county_map.legend=True
Using `comparison_levels`, the information for the study areas can also be compared with standard geography areas at other levels. For example, if the study area is a zip code, you can compare enriched information for this zip code with information for the county or the state.
Example 1
Let's look at an example of enriching a zip code (study area) and then comparing its enrichment information with information for the county to which the zip code belongs using `comparison_levels`.
fontana = usa.subgeographies.states['California'].zip5['92336']
testdf1 = enrich(study_areas=[fontana], data_collections=['Age'],
comparison_levels=['US.Counties'])
testdf1.head()
ID | OBJECTID | StdGeographyLevel | StdGeographyName | StdGeographyID | sourceCountry | aggregationMethod | populationToPolygonSizeRating | apportionmentConfidence | HasData | ... | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US.ZIP5 | Fontana | 92336 | US | Query:US.ZIP5 | 2.191 | 2.576 | 1 | ... | 3458 | 3134 | 2975 | 2353 | 1777 | 1176 | 706 | 389 | 366 | {"rings": [[[-117.42984999972606, 34.187269999... |
1 | 0 | 2 | US.Counties | San Bernardino County | 06071 | US | Query:US.Counties | 2.191 | 2.576 | 1 | ... | 65463 | 64654 | 66066 | 60308 | 49629 | 36487 | 23911 | 15572 | 16245 | None |
2 rows × 47 columns
The first row in the table above shows data for the requested zip code and the second row has the data it was compared against, at the `US.Counties` level. By using the county as the comparison level, we can compare the enriched study area (zip code) with the county that contains it.
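With the zip code and its county in the same result, simple ratios can be computed directly. The sketch below copies two `FEM85` values from the table above into plain dictionaries instead of calling `enrich()`, and computes the zip code's share of the county total; the variable names are illustrative only.

```python
# Values copied from the Fontana / San Bernardino County result above
rows = [
    {"StdGeographyLevel": "US.ZIP5", "StdGeographyName": "Fontana", "FEM85": 366},
    {"StdGeographyLevel": "US.Counties", "StdGeographyName": "San Bernardino County", "FEM85": 16245},
]

# Index the rows by geography level so each level can be looked up by name
by_level = {row["StdGeographyLevel"]: row for row in rows}
zip_row = by_level["US.ZIP5"]
county_row = by_level["US.Counties"]

share = zip_row["FEM85"] / county_row["FEM85"]
print(f"{zip_row['StdGeographyName']}: {share:.1%} of FEM85 in {county_row['StdGeographyName']}")
```

The same lookup works on the returned DataFrame by filtering on the `StdGeographyLevel` column.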
Example 2
Let's look at another example. Here, the 92373 zip code in Redlands intersects with both Riverside and San Bernardino counties in California. Hence, when using `comparison_levels`, both these counties are returned along with the results for the named zip code. We can also add the state level to the list of `comparison_levels` to output results for counties as well as states.
redlands = usa.subgeographies.states['California'].zip5['92373']
testdf2 = enrich(study_areas=[redlands], data_collections=['Age'],
comparison_levels=['US.Counties', 'US.States'])
testdf2.columns
Index(['ID', 'OBJECTID', 'StdGeographyLevel', 'StdGeographyName', 'StdGeographyID', 'sourceCountry', 'aggregationMethod', 'populationToPolygonSizeRating', 'apportionmentConfidence', 'HasData', 'MALE0', 'MALE5', 'MALE10', 'MALE15', 'MALE20', 'MALE25', 'MALE30', 'MALE35', 'MALE40', 'MALE45', 'MALE50', 'MALE55', 'MALE60', 'MALE65', 'MALE70', 'MALE75', 'MALE80', 'MALE85', 'FEM0', 'FEM5', 'FEM10', 'FEM15', 'FEM20', 'FEM25', 'FEM30', 'FEM35', 'FEM40', 'FEM45', 'FEM50', 'FEM55', 'FEM60', 'FEM65', 'FEM70', 'FEM75', 'FEM80', 'FEM85', 'SHAPE'], dtype='object')
testdf2.head()
ID | OBJECTID | StdGeographyLevel | StdGeographyName | StdGeographyID | sourceCountry | aggregationMethod | populationToPolygonSizeRating | apportionmentConfidence | HasData | ... | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US.ZIP5 | Redlands | 92373 | US | Query:US.ZIP5 | 2.191 | 2.576 | 1 | ... | 1024 | 1089 | 1113 | 1184 | 1101 | 970 | 662 | 475 | 701 | {"rings": [[[-117.13603999963594, 34.032169999... |
1 | 0 | 2 | US.Counties | Riverside County | 06065 | US | Query:US.Counties | 2.191 | 2.576 | 1 | ... | 71840 | 71254 | 73314 | 68647 | 60386 | 49449 | 35062 | 23797 | 25463 | None |
2 | 0 | 3 | US.Counties | San Bernardino County | 06071 | US | Query:US.Counties | 2.191 | 2.576 | 1 | ... | 65463 | 64654 | 66066 | 60308 | 49629 | 36487 | 23911 | 15572 | 16245 | None |
3 | 0 | 4 | US.States | California | 06 | US | Query:US.States | 2.191 | 2.576 | 1 | ... | 1222805 | 1227409 | 1264110 | 1182621 | 1012441 | 803457 | 549654 | 374499 | 447146 | None |
4 rows × 47 columns
testdf2.iloc[:, -20:]
MALE85 | FEM0 | FEM5 | FEM10 | FEM15 | FEM20 | FEM25 | FEM30 | FEM35 | FEM40 | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 380 | 877 | 859 | 940 | 902 | 1187 | 1190 | 1150 | 1273 | 1041 | 1024 | 1089 | 1113 | 1184 | 1101 | 970 | 662 | 475 | 701 | {"rings": [[[-117.13603999963594, 34.032169999... |
1 | 17664 | 85123 | 86025 | 83957 | 80112 | 80542 | 98617 | 91315 | 79758 | 71974 | 71840 | 71254 | 73314 | 68647 | 60386 | 49449 | 35062 | 23797 | 25463 | None |
2 | 9848 | 79131 | 78595 | 76288 | 71381 | 78383 | 95432 | 87077 | 73658 | 65402 | 65463 | 64654 | 66066 | 60308 | 49629 | 36487 | 23911 | 15572 | 16245 | None |
3 | 274381 | 1221927 | 1234947 | 1248664 | 1240920 | 1337702 | 1543578 | 1464688 | 1349773 | 1210247 | 1222805 | 1227409 | 1264110 | 1182621 | 1012441 | 803457 | 549654 | 374499 | 447146 | None |
To understand how the data differs between the zip code, the two counties and the state, let's plot the male to female ratio for ages 80-84 (the `MALE80` and `FEM80` variables).
# Create a dataframe with new Male_Female_Ratio column
bar_df = testdf2.loc[:,['StdGeographyName','FEM80','MALE80']].copy()  # copy so the new column is added to an independent DataFrame
bar_df['Male_Female_Ratio'] = bar_df['MALE80'] / bar_df['FEM80']
bar_df
StdGeographyName | FEM80 | MALE80 | Male_Female_Ratio | |
---|---|---|---|---|
0 | Redlands | 475 | 345 | 0.726316 |
1 | Riverside County | 23797 | 19288 | 0.810522 |
2 | San Bernardino County | 15572 | 11698 | 0.751220 |
3 | California | 374499 | 283158 | 0.756098 |
# Plot the Male_Female_Ratio
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize = (8,5))
plt.bar(x = 'StdGeographyName', height = 'Male_Female_Ratio', data = bar_df)
plt.title('Male to Female Ratio Comparison');
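From the above plot, we can see that the male-to-female ratio for this age group differs only slightly across the geographies: it ranges from about 0.73 in Redlands (ZIP Code 92373) to about 0.81 in Riverside County, with San Bernardino County and California at about 0.75 to 0.76.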
Enrichment works not only on clearly defined geometries such as county or state boundaries but also on arbitrary geometries (a polygon drawn freely on a map, an area covering parts of different counties, etc.). Let's look at an example of how an arbitrary geometry can be enriched with `enrich()`.
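In this example, we will create a map of Los Angeles, draw a polygon on it (or fall back to a default polygon), enrich that polygon and plot the enriched result.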
la_map = gis.map('Los Angeles, CA')
la_map
Here, we will define a callback function that enables user input. If no input is provided, a default polygon geometry will be enriched.
# Define the callback function.
drawn_polygon = None
def draw_poly(la_map, g):
    # Store the geometry drawn by the user in the module-level variable
    global drawn_polygon
    drawn_polygon = g
# Register draw_poly as the callback to be invoked when a polygon is drawn on the map
la_map.on_draw_end(draw_poly)
Now, run the cell below and then draw a polygon on `la_map`; finish drawing by double-clicking the mouse pointer. If no polygon is drawn within 30 seconds, a default polygon geometry will be used for enrichment.
import time
# Draw polygon
la_map.draw("polygon")
# Sleep for 30 seconds
time.sleep(30)
# Use this as the default polygon only if no polygon was drawn on the map
if drawn_polygon is None:
    drawn_polygon = {'spatialReference': {'latestWkid': 3857, 'wkid': 102100},
                     'rings': [[[-13176442.352731517, 4035051.715228523],
                                [-13167152.267973447, 4032788.462594141],
                                [-13169738.58648519, 4023675.1384639805],
                                [-13178995.82720767, 4028428.5661604665],
                                [-13176442.352731517, 4035051.715228523]]]}
# Check drawn polygon
drawn_polygon
{'spatialReference': {'latestWkid': 3857, 'wkid': 102100}, 'rings': [[[-13181795.930532897, 4037434.4007196007], [-13169038.11278067, 4034871.3716149125], [-13172126.646454819, 4026770.8381095314], [-13180793.88886522, 4029142.1774792233], [-13187354.932847794, 4034394.834516697], [-13181795.930532897, 4037434.4007196007]]]}
from arcgis.geometry import Polygon
poly = Polygon(drawn_polygon)
enriched_line_df2 = enrich(study_areas=[poly],
analysis_variables=["Age.FEM45","Age.FEM55","Age.FEM65"])
enriched_line_df2
ID | OBJECTID | sourceCountry | aggregationMethod | populationToPolygonSizeRating | apportionmentConfidence | HasData | FEM45 | FEM55 | FEM65 | SHAPE | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US | BlockApportionment:US.BlockGroups | 2.191 | 2.576 | 1 | 11964 | 12174 | 9685 | {"rings": [[[-118.41408756542211, 34.064264163... |
# Plot on a map
poly_map2 = gis.map('Los Angeles, CA')
poly_map2
We can clearly see the enriched geometry on this map. Clicking on the geometry will display enriched features.
# Plot the enriched polygon
enriched_line_df2.spatial.plot(poly_map2)
True
In this part of the `arcgis.geoenrichment` module guide series, you were introduced to the concept of study areas and how GeoEnrichment uses a study area to define the location of the point, polyline or area that you want to enrich. You have also seen in detail how different types of study areas can be enriched and visualized on a map.
In the subsequent pages, you will learn about: