Part 6 - Standard Geography Queries¶

Earlier in this GeoEnrichment series, you learned that a study area defines the point or area that you want to enrich with additional information. This part introduces another form of study area, the Standard Geography Area, which lets you define an area by the ID of a standard geographic statistical feature, such as a census or postal area. For example, you can use it to obtain enrichment information for a U.S. state, county, or ZIP Code, or for a Canadian province or postal code. The most common workflow for this service is to find the standard geography ID (for U.S. areas, a FIPS code) for a geographic name.

The standard_geography_query method lets you query for standard geography IDs and features at the supported geographic levels. You can then use the results to obtain facts about the location with the enrich method or to create reports with create_report.

Using Standard Geography Query¶

Let's look at an example that finds the standard geography ID for every Orange County in the U.S. We will then use one of these IDs to enrich() the area with information from the Age data collection.

We will use US as the source country and specify US.Counties as the standard geographic layer to query, since we are looking for Orange counties across the U.S. We will pass orange as the text for the service to search for.

In [4]:

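# Assumes an active GIS connection (`gis`) from the earlier parts of this series
from arcgis.geoenrichment import standard_geography_query, enrich, service_limits

# Find FIPS codes for all Orange counties in the US
orange = standard_geography_query(source_country='US', layers='US.Counties', geoquery='orange')
orange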

Out[4]:

	DatasetID	DataLayerID	AreaID	AreaName	MajorSubdivisionName	MajorSubdivisionAbbr	MajorSubdivisionType	CountryAbbr	Score	ObjectId
0	USA_ESRI_2020	US.Counties	06059	Orange County	California	CA	State	US	100	1
1	USA_ESRI_2020	US.Counties	12095	Orange County	Florida	FL	State	US	100	2
2	USA_ESRI_2020	US.Counties	18117	Orange County	Indiana	IN	State	US	100	3
3	USA_ESRI_2020	US.Counties	36071	Orange County	New York	NY	State	US	100	4
4	USA_ESRI_2020	US.Counties	37135	Orange County	North Carolina	NC	State	US	100	5
5	USA_ESRI_2020	US.Counties	48361	Orange County	Texas	TX	State	US	100	6
6	USA_ESRI_2020	US.Counties	50017	Orange County	Vermont	VT	State	US	100	7
7	USA_ESRI_2020	US.Counties	51137	Orange County	Virginia	VA	State	US	100	8

The resulting DataFrame includes DatasetID and DataLayerID, the IDs of the dataset and layer being queried. AreaID is the unique ID of each area in the results. AreaName is Orange County in every row, because we searched for Orange counties across the U.S. MajorSubdivisionType gives the type of major subdivision (State here), while MajorSubdivisionName and MajorSubdivisionAbbr give the state's name and abbreviation.

Enrich using results from Standard Geography Query¶

The standard_geography_query method returns the Orange counties of the different states, with the state name in the MajorSubdivisionName field. Now, let's enrich() Orange County in California using its AreaID, 06059.

In [5]:

or_ca = {"sourceCountry":"US","layer":"US.Counties","ids":["06059"]}
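
The study area above was typed by hand. As a minimal sketch, the same dictionary could also be derived from query results. The rows below are copied from the query output shown earlier (a subset of rows and columns), and `make_study_area` is a hypothetical helper, not part of arcgis.geoenrichment. With a live result, rows in this shape could presumably be obtained with `orange.to_dict('records')`.

```python
# Rows copied from the standard_geography_query output above (subset of columns).
# AreaID is kept as a string so that leading zeros such as "06059" survive.
rows = [
    {"AreaID": "06059", "AreaName": "Orange County", "MajorSubdivisionAbbr": "CA"},
    {"AreaID": "12095", "AreaName": "Orange County", "MajorSubdivisionAbbr": "FL"},
    {"AreaID": "48361", "AreaName": "Orange County", "MajorSubdivisionAbbr": "TX"},
]


def make_study_area(rows, state_abbr, source_country="US", layer="US.Counties"):
    """Hypothetical helper: build a standard geography study area for one state."""
    ids = [r["AreaID"] for r in rows if r["MajorSubdivisionAbbr"] == state_abbr]
    if not ids:
        raise ValueError(f"no areas found for {state_abbr!r}")
    return {"sourceCountry": source_country, "layer": layer, "ids": ids}


or_ca_from_rows = make_study_area(rows, "CA")
print(or_ca_from_rows)
```

The resulting dictionary has the same shape as or_ca, so it could be passed to enrich() in the same way.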

In [6]:

orange_df = enrich(study_areas=[or_ca], data_collections=['Age'])
orange_df

Out[6]:

	ID	OBJECTID	StdGeographyLevel	StdGeographyName	StdGeographyID	sourceCountry	aggregationMethod	populationToPolygonSizeRating	apportionmentConfidence	HasData	...	FEM45	FEM50	FEM55	FEM60	FEM65	FEM70	FEM75	FEM80	FEM85	SHAPE
0	0	1	US.Counties	Orange County	06059	US	Query:US.Counties	2.191	2.576	1	...	106005	107845	107235	97121	81462	65917	47365	33050	40794	{"rings": [[[-117.9157650000062, 33.9469249994...

1 rows × 47 columns

In [7]:

orange_df.columns

Out[7]:

Index(['ID', 'OBJECTID', 'StdGeographyLevel', 'StdGeographyName',
       'StdGeographyID', 'sourceCountry', 'aggregationMethod',
       'populationToPolygonSizeRating', 'apportionmentConfidence', 'HasData',
       'MALE0', 'MALE5', 'MALE10', 'MALE15', 'MALE20', 'MALE25', 'MALE30',
       'MALE35', 'MALE40', 'MALE45', 'MALE50', 'MALE55', 'MALE60', 'MALE65',
       'MALE70', 'MALE75', 'MALE80', 'MALE85', 'FEM0', 'FEM5', 'FEM10',
       'FEM15', 'FEM20', 'FEM25', 'FEM30', 'FEM35', 'FEM40', 'FEM45', 'FEM50',
       'FEM55', 'FEM60', 'FEM65', 'FEM70', 'FEM75', 'FEM80', 'FEM85', 'SHAPE'],
      dtype='object')
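
The age columns listed above follow a regular naming pattern, which makes it easy to select groups of them. The sketch below rebuilds the column names and sums a few of them, using values copied from the Orange County, CA row shown earlier. It assumes that the number in each column name is the starting age of a five-year band (with 85 as the open-ended top band), which the output itself does not state.

```python
# Age bands start at 0 and step by 5 up to 85, matching the MALE*/FEM* columns above.
age_starts = range(0, 90, 5)
male_cols = [f"MALE{a}" for a in age_starts]
fem_cols = [f"FEM{a}" for a in age_starts]

# Values copied from the Orange County, CA row in the enrich() output above.
orange_ca = {"FEM65": 81462, "FEM70": 65917, "FEM75": 47365, "FEM80": 33050, "FEM85": 40794}

# Select the female columns whose band starts at 65 or later, then sum them.
senior_cols = [c for c in fem_cols if int(c[3:]) >= 65]
fem_65_plus = sum(orange_ca[c] for c in senior_cols)

print(len(male_cols) + len(fem_cols))  # number of age columns
print(fem_65_plus)                     # females in bands starting at 65 or later
```

With the real DataFrame, the same selection could be written as `orange_df[senior_cols].sum(axis=1)`.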

Visualize on a Map¶

In [41]:

or_ca_map = gis.map('Los Angeles, CA')
or_ca_map

In [29]:

orange_df.spatial.plot(or_ca_map)

Out[29]:

True

Customizing your Query¶

The geoquery parameter specifies the search criteria used to query the standard geography layers. A query is made up of terms and operators, and multiple terms can be combined with Boolean operators such as OR and AND to form more complex queries. Learn more about using geoquery to create more complex queries here.

In [33]:

or_lake = standard_geography_query(source_country='US', layers='US.Counties', geoquery='(Orange OR Lake)')
or_lake

Out[33]:

	DatasetID	DataLayerID	AreaID	AreaName	MajorSubdivisionName	MajorSubdivisionAbbr	MajorSubdivisionType	CountryAbbr	Score	ObjectId
0	USA_ESRI_2020	US.Counties	06059	Orange County	California	CA	State	US	100	1
1	USA_ESRI_2020	US.Counties	12095	Orange County	Florida	FL	State	US	100	2
2	USA_ESRI_2020	US.Counties	18117	Orange County	Indiana	IN	State	US	100	3
3	USA_ESRI_2020	US.Counties	48361	Orange County	Texas	TX	State	US	100	4
4	USA_ESRI_2020	US.Counties	50017	Orange County	Vermont	VT	State	US	100	5
5	USA_ESRI_2020	US.Counties	51137	Orange County	Virginia	VA	State	US	100	6
6	USA_ESRI_2020	US.Counties	36071	Orange County	New York	NY	State	US	99	7
7	USA_ESRI_2020	US.Counties	37135	Orange County	North Carolina	NC	State	US	99	8
8	USA_ESRI_2020	US.Counties	06033	Lake County	California	CA	State	US	87	9
9	USA_ESRI_2020	US.Counties	08065	Lake County	Colorado	CO	State	US	87	10
10	USA_ESRI_2020	US.Counties	12069	Lake County	Florida	FL	State	US	87	11
11	USA_ESRI_2020	US.Counties	17097	Lake County	Illinois	IL	State	US	87	12
12	USA_ESRI_2020	US.Counties	18089	Lake County	Indiana	IN	State	US	87	13
13	USA_ESRI_2020	US.Counties	26085	Lake County	Michigan	MI	State	US	87	14
14	USA_ESRI_2020	US.Counties	27075	Lake County	Minnesota	MN	State	US	87	15
15	USA_ESRI_2020	US.Counties	30047	Lake County	Montana	MT	State	US	87	16
16	USA_ESRI_2020	US.Counties	39085	Lake County	Ohio	OH	State	US	87	17
17	USA_ESRI_2020	US.Counties	41037	Lake County	Oregon	OR	State	US	87	18
18	USA_ESRI_2020	US.Counties	47095	Lake County	Tennessee	TN	State	US	87	19
19	USA_ESRI_2020	US.Counties	16007	Bear Lake County	Idaho	ID	State	US	87	20
20	USA_ESRI_2020	US.Counties	27125	Red Lake County	Minnesota	MN	State	US	87	21
21	USA_ESRI_2020	US.Counties	46079	Lake County	South Dakota	SD	State	US	87	22
22	USA_ESRI_2020	US.Counties	49035	Salt Lake County	Utah	UT	State	US	87	23
23	USA_ESRI_2020	US.Counties	55047	Green Lake County	Wisconsin	WI	State	US	87	24
24	USA_ESRI_2020	US.Counties	02164	Lake and Peninsula Borough	Alaska	AK	State	US	86	25
25	USA_ESRI_2020	US.Counties	27077	Lake of the Woods County	Minnesota	MN	State	US	85	26

In [34]:

or_lake_ca = standard_geography_query(source_country='US', layers='US.Counties', geoquery='(Orange OR Lake) AND CA')
or_lake_ca

Out[34]:

	DatasetID	DataLayerID	AreaID	AreaName	MajorSubdivisionName	MajorSubdivisionAbbr	MajorSubdivisionType	CountryAbbr	Score	ObjectId
0	USA_ESRI_2020	US.Counties	06059	Orange County	California	CA	State	US	100	1
1	USA_ESRI_2020	US.Counties	06033	Lake County	California	CA	State	US	89	2
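
The query strings used in this part follow one pattern: alternatives joined with OR inside parentheses, optionally followed by AND terms. A minimal sketch of composing them is shown below. `build_geoquery` is a hypothetical helper, not part of arcgis.geoenrichment, and it covers only the two operators used in this part.

```python
def build_geoquery(any_of, all_of=()):
    """Hypothetical helper: OR together alternatives, then AND required terms."""
    if not any_of:
        raise ValueError("at least one search term is required")
    # A single term needs no parentheses; several alternatives are grouped.
    if len(any_of) == 1:
        query = any_of[0]
    else:
        query = "(" + " OR ".join(any_of) + ")"
    # Each required term narrows the result further.
    for term in all_of:
        query += f" AND {term}"
    return query


print(build_geoquery(["orange"]))
print(build_geoquery(["Orange", "Lake"]))
print(build_geoquery(["Orange", "Lake"], all_of=["CA"]))
```

The returned strings match the ones passed as geoquery in the cells above, so the helper's output could be handed to standard_geography_query directly.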

Enrich using results from Standard Geography Query¶

The standard_geography_query method returned the details of Orange and Lake counties in California. Now, let's enrich() these counties using their AreaIDs.

In [35]:

or_lk = {"sourceCountry":"US","layer":"US.Counties","ids":["06059","06033"]}

In [36]:

or_lake_df = enrich(study_areas=[or_lk], data_collections=['Age'])
or_lake_df

Out[36]:

	ID	OBJECTID	StdGeographyLevel	StdGeographyName	StdGeographyID	sourceCountry	aggregationMethod	populationToPolygonSizeRating	apportionmentConfidence	HasData	...	FEM45	FEM50	FEM55	FEM60	FEM65	FEM70	FEM75	FEM80	FEM85	SHAPE
0	0	1	US.Counties	Orange County	06059	US	Query:US.Counties	2.191	2.576	1	...	106005	107845	107235	97121	81462	65917	47365	33050	40794	{"rings": [[[-117.9157650000062, 33.9469249994...
1	0	2	US.Counties	Lake County	06033	US	Query:US.Counties	2.191	2.576	1	...	1918	2176	2680	2946	2742	2104	1382	834	926	{"rings": [[[-122.81409900076635, 39.581399999...

2 rows × 47 columns

Visualize on a Map¶

In [42]:

or_lake_map = gis.map('California, US', 6)
or_lake_map

In [40]:

or_lake_df.spatial.plot(or_lake_map)

Out[40]:

True

Data Apportionment¶

To aggregate data, the GeoEnrichment service employs a geographic retrieval methodology called Data Apportionment. This methodology determines how data is gathered and summarized, or aggregated, for input features.

For standard geographic units such as states, provinces, counties, or postal codes, the link between a designated area and its attribute data is a simple one-to-one relationship, so data retrieval is a matter of gathering the data for those areas.

For non-standard geographic units such as ring buffers, drive-time service areas, and other non-standard polygons, the geographic retrieval process is more complicated, because the input polygon may intersect several geographic areas whose data must be aggregated.

The GeoEnrichment service uses Weighted Centroid geographic retrieval to aggregate data for rings and other polygons. With this methodology, data points within an area of interest are weighted more heavily than points outside that area. When the service aggregates data, the results are statistically adjusted to more accurately reflect the actual statistics within the area of interest.

The GeoEnrichment service uses the most detailed geographies with the most recent census data, or authoritative estimates, available for commercial use from each country. For most countries, data is updated every two years, and a few countries are updated annually because data are readily available. Esri spreads the updates throughout the year on a quarterly basis. The data for each country are the most recently available estimates.

How Apportionment Works¶

The GeoEnrichment service uses a data apportionment algorithm to redistribute demographic, business, economic, and landscape variables to input polygon features. The algorithm analyzes each polygon to be enriched relative to a point dataset and a detailed dataset of reporting unit polygons that contain attributes for the selected variables. Based on how each polygon being enriched overlays these datasets, the algorithm determines the appropriate amount of each variable to assign.

Imagine you want to get total population statistics for the study area represented by the center polygon in this image.

Source: https://developers.arcgis.com/rest/geoenrichment/api-reference/data-apportionment.htm

The other four polygons represent census geographies that contain total population values. In the United States, these can be Block Groups with enrichment data; in Canada, they can be Dissemination Areas. The study area intersects four block groups, each of which lies only partially inside it. Using area P3 as an example, the population weight for this area is determined by summing the weights of the blocks within this polygon. For example, if 90 percent of the population of P3's blocks is within the study area, and the total population of P3 is 100 people, then 90 people in area P3 are inside the study area.

So, for those partially included blocks, the GeoEnrichment service uses data apportionment and the weighted centroid retrieval method to calculate the approximate statistics for those portions of block groups inside the study area. It considers all the block points within each block group touched by the study area but weights the block points inside the study area more heavily.
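
The P3 arithmetic above can be sketched in a few lines. This is a deliberately simplified illustration, not the service's algorithm: it splits a block group's total in proportion to the block-point population that falls inside the study area, and it does not reproduce the extra weighting of inside points described above. The `BlockPoint` class and the individual block populations (40, 50, 10) are made up so that 90 of P3's 100 people fall inside.

```python
from dataclasses import dataclass


@dataclass
class BlockPoint:
    """Hypothetical block point: its population and whether it is in the study area."""
    population: int
    inside_study_area: bool


def apportion(total_value, points):
    """Assign the share of total_value matching the population inside the study area."""
    total = sum(p.population for p in points)
    if total == 0:
        return 0.0
    inside = sum(p.population for p in points if p.inside_study_area)
    return total_value * inside / total


# P3 from the text: total population 100, with 90 percent inside the study area.
p3_points = [BlockPoint(40, True), BlockPoint(50, True), BlockPoint(10, False)]
p3_inside = apportion(100, p3_points)
print(p3_inside)
```

Summing the apportioned values of all four intersected block groups would give the estimate for the whole study area.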

You can learn more about Data Apportionment and how it works here.

Service Limits¶

The GeoEnrichment service imposes limits on requests in order to guarantee accuracy and performance. The limits define the maximum size of a study area, the maximum number of study areas, the number of business records in an output, the maximum drive-time polygon size, and more. If you exceed one of these limits, your query either fails or returns with a warning, in which case you get results only up to the limit.

The service_limits() method from the arcgis.geoenrichment module can be used to list the service limits.

Let's look at all the service limits.

In [30]:

# Check service limits
service_limits()

Out[30]:

	paramName	paramDescription	dataType	value
0	MaximumRingSize	Maximum size of rings for simple rings builders.	esriMiles	1000
1	MaximumRingSizeTime	Maximum size of rings (time units) for drive t...	esriDriveTimeUnitsMinutes	300
2	defaultFeaturesLimitPerComparisonLevel	Default maximum number of features to return p...	numeric	5
3	maxRecordCount	Maximum number of features to return.	numeric	1000
4	maximumAttributeDescriptionLength	Maximum length of attribute’s description string.	numeric	1000
5	maximumDataCollections	Maximum number of data collections to return o...	numeric	20
6	maximumDetailedMethodStudyAreasSize	Maximum size of rings for drive time/simple ri...	esriMiles	300
7	maximumDriveDistance	Maximum size of rings for drive time rings bui...	esriMiles	300
8	maximumDriveTimeStudyAreasNumber	Maximum number of drive time study areas in on...	numeric	100
9	maximumNumberOfStudyAreasWithDetailedMethod	Maximum number of study areas in one enrich re...	numeric	3
10	maximumOutFieldsNumber	Maximum number of ‘outFields’ set in intersect...	numeric	256
11	maximumRingsNumber	Maximum number of rings for study area locatio...	numeric	10
12	maximumSelectBusinessesResponseRecords	Maximum number of features returned by select ...	numeric	5000
13	maximumStdGeographyIDsNumber	Maximum number of standard geography IDs to re...	numeric	1000
14	maximumStudyAreasNumber	Maximum number of study areas in one enrich re...	numeric	100
15	maximumStudyAreasNumberInfographicReportHTML	Maximum number of study areas in one create in...	numeric	100
16	maximumStudyAreasNumberInfographicReportPDF	Maximum number of study areas in one create in...	numeric	50
17	optimalBatchStudyAreasNumber	Optimal number of study areas to request in ea...	numeric	50

In [17]:

service_df = service_limits()

In [18]:

service_df.head()

Out[18]:

	paramName	paramDescription	dataType	value
0	MaximumRingSize	Maximum size of rings for simple rings builders.	esriMiles	1000
1	MaximumRingSizeTime	Maximum size of rings (time units) for drive t...	esriDriveTimeUnitsMinutes	300
2	defaultFeaturesLimitPerComparisonLevel	Default maximum number of features to return p...	numeric	5
3	maxRecordCount	Maximum number of features to return.	numeric	1000
4	maximumAttributeDescriptionLength	Maximum length of attribute’s description string.	numeric	1000

In [20]:

service_df[service_df['paramName']=='MaximumRingSize']

Out[20]:

	paramName	paramDescription	dataType	value
0	MaximumRingSize	Maximum size of rings for simple rings builders.	esriMiles	1000
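
One practical use of these limits is splitting a large request into batches. The sketch below uses the values 50 (optimalBatchStudyAreasNumber) and 100 (maximumStudyAreasNumber) from the table above. `batch_ids` is a hypothetical helper, the IDs are placeholders rather than real FIPS codes, and it assumes that each standard geography ID counts as one study area, which the table does not confirm.

```python
def batch_ids(ids, batch_size=50, hard_limit=100):
    """Hypothetical helper: split IDs into batches no larger than batch_size."""
    if not 0 < batch_size <= hard_limit:
        raise ValueError("batch_size must be between 1 and hard_limit")
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


# Placeholder five-digit IDs, kept as strings to preserve leading zeros.
county_ids = [f"{n:05d}" for n in range(1, 121)]

batches = batch_ids(county_ids)
print([len(b) for b in batches])

# Each batch could then become one study area dictionary for enrich().
study_areas = [
    {"sourceCountry": "US", "layer": "US.Counties", "ids": batch}
    for batch in batches
]
```

Reading the batch size from service_df instead of hard-coding it would keep the helper in step with the service if the limits change.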

Conclusion¶

In this final part of the arcgis.geoenrichment module guide series, you have seen how the standard_geography_query method queries for standard geography areas that can then be used for enrichment, and how its geoquery parameter can be customized with more complex search criteria to target more specific results. You have also seen how Data Apportionment uses a geographic retrieval methodology to aggregate data, and how service_limits() can be used to list the limits of the service.

This guide series has demonstrated most of the functionality of the arcgis.geoenrichment module. To look up the API reference doc for GeoEnrichment, see here.
