Part 6 - Standard Geography Queries

Standard Geography Queries

Previously in the GeoEnrichment guide, you learned that a study area defines the point or area that you want to enrich with additional information. This part introduces a new form of study area, the Standard Geography Area, which lets you define an area by the ID of a standard geographic statistical feature, such as a census or postal area. For example, you can obtain enrichment information for a U.S. state, county, or ZIP Code, or for a Canadian province or postal code. The most common workflow for this service is to find the standard geography ID (for example, a FIPS code) for a geographic name.

The standard_geography_query method lets you query for standard geography IDs and features at the supported geographic levels. The results can then be used to obtain facts about the location with the enrich method or to create reports with create_report.

Using Standard Geography Query

Let's look at an example that finds the standard geography ID of every Orange County in the U.S. We will then use one of these IDs to enrich() the area with information from the Age data collection.

We will use US as the source country and specify US.Counties as the standard geographic layer to be queried, since we are looking for Orange counties across the U.S. We will use orange as the text for the service to query.
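A minimal sketch of the query described above, assuming the arcgis package is installed and that gis is an authenticated GIS connection with GeoEnrichment privileges. The helper name find_orange_counties and the constant ORANGE_COUNTY_QUERY are illustrative and not part of the API. The import is deferred into the function so the parameters can be inspected without a live connection.

```python
# Query parameters taken from the text: source country, layer, and search text.
ORANGE_COUNTY_QUERY = {
    "source_country": "US",
    "layers": ["US.Counties"],
    "geoquery": ["orange"],
}


def find_orange_counties(gis=None, return_geometry=False):
    """Query U.S. counties whose name matches 'orange'.

    `gis` is assumed to be an authenticated arcgis.gis.GIS object; passing
    None leaves the choice of connection to the arcgis package.
    """
    # Deferred import: requires the arcgis package and a live connection.
    from arcgis.geoenrichment import standard_geography_query

    return standard_geography_query(
        return_geometry=return_geometry,
        gis=gis,
        **ORANGE_COUNTY_QUERY,
    )
```

Calling find_orange_counties(gis) should return the DataFrame discussed next. Set return_geometry=True if you also want the county shapes.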

The resulting DataFrame shows DatasetID and DataLayerID, the IDs of the dataset and layer being queried. AreaID is the unique ID of each area in the results. AreaName is Orange County in every row, since we searched for Orange counties across the U.S. MajorSubdivisionName, MajorSubdivisionAbbr, and MajorSubdivisionType give the name, abbreviation, and type (State) of the major subdivision that contains each county.

Enrich using results from Standard Geography Query

The standard_geography_query method returns the Orange counties of different states, with the state name in the MajorSubdivisionName field. Now, let's enrich() Orange County in California using AreaID 06059.
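The enrichment step can be sketched as follows, assuming that enrich accepts a standard geography study area as a dictionary with sourceCountry, layers, and ids keys. The helpers county_study_area and enrich_counties are hypothetical names introduced for illustration, and an authenticated gis connection is assumed.

```python
def county_study_area(area_ids, country="US", layer="US.Counties"):
    """Build a standard geography study area for one or more AreaID values."""
    ids = [area_ids] if isinstance(area_ids, str) else list(area_ids)
    return {"sourceCountry": country, "layers": [layer], "ids": ids}


def enrich_counties(area_ids, data_collections=("Age",), gis=None):
    """Enrich the given counties with the requested data collections.

    `gis` is assumed to be an authenticated arcgis.gis.GIS object.
    """
    # Deferred import: requires the arcgis package and a live connection.
    from arcgis.geoenrichment import enrich

    return enrich(
        study_areas=[county_study_area(area_ids)],
        data_collections=list(data_collections),
        gis=gis,
    )


# Study area for Orange County, California (AreaID 06059 from the query).
orange_ca = county_study_area("06059")
```

Because county_study_area also accepts a list of IDs, the same helper can serve when several counties are enriched in one call.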

Enrichment with the Age data collection produced many columns for the various age groups. Other columns, such as Standard Geography ID, Name, Level, country, and populationToPolygonSizeRating, were also added by the enrichment.

Visualize on a Map

Let's visualize the enriched geography on a map.

Customizing your Query

The geoquery parameter specifies the search criteria used to query the desired standard geography layers. A query is broken up into terms and operators, and multiple terms can be combined with Boolean operators to form more complex queries. Learn more about using geoquery to create more complex queries here.

Let's look at an example that groups search terms to find all Orange or Lake counties in the U.S. Search supports parentheses for grouping clauses into subqueries, which is useful when you want to control the Boolean logic of a query.

We see that there are multiple Orange and Lake counties in the U.S. Let's narrow the results to the Orange and Lake counties in California.
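The grouping described above can be sketched with two small string helpers. The helper names any_of and all_of are hypothetical. The uppercase OR and AND keywords, and the use of the state abbreviation CA to narrow the results to California, are assumptions to verify against the geoquery documentation.

```python
def any_of(*terms):
    """Group alternative terms in parentheses, joined by OR (assumed keyword)."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*clauses):
    """Require every clause, joined by AND (assumed keyword)."""
    return " AND ".join(clauses)


# Orange or Lake counties anywhere in the queried layer.
orange_or_lake = any_of("orange", "lake")

# Assumed way to restrict the same search to California.
orange_or_lake_in_ca = all_of(orange_or_lake, "CA")
```

Either string would then be passed as a one-element list to the geoquery parameter, in the same way as the single term orange in the first example.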

Enrich using results from Standard Geography Query

The standard_geography_query gave us details of Orange and Lake counties in California. Now, let's enrich() these counties using AreaID.

Visualize on a Map

Let's visualize the enriched counties on a map.

Data Apportionment

The GeoEnrichment service aggregates data using a geographic retrieval methodology called Data Apportionment. This methodology determines how data is gathered and summarized for input features.

For standard geographic units such as states, provinces, counties, or postal codes, the link between a designated area and its attribute data is a simple one-to-one relationship. So, the data retrieval is a simple process of gathering the data for those areas.

For non-standard geographic units, such as ring buffers, drive-time service areas, and other custom polygons, the geographic retrieval process is more complicated, because the input polygon may intersect geographic areas whose data needs to be aggregated.

The GeoEnrichment service uses Weighted Centroid geographic retrieval to aggregate data for rings and other polygons. With this methodology, data points within an area of interest are weighted more heavily than points outside that area. When the service aggregates data, the results are statistically adjusted to more accurately reflect the actual statistics within the area of interest.

The GeoEnrichment service uses the most detailed geographies with the most recent census data, or authoritative estimates, available for commercial use from each country. For most countries, data is updated every two years, and a few countries are updated annually because data are readily available. Esri spreads the updates throughout the year on a quarterly basis. The data for each country are the most recently available estimates.

How Apportionment Works

The GeoEnrichment service uses a data apportionment algorithm to redistribute demographic, business, economic, and landscape variables to input polygon features. The algorithm analyzes each polygon to be enriched relative to a point dataset and a detailed dataset of reporting unit polygons that contain attributes for the selected variables. Based on how each polygon being enriched overlays these datasets, the algorithm determines the appropriate amount of each variable to assign.

Imagine you want total population statistics for the study area represented by the center polygon in this image.

Source: https://developers.arcgis.com/rest/geoenrichment/api-reference/data-apportionment.htm

The other four polygons represent census geographies that contain total population values. In the United States, these can be Block Groups with enrichment data; in Canada, they can be Dissemination Areas. The study area intersects four block groups, each of which lies only partially inside it. Using area P3 as an example, the population weight for this area is determined by summing the weights of the blocks within this polygon. For example, if 90 percent of the population of P3's blocks is within the study area, and the total population of P3 is 100 people, then 90 people in P3 are inside the study area.

For these partially included block groups, the GeoEnrichment service uses data apportionment and the weighted centroid retrieval method to calculate approximate statistics for the portions of the block groups inside the study area. It considers all the block points within each block group touched by the study area but weights the block points inside the study area more heavily.
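The P3 arithmetic can be reproduced with a simplified sketch. This is not the service's actual algorithm: it treats each block point as simply inside or outside the study area, and the individual block point weights below are hypothetical values chosen so that 90 percent of P3's population falls inside.

```python
def inside_share(block_points):
    """Fraction of a block group's weight that falls inside the study area.

    block_points is a list of (weight, is_inside) pairs, one per block point.
    """
    total = sum(weight for weight, _ in block_points)
    if total == 0:
        return 0.0
    inside = sum(weight for weight, is_inside in block_points if is_inside)
    return inside / total


def apportion(total_value, share):
    """Scale a block group's total by the share inside the study area."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return total_value * share


# Hypothetical block points for P3: (population weight, inside study area?)
p3_block_points = [(40, True), (30, True), (20, True), (10, False)]
p3_share = inside_share(p3_block_points)         # 90 of 100 weight units inside
p3_inside_population = apportion(100, p3_share)  # about 90 people
```

Repeating the same calculation for each block group touched by the study area and summing the results gives the estimate for the whole study area.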

You can learn more about Data Apportionment and how it works here.

Service Limits

The GeoEnrichment service imposes limits on users in order to guarantee accuracy and performance. The limits define, among other things, the maximum size of a study area, the maximum number of study areas, the number of business records in an output, and the maximum drive-time polygon size. If you exceed a limit, your query either fails or returns a warning together with the results up to the point where the limit was reached.

The service_limits() method from the arcgis.geoenrichment module can be used to list the service limits.

Let's look at all the service limits.

The paramName column names each limit, such as the maximum size, number, or drive time of a study area. The paramDescription column describes each parameter. The dataType column shows the parameter's data type, and the value column shows the service limit.

The service_limits() method returns a pandas DataFrame that describes the service's limits for each input parameter. We can store the DataFrame and use pandas operations to subset it and get the limits for a specific service.
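Subsetting the limits could look like the sketch below. The helper limits_matching is a hypothetical name, and the toy DataFrame only mimics the four columns described above: its rows and values are invented placeholders, not real service limits.

```python
import pandas as pd


def limits_matching(limits_df, keyword):
    """Return the name and value of every limit whose name contains keyword."""
    mask = limits_df["paramName"].str.contains(keyword, case=False, regex=False)
    return limits_df.loc[mask, ["paramName", "value"]]


# Toy stand-in for the DataFrame returned by service_limits().
# All rows are hypothetical placeholders, not actual limits.
toy_limits = pd.DataFrame(
    {
        "paramName": ["exampleMaxStudyAreas", "exampleMaxDriveTime"],
        "paramDescription": ["Example study area count", "Example drive time"],
        "dataType": ["integer", "integer"],
        "value": [100, 300],
    }
)

drive_limits = limits_matching(toy_limits, "drive")
```

With a live connection, the DataFrame returned by service_limits() would take the place of toy_limits.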

Conclusion

In this final part of the arcgis.geoenrichment module guide series, you have seen how the standard_geography_query method queries for standard geography areas that can then be used for enrichment, and how the query can be customized to meet more complex search criteria and return more specific results. You have also seen how Data Apportionment aggregates data through its geographic retrieval methodology and how service_limits() lists the limits of different services.

In this guide series, we have demonstrated most of the functionality of the arcgis.geoenrichment module. To look up the API reference doc for GeoEnrichment, see here.