# Import Libraries
from arcgis.gis import GIS
from arcgis.geoenrichment import Country, enrich, service_limits, standard_geography_query
# Create a GIS Connection
gis = GIS(profile='your_online_profile')
# Get Country
usa = Country.get('USA')
Previously in the GeoEnrichment guides, you learned that a study area defines the location of the point or area that you want to enrich with additional information. This guide introduces a new form of study area, the Standard Geography Area, which lets you define an area by the ID of a standard geographic statistical feature, such as a census or postal area. For example, you can obtain enrichment information for a U.S. state, county, or ZIP Code, or for a Canadian province or postal code. The most common workflow for this service is to find the standard geography ID (for example, a FIPS code) for a geographic name.
The standard_geography_query method lets you query for standard geography IDs and features at the supported geographic levels. You can then use these IDs to obtain facts about the location with the enrich method or to create reports with create_report.
Let's look at an example that finds the standard geography IDs for all Orange counties in the U.S. We will then use one of these IDs to enrich() the area with information from the Age data collection.
We will use US as the source country and specify US.Counties as the standard geographic layer to query, since we are looking for Orange counties across the U.S. We will use orange as the query text.
# Find FIPS for all Orange counties in US
orange = standard_geography_query(source_country='US', layers='US.Counties', geoquery='orange')
orange
| | DatasetID | DataLayerID | AreaID | AreaName | MajorSubdivisionName | MajorSubdivisionAbbr | MajorSubdivisionType | CountryAbbr | Score | ObjectId |
---|---|---|---|---|---|---|---|---|---|---|
0 | USA_ESRI_2020 | US.Counties | 06059 | Orange County | California | CA | State | US | 100 | 1 |
1 | USA_ESRI_2020 | US.Counties | 12095 | Orange County | Florida | FL | State | US | 100 | 2 |
2 | USA_ESRI_2020 | US.Counties | 18117 | Orange County | Indiana | IN | State | US | 100 | 3 |
3 | USA_ESRI_2020 | US.Counties | 36071 | Orange County | New York | NY | State | US | 100 | 4 |
4 | USA_ESRI_2020 | US.Counties | 37135 | Orange County | North Carolina | NC | State | US | 100 | 5 |
5 | USA_ESRI_2020 | US.Counties | 48361 | Orange County | Texas | TX | State | US | 100 | 6 |
6 | USA_ESRI_2020 | US.Counties | 50017 | Orange County | Vermont | VT | State | US | 100 | 7 |
7 | USA_ESRI_2020 | US.Counties | 51137 | Orange County | Virginia | VA | State | US | 100 | 8 |
The resulting DataFrame shows DatasetID and DataLayerID, which are the IDs of the dataset and layer being queried. AreaID is the unique ID for each area in the results. AreaName is Orange County in every row, since we searched for Orange counties across the U.S. MajorSubdivisionName, MajorSubdivisionAbbr, and MajorSubdivisionType show the name, abbreviation, and type (here, State) of the major subdivision that contains each area.
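For the US.Counties layer, each AreaID shown above is a five-digit county FIPS code: the first two digits identify the state and the last three identify the county within it. The sketch below splits such an ID into its parts. Note that split_county_fips is a hypothetical helper written for illustration and is not part of arcgis.geoenrichment. The IDs are kept as strings so that leading zeros (as in 06059) are preserved.

```python
def split_county_fips(area_id):
    """Split a 5-digit county FIPS code into (state_fips, county_fips).

    Hypothetical helper for illustration; not part of arcgis.geoenrichment.
    """
    if len(area_id) != 5 or not area_id.isdigit():
        raise ValueError(f"expected a 5-digit county FIPS code, got {area_id!r}")
    # First two digits: state; last three digits: county within that state.
    return area_id[:2], area_id[2:]


state_fips, county_fips = split_county_fips("06059")
print(state_fips, county_fips)
```

Passing the ID as an integer would drop the leading zero, so a check like the one above can catch that mistake before the ID is sent to the service.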
The standard_geography_query method returns the Orange counties in different states, with the state name in the MajorSubdivisionName field. Now, let's enrich() Orange County in California using its AreaID, 06059.
# Define Orange County, CA as a study area by its standard geography ID
or_ca = {"sourceCountry": "US", "layer": "US.Counties", "ids": ["06059"]}
orange_df = enrich(study_areas=[or_ca], data_collections=['Age'])
orange_df
| | ID | OBJECTID | StdGeographyLevel | StdGeographyName | StdGeographyID | sourceCountry | aggregationMethod | populationToPolygonSizeRating | apportionmentConfidence | HasData | ... | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US.Counties | Orange County | 06059 | US | Query:US.Counties | 2.191 | 2.576 | 1 | ... | 106005 | 107845 | 107235 | 97121 | 81462 | 65917 | 47365 | 33050 | 40794 | {"rings": [[[-117.9157650000062, 33.9469249994... |
1 rows × 47 columns
orange_df.columns
Index(['ID', 'OBJECTID', 'StdGeographyLevel', 'StdGeographyName', 'StdGeographyID', 'sourceCountry', 'aggregationMethod', 'populationToPolygonSizeRating', 'apportionmentConfidence', 'HasData', 'MALE0', 'MALE5', 'MALE10', 'MALE15', 'MALE20', 'MALE25', 'MALE30', 'MALE35', 'MALE40', 'MALE45', 'MALE50', 'MALE55', 'MALE60', 'MALE65', 'MALE70', 'MALE75', 'MALE80', 'MALE85', 'FEM0', 'FEM5', 'FEM10', 'FEM15', 'FEM20', 'FEM25', 'FEM30', 'FEM35', 'FEM40', 'FEM45', 'FEM50', 'FEM55', 'FEM60', 'FEM65', 'FEM70', 'FEM75', 'FEM80', 'FEM85', 'SHAPE'], dtype='object')
Enrichment with the Age data collection produced many columns for age groups by sex (MALE0 through MALE85 and FEM0 through FEM85). Enrichment also added descriptive columns such as StdGeographyID, StdGeographyName, StdGeographyLevel, sourceCountry, and populationToPolygonSizeRating.
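As a small illustration of working with these columns, the sketch below sums the female age-group values that are visible in the output above for Orange County, California. It assumes that the number in each column name is the starting age of the group; that reading is an assumption and should be checked against the data collection's documentation. The helper sum_age_columns is hypothetical, and a plain dictionary stands in for the DataFrame row.

```python
import re

# Values copied from the enrich() output shown above (Orange County, CA).
row = {
    "FEM45": 106005, "FEM50": 107845, "FEM55": 107235,
    "FEM60": 97121, "FEM65": 81462, "FEM70": 65917,
    "FEM75": 47365, "FEM80": 33050, "FEM85": 40794,
}


def sum_age_columns(record, prefix, min_age):
    """Sum columns named <prefix><age> whose age number is >= min_age.

    Hypothetical helper; assumes the number is the group's starting age.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    total = 0
    for name, value in record.items():
        match = pattern.match(name)
        if match and int(match.group(1)) >= min_age:
            total += value
    return total


females_45_plus = sum_age_columns(row, "FEM", 45)
print(females_45_plus)
```

With a real DataFrame, the same selection could be made by filtering orange_df.columns on the prefix before summing across the row.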
Let's visualize the enriched geography on a map.
or_ca_map = gis.map('Los Angeles, CA')
or_ca_map
orange_df.spatial.plot(or_ca_map)
True
The geoquery parameter specifies the search criteria used to query the desired standard geography layers. A query is broken up into terms and operators, and multiple terms can be combined with Boolean operators to form more complex queries. See the standard geography query documentation to learn more about using geoquery to create more complex queries.
Let's look at an example that groups search terms to find all Orange or Lake counties in the U.S. The search supports parentheses for grouping clauses into subqueries, which is useful when you want to control the Boolean logic of a query.
or_lake = standard_geography_query(source_country='US', layers='US.Counties', geoquery='(Orange OR Lake)')
or_lake
| | DatasetID | DataLayerID | AreaID | AreaName | MajorSubdivisionName | MajorSubdivisionAbbr | MajorSubdivisionType | CountryAbbr | Score | ObjectId |
---|---|---|---|---|---|---|---|---|---|---|
0 | USA_ESRI_2020 | US.Counties | 06059 | Orange County | California | CA | State | US | 100 | 1 |
1 | USA_ESRI_2020 | US.Counties | 12095 | Orange County | Florida | FL | State | US | 100 | 2 |
2 | USA_ESRI_2020 | US.Counties | 18117 | Orange County | Indiana | IN | State | US | 100 | 3 |
3 | USA_ESRI_2020 | US.Counties | 48361 | Orange County | Texas | TX | State | US | 100 | 4 |
4 | USA_ESRI_2020 | US.Counties | 50017 | Orange County | Vermont | VT | State | US | 100 | 5 |
5 | USA_ESRI_2020 | US.Counties | 51137 | Orange County | Virginia | VA | State | US | 100 | 6 |
6 | USA_ESRI_2020 | US.Counties | 36071 | Orange County | New York | NY | State | US | 99 | 7 |
7 | USA_ESRI_2020 | US.Counties | 37135 | Orange County | North Carolina | NC | State | US | 99 | 8 |
8 | USA_ESRI_2020 | US.Counties | 06033 | Lake County | California | CA | State | US | 87 | 9 |
9 | USA_ESRI_2020 | US.Counties | 08065 | Lake County | Colorado | CO | State | US | 87 | 10 |
10 | USA_ESRI_2020 | US.Counties | 12069 | Lake County | Florida | FL | State | US | 87 | 11 |
11 | USA_ESRI_2020 | US.Counties | 17097 | Lake County | Illinois | IL | State | US | 87 | 12 |
12 | USA_ESRI_2020 | US.Counties | 18089 | Lake County | Indiana | IN | State | US | 87 | 13 |
13 | USA_ESRI_2020 | US.Counties | 26085 | Lake County | Michigan | MI | State | US | 87 | 14 |
14 | USA_ESRI_2020 | US.Counties | 27075 | Lake County | Minnesota | MN | State | US | 87 | 15 |
15 | USA_ESRI_2020 | US.Counties | 30047 | Lake County | Montana | MT | State | US | 87 | 16 |
16 | USA_ESRI_2020 | US.Counties | 39085 | Lake County | Ohio | OH | State | US | 87 | 17 |
17 | USA_ESRI_2020 | US.Counties | 41037 | Lake County | Oregon | OR | State | US | 87 | 18 |
18 | USA_ESRI_2020 | US.Counties | 47095 | Lake County | Tennessee | TN | State | US | 87 | 19 |
19 | USA_ESRI_2020 | US.Counties | 16007 | Bear Lake County | Idaho | ID | State | US | 87 | 20 |
20 | USA_ESRI_2020 | US.Counties | 27125 | Red Lake County | Minnesota | MN | State | US | 87 | 21 |
21 | USA_ESRI_2020 | US.Counties | 46079 | Lake County | South Dakota | SD | State | US | 87 | 22 |
22 | USA_ESRI_2020 | US.Counties | 49035 | Salt Lake County | Utah | UT | State | US | 87 | 23 |
23 | USA_ESRI_2020 | US.Counties | 55047 | Green Lake County | Wisconsin | WI | State | US | 87 | 24 |
24 | USA_ESRI_2020 | US.Counties | 02164 | Lake and Peninsula Borough | Alaska | AK | State | US | 86 | 25 |
25 | USA_ESRI_2020 | US.Counties | 27077 | Lake of the Woods County | Minnesota | MN | State | US | 85 | 26 |
We see that there are multiple Orange and Lake counties in the U.S., along with partial matches such as Salt Lake County and Lake of the Woods County. Let's narrow the results to the Orange and Lake counties in California.
or_lake_ca = standard_geography_query(source_country='US', layers='US.Counties', geoquery='(Orange OR Lake) AND CA')
or_lake_ca
| | DatasetID | DataLayerID | AreaID | AreaName | MajorSubdivisionName | MajorSubdivisionAbbr | MajorSubdivisionType | CountryAbbr | Score | ObjectId |
---|---|---|---|---|---|---|---|---|---|---|
0 | USA_ESRI_2020 | US.Counties | 06059 | Orange County | California | CA | State | US | 100 | 1 |
1 | USA_ESRI_2020 | US.Counties | 06033 | Lake County | California | CA | State | US | 89 | 2 |
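The two geoquery strings used above follow a simple pattern: alternative terms joined with OR inside parentheses, optionally followed by AND terms. The sketch below assembles such strings from lists of terms. It covers only this pattern, and build_geoquery is a hypothetical helper rather than part of arcgis.geoenrichment.

```python
def build_geoquery(any_of, all_of=()):
    """Build query text such as '(Orange OR Lake) AND CA'.

    any_of: terms of which at least one must match (joined with OR).
    all_of: additional terms that must all match (joined with AND).
    Hypothetical helper for illustration only.
    """
    if not any_of:
        raise ValueError("any_of must contain at least one term")
    query = " OR ".join(any_of)
    # Parentheses keep the OR clause together when AND terms follow.
    if len(any_of) > 1:
        query = f"({query})"
    for term in all_of:
        query = f"{query} AND {term}"
    return query


print(build_geoquery(["Orange", "Lake"]))
print(build_geoquery(["Orange", "Lake"], all_of=["CA"]))
```

The resulting string can be passed as the geoquery argument of standard_geography_query in the same way as the hand-written queries above.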
The standard_geography_query method returned the Orange and Lake counties in California. Now, let's enrich() these counties using their AreaID values.
# Define both California counties as one study area by their standard geography IDs
or_lk = {"sourceCountry": "US", "layer": "US.Counties", "ids": ["06059", "06033"]}
or_lake_df = enrich(study_areas=[or_lk], data_collections=['Age'])
or_lake_df
| | ID | OBJECTID | StdGeographyLevel | StdGeographyName | StdGeographyID | sourceCountry | aggregationMethod | populationToPolygonSizeRating | apportionmentConfidence | HasData | ... | FEM45 | FEM50 | FEM55 | FEM60 | FEM65 | FEM70 | FEM75 | FEM80 | FEM85 | SHAPE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | US.Counties | Orange County | 06059 | US | Query:US.Counties | 2.191 | 2.576 | 1 | ... | 106005 | 107845 | 107235 | 97121 | 81462 | 65917 | 47365 | 33050 | 40794 | {"rings": [[[-117.9157650000062, 33.9469249994... |
1 | 0 | 2 | US.Counties | Lake County | 06033 | US | Query:US.Counties | 2.191 | 2.576 | 1 | ... | 1918 | 2176 | 2680 | 2946 | 2742 | 2104 | 1382 | 834 | 926 | {"rings": [[[-122.81409900076635, 39.581399999... |
2 rows × 47 columns
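The study-area dictionary passed to enrich() does not have to be typed by hand; it can be assembled from query results. The sketch below does this with plain dictionaries standing in for rows of the query DataFrame (for example, as produced by pandas' DataFrame.to_dict("records")). The helper make_study_area is hypothetical and only reproduces the dictionary layout used in this guide.

```python
# A few rows copied from the query results above.
rows = [
    {"AreaID": "06059", "AreaName": "Orange County", "MajorSubdivisionAbbr": "CA"},
    {"AreaID": "12095", "AreaName": "Orange County", "MajorSubdivisionAbbr": "FL"},
    {"AreaID": "06033", "AreaName": "Lake County", "MajorSubdivisionAbbr": "CA"},
]


def make_study_area(records, state_abbr, source_country="US", layer="US.Counties"):
    """Collect the AreaIDs for one state into a study-area dictionary.

    Hypothetical helper; mirrors the dictionary layout used in this guide.
    """
    ids = [r["AreaID"] for r in records if r["MajorSubdivisionAbbr"] == state_abbr]
    return {"sourceCountry": source_country, "layer": layer, "ids": ids}


ca_study_area = make_study_area(rows, "CA")
print(ca_study_area)
```

The result has the same shape as the or_lk dictionary used above and could be passed to enrich() in a study_areas list.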
Let's visualize the enriched counties on a map.
or_lake_map = gis.map('California, US',6)
or_lake_map
or_lake_df.spatial.plot(or_lake_map)
True
The GeoEnrichment service aggregates data using a geographic retrieval methodology called Data Apportionment. This methodology determines how data is gathered and summarized for input features.
For standard geographic units such as states, provinces, counties, or postal codes, the link between a designated area and its attribute data is a simple one-to-one relationship. So, the data retrieval is a simple process of gathering the data for those areas.
For non-standard geographic units such as ring buffers, drive-time service areas, and other non-standard polygons, the geographic retrieval process is more complicated, because the input polygon may intersect geographic areas that contain data that needs to be aggregated.
The GeoEnrichment service uses Weighted Centroid geographic retrieval to aggregate data for rings and other polygons. With this methodology, data points within an area of interest are weighted more heavily than points outside that area. When the service aggregates data, the results are statistically adjusted to more accurately reflect the actual statistics within the area of interest.
The GeoEnrichment service uses the most detailed geographies with the most recent census data, or authoritative estimates, available for commercial use from each country. For most countries, data is updated every two years; a few countries are updated annually because their data is readily available. Esri spreads the updates throughout the year on a quarterly basis.
The GeoEnrichment service uses a data apportionment algorithm to redistribute demographic, business, economic, and landscape variables to input polygon features. The algorithm analyzes each polygon to be enriched relative to a point dataset and a detailed dataset of reporting unit polygons that contain attributes for the selected variables. Based on how each polygon being enriched overlays these datasets, the algorithm determines the appropriate amount of each variable to assign.
Imagine you want to get total population statistics for a study area represented by the center polygon in the image.
Source: https://developers.arcgis.com/rest/geoenrichment/api-reference/data-apportionment.htm
The other four polygons represent census geographies that contain total population values. In the United States, these can be Block Groups with enrichment data; in Canada, they can be Dissemination Areas. The study area intersects four block groups, each of which is only partially inside the study area. Using area P3 as an example, the population weight for this area is determined by summing the block weights within this polygon. For example, if 90 percent of the population of P3's blocks is within the study area, and the total population of P3 is 100 people, then 90 people in area P3 are inside the study area.
So, for those partially included blocks, the GeoEnrichment service uses data apportionment and the weighted centroid retrieval method to calculate the approximate statistics for those portions of block groups inside the study area. It considers all the block points within each block group touched by the study area but weights the block points inside the study area more heavily.
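The P3 arithmetic above can be written out as a short sketch. This is a deliberately simplified illustration of apportioning one block group's value by the share of block population that falls inside the study area. It is not the service's actual algorithm, which applies additional weighting and statistical adjustment. The function apportion and the individual block values are hypothetical; only the totals (90 of 100 people inside) come from the example above.

```python
def apportion(block_group_value, blocks):
    """Estimate the part of a block group's value inside a study area.

    blocks is a list of (block_population, is_inside_study_area) pairs,
    one per block point in the block group. Simplified illustration only.
    """
    total_weight = sum(pop for pop, _ in blocks)
    if total_weight == 0:
        return 0.0
    inside_weight = sum(pop for pop, inside in blocks if inside)
    # Share of the block group's population that lies inside the study area.
    return block_group_value * inside_weight / total_weight


# Hypothetical block points for P3: 90 of its 100 people live in blocks
# whose points fall inside the study area.
p3_blocks = [(40, True), (30, True), (20, True), (10, False)]
p3_inside = apportion(100, p3_blocks)
print(p3_inside)
```

Repeating this for each of the four intersected block groups and summing the results would give an approximate total for the study area under the same simplifying assumptions.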
You can learn more about Data Apportionment and how it works in the data apportionment reference linked above.
The GeoEnrichment service imposes limits on requests in order to guarantee accuracy and performance. The limits define the maximum size of a study area, the maximum number of study areas, the number of business records in an output, the maximum drive-time polygon size, and more. If you exceed a limit, your query either fails or returns results only up to the limit, along with a warning that the limit was exceeded.
The service_limits() method from the arcgis.geoenrichment module can be used to list the service limits.
Let's look at all the service limits.
# Check service limits
service_limits()
| | paramName | paramDescription | dataType | value |
---|---|---|---|---|
0 | MaximumRingSize | Maximum size of rings for simple rings builders. | esriMiles | 1000 |
1 | MaximumRingSizeTime | Maximum size of rings (time units) for drive t... | esriDriveTimeUnitsMinutes | 300 |
2 | defaultFeaturesLimitPerComparisonLevel | Default maximum number of features to return p... | numeric | 5 |
3 | maxRecordCount | Maximum number of features to return. | numeric | 1000 |
4 | maximumAttributeDescriptionLength | Maximum length of attribute’s description string. | numeric | 1000 |
5 | maximumDataCollections | Maximum number of data collections to return o... | numeric | 20 |
6 | maximumDetailedMethodStudyAreasSize | Maximum size of rings for drive time/simple ri... | esriMiles | 300 |
7 | maximumDriveDistance | Maximum size of rings for drive time rings bui... | esriMiles | 300 |
8 | maximumDriveTimeStudyAreasNumber | Maximum number of drive time study areas in on... | numeric | 100 |
9 | maximumNumberOfStudyAreasWithDetailedMethod | Maximum number of study areas in one enrich re... | numeric | 3 |
10 | maximumOutFieldsNumber | Maximum number of ‘outFields’ set in intersect... | numeric | 256 |
11 | maximumRingsNumber | Maximum number of rings for study area locatio... | numeric | 10 |
12 | maximumSelectBusinessesResponseRecords | Maximum number of features returned by select ... | numeric | 5000 |
13 | maximumStdGeographyIDsNumber | Maximum number of standard geography IDs to re... | numeric | 1000 |
14 | maximumStudyAreasNumber | Maximum number of study areas in one enrich re... | numeric | 100 |
15 | maximumStudyAreasNumberInfographicReportHTML | Maximum number of study areas in one create in... | numeric | 100 |
16 | maximumStudyAreasNumberInfographicReportPDF | Maximum number of study areas in one create in... | numeric | 50 |
17 | optimalBatchStudyAreasNumber | Optimal number of study areas to request in ea... | numeric | 50 |
The paramName column gives the name of each limit, such as the maximum size or number of study areas or the maximum drive time. The paramDescription column describes each parameter, the dataType column shows the unit or data type of the value, and the value column shows the limit itself.
The service_limits() method returns a pandas DataFrame that describes the service's limits for each input parameter. We can store the DataFrame and use pandas operations to subset it and look up a specific limit.
service_df = service_limits()
service_df.head()
| | paramName | paramDescription | dataType | value |
---|---|---|---|---|
0 | MaximumRingSize | Maximum size of rings for simple rings builders. | esriMiles | 1000 |
1 | MaximumRingSizeTime | Maximum size of rings (time units) for drive t... | esriDriveTimeUnitsMinutes | 300 |
2 | defaultFeaturesLimitPerComparisonLevel | Default maximum number of features to return p... | numeric | 5 |
3 | maxRecordCount | Maximum number of features to return. | numeric | 1000 |
4 | maximumAttributeDescriptionLength | Maximum length of attribute’s description string. | numeric | 1000 |
service_df[service_df['paramName']=='MaximumRingSize']
| | paramName | paramDescription | dataType | value |
---|---|---|---|---|
0 | MaximumRingSize | Maximum size of rings for simple rings builders. | esriMiles | 1000 |
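One way to act on these limits is to split a long list of study areas into batches before calling enrich(). The sketch below uses a batch size of 50, the optimalBatchStudyAreasNumber value in the table above, which also stays under the maximumStudyAreasNumber of 100. The helper chunked and the placeholder study areas are hypothetical, and the enrich() call is left as a comment because it needs a live GIS connection.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 120 placeholder study areas (hypothetical IDs, for illustration only).
study_areas = [
    {"sourceCountry": "US", "layer": "US.Counties", "ids": [f"{i:05d}"]}
    for i in range(120)
]

batch_size = 50  # optimalBatchStudyAreasNumber from the table above
batches = list(chunked(study_areas, batch_size))
print([len(batch) for batch in batches])

# Each batch could then be enriched separately, for example:
# results = [enrich(study_areas=batch, data_collections=['Age']) for batch in batches]
```

Reading the batch size from the DataFrame returned by service_limits(), rather than hard-coding it, would keep the code in step with any change to the service's limits.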
In this final part of the arcgis.geoenrichment module guide series, you have seen how the standard_geography_query method queries for standard geography areas that can then be used for enrichment, and how geoquery can be customized with more complex search criteria to target more specific results. You have also seen how Data Apportionment uses a geographic retrieval methodology to aggregate data, and how service_limits() can be used to list the limits of the service.
In this guide series, we have demonstrated most of the functionality of the arcgis.geoenrichment module. For more details, see the API reference documentation for GeoEnrichment.