Part 4 - What to enrich with? (What are Data Collections and Analysis Variables?)

Data Collections and GeoEnrichment coverage

As described earlier, a data collection is a preassembled list of attributes that will be used to enrich the input features. Collection attributes can describe various types of information, such as demographic characteristics and geographic context of the locations or areas submitted as input features.

Some data collections (such as default) can be used in all supported countries. Other data collections may be available only in a single country or a group of countries. The Data Browser can be used to examine the entire global listing of variables and the associated datasets for each country.

List Countries with GeoEnrichment Data

The get_countries() method can be used to query the countries for which GeoEnrichment data is available. It returns a list of Country objects whose properties you can query further. This list can also be viewed here.

Data Collections for U.S.

The data_collections property of a Country object lists its available data collections and analysis variables under each data collection as a Pandas dataframe.

To discover the data collections for a particular country, first obtain a reference to the country with the Country.get() method, then read its data_collections property. Once we know the data collection we would like to use, we can look at the analysisVariables available in that data collection.
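A minimal sketch of the lookup described above, assuming the arcgis package is installed and an authenticated GIS connection with GeoEnrichment access is already active. The helper name load_data_collections is hypothetical; Country.get() and data_collections are the calls named in this guide.

```python
def load_data_collections(country_code):
    """Return the data collections frame for one country (hypothetical helper)."""
    # Imported lazily so this sketch can be defined without arcgis installed.
    from arcgis.geoenrichment import Country

    country = Country.get(country_code)  # for example "US"
    return country.data_collections      # one row per analysis variable


# Usage, once a GIS connection is active:
# df = load_data_collections("US")
# df.head()
```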

Unique Data Collections for U.S.

Each data collection and analysis variable has a unique ID. When calling the enrich() method (explained earlier in this guide), these IDs can be passed in the data_collections and analysis_variables parameters.

As an example, here we see a subset of the data collections for US showing 2 different data collections and multiple analysis variables for each collection.

The table above shows 2 different data collections (1yearincrements and 5yearincrements). Since these are Age data collections, the analysisVariables for these collections are similar. vintage shows the year that the demographic data represents. For example, a vintage of 2020 means that the data represents the year 2020.

Let's get a list of unique data collections that are available for U.S.

The United States has 150 unique data collections. Here are the first 10 data collections.
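The count and listing of unique collections can be derived from the dataframe index. The sketch below uses a tiny hand-built stand-in for the data_collections frame. The rows are illustrative rather than real output, and it assumes the frame is indexed by dataCollectionID.

```python
import pandas as pd

# Illustrative stand-in for the data_collections frame; values are made up.
df = pd.DataFrame(
    {
        "analysisVariable": [
            "1yearincrements.AGE0_CY",
            "1yearincrements.AGE1_CY",
            "Age.FEM45",
            "Age.FEM55",
        ],
        "vintage": [2020, 2020, 2020, 2020],
    },
    index=pd.Index(
        ["1yearincrements", "1yearincrements", "Age", "Age"],
        name="dataCollectionID",
    ),
)

unique_ids = df.index.unique()  # each collection ID once, in first-seen order
print(len(unique_ids))          # number of unique data collections
print(list(unique_ids[:10]))    # up to the first ten IDs
```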

Looking at fieldCategory is a great way to understand what a data collection is about. fieldCategory combines information from the vintage and dataCollectionID columns, giving the year together with a readable name for the data collection. However, to query a data collection, its unique ID (dataCollectionID) must be used.

Let's look at the fieldCategory column for a few data collections in the U.S.

Data Collections by Socio-demographic Factors

You can filter the data_collections to get collections for a specific factor using Pandas expressions. Let's look at data collections for different socio-demographic factors such as Age, Population, and Income.
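One way to express such a filter is a substring match on the fieldCategory column. The sketch below runs on a small stand-in frame with made-up values. The helper collections_for is hypothetical, and the frame is assumed to be indexed by dataCollectionID.

```python
import pandas as pd

# Illustrative stand-in for the data_collections frame; values are made up.
df = pd.DataFrame(
    {
        "fieldCategory": [
            "2020 Age: 5 Year Increments",
            "2020 Age: 1 Year Increments",
            "2020 Household Income",
            "2020 Daytime Population",
        ]
    },
    index=pd.Index(
        ["Age", "1yearincrements", "HouseholdIncome", "DaytimePopulation"],
        name="dataCollectionID",
    ),
)


def collections_for(frame, keyword):
    """Return rows whose fieldCategory contains the keyword (hypothetical helper)."""
    mask = frame["fieldCategory"].str.contains(keyword, regex=False)
    return frame[mask]


age = collections_for(df, "Age")
income = collections_for(df, "Income")
print(age.index.unique().tolist())
print(income.index.unique().tolist())
```

The match is a plain, case-sensitive substring test, so a short keyword can also match unrelated categories. Inspecting the matched fieldCategory values before picking a dataCollectionID is a reasonable precaution.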

Data Collections for Age

Data Collections for Population

Data Collections for Income

As mentioned earlier, using a data_collection's unique ID (dataCollectionID) is the best way to further query a data collection. Let's look at the dataCollectionID for various Income data collections.

Analysis variables for Data Collections

Once we know the data collection we would like to use, we can look at all the unique variables available in that data collection using its unique ID. Let's discover analysisVariables for some of the data collections.

Analysis variables for Age data collection

Analysis variables are typically represented as dataCollectionID.<analysis variable name> as seen above.
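The lookup by unique ID and the ID pattern described above can be sketched on a small stand-in frame. The rows are illustrative, and the frame is assumed to be indexed by dataCollectionID.

```python
import pandas as pd

# Illustrative stand-in for the data_collections frame; values are made up.
df = pd.DataFrame(
    {
        "analysisVariable": [
            "Age.FEM45",
            "Age.FEM55",
            "Age.FEM65",
            "DaytimePopulation.DPOP_CY",
        ]
    },
    index=pd.Index(
        ["Age", "Age", "Age", "DaytimePopulation"],
        name="dataCollectionID",
    ),
)

# Selecting with a list of labels returns a Series even if only one row matches.
age_variables = df.loc[["Age"], "analysisVariable"].unique().tolist()
print(age_variables)

# Each ID splits into the collection ID and the variable name.
collection_id, variable_name = age_variables[0].split(".", 1)
print(collection_id, variable_name)
```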

Analysis variables for Age_by_Sex_by_Race_Profile_rep data collection

Analysis variables for DaytimePopulation data collection

Data Collections for Another Country

Let's look at data collections for New Zealand. The Data Browser can be used to examine the entire global listing of variables and the associated datasets for New Zealand.

As with the U.S., first obtain a reference to the country with the Country.get() method, then read its data_collections property. Once we know the data collection we would like to use, we can look at the analysisVariables available in that data collection.

Unique Data Collections for New Zealand

Let's get a list of unique data collections that are available for New Zealand.

New Zealand has 12 unique data collections.

We can look at the fieldCategory column to understand each category better.

As before, fieldCategory helps explain what a data collection is about, but to query a data collection its unique ID (dataCollectionID) must be used.

Data Collections for Socio-demographic Factors

New Zealand has fewer data collections than the U.S. Let's look at data collections for Key Facts, Education, and Spending.

Data Collection for Key Facts

Data Collection for Education

Data Collection for Spending

Analysis variables for Data Collections

Once we know the data collection we would like to use, we can look at all the unique variables available in that data collection using its unique ID. Let's discover analysisVariables for some of the data collections we looked at earlier.

Analysis variables for KeyGlobalFacts data collection

Analysis variables for EducationalAttainment data collection

Analysis variables for Spending data collection

Perform Enrichment using Data Collections and Analysis Variables

Data collections can be used to enrich various study areas. Both data_collections and analysis_variables can be passed to the enrich() method. Details about enriching study areas can be found in the Enriching Study Areas section.

Let's look at a few similar examples of GeoEnrichment here.

Enrich using Data Collections

Enrich with Age data collection

Here we see an address being enriched with data from the Age data collection.

When a data collection is specified without specific analysis variables, all variables under the data collection are used for enrichment as can be seen above.

Enrich with Health data collection

Here we see a ZIP code being enriched with data from the Health data collection.

Enrich using Analysis Variables

Instead of a whole data collection, we can specify individual analysis variables to enrich our data with. In this example, we will look at the analysis_variables for the Age data collection and then use specific analysis variables to enrich() a study area.

Now, we will enrich our study area with the Age.FEM45, Age.FEM55, and Age.FEM65 variables.
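A minimal sketch of this call, assuming the arcgis package is installed and an authenticated GIS connection is active. The helper enrich_with_variables is hypothetical; enrich(), study_areas, and analysis_variables are the names used in this guide. The study area in the usage comment is a placeholder.

```python
def enrich_with_variables(study_areas, variables):
    """Enrich study areas with specific analysis variables (hypothetical helper)."""
    # Imported lazily so this sketch can be defined without arcgis installed.
    from arcgis.geoenrichment import enrich

    return enrich(study_areas=study_areas, analysis_variables=variables)


# Build fully qualified IDs from the collection ID and the variable names.
variables = [f"Age.{name}" for name in ("FEM45", "FEM55", "FEM65")]
print(variables)

# Usage, once a GIS connection is active:
# result = enrich_with_variables(["<street address>"], variables)
```

Passing data_collections=["Age"] instead of analysis_variables would request every variable in the collection, as described in the previous section.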

Enriching Spatially Enabled Dataframes

One of the most common use cases for GeoEnrichment is enriching existing data in feature layers. As a user, you may need to analyze and enrich data that already exists in feature layers. A Spatially Enabled DataFrame (SeDF) lets us bring the data from a layer into a dataframe, which can then be GeoEnriched.

Let's look at an example using an existing Covid-19 feature layer. This feature layer includes the latest Covid-19 Cases, Recovered, and Deaths data for the U.S. at the county level.

We can query the layer as a dataframe and then use the dataframe for enrichment.

To showcase GeoEnrichment, we will create a subset of the original data and then enrich() the subset.

A dataframe can be passed as the value of the study_areas parameter of the enrich() method. Here we are enriching our dataframe with specific variables from the Age data collection.

Enrichment resulted in 91 records and 31 columns. We have 91 records instead of 100 because enrichment information is not available for some areas in our dataframe. GeoEnrichment adds some additional columns along with the analysis variables we requested, which gives 31 columns. Dropping duplicate and unnecessary columns brings the count down to 28.
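The column cleanup can be sketched with plain pandas. The frame below is a made-up stand-in for an enriched result, and the bookkeeping column names being dropped are assumptions chosen for illustration, not a definitive list of what enrich() returns.

```python
import pandas as pd

# Illustrative stand-in for an enriched frame; names and values are made up.
enriched = pd.DataFrame(
    [[1, "US", 10, 12, 10], [2, "US", 20, 22, 20]],
    columns=["ID", "sourceCountry", "FEM45", "FEM55", "FEM45"],
)

# Keep only the first occurrence of each column name.
deduplicated = enriched.loc[:, ~enriched.columns.duplicated()]

# Drop bookkeeping columns not needed for analysis;
# errors="ignore" skips names that are absent from the frame.
cleaned = deduplicated.drop(
    columns=["ID", "sourceCountry", "aggregationMethod"],
    errors="ignore",
)
print(cleaned.shape)
```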

Visualize on a Map

Let's visualize the enriched dataframe on a map. We will use the FEM65 column to classify our data for plotting on the map.

Conclusion

In this part of the arcgis.geoenrichment module guide series, you saw how the data_collections property of a Country object lists its available data_collections and analysis_variables. You explored different data collections and their analysis variables, and then enriched study areas using them. Towards the end, you saw how Spatially Enabled DataFrames can be enriched.

In the subsequent pages, you will learn about Generating Reports and Standard Geography Queries.