Analysis Notebooks:
  1. Part-1 Provider Shortage - Data Preparation
  2. Part-2 Explore Healthcare Providers in U.S.
  3. Part-3 Explore Mental and OB-GYN Providers
  4. Part-4 Modeling and Interpretation

Key Libraries: ArcGIS API for Python, ArcPy, Dask, Matplotlib, Seaborn, Statsmodels, Scikit-learn.

Language: Python

Project Details:

America has a severe shortage of healthcare providers. Demand for services is increasing and shortages of physicians and mental health providers are limiting patients’ access to required treatment.

In this 4-part study, we use Dask, the ArcGIS API for Python, the ArcGIS Web GIS stack, and spatial analysis and GeoAnalytics tools on National Plan and Provider Enumeration System (NPPES) healthcare provider data to:

  • Read, clean, and geocode ~5.7 million records of healthcare provider data.
  • Identify shortage areas for physicians, mental health and OB-GYN healthcare providers in the U.S.
  • Identify sociodemographic and economic factors that influence access to providers.
  • Identify the influence of each factor and whether that influence is positive or negative.

Part-1 Provider Shortage - Data Preparation
In this notebook, we will:

  • Read a large data file (~6 GB) using Dask.
  • Clean the data by dropping existing columns, adding new columns, renaming columns, changing the categories of a column, and changing data types.
  • Compute the data using Dask.
  • Perform basic exploration and identify missing values.
  • Export the data into multiple CSV files.

Part-2 Explore Healthcare Providers in U.S.
In this notebook, we will:

  • Get the geocoded data and check the geocoding results.
  • Gather and Process Demographic and Health Expenditure Data.
  • Aggregate Provider Data at the County Level.
  • Explore the distribution of Healthcare Providers in the U.S.
  • Explore Texas and study how:
    • Provider Count varies with Population Density, Median Income and Median Age.
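
As a rough illustration of the county-level aggregation step, the sketch below counts providers per county with pandas and joins the counts to a demographics table. The column names (`county_fips`, `population`, `provider_count`) and all values are made up for the example and are not taken from the notebook.

```python
import pandas as pd

# Hypothetical geocoded provider records, one row per provider.
providers = pd.DataFrame({
    "npi": [1, 2, 3, 4],
    "county_fips": ["48201", "48201", "48113", "48201"],
})

# Hypothetical county demographics (illustrative values only).
demographics = pd.DataFrame({
    "county_fips": ["48201", "48113", "48301"],
    "population": [4000, 2000, 100],
})

# Count providers per county.
counts = (
    providers.groupby("county_fips")
    .size()
    .rename("provider_count")
    .reset_index()
)

# Left-join onto demographics so counties without providers are kept
# and receive a count of 0 instead of being dropped.
county = demographics.merge(counts, on="county_fips", how="left")
county["provider_count"] = county["provider_count"].fillna(0).astype(int)
```

The left join matters for shortage analysis: counties with zero providers are exactly the ones an inner join would silently remove.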

Part-3 Explore Mental and OB-GYN Providers
In this notebook, we will:

  • Explore Mental Healthcare Providers
    • Study how Mental Healthcare Providers vary by Population Density across all states.
    • Identify the state with the highest people-to-provider ratio.
    • Explore the state with the highest ratio to understand how providers vary across the counties of that state.
  • Explore OB-GYN Healthcare Providers
    • Study how OB-GYN Healthcare Providers vary by population of mothers across all states.
    • Identify the state with the highest mothers-to-provider ratio.
    • Explore the state with the highest ratio to understand how providers vary across the counties of that state.
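
The ratio used above can be sketched as follows. The state labels, column names, and numbers are hypothetical; the same pattern would apply to a mothers-to-provider ratio by swapping the population column.

```python
import pandas as pd

# Hypothetical state-level totals (illustrative values only).
states = pd.DataFrame({
    "state": ["A", "B", "C"],
    "population": [10000, 20000, 9000],
    "mental_health_providers": [50, 40, 30],
})

# People per provider: a higher value means fewer providers
# relative to the population they serve.
states["people_per_provider"] = (
    states["population"] / states["mental_health_providers"]
)

# The state with the highest ratio is the candidate for a county-level look.
worst = states.loc[states["people_per_provider"].idxmax(), "state"]
```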

Part-4 Modeling and Interpretation
In this notebook, we will:

  • Generate a Base (OLS) Model of provider count using demographic and health expenditure variables.
  • Perform Feature Selection using multiple techniques to select relevant features.
  • Run a Global (OLS) model using selected features.
  • Create a Geographically Weighted Regression (GWR) Model to understand how the impact of various predictors varies across different counties.
  • Create a Forest-Based Classification and Regression Trees Model to understand non-linear relationships and to identify important variables.
  • Create a Local Bivariate Relationships (LBR) Model to understand the type and significance of the relationships between Provider Count and the variables selected from the Forest-based model.