Exploring Healthcare Provider Shortage in U.S. (4-part study)
Analysis Notebooks:
- Part-1 Provider Shortage - Data Preparation
- Part-2 Explore Healthcare Providers in U.S.
- Part-3 Explore Mental and OB-GYN Providers
- Part-4 Modeling and Interpretation
Key Libraries: ArcGIS API for Python, ArcPy, Dask, Matplotlib, Seaborn, Statsmodels, Scikit-learn.
Language: Python
Project Details:
America has a severe shortage of healthcare providers. Demand for services is increasing and shortages of physicians and mental health providers are limiting patients’ access to required treatment.
In this 4-part study, we apply Dask, the ArcGIS API for Python, the ArcGIS WebGIS stack, and spatial analysis and geoanalytics tools to National Plan and Provider Enumeration System (NPPES) healthcare provider data to:
- Read, clean, and geocode ~5.7 million records of healthcare provider data.
- Identify shortage areas for physicians, mental health and OB-GYN healthcare providers in the U.S.
- Identify sociodemographic and economic factors that influence access to providers.
- Measure the influence of each factor and determine whether that influence is positive or negative.
Part-1 Provider Shortage - Data Preparation
In this notebook, we will:
- Read a large data file (~6 GB) using Dask.
- Clean the data by dropping existing columns, adding new columns, renaming columns, changing the categories of a column, and changing data types.
- Compute the results using Dask.
- Perform basic exploration and determine missing values.
- Export the data into multiple CSV files.
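The cleaning steps above can be sketched without any third-party dependency. This is only an illustration, not the notebook's code: it swaps Dask for the standard-library `csv` module, streams the file row by row, and groups cleaned rows into fixed-size parts in place of Dask partitions. The column names (`NPI`, `Entity Type Code`, `State`, `Unused`), the category mapping, and the part size are all hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical settings; the notebook's actual columns are not listed here.
DROP = {"Unused"}
RENAME = {"Entity Type Code": "entity_type"}
CATEGORIES = {"1": "Individual", "2": "Organization"}


def clean_row(row):
    """Drop, rename, and recode the columns of one record."""
    out = {RENAME.get(k, k): v.strip() for k, v in row.items() if k not in DROP}
    out["entity_type"] = CATEGORIES.get(out["entity_type"], "")
    out["NPI"] = int(out["NPI"])  # change the data type from str to int
    return out


def clean_in_parts(source, part_size):
    """Stream records from `source`; return cleaned parts and missing-value counts."""
    parts, current, missing = [], [], Counter()
    for row in csv.DictReader(source):
        cleaned = clean_row(row)
        # Count empty strings per column as missing values.
        missing.update(k for k, v in cleaned.items() if v == "")
        current.append(cleaned)
        if len(current) == part_size:
            parts.append(current)
            current = []
    if current:
        parts.append(current)
    return parts, missing


def write_part(rows, target):
    """Write one cleaned part as CSV to an open text file."""
    writer = csv.DictWriter(target, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


# Tiny in-memory stand-in for the ~6 GB input file.
sample = io.StringIO(
    "NPI,Entity Type Code,State,Unused\n"
    "1000000001,1,TX,x\n"
    "1000000002,2,,y\n"
    "1000000003,1,CA,z\n"
)
parts, missing = clean_in_parts(sample, part_size=2)
```

With Dask, the per-row function would typically become column operations on a Dask DataFrame, and nothing is evaluated until the result is explicitly computed.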
Part-2 Explore Healthcare Providers in U.S.
In this notebook, we will:
- Get Geocoded data and check geocoding results.
- Gather and Process Demographic and Health Expenditure Data.
- Aggregate Provider Data at the County Level.
- Explore distribution of Healthcare Providers in U.S.
- Explore Texas and study how Provider Count varies with Population Density, Median Income and Median Age.
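As a rough illustration of the county-level aggregation step, the sketch below counts providers per county and normalizes the counts by population. The record layout, county codes, and population figures are made up for the example, and normalizing per 100,000 residents is our assumption rather than the notebook's stated method.

```python
from collections import Counter

# Hypothetical geocoded provider records: (provider_id, county_code).
providers = [
    ("p1", "C001"), ("p2", "C001"), ("p3", "C001"),
    ("p4", "C002"), ("p5", "C003"),
]
# Hypothetical county populations keyed by county code.
population = {"C001": 150_000, "C002": 50_000, "C003": 200, "C004": 1_000}


def providers_per_100k(records, pop_by_county):
    """Count providers per county and express them per 100,000 residents."""
    counts = Counter(code for _, code in records)
    return {
        code: 100_000 * counts.get(code, 0) / pop
        for code, pop in pop_by_county.items()
        if pop > 0  # skip counties without a usable population figure
    }


rates = providers_per_100k(providers, population)
```

Counties that appear in the population table but have no providers get a rate of 0.0, which keeps them visible as potential shortage areas instead of silently dropping them.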
Part-3 Explore Mental and OB-GYN Providers
In this notebook, we will:
- Explore Mental Healthcare Providers
  - Study how Mental Healthcare Providers vary by Population Density across all states.
  - Identify the state with the highest people-to-provider ratio.
  - Explore that state to understand how providers vary across its counties.
- Explore OB-GYN Healthcare Providers
  - Study how OB-GYN Healthcare Providers vary by population of mothers across all states.
  - Identify the state with the highest mothers-to-provider ratio.
  - Explore that state to understand how providers vary across its counties.
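The ratio comparison used in this part can be sketched in a few lines. The state names and totals below are placeholders, not results from the study; the same function applies to a mothers-to-provider ratio if the population column is replaced accordingly.

```python
# Hypothetical state totals: state -> (population, provider count).
state_totals = {
    "State A": (1_000_000, 2_000),
    "State B": (500_000, 250),
    "State C": (2_000_000, 8_000),
}


def people_per_provider(totals):
    """Return the people-to-provider ratio for every state with providers."""
    return {
        state: pop / n_providers
        for state, (pop, n_providers) in totals.items()
        if n_providers > 0  # avoid division by zero
    }


ratios = people_per_provider(state_totals)
# The state with the highest ratio has the most people per provider.
highest = max(ratios, key=ratios.get)
```

A higher ratio means each provider serves more people, so the state selected here is the one examined county by county.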
Part-4 Modeling and Interpretation
In this notebook, we will:
- Generate a Base (OLS) Model of provider count using demographic and health expenditure variables.
- Perform Feature Selection using multiple techniques to select relevant features.
- Run a Global (OLS) model using selected features.
- Create a Geographically Weighted Regression (GWR) Model to understand how the impact of various predictors varies across different counties.
- Create a Forest-based Classification and Regression Trees Model to understand non-linear relations and to identify important variables.
- Create a Local Bivariate Relationships (LBR) Model to understand the type and significance of relationships of Provider Count with respect to variables selected from the Forest-based model.
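To make the idea of a positive or negative influence concrete, here is a deliberately simplified one-predictor least-squares fit written in plain Python. It stands in for the multi-variable OLS models listed above and is not the notebook's implementation; the income and provider values are invented for the example.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical county values: median income (thousands of dollars)
# and provider count.
median_income = [30, 40, 50, 60]
provider_count = [10, 14, 18, 22]

intercept, slope = simple_ols(median_income, provider_count)
# The sign of the slope gives the direction of the influence.
direction = "positive" if slope > 0 else "negative"
```

In the global model, each selected feature gets its own coefficient, and its sign is read the same way; GWR then allows those coefficients to differ from county to county.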