Extreme heat and public health in coastal California cities: enhancing resilience via improved early warning systems
PROPOSAL TO CLIMATE PROGRAM OFFICE (CPO) EXTREME HEAT RISK INITIATIVE COMPETITION, FY2021
Data sources for green space proxies.
We will consider five main proxies for green space. These five indexes, interpolated at the zip code level, are described below and include NDVI, Tree Canopy, Park Access, Tree Cover, and Impervious Surface Cover.
NDVI (or Normalized Difference Vegetation Index) is a measure of the “greenness” of a given area of land, and is obtained by satellite imagery. This well-known satellite-derived index is obtained from NOAA’s Climate Data Record (CDR) NDVI remote sensing product.
Tree Canopy is a measure of the amount of tree canopy present in an area, specifically the “Population-weighted percentage of the census tract area with tree canopy” (HPI Documentation). This data will be obtained from the California Healthy Places Index, which in turn obtained the data from the California Department of Public Health. This data, originally presented by census tract, will be converted to zip code level for this study (as health data is only available at this level, see details below).
Park Access similarly measures “Percentage of the population living within a half-mile of a park, beach, or open space greater than 1 acre” (HPI Documentation). Data for this measure also come from the California Healthy Places Index and will be converted from census tract to zip code level.
Impervious Surface Cover measures the imperviousness of land surfaces – i.e. to what degree they are covered in impervious materials such as pavement. These data will be obtained from the National Landcover Database (NLCD) available in Google Earth Engine from USGS. This is a 30m resolution dataset that exists in 8 iterations from 1992 to 2016. We will then calculate the proportion of a given zip code with impervious land surfaces.
Finally, Tree Cover represents a percentage of land surface area covered by trees based on satellite imagery and was also obtained from the same source as impervious surfaces. We will also interpolate these values at the zip code level.
Health data at our disposal.
We will rely on statewide hospital admissions and emergency department (ED) visits available from 1999 to 2020 from California’s Office of Statewide Health Planning and Development and the UC electronic health records (EPIC system) with detailed information on symptoms and clinical signs associated with hospital admission, sociodemographic characteristics, date and zip code of residence (40, 56). We will obtain all hospitalizations data in California for the years 2004–2018 from the Office of Statewide Health Planning and Development (OSHPD). The following primary diagnoses will be considered, as listed in the International Classification of Disease codes, 9th Revision, Clinical Modification (ICD-9): acute myocardial infarction (MI) (410), acute renal failure (584), cardiac dysrhythmias (427), cardiovascular disease (CVD) (390–459), dehydration/volume depletion (276.5), essential hypertension (401), heat illness (992), ischemic heart disease (410–414), ischemic stroke (433–436), and all respiratory diseases (460–519). These particular diseases are chosen because they have previously been linked to extreme temperature (Bunker et al, 2016; Li et al., 2015, Sherbakov et al 2018).
Details about the implementation of the spatial Within-Community Matched Design
Step1 – Matching Procedure
For each HW day, we identified controls (non-HW days) based on three criteria, 1) same zip code 2) same month and 3) on weekday or weekend based on a binary category. These matched days were only included if they were above the 75th percentile of the warm season (May-September) distribution of daily-maximum temperature for the zip code where the HW day is observed. We chose three matched days at random and took the median as the comparison value. Median was used to account for outliers.
Step 2 – Absolute Scale Estimation
We calculated a zip code specific contrast between HW and matched non-HW days by taking the median number of hospitalizations for the 3 matched days, such that our estimates were not affected by outliers in counts of hospitalizations. We then calculated the difference in number of hospitalizations between the HW day and the median of the non-HW matches. Finally, we calculated the average of the contrast between all HW days and corresponding matched non-HW days for each zipcode to obtain the count of hospital admission attributable to heat for each zipcode.
Step 3 – Relative Scale Estimation
Population size is expected to be highly correlated with the count of hospital admission attributable to heat. Yet, we may observe some (positive or negative) discrepancies between the estimated burden in each zip code and the corresponding size of the population in that zip code. Such discrepancies can be interpreted as the relative risk of heat on hospitalizations. When such discrepancies are positive, they can be interpreted as a higher vulnerability as compared to what is expected based on the population size. When such discrepancies are negative, they can highlight that the population in those zip codes are less vulnerable as compared to the rest of California. Thus to estimate relative scale burden, we fitted a linear model of the absolute differences (the results of step 2). From that model, residuals were obtained and represent our measure of a relative risk of heat on hospitalizations. We then mapped these residuals.
Step 4 – Precision Estimation
We used a bootstrap-T procedure, also known as the studentized bootstrap (Efron & Tibshirani, 1994; Wilcox, 2017). We resampled the difference in hospitalizations on HW versus matched non-HW days to estimate 95% confidence interval lower bounds (2.5 percentile) for excess hospitalizations due to heat at each zip code. Such approach only accounts for the variance in taking the mean over the zip code and does not account for the variance in match days sampling and or the median of the matched days. Therefore, the bootstrap-T confidence intervals are not wide enough but give a general sense of precision and are comparable across zipcodes. Additionally, we created maps of the ratio between the mean and the bootstrap standard error (SE) estimate for an indication of signal strength, as a proxy for statistical precision.
Extension of analysis with Bayesian Hierarchical model
To consider spatial autocorrelation and improve precision in our estimates we used a spatial Bayesian Hierarchical model (BHM). BHMs provide a decrease in variance of estimates by using information spatially near any point. We used the within-community matched design estimates for each zip code as the response value in a linear BHM, using the spBayes package in R. Since point-referenced data is required for BHMs, we used population weighted centroids provided by the US Census Bureau as location for our ZIP codes. We fit an empirical semi-variogram to estimate the starting prior for the parameters: sill ( ), nugget (), and range() and used a spherical covariance structure based on the shape of the empirical semi-variogram. The model was formulated as a hierarchical model with the following two stages:
First stage: Y|θ,W ~ N(Xβ+W, τ^2 I)
Second stage: W|σ2, ϕ ~N(0, σ2H)
Where W is the vector of spatial random effects, and is the vector of parameters including , sill, nugget, and range. The Yi are conditionally independent given W. H is the spatial correlation structure, specified as isotropic. The second stage model is the process model introduced to capture spatial dependence. We completed model specification by adding priors to and , and the hyper parameters and .
We used prior distributions of parameters such that the model is not particularly sensitive to the priors. We used 10,000 Markov chain Monte Carlo (MCMC) samples, 7,500 of which were used for burn in. Recovered samples of spatial weights were used for the estimation and interpolation across space using multi-level B-splines. Though the above methodology assumes isotropy, we acknowledge that spatial correlation across grid cells in our study area differed based on climatic and topographic characteristics. Lastly, to represent the precision of the BHM estimates, we estimated the signal-to-noise ratio (SNR) using the resulting model output (weights for each ZIP code and standard deviations). The SNR was mapped for each ZIP code as an indication of areas where estimates are more (or less) precise. Traditionally, precision is found at an SNR of about 2.