In my previous post I demonstrated a strong negative correlation between cumulative COVID cases and the Rt (current rate of reproduction of the virus) at the county level in the US. I mentioned, though, that my quick and dirty data analysis was incomplete: a univariate analysis can be misleading if there are confounding factors. In this post, I extend the analysis to a multivariate model to examine possible confounders.
For those who think the hypothesis I expressed in my previous posts (that population immunity is the primary factor determining Rt) is wrong, there are two likely counterarguments:
- People in regions more strongly impacted by COVID take it more seriously, leading to more social distancing and mask wearing.
- Mask mandates have generally been introduced in places with high case loads, and it’s those mandates that are mostly responsible for the reduction in spread.
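The multivariate model I present here includes 3 new factors:
- Mobility (a single score derived from the Apple mobility indexes)
- Statewide mask mandates
- Current cases per capita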
The base dataset and date ranges remain the same as in my previous post. Each datapoint corresponds to one major county in America every 4th Tuesday. Here are the results from an OLS regression:
The R² value isn’t terribly high, so we need to be careful about drawing strong conclusions (a low R² means much of the variation in Rt is left unexplained). But the results do suggest some meaningful takeaways:
- My view remains unchallenged. Even after accounting for mobility and mask mandates, the cumulative case rate (which is associated with population-level immunity) is by far the best predictor of Rt.
- Surprisingly, the coefficient for mobility is negative, which would imply that higher mobility leads to reduced virus spread if the coefficient were statistically significant (p<0.05). It is not. This could suggest that Stay at Home/Shelter in Place orders may have been worthless for containing the spread in America, though I suspect it instead means that the mobility data doesn’t accurately measure what actually matters for spreading the virus.
- Mask mandates do appear to contribute to reducing the spread, though they don’t guarantee anything.
- Current Cases/Capita does not seem to matter (another indication that people in current hot spots don’t adjust their behavior in ways that would reduce Rt).
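For readers who want to try this kind of fit themselves, here is a minimal sketch of an OLS regression on standardized inputs. It is not the code behind the results above: the data is synthetic, the column order and the "true" coefficients are made up, and the helper names `standardize` and `fit_ols` are hypothetical. It assumes only NumPy.

```python
import numpy as np


def standardize(X):
    """Scale each column to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


def fit_ols(X, y):
    """Least-squares fit of y = b0 + X @ b; returns (coefficients, R^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot


# Synthetic stand-in data (NOT the post's dataset); column order is hypothetical:
# cumulative cases/capita, mobility score, mask mandate (0/1), current cases/capita
rng = np.random.default_rng(0)
n = 500
raw = np.column_stack([
    rng.gamma(2.0, 1.0, n),
    rng.normal(100.0, 20.0, n),
    rng.integers(0, 2, n),
    rng.gamma(1.5, 1.0, n),
])
Z = standardize(raw)

# Made-up "true" relationship, chosen only so the fit has something to recover.
rt = 1.0 - 0.2 * Z[:, 0] - 0.05 * Z[:, 2] + rng.normal(0.0, 0.2, n)

beta, r2 = fit_ols(Z, rt)
print("intercept and coefficients:", np.round(beta, 3))
print("R^2:", round(r2, 3))
```

Because every input is standardized, each coefficient reads as the change in Rt per one-standard-deviation change in that factor, which is what makes the coefficients comparable to each other. This sketch does not compute p-values; those need standard errors, which a statistics package would report.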
Some notes and caveats:
- I normalized all input variables to have zero mean and a standard deviation of 1.
- I smoothed the Apple mobility indexes with a 14-day moving average, then computed a single score by averaging the walking/driving/transit scores. This score may not give the best predictor of Rt, but I wanted to keep it as simple as possible.
- Ideally, the mobility data would’ve been measured year-over-year, instead of indexed in January. But this is the data I have.
- I should’ve smoothed the Current Cases/Capita with a moving average, but I got lazy.
- The statistical significance numbers are likely to be modestly overstated, as the data points are probably not fully independent (bordering counties, repeated points from the same location 4 weeks apart).
- I got most of the dates for the beginning of statewide mask mandates here, though I had to Google 2 or 3 missing dates. Some counties/cities had mask mandates before their states implemented them, but I’m not sure I’m willing to put in the time/effort to collect that data.
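The mobility score described in the notes above (a 14-day moving average of each Apple index, then the average of walking/driving/transit) could be computed along these lines. This is a sketch under assumptions: I use a trailing window, which the post doesn't specify, and the function names and sample values are hypothetical.

```python
import numpy as np


def moving_average(values, window=14):
    """Trailing moving average; the first window-1 entries are NaN."""
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    if len(values) >= window:
        kernel = np.ones(window) / window
        out[window - 1:] = np.convolve(values, kernel, mode="valid")
    return out


def mobility_score(walking, driving, transit, window=14):
    """Smooth each index, then average the three smoothed series per day."""
    smoothed = [moving_average(s, window) for s in (walking, driving, transit)]
    return np.mean(smoothed, axis=0)


# Hypothetical 20-day series of index values (100 = baseline).
walking = np.full(20, 100.0)
driving = np.full(20, 110.0)
transit = np.full(20, 90.0)

score = mobility_score(walking, driving, transit)
print(score)
```

With a trailing window the first 13 days have no score, so those rows would need to be dropped (or the series started 13 days earlier) before feeding the score into a regression.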