In my last post, I mentioned the idea that population immunity, or the total % of the population that has been infected, is a major determinant of COVID-19 spread. I displayed a chart of daily new cases in NYC and compared it to social mobility data, showing an apparent negative correlation between mobility and cases. My assertion is that population-level immunity is more important than many other factors in determining how fast the virus spreads. I’d like to add a little more support for that view here.

Another piece of anecdotal evidence comes from my second home, Los Angeles County:

From the above chart, you can see that the daily new case count peaked in mid July, even though lockdowns were enforced beginning in March, and a mask mandate has been in place since May. Yet from mid July, new cases have been steadily plummeting, even with little or no decrease in mobility since that time:

Now, anecdotal evidence is all well and good, but I much prefer statistical evidence when available, so I pulled some county-level data from a COVID tracking website, with estimates for the R_{t} value by U.S. county for each date during the crisis.

*I’ll note before giving the results that a more complete analysis than I’ve done would incorporate multiple variables (e.g. mask usage, mobility) to ensure I’m not picking up on secondary effects from correlated variables. Perhaps I’ll look at doing that in the future, but that requires substantially more work.*

(Update: My follow-up post looks at a more complete model).

I filtered the data to select only counties with a population of at least 250k, which gave me a total of 273 counties. I looked at the (smoothed) R_{t} values for every Tuesday during the crises, comparing them to the % of the population that had tested positive for COVID by that date. Here’s a scatter plot:

The correlation between these 2 variables is -0.52. Of course, there are many other factors that determine R_{t}, some of which are mostly random, but population infection rate (immunity) is clearly a large factor. Note that everyone agrees the total number of infected is much greater than the number of cases, though the ratio varies by region. With a 10x multiplier (typical for the U.S., I think) a 2% case rate implies 20% total infected.

Here’s a box plot comparing R_{t} for all instances above/below a threshold of 2% total case rate:

A statistical comparison of the 2 datasets gives:

The significance stats are somewhat overstated, as successive Tuesday’s numbers for each county will not be truly independent. But I’ve tried running these analyses by “undersampling” the dates (e.g. only using 1 Tuesday per month, or even less), and I still saw strong significance in all tests.

As I mentioned in the previous post regarding NYC, these high case rates don’t indicate real herd immunity. Instead, I suggest we stop thinking about herd immunity as a binary concept, and realize that for places with low population immunity, suppressing the spread is incredibly difficult, regardless of social distancing, masks, etc.