Latent Tree Models Learned on Word Embeddings, Part 3

Continuing my exploratory studies on latent tree models within NLP (Part 1, Part 2), I’ve run a few more simulations on a simplified English->French translation dataset. In my most recent post, I looked at a few different feature sets and hyperparameters to compare a simple RNN neural machine translation (NMT) model with an augmented version that has modified latent tree model (MLTM) features injected into the context vector of the decoder module. For this post, I’ve expanded the dataset somewhat (it’s still a very small, narrow set) and looked at the effect of using pretrained word embeddings in the model.


The methods I’m using here are similar to those in my previous post. I’m now using the full simplified dataset as seen in Chapter 8 of Rao & McMahan, and I’m reporting results with the more standard BLEU score (BLEU-4, with an unweighted average of unigrams…4-grams). As I’ve played around with these tests, I found that the random initialization of weights within the models had a larger impact on the final results than I expected, so I’ve run a set of tests with different random number seeds to look at the distribution of scores. I believe it’s common practice to use pretrained word embeddings within these kinds of models, while still allowing the embedding weights to be fine-tuned on the data, so I’ve run two sets of tests to compare the performance between using pretrained embeddings and starting with random initializations.
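For readers who want to see what goes into the number, here is a minimal sketch of a sentence-level BLEU-4 with uniform weights over 1- to 4-grams. It assumes the usual geometric-mean formulation with a brevity penalty and no smoothing. The function names and example sentences are invented for illustration and are not taken from the simulation code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(reference, hypothesis):
    """Sentence-level BLEU-4 with uniform weights over 1- to 4-grams.

    No smoothing is applied (an assumption), so any n-gram order with
    zero matches gives a score of 0.
    """
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # The brevity penalty punishes hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1.0 - len(reference) / len(hypothesis))
    return bp * math.exp(sum(log_precisions) / 4.0)


# Illustrative sentences only (not from the dataset).
reference = "je suis très heureux de vous voir".split()
perfect = bleu4(reference, reference)
partial = bleu4(reference, "je suis très heureux de te voir".split())
```

A single wrong word lowers the precision at every n-gram order at once, which is part of why BLEU drops quickly on short sentences.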

All code and data for these simulations can be found here. I’ve generally used the same hyperparameters as found in Rao & McMahan, but I’ve changed the embedding dimensions to match the dimension of the GloVe data (50d) I’ve been using.


Table I compares BLEU scores for the baseline RNN-GRU model with attention along with my MLTM-feature-augmented model using randomly initialized word embedding weights. Table II shows the results for the models with pretrained embedding weights from the GloVe embeddings, the data I used to generate the MLTM. I was concerned that the performance improvement I saw from my MLTM feature enhancement may have been solely due to the introduction of word embeddings from a much larger dataset, but I still see significant improvement in translation performance for the MLTM model even when both models make use of pretrained embeddings.

Table I. BLEU scores without pretrained embeddings.

Table II. BLEU scores with pretrained embeddings.

Note these BLEU scores are much higher than you would normally see, which is primarily because the dataset is limited to sentences of a very specific format.

Going Forward

I mentioned some ideas for improving my feature enhancements in the previous post, but for now my plan is to move the current model to larger datasets with longer, more complex sentences. I’ve run many of the initial tests for the MLTM features on a Lenovo laptop with a dual-core Intel i3 CPU (I’m clearly doing serious ML research here 🤣), but I’ve recently received a major hardware upgrade, which I will talk about a little in my next post…

Latent Tree Models Learned on Word Embeddings, Part 2

In my previous post, I introduced the idea of learning a tree structure on word embedding data using an agglomerative clustering algorithm and applying the learned modified latent tree model (MLTM) as a feature extractor on text data. I examined the power of the extracted features on a simple text classification task using news article texts, and found the MLTM features provided strong classification accuracy when used as inputs to a simple MLP classifier.

Extending this idea, I’ve been experimenting with augmenting a standard neural machine translator (NMT) with these features. Machine translation is the task of automatically translating a sentence (or any set of text, really) from one natural language to another (e.g. English to French). While deep learning has made great progress on this task, more improvement will be necessary to get automatic machine translators to match expert human ability.

Recurrent Neural Networks

Until a few years ago, recurrent neural networks (RNNs) were generally considered the best tool for machine translation (along with convolutional neural networks). They’ve been replaced by a newer architecture known as the Transformer, which does away with recurrence altogether and outperforms RNNs on standard benchmarks. For now, though, I’ve decided to stick with an RNN to test my MLTM feature set.

When I began diving into deep learning for natural language processing (NLP), one of the books I bought was Natural Language Processing with PyTorch by Rao & McMahan. It includes some nice Python example code, including the full logic required to run an NMT simulation. A Jupyter notebook based on the material in chapter 8 can be found here, which I’ll be using as a template for my simulations. The ANN architecture is an Encoder-Decoder GRU RNN with an Attention mechanism.


Earlier RNN models proved capable of translating short sentences with good accuracy, but their performance fell off quickly when faced with longer strings of text. This is due to the logic of the encoder-decoder structure: the final hidden state from the encoder is used as the initial input to the decoder. As the encoder steps forward over the elements of a sequence, the information stored in the hidden state from the first words in a sentence decays.

The solution for this problem is a mechanism known as Attention. It is built into the decoder: after each step, a context vector is computed using the current decoder hidden state and all encoder hidden states from the sequence. You can think of the context vector as a representation of the most relevant encoder hidden state(s). The context vector is concatenated to the target input vector or hidden state and fed into the next layer. You can find a deeper explanation of the Attention mechanism here.
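As a rough illustration of the context vector computation, here is a small dependency-free sketch. It assumes dot-product scoring between the decoder state and each encoder state, which is only one of several scoring functions in use and may differ from the one in the Rao & McMahan code. All names and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_states):
    """Return (context_vector, weights) for one decoder step."""
    # Score each encoder state against the decoder state.
    # Dot-product scoring is an assumption made for this sketch.
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted sum of encoder states.
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return context, weights


# Toy values: three encoder steps with hidden size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
context, weights = attention_context(decoder_state, encoder_states)
```

The encoder states that align best with the current decoder state receive the largest weights, so they dominate the context vector.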

Attention + MLTM

My initial idea for incorporating the MLTM features into a NMT model is to concatenate the features to the context vector within the decoder, or rather, concatenate a projected version of the features. The projection is done by adding a Linear layer in PyTorch to change the dimensionality of the MLTM features to match the context vector. One thing I find a bit awkward about this approach is that the MLTM features are static – they don’t change over each frame of the sequence like the other values. It’s easy to implement, however, and doesn’t add much processing cost.
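The concatenation step can be sketched as follows. This is written in plain Python rather than PyTorch so that it runs without dependencies; the `linear` helper simply stands in for the computation a linear layer performs. The dimensions, random weights, and feature values are toy assumptions, not values from the actual model.

```python
import random


def linear(x, weight, bias):
    """Affine map y = W x + b, the same computation a linear layer performs."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def augment_context(context, mltm_features, weight, bias):
    """Concatenate the projected MLTM features onto the context vector."""
    projected = linear(mltm_features, weight, bias)
    return context + projected


rng = random.Random(0)
context_dim, n_features = 4, 6  # toy sizes chosen for illustration
weight = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in range(context_dim)]
bias = [0.0] * context_dim

# Binary MLTM features are computed once per source sentence ...
mltm_features = [1, 0, 0, 1, 1, 0]
# ... while the attention context changes at every decoder step.
step_contexts = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2]]
augmented = [augment_context(c, mltm_features, weight, bias) for c in step_contexts]
```

Note that the second half of every augmented vector is identical across steps, which is the static behavior described above.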


I compare the baseline NMT model from Rao & McMahan to my augmented version, using their example code and accompanying English/French data. All code and some of the necessary data can be found here.

I generally used the same parameters as found in Rao & McMahan’s example, though I reduced the number of epochs from 100 to 50 to save time (from the logs, the validation loss appears to saturate around 20-30 epochs).

Parameters for simulation with no tree pruning


The data consists of pairs of English/French sentences, sourced from the Rao & McMahan repository. Because I’m running my simulations on cheap hardware, following their example, I’ve sliced out a very narrow subset of the data – my subset is actually narrower than theirs, consisting of only 3375 pairs total. All English sentences begin with “i am,” “he is,” “she is,” “they are,” “you are,” or “we are.” Mine is smaller because I ignore sentences with contractions that match (e.g. “he’s”).
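A filter of this kind might look like the sketch below. The prefix list comes from the description above; the function name and the sample pairs are hypothetical, and the real preprocessing code may normalize the text differently.

```python
# Prefixes taken from the description; the trailing space avoids partial-word matches.
PREFIXES = ("i am ", "he is ", "she is ", "they are ", "you are ", "we are ")


def keep_pair(english):
    """Keep a pair only if the English side starts with one of the prefixes."""
    return english.lower().startswith(PREFIXES)


# Hypothetical sample pairs, for illustration only.
pairs = [
    ("He is tired.", "Il est fatigué."),
    ("He's tired.", "Il est fatigué."),
    ("We are late.", "Nous sommes en retard."),
    ("Where is he?", "Où est-il ?"),
]
subset = [(en, fr) for en, fr in pairs if keep_pair(en)]
```

Because contracted forms such as “he’s” do not match any prefix, they are dropped, which is what makes this subset smaller.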


Scoring machine translation is difficult, as many sentences with equivalent meanings can be phrased in different ways, and not all words translate directly across languages. Currently, the most popular scoring method is BLEU, but my understanding is that it has a number of flaws, and may be replaced with something better in the future. To keep things simple, for now I’m using the word accuracy computation function provided by Rao & McMahan as the performance metric.
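For intuition, a word accuracy metric can be sketched as the percentage of target positions where the predicted token matches, with padding positions ignored. This is an assumption about how such a metric typically works, not a copy of the Rao & McMahan function; the names and the example tokens are made up.

```python
def word_accuracy(predicted, target, pad_token="<pad>"):
    """Percentage of non-padding target positions where the prediction matches."""
    scored = [(p, t) for p, t in zip(predicted, target) if t != pad_token]
    if not scored:
        return 0.0
    correct = sum(1 for p, t in scored if p == t)
    return 100.0 * correct / len(scored)


# Hypothetical example: one of four real target tokens is wrong.
target = ["je", "suis", "fatigué", ".", "<pad>", "<pad>"]
predicted = ["je", "suis", "malade", ".", ".", "."]
accuracy = word_accuracy(predicted, target)
```

Unlike BLEU, this metric is position-by-position, so a translation that is correct but shifted by one word would score poorly.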

Table I shows word accuracy for various values of a pruning parameter on the MLTM, both with and without dropout on the linear transformation from the MLTM binary features to context augmentation vector.

Min Descendants      MLTM Features   Dropout   Accuracy
Baseline NMT Model   N/A             N/A       43.25%
16                   123             None      45.68%
8                    251             None      46.02%
4                    501             None      46.18%
None                 1672            None      49.65%
16                   123             0.2       47.33%
8                    251             0.2       46.80%
4                    501             0.2       47.78%
None                 1672            0.2       51.19%
Table I.

While the results are a bit noisy, dropout on the MLTM feature projection improves accuracy at every pruning level in Table I, and the best performance comes from the model with no pruning. Every parameter combination outperforms the baseline, with the best performer providing an almost 8% absolute improvement.

Going Forward

While the preliminary results for an MLTM-augmented NMT model look good, they come from a very limited dataset. Furthermore, the base NMT model is no longer considered state-of-the-art. I’ll be looking to expand the dataset and compare against a Transformer, the current SoA model. I also want to look at alternative performance metrics such as BLEU.

Among ways to further improve the model, I have a couple other ideas:

  • The modified latent tree model could be extended to a modified latent random forest model, in which each tree uses a randomized bootstrap sampling of the dimensions of the word embeddings.
  • As I mentioned in part 1, the MLTM architecture is similar to an MLP with fixed weights and step function activations. I’d like to try injecting the tree itself into the neural model and allowing the training procedure to optimize the weights. This could lead to a “neural latent tree model” (NLTM).

Latent Tree Models Learned on Word Embeddings, Part 1

Back in June of 2008, I successfully defended my doctoral dissertation and received my PhD in Electrical and Computer Engineering from Marquette University. My field of research in grad school was automatic speech recognition, with an emphasis on ASR under noisy conditions. After I graduated, I started working for Crabel Capital Management. In my first months, I tried using some of the machine learning (ML) approaches I learned in college to develop automated trading strategies, but I never succeeded in making that work. I’m still with Crabel, and up until recently, I’d almost completely abandoned the field of machine learning.

I’ve become interested in doing ML research again, and have acquainted myself with PyTorch, Facebook’s deep learning toolkit for Python. While there’s been a ton of progress in ML since I left it more than a decade ago, I don’t believe deep neural networks – not by themselves, anyway – will lead to artificial general intelligence (AGI), or even robust AI. As I’ve been reading papers and articles on ML/AI, I came across a concept known as latent tree models (LTMs), which I found pretty interesting, and I decided to play around with this model on basic datasets.

Latent Tree Models

In a latent tree model, the leaves of the tree correspond to observed variables, such as words in English, and the non-leaf nodes correspond to latent variables, which could be interpreted as higher-level concepts. In some of my recent work at Crabel, I’ve used the sklearn toolkit to perform agglomerative clustering on data, and one of my first thoughts when I initially read about LTMs was that building a tree with agglomerative clustering on word embeddings seemed like a natural task. Word embeddings are numerical feature vectors assigned to words within a dictionary, and are typically learned through some form of deep learning.

I was unable to find any papers that took this exact approach, though I did find one which looked at LTMs on text data using word co-occurrences. Here’s a good survey paper on LTMs for anyone interested.

I wrote some Python code to learn the structure of an LTM using Stanford NLP’s GloVe word embeddings, which come from an open-source, freely available project. You can find my code and the data referenced below here on GitHub.

A standard LTM is a particular type of Bayesian network (a causal tree), where each edge has a link matrix of probabilities associated with it. The code I’ve written modifies the standard form to ignore the link matrices, and simply treat every parent node (latent variable) as a binary random variable (proposition), using a form of OR function on its children, according to the rule:

In contrast, in a typical causal tree, within the upward message-passing algorithm, the likelihood values are multiplied, which would be more analogous to an AND function. Another way to view my logic is to see it as an ANN with a tree structure, where the inputs are the words, all weights are fixed at 1, the bias for each node is fixed at -0.5, and the activation function is the step function originally seen in perceptrons. The output of the network is every hidden node, which forms the latent feature vector. I refer to this model as a modified LTM (MLTM) in the code.
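The perceptron view described above is easy to check directly. The sketch below, with illustrative function names, confirms that a node with weights of 1, a bias of -0.5, and a step activation behaves as an OR over binary children, and contrasts it with a product, which behaves as an AND.

```python
from itertools import product


def step(x):
    """Perceptron-style step activation."""
    return 1 if x > 0 else 0


def or_node(children):
    """Parent value with all weights fixed at 1 and bias fixed at -0.5."""
    return step(sum(children) - 0.5)


def and_node(children):
    """Product of child values, analogous to multiplying likelihoods."""
    result = 1
    for c in children:
        result *= c
    return result


# Each row is (child_a, child_b, OR-style parent, AND-style parent).
truth_table = [(a, b, or_node([a, b]), and_node([a, b])) for a, b in product([0, 1], repeat=2)]
```

With binary inputs, the sum exceeds 0.5 as soon as any single child is 1, which is why the fixed bias of -0.5 yields OR behavior regardless of how many children a node has.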


To test my approach, I decided to start with a simple text classification task. In this task, we have a dataset of text with class labels for various categories. The goal is to train a classifier with labeled data to be able to determine the correct class for unlabeled test data.


The dataset I chose contains brief news articles from AG News, labeled with one of four news topics. I performed some filtering and pre-processing to clean it up before running it through my classification code.


The comparison baseline classifier I used is a naive Bayes classifier. It’s an extremely simple method, but it works quite well on this type of task. Many early spam filters were built using naive Bayes.

To build the MLTM, I use the AgglomerativeClustering class from the sklearn package. Here’s some sample code:

The parameters shown will cause the clustering algorithm to build a full binary tree with the number of non-leaf nodes (latent variables) equal to the number of leaf nodes (words) minus one. After the MLTM is built, I extract binary feature vectors from a set of text by setting the leaf node values to True (1) if the corresponding word appears in the text:

The beta values are propagated upward according to the first equation. All beta values from non-leaf nodes are extracted from the tree to form the binary feature vector.
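A minimal sketch of this upward pass is shown below. It assumes the tree is stored in the same layout as sklearn’s `AgglomerativeClustering.children_` attribute, where row `i` lists the two children of internal node `n_leaves + i` and children always have smaller ids than their parent. The vocabulary, the tiny tree, and the function name are hypothetical.

```python
def extract_features(children, vocab, text_tokens):
    """Propagate binary leaf values up the tree and return the non-leaf values.

    children[i] holds the two child ids of internal node n_leaves + i.
    """
    n_leaves = len(vocab)
    present = set(text_tokens)
    # Leaf values: 1 if the word appears in the text, else 0.
    beta = [1 if word in present else 0 for word in vocab]
    for left, right in children:
        # Children are created before their parent, so a single pass suffices.
        beta.append(1 if beta[left] + beta[right] - 0.5 > 0 else 0)
    return beta[n_leaves:]


vocab = ["cat", "dog", "stock", "bond"]
# Hypothetical tree: (cat, dog) -> node 4, (stock, bond) -> node 5, (4, 5) -> root 6.
children = [(0, 1), (2, 3), (4, 5)]
features = extract_features(children, vocab, "the dog chased the cat".split())
```

In this toy tree the text activates the animal node and the root but not the finance node, so the feature vector acts as a coarse set of topic indicators.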

My experimental MLTM classifier consists of an MLP with two hidden layers, with the MLTM binary feature vectors as inputs, implemented with PyTorch. I added an option to the MLTM to prune the tree down by grouping together observed variables to meet a minimum count of leaf nodes for any parent node.


The table below shows the test (out-of-sample) classification accuracy for the baseline and the MLTM-MLP classifier for various values of the pruning variable, min_descendants.

Min Descendants   Classification Accuracy
Naive Bayes       88.32%

This is a four-class classification problem, so a random classifier should give an accuracy of around 25%. The MLTM-MLP classifier is able to outperform naive Bayes when the pruning parameter is no higher than 8. I find it interesting that my experimental classifier still performs reasonably well even when I prune the tree down by requiring every latent variable have a minimum of 256 descendant leaf nodes.

Going Forward

Text classification is a classic example of an NLP task, but it’s not terribly challenging or interesting at this point, at least not when the data is relatively clean. Going forward, I’ll be looking at using the MLTM in conjunction with recurrent neural networks (RNN) for machine translation tasks (e.g. translating English sentences to French).



And so the time draws near, that inescapable fate of the writer. The time for inevitable rejection.

I’m finally approaching the finish line of my 8th manuscript, PLAGUE OF CATACLYSMS. When I say the finish line, I hope I mean kind-of-the-starting-line. But that, of course, depends on if I see slightly less than 100% rejection.

Ah, rejection, my old friend. Nemesis, really, but we know each other so well that it’s hard to tell the difference at this point. How does a writer deal with that rejection? In my case, poorly.

But! I’ve decided to look for some positive in the rejections. By taking the viewpoint that in getting rejected, that means I’ve tried, I gave it my best, and that’s more than many people have accomplished? No.

No, gross! Losers always whine about their best…

Instead, I’ve created a document where I can paste all the “encouraging rejections” I’ve received. The few (and it’s very few) that include positive personalized feedback or encouragement. I’ve also augmented it to include feedback I’ve received from industry professionals through freelance editing hires or charity auctions. As I gear up for the first round of queries for PoC later this month (yes, I realize the acronym is a bit awkward nowthesedays, but the title is awesome so 🤷‍♂️) I thought I’d share a few of the entries.

A rejection from the slush pile for my previous MS, THE OBSIDIAN PYRAMIDS:

Thank you so much for giving me the chance to consider THE OBSIDIAN PYRAMIDS. I was excited by your query and the premise of your book. It’s clear that you’ve devoted a lot of hard work to this project, and your passion comes through in your writing. However, while there is a lot to be commended, I struggled to connect with the manuscript in a meaningful way, and therefore don’t believe that I would be the most effective champion for your book.

OK, not exactly glowing praise, but something positive, at least. More than a form letter, ya know? At least she recognized I tried hard and gave it my best 🙄

From an agent I pitched TOP to at a conference, who gave me a full MS request:

Thanks so much for reaching out and for the opportunity to read your book, The Obsidian Pyramids.

You have such a strong narrative voice and I liked the amount of thought and detail that you applied to your world-building, especially when describing the ancient ruins and the power of the pyramids. Your writing has a very cinematic feel to it, which made for an engaging read.

However, while there was so much to love about your book, I found that the amount of information and backstory overshadowed Alaeric’s character. I unfortunately did not emotionally connect with him [sic] character as much as I had hoped to in order for me to take this project on in such a crowded market. I am so sorry that I do not have more positive news for you! But please know that I sincerely do like your writing style and believe that you are very talented! 

Well, at least this agent recognized I’ve got talent! And a strong narrative voice!

I also participated in Pitch Wars, and one of the writers I submitted standard materials to sent me a brief email, even though she didn’t request the full MS:

Hi! I just wanted to drop you an encouraging note to let you know that The OBSIDIAN PYRAMIDS was in my top twenty. I thought the concept was really cool (and you wrote a great query — kudos!), and I wish you all the luck with it! I hope you’ll keep me posted.

I think she received around 200 submissions, so top twenty is pretty solid.

A few months later, I received a full MS request from the slush pile for TOP, which led to… my very first R&R! The original response to my full submission:

Thanks for sending OBSIDIAN PYRAMIDS! I think the concept is fantastic, and I love the setting. There’s so much potential here! That said, I got a bit distracted by Alaeric’s internal dialogue and the balance between description and external dialogue. Just to expand a bit, Aeleric spends a good amount of time in his head asking himself clarifying questions about what’s going on. It’s always best to avoid that internal dialogue and either nudge the reader to ask those questions themselves or show that the character would be thinking them through action, physical response, dialogue, etc. The less time in a character’s head (“telling) and the more time showing through action, the better. Secondly, you have such a rich world here but the balance is much more heavily weighted towards dialogue than description. I’m all for fast pacing, but we do need description to set the scene and help us feel grounded in the world. Description of where we are, who’s doing what in the moment, etc. 

I don’t normally respond with this much information, but the manuscript and concept is absolutely worth it. If you want to revise and resubmit, I’d be happy to take another look. Either way, best of luck and happy writing!

And the response to my revision and resubmission:

Yes, thanks so much for sending OBSIDIAN PYRAMIDS! You really did execute my editorial suggestions, and I think the manuscript is much stronger for it. Unfortunately, though, I’m afraid the voice still didn’t capture me in the way that I hoped. I’m going to have to pass, but I really do appreciate your revision work and the opportunity to read. Please don’t be discouraged…publishing is a marathon, not a sprint, and you just need one person to catch that shared vision. Best of luck to you!

Son of a 😡

Fine, well, even if I have a strong, cinematic, narrative voice, I guess it’s not the right voice, at least for that MS.

And so, after that, it was time for me to move on to my next project, PLAGUE OF CATACLYSMS. No rejections yet, encouraging or otherwise, since I haven’t started querying, but I’ve hired some freelance editors to critique my drafts, and here are a couple pieces of positive feedback I’ve received:

Editor #1:

Thank you so much for letting me dive into Plague of Cataclysms! It was awesome to experience the deep world that you’ve built and to get to know all of your characters. You have a great eye for detail, for action, you have phenomenal chapter transitions, and as I mentioned before, you can certainly see the influence of Brandon Sanderson. Plus, that ending! What a cliffhanger! The pain of being an editor is wanting book two, and knowing it isn’t written yet!

I loved to see how your three main characters slowly weaved their way to each other. It is one of my favorite aspects of Sanderson’s writing that I think you captured beautifully. I also enjoyed that your characters were very morally grey. No one was necessarily the “good knight” to save the day, but everyone was just trying to survive while handling their own baggage. Those types of characters truly speak to readers, so it is phenomenal that you’ve built that into all three that we follow.

Editor #2:

I had a chance to sit down with your work this afternoon, and honestly, I’m so impressed. This first chapter of yours is in truly excellent shape, and your query letter is probably only a revision or two away from being ready to go.

I know that I promised you an editorial letter that addresses any big-picture issues that I spot in your first chapter, but… I don’t think there are any. This isn’t to say that I wouldn’t find any big-picture issues with the manuscript as a whole, if I were to read more, but the chapter that I did read is in stellar shape. Your narrative voice is so strong, and the details of your world, both large and small, are incredibly clear.

Call me Dr. Strong Narrative Voice 😎

These particular editors have pretty strong credentials in the publishing industry, so I find these comments especially encouraging, though they still found plenty of constructive criticisms to give me, which I’m still working on incorporating into my revisions.

Even so, rejection is inevitable. Inevitable, yet painful. I already have my next MS planned out, and I expect to start writing the first draft within a week or two of sending off the first round of queries for PoC, so if nothing else, I hope some of my rejections are encouraging. Preferably, something else: an offer from an awesome lit agent.

Introducing Thoth: A Manuscript Evaluation Tool

I’ve been tinkering around with code to analyze my manuscripts for a few years, and I finally got serious enough about it to build a real-life application. I named the app Thoth after the Egyptian god of writing and magic (among other things).

After writing several bad books and getting helpful (though sometimes painful) feedback, I realized I had a number of tendencies that showed up as weak writing. I also figured since I’m a heavily experienced programmer, I could make my own revision process easier by setting up some logic to analyze my manuscripts and identify those weaknesses with fancy charts and whatnot.

I believe other writers could benefit from my code, and so I’ve released the initial beta version of the application here. It’s free to download and use!

Admittedly, it’s far from perfect. With my writing, I’m typically very reluctant to let anyone else see it until I’ve spent months revising it. This is essentially a first draft, and as we all know, all software has bugs. Mine is no exception. The format of the PDF report generated could be cleaner, and the text I coded in could be better written. Also, I wish the download process was a little faster and easier. (It’s not really that bad, I promise!)

I plan to work on improving on these weaknesses, as well as adding more features in the future. But cut me a break here, please – you have no idea how much fucking time I spent on Stackoverflow trying to figure out why matplotlib was crashing the app and why pyinstaller and plotly don’t play so nice together. ON MY BIRTHDAY NO LESS.

Even so, I think other writers should give it a try. Oh, hey, did I mention it’s TOTALLY AND COMPLETELY FREE.

Here are the reports generated within the PDF file:

  • Dialogue %
  • # of dialogue beats
  • Sentence fragments
  • Repetitive cadences
  • Unusual narrative punctuation
  • Adjectives/adverbs
  • ‘Crutch’ words (that, just, etc.)
  • Filter words

Please give it a try 😁

Debunking COVID Ghost Stories

It is October, the spookiest month of the year. Halloween is nigh.

In the spirit of the holiday, many people have been posting their favorite horror movies on Twitter. I’m not a huge fan of the horror genre, either movies or books, though. Except, wait… the first manuscript I ever wrote was a horror novel 🤷‍♂️.

Oh well, anyway, instead of sharing my favorite horror movies, I decided to share my favorite COVID scary stories, where things go bump in the night – no, wait, that’s me trying to walk around drunk in the dark – where the virus spreads like… whatever it is that turns people into zombies. The kind of stories meant to strike fear into the hearts (and lungs) of children and adults alike.

Except, if you’re a scientist, well… they’re not very scary.

We start with the month preceding October, not so long ago, when Summer turns to Autumn, the leaves begin to change colors, and Halloween draws closer. Bloomberg published this:

In a “study,” survey respondents reported lingering symptoms even months after recovering from COVID. Symptoms like fatigue, heart palpitations, dry or peeling skin 🤔, feeling irritable 😕, and hair loss 🤨. Hair loss! Oh no, I must’ve contracted COVID ten years ago!

Dig a little deeper and you’ll find that this “study” was conducted by putting out an ad on Facebook, asking for anyone who had some medical problems – and also claimed to have had COVID – to complain about them.


Yeah, sorry, but nothing to see here. Lots of people have medical problems, pandemic or no. This one’s about as scary as a bunny costume – provided you’re not a vengeance demon. (That’s a Buffy reference, btw).

Our next tale of the COVID underworld takes us to Penn State, where researchers tested the Big Ten’s athletes and discovered that among those who tested positive for COVID, a third of them had myocarditis (inflammation of the heart), which CAN BE FATAL.

Well, as it turns out, water can be fatal, too. Also, as it turns out, the numbers were misreported:

The actual study, which was not conducted by the person the original report was based on, put the number at around 15%. Still bad, right! Um, maybe? Maybe not? What’s the problem here?

THERE’S NO CONTROL GROUP. We have no idea how many athletes who didn’t get COVID would appear to have myocarditis.

Guys, if you’re gonna play in Texas, you gotta have a fiddle in the band, and if you want to do real science, you gotta have a control group.

Additionally, it’s believed that seasonal flu, which is clearly not COVID, may cause myocarditis in up to 9% of patients. I think it’s also plausible that young, elite athletes may occasionally develop myocarditis simply from the rigors they put their bodies through. We simply have no reason to believe these results are scary from this study alone. I rate this story 1.5 candy corns out of 5.

Now for a study that actually did seem scary at first: An observational study of 100 post-COVID patients in Germany, where 78 were found to have heart abnormalities. 78%, horrifying!

But here’s the thing – some of the comparisons they made were between COVID patients, many of whom were older and had pre-existing conditions, and young, healthy people. That’s not an appropriate use of a CONTROL GROUP. When they looked at non-COVID patients with similar risk factors to the COVID patients, they still found a number of heart abnormalities. Furthermore, for the metrics they used to measure heart health, the post-COVID patients still had values considered to be within mostly healthy ranges.

Here’s a quick rundown from an actual expert (unlike me) on this stuff :

All right, we’ve survived our COVID haunted house so far. Not nearly as unpleasant as this:

One last ghost story, and I promise so much candy for everyone we’ll all puke for hours.

Bloomberg Opinion published another scare story recently:

The column references two studies: the German heart study I mentioned above, and an observational study carried out in China. I hadn’t seen the latter before, but from a quick skim of this paper, I see some important points:

  • Every single person in this study was hospitalized with COVID
  • There was a significant decline rate from the study invitees (~25%); I would guess the decliners were healthy enough to turn down medical care
  • Patients with abnormal lung CT scans were much older than those with normal scans
  • Of those patients with abnormal lung CT scans, only about a third actually had abnormal lung functionality

Putting all of these together, it seems very likely that (almost) everyone in the group with any real problems post-COVID was ALREADY QUITE SICK before they contracted the disease.

Now don’t get me wrong, none of this proves there are no long-term negative consequences for people who recover from COVID. Plenty of people are getting sick for real, and it does appear the (mild for most) symptoms for COVID last for much longer than the flu and common cold. But I still have yet to see a CREDIBLE scientific study which shows that a significant percentage of otherwise healthy people continue to suffer from serious medical conditions well after recovery.

Also, let’s understand that, in all likelihood, nearly 20% of Americans have been infected by COVID at this point, and most of those people are unaware they’ve actually had it. If a really large percentage of us were likely to experience serious medical problems after recovery, I’d expect to have already seen reports of massive waves of unexplained medical problems. Maybe we will, but you should never assume something will happen when there’s no evidence for it.

Ok, I’m done for now. Hopefully we’ve all had fun scaring ourselves for no good reason (yes, yes, people die from COVID, I’m aware. I’m specifically talking about people that recover here). Now it’s time for me to dig into my candy stash (and my Miller Lite stash, of course).

Population Immunity vs COVID-19 Spread Rate, Cont’d

In my previous post I demonstrated a strong negative correlation between cumulative COVID cases and the Rt (current rate of reproduction of the virus) on a countywide basis in the US. I mentioned, though, that my quick and dirty data analysis was incomplete – a univariate analysis can be misleading if there are confounding factors. In this post, I expand the data to a multivariate model to examine possibly correlated factors.

For those who think the hypothesis I expressed in my previous posts (that population immunity is the primary factor determining Rt) is wrong, there are two likely counterarguments:

  • People in regions more strongly impacted by COVID take it more seriously, leading to more social distancing and mask wearing.
  • Mask mandates have generally been introduced in places with high case loads, and it’s those mandates that are mostly responsible for the reduction in spread.

The multivariate model I present here includes 3 new factors:

  • Current cases per capita
  • Social mobility (Apple mobility indexes)
  • Statewide mask mandates

The base dataset and date ranges remain the same as in my previous post. Each datapoint corresponds to one major county in America every 4th Tuesday. Here are the results from an OLS:

Multiple linear regression results predicting Rt by county. Higher cumulative cases per capita and mask mandates both show strong statistical significance in lowering Rt. Current cases per capita and social mobility fail to pass the significance test (p=0.197, 0.225).
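For anyone who wants to try this kind of fit, here is a minimal sketch of a multiple linear regression on standardized predictors. It uses only the Python standard library so it runs anywhere; a stats package would normally do this and also report p-values. The toy numbers and the names `zscore`, `solve`, and `ols` are made up for illustration and are not the data or code behind the results above, and the choice of population standard deviation for scaling is an assumption.

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ols(columns, y):
    """Least-squares fit of y on an intercept plus z-scored predictor columns.

    Returns [intercept, coef_1, coef_2, ...] from the normal equations.
    """
    cols = [[1.0] * len(y)] + [zscore(c) for c in columns]
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    return solve(XtX, Xty)


# Made-up toy rows (one per county-date), NOT the data from this post.
cumulative = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # cumulative cases, % of population
mobility = [90, 110, 95, 120, 100, 105]       # mobility index
mask_mandate = [0, 0, 1, 1, 0, 1]             # 1 if a mandate was in effect
rt = [1.30, 1.22, 1.05, 0.98, 1.02, 0.85]

coefs = ols([cumulative, mobility, mask_mandate], rt)
for name, value in zip(["intercept", "cumulative", "mobility", "mask"], coefs):
    print(f"{name:>10}: {value:+.3f}")
```

Because every predictor is z-scored, the coefficients are directly comparable in size, and the intercept is just the mean Rt of the sample.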

The R2 value isn’t terribly high, so we need to be careful about making strong conclusions (lower R2 indicates a lot is left unexplained). But the results do suggest some meaningful takeaways:

  • My view remains unchallenged. Even after accounting for social mobility and mask wearing (mandates), the cumulative case rate (which is associated with population-level immunity) is by far the best predictor of Rt.
  • Surprisingly, the coefficient for mobility is negative – which would imply that higher mobility leads to reduced virus spread if the coefficient were statistically significant (p<0.05). It is not. This could suggest that Stay at Home/Shelter in Place orders may have been worthless for containing the spread in America, though I suspect it might instead mean that the mobility data doesn’t accurately measure what’s meaningful in spreading the virus. I think weather may play a major role here – if people are going places, but staying outside during the summer, that would be effectively quite different from traveling places in the winter, when gathering must occur indoors in much of the nation.
  • Mask mandates do appear to contribute to reducing the spread, though they don’t guarantee anything.
  • Current Case /Capita does not seem to matter (another indication that people in current hot spots aren’t adjusting their behavior in ways that reduce Rt).

Some notes and caveats:

  • I normalized all input variables to have zero mean and stdev=1
  • I smoothed the Apple mobility indexes with a 14-day moving average, then computed a single score by averaging the walking/driving/transit scores. This score may not give the best predictor of Rt, but I wanted to keep it as simple as possible.
  • Ideally, the mobility data would’ve been measured year-over-year, instead of indexed to January. But this is the data I have.
  • I should’ve smoothed the Current Cases /Capita with a moving average, but I got lazy.
  • The statistical significance numbers are likely to be modestly overstated, as the data is likely not all 100% independent (bordering counties, repeated points from same location 4 weeks apart).
  • I got most of the dates for the beginning of statewide mask mandates here, though I had to Google 2 or 3 missing dates. Some counties/cities had mask mandates before their states implemented them, but I’m not sure I’m willing to put in the time/effort to collect that data.
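The mobility score described in the notes above can be sketched roughly as follows. This is an illustration under assumptions, not the script used for the post: the window is taken as trailing (the post doesn't say whether it was trailing or centered), early days average over whatever data is available, and the function names and sample numbers are hypothetical.

```python
def moving_average(series, window=14):
    """Trailing moving average; the first few points use a shorter window."""
    out = []
    running = 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]  # drop the value leaving the window
        out.append(running / min(i + 1, window))
    return out


def mobility_score(walking, driving, transit, window=14):
    """Smooth each index separately, then average the three per day."""
    smoothed = [moving_average(s, window) for s in (walking, driving, transit)]
    return [sum(day) / 3 for day in zip(*smoothed)]


# Made-up daily index values (100 = baseline), for illustration only.
walking = [100, 96, 90, 70, 55, 50, 48, 52, 58, 63, 70, 74, 80, 85, 88, 92]
driving = [100, 98, 93, 75, 60, 56, 55, 60, 66, 72, 78, 83, 88, 93, 97, 101]
transit = [100, 94, 85, 60, 40, 35, 33, 35, 38, 42, 46, 50, 54, 58, 61, 64]

score = mobility_score(walking, driving, transit)
print([round(s, 1) for s in score[-3:]])
```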

Population Immunity vs COVID-19 Spread Rate

In my last post, I mentioned the idea that population immunity, or the total % of the population that has been infected, is a major determinant of COVID-19 spread. I displayed a chart of daily new cases in NYC and compared it to social mobility data, showing an apparent negative correlation between mobility and cases. My assertion is that population-level immunity is more important than many other factors in determining how fast the virus spreads. I’d like to add a little more support for that view here.

Another piece of anecdotal evidence comes from my second home, Los Angeles County:

From the above chart, you can see that the daily new case count peaked in mid July, even though lockdowns were enforced beginning in March, and a mask mandate has been in place since May. Yet from mid July, new cases have been steadily plummeting, even with little or no decrease in mobility since that time:

Now, anecdotal evidence is all well and good, but I much prefer statistical evidence when available, so I pulled some county-level data from a COVID tracking website, with estimates for the Rt value by U.S. county for each date during the crisis.

*I’ll note before giving the results that a more complete analysis than I’ve done would incorporate multiple variables (e.g. mask usage, mobility) to ensure I’m not picking up on secondary effects from correlated variables. Perhaps I’ll look at doing that in the future, but that requires substantially more work.*

(Update: My follow-up post looks at a more complete model).

I filtered the data to select only counties with a population of at least 250k, which gave me a total of 273 counties. I looked at the (smoothed) Rt values for every Tuesday during the crisis, comparing them to the % of the population that had tested positive for COVID by that date. Here’s a scatter plot:

The correlation between these 2 variables is -0.52. Of course, there are many other factors that determine Rt, some of which are mostly random, but population infection rate (immunity) is clearly a large factor. Note that everyone agrees the total number of infected is much greater than the number of cases, though the ratio varies by region. With a 10x multiplier (typical for the U.S., I think) a 2% case rate implies 20% total infected.
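As a rough sketch of the steps above (keep large counties, keep Tuesdays, compare case rate to Rt), something like the following would work. The row layout, field names, and numbers are hypothetical stand-ins, since the tracking site's actual export format isn't shown here.

```python
from datetime import date
from math import sqrt

MIN_POPULATION = 250_000  # county size cutoff used in the post


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def tuesday_points(rows):
    """Return (case %, Rt) pairs for large counties, Tuesdays only."""
    points = []
    for r in rows:
        if r["population"] >= MIN_POPULATION and r["day"].weekday() == 1:
            case_pct = 100.0 * r["cumulative_cases"] / r["population"]
            points.append((case_pct, r["rt"]))
    return points


def implied_infected_pct(case_pct, multiplier=10):
    """Scale confirmed case % by an assumed infections-per-case multiplier."""
    return case_pct * multiplier


# Made-up rows, for illustration only (field names are assumptions).
rows = [
    {"population": 1_000_000, "day": date(2020, 9, 1), "cumulative_cases": 5_000, "rt": 1.20},
    {"population": 1_000_000, "day": date(2020, 9, 8), "cumulative_cases": 10_000, "rt": 1.10},
    {"population": 500_000, "day": date(2020, 9, 1), "cumulative_cases": 10_000, "rt": 0.95},
    {"population": 500_000, "day": date(2020, 9, 2), "cumulative_cases": 10_200, "rt": 0.94},  # Wednesday, dropped
    {"population": 500_000, "day": date(2020, 9, 8), "cumulative_cases": 12_500, "rt": 0.90},
    {"population": 100_000, "day": date(2020, 9, 1), "cumulative_cases": 3_000, "rt": 0.80},   # too small, dropped
]

points = tuesday_points(rows)
r = pearson([p[0] for p in points], [p[1] for p in points])
print(f"{len(points)} points, correlation = {r:.2f}")
print(f"2% case rate -> {implied_infected_pct(2.0):.0f}% infected at a 10x multiplier")
```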

Here’s a box plot comparing Rt for all instances above/below a threshold of 2% total case rate:

A statistical comparison of the 2 datasets gives:

The significance stats are somewhat overstated, as successive Tuesdays’ numbers for each county will not be truly independent. But I’ve tried running these analyses by “undersampling” the dates (e.g. only using 1 Tuesday per month, or even less), and I still saw strong significance in all tests.
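Here is one way the above/below-threshold comparison and the date undersampling could be sketched. The post doesn't say which significance tests were run, so Welch's t statistic is used purely as an assumed stand-in; the function names and toy values are hypothetical.

```python
from datetime import date, timedelta
from math import sqrt
from statistics import mean, variance

THRESHOLD_PCT = 2.0  # total case rate threshold from the box plot


def split_by_threshold(points, threshold=THRESHOLD_PCT):
    """Split (case %, Rt) pairs into Rt lists below and at/above the threshold."""
    below = [rt for pct, rt in points if pct < threshold]
    above = [rt for pct, rt in points if pct >= threshold]
    return below, above


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom (unequal variances)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def undersample(dates, step=4):
    """Keep every `step`-th date to weaken week-to-week dependence."""
    return sorted(dates)[::step]


# Made-up (case %, Rt) pairs, for illustration only.
toy_points = [
    (0.4, 1.25), (0.9, 1.10), (1.3, 1.18), (1.8, 1.05),
    (2.1, 0.95), (2.6, 0.88), (3.0, 1.00), (3.4, 0.90),
]
below, above = split_by_threshold(toy_points)
t, df = welch_t(below, above)
print(f"mean Rt below: {mean(below):.3f}, above: {mean(above):.3f}, t = {t:.2f}")

tuesdays = [date(2020, 9, 1) + timedelta(weeks=k) for k in range(8)]
print(undersample(tuesdays, step=4))  # roughly one Tuesday per month
```

Undersampling trades sample size for independence, so a result that stays significant after thinning the dates is less likely to be an artifact of repeated measurements.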

As I mentioned in the previous post regarding NYC, these high case rates don’t indicate real herd immunity. Instead, I suggest we stop thinking about herd immunity as a binary concept, and realize that for places with low population immunity, suppressing the spread is incredibly difficult, regardless of social distancing, masks, etc.

Immune, But For How Long?

I believe we can now be confident that immunity from COVID-19 lasts for at least 6 months, whether an infection becomes symptomatic or not.

During much of the pandemic, all kinds of doomsayers and worry-warts have cried about COVID immunity disappearing. Here’s a paper that shows recovered patients that never developed symptoms were far more likely to lose their antibodies within 3 months than patients that got sick. With asymptomatic infections currently estimated at 40% of all infections, that could be a real concern.

We’ve also now seen several documented cases of legitimate reinfections:

Recently, though, a team of Chinese scientists published a paper that studied symptomatic COVID patients, showing that antibodies were still detectable 6 months after infection. But that still didn’t answer the question of lasting immunity in asymptomatic patients.

I believe we can safely say immunity will last for 6 months or longer in almost all people who are infected by SARS-CoV-2. Here’s why.

Back in April/May, I had a series of python scripts I ran daily which generated charts from curated COVID data in the U.S. One major phenomenon I noticed was that the trends in deaths/cases in New York were diverging greatly from the rest of the nation. New York deaths steadily dropped, while deaths in the rest of the nation continued to increase for quite a while before they finally peaked.

Given everything we knew at the time, I found this surprising at first. How was it that New York was able to get control of this, given all their inherent disadvantages, while the virus continued to spread around the rest of the country? Were people in NYC social distancing more? Was Andrew Cuomo some kind of hero?

No. Andrew Cuomo is neither a hero nor a competent governor. My hypothesis at the time, which I now believe has been shown to be true, was that NYC had reached a level of public immunity necessary to keep the Rt of the virus below 1.0 (update: some more evidence on this). This is not to say they actually reached true herd immunity (which is what I originally thought before any of the seroprevalence studies were published). If NYC were to go back to completely normal, they would almost certainly see a surge in cases. But with some levels of social distancing, they have enough immunity to keep cases from surging.

Here’s a chart of NYC cases over time, smoothed by applying a 7-day moving average:

Now, some people will continue to argue that the real reason cases were brought down and remain low is that NYC is still locked down and citizens are still exercising extreme social distancing. What does the mobility data for NYC, provided by Apple, say about that?

Note that mobility in NYC hit bottom about a month before cases peaked. After the peak, as cases continued to decline, mobility continued to increase. (A couple of caveats: it would be better if the mobility were measured as year-over-year, instead of indexed to Jan 1. Also, the cases shown are impacted by testing availability).

What’s especially important is that even 6 full months after the mobility trough, we haven’t seen any real surge in COVID cases in NYC. If a significant percentage of people infected with COVID were to lose their immunity after 6 months or less, symptomatic or not, we’d almost certainly see some surge in NYC by now.

Given that the number of genuine reinfections worldwide has been limited so far, I’m now guessing that immunity will last a year or more. Of course, that’s less certain at this time.

Not to Get Too Political, But…

I will. Maybe just this once.

This is supposed to be a SF blog. Now, you might think SF stands for sci-fi, but that’s not quite right. The well-known acronym for sci-fi & fantasy is SFF, which I see as two distinct (though sometimes overlapping, I guess) genres. I write fantasy, but not much sci-fi. I do sometimes talk about actual science, though, so SF = science & fantasy. Not to be confused with science fantasy, which is a blending of the two genres, as in Star Wars.

Anyway, back to the politics! With a dash of science!

So you may have heard of this guy named Trump. This post really isn’t about him, it’ll just seem like it to begin with. Believe me, I’m as sick of him as you are. Anywho, last night I saw a tweet pop up in my timeline that riled me up a little bit:

I’m no fan of Trump, and I believe he’s handled the pandemic poorly, like he’s handled most aspects of his presidency poorly. But the idea that governors and mayors, be they Democrats or Republicans, have failed so badly only because Trump hid information from them is ludicrous.

So I snarked back just a bit:

Yes, the reply tweet with my Amazon receipts is mostly a joke. Even so…

Let’s go through a timeline of events from January to early March. I bet I can convince you that you don’t need to be a very unstable genius like me to see that state and local government leaders should have seen this coming, with or without Trump’s actions.

Note that all information laid out below, to the best of my knowledge, was publicly available to everyone on the date provided, not retrieved from a secret government database months later. Oh, except the irrelevant and stupid personal references.

Jan 7, 2020: China announces a cluster of pneumonia cases attributed to a novel coronavirus.

Jan 15: Japan reports a confirmed case of COVID-19.

Jan 20: The U.S. reports its first confirmed case of COVID-19, a man who recently traveled to Wuhan, Hubei Province, China. South Korea reports its first confirmed case, a Chinese woman.

Jan 21: Taiwan reports first confirmed case.

Jan 23: Strict Wuhan lockdown begins. South Korea reports its first case in a resident.

Jan 28: Taiwan reports its first case of local transmission.

Jan 30: The U.S. reports its first case of local transmission, from a man to his wife in Chicago.

Jan 31: Spain reports its first confirmed case, a German tourist. Italy reports 2 cases in Rome, a pair of Chinese tourists. Italy suspends travel to/from China. The U.S. announces travel restrictions to/from China.

Even before February, we’ve already seen reports of local transmission in 3 nations outside of China, and cases in multiple European countries. Think about what that means, given that most nations lacked reliable tests at that point.

Also Jan 31: Paper published in The Lancet estimating the R0 of SARS-CoV-2 to be above 2.5. This is MUCH higher than seasonal flu, meaning it’s far more infectious and spreads faster and easier (you probably already know that by now). Later estimates would place the value even higher.


Feb 1: Hong Kong announces that a man who had recently traveled on the Diamond Princess has tested positive for COVID-19. In the following days, after the ship was quarantined, hundreds of passengers would test positive, even though nearly half of the patients had no symptoms at the time.

Feb 4: South Korea suspends travel to/from Hubei Province, China.

Feb 5: South Korea announces a new total of 19 cases, sourced from at least 3 different nations excluding China.

Feb 7: Kevin flies from Los Angeles to Milwaukee, getting plenty drunk in the process. But he may have imbibed more than just alcohol that day…

Feb 13: Bloomberg reports that Ira Longini, an adviser to the World Health Organization who tracked studies of the virus’s transmissibility in China, estimates that 2/3 of the global population could be infected by SARS-CoV-2.

I can’t imagine why no cases, they’d performed all of 0 tests at that point.

Feb 14: Kevin watches Contagion for the first time ever… WHILE HE WAS SICK, MIND YOU, I WONDER IF THAT COULD BE RELEVANT IN SOME WAY.

Feb 19: Iran announces a cluster of confirmed cases in Qom.

Feb 21: Italy reports its first cluster of local cases (northern Italy).

We’ve already observed clusters of major spread in many parts of the world, even as major nations like the U.S. were failing/refusing to test anyone who hadn’t recently traveled to Wuhan, China, regardless of symptoms. By now, if not earlier (yes, earlier), you should be able to see that the cat is out of the bag, or the genie is out of the bottle, or I’m out of booze, or something like that.

Feb 23: Kevin watches Outbreak for the first time since he was a teenager. (it’s just as good as he remembers)

Feb 24: Nancy Pelosi visits San Francisco’s Chinatown, downplaying concerns about the safety of doing so.

I’m trying to be fair and balanced here, just like Fox News.

It’s now estimated that well in excess of 10,000 New Yorkers were infected before March 2nd.

Was that trip down memory lane fun for you? Hope so.

I don’t claim to have predicted that here in September, many of us would still be working from home, or that bars & restaurants would still be closed by government mandate in many places in the U.S. By February 14th, when I had my Valentine’s date with Contagion, I had fully accepted that I would get COVID this year (assuming I didn’t have it already). The infection fatality rate (IFR) for COVID is clearly much higher than the flu, but it now appears to be well under 1%. We know (have known since January) that the IFR is heavily age dependent, and anyone under 65 is highly unlikely to die or even need to be hospitalized. I honestly thought we’d try a few extra precautions, get people to wash their hands more, maybe wear masks at times, then just power through. Hoo boy was I wrong about that part.

On another point, I don’t believe that earlier lockdowns would have been beneficial anywhere in the U.S. except for New York City (eh, maybe Detroit/Chicago). In fact, in places that saw very little early spread, I think early lockdowns may have been harmful. What we really needed was a lot more testing a lot earlier. Even then, the blame lies more with the CDC and the FDA than POTUS or state/local officials.

But the whole point of this dumb post is that if you think public health officials and governors/mayors couldn’t have seen a major pandemic coming without the information available only to POTUS, well… I have something rather insulting to say, but I think I’ll keep it to myself, just this once.

References for some of the timeline information: