Sunday, June 28, 2020

A deeper dive into COVID data: who's getting sick and dying, how that's changing, and what it might mean

Everyone no doubt knows that older people are at greater risk of severe complications and death from COVID-19 than are younger people. Most people have probably also heard that men are at greater risk than women. And many people have also probably heard that the age demographics of who's getting infected have been shifting toward younger people.

Let's look more closely at these trends with the numbers from the Ohio Department of Health.

First, here is a graph of the per capita case rates broken down by sex and age group:


Remember, these are known cases of COVID, so there may be biases that influence how likely people from different groups are to get tested if they are in fact infected. For example, older people are more likely to develop symptoms, and so would be more likely to go get a test.

The striking trends here: people over 80 are much more likely to have a known case; people under 20 are much, much less likely; the other age groups are fairly flat at a level between the oldest and youngest; and for most age groups, men are more likely than women to have a known case.
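For readers who want to see the arithmetic behind a per capita rate, here is a minimal Python sketch. The counts and populations are made-up placeholders, not Ohio Department of Health figures, and the function name `per_capita_rate` and the per-100,000 scaling are assumptions chosen for illustration.

```python
def per_capita_rate(count, population, per=100_000):
    """Return the number of events per `per` residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * per / population

# Placeholder numbers for illustration only (not real Ohio data).
# Keys are (sex, age group).
known_cases = {("women", "80+"): 900, ("men", "80+"): 700}
residents = {("women", "80+"): 300_000, ("men", "80+"): 175_000}

rates = {
    group: per_capita_rate(known_cases[group], residents[group])
    for group in known_cases
}

for group, rate in rates.items():
    print(group, round(rate, 1))
```

In this made-up example the men have fewer cases in absolute terms but a higher rate, because there are fewer men in that age group. That is why per capita rates, rather than raw counts, are the fair way to compare groups.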

To see how likely it is that different groups develop more severe symptoms, let's look at the same graph but for hospitalizations:


Here there is a very clear increasing trend with age, as well as a clear tendency in the older age ranges for men to have higher rates of hospitalization for COVID.

Last, deaths:


Here, the bias toward worse outcomes in older people becomes far more dramatic, and the bias toward worse outcomes in men remains.

Let's also look at the rates of hospitalization and death among people with known COVID cases for the different groups.

Hospitalizations:


Deaths:


So, similar trends in terms of age and sex biases as the previous graphs.
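The rates in these two graphs use known cases, not population, as the denominator. A minimal sketch of that calculation follows; the numbers are placeholders rather than real Ohio data, and `rate_among_known_cases` is a hypothetical helper name.

```python
def rate_among_known_cases(outcomes, known_cases):
    """Share of known cases that led to the outcome (hospitalization or death)."""
    if known_cases <= 0:
        raise ValueError("known_cases must be positive")
    return outcomes / known_cases

# Placeholder numbers for illustration only (not real Ohio data).
example_cases = 500
example_hospitalizations = 150
example_deaths = 50

hospitalization_rate = rate_among_known_cases(example_hospitalizations, example_cases)
death_rate = rate_among_known_cases(example_deaths, example_cases)
print(f"hospitalized: {hospitalization_rate:.0%}, died: {death_rate:.0%}")
```

Because the denominator is known cases only, the testing biases mentioned earlier still apply: if mild infections in a group mostly go untested, these rates will overstate that group's risk per infection.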

Note that among men at least 80 years of age known to have had COVID in Ohio, a staggering four out of ten have died. That is pretty disturbing and really drives home how important it is to protect our older friends, neighbors, and relatives from this disease.

Some people look at these trends and think that younger people don't have to worry about COVID. That is, to put it bluntly, a stupid idea, for multiple reasons.

One, young people can easily pass it on to older people, who can get really sick and die, and the more people in the overall population who are infected, the more likely that is to happen. It's a really cruel and heartless attitude to think it doesn't matter if a bunch of old people die.

Two, even among young and otherwise healthy people, there is a real chance of severe complications and death. It's a small chance, but by now there are a lot of young people whose lives have been ended by this virus.

Three, and this is probably the part that is most misunderstood: COVID outcomes aren't a binary between "dies" and "recovers to full, normal health." There are so many people out there, of all ages, who are continuing to have debilitating symptoms months after they were infected. In a lot of these people the effects of COVID might be with them for the rest of their lives even though they didn't die and they "recovered" from the disease. That's not something anyone should want.

For the next part of this post, let's look at how the age demographics of the outbreak have changed over time. I'm sure a lot of people have heard by now that we are seeing a shift toward a younger infected population. Some people are even saying that we shouldn't worry about the case spikes that are currently happening in many parts of the country because it's now mostly younger people getting sick.

This is another stupid idea, but I digress.

Although you may have heard about these trends, do you know what the actual numbers are like? Let's take a look.

Here is a graph of the share of all COVID cases in Ohio falling into each age range, separated by months of the pandemic:


This dramatically illustrates how the age demographics of the outbreak have changed. In March the group with the biggest share of cases was ages 50-59 followed closely by 60-69. There was some slight shifting around in April but then in May and especially into June there has been a huge shift toward younger people. In June the group with the biggest share is 20-29 by a fair margin, 30-39 is second, and even 0-19 has come up to surpass every group 60 and above.
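The "share of cases" in this kind of graph is each age group's count divided by the month's total. Here is a minimal sketch of that calculation, assuming the data comes as (month, age group, count) records. The record layout, the function name `monthly_shares`, and the numbers are all hypothetical.

```python
from collections import defaultdict

def monthly_shares(records):
    """Turn (month, age_group, count) records into {month: {age_group: share}}."""
    totals = defaultdict(int)
    by_group = defaultdict(lambda: defaultdict(int))
    for month, age_group, count in records:
        totals[month] += count
        by_group[month][age_group] += count
    return {
        month: {group: count / totals[month] for group, count in groups.items()}
        for month, groups in by_group.items()
        if totals[month] > 0  # skip months with no cases to avoid dividing by zero
    }

# Placeholder counts for illustration only (not real Ohio data).
example_records = [
    ("March", "20-29", 10),
    ("March", "50-59", 30),
    ("June", "20-29", 60),
    ("June", "50-59", 40),
]

shares = monthly_shares(example_records)
for month, groups in shares.items():
    print(month, {group: round(share, 2) for group, share in groups.items()})
```

Since the shares within a month always sum to 1, a rising share for one group means a falling share for others, even if that group's absolute numbers have not dropped.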

Next let's look at hospitalizations:


The hospitalizations graph illustrates the same shift toward younger demographics but it's more subtle because younger people continue to be less likely to be hospitalized. Still, if you compare June to the earlier months you can clearly see the below 40 age groups coming up and the above 50 age groups coming down.

And last, the shares of deaths:


Here there honestly aren't any major changes, because the death rates for younger people remain much lower. However, as deaths are a lagging indicator and the case spikes in young people are currently getting worse, I'd expect a bit of a shift in this graph for July.

Let's also look at a different way of visualizing the data, to show how the absolute numbers are changing over time. Here I combined ages into just three groups because the graphs would be messy and hard to read with eight different lines.


This is the graph of the 7-day moving average of daily cases for (roughly speaking) younger people, middle-aged people, and older people. Note as always that the weird spike in April is from a bunch of cases in prisons all being reported at once. This makes it very clear how the cases in younger people have been continuing to increase while this has not been happening in older people.
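For anyone unfamiliar with the smoothing, here is a minimal sketch of a 7-day moving average. It assumes a trailing window (each day is averaged with the six days before it); whether the graphs here use a trailing or a centered window is not stated, so treat that as an assumption. The daily counts are made up.

```python
def moving_average(values, window=7):
    """Trailing moving average; days without a full window yet are None."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the day that left the window
        smoothed.append(running / window if i >= window - 1 else None)
    return smoothed

# Placeholder series: a steady 10 cases a day with one batch-reporting spike of 80.
daily_cases = [10] * 6 + [80] + [10] * 7
averaged = moving_average(daily_cases)
print(averaged)
```

A one-day batch of reports gets spread across seven days of the averaged curve, which is why a reporting spike shows up as a flat-topped bump rather than a single spike.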

And here's the hospitalizations graph:


This shows that the number of older people being hospitalized has come down dramatically from the peak, whereas the number of younger people being hospitalized has remained fairly steady.

Why are the shapes of the cases and hospitalizations curves so different? If cases go up in an age group, shouldn't hospitalizations also go up?

It's because the number of cases is strongly affected by how many tests are being performed, and we have been increasing the number of tests. Hospitalizations, on the other hand, are mainly just a function of how many people are getting sick.

I think it's possible to make some interesting inferences based on this, which I'll discuss more in a little bit.

The overall conclusion from these trends is that the outbreak is definitely shifting to be more concentrated among younger people. It appears that older people have been better at taking to heart the message that they need to be careful and limit exposure, whereas younger people have become less and less careful. And let's be honest, that makes a lot of sense.

As a result of this, in the current outbreaks the death rates are almost certainly going to be lower than in the outbreaks back in March and April. And it is a good thing that the death rates will be lower, I will acknowledge that. But that doesn't mean the current outbreaks are not a problem or that we shouldn't be very worried. We should be very worried for reasons I've already discussed and will discuss more in the rest of this post.

Everything I've shown so far has been me just presenting the numbers as they exist. The only manipulation I've done to the numbers is creating moving averages of the daily numbers to show the trends more clearly. Going forward, I will be making some inferences that I think are reasonable and playing around with the numbers in ways that, it's important to acknowledge, carry a fair amount of uncertainty. You definitely shouldn't take anything I'm saying as the gospel truth. And I'm very open to feedback. But I think that by thinking more about what these numbers mean, we can draw some interesting, albeit very tentative, conclusions.

What is the current case load in Ohio relative to other times during the pandemic?

The most naive way of answering this question would be to simply look at the number of daily reported cases. If you spend any time looking at graphs of Ohio's COVID numbers, the shape of this curve, a 7-day moving average of the daily reported cases, probably looks familiar:



The numbers are normalized so that 1 is the peak level we have been at thus far (excluding the spike from batch reporting of prison cases). So looking at this, you might think, oh, as of very recently we are at the highest infection rate yet.
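Normalizing to the peak just means dividing every value by the highest value in the series. A minimal sketch follows, with the batch-reporting days left out when picking the peak. How the exclusion was actually done for this graph is not stated, so the `exclude` parameter, the function name, and the numbers are assumptions.

```python
def normalize_to_peak(series, exclude=()):
    """Divide each value by the largest value outside the excluded positions."""
    excluded = set(exclude)
    candidates = [
        value for i, value in enumerate(series)
        if i not in excluded and value is not None
    ]
    if not candidates or max(candidates) <= 0:
        raise ValueError("need at least one positive value outside the excluded days")
    peak = max(candidates)
    return [None if value is None else value / peak for value in series]

# Placeholder series; position 2 stands in for a batch-reporting spike.
example_series = [2, 4, 100, 8]
normalized = normalize_to_peak(example_series, exclude=[2])
print(normalized)
```

With this approach the excluded spike still appears on the curve (as a value above 1), but it no longer sets the scale for everything else.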

But we know that the number of cases strongly depends on the number of tests administered, and the number of tests administered has increased a lot compared to the earlier days of the pandemic. So the number of people being hospitalized is probably a better indication of the current case load:


Here we see from the red curve of hospitalizations that the peak was reached in late March. Then the numbers went down, leveled off for a while, went down again, and then started going back up, but are still well below the peak. (The numbers here use my estimation method for more recent numbers that I detailed in my previous post and that appears to be somewhat more accurate, but you'd see a similar trend if you looked up a traditional graph of the daily reported hospitalization numbers.)

But do the changing demographics of who is infected affect the accuracy of using hospitalization rates to estimate case load? I think it's pretty undeniable that they do. Among people who are known to have had COVID in Ohio, someone who is 70 or older is about 8-9 times as likely to have been hospitalized as someone who is younger than 30. Therefore, if there were ten new hospitalizations of people who were younger than 30, that would clearly indicate a higher case load in the overall population than ten new hospitalizations of people who are 70 or older.

I attempted to account for this factor by making an adjustment to the hospitalization numbers. For every date, I looked at how many people in each age group were hospitalized, adjusted by the different hospitalization rates of different age groups, and summed the results to get what I would call an "inferred case load." There are no units to this metric; I am not trying to make any judgment about what the actual infected rate in the population is, only about how it's changing over time. Here's a graph showing the results of this adjustment with the blue line that has been added on top of the previous graph:


It was in late April that the demographics of the pandemic began their shift toward younger people and there you can see the blue and red curves diverge from each other. The share of total hospitalizations taken up by younger people grew, implying (relatively speaking) a larger case load than that implied by the hospitalization numbers with no age adjustment.
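One way to read the age adjustment described above is to divide each age group's hospitalizations by that group's hospitalization rate among known cases, then add up the results. The sketch below does exactly that. The exact formula behind the blue curve is not spelled out in this post, so the division, the function name `inferred_case_load`, the three age bands, and the rates are all assumptions for illustration; the rates are placeholders chosen only so that the oldest group's rate is 8 times the youngest's.

```python
def inferred_case_load(hospitalizations_by_age, hospitalization_rate_by_age):
    """Sum each group's hospitalizations divided by its hospitalization rate."""
    total = 0.0
    for age_group, hospitalized in hospitalizations_by_age.items():
        rate = hospitalization_rate_by_age[age_group]
        if rate <= 0:
            raise ValueError(f"rate for {age_group} must be positive")
        total += hospitalized / rate
    return total

# Placeholder rates (not real Ohio data): share of known cases hospitalized.
assumed_rates = {"under 30": 0.04, "30-69": 0.12, "70+": 0.32}

# Ten hospitalizations among the young versus ten among the old.
young_only = inferred_case_load({"under 30": 10}, assumed_rates)
old_only = inferred_case_load({"70+": 10}, assumed_rates)
print(round(young_only, 2), round(old_only, 2), round(young_only / old_only, 2))
```

Under these assumed rates, ten hospitalizations of people under 30 imply roughly eight times the case load of ten hospitalizations of people 70 or older. The result is unitless and only meaningful for comparing one date with another, not as a count of actual infections.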

From this estimate, it appears that the plateau from mid April all the way to late May may have been an illusion. Total cases may have been growing during that time, but concentrated more among young people.

Any way we look at it, there was a real drop in the infected rate toward the end of May.

But then the numbers did start going up again. And with the adjustment, it appears that the numbers may now be going up more sharply than we realize, and we may already be much closer to the peak infection rate thus far.

(In fact, because this is a 7-day moving average and the numbers are continuing to rise, we may already be higher than the peak in late March.)

I don't know exactly why numbers would have been gradually rising from late April to early May and then dropped in late May before starting to rise again. One idea, although it's speculation: at the start of the stay-at-home order everyone was taking things really seriously, which clearly halted the rise and brought the numbers down at first. As more time went on, people started to relax and engage in somewhat more risky behaviors. Especially younger people. Then, in May, when the weather got nicer, people shifted their social activities to the outdoors, where less disease spread occurs.

But then the state made the premature and foolish decision to open a bunch of businesses like bars, indoor restaurants, and gyms that should not be open now, and this led to the ongoing increase in June.

That's just speculation that I think is plausible. Especially for the explanations of the changes during April and May. For the current rise I think it's pretty clear that the premature reopening is a factor, when we put what's happening in Ohio in the context of what's happening elsewhere in the country.

I will reiterate that the inferred case load (blue curve) is an estimate based on methods I think are reasonable, but with a fair amount of uncertainty. It hasn't been peer reviewed or anything like that. You shouldn't take it as something that's definitely or even probably the truth. But I myself would stake out a pretty confident claim that the age-adjusted numbers are at least a closer match to the real case load than the non-age-adjusted hospitalization numbers.

By the way, in my last post, I showed this graph:


And I said:

We have also been in a period of declining hospitalizations. If you are looking at a graph of new hospitalizations by their report date, which is the graph you'd normally see, it looks like the curve is still trending downward. But in the estimated curve, it appears that the downward trend has recently leveled off and (although the most recent estimated numbers contain the most uncertainty) we may be starting to head back up.

What does the same graph look like now?


So when I said, based on my estimate, that the numbers might be starting to head back up even though the graph of reported numbers still seemed to be heading down, it turns out that this was correct. And the reported numbers are now following along and rising as well, with an expected time lag.

What does this all mean for the future course of the pandemic?

As I said, the death rates in the months ahead will be lower than the death rates in the spring. And that is a good thing.

But the idea that we are okay now because the people getting infected trend younger is nonsensical.

The only way we can get back to some semblance of normal life is by successfully containing the virus. Many other countries are doing this. We're utterly failing.

Even if older and more at-risk people are doing a better job of staying safe, do we really want to live in a society where anyone who is older or in a high-risk group, or who lives with anyone who is older or in a high-risk group, has to constantly live in fear and limit their exposure as much as possible for what could be an entire additional year?

Because that's what things will be like if we continue to just let the virus spread unchecked among a now younger-skewing infected population. And even then, there will be cases that slip past those safety measures and get to more vulnerable people who will die. And even among younger people, there will be lots of long-term debilitating health consequences and some deaths.

The only effective methods of containing the virus without shutting down huge segments of society are widespread mask wearing and contact tracing. For the latter to be effective, cases need to be at a manageable level. If there are so many cases floating around that it's impossible to know about most of them, contact tracing isn't going to make a big difference. That's where we are right now in most of the country.

These trends have also gotten me thinking about what will happen with schools this fall. The low infection rates among children do suggest the possibility that, with appropriate safety measures, primary schools could reopen. But colleges? College students are the demographic where cases are exploding right now because they take it the least seriously. And putting them back in dorms together will inevitably lead to massive outbreaks that will leak out to the rest of the population and make it impossible to get the pandemic under control. There's no way that colleges with students living on campus can safely reopen this fall. Just no way.

We need to back off from reopening (as Texas and Florida are already doing, and as Ohio probably will eventually, so better sooner than later) and we need to take social distancing and mask wearing seriously. This problem is not getting better with the measures we're currently taking, so if we don't take better measures it's going to continue to be with us for a long time.
