Loss of Smell in Coronavirus - The seduction of numbers

Yesterday I caught a brief moment of Professor Van-Tam at great pains to explain that loss of smell as a symptom made only a teeny-teeny-teeny-tiny difference to the number of those who could be predicted to have #coronavirus.

Now, the thing is, studies have shown this sort of thing:


"For example, a British study released last week collected COVID-19 symptom data from patients through an online app. The data show that almost 60 percent of the 579 users who reported testing positive for the coronavirus said they’d lost their sense of smell and taste. But a significant portion of patients who tested negative for the virus—18 percent of 1,123 people—also reported olfactory and taste troubles."

itchy.png

At first glance this is very confusing - surely if 60% of coronavirus patients report loss of smell, it HAS to be a good predictor even if/especially since that number is much lower (18%) in general (for other conditions & non-conditions)?

Van-Tam seemed to be so adamant about the small predictive qualities of loss of smell, I figured I would think it through carefully and run some numbers.

I decided to imagine that "itchiness" was a new observation and plugged in some numbers to calculate how diagnosis plays out. On the left is a very simple Excel spreadsheet which calculates how many people are in each group based on general percentages. I've used some representative percentages that are in the right ballpark to help make the thing (hopefully) more realistic.

It turns out that even if 60% of covid sufferers report itchiness, it is still a lousy predictor of them having the disease.

So what's going on here?

This is the base rate effect at work, and it sits in the same realm as Simpson's paradox, which I discussed the other day.

In this case: a high percentage of a small number (itchy with covid) can end up being much smaller than a small percentage of a large number (itchy without covid).

When the above observations are taken as individual groups, already KNOWING which group a person belongs to, it's certainly intuitive to draw the conclusion that you have a good predictor in the itchy-with-covid group.

But that's only AFTER the fact.

In reality, to start with, you don't have these groups, you are looking for a predictor in order to actually form them amongst a general population. And that is a different problem.

In total many, many more people who are itchy will actually belong to the itchy-without-covid group, simply because the proportion who do genuinely have covid is a much smaller part of the population. (At least for now).

In my demonstration model, if someone reports being itchy, they are 5.7 times more likely to have something else than #covid19 even though 60% of those who have #covid19 report being itchy!
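The same arithmetic can be sketched in a few lines of Python. The 60% and 18% rates come from the study quoted above; the 5% prevalence and the function name `itchy_breakdown` are assumptions for illustration (a 5% prevalence happens to reproduce the 5.7 ratio), not necessarily the spreadsheet's actual inputs.

```python
def itchy_breakdown(population, prevalence, p_itchy_given_covid, p_itchy_given_other):
    """Split a population into itchy-with-covid and itchy-without-covid groups."""
    with_covid = population * prevalence
    without_covid = population - with_covid
    itchy_with = with_covid * p_itchy_given_covid
    itchy_without = without_covid * p_itchy_given_other
    return {
        "itchy_with_covid": itchy_with,
        "itchy_without_covid": itchy_without,
        # How many itchy people have something else for each one with covid
        "ratio_other_to_covid": itchy_without / itchy_with,
        # Chance that a person reporting itchiness actually has covid
        "chance_itchy_person_has_covid": itchy_with / (itchy_with + itchy_without),
    }

# Assumed 5% prevalence; 60% and 18% are the rates from the quoted study.
result = itchy_breakdown(100_000, 0.05, 0.60, 0.18)
print(round(result["ratio_other_to_covid"], 1))           # 5.7
print(round(result["chance_itchy_person_has_covid"], 3))  # 0.149
```

Under these assumptions only about 15% of the people reporting the symptom have the disease. Changing `population` leaves both figures unchanged, because only the proportions matter.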

Notes

It doesn't matter what value you start the population at; the proportions work out the same. So you can treat "population" as "the number of people who report itchiness that day, or week, or who have done so in the last month" etc.

quote source at: https://www.nationalgeographic.co.uk/science-and-technology/2020/04/lost-your-sense-of-smell-it-may-not-be-coronavirus

also see https://www.the-scientist.com/news-opinion/loss-of-smell-taste-may-be-reliable-predictor-of-covid-19-study-67528

In what order should we return to work?

Continuing the theme of occupational risk analysis, here are some more ways to slice and dice the data.
If you had to restart the economy, how would you go about it?

If a client came to me and asked “which departments should I start up, given that I want to do it as safely as possible?” then I’d start by asking for some data: how much does each department contribute (and what is their size) and how risky is their operation.

Then, by slicing and dicing the data accordingly, we look for opportunities which have the least risk but the best return. All things being equal (which they are not ⚠), finding these kinds of opportunities, which balance the priorities optimally, usually means a bigger “return” on a small action.

This particular ONS occupational data doesn’t contain size of contribution to the economy (I’m sure somewhere there will be a dataset that could be joined with it), so for now I will illustrate the point with the sector size, as measured by number of workers. And actually, it’s not an unreasonable view: irrespective of their contribution to the economy, the more workers go back to work, the more a sense of some kind of “normality” will pervade for those workers and their families.

If we arrange our data on axes as follows, measuring sector size and risk score, then we would naturally target occupations towards the upper right as large-employers-with-low-risk. This region of the chart is not densely populated, but ultimately working leftwards from the right, anything towards the top (the biggest, greenest marks) would be a reasonable next candidate.
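As a rough sketch of that targeting, the snippet below keeps sectors under a risk threshold and lists the largest employers first. The sector names, worker counts, risk scores, the threshold of 40 and the function name `restart_candidates` are all made up for illustration; none of them come from the ONS data.

```python
# Hypothetical sectors: (name, number of workers, risk score 0-100)
sectors = [
    ("Sector A", 900_000, 20),
    ("Sector B", 150_000, 15),
    ("Sector C", 700_000, 80),
    ("Sector D", 400_000, 35),
]

def restart_candidates(sectors, max_risk):
    """Keep sectors at or below max_risk, largest employers first."""
    eligible = [s for s in sectors if s[2] <= max_risk]
    return sorted(eligible, key=lambda s: s[1], reverse=True)

for name, workers, risk in restart_candidates(sectors, max_risk=40):
    print(f"{name}: {workers:,} workers, risk {risk}")
```

A hard threshold is only one way to trade size against risk; a weighted score would be another, and which is better depends on the priorities being balanced.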

Now, I bet you’re already making some interesting observations from that chart alone. And probably raising some objections to how useful it is. Bear with me.

Let’s look at the next question it raises: if I imagine a number of sectors do return to work, what impact does that have on the economy?

Again, we cannot measure economic contribution in this data set but let’s simply look at the number of workers. If a basic policy was to ask the least-risk workers to return to work first (based on lowest ranking of proximity to others) then we could start at the left of this chart and work rightwards occupation-by-occupation. In doing so, the line charts the cumulative number of workers that would now be back at work.

It’s interesting we can see a pretty sharp rise in numbers in the first half of the line: there are only a small number of workers with really low proximity risk, but actually there’s a steep rise in sector size after that such that about 50% of the workforce are actually in the lowest half for the proximity risk.
If you wanted to develop a targeted strategy for return to work, you could work along this line and ask these occupations to return to work (if indeed they stopped).
It also raises another interesting question: how far along this line do you need to go to make a significant difference to the functioning of the economy? E.g. if 50% of workers are working at more-or-less normal capacity, how functional is the economy? (Is it operating at 10%? 50%? 90%?) We can’t answer that question with this data of course but the answer would be very informative.
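The cumulative line itself is simple to compute. Here is a minimal sketch, assuming each occupation has a proximity score and a worker count; the occupations and numbers below are invented for illustration and are not taken from the ONS data set.

```python
from itertools import accumulate

# Hypothetical occupations: (name, proximity score 0-100, number of workers)
occupations = [
    ("Occupation A", 10, 50_000),
    ("Occupation B", 55, 300_000),
    ("Occupation C", 30, 400_000),
    ("Occupation D", 90, 250_000),
]

def cumulative_return(occupations):
    """Order occupations by proximity (lowest first) and accumulate workers."""
    ordered = sorted(occupations, key=lambda o: o[1])
    totals = accumulate(o[2] for o in ordered)
    workforce = sum(o[2] for o in occupations)
    return [(o[0], o[1], total, total / workforce) for o, total in zip(ordered, totals)]

for name, proximity, total, share in cumulative_return(occupations):
    print(f"{name} (proximity {proximity}): {total:,} back at work ({share:.0%})")
```

Each row gives the running total of workers back at work if every occupation up to that proximity level returns, which is the quantity the line in the chart traces out.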

Now, of course, this chart is a gross simplification. None of this is that straightforward, as there are several complicating factors with the #covid19 lockdown situation:

  • key workers are already at work to keep critical national functions running and indeed some of these workers are highest risk

  • many of the lower risk workers who, by definition, have least contact with other people, are actually still working because they can work remotely

  • some occupations which theoretically could restart rely on a market that is not available due to lockdown of movement and change in spending behaviour

So, to improve this analysis, one would need to be able to both remove workers who are already working from the picture and consider what else needs to be in place for some businesses to restart.

Moreover, there is considerable interlock between some occupations:

  • some people are prevented from working because they require “childcare” (I’m including school in that loose definition)

  • some people are needed to work regardless because their function is so critical to public health or infrastructure

So, in fact, one would need to map out those interlocks and understand them to really identify which are the optimum occupations that could resume.
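One way to picture such a mapping is as a dependency table: each occupation lists what must already be running before it can resume. The sketch below is purely illustrative; the occupations, their dependencies and the function name `resumable` are hypothetical and not derived from any data set.

```python
# Hypothetical interlocks: each occupation lists what must already be running.
requires = {
    "Office worker": {"Childcare"},
    "Childcare": set(),
    "Retail": {"Delivery", "Childcare"},
    "Delivery": set(),
    "Events": {"Hospitality"},
    "Hospitality": {"Events"},  # mutual dependency: neither can go first
}

def resumable(requires, already_running):
    """Repeatedly add occupations whose requirements are all running."""
    running = set(already_running)
    changed = True
    while changed:
        changed = False
        for occupation, needs in requires.items():
            if occupation not in running and needs <= running:
                running.add(occupation)
                changed = True
    return running

print(sorted(resumable(requires, already_running={"Delivery"})))
# ['Childcare', 'Delivery', 'Office worker', 'Retail']
```

Occupations caught in a mutual dependency never appear in the result, which is exactly the kind of interlock that would need a deliberate decision rather than a simple ordering.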

Data-led decisions

This would not necessarily be a trivial exercise. But either way, I’m hoping to demonstrate that these types of decisions can be data-led, and that good data should actually allow us to make good decisions, even if - or perhaps especially if - they are difficult ones.

In closing, here’s a final (somewhat sizeable) chart which lists all the professions in the data set, sorted by least proximity to other people (position in list & length of bar represents proximity level). The colour represents the combined risk score based on proximity and frequency of exposure to other people.
Hopefully this is sufficiently readable to find your own occupation (or its nearest equivalent).

Occupational Risk in Relation to Coronavirus COVID-19

I’ve written a fair bit about (and analysed a whole lot more of) the COVID19 situation and data but not published it here because, frankly, the minute it’s published it’s out of date. Moreover, even using official data sources such as Johns Hopkins University, there’s a kind of “data entropy” at work, where data volume increases over time, but quality reduces. I could do a whole post on that topic alone, but that’s for another day.

Meanwhile, the Office for National Statistics (ONS) published an intriguing data set that quantified the nearly 400 occupations in the UK and, amongst other things, classified the type of contact with other people that workers had:

  • proximity (ranging from touching to close distance to no close contact with people at all),

  • and exposure ranging from many times a day to weekly/monthly/yearly to never.

This data can be explored interactively on the ONS website but I’ve also tried to produce some static readouts here, although it’s quite a challenge to compress this amount of data into a one-page visualisation! So, you will see a number of variations.

As the debate intensifies over whether to start schools up or not, it’s interesting to note that teachers and classroom assistants are basically in the next tranche of most-at-risk workers, behind the healthcare, police, cleaning and delivery key-workers that have kept critical services running. Many questions still remain (at the time of writing) as to the level of risk posed by the children they will mix with. Children, although seen to be less susceptible themselves to the disease, are certainly not immune.

How does pay compare for those potentially most exposed? Redder, larger, further to the right = more at risk. Lower down = lower pay.

Coloured by risk and sized by percentile (% figure means x% of workers have this risk or higher)

Coloured and ordered by risk, sized by number of workers in sector

Occupations sized, sorted and coloured by risk

Occupations sized and sorted by sector size, coloured by risk
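For the percentile chart, the "% of workers have this risk or higher" figure can be computed along these lines. This is a minimal sketch with invented occupations, scores and worker counts, and `share_at_or_above` is a hypothetical helper name, not part of the ONS tooling.

```python
# Hypothetical occupations: (name, risk score, number of workers)
occupations = [
    ("Occupation A", 85, 200_000),
    ("Occupation B", 40, 500_000),
    ("Occupation C", 60, 300_000),
]

def share_at_or_above(occupations):
    """For each occupation, the share of all workers with its risk score or higher."""
    workforce = sum(workers for _, _, workers in occupations)
    return {
        name: sum(w for _, r, w in occupations if r >= risk) / workforce
        for name, risk, _ in occupations
    }

for name, share in share_at_or_above(occupations).items():
    print(f"{name}: {share:.0%} of workers have this risk or higher")
```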

Some cautionary notes come with this data:

  • Risk profiles were actually collated from American workers, so differences in process and work-style could mean UK workers have a different profile.

  • The risk profile was devised prior to COVID19 and doesn’t take account of any potential social distancing or other safety approaches (e.g. PPE) that may be applied to a given occupation. So, in some sense, the risk score indicates what degree of protection could be needed.