Understanding False Positives in Serology Testing: Theory
Test accuracy is a key issue in tracking the virus. In particular, when people discuss testing for antibodies (“serology testing”), there is a lot of talk about false positives, about “sensitivity” and about “specificity.” What do these terms mean and why do they matter? (Read more about serology testing in our Testing Explainer.)
We can start with definitions, focusing on COVID-19 and antibodies, although these general principles apply to any test.
Sensitivity: A “false negative” refers to a case where someone has antibodies (so they should test positive) but the test doesn’t detect them. The “sensitivity” of a test measures how well it avoids false negatives: it is the share of people with antibodies whom the test correctly detects as positive, so sensitivity equals 100% minus the false negative rate. A test with 99% sensitivity would correctly identify 99 out of 100 people who have antibodies as “positive.” Think sensitive = always picks up when someone has antibodies.
Specificity: A “false positive” refers to a case where someone who does not have antibodies (so they should test negative) is incorrectly identified as positive. The “specificity” of a test measures how well it avoids false positives: it is the share of people without antibodies whom the test correctly identifies as negative, so specificity equals 100% minus the false positive rate. A test with 99% specificity would correctly identify 99 out of 100 people without antibodies as “negative.” Think specific = never identifies someone without antibodies as having antibodies.
Larger numbers for either of these values indicate a better test. A perfect test would have 100% sensitivity and specificity: it would identify all positive people as positive, and all negative people as negative.
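To make the two definitions concrete, here is a minimal Python sketch that computes both numbers from the four possible test outcomes. The function name and the example counts are hypothetical and chosen only for illustration.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from the four outcome counts.

    true_pos:  people with antibodies who test positive
    false_neg: people with antibodies who test negative
    true_neg:  people without antibodies who test negative
    false_pos: people without antibodies who test positive
    """
    # Sensitivity: share of people WITH antibodies that the test detects.
    sensitivity = true_pos / (true_pos + false_neg)
    # Specificity: share of people WITHOUT antibodies correctly called negative.
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical counts: 100 people with antibodies and 100 without.
sens, spec = sensitivity_specificity(true_pos=99, false_neg=1, true_neg=99, false_pos=1)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```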
COVID-19 antibody tests are not perfect, and their quality differs from test to test. A team at the University of California, San Francisco helpfully evaluated a large number of commercially available tests, measuring both their false negative and false positive rates. The FDA has also released a summary of performance for all FDA-authorized antibody tests.
Most of these tests identify between 80% and 100% of positive cases. That is, among 100 people with antibodies, some of the tests find virtually all of them, and others find only about 80, incorrectly classifying the other 20 as negative.
On the flip side, the false positive rate varied between 0% and 16%. That is, among 100 people without antibodies, some of the tests incorrectly identified as many as 16 of them as actually having antibodies.
If you’re planning to get an antibody test yourself, the FDA provides a useful calculator where you can enter the estimated prevalence of SARS-CoV-2 antibodies in the target population, along with the sensitivity and specificity of the test (these numbers are provided on the FDA’s website). The calculator then estimates how a single test, or two different tests, would perform, so that the results can be compared.
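The FDA calculator’s internal method isn’t described here. However, the standard way to combine prevalence, sensitivity and specificity into the probability that a given result is correct is Bayes’ rule. The sketch below is our own illustration under that assumption. `predictive_values` is a hypothetical name, not the FDA’s code, and it only handles a single test.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a single test via Bayes' rule.

    PPV: probability that someone who tests positive really has antibodies.
    NPV: probability that someone who tests negative really does not.
    """
    true_pos = prevalence * sensitivity              # has antibodies, tests positive
    false_neg = prevalence * (1 - sensitivity)       # has antibodies, tests negative
    true_neg = (1 - prevalence) * specificity        # no antibodies, tests negative
    false_pos = (1 - prevalence) * (1 - specificity) # no antibodies, tests positive
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


ppv, npv = predictive_values(prevalence=0.05, sensitivity=0.99, specificity=0.95)
print(f"PPV = {ppv:.1%}, NPV = {npv:.2%}")
```

At 5% prevalence, a positive result from a test with 99% sensitivity and 95% specificity is correct only about half the time, while a negative result is almost always right.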
We should be clear: all of these tests are being commercially sold and used. None of them are perfect, and the worse-performing ones are really not good at all. But why does this matter? To see why, we need to go a little further into the statistics.
Digging into the Statistics
Let’s imagine we have one of these tests, and it has a 99% sensitivity (it correctly finds 99 out of 100 people who do have COVID-19 antibodies) and a 95% specificity (it correctly identifies 95 out of 100 people without antibodies). Flipping it around, 1 out of 100 people who are positive would be (incorrectly) flagged as negative, and 5 out of 100 people who do not have antibodies would be (incorrectly) flagged as positive.
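In symbols, the two error rates for this test are simply the complements of its headline numbers:

```latex
\begin{aligned}
\text{false negative rate} &= 1 - \text{sensitivity} = 1 - 0.99 = 0.01 \quad (1 \text{ in } 100) \\
\text{false positive rate} &= 1 - \text{specificity} = 1 - 0.95 = 0.05 \quad (5 \text{ in } 100)
\end{aligned}
```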
There are really two things you might want to do with this test. One is to inform people about their own antibody status, and the other is to figure out the population-level antibody rate.
The key issue is that when a disease is rare, and COVID-19 still is in many places, even small errors in testing can make a big difference in these conclusions.
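A small sketch shows the point before we work through a full example. It holds the test fixed at 99% sensitivity and 95% specificity, varies only how common antibodies are, and reports what share of positive results are real. The prevalence values are arbitrary illustrations.

```python
sensitivity, specificity = 0.99, 0.95
share_real_by_prevalence = {}

for prevalence in (0.005, 0.05, 0.20, 0.50):
    real = prevalence * sensitivity                  # true positives per person tested
    mistaken = (1 - prevalence) * (1 - specificity)  # false positives per person tested
    share_real = real / (real + mistaken)
    share_real_by_prevalence[prevalence] = share_real
    print(f"prevalence {prevalence:>5.1%}: {share_real:.0%} of positive results are real")
```

At 0.5% prevalence only about 9% of positive results are real. With the same test at 50% prevalence, about 95% are.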
To see why, let’s run some numbers.
Imagine that we have a population which has been relatively unaffected by COVID-19 (for example, a rural population in the middle of the US), in which the true exposure rate is 1 in 200 people. To make it simple, let’s say there are 100,000 people in the population so, in reality, 500 of them have antibodies to the virus. If we had a perfect serology test, then we’d detect a positive rate of 0.5%, or 500 people out of 100,000.
Now we have our actual test, with a sensitivity of 99% and a specificity of 95%. This is similar to, and maybe a bit better than, many of the tests we actually have.
The 99% sensitivity means that of the 500 people who have antibodies, we correctly identify 495 of them. Good job!
Of the 99,500 people without antibodies, we correctly identify 95% of them as negative, or 94,525. Also great! But this means that, of the people without antibodies, we incorrectly classify 4,975 as “positive.”
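The counts in the last few paragraphs can be reproduced in a few lines of Python. This sketch only covers the arithmetic, and the variable names are our own.

```python
population = 100_000
true_prevalence = 1 / 200   # 1 in 200 people truly have antibodies
sensitivity = 0.99
specificity = 0.95

has_antibodies = round(population * true_prevalence)   # 500
no_antibodies = population - has_antibodies            # 99,500

true_positives = round(has_antibodies * sensitivity)   # correctly flagged positive
false_negatives = has_antibodies - true_positives      # missed by the test
true_negatives = round(no_antibodies * specificity)    # correctly flagged negative
false_positives = no_antibodies - true_negatives       # wrongly flagged positive

print(f"true positives:  {true_positives:,}")
print(f"false negatives: {false_negatives:,}")
print(f"true negatives:  {true_negatives:,}")
print(f"false positives: {false_positives:,}")
```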
Now let’s return to our two questions. First, we want to use our data to calculate the share of people in the population who are positive. Remember, the truth is 1 in 200 people, but we don’t know that! That’s why we are collecting the data. The simplest approach would be to take all the positive tests (495 + 4,975 = 5,470) and divide by the total population (100,000 people). If you do that, you get a prevalence rate of about 5.5%, which is way too high. The true rate is 0.5% in this case!
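The naive estimate can also be written as a formula. The share of people testing positive is the true cases the test catches plus the healthy people it wrongly flags. Here is a minimal sketch (the function name is ours):

```python
def apparent_prevalence(true_prevalence, sensitivity, specificity):
    """Share of a population expected to test positive."""
    caught = true_prevalence * sensitivity                       # real cases detected
    wrongly_flagged = (1 - true_prevalence) * (1 - specificity)  # false positives
    return caught + wrongly_flagged


naive = apparent_prevalence(true_prevalence=0.005, sensitivity=0.99, specificity=0.95)
print(f"apparent prevalence: {naive:.2%}")  # more than ten times the true 0.50%
```

Almost all of the apparent prevalence comes from the `wrongly_flagged` term, because 99.5% of the population is negative.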
What happened? The problem is that because almost everyone is negative, even a small false positive rate here means that most of the people who test positive are mistakes.
This also makes it hard for people to rely on their antibody results. Among the people who test positive for antibodies with this hypothetical test, only about 9% (495 out of 5,470) actually have antibodies. The rest are false positives. This really matters if people with “false positives” base their activities around that test result: they may take risks they would not otherwise take, thinking that they are relatively more protected. This could endanger both themselves and others. They may expose themselves to the virus under the assumption that they are immune, resulting in illness, and/or they could spread the virus to others.
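The share of positive results that are correct is usually called the positive predictive value (PPV). Here is a quick check using the counts from the example population:

```python
true_positives = 495     # people with antibodies who test positive
false_positives = 4_975  # people without antibodies who test positive

# Positive predictive value: of everyone who tests positive,
# the share who truly have antibodies.
ppv = true_positives / (true_positives + false_positives)
print(f"{ppv:.1%} of positive results are correct")
```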
If you are only interested in using your serology tests to figure out the prevalence in an overall population, these problems are less severe because you can (sort of) fix them. For example, if you knew the sensitivity and specificity of your test for sure, you could reverse my calculations above and correct your estimates. However, doing so requires knowing the performance of the test for sure. And with a new virus and new tests, we do not always know it. When the prevalence is low, small differences in assumptions about these values will really, really change the results.
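The “reverse” calculation has a standard closed form, often called the Rogan-Gladen estimator. It solves apparent = p × sensitivity + (1 − p) × (1 − specificity) for the true prevalence p. The minimal sketch below assumes the sensitivity and specificity are known. Clipping the result to the range 0 to 1 is a common practical choice, not part of the formula itself. The loop at the end shows how far the corrected estimate moves when the assumed specificity is off by a single percentage point.

```python
def corrected_prevalence(apparent, sensitivity, specificity):
    """Rogan-Gladen estimator: invert the apparent-prevalence formula for p."""
    estimate = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    # Noise or wrong assumptions can push the estimate outside [0, 1]; clip it.
    return min(max(estimate, 0.0), 1.0)


apparent = 0.0547  # share of the 100,000 people who tested positive

# With the test's true performance, the correction recovers the real 0.5%.
print(f"{corrected_prevalence(apparent, 0.99, 0.95):.2%}")

# If the assumed specificity is off by one point, the estimate swings widely.
for assumed_spec in (0.94, 0.95, 0.96):
    estimate = corrected_prevalence(apparent, 0.99, assumed_spec)
    print(f"assumed specificity {assumed_spec:.0%}: estimated prevalence {estimate:.2%}")
```

An assumed specificity of 94% instead of 95% drives the estimate to zero, and 96% roughly triples it. This is why low-prevalence corrections are so fragile.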
The main solution here is better testing. Some of the serology tests are better than others; focusing on using those will improve our understanding of antibodies against the virus. But this problem is always likely to be with us to some extent.