There are now rapidly increasing numbers of (non-peer-reviewed) pre-prints and (peer-reviewed) published papers describing the application of wastewater-based epidemiology for COVID-19. This involves measuring the presence of SARS-CoV-2 RNA in municipal wastewater (sewage) and using those measurements to infer spatial and temporal patterns of infection in the community.
Some of the most high-profile reports so far have been from The Netherlands, Brisbane (Australia), Paris (France), Massachusetts (USA), Montana (USA), Milan and Rome (Italy), and Spain. Although there is great variability among the methods applied and the way the data are interpreted, the evidence base for the general viability of wastewater-based epidemiology for COVID-19 has grown with each of these reports.
A new pre-print paper from Yale University (USA) was recently added to the list and is making quite a splash on social media. This paper is titled “SARS-CoV-2 RNA concentrations in primary municipal sewage sludge as a leading indicator of COVID-19 outbreak dynamics”.
The most novel aspect of this paper is that the researchers chose to use primary sludge as the medium from which to extract and measure SARS-CoV-2 RNA. This is different to all of the other papers, which have focused their efforts on raw wastewater (untreated sewage, usually from the inlet works for a sewage treatment plant).
Primary sludge is a product of primary wastewater treatment. It consists of water carrying a higher load of suspended solid material, which has been separated from the main wastewater stream by settling under gravity. The paper states that the samples were collected at the outlet of a gravity thickener, with solids content ranging from 2.6% to 5%.
A possible advantage of using primary sludge is that the primary treatment and sludge collection processes may involve a degree of mixing beyond that to which the raw wastewater may have been exposed. This is potentially helpful since it may lead to an “averaging” of an otherwise highly variable signal. If some of the (effectively random) variability can be removed from the samples, quantitation may become more meaningful and easier to interpret.
It may also be that the RNA is more concentrated in the sludge than in the raw wastewater. This would depend on partitioning of the RNA to the solid material that is settled under gravity. The method states that 2.5 mL of well-mixed sludge was added directly to a commercial kit optimised for isolation of total RNA from soil.
It’s difficult to compare between the published studies, since there are many confounding factors to consider. However, the authors state that “Due to the elevated solids content and the high case load observed during the outbreak (~1,200 per 100,000 population), the concentrations of SARS-CoV-2 RNA reported here ranged from two to three orders of magnitude greater than raw wastewater SARS-CoV-2 values previously reported”.
Working with more complex matrices (such as sludge) also has disadvantages, one of which is often a higher detection limit than can be achieved with cleaner matrices. The authors state “SARS-CoV-2 viral RNA was detectable in all samples tested and ranged from 1.7 × 10³ virus RNA copies mL⁻¹ to 4.6 × 10⁵ virus RNA copies mL⁻¹. The lower concentration in this range corresponds to a qRT-PCR cycle threshold (CT) value of 38.75 and can be considered a detection threshold for this method and sludge matrix”.
But these methodological details are not the reason why this pre-print paper has attracted so much attention on social media. Instead, it is the claim presented in the paper that this approach to wastewater-based epidemiology can provide a highly accurate 7-day leading indicator of COVID-19 clinical testing data and a 3-day leading indicator of hospital admissions.
In particular, a highly smoothed curve (using LOWESS smoothing) comparing virus RNA concentrations (per mL) with clinical reporting of new cases of COVID-19 appears very impressive and has been widely shared.
It seems apparent that this smoothing technique requires a reasonable run of consecutive data. So it is unclear whether the smoothed curve could really be produced 7 days ahead of clinical testing data. From my understanding, standard LOWESS smoothing estimates each point from a weighted window of several datapoints on either side of it, including datapoints collected later. If so, this type of “prediction” could only be done retrospectively, and retrospective prediction is of more limited value than real-time prediction. Furthermore, there is a question of how quickly the data could be acquired and processed, which would further limit its predictive usefulness.
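The retrospective-versus-real-time concern can be illustrated with a small sketch. It assumes a simplified LOWESS (tricube-weighted local linear fits, no robustness iterations), a synthetic noise-free outbreak curve, and a smoothing span of `frac=0.5`. None of these come from the paper, and the names `lowess`, `signal` and `peak_day` are purely illustrative.

```python
import math

def lowess(xs, ys, frac=0.5):
    """Simplified LOWESS: tricube-weighted local linear fits, no robustness passes."""
    n = len(xs)
    k = max(3, math.ceil(frac * n))  # number of points in each local window
    fitted = []
    for x0 in xs:
        # Bandwidth = distance from x0 to its k-th nearest point (x0 included).
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # local line evaluated at x0
    return fitted

# Synthetic outbreak signal (an assumption, not data from the paper):
# rises to a peak on day 15, then declines.
days = list(range(30))
signal = [100 * math.exp(-((d - 15) / 5) ** 2) for d in days]

peak_day = 15
# Real-time: only samples collected up to and including the peak day exist.
real_time = lowess(days[:peak_day + 1], signal[:peak_day + 1])[-1]
# Retrospective: the whole series, including later samples, is available.
retrospective = lowess(days, signal)[peak_day]

print(f"day {peak_day}: real-time {real_time:.1f}, retrospective {retrospective:.1f}")
```

In this toy case the two estimates for the same day differ noticeably. The real-time fit has only rising data to work with and extrapolates the upward trend, while the retrospective fit can also see the decline that follows. The published smoothed curve corresponds to the retrospective case.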
Note: The R=0.994 quoted in the paper and in the above tweet is almost certainly not meaningful, since it appears to have been calculated from (at least one of) the smoothed curves rather than from the raw data. Smoothing removes much of the point-to-point scatter, so correlations between smoothed curves tend to be inflated. Hopefully some of these statistical limitations (and over-statements) will be ironed out during peer review.
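A rough sketch of why a correlation computed on smoothed curves can mislead: two synthetic series share a common trend but carry independent noise, and their correlation is calculated before and after smoothing. A centred moving average stands in for LOWESS here to keep the example short. The trend shape, noise level, window width and random seed are all assumptions chosen for illustration, not values from the paper.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def moving_average(values, window=9):
    """Centred moving average; the window is truncated at the ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

rng = random.Random(0)  # fixed seed so the example is repeatable

# Shared outbreak-like trend plus independent measurement noise (both assumed).
trend = [math.exp(-((t - 30) / 10) ** 2) for t in range(60)]
series_a = [v + rng.gauss(0, 0.4) for v in trend]
series_b = [v + rng.gauss(0, 0.4) for v in trend]

r_raw = pearson(series_a, series_b)
r_smooth = pearson(moving_average(series_a), moving_average(series_b))

print(f"raw r = {r_raw:.2f}, smoothed r = {r_smooth:.2f}")
```

The underlying relationship between the two series is identical in both calculations; only the noise has been averaged away. A very high R obtained after smoothing therefore says little about how well individual sludge measurements track individual case counts.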
We’d be interested in your thoughts!
Peccia, J., Zulli, A., Brackney, D. E., Grubaugh, N. D., Kaplan, E. H., Casanovas-Massana, A., Ko, A. I., Malik, A. A., Wang, D., Wang, M., Weinberger, D. M. and Omer, S. B. (2020) SARS-CoV-2 RNA concentrations in primary municipal sewage sludge as a leading indicator of COVID-19 outbreak dynamics. medRxiv, 2020.05.19.20105999.