Objectives To derive and validate risk prediction algorithms (QCOVID4) to estimate the risk of covid-19 related death and hospital admission in people with a positive SARS-CoV-2 test result during the period when the omicron variant of the virus was predominant in England, and to evaluate performance compared with a high risk cohort from NHS Digital.
Setting QResearch database linked to English national data on covid-19 vaccinations, SARS-CoV-2 test results, hospital admissions, and cancer and mortality data, 11 December 2021 to 31 March 2022, with follow-up to 30 June 2022.
Participants 1.3 million adults in the derivation cohort and 0.15 million adults in the validation cohort, aged 18-100 years, with a positive test result for SARS-CoV-2 infection.
Main outcome measures Primary outcome was covid-19 related death and secondary outcome was hospital admission for covid-19. Risk equations with predictor variables were derived from models fitted in the derivation cohort. Performance was evaluated in a separate validation cohort.
Results Of 1 297 922 people with a positive test result for SARS-CoV-2 infection in the derivation cohort, 18 756 (1.5%) had a covid-19 related hospital admission and 3878 (0.3%) had a covid-19 related death during follow-up. The final QCOVID4 models included age, deprivation score and a range of health and sociodemographic factors, number of covid-19 vaccinations, and previous SARS-CoV-2 infection. The risk of death related to covid-19 was lower among those who had received a covid-19 vaccine, with evidence of a dose-response relation (42% risk reduction associated with one vaccine dose and 92% reduction with four or more doses in men). Previous SARS-CoV-2 infection was associated with a reduction in the risk of covid-19 related death (49% reduction in men). The QCOVID4 algorithm for covid-19 explained 76.0% (95% confidence interval 73.9% to 78.2%) of the variation in time to covid-19 related death in men with a D statistic of 3.65 (3.43 to 3.86) and Harrell’s C statistic of 0.970 (0.962 to 0.979). Results were similar for women. QCOVID4 was well calibrated. QCOVID4 was substantially more efficient than the NHS Digital algorithm for correctly identifying patients at high risk of covid-19 related death. Of the 461 covid-19 related deaths in the validation cohort, 333 (72.2%) were in the QCOVID4 high risk group and 95 (20.6%) in the NHS Digital high risk group.
Conclusion The QCOVID4 risk algorithm, modelled from data during the period when the omicron variant of the SARS-CoV-2 virus was predominant in England, now includes vaccination dose and previous SARS-CoV-2 infection, and predicted covid-19 related death among people with a positive test result. QCOVID4 more accurately identified individuals at the highest levels of absolute risk for targeted interventions than the approach adopted by NHS Digital. QCOVID4 performed well and could be used for targeting treatments for covid-19 disease.
During the first waves of the covid-19 pandemic, before the introduction of vaccines, identifying people at highest risk of severe covid-19 outcomes if they were to become infected with the SARS-CoV-2 virus was necessary. The QCOVID risk assessment tool for predicting risk of covid-19 related death or hospital admission based on individual characteristics was developed,1 independently externally validated in England,2 Wales,3 and Scotland,4 and was found to have performed well in identifying those at high risk of severe outcomes from covid-19. QCOVID was used in England in February 2021 to identify patients at high risk of severe covid-19 outcomes, adding another 1.5 million people to the national list of patients who required shielding. QCOVID was also used for prioritising people for vaccination across the UK (if they had not already been offered the vaccine based on their age or other risk classification).5 The QCOVID model was initially recalibrated during the first pandemic wave1 and then updated after the second and third waves of the pandemic to create two new versions of the model: QCOVID2, based on patients who were not vaccinated,6 and QCOVID3, based on patients who were partially vaccinated.6 These models accounted for changes that had occurred in the virus as well as the implementation of the vaccination programme.6
In December 2021 in the UK, a new wave of covid-19 infections with the omicron variant of the SARS-CoV-2 virus rapidly replaced high circulating levels of the previous delta variant. Although the omicron variant (BA.1) was associated with a lower risk of covid-19 related death than the delta variant,7 further mutations have occurred and concerns have been raised that covid-19 vaccines might become less effective. More treatments are likely to be needed to protect vulnerable individuals, such as antiviral agents and neutralising monoclonal antibodies.8 On 9 December 2021, neutralising monoclonal antibodies became available in the UK for high risk patients with symptoms of SARS-CoV-2 infection who did not require hospital admission.9 Neutralising monoclonal antibodies are a limited resource and hence have been targeted to those at highest risk of poor covid-19 outcomes who are most likely to benefit.910 This strategy was based on a set of clinical conditions associated with a high relative risk of severe outcomes from the published literature,611 combined with clinical judgment about the likelihood of clinical benefit based on the biological mechanism for neutralising monoclonal antibodies.9 Patients with these conditions were then identified from centrally held electronic health records and contacted by NHS Digital in December 2021 to inform them of their potential eligibility for neutralising monoclonal antibodies should they develop symptoms of SARS-CoV-2 infection. The guidance did not, however, account for the cumulative absolute risk associated with multiple comorbidities, age, previous infection, vaccination status, or the new variants of the SARS-CoV-2 virus.
This study was commissioned by the UK’s Department of Health and Social Care. The aims of the study were to develop and validate a new QCOVID risk algorithm (QCOVID4) based on new data from the period when the omicron variant of the SARS-CoV-2 virus was predominant in England, taking into account previous infection with the virus and number of doses of the covid-19 vaccine. We also compared the performance of the QCOVID4 algorithm with that of earlier versions of the risk model developed in the first two waves of the covid-19 pandemic, and with the high risk cohort identified by NHS Digital, which was based on the relative risks of a list of conditions. We hope that the results will be used to inform ongoing strategies for targeting treatments and other public health interventions designed to protect those most at risk from covid-19 related death and hospital admission.
We used the QResearch database (version 47) of 12 million current patients with personal, clinical, and drug use data. The database is used for epidemiological112 and drug safety research.1314 QResearch is linked to multiple datasets at the individual patient level. For this analysis, we used the following linked datasets:
National Immunisation Management System (NIMS) database of covid-19 vaccinations to identify data on dates and doses of vaccines for all people who received a vaccine in England
Hospital Episode Statistics (HES) dataset supplemented by the more regularly updated Secondary Users Service (SUS-PLUS) data
Civil registration national data for mortality, with date, and up to 15 causes of death
SARS-CoV-2 infection data (Second Generation Surveillance System (SGSS) and Pillar 2)
Systemic anticancer treatment data
NHS Digital high risk cohort prioritised for new covid-19 treatments in December 2021.
Study design and period for cohort
We undertook a cohort study of all individuals aged 18-100 years who had one or more positive SARS-CoV-2 test results from 11 December 2021 (the date of the first notified patient with the omicron variant of the SARS-CoV-2 virus in the UK) to 31 March 2022 (the date after which widespread free NHS tests for SARS-CoV-2 were not available). Individuals were followed from the date of their first SARS-CoV-2 test result during the study period to the outcome of interest, or to their death or the end of the study on 30 June 2022 (the latest date for which data on mortality and hospital admissions were available).
Outcomes for cohort
The primary outcome was time to covid-19 related death (either in hospital or in the community), recorded in any position on the death certificate, or death within 28 days of a positive SARS-CoV-2 test result. The secondary outcome was time to hospital admission for covid-19, defined as confirmed or suspected covid-19 based on the ICD-10 (international classification of diseases, 10th revision) codes U071 and U072. We used these definitions of outcomes for consistency with other QCOVID algorithms and because they are used for covid-19 related death and hospital admission in the UK.15
Candidate predictor variables were those previously identified as associated with an increased risk of covid-19 related death or admission to hospital for covid-19 from the original QCOVID protocol16 and the published literature.161217 The variables were age, sex, ethnic group (based on categories in the 2021 census of England and Wales, www.ethnicity-facts-figures.service.gov.uk/style-guide/ethnic-groups), Townsend material deprivation score (an area level score based on postcode where higher scores indicate higher levels of deprivation18), number of vaccine doses (none, 1, 2, 3, or ≥4), previous SARS-CoV-2 infection, body mass index,17 residence (care home, homeless, or neither), chronic kidney disease, chemotherapy in the previous 12 months, type 1 or type 2 diabetes (with haemoglobin A1C <59 or ≥59 mmol/mol), blood cancer, bone marrow transplant in the past six months, respiratory cancer, radiotherapy in the past six months, solid organ transplant, chronic obstructive pulmonary disease, asthma, rare lung diseases (cystic fibrosis, bronchiectasis, or alveolitis), pulmonary hypertension or pulmonary fibrosis, coronary heart disease, stroke, atrial fibrillation, heart failure, venous thromboembolism, peripheral vascular disease, congenital heart disease, dementia, Parkinson’s disease, epilepsy, Down’s syndrome, rare neurological conditions (motor neurone disease, multiple sclerosis, myasthenia, or Huntington’s chorea), cerebral palsy, osteoporotic fracture, rheumatoid arthritis or systemic lupus erythematosus, liver cirrhosis, bipolar disorder or schizophrenia, inflammatory bowel disease, sickle cell disease, HIV or AIDS, and severe combined immunodeficiency.
We defined predictors based on information recorded in primary care electronic health records at the start of follow-up (date of the first positive test result for SARS-CoV-2 during the study period), except for data for the number of SARS-CoV-2 infections, covid-19 vaccinations, chemotherapy, radiotherapy, and transplants, which were based on linked secondary care data. For all predictor variables, we used the most recently recorded available value at the cohort entry date.
We used a random sample of 90% of practices to develop the models, and the remaining 10% of practices for validation of the model.6 We developed separate risk models for men and women, with Cox proportional hazards models to calculate hazard ratios for the two outcomes. Data for sex were taken from information in the QResearch database, which are likely to reflect the information reported by the patient to the general practice at registration. We decided not to use the landmarking approach or adjust for infection rate, as in QCOVID3,6 because our target population was those with a positive test result for SARS-CoV-2 and follow-up was from the date of the positive test result. We used second degree fractional polynomials to model non-linear relations for continuous variables, including age, body mass index, and Townsend material deprivation score, by using the non-imputed complete data.18 We used multiple imputation with chained equations to impute missing values for ethnic group, Townsend score, body mass index, and haemoglobin A1C level. We carried out five imputations and fitted the prediction models in each imputed dataset. We used Rubin’s rules19 to combine the model parameter estimates across the imputed datasets.
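As an illustration of the pooling step, the sketch below applies Rubin's rules to a single model coefficient in Python. It is not the study's code (the analyses were run in Stata); the function name `pool_rubin` and the five log hazard ratios and variances are invented for illustration only.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)     # average of the per-imputation estimates
    within = mean(variances)              # within-imputation variance
    between = variance(estimates)         # between-imputation variance (m - 1 denominator)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance

# Made-up log hazard ratios and squared standard errors from five imputations
log_hr = [0.50, 0.52, 0.48, 0.51, 0.49]
se_squared = [0.010, 0.011, 0.009, 0.010, 0.010]

pooled, total_var = pool_rubin(log_hr, se_squared)
print(f"pooled log HR = {pooled:.3f}, total variance = {total_var:.5f}")
```

The total variance is larger than the average within-imputation variance because it also carries the uncertainty introduced by imputing the missing values.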
We fitted full models and inspected the results. We then retained variables in the final models that were significant at the 5% level and where adjusted hazard ratios were >1.1 (for binary variables). We combined clinically similar variables with low numbers of events. We examined interactions between predictor variables and age. We assessed model optimism by calculating Van Houwelingen’s measure of heuristic shrinkage.20 We used post-estimation methods to estimate the baseline survivor function at 90 days from the Cox regression model, as described by Kalbfleisch and Prentice,21 based on zero values of centred continuous variables, with all binary predictor values set to zero. This value is analogous to the Kaplan-Meier product limit estimate.21 We used the mean of these values across all imputations in the calculation of predicted risks. Lastly, we combined regression coefficients from the final models with these estimates of the baseline survivor function evaluated at 90 days to derive risk equations for each outcome.
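For a Cox model, the last step amounts to computing the 90 day predicted risk as 1 − S0(90)^exp(PI), where S0(90) is the baseline survivor function at 90 days and PI is the sum of each coefficient multiplied by its predictor value. The Python sketch below shows the calculation under loud assumptions: the baseline value of 0.9999 is hypothetical, and the toy model has only two binary predictors whose coefficients are the natural logs of the adjusted hazard ratios reported for men in this paper (0.58 for one vaccine dose, 0.51 for previous infection). The published QCOVID4 equations contain many more terms.

```python
import math

# Hypothetical baseline survivor function at 90 days (NOT the published value)
BASELINE_SURVIVAL_90D = 0.9999

# Toy coefficients: logs of two adjusted hazard ratios reported for men
COEFFICIENTS = {
    "one_vaccine_dose": math.log(0.58),
    "previous_infection": math.log(0.51),
}

def predicted_risk_90d(covariates, coefficients=COEFFICIENTS, s0=BASELINE_SURVIVAL_90D):
    """Return 1 - S0(90) ** exp(prognostic index) for one person."""
    prognostic_index = sum(
        beta * covariates.get(name, 0) for name, beta in coefficients.items()
    )
    return 1.0 - s0 ** math.exp(prognostic_index)

# Reference person: all binary predictors set to zero
reference = predicted_risk_90d({})
# Same person with a previous SARS-CoV-2 infection
previously_infected = predicted_risk_90d({"previous_infection": 1})
print(f"{reference:.6f} {previously_infected:.6f}")
```

When absolute risks are this small, the ratio of the two predicted risks is very close to the hazard ratio itself.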
We evaluated the performance of the model in the validation cohort. We used multiple imputation to replace missing values for ethnic group, haemoglobin A1C levels, body mass index, and Townsend score, with the same imputation model as in the derivation cohort. We applied the final risk equations to calculate the predicted risks for each outcome. We calculated Harrell’s C statistics,22 Royston’s R2 values, and associated D statistics,23 and combined them across imputed datasets with Rubin’s rules.19 We ran a Cox proportional hazards model to calculate the calibration slope with the prognostic index over the study period. We generated smoothed calibration plots with the running command in Stata to compare observed event probabilities and predicted risks at 90 days across the range of predicted risks. Pseudovalues were used to generate the observed event probabilities determined at 90 days with the stpsurv command in Stata.24
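The link between the D statistic and the explained variation can be checked by hand. The sketch below uses the standard Royston and Sauerbrei conversion for Cox models, R² = (D²/κ²) / (π²/6 + D²/κ²) with κ² = 8/π, and applies it to the D statistics reported in this paper for covid-19 related death. The function name `r2_from_d` is illustrative, and because the inputs are rounded, pooled values, the output only approximately reproduces the reported R².

```python
import math

def r2_from_d(d_statistic):
    """Convert Royston's D statistic to explained variation (R2) for a Cox model."""
    kappa_squared = 8 / math.pi      # scaling constant for D
    sigma_squared = math.pi ** 2 / 6 # error variance on the log hazard scale
    scaled = d_statistic ** 2 / kappa_squared
    return scaled / (sigma_squared + scaled)

r2_men = r2_from_d(3.65)    # reported R2 in men: 76.0%
r2_women = r2_from_d(3.70)  # reported R2 in women: 76.6%
print(f"men {100 * r2_men:.1f}%, women {100 * r2_women:.1f}%")
```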
We calculated each performance metric in the whole validation cohort and in subgroups for age, ethnic group (where numbers allowed), and geographical region. We compared model discrimination with predicted risks calculated with earlier versions of QCOVID developed on populations that were not vaccinated: during the first wave of the pandemic, QCOVID1 was developed on the total population that was not vaccinated, between 24 January 2020 and 30 April 2020; and during the second pandemic wave, QCOVID2 was developed on patients with a positive test result for SARS-CoV-2 who were not vaccinated, between 8 December 2020 and 21 June 2021, when the alpha and delta variants of the SARS-CoV-2 virus were dominant. The predicted risks based on these models did not therefore account for the vaccination status of individuals in the validation cohort.
We decided not to include QCOVID3 because it was modelled with a population who were vaccinated. Also, QCOVID3 was designed to follow up participants from 14 days after the date of vaccination whereas the follow-up for QCOVID4 was from the date of a positive SARS-CoV-2 test result. QCOVID4 also included people who were not vaccinated as well as people who had received up to four doses of vaccine, so we considered that QCOVID3 was less applicable to our study population.
We applied the QCOVID4 algorithms to the validation cohort to define the centile thresholds based on absolute predicted risk. We calculated sensitivity as the number of covid-19 related deaths among patients with a predicted risk above the risk threshold divided by the total number of covid-19 related deaths.
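A minimal Python sketch of this classification step follows. It ranks people by predicted risk, flags a top fraction of the cohort, and computes sensitivity and specificity against a binary death indicator. The ten predicted risks and outcomes are toy values, the helper name `classify_top_fraction` is hypothetical, and the sketch ignores censoring.

```python
def classify_top_fraction(risks, died, top_fraction):
    """Flag the top fraction by predicted risk; return (sensitivity, specificity)."""
    n = len(risks)
    n_flagged = max(1, round(n * top_fraction))
    # Indices ordered from highest to lowest predicted risk
    order = sorted(range(n), key=lambda i: risks[i], reverse=True)
    flagged = set(order[:n_flagged])

    total_deaths = sum(died)
    true_positives = sum(1 for i in flagged if died[i])
    false_positives = n_flagged - true_positives
    survivors = n - total_deaths

    sensitivity = true_positives / total_deaths
    specificity = (survivors - false_positives) / survivors
    return sensitivity, specificity

# Toy cohort of ten people (illustrative values only)
risks = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
died = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

sens, spec = classify_top_fraction(risks, died, top_fraction=0.20)
print(f"sensitivity {sens:.3f}, specificity {spec:.3f}")
```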
We also compared risk stratification using QCOVID4 to identify the top 2.5% of patients at highest absolute predicted risk in the validation cohort (because this percentage reflected the actual percentage of high risk patients identified by NHS Digital) with the current recommended guidelines which have selected a high risk cohort based on relative risks of selected medical conditions.
Decision curve analysis
We used decision curve analysis in the validation cohort accounting for censoring to evaluate the net benefits of the new risk equations.25 This approach assesses the benefits of correctly detecting people who will have a covid-19 related death (for example) compared with the harms from a false positive classification (which could lead to unnecessary intervention). The net benefit of a risk equation at a specific risk threshold is calculated from the difference between the proportion of true positive results and the proportion of false positive results multiplied by the odds defined by the risk threshold value.26 We calculated the net benefits of QCOVID4 compared with QCOVID2 in the whole cohort and in regional subgroups, across a range of threshold probabilities, and compared these with alternative strategies, such as assuming that all patients are treated or that no patients are treated. In general, the strategy with the highest net benefit at any given risk threshold is considered to have the most clinical value.
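The net benefit calculation described above can be sketched in Python as follows. This simplified version uses binary outcomes and does not account for censoring, which the study's analysis did; the function names and the toy data are illustrative assumptions.

```python
def net_benefit(risks, events, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    n = len(risks)
    true_pos = sum(1 for r, e in zip(risks, events) if r >= threshold and e)
    false_pos = sum(1 for r, e in zip(risks, events) if r >= threshold and not e)
    odds = threshold / (1 - threshold)  # weight applied to false positives
    return true_pos / n - (false_pos / n) * odds

def net_benefit_treat_all(events, threshold):
    """Net benefit of the 'treat all' strategy at the same threshold."""
    prevalence = sum(events) / len(events)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy cohort (illustrative values only); 'treat none' always has net benefit 0
risks = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
events = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

nb_model = net_benefit(risks, events, threshold=0.5)
nb_all = net_benefit_treat_all(events, threshold=0.5)
print(f"model {nb_model:.2f}, treat all {nb_all:.2f}, treat none 0.00")
```

Repeating the calculation across a range of thresholds and plotting each strategy gives the decision curve.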
We adhered to the RECORD (REporting of studies Conducted using Observational Routinely collected health Data)27 and TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) statements for reporting.28 We used all of the available data on the database to maximise the power and generalisability of the results. We used Stata (version 17) for analyses.
Patient and public involvement
Patients were involved in framing the research question, identifying predictors, and in developing plans for design and implementation of the QCOVID risk tool. A citizen’s jury convened by the Scottish government evaluated earlier versions of the QCOVID algorithm and highlighted the importance of keeping it up-to-date and maintaining transparency over its use.29
Baseline characteristics of study cohorts
The QResearch database (version 47) included 1430 practices. We allocated 1287 practices to the derivation cohort and 143 to the validation cohort. Of the 9 526 580 patients aged 18-100 years in the derivation cohort, 1 297 922 (13.6%) had a positive test result for SARS-CoV-2 during the study period. Of these, 18 756 (1.5%) were admitted to hospital for covid-19 and 3878 (0.3%) had a covid-19 related death during follow-up. Of the 1 064 255 patients in the validation cohort, 145 397 (13.7%) had a positive test result for SARS-CoV-2 and were included in the analysis. Of these, 2124 (1.5%) were admitted to hospital for covid-19 and 461 (0.3%) had a covid-19 related death.
Table 1 shows the baseline characteristics of those who had a positive test result for SARS-CoV-2, those with a covid-19 related death, and those admitted to hospital for covid-19 in the derivation cohort. Supplementary table 1 shows the corresponding results for the validation cohort. Mean age in the derivation cohort for those with a positive SARS-CoV-2 test result was 42.4 years (standard deviation 16.4), for those admitted to hospital for covid-19, 55.6 years (22.1), and for those with a covid-19 related death, 80.9 years (12.3).
Factors associated with increased or decreased risk of severe covid-19 outcomes
Figure 1 shows the adjusted hazard ratios for variables in the final QCOVID4 model for covid-19 related death in women. Figure 2 shows the corresponding results for men. Figure 3 and figure 4 show the adjusted hazard ratios for variables in the final model for hospital admission for covid-19 in women and men, respectively. Supplementary figure 1 shows the adjusted hazard ratios for fractional polynomial terms for each of the models for age and body mass index. The full model, model coefficients, functional form, and baseline survival function are published at https://bmj2022.qcovid.org. Values for Van Houwelingen’s heuristic shrinkage20 were close to one (for covid-19 related death the value was 0.99 in women and men; the corresponding values for hospital admission for covid-19 were both 1.00). Based on our large sample size and these shrinkage values, we considered that modifying the final model to account for model optimism was not needed.
The rate of covid-19 related death in men increased steeply with age and was associated with deprivation score. In the final model in men, adjusted hazard ratios were highest for those with these conditions (fig 2 shows the 95% confidence intervals): kidney transplant (6.1-fold increase), Down’s syndrome (4.9-fold), radiotherapy (3.1-fold), type 1 diabetes (3.4-fold), chemotherapy grade A (3.8-fold), chemotherapy grade B (5.8-fold), chemotherapy grade C (10.9-fold), solid organ transplant ever (2.4-fold), dementia (1.6-fold), Parkinson’s disease (2.2-fold), and cirrhosis of the liver (2.5-fold). Other conditions associated with increased covid-19 related death in men included learning disability, chronic kidney disease (stages 4 and 5), blood cancer, respiratory cancer, immunosuppressants, oral steroids, chronic obstructive pulmonary disease, coronary heart disease, stroke, atrial fibrillation, heart failure, thromboembolism, rheumatoid arthritis or systemic lupus erythematosus, schizophrenia or bipolar disorder, sickle cell disease, HIV, or severe combined immunodeficiency, and type 2 diabetes. We found no association between covid-19 related death and residence, asthma, rare pulmonary conditions, cerebral palsy, or congenital heart disease. We found no difference in risk according to haemoglobin A1C levels, so we included type 1 and type 2 diabetes as binary rather than categorical variables. These results were generally similar in women (fig 1).
We found no significantly increased risks of covid-19 related death in men or women in other ethnic groups compared with the white group. An increased risk of hospital admission for covid-19 was found among Bangladeshi, Pakistani, and other Asian men and women, and for black African women (fig 3 and fig 4).
The rate of covid-19 related death was lower among those who were vaccinated against covid-19 than those who were not vaccinated, with evidence of a dose-response relation (supplementary table 2). For example, compared with men who were not vaccinated, we found a 42% risk reduction associated with one dose of vaccine (adjusted hazard ratio 0.58, 95% confidence interval 0.43 to 0.79) and a 92% reduction in risk with four or more doses (0.08, 0.06 to 0.12). The reduced risks of covid-19 related death associated with covid-19 vaccination doses were similar in women. The risk of hospital admission for covid-19 was also reduced in men and women who were vaccinated.
Previous SARS-CoV-2 infection was associated with a 49% reduced risk of covid-19 related death in men (adjusted hazard ratio 0.51, 95% confidence interval 0.40 to 0.64) and a 45% reduced risk in women (0.55, 0.45 to 0.67), independent of age, ethnic group, vaccination status, and other factors included in the final QCOVID4 models. The 39% reduction in the risk of hospital admission associated with previous SARS-CoV-2 infection for men (0.61, 0.56 to 0.68) was similar to that in women (33%; 0.67, 0.63 to 0.72).
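The percentage reductions quoted in this section follow directly from the adjusted hazard ratios: reduction = (1 − hazard ratio) × 100. A short Python check using the hazard ratios reported above (the dictionary labels are our own shorthand):

```python
def percent_reduction(hazard_ratio):
    """Relative risk reduction implied by a hazard ratio below 1, in percent."""
    return round((1 - hazard_ratio) * 100)

# Adjusted hazard ratios reported in this section
reported = {
    "death, men, one vaccine dose": 0.58,
    "death, men, four or more doses": 0.08,
    "death, men, previous infection": 0.51,
    "death, women, previous infection": 0.55,
    "admission, men, previous infection": 0.61,
    "admission, women, previous infection": 0.67,
}

reductions = {label: percent_reduction(hr) for label, hr in reported.items()}
for label, value in reductions.items():
    print(f"{label}: {value}% reduction")
```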
Table 2 shows the explained variation and discrimination of QCOVID1, QCOVID2, and QCOVID4 models in the validation cohort for women and men overall for covid-19 related death and hospital admission. The QCOVID4 algorithm explained 76.6% (95% confidence interval 74.4% to 78.8%) of the variation (R2) in time to covid-19 related death in women. The D statistic was 3.70 (3.48 to 3.93) and Harrell’s C statistic was 0.965 (0.951 to 0.978). The corresponding results for covid-19 related death in men were similar, with R2 of 76.0% (73.9% to 78.2%), D statistic 3.65 (3.43 to 3.86), and C statistic 0.970 (0.962 to 0.979).
The performance of QCOVID4 for covid-19 related death was slightly improved compared with both QCOVID1 and QCOVID2. For hospital admissions for covid-19, however, QCOVID4 had substantially improved performance compared with QCOVID2, which had improved performance compared with QCOVID1. For example, Harrell’s C statistic in men for QCOVID4 was 0.970 (95% confidence interval 0.963 to 0.977) compared with 0.932 (0.918 to 0.945) for QCOVID2 and 0.798 (0.781 to 0.814) for QCOVID1.
Supplementary table 3 shows the performance of QCOVID4 in subgroups by age and ethnic group. Supplementary table 4 shows the performance of QCOVID4 in subgroups by geographical region. Performance measures were generally higher in younger age groups and similar across ethnic and regional groups (where sufficient numbers in the subgroup existed to conduct an analysis).
Figure 5 and figure 6 show the smoothed calibration plots for covid-19 related death and hospital admission for covid-19 with QCOVID4 to assess calibration in the validation cohort. The results showed that the QCOVID4 model was well calibrated to the contemporaneous validation dataset overall, with a small degree of underprediction for covid-19 related death, and underprediction at lower risks and overprediction at higher risks for hospital admission for covid-19.
Figure 7 and figure 8 show the corresponding results for QCOVID2, which was less well calibrated, with substantial overprediction of risks for both covid-19 related death and admission to hospital. Supplementary figure 2 shows the corresponding results for QCOVID4 by ethnic group in men and women and a histogram of the distribution of predicted risks. Supplementary figure 3 shows the corresponding results for QCOVID4 by region in men and women. Supplementary figure 4 shows the corresponding results for QCOVID4 by age group. We found evidence of miscalibration in some subgroups, although the low numbers of events in some ethnic groups and regions resulted in wide confidence intervals.
Table 3 shows the classification statistics in the validation cohort for men and women for 5% increments in centile values for the predicted covid-19 mortality risk with QCOVID4. For example, for the 20% of the cohort at the highest predicted risk (ie, those with a 90 day predicted risk of 0.075% or higher), sensitivity was 97.8%, specificity was 80.2%, and the observed risk at 90 days was 1.54%. The corresponding values for the top 5% at the highest predicted risk were sensitivity 87.6%, specificity 95.3%, and observed 90 day risk of 5.52%.
We identified 34 864 patients in the NHS Digital high risk cohort in the QResearch database, of whom 3600 were in the validation cohort. Table 4 shows the characteristics of these 3600 patients compared with the characteristics of 3600 patients (top 2.48%) with the highest predicted risks of covid-19 related death with the QCOVID4 models. Patients in the QCOVID4 high risk group tended to be much older (mean 85.0 v 55.4 years) with higher levels of comorbidities, with some exceptions (eg, chronic kidney disease stage 5, blood cancer, grade B chemotherapy, and rare neurological conditions). In total, 520 patients were included in both the QCOVID4 and NHS Digital high risk groups. Of the 461 covid-19 related deaths in the validation cohort, 333 (72.2%) were in the QCOVID4 high risk group, 307 (66.6%) in the QCOVID2 high risk group (top 2.48%), and 95 (20.6%) in the NHS Digital high risk group. Uptake of covid-19 treatments (both antiviral agents and neutralising monoclonal antibodies) was low, with 504 (14.0%) patients in the NHS Digital high risk group receiving treatment and 131 (3.6%) in the QCOVID4 high risk cohort.
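The proportions in this comparison can be reproduced from the reported counts. The Python sketch below recomputes the share of the 461 covid-19 related deaths captured by each equally sized high risk group, and the overlap between the QCOVID4 and NHS Digital groups; variable names are our own.

```python
TOTAL_DEATHS = 461   # covid-19 related deaths in the validation cohort
GROUP_SIZE = 3600    # size of each high risk group

# Deaths falling inside each high risk group, as reported above
deaths_captured = {"QCOVID4": 333, "QCOVID2": 307, "NHS Digital": 95}

share_of_deaths = {
    name: round(100 * count / TOTAL_DEATHS, 1)
    for name, count in deaths_captured.items()
}

# 520 patients appeared in both the QCOVID4 and NHS Digital groups
overlap_percent = round(100 * 520 / GROUP_SIZE, 1)

# Ratio of deaths captured by QCOVID4 versus the NHS Digital approach
capture_ratio = deaths_captured["QCOVID4"] / deaths_captured["NHS Digital"]

print(share_of_deaths, overlap_percent, round(capture_ratio, 1))
```

Because the groups are the same size, the ratio of deaths captured is also the ratio of the death rates in the two groups.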
Comparing QCOVID2 and QCOVID4, 2962 people were in the high risk groups on both measures (287 covid-19 related deaths), 638 were high risk on QCOVID2 and low risk on QCOVID4 (18 covid-19 related deaths), and 638 were high risk on QCOVID4 and low risk on QCOVID2 (46 covid-19 related deaths). Overall, we found 28 more covid-19 related deaths (6.1% of the total) which would be correctly identified in the QCOVID4 high risk group compared with the QCOVID2 high risk group.
Figure 9 and figure 10 show the decision analysis curves which indicated the improved net benefit of using QCOVID4 compared with QCOVID2, and of both algorithms compared with strategies of treating all patients or treating none. Supplementary figure 5 shows the decision analysis curves by geographical region. These curves also indicate the potential clinical use of the models despite some lower net benefit in some of the regional subgroups.
We have developed and validated a new QCOVID model (QCOVID4) based on data recorded during the period when the omicron variant of the SARS-CoV-2 virus was the predominant strain in England. QCOVID4 more accurately identified individuals at highest levels of absolute risk for targeted interventions than the conditions based approach adopted by NHS Digital, which is based on the relative risks of a list of medical conditions. QCOVID4 was also more accurate and had a higher net benefit than our earlier QCOVID2 algorithm at identifying high risk patients and had substantially better calibration in the current population. This finding likely reflects the largely vaccinated population and the lower virulence of the omicron compared with the alpha variant of the SARS-CoV-2 virus.
We also compared discrimination with earlier versions of the QCOVID algorithms on this dataset. The earlier QCOVID models were developed during the first wave, when the original variant was circulating (QCOVID1), and during the second wave, when the alpha and delta variants of the SARS-CoV-2 virus were dominant (QCOVID2). Overall, the factors associated with an increased risk in earlier models16 were still associated with an increased risk in the QCOVID4 model, apart from residence, asthma, rare pulmonary conditions, cerebral palsy, and congenital heart disease. Another exception was ethnic group, where the previously increased risks of covid-19 related death, particularly those associated with South Asian and black African and Caribbean ethnic groups in QCOVID1 (reference 1) and QCOVID2 (reference 6), were no longer apparent in QCOVID4, although these results should be interpreted with caution because of possible differential testing rates. However, we saw a residual increased risk for Pakistani, Bangladeshi, and other Asian groups for admission to hospital for covid-19 compared with the white group for both men and women and for black African women, after adjustment for age, deprivation score, comorbidity, and vaccination status.
We have shown that infection with the SARS-CoV-2 virus before the study period was associated with about a 50% lower risk of covid-19 related death in both men and women. This finding was independent of age, ethnic group, deprivation score, comorbidity, and vaccination status. Similarly, we saw a dose dependent reduction in the risk of covid-19 related mortality in men and women after covid-19 vaccination, with each subsequent dose associated with a lower risk.
The validation showed that all three models (QCOVID4, QCOVID1, and QCOVID2) have high levels of discrimination and explained variation for covid-19 related death in this dataset. The QCOVID4 model has substantially improved discrimination and explained variation for predicting the risk of hospital admission for covid-19. Of those identified by NHS Digital as a high risk group for targeted treatments (2.5% of total), only 14% were also included in an equivalently sized high risk group with QCOVID4. The covid-19 related death rate in the QCOVID4 high risk group was three times higher than that in the NHS Digital cohort, supporting the face validity of QCOVID4. This difference was not explained by the use of treatment interventions, which was low in both groups. Similarly, QCOVID2 was more accurate in identifying high risk patients than the NHS Digital approach.
The validation results also showed that the QCOVID4 model was well calibrated in the contemporaneous validation dataset, with a small degree of underprediction of risk for covid-19 related death and overprediction at the highest levels of risk for hospital admission for covid-19. QCOVID1 was developed in a general population to estimate the combined risk of catching and dying from covid-19, because few testing data were available in the first wave of the pandemic, and so its estimates of absolute risk would not be valid for predicting covid-19 related death in people with a positive test result for SARS-CoV-2. QCOVID2 overpredicted risk, which probably reflects mainly higher levels of vaccination, better treatments, and differences in the variant type, with the omicron variant of the virus now generally considered to be less severe than earlier variants [7, 30]. Taken together, QCOVID1 and QCOVID2 remain acceptable for ranking those at highest risk of death for interventions but are less robust than QCOVID4 for predicting the absolute risk of each outcome.
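Calibration of this kind is often inspected by comparing mean predicted risk with the observed proportion of events within groups ordered by predicted risk. The sketch below is a simplified illustration under stated assumptions: it treats the outcome as binary and ignores censoring and competing risks, which a time to event analysis would need to handle. The function name and the toy data are hypothetical.

```python
from typing import Sequence


def calibration_by_group(
    risks: Sequence[float], events: Sequence[int], n_groups: int = 10
) -> list[tuple[float, float]]:
    """Return (mean predicted risk, observed event proportion) for each risk group."""
    pairs = sorted(zip(risks, events))  # ascending predicted risk
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        # Integer arithmetic splits the sorted data into near equal groups.
        chunk = pairs[g * n // n_groups : (g + 1) * n // n_groups]
        if not chunk:
            continue
        mean_predicted = sum(r for r, _ in chunk) / len(chunk)
        observed = sum(e for _, e in chunk) / len(chunk)
        rows.append((mean_predicted, observed))
    return rows


# Hypothetical toy data split into two groups.
rows = calibration_by_group([0.1, 0.3, 0.1, 0.3], [0, 1, 0, 0], n_groups=2)
for mean_predicted, observed in rows:
    # observed > mean_predicted suggests underprediction in that group,
    # observed < mean_predicted suggests overprediction.
    print(f"predicted={mean_predicted:.3f} observed={observed:.3f}")
```

In a well calibrated model the two values in each row are close; systematic gaps at the top groups correspond to the over- or underprediction at the highest levels of risk described above.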
Strengths and limitations of the study
Our study had major strengths and some limitations. Some relate specifically to covid-19, and others are characteristic of widely used clinical risk prediction algorithms developed with the QResearch database [31, 32, 33]. Key strengths included the use of large, representative, population based, contemporaneous data sources, which have been used to develop other widely used risk prediction tools [31, 32]; an extensive set of diagnostic predictors from electronic healthcare records; prospective recording of outcomes and their determination with multiple national level database linkage; lack of selection, recall, and respondent biases; and robust statistical analysis. The size of the dataset makes overfitting unlikely [34]. We used non-linear terms to model associations with body mass index and age, and multiple imputation to handle missing data. Finally, the update to the algorithm is a strength; prediction algorithms are seldom updated, despite changes in the natural history of a disease and the introduction of interventions.
Limitations included the relatively small numbers of events in some of the subgroups, which is an inevitable consequence of undertaking an analysis of multiple subgroups during a relatively short wave of a pandemic. Although we have accounted for many risk factors for severe covid-19 outcomes, additional risks might arise from rare medical conditions or other factors associated with infection, such as occupation, that are poorly recorded in general practice or hospital records. We could not separately identify those patients admitted to hospital because of SARS-CoV-2 infection from those admitted for other reasons where a coincidental infection was present. We used the first positive test result during the study period rather than accounting for repeat infections, although our study window was short (maximum 3.5 months) so the number of new reinfections should be small. Also, our study did not look at outcomes related to the recent wave in England of the omicron BA.4/BA.5 subvariants, which were identified as variants of concern on 18 May 2022.
Although we have reported a validation based on practices from QResearch, these practices were separate from those used to develop the model. Previously we have used this approach to develop and validate other widely used prediction models. When these models have been validated on data from different clinical computer systems, the results have been similar [35, 36, 37]. Work has now been completed to evaluate earlier QCOVID models in external datasets, including the English national dataset hosted by the Office for National Statistics [2] and data from Scotland [4] and Wales [3], and these evaluations also showed similar levels of performance to the validation in QResearch practices.
Implications for clinical practice, policy, and research
The use of a risk prediction model depends on the purpose for which it has been designed, the setting in which it has been developed, and where it might be used. Our model provides a mechanism to estimate risk if a person has a positive test result and might require treatment. The speed at which new SARS-CoV-2 variants of concern have emerged and become dominant means that prediction models could be out of date almost as soon as they have been developed and implemented. This study, with its comparisons of algorithms developed over the past three major waves of the pandemic, each associated with different variant types, provides evidence that the discrimination of the QCOVID1 and QCOVID2 algorithms remains good. This finding means that the latest QCOVID algorithm is likely to be effective for risk stratifying or ranking individuals for interventions, even potentially when a new variant emerges, although continual monitoring for changes in risk factors is warranted. Algorithms might need to be recalibrated before being used to calculate absolute risks in settings or time periods with different rates for covid-19 related death or hospital admission for covid-19. QCOVID4 can be temporally recalibrated with a similar method to that used to temporally recalibrate QCOVID1 [1].
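As an illustration only, one generic way to recalibrate a proportional hazards style model is to rescale its baseline cumulative hazard so that the mean predicted risk matches the event rate observed in the new period, leaving the ranking of individuals unchanged. The sketch below assumes risk is computed as 1 - S0^exp(lp); it is not claimed to be the method used to recalibrate QCOVID1, and all names and values are hypothetical.

```python
import math
from typing import Sequence


def predicted_risk(baseline_survival: float, linear_predictor: float, scale: float = 1.0) -> float:
    """Risk = 1 - S0 ** (scale * exp(lp)); scale multiplies the baseline cumulative hazard."""
    return 1.0 - baseline_survival ** (scale * math.exp(linear_predictor))


def recalibrate_scale(
    baseline_survival: float,
    linear_predictors: Sequence[float],
    observed_risk: float,
    lo: float = 1e-6,
    hi: float = 1e6,
    iterations: int = 200,
) -> float:
    """Find the hazard multiplier whose mean predicted risk equals observed_risk."""

    def mean_risk(scale: float) -> float:
        total = sum(predicted_risk(baseline_survival, lp, scale) for lp in linear_predictors)
        return total / len(linear_predictors)

    # Mean risk increases with scale, so bisection (on the log scale) converges.
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if mean_risk(mid) < observed_risk:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical example: the new period has half the baseline hazard of the old one.
s0 = 0.99
lps = [-1.0, 0.0, 1.0]
target = sum(predicted_risk(s0, lp, 0.5) for lp in lps) / len(lps)
scale = recalibrate_scale(s0, lps, target)
```

Because every individual's hazard is multiplied by the same factor, discrimination is unaffected; only the absolute risks shift towards the rates of the new setting or time period.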
The QCOVID4 algorithm, developed based on data from the period when the omicron variant of the SARS-CoV-2 virus was predominant in England, performed well in identifying those at highest risk of severe covid-19 outcomes. This approach can be used to risk stratify patients for intervention (such as covid-19 treatments) and inform clinical decision making on individualised risk management. This strategy could be more effective than an approach based on relative risks of individual medical conditions.
What is already known on this topic
The QCOVID risk assessment algorithm for predicting risk of covid-19 related death or hospital admission for covid-19 based on individual characteristics has been used in England to identify people at high risk of severe covid-19 outcomes
Treatments for covid-19 (monoclonal antibodies and antivirals) are available but need to be targeted to those at highest risk of severe outcomes
What this study adds
The QCOVID4 risk algorithm, based on data from the period when the omicron variant was predominant in England, now includes number of vaccination doses and previous SARS-CoV-2 infection
QCOVID4 performed well both for ranking individuals (discrimination) and predicting levels of absolute risk (calibration), and can be used for targeting covid-19 treatments as well as individualised risk assessment
QCOVID4 more accurately identified individuals at the highest levels of absolute risk than the approach for targeted interventions adopted by NHS Digital based on a list of medical conditions with increased relative risks
We acknowledge the contribution of EMIS practices who contribute to QResearch and EMIS Health, and the Universities of Nottingham and Oxford for expertise in establishing, developing, or supporting the QResearch database. This project involves data derived from patient level information collected by the NHS as part of the care and support of cancer patients. The data are collated, maintained, and quality assured by the National Cancer Registration and Analysis Service, which is now part of NHS Digital. The Hospital Episode Statistics data, SARS-CoV-2 results, and civil registration mortality data are used by permission from NHS Digital, which retains the copyright of those data. NHS Digital and Public Health England bear no responsibility for the analysis or interpretation of the data.