Prediction of aspects of vaccine misinformation
Multiple experimental trials were conducted to achieve the highest possible performance for the aspect prediction model. Of the models tested, the optimized LightGBM model with 50 features outperformed the others. This model was trained on the annotated vaccine misinformation dataset using a stratified split: 85% for training, with the remaining 15% divided equally between validation and testing. Table 5 shows the dataset distribution.
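As an illustration of the split described above, the following is a minimal sketch of a stratified 85% / 7.5% / 7.5% partition using only the Python standard library. The function name `stratified_split`, the seed, and the rounding rule are assumptions made for this example; the original experiments may have relied on a library routine instead.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.85, seed=0):
    """Return (train, val, test) index lists, stratified by label.

    Each label keeps roughly `train_frac` of its items in training;
    the remainder is divided as evenly as possible between validation
    and testing (any odd leftover item goes to testing).
    """
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * train_frac)
        n_val = (len(indices) - n_train) // 2
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


# Hypothetical, deliberately imbalanced label list (not the study's data).
example_labels = ["Agenda"] * 200 + ["Vaccine Constituent"] * 40
train_idx, val_idx, test_idx = stratified_split(example_labels)
```

Splitting within each class, rather than over the whole dataset at once, keeps the class proportions similar across the three partitions, which matters when the data are skewed.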
The LightGBM aspect classification model achieved a validation ensemble accuracy of 80.1% and a testing accuracy of 71.3%. Accuracy is the performance evaluation metric used throughout the evaluation of the experimental results. For each vaccine misinformation aspect, taking “Vaccine Constituent” as an example, a classification result falls into one of the following categories:
True Positive (TP): Predicted aspect is “Vaccine Constituent” and the ground truth is “Vaccine Constituent.”
True Negative (TN): Predicted aspect is any non-“Vaccine Constituent” aspect, and the ground truth is any non-“Vaccine Constituent” aspect.
False Positive (FP): Predicted aspect is “Vaccine Constituent” while the ground truth is any non-“Vaccine Constituent” aspect.
False Negative (FN): Predicted aspect is any non-“Vaccine Constituent” aspect, while the ground truth is “Vaccine Constituent.”
Accuracy is the percentage of correctly labeled tweets.
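The four outcomes defined above can be counted for a single aspect in a one-vs-rest fashion, and overall accuracy follows as the fraction of tweets whose predicted aspect equals the ground truth. The sketch below is illustrative only: the helper names and the six example labels are hypothetical and are not taken from the study's dataset.

```python
def one_vs_rest_counts(y_true, y_pred, aspect):
    """Count TP, TN, FP, FN for one aspect against all other aspects."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == aspect and truth == aspect:
            tp += 1
        elif pred != aspect and truth != aspect:
            tn += 1
        elif pred == aspect:
            fp += 1  # predicted the aspect, ground truth is another aspect
        else:
            fn += 1  # ground truth is the aspect, predicted another aspect
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted aspect matches the ground truth."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for six tweets.
y_true = ["Vaccine Constituent", "Agenda", "Adverse Effects",
          "Vaccine Constituent", "Agenda", "Efficacy and Clinical Trials"]
y_pred = ["Vaccine Constituent", "Vaccine Constituent", "Adverse Effects",
          "Agenda", "Agenda", "Efficacy and Clinical Trials"]

counts = one_vs_rest_counts(y_true, y_pred, "Vaccine Constituent")
overall = accuracy(y_true, y_pred)
```

In this toy example, the counts for “Vaccine Constituent” are one TP, three TN, one FP, and one FN, and four of the six tweets are labeled correctly.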
Table 6 shows the obtained per-class (i.e., per-aspect) accuracy.
Figure 3 shows the Receiver Operating Characteristic (ROC) curves for validation and testing. The obtained Area Under the ROC Curve (AUC) is 90.3% for validation and 89.6% for testing. These results indicate acceptable performance, considering the skewness of the data.
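For readers unfamiliar with the metric, the AUC of a binary (one-vs-rest) problem can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. It is a generic illustration with hypothetical scores, and it makes no assumption about how the multi-class AUC values reported here were averaged.

```python
def binary_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for the aspect of interest and 0 for every other
    aspect; `scores` holds the model's score for that aspect.
    Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four tweets (two positives, two negatives).
auc = binary_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Because the AUC depends only on how positives rank against negatives, it is less sensitive to class imbalance than plain accuracy, which is why it is a useful companion metric for skewed data.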
To provide in-depth insight into the progression of vaccine misinformation aspects, multiple analytics were produced. For instance, Fig. 4 illustrates the number of misinformation tweets associated with each type of vaccine. It can be seen that Pfizer, Moderna, and AstraZeneca accounted for the majority of the public misinformation. Pfizer, specifically, was by far the most discussed in the misinformation tweets across all aspects, making up approximately 53% of the total tweets in the dataset.
Furthermore, Fig. 5 illustrates the number of misinformation tweets broken down by aspect for each vaccine type. It can be seen that Pfizer, represented by the green bar, generated most of the vaccine misinformation discourse across all the aspects. Moreover, Moderna and AstraZeneca, represented by the yellow and light blue bars respectively, were the second most discussed vaccines in the misinformation tweets across all the defined misinformation aspects. However, Moderna was associated with the aspects of “Vaccine Constituent” and “Efficacy and Clinical Trials” more frequently than AstraZeneca, since it is an mRNA-based vaccine whose efficacy was questioned by the public. Meanwhile, AstraZeneca was associated more frequently than Moderna with the aspect of “Adverse Effects,” given that many cases of blood clots and heart problems were reported after taking the vaccine. Furthermore, it can be seen that the aspect of “Agenda” was associated with Pfizer almost three times as frequently as with AstraZeneca, and exceeded the total frequency of tweets reporting the same aspect of misinformation for all other vaccine types combined. The fact that Pfizer is an mRNA-based vaccine triggered much discourse among the public, which was reflected in their tweets.
Figure 6 shows the percentage of misinformation aspects associated with the three most discussed vaccine types on Twitter: Pfizer, Moderna, and AstraZeneca.
To understand the spatiotemporal progression of the vaccine misinformation, the geo-location and timestamp metadata of the tweets were used to develop spatial and temporal analytics. To illustrate the geographical spread of the most dominant vaccine misinformation aspects, Fig. 7 shows the spatial distribution obtained from the Twitter data. Figure 7 provides a visual representation of the prevalence of the four vaccine misinformation aspects considered in this research. Blue refers to ‘Efficacy and Clinical Trials’, green refers to ‘Agenda’, pink refers to ‘Adverse Effects’, and orange refers to ‘Vaccine Constituent’. The map shows spatial dominance at two levels: coverage and intensity. Regions are colored to indicate the dominance of a certain aspect of misinformation in that region. The intensity of the color indicates the frequency or prevalence of that aspect of misinformation in the region. Furthermore, overlapping colors indicate that several aspects of misinformation are prevalent in that particular region. Finally, circles indicate more specific areas, such as cities, that were mentioned in the Twitter chatter data.
By mapping the spatial distribution of the aspects of misinformation, policymakers can identify regions that require specific interventions aimed at correcting misguided beliefs with evidence-based messaging.
As illustrated in Fig. 7, “Efficacy and Clinical Trials” was the most dominant aspect in European countries as well as Middle Eastern and African countries. With 33 total registered trials in Russia, 18 in South Africa, and fewer than 10 trials in the rest of Africa, Twitter users expressed growing concern about the insufficient testing of the experimental jabs. It was also observed that the “Efficacy and Clinical Trials” and “Agenda” misinformation aspects were dominant in South Africa, where some experimental trials were conducted before the vaccine was approved. Meanwhile, in the USA and Canada, Twitter users were more concerned about the vaccine being part of an agenda, whether depopulation or market profit. Throughout the study duration, most of the tweets concerned with adverse effects were in Australia, Colombia, and Japan. Several adverse events were reported worldwide and in Australia, including, as many claimed, testing positive for HIV.
Figure 8 shows the temporal progression of misinformation aspects over the dataset timeline, starting from the approval of the first vaccine in December 2020 until July 2021.
As shown in Fig. 8, “Efficacy and Clinical Trials” was the dominant aspect during the first three months, starting from the first official approval of the vaccine. After the official approval, and as published by the American Journal of Managed Care (AJMC), many countries, including the UK, began distributing the vaccines. In addition, vaccination acceleration plans were adopted in the USA. In early February and March, multiple efficacy-related reports were published. Simultaneously, multiple variants started to spread, raising concern about the efficacy of the vaccines.
A growing public controversy arose in March, associating blood clots with AstraZeneca after several reported cases. In response to those cases, the vaccine was temporarily suspended in South Africa and many European countries to further investigate its association with blood clots. Moreover, AstraZeneca issued a statement denying a causal link between its vaccine and blood clots, and experts stressed that there is no causal link. This can be seen in the temporal progression of the curve associated with the aspect “Efficacy and Clinical Trials”, shown in green in Fig. 8. The curve grows between February and March, then gradually declines toward April upon the release of experimental statistics negating the relationship between the vaccine and the reported cases. Toward the end of March, Pfizer and Moderna released positive efficacy data and survey results reporting a drop in vaccine hesitancy; hence the decline of the “Efficacy and Clinical Trials” aspect curve.
Furthermore, the “Agenda” aspect, shown in yellow in Fig. 8, starts to peak in April. This can be explained by events starting in late April, when fake Pfizer vaccines were reported in Mexico and Poland, as well as by the distribution of vaccines, especially AstraZeneca, outside the USA.
The “Efficacy and Clinical Trials” aspect peaks again during May 2021. May witnessed the preparation for authorization and approval of the Pfizer vaccine in adolescents. Moreover, multiple vaccines, including doses of Johnson & Johnson, Pfizer, Moderna, and AstraZeneca that were not yet approved in the USA, were shipped out from the USA. In addition, several cases of adverse effects, including heart problems, were reported toward the end of May. This is reflected in the analysis, as both the “Efficacy and Clinical Trials” and “Adverse Effects” aspects peak. Furthermore, it is observed that “Agenda” remains the most dominant aspect and peaks remarkably in May, exceeding the peaks of all aspects throughout the entire duration. The peaks of these three aspects aligned due to the relatedness of the topics involved in discussing them. This indicates that the spread of misinformation leads the public to be concerned that the vaccines were intentionally rushed and untested, with adverse and dangerous effects that fit into depopulation and profit agendas. Finally, in relation to the timeline, in June 2021, employers in the USA were permitted by the Equal Employment Opportunity Commission to mandate vaccination among their employees.
To evaluate the correlation between the count of misinformation tweets and the number of vaccinations administered worldwide during the same timeframe, the standard Pearson correlation is calculated. This work hypothesizes that the two variables are negatively correlated at a significance level of 0.05. The Pearson correlation coefficient and p-value are calculated for the global misinformation count against the vaccination counts of each of 43 countries from December 2020 until July 2021. The correlation values varied, but all were negative, ranging from -0.349 to -0.915. A larger absolute value indicates a stronger negative correlation between the study variables, suggesting that an increase in misinformation tweets is associated with a decrease in vaccinations. To assess the significance of each correlation, a p-value threshold of 0.05 is used: p-values less than or equal to 0.05 are considered statistically significant, while p-values greater than 0.05 are considered insignificant. Figure 9 plots the correlation of the misinformation count against the vaccination count for the 43 countries and divides them into significant and insignificant correlations. The shading of the bars reflects the p-values, where darker shades indicate lower p-values and hence more significant correlations. The graph shows that 37% of the countries were negatively affected by the spread of misinformation. Since misinformation tweets on social media are globally accessible, we deem it imperative to consider their significance beyond local contexts.
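The correlation test described above can be sketched as follows, using only the Python standard library. The Pearson coefficient is computed directly from its definition, and the two-sided p-value is obtained from the Student t distribution with n - 2 degrees of freedom by numerical integration. The function names, the eight-point series, and the choice of Simpson's rule are assumptions made for this illustration; the numbers are hypothetical and are not the study's data, and a statistics library would normally be used in practice.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_pdf(t, df):
    """Density of the Student t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(r, n, steps=10000):
    """Two-sided p-value for H0: no correlation, given r and sample size n."""
    df = n - 2
    if abs(r) >= 1.0:
        return 0.0
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the t density over [0, t]; `steps` must be even.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


# Hypothetical monthly series (December to July), for illustration only.
misinfo_tweets = [120, 150, 310, 280, 400, 390, 220, 180]
vaccinations = [900, 850, 500, 560, 300, 350, 700, 760]

r = pearson_r(misinfo_tweets, vaccinations)
p = two_sided_p_value(r, len(misinfo_tweets))
significant = p <= 0.05  # threshold used in the text
```

With eight monthly observations there are six degrees of freedom, so a coefficient needs an absolute value of roughly 0.707 or more to reach the 0.05 threshold. This is consistent with some of the weaker negative correlations being classified as insignificant.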