Monkeypox is a re-emerging zoonotic disease caused by the Monkeypox virus, which belongs to the Poxviridae family, Chordopoxvirinae subfamily, and Orthopoxvirus genus. The virus was first identified in monkeys in 1958 and later found to infect humans in the Democratic Republic of the Congo (DRC) in 1970. Common initial symptoms of Monkeypox include headache, fever, backache, muscle aches, swollen lymph nodes, and fatigue. Within the first three days of experiencing these symptoms, most infected individuals develop a rash or sores, initially appearing on the face and then rapidly spreading centrifugally to other parts of the body.
For a considerable period, the spread of the virus was limited to a few African countries. However, at the time of writing this article (December 19, 2022), the world was experiencing a global outbreak of the Monkeypox virus, with 82,809 confirmed cases reported in 110 countries. In response to the escalating number of cases, the World Health Organization (WHO) convened an “emergency meeting” on May 20, 2022, and deliberated whether to issue an official declaration on the outbreak. Furthermore, due to a significant increase in cases, the Centers for Disease Control and Prevention (CDC) in the United States raised its Monkeypox alert level to “Level 2” on June 6, 2022. The CDC noted that no specific treatment was available for Monkeypox infection. However, it is worth mentioning that the Food and Drug Administration (FDA) recently approved a Monkeypox vaccine.
As the number of Monkeypox cases continued to rise, countries around the world implemented various preparations, initiatives, and measures to mitigate the spread of the virus. These efforts included lockdown measures in Belgium, the procurement of 500,000 doses of the smallpox vaccine by the United States, vaccination offered to high-risk groups in Canada, calls by French and Danish health authorities for vaccine distribution to adults affected by the virus, a recommendation in Germany to vaccinate high-risk populations, and advice in the United Kingdom that all individuals infected with the virus self-isolate.
Over the past two decades, significant progress has been made in the development of modern Machine Learning (ML) algorithms, particularly Deep Learning (DL). This progress has been facilitated by the availability of large databases, improved computational power, and increased accessibility to advanced technologies. As a result, Artificial Intelligence (AI) and ML have transitioned from experimental laboratory concepts to practical and applicable technologies in various commercial domains. One sector that has witnessed substantial growth is the application of ML techniques in healthcare. DL has emerged as a prominent player in the field of health informatics, offering distinct advantages in feature extraction and data classification .
DL models typically employ a larger number of hidden neurons and layers than traditional neural network architectures. This design choice is driven by the availability of vast amounts of raw data during the training phase, which makes it feasible to train more neurons. DL approaches are based on representation learning, which involves constructing nonlinear modules layer by layer to achieve higher levels of representation. Each layer transforms the representation from one form to the next, ultimately resulting in a more abstract representation and thus facilitating the automatic generation of a feature set [9, 10]. In the field of health informatics, the automatic generation of a feature set without human intervention offers significant advantages. Medical image processing is a prominent area where DL has been successfully applied. Among the various DL architectures, Convolutional Neural Networks (CNNs) are commonly used for medical image processing due to their proficiency in computer vision tasks and their ability to leverage Graphics Processing Units (GPUs). CNNs have found application in various areas such as cancer detection, survival prediction, and the prediction of Coronavirus Disease 2019 (COVID-19) outcomes [12,13,14,15]. In recent years, there has been a significant surge in the integration of AI in clinical domains, and DL has exhibited remarkable advancements in performance. Various Deep Neural Network (DNN) architectures, such as VGG, ResNet, Inception, AlexNet, GoogLeNet, MobileNet, and ShuffleNet, have been employed for the classification of Monkeypox images [16,17,18].
Despite their promising performance, a critical concern lies in the limited understanding and interpretability of these models’ decision-making processes. This lack of transparency poses a challenge to gaining trust and acceptance among clinicians. Clinicians often hesitate to fully rely on AI models due to their “black box” nature, where the internal workings and reasoning behind predictions are not readily explainable. To address this issue and establish clarity and certainty, research efforts have focused on developing methods for interpreting and explaining the performance of AI models. These methods aim to unveil the decision-making mechanisms employed by AI models and identify the influential factors that contribute to their predictions . By employing such interpretability techniques, clinicians can gain deeper insights into how AI models arrive at their predictions. This enhanced understanding fosters trust and confidence in the reliability of AI systems, bridging the gap between advanced AI technologies and real-world clinical applications. Some researchers have also emphasized the interpretability and explainability of DL models in the context of Monkeypox. To achieve this, a few studies have employed two approaches, namely Local Interpretable Model-Agnostic Explanations (LIME) and Gradient-weighted Class Activation Mapping (Grad-CAM) .
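To make the Grad-CAM idea mentioned above more concrete, the following is a minimal sketch of its core computation: the gradients of the class score with respect to each feature map of the last convolutional layer are averaged into one weight per channel, the feature maps are combined with those weights, and a ReLU keeps only the regions that support the class. The function name `grad_cam` and the nested-list layout `[channel][row][column]` are assumptions made for illustration; in practice the activations and gradients would be obtained from a DL framework, and this sketch is not the implementation used in the cited studies.

```python
def grad_cam(activations, gradients):
    """Toy Grad-CAM heatmap.

    activations[k][i][j]: value of feature map k at pixel (i, j)
    gradients[k][i][j]:   d(class score) / d(activations[k][i][j])
    Both inputs are hypothetical nested lists of shape K x H x W.
    """
    num_maps = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # One weight per channel: the spatial average of its gradients.
    weights = [sum(sum(row) for row in grad) / (height * width)
               for grad in gradients]

    # Weighted sum of feature maps, followed by ReLU.
    cam = [[max(0.0, sum(weights[k] * activations[k][i][j]
                         for k in range(num_maps)))
            for j in range(width)]
           for i in range(height)]

    # Scale to [0, 1] for display, unless the map is entirely zero.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Two 2x2 feature maps: the first supports the class, the second opposes it.
activations = [[[1.0, 0.0], [0.0, 0.0]],
               [[0.0, 0.0], [0.0, 2.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]],
             [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(activations, gradients)
```

In this toy input only the top-left pixel is highlighted, because the second feature map receives a negative weight and its contribution is removed by the ReLU. Overlaying such a heatmap on the input image is what allows a clinician to check whether the model attends to the lesion itself.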
At the time of writing this article, only two publicly available datasets had been created specifically for the development of ML and DL models to detect Monkeypox disease [21, 22]. For this study, we utilized the dataset provided by Ahsan et al., which includes skin lesion images of Monkeypox, Chickenpox, and Measles. Considering the available datasets, two scenarios have been explored for Monkeypox detection: multi-class classification, where each class represents a specific skin lesion disease, and two-class classification, such as distinguishing Monkeypox from other diseases.
Despite the existing literature on the detection of Monkeypox disease using DL models, our literature review, along with a recent systematic literature review , has identified certain areas that warrant further research:
Most published studies have developed models for only one of the two scenarios described above (multi-class or two-class classification). Of the two, the two-class approach, specifically distinguishing Monkeypox from other diseases, is the more prevalent in the literature.
Most studies have reported accuracy as the main metric for evaluating the performance of their models. This trend is also evident in the recent systematic literature review, where accuracy is the sole metric utilized across all included studies. However, it is well established that accuracy alone does not provide a comprehensive evaluation of model performance. In light of this, the Receiver Operating Characteristic (ROC) curve, whose area under the curve summarizes performance across all decision thresholds, has been proposed as an alternative evaluation metric.
The importance of model interpretability has been recognized, but only a limited number of studies have addressed this aspect [16, 24]. Understanding how models arrive at their predictions is crucial for ensuring trust, transparency, and effective decision-making in clinical settings.
Therefore, further research is needed to explore alternative approaches, consider diverse evaluation metrics, and delve into the interpretability of DL models in the detection of Monkeypox disease. By addressing these gaps, we can enhance the effectiveness and reliability of these models for real-world applications.
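The limitation of accuracy noted above can be illustrated with a small sketch. The labels and scores below are invented toy data, not taken from any Monkeypox dataset, and the helper names `accuracy`, `f1_score`, and `roc_auc` are our own. A classifier that labels every image as non-Monkeypox reaches high accuracy on an imbalanced sample while detecting no Monkeypox case at all, which the F1-Score and the area under the ROC curve expose.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve, computed as the probability that a random
    positive sample is scored above a random negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy sample: 2 Monkeypox images (label 1) and 8 other images (label 0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10      # a model that always predicts "non-Monkeypox"
scores = [0.1] * 10    # the same constant confidence for every image

acc = accuracy(y_true, y_pred)   # 0.8
f1 = f1_score(y_true, y_pred)    # 0.0
auc = roc_auc(y_true, scores)    # 0.5
```

Here an accuracy of 0.8 coexists with an F1-Score of 0.0 and an AUC of 0.5, the value expected from random ranking, which is why reporting several complementary metrics gives a more reliable picture than accuracy alone.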
Objectives and contributions
In this study, we used the previously mentioned publicly available dataset and applied preprocessing techniques to prepare the data for analysis. Subsequently, we implemented seven pre-trained DL models to diagnose Monkeypox disease from skin lesion images of patients. To enhance the performance of these models, we conducted experiments and modified the standard architectures by incorporating five dense layers. Additionally, we explored two scenarios: the two-class scenario and the four-class scenario. In the two-class scenario, images were categorized into Monkeypox and non-Monkeypox classes, while the four-class scenario involved Monkeypox, Chickenpox, Measles, and Normal classes. These scenarios allowed us to develop robust models and facilitate a comprehensive analysis of the data.
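The relationship between the two scenarios can be sketched as a simple label mapping. The snippet below shows one plausible way to derive two-class labels from four-class labels by grouping every class other than Monkeypox together; the names `FOUR_CLASS` and `to_two_class` are hypothetical, and the text does not state that the two-class data were constructed exactly this way.

```python
# Class names of the four-class scenario, as listed in the text.
FOUR_CLASS = ("Monkeypox", "Chickenpox", "Measles", "Normal")


def to_two_class(label):
    """Map a four-class label to the two-class scenario (assumed grouping)."""
    if label not in FOUR_CLASS:
        raise ValueError(f"unknown label: {label!r}")
    return "Monkeypox" if label == "Monkeypox" else "Non-Monkeypox"


four_class_labels = ["Monkeypox", "Measles", "Normal", "Chickenpox"]
two_class_labels = [to_two_class(label) for label in four_class_labels]
```

Under this assumed grouping, the two-class task asks only whether a lesion is Monkeypox, whereas the four-class task additionally requires the model to separate Chickenpox, Measles, and normal skin from one another.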
To ensure the optimal performance of the models, we conducted rigorous hyperparameter optimization and evaluated their performance using eight different evaluation metrics. Additionally, to enhance interpretability and provide explanations for the models’ results, we employed the LIME and Grad-CAM techniques. The contributions of this paper can be summarized as follows:
Development of seven modified DNNs, specifically designed for the detection of Monkeypox, considering both two-class and four-class scenarios.
Evaluation of model performance and generalization capabilities using eight different performance metrics, providing a comprehensive understanding of the effectiveness of the models.
Utilization of LIME and Grad-CAM techniques to enhance the interpretability of the models, allowing for a better understanding of the factors influencing the models’ decisions.
Comparative analysis with previous studies that employed similar scenarios, demonstrating the superior performance of our proposed model in terms of the F1-Score metric for both scenarios. Additionally, we report ROC curves and Area Under the Curve (AUC) values for all models, further validating their performance.