Introduction

Skin cancer results in approximately 91,000 deaths annually1. Early detection and regular monitoring are crucial in improving the quality of diagnosis, ensuring accurate treatment planning, and reducing skin cancer mortality rates2. A common detection method involves a dermatologist examining skin images to identify ambiguous clinical patterns of lesions that are often not visible to the naked eye. Dermoscopy, a widely used technique, helps dermatologists differentiate between malignant and benign lesions by eliminating surface reflections on the skin, thereby improving the accuracy of skin cancer diagnosis3.

Figure 1

Challenges in skin lesion segmentation using dermoscopic images. First row: (a) minor variation in the lesion and skin color, (b) low contrast between lesion and skin, (c) occlusion of lesions due to hair, and (d) artifacts from image acquisition. Second row: a few examples from the ISIC skin lesion dataset4 used in this paper.

Table 1 Related work on skin lesion segmentation with CNN and GAN-based approaches.

Skin lesion segmentation, the task of differentiating foreground lesions from the background, has received considerable attention for over a decade due to its high clinical applicability. Computer-aided diagnostic algorithms for automated skin lesion segmentation could aid clinicians in precise treatment and diagnosis, strategic planning, and cost reduction. However, automated skin lesion segmentation is challenging due to several factors7, such as (1) large variance in shape, texture, color, geographical conditions, and fuzzy boundaries, (2) the presence of artifacts such as hair and blood vessels, and (3) poor contrast between background skin and cancerous lesions, in addition to artifacts from image acquisition, as shown in Fig. 1.

Prior work

Pixel-level skin lesion segmentation algorithms can be divided into approaches built upon (a) classical image processing and (b) deep learning-based architectures. Deep learning-based methods can be further classified into Convolutional Neural Network (CNN) and Generative Adversarial Network (GAN) approaches based on the network topology. A brief review of prior works in these categories is presented in Table 1. The performance of classical image processing approaches depends heavily on post-processing (such as thresholding, clustering, and hole filling), hyperparameter tuning, and manual feature selection. Manually tuning these parameters can be expensive and can result in poor generalizability. Lately, deep learning-based approaches have surpassed many classical image processing-based approaches, mainly due to the wide availability of large labeled datasets and compute resources. Deep convolutional neural network (DCNN) based methods gained considerable popularity for skin lesion segmentation prior to the introduction of Transformer and GAN-based approaches in the field of medical imaging23,24,25,26,27.

Figure 2

Flowchart of the proposed framework. The generator module is an encoder-decoder network. The discriminator classifies the segmentation result as real or fake.

The success of prior DCNN-based approaches in skin lesion segmentation is primarily based on supervised methods that rely on large labeled datasets to extract features related to the image’s spatial characteristics and deep semantic maps. However, gathering a large dataset with finely annotated images is time-consuming and expensive. To address this challenge, Goodfellow et al.28 introduced Generative Adversarial Networks (GANs), which have gained popularity in various applications, including medical image synthesis, where finely annotated data are scarce. Several recent and relevant GAN-based approaches in skin lesion analysis from the literature are listed in Table 1. Unsupervised learning-based algorithms that can handle large datasets with precision and high performance without requiring ground truth labels carry significant promise in addressing real-world problems such as computer-aided medical image analysis.

In our work, we address the challenges of skin lesion segmentation by utilizing generative adversarial networks (GANs)28, which can generate accurate segmentation masks with minimal or no supervision. GANs work by training a generator and discriminator to compete against each other, where the generator tries to create realistic images, and the discriminator tries to differentiate between real and generated images (Fig. 2). However, designing an effective GAN for segmentation takes considerable time, as the performance is highly dependent on the architecture and choice of the loss function. Our study aims to optimize all three components (generator, discriminator, and loss function) for better segmentation results. The choice of the loss function is critical for the success of any deep learning architecture, and our approach takes this into account29.

Proposed work

We propose two GAN frameworks for skin lesion segmentation. The first is Efficient-GAN (EGAN), which focuses on precision and learns in an unsupervised manner, making it data-efficient. It uses an encoder-decoder-based generator, a patchGAN30-based discriminator, and a smoothing-based loss function. The generator architecture uses a squeeze-and-excitation-based compound-scaled encoder and a lateral-connection-based asymmetric decoder. This architecture captures dense features to generate fine-grained segmentation maps, and the discriminator distinguishes between synthetic and original labels. We also implement a morphology-based smoothing loss function to capture fuzzy boundaries more effectively.

Although deep learning methods provide high precision for lesion segmentation, they are computationally expensive, making them impractical for real-world applications with limited resources, such as dermatoscopy machines. This presents a challenge in contexts where high-resource devices are unavailable to dermatologists. To address this issue, various devices like MoleScope II, DermLite, and HandyScope, which attach a special lens to a smartphone, have been developed for lesion analysis with low computational resources. To create a more practical model for such real-time applications, we propose Mobile-GAN (MGAN), a lightweight unsupervised model consisting of an Inverted Residual block31 with Atrous Spatial Pyramid Pooling32. With this model, we aim to achieve good segmentation performance in terms of the Jaccard score with lower resource strain. With only 2.2M parameters (as opposed to 27M parameters in EGAN), the model can run at 13 frames per second, increasing the potential impact of computer vision-based approaches in day-to-day clinical practice.

Results

Performance of CNN-based models

We implemented and analyzed the results of several CNN and GAN-based approaches for this task. Table 2 summarizes the evaluation of CNN and GAN-based approaches on the unseen test dataset. We started with one of the most popular architectures in medical imaging segmentation, UNet33. Since this architecture is a simple stack of convolutional layers, the original UNet provided a baseline performance on the ISIC 2018 dataset. We then strategically conducted several experiments using deeper encoders like ResNet, MobileNet, and EfficientNet together with asymmetric decoders (described in the Methods section). The concatenation of low-level features is selective, based on the number of output feature maps, rather than linking every block from the encoder as in the traditional UNet. Adding a batch normalization layer after each convolutional layer also helped achieve better performance. For a detailed evaluation of CNN-based methods, we also experimented with DeepLabV3+32 and Feature Pyramid Network (FPN)34 decoders in combination with the various encoders described above, and these modifications led to improved performance. The results on the ISIC 2018 test set from our experimentation, i.e., obtained by running the authors’ code to train the proposed models, are marked with \(*\) in Table 2.
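For illustration, such encoder-decoder baselines can be assembled with the segmentation_models_pytorch library; this is an implementation assumption for the sketch only, and the exact training configurations are not shown.

import segmentation_models_pytorch as smp

# UNet-style decoder paired with a deeper, ImageNet-pretrained encoder
unet = smp.Unet(encoder_name="efficientnet-b4", encoder_weights="imagenet",
                in_channels=3, classes=1)

# The same encoder families can be paired with alternative decoders
deeplab = smp.DeepLabV3Plus(encoder_name="resnet50",
                            encoder_weights="imagenet", classes=1)
fpn = smp.FPN(encoder_name="mobilenet_v2",
              encoder_weights="imagenet", classes=1)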

Performance of GAN-based models

Table 2 also lists several results from the recent literature on this dataset for completeness of comparison. Models trained by us were submitted to the evaluation server for a fair evaluation. We then compare the results of various GAN-based approaches, as shown in Table 2. We observe that a well-designed generative adversarial network (GAN) improves performance compared to CNN-based techniques for medical image segmentation. This demonstrates GANs’ ability to overcome the main challenge in this domain, namely the lack of large labeled training datasets. Our proposed EGAN approach outperforms all other approaches in terms of the Dice coefficient. A few works8,9,35,36 report better performance than our results; however, these works created and used an independent test split from the ISIC training data and did not use the actual ISIC test data.

Table 2 Results of CNN and GAN-based approaches including our proposed algorithms (MGAN and EGAN) on the ISIC 2018 test dataset.

Performance of lightweight models

We designed a lightweight generator model called MGAN, based on DeepLabV3+ and MobileNetV2, which achieves results comparable to our EGAN model in terms of the Dice coefficient with significantly fewer parameters and faster inference times. Table 3 compares various mobile architectures based on the Jaccard Index, the number of parameters (in millions), and inference speed on the test dataset for a patch size of \(512 \times 512\). As shown in Table 3, MGAN has 2.2M parameters and provides an inference speed of 13 FPS. Even though SLSNet reports a higher performance in terms of the Jaccard Index, that result was evaluated on an independent validation test set.

Table 3 Comparison of various Mobile networks at the task of skin lesion segmentation on the ISIC 2018 dataset.
Figure 3

Visualization of the learned feature maps of the proposed EGAN architecture.

Figure 4

Comparison of the segmentation by various CNN and GAN-based approaches. Each column serially depicts the input image, label, output of various CNN-based approaches, and output of proposed MGAN and EGAN. Ground truth and segmented lesions are marked with green and red curves respectively.

Visualization of the learned representations

One of the criticisms of deep neural networks, which can make valuable and skillful predictions, is that they are generally opaque, i.e., it is unclear how or why a particular prediction or decision is made. To address this concern, we utilized the internal structures of convolutional neural networks operating on 2D image data to investigate the representations learned by our unsupervised model. Figure 4 displays the segmentation results for visual interpretation. The proposed GAN framework demonstrates robust segmentation performance even in the presence of non-skin objects or artifacts in the image. We assessed and visualized the 2D filter weights of the model to explore the features it learned. Additionally, we investigated the activation layers of the model to understand precisely which features the model recognized for a given input image, and we visualized the results in Fig. 3. Since the model has numerous convolutional layers in each architectural block, we selected the output of the seven encoder blocks (Block1-Block7) and four output feature maps from the decoder (D1-D4) for visualization.

Discussion

This paper has three main findings. First, we proposed a novel unsupervised adversarial learning-based framework (EGAN), based on Generative Adversarial Networks (GANs), to accurately segment skin lesions in a fine-grained manner. In data-scarce applications such as skin lesion segmentation, the success of GANs relies on the quality of the generator, discriminator, and loss function used. One of the main challenges in the field of medical imaging is the availability of large annotated datasets, the collection of which is a tedious, time-consuming, and costly task. To address this data-efficiency challenge, we trained our model in an unsupervised manner, allowing the generator module to capture features effectively and segment the lesion without supervision. Our patchGAN-based discriminator penalized the adversarial network by differentiating between labels and predictions. The patchGAN-based architecture is already powerful enough to distinguish real from fake labels, so no further enhancement of the discriminator was needed. In skin lesion segmentation, capturing contextual information around the segmentation boundary is crucial for improving performance8. To address this, we implemented the morphology-based smoothing loss to capture fuzzy lesion boundaries, resulting in a highly discriminative GAN that considers contextual information and segmented boundaries. The performance-oriented EGAN approach outperforms prior works, achieving a Dice coefficient of 90.1% on the ISIC 2018 test dataset when trained with adversarial learning and the morphology-based smoothing loss function, compared to 88.4% when using the dice loss alone. Our evaluation on the ISIC 2018 dataset demonstrates significantly improved performance compared to existing models in the literature. Furthermore, the proposed framework’s potential can be extended to other medical imaging applications.

Second, we proposed a lightweight segmentation framework (MGAN) that achieves comparable results while being much less computationally expensive, with an order of magnitude fewer training parameters and significantly faster inference time. The MGAN approach is suitable for real-time applications, making it a viable solution for deployment at the edge, for instance, in low-compute-resource contexts. Our proposed framework thus includes two generative models, EGAN and MGAN, designed to balance performance and efficiency. Integrating models like MGAN with dermoscopy devices has the potential to revolutionize the future of dermatology, enabling more efficient, accurate, real-time segmentation and accessible care for patients with skin lesions.

Third, our approach enables visualizing the learned representations of the model to interpret its predictions. This is especially crucial for algorithm-in-the-loop clinical applications such as skin lesion segmentation, where the decisions of automated segmentation methods could be considered by clinicians in the context of the features learned by the model.

Limitations: Although our model achieved promising performance on the ISIC 2018 dataset, its performance could not be evaluated on other datasets. We explored different datasets such as Derm7pt43, Diverse Dermatology Images44, and Fitzpatrick 17k45, among others, to assess the generalizability of the proposed approach; however, segmentation masks for them were not available at the time of writing this paper. While segmentation masks were available for the PH2 dataset46, we could not access the dataset. Deep learning models are computationally intensive and require significant resources. The EGAN model is computationally heavy for deployment in real-time clinical applications, which can limit its use in resource-constrained environments or on devices with limited processing capabilities. In such scenarios, models such as MGAN could be utilized.

Methods

The skin lesion GAN-based segmentation framework we propose in this work is shown in Fig. 2. The framework contains three main components: (1) the generator, which consists of an encoder to extract feature maps and a decoder to generate segmentation maps without supervision and adapt to variations in contrast and artifacts; (2) the discriminator, which distinguishes between the reference label and the segmentation output; and (3) appropriate loss functions to prevent overfitting, achieve excellent convergence, and accurately capture fuzzy lesion boundaries.

Dataset

The proposed segmentation approach was evaluated using the ISIC 2018 dataset, a standard skin lesion analysis dataset. This dataset contains 2594 images with corresponding ground truth, of which 20% (514 images) were used for validation. The images in the dataset vary in size and aspect ratio and contain lesions with different appearances in various skin areas. Some sample images from the dataset are shown in Fig. 1. To ensure a fair evaluation, the results on the test set were uploaded to the online server of the ISIC 20184 portal.

Figure 5

The architecture of the proposed generator in the EGAN architecture.

Generative adversarial network

Goodfellow et al.28 first introduced Generative Adversarial Networks (GANs) to generate synthetic data. Labeling clinical data is a tricky and time-consuming task requiring a specialist, and several medical imaging applications lack adequately annotated data. Inspired by this, the proposed work leverages an unsupervised GAN for skin lesion segmentation. We first briefly review the generator and discriminator concepts. An adversarial network comprises a generator (G) and a discriminator (D). The generator maps a random vector \(\gamma\) from the source domain space \(\alpha\) to generate the desired output in the target domain \(\beta\) and tries to fool the discriminator, while D learns to classify whether \(\beta\) is real (reference ground truth) or fake (generated by G). To learn the generator’s distribution \(p_{G}\) over the data \(\alpha\), an input noise distribution \(P_{\gamma }(\gamma )\) is defined and mapped to the data space as \(G(\gamma ; \theta _{G})\), where G is a differentiable function with parameters \(\theta _{G}\). \(D(\alpha )\) is the probability that \(\alpha\) came from the data rather than from \(p_{G}\).

The adversarial training is represented by the following equation28, which is a min-max game between G and D:

$$\begin{aligned} \min _{G}\max _{D} V(D,G)&= E_{\alpha \sim P_{data}(\alpha )} [\log D_{\theta _{D}}(\alpha )]\\&+ E_{\gamma \sim P_{\gamma }(\gamma )}[\log (1 - D_{\theta _{D}}(G_{\theta _{G}}(\gamma )))] \end{aligned}$$
(1)

where V is a function of the discriminator (D) and the generator (G), \(\gamma\) is drawn from the input noise distribution \(P_{\gamma }(\gamma )\), true samples are drawn from \(P_{data}(\alpha )\), and \(\theta _{G}\) and \(\theta _{D}\) are the generator and discriminator parameters, respectively.
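For concreteness, the following is a minimal sketch of one alternating update of Eq. (1), assuming PyTorch; G and D stand for any generator and discriminator modules, and in our segmentation setting the generator’s input is the dermoscopic image rather than pure noise.

import torch
import torch.nn.functional as F

def adversarial_step(G, D, image, real_mask, opt_G, opt_D):
    """One alternating update of the min-max game in Eq. (1)."""
    fake_mask = G(image)

    # Discriminator step: maximize log D(real) + log(1 - D(fake))
    d_real, d_fake = D(real_mask), D(fake_mask.detach())  # detach: G frozen
    loss_D = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator step: fool the discriminator, i.e. maximize log D(fake)
    d_fake = D(fake_mask)
    loss_G = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()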

Segmentation framework

Generally, segmentation frameworks consist of an encoder-decoder architecture. The encoder module is the feature extraction block that captures spatial information within the image; it reduces the spatial size, i.e., the dimension of the input, and decreases the feature map resolution to capture high-level features. The decoder recovers the spatial information by upsampling the feature maps extracted by the encoder layers and produces the output segmentation map. We propose to modify the encoder-decoder design to capture dense feature maps rather than using a traditional encoder, and to change the decoder accordingly, as shown in Fig. 5. Including a squeeze-and-excitation-based compound-scaled encoder significantly improves the results.

Design of encoder

The advancement of CNN designs depends on the accessibility of infrastructure and, subsequently, on scaling the model in terms of width (w), depth (d), or resolution (r) to achieve further significant performance improvements as more resources become available. Instead of performing this scaling manually and arbitrarily, Tan et al.47 proposed a novel systematic and automatic scaling approach by introducing a compound coefficient. The compound coefficient \(\phi\) efficiently scales the network’s depth, width, and resolution with a proper arrangement of scaling factors, per the following equation:

$$\begin{aligned} & w: \text {network width} = \beta ^{\phi } \\ & d: \text {network depth} = \alpha ^{\phi } \\ & r: \text {input resolution} = \gamma ^{\phi } \\ & \text {satisfying } \alpha \cdot \beta ^{2} \cdot \gamma ^{2} \approx 2 \\ & \text {with } \alpha \ge 1,\; \beta \ge 1,\; \gamma \ge 1 \end{aligned}$$
(2)

The encoder is built using the above equation, following Baheti et al.40, and consists of seven building blocks. The basic building block of this encoder is the mobile inverted bottleneck convolution (MBConv) with squeeze-and-excitation functions48, as shown in Fig. 5b. Swish activation is used in each encoder block, further enhancing performance.
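For illustration, the snippet below evaluates Eq. (2) for a given \(\phi\); the base coefficients are the EfficientNet-B0 values reported by Tan et al.47 and are quoted here as an assumption, not a detail of our encoder.

# Base scaling factors alpha (depth), beta (width), gamma (resolution);
# these are the EfficientNet-B0 coefficients from Tan et al.47
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # satisfies alpha * beta^2 * gamma^2 ~ 2

def compound_scale(phi: int, base_depth: float = 1.0,
                   base_width: float = 1.0, base_resolution: int = 224):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    return (base_depth * ALPHA ** phi,
            base_width * BETA ** phi,
            base_resolution * GAMMA ** phi)

# Example: phi = 4 roughly corresponds to an EfficientNet-B4-sized encoder
print(compound_scale(4))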

Design of decoder

The encoder downsamples the input image to a smaller resolution and captures contextual information. The decoder block, also called the upsampling path, comprises several convolutional layers that progressively upsample the feature maps obtained from the encoder. Conventional segmentation frameworks like UNet33 have symmetric encoder and decoder architectures; in contrast, the proposed architecture combines the compound-scaled squeeze-and-excitation-based encoder with an asymmetric decoder. The output features from the encoder are expanded in the decoder blocks using bilinear upsampling, and the low-level features from the encoder are combined with the higher-level feature maps of matching size from the decoder to generate a more precise segmentation output.
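A minimal PyTorch sketch of one such decoder block follows, with illustrative channel sizes rather than the exact EGAN configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """One asymmetric decoder block: bilinear upsample, fuse skip, convolve."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True))

    def forward(self, x, skip):
        # Expand deep features to the resolution of the encoder skip features
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear",
                          align_corners=False)
        x = torch.cat([x, skip], dim=1)  # combine low- and high-level features
        return self.conv(x)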

Figure 6

The architecture of the lightweight and efficient segmentation network MGAN. This architecture is based on an inverted residual network and atrous spatial pyramid pooling. The inverted residual block is shown above the encoder.

Design of lightweight segmentation framework

To develop a lightweight segmentation architecture for the generator, we leverage the power of MobileNetV231 and DeepLabV3+32, with its atrous spatial pyramid pooling (ASPP) module, as shown in Fig. 6. MobileNetV2 uses depthwise separable convolutions and inverted residual blocks as its basic building modules, shown in Fig. 6 above the encoder. MobileNetV2 is modified such that the output stride, i.e., the ratio of the input image resolution to the output feature map resolution, is 8. It has fewer computations and parameters and is thus suitable for real-time applications. The ASPP block uses a variety of dilation rates, i.e., 1, 6, 12, and 18, to generate multi-scale feature maps, which are then integrated by concatenation. This feature map is upsampled and integrated with a low-level intermediate feature map from the contracting path, i.e., the encoder, to generate fine-grained segmentation output. The feature extractor consists of a sequence of inverted residual blocks, as shown in Fig. 6, with the stride of the latter blocks set to one. Images of size 512 \(\times\) 512 \(\times\) 3 are fed as input to the MGAN architecture.
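A minimal PyTorch sketch of such an ASPP module with the dilation rates stated above is shown below; the channel widths are illustrative assumptions.

import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel atrous convolutions at rates 1, 6, 12, 18, fused by concat."""
    def __init__(self, in_ch, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch,
                      kernel_size=1 if r == 1 else 3,
                      padding=0 if r == 1 else r,
                      dilation=r)
            for r in rates)
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Each branch preserves spatial size; concatenation integrates scales
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))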

Discriminator

In our architecture, we have a generator and a discriminator. The discriminator supervises the generator to produce precise masks that match the original ground truth. We have implemented a patchGAN-based approach to achieve this, classifying each \(m \times n\) patch of the mask as real or fake relative to the ground truth. The discriminator consists of five Conv2D layers with a kernel size of 4 \(\times\) 4 and a stride of 2 \(\times\) 2, with 64, 128, 256, 512, and 1 feature maps in the respective layers. LeakyReLU activation with an alpha value of 0.2 is used after each Conv2D layer, with the last layer using sigmoid activation. The patch-based discriminator has an output size (\(m \times n\)) of 16 \(\times\) 16, where each pixel is linked to a patch of the input probability maps with a size of 94 \(\times\) 94. The discriminator classifies each patch as either fake or real. This learning strategy enforces the predicted label to be similar to the ground truth. The number of parameters is the same as proposed in patchGAN30.
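The stated specification translates directly into a short PyTorch sketch; the padding of 1 is our assumption, chosen because it reproduces the 16 \(\times\) 16 output map and the 94 \(\times\) 94 receptive field quoted above.

import torch.nn as nn

def patch_discriminator(in_ch=1):
    """Five 4x4 stride-2 convolutions (64, 128, 256, 512, 1 feature maps).

    With a 512 x 512 input this yields a 16 x 16 patch map, and each
    output pixel sees a 94 x 94 receptive field in the input.
    """
    layers, ch = [], in_ch
    for ch_out in (64, 128, 256, 512):
        layers += [nn.Conv2d(ch, ch_out, kernel_size=4, stride=2, padding=1),
                   nn.LeakyReLU(0.2, inplace=True)]
        ch = ch_out
    layers += [nn.Conv2d(ch, 1, kernel_size=4, stride=2, padding=1),
               nn.Sigmoid()]   # per-patch real/fake probability
    return nn.Sequential(*layers)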

We apply the following adversarial strategy so that each generated label aligns with the ground truth labels. A min-max two-player game alternately updates the generator and discriminator networks with adversarial learning. The discriminator loss is given by:

$$\begin{aligned} L_{D}(x,y) = -\sum _{x,y} \Big [ \gamma \log (D(I_S)) + (1 - \gamma ) \log (1 - D(I_T)) \Big ] \end{aligned}$$
(3)

where \(x, y\) are the pixel locations of the input; \(D(I_S)\) is the discriminator output for source domain images (\(I_S\)), i.e., label images; \(D(I_T)\) is the discriminator output for target domain images (\(I_T\)), i.e., predicted images; and \(\gamma\) is the probability of the predicted pixel, with \(\gamma = 1\) when the prediction is from the ground truth, i.e., the source domain, and \(\gamma = 0\) when the prediction is from the generator’s segmented mask, i.e., the target domain.

Loss function

We implement a morphology-based smoothing loss to improve skin lesion segmentation and to guide the network to capture the lesion’s smoothness and fuzzy boundaries. The network’s loss function includes the dice coefficient loss \((L_{DL})\) as well as the morphology-based smoothing loss \((L_{SL})\). The dice coefficient loss assesses the overlap between the ground truth and the prediction and is given by:

$$\begin{aligned} L_{DL}({\widehat{v}},v) = 1 - \frac{ 2\sum _{i \in \omega } \widehat{{v}_{i}}\cdot v_{i}}{\sum _{i \in \omega } \widehat{{v}_{i}}^{2} + {\sum _{i \in \omega } v_{i}^{2}}} \end{aligned}$$
(4)

where \(\omega\) is the set of all pixels in the input image, and v and \({\widehat{v}}\) are the original mask and the predicted probability map, respectively.
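A direct transcription of Eq. (4) into PyTorch could look as follows; the small eps term is our addition to avoid division by zero on empty masks.

import torch

def dice_loss(v_hat: torch.Tensor, v: torch.Tensor, eps: float = 1e-7):
    """Dice coefficient loss of Eq. (4); eps guards against empty masks."""
    intersection = (v_hat * v).sum()
    denom = (v_hat ** 2).sum() + (v ** 2).sum()
    return 1.0 - 2.0 * intersection / (denom + eps)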

The morphology-based smoothing loss encourages the network to make smooth predictions within the nearest-neighbor area49. It is a pairwise interaction of binary labels, written as:

$$\begin{aligned} L_{SL}({\widehat{y}},y) = \sum _{i \in \Omega }\sum _{j \in \mathbb {N}^{\iota }} B(i,j) \times y_{i} \times \left| {\widehat{y}}_{i} - {\widehat{y}}_{j} \right| \quad \text {where } B_{i,j} = \left\{ \begin{matrix} 1 & \text {if } y_{i} = y_{j} \\ 0 & \text {otherwise} \end{matrix}\right. \end{aligned}$$

where \(\mathbb {N}^{\iota }\) is the four-connected neighborhood of pixels, and y and \({\widehat{y}}\) denote the ground truth and prediction probability maps, respectively. The four-connected-neighbor smoothing loss encourages the pixels j surrounding a center pixel i to produce similar prediction probabilities when they share the same ground-truth class (\(B_{i,j} = 1\)).
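A minimal sketch of this loss in PyTorch, assuming batched (B, 1, H, W) probability maps and forming the four-neighbor pairs by one-pixel shifts, is shown below.

import torch

def smoothing_loss(y_hat: torch.Tensor, y: torch.Tensor):
    """Four-neighbor pairwise smoothing loss on (B, 1, H, W) tensors."""
    loss = y_hat.new_zeros(())
    for dh, dw in ((0, 1), (1, 0)):      # right and down neighbor pairs
        yi = y[..., : y.shape[-2] - dh, : y.shape[-1] - dw]
        yj = y[..., dh:, dw:]
        pi = y_hat[..., : y.shape[-2] - dh, : y.shape[-1] - dw]
        pj = y_hat[..., dh:, dw:]
        same_class = (yi == yj).float()  # B(i, j): 1 where labels agree
        loss = loss + (same_class * yi * (pi - pj).abs()).sum()
    return loss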

The combined loss function is written as:

$$\begin{aligned} L_{{\widehat{y}},y} = L_{DL}({\widehat{y}},y) + L_{SL}({\widehat{y}},y) \end{aligned}$$
(5)
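Reusing the two sketches above, the combined objective of Eq. (5) is then simply:

def total_loss(y_hat, y):
    # Eq. (5): sum of the dice loss and the smoothing loss sketched above
    return dice_loss(y_hat, y) + smoothing_loss(y_hat, y)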

Thus, the complete framework optimizes this combined loss function by training the network iteratively49.