---
title: Pet Image Segmentation using PyTorch
emoji: 😻
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 5.4.0
app_file: run_webapp.py
pinned: true
license: mit
short_description: Segments pet image into foreground, background & boundary
---
# Pet Image Segmentation using PyTorch

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co./spaces/soumyaprabhamaiti/pet-image-segmentation-pytorch)

This project segments pet images into three classes: background, pet, and boundary, using a [U-Net](https://arxiv.org/abs/1505.04597) model implemented in PyTorch. The model is trained on the [Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/), and the web app for inference is deployed using [Gradio](https://gradio.app/).

## Webapp Demo

The deployed version of this project can be accessed at [Hugging Face Spaces](https://huggingface.co./spaces/soumyaprabhamaiti/pet-image-segmentation-pytorch). Segmentation on a sample image is shown below:
![Segmentation on a sample image](readme_images/webapp.png)

## Installing Locally

1. Clone the repository:
    ```
    git clone https://github.com/soumya-prabha-maiti/pet-image-segmentation-pytorch.git
    ```

1. Navigate to the project folder:
    ```
    cd pet-image-segmentation-pytorch
    ```

1. Create and activate a virtual environment:
    ```
    python -m venv env
    source env/bin/activate  # On Windows use `env\Scripts\activate`
    ```

1. Install the required libraries:
    ```
    pip install -r requirements.txt
    ```

1. Run the application:
    ```
    python run_webapp.py
    ```

## Dataset

The [Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/) contains 37 categories of pets with roughly 200 images per category. The images vary widely in scale, pose, and lighting. Every image has an associated ground-truth annotation of breed, head ROI, and pixel-level trimap segmentation. In this project the dataset is obtained via Torchvision.
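
For reference, a minimal sketch of loading the dataset through Torchvision is shown below. The transforms are illustrative; the 128×128 resolution matches the architecture summary later in this README, but the project's actual preprocessing pipeline may differ.

```python
import torchvision
from torchvision import transforms

# Oxford-IIIT Pet with pixel-level trimap masks (1 = pet, 2 = background, 3 = boundary).
image_transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])
mask_transform = transforms.Compose([
    transforms.Resize((128, 128), interpolation=transforms.InterpolationMode.NEAREST),
    transforms.PILToTensor(),
])

dataset = torchvision.datasets.OxfordIIITPet(
    root="data",
    split="trainval",
    target_types="segmentation",
    download=True,
    transform=image_transform,
    target_transform=mask_transform,
)

image, mask = dataset[0]  # image: [3, 128, 128] float tensor, mask: [1, 128, 128] trimap
```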

## Model Architecture

The segmentation model uses the U-Net architecture. The basic structure of a U-Net is shown below:
![U-Net Architecture](readme_images/unet.png)

The U-Net consists of an encoder and a decoder. The encoder is a series of convolutional layers that extract features from the input image, while the decoder is a series of transposed convolutional layers that upsample those features back to the original image size. Skip connections link the encoder and decoder by concatenating each encoder feature map onto the corresponding decoder feature map, which helps the decoder recover spatial information lost during downsampling.
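
The sketch below illustrates the double-convolution block and one skip connection as described above. It is a simplified illustration, not the project's exact implementation (see the source for the real `UNet` and `DoubleConvOriginal` modules):

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 Conv -> BatchNorm -> ReLU blocks, as in the summary below."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# One encoder step, the matching decoder step, and the skip connection in between.
enc = DoubleConv(3, 16)
pool = nn.MaxPool2d(2)
up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
dec = DoubleConv(32, 16)  # 32 channels after concatenating the 16-channel skip

x = torch.randn(1, 3, 128, 128)
skip = enc(x)                                    # [1, 16, 128, 128]
down = pool(skip)                                # [1, 16, 64, 64], passed to deeper layers
bottleneck = torch.randn(1, 32, 64, 64)          # stand-in for the deeper features coming back up
upsampled = up(bottleneck)                       # [1, 16, 128, 128]
out = dec(torch.cat([skip, upsampled], dim=1))   # skip connection restores spatial detail
```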

<details>
    <summary>Detailed architecture of the U-Net model used in this project</summary>

    ==========================================================================================
    Layer (type:depth-idx)                   Output Shape              Param #
    ==========================================================================================
    UNet                                     [16, 3, 128, 128]         --
    ├─ModuleList: 1-9                        --                        (recursive)
    │    └─DoubleConvOriginal: 2-1           [16, 16, 128, 128]        --
    │    │    └─Sequential: 3-1              [16, 16, 128, 128]        --
    │    │    │    └─Conv2d: 4-1             [16, 16, 128, 128]        432
    │    │    │    └─BatchNorm2d: 4-2        [16, 16, 128, 128]        32
    │    │    │    └─ReLU: 4-3               [16, 16, 128, 128]        --
    │    │    │    └─Conv2d: 4-4             [16, 16, 128, 128]        2,304
    │    │    │    └─BatchNorm2d: 4-5        [16, 16, 128, 128]        32
    │    │    │    └─ReLU: 4-6               [16, 16, 128, 128]        --
    ├─MaxPool2d: 1-2                         [16, 16, 64, 64]          --
    ├─ModuleList: 1-9                        --                        (recursive)
    │    └─DoubleConvOriginal: 2-2           [16, 32, 64, 64]          --
    │    │    └─Sequential: 3-2              [16, 32, 64, 64]          --
    │    │    │    └─Conv2d: 4-7             [16, 32, 64, 64]          4,608
    │    │    │    └─BatchNorm2d: 4-8        [16, 32, 64, 64]          64
    │    │    │    └─ReLU: 4-9               [16, 32, 64, 64]          --
    │    │    │    └─Conv2d: 4-10            [16, 32, 64, 64]          9,216
    │    │    │    └─BatchNorm2d: 4-11       [16, 32, 64, 64]          64
    │    │    │    └─ReLU: 4-12              [16, 32, 64, 64]          --
    ├─MaxPool2d: 1-4                         [16, 32, 32, 32]          --
    ├─ModuleList: 1-9                        --                        (recursive)
    │    └─DoubleConvOriginal: 2-3           [16, 64, 32, 32]          --
    │    │    └─Sequential: 3-3              [16, 64, 32, 32]          --
    │    │    │    └─Conv2d: 4-13            [16, 64, 32, 32]          18,432
    │    │    │    └─BatchNorm2d: 4-14       [16, 64, 32, 32]          128
    │    │    │    └─ReLU: 4-15              [16, 64, 32, 32]          --
    │    │    │    └─Conv2d: 4-16            [16, 64, 32, 32]          36,864
    │    │    │    └─BatchNorm2d: 4-17       [16, 64, 32, 32]          128
    │    │    │    └─ReLU: 4-18              [16, 64, 32, 32]          --
    ├─MaxPool2d: 1-6                         [16, 64, 16, 16]          --
    ├─ModuleList: 1-9                        --                        (recursive)
    │    └─DoubleConvOriginal: 2-4           [16, 128, 16, 16]         --
    │    │    └─Sequential: 3-4              [16, 128, 16, 16]         --
    │    │    │    └─Conv2d: 4-19            [16, 128, 16, 16]         73,728
    │    │    │    └─BatchNorm2d: 4-20       [16, 128, 16, 16]         256
    │    │    │    └─ReLU: 4-21              [16, 128, 16, 16]         --
    │    │    │    └─Conv2d: 4-22            [16, 128, 16, 16]         147,456
    │    │    │    └─BatchNorm2d: 4-23       [16, 128, 16, 16]         256
    │    │    │    └─ReLU: 4-24              [16, 128, 16, 16]         --
    ├─MaxPool2d: 1-8                         [16, 128, 8, 8]           --
    ├─ModuleList: 1-9                        --                        (recursive)
    │    └─DoubleConvOriginal: 2-5           [16, 256, 8, 8]           --
    │    │    └─Sequential: 3-5              [16, 256, 8, 8]           --
    │    │    │    └─Conv2d: 4-25            [16, 256, 8, 8]           294,912
    │    │    │    └─BatchNorm2d: 4-26       [16, 256, 8, 8]           512
    │    │    │    └─ReLU: 4-27              [16, 256, 8, 8]           --
    │    │    │    └─Conv2d: 4-28            [16, 256, 8, 8]           589,824
    │    │    │    └─BatchNorm2d: 4-29       [16, 256, 8, 8]           512
    │    │    │    └─ReLU: 4-30              [16, 256, 8, 8]           --
    ├─MaxPool2d: 1-10                        [16, 256, 4, 4]           --
    ├─DoubleConvOriginal: 1-11               [16, 512, 4, 4]           --
    │    └─Sequential: 2-6                   [16, 512, 4, 4]           --
    │    │    └─Conv2d: 3-6                  [16, 512, 4, 4]           1,179,648
    │    │    └─BatchNorm2d: 3-7             [16, 512, 4, 4]           1,024
    │    │    └─ReLU: 3-8                    [16, 512, 4, 4]           --
    │    │    └─Conv2d: 3-9                  [16, 512, 4, 4]           2,359,296
    │    │    └─BatchNorm2d: 3-10            [16, 512, 4, 4]           1,024
    │    │    └─ReLU: 3-11                   [16, 512, 4, 4]           --
    ├─ModuleList: 1-12                       --                        --
    │    └─ConvTranspose2d: 2-7              [16, 256, 8, 8]           524,544
    │    └─DoubleConvOriginal: 2-8           [16, 256, 8, 8]           --
    │    │    └─Sequential: 3-12             [16, 256, 8, 8]           --
    │    │    │    └─Conv2d: 4-31            [16, 256, 8, 8]           1,179,648
    │    │    │    └─BatchNorm2d: 4-32       [16, 256, 8, 8]           512
    │    │    │    └─ReLU: 4-33              [16, 256, 8, 8]           --
    │    │    │    └─Conv2d: 4-34            [16, 256, 8, 8]           589,824
    │    │    │    └─BatchNorm2d: 4-35       [16, 256, 8, 8]           512
    │    │    │    └─ReLU: 4-36              [16, 256, 8, 8]           --
    │    └─ConvTranspose2d: 2-9              [16, 128, 16, 16]         131,200
    │    └─DoubleConvOriginal: 2-10          [16, 128, 16, 16]         --
    │    │    └─Sequential: 3-13             [16, 128, 16, 16]         --
    │    │    │    └─Conv2d: 4-37            [16, 128, 16, 16]         294,912
    │    │    │    └─BatchNorm2d: 4-38       [16, 128, 16, 16]         256
    │    │    │    └─ReLU: 4-39              [16, 128, 16, 16]         --
    │    │    │    └─Conv2d: 4-40            [16, 128, 16, 16]         147,456
    │    │    │    └─BatchNorm2d: 4-41       [16, 128, 16, 16]         256
    │    │    │    └─ReLU: 4-42              [16, 128, 16, 16]         --
    │    └─ConvTranspose2d: 2-11             [16, 64, 32, 32]          32,832
    │    └─DoubleConvOriginal: 2-12          [16, 64, 32, 32]          --
    │    │    └─Sequential: 3-14             [16, 64, 32, 32]          --
    │    │    │    └─Conv2d: 4-43            [16, 64, 32, 32]          73,728
    │    │    │    └─BatchNorm2d: 4-44       [16, 64, 32, 32]          128
    │    │    │    └─ReLU: 4-45              [16, 64, 32, 32]          --
    │    │    │    └─Conv2d: 4-46            [16, 64, 32, 32]          36,864
    │    │    │    └─BatchNorm2d: 4-47       [16, 64, 32, 32]          128
    │    │    │    └─ReLU: 4-48              [16, 64, 32, 32]          --
    │    └─ConvTranspose2d: 2-13             [16, 32, 64, 64]          8,224
    │    └─DoubleConvOriginal: 2-14          [16, 32, 64, 64]          --
    │    │    └─Sequential: 3-15             [16, 32, 64, 64]          --
    │    │    │    └─Conv2d: 4-49            [16, 32, 64, 64]          18,432
    │    │    │    └─BatchNorm2d: 4-50       [16, 32, 64, 64]          64
    │    │    │    └─ReLU: 4-51              [16, 32, 64, 64]          --
    │    │    │    └─Conv2d: 4-52            [16, 32, 64, 64]          9,216
    │    │    │    └─BatchNorm2d: 4-53       [16, 32, 64, 64]          64
    │    │    │    └─ReLU: 4-54              [16, 32, 64, 64]          --
    │    └─ConvTranspose2d: 2-15             [16, 16, 128, 128]        2,064
    │    └─DoubleConvOriginal: 2-16          [16, 16, 128, 128]        --
    │    │    └─Sequential: 3-16             [16, 16, 128, 128]        --
    │    │    │    └─Conv2d: 4-55            [16, 16, 128, 128]        4,608
    │    │    │    └─BatchNorm2d: 4-56       [16, 16, 128, 128]        32
    │    │    │    └─ReLU: 4-57              [16, 16, 128, 128]        --
    │    │    │    └─Conv2d: 4-58            [16, 16, 128, 128]        2,304
    │    │    │    └─BatchNorm2d: 4-59       [16, 16, 128, 128]        32
    │    │    │    └─ReLU: 4-60              [16, 16, 128, 128]        --
    ├─Conv2d: 1-13                           [16, 3, 128, 128]         51
    ==========================================================================================
    Total params: 7,778,643
    Trainable params: 7,778,643
    Non-trainable params: 0
    Total mult-adds (Units.GIGABYTES): 17.01
    ==========================================================================================
    Input size (MB): 3.15
    Forward/backward pass size (MB): 595.59
    Params size (MB): 31.11
    Estimated Total Size (MB): 629.85
    ==========================================================================================
</details>
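
The summary above appears to be in the format produced by the [`torchinfo`](https://github.com/TylerYep/torchinfo) package. A hedged sketch of how such a summary can be generated is shown below; the `UNet` import path and constructor arguments are hypothetical and will differ in the actual repository.

```python
from torchinfo import summary

# Hypothetical import path and constructor; the real module/class names live in this repo.
from pet_seg.model import UNet

model = UNet()
# Batch size and resolution chosen to match the shapes shown in the summary above.
summary(model, input_size=(16, 3, 128, 128))
```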

## Metrics
| Model                                       | # Parameters | Foreground IoU | Background IoU | Boundary IoU |
|---------------------------------------------|--------------|----------------|----------------|--------------|
| U-Net                                       | 7.8 M        | 0.72           | 0.84           | 0.36         |
| U-Net with depthwise separable convolutions | 1.5 M        | 0.71           | 0.83           | 0.32         |
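
Intersection-over-Union (IoU) for a class is the number of pixels where prediction and ground truth agree on that class, divided by the number of pixels where either assigns it. A minimal sketch of the per-class computation is given below; the project's actual evaluation code and class ordering may differ.

```python
import torch

def per_class_iou(pred: torch.Tensor, target: torch.Tensor, num_classes: int = 3) -> list[float]:
    """Compute IoU per class from integer label maps of shape [N, H, W]."""
    ious = []
    for cls in range(num_classes):
        pred_mask = pred == cls
        target_mask = target == cls
        intersection = (pred_mask & target_mask).sum().item()
        union = (pred_mask | target_mask).sum().item()
        ious.append(intersection / union if union > 0 else float("nan"))
    return ious

# Usage with model logits of shape [N, 3, H, W] and integer masks of shape [N, H, W]:
# preds = logits.argmax(dim=1)
# ious = per_class_iou(preds, targets)  # class ordering here is an assumption
```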

## Libraries Used

The following libraries were used in this project:

- PyTorch + PyTorch Lightning: to build and train the segmentation model.
- Gradio: to create the user interface for the segmentation app.
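
As an illustration of the Gradio side, a minimal sketch of an image-to-image interface is shown below. The actual interface is defined in `run_webapp.py`; the `segment` function and labels here are placeholders.

```python
import gradio as gr
import numpy as np

def segment(image: np.ndarray) -> np.ndarray:
    """Placeholder for the real inference function, which would run the U-Net
    and colour-code the predicted background / pet / boundary classes."""
    return image

demo = gr.Interface(
    fn=segment,
    inputs=gr.Image(type="numpy", label="Pet image"),
    outputs=gr.Image(type="numpy", label="Segmentation mask"),
    title="Pet Image Segmentation",
)

if __name__ == "__main__":
    demo.launch()
```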

## License

This project is licensed under the [MIT License](LICENSE).