2D Discrete Wavelet Transform for Image Classification and Segmentation
We propose WaveMix, a novel neural architecture for computer vision that is resource-efficient yet generalizable and scalable. WaveMix networks achieve accuracy comparable to or better than state-of-the-art convolutional neural networks, vision transformers, and token mixers on several tasks, establishing new benchmarks for segmentation on Cityscapes and for classification on Places-365, five EMNIST datasets, and iNAT-mini. Remarkably, WaveMix architectures need fewer parameters to reach these benchmarks than the previous state of the art. Moreover, when controlled for the number of parameters, WaveMix requires less GPU RAM, which translates to savings in time, cost, and energy.

To achieve these gains, WaveMix blocks use a multi-level two-dimensional discrete wavelet transform (2D-DWT), which has the following advantages: (1) it reorganizes spatial information based on three strong image priors (scale-invariance, shift-invariance, and sparseness of edges), (2) it does so losslessly and without adding parameters, (3) it reduces the spatial size of feature maps, which cuts the memory and time required for forward and backward passes, and (4) it expands the receptive field faster than convolutions do. The whole architecture is a stack of self-similar, resolution-preserving WaveMix blocks, which gives it the architectural flexibility to suit various tasks and levels of resource availability.
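The 2D-DWT properties listed above can be illustrated with a small NumPy sketch. It assumes the orthonormal Haar wavelet on a single-channel array with even height and width; the wavelet, padding, and channel handling used inside WaveMix may differ, and every function name here is hypothetical. It shows that one level splits an H x W map into four H/2 x W/2 subbands, that the inverse recovers the input exactly with no learnable weights, that a step edge leaves most detail coefficients at zero, and how the receptive field of a level-L coefficient (2^L pixels per side) compares with L stacked 3x3 convolutions (1 + 2L pixels per side).

```python
import numpy as np

# Assumption: orthonormal Haar wavelet, even-sized single-channel input.
# The transform has no learnable weights, so it adds no parameters.

def haar_dwt2(x):
    """One 2D Haar DWT level: (H, W) -> four (H/2, W/2) subbands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2        # approximation (local average)
    d_col = (a - b + c - d) / 2     # left-right differences (vertical edges)
    d_row = (a + b - c - d) / 2     # top-bottom differences (horizontal edges)
    d_diag = (a - b - c + d) / 2    # diagonal differences
    return ll, d_col, d_row, d_diag

def haar_idwt2(ll, d_col, d_row, d_diag):
    """Exact inverse of haar_dwt2, showing the transform is lossless."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + d_col + d_row + d_diag) / 2
    x[0::2, 1::2] = (ll - d_col + d_row - d_diag) / 2
    x[1::2, 0::2] = (ll + d_col - d_row - d_diag) / 2
    x[1::2, 1::2] = (ll - d_col - d_row + d_diag) / 2
    return x

def multilevel_haar(x, levels):
    """Repeatedly decompose the approximation band; returns final LL and per-level details."""
    details = []
    ll = x
    for _ in range(levels):
        ll, *detail_bands = haar_dwt2(ll)
        details.append(detail_bands)
    return ll, details

def haar_receptive_field(levels):
    # A level-L Haar coefficient depends on a 2**L x 2**L block of input pixels.
    return 2 ** levels

def conv_receptive_field(n_layers, kernel=3):
    # Stacked stride-1 k x k convolutions grow the receptive field linearly.
    return 1 + n_layers * (kernel - 1)

rng = np.random.default_rng(0)
img = rng.standard_normal((256, 256))

bands = haar_dwt2(img)
print([b.shape for b in bands])              # four (128, 128) subbands
print(np.allclose(haar_idwt2(*bands), img))  # True: perfect reconstruction

ll, details = multilevel_haar(img, levels=4)
print(ll.shape)                              # (16, 16)

# Sparseness of edges: a vertical step edge leaves most detail coefficients at zero.
step = np.zeros((8, 8))
step[:, 5:] = 1.0
_, d_col, d_row, d_diag = haar_dwt2(step)
print(np.count_nonzero(d_col), np.count_nonzero(d_row), np.count_nonzero(d_diag))  # 4 0 0

print(haar_receptive_field(4), conv_receptive_field(4))  # 16 9
```

With the same number of steps, the Haar receptive field grows exponentially while the stacked-convolution one grows linearly; this is only an illustration of the claim, not a measurement of WaveMix itself.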
Task | Dataset | Metric | Value |
---|---|---|---|
Semantic Segmentation | Cityscapes | Single-scale mIoU | 82.70% (SOTA) |
Image Classification | ImageNet-1k | Accuracy | 75.32% |
Image Classification | CIFAR-10 | Accuracy | 95.98% |
Image Classification | Galaxy 10 DECals | Accuracy | 95.42% (SOTA) |
Task | Model | Parameters |
---|---|---|
99% accuracy on MNIST | WaveMix Lite-8/10 | 3,566 |
90% accuracy on Fashion MNIST | WaveMix Lite-8/5 | 7,156 |
80% accuracy on CIFAR-10 | WaveMix Lite-32/7 | 37,058 |
90% accuracy on CIFAR-10 | WaveMix Lite-64/6 | 520,106 |
The high parameter efficiency is obtained by replacing deconvolution layers with upsampling.
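The saving from this substitution is simple arithmetic: a k x k transposed convolution (deconvolution) from C_in to C_out channels has C_in * C_out * k * k weights plus C_out biases, while fixed nearest or bilinear upsampling has none. The sketch below assumes a hypothetical stage with 144 channels in and out and a 4x4 kernel; these sizes are illustrative and are not the settings used in WaveMix.

```python
# Assumption: a 2x upsampling stage with C input and C output channels.
# The channel count and kernel size below are illustrative, not WaveMix's settings.

def conv_transpose2d_params(c_in, c_out, kernel, bias=True):
    """Learnable parameters of a k x k transposed convolution (deconvolution)."""
    weights = c_in * c_out * kernel * kernel   # one k x k filter per (in, out) pair
    return weights + (c_out if bias else 0)    # plus one bias per output channel

def upsample_params():
    """Fixed nearest or bilinear interpolation has no learnable parameters."""
    return 0

per_block = conv_transpose2d_params(144, 144, kernel=4) - upsample_params()
print(per_block)       # 331920 parameters saved per upsampling stage
print(16 * per_block)  # 5310720 if a network had 16 such stages
```

If an upsampling layer is followed by a convolution, that convolution's parameters have to be counted separately.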
This repository implements the code from the following papers: OpenReview Paper, arXiv Paper 1, arXiv Paper 2.
```bash
$ pip install wavemix
```
```python
import torch
from wavemix.SemSegment import WaveMix

# Semantic segmentation
model = WaveMix(
    num_classes=20,
    depth=16,
    mult=2,
    ff_channel=256,
    final_dim=256,
    dropout=0.5,
    level=4,
    stride=2,
)

img = torch.randn(1, 3, 256, 256)
preds = model(img)  # (1, 20, 256, 256)
```
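Assuming `preds` holds per-class scores, a per-pixel label map is the arg-max over the class dimension (for the PyTorch tensor above, `preds.argmax(dim=1)`). The NumPy sketch below does the same on a random stand-in array and shows one common way to compute mean intersection-over-union (mIoU), the metric reported for Cityscapes in the results table. `mean_iou` is a hypothetical helper; the official Cityscapes evaluation has its own label set and ignore rules, so treat this as an illustration rather than the benchmark protocol.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over classes that appear in pred or target.

    pred, target: integer label maps of the same shape.
    ignore_index: label in target to exclude (e.g. an 'unlabelled' value).
    """
    mask = np.ones(target.shape, dtype=bool) if ignore_index is None else target != ignore_index
    p, t = pred[mask].astype(np.int64), target[mask].astype(np.int64)
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    cm = np.bincount(num_classes * t + p, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes)
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    present = union > 0
    return float((inter[present] / union[present]).mean())

# Stand-in for the model output: per-class scores of shape (1, 20, 256, 256).
logits = np.random.default_rng(0).standard_normal((1, 20, 256, 256))
label_map = logits.argmax(axis=1)           # (1, 256, 256), class index per pixel
print(label_map.shape)
print(mean_iou(label_map, label_map, 20))   # 1.0 for a perfect prediction

pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
print(round(mean_iou(pred, target, 2), 4))  # 0.5833 = (1/2 + 2/3) / 2
```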
```python
import torch
from wavemix.classification import WaveMix

# Image classification
model = WaveMix(
    num_classes=1000,
    depth=16,
    mult=2,
    ff_channel=192,
    final_dim=192,
    dropout=0.5,
    level=3,
    patch_size=4,
)

img = torch.randn(1, 3, 256, 256)
preds = model(img)  # (1, 1000)
```
```python
import torch
from wavemix.sisr import WaveMix

# Single-image super-resolution (2x upscaling)
model = WaveMix(
    depth=4,
    mult=2,
    ff_channel=144,
    final_dim=144,
    dropout=0.5,
    level=1,
)

img = torch.randn(1, 3, 256, 256)
out = model(img)  # (1, 3, 512, 512)
```
```python
import torch
from wavemix import Level1Waveblock
```
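As a rough mental model only, the NumPy sketch below shows how a level-1 wavelet mixing step can keep the feature-map resolution unchanged: reduce channels to C/4 with a 1x1 projection, take one Haar DWT level per channel, concatenate the four subbands along the channel axis (back to C channels at half resolution), mix channels with a small two-layer 1x1 MLP, upsample back to the input size, and add a residual. The function names, the random weights standing in for learned ones, the ReLU, the Haar wavelet, and nearest-neighbour upsampling are all assumptions for illustration; this is not the library's `Level1Waveblock`, and the `mult` argument here only mirrors the name of the documented parameter.

```python
import numpy as np

def haar_dwt2_channels(x):
    """One orthonormal Haar DWT level on a (C, H, W) array; returns (4C, H/2, W/2)."""
    a, b = x[:, 0::2, 0::2], x[:, 0::2, 1::2]
    c, d = x[:, 1::2, 0::2], x[:, 1::2, 1::2]
    bands = [(a + b + c + d) / 2, (a - b + c - d) / 2,
             (a + b - c - d) / 2, (a - b - c + d) / 2]
    return np.concatenate(bands, axis=0)

def conv1x1(x, weight):
    """1x1 convolution: weight has shape (C_out, C_in), x has shape (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)

def toy_wave_block(x, rng, mult=2):
    """Hypothetical resolution-preserving block (not wavemix.Level1Waveblock)."""
    c, h, w = x.shape
    assert c % 4 == 0 and h % 2 == 0 and w % 2 == 0
    reduced = conv1x1(x, rng.standard_normal((c // 4, c)))               # (C/4, H, W)
    mixed = haar_dwt2_channels(reduced)                                   # (C, H/2, W/2)
    hidden = np.maximum(conv1x1(mixed, rng.standard_normal((mult * c, c))), 0.0)  # ReLU
    out = conv1x1(hidden, rng.standard_normal((c, mult * c)))            # (C, H/2, W/2)
    up = out.repeat(2, axis=1).repeat(2, axis=2)                          # nearest-neighbour 2x
    return x + up                                                         # residual, (C, H, W)

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 32, 32))
print(toy_wave_block(feat, rng).shape)  # (16, 32, 32)
```

Because the output shape equals the input shape, blocks like this can be stacked to any depth, which is the property the abstract refers to as resolution-preserving.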
Parameters:

- `num_classes`: int.
- `depth`: int.
- `mult`: int.
- `ff_channel`: int.
- `final_dim`: int.
- `dropout`: float between [0, 1], default 0.
- `level`: int.
- `stride`: int.
- `initial_conv`: str.
- `patch_size`: int.

```bibtex
@misc{p2022wavemix,
  title={WaveMix: Multi-Resolution Token Mixing for Images},
  author={Pranav Jeevan P and Amit Sethi},
  year={2022},
  url={https://openreview.net/forum?id=tBoSm4hUWV}
}
```
```bibtex
@misc{jeevan2022wavemix,
  title={WaveMix: Resource-efficient Token Mixing for Images},
  author={Pranav Jeevan and Amit Sethi},
  year={2022},
  eprint={2203.03689},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{jeevan2023wavemix,
  title={WaveMix: A Resource-efficient Neural Network for Image Analysis},
  author={Pranav Jeevan and Kavitha Viswanathan and Anandu A S and Amit Sethi},
  year={2023},
  eprint={2205.14375},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```