TensorFlow implementation for "High-Resolution Representations for Labeling Pixels and Regions"
This is a TensorFlow implementation of high-resolution representations (HRNet) for ImageNet classification. The network structure and training hyperparameters are kept the same as the official PyTorch implementation.
First, the four-resolution feature maps are each fed into a bottleneck block, increasing the numbers of output channels to 128, 256, 512, and 1024, respectively. Then, the high-resolution representations are downsampled by a 2-strided 3x3 convolution outputting 256 channels and added to the second-highest-resolution representations. This step is repeated twice, yielding 1024 channels at the smallest resolution. Finally, a 1x1 convolution transforms the 1024 channels into 2048, followed by a global average pooling operation. The resulting 2048-dimensional representation is fed into the classifier.
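The shape bookkeeping of this classification head can be sketched as follows. This is an illustrative sketch, not code from this repo; the `input_hw=56` entry resolution for the highest-resolution branch is an assumption (e.g. for a 224x224 input after the stem), and the function only tracks tensor shapes, not the convolutions themselves.

```python
def classification_head_shapes(input_hw=56, branch_channels=(128, 256, 512, 1024)):
    """Track (height, width, channels) through the classification head.

    `input_hw` is an assumed entry resolution for illustration only.
    """
    h = w = input_hw
    c = branch_channels[0]            # highest-resolution branch: 128 channels
    shapes = [(h, w, c)]
    for target_c in branch_channels[1:]:
        # a 2-strided 3x3 conv halves the spatial size and matches the next
        # branch's channel count; the two tensors are then added element-wise
        h, w, c = h // 2, w // 2, target_c
        shapes.append((h, w, c))
    # 1x1 conv expands 1024 -> 2048 channels, spatial size unchanged
    shapes.append((h, w, 2048))
    # global average pooling collapses the spatial dims to a 2048-d vector
    shapes.append((1, 1, 2048))
    return shapes
```

Running the sketch shows the progression 56x56x128 → 28x28x256 → 14x14x512 → 7x7x1024 → 7x7x2048 → 1x1x2048 described above.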
| model | #Params | GFLOPs | top-1 error | top-5 error | Link |
| --- | --- | --- | --- | --- | --- |
| HRNet-W18-C | 21.3M | 3.99 | 24.2% | 7.3% | TF-HRNET-W18 |
| HRNet-W30-C | 37.7M | 7.55 | 21.9% | 6.0% | TF-HRNet-W30 |
This repo is built on TensorFlow 1.12 and Python 3.6. Install the dependencies via:

```shell
pip install -r requirements.txt
```
Please follow the instructions to convert the ImageNet dataset from raw images to TFRecords, which accelerates training significantly. After conversion, you will have TFRecord files under `data/tfrecords`, as below:

```
# training files
train-00000-of-01024
train-00001-of-01024
...
# validation files
validation-00000-of-00128
validation-00001-of-00128
...
```
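As a quick sanity check after conversion, the shard files follow the `prefix-SSSSS-of-NNNNN` naming pattern, with 1024 training shards and 128 validation shards as shown above. A small sketch (the helper name is illustrative, not part of this repo):

```python
def shard_names(prefix, num_shards):
    """Generate the TFRecord shard file names for a split.

    E.g. shard_names("train", 1024) -> ["train-00000-of-01024", ...].
    """
    return ["%s-%05d-of-%05d" % (prefix, i, num_shards)
            for i in range(num_shards)]
```

Comparing this list against the contents of `data/tfrecords` is an easy way to detect an interrupted conversion.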
To start training, run:

```shell
python top/train.py --net_cfg cfgs/w30_s4.cfg --data_path /path/to/tfrecords
```
Pass the `--resume_training` flag to resume training from previously saved models:

```shell
python top/train.py --net_cfg cfgs/w30_s4.cfg --data_path /path/to/tfrecords --resume_training
```

To evaluate trained models, pass the `--eval_only` flag:

```shell
python top/train.py --net_cfg cfgs/w30_s4.cfg --data_path /path/to/tfrecords --eval_only
```
For multi-GPU training, modify `nb_gpus` and `extra_args` in `./scripts/run_horovod.sh`. For example, to train HRNet-W30 using 4 GPUs, the script would look like:

```shell
nb_gpus=4
extra_args='--net_cfg cfgs/w30_s4.cfg'
echo "multi-GPU training enabled"
mpirun -np ${nb_gpus} -bind-to none -map-by slot -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    -mca pml ob1 -mca btl ^openib \
    python top/train.py --enbl_multi_gpu
```
If you find this work or the code helpful in your research, please cite:
```
@inproceedings{SunXLW19,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang},
  booktitle={CVPR},
  year={2019}
}

@article{SunZJCXLMWLW19,
  title={High-Resolution Representations for Labeling Pixels and Regions},
  author={Ke Sun and Yang Zhao and Borui Jiang and Tianheng Cheng and Bin Xiao
          and Dong Liu and Yadong Mu and Xinggang Wang and Wenyu Liu and Jingdong Wang},
  journal={CoRR},
  volume={abs/1904.04514},
  year={2019}
}
```