Insightface Keras implementation
This is the keras implementation of deepinsight/insightface, and is released under the MIT License. There is no limitation for both academic and commercial usage.
The training data containing the annotation (and the models trained with these data) are available for non-commercial research purposes only.
IJBB
and IJBC
are scored at TAR@FAR=1e-4
Model backbone
are h5
models in Google drive. Links in Training
are training details.r18
/ r34
/ r50
/ r100
on glint360k
are models loaded weights from official publication.r50 magface
and r100 magface
are ported from Github IrvingMeng/MagFace.r100 4m adaface
and r100 12m adaface
are ported from Github mk-minchul/AdaFace.WebFace4M
/ WebFace12M
pretrained models cannot be used for any commercial purposes: WebFace.
Model backbone | Training | lfw | cfp_fp | agedb_30 | IJBB | IJBC |
---|---|---|---|---|---|---|
Resnet34 | CASIA, E40 | 0.994667 | 0.949143 | 0.9495 | ||
Mobilenet emb256 | Emore,E110 | 0.996000 | 0.951714 | 0.959333 | 0.887147 | 0.911745 |
Mobilenet distill | MS1MV3,E50 | 0.997333 | 0.969 | 0.975333 | 0.91889 | 0.940328 |
se_mobile_facenet | MS1MV3,E50 | 0.997333 | 0.969286 | 0.973000 | 0.922103 | 0.941913 |
Ghostnet,S2,swish | MS1MV3,E50 | 0.997333 | 0.966143 | 0.973667 | 0.923661 | 0.941402 |
Ghostnet,S1,swish | MS1MV3,E67 | 0.997500 | 0.981429 | 0.978167 | 0.93739 | 0.953163 |
EfficientNetV2B0 | MS1MV3,E67 | 0.997833 | 0.976571 | 0.977333 | 0.940701 | 0.955259 |
Botnet50 relu GDC | MS1MV3,E52 | 0.9985 | 0.980286 | 0.979667 | 0.940019 | 0.95577 |
r50 swish | MS1MV3,E50 | 0.998333 | 0.989571 | 0.984333 | 0.950828 | 0.964463 |
se_r50 swish SD | MS1MV3,E67 | 0.9985 | 0.989429 | 0.9840 | 0.956378 | 0.968144 |
Resnet101V2 swish | MS1MV3,E50 | 0.9985 | 0.989143 | 0.9845 | 0.952483 | 0.966406 |
EfficientNetV2S | MS1MV3,E67 | 0.9985 | 0.991143 | 0.986167 | 0.956475 | 0.968605 |
EffV2S,AdamW | MS1MV3,E53 | 0.998500 | 0.991429 | 0.985833 | 0.957449 | 0.97065 |
EffV2S,MagFace | MS1MV3,E53 | 0.998500 | 0.991571 | 0.984667 | 0.958325 | 0.971212 |
r100,AdaFace | MS1MV3,E53 | 0.998667 | 0.992286 | 0.984333 | 0.961636 | 0.972849 |
r100,AdaFace | Glint360k,E53 | 0.998500 | 0.993000 | 0.986000 | 0.962415 | 0.974843 |
Ported Models | ||||||
r18 converted | Glint360k | 0.997500 | 0.977143 | 0.976500 | 0.936806 | 0.9533 |
r34 converted | Glint360k | 0.998167 | 0.987000 | 0.982833 | 0.951801 | 0.9656 |
r50 converted | Glint360k | 0.998333 | 0.991 | 0.9835 | 0.957157 | 0.970292 |
r100 converted | Glint360k | 0.9985 | 0.992286 | 0.985167 | 0.962512 | 0.974689 |
r50 magface | MS1MV2,E25 | 0.998167 | 0.981143 | 0.980500 | 0.943622 | |
r100 magface | MS1MV2,E25 | 0.998333 | 0.987429 | 0.983333 | 0.949562 | |
r100 4m AdaFace | WebFace4M,E26 | 0.998333 | 0.992857 | 0.978833 | 0.960954 | 0.974485 |
r100 12m AdaFace | WebFace12M,E26 | 0.998500 | 0.993286 | 0.981667 | 0.964752 | 0.977451 |
Tensorflow 2.9.1
with cuda==11.2
cudnn==8.1
# $ ipython
# Python 3.8.5 (default, Sep 4 2020, 07:30:14)
>>> tf.__version__
# '2.9.1'
>>> import tensorflow_addons as tfa
>>> tfa.__version__
Out[3]: '0.17.0'
Or tf-nightly
conda create -n tf-nightly python==3.8.5
conda activate tf-nightly
pip install tf-nightly tfa-nightly glob2 pandas tqdm scikit-image scikit-learn ipython
# Not required
pip install pip-search icecream opencv-python cupy-cuda112 tensorflow-datasets tabulate mxnet-cu112 torch
import os
import sys
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
gpus = tf.config.experimental.list_physical_devices("GPU")
for gpu in gpus:
tf.config.experimental.set_memory_growth(gpu, True)
tensorflow
/ numpy
.LFW
CFP-FP
AgeDB-30
bin files included in MS1M-ArcFace
datasetfolders
.
# Convert `/datasets/faces_emore` to `/datasets/faces_emore_112x112_folders`
CUDA_VISIBLE_DEVICES='-1' ./prepare_data.py -D /datasets/faces_emore
# Convert evaluating bin files
CUDA_VISIBLE_DEVICES='-1' ./prepare_data.py -D /datasets/faces_emore -T lfw.bin cfp_fp.bin agedb_30.bin
Executing again will skip dataset
conversion.folder
including person folders
, each person folder
including multi face images
. Format like
. # dataset folder
├── 0 # person folder
│ ├── 100.jpg # face image
│ ├── 101.jpg # face image
│ └── 102.jpg # face image
├── 1 # person folder
│ ├── 111.jpg
│ ├── 112.jpg
│ └── 113.jpg
├── 10
│ ├── 707.jpg
│ ├── 708.jpg
│ └── 709.jpg
# bins | issame_list
img_1 img_2 | True
img_3 img_4 | True
img_5 img_6 | False
img_7 img_8 | False
Image data in bin files like CFP-FP
AgeDB-30
is not compatible with tf.image.decode_jpeg
, we need to reformat it, which is done by -T
parameter.
''' Throw error if not reformated yet '''
ValueError: Can't convert non-rectangular Python sequence to Tensor.
person folders
, and person folder
containing face images
. May run
# For dataset folder name `/dataset/Foo`
CUDA_VISIBLE_DEVICES='0' ./face_detector.py /dataset/Foo
to detect and align face images. Target saving directory will be /dataset/Foo_aligned_112_112
. Then this one can be used as data_path
for train.Train
.{dataset_name}_shuffle.npz
is saved in first time training. Remove it if dataset content changed.mobilefacenet
/ mobilenetv3
/ efficientnet
/ botnet
/ ghostnet
. Most of them are copied from keras.applications
source code and modified. Other backbones like ResNet101V2
is loaded from keras.applications
in train.buildin_models
.tf.dataset
for training. Triplet
dataset is different from others.bin
files.softmax
/ arcface
/ centerloss
/ triplet
loss functions.buildin_models
/ add_l2_regularizer_2_model
/ replace_ReLU_with_PReLU
.Train
class. It uses a scheduler
to connect different loss
/ optimizer
/ epochs
. The basic function is simply basic_model
--> build dataset
--> add output layer
--> add callbacks
--> compile
--> fit
.RandAug
and AutoAug
.YoloV5FaceDetector
, and ONNX one SCRFD
.Training example train.Train
is mostly functioned as a scheduler.
from tensorflow import keras
import losses, train, models
import tensorflow_addons as tfa
# basic_model = models.buildin_models("ResNet101V2", dropout=0.4, emb_shape=512, output_layer="E")
basic_model = models.buildin_models("MobileNet", dropout=0, emb_shape=256, output_layer="GDC")
data_path = '/datasets/faces_emore_112x112_folders'
eval_paths = ['/datasets/faces_emore/lfw.bin', '/datasets/faces_emore/cfp_fp.bin', '/datasets/faces_emore/agedb_30.bin']
tt = train.Train(data_path, save_path='keras_mobilenet_emore.h5', eval_paths=eval_paths,
basic_model=basic_model, batch_size=512, random_status=0,
lr_base=0.1, lr_decay=0.5, lr_decay_steps=16, lr_min=1e-5)
optimizer = tfa.optimizers.SGDW(learning_rate=0.1, momentum=0.9, weight_decay=5e-5)
sch = [
{"loss": losses.ArcfaceLoss(scale=16), "epoch": 5, "optimizer": optimizer},
{"loss": losses.ArcfaceLoss(scale=32), "epoch": 5},
{"loss": losses.ArcfaceLoss(scale=64), "epoch": 40},
# {"loss": losses.ArcfaceLoss(), "epoch": 20, "triplet": 64, "alpha": 0.35},
]
tt.train(sch, 0)
May use tt.train_single_scheduler
controlling the behavior more detail.
Model basically containing two parts:
input
to embedding
.Basic model
+ bottleneck
layer, like softmax
/ arcface
layer. For triplet training, Model
== Basic model
. For combined loss
training, it may have multiple outputs.Saving strategy
./checkpoints
, name is specified by train.Train
save_path
.eval_paths
evaluating bin
item, and save the best only.train.Train model parameters including basic_model
/ model
. Combine them to initialize model from different sources. Sometimes may need custom_objects
to load model.
basic_model | model | Used for |
---|---|---|
model structure | None | Scratch train |
basic model .h5 file | None | Continue training from a saved basic model |
None for 'embedding' layer or layer index of basic model output | model .h5 file | Continue training from last saved model |
None for 'embedding' layer or layer index of basic model output | model structure | Continue training from a modified model |
None | None | Reload model from "checkpoints/{save_path}" |
Scheduler is a list of dicts, each containing a training plan
model.built
is True
.None
indicates using the last one.True
will set basic_model.trainable = False
, train the output layer only.CenterLoss
to logits_loss
, and the value means loss_weight
.BatchHardTripletLoss
to logits_loss
, and the value means loss_weight
.0.35
. Alpha value for BatchHardTripletLoss
if attached.top K
value for Sub Center ArcFace method.loss_weight
for distiller_loss
using Knowledge distillation, default 7
.softmax
/ arcface
/ triplet
/ center
, but mostly this could be guessed from loss
.# Scheduler examples
sch = [
{"loss": losses.scale_softmax, "optimizer": "adam", "epoch": 2},
{"loss": keras.losses.CategoricalCrossentropy(label_smoothing=0.1), "centerloss": 0.01, "epoch": 2},
{"loss": losses.ArcfaceLoss(scale=32.0, label_smoothing=0.1), "optimizer": keras.optimizers.SGD(0.1, momentum=0.9), "epoch": 2},
{"loss": losses.BatchAllTripletLoss(0.3), "epoch": 2},
{"loss": losses.BatchHardTripletLoss(0.25), "epoch": 2},
{"loss": losses.CenterLoss(num_classes=85742, emb_shape=256), "epoch": 2},
{"loss": losses.CurricularFaceLoss(), "epoch": 2},
]
Some more complicated combinations are also supported.
# `softmax` + `centerloss`, `"centerloss": 0.1` means loss_weight
sch = [{"loss": keras.losses.CategoricalCrossentropy(label_smoothing=0.1), "centerloss": 0.1, "epoch": 2}]
# `softmax` / `arcface` + `triplet`, `"triplet": 64` means loss_weight
sch = [{"loss": keras.losses.ArcfaceLoss(scale=64), "triplet": 64, "alpha": 0.3, "epoch": 2}]
# `triplet` + `centerloss`
sch = [{"loss": losses.BatchHardTripletLoss(0.25), "centerloss": 0.01, "epoch": 2}]
sch = [{"loss": losses.CenterLoss(num_classes=85742, emb_shape=256), "triplet": 10, "alpha": 0.25, "epoch": 2}]
# `softmax` / `arcface` + `triplet` + `centerloss`
sch = [{"loss": losses.ArcfaceLoss(), "centerloss": 1, "triplet": 32, "alpha": 0.2, "epoch": 2}]
Restore training from break point
from tensorflow import keras
import losses, train
data_path = '/datasets/faces_emore_112x112_folders'
eval_paths = ['/datasets/faces_emore/lfw.bin', '/datasets/faces_emore/cfp_fp.bin', '/datasets/faces_emore/agedb_30.bin']
tt = train.Train(data_path, 'keras_mobilenet_emore.h5', eval_paths, model='./checkpoints/keras_mobilenet_emore.h5',
batch_size=512, random_status=0, lr_base=0.1, lr_decay=0.5, lr_decay_steps=16, lr_min=1e-5)
sch = [
# {"loss": losses.ArcfaceLoss(scale=16), "epoch": 5, "optimizer": optimizer},
# {"loss": losses.ArcfaceLoss(scale=32), "epoch": 5},
{"loss": losses.ArcfaceLoss(scale=64), "epoch": 35},
# {"loss": losses.ArcfaceLoss(), "epoch": 20, "triplet": 64, "alpha": 0.35},
]
tt.train(sch, initial_epoch=15)
Evaluation
import evals
basic_model = keras.models.load_model('checkpoints/keras_mobilefacenet_256_basic_agedb_30_epoch_39_0.942500.h5', compile=False)
ee = evals.eval_callback(basic_model, '/datasets/faces_emore/lfw.bin')
ee.on_epoch_end(0)
# >>>> lfw evaluation max accuracy: 0.993167, thresh: 0.316535, previous max accuracy: 0.000000, PCA accuray = 0.993167 ± 0.003905
# >>>> Improved = 0.993167
For training process, default evaluating strategy is on_epoch_end
. Setting an eval_freq
greater than 1
in train.Train
will also add an on_batch_end
evaluation.
# Change evaluating strategy to `on_epoch_end`, as long as `on_batch_end` for every `1000` batch.
tt = train.Train(data_path, 'keras_mobilefacenet_256.h5', eval_paths, basic_model=basic_model, eval_freq=1000)
L2 regularizer
value added to output_layer
.
0
for None.(0, 1)
for specific value, actual added value will also divided by 2
.>= 1
will be value multiplied by L2 regularizer
value in basic_model
if added.-1
will disable all augmentation.0
will apply random_flip_left_right
only.1
will also apply random_brightness
.2
will also apply random_contrast
and random_saturation
.3
will also apply random_crop
.>= 100
will apply RandAugment
with magnitude = 5 * random_status / 100
, so random_status=100
means using RandAugment
with magnitude=5
.2/5
area, regarding as ignoring mask area.2
/ 4
, will build model and dataset with total classes split in partial_fc_split
parts. Works also on a single GPU. Currently only ArcFace
loss family like ArcFace
/ AirFaceLoss
/ CosFaceLoss
/ MagFaceLoss
supports. Still under testing.GDC
/ E
or others to a backbone model. The first parameter stem_model
can be:
MobileNet
/ r50
/ ResNet50
or other names printed by models.print_buildin_models()
.keras.models.Model
instance. Like keras.applications.MobileNet(input_shape=(112, 112, 3), include_top=False)
.l2_regularizer
to dense
/ convolution
layers, or set apply_to_batch_normal=True
also to PReLU
/ BatchNormalization
layers. The actual added l2
value is divided by 2
.
# Will add keras.regularizers.L2(5e-4) to `dense` / `convolution` layers.
basic_model = models.add_l2_regularizer_2_model(basic_model, 1e-3, apply_to_batch_normal=False)
n
and <Enter>
anytime during training, will set training stop on that epoch ends.loss
, accuracy
and evaluating accuracy
.save_path
defined in train.Train
with suffix _hist.json
.<save_path>_hist.json
file exists._hist.json
can be used for plotting using plot.py
.CUDA_VISIBLE_DEVICES='0' ./eval_folder.py -d {DATA_PATH} -m {BASIC_MODEL.h5}
Or create own test bin file which can be used in train.Train
eval_paths
:
CUDA_VISIBLE_DEVICES='0' ./eval_folder.py -d {DATA_PATH} -m {BASIC_MODEL.h5} -B {BIN_FILE.bin}
""" Comparing images """
python image_video_test.py --images test1.jpg test2.jpg test3.jpg
# >>>> image_path: test1.jpg, faces count: 1
# >>>> image_path: test2.jpg, faces count: 1
# >>>> image_path: test3.jpg, faces count: 1
# cosine_similarities:
# [[1.0000001 1.0000001 1.0000001]
# [1.0000001 1.0000001 1.0000001]
# [1.0000001 1.0000001 1.0000001]]
""" Search in known users """
python image_video_test.py --images test.jpg --known_users test
# >>>> image_classes info:
# 0 10
# 1 10
# ...
# recognition_similarities: [0.47837412]
# recognition_classes: ['9']
# bbs: [[176.56265 54.588932 272.8746 181.40137 ]]
# ccs: [0.8820559]
# >>>> Saving result to: test_recognition_result.jpg
""" Video test """
python image_video_test.py --known_users test --video_source 0
train.Train
parameters lr_base
/ lr_decay
/ lr_decay_steps
/ lr_warmup_steps
set different decay strategies and their parameters.
tt.lr_scheduler
can also be used to set learning rate scheduler directly.
tt = train.Train(...)
import myCallbacks
tt.lr_scheduler = myCallbacks.CosineLrSchedulerEpoch(lr_base=1e-3, first_restart_step=16, warmup_steps=3)
lr_decay_steps controls different decay types.
Exponential decay
with lr_base=0.001, lr_decay=0.05
.CosineLrScheduler
, steps_per_epoch
is set after dataset been inited.CosineLrScheduler
, default value of cooldown_steps=1
, means will train 1 epoch
using lr_min
before each restart.lr_decay_steps | decay type | mean of lr_decay_steps | mean of lr_decay |
---|---|---|---|
<= 1 | Exponential decay | decay_rate | |
> 1 | Cosine decay, will multiply with steps_per_epoch | first_restart_step, epoch | m_mul |
list | Constant decay | lr_decay_steps | decay_rate |
# lr_decay_steps == 0, Exponential
tt = train.Train(..., lr_base=0.001, lr_decay=0.05, ...)
# 1 < lr_decay_steps, Cosine decay, first_restart_step = lr_decay_steps * steps_per_epoch
# restart on epoch [16 * 1 + 1, 16 * 3 + 2, 16 * 7 + 3] == [17, 50, 115]
tt = train.Train(..., lr_base=0.001, lr_decay=0.5, lr_decay_steps=16, lr_min=1e-7, ...)
# 1 < lr_decay_steps, lr_min == lr_base * lr_decay, Cosine decay, no restart
tt = train.Train(..., lr_base=0.001, lr_decay=1e-4, lr_decay_steps=24, lr_min=1e-7, ...)
# lr_decay_steps is a list, Constant
tt = train.Train(..., lr_base=0.1, lr_decay=0.1, lr_decay_steps=[3, 5, 7, 16, 20, 24], ...)
Example learning rates
from myCallbacks import exp_scheduler, CosineLrScheduler, constant_scheduler
epochs = np.arange(60)
plt.figure(figsize=(14, 6))
plt.plot(epochs, [exp_scheduler(ii, 0.001, 0.1, warmup_steps=10) for ii in epochs], label="lr=0.001, decay=0.1")
plt.plot(epochs, [exp_scheduler(ii, 0.001, 0.05, warmup_steps=10) for ii in epochs], label="lr=0.001, decay=0.05")
plt.plot(epochs, [constant_scheduler(ii, 0.001, [10, 20, 30, 40], 0.1) for ii in epochs], label="Constant, lr=0.001, decay_steps=[10, 20, 30, 40], decay_rate=0.1")
steps_per_epoch = 100
batchs = np.arange(60 * steps_per_epoch)
aa = CosineLrScheduler(0.001, first_restart_step=50, lr_min=1e-6, warmup_steps=0, m_mul=1e-3, steps_per_epoch=steps_per_epoch)
lrs = []
for ii in epochs:
aa.on_epoch_begin(ii)
lrs.extend([aa.on_train_batch_begin(jj) for jj in range(steps_per_epoch)])
plt.plot(batchs / steps_per_epoch, lrs, label="Cosine, first_restart_step=50, min=1e-6, m_mul=1e-3")
bb = CosineLrScheduler(0.001, first_restart_step=16, lr_min=1e-7, warmup_steps=1, m_mul=0.4, steps_per_epoch=steps_per_epoch)
lrs = []
for ii in epochs:
bb.on_epoch_begin(ii)
lrs.extend([bb.on_train_batch_begin(jj) for jj in range(steps_per_epoch)])
plt.plot(batchs / steps_per_epoch, lrs, label="Cosine restart, first_restart_step=16, min=1e-7, warmup=1, m_mul=0.4")
plt.xlim(0, 60)
plt.legend()
plt.grid(True)
plt.tight_layout()
Mixed precision
at the beginning of all functional code by
keras.mixed_precision.set_global_policy("mixed_float16")
~2x
speedup and less GPU memory consumption.# !pip install tensorflow-addons
!pip install tfa-nightly
import tensorflow_addons as tfa
optimizer = tfa.optimizers.SGDW(learning_rate=0.1, weight_decay=5e-4, momentum=0.9)
optimizer = tfa.optimizers.AdamW(learning_rate=0.001, weight_decay=5e-5)
weight_decay
and learning_rate
should share the same decay strategy. A callback OptimizerWeightDecay
will set weight_decay
according to learning_rate
.
opt = tfa.optimizers.AdamW(weight_decay=5e-5)
sch = [{"loss": keras.losses.CategoricalCrossentropy(label_smoothing=0.1), "centerloss": True, "epoch": 60, "optimizer": opt}]
mx.optimizer.SGD weight_decay
/ tfa.optimizers.SGDW weight_decay
/ L2_regulalizer
is explained here the discussion.# Rectified Adam,a.k.a. RAdam, [ON THE VARIANCE OF THE ADAPTIVE LEARNING RATE AND BEYOND](https://arxiv.org/pdf/1908.03265.pdf)
optimizer = tfa.optimizers.RectifiedAdam()
# SGD with Lookahead [Lookahead Optimizer: k steps forward, 1 step back](https://arxiv.org/pdf/1907.08610.pdf)
optmizer = tfa.optimizers.Lookahead(keras.optimizers.SGD(0.1))
# Ranger [Gradient Centralization: A New Optimization Technique for Deep Neural Networks](https://arxiv.org/pdf/2004.01461.pdf)
optmizer = tfa.optimizers.Lookahead(tfa.optimizers.RectifiedAdam())
tf.distribute.MirroredStrategy().scope()
with
block. This is just working in my case... The batch_size
will be multiplied by count of GPUs
.
with tf.distribute.MirroredStrategy().scope():
basic_model = ...
tt = train.Train(..., batch_size=1024, ...) # With 2 GPUs, `batch_size` will be 2048
sch = [...]
tt.train(sch, 0)
keras.losses.CategoricalCrossentropy
should specify the reduction
parameter.
sch = [{"loss": keras.losses.CategoricalCrossentropy(label_smoothing=0.1, reduction=tf.keras.losses.Reduction.NONE), "epoch": 25}]
PDF Sub-center ArcFace: Boosting Face Recognition by Large-scale Noisy Web Faces
This is still under test, Multi GPU is NOT tested
As far as I can see
Sub Center ArcFace
works like cleaning the dataset.lossTopK=3
case, it will train 3 sub classes
in each label, and each sub class
is a center
.domain center
, and remove those are too far away from this center
.large model
to clean the dataset
, and then train other models on the cleaned dataset
.Train Original MXNet version
cd ~/workspace/insightface/recognition/SubCenter-ArcFace
cp sample_config.py config.py
sed -i 's/config.ckpt_embedding = True/config.ckpt_embedding = False/' config.py
CUDA_VISIBLE_DEVICES='1' python train_parall.py --network r50 --per-batch-size 512
# Iter[20] Batch [8540], accuracy 0.80078125, loss 1.311261, lfw 0.99817, cfp_fp 0.97557, agedb_30 0.98167
CUDA_VISIBLE_DEVICES='1' python drop.py --data /datasets/faces_emore --model models/r50-arcface-emore/model,1 --threshold 75 --k 3 --output /datasets/faces_emore_topk3_1
# header0 label [5822654. 5908396.] (5822653, 4)
# total: 5800493
sed -i 's/config.ckpt_embedding = False/config.ckpt_embedding = True/' config.py
sed -i 's/config.loss_K = 3/config.loss_K = 1/' config.py
sed -i 's#/datasets/faces_emore#/datasets/faces_emore_topk3_1#' config.py
ls -1 /datasets/faces_emore/*.bin | xargs -I '{}' ln -s {} /datasets/faces_emore_topk3_1/
CUDA_VISIBLE_DEVICES='1' python train_parall.py --network r50 --per-batch-size 512
# 5800493
# Iter[20] Batch [5400], accuracy 0.8222656, loss 1.469272, lfw 0.99833, cfp_fp 0.97986, agedb_30 0.98050
Keras version train mobilenet on CASIA test
import tensorflow_addons as tfa
import train, losses, models
data_basic_path = '/datasets/faces_casia'
data_path = data_basic_path + '_112x112_folders'
eval_paths = [os.path.join(data_basic_path, ii) for ii in ['lfw.bin', 'cfp_fp.bin', 'agedb_30.bin']]
""" First, Train with `lossTopK = 3` """
basic_model = models.buildin_models("mobilenet", dropout=0, emb_shape=256, output_layer='E')
tt = train.Train(data_path, save_path='TT_mobilenet_topk_bs256.h5', eval_paths=eval_paths,
basic_model=basic_model, model=None, lr_base=0.1, lr_decay=0.1, lr_decay_steps=[20, 30],
batch_size=256, random_status=0, output_wd_multiply=1)
optimizer = tfa.optimizers.SGDW(learning_rate=0.1, weight_decay=5e-4, momentum=0.9)
sch = [
{"loss": losses.ArcfaceLoss(scale=16), "epoch": 5, "optimizer": optimizer, "lossTopK": 3},
{"loss": losses.ArcfaceLoss(scale=32), "epoch": 5, "lossTopK": 3},
{"loss": losses.ArcfaceLoss(scale=64), "epoch": 40, "lossTopK": 3},
]
tt.train(sch, 0)
""" Then drop non-dominant subcenters and high-confident noisy data, which is `>75 degrees` """
import data_drop_top_k
# data_drop_top_k.data_drop_top_k('./checkpoints/TT_mobilenet_topk_bs256.h5', '/datasets/faces_casia_112x112_folders/', limit=20)
new_data_path = data_drop_top_k.data_drop_top_k(tt.model, tt.data_path)
""" Train with the new dataset again, this time `lossTopK = 1` """
tt.reset_dataset(new_data_path)
optimizer = tfa.optimizers.SGDW(learning_rate=0.1, weight_decay=5e-4, momentum=0.9)
sch = [
{"loss": losses.ArcfaceLoss(scale=16), "epoch": 5, "optimizer": optimizer},
{"loss": losses.ArcfaceLoss(scale=32), "epoch": 5},
{"loss": losses.ArcfaceLoss(scale=64), "epoch": 40},
]
tt.train(sch, 0)
data_drop_top_k.py
can also be used as a script. -M
and -D
are required.
$ CUDA_VISIBLE_DEVICES='-1' ./data_drop_top_k.py -h
# usage: data_drop_top_k.py [-h] -M MODEL_FILE -D DATA_PATH [-d DEST_FILE]
# [-t DEG_THRESH] [-L LIMIT]
#
# optional arguments:
# -h, --help show this help message and exit
# -M MODEL_FILE, --model_file MODEL_FILE
# Saved model file path, NOT basic_model (default: None)
# -D DATA_PATH, --data_path DATA_PATH
# Original dataset path (default: None)
# -d DEST_FILE, --dest_file DEST_FILE
# Dest file path to save the processed dataset npz
# (default: None)
# -t DEG_THRESH, --deg_thresh DEG_THRESH
# Thresh value in degree, [0, 180] (default: 75)
# -L LIMIT, --limit LIMIT
# Test parameter, limit converting only the first [NUM]
# ones (default: 0)
$ CUDA_VISIBLE_DEVICES='-1' ./data_drop_top_k.py -M checkpoints/TT_mobilenet_topk_bs256.h5 -D /datasets/faces_casia_112x112_folders/ -L 20
[Discussions] SubCenter_training_Mobilenet_on_CASIA
Scenario | Max lfw | Max cfp_fp | Max agedb_30 |
---|---|---|---|
Baseline, topk 1 | 0.9822 | 0.8694 | 0.8695 |
TopK 3 | 0.9838 | 0.9044 | 0.8743 |
TopK 3->1 | 0.9838 | 0.8960 | 0.8768 |
TopK 3->1, bottleneckOnly, initial_epoch=0 | 0.9878 | 0.8920 | 0.8857 |
TopK 3->1, bottleneckOnly, initial_epoch=40 | 0.9835 | 0.9030 | 0.8763 |
PDF Improving Face Recognition from Hard Samples via Distribution Distillation Loss
data_distiller.py
works to extract embedding
data from images and save locally. MODEL_FILE
can be Keras h5
/ pytorch jit pth
/ MXNet model
.
.tfrecord
, which needs less memory while training.xxx.npz
to xxx.tfrecord
.float16
format, which needs half less disk space than default float32
.$ CUDA_VISIBLE_DEVICES='-1' ./data_distiller.py -h
# usage: data_distiller.py [-h] -D DATA_PATH [-M MODEL_FILE] [-d DEST_FILE]
# [-b BATCH_SIZE] [-L LIMIT] [--use_fp16] [--save_npz]
#
# optional arguments:
# -h, --help show this help message and exit
# -D DATA_PATH, --data_path DATA_PATH
# Data path, or npz file converting to tfrecord
# (default: None)
# -M MODEL_FILE, --model_file MODEL_FILE
# Model file, keras h5 / pytorch pth / mxnet (default:
# None)
# -d DEST_FILE, --dest_file DEST_FILE
# Dest file path to save the processed dataset (default:
# None)
# -b BATCH_SIZE, --batch_size BATCH_SIZE
# Batch size (default: 256)
# -L LIMIT, --limit LIMIT
# Test parameter, limit converting only the first [NUM]
# (default: -1)
# --use_fp16 Save using float16 (default: False)
# --save_npz Save as npz file, default is tfrecord (default: False)
$ CUDA_VISIBLE_DEVICES='0' ./data_distiller.py -M subcenter-arcface-logs/r100-arcface-msfdrop75/model,0 -D /datasets/faces_casia_112x112_folders/ -b 32 --use_fp16
# >>>> Output: faces_casia_112x112_folders_shuffle_label_embs_normed_512.npz
Then this dataset can be used to train a new model.
data_path
as the new dataset path. If key embeddings
is in, then it will be a distiller train
.distiller_loss_cosine
will be added to match this embeddings
data, default loss_weights = [1, 7]
. Parameter distill
in scheduler
set this loss weight.softmax
/ arcface
/ centerloss
/ triplet
.emb_shape
can be differ from teacher
, in this case, a dense layer distill_emb_map_layer
will be added between basic_model
embedding layer output and teacher
embedding data.import train, losses, models
import tensorflow_addons as tfa
data_basic_path = '/datasets/faces_casia'
data_path = 'faces_casia_112x112_folders_shuffle_label_embs_512_fp16.tfrecord'
eval_paths = [os.parh.join(data_basic_path, ii) for ii in ['lfw.bin', 'cfp_fp.bin', 'agedb_30.bin']]
basic_model = models.buildin_models("mobilenet", dropout=0.4, emb_shape=512, output_layer='E')
tt = train.Train(data_path, save_path='TT_mobilenet_distill_bs400.h5', eval_paths=eval_paths,
basic_model=basic_model, model=None, lr_base=0.1, lr_decay=0.1, lr_decay_steps=[20, 30],
batch_size=400, random_status=0)
optimizer = tfa.optimizers.SGDW(learning_rate=0.1, weight_decay=5e-4, momentum=0.9)
sch = [
{"loss": losses.ArcfaceLoss(scale=16), "epoch": 5, "optimizer": optimizer, "distill": 128},
{"loss": losses.ArcfaceLoss(scale=32), "epoch": 5, "distill": 128},
{"loss": losses.ArcfaceLoss(scale=64), "epoch": 40, "distill": 128},
]
tt.train(sch, 0)
Knowledge distillation result of training Mobilenet on CASIA
Teacher | emb_shape | Dropout | Optimizer | Distill | Max lfw | Max cfp_fp | Max agedb_30 |
---|---|---|---|---|---|---|---|
None | 512 | 0 | SGDW | 0 | 0.9838 | 0.8730 | 0.8697 |
None | 512 | 0.4 | SGDW | 0 | 0.9837 | 0.8491 | 0.8745 |
r100 | 512 | 0 | SGDW | 7 | 0.9900 | 0.9111 | 0.9068 |
r100 | 512 | 0.4 | SGDW | 7 | 0.9905 | 0.9170 | 0.9112 |
r100 | 512 | 0.4 | SGDW | 128 | 0.9955 | 0.9376 | 0.9465 |
r100 | 512 | 0.4 | AdamW | 128 | 0.9920 | 0.9346 | 0.9387 |
r100 | 512 | 0.4 | AdamW | 128 | 0.9920 | 0.9346 | 0.9387 |
r100 | 256 | 0 | SGDW | 128 | 0.9937 | 0.9337 | 0.9427 |
r100 | 256 | 0.4 | SGDW | 128 | 0.9942 | 0.9369 | 0.9448 |
Knowledge distillation using Mobilenet on MS1M dataset
Teacher | emb_shape | Dropout | Optimizer | Distill | Max lfw | Max cfp_fp | Max agedb_30 |
---|---|---|---|---|---|---|---|
r100 | 512 | 0.4 | SGDW | 128 | 0.997 | 0.964 | 0.972833 |
IJB
dataset /media/SD/IJB_release
, basic usage will be:
# Test mxnet model, default scenario N0D1F1
CUDA_VISIBLE_DEVICES='1' python IJB_evals.py -m '/media/SD/IJB_release/pretrained_models/MS1MV2-ResNet100-Arcface/model,0' -d /media/SD/IJB_release -L
# Test keras h5 model, default scenario N0D1F1
CUDA_VISIBLE_DEVICES='1' python IJB_evals.py -m 'checkpoints/basic_model.h5' -d /media/SD/IJB_release -L
# `-B` to run all 8 tests N{0,1}D{0,1}F{0,1}
CUDA_VISIBLE_DEVICES='1' python IJB_evals.py -m 'checkpoints/basic_model.h5' -d /media/SD/IJB_release -B -L
# `-N` to run 1N test
CUDA_VISIBLE_DEVICES='1' python IJB_evals.py -m 'checkpoints/basic_model.h5' -d /media/SD/IJB_release -N -L
# `-E` to save embeddings data
CUDA_VISIBLE_DEVICES='1' python IJB_evals.py -m 'checkpoints/basic_model.h5' -d /media/SD/IJB_release -E
# Then can be restored for other tests, add `-E` to save again
python IJB_evals.py -R IJB_result/MS1MV2-ResNet100-Arcface_IJBB.npz -d /media/SD/IJB_release -B
# Plot result only, this needs the `label` data, which can be saved using `-L` parameter.
# Or should provide the label txt file.
python IJB_evals.py --plot_only /media/SD/IJB_release/IJBB/result/*100*.npy /media/SD/IJB_release/IJBB/meta/ijbb_template_pair_label.txt
-h
for detail usage.
python IJB_evals.py -h
Test using TFLite Model Benchmark Tool
Platform
Qualcomm Technologies, Inc SDM630
Android
TFLite
mobilenet_v2 comparing orignal
/ dynamic
/ float16
/ uint8
conversion of TFLite
model. Using header GDC + emb_shape=512 + pointwise_conv=False
.
mobilenet_v2 | Size (MB) | threads=1 (ms) | threads=4 (ms) |
---|---|---|---|
orignal | 11.576 | 52.224 | 18.102 |
orignal xnn | 11.576 | 29.116 | 8.744 |
dynamic | 3.36376 | 38.497 | 20.008 |
dynamic xnn | 3.36376 | 37.433 | 19.234 |
float16 | 5.8267 | 53.986 | 19.191 |
float16 xnn | 5.8267 | 29.862 | 8.661 |
uint8 | 3.59032 | 27.247 | 10.783 |
mobilenet_v2 comparing different headers using float16 conversion + xnn + threads=4
emb_shape | output_layer | pointwise_conv | PReLU | Size (MB) | Time (ms) |
---|---|---|---|---|---|
256 | GDC | False | False | 5.17011 | 8.214 |
512 | GDC | False | False | 5.82598 | 8.436 |
256 | GDC | True | False | 6.06384 | 9.129 |
512 | GDC | True | False | 6.32542 | 9.357 |
256 | E | True | False | 9.98053 | 10.669 |
256 | E | False | False | 14.9618 | 11.502 |
512 | E | True | False | 14.174 | 11.958 |
512 | E | False | False | 25.4481 | 15.063 |
512 | GDC | False | True | 5.85275 | 10.481 |
Backbones comparing using float16 conversion + xnn + threads=4
, header GDC + emb_shape=512 + pointwise_conv=False
Model | Size (MB) | Time (ms) |
---|---|---|
mobilenet_v3_small | 2.80058 | 4.211 |
mobilenet_v3_large | 6.95015 | 10.025 |
ghostnet strides=2 | 8.06546 | 11.125 |
mobilenet | 7.4905 | 11.836 |
se_mobilefacenet | 1.88518 | 18.713 |
mobilefacenet | 1.84267 | 20.443 |
EB0 | 9.40449 | 22.054 |
EB1 | 14.4268 | 31.881 |
ghostnet strides=1 | 8.16576 | 46.142 |
mobilenet_m1 | 7.02651 | 52.648 |
@misc{leondgarse,
author = {Leondgarse},
title = {Keras Insightface},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
doi = {10.5281/zenodo.6506949},
howpublished = {\url{https://github.com/leondgarse/Keras_insightface}}
}