Minimum:
`tar` extraction temporarily requires an additional 440 GB of free space.

| Device | Recommended BSZ |
|---|---|
| 1080ti, 2080ti (11 GB), Titan X, Titan V (12 GB), AWS/GCP V100 (16 GB) | 320 |
| Quadro RTX 6000 (24 GB), 3090 (24 GB) | 640 |
| V100v2 (32 GB), AWS/GCP A100 (40 GB) | 1280 |
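
As a rough illustration, the table above can be read as a lookup from GPU memory to batch size. The helper below is a hypothetical sketch, not part of the repository; the cut-offs between rows (24 GB and 32 GB) are assumptions, since the table only lists specific devices.

```python
def recommended_batch_size(gpu_memory_gb: float) -> int:
    """Map GPU memory (GB) to the batch size recommended in the table.

    Hypothetical helper: the thresholds between table rows are assumed.
    """
    if gpu_memory_gb < 11:
        raise ValueError("the table lists no device with less than 11 GB")
    if gpu_memory_gb < 24:
        return 320    # 11-16 GB devices
    if gpu_memory_gb < 32:
        return 640    # 24 GB devices
    return 1280       # 32-40 GB devices
```

For example, `recommended_batch_size(24)` returns the value from the 24 GB row.
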
`allow_gpu_memory_growth()` from the `run.py`.

# CUDA 10.1-based image
docker pull mimbres/neural-audio-fp:latest
# CUDA 11.2-based image for RTX 30x0 and later
docker pull mimbres/neural-audio-fp:cuda11.2.0-cudnn8
You can create an image through `Dockerfile` and `environment.yml`.
git clone https://github.com/mimbres/neural-audio-fp.git
cd neural-audio-fp
docker build -t neural-audio-fp .
`libopenblas` from `Dockerfile`.

`Faiss` and `Numpy` are optimized for Intel MKL.

- NVIDIA driver >= 450.80.02, CUDA >= 11.0 and cuDNN 8 (Compatibility)
- NVIDIA driver >= 440.33, CUDA == 10.2 and cuDNN 7 (Compatibility)

After checking the requirements,
git clone https://github.com/mimbres/neural-audio-fp.git
cd neural-audio-fp
conda env create -f environment.yml
conda activate fp
# Python 3.8: installing in the same virtual environment
conda create -n YOUR_ENV_NAME
conda activate YOUR_ENV_NAME
conda install -c anaconda -c pytorch tensorflow=2.4.1=gpu_py38h8a7d6ce_0 cudatoolkit faiss-gpu=1.6.5
conda install pyyaml click matplotlib
conda install -c conda-forge librosa
pip install kapre wavio
`tensorflow` and `faiss-gpu=1.6.5` (not 1.7.1) in separate environments.

# After creating a tensorflow environment for training...
conda create -n YOUR_ENV_NAME
conda activate YOUR_ENV_NAME
conda install -c pytorch faiss-gpu=1.6.5
conda install pyyaml click
Now you can run search and evaluation with:
python eval/eval_faiss.py --help
| | Dataset-mini v1.1 (11.2 GB) | Dataset-full v1.1 (443 GB) |
|---|---|---|
| tar | :eight_spoked_asterisk: kaggle / gdrive | dataport (open-access) |
| raw | gdrive | gdrive |

`Dataset-mini`. `Dataset-full` is for testing at a 100x larger scale.

`Dataset-mini` via `kaggle` CLI (recommended).

`kaggle.json`
pip install --user kaggle
mkdir -p ~/.kaggle && cp kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
kaggle datasets download -d mimbres/neural-audio-fingerprint
100%|███████████████████████████████████| 9.84G/9.84G [02:28<00:00, 88.6MB/s]
This dataset includes all music sources, background noises, and impulse-response (IR) samples that can be used for reproducing the ICASSP results.
The default directory of the dataset is `../neural-audio-fp-dataset`. You can change the directory location by modifying `config/default.yaml`.

.
├── neural-audio-fp-dataset
└── neural-audio-fp
neural-audio-fp-dataset/
├── aug
│   ├── bg         <=== Audioset, Pub/cafe etc. for background noise mix
│   ├── ir         <=== IR data for microphone and room reverb simulation
│   └── speech     <=== subset of common-voice, NOT USED IN THE PAPER RESULT
├── extras
│   └── fma_info   <=== Metadata for music sources.
└── music
    ├── test-dummy-db-100k-full  <== 100K full-length songs
    ├── test-query-db-500-30s    <== 500 songs (30s) and 2K synthesized queries
    ├── train-10k-30s            <== 10K songs (30s) for training
    └── val-query-db-500-30s     <== 500 songs (30s) for validation/mini-search
The data format is `16-bit 8000 Hz PCM Mono WAV`. `README.md` and `LICENSE` are included in the dataset for more details.
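
If you add your own audio, it may help to confirm that it matches this format first. The sketch below uses only Python's standard `wave` module; the function name `is_expected_format` is hypothetical and not part of the repository.

```python
import wave

def is_expected_format(path):
    """Return True if `path` is a 16-bit, 8000 Hz, mono WAV file."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getsampwidth() == 2          # 2 bytes per sample = 16-bit
            and wav.getframerate() == 8000   # 8000 Hz sample rate
            and wav.getnchannels() == 1      # mono
        )
```

Note that `wave.open` raises `wave.Error` for files it cannot parse, so callers may want to catch that exception when scanning mixed directories.
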
Install `checksumdir`.
pip install checksumdir
Compare the checksums.
checksumdir -a md5 neural-audio-fp-dataset
# aa90a8fbd3e6f938cac220d8aefdb134
checksumdir -a sha1 neural-audio-fp-dataset
# 5bbeec7f5873d8e5619d6b0de87c90e180363863d
There are 3 basic `COMMAND`s, one for each step.
# Train
python run.py train CHECKPOINT_NAME
# Generate fingerprint
python run.py generate CHECKPOINT_NAME
# Search & Evaluation (after generating fingerprint)
python run.py evaluate CHECKPOINT_NAME CHECKPOINT_INDEX
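
If you prefer to drive the three steps from a script, something like the following sketch could work. It only assembles the command lines shown above and runs them in order; `pipeline_commands` and `run_pipeline` are hypothetical names, and the script is assumed to be launched from the repository root.

```python
import subprocess
import sys

def pipeline_commands(checkpoint_name, checkpoint_index):
    """Build the train, generate and evaluate command lines, in order."""
    run = [sys.executable, "run.py"]
    return [
        run + ["train", checkpoint_name],
        run + ["generate", checkpoint_name],
        run + ["evaluate", checkpoint_name, str(checkpoint_index)],
    ]

def run_pipeline(checkpoint_name, checkpoint_index):
    """Run each step, stopping at the first one that fails."""
    for cmd in pipeline_commands(checkpoint_name, checkpoint_index):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

The `checkpoint_index` passed to `evaluate` should be the index of the checkpoint that `generate` actually used.
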
Help for the `run.py` client and its commands.
python run.py --help
python run.py COMMAND --help
python run.py train CHECKPOINT_NAME CHECKPOINT_INDEX
- If `CHECKPOINT_INDEX` is not specified, the training will resume from the latest checkpoint.
- In the `default` configuration, all checkpoints are stored in `logs/checkpoint/CHECKPOINT_NAME/ckpt-CHECKPOINT_INDEX.index`.

python run.py train CHECKPOINT --max_epoch=100 -c default
Notes:
- The `default` config sets `TR_BATCH_SZ`=120 with `OPTIMIZER`=`Adam`.
- For `TR_BATCH_SZ` >= 240, `OPTIMIZER`=`LAMB` is recommended.
- For `TR_BATCH_SZ` >= 1280, `LR`=`1e-4` can be too small.
- `TAU` is in the range of [0.05, 0.1].

The config file is located in `config/CONFIG_NAME.yaml`.
You can edit `directory location`, `data selection`, hyperparameters for `model` and `optimizer`, `batch-size`, strategies for time-domain and spectral-domain `augmentation chain`, etc. After training, it is important to keep the config file in order to restore the model.
python run.py COMMAND -c CONFIG
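
When preparing a config for a new batch size, the training notes on `TR_BATCH_SZ`, `OPTIMIZER` and `LR` can be restated as a small check. This is a hypothetical helper, not part of the repository. It assumes `Adam` for every batch size below 240, although the notes only state it for the default of 120, and the `revisit_LR` key is an invented flag rather than a config field.

```python
def optimizer_hint(batch_size: int) -> dict:
    """Restate the batch-size notes as data (illustrative only)."""
    # Assumption: Adam below 240; the notes recommend LAMB from 240 upward.
    hint = {"OPTIMIZER": "Adam" if batch_size < 240 else "LAMB"}
    # The notes say LR=1e-4 "can be too small" at >= 1280 but give no
    # replacement value, so this only raises a flag.
    hint["revisit_LR"] = batch_size >= 1280
    return hint
```
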
When using the `generate` command, it is important to use the same config that was used in training.
python run.py generate CHECKPOINT_NAME # from the latest checkpoint
python run.py generate CHECKPOINT_NAME CHECKPOINT_INDEX -c CONFIG_NAME
# Location of the generated fingerprint
.
└── logs
    └── emb
        └── CHECKPOINT_NAME
            └── CHECKPOINT_INDEX
                ├── db.mm
                ├── db_shape.npy
                ├── dummy_db.mm
                ├── dummy_db_shape.npy
                ├── query.mm
                └── query_shape.npy
With the `default` config, `generate` produces embeddings (or fingerprints) from `dummy_db`, `test_query` and `test_db`. The generated embeddings will be located in `logs/emb/CHECKPOINT_NAME/CHECKPOINT_INDEX/**.mm` and `**.npy`.
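
To inspect the generated files yourself, one plausible approach is to open each `.mm` file as a NumPy memory map, taking its shape from the matching `_shape.npy` file. This is a sketch under assumptions: the `float32` dtype is a guess that should be checked against the generator code, and `load_fingerprints` is a hypothetical name.

```python
import os
import numpy as np

def load_fingerprints(emb_dir, name, dtype="float32"):
    """Open `<name>.mm` read-only, shaped by `<name>_shape.npy`.

    The dtype is an assumption; adjust it if the files were written differently.
    """
    shape_path = os.path.join(emb_dir, f"{name}_shape.npy")
    shape = tuple(int(n) for n in np.load(shape_path))
    data_path = os.path.join(emb_dir, f"{name}.mm")
    return np.memmap(data_path, dtype=dtype, mode="r", shape=shape)
```

For example, `load_fingerprints("logs/emb/CHECKPOINT_NAME/CHECKPOINT_INDEX", "db")` would map `db.mm` without reading it fully into memory.
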
- `dummy_db` is generated from the 100K full-length dataset.
- In the `DATASEL` section of config, you can select options for a pair of `db` and `query` generation. The default is `unseen_icassp`, which uses a pre-defined test set.
- The `--skip_dummy` option generates only the `db` and `query` pairs. This is a frequently used option to avoid overwriting the most time-consuming `dummy_db` fingerprints in every experiment.

python run.py generate --source SOURCE_ROOT_DIR --output FP_OUTPUT_DIR --skip_dummy # for custom audio source
python run.py generate --help # more details...
The following command will construct a `faiss.index` from the generated embeddings or fingerprints located at `logs/emb/CHECKPOINT_NAME/CHECKPOINT_INDEX/`.
# faiss-gpu
python run.py evaluate CHECKPOINT_NAME CHECKPOINT_INDEX [OPTIONS]
# faiss-cpu
python run.py evaluate CHECKPOINT_NAME CHECKPOINT_INDEX --nogpu
In addition, you can choose one of the `--index_type` options (default is `IVFPQ`) from the table below:
| Type of index | Description |
|---|---|
| `l2` | L2 distance |
| `ivf` | Inverted File Index (IVF) |
| `ivfpq` | Product Quantization (PQ) with IVF :book: |
| `ivfpq-rr` | IVF-PQ with re-ranking |
| `ivfpq-rr-ondisk` | |
| `hnsw` | Hierarchical Navigable Small World :book: |

python run.py evaluate CHECKPOINT_NAME CHECKPOINT_INDEX --index_type IVFPQ
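
To make the `l2` row concrete: exhaustive L2 search compares every query against every database vector and keeps the closest ones, which the other index types approximate for speed. The NumPy sketch below illustrates that idea only; it is not the Faiss implementation or the repository's evaluation code, and `l2_search` is a hypothetical name.

```python
import numpy as np

def l2_search(db, queries, k=1):
    """Return indices of the k nearest db rows per query (squared L2)."""
    # ||q - d||^2 = ||q||^2 - 2 q.d + ||d||^2, computed for all pairs at once.
    dists = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ db.T
        + (db ** 2).sum(axis=1)
    )
    return np.argsort(dists, axis=1)[:, :k]
```

The cost grows with the product of database and query sizes, which is why quantized or graph-based indexes are the usual choice at the 100K-song scale.
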
Currently, only a few options for `Faiss` settings are available in the `run.py` client.
Instead, you can directly run:
python eval/eval_faiss.py EMB_DIR --index_type IVFPQ --kprobe 20 --nogpu
python eval/eval_faiss.py --help
Note that `eval_faiss.py` does not require `Tensorflow`.
Tensorboard is enabled by default in the `['TRAIN']` section of the config file.
# Run Tensorboard
tensorboard --logdir=logs/fit --port=8900 --host=0.0.0.0
Here is an overview of the system for building and retrieving the database. The system and the 'matcher' algorithm are not detailed in the paper, but they are very simple, as shown in this code.
`tf.data`-based new data pipeline for multi-GPU and TPU support.

Augmentation demo was generated by `dataset2wav.py`.
This project has been supported by the TPU Research Cloud (TRC) program.
@conference {chang2021neural,
author={Chang, Sungkyun and Lee, Donmoon and Park, Jeongsoo and Lim, Hyungui and Lee, Kyogu and Ko, Karam and Han, Yoonchang},
title={Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning},
booktitle={International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)},
year = {2021}
}