# 📲 Transformers Android Examples (TensorFlow Lite & PyTorch Mobile)

Sentiment classification fine-tuned on movie review datasets (the English IMDB dataset and the Korean NSMC dataset). Both English and Korean are supported.
Available models:
- "Original" TorchScript ELECTRA-Small (53MB)
- "Original" TFLite ELECTRA-Small (53MB)
- FP16 post-training-quantized TFLite ELECTRA-Small (26MB)
- "hybrid" (8-bits precision weights) post-training-quantized TFLite ELECTRA-Small (13MB)
Most of the assets are from the official PyTorch Android demo code. (Tested on a Galaxy S10)
📱 APK Download Link 📱
The `libs` directory contains a custom build of TensorFlow Lite with TensorFlow ops built in, which is used by the app. It results in a bigger binary than the "normal" build but allows compatibility with ELECTRA-Small.

1. Choose `Open an existing Android Studio project` and open this project.
2. Click `Run` to run the demo app on your Android device.

If the Android SDK and Android NDK are already installed, you can also install the app on a connected Android device with:

```shell
./gradlew installDebug
```
To convert the original model to TFLite format, Select TensorFlow Ops must be enabled. This results in a bigger binary than the "normal" build but allows compatibility with the Transformers architecture.
```python
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                       tf.lite.OpsSet.SELECT_TF_OPS]
```
🚨 To use the Transformers TFLite model, you have to build the AAR file yourself. (Please check this documentation for TFLite Select ops.) 🚨

In this app, I used the same AAR file provided with the Hugging Face demo app. (The `libs` directory contains a custom build of the AAR.)
```gradle
dependencies {
    implementation 'org.pytorch:pytorch_android:1.5.0'
    implementation 'org.pytorch:pytorch_android_torchvision:1.5.0'

    // implementation 'org.tensorflow:tensorflow-lite:2.1.0'
    // implementation 'org.tensorflow:tensorflow-lite-select-tf-ops:0.0.0-nightly'
    implementation(name: 'tensorflow-lite-with-select-tf-ops-0.0.0-nightly', ext: 'aar')
}
```
🚨 It is highly recommended to use `tensorflow` v2.1.0 instead of v2.2.0: TFLite conversion does not work with `tensorflow` v2.2.0. (Related Issue) 🚨
※ The models are already uploaded to the Hugging Face S3 bucket and will be downloaded automatically during the build. If you want to download the `fp16` or `8bits` models, uncomment the corresponding lines in `download.gradle`.
🚨 TFLite conversion does not work in a CPU-only environment, but works well in a GPU environment. 🚨

You should specify the input shape (= `max_seq_len`) for model conversion.
```shell
# torchscript
$ python3 model_converter/${TASK_NAME}/jit_compile.py --max_seq_len 40

# tflite (default)
$ python3 model_converter/${TASK_NAME}/tflite_converter.py --max_seq_len 40

# tflite (fp16)
$ python3 model_converter/${TASK_NAME}/tflite_converter.py --max_seq_len 40 --model fp16

# tflite (8bits)
$ python3 model_converter/${TASK_NAME}/tflite_converter.py --max_seq_len 40 --model 8bits
```
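The `--model` flag above selects the post-training quantization mode. As a rough sketch of how that flag might map to TFLite converter settings (the function name and the mapping below are illustrative assumptions, not the actual contents of `tflite_converter.py`):

```python
# Hypothetical mapping from the --model flag to TFLite converter settings.
# The real tflite_converter.py may differ; this only illustrates the idea.
def quantization_config(model="default"):
    if model == "fp16":
        # FP16 post-training quantization: weights stored as float16 (~53MB -> ~26MB)
        return {"optimizations": ["DEFAULT"], "supported_types": ["float16"]}
    if model == "8bits":
        # "Hybrid" dynamic-range quantization: 8-bit weights, float activations (~53MB -> ~13MB)
        return {"optimizations": ["DEFAULT"], "supported_types": []}
    # Default: plain conversion with no post-training quantization
    return {"optimizations": [], "supported_types": []}
```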
`MAX_SEQ_LEN` is set to 40 in this app. You can change it yourself via the `--max_seq_len` option in the Python scripts and the `MAX_SEQ_LEN` constant in the Android source code.

```java
private static final int MAX_SEQ_LEN = 40;
```

If the input length doesn't match `max_seq_len`, the app crashes :( You should pad the input sequence for the TFLite model.

```java
private static final boolean PAD_TO_MAX_LENGTH = true;
```
I've already uploaded the `fp16` and `8bits` TFLite models (both English and Korean) to the Hugging Face S3 bucket. If you want to use those models, uncomment the corresponding lines in `download.gradle` as below. They will be downloaded automatically during the Gradle build.
```gradle
task downloadLiteModel {
    def downloadFiles = [
        // "https://s3.amazonaws.com/models.huggingface.co/bert/monologg/koelectra-small-finetuned-sentiment/nsmc_small_fp16.tflite" : "nsmc_small_fp16.tflite",
        // "https://s3.amazonaws.com/models.huggingface.co/bert/monologg/koelectra-small-finetuned-sentiment/nsmc_small_8bits.tflite": "nsmc_small_8bits.tflite",
    ]
}
```
You also need to change `MODEL_PATH` in the Activity.

```java
// 1. fp16
private static final String MODEL_PATH = "imdb_small_fp16.tflite";

// 2. 8bits hybrid
private static final String MODEL_PATH = "imdb_small_8bits.tflite";
```
The first inference run with TorchScript is quite slow; after the first pass, inference time returns to normal. It seems that the first `forward` call does some warm-up work. (Not sure about it...)
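One way to see this warm-up effect is to time the first call separately from subsequent calls. A generic sketch (the timed function here is a stand-in stub, not the actual TorchScript module):

```python
import time

def measure(fn, runs=5):
    """Return (first_call_seconds, average_of_subsequent_calls_seconds)."""
    start = time.perf_counter()
    fn()  # first call: may include warm-up / lazy initialization
    first = time.perf_counter() - start

    rest = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        rest.append(time.perf_counter() - start)
    return first, sum(rest) / len(rest)
```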