# Project Insight: NLP as a Service

Project Insight is designed to deliver NLP as a service, with a code base for both the front-end GUI (`streamlit`) and the backend server (`FastAPI`), demonstrating the usage of transformer models on various downstream NLP tasks.
The downstream NLP tasks covered:

- News Classification
- Entity Recognition
- Sentiment Analysis
- Summarization
- Information Extraction (To Do)
The user can select different models from the drop-down to run inference. Users can also call the backend `FastAPI` server directly for command-line inference.
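As a sketch of such a command-line call, the snippet below builds the JSON body for a task endpoint. The field names (`text`, `model`) and the URL are assumptions for illustration, not the documented API; check each service's FastAPI docs page for the real schema.

```python
import json

# Assumed request schema for a task endpoint; the real field names are
# listed in each service's FastAPI docs.
def build_request(text: str, model: str = "distilbert") -> str:
    return json.dumps({"text": text, "model": model})

body = build_request("Apple unveils new MacBook", model="distilbert")
print(body)
# A client would then POST this body to the service URL, e.g. with the
# `requests` library: requests.post(url, data=body)
```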
Features:

- `FastAPI` and `Streamlit` make the complete code base in Python.
- Docker Compose to spin up the `FastAPI`-based backend service.
- `streamlit run` command to launch the front-end app.
- Download the models and save them in the `src_fastapi` folder.

Running the backend service:

- Go to the `src_fastapi` folder.
- Run the Docker Compose command:

```shell
$ cd src_fastapi
src_fastapi:~$ sudo docker-compose up -d
```
Running the frontend app:

- Go to the `src_streamlit` folder.
- Run the app with the `streamlit run` command:

```shell
$ cd src_streamlit
src_streamlit:~$ streamlit run NLPfily.py
```
Access to FastAPI documentation: since this is a microservice-based design, every NLP task has its own separate documentation.
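For instance, each service's interactive docs can be opened at its own path. The host, port, and URL scheme below are assumptions (they depend on the Docker Compose setup), shown only to illustrate the one-docs-page-per-task layout.

```python
# Hypothetical URL scheme: adjust host, port, and prefix to your setup.
SERVICES = ["classification", "ner", "sentiment", "summary"]

def docs_url(service: str, base: str = "http://localhost:8080") -> str:
    return f"{base}/api/v1/{service}/docs"

for service in SERVICES:
    print(docs_url(service))
```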
Front End: front-end code is in the `src_streamlit` folder, along with the `Dockerfile` and `requirements.txt`.

Back End: back-end code is in the `src_fastapi` folder.
Each service (`classification`, `ner`, `summary`, `sentiment`, etc.) follows the same layout:

```
sentiment
└── app
    └── api
        ├── distilbert
        │   ├── model.bin
        │   ├── network.py
        │   └── tokeniser files
        └── roberta
            ├── model.bin
            ├── network.py
            └── tokeniser files
```
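A small helper illustrates how a model's on-disk location follows from this layout. The `src_fastapi` root and path segments are assumptions based on the tree above.

```python
from pathlib import Path

def model_dir(service: str, model: str, root: str = "src_fastapi") -> Path:
    # Mirrors the layout above: <root>/<service>/app/api/<model>
    return Path(root) / service / "app" / "api" / model

print(model_dir("sentiment", "roberta").as_posix())
# src_fastapi/sentiment/app/api/roberta
```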
For each new model under each service, a new folder will have to be added. Each model folder will need the following files:

- `network.py`: defines the class of the model if a customised model is used.
- `config.json`: contains the details of the models in the backend and the datasets they are trained on.
To add a new model:

1. Fine-tune a transformer model for the specific task. You can leverage the transformers-tutorials.
2. Save the model files and tokenizer files, and also create a `network.py` script if using a customised training network.
3. Create a directory within the NLP task, with the `directory_name` as the model name, and save all the files in this directory.
4. Update `config.json` with the model details and dataset details.
5. Update `<service>pro.py` with the correct imports and the conditions where the model is imported.

For example, for a new BERT model in the Classification task, do the following:
Create a new directory `bert` in `classification/app/api/`.
Update `config.json` with the following:

```json
"classification": {
    "model-1": {
        "name": "DistilBERT",
        "info": "This model is trained on News Aggregator Dataset from UC Irvine Machine Learning Repository. The news headlines are classified into 4 categories: **Business**, **Science and Technology**, **Entertainment**, **Health**. [New Dataset](https://archive.ics.uci.edu/ml/datasets/News+Aggregator)"
    },
    "model-2": {
        "name": "BERT",
        "info": "Model Info"
    }
}
```
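The snippet below shows how such a `config.json` entry can be read to list the registered models for a task. The parsing helper is illustrative, not the project's actual code.

```python
import json

# Inline stand-in for the config.json fragment above.
CONFIG = """
{
  "classification": {
    "model-1": {"name": "DistilBERT", "info": "News Aggregator Dataset"},
    "model-2": {"name": "BERT", "info": "Model Info"}
  }
}
"""

def model_names(config: dict, task: str) -> list:
    # Collect the display name of every model registered for the task.
    return [entry["name"] for entry in config[task].values()]

config = json.loads(CONFIG)
print(model_names(config, "classification"))
# ['DistilBERT', 'BERT']
```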
Update `classificationpro.py` with the following snippets:

```python
# Only if a customised class is used
from classification.bert import BertClass

# Section where the model is selected
if model == "bert":
    self.model = BertClass()
    self.tokenizer = BertTokenizerFast.from_pretrained(self.path)
```
This project is licensed under the GPL-3.0 License - see the LICENSE.md file for details