Sberbank AI Model Zoo

NLP model zoo for Russian

Welcome to the Model Zoo!

Here you can find NLP models for Russian, implemented with Hugging Face Transformers 🤗.

See Examples In Colab!

Models:

| Model | Task | Type | Tokenizer | Dict size | Num parameters | Training data volume |
|---|---|---|---|---|---|---|
| ruBERT-base | mask filling | encoder | bpe | 120 138 | 178 M | 30 GB |
| ruBERT-large | mask filling | encoder | bpe | 120 138 | 427 M | 30 GB |
| ruRoBERTa-large | mask filling | encoder | bbpe | 50 257 | 355 M | 250 GB |
| ruT5-base | text2text generation | encoder-decoder | bpe | 32 101 | 222 M | 300 GB |
| ruT5-large | text2text generation | encoder-decoder | bpe | 32 101 | 737 M | 300 GB |
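
The parameter counts in the table give a rough idea of how much memory each model needs just to hold its weights. The sketch below is only a back-of-the-envelope estimate: it assumes 4 bytes per parameter in fp32 and 2 bytes in fp16, and it ignores activations, optimizer state, and framework overhead. The names `PARAMS_MILLIONS` and `weights_size_gb` are made up for this illustration.

```python
# Parameter counts (in millions), copied from the table above.
PARAMS_MILLIONS = {
    "ruBERT-base": 178,
    "ruBERT-large": 427,
    "ruRoBERTa-large": 355,
    "ruT5-base": 222,
    "ruT5-large": 737,
}

# Assumption: plain dense storage, no quantization or compression.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weights_size_gb(model: str, dtype: str = "fp32") -> float:
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    n_params = PARAMS_MILLIONS[model] * 1_000_000
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for name in PARAMS_MILLIONS:
    print(f"{name}: ~{weights_size_gb(name):.2f} GB in fp32")
```

Actual memory use during inference or fine-tuning will be noticeably higher than these figures.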

ruT5

Text2Text Generation task, T5 paper

Model parameters

ruRoBERTa

fill-mask task, RoBERTa paper

ruBERT

fill-mask task, BERT paper

How to:

Use the Colab notebook to explore the models, or run them on your own machine.

Model setup:

pip install -r requirements.txt

Pipeline usage

from transformers import pipeline

unmasker = pipeline("fill-mask", model="sberbank-ai/ruRoberta-large")
unmasker("Евгений Понасенков назвал <mask> величайшим маэстро.", top_k=1)
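
For an input with a single mask, the fill-mask pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str`, and `sequence`. A small helper like the one below can turn that list into readable lines. This is a minimal sketch: `format_predictions` is a hypothetical name, and the `example` list is a hand-written stand-in with placeholder values, not real model output.

```python
from typing import Dict, List


def format_predictions(predictions: List[Dict]) -> List[str]:
    """Return 'token (score)' strings, highest score first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    # token_str may carry a leading space for byte-level BPE tokenizers
    return [f"{p['token_str'].strip()} ({p['score']:.3f})" for p in ranked]


# Hand-written stand-in with the same keys the pipeline returns.
# The values are placeholders for illustration, NOT real model output.
example = [
    {"score": 0.10, "token": 2, "token_str": " кошки", "sequence": "..."},
    {"score": 0.25, "token": 1, "token_str": " Python", "sequence": "..."},
]

print(format_predictions(example))
```

With a real model, the same helper can be applied to the result of `unmasker(text, top_k=5)`.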

Classical usage

# ruRoberta-large example
from transformers import RobertaForMaskedLM, RobertaTokenizer, pipeline

# Load the masked-LM weights and the matching tokenizer explicitly
model = RobertaForMaskedLM.from_pretrained('sberbank-ai/ruRoberta-large')
tokenizer = RobertaTokenizer.from_pretrained('sberbank-ai/ruRoberta-large')

# Wrap both in a fill-mask pipeline and predict the masked token
unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
unmasker("Стоит чаще писать на Хабр про <mask>.")
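
The same prediction can also be made without the pipeline wrapper, by locating the mask position and reading the top logits directly. The following is a sketch under stated assumptions rather than the project's own code: the helper names `mask_positions` and `predict_mask` are hypothetical, `torch` is assumed to be installed, and only the first `<mask>` in the text is filled.

```python
from typing import List


def mask_positions(input_ids: List[int], mask_token_id: int) -> List[int]:
    """Indices of all mask tokens in a sequence of token ids."""
    return [i for i, tok in enumerate(input_ids) if tok == mask_token_id]


def predict_mask(text: str,
                 model_name: str = "sberbank-ai/ruRoberta-large",
                 top_k: int = 5) -> List[str]:
    """Return the top_k candidate tokens for the first <mask> in text."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import RobertaForMaskedLM, RobertaTokenizer

    tokenizer = RobertaTokenizer.from_pretrained(model_name)
    model = RobertaForMaskedLM.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    positions = mask_positions(inputs["input_ids"][0].tolist(),
                               tokenizer.mask_token_id)
    if not positions:
        raise ValueError("text contains no mask token")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, seq_len, vocab)

    top_ids = torch.topk(logits[0, positions[0]], k=top_k).indices.tolist()
    return [tokenizer.decode([i]).strip() for i in top_ids]
```

Calling `predict_mask("Стоит чаще писать на Хабр про <mask>.")` downloads the model on first use, so it is left out of the block above.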

Use BertViz to obtain model visualizations

Roberta model_view:

from transformers import RobertaModel, RobertaTokenizer
from bertviz import model_view

model_version = 'sberbank-ai/ruRoberta-large'
model = RobertaModel.from_pretrained(model_version, output_attentions=True)
tokenizer = RobertaTokenizer.from_pretrained(model_version)

sentence_a = "The cat sat on the mat"
sentence_b = "The cat lay on the rug"
inputs = tokenizer.encode_plus(sentence_a, sentence_b, return_tensors='pt', add_special_tokens=True)
input_ids = inputs['input_ids']
attention = model(input_ids)[-1]
input_id_list = input_ids[0].tolist() # Batch index 0
tokens = tokenizer.convert_ids_to_tokens(input_id_list)
model_view(attention, tokens)

Open Source Agenda is not affiliated with "Sberbank Ai Model Zoo" Project. README Source: ai-forever/model-zoo