OpenNMT-py Releases

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

v3.5.1

1 month ago
  • Further fixes
  • added wikitext runs

v3.5.0

2 months ago

3.5.0 (2024-02-22)

  • Further improvements and fixes
  • Support for AWQ models
  • Add n_best for topp/topk generation
  • Support MoE (Mixtral) inference
  • Extend HF models converter
  • use flash_attn_with_kvcache for faster inference
  • Add wikitext2 PPL computation (see the sketch below)
  • Support for Phi-2 models
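
The wikitext2 PPL entry is essentially a chunked negative log-likelihood over the tokenized corpus. A minimal sketch is below; the `model(input_ids) -> logits` interface, the 1-D `token_ids` tensor (the pre-tokenized corpus) and the `window` size are assumptions for illustration, not OpenNMT-py's actual evaluation code.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity(model, token_ids: torch.Tensor, window: int = 512) -> float:
    """Average per-token NLL over non-overlapping windows, exponentiated."""
    nll_sum, n_tokens = 0.0, 0
    for start in range(0, token_ids.numel() - 1, window):
        chunk = token_ids[start : start + window + 1].unsqueeze(0)  # (1, <= window+1)
        logits = model(chunk[:, :-1])        # predict tokens 1..n from tokens 0..n-1
        targets = chunk[:, 1:]
        nll = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="sum"
        )
        nll_sum += float(nll)
        n_tokens += targets.numel()
    return math.exp(nll_sum / n_tokens)
```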

v3.4.3

5 months ago
  • Further improvements to beam search and decoding
  • New indexing "in bucket" for faster inference cf #2496
  • Code cleanup
  • Fix int8 for CPU dynamic quantization (still slow; see the sketch below)
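
For context on the int8 item, this is what dynamic quantization on CPU looks like with the stock PyTorch API; it is a generic sketch, not OpenNMT-py's own quantization path.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

# Linear weights are stored as int8; activations are quantized on the fly at
# runtime. Matmuls get cheaper, but per-call overhead remains, which is one
# reason dynamic quantization can still be slow on CPU.
qmodel = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.inference_mode():
    out = qmodel(torch.randn(1, 1024))
```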

v3.4.2

6 months ago
  • torch 2.1 (scaled_dot_product improvements)
  • Mistral 7B sliding window
  • Speed-up inference
  • flash attention 2 (with sliding window) >= v2.3.1
  • use FusedRMSNorm from apex if available (see the sketch below)
  • fixed attn_debug
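
A sketch of the "use FusedRMSNorm from apex if available" pattern: import the fused CUDA kernel when apex is installed and fall back to a plain PyTorch implementation otherwise. The fallback class is illustrative, not OpenNMT-py's exact module.

```python
import torch
import torch.nn as nn

try:
    # Fused CUDA kernel, used when NVIDIA apex is installed.
    from apex.normalization import FusedRMSNorm as RMSNorm
except ImportError:
    # Pure-PyTorch fallback (illustrative).
    class RMSNorm(nn.Module):
        def __init__(self, hidden_size: int, eps: float = 1e-6):
            super().__init__()
            self.weight = nn.Parameter(torch.ones(hidden_size))
            self.eps = eps

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Scale by the reciprocal root-mean-square of the last dimension.
            rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
            return self.weight * (x * rms)

norm = RMSNorm(4096)
```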

v3.4.1

7 months ago
  • bug fixes
  • torch 2.x requirement (flash attention requires it)
  • zero-out the prompt loss in LM finetuning (see the sketch below)
  • batching sorted on src then tgt instead of max len
  • six dependency
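
Zeroing out the prompt loss means masking the prompt positions so only response tokens contribute to the finetuning objective. A minimal sketch using the standard `ignore_index=-100` convention follows; tensor names and the assumption that labels are already shifted are illustrative, not OpenNMT-py's internals.

```python
import torch
import torch.nn.functional as F

def lm_loss(logits: torch.Tensor, labels: torch.Tensor, prompt_lens: torch.Tensor):
    """logits: (B, T, vocab); labels: (B, T), already shifted; prompt_lens: (B,)."""
    labels = labels.clone()
    positions = torch.arange(labels.size(1), device=labels.device)
    # Ignore every position that belongs to the prompt, so only the response
    # tokens contribute to the loss.
    labels[positions.unsqueeze(0) < prompt_lens.unsqueeze(1)] = -100
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), labels.reshape(-1), ignore_index=-100
    )
```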

v3.4.0

7 months ago
  • bitsandbytes 4/8-bit quantization at inference (see the sketch below)
  • MMLU-FR results and scoring
  • flan-T5 support
  • flash attention
  • terminology transform
  • tensor parallelism (inference, training)
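
A rough sketch of 8-bit inference quantization with bitsandbytes, following the common "swap nn.Linear for Linear8bitLt and quantize on the move to GPU" pattern. The helper and its exact constructor arguments are assumptions about the bitsandbytes API, not OpenNMT-py's converter.

```python
import torch
import torch.nn as nn
import bitsandbytes as bnb

def to_int8(linear: nn.Linear) -> nn.Module:
    """Replace a float nn.Linear with a bitsandbytes 8-bit linear for inference."""
    qlinear = bnb.nn.Linear8bitLt(
        linear.in_features,
        linear.out_features,
        bias=linear.bias is not None,
        has_fp16_weights=False,
    )
    qlinear.weight = bnb.nn.Int8Params(
        linear.weight.data.to(torch.float16),
        requires_grad=False,
        has_fp16_weights=False,
    )
    if linear.bias is not None:
        qlinear.bias = nn.Parameter(linear.bias.data.to(torch.float16))
    # The actual int8 quantization happens when the module is moved to the GPU.
    return qlinear.to("cuda")
```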

v3.3.0

10 months ago
  • Switch to pytorch 2.0.1
  • Eval LLM with MMLU benchmark
  • Fix Falcon 40B conversion / finetuning / inference
  • Plugin encoder/decoder thanks @kleag / @n2oblife
  • optional Safetensors for model storage (beta; see the sketch below)
  • finetuning config templates for supported LLMs
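
A minimal sketch of what safetensors storage looks like: raw tensors only, no pickled Python objects, so non-tensor training state has to live elsewhere. The file name and state dict are placeholders.

```python
import torch
from safetensors.torch import load_file, save_file

state_dict = {"embeddings.weight": torch.randn(32000, 4096)}
save_file(state_dict, "model.safetensors")

restored = load_file("model.safetensors", device="cpu")
```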

v3.2.0

10 months ago

Lots of new stuff in this release:

  • Skip init during model build (way faster building)
  • Enable quantization of LoRA layers
  • Enable 4bit quantization from bitsandbytes (NF4 / FP4)
  • Enable "some" bnb.optim Optimizers for benchmarking purpose
  • Refactor model state_dict loading to enable pseudo lazy loading with move on GPU as it loads
  • Enable Gradient checkpointing for FFN, MHA, LoRA modules
  • Make FFN bias optional (same as QKV): the llama, mpt, redpajama and openllama converters are changed accordingly; Convertv2_v3 sets add_qkvbias=True and add_ffnbias=True; load_checkpoint sets add_ffnbias=True if w1_bias is detected in the checkpoint
  • Add Multi Query attention (see the sketch below)
  • Add Parallel Residual attention
  • Add Falcon 7B converter
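
A compact sketch of multi-query attention: every query head attends against one shared key/value head, which shrinks the KV cache at inference. Shapes, names, and the use of `scaled_dot_product_attention` are illustrative, not OpenNMT-py's implementation.

```python
import torch
import torch.nn.functional as F

def multi_query_attention(q: torch.Tensor, kv_k: torch.Tensor, kv_v: torch.Tensor,
                          n_heads: int) -> torch.Tensor:
    """q: (B, T, n_heads * D); kv_k / kv_v: (B, T, D) -- a single shared KV head."""
    B, T, D = kv_k.shape
    q = q.view(B, T, n_heads, D).transpose(1, 2)        # (B, H, T, D)
    k = kv_k.unsqueeze(1).expand(B, n_heads, T, D)      # one KV head, broadcast to all
    v = kv_v.unsqueeze(1).expand(B, n_heads, T, D)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(B, T, n_heads * D)
```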

v3.1.3

11 months ago
  • Step-by-step tutorial for Vicuna replication, thanks Lina
  • MosaicML MPT7B converter and support (ALiBi embeddings; see the sketch below)
  • Open Llama converter / Redpajama converter
  • Switch GCLD3 to Fasttext thanks ArtanieTheOne
  • fix coverage attention in beam decoding
  • fix ct2 keys for "Llama / MPT7B based" OpenNMT-py models
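
A sketch of ALiBi position biases as used by MPT-style models: a per-head linear penalty on attention logits that grows with key distance. The power-of-two slope schedule assumes a power-of-two head count; everything here is illustrative, not OpenNMT-py's code.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Returns an (n_heads, seq_len, seq_len) additive bias for causal attention."""
    # Common slope recipe: head h gets slope 2^(-8 * h / n_heads), h = 1..n_heads.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    # distance[i, j] = j - i (non-positive for past keys; future keys are masked anyway)
    distance = (pos.view(1, -1) - pos.view(-1, 1)).clamp(max=0)
    return slopes.view(-1, 1, 1) * distance  # more negative for more distant keys
```

The returned tensor is added to the pre-softmax attention scores together with the causal mask.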

v3.1.2

11 months ago
  • fixes: transforms (normalize, clean, inlinetags)

  • Llama support (rotary embeddings, RMSNorm, SiLU activation; see the sketch below)

  • 8bit loading for specific layers (along with LoRA for other layers)

  • subword learner added to build_vocab
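
A sketch of rotary position embeddings (RoPE), one ingredient of the Llama support: pairs of query/key channels are rotated by a position-dependent angle. The interleaved pairing convention and the shapes are assumptions for illustration, not OpenNMT-py's exact implementation.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """x: (B, H, T, D) with even D; returns the same tensor with RoPE applied."""
    B, H, T, D = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, D, 2).float() / D))   # (D/2,)
    angles = torch.arange(T).float()[:, None] * inv_freq[None, :]    # (T, D/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]                              # channel pairs
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)                                       # interleave back to D
```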