Retrieval and Retrieval-augmented LLMs
A new member of the BGE model series! BGE-M3 stands for Multi-Linguality, Multi-Granularity (input length up to 8192), and Multi-Functionality (unification of dense, lexical, and multi-vector retrieval). It is the first embedding model to support all three retrieval methods.
For more details, please refer to the Technical Report and Code.
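To make the three retrieval modes concrete, here is an illustrative sketch (not the BGE-M3 implementation, and on hypothetical toy vectors) of how each one scores a query against a document:

```python
# Toy illustration of the three retrieval methods a unified model
# like BGE-M3 supports. All vectors/weights below are made up.

def dense_score(q, d):
    # Dense retrieval: one vector per text, scored by dot product.
    return sum(qi * di for qi, di in zip(q, d))

def lexical_score(q_weights, d_weights):
    # Lexical (sparse) retrieval: per-token weights, scored by
    # summing the products of weights for overlapping tokens.
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def multi_vector_score(q_vecs, d_vecs):
    # Multi-vector retrieval (ColBERT-style MaxSim): each query token
    # vector takes its best match among the document token vectors.
    return sum(max(dense_score(qv, dv) for dv in d_vecs) for qv in q_vecs)

print(dense_score([1.0, 0.0], [0.5, 0.5]))                      # 0.5
print(lexical_score({"cat": 2.0, "sat": 1.0},
                    {"cat": 1.5, "mat": 1.0}))                  # 3.0
print(multi_vector_score([[1.0, 0.0], [0.0, 1.0]],
                         [[1.0, 0.0], [0.5, 0.5]]))             # 1.5
```

In practice the three scores can be combined (e.g. summed) to rank candidates, which is what the "unification" above refers to.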
An effective, efficient, compatible, and low-cost (training) method to extend the context length of LLMs by 100x. We extend the context length of Llama-2-chat-7b from 4K to 400K.
For more details, please refer to the paper and code.
Merge language models (e.g., Llama, BGE) to improve the general ability of the resulting model.
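A minimal sketch of the core idea, weighted averaging of parameters across models with the same architecture (the dicts of flat parameter lists below are hypothetical stand-ins for real state dicts; this is not the library's API):

```python
# Merge two models by taking a weighted average of each parameter.
# state_a / state_b stand in for the state dicts of two models that
# share an architecture; parameters are flattened to plain lists here.

def merge_state_dicts(state_a, state_b, weight_a=0.5):
    assert state_a.keys() == state_b.keys(), "models must share an architecture"
    weight_b = 1.0 - weight_a
    return {
        name: [weight_a * a + weight_b * b
               for a, b in zip(state_a[name], state_b[name])]
        for name in state_a
    }

merged = merge_state_dicts({"w": [1.0, 2.0], "b": [0.0]},
                           {"w": [3.0, 4.0], "b": [2.0]},
                           weight_a=0.5)
print(merged)  # {'w': [2.0, 3.0], 'b': [1.0]}
```

The merge weight trades off between the two source models; choosing it per target task is where methods differ.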
- `bge-*-v1.5`: embedding models
- `bge-reranker-*`: cross-encoders that can rerank the top-k retrieved results

Note: set `normalize_embeddings=True` manually when using sentence-transformers.
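The normalization matters because similarity is computed as a dot product: only on L2-normalized vectors does the dot product equal cosine similarity. A small sketch (toy vectors, not real model outputs):

```python
import math

def normalize(v):
    # L2-normalize a vector -- the effect of normalize_embeddings=True.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a, b = [3.0, 4.0], [6.0, 8.0]       # parallel vectors, cosine = 1.0
print(dot(a, b))                     # 50.0 -- raw dot product is unbounded
print(dot(normalize(a), normalize(b)))  # 1.0 -- cosine similarity
```

Without normalization, scores depend on embedding magnitudes and are not comparable across documents.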