The GeoV model was designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER) by Georges Harik and Varuna Jayasiri.
In addition to using relative positions in the attention score calculation through RoPE embeddings, RoPER adds relative positional information explicitly to the value embeddings. Specifically, it incorporates the relative positions of the tokens being attended to. RoPER has given better performance on some algorithmic tasks, and seems comparable to RoPE in language modeling.
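The idea of carrying relative positions in the values can be illustrated with a toy sketch. It assumes one common formulation: each value is rotated by its own position, the attention-weighted sum is taken as usual, and the result is rotated back by the query position, so each term ends up rotated by the offset between key and query. The function names, the single 2-D feature pair stored as a complex number, and the `theta` frequency are all hypothetical choices for illustration; this is not the GeoV implementation.

```python
import cmath

def rotate(x, pos, theta=0.1):
    """Rotate a 2-D feature pair (stored as a complex number) by pos * theta."""
    return x * cmath.exp(1j * pos * theta)

def roper_values(attn_row, values, positions, query_pos, theta=0.1):
    """Toy value path for one query (assumed formulation, see lead-in)."""
    # Rotate each value by its own absolute position.
    rotated = [rotate(v, p, theta) for v, p in zip(values, positions)]
    # Ordinary attention-weighted sum over the rotated values.
    mixed = sum(a * r for a, r in zip(attn_row, rotated))
    # Rotate back by the query position, so each term carries a rotation
    # of (p - query_pos) * theta, which depends only on the relative offset.
    return rotate(mixed, -query_pos, theta)

values = [1 + 0j, 0 + 1j, 1 + 1j]   # three toy value vectors
attn_row = [0.2, 0.3, 0.5]          # attention weights for one query

# The same sequence at two different absolute offsets.
out_a = roper_values(attn_row, values, positions=[0, 1, 2], query_pos=2)
out_b = roper_values(attn_row, values, positions=[10, 11, 12], query_pos=12)

# Only relative distances matter, so the outputs agree up to rounding.
shift_invariant = abs(out_a - out_b) < 1e-9
```

Shifting every position by the same amount leaves the output unchanged, which is the sense in which the values carry relative rather than absolute positional information in this sketch.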
We have shared a 9B-parameter pre-trained model at GeoV/GeoV-9b. The released weights were trained on ~70 billion tokens. We plan to continue training up to 300 billion tokens and update the weights every 20 billion tokens. This training run is monolingual and uses the c4en and English Wikipedia datasets. We will also train smaller and larger versions; our aim is to make models of both smaller and larger sizes broadly available.
This implementation is built on top of the transformers library.
pip install geov
These are results from EleutherAI/lm-evaluation-harness tests at different checkpoints. We will keep updating these as the training progresses.
    from geov import GeoVForCausalLM, GeoVTokenizer

    # Load the pre-trained model and its tokenizer
    model = GeoVForCausalLM.from_pretrained("GeoV/GeoV-9b")
    tokenizer = GeoVTokenizer.from_pretrained("GeoV/GeoV-9b")

    # Tokenize the prompt into input ids
    prompt = "In mathematics, topology is the study of"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    # Sample a continuation of up to 100 tokens (prompt included)
    gen_tokens = model.generate(
        input_ids,
        do_sample=True,
        temperature=0.9,
        max_length=100,
    )

    # Decode the generated token ids back into text
    gen_text = tokenizer.batch_decode(gen_tokens)