Llama Qrlhf

Implementation of the Llama architecture with RLHF + Q-learning

Project README

Llama - QRLHF (wip)

Implementation of the Llama (or any language model) architecture with RLHF + Q-learning.

This is experimental, independent open research, built on nothing but speculation. But I'll throw some of my brain cycles at the problem in the coming month, just in case the rumors have any basis. Anything you PhD students can get working is up for grabs.

Will start off by adapting the autoregressive discrete Q-learning formulation from the Q-Transformer paper cited below and running a few experiments on arithmetic, using a symbolic solver as the reward generator.
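A minimal sketch of that plan, under stated assumptions: each generated token is treated as one discrete action (mirroring Q-Transformer's per-dimension autoregressive Q-values), the only reward arrives at the end of the completion, and an exact rational evaluator built on Python's `ast` and `fractions` modules stands in for the symbolic solver. The names `exact_eval`, `arithmetic_reward` and `per_token_q_loss` are hypothetical and do not come from the repository. The conservative term only roughly follows the spirit of Q-Transformer's regularizer (pushing Q-values of untaken actions toward 0, the minimum reward); a real setup would compute targets with a target network and work on batched tensors.

```python
import ast
import operator
from fractions import Fraction

# Hypothetical stand-in for a symbolic solver (not from the llama-qrlhf repo):
# exact rational evaluation of +, -, *, / over integer literals.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def exact_eval(expr: str) -> Fraction:
    """Evaluate an arithmetic expression exactly, rejecting anything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr, mode="eval"))


def arithmetic_reward(problem: str, answer: str) -> float:
    """Terminal reward: 1.0 if the answer (e.g. '7/2' or '3.5') is exactly right."""
    try:
        return 1.0 if Fraction(answer) == exact_eval(problem) else 0.0
    except (ValueError, SyntaxError, ZeroDivisionError):
        # Unparseable answers or ill-posed problems earn no reward.
        return 0.0


def per_token_q_loss(q_values, actions, reward, gamma=1.0, alpha=1.0):
    """Hypothetical per-token Q-learning loss for one sampled completion.

    q_values[t][a]: predicted Q-value of emitting token a at step t, given the prefix.
    actions[t]:     token actually emitted at step t.
    reward:         terminal reward for the whole completion.
    """
    T = len(actions)
    # Bellman targets: bootstrap from the best next token; the last token gets the reward.
    targets = [gamma * max(q_values[t + 1]) for t in range(T - 1)] + [reward]
    td_loss = 0.5 * sum((q_values[t][actions[t]] - targets[t]) ** 2 for t in range(T)) / T
    # Conservative term: push Q-values of tokens that were not taken toward 0.
    others = [q for t in range(T) for a, q in enumerate(q_values[t]) if a != actions[t]]
    cons_loss = 0.5 * alpha * sum(q * q for q in others) / max(len(others), 1)
    return targets, td_loss + cons_loss


# Usage: 3 generated tokens over a toy vocabulary of 2.
q = [[0.1, 0.5], [0.2, 0.9], [0.3, 0.4]]
r = arithmetic_reward("(3 + 4) / 2", "7/2")  # 1.0
targets, loss = per_token_q_loss(q, actions=[1, 1, 0], reward=r)
# targets == [0.9, 0.4, 1.0]; loss is about 0.185
```

Only the final token sees the solver's reward; earlier tokens learn it indirectly by bootstrapping from the best next-token Q-value, which is what makes the formulation autoregressive.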

Yannic Kilcher's educational Q-learning video

Citations

@inproceedings{qtransformer,
    title     = {Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions},
    author    = {Yevgen Chebotar and Quan Vuong and Alex Irpan and Karol Hausman and Fei Xia and Yao Lu and Aviral Kumar and Tianhe Yu and Alexander Herzog and Karl Pertsch and Keerthana Gopalakrishnan and Julian Ibarz and Ofir Nachum and Sumedh Sontakke and Grecia Salazar and Huong T Tran and Jodilyn Peralta and Clayton Tan and Deeksha Manjunath and Jaspiar Singht and Brianna Zitkovich and Tomas Jackson and Kanishka Rao and Chelsea Finn and Sergey Levine},
    booktitle = {7th Annual Conference on Robot Learning},
    year      = {2023}
}

@inproceedings{Wang2015DuelingNA,
    title     = {Dueling Network Architectures for Deep Reinforcement Learning},
    author    = {Ziyu Wang and Tom Schaul and Matteo Hessel and Hado van Hasselt and Marc Lanctot and Nando de Freitas},
    booktitle = {International Conference on Machine Learning},
    year      = {2015},
    url       = {https://api.semanticscholar.org/CorpusID:5389801}
}

Repository (README source): lucidrains/llama-qrlhf
License: MIT
Stars: 148
Last commit: 4 months ago
