Large Language Models for All, 🦙 Cult and More, Stay in touch!
PushShift.io Reddit) using metaseq; 1/7th the carbon footprint of GPT-3; combines Meta's open-source Fully Sharded Data Parallel (FSDP) API with NVIDIA's tensor-parallel abstraction within Megatron-LM; training data is predominantly English, with a small amount of non-English text via CommonCrawl; released under a noncommercial license. $\color{red}{\textsf{Refactoring...}}$
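As a rough illustration of the tensor-parallel idea behind Megatron-LM's abstraction: a linear layer's weight matrix is split column-wise across devices, each rank computes its own shard, and the partial outputs are gathered back together. This is a minimal single-process CPU sketch; the shapes, rank count, and helper names are illustrative and not taken from the OPT setup.

```python
import numpy as np

def linear(x, w):
    # Reference (unsharded) linear layer: y = x @ w.
    return x @ w

def tensor_parallel_linear(x, w, n_ranks):
    # Split the weight columns across ranks; each rank holds one shard.
    shards = np.split(w, n_ranks, axis=1)
    # Each rank computes its partial output independently...
    partials = [linear(x, shard) for shard in shards]
    # ...and an all-gather concatenates the partials into the full output.
    return np.concatenate(partials, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w = rng.standard_normal((8, 16))
# The sharded computation matches the unsharded layer exactly.
assert np.allclose(linear(x, w), tensor_parallel_linear(x, w, 4))
```

In real training, FSDP additionally shards parameters, gradients, and optimizer state across data-parallel ranks, so the two forms of parallelism compose.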
- Data & Model Parallel: $\color{red}{\textsf{Refactoring...}}$ raw version here: https://github.com/shm007g/LLaMA-Cult-and-More/issues/4
- Param Efficient: $\color{red}{\textsf{Planning}}$
- Other: $\color{red}{\textsf{Planning}}$