Smol Gpt

Smol but mighty language model


Smol GPT

Multimodal instruction-following model for text generation that runs on your CPU. Less than 14 GB of RAM required.



Models hallucinate; this is meant for fun. We do not recommend using the model in production unless you hallucinate for a living or otherwise know what you are doing.

The implementation may contain bugs, and the int4 quantization performed is not optimal. This might lead to worse performance than the original model.
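
To make the quantization caveat concrete, here is a minimal sketch of symmetric round-to-nearest int4 quantization. This is an assumption for illustration, not the scheme this project uses: the function names `quantize_int4` and `dequantize_int4` are hypothetical, and real implementations usually quantize per block or per channel rather than per list. It shows where the accuracy loss comes from: every weight is snapped to one of 16 levels, so the restored value can differ from the original by up to half a quantization step.

```python
def quantize_int4(weights):
    """Map floats to integers in the int4 range [-8, 7] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # One step of the integer grid corresponds to `scale` in float space.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int4(quantized, scale):
    """Recover approximate floats from the int4 codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for the demonstration.
weights = [0.02, -0.52, 0.31, 1.4, -0.07]
quantized, scale = quantize_int4(weights)
restored = dequantize_int4(quantized, scale)

# The worst-case rounding error is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Small weights such as `0.02` collapse to zero under this scheme, which is one reason a naive int4 conversion can perform worse than the original model.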


  1. git clone
  2. pip install -r requirements.txt
  3. cd cpp && make
  4. cd ..
  5. python3 (May take a few minutes to download and load the model)
  6. Open in your browser.

Contributing (Possible future directions)

Contributions are welcome. Please open an issue or a PR. New features will be community driven. The following features can be easily added for the model:


  • Chat/Conversation mode is supported by the model, but not the app.
  • Increase Input/Output length.
  • GPTQ quantization.
  • Interesting Prompts.
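
As a starting point for the chat mode item above, the sketch below flattens recent conversation turns into a single prompt string. Everything here is an assumption: the helper name `build_chat_prompt`, the `User:`/`Assistant:` labels, and the `max_turns` cutoff are hypothetical, and the prompt format the model actually expects is not documented in this README.

```python
def build_chat_prompt(history, user_message, max_turns=4):
    """Join the most recent (user, model) turns and the new message into one prompt."""
    # Keep only the latest turns so the prompt stays within the input length limit.
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = []
    for user_text, model_text in recent:
        lines.append(f"User: {user_text}")
        lines.append(f"Assistant: {model_text}")
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")  # the model continues from here
    return "\n".join(lines)


# Hypothetical conversation, for illustration only.
history = [
    ("What is in the image?", "A cat on a sofa."),
    ("What color is it?", "Orange."),
]
prompt = build_chat_prompt(history, "Is it asleep?", max_turns=1)
print(prompt)
```

Dropping older turns is the simplest way to respect a fixed input length; raising the input limit would let `max_turns` grow.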

Performance (Speed and Memory)

  • Reduce RAM usage by 4x (down to ~4 GB)
    • Current Flask implementation loads the Bert & CLIP models twice for some reason.
    • Offload T5 encoder after getting the hidden representations.
    • Shift Vision and Bert model to int4/int8 and offload after using.
  • Speed up 4x:
  • MMap Speed up.
  • Support Smoller GPT for running multimodal models in 4 GB of RAM.
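
Two of the memory items above (models being loaded twice, and offloading an encoder once its hidden representations are computed) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the app's code: `FakeEncoder`, `get_encoder`, and `encode_then_offload` are hypothetical names, and the stand-in class only mimics a large model.

```python
import functools
import gc

load_count = {"encoder": 0}  # counts how often the expensive load actually runs


class FakeEncoder:
    """Stand-in for a large model; the list mimics memory-heavy weights."""

    def __init__(self):
        self.weights = [0.0] * 100_000

    def encode(self, text):
        # Toy "hidden representation": one number per whitespace-separated token.
        return [float(len(token)) for token in text.split()]


@functools.lru_cache(maxsize=1)
def get_encoder():
    """Load the encoder once; later calls reuse the cached instance."""
    load_count["encoder"] += 1
    return FakeEncoder()


def encode_then_offload(text):
    """Compute hidden states, then drop the cached encoder so it can be freed."""
    hidden = get_encoder().encode(text)
    get_encoder.cache_clear()  # remove the cache's reference to the encoder
    gc.collect()  # memory is reclaimed only if no other references remain
    return hidden


# Two requests for the encoder trigger only one load.
first = get_encoder()
second = get_encoder()
same_instance = first is second
del first, second  # release our own references before offloading

hidden = encode_then_offload("smol but mighty")
print(same_instance, load_count["encoder"], hidden)
```

In a Flask app, a cached loader like this avoids a second copy of the weights when two code paths both ask for the same model, at the cost of a reload if the encoder is needed again after offloading.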

Unknowns about the model

  • Performance on multiple collated images.
  • Couple with OCR to reason about text from images.





The models used are CLIP and Bert, following BLIP-2, and Flan-T5 for instruction following.

Open Source Agenda is not affiliated with "Smol Gpt" Project. README Source: NolanoOrg/smol-gpt
Last commit: 5 months ago