Effort

An implementation of bucketMul LLM inference

An example implementation of the bucketMul algorithm. You can read about it on the project page.

With it you can smoothly adjust, in real time, the number of calculations performed during LLM inference.

At 50% effort, it performs as fast as regular matrix multiplications on Apple Silicon chips; at 25% effort, it is twice as fast while still retaining most of the quality.
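
To make the idea of "effort" concrete, here is a minimal Python sketch. It is not the project's Swift/Metal code and it is not the bucketMul algorithm itself. It assumes that effort means the fraction of individual weight-times-input multiplications that are kept, and that the ones kept are those with the largest magnitude; the name `approx_matvec` and this selection rule are hypothetical. This naive version still computes every product in order to rank them, so it only shows the approximation, not the speedup.

```python
def approx_matvec(weights, vector, effort):
    """Approximate weights @ vector, keeping only the largest products.

    weights: list of rows (out_dim x in_dim); vector: list of in_dim floats;
    effort: fraction in [0, 1] of multiplications to keep. This meaning of
    "effort" is an assumption made for illustration only.
    """
    if not 0.0 <= effort <= 1.0:
        raise ValueError("effort must be between 0 and 1")
    # Every (output index, product) pair of the full multiplication.
    products = [
        (row_idx, w * x)
        for row_idx, row in enumerate(weights)
        for w, x in zip(row, vector)
    ]
    # Rank by magnitude and keep the top `effort` fraction.
    products.sort(key=lambda p: abs(p[1]), reverse=True)
    keep = round(effort * len(products))
    out = [0.0] * len(weights)
    for row_idx, value in products[:keep]:
        out[row_idx] += value
    return out


weights = [[4.0, 0.1], [0.2, 3.0]]
vector = [1.0, 2.0]
full = approx_matvec(weights, vector, 1.0)  # all four products
half = approx_matvec(weights, vector, 0.5)  # only the two largest products
```

In this toy case the two largest products carry almost all of the result, so the half-effort output stays close to the full one. Real weight matrices may behave differently.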

You also have the option to skip loading the least important weights.
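
As a rough illustration of skipping weights, the sketch below drops the smallest-magnitude entries of a weight row and stores the rest as `(index, weight)` pairs. It assumes that "least important" means smallest absolute value; the helpers `prune_row` and `sparse_dot` are hypothetical and do not reflect how the engine actually stores or loads its weights.

```python
def prune_row(row, keep_fraction):
    """Return (index, weight) pairs for the largest-magnitude weights in a row."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    keep = round(keep_fraction * len(row))
    # Indices ordered from largest to smallest absolute weight.
    ranked = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)
    # Restore index order so the sparse row is easy to scan.
    return [(i, row[i]) for i in sorted(ranked[:keep])]


def sparse_dot(sparse_row, vector):
    """Dot product using only the weights that were kept."""
    return sum(w * vector[i] for i, w in sparse_row)


row = [0.05, -2.0, 0.3, 1.5]
kept = prune_row(row, 0.5)                        # keeps indices 1 and 3
approx = sparse_dot(kept, [1.0, 1.0, 1.0, 1.0])
```

Weights that are never kept do not need to be held in memory at all, which is the point of the option described above.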

Getting Started

Binaries

You can quickly get started by downloading the precompiled binaries available at: Effort Engine v0.0.1

To bypass macOS Gatekeeper, hold the Option key while clicking to open the downloaded application for the first time.

Initial Setup

On the first run, you will be prompted to download the converted weights necessary for operation. Subsequently, a matrix multiplication benchmark will execute to demonstrate the capabilities of the engine.

Source Code

The sources are in Swift & Metal.

Download and open effort.xcodeproj. It should work straight away.

Additional Resources

More Information: Visit our project page.
See it in Action: Watch a demo on Asciinema.

Updates

A ton of things to fix; looking for collaborators! :)

Open Source Agenda is not affiliated with "Effort" Project. README Source: kolinko/effort

Repository: kolinko/effort
License: MIT
Homepage: https://kolinko.github.io/effort/
