LLaMA Now Goes Faster on CPUs

justine.lol

cross-posted to:
[email protected]
[email protected]

agilob@programming.dev (M) to Performance@programming.dev · English · 7 months ago

I wrote 84 new matmul kernels to improve llamafile CPU performance.

My kernels go 2x faster than MKL for matrices that fit in L2 cache, which makes them a work in progress, since the speedup works best for prompts having fewer than 1,000 tokens.
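To make the "fit in L2 cache" condition concrete, here is a rough back-of-the-envelope sketch, not taken from the linked article. It assumes a plain product C = A @ B in which one dimension (m) grows with the number of prompt tokens, and it treats the whole working set as the three matrices together. The function names, the 1 MiB cache size, and the 4-byte element width are all illustrative assumptions; real L2 sizes and data types vary by CPU and model.

```python
# Hypothetical helper names; this only estimates memory footprint,
# it does not model how any real kernel tiles or streams its data.

ASSUMED_L2_BYTES = 1024 * 1024  # assumption: 1 MiB of L2 cache per core


def matmul_working_set_bytes(m: int, n: int, k: int,
                             bytes_per_element: int = 4) -> int:
    """Bytes occupied by A (m x k), B (k x n) and C (m x n) together."""
    return (m * k + k * n + m * n) * bytes_per_element


def fits_in_cache(m: int, n: int, k: int,
                  cache_bytes: int = ASSUMED_L2_BYTES,
                  bytes_per_element: int = 4) -> bool:
    """True if the full working set is no larger than the assumed cache."""
    return matmul_working_set_bytes(m, n, k, bytes_per_element) <= cache_bytes


if __name__ == "__main__":
    # m stands in for the prompt length; n and k are made-up dimensions.
    for m in (64, 256, 512):
        size = matmul_working_set_bytes(m, 256, 256)
        print(m, size, fits_in_cache(m, 256, 256))
```

Under these assumptions the footprint grows linearly with m, which is one plausible reading of why a speedup tied to cache residency would favor shorter prompts.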
