From the course: LLaMa for Developers


Quantizing LLaMA

- [Instructor] In the previous video, we talked a little bit about fitting LLaMA onto your hardware. In this video, we're going to dive deeper into quantizing LLaMA. So to start off, why is quantization important? There are four main reasons. The first is that it allows you to run more powerful models by reducing their memory footprint, the second is that it allows you to train more powerful models, the third is that it reduces energy consumption, and the fourth is that it advances computer science. Now, recapping quantization. From the previous video, we saw this chart. Depending on the precision at which we store our model, we require a different amount of memory. Full-precision LLaMA would require 28 gigabytes of memory, while 4-bit quantized LLaMA only requires 3.5. Now, quantization is fairly new. We only achieved reliable 8-bit quantization for large language models in 2022. So with that said, let's review some of the important blog posts and papers about quantizing large language models. So…
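The memory figures above follow directly from the parameter count and the bytes per weight at each precision. Below is a minimal sketch, not from the course, that reproduces that arithmetic for a 7-billion-parameter LLaMA; the bytes-per-parameter values are standard, but treat the results as approximations, since real memory use also includes activations, the KV cache, and framework overhead.

```python
# Approximate the memory needed just to hold LLaMA's weights
# at different precisions (weights only, no activations or KV cache).

BYTES_PER_PARAM = {
    "fp32": 4.0,   # full precision: 4 bytes per weight
    "fp16": 2.0,   # half precision
    "int8": 1.0,   # 8-bit quantization
    "int4": 0.5,   # 4-bit quantization
}

def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate gigabytes required to store the model weights."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

if __name__ == "__main__":
    params_7b = 7e9  # 7-billion-parameter LLaMA
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(params_7b, precision)
        print(f"LLaMA 7B @ {precision}: ~{gb:.1f} GB")
    # fp32 -> ~28 GB and int4 -> ~3.5 GB, matching the chart in the video
```

And since reliable 8-bit quantization is what arrived in 2022, here is a hedged sketch of one common way to load LLaMA weights in 8-bit today, using Hugging Face transformers with bitsandbytes. This is an illustration, not the course's method, and the model id is an assumption; substitute whichever LLaMA checkpoint you have access to.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Request 8-bit weight quantization via bitsandbytes.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",      # assumed model id, for illustration only
    quantization_config=quant_config,
    device_map="auto",               # let accelerate place layers on devices
)
```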
