From the course: LLaMa for Developers
Explaining LoRA and SLoRA - Llama Tutorial
- [Instructor] In our chapter about training LoRA models, we saw that we could create multiple adapters. In this video, we'll talk about how to serve many LoRA models using the Hugging Face framework, as well as another technique called S-LoRA. I'm here in the Hugging Face documentation, where we can see how to add a LoRA adapter to a model. Scrolling down, we can also add a second adapter, which is pretty interesting: we can cycle between the two adapters to get different responses. Scrolling further, we see that we can enable and disable these adapters on the fly, so in theory we can serve many models at once by keeping one base model and swapping adapters on top of it, and that's pretty cool. Now let's head over to a framework called S-LoRA. I'm on their GitHub page. The interesting idea behind S-LoRA is that it can serve many more LoRA adapters at once, which is pretty powerful. In this diagram, we can…
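The idea described above can be sketched numerically. The toy code below is a minimal illustration (not the Hugging Face or S-LoRA implementation): a frozen base weight `W` is shared, each LoRA adapter is a low-rank pair `(A, B)` with a scaling factor, and the output is `y = xW + s * xAB`. Switching the active adapter, or disabling it, never touches `W`. The `batched_forward` helper sketches the S-LoRA-style trick of serving a different adapter per request within one batch; all names and dimensions here are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared frozen base weight, standing in for one linear layer of the base model.
d_in, d_out, rank = 8, 6, 2
W = rng.normal(size=(d_in, d_out))

# Two LoRA adapters: each is a low-rank pair (A, B) plus a scaling factor.
# In a real setup these would be trained; here they are random placeholders.
adapters = {
    "adapter_a": (rng.normal(size=(d_in, rank)), rng.normal(size=(rank, d_out)), 0.5),
    "adapter_b": (rng.normal(size=(d_in, rank)), rng.normal(size=(rank, d_out)), 0.5),
}

def forward(x, active=None):
    """Base forward pass; optionally add the active adapter's low-rank delta."""
    y = x @ W                            # shared base computation
    if active is not None:
        A, B, scale = adapters[active]
        y = y + scale * (x @ A @ B)      # LoRA: y = xW + s * xAB
    return y

def batched_forward(X, adapter_names):
    """S-LoRA-style sketch: one batch, a different adapter per request."""
    Y = X @ W                            # base pass computed once for the batch
    for i, name in enumerate(adapter_names):
        if name is not None:
            A, B, scale = adapters[name]
            Y[i] += scale * (X[i] @ A @ B)   # per-request low-rank delta
    return Y

x = rng.normal(size=(1, d_in))
y_base = forward(x)              # adapters disabled: just the base model
y_a = forward(x, "adapter_a")    # cycle to adapter A
y_b = forward(x, "adapter_b")    # cycle to adapter B
```

Because the adapter deltas are tiny compared to `W`, many of them can sit in memory alongside one copy of the base model, which is what makes serving many adapters at once feasible.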
Contents
- Resources required to serve LLaMA (4m 35s)
- Quantizing LLaMA (4m 7s)
- Using TGI for serving LLaMA (2m 40s)
- Using VLLM for serving LLaMA (5m 27s)
- Using DeepSpeed for serving LLaMA (4m 13s)
- Explaining LoRA and SLoRA (1m 59s)
- Using a vendor for serving LLaMA (3m 16s)