Tune Gemini models overview

Model tuning is a crucial process in adapting Gemini to perform specific tasks with greater precision and accuracy. Model tuning works by providing the model with a training dataset that contains many examples of a specific downstream task.

This page provides an overview of model tuning for Gemini, describes the tuning options available for Gemini, and helps you determine when each tuning option should be used.

Benefits of model tuning

Model tuning is an effective way to customize large models to your tasks. It's a key step to improve the model's quality and efficiency. Model tuning provides the following benefits:

  • Higher quality for your specific tasks.
  • Increased model robustness.
  • Lower inference latency and cost due to shorter prompts.

Tuning compared to prompt design

Tuning provides the following benefits over prompt design:

  • Allows deep customization of the model, which results in better performance on specific tasks.
  • Offers more consistent and reliable results.
  • Lets the model learn from far more examples than can fit in a single prompt.

Tuning approaches

Parameter-efficient tuning and full fine-tuning are two approaches to customizing large models. Each approach has trade-offs in terms of model quality and resource efficiency.

Parameter-efficient tuning

Parameter-efficient tuning, also called adapter tuning, enables efficient adaptation of large models to your specific tasks or domain. Parameter-efficient tuning updates a relatively small subset of the model's parameters during the tuning process.

For details about how Vertex AI supports adapter tuning and serving, see the whitepaper Adaptation of Large Foundation Models.
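
To make this concrete, the following sketch shows a LoRA-style low-rank adapter, one common form of parameter-efficient tuning: the base weights stay frozen and only two small matrices are trained. The dimensions and initialization here are illustrative assumptions, not a description of how Vertex AI implements adapter tuning.

```python
import numpy as np

# Frozen base weights plus a low-rank adapter (LoRA-style). Only A and B are
# trained; W stays untouched. Dimensions and rank are arbitrary assumptions.
d_in, d_out, rank = 1024, 1024, 8

W = np.random.randn(d_in, d_out) * 0.01   # frozen base weights (not updated)
A = np.random.randn(d_in, rank) * 0.01    # trainable adapter matrix
B = np.zeros((rank, d_out))               # starts at zero, so tuning begins from the base model's behavior

def adapted_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass through the base layer plus the low-rank adapter update."""
    return x @ W + (x @ A) @ B

# Trainable parameters are a small fraction of the full layer.
full_params = W.size
adapter_params = A.size + B.size
print(f"Trainable adapter parameters: {adapter_params} "
      f"({adapter_params / full_params:.2%} of the full layer)")
```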

Full fine-tuning

Full fine-tuning updates all parameters of the model, making it suitable for adapting the model to highly complex tasks, with the potential of achieving higher quality. However, full fine-tuning demands more computational resources for both tuning and serving, leading to higher overall costs.

Parameter-efficient tuning compared to full fine-tuning

Parameter-efficient tuning is more resource efficient and cost effective than full fine-tuning. It uses significantly fewer computational resources to train, and it can adapt the model faster with a smaller dataset. The flexibility of parameter-efficient tuning offers a solution for multi-task learning without the need for extensive retraining.

Supported tuning methods

Vertex AI supports the following methods to tune foundation models.

Supervised fine-tuning

Supervised fine-tuning improves the performance of the model by teaching it a new skill. Data that contains hundreds of labeled examples is used to teach the model to mimic a desired behavior or task. Each labeled example demonstrates what you want the model to output during inference.
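
For example, a minimal dataset-preparation sketch for a sentiment-classification task might look like the following. It writes chat-style records in JSON Lines format, with the input as the user turn and the desired output as the model turn; the field names and prompt wording here are illustrative, so verify the exact dataset schema for your model version.

```python
import json

# Hypothetical labeled examples: each record pairs an input with the exact
# output the tuned model should produce during inference.
examples = [
    ("The battery lasts all day and the screen is gorgeous.", "positive"),
    ("It stopped working after a week and support never replied.", "negative"),
]

with open("train.jsonl", "w") as f:
    for review, label in examples:
        record = {
            "contents": [
                {"role": "user", "parts": [{"text": f"Classify the sentiment: {review}"}]},
                {"role": "model", "parts": [{"text": label}]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```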

When you run a supervised fine-tuning job, the model learns additional parameters that help it encode the necessary information to perform the desired task or learn the desired behavior. These parameters are used during inference. The output of the tuning job is a new model that combines the newly learned parameters with the original model.
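
As a rough sketch, assuming the Vertex AI SDK for Python, a supervised tuning job can be started like the following. The project, bucket, model version, and display name are placeholders, and argument names can vary between SDK versions.

```python
import time

import vertexai
from vertexai.tuning import sft

# Placeholder project, region, bucket, and model version; replace with your own.
vertexai.init(project="my-project", location="us-central1")

# Start a supervised tuning job. The resulting tuned model combines the newly
# learned parameters with the original foundation model.
tuning_job = sft.train(
    source_model="gemini-1.0-pro-002",                    # base model to tune
    train_dataset="gs://my-bucket/train.jsonl",           # labeled examples
    validation_dataset="gs://my-bucket/validation.jsonl",
    tuned_model_display_name="my-tuned-model",
)

# Wait for the job to finish, then call the tuned model through its endpoint.
while not tuning_job.has_ended:
    time.sleep(60)
    tuning_job.refresh()
print(tuning_job.tuned_model_endpoint_name)
```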

Supervised fine-tuning of a text model is a good option when the output of your model isn't complex and is relatively easy to define. Supervised fine-tuning is recommended for classification, sentiment analysis, entity extraction, summarization of content that's not complex, and writing domain-specific queries. For code models, supervised tuning is the only option.

Models that support supervised fine-tuning

Both Gemini and PaLM models support supervised fine-tuning. For more information on using supervised fine-tuning with each respective model, see the following pages.

Reinforcement learning from human feedback (RLHF) tuning

Reinforcement learning from human feedback (RLHF) uses preferences specified by humans to optimize a language model. By using human feedback to tune your models, you can make the models better align with human preferences and reduce undesired outcomes in scenarios where people have complex intuitions about a task. For example, RLHF can help with an ambiguous task, such as how to write a poem about the ocean, by offering a human two poems about the ocean and letting that person choose their preferred one.
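
For illustration, a single human-preference record might be assembled as follows. The field names are illustrative assumptions rather than the exact Vertex AI schema; see the RLHF tuning documentation for the required dataset format.

```python
import json

# Illustrative human-preference record: a prompt, two candidate completions,
# and which one the human rater preferred.
preference_example = {
    "input_text": "Write a short poem about the ocean.",
    "candidate_0": "Waves fold over the shore, patient as the moon...",
    "candidate_1": "The ocean is big and blue and has fish in it.",
    "choice": 0,  # the rater preferred candidate_0
}

with open("human_preferences.jsonl", "a") as f:
    f.write(json.dumps(preference_example) + "\n")
```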

RLHF tuning is a good option when the output of your model is complex and isn't easily achieved with supervised tuning. RLHF tuning is recommended for question answering, summarization of complex content, and content creation, such as a rewrite. RLHF tuning isn't supported by code models.

Models that support RLHF tuning

PaLM models support RLHF tuning. For more information, see Tune PaLM text models by using RLHF tuning.

Model distillation

Model distillation is a good option if you have a large model that you want to make smaller without degrading its ability to do what you want. The process of distilling a model creates a new, smaller trained model that costs less to use and has lower latency than the original model.
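
Conceptually, distillation trains a small student model to imitate a large teacher model. The following sketch only illustrates that idea with hypothetical placeholder functions; it is not the Vertex AI distillation pipeline API.

```python
# Conceptual sketch of distillation: a large "teacher" model labels unlabeled
# prompts, and the resulting (prompt, teacher output) pairs become the training
# set for a smaller "student" model.
def teacher_predict(prompt: str) -> str:
    """Stand-in for calling the large teacher model."""
    return f"teacher answer for: {prompt}"

unlabeled_prompts = [
    "Summarize the warranty policy in one sentence.",
    "Explain what model distillation is to a new engineer.",
]

# Build the student's training data from the teacher's outputs.
student_training_data = [
    {"input_text": p, "output_text": teacher_predict(p)} for p in unlabeled_prompts
]

# A student fine-tuned on this data approximates the teacher's behavior at a
# fraction of the serving cost and latency.
for example in student_training_data:
    print(example)
```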

Models that support model distillation

PaLM models support model distillation. For more information, see Create distilled text models for PaLM.

What's next