Starting from a base LLM, there are multiple ways to adapt and tune it for the task you want it to perform. They include the following:
Prompt Engineering
In-context learning (ICL) using instruction prompts
When providing a prompt to the LLM, tell the model what you want it to do, and/or provide some examples of the task.
- Zero-shot inference: Provide an instruction but no examples. E.g., prompt the model with “Summarize the following dialogue:”, followed by the dialogue and the cue “Summary:”
- One/few-shot inference: Include one or a few complete prompt-completion examples of the task before the actual input (see the sketch below).
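A minimal sketch of both styles using Hugging Face transformers. The model name and dialogue strings are illustrative assumptions, not from the course; any instruction-tuned LLM would work:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"  # assumed example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

dialogue = "A: Did you finish the report?\nB: Almost, I'll send it tonight."

# Zero-shot: instruction only, no examples.
zero_shot = f"Summarize the following dialogue:\n{dialogue}\nSummary:"

# One-shot: prepend one worked example before the actual task.
demo = ("Summarize the following dialogue:\n"
        "A: Lunch at noon?\nB: Sure, see you then.\n"
        "Summary: They agree to meet for lunch at noon.\n\n")
one_shot = demo + zero_shot

for prompt in (zero_shot, one_shot):
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that nothing is trained here: the few-shot examples only condition the model at inference time, which is why they must fit inside the context window.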
Limitations:
- The context window may not be big enough to accommodate all the examples.
- May not work well for small LLMs.
Fine-tuning
Fine-tuning involves providing a pre-trained LLM with prompt-completion examples and continuing to train it so that it specializes in the task those examples demonstrate. Depending on the resources available, there are multiple ways this can be achieved.
Full fine-tuning
- Updates all the weights of the model.
- Memory-intensive.
- If the model is tuned on a single task, it may be prone to catastrophic forgetting, i.e., losing performance on tasks it could previously handle. To counter this, the model can be fine-tuned on a variety of tasks (e.g., summarization, Q&A, logical reasoning, etc.)
- Example models: FLAN-T5 (an instruction-fine-tuned version of the T5 model)
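A minimal full fine-tuning sketch with Hugging Face transformers and datasets. The toy prompt-completion pair and the training settings are assumptions for illustration:

```python
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Trainer, TrainingArguments)

model_name = "google/flan-t5-base"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy prompt-completion pairs; a real run would use a proper task dataset.
data = Dataset.from_dict({
    "prompt": ["Summarize the following dialogue:\nA: Lunch at noon?\nB: Sure."],
    "completion": ["They agree to meet for lunch at noon."],
})

def tokenize(batch):
    # Prompts become the encoder inputs; completions become the labels.
    model_inputs = tokenizer(batch["prompt"], truncation=True, max_length=512)
    labels = tokenizer(batch["completion"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = data.map(tokenize, batched=True, remove_columns=data.column_names)

trainer = Trainer(
    model=model,  # every weight in the model receives gradient updates
    args=TrainingArguments(output_dir="full-ft", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

Because trainer.train() updates every weight, the optimizer must keep state for the full model, which is what makes this approach memory-intensive.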
Parameter-Efficient Fine-Tuning (PEFT)
Why is this necessary? Fully fine-tuning an existing LLM updates all of its weights. Not only is this time-consuming, it also requires memory for the gradients and optimizer states of every parameter, which can add up to several times the size of the model itself. PEFT instead freezes the original weights and re-trains only a small set of new or selected components, saving time and memory with an acceptable loss in model performance. Several techniques exist to do this:
- Selective: Fine-tune only a subset of the LLM's existing parameters.
- Reparameterization: Represent the weight updates with low-rank matrices so that far fewer parameters are trained, e.g., LoRA (sketched below).
- Additive: Add trainable layers or parameters to the model, e.g., prompt tuning, which learns soft prompt embeddings prepended to the input.
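A minimal LoRA sketch using Hugging Face's peft library. The base model and the hyperparameters (r, lora_alpha, target modules) are illustrative assumptions:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")  # assumed base

# LoRA freezes the original weight matrix W and learns a low-rank update:
# instead of training W directly, train small matrices A and B and use
# W + BA, where the rank r is much smaller than W's dimensions.
config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                        # rank of the update matrices (assumed)
    lora_alpha=32,              # scaling factor for the update (assumed)
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5's query/value projections; model-specific
)

peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()  # prints trainable vs. total parameter counts
```

Only the small A and B matrices receive gradients; at inference time their product can be merged back into the frozen weights, so LoRA adds no extra serving latency.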
References:
- Generative AI with Large Language Models course on DeepLearning.ai