hi, it depends on your use case and the hardware you have access to. generally speaking, you can finetune a model in a couple of ways:
1. Adapter - a form of prefix-tuning that prepends a learnable adaption prompt to the model
2. Low-rank adaptation (LoRA) - this significantly reduces the number of trainable parameters and speeds up training with little impact on the final performance of the model (see the sketch after this list).
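here's a rough LoRA sketch using the PEFT library - the model id, rank, and target modules are just placeholders, adjust them for your own model and task:

```python
# minimal LoRA setup with the PEFT library (hyperparameters are illustrative only)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder, any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# LoRA injects small low-rank matrices into the attention projections,
# so only a tiny fraction of the parameters is actually trained
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                # rank of the update matrices
    lora_alpha=16,      # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # depends on the model architecture
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters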
Finetuning requires at least one GPU with ~24 GB of memory (e.g. an RTX 3090) for 7B models. QLoRA comes into play if you have limited GPU VRAM.
just keep in mind that quantizing to 8 bits reduces precision and can cost some quality.
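a QLoRA-style setup usually means loading the base model in 4-bit with bitsandbytes and then attaching LoRA adapters on top - something like the sketch below (again, the model id and values are just examples):

```python
# QLoRA-style sketch: 4-bit quantized base model + LoRA adapters
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

model = prepare_model_for_kbit_training(model)  # gradient checkpointing, layer norm casting, etc.
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16))
```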
a lot also depends on how you configure your training parameters. you can find more details here: https://huggingface.co/docs/transformers/main_classes/trainer
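as a starting point, a bare-bones Trainer setup could look like this - the hyperparameters are illustrative only, and `model` / `train_dataset` are assumed to come from the sketches above and your own data pipeline:

```python
# bare-bones Trainer setup; see the linked Trainer docs for the full list of options
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./finetuned-model",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size of 16
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,                       # mixed precision to save VRAM
    logging_steps=10,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,                   # e.g. the PEFT model from the sketches above
    args=training_args,
    train_dataset=train_dataset,   # your tokenized dataset
)
trainer.train()
```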
hope this helps, cheers!