Replies: 2 comments
Making QAT work with other models should be pretty straightforward: you just swap out the model and tokenizer parts of the config to match the config options for those other models. We may offer a single-device recipe for QAT in the future, but for now we haven't been prioritizing it. You can always run a distributed recipe on a single device, though, by setting "--nproc_per_node 1".
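As a rough sketch (the recipe and config names below come from a recent torchtune release and may differ in yours; run "tune ls" to confirm), swapping the model/tokenizer and running on a single GPU could look like:

    # Copy the distributed QAT config locally, then edit its model and
    # tokenizer sections to point at the model you want to train
    tune cp llama3/8B_qat_full my_qat_config.yaml

    # Run the distributed QAT recipe on a single device
    tune run --nproc_per_node 1 qat_distributed --config my_qat_config.yaml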
Just to follow up on this: while we don't have a separate recipe for single-device QAT, we do now provide a recipe for QAT + LoRA. This is similar to how the quantized 1B and 3B Llama models were trained, and you should be able to train using much less memory than the previous QAT full finetune recipe. You can see an example config here.
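For reference, launching it might look like the following (the "qat_lora_finetune_distributed" recipe and "llama3_2/3B_qat_lora" config names are assumptions based on a recent torchtune release, so confirm them against your install first):

    # Confirm the exact recipe and config names available in your install
    tune ls

    # Launch QAT + LoRA on a single GPU (names shown are illustrative)
    tune run --nproc_per_node 1 qat_lora_finetune_distributed --config llama3_2/3B_qat_lora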
I'd like to experiment with QAT. I see "tune ls" shows there is a QAT recipe available for the Llama3 model, but only distributed and only for full fine-tuning. Any chance of adding recipes for Llama 3.1 or 3.2 on a single GPU?
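(For anyone checking what ships with their own install: one quick way to see the available QAT recipes and configs is to filter the listing, e.g.

    # Output varies by torchtune version
    tune ls | grep -i qat

and then pick from whatever names appear there.)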