27B int4 version
Hello!
Are there plans to release an int4 version of this model (i.e. `google/gemma-3-27b-it-qat-int4-unquantized`)?
I'm looking to fine-tune this model further using 4-bit quantization and QLoRA.
Thanks in advance!
I'm also looking forward to the release of the `google/gemma-3-27b-it-qat-int4-unquantized` model.
Additionally, I'd like to confirm my understanding regarding the difference between the `int4` and `q4_0` quantization formats.
As a beginner, after doing some searching, my understanding is that `q4_0` is a format primarily associated with llama.cpp, while `int4` can be implemented using libraries like bitsandbytes within the Hugging Face ecosystem.
Therefore, if I want to stay within the Hugging Face ecosystem, using the `int4` version is the correct approach, right? Could anyone please confirm if this is accurate?
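To make the distinction concrete, here is how I currently picture the two paths. This is only an illustrative sketch: the GGUF file name and the model id in it are placeholders I made up, not real artifacts.

```python
# llama.cpp ecosystem: q4_0 lives inside a GGUF file, loaded e.g. with llama-cpp-python:
#   from llama_cpp import Llama
#   llm = Llama(model_path="gemma-3-27b-it-q4_0.gguf")  # placeholder file name

# Hugging Face ecosystem: 4-bit quantization is applied at load time via bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-27b-it",  # placeholder id, not the QAT checkpoint asked about here
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)
```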
Following up on the `int4` discussion, I have a question about the specific quantization method used for this QAT (Quantization-Aware Training) model. I typically use a `BitsAndBytesConfig` like the one below for 4-bit loading:
```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```
I understand that `nf4` was introduced by QLoRA and may not be the same as the original int4 quantization methods. The Gemma 3 technical report only mentions "per-channel int4, per-block int4, and switched fp8" in connection with its QAT process.
Given this, I'm unsure whether simply using a standard `bnb_config` with `bnb_4bit_quant_type="nf4"` is the correct way to load this specific QAT model and leverage the exact quantization it was trained with.
Could anyone provide guidance on how to load or work with this model so that it matches the 'per-channel int4' or 'per-block int4' methods mentioned in the report? Any insights would be greatly appreciated!
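For context, here is the full recipe I would try once the checkpoint is available: a minimal QLoRA sketch using the config above, assuming the model loads as a standard causal LM. The model id is the one requested in this thread, and the LoRA target module names are my assumption rather than verified Gemma 3 module names.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "google/gemma-3-27b-it-qat-int4-unquantized"  # the id this thread is asking about

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention proj names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```

Whether this nf4 path actually preserves the benefit of the per-channel/per-block int4 QAT is exactly the open question above.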
Update.
I've found that per-channel means row-wise quantization, and that per-block int4 (block size 32) is what Q4_0 does (reference: https://huggingface.co/docs/hub/en/gguf#quantization-types).
And according to this issue, https://github.com/bitsandbytes-foundation/bitsandbytes/issues/1329, the current bitsandbytes cannot handle Q4_0, is that right?
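To check my understanding of the terminology, here is a small numpy toy that applies symmetric absmax int4 at both granularities. It is illustrative only and does not reproduce the actual Q4_0 encoding or bit layout.

```python
import numpy as np

def quantize_int4_absmax(x, block):
    """Symmetric absmax int4 over blocks of `block` values (levels -8..7),
    returned in dequantized form. Toy code, not the real GGUF/Q4_0 encoding."""
    blocks = x.reshape(-1, block)
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scale = np.maximum(scale, 1e-12)  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scale), -8, 7)
    return (q * scale).reshape(x.shape)

w = np.random.randn(4, 64).astype(np.float32)

# "per-channel": one scale per output row (here, the block spans the whole row)
per_channel = quantize_int4_absmax(w, block=64)

# "per-block": one scale per 32 consecutive weights, the granularity Q4_0 uses
per_block = quantize_int4_absmax(w, block=32)

print("per-channel mean abs error:", np.abs(w - per_channel).mean())
print("per-block   mean abs error:", np.abs(w - per_block).mean())
```

On random weights, the per-block variant should typically show the smaller reconstruction error, since each scale only has to cover 32 values instead of a whole row.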