W4A8
#1 opened by Lucena190
Hello, do you think I could get a performance improvement, without much loss of precision, by quantizing to W4A8? I'm wondering whether vLLM actually converts the activations to 8 bits and keeps them in VRAM that way. I'm using an NVIDIA L4 and would like to get the maximum performance out of it.
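For context on the term: W4A8 means the weights are stored as 4-bit integers and the activations are quantized to 8-bit integers, so the matrix multiply can accumulate in integer arithmetic and rescale to floating point afterwards. The pure-Python sketch below illustrates that idea, assuming simple symmetric per-tensor scaling. Real kernels often use finer-grained scales (for example per-group for weights and per-token for activations). The function names `quantize_symmetric` and `w4a8_dot` are hypothetical and do not describe how vLLM implements it.

```python
def quantize_symmetric(values, bits):
    """Symmetric per-tensor quantization of floats to signed integers of `bits` width."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric range [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def w4a8_dot(weights, activations):
    """Dot product with 4-bit weights and 8-bit activations, accumulated as integers."""
    qw, w_scale = quantize_symmetric(weights, bits=4)      # W4: weights
    qa, a_scale = quantize_symmetric(activations, bits=8)  # A8: activations
    acc = sum(w * a for w, a in zip(qw, qa))  # pure integer multiply-accumulate
    return acc * w_scale * a_scale  # rescale the integer result back to float


weights = [0.5, -0.2, 0.1, -0.7]
activations = [1.0, 2.0, -0.5, 0.3]
exact = sum(w * a for w, a in zip(weights, activations))
approx = w4a8_dot(weights, activations)
print(f"exact={exact:.4f} w4a8={approx:.4f}")
```

The gap between `exact` and `approx` comes only from rounding to the integer grids. Whether this becomes a real speedup depends on the GPU's int8 tensor-core throughput and on the serving engine providing a W4A8 kernel for that hardware. This sketch does not address either point.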