W4A8

#1
by Lucena190 - opened

Hello, do you think I could get a performance improvement, without much loss of precision, if I quantize to W4A8 (4-bit weights, 8-bit activations)? I'm wondering whether vLLM actually converts the activations to 8 bits and keeps them that way in VRAM. I'm using an Nvidia L4 and would like to get the maximum performance out of it.
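For reference, in W4A8 the weights are stored as 4-bit integers and the activations are quantized to 8-bit integers, so a dot product can be accumulated in integer arithmetic and rescaled once at the end. The pure-Python sketch below is a hypothetical illustration, not vLLM's kernel. It assumes symmetric per-tensor scales, and the names `quantize_symmetric` and `w4a8_dot` are made up for this example. Production kernels often use finer-grained per-channel or per-group weight scales, which usually lowers the error.

```python
def quantize_symmetric(values, num_bits):
    """Symmetric per-tensor quantization (assumed scheme): returns integer codes and one scale."""
    qmax = 2 ** (num_bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to nearest and clamp into [-qmax, qmax]
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def w4a8_dot(weights, activations):
    """Dot product with 4-bit weight codes and 8-bit activation codes."""
    w_codes, w_scale = quantize_symmetric(weights, 4)
    a_codes, a_scale = quantize_symmetric(activations, 8)
    acc = sum(w * a for w, a in zip(w_codes, a_codes))  # integer-only accumulation
    return acc * w_scale * a_scale  # rescale back to real values once


weights = [0.12, -0.53, 0.91, -0.07, 0.33, -0.88, 0.45, 0.02]
activations = [1.5, -0.2, 0.8, 2.1, -1.3, 0.4, -0.9, 0.6]
exact = sum(w * a for w, a in zip(weights, activations))
approx = w4a8_dot(weights, activations)
print(f"float: {exact:.4f}  w4a8: {approx:.4f}  abs err: {abs(exact - approx):.4f}")
```

With a single scale per tensor, the coarse 4-bit weight grid accounts for most of the error in this toy case, while the 8-bit activations stay close to their float values. That weight-side rounding is the precision cost to weigh against any speed gain.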
