LO-BCQ: Locally Optimal Block Clustered Quantization for 4-bit (W4A4) LLM Inference
Reena Elangovan, Charbel Sakr, Anand Raghunathan, Brucek Khailany
Action editor: Yunhe Wang
https://openreview.net/forum?id=loWISTqGwW
#quantization #quantizing #blocks