TL;DR Quantization makes the model run with less compute and memory by storing its weights at reduced numerical precision. For example, a weight of 1.726 at decreasing precision:
- FP16: 1.726
- Q8: 1.73
- Q4: 1.7
- Q2: 2.0 (rounded to nearest available value)
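To make the rounding concrete, here is a minimal sketch of uniform quantization for a single value. It assumes a hand-picked range of [0, 2] and a hypothetical helper `quantize_dequantize` (not a library call); real quantizers calibrate the range from the whole weight tensor, so the exact numbers can differ slightly from the list above.

```python
def quantize_dequantize(x: float, bits: int, lo: float = 0.0, hi: float = 2.0) -> float:
    """Snap x to one of 2**bits evenly spaced levels in [lo, hi], then map it back to a float."""
    levels = 2 ** bits - 1            # highest integer code at this bit width
    scale = (hi - lo) / levels        # real-valued size of one quantization step
    code = round((x - lo) / scale)    # the small integer that would actually be stored
    code = max(0, min(levels, code))  # clamp to the representable range
    return lo + code * scale          # dequantized approximation of x

for bits in (8, 4, 2):
    print(f"Q{bits}: {quantize_dequantize(1.726, bits):.3f}")
```

Fewer bits mean fewer representable levels, so the stored value drifts further from 1.726 as the bit width shrinks. That is the whole trade-off: smaller, faster weights in exchange for a little extra rounding error.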
But don’t forget: hard-fought solutions are durable! You’ll understand *why* your algorithm is good once you get it there.