marcus
@mk2112.bsky.social
neurons, tokens, understanding.
github.com/mk2112
While really capable, SOTA LLMs are >huge< and waste token space, e.g. by padding answers with filler phrases. A fix for the latter? Penalize wasted tokens/bandwidth/time, add more data, tune hyperparameters. The usual? Sounds simple, but not easy.
October 17, 2024 at 6:28 PM
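[A minimal sketch of the "penalize wasted tokens" idea, not the poster's actual setup: subtract a per-token cost from a sequence-level reward before optimization. The function name, the base reward, and the cost value are all hypothetical placeholders.]

def length_penalized_reward(base_reward: float,
                            num_generated_tokens: int,
                            cost_per_token: float = 0.01) -> float:
    # Charge a small cost for every emitted token, so filler hurts the score.
    return base_reward - cost_per_token * num_generated_tokens

# Two answers with the same task reward: the terser one now scores higher.
concise = length_penalized_reward(base_reward=1.0, num_generated_tokens=40)
verbose = length_penalized_reward(base_reward=1.0, num_generated_tokens=400)
assert concise > verbose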
Model size itself is of course a crucial factor for applicability. Pruning and quantization have progressed from merely shrinking models to being order-of-magnitude performance boosters: more accuracy with less computation.
October 17, 2024 at 6:28 PM
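[A rough illustration of the two techniques named above, using PyTorch's built-in utilities on a toy model; the layer sizes, 50% sparsity, and int8 choice are arbitrary assumptions, not a recommendation.]

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Magnitude pruning: zero out the 50% smallest-magnitude weights per Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the sparsity into the weight tensor

# Post-training dynamic quantization: int8 weights for Linear layers.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])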