glassduck

From Floats to Integers: Quantizing a Neural Network Without Losing Accuracy

Taking an MNIST classifier from float32 down to 8-bit integers: the scale-and-zero-point trick, and how a dot product survives the move to integer arithmetic.

— Jun 10, 2026

Post-Training Quantization to Trit-Planes for Large Language Models

Understanding how trit-plane quantization compresses LLMs without retraining.

— Feb 8, 2026