Quantization reduces the numerical precision of model weights. A 70B-parameter model in 16-bit precision needs 140GB of memory for its weights alone; quantized to 4-bit it needs 35GB, small enough to fit on a single 48GB workstation GPU or to split across two 24GB consumer cards.
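As a sanity check on those numbers, a few lines of Python reproduce the arithmetic (weights only; activations, KV cache, and runtime overhead add more on top):

```python
# Memory needed to hold the weights of a 70B-parameter model
# at different precisions: params * bits / 8 bytes.

PARAMS = 70e9  # 70 billion parameters

for bits in (16, 8, 4):
    gigabytes = PARAMS * bits / 8 / 1e9  # bits -> bytes -> GB
    print(f"{bits:>2}-bit: {gigabytes:,.0f} GB")

# 16-bit: 140 GB
#  8-bit:  70 GB
#  4-bit:  35 GB
```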
Quantization is essential for running open-source models locally and for cost-effective serving at scale. Quality loss at 8-bit is usually undetectable; at 4-bit there is a measurable but tolerable drop on benchmarks. Tools like llama.cpp (with its GGUF file format) and quantization methods like AWQ make this accessible to non-researchers.
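To make the mechanics concrete, here is a minimal sketch of round-to-nearest quantization with per-group scales. The block-wise schemes GGUF and AWQ actually use are more elaborate (AWQ, for instance, rescales weights based on activation statistics), but the core idea is the same: store low-bit integers plus a float scale per small group of weights. The function names and `group_size` below are illustrative choices, not any library's API:

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int = 4, group_size: int = 64):
    """Round-to-nearest symmetric quantization with per-group scales.

    Assumes w.size is a multiple of group_size. Returns the low-bit
    integer codes and one float scale per group.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for signed 4-bit
    groups = w.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_rtn(w, bits=4)
print(f"mean abs error: {np.abs(dequantize(q, s) - w).mean():.4f}")
```

Per-group scales are what keep 4-bit tolerable: a single outlier weight inflates only the scale of its own small group rather than stretching the quantization range of the entire tensor.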