The Transformer is a deep-learning architecture introduced by Google researchers in 2017 in the paper "Attention Is All You Need." It has largely replaced recurrent networks for sequence modelling and underpins most major LLMs, image generators, and multimodal models as of 2026.
Transformers use a mechanism called self-attention to weigh the relevance of every token in a sequence to every other token, regardless of distance. This is what lets an LLM keep track of context across thousands of tokens. The full architecture stacks many attention layers, each followed by a feed-forward layer; "scale" in modern AI refers mostly to using more and wider layers and training on more data with more compute.
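To make the idea concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. The function name, projection matrices, and tensor sizes are illustrative assumptions, not taken from the paper's code; the point is only to show every token producing a weighted mix over every other token's values.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence (illustrative sketch).

    x:             (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    Returns:       (seq_len, d_k) context-mixed representations.
    """
    q = x @ w_q                      # queries
    k = x @ w_k                      # keys
    v = x @ w_v                      # values
    d_k = q.shape[-1]
    # Every token scores every other token, regardless of distance:
    # (seq_len, seq_len) matrix of relevance scores
    scores = q @ k.T / np.sqrt(d_k)
    # Softmax over each row turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of all value vectors
    return weights @ v

# Toy usage with arbitrary sizes: 5 tokens, 8-dim embeddings, 4-dim projections
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 4)
```

A full Transformer block wraps this computation with multiple attention heads, residual connections, layer normalisation, and a feed-forward layer, then stacks dozens of such blocks.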