The Transformer is a deep-learning architecture introduced by Google researchers in 2017 in the paper "Attention Is All You Need." It has largely replaced recurrent networks for sequence modelling and underpins most major LLMs, image generators, and multimodal models as of 2026.
Transformers use a mechanism called self-attention to weigh the relevance of every token in a sequence to every other token, regardless of distance. This is what lets an LLM keep track of context across thousands of tokens. The full architecture stacks many attention layers, each followed by a feed-forward layer; "scale" in modern AI refers mostly to using more and wider layers and training on more data with more compute.
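To make the idea concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. The function name, projection matrices, and tensor sizes are illustrative assumptions, not taken from the paper's code; the point is only to show every token producing a weighted mix over every other token's values.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence (illustrative sketch).

    x:             (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    Returns:       (seq_len, d_k) context-mixed representations.
    """
    q = x @ w_q                      # queries
    k = x @ w_k                      # keys
    v = x @ w_v                      # values
    d_k = q.shape[-1]
    # Every token scores every other token, regardless of distance:
    # (seq_len, seq_len) matrix of relevance scores
    scores = q @ k.T / np.sqrt(d_k)
    # Softmax over each row turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of all value vectors
    return weights @ v

# Toy usage with arbitrary sizes: 5 tokens, 8-dim embeddings, 4-dim projections
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 4)
```

A full Transformer block wraps this computation with multiple attention heads, residual connections, layer normalisation, and a feed-forward layer, then stacks dozens of such blocks.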