A Mixture of Experts (MoE) model contains many "expert" sub-networks; a lightweight routing layer (the gate) selects which experts process each token. The total parameter count can be enormous (Mixtral 8x7B totals ~47B parameters), but only a fraction is active at each inference step: Mixtral's router picks 2 of the 8 experts in each layer, activating ~13B parameters per token.
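A minimal sketch may make the routing mechanics concrete. The PyTorch code below is an illustrative top-k MoE layer, not Mixtral's actual implementation; the `MoELayer` name, the layer sizes, and the `num_experts=8` / `top_k=2` defaults are assumptions chosen to mirror a Mixtral-style configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Sparse MoE feed-forward block: each token is routed to its top-k experts.

    Illustrative sketch only; real implementations add load balancing and
    batched expert dispatch.
    """

    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The gate is a single linear layer producing one logit per expert.
        self.gate = nn.Linear(d_model, num_experts)
        # Each expert is an independent feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        logits = self.gate(x)                              # (num_tokens, num_experts)
        weights, chosen = logits.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)               # renormalise over the chosen experts
        out = torch.zeros_like(x)
        # Only the chosen experts run for each token; the rest stay idle,
        # which is why active parameters are far fewer than total parameters.
        for i, expert in enumerate(self.experts):
            token_idx, slot = (chosen == i).nonzero(as_tuple=True)
            if token_idx.numel():
                out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out

moe = MoELayer()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```

Note that every expert's weights must still sit in memory even though only two run per token; the saving is in compute (FLOPs per token), not in model footprint.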
The appeal is the quality of a large model at roughly the inference cost of a smaller one. Mixtral, DeepSeek's models, and (by widespread rumour) GPT-4 all use MoE architectures. The trade-off is more complex training and serving infrastructure: routing must be load-balanced during training so all experts learn, and the full parameter set still has to fit in memory at serving time.