Guardrails are the layer of filters and validators around an LLM that blocks unsafe, off-topic, or malformed output before it reaches the user. They include content-moderation models, format validators (regex, JSON Schema), topic filters, and prompt-injection detectors.
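As a minimal sketch of an application-level format validator, assuming the model has been prompted to return JSON (the schema and the PII regex here are hypothetical, not from any particular library):

```python
import json
import re
from jsonschema import validate, ValidationError

# Hypothetical schema for the model's structured output.
RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["answer", "confidence"],
}

def validate_output(raw: str) -> dict:
    """Reject malformed or policy-violating LLM output before it reaches the user."""
    try:
        data = json.loads(raw)
        validate(instance=data, schema=RESPONSE_SCHEMA)
    except (json.JSONDecodeError, ValidationError) as exc:
        raise ValueError(f"guardrail rejected output: {exc}") from exc
    # Toy content filter: block anything resembling an email address (PII).
    if re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", data["answer"]):
        raise ValueError("guardrail rejected output: PII detected")
    return data
```

In practice each check (parse, schema, content) would return a structured verdict rather than raising, so the application can decide whether to retry the generation or fall back.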
Most production LLM apps combine provider-level safety (OpenAI's moderation endpoint, Anthropic's Constitutional AI training) with application-level guardrails (NeMo Guardrails, the Guardrails AI library, Lakera). The trade-off is always added latency and false-positive rate versus robustness.
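To make the latency side of that trade-off concrete, here is a sketch of a provider-level check using OpenAI's moderation endpoint via the official `openai` Python SDK, with the round-trip time measured (the `moderate` helper and its logging are illustrative, not part of the SDK):

```python
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def moderate(text: str) -> bool:
    """Return True if the moderation endpoint flags the text; log the added latency."""
    start = time.perf_counter()
    result = client.moderations.create(input=text).results[0]
    latency_ms = (time.perf_counter() - start) * 1000
    print(f"moderation latency: {latency_ms:.0f} ms, flagged={result.flagged}")
    return result.flagged
```

Every such network round trip sits on the critical path of the response, which is why many apps run guardrail checks concurrently with generation or only on a sampled fraction of traffic.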