Retrieval-Augmented Generation (RAG) is the standard pattern for grounding LLM output in fresh, private, or domain-specific knowledge. Instead of fine-tuning facts into model weights, you index documents in a vector database, retrieve the top-k most relevant chunks at query time, and pass them to the LLM as context.
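A minimal sketch of that loop, with a toy hashed bag-of-words `embed()` standing in for a real embedding model and a plain NumPy array standing in for the vector database (both are illustrative stand-ins, not a production setup):

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for a real embedding model: a normalized hashed
    bag-of-words vector. Replace with real embeddings in practice."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

# --- Index time: embed each chunk once and store the vectors. ---
chunks = ["Refunds are issued within 14 days.", "Support hours are 9-5 CET."]
index = np.stack([embed(c) for c in chunks])  # shape: (num_chunks, dim)

# --- Query time: embed the question, take top-k by cosine similarity. ---
def retrieve(query: str, k: int = 2) -> list[str]:
    sims = index @ embed(query)  # vectors are normalized, so this is cosine
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]

# --- Generation: pass retrieved chunks to the LLM as context. ---
query = "How long do refunds take?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# answer = your_llm(prompt)  # hypothetical call to your LLM of choice
```

Because the retrieved chunks are known, returning them (or their source metadata) alongside the answer is what makes the response citable.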
RAG is how nearly every "chat with your docs" product works. It scales cheaply (just add more documents to the index), updates instantly (re-index changed content), and produces citable answers (return source links alongside the response). Answer quality, however, depends heavily on how well the chunking, embedding, and retrieval stages are done.
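Chunking is often the first of those levers to tune. One common baseline is fixed-size chunks with overlap, so a fact that straddles a boundary still appears intact in at least one chunk; the sizes below are illustrative defaults, not recommendations:

```python
def chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Fixed-size character chunking with overlap. Overlapping windows
    keep boundary-straddling facts retrievable; tuning size/overlap per
    corpus is one of the main retrieval-quality levers."""
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

More structure-aware strategies (splitting on headings, paragraphs, or sentences) usually retrieve better than raw character windows, at the cost of more parsing work per document format.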