As you go deeper down the rabbit hole of building LLM-based applications, you may find that you need to root your LLM responses in your source data. Fine-tuning an LLM on your custom data may get you a generative AI model that understands your particular domain, but it can still be subject to inaccuracies and hallucinations. This has led many organizations to look into retrieval-augmented generation (RAG) to ground LLM responses in specific data and back them up with sources.

With RAG, you create text embeddings of the pieces of source data that you want to draw from and retrieve. That places each piece of source text within the semantic space that LLMs use to create responses, so the system can find the passages most relevant to a given query. At the same time, the RAG system can return the source text itself, so that the LLM's response is backed by human-written text with a citation.
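To make that embed-and-retrieve step concrete, here is a minimal sketch. It assumes the sentence-transformers package and the all-MiniLM-L6-v2 model, neither of which the post prescribes; any embedding model would slot in the same way.

```python
# Minimal embed-and-retrieve sketch (assumed stack: sentence-transformers + numpy).
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Source chunks the LLM should draw from, each with a citation attached.
chunks = [
    {"text": "RAG grounds LLM answers in retrieved source documents.", "source": "intro.md"},
    {"text": "Chunk size affects both retrieval quality and context cost.", "source": "chunking.md"},
]

# Embed every chunk once, up front.
chunk_vectors = model.encode([c["text"] for c in chunks], normalize_embeddings=True)

def retrieve(query: str, top_k: int = 1) -> list[dict]:
    """Return the chunks whose embeddings sit closest to the query."""
    query_vector = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vectors @ query_vector  # cosine similarity, since vectors are normalized
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]

# The retrieved text (and its source, for the citation) then goes into the LLM prompt.
print(retrieve("How big should my chunks be?"))
```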

When it comes to RAG systems, you'll need to pay special attention to how big the individual pieces of data are. How you divide your data up is called chunking, and deciding how to do it is more complex than just embedding whole documents. This article will take a look at some of the current thinking around chunking data for RAG systems.
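As a baseline for what chunking means in practice, here is a sketch of the simplest strategy: fixed-size chunks with a small overlap. The sizes and the sample document are illustrative assumptions, not recommendations from the article.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with a small overlap.

    The overlap keeps a sentence that straddles a boundary from being
    lost entirely from one of the two chunks. Sizes here are illustrative.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Each resulting chunk would then be embedded and stored for retrieval.
document = "RAG systems ground LLM answers in retrieved source text. " * 40
for chunk in chunk_text(document):
    print(len(chunk), repr(chunk[:40]))
```

Real-world chunkers often split on sentence or section boundaries rather than raw character counts, which is the kind of consideration the linked article explores.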

continue reading on stackoverflow.blog
