Author: Fausto Albers
Date: 2025-02-13
Retrieval-Augmented Generation (RAG) has become a powerful paradigm for augmenting Large Language Models (LLMs) with external information at inference time. Traditional RAG systems rely on naive embedding-based retrieval, which often fails to surface the optimal chunks for a given query. Agentic RAG offers a solution, but it requires self-improvement capabilities to manage the added complexity; such an agentic framework combines programmatic prompt engineering with model parameter updates. The recently introduced Group Relative Policy Optimization (GRPO) algorithm from DeepSeek enables reinforcement learning (RL) with arbitrary reward functions, without requiring human feedback or a separate critic model.
In this work, we introduce a novel reinforcement learning approach that reframes RAG as a retrieval problem rather than a generative one. By casting retrieval as a closed-domain task, where the ground truth is a discrete set of chunk IDs rather than an open-ended text response, we obtain a directly verifiable reward signal, so retrieval quality can be optimized through reinforcement learning. This makes the task well suited to GRPO, the algorithm DeepSeek recently introduced in training its R1 model.
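The appeal of GRPO in this setting is that it replaces a learned critic with a group baseline: several rollouts are sampled for the same query, and each rollout's reward is normalized against the group's mean and standard deviation. The following is a minimal sketch of that group-relative advantage computation; the function name and structure are our illustrative assumptions, not a reference implementation.

```python
# Minimal sketch of group-relative advantage estimation in the spirit of GRPO.
# Assumed helper name (group_relative_advantages) is illustrative only.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each sampled rollout's reward against its own group,
    so the group baseline stands in for a separate critic model."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # All rollouts scored identically: no relative signal to learn from.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: four retrieval rollouts for one query, scored by a reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Rollouts scoring above the group mean receive positive advantages and are reinforced; those below the mean are suppressed, with no critic network to train or store.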
We propose a self-optimizing retrieval agent that applies GRPO-driven reinforcement learning to dynamically refine retrieval strategies, improving recall, precision, and multi-hop inference. To validate this, we construct a synthetic dataset tailored for RAG reinforcement learning and design a structured reward mechanism that incentivizes correct retrievals and query decomposition. By leveraging distillation techniques, we expect models trained with our method to outperform DeepSeek-R1 despite being orders of magnitude smaller, enabling local inference on commodity hardware at a fraction of the computational cost.
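Because the ground truth is a set of chunk IDs, such a reward can be computed exactly. The sketch below illustrates one plausible shape for it, assuming an F1 score over retrieved versus gold chunk IDs plus a small bonus for decomposing a multi-hop query; the specific weighting and the `decomposition_bonus` parameter are our assumptions, not the paper's final design.

```python
# Illustrative retrieval reward over ground-truth chunk IDs (an assumption,
# not the paper's exact mechanism): F1 on chunk IDs + decomposition bonus.
def retrieval_reward(retrieved: set[str], gold: set[str],
                     num_subqueries: int = 1,
                     decomposition_bonus: float = 0.1) -> float:
    """Score a rollout: harmonic mean of precision and recall over chunk IDs,
    plus a small bonus when the agent split the query into sub-queries."""
    if not retrieved or not gold:
        return 0.0
    hits = len(retrieved & gold)            # correctly retrieved chunk IDs
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    bonus = decomposition_bonus if num_subqueries > 1 else 0.0
    return f1 + bonus

# Example: two of three gold chunks found, one distractor, via two sub-queries.
score = retrieval_reward({"c1", "c2", "c9"}, {"c1", "c2", "c3"}, num_subqueries=2)
```

A reward of this form rewards both completeness (recall over the gold chunk set) and selectivity (precision against distractors), which is exactly the trade-off a multi-hop retrieval agent must learn.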
As a case study, we focus on AI-assisted retrieval for service engineers diagnosing complex industrial machines, where accurate multi-hop retrieval is critical.
The accuracy of a Large Language Model (LLM) in performing complex, multi-hop logical inference, particularly when the required information, up to and including the complete answer, is partially or entirely absent from its pre-training corpus, improves significantly when the model is integrated with a Retrieval-Augmented Generation (RAG) system [1]. This enhancement depends on several critical conditions:
This research focuses on enhancing the retrieval process, recognizing it as the most challenging and crucial aspect of RAG system design. We introduce the concept of inference-time compute: the computational resources and processing steps an LLM uses while generating a response, after the initial prompt is received. Increasing inference-time compute, analogous to giving the model more "time to think," can significantly improve reasoning capabilities.