Picture this: your company sits on tens of petabytes of data. To put that into perspective, if I had a penny for each byte and stacked them up, I'd have enough to reach Pluto and back, with some change left over.
That’s the reality we face at Rocket Mortgage, and it's probably not too different from what your organization is facing.
That’s why we built Rocket Analytics, and today I want to take you behind the scenes of how we created a text-to-SQL application using agentic RAG (Retrieval-Augmented Generation).
This tool fundamentally changes how our teams interact with data, letting them focus on what they do best: asking strategic and thoughtful questions, while the system handles the technical heavy lifting.
What Rocket Analytics actually does
Here’s how it works in practice: a user asks a natural language question, such as:
“Give me the count of loans for the past six months.”
Behind the scenes, the system:
- Converts the question into a SQL query
- Executes it against the relevant database
- Returns the results in a clean, understandable format
During a recent demo, someone went from raw loan counts to a comprehensive dashboard showing:
- Total loans closed
- Average daily loans
- Dates with peak daily loan volume
- Trend analysis
—all within seconds. For executives and stakeholders in the mortgage industry, where speed of decision-making is crucial, this capability is transformative.
The architecture behind the magic
Rocket Analytics relies on several key components, each playing a critical role in delivering accurate results.
Query transformation: Making questions understandable
The journey starts when a user inputs their question. Large language models, while powerful, don’t inherently understand our specific business context or data structures.
Ask ChatGPT, for example:
“Can you give me the top ten days for loan production?”
It won’t know exactly what you mean.
Our query transformation module converts natural language into explicit, actionable instructions. That same question becomes:
“Write a SQL query to get the top ten days based on loan closing.”
This step is crucial because we never pass actual data to the model—only metadata about our database structures.
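To make the idea concrete, here is a minimal sketch of the transformation step. In production an LLM performs the rewrite; this rule-based stand-in only illustrates the shape of the input and output, and every name in it (`transform_query`, `PHRASE_MAP`) is hypothetical:

```python
# Illustrative sketch: rewrite a natural-language question into an explicit
# SQL-generation instruction. Production uses an LLM for this rewrite; a
# phrase map stands in here so the example is self-contained.

PHRASE_MAP = {
    "top ten days for loan production": "the top ten days based on loan closing",
}

def transform_query(question: str) -> str:
    """Turn a vague business question into an explicit instruction.

    Note: only the question text is passed onward -- never row-level data.
    """
    normalized = question.lower().strip(" ?")
    for vague, explicit in PHRASE_MAP.items():
        if vague in normalized:
            return f"Write a SQL query to get {explicit}."
    # Fall back to a generic instruction wrapper.
    return f"Write a SQL query to answer: {question}"

print(transform_query("Can you give me the top ten days for loan production?"))
# -> Write a SQL query to get the top ten days based on loan closing.
```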

Building and managing the knowledge base
Before queries can be answered, we need a comprehensive knowledge base. We:
- Convert database metadata into embeddings using Amazon Titan
- Store them in a FAISS (Facebook AI Similarity Search) vector store
- Include information about all tables, schemas, relationships, and business context
This knowledge base is the foundation that allows the system to understand which tables might be relevant for any given question.
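The build step can be sketched as follows. Production uses Amazon Titan embeddings stored in FAISS; this stand-in substitutes a toy bag-of-words embedding and a plain list so it runs anywhere, and all names and descriptions are hypothetical:

```python
# Illustrative knowledge-base build: embed table *metadata*, never table rows.
# Toy embedding stands in for Amazon Titan; a list stands in for FAISS.
from collections import Counter

VOCAB = ["loan", "closing", "campaign", "marketing", "payment", "servicing"]

def embed(text: str) -> list[float]:
    """Toy embedding: counts of known vocabulary words (Titan in production)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in VOCAB]

def index_metadata(tables: dict[str, str]) -> list[tuple[str, list[float]]]:
    """Embed each table's metadata description and keep (name, vector) pairs."""
    return [(name, embed(desc)) for name, desc in tables.items()]

knowledge_base = index_metadata({
    "loan_closings": "loan closing dates amounts and loan status",
    "campaign_stats": "marketing campaign performance by channel",
})
print([name for name, _ in knowledge_base])
```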
Intelligent retrieval: Finding the right data sources
When a question comes in, we perform a hybrid search:
- Semantic similarity search identifies potentially relevant tables
- Keyword triggers ensure critical tables are included even if semantic search alone misses them
For example, a knowledge base with 15 tables might be narrowed down to 4–5 most relevant. This ensures the LLM is not overwhelmed and reduces errors.
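The two-pronged retrieval above can be sketched like this. The toy vectors, table names, and trigger map are hypothetical; production searches a FAISS index over the real metadata embeddings:

```python
# Illustrative hybrid retrieval: semantic top-k plus keyword triggers.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy index: (table name, metadata embedding).
INDEX = [
    ("loan_closings",  [1.0, 0.9, 0.0]),
    ("campaign_stats", [0.1, 0.0, 1.0]),
    ("loan_payments",  [0.8, 0.2, 0.1]),
]
# Keyword triggers guarantee critical tables even if similarity misses them.
TRIGGERS = {"campaign": "campaign_stats"}

def hybrid_search(question_vec, question_text, k=2):
    ranked = sorted(INDEX, key=lambda t: cosine(question_vec, t[1]), reverse=True)
    selected = [name for name, _ in ranked[:k]]
    for keyword, table in TRIGGERS.items():
        if keyword in question_text.lower() and table not in selected:
            selected.append(table)
    return selected

print(hybrid_search([1.0, 1.0, 0.0], "loan counts by campaign"))
```

Here `campaign_stats` misses the semantic top-2 but is pulled in by its keyword trigger, exactly the failure mode the trigger list exists for.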
Re-ranking for precision
Once candidate tables are retrieved, we use Cohere’s re-ranker model to refine the selection. This re-ranker scores how likely each document is to be relevant to the user’s question, and only documents above a relevance threshold survive.
This step is essential for preventing hallucinations, ensuring the LLM generates accurate SQL queries based only on relevant metadata.
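A stand-in for this pass, scoring candidates by token overlap rather than calling Cohere’s hosted re-ranker, illustrates the filter; the function name, threshold, and descriptions are all hypothetical:

```python
# Illustrative re-ranking pass. Production calls Cohere's re-ranker; this
# stand-in scores candidates by Jaccard token overlap with the question and
# keeps only those above a relevance threshold.

def rerank(question: str, candidates: dict[str, str], threshold: float = 0.2):
    """Score each candidate's metadata description against the question."""
    q_tokens = set(question.lower().split())
    scored = []
    for table, description in candidates.items():
        d_tokens = set(description.lower().split())
        overlap = len(q_tokens & d_tokens) / len(q_tokens | d_tokens)
        scored.append((overlap, table))
    scored.sort(reverse=True)
    return [table for score, table in scored if score >= threshold]

print(rerank("count of loans closed", {
    "loan_closings": "loans closed by date",
    "campaign_stats": "marketing campaign clicks",
}))
```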
Prompt engineering: The art and science
Crafting prompts involves multiple layers:
- Standard guidelines: Always answer within context; use exact table and column names
- Adaptive guidelines: Few-shot examples for tricky question types
- Domain-specific context: Helps the model understand mortgage-industry terminology
A typical prompt layers these elements on top of the retrieved table metadata.
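The production prompt is internal, but a hedged sketch of how the layers might be assembled looks like this; the guideline strings, few-shot example, and table schema are illustrative, not the real ones:

```python
# Hedged sketch of prompt assembly from the three layers described above.

STANDARD_GUIDELINES = (
    "Answer only from the provided context.\n"
    "Use exact table and column names from the metadata."
)

FEW_SHOT = (
    "Q: top ten days for loan production\n"
    "A: SELECT closing_date, COUNT(*) AS n FROM loan_closings "
    "GROUP BY closing_date ORDER BY n DESC LIMIT 10;"
)

DOMAIN_CONTEXT = "'Loan production' refers to closed loans, not applications."

def build_prompt(question: str, table_metadata: str) -> str:
    return (
        f"{STANDARD_GUIDELINES}\n\n"
        f"Relevant tables:\n{table_metadata}\n\n"
        f"Examples:\n{FEW_SHOT}\n\n"
        f"Domain notes: {DOMAIN_CONTEXT}\n\n"
        f"Question: {question}\nSQL:"
    )

prompt = build_prompt(
    "Give me the count of loans for the past six months.",
    "loan_closings(closing_date DATE, loan_id BIGINT, amount DECIMAL)",
)
print(prompt)
```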
This careful layering improves accuracy while guiding the model through complex queries.
Execution and post-processing
Once the LLM generates a SQL query:
- The system executes it against the appropriate database
- Results are post-processed into a user-friendly format
- Users can consume the results as-is or ask follow-up questions
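The execution step can be sketched with an in-memory SQLite database standing in for the production warehouse; the table, columns, and sample rows are hypothetical:

```python
# Illustrative execution and post-processing using in-memory SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan_closings (closing_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO loan_closings VALUES (?, ?)",
    [("2025-01-15", 250000.0), ("2025-01-15", 310000.0), ("2025-02-03", 190000.0)],
)

def run_query(sql: str) -> list[dict]:
    """Execute generated SQL and post-process rows into friendly dicts."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

result = run_query(
    "SELECT closing_date, COUNT(*) AS loans FROM loan_closings "
    "GROUP BY closing_date ORDER BY loans DESC"
)
print(result)
# -> [{'closing_date': '2025-01-15', 'loans': 2}, {'closing_date': '2025-02-03', 'loans': 1}]
```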
For instance, a user reviewing six months of loan data can then ask:
“Can you show me some insights?”
The system will automatically generate visualizations and highlight actionable trends, all within seconds.
The agentic framework: Orchestrating complexity
So far, we’ve described a single agent handling queries in one domain. But organizations often need cross-domain insights.
Our agentic framework introduces:
- A main orchestrator agent
- Sub-agents specialized in domains like sales, marketing, and operations, each with its own knowledge base and optimized prompts
Currently, each agent operates within domain constraints. The next step is multi-agent coordination, enabling cross-domain queries without manual integration.
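A toy routing step shows the orchestrator’s job at its simplest. The domains, keyword lists, and fallback are hypothetical; production routing is model-driven rather than keyword-driven:

```python
# Illustrative orchestrator: route a question to the best-matching sub-agent.

DOMAIN_KEYWORDS = {
    "sales": ["loan", "closing", "pipeline"],
    "marketing": ["campaign", "lead", "channel"],
    "operations": ["turnaround", "processing", "queue"],
}

def route(question: str) -> str:
    """Pick the sub-agent whose domain keywords best match the question."""
    q = question.lower()
    scores = {
        domain: sum(word in q for word in words)
        for domain, words in DOMAIN_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"

print(route("How did last month's email campaign perform?"))  # -> marketing
```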
Performance optimization: Speed and cost matter
Two strategies have made a big difference:
Semantic caching
- Avoids redundant processing by reusing tables and prompts for semantically similar questions
- Requires careful tuning to handle subtleties like “past six months” vs. “past ten days”
Prompt caching
- Caches static prompt components
- Reduces computation and improves latency and cost efficiency
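Semantic caching can be sketched as a similarity-gated lookup. The toy character-frequency embedding and the 0.95 threshold are hypothetical (production uses Titan embeddings), but the threshold is exactly the knob that must distinguish near-duplicates from genuinely different questions like “past six months” vs. “past ten days”:

```python
# Illustrative semantic cache: reuse prior tables/prompts for questions
# whose embeddings are similar enough to a cached one.
import math

def embed(text: str) -> list[float]:
    # Toy character-frequency embedding; production uses Amazon Titan.
    return [float(text.lower().count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries: list[tuple[list[float], object]] = []

    def get(self, question: str):
        vec = embed(question)
        for cached_vec, value in self.entries:
            if cosine(vec, cached_vec) >= self.threshold:
                return value
        return None  # cache miss: fall through to full retrieval

    def put(self, question: str, value):
        self.entries.append((embed(question), value))

cache = SemanticCache()
cache.put("loan count for the past six months", {"tables": ["loan_closings"]})
print(cache.get("loan counts for the past six months"))  # near-duplicate: hit
```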
These optimizations help Rocket Analytics scale across teams without slowing down queries.
Measuring success: How we know it’s working
We evaluate the system with several key metrics:
- Relevance: Are retrieved tables truly relevant? Checked using LLM-as-a-judge
- Hallucination detection: Compares outputs against a golden dataset
- Toxicity screening: Ensures all responses are professional
Multiple iterations per query give a reliable performance overview and help catch edge cases.

Real-world impact: Who benefits and how?
Rocket Analytics helps teams across the organization:
Sales & Marketing
- Analyze campaign effectiveness
- Identify seasonal trends
- Compare strategies across segments
Operations
- Monitor performance metrics
- Optimize processes
Finance
- Generate reports instantly
- Analyze trends
Previously, these insights took analysts days to produce; now they’re available in seconds.
The human element: Trust but verify
AI accelerates insight, but human oversight is critical:
- Human-in-the-loop checkpoints for high-stakes queries
- Regular audits to catch drift or emerging errors
Together, these checks preserve speed and accuracy without compromising trust.
Looking ahead: The evolution continues
Our roadmap includes:
- Multi-agent coordination for cross-domain insights
- Continuous performance improvements
- Expanded domain coverage for more teams
By democratizing data access, Rocket Analytics enables any team member to make data-driven decisions, without SQL expertise.
Final thoughts
Building a text-to-SQL system with agentic RAG has taught us:
- AI can augment human intelligence, not replace it
- Speed, accuracy, and accessibility matter more than sheer complexity
- Thoughtful implementation and user-centered design are key
By removing technical barriers, Rocket Analytics empowers more people to explore, analyze, and act on data, creating a competitive advantage.
The goal isn’t to build the most sophisticated AI but to build the most useful AI.
Arjun Barli, Staff Data Scientist at Rocket Mortgage, gave this presentation at our Agentic AI Summit, New York, 2025.