When I first started moderating panels about AI, the conversations were mostly theoretical. But sitting down with Kishore Aradhya (Head of Data Engineering & Architecture, Frontdoor), Eli Tsinovoi (Head of AI, UKG), Shafik “SQ” Quoraishee (Senior Android/ML Games Engineer, The New York Times), and Manish Nigam (Senior Director, AI, Ameriprise Financial), the discussion was refreshingly different.
These are leaders who are actually building agentic AI systems at scale, and learning some hard truths along the way.
Here’s what struck me most: every single panelist emphasized starting small. Not because they lack ambition, but because they’ve learned that crawling before you walk actually works.
Kishore from Frontdoor put it perfectly when he said rushing into “fancy agentic frameworks” is a recipe for disaster. Instead, his team focuses on known problems with measurable outcomes, like automating insurance claim reviews, before tackling anything more complex.
That mindset is something we see often in our work at TrueFoundry, where I’m CEO and co-founder. We spend a lot of time with teams trying to move AI from experimentation into reliable, governed systems in production.
Hosting this panel made it clear how quickly real-world constraints (governance, access, and infrastructure) shape what’s actually possible once models leave the demo stage.
The model access maze nobody talks about
You'd think accessing AI models would be straightforward by now. It's not. Each company on the panel has taken a wildly different approach, and there's a good reason for that.
Eli from UKG routes everything through Google Cloud's Vertex AI. Makes sense for them; they get Vertex AI's Model Garden features and can control token usage without building from scratch. But here's where it gets interesting: they're already moving beyond basic controls toward LLM proxies for better routing and fallback models.
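To make the routing-and-fallback idea concrete, here's a minimal sketch of the pattern. It's purely illustrative, not UKG's setup; the model names and the provider stub are placeholders for whatever your gateway actually fronts.

```python
# Minimal sketch of LLM proxy routing with fallback (illustrative only).
# `call_model`, the model names, and ProviderError are placeholders.
import time

class ProviderError(Exception):
    """Raised when a single provider/model call fails or times out."""

def call_model(model_name: str, prompt: str) -> str:
    """Stub for a real provider call (e.g., through a hosted model API)."""
    raise ProviderError(f"{model_name} is not wired to a real provider in this sketch")

def route_with_fallback(prompt: str, models: list[str]) -> str:
    """Try each model in priority order, falling back to the next on failure."""
    last_error: Exception | None = None
    for model in models:
        try:
            start = time.monotonic()
            reply = call_model(model, prompt)
            print(f"served by {model} in {time.monotonic() - start:.2f}s")
            return reply
        except ProviderError as err:
            last_error = err  # record the failure and fall through to the next model
    raise RuntimeError(f"All models failed; last error: {last_error}")

# Usage: primary model first, a cheaper or regional model as the fallback.
# route_with_fallback("Summarize this claim...", ["primary-model", "backup-model"])
```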
Meanwhile, Kishore's team at Frontdoor runs everything through Snowflake. Why? Because they're a 50-year-old insurance company where governance is survival. Every action in their Snowflake ecosystem is traceable by default. No extra work needed.
The New York Times takes yet another approach. Shafik explained that they actually maintain separate AI infrastructures for their newsroom and business operations. Different needs, different tools. Their journalism side requires absolute fidelity (no hallucinations in news stories, thank you very much), while the business side can experiment more freely with design pattern detection and subscription management.
And Ameriprise? They're using a managed environment approach that lets them maintain their conservative financial services stance while still innovating. As Manish put it, security is paramount, but that doesn't mean standing still.
Why AI gateways are becoming the new battleground
Here's a debate that got heated: do we need specialized AI gateways, or can traditional API gateways handle the job?
Eli thinks API gateways can evolve to handle AI traffic. His argument? The network infrastructure is basically the same. Just add semantic concepts like "time to first token" instead of thinking in raw payloads. Keep it simple, avoid the "Maserati platforms that do everything."
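For context, "time to first token" is just the delay between sending a prompt and receiving the first streamed chunk back. Here's a rough sketch of how a gateway layer might record it; the streaming client is a stand-in, not a specific product's API.

```python
# Sketch of recording "time to first token" (TTFT) at a gateway layer.
# `stream_completion` stands in for whatever streaming model client the
# gateway wraps; it is assumed to yield response chunks as they arrive.
import time
from typing import Iterable, Iterator

def stream_completion(prompt: str) -> Iterable[str]:
    """Placeholder for a streaming model call."""
    yield from ["Hello", ", ", "world"]

def stream_with_ttft(prompt: str) -> Iterator[str]:
    """Proxy a streaming response while recording time to first token."""
    start = time.monotonic()
    first_token_at = None
    for chunk in stream_completion(prompt):
        if first_token_at is None:
            first_token_at = time.monotonic()
            print(f"time_to_first_token={first_token_at - start:.3f}s")
        yield chunk
    print(f"total_latency={time.monotonic() - start:.3f}s")

# A real gateway would emit these as metrics or trace attributes instead of printing.
for token in stream_with_ttft("hello"):
    pass
```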
But Manish pushed back hard on this. Traditional observability (tokens, latency, and costs) works fine for basic chatbots. But agentic systems? That's a different beast entirely.
Think about it: when an AI agent receives your request, it might:
- Decide which tools to use
- Call multiple APIs in sequence
- Access your file system
- Loop through this process multiple times
- Coordinate with other agents
How do you trace that? How do you debug when something goes wrong? Traditional observability tools show you the what, but agentic systems need to show you the why. The reasoning. The decision tree.
This isn't academic; it's a real problem these companies face daily. When your AI makes a decision about an insurance claim or a financial recommendation, you need to explain that decision. Not just to your users, but to auditors, regulators, and your own teams trying to improve the system.
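To make that concrete, here's a hypothetical shape for a single agent-step trace event that captures the "why" (reasoning, chosen tool) alongside the usual "what" (latency). The field names and the claim example are illustrative, not from any panelist's system.

```python
# Illustrative trace event for one agent step. Field names are hypothetical.
from dataclasses import dataclass, asdict
import json, uuid

@dataclass
class AgentStepTrace:
    trace_id: str
    step: int
    reasoning: str                   # why the agent chose this action
    tool: str                        # which tool/API it decided to call
    tool_args: dict
    result_summary: str
    latency_s: float
    parent_step: int | None = None   # links loops and sub-agents into a tree

step = AgentStepTrace(
    trace_id=str(uuid.uuid4()),
    step=1,
    reasoning="Claim mentions water damage; policy lookup needed before deciding.",
    tool="policy_lookup",
    tool_args={"policy_id": "ABC-123"},
    result_summary="Coverage includes water damage up to $5,000.",
    latency_s=0.42,
)
print(json.dumps(asdict(step), indent=2))
```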
The tracing nightmare that keeps CTOs awake
Speaking of tracing, let me share what really surprised me: none of these companies has figured out enterprise-wide tracing yet. Not one.
Shafik from The New York Times was refreshingly honest about this. Different teams implement observability differently because their needs vary so much. A mobile app generating content for users requires different tracing than a centralized system distributing outputs across multiple endpoints. They're even exploring using AI to help trace AI; meta, right?
The challenge compounds when you realize most observability platforms only serve specific personas. Engineers want traces. Data scientists want evaluation metrics. Product managers want user behavior data. Nobody's built a platform that serves everyone well.
Eli shared a painful lesson: they evaluated several big-name platforms (he mentioned Arize and LangSmith, among others) and weren't truly happy with any of them. The demos looked great: perfectly manicured data, smooth workflows. But drop them into a real enterprise environment with multiple hops, proxies, and infrastructure layers? Different story.
His advice? Do upfront trials. Make vendors stand behind their demos with your actual data and workflows. And pay attention to the boring stuff like regulatory compliance and on-premise capabilities (which, by the way, are often "seven versions behind" the cloud offerings).
What "agentic" actually means in practice
Everyone talks about AI agents, but definitions are all over the map. Manish offered the clearest one I've heard: an agent equals a model plus access to tools plus memory. All three components. Miss one, and you've got something else.
This matters because it shapes how you build. Take The New York Times' approach to code analysis. They're using agents to identify where their design system hasn't been fully applied across their codebase. That's a model (understanding code patterns) with tools (accessing repositories) and memory (tracking what's been analyzed). Classic agent behavior.
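As a rough illustration of that definition (not the Times' actual code), here's what the three components look like wired together. The model call and the single tool are stubs; the point is the shape of the loop.

```python
# Minimal sketch of "agent = model + tools + memory". Everything here is a stub.
from typing import Callable

def model_decide(goal: str, memory: list[str]) -> dict:
    """Placeholder for an LLM call that picks the next action given context."""
    if not memory:
        return {"tool": "list_repos", "args": {}}
    return {"tool": "done", "args": {}}

TOOLS: dict[str, Callable[..., str]] = {
    "list_repos": lambda: "found 3 repos missing the design system",
}

def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    memory: list[str] = []                               # memory: what has been analyzed
    for _ in range(max_steps):
        action = model_decide(goal, memory)              # model: decides the next step
        if action["tool"] == "done":
            break
        result = TOOLS[action["tool"]](**action["args"]) # tools: act on the outside world
        memory.append(f"{action['tool']} -> {result}")
    return memory

print(run_agent("Find where the design system isn't applied"))
```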
But here's what nobody tells you about building agents: the non-determinism stacks. One LLM's outputs vary. Add tool calling? More variation. Multiple agents coordinating? Now you're in what Shafiq called "an undiscovered continent" of complexity.
This is why Kishore insists on starting with problems humans already solve. If a human can do it with reasonable accuracy, you have a baseline. You understand the decision process. You can measure improvement. But asking an agent to do something nobody understands? That's asking for trouble.
The MCP revolution everyone's watching
Just when enterprises are getting comfortable with current AI architectures, along comes MCP (Model Context Protocol) to shake things up. The panelists see it as potentially transformational but with caveats.
Manish explained why MCP matters: standardization. Before MCP, every integration with external systems was custom-built. Now there's a common framework for models to access files, call tools, and interact with APIs. Better yet, this standardization extends to observability; if everything uses the same protocol, tracing becomes manageable.
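For a sense of what that standardization buys you, here's a minimal tool server sketch using the open-source Python MCP SDK's FastMCP helper. The tool itself is a made-up stub; any MCP-compatible client could discover and call it through the same protocol.

```python
# Minimal MCP tool server sketch using the Python MCP SDK's FastMCP helper.
# The lookup_policy tool is a hypothetical stub, not a real integration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("claims-tools")

@mcp.tool()
def lookup_policy(policy_id: str) -> str:
    """Return coverage details for a policy (stubbed here)."""
    return f"Policy {policy_id}: water damage covered up to $5,000."

if __name__ == "__main__":
    mcp.run()  # any MCP-compatible client can now discover and call lookup_policy
```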
But Eli's most excited about MCP's memory capabilities. Not the basic session memory everyone's implementing, but graph-based memory systems that can make conversations truly personal and contextual. As he put it, "everybody will start claiming they have memory in their system. But you've got to ask, what kind?"
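To illustrate the distinction, here's a toy graph memory (not UKG's system): facts are stored as subject-relation-object edges and recalled by subject, rather than replaying a flat chat transcript.

```python
# Toy graph-based memory versus flat session memory (illustrative only).
from collections import defaultdict

class GraphMemory:
    def __init__(self):
        self.edges = defaultdict(list)   # subject -> [(relation, object)]

    def remember(self, subject: str, relation: str, obj: str) -> None:
        self.edges[subject].append((relation, obj))

    def recall(self, subject: str) -> list[str]:
        """Return everything known about a subject as readable facts."""
        return [f"{subject} {rel} {obj}" for rel, obj in self.edges[subject]]

memory = GraphMemory()
memory.remember("Alex", "manages", "claims team")
memory.remember("Alex", "prefers", "weekly summaries")
print(memory.recall("Alex"))
# A flat session memory only holds the current transcript; the graph lets
# later conversations reuse structured facts about "Alex" across sessions.
```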
The sobering part? If these companies started their AI initiatives today, they'd build differently. Eli admitted they spent significant time developing capabilities that MCP now provides out of the box. Connect an MCP server to Claude, and you might match what teams worked months to build.
Still, Kishore offered a perspective: MCP is just a protocol, like TCP/IP. What matters is the semantic layer: the context engineering, the knowledge graphs, and the meaning you build on top. He quoted Andrej Karpathy: "context engineering beats prompt engineering." The protocol is just plumbing.
The two barriers nobody wants to admit
When an audience member asked about the biggest barriers to agent adoption, the panel's answers were revealing.
First barrier: alignment. Not technical alignment but human alignment. Manish emphasized getting all stakeholders in one room early. Risk teams, business owners, and technical teams. Without a unified agreement on the problem and approach, you're building expensive proofs-of-concept that never deploy.
This is especially true for agentic systems that fundamentally change workflows. Customer service reps weren't trained for AI assistance. That's change management. And change management at enterprise scale is brutal.
Second barrier: starting with solutions instead of problems. Eli shared a cautionary tale: executives demanding "build as much AI as you can." They built plenty. Most of it will never create value because it was technology looking for a problem.
His framework for avoiding this? Define what AI excels at in your industry:
- Automation of repetitive tasks
- Surfacing unique insights
- Decision assistance
- Scenario simulation
Then, decompose your business problems the old-fashioned way. See where these capabilities genuinely help. String together small wins into larger systems.
The reality check every enterprise needs
After moderating this panel, I'm convinced of three things:
One, the companies succeeding with AI agents aren't the ones with the biggest budgets or fanciest tech. They're the ones starting small, measuring obsessively, and building on proven foundations.
Two, infrastructure matters more than models. Every panelist spent more time discussing gateways, observability, and protocols than on which LLM they use. The plumbing determines what you can build.
Three, we're still in the early days. When experienced teams at major enterprises admit they're "figuring it out," that's just honesty. The playbook for enterprise AI agents is being written right now, in real time, by teams willing to share both successes and failures.
The path forward isn't about revolutionary leaps. It's about evolutionary steps. Crawl with simple automations. Walk with integrated workflows. Maybe then, and only then, run with fully autonomous agents.
As Kishore reminded us, there's always a human somewhere in these loops who understands the work. Start there. Build from that understanding. And don't let anyone convince you that throwing agents at undefined problems is innovation.
It's not. It's just expensive experimentation.
The real innovation happens when you match AI's capabilities to genuine business needs, build the infrastructure to support it at scale, and create transparency that satisfies everyone from engineers to auditors. That's not sexy. But it's what actually works.