You know that feeling when you're building something and the ground keeps shifting beneath your feet? That's exactly what it's like constructing an agentic AI stack right now. The GPUs evolve, the frameworks update, the models improve; everything's in constant flux. But here's what I've learned: some things remain constant, and those are the foundations you need to focus on.

I recently shared my journey building an agentic stack for Tata Play, an OTT platform aggregator service. Let me walk you through what worked, what didn't, and what you absolutely need to know if you're venturing into this space.

The enterprise evolution that got us here

Think about where we've come from. We started with monolithic architectures, and hey, Prime Video still uses them for monitoring, so they're not dead yet. Then came the progression: servers, microservices, event-driven architectures, and finally serverless with Lambda functions.

Now? We're in the AI-native era. And that means adding reasoning capabilities, large language models, RAG systems, and agentic AI into our existing enterprise stacks. The biggest challenge isn't the technology itself, but the integration. How do you weave agentic capabilities into systems that are already running, already serving customers, already generating revenue?

The layers that matter (and why you can't skip any)

Let me paint you a picture of what a modern agentic stack actually looks like. Yes, it's complex. No, you can't skip layers and hope for the best.

Starting from the top, you've got your API layers: the interface between your agents and the world. Below that sits the orchestration layer, whether that's Kubernetes, microservices, or something like LangGraph for workflows. Then come your language models (large or small, depending on your use case), followed by the memory and context layer—this is where embeddings live, where knowledge graphs provide semantic understanding.

The action layer is where things get interesting. Your agents need tools and APIs to actually do things in the real world. And underneath it all? Data and governance. Because without proper data handling and security, you're building a house of cards.

The microservices mandate

Here's something crucial: your microservices must be stateless. I can't stress this enough. Store your state information in Kafka, Redis, Cassandra, or MongoDB: anywhere but in the service itself. This isn't just about following best practices; it's about building something that can scale when you need it to.
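To make that concrete, here's a minimal sketch of the pattern, assuming a reachable Redis instance and the redis-py client; the key scheme and handler logic are hypothetical:

```python
import json

import redis

# Hypothetical stateless handler: all session state lives in Redis,
# never in the service instance itself.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def handle_request(session_id: str, message: str) -> dict:
    # Load whatever state exists for this session from the external store.
    raw = r.get(f"session:{session_id}")
    state = json.loads(raw) if raw else {"history": []}

    # Do the actual work (stubbed) using only local variables.
    state["history"].append(message)
    reply = f"processed {len(state['history'])} messages in this session"

    # Write updated state back; the service instance stays disposable.
    r.set(f"session:{session_id}", json.dumps(state), ex=3600)
    return {"reply": reply}
```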

And speaking of scale, let me touch on something we achieved: a system supporting one million transactions per second. Yes, you read that right. It's possible, but only if you architect for it from day one.

Your APIs need clear lifecycle management. Are they experimental? Stable? Deprecated? This matters more than you think, especially when you're iterating rapidly.
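One lightweight way to make lifecycle stages explicit is to tag endpoints so tooling and docs can surface them. A sketch with a hypothetical `lifecycle` decorator:

```python
import functools
import warnings

def lifecycle(stage: str):
    """Hypothetical decorator tagging an endpoint with its lifecycle stage."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if stage == "deprecated":
                warnings.warn(f"{fn.__name__} is deprecated", DeprecationWarning)
            return fn(*args, **kwargs)
        wrapper.lifecycle_stage = stage  # discoverable by tooling and docs
        return wrapper
    return decorator

@lifecycle("experimental")
def search_v2(query: str) -> list:
    return []

@lifecycle("deprecated")
def search_v1(query: str) -> list:
    return search_v2(query)  # old endpoint delegates to the new one
```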

Database writes should be append-only. For reads, leverage caches aggressively. And your data pipeline? It needs schema validation, ETL processes, incremental loads, and backfill capabilities. These aren't nice-to-haves; they're essential.
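As a rough illustration of that read/write split, here's a sketch that appends events to a Redis stream (standing in for Kafka or similar) and serves reads cache-aside; `load_profile_from_db` is a hypothetical stub:

```python
import json

import redis

r = redis.Redis(decode_responses=True)

def load_profile_from_db(user_id: str) -> dict:
    # Stand-in for the real database read.
    return {"user_id": user_id, "plan": "basic"}

def append_event(stream: str, event: dict) -> None:
    # Writes are append-only: events join a log, nothing is updated in place.
    r.xadd(stream, {"payload": json.dumps(event)})

def read_profile(user_id: str) -> dict:
    # Cache-aside read: hit the cache first, fall back to the source of truth.
    cached = r.get(f"profile:{user_id}")
    if cached:
        return json.loads(cached)
    profile = load_profile_from_db(user_id)
    r.set(f"profile:{user_id}", json.dumps(profile), ex=300)
    return profile
```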

The five paths forward

Through trial and error, I've identified five distinct approaches to building your agentic stack. Let me break them down:

Path one is for teams with existing enterprise systems. You've got microservices, they're stateless, and you're offloading state to Redis or Kafka. The beauty here? Token efficiency. You're not calling models unnecessarily. Maybe you've got a Lambda function running for 15 minutes, calling LLMs or small language models as needed. It's fast to market because you're building on what you already have.
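Here's the shape of path one as a sketch, with a hypothetical AWS Lambda handler: state sits in Redis, and the model is invoked only when a cheaper lookup misses, which is where the token efficiency comes from. `call_model` stands in for your LLM client:

```python
import os

import redis

r = redis.Redis(host=os.environ.get("REDIS_HOST", "localhost"), decode_responses=True)

def call_model(prompt: str) -> str:
    # Stand-in for a call to a self-hosted LLM or small language model.
    return f"model answer for: {prompt}"

def handler(event, context):
    query = event["query"]
    cached = r.get(f"answer:{query}")
    if cached:  # token efficiency: answer served without touching the model
        return {"answer": cached, "source": "cache"}
    answer = call_model(query)
    r.set(f"answer:{query}", answer, ex=900)
    return {"answer": answer, "source": "model"}
```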

Path two looks similar but with one key difference: hosting. In path one, you host the models yourself. Path two leverages public cloud providers: Google, Azure, AWS. The trade-off? Less control for more convenience.

Path three introduces MCP (Model Context Protocol) as a separate component. This standardizes your tooling, querying, and resource access. It's about creating consistency in a world of constant change.
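To give a sense of what that standardization looks like, here's a minimal server sketch using the official MCP Python SDK (`pip install mcp`); the tool and resource shown are hypothetical:

```python
from mcp.server.fastmcp import FastMCP

# Hypothetical MCP server exposing one tool and one resource through the
# standard protocol, so any MCP-aware agent can discover and call them.
mcp = FastMCP("catalog-tools")

@mcp.tool()
def lookup_title(title: str) -> str:
    """Return metadata for a catalog title (stubbed)."""
    return f"metadata for {title}"

@mcp.resource("catalog://genres")
def genres() -> str:
    """Expose the genre list as a read-only resource."""
    return "drama,comedy,documentary"

if __name__ == "__main__":
    mcp.run()  # serves the protocol over stdio by default
```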

Path four focuses on workflows. Tools like LangGraph let you define states and transitions, calling different models or agents based on where you are in the process. It's powerful for complex, multi-step operations.
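A minimal LangGraph sketch of that idea: a typed state, a few stubbed nodes, and a conditional edge that picks the next step based on where the state says you are:

```python
from typing import TypedDict

from langgraph.graph import END, START, StateGraph

class FlowState(TypedDict):
    query: str
    needs_search: bool
    answer: str

def classify(state: FlowState) -> FlowState:
    return {**state, "needs_search": "latest" in state["query"]}

def search(state: FlowState) -> FlowState:
    return {**state, "answer": f"search results for {state['query']}"}

def respond(state: FlowState) -> FlowState:
    return {**state, "answer": state.get("answer") or "direct answer"}

graph = StateGraph(FlowState)
graph.add_node("classify", classify)
graph.add_node("search", search)
graph.add_node("respond", respond)
graph.add_edge(START, "classify")
graph.add_conditional_edges(
    "classify", lambda s: "search" if s["needs_search"] else "respond"
)
graph.add_edge("search", "respond")
graph.add_edge("respond", END)

app = graph.compile()
print(app.invoke({"query": "latest releases", "needs_search": False, "answer": ""}))
```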

Path five (and this is bleeding-edge stuff) involves agent sandboxes. Think of it like Android apps running in sandboxes over Linux. Everything's controlled: your data, your file system, your execution environment. This literally emerged last week with announcements from Enterprise Agent Cloud and KubeCon North America 2025. I'm optimistic about this approach. Imagine agent stores where developers deploy agents like mobile apps. We're not there yet, but it's coming.

Use cases that taught me everything

Let me share what we built for our OTT aggregator platform. Instead of subscribing to multiple streaming services, users subscribe to our aggregator and access all of them through one interface. We built models for metadata enrichment, recommendations, search, video monitoring, quality of experience tracking, and content publishing.

Here's the crucial lesson: we built this framework three years ago. The models have changed. The frameworks have evolved. But the application data, the interface patterns, the user insights we captured? Those are still gold. The data you collect today will outlive any specific model or framework you choose.

Our multimodal recommendation system taught us the value of flexibility. We use a proxy and a load balancer to route calls between locally hosted models and remote ones. This means we can switch models without disrupting the service. That kind of architectural decision pays dividends over time.
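Reduced to a sketch, the routing idea looks like this: callers hit one `complete` function, and a weighted picker (standing in for the real proxy and load balancer) chooses between hypothetical local and remote backends:

```python
import random

# Hypothetical backends; callers never know which one served the request,
# so either side can be swapped out without disrupting the service.
BACKENDS = [
    {"name": "local-llama", "url": "http://10.0.0.5:8000/v1", "weight": 0.7},
    {"name": "remote-gpt", "url": "https://api.example.com/v1", "weight": 0.3},
]

def pick_backend() -> dict:
    # Weighted random choice; a real proxy would also health-check backends.
    roll, cumulative = random.random(), 0.0
    for backend in BACKENDS:
        cumulative += backend["weight"]
        if roll <= cumulative:
            return backend
    return BACKENDS[-1]

def complete(prompt: str) -> str:
    backend = pick_backend()
    # Stubbed call; in practice this would POST to backend["url"].
    return f"[{backend['name']}] response to: {prompt}"
```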

Data: the constant in a world of variables

Let me be crystal clear about this: data handling will make or break your agentic system. You need to think about data at three levels:

  1. Session data: What remains within a single user session?
  2. Cross-session data: What persists across multiple interactions?
  3. Long-term data: What becomes part of your institutional knowledge?

Every piece of data entering your system needs to be captured and orchestrated. Whether it needs ranking, deduplication, prioritization, or aging, you need a plan. This isn't sexy work, but it's the foundation everything else builds on.
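One way to encode those three levels, as a rough sketch: route each write to a store and a lifetime that match it, with Redis standing in for all three tiers here:

```python
import json

import redis

r = redis.Redis(decode_responses=True)

# Hypothetical tiering: each data level gets its own keyspace and TTL.
def store(level: str, key: str, value: dict) -> None:
    payload = json.dumps(value)
    if level == "session":           # lives only as long as the session
        r.set(f"session:{key}", payload, ex=1800)
    elif level == "cross_session":   # persists across interactions
        r.set(f"user:{key}", payload, ex=86400 * 30)
    elif level == "long_term":       # institutional knowledge: append-only log
        r.xadd("knowledge-log", {"key": key, "payload": payload})
    else:
        raise ValueError(f"unknown data level: {level}")
```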

We experimented extensively with vector databases. Our LightFM and DeepFM models were giving us slow query-to-embedding performance. After testing multiple options, we landed on Milvus for its scaling capabilities. For knowledge graphs, we went deep on metadata enrichment, carefully designing our node and context structures.
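For flavor, here's a minimal Milvus sketch using the embedded mode that ships with pymilvus; the collection name, dimension, and toy embeddings are illustrative only:

```python
import random

from pymilvus import MilvusClient

# Embedded Milvus Lite: data lives in a local file, no server required.
client = MilvusClient("demo.db")
client.create_collection(collection_name="titles", dimension=8)

# Insert a few toy embeddings; real ones would come from your encoder.
rows = [
    {"id": i, "vector": [random.random() for _ in range(8)], "title": f"show-{i}"}
    for i in range(100)
]
client.insert(collection_name="titles", data=rows)

# Nearest-neighbor search for a query embedding.
hits = client.search(
    collection_name="titles",
    data=[[random.random() for _ in range(8)]],
    limit=3,
    output_fields=["title"],
)
print(hits)
```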

The build versus buy decision matrix

This is where things get strategic. You need to identify what won't change and what will add unique value to your organization. Here's my framework:

Build these components:

  • Your orchestration layer (if you have microservices, keep them stateless and add an SEO layer for distribution)
  • Memory architecture (Redis or Hazelcast for short-term, Neo4j for knowledge graphs)
  • Context routing (this is your secret sauce, keep it in-house)
  • Data pipeline (transformation, schema mapping, deduplication - all critical and specific to your use case)
  • Governance and safety rules (domain-specific and crucial for compliance)
  • Cost optimization and model routing (you need visibility into what's costing you money)

Buy or adopt these:

  • Large and small language models (the open-source ecosystem is rich here)
  • Edge inference capabilities (Akamai's edge inference is game-changing for scale)
  • Vector databases (Milvus has proven itself)
  • Agent and MCP frameworks (LangGraph or CrewAI are solid choices)
  • DevOps and MLOps platforms (unless you have very specific needs)
  • Experimental platforms (Weights & Biases, Comet, or MLflow for model versioning)

The edge inference revolution

Here's something that doesn't get enough attention: inference doesn't need to happen in your centralized infrastructure. Edge inference is critical for scale. When you're pushing toward that million TPS mark, centralizing all inference becomes your bottleneck. Akamai and Cloudflare are doing incredible things here. Consider it seriously.

Your integration touchpoints strategy

This is about future-proofing. Your stack needs multiple integration touchpoints: API-driven, modular, replaceable. You will change models. You will adopt new platforms. If you're tightly coupled to any single component, you're setting yourself up for pain.
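In code, that loose coupling can be as simple as depending on an interface instead of a vendor. A sketch using a Python Protocol, with hypothetical providers:

```python
from typing import Protocol

class ModelProvider(Protocol):
    """The integration touchpoint: callers depend only on this interface."""
    def complete(self, prompt: str) -> str: ...

class LocalProvider:
    def complete(self, prompt: str) -> str:
        return f"local: {prompt}"

class CloudProvider:
    def complete(self, prompt: str) -> str:
        return f"cloud: {prompt}"

def answer(provider: ModelProvider, question: str) -> str:
    return provider.complete(question)

print(answer(LocalProvider(), "hello"))  # swapping to CloudProvider() is one line
```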

The takeaways that matter

After building and rebuilding these systems, here's what I know for certain:

First, your enterprise system's founding pillars still matter. Scale, reliability, security; these don't go away because you added AI. They become more critical.

Second, intelligence must be woven into your enterprise fabric, not bolted on. Your agentic architecture needs to reason, adapt, collaborate, and, crucially, work with your existing systems.

Third, identify your business case now. Not in three months. Not after more research. Now. Use prompts and agents to build something today. But recognize that's just stage one.

Fourth, build your workspace by establishing agentic rules and structures specific to your domain. This isn't about following someone else's playbook; it's about creating your own.

Fifth, create solid application workflows that handle memory, context, and knowledge graphs. This becomes your wealth of information, something only you can create for your specific domain.

Sixth, fine-tune relentlessly. Generic language models won't cut it. Whether you use LoRA, QLoRA, or other methods, you need models that understand your specific context.
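To make that sixth point concrete, here's a minimal LoRA setup sketch with Hugging Face PEFT; the base checkpoint and hyperparameters are illustrative, and QLoRA would add 4-bit quantization of the base model on top of this:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative base model; swap in whatever checkpoint fits your domain.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

config = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a small fraction of weights train
# From here, train on your domain data with the usual Trainer loop.
```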

Seventh, invest in better inference methods. Edge-based inference isn't optional if you want to scale. Think Meta-scale, not MVP-scale.

Finally, own your domain. The application layer, the data, the user behaviors, these are yours. They'll outlast any specific technology choice you make today.

The bottom line

Building an agentic stack feels like constructing a building during an earthquake. Everything's shifting, evolving, improving. But some things remain constant: the need for solid architecture, the value of your data, and the importance of building for change rather than stability.

Your application layer and the data it generates will be with you long after today's hot framework is obsolete. Build your stack to capture and leverage that value. Make it flexible enough to evolve but stable enough to rely on.

The models will change. The frameworks will evolve. But the problems you're solving and the value you're creating? Those are yours to own. Build accordingly.


Prathap Chowdry, SVP, Head Product Engineering at Tata Play, gave this presentation at our Agentic AI Summit in London, December 2025.